mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2019-05-05
This commit is contained in:
@ -86,4 +86,14 @@ Please see the DSpace documentation for assistance.
|
||||
- I will ask ILRI ICT to reset the password
|
||||
- They reset the password and I tested it on CGSpace
|
||||
|
||||
## 2019-05-05
|
||||
|
||||
- Run all system updates on DSpace Test (linode19) and reboot it
|
||||
- Merge changes into the `5_x-prod` branch of CGSpace:
|
||||
- Updates to remove deprecated social media websites (Google+ and Delicious), update Twitter share intent, and add item title to Twitter and email links ([#421](https://github.com/ilri/DSpace/pull/421))
|
||||
- Add new CCAFS Phase II project tags ([#420](https://github.com/ilri/DSpace/pull/420))
|
||||
- Add item ID to REST API error logging ([#422](https://github.com/ilri/DSpace/pull/422))
|
||||
- Re-deploy CGSpace from `5_x-prod` branch
|
||||
- Run all system updates on CGSpace (linode18) and reboot it
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
@ -11,11 +11,12 @@
|
||||
|
||||
CGSpace went down
|
||||
Looks like DSpace exhausted its PostgreSQL connection pool
|
||||
Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
|
||||
|
||||
Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
|
||||
|
||||
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
|
||||
78
|
||||
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2015-11/" />
|
||||
@ -29,13 +30,14 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
|
||||
|
||||
CGSpace went down
|
||||
Looks like DSpace exhausted its PostgreSQL connection pool
|
||||
Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
|
||||
|
||||
Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
|
||||
|
||||
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
|
||||
78
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -119,12 +121,13 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
|
||||
<ul>
|
||||
<li>CGSpace went down</li>
|
||||
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
|
||||
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
|
||||
78
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
<li>For now I have increased the limit from 60 to 90, run updates, and rebooted the server</li>
|
||||
@ -135,60 +138,56 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
|
||||
<ul>
|
||||
<li>CGSpace went down again</li>
|
||||
<li>Getting emails from uptimeRobot and uptimeButler that it’s down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors</li>
|
||||
<li>Looks like there are still a bunch of idle PostgreSQL connections:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Looks like there are still a bunch of idle PostgreSQL connections:</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
|
||||
96
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>For some reason the number of idle connections is very high since we upgraded to DSpace 5</li>
|
||||
<li><p>For some reason the number of idle connections is very high since we upgraded to DSpace 5</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2015-11-25">2015-11-25</h2>
|
||||
|
||||
<ul>
|
||||
<li>Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config</li>
|
||||
<li>The OAI application requests stylesheets and javascript files with the path <code>/oai/static/css</code>, which gets matched here:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The OAI application requests stylesheets and javascript files with the path <code>/oai/static/css</code>, which gets matched here:</p>
|
||||
|
||||
<pre><code># static assets we can load from the file system directly with nginx
|
||||
location ~ /(themes|static|aspects/ReportingSuite) {
|
||||
try_files $uri @tomcat;
|
||||
try_files $uri @tomcat;
|
||||
...
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The document root is relative to the xmlui app, so this gets a 404—I’m not sure why it doesn’t pass to <code>@tomcat</code></li>
|
||||
<li>Anyways, I can’t find any URIs with path <code>/static</code>, and the more important point is to handle all the static theme assets, so we can just remove <code>static</code> from the regex for now (who cares if we can’t use nginx to send Etags for OAI CSS!)</li>
|
||||
<li>Also, I noticed we aren’t setting CSP headers on the static assets, because in nginx headers are inherited in child blocks, but if you use <code>add_header</code> in a child block it doesn’t inherit the others</li>
|
||||
<li>We simply need to add <code>include extra-security.conf;</code> to the above location block (but research and test first)</li>
|
||||
<li>We should add WOFF assets to the list of things to set expires for:</li>
|
||||
</ul>
|
||||
<li><p>The document root is relative to the xmlui app, so this gets a 404—I’m not sure why it doesn’t pass to <code>@tomcat</code></p></li>
|
||||
|
||||
<li><p>Anyways, I can’t find any URIs with path <code>/static</code>, and the more important point is to handle all the static theme assets, so we can just remove <code>static</code> from the regex for now (who cares if we can’t use nginx to send Etags for OAI CSS!)</p></li>
|
||||
|
||||
<li><p>Also, I noticed we aren’t setting CSP headers on the static assets, because in nginx headers are inherited in child blocks, but if you use <code>add_header</code> in a child block it doesn’t inherit the others</p></li>
|
||||
|
||||
<li><p>We simply need to add <code>include extra-security.conf;</code> to the above location block (but research and test first)</p></li>
|
||||
|
||||
<li><p>We should add WOFF assets to the list of things to set expires for:</p>
|
||||
|
||||
<pre><code>location ~* \.(?:ico|css|js|gif|jpe?g|png|woff)$ {
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We should also add <code>aspects/Statistics</code> to the location block for static assets (minus <code>static</code> from above):</li>
|
||||
</ul>
|
||||
<li><p>We should also add <code>aspects/Statistics</code> to the location block for static assets (minus <code>static</code> from above):</p>
|
||||
|
||||
<pre><code>location ~ /(themes|aspects/ReportingSuite|aspects/Statistics) {
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Need to check <code>/about</code> on CGSpace, as it’s blank on my local test server and we might need to add something there</li>
|
||||
<li>CGSpace has been up and down all day due to PostgreSQL idle connections (current DSpace pool is 90):</li>
|
||||
</ul>
|
||||
<li><p>Need to check <code>/about</code> on CGSpace, as it’s blank on my local test server and we might need to add something there</p></li>
|
||||
|
||||
<li><p>CGSpace has been up and down all day due to PostgreSQL idle connections (current DSpace pool is 90):</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
|
||||
93
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I looked closer at the idle connections and saw that many have been idle for hours (current time on server is <code>2015-11-25T20:20:42+0000</code>):</li>
|
||||
</ul>
|
||||
<li><p>I looked closer at the idle connections and saw that many have been idle for hours (current time on server is <code>2015-11-25T20:20:42+0000</code>):</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | less -S
|
||||
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start |
|
||||
@ -196,14 +195,17 @@ datid | datname | pid | usesysid | usename | application_name | client_addr
|
||||
20951 | cgspace | 10966 | 18205 | cgspace | | 127.0.0.1 | | 37731 | 2015-11-25 13:13:02.837624+00 | | 20
|
||||
20951 | cgspace | 10967 | 18205 | cgspace | | 127.0.0.1 | | 37737 | 2015-11-25 13:13:03.069421+00 | | 20
|
||||
...
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There is a relevant Jira issue about this: <a href="https://jira.duraspace.org/browse/DS-1458">https://jira.duraspace.org/browse/DS-1458</a></li>
|
||||
<li>It seems there is some sense changing DSpace’s default <code>db.maxidle</code> from unlimited (-1) to something like 8 (Tomcat default) or 10 (Confluence default)</li>
|
||||
<li>Change <code>db.maxidle</code> from -1 to 10, reduce <code>db.maxconnections</code> from 90 to 50, and restart postgres and tomcat7</li>
|
||||
<li>Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well</li>
|
||||
<li>Also deploy the nginx fixes for the <code>try_files</code> location block as well as the expires block</li>
|
||||
<li><p>There is a relevant Jira issue about this: <a href="https://jira.duraspace.org/browse/DS-1458">https://jira.duraspace.org/browse/DS-1458</a></p></li>
|
||||
|
||||
<li><p>It seems there is some sense changing DSpace’s default <code>db.maxidle</code> from unlimited (-1) to something like 8 (Tomcat default) or 10 (Confluence default)</p></li>
|
||||
|
||||
<li><p>Change <code>db.maxidle</code> from -1 to 10, reduce <code>db.maxconnections</code> from 90 to 50, and restart postgres and tomcat7</p></li>
|
||||
|
||||
<li><p>Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well</p></li>
|
||||
|
||||
<li><p>Also deploy the nginx fixes for the <code>try_files</code> location block as well as the expires block</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2015-11-26">2015-11-26</h2>
|
||||
@ -211,52 +213,50 @@ datid | datname | pid | usesysid | usename | application_name | client_addr
|
||||
<ul>
|
||||
<li>CGSpace behaving much better since changing <code>db.maxidle</code> yesterday, but still two up/down notices from monitoring this morning (better than 50!)</li>
|
||||
<li>CCAFS colleagues mentioned that the REST API is very slow, 24 seconds for one item</li>
|
||||
<li>Not as bad for me, but still unsustainable if you have to get many:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Not as bad for me, but still unsustainable if you have to get many:</p>
|
||||
|
||||
<pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
|
||||
8.415
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Monitoring e-mailed in the evening to say CGSpace was down</li>
|
||||
<li>Idle connections in PostgreSQL again:</li>
|
||||
</ul>
|
||||
<li><p>Monitoring e-mailed in the evening to say CGSpace was down</p></li>
|
||||
|
||||
<li><p>Idle connections in PostgreSQL again:</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
|
||||
66
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>At the time, the current DSpace pool size was 50…</li>
|
||||
<li>I reduced the pool back to the default of 30, and reduced the <code>db.maxidle</code> settings from 10 to 8</li>
|
||||
<li><p>At the time, the current DSpace pool size was 50…</p></li>
|
||||
|
||||
<li><p>I reduced the pool back to the default of 30, and reduced the <code>db.maxidle</code> settings from 10 to 8</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2015-11-29">2015-11-29</h2>
|
||||
|
||||
<ul>
|
||||
<li>Still more alerts that CGSpace has been up and down all day</li>
|
||||
<li>Current database settings for DSpace:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Current database settings for DSpace:</p>
|
||||
|
||||
<pre><code>db.maxconnections = 30
|
||||
db.maxwait = 5000
|
||||
db.maxidle = 8
|
||||
db.statementpool = true
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And idle connections:</li>
|
||||
</ul>
|
||||
<li><p>And idle connections:</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
|
||||
49
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Perhaps I need to start drastically increasing the connection limits—like to 300—to see if DSpace’s thirst can ever be quenched</li>
|
||||
<li>On another note, SUNScholar’s notes suggest adjusting some other postgres variables: <a href="http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database">http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database</a></li>
|
||||
<li>This might help with REST API speed (which I mentioned above and still need to do real tests)</li>
|
||||
<li><p>Perhaps I need to start drastically increasing the connection limits—like to 300—to see if DSpace’s thirst can ever be quenched</p></li>
|
||||
|
||||
<li><p>On another note, SUNScholar’s notes suggest adjusting some other postgres variables: <a href="http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database">http://wiki.lib.sun.ac.za/index.php/SUNScholar/Optimisations/Database</a></p></li>
|
||||
|
||||
<li><p>This might help with REST API speed (which I mentioned above and still need to do real tests)</p></li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
@ -11,12 +11,12 @@
|
||||
|
||||
Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
|
||||
|
||||
|
||||
# cd /home/dspacetest.cgiar.org/log
|
||||
# ls -lh dspace.log.2015-11-18*
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
|
||||
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2015-12/" />
|
||||
@ -30,14 +30,14 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
|
||||
|
||||
Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
|
||||
|
||||
|
||||
# cd /home/dspacetest.cgiar.org/log
|
||||
# ls -lh dspace.log.2015-11-18*
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -119,41 +119,38 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
|
||||
<h2 id="2015-12-02">2015-12-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
|
||||
</ul>
|
||||
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p>
|
||||
|
||||
<pre><code># cd /home/dspacetest.cgiar.org/log
|
||||
# ls -lh dspace.log.2015-11-18*
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
<li>I had used lrzip once, but it needs more memory and is harder to use as it requires the lrztar wrapper</li>
|
||||
<li>Need to remember to go check if everything is ok in a few days and then change CGSpace</li>
|
||||
<li>CGSpace went down again (due to PostgreSQL idle connections of course)</li>
|
||||
<li>Current database settings for DSpace are <code>db.maxconnections = 30</code> and <code>db.maxidle = 8</code>, yet idle connections are exceeding this:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Current database settings for DSpace are <code>db.maxconnections = 30</code> and <code>db.maxidle = 8</code>, yet idle connections are exceeding this:</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
|
||||
39
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I restarted PostgreSQL and Tomcat and it’s back</li>
|
||||
<li>On a related note of why CGSpace is so slow, I decided to finally try the <code>pgtune</code> script to tune the postgres settings:</li>
|
||||
</ul>
|
||||
<li><p>I restarted PostgreSQL and Tomcat and it’s back</p></li>
|
||||
|
||||
<li><p>On a related note of why CGSpace is so slow, I decided to finally try the <code>pgtune</code> script to tune the postgres settings:</p>
|
||||
|
||||
<pre><code># apt-get install pgtune
|
||||
# pgtune -i /etc/postgresql/9.3/main/postgresql.conf -o postgresql.conf-pgtune
|
||||
# mv /etc/postgresql/9.3/main/postgresql.conf /etc/postgresql/9.3/main/postgresql.conf.orig
|
||||
# mv postgresql.conf-pgtune /etc/postgresql/9.3/main/postgresql.conf
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It introduced the following new settings:</li>
|
||||
</ul>
|
||||
<li><p>It introduced the following new settings:</p>
|
||||
|
||||
<pre><code>default_statistics_target = 50
|
||||
maintenance_work_mem = 480MB
|
||||
@ -165,12 +162,11 @@ wal_buffers = 8MB
|
||||
checkpoint_segments = 16
|
||||
shared_buffers = 1920MB
|
||||
max_connections = 80
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Now I need to go read PostgreSQL docs about these options, and watch memory settings in munin etc</li>
|
||||
<li>For what it’s worth, now the REST API should be faster (because of these PostgreSQL tweaks):</li>
|
||||
</ul>
|
||||
<li><p>Now I need to go read PostgreSQL docs about these options, and watch memory settings in munin etc</p></li>
|
||||
|
||||
<li><p>For what it’s worth, now the REST API should be faster (because of these PostgreSQL tweaks):</p>
|
||||
|
||||
<pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
|
||||
1.474
|
||||
@ -182,11 +178,11 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
|
||||
1.995
|
||||
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
|
||||
1.786
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Last week it was an average of 8 seconds… now this is <sup>1</sup>⁄<sub>4</sub> of that</li>
|
||||
<li>CCAFS noticed that one of their items displays only the Atmire statlets: <a href="https://cgspace.cgiar.org/handle/10568/42445">https://cgspace.cgiar.org/handle/10568/42445</a></li>
|
||||
<li><p>Last week it was an average of 8 seconds… now this is <sup>1</sup>⁄<sub>4</sub> of that</p></li>
|
||||
|
||||
<li><p>CCAFS noticed that one of their items displays only the Atmire statlets: <a href="https://cgspace.cgiar.org/handle/10568/42445">https://cgspace.cgiar.org/handle/10568/42445</a></p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2015/12/ccafs-item-no-metadata.png" alt="CCAFS item" /></p>
|
||||
@ -201,19 +197,20 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
|
||||
|
||||
<ul>
|
||||
<li>CGSpace very slow, and monitoring emailing me to say its down, even though I can load the page (very slowly)</li>
|
||||
<li>Idle postgres connections look like this (with no change in DSpace db settings lately):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Idle postgres connections look like this (with no change in DSpace db settings lately):</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
|
||||
29
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I restarted Tomcat and postgres…</li>
|
||||
<li>Atmire commented that we should raise the JVM heap size by ~500M, so it is now <code>-Xms3584m -Xmx3584m</code></li>
|
||||
<li>We weren’t out of heap yet, but it’s probably fair enough that the DSpace 5 upgrade (and new Atmire modules) requires more memory so it’s ok</li>
|
||||
<li>A possible side effect is that I see that the REST API is twice as fast for the request above now:</li>
|
||||
</ul>
|
||||
<li><p>I restarted Tomcat and postgres…</p></li>
|
||||
|
||||
<li><p>Atmire commented that we should raise the JVM heap size by ~500M, so it is now <code>-Xms3584m -Xmx3584m</code></p></li>
|
||||
|
||||
<li><p>We weren’t out of heap yet, but it’s probably fair enough that the DSpace 5 upgrade (and new Atmire modules) requires more memory so it’s ok</p></li>
|
||||
|
||||
<li><p>A possible side effect is that I see that the REST API is twice as fast for the request above now:</p>
|
||||
|
||||
<pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
|
||||
1.368
|
||||
@ -227,22 +224,23 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
|
||||
0.806
|
||||
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
|
||||
0.854
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2015-12-05">2015-12-05</h2>
|
||||
|
||||
<ul>
|
||||
<li>CGSpace has been up and down all day and REST API is completely unresponsive</li>
|
||||
<li>PostgreSQL idle connections are currently:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>PostgreSQL idle connections are currently:</p>
|
||||
|
||||
<pre><code>postgres@linode01:~$ psql -c 'SELECT * from pg_stat_activity;' | grep cgspace | grep -c idle
|
||||
28
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I have reverted all the pgtune tweaks from the other day, as they didn’t fix the stability issues, so I’d rather not have them introducing more variables into the equation</li>
|
||||
<li>The PostgreSQL stats from Munin all point to something database-related with the DSpace 5 upgrade around mid–late November</li>
|
||||
<li><p>I have reverted all the pgtune tweaks from the other day, as they didn’t fix the stability issues, so I’d rather not have them introducing more variables into the equation</p></li>
|
||||
|
||||
<li><p>The PostgreSQL stats from Munin all point to something database-related with the DSpace 5 upgrade around mid–late November</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2015/12/postgres_bgwriter-year.png" alt="PostgreSQL bgwriter (year)" />
|
||||
@ -254,8 +252,8 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
|
||||
|
||||
<ul>
|
||||
<li>Atmire sent <a href="https://github.com/ilri/DSpace/pull/161">some fixes</a> to DSpace’s REST API code that was leaving contexts open (causing the slow performance and database issues)</li>
|
||||
<li>After deploying the fix to CGSpace the REST API is consistently faster:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>After deploying the fix to CGSpace the REST API is consistently faster:</p>
|
||||
|
||||
<pre><code>$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
|
||||
0.675
|
||||
@ -267,7 +265,8 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
|
||||
0.566
|
||||
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
|
||||
0.497
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2015-12-08">2015-12-08</h2>
|
||||
|
||||
|
@ -27,7 +27,7 @@ Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_
|
||||
I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
|
||||
Update GitHub wiki for documentation of maintenance tasks.
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
|
@ -41,7 +41,7 @@ I noticed we have a very interesting list of countries on CGSpace:
|
||||
Not only are there 49,000 countries, we have some blanks (25)…
|
||||
Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -139,41 +139,39 @@ Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE&r
|
||||
|
||||
<ul>
|
||||
<li>Found a way to get items with null/empty metadata values from SQL</li>
|
||||
<li>First, find the <code>metadata_field_id</code> for the field you want from the <code>metadatafieldregistry</code> table:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>First, find the <code>metadata_field_id</code> for the field you want from the <code>metadatafieldregistry</code> table:</p>
|
||||
|
||||
<pre><code>dspacetest=# select * from metadatafieldregistry;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In this case our country field is 78</li>
|
||||
<li>Now find all resources with type 2 (item) that have null/empty values for that field:</li>
|
||||
</ul>
|
||||
<li><p>In this case our country field is 78</p></li>
|
||||
|
||||
<li><p>Now find all resources with type 2 (item) that have null/empty values for that field:</p>
|
||||
|
||||
<pre><code>dspacetest=# select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=78 and (text_value='' OR text_value IS NULL);
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then you can find the handle that owns it from its <code>resource_id</code>:</li>
|
||||
</ul>
|
||||
<li><p>Then you can find the handle that owns it from its <code>resource_id</code>:</p>
|
||||
|
||||
<pre><code>dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '22678';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It’s 25 items so editing in the web UI is annoying, let’s try SQL!</li>
|
||||
</ul>
|
||||
<li><p>It’s 25 items so editing in the web UI is annoying, let’s try SQL!</p>
|
||||
|
||||
<pre><code>dspacetest=# delete from metadatavalue where metadata_field_id=78 and text_value='';
|
||||
DELETE 25
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After that perhaps a regular <code>dspace index-discovery</code> (no -b) <em>should</em> suffice…</li>
|
||||
<li>Hmm, I indexed, cleared the Cocoon cache, and restarted Tomcat but the 25 “|||” countries are still there</li>
|
||||
<li>Maybe I need to do a full re-index…</li>
|
||||
<li>Yep! The full re-index seems to work.</li>
|
||||
<li>Process the empty countries on CGSpace</li>
|
||||
<li><p>After that perhaps a regular <code>dspace index-discovery</code> (no -b) <em>should</em> suffice…</p></li>
|
||||
|
||||
<li><p>Hmm, I indexed, cleared the Cocoon cache, and restarted Tomcat but the 25 “|||” countries are still there</p></li>
|
||||
|
||||
<li><p>Maybe I need to do a full re-index…</p></li>
|
||||
|
||||
<li><p>Yep! The full re-index seems to work.</p></li>
|
||||
|
||||
<li><p>Process the empty countries on CGSpace</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-02-07">2016-02-07</h2>
|
||||
@ -184,8 +182,8 @@ DELETE 25
|
||||
<li>For some reason when you import an Excel file into OpenRefine it exports dates like 1949 to 1949.0 in the CSV</li>
|
||||
<li>I re-import the resulting CSV and run a GREL on the date issued column: <code>value.replace("\.0", "")</code></li>
|
||||
<li>I need to start running DSpace in Mac OS X instead of a Linux VM</li>
|
||||
<li>Install PostgreSQL from homebrew, then configure and import CGSpace database dump:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Install PostgreSQL from homebrew, then configure and import CGSpace database dump:</p>
|
||||
|
||||
<pre><code>$ postgres -D /opt/brew/var/postgres
|
||||
$ createuser --superuser postgres
|
||||
@ -200,11 +198,9 @@ postgres=# alter user dspacetest nocreateuser;
|
||||
postgres=# \q
|
||||
$ vacuumdb dspacetest
|
||||
$ psql -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest -h localhost
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After building and running a <code>fresh_install</code> I symlinked the webapps into Tomcat’s webapps folder:</li>
|
||||
</ul>
|
||||
<li><p>After building and running a <code>fresh_install</code> I symlinked the webapps into Tomcat’s webapps folder:</p>
|
||||
|
||||
<pre><code>$ mv /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT.orig
|
||||
$ ln -sfv ~/dspace/webapps/xmlui /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/ROOT
|
||||
@ -213,22 +209,20 @@ $ ln -sfv ~/dspace/webapps/jspui /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/
|
||||
$ ln -sfv ~/dspace/webapps/oai /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/oai
|
||||
$ ln -sfv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.0.30/libexec/webapps/solr
|
||||
$ /opt/brew/Cellar/tomcat/8.0.30/bin/catalina start
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Add CATALINA_OPTS in <code>/opt/brew/Cellar/tomcat/8.0.30/libexec/bin/setenv.sh</code>, as this script is sourced by the <code>catalina</code> startup script</li>
|
||||
<li>For example:</li>
|
||||
</ul>
|
||||
<li><p>Add CATALINA_OPTS in <code>/opt/brew/Cellar/tomcat/8.0.30/libexec/bin/setenv.sh</code>, as this script is sourced by the <code>catalina</code> startup script</p></li>
|
||||
|
||||
<li><p>For example:</p>
|
||||
|
||||
<pre><code>CATALINA_OPTS="-Djava.awt.headless=true -Xms2048m -Xmx2048m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After verifying that the site is working, start a full index:</li>
|
||||
</ul>
|
||||
<li><p>After verifying that the site is working, start a full index:</p>
|
||||
|
||||
<pre><code>$ ~/dspace/bin/dspace index-discovery -b
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-02-08">2016-02-08</h2>
|
||||
|
||||
@ -245,8 +239,8 @@ $ /opt/brew/Cellar/tomcat/8.0.30/bin/catalina start
|
||||
<ul>
|
||||
<li>Re-sync DSpace Test with CGSpace</li>
|
||||
<li>Help Sisay with OpenRefine</li>
|
||||
<li>Enable HTTPS on DSpace Test using Let’s Encrypt:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Enable HTTPS on DSpace Test using Let’s Encrypt:</p>
|
||||
|
||||
<pre><code>$ cd ~/src/git
|
||||
$ git clone https://github.com/letsencrypt/letsencrypt
|
||||
@ -256,39 +250,36 @@ $ sudo service nginx stop
|
||||
$ ./letsencrypt-auto certonly --standalone -d dspacetest.cgiar.org
|
||||
$ sudo service nginx start
|
||||
$ ansible-playbook dspace.yml -l linode02 -t nginx,firewall -u aorth --ask-become-pass
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We should install it in /opt/letsencrypt and then script the renewal script, but first we have to wire up some variables and template stuff based on the script here: <a href="https://letsencrypt.org/howitworks/">https://letsencrypt.org/howitworks/</a></li>
|
||||
<li>I had to export some CIAT items that were being cleaned up on the test server and I noticed their <code>dc.contributor.author</code> fields have DSpace 5 authority index UUIDs…</li>
|
||||
<li>To clean those up in OpenRefine I used this GREL expression: <code>value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,"")</code></li>
|
||||
<li>Getting more and more hangs on DSpace Test, seemingly random but also during CSV import</li>
|
||||
<li>Logs don’t always show anything right when it fails, but eventually one of these appears:</li>
|
||||
</ul>
|
||||
<li><p>We should install it in /opt/letsencrypt and then script the renewal script, but first we have to wire up some variables and template stuff based on the script here: <a href="https://letsencrypt.org/howitworks/">https://letsencrypt.org/howitworks/</a></p></li>
|
||||
|
||||
<li><p>I had to export some CIAT items that were being cleaned up on the test server and I noticed their <code>dc.contributor.author</code> fields have DSpace 5 authority index UUIDs…</p></li>
|
||||
|
||||
<li><p>To clean those up in OpenRefine I used this GREL expression: <code>value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,"")</code></p></li>
|
||||
|
||||
<li><p>Getting more and more hangs on DSpace Test, seemingly random but also during CSV import</p></li>
|
||||
|
||||
<li><p>Logs don’t always show anything right when it fails, but eventually one of these appears:</p>
|
||||
|
||||
<pre><code>org.dspace.discovery.SearchServiceException: Error while processing facet fields: java.lang.OutOfMemoryError: Java heap space
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>or</li>
|
||||
</ul>
|
||||
<li><p>or</p>
|
||||
|
||||
<pre><code>Caused by: java.util.NoSuchElementException: Timeout waiting for idle object
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Right now DSpace Test’s Tomcat heap is set to 1536m and we have quite a bit of free RAM:</li>
|
||||
</ul>
|
||||
<li><p>Right now DSpace Test’s Tomcat heap is set to 1536m and we have quite a bit of free RAM:</p>
|
||||
|
||||
<pre><code># free -m
|
||||
total used free shared buffers cached
|
||||
total used free shared buffers cached
|
||||
Mem: 3950 3902 48 9 37 1311
|
||||
-/+ buffers/cache: 2552 1397
|
||||
Swap: 255 57 198
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So I’ll bump up the Tomcat heap to 2048 (CGSpace production server is using 3GB)</li>
|
||||
<li><p>So I’ll bump up the Tomcat heap to 2048 (CGSpace production server is using 3GB)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-02-11">2016-02-11</h2>
|
||||
@ -296,15 +287,13 @@ Swap: 255 57 198
|
||||
<ul>
|
||||
<li>Massaging some CIAT data in OpenRefine</li>
|
||||
<li>There are 1200 records that have PDFs, and will need to be imported into CGSpace</li>
|
||||
<li>I created a <code>filename</code> column based on the <code>dc.identifier.url</code> column using the following transform:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I created a <code>filename</code> column based on the <code>dc.identifier.url</code> column using the following transform:</p>
|
||||
|
||||
<pre><code>value.split('/')[-1]
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then I wrote a tool called <a href="https://gist.github.com/alanorth/2206f24483fe5f0454fc"><code>generate-thumbnails.py</code></a> to download the PDFs and generate thumbnails for them, for example:</li>
|
||||
</ul>
|
||||
<li><p>Then I wrote a tool called <a href="https://gist.github.com/alanorth/2206f24483fe5f0454fc"><code>generate-thumbnails.py</code></a> to download the PDFs and generate thumbnails for them, for example:</p>
|
||||
|
||||
<pre><code>$ ./generate-thumbnails.py ciat-reports.csv
|
||||
Processing 64661.pdf
|
||||
@ -313,7 +302,8 @@ Processing 64661.pdf
|
||||
Processing 64195.pdf
|
||||
> Downloading 64195.pdf
|
||||
> Creating thumbnail for 64195.pdf
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-02-12">2016-02-12</h2>
|
||||
|
||||
@ -330,44 +320,47 @@ Processing 64195.pdf
|
||||
|
||||
<ul>
|
||||
<li>Looking at CIAT’s records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I’m not sure if we can use those</li>
|
||||
<li>265 items have dirty, URL-encoded filenames:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>265 items have dirty, URL-encoded filenames:</p>
|
||||
|
||||
<pre><code>$ ls | grep -c -E "%"
|
||||
265
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames</li>
|
||||
<li>This python2 snippet seems to work in the CLI, but not so well in OpenRefine:</li>
|
||||
</ul>
|
||||
<li><p>I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames</p></li>
|
||||
|
||||
<li><p>This python2 snippet seems to work in the CLI, but not so well in OpenRefine:</p>
|
||||
|
||||
<pre><code>$ python -c "import urllib, sys; print urllib.unquote(sys.argv[1])" CIAT_COLOMBIA_000169_T%C3%A9cnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_yuca.pdf
|
||||
CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_yuca.pdf
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Merge pull requests for submission form theming (<a href="https://github.com/ilri/DSpace/pull/178">#178</a>) and missing center subjects in XMLUI item views (<a href="https://github.com/ilri/DSpace/pull/176">#176</a>)</li>
|
||||
<li>They will be deployed on CGSpace the next time I re-deploy</li>
|
||||
<li><p>Merge pull requests for submission form theming (<a href="https://github.com/ilri/DSpace/pull/178">#178</a>) and missing center subjects in XMLUI item views (<a href="https://github.com/ilri/DSpace/pull/176">#176</a>)</p></li>
|
||||
|
||||
<li><p>They will be deployed on CGSpace the next time I re-deploy</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-02-16">2016-02-16</h2>
|
||||
|
||||
<ul>
|
||||
<li>Turns out OpenRefine has an unescape function!</li>
|
||||
</ul>
|
||||
<li><p>Turns out OpenRefine has an unescape function!</p>
|
||||
|
||||
<pre><code>value.unescape("url")
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This turns the URLs into human-readable versions that we can use as proper filenames</li>
|
||||
<li>Run web server and system updates on DSpace Test and reboot</li>
|
||||
<li>To merge <code>dc.identifier.url</code> and <code>dc.identifier.url[]</code>, rename the second column so it doesn’t have the brackets, like <code>dc.identifier.url2</code></li>
|
||||
<li>Then you create a facet for blank values on each column, show the rows that have values for one and not the other, then transform each independently to have the contents of the other, with “||” in between</li>
|
||||
<li>Work on Python script for parsing and downloading PDF records from <code>dc.identifier.url</code></li>
|
||||
<li>To get filenames from <code>dc.identifier.url</code>, create a new column based on this transform: <code>forEach(value.split('||'), v, v.split('/')[-1]).join('||')</code></li>
|
||||
<li>This also works for records that have multiple URLs (separated by “||”)</li>
|
||||
<li><p>This turns the URLs into human-readable versions that we can use as proper filenames</p></li>
|
||||
|
||||
<li><p>Run web server and system updates on DSpace Test and reboot</p></li>
|
||||
|
||||
<li><p>To merge <code>dc.identifier.url</code> and <code>dc.identifier.url[]</code>, rename the second column so it doesn’t have the brackets, like <code>dc.identifier.url2</code></p></li>
|
||||
|
||||
<li><p>Then you create a facet for blank values on each column, show the rows that have values for one and not the other, then transform each independently to have the contents of the other, with “||” in between</p></li>
|
||||
|
||||
<li><p>Work on Python script for parsing and downloading PDF records from <code>dc.identifier.url</code></p></li>
|
||||
|
||||
<li><p>To get filenames from <code>dc.identifier.url</code>, create a new column based on this transform: <code>forEach(value.split('||'), v, v.split('/')[-1]).join('||')</code></p></li>
|
||||
|
||||
<li><p>This also works for records that have multiple URLs (separated by “||”)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-02-17">2016-02-17</h2>
|
||||
@ -383,40 +376,39 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
|
||||
|
||||
<ul>
|
||||
<li>Turns out the “bug” in SAFBuilder isn’t a bug, it’s a feature that allows you to encode extra information like the destintion bundle in the filename</li>
|
||||
<li>Also, it seems DSpace’s SAF import tool doesn’t like importing filenames that have accents in them:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Also, it seems DSpace’s SAF import tool doesn’t like importing filenames that have accents in them:</p>
|
||||
|
||||
<pre><code>java.io.FileNotFoundException: /usr/share/tomcat7/SimpleArchiveFormat/item_1021/CIAT_COLOMBIA_000075_Medición_de_palatabilidad_en_forrajes.pdf (No such file or directory)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Need to rename files to have no accents or umlauts, etc…</li>
|
||||
<li>Useful custom text facet for URLs ending with “.pdf”: <code>value.endsWith(".pdf")</code></li>
|
||||
<li><p>Need to rename files to have no accents or umlauts, etc…</p></li>
|
||||
|
||||
<li><p>Useful custom text facet for URLs ending with “.pdf”: <code>value.endsWith(".pdf")</code></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-02-22">2016-02-22</h2>
|
||||
|
||||
<ul>
|
||||
<li>To change Spanish accents to ASCII in OpenRefine:</li>
|
||||
</ul>
|
||||
<li><p>To change Spanish accents to ASCII in OpenRefine:</p>
|
||||
|
||||
<pre><code>value.replace('ó','o').replace('í','i').replace('á','a').replace('é','e').replace('ñ','n')
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac</li>
|
||||
<li>On closer inspection, I can import files with the following names on Linux (DSpace Test):</li>
|
||||
</ul>
|
||||
<li><p>But actually, the accents might not be an issue, as I can successfully import files containing Spanish accents on my Mac</p></li>
|
||||
|
||||
<li><p>On closer inspection, I can import files with the following names on Linux (DSpace Test):</p>
|
||||
|
||||
<pre><code>Bitstream: tést.pdf
|
||||
Bitstream: tést señora.pdf
|
||||
Bitstream: tést señora alimentación.pdf
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Seems it could be something with the HFS+ filesystem actually, as it’s not UTF-8 (<a href="http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html">it’s something like UCS-2</a>)</li>
|
||||
<li>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux’s ext4 stores them as an array of bytes</li>
|
||||
<li>Running the SAFBuilder on Mac OS X works if you’re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem’s encoding matches</li>
|
||||
<li><p>Seems it could be something with the HFS+ filesystem actually, as it’s not UTF-8 (<a href="http://www.cio.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html">it’s something like UCS-2</a>)</p></li>
|
||||
|
||||
<li><p>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux’s ext4 stores them as an array of bytes</p></li>
|
||||
|
||||
<li><p>Running the SAFBuilder on Mac OS X works if you’re going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem’s encoding matches</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-02-29">2016-02-29</h2>
|
||||
@ -433,15 +425,15 @@ Bitstream: tést señora alimentación.pdf
|
||||
<li>Trying to test Atmire’s series of stats and CUA fixes from January and February, but their branch history is really messy and it’s hard to see what’s going on</li>
|
||||
<li>Rebasing their branch on top of our production branch results in a broken Tomcat, so I’m going to tell them to fix their history and make a proper pull request</li>
|
||||
<li>Looking at the filenames for the CIAT Reports, some have some really ugly characters, like: <code>'</code> or <code>,</code> or <code>=</code> or <code>[</code> or <code>]</code> or <code>(</code> or <code>)</code> or <code>_.pdf</code> or <code>._</code> etc</li>
|
||||
<li>It’s tricky to parse those things in some programming languages so I’d rather just get rid of the weird stuff now in OpenRefine:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It’s tricky to parse those things in some programming languages so I’d rather just get rid of the weird stuff now in OpenRefine:</p>
|
||||
|
||||
<pre><code>value.replace("'",'').replace('_=_','_').replace(',','').replace('[','').replace(']','').replace('(','').replace(')','').replace('_.pdf','.pdf').replace('._','_')
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Finally import the 1127 CIAT items into CGSpace: <a href="https://cgspace.cgiar.org/handle/10568/35710">https://cgspace.cgiar.org/handle/10568/35710</a></li>
|
||||
<li>Re-deploy CGSpace with the Google Scholar fix, but I’m waiting on the Atmire fixes for now, as the branch history is ugly</li>
|
||||
<li><p>Finally import the 1127 CIAT items into CGSpace: <a href="https://cgspace.cgiar.org/handle/10568/35710">https://cgspace.cgiar.org/handle/10568/35710</a></p></li>
|
||||
|
||||
<li><p>Re-deploy CGSpace with the Google Scholar fix, but I’m waiting on the Atmire fixes for now, as the branch history is ugly</p></li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
@ -27,7 +27,7 @@ Looking at issues with author authorities on CGSpace
|
||||
For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
|
||||
Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -121,11 +121,12 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
|
||||
<li>Their changes on <code>5_x-dev</code> branch work, but it is messy as hell with merge commits and old branch base</li>
|
||||
<li>When I rebase their branch on the latest <code>5_x-prod</code> I get blank white pages</li>
|
||||
<li>I identified one commit that causes the issue and let them know</li>
|
||||
<li>Restart DSpace Test, as it seems to have crashed after Sisay tried to import some CSV or zip or something:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Restart DSpace Test, as it seems to have crashed after Sisay tried to import some CSV or zip or something:</p>
|
||||
|
||||
<pre><code>Exception in thread "Lucene Merge Thread #19" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-03-08">2016-03-08</h2>
|
||||
|
||||
@ -185,28 +186,28 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
|
||||
<li>More discussion on the GitHub issue here: <a href="https://github.com/ilri/DSpace/pull/182">https://github.com/ilri/DSpace/pull/182</a></li>
|
||||
<li>Clean up Atmire CUA config (<a href="https://github.com/ilri/DSpace/pull/193">#193</a>)</li>
|
||||
<li>Help Sisay with some PostgreSQL queries to clean up the incorrect <code>dc.contributor.corporateauthor</code> field</li>
|
||||
<li>I noticed that we have some weird values in <code>dc.language</code>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I noticed that we have some weird values in <code>dc.language</code>:</p>
|
||||
|
||||
<pre><code># select * from metadatavalue where metadata_field_id=37;
|
||||
metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
|
||||
metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
|
||||
-------------------+-------------+-------------------+------------+-----------+-------+-----------+------------+------------------
|
||||
1942571 | 35342 | 37 | hi | | 1 | | -1 | 2
|
||||
1942468 | 35345 | 37 | hi | | 1 | | -1 | 2
|
||||
1942479 | 35337 | 37 | hi | | 1 | | -1 | 2
|
||||
1942505 | 35336 | 37 | hi | | 1 | | -1 | 2
|
||||
1942519 | 35338 | 37 | hi | | 1 | | -1 | 2
|
||||
1942535 | 35340 | 37 | hi | | 1 | | -1 | 2
|
||||
1942555 | 35341 | 37 | hi | | 1 | | -1 | 2
|
||||
1942588 | 35343 | 37 | hi | | 1 | | -1 | 2
|
||||
1942610 | 35346 | 37 | hi | | 1 | | -1 | 2
|
||||
1942624 | 35347 | 37 | hi | | 1 | | -1 | 2
|
||||
1942639 | 35339 | 37 | hi | | 1 | | -1 | 2
|
||||
</code></pre>
|
||||
1942571 | 35342 | 37 | hi | | 1 | | -1 | 2
|
||||
1942468 | 35345 | 37 | hi | | 1 | | -1 | 2
|
||||
1942479 | 35337 | 37 | hi | | 1 | | -1 | 2
|
||||
1942505 | 35336 | 37 | hi | | 1 | | -1 | 2
|
||||
1942519 | 35338 | 37 | hi | | 1 | | -1 | 2
|
||||
1942535 | 35340 | 37 | hi | | 1 | | -1 | 2
|
||||
1942555 | 35341 | 37 | hi | | 1 | | -1 | 2
|
||||
1942588 | 35343 | 37 | hi | | 1 | | -1 | 2
|
||||
1942610 | 35346 | 37 | hi | | 1 | | -1 | 2
|
||||
1942624 | 35347 | 37 | hi | | 1 | | -1 | 2
|
||||
1942639 | 35339 | 37 | hi | | 1 | | -1 | 2
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It seems this <code>dc.language</code> field isn’t really used, but we should delete these values</li>
|
||||
<li>Also, <code>dc.language.iso</code> has some weird values, like “En” and “English”</li>
|
||||
<li><p>It seems this <code>dc.language</code> field isn’t really used, but we should delete these values</p></li>
|
||||
|
||||
<li><p>Also, <code>dc.language.iso</code> has some weird values, like “En” and “English”</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-03-17">2016-03-17</h2>
|
||||
@ -236,14 +237,12 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
|
||||
<p><img src="/cgspace-notes/2016/03/bioversity-thumbnail-good.jpg" alt="Trimmed thumbnail" /></p>
|
||||
|
||||
<ul>
|
||||
<li>Command used:</li>
|
||||
</ul>
|
||||
<li><p>Command used:</p>
|
||||
|
||||
<pre><code>$ gm convert -trim -quality 82 -thumbnail x300 -flatten Descriptor\ for\ Butia_EN-2015_2021.pdf\[0\] cover.jpg
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also, it looks like adding <code>-sharpen 0x1.0</code> really improves the quality of the image for only a few KB</li>
|
||||
<li><p>Also, it looks like adding <code>-sharpen 0x1.0</code> really improves the quality of the image for only a few KB</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-03-21">2016-03-21</h2>
|
||||
@ -295,15 +294,14 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
|
||||
<h2 id="2016-03-23">2016-03-23</h2>
|
||||
|
||||
<ul>
|
||||
<li>Abenet is having problems saving group memberships, and she gets this error: <a href="https://gist.github.com/alanorth/87281c061c2de57b773e">https://gist.github.com/alanorth/87281c061c2de57b773e</a></li>
|
||||
</ul>
|
||||
<li><p>Abenet is having problems saving group memberships, and she gets this error: <a href="https://gist.github.com/alanorth/87281c061c2de57b773e">https://gist.github.com/alanorth/87281c061c2de57b773e</a></p>
|
||||
|
||||
<pre><code>Can't find method org.dspace.app.xmlui.aspect.administrative.FlowGroupUtils.processSaveGroup(org.dspace.core.Context,number,string,[Ljava.lang.String;,[Ljava.lang.String;,org.apache.cocoon.environment.wrapper.RequestWrapper). (resource://aspects/Administrative/administrative.js#967)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I can reproduce the same error on DSpace Test and on my Mac</li>
|
||||
<li>Looks to be an issue with the Atmire modules, I’ve submitted a ticket to their tracker.</li>
|
||||
<li><p>I can reproduce the same error on DSpace Test and on my Mac</p></li>
|
||||
|
||||
<li><p>Looks to be an issue with the Atmire modules, I’ve submitted a ticket to their tracker.</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-03-24">2016-03-24</h2>
|
||||
|
@ -31,7 +31,7 @@ After running DSpace for over five years I’ve never needed to look in any
|
||||
This will save us a few gigs of backup space we’re paying for on S3
|
||||
Also, I noticed the checker log has some errors we should pay attention to:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -154,42 +154,40 @@ java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290
|
||||
<h2 id="2016-04-05">2016-04-05</h2>
|
||||
|
||||
<ul>
|
||||
<li>Reduce Amazon S3 storage used for logs from 46 GB to 6GB by deleting a bunch of logs we don’t need!</li>
|
||||
</ul>
|
||||
<li><p>Reduce Amazon S3 storage used for logs from 46 GB to 6GB by deleting a bunch of logs we don’t need!</p>
|
||||
|
||||
<pre><code># s3cmd ls s3://cgspace.cgiar.org/log/ > /tmp/s3-logs.txt
|
||||
# grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
|
||||
# grep cocoon.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
|
||||
# grep handle-plugin.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
|
||||
# grep solr.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also, adjust the cron jobs for backups so they only backup <code>dspace.log</code> and some stats files (.dat)</li>
|
||||
<li>Try to do some metadata field migrations using the Atmire batch UI (<code>dc.Species</code> → <code>cg.species</code>) but it took several hours and even missed a few records</li>
|
||||
<li><p>Also, adjust the cron jobs for backups so they only backup <code>dspace.log</code> and some stats files (.dat)</p></li>
|
||||
|
||||
<li><p>Try to do some metadata field migrations using the Atmire batch UI (<code>dc.Species</code> → <code>cg.species</code>) but it took several hours and even missed a few records</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-04-06">2016-04-06</h2>
|
||||
|
||||
<ul>
|
||||
<li>A better way to move metadata on this scale is via SQL, for example <code>dc.type.output</code> → <code>dc.type</code> (their IDs in the metadatafieldregistry are 66 and 109, respectively):</li>
|
||||
</ul>
|
||||
<li><p>A better way to move metadata on this scale is via SQL, for example <code>dc.type.output</code> → <code>dc.type</code> (their IDs in the metadatafieldregistry are 66 and 109, respectively):</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set metadata_field_id=109 where metadata_field_id=66;
|
||||
UPDATE 40852
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After that an <code>index-discovery -bf</code> is required</li>
|
||||
<li>Start working on metadata migrations, add 25 or so new metadata fields to CGSpace</li>
|
||||
<li><p>After that an <code>index-discovery -bf</code> is required</p></li>
|
||||
|
||||
<li><p>Start working on metadata migrations, add 25 or so new metadata fields to CGSpace</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-04-07">2016-04-07</h2>
|
||||
|
||||
<ul>
|
||||
<li>Write shell script to do the migration of fields: <a href="https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b">https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b</a></li>
|
||||
<li>Testing with a few fields it seems to work well:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Testing with a few fields it seems to work well:</p>
|
||||
|
||||
<pre><code>$ ./migrate-fields.sh
|
||||
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
|
||||
@ -198,7 +196,8 @@ UPDATE metadatavalue SET metadata_field_id=202 WHERE metadata_field_id=72
|
||||
UPDATE 21420
|
||||
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
|
||||
UPDATE 51258
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
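<p>The gist isn’t reproduced here, but the core of such a script is just a loop issuing one UPDATE per field pair; a hypothetical sketch (the field ID pairs, database name, and user are examples, not the actual gist):</p>

<pre><code>#!/usr/bin/env bash
# For each "old new" metadata field ID pair, print and run the UPDATE
printf '66 109\n72 202\n' | while read -r old new; do
    echo "UPDATE metadatavalue SET metadata_field_id=$new WHERE metadata_field_id=$old"
    psql -U dspace -d dspacetest -c "UPDATE metadatavalue SET metadata_field_id=$new WHERE metadata_field_id=$old"
done
</code></pre>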
<h2 id="2016-04-08">2016-04-08</h2>
|
||||
|
||||
@ -211,23 +210,22 @@ UPDATE 51258
|
||||
|
||||
<ul>
|
||||
<li>Looking at the DOI issue <a href="https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=678507860">reported by Leroy from CIAT a few weeks ago</a></li>
|
||||
<li>It seems the <code>dx.doi.org</code> URLs are much more proper in our repository!</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It seems the <code>dx.doi.org</code> URLs are much more proper in our repository!</p>
|
||||
|
||||
<pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://dx.doi.org%';
|
||||
count
|
||||
count
|
||||
-------
|
||||
5638
|
||||
5638
|
||||
(1 row)
|
||||
|
||||
dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://doi.org%';
|
||||
count
|
||||
count
|
||||
-------
|
||||
3
|
||||
</code></pre>
|
||||
3
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I will manually edit the <code>dc.identifier.doi</code> in <a href="https://cgspace.cgiar.org/handle/10568/72509?show=full"><sup>10568</sup>⁄<sub>72509</sub></a> and tweet the link, then check back in a week to see if the donut gets updated</li>
|
||||
<li><p>I will manually edit the <code>dc.identifier.doi</code> in <a href="https://cgspace.cgiar.org/handle/10568/72509?show=full"><sup>10568</sup>⁄<sub>72509</sub></a> and tweet the link, then check back in a week to see if the donut gets updated</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-04-11">2016-04-11</h2>
|
||||
@ -240,38 +238,41 @@ dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and t
|
||||
<h2 id="2016-04-12">2016-04-12</h2>
|
||||
|
||||
<ul>
|
||||
<li>Looking at quality of WLE data (<code>cg.subject.iwmi</code>) in SQL:</li>
|
||||
</ul>
|
||||
<li><p>Looking at quality of WLE data (<code>cg.subject.iwmi</code>) in SQL:</p>
|
||||
|
||||
<pre><code>dspacetest=# select text_value, count(*) from metadatavalue where metadata_field_id=217 group by text_value order by count(*) desc;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Listings and Reports is still not returning reliable data for <code>dc.type</code></li>
|
||||
<li>I think we need to ask Atmire, as their documentation isn’t too clear on the format of the filter configs</li>
|
||||
<li>Alternatively, I want to see if I move all the data from <code>dc.type.output</code> to <code>dc.type</code> and then re-index, if it behaves better</li>
|
||||
<li>Looking at our <code>input-forms.xml</code> I see we have two sets of ILRI subjects, but one has a few extra subjects</li>
|
||||
<li>Remove one set of ILRI subjects and remove duplicate <code>VALUE CHAINS</code> from existing list (<a href="https://github.com/ilri/DSpace/pull/216">#216</a>)</li>
|
||||
<li>I decided to keep the set of subjects that had <code>FMD</code> and <code>RANGELANDS</code> added, as it appears to have been requested to have been added, and might be the newer list</li>
|
||||
<li>I found 226 blank metadatavalues:</li>
|
||||
</ul>
|
||||
<li><p>Listings and Reports is still not returning reliable data for <code>dc.type</code></p></li>
|
||||
|
||||
<li><p>I think we need to ask Atmire, as their documentation isn’t too clear on the format of the filter configs</p></li>
|
||||
|
||||
<li><p>Alternatively, I want to see if I move all the data from <code>dc.type.output</code> to <code>dc.type</code> and then re-index, if it behaves better</p></li>
|
||||
|
||||
<li><p>Looking at our <code>input-forms.xml</code> I see we have two sets of ILRI subjects, but one has a few extra subjects</p></li>
|
||||
|
||||
<li><p>Remove one set of ILRI subjects and remove duplicate <code>VALUE CHAINS</code> from existing list (<a href="https://github.com/ilri/DSpace/pull/216">#216</a>)</p></li>
|
||||
|
||||
<li><p>I decided to keep the set of subjects that had <code>FMD</code> and <code>RANGELANDS</code> added, as it appears to have been requested to have been added, and might be the newer list</p></li>
|
||||
|
||||
<li><p>I found 226 blank metadatavalues:</p>
|
||||
|
||||
<pre><code>dspacetest# select * from metadatavalue where resource_type_id=2 and text_value='';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think we should delete them and do a full re-index:</li>
|
||||
</ul>
|
||||
<li><p>I think we should delete them and do a full re-index:</p>
|
||||
|
||||
<pre><code>dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value='';
|
||||
DELETE 226
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I deleted them on CGSpace but I’ll wait to do the re-index as we’re going to be doing one in a few days for the metadata changes anyways</li>
|
||||
<li>In other news, moving the <code>dc.type.output</code> to <code>dc.type</code> and re-indexing seems to have fixed the Listings and Reports issue from above</li>
|
||||
<li>Unfortunately this isn’t a very good solution, because Listings and Reports config should allow us to filter on <code>dc.type.*</code> but the documentation isn’t very clear and I couldn’t reach Atmire today</li>
|
||||
<li>We want to do the <code>dc.type.output</code> move on CGSpace anyways, but we should wait as it might affect other external people!</li>
|
||||
<li><p>I deleted them on CGSpace but I’ll wait to do the re-index as we’re going to be doing one in a few days for the metadata changes anyways</p></li>
|
||||
|
||||
<li><p>In other news, moving the <code>dc.type.output</code> to <code>dc.type</code> and re-indexing seems to have fixed the Listings and Reports issue from above</p></li>
|
||||
|
||||
<li><p>Unfortunately this isn’t a very good solution, because Listings and Reports config should allow us to filter on <code>dc.type.*</code> but the documentation isn’t very clear and I couldn’t reach Atmire today</p></li>
|
||||
|
||||
<li><p>We want to do the <code>dc.type.output</code> move on CGSpace anyways, but we should wait as it might affect other external people!</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-04-14">2016-04-14</h2>
|
||||
@ -315,8 +316,8 @@ DELETE 226
|
||||
<li>cg.livestock.agegroup: 9 items, in ILRI collections</li>
|
||||
<li>cg.livestock.function: 20 items, mostly in EADD</li>
|
||||
</ul></li>
|
||||
<li>Test metadata migration on local instance again:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Test metadata migration on local instance again:</p>
|
||||
|
||||
<pre><code>$ ./migrate-fields.sh
|
||||
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
|
||||
@ -332,95 +333,88 @@ UPDATE 3872
|
||||
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
|
||||
UPDATE 46075
|
||||
$ JAVA_OPTS="-Xms512m -Xmx512m -Dfile.encoding=UTF-8" ~/dspace/bin/dspace index-discovery -bf
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>CGSpace was down but I’m not sure why, this was in <code>catalina.out</code>:</li>
|
||||
</ul>
|
||||
<li><p>CGSpace was down but I’m not sure why, this was in <code>catalina.out</code>:</p>
|
||||
|
||||
<pre><code>Apr 18, 2016 7:32:26 PM com.sun.jersey.spi.container.ContainerResponse logException
|
||||
SEVERE: Mapped exception to response: 500 (Internal Server Error)
|
||||
javax.ws.rs.WebApplicationException
|
||||
at org.dspace.rest.Resource.processFinally(Resource.java:163)
|
||||
at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
|
||||
at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:606)
|
||||
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
|
||||
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
|
||||
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
|
||||
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
|
||||
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
|
||||
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
|
||||
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
|
||||
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
|
||||
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
|
||||
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
|
||||
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
|
||||
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
|
||||
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
|
||||
at org.dspace.rest.Resource.processFinally(Resource.java:163)
|
||||
at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
|
||||
at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:606)
|
||||
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
|
||||
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
|
||||
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
|
||||
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
|
||||
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
|
||||
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
|
||||
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
|
||||
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
|
||||
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
|
||||
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
|
||||
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
|
||||
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
|
||||
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
|
||||
...
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)</li>
|
||||
<li>After restarting Tomcat a few more of these errors were logged but the application was up</li>
|
||||
<li><p>Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)</p></li>
|
||||
|
||||
<li><p>After restarting Tomcat a few more of these errors were logged but the application was up</p></li>
|
||||
</ul>
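<p>For the record, the quick health checks behind that statement are the usual suspects (a sketch):</p>

<pre><code># df -h
# dmesg | tail -n 20
# free -m
</code></pre>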
<h2 id="2016-04-19">2016-04-19</h2>
|
||||
|
||||
<ul>
|
||||
<li>Get handles for items that are using a given metadata field, ie <code>dc.Species.animal</code> (105):</li>
|
||||
</ul>
|
||||
<li><p>Get handles for items that are using a given metadata field, ie <code>dc.Species.animal</code> (105):</p>
|
||||
|
||||
<pre><code># select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=105);
|
||||
handle
|
||||
handle
|
||||
-------------
|
||||
10568/10298
|
||||
10568/16413
|
||||
10568/16774
|
||||
10568/34487
|
||||
</code></pre>
|
||||
10568/10298
|
||||
10568/16413
|
||||
10568/16774
|
||||
10568/34487
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Delete metadata values for <code>dc.GRP</code> and <code>dc.icsubject.icrafsubject</code>:</li>
|
||||
</ul>
|
||||
<li><p>Delete metadata values for <code>dc.GRP</code> and <code>dc.icsubject.icrafsubject</code>:</p>
|
||||
|
||||
<pre><code># delete from metadatavalue where resource_type_id=2 and metadata_field_id=96;
|
||||
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=83;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>They are old ICRAF fields and we haven’t used them since 2011 or so</li>
|
||||
<li>Also delete them from the metadata registry</li>
|
||||
<li>CGSpace went down again, <code>dspace.log</code> had this:</li>
|
||||
</ul>
|
||||
<li><p>They are old ICRAF fields and we haven’t used them since 2011 or so</p></li>
|
||||
|
||||
<li><p>Also delete them from the metadata registry</p></li>
|
||||
|
||||
<li><p>CGSpace went down again, <code>dspace.log</code> had this:</p>
|
||||
|
||||
<pre><code>2016-04-19 15:02:17,025 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
|
||||
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I restarted Tomcat and PostgreSQL and now it’s back up</li>
|
||||
<li>I bet this is the same crash as yesterday, but I only saw the errors in <code>catalina.out</code></li>
|
||||
<li>Looks to be related to this, from <code>dspace.log</code>:</li>
|
||||
</ul>
|
||||
<li><p>I restarted Tomcat and PostgreSQL and now it’s back up</p></li>
|
||||
|
||||
<li><p>I bet this is the same crash as yesterday, but I only saw the errors in <code>catalina.out</code></p></li>
|
||||
|
||||
<li><p>Looks to be related to this, from <code>dspace.log</code>:</p>
|
||||
|
||||
<pre><code>2016-04-19 15:16:34,670 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We have 18,000 of these errors right now…</li>
|
||||
<li>Delete a few more old metadata values: <code>dc.Species.animal</code>, <code>dc.type.journal</code>, and <code>dc.publicationcategory</code>:</li>
|
||||
</ul>
|
||||
<li><p>We have 18,000 of these errors right now…</p></li>
|
||||
|
||||
<li><p>Delete a few more old metadata values: <code>dc.Species.animal</code>, <code>dc.type.journal</code>, and <code>dc.publicationcategory</code>:</p>
|
||||
|
||||
<pre><code># delete from metadatavalue where resource_type_id=2 and metadata_field_id=105;
|
||||
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=85;
|
||||
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=95;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And then remove them from the metadata registry</li>
|
||||
<li><p>And then remove them from the metadata registry</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-04-20">2016-04-20</h2>
|
||||
@ -428,8 +422,8 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
|
||||
<ul>
|
||||
<li>Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server</li>
|
||||
<li>Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server</li>
|
||||
<li>Field migration went well:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Field migration went well:</p>
|
||||
|
||||
<pre><code>$ ./migrate-fields.sh
|
||||
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
|
||||
@ -444,22 +438,23 @@ UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
|
||||
UPDATE 3872
|
||||
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
|
||||
UPDATE 46075
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also, I migrated CGSpace to using the PGDG PostgreSQL repo as the infrastructure playbooks had been using it for a while and it seemed to be working well</li>
|
||||
<li>Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)</li>
|
||||
<li>Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:</li>
|
||||
</ul>
|
||||
<li><p>Also, I migrated CGSpace to using the PGDG PostgreSQL repo as the infrastructure playbooks had been using it for a while and it seemed to be working well</p></li>
|
||||
|
||||
<li><p>Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)</p></li>
|
||||
|
||||
<li><p>Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:</p>
|
||||
|
||||
<pre><code>$ grep -c "Aborting context in finally statement" dspace.log.2016-04-20
|
||||
21252
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I found a recent discussion on the DSpace mailing list and I’ve asked for advice there</li>
|
||||
<li>Looks like this issue was noted and fixed in DSpace 5.5 (we’re on 5.1): <a href="https://jira.duraspace.org/browse/DS-2936">https://jira.duraspace.org/browse/DS-2936</a></li>
|
||||
<li>I’ve sent a message to Atmire asking about compatibility with DSpace 5.5</li>
|
||||
<li><p>I found a recent discussion on the DSpace mailing list and I’ve asked for advice there</p></li>
|
||||
|
||||
<li><p>Looks like this issue was noted and fixed in DSpace 5.5 (we’re on 5.1): <a href="https://jira.duraspace.org/browse/DS-2936">https://jira.duraspace.org/browse/DS-2936</a></p></li>
|
||||
|
||||
<li><p>I’ve sent a message to Atmire asking about compatibility with DSpace 5.5</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-04-21">2016-04-21</h2>
|
||||
@ -496,8 +491,8 @@ UPDATE 46075
|
||||
<ul>
|
||||
<li>I woke up to ten or fifteen “up” and “down” emails from the monitoring website</li>
|
||||
<li>Looks like the last one was “down” from about four hours ago</li>
|
||||
<li>I think there must be something with this REST stuff:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I think there must be something with this REST stuff:</p>
|
||||
|
||||
<pre><code># grep -c "Aborting context in finally statement" dspace.log.2016-04-*
|
||||
dspace.log.2016-04-01:0
|
||||
@ -527,15 +522,19 @@ dspace.log.2016-04-24:28775
|
||||
dspace.log.2016-04-25:28626
|
||||
dspace.log.2016-04-26:28655
|
||||
dspace.log.2016-04-27:7271
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I restarted tomcat and it is back up</li>
|
||||
<li>Add Spanish XMLUI strings so those users see “CGSpace” instead of “DSpace” in the user interface (<a href="https://github.com/ilri/DSpace/pull/222">#222</a>)</li>
|
||||
<li>Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: <a href="https://jira.duraspace.org/browse/DS-3172">https://jira.duraspace.org/browse/DS-3172</a></li>
|
||||
<li>Update infrastructure playbooks for nginx 1.10.x (stable) release: <a href="https://github.com/ilri/rmg-ansible-public/issues/32">https://github.com/ilri/rmg-ansible-public/issues/32</a></li>
|
||||
<li>Currently running on DSpace Test, we’ll give it a few days before we adjust CGSpace</li>
|
||||
<li>CGSpace down, restarted tomcat and it’s back up</li>
|
||||
<li><p>I restarted tomcat and it is back up</p></li>
|
||||
|
||||
<li><p>Add Spanish XMLUI strings so those users see “CGSpace” instead of “DSpace” in the user interface (<a href="https://github.com/ilri/DSpace/pull/222">#222</a>)</p></li>
|
||||
|
||||
<li><p>Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: <a href="https://jira.duraspace.org/browse/DS-3172">https://jira.duraspace.org/browse/DS-3172</a></p></li>
|
||||
|
||||
<li><p>Update infrastructure playbooks for nginx 1.10.x (stable) release: <a href="https://github.com/ilri/rmg-ansible-public/issues/32">https://github.com/ilri/rmg-ansible-public/issues/32</a></p></li>
|
||||
|
||||
<li><p>Currently running on DSpace Test, we’ll give it a few days before we adjust CGSpace</p></li>
|
||||
|
||||
<li><p>CGSpace down, restarted tomcat and it’s back up</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-04-28">2016-04-28</h2>
|
||||
@ -548,17 +547,15 @@ dspace.log.2016-04-27:7271
|
||||
<h2 id="2016-04-30">2016-04-30</h2>
|
||||
|
||||
<ul>
|
||||
<li>Logs for today and yesterday have zero references to this REST error, so I’m going to open back up the REST API but log all requests</li>
|
||||
</ul>
|
||||
<li><p>Logs for today and yesterday have zero references to this REST error, so I’m going to open back up the REST API but log all requests</p>
|
||||
|
||||
<pre><code>location /rest {
|
||||
access_log /var/log/nginx/rest.log;
|
||||
proxy_pass http://127.0.0.1:8443;
|
||||
}
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I will check the logs again in a few days to look for patterns, see who is accessing it, etc</li>
|
||||
<li><p>I will check the logs again in a few days to look for patterns, see who is accessing it, etc</p></li>
|
||||
</ul>
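<p>Once the log accumulates, a quick way to see who is hitting the REST API hardest (a sketch, assuming the <code>access_log</code> path configured above):</p>

<pre><code># awk '{print $1}' /var/log/nginx/rest.log | sort | uniq -c | sort -rn | head
</code></pre>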
@ -11,11 +11,12 @@
|
||||
|
||||
Since yesterday there have been 10,000 REST errors and the site has been unstable again
|
||||
I have blocked access to the API now
|
||||
There are 3,000 IPs accessing the REST API in a 24-hour period!
|
||||
|
||||
There are 3,000 IPs accessing the REST API in a 24-hour period!
|
||||
|
||||
# awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
|
||||
3168
|
||||
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-05/" />
|
||||
@ -29,13 +30,14 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
|
||||
|
||||
Since yesterday there have been 10,000 REST errors and the site has been unstable again
|
||||
I have blocked access to the API now
|
||||
There are 3,000 IPs accessing the REST API in a 24-hour period!
|
||||
|
||||
There are 3,000 IPs accessing the REST API in a 24-hour period!
|
||||
|
||||
# awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
|
||||
3168
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -119,24 +121,25 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
|
||||
<ul>
|
||||
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
|
||||
<li>I have blocked access to the API now</li>
|
||||
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
|
||||
</ul>
|
||||
|
||||
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
|
||||
3168
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
<li>The two most often requesters are in Ethiopia and Colombia: 213.55.99.121 and 181.118.144.29</li>
|
||||
<li>100% of the requests coming from Ethiopia are like this and result in an HTTP 500:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>100% of the requests coming from Ethiopia are like this and result in an HTTP 500:</p>
|
||||
|
||||
<pre><code>GET /rest/handle/10568/NaN?expand=parentCommunityList,metadata HTTP/1.1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>For now I’ll block just the Ethiopian IP</li>
|
||||
<li>The owner of that application has said that the <code>NaN</code> (not a number) is an error in his code and he’ll fix it</li>
|
||||
<li><p>For now I’ll block just the Ethiopian IP</p></li>
|
||||
|
||||
<li><p>The owner of that application has said that the <code>NaN</code> (not a number) is an error in his code and he’ll fix it</p></li>
|
||||
</ul>
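<p>Blocking a single IP for the REST API is a one-line change in nginx; a minimal sketch of the rule, reusing the <code>/rest</code> location from above (the exact placement in our config is not shown here):</p>

<pre><code>location /rest {
    deny 213.55.99.121;
    access_log /var/log/nginx/rest.log;
    proxy_pass http://127.0.0.1:8443;
}
</code></pre>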
<h2 id="2016-05-03">2016-05-03</h2>
|
||||
@ -156,8 +159,8 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
|
||||
<li>Hmm, also disk space is full</li>
|
||||
<li>I decided to blow away the solr indexes, since they are 50GB and we don’t really need all the Atmire stuff there right now</li>
|
||||
<li>I will re-generate the Discovery indexes after re-deploying</li>
|
||||
<li>Testing <code>renew-letsencrypt.sh</code> script for nginx</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Testing <code>renew-letsencrypt.sh</code> script for nginx</p>
|
||||
|
||||
<pre><code>#!/usr/bin/env bash
|
||||
|
||||
@ -174,16 +177,15 @@ LE_RESULT=$?
|
||||
$SERVICE_BIN nginx start
|
||||
|
||||
if [[ "$LE_RESULT" != 0 ]]; then
|
||||
echo 'Automated renewal failed:'
|
||||
echo 'Automated renewal failed:'
|
||||
|
||||
cat /var/log/letsencrypt/renew.log
|
||||
cat /var/log/letsencrypt/renew.log
|
||||
|
||||
exit 1
|
||||
exit 1
|
||||
fi
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Seems to work well</li>
|
||||
<li><p>Seems to work well</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-05-10">2016-05-10</h2>
|
||||
@ -221,17 +223,18 @@ fi
|
||||
|
||||
<li><p>There were a handful of conflicts that I didn’t understand</p></li>
|
||||
|
||||
<li><p>After completing the rebase I tried to build with the module versions Atmire had indicated as being 5.5 ready but I got this error:</p></li>
|
||||
</ul>
|
||||
<li><p>After completing the rebase I tried to build with the module versions Atmire had indicated as being 5.5 ready but I got this error:</p>
|
||||
|
||||
<pre><code>[ERROR] Failed to execute goal on project additions: Could not resolve dependencies for project org.dspace.modules:additions:jar:5.5: Could not find artifact com.atmire:atmire-metadata-quality-api:jar:5.5-2.10.1-0 in sonatype-releases (https://oss.sonatype.org/content/repositories/releases/) -> [Help 1]
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I’ve sent them a question about it</li>
|
||||
<li>A user mentioned having problems with uploading a 33 MB PDF</li>
|
||||
<li>I told her I would increase the limit temporarily tomorrow morning</li>
|
||||
<li>Turns out she was able to decrease the size of the PDF so we didn’t have to do anything</li>
|
||||
<li><p>I’ve sent them a question about it</p></li>
|
||||
|
||||
<li><p>A user mentioned having problems with uploading a 33 MB PDF</p></li>
|
||||
|
||||
<li><p>I told her I would increase the limit temporarily tomorrow morning</p></li>
|
||||
|
||||
<li><p>Turns out she was able to decrease the size of the PDF so we didn’t have to do anything</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-05-12">2016-05-12</h2>
|
||||
@ -252,11 +255,12 @@ fi
|
||||
<li>Our <code>dc.contributor.affiliation</code> and <code>dc.contributor.corporate</code> could both map to <code>dc.contributor</code> and possibly <code>dc.contributor.center</code> depending on if it’s a CG center or not</li>
|
||||
<li><code>dc.title.jtitle</code> could either map to <code>dc.publisher</code> or <code>dc.source</code> depending on how you read things</li>
|
||||
</ul></li>
|
||||
<li>Found ~200 messed up CIAT values in <code>dc.publisher</code>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Found ~200 messed up CIAT values in <code>dc.publisher</code>:</p>
|
||||
|
||||
<pre><code># select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=39 and text_value similar to "% %";
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
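<p>As transcribed above, the pattern is wrapped in double quotes, which PostgreSQL parses as an identifier rather than a string literal; the working form of that query would presumably use single quotes:</p>

<pre><code># select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=39 and text_value similar to '% %';
</code></pre>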
<h2 id="2016-05-13">2016-05-13</h2>
|
||||
|
||||
@ -277,65 +281,65 @@ fi
|
||||
<ul>
|
||||
<li>Work on 707 CCAFS records</li>
|
||||
<li>They have thumbnails on Flickr and elsewhere</li>
|
||||
<li>In OpenRefine I created a new <code>filename</code> column based on the <code>thumbnail</code> column with the following GREL:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>In OpenRefine I created a new <code>filename</code> column based on the <code>thumbnail</code> column with the following GREL:</p>
|
||||
|
||||
<pre><code>if(cells['thumbnails'].value.contains('hqdefault'), cells['thumbnails'].value.split('/')[-2] + '.jpg', cells['thumbnails'].value.split('/')[-1])
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Because ~400 records had the same filename on Flickr (hqdefault.jpg) but different UUIDs in the URL</li>
|
||||
<li>So for the <code>hqdefault.jpg</code> ones I just take the UUID (-2) and use it as the filename</li>
|
||||
<li>Before importing with SAFBuilder I tested adding “__bundle:THUMBNAIL” to the <code>filename</code> column and it works fine</li>
|
||||
<li><p>Because ~400 records had the same filename on Flickr (hqdefault.jpg) but different UUIDs in the URL</p></li>
|
||||
|
||||
<li><p>So for the <code>hqdefault.jpg</code> ones I just take the UUID (-2) and use it as the filename</p></li>
|
||||
|
||||
<li><p>Before importing with SAFBuilder I tested adding “__bundle:THUMBNAIL” to the <code>filename</code> column and it works fine</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-05-19">2016-05-19</h2>
|
||||
|
||||
<ul>
|
||||
<li>More quality control on <code>filename</code> field of CCAFS records to make processing in shell and SAFBuilder more reliable:</li>
|
||||
</ul>
|
||||
<li><p>More quality control on <code>filename</code> field of CCAFS records to make processing in shell and SAFBuilder more reliable:</p>
|
||||
|
||||
<pre><code>value.replace('_','').replace('-','')
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We need to hold off on moving <code>dc.Species</code> to <code>cg.species</code> because it is only used for plants, and might be better to move it to something like <code>cg.species.plant</code></li>
|
||||
<li>And <code>dc.identifier.fund</code> is MOSTLY used for CPWF project identifier but has some other sponsorship things
|
||||
<li><p>We need to hold off on moving <code>dc.Species</code> to <code>cg.species</code> because it is only used for plants, and might be better to move it to something like <code>cg.species.plant</code></p></li>
|
||||
|
||||
<li><p>And <code>dc.identifier.fund</code> is MOSTLY used for CPWF project identifier but has some other sponsorship things</p>
|
||||
|
||||
<ul>
|
||||
<li>We should move PN<em>, SG</em>, CBA, IA, and PHASE* values to <code>cg.identifier.cpwfproject</code></li>
|
||||
<li>The rest, like BMGF and USAID etc, might have to go to either <code>dc.description.sponsorship</code> or <code>cg.identifier.fund</code> (not sure yet)</li>
|
||||
<li>There are also some mistakes in CPWF’s things, like “PN 47”</li>
|
||||
<li>This ought to catch all the CPWF values (there don’t appear to be any SG* values):</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
<li><p>This ought to catch all the CPWF values (there don’t appear to be any SG* values; see the move sketch after this list):</p>
|
||||
|
||||
<pre><code># select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
</ul>
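<p>Once a <code>cg.identifier.cpwfproject</code> field actually exists, the same WHERE clause should drive the move itself; a sketch only, with <code>999</code> standing in for the not-yet-known field ID:</p>

<pre><code>dspacetest=# -- 999 is a placeholder for the future cg.identifier.cpwfproject field ID
dspacetest=# update metadatavalue set metadata_field_id=999 where resource_type_id=2 and metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
</code></pre>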
<h2 id="2016-05-20">2016-05-20</h2>
|
||||
|
||||
<ul>
|
||||
<li>More work on CCAFS Video and Images records</li>
|
||||
<li>For SAFBuilder we need to modify filename column to have the thumbnail bundle:
|
||||
<br /></li>
|
||||
</ul>
|
||||
|
||||
<li><p>For SAFBuilder we need to modify filename column to have the thumbnail bundle:</p>
|
||||
|
||||
<pre><code>value + "__bundle:THUMBNAIL"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also, I fixed some weird characters using OpenRefine’s transform with the following GREL:</li>
|
||||
</ul>
|
||||
<li><p>Also, I fixed some weird characters using OpenRefine’s transform with the following GREL:</p>
|
||||
|
||||
<pre><code>value.replace(/\u0081/,'')
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Write shell script to resize thumbnails with height larger than 400: <a href="https://gist.github.com/alanorth/131401dcd39d00e0ce12e1be3ed13256">https://gist.github.com/alanorth/131401dcd39d00e0ce12e1be3ed13256</a></li>
|
||||
<li>Upload 707 CCAFS records to DSpace Test</li>
|
||||
<li>A few miscellaneous fixes for XMLUI display niggles (spaces in item lists and link target <code>_black</code>): <a href="https://github.com/ilri/DSpace/pull/224">#224</a></li>
|
||||
<li>Work on configuration changes for Phase 2 metadata migrations</li>
|
||||
<li><p>Write shell script to resize thumbnails with height larger than 400: <a href="https://gist.github.com/alanorth/131401dcd39d00e0ce12e1be3ed13256">https://gist.github.com/alanorth/131401dcd39d00e0ce12e1be3ed13256</a></p></li>
|
||||
|
||||
<li><p>Upload 707 CCAFS records to DSpace Test</p></li>
|
||||
|
||||
<li><p>A few miscellaneous fixes for XMLUI display niggles (spaces in item lists and link target <code>_black</code>): <a href="https://github.com/ilri/DSpace/pull/224">#224</a></p></li>
|
||||
|
||||
<li><p>Work on configuration changes for Phase 2 metadata migrations</p></li>
|
||||
</ul>
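<p>The resize gist isn’t reproduced here, but the core of it is an ImageMagick one-liner; a rough sketch (not the actual script), assuming the thumbnails are JPEGs in the current directory:</p>

<pre><code>$ for f in *.jpg; do
    height=$(identify -format '%h' "$f")
    [ "$height" -gt 400 ] && mogrify -resize x400 "$f"
done
</code></pre>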
<h2 id="2016-05-23">2016-05-23</h2>
|
||||
@ -350,46 +354,44 @@ fi
|
||||
<h2 id="2016-05-30">2016-05-30</h2>
|
||||
|
||||
<ul>
|
||||
<li>Export CCAFS video and image records from DSpace Test using the migrate option (<code>-m</code>):</li>
|
||||
</ul>
|
||||
<li><p>Export CCAFS video and image records from DSpace Test using the migrate option (<code>-m</code>):</p>
|
||||
|
||||
<pre><code>$ mkdir ~/ccafs-images
|
||||
$ /home/dspacetest.cgiar.org/bin/dspace export -t COLLECTION -i 10568/79355 -d ~/ccafs-images -n 0 -m
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And then import to CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>And then import to CGSpace:</p>
|
||||
|
||||
<pre><code>$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" /home/cgspace.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/70974 --source /tmp/ccafs-images --mapfile=/tmp/ccafs-images-may30.map &> /tmp/ccafs-images-may30.log
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But now we have double authors for “CGIAR Research Program on Climate Change, Agriculture and Food Security” in the authority</li>
|
||||
<li>I’m trying to do a Discovery index before messing with the authority index</li>
|
||||
<li>Looks like we are missing the <code>index-authority</code> cron job, so who knows what’s up with our authority index</li>
|
||||
<li>Run system updates on DSpace Test, re-deploy code, and reboot the server</li>
|
||||
<li>Clean up and import ~200 CTA records to CGSpace via CSV like:</li>
|
||||
</ul>
|
||||
<li><p>But now we have double authors for “CGIAR Research Program on Climate Change, Agriculture and Food Security” in the authority</p></li>
|
||||
|
||||
<li><p>I’m trying to do a Discovery index before messing with the authority index</p></li>
|
||||
|
||||
<li><p>Looks like we are missing the <code>index-authority</code> cron job, so who knows what’s up with our authority index</p></li>
|
||||
|
||||
<li><p>Run system updates on DSpace Test, re-deploy code, and reboot the server</p></li>
|
||||
|
||||
<li><p>Clean up and import ~200 CTA records to CGSpace via CSV like:</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8"
|
||||
$ /home/cgspace.cgiar.org/bin/dspace metadata-import -e aorth@mjanja.ch -f ~/CTA-May30/CTA-42229.csv &> ~/CTA-May30/CTA-42229.log
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Discovery indexing took a few hours for some reason, and after that I started the <code>index-authority</code> script</li>
|
||||
</ul>
|
||||
<li><p>Discovery indexing took a few hours for some reason, and after that I started the <code>index-authority</code> script</p>
|
||||
|
||||
<pre><code>$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" /home/cgspace.cgiar.org/bin/dspace index-authority
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-05-31">2016-05-31</h2>
|
||||
|
||||
<ul>
|
||||
<li>The <code>index-authority</code> script ran over night and was finished in the morning</li>
|
||||
<li>Hopefully this was because we haven’t been running it regularly and it will speed up next time</li>
|
||||
<li>I am running it again with a timer to see:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I am running it again with a timer to see:</p>
|
||||
|
||||
<pre><code>$ time /home/cgspace.cgiar.org/bin/dspace index-authority
|
||||
Retrieving all data
|
||||
@ -401,14 +403,17 @@ All done !
|
||||
real 37m26.538s
|
||||
user 2m24.627s
|
||||
sys 0m20.540s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Update <code>tomcat7</code> crontab on CGSpace and DSpace Test to have the <code>index-authority</code> script that we were missing</li>
|
||||
<li>Add new ILRI subject and CCAFS project tags to <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/226">#226</a>, <a href="https://github.com/ilri/DSpace/pull/225">#225</a>)</li>
|
||||
<li>Manually mapped the authors of a few old CCAFS records to the new CCAFS authority UUID and re-indexed authority indexes to see if it helps correct those items.</li>
|
||||
<li>Re-sync DSpace Test data with CGSpace</li>
|
||||
<li>Clean up and import ~65 more CTA items into CGSpace</li>
|
||||
<li><p>Update <code>tomcat7</code> crontab on CGSpace and DSpace Test to have the <code>index-authority</code> script that we were missing</p></li>
|
||||
|
||||
<li><p>Add new ILRI subject and CCAFS project tags to <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/226">#226</a>, <a href="https://github.com/ilri/DSpace/pull/225">#225</a>)</p></li>
|
||||
|
||||
<li><p>Manually mapped the authors of a few old CCAFS records to the new CCAFS authority UUID and re-indexed authority indexes to see if it helps correct those items.</p></li>
|
||||
|
||||
<li><p>Re-sync DSpace Test data with CGSpace</p></li>
|
||||
|
||||
<li><p>Clean up and import ~65 more CTA items into CGSpace</p></li>
|
||||
</ul>
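<p>The crontab entry itself is nothing special; a hypothetical example of the kind of line added for the <code>tomcat7</code> user (the schedule shown is an assumption):</p>

<pre><code>0 1 * * * JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" /home/cgspace.cgiar.org/bin/dspace index-authority
</code></pre>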
@ -33,7 +33,7 @@ This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRec
|
||||
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
|
||||
Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -137,91 +137,88 @@ UPDATE 14
|
||||
|
||||
<ul>
|
||||
<li>Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with <code>cg.coverage.admin-unit</code></li>
|
||||
<li>Seems that the Browse configuration in <code>dspace.cfg</code> can’t handle the ‘-’ in the field name:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Seems that the Browse configuration in <code>dspace.cfg</code> can’t handle the ‘-’ in the field name:</p>
|
||||
|
||||
<pre><code>webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error</li>
|
||||
<li>I’ve sent a message to the DSpace mailing list to ask about the Browse index definition</li>
|
||||
<li>A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue</li>
|
||||
<li>I found a thread on the mailing list talking about it and there is bug report and a patch: <a href="https://jira.duraspace.org/browse/DS-2740">https://jira.duraspace.org/browse/DS-2740</a></li>
|
||||
<li>The patch applies successfully on DSpace 5.1 so I will try it later</li>
|
||||
<li><p>But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error</p></li>
|
||||
|
||||
<li><p>I’ve sent a message to the DSpace mailing list to ask about the Browse index definition</p></li>
|
||||
|
||||
<li><p>A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue</p></li>
|
||||
|
||||
<li><p>I found a thread on the mailing list talking about it and there is bug report and a patch: <a href="https://jira.duraspace.org/browse/DS-2740">https://jira.duraspace.org/browse/DS-2740</a></p></li>
|
||||
|
||||
<li><p>The patch applies successfully on DSpace 5.1 so I will try it later</p></li>
|
||||
</ul>
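<p>Testing that patch against our <code>5_x-prod</code> branch is just a dry run followed by the real apply; a minimal sketch (the patch filename is hypothetical):</p>

<pre><code>$ git checkout 5_x-prod
$ git apply --check DS-2740.patch
$ git apply DS-2740.patch
</code></pre>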
<h2 id="2016-06-03">2016-06-03</h2>
|
||||
|
||||
<ul>
|
||||
<li>Investigating the CCAFS authority issue, I exported the metadata for the Videos collection</li>
|
||||
<li>The top two authors are:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The top two authors are:</p>
|
||||
|
||||
<pre><code>CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500
|
||||
CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So the only difference is the “confidence”</li>
|
||||
<li>Ok, well THAT is interesting:</li>
|
||||
</ul>
|
||||
<li><p>So the only difference is the “confidence”</p></li>
|
||||
|
||||
<li><p>Ok, well THAT is interesting:</p>
|
||||
|
||||
<pre><code>dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %';
|
||||
text_value | authority | confidence
|
||||
text_value | authority | confidence
|
||||
------------+--------------------------------------+------------
|
||||
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
|
||||
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
|
||||
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
|
||||
Orth, Alan | | -1
|
||||
Orth, Alan | | -1
|
||||
Orth, Alan | | -1
|
||||
Orth, Alan | | -1
|
||||
Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1
|
||||
Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1
|
||||
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
|
||||
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
|
||||
Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600
|
||||
Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600
|
||||
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
|
||||
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
|
||||
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
|
||||
Orth, Alan | | -1
|
||||
Orth, Alan | | -1
|
||||
Orth, Alan | | -1
|
||||
Orth, Alan | | -1
|
||||
Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1
|
||||
Orth, A. | 05c2c622-d252-4efb-b9ed-95a07d3adf11 | -1
|
||||
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
|
||||
Orth, A. | ab606e3a-2b04-4c7d-9423-14beccf54257 | -1
|
||||
Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600
|
||||
Orth, Alan | ad281dbf-ef81-4007-96c3-a7f5d2eaa6d9 | 600
|
||||
(13 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And now an actually relevant example:</li>
|
||||
</ul>
|
||||
<li><p>And now an actually relevant example:</p>
|
||||
|
||||
<pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence = 500;
|
||||
count
|
||||
count
|
||||
-------
|
||||
707
|
||||
707
|
||||
(1 row)
|
||||
|
||||
dspacetest=# select count(*) from metadatavalue where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security' and confidence != 500;
|
||||
count
|
||||
count
|
||||
-------
|
||||
253
|
||||
253
|
||||
(1 row)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Trying something experimental:</li>
|
||||
</ul>
|
||||
<li><p>Trying something experimental:</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set confidence=500 where metadata_field_id=3 and text_value like 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
|
||||
UPDATE 960
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And then re-indexing authority and Discovery…?</li>
|
||||
<li>After Discovery reindex the CCAFS authors are all together in the Authors sidebar facet</li>
|
||||
<li>The docs for the ORCiD and Authority stuff for DSpace 5 mention changing the browse indexes to use the Authority as well:</li>
|
||||
</ul>
|
||||
<li><p>And then re-indexing authority and Discovery…?</p></li>
|
||||
|
||||
<li><p>After Discovery reindex the CCAFS authors are all together in the Authors sidebar facet</p></li>
|
||||
|
||||
<li><p>The docs for the ORCiD and Authority stuff for DSpace 5 mention changing the browse indexes to use the Authority as well:</p>
|
||||
|
||||
<pre><code>webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>That would only be for the “Browse by” function… so we’ll have to see what effect that has later</li>
|
||||
<li><p>That would only be for the “Browse by” function… so we’ll have to see what effect that has later</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-06-04">2016-06-04</h2>
|
||||
@ -235,13 +232,11 @@ UPDATE 960
|
||||
<h2 id="2016-06-07">2016-06-07</h2>
|
||||
|
||||
<ul>
|
||||
<li>Figured out how to export a list of the unique values from a metadata field ordered by count:</li>
|
||||
</ul>
|
||||
<li><p>Figured out how to export a list of the unique values from a metadata field ordered by count:</p>
|
||||
|
||||
<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><p>Identified the next round of fields to migrate:</p>
|
||||
|
||||
<ul>
|
||||
@ -266,17 +261,19 @@ UPDATE 960
|
||||
<ul>
|
||||
<li>Discuss controlled vocabularies for ~28 fields</li>
|
||||
<li>Looks like this is all we need: <a href="https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies">https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies</a></li>
|
||||
<li>I wrote an XPath expression to extract the ILRI subjects from <code>input-forms.xml</code> (uses xmlstartlet):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I wrote an XPath expression to extract the ILRI subjects from <code>input-forms.xml</code> (uses xmlstartlet):</p>
|
||||
|
||||
<pre><code>$ xml sel -t -m '//value-pairs[@value-pairs-name="ilrisubject"]/pair/displayed-value/text()' -c '.' -n dspace/config/input-forms.xml
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Write to Atmire about the use of <code>atmire.orcid.id</code> to see if we can change it</li>
|
||||
<li>Seems to be a virtual field that is queried from the authority cache… hmm</li>
|
||||
<li>In other news, I found out that the About page that we haven’t been using lives in <code>dspace/config/about.xml</code>, so now we can update the text</li>
|
||||
<li>File bug about <code>closed="true"</code> attribute of controlled vocabularies not working: <a href="https://jira.duraspace.org/browse/DS-3238">https://jira.duraspace.org/browse/DS-3238</a></li>
|
||||
<li><p>Write to Atmire about the use of <code>atmire.orcid.id</code> to see if we can change it</p></li>
|
||||
|
||||
<li><p>Seems to be a virtual field that is queried from the authority cache… hmm</p></li>
|
||||
|
||||
<li><p>In other news, I found out that the About page that we haven’t been using lives in <code>dspace/config/about.xml</code>, so now we can update the text</p></li>
|
||||
|
||||
<li><p>File bug about <code>closed="true"</code> attribute of controlled vocabularies not working: <a href="https://jira.duraspace.org/browse/DS-3238">https://jira.duraspace.org/browse/DS-3238</a></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-06-09">2016-06-09</h2>
|
||||
@ -292,24 +289,30 @@ UPDATE 960
|
||||
<ul>
|
||||
<li>Investigating authority confidences</li>
|
||||
<li>It looks like the values are documented in <code>Choices.java</code></li>
|
||||
<li>Experiment with setting all 960 CCAFS author values to be 500:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Experiment with setting all 960 CCAFS author values to be 500:</p>
|
||||
|
||||
<pre><code>dspacetest=# SELECT authority, confidence FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value = 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
|
||||
|
||||
dspacetest=# UPDATE metadatavalue set confidence = 500 where resource_type_id=2 AND metadata_field_id=3 AND text_value = 'CGIAR Research Program on Climate Change, Agriculture and Food Security';
|
||||
UPDATE 960
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After the database edit, I did a full Discovery re-index</li>
|
||||
<li>And now there are exactly 960 items in the authors facet for ‘CGIAR Research Program on Climate Change, Agriculture and Food Security’</li>
|
||||
<li>Now I ran the same on CGSpace</li>
|
||||
<li>Merge controlled vocabulary functionality for animal breeds to <code>5_x-prod</code> (<a href="https://github.com/ilri/DSpace/pull/236">#236</a>)</li>
|
||||
<li>Write python script to update metadata values in batch via PostgreSQL: <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a></li>
|
||||
<li>We need to use this to correct some pretty ugly values in fields like <code>dc.description.sponsorship</code></li>
|
||||
<li>Merge item display tweaks from earlier this week (<a href="https://github.com/ilri/DSpace/pull/231">#231</a>)</li>
|
||||
<li>Merge controlled vocabulary functionality for subregions (<a href="https://github.com/ilri/DSpace/pull/238">#238</a>)</li>
|
||||
<li><p>After the database edit, I did a full Discovery re-index</p></li>
|
||||
|
||||
<li><p>And now there are exactly 960 items in the authors facet for ‘CGIAR Research Program on Climate Change, Agriculture and Food Security’</p></li>
|
||||
|
||||
<li><p>Now I ran the same on CGSpace</p></li>
|
||||
|
||||
<li><p>Merge controlled vocabulary functionality for animal breeds to <code>5_x-prod</code> (<a href="https://github.com/ilri/DSpace/pull/236">#236</a>)</p></li>
|
||||
|
||||
<li><p>Write python script to update metadata values in batch via PostgreSQL: <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a></p></li>
|
||||
|
||||
<li><p>We need to use this to correct some pretty ugly values in fields like <code>dc.description.sponsorship</code></p></li>
|
||||
|
||||
<li><p>Merge item display tweaks from earlier this week (<a href="https://github.com/ilri/DSpace/pull/231">#231</a>)</p></li>
|
||||
|
||||
<li><p>Merge controlled vocabulary functionality for subregions (<a href="https://github.com/ilri/DSpace/pull/238">#238</a>)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-06-11">2016-06-11</h2>
|
||||
@ -355,35 +358,33 @@ UPDATE 960
|
||||
<h2 id="2016-06-20">2016-06-20</h2>
|
||||
|
||||
<ul>
|
||||
<li>CGSpace’s HTTPS certificate expired last night and I didn’t notice, had to renew:</li>
|
||||
</ul>
|
||||
<li><p>CGSpace’s HTTPS certificate expired last night and I didn’t notice, had to renew:</p>
|
||||
|
||||
<pre><code># /opt/letsencrypt/letsencrypt-auto renew --standalone --pre-hook "/usr/bin/service nginx stop" --post-hook "/usr/bin/service nginx start"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I really need to fix that cron job…</li>
|
||||
<li><p>I really need to fix that cron job…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-06-24">2016-06-24</h2>
|
||||
|
||||
<ul>
|
||||
<li>Run the replacements/deletes for <code>dc.description.sponsorship</code> (investors) on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>Run the replacements/deletes for <code>dc.description.sponsorship</code> (investors) on CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i investors-not-blank-not-delete-85.csv -f dc.description.sponsorship -t 'correct investor' -m 29 -d cgspace -p 'fuuu' -u cgspace
|
||||
$ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.sponsorship -m 29 -d cgspace -p 'fuuu' -u cgspace
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The scripts for this are here:
|
||||
<li><p>The scripts for this are here:</p>
|
||||
|
||||
<ul>
|
||||
<li><a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a></li>
|
||||
<li><a href="https://gist.github.com/alanorth/bd7d58c947f686401a2b1fadc78736be">delete-metadata-values.py</a></li>
|
||||
</ul></li>
|
||||
<li>Add new sponsors to controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/244">#244</a>)</li>
|
||||
<li>Refine submission form labels and hints</li>
|
||||
|
||||
<li><p>Add new sponsors to controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/244">#244</a>)</p></li>
|
||||
|
||||
<li><p>Refine submission form labels and hints</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-06-28">2016-06-28</h2>
|
||||
@ -391,21 +392,19 @@ $ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.spons
|
||||
<ul>
|
||||
<li>Testing the cleanup of <code>dc.contributor.corporate</code> with 13 deletions and 121 replacements</li>
|
||||
<li>There are still ~97 fields that weren’t indicated to do anything</li>
|
||||
<li>After the above deletions and replacements I regenerated a CSV and sent it to Peter <em>et al</em> to have a look</li>
|
||||
</ul>
|
||||
|
||||
<li><p>After the above deletions and replacements I regenerated a CSV and sent it to Peter <em>et al</em> to have a look</p>
|
||||
|
||||
<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=126 group by text_value order by count desc) to /tmp/contributors-june28.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Re-evaluate <code>dc.contributor.corporate</code> and it seems we will move it to <code>dc.contributor.author</code> as this is more in line with how editors are actually using it</li>
|
||||
<li><p>Re-evaluate <code>dc.contributor.corporate</code> and it seems we will move it to <code>dc.contributor.author</code> as this is more in line with how editors are actually using it</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-06-29">2016-06-29</h2>
|
||||
|
||||
<ul>
|
||||
<li>Test run of <code>migrate-fields.sh</code> with the following re-mappings:</li>
|
||||
</ul>
|
||||
<li><p>Test run of <code>migrate-fields.sh</code> with the following re-mappings:</p>
|
||||
|
||||
<pre><code>72 55 #dc.source
|
||||
86 230 #cg.contributor.crp
|
||||
@ -417,20 +416,18 @@ $ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.spons
|
||||
74 220 #cg.identifier.doi
|
||||
79 222 #cg.identifier.googleurl
|
||||
89 223 #cg.identifier.dataurl
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run all cleanups and deletions of <code>dc.contributor.corporate</code> on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>Run all cleanups and deletions of <code>dc.contributor.corporate</code> on CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i Corporate-Authors-Fix-121.csv -f dc.contributor.corporate -t 'Correct style' -m 126 -d cgspace -u cgspace -p 'fuuu'
|
||||
$ ./fix-metadata-values.py -i Corporate-Authors-Fix-PB.csv -f dc.contributor.corporate -t 'should be' -m 126 -d cgspace -u cgspace -p 'fuuu'
|
||||
$ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-Delete-13.csv -m 126 -u cgspace -d cgspace -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Re-deploy CGSpace and DSpace Test with latest June changes</li>
|
||||
<li>Now the sharing and Altmetric bits are more prominent:</li>
|
||||
<li><p>Re-deploy CGSpace and DSpace Test with latest June changes</p></li>
|
||||
|
||||
<li><p>Now the sharing and Altmetric bits are more prominent:</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2016/06/xmlui-altmetric-sharing.png" alt="DSpace 5.1 XMLUI With Altmetric Badge" /></p>
|
||||
@ -443,18 +440,16 @@ $ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-D
|
||||
<h2 id="2016-06-30">2016-06-30</h2>
|
||||
|
||||
<ul>
|
||||
<li>Wow, there are 95 authors in the database who have ‘,’ at the end of their name:</li>
|
||||
</ul>
|
||||
<li><p>Wow, there are 95 authors in the database who have ‘,’ at the end of their name:</p>
|
||||
|
||||
<pre><code># select text_value from metadatavalue where metadata_field_id=3 and text_value like '%,';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We need to use something like this to fix them, need to write a proper regex later:</li>
|
||||
</ul>
|
||||
<li><p>We need to use something like this to fix them, need to write a proper regex later:</p>
|
||||
|
||||
<pre><code># update metadatavalue set text_value = regexp_replace(text_value, '(Poole, J),', '\1') where metadata_field_id=3 and text_value = 'Poole, J,';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
||||
|
@ -10,18 +10,17 @@
|
||||
|
||||
|
||||
Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
|
||||
I think this query should find and replace all authors that have “,” at the end of their names:
|
||||
|
||||
I think this query should find and replace all authors that have “,” at the end of their names:
|
||||
|
||||
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
UPDATE 95
|
||||
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
text_value
|
||||
text_value
|
||||
------------
|
||||
(0 rows)
|
||||
|
||||
|
||||
|
||||
In this case the select query was showing 95 results before the update
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
@ -35,21 +34,20 @@ In this case the select query was showing 95 results before the update
|
||||
|
||||
|
||||
Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
|
||||
I think this query should find and replace all authors that have “,” at the end of their names:
|
||||
|
||||
I think this query should find and replace all authors that have “,” at the end of their names:
|
||||
|
||||
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
UPDATE 95
|
||||
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
text_value
|
||||
text_value
|
||||
------------
|
||||
(0 rows)
|
||||
|
||||
|
||||
|
||||
In this case the select query was showing 95 results before the update
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -132,19 +130,18 @@ In this case the select query was showing 95 results before the update
|
||||
|
||||
<ul>
|
||||
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
|
||||
<li>I think this query should find and replace all authors that have “,” at the end of their names:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I think this query should find and replace all authors that have “,” at the end of their names:</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
UPDATE 95
|
||||
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
text_value
|
||||
text_value
|
||||
------------
|
||||
(0 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In this case the select query was showing 95 results before the update</li>
|
||||
<li><p>In this case the select query was showing 95 results before the update</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-07-02">2016-07-02</h2>
|
||||
@ -164,31 +161,31 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
|
||||
<ul>
|
||||
<li>Amend <code>backup-solr.sh</code> script so it backs up the entire Solr folder</li>
|
||||
<li>We <em>really</em> only need <code>statistics</code> and <code>authority</code> but meh</li>
|
||||
<li>Fix metadata for species on DSpace Test:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Fix metadata for species on DSpace Test:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 94 -d dspacetest -u dspacetest -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Will run later on CGSpace</li>
|
||||
<li>A user is still having problems with Sherpa/Romeo causing crashes during the submission process when the journal is “ungraded”</li>
|
||||
<li>I tested the <a href="https://jira.duraspace.org/browse/DS-2740">patch for DS-2740</a> that I had found last month and it seems to work</li>
|
||||
<li>I will merge it to <code>5_x-prod</code></li>
|
||||
<li><p>Will run later on CGSpace</p></li>
|
||||
|
||||
<li><p>A user is still having problems with Sherpa/Romeo causing crashes during the submission process when the journal is “ungraded”</p></li>
|
||||
|
||||
<li><p>I tested the <a href="https://jira.duraspace.org/browse/DS-2740">patch for DS-2740</a> that I had found last month and it seems to work</p></li>
|
||||
|
||||
<li><p>I will merge it to <code>5_x-prod</code></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-07-06">2016-07-06</h2>
|
||||
|
||||
<ul>
|
||||
<li>Delete 23 blank metadata values from CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>Delete 23 blank metadata values from CGSpace:</p>
|
||||
|
||||
<pre><code>cgspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
|
||||
DELETE 23
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Complete phase three of metadata migration, for the following fields:
|
||||
<li><p>Complete phase three of metadata migration, for the following fields:</p>
|
||||
|
||||
<ul>
|
||||
<li>dc.title.jtitle → dc.source</li>
|
||||
@ -202,27 +199,26 @@ DELETE 23
|
||||
<li>dc.identifier.googleurl → cg.identifier.googleurl</li>
|
||||
<li>dc.identifier.dataurl → cg.identifier.dataurl</li>
|
||||
</ul></li>
|
||||
<li>Also, run fixes and deletes for species and author affiliations (over 1000 corrections!)</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Also, run fixes and deletes for species and author affiliations (over 1000 corrections!)</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 212 -d dspace -u dspace -p 'fuuu'
|
||||
$ ./fix-metadata-values.py -i Affiliations-Fix-1045-Peter-Abenet.csv -f dc.contributor.affiliation -t Correct -m 211 -d dspace -u dspace -p 'fuuu'
|
||||
$ ./delete-metadata-values.py -f dc.contributor.affiliation -i Affiliations-Delete-Peter-Abenet.csv -m 211 -u dspace -d dspace -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I then ran all server updates and rebooted the server</li>
|
||||
<li><p>I then ran all server updates and rebooted the server</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-07-11">2016-07-11</h2>
|
||||
|
||||
<ul>
|
||||
<li>Doing some author cleanups from Peter and Abenet:</li>
|
||||
</ul>
|
||||
<li><p>Doing some author cleanups from Peter and Abenet:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/Authors-Fix-205-UTF8.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
|
||||
$ ./delete-metadata-values.py -f dc.contributor.author -i /tmp/Authors-Delete-UTF8.csv -m 3 -u dspacetest -d dspacetest -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-07-13">2016-07-13</h2>
|
||||
|
||||
@ -242,36 +238,33 @@ $ ./delete-metadata-values.py -f dc.contributor.author -i /tmp/Authors-Delete-UT
|
||||
<ul>
|
||||
<li>Adjust identifiers in XMLUI item display to be more prominent</li>
|
||||
<li>Add species and breed to the XMLUI item display</li>
|
||||
<li>CGSpace crashed late at night and the DSpace logs were showing:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>CGSpace crashed late at night and the DSpace logs were showing:</p>
|
||||
|
||||
<pre><code>2016-07-18 20:26:30,941 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
|
||||
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
|
||||
...
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I suspect it’s someone hitting REST too much:</li>
|
||||
</ul>
|
||||
<li><p>I suspect it’s someone hitting REST too much:</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | sort -n | uniq -c | sort -h | tail -n 3
|
||||
710 66.249.78.38
|
||||
1781 181.118.144.29
|
||||
24904 70.32.99.142
|
||||
</code></pre>
|
||||
710 66.249.78.38
|
||||
1781 181.118.144.29
|
||||
24904 70.32.99.142
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I just blocked access to <code>/rest</code> for that last IP for now:</li>
|
||||
<li><p>I just blocked access to <code>/rest</code> for that last IP for now:</p>
|
||||
|
||||
<pre><code> # log rest requests
|
||||
location /rest {
|
||||
access_log /var/log/nginx/rest.log;
|
||||
proxy_pass http://127.0.0.1:8443;
|
||||
deny 70.32.99.142;
|
||||
}
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<pre><code> # log rest requests
|
||||
location /rest {
|
||||
access_log /var/log/nginx/rest.log;
|
||||
proxy_pass http://127.0.0.1:8443;
|
||||
deny 70.32.99.142;
|
||||
}
|
||||
</code></pre>
|
||||
|
||||
<h2 id="2016-07-21">2016-07-21</h2>
|
||||
|
||||
<ul>
|
||||
@ -287,84 +280,79 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
|
||||
<li>Altmetric reports having an issue with some of our authors being doubled…</li>
|
||||
<li>This is related to authority and confidence!</li>
|
||||
<li>We might need to use <code>index.authority.ignore-prefered=true</code> to tell the Discovery index to prefer the variation that exists in the metadatavalue rather than what it finds in the authority cache.</li>
|
||||
<li>Trying these on DSpace Test after a discussion by Daniel Scharon on the dspace-tech mailing list:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Trying these on DSpace Test after a discussion by Daniel Scharon on the dspace-tech mailing list:</p>
|
||||
|
||||
<pre><code>index.authority.ignore-prefered.dc.contributor.author=true
|
||||
index.authority.ignore-variants.dc.contributor.author=false
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After reindexing I don’t see any change in Discovery’s display of authors, and still have entries like:</li>
|
||||
</ul>
|
||||
<li><p>After reindexing I don’t see any change in Discovery’s display of authors, and still have entries like:</p>
|
||||
|
||||
<pre><code>Grace, D. (464)
|
||||
Grace, D. (62)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I asked for clarification of the following options on the DSpace mailing list:</li>
|
||||
</ul>
|
||||
<li><p>I asked for clarification of the following options on the DSpace mailing list:</p>
|
||||
|
||||
<pre><code>index.authority.ignore
|
||||
index.authority.ignore-prefered
|
||||
index.authority.ignore-variants
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In the mean time, I will try these on DSpace Test (plus a reindex):</li>
|
||||
</ul>
|
||||
<li><p>In the mean time, I will try these on DSpace Test (plus a reindex):</p>
|
||||
|
||||
<pre><code>index.authority.ignore=true
|
||||
index.authority.ignore-prefered=true
|
||||
index.authority.ignore-variants=true
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
<li>Enabled usage of <code>X-Forwarded-For</code> in DSpace admin control panel (<a href="https://github.com/ilri/DSpace/pull/255">#255</a>)</li>
<li>It was misconfigured and disabled, but already working for some reason <em>sigh</em></li>
<li>… no luck. Trying with just:</li>
</ul>
<li><p>Enabled usage of <code>X-Forwarded-For</code> in DSpace admin control panel (<a href="https://github.com/ilri/DSpace/pull/255">#255</a>)</p></li>

<li><p>It was misconfigured and disabled, but already working for some reason <em>sigh</em></p></li>

<li><p>… no luck. Trying with just:</p>
|
||||
|
||||
<pre><code>index.authority.ignore=true
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After re-indexing and clearing the XMLUI cache nothing has changed</li>
|
||||
<li><p>After re-indexing and clearing the XMLUI cache nothing has changed</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-07-25">2016-07-25</h2>
|
||||
|
||||
<ul>
|
||||
<li>Trying a few more settings (plus reindex) for Discovery on DSpace Test:</li>
|
||||
</ul>
|
||||
<li><p>Trying a few more settings (plus reindex) for Discovery on DSpace Test:</p>
|
||||
|
||||
<pre><code>index.authority.ignore-prefered.dc.contributor.author=true
|
||||
index.authority.ignore-variants=true
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run all OS updates and reboot DSpace Test server</li>
|
||||
<li>No changes to Discovery after reindexing… hmm.</li>
|
||||
<li>Integrate and massively clean up About page (<a href="https://github.com/ilri/DSpace/pull/256">#256</a>)</li>
|
||||
<li><p>Run all OS updates and reboot DSpace Test server</p></li>
|
||||
|
||||
<li><p>No changes to Discovery after reindexing… hmm.</p></li>
|
||||
|
||||
<li><p>Integrate and massively clean up About page (<a href="https://github.com/ilri/DSpace/pull/256">#256</a>)</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2016/07/cgspace-about-page.png" alt="About page" /></p>
|
||||
|
||||
<ul>
|
||||
<li>The DSpace source code mentions the configuration key <code>discovery.index.authority.ignore-prefered.*</code> (with prefix of discovery, despite the docs saying otherwise), so I’m trying the following on DSpace Test:</li>
|
||||
</ul>
|
||||
<li><p>The DSpace source code mentions the configuration key <code>discovery.index.authority.ignore-prefered.*</code> (with prefix of discovery, despite the docs saying otherwise), so I’m trying the following on DSpace Test:</p>
|
||||
|
||||
<pre><code>discovery.index.authority.ignore-prefered.dc.contributor.author=true
|
||||
discovery.index.authority.ignore-variants=true
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Still no change!</li>
|
||||
<li>Deploy species, breed, and identifier changes to CGSpace, as well as About page</li>
|
||||
<li>Run Linode RAM upgrade (8→12GB)</li>
|
||||
<li>Re-sync DSpace Test with CGSpace</li>
|
||||
<li>I noticed that our backup scripts don’t send Solr cores to S3 so I amended the script</li>
|
||||
<li><p>Still no change!</p></li>
|
||||
|
||||
<li><p>Deploy species, breed, and identifier changes to CGSpace, as well as About page</p></li>
|
||||
|
||||
<li><p>Run Linode RAM upgrade (8→12GB)</p></li>
|
||||
|
||||
<li><p>Re-sync DSpace Test with CGSpace</p></li>
|
||||
|
||||
<li><p>I noticed that our backup scripts don’t send Solr cores to S3 so I amended the script (a rough sketch follows below)</p></li>
|
||||
</ul>
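<p>For reference, a rough sketch of the addition to the backup script; the bucket name and the use of the AWS CLI are assumptions, and the Solr directory follows the DSpace install path used elsewhere in these notes:</p>

<pre><code>$ tar czf /tmp/solr-cores.tar.gz -C /home/cgspace.cgiar.org solr
$ aws s3 cp /tmp/solr-cores.tar.gz s3://cgspace-backups/solr-cores.tar.gz
</code></pre>
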
<h2 id="2016-07-31">2016-07-31</h2>
|
||||
|
@ -14,12 +14,13 @@ Play with upgrading Mirage 2 dependencies in bower.json because most are several
|
||||
Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more
|
||||
bower stuff is a dead end, waste of time, too many issues
|
||||
Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of fonts)
|
||||
Start working on DSpace 5.1 → 5.5 port:
|
||||
|
||||
Start working on DSpace 5.1 → 5.5 port:
|
||||
|
||||
$ git checkout -b 55new 5_x-prod
|
||||
$ git reset --hard ilri/5_x-prod
|
||||
$ git rebase -i dspace-5.5
|
||||
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-08/" />
|
||||
@ -36,14 +37,15 @@ Play with upgrading Mirage 2 dependencies in bower.json because most are several
|
||||
Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more
|
||||
bower stuff is a dead end, waste of time, too many issues
|
||||
Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of fonts)
|
||||
Start working on DSpace 5.1 → 5.5 port:
|
||||
|
||||
Start working on DSpace 5.1 → 5.5 port:
|
||||
|
||||
$ git checkout -b 55new 5_x-prod
|
||||
$ git reset --hard ilri/5_x-prod
|
||||
$ git rebase -i dspace-5.5
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -130,13 +132,14 @@ $ git rebase -i dspace-5.5
|
||||
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
|
||||
<li>bower stuff is a dead end, waste of time, too many issues</li>
|
||||
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
|
||||
<li>Start working on DSpace 5.1 → 5.5 port:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Start working on DSpace 5.1 → 5.5 port:</p>
|
||||
|
||||
<pre><code>$ git checkout -b 55new 5_x-prod
|
||||
$ git reset --hard ilri/5_x-prod
|
||||
$ git rebase -i dspace-5.5
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
<li>Lots of conflicts that don’t make sense (ie, shouldn’t conflict!)</li>
|
||||
@ -168,11 +171,12 @@ $ git rebase -i dspace-5.5
|
||||
|
||||
<ul>
|
||||
<li>Fix item display incorrectly displaying Species when Breeds were present (<a href="https://github.com/ilri/DSpace/pull/260">#260</a>)</li>
|
||||
<li>Experiment with fixing more authors, like Delia Grace:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Experiment with fixing more authors, like Delia Grace:</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where metadata_field_id=3 and text_value='Grace, D.';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-08-06">2016-08-06</h2>
|
||||
|
||||
@ -194,20 +198,18 @@ $ git rebase -i dspace-5.5
|
||||
<li>Ooh, and vanilla DSpace 5.5 works on Tomcat 8 with Java 8!</li>
|
||||
<li>Some notes about setting up Tomcat 8, since it’s new on this machine…</li>
|
||||
<li>Install latest Oracle Java 8 JDK</li>
|
||||
<li>Create <code>setenv.sh</code> in Tomcat 8 <code>libexec/bin</code> directory:
|
||||
<br /></li>
|
||||
</ul>
|
||||
|
||||
<li><p>Create <code>setenv.sh</code> in Tomcat 8 <code>libexec/bin</code> directory:</p>
|
||||
|
||||
<pre><code>CATALINA_OPTS="-Djava.awt.headless=true -Xms3072m -Xmx3072m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dfile.encoding=UTF-8"
|
||||
CATALINA_OPTS="$CATALINA_OPTS -Djava.library.path=/opt/brew/Cellar/tomcat-native/1.2.8/lib"
|
||||
|
||||
JRE_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Edit Tomcat 8 <code>server.xml</code> to add regular HTTP listener for solr</li>
|
||||
<li>Symlink webapps:</li>
|
||||
</ul>
|
||||
<li><p>Edit Tomcat 8 <code>server.xml</code> to add regular HTTP listener for solr</p></li>
|
||||
|
||||
<li><p>Symlink webapps:</p>
|
||||
|
||||
<pre><code>$ rm -rf /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/ROOT
|
||||
$ ln -sv ~/dspace/webapps/xmlui /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/ROOT
|
||||
@ -215,7 +217,8 @@ $ ln -sv ~/dspace/webapps/oai /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/oai
|
||||
$ ln -sv ~/dspace/webapps/jspui /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/jspui
|
||||
$ ln -sv ~/dspace/webapps/rest /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/rest
|
||||
$ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/solr
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-08-09">2016-08-09</h2>
|
||||
|
||||
@ -280,14 +283,13 @@ $ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/sol
|
||||
|
||||
<ul>
|
||||
<li>Fix “CONGO,DR” country name in <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/264">#264</a>)</li>
|
||||
<li>Also need to fix existing records using the incorrect form in the database:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Also need to fix existing records using the incorrect form in the database:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value='CONGO, DR' where resource_type_id=2 and metadata_field_id=228 and text_value='CONGO,DR';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I asked a question on the DSpace mailing list about updating “preferred” forms of author names from ORCID</li>
|
||||
<li><p>I asked a question on the DSpace mailing list about updating “preferred” forms of author names from ORCID</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-08-21">2016-08-21</h2>
|
||||
@ -303,8 +305,7 @@ $ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/sol
|
||||
<h2 id="2016-08-22">2016-08-22</h2>
|
||||
|
||||
<ul>
|
||||
<li>Database migrations are fine on DSpace 5.1:</li>
|
||||
</ul>
|
||||
<li><p>Database migrations are fine on DSpace 5.1:</p>
|
||||
|
||||
<pre><code>$ ~/dspace/bin/dspace database info
|
||||
|
||||
@ -335,10 +336,9 @@ Database Driver: PostgreSQL Native Driver version PostgreSQL 9.1 JDBC4 (build 90
|
||||
| 5.1.2015.12.03 | Atmire CUA 4 migration | 2016-03-21 17:10:41 | Success |
|
||||
| 5.1.2015.12.03 | Atmire MQM migration | 2016-03-21 17:10:42 | Success |
|
||||
+----------------+----------------------------+---------------------+---------+
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So I’m not sure why they have problems when we move to DSpace 5.5 (even the 5.1 migrations themselves show as “Missing”)</li>
|
||||
<li><p>So I’m not sure why they have problems when we move to DSpace 5.5 (even the 5.1 migrations themselves show as “Missing”)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-08-23">2016-08-23</h2>
|
||||
@ -346,98 +346,92 @@ Database Driver: PostgreSQL Native Driver version PostgreSQL 9.1 JDBC4 (build 90
|
||||
<ul>
|
||||
<li>Help Paola from CCAFS with her thumbnails again</li>
|
||||
<li>Talk to Atmire about the DSpace 5.5 issue, and it seems to be caused by a bug in FlywayDB</li>
|
||||
<li>They said I should delete the Atmire migrations
|
||||
<br /></li>
|
||||
</ul>
|
||||
|
||||
<li><p>They said I should delete the Atmire migrations</p>
|
||||
|
||||
<pre><code>dspacetest=# delete from schema_version where description = 'Atmire CUA 4 migration' and version='5.1.2015.12.03.2';
|
||||
dspacetest=# delete from schema_version where description = 'Atmire MQM migration' and version='5.1.2015.12.03.3';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
<li>After that DSpace starts up but XMLUI now has unrelated issues that I need to solve!</li>
</ul>
<li><p>After that DSpace starts up but XMLUI now has unrelated issues that I need to solve!</p>
|
||||
|
||||
<pre><code>org.apache.avalon.framework.configuration.ConfigurationException: Type 'ThemeResourceReader' does not exist for 'map:read' at jndi:/localhost/themes/0_CGIAR/sitemap.xmap:136:77
|
||||
context:/jndi:/localhost/themes/0_CGIAR/sitemap.xmap - 136:77
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looks like we’re missing some stuff in the XMLUI module’s <code>sitemap.xmap</code>, as well as in each of our XMLUI themes</li>
|
||||
<li>Diff them with these to get the <code>ThemeResourceReader</code> changes:
|
||||
<li><p>Looks like we’re missing some stuff in the XMLUI module’s <code>sitemap.xmap</code>, as well as in each of our XMLUI themes</p></li>
|
||||
|
||||
<li><p>Diff them with these to get the <code>ThemeResourceReader</code> changes:</p>
|
||||
|
||||
<ul>
|
||||
<li><code>dspace-xmlui/src/main/webapp/sitemap.xmap</code></li>
|
||||
<li><code>dspace-xmlui-mirage2/src/main/webapp/sitemap.xmap</code></li>
|
||||
</ul></li>
|
||||
<li>Then we had some NullPointerException from the SolrLogger class, which is apparently part of Atmire’s CUA module</li>
|
||||
<li>I tried with a small version bump to CUA but it didn’t work (version <code>5.5-4.1.1-0</code>)</li>
|
||||
<li>Also, I started looking into huge pages to prepare for PostgreSQL 9.5, but it seems Linode’s kernels don’t enable them</li>
|
||||
|
||||
<li><p>Then we had some NullPointerException from the SolrLogger class, which is apparently part of Atmire’s CUA module</p></li>
|
||||
|
||||
<li><p>I tried with a small version bump to CUA but it didn’t work (version <code>5.5-4.1.1-0</code>)</p></li>
|
||||
|
||||
<li><p>Also, I started looking into huge pages to prepare for PostgreSQL 9.5, but it seems Linode’s kernels don’t enable them (a quick check follows below)</p></li>
|
||||
</ul>
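<p>For reference, a quick way to check whether a kernel exposes huge pages at all (no output from either usually means the kernel was built without hugetlbfs support):</p>

<pre><code># grep -i hugepages /proc/meminfo
# grep hugetlbfs /proc/filesystems
</code></pre>
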
<h2 id="2016-08-24">2016-08-24</h2>
|
||||
|
||||
<ul>
|
||||
<li>Clean up and import 48 CCAFS records into DSpace Test</li>
|
||||
<li>SQL to get all journal titles from dc.source (55), since it’s apparently used for internal DSpace filename shit, but we moved all our journal titles there a few months ago:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>SQL to get all journal titles from dc.source (55), since it’s apparently used for internal DSpace filename shit, but we moved all our journal titles there a few months ago:</p>
|
||||
|
||||
<pre><code>dspacetest=# select distinct text_value from metadatavalue where metadata_field_id=55 and text_value !~ '.*(\.pdf|\.png|\.PDF|\.Pdf|\.JPEG|\.jpg|\.JPG|\.jpeg|\.xls|\.rtf|\.docx?|\.potx|\.dotx|\.eqa|\.tiff|\.mp4|\.mp3|\.gif|\.zip|\.txt|\.pptx|\.indd|\.PNG|\.bmp|\.exe|org\.dspace\.app\.mediafilter).*';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-08-25">2016-08-25</h2>
|
||||
|
||||
<ul>
|
||||
<li>Atmire suggested adding a missing bean to <code>dspace/config/spring/api/atmire-cua.xml</code> but it doesn’t help:</li>
|
||||
</ul>
|
||||
<li><p>Atmire suggested adding a missing bean to <code>dspace/config/spring/api/atmire-cua.xml</code> but it doesn’t help:</p>
|
||||
|
||||
<pre><code>...
|
||||
Error creating bean with name 'MetadataStorageInfoService'
|
||||
...
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Atmire sent an updated version of <code>dspace/config/spring/api/atmire-cua.xml</code> and now XMLUI starts but gives a null pointer exception:</li>
|
||||
</ul>
|
||||
<li><p>Atmire sent an updated version of <code>dspace/config/spring/api/atmire-cua.xml</code> and now XMLUI starts but gives a null pointer exception:</p>
|
||||
|
||||
<pre><code>Java stacktrace: java.lang.NullPointerException
|
||||
at org.dspace.app.xmlui.aspect.statistics.Navigation.addOptions(Navigation.java:129)
|
||||
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:228)
|
||||
at sun.reflect.GeneratedMethodAccessor126.invoke(Unknown Source)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:606)
|
||||
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
|
||||
at com.sun.proxy.$Proxy103.startElement(Unknown Source)
|
||||
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
|
||||
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
|
||||
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
|
||||
at org.dspace.app.xmlui.aspect.statistics.Navigation.addOptions(Navigation.java:129)
|
||||
at org.dspace.app.xmlui.wing.AbstractWingTransformer.startElement(AbstractWingTransformer.java:228)
|
||||
at sun.reflect.GeneratedMethodAccessor126.invoke(Unknown Source)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:606)
|
||||
at org.apache.cocoon.core.container.spring.avalon.PoolableProxyHandler.invoke(PoolableProxyHandler.java:71)
|
||||
at com.sun.proxy.$Proxy103.startElement(Unknown Source)
|
||||
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
|
||||
at org.apache.cocoon.environment.internal.EnvironmentChanger.startElement(EnvironmentStack.java:140)
|
||||
at org.apache.cocoon.xml.AbstractXMLPipe.startElement(AbstractXMLPipe.java:94)
|
||||
...
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Import the 47 CCAFS records to CGSpace, creating the SimpleArchiveFormat bundles and importing like:</li>
|
||||
</ul>
|
||||
<li><p>Import the 47 CCAFS records to CGSpace, creating the SimpleArchiveFormat bundles and importing like:</p>
|
||||
|
||||
<pre><code>$ ./safbuilder.sh -c /tmp/Thumbnails\ to\ Upload\ to\ CGSpace/3546.csv
|
||||
$ JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m" /home/cgspace.cgiar.org/bin/dspace import -a -e aorth@mjanja.ch -c 10568/3546 -s /tmp/Thumbnails\ to\ Upload\ to\ CGSpace/SimpleArchiveFormat -m 3546.map
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Finally got DSpace 5.5 working with the Atmire modules after a few rounds of back and forth with Atmire devs</li>
|
||||
<li><p>Finally got DSpace 5.5 working with the Atmire modules after a few rounds of back and forth with Atmire devs</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-08-26">2016-08-26</h2>
|
||||
|
||||
<ul>
|
||||
<li>CGSpace had issues tonight, not entirely crashing, but becoming unresponsive</li>
|
||||
<li>The dspace log had this:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The dspace log had this:</p>
|
||||
|
||||
<pre><code>2016-08-26 20:48:05,040 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Related to /rest no doubt</li>
|
||||
<li><p>Related to /rest no doubt</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-08-27">2016-08-27</h2>
|
||||
|
@ -12,10 +12,11 @@
|
||||
Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors
|
||||
Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace
|
||||
We had been using DC=ILRI to determine whether a user was ILRI or not
|
||||
|
||||
It looks like we might be able to use OUs now, instead of DCs:
|
||||
|
||||
|
||||
$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
|
||||
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-09/" />
|
||||
@ -30,12 +31,13 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=or
|
||||
Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors
|
||||
Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace
|
||||
We had been using DC=ILRI to determine whether a user was ILRI or not
|
||||
|
||||
It looks like we might be able to use OUs now, instead of DCs:
|
||||
|
||||
|
||||
$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -120,29 +122,27 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=or
|
||||
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
|
||||
<li>Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
|
||||
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
|
||||
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
|
||||
|
||||
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
<li>User who has been migrated to the root vs user still in the hierarchical structure:</li>
|
||||
</ul>
|
||||
<li><p>User who has been migrated to the root vs user still in the hierarchical structure:</p>
|
||||
|
||||
<pre><code>distinguishedName: CN=Last\, First (ILRI),OU=ILRI Kenya Employees,OU=ILRI Kenya,OU=ILRIHUB,DC=CGIARAD,DC=ORG
|
||||
distinguishedName: CN=Last\, First (ILRI),OU=ILRI Ethiopia Employees,OU=ILRI Ethiopia,DC=ILRI,DC=CGIARAD,DC=ORG
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Changing the DSpace LDAP config to use <code>OU=ILRIHUB</code> seems to work:</li>
|
||||
<li><p>Changing the DSpace LDAP config to use <code>OU=ILRIHUB</code> seems to work:</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2016/09/ilri-ldap-users.png" alt="DSpace groups based on LDAP DN" /></p>
|
||||
|
||||
<ul>
|
||||
<li>Notes for local PostgreSQL database recreation from production snapshot:</li>
|
||||
</ul>
|
||||
<li><p>Notes for local PostgreSQL database recreation from production snapshot:</p>
|
||||
|
||||
<pre><code>$ dropdb dspacetest
|
||||
$ createdb -O dspacetest --encoding=UNICODE dspacetest
|
||||
@ -151,96 +151,83 @@ $ pg_restore -O -U dspacetest -d dspacetest ~/Downloads/cgspace_2016-09-01.backu
|
||||
$ psql dspacetest -c 'alter user dspacetest nocreateuser;'
|
||||
$ psql -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest -h localhost
|
||||
$ vacuumdb dspacetest
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Some names that I thought I fixed in July seem not to be:</li>
|
||||
</ul>
|
||||
<li><p>Some names that I thought I fixed in July seem not to be:</p>
|
||||
|
||||
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
|
||||
text_value | authority | confidence
|
||||
text_value | authority | confidence
|
||||
-----------------------+--------------------------------------+------------
|
||||
Poole, Elizabeth Jane | b6efa27f-8829-4b92-80fe-bc63e03e3ccb | 600
|
||||
Poole, Elizabeth Jane | 41628f42-fc38-4b38-b473-93aec9196326 | 600
|
||||
Poole, Elizabeth Jane | 83b82da0-f652-4ebc-babc-591af1697919 | 600
|
||||
Poole, Elizabeth Jane | c3a22456-8d6a-41f9-bba0-de51ef564d45 | 600
|
||||
Poole, E.J. | c3a22456-8d6a-41f9-bba0-de51ef564d45 | 600
|
||||
Poole, E.J. | 0fbd91b9-1b71-4504-8828-e26885bf8b84 | 600
|
||||
Poole, Elizabeth Jane | b6efa27f-8829-4b92-80fe-bc63e03e3ccb | 600
|
||||
Poole, Elizabeth Jane | 41628f42-fc38-4b38-b473-93aec9196326 | 600
|
||||
Poole, Elizabeth Jane | 83b82da0-f652-4ebc-babc-591af1697919 | 600
|
||||
Poole, Elizabeth Jane | c3a22456-8d6a-41f9-bba0-de51ef564d45 | 600
|
||||
Poole, E.J. | c3a22456-8d6a-41f9-bba0-de51ef564d45 | 600
|
||||
Poole, E.J. | 0fbd91b9-1b71-4504-8828-e26885bf8b84 | 600
|
||||
(6 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>At least a few of these actually have the correct ORCID, but I will unify the authority to be c3a22456-8d6a-41f9-bba0-de51ef564d45</li>
|
||||
</ul>
|
||||
<li><p>At least a few of these actually have the correct ORCID, but I will unify the authority to be c3a22456-8d6a-41f9-bba0-de51ef564d45</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set authority='c3a22456-8d6a-41f9-bba0-de51ef564d45', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Poole, %';
|
||||
UPDATE 69
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And for Peter Ballantyne:</li>
|
||||
</ul>
|
||||
<li><p>And for Peter Ballantyne:</p>
|
||||
|
||||
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
|
||||
text_value | authority | confidence
|
||||
text_value | authority | confidence
|
||||
-------------------+--------------------------------------+------------
|
||||
Ballantyne, Peter | 2dcbcc7b-47b0-4fd7-bef9-39d554494081 | 600
|
||||
Ballantyne, Peter | 4f04ca06-9a76-4206-bd9c-917ca75d278e | 600
|
||||
Ballantyne, P.G. | 4f04ca06-9a76-4206-bd9c-917ca75d278e | 600
|
||||
Ballantyne, Peter | ba5f205b-b78b-43e5-8e80-0c9a1e1ad2ca | 600
|
||||
Ballantyne, Peter | 20f21160-414c-4ecf-89ca-5f2cb64e75c1 | 600
|
||||
Ballantyne, Peter | 2dcbcc7b-47b0-4fd7-bef9-39d554494081 | 600
|
||||
Ballantyne, Peter | 4f04ca06-9a76-4206-bd9c-917ca75d278e | 600
|
||||
Ballantyne, P.G. | 4f04ca06-9a76-4206-bd9c-917ca75d278e | 600
|
||||
Ballantyne, Peter | ba5f205b-b78b-43e5-8e80-0c9a1e1ad2ca | 600
|
||||
Ballantyne, Peter | 20f21160-414c-4ecf-89ca-5f2cb64e75c1 | 600
|
||||
(5 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Again, a few have the correct ORCID, but there should only be one authority…</li>
|
||||
</ul>
|
||||
<li><p>Again, a few have the correct ORCID, but there should only be one authority…</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set authority='4f04ca06-9a76-4206-bd9c-917ca75d278e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Ballantyne, %';
|
||||
UPDATE 58
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And for me:</li>
|
||||
</ul>
|
||||
<li><p>And for me:</p>
|
||||
|
||||
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, A%';
|
||||
text_value | authority | confidence
|
||||
text_value | authority | confidence
|
||||
------------+--------------------------------------+------------
|
||||
Orth, Alan | 4884def0-4d7e-4256-9dd4-018cd60a5871 | 600
|
||||
Orth, A. | 4884def0-4d7e-4256-9dd4-018cd60a5871 | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 4884def0-4d7e-4256-9dd4-018cd60a5871 | 600
|
||||
Orth, A. | 4884def0-4d7e-4256-9dd4-018cd60a5871 | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
(3 rows)
|
||||
dspacetest=# update metadatavalue set authority='1a1943a0-3f87-402f-9afe-e52fb46a513e', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Orth, %';
|
||||
UPDATE 11
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And for CCAFS author Bruce Campbell that I had discussed with CCAFS earlier this week:</li>
|
||||
</ul>
|
||||
<li><p>And for CCAFS author Bruce Campbell that I had discussed with CCAFS earlier this week:</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set authority='0e414b4c-4671-4a23-b570-6077aca647d8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
|
||||
UPDATE 166
|
||||
dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Campbell, B%';
|
||||
text_value | authority | confidence
|
||||
text_value | authority | confidence
|
||||
------------------------+--------------------------------------+------------
|
||||
Campbell, Bruce | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
|
||||
Campbell, Bruce Morgan | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
|
||||
Campbell, B. | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
|
||||
Campbell, B.M. | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
|
||||
Campbell, Bruce | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
|
||||
Campbell, Bruce Morgan | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
|
||||
Campbell, B. | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
|
||||
Campbell, B.M. | 0e414b4c-4671-4a23-b570-6077aca647d8 | 600
|
||||
(4 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After updating the Authority indexes (<code>bin/dspace index-authority</code>) everything looks good</li>
|
||||
<li>Run authority updates on CGSpace</li>
|
||||
<li><p>After updating the Authority indexes (<code>bin/dspace index-authority</code>) everything looks good</p></li>
|
||||
|
||||
<li><p>Run authority updates on CGSpace</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-09-05">2016-09-05</h2>
|
||||
|
||||
<ul>
|
||||
<li>After one week of logging TLS connections on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>After one week of logging TLS connections on CGSpace:</p>
|
||||
|
||||
<pre><code># zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | wc -l
|
||||
217
|
||||
@ -249,18 +236,16 @@ dspacetest=# select distinct text_value, authority, confidence from metadatavalu
|
||||
# zgrep "DES-CBC3" /var/log/nginx/cgspace.cgiar.org-access-ssl.log* | awk '{print $6}' | sort | uniq
|
||||
TLSv1/DES-CBC3-SHA
|
||||
TLSv1/EDH-RSA-DES-CBC3-SHA
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So this represents <code>0.02%</code> of 1.16M connections over a one-week period</li>
|
||||
<li>Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:</li>
|
||||
</ul>
|
||||
<li><p>So this represents <code>0.02%</code> of 1.16M connections over a one-week period</p></li>
|
||||
|
||||
<li><p>Transforming some filenames in OpenRefine so they can have a useful description for SAFBuilder:</p>
|
||||
|
||||
<pre><code>value + "__description:" + cells["dc.type"].value
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This gives you, for example: <code>Mainstreaming gender in agricultural R&D.pdf__description:Brief</code></li>
|
||||
<li><p>This gives you, for example: <code>Mainstreaming gender in agricultural R&D.pdf__description:Brief</code></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-09-06">2016-09-06</h2>
|
||||
@ -283,28 +268,31 @@ TLSv1/EDH-RSA-DES-CBC3-SHA
|
||||
<li>See: <a href="http://www.fileformat.info/info/unicode/char/e1/index.htm">http://www.fileformat.info/info/unicode/char/e1/index.htm</a></li>
|
||||
<li>See: <a href="http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&s=&uv=0">http://demo.icu-project.org/icu-bin/nbrowser?t=%C3%A1&s=&uv=0</a></li>
|
||||
<li>If I unzip the original zip from CIAT on Windows, re-zip it with 7zip on Windows, and then unzip it on Linux directly, the file names seem to be proper UTF-8</li>
|
||||
<li>We should definitely clean filenames so they don’t use characters that are tricky to process in CSV and shell scripts, like: <code>,</code>, <code>'</code>, and <code>"</code></li>
|
||||
</ul>
|
||||
|
||||
<li><p>We should definitely clean filenames so they don’t use characters that are tricky to process in CSV and shell scripts, like: <code>,</code>, <code>'</code>, and <code>"</code></p>
|
||||
|
||||
<pre><code>value.replace("'","").replace(",","").replace('"','')
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I need to write a Python script to match that for renaming files in the file system</li>
|
||||
<li>When importing SAF bundles it seems you can specify the target collection on the command line using <code>-c 10568/4003</code> or in the <code>collections</code> file inside each item in the bundle</li>
|
||||
<li>Seems that the latter method causes a null pointer exception, so I will just have to use the former method</li>
|
||||
<li>In the end I was able to import the files after unzipping them ONLY on Linux
|
||||
<li><p>I need to write a Python script to match that for renaming files in the file system (a sketch follows after this list)</p></li>
|
||||
|
||||
<li><p>When importing SAF bundles it seems you can specify the target collection on the command line using <code>-c 10568/4003</code> or in the <code>collections</code> file inside each item in the bundle</p></li>
|
||||
|
||||
<li><p>Seems that the latter method causes a null pointer exception, so I will just have to use the former method</p></li>
|
||||
|
||||
<li><p>In the end I was able to import the files after unzipping them ONLY on Linux</p>
|
||||
|
||||
<ul>
|
||||
<li>The CSV file was giving file names in UTF-8, and unzipping the zip on Mac OS X and transferring it was converting the file names to Unicode equivalence like I saw above</li>
|
||||
</ul></li>
|
||||
<li>Import CIAT Gender Network records to CGSpace, first creating the SAF bundles as my user, then importing as the <code>tomcat7</code> user, and deleting the bundle, for each collection’s items:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Import CIAT Gender Network records to CGSpace, first creating the SAF bundles as my user, then importing as the <code>tomcat7</code> user, and deleting the bundle, for each collection’s items:</p>
|
||||
|
||||
<pre><code>$ ./safbuilder.sh -c /home/aorth/ciat-gender-2016-09-06/66601.csv
|
||||
$ JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m" /home/cgspace.cgiar.org/bin/dspace import -a -e aorth@mjanja.ch -c 10568/66601 -s /home/aorth/ciat-gender-2016-09-06/SimpleArchiveFormat -m 66601.map
|
||||
$ rm -rf ~/ciat-gender-2016-09-06/SimpleArchiveFormat/
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
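<p>A minimal sketch of that renaming script, mirroring the OpenRefine transform above (the directory path is hypothetical):</p>

<pre><code>#!/usr/bin/env python
# Strip apostrophes, commas, and double quotes from file names in a directory,
# mirroring the OpenRefine transform used on the CSV.
import os

target_dir = '/tmp/ciat-gender-files'  # hypothetical path

for filename in os.listdir(target_dir):
    cleaned = filename.replace("'", "").replace(",", "").replace('"', '')
    if cleaned != filename:
        os.rename(os.path.join(target_dir, filename),
                  os.path.join(target_dir, cleaned))
</code></pre>
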
<h2 id="2016-09-07">2016-09-07</h2>
|
||||
|
||||
@ -313,132 +301,117 @@ $ rm -rf ~/ciat-gender-2016-09-06/SimpleArchiveFormat/
|
||||
<li>Reading about PostgreSQL maintenance and it seems manual vacuuming is only for certain workloads, such as heavy update/write loads</li>
|
||||
<li>I suggest we disable our nightly manual vacuum task, as we’re a mostly read workload, and I’d rather stick as close to the documentation as possible since we haven’t done any testing/observation of PostgreSQL</li>
|
||||
<li>See: <a href="https://www.postgresql.org/docs/9.3/static/routine-vacuuming.html">https://www.postgresql.org/docs/9.3/static/routine-vacuuming.html</a></li>
|
||||
<li>CGSpace went down and the error seems to be the same as always (lately):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>CGSpace went down and the error seems to be the same as always (lately):</p>
|
||||
|
||||
<pre><code>2016-09-07 11:39:23,162 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
|
||||
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
|
||||
...
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Since CGSpace had crashed I quickly deployed the new LDAP settings before restarting Tomcat</li>
|
||||
<li><p>Since CGSpace had crashed I quickly deployed the new LDAP settings before restarting Tomcat</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-09-13">2016-09-13</h2>
|
||||
|
||||
<ul>
|
||||
<li>CGSpace crashed twice today, errors from <code>catalina.out</code>:</li>
|
||||
</ul>
|
||||
<li><p>CGSpace crashed twice today, errors from <code>catalina.out</code>:</p>
|
||||
|
||||
<pre><code>org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
|
||||
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
|
||||
</code></pre>
|
||||
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I enabled logging of requests to <code>/rest</code> again</li>
|
||||
<li><p>I enabled logging of requests to <code>/rest</code> again</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-09-14">2016-09-14</h2>
|
||||
|
||||
<ul>
|
||||
<li>CGSpace crashed again, errors from <code>catalina.out</code>:</li>
|
||||
</ul>
|
||||
<li><p>CGSpace crashed again, errors from <code>catalina.out</code>:</p>
|
||||
|
||||
<pre><code>org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
|
||||
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
|
||||
</code></pre>
|
||||
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:114)
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I restarted Tomcat and it was ok again</li>
|
||||
<li>CGSpace crashed a few hours later, errors from <code>catalina.out</code>:</li>
|
||||
</ul>
|
||||
<li><p>I restarted Tomcat and it was ok again</p></li>
|
||||
|
||||
<li><p>CGSpace crashed a few hours later, errors from <code>catalina.out</code>:</p>
|
||||
|
||||
<pre><code>Exception in thread "http-bio-127.0.0.1-8081-exec-25" java.lang.OutOfMemoryError: Java heap space
|
||||
at java.lang.StringCoding.decode(StringCoding.java:215)
|
||||
</code></pre>
|
||||
at java.lang.StringCoding.decode(StringCoding.java:215)
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We haven’t seen that in quite a while…</li>
|
||||
<li>Indeed, in a month of logs it only occurs 15 times:</li>
|
||||
</ul>
|
||||
<li><p>We haven’t seen that in quite a while…</p></li>
|
||||
|
||||
<li><p>Indeed, in a month of logs it only occurs 15 times:</p>
|
||||
|
||||
<pre><code># grep -rsI "OutOfMemoryError" /var/log/tomcat7/catalina.* | wc -l
|
||||
15
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I also see a bunch of errors from dspace.log:</li>
|
||||
</ul>
|
||||
<li><p>I also see a bunch of errors from dspace.log:</p>
|
||||
|
||||
<pre><code>2016-09-14 12:23:07,981 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
|
||||
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking at REST requests, it seems there is one IP hitting us nonstop:</li>
|
||||
</ul>
|
||||
<li><p>Looking at REST requests, it seems there is one IP hitting us nonstop:</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | sort -n | uniq -c | sort -h | tail -n 3
|
||||
820 50.87.54.15
|
||||
12872 70.32.99.142
|
||||
25744 70.32.83.92
|
||||
820 50.87.54.15
|
||||
12872 70.32.99.142
|
||||
25744 70.32.83.92
|
||||
# awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 3
|
||||
7966 181.118.144.29
|
||||
54706 70.32.99.142
|
||||
109412 70.32.83.92
|
||||
</code></pre>
|
||||
7966 181.118.144.29
|
||||
54706 70.32.99.142
|
||||
109412 70.32.83.92
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Those are the same IPs that were hitting us heavily in July, 2016 as well…</li>
|
||||
<li>I think the stability issues are definitely from REST</li>
|
||||
<li>Crashed AGAIN, errors from dspace.log:</li>
|
||||
</ul>
|
||||
<li><p>Those are the same IPs that were hitting us heavily in July, 2016 as well…</p></li>
|
||||
|
||||
<li><p>I think the stability issues are definitely from REST</p></li>
|
||||
|
||||
<li><p>Crashed AGAIN, errors from dspace.log:</p>
|
||||
|
||||
<pre><code>2016-09-14 14:31:43,069 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
|
||||
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And more heap space errors:</li>
|
||||
</ul>
|
||||
<li><p>And more heap space errors:</p>
|
||||
|
||||
<pre><code># grep -rsI "OutOfMemoryError" /var/log/tomcat7/catalina.* | wc -l
|
||||
19
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There are no more rest requests since the last crash, so maybe there are other things causing this.</li>
|
||||
<li>Hmm, I noticed a shitload of IPs from 180.76.0.0/16 are connecting to both CGSpace and DSpace Test (58 unique IPs concurrently!)</li>
|
||||
<li>They seem to be coming from Baidu, and so far during today alone account for <sup>1</sup>⁄<sub>6</sub> of every connection:</li>
|
||||
</ul>
|
||||
<li><p>There are no more rest requests since the last crash, so maybe there are other things causing this.</p></li>
|
||||
|
||||
<li><p>Hmm, I noticed a shitload of IPs from 180.76.0.0/16 are connecting to both CGSpace and DSpace Test (58 unique IPs concurrently!)</p></li>
|
||||
|
||||
<li><p>They seem to be coming from Baidu, and so far during today alone account for <sup>1</sup>⁄<sub>6</sub> of every connection:</p>
|
||||
|
||||
<pre><code># grep -c ip_addr= /home/cgspace.cgiar.org/log/dspace.log.2016-09-14
|
||||
29084
|
||||
# grep -c ip_addr=180.76.15 /home/cgspace.cgiar.org/log/dspace.log.2016-09-14
|
||||
5192
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Other recent days are the same… hmmm.</li>
|
||||
<li>From the activity control panel I can see 58 unique IPs hitting the site <em>concurrently</em>, which has GOT to hurt our stability</li>
|
||||
<li>A list of all 2000 unique IPs from CGSpace logs today:</li>
|
||||
</ul>
|
||||
<li><p>Other recent days are the same… hmmm.</p></li>
|
||||
|
||||
<li><p>From the activity control panel I can see 58 unique IPs hitting the site <em>concurrently</em>, which has GOT to hurt our stability</p></li>
|
||||
|
||||
<li><p>A list of all 2000 unique IPs from CGSpace logs today:</p>
|
||||
|
||||
<pre><code># grep ip_addr= /home/cgspace.cgiar.org/log/dspace.log.2016-09-11 | awk -F: '{print $5}' | sort -n | uniq -c | sort -h | tail -n 100
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc… do we have any real users?</li>
|
||||
<li>Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
|
||||
</ul>
|
||||
<li><p>Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc… do we have any real users?</p></li>
|
||||
|
||||
<li><p>Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</p>
|
||||
|
||||
<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking into the Catalina logs again around the time of the first crash, I see:</li>
|
||||
</ul>
|
||||
<li><p>Looking into the Catalina logs again around the time of the first crash, I see:</p>
|
||||
|
||||
<pre><code>Wed Sep 14 09:47:27 UTC 2016 | Query:id: 78581 AND type:2
|
||||
Wed Sep 14 09:47:28 UTC 2016 | Updating : 6/6 docs.
|
||||
@ -446,12 +419,11 @@ Commit
|
||||
Commit done
|
||||
dn:CN=Haman\, Magdalena (CIAT-CCAFS),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org
|
||||
Exception in thread "http-bio-127.0.0.1-8081-exec-193" java.lang.OutOfMemoryError: Java heap space
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And after that I see a bunch of “pool error Timeout waiting for idle object”</li>
|
||||
<li>Later, near the time of the next crash I see:</li>
|
||||
</ul>
|
||||
<li><p>And after that I see a bunch of “pool error Timeout waiting for idle object”</p></li>
|
||||
|
||||
<li><p>Later, near the time of the next crash I see:</p>
|
||||
|
||||
<pre><code>dn:CN=Haman\, Magdalena (CIAT-CCAFS),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org
|
||||
Wed Sep 14 11:29:55 UTC 2016 | Query:id: 79078 AND type:2
|
||||
@ -462,27 +434,24 @@ Sep 14, 2016 11:32:22 AM com.sun.jersey.server.wadl.generators.WadlGeneratorJAXB
|
||||
SEVERE: Failed to generate the schema for the JAX-B elements
|
||||
com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of IllegalAnnotationExceptions
|
||||
java.util.Map is an interface, and JAXB can't handle interfaces.
|
||||
this problem is related to the following location:
|
||||
at java.util.Map
|
||||
at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender()
|
||||
at com.atmire.dspace.rest.common.Statlet
|
||||
this problem is related to the following location:
|
||||
at java.util.Map
|
||||
at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender()
|
||||
at com.atmire.dspace.rest.common.Statlet
|
||||
java.util.Map does not have a no-arg default constructor.
|
||||
this problem is related to the following location:
|
||||
at java.util.Map
|
||||
at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender()
|
||||
at com.atmire.dspace.rest.common.Statlet
|
||||
</code></pre>
|
||||
this problem is related to the following location:
|
||||
at java.util.Map
|
||||
at public java.util.Map com.atmire.dspace.rest.common.Statlet.getRender()
|
||||
at com.atmire.dspace.rest.common.Statlet
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then 20 minutes later another outOfMemoryError:</li>
|
||||
</ul>
|
||||
<li><p>Then 20 minutes later another outOfMemoryError:</p>
|
||||
|
||||
<pre><code>Exception in thread "http-bio-127.0.0.1-8081-exec-25" java.lang.OutOfMemoryError: Java heap space
|
||||
at java.lang.StringCoding.decode(StringCoding.java:215)
|
||||
</code></pre>
|
||||
at java.lang.StringCoding.decode(StringCoding.java:215)
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Perhaps these particular issues <em>are</em> memory issues, the munin graphs definitely show some weird purging/allocating behavior starting this week</li>
|
||||
<li><p>Perhaps these particular issues <em>are</em> memory issues, the munin graphs definitely show some weird purging/allocating behavior starting this week</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2016/09/tomcat_jvm-day.png" alt="Tomcat JVM usage day" />
|
||||
@ -492,15 +461,15 @@ java.util.Map does not have a no-arg default constructor.
|
||||
<ul>
|
||||
<li>And really, we did reduce the memory of CGSpace in late 2015, so maybe we should just increase it again, now that our usage is higher and we are having memory errors in the logs</li>
|
||||
<li>Oh great, the configuration on the actual server is different than in configuration management!</li>
|
||||
<li>Seems we added a bunch of settings to the <code>/etc/default/tomcat7</code> in December, 2015 and never updated our ansible repository:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Seems we added a bunch of settings to the <code>/etc/default/tomcat7</code> in December, 2015 and never updated our ansible repository:</p>
|
||||
|
||||
<pre><code>JAVA_OPTS="-Djava.awt.headless=true -Xms3584m -Xmx3584m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -Dfile.encoding=UTF-8 -XX:-UseGCOverheadLimit -XX:MaxGCPauseMillis=250 -XX:GCTimeRatio=9 -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=8m -XX:InitiatingHeapOccupancyPercent=75 -XX:+UseLargePages -XX:+AggressiveOpts"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So I’m going to bump the heap +512m and remove all the other experimental shit (and update ansible!)</li>
|
||||
<li>Increased JVM heap to 4096m on CGSpace (linode01)</li>
|
||||
<li><p>So I’m going to bump the heap +512m and remove all the other experimental shit (and update ansible!)</p></li>
|
||||
|
||||
<li><p>Increased JVM heap to 4096m on CGSpace (linode01)</p></li>
</ul>
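<p>For reference, after dropping the experimental GC flags the <code>/etc/default/tomcat7</code> line should end up looking roughly like this (a sketch based on the options above with the heap bumped to 4096m; the exact set of flags kept still needs to be confirmed in ansible):</p>

<pre><code>JAVA_OPTS="-Djava.awt.headless=true -Xms4096m -Xmx4096m -XX:MaxPermSize=256m -Dfile.encoding=UTF-8"
</code></pre>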
<h2 id="2016-09-15">2016-09-15</h2>
|
||||
@ -514,8 +483,7 @@ java.util.Map does not have a no-arg default constructor.
|
||||
<h2 id="2016-09-16">2016-09-16</h2>
|
||||
|
||||
<ul>
|
||||
<li>CGSpace crashed again, and there are TONS of heap space errors but the datestamps aren’t on those lines so I’m not sure if they were yesterday:</li>
|
||||
</ul>
|
||||
<li><p>CGSpace crashed again, and there are TONS of heap space errors but the datestamps aren’t on those lines so I’m not sure if they were yesterday:</p>
|
||||
|
||||
<pre><code>dn:CN=Orentlicher\, Natalie (CIAT),OU=Standard,OU=Users,OU=HQ,OU=CIATHUB,dc=cgiarad,dc=org
|
||||
Thu Sep 15 18:45:25 UTC 2016 | Query:id: 55785 AND type:2
|
||||
@ -533,41 +501,38 @@ Exception in thread "http-bio-127.0.0.1-8081-exec-263" java.lang.OutOf
|
||||
Exception in thread "http-bio-127.0.0.1-8081-exec-280" java.lang.OutOfMemoryError: Java heap space
|
||||
Exception in thread "Thread-54216" org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id 7feaa95d-8e1f-4f45-80bb
|
||||
-e14ef82ee224 to the index; possible analysis error.
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
|
||||
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
|
||||
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
|
||||
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
|
||||
at com.atmire.statistics.SolrLogThread.run(SourceFile:25)
|
||||
</code></pre>
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
|
||||
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
|
||||
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
|
||||
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
|
||||
at com.atmire.statistics.SolrLogThread.run(SourceFile:25)
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I bumped the heap space from 4096m to 5120m to see if this is <em>really</em> about heap space or not.</li>
<li>Looking into some of these errors that I’ve seen this week but haven’t noticed before:</li>
</ul>
<li><p>I bumped the heap space from 4096m to 5120m to see if this is <em>really</em> about heap space or not.</p></li>
|
||||
|
||||
<li><p>Looking into some of these errors that I’ve seen this week but haven’t noticed before:</p>
|
||||
|
||||
<pre><code># zcat -f -- /var/log/tomcat7/catalina.* | grep -c 'Failed to generate the schema for the JAX-B elements'
|
||||
113
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I’ve sent a message to Atmire about the Solr error to see if it’s related to their batch update module</li>
|
||||
<li><p>I’ve sent a message to Atmire about the Solr error to see if it’s related to their batch update module</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-09-19">2016-09-19</h2>
|
||||
|
||||
<ul>
|
||||
<li>Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:</li>
|
||||
</ul>
|
||||
<li><p>Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
|
||||
$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After that we need to take the top ~300 and make a controlled vocabulary for it</li>
|
||||
<li>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>)</li>
|
||||
<li><p>After that we need to take the top ~300 and make a controlled vocabulary for it</p></li>
|
||||
|
||||
<li><p>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>)</p></li>
</ul>
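<p>Dumping just the top affiliations for the controlled vocabulary is the same query as before with a limit (a sketch; the output filename is arbitrary):</p>

<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc limit 300) to /tmp/top-affiliations.csv with csv;
</code></pre>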
<h2 id="2016-09-20">2016-09-20</h2>
|
||||
@ -587,42 +552,42 @@ $ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2
|
||||
|
||||
<ul>
|
||||
<li>Turns out the Solr search logic switched from OR to AND in DSpace 6.0 and the change is easy to backport: <a href="https://jira.duraspace.org/browse/DS-2809">https://jira.duraspace.org/browse/DS-2809</a></li>
|
||||
<li>We just need to set this in <code>dspace/solr/search/conf/schema.xml</code>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>We just need to set this in <code>dspace/solr/search/conf/schema.xml</code>:</p>
|
||||
|
||||
<pre><code><solrQueryParser defaultOperator="AND"/>
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It actually works really well, and search results return far fewer hits now (before, after):</li>
<li><p>It actually works really well, and search results return far fewer hits now (before, after):</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2016/09/cgspace-search.png" alt="CGSpace search with "OR" boolean logic" />
|
||||
<img src="/cgspace-notes/2016/09/dspacetest-search.png" alt="DSpace Test search with "AND" boolean logic" /></p>
|
||||
|
||||
<ul>
|
||||
<li>Found a way to improve the configuration of Atmire’s Content and Usage Analysis (CUA) module for date fields</li>
|
||||
</ul>
|
||||
<li><p>Found a way to improve the configuration of Atmire’s Content and Usage Analysis (CUA) module for date fields</p>
|
||||
|
||||
<pre><code>-content.analysis.dataset.option.8=metadata:dateAccessioned:discovery
|
||||
+content.analysis.dataset.option.8=metadata:dc.date.accessioned:date(month)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This allows the module to treat the field as a date rather than a text string, so we can interrogate it more intelligently</li>
|
||||
<li>Add <code>dc.date.accessioned</code> to XMLUI Discovery search filters</li>
|
||||
<li>Major CGSpace crash because ILRI forgot to pay the Linode bill</li>
|
||||
<li>45 minutes of downtime!</li>
|
||||
<li>Start processing the fixes to <code>dc.description.sponsorship</code> from Peter Ballantyne:</li>
|
||||
</ul>
|
||||
<li><p>This allows the module to treat the field as a date rather than a text string, so we can interrogate it more intelligently</p></li>
|
||||
|
||||
<li><p>Add <code>dc.date.accessioned</code> to XMLUI Discovery search filters</p></li>
|
||||
|
||||
<li><p>Major CGSpace crash because ILRI forgot to pay the Linode bill</p></li>
|
||||
|
||||
<li><p>45 minutes of downtime!</p></li>
|
||||
|
||||
<li><p>Start processing the fixes to <code>dc.description.sponsorship</code> from Peter Ballantyne:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i sponsors-fix-23.csv -f dc.description.sponsorship -t correct -m 29 -d dspace -u dspace -p fuuu
|
||||
$ ./delete-metadata-values.py -i sponsors-delete-8.csv -f dc.description.sponsorship -m 29 -d dspace -u dspace -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I need to run these and the others from a few days ago on CGSpace the next time we run updates</li>
|
||||
<li>Also, I need to update the controlled vocab for sponsors based on these</li>
|
||||
<li><p>I need to run these and the others from a few days ago on CGSpace the next time we run updates</p></li>
|
||||
|
||||
<li><p>Also, I need to update the controlled vocab for sponsors based on these</p></li>
</ul>
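<p>To update the sponsorship controlled vocabulary I can dump the current values the same way as for affiliations (a sketch; metadata_field_id 29 is dc.description.sponsorship as used above, and the output path is arbitrary):</p>

<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
</code></pre>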
<h2 id="2016-09-22">2016-09-22</h2>
|
||||
@ -639,18 +604,19 @@ $ ./delete-metadata-values.py -i sponsors-delete-8.csv -f dc.description.sponsor
|
||||
<li>Merge updates to sponsorship controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/277">#277</a>)</li>
|
||||
<li>I’ve been trying to add a search filter for <code>dc.description</code> so the IITA people can search for some tags they use there, but for some reason the filter never shows up in Atmire’s CUA</li>
|
||||
<li>Not sure if it’s something like we already have too many filters there (30), or the filter name is reserved, etc…</li>
|
||||
<li>Generate a list of ILRI subjects for Peter and Abenet to look through/fix:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Generate a list of ILRI subjects for Peter and Abenet to look through/fix:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where resource_type_id=2 and metadata_field_id=203 group by text_value order by count desc) to /tmp/ilrisubjects.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Regenerate Discovery indexes a few times after playing with <code>discovery.xml</code> index definitions (syntax, parameters, etc).</li>
|
||||
<li>Merge changes to boolean logic in Solr search (<a href="https://github.com/ilri/DSpace/pull/274">#274</a>)</li>
|
||||
<li>Run all sponsorship and affiliation fixes on CGSpace, deploy latest <code>5_x-prod</code> branch, and re-index Discovery on CGSpace</li>
|
||||
<li>Tested OCSP stapling on DSpace Test’s nginx and it works:</li>
|
||||
</ul>
|
||||
<li><p>Regenerate Discovery indexes a few times after playing with <code>discovery.xml</code> index definitions (syntax, parameters, etc).</p></li>
|
||||
|
||||
<li><p>Merge changes to boolean logic in Solr search (<a href="https://github.com/ilri/DSpace/pull/274">#274</a>)</p></li>
|
||||
|
||||
<li><p>Run all sponsorship and affiliation fixes on CGSpace, deploy latest <code>5_x-prod</code> branch, and re-index Discovery on CGSpace</p></li>
|
||||
|
||||
<li><p>Tested OCSP stapling on DSpace Test’s nginx and it works:</p>
|
||||
|
||||
<pre><code>$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
|
||||
...
|
||||
@ -658,48 +624,48 @@ OCSP response:
|
||||
======================================
|
||||
OCSP Response Data:
|
||||
...
|
||||
Cert Status: good
|
||||
</code></pre>
|
||||
Cert Status: good
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I’ve been monitoring this for almost two years in this GitHub issue: <a href="https://github.com/ilri/DSpace/issues/38">https://github.com/ilri/DSpace/issues/38</a></li>
|
||||
<li><p>I’ve been monitoring this for almost two years in this GitHub issue: <a href="https://github.com/ilri/DSpace/issues/38">https://github.com/ilri/DSpace/issues/38</a></p></li>
</ul>
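<p>For reference, OCSP stapling in nginx boils down to a few directives along these lines (a sketch, not copied from our config; the trusted certificate path and resolver are assumptions):</p>

<pre><code>ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/letsencrypt/live/dspacetest.cgiar.org/chain.pem;
resolver 8.8.8.8 valid=300s;
</code></pre>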
<h2 id="2016-09-27">2016-09-27</h2>
|
||||
|
||||
<ul>
|
||||
<li>Discuss fixing some ORCIDs for CCAFS author Sonja Vermeulen with Magdalena Haman</li>
|
||||
<li>This author has a few variations:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>This author has a few variations:</p>
|
||||
|
||||
<pre><code>dspacetest=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeu
|
||||
len, S%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And it looks like <code>fe4b719f-6cc4-4d65-8504-7a83130b9f83</code> is the authority with the correct ORCID linked</li>
|
||||
</ul>
|
||||
<li><p>And it looks like <code>fe4b719f-6cc4-4d65-8504-7a83130b9f83</code> is the authority with the correct ORCID linked</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set authority='fe4b719f-6cc4-4d65-8504-7a83130b9f83w', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
|
||||
UPDATE 101
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Hmm, now her name is missing from the authors facet and only shows the authority ID</li>
|
||||
<li>On the production server there is an item with her ORCID but it is using a different authority: f01f7b7b-be3f-4df7-a61d-b73c067de88d</li>
|
||||
<li>Maybe I used the wrong one… I need to look again at the production database</li>
|
||||
<li>On a clean snapshot of the database I see the correct authority should be <code>f01f7b7b-be3f-4df7-a61d-b73c067de88d</code>, not <code>fe4b719f-6cc4-4d65-8504-7a83130b9f83</code></li>
|
||||
<li>Updating her authorities again and reindexing:</li>
|
||||
</ul>
|
||||
<li><p>Hmm, now her name is missing from the authors facet and only shows the authority ID</p></li>
|
||||
|
||||
<li><p>On the production server there is an item with her ORCID but it is using a different authority: f01f7b7b-be3f-4df7-a61d-b73c067de88d</p></li>
|
||||
|
||||
<li><p>Maybe I used the wrong one… I need to look again at the production database</p></li>
|
||||
|
||||
<li><p>On a clean snapshot of the database I see the correct authority should be <code>f01f7b7b-be3f-4df7-a61d-b73c067de88d</code>, not <code>fe4b719f-6cc4-4d65-8504-7a83130b9f83</code></p></li>
|
||||
|
||||
<li><p>Updating her authorities again and reindexing:</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set authority='f01f7b7b-be3f-4df7-a61d-b73c067de88d', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
|
||||
UPDATE 101
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Use GitHub icon from Font Awesome instead of a PNG to save one extra network request</li>
|
||||
<li>We can also replace the RSS and mail icons in community text!</li>
|
||||
<li>Fix reference to <code>dc.type.*</code> in Atmire CUA module, as we now only index <code>dc.type</code> for “Output type”</li>
|
||||
<li><p>Use GitHub icon from Font Awesome instead of a PNG to save one extra network request</p></li>
|
||||
|
||||
<li><p>We can also replace the RSS and mail icons in community text!</p></li>
|
||||
|
||||
<li><p>Fix reference to <code>dc.type.*</code> in Atmire CUA module, as we now only index <code>dc.type</code> for “Output type”</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-09-28">2016-09-28</h2>
|
||||
@ -711,22 +677,23 @@ UPDATE 101
|
||||
<li>Going to try to update Sonja Vermeulen’s authority to 2b4166b7-6e4d-4f66-9d8b-ddfbec9a6ae0, as that seems to be one of her authorities that has an ORCID</li>
|
||||
<li>Merge Font Awesome changes (<a href="https://github.com/ilri/DSpace/pull/279">#279</a>)</li>
|
||||
<li>Minor fix to a string in Atmire’s CUA module (<a href="https://github.com/ilri/DSpace/pull/280">#280</a>)</li>
|
||||
<li>This seems to be what I’ll need to do for Sonja Vermeulen (but with <code>2b4166b7-6e4d-4f66-9d8b-ddfbec9a6ae0</code> instead on the live site):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>This seems to be what I’ll need to do for Sonja Vermeulen (but with <code>2b4166b7-6e4d-4f66-9d8b-ddfbec9a6ae0</code> instead on the live site):</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d2d6d2', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen, S%';
|
||||
dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d2d6d2', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Vermeulen SJ%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And then update Discovery and Authority indexes</li>
|
||||
<li>Minor fix for “Subject” string in Discovery search and Atmire modules (<a href="https://github.com/ilri/DSpace/pull/281">#281</a>)</li>
|
||||
<li>Start testing batch fixes for ILRI subject from Peter:</li>
|
||||
</ul>
|
||||
<li><p>And then update Discovery and Authority indexes</p></li>
|
||||
|
||||
<li><p>Minor fix for “Subject” string in Discovery search and Atmire modules (<a href="https://github.com/ilri/DSpace/pull/281">#281</a>)</p></li>
|
||||
|
||||
<li><p>Start testing batch fixes for ILRI subject from Peter:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i ilrisubjects-fix-32.csv -f cg.subject.ilri -t correct -m 203 -d dspace -u dspace -p fuuuu
|
||||
$ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -m 203 -d dspace -u dspace -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
</ul>
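<p>Updating the Discovery and Authority indexes after these batch edits means re-running the usual DSpace CLI commands, something like this (sketch; <code>-b</code> forces a full Discovery rebuild):</p>

<pre><code>$ /home/dspacetest.cgiar.org/bin/dspace index-discovery -b
$ /home/dspacetest.cgiar.org/bin/dspace index-authority
</code></pre>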
<h2 id="2016-09-29">2016-09-29</h2>
|
||||
|
||||
@ -734,11 +701,12 @@ $ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -
|
||||
<li>Add <code>cg.identifier.ciatproject</code> to metadata registry in preparation for CIAT project tag</li>
|
||||
<li>Merge changes for CIAT project tag (<a href="https://github.com/ilri/DSpace/pull/282">#282</a>)</li>
|
||||
<li>DSpace Test (linode02) became unresponsive for some reason, I had to hard reboot it from the Linode console</li>
|
||||
<li>People on DSpace mailing list gave me a query to get authors from certain collections:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>People on DSpace mailing list gave me a query to get authors from certain collections:</p>
|
||||
|
||||
<pre><code>dspacetest=# select distinct text_value from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/5472', '10568/5473')));
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-09-30">2016-09-30</h2>
|
||||
|
||||
|
@ -16,10 +16,11 @@ Need to test the following scenarios to see how author order is affected:
|
||||
ORCIDs only
|
||||
ORCIDs plus normal authors
|
||||
|
||||
|
||||
I exported a random item’s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry:
|
||||
|
||||
|
||||
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
|
||||
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-10/" />
|
||||
@ -38,12 +39,13 @@ Need to test the following scenarios to see how author order is affected:
|
||||
ORCIDs only
|
||||
ORCIDs plus normal authors
|
||||
|
||||
|
||||
I exported a random item’s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry:
|
||||
|
||||
|
||||
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -132,11 +134,12 @@ I exported a random item’s metadata as CSV, deleted all columns except id
|
||||
<li>ORCIDs only</li>
|
||||
<li>ORCIDs plus normal authors</li>
|
||||
</ul></li>
|
||||
<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>

<li><p>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
|
||||
|
||||
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
|
||||
</code></pre>
|
||||
</code></pre></li>
</ul>
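<p>So the test CSV ends up with just three columns, roughly like this (the id and collection values here are placeholders, not the real item):</p>

<pre><code>id,collection,ORCID:dc.contributor.author
12345,10568/5472,0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>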
<ul>
|
||||
<li>Hmm, with the <code>dc.contributor.author</code> column removed, DSpace doesn’t detect any changes</li>
|
||||
@ -161,21 +164,22 @@ I exported a random item’s metadata as CSV, deleted all columns except id
|
||||
<li>Find invalid characters</li>
|
||||
<li>Cluster values to merge obvious authors</li>
|
||||
</ul></li>
|
||||
<li>That left us with 3,180 valid corrections and 3 deletions:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>That left us with 3,180 valid corrections and 3 deletions:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i authors-fix-3180.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
|
||||
$ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -m 3 -d dspacetest -u dspacetest -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Remove old about page (<a href="https://github.com/ilri/DSpace/pull/284">#284</a>)</li>
|
||||
<li>CGSpace crashed a few times today</li>
|
||||
<li>Generate list of unique authors in CCAFS collections:</li>
|
||||
</ul>
|
||||
<li><p>Remove old about page (<a href="https://github.com/ilri/DSpace/pull/284">#284</a>)</p></li>
|
||||
|
||||
<li><p>CGSpace crashed a few times today</p></li>
|
||||
|
||||
<li><p>Generate list of unique authors in CCAFS collections:</p>
|
||||
|
||||
<pre><code>dspacetest=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/32729', '10568/5472', '10568/5473', '10568/10288', '10568/70974', '10568/3547', '10568/3549', '10568/3531','10568/16890','10568/5470','10568/3546', '10568/36024', '10568/66581', '10568/21789', '10568/5469', '10568/5468', '10568/3548', '10568/71053', '10568/25167'))) group by text_value order by count desc) to /tmp/ccafs-authors.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-10-05">2016-10-05</h2>
|
||||
|
||||
@ -203,24 +207,22 @@ $ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -
|
||||
|
||||
<ul>
|
||||
<li>Re-deploy CGSpace with latest changes from late September and early October</li>
|
||||
<li>Run fixes for ILRI subjects and delete blank metadata values:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Run fixes for ILRI subjects and delete blank metadata values:</p>
|
||||
|
||||
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
|
||||
DELETE 11
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run all system updates and reboot CGSpace</li>
|
||||
<li>Delete ten gigs of old 2015 Tomcat logs that never got rotated (WTF?):</li>
|
||||
</ul>
|
||||
<li><p>Run all system updates and reboot CGSpace</p></li>
|
||||
|
||||
<li><p>Delete ten gigs of old 2015 Tomcat logs that never got rotated (WTF?):</p>
|
||||
|
||||
<pre><code>root@linode01:~# ls -lh /var/log/tomcat7/localhost_access_log.2015* | wc -l
|
||||
47
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Delete 2GB <code>cron-filter-media.log</code> file, as it is just a log from a cron job and it doesn’t get rotated like normal log files (almost a year now maybe)</li>
|
||||
<li><p>Delete 2GB <code>cron-filter-media.log</code> file, as it is just a log from a cron job and it doesn’t get rotated like normal log files (almost a year now maybe)</p></li>
</ul>
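<p>Cleaning those up is just a matter of removing the stale access logs by hand (sketch, using the same path as the <code>ls</code> above):</p>

<pre><code>root@linode01:~# rm -v /var/log/tomcat7/localhost_access_log.2015*
</code></pre>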
<h2 id="2016-10-14">2016-10-14</h2>
|
||||
@ -234,34 +236,34 @@ DELETE 11
|
||||
<h2 id="2016-10-17">2016-10-17</h2>
|
||||
|
||||
<ul>
|
||||
<li>A bit more cleanup on the CCAFS authors, and run the corrections on DSpace Test:</li>
|
||||
</ul>
|
||||
<li><p>A bit more cleanup on the CCAFS authors, and run the corrections on DSpace Test:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i ccafs-authors-oct-16.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>One observation is that there are still some old versions of names in the author lookup because authors appear in other communities (as we only corrected authors from CCAFS for this round)</li>
|
||||
<li><p>One observation is that there are still some old versions of names in the author lookup because authors appear in other communities (as we only corrected authors from CCAFS for this round)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-10-18">2016-10-18</h2>
|
||||
|
||||
<ul>
|
||||
<li>Start working on DSpace 5.5 porting work again:
|
||||
<br /></li>
|
||||
</ul>
|
||||
<li><p>Start working on DSpace 5.5 porting work again:</p>
|
||||
|
||||
<pre><code>$ git checkout -b 5_x-55 5_x-prod
|
||||
$ git rebase -i dspace-5.5
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Have to fix about ten merge conflicts, mostly in the SCSS for the CGIAR theme</li>
|
||||
<li>Skip 1e34751b8cf17021f45d4cf2b9a5800c93fb4cb2 in lieu of upstream’s 55e623d1c2b8b7b1fa45db6728e172e06bfa8598 (fixes X-Forwarded-For header) because I had made the same fix myself and it’s better to use the upstream one</li>
|
||||
<li>I notice this rebase gets rid of GitHub merge commits… which actually might be fine because merges are fucking annoying to deal with when remote people merge without pulling and rebasing their branch first</li>
|
||||
<li>Finished up applying the 5.5 sitemap changes to all themes</li>
|
||||
<li>Merge the <code>discovery.xml</code> cleanups (<a href="https://github.com/ilri/DSpace/pull/278">#278</a>)</li>
|
||||
<li>Merge some minor edits to the distribution license (<a href="https://github.com/ilri/DSpace/pull/285">#285</a>)</li>
|
||||
<li><p>Have to fix about ten merge conflicts, mostly in the SCSS for the CGIAR theme</p></li>
|
||||
|
||||
<li><p>Skip 1e34751b8cf17021f45d4cf2b9a5800c93fb4cb2 in lieu of upstream’s 55e623d1c2b8b7b1fa45db6728e172e06bfa8598 (fixes X-Forwarded-For header) because I had made the same fix myself and it’s better to use the upstream one</p></li>
|
||||
|
||||
<li><p>I notice this rebase gets rid of GitHub merge commits… which actually might be fine because merges are fucking annoying to deal with when remote people merge without pulling and rebasing their branch first</p></li>
|
||||
|
||||
<li><p>Finished up applying the 5.5 sitemap changes to all themes</p></li>
|
||||
|
||||
<li><p>Merge the <code>discovery.xml</code> cleanups (<a href="https://github.com/ilri/DSpace/pull/278">#278</a>)</p></li>
|
||||
|
||||
<li><p>Merge some minor edits to the distribution license (<a href="https://github.com/ilri/DSpace/pull/285">#285</a>)</p></li>
|
|
|
||||
@ -286,38 +288,31 @@ $ git rebase -i dspace-5.5
|
||||
<h2 id="2016-10-25">2016-10-25</h2>
|
||||
|
||||
<ul>
|
||||
<li>Move the LIVES community from the top level to the ILRI projects community</li>
|
||||
</ul>
|
||||
<li><p>Move the LIVES community from the top level to the ILRI projects community</p>
|
||||
|
||||
<pre><code>$ /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child=10568/25101
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Start testing some things for DSpace 5.5, like command line metadata import, PDF media filter, and Atmire CUA</li>
|
||||
<li>Start looking at batch fixing of “old” ILRI website links without www or https, for example:</li>
|
||||
</ul>
|
||||
<li><p>Start testing some things for DSpace 5.5, like command line metadata import, PDF media filter, and Atmire CUA</p></li>
|
||||
|
||||
<li><p>Start looking at batch fixing of “old” ILRI website links without www or https, for example:</p>
|
||||
|
||||
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ilri.org%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also CCAFS has HTTPS and their links should use it where possible:</li>
|
||||
</ul>
|
||||
<li><p>Also CCAFS has HTTPS and their links should use it where possible:</p>
|
||||
|
||||
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ccafs.cgiar.org%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And this will find community and collection HTML text that is using the old style PNG/JPG icons for RSS and email (we should be using Font Awesome icons instead):</li>
|
||||
</ul>
|
||||
<li><p>And this will find community and collection HTML text that is using the old style PNG/JPG icons for RSS and email (we should be using Font Awesome icons instead):</p>
|
||||
|
||||
<pre><code>dspace=# select text_value from metadatavalue where resource_type_id in (3,4) and text_value like '%Iconrss2.png%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Turns out there are shit tons of varieties of this, like with http, https, www, separate <code></img></code> tags, alignments, etc</li>
|
||||
<li>Had to find all variations and replace them individually:</li>
|
||||
</ul>
|
||||
<li><p>Turns out there are shit tons of varieties of this, like with http, https, www, separate <code></img></code> tags, alignments, etc</p></li>
|
||||
|
||||
<li><p>Had to find all variations and replace them individually:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://www.ilri.org/images/Iconrss2.png"/>','<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://www.ilri.org/images/Iconrss2.png"/>%';
|
||||
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://www.ilri.org/images/email.jpg"/>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://www.ilri.org/images/email.jpg"/>%';
|
||||
@ -335,19 +330,19 @@ dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<i
|
||||
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img valign="center" align="left" src="https://www.ilri.org/images/email.jpg"/>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img valign="center" align="left" src="https://www.ilri.org/images/email.jpg"/>%';
|
||||
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img valign="center" align="left" src="http://www.ilri.org/images/Iconrss2.png"/>', '<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img valign="center" align="left" src="http://www.ilri.org/images/Iconrss2.png"/>%';
|
||||
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img valign="center" align="left" src="http://www.ilri.org/images/email.jpg"/>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img valign="center" align="left" src="http://www.ilri.org/images/email.jpg"/>%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Getting rid of these reduces the number of network requests each client makes on community/collection pages, and makes use of Font Awesome icons (which they are already loading anyways!)</li>
|
||||
<li>And now that I start looking, I want to fix a bunch of links to popular sites that should be using HTTPS, like Twitter, Facebook, Google, Feed Burner, DOI, etc</li>
|
||||
<li>I should look to see if any of those domains is sending an HTTP 301 or setting HSTS headers to their HTTPS domains, then just replace them</li>
|
||||
<li><p>Getting rid of these reduces the number of network requests each client makes on community/collection pages, and makes use of Font Awesome icons (which they are already loading anyways!)</p></li>
|
||||
|
||||
<li><p>And now that I start looking, I want to fix a bunch of links to popular sites that should be using HTTPS, like Twitter, Facebook, Google, Feed Burner, DOI, etc</p></li>
|
||||
|
||||
<li><p>I should look to see if any of those domains is sending an HTTP 301 or setting HSTS headers to their HTTPS domains, then just replace them</p></li>
</ul>
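<p>Checking whether a domain redirects to HTTPS or sets HSTS is easy enough with curl (a sketch; twitter.com is just one example of the domains mentioned above):</p>

<pre><code>$ curl -sI http://twitter.com/ | grep -E '^(HTTP|Location|Strict-Transport-Security)'
</code></pre>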
<h2 id="2016-10-27">2016-10-27</h2>
|
||||
|
||||
<ul>
|
||||
<li>Run Font Awesome fixes on DSpace Test:</li>
|
||||
</ul>
|
||||
<li><p>Run Font Awesome fixes on DSpace Test:</p>
|
||||
|
||||
<pre><code>dspace=# \i /tmp/font-awesome-text-replace.sql
|
||||
UPDATE 17
|
||||
@ -367,10 +362,9 @@ UPDATE 1
|
||||
UPDATE 1
|
||||
UPDATE 1
|
||||
UPDATE 0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looks much better now:</li>
|
||||
<li><p>Looks much better now:</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2016/10/cgspace-icons.png" alt="CGSpace with old icons" />
|
||||
@ -383,53 +377,47 @@ UPDATE 0
|
||||
<h2 id="2016-10-30">2016-10-30</h2>
|
||||
|
||||
<ul>
|
||||
<li>Fix some messed up authors on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>Fix some messed up authors on CGSpace:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set authority='799da1d8-22f3-43f5-8233-3d2ef5ebf8a8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Charleston, B.%';
|
||||
UPDATE 10
|
||||
dspace=# update metadatavalue set authority='e936f5c5-343d-4c46-aa91-7a1fff6277ed', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Knight-Jones%';
|
||||
UPDATE 36
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I updated the authority index but nothing seemed to change, so I’ll wait and do it again after I update Discovery below</li>
|
||||
<li>Skype chat with Tsega about the <a href="https://github.com/ilri/ckm-cgspace-contentdm-bridge">IFPRI contentdm bridge</a></li>
|
||||
<li>We tested harvesting OAI in an example collection to see how it works</li>
|
||||
<li>Talk to Carlos Quiros about CG Core metadata in CGSpace</li>
|
||||
<li>Get a list of countries from CGSpace so I can do some batch corrections:</li>
|
||||
</ul>
|
||||
<li><p>I updated the authority index but nothing seemed to change, so I’ll wait and do it again after I update Discovery below</p></li>
|
||||
|
||||
<li><p>Skype chat with Tsega about the <a href="https://github.com/ilri/ckm-cgspace-contentdm-bridge">IFPRI contentdm bridge</a></p></li>
|
||||
|
||||
<li><p>We tested harvesting OAI in an example collection to see how it works</p></li>
|
||||
|
||||
<li><p>Talk to Carlos Quiros about CG Core metadata in CGSpace</p></li>
|
||||
|
||||
<li><p>Get a list of countries from CGSpace so I can do some batch corrections:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=228 group by text_value order by count desc) to /tmp/countries.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Fix a bunch of countries in Open Refine and run the corrections on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>Fix a bunch of countries in Open Refine and run the corrections on CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i countries-fix-18.csv -f dc.coverage.country -t 'correct' -m 228 -d dspace -u dspace -p fuuu
|
||||
$ ./delete-metadata-values.py -i countries-delete-2.csv -f dc.coverage.country -m 228 -d dspace -u dspace -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run a shit ton of author fixes from Peter Ballantyne that we’ve been cleaning up for two months:</li>
|
||||
</ul>
|
||||
<li><p>Run a shit ton of author fixes from Peter Ballantyne that we’ve been cleaning up for two months:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/authors-fix-pb2.csv -f dc.contributor.author -t correct -m 3 -u dspace -d dspace -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run a few URL corrections for ilri.org and doi.org, etc:</li>
|
||||
</ul>
|
||||
<li><p>Run a few URL corrections for ilri.org and doi.org, etc:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://www.ilri.org','https://www.ilri.org') where resource_type_id=2 and text_value like '%http://www.ilri.org%';
|
||||
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://mahider.ilri.org', 'https://cgspace.cgiar.org') where resource_type_id=2 and text_value like '%http://mahider.%.org%' and metadata_field_id not in (28);
|
||||
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://dx.doi.org', 'https://dx.doi.org') where resource_type_id=2 and text_value like '%http://dx.doi.org%' and metadata_field_id not in (18,26,28,111);
|
||||
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://doi.org', 'https://dx.doi.org') where resource_type_id=2 and text_value like '%http://doi.org%' and metadata_field_id not in (18,26,28,111);
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I skipped metadata fields like citation and description</li>
|
||||
<li><p>I skipped metadata fields like citation and description</p></li>
|
|
File diff suppressed because one or more lines are too long
@ -10,8 +10,8 @@
|
||||
|
||||
|
||||
CGSpace was down for five hours in the morning while I was sleeping
|
||||
While looking in the logs for errors, I see tons of warnings about Atmire MQM:
|
||||
|
||||
While looking in the logs for errors, I see tons of warnings about Atmire MQM:
|
||||
|
||||
2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
|
||||
@ -20,9 +20,10 @@ While looking in the logs for errors, I see tons of warnings about Atmire MQM:
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
|
||||
|
||||
|
||||
I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
|
||||
|
||||
I’ve raised a ticket with Atmire to ask
|
||||
|
||||
Another worrying error from dspace.log is:
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
@ -36,8 +37,8 @@ Another worrying error from dspace.log is:
|
||||
|
||||
|
||||
CGSpace was down for five hours in the morning while I was sleeping
|
||||
While looking in the logs for errors, I see tons of warnings about Atmire MQM:
|
||||
|
||||
While looking in the logs for errors, I see tons of warnings about Atmire MQM:
|
||||
|
||||
2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
|
||||
@ -46,12 +47,13 @@ While looking in the logs for errors, I see tons of warnings about Atmire MQM:
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
|
||||
|
||||
|
||||
I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade
|
||||
|
||||
I’ve raised a ticket with Atmire to ask
|
||||
|
||||
Another worrying error from dspace.log is:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -134,20 +136,21 @@ Another worrying error from dspace.log is:
|
||||
|
||||
<ul>
|
||||
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
|
||||
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p>
|
||||
|
||||
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
|
||||
<li>I’ve raised a ticket with Atmire to ask</li>
|
||||
<li>Another worrying error from dspace.log is:</li>
|
||||
<li><p>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</p></li>
|
||||
|
||||
<li><p>I’ve raised a ticket with Atmire to ask</p></li>
|
||||
|
||||
<li><p>Another worrying error from dspace.log is:</p></li>
|
||||
</ul>
|
||||
|
||||
<pre><code>org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
|
||||
@ -239,35 +242,35 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
|
||||
</code></pre>
|
||||
|
||||
<ul>
|
||||
<li>The first error I see in dspace.log this morning is:</li>
|
||||
</ul>
|
||||
<li><p>The first error I see in dspace.log this morning is:</p>
|
||||
|
||||
<pre><code>2016-12-02 03:00:46,656 ERROR org.dspace.authority.AuthorityValueFinder @ anonymous::Error while retrieving AuthorityValue from solr:query\colon; id\colon;"b0b541c1-ec15-48bf-9209-6dbe8e338cdc"
|
||||
org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8081/solr/authority
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking through DSpace’s solr log I see that about 20 seconds before this, there were a few 30+ KiB solr queries</li>
|
||||
<li>The last logs here right before Solr became unresponsive (and right after I restarted it five hours later) were:</li>
|
||||
</ul>
|
||||
<li><p>Looking through DSpace’s solr log I see that about 20 seconds before this, there were a few 30+ KiB solr queries</p></li>
|
||||
|
||||
<li><p>The last logs here right before Solr became unresponsive (and right after I restarted it five hours later) were:</p>
|
||||
|
||||
<pre><code>2016-12-02 03:00:42,606 INFO org.apache.solr.core.SolrCore @ [statistics] webapp=/solr path=/select params={q=containerItem:72828+AND+type:0&shards=localhost:8081/solr/statistics-2010,localhost:8081/solr/statistics&fq=-isInternal:true&fq=-(author_mtdt:"CGIAR\+Institutional\+Learning\+and\+Change\+Initiative"++AND+subject_mtdt:"PARTNERSHIPS"+AND+subject_mtdt:"RESEARCH"+AND+subject_mtdt:"AGRICULTURE"+AND+subject_mtdt:"DEVELOPMENT"++AND+iso_mtdt:"en"+)&rows=0&wt=javabin&version=2} hits=0 status=0 QTime=19
|
||||
2016-12-02 08:28:23,908 INFO org.apache.solr.servlet.SolrDispatchFilter @ SolrDispatchFilter.init()
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>DSpace’s own Solr logs don’t give IP addresses, so I will have to enable Nginx’s logging of <code>/solr</code> so I can see where this request came from</li>
|
||||
<li>I enabled logging of <code>/rest/</code> and I think I’ll leave it on for good</li>
|
||||
<li>Also, the disk is nearly full because of log file issues, so I’m running some compression on DSpace logs</li>
|
||||
<li>Normally these stay uncompressed for a month just in case we need to look at them, so now I’ve just compressed anything older than 2 weeks so we can get some disk space back</li>
|
||||
<li><p>DSpace’s own Solr logs don’t give IP addresses, so I will have to enable Nginx’s logging of <code>/solr</code> so I can see where this request came from</p></li>
|
||||
|
||||
<li><p>I enabled logging of <code>/rest/</code> and I think I’ll leave it on for good</p></li>
|
||||
|
||||
<li><p>Also, the disk is nearly full because of log file issues, so I’m running some compression on DSpace logs</p></li>
|
||||
|
||||
<li><p>Normally these stay uncompressed for a month just in case we need to look at them, so now I’ve just compressed anything older than 2 weeks so we can get some disk space back</p></li>
</ul>
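<p>Compressing the DSpace logs older than two weeks is a one-liner with find (a sketch; the filename pattern, age threshold, and choice of gzip are assumptions):</p>

<pre><code># find /home/cgspace.cgiar.org/log -name 'dspace.log.2016-*' ! -name '*.gz' -mtime +14 -exec gzip -v {} \;
</code></pre>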
<h2 id="2016-12-04">2016-12-04</h2>
|
||||
|
||||
<ul>
|
||||
<li>I got a weird report from the CGSpace checksum checker this morning</li>
|
||||
<li>It says 732 bitstreams have potential issues, for example:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It says 732 bitstreams have potential issues, for example:</p>
|
||||
|
||||
<pre><code>------------------------------------------------
|
||||
Bitstream Id = 6
|
||||
@ -286,14 +289,15 @@ Checksum Expected = 9959301aa4ca808d00957dff88214e38
|
||||
Checksum Calculated =
|
||||
Result = The bitstream could not be found
|
||||
-----------------------------------------------
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The first one seems ok, but I don’t know what to make of the second one…</li>
|
||||
<li>I had a look and there is indeed no file with the second checksum in the assetstore (ie, looking in <code>[dspace-dir]/assetstore/99/59/30/...</code>)</li>
|
||||
<li>For what it’s worth, there is no item on DSpace Test or S3 backups with that checksum either…</li>
|
||||
<li>In other news, I’m looking at JVM settings from the Solr 4.10.2 release, from <code>bin/solr.in.sh</code>:</li>
|
||||
</ul>
|
||||
<li><p>The first one seems ok, but I don’t know what to make of the second one…</p></li>
|
||||
|
||||
<li><p>I had a look and there is indeed no file with the second checksum in the assetstore (ie, looking in <code>[dspace-dir]/assetstore/99/59/30/...</code>)</p></li>
|
||||
|
||||
<li><p>For what it’s worth, there is no item on DSpace Test or S3 backups with that checksum either…</p></li>
|
||||
|
||||
<li><p>In other news, I’m looking at JVM settings from the Solr 4.10.2 release, from <code>bin/solr.in.sh</code>:</p>
|
||||
|
||||
<pre><code># These GC settings have shown to work well for a number of common Solr workloads
|
||||
GC_TUNE="-XX:-UseSuperWord \
|
||||
@ -314,11 +318,11 @@ GC_TUNE="-XX:-UseSuperWord \
|
||||
-XX:+CMSParallelRemarkEnabled \
|
||||
-XX:+ParallelRefProcEnabled \
|
||||
-XX:+AggressiveOpts"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I need to try these because they are recommended by the Solr project itself</li>
|
||||
<li>Also, as always, I need to read <a href="https://wiki.apache.org/solr/ShawnHeisey">Shawn Heisey’s wiki page on Solr</a></li>
|
||||
<li><p>I need to try these because they are recommended by the Solr project itself</p></li>
|
||||
|
||||
<li><p>Also, as always, I need to read <a href="https://wiki.apache.org/solr/ShawnHeisey">Shawn Heisey’s wiki page on Solr</a></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-12-05">2016-12-05</h2>
|
||||
@ -330,21 +334,19 @@ GC_TUNE="-XX:-UseSuperWord \
|
||||
<li>I did a few traceroutes from Jordan and Kenya and it seems that Linode’s Frankfurt datacenter is a few less hops and perhaps less packet loss than the London one, so I put the new server in Frankfurt</li>
|
||||
<li>Do initial provisioning</li>
|
||||
<li>Atmire responded about the MQM warnings in the DSpace logs</li>
|
||||
<li>Apparently we need to change the batch edit consumers in <code>dspace/config/dspace.cfg</code>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Apparently we need to change the batch edit consumers in <code>dspace/config/dspace.cfg</code>:</p>
|
||||
|
||||
<pre><code>event.consumer.batchedit.filters = Community|Collection+Create
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I haven’t tested it yet, but I created a pull request: <a href="https://github.com/ilri/DSpace/pull/289">#289</a></li>
|
||||
<li><p>I haven’t tested it yet, but I created a pull request: <a href="https://github.com/ilri/DSpace/pull/289">#289</a></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-12-06">2016-12-06</h2>
|
||||
|
||||
<ul>
|
||||
<li>Some author authority corrections and name standardizations for Peter:</li>
|
||||
</ul>
|
||||
<li><p>Some author authority corrections and name standardizations for Peter:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
|
||||
UPDATE 11
|
||||
@ -358,47 +360,55 @@ dspace=# update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183ac
|
||||
UPDATE 360
|
||||
dspace=# update metadatavalue set text_value='Grace, Delia', authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
|
||||
UPDATE 561
</code></pre></li>
<li><p>Pay attention to the regex to prevent false positives in tricky cases with Dutch names! (a preview query is sketched after this list)</p></li>
|
||||
|
||||
<li><p>I will run these updates on DSpace Test and then force a Discovery reindex, and then run them on CGSpace next week</p></li>
|
||||
|
||||
<li><p>More work on the KM4Dev Journal article</p></li>
|
||||
|
||||
<li><p>In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work</p></li>
|
||||
|
||||
<li><p>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</p></li>
|
||||
|
||||
<li><p>Paola from CCAFS mentioned she also has the “take task” bug on CGSpace</p></li>
|
||||
|
||||
<li><p>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</p></li>
|
||||
|
||||
<li><p>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</p></li>
|
||||
|
||||
<li><p>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn’t dedicated (also runs Solr, which can benefit from OS cache) so let’s try 1024MB</p></li>
|
||||
|
||||
<li><p>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</p>
|
||||
|
||||
<pre><code>$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
Retrieving all data
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
Exception: null
java.lang.NullPointerException
at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)

real 8m39.913s
user 1m54.190s
sys 0m22.647s
</code></pre></li>
|
||||
</ul>
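<ul>
<li><p>This is the kind of preview query I mean for the LIKE patterns above: a sketch using the same <code>metadatavalue</code> fields as the updates, run before the <code>UPDATE</code> so I can see exactly which name variants a pattern like <code>'Grace, D%'</code> would catch:</p>

<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
</code></pre></li>
</ul>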
|
||||
|
||||
<h2 id="2016-12-07">2016-12-07</h2>
|
||||
|
||||
@ -407,108 +417,107 @@ sys 0m22.647s
|
||||
<li>I will have to test more</li>
|
||||
<li>Anyways, I noticed that some of the authority values I set actually have versions of author names we don’t want, ie “Grace, D.”</li>
|
||||
<li>For example, do a Solr query for “first_name:Grace” and look at the results</li>
|
||||
<li><p>Querying that ID shows the fields that need to be changed:</p>
|
||||
|
||||
<pre><code>{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
      "indent": "true",
      "wt": "json",
      "_": "1481102189244"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
        "field": "dc_contributor_author",
        "value": "Grace, D.",
        "deleted": false,
        "creation_date": "2016-11-10T15:13:40.318Z",
        "last_modified_date": "2016-11-10T15:13:40.318Z",
        "authority_type": "person",
        "first_name": "D.",
        "last_name": "Grace"
      }
    ]
  }
}
</code></pre></li>
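<li><p>For reference, the kind of <code>select</code> I mean is sketched below, in the same style as the authority core query further down (the <code>first_name:Grace</code> search mentioned above, plus the follow-up lookup by that ID; I’m writing these from memory rather than pasting my shell history):</p>

<pre><code>$ curl 'localhost:8081/solr/authority/select?q=first_name%3AGrace&wt=json&indent=true'
$ curl 'localhost:8081/solr/authority/select?q=id%3A0b4fcbc1-d930-4319-9b4d-ea1553cca70b&wt=json&indent=true'
</code></pre></li>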
<li><p>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields…</p></li>
|
||||
|
||||
<li><p>The update syntax should be something like this, but I’m getting errors from Solr:</p>
|
||||
|
||||
<pre><code>$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
{
  "responseHeader":{
    "status":400,
    "QTime":0},
  "error":{
    "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
    "code":400}}
</code></pre></li>
<li><p>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</p></li>
|
||||
|
||||
<li><p>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
|
||||
UPDATE 561
</code></pre></li>
<li><p>Then I’ll reindex discovery and authority and see how the authority Solr core looks</p></li>
|
||||
|
||||
<li><p>After this, now there are authorities for some of the “Grace, D.” and “Grace, Delia” text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</p>
|
||||
|
||||
<pre><code>$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
      "indent":"true",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"18ea1525-2513-430a-8817-a834cd733fbc",
        "field":"dc_contributor_author",
        "value":"Grace, Delia",
        "deleted":false,
        "creation_date":"2016-12-07T10:54:34.356Z",
        "last_modified_date":"2016-12-07T10:54:34.356Z",
        "authority_type":"person",
        "first_name":"Delia",
        "last_name":"Grace"}]
  }}
</code></pre></li>
<li><p>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</p></li>
|
||||
|
||||
<li><p>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</p></li>
|
||||
|
||||
<li><p>Better to use:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
</code></pre></li>
<li><p>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</p></li>
|
||||
|
||||
<li><p>Perhaps another way is to just add our own UUID to the authority field for the text_value we like, then re-index authority so they get synced from PostgreSQL to Solr, then set the other text_values to use that authority ID</p></li>
|
||||
|
||||
<li><p>Deploy MQM WARN fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/289">#289</a>)</p></li>
|
||||
|
||||
<li><p>Deploy “take task” hack/fix on CGSpace (<a href="https://github.com/ilri/DSpace/pull/290">#290</a>)</p></li>
|
||||
|
||||
<li><p>I ran the following author corrections and then reindexed discovery:</p>
|
||||
|
||||
<pre><code>update metadatavalue set authority='b041f2f4-19e7-4113-b774-0439baabd197', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Mora Benard%';
|
||||
update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-6fd5-4b43-9363-58d18e7952c9', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Hoek, R%';
|
||||
@ -516,68 +525,63 @@ update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-
|
||||
update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
|
||||
update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
|
||||
update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-12-08">2016-12-08</h2>
|
||||
|
||||
<ul>
<li><p>Something weird happened and Peter Thorne’s names all ended up as “Thorne”, I guess because the original authority had that as its name value:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne%';
    text_value    |              authority               | confidence
------------------+--------------------------------------+------------
 Thorne, P.J.     | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
 Thorne           | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
 Thorne-Lyman, A. | 0781e13a-1dc8-4e3f-82e8-5c422b44a344 |         -1
 Thorne, M. D.    | 54c52649-cefd-438d-893f-3bcef3702f07 |         -1
 Thorne, P.J      | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
 Thorne, P.       | 18349f29-61b1-44d7-ac60-89e55546e812 |        600
(6 rows)
</code></pre></li>
<li><p>I generated a new UUID using <code>uuidgen | tr [A-Z] [a-z]</code> and set it along with correct name variation for all records:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set authority='b2f7603d-2fb5-4018-923a-c4ec8d85b3bb', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812';
|
||||
UPDATE 43
</code></pre></li>
<li><p>Apparently we also need to normalize Phil Thornton’s names to <code>Thornton, Philip K.</code>:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
     text_value      |              authority               | confidence
---------------------+--------------------------------------+------------
 Thornton, P         | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P K.      | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P K       | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton. P.K.      | 3e1e6639-d4fb-449e-9fce-ce06b5b0f702 |         -1
 Thornton, P K .     | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P.K.      | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P.K       | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, Philip K  | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, Philip K. | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
 Thornton, P. K.     | 0d8369bb-57f7-4b2f-92aa-af820b183aca |        600
(10 rows)
</code></pre></li>
<li><p>Seems his original authorities are using an incorrect version of the name so I need to generate another UUID and tie it to the correct name, then reindex:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
|
||||
UPDATE 362
</code></pre></li>
<li><p>It seems that, when you are messing with authority and author text values in the database, it is better to run authority reindex first (postgres→solr authority core) and then Discovery reindex (postgres→solr Discovery core), as sketched after this list</p></li>
|
||||
|
||||
<li><p>Everything looks ok after authority and discovery reindex</p></li>
|
||||
|
||||
<li><p>In other news, I think we should really be using more RAM for PostgreSQL’s <code>shared_buffers</code></p></li>
|
||||
|
||||
<li><p>The <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html">PostgreSQL documentation</a> recommends using 25% of the system’s RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and also benefit from some RAM being used by the OS cache</p></li>
|
||||
</ul>
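<ul>
<li><p>Concretely, the order I mean is roughly this (the <code>index-authority</code> call is the same one I used above; the <code>index-discovery -b</code> full rebuild is from memory, so treat the exact flags as a sketch):</p>

<pre><code>$ [dspace]/bin/dspace index-authority
$ [dspace]/bin/dspace index-discovery -b
</code></pre></li>
</ul>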
|
||||
|
||||
<h2 id="2016-12-09">2016-12-09</h2>
|
||||
@ -585,15 +589,14 @@ UPDATE 362
|
||||
<ul>
|
||||
<li>More work on finishing rough draft of KM4Dev article</li>
|
||||
<li>Set PostgreSQL’s <code>shared_buffers</code> on CGSpace to 10% of system RAM (1200MB) (see the sketch after this list)</li>
|
||||
<li><p>Run the following author corrections on CGSpace:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set authority='34df639a-42d8-4867-a3f2-1892075fcb3f', text_value='Thorne, P.J.' where resource_type_id=2 and metadata_field_id=3 and authority='18349f29-61b1-44d7-ac60-89e55546e812' or authority='021cd183-946b-42bb-964e-522ebff02993';
|
||||
dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab764', text_value='Thornton, Philip K.', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^Thornton[,\.]? P.*';
</code></pre></li>
<li><p>The authority IDs were different now than when I was looking a few days ago so I had to adjust them here</p></li>
|
||||
</ul>
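<ul>
<li><p>For the record, a rough sketch of checking and applying the <code>shared_buffers</code> change by hand (the config path assumes a stock Ubuntu PostgreSQL 9.5 install and is only for illustration; the value needs a restart to take effect):</p>

<pre><code>$ psql -c 'show shared_buffers;'
$ sudo vim /etc/postgresql/9.5/main/postgresql.conf   # shared_buffers = 1200MB
$ sudo systemctl restart postgresql
</code></pre></li>
</ul>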
|
||||
|
||||
<h2 id="2016-12-11">2016-12-11</h2>
|
||||
@ -606,40 +609,38 @@ dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab76
|
||||
<img src="/cgspace-notes/2016/12/postgres_connections_ALL-week.png" alt="postgres_connections_ALL-week" /></p>
|
||||
|
||||
<ul>
<li><p>Looking at CIAT records from last week again, they have a lot of double authors like:</p>
|
||||
|
||||
<pre><code>International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
|
||||
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
|
||||
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
</code></pre></li>
<li><p>Some in the same <code>dc.contributor.author</code> field, and some in others like <code>dc.contributor.author[en_US]</code> etc</p></li>
|
||||
|
||||
<li><p>Removing the duplicates in OpenRefine and uploading a CSV to DSpace says “no changes detected”</p></li>
|
||||
|
||||
<li><p>Seems like the only way to sort of clean these up would be to start in SQL:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Center for Tropical Agriculture';
                   text_value                  |              authority               | confidence
-----------------------------------------------+--------------------------------------+------------
 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |         -1
 International Center for Tropical Agriculture |                                      |        600
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |        500
 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |        600
 International Center for Tropical Agriculture |                                      |         -1
 International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 |        500
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |        600
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |         -1
 International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 |          0
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
UPDATE 1693
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', text_value='International Center for Tropical Agriculture', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%CIAT%';
UPDATE 35
</code></pre></li>
<li><p>Work on article for KM4Dev journal</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-12-13">2016-12-13</h2>
|
||||
@ -659,31 +660,34 @@ UPDATE 35
|
||||
<li>Would probably be better to make custom logrotate files for them in the future</li>
|
||||
<li>Clean up some unneeded log files from 2014 (they weren’t large, just don’t need them)</li>
|
||||
<li>So basically, new cron jobs for logs should look something like this:</li>
<li><p>Find any file named <code>*.log*</code> that isn’t <code>dspace.log*</code>, isn’t already zipped, and is older than one day, and zip it:</p>
|
||||
|
||||
<pre><code># find /home/dspacetest.cgiar.org/log -regextype posix-extended -iregex ".*\.log.*" ! -iregex ".*dspace\.log.*" ! -iregex ".*\.(gz|lrz|lzo|xz)" ! -newermt "Yesterday" -exec schedtool -B -e ionice -c2 -n7 xz {} \;
</code></pre></li>
<li><p>Since there is <code>xzgrep</code> and <code>xzless</code> we can actually just zip them after one day, why not?!</p></li>
|
||||
|
||||
<li><p>We can keep the zipped ones for two weeks just in case we need to look for errors, etc, and delete them after that (a cleanup sketch follows the <code>schedtool</code> example below)</p></li>
|
||||
|
||||
<li><p>I use <code>schedtool -B</code> and <code>ionice -c2 -n7</code> to set the CPU scheduling to <code>SCHED_BATCH</code> and the IO to best effort which should, in theory, impact important system processes like Tomcat and PostgreSQL less</p></li>
|
||||
|
||||
<li><p>When the tasks are running you can see that the policies do apply:</p>
|
||||
|
||||
<pre><code>$ schedtool $(ps aux | grep "xz /home" | grep -v grep | awk '{print $2}') && ionice -p $(ps aux | grep "xz /home" | grep -v grep | awk '{print $2}')
|
||||
PID 17049: PRIO 0, POLICY B: SCHED_BATCH , NICE 0, AFFINITY 0xf
|
||||
best-effort: prio 7
|
||||
</code></pre>
|
||||
</code></pre></li>
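<li><p>And a sketch of the matching cleanup job for the two-week retention mentioned above (same log directory as the compression command; only the <code>-mtime</code> threshold is new here):</p>

<pre><code># find /home/dspacetest.cgiar.org/log -regextype posix-extended -iregex ".*\.log.*\.xz" -mtime +14 -delete
</code></pre></li>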
<li><p>All in all this should free up a few gigs (we were at 9.3GB free when I started)</p></li>
|
||||
|
||||
<li><p>Next thing to look at is whether we need Tomcat’s access logs</p></li>
|
||||
|
||||
<li><p>I just looked and it seems that we saved 10GB by zipping these logs (the quick checks I mean are sketched after this list)</p></li>
|
||||
|
||||
<li><p>Some users pointed out issues with the “most popular” stats on a community or collection</p></li>
|
||||
|
||||
<li><p>This error appears in the logs when you try to view them:</p>
|
||||
|
||||
<pre><code>2016-12-13 21:17:37,486 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
|
||||
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceObjectDatasetGenerator.toDatasetQuery(Lorg/dspace/core/Context;)Lcom/atmire/statistics/content/DatasetQuery;
|
||||
@ -735,11 +739,11 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
|
||||
at com.atmire.statistics.mostpopular.JSONStatsMostPopularGenerator.generate(SourceFile:246)
|
||||
at com.atmire.app.xmlui.aspect.statistics.JSONStatsMostPopular.generate(JSONStatsMostPopular.java:145)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
</code></pre></li>
<li><p>It happens on development and production, so I will have to ask Atmire</p></li>
|
||||
|
||||
<li><p>Most likely an issue with installation/configuration</p></li>
|
||||
</ul>
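<ul>
<li><p>The before/after checks for the disk space numbers above are nothing fancy, just something like this (the Tomcat log path is a guess for illustration):</p>

<pre><code>$ df -h /
$ du -sh /home/dspacetest.cgiar.org/log /var/log/tomcat7
</code></pre></li>
</ul>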
|
||||
|
||||
<h2 id="2016-12-14">2016-12-14</h2>
|
||||
@ -777,8 +781,8 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
|
||||
<li>Last week, when we asked CGNET to update the DNS records this weekend, they misunderstood and did it immediately</li>
|
||||
<li>We quickly told them to undo it, but I just realized they didn’t undo the IPv6 AAAA record!</li>
|
||||
<li>None of our users in African institutes will have IPv6, but some Europeans might, so I need to check if any submissions have been added since then</li>
<li><p>Update some names and authorities in the database:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set authority='5ff35043-942e-4d0a-b377-4daed6e3c1a3', confidence=600, text_value='Duncan, Alan' where resource_type_id=2 and metadata_field_id=3 and text_value ~ '^.*Duncan,? A.*';
|
||||
UPDATE 204
|
||||
@ -786,15 +790,17 @@ dspace=# update metadatavalue set authority='46804b53-ea30-4a85-9ccf-b79a35816fa
|
||||
UPDATE 89
|
||||
dspace=# update metadatavalue set authority='f840da02-26e7-4a74-b7ba-3e2b723f3684', confidence=600, text_value='Lukuyu, Ben A.' where resource_type_id=2 and metadata_field_id=3 and text_value like '%Lukuyu, B%';
|
||||
UPDATE 140
</code></pre></li>
<li><p>Generated a new UUID for Ben using <code>uuidgen | tr [A-Z] [a-z]</code> as the one in Solr had his ORCID but the name format was incorrect</p></li>
|
||||
|
||||
<li><p>In theory DSpace should be able to check names from ORCID and update the records in the database, but I find that this doesn’t work (see Jira bug <a href="https://jira.duraspace.org/browse/DS-3302">DS-3302</a>)</p></li>
|
||||
|
||||
<li><p>I need to run these updates along with the other one for CIAT that I found last week</p></li>
|
||||
|
||||
<li><p>Enable OCSP stapling for hosts >= Ubuntu 16.04 in our Ansible playbooks (<a href="https://github.com/ilri/rmg-ansible-public/pull/76">#76</a>)</p></li>
|
||||
|
||||
<li><p>Working for DSpace Test on the second response:</p>
|
||||
|
||||
<pre><code>$ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgiar.org -tls1_2 -tlsextdebug -status
|
||||
...
|
||||
@ -803,21 +809,18 @@ $ openssl s_client -connect dspacetest.cgiar.org:443 -servername dspacetest.cgia
|
||||
...
|
||||
OCSP Response Data:
|
||||
...
Cert Status: good
</code></pre></li>
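<li><p>For context, the nginx side of OCSP stapling boils down to directives like these (a sketch only; the certificate path and resolver here are placeholders, not necessarily what the Ansible template uses):</p>

<pre><code>ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/letsencrypt/live/dspacetest.cgiar.org/chain.pem;
resolver 8.8.8.8;
</code></pre></li>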
<li><p>Migrate CGSpace to new server, roughly following these steps:</p></li>
|
||||
|
||||
<li><p>On old server:</p>
|
||||
|
||||
<pre><code># service tomcat7 stop
|
||||
# /home/backup/scripts/postgres_backup.sh
</code></pre></li>
<li><p>On new server:</p>
|
||||
|
||||
<pre><code># systemctl stop tomcat7
|
||||
# rsync -4 -av --delete 178.79.187.182:/home/cgspace.cgiar.org/assetstore/ /home/cgspace.cgiar.org/assetstore/
|
||||
@ -843,10 +846,9 @@ $ cd src/git/DSpace/dspace/target/dspace-installer
|
||||
$ ant update clean_backups
|
||||
$ exit
|
||||
# systemctl start tomcat7
</code></pre></li>
<li><p>It took about twenty minutes and afterwards I had to check a few things, like:</p>
|
||||
|
||||
<ul>
|
||||
<li>check and enable systemd timer for let’s encrypt</li>
|
||||
@ -862,11 +864,12 @@ $ exit
|
||||
|
||||
<ul>
|
||||
<li>Abenet wanted a CSV of the IITA community, but the web export doesn’t include the <code>dc.date.accessioned</code> field</li>
<li><p>I had to export it from the command line using the <code>-a</code> flag:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2016-12-28">2016-12-28</h2>
|
||||
|
||||
|
@ -27,7 +27,7 @@ I checked to see if the Solr sharding task that is supposed to run on January 1s
|
||||
I tested on DSpace Test as well and it doesn’t work there either
|
||||
I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -117,69 +117,65 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
|
||||
<h2 id="2017-01-04">2017-01-04</h2>
|
||||
|
||||
<ul>
|
||||
<li><p>I tried to shard my local dev instance and it fails the same way:</p>
|
||||
|
||||
<pre><code>$ JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" ~/dspace/bin/dspace stats-util -s
Moving: 9318 into core statistics-2016
Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2016
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2016
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.dspace.statistics.SolrLogger.shardSolrIndex(SourceFile:2291)
at org.dspace.statistics.util.StatisticsClient.main(StatisticsClient.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
Caused by: org.apache.http.client.ClientProtocolException
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:867)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
... 10 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:659)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
... 14 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:181)
at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:124)
at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:181)
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:132)
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
... 16 more
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><p>And the DSpace log shows:</p>
|
||||
|
||||
<pre><code>2017-01-04 22:39:05,412 INFO org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
|
||||
2017-01-04 22:39:05,412 INFO org.dspace.statistics.SolrLogger @ Moving: 9318 records into core statistics-2016
|
||||
2017-01-04 22:39:07,310 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ I/O exception (java.net.SocketException) caught when processing request to {}->http://localhost:8081: Broken pipe (Write failed)
|
||||
2017-01-04 22:39:07,310 INFO org.apache.http.impl.client.SystemDefaultHttpClient @ Retrying request to {}->http://localhost:8081
</code></pre></li>
<li><p>Despite failing instantly, a <code>statistics-2016</code> directory was created, but it only has a data dir (no conf)</p></li>
|
||||
|
||||
<li><p>The Tomcat access logs show more:</p>
|
||||
|
||||
<pre><code>127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] "GET /solr/statistics/select?q=type%3A2+AND+id%3A1&wt=javabin&version=2 HTTP/1.1" 200 107
|
||||
127.0.0.1 - - [04/Jan/2017:22:39:05 +0200] "GET /solr/statistics/select?q=*%3A*&rows=0&facet=true&facet.range=time&facet.range.start=NOW%2FYEAR-17YEARS&facet.range.end=NOW%2FYEAR%2B0YEARS&facet.range.gap=%2B1YEAR&facet.mincount=1&wt=javabin&version=2 HTTP/1.1" 200 423
|
||||
@ -190,10 +186,9 @@ Caused by: java.net.SocketException: Broken pipe (Write failed)
|
||||
127.0.0.1 - - [04/Jan/2017:22:39:07 +0200] "POST /solr//statistics-2016/update/csv?commit=true&softCommit=false&waitSearcher=true&f.previousWorkflowStep.split=true&f.previousWorkflowStep.separator=%7C&f.previousWorkflowStep.encapsulator=%22&f.actingGroupId.split=true&f.actingGroupId.separator=%7C&f.actingGroupId.encapsulator=%22&f.containerCommunity.split=true&f.containerCommunity.separator=%7C&f.containerCommunity.encapsulator=%22&f.range.split=true&f.range.separator=%7C&f.range.encapsulator=%22&f.containerItem.split=true&f.containerItem.separator=%7C&f.containerItem.encapsulator=%22&f.p_communities_map.split=true&f.p_communities_map.separator=%7C&f.p_communities_map.encapsulator=%22&f.ngram_query_search.split=true&f.ngram_query_search.separator=%7C&f.ngram_query_search.encapsulator=%22&f.containerBitstream.split=true&f.containerBitstream.separator=%7C&f.containerBitstream.encapsulator=%22&f.owningItem.split=true&f.owningItem.separator=%7C&f.owningItem.encapsulator=%22&f.actingGroupParentId.split=true&f.actingGroupParentId.separator=%7C&f.actingGroupParentId.encapsulator=%22&f.text.split=true&f.text.separator=%7C&f.text.encapsulator=%22&f.simple_query_search.split=true&f.simple_query_search.separator=%7C&f.simple_query_search.encapsulator=%22&f.owningComm.split=true&f.owningComm.separator=%7C&f.owningComm.encapsulator=%22&f.owner.split=true&f.owner.separator=%7C&f.owner.encapsulator=%22&f.filterquery.split=true&f.filterquery.separator=%7C&f.filterquery.encapsulator=%22&f.p_group_map.split=true&f.p_group_map.separator=%7C&f.p_group_map.encapsulator=%22&f.actorMemberGroupId.split=true&f.actorMemberGroupId.separator=%7C&f.actorMemberGroupId.encapsulator=%22&f.bitstreamId.split=true&f.bitstreamId.separator=%7C&f.bitstreamId.encapsulator=%22&f.group_name.split=true&f.group_name.separator=%7C&f.group_name.encapsulator=%22&f.p_communities_name.split=true&f.p_communities_name.separator=%7C&f.p_communities_name.encapsulator=%22&f.query.split=true&f.query.separator=%7C&f.query.encapsulator=%22&f.workflowStep.split=true&f.workflowStep.separator=%7C&f.workflowStep.encapsulator=%22&f.containerCollection.split=true&f.containerCollection.separator=%7C&f.containerCollection.encapsulator=%22&f.complete_query_search.split=true&f.complete_query_search.separator=%7C&f.complete_query_search.encapsulator=%22&f.p_communities_id.split=true&f.p_communities_id.separator=%7C&f.p_communities_id.encapsulator=%22&f.rangeDescription.split=true&f.rangeDescription.separator=%7C&f.rangeDescription.encapsulator=%22&f.group_id.split=true&f.group_id.separator=%7C&f.group_id.encapsulator=%22&f.bundleName.split=true&f.bundleName.separator=%7C&f.bundleName.encapsulator=%22&f.ngram_simplequery_search.split=true&f.ngram_simplequery_search.separator=%7C&f.ngram_simplequery_search.encapsulator=%22&f.group_map.split=true&f.group_map.separator=%7C&f.group_map.encapsulator=%22&f.owningColl.split=true&f.owningColl.separator=%7C&f.owningColl.encapsulator=%22&f.p_group_id.split=true&f.p_group_id.separator=%7C&f.p_group_id.encapsulator=%22&f.p_group_name.split=true&f.p_group_name.separator=%7C&f.p_group_name.encapsulator=%22&wt=javabin&version=2 HTTP/1.1" 409 156
|
||||
127.0.0.1 - - [04/Jan/2017:22:44:00 +0200] "POST /solr/datatables/update?wt=javabin&version=2 HTTP/1.1" 200 41
|
||||
127.0.0.1 - - [04/Jan/2017:22:44:00 +0200] "POST /solr/datatables/update HTTP/1.1" 200 40
</code></pre></li>
<li><p>Very interesting… it creates the core and then fails somehow</p></li>
|
||||
</ul>
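<ul>
<li><p>Something to try next time it fails: ask Solr what it thinks of the half-created core via the cores admin API (a sketch; I haven’t actually run this against the broken <code>statistics-2016</code> core yet):</p>

<pre><code>$ curl 'http://localhost:8081/solr/admin/cores?action=STATUS&core=statistics-2016&wt=json&indent=true'
</code></pre></li>
</ul>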
|
||||
|
||||
<h2 id="2017-01-08">2017-01-08</h2>
|
||||
@ -217,65 +212,61 @@ Caused by: java.net.SocketException: Broken pipe (Write failed)
|
||||
<li>I tried to clean up the duplicate mappings by exporting the item’s metadata to CSV, editing, and re-importing, but DSpace said “no changes were detected”</li>
|
||||
<li>I’ve asked on the dspace-tech mailing list to see if anyone can help</li>
|
||||
<li>I found an old post on the mailing list discussing a similar issue, and listing some SQL commands that might help</li>
|
||||
<li><p>For example, this shows 186 mappings for the item, the first three of which are real:</p>
|
||||
|
||||
<pre><code>dspace=# select * from collection2item where item_id = '80596';
</code></pre></li>
<li><p>Then I deleted the others:</p>
|
||||
|
||||
<pre><code>dspace=# delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
</code></pre></li>
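<li><p>A quick sanity check after the delete, sketched with the same table (I would expect it to report 3 mappings left for this item):</p>

<pre><code>dspace=# select count(*) from collection2item where item_id = '80596';
</code></pre></li>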
<li><p>And in the item view it now shows the correct mappings</p></li>
|
||||
|
||||
<li><p>I will have to ask the DSpace people if this is a valid approach</p></li>
|
||||
|
||||
<li><p>Finish looking at the Journal Title corrections of the top 500 Journal Titles so we can make a controlled vocabulary from it</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-01-11">2017-01-11</h2>
|
||||
|
||||
<ul>
|
||||
<li>Maria found another item with duplicate mappings: <a href="https://cgspace.cgiar.org/handle/10568/78658">https://cgspace.cgiar.org/handle/10568/78658</a></li>
|
||||
<li><p>Error in <code>fix-metadata-values.py</code> when it tries to print the value for Entwicklung & Ländlicher Raum:</p>
|
||||
|
||||
<pre><code>Traceback (most recent call last):
|
||||
File "./fix-metadata-values.py", line 80, in <module>
|
||||
print("Fixing {} occurences of: {}".format(records_to_fix, record[0]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)
</code></pre></li>
<li><p>Seems we need to encode as UTF-8 before printing to screen, ie:</p>
|
||||
|
||||
<pre><code>print("Fixing {} occurences of: {}".format(records_to_fix, record[0].encode('utf-8')))
</code></pre></li>
<li><p>See: <a href="http://stackoverflow.com/a/36427358/487333">http://stackoverflow.com/a/36427358/487333</a></p></li>
|
||||
|
||||
<li><p>I’m actually not sure if we need to encode() the strings to UTF-8 before writing them to the database… I’ve never had this issue before</p></li>
|
||||
|
||||
<li><p>Now back to cleaning up some journal titles so we can make the controlled vocabulary:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/fix-27-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'fuuu'
</code></pre></li>
<li><p>Now get the top 500 journal titles:</p>
|
||||
|
||||
<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
</code></pre></li>
<li><p>The values are a bit dirty and outdated, since the file I had given to Abenet and Peter was from November</p></li>
|
||||
|
||||
<li><p>I will have to go through these and fix some more before making the controlled vocabulary</p></li>
|
||||
|
||||
<li><p>Added 30 more corrections or so, now there are 49 total and I’ll have to get the top 500 after applying them</p></li>
|
||||
</ul>
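<ul>
<li><p>As a side note on the encode() question above, a minimal sketch (a hypothetical <code>test-unicode.py</code>, assuming psycopg2 and the same throwaway local database credentials used elsewhere in these notes) suggests the error only affects printing under an ASCII locale, while psycopg2 adapts unicode query parameters itself, so no encoding should be needed for the database write:</p>

<pre><code>import psycopg2

value = u'Entwicklung & Ländlicher Raum'

try:
    print(u'Fixing: {}'.format(value))
except UnicodeEncodeError:
    # only happens when stdout's locale is ASCII; fall back to raw UTF-8 bytes
    print(u'Fixing: {}'.format(value).encode('utf-8'))

conn = psycopg2.connect('dbname=dspace user=dspace password=fuuu host=localhost')
cursor = conn.cursor()
# parameterized query: psycopg2 handles the unicode value without manual encoding
cursor.execute('SELECT count(*) FROM metadatavalue WHERE text_value=%s', (value,))
print(cursor.fetchone()[0])
</code></pre></li>
</ul>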
|
||||
|
||||
<h2 id="2017-01-13">2017-01-13</h2>
|
||||
@ -287,14 +278,14 @@ UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15:
|
||||
<h2 id="2017-01-16">2017-01-16</h2>
|
||||
|
||||
<ul>
|
||||
<li>Fix the two items Maria found with duplicate mappings with this script:</li>
|
||||
</ul>
|
||||
<li><p>Fix the two items Maria found with duplicate mappings with this script:</p>
|
||||
|
||||
<pre><code>/* 184 in correct mappings: https://cgspace.cgiar.org/handle/10568/78596 */
|
||||
delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
|
||||
/* 1 incorrect mapping: https://cgspace.cgiar.org/handle/10568/78658 */
|
||||
delete from collection2item where id = '91082';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-01-17">2017-01-17</h2>
|
||||
|
||||
@ -303,48 +294,43 @@ delete from collection2item where id = '91082';
|
||||
<li>There are about 30 files with <code>%20</code> (space) and Spanish accents in the file name</li>
|
||||
<li>At first I thought we should fix these, but actually it is <a href="https://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1">prescribed by the W3 working group to convert these to UTF8 and URL encode them</a>!</li>
|
||||
<li>And the file names don’t really matter either, as long as the SAF Builder tool can read them—after that DSpace renames them with a hash in the assetstore</li>
|
||||
<li>Seems like the only ones I should replace are the <code>'</code> apostrophe characters, as <code>%27</code>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Seems like the only ones I should replace are the <code>'</code> apostrophe characters, as <code>%27</code>:</p>
|
||||
|
||||
<pre><code>value.replace("'",'%27')
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Add the item’s Type to the filename column as a hint to SAF Builder so it can set a more useful description field:</li>
|
||||
</ul>
|
||||
<li><p>Add the item’s Type to the filename column as a hint to SAF Builder so it can set a more useful description field:</p>
|
||||
|
||||
<pre><code>value + "__description:" + cells["dc.type"].value
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Test importing of the new CIAT records (actually there are 232, not 234):</li>
|
||||
</ul>
|
||||
<li><p>Test importing of the new CIAT records (actually there are 232, not 234):</p>
|
||||
|
||||
<pre><code>$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &> /tmp/ciat.log
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Many of the PDFs are 20, 30, 40, 50+ MB, which makes a total of 4GB</li>
|
||||
<li>These are scanned from paper and likely have no compression, so we should try to test if these compression techniques help without compromising the quality too much:</li>
|
||||
</ul>
|
||||
<li><p>Many of the PDFs are 20, 30, 40, 50+ MB, which makes a total of 4GB</p></li>
|
||||
|
||||
<li><p>These are scanned from paper and likely have no compression, so we should try to test if these compression techniques help without compromising the quality too much (see the sketch after this list):</p>
|
||||
|
||||
<pre><code>$ convert -compress Zip -density 150x150 input.pdf output.pdf
|
||||
$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Somewhere on the Internet suggested using a DPI of 144</li>
|
||||
<li><p>Somewhere on the Internet suggested using a DPI of 144</p></li>
|
||||
</ul>
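<ul>
<li><p>To make that comparison easier over a sample, a rough sketch (a hypothetical <code>compare-pdf-sizes.py</code>, assuming Ghostscript is installed) that recompresses each PDF with the <code>/ebook</code> preset and reports the size change before deciding whether to bother:</p>

<pre><code># usage (hypothetical): python compare-pdf-sizes.py SimpleArchiveFormat/*/*.pdf
import os
import subprocess
import sys

for pdf in sys.argv[1:]:
    out = pdf + '.ebook.pdf'
    # same Ghostscript invocation as above, just scripted
    subprocess.check_call([
        'gs', '-sDEVICE=pdfwrite', '-dCompatibilityLevel=1.4',
        '-dPDFSETTINGS=/ebook', '-dNOPAUSE', '-dQUIET', '-dBATCH',
        '-sOutputFile=' + out, pdf,
    ])
    before = os.path.getsize(pdf)
    after = os.path.getsize(out)
    print('{}: {} -> {} bytes ({:+.1f}%)'.format(
        pdf, before, after, 100.0 * (after - before) / before))
</code></pre></li>
</ul>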
|
||||
|
||||
<h2 id="2017-01-19">2017-01-19</h2>
|
||||
|
||||
<ul>
|
||||
<li>In testing a random sample of CIAT’s PDFs for compressibility, it looks like all of these methods generally increase the file size so we will just import them as they are</li>
|
||||
<li>Import 232 CIAT records into CGSpace:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Import 232 CIAT records into CGSpace:</p>
|
||||
|
||||
<pre><code>$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" /home/cgspace.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/68704 --source /home/aorth/CIAT_232/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &> /tmp/ciat.log
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-01-22">2017-01-22</h2>
|
||||
|
||||
@ -357,40 +343,37 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
|
||||
|
||||
<ul>
|
||||
<li>I merged Atmire’s pull request into the development branch so they can deploy it on DSpace Test</li>
|
||||
<li>Move some old ILRI Program communities to a new subcommunity for former programs (<sup>10568</sup>⁄<sub>79164</sub>):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Move some old ILRI Program communities to a new subcommunity for former programs (<sup>10568</sup>⁄<sub>79164</sub>):</p>
|
||||
|
||||
<pre><code>$ for community in 10568/171 10568/27868 10568/231 10568/27869 10568/150 10568/230 10568/32724 10568/172; do /home/cgspace.cgiar.org/bin/dspace community-filiator --remove --parent=10568/27866 --child="$community" && /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/79164 --child="$community"; done
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Move some collections with <a href="https://gist.github.com/alanorth/e60b530ed4989df0c731afbb0c640515"><code>move-collections.sh</code></a> using the following config:</li>
|
||||
</ul>
|
||||
<li><p>Move some collections with <a href="https://gist.github.com/alanorth/e60b530ed4989df0c731afbb0c640515"><code>move-collections.sh</code></a> using the following config:</p>
|
||||
|
||||
<pre><code>10568/42161 10568/171 10568/79341
|
||||
10568/41914 10568/171 10568/79340
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-01-24">2017-01-24</h2>
|
||||
|
||||
<ul>
|
||||
<li>Run all updates on DSpace Test and reboot the server</li>
|
||||
<li>Run fixes for Journal titles on CGSpace:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Run fixes for Journal titles on CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/fix-49-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'password'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Create a new list of the top 500 journal titles from the database:</li>
|
||||
</ul>
|
||||
<li><p>Create a new list of the top 500 journal titles from the database:</p>
|
||||
|
||||
<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request (<a href="https://github.com/ilri/DSpace/pull/298">#298</a>)</li>
|
||||
<li>This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (<a href="https://github.com/ilri/DSpace/pull/69">#69</a>)</li>
|
||||
<li><p>Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request (<a href="https://github.com/ilri/DSpace/pull/298">#298</a>)</p></li>
|
||||
|
||||
<li><p>This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (<a href="https://github.com/ilri/DSpace/pull/69">#69</a>)</p></li>
|
||||
</ul>
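<ul>
<li><p>Rather than adding the XML markup by hand, a sketch (a hypothetical <code>csv-to-vocabulary.py</code>) could generate it from the cleaned journal titles; the node/isComposedBy wrapper here follows DSpace’s srsc.xml example and is an assumption that may need adjusting to match how the vocabulary is actually wired into the input forms:</p>

<pre><code>import csv
from xml.sax.saxutils import quoteattr

# assumes the cleaned titles are the first column of /tmp/journal-titles.csv
print('<node id="journal-titles" label="Journal titles">')
print('  <isComposedBy>')
with open('/tmp/journal-titles.csv') as f:
    for row in csv.reader(f):
        title = row[0].strip()
        if title:
            print('    <node id={} label={} />'.format(quoteattr(title), quoteattr(title)))
print('  </isComposedBy>')
print('</node>')
</code></pre></li>
</ul>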
|
||||
|
||||
<h2 id="2017-01-25">2017-01-25</h2>
|
||||
|
@ -11,20 +11,19 @@
|
||||
|
||||
An item was mapped twice erroneously again, so I had to remove one of the mappings manually:
|
||||
|
||||
|
||||
dspace=# select * from collection2item where item_id = '80278';
|
||||
id | collection_id | item_id
|
||||
id | collection_id | item_id
|
||||
-------+---------------+---------
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
(3 rows)
|
||||
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
|
||||
DELETE 1
|
||||
|
||||
|
||||
|
||||
Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
|
||||
|
||||
Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
@ -39,23 +38,22 @@ Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
|
||||
|
||||
An item was mapped twice erroneously again, so I had to remove one of the mappings manually:
|
||||
|
||||
|
||||
dspace=# select * from collection2item where item_id = '80278';
|
||||
id | collection_id | item_id
|
||||
id | collection_id | item_id
|
||||
-------+---------------+---------
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
(3 rows)
|
||||
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
|
||||
DELETE 1
|
||||
|
||||
|
||||
|
||||
Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
|
||||
|
||||
Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -137,23 +135,22 @@ Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
|
||||
<h2 id="2017-02-07">2017-02-07</h2>
|
||||
|
||||
<ul>
|
||||
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
|
||||
</ul>
|
||||
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p>
|
||||
|
||||
<pre><code>dspace=# select * from collection2item where item_id = '80278';
|
||||
id | collection_id | item_id
|
||||
id | collection_id | item_id
|
||||
-------+---------------+---------
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
(3 rows)
|
||||
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
|
||||
<li>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
|
||||
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li>
|
||||
|
||||
<li><p>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-02-08">2017-02-08</h2>
|
||||
@ -168,11 +165,12 @@ DELETE 1
|
||||
<li>POLICIES AND INSTITUTIONS → PRIORITIES AND POLICIES FOR CSA</li>
|
||||
</ul></li>
|
||||
<li>The climate risk management one doesn’t exist, so I will have to ask Magdalena if they want me to add it to the input forms</li>
|
||||
<li>Start testing some nearly 500 author corrections that CCAFS sent me:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Start testing some nearly 500 author corrections that CCAFS sent me:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/CCAFS-Authors-Feb-7.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-02-09">2017-02-09</h2>
|
||||
|
||||
@ -181,11 +179,12 @@ DELETE 1
|
||||
<li>Looks like simply adding a new metadata field to <code>dspace/config/registries/cgiar-types.xml</code> and restarting DSpace causes the field to get added to the registry</li>
|
||||
<li>It requires a restart but at least it allows you to manage the registry programmatically</li>
|
||||
<li>It’s not a very good way to manage the registry, though, as removing a field from the XML doesn’t cause it to be removed from the registry, and we always restore from database backups so there would never be a scenario where we needed these to be created</li>
|
||||
<li>Testing some corrections on CCAFS Phase II flagships (<code>cg.subject.ccafs</code>):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Testing some corrections on CCAFS Phase II flagships (<code>cg.subject.ccafs</code>):</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-02-10">2017-02-10</h2>
|
||||
|
||||
@ -235,74 +234,57 @@ DELETE 1
|
||||
|
||||
<ul>
|
||||
<li>Fix issue with duplicate declaration of in atmire-dspace-xmlui <code>pom.xml</code> (causing non-fatal warnings during the maven build)</li>
|
||||
<li>Experiment with making DSpace generate HTTPS handle links, first a change in dspace.cfg or the site’s properties file:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Experiment with making DSpace generate HTTPS handle links, first a change in dspace.cfg or the site’s properties file:</p>
|
||||
|
||||
<pre><code>handle.canonical.prefix = https://hdl.handle.net/
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And then a SQL command to update existing records:</li>
|
||||
</ul>
|
||||
<li><p>And then a SQL command to update existing records:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'uri');
|
||||
UPDATE 58193
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Seems to work fine!</li>
|
||||
<li>I noticed a few items that have incorrect DOI links (<code>dc.identifier.doi</code>), and after looking in the database I see there are over 100 that are missing the scheme or are just plain wrong:</li>
|
||||
</ul>
|
||||
<li><p>Seems to work fine!</p></li>
|
||||
|
||||
<li><p>I noticed a few items that have incorrect DOI links (<code>dc.identifier.doi</code>), and after looking in the database I see there are over 100 that are missing the scheme or are just plain wrong:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value not like 'http%://%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This will replace any that begin with <code>10.</code> and change them to <code>https://dx.doi.org/10.</code>:</li>
|
||||
</ul>
|
||||
<li><p>This will replace any that begin with <code>10.</code> and change them to <code>https://dx.doi.org/10.</code>:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^10\..+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like '10.%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This will get any that begin with <code>doi:10.</code> and change them to <code>https://dx.doi.org/10.x</code>:</li>
|
||||
</ul>
|
||||
<li><p>This will get any that begin with <code>doi:10.</code> and change them to <code>https://dx.doi.org/10.x</code>:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^doi:(10\..+$)', 'https://dx.doi.org/\1') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'doi:10%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Fix DOIs like <code>dx.doi.org/10.</code> to be <code>https://dx.doi.org/10.</code>:</li>
|
||||
</ul>
|
||||
<li><p>Fix DOIs like <code>dx.doi.org/10.</code> to be <code>https://dx.doi.org/10.</code>:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^dx.doi.org/.+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org/%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Fix DOIs like <code>http//</code>:</li>
|
||||
</ul>
|
||||
<li><p>Fix DOIs like <code>http//</code>:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^http//(dx.doi.org/.+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http//%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Fix DOIs like <code>dx.doi.org./</code>:</li>
|
||||
</ul>
|
||||
<li><p>Fix DOIs like <code>dx.doi.org./</code>:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^dx.doi.org\./.+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org./%'
|
||||
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Delete some invalid DOIs:</li>
|
||||
</ul>
|
||||
<li><p>Delete some invalid DOIs:</p>
|
||||
|
||||
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value in ('DOI','CPWF Mekong','Bulawayo, Zimbabwe','bb');
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Fix some other random outliers:</li>
|
||||
</ul>
|
||||
<li><p>Fix some other random outliers:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1016/j.aquaculture.2015.09.003' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:/dx.doi.org/10.1016/j.aquaculture.2015.09.003';
|
||||
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.5337/2016.200' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'doi: https://dx.doi.org/10.5337/2016.200';
|
||||
@ -310,23 +292,22 @@ dspace=# update metadatavalue set text_value = 'https://dx.doi.org/doi:10.1371/j
|
||||
dspace=# update metadatavalue set text_value = 'https://dx.doi.10.1016/j.cosust.2013.11.012' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:dx.doi.10.1016/j.cosust.2013.11.012';
|
||||
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1080/03632415.2014.883570' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'org/10.1080/03632415.2014.883570';
|
||||
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.15446/agron.colomb.v32n3.46052' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'Doi: 10.15446/agron.colomb.v32n3.46052';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And do another round of <code>http://</code> → <code>https://</code> cleanups:</li>
|
||||
</ul>
|
||||
<li><p>And do another round of <code>http://</code> → <code>https://</code> cleanups:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://dx.doi.org', 'https://dx.doi.org') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http://dx.doi.org%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run all DOI corrections on CGSpace</li>
|
||||
<li>Something to think about here is to write a <a href="https://wiki.duraspace.org/display/DSDOC5x/Curation+System#CurationSystem-ScriptedTasks">Curation Task</a> in Java to do these sanity checks / corrections every night</li>
|
||||
<li>Then we could add a cron job for them and run them from the command line like:</li>
|
||||
</ul>
|
||||
<li><p>Run all DOI corrections on CGSpace</p></li>
|
||||
|
||||
<li><p>Something to think about here is to write a <a href="https://wiki.duraspace.org/display/DSDOC5x/Curation+System#CurationSystem-ScriptedTasks">Curation Task</a> in Java to do these sanity checks / corrections every night</p></li>
|
||||
|
||||
<li><p>Then we could add a cron job for them and run them from the command line like:</p>
|
||||
|
||||
<pre><code>[dspace]/bin/dspace curate -t noop -i 10568/79891
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
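<ul>
<li><p>Until such a curation task exists, a cron-able sketch (a hypothetical <code>check-dois.py</code>, assuming psycopg2 and the same local database credentials used elsewhere in these notes) could at least report DOI values that do not use the canonical <code>https://dx.doi.org/</code> prefix, reusing the same field lookup as the SQL above:</p>

<pre><code>import psycopg2

conn = psycopg2.connect('dbname=dspace user=dspace password=fuuu host=localhost')
cursor = conn.cursor()
# same DOI field lookup as the UPDATE statements above
cursor.execute("""
    select resource_id, text_value from metadatavalue
    where resource_type_id=2
      and metadata_field_id in (select metadata_field_id from metadatafieldregistry
                                where element='identifier' and qualifier='doi')
      and text_value not like 'https://dx.doi.org/%'
""")
for resource_id, text_value in cursor.fetchall():
    print('{}: {}'.format(resource_id, text_value))
</code></pre></li>
</ul>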
|
||||
|
||||
<h2 id="2017-02-20">2017-02-20</h2>
|
||||
|
||||
@ -337,8 +318,8 @@ dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.15446/agro
|
||||
<li>Help Sisay with SQL commands</li>
|
||||
<li>Help Paola from CCAFS with the Atmire Listings and Reports module</li>
|
||||
<li>Testing the <code>fix-metadata-values.py</code> script on macOS and it seems like we don’t need to use <code>.encode('utf-8')</code> anymore when printing strings to the screen</li>
|
||||
<li>It seems this might have only been a temporary problem, as both Python 3.5.2 and 3.6.0 are able to print the problematic string “Entwicklung & Ländlicher Raum” without the <code>encode()</code> call, but print it as a bytes when it <em>is</em> used:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It seems this might have only been a temporary problem, as both Python 3.5.2 and 3.6.0 are able to print the problematic string “Entwicklung & Ländlicher Raum” without the <code>encode()</code> call, but print it as a bytes object when it <em>is</em> used:</p>
|
||||
|
||||
<pre><code>$ python
|
||||
Python 3.6.0 (default, Dec 25 2016, 17:30:53)
|
||||
@ -346,37 +327,34 @@ Python 3.6.0 (default, Dec 25 2016, 17:30:53)
|
||||
Entwicklung & Ländlicher Raum
|
||||
>>> print('Entwicklung & Ländlicher Raum'.encode())
|
||||
b'Entwicklung & L\xc3\xa4ndlicher Raum'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So for now I will remove the encode call from the script (though it was never used on the versions on the Linux hosts), leading me to believe it really <em>was</em> a temporary problem, perhaps due to macOS or the Python build I was using.</li>
|
||||
<li><p>So for now I will remove the encode call from the script (though it was never used on the versions on the Linux hosts), leading me to believe it really <em>was</em> a temporary problem, perhaps due to macOS or the Python build I was using.</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-02-21">2017-02-21</h2>
|
||||
|
||||
<ul>
|
||||
<li>Testing regenerating PDF thumbnails, like I started in 2016-11</li>
|
||||
<li>It seems there is a bug in <code>filter-media</code> that causes it to process formats that aren’t part of its configuration:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It seems there is a bug in <code>filter-media</code> that causes it to process formats that aren’t part of its configuration:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16856 -p "ImageMagick PDF Thumbnail"
|
||||
File: earlywinproposal_esa_postharvest.pdf.jpg
|
||||
FILTERED: bitstream 13787 (item: 10568/16881) and created 'earlywinproposal_esa_postharvest.pdf.jpg'
|
||||
File: postHarvest.jpg.jpg
|
||||
FILTERED: bitstream 16524 (item: 10568/24655) and created 'postHarvest.jpg.jpg'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>According to <code>dspace.cfg</code> the ImageMagick PDF Thumbnail plugin should only process PDFs:</li>
|
||||
</ul>
|
||||
<li><p>According to <code>dspace.cfg</code> the ImageMagick PDF Thumbnail plugin should only process PDFs:</p>
|
||||
|
||||
<pre><code>filter.org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter.inputFormats = BMP, GIF, image/png, JPG, TIFF, JPEG, JPEG 2000
|
||||
filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = Adobe PDF
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I’ve sent a message to the mailing list and might file a Jira issue</li>
|
||||
<li>Ask Atmire about the failed interpolation of the <code>dspace.internalUrl</code> variable in <code>atmire-cua.cfg</code></li>
|
||||
<li><p>I’ve sent a message to the mailing list and might file a Jira issue</p></li>
|
||||
|
||||
<li><p>Ask Atmire about the failed interpolation of the <code>dspace.internalUrl</code> variable in <code>atmire-cua.cfg</code></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-02-22">2017-02-22</h2>
|
||||
@ -389,24 +367,22 @@ filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = A
|
||||
<h2 id="2017-02-26">2017-02-26</h2>
|
||||
|
||||
<ul>
|
||||
<li>Find all fields with “<a href="http://hdl.handle.net">http://hdl.handle.net</a>” values (most are in <code>dc.identifier.uri</code>, but some are in other URL-related fields like <code>cg.link.reference</code>, <code>cg.identifier.dataurl</code>, and <code>cg.identifier.url</code>):</li>
|
||||
</ul>
|
||||
<li><p>Find all fields with “<a href="http://hdl.handle.net">http://hdl.handle.net</a>” values (most are in <code>dc.identifier.uri</code>, but some are in other URL-related fields like <code>cg.link.reference</code>, <code>cg.identifier.dataurl</code>, and <code>cg.identifier.url</code>):</p>
|
||||
|
||||
<pre><code>dspace=# select distinct metadata_field_id from metadatavalue where resource_type_id=2 and text_value like 'http://hdl.handle.net%';
|
||||
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where resource_type_id=2 and metadata_field_id IN (25, 113, 179, 219, 220, 223) and text_value like 'http://hdl.handle.net%';
|
||||
UPDATE 58633
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This works but I’m thinking I’ll wait on the replacement as there are perhaps some other places that rely on <code>http://hdl.handle.net</code> (grep the code, it’s scary how many things are hard coded)</li>
|
||||
<li>Send message to dspace-tech mailing list with concerns about this</li>
|
||||
<li><p>This works but I’m thinking I’ll wait on the replacement as there are perhaps some other places that rely on <code>http://hdl.handle.net</code> (grep the code, it’s scary how many things are hard coded)</p></li>
|
||||
|
||||
<li><p>Send message to dspace-tech mailing list with concerns about this</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-02-27">2017-02-27</h2>
|
||||
|
||||
<ul>
|
||||
<li>LDAP users cannot log in today, looks to be an issue with CGIAR’s LDAP server:</li>
|
||||
</ul>
|
||||
<li><p>LDAP users cannot log in today, looks to be an issue with CGIAR’s LDAP server:</p>
|
||||
|
||||
<pre><code>$ openssl s_client -connect svcgroot2.cgiarad.org:3269
|
||||
CONNECTED(00000003)
|
||||
@ -418,15 +394,14 @@ verify error:num=21:unable to verify the first certificate
|
||||
verify return:1
|
||||
---
|
||||
Certificate chain
|
||||
0 s:/CN=SVCGROOT2.CGIARAD.ORG
|
||||
i:/CN=CGIARAD-RDWA-CA
|
||||
0 s:/CN=SVCGROOT2.CGIARAD.ORG
|
||||
i:/CN=CGIARAD-RDWA-CA
|
||||
---
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>For some reason it is now signed by a private certificate authority</li>
|
||||
<li>This error seems to have started on 2017-02-25:</li>
|
||||
</ul>
|
||||
<li><p>For some reason it is now signed by a private certificate authority</p></li>
|
||||
|
||||
<li><p>This error seems to have started on 2017-02-25:</p>
|
||||
|
||||
<pre><code>$ grep -c "unable to find valid certification path" [dspace]/log/dspace.log.2017-02-*
|
||||
[dspace]/log/dspace.log.2017-02-01:0
|
||||
@ -456,24 +431,28 @@ Certificate chain
|
||||
[dspace]/log/dspace.log.2017-02-25:7
|
||||
[dspace]/log/dspace.log.2017-02-26:8
|
||||
[dspace]/log/dspace.log.2017-02-27:90
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also, it seems that we need to use a different user for LDAP binds, as we’re still using the temporary one from the root migration, so maybe we can go back to the previous user we were using</li>
|
||||
<li>So it looks like the certificate is invalid AND the bind users we had been using were deleted</li>
|
||||
<li>Biruk Debebe recreated the bind user and now we are just waiting for CGNET to update their certificates</li>
|
||||
<li>Regarding the <code>filter-media</code> issue I found earlier, it seems that the ImageMagick PDF plugin will also process JPGs if they are in the “Content Files” (aka <code>ORIGINAL</code>) bundle</li>
|
||||
<li>The problem likely lies in the logic of <code>ImageMagickThumbnailFilter.java</code>, as <code>ImageMagickPdfThumbnailFilter.java</code> extends it</li>
|
||||
<li>Run CIAT corrections on CGSpace</li>
|
||||
</ul>
|
||||
<li><p>Also, it seems that we need to use a different user for LDAP binds, as we’re still using the temporary one from the root migration, so maybe we can go back to the previous user we were using</p></li>
|
||||
|
||||
<li><p>So it looks like the certificate is invalid AND the bind users we had been using were deleted</p></li>
|
||||
|
||||
<li><p>Biruk Debebe recreated the bind user and now we are just waiting for CGNET to update their certificates</p></li>
|
||||
|
||||
<li><p>Regarding the <code>filter-media</code> issue I found earlier, it seems that the ImageMagick PDF plugin will also process JPGs if they are in the “Content Files” (aka <code>ORIGINAL</code>) bundle</p></li>
|
||||
|
||||
<li><p>The problem likely lies in the logic of <code>ImageMagickThumbnailFilter.java</code>, as <code>ImageMagickPdfThumbnailFilter.java</code> extends it</p></li>
|
||||
|
||||
<li><p>Run CIAT corrections on CGSpace</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>CGNET has fixed the certificate chain on their LDAP server</li>
|
||||
<li>Redeploy CGSpace and DSpace Test on the latest <code>5_x-prod</code> branch with fixes for LDAP bind user</li>
|
||||
<li>Run all system updates on CGSpace server and reboot</li>
|
||||
<li><p>CGNET has fixed the certificate chain on their LDAP server</p></li>
|
||||
|
||||
<li><p>Redeploy CGSpace and DSpace Test on the latest <code>5_x-prod</code> branch with fixes for LDAP bind user</p></li>
|
||||
|
||||
<li><p>Run all system updates on CGSpace server and reboot</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-02-28">2017-02-28</h2>
|
||||
@ -481,26 +460,23 @@ Certificate chain
|
||||
<ul>
|
||||
<li>After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery</li>
|
||||
<li>Ah, this is probably because some items have the <code>International Center for Tropical Agriculture</code> author twice, which I first noticed in 2016-12 but couldn’t figure out how to fix</li>
|
||||
<li>I think I can do it by first exporting all metadatavalues that have the author <code>International Center for Tropical Agriculture</code></li>
|
||||
</ul>
|
||||
|
||||
<li><p>I think I can do it by first exporting all metadatavalues that have the author <code>International Center for Tropical Agriculture</code></p>
|
||||
|
||||
<pre><code>dspace=# \copy (select resource_id, metadata_value_id from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='International Center for Tropical Agriculture') to /tmp/ciat.csv with csv;
|
||||
COPY 1968
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And then use awk to print the duplicate lines to a separate file:</li>
|
||||
</ul>
|
||||
<li><p>And then use awk to print the duplicate lines to a separate file:</p>
|
||||
|
||||
<pre><code>$ awk -F',' 'seen[$1]++' /tmp/ciat.csv > /tmp/ciat-dupes.csv
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>From that file I can create a list of 279 deletes and put them in a batch script like:</li>
|
||||
</ul>
|
||||
<li><p>From that file I can create a list of 279 deletes and put them in a batch script like:</p>
|
||||
|
||||
<pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=2742061;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
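<ul>
<li><p>A small sketch (a hypothetical <code>generate-deletes.py</code>) of turning the dupes file into that batch of deletes, assuming <code>/tmp/ciat-dupes.csv</code> contains the resource_id,metadata_value_id pairs exported above; the output can be piped straight into psql:</p>

<pre><code>import csv

with open('/tmp/ciat-dupes.csv') as f:
    for resource_id, metadata_value_id in csv.reader(f):
        # one delete per duplicate metadata value, as in the batch script above
        print('delete from metadatavalue where resource_type_id=2 '
              'and metadata_field_id=3 and metadata_value_id={};'.format(int(metadata_value_id)))
</code></pre></li>
</ul>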
|
||||
|
||||
|
||||
|
||||
|
@ -23,11 +23,12 @@ Need to send Peter and Michael some notes about this in a few days
|
||||
Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI
|
||||
Filed an issue on DSpace issue tracker for the filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516
|
||||
Discovered that the ImageMagick filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK
|
||||
Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568⁄51999):
|
||||
|
||||
Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568⁄51999):
|
||||
|
||||
$ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
|
||||
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-03/" />
|
||||
@ -53,13 +54,14 @@ Need to send Peter and Michael some notes about this in a few days
|
||||
Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI
|
||||
Filed an issue on DSpace issue tracker for the filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516
|
||||
Discovered that the ImageMagick filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK
|
||||
Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568⁄51999):
|
||||
|
||||
Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568⁄51999):
|
||||
|
||||
$ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -155,12 +157,13 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
|
||||
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
|
||||
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
|
||||
<li>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</p>
|
||||
|
||||
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
<li>This results in discolored thumbnails when compared to the original PDF, for example sRGB and CMYK:</li>
|
||||
@ -178,26 +181,30 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
|
||||
<ul>
|
||||
<li>I created a patch for DS-3517 and made a pull request against upstream <code>dspace-5_x</code>: <a href="https://github.com/DSpace/DSpace/pull/1669">https://github.com/DSpace/DSpace/pull/1669</a></li>
|
||||
<li>Looks like <code>-colorspace sRGB</code> alone isn’t enough, we need to use profiles:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Looks like <code>-colorspace sRGB</code> alone isn’t enough, we need to use profiles:</p>
|
||||
|
||||
<pre><code>$ convert alc_contrastes_desafios.pdf\[0\] -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_cmyk.icc -thumbnail 300x300 -flatten -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_rgb.icc alc_contrastes_desafios.pdf.jpg
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This reads the input file, applies the CMYK profile, applies the RGB profile, then writes the file</li>
|
||||
<li>Note that you should set the first profile immediately after the input file</li>
|
||||
<li>Also, it is better to use profiles than setting <code>-colorspace</code></li>
|
||||
<li>This is a great resource describing the color stuff: <a href="http://www.imagemagick.org/Usage/formats/#profiles">http://www.imagemagick.org/Usage/formats/#profiles</a></li>
|
||||
<li>Somehow we need to detect the color system being used by the input file and handle each case differently (with profiles)</li>
|
||||
<li>This is trivial with <code>identify</code> (even by the <a href="http://im4java.sourceforge.net/api/org/im4java/core/IMOps.html#identify">Java ImageMagick API</a>):</li>
|
||||
</ul>
|
||||
<li><p>This reads the input file, applies the CMYK profile, applies the RGB profile, then writes the file</p></li>
|
||||
|
||||
<li><p>Note that you should set the first profile immediately after the input file</p></li>
|
||||
|
||||
<li><p>Also, it is better to use profiles than setting <code>-colorspace</code></p></li>
|
||||
|
||||
<li><p>This is a great resource describing the color stuff: <a href="http://www.imagemagick.org/Usage/formats/#profiles">http://www.imagemagick.org/Usage/formats/#profiles</a></p></li>
|
||||
|
||||
<li><p>Somehow we need to detect the color system being used by the input file and handle each case differently (with profiles)</p></li>
|
||||
|
||||
<li><p>This is trivial with <code>identify</code> (even by the <a href="http://im4java.sourceforge.net/api/org/im4java/core/IMOps.html#identify">Java ImageMagick API</a>):</p>
|
||||
|
||||
<pre><code>$ identify -format '%r\n' alc_contrastes_desafios.pdf\[0\]
|
||||
DirectClass CMYK
|
||||
$ identify -format '%r\n' Africa\ group\ of\ negotiators.pdf\[0\]
|
||||
DirectClass sRGB Alpha
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
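<ul>
<li><p>A sketch of that per-file detection (a hypothetical <code>cmyk-thumbnail.py</code>, assuming ImageMagick is installed and reusing the macOS ICC profile paths from the test above, which would need adjusting on the server): it only inserts the CMYK and sRGB profiles when identify reports a CMYK colorspace:</p>

<pre><code>import subprocess
import sys

# profile paths from the macOS test above; adjust for the server environment
CMYK_PROFILE = '/opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_cmyk.icc'
RGB_PROFILE = '/opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_rgb.icc'

pdf = sys.argv[1]
# identify the colorspace of the first page, e.g. 'DirectClass CMYK'
colorspace = subprocess.check_output(['identify', '-format', '%r', pdf + '[0]']).decode('utf-8')

cmd = ['convert', pdf + '[0]']
if 'CMYK' in colorspace:
    cmd += ['-profile', CMYK_PROFILE]
cmd += ['-thumbnail', '300x300', '-flatten']
if 'CMYK' in colorspace:
    cmd += ['-profile', RGB_PROFILE]
cmd.append(pdf + '.jpg')
subprocess.check_call(cmd)
</code></pre></li>
</ul>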
|
||||
|
||||
<h2 id="2017-03-04">2017-03-04</h2>
|
||||
|
||||
@ -212,60 +219,57 @@ DirectClass sRGB Alpha
|
||||
<ul>
|
||||
<li>Look into helping developers from landportal.info with a query for items related to LAND on the REST API</li>
|
||||
<li>They want something like the items that are returned by the general “LAND” query in the search interface, but we cannot do that</li>
|
||||
<li>We can only return specific results for metadata fields, like:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>We can only return specific results for metadata fields, like:</p>
|
||||
|
||||
<pre><code>$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "LAND REFORM", "language": null}' | json_pp
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But there are hundreds of combinations of fields and values (like <code>dc.subject</code> and all the center subjects), and we can’t use wildcards in REST!</li>
|
||||
<li>Reading about enabling multiple handle prefixes in DSpace</li>
|
||||
<li>There is a mailing list thread from 2011 about it: <a href="http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html">http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html</a></li>
|
||||
<li>And a comment from Atmire’s Bram about it on the DSpace wiki: <a href="https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296">https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296</a></li>
|
||||
<li>Bram mentions an undocumented configuration option <code>handle.plugin.checknameauthority</code>, but I noticed another one in <code>dspace.cfg</code>:</li>
|
||||
</ul>
|
||||
<li><p>But there are hundreds of combinations of fields and values (like <code>dc.subject</code> and all the center subjects), and we can’t use wildcards in REST! (see the sketch after this list)</p></li>
|
||||
|
||||
<li><p>Reading about enabling multiple handle prefixes in DSpace</p></li>
|
||||
|
||||
<li><p>There is a mailing list thread from 2011 about it: <a href="http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html">http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html</a></p></li>
|
||||
|
||||
<li><p>And a comment from Atmire’s Bram about it on the DSpace wiki: <a href="https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296">https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296</a></p></li>
|
||||
|
||||
<li><p>Bram mentions an undocumented configuration option <code>handle.plugin.checknameauthority</code>, but I noticed another one in <code>dspace.cfg</code>:</p>
|
||||
|
||||
<pre><code># List any additional prefixes that need to be managed by this handle server
|
||||
# (as for examle handle prefix coming from old dspace repository merged in
|
||||
# that repository)
|
||||
# handle.additional.prefixes = prefix1[, prefix2]
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Because of this I noticed that our Handle server’s <code>config.dct</code> was potentially misconfigured!</li>
|
||||
<li>We had some default values still present:</li>
|
||||
</ul>
|
||||
<li><p>Because of this I noticed that our Handle server’s <code>config.dct</code> was potentially misconfigured!</p></li>
|
||||
|
||||
<li><p>We had some default values still present:</p>
|
||||
|
||||
<pre><code>"300:0.NA/YOUR_NAMING_AUTHORITY"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I’ve changed them to the following and restarted the handle server:</li>
|
||||
</ul>
|
||||
<li><p>I’ve changed them to the following and restarted the handle server:</p>
|
||||
|
||||
<pre><code>"300:0.NA/10568"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In looking at all the configs I just noticed that we are not providing a DOI in the Google-specific metadata crosswalk</li>
|
||||
<li>From <code>dspace/config/crosswalks/google-metadata.properties</code>:</li>
|
||||
</ul>
|
||||
<li><p>In looking at all the configs I just noticed that we are not providing a DOI in the Google-specific metadata crosswalk</p></li>
|
||||
|
||||
<li><p>From <code>dspace/config/crosswalks/google-metadata.properties</code>:</p>
|
||||
|
||||
<pre><code>google.citation_doi = cg.identifier.doi
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This works, and makes DSpace output the following metadata on the item view page:</li>
|
||||
</ul>
|
||||
<li><p>This works, and makes DSpace output the following metadata on the item view page:</p>
|
||||
|
||||
<pre><code><meta content="https://dx.doi.org/10.1186/s13059-017-1153-y" name="citation_doi">
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Submitted and merged pull request for this: <a href="https://github.com/ilri/DSpace/pull/305">https://github.com/ilri/DSpace/pull/305</a></li>
|
||||
<li>Submit pull request to set the author separator for XMLUI item lists to a semicolon instead of “,”: <a href="https://github.com/ilri/DSpace/pull/306">https://github.com/ilri/DSpace/pull/306</a></li>
|
||||
<li>I want to show it briefly to Abenet and Peter to get feedback</li>
|
||||
<li><p>Submitted and merged pull request for this: <a href="https://github.com/ilri/DSpace/pull/305">https://github.com/ilri/DSpace/pull/305</a></p></li>
|
||||
|
||||
<li><p>Submit pull request to set the author separator for XMLUI item lists to a semicolon instead of “,”: <a href="https://github.com/ilri/DSpace/pull/306">https://github.com/ilri/DSpace/pull/306</a></p></li>
|
||||
|
||||
<li><p>I want to show it briefly to Abenet and Peter to get feedback</p></li>
|
||||
</ul>
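<ul>
<li><p>Regarding the REST wildcard limitation above, one workaround is to iterate over a fixed list of field/value combinations and merge the results; a sketch (a hypothetical <code>find-land-items.py</code>, assuming the requests library, with the second query entry being a made-up example):</p>

<pre><code>import requests

# the real list would need every field/value combination of interest
queries = [
    {'key': 'cg.subject.ilri', 'value': 'LAND REFORM', 'language': None},
    {'key': 'cg.subject.ilri', 'value': 'LAND USE', 'language': None},  # hypothetical example
]

handles = set()
for query in queries:
    r = requests.post('https://dspacetest.cgiar.org/rest/items/find-by-metadata-field',
                      headers={'Accept': 'application/json'}, json=query)
    r.raise_for_status()
    for item in r.json():
        handles.add(item['handle'])

print('\n'.join(sorted(handles)))
</code></pre></li>
</ul>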
|
||||
|
||||
<h2 id="2017-03-06">2017-03-06</h2>
|
||||
@ -302,35 +306,34 @@ DirectClass sRGB Alpha
|
||||
<h2 id="2017-03-09">2017-03-09</h2>
|
||||
|
||||
<ul>
|
||||
<li>Export list of sponsors so Peter can clean it up:</li>
|
||||
</ul>
|
||||
<li><p>Export list of sponsors so Peter can clean it up:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship') group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
|
||||
COPY 285
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-03-12">2017-03-12</h2>
|
||||
|
||||
<ul>
|
||||
<li>Test the sponsorship fixes and deletes from Peter:</li>
|
||||
</ul>
|
||||
<li><p>Test the sponsorship fixes and deletes from Peter:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i Investors-Fix-51.csv -f dc.description.sponsorship -t Action -m 29 -d dspace -u dspace -p fuuuu
|
||||
$ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.sponsorship -m 29 -d dspace -u dspace -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Generate a new list of unique sponsors so we can update the controlled vocabulary:</li>
|
||||
</ul>
|
||||
<li><p>Generate a new list of unique sponsors so we can update the controlled vocabulary:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship')) to /tmp/sponsorship.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Pull request for controlled vocabulary if Peter approves: <a href="https://github.com/ilri/DSpace/pull/308">https://github.com/ilri/DSpace/pull/308</a></li>
|
||||
<li>Review Sisay’s roots, tubers, and bananas (RTB) theme, which still needs some fixes to work properly: <a href="https://github.com/ilri/DSpace/pull/307">https://github.com/ilri/DSpace/pull/307</a></li>
|
||||
<li>Created an issue to track the progress on the Livestock CRP theme: <a href="https://github.com/ilri/DSpace/issues/309">https://github.com/ilri/DSpace/issues/309</a></li>
|
||||
<li>Created a basic theme for the Livestock CRP community</li>
|
||||
<li><p>Pull request for controlled vocabulary if Peter approves: <a href="https://github.com/ilri/DSpace/pull/308">https://github.com/ilri/DSpace/pull/308</a></p></li>
|
||||
|
||||
<li><p>Review Sisay’s roots, tubers, and bananas (RTB) theme, which still needs some fixes to work properly: <a href="https://github.com/ilri/DSpace/pull/307">https://github.com/ilri/DSpace/pull/307</a></p></li>
|
||||
|
||||
<li><p>Created an issue to track the progress on the Livestock CRP theme: <a href="https://github.com/ilri/DSpace/issues/309">https://github.com/ilri/DSpace/issues/309</a></p></li>
|
||||
|
||||
<li><p>Created a basic theme for the Livestock CRP community</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2017/03/livestock-theme.png" alt="Livestock CRP theme" /></p>
|
||||
@ -374,40 +377,36 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
|
||||
<h2 id="2017-03-28">2017-03-28</h2>
|
||||
|
||||
<ul>
|
||||
<li>CCAFS said they are ready for the flagship updates for Phase II to be run (<code>cg.subject.ccafs</code>), so I ran them on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>CCAFS said they are ready for the flagship updates for Phase II to be run (<code>cg.subject.ccafs</code>), so I ran them on CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We’ve been waiting since February to run these</li>
|
||||
<li>Also, I generated a list of all CCAFS flagships because there are a dozen or so more than there should be:</li>
|
||||
</ul>
|
||||
<li><p>We’ve been waiting since February to run these</p></li>
|
||||
|
||||
<li><p>Also, I generated a list of all CCAFS flagships because there are a dozen or so more than there should be:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=210 group by text_value order by count desc) to /tmp/ccafs.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I sent a list to CCAFS people so they can tell me if some should be deleted or moved, etc</li>
|
||||
<li>Test, squash, and merge Sisay’s RTB theme into <code>5_x-prod</code>: <a href="https://github.com/ilri/DSpace/pull/316">https://github.com/ilri/DSpace/pull/316</a></li>
|
||||
<li><p>I sent a list to CCAFS people so they can tell me if some should be deleted or moved, etc</p></li>
|
||||
|
||||
<li><p>Test, squash, and merge Sisay’s RTB theme into <code>5_x-prod</code>: <a href="https://github.com/ilri/DSpace/pull/316">https://github.com/ilri/DSpace/pull/316</a></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-03-29">2017-03-29</h2>
|
||||
|
||||
<ul>
|
||||
<li>Dump a list of fields in the DC and CG schemas to compare with CG Core:</li>
|
||||
</ul>
|
||||
<li><p>Dump a list of fields in the DC and CG schemas to compare with CG Core:</p>
|
||||
|
||||
<pre><code>dspace=# select case when metadata_schema_id=1 then 'dc' else 'cg' end as schema, element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Ooh, a better one!</li>
|
||||
</ul>
|
||||
<li><p>Ooh, a better one!</p>
|
||||
|
||||
<pre><code>dspace=# select coalesce(case when metadata_schema_id=1 then 'dc.' else 'cg.' end) || concat_ws('.', element, qualifier) as field, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-03-30">2017-03-30</h2>
|
||||
|
||||
|
@ -17,10 +17,11 @@ Quick proof-of-concept hack to add dc.rights to the input form, including some i
|
||||
|
||||
|
||||
Remove redundant/duplicate text in the DSpace submission license
|
||||
|
||||
Testing the CMYK patch on a collection with 650 items:
|
||||
|
||||
|
||||
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
|
||||
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-04/" />
|
||||
@ -40,12 +41,13 @@ Quick proof-of-concept hack to add dc.rights to the input form, including some i
|
||||
|
||||
|
||||
Remove redundant/duplicate text in the DSpace submission license
|
||||
|
||||
Testing the CMYK patch on a collection with 650 items:
|
||||
|
||||
|
||||
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -135,95 +137,88 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
|
||||
|
||||
<ul>
|
||||
<li>Remove redundant/duplicate text in the DSpace submission license</li>
|
||||
<li>Testing the CMYK patch on a collection with 650 items:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-04-03">2017-04-03</h2>
|
||||
|
||||
<ul>
|
||||
<li>Continue testing the CMYK patch on more communities:</li>
|
||||
</ul>
|
||||
<li><p>Continue testing the CMYK patch on more communities:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/1 -p "ImageMagick PDF Thumbnail" -v >> /tmp/filter-media-cmyk.txt 2>&1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So far there are almost 500:</li>
|
||||
</ul>
|
||||
<li><p>So far there are almost 500:</p>
|
||||
|
||||
<pre><code>$ grep -c profile /tmp/filter-media-cmyk.txt
|
||||
484
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking at the CG Core document again, I’ll send some feedback to Peter and Abenet:
|
||||
<li><p>Looking at the CG Core document again, I’ll send some feedback to Peter and Abenet:</p>
|
||||
|
||||
<ul>
|
||||
<li>We use cg.contributor.crp to indicate the CRP(s) affiliated with the item</li>
|
||||
<li>DSpace has dc.date.available, but this field isn’t particularly meaningful other than as an automatic timestamp at the time of item accession (and is identical to dc.date.accessioned)</li>
|
||||
<li>dc.relation exists in CGSpace, but isn’t used—rather dc.relation.ispartofseries, which is used ~5,000 times to hold the series name and number within that series</li>
|
||||
</ul></li>
|
||||
<li>Also, I’m noticing some weird outliers in <code>cg.coverage.region</code>, need to remember to go correct these later:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Also, I’m noticing some weird outliers in <code>cg.coverage.region</code>, need to remember to go correct these later:</p>
|
||||
|
||||
<pre><code>dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=227;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
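<ul>
<li><p>A sketch for finding those outliers later (a hypothetical <code>find-region-outliers.py</code>, assuming psycopg2 and the same local database credentials): list the <code>cg.coverage.region</code> values used only a handful of times, which is usually where the typos hide:</p>

<pre><code>import psycopg2

conn = psycopg2.connect('dbname=dspace user=dspace password=fuuu host=localhost')
cursor = conn.cursor()
# cg.coverage.region is metadata_field_id=227, as in the query above
cursor.execute("""
    select text_value, count(*) from metadatavalue
    where resource_type_id=2 and metadata_field_id=227
    group by text_value having count(*) < 5 order by count(*)
""")
for text_value, count in cursor.fetchall():
    print('{} ({})'.format(text_value, count))
</code></pre></li>
</ul>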
|
||||
|
||||
<h2 id="2017-04-04">2017-04-04</h2>
|
||||
|
||||
<ul>
|
||||
<li>The <code>filter-media</code> script has been running on more large communities and now there are many more CMYK PDFs that have been fixed:</li>
|
||||
</ul>
|
||||
<li><p>The <code>filter-media</code> script has been running on more large communities and now there are many more CMYK PDFs that have been fixed:</p>
|
||||
|
||||
<pre><code>$ grep -c profile /tmp/filter-media-cmyk.txt
|
||||
1584
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Trying to find a way to get the number of items submitted by a certain user in 2016</li>
|
||||
<li>It’s not possible in the DSpace search / module interfaces, but might be able to be derived from <code>dc.description.provenance</code>, as that field contains the name and email of the submitter/approver, ie:</li>
|
||||
</ul>
|
||||
<li><p>Trying to find a way to get the number of items submitted by a certain user in 2016</p></li>
|
||||
|
||||
<li><p>It’s not possible in the DSpace search / module interfaces, but might be able to be derived from <code>dc.description.provenance</code>, as that field contains the name and email of the submitter/approver, ie:</p>
|
||||
|
||||
<pre><code>Submitted by Francesca Giampieri (fgiampieri) on 2016-01-19T13:56:43Z^M
|
||||
No. of bitstreams: 1^M
|
||||
ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0 (MD5)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This SQL query returns fields that were submitted or approved by giampieri in 2016 and contain a “checksum” (ie, there was a bitstream in the submission):</li>
|
||||
</ul>
|
||||
<li><p>This SQL query returns fields that were submitted or approved by giampieri in 2016 and contain a “checksum” (ie, there was a bitstream in the submission):</p>
|
||||
|
||||
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^(Submitted|Approved).*giampieri.*2016-.*checksum.*';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then this one does the same, but for fields that don’t contain checksums (ie, there was no bitstream in the submission):</li>
|
||||
</ul>
|
||||
<li><p>Then this one does the same, but for fields that don’t contain checksums (ie, there was no bitstream in the submission):</p>
|
||||
|
||||
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^(Submitted|Approved).*giampieri.*2016-.*' and text_value !~ '^(Submitted|Approved).*giampieri.*2016-.*checksum.*';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>For some reason there seem to be way too many fields, for example there are 498 + 13 here, which is 511 items for just this one user.</li>
|
||||
<li>It looks like there can be a scenario where the user submitted AND approved it, so some records might be doubled…</li>
|
||||
<li>In that case it might just be better to see how many the user submitted (both <em>with</em> and <em>without</em> bitstreams):</li>
|
||||
</ul>
|
||||
<li><p>For some reason there seem to be way too many fields, for example there are 498 + 13 here, which is 511 items for just this one user.</p></li>
|
||||
|
||||
<li><p>It looks like there can be a scenario where the user submitted AND approved it, so some records might be doubled…</p></li>
|
||||
|
||||
<li><p>In that case it might just be better to see how many the user submitted (both <em>with</em> and <em>without</em> bitstreams):</p>
|
||||
|
||||
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^Submitted.*giampieri.*2016-.*';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-04-05">2017-04-05</h2>
|
||||
|
||||
<ul>
|
||||
<li>After doing a few more large communities it seems this is the final count of CMYK PDFs:</li>
|
||||
</ul>
|
||||
<li><p>After doing a few more large communities it seems this is the final count of CMYK PDFs:</p>
|
||||
|
||||
<pre><code>$ grep -c profile /tmp/filter-media-cmyk.txt
|
||||
2505
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-04-06">2017-04-06</h2>
|
||||
|
||||
@ -301,8 +296,8 @@ ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0
|
||||
</ul></li>
|
||||
<li>I don’t see these fields anywhere in our source code or the database’s metadata registry, so maybe it’s just a cache issue</li>
|
||||
<li>I will have to check the OAI cron scripts on DSpace Test, and then run them on CGSpace</li>
|
||||
<li>Running <code>dspace oai import</code> and <code>dspace oai clean-cache</code> have zero effect, but this seems to rebuild the cache from scratch:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Running <code>dspace oai import</code> and <code>dspace oai clean-cache</code> have zero effect, but this seems to rebuild the cache from scratch:</p>
|
||||
|
||||
<pre><code>$ /home/dspacetest.cgiar.org/bin/dspace oai import -c
|
||||
...
|
||||
@ -311,14 +306,15 @@ ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0
|
||||
Total: 64056 items
|
||||
Purging cached OAI responses.
|
||||
OAI 2.0 manager action ended. It took 829 seconds.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After reading some threads on the DSpace mailing list, I see that <code>clean-cache</code> is actually only for caching <em>responses</em>, ie to client requests in the OAI web application</li>
|
||||
<li>These are stored in <code>[dspace]/var/oai/requests/</code></li>
|
||||
<li>The import command should theoretically catch situations like this where an item’s metadata was updated, but in this case we changed the metadata schema and it doesn’t seem to catch it (could be a bug!)</li>
|
||||
<li>Attempting a full rebuild of OAI on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>After reading some threads on the DSpace mailing list, I see that <code>clean-cache</code> is actually only for caching <em>responses</em>, ie to client requests in the OAI web application</p></li>
|
||||
|
||||
<li><p>These are stored in <code>[dspace]/var/oai/requests/</code></p></li>
|
||||
|
||||
<li><p>The import command should theoretically catch situations like this where an item’s metadata was updated, but in this case we changed the metadata schema and it doesn’t seem to catch it (could be a bug!)</p></li>
|
||||
|
||||
<li><p>Attempting a full rebuild of OAI on CGSpace:</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace oai import -c
|
||||
@ -331,12 +327,13 @@ OAI 2.0 manager action ended. It took 1032 seconds.
|
||||
real 17m20.156s
|
||||
user 4m35.293s
|
||||
sys 1m29.310s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Now the data for 10568/6 is correct in OAI: <a href="https://cgspace.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=dim&identifier=oai:cgspace.cgiar.org:10568/6">https://cgspace.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=dim&identifier=oai:cgspace.cgiar.org:10568/6</a></li>
|
||||
<li>Perhaps I need to file a bug for this, or at least ask on the dspace-tech mailing list?</li>
|
||||
<li>I wonder if we could use a crosswalk to convert to a format that CG Core wants, like <code><date Type="Available"></code></li>
|
||||
<li><p>Now the data for 10568/6 is correct in OAI: <a href="https://cgspace.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=dim&identifier=oai:cgspace.cgiar.org:10568/6">https://cgspace.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=dim&identifier=oai:cgspace.cgiar.org:10568/6</a></p></li>
|
||||
|
||||
<li><p>Perhaps I need to file a bug for this, or at least ask on the dspace-tech mailing list?</p></li>
|
||||
|
||||
<li><p>I wonder if we could use a crosswalk to convert to a format that CG Core wants, like <code><date Type="Available"></code></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-04-13">2017-04-13</h2>
|
||||
@ -381,19 +378,20 @@ sys 1m29.310s
|
||||
<li>CIFOR has now implemented a new “cgiar” context in their OAI that exposes CG fields, so I am re-harvesting that to see how it looks in the Discovery sidebars and searches</li>
|
||||
<li>See: <a href="https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&metadataPrefix=dim&identifier=oai:data.cifor.org:11463/947">https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&metadataPrefix=dim&identifier=oai:data.cifor.org:11463/947</a></li>
|
||||
<li>One thing we need to remember if we start using OAI is to enable the autostart of the harvester process (see <code>harvester.autoStart</code> in <code>dspace/config/modules/oai.cfg</code>)</li>
|
||||
<li>Error when running DSpace cleanup task on DSpace Test and CGSpace (on the same item), I need to look this up:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Error when running DSpace cleanup task on DSpace Test and CGSpace (on the same item), I need to look this up:</p>
|
||||
|
||||
<pre><code>Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
|
||||
Detail: Key (bitstream_id)=(435) is still referenced from table "bundle".
|
||||
</code></pre>
|
||||
Detail: Key (bitstream_id)=(435) is still referenced from table "bundle".
|
||||
</code></pre></li>
|
||||
</ul>
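<ul>
<li><p>For reference, the harvester autostart option mentioned above is just a boolean in the OAI harvester config; a sketch of what we would set (the value shown is illustrative, not our current config):</p>

<pre><code># dspace/config/modules/oai.cfg
# start the harvest scheduler automatically when the webapp starts
harvester.autoStart = true
</code></pre></li>
</ul>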
|
||||
|
||||
<h2 id="2017-04-18">2017-04-18</h2>
|
||||
|
||||
<ul>
|
||||
<li>Helping Tsega test his new <a href="https://github.com/ilri/ckm-cgspace-rest-api">CGSpace REST API Rails app</a> on DSpace Test</li>
|
||||
<li>Setup and run with:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Setup and run with:</p>
|
||||
|
||||
<pre><code>$ git clone https://github.com/ilri/ckm-cgspace-rest-api.git
|
||||
$ cd ckm-cgspace-rest-api/app
|
||||
@ -401,22 +399,20 @@ $ gem install bundler
|
||||
$ bundle
|
||||
$ cd ..
|
||||
$ rails -s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I used Ansible to create a PostgreSQL user that only has <code>SELECT</code> privileges on the tables it needs:</li>
|
||||
</ul>
|
||||
<li><p>I used Ansible to create a PostgreSQL user that only has <code>SELECT</code> privileges on the tables it needs:</p>
|
||||
|
||||
<pre><code>$ ansible linode02 -u aorth -b --become-user=postgres -K -m postgresql_user -a 'db=database name=username password=password priv=CONNECT/item:SELECT/metadatavalue:SELECT/metadatafieldregistry:SELECT/metadataschemaregistry:SELECT/collection:SELECT/handle:SELECT/bundle2bitstream:SELECT/bitstream:SELECT/bundle:SELECT/item2bundle:SELECT state=present
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Need to look into <a href="https://github.com/puma/puma/blob/master/docs/systemd.md">running this via systemd</a></li>
|
||||
<li>This is interesting for creating runnable commands from <code>bundle</code>:</li>
|
||||
</ul>
|
||||
<li><p>Need to look into <a href="https://github.com/puma/puma/blob/master/docs/systemd.md">running this via systemd</a></p></li>
|
||||
|
||||
<li><p>This is interesting for creating runnable commands from <code>bundle</code>:</p>
|
||||
|
||||
<pre><code>$ bundle binstubs puma --path ./sbin
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
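<ul>
<li><p>A rough sketch of the kind of systemd unit we could use to run the Rails app with the puma binstub from above (paths, user, and port are hypothetical):</p>

<pre><code># /etc/systemd/system/ckm-cgspace-rest-api.service
[Unit]
Description=CKM CGSpace REST API
After=network.target

[Service]
User=cgspace-api
WorkingDirectory=/opt/ckm-cgspace-rest-api/app
ExecStart=/opt/ckm-cgspace-rest-api/app/sbin/puma -e production -p 3000
Restart=on-failure

[Install]
WantedBy=multi-user.target
</code></pre></li>
</ul>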
|
||||
|
||||
<h2 id="2017-04-19">2017-04-19</h2>
|
||||
|
||||
@ -429,30 +425,27 @@ $ rails -s
|
||||
<li>Abenet noticed that the “Workflow Statistics” option is missing now, but we have screenshots from a presentation in 2016 when it was there</li>
|
||||
<li>I filed a ticket with Atmire</li>
|
||||
<li>Looking at 933 CIAT records from Sisay, he’s having problems creating a SAF bundle to import to DSpace Test</li>
|
||||
<li>I started by looking at his CSV in OpenRefine, and I see there a <em>bunch</em> of fields with whitespace issues that I cleaned up:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I started by looking at his CSV in OpenRefine, and I see there a <em>bunch</em> of fields with whitespace issues that I cleaned up:</p>
|
||||
|
||||
<pre><code>value.replace(" ||","||").replace("|| ","||").replace(" || ","||")
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:</li>
|
||||
</ul>
|
||||
<li><p>Also, all the filenames have spaces and URL encoded characters in them, so I decoded them from URL encoding:</p>
|
||||
|
||||
<pre><code>unescape(value,"url")
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then create the filename column using the following transform from URL:</li>
|
||||
</ul>
|
||||
<li><p>Then create the filename column using the following transform from URL:</p>
|
||||
|
||||
<pre><code>value.split('/')[-1].replace(/#.*$/,"")
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The <code>replace</code> part is because some URLs have an anchor like <code>#page=14</code> which we obviously don’t want on the filename</li>
|
||||
<li>Also, we need to only use the PDF on the item corresponding with page 1, so we don’t end up with literally hundreds of duplicate PDFs</li>
|
||||
<li>Alternatively, I could export each page to a standalone PDF…</li>
|
||||
<li><p>The <code>replace</code> part is because some URLs have an anchor like <code>#page=14</code> which we obviously don’t want on the filename</p></li>
|
||||
|
||||
<li><p>Also, we need to only use the PDF on the item corresponding with page 1, so we don’t end up with literally hundreds of duplicate PDFs</p></li>
|
||||
|
||||
<li><p>Alternatively, I could export each page to a standalone PDF…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-04-20">2017-04-20</h2>
|
||||
@ -461,99 +454,97 @@ $ rails -s
|
||||
<li>Atmire responded about the Workflow Statistics, saying that it had been disabled because many environments needed customization to be useful</li>
|
||||
<li>I re-enabled it with a hidden config key <code>workflow.stats.enabled = true</code> on DSpace Test and will evaluate adding it on CGSpace</li>
|
||||
<li>Looking at the CIAT data again, a bunch of items have metadata values ending in <code>||</code>, which might cause blank fields to be added at import time</li>
|
||||
<li>Cleaning them up with OpenRefine:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Cleaning them up with OpenRefine:</p>
|
||||
|
||||
<pre><code>value.replace(/\|\|$/,"")
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Working with the CIAT data in OpenRefine to remove the filename column from all but the first item which requires a particular PDF, as there are many items pointing to the same PDF, which would cause hundreds of duplicates to be added if we included them in the SAF bundle</li>
|
||||
<li>I did some massaging in OpenRefine, flagging duplicates with stars and flags, then filtering and removing the filenames of those items</li>
|
||||
<li><p>Working with the CIAT data in OpenRefine to remove the filename column from all but the first item which requires a particular PDF, as there are many items pointing to the same PDF, which would cause hundreds of duplicates to be added if we included them in the SAF bundle</p></li>
|
||||
|
||||
<li><p>I did some massaging in OpenRefine, flagging duplicates with stars and flags, then filtering and removing the filenames of those items</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2017/04/openrefine-flagging-duplicates.png" alt="Flagging and filtering duplicates in OpenRefine" /></p>
|
||||
|
||||
<ul>
|
||||
<li>Also there are loads of whitespace errors in almost every field, so I trimmed leading/trailing whitespace</li>
|
||||
<li>Unbelievable, there are also metadata values like:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Unbelievable, there are also metadata values like:</p>
|
||||
|
||||
<pre><code>COLLETOTRICHUM LINDEMUTHIANUM|| FUSARIUM||GERMPLASM
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Add a description to the file names using:</li>
|
||||
</ul>
|
||||
<li><p>Add a description to the file names using:</p>
|
||||
|
||||
<pre><code>value + "__description:" + cells["dc.type"].value
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Test import of 933 records:</li>
|
||||
</ul>
|
||||
<li><p>Test import of 933 records:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace import -a -e aorth@mjanja.ch -c 10568/87193 -s /home/aorth/src/CIAT-Books/SimpleArchiveFormat/ -m /tmp/ciat
|
||||
$ wc -l /tmp/ciat
|
||||
933 /tmp/ciat
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run system updates on CGSpace and reboot server</li>
|
||||
<li>This includes switching nginx to using upstream with keepalive instead of direct <code>proxy_pass</code></li>
|
||||
<li>Re-deploy CGSpace to latest <code>5_x-prod</code>, including the PABRA and RTB XMLUI themes, as well as the PDF processing and CMYK changes</li>
|
||||
<li>More work on Ansible infrastructure stuff for Tsega’s CKM DSpace REST API</li>
|
||||
<li>I’m going to start re-processing all the PDF thumbnails on CGSpace, one community at a time:</li>
|
||||
</ul>
|
||||
<li><p>Run system updates on CGSpace and reboot server</p></li>
|
||||
|
||||
<li><p>This includes switching nginx to using upstream with keepalive instead of direct <code>proxy_pass</code></p></li>
|
||||
|
||||
<li><p>Re-deploy CGSpace to latest <code>5_x-prod</code>, including the PABRA and RTB XMLUI themes, as well as the PDF processing and CMYK changes</p></li>
|
||||
|
||||
<li><p>More work on Ansible infrastructure stuff for Tsega’s CKM DSpace REST API</p></li>
|
||||
|
||||
<li><p>I’m going to start re-processing all the PDF thumbnails on CGSpace, one community at a time:</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace filter-media -f -v -i 10568/71249 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
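<ul>
<li><p>For reference, the nginx change mentioned above (an upstream block with keepalive instead of a direct <code>proxy_pass</code>) looks roughly like this; the upstream name, port, and server block are illustrative:</p>

<pre><code>upstream tomcat_http {
    server 127.0.0.1:8443;
    keepalive 60;
}

server {
    listen 80;
    server_name cgspace.cgiar.org;

    location / {
        # keepalive to the upstream needs HTTP/1.1 and a cleared Connection header
        proxy_pass http://tomcat_http;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
</code></pre></li>
</ul>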
|
||||
|
||||
<h2 id="2017-04-22">2017-04-22</h2>
|
||||
|
||||
<ul>
|
||||
<li>Someone on the dspace-tech mailing list responded with a suggestion about the foreign key violation in the <code>cleanup</code> task</li>
|
||||
<li>The solution is to remove the ID (ie set to NULL) from the <code>primary_bitstream_id</code> column in the <code>bundle</code> table</li>
|
||||
<li>After doing that and running the <code>cleanup</code> task again I find more bitstreams that are affected and end up with a long list of IDs that need to be fixed:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>After doing that and running the <code>cleanup</code> task again I find more bitstreams that are affected and end up with a long list of IDs that need to be fixed:</p>
|
||||
|
||||
<pre><code>dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (435, 1136, 1132, 1220, 1236, 3002, 3255, 5322);
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-04-24">2017-04-24</h2>
|
||||
|
||||
<ul>
|
||||
<li>Two users mentioned some items they recently approved not showing up in the search / XMLUI</li>
|
||||
<li>I looked at the logs from yesterday and it seems the Discovery indexing has been crashing:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I looked at the logs from yesterday and it seems the Discovery indexing has been crashing:</p>
|
||||
|
||||
<pre><code>2017-04-24 00:00:15,578 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (55 of 58853): 70590
|
||||
2017-04-24 00:00:15,586 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (56 of 58853): 74507
|
||||
2017-04-24 00:00:15,614 ERROR com.atmire.dspace.discovery.AtmireSolrService @ this IndexWriter is closed
|
||||
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: this IndexWriter is closed
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
|
||||
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
|
||||
at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:285)
|
||||
at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:271)
|
||||
at org.dspace.discovery.SolrServiceImpl.unIndexContent(SolrServiceImpl.java:331)
|
||||
at org.dspace.discovery.SolrServiceImpl.unIndexContent(SolrServiceImpl.java:315)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:803)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:876)
|
||||
at org.dspace.discovery.IndexClient.main(IndexClient.java:127)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
</code></pre>
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
|
||||
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
|
||||
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
|
||||
at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:285)
|
||||
at org.apache.solr.client.solrj.SolrServer.deleteByQuery(SolrServer.java:271)
|
||||
at org.dspace.discovery.SolrServiceImpl.unIndexContent(SolrServiceImpl.java:331)
|
||||
at org.dspace.discovery.SolrServiceImpl.unIndexContent(SolrServiceImpl.java:315)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:803)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:876)
|
||||
at org.dspace.discovery.IndexClient.main(IndexClient.java:127)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking at the past few days of logs, it looks like the indexing process started crashing on 2017-04-20:</li>
|
||||
</ul>
|
||||
<li><p>Looking at the past few days of logs, it looks like the indexing process started crashing on 2017-04-20:</p>
|
||||
|
||||
<pre><code># grep -c 'IndexWriter is closed' [dspace]/log/dspace.log.2017-04-*
|
||||
[dspace]/log/dspace.log.2017-04-01:0
|
||||
@ -580,41 +571,35 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: this Index
|
||||
[dspace]/log/dspace.log.2017-04-22:13278
|
||||
[dspace]/log/dspace.log.2017-04-23:22720
|
||||
[dspace]/log/dspace.log.2017-04-24:21422
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I restarted Tomcat and re-ran the discovery process manually:</li>
|
||||
</ul>
|
||||
<li><p>I restarted Tomcat and re-ran the discovery process manually:</p>
|
||||
|
||||
<pre><code>[dspace]/bin/dspace index-discovery
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Now everything is ok</li>
|
||||
<li>Finally finished manually running the cleanup task over and over and null’ing the conflicting IDs:</li>
|
||||
</ul>
|
||||
<li><p>Now everything is ok</p></li>
|
||||
|
||||
<li><p>Finally finished manually running the cleanup task over and over and null’ing the conflicting IDs:</p>
|
||||
|
||||
<pre><code>dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (435, 1132, 1136, 1220, 1236, 3002, 3255, 5322, 5098, 5982, 5897, 6245, 6184, 4927, 6070, 4925, 6888, 7368, 7136, 7294, 7698, 7864, 10799, 10839, 11765, 13241, 13634, 13642, 14127, 14146, 15582, 16116, 16254, 17136, 17486, 17824, 18098, 22091, 22149, 22206, 22449, 22548, 22559, 22454, 22253, 22553, 22897, 22941, 30262, 33657, 39796, 46943, 56561, 58237, 58739, 58734, 62020, 62535, 64149, 64672, 66988, 66919, 76005, 79780, 78545, 81078, 83620, 84492, 92513, 93915);
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Now running the cleanup script on DSpace Test and already seeing 11GB freed from the assetstore—it’s likely we haven’t had a cleanup task complete successfully in years…</li>
|
||||
<li><p>Now running the cleanup script on DSpace Test and already seeing 11GB freed from the assetstore—it’s likely we haven’t had a cleanup task complete successfully in years…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-04-25">2017-04-25</h2>
|
||||
|
||||
<ul>
|
||||
<li>Finally finished running the PDF thumbnail re-processing on CGSpace, the final count of CMYK PDFs is about 2751</li>
|
||||
<li>Preparing to run the cleanup task on CGSpace, I want to see how many files are in the assetstore:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Preparing to run the cleanup task on CGSpace, I want to see how many files are in the assetstore:</p>
|
||||
|
||||
<pre><code># find [dspace]/assetstore/ -type f | wc -l
|
||||
113104
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Troubleshooting the Atmire Solr update process that runs at 3:00 AM every morning, after finishing at 100% it has this error:</li>
|
||||
</ul>
|
||||
<li><p>Troubleshooting the Atmire Solr update process that runs at 3:00 AM every morning, after finishing at 100% it has this error:</p>
|
||||
|
||||
<pre><code>[=================================================> ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:12
|
||||
[=================================================> ]99% time remaining: 0 seconds. timestamp: 2017-04-25 09:07:12
|
||||
@ -666,33 +651,36 @@ Caused by: java.lang.ClassNotFoundException: org.dspace.statistics.content.DSpac
|
||||
at java.lang.Class.forName(Class.java:264)
|
||||
at com.atmire.statistics.statlet.XmlParser.parsedatasetGenerator(SourceFile:299)
|
||||
at com.atmire.statistics.display.StatisticsGraph.parseDatasetGenerators(SourceFile:250)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run system updates on DSpace Test and reboot the server (new Java 8 131)</li>
|
||||
<li>Run the SQL cleanups on the bundle table on CGSpace and run the <code>[dspace]/bin/dspace cleanup</code> task</li>
|
||||
<li>I will be interested to see the file count in the assetstore as well as the database size after the next backup (last backup size is 111M)</li>
|
||||
<li>Final file count after the cleanup task finished: 77843</li>
|
||||
<li>So that is about 35,000 files, and about 7GB</li>
|
||||
<li>Add logging to the cleanup cron task</li>
|
||||
<li><p>Run system updates on DSpace Test and reboot the server (new Java 8 131)</p></li>
|
||||
|
||||
<li><p>Run the SQL cleanups on the bundle table on CGSpace and run the <code>[dspace]/bin/dspace cleanup</code> task</p></li>
|
||||
|
||||
<li><p>I will be interested to see the file count in the assetstore as well as the database size after the next backup (last backup size is 111M)</p></li>
|
||||
|
||||
<li><p>Final file count after the cleanup task finished: 77843</p></li>
|
||||
|
||||
<li><p>So that is about 35,000 files, and about 7GB</p></li>
|
||||
|
||||
<li><p>Add logging to the cleanup cron task</p></li>
|
||||
</ul>
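<ul>
<li><p>The cron entry with logging could be as simple as something like this (the schedule and log path are made up for illustration):</p>

<pre><code>0 4 * * * [dspace]/bin/dspace cleanup -v >> [dspace]/log/cleanup.log 2>&1
</code></pre></li>
</ul>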
|
||||
|
||||
<h2 id="2017-04-26">2017-04-26</h2>
|
||||
|
||||
<ul>
|
||||
<li>The size of the CGSpace database dump went from 111MB to 96MB, not sure about actual database size though</li>
|
||||
<li>Update RVM’s Ruby from 2.3.0 to 2.4.0 on DSpace Test:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Update RVM’s Ruby from 2.3.0 to 2.4.0 on DSpace Test:</p>
|
||||
|
||||
<pre><code>$ gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3
|
||||
$ \curl -sSL https://raw.githubusercontent.com/wayneeseguin/rvm/master/binscripts/rvm-installer | bash -s stable --ruby
|
||||
... reload shell to get new Ruby
|
||||
$ gem install sass -v 3.3.14
|
||||
$ gem install compass -v 1.0.3
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Help Tsega re-deploy the ckm-cgspace-rest-api on DSpace Test</li>
|
||||
<li><p>Help Tsega re-deploy the ckm-cgspace-rest-api on DSpace Test</p></li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="May, 2017"/>
|
||||
<meta name="twitter:description" content="2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it’s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire’s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -128,14 +128,13 @@
|
||||
|
||||
<ul>
|
||||
<li>Discovered that CGSpace has ~700 items that are missing the <code>cg.identifier.status</code> field</li>
|
||||
<li>Need to perhaps try using the “required metadata” curation task to find fields missing these items:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Need to perhaps try using the “required metadata” curation task to find fields missing these items:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace curate -t requiredmetadata -i 10568/1 -r - > /tmp/curation.out
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It seems the curation task dies when it finds an item which has missing metadata</li>
|
||||
<li><p>It seems the curation task dies when it finds an item which has missing metadata</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-05-06">2017-05-06</h2>
|
||||
@ -149,15 +148,14 @@
|
||||
<h2 id="2017-05-07">2017-05-07</h2>
|
||||
|
||||
<ul>
|
||||
<li>Testing one replacement for CCAFS Flagships (<code>cg.subject.ccafs</code>), first changed in the submission forms, and then in the database:</li>
|
||||
</ul>
|
||||
<li><p>Testing one replacement for CCAFS Flagships (<code>cg.subject.ccafs</code>), first changed in the submission forms, and then in the database:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i ccafs-flagships-may7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also, CCAFS wants to re-order their flagships to prioritize the Phase II ones</li>
|
||||
<li>Waiting for feedback from CCAFS, then I can merge <a href="https://github.com/ilri/DSpace/pull/320">#320</a></li>
|
||||
<li><p>Also, CCAFS wants to re-order their flagships to prioritize the Phase II ones</p></li>
|
||||
|
||||
<li><p>Waiting for feedback from CCAFS, then I can merge <a href="https://github.com/ilri/DSpace/pull/320">#320</a></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-05-08">2017-05-08</h2>
|
||||
@ -168,19 +166,20 @@
|
||||
<li>When ingesting some collections I was getting <code>java.lang.OutOfMemoryError: GC overhead limit exceeded</code>, which can be solved by disabling the GC timeout with <code>-XX:-UseGCOverheadLimit</code></li>
|
||||
<li>Other times I was getting an error about heap space, so I kept bumping the RAM allocation by 512MB each time (up to 4096m!) it crashed</li>
|
||||
<li>This leads to tens of thousands of abandoned files in the assetstore, which need to be cleaned up using <code>dspace cleanup -v</code>, or else you’ll run out of disk space</li>
|
||||
<li>In the end I realized it’s better to use submission mode (<code>-s</code>) to ingest the community object as a single AIP without its children, followed by each of the collections:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>In the end I realized it’s better to use submission mode (<code>-s</code>) to ingest the community object as a single AIP without its children, followed by each of the collections:</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m -XX:-UseGCOverheadLimit"
|
||||
$ [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10568/87775 /home/aorth/10947-1/10947-1.zip
|
||||
$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done
|
||||
$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Note that in submission mode DSpace ignores the handle specified in <code>mets.xml</code> in the zip file, so you need to turn that off with <code>-o ignoreHandle=false</code></li>
|
||||
<li>The <code>-u</code> option suppresses prompts, to allow the process to run without user input</li>
|
||||
<li>Give feedback to CIFOR about their data quality:
|
||||
<li><p>Note that in submission mode DSpace ignores the handle specified in <code>mets.xml</code> in the zip file, so you need to turn that off with <code>-o ignoreHandle=false</code></p></li>
|
||||
|
||||
<li><p>The <code>-u</code> option suppresses prompts, to allow the process to run without user input</p></li>
|
||||
|
||||
<li><p>Give feedback to CIFOR about their data quality:</p>
|
||||
|
||||
<ul>
|
||||
<li>Suggestion: uppercase dc.subject, cg.coverage.region, and cg.coverage.subregion in your crosswalk so they match CGSpace and therefore can be faceted / reported on easier</li>
|
||||
@ -189,34 +188,37 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
|
||||
<li>Suggestion: use dc.type “Blog Post” instead of “Blog” for your blog post items (we are also adding a “Blog Post” type to CGSpace soon)</li>
|
||||
<li>Question: many of your items use dc.document.uri AND cg.identifier.url with the same text value?</li>
|
||||
</ul></li>
|
||||
<li>Help Marianne from WLE with an Open Search query to show the latest WLE CRP outputs: <a href="https://cgspace.cgiar.org/open-search/discover?query=crpsubject:WATER%2C+LAND+AND+ECOSYSTEMS&sort_by=2&order=DESC">https://cgspace.cgiar.org/open-search/discover?query=crpsubject:WATER%2C+LAND+AND+ECOSYSTEMS&sort_by=2&order=DESC</a></li>
|
||||
<li>This uses the webui’s item list sort options, see <code>webui.itemlist.sort-option</code> in <code>dspace.cfg</code></li>
|
||||
<li>The equivalent Discovery search would be: <a href="https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&filter_relational_operator_1=equals&filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&submit_apply_filter=&query=&rpp=10&sort_by=dc.date.issued_dt&order=desc">https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&filter_relational_operator_1=equals&filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&submit_apply_filter=&query=&rpp=10&sort_by=dc.date.issued_dt&order=desc</a></li>
|
||||
|
||||
<li><p>Help Marianne from WLE with an Open Search query to show the latest WLE CRP outputs: <a href="https://cgspace.cgiar.org/open-search/discover?query=crpsubject:WATER%2C+LAND+AND+ECOSYSTEMS&sort_by=2&order=DESC">https://cgspace.cgiar.org/open-search/discover?query=crpsubject:WATER%2C+LAND+AND+ECOSYSTEMS&sort_by=2&order=DESC</a></p></li>
|
||||
|
||||
<li><p>This uses the webui’s item list sort options, see <code>webui.itemlist.sort-option</code> in <code>dspace.cfg</code></p></li>
|
||||
|
||||
<li><p>The equivalent Discovery search would be: <a href="https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&filter_relational_operator_1=equals&filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&submit_apply_filter=&query=&rpp=10&sort_by=dc.date.issued_dt&order=desc">https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&filter_relational_operator_1=equals&filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&submit_apply_filter=&query=&rpp=10&sort_by=dc.date.issued_dt&order=desc</a></p></li>
|
||||
</ul>
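<ul>
<li><p>For context, the <code>sort_by=2</code> in that Open Search URL refers to the numbered sort options in <code>dspace.cfg</code>; the stock DSpace values look roughly like this (ours may differ):</p>

<pre><code>webui.itemlist.sort-option.1 = title:dc.title:title
webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
</code></pre></li>
</ul>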
|
||||
|
||||
<h2 id="2017-05-09">2017-05-09</h2>
|
||||
|
||||
<ul>
|
||||
<li>The CGIAR Library metadata has some blank metadata values, which leads to <code>|||</code> in the Discovery facets</li>
|
||||
<li>Clean these up in the database using:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Clean these up in the database using:</p>
|
||||
|
||||
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I ended up running into issues during data cleaning and decided to wipe out the entire community and re-sync DSpace Test assetstore and database from CGSpace rather than waiting for the cleanup task to clean up</li>
|
||||
<li>Hours into the re-ingestion I ran into more errors, and had to erase everything and start over <em>again</em>!</li>
|
||||
<li>Now, no matter what I do I keep getting foreign key errors…</li>
|
||||
</ul>
|
||||
<li><p>I ended up running into issues during data cleaning and decided to wipe out the entire community and re-sync DSpace Test assetstore and database from CGSpace rather than waiting for the cleanup task to clean up</p></li>
|
||||
|
||||
<li><p>Hours into the re-ingestion I ran into more errors, and had to erase everything and start over <em>again</em>!</p></li>
|
||||
|
||||
<li><p>Now, no matter what I do I keep getting foreign key errors…</p>
|
||||
|
||||
<pre><code>Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "handle_pkey"
|
||||
Detail: Key (handle_id)=(80928) already exists.
|
||||
</code></pre>
|
||||
Detail: Key (handle_id)=(80928) already exists.
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think those errors actually come from me running the <code>update-sequences.sql</code> script while Tomcat/DSpace are running</li>
|
||||
<li>Apparently you need to stop Tomcat!</li>
|
||||
<li><p>I think those errors actually come from me running the <code>update-sequences.sql</code> script while Tomcat/DSpace are running</p></li>
|
||||
|
||||
<li><p>Apparently you need to stop Tomcat!</p></li>
|
||||
</ul>
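<ul>
<li><p>In other words, the sequence update should look something like this (the service name and script path depend on the server layout):</p>

<pre><code>$ sudo systemctl stop tomcat7
$ psql -U dspace -f [dspace]/etc/postgres/update-sequences.sql dspace
$ sudo systemctl start tomcat7
</code></pre></li>
</ul>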
|
||||
|
||||
<h2 id="2017-05-10">2017-05-10</h2>
|
||||
@ -224,8 +226,8 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
|
||||
<ul>
|
||||
<li>Atmire says they are willing to extend the ORCID implementation, and I’ve asked them to provide a quote</li>
|
||||
<li>I clarified that the scope of the implementation should be that ORCIDs are stored in the database and exposed via REST / API like other fields</li>
|
||||
<li>Finally finished importing all the CGIAR Library content, final method was:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Finally finished importing all the CGIAR Library content, final method was:</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit"
|
||||
$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2517/10947-2517.zip
|
||||
@ -234,17 +236,19 @@ $ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@
|
||||
$ [dspace]/bin/dspace packager -s -t AIP -o ignoreHandle=false -e some@user.com -p 10568/80923 /home/aorth/10947-1/10947-1.zip
|
||||
$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done
|
||||
$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Basically, import the smaller communities using recursive AIP import (with <code>skipIfParentMissing</code>)</li>
|
||||
<li>Then, for the larger collection, create the community, collections, and items separately, ingesting the items one by one</li>
|
||||
<li>The <code>-XX:-UseGCOverheadLimit</code> JVM option helps with some issues in large imports</li>
|
||||
<li>After this I ran the <code>update-sequences.sql</code> script (with Tomcat shut down), and cleaned up the 200+ blank metadata records:</li>
|
||||
</ul>
|
||||
<li><p>Basically, import the smaller communities using recursive AIP import (with <code>skipIfParentMissing</code>)</p></li>
|
||||
|
||||
<li><p>Then, for the larger collection, create the community, collections, and items separately, ingesting the items one by one</p></li>
|
||||
|
||||
<li><p>The <code>-XX:-UseGCOverheadLimit</code> JVM option helps with some issues in large imports</p></li>
|
||||
|
||||
<li><p>After this I ran the <code>update-sequences.sql</code> script (with Tomcat shut down), and cleaned up the 200+ blank metadata records:</p>
|
||||
|
||||
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-05-13">2017-05-13</h2>
|
||||
|
||||
@ -261,13 +265,12 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
|
||||
<li>After that I started looking in the <code>dc.subject</code> field to try to pull countries and regions out, but there are too many values in there</li>
|
||||
<li>Bump the Academicons dependency of the Mirage 2 themes from 1.6.0 to 1.8.0 because the upstream deleted the old tag and now the build is failing: <a href="https://github.com/ilri/DSpace/pull/321">#321</a></li>
|
||||
<li>Merge changes to CCAFS project identifiers and flagships: <a href="https://github.com/ilri/DSpace/pull/320">#320</a></li>
|
||||
<li>Run updates for CCAFS flagships on CGSpace:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Run updates for CCAFS flagships on CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/ccafs-flagships-may7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><p>These include:</p>
|
||||
|
||||
<ul>
|
||||
@ -292,44 +295,38 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
|
||||
<h2 id="2017-05-17">2017-05-17</h2>
|
||||
|
||||
<ul>
|
||||
<li>Looking into the error I get when trying to create a new collection on DSpace Test:</li>
|
||||
</ul>
|
||||
<li><p>Looking into the error I get when trying to create a new collection on DSpace Test:</p>
|
||||
|
||||
<pre><code>ERROR: duplicate key value violates unique constraint "handle_pkey" Detail: Key (handle_id)=(84834) already exists.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I tried updating the sequences a few times, with Tomcat running and stopped, but it hasn’t helped</li>
|
||||
<li>It appears item with <code>handle_id</code> 84834 is one of the imported CGIAR Library items:</li>
|
||||
</ul>
|
||||
<li><p>I tried updating the sequences a few times, with Tomcat running and stopped, but it hasn’t helped</p></li>
|
||||
|
||||
<li><p>It appears item with <code>handle_id</code> 84834 is one of the imported CGIAR Library items:</p>
|
||||
|
||||
<pre><code>dspace=# select * from handle where handle_id=84834;
|
||||
handle_id | handle | resource_type_id | resource_id
|
||||
handle_id | handle | resource_type_id | resource_id
|
||||
-----------+------------+------------------+-------------
|
||||
84834 | 10947/1332 | 2 | 87113
|
||||
</code></pre>
|
||||
84834 | 10947/1332 | 2 | 87113
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looks like the max <code>handle_id</code> is actually much higher:</li>
|
||||
</ul>
|
||||
<li><p>Looks like the max <code>handle_id</code> is actually much higher:</p>
|
||||
|
||||
<pre><code>dspace=# select * from handle where handle_id=(select max(handle_id) from handle);
|
||||
handle_id | handle | resource_type_id | resource_id
|
||||
handle_id | handle | resource_type_id | resource_id
|
||||
-----------+----------+------------------+-------------
|
||||
86873 | 10947/99 | 2 | 89153
|
||||
86873 | 10947/99 | 2 | 89153
|
||||
(1 row)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I’ve posted on the dspace-tech mailing list to see if I can just manually set the <code>handle_seq</code> to that value</li>
|
||||
<li>Actually, it seems I can manually set the handle sequence using:</li>
|
||||
</ul>
|
||||
<li><p>I’ve posted on the dspace-tech mailing list to see if I can just manually set the <code>handle_seq</code> to that value</p></li>
|
||||
|
||||
<li><p>Actually, it seems I can manually set the handle sequence using:</p>
|
||||
|
||||
<pre><code>dspace=# select setval('handle_seq',86873);
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After that I can create collections just fine, though I’m not sure if it has other side effects</li>
|
||||
<li><p>After that I can create collections just fine, though I’m not sure if it has other side effects</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-05-21">2017-05-21</h2>
|
||||
@ -344,15 +341,13 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
|
||||
|
||||
<ul>
|
||||
<li>Do some cleanups of community and collection names in CGIAR System Management Office community on DSpace Test, as well as move some items as Peter requested</li>
|
||||
<li>Peter wanted a list of authors in here, so I generated a list of collections using the “View Source” on each community and this hacky awk:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Peter wanted a list of authors in here, so I generated a list of collections using the “View Source” on each community and this hacky awk:</p>
|
||||
|
||||
<pre><code>$ grep 10947/ /tmp/collections | grep -v cocoon | awk -F/ '{print $3"/"$4}' | awk -F\" '{print $1}' | vim -
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then I joined them together and ran this old SQL query from the dspace-tech mailing list which gives you authors for items in those collections:</li>
|
||||
</ul>
|
||||
<li><p>Then I joined them together and ran this old SQL query from the dspace-tech mailing list which gives you authors for items in those collections:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value
|
||||
from metadatavalue
|
||||
@ -367,18 +362,17 @@ AND resource_id IN (select item_id from collection2item where collection_id IN (
|
||||
47/2519', '10947/2708', '10947/2526', '10947/2871', '10947/2527', '10947/4467', '10947/3457', '10947/2528', '10947/2529', '10947/2533', '10947/2530', '10947/2
|
||||
531', '10947/2532', '10947/2538', '10947/2534', '10947/2540', '10947/2900', '10947/2539', '10947/2784', '10947/2536', '10947/2805', '10947/2541', '10947/2535'
|
||||
, '10947/2537', '10568/93761')));
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>To get a CSV (with counts) from that:</li>
|
||||
</ul>
|
||||
<li><p>To get a CSV (with counts) from that:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*)
|
||||
from metadatavalue
|
||||
where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author')
|
||||
AND resource_type_id = 2
|
||||
AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10947/2', '10947/3', '10947/10', '10947/4', '10947/5', '10947/6', '10947/7', '10947/8', '10947/9', '10947/11', '10947/25', '10947/12', '10947/26', '10947/27', '10947/28', '10947/29', '10947/30', '10947/13', '10947/14', '10947/15', '10947/16', '10947/31', '10947/32', '10947/33', '10947/34', '10947/35', '10947/36', '10947/37', '10947/17', '10947/18', '10947/38', '10947/19', '10947/39', '10947/40', '10947/41', '10947/42', '10947/43', '10947/2512', '10947/44', '10947/20', '10947/21', '10947/45', '10947/46', '10947/47', '10947/48', '10947/49', '10947/22', '10947/23', '10947/24', '10947/50', '10947/51', '10947/2518', '10947/2776', '10947/2790', '10947/2521', '10947/2522', '10947/2782', '10947/2525', '10947/2836', '10947/2524', '10947/2878', '10947/2520', '10947/2523', '10947/2786', '10947/2631', '10947/2589', '10947/2519', '10947/2708', '10947/2526', '10947/2871', '10947/2527', '10947/4467', '10947/3457', '10947/2528', '10947/2529', '10947/2533', '10947/2530', '10947/2531', '10947/2532', '10947/2538', '10947/2534', '10947/2540', '10947/2900', '10947/2539', '10947/2784', '10947/2536', '10947/2805', '10947/2541', '10947/2535', '10947/2537', '10568/93761'))) group by text_value order by count desc) to /tmp/cgiar-librar-authors.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-05-23">2017-05-23</h2>
|
||||
|
||||
@ -386,15 +380,14 @@ AND resource_id IN (select item_id from collection2item where collection_id IN (
|
||||
<li>Add Affiliation to filters on Listing and Reports module (<a href="https://github.com/ilri/DSpace/pull/325">#325</a>)</li>
|
||||
<li>Start looking at WLE’s Phase II metadata updates but it seems they are not tagging their items properly, as their website importer infers which theme to use based on the name of the CGSpace collection!</li>
|
||||
<li>For now I’ve suggested that they just change the collection names and that we fix their metadata manually afterwards</li>
|
||||
<li>Also, they have a lot of messed up values in their <code>cg.subject.wle</code> field so I will clean up some of those first:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Also, they have a lot of messed up values in their <code>cg.subject.wle</code> field so I will clean up some of those first:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id=119) to /tmp/wle.csv with csv;
|
||||
COPY 111
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Respond to Atmire message about ORCIDs, saying that right now we’d prefer to just have them available via REST API like any other metadata field, and that I’m available for a Skype</li>
|
||||
<li><p>Respond to Atmire message about ORCIDs, saying that right now we’d prefer to just have them available via REST API like any other metadata field, and that I’m available for a Skype</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-05-26">2017-05-26</h2>
|
||||
@ -410,38 +403,33 @@ COPY 111
|
||||
<li>File an issue on GitHub to explore/track migration to proper country/region codes (ISO 2/3 and UN M.49): <a href="https://github.com/ilri/DSpace/issues/326">#326</a></li>
|
||||
<li>Ask Peter how the Landportal.info people should acknowledge us as the source of data on their website</li>
|
||||
<li>Communicate with MARLO people about progress on exposing ORCIDs via the REST API, as it is set to be discussed in the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+June+2017">June, 2017 DCAT meeting</a></li>
|
||||
<li>Find all of Amos Omore’s author name variations so I can link them to his authority entry that has an ORCID:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Find all of Amos Omore’s author name variations so I can link them to his authority entry that has an ORCID:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like 'Omore, A%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Set the authority for all variations to one containing an ORCID:</li>
|
||||
</ul>
|
||||
<li><p>Set the authority for all variations to one containing an ORCID:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set authority='4428ee88-90ef-4107-b837-3c0ec988520b', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Omore, A%';
|
||||
UPDATE 187
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Next I need to do Edgar Twine:</li>
|
||||
</ul>
|
||||
<li><p>Next I need to do Edgar Twine:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like 'Twine, E%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But it doesn’t look like any of his existing entries are linked to an authority which has an ORCID, so I edited the metadata via “Edit this Item” and looked up his ORCID and linked it there</li>
|
||||
<li>Now I should be able to set his name variations to the new authority:</li>
|
||||
</ul>
|
||||
<li><p>But it doesn’t look like any of his existing entries are linked to an authority which has an ORCID, so I edited the metadata via “Edit this Item” and looked up his ORCID and linked it there</p></li>
|
||||
|
||||
<li><p>Now I should be able to set his name variations to the new authority:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set authority='f70d0a01-d562-45b8-bca3-9cf7f249bc8b', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Twine, E%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run the corrections on CGSpace and then update discovery / authority</li>
|
||||
<li>I notice that there are a handful of <code>java.lang.OutOfMemoryError: Java heap space</code> errors in the Catalina logs on CGSpace, I should go look into that…</li>
|
||||
<li><p>Run the corrections on CGSpace and then update discovery / authority</p></li>
|
||||
|
||||
<li><p>I notice that there are a handful of <code>java.lang.OutOfMemoryError: Java heap space</code> errors in the Catalina logs on CGSpace, I should go look into that…</p></li>
|
||||
</ul>
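<ul>
<li><p>The discovery / authority updates are presumably the usual index commands, something like:</p>

<pre><code>$ [dspace]/bin/dspace index-discovery
$ [dspace]/bin/dspace index-authority
</code></pre></li>

<li><p>A quick way to see how often those heap space errors are happening (the Catalina log location may differ on our Tomcat):</p>

<pre><code>$ grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
</code></pre></li>
</ul>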
|
||||
|
||||
<h2 id="2017-05-29">2017-05-29</h2>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="June, 2017"/>
|
||||
<meta name="twitter:description" content="2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we’ll create a new sub-community for Phase II and create collections for the research themes there The current “Research Themes” community will be renamed to “WLE Phase I Research Themes” Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -148,16 +148,17 @@
|
||||
<li>Command like: <code>$ gs -dNOPAUSE -dBATCH -dFirstPage=14 -dLastPage=27 -sDEVICE=pdfwrite -sOutputFile=beans.pdf -f 12605-1.pdf</code></li>
|
||||
</ul></li>
|
||||
<li>17 of the items have issues with incorrect page number ranges, and upon closer inspection they do not appear in the referenced PDF</li>
|
||||
<li>I’ve flagged them and proceeded without them (752 total) on DSpace Test:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I’ve flagged them and proceeded without them (752 total) on DSpace Test:</p>
|
||||
|
||||
<pre><code>$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/93843 --source /home/aorth/src/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &> /tmp/ciat-books.log
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I went and did some basic sanity checks on the remaining items in the CIAT Book Chapters and decided they are mostly fine (except one duplicate and the flagged ones), so I imported them to DSpace Test too (162 items)</li>
|
||||
<li>Total items in CIAT Book Chapters is 914, with the others being flagged for some reason, and we should send that back to CIAT</li>
|
||||
<li>Restart Tomcat on CGSpace so that the <code>cg.identifier.wletheme</code> field is available on REST API for Macaroni Bros</li>
|
||||
<li><p>I went and did some basic sanity checks on the remaining items in the CIAT Book Chapters and decided they are mostly fine (except one duplicate and the flagged ones), so I imported them to DSpace Test too (162 items)</p></li>
|
||||
|
||||
<li><p>Total items in CIAT Book Chapters is 914, with the others being flagged for some reason, and we should send that back to CIAT</p></li>
|
||||
|
||||
<li><p>Restart Tomcat on CGSpace so that the <code>cg.identifier.wletheme</code> field is available on REST API for Macaroni Bros</p></li>
|
||||
</ul>
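<p>A quick way to confirm the field is exposed after the Tomcat restart is to ask the REST API for a few items with metadata expanded and grep for it (hostname and item count here are just examples):</p>

<pre><code>$ curl -s -H "Accept: application/json" 'https://cgspace.cgiar.org/rest/items?expand=metadata&limit=5' | grep -o 'cg.identifier.wletheme' | wc -l
</code></pre>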
|
||||
|
||||
<h2 id="2017-06-07">2017-06-07</h2>
|
||||
@ -167,8 +168,8 @@
|
||||
<li>Still doesn’t seem to give results I’d expect, like there are no results for Maria Garruccio, or for the ILRI community!</li>
|
||||
<li>Then I’ll file an update to the issue on Atmire’s tracker</li>
|
||||
<li>Created a new branch with just the relevant changes, so I can send it to them</li>
|
||||
<li>One thing I noticed is that there is a failed database migration related to CUA:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>One thing I noticed is that there is a failed database migration related to CUA:</p>
|
||||
|
||||
<pre><code>+----------------+----------------------------+---------------------+---------+
|
||||
| Version | Description | Installed on | State |
|
||||
@ -194,10 +195,9 @@
|
||||
| 5.5.2015.12.03 | Atmire MQM migration | 2016-11-27 06:39:06 | OutOrde |
|
||||
| 5.6.2016.08.08 | CUA emailreport migration | 2017-01-29 11:18:56 | OutOrde |
|
||||
+----------------+----------------------------+---------------------+---------+
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Merge the pull request for <a href="https://github.com/ilri/DSpace/pull/328">WLE Phase II themes</a></li>
|
||||
<li><p>Merge the pull request for <a href="https://github.com/ilri/DSpace/pull/328">WLE Phase II themes</a></p></li>
|
||||
</ul>
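<p>For reference, the Flyway migration table above is the sort of output DSpace’s database utility prints, presumably from something like:</p>

<pre><code>$ [dspace]/bin/dspace database info
</code></pre>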
|
||||
|
||||
<h2 id="2017-06-18">2017-06-18</h2>
|
||||
@ -220,53 +220,56 @@
|
||||
<li><code>replace(value,/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/,'')</code></li>
|
||||
<li><code>value.unescape("html").unescape("xml")</code></li>
|
||||
</ul></li>
|
||||
<li>Finally import 914 CIAT Book Chapters to CGSpace in two batches:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Finally import 914 CIAT Book Chapters to CGSpace in two batches:</p>
|
||||
|
||||
<pre><code>$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &> /tmp/ciat-books.log
|
||||
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books2.map &> /tmp/ciat-books2.log
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-06-25">2017-06-25</h2>
|
||||
|
||||
<ul>
|
||||
<li>WLE has said that one of their Phase II research themes is being renamed from <code>Regenerating Degraded Landscapes</code> to <code>Restoring Degraded Landscapes</code></li>
|
||||
<li>Pull request with the changes to <code>input-forms.xml</code>: <a href="https://github.com/ilri/DSpace/pull/329">#329</a></li>
|
||||
<li>As of now it doesn’t look like there are any items using this research theme so we don’t need to do any updates:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>As of now it doesn’t look like there are any items using this research theme so we don’t need to do any updates:</p>
|
||||
|
||||
<pre><code>dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=237 and text_value like 'Regenerating Degraded Landscapes%';
|
||||
text_value
|
||||
text_value
|
||||
------------
|
||||
(0 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Marianne from WLE asked if they can have both Phase I and II research themes together in the item submission form</li>
|
||||
<li>Perhaps we can add them together in the same question for <code>cg.identifier.wletheme</code></li>
|
||||
<li><p>Marianne from WLE asked if they can have both Phase I and II research themes together in the item submission form</p></li>
|
||||
|
||||
<li><p>Perhaps we can add them together in the same question for <code>cg.identifier.wletheme</code></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-06-30">2017-06-30</h2>
|
||||
|
||||
<ul>
|
||||
<li>CGSpace went down briefly, I see lots of these errors in the dspace logs:</li>
|
||||
</ul>
|
||||
<li><p>CGSpace went down briefly, I see lots of these errors in the dspace logs:</p>
|
||||
|
||||
<pre><code>Java stacktrace: java.util.NoSuchElementException: Timeout waiting for idle object
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After looking at the Tomcat logs, Munin graphs, and PostgreSQL connection stats, it seems there is just a high load</li>
|
||||
<li>Might be a good time to adjust DSpace’s database connection settings, like I first mentioned in April, 2017 after reading the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">2017-04 DCAT comments</a></li>
|
||||
<li>I’ve adjusted the following in CGSpace’s config:
|
||||
<li><p>After looking at the Tomcat logs, Munin graphs, and PostgreSQL connection stats, it seems there is just a high load</p></li>
|
||||
|
||||
<li><p>Might be a good time to adjust DSpace’s database connection settings, like I first mentioned in April, 2017 after reading the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">2017-04 DCAT comments</a></p></li>
|
||||
|
||||
<li><p>I’ve adjusted the following in CGSpace’s config:</p>
|
||||
|
||||
<ul>
|
||||
<li><code>db.maxconnections</code> 30→70 (the default PostgreSQL config allows 100 connections, so DSpace’s default of 30 is quite low)</li>
|
||||
<li><code>db.maxwait</code> 5000→10000</li>
|
||||
<li><code>db.maxidle</code> 8→20 (DSpace default is -1, unlimited, but we had set it to 8 earlier)</li>
|
||||
</ul></li>
|
||||
<li>We will need to adjust this again (as well as the <code>pg_hba.conf</code> settings) when we deploy tsega’s REST API</li>
|
||||
<li>Whip up a test for Marianne of WLE to be able to show both their Phase I and II research themes in the CGSpace item submission form:</li>
|
||||
|
||||
<li><p>We will need to adjust this again (as well as the <code>pg_hba.conf</code> settings) when we deploy tsega’s REST API</p></li>
|
||||
|
||||
<li><p>Whip up a test for Marianne of WLE to be able to show both their Phase I and II research themes in the CGSpace item submission form:</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2017/06/wle-theme-test-a.png" alt="Test A for displaying the Phase I and II research themes" />
|
||||
|
@ -39,7 +39,7 @@ Merge changes for WLE Phase II theme rename (#329)
|
||||
Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
|
||||
We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -155,32 +155,30 @@ We can use PostgreSQL’s extended output format (-x) plus sed to format the
|
||||
|
||||
<ul>
|
||||
<li>Adjust WLE Research Theme to include both Phase I and II on the submission form according to editor feedback (<a href="https://github.com/ilri/DSpace/pull/330">#330</a>)</li>
|
||||
<li>Generate list of fields in the current CGSpace <code>cg</code> scheme so we can record them properly in the metadata registry:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Generate list of fields in the current CGSpace <code>cg</code> scheme so we can record them properly in the metadata registry:</p>
|
||||
|
||||
<pre><code>$ psql dspace -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=2 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:</dc-type>\n<dc-type>\n<schema>cg</schema>:;s:([^ ]*) +\| (.*): <\1>\2</\1>:;s:^$:</dc-type>:;1s:</dc-type>\n::' > cg-types.xml
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>CGSpace was unavailable briefly, and I saw this error in the DSpace log file:</li>
|
||||
</ul>
|
||||
<li><p>CGSpace was unavailable briefly, and I saw this error in the DSpace log file:</p>
|
||||
|
||||
<pre><code>2017-07-05 13:05:36,452 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
|
||||
org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking at the <code>pg_stat_activity</code> table I saw there were indeed 98 active connections to PostgreSQL, and at this time the limit is 100, so that makes sense</li>
|
||||
<li>Tsega restarted Tomcat and it’s working now</li>
|
||||
<li>Abenet said she was generating a report with Atmire’s CUA module, so it could be due to that?</li>
|
||||
<li>Looking in the logs I see this random error again that I should report to DSpace:</li>
|
||||
</ul>
|
||||
<li><p>Looking at the <code>pg_stat_activity</code> table I saw there were indeed 98 active connections to PostgreSQL, and at this time the limit is 100, so that makes sense</p></li>
|
||||
|
||||
<li><p>Tsega restarted Tomcat and it’s working now</p></li>
|
||||
|
||||
<li><p>Abenet said she was generating a report with Atmire’s CUA module, so it could be due to that?</p></li>
|
||||
|
||||
<li><p>Looking in the logs I see this random error again that I should report to DSpace:</p>
|
||||
|
||||
<pre><code>2017-07-05 13:50:07,196 ERROR org.dspace.statistics.SolrLogger @ COUNTRY ERROR: EU
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Seems to come from <code>dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java</code></li>
|
||||
<li><p>Seems to come from <code>dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java</code></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-07-06">2017-07-06</h2>
|
||||
@ -236,14 +234,12 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
|
||||
<h2 id="2017-07-24">2017-07-24</h2>
|
||||
|
||||
<ul>
|
||||
<li>Move two top-level communities to be sub-communities of ILRI Projects</li>
|
||||
</ul>
|
||||
<li><p>Move two top-level communities to be sub-communities of ILRI Projects</p>
|
||||
|
||||
<pre><code>$ for community in 10568/2347 10568/25209; do /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child="$community"; done
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Discuss CGIAR Library data cleanup with Sisay and Abenet</li>
|
||||
<li><p>Discuss CGIAR Library data cleanup with Sisay and Abenet</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-07-27">2017-07-27</h2>
|
||||
@ -279,27 +275,25 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
|
||||
<h2 id="2017-07-31">2017-07-31</h2>
|
||||
|
||||
<ul>
|
||||
<li>Looks like the final list of metadata corrections for CCAFS project tags will be:</li>
|
||||
</ul>
|
||||
<li><p>Looks like the final list of metadata corrections for CCAFS project tags will be:</p>
|
||||
|
||||
<pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
|
||||
update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
|
||||
update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
|
||||
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Now just waiting to run them on CGSpace, and then apply the modified input forms after Macaroni Bros give me an updated list</li>
|
||||
<li>Temporarily increase the nginx upload limit to 200MB for Sisay to upload the CIAT presentations</li>
|
||||
<li>Looking at the CGSpace activity page, there are 52 Baidu bots concurrently crawling our website (I copied the activity page to a text file and grepped it)!</li>
|
||||
</ul>
|
||||
<li><p>Now just waiting to run them on CGSpace, and then apply the modified input forms after Macaroni Bros give me an updated list</p></li>
|
||||
|
||||
<li><p>Temporarily increase the nginx upload limit to 200MB for Sisay to upload the CIAT presentations</p></li>
|
||||
|
||||
<li><p>Looking at the CGSpace activity page, there are 52 Baidu bots concurrently crawling our website (I copied the activity page to a text file and grepped it)!</p>
|
||||
|
||||
<pre><code>$ grep 180.76. /tmp/status | awk '{print $5}' | sort | uniq | wc -l
|
||||
52
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>From looking at the <code>dspace.log</code> I see they are all using the same session, which means our Crawler Session Manager Valve is working</li>
|
||||
<li><p>From looking at the <code>dspace.log</code> I see they are all using the same session, which means our Crawler Session Manager Valve is working</p></li>
|
||||
</ul>
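<p>The spot check for that was grepping the DSpace log for one of the Baidu IPs and counting distinct session IDs; if the Crawler Session Manager Valve is working this should print 1 (the IP below is just an example from Baidu’s 180.76.0.0/16 range):</p>

<pre><code>$ grep ip_addr=180.76.15.154 dspace.log.2017-07-31 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
</code></pre>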
|
||||
|
||||
|
||||
|
@ -59,7 +59,7 @@ This was due to newline characters in the dc.description.abstract column, which
|
||||
I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
|
||||
Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -220,14 +220,13 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
<li>Had to do some quality checks and column renames before importing, as either Sisay or Abenet renamed a few columns and the metadata importer wanted to remove/add new metadata for title, abstract, etc.</li>
|
||||
<li>Also I applied the HTML entities unescape transform on the abstract column in Open Refine</li>
|
||||
<li>I need to get an author list from the database for only the CGIAR Library community to send to Peter</li>
|
||||
<li>It turns out that I had already used this SQL query in <a href="/cgspace-notes/2017-05">May, 2017</a> to get the authors from CGIAR Library:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It turns out that I had already used this SQL query in <a href="/cgspace-notes/2017-05">May, 2017</a> to get the authors from CGIAR Library:</p>
|
||||
|
||||
<pre><code>dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9'))) group by text_value order by count desc) to /tmp/cgiar-library-authors.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Meeting with Peter and CGSpace team
|
||||
<li><p>Meeting with Peter and CGSpace team</p>
|
||||
|
||||
<ul>
|
||||
<li>Alan to follow up with ICARDA about depositing in CGSpace; we want ICARDA and Drylands legacy content but not duplicates</li>
|
||||
@ -235,8 +234,10 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
<li>Alan to follow up with Atmire about a dedicated field for ORCIDs, based on the discussion in the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+June+2017">June, 2017 DCAT meeting</a></li>
|
||||
<li>Alan to ask about how to query external services like AGROVOC in the DSpace submission form</li>
|
||||
</ul></li>
|
||||
<li>Follow up with Atmire on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">ticket about ORCID metadata in DSpace</a></li>
|
||||
<li>Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates</li>
|
||||
|
||||
<li><p>Follow up with Atmire on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">ticket about ORCID metadata in DSpace</a></p></li>
|
||||
|
||||
<li><p>Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-08-11">2017-08-11</h2>
|
||||
@ -254,29 +255,29 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
|
||||
<ul>
|
||||
<li>I sent a message to the mailing list about the duplicate content issue with <code>/rest</code> and <code>/bitstream</code> URLs</li>
|
||||
<li>Looking at the logs for the REST API on <code>/rest</code>, it looks like someone is hammering it, doing testing or something…</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Looking at the logs for the REST API on <code>/rest</code>, it looks like someone is hammering it, doing testing or something…</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 5
|
||||
140 66.249.66.91
|
||||
404 66.249.66.90
|
||||
1479 50.116.102.77
|
||||
9794 45.5.184.196
|
||||
85736 70.32.83.92
|
||||
</code></pre>
|
||||
140 66.249.66.91
|
||||
404 66.249.66.90
|
||||
1479 50.116.102.77
|
||||
9794 45.5.184.196
|
||||
85736 70.32.83.92
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The top offender is 70.32.83.92 which is actually the same IP as ccafs.cgiar.org, so I will email the Macaroni Bros to see if they can test on DSpace Test instead</li>
|
||||
<li>I’ve enabled logging of <code>/oai</code> requests on nginx as well so we can potentially determine bad actors here (also to see if anyone is actually using OAI!)</li>
|
||||
<li><p>The top offender is 70.32.83.92 which is actually the same IP as ccafs.cgiar.org, so I will email the Macaroni Bros to see if they can test on DSpace Test instead</p></li>
|
||||
|
||||
<li><p>I’ve enabled logging of <code>/oai</code> requests on nginx as well so we can potentially determine bad actors here (also to see if anyone is actually using OAI!)</p>
|
||||
|
||||
<pre><code># log oai requests
|
||||
location /oai {
|
||||
access_log /var/log/nginx/oai.log;
|
||||
proxy_pass http://tomcat_http;
|
||||
}
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<pre><code> # log oai requests
|
||||
location /oai {
|
||||
access_log /var/log/nginx/oai.log;
|
||||
proxy_pass http://tomcat_http;
|
||||
}
|
||||
</code></pre>
|
||||
|
||||
<h2 id="2017-08-13">2017-08-13</h2>
|
||||
|
||||
<ul>
|
||||
@ -287,27 +288,25 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
<h2 id="2017-08-14">2017-08-14</h2>
|
||||
|
||||
<ul>
|
||||
<li>Run author corrections on CGIAR Library community from Peter</li>
|
||||
</ul>
|
||||
<li><p>Run author corrections on CGIAR Library community from Peter</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/authors-fix-523.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p fuuuu
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There were only three deletions so I just did them manually:</li>
|
||||
</ul>
|
||||
<li><p>There were only three deletions so I just did them manually:</p>
|
||||
|
||||
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='C';
|
||||
DELETE 1
|
||||
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='WSSD';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Generate a new list of authors from the CGIAR Library community for Peter to look through now that the initial corrections have been done</li>
|
||||
<li>Thinking about resource limits for PostgreSQL again after last week’s CGSpace crash and related to a recent discussion I had in the comments of the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting notes</a></li>
|
||||
<li>In that thread Chris Wilper suggests a new default of 35 max connections for <code>db.maxconnections</code> (from the current default of 30), knowing that <em>each DSpace web application</em> gets to use up to this many on its own</li>
|
||||
<li>It would be good to approximate what the theoretical maximum number of connections on a busy server would be, perhaps by looking to see which apps use SQL:</li>
|
||||
</ul>
|
||||
<li><p>Generate a new list of authors from the CGIAR Library community for Peter to look through now that the initial corrections have been done</p></li>
|
||||
|
||||
<li><p>Thinking about resource limits for PostgreSQL again after last week’s CGSpace crash and related to a recent discussion I had in the comments of the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting notes</a></p></li>
|
||||
|
||||
<li><p>In that thread Chris Wilper suggests a new default of 35 max connections for <code>db.maxconnections</code> (from the current default of 30), knowing that <em>each DSpace web application</em> gets to use up to this many on its own</p></li>
|
||||
|
||||
<li><p>It would be good to approximate what the theoretical maximum number of connections on a busy server would be, perhaps by looking to see which apps use SQL:</p>
|
||||
|
||||
<pre><code>$ grep -rsI SQLException dspace-jspui | wc -l
|
||||
473
|
||||
@ -319,18 +318,25 @@ $ grep -rsI SQLException dspace-solr | wc -l
|
||||
0
|
||||
$ grep -rsI SQLException dspace-xmlui | wc -l
|
||||
866
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Of those five applications we’re running, only <code>solr</code> appears not to use the database directly</li>
|
||||
<li>And JSPUI is only used internally (so it doesn’t really count), leaving us with OAI, REST, and XMLUI</li>
|
||||
<li>Assuming each takes a theoretical maximum of 35 connections during a heavy load (35 * 3 = 105), that would put the connections well above PostgreSQL’s default max of 100 connections (remember a handful of connections are reserved for the PostgreSQL super user, see <code>superuser_reserved_connections</code>)</li>
|
||||
<li>So we should adjust PostgreSQL’s max connections to be DSpace’s <code>db.maxconnections</code> * 3 + 3</li>
|
||||
<li>This would allow each application to use up to <code>db.maxconnections</code> and not to go over the system’s PostgreSQL limit</li>
|
||||
<li>Perhaps since CGSpace is a busy site with lots of resources we could actually use something like 40 for <code>db.maxconnections</code></li>
|
||||
<li>Also worth looking into is to set up a database pool using JNDI, as apparently DSpace’s <code>db.poolname</code> hasn’t been used since around DSpace 1.7 (according to Chris Wilper’s comments in the thread)</li>
|
||||
<li>Need to go check the PostgreSQL connection stats in Munin on CGSpace from the past week to get an idea if 40 is appropriate</li>
|
||||
<li>Looks like connections hover around 50:</li>
|
||||
<li><p>Of those five applications we’re running, only <code>solr</code> appears not to use the database directly</p></li>
|
||||
|
||||
<li><p>And JSPUI is only used internally (so it doesn’t really count), leaving us with OAI, REST, and XMLUI</p></li>
|
||||
|
||||
<li><p>Assuming each takes a theoretical maximum of 35 connections during a heavy load (35 * 3 = 105), that would put the connections well above PostgreSQL’s default max of 100 connections (remember a handful of connections are reserved for the PostgreSQL super user, see <code>superuser_reserved_connections</code>)</p></li>
|
||||
|
||||
<li><p>So we should adjust PostgreSQL’s max connections to be DSpace’s <code>db.maxconnections</code> * 3 + 3</p></li>
|
||||
|
||||
<li><p>This would allow each application to use up to <code>db.maxconnections</code> and not to go over the system’s PostgreSQL limit</p></li>
|
||||
|
||||
<li><p>Perhaps since CGSpace is a busy site with lots of resources we could actually use something like 40 for <code>db.maxconnections</code></p></li>
|
||||
|
||||
<li><p>Also worth looking into is to set up a database pool using JNDI, as apparently DSpace’s <code>db.poolname</code> hasn’t been used since around DSpace 1.7 (according to Chris Wilper’s comments in the thread)</p></li>
|
||||
|
||||
<li><p>Need to go check the PostgreSQL connection stats in Munin on CGSpace from the past week to get an idea if 40 is appropriate</p></li>
|
||||
|
||||
<li><p>Looks like connections hover around 50:</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2017/08/postgresql-connections-cgspace.png" alt="PostgreSQL connections 2017-08" /></p>
|
||||
@ -356,67 +362,61 @@ $ grep -rsI SQLException dspace-xmlui | wc -l
|
||||
<h2 id="2017-08-16">2017-08-16</h2>
|
||||
|
||||
<ul>
|
||||
<li>I wanted to merge the various field variations like <code>cg.subject.system</code> and <code>cg.subject.system[en_US]</code> in OpenRefine but I realized it would be easier in PostgreSQL:</li>
|
||||
</ul>
|
||||
<li><p>I wanted to merge the various field variations like <code>cg.subject.system</code> and <code>cg.subject.system[en_US]</code> in OpenRefine but I realized it would be easier in PostgreSQL:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, text_lang from metadatavalue where resource_type_id=2 and metadata_field_id=254;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And actually, we can do it for other generic fields for items in those collections, for example <code>dc.description.abstract</code>:</li>
|
||||
</ul>
|
||||
<li><p>And actually, we can do it for other generic fields for items in those collections, for example <code>dc.description.abstract</code>:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_lang='en_US' where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'abstract') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9')))
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And on others like <code>dc.language.iso</code>, <code>dc.relation.ispartofseries</code>, <code>dc.type</code>, <code>dc.title</code>, etc…</li>
|
||||
<li>Also, to move fields from <code>dc.identifier.url</code> to <code>cg.identifier.url[en_US]</code> (because we don’t use the Dublin Core one for some reason):</li>
|
||||
</ul>
|
||||
<li><p>And on others like <code>dc.language.iso</code>, <code>dc.relation.ispartofseries</code>, <code>dc.type</code>, <code>dc.title</code>, etc…</p></li>
|
||||
|
||||
<li><p>Also, to move fields from <code>dc.identifier.url</code> to <code>cg.identifier.url[en_US]</code> (because we don’t use the Dublin Core one for some reason):</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set metadata_field_id = 219, text_lang = 'en_US' where resource_type_id = 2 AND metadata_field_id = 237;
|
||||
UPDATE 15
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Set the text_lang of all <code>dc.identifier.uri</code> (Handle) fields to be NULL, just like default DSpace does:</li>
|
||||
</ul>
|
||||
<li><p>Set the text_lang of all <code>dc.identifier.uri</code> (Handle) fields to be NULL, just like default DSpace does:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_lang=NULL where resource_type_id = 2 and metadata_field_id = 25 and text_value like 'http://hdl.handle.net/10947/%';
|
||||
UPDATE 4248
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also update the text_lang of <code>dc.contributor.author</code> fields for metadata in these collections:</li>
|
||||
</ul>
|
||||
<li><p>Also update the text_lang of <code>dc.contributor.author</code> fields for metadata in these collections:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_lang=NULL where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/93761', '10947/1', '10947/10', '10947/11', '10947/12', '10947/13', '10947/14', '10947/15', '10947/16', '10947/17', '10947/18', '10947/19', '10947/2', '10947/20', '10947/21', '10947/22', '10947/23', '10947/24', '10947/25', '10947/2512', '10947/2515', '10947/2516', '10947/2517', '10947/2518', '10947/2519', '10947/2520', '10947/2521', '10947/2522', '10947/2523', '10947/2524', '10947/2525', '10947/2526', '10947/2527', '10947/2528', '10947/2529', '10947/2530', '10947/2531', '10947/2532', '10947/2533', '10947/2534', '10947/2535', '10947/2536', '10947/2537', '10947/2538', '10947/2539', '10947/2540', '10947/2541', '10947/2589', '10947/26', '10947/2631', '10947/27', '10947/2708', '10947/2776', '10947/2782', '10947/2784', '10947/2786', '10947/2790', '10947/28', '10947/2805', '10947/2836', '10947/2871', '10947/2878', '10947/29', '10947/2900', '10947/2919', '10947/3', '10947/30', '10947/31', '10947/32', '10947/33', '10947/34', '10947/3457', '10947/35', '10947/36', '10947/37', '10947/38', '10947/39', '10947/4', '10947/40', '10947/4052', '10947/4054', '10947/4056', '10947/4068', '10947/41', '10947/42', '10947/43', '10947/4368', '10947/44', '10947/4467', '10947/45', '10947/4508', '10947/4509', '10947/4510', '10947/4573', '10947/46', '10947/4635', '10947/4636', '10947/4637', '10947/4638', '10947/4639', '10947/4651', '10947/4657', '10947/47', '10947/48', '10947/49', '10947/5', '10947/50', '10947/51', '10947/5308', '10947/5322', '10947/5324', '10947/5326', '10947/6', '10947/7', '10947/8', '10947/9')));
|
||||
UPDATE 4899
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Wow, I just wrote this baller regex facet to find duplicate authors:</li>
|
||||
</ul>
|
||||
<li><p>Wow, I just wrote this baller regex facet to find duplicate authors:</p>
|
||||
|
||||
<pre><code>isNotNull(value.match(/(CGIAR .+?)\|\|\1/))
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This would be true if the authors were like <code>CGIAR System Management Office||CGIAR System Management Office</code>, which some of the CGIAR Library’s were</li>
|
||||
<li>Unfortunately when you fix these in OpenRefine and then submit the metadata to DSpace it doesn’t detect any changes, so you have to edit them all manually via DSpace’s “Edit Item”</li>
|
||||
<li>Ooh! And an even more interesting regex would match <em>any</em> duplicated author:</li>
|
||||
</ul>
|
||||
<li><p>This would be true if the authors were like <code>CGIAR System Management Office||CGIAR System Management Office</code>, which some of the CGIAR Library’s were</p></li>
|
||||
|
||||
<li><p>Unfortunately when you fix these in OpenRefine and then submit the metadata to DSpace it doesn’t detect any changes, so you have to edit them all manually via DSpace’s “Edit Item”</p></li>
|
||||
|
||||
<li><p>Ooh! And an even more interesting regex would match <em>any</em> duplicated author:</p>
|
||||
|
||||
<pre><code>isNotNull(value.match(/(.+?)\|\|\1/))
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Which means it can also be used to find items with duplicate <code>dc.subject</code> fields…</li>
|
||||
<li>Finally sent Peter the final dump of the CGIAR System Organization community so he can have a last look at it</li>
|
||||
<li>Post a message to the dspace-tech mailing list to ask about querying the AGROVOC API from the submission form</li>
|
||||
<li>Abenet was asking if there was some way to hide certain internal items from the “ILRI Research Outputs” RSS feed (which is the top-level ILRI community feed), because Shirley was complaining</li>
|
||||
<li>I think we could use <code>harvest.includerestricted.rss = false</code> but the items might need to be 100% restricted, not just the metadata</li>
|
||||
<li>Adjust Ansible postgres role to use <code>max_connections</code> from a template variable and deploy a new limit of 123 on CGSpace</li>
|
||||
<li><p>Which means it can also be used to find items with duplicate <code>dc.subject</code> fields…</p></li>
|
||||
|
||||
<li><p>Finally sent Peter the final dump of the CGIAR System Organization community so he can have a last look at it</p></li>
|
||||
|
||||
<li><p>Post a message to the dspace-tech mailing list to ask about querying the AGROVOC API from the submission form</p></li>
|
||||
|
||||
<li><p>Abenet was asking if there was some way to hide certain internal items from the “ILRI Research Outputs” RSS feed (which is the top-level ILRI community feed), because Shirley was complaining</p></li>
|
||||
|
||||
<li><p>I think we could use <code>harvest.includerestricted.rss = false</code> but the items might need to be 100% restricted, not just the metadata</p></li>
|
||||
|
||||
<li><p>Adjust Ansible postgres role to use <code>max_connections</code> from a template variable and deploy a new limit of 123 on CGSpace</p></li>
|
||||
</ul>
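<p>After the Ansible run the new limit is easy to verify on the server itself, and should now report 123 (run as the postgres user):</p>

<pre><code>$ psql -c 'SHOW max_connections;'
</code></pre>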
|
||||
|
||||
<h2 id="2017-08-17">2017-08-17</h2>
|
||||
@ -424,16 +424,14 @@ UPDATE 4899
|
||||
<ul>
|
||||
<li>Run Peter’s edits to the CGIAR System Organization community on DSpace Test</li>
|
||||
<li>Uptime Robot said CGSpace went down for 1 minute, not sure why</li>
|
||||
<li>Looking in <code>dspace.log.2017-08-17</code> I see some weird errors that might be related?</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Looking in <code>dspace.log.2017-08-17</code> I see some weird errors that might be related?</p>
|
||||
|
||||
<pre><code>2017-08-17 07:55:31,396 ERROR net.sf.ehcache.store.DiskStore @ cocoon-ehcacheCache: Could not read disk store element for key PK_G-aspect-cocoon://DRI/12/handle/10568/65885?pipelinehash=823411183535858997_T-Navigation-3368194896954203241. Error was invalid stream header: 00000000
|
||||
java.io.StreamCorruptedException: invalid stream header: 00000000
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Weird that these errors seem to have started on August 11th, the same day we had capacity issues with PostgreSQL:</li>
|
||||
</ul>
|
||||
<li><p>Weird that these errors seem to have started on August 11th, the same day we had capacity issues with PostgreSQL:</p>
|
||||
|
||||
<pre><code># grep -c "ERROR net.sf.ehcache.store.DiskStore" dspace.log.2017-08-*
|
||||
dspace.log.2017-08-01:0
|
||||
@ -453,14 +451,17 @@ dspace.log.2017-08-14:2135
|
||||
dspace.log.2017-08-15:1506
|
||||
dspace.log.2017-08-16:1935
|
||||
dspace.log.2017-08-17:584
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There are none in 2017-07 either…</li>
|
||||
<li>A few posts on the dspace-tech mailing list say this is related to the Cocoon cache somehow</li>
|
||||
<li>I will clear the XMLUI cache for now and see if the errors continue (though perhaps shutting down Tomcat and removing the cache is more effective somehow?)</li>
|
||||
<li>We tested the option for limiting restricted items from the RSS feeds on DSpace Test</li>
|
||||
<li>I created four items, and only the two with public metadata showed up in the community’s RSS feed:
|
||||
<li><p>There are none in 2017-07 either…</p></li>
|
||||
|
||||
<li><p>A few posts on the dspace-tech mailing list say this is related to the Cocoon cache somehow</p></li>
|
||||
|
||||
<li><p>I will clear the XMLUI cache for now and see if the errors continue (though perhaps shutting down Tomcat and removing the cache is more effective somehow?)</p></li>
|
||||
|
||||
<li><p>We tested the option for limiting restricted items from the RSS feeds on DSpace Test</p></li>
|
||||
|
||||
<li><p>I created four items, and only the two with public metadata showed up in the community’s RSS feed:</p>
|
||||
|
||||
<ul>
|
||||
<li>Public metadata, public bitstream ✓</li>
|
||||
@ -468,7 +469,8 @@ dspace.log.2017-08-17:584
|
||||
<li>Restricted metadata, restricted bitstream ✗</li>
|
||||
<li>Private item ✗</li>
|
||||
</ul></li>
|
||||
<li>Peter responded and said that he doesn’t want to limit items to be restricted just so we can change the RSS feeds</li>
|
||||
|
||||
<li><p>Peter responded and said that he doesn’t want to limit items to be restricted just so we can change the RSS feeds</p></li>
|
||||
</ul>
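<p>For reference, the feed check above can also be done from the command line; a rough sketch against DSpace Test (the handle is a placeholder for the test community, and the count is only approximate because it greps the raw XML):</p>

<pre><code>$ curl -s 'https://dspacetest.cgiar.org/feed/rss_2.0/10568/1' | grep -c '<item>'
</code></pre>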
|
||||
|
||||
<h2 id="2017-08-18">2017-08-18</h2>
|
||||
@ -479,16 +481,16 @@ dspace.log.2017-08-17:584
|
||||
<li>I wired it up to the <code>dc.subject</code> field of the submission interface using the “lookup” type and it works!</li>
|
||||
<li>I think we can use this example to get a working AGROVOC query</li>
|
||||
<li>More information about authority framework: <a href="https://wiki.duraspace.org/display/DSPACE/Authority+Control+of+Metadata+Values">https://wiki.duraspace.org/display/DSPACE/Authority+Control+of+Metadata+Values</a></li>
|
||||
<li>Wow, I’m playing with the AGROVOC SPARQL endpoint using the <a href="https://github.com/tialaramex/sparql-query">sparql-query tool</a>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Wow, I’m playing with the AGROVOC SPARQL endpoint using the <a href="https://github.com/tialaramex/sparql-query">sparql-query tool</a>:</p>
|
||||
|
||||
<pre><code>$ ./sparql-query http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc
|
||||
sparql$ PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
|
||||
SELECT
|
||||
?label
|
||||
?label
|
||||
WHERE {
|
||||
{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . }
|
||||
FILTER regex(str(?label), "^fish", "i") .
|
||||
{ ?concept skos:altLabel ?label . } UNION { ?concept skos:prefLabel ?label . }
|
||||
FILTER regex(str(?label), "^fish", "i") .
|
||||
} LIMIT 10;
|
||||
|
||||
┌───────────────────────┐
|
||||
@ -505,12 +507,13 @@ WHERE {
|
||||
│ fishing times │
|
||||
│ fish passes │
|
||||
└───────────────────────┘
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>More examples about SPARQL syntax: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></li>
|
||||
<li>I found this blog post about speeding up the Tomcat startup time: <a href="http://skybert.net/java/improve-tomcat-startup-time/">http://skybert.net/java/improve-tomcat-startup-time/</a></li>
|
||||
<li>The startup time went from ~80s to 40s!</li>
|
||||
<li><p>More examples about SPARQL syntax: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></p></li>
|
||||
|
||||
<li><p>I found this blog post about speeding up the Tomcat startup time: <a href="http://skybert.net/java/improve-tomcat-startup-time/">http://skybert.net/java/improve-tomcat-startup-time/</a></p></li>
|
||||
|
||||
<li><p>The startup time went from ~80s to 40s!</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-08-19">2017-08-19</h2>
|
||||
@ -526,35 +529,35 @@ WHERE {
|
||||
|
||||
<ul>
|
||||
<li>Since I cleared the XMLUI cache on 2017-08-17 there haven’t been any more <code>ERROR net.sf.ehcache.store.DiskStore</code> errors</li>
|
||||
<li>Look at the CGIAR Library to see if I can find the items that have been submitted since May:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Look at the CGIAR Library to see if I can find the items that have been submitted since May:</p>
|
||||
|
||||
<pre><code>dspace=# select * from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z';
|
||||
metadata_value_id | item_id | metadata_field_id | text_value | text_lang | place | authority | confidence
|
||||
metadata_value_id | item_id | metadata_field_id | text_value | text_lang | place | authority | confidence
|
||||
-------------------+---------+-------------------+----------------------+-----------+-------+-----------+------------
|
||||
123117 | 5872 | 11 | 2017-06-28T13:05:18Z | | 1 | | -1
|
||||
123042 | 5869 | 11 | 2017-05-15T03:29:23Z | | 1 | | -1
|
||||
123056 | 5870 | 11 | 2017-05-22T11:27:15Z | | 1 | | -1
|
||||
123072 | 5871 | 11 | 2017-06-06T07:46:01Z | | 1 | | -1
|
||||
123171 | 5874 | 11 | 2017-08-04T07:51:20Z | | 1 | | -1
|
||||
123117 | 5872 | 11 | 2017-06-28T13:05:18Z | | 1 | | -1
|
||||
123042 | 5869 | 11 | 2017-05-15T03:29:23Z | | 1 | | -1
|
||||
123056 | 5870 | 11 | 2017-05-22T11:27:15Z | | 1 | | -1
|
||||
123072 | 5871 | 11 | 2017-06-06T07:46:01Z | | 1 | | -1
|
||||
123171 | 5874 | 11 | 2017-08-04T07:51:20Z | | 1 | | -1
|
||||
(5 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>According to <code>dc.date.accessioned</code> (metadata field id 11) there have only been five items submitted since May</li>
|
||||
<li>These are their handles:</li>
|
||||
</ul>
|
||||
<li><p>According to <code>dc.date.accessioned</code> (metadata field id 11) there have only been five items submitted since May</p></li>
|
||||
|
||||
<li><p>These are their handles:</p>
|
||||
|
||||
<pre><code>dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z');
|
||||
handle
|
||||
handle
|
||||
------------
|
||||
10947/4658
|
||||
10947/4659
|
||||
10947/4660
|
||||
10947/4661
|
||||
10947/4664
|
||||
10947/4658
|
||||
10947/4659
|
||||
10947/4660
|
||||
10947/4661
|
||||
10947/4664
|
||||
(5 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-08-23">2017-08-23</h2>
|
||||
|
||||
@ -575,17 +578,18 @@ WHERE {
|
||||
<li>I notice that in many WLE collections Marianne Gadeberg is in the edit or approval steps, but she is also in the groups for those steps.</li>
|
||||
<li>I think we need to have a process to go back and check / fix some of these scenarios—to remove her user from the step and instead add her to the group—because we have way too many authorizations and in late 2016 we had <a href="https://github.com/ilri/rmg-ansible-public/commit/358b5ea43f9e5820986f897c9d560937c702ac6e">performance issues with Solr</a> because of this</li>
|
||||
<li>I asked Sisay about this and hinted that he should go back and fix these things, but let’s see what he says</li>
|
||||
<li>Saw CGSpace go down briefly today and noticed SQL connection pool errors in the dspace log file:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Saw CGSpace go down briefly today and noticed SQL connection pool errors in the dspace log file:</p>
|
||||
|
||||
<pre><code>ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error
|
||||
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking at the logs I see we have been having hundreds or thousands of these errors a few times per week in 2017-07 and almost every day in 2017-08</li>
|
||||
<li>It seems that I changed the <code>db.maxconnections</code> setting from 70 to 40 around 2017-08-14, but Macaroni Bros also reduced their hourly hammering of the REST API then</li>
|
||||
<li>Nevertheless, it seems like a connection limit is not enough and that I should increase it (as well as the system’s PostgreSQL <code>max_connections</code>)</li>
|
||||
<li><p>Looking at the logs I see we have been having hundreds or thousands of these errors a few times per week in 2017-07 and almost every day in 2017-08</p></li>
|
||||
|
||||
<li><p>It seems that I changed the <code>db.maxconnections</code> setting from 70 to 40 around 2017-08-14, but Macaroni Bros also reduced their hourly hammering of the REST API then</p></li>
|
||||
|
||||
<li><p>Nevertheless, it seems like a connection limit is not enough and that I should increase it (as well as the system’s PostgreSQL <code>max_connections</code>)</p></li>
|
||||
</ul>
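<p>To quantify “hundreds or thousands” it helps to count the pool errors per day in the DSpace logs, the same grep I use again in September below:</p>

<pre><code># grep -c "Cannot get a connection, pool error Timeout waiting for idle object" dspace.log.2017-07-* dspace.log.2017-08-*
</code></pre>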
|
||||
|
||||
|
||||
|
@ -35,7 +35,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
|
||||
|
||||
Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -129,26 +129,33 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account
|
||||
<h2 id="2017-09-10">2017-09-10</h2>
|
||||
|
||||
<ul>
|
||||
<li>Delete 58 blank metadata values from the CGSpace database:</li>
|
||||
</ul>
|
||||
<li><p>Delete 58 blank metadata values from the CGSpace database:</p>
|
||||
|
||||
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
|
||||
DELETE 58
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I also ran it on DSpace Test because we’ll be migrating the CGIAR Library soon and it would be good to catch these before we migrate</li>
|
||||
<li>Run system updates and restart DSpace Test</li>
|
||||
<li>We only have 7.7GB of free space on DSpace Test so I need to copy some data off of it before doing the CGIAR Library migration (requires lots of exporting and creating temp files)</li>
|
||||
<li>I still have the original data from the CGIAR Library so I’ve zipped it up and sent it off to linode18 for now</li>
|
||||
<li>sha256sum of <code>original-cgiar-library-6.6GB.tar.gz</code> is: bcfabb52f51cbdf164b61b7e9b3a0e498479e4c1ed1d547d32d11f44c0d5eb8a</li>
|
||||
<li>Start doing a test run of the CGIAR Library migration locally</li>
|
||||
<li>Notes and todo checklist here for now: <a href="https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c">https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c</a></li>
|
||||
<li>Create pull request for Phase I and II changes to CCAFS Project Tags: <a href="https://github.com/ilri/DSpace/pull/336">#336</a></li>
|
||||
<li>We’ve been discussing with Macaroni Bros and CCAFS for the past month or so and the list of tags was recently finalized</li>
|
||||
<li>There will need to be some metadata updates — though if I recall correctly it is only about seven records — for that as well, I had made some notes about it in <a href="/cgspace-notes/2017-07">2017-07</a>, but I’ve asked for more clarification from Lili just in case</li>
|
||||
<li>Looking at the DSpace logs to see if we’ve had a change in the “Cannot get a connection” errors since last month when we adjusted the <code>db.maxconnections</code> parameter on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>I also ran it on DSpace Test because we’ll be migrating the CGIAR Library soon and it would be good to catch these before we migrate</p></li>
|
||||
|
||||
<li><p>Run system updates and restart DSpace Test</p></li>
|
||||
|
||||
<li><p>We only have 7.7GB of free space on DSpace Test so I need to copy some data off of it before doing the CGIAR Library migration (requires lots of exporting and creating temp files)</p></li>
|
||||
|
||||
<li><p>I still have the original data from the CGIAR Library so I’ve zipped it up and sent it off to linode18 for now</p></li>
|
||||
|
||||
<li><p>sha256sum of <code>original-cgiar-library-6.6GB.tar.gz</code> is: bcfabb52f51cbdf164b61b7e9b3a0e498479e4c1ed1d547d32d11f44c0d5eb8a</p></li>
|
||||
|
||||
<li><p>Start doing a test run of the CGIAR Library migration locally</p></li>
|
||||
|
||||
<li><p>Notes and todo checklist here for now: <a href="https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c">https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c</a></p></li>
|
||||
|
||||
<li><p>Create pull request for Phase I and II changes to CCAFS Project Tags: <a href="https://github.com/ilri/DSpace/pull/336">#336</a></p></li>
|
||||
|
||||
<li><p>We’ve been discussing with Macaroni Bros and CCAFS for the past month or so and the list of tags was recently finalized</p></li>
|
||||
|
||||
<li><p>There will need to be some metadata updates — though if I recall correctly it is only about seven records — for that as well, I had made some notes about it in <a href="/cgspace-notes/2017-07">2017-07</a>, but I’ve asked for more clarification from Lili just in case</p></li>
|
||||
|
||||
<li><p>Looking at the DSpace logs to see if we’ve had a change in the “Cannot get a connection” errors since last month when we adjusted the <code>db.maxconnections</code> parameter on CGSpace:</p>
|
||||
|
||||
<pre><code># grep -c "Cannot get a connection, pool error Timeout waiting for idle object" dspace.log.2017-09-*
|
||||
dspace.log.2017-09-01:0
|
||||
@ -161,14 +168,17 @@ dspace.log.2017-09-07:0
|
||||
dspace.log.2017-09-08:10
|
||||
dspace.log.2017-09-09:0
|
||||
dspace.log.2017-09-10:0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also, since last month (2017-08) Macaroni Bros no longer runs their REST API scraper every hour, so I’m sure that helped</li>
|
||||
<li>There are still some errors, though, so maybe I should bump the connection limit up a bit</li>
|
||||
<li>I remember seeing that Munin shows that the average number of connections is 50 (which is probably mostly from the XMLUI) and we’re currently allowing 40 connections per app, so maybe it would be good to bump that value up to 50 or 60 along with the system’s PostgreSQL <code>max_connections</code> (formula should be: webapps * 60 + 3, or 3 * 60 + 3 = 183 in our case)</li>
|
||||
<li>I updated both CGSpace and DSpace Test to use these new settings (60 connections per web app and 183 for system PostgreSQL limit)</li>
|
||||
<li>I’m expecting to see 0 connection errors for the next few months</li>
|
||||
<li><p>Also, since last month (2017-08) Macaroni Bros no longer runs their REST API scraper every hour, so I’m sure that helped</p></li>
|
||||
|
||||
<li><p>There are still some errors, though, so maybe I should bump the connection limit up a bit</p></li>
|
||||
|
||||
<li><p>I remember seeing that Munin shows that the average number of connections is 50 (which is probably mostly from the XMLUI) and we’re currently allowing 40 connections per app, so maybe it would be good to bump that value up to 50 or 60 along with the system’s PostgreSQL <code>max_connections</code> (formula should be: webapps * 60 + 3, or 3 * 60 + 3 = 183 in our case)</p></li>
|
||||
|
||||
<li><p>I updated both CGSpace and DSpace Test to use these new settings (60 connections per web app and 183 for system PostgreSQL limit)</p></li>
|
||||
|
||||
<li><p>I’m expecting to see 0 connection errors for the next few months</p></li>
|
||||
</ul>
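<p>A quick sanity check of the new values and the arithmetic behind 183 (the <code>[dspace]</code> path is a placeholder):</p>

<pre><code>$ grep '^db.maxconnections' [dspace]/config/dspace.cfg
db.maxconnections = 60
$ echo $((3 * 60 + 3))
183
</code></pre>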
|
||||
|
||||
<h2 id="2017-09-11">2017-09-11</h2>
|
||||
@ -183,27 +193,30 @@ dspace.log.2017-09-10:0
|
||||
<ul>
|
||||
<li>I was testing the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-AIPConfigurationsToImproveIngestionSpeedwhileValidating">METS XSD caching during AIP ingest</a> but it doesn’t seem to help actually</li>
|
||||
<li>The import process takes the same amount of time with and without the caching</li>
|
||||
<li>Also, I captured TCP packets destined for port 80 and both imports only captured ONE packet (an update check from some component in Java):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Also, I captured TCP packets destined for port 80 and both imports only captured ONE packet (an update check from some component in Java):</p>
|
||||
|
||||
<pre><code>$ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x47455420'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Great TCP dump guide here: <a href="https://danielmiessler.com/study/tcpdump">https://danielmiessler.com/study/tcpdump</a></li>
|
||||
<li>The last part of that command filters for HTTP GET requests, of which there should have been many to fetch all the XSD files for validation</li>
|
||||
<li>I sent a message to the mailing list to see if anyone knows more about this</li>
|
||||
<li>In looking at the tcpdump results I notice that there is an update check to the ehcache server on <em>every</em> iteration of the ingest loop, for example:</li>
|
||||
</ul>
|
||||
<li><p>Great TCP dump guide here: <a href="https://danielmiessler.com/study/tcpdump">https://danielmiessler.com/study/tcpdump</a></p></li>
|
||||
|
||||
<li><p>The last part of that command filters for HTTP GET requests, of which there should have been many to fetch all the XSD files for validation</p></li>
|
||||
|
||||
<li><p>I sent a message to the mailing list to see if anyone knows more about this</p></li>
|
||||
|
||||
<li><p>In looking at the tcpdump results I notice that there is an update check to the ehcache server on <em>every</em> iteration of the ingest loop, for example:</p>
|
||||
|
||||
<pre><code>09:39:36.008956 IP 192.168.8.124.50515 > 157.189.192.67.http: Flags [P.], seq 1736833672:1736834103, ack 147469926, win 4120, options [nop,nop,TS val 1175113331 ecr 550028064], length 431: HTTP: GET /kit/reflector?kitID=ehcache.default&pageID=update.properties&id=2130706433&os-name=Mac+OS+X&jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&jvm-version=1.8.0_144&platform=x86_64&tc-version=UNKNOWN&tc-product=Ehcache+Core+1.7.2&source=Ehcache+Core&uptime-secs=0&patch=UNKNOWN HTTP/1.1
</code></pre>
</code></pre></li>
<ul>
<li>Turns out this is a known issue and Ehcache has refused to make it opt-in: <a href="https://jira.terracotta.org/jira/browse/EHC-461">https://jira.terracotta.org/jira/browse/EHC-461</a></li>
<li>But we can disable it by adding an <code>updateCheck="false"</code> attribute to the main <code><ehcache ></code> tag in <code>dspace-services/src/main/resources/caching/ehcache-config.xml</code></li>
<li>After re-compiling and re-deploying DSpace I no longer see those update checks during item submission</li>
<li>I had a Skype call with Bram Luyten from Atmire to discuss various issues related to ORCID in DSpace
<li><p>Turns out this is a known issue and Ehcache has refused to make it opt-in: <a href="https://jira.terracotta.org/jira/browse/EHC-461">https://jira.terracotta.org/jira/browse/EHC-461</a></p></li>
<li><p>But we can disable it by adding an <code>updateCheck="false"</code> attribute to the main <code><ehcache ></code> tag in <code>dspace-services/src/main/resources/caching/ehcache-config.xml</code></p></li>
<li><p>After re-compiling and re-deploying DSpace I no longer see those update checks during item submission</p></li>
<li><p>I had a Skype call with Bram Luyten from Atmire to discuss various issues related to ORCID in DSpace</p>
<ul>
<li>First, ORCID is deprecating their version 1 API (which DSpace uses) and in version 2 API they have removed the ability to search for users by name</li>
@ -221,35 +234,32 @@ dspace.log.2017-09-10:0
<ul>
<li>Last night Linode sent an alert about CGSpace (linode18) that it has exceeded the outbound traffic rate threshold of 10Mb/s for the last two hours</li>
<li>I wonder what was going on, and looking into the nginx logs I think maybe it’s OAI…</li>
<li>Here is yesterday’s top ten IP addresses making requests to <code>/oai</code>:</li>
</ul>
<li><p>Here is yesterday’s top ten IP addresses making requests to <code>/oai</code>:</p>
<pre><code># awk '{print $1}' /var/log/nginx/oai.log | sort -n | uniq -c | sort -h | tail -n 10
1 213.136.89.78
1 66.249.66.90
1 66.249.66.92
3 68.180.229.31
4 35.187.22.255
13745 54.70.175.86
15814 34.211.17.113
15825 35.161.215.53
16704 54.70.51.7
</code></pre>
1 213.136.89.78
1 66.249.66.90
1 66.249.66.92
3 68.180.229.31
4 35.187.22.255
13745 54.70.175.86
15814 34.211.17.113
15825 35.161.215.53
16704 54.70.51.7
</code></pre></li>
<ul>
<li>Compared to the previous day’s logs it looks VERY high:</li>
</ul>
<li><p>Compared to the previous day’s logs it looks VERY high:</p>
<pre><code># awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
1 207.46.13.39
1 66.249.66.93
2 66.249.66.91
4 216.244.66.194
14 66.249.66.90
</code></pre>
1 207.46.13.39
1 66.249.66.93
2 66.249.66.91
4 216.244.66.194
14 66.249.66.90
</code></pre></li>
<ul>
|
||||
<li>The user agents for those top IPs are:
|
||||
<li><p>The user agents for those top IPs are:</p>
|
||||
|
||||
<ul>
|
||||
<li>54.70.175.86: API scraper</li>
|
||||
@ -257,8 +267,8 @@ dspace.log.2017-09-10:0
|
||||
<li>35.161.215.53: API scraper</li>
|
||||
<li>54.70.51.7: API scraper</li>
|
||||
</ul></li>
|
||||
<li>And this user agent has never been seen before today (or at least recently!):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>And this user agent has never been seen before today (or at least recently!):</p>
|
||||
|
||||
<pre><code># grep -c "API scraper" /var/log/nginx/oai.log
|
||||
62088
|
||||
@ -292,185 +302,179 @@ dspace.log.2017-09-10:0
|
||||
/var/log/nginx/oai.log.7.gz:0
|
||||
/var/log/nginx/oai.log.8.gz:0
|
||||
/var/log/nginx/oai.log.9.gz:0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Some of these heavy users are also using XMLUI, and their user agent isn’t matched by the <a href="https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/tomcat/server-tomcat7.xml.j2#L158">Tomcat Session Crawler valve</a>, so each request uses a different session</li>
|
||||
<li>Yesterday alone the IP addresses using the <code>API scraper</code> user agent were responsible for 16,000 sessions in XMLUI:</li>
|
||||
</ul>
|
||||
<li><p>Some of these heavy users are also using XMLUI, and their user agent isn’t matched by the <a href="https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/tomcat/server-tomcat7.xml.j2#L158">Tomcat Session Crawler valve</a>, so each request uses a different session</p></li>
|
||||
|
||||
<li><p>Yesterday alone the IP addresses using the <code>API scraper</code> user agent were responsible for 16,000 sessions in XMLUI:</p>
|
||||
|
||||
<pre><code># grep -a -E "(54.70.51.7|35.161.215.53|34.211.17.113|54.70.175.86)" /home/cgspace.cgiar.org/log/dspace.log.2017-09-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
|
||||
15924
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex</li>
|
||||
<li>A search for “API scraper” user agent on Google returns a <code>robots.txt</code> with a comment that this is the Yewno bot: <a href="http://www.escholarship.org/robots.txt">http://www.escholarship.org/robots.txt</a></li>
|
||||
<li>Also, in looking at the DSpace logs I noticed a warning from OAI that I should look into:</li>
|
||||
</ul>
|
||||
<li><p>If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex</p></li>
|
||||
|
||||
<li><p>A search for “API scraper” user agent on Google returns a <code>robots.txt</code> with a comment that this is the Yewno bot: <a href="http://www.escholarship.org/robots.txt">http://www.escholarship.org/robots.txt</a></p></li>
|
||||
|
||||
<li><p>Also, in looking at the DSpace logs I noticed a warning from OAI that I should look into:</p>
|
||||
|
||||
<pre><code>WARN org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking at the spreadsheet with deletions and corrections that CCAFS sent last week</li>
|
||||
<li>It appears they want to delete a lot of metadata, which I’m not sure they realize the implications of:</li>
|
||||
</ul>
|
||||
<li><p>Looking at the spreadsheet with deletions and corrections that CCAFS sent last week</p></li>
|
||||
|
||||
<li><p>It appears they want to delete a lot of metadata, which I’m not sure they realize the implications of:</p>
<pre><code>dspace=# select text_value, count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange') group by text_value;
text_value | count
text_value | count
--------------------------+-------
FP4_ClimateModels | 6
FP1_CSAEvidence | 7
SEA_UpscalingInnovation | 7
FP4_Baseline | 69
WA_Partnership | 1
WA_SciencePolicyExchange | 6
SA_GHGMeasurement | 2
SA_CSV | 7
EA_PAR | 18
FP4_Livestock | 7
FP4_GenderPolicy | 4
FP2_CRMWestAfrica | 12
FP4_ClimateData | 24
FP4_CCPAG | 2
SEA_mitigationSAMPLES | 2
SA_Biodiversity | 1
FP4_PolicyEngagement | 20
FP3_Gender | 9
FP4_GenderToolbox | 3
FP4_ClimateModels | 6
FP1_CSAEvidence | 7
SEA_UpscalingInnovation | 7
FP4_Baseline | 69
WA_Partnership | 1
WA_SciencePolicyExchange | 6
SA_GHGMeasurement | 2
SA_CSV | 7
EA_PAR | 18
FP4_Livestock | 7
FP4_GenderPolicy | 4
FP2_CRMWestAfrica | 12
FP4_ClimateData | 24
FP4_CCPAG | 2
SEA_mitigationSAMPLES | 2
SA_Biodiversity | 1
FP4_PolicyEngagement | 20
FP3_Gender | 9
FP4_GenderToolbox | 3
(19 rows)
</code></pre>
</code></pre></li>
<ul>
|
||||
<li>I sent CCAFS people an email to ask if they really want to remove these 200+ tags</li>
|
||||
<li>She responded yes, so I’ll at least need to do these deletes in PostgreSQL:</li>
|
||||
</ul>
|
||||
<li><p>I sent CCAFS people an email to ask if they really want to remove these 200+ tags</p></li>
|
||||
|
||||
<li><p>She responded yes, so I’ll at least need to do these deletes in PostgreSQL:</p>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
DELETE 207
</code></pre>
</code></pre></li>
<ul>
|
||||
<li>When we discussed this in late July there were some other renames they had requested, but I don’t see them in the current spreadsheet so I will have to follow that up</li>
|
||||
<li>I talked to Macaroni Bros and they said to just go ahead with the other corrections as well, as their spreadsheet had evolved organically rather than systematically!</li>
|
||||
<li>The final list of corrections and deletes should therefore be:</li>
|
||||
</ul>
|
||||
<li><p>When we discussed this in late July there were some other renames they had requested, but I don’t see them in the current spreadsheet so I will have to follow that up</p></li>
|
||||
|
||||
<li><p>I talked to Macaroni Bros and they said to just go ahead with the other corrections as well, as their spreadsheet had evolved organically rather than systematically!</p></li>
|
||||
|
||||
<li><p>The final list of corrections and deletes should therefore be:</p>
|
||||
|
||||
<pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
|
||||
update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
|
||||
update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
|
||||
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
|
||||
delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Create and merge pull request to shut up the Ehcache update check (<a href="https://github.com/ilri/DSpace/pull/337">#337</a>)</li>
|
||||
<li>Although it looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0 (although it only affects XMLUI): <a href="https://jira.duraspace.org/browse/DS-1492">https://jira.duraspace.org/browse/DS-1492</a></li>
|
||||
<li>I commented there suggesting that we disable it globally</li>
|
||||
<li>I merged the changes to the CCAFS project tags (<a href="https://github.com/ilri/DSpace/pull/336">#336</a>) but still need to finalize the metadata deletions/renames</li>
|
||||
<li>I merged the CGIAR Library theme changes (<a href="https://github.com/ilri/DSpace/pull/338">#338</a>) to the <code>5_x-prod</code> branch in preparation for next week’s migration</li>
|
||||
<li>I emailed the Handle administrators (hdladmin@cnri.reston.va.us) to ask them what the process is for changing their prefix to be resolved by our resolver</li>
|
||||
<li>They responded and said that they need email confirmation from the contact of record of the other prefix, so I should have the CGIAR System Organization people email them before I send the new <code>sitebndl.zip</code></li>
|
||||
<li>Testing to see how we end up with all these new authorities after we keep cleaning and merging them in the database</li>
|
||||
<li>Here are all my distinct authority combinations in the database before:</li>
|
||||
</ul>
|
||||
<li><p>Create and merge pull request to shut up the Ehcache update check (<a href="https://github.com/ilri/DSpace/pull/337">#337</a>)</p></li>
|
||||
|
||||
<li><p>Although it looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0 (although it only affects XMLUI): <a href="https://jira.duraspace.org/browse/DS-1492">https://jira.duraspace.org/browse/DS-1492</a></p></li>
|
||||
|
||||
<li><p>I commented there suggesting that we disable it globally</p></li>
|
||||
|
||||
<li><p>I merged the changes to the CCAFS project tags (<a href="https://github.com/ilri/DSpace/pull/336">#336</a>) but still need to finalize the metadata deletions/renames</p></li>
|
||||
|
||||
<li><p>I merged the CGIAR Library theme changes (<a href="https://github.com/ilri/DSpace/pull/338">#338</a>) to the <code>5_x-prod</code> branch in preparation for next week’s migration</p></li>
|
||||
|
||||
<li><p>I emailed the Handle administrators (hdladmin@cnri.reston.va.us) to ask them what the process is for changing their prefix to be resolved by our resolver</p></li>
|
||||
|
||||
<li><p>They responded and said that they need email confirmation from the contact of record of the other prefix, so I should have the CGIAR System Organization people email them before I send the new <code>sitebndl.zip</code></p></li>
|
||||
|
||||
<li><p>Testing to see how we end up with all these new authorities after we keep cleaning and merging them in the database</p></li>
|
||||
|
||||
<li><p>Here are all my distinct authority combinations in the database before:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
|
||||
text_value | authority | confidence
|
||||
text_value | authority | confidence
|
||||
------------+--------------------------------------+------------
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
|
||||
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
|
||||
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
|
||||
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
|
||||
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
|
||||
(8 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And then after adding a new item and selecting an existing “Orth, Alan” with an ORCID in the author lookup:</li>
|
||||
</ul>
|
||||
<li><p>And then after adding a new item and selecting an existing “Orth, Alan” with an ORCID in the author lookup:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
|
||||
text_value | authority | confidence
|
||||
text_value | authority | confidence
|
||||
------------+--------------------------------------+------------
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
|
||||
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
|
||||
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
|
||||
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
|
||||
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
|
||||
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
|
||||
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
|
||||
(9 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It created a new authority… let’s try to add another item and select the same existing author and see what happens in the database:</li>
|
||||
</ul>
|
||||
<li><p>It created a new authority… let’s try to add another item and select the same existing author and see what happens in the database:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
|
||||
text_value | authority | confidence
|
||||
text_value | authority | confidence
|
||||
------------+--------------------------------------+------------
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
|
||||
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
|
||||
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
|
||||
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
|
||||
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
|
||||
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
|
||||
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
|
||||
(9 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>No new one… so now let me try to add another item and select the italicized result from the ORCID lookup and see what happens in the database:</li>
|
||||
</ul>
|
||||
<li><p>No new one… so now let me try to add another item and select the italicized result from the ORCID lookup and see what happens in the database:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
|
||||
text_value | authority | confidence
|
||||
text_value | authority | confidence
|
||||
------------+--------------------------------------+------------
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
|
||||
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
|
||||
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
|
||||
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
|
||||
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
|
||||
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
|
||||
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
|
||||
(10 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Shit, it created another authority! Let’s try it again!</li>
|
||||
</ul>
|
||||
<li><p>Shit, it created another authority! Let’s try it again!</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
|
||||
text_value | authority | confidence
|
||||
text_value | authority | confidence
|
||||
------------+--------------------------------------+------------
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
|
||||
Orth, Alan | 9aed566a-a248-4878-9577-0caedada43db | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
|
||||
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
|
||||
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
|
||||
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f | 600
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 600
|
||||
Orth, Alan | 9aed566a-a248-4878-9577-0caedada43db | 600
|
||||
Orth, A. | 1a1943a0-3f87-402f-9afe-e52fb46a513e | 600
|
||||
Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e | -1
|
||||
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | 0
|
||||
Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde | 600
|
||||
Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 | -1
|
||||
Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
|
||||
(11 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It added <em>another</em> authority… surely this is not the desired behavior, or maybe we are not using this as intended?</li>
|
||||
<li><p>It added <em>another</em> authority… surely this is not the desired behavior, or maybe we are not using this as intended?</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-09-14">2017-09-14</h2>
|
||||
@ -487,8 +491,7 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134
|
||||
<h2 id="2017-09-15">2017-09-15</h2>
|
||||
|
||||
<ul>
|
||||
<li>Apply CCAFS project tag corrections on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>Apply CCAFS project tag corrections on CGSpace:</p>
|
||||
|
||||
<pre><code>dspace=# \i /tmp/ccafs-projects.sql
|
||||
DELETE 5
|
||||
@ -496,15 +499,16 @@ UPDATE 4
|
||||
UPDATE 1
|
||||
DELETE 1
|
||||
DELETE 207
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-09-17">2017-09-17</h2>
|
||||
|
||||
<ul>
|
||||
<li>Create pull request for CGSpace to be able to resolve multiple handles (<a href="https://github.com/ilri/DSpace/pull/339">#339</a>)</li>
|
||||
<li>We still need to do the changes to <code>config.dct</code> and regenerate the <code>sitebndl.zip</code> to send to the Handle.net admins</li>
|
||||
<li>According to this <a href="http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html">dspace-tech mailing list entry from 2011</a>, we need to add the extra handle prefixes to <code>config.dct</code> like this:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>According to this <a href="http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html">dspace-tech mailing list entry from 2011</a>, we need to add the extra handle prefixes to <code>config.dct</code> like this:</p>
|
||||
|
||||
<pre><code>"server_admins" = (
|
||||
"300:0.NA/10568"
|
||||
@ -520,21 +524,22 @@ DELETE 207
|
||||
"300:0.NA/10568"
|
||||
"300:0.NA/10947"
|
||||
)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>More work on the CGIAR Library migration test run locally, as I was having problems importing the last fourteen items from the CGIAR System Management Office community</li>
|
||||
<li>The problem was that we remapped the items to new collections after the initial import, so the items were using the 10947 prefix but the community and collection was using 10568</li>
|
||||
<li>I ended up having to read the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-ForceReplaceMode">AIP Backup and Restore</a> closely a few times and then explicitly preserve handles and ignore parents:</li>
|
||||
</ul>
|
||||
<li><p>More work on the CGIAR Library migration test run locally, as I was having problems importing the last fourteen items from the CGIAR System Management Office community</p></li>
|
||||
|
||||
<li><p>The problem was that we remapped the items to new collections after the initial import, so the items were using the 10947 prefix but the community and collection was using 10568</p></li>
|
||||
|
||||
<li><p>I ended up having to read the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-ForceReplaceMode">AIP Backup and Restore</a> closely a few times and then explicitly preserve handles and ignore parents:</p>
|
||||
|
||||
<pre><code>$ for item in 10568-93759/ITEM@10947-46*; do ~/dspace/bin/dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/87738 $item; done
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also, this was in replace mode (-r) rather than submit mode (-s), because submit mode always generated a new handle even if I told it not to!</li>
|
||||
<li>I decided to start the import process in the evening rather than waiting for the morning, and right as the first community was finished importing I started seeing <code>Timeout waiting for idle object</code> errors</li>
|
||||
<li>I had to cancel the import, clean up a bunch of database entries, increase the PostgreSQL <code>max_connections</code> as a precaution, restart PostgreSQL and Tomcat, and then finally completed the import</li>
|
||||
<li><p>Also, this was in replace mode (-r) rather than submit mode (-s), because submit mode always generated a new handle even if I told it not to!</p></li>
|
||||
|
||||
<li><p>I decided to start the import process in the evening rather than waiting for the morning, and right as the first community was finished importing I started seeing <code>Timeout waiting for idle object</code> errors</p></li>
|
||||
|
||||
<li><p>I had to cancel the import, clean up a bunch of database entries, increase the PostgreSQL <code>max_connections</code> as a precaution, restart PostgreSQL and Tomcat, and then finally completed the import</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-09-18">2017-09-18</h2>
|
||||
@ -555,35 +560,37 @@ DELETE 207
|
||||
<h2 id="2017-09-19">2017-09-19</h2>
|
||||
|
||||
<ul>
|
||||
<li>Nightly Solr indexing is working again, and it appears to be pretty quick actually:</li>
|
||||
</ul>
|
||||
<li><p>Nightly Solr indexing is working again, and it appears to be pretty quick actually:</p>
|
||||
|
||||
<pre><code>2017-09-19 00:00:14,953 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (0 of 65808): 17607
|
||||
...
|
||||
2017-09-19 00:04:18,017 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (65807 of 65808): 83753
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Sisay asked if he could import 50 items for IITA that have already been checked by Bosede and Bizuwork</li>
|
||||
<li>I had a look at the collection and noticed a bunch of issues with item types and donors, so I asked him to fix those and import it to DSpace Test again first</li>
|
||||
<li>Abenet wants to be able to filter by ISI Journal in advanced search on queries like this: <a href="https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&filtertype_1=dateIssued&filter_relational_operator_1=equals&filter_relational_operator_0=equals&filter_1=%5B2010+TO+2017%5D&filter_0=2017&filtertype=type&filter_relational_operator=equals&filter=Journal+Article">https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&filtertype_1=dateIssued&filter_relational_operator_1=equals&filter_relational_operator_0=equals&filter_1=%5B2010+TO+2017%5D&filter_0=2017&filtertype=type&filter_relational_operator=equals&filter=Journal+Article</a></li>
|
||||
<li>I opened an issue to track this (<a href="https://github.com/ilri/DSpace/issues/340">#340</a>) and will test it on DSpace Test soon</li>
|
||||
<li>Marianne Gadeberg from WLE asked if I would add an account for Adam Hunt on CGSpace and give him permissions to approve all WLE publications</li>
|
||||
<li>I told him to register first, as he’s a CGIAR user and needs an account to be created before I can add him to the groups</li>
|
||||
<li><p>Sisay asked if he could import 50 items for IITA that have already been checked by Bosede and Bizuwork</p></li>
|
||||
|
||||
<li><p>I had a look at the collection and noticed a bunch of issues with item types and donors, so I asked him to fix those and import it to DSpace Test again first</p></li>
|
||||
|
||||
<li><p>Abenet wants to be able to filter by ISI Journal in advanced search on queries like this: <a href="https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&filtertype_1=dateIssued&filter_relational_operator_1=equals&filter_relational_operator_0=equals&filter_1=%5B2010+TO+2017%5D&filter_0=2017&filtertype=type&filter_relational_operator=equals&filter=Journal+Article">https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&filtertype_1=dateIssued&filter_relational_operator_1=equals&filter_relational_operator_0=equals&filter_1=%5B2010+TO+2017%5D&filter_0=2017&filtertype=type&filter_relational_operator=equals&filter=Journal+Article</a></p></li>
|
||||
|
||||
<li><p>I opened an issue to track this (<a href="https://github.com/ilri/DSpace/issues/340">#340</a>) and will test it on DSpace Test soon</p></li>
|
||||
|
||||
<li><p>Marianne Gadeberg from WLE asked if I would add an account for Adam Hunt on CGSpace and give him permissions to approve all WLE publications</p></li>
|
||||
|
||||
<li><p>I told him to register first, as he’s a CGIAR user and needs an account to be created before I can add him to the groups</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-09-20">2017-09-20</h2>
|
||||
|
||||
<ul>
|
||||
<li>Abenet and I noticed that hdl.handle.net is blocked by ETC at ILRI Addis so I asked Biruk Debebe to route it over the satellite</li>
|
||||
<li>Force thumbnail regeneration for the CGIAR System Organization’s Historic Archive community (2000 items):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Force thumbnail regeneration for the CGIAR System Organization’s Historic Archive community (2000 items):</p>
|
||||
|
||||
<pre><code>$ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -f -i 10947/1 -p "ImageMagick PDF Thumbnail"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I’m still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org</li>
|
||||
<li><p>I’m still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-09-21">2017-09-21</h2>
|
||||
@ -633,16 +640,17 @@ DELETE 207
|
||||
<li>It is easy to do via CSV using OpenRefine but I noticed that on CGSpace ~1,000 of the expected 2,500 are already mapped, while on DSpace Test they were not</li>
|
||||
<li>I’ve asked Peter if he knows what’s going on (or who mapped them)</li>
|
||||
<li>Turns out he had already mapped some, but requested that I finish the rest</li>
|
||||
<li>With this GREL in OpenRefine I can find items that are mapped, ie they have <code>10568/3||</code> or <code>10568/3$</code> in their <code>collection</code> field:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>With this GREL in OpenRefine I can find items that are mapped, ie they have <code>10568/3||</code> or <code>10568/3$</code> in their <code>collection</code> field:</p>
|
||||
|
||||
<pre><code>isNotNull(value.match(/.+?10568\/3(\|\|.+|$)/))
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Peter also made a lot of changes to the data in the Archives collections while I was attempting to import the changes, so we were essentially competing for PostgreSQL and Solr connections</li>
|
||||
<li>I ended up having to kill the import and wait until he was done</li>
|
||||
<li>I exported a clean CSV and applied the changes from that one, which was a hundred or two less than I thought there should be (at least compared to the current state of DSpace Test, which is a few months old)</li>
|
||||
<li><p>Peter also made a lot of changes to the data in the Archives collections while I was attempting to import the changes, so we were essentially competing for PostgreSQL and Solr connections</p></li>
|
||||
|
||||
<li><p>I ended up having to kill the import and wait until he was done</p></li>
|
||||
|
||||
<li><p>I exported a clean CSV and applied the changes from that one, which was a hundred or two less than I thought there should be (at least compared to the current state of DSpace Test, which is a few months old)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-09-25">2017-09-25</h2>
|
||||
@ -650,30 +658,27 @@ DELETE 207
|
||||
<ul>
|
||||
<li>Email Rosemary Kande from ICT to ask about the administrative / finance procedure for moving DSpace Test from EU to US region on Linode</li>
|
||||
<li>Communicate (finally) with Tania and Tunji from the CGIAR System Organization office to tell them to request CGNET make the DNS updates for library.cgiar.org</li>
|
||||
<li>Peter wants me to clean up the text values for Delia Grace’s metadata, as the authorities are all messed up again since we cleaned them up in <a href="/cgspace-notes/2016-12">2016-12</a>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Peter wants me to clean up the text values for Delia Grace’s metadata, as the authorities are all messed up again since we cleaned them up in <a href="/cgspace-notes/2016-12">2016-12</a>:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
|
||||
text_value | authority | confidence
|
||||
text_value | authority | confidence
|
||||
--------------+--------------------------------------+------------
|
||||
Grace, Delia | | 600
|
||||
Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c | 600
|
||||
Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c | -1
|
||||
Grace, D. | 6a8ddca3-33c1-45f9-aa00-6fa9fc91e3fc | -1
|
||||
</code></pre>
|
||||
Grace, Delia | | 600
|
||||
Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c | 600
|
||||
Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c | -1
|
||||
Grace, D. | 6a8ddca3-33c1-45f9-aa00-6fa9fc91e3fc | -1
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Strangely, none of her authority entries have ORCIDs anymore…</li>
|
||||
<li>I’ll just fix the text values and forget about it for now:</li>
|
||||
</ul>
|
||||
<li><p>Strangely, none of her authority entries have ORCIDs anymore…</p></li>
|
||||
|
||||
<li><p>I’ll just fix the text values and forget about it for now:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
|
||||
UPDATE 610
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After this we have to reindex the Discovery and Authority cores (as <code>tomcat7</code> user):</li>
|
||||
</ul>
|
||||
<li><p>After this we have to reindex the Discovery and Authority cores (as <code>tomcat7</code> user):</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
||||
@ -686,41 +691,39 @@ Retrieving all data
|
||||
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
|
||||
Exception: null
|
||||
java.lang.NullPointerException
|
||||
at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
|
||||
at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
||||
at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
|
||||
at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
||||
at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
|
||||
at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
|
||||
real 6m6.447s
|
||||
user 1m34.010s
|
||||
sys 0m12.113s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The <code>index-authority</code> script always seems to fail, I think it’s the same old bug</li>
|
||||
<li>Something interesting for my notes about JNDI database pool—since I couldn’t determine if it was working or not when I tried it locally the other day—is this error message that I just saw in the DSpace logs today:</li>
|
||||
</ul>
|
||||
<li><p>The <code>index-authority</code> script always seems to fail, I think it’s the same old bug</p></li>
|
||||
|
||||
<li><p>Something interesting for my notes about JNDI database pool—since I couldn’t determine if it was working or not when I tried it locally the other day—is this error message that I just saw in the DSpace logs today:</p>
|
||||
|
||||
<pre><code>ERROR org.dspace.storage.rdbms.DatabaseManager @ Error retrieving JNDI context: jdbc/dspaceLocal
|
||||
...
|
||||
INFO org.dspace.storage.rdbms.DatabaseManager @ Unable to locate JNDI dataSource: jdbc/dspaceLocal
|
||||
INFO org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Database pool
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So it’s good to know that <em>something</em> gets printed when it fails because I didn’t see <em>any</em> mention of JNDI before when I was testing!</li>
|
||||
<li><p>So it’s good to know that <em>something</em> gets printed when it fails because I didn’t see <em>any</em> mention of JNDI before when I was testing!</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-09-26">2017-09-26</h2>
|
||||
@ -741,24 +744,23 @@ INFO org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Da
|
||||
<ul>
|
||||
<li>Tunji from the System Organization finally sent the DNS request for library.cgiar.org to CGNET</li>
|
||||
<li>Now the redirects work</li>
|
||||
<li>I quickly registered a Let’s Encrypt certificate for the domain:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I quickly registered a Let’s Encrypt certificate for the domain:</p>
|
||||
|
||||
<pre><code># systemctl stop nginx
|
||||
# /opt/certbot-auto certonly --standalone --email aorth@mjanja.ch -d library.cgiar.org
|
||||
# systemctl start nginx
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I modified the nginx configuration of the ansible playbooks to use this new certificate and now the certificate is enabled and OCSP stapling is working:</li>
|
||||
</ul>
|
||||
<li><p>I modified the nginx configuration of the ansible playbooks to use this new certificate and now the certificate is enabled and OCSP stapling is working:</p>
|
||||
|
||||
<pre><code>$ openssl s_client -connect cgspace.cgiar.org:443 -servername library.cgiar.org -tls1_2 -tlsextdebug -status
|
||||
...
|
||||
OCSP Response Data:
|
||||
...
|
||||
Cert Status: good
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
||||
|
@ -11,12 +11,11 @@
|
||||
|
||||
Peter emailed to point out that many items in the ILRI archive collection have multiple handles:
|
||||
|
||||
|
||||
http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
|
||||
|
||||
|
||||
|
||||
There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
|
||||
|
||||
Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
@ -31,15 +30,14 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
|
||||
|
||||
Peter emailed to point out that many items in the ILRI archive collection have multiple handles:
|
||||
|
||||
|
||||
http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
|
||||
|
||||
|
||||
|
||||
There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
|
||||
|
||||
Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -121,40 +119,38 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
|
||||
<h2 id="2017-10-01">2017-10-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
|
||||
</ul>
|
||||
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p>
|
||||
|
||||
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
|
||||
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
|
||||
<li><p>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li>
|
||||
|
||||
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-10-02">2017-10-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Peter Ballantyne said he was having problems logging into CGSpace with “both” of his accounts (CGIAR LDAP and personal, apparently)</li>
|
||||
<li>I looked in the logs and saw some LDAP lookup failures due to timeout but also strangely a “no DN found” error:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I looked in the logs and saw some LDAP lookup failures due to timeout but also strangely a “no DN found” error:</p>
|
||||
|
||||
<pre><code>2017-10-01 20:24:57,928 WARN org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:ldap_attribute_lookup:type=failed_search javax.naming.CommunicationException\colon; svcgroot2.cgiarad.org\colon;3269 [Root exception is java.net.ConnectException\colon; Connection timed out (Connection timed out)]
|
||||
2017-10-01 20:22:37,982 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:failed_login:no DN found for user pballantyne
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I thought maybe his account had expired (seeing as it’s was the first of the month) but he says he was finally able to log in today</li>
|
||||
<li>The logs for yesterday show fourteen errors related to LDAP auth failures:</li>
|
||||
</ul>
|
||||
<li><p>I thought maybe his account had expired (seeing as it was the first of the month) but he says he was finally able to log in today</p></li>
|
||||
|
||||
<li><p>The logs for yesterday show fourteen errors related to LDAP auth failures:</p>
|
||||
|
||||
<pre><code>$ grep -c "ldap_authentication:type=failed_auth" dspace.log.2017-10-01
|
||||
14
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>For what it’s worth, there are no errors on any other recent days, so it must have been some network issue on Linode or CGNET’s LDAP server</li>
|
||||
<li>Linode emailed to say that linode578611 (DSpace Test) needs to migrate to a new host for a security update so I initiated the migration immediately rather than waiting for the scheduled time in two weeks</li>
|
||||
<li><p>For what it’s worth, there are no errors on any other recent days, so it must have been some network issue on Linode or CGNET’s LDAP server</p></li>
|
||||
|
||||
<li><p>Linode emailed to say that linode578611 (DSpace Test) needs to migrate to a new host for a security update so I initiated the migration immediately rather than waiting for the scheduled time in two weeks</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-10-04">2017-10-04</h2>
|
||||
@ -162,59 +158,67 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
|
||||
<ul>
|
||||
<li>Twice in the last twenty-four hours Linode has alerted about high CPU usage on CGSpace (linode2533629)</li>
|
||||
<li>Communicate with Sam from the CGIAR System Organization about some broken links coming from their CGIAR Library domain to CGSpace</li>
|
||||
<li>The first is a link to a browse page that should be handled better in nginx:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The first is a link to a browse page that should be handled better in nginx:</p>
|
||||
|
||||
<pre><code>http://library.cgiar.org/browse?value=Intellectual%20Assets%20Reports&type=subject → https://cgspace.cgiar.org/browse?value=Intellectual%20Assets%20Reports&type=subject
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We’ll need to check for browse links and handle them properly, including swapping the <code>subject</code> parameter for <code>systemsubject</code> (which doesn’t exist in Discovery yet, but we’ll need to add it) as we have moved their poorly curated subjects from <code>dc.subject</code> to <code>cg.subject.system</code></li>
|
||||
<li>The second link was a direct link to a bitstream which has broken due to the sequence being updated, so I told him he should link to the handle of the item instead</li>
|
||||
<li>Help Sisay proof sixty-two IITA records on DSpace Test</li>
|
||||
<li>Lots of inconsistencies and errors in subjects, dc.format.extent, regions, countries</li>
|
||||
<li>Merge the Discovery search changes for ISI Journal (<a href="https://github.com/ilri/DSpace/pull/341">#341</a>)</li>
|
||||
<li><p>We’ll need to check for browse links and handle them properly, including swapping the <code>subject</code> parameter for <code>systemsubject</code> (which doesn’t exist in Discovery yet, but we’ll need to add it) as we have moved their poorly curated subjects from <code>dc.subject</code> to <code>cg.subject.system</code></p></li>
|
||||
|
||||
<li><p>The second link was a direct link to a bitstream which has broken due to the sequence being updated, so I told him he should link to the handle of the item instead</p></li>
|
||||
|
||||
<li><p>Help Sisay proof sixty-two IITA records on DSpace Test</p></li>
|
||||
|
||||
<li><p>Lots of inconsistencies and errors in subjects, dc.format.extent, regions, countries</p></li>
|
||||
|
||||
<li><p>Merge the Discovery search changes for ISI Journal (<a href="https://github.com/ilri/DSpace/pull/341">#341</a>)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-10-05">2017-10-05</h2>
|
||||
|
||||
<ul>
|
||||
<li>Twice in the past twenty-four hours Linode has warned that CGSpace’s outbound traffic rate was exceeding the notification threshold</li>
|
||||
<li>I had a look at yesterday’s OAI and REST logs in <code>/var/log/nginx</code> but didn’t see anything unusual:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I had a look at yesterday’s OAI and REST logs in <code>/var/log/nginx</code> but didn’t see anything unusual:</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 10
|
||||
141 157.55.39.240
|
||||
145 40.77.167.85
|
||||
162 66.249.66.92
|
||||
181 66.249.66.95
|
||||
211 66.249.66.91
|
||||
312 66.249.66.94
|
||||
384 66.249.66.90
|
||||
1495 50.116.102.77
|
||||
3904 70.32.83.92
|
||||
9904 45.5.184.196
|
||||
141 157.55.39.240
|
||||
145 40.77.167.85
|
||||
162 66.249.66.92
|
||||
181 66.249.66.95
|
||||
211 66.249.66.91
|
||||
312 66.249.66.94
|
||||
384 66.249.66.90
|
||||
1495 50.116.102.77
|
||||
3904 70.32.83.92
|
||||
9904 45.5.184.196
|
||||
# awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
|
||||
5 66.249.66.71
|
||||
6 66.249.66.67
|
||||
6 68.180.229.31
|
||||
8 41.84.227.85
|
||||
8 66.249.66.92
|
||||
17 66.249.66.65
|
||||
24 66.249.66.91
|
||||
38 66.249.66.95
|
||||
69 66.249.66.90
|
||||
148 66.249.66.94
|
||||
</code></pre>
|
||||
5 66.249.66.71
|
||||
6 66.249.66.67
|
||||
6 68.180.229.31
|
||||
8 41.84.227.85
|
||||
8 66.249.66.92
|
||||
17 66.249.66.65
|
||||
24 66.249.66.91
|
||||
38 66.249.66.95
|
||||
69 66.249.66.90
|
||||
148 66.249.66.94
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Working on the nginx redirects for CGIAR Library</li>
|
||||
<li>We should start using 301 redirects and also allow for <code>/sitemap</code> to work on the library.cgiar.org domain so the CGIAR System Organization people can update their Google Search Console and allow Google to find their content in a structured way</li>
|
||||
<li>Remove eleven occurrences of <code>ACP</code> in IITA’s <code>cg.coverage.region</code> using the Atmire batch edit module from Discovery</li>
|
||||
<li>Need to investigate how we can verify the library.cgiar.org using the HTML or DNS methods</li>
|
||||
<li>Run corrections on 143 ILRI Archive items that had two <code>dc.identifier.uri</code> values (Handle) that Peter had pointed out earlier this week</li>
|
||||
<li>I used OpenRefine to isolate them and then fixed and re-imported them into CGSpace</li>
|
||||
<li>I manually checked a dozen of them and it appeared that the correct handle was always the second one, so I just deleted the first one</li>
|
||||
<li><p>Working on the nginx redirects for CGIAR Library</p></li>
|
||||
|
||||
<li><p>We should start using 301 redirects and also allow for <code>/sitemap</code> to work on the library.cgiar.org domain so the CGIAR System Organization people can update their Google Search Console and allow Google to find their content in a structured way</p></li>
|
||||
|
||||
<li><p>Remove eleven occurrences of <code>ACP</code> in IITA’s <code>cg.coverage.region</code> using the Atmire batch edit module from Discovery</p></li>
|
||||
|
||||
<li><p>Need to investigate how we can verify the library.cgiar.org using the HTML or DNS methods</p></li>
|
||||
|
||||
<li><p>Run corrections on 143 ILRI Archive items that had two <code>dc.identifier.uri</code> values (Handle) that Peter had pointed out earlier this week</p></li>
|
||||
|
||||
<li><p>I used OpenRefine to isolate them and then fixed and re-imported them into CGSpace</p></li>
|
||||
|
||||
<li><p>I manually checked a dozen of them and it appeared that the correct handle was always the second one, so I just deleted the first one</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2017-10-06">2017-10-06</h2>
|
||||
@ -251,19 +255,19 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
|
||||
<li>I tried to submit a “Change of Address” request in the Google Search Console but I need to be an owner on CGSpace’s console (currently I’m just a user) in order to do that</li>
|
||||
<li>Manually clean up some communities and collections that Peter had requested a few weeks ago</li>
|
||||
<li>Delete Community <sup>10568</sup>⁄<sub>102</sub> (ILRI Research and Development Issues)</li>
|
||||
<li>Move five collections to <sup>10568</sup>⁄<sub>27629</sub> (ILRI Projects) using <code>move-collections.sh</code> with the following configuration:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Move five collections to 10568/27629 (ILRI Projects) using <code>move-collections.sh</code> with the following configuration:</p>
|
||||
|
||||
<pre><code>10568/1637 10568/174 10568/27629
|
||||
10568/1642 10568/174 10568/27629
|
||||
10568/1614 10568/174 10568/27629
|
||||
10568/75561 10568/150 10568/27629
|
||||
10568/183 10568/230 10568/27629
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Delete community 10568/174 (Sustainable livestock futures)</li>
|
||||
<li>Delete collections in 10568/27629 that have zero items (33 of them!)</li>
|
||||
<li><p>Delete community 10568/174 (Sustainable livestock futures)</p></li>
|
||||
|
||||
<li><p>Delete collections in 10568/27629 that have zero items (33 of them!)</p></li>
|
||||
</ul>
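<p>Related to the 33 empty collections: a rough query to list collections with zero items, assuming the classic DSpace 5 schema (it is repository-wide, so the output still needs to be narrowed to 10568/27629 by hand):</p>

<pre><code>dspace=# SELECT c.collection_id, c.name -- assumes the pre-DSpace 6 collection/collection2item tables
  FROM collection c
  LEFT JOIN collection2item c2i ON c.collection_id = c2i.collection_id
  GROUP BY c.collection_id, c.name
  HAVING COUNT(c2i.item_id) = 0;
</code></pre>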
|
||||
|
||||
<h2 id="2017-10-11">2017-10-11</h2>
|
||||
@ -311,31 +315,34 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
|
||||
<li>In other news, I was trying to look at a question about stats raised by Magdalena and then CGSpace went down due to SQL connection pool</li>
|
||||
<li>Looking at the PostgreSQL activity I see there are 93 connections, but after a minute or two they went down and CGSpace came back up</li>
|
||||
<li>Annnd I reloaded the Atmire Usage Stats module and the connections shot back up and CGSpace went down again</li>
|
||||
<li>Still not sure where the load is coming from right now, but it’s clear why there were so many alerts yesterday on the 25th!</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Still not sure where the load is coming from right now, but it’s clear why there were so many alerts yesterday on the 25th!</p>
|
||||
|
||||
<pre><code># grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-25 | sort -n | uniq | wc -l
|
||||
18022
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Compared to other days there were two or three times the number of requests yesterday!</li>
|
||||
</ul>
|
||||
<li><p>Compared to other days there were two or three times the number of requests yesterday!</p>
|
||||
|
||||
<pre><code># grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-23 | sort -n | uniq | wc -l
|
||||
3141
|
||||
# grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-26 | sort -n | uniq | wc -l
|
||||
7851
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I still have no idea what was causing the load to go up today</li>
|
||||
<li>I finally investigated Magdalena’s issue with the item download stats and now I can’t reproduce it: I get the same number of downloads reported in the stats widget on the item page, the “Most Popular Items” page, and in Usage Stats</li>
|
||||
<li>I think it might have been an issue with the statistics not being fresh</li>
|
||||
<li>I added the admin group for the systems organization to the admin role of the top-level community of CGSpace because I guess Sisay had forgotten</li>
|
||||
<li>Magdalena asked if there was a way to reuse data in item submissions where items have a lot of similar data</li>
|
||||
<li>I told her about the possibility to use per-collection item templates, and asked if her items in question were all from a single collection</li>
|
||||
<li>We’ve never used it but it could be worth looking at</li>
|
||||
<li><p>I still have no idea what was causing the load to go up today</p></li>
|
||||
|
||||
<li><p>I finally investigated Magdalena’s issue with the item download stats and now I can’t reproduce it: I get the same number of downloads reported in the stats widget on the item page, the “Most Popular Items” page, and in Usage Stats</p></li>
|
||||
|
||||
<li><p>I think it might have been an issue with the statistics not being fresh</p></li>
|
||||
|
||||
<li><p>I added the admin group for the systems organization to the admin role of the top-level community of CGSpace because I guess Sisay had forgotten</p></li>
|
||||
|
||||
<li><p>Magdalena asked if there was a way to reuse data in item submissions where items have a lot of similar data</p></li>
|
||||
|
||||
<li><p>I told her about the possibility to use per-collection item templates, and asked if her items in question were all from a single collection</p></li>
|
||||
|
||||
<li><p>We’ve never used it but it could be worth looking at</p></li>
|
||||
</ul>
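<p>A small loop makes the per-day session counts above quicker to compare (assuming the DSpace logs are in the current directory):</p>

<pre><code># for log in dspace.log.2017-10-2[3-6]; do echo -n "$log: "; grep -o -E 'session_id=[A-Z0-9]{32}' "$log" | sort -n | uniq | wc -l; done # run from the DSpace log directory
</code></pre>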
|
||||
|
||||
<h2 id="2017-10-27">2017-10-27</h2>
|
||||
@ -355,133 +362,126 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
|
||||
<ul>
|
||||
<li>Linode alerted about high CPU usage again on CGSpace around 2AM and 4AM</li>
|
||||
<li>I’m still not sure why this started causing alerts so repeatedly over the past week</li>
|
||||
<li>I don’t see any telltale signs in the REST or OAI logs, so trying to do rudimentary analysis in DSpace logs:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I don’t see any telltale signs in the REST or OAI logs, so trying to do rudimentary analysis in DSpace logs:</p>
|
||||
|
||||
<pre><code># grep '2017-10-29 02:' dspace.log.2017-10-29 | grep -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
|
||||
2049
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So there were 2049 unique sessions during the hour of 2AM</li>
|
||||
<li>Looking at my notes, the number of unique sessions was about the same during the same hour on other days when there were no alerts</li>
|
||||
<li>I think I’ll need to enable access logging in nginx to figure out what’s going on</li>
|
||||
<li>After enabling logging on requests to XMLUI on <code>/</code> I see some new bot I’ve never seen before:</li>
|
||||
</ul>
|
||||
<li><p>So there were 2049 unique sessions during the hour of 2AM</p></li>
|
||||
|
||||
<li><p>Looking at my notes, the number of unique sessions was about the same during the same hour on other days when there were no alerts</p></li>
|
||||
|
||||
<li><p>I think I’ll need to enable access logging in nginx to figure out what’s going on</p></li>
|
||||
|
||||
<li><p>After enabling logging on requests to XMLUI on <code>/</code> I see some new bot I’ve never seen before:</p>
|
||||
|
||||
<pre><code>137.108.70.6 - - [29/Oct/2017:07:39:49 +0000] "GET /discover?filtertype_0=type&filter_relational_operator_0=equals&filter_0=Internal+Document&filtertype=author&filter_relational_operator=equals&filter=CGIAR+Secretariat HTTP/1.1" 200 7776 "-" "Mozilla/5.0 (compatible; CORE/0.6; +http://core.ac.uk; http://core.ac.uk/intro/contact)"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>CORE seems to be some bot that is “Aggregating the world’s open access research papers”</li>
|
||||
<li>The contact address listed in their bot’s user agent is incorrect; the correct page is simply: <a href="https://core.ac.uk/contact">https://core.ac.uk/contact</a></li>
|
||||
<li>I will check the logs in a few days to see if they are harvesting us regularly, then add their bot’s user agent to the Tomcat Crawler Session Valve</li>
|
||||
<li>After browsing the CORE site it seems that the CGIAR Library is somehow a member of CORE, so they have probably only been harvesting CGSpace since we did the migration, as library.cgiar.org directs to us now</li>
|
||||
<li>For now I will just contact them to have them update their contact info in the bot’s user agent, but eventually I think I’ll tell them to swap out the CGIAR Library entry for CGSpace</li>
|
||||
<li><p>CORE seems to be some bot that is “Aggregating the world’s open access research papers”</p></li>
|
||||
|
||||
<li><p>The contact address listed in their bot’s user agent is incorrect; the correct page is simply: <a href="https://core.ac.uk/contact">https://core.ac.uk/contact</a></p></li>
|
||||
|
||||
<li><p>I will check the logs in a few days to see if they are harvesting us regularly, then add their bot’s user agent to the Tomcat Crawler Session Valve</p></li>
|
||||
|
||||
<li><p>After browsing the CORE site it seems that the CGIAR Library is somehow a member of CORE, so they have probably only been harvesting CGSpace since we did the migration, as library.cgiar.org directs to us now</p></li>
|
||||
|
||||
<li><p>For now I will just contact them to have them update their contact info in the bot’s user agent, but eventually I think I’ll tell them to swap out the CGIAR Library entry for CGSpace</p></li>
|
||||
</ul>
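<p>When I check back in a few days, a per-day count of CORE requests across the rotated nginx logs should show whether they are harvesting regularly (same <code>zcat --force</code> trick as elsewhere in these notes):</p>

<pre><code># zcat --force /var/log/nginx/access.log* | grep "CORE/0.6" | awk '{print substr($4, 2, 11)}' | sort | uniq -c
</code></pre>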
|
||||
|
||||
<h2 id="2017-10-30">2017-10-30</h2>
|
||||
|
||||
<ul>
|
||||
<li>Like clock work, Linode alerted about high CPU usage on CGSpace again this morning (this time at 8:13 AM)</li>
|
||||
<li>Uptime Robot noticed that CGSpace went down around 10:15 AM, and I saw that there were 93 PostgreSQL connections:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Uptime Robot noticed that CGSpace went down around 10:15 AM, and I saw that there were 93 PostgreSQL connections:</p>
|
||||
|
||||
<pre><code>dspace=# SELECT * FROM pg_stat_activity;
|
||||
...
|
||||
(93 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Surprise surprise, the CORE bot is likely responsible for the recent load issues, making hundreds of thousands of requests yesterday and today:</li>
|
||||
</ul>
|
||||
<li><p>Surprise surprise, the CORE bot is likely responsible for the recent load issues, making hundreds of thousands of requests yesterday and today:</p>
|
||||
|
||||
<pre><code># grep -c "CORE/0.6" /var/log/nginx/access.log
|
||||
26475
|
||||
# grep -c "CORE/0.6" /var/log/nginx/access.log.1
|
||||
135083
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>IP addresses for this bot currently seem to be:</li>
|
||||
</ul>
|
||||
<li><p>IP addresses for this bot currently seem to be:</p>
|
||||
|
||||
<pre><code># grep "CORE/0.6" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq
|
||||
137.108.70.6
|
||||
137.108.70.7
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I will add their user agent to the Tomcat Session Crawler Valve but it won’t help much because they are only using two sessions:</li>
|
||||
</ul>
|
||||
<li><p>I will add their user agent to the Tomcat Session Crawler Valve but it won’t help much because they are only using two sessions:</p>
|
||||
|
||||
<pre><code># grep 137.108.70 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq
|
||||
session_id=5771742CABA3D0780860B8DA81E0551B
|
||||
session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>… and most of their requests are for dynamic discover pages:</li>
|
||||
</ul>
|
||||
<li><p>… and most of their requests are for dynamic discover pages:</p>
|
||||
|
||||
<pre><code># grep -c 137.108.70 /var/log/nginx/access.log
|
||||
26622
|
||||
# grep 137.108.70 /var/log/nginx/access.log | grep -c "GET /discover"
|
||||
24055
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Just because I’m curious who the top IPs are:</li>
|
||||
</ul>
|
||||
<li><p>Just because I’m curious who the top IPs are:</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/access.log | sort -n | uniq -c | sort -h | tail
|
||||
496 62.210.247.93
|
||||
571 46.4.94.226
|
||||
651 40.77.167.39
|
||||
763 157.55.39.231
|
||||
782 207.46.13.90
|
||||
998 66.249.66.90
|
||||
1948 104.196.152.243
|
||||
4247 190.19.92.5
|
||||
31602 137.108.70.6
|
||||
31636 137.108.70.7
|
||||
</code></pre>
|
||||
496 62.210.247.93
|
||||
571 46.4.94.226
|
||||
651 40.77.167.39
|
||||
763 157.55.39.231
|
||||
782 207.46.13.90
|
||||
998 66.249.66.90
|
||||
1948 104.196.152.243
|
||||
4247 190.19.92.5
|
||||
31602 137.108.70.6
|
||||
31636 137.108.70.7
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>At least we know the top two are CORE, but who are the others?</li>
|
||||
<li>190.19.92.5 is apparently in Argentina, and 104.196.152.243 is from Google Cloud Engine</li>
|
||||
<li>Actually, these two scrapers might be more responsible for the heavy load than the CORE bot, because they don’t reuse their session variable, creating thousands of new sessions!</li>
|
||||
</ul>
|
||||
<li><p>At least we know the top two are CORE, but who are the others?</p></li>
|
||||
|
||||
<li><p>190.19.92.5 is apparently in Argentina, and 104.196.152.243 is from Google Cloud Engine</p></li>
|
||||
|
||||
<li><p>Actually, these two scrapers might be more responsible for the heavy load than the CORE bot, because they don’t reuse their session variable, creating thousands of new sessions!</p>
|
||||
|
||||
<pre><code># grep 190.19.92.5 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
|
||||
1419
|
||||
# grep 104.196.152.243 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
|
||||
2811
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>From looking at the requests, it appears these are from CIAT and CCAFS</li>
|
||||
<li>I wonder if I could somehow instruct them to use a user agent so that we could apply a crawler session manager valve to them</li>
|
||||
<li>Actually, according to the Tomcat docs, we could use an IP with <code>crawlerIps</code>: <a href="https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve">https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve</a></li>
|
||||
<li>Ah, wait, it looks like <code>crawlerIps</code> only came in 2017-06, so probably isn’t in Ubuntu 16.04’s 7.0.68 build!</li>
|
||||
<li>That would explain the errors I was getting when trying to set it:</li>
|
||||
</ul>
|
||||
<li><p>From looking at the requests, it appears these are from CIAT and CCAFS</p></li>
|
||||
|
||||
<li><p>I wonder if I could somehow instruct them to use a user agent so that we could apply a crawler session manager valve to them</p></li>
|
||||
|
||||
<li><p>Actually, according to the Tomcat docs, we could use an IP with <code>crawlerIps</code>: <a href="https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve">https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve</a></p></li>
|
||||
|
||||
<li><p>Ah, wait, it looks like <code>crawlerIps</code> only came in 2017-06, so probably isn’t in Ubuntu 16.04’s 7.0.68 build!</p></li>
|
||||
|
||||
<li><p>That would explain the errors I was getting when trying to set it:</p>
|
||||
|
||||
<pre><code>WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Valve} Setting property 'crawlerIps' to '190\.19\.92\.5|104\.196\.152\.243' did not find a matching property.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>As for now, it actually seems the CORE bot coming from 137.108.70.6 and 137.108.70.7 is only using a few sessions per day, which is good:</li>
|
||||
</ul>
|
||||
<li><p>As for now, it actually seems the CORE bot coming from 137.108.70.6 and 137.108.70.7 is only using a few sessions per day, which is good:</p>
|
||||
|
||||
<pre><code># grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=137.108.70.(6|7)' dspace.log.2017-10-30 | sort -n | uniq -c | sort -h
|
||||
410 session_id=74F0C3A133DBF1132E7EC30A7E7E0D60:ip_addr=137.108.70.7
|
||||
574 session_id=5771742CABA3D0780860B8DA81E0551B:ip_addr=137.108.70.7
|
||||
1012 session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A:ip_addr=137.108.70.6
|
||||
</code></pre>
|
||||
410 session_id=74F0C3A133DBF1132E7EC30A7E7E0D60:ip_addr=137.108.70.7
|
||||
574 session_id=5771742CABA3D0780860B8DA81E0551B:ip_addr=137.108.70.7
|
||||
1012 session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A:ip_addr=137.108.70.6
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I will check again tomorrow</li>
|
||||
<li><p>I will check again tomorrow</p></li>
|
||||
</ul>
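<p>Related to the <code>crawlerIps</code> question above, the packaged Tomcat version is easy to confirm (on this Ubuntu 16.04 host it should report 7.0.68, which predates the option):</p>

<pre><code># dpkg -s tomcat7 | grep -i '^version'
</code></pre>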
|
||||
|
||||
<h2 id="2017-10-31">2017-10-31</h2>
|
||||
@ -489,40 +489,43 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
|
||||
<ul>
|
||||
<li>Very nice, Linode alerted that CGSpace had high CPU usage at 2AM again</li>
|
||||
<li>Ask on the dspace-tech mailing list if it’s possible to use an existing item as a template for a new item</li>
|
||||
<li>To follow up on the CORE bot traffic, there were almost 300,000 requests yesterday:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>To follow up on the CORE bot traffic, there were almost 300,000 requests yesterday:</p>
|
||||
|
||||
<pre><code># grep "CORE/0.6" /var/log/nginx/access.log.1 | awk '{print $1}' | sort -n | uniq -c | sort -h
|
||||
139109 137.108.70.6
|
||||
139253 137.108.70.7
|
||||
</code></pre>
|
||||
139109 137.108.70.6
|
||||
139253 137.108.70.7
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I’ve emailed the CORE people to ask if they can update the repository information from CGIAR Library to CGSpace</li>
|
||||
<li>Also, I asked if they could perhaps use the <code>sitemap.xml</code>, OAI-PMH, or REST APIs to index us more efficiently, because they mostly seem to be crawling the nearly endless Discovery facets</li>
|
||||
<li>I added <a href="https://goaccess.io/">GoAccess</a> to the list of package to install in the DSpace role of the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a></li>
|
||||
<li>It makes it very easy to analyze nginx logs from the command line, to see where traffic is coming from:</li>
|
||||
</ul>
|
||||
<li><p>I’ve emailed the CORE people to ask if they can update the repository information from CGIAR Library to CGSpace</p></li>
|
||||
|
||||
<li><p>Also, I asked if they could perhaps use the <code>sitemap.xml</code>, OAI-PMH, or REST APIs to index us more efficiently, because they mostly seem to be crawling the nearly endless Discovery facets</p></li>
|
||||
|
||||
<li><p>I added <a href="https://goaccess.io/">GoAccess</a> to the list of package to install in the DSpace role of the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a></p></li>
|
||||
|
||||
<li><p>It makes it very easy to analyze nginx logs from the command line, to see where traffic is coming from:</p>
|
||||
|
||||
<pre><code># goaccess /var/log/nginx/access.log --log-format=COMBINED
|
||||
</code></pre>
|
||||
</code></pre></li>
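<p>Hypothetical examples of the endpoints I would rather CORE use instead of the Discovery facets: the sitemap and the DSpace 5 REST API, which supports paging with <code>limit</code> and <code>offset</code>:</p>

<pre><code>$ curl -s 'https://cgspace.cgiar.org/sitemap'
$ curl -s -H 'Accept: application/json' 'https://cgspace.cgiar.org/rest/items?limit=100&amp;offset=0' # DSpace 5 REST API paging
</code></pre>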
|
||||
|
||||
<ul>
|
||||
<li>According to Uptime Robot CGSpace went down and up a few times</li>
|
||||
<li>I had a look at goaccess and I saw that CORE was actively indexing</li>
|
||||
<li>Also, PostgreSQL connections were at 91 (with the max being 60 per web app, hmmm)</li>
|
||||
<li>I’m really starting to get annoyed with these guys, and thinking about blocking their IP address for a few days to see if CGSpace becomes more stable</li>
|
||||
<li>Actually, come to think of it, they aren’t even obeying <code>robots.txt</code>, because we actually disallow <code>/discover</code> and <code>/search-filter</code> URLs but they are hitting those massively:</li>
|
||||
</ul>
|
||||
<li><p>According to Uptime Robot CGSpace went down and up a few times</p></li>
|
||||
|
||||
<li><p>I had a look at goaccess and I saw that CORE was actively indexing</p></li>
|
||||
|
||||
<li><p>Also, PostgreSQL connections were at 91 (with the max being 60 per web app, hmmm)</p></li>
|
||||
|
||||
<li><p>I’m really starting to get annoyed with these guys, and thinking about blocking their IP address for a few days to see if CGSpace becomes more stable</p></li>
|
||||
|
||||
<li><p>Actually, come to think of it, they aren’t even obeying <code>robots.txt</code>, because we actually disallow <code>/discover</code> and <code>/search-filter</code> URLs but they are hitting those massively:</p>
|
||||
|
||||
<pre><code># grep "CORE/0.6" /var/log/nginx/access.log | grep -o -E "GET /(discover|search-filter)" | sort -n | uniq -c | sort -rn
|
||||
158058 GET /discover
|
||||
14260 GET /search-filter
|
||||
</code></pre>
|
||||
158058 GET /discover
|
||||
14260 GET /search-filter
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I tested a URL of pattern <code>/discover</code> in Google’s webmaster tools and it was indeed identified as blocked</li>
|
||||
<li>I will send feedback to the CORE bot team</li>
|
||||
<li><p>I tested a URL of pattern <code>/discover</code> in Google’s webmaster tools and it was indeed identified as blocked</p></li>
|
||||
|
||||
<li><p>I will send feedback to the CORE bot team</p></li>
|
||||
</ul>
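<p>The <code>robots.txt</code> rules that CORE is ignoring can be double-checked from the command line:</p>

<pre><code>$ curl -s https://cgspace.cgiar.org/robots.txt | grep -E '^Disallow: /(discover|search-filter)'
</code></pre>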
|
||||
|
||||
|
||||
|
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@ -23,7 +23,7 @@ Export a CSV of the IITA community metadata for Martin Mueller
|
||||
|
||||
Export a CSV of the IITA community metadata for Martin Mueller
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -115,34 +115,34 @@ Export a CSV of the IITA community metadata for Martin Mueller
|
||||
<li>Andrea from Macaroni Bros had sent me an email that CCAFS needs them</li>
|
||||
<li>Give Udana more feedback on his WLE records from last month</li>
|
||||
<li>There were some records using a non-breaking space in their AGROVOC subject field</li>
|
||||
<li>I checked and tested some author corrections from Peter from last week, and then applied them on CGSpace</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I checked and tested some author corrections from Peter from last week, and then applied them on CGSpace</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i Correct-309-authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
|
||||
$ ./delete-metadata-values.py -i Delete-3-Authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3
|
||||
</code></pre>
|
||||
</code></pre></li>
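<p>Regarding the non-breaking spaces in the AGROVOC subjects mentioned above, a rough way to count metadata values containing U+00A0 anywhere in the repository (assuming a UTF-8 PostgreSQL database; this is not limited to the subject field):</p>

<pre><code>dspace=# SELECT COUNT(*) FROM metadatavalue WHERE resource_type_id=2 AND text_value ~ U&amp;'\00A0'; -- U+00A0 is the non-breaking space
</code></pre>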
|
||||
|
||||
<ul>
|
||||
<li>This time there were no errors in whitespace but I did have to correct one incorrectly encoded accent character</li>
|
||||
<li>Add new CRP subject “GRAIN LEGUMES AND DRYLAND CEREALS” to <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/358">#358</a>)</li>
|
||||
<li>Merge the ORCID integration stuff in to <code>5_x-prod</code> for deployment on CGSpace soon (<a href="https://github.com/ilri/DSpace/pull/359">#359</a>)</li>
|
||||
<li>Deploy ORCID changes on CGSpace (linode18), run all system updates, and reboot the server</li>
|
||||
<li>Run all system updates on DSpace Test and reboot server</li>
|
||||
<li>I ran the <a href="https://gist.github.com/alanorth/24d8081a5dc25e2a4e27e548e7e2389c">orcid-authority-to-item.py</a> script on CGSpace and mapped 2,864 ORCID identifiers from Solr to item metadata</li>
|
||||
</ul>
|
||||
<li><p>This time there were no errors in whitespace but I did have to correct one incorrectly encoded accent character</p></li>
|
||||
|
||||
<li><p>Add new CRP subject “GRAIN LEGUMES AND DRYLAND CEREALS” to <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/358">#358</a>)</p></li>
|
||||
|
||||
<li><p>Merge the ORCID integration stuff in to <code>5_x-prod</code> for deployment on CGSpace soon (<a href="https://github.com/ilri/DSpace/pull/359">#359</a>)</p></li>
|
||||
|
||||
<li><p>Deploy ORCID changes on CGSpace (linode18), run all system updates, and reboot the server</p></li>
|
||||
|
||||
<li><p>Run all system updates on DSpace Test and reboot server</p></li>
|
||||
|
||||
<li><p>I ran the <a href="https://gist.github.com/alanorth/24d8081a5dc25e2a4e27e548e7e2389c">orcid-authority-to-item.py</a> script on CGSpace and mapped 2,864 ORCID identifiers from Solr to item metadata</p>
|
||||
|
||||
<pre><code>$ ./orcid-authority-to-item.py -db dspace -u dspace -p 'fuuu' -s http://localhost:8081/solr -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I ran the DSpace cleanup script on CGSpace and it threw an error (as always):</li>
|
||||
</ul>
|
||||
<li><p>I ran the DSpace cleanup script on CGSpace and it threw an error (as always):</p>
|
||||
|
||||
<pre><code>Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
|
||||
Detail: Key (bitstream_id)=(150659) is still referenced from table "bundle".
|
||||
</code></pre>
|
||||
Detail: Key (bitstream_id)=(150659) is still referenced from table "bundle".
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><p>The solution is, as always:</p>
|
||||
|
||||
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (150659);'
|
||||
@ -166,89 +166,89 @@ UPDATE 1
|
||||
<ul>
|
||||
<li>Looking at a CSV dump of the CIAT community I see there are tons of stupid text languages people add for their metadata</li>
|
||||
<li>This makes the CSV have tons of columns, for example <code>dc.title</code>, <code>dc.title[]</code>, <code>dc.title[en]</code>, <code>dc.title[eng]</code>, <code>dc.title[en_US]</code> and so on!</li>
|
||||
<li>I think I can fix — or at least normalize — them in the database:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I think I can fix — or at least normalize — them in the database:</p>
|
||||
|
||||
<pre><code>dspace=# select distinct text_lang from metadatavalue where resource_type_id=2;
|
||||
text_lang
|
||||
text_lang
|
||||
-----------
|
||||
|
||||
ethnob
|
||||
en
|
||||
spa
|
||||
EN
|
||||
En
|
||||
en_
|
||||
en_US
|
||||
E.
|
||||
ethnob
|
||||
en
|
||||
spa
|
||||
EN
|
||||
En
|
||||
en_
|
||||
en_US
|
||||
E.
|
||||
|
||||
EN_US
|
||||
en_U
|
||||
eng
|
||||
fr
|
||||
es_ES
|
||||
es
|
||||
EN_US
|
||||
en_U
|
||||
eng
|
||||
fr
|
||||
es_ES
|
||||
es
|
||||
(16 rows)
|
||||
|
||||
dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('en','EN','En','en_','EN_US','en_U','eng');
|
||||
UPDATE 122227
|
||||
dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
|
||||
text_lang
|
||||
text_lang
|
||||
-----------
|
||||
|
||||
ethnob
|
||||
en_US
|
||||
spa
|
||||
E.
|
||||
ethnob
|
||||
en_US
|
||||
spa
|
||||
E.
|
||||
|
||||
fr
|
||||
es_ES
|
||||
es
|
||||
fr
|
||||
es_ES
|
||||
es
|
||||
(9 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>On second inspection it looks like <code>dc.description.provenance</code> fields use the text_lang “en” so that’s probably why there are over 100,000 fields changed…</li>
|
||||
<li>If I skip that, there are about 2,000, which seems more reasonably like the amount of fields users have edited manually, or fucked up during CSV import, etc:</li>
|
||||
</ul>
|
||||
<li><p>On second inspection it looks like <code>dc.description.provenance</code> fields use the text_lang “en” so that’s probably why there are over 100,000 fields changed…</p></li>
|
||||
|
||||
<li><p>If I skip that, there are about 2,000, which seems more reasonably like the amount of fields users have edited manually, or fucked up during CSV import, etc:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
|
||||
UPDATE 2309
|
||||
</code></pre>
|
||||
</code></pre></li>
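<p>For next time, the same filter can be checked with a plain <code>SELECT</code> first to see how many rows an <code>UPDATE</code> would touch:</p>

<pre><code>dspace=# SELECT COUNT(*) FROM metadatavalue WHERE resource_type_id=2 AND text_lang IN ('EN','En','en_','EN_US','en_U','eng');
</code></pre>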
|
||||
|
||||
<ul>
|
||||
<li>I will apply this on CGSpace right now</li>
|
||||
<li>In other news, I was playing with adding ORCID identifiers to a dump of CIAT’s community via CSV in OpenRefine</li>
|
||||
<li>Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the <code>cg.creator.id</code> field</li>
|
||||
<li>For example, a GREL expression in a custom text facet to get all items with <code>dc.contributor.author[en_US]</code> of a certain author with several name variations (this is how you use a logical OR in OpenRefine):</li>
|
||||
</ul>
|
||||
<li><p>I will apply this on CGSpace right now</p></li>
|
||||
|
||||
<li><p>In other news, I was playing with adding ORCID identifiers to a dump of CIAT’s community via CSV in OpenRefine</p></li>
|
||||
|
||||
<li><p>Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the <code>cg.creator.id</code> field</p></li>
|
||||
|
||||
<li><p>For example, a GREL expression in a custom text facet to get all items with <code>dc.contributor.author[en_US]</code> of a certain author with several name variations (this is how you use a logical OR in OpenRefine):</p>
|
||||
|
||||
<pre><code>or(value.contains('Ceballos, Hern'), value.contains('Hernández Ceballos'))
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then you can flag or star matching items and then use a conditional to either set the value directly or add it to an existing value:</li>
|
||||
</ul>
|
||||
<li><p>Then you can flag or star matching items and then use a conditional to either set the value directly or add it to an existing value:</p>
|
||||
|
||||
<pre><code>if(isBlank(value), "Hernan Ceballos: 0000-0002-8744-7918", value + "||Hernan Ceballos: 0000-0002-8744-7918")
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>One thing that bothers me is that this won’t honor author order</li>
|
||||
<li>It might be better to do batches of these in PostgreSQL with a script that takes the <code>place</code> column of an author into account when setting the <code>cg.creator.id</code></li>
|
||||
<li>I wrote a Python script to read the author names and ORCID identifiers from CSV and create matching <code>cg.creator.id</code> fields: <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py </a></li>
|
||||
<li>The CSV should have two columns: author name and ORCID identifier:</li>
|
||||
</ul>
|
||||
<li><p>One thing that bothers me is that this won’t honor author order</p></li>
|
||||
|
||||
<li><p>It might be better to do batches of these in PostgreSQL with a script that takes the <code>place</code> column of an author into account when setting the <code>cg.creator.id</code></p></li>
|
||||
|
||||
<li><p>I wrote a Python script to read the author names and ORCID identifiers from CSV and create matching <code>cg.creator.id</code> fields: <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py </a></p></li>
|
||||
|
||||
<li><p>The CSV should have two columns: author name and ORCID identifier:</p>
|
||||
|
||||
<pre><code>dc.contributor.author,cg.creator.id
|
||||
"Orth, Alan",Alan S. Orth: 0000-0002-1735-7458
|
||||
"Orth, A.",Alan S. Orth: 0000-0002-1735-7458
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I didn’t integrate the ORCID API lookup for author names in this script for now because I was only interested in “tagging” old items for a few given authors</li>
|
||||
<li>I added ORCID identifiers for 187 items by CIAT’s Hernan Ceballos, because that is what Elizabeth was trying to do manually!</li>
|
||||
<li>Also, I decided to add ORCID identifiers for all records from Peter, Abenet, and Sisay as well</li>
|
||||
<li><p>I didn’t integrate the ORCID API lookup for author names in this script for now because I was only interested in “tagging” old items for a few given authors</p></li>
|
||||
|
||||
<li><p>I added ORCID identifiers for 187 items by CIAT’s Hernan Ceballos, because that is what Elizabeth was trying to do manually!</p></li>
|
||||
|
||||
<li><p>Also, I decided to add ORCID identifiers for all records from Peter, Abenet, and Sisay as well</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-03-09">2018-03-09</h2>
|
||||
@ -262,8 +262,8 @@ UPDATE 2309
|
||||
|
||||
<ul>
|
||||
<li>Peter also wrote to say he is having issues with the Atmire Listings and Reports module</li>
|
||||
<li>When I logged in to try it I get a blank white page after continuing and I see this in dspace.log.2018-03-11:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>When I logged in to try it I get a blank white page after continuing and I see this in dspace.log.2018-03-11:</p>
|
||||
|
||||
<pre><code>2018-03-11 11:38:15,592 WARN org.dspace.app.webui.servlet.InternalErrorServlet @ :session_id=91C2C0C59669B33A7683570F6010603A:internal_error:-- URL Was: https://cgspace.cgiar.or
|
||||
g/jspui/listings-and-reports
|
||||
@ -275,11 +275,11 @@ g/jspui/listings-and-reports
|
||||
-- step: "1"
|
||||
|
||||
org.apache.jasper.JasperException: java.lang.NullPointerException
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looks like I needed to remove the Humidtropics subject from Listings and Reports because it was looking for the terms and couldn’t find them</li>
|
||||
<li>I made a quick fix and it’s working now (<a href="https://github.com/ilri/DSpace/pull/364">#364</a>)</li>
|
||||
<li><p>Looks like I needed to remove the Humidtropics subject from Listings and Reports because it was looking for the terms and couldn’t find them</p></li>
|
||||
|
||||
<li><p>I made a quick fix and it’s working now (<a href="https://github.com/ilri/DSpace/pull/364">#364</a>)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-03-12">2018-03-12</h2>
|
||||
@ -321,17 +321,18 @@ org.apache.jasper.JasperException: java.lang.NullPointerException
|
||||
<p><img src="/cgspace-notes/2018/03/layout-only-citation.png" alt="Listing and Reports layout" /></p>
|
||||
|
||||
<ul>
|
||||
<li>The error in the DSpace log is:</li>
|
||||
</ul>
|
||||
<li><p>The error in the DSpace log is:</p>
|
||||
|
||||
<pre><code>org.apache.jasper.JasperException: java.lang.ArrayIndexOutOfBoundsException: -1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The full error is here: <a href="https://gist.github.com/alanorth/ea47c092725960e39610db9b0c13f6ca">https://gist.github.com/alanorth/ea47c092725960e39610db9b0c13f6ca</a></li>
|
||||
<li>If I do a report for “Orth, Alan” with the same custom layout it works!</li>
|
||||
<li>I submitted a ticket to Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589</a></li>
|
||||
<li>Small fix to the example citation text in Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/365">#365</a>)</li>
|
||||
<li><p>The full error is here: <a href="https://gist.github.com/alanorth/ea47c092725960e39610db9b0c13f6ca">https://gist.github.com/alanorth/ea47c092725960e39610db9b0c13f6ca</a></p></li>
|
||||
|
||||
<li><p>If I do a report for “Orth, Alan” with the same custom layout it works!</p></li>
|
||||
|
||||
<li><p>I submitted a ticket to Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589</a></p></li>
|
||||
|
||||
<li><p>Small fix to the example citation text in Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/365">#365</a>)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-03-16">2018-03-16</h2>
|
||||
@ -339,29 +340,24 @@ org.apache.jasper.JasperException: java.lang.NullPointerException
|
||||
<ul>
|
||||
<li>ICT made the DNS updates for dspacetest.cgiar.org late last night</li>
|
||||
<li>I have removed the old server (linode02 aka linode578611) in favor of linode19 aka linode6624164</li>
|
||||
<li>Looking at the CRP subjects on CGSpace I see there is one blank one so I’ll just fix it:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Looking at the CRP subjects on CGSpace I see there is one blank one so I’ll just fix it:</p>
|
||||
|
||||
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=230 and text_value='';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Copy all CRP subjects to a CSV to do the mass updates:</li>
|
||||
</ul>
|
||||
<li><p>Copy all CRP subjects to a CSV to do the mass updates:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=230 group by text_value order by count desc) to /tmp/crps.csv with csv header;
|
||||
COPY 21
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Once I prepare the new input forms (<a href="https://github.com/ilri/DSpace/issues/362">#362</a>) I will need to do the batch corrections:</li>
|
||||
</ul>
|
||||
<li><p>Once I prepare the new input forms (<a href="https://github.com/ilri/DSpace/issues/362">#362</a>) I will need to do the batch corrections:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i Correct-21-CRPs-2018-03-16.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.crp -t correct -m 230 -n -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Create a pull request to update the input forms for the new CRP subject style (<a href="https://github.com/ilri/DSpace/pull/366">#366</a>)</li>
|
||||
<li><p>Create a pull request to update the input forms for the new CRP subject style (<a href="https://github.com/ilri/DSpace/pull/366">#366</a>)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-03-19">2018-03-19</h2>
|
||||
@ -371,17 +367,15 @@ COPY 21
|
||||
<li>She is getting an HTTPS error apparently</li>
|
||||
<li>It’s working outside, and Ethiopian users seem to be having no issues so I’ve asked ICT to have a look</li>
|
||||
<li>CGSpace crashed this morning for about seven minutes and Dani restarted Tomcat</li>
|
||||
<li>Around that time there was an increase in SQL errors:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Around that time there was an increase in SQL errors:</p>
|
||||
|
||||
<pre><code>2018-03-19 09:10:54,856 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
|
||||
...
|
||||
2018-03-19 09:10:54,862 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL query singleTable Error -
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But these errors, I don’t even know what they mean, because a handful of them happen every day:</li>
|
||||
</ul>
|
||||
<li><p>But these errors, I don’t even know what they mean, because a handful of them happen every day:</p>
|
||||
|
||||
<pre><code>$ grep -c 'ERROR org.dspace.storage.rdbms.DatabaseManager' dspace.log.2018-03-1*
|
||||
dspace.log.2018-03-10:13
|
||||
@ -394,103 +388,105 @@ dspace.log.2018-03-16:13
|
||||
dspace.log.2018-03-17:13
|
||||
dspace.log.2018-03-18:15
|
||||
dspace.log.2018-03-19:90
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There wasn’t even a lot of traffic at the time (8–9 AM):</li>
|
||||
</ul>
|
||||
<li><p>There wasn’t even a lot of traffic at the time (8–9 AM):</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Mar/2018:0[89]:" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
92 40.77.167.197
|
||||
92 83.103.94.48
|
||||
96 40.77.167.175
|
||||
116 207.46.13.178
|
||||
122 66.249.66.153
|
||||
140 95.108.181.88
|
||||
196 213.55.99.121
|
||||
206 197.210.168.174
|
||||
207 104.196.152.243
|
||||
294 54.198.169.202
|
||||
</code></pre>
|
||||
92 40.77.167.197
|
||||
92 83.103.94.48
|
||||
96 40.77.167.175
|
||||
116 207.46.13.178
|
||||
122 66.249.66.153
|
||||
140 95.108.181.88
|
||||
196 213.55.99.121
|
||||
206 197.210.168.174
|
||||
207 104.196.152.243
|
||||
294 54.198.169.202
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Well there is a hint in Tomcat’s <code>catalina.out</code>:</li>
|
||||
</ul>
|
||||
<li><p>Well there is a hint in Tomcat’s <code>catalina.out</code>:</p>
|
||||
|
||||
<pre><code>Mon Mar 19 09:05:28 UTC 2018 | Query:id: 92032 AND type:2
|
||||
Exception in thread "http-bio-127.0.0.1-8081-exec-280" java.lang.OutOfMemoryError: Java heap space
|
||||
</code></pre>
|
||||
</code></pre></li>
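<p>Since the hint points at Java heap space, it is worth confirming what heap ceiling Tomcat is actually running with (a quick check; the value on this host is whatever the Ansible templates set):</p>

<pre><code># ps -ef | grep java | grep -o -E 'Xmx[0-9]+[mg]' # value is host-specific
</code></pre>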
|
||||
|
||||
<ul>
|
||||
<li>So someone was doing something heavy somehow… my guess is content and usage stats!</li>
|
||||
<li>ICT responded that they “fixed” the CGSpace connectivity issue in Nairobi without telling me the problem</li>
|
||||
<li>When I asked, Robert Okal said CGNET messed up when updating the DNS for cgspace.cgiar.org last week</li>
|
||||
<li>I told him that my request last week was for dspacetest.cgiar.org, not cgspace.cgiar.org!</li>
|
||||
<li>So they updated the wrong fucking DNS records</li>
|
||||
<li>Magdalena from CCAFS wrote to ask about one record that has a bunch of metadata missing in her Listings and Reports export</li>
|
||||
<li>It appears to be this one: <a href="https://cgspace.cgiar.org/handle/10568/83473?show=full">https://cgspace.cgiar.org/handle/10568/83473?show=full</a></li>
|
||||
<li>The title is “Untitled” and there is some metadata but indeed the citation is missing</li>
|
||||
<li>I don’t know what would cause that</li>
|
||||
<li><p>So someone was doing something heavy somehow… my guess is content and usage stats!</p></li>
|
||||
|
||||
<li><p>ICT responded that they “fixed” the CGSpace connectivity issue in Nairobi without telling me the problem</p></li>
|
||||
|
||||
<li><p>When I asked, Robert Okal said CGNET messed up when updating the DNS for cgspace.cgiar.org last week</p></li>
|
||||
|
||||
<li><p>I told him that my request last week was for dspacetest.cgiar.org, not cgspace.cgiar.org!</p></li>
|
||||
|
||||
<li><p>So they updated the wrong fucking DNS records</p></li>
|
||||
|
||||
<li><p>Magdalena from CCAFS wrote to ask about one record that has a bunch of metadata missing in her Listings and Reports export</p></li>
|
||||
|
||||
<li><p>It appears to be this one: <a href="https://cgspace.cgiar.org/handle/10568/83473?show=full">https://cgspace.cgiar.org/handle/10568/83473?show=full</a></p></li>
|
||||
|
||||
<li><p>The title is “Untitled” and there is some metadata but indeed the citation is missing</p></li>
|
||||
|
||||
<li><p>I don’t know what would cause that</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-03-20">2018-03-20</h2>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test has been down for a few hours with SQL and memory errors starting this morning:</li>
|
||||
</ul>
|
||||
<li><p>DSpace Test has been down for a few hours with SQL and memory errors starting this morning:</p>
|
||||
|
||||
<pre><code>2018-03-20 08:47:10,177 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
|
||||
...
|
||||
2018-03-20 08:53:11,624 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
|
||||
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I have no idea why it crashed</li>
|
||||
<li>I ran all system updates and rebooted it</li>
|
||||
<li>Abenet told me that one of Lance Robinson’s ORCID iDs on CGSpace is incorrect</li>
|
||||
<li>I will remove it from the controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/367">#367</a>) and update any items using the old one:</li>
|
||||
</ul>
|
||||
<li><p>I have no idea why it crashed</p></li>
|
||||
|
||||
<li><p>I ran all system updates and rebooted it</p></li>
|
||||
|
||||
<li><p>Abenet told me that one of Lance Robinson’s ORCID iDs on CGSpace is incorrect</p></li>
|
||||
|
||||
<li><p>I will remove it from the controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/367">#367</a>) and update any items using the old one:</p>
|
||||
|
||||
<pre><code>dspace=# update metadatavalue set text_value='Lance W. Robinson: 0000-0002-5224-8644' where resource_type_id=2 and metadata_field_id=240 and text_value like '%0000-0002-6344-195X%';
|
||||
UPDATE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Communicate with DSpace editors on Yammer about being more careful about spaces and character editing when doing manual metadata edits</li>
|
||||
<li>Merge the changes to CRP names to the <code>5_x-prod</code> branch and deploy on CGSpace (<a href="https://github.com/ilri/DSpace/pull/363">#363</a>)</li>
|
||||
<li>Run corrections for CRP names in the database:</li>
|
||||
</ul>
|
||||
<li><p>Communicate with DSpace editors on Yammer about being more careful about spaces and character editing when doing manual metadata edits</p></li>
|
||||
|
||||
<li><p>Merge the changes to CRP names to the <code>5_x-prod</code> branch and deploy on CGSpace (<a href="https://github.com/ilri/DSpace/pull/363">#363</a>)</p></li>
|
||||
|
||||
<li><p>Run corrections for CRP names in the database:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db dspace -u dspace -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run all system updates on CGSpace (linode18) and reboot the server</li>
|
||||
<li>I started a full Discovery re-index on CGSpace because of the updated CRPs</li>
|
||||
<li>I see this error in the DSpace log:</li>
|
||||
</ul>
|
||||
<li><p>Run all system updates on CGSpace (linode18) and reboot the server</p></li>
|
||||
|
||||
<li><p>I started a full Discovery re-index on CGSpace because of the updated CRPs</p></li>
|
||||
|
||||
<li><p>I see this error in the DSpace log:</p>
|
||||
|
||||
<pre><code>2018-03-20 19:03:14,844 ERROR com.atmire.dspace.discovery.AtmireSolrService @ No choices plugin was configured for field "dc_contributor_author".
|
||||
java.lang.IllegalArgumentException: No choices plugin was configured for field "dc_contributor_author".
|
||||
at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:261)
|
||||
at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:249)
|
||||
at org.dspace.browse.SolrBrowseCreateDAO.additionalIndex(SolrBrowseCreateDAO.java:215)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:662)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:807)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:876)
|
||||
at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
|
||||
at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
</code></pre>
|
||||
at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:261)
|
||||
at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:249)
|
||||
at org.dspace.browse.SolrBrowseCreateDAO.additionalIndex(SolrBrowseCreateDAO.java:215)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:662)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:807)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:876)
|
||||
at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
|
||||
at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I have to figure that one out…</li>
|
||||
<li><p>I have to figure that one out…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-03-21">2018-03-21</h2>
|
||||
@ -516,75 +512,71 @@ COPY 56156
|
||||
|
||||
<ul>
|
||||
<li>Afterwards we’ll want to do some batch tagging of ORCID identifiers to these names</li>
|
||||
<li>CGSpace crashed again this afternoon, I’m not sure of the cause but there are a lot of SQL errors in the DSpace log:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>CGSpace crashed again this afternoon, I’m not sure of the cause but there are a lot of SQL errors in the DSpace log:</p>
|
||||
|
||||
<pre><code>2018-03-21 15:11:08,166 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
|
||||
java.sql.SQLException: Connection has already been closed.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I have no idea why so many connections were abandoned this afternoon:</li>
|
||||
</ul>
|
||||
<li><p>I have no idea why so many connections were abandoned this afternoon:</p>
|
||||
|
||||
<pre><code># grep 'Mar 21, 2018' /var/log/tomcat7/catalina.out | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon'
|
||||
268
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test crashed again due to Java heap space, this is from the DSpace log:</li>
|
||||
</ul>
|
||||
<li><p>DSpace Test crashed again due to Java heap space, this is from the DSpace log:</p>
|
||||
|
||||
<pre><code>2018-03-21 15:18:48,149 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
|
||||
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And this is from the Tomcat Catalina log:</li>
|
||||
</ul>
|
||||
<li><p>And this is from the Tomcat Catalina log:</p>
|
||||
|
||||
<pre><code>Mar 21, 2018 11:20:00 AM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor run
|
||||
SEVERE: Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]]
|
||||
java.lang.OutOfMemoryError: Java heap space
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But there are tons of heap space errors on DSpace Test actually:</li>
|
||||
</ul>
|
||||
<li><p>But there are tons of heap space errors on DSpace Test actually:</p>
|
||||
|
||||
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
|
||||
319
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I guess we need to give it more RAM because it now has CGSpace’s large Solr core</li>
|
||||
<li>I will increase the memory from 3072m to 4096m</li>
|
||||
<li>Update <a href="https://github.com/ilri/rmg-ansible-public">Ansible playbooks</a> to use <a href="https://jdbc.postgresql.org/">PostgreSQL JBDC driver</a> 42.2.2</li>
|
||||
<li>Deploy the new JDBC driver on DSpace Test</li>
|
||||
<li>I’m also curious to see how long the <code>dspace index-discovery -b</code> takes on DSpace Test where the DSpace installation directory is on one of Linode’s new block storage volumes</li>
|
||||
</ul>
|
||||
<li><p>I guess we need to give it more RAM because it now has CGSpace’s large Solr core</p></li>
|
||||
|
||||
<li><p>I will increase the memory from 3072m to 4096m</p></li>
|
||||
|
||||
<li><p>Update <a href="https://github.com/ilri/rmg-ansible-public">Ansible playbooks</a> to use <a href="https://jdbc.postgresql.org/">PostgreSQL JBDC driver</a> 42.2.2</p></li>
|
||||
|
||||
<li><p>Deploy the new JDBC driver on DSpace Test</p></li>
|
||||
|
||||
<li><p>I’m also curious to see how long the <code>dspace index-discovery -b</code> takes on DSpace Test where the DSpace installation directory is on one of Linode’s new block storage volumes</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
|
||||
real 208m19.155s
|
||||
user 8m39.138s
|
||||
sys 2m45.135s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So that’s about three times as long as it took on CGSpace this morning</li>
|
||||
<li>I should also check the raw read speed with <code>hdparm -tT /dev/sdc</code></li>
|
||||
<li>Looking at Peter’s author corrections there are some mistakes due to Windows 1252 encoding</li>
|
||||
<li>I need to find a way to filter these easily with OpenRefine</li>
|
||||
<li>For example, Peter has inadvertently introduced Unicode character 0xfffd into several fields</li>
|
||||
<li>I can search for Unicode values by their hex code in OpenRefine using the following GREL expression:</li>
|
||||
</ul>
|
||||
<li><p>So that’s about three times as long as it took on CGSpace this morning</p></li>
|
||||
|
||||
<li><p>I should also check the raw read speed with <code>hdparm -tT /dev/sdc</code></p></li>
|
||||
|
||||
<li><p>Looking at Peter’s author corrections there are some mistakes due to Windows 1252 encoding</p></li>
|
||||
|
||||
<li><p>I need to find a way to filter these easily with OpenRefine</p></li>
|
||||
|
||||
<li><p>For example, Peter has inadvertently introduced Unicode character 0xfffd into several fields</p></li>
|
||||
|
||||
<li><p>I can search for Unicode values by their hex code in OpenRefine using the following GREL expression:</p>
|
||||
|
||||
<pre><code>isNotNull(value.match(/.*\ufffd.*/))
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I need to be able to add many common characters though so that it is useful to copy and paste into a new project to find issues</li>
|
||||
<li><p>I need to be able to add many common characters though so that it is useful to copy and paste into a new project to find issues</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-03-22">2018-03-22</h2>
|
||||
@ -605,36 +597,31 @@ sys 2m45.135s
|
||||
|
||||
<ul>
|
||||
<li>Looking at Peter’s author corrections and trying to work out a way to find errors in OpenRefine easily</li>
|
||||
<li>I can find all names that have acceptable characters using a GREL expression like:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I can find all names that have acceptable characters using a GREL expression like:</p>
|
||||
|
||||
<pre><code>isNotNull(value.match(/.*[a-zA-ZáÁéèïíñØøöóúü].*/))
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But it’s probably better to just say which characters I know for sure are not valid (like parentheses, pipe, or weird Unicode characters):</li>
|
||||
</ul>
|
||||
<li><p>But it’s probably better to just say which characters I know for sure are not valid (like parentheses, pipe, or weird Unicode characters):</p>
|
||||
|
||||
<pre><code>or(
|
||||
isNotNull(value.match(/.*[(|)].*/)),
|
||||
isNotNull(value.match(/.*\uFFFD.*/)),
|
||||
isNotNull(value.match(/.*\u00A0.*/)),
|
||||
isNotNull(value.match(/.*\u200A.*/))
|
||||
isNotNull(value.match(/.*[(|)].*/)),
|
||||
isNotNull(value.match(/.*\uFFFD.*/)),
|
||||
isNotNull(value.match(/.*\u00A0.*/)),
|
||||
isNotNull(value.match(/.*\u200A.*/))
|
||||
)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And here’s one combined GREL expression to check for items marked to delete or check so I can flag them and export them to a separate CSV (though perhaps it’s time to add delete support to my <code>fix-metadata-values.py</code> script):</li>
|
||||
</ul>
|
||||
<li><p>And here’s one combined GREL expression to check for items marked to delete or check so I can flag them and export them to a separate CSV (though perhaps it’s time to add delete support to my <code>fix-metadata-values.py</code> script):</p>
|
||||
|
||||
<pre><code>or(
|
||||
isNotNull(value.match(/.*delete.*/i)),
|
||||
isNotNull(value.match(/.*remove.*/i)),
|
||||
isNotNull(value.match(/.*check.*/i))
|
||||
isNotNull(value.match(/.*delete.*/i)),
|
||||
isNotNull(value.match(/.*remove.*/i)),
|
||||
isNotNull(value.match(/.*check.*/i))
|
||||
)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><p>So I guess the routine in OpenRefine is:</p>
|
||||
|
||||
<ul>
|
||||
@ -644,17 +631,17 @@ sys 2m45.135s
|
||||
<li>Custom text facet for illegal characters</li>
|
||||
</ul></li>
|
||||
|
||||
<li><p>Test the corrections and deletions locally, then run them on CGSpace:</p></li>
|
||||
</ul>
|
||||
<li><p>Test the corrections and deletions locally, then run them on CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-2928-Authors-2018-03-21.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
|
||||
$ ./delete-metadata-values.py -i /tmp/Delete-8-Authors-2018-03-21.csv -f dc.contributor.author -m 3 -db dspacetest -u dspace -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Afterwards I started a full Discovery reindexing on both CGSpace and DSpace Test</li>
|
||||
<li>CGSpace took 76m28.292s</li>
|
||||
<li>DSpace Test took 194m56.048s</li>
|
||||
<li><p>Afterwards I started a full Discovery reindexing on both CGSpace and DSpace Test</p></li>
|
||||
|
||||
<li><p>CGSpace took 76m28.292s</p></li>
|
||||
|
||||
<li><p>DSpace Test took 194m56.048s</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-03-26">2018-03-26</h2>
|
||||
@ -674,16 +661,15 @@ $ ./delete-metadata-values.py -i /tmp/Delete-8-Authors-2018-03-21.csv -f dc.cont
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test crashed due to heap space so I’ve increased it from 4096m to 5120m</li>
|
||||
<li>The error in Tomcat’s <code>catalina.out</code> was:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The error in Tomcat’s <code>catalina.out</code> was:</p>
|
||||
|
||||
<pre><code>Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: Java heap space
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Add ISI Journal (cg.isijournal) as an option in Atmire’s Listing and Reports layout (<a href="https://github.com/ilri/DSpace/pull/370">#370</a>) for Abenet</li>
|
||||
<li>I noticed a few hundred CRPs using the old capitalized formatting so I corrected them:</li>
|
||||
</ul>
|
||||
<li><p>Add ISI Journal (cg.isijournal) as an option in Atmire’s Listing and Reports layout (<a href="https://github.com/ilri/DSpace/pull/370">#370</a>) for Abenet</p></li>
|
||||
|
||||
<li><p>I noticed a few hundred CRPs using the old capitalized formatting so I corrected them:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db cgspace -u cgspace -p 'fuuu'
|
||||
Fixed 29 occurences of: CLIMATE CHANGE, AGRICULTURE AND FOOD SECURITY
|
||||
@ -696,14 +682,17 @@ Fixed 11 occurences of: POLICIES, INSTITUTIONS, AND MARKETS
|
||||
Fixed 28 occurences of: GRAIN LEGUMES
|
||||
Fixed 3 occurences of: FORESTS, TREES AND AGROFORESTRY
|
||||
Fixed 5 occurences of: GENEBANKS
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>That’s weird because we just updated them last week…</li>
|
||||
<li>Create a pull request to enable searching by ORCID identifier (<code>cg.creator.id</code>) in Discovery and Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/371">#371</a>)</li>
|
||||
<li>I will test it on DSpace Test first!</li>
|
||||
<li>Fix one missing XMLUI string for “Access Status” (cg.identifier.status)</li>
|
||||
<li>Run all system updates on DSpace Test and reboot the machine</li>
|
||||
<li><p>That’s weird because we just updated them last week…</p></li>
|
||||
|
||||
<li><p>Create a pull request to enable searching by ORCID identifier (<code>cg.creator.id</code>) in Discovery and Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/371">#371</a>)</p></li>
|
||||
|
||||
<li><p>I will test it on DSpace Test first!</p></li>
|
||||
|
||||
<li><p>Fix one missing XMLUI string for “Access Status” (cg.identifier.status)</p></li>
|
||||
|
||||
<li><p>Run all system updates on DSpace Test and reboot the machine</p></li>
|
||||
</ul>
|
||||
|
||||
|
||||
|
@ -25,7 +25,7 @@ Catalina logs at least show some memory errors yesterday:
|
||||
I tried to test something on DSpace Test but noticed that it’s down since god knows when
|
||||
Catalina logs at least show some memory errors yesterday:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -130,16 +130,14 @@ Exception in thread "ContainerBackgroundProcessor[StandardEngine[Catalina]]
|
||||
|
||||
<ul>
|
||||
<li>Peter noticed that there were still some old CRP names on CGSpace, because I hadn’t forced the Discovery index to be updated after I fixed the others last week</li>
|
||||
<li>For completeness I re-ran the CRP corrections on CGSpace:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>For completeness I re-ran the CRP corrections on CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db dspace -u dspace -p 'fuuu'
|
||||
Fixed 1 occurences of: AGRICULTURE FOR NUTRITION AND HEALTH
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then started a full Discovery index:</li>
|
||||
</ul>
|
||||
<li><p>Then started a full Discovery index:</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx1024m'
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
@ -147,18 +145,16 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
real 76m13.841s
|
||||
user 8m22.960s
|
||||
sys 2m2.498s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Elizabeth from CIAT emailed to ask if I could help her by adding ORCID identifiers to all of Joseph Tohme’s items</li>
|
||||
<li>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</li>
|
||||
</ul>
|
||||
<li><p>Elizabeth from CIAT emailed to ask if I could help her by adding ORCID identifiers to all of Joseph Tohme’s items</p></li>
|
||||
|
||||
<li><p>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</p>
|
||||
|
||||
<pre><code>$ ./add-orcid-identifiers-csv.py -i /tmp/jtohme-2018-04-04.csv -db dspace -u dspace -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The CSV format of <code>jtohme-2018-04-04.csv</code> was:</li>
|
||||
<li><p>The CSV format of <code>jtohme-2018-04-04.csv</code> was:</p></li>
|
||||
</ul>
|
||||
|
||||
<pre><code class="language-csv">dc.contributor.author,cg.creator.id
|
||||
@ -168,16 +164,15 @@ sys 2m2.498s
|
||||
<ul>
|
||||
<li>There was a quoting error in my CRP CSV and the replacements for <code>Forests, Trees and Agroforestry</code> got messed up</li>
|
||||
<li>So I fixed them and had to re-index again!</li>
|
||||
<li>I started preparing the git branch for the DSpace 5.5→5.8 upgrade:</li>
</ul>
<li><p>I started preparing the git branch for the DSpace 5.5→5.8 upgrade:</p>
|
||||
|
||||
<pre><code>$ git checkout -b 5_x-dspace-5.8 5_x-prod
|
||||
$ git reset --hard ilri/5_x-prod
|
||||
$ git rebase -i dspace-5.8
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I was prepared to skip some commits that I had cherry picked from the upstream <code>dspace-5_x</code> branch when we did the DSpace 5.5 upgrade (see notes on 2016-10-19 and 2017-12-17):
|
||||
<li><p>I was prepared to skip some commits that I had cherry picked from the upstream <code>dspace-5_x</code> branch when we did the DSpace 5.5 upgrade (see notes on 2016-10-19 and 2017-12-17):</p>
|
||||
|
||||
<ul>
|
||||
<li>[DS-3246] Improve cleanup in recyclable components (upstream commit on dspace-5_x: 9f0f5940e7921765c6a22e85337331656b18a403)</li>
|
||||
@ -185,93 +180,95 @@ $ git rebase -i dspace-5.8
|
||||
<li>bump up to latest minor pdfbox version (upstream commit on dspace-5_x: b5330b78153b2052ed3dc2fd65917ccdbfcc0439)</li>
|
||||
<li>DS-3583 Usage of correct Collection Array (#1731) (upstream commit on dspace-5_x: c8f62e6f496fa86846bfa6bcf2d16811087d9761)</li>
|
||||
</ul></li>
|
||||
<li>… but somehow git knew, and didn’t include them in my interactive rebase!</li>
|
||||
<li>I need to send this branch to Atmire and also arrange payment (see <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">ticket #560</a> in their tracker)</li>
|
||||
<li>Fix Sisay’s SSH access to the new DSpace Test server (linode19)</li>
|
||||
|
||||
<li><p>… but somehow git knew, and didn’t include them in my interactive rebase!</p></li>
|
||||
|
||||
<li><p>I need to send this branch to Atmire and also arrange payment (see <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">ticket #560</a> in their tracker)</p></li>
|
||||
|
||||
<li><p>Fix Sisay’s SSH access to the new DSpace Test server (linode19)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-04-05">2018-04-05</h2>
|
||||
|
||||
<ul>
|
||||
<li>Fix Sisay’s sudo access on the new DSpace Test server (linode19)</li>
|
||||
<li>The reindexing process on DSpace Test took <em>forever</em> yesterday:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The reindexing process on DSpace Test took <em>forever</em> yesterday:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
|
||||
real 599m32.961s
|
||||
user 9m3.947s
|
||||
sys 2m52.585s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So we really should not use this Linode block storage for Solr</li>
|
||||
<li>Assetstore might be fine but would complicate things with configuration and deployment (ughhh)</li>
|
||||
<li>Better to use Linode block storage only for backup</li>
|
||||
<li>Help Peter with the GDPR compliance / reporting form for CGSpace</li>
|
||||
<li>DSpace Test crashed due to memory issues again:</li>
|
||||
</ul>
|
||||
<li><p>So we really should not use this Linode block storage for Solr</p></li>
|
||||
|
||||
<li><p>Assetstore might be fine but would complicate things with configuration and deployment (ughhh)</p></li>
|
||||
|
||||
<li><p>Better to use Linode block storage only for backup</p></li>
|
||||
|
||||
<li><p>Help Peter with the GDPR compliance / reporting form for CGSpace</p></li>
|
||||
|
||||
<li><p>DSpace Test crashed due to memory issues again:</p>
|
||||
|
||||
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
|
||||
16
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I ran all system updates on DSpace Test and rebooted it</li>
|
||||
<li>Proof some records on DSpace Test for Udana from IWMI</li>
|
||||
<li>He has done better with the small syntax and consistency issues but then there are larger concerns with not linking to DOIs, copying titles incorrectly, etc</li>
|
||||
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
|
||||
|
||||
<li><p>Proof some records on DSpace Test for Udana from IWMI</p></li>
|
||||
|
||||
<li><p>He has done better with the small syntax and consistency issues but then there are larger concerns with not linking to DOIs, copying titles incorrectly, etc</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-04-10">2018-04-10</h2>
|
||||
|
||||
<ul>
|
||||
<li>I got a notice that CGSpace CPU usage was very high this morning</li>
|
||||
<li>Looking at the nginx logs, here are the top users today so far:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Looking at the nginx logs, here are the top users today so far:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Apr/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
282 207.46.13.112
|
||||
286 54.175.208.220
|
||||
287 207.46.13.113
|
||||
298 66.249.66.153
|
||||
322 207.46.13.114
|
||||
780 104.196.152.243
|
||||
3994 178.154.200.38
|
||||
4295 70.32.83.92
|
||||
4388 95.108.181.88
|
||||
7653 45.5.186.2
|
||||
</code></pre>
|
||||
282 207.46.13.112
|
||||
286 54.175.208.220
|
||||
287 207.46.13.113
|
||||
298 66.249.66.153
|
||||
322 207.46.13.114
|
||||
780 104.196.152.243
|
||||
3994 178.154.200.38
|
||||
4295 70.32.83.92
|
||||
4388 95.108.181.88
|
||||
7653 45.5.186.2
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>45.5.186.2 is of course CIAT</li>
|
||||
<li>95.108.181.88 appears to be Yandex:</li>
|
||||
</ul>
|
||||
<li><p>45.5.186.2 is of course CIAT</p></li>
|
||||
|
||||
<li><p>95.108.181.88 appears to be Yandex:</p>
|
||||
|
||||
<pre><code>95.108.181.88 - - [09/Apr/2018:06:34:16 +0000] "GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1" 200 2638 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
|
||||
</code></pre>
|
||||
</code></pre></li>
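<li><p>Reverse DNS is a quick way to confirm the IP really belongs to Yandex rather than something spoofing its user agent (a sketch; Yandex crawlers normally have PTR records under yandex.com or yandex.ru):</p>

<pre><code>$ host 95.108.181.88
</code></pre></li>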
<ul>
|
||||
<li>And for some reason Yandex created a lot of Tomcat sessions today:</li>
|
||||
</ul>
|
||||
<li><p>And for some reason Yandex created a lot of Tomcat sessions today:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-04-10
|
||||
4363
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>70.32.83.92 appears to be some harvester we’ve seen before, but on a new IP</li>
|
||||
<li>They are not creating new Tomcat sessions so there is no problem there</li>
|
||||
<li>178.154.200.38 also appears to be Yandex, and is also creating many Tomcat sessions:</li>
|
||||
</ul>
|
||||
<li><p>70.32.83.92 appears to be some harvester we’ve seen before, but on a new IP</p></li>
|
||||
|
||||
<li><p>They are not creating new Tomcat sessions so there is no problem there</p></li>
|
||||
|
||||
<li><p>178.154.200.38 also appears to be Yandex, and is also creating many Tomcat sessions:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=178.154.200.38' dspace.log.2018-04-10
|
||||
3982
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I’m not sure why Yandex creates so many Tomcat sessions, as its user agent should match the Crawler Session Manager valve</li>
|
||||
<li>Let’s try a manual request with and without their user agent:</li>
|
||||
</ul>
|
||||
<li><p>I’m not sure why Yandex creates so many Tomcat sessions, as its user agent should match the Crawler Session Manager valve</p></li>
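<li><p>To double-check that the valve’s <code>crawlerUserAgents</code> regex actually covers YandexBot, one could grep the valve out of Tomcat’s <em>server.xml</em> (a sketch; the path assumes the Ubuntu tomcat7 package used on these servers):</p>

<pre><code>$ grep -B1 -A2 CrawlerSessionManagerValve /etc/tomcat7/server.xml # path from the tomcat7 package
</code></pre></li>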
<li><p>Let’s try a manual request with and without their user agent:</p>
|
||||
|
||||
<pre><code>$ http --print Hh https://cgspace.cgiar.org/bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg 'User-Agent:Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)'
|
||||
GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1
|
||||
@ -321,19 +318,19 @@ X-Cocoon-Version: 2.2.0
|
||||
X-Content-Type-Options: nosniff
|
||||
X-Frame-Options: SAMEORIGIN
|
||||
X-XSS-Protection: 1; mode=block
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So it definitely looks like Yandex requests are getting assigned a session from the Crawler Session Manager valve</li>
|
||||
<li>And if I look at the DSpace log I see its IP sharing a session with other crawlers like Google (66.249.66.153)</li>
|
||||
<li>Indeed the number of Tomcat sessions appears to be normal:</li>
|
||||
<li><p>So it definitely looks like Yandex requests are getting assigned a session from the Crawler Session Manager valve</p></li>
|
||||
|
||||
<li><p>And if I look at the DSpace log I see its IP sharing a session with other crawlers like Google (66.249.66.153)</p></li>
|
||||
|
||||
<li><p>Indeed the number of Tomcat sessions appears to be normal:</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2018/04/jmx_dspace_sessions-week.png" alt="Tomcat sessions week" /></p>
|
||||
|
||||
<ul>
|
||||
<li>In other news, it looks like the number of total requests processed by nginx in March went down from the previous months:</li>
|
||||
</ul>
|
||||
<li><p>In other news, it looks like the number of total requests processed by nginx in March went down from the previous months:</p>
|
||||
|
||||
<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Mar/2018"
|
||||
2266594
|
||||
@ -341,85 +338,84 @@ X-XSS-Protection: 1; mode=block
|
||||
real 0m13.658s
|
||||
user 0m16.533s
|
||||
sys 0m1.087s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In other other news, the database cleanup script has an issue again:</li>
|
||||
</ul>
|
||||
<li><p>In other other news, the database cleanup script has an issue again:</p>
|
||||
|
||||
<pre><code>$ dspace cleanup -v
|
||||
...
|
||||
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
|
||||
Detail: Key (bitstream_id)=(151626) is still referenced from table "bundle".
|
||||
</code></pre>
|
||||
Detail: Key (bitstream_id)=(151626) is still referenced from table "bundle".
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The solution is, as always:</li>
|
||||
</ul>
|
||||
<li><p>The solution is, as always:</p>
|
||||
|
||||
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (151626);'
|
||||
UPDATE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking at abandoned connections in Tomcat:</li>
|
||||
</ul>
|
||||
<li><p>Looking at abandoned connections in Tomcat:</p>
|
||||
|
||||
<pre><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon'
|
||||
2115
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Apparently from these stacktraces we should be able to see which code is not closing connections properly</li>
|
||||
<li>Here’s a pretty good overview of days where we had database issues recently:</li>
|
||||
</ul>
|
||||
<li><p>Apparently from these stacktraces we should be able to see which code is not closing connections properly</p></li>
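<li><p>To actually see the offending call sites, one could pull a few lines of context around each abandon warning (a sketch; the amount of context to keep is arbitrary):</p>

<pre><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep -A 20 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' | less
</code></pre></li>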
<li><p>Here’s a pretty good overview of days where we had database issues recently:</p>
|
||||
|
||||
<pre><code># zcat /var/log/tomcat7/catalina.out.[1-9].gz | grep 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon' | awk '{print $1,$2, $3}' | sort | uniq -c | sort -n
|
||||
1 Feb 18, 2018
|
||||
1 Feb 19, 2018
|
||||
1 Feb 20, 2018
|
||||
1 Feb 24, 2018
|
||||
2 Feb 13, 2018
|
||||
3 Feb 17, 2018
|
||||
5 Feb 16, 2018
|
||||
5 Feb 23, 2018
|
||||
5 Feb 27, 2018
|
||||
6 Feb 25, 2018
|
||||
40 Feb 14, 2018
|
||||
63 Feb 28, 2018
|
||||
154 Mar 19, 2018
|
||||
202 Feb 21, 2018
|
||||
264 Feb 26, 2018
|
||||
268 Mar 21, 2018
|
||||
524 Feb 22, 2018
|
||||
570 Feb 15, 2018
|
||||
</code></pre>
|
||||
1 Feb 18, 2018
|
||||
1 Feb 19, 2018
|
||||
1 Feb 20, 2018
|
||||
1 Feb 24, 2018
|
||||
2 Feb 13, 2018
|
||||
3 Feb 17, 2018
|
||||
5 Feb 16, 2018
|
||||
5 Feb 23, 2018
|
||||
5 Feb 27, 2018
|
||||
6 Feb 25, 2018
|
||||
40 Feb 14, 2018
|
||||
63 Feb 28, 2018
|
||||
154 Mar 19, 2018
|
||||
202 Feb 21, 2018
|
||||
264 Feb 26, 2018
|
||||
268 Mar 21, 2018
|
||||
524 Feb 22, 2018
|
||||
570 Feb 15, 2018
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In Tomcat 8.5 the <code>removeAbandoned</code> property has been split into two: <code>removeAbandonedOnBorrow</code> and <code>removeAbandonedOnMaintenance</code></li>
|
||||
<li>See: <a href="https://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html#Database_Connection_Pool_(DBCP_2)_Configurations">https://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html#Database_Connection_Pool_(DBCP_2)_Configurations</a></li>
|
||||
<li>I assume we want <code>removeAbandonedOnBorrow</code> and make updates to the Tomcat 8 templates in Ansible</li>
|
||||
<li>After reading more documentation I see that Tomcat 8.5’s default DBCP seems to now be Commons DBCP2 instead of Tomcat DBCP</li>
|
||||
<li>It can be overridden in Tomcat’s <em>server.xml</em> by setting <code>factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"</code> in the <code><Resource></code></li>
|
||||
<li>I think we should use this default, so we’ll need to remove some other settings that are specific to Tomcat’s DBCP like <code>jdbcInterceptors</code> and <code>abandonWhenPercentageFull</code></li>
|
||||
<li>Merge the changes adding ORCID identifier to advanced search and Atmire Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/371">#371</a>)</li>
|
||||
<li>Fix one more issue of missing XMLUI strings (for CRP subject when clicking “view more” in the Discovery sidebar)</li>
|
||||
<li>I told Udana to fix the citation and abstract of the one item, and to correct the <code>dc.language.iso</code> for the five Spanish items in his Book Chapters collection</li>
|
||||
<li>Then we can import the records to CGSpace</li>
|
||||
<li><p>In Tomcat 8.5 the <code>removeAbandoned</code> property has been split into two: <code>removeAbandonedOnBorrow</code> and <code>removeAbandonedOnMaintenance</code></p></li>
|
||||
|
||||
<li><p>See: <a href="https://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html#Database_Connection_Pool_(DBCP_2)_Configurations">https://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html#Database_Connection_Pool_(DBCP_2)_Configurations</a></p></li>
|
||||
|
||||
<li><p>I assume we want <code>removeAbandonedOnBorrow</code> and make updates to the Tomcat 8 templates in Ansible</p></li>
|
||||
|
||||
<li><p>After reading more documentation I see that Tomcat 8.5’s default DBCP seems to now be Commons DBCP2 instead of Tomcat DBCP</p></li>
|
||||
|
||||
<li><p>It can be overridden in Tomcat’s <em>server.xml</em> by setting <code>factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"</code> in the <code><Resource></code></p></li>
|
||||
|
||||
<li><p>I think we should use this default, so we’ll need to remove some other settings that are specific to Tomcat’s DBCP like <code>jdbcInterceptors</code> and <code>abandonWhenPercentageFull</code></p></li>
|
||||
|
||||
<li><p>Merge the changes adding ORCID identifier to advanced search and Atmire Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/371">#371</a>)</p></li>
|
||||
|
||||
<li><p>Fix one more issue of missing XMLUI strings (for CRP subject when clicking “view more” in the Discovery sidebar)</p></li>
|
||||
|
||||
<li><p>I told Udana to fix the citation and abstract of the one item, and to correct the <code>dc.language.iso</code> for the five Spanish items in his Book Chapters collection</p></li>
|
||||
|
||||
<li><p>Then we can import the records to CGSpace</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-04-11">2018-04-11</h2>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test (linode19) crashed again some time since yesterday:</li>
|
||||
</ul>
|
||||
<li><p>DSpace Test (linode19) crashed again some time since yesterday:</p>
|
||||
|
||||
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
|
||||
168
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I ran all system updates and rebooted the server</li>
|
||||
<li><p>I ran all system updates and rebooted the server</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-04-12">2018-04-12</h2>
|
||||
@ -438,35 +434,34 @@ UPDATE 1
|
||||
<h2 id="2018-04-15">2018-04-15</h2>
|
||||
|
||||
<ul>
|
||||
<li>While testing an XMLUI patch for <a href="https://jira.duraspace.org/browse/DS-3883">DS-3883</a> I noticed that there is still some remaining Authority / Solr configuration left that we need to remove:</li>
|
||||
</ul>
|
||||
<li><p>While testing an XMLUI patch for <a href="https://jira.duraspace.org/browse/DS-3883">DS-3883</a> I noticed that there is still some remaining Authority / Solr configuration left that we need to remove:</p>
|
||||
|
||||
<pre><code>2018-04-14 18:55:25,841 ERROR org.dspace.authority.AuthoritySolrServiceImpl @ Authority solr is not correctly configured, check "solr.authority.server" property in the dspace.cfg
|
||||
java.lang.NullPointerException
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I assume we need to remove <code>authority</code> from the consumers in <code>dspace/config/dspace.cfg</code>:</li>
|
||||
</ul>
|
||||
<li><p>I assume we need to remove <code>authority</code> from the consumers in <code>dspace/config/dspace.cfg</code>:</p>
|
||||
|
||||
<pre><code>event.dispatcher.default.consumers = authority, versioning, discovery, eperson, harvester, statistics,batchedit, versioningmqm
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I see the same error on DSpace Test so this is definitely a problem</li>
|
||||
<li>After disabling the authority consumer I no longer see the error</li>
|
||||
<li>I merged a pull request to the <code>5_x-prod</code> branch to clean that up (<a href="https://github.com/ilri/DSpace/pull/372">#372</a>)</li>
|
||||
<li>File a ticket on DSpace’s Jira for the <code>target="_blank"</code> security and performance issue (<a href="https://jira.duraspace.org/browse/DS-3891">DS-3891</a>)</li>
|
||||
<li>I re-deployed DSpace Test (linode19) and was surprised by how long it took the ant update to complete:</li>
|
||||
</ul>
|
||||
<li><p>I see the same error on DSpace Test so this is definitely a problem</p></li>
|
||||
|
||||
<li><p>After disabling the authority consumer I no longer see the error</p></li>
|
||||
|
||||
<li><p>I merged a pull request to the <code>5_x-prod</code> branch to clean that up (<a href="https://github.com/ilri/DSpace/pull/372">#372</a>)</p></li>
|
||||
|
||||
<li><p>File a ticket on DSpace’s Jira for the <code>target="_blank"</code> security and performance issue (<a href="https://jira.duraspace.org/browse/DS-3891">DS-3891</a>)</p></li>
|
||||
|
||||
<li><p>I re-deployed DSpace Test (linode19) and was surprised by how long it took the ant update to complete:</p>
|
||||
|
||||
<pre><code>BUILD SUCCESSFUL
|
||||
Total time: 4 minutes 12 seconds
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The Linode block storage is much slower than the instance storage</li>
|
||||
<li>I ran all system updates and rebooted DSpace Test (linode19)</li>
|
||||
<li><p>The Linode block storage is much slower than the instance storage</p></li>
|
||||
|
||||
<li><p>I ran all system updates and rebooted DSpace Test (linode19)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-04-16">2018-04-16</h2>
|
||||
@ -481,69 +476,79 @@ Total time: 4 minutes 12 seconds
|
||||
<li>IWMI people are asking about building a search query that outputs RSS for their reports</li>
|
||||
<li>They want the same results as this Discovery query: <a href="https://cgspace.cgiar.org/discover?filtertype_1=dateAccessioned&filter_relational_operator_1=contains&filter_1=2018&submit_apply_filter=&query=&scope=10568%2F16814&rpp=100&sort_by=dc.date.issued_dt&order=desc">https://cgspace.cgiar.org/discover?filtertype_1=dateAccessioned&filter_relational_operator_1=contains&filter_1=2018&submit_apply_filter=&query=&scope=10568%2F16814&rpp=100&sort_by=dc.date.issued_dt&order=desc</a></li>
|
||||
<li>They will need to use OpenSearch, but I can’t remember all the parameters</li>
|
||||
<li>Apparently search sort options for OpenSearch are in <code>dspace.cfg</code>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Apparently search sort options for OpenSearch are in <code>dspace.cfg</code>:</p>
|
||||
|
||||
<pre><code>webui.itemlist.sort-option.1 = title:dc.title:title
|
||||
webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
|
||||
webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
|
||||
webui.itemlist.sort-option.4 = type:dc.type:text
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>They want items by issue date, so we need to use sort option 2</li>
|
||||
<li>According to the DSpace Manual there are only the following parameters to OpenSearch: format, scope, rpp, start, and sort_by</li>
|
||||
<li>The OpenSearch <code>query</code> parameter expects a Discovery search filter that is defined in <code>dspace/config/spring/api/discovery.xml</code></li>
|
||||
<li>So for IWMI they should be able to use something like this: <a href="https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&scope=10568/16814&sort_by=2&order=DESC&format=rss">https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&scope=10568/16814&sort_by=2&order=DESC&format=rss</a></li>
|
||||
<li>There are also <code>rpp</code> (results per page) and <code>start</code> parameters but in my testing now on DSpace 5.5 they behave very strangely</li>
|
||||
<li>For example, set <code>rpp=1</code> and then check the results for <code>start</code> values of 0, 1, and 2 and they are all the same!</li>
|
||||
<li>If I have time I will check if this behavior persists on DSpace 6.x on the official DSpace demo and file a bug</li>
|
||||
<li>Also, the DSpace Manual as of 5.x has very poor documentation for OpenSearch</li>
|
||||
<li>They don’t tell you to use Discovery search filters in the <code>query</code> (with format <code>query=dateIssued:2018</code>)</li>
|
||||
<li>They don’t tell you that the sort options are actually defined in <code>dspace.cfg</code> (ie, you need to use <code>2</code> instead of <code>dc.date.issued_dt</code>)</li>
|
||||
<li>They are missing the <code>order</code> parameter (ASC vs DESC)</li>
|
||||
<li>I notice that DSpace Test has crashed again, due to memory:</li>
|
||||
</ul>
|
||||
<li><p>They want items by issue date, so we need to use sort option 2</p></li>
|
||||
|
||||
<li><p>According to the DSpace Manual there are only the following parameters to OpenSearch: format, scope, rpp, start, and sort_by</p></li>
|
||||
|
||||
<li><p>The OpenSearch <code>query</code> parameter expects a Discovery search filter that is defined in <code>dspace/config/spring/api/discovery.xml</code></p></li>
|
||||
|
||||
<li><p>So for IWMI they should be able to use something like this: <a href="https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&scope=10568/16814&sort_by=2&order=DESC&format=rss">https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&scope=10568/16814&sort_by=2&order=DESC&format=rss</a></p></li>
|
||||
|
||||
<li><p>There are also <code>rpp</code> (results per page) and <code>start</code> parameters but in my testing now on DSpace 5.5 they behave very strangely</p></li>
|
||||
|
||||
<li><p>For example, set <code>rpp=1</code> and then check the results for <code>start</code> values of 0, 1, and 2 and they are all the same!</p></li>
|
||||
|
||||
<li><p>If I have time I will check if this behavior persists on DSpace 6.x on the official DSpace demo and file a bug</p></li>
|
||||
|
||||
<li><p>Also, the DSpace Manual as of 5.x has very poor documentation for OpenSearch</p></li>
|
||||
|
||||
<li><p>They don’t tell you to use Discovery search filters in the <code>query</code> (with format <code>query=dateIssued:2018</code>)</p></li>
|
||||
|
||||
<li><p>They don’t tell you that the sort options are actually defined in <code>dspace.cfg</code> (ie, you need to use <code>2</code> instead of <code>dc.date.issued_dt</code>)</p></li>
|
||||
|
||||
<li><p>They are missing the <code>order</code> parameter (ASC vs DESC)</p></li>
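<li><p>Putting those pieces together, a quick way to check the feed from the command line with httpie (a sketch; <code>rpp</code> is included even though its paging behaviour is questionable as noted above):</p>

<pre><code>$ http 'https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&amp;scope=10568/16814&amp;sort_by=2&amp;order=DESC&amp;rpp=100&amp;format=rss'
</code></pre></li>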
<li><p>I notice that DSpace Test has crashed again, due to memory:</p>
|
||||
|
||||
<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
|
||||
178
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I will increase the JVM heap size from 5120M to 6144M, though we don’t have much room left to grow as DSpace Test (linode19) is using a smaller instance size than CGSpace</li>
|
||||
<li>Gabriela from CIP asked if I could send her a list of all CIP authors so she can do some replacements on the name formats</li>
|
||||
<li>I got a list of all the CIP collections manually and use the same query that I used in <a href="/cgspace-notes/2017-08">August, 2017</a>:</li>
|
||||
</ul>
|
||||
<li><p>I will increase the JVM heap size from 5120M to 6144M, though we don’t have much room left to grow as DSpace Test (linode19) is using a smaller instance size than CGSpace</p></li>
|
||||
|
||||
<li><p>Gabriela from CIP asked if I could send her a list of all CIP authors so she can do some replacements on the name formats</p></li>
|
||||
|
||||
<li><p>I got a list of all the CIP collections manually and use the same query that I used in <a href="/cgspace-notes/2017-08">August, 2017</a>:</p>
|
||||
|
||||
<pre><code>dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/89347', '10568/88229', '10568/53086', '10568/53085', '10568/69069', '10568/53087', '10568/53088', '10568/53089', '10568/53090', '10568/53091', '10568/53092', '10568/70150', '10568/53093', '10568/64874', '10568/53094'))) group by text_value order by count desc) to /tmp/cip-authors.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-04-19">2018-04-19</h2>
|
||||
|
||||
<ul>
|
||||
<li>Run updates on DSpace Test (linode19) and reboot the server</li>
|
||||
<li>Also try deploying updated GeoLite database during ant update while re-deploying code:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Also try deploying updated GeoLite database during ant update while re-deploying code:</p>
|
||||
|
||||
<pre><code>$ ant update update_geolite clean_backups
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I also re-deployed CGSpace (linode18) to make the ORCID search, authority cleanup, CCAFS project tag <code>PII-LAM_CSAGender</code> live</li>
|
||||
<li>When re-deploying I also updated the GeoLite databases so I hope the country stats become more accurate…</li>
|
||||
<li>After re-deployment I ran all system updates on the server and rebooted it</li>
|
||||
<li>After the reboot I forced a reïndexing of the Discovery to populate the new ORCID index:</li>
|
||||
</ul>
|
||||
<li><p>I also re-deployed CGSpace (linode18) to make the ORCID search, authority cleanup, CCAFS project tag <code>PII-LAM_CSAGender</code> live</p></li>
|
||||
|
||||
<li><p>When re-deploying I also updated the GeoLite databases so I hope the country stats become more accurate…</p></li>
|
||||
|
||||
<li><p>After re-deployment I ran all system updates on the server and rebooted it</p></li>
|
||||
|
||||
<li><p>After the reboot I forced a reïndexing of the Discovery to populate the new ORCID index:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
|
||||
real 73m42.635s
|
||||
user 8m15.885s
|
||||
sys 2m2.687s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This time is with about 70,000 items in the repository</li>
|
||||
<li><p>This time is with about 70,000 items in the repository</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-04-20">2018-04-20</h2>
|
||||
@ -551,48 +556,40 @@ sys 2m2.687s
|
||||
<ul>
|
||||
<li>Gabriela from CIP emailed to say that CGSpace was returning a white page, but I haven’t seen any emails from UptimeRobot</li>
|
||||
<li>I confirm that it’s just giving a white page around 4:16</li>
|
||||
<li>The DSpace logs show that there are no database connections:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The DSpace logs show that there are no database connections:</p>
|
||||
|
||||
<pre><code>org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-715] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle:0; lastwait:5000].
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And there have been shit tons of errors in the last (starting only 20 minutes ago luckily):</li>
|
||||
</ul>
|
||||
<li><p>And there have been shit tons of errors in the last (starting only 20 minutes ago luckily):</p>
|
||||
|
||||
<pre><code># grep -c 'org.apache.tomcat.jdbc.pool.PoolExhaustedException' /home/cgspace.cgiar.org/log/dspace.log.2018-04-20
|
||||
32147
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I can’t even log into PostgreSQL as the <code>postgres</code> user, WTF?</li>
|
||||
</ul>
|
||||
<li><p>I can’t even log into PostgreSQL as the <code>postgres</code> user, WTF?</p>
|
||||
|
||||
<pre><code>$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
|
||||
^C
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Here are the most active IPs today:</li>
|
||||
</ul>
|
||||
<li><p>Here are the most active IPs today:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "20/Apr/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
917 207.46.13.182
|
||||
935 213.55.99.121
|
||||
970 40.77.167.134
|
||||
978 207.46.13.80
|
||||
1422 66.249.64.155
|
||||
1577 50.116.102.77
|
||||
2456 95.108.181.88
|
||||
3216 104.196.152.243
|
||||
4325 70.32.83.92
|
||||
10718 45.5.184.2
|
||||
</code></pre>
|
||||
917 207.46.13.182
|
||||
935 213.55.99.121
|
||||
970 40.77.167.134
|
||||
978 207.46.13.80
|
||||
1422 66.249.64.155
|
||||
1577 50.116.102.77
|
||||
2456 95.108.181.88
|
||||
3216 104.196.152.243
|
||||
4325 70.32.83.92
|
||||
10718 45.5.184.2
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It doesn’t even seem like there is a lot of traffic compared to the previous days:</li>
|
||||
</ul>
|
||||
<li><p>It doesn’t even seem like there is a lot of traffic compared to the previous days:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "20/Apr/2018" | wc -l
|
||||
74931
|
||||
@ -600,43 +597,46 @@ sys 2m2.687s
|
||||
91073
|
||||
# zcat --force /var/log/nginx/*.log.2.gz /var/log/nginx/*.log.3.gz| grep -E "18/Apr/2018" | wc -l
|
||||
93459
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I tried to restart Tomcat but <code>systemctl</code> hangs</li>
|
||||
<li>I tried to reboot the server from the command line but after a few minutes it didn’t come back up</li>
|
||||
<li>Looking at the Linode console I see that it is stuck trying to shut down</li>
|
||||
<li>Even “Reboot” via Linode console doesn’t work!</li>
|
||||
<li>After shutting it down a few times via the Linode console it finally rebooted</li>
|
||||
<li>Everything is back but I have no idea what caused this—I suspect something with the hosting provider</li>
|
||||
<li>Also super weird, the last entry in the DSpace log file is from <code>2018-04-20 16:35:09</code>, and then immediately it goes to <code>2018-04-20 19:15:04</code> (three hours later!):</li>
|
||||
</ul>
|
||||
<li><p>I tried to restart Tomcat but <code>systemctl</code> hangs</p></li>
|
||||
|
||||
<li><p>I tried to reboot the server from the command line but after a few minutes it didn’t come back up</p></li>
|
||||
|
||||
<li><p>Looking at the Linode console I see that it is stuck trying to shut down</p></li>
|
||||
|
||||
<li><p>Even “Reboot” via Linode console doesn’t work!</p></li>
|
||||
|
||||
<li><p>After shutting it down a few times via the Linode console it finally rebooted</p></li>
|
||||
|
||||
<li><p>Everything is back but I have no idea what caused this—I suspect something with the hosting provider</p></li>
|
||||
|
||||
<li><p>Also super weird, the last entry in the DSpace log file is from <code>2018-04-20 16:35:09</code>, and then immediately it goes to <code>2018-04-20 19:15:04</code> (three hours later!):</p>
|
||||
|
||||
<pre><code>2018-04-20 16:35:09,144 ERROR org.dspace.app.util.AbstractDSpaceWebapp @ Failed to record shutdown in Webapp table.
|
||||
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle
|
||||
:0; lastwait:5000].
|
||||
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:685)
|
||||
at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:187)
|
||||
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:128)
|
||||
at org.dspace.storage.rdbms.DatabaseManager.getConnection(DatabaseManager.java:632)
|
||||
at org.dspace.core.Context.init(Context.java:121)
|
||||
at org.dspace.core.Context.<init>(Context.java:95)
|
||||
at org.dspace.app.util.AbstractDSpaceWebapp.deregister(AbstractDSpaceWebapp.java:97)
|
||||
at org.dspace.app.util.DSpaceContextListener.contextDestroyed(DSpaceContextListener.java:146)
|
||||
at org.apache.catalina.core.StandardContext.listenerStop(StandardContext.java:5115)
|
||||
at org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5779)
|
||||
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:224)
|
||||
at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1588)
|
||||
at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1577)
|
||||
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
|
||||
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
|
||||
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
|
||||
at java.lang.Thread.run(Thread.java:748)
|
||||
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:685)
|
||||
at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:187)
|
||||
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:128)
|
||||
at org.dspace.storage.rdbms.DatabaseManager.getConnection(DatabaseManager.java:632)
|
||||
at org.dspace.core.Context.init(Context.java:121)
|
||||
at org.dspace.core.Context.<init>(Context.java:95)
|
||||
at org.dspace.app.util.AbstractDSpaceWebapp.deregister(AbstractDSpaceWebapp.java:97)
|
||||
at org.dspace.app.util.DSpaceContextListener.contextDestroyed(DSpaceContextListener.java:146)
|
||||
at org.apache.catalina.core.StandardContext.listenerStop(StandardContext.java:5115)
|
||||
at org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5779)
|
||||
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:224)
|
||||
at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1588)
|
||||
at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1577)
|
||||
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
|
||||
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
|
||||
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
|
||||
at java.lang.Thread.run(Thread.java:748)
|
||||
2018-04-20 19:15:04,006 INFO org.dspace.core.ConfigurationManager @ Loading from classloader: file:/home/cgspace.cgiar.org/config/dspace.cfg
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Very suspect!</li>
|
||||
<li><p>Very suspect!</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-04-24">2018-04-24</h2>
|
||||
@ -660,34 +660,32 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Time
|
||||
<ul>
|
||||
<li>Still testing the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> for Ubuntu 18.04, Tomcat 8.5, and PostgreSQL 9.6</li>
|
||||
<li>One other new thing I notice is that PostgreSQL 9.6 no longer uses <code>createuser</code> and <code>nocreateuser</code>, as those have actually meant <code>superuser</code> and <code>nosuperuser</code> and have been deprecated for <em>ten years</em></li>
|
||||
<li>So for my notes, when I’m importing a CGSpace database dump I need to amend my notes to give super user permission to a user, rather than create user:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>So for my notes, when I’m importing a CGSpace database dump I need to amend my notes to give super user permission to a user, rather than create user:</p>
|
||||
|
||||
<pre><code>$ psql dspacetest -c 'alter user dspacetest superuser;'
|
||||
$ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-18.backup
|
||||
</code></pre>
|
||||
</code></pre></li>
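<li><p>Presumably the elevated privilege should be dropped again once the restore finishes; a minimal sketch:</p>

<pre><code>$ psql dspacetest -c 'alter user dspacetest nosuperuser;'
</code></pre></li>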
<ul>
|
||||
<li>There’s another issue with Tomcat in Ubuntu 18.04:</li>
|
||||
</ul>
|
||||
<li><p>There’s another issue with Tomcat in Ubuntu 18.04:</p>
|
||||
|
||||
<pre><code>25-Apr-2018 13:26:21.493 SEVERE [http-nio-127.0.0.1-8443-exec-1] org.apache.coyote.AbstractProtocol$ConnectionHandler.process Error reading request, ignored
|
||||
java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
|
||||
at org.apache.coyote.http11.Http11InputBuffer.init(Http11InputBuffer.java:688)
|
||||
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:672)
|
||||
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
|
||||
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
|
||||
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459)
|
||||
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
|
||||
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
|
||||
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
|
||||
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
|
||||
at java.lang.Thread.run(Thread.java:748)
|
||||
</code></pre>
|
||||
java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
|
||||
at org.apache.coyote.http11.Http11InputBuffer.init(Http11InputBuffer.java:688)
|
||||
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:672)
|
||||
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
|
||||
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
|
||||
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459)
|
||||
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
|
||||
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
|
||||
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
|
||||
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
|
||||
at java.lang.Thread.run(Thread.java:748)
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There’s a <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=895866">Debian bug about this from a few weeks ago</a></li>
|
||||
<li>Apparently Tomcat was compiled with Java 9, so doesn’t work with Java 8</li>
|
||||
<li><p>There’s a <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=895866">Debian bug about this from a few weeks ago</a></p></li>
|
||||
|
||||
<li><p>Apparently Tomcat was compiled with Java 9, so doesn’t work with Java 8</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-04-29">2018-04-29</h2>
|
||||
|
@ -37,7 +37,7 @@ http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E
|
||||
Then I reduced the JVM heap size from 6144 back to 5120m
|
||||
Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -164,72 +164,75 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
|
||||
<ul>
|
||||
<li>It turns out that the IITA records that I was helping Sisay with in March were imported in 2018-04 without a final check by Abenet or me</li>
|
||||
<li>There are lots of errors on language, CRP, and even some encoding errors on abstract fields</li>
|
||||
<li>I export them and include the hidden metadata fields like <code>dc.date.accessioned</code> so I can filter the ones from 2018-04 and correct them in Open Refine:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I export them and include the hidden metadata fields like <code>dc.date.accessioned</code> so I can filter the ones from 2018-04 and correct them in Open Refine:</p>
|
||||
|
||||
<pre><code>$ dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616
|
||||
</code></pre>
|
||||
</code></pre></li>
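<li><p>To pull out just the items accessioned in 2018-04 before loading the CSV into OpenRefine, something like csvkit could work (a sketch; the exact <code>dc.date.accessioned</code> column name in the export is an assumption):</p>

<pre><code>$ csvgrep -c dc.date.accessioned -r '^2018-04' /tmp/iita.csv &gt; /tmp/iita-2018-04.csv # column name assumed from the export
</code></pre></li>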
<ul>
|
||||
<li>Abenet sent a list of 46 ORCID identifiers for ILRI authors so I need to get their names using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script and merge them into our controlled vocabulary</li>
|
||||
<li>On the messed up IITA records from 2018-04 I see sixty DOIs in incorrect format (cg.identifier.doi)</li>
|
||||
<li><p>Abenet sent a list of 46 ORCID identifiers for ILRI authors so I need to get their names using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script and merge them into our controlled vocabulary</p></li>
|
||||
|
||||
<li><p>On the messed up IITA records from 2018-04 I see sixty DOIs in incorrect format (cg.identifier.doi)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-05-06">2018-05-06</h2>
|
||||
|
||||
<ul>
|
||||
<li>Fixing the IITA records from Sisay, sixty DOIs have completely invalid format like <code>http:dx.doi.org10.1016j.cropro.2008.07.003</code></li>
|
||||
<li>I corrected all the DOIs and then checked them for validity with a quick bash loop:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I corrected all the DOIs and then checked them for validity with a quick bash loop:</p>
|
||||
|
||||
<pre><code>$ for line in $(< /tmp/links.txt); do echo $line; http --print h $line; done
|
||||
</code></pre>
|
||||
</code></pre></li>
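<li><p>An equivalent check with curl that just prints the status code and the final URL after following the DOI redirects (a sketch using the same <code>/tmp/links.txt</code> as above):</p>

<pre><code>$ for line in $(&lt; /tmp/links.txt); do curl -sL -o /dev/null -w '%{http_code} %{url_effective}\n' "$line"; done
</code></pre></li>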
|
||||
|
||||
<ul>
|
||||
<li>Most of the links are good, though one is duplicate and one seems to even be incorrect in the publisher’s site so…</li>
|
||||
<li>Also, there are some duplicates:
|
||||
<li><p>Most of the links are good, though one is duplicate and one seems to even be incorrect in the publisher’s site so…</p></li>
|
||||
|
||||
<li><p>Also, there are some duplicates:</p>
|
||||
|
||||
<ul>
|
||||
<li><code>10568/92241</code> and <code>10568/92230</code> (same DOI)</li>
|
||||
<li><code>10568/92151</code> and <code>10568/92150</code> (same ISBN)</li>
|
||||
<li><code>10568/92291</code> and <code>10568/92286</code> (same citation, title, authors, year)</li>
|
||||
</ul></li>
|
||||
<li>Messed up abstracts:
|
||||
|
||||
<li><p>Messed up abstracts:</p>
|
||||
|
||||
<ul>
|
||||
<li><code>10568/92309</code></li>
|
||||
</ul></li>
|
||||
<li>Fixed some issues in regions, countries, sponsors, ISSN, and cleaned whitespace errors from citation, abstract, author, and titles</li>
|
||||
<li>Fixed all issues with CRPs</li>
|
||||
<li>A few more interesting Unicode characters to look for in text fields like author, abstracts, and citations might be: <code>’</code> (0x2019), <code>·</code> (0x00b7), and <code>€</code> (0x20ac)</li>
|
||||
<li>A custom text facet in OpenRefine with this GREL expression could be good for finding invalid characters or encoding errors in authors, abstracts, etc:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Fixed some issues in regions, countries, sponsors, ISSN, and cleaned whitespace errors from citation, abstract, author, and titles</p></li>
|
||||
|
||||
<li><p>Fixed all issues with CRPs</p></li>
|
||||
|
||||
<li><p>A few more interesting Unicode characters to look for in text fields like author, abstracts, and citations might be: <code>’</code> (0x2019), <code>·</code> (0x00b7), and <code>€</code> (0x20ac)</p></li>
|
||||
|
||||
<li><p>A custom text facet in OpenRefine with this GREL expression could be good for finding invalid characters or encoding errors in authors, abstracts, etc:</p>
|
||||
|
||||
<pre><code>or(
|
||||
isNotNull(value.match(/.*[(|)].*/)),
|
||||
isNotNull(value.match(/.*\uFFFD.*/)),
|
||||
isNotNull(value.match(/.*\u00A0.*/)),
|
||||
isNotNull(value.match(/.*\u200A.*/)),
|
||||
isNotNull(value.match(/.*\u2019.*/)),
|
||||
isNotNull(value.match(/.*\u00b7.*/)),
|
||||
isNotNull(value.match(/.*\u20ac.*/))
|
||||
isNotNull(value.match(/.*[(|)].*/)),
|
||||
isNotNull(value.match(/.*\uFFFD.*/)),
|
||||
isNotNull(value.match(/.*\u00A0.*/)),
|
||||
isNotNull(value.match(/.*\u200A.*/)),
|
||||
isNotNull(value.match(/.*\u2019.*/)),
|
||||
isNotNull(value.match(/.*\u00b7.*/)),
|
||||
isNotNull(value.match(/.*\u20ac.*/))
|
||||
)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I found some more IITA records that Sisay imported on 2018-03-23 that have invalid CRP names, so now I kinda want to check those ones!</li>
|
||||
<li>Combine the ORCID identifiers Abenet sent with our existing list and resolve their names using the <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
|
||||
</ul>
|
||||
<li><p>I found some more IITA records that Sisay imported on 2018-03-23 that have invalid CRP names, so now I kinda want to check those ones!</p></li>
|
||||
|
||||
<li><p>Combine the ORCID identifiers Abenet sent with our existing list and resolve their names using the <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</p>
|
||||
|
||||
<pre><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/ilri-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2018-05-06-combined.txt
|
||||
$ ./resolve-orcids.py -i /tmp/2018-05-06-combined.txt -o /tmp/2018-05-06-combined-names.txt -d
|
||||
# sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents)
|
||||
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I made a pull request (<a href="https://github.com/ilri/DSpace/pull/373">#373</a>) for this that I’ll merge some time next week (I’m expecting Atmire to get back to us about DSpace 5.8 soon)</li>
|
||||
<li>After testing quickly I just decided to merge it, and I noticed that I don’t even need to restart Tomcat for the changes to get loaded</li>
|
||||
<li><p>I made a pull request (<a href="https://github.com/ilri/DSpace/pull/373">#373</a>) for this that I’ll merge some time next week (I’m expecting Atmire to get back to us about DSpace 5.8 soon)</p></li>
|
||||
|
||||
<li><p>After testing quickly I just decided to merge it, and I noticed that I don’t even need to restart Tomcat for the changes to get loaded</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-05-07">2018-05-07</h2>
|
||||
@ -249,16 +252,14 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
|
||||
<li>I told him that there were still some TODO items for him on that data, for example to update the <code>dc.language.iso</code> field for the Spanish items</li>
|
||||
<li>I was trying to remember how I parsed the <code>input-forms.xml</code> using <code>xmllint</code> to extract subjects neatly</li>
|
||||
<li>I could use it with <a href="https://github.com/okfn/reconcile-csv">reconcile-csv</a> or to populate a Solr instance for reconciliation</li>
|
||||
<li>This XPath expression gets close, but outputs all items on one line:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>This XPath expression gets close, but outputs all items on one line:</p>
|
||||
|
||||
<pre><code>$ xmllint --xpath '//value-pairs[@value-pairs-name="crpsubject"]/pair/stored-value/node()' dspace/config/input-forms.xml
|
||||
Agriculture for Nutrition and HealthBig DataClimate Change, Agriculture and Food SecurityExcellence in BreedingFishForests, Trees and AgroforestryGenebanksGrain Legumes and Dryland CerealsLivestockMaizePolicies, Institutions and MarketsRiceRoots, Tubers and BananasWater, Land and EcosystemsWheatAquatic Agricultural SystemsDryland CerealsDryland SystemsGrain LegumesIntegrated Systems for the Humid TropicsLivestock and Fish
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Maybe <code>xmlstarlet</code> is better:</li>
|
||||
</ul>
|
||||
<li><p>Maybe <code>xmlstarlet</code> is better:</p>
|
||||
|
||||
<pre><code>$ xmlstarlet sel -t -v '//value-pairs[@value-pairs-name="crpsubject"]/pair/stored-value/text()' dspace/config/input-forms.xml
|
||||
Agriculture for Nutrition and Health
|
||||
@ -282,20 +283,20 @@ Dryland Systems
|
||||
Grain Legumes
|
||||
Integrated Systems for the Humid Tropics
|
||||
Livestock and Fish
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Discuss Colombian BNARS harvesting the CIAT data from CGSpace</li>
|
||||
<li>They are using a system called Primo and the only options for data harvesting in that system are via FTP and OAI</li>
|
||||
<li>I told them to get all <a href="https://cgspace.cgiar.org/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=com_10568_35697">CIAT records via OAI</a></li>
|
||||
<li>Just a note to myself, I figured out how to get reconcile-csv to run from source rather than running the old pre-compiled JAR file:</li>
|
||||
</ul>
|
||||
<li><p>Discuss Colombian BNARS harvesting the CIAT data from CGSpace</p></li>
|
||||
|
||||
<li><p>They are using a system called Primo and the only options for data harvesting in that system are via FTP and OAI</p></li>
|
||||
|
||||
<li><p>I told them to get all <a href="https://cgspace.cgiar.org/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=com_10568_35697">CIAT records via OAI</a></p></li>
|
||||
|
||||
<li><p>Just a note to myself, I figured out how to get reconcile-csv to run from source rather than running the old pre-compiled JAR file:</p>
|
||||
|
||||
<pre><code>$ lein run /tmp/crps.csv name id
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I tried to reconcile against a CSV of our countries but reconcile-csv crashes</li>
|
||||
<li><p>I tried to reconcile against a CSV of our countries but reconcile-csv crashes</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-05-13">2018-05-13</h2>
|
||||
@ -329,83 +330,85 @@ Livestock and Fish
|
||||
<ul>
|
||||
<li>Turns out I was doing the OpenRefine reconciliation wrong: I needed to copy the matched values to a new column!</li>
|
||||
<li>Also, I learned how to do something cool with Jython expressions in OpenRefine</li>
|
||||
<li>This will fetch a URL and return its HTTP response code:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>This will fetch a URL and return its HTTP response code:</p>
|
||||
|
||||
<pre><code>import urllib2
|
||||
import re
|
||||
|
||||
pattern = re.compile('.*10.1016.*')
|
||||
if pattern.match(value):
|
||||
get = urllib2.urlopen(value)
|
||||
return get.getcode()
|
||||
get = urllib2.urlopen(value)
|
||||
return get.getcode()
|
||||
|
||||
return "blank"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I used a regex to limit it to just some of the DOIs in this case because there were thousands of URLs</li>
|
||||
<li>Here the response code would be 200, 404, etc, or “blank” if there is no URL for that item</li>
|
||||
<li>You could use this in a facet or in a new column</li>
|
||||
<li>More information and good examples here: <a href="https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine">https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine</a></li>
|
||||
<li>Finish looking at the 2,640 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904"><sup>10568</sup>⁄<sub>92904</sub></a>), cleaning up authors and adding collection mappings</li>
|
||||
<li>They can now be moved to CGSpace as far as I’m concerned, but I don’t know if Sisay will do it or me</li>
|
||||
<li>I was checking the CIFOR data for duplicates using Atmire’s Metadata Quality Module (and found some duplicates actually), but then DSpace died…</li>
|
||||
<li>I didn’t see anything in the Tomcat, DSpace, or Solr logs, but I saw this in <code>dmesg -T</code>:</li>
|
||||
</ul>
|
||||
<li><p>I used a regex to limit it to just some of the DOIs in this case because there were thousands of URLs</p></li>
|
||||
|
||||
<li><p>Here the response code would be 200, 404, etc, or “blank” if there is no URL for that item</p></li>
|
||||
|
||||
<li><p>You could use this in a facet or in a new column</p></li>
|
||||
|
||||
<li><p>More information and good examples here: <a href="https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine">https://programminghistorian.org/lessons/fetch-and-parse-data-with-openrefine</a></p></li>
|
||||
|
||||
<li><p>Finish looking at the 2,640 CIFOR records on DSpace Test (<a href="https://dspacetest.cgiar.org/handle/10568/92904"><sup>10568</sup>⁄<sub>92904</sub></a>), cleaning up authors and adding collection mappings</p></li>
|
||||
|
||||
<li><p>They can now be moved to CGSpace as far as I’m concerned, but I don’t know if Sisay will do it or me</p></li>
|
||||
|
||||
<li><p>I was checking the CIFOR data for duplicates using Atmire’s Metadata Quality Module (and found some duplicates actually), but then DSpace died…</p></li>
|
||||
|
||||
<li><p>I didn’t see anything in the Tomcat, DSpace, or Solr logs, but I saw this in <code>dmesg -T</code>:</p>
|
||||
|
||||
<pre><code>[Tue May 15 12:10:01 2018] Out of memory: Kill process 3763 (java) score 706 or sacrifice child
|
||||
[Tue May 15 12:10:01 2018] Killed process 3763 (java) total-vm:14667688kB, anon-rss:5705268kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Tue May 15 12:10:01 2018] oom_reaper: reaped process 3763 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So the Linux kernel killed Java…</li>
|
||||
<li>Maria from Bioversity mailed to say she got an error while submitting an item on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>So the Linux kernel killed Java…</p></li>
|
||||
|
||||
<li><p>Maria from Bioversity mailed to say she got an error while submitting an item on CGSpace:</p>
|
||||
|
||||
<pre><code>Unable to load Submission Information, since WorkspaceID (ID:S96060) is not a valid in-process submission
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking in the DSpace log I see something related:</li>
|
||||
</ul>
|
||||
<li><p>Looking in the DSpace log I see something related:</p>
|
||||
|
||||
<pre><code>2018-05-15 12:35:30,858 INFO org.dspace.submit.step.CompleteStep @ m.garruccio@cgiar.org:session_id=8AC4499945F38B45EF7A1226E3042DAE:submission_complete:Completed submission with id=96060
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So I’m not sure…</li>
|
||||
<li>I finally figured out how to get OpenRefine to reconcile values from Solr via <a href="https://github.com/codeforkjeff/conciliator">conciliator</a>:</li>
|
||||
<li>The trick was to use a more appropriate Solr fieldType <code>text_en</code> instead of <code>text_general</code> so that more terms match, for example uppercase and lower case:</li>
|
||||
</ul>
|
||||
<li><p>So I’m not sure…</p></li>
|
||||
|
||||
<li><p>I finally figured out how to get OpenRefine to reconcile values from Solr via <a href="https://github.com/codeforkjeff/conciliator">conciliator</a>:</p></li>
|
||||
|
||||
<li><p>The trick was to use a more appropriate Solr fieldType <code>text_en</code> instead of <code>text_general</code> so that more terms match, for example uppercase and lower case:</p>
|
||||
|
||||
<pre><code>$ ./bin/solr start
|
||||
$ ./bin/solr create_core -c countries
|
||||
$ curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": {"name":"country", "type":"text_en", "multiValued":false, "stored":true}}' http://localhost:8983/solr/countries/schema
|
||||
$ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It still doesn’t catch simple mistakes like “ALBANI” or “AL BANIA” for “ALBANIA”, and it doesn’t return scores, so I have to select matches manually:</li>
|
||||
<li><p>It still doesn’t catch simple mistakes like “ALBANI” or “AL BANIA” for “ALBANIA”, and it doesn’t return scores, so I have to select matches manually:</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2018/05/openrefine-solr-conciliator.png" alt="OpenRefine reconciling countries from local Solr" /></p>
|
||||
|
||||
<ul>
|
||||
<li>I should probably make a general copy field and set it to be the default search field, like DSpace’s search core does (see schema.xml):</li>
|
||||
</ul>
|
||||
<li><p>I should probably make a general copy field and set it to be the default search field, like DSpace’s search core does (see schema.xml):</p>
|
||||
|
||||
<pre><code><defaultSearchField>search_text</defaultSearchField>
|
||||
...
|
||||
<copyField source="*" dest="search_text"/>
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Actually, I wonder how much of their schema I could just copy…</li>
|
||||
<li>Apparently the default search field is the <code>df</code> parameter and you could technically just add it to the query string, so no need to bother with that in the schema now</li>
|
||||
<li>I copied over the DSpace <code>search_text</code> field type from the DSpace Solr config (had to remove some properties so Solr would start) but it doesn’t seem to be any better at matching than the <code>text_en</code> type</li>
|
||||
<li>I think I need to focus on trying to return scores with conciliator</li>
|
||||
<li><p>Actually, I wonder how much of their schema I could just copy…</p></li>
|
||||
|
||||
<li><p>Apparently the default search field is the <code>df</code> parameter and you could technically just add it to the query string, so no need to bother with that in the schema now (see the quick check after this list)</p></li>
|
||||
|
||||
<li><p>I copied over the DSpace <code>search_text</code> field type from the DSpace Solr config (had to remove some properties so Solr would start) but it doesn’t seem to be any better at matching than the <code>text_en</code> type</p></li>
|
||||
|
||||
<li><p>I think I need to focus on trying to return scores with conciliator</p></li>
|
||||
</ul>
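<p>As a quick check of that <code>df</code> behavior (a rough sketch against the local “countries” core created above; the query term and the <code>country</code> field are just the ones from my test setup):</p>

<pre><code>$ curl -s 'http://localhost:8983/solr/countries/select?q=albania&amp;df=country&amp;wt=json&amp;indent=true'
</code></pre>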
<h2 id="2018-05-16">2018-05-16</h2>
|
||||
@ -422,18 +425,19 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
|
||||
</ul></li>
|
||||
<li>Silvia asked if I could sort the records in her Listings and Report output and it turns out that the options are misconfigured in <code>dspace/config/modules/atmire-listings-and-reports.cfg</code></li>
|
||||
<li>I created and merged a pull request to fix the sorting issue in Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/374">#374</a>)</li>
|
||||
<li>Regarding the IP Address Anonymization for GDPR, I amended the Google Analytics snippet in <code>page-structure-alterations.xsl</code> to:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Regarding the IP Address Anonymization for GDPR, I amended the Google Analytics snippet in <code>page-structure-alterations.xsl</code> to:</p>
|
||||
|
||||
<pre><code>ga('send', 'pageview', {
|
||||
'anonymizeIp': true
|
||||
'anonymizeIp': true
|
||||
});
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I tested loading a certain page before and after adding this and afterwards I saw that the parameter <code>aip=1</code> was being sent with the analytics response to Google</li>
|
||||
<li>According to the <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference#anonymizeIp">analytics.js protocol parameter documentation</a> this means that IPs are being anonymized</li>
|
||||
<li>After finding and fixing some duplicates in IITA’s <code>IITA_April_27</code> test collection on DSpace Test (<sup>10568</sup>⁄<sub>92703</sub>) I told Sisay that he can move them to IITA’s Journal Articles collection on CGSpace</li>
|
||||
<li><p>I tested loading a certain page before and after adding this and afterwards I saw that the parameter <code>aip=1</code> was being sent with the analytics response to Google</p></li>
|
||||
|
||||
<li><p>According to the <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference#anonymizeIp">analytics.js protocol parameter documentation</a> this means that IPs are being anonymized</p></li>
|
||||
|
||||
<li><p>After finding and fixing some duplicates in IITA’s <code>IITA_April_27</code> test collection on DSpace Test (<sup>10568</sup>⁄<sub>92703</sub>) I told Sisay that he can move them to IITA’s Journal Articles collection on CGSpace</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-05-17">2018-05-17</h2>
|
||||
@ -495,18 +499,20 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
|
||||
<h2 id="2018-05-23">2018-05-23</h2>
|
||||
|
||||
<ul>
|
||||
<li>I’m investigating how many non-CGIAR users we have registered on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>I’m investigating how many non-CGIAR users we have registered on CGSpace:</p>
|
||||
|
||||
<pre><code>dspace=# select email, netid from eperson where email not like '%cgiar.org%' and email like '%@%';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We might need to do something regarding these users for GDPR compliance because we have their names, emails, and potentially phone numbers</li>
|
||||
<li>I decided that I will just use the cookieconsent script as is, since it looks good and technically does set the cookie with “allow” or “dismiss”</li>
|
||||
<li>I wrote a quick conditional to check if the user has agreed or not before enabling Google Analytics</li>
|
||||
<li>I made a pull request for the GDPR compliance popup (<a href="https://github.com/ilri/DSpace/pull/377">#377</a>) and merged it to the <code>5_x-prod</code> branch</li>
|
||||
<li>I will deploy it to CGSpace tonight</li>
|
||||
<li><p>We might need to do something regarding these users for GDPR compliance because we have their names, emails, and potentially phone numbers</p></li>
|
||||
|
||||
<li><p>I decided that I will just use the cookieconsent script as is, since it looks good and technically does set the cookie with “allow” or “dismiss”</p></li>
|
||||
|
||||
<li><p>I wrote a quick conditional to check if the user has agreed or not before enabling Google Analytics</p></li>
|
||||
|
||||
<li><p>I made a pull request for the GDPR compliance popup (<a href="https://github.com/ilri/DSpace/pull/377">#377</a>) and merged it to the <code>5_x-prod</code> branch</p></li>
|
||||
|
||||
<li><p>I will deploy it to CGSpace tonight</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-05-28">2018-05-28</h2>
|
||||
@ -523,54 +529,60 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
|
||||
<ul>
|
||||
<li>Talk to Samantha from Bioversity about something related to Google Analytics, I’m still not sure what they want</li>
|
||||
<li>DSpace Test crashed last night, seems to be related to system memory (not JVM heap)</li>
|
||||
<li>I see this in <code>dmesg</code>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I see this in <code>dmesg</code>:</p>
|
||||
|
||||
<pre><code>[Wed May 30 00:00:39 2018] Out of memory: Kill process 6082 (java) score 697 or sacrifice child
|
||||
[Wed May 30 00:00:39 2018] Killed process 6082 (java) total-vm:14876264kB, anon-rss:5683372kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Wed May 30 00:00:40 2018] oom_reaper: reaped process 6082 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I need to check the Tomcat JVM heap size/usage, command line JVM heap size (for cron jobs), and PostgreSQL memory usage</li>
|
||||
<li>It might be possible to adjust some things, but eventually we’ll need a larger VPS instance</li>
|
||||
<li>For some reason there are no JVM stats in Munin, ugh</li>
|
||||
<li>Run all system updates on DSpace Test and reboot it</li>
|
||||
<li>I generated a list of CIFOR duplicates from the <code>CIFOR_May_9</code> collection using the Atmire MQM module and then dumped the HTML source so I could process it for sending to Vika</li>
|
||||
<li>I used grep to filter all relevant handle lines from the HTML source then used sed to insert a newline before each “Item1” line (as the duplicates are grouped like Item1, Item2, Item3 for each set of duplicates):</li>
|
||||
</ul>
|
||||
<li><p>I need to check the Tomcat JVM heap size/usage, command line JVM heap size (for cron jobs), and PostgreSQL memory usage</p></li>
|
||||
|
||||
<li><p>It might be possible to adjust some things, but eventually we’ll need a larger VPS instance</p></li>
|
||||
|
||||
<li><p>For some reason there are no JVM stats in Munin, ugh</p></li>
|
||||
|
||||
<li><p>Run all system updates on DSpace Test and reboot it</p></li>
|
||||
|
||||
<li><p>I generated a list of CIFOR duplicates from the <code>CIFOR_May_9</code> collection using the Atmire MQM module and then dumped the HTML source so I could process it for sending to Vika</p></li>
|
||||
|
||||
<li><p>I used grep to filter all relevant handle lines from the HTML source then used sed to insert a newline before each “Item1” line (as the duplicates are grouped like Item1, Item2, Item3 for each set of duplicates):</p>
|
||||
|
||||
<pre><code>$ grep -E 'aspect.duplicatechecker.DuplicateResults.field.del_handle_[0-9]{1,3}_Item' ~/Desktop/https\ _dspacetest.cgiar.org_atmire_metadata-quality_duplicate-checker.html > ~/cifor-duplicates.txt
|
||||
$ sed 's/.*Item1.*/\n&/g' ~/cifor-duplicates.txt > ~/cifor-duplicates-cleaned.txt
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I told Vika to look through the list manually and indicate which ones are indeed duplicates that we should delete, and which ones to map to CIFOR’s collection</li>
|
||||
<li>A few weeks ago Peter wanted a list of authors from the ILRI collections, so I need to find a way to get the handles of all those collections</li>
|
||||
<li>I can use the <code>/communities/{id}/collections</code> endpoint of the REST API but it only takes IDs (not handles) and doesn’t seem to descend into sub communities</li>
|
||||
<li>Shit, so I need the IDs for the top-level ILRI community and all its sub communities (and their sub communities)</li>
|
||||
<li>There has got to be a better way to do this than going to each community and getting their handles and IDs manually</li>
|
||||
<li>Oh shit, I literally already wrote a script to get all collections in a community hierarchy from the REST API: <a href="https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50">rest-find-collections.py</a></li>
|
||||
<li>The output isn’t great, but all the handles and IDs are printed in debug mode:</li>
|
||||
</ul>
|
||||
<li><p>I told Vika to look through the list manually and indicate which ones are indeed duplicates that we should delete, and which ones to map to CIFOR’s collection</p></li>
|
||||
|
||||
<li><p>A few weeks ago Peter wanted a list of authors from the ILRI collections, so I need to find a way to get the handles of all those collections</p></li>
|
||||
|
||||
<li><p>I can use the <code>/communities/{id}/collections</code> endpoint of the REST API but it only takes IDs (not handles) and doesn’t seem to descend into sub communities</p></li>
|
||||
|
||||
<li><p>Shit, so I need the IDs for the top-level ILRI community and all its sub communities (and their sub communities)</p></li>
|
||||
|
||||
<li><p>There has got to be a better way to do this than going to each community and getting their handles and IDs manually</p></li>
|
||||
|
||||
<li><p>Oh shit, I literally already wrote a script to get all collections in a community hierarchy from the REST API: <a href="https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50">rest-find-collections.py</a></p></li>
|
||||
|
||||
<li><p>The output isn’t great, but all the handles and IDs are printed in debug mode:</p>
|
||||
|
||||
<pre><code>$ ./rest-find-collections.py -u https://cgspace.cgiar.org/rest -d 10568/1 2> /tmp/ilri-collections.txt
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then I format the list of handles and put it into this SQL query to export authors from items ONLY in those collections (too many to list here):</li>
|
||||
</ul>
|
||||
<li><p>Then I format the list of handles (see the sketch after this list) and put it into this SQL query to export authors from items ONLY in those collections (too many to list here):</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/67236','10568/67274',...))) group by text_value order by count desc) to /tmp/ilri-authors.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
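<p>The handle formatting itself can be done in the shell (a rough sketch, assuming the debug output in <code>/tmp/ilri-collections.txt</code> contains the handles as plain <code>10568/xxxxx</code> strings); it produces the quoted, comma-separated list that goes inside the <code>IN (...)</code> clause:</p>

<pre><code>$ grep -oE '10568/[0-9]+' /tmp/ilri-collections.txt | sort -u | sed "s/.*/'&amp;'/" | paste -sd, -
</code></pre>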
<h2 id="2018-05-31">2018-05-31</h2>
|
||||
|
||||
<ul>
|
||||
<li>Clarify CGSpace’s usage of Google Analytics and personally identifiable information during user registration for Bioversity team who had been asking about GDPR compliance</li>
|
||||
<li>Testing running PostgreSQL in a Docker container on localhost because when I’m on Arch Linux there isn’t an easily installable package for particular PostgreSQL versions</li>
|
||||
<li>Now I can just use Docker:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Now I can just use Docker:</p>
|
||||
|
||||
<pre><code>$ docker pull postgres:9.5-alpine
|
||||
$ docker run --name dspacedb -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.5-alpine
|
||||
@ -581,7 +593,8 @@ $ pg_restore -h localhost -O -U dspacetest -d dspacetest -W -h localhost ~/Downl
|
||||
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
|
||||
$ psql -h localhost -U dspacetest -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
|
||||
$ psql -h localhost -U postgres dspacetest
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
@ -15,22 +15,22 @@ Test the DSpace 5.8 module upgrades from Atmire (#378)
|
||||
There seems to be a problem with the CUA and L&R versions in pom.xml because they are using SNAPSHOT and it doesn’t build
|
||||
|
||||
I added the new CCAFS Phase II Project Tag PII-FP1_PACCA2 and merged it into the 5_x-prod branch (#379)
|
||||
I proofed and tested the ILRI author corrections that Peter sent back to me this week:
|
||||
|
||||
I proofed and tested the ILRI author corrections that Peter sent back to me this week:
|
||||
|
||||
$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
|
||||
|
||||
|
||||
I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in March, 2018
|
||||
Time to index ~70,000 items on CGSpace:
|
||||
|
||||
Time to index ~70,000 items on CGSpace:
|
||||
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
||||
|
||||
real 74m42.646s
|
||||
user 8m5.056s
|
||||
sys 2m7.289s
|
||||
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-06/" />
|
||||
@ -48,24 +48,24 @@ Test the DSpace 5.8 module upgrades from Atmire (#378)
|
||||
There seems to be a problem with the CUA and L&R versions in pom.xml because they are using SNAPSHOT and it doesn’t build
|
||||
|
||||
I added the new CCAFS Phase II Project Tag PII-FP1_PACCA2 and merged it into the 5_x-prod branch (#379)
|
||||
I proofed and tested the ILRI author corrections that Peter sent back to me this week:
|
||||
|
||||
I proofed and tested the ILRI author corrections that Peter sent back to me this week:
|
||||
|
||||
$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
|
||||
|
||||
|
||||
I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in March, 2018
|
||||
Time to index ~70,000 items on CGSpace:
|
||||
|
||||
Time to index ~70,000 items on CGSpace:
|
||||
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
||||
|
||||
real 74m42.646s
|
||||
user 8m5.056s
|
||||
sys 2m7.289s
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -153,23 +153,23 @@ sys 2m7.289s
|
||||
<li>There seems to be a problem with the CUA and L&R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
|
||||
</ul></li>
|
||||
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
|
||||
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
|
||||
<li>Time to index ~70,000 items on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li>
|
||||
|
||||
<li><p>Time to index ~70,000 items on CGSpace:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
||||
|
||||
real 74m42.646s
|
||||
user 8m5.056s
|
||||
sys 2m7.289s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-06-06">2018-06-06</h2>
|
||||
|
||||
@ -198,32 +198,29 @@ sys 2m7.289s
|
||||
<li>Universit F lix Houphouet-Boigny</li>
|
||||
</ul></li>
|
||||
<li>I uploaded fixes for all those now, but I will continue with the rest of the data later</li>
|
||||
<li>Regarding the SQL migration errors, Atmire told me I need to run some migrations manually in PostgreSQL:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Regarding the SQL migration errors, Atmire told me I need to run some migrations manually in PostgreSQL:</p>
|
||||
|
||||
<pre><code>delete from schema_version where version = '5.6.2015.12.03.2';
|
||||
update schema_version set version = '5.6.2015.12.03.2' where version = '5.5.2015.12.03.2';
|
||||
update schema_version set version = '5.8.2015.12.03.3' where version = '5.5.2015.12.03.3';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And then I need to ignore the ignored ones:</li>
|
||||
</ul>
|
||||
<li><p>And then I need to ignore the ignored ones:</p>
|
||||
|
||||
<pre><code>$ ~/dspace/bin/dspace database migrate ignored
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Now DSpace starts up properly!</li>
|
||||
<li>Gabriela from CIP got back to me about the author names we were correcting on CGSpace</li>
|
||||
<li>I did a quick sanity check on them and then did a test import with my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897"><code>fix-metadata-value.py</code></a> script:</li>
|
||||
</ul>
|
||||
<li><p>Now DSpace starts up properly!</p></li>
|
||||
|
||||
<li><p>Gabriela from CIP got back to me about the author names we were correcting on CGSpace</p></li>
|
||||
|
||||
<li><p>I did a quick sanity check on them and then did a test import with my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897"><code>fix-metadata-value.py</code></a> script:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-06-08-CIP-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I will apply them on CGSpace tomorrow I think…</li>
|
||||
<li><p>I will apply them on CGSpace tomorrow I think…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-06-09">2018-06-09</h2>
|
||||
@ -238,17 +235,18 @@ update schema_version set version = '5.8.2015.12.03.3' where version = '5.5.2015
|
||||
|
||||
<ul>
|
||||
<li>I spent some time removing the Atmire Metadata Quality Module (MQM) from the proposed DSpace 5.8 changes</li>
|
||||
<li>After removing all code mentioning MQM, mqm, metadata-quality, batchedit, duplicatechecker, etc, I think I got most of it removed, but there is a Spring error during Tomcat startup:</li>
|
||||
</ul>
|
||||
|
||||
<pre><code> INFO [org.dspace.servicemanager.DSpaceServiceManager] Shutdown DSpace core service manager
|
||||
<li><p>After removing all code mentioning MQM, mqm, metadata-quality, batchedit, duplicatechecker, etc, I think I got most of it removed, but there is a Spring error during Tomcat startup:</p>
|
||||
|
||||
<pre><code>INFO [org.dspace.servicemanager.DSpaceServiceManager] Shutdown DSpace core service manager
|
||||
Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'org.dspace.servicemanager.spring.DSpaceBeanPostProcessor#0' defined in class path resource [spring/spring-dspace-applicationContext.xml]: Unsatisfied dependency expressed through constructor argument with index 0 of type [org.dspace.servicemanager.config.DSpaceConfigurationService]: : Cannot find class [com.atmire.dspace.discovery.ItemCollectionPlugin] for bean with name 'itemCollectionPlugin' defined in file [/home/aorth/dspace/config/spring/api/discovery.xml];
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I can fix this by commenting out the <code>ItemCollectionPlugin</code> line of <code>discovery.xml</code>, but from looking at the git log I’m not actually sure if that is related to MQM or not</li>
|
||||
<li>I will have to ask Atmire</li>
|
||||
<li>I continued to look at Sisay’s IITA records from last week
|
||||
<li><p>I can fix this by commenting out the <code>ItemCollectionPlugin</code> line of <code>discovery.xml</code>, but from looking at the git log I’m not actually sure if that is related to MQM or not</p></li>
|
||||
|
||||
<li><p>I will have to ask Atmire</p></li>
|
||||
|
||||
<li><p>I continued to look at Sisay’s IITA records from last week</p>
|
||||
|
||||
<ul>
|
||||
<li>I normalized all DOIs to use HTTPS and “doi.org” instead of “dx.doi.org”</li>
|
||||
@ -263,7 +261,8 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
|
||||
<li>dÕpassÕ</li>
|
||||
<li>Also the abstracts have missing accents, ie “recherche sur le d veloppement”</li>
|
||||
</ul></li>
|
||||
<li>I will have to tell IITA people to redo these entirely I think…</li>
|
||||
|
||||
<li><p>I will have to tell IITA people to redo these entirely I think…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-06-11">2018-06-11</h2>
|
||||
@ -301,7 +300,8 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
|
||||
<li>The style sheet obfuscates the data, but if you look at the source it is all there, including information about pagination of results</li>
|
||||
<li>Regarding Udana’s Book Chapters and Reports on DSpace Test last week, Abenet told him to fix some character encoding and CRP issues, then I told him I’d check them after that</li>
|
||||
<li>The latest batch of IITA’s 200 records (based on Abenet’s version <code>Mercy1805_AY.xls</code>) are now in the <a href="https://dspacetest.cgiar.org/handle/10568/96071">IITA_Jan_9_II_Ab</a> collection</li>
|
||||
<li>So here are some corrections:
|
||||
|
||||
<li><p>So here are some corrections:</p>
|
||||
|
||||
<ul>
|
||||
<li>use of Unicode smart quote (hex 2019) in countries and affiliations, for example “COTE D’IVOIRE” and “Institut d’Economic Rurale, Mali”</li>
|
||||
@ -338,27 +338,27 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
|
||||
<li>“LEGUMINOUS COVER CROP” and “LEGUMINOUS COVER CROPS”</li>
|
||||
<li>“MATÉRIEL DE PLANTATION” and “MATÉRIELS DE PLANTATION”</li>
|
||||
<li>I noticed that some records do have encoding errors in the <code>dc.description.abstract</code> field, but only four of them so probably not from Abenet’s handling of the XLS file</li>
|
||||
<li>Based on manually eyeballing the text I used a custom text facet with this GREL to identify the records:</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
<li><p>Based on manually eyeballing the text I used a custom text facet with this GREL to identify the records:</p>
|
||||
|
||||
<pre><code>or(
|
||||
value.contains('€'),
|
||||
value.contains('6g'),
|
||||
value.contains('6m'),
|
||||
value.contains('6d'),
|
||||
value.contains('6e')
|
||||
value.contains('€'),
|
||||
value.contains('6g'),
|
||||
value.contains('6m'),
|
||||
value.contains('6d'),
|
||||
value.contains('6e')
|
||||
)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So IITA should double check the abstracts for these:
|
||||
<li><p>So IITA should double check the abstracts for these:</p></li>
|
||||
|
||||
<ul>
|
||||
<li><a href="https://dspacetest.cgiar.org/10568/96184">https://dspacetest.cgiar.org/10568/96184</a></li>
|
||||
<li><a href="https://dspacetest.cgiar.org/10568/96141">https://dspacetest.cgiar.org/10568/96141</a></li>
|
||||
<li><a href="https://dspacetest.cgiar.org/10568/96118">https://dspacetest.cgiar.org/10568/96118</a></li>
|
||||
<li><a href="https://dspacetest.cgiar.org/10568/96113">https://dspacetest.cgiar.org/10568/96113</a></li>
|
||||
<li><p><a href="https://dspacetest.cgiar.org/10568/96184">https://dspacetest.cgiar.org/10568/96184</a></p></li>
|
||||
|
||||
<li><p><a href="https://dspacetest.cgiar.org/10568/96141">https://dspacetest.cgiar.org/10568/96141</a></p></li>
|
||||
|
||||
<li><p><a href="https://dspacetest.cgiar.org/10568/96118">https://dspacetest.cgiar.org/10568/96118</a></p></li>
|
||||
|
||||
<li><p><a href="https://dspacetest.cgiar.org/10568/96113">https://dspacetest.cgiar.org/10568/96113</a></p></li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
@ -366,38 +366,33 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
|
||||
|
||||
<ul>
|
||||
<li>Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Robin Buruchara’s items</li>
|
||||
<li>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</p>
|
||||
|
||||
<pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-06-13-Robin-Buruchara.csv -db dspace -u dspace -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The contents of <code>2018-06-13-Robin-Buruchara.csv</code> were:</li>
|
||||
</ul>
|
||||
<li><p>The contents of <code>2018-06-13-Robin-Buruchara.csv</code> were:</p>
|
||||
|
||||
<pre><code>dc.contributor.author,cg.creator.id
|
||||
"Buruchara, Robin",Robin Buruchara: 0000-0003-0934-1218
|
||||
"Buruchara, Robin A.",Robin Buruchara: 0000-0003-0934-1218
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>On a hunch I checked to see if CGSpace’s bitstream cleanup was working properly and of course it’s broken:</li>
|
||||
</ul>
|
||||
<li><p>On a hunch I checked to see if CGSpace’s bitstream cleanup was working properly and of course it’s broken:</p>
|
||||
|
||||
<pre><code>$ dspace cleanup -v
|
||||
...
|
||||
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
|
||||
Detail: Key (bitstream_id)=(152402) is still referenced from table "bundle".
|
||||
</code></pre>
|
||||
Detail: Key (bitstream_id)=(152402) is still referenced from table "bundle".
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>As always, the solution is to delete that ID manually in PostgreSQL:</li>
|
||||
</ul>
|
||||
<li><p>As always, the solution is to delete that ID manually in PostgreSQL:</p>
|
||||
|
||||
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (152402);'
|
||||
UPDATE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-06-14">2018-06-14</h2>
|
||||
|
||||
@ -411,39 +406,47 @@ UPDATE 1
|
||||
<h2 id="2018-06-24">2018-06-24</h2>
|
||||
|
||||
<ul>
|
||||
<li>I was restoring a PostgreSQL dump on my test machine and found a way to restore the CGSpace dump as the <code>postgres</code> user, but have the owner of the schema be the <code>dspacetest</code> user:</li>
|
||||
</ul>
|
||||
<li><p>I was restoring a PostgreSQL dump on my test machine and found a way to restore the CGSpace dump as the <code>postgres</code> user, but have the owner of the schema be the <code>dspacetest</code> user:</p>
|
||||
|
||||
<pre><code>$ dropdb -h localhost -U postgres dspacetest
|
||||
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
|
||||
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
|
||||
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost /tmp/cgspace_2018-06-24.backup
|
||||
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The <code>-O</code> option to <code>pg_restore</code> makes the import process ignore ownership specified in the dump itself, and instead makes the schema owned by the user doing the restore</li>
|
||||
<li>I always prefer to use the <code>postgres</code> user locally because it’s just easier than remembering the <code>dspacetest</code> user’s password, but then I couldn’t figure out why the resulting schema was owned by <code>postgres</code></li>
|
||||
<li>So with this you connect as the <code>postgres</code> superuser and then switch roles to <code>dspacetest</code> (also, make sure this user has <code>superuser</code> privileges before the restore)</li>
|
||||
<li>Last week Linode emailed me to say that our Linode 8192 instance used for DSpace Test qualified for an upgrade</li>
|
||||
<li>Apparently they announced some <a href="https://blog.linode.com/2018/05/17/updated-linode-plans-new-larger-linodes/">upgrades to most of their plans in 2018-05</a></li>
|
||||
<li>After the upgrade I see we have more disk space available in the instance’s dashboard, so I shut the instance down and resized it from 98GB to 160GB</li>
|
||||
<li>The resize was very quick (less than one minute) and after booting the instance back up I now have 160GB for the root filesystem!</li>
|
||||
<li>I will move the DSpace installation directory back to the root file system and delete the extra 300GB block storage, as it was actually kinda slow when we put Solr there and now we don’t actually need it anymore because running the production Solr on this instance didn’t work well with 8GB of RAM</li>
|
||||
<li>Also, the larger instance we’re using for CGSpace will go from 24GB of RAM to 32, and will also get a storage increase from 320GB to 640GB… that means we don’t need to consider using block storage right now!</li>
|
||||
<li>The smaller instances get increased storage and network speed but I doubt many are actually using much of their current allocations so we probably don’t need to bother with upgrading them</li>
|
||||
<li>Last week Abenet asked if we could add <code>dc.language.iso</code> to the advanced search filters</li>
|
||||
<li>There is already a search filter for this field defined in <code>discovery.xml</code> but we aren’t using it, so I quickly enabled and tested it, then merged it to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/380">#380</a>)</li>
|
||||
<li>Back to testing the DSpace 5.8 changes from Atmire, I had another issue with SQL migrations:</li>
|
||||
</ul>
|
||||
<li><p>The <code>-O</code> option to <code>pg_restore</code> makes the import process ignore ownership specified in the dump itself, and instead makes the schema owned by the user doing the restore</p></li>
|
||||
|
||||
<li><p>I always prefer to use the <code>postgres</code> user locally because it’s just easier than remembering the <code>dspacetest</code> user’s password, but then I couldn’t figure out why the resulting schema was owned by <code>postgres</code></p></li>
|
||||
|
||||
<li><p>So with this you connect as the <code>postgres</code> superuser and then switch roles to <code>dspacetest</code> (also, make sure this user has <code>superuser</code> privileges before the restore)</p></li>
|
||||
|
||||
<li><p>Last week Linode emailed me to say that our Linode 8192 instance used for DSpace Test qualified for an upgrade</p></li>
|
||||
|
||||
<li><p>Apparently they announced some <a href="https://blog.linode.com/2018/05/17/updated-linode-plans-new-larger-linodes/">upgrades to most of their plans in 2018-05</a></p></li>
|
||||
|
||||
<li><p>After the upgrade I see we have more disk space available in the instance’s dashboard, so I shut the instance down and resized it from 98GB to 160GB</p></li>
|
||||
|
||||
<li><p>The resize was very quick (less than one minute) and after booting the instance back up I now have 160GB for the root filesystem!</p></li>
|
||||
|
||||
<li><p>I will move the DSpace installation directory back to the root file system and delete the extra 300GB block storage, as it was actually kinda slow when we put Solr there and now we don’t actually need it anymore because running the production Solr on this instance didn’t work well with 8GB of RAM</p></li>
|
||||
|
||||
<li><p>Also, the larger instance we’re using for CGSpace will go from 24GB of RAM to 32, and will also get a storage increase from 320GB to 640GB… that means we don’t need to consider using block storage right now!</p></li>
|
||||
|
||||
<li><p>The smaller instances get increased storage and network speed but I doubt many are actually using much of their current allocations so we probably don’t need to bother with upgrading them</p></li>
|
||||
|
||||
<li><p>Last week Abenet asked if we could add <code>dc.language.iso</code> to the advanced search filters</p></li>
|
||||
|
||||
<li><p>There is already a search filter for this field defined in <code>discovery.xml</code> but we aren’t using it, so I quickly enabled and tested it, then merged it to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/380">#380</a>)</p></li>
|
||||
|
||||
<li><p>Back to testing the DSpace 5.8 changes from Atmire, I had another issue with SQL migrations:</p>
|
||||
|
||||
<pre><code>Caused by: org.flywaydb.core.api.FlywayException: Validate failed. Found differences between applied migrations and available migrations: Detected applied migration missing on the classpath: 5.8.2015.12.03.3
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It took me a while to figure out that this migration is for MQM, which I removed after Atmire’s original advice about the migrations, so we actually need to delete this migration instead of updating it</li>
|
||||
<li>So I need to make sure to run the following during the DSpace 5.8 upgrade:</li>
|
||||
</ul>
|
||||
<li><p>It took me a while to figure out that this migration is for MQM, which I removed after Atmire’s original advice about the migrations, so we actually need to delete this migration instead of updating it</p></li>
|
||||
|
||||
<li><p>So I need to make sure to run the following during the DSpace 5.8 upgrade:</p>
|
||||
|
||||
<pre><code>-- Delete existing CUA 4 migration if it exists
|
||||
delete from schema_version where version = '5.6.2015.12.03.2';
|
||||
@ -453,49 +456,45 @@ update schema_version set version = '5.6.2015.12.03.2' where version = '5.5.2015
|
||||
|
||||
-- Delete MQM migration since we're no longer using it
|
||||
delete from schema_version where version = '5.5.2015.12.03.3';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After that you can run the migrations manually and then DSpace should work fine:</li>
|
||||
</ul>
|
||||
<li><p>After that you can run the migrations manually and then DSpace should work fine:</p>
|
||||
|
||||
<pre><code>$ ~/dspace/bin/dspace database migrate ignored
|
||||
...
|
||||
Done.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Andy Jarvis’ items on CGSpace</li>
|
||||
<li>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</li>
|
||||
</ul>
|
||||
<li><p>Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Andy Jarvis’ items on CGSpace</p></li>
|
||||
|
||||
<li><p>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</p>
|
||||
|
||||
<pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-06-24-andy-jarvis-orcid.csv -db dspacetest -u dspacetest -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The contents of <code>2018-06-24-andy-jarvis-orcid.csv</code> were:</li>
|
||||
</ul>
|
||||
<li><p>The contents of <code>2018-06-24-andy-jarvis-orcid.csv</code> were:</p>
|
||||
|
||||
<pre><code>dc.contributor.author,cg.creator.id
|
||||
"Jarvis, A.",Andy Jarvis: 0000-0001-6543-0798
|
||||
"Jarvis, Andy",Andy Jarvis: 0000-0001-6543-0798
|
||||
"Jarvis, Andrew",Andy Jarvis: 0000-0001-6543-0798
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-06-26">2018-06-26</h2>
|
||||
|
||||
<ul>
|
||||
<li>Atmire got back to me to say that we can remove the <code>itemCollectionPlugin</code> and <code>HasBitstreamsSSIPlugin</code> beans from DSpace’s <code>discovery.xml</code> file, as they are used by the Metadata Quality Module (MQM) that we are not using anymore</li>
|
||||
<li>I removed both those beans and did some simple tests to check item submission, media-filter of PDFs, REST API, but got an error “No matches for the query” when listing records in OAI</li>
|
||||
<li>This warning appears in the DSpace log:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>This warning appears in the DSpace log:</p>
|
||||
|
||||
<pre><code>2018-06-26 16:58:12,052 WARN org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It’s actually only a warning and it also appears in the logs on DSpace Test (which is currently running DSpace 5.5), so I need to keep troubleshooting</li>
|
||||
<li>Ah, I think I just need to run <code>dspace oai import</code></li>
|
||||
<li><p>It’s actually only a warning and it also appears in the logs on DSpace Test (which is currently running DSpace 5.5), so I need to keep troubleshooting</p></li>
|
||||
|
||||
<li><p>Ah, I think I just need to run <code>dspace oai import</code></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-06-27">2018-06-27</h2>
|
||||
@ -503,8 +502,8 @@ Done.
|
||||
<ul>
|
||||
<li>Vika from CIFOR sent back his annotations on the duplicates for the “CIFOR_May_9” archive import that I sent him last week</li>
|
||||
<li>I’ll have to figure out how to separate those we’re keeping, deleting, and mapping into CIFOR’s archive collection</li>
|
||||
<li>First, get the 62 deletes from Vika’s file and remove them from the collection:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>First, get the 62 deletes from Vika’s file and remove them from the collection:</p>
|
||||
|
||||
<pre><code>$ grep delete 2018-06-22-cifor-duplicates.txt | grep -o -E '[0-9]{5}\/[0-9]{5}' > cifor-handle-to-delete.txt
|
||||
$ wc -l cifor-handle-to-delete.txt
|
||||
@ -514,51 +513,53 @@ $ wc -l 10568-92904.csv
|
||||
$ while read line; do sed -i "\#$line#d" 10568-92904.csv; done < cifor-handle-to-delete.txt
|
||||
$ wc -l 10568-92904.csv
|
||||
2399 10568-92904.csv
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This iterates over the handles for deletion and uses <code>sed</code> with an alternative pattern delimiter of ‘#’ (which must be escaped), because the pattern itself contains a ‘/’</li>
|
||||
<li>The mapped ones will be difficult because we need their internal IDs in order to map them, and there are 50 of them:</li>
|
||||
</ul>
|
||||
<li><p>This iterates over the handles for deletion and uses <code>sed</code> with an alternative pattern delimiter of ‘#’ (which must be escaped), because the pattern itself contains a ‘/’</p></li>
|
||||
|
||||
<li><p>The mapped ones will be difficult because we need their internal IDs in order to map them, and there are 50 of them:</p>
|
||||
|
||||
<pre><code>$ grep map 2018-06-22-cifor-duplicates.txt | grep -o -E '[0-9]{5}\/[0-9]{5}' > cifor-handle-to-map.txt
|
||||
$ wc -l cifor-handle-to-map.txt
|
||||
50 cifor-handle-to-map.txt
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I can either get them from the databse, or programatically export the metadata using <code>dspace metadata-export -i 10568/xxxxx</code>…</li>
|
||||
<li>Oooh, I can export the items one by one, concatenate them together, remove the headers, and extract the <code>id</code> and <code>collection</code> columns using <a href="https://csvkit.readthedocs.io/">csvkit</a>:</li>
|
||||
</ul>
|
||||
<li><p>I can either get them from the database, or programmatically export the metadata using <code>dspace metadata-export -i 10568/xxxxx</code>…</p></li>
|
||||
|
||||
<li><p>Oooh, I can export the items one by one, concatenate them together, remove the headers, and extract the <code>id</code> and <code>collection</code> columns using <a href="https://csvkit.readthedocs.io/">csvkit</a>:</p>
|
||||
|
||||
<pre><code>$ while read line; do filename=${line/\//-}.csv; dspace metadata-export -i $line -f $filename; done < /tmp/cifor-handle-to-map.txt
|
||||
$ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 > map-to-cifor-archive.csv
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then I can use Open Refine to add the “CIFOR Archive” collection to the mappings</li>
|
||||
<li>Importing the 2398 items via <code>dspace metadata-import</code> ends up with a Java garbage collection error, so I think I need to do it in batches of 1,000</li>
|
||||
<li>After deleting the 62 duplicates, mapping the 50 items from elsewhere in CGSpace, and uploading 2,398 unique items, there are a total of 2,448 items added in this batch</li>
|
||||
<li>I’ll let Abenet take one last look and then move them to CGSpace</li>
|
||||
<li><p>Then I can use Open Refine to add the “CIFOR Archive” collection to the mappings</p></li>
|
||||
|
||||
<li><p>Importing the 2398 items via <code>dspace metadata-import</code> ends up with a Java garbage collection error, so I think I need to do it in batches of 1,000 (see the sketch after this list)</p></li>
|
||||
|
||||
<li><p>After deleting the 62 duplicates, mapping the 50 items from elsewhere in CGSpace, and uploading 2,398 unique items, there are a total of 2,448 items added in this batch</p></li>
|
||||
|
||||
<li><p>I’ll let Abenet take one last look and then move them to CGSpace</p></li>
|
||||
</ul>
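<p>For the batches themselves, something simple like this should work to split the cleaned CSV in two while keeping the header row on each part (a rough sketch: the input filename is hypothetical, the output filenames are the two used for the actual import, and it assumes none of the metadata values contain embedded newlines):</p>

<pre><code>$ head -n1 /tmp/new-cifor-archive.csv > /tmp/2018-06-27-New-CIFOR-Archive.csv
$ head -n1 /tmp/new-cifor-archive.csv > /tmp/2018-06-27-New-CIFOR-Archive2.csv
$ sed 1d /tmp/new-cifor-archive.csv | head -n1199 >> /tmp/2018-06-27-New-CIFOR-Archive.csv
$ sed 1d /tmp/new-cifor-archive.csv | tail -n +1200 >> /tmp/2018-06-27-New-CIFOR-Archive2.csv
</code></pre>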
<h2 id="2018-06-28">2018-06-28</h2>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test appears to have crashed last night</li>
|
||||
<li>There is nothing in the Tomcat or DSpace logs, but I see the following in <code>dmesg -T</code>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>There is nothing in the Tomcat or DSpace logs, but I see the following in <code>dmesg -T</code>:</p>
|
||||
|
||||
<pre><code>[Thu Jun 28 00:00:30 2018] Out of memory: Kill process 14501 (java) score 701 or sacrifice child
|
||||
[Thu Jun 28 00:00:30 2018] Killed process 14501 (java) total-vm:14926704kB, anon-rss:5693608kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Thu Jun 28 00:00:30 2018] oom_reaper: reaped process 14501 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Look over IITA’s <a href="https://dspacetest.cgiar.org/handle/10568/96071">IITA_Jan_9_II_Ab</a> collection from earlier this month on DSpace Test</li>
|
||||
<li>Bosede fixed a few things (and seems to have removed many French IITA subjects like <code>AMÉLIORATION DES PLANTES</code> and <code>SANTÉ DES PLANTES</code>)</li>
|
||||
<li>I still see at least one issue with author affiliations, and I didn’t bother to check the AGROVOC subjects because it’s such a mess anyways</li>
|
||||
<li>I suggested that IITA provide an updated list of subjects to us so we can include their controlled vocabulary in CGSpace, which would also make it easier to do automated validation</li>
|
||||
<li><p>Look over IITA’s <a href="https://dspacetest.cgiar.org/handle/10568/96071">IITA_Jan_9_II_Ab</a> collection from earlier this month on DSpace Test</p></li>
|
||||
|
||||
<li><p>Bosede fixed a few things (and seems to have removed many French IITA subjects like <code>AMÉLIORATION DES PLANTES</code> and <code>SANTÉ DES PLANTES</code>)</p></li>
|
||||
|
||||
<li><p>I still see at least one issue with author affiliations, and I didn’t bother to check the AGROVOC subjects because it’s such a mess anyways</p></li>
|
||||
|
||||
<li><p>I suggested that IITA provide an updated list of subjects to us so we can include their controlled vocabulary in CGSpace, which would also make it easier to do automated validation</p></li>
|
||||
</ul>
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
@ -11,15 +11,13 @@
|
||||
|
||||
I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:
|
||||
|
||||
|
||||
$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
|
||||
|
||||
|
||||
|
||||
During the mvn package stage on the 5.8 branch I kept getting issues with java running out of memory:
|
||||
|
||||
|
||||
There is insufficient memory for the Java Runtime Environment to continue.
|
||||
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-07/" />
|
||||
@ -33,17 +31,15 @@ There is insufficient memory for the Java Runtime Environment to continue.
|
||||
|
||||
I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:
|
||||
|
||||
|
||||
$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
|
||||
|
||||
|
||||
|
||||
During the mvn package stage on the 5.8 branch I kept getting issues with java running out of memory:
|
||||
|
||||
|
||||
There is insufficient memory for the Java Runtime Environment to continue.
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -125,30 +121,25 @@ There is insufficient memory for the Java Runtime Environment to continue.
|
||||
<h2 id="2018-07-01">2018-07-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
|
||||
</ul>
|
||||
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p>
|
||||
|
||||
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
|
||||
</ul>
|
||||
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
|
||||
|
||||
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
<li>As the machine only has 8GB of RAM, I reduced the Tomcat memory heap from 5120m to 4096m so I could try to allocate more to the build process:</li>
|
||||
</ul>
|
||||
<li><p>As the machine only has 8GB of RAM, I reduced the Tomcat memory heap from 5120m to 4096m so I could try to allocate more to the build process:</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
|
||||
$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=dspacetest.cgiar.org -P \!dspace-lni,\!dspace-rdf,\!dspace-sword,\!dspace-swordv2 clean package
|
||||
</code></pre>
|
||||
</code></pre></li>
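<li><p>For reference, reducing the Tomcat heap is just a matter of lowering the <code>-Xmx</code> value in the service defaults and restarting, something like this (the file path is the Ubuntu packaging default and may differ on this host):</p>

<pre><code>$ sudo sed -i 's/-Xmx5120m/-Xmx4096m/' /etc/default/tomcat7
$ sudo systemctl restart tomcat7
</code></pre></li>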
|
||||
|
||||
<ul>
|
||||
<li>Then I stopped the Tomcat 7 service, ran the ant update, and manually ran the old and ignored SQL migrations:</li>
|
||||
</ul>
|
||||
<li><p>Then I stopped the Tomcat 7 service, ran the ant update, and manually ran the old and ignored SQL migrations:</p>
|
||||
|
||||
<pre><code>$ sudo su - postgres
|
||||
$ psql dspace
|
||||
@ -163,10 +154,9 @@ dspace=# commit
|
||||
dspace=# \q
|
||||
$ exit
|
||||
$ dspace database migrate ignored
|
||||
</code></pre>
|
||||
</code></pre></li>
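<li><p>The overall sequence was roughly the following, where the installer path is a placeholder for wherever the 5.8 build was unpacked:</p>

<pre><code>$ sudo systemctl stop tomcat7
$ cd dspace-5.8-build/dspace/target/dspace-installer
$ ant update
$ sudo systemctl start tomcat7
</code></pre></li>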
|
||||
|
||||
<ul>
|
||||
<li>After that I started Tomcat 7 and DSpace seems to be working, now I need to tell our colleagues to try stuff and report issues they have</li>
|
||||
<li><p>After that I started Tomcat 7 and DSpace seems to be working, now I need to tell our colleagues to try stuff and report issues they have</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-07-02">2018-07-02</h2>
|
||||
@ -179,38 +169,34 @@ $ dspace database migrate ignored
|
||||
<h2 id="2018-07-03">2018-07-03</h2>
|
||||
|
||||
<ul>
|
||||
<li>Finally finish with the CIFOR Archive records (a total of 2448):
|
||||
<li><p>Finally finish with the CIFOR Archive records (a total of 2448):</p>
|
||||
|
||||
<ul>
|
||||
<li>I mapped the 50 items that were duplicates from elsewhere in CGSpace into <a href="https://cgspace.cgiar.org/handle/10568/16702">CIFOR Archive</a></li>
|
||||
<li>I did one last check of the remaining 2398 items and found eight who have a <code>cg.identifier.doi</code> that links to some URL other than a DOI so I moved those to <code>cg.identifier.url</code> and <code>cg.identifier.googleurl</code> as appropriate</li>
|
||||
<li>Also, thirteen items had a DOI in their citation, but did not have a <code>cg.identifier.doi</code> field, so I added those</li>
|
||||
<li>Then I imported those 2398 items in two batches (to deal with memory issues):</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
<li><p>Then I imported those 2398 items in two batches (to deal with memory issues):</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
|
||||
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive.csv
|
||||
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive2.csv
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
|
||||
<ul>
|
||||
<li>I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some missing HTTP entirely:</li>
|
||||
</ul>
|
||||
<li><p>I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some missing HTTP entirely:</p>
|
||||
|
||||
<pre><code>dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
|
||||
count
|
||||
count
|
||||
-------
|
||||
785
|
||||
785
|
||||
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
|
||||
count
|
||||
count
|
||||
-------
|
||||
4
|
||||
</code></pre>
|
||||
4
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think I should fix that as well as some other garbage values like “test” and “dspace.ilri.org” etc:</li>
|
||||
</ul>
|
||||
<li><p>I think I should fix that as well as some other garbage values like “test” and “dspace.ilri.org” etc:</p>
|
||||
|
||||
<pre><code>dspace=# begin;
|
||||
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
|
||||
@ -222,14 +208,12 @@ UPDATE 1
|
||||
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403);
|
||||
DELETE 4
|
||||
dspace=# commit;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Testing DSpace 5.8 with PostgreSQL 9.6 and Tomcat 8.5.32 (instead of my usual 7.0.88) and for some reason I get autowire errors on Catalina startup with 8.5.32:</li>
|
||||
</ul>
|
||||
<li><p>Testing DSpace 5.8 with PostgreSQL 9.6 and Tomcat 8.5.32 (instead of my usual 7.0.88) and for some reason I get autowire errors on Catalina startup with 8.5.32:</p>
|
||||
|
||||
<pre><code>03-Jul-2018 19:51:37.272 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener]
|
||||
java.lang.RuntimeException: Failure during filter init: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
|
||||
java.lang.RuntimeException: Failure during filter init: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
|
||||
at org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener.contextInitialized(DSpaceKernelServletContextListener.java:92)
|
||||
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4792)
|
||||
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5256)
|
||||
@ -245,10 +229,9 @@ dspace=# commit;
|
||||
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
|
||||
at java.lang.Thread.run(Thread.java:748)
|
||||
Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Gotta check that out later…</li>
|
||||
<li><p>Gotta check that out later…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-07-04">2018-07-04</h2>
|
||||
@ -274,92 +257,96 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
|
||||
<li>I was tempted to do the Linode instance upgrade on CGSpace (linode18), but after looking closely at the system backups I noticed that Solr isn’t being backed up to S3</li>
|
||||
<li>I apparently noticed this—and fixed it!—in <a href="/cgspace-notes/2016-07/">2016-07</a>, but it doesn’t look like the backup has been updated since then!</li>
|
||||
<li>It looks like I added Solr to the <code>backup_to_s3.sh</code> script, but that script is not even being used (<code>s3cmd</code> is run directly from root’s crontab)</li>
|
||||
<li>For now I have just initiated a manual S3 backup of the Solr data:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>For now I have just initiated a manual S3 backup of the Solr data:</p>
|
||||
|
||||
<pre><code># s3cmd sync --delete-removed /home/backup/solr/ s3://cgspace.cgiar.org/solr/
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But I need to add this to cron!</li>
|
||||
<li>I wonder if I should convert some of the cron jobs to systemd services / timers…</li>
|
||||
<li>I sent a note to all our users on Yammer to ask them about possible maintenance on Sunday, July 14th</li>
|
||||
<li>Abenet wants to be able to search by journal title (dc.source) in the advanced Discovery search so I opened an issue for it (<a href="https://github.com/ilri/DSpace/issues/384">#384</a>)</li>
|
||||
<li>I regenerated the list of names for all our ORCID iDs using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
|
||||
</ul>
|
||||
<li><p>But I need to add this to cron!</p></li>
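<li><p>A single line in root’s crontab would be enough, something like this (the schedule here is arbitrary):</p>

<pre><code>0 4 * * * /usr/bin/s3cmd sync --delete-removed /home/backup/solr/ s3://cgspace.cgiar.org/solr/
</code></pre></li>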
|
||||
|
||||
<li><p>I wonder if I should convert some of the cron jobs to systemd services / timers…</p></li>
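<li><p>If I go the systemd route it would be a oneshot service plus a timer, roughly like this sketch (the unit names, paths, and schedule are all placeholders):</p>

<pre><code>$ cat /etc/systemd/system/solr-s3-backup.service
[Unit]
Description=Sync Solr backups to S3

[Service]
Type=oneshot
ExecStart=/usr/bin/s3cmd sync --delete-removed /home/backup/solr/ s3://cgspace.cgiar.org/solr/

$ cat /etc/systemd/system/solr-s3-backup.timer
[Unit]
Description=Nightly Solr S3 backup sync

[Timer]
OnCalendar=*-*-* 04:00:00
Persistent=true

[Install]
WantedBy=timers.target

$ sudo systemctl enable --now solr-s3-backup.timer
</code></pre></li>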
|
||||
|
||||
<li><p>I sent a note to all our users on Yammer to ask them about possible maintenance on Sunday, July 14th</p></li>
|
||||
|
||||
<li><p>Abenet wants to be able to search by journal title (dc.source) in the advanced Discovery search so I opened an issue for it (<a href="https://github.com/ilri/DSpace/issues/384">#384</a>)</p></li>
|
||||
|
||||
<li><p>I regenerated the list of names for all our ORCID iDs using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</p>
|
||||
|
||||
<pre><code>$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml | sort | uniq > /tmp/2018-07-08-orcids.txt
|
||||
$ ./resolve-orcids.py -i /tmp/2018-07-08-orcids.txt -o /tmp/2018-07-08-names.txt -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But after comparing to the existing list of names I didn’t see much change, so I just ignored it</li>
|
||||
<li><p>But after comparing to the existing list of names I didn’t see much change, so I just ignored it</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-07-09">2018-07-09</h2>
|
||||
|
||||
<ul>
|
||||
<li>Uptime Robot said that CGSpace was down for two minutes early this morning but I don’t see anything in Tomcat logs or dmesg</li>
|
||||
<li>Uptime Robot said that CGSpace was down for two minutes again later in the day, and this time I saw a memory error in Tomcat’s <code>catalina.out</code>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Uptime Robot said that CGSpace was down for two minutes again later in the day, and this time I saw a memory error in Tomcat’s <code>catalina.out</code>:</p>
|
||||
|
||||
<pre><code>Exception in thread "http-bio-127.0.0.1-8081-exec-557" java.lang.OutOfMemoryError: Java heap space
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I’m not sure if it’s the same error, but I see this in DSpace’s <code>solr.log</code>:</li>
|
||||
</ul>
|
||||
<li><p>I’m not sure if it’s the same error, but I see this in DSpace’s <code>solr.log</code>:</p>
|
||||
|
||||
<pre><code>2018-07-09 06:25:09,913 ERROR org.apache.solr.servlet.SolrDispatchFilter @ null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I see a strange error around that time in <code>dspace.log.2018-07-08</code>:</li>
|
||||
</ul>
|
||||
<li><p>I see a strange error around that time in <code>dspace.log.2018-07-08</code>:</p>
|
||||
|
||||
<pre><code>2018-07-09 06:23:43,510 ERROR com.atmire.statistics.SolrLogThread @ IOException occured when talking to server at: http://localhost:8081/solr/statistics
|
||||
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr/statistics
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But not sure what caused that…</li>
|
||||
<li>I got a message from Linode tonight that CPU usage was high on CGSpace for the past few hours around 8PM GMT</li>
|
||||
<li>Looking in the nginx logs I see the top ten IP addresses active today:</li>
|
||||
</ul>
|
||||
<li><p>But not sure what caused that…</p></li>
|
||||
|
||||
<li><p>I got a message from Linode tonight that CPU usage was high on CGSpace for the past few hours around 8PM GMT</p></li>
|
||||
|
||||
<li><p>Looking in the nginx logs I see the top ten IP addresses active today:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "09/Jul/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
1691 40.77.167.84
|
||||
1701 40.77.167.69
|
||||
1718 50.116.102.77
|
||||
1872 137.108.70.6
|
||||
2172 157.55.39.234
|
||||
2190 207.46.13.47
|
||||
2848 178.154.200.38
|
||||
4367 35.227.26.162
|
||||
4387 70.32.83.92
|
||||
4738 95.108.181.88
|
||||
</code></pre>
|
||||
1691 40.77.167.84
|
||||
1701 40.77.167.69
|
||||
1718 50.116.102.77
|
||||
1872 137.108.70.6
|
||||
2172 157.55.39.234
|
||||
2190 207.46.13.47
|
||||
2848 178.154.200.38
|
||||
4367 35.227.26.162
|
||||
4387 70.32.83.92
|
||||
4738 95.108.181.88
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Of those, <em>all</em> except <code>70.32.83.92</code> and <code>50.116.102.77</code> are <em>NOT</em> re-using their Tomcat sessions, for example from the XMLUI logs:</li>
|
||||
</ul>
|
||||
<li><p>Of those, <em>all</em> except <code>70.32.83.92</code> and <code>50.116.102.77</code> are <em>NOT</em> re-using their Tomcat sessions, for example from the XMLUI logs:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-07-09
|
||||
4435
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>95.108.181.88</code> appears to be Yandex, so I dunno why it’s creating so many sessions, as its user agent should match Tomcat’s Crawler Session Manager Valve</li>
|
||||
<li><code>70.32.83.92</code> is on MediaTemple but I’m not sure who it is. They are mostly hitting REST so I guess that’s fine</li>
|
||||
<li><code>35.227.26.162</code> doesn’t declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx</li>
|
||||
<li><code>178.154.200.38</code> is Yandex again</li>
|
||||
<li><code>207.46.13.47</code> is Bing</li>
|
||||
<li><code>157.55.39.234</code> is Bing</li>
|
||||
<li><code>137.108.70.6</code> is our old friend CORE bot</li>
|
||||
<li><code>50.116.102.77</code> doesn’t declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that’s fine</li>
|
||||
<li><code>40.77.167.84</code> is Bing again</li>
|
||||
<li>Interestingly, the first time that I see <code>35.227.26.162</code> was on 2018-06-08</li>
|
||||
<li>I’ve added <code>35.227.26.162</code> to the bot tagging logic in the nginx vhost</li>
|
||||
<li><p><code>95.108.181.88</code> appears to be Yandex, so I dunno why it’s creating so many sessions, as its user agent should match Tomcat’s Crawler Session Manager Valve</p></li>
|
||||
|
||||
<li><p><code>70.32.83.92</code> is on MediaTemple but I’m not sure who it is. They are mostly hitting REST so I guess that’s fine</p></li>
|
||||
|
||||
<li><p><code>35.227.26.162</code> doesn’t declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx</p></li>
|
||||
|
||||
<li><p><code>178.154.200.38</code> is Yandex again</p></li>
|
||||
|
||||
<li><p><code>207.46.13.47</code> is Bing</p></li>
|
||||
|
||||
<li><p><code>157.55.39.234</code> is Bing</p></li>
|
||||
|
||||
<li><p><code>137.108.70.6</code> is our old friend CORE bot</p></li>
|
||||
|
||||
<li><p><code>50.116.102.77</code> doesn’t declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that’s fine</p></li>
|
||||
|
||||
<li><p><code>40.77.167.84</code> is Bing again</p></li>
|
||||
|
||||
<li><p>Interestingly, the first time that I see <code>35.227.26.162</code> was on 2018-06-08</p></li>
|
||||
|
||||
<li><p>I’ve added <code>35.227.26.162</code> to the bot tagging logic in the nginx vhost</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-07-10">2018-07-10</h2>
|
||||
@ -372,32 +359,30 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
|
||||
<li>All were tested and merged to the <code>5_x-prod</code> branch and will be deployed on CGSpace this coming weekend when I do the Linode server upgrade</li>
|
||||
<li>I need to get them onto the 5.8 testing branch too, either via cherry-picking or by rebasing after we finish testing Atmire’s 5.8 pull request (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)</li>
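<li><p>If I go the cherry-pick route it would be roughly the following, where the commit range is just a placeholder for whatever was merged to <code>5_x-prod</code>:</p>

<pre><code>$ git checkout 5_x-dspace-5.8
$ git cherry-pick &lt;first-merged-commit&gt;^..&lt;last-merged-commit&gt;
</code></pre></li>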
|
||||
<li>Linode sent an alert about CPU usage on CGSpace again, about 13:00UTC</li>
|
||||
<li>These are the top ten users in the last two hours:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>These are the top ten users in the last two hours:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Jul/2018:(11|12|13)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
81 193.95.22.113
|
||||
82 50.116.102.77
|
||||
112 40.77.167.90
|
||||
117 196.190.95.98
|
||||
120 178.154.200.38
|
||||
215 40.77.167.96
|
||||
243 41.204.190.40
|
||||
415 95.108.181.88
|
||||
695 35.227.26.162
|
||||
697 213.139.52.250
|
||||
</code></pre>
|
||||
81 193.95.22.113
|
||||
82 50.116.102.77
|
||||
112 40.77.167.90
|
||||
117 196.190.95.98
|
||||
120 178.154.200.38
|
||||
215 40.77.167.96
|
||||
243 41.204.190.40
|
||||
415 95.108.181.88
|
||||
695 35.227.26.162
|
||||
697 213.139.52.250
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looks like <code>213.139.52.250</code> is Moayad testing his new CGSpace visualization thing:</li>
|
||||
</ul>
|
||||
<li><p>Looks like <code>213.139.52.250</code> is Moayad testing his new CGSpace visualization thing:</p>
|
||||
|
||||
<pre><code>213.139.52.250 - - [10/Jul/2018:13:39:41 +0000] "GET /bitstream/handle/10568/75668/dryad.png HTTP/2.0" 200 53750 "http://localhost:4200/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>He said there was a bug that caused his app to request a bunch of invalid URLs</li>
|
||||
<li>I’ll have to keep and eye on this and see how their platform evolves</li>
|
||||
<li><p>He said there was a bug that caused his app to request a bunch of invalid URLs</p></li>
|
||||
|
||||
<li><p>I’ll have to keep an eye on this and see how their platform evolves</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-07-11">2018-07-11</h2>
|
||||
@ -417,85 +402,83 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
|
||||
|
||||
<ul>
|
||||
<li>Uptime Robot said that CGSpace went down a few times last night, around 10:45 PM and 12:30 AM</li>
|
||||
<li>Here are the top ten IPs from last night and this morning:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Here are the top ten IPs from last night and this morning:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "11/Jul/2018:22" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
48 66.249.64.91
|
||||
50 35.227.26.162
|
||||
57 157.55.39.234
|
||||
59 157.55.39.71
|
||||
62 147.99.27.190
|
||||
82 95.108.181.88
|
||||
92 40.77.167.90
|
||||
97 183.128.40.185
|
||||
97 240e:f0:44:fa53:745a:8afe:d221:1232
|
||||
3634 208.110.72.10
|
||||
48 66.249.64.91
|
||||
50 35.227.26.162
|
||||
57 157.55.39.234
|
||||
59 157.55.39.71
|
||||
62 147.99.27.190
|
||||
82 95.108.181.88
|
||||
92 40.77.167.90
|
||||
97 183.128.40.185
|
||||
97 240e:f0:44:fa53:745a:8afe:d221:1232
|
||||
3634 208.110.72.10
|
||||
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "12/Jul/2018:00" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
25 216.244.66.198
|
||||
38 40.77.167.185
|
||||
46 66.249.64.93
|
||||
56 157.55.39.71
|
||||
60 35.227.26.162
|
||||
65 157.55.39.234
|
||||
83 95.108.181.88
|
||||
87 66.249.64.91
|
||||
96 40.77.167.90
|
||||
7075 208.110.72.10
|
||||
</code></pre>
|
||||
25 216.244.66.198
|
||||
38 40.77.167.185
|
||||
46 66.249.64.93
|
||||
56 157.55.39.71
|
||||
60 35.227.26.162
|
||||
65 157.55.39.234
|
||||
83 95.108.181.88
|
||||
87 66.249.64.91
|
||||
96 40.77.167.90
|
||||
7075 208.110.72.10
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We have never seen <code>208.110.72.10</code> before… so that’s interesting!</li>
|
||||
<li>The user agent for these requests is: Pcore-HTTP/v0.44.0</li>
|
||||
<li>A brief Google search doesn’t turn up any information about what this bot is, but lots of users complaining about it</li>
|
||||
<li>This bot does make a lot of requests all through the day, although it seems to re-use its Tomcat session:</li>
|
||||
</ul>
|
||||
<li><p>We have never seen <code>208.110.72.10</code> before… so that’s interesting!</p></li>
|
||||
|
||||
<li><p>The user agent for these requests is: Pcore-HTTP/v0.44.0</p></li>
|
||||
|
||||
<li><p>A brief Google search doesn’t turn up any information about what this bot is, but lots of users complaining about it</p></li>
|
||||
|
||||
<li><p>This bot does make a lot of requests all through the day, although it seems to re-use its Tomcat session:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "Pcore-HTTP" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
17098 208.110.72.10
|
||||
17098 208.110.72.10
|
||||
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=208.110.72.10' dspace.log.2018-07-11
|
||||
1161
|
||||
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=208.110.72.10' dspace.log.2018-07-12
|
||||
1885
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think the problem is that, despite the bot requesting <code>robots.txt</code>, it almost exclusively requests dynamic pages from <code>/discover</code>:</li>
|
||||
</ul>
|
||||
<li><p>I think the problem is that, despite the bot requesting <code>robots.txt</code>, it almost exclusively requests dynamic pages from <code>/discover</code>:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "Pcore-HTTP" | grep -o -E "GET /(browse|discover|search-filter)" | sort -n | uniq -c | sort -rn
|
||||
13364 GET /discover
|
||||
993 GET /search-filter
|
||||
804 GET /browse
|
||||
13364 GET /discover
|
||||
993 GET /search-filter
|
||||
804 GET /browse
|
||||
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "Pcore-HTTP" | grep robots
|
||||
208.110.72.10 - - [12/Jul/2018:00:22:28 +0000] "GET /robots.txt HTTP/1.1" 200 1301 "https://cgspace.cgiar.org/robots.txt" "Pcore-HTTP/v0.44.0"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So this bot is just like Baiduspider, and I need to add it to the nginx rate limiting</li>
|
||||
<li>I’ll also add it to Tomcat’s Crawler Session Manager Valve to force the re-use of a common Tomcat session for all crawlers just in case</li>
|
||||
<li>Generate a list of all affiliations in CGSpace to send to Mohamed Salem to compare with the list on MEL (sorting the list by most occurrences):</li>
|
||||
</ul>
|
||||
<li><p>So this bot is just like Baiduspider, and I need to add it to the nginx rate limiting</p></li>
|
||||
|
||||
<li><p>I’ll also add it to Tomcat’s Crawler Session Manager Valve to force the re-use of a common Tomcat session for all crawlers just in case</p></li>
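<li><p>For reference, the valve is configured in Tomcat’s <code>server.xml</code> and the change is basically just extending the <code>crawlerUserAgents</code> regex, something like this (the pattern below is illustrative, not the exact one deployed):</p>

<pre><code>&lt;Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yandex.*|.*spider.*|.*Pcore-HTTP.*" /&gt;
</code></pre></li>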
|
||||
|
||||
<li><p>Generate a list of all affiliations in CGSpace to send to Mohamed Salem to compare with the list on MEL (sorting the list by most occurrences):</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv header
|
||||
COPY 4518
|
||||
dspace=# \q
|
||||
$ csvcut -c 1 < /tmp/affiliations.csv > /tmp/affiliations-1.csv
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We also need to discuss standardizing our countries and comparing our ORCID iDs</li>
|
||||
<li><p>We also need to discuss standardizing our countries and comparing our ORCID iDs</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-07-13">2018-07-13</h2>
|
||||
|
||||
<ul>
|
||||
<li>Generate a list of affiliations for Peter and Abenet to go over so we can batch correct them before we deploy the new data visualization dashboard:</li>
|
||||
</ul>
|
||||
<li><p>Generate a list of affiliations for Peter and Abenet to go over so we can batch correct them before we deploy the new data visualization dashboard:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv header;
|
||||
COPY 4518
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-07-15">2018-07-15</h2>
|
||||
|
||||
@ -506,8 +489,8 @@ COPY 4518
|
||||
<li>Peter had asked a question about how mapped items are displayed in the Altmetric dashboard</li>
|
||||
<li>For example, <a href="10568/82810"><sup>10568</sup>⁄<sub>82810</sub></a> is mapped to four collections, but only shows up in one “department” in their dashboard</li>
|
||||
<li>Altmetric help said that <a href="https://cgspace.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:cgspace.cgiar.org:10568/82810">according to OAI that item is only in one department</a></li>
|
||||
<li>I noticed that indeed there was only one collection listed, so I forced an OAI re-import on CGSpace:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I noticed that indeed there was only one collection listed, so I forced an OAI re-import on CGSpace:</p>
|
||||
|
||||
<pre><code>$ dspace oai import -c
|
||||
OAI 2.0 manager action started
|
||||
@ -522,38 +505,34 @@ Full import
|
||||
Total: 73925 items
|
||||
Purging cached OAI responses.
|
||||
OAI 2.0 manager action ended. It took 697 seconds.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Now I see four collections in OAI for that item!</li>
|
||||
<li>I need to ask the dspace-tech mailing list if the nightly OAI import catches the case of old items that have had metadata or mappings change</li>
|
||||
<li>ICARDA sent me a list of the ORCID iDs they have in the MEL system and it looks like almost 150 are new and unique to us!</li>
|
||||
</ul>
|
||||
<li><p>Now I see four collections in OAI for that item!</p></li>
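<li><p>A quick way to double check is to count the <code>setSpec</code> occurrences in the item’s OAI record, which should now list all four collections (a rough check, assuming this is the same record Altmetric reads):</p>

<pre><code>$ http 'https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=oai:cgspace.cgiar.org:10568/82810' | grep -o '&lt;setSpec&gt;' | wc -l
</code></pre></li>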
|
||||
|
||||
<li><p>I need to ask the dspace-tech mailing list if the nightly OAI import catches the case of old items that have had metadata or mappings change</p></li>
|
||||
|
||||
<li><p>ICARDA sent me a list of the ORCID iDs they have in the MEL system and it looks like almost 150 are new and unique to us!</p>
|
||||
|
||||
<pre><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
|
||||
1020
|
||||
$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq | wc -l
|
||||
1158
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I combined the two lists and regenerated the names for all of our ORCID iDs using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
|
||||
</ul>
|
||||
<li><p>I combined the two lists and regenerated the names for all of our ORCID iDs using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</p>
|
||||
|
||||
<pre><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2018-07-15-orcid-ids.txt
|
||||
$ ./resolve-orcids.py -i /tmp/2018-07-15-orcid-ids.txt -o /tmp/2018-07-15-resolved-orcids.txt -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then I added the XML formatting for controlled vocabularies, sorted the list with GNU sort in vim via <code>% !sort</code> and then checked the formatting with tidy:</li>
|
||||
</ul>
|
||||
<li><p>Then I added the XML formatting for controlled vocabularies, sorted the list with GNU sort in vim via <code>% !sort</code> and then checked the formatting with tidy:</p>
|
||||
|
||||
<pre><code>$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I will check with the CGSpace team to see if they want me to add these to CGSpace</li>
|
||||
<li>Help Udana from WLE understand some Altmetrics concepts</li>
|
||||
<li><p>I will check with the CGSpace team to see if they want me to add these to CGSpace</p></li>
|
||||
|
||||
<li><p>Help Udana from WLE understand some Altmetrics concepts</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-07-18">2018-07-18</h2>
|
||||
@ -565,20 +544,20 @@ $ ./resolve-orcids.py -i /tmp/2018-07-15-orcid-ids.txt -o /tmp/2018-07-15-resolv
|
||||
<li>I suggested that we should have a wider meeting about this, and that I would post that on Yammer</li>
|
||||
<li>I was curious about how and when Altmetric harvests the OAI, so I looked in nginx’s OAI log</li>
|
||||
<li>For every day in the past week I only see about 50 to 100 requests per day, but then about nine days ago I see 1500 requests</li>
|
||||
<li>In there I see two bots making about 750 requests each, and this one is probably Altmetric:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>In there I see two bots making about 750 requests each, and this one is probably Altmetric:</p>
|
||||
|
||||
<pre><code>178.33.237.157 - - [09/Jul/2018:17:00:46 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////100 HTTP/1.1" 200 58653 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
|
||||
178.33.237.157 - - [09/Jul/2018:17:01:11 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////200 HTTP/1.1" 200 67950 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
|
||||
...
|
||||
178.33.237.157 - - [09/Jul/2018:22:10:39 +0000] "GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////73900 HTTP/1.1" 20 0 25049 "-" "Apache-HttpClient/4.5.2 (Java/1.8.0_121)"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So if they are getting 100 records per OAI request it would take them 739 requests</li>
|
||||
<li>I wonder if I should add this user agent to the Tomcat Crawler Session Manager valve… does OAI use Tomcat sessions?</li>
|
||||
<li>Appears not:</li>
|
||||
</ul>
|
||||
<li><p>So if they are getting 100 records per OAI request it would take them 739 requests</p></li>
|
||||
|
||||
<li><p>I wonder if I should add this user agent to the Tomcat Crawler Session Manager valve… does OAI use Tomcat sessions?</p></li>
|
||||
|
||||
<li><p>Appears not:</p>
|
||||
|
||||
<pre><code>$ http --print Hh 'https://cgspace.cgiar.org/oai/request?verb=ListRecords&resumptionToken=oai_dc////100'
|
||||
GET /oai/request?verb=ListRecords&resumptionToken=oai_dc////100 HTTP/1.1
|
||||
@ -600,7 +579,8 @@ Vary: Accept-Encoding
|
||||
X-Content-Type-Options: nosniff
|
||||
X-Frame-Options: SAMEORIGIN
|
||||
X-XSS-Protection: 1; mode=block
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-07-19">2018-07-19</h2>
|
||||
|
||||
@ -620,44 +600,45 @@ X-XSS-Protection: 1; mode=block
|
||||
<ul>
|
||||
<li>I told the IWMI people that they can use <code>sort_by=3</code> in their OpenSearch query to sort the results by <code>dc.date.accessioned</code> instead of <code>dc.date.issued</code></li>
|
||||
<li>They say that it is a burden for them to capture the issue dates, so I cautioned them that this is in their own benefit for future posterity and that everyone else on CGSpace manages to capture the issue dates!</li>
|
||||
<li>For future reference, as I had previously noted in <a href="/cgspace-notes/2018-04/">2018-04</a>, sort options are configured in <code>dspace.cfg</code>, for example:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>For future reference, as I had previously noted in <a href="/cgspace-notes/2018-04/">2018-04</a>, sort options are configured in <code>dspace.cfg</code>, for example:</p>
|
||||
|
||||
<pre><code>webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
|
||||
</code></pre>
|
||||
</code></pre></li>
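<li><p>So an OpenSearch query sorted by accession date would look something like this (the scope handle and query term below are just examples):</p>

<pre><code>$ http 'https://cgspace.cgiar.org/open-search/discover?query=water&amp;scope=10568/16814&amp;sort_by=3&amp;order=DESC&amp;rpp=100'
</code></pre></li>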
|
||||
|
||||
<ul>
|
||||
<li>Just because I was curious I made sure that these options are working as expected in DSpace 5.8 on DSpace Test (they are)</li>
|
||||
<li>I tested the Atmire Listings and Reports (L&R) module one last time on my local test environment with a new snapshot of CGSpace’s database and re-generated Discovery index and it worked fine</li>
|
||||
<li>I finally informed Atmire that we’re ready to proceed with deploying this to CGSpace and that they should advise whether we should wait about the SNAPSHOT versions in <code>pom.xml</code></li>
|
||||
<li>There is no word on the issue I reported with Tomcat 8.5.32 yet, though…</li>
|
||||
<li><p>Just because I was curious I made sure that these options are working as expected in DSpace 5.8 on DSpace Test (they are)</p></li>
|
||||
|
||||
<li><p>I tested the Atmire Listings and Reports (L&R) module one last time on my local test environment with a new snapshot of CGSpace’s database and re-generated Discovery index and it worked fine</p></li>
|
||||
|
||||
<li><p>I finally informed Atmire that we’re ready to proceed with deploying this to CGSpace and that they should advise whether we should wait about the SNAPSHOT versions in <code>pom.xml</code></p></li>
|
||||
|
||||
<li><p>There is no word on the issue I reported with Tomcat 8.5.32 yet, though…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-07-23">2018-07-23</h2>
|
||||
|
||||
<ul>
|
||||
<li>Still discussing dates with IWMI</li>
|
||||
<li>I looked in the database to see the breakdown of date formats used in <code>dc.date.issued</code>, ie YYYY, YYYY-MM, or YYYY-MM-DD:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I looked in the database to see the breakdown of date formats used in <code>dc.date.issued</code>, ie YYYY, YYYY-MM, or YYYY-MM-DD:</p>
|
||||
|
||||
<pre><code>dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=15 and text_value ~ '^[0-9]{4}$';
|
||||
count
|
||||
count
|
||||
-------
|
||||
53292
|
||||
53292
|
||||
(1 row)
|
||||
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=15 and text_value ~ '^[0-9]{4}-[0-9]{2}$';
|
||||
count
|
||||
count
|
||||
-------
|
||||
3818
|
||||
3818
|
||||
(1 row)
|
||||
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=15 and text_value ~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}$';
|
||||
count
|
||||
count
|
||||
-------
|
||||
17357
|
||||
</code></pre>
|
||||
17357
|
||||
</code></pre></li>
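<li><p>For future reference, the same breakdown could be done in a single query, something like this sketch:</p>

<pre><code>dspace=# select case
           when text_value ~ '^[0-9]{4}$' then 'YYYY'
           when text_value ~ '^[0-9]{4}-[0-9]{2}$' then 'YYYY-MM'
           when text_value ~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' then 'YYYY-MM-DD'
           else 'other'
         end as format, count(*)
       from metadatavalue
       where resource_type_id=2 and metadata_field_id=15
       group by format order by count desc;
</code></pre></li>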
|
||||
|
||||
<ul>
|
||||
<li>So it looks like YYYY is the most numerous, followed by YYYY-MM-DD, then YYYY-MM</li>
|
||||
<li><p>So it looks like YYYY is the most numerous, followed by YYYY-MM-DD, then YYYY-MM</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-07-26">2018-07-26</h2>
|
||||
|
@ -11,18 +11,21 @@
|
||||
|
||||
DSpace Test had crashed at some point yesterday morning and I see the following in dmesg:
|
||||
|
||||
|
||||
[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
|
||||
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
|
||||
|
||||
|
||||
Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight
|
||||
|
||||
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat’s
|
||||
|
||||
I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…
|
||||
|
||||
Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core
|
||||
|
||||
The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes
|
||||
|
||||
I ran all system updates on DSpace Test and rebooted it
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
@ -37,21 +40,24 @@ I ran all system updates on DSpace Test and rebooted it
|
||||
|
||||
DSpace Test had crashed at some point yesterday morning and I see the following in dmesg:
|
||||
|
||||
|
||||
[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
|
||||
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
|
||||
|
||||
|
||||
Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight
|
||||
|
||||
From the DSpace log I see that eventually Solr stopped responding, so I guess the java process that was OOM killed above was Tomcat’s
|
||||
|
||||
I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…
|
||||
|
||||
Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core
|
||||
|
||||
The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes
|
||||
|
||||
I ran all system updates on DSpace Test and rebooted it
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -133,21 +139,24 @@ I ran all system updates on DSpace Test and rebooted it
|
||||
<h2 id="2018-08-01">2018-08-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
|
||||
</ul>
|
||||
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p>
|
||||
|
||||
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
|
||||
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
|
||||
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
|
||||
<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
|
||||
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
|
||||
<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
|
||||
<li>I ran all system updates on DSpace Test and rebooted it</li>
|
||||
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li>
|
||||
|
||||
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</p></li>
|
||||
|
||||
<li><p>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</p></li>
|
||||
|
||||
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
|
||||
|
||||
<li><p>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
|
||||
|
||||
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
@ -164,22 +173,27 @@ I ran all system updates on DSpace Test and rebooted it
|
||||
<h2 id="2018-08-02">2018-08-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test crashed again, and the only error I see is this in <code>dmesg</code>:</li>
|
||||
</ul>
|
||||
<li><p>DSpace Test crashed again, and the only error I see is this in <code>dmesg</code>:</p>
|
||||
|
||||
<pre><code>[Thu Aug 2 00:00:12 2018] Out of memory: Kill process 1407 (java) score 787 or sacrifice child
|
||||
[Thu Aug 2 00:00:12 2018] Killed process 1407 (java) total-vm:18876328kB, anon-rss:6323836kB, file-rss:0kB, shmem-rss:0kB
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I am still assuming that this is the Tomcat process that is dying, so maybe actually we need to reduce its memory instead of increasing it?</li>
|
||||
<li>The risk we run there is that we’ll start getting OutOfMemory errors from Tomcat</li>
|
||||
<li>So basically we need a new test server with more RAM very soon…</li>
|
||||
<li>Abenet asked about the workflow statistics in the Atmire CUA module again</li>
|
||||
<li>Last year Atmire told me that it’s disabled by default but you can enable it with <code>workflow.stats.enabled = true</code> in the CUA configuration file</li>
|
||||
<li>There was a bug with adding users so they sent a patch, but I didn’t merge it because it was <a href="https://github.com/ilri/DSpace/pull/319">very dirty</a> and I wasn’t sure it actually fixed the problem</li>
|
||||
<li>I just tried to enable the stats again on DSpace Test now that we’re on DSpace 5.8 with updated Atmire modules, but every user I search for shows “No data available”</li>
|
||||
<li>As a test I submitted a new item and I was able to see it in the workflow statistics “data” tab, but not in the graph</li>
|
||||
<li><p>I am still assuming that this is the Tomcat process that is dying, so maybe actually we need to reduce its memory instead of increasing it?</p></li>
|
||||
|
||||
<li><p>The risk we run there is that we’ll start getting OutOfMemory errors from Tomcat</p></li>
|
||||
|
||||
<li><p>So basically we need a new test server with more RAM very soon…</p></li>
|
||||
|
||||
<li><p>Abenet asked about the workflow statistics in the Atmire CUA module again</p></li>
|
||||
|
||||
<li><p>Last year Atmire told me that it’s disabled by default but you can enable it with <code>workflow.stats.enabled = true</code> in the CUA configuration file</p></li>
|
||||
|
||||
<li><p>There was a bug with adding users so they sent a patch, but I didn’t merge it because it was <a href="https://github.com/ilri/DSpace/pull/319">very dirty</a> and I wasn’t sure it actually fixed the problem</p></li>
|
||||
|
||||
<li><p>I just tried to enable the stats again on DSpace Test now that we’re on DSpace 5.8 with updated Atmire modules, but every user I search for shows “No data available”</p></li>
|
||||
|
||||
<li><p>As a test I submitted a new item and I was able to see it in the workflow statistics “data” tab, but not in the graph</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-08-15">2018-08-15</h2>
|
||||
@ -187,31 +201,35 @@ I ran all system updates on DSpace Test and rebooted it
|
||||
<ul>
|
||||
<li>Run through Peter’s list of author affiliations from earlier this month</li>
|
||||
<li>I did some quick sanity checks and small cleanups in Open Refine, checking for spaces, weird accents, and encoding errors</li>
|
||||
<li>Finally I did a test run with the <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897"><code>fix-metadata-values.py</code></a> script:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Finally I did a test run with the <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897"><code>fix-metadata-values.py</code></a> script:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i 2018-08-15-Correct-1083-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t correct -m 211
|
||||
$ ./delete-metadata-values.py -i 2018-08-15-Remove-11-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-08-16">2018-08-16</h2>
|
||||
|
||||
<ul>
|
||||
<li>Generate a list of the top 1,500 authors on CGSpace for Sisay so he can create the controlled vocabulary:</li>
|
||||
</ul>
|
||||
<li><p>Generate a list of the top 1,500 authors on CGSpace for Sisay so he can create the controlled vocabulary:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc limit 1500) to /tmp/2018-08-16-top-1500-authors.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Start working on adding the ORCID metadata to a handful of CIAT authors as requested by Elizabeth earlier this month</li>
|
||||
<li>I might need to overhaul the <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py</a> script to be a little more robust about author order and ORCID metadata that might have been altered manually by editors after submission, as this script was written without that consideration</li>
|
||||
<li>After checking a few examples I see that checking only the <code>text_value</code> and <code>place</code> when adding ORCID fields is not enough anymore</li>
|
||||
<li>It was a sane assumption when I was initially migrating ORCID records from Solr to regular metadata, but now it seems that some authors might have been added or changed after item submission</li>
|
||||
<li>Now it is better to check if there is <em>any</em> existing ORCID identifier for a given author for the item…</li>
|
||||
<li>I will have to update my script to extract the ORCID identifier and search for that</li>
|
||||
<li>Re-create my local DSpace database using the latest PostgreSQL 9.6 Docker image and re-import the latest CGSpace dump:</li>
|
||||
</ul>
|
||||
<li><p>Start working on adding the ORCID metadata to a handful of CIAT authors as requested by Elizabeth earlier this month</p></li>
|
||||
|
||||
<li><p>I might need to overhaul the <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py</a> script to be a little more robust about author order and ORCID metadata that might have been altered manually by editors after submission, as this script was written without that consideration</p></li>
|
||||
|
||||
<li><p>After checking a few examples I see that checking only the <code>text_value</code> and <code>place</code> when adding ORCID fields is not enough anymore</p></li>
|
||||
|
||||
<li><p>It was a sane assumption when I was initially migrating ORCID records from Solr to regular metadata, but now it seems that some authors might have been added or changed after item submission</p></li>
|
||||
|
||||
<li><p>Now it is better to check if there is <em>any</em> existing ORCID identifier for a given author for the item…</p></li>
|
||||
|
||||
<li><p>I will have to update my script to extract the ORCID identifier and search for that</p></li>
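<li><p>The check would be something like this, where the <code>resource_id</code> is a placeholder and I’m assuming <code>cg.creator.id</code> is registered as element <code>creator</code> with qualifier <code>id</code>:</p>

<pre><code>dspace=# select text_value from metadatavalue where resource_type_id=2 and resource_id=12345 and metadata_field_id in (select metadata_field_id from metadatafieldregistry where element='creator' and qualifier='id');
</code></pre></li>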
|
||||
|
||||
<li><p>Re-create my local DSpace database using the latest PostgreSQL 9.6 Docker image and re-import the latest CGSpace dump:</p>
|
||||
|
||||
<pre><code>$ sudo docker run --name dspacedb -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
|
||||
$ createuser -h localhost -U postgres --pwprompt dspacetest
|
||||
@ -220,7 +238,8 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
|
||||
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/cgspace_2018-08-16.backup
|
||||
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
|
||||
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-08-19">2018-08-19</h2>
|
||||
|
||||
@ -228,8 +247,8 @@ $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-s
|
||||
<li>Keep working on the CIAT ORCID identifiers from Elizabeth</li>
|
||||
<li>In the spreadsheet she sent me there are some names with other versions in the database, so when it is obviously the same one (ie “Schultze-Kraft, Rainer” and “Schultze-Kraft, R.”) I will just tag them with ORCID identifiers too</li>
|
||||
<li>This is less obvious and more error prone with names like “Peters” where there are many more authors</li>
|
||||
<li>I see some errors in the variations of names as well, for example:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I see some errors in the variations of names as well, for example:</p>
|
||||
|
||||
<pre><code>Verchot, Louis
|
||||
Verchot, L
|
||||
@ -238,12 +257,11 @@ Verchot, L.V
|
||||
Verchot, L.V.
|
||||
Verchot, LV
|
||||
Verchot, Louis V.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I’ll just tag them all with Louis Verchot’s ORCID identifier…</li>
|
||||
<li>In the end, I’ll run the following CSV with my <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py</a> script:</li>
|
||||
</ul>
|
||||
<li><p>I’ll just tag them all with Louis Verchot’s ORCID identifier…</p></li>
|
||||
|
||||
<li><p>In the end, I’ll run the following CSV with my <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers-csv.py</a> script:</p>
|
||||
|
||||
<pre><code>dc.contributor.author,cg.creator.id
|
||||
"Campbell, Bruce",Bruce M Campbell: 0000-0002-0123-4859
|
||||
@ -273,42 +291,37 @@ Verchot, Louis V.
|
||||
"Chirinda, Ngonidzashe",Ngonidzashe Chirinda: 0000-0002-4213-6294
|
||||
"Chirinda, Ngoni",Ngonidzashe Chirinda: 0000-0002-4213-6294
|
||||
"Ngonidzashe, Chirinda",Ngonidzashe Chirinda: 0000-0002-4213-6294
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The invocation would be:</li>
|
||||
</ul>
|
||||
<li><p>The invocation would be:</p>
|
||||
|
||||
<pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-08-16-ciat-orcid.csv -db dspace -u dspace -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I ran the script on DSpace Test and CGSpace and tagged a total of 986 ORCID identifiers</li>
|
||||
<li>Looking at the list of author affiliations from Peter one last time</li>
|
||||
<li>I notice that I should add the Unicode character 0x00b4 (`) to my list of invalid characters to look for in Open Refine, making the latest version of the GREL expression being:</li>
|
||||
</ul>
|
||||
<li><p>I ran the script on DSpace Test and CGSpace and tagged a total of 986 ORCID identifiers</p></li>
|
||||
|
||||
<li><p>Looking at the list of author affiliations from Peter one last time</p></li>
|
||||
|
||||
<li><p>I notice that I should add the Unicode character 0x00b4 (`) to my list of invalid characters to look for in Open Refine, making the latest version of the GREL expression being:</p>
|
||||
|
||||
<pre><code>or(
|
||||
isNotNull(value.match(/.*\uFFFD.*/)),
|
||||
isNotNull(value.match(/.*\u00A0.*/)),
|
||||
isNotNull(value.match(/.*\u200A.*/)),
|
||||
isNotNull(value.match(/.*\u2019.*/)),
|
||||
isNotNull(value.match(/.*\u00b4.*/))
|
||||
isNotNull(value.match(/.*\uFFFD.*/)),
|
||||
isNotNull(value.match(/.*\u00A0.*/)),
|
||||
isNotNull(value.match(/.*\u200A.*/)),
|
||||
isNotNull(value.match(/.*\u2019.*/)),
|
||||
isNotNull(value.match(/.*\u00b4.*/))
|
||||
)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This character all by itself is indicative of encoding issues in French, Italian, and Spanish names, for example: De´veloppement and Investigacio´n</li>
|
||||
<li>I will run the following on DSpace Test and CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>This character all by itself is indicative of encoding issues in French, Italian, and Spanish names, for example: De´veloppement and Investigacio´n</p></li>
|
||||
|
||||
<li><p>I will run the following on DSpace Test and CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-08-15-Correct-1083-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t correct -m 211
|
||||
$ ./delete-metadata-values.py -i /tmp/2018-08-15-Remove-11-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then force an update of the Discovery index on DSpace Test:</li>
|
||||
</ul>
|
||||
<li><p>Then force an update of the Discovery index on DSpace Test:</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m"
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
@ -316,11 +329,9 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
real 72m12.570s
|
||||
user 6m45.305s
|
||||
sys 2m2.461s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And then on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>And then on CGSpace:</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
@ -328,29 +339,26 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
real 79m44.392s
|
||||
user 8m50.730s
|
||||
sys 2m20.248s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run system updates on DSpace Test and reboot the server</li>
|
||||
<li>In unrelated news, I see some newish Russian bot making a few thousand requests per day and not re-using its XMLUI session:</li>
|
||||
</ul>
|
||||
<li><p>Run system updates on DSpace Test and reboot the server</p></li>
|
||||
|
||||
<li><p>In unrelated news, I see some newish Russian bot making a few thousand requests per day and not re-using its XMLUI session:</p>
|
||||
|
||||
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep '19/Aug/2018' | grep -c 5.9.6.51
|
||||
1553
|
||||
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-08-19
|
||||
1724
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I don’t even know how it’s possible for the bot to use MORE sessions than total requests…</li>
|
||||
<li>The user agent is:</li>
|
||||
</ul>
|
||||
<li><p>I don’t even know how it’s possible for the bot to use MORE sessions than total requests…</p></li>
|
||||
|
||||
<li><p>The user agent is:</p>
|
||||
|
||||
<pre><code>Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So I’m thinking we should add “crawl” to the Tomcat Crawler Session Manager valve, as we already have “bot” that catches Googlebot, Bingbot, etc.</li>
|
||||
<li><p>So I’m thinking we should add “crawl” to the Tomcat Crawler Session Manager valve, as we already have “bot” that catches Googlebot, Bingbot, etc.</p></li>
|
||||
</ul>
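<ul>
<li><p>For reference, that valve is configured in Tomcat’s <code>server.xml</code>; a rough sketch of what the updated pattern might look like (the file path and the base pattern here are assumptions rather than copied from our config):</p>

<pre><code># grep -A1 CrawlerSessionManagerValve /etc/tomcat7/server.xml
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*|.*[cC]rawl.*" />
</code></pre></li>
</ul>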
|
||||
|
||||
<h2 id="2018-08-20">2018-08-20</h2>
|
||||
@ -375,31 +383,37 @@ sys 2m20.248s
|
||||
<h2 id="2018-08-21">2018-08-21</h2>
|
||||
|
||||
<ul>
|
||||
<li>Something must have happened, as the <code>mvn package</code> <em>always</em> takes about two hours now, stopping for a very long time near the end at this step:</li>
|
||||
</ul>
|
||||
<li><p>Something must have happened, as the <code>mvn package</code> <em>always</em> takes about two hours now, stopping for a very long time near the end at this step:</p>
|
||||
|
||||
<pre><code>[INFO] Processing overlay [ id org.dspace.modules:xmlui-mirage2]
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It’s the same on DSpace Test, my local laptop, and CGSpace…</li>
|
||||
<li>It wasn’t this way before when I was constantly building the previous 5.8 branch with Atmire patches…</li>
|
||||
<li>I will restore the previous <code>5_x-dspace-5.8</code> and <code>atmire-module-upgrades-5.8</code> branches to see if the build time is different there</li>
|
||||
<li>… it seems that the <code>atmire-module-upgrades-5.8</code> branch still takes 1 hour and 23 minutes on my local machine…</li>
|
||||
<li>Let me try to build the old <code>5_x-prod-dspace-5.5</code> branch on my local machine and see how long it takes</li>
|
||||
<li>That one only took 13 minutes! So there is definitely something wrong with our 5.8 branch, now I should try vanilla DSpace 5.8</li>
|
||||
<li>I notice that the step this pauses at is:</li>
|
||||
</ul>
|
||||
<li><p>It’s the same on DSpace Test, my local laptop, and CGSpace…</p></li>
|
||||
|
||||
<li><p>It wasn’t this way before when I was constantly building the previous 5.8 branch with Atmire patches…</p></li>
|
||||
|
||||
<li><p>I will restore the previous <code>5_x-dspace-5.8</code> and <code>atmire-module-upgrades-5.8</code> branches to see if the build time is different there</p></li>
|
||||
|
||||
<li><p>… it seems that the <code>atmire-module-upgrades-5.8</code> branch still takes 1 hour and 23 minutes on my local machine…</p></li>
|
||||
|
||||
<li><p>Let me try to build the old <code>5_x-prod-dspace-5.5</code> branch on my local machine and see how long it takes</p></li>
|
||||
|
||||
<li><p>That one only took 13 minutes! So there is definitely something wrong with our 5.8 branch, now I should try vanilla DSpace 5.8</p></li>
|
||||
|
||||
<li><p>I notice that the step this pauses at is:</p>
|
||||
|
||||
<pre><code>[INFO] --- maven-war-plugin:2.4:war (default-war) @ xmlui ---
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And I notice that Atmire changed something in the XMLUI module’s <code>pom.xml</code> as part of the DSpace 5.8 changes, specifically to remove the exclude for <code>node_modules</code> in the <code>maven-war-plugin</code> step</li>
|
||||
<li>This exclude is <em>present</em> in vanilla DSpace, and if I add it back the build time goes from 1 hour 23 minutes to 12 minutes!</li>
|
||||
<li>It makes sense that it would take longer to complete this step because the <code>node_modules</code> folder has tens of thousands of files, and we have 27 themes!</li>
|
||||
<li>I need to test to see if this has any side effects when deployed…</li>
|
||||
<li>In other news, I see there was a pull request in DSpace 5.9 that fixes the issue with not being able to have blank lines in CSVs when importing via command line or webui (<a href="https://jira.duraspace.org/browse/DS-3245">DS-3245</a>)</li>
|
||||
<li><p>And I notice that Atmire changed something in the XMLUI module’s <code>pom.xml</code> as part of the DSpace 5.8 changes, specifically to remove the exclude for <code>node_modules</code> in the <code>maven-war-plugin</code> step</p></li>
|
||||
|
||||
<li><p>This exclude is <em>present</em> in vanilla DSpace, and if I add it back the build time goes from 1 hour 23 minutes to 12 minutes!</p></li>
|
||||
|
||||
<li><p>It makes sense that it would take longer to complete this step because the <code>node_modules</code> folder has tens of thousands of files, and we have 27 themes!</p></li>
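<li><p>Just to get an idea of the scale, counting the files under the themes’ <code>node_modules</code> directories is easy enough (the path here is from memory, so adjust as needed):</p>

<pre><code>$ find dspace/modules/xmlui-mirage2/src/main/webapp/themes -path '*/node_modules/*' -type f | wc -l
</code></pre></li>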
|
||||
|
||||
<li><p>I need to test to see if this has any side effects when deployed…</p></li>
|
||||
|
||||
<li><p>In other news, I see there was a pull request in DSpace 5.9 that fixes the issue with not being able to have blank lines in CSVs when importing via command line or webui (<a href="https://jira.duraspace.org/browse/DS-3245">DS-3245</a>)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-08-23">2018-08-23</h2>
|
||||
@ -410,34 +424,31 @@ sys 2m20.248s
|
||||
<li>I sent a list of the top 1500 author affiliations on CGSpace to CodeObia so we can compare ours with the ones on MELSpace</li>
|
||||
<li>Discuss CTA items with Sisay; he was trying to figure out how to do the collection mapping in combination with SAFBuilder</li>
|
||||
<li>It appears that the web UI’s upload interface <em>requires</em> you to specify the collection, whereas the CLI interface allows you to omit the collection command line flag and defer to the <code>collections</code> file inside each item in the bundle (see the example below)</li>
|
||||
<li>I imported the CTA items on CGSpace for Sisay:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I imported the CTA items on CGSpace for Sisay:</p>
|
||||
|
||||
<pre><code>$ dspace import -a -e s.webshet@cgiar.org -s /home/swebshet/ictupdates_uploads_August_21 -m /tmp/2018-08-23-cta-ictupdates.map
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
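<ul>
<li><p>For reference, the <code>collections</code> file mentioned above is just a plain text file inside each item directory of the Simple Archive Format bundle, with one collection handle per line (the first one listed becomes the owning collection, if I remember correctly); a hypothetical example (the item directory name and handle are made up):</p>

<pre><code>$ cat /home/swebshet/ictupdates_uploads_August_21/item_1/collections
10568/12345
</code></pre></li>
</ul>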
|
||||
|
||||
<h2 id="2018-08-26">2018-08-26</h2>
|
||||
|
||||
<ul>
|
||||
<li>Doing the DSpace 5.8 upgrade on CGSpace (linode18)</li>
|
||||
<li>I already finished the Maven build, now I’ll take a backup of the PostgreSQL database and do a database cleanup just in case:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I already finished the Maven build, now I’ll take a backup of the PostgreSQL database and do a database cleanup just in case:</p>
|
||||
|
||||
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-08-26-before-dspace-58.backup dspace
|
||||
$ dspace cleanup -v
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Now I can stop Tomcat and do the install:</li>
|
||||
</ul>
|
||||
<li><p>Now I can stop Tomcat and do the install:</p>
|
||||
|
||||
<pre><code>$ cd dspace/target/dspace-installer
|
||||
$ ant update clean_backups update_geolite
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After the successful Ant update I can run the database migrations:</li>
|
||||
</ul>
|
||||
<li><p>After the successful Ant update I can run the database migrations:</p>
|
||||
|
||||
<pre><code>$ psql dspace dspace
|
||||
|
||||
@ -448,48 +459,55 @@ DELETE 1
|
||||
dspace=> \q
|
||||
|
||||
$ dspace database migrate ignored
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then I’ll run all system updates and reboot the server:</li>
|
||||
</ul>
|
||||
<li><p>Then I’ll run all system updates and reboot the server:</p>
|
||||
|
||||
<pre><code>$ sudo su -
|
||||
# apt update && apt full-upgrade
|
||||
# apt clean && apt autoclean && apt autoremove
|
||||
# reboot
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>After reboot I logged in and cleared all the XMLUI caches and everything looked to be working fine</li>
|
||||
<li>Adam from WLE had asked a few weeks ago about getting the metadata for a bunch of items related to gender from 2013 until now</li>
|
||||
<li>They want a CSV with <em>all</em> metadata, which the Atmire Listings and Reports module can’t do</li>
|
||||
<li>I exported a list of items from Listings and Reports with the following criteria: from year 2013 until now, have WLE subject <code>GENDER</code> or <code>GENDER POVERTY AND INSTITUTIONS</code>, and CRP <code>Water, Land and Ecosystems</code></li>
|
||||
<li>Then I extracted the Handle links from the report so I could export each item’s metadata as CSV</li>
|
||||
</ul>
|
||||
<li><p>After reboot I logged in and cleared all the XMLUI caches and everything looked to be working fine</p></li>
|
||||
|
||||
<li><p>Adam from WLE had asked a few weeks ago about getting the metadata for a bunch of items related to gender from 2013 until now</p></li>
|
||||
|
||||
<li><p>They want a CSV with <em>all</em> metadata, which the Atmire Listings and Reports module can’t do</p></li>
|
||||
|
||||
<li><p>I exported a list of items from Listings and Reports with the following criteria: from year 2013 until now, have WLE subject <code>GENDER</code> or <code>GENDER POVERTY AND INSTITUTIONS</code>, and CRP <code>Water, Land and Ecosystems</code></p></li>
|
||||
|
||||
<li><p>Then I extracted the Handle links from the report so I could export each item’s metadata as CSV</p>
|
||||
|
||||
<pre><code>$ grep -o -E "[0-9]{5}/[0-9]{0,5}" listings-export.txt > /tmp/iwmi-gender-items.txt
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then on the DSpace server I exported the metadata for each item one by one:</li>
|
||||
</ul>
|
||||
<li><p>Then on the DSpace server I exported the metadata for each item one by one:</p>
|
||||
|
||||
<pre><code>$ while read -r line; do dspace metadata-export -f "/tmp/${line/\//-}.csv" -i $line; sleep 2; done < /tmp/iwmi-gender-items.txt
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But from here I realized that each of the fifty-nine items will have different columns in their CSVs, making it difficult to combine them</li>
|
||||
<li>I’m not sure how to proceed without writing some script to parse and join the CSVs, and I don’t think it’s worth my time</li>
|
||||
<li>I tested DSpace 5.8 in Tomcat 8.5.32 and it seems to work now, so I’m not sure why I got those errors last time I tried</li>
|
||||
<li>It could have been a configuration issue, though, as I also reconciled the <code>server.xml</code> with the one in <a href="https://github.com/ilri/rmg-ansible-public">our Ansible infrastructure scripts</a></li>
|
||||
<li>But now I can start testing and preparing to move DSpace Test to Ubuntu 18.04 + Tomcat 8.5 + OpenJDK + PostgreSQL 9.6…</li>
|
||||
<li>Actually, upon closer inspection, it seems that when you try to go to Listings and Reports under Tomcat 8.5.33 you are taken to the JSPUI login page despite having already logged in via XMLUI</li>
|
||||
<li>If I type my username and password again it <em>does</em> take me to Listings and Reports, though…</li>
|
||||
<li>I don’t see anything interesting in the Catalina or DSpace logs, so I might have to file a bug with Atmire</li>
|
||||
<li>For what it’s worth, the Content and Usage (CUA) module does load, though I can’t seem to get any results in the graph</li>
|
||||
<li>I just checked to see if the Listings and Reports issue with using the CGSpace citation field was fixed as planned alongside the DSpace 5.8 upgrades (<a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">#589</a>)</li>
|
||||
<li>I was able to create a new layout containing only the citation field, so I closed the ticket</li>
|
||||
<li><p>But from here I realized that each of the fifty-nine items will have different columns in their CSVs, making it difficult to combine them</p></li>
|
||||
|
||||
<li><p>I’m not sure how to proceed without writing some script to parse and join the CSVs, and I don’t think it’s worth my time</p></li>
|
||||
|
||||
<li><p>I tested DSpace 5.8 in Tomcat 8.5.32 and it seems to work now, so I’m not sure why I got those errors last time I tried</p></li>
|
||||
|
||||
<li><p>It could have been a configuration issue, though, as I also reconciled the <code>server.xml</code> with the one in <a href="https://github.com/ilri/rmg-ansible-public">our Ansible infrastructure scripts</a></p></li>
|
||||
|
||||
<li><p>But now I can start testing and preparing to move DSpace Test to Ubuntu 18.04 + Tomcat 8.5 + OpenJDK + PostgreSQL 9.6…</p></li>
|
||||
|
||||
<li><p>Actually, upon closer inspection, it seems that when you try to go to Listings and Reports under Tomcat 8.5.33 you are taken to the JSPUI login page despite having already logged in via XMLUI</p></li>
|
||||
|
||||
<li><p>If I type my username and password again it <em>does</em> take me to Listings and Reports, though…</p></li>
|
||||
|
||||
<li><p>I don’t see anything interesting in the Catalina or DSpace logs, so I might have to file a bug with Atmire</p></li>
|
||||
|
||||
<li><p>For what it’s worth, the Content and Usage (CUA) module does load, though I can’t seem to get any results in the graph</p></li>
|
||||
|
||||
<li><p>I just checked to see if the Listings and Reports issue with using the CGSpace citation field was fixed as planned alongside the DSpace 5.8 upgrades (<a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">#589</a>)</p></li>
|
||||
|
||||
<li><p>I was able to create a new layout containing only the citation field, so I closed the ticket</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-08-29">2018-08-29</h2>
|
||||
|
@ -29,7 +29,7 @@ I’ll update the DSpace role in our Ansible infrastructure playbooks and ru
|
||||
Also, I’ll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month
|
||||
I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -181,62 +181,59 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
|
||||
|
||||
<ul>
|
||||
<li>Playing with <a href="https://github.com/eykhagen/strest">strest</a> to test the DSpace REST API programmatically</li>
|
||||
<li>For example, given this <code>test.yaml</code>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>For example, given this <code>test.yaml</code>:</p>
|
||||
|
||||
<pre><code>version: 1
|
||||
|
||||
requests:
|
||||
test:
|
||||
method: GET
|
||||
url: https://dspacetest.cgiar.org/rest/test
|
||||
validate:
|
||||
raw: "REST api is running."
|
||||
test:
|
||||
method: GET
|
||||
url: https://dspacetest.cgiar.org/rest/test
|
||||
validate:
|
||||
raw: "REST api is running."
|
||||
|
||||
login:
|
||||
url: https://dspacetest.cgiar.org/rest/login
|
||||
method: POST
|
||||
data:
|
||||
json: {"email":"test@dspace","password":"thepass"}
|
||||
login:
|
||||
url: https://dspacetest.cgiar.org/rest/login
|
||||
method: POST
|
||||
data:
|
||||
json: {"email":"test@dspace","password":"thepass"}
|
||||
|
||||
status:
|
||||
url: https://dspacetest.cgiar.org/rest/status
|
||||
method: GET
|
||||
headers:
|
||||
rest-dspace-token: Value(login)
|
||||
status:
|
||||
url: https://dspacetest.cgiar.org/rest/status
|
||||
method: GET
|
||||
headers:
|
||||
rest-dspace-token: Value(login)
|
||||
|
||||
logout:
|
||||
url: https://dspacetest.cgiar.org/rest/logout
|
||||
method: POST
|
||||
headers:
|
||||
rest-dspace-token: Value(login)
|
||||
logout:
|
||||
url: https://dspacetest.cgiar.org/rest/logout
|
||||
method: POST
|
||||
headers:
|
||||
rest-dspace-token: Value(login)
|
||||
|
||||
# vim: set sw=2 ts=2:
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Works pretty well, though the DSpace <code>logout</code> always returns an HTTP 415 error for some reason</li>
|
||||
<li>We could eventually use this to test sanity of the API for creating collections etc</li>
|
||||
<li>A user is getting an error in her workflow:</li>
|
||||
</ul>
|
||||
<li><p>Works pretty well, though the DSpace <code>logout</code> always returns an HTTP 415 error for some reason</p></li>
|
||||
|
||||
<li><p>We could eventually use this to test sanity of the API for creating collections etc</p></li>
|
||||
|
||||
<li><p>A user is getting an error in her workflow:</p>
|
||||
|
||||
<pre><code>2018-09-10 07:26:35,551 ERROR org.dspace.submit.step.CompleteStep @ Caught exception in submission step:
|
||||
org.dspace.authorize.AuthorizeException: Authorization denied for action WORKFLOW_STEP_1 on COLLECTION:2 by user 3819
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Seems to be during submit step, because it’s workflow step 1…?</li>
|
||||
<li>Move some top-level CRP communities to be below the new <a href="https://cgspace.cgiar.org/handle/10568/97114">CGIAR Research Programs and Platforms</a> community:</li>
|
||||
</ul>
|
||||
<li><p>Seems to be during submit step, because it’s workflow step 1…?</p></li>
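<li><p>To see who is actually in that collection’s step 1 workflow group I could check PostgreSQL with something like this (a sketch from memory of the DSpace 5 schema, so the column names might need adjusting):</p>

<pre><code>dspace=# SELECT workflow_step_1 FROM collection WHERE collection_id=2;
dspace=# SELECT eperson_id FROM epersongroup2eperson WHERE eperson_group_id=<group id from above>;
</code></pre></li>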
|
||||
|
||||
<li><p>Move some top-level CRP communities to be below the new <a href="https://cgspace.cgiar.org/handle/10568/97114">CGIAR Research Programs and Platforms</a> community:</p>
|
||||
|
||||
<pre><code>$ dspace community-filiator --set -p 10568/97114 -c 10568/51670
|
||||
$ dspace community-filiator --set -p 10568/97114 -c 10568/35409
|
||||
$ dspace community-filiator --set -p 10568/97114 -c 10568/3112
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Valerio contacted me to point out some issues with metadata on CGSpace, which I corrected in PostgreSQL:</li>
|
||||
</ul>
|
||||
<li><p>Valerio contacted me to point out some issues with metadata on CGSpace, which I corrected in PostgreSQL:</p>
|
||||
|
||||
<pre><code>update metadatavalue set text_value='ISI Journal' where resource_type_id=2 and metadata_field_id=226 and text_value='ISI Juornal';
|
||||
UPDATE 1
|
||||
@ -248,48 +245,45 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id=226 and
|
||||
DELETE 17
|
||||
update metadatavalue set text_value='ISI Journal' where resource_type_id=2 and metadata_field_id=226 and text_value='ISI';
|
||||
UPDATE 15
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Start working on adding metadata for access and usage rights that we started earlier in 2018 (and also in 2017)</li>
|
||||
<li>The current <code>cg.identifier.status</code> field will become “Access rights” and <code>dc.rights</code> will become “Usage rights”</li>
|
||||
<li>I have some work in progress on the <a href="https://github.com/alanorth/DSpace/tree/5_x-rights"><code>5_x-rights</code> branch</a></li>
|
||||
<li>Linode said that CGSpace (linode18) had a high CPU load earlier today</li>
|
||||
<li>When I looked, I see it’s the same Russian IP that I noticed last month:</li>
|
||||
</ul>
|
||||
<li><p>Start working on adding metadata for access and usage rights that we started earlier in 2018 (and also in 2017)</p></li>
|
||||
|
||||
<li><p>The current <code>cg.identifier.status</code> field will become “Access rights” and <code>dc.rights</code> will become “Usage rights”</p></li>
|
||||
|
||||
<li><p>I have some work in progress on the <a href="https://github.com/alanorth/DSpace/tree/5_x-rights"><code>5_x-rights</code> branch</a></p></li>
|
||||
|
||||
<li><p>Linode said that CGSpace (linode18) had a high CPU load earlier today</p></li>
|
||||
|
||||
<li><p>When I looked, I see it’s the same Russian IP that I noticed last month:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Sep/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
1459 157.55.39.202
|
||||
1579 95.108.181.88
|
||||
1615 157.55.39.147
|
||||
1714 66.249.64.91
|
||||
1924 50.116.102.77
|
||||
3696 157.55.39.106
|
||||
3763 157.55.39.148
|
||||
4470 70.32.83.92
|
||||
4724 35.237.175.180
|
||||
14132 5.9.6.51
|
||||
</code></pre>
|
||||
1459 157.55.39.202
|
||||
1579 95.108.181.88
|
||||
1615 157.55.39.147
|
||||
1714 66.249.64.91
|
||||
1924 50.116.102.77
|
||||
3696 157.55.39.106
|
||||
3763 157.55.39.148
|
||||
4470 70.32.83.92
|
||||
4724 35.237.175.180
|
||||
14132 5.9.6.51
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And this bot is still creating more Tomcat sessions than Nginx requests (WTF?):</li>
|
||||
</ul>
|
||||
<li><p>And this bot is still creating more Tomcat sessions than Nginx requests (WTF?):</p>
|
||||
|
||||
<pre><code># grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-09-10
|
||||
14133
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The user agent is still the same:</li>
|
||||
</ul>
|
||||
<li><p>The user agent is still the same:</p>
|
||||
|
||||
<pre><code>Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I added <code>.*crawl.*</code> to the Tomcat Session Crawler Manager Valve, so I’m not sure why the bot is creating so many sessions…</li>
|
||||
<li>I just tested that user agent on CGSpace and it <em>does not</em> create a new session:</li>
|
||||
</ul>
|
||||
<li><p>I added <code>.*crawl.*</code> to the Tomcat Session Crawler Manager Valve, so I’m not sure why the bot is creating so many sessions…</p></li>
|
||||
|
||||
<li><p>I just tested that user agent on CGSpace and it <em>does not</em> create a new session:</p>
|
||||
|
||||
<pre><code>$ http --print Hh https://cgspace.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)'
|
||||
GET / HTTP/1.1
|
||||
@ -313,29 +307,31 @@ X-Cocoon-Version: 2.2.0
|
||||
X-Content-Type-Options: nosniff
|
||||
X-Frame-Options: SAMEORIGIN
|
||||
X-XSS-Protection: 1; mode=block
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I will have to keep an eye on it and perhaps add it to the list of “bad bots” that get rate limited</li>
|
||||
<li><p>I will have to keep an eye on it and perhaps add it to the list of “bad bots” that get rate limited</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-12">2018-09-12</h2>
|
||||
|
||||
<ul>
|
||||
<li>Merge AReS explorer changes to nginx config and deploy on CGSpace so CodeObia can start testing more</li>
|
||||
<li>Re-create my local Docker container for PostgreSQL data, but using a volume for the database data:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Re-create my local Docker container for PostgreSQL data, but using a volume for the database data:</p>
|
||||
|
||||
<pre><code>$ sudo docker volume create --name dspacetest_data
|
||||
$ sudo docker run --name dspacedb -v dspacetest_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Sisay is still having problems with the controlled vocabulary for top authors</li>
|
||||
<li>I took a look at the submission template and Firefox complains that the XML file is missing a root element</li>
|
||||
<li>I guess it’s because Firefox is receiving an empty XML file</li>
|
||||
<li>I told Sisay to run the XML file through tidy</li>
|
||||
<li>More testing of the access and usage rights changes</li>
|
||||
<li><p>Sisay is still having problems with the controlled vocabulary for top authors</p></li>
|
||||
|
||||
<li><p>I took a look at the submission template and Firefox complains that the XML file is missing a root element</p></li>
|
||||
|
||||
<li><p>I guess it’s because Firefox is receiving an empty XML file</p></li>
|
||||
|
||||
<li><p>I told Sisay to run the XML file through tidy</p></li>
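<li><p>Something like this should do it (the file name is just a placeholder for whichever controlled vocabulary file he is editing):</p>

<pre><code>$ tidy -xml -utf8 -i -q -m some-vocabulary.xml
</code></pre></li>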
|
||||
|
||||
<li><p>More testing of the access and usage rights changes</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-13">2018-09-13</h2>
|
||||
@ -347,53 +343,60 @@ $ sudo docker run --name dspacedb -v dspacetest_data:/var/lib/postgresql/data -e
|
||||
<li>The <code>dateStamp</code> is most probably only updated when the item’s metadata changes, not its mappings, so if Altmetric is relying on that we’re in a tricky spot</li>
|
||||
<li>We need to make sure that our OAI isn’t publicizing stale data… I was going to post something on the dspace-tech mailing list, but never did</li>
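<li><p>For reference, the datestamp for a single item can be checked straight from the OAI endpoint with a request like the following (the identifier prefix is from memory, so it may need adjusting):</p>

<pre><code>$ http 'https://cgspace.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:cgspace.cgiar.org:10568/97103'
</code></pre></li>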
|
||||
<li>Linode says that CGSpace (linode18) has had high CPU for the past two hours</li>
|
||||
<li>The top IP addresses today are:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The top IP addresses today are:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E "13/Sep/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
32 46.229.161.131
|
||||
38 104.198.9.108
|
||||
39 66.249.64.91
|
||||
56 157.55.39.224
|
||||
57 207.46.13.49
|
||||
58 40.77.167.120
|
||||
78 169.255.105.46
|
||||
702 54.214.112.202
|
||||
1840 50.116.102.77
|
||||
4469 70.32.83.92
|
||||
</code></pre>
|
||||
32 46.229.161.131
|
||||
38 104.198.9.108
|
||||
39 66.249.64.91
|
||||
56 157.55.39.224
|
||||
57 207.46.13.49
|
||||
58 40.77.167.120
|
||||
78 169.255.105.46
|
||||
702 54.214.112.202
|
||||
1840 50.116.102.77
|
||||
4469 70.32.83.92
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And the top two addresses seem to be re-using their Tomcat sessions properly:</li>
|
||||
</ul>
|
||||
<li><p>And the top two addresses seem to be re-using their Tomcat sessions properly:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=70.32.83.92' dspace.log.2018-09-13 | sort | uniq
|
||||
7
|
||||
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-13 | sort | uniq
|
||||
2
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So I’m not sure what’s going on</li>
|
||||
<li>Valerio asked me if there’s a way to get the page views and downloads from CGSpace</li>
|
||||
<li>I said no, but that we might be able to piggyback on the Atmire statlet REST API</li>
|
||||
<li>For example, when you expand the “statlet” at the bottom of an item like <a href="https://cgspace.cgiar.org/handle/10568/97103"><sup>10568</sup>⁄<sub>97103</sub></a> you can see the following request in the browser console:</li>
|
||||
</ul>
|
||||
<li><p>So I’m not sure what’s going on</p></li>
|
||||
|
||||
<li><p>Valerio asked me if there’s a way to get the page views and downloads from CGSpace</p></li>
|
||||
|
||||
<li><p>I said no, but that we might be able to piggyback on the Atmire statlet REST API</p></li>
|
||||
|
||||
<li><p>For example, when you expand the “statlet” at the bottom of an item like <a href="https://cgspace.cgiar.org/handle/10568/97103"><sup>10568</sup>⁄<sub>97103</sub></a> you can see the following request in the browser console:</p>
|
||||
|
||||
<pre><code>https://cgspace.cgiar.org/rest/statlets?handle=10568/97103
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>That JSON file has the total page views and item downloads for the item…</li>
|
||||
<li>Abenet forwarded a request by CIP that item thumbnails be included in RSS feeds</li>
|
||||
<li>I had a quick look at the DSpace 5.x manual and it doesn’t seem that this is possible (you can only add metadata)</li>
|
||||
<li>Testing the new LDAP server that CGNET says will be replacing the old one; it doesn’t seem that they are using the global catalog on port 3269 anymore, as now only 636 is open</li>
|
||||
<li>I did a clean deploy of DSpace 5.8 on Ubuntu 18.04 with some stripped down Tomcat 8 configuration and actually managed to get it up and running without the autowire errors that I had previously experienced</li>
|
||||
<li>I realized that it always works on my local machine with Tomcat 8.5.x, but not when I do the deployment from Ansible in Ubuntu 18.04</li>
|
||||
<li>So there must be something in my Tomcat 8 <code>server.xml</code> template</li>
|
||||
<li>Now I re-deployed it with the normal server template and it’s working, WTF?</li>
|
||||
<li>Must have been something like an old DSpace 5.5 file in the spring folder… weird</li>
|
||||
<li>But yay, this means we can update DSpace Test to Ubuntu 18.04, Tomcat 8, PostgreSQL 9.6, etc…</li>
|
||||
<li><p>That JSON file has the total page views and item downloads for the item…</p></li>
|
||||
|
||||
<li><p>Abenet forwarded a request by CIP that item thumbnails be included in RSS feeds</p></li>
|
||||
|
||||
<li><p>I had a quick look at the DSpace 5.x manual and it doesn’t seem that this is possible (you can only add metadata)</p></li>
|
||||
|
||||
<li><p>Testing the new LDAP server that CGNET says will be replacing the old one; it doesn’t seem that they are using the global catalog on port 3269 anymore, as now only 636 is open</p></li>
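<li><p>A quick way to verify that the new server really answers on 636 with a valid certificate is something like this (the hostname is just a placeholder):</p>

<pre><code>$ openssl s_client -connect ldap-server.example.org:636 -showcerts < /dev/null
</code></pre></li>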
|
||||
|
||||
<li><p>I did a clean deploy of DSpace 5.8 on Ubuntu 18.04 with some stripped down Tomcat 8 configuration and actually managed to get it up and running without the autowire errors that I had previously experienced</p></li>
|
||||
|
||||
<li><p>I realized that it always works on my local machine with Tomcat 8.5.x, but not when I do the deployment from Ansible in Ubuntu 18.04</p></li>
|
||||
|
||||
<li><p>So there must be something in my Tomcat 8 <code>server.xml</code> template</p></li>
|
||||
|
||||
<li><p>Now I re-deployed it with the normal server template and it’s working, WTF?</p></li>
|
||||
|
||||
<li><p>Must have been something like an old DSpace 5.5 file in the spring folder… weird</p></li>
|
||||
|
||||
<li><p>But yay, this means we can update DSpace Test to Ubuntu 18.04, Tomcat 8, PostgreSQL 9.6, etc…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-14">2018-09-14</h2>
|
||||
@ -440,51 +443,46 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
|
||||
<li>I want to explore creating a thin API to make the item view and download stats available from Solr so CodeObia can use them in the AReS explorer</li>
|
||||
<li>Currently CodeObia is exploring using the Atmire statlets internal API, but I don’t really like that…</li>
|
||||
<li>There are some example queries on the <a href="https://wiki.duraspace.org/display/DSPACE/Solr">DSpace Solr wiki</a></li>
|
||||
<li>For example, this query returns 1655 rows for item <a href="https://cgspace.cgiar.org/handle/10568/10630"><sup>10568</sup>⁄<sub>10630</sub></a>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>For example, this query returns 1655 rows for item <a href="https://cgspace.cgiar.org/handle/10568/10630"><sup>10568</sup>⁄<sub>10630</sub></a>:</p>
|
||||
|
||||
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The id in the Solr query is the item’s database id (get it from the REST API or something)</li>
|
||||
<li>Next, I adopted a query to get the downloads and it shows 889, which is similar to the number Atmire’s statlet shows, though the query logic here is confusing:</li>
|
||||
</ul>
|
||||
<li><p>The id in the Solr query is the item’s database id (get it from the REST API or something)</p></li>
|
||||
|
||||
<li><p>Next, I adopted a query to get the downloads and it shows 889, which is similar to the number Atmire’s statlet shows, though the query logic here is confusing:</p>
|
||||
|
||||
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=-(bundleName:[*+TO+*]-bundleName:ORIGINAL)&fq=-(statistics_type:[*+TO+*]+-statistics_type:view)'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>According to the <a href="https://wiki.apache.org/solr/SolrQuerySyntax">SolrQuerySyntax</a> page on the Apache wiki, the <code>[* TO *]</code> syntax just selects a range (in this case all values for a field)</li>
|
||||
<li>So it seems to be:
|
||||
<li><p>According to the <a href="https://wiki.apache.org/solr/SolrQuerySyntax">SolrQuerySyntax</a> page on the Apache wiki, the <code>[* TO *]</code> syntax just selects a range (in this case all values for a field)</p></li>
|
||||
|
||||
<li><p>So it seems to be:</p>
|
||||
|
||||
<ul>
|
||||
<li><code>type:0</code> is for bitstreams according to the DSpace Solr documentation</li>
|
||||
<li><code>-(bundleName:[*+TO+*]-bundleName:ORIGINAL)</code> seems to be a <a href="https://wiki.apache.org/solr/NegativeQueryProblems">negative query starting with all documents</a>, subtracting those with <code>bundleName:ORIGINAL</code>, and then negating the whole thing… meaning only documents from <code>bundleName:ORIGINAL</code>?</li>
|
||||
</ul></li>
|
||||
<li>What the shit, I think I’m right: the simplified logic in <em>this</em> query returns the same 889:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>What the shit, I think I’m right: the simplified logic in <em>this</em> query returns the same 889:</p>
|
||||
|
||||
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=bundleName:ORIGINAL&fq=-(statistics_type:[*+TO+*]+-statistics_type:view)'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And if I simplify the <code>statistics_type</code> logic the same way, it still returns the same 889!</li>
|
||||
</ul>
|
||||
<li><p>And if I simplify the <code>statistics_type</code> logic the same way, it still returns the same 889!</p>
|
||||
|
||||
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=bundleName:ORIGINAL&fq=statistics_type:view'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>As for item views, I suppose that’s just the same query, minus the <code>bundleName:ORIGINAL</code>:</li>
|
||||
</ul>
|
||||
<li><p>As for item views, I suppose that’s just the same query, minus the <code>bundleName:ORIGINAL</code>:</p>
|
||||
|
||||
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?indent=on&rows=0&q=type:0+owningItem:11576&fq=isBot:false&fq=-bundleName:ORIGINAL&fq=statistics_type:view'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>That one returns 766, which is exactly 1655 minus 889…</li>
|
||||
<li>Also, Solr’s <code>fq</code> is similar to the regular <code>q</code> query parameter, but it is considered for the Solr query cache so it should be faster for multiple queries</li>
|
||||
<li><p>That one returns 766, which is exactly 1655 minus 889…</p></li>
|
||||
|
||||
<li><p>Also, Solr’s <code>fq</code> is similar to the regular <code>q</code> query parameter, but its results are cached in Solr’s filter cache so it should be faster for repeated queries</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-18">2018-09-18</h2>
|
||||
@ -492,28 +490,27 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
|
||||
<ul>
|
||||
<li>I managed to create a simple proof of concept REST API to expose item view and download statistics: <a href="https://github.com/alanorth/cgspace-statistics-api">cgspace-statistics-api</a></li>
|
||||
<li>It uses the Python-based <a href="https://falcon.readthedocs.io">Falcon</a> web framework and talks to Solr directly using the <a href="https://github.com/moonlitesolutions/SolrClient">SolrClient</a> library (which seems to have issues in Python 3.7 currently)</li>
|
||||
<li>After deploying on DSpace Test I can then get the stats for an item using its ID:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>After deploying on DSpace Test I can then get the stats for an item using its ID:</p>
|
||||
|
||||
<pre><code>$ http -b 'https://dspacetest.cgiar.org/rest/statistics/item?id=110988'
|
||||
{
|
||||
"downloads": 2,
|
||||
"id": 110988,
|
||||
"views": 15
|
||||
"downloads": 2,
|
||||
"id": 110988,
|
||||
"views": 15
|
||||
}
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The numbers are different than those that come from Atmire’s statlets for some reason, but as I’m querying Solr directly, I have no idea where their numbers come from!</li>
|
||||
<li>Moayad from CodeObia asked if I could make the API be able to paginate over all items, for example: /statistics?limit=100&page=1</li>
|
||||
<li>Getting all the item IDs from PostgreSQL is certainly easy:</li>
|
||||
</ul>
|
||||
<li><p>The numbers are different than those that come from Atmire’s statlets for some reason, but as I’m querying Solr directly, I have no idea where their numbers come from!</p></li>
|
||||
|
||||
<li><p>Moayad from CodeObia asked if I could make the API be able to paginate over all items, for example: /statistics?limit=100&page=1</p></li>
|
||||
|
||||
<li><p>Getting all the item IDs from PostgreSQL is certainly easy:</p>
|
||||
|
||||
<pre><code>dspace=# select item_id from item where in_archive is True and withdrawn is False and discoverable is True;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The rest of the Falcon tooling will be more difficult…</li>
|
||||
<li><p>The rest of the Falcon tooling will be more difficult…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-19">2018-09-19</h2>
|
||||
@ -527,24 +524,24 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
|
||||
|
||||
<ul>
|
||||
<li>Contact Atmire to ask how we can buy more credits for future development</li>
|
||||
<li>I researched the Solr <code>filterCache</code> size and I found out that the formula for calculating the potential memory use of <strong>each entry</strong> in the cache is:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I researched the Solr <code>filterCache</code> size and I found out that the formula for calculating the potential memory use of <strong>each entry</strong> in the cache is:</p>
|
||||
|
||||
<pre><code>((maxDoc/8) + 128) * (size_defined_in_solrconfig.xml)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Which means that, for our statistics core with <em>149 million</em> documents, each entry in our <code>filterCache</code> would use 8.9 GB!</li>
|
||||
</ul>
|
||||
<li><p>Which means that, for our statistics core with <em>149 million</em> documents, each entry in our <code>filterCache</code> would use 8.9 GB!</p>
|
||||
|
||||
<pre><code>((149374568/8) + 128) * 512 = 9560037888 bytes (8.9 GB)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So I think we can forget about tuning this for now!</li>
|
||||
<li><a href="http://lucene.472066.n3.nabble.com/Calculating-filterCache-size-td4142526.html">Discussion on the mailing list about <code>filterCache</code> size</a></li>
|
||||
<li><a href="https://docs.google.com/document/d/1vl-nmlprSULvNZKQNrqp65eLnLhG9s_ydXQtg9iML10/edit">Article discussing testing methodology for different <code>filterCache</code> sizes</a></li>
|
||||
<li>Discuss Handle links on Twitter with IWMI</li>
|
||||
<li><p>So I think we can forget about tuning this for now!</p></li>
|
||||
|
||||
<li><p><a href="http://lucene.472066.n3.nabble.com/Calculating-filterCache-size-td4142526.html">Discussion on the mailing list about <code>filterCache</code> size</a></p></li>
|
||||
|
||||
<li><p><a href="https://docs.google.com/document/d/1vl-nmlprSULvNZKQNrqp65eLnLhG9s_ydXQtg9iML10/edit">Article discussing testing methodology for different <code>filterCache</code> sizes</a></p></li>
|
||||
|
||||
<li><p>Discuss Handle links on Twitter with IWMI</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-21">2018-09-21</h2>
|
||||
@ -577,8 +574,8 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
|
||||
|
||||
<ul>
|
||||
<li>Trying to figure out how to get item views and downloads from SQLite in a join</li>
|
||||
<li>It appears SQLite doesn’t support <code>FULL OUTER JOIN</code> so some people on StackOverflow have emulated it with <code>LEFT JOIN</code> and <code>UNION</code>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It appears SQLite doesn’t support <code>FULL OUTER JOIN</code> so some people on StackOverflow have emulated it with <code>LEFT JOIN</code> and <code>UNION</code>:</p>
|
||||
|
||||
<pre><code>> SELECT views.views, views.id, downloads.downloads, downloads.id FROM itemviews views
|
||||
LEFT JOIN itemdownloads downloads USING(id)
|
||||
@ -586,12 +583,11 @@ UNION ALL
|
||||
SELECT views.views, views.id, downloads.downloads, downloads.id FROM itemdownloads downloads
|
||||
LEFT JOIN itemviews views USING(id)
|
||||
WHERE views.id IS NULL;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This “works” but the resulting rows are kinda messy so I’d have to do extra logic in Python</li>
|
||||
<li>Maybe we can use one “items” table with defaults values and UPSERT (aka insert… on conflict … do update):</li>
|
||||
</ul>
|
||||
<li><p>This “works” but the resulting rows are kinda messy so I’d have to do extra logic in Python</p></li>
|
||||
|
||||
<li><p>Maybe we can use one “items” table with defaults values and UPSERT (aka insert… on conflict … do update):</p>
|
||||
|
||||
<pre><code>sqlite> CREATE TABLE items(id INT PRIMARY KEY, views INT DEFAULT 0, downloads INT DEFAULT 0);
|
||||
sqlite> INSERT INTO items(id, views) VALUES(0, 52);
|
||||
@ -600,29 +596,31 @@ sqlite> INSERT INTO items(id, downloads) VALUES(1, 176) ON CONFLICT(id) DO UP
|
||||
sqlite> INSERT INTO items(id, views) VALUES(0, 78) ON CONFLICT(id) DO UPDATE SET views=78;
|
||||
sqlite> INSERT INTO items(id, views) VALUES(0, 3) ON CONFLICT(id) DO UPDATE SET downloads=3;
|
||||
sqlite> INSERT INTO items(id, views) VALUES(0, 7) ON CONFLICT(id) DO UPDATE SET downloads=excluded.views;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This totally works!</li>
|
||||
<li>Note the special <code>excluded.views</code> form! See <a href="https://www.sqlite.org/lang_UPSERT.html">SQLite’s lang_UPSERT documentation</a></li>
|
||||
<li>Oh nice, I finally finished the Falcon API route to page through all the results using SQLite’s amazing <code>LIMIT</code> and <code>OFFSET</code> support</li>
|
||||
<li>But when I deployed it on my Ubuntu 16.04 environment I realized Ubuntu’s SQLite is old and doesn’t support <code>UPSERT</code>, so my indexing doesn’t work…</li>
|
||||
<li>Apparently <code>UPSERT</code> came in SQLite 3.24.0 (2018-06-04), and Ubuntu 16.04 has 3.11.0</li>
|
||||
<li>Ok, this is hilarious: I manually downloaded the <a href="https://packages.ubuntu.com/cosmic/libsqlite3-0">libsqlite3 3.24.0 deb from Ubuntu 18.10 “cosmic”</a> and installed it in Ubuntu 16.04 and now the Python <code>indexer.py</code> works</li>
|
||||
<li>This is definitely a dirty hack, but the list of packages we use that depend on <code>libsqlite3-0</code> in Ubuntu 16.04 are actually pretty few:</li>
|
||||
</ul>
|
||||
<li><p>This totally works!</p></li>
|
||||
|
||||
<li><p>Note the special <code>excluded.views</code> form! See <a href="https://www.sqlite.org/lang_UPSERT.html">SQLite’s lang_UPSERT documentation</a></p></li>
|
||||
|
||||
<li><p>Oh nice, I finally finished the Falcon API route to page through all the results using SQLite’s amazing <code>LIMIT</code> and <code>OFFSET</code> support</p></li>
|
||||
|
||||
<li><p>But when I deployed it on my Ubuntu 16.04 environment I realized Ubuntu’s SQLite is old and doesn’t support <code>UPSERT</code>, so my indexing doesn’t work…</p></li>
|
||||
|
||||
<li><p>Apparently <code>UPSERT</code> came in SQLite 3.24.0 (2018-06-04), and Ubuntu 16.04 has 3.11.0</p></li>
|
||||
|
||||
<li><p>Ok, this is hilarious: I manually downloaded the <a href="https://packages.ubuntu.com/cosmic/libsqlite3-0">libsqlite3 3.24.0 deb from Ubuntu 18.10 “cosmic”</a> and installed it in Ubuntu 16.04 and now the Python <code>indexer.py</code> works</p></li>
|
||||
|
||||
<li><p>This is definitely a dirty hack, but the packages we use that depend on <code>libsqlite3-0</code> in Ubuntu 16.04 are actually pretty few:</p>
|
||||
|
||||
<pre><code># apt-cache rdepends --installed libsqlite3-0 | sort | uniq
|
||||
gnupg2
|
||||
libkrb5-26-heimdal
|
||||
libnss3
|
||||
libpython2.7-stdlib
|
||||
libpython3.5-stdlib
|
||||
</code></pre>
|
||||
gnupg2
|
||||
libkrb5-26-heimdal
|
||||
libnss3
|
||||
libpython2.7-stdlib
|
||||
libpython3.5-stdlib
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I wonder if I could work around this by detecting the SQLite library version, for example on Ubuntu 16.04 after I replaced the library:</li>
|
||||
</ul>
|
||||
<li><p>I wonder if I could work around this by detecting the SQLite library version, for example on Ubuntu 16.04 after I replaced the library:</p>
|
||||
|
||||
<pre><code># python3
|
||||
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
|
||||
@ -631,20 +629,21 @@ Type "help", "copyright", "credits" or "licen
|
||||
>>> import sqlite3
|
||||
>>> print(sqlite3.sqlite_version)
|
||||
3.24.0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Or maybe I should just bite the bullet and migrate this to PostgreSQL, as it <a href="https://wiki.postgresql.org/wiki/UPSERT">supports <code>UPSERT</code> since version 9.5</a> and also seems to have my new favorite <code>LIMIT</code> and <code>OFFSET</code></li>
|
||||
<li>I changed the syntax of the SQLite stuff and PostgreSQL is working flawlessly with psycopg2… hmmm.</li>
|
||||
<li>For reference, creating a PostgreSQL database for testing this locally (though <code>indexer.py</code> will create the table):</li>
|
||||
</ul>
|
||||
<li><p>Or maybe I should just bite the bullet and migrate this to PostgreSQL, as it <a href="https://wiki.postgresql.org/wiki/UPSERT">supports <code>UPSERT</code> since version 9.5</a> and also seems to have my new favorite <code>LIMIT</code> and <code>OFFSET</code></p></li>
|
||||
|
||||
<li><p>I changed the syntax of the SQLite stuff and PostgreSQL is working flawlessly with psycopg2… hmmm.</p></li>
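<li><p>For comparison, the PostgreSQL flavor of the UPSERT I was doing in SQLite looks something like this (a sketch; note the <code>EXCLUDED</code> pseudo-table):</p>

<pre><code>dspacestatistics=> INSERT INTO items(id, views) VALUES(0, 52) ON CONFLICT(id) DO UPDATE SET views=EXCLUDED.views;
</code></pre></li>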
|
||||
|
||||
<li><p>For reference, creating a PostgreSQL database for testing this locally (though <code>indexer.py</code> will create the table):</p>
|
||||
|
||||
<pre><code>$ createdb -h localhost -U postgres -O dspacestatistics --encoding=UNICODE dspacestatistics
|
||||
$ createuser -h localhost -U postgres --pwprompt dspacestatistics
|
||||
$ psql -h localhost -U postgres dspacestatistics
|
||||
dspacestatistics=> CREATE TABLE IF NOT EXISTS items
|
||||
dspacestatistics-> (id INT PRIMARY KEY, views INT DEFAULT 0, downloads INT DEFAULT 0)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-25">2018-09-25</h2>
|
||||
|
||||
@ -656,55 +655,66 @@ dspacestatistics-> (id INT PRIMARY KEY, views INT DEFAULT 0, downloads INT DE
|
||||
<li>I want to purge the bot hits from the Solr statistics core, as I am now realizing that I don’t give a shit about tens of millions of hits by Google and Bing indexing my shit every day (at least not in Solr!)</li>
|
||||
<li>CGSpace’s Solr core has 150,000,000 documents in it… and it’s still pretty fast to query, but it’s really a maintenance and backup burden</li>
|
||||
<li>DSpace Test currently has about 2,000,000 documents with <code>isBot:true</code> in its Solr statistics core, and the size on disk is 2GB (it’s not much, but I have to test this somewhere!)</li>
|
||||
<li>According to the <a href="https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance">DSpace 5.x Solr documentation</a> I can use <code>dspace stats-util -f</code>, so let’s try it:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>According to the <a href="https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance">DSpace 5.x Solr documentation</a> I can use <code>dspace stats-util -f</code>, so let’s try it:</p>
|
||||
|
||||
<pre><code>$ dspace stats-util -f
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The command comes back after a few seconds and I still see 2,000,000 documents in the statistics core with <code>isBot:true</code></li>
|
||||
<li>I was just writing a message to the dspace-tech mailing list and then I decided to check the number of bot view events on DSpace Test again, and now it’s 201 instead of 2,000,000, and statistics core is only 30MB now!</li>
|
||||
<li>I will set the <code>logBots = false</code> property in <code>dspace/config/modules/usage-statistics.cfg</code> on DSpace Test and check if the number of <code>isBot:true</code> events goes up any more…</li>
|
||||
<li>I restarted the server with <code>logBots = false</code> and after it came back up I see 266 events with <code>isBots:true</code> (maybe they were buffered)… I will check again tomorrow</li>
|
||||
<li>After a few hours I see there are still only 266 view events with <code>isBot:true</code> on DSpace Test’s Solr statistics core, so I’m definitely going to deploy this on CGSpace soon</li>
|
||||
<li>Also, CGSpace currently has 60,089,394 view events with <code>isBot:true</code> in it’s Solr statistics core and it is 124GB!</li>
|
||||
<li>Amazing! After running <code>dspace stats-util -f</code> on CGSpace the Solr statistics core went from 124GB to 60GB, and now there are only 700 events with <code>isBot:true</code> so I should really disable logging of bot events!</li>
|
||||
<li>I’m super curious to see how the JVM heap usage changes…</li>
|
||||
<li>I made (and merged) a pull request to disable bot logging on the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/387">#387</a>)</li>
|
||||
<li>Now I’m wondering if there are other bot requests that aren’t classified as bots because the IP lists or user agents are outdated</li>
|
||||
<li>DSpace ships a list of spider IPs, for example: <code>config/spiders/iplists.com-google.txt</code></li>
|
||||
<li>I checked the list against all the IPs we’ve seen using the “Googlebot” useragent on CGSpace’s nginx access logs</li>
|
||||
<li>The first thing I learned is that shit tons of IPs in Russia, Ukraine, Ireland, Brazil, Portugal, the US, Canada, etc are pretending to be “Googlebot”…</li>
|
||||
<li>According to the <a href="https://support.google.com/webmasters/answer/80553">Googlebot FAQ</a> the domain name in the reverse DNS lookup should contain either <code>googlebot.com</code> or <code>google.com</code></li>
|
||||
<li>In Solr this appears to be an appropriate query that I can maybe use later (returns 81,000 documents):</li>
|
||||
</ul>
|
||||
<li><p>The command comes back after a few seconds and I still see 2,000,000 documents in the statistics core with <code>isBot:true</code></p></li>
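<li><p>A quick way to count these is a query against the statistics core with <code>rows=0</code>, for example (assuming Solr is listening locally on port 8081; adjust the port as needed):</p>

<pre><code>$ http 'http://localhost:8081/solr/statistics/select?q=isBot:true&rows=0&wt=json&indent=true' | grep numFound
</code></pre></li>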
|
||||
|
||||
<li><p>I was just writing a message to the dspace-tech mailing list and then I decided to check the number of bot view events on DSpace Test again, and now it’s 201 instead of 2,000,000, and the statistics core is only 30MB now!</p></li>
|
||||
|
||||
<li><p>I will set the <code>logBots = false</code> property in <code>dspace/config/modules/usage-statistics.cfg</code> on DSpace Test and check if the number of <code>isBot:true</code> events goes up any more…</p></li>
|
||||
|
||||
<li><p>I restarted the server with <code>logBots = false</code> and after it came back up I see 266 events with <code>isBot:true</code> (maybe they were buffered)… I will check again tomorrow</p></li>
|
||||
|
||||
<li><p>After a few hours I see there are still only 266 view events with <code>isBot:true</code> on DSpace Test’s Solr statistics core, so I’m definitely going to deploy this on CGSpace soon</p></li>
|
||||
|
||||
<li><p>Also, CGSpace currently has 60,089,394 view events with <code>isBot:true</code> in its Solr statistics core and it is 124GB!</p></li>
|
||||
|
||||
<li><p>Amazing! After running <code>dspace stats-util -f</code> on CGSpace the Solr statistics core went from 124GB to 60GB, and now there are only 700 events with <code>isBot:true</code> so I should really disable logging of bot events!</p></li>
|
||||
|
||||
<li><p>I’m super curious to see how the JVM heap usage changes…</p></li>
|
||||
|
||||
<li><p>I made (and merged) a pull request to disable bot logging on the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/387">#387</a>)</p></li>
|
||||
|
||||
<li><p>Now I’m wondering if there are other bot requests that aren’t classified as bots because the IP lists or user agents are outdated</p></li>
|
||||
|
||||
<li><p>DSpace ships a list of spider IPs, for example: <code>config/spiders/iplists.com-google.txt</code></p></li>
|
||||
|
||||
<li><p>I checked the list against all the IPs we’ve seen using the “Googlebot” useragent on CGSpace’s nginx access logs</p></li>
|
||||
|
||||
<li><p>The first thing I learned is that shit tons of IPs in Russia, Ukraine, Ireland, Brazil, Portugal, the US, Canada, etc are pretending to be “Googlebot”…</p></li>
|
||||
|
||||
<li><p>According to the <a href="https://support.google.com/webmasters/answer/80553">Googlebot FAQ</a> the domain name in the reverse DNS lookup should contain either <code>googlebot.com</code> or <code>google.com</code></p></li>
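<li><p>A quick and dirty way to check which of those IPs actually resolve back to Google is to pull them out of the nginx logs and run them through <code>host</code>, something like this sketch:</p>

<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep Googlebot | awk '{print $1}' | sort -u > /tmp/googlebot-ips.txt
# while read -r ip; do host "$ip"; done < /tmp/googlebot-ips.txt | grep -v -E '(googlebot|google)\.com'
</code></pre></li>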
|
||||
|
||||
<li><p>In Solr this appears to be an appropriate query that I can maybe use later (returns 81,000 documents):</p>
|
||||
|
||||
<pre><code>*:* AND (dns:*googlebot.com. OR dns:*google.com.) AND isBot:false
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I translate that into a delete command using the <code>/update</code> handler:</li>
|
||||
</ul>
|
||||
<li><p>I translate that into a delete command using the <code>/update</code> handler:</p>
|
||||
|
||||
<pre><code>http://localhost:8081/solr/statistics/update?commit=true&stream.body=<delete><query>*:*+AND+(dns:*googlebot.com.+OR+dns:*google.com.)+AND+isBot:false</query></delete>
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And magically all those 81,000 documents are gone!</li>
|
||||
<li>After a few hours the Solr statistics core is down to 44GB on CGSpace!</li>
|
||||
<li>I did a <em>major</em> refactor and logic fix in the DSpace Statistics API’s <code>indexer.py</code></li>
|
||||
<li>Basically, it turns out that using <code>facet.mincount=1</code> is really beneficial for me because it reduces the size of the Solr result set, reduces the amount of data we need to ingest into PostgreSQL, and the API returns HTTP 404 Not Found for items without views or downloads anyways</li>
|
||||
<li>I deployed the new version on CGSpace and now it looks pretty good!</li>
|
||||
</ul>
|
||||
<li><p>And magically all those 81,000 documents are gone!</p></li>
|
||||
|
||||
<li><p>After a few hours the Solr statistics core is down to 44GB on CGSpace!</p></li>
|
||||
|
||||
<li><p>I did a <em>major</em> refactor and logic fix in the DSpace Statistics API’s <code>indexer.py</code></p></li>
|
||||
|
||||
<li><p>Basically, it turns out that using <code>facet.mincount=1</code> is really beneficial for me because it reduces the size of the Solr result set, reduces the amount of data we need to ingest into PostgreSQL, and the API returns HTTP 404 Not Found for items without views or downloads anyways</p></li>
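<li><p>To illustrate the idea (this is not necessarily the exact query <code>indexer.py</code> sends, and <code>type:2</code> should be items if I remember the DSpace constants correctly), faceting views by <code>id</code> with <code>facet.mincount=1</code> only returns buckets for items that actually have hits:</p>

<pre><code>$ http 'http://localhost:8081/solr/statistics/select?q=type:2&fq=isBot:false&fq=statistics_type:view&rows=0&facet=true&facet.field=id&facet.mincount=1&facet.limit=10&wt=json&indent=true'
</code></pre></li>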
<li><p>I deployed the new version on CGSpace and now it looks pretty good!</p>
|
||||
|
||||
<pre><code>Indexing item views (page 28 of 753)
|
||||
...
|
||||
Indexing item downloads (page 260 of 260)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And now it’s fast as hell due to the muuuuch smaller Solr statistics core</li>
|
||||
<li><p>And now it’s fast as hell due to the muuuuch smaller Solr statistics core</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-26">2018-09-26</h2>
|
||||
@ -720,68 +730,71 @@ Indexing item downloads (page 260 of 260)
|
||||
|
||||
<ul>
|
||||
<li>I will have to keep an eye on that over the next few weeks to see if things stay as they are</li>
|
||||
<li>I did a batch replacement of the access rights with my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a> script on DSpace Test:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I did a batch replacement of the access rights with my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a> script on DSpace Test:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/fix-access-status.csv -db dspace -u dspace -p 'fuuu' -f cg.identifier.status -t correct -m 206
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This changes “Open Access” to “Unrestricted Access” and “Limited Access” to “Restricted Access”</li>
|
||||
<li>After that I did a full Discovery reindex:</li>
|
||||
</ul>
|
||||
<li><p>This changes “Open Access” to “Unrestricted Access” and “Limited Access” to “Restricted Access”</p></li>
|
||||
|
||||
<li><p>After that I did a full Discovery reindex:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
|
||||
real 77m3.755s
|
||||
user 7m39.785s
|
||||
sys 2m18.485s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I told Peter it’s better to do the access rights before the usage rights because the git branches are conflicting with each other and it’s actually a pain in the ass to keep changing the values as we discuss, rebase, merge, fix conflicts…</li>
|
||||
<li>Udana and Mia from WLE were asking some questions about their <a href="https://feeds.feedburner.com/WLEcgspace">WLE Feedburner feed</a></li>
|
||||
<li>It’s pretty confusing, because until recently they were entering issue dates as only YYYY (like 2018) and their feeds were all showing items in the wrong order</li>
|
||||
<li>I’m not exactly sure what their problem now is, though (confusing)</li>
|
||||
<li>I updated the dspace-statistics-api to use psycopg2’s <code>execute_values()</code> to insert batches of 100 values into PostgreSQL instead of doing every insert individually</li>
|
||||
<li>On CGSpace this reduces the total run time of <code>indexer.py</code> from 432 seconds to 400 seconds (most of the time is actually spent in getting the data from Solr though)</li>
|
||||
<li><p>I told Peter it’s better to do the access rights before the usage rights because the git branches are conflicting with each other and it’s actually a pain in the ass to keep changing the values as we discuss, rebase, merge, fix conflicts…</p></li>
|
||||
|
||||
<li><p>Udana and Mia from WLE were asking some questions about their <a href="https://feeds.feedburner.com/WLEcgspace">WLE Feedburner feed</a></p></li>
|
||||
|
||||
<li><p>It’s pretty confusing, because until recently they were entering issue dates as only YYYY (like 2018) and their feeds were all showing items in the wrong order</p></li>
|
||||
|
||||
<li><p>I’m not exactly sure what their problem now is, though (confusing)</p></li>
|
||||
|
||||
<li><p>I updated the dspace-statistics-api to use psycopg2’s <code>execute_values()</code> to insert batches of 100 values into PostgreSQL instead of doing every insert individually</p></li>
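<li><p>The change boils down to this pattern (a minimal sketch; the table and column names are illustrative, not the API’s real schema):</p>

<pre><code>import psycopg2.extras

# a batch of (item id, views) tuples collected from Solr
views = [('10568/1', 100), ('10568/2', 42)]

connection = psycopg2.connect('dbname=dspacestatistics user=dspacestatistics')
with connection.cursor() as cursor:
    # one multi-row INSERT per 100 values instead of one INSERT per value
    psycopg2.extras.execute_values(cursor,
        'INSERT INTO items (id, views) VALUES %s',
        views, page_size=100)
connection.commit()
</code></pre></li>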
<li><p>On CGSpace this reduces the total run time of <code>indexer.py</code> from 432 seconds to 400 seconds (most of the time is actually spent in getting the data from Solr though)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-27">2018-09-27</h2>
|
||||
|
||||
<ul>
|
||||
<li>Linode emailed to say that CGSpace’s (linode19) CPU load was high for a few hours last night</li>
|
||||
<li>Looking in the nginx logs around that time I see some new IPs that look like they are harvesting things:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Looking in the nginx logs around that time I see some new IPs that look like they are harvesting things:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "26/Sep/2018:(19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
295 34.218.226.147
|
||||
296 66.249.64.95
|
||||
350 157.55.39.185
|
||||
359 207.46.13.28
|
||||
371 157.55.39.85
|
||||
388 40.77.167.148
|
||||
444 66.249.64.93
|
||||
544 68.6.87.12
|
||||
834 66.249.64.91
|
||||
902 35.237.175.180
|
||||
</code></pre>
|
||||
295 34.218.226.147
|
||||
296 66.249.64.95
|
||||
350 157.55.39.185
|
||||
359 207.46.13.28
|
||||
371 157.55.39.85
|
||||
388 40.77.167.148
|
||||
444 66.249.64.93
|
||||
544 68.6.87.12
|
||||
834 66.249.64.91
|
||||
902 35.237.175.180
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>35.237.175.180</code> is on Google Cloud</li>
|
||||
<li><code>68.6.87.12</code> is on Cox Communications in the US (?)</li>
|
||||
<li>These hosts are not using proper user agents and are not re-using their Tomcat sessions:</li>
|
||||
</ul>
|
||||
<li><p><code>35.237.175.180</code> is on Google Cloud</p></li>
|
||||
|
||||
<li><p><code>68.6.87.12</code> is on Cox Communications in the US (?)</p></li>
|
||||
|
||||
<li><p>These hosts are not using proper user agents and are not re-using their Tomcat sessions:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-09-26 | sort | uniq
|
||||
5423
|
||||
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=68.6.87.12' dspace.log.2018-09-26 | sort | uniq
|
||||
758
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I will add their IPs to the list of bad bots in nginx so we can add a “bot” user agent to them and let Tomcat’s Crawler Session Manager Valve handle them</li>
|
||||
<li>I asked Atmire to prepare an invoice for 125 credits</li>
|
||||
<li><p>I will add their IPs to the list of bad bots in nginx so we can add a “bot” user agent to them and let Tomcat’s Crawler Session Manager Valve handle them</p></li>
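<li><p>On the nginx side this is just a <code>map</code> on the client address that overrides the user agent we pass to Tomcat, something like this sketch (the variable names are not copied from our templates):</p>

<pre><code>map $remote_addr $ua {
    # pretend these hosts are a bot so Tomcat's Crawler Session Manager Valve can handle them
    35.237.175.180    'bot';
    68.6.87.12        'bot';
    default           $http_user_agent;
}

# ...and in the location block that proxies to Tomcat:
proxy_set_header User-Agent $ua;
</code></pre></li>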
<li><p>I asked Atmire to prepare an invoice for 125 credits</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-29">2018-09-29</h2>
|
||||
@ -789,90 +802,80 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=68.6.87.12' dspace.log.2018-09-26
|
||||
<ul>
|
||||
<li>I merged some changes to author affiliations from Sisay as well as some corrections to organizational names using smart quotes like <code>Université d’Abomey Calavi</code> (<a href="https://github.com/ilri/DSpace/pull/388">#388</a>)</li>
|
||||
<li>Peter sent me a list of 43 author names to fix, but it had some encoding errors like <code>Belalcázar, John</code> like usual (I will tell him to stop trying to export as UTF-8 because it never seems to work)</li>
|
||||
<li>I did batch replaces for both on CGSpace with my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a> script:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I did batch replaces for both on CGSpace with my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a> script:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i 2018-09-29-fix-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t correct -m 211
|
||||
$ ./fix-metadata-values.py -i 2018-09-29-fix-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Afterwards I started a full Discovery re-index:</li>
|
||||
</ul>
|
||||
<li><p>Afterwards I started a full Discovery re-index:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Linode sent an alert that both CGSpace and DSpace Test were using high CPU for the last two hours</li>
|
||||
<li>It seems to be Moayad trying to do the AReS explorer indexing</li>
|
||||
<li>He was sending too many (5 or 10) concurrent requests to the server, but still… why is this shit so slow?!</li>
|
||||
<li><p>Linode sent an alert that both CGSpace and DSpace Test were using high CPU for the last two hours</p></li>
|
||||
|
||||
<li><p>It seems to be Moayad trying to do the AReS explorer indexing</p></li>
|
||||
|
||||
<li><p>He was sending too many (5 or 10) concurrent requests to the server, but still… why is this shit so slow?!</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-30">2018-09-30</h2>
|
||||
|
||||
<ul>
|
||||
<li>Valerio keeps sending items on CGSpace that have weird or incorrect languages, authors, etc</li>
|
||||
<li>I think I should just batch export and update all languages…</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I think I should just batch export and update all languages…</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2018-09-30-languages.csv with csv;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then I can simply delete the “Other” and “other” ones because that’s not useful at all:</li>
|
||||
</ul>
|
||||
<li><p>Then I can simply delete the “Other” and “other” ones because that’s not useful at all:</p>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='Other';
|
||||
DELETE 6
|
||||
dspace=# DELETE FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='other';
|
||||
DELETE 79
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looking through the list I see some weird language codes like <code>gh</code>, so I checked out those items:</li>
|
||||
</ul>
|
||||
<li><p>Looking through the list I see some weird language codes like <code>gh</code>, so I checked out those items:</p>
|
||||
|
||||
<pre><code>dspace=# SELECT resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='gh';
|
||||
resource_id
|
||||
resource_id
|
||||
-------------
|
||||
94530
|
||||
94529
|
||||
94530
|
||||
94529
|
||||
dspace=# SELECT handle,item_id FROM item, handle WHERE handle.resource_type_id=2 AND handle.resource_id = item.item_id AND handle.resource_id in (94530, 94529);
|
||||
handle | item_id
|
||||
handle | item_id
|
||||
-------------+---------
|
||||
10568/91386 | 94529
|
||||
10568/91387 | 94530
|
||||
</code></pre>
|
||||
10568/91386 | 94529
|
||||
10568/91387 | 94530
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Those items are from Ghana, so the submitter apparently thought <code>gh</code> was a language… I can safely delete them:</li>
|
||||
</ul>
|
||||
<li><p>Those items are from Ghana, so the submitter apparently thought <code>gh</code> was a language… I can safely delete them:</p>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='gh';
|
||||
DELETE 2
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The next issue would be <code>jn</code>:</li>
|
||||
</ul>
|
||||
<li><p>The next issue would be <code>jn</code>:</p>
|
||||
|
||||
<pre><code>dspace=# SELECT resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='jn';
|
||||
resource_id
|
||||
resource_id
|
||||
-------------
|
||||
94001
|
||||
94003
|
||||
94001
|
||||
94003
|
||||
dspace=# SELECT handle,item_id FROM item, handle WHERE handle.resource_type_id=2 AND handle.resource_id = item.item_id AND handle.resource_id in (94001, 94003);
|
||||
handle | item_id
|
||||
handle | item_id
|
||||
-------------+---------
|
||||
10568/90868 | 94001
|
||||
10568/90870 | 94003
|
||||
</code></pre>
|
||||
10568/90868 | 94001
|
||||
10568/90870 | 94003
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Those items are about Japan, so I will update them to be <code>ja</code></li>
|
||||
<li>Other replacements:</li>
|
||||
</ul>
|
||||
<li><p>Those items are about Japan, so I will update them to be <code>ja</code></p></li>
|
||||
|
||||
<li><p>Other replacements:</p>
|
||||
|
||||
<pre><code>DELETE FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='gh';
|
||||
UPDATE metadatavalue SET text_value='fr' WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='fn';
|
||||
@ -880,10 +883,9 @@ UPDATE metadatavalue SET text_value='hi' WHERE resource_type_id=2 AND metadata_f
|
||||
UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='Ja';
|
||||
UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='jn';
|
||||
UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='jp';
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then there are 12 items with <code>en|hi</code>, but they were all in one collection so I just exported it as a CSV and then re-imported the corrected metadata</li>
|
||||
<li><p>Then there are 12 items with <code>en|hi</code>, but they were all in one collection so I just exported it as a CSV and then re-imported the corrected metadata</p></li>
|
||||
</ul>
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
@ -25,7 +25,7 @@ I created a GitHub issue to track this #389, because I’m super busy in Nai
|
||||
Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
|
||||
I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -114,106 +114,93 @@ I created a GitHub issue to track this #389, because I’m super busy in Nai
|
||||
<h2 id="2018-10-03">2018-10-03</h2>
|
||||
|
||||
<ul>
|
||||
<li>I see Moayad was busy collecting item views and downloads from CGSpace yesterday:</li>
|
||||
</ul>
|
||||
<li><p>I see Moayad was busy collecting item views and downloads from CGSpace yesterday:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Oct/2018" | awk '{print $1}
|
||||
' | sort | uniq -c | sort -n | tail -n 10
|
||||
933 40.77.167.90
|
||||
971 95.108.181.88
|
||||
1043 41.204.190.40
|
||||
1454 157.55.39.54
|
||||
1538 207.46.13.69
|
||||
1719 66.249.64.61
|
||||
2048 50.116.102.77
|
||||
4639 66.249.64.59
|
||||
4736 35.237.175.180
|
||||
150362 34.218.226.147
|
||||
</code></pre>
|
||||
933 40.77.167.90
|
||||
971 95.108.181.88
|
||||
1043 41.204.190.40
|
||||
1454 157.55.39.54
|
||||
1538 207.46.13.69
|
||||
1719 66.249.64.61
|
||||
2048 50.116.102.77
|
||||
4639 66.249.64.59
|
||||
4736 35.237.175.180
|
||||
150362 34.218.226.147
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Of those, about 20% were HTTP 500 responses (!):</li>
|
||||
</ul>
|
||||
<li><p>Of those, about 20% were HTTP 500 responses (!):</p>
|
||||
|
||||
<pre><code>$ zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Oct/2018" | grep 34.218.226.147 | awk '{print $9}' | sort -n | uniq -c
|
||||
118927 200
|
||||
31435 500
|
||||
</code></pre>
|
||||
118927 200
|
||||
31435 500
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I added Phil Thornton and Sonal Henson’s ORCID identifiers to the controlled vocabulary for <code>cg.creator.orcid</code> and then re-generated the names using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
|
||||
</ul>
|
||||
<li><p>I added Phil Thornton and Sonal Henson’s ORCID identifiers to the controlled vocabulary for <code>cg.creator.orcid</code> and then re-generated the names using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</p>
|
||||
|
||||
<pre><code>$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml | sort | uniq > 2018-10-03-orcids.txt
|
||||
$ ./resolve-orcids.py -i 2018-10-03-orcids.txt -o 2018-10-03-names.txt -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I found a new corner case error that I need to check, given <em>and</em> family names deactivated:</li>
|
||||
</ul>
|
||||
<li><p>I found a new corner case error that I need to check, given <em>and</em> family names deactivated:</p>
|
||||
|
||||
<pre><code>Looking up the names associated with ORCID iD: 0000-0001-7930-5752
|
||||
Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It appears to be Jim Lorenzen… I need to check that later!</li>
|
||||
<li>I merged the changes to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/390">#390</a>)</li>
|
||||
<li>Linode sent another alert about CPU usage on CGSpace (linode18) this evening</li>
|
||||
<li>It seems that Moayad is making quite a lot of requests today:</li>
|
||||
</ul>
|
||||
<li><p>It appears to be Jim Lorenzen… I need to check that later!</p></li>
|
||||
|
||||
<li><p>I merged the changes to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/390">#390</a>)</p></li>
|
||||
|
||||
<li><p>Linode sent another alert about CPU usage on CGSpace (linode18) this evening</p></li>
|
||||
|
||||
<li><p>It seems that Moayad is making quite a lot of requests today:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
1594 157.55.39.160
|
||||
1627 157.55.39.173
|
||||
1774 136.243.6.84
|
||||
4228 35.237.175.180
|
||||
4497 70.32.83.92
|
||||
4856 66.249.64.59
|
||||
7120 50.116.102.77
|
||||
12518 138.201.49.199
|
||||
87646 34.218.226.147
|
||||
111729 213.139.53.62
|
||||
</code></pre>
|
||||
1594 157.55.39.160
|
||||
1627 157.55.39.173
|
||||
1774 136.243.6.84
|
||||
4228 35.237.175.180
|
||||
4497 70.32.83.92
|
||||
4856 66.249.64.59
|
||||
7120 50.116.102.77
|
||||
12518 138.201.49.199
|
||||
87646 34.218.226.147
|
||||
111729 213.139.53.62
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But in super positive news, he says they are using my new <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> and it’s MUCH faster than using Atmire CUA’s internal “restlet” API</li>
|
||||
<li>I don’t recognize the <code>138.201.49.199</code> IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:</li>
|
||||
</ul>
|
||||
<li><p>But in super positive news, he says they are using my new <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> and it’s MUCH faster than using Atmire CUA’s internal “restlet” API</p></li>
|
||||
|
||||
<li><p>I don’t recognize the <code>138.201.49.199</code> IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:</p>
|
||||
|
||||
<pre><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /[a-z]+' | sort | uniq -c
|
||||
8324 GET /bitstream
|
||||
4193 GET /handle
|
||||
</code></pre>
|
||||
8324 GET /bitstream
|
||||
4193 GET /handle
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Suspiciously, it’s only grabbing the CGIAR System Office community (handle prefix 10947):</li>
|
||||
</ul>
|
||||
<li><p>Suspiciously, it’s only grabbing the CGIAR System Office community (handle prefix 10947):</p>
|
||||
|
||||
<pre><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /handle/[0-9]{5}' | sort | uniq -c
|
||||
7 GET /handle/10568
|
||||
4186 GET /handle/10947
|
||||
</code></pre>
|
||||
7 GET /handle/10568
|
||||
4186 GET /handle/10947
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The user agent is suspicious too:</li>
|
||||
</ul>
|
||||
<li><p>The user agent is suspicious too:</p>
|
||||
|
||||
<pre><code>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It’s clearly a bot and it’s not re-using its Tomcat session, so I will add its IP to the nginx bad bot list</li>
|
||||
<li>I looked in Solr’s statistics core and these hits were actually all counted as <code>isBot:false</code> (of course)… hmmm</li>
|
||||
<li>I tagged all of Sonal and Phil’s items with their ORCID identifiers on CGSpace using my <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers.py</a> script:</li>
|
||||
</ul>
|
||||
<li><p>It’s clearly a bot and it’s not re-using its Tomcat session, so I will add its IP to the nginx bad bot list</p></li>
|
||||
|
||||
<li><p>I looked in Solr’s statistics core and these hits were actually all counted as <code>isBot:false</code> (of course)… hmmm</p></li>
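<li><p>A query like this in the statistics core is enough to check, assuming the <code>ip</code> field is populated for those hits:</p>

<pre><code>ip:35.237.175.180 AND isBot:false
</code></pre></li>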
<li><p>I tagged all of Sonal and Phil’s items with their ORCID identifiers on CGSpace using my <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers.py</a> script:</p>
|
||||
|
||||
<pre><code>$ ./add-orcid-identifiers-csv.py -i 2018-10-03-add-orcids.csv -db dspace -u dspace -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Where <code>2018-10-03-add-orcids.csv</code> contained:</li>
|
||||
</ul>
|
||||
<li><p>Where <code>2018-10-03-add-orcids.csv</code> contained:</p>
|
||||
|
||||
<pre><code>dc.contributor.author,cg.creator.id
|
||||
"Henson, Sonal P.",Sonal Henson: 0000-0002-2002-5462
|
||||
@ -224,7 +211,8 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
|
||||
"Thornton, Philip K.",Philip Thornton: 0000-0002-1854-0182
|
||||
"Thornton, Phillip",Philip Thornton: 0000-0002-1854-0182
|
||||
"Thornton, Phillip K.",Philip Thornton: 0000-0002-1854-0182
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-04">2018-10-04</h2>
|
||||
|
||||
@ -239,16 +227,16 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
|
||||
<li>I see there are other bundles we might need to pay attention to: <code>TEXT</code>, <code>@_LOGO-COLLECTION_@</code>, <code>@_LOGO-COMMUNITY_@</code>, etc…</li>
|
||||
<li>On a hunch I dropped the statistics table and re-indexed and now those two items above have no downloads</li>
|
||||
<li>So it’s fixed, but I’m not sure why!</li>
|
||||
<li>Peter wants to know the number of API requests per month, which was about 250,000 in September (excluding statlet requests):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Peter wants to know the number of API requests per month, which was about 250,000 in September (excluding statlet requests):</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/{oai,rest}.log* | grep -E 'Sep/2018' | grep -c -v 'statlets'
|
||||
251226
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I found a logic error in the dspace-statistics-api <code>indexer.py</code> script that was causing item views to be inserted into downloads</li>
|
||||
<li>I tagged version 0.4.2 of the tool and redeployed it on CGSpace</li>
|
||||
<li><p>I found a logic error in the dspace-statistics-api <code>indexer.py</code> script that was causing item views to be inserted into downloads</p></li>
|
||||
|
||||
<li><p>I tagged version 0.4.2 of the tool and redeployed it on CGSpace</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-05">2018-10-05</h2>
|
||||
@ -278,46 +266,49 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
|
||||
|
||||
<ul>
|
||||
<li>Peter noticed that some recently added PDFs don’t have thumbnails</li>
|
||||
<li>When I tried to force them to be generated I got an error that I’ve never seen before:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>When I tried to force them to be generated I got an error that I’ve never seen before:</p>
|
||||
|
||||
<pre><code>$ dspace filter-media -v -f -i 10568/97613
|
||||
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: not authorized `/tmp/impdfthumb5039464037201498062.pdf' @ error/constitute.c/ReadImage/412.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I see there was an update to Ubuntu’s ImageMagick on 2018-10-05, so maybe something changed or broke?</li>
|
||||
<li>I get the same error when forcing <code>filter-media</code> to run on DSpace Test too, so it’s gotta be an ImageMagick bug</li>
|
||||
<li>The ImageMagick version is currently 8:6.8.9.9-7ubuntu5.13, and there is an <a href="https://usn.ubuntu.com/3785-1/">Ubuntu Security Notice from 2018-10-04</a></li>
|
||||
<li>Wow, someone on <a href="https://twitter.com/rosscampbell/status/1048268966819319808">Twitter posted about this breaking his web application</a> (and it was retweeted by the ImageMagick account!)</li>
|
||||
<li>I commented out the line that disables PDF thumbnails in <code>/etc/ImageMagick-6/policy.xml</code>:</li>
|
||||
</ul>
|
||||
<li><p>I see there was an update to Ubuntu’s ImageMagick on 2018-10-05, so maybe something changed or broke?</p></li>
|
||||
|
||||
<pre><code> <!--<policy domain="coder" rights="none" pattern="PDF" />-->
|
||||
</code></pre>
|
||||
<li><p>I get the same error when forcing <code>filter-media</code> to run on DSpace Test too, so it’s gotta be an ImageMagick bug</p></li>
|
||||
|
||||
<ul>
|
||||
<li>This works, but I’m not sure what ImageMagick’s long-term plan is if they are going to disable ALL image formats…</li>
|
||||
<li>I suppose I need to enable a workaround for this in Ansible?</li>
|
||||
<li><p>The ImageMagick version is currently 8:6.8.9.9-7ubuntu5.13, and there is an <a href="https://usn.ubuntu.com/3785-1/">Ubuntu Security Notice from 2018-10-04</a></p></li>
|
||||
|
||||
<li><p>Wow, someone on <a href="https://twitter.com/rosscampbell/status/1048268966819319808">Twitter posted about this breaking his web application</a> (and it was retweeted by the ImageMagick account!)</p></li>
|
||||
|
||||
<li><p>I commented out the line that disables PDF thumbnails in <code>/etc/ImageMagick-6/policy.xml</code>:</p>
|
||||
|
||||
<pre><code><!--<policy domain="coder" rights="none" pattern="PDF" />-->
|
||||
</code></pre></li>
|
||||
|
||||
<li><p>This works, but I’m not sure what ImageMagick’s long-term plan is if they are going to disable ALL image formats…</p></li>
|
||||
|
||||
<li><p>I suppose I need to enable a workaround for this in Ansible?</p></li>
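<li><p>The workaround is probably just a <code>lineinfile</code> task in the playbooks, something like this sketch (the path and regexp are assumptions, not tested):</p>

<pre><code>- name: Allow ImageMagick to read PDFs again
  lineinfile:
    path: /etc/ImageMagick-6/policy.xml
    regexp: '^  &lt;policy domain="coder" rights="none" pattern="PDF" /&gt;'
    line: '  &lt;!--&lt;policy domain="coder" rights="none" pattern="PDF" /&gt;--&gt;'
</code></pre></li>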
</ul>
|
||||
|
||||
<h2 id="2018-10-11">2018-10-11</h2>
|
||||
|
||||
<ul>
|
||||
<li>I emailed DuraSpace to update <a href="https://duraspace.org/registry/entry/4188/?gvid=178">our entry in their DSpace registry</a> (the data was still on DSpace 3, JSPUI, etc)</li>
|
||||
<li>Generate a list of the top 1500 values for <code>dc.subject</code> so Sisay can start making a controlled vocabulary for it:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Generate a list of the top 1500 values for <code>dc.subject</code> so Sisay can start making a controlled vocabulary for it:</p>
|
||||
|
||||
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-10-11-top-1500-subject.csv WITH CSV HEADER;
|
||||
COPY 1500
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Give WorldFish advice about Handles because they are talking to some company called KnowledgeArc who recommends they do not use Handles!</li>
|
||||
<li>Last week I emailed Altmetric to ask if their software would notice mentions of our Handle in the format “handle:10568/80775” because I noticed that the <a href="https://landportal.org/library/resources/handle1056880775/unlocking-farming-potential-bangladesh%E2%80%99-polders">Land Portal does this</a></li>
|
||||
<li>Altmetric support responded to say no, but the reason is that Land Portal is doing even more strange stuff by not using <code><meta></code> tags in their page header, and using “dct:identifier” property instead of “dc:identifier”</li>
|
||||
<li>I re-created my local DSpace database container using <a href="https://github.com/containers/libpod">podman</a> instead of Docker:</li>
|
||||
</ul>
|
||||
<li><p>Give WorldFish advice about Handles because they are talking to some company called KnowledgeArc who recommends they do not use Handles!</p></li>
|
||||
|
||||
<li><p>Last week I emailed Altmetric to ask if their software would notice mentions of our Handle in the format “handle:10568/80775” because I noticed that the <a href="https://landportal.org/library/resources/handle1056880775/unlocking-farming-potential-bangladesh%E2%80%99-polders">Land Portal does this</a></p></li>
|
||||
|
||||
<li><p>Altmetric support responded to say no, but the reason is that Land Portal is doing even more strange stuff by not using <code><meta></code> tags in their page header, and using “dct:identifier” property instead of “dc:identifier”</p></li>
|
||||
|
||||
<li><p>I re-created my local DSpace database container using <a href="https://github.com/containers/libpod">podman</a> instead of Docker:</p>
|
||||
|
||||
<pre><code>$ mkdir -p ~/.local/lib/containers/volumes/dspacedb_data
|
||||
$ sudo podman create --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
|
||||
@ -328,30 +319,29 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
|
||||
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2018-10-11.backup
|
||||
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
|
||||
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I tried to make an Artifactory in podman, but it seems to have problems because Artifactory is distributed on the Bintray repository</li>
|
||||
<li>I can pull the <code>docker.bintray.io/jfrog/artifactory-oss:latest</code> image, but not start it</li>
|
||||
<li>I decided to use a Sonatype Nexus repository instead:</li>
|
||||
</ul>
|
||||
<li><p>I tried to make an Artifactory in podman, but it seems to have problems because Artifactory is distributed on the Bintray repository</p></li>
|
||||
|
||||
<li><p>I can pull the <code>docker.bintray.io/jfrog/artifactory-oss:latest</code> image, but not start it</p></li>
|
||||
|
||||
<li><p>I decided to use a Sonatype Nexus repository instead:</p>
|
||||
|
||||
<pre><code>$ mkdir -p ~/.local/lib/containers/volumes/nexus_data
|
||||
$ sudo podman run --name nexus -d -v /home/aorth/.local/lib/containers/volumes/nexus_data:/nexus_data -p 8081:8081 sonatype/nexus3
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>With a few changes to my local Maven <code>settings.xml</code> it is working well</li>
|
||||
<li>Generate a list of the top 10,000 authors for Peter Ballantyne to look through:</li>
|
||||
</ul>
|
||||
<li><p>With a few changes to my local Maven <code>settings.xml</code> it is working well</p></li>
|
||||
|
||||
<li><p>Generate a list of the top 10,000 authors for Peter Ballantyne to look through:</p>
|
||||
|
||||
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 3 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 10000) to /tmp/2018-10-11-top-10000-authors.csv WITH CSV HEADER;
|
||||
COPY 10000
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>CTA uploaded some infographics that are very tall and their thumbnails disrupt the item lists on the front page and in their communities and collections</li>
|
||||
<li>I decided to constrain the max height of these to 200px using CSS (<a href="https://github.com/ilri/DSpace/pull/392">#392</a>)</li>
|
||||
<li><p>CTA uploaded some infographics that are very tall and their thumbnails disrupt the item lists on the front page and in their communities and collections</p></li>
|
||||
|
||||
<li><p>I decided to constrain the max height of these to 200px using CSS (<a href="https://github.com/ilri/DSpace/pull/392">#392</a>)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-13">2018-10-13</h2>
|
||||
@ -359,27 +349,24 @@ COPY 10000
|
||||
<ul>
|
||||
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
|
||||
<li>Look through Peter’s list of 746 author corrections in OpenRefine</li>
|
||||
<li>I first facet by blank, trim whitespace, and then check for weird characters that might be indicative of encoding issues with this GREL:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I first facet by blank, trim whitespace, and then check for weird characters that might be indicative of encoding issues with this GREL:</p>
|
||||
|
||||
<pre><code>or(
|
||||
isNotNull(value.match(/.*\uFFFD.*/)),
|
||||
isNotNull(value.match(/.*\u00A0.*/)),
|
||||
isNotNull(value.match(/.*\u200A.*/)),
|
||||
isNotNull(value.match(/.*\u2019.*/)),
|
||||
isNotNull(value.match(/.*\u00b4.*/))
|
||||
isNotNull(value.match(/.*\uFFFD.*/)),
|
||||
isNotNull(value.match(/.*\u00A0.*/)),
|
||||
isNotNull(value.match(/.*\u200A.*/)),
|
||||
isNotNull(value.match(/.*\u2019.*/)),
|
||||
isNotNull(value.match(/.*\u00b4.*/))
|
||||
)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then I exported and applied them on my local test server:</li>
|
||||
</ul>
|
||||
<li><p>Then I exported and applied them on my local test server:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i 2018-10-11-top-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t CORRECT -m 3
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I will apply these on CGSpace when I do the other updates tomorrow, as well as double check the high scoring ones to see if they are correct in Sisay’s author controlled vocabulary</li>
|
||||
<li><p>I will apply these on CGSpace when I do the other updates tomorrow, as well as double check the high scoring ones to see if they are correct in Sisay’s author controlled vocabulary</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-14">2018-10-14</h2>
|
||||
@ -387,26 +374,32 @@ COPY 10000
|
||||
<ul>
|
||||
<li>Merge the authors controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/393">#393</a>), usage rights (<a href="https://github.com/ilri/DSpace/pull/394">#394</a>), and the upstream DSpace 5.x cherry-picks (<a href="https://github.com/ilri/DSpace/pull/395">#395</a>) into our <code>5_x-prod</code> branch</li>
|
||||
<li>Switch to new CGIAR LDAP server on CGSpace, as it’s been running (at least for authentication) on DSpace Test for the last few weeks, and I think the old one will be deprecated soon (today?)</li>
|
||||
<li>Apply Peter’s 746 author corrections on CGSpace and DSpace Test using my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a> script:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Apply Peter’s 746 author corrections on CGSpace and DSpace Test using my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a> script:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-10-11-top-authors.csv -f dc.contributor.author -t CORRECT -m 3 -db dspace -u dspace -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Run all system updates on CGSpace (linode18) and reboot the server</li>
|
||||
<li>After rebooting the server I noticed that Handles are not resolving, and the <code>dspace-handle-server</code> systemd service is not running (or rather, it exited with success)</li>
|
||||
<li>Restarting the service with systemd works for a few seconds, then the java process quits</li>
|
||||
<li>I suspect that the systemd service type needs to be <code>forking</code> rather than <code>simple</code>, because the service calls the default DSpace <code>start-handle-server</code> shell script, which uses <code>nohup</code> and <code>&</code> to background the java process</li>
|
||||
<li>It would be nice if there was a cleaner way to start the service and then just log to the systemd journal rather than all this hiding and log redirecting</li>
|
||||
<li>Email the Landportal.org people to ask if they would consider Dublin Core metadata tags in their page’s header, rather than the HTML properties they are using in their body</li>
|
||||
<li>Peter pointed out that some thumbnails were still not getting generated
|
||||
<li><p>Run all system updates on CGSpace (linode18) and reboot the server</p></li>
|
||||
|
||||
<li><p>After rebooting the server I noticed that Handles are not resolving, and the <code>dspace-handle-server</code> systemd service is not running (or rather, it exited with success)</p></li>
|
||||
|
||||
<li><p>Restarting the service with systemd works for a few seconds, then the java process quits</p></li>
|
||||
|
||||
<li><p>I suspect that the systemd service type needs to be <code>forking</code> rather than <code>simple</code>, because the service calls the default DSpace <code>start-handle-server</code> shell script, which uses <code>nohup</code> and <code>&</code> to background the java process</p></li>
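<li><p>The fix might be as simple as a systemd override, something like this sketch (not tested yet, and it may also need a <code>PIDFile=</code>):</p>

<pre><code># systemctl edit dspace-handle-server
[Service]
Type=forking
</code></pre></li>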
|
||||
|
||||
<li><p>It would be nice if there was a cleaner way to start the service and then just log to the systemd journal rather than all this hiding and log redirecting</p></li>
|
||||
|
||||
<li><p>Email the Landportal.org people to ask if they would consider Dublin Core metadata tags in their page’s header, rather than the HTML properties they are using in their body</p></li>
|
||||
|
||||
<li><p>Peter pointed out that some thumbnails were still not getting generated</p>
|
||||
|
||||
<ul>
|
||||
<li>When I tried to generate them manually I noticed that the path to the CMYK profile had changed because Ubuntu upgraded Ghostscript from 9.18 to 9.25 last week… WTF?</li>
|
||||
<li>Looks like I can use <code>/usr/share/ghostscript/current</code> instead of <code>/usr/share/ghostscript/9.25</code>…</li>
|
||||
</ul></li>
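<li><p>The profile paths come from DSpace’s ImageMagick thumbnail filter settings in <code>dspace.cfg</code>, so the fix is probably something like this (the property names and paths here are from memory and need to be verified):</p>

<pre><code>org.dspace.app.mediafilter.ImageMagickThumbnailFilter.cmyk_profile = /usr/share/ghostscript/current/iccprofiles/default_cmyk.icc
org.dspace.app.mediafilter.ImageMagickThumbnailFilter.srgb_profile = /usr/share/ghostscript/current/iccprofiles/default_rgb.icc
</code></pre></li>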
|
||||
<li>I limited the tall thumbnails even further to 170px because Peter said CTA’s were still too tall at 200px (<a href="https://github.com/ilri/DSpace/pull/396">#396</a>)</li>
|
||||
|
||||
<li><p>I limited the tall thumbnails even further to 170px because Peter said CTA’s were still too tall at 200px (<a href="https://github.com/ilri/DSpace/pull/396">#396</a>)</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-15">2018-10-15</h2>
|
||||
@ -423,8 +416,8 @@ COPY 10000
|
||||
<li>He said he actually wants to test creation of communities, collections, etc, so I had to make him a super admin for now</li>
|
||||
<li>I told him we need to think about the workflow more seriously in the future</li>
|
||||
</ul></li>
|
||||
<li>I ended up having some issues with podman and went back to Docker, so I had to re-create my containers:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I ended up having some issues with podman and went back to Docker, so I had to re-create my containers:</p>
|
||||
|
||||
<pre><code>$ sudo docker run --name nexus --network dspace-build -d -v /home/aorth/.local/lib/containers/volumes/nexus_data:/nexus_data -p 8081:8081 sonatype/nexus3
|
||||
$ sudo docker run --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
|
||||
@ -434,21 +427,20 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
|
||||
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2018-10-11.backup
|
||||
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
|
||||
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-16">2018-10-16</h2>
|
||||
|
||||
<ul>
|
||||
<li>Generate a list of the schema on CGSpace so CodeObia can compare with MELSpace:</li>
|
||||
</ul>
|
||||
<li><p>Generate a list of the schema on CGSpace so CodeObia can compare with MELSpace:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (SELECT (CASE when metadata_schema_id=1 THEN 'dc' WHEN metadata_schema_id=2 THEN 'cg' END) AS schema, element, qualifier, scope_note FROM metadatafieldregistry where metadata_schema_id IN (1,2)) TO /tmp/cgspace-schema.csv WITH CSV HEADER;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Talking to the CodeObia guys about the REST API I started to wonder why it’s so slow and how I can quantify it in order to ask the dspace-tech mailing list for help profiling it</li>
|
||||
<li>Interestingly, the speed doesn’t get better after you request the same thing multiple times–it’s consistently bad on both CGSpace and DSpace Test!</li>
|
||||
</ul>
|
||||
<li><p>Talking to the CodeObia guys about the REST API I started to wonder why it’s so slow and how I can quantify it in order to ask the dspace-tech mailing list for help profiling it</p></li>
|
||||
|
||||
<li><p>Interestingly, the speed doesn’t get better after you request the same thing multiple times–it’s consistently bad on both CGSpace and DSpace Test!</p>
|
||||
|
||||
<pre><code>$ time http --print h 'https://cgspace.cgiar.org/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0'
|
||||
...
|
||||
@ -465,13 +457,13 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
|
||||
0.23s user 0.04s system 1% cpu 16.460 total
|
||||
0.24s user 0.04s system 1% cpu 21.043 total
|
||||
0.22s user 0.04s system 1% cpu 17.132 total
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I should note that at this time CGSpace is using Oracle Java and DSpace Test is using OpenJDK (both version 8)</li>
|
||||
<li>I wonder if the Java garbage collector is important here, or if there are missing indexes in PostgreSQL?</li>
|
||||
<li>I switched DSpace Test to the G1GC garbage collector and tried again and now the results are worse!</li>
|
||||
</ul>
|
||||
<li><p>I should note that at this time CGSpace is using Oracle Java and DSpace Test is using OpenJDK (both version 8)</p></li>
|
||||
|
||||
<li><p>I wonder if the Java garbage collector is important here, or if there are missing indexes in PostgreSQL?</p></li>
|
||||
|
||||
<li><p>I switched DSpace Test to the G1GC garbage collector and tried again and now the results are worse!</p>
|
||||
|
||||
<pre><code>$ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0'
|
||||
...
|
||||
@ -480,11 +472,9 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
|
||||
0.24s user 0.02s system 1% cpu 22.496 total
|
||||
0.22s user 0.03s system 1% cpu 22.720 total
|
||||
0.23s user 0.03s system 1% cpu 22.632 total
|
||||
</code></pre>
|
||||
</code></pre></li>
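<li><p>The switch itself is just a flag in Tomcat’s <code>JAVA_OPTS</code>, roughly like this (the heap sizes here are placeholders, not our production values):</p>

<pre><code>JAVA_OPTS="-Djava.awt.headless=true -Xms3072m -Xmx3072m -XX:+UseG1GC -Dfile.encoding=UTF-8"
</code></pre></li>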
|
||||
|
||||
<ul>
|
||||
<li>If I make a request without the expands it is ten times faster:</li>
|
||||
</ul>
|
||||
<li><p>If I make a request without the expands it is ten times faster:</p>
|
||||
|
||||
<pre><code>$ time http --print h 'https://dspacetest.cgiar.org/rest/items?limit=100&offset=0'
|
||||
...
|
||||
@ -492,10 +482,9 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
|
||||
0.22s user 0.03s system 8% cpu 2.896 total
|
||||
0.21s user 0.05s system 9% cpu 2.787 total
|
||||
0.23s user 0.02s system 8% cpu 2.896 total
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I sent a mail to dspace-tech to ask how to profile this…</li>
|
||||
<li><p>I sent a mail to dspace-tech to ask how to profile this…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-17">2018-10-17</h2>
|
||||
@ -503,8 +492,8 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
|
||||
<ul>
|
||||
<li>I decided to update most of the existing metadata values that we have in <code>dc.rights</code> on CGSpace to be machine readable in SPDX format (with Creative Commons version if it was included)</li>
|
||||
<li>Most of them are from Bioversity, and I asked Maria for permission before updating them</li>
|
||||
<li>I manually went through and looked at the existing values and updated them in several batches:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I manually went through and looked at the existing values and updated them in several batches:</p>
|
||||
|
||||
<pre><code>UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%CC BY %';
|
||||
UPDATE metadatavalue SET text_value='CC-BY-NC-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-ND%' AND text_value LIKE '%by-nc-nd%';
|
||||
@ -522,34 +511,35 @@ UPDATE metadatavalue SET text_value='CC-BY-3.0' WHERE resource_type_id=2 AND met
|
||||
UPDATE metadatavalue SET text_value='CC-BY-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78184;
|
||||
UPDATE metadatavalue SET text_value='CC-BY' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value NOT LIKE '%zero%' AND text_value NOT LIKE '%CC0%' AND text_value LIKE '%Attribution %' AND text_value NOT LIKE '%CC-%';
|
||||
UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78564;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I updated the fields on CGSpace and then started a re-index of Discovery</li>
|
||||
<li>We also need to re-think the <code>dc.rights</code> field in the submission form: we should probably use a popup controlled vocabulary and list the Creative Commons values with version numbers and allow the user to enter their own (like the ORCID identifier field)</li>
|
||||
<li>Ask Jane if we can use some of the BDP money to host AReS explorer on a more powerful server</li>
|
||||
<li>IWMI sent me a list of new ORCID identifiers for their staff so I combined them with our list, updated the names with my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script, and regenerated the controlled vocabulary:</li>
|
||||
</ul>
|
||||
<li><p>I updated the fields on CGSpace and then started a re-index of Discovery</p></li>
|
||||
|
||||
<li><p>We also need to re-think the <code>dc.rights</code> field in the submission form: we should probably use a popup controlled vocabulary and list the Creative Commons values with version numbers and allow the user to enter their own (like the ORCID identifier field)</p></li>
|
||||
|
||||
<li><p>Ask Jane if we can use some of the BDP money to host AReS explorer on a more powerful server</p></li>
|
||||
|
||||
<li><p>IWMI sent me a list of new ORCID identifiers for their staff so I combined them with our list, updated the names with my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script, and regenerated the controlled vocabulary:</p>
|
||||
|
||||
<pre><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json MEL\ ORCID_V2.json 2018-10-17-IWMI-ORCIDs.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq >
|
||||
2018-10-17-orcids.txt
|
||||
$ ./resolve-orcids.py -i 2018-10-17-orcids.txt -o 2018-10-17-names.txt -d
|
||||
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I also decided to add the ORCID identifiers that MEL had sent us a few months ago…</li>
|
||||
<li>One problem I had with the <code>resolve-orcids.py</code> script is that one user seems to have disabled their profile data since we last updated:</li>
|
||||
</ul>
|
||||
<li><p>I also decided to add the ORCID identifiers that MEL had sent us a few months ago…</p></li>
|
||||
|
||||
<li><p>One problem I had with the <code>resolve-orcids.py</code> script is that one user seems to have disabled their profile data since we last updated:</p>
|
||||
|
||||
<pre><code>Looking up the names associated with ORCID iD: 0000-0001-7930-5752
|
||||
Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So I need to handle that situation in the script for sure, but I’m not sure what to do organizationally or ethically, since that user disabled their name! Do we remove him from the list?</li>
|
||||
<li>I made a pull request and merged the ORCID updates into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/397">#397</a>)</li>
|
||||
<li>Improve the logic of name checking in my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script</li>
|
||||
<li><p>So I need to handle that situation in the script for sure, but I’m not sure what to do organizationally or ethically, since that user disabled their name! Do we remove him from the list?</p></li>
|
||||
|
||||
<li><p>I made a pull request and merged the ORCID updates into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/397">#397</a>)</p></li>
|
||||
|
||||
<li><p>Improve the logic of name checking in my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-18">2018-10-18</h2>
|
||||
@ -557,79 +547,78 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
|
||||
<ul>
|
||||
<li>I granted MEL’s deposit user admin access to IITA, CIP, Bioversity, and RTB communities on DSpace Test so they can start testing real depositing</li>
|
||||
<li>After they do some tests and we check the values Enrico will send a formal email to Peter et al to ask that they start depositing officially</li>
|
||||
<li>I upgraded PostgreSQL to 9.6 on DSpace Test using Ansible, then had to manually <a href="https://wiki.postgresql.org/wiki/Using_pg_upgrade_on_Ubuntu/Debian">migrate from 9.5 to 9.6</a>:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I upgraded PostgreSQL to 9.6 on DSpace Test using Ansible, then had to manually <a href="https://wiki.postgresql.org/wiki/Using_pg_upgrade_on_Ubuntu/Debian">migrate from 9.5 to 9.6</a>:</p>
|
||||
|
||||
<pre><code># su - postgres
|
||||
$ /usr/lib/postgresql/9.6/bin/pg_upgrade -b /usr/lib/postgresql/9.5/bin -B /usr/lib/postgresql/9.6/bin -d /var/lib/postgresql/9.5/main -D /var/lib/postgresql/9.6/main -o ' -c config_file=/etc/postgresql/9.5/main/postgresql.conf' -O ' -c config_file=/etc/postgresql/9.6/main/postgresql.conf'
|
||||
$ exit
|
||||
# systemctl start postgresql
|
||||
# dpkg -r postgresql-9.5 postgresql-client-9.5 postgresql-contrib-9.5
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-19">2018-10-19</h2>
|
||||
|
||||
<ul>
|
||||
<li>Help Francesca from Bioversity generate a report about items they uploaded in 2015 through 2018</li>
|
||||
<li>Linode emailed me to say that CGSpace (linode18) had high CPU usage for a few hours this afternoon</li>
|
||||
<li>Looking at the nginx logs around that time I see the following IPs making the most requests:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Looking at the nginx logs around that time I see the following IPs making the most requests:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Oct/2018:(12|13|14|15)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
361 207.46.13.179
|
||||
395 181.115.248.74
|
||||
485 66.249.64.93
|
||||
535 157.55.39.213
|
||||
536 157.55.39.99
|
||||
551 34.218.226.147
|
||||
580 157.55.39.173
|
||||
1516 35.237.175.180
|
||||
1629 66.249.64.91
|
||||
1758 5.9.6.51
|
||||
</code></pre>
|
||||
361 207.46.13.179
|
||||
395 181.115.248.74
|
||||
485 66.249.64.93
|
||||
535 157.55.39.213
|
||||
536 157.55.39.99
|
||||
551 34.218.226.147
|
||||
580 157.55.39.173
|
||||
1516 35.237.175.180
|
||||
1629 66.249.64.91
|
||||
1758 5.9.6.51
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>5.9.6.51 is MegaIndex, which I’ve seen before…</li>
|
||||
<li><p>5.9.6.51 is MegaIndex, which I’ve seen before…</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-20">2018-10-20</h2>
|
||||
|
||||
<ul>
|
||||
<li>I was going to try to run Solr in Docker because I learned I can run Docker on Travis-CI (for testing my dspace-statistics-api), but the oldest official Solr images are for 5.5, and DSpace’s Solr configuration is for 4.9</li>
|
||||
<li>This means our existing Solr configuration doesn’t run in Solr 5.5:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>This means our existing Solr configuration doesn’t run in Solr 5.5:</p>
|
||||
|
||||
<pre><code>$ sudo docker pull solr:5
|
||||
$ sudo docker run --name my_solr -v ~/dspace/solr/statistics/conf:/tmp/conf -d -p 8983:8983 -t solr:5
|
||||
$ sudo docker logs my_solr
|
||||
...
|
||||
ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics] Caused by: solr.IntField
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Apparently a bunch of variable types were removed in <a href="https://issues.apache.org/jira/browse/SOLR-5936">Solr 5</a></li>
|
||||
<li>So for now it’s actually a huge pain in the ass to run the tests for my dspace-statistics-api</li>
|
||||
<li>Linode sent a message that the CPU usage was high on CGSpace (linode18) last night</li>
|
||||
<li>According to the nginx logs around that time it was 5.9.6.51 (MegaIndex) again:</li>
|
||||
</ul>
|
||||
<li><p>Apparently a bunch of variable types were removed in <a href="https://issues.apache.org/jira/browse/SOLR-5936">Solr 5</a></p></li>
|
||||
|
||||
<li><p>So for now it’s actually a huge pain in the ass to run the tests for my dspace-statistics-api</p></li>
|
||||
|
||||
<li><p>Linode sent a message that the CPU usage was high on CGSpace (linode18) last night</p></li>
|
||||
|
||||
<li><p>According to the nginx logs around that time it was 5.9.6.51 (MegaIndex) again:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "20/Oct/2018:(14|15|16)" | awk '{print $1}' | sort
|
||||
| uniq -c | sort -n | tail -n 10
|
||||
249 207.46.13.179
|
||||
250 157.55.39.173
|
||||
301 54.166.207.223
|
||||
303 157.55.39.213
|
||||
310 66.249.64.95
|
||||
362 34.218.226.147
|
||||
381 66.249.64.93
|
||||
415 35.237.175.180
|
||||
1205 66.249.64.91
|
||||
1227 5.9.6.51
|
||||
</code></pre>
|
||||
| uniq -c | sort -n | tail -n 10
|
||||
249 207.46.13.179
|
||||
250 157.55.39.173
|
||||
301 54.166.207.223
|
||||
303 157.55.39.213
|
||||
310 66.249.64.95
|
||||
362 34.218.226.147
|
||||
381 66.249.64.93
|
||||
415 35.237.175.180
|
||||
1205 66.249.64.91
|
||||
1227 5.9.6.51
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This bot is only using the XMLUI and it does <em>not</em> seem to be re-using its sessions:</li>
|
||||
</ul>
|
||||
<li><p>This bot is only using the XMLUI and it does <em>not</em> seem to be re-using its sessions:</p>
|
||||
|
||||
<pre><code># grep -c 5.9.6.51 /var/log/nginx/*.log
|
||||
/var/log/nginx/access.log:9323
|
||||
@ -640,17 +629,14 @@ ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics]
|
||||
/var/log/nginx/statistics.log:0
|
||||
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-10-20 | sort | uniq
|
||||
8915
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Last month I added “crawl” to the Tomcat Crawler Session Manager Valve’s regular expression matching, and it seems to be working for MegaIndex’s user agent:</li>
|
||||
</ul>
|
||||
<li><p>Last month I added “crawl” to the Tomcat Crawler Session Manager Valve’s regular expression matching, and it seems to be working for MegaIndex’s user agent:</p>
|
||||
|
||||
<pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/1' User-Agent:'"Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)"'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So I’m not sure why this bot uses so many sessions — is it because it requests very slowly?</li>
|
||||
<li><p>So I’m not sure why this bot uses so many sessions — is it because it requests very slowly?</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-21">2018-10-21</h2>
|
||||
@ -664,27 +650,29 @@ ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics]
|
||||
<ul>
|
||||
<li>Post message to Yammer about usage rights (dc.rights)</li>
|
||||
<li>Change <code>build.properties</code> to use HTTPS for Handles in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li>
|
||||
<li>We will still need to do a batch update of the <code>dc.identifier.uri</code> and other fields in the database:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>We will still need to do a batch update of the <code>dc.identifier.uri</code> and other fields in the database:</p>
|
||||
|
||||
<pre><code>dspace=# UPDATE metadatavalue SET text_value=replace(text_value, 'http://', 'https://') WHERE resource_type_id=2 AND text_value LIKE 'http://hdl.handle.net%';
|
||||
</code></pre>
|
||||
</code></pre></li>
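<li><p>A quick sanity check is to count how many values such an update would touch, using the same filter as the UPDATE above:</p>

<pre><code>dspace=# SELECT COUNT(*) FROM metadatavalue WHERE resource_type_id=2 AND text_value LIKE 'http://hdl.handle.net%';
</code></pre></li>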
|
||||
|
||||
<ul>
|
||||
<li>While I was doing that I found two items using CGSpace URLs instead of handles in their <code>dc.identifier.uri</code> so I corrected those</li>
|
||||
<li>I also found several items that had invalid characters or multiple Handles in some related URL field like <code>cg.link.reference</code> so I corrected those too</li>
|
||||
<li>Improve the usage rights on the submission form by adding a default selection with no value as well as a better hint to look for the CC license on the publisher page or in the PDF (<a href="https://github.com/ilri/DSpace/pull/398">#398</a>)</li>
|
||||
<li>I deployed the changes on CGSpace, ran all system updates, and rebooted the server</li>
|
||||
<li>Also, I updated all Handles in the database to use HTTPS:</li>
|
||||
</ul>
|
||||
<li><p>While I was doing that I found two items using CGSpace URLs instead of handles in their <code>dc.identifier.uri</code> so I corrected those</p></li>
|
||||
|
||||
<li><p>I also found several items that had invalid characters or multiple Handles in some related URL field like <code>cg.link.reference</code> so I corrected those too</p></li>
|
||||
|
||||
<li><p>Improve the usage rights on the submission form by adding a default selection with no value as well as a better hint to look for the CC license on the publisher page or in the PDF (<a href="https://github.com/ilri/DSpace/pull/398">#398</a>)</p></li>
|
||||
|
||||
<li><p>I deployed the changes on CGSpace, ran all system updates, and rebooted the server</p></li>
|
||||
|
||||
<li><p>Also, I updated all Handles in the database to use HTTPS:</p>
|
||||
|
||||
<pre><code>dspace=# UPDATE metadatavalue SET text_value=replace(text_value, 'http://', 'https://') WHERE resource_type_id=2 AND text_value LIKE 'http://hdl.handle.net%';
|
||||
UPDATE 76608
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Skype with Peter about ToRs for the AReS open source work and future plans to develop tools around the DSpace ecosystem</li>
|
||||
<li>Help CGSpace users with some issues related to usage rights</li>
|
||||
<li><p>Skype with Peter about ToRs for the AReS open source work and future plans to develop tools around the DSpace ecosystem</p></li>
|
||||
|
||||
<li><p>Help CGSpace users with some issues related to usage rights</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-23">2018-10-23</h2>
|
||||
@ -693,18 +681,16 @@ UPDATE 76608
|
||||
<li>Improve the usage rights (dc.rights) on CGSpace again by adding the long names in the submission form, as well as adding version 3.0 and the Creative Commons Zero (CC0) public domain license (<a href="https://github.com/ilri/DSpace/pull/399">#399</a>)</li>
|
||||
<li>Add “usage rights” to the XMLUI item display (<a href="https://github.com/ilri/DSpace/pull/400">#400</a>)</li>
|
||||
<li>I emailed the MARLO guys to ask if they can send us a dump of rights data and Handles from their system so we can tag our older items on CGSpace</li>
|
||||
<li>Testing REST login and logout via httpie because Felix from Earlham says he’s having issues:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Testing REST login and logout via httpie because Felix from Earlham says he’s having issues:</p>
|
||||
|
||||
<pre><code>$ http --print b POST 'https://dspacetest.cgiar.org/rest/login' email='testdeposit@cgiar.org' password=deposit
|
||||
acef8a4a-41f3-4392-b870-e873790f696b
|
||||
|
||||
$ http POST 'https://dspacetest.cgiar.org/rest/logout' rest-dspace-token:acef8a4a-41f3-4392-b870-e873790f696b
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Also works via curl (login, check status, logout, check status):</li>
|
||||
</ul>
|
||||
<li><p>Also works via curl (login, check status, logout, check status):</p>
|
||||
|
||||
<pre><code>$ curl -H "Content-Type: application/json" --data '{"email":"testdeposit@cgiar.org", "password":"deposit"}' https://dspacetest.cgiar.org/rest/login
|
||||
e09fb5e1-72b0-4811-a2e5-5c1cd78293cc
|
||||
@ -713,11 +699,11 @@ $ curl -X GET -H "Content-Type: application/json" -H "Accept: app
|
||||
$ curl -X POST -H "Content-Type: application/json" -H "rest-dspace-token: e09fb5e1-72b0-4811-a2e5-5c1cd78293cc" https://dspacetest.cgiar.org/rest/logout
|
||||
$ curl -X GET -H "Content-Type: application/json" -H "Accept: application/json" -H "rest-dspace-token: e09fb5e1-72b0-4811-a2e5-5c1cd78293cc" https://dspacetest.cgiar.org/rest/status
|
||||
{"okay":true,"authenticated":false,"email":null,"fullname":null,"token":null}%
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Improve the documentation of my <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a></li>
|
||||
<li>Email Modi and Jayashree from ICRISAT to ask if they want to join CGSpace as partners</li>
|
||||
<li><p>Improve the documentation of my <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a></p></li>
|
||||
|
||||
<li><p>Email Modi and Jayashree from ICRISAT to ask if they want to join CGSpace as partners</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-10-24">2018-10-24</h2>
|
||||
|
@ -39,7 +39,7 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list
|
||||
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
|
||||
Today these are the top 10 IPs:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -148,109 +148,105 @@ Today these are the top 10 IPs:
|
||||
<ul>
|
||||
<li>The <code>66.249.64.x</code> are definitely Google</li>
|
||||
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it’s only a few thousand requests and always to the REST API</li>
|
||||
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
|
||||
</ul>
|
||||
|
||||
<li><p><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</p>
|
||||
|
||||
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>They at least seem to be re-using their Tomcat sessions:</li>
|
||||
</ul>
|
||||
<li><p>They at least seem to be re-using their Tomcat sessions:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=84.38.130.177' dspace.log.2018-11-03
|
||||
342
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>50.116.102.77</code> is also a regular REST API user</li>
|
||||
<li><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</li>
|
||||
<li><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</li>
|
||||
</ul>
|
||||
<li><p><code>50.116.102.77</code> is also a regular REST API user</p></li>
|
||||
|
||||
<li><p><code>40.77.167.175</code> and <code>207.46.13.156</code> seem to be Bing</p></li>
|
||||
|
||||
<li><p><code>138.201.52.218</code> seems to be on Hetzner in Germany, but is using this user agent:</p>
|
||||
|
||||
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And it doesn’t seem they are re-using their Tomcat sessions:</li>
|
||||
</ul>
|
||||
<li><p>And it doesn’t seem they are re-using their Tomcat sessions:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03
|
||||
1243
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…</li>
|
||||
<li>I wonder if it’s worth adding them to the list of bots in the nginx config?</li>
|
||||
<li>Linode sent a mail that CGSpace (linode18) is using high outgoing bandwidth</li>
|
||||
<li>Looking at the nginx logs again I see the following top ten IPs:</li>
|
||||
</ul>
|
||||
<li><p>Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…</p></li>
|
||||
|
||||
<li><p>I wonder if it’s worth adding them to the list of bots in the nginx config?</p></li>
|
||||
|
||||
<li><p>Linode sent a mail that CGSpace (linode18) is using high outgoing bandwidth</p></li>
|
||||
|
||||
<li><p>Looking at the nginx logs again I see the following top ten IPs:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
1979 50.116.102.77
|
||||
1980 35.237.175.180
|
||||
2186 207.46.13.156
|
||||
2208 40.77.167.175
|
||||
2843 66.249.64.63
|
||||
4220 84.38.130.177
|
||||
4537 70.32.83.92
|
||||
5593 66.249.64.61
|
||||
12557 78.46.89.18
|
||||
32152 66.249.64.59
|
||||
</code></pre>
|
||||
1979 50.116.102.77
|
||||
1980 35.237.175.180
|
||||
2186 207.46.13.156
|
||||
2208 40.77.167.175
|
||||
2843 66.249.64.63
|
||||
4220 84.38.130.177
|
||||
4537 70.32.83.92
|
||||
5593 66.249.64.61
|
||||
12557 78.46.89.18
|
||||
32152 66.249.64.59
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>78.46.89.18</code> is new since I last checked a few hours ago, and it’s from Hetzner with the following user agent:</li>
|
||||
</ul>
|
||||
<li><p><code>78.46.89.18</code> is new since I last checked a few hours ago, and it’s from Hetzner with the following user agent:</p>
|
||||
|
||||
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>It’s making lots of requests, though actually it does seem to be re-using its Tomcat sessions:</li>
|
||||
</ul>
|
||||
<li><p>It’s making lots of requests, though actually it does seem to be re-using its Tomcat sessions:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03
|
||||
8449
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03 | sort | uniq | wc -l
|
||||
1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><em>Updated on 2018-12-04 to correct the grep command above, as it was inaccurate and it seems the bot was actually already re-using its Tomcat sessions</em></li>
|
||||
<li>I could add this IP to the list of bot IPs in nginx, but it seems like a futile effort when some new IP could come along and do the same thing</li>
|
||||
<li>Perhaps I should think about adding rate limits to dynamic pages like <code>/discover</code> and <code>/browse</code></li>
|
||||
<li>I think it’s reasonable for a human to click one of those links five or ten times a minute…</li>
|
||||
<li>To contrast, <code>78.46.89.18</code> made about 300 requests per minute for a few hours today:</li>
|
||||
</ul>
|
||||
<li><p><em>Updated on 2018-12-04 to correct the grep command above, as it was inaccurate and it seems the bot was actually already re-using its Tomcat sessions</em></p></li>
|
||||
|
||||
<li><p>I could add this IP to the list of bot IPs in nginx, but it seems like a futile effort when some new IP could come along and do the same thing</p></li>
|
||||
|
||||
<li><p>Perhaps I should think about adding rate limits to dynamic pages like <code>/discover</code> and <code>/browse</code></p></li>
|
||||
|
||||
<li><p>I think it’s reasonable for a human to click one of those links five or ten times a minute…</p></li>
|
||||
|
||||
<li><p>To contrast, <code>78.46.89.18</code> made about 300 requests per minute for a few hours today:</p>
|
||||
|
||||
<pre><code># grep 78.46.89.18 /var/log/nginx/access.log | grep -o -E '03/Nov/2018:[0-9][0-9]:[0-9][0-9]' | sort | uniq -c | sort -n | tail -n 20
|
||||
286 03/Nov/2018:18:02
|
||||
287 03/Nov/2018:18:21
|
||||
289 03/Nov/2018:18:23
|
||||
291 03/Nov/2018:18:27
|
||||
293 03/Nov/2018:18:34
|
||||
300 03/Nov/2018:17:58
|
||||
300 03/Nov/2018:18:22
|
||||
300 03/Nov/2018:18:32
|
||||
304 03/Nov/2018:18:12
|
||||
305 03/Nov/2018:18:13
|
||||
305 03/Nov/2018:18:24
|
||||
312 03/Nov/2018:18:39
|
||||
322 03/Nov/2018:18:17
|
||||
326 03/Nov/2018:18:38
|
||||
327 03/Nov/2018:18:16
|
||||
330 03/Nov/2018:17:57
|
||||
332 03/Nov/2018:18:19
|
||||
336 03/Nov/2018:17:56
|
||||
340 03/Nov/2018:18:14
|
||||
341 03/Nov/2018:18:18
|
||||
</code></pre>
|
||||
286 03/Nov/2018:18:02
|
||||
287 03/Nov/2018:18:21
|
||||
289 03/Nov/2018:18:23
|
||||
291 03/Nov/2018:18:27
|
||||
293 03/Nov/2018:18:34
|
||||
300 03/Nov/2018:17:58
|
||||
300 03/Nov/2018:18:22
|
||||
300 03/Nov/2018:18:32
|
||||
304 03/Nov/2018:18:12
|
||||
305 03/Nov/2018:18:13
|
||||
305 03/Nov/2018:18:24
|
||||
312 03/Nov/2018:18:39
|
||||
322 03/Nov/2018:18:17
|
||||
326 03/Nov/2018:18:38
|
||||
327 03/Nov/2018:18:16
|
||||
330 03/Nov/2018:17:57
|
||||
332 03/Nov/2018:18:19
|
||||
336 03/Nov/2018:17:56
|
||||
340 03/Nov/2018:18:14
|
||||
341 03/Nov/2018:18:18
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>If they want to download all our metadata and PDFs they should use an API rather than scraping the XMLUI</li>
|
||||
<li>I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later</li>
|
||||
<li>Also, this is the third (?) time a mysterious IP on Hetzner has done this… who is this?</li>
|
||||
<li><p>If they want to download all our metadata and PDFs they should use an API rather than scraping the XMLUI</p></li>
|
||||
|
||||
<li><p>I will add them to the list of bot IPs in nginx for now (a sketch of such a list follows below) and think about enforcing rate limits in XMLUI later</p></li>
|
||||
|
||||
<li><p>Also, this is the third (?) time a mysterious IP on Hetzner has done this… who is this?</p></li>
|
||||
</ul>
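<li><p>For reference, a minimal sketch of what an IP-based bot list in nginx can look like (the variable name and how it is consumed further down the config are assumptions, not the actual CGSpace setup):</p>

<pre><code># flag known scraper addresses so later blocks or limits can test $bot_ip
geo $bot_ip {
    default      0;
    5.9.6.51     1;   # MegaIndex
    78.46.89.18  1;   # mystery Hetzner scraper
}
</code></pre></li>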
|
||||
|
||||
<h2 id="2018-11-04">2018-11-04</h2>
|
||||
@ -258,137 +254,127 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03
|
||||
<ul>
|
||||
<li>Forward Peter’s information about CGSpace financials to Modi from ICRISAT</li>
|
||||
<li>Linode emailed about the CPU load and outgoing bandwidth on CGSpace (linode18) again</li>
|
||||
<li>Here are the top ten IPs active so far this morning:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Here are the top ten IPs active so far this morning:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "04/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
1083 2a03:2880:11ff:2::face:b00c
|
||||
1105 2a03:2880:11ff:d::face:b00c
|
||||
1111 2a03:2880:11ff:f::face:b00c
|
||||
1134 84.38.130.177
|
||||
1893 50.116.102.77
|
||||
2040 66.249.64.63
|
||||
4210 66.249.64.61
|
||||
4534 70.32.83.92
|
||||
13036 78.46.89.18
|
||||
20407 66.249.64.59
|
||||
</code></pre>
|
||||
1083 2a03:2880:11ff:2::face:b00c
|
||||
1105 2a03:2880:11ff:d::face:b00c
|
||||
1111 2a03:2880:11ff:f::face:b00c
|
||||
1134 84.38.130.177
|
||||
1893 50.116.102.77
|
||||
2040 66.249.64.63
|
||||
4210 66.249.64.61
|
||||
4534 70.32.83.92
|
||||
13036 78.46.89.18
|
||||
20407 66.249.64.59
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>78.46.89.18</code> is back… and it is still actually re-using its Tomcat sessions:</li>
|
||||
</ul>
|
||||
<li><p><code>78.46.89.18</code> is back… and it is still actually re-using its Tomcat sessions:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-04
|
||||
8765
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-04 | sort | uniq | wc -l
|
||||
1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><em>Updated on 2018-12-04 to correct the grep command and point out that the bot was actually re-using its Tomcat sessions properly</em></li>
|
||||
<li>Also, now we have a ton of Facebook crawlers:</li>
|
||||
</ul>
|
||||
<li><p><em>Updated on 2018-12-04 to correct the grep command and point out that the bot was actually re-using its Tomcat sessions properly</em></p></li>
|
||||
|
||||
<li><p>Also, now we have a ton of Facebook crawlers:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "04/Nov/2018" | grep "2a03:2880:11ff:" | awk '{print $1}' | sort | uniq -c | sort -n
|
||||
905 2a03:2880:11ff:b::face:b00c
|
||||
955 2a03:2880:11ff:5::face:b00c
|
||||
965 2a03:2880:11ff:e::face:b00c
|
||||
984 2a03:2880:11ff:8::face:b00c
|
||||
993 2a03:2880:11ff:3::face:b00c
|
||||
994 2a03:2880:11ff:7::face:b00c
|
||||
1006 2a03:2880:11ff:10::face:b00c
|
||||
1011 2a03:2880:11ff:4::face:b00c
|
||||
1023 2a03:2880:11ff:6::face:b00c
|
||||
1026 2a03:2880:11ff:9::face:b00c
|
||||
1039 2a03:2880:11ff:1::face:b00c
|
||||
1043 2a03:2880:11ff:c::face:b00c
|
||||
1070 2a03:2880:11ff::face:b00c
|
||||
1075 2a03:2880:11ff:a::face:b00c
|
||||
1093 2a03:2880:11ff:2::face:b00c
|
||||
1107 2a03:2880:11ff:d::face:b00c
|
||||
1116 2a03:2880:11ff:f::face:b00c
|
||||
</code></pre>
|
||||
905 2a03:2880:11ff:b::face:b00c
|
||||
955 2a03:2880:11ff:5::face:b00c
|
||||
965 2a03:2880:11ff:e::face:b00c
|
||||
984 2a03:2880:11ff:8::face:b00c
|
||||
993 2a03:2880:11ff:3::face:b00c
|
||||
994 2a03:2880:11ff:7::face:b00c
|
||||
1006 2a03:2880:11ff:10::face:b00c
|
||||
1011 2a03:2880:11ff:4::face:b00c
|
||||
1023 2a03:2880:11ff:6::face:b00c
|
||||
1026 2a03:2880:11ff:9::face:b00c
|
||||
1039 2a03:2880:11ff:1::face:b00c
|
||||
1043 2a03:2880:11ff:c::face:b00c
|
||||
1070 2a03:2880:11ff::face:b00c
|
||||
1075 2a03:2880:11ff:a::face:b00c
|
||||
1093 2a03:2880:11ff:2::face:b00c
|
||||
1107 2a03:2880:11ff:d::face:b00c
|
||||
1116 2a03:2880:11ff:f::face:b00c
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>They are really making shit tons of requests:</li>
|
||||
</ul>
|
||||
<li><p>They are really making shit tons of requests:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04
|
||||
37721
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><em>Updated on 2018-12-04 to correct the grep command to accurately show the number of requests</em></li>
|
||||
<li>Their user agent is:</li>
|
||||
</ul>
|
||||
<li><p><em>Updated on 2018-12-04 to correct the grep command to accurately show the number of requests</em></p></li>
|
||||
|
||||
<li><p>Their user agent is:</p>
|
||||
|
||||
<pre><code>facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I will add it to the Tomcat Crawler Session Manager valve</li>
|
||||
<li>Later in the evening… ok, this Facebook bot is getting super annoying:</li>
|
||||
</ul>
|
||||
<li><p>I will add it to the Tomcat Crawler Session Manager valve</p></li>
|
||||
|
||||
<li><p>Later in the evening… ok, this Facebook bot is getting super annoying:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "04/Nov/2018" | grep "2a03:2880:11ff:" | awk '{print $1}' | sort | uniq -c | sort -n
|
||||
1871 2a03:2880:11ff:3::face:b00c
|
||||
1885 2a03:2880:11ff:b::face:b00c
|
||||
1941 2a03:2880:11ff:8::face:b00c
|
||||
1942 2a03:2880:11ff:e::face:b00c
|
||||
1987 2a03:2880:11ff:1::face:b00c
|
||||
2023 2a03:2880:11ff:2::face:b00c
|
||||
2027 2a03:2880:11ff:4::face:b00c
|
||||
2032 2a03:2880:11ff:9::face:b00c
|
||||
2034 2a03:2880:11ff:10::face:b00c
|
||||
2050 2a03:2880:11ff:5::face:b00c
|
||||
2061 2a03:2880:11ff:c::face:b00c
|
||||
2076 2a03:2880:11ff:6::face:b00c
|
||||
2093 2a03:2880:11ff:7::face:b00c
|
||||
2107 2a03:2880:11ff::face:b00c
|
||||
2118 2a03:2880:11ff:d::face:b00c
|
||||
2164 2a03:2880:11ff:a::face:b00c
|
||||
2178 2a03:2880:11ff:f::face:b00c
|
||||
</code></pre>
|
||||
1871 2a03:2880:11ff:3::face:b00c
|
||||
1885 2a03:2880:11ff:b::face:b00c
|
||||
1941 2a03:2880:11ff:8::face:b00c
|
||||
1942 2a03:2880:11ff:e::face:b00c
|
||||
1987 2a03:2880:11ff:1::face:b00c
|
||||
2023 2a03:2880:11ff:2::face:b00c
|
||||
2027 2a03:2880:11ff:4::face:b00c
|
||||
2032 2a03:2880:11ff:9::face:b00c
|
||||
2034 2a03:2880:11ff:10::face:b00c
|
||||
2050 2a03:2880:11ff:5::face:b00c
|
||||
2061 2a03:2880:11ff:c::face:b00c
|
||||
2076 2a03:2880:11ff:6::face:b00c
|
||||
2093 2a03:2880:11ff:7::face:b00c
|
||||
2107 2a03:2880:11ff::face:b00c
|
||||
2118 2a03:2880:11ff:d::face:b00c
|
||||
2164 2a03:2880:11ff:a::face:b00c
|
||||
2178 2a03:2880:11ff:f::face:b00c
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Now at least the Tomcat Crawler Session Manager Valve seems to be forcing it to re-use some Tomcat sessions:</li>
|
||||
</ul>
|
||||
<li><p>Now at least the Tomcat Crawler Session Manager Valve seems to be forcing it to re-use some Tomcat sessions:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04
|
||||
37721
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11-04 | sort | uniq | wc -l
|
||||
15206
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think we still need to limit more of the dynamic pages, like the “most popular” country, item, and author pages</li>
|
||||
<li>It seems these are popular too, and there is no fucking way Facebook needs that information, yet they are requesting thousands of them!</li>
|
||||
</ul>
|
||||
<li><p>I think we still need to limit more of the dynamic pages, like the “most popular” country, item, and author pages</p></li>
|
||||
|
||||
<li><p>It seems these are popular too, and there is no fucking way Facebook needs that information, yet they are requesting thousands of them!</p>
|
||||
|
||||
<pre><code># grep 'face:b00c' /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -c 'most-popular/'
|
||||
7033
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I added the “most-popular” pages to the list that return <code>X-Robots-Tag: none</code> to try to inform bots not to index or follow those pages</li>
|
||||
<li>Also, I implemented an nginx rate limit of twelve requests per minute on all dynamic pages… I figure a human user might legitimately request one every five seconds</li>
|
||||
<li><p>I added the “most-popular” pages to the list that return <code>X-Robots-Tag: none</code> to try to inform bots not to index or follow those pages</p></li>
|
||||
|
||||
<li><p>Also, I implemented an nginx rate limit of twelve requests per minute on all dynamic pages… I figure a human user might legitimately request one every five seconds (a sketch of the configuration follows this list)</p></li>
|
||||
</ul>
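<li><p>For reference, a rough sketch of those two nginx changes (the zone name, paths, and backend address are illustrative, not the exact CGSpace configuration):</p>

<pre><code># in the http{} block: 12 requests per minute per client IP
limit_req_zone $binary_remote_addr zone=dynamicpages:16m rate=12r/m;

# in the server{} block: rate limit the dynamic pages and ask robots not to index them
location ~ /(discover|search-filter|most-popular) {
    limit_req zone=dynamicpages burst=5;
    add_header X-Robots-Tag "none";
    proxy_pass http://127.0.0.1:8080;
}
</code></pre></li>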
|
||||
|
||||
<h2 id="2018-11-05">2018-11-05</h2>
|
||||
|
||||
<ul>
|
||||
<li>I wrote a small Python script <a href="https://gist.github.com/alanorth/4ff81d5f65613814a66cb6f84fdf1fc5">add-dc-rights.py</a> to add usage rights (<code>dc.rights</code>) to CGSpace items based on the CSV Hector gave me from MARLO:</li>
|
||||
</ul>
|
||||
<li><p>I wrote a small Python script <a href="https://gist.github.com/alanorth/4ff81d5f65613814a66cb6f84fdf1fc5">add-dc-rights.py</a> to add usage rights (<code>dc.rights</code>) to CGSpace items based on the CSV Hector gave me from MARLO:</p>
|
||||
|
||||
<pre><code>$ ./add-dc-rights.py -i /tmp/marlo.csv -db dspace -u dspace -p 'fuuu'
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>The file <code>marlo.csv</code> was cleaned up and formatted in Open Refine</li>
|
||||
<li>165 of the items in their 2017 data are from CGSpace!</li>
|
||||
<li>I will add the data to CGSpace this week (done!)</li>
|
||||
<li>Jesus, is Facebook <em>trying</em> to be annoying? At least the Tomcat Crawler Session Manager Valve is working to force the bot to re-use its Tomcat sessions:</li>
|
||||
</ul>
|
||||
<li><p>The file <code>marlo.csv</code> was cleaned up and formatted in Open Refine</p></li>
|
||||
|
||||
<li><p>165 of the items in their 2017 data are from CGSpace!</p></li>
|
||||
|
||||
<li><p>I will add the data to CGSpace this week (done!)</p></li>
|
||||
|
||||
<li><p>Jesus, is Facebook <em>trying</em> to be annoying? At least the Tomcat Crawler Session Manager Valve is working to force the bot to re-use its Tomcat sessions:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "05/Nov/2018" | grep -c "2a03:2880:11ff:"
|
||||
29889
|
||||
@ -398,11 +384,11 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
|
||||
1057
|
||||
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "05/Nov/2018" | grep "2a03:2880:11ff:" | grep -c -E "(handle|bitstream)"
|
||||
29896
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>29,000 requests from Facebook and none of the requests are to the dynamic pages I rate limited yesterday!</li>
|
||||
<li>At least the Tomcat Crawler Session Manager Valve is working now…</li>
|
||||
<li><p>29,000 requests from Facebook and none of the requests are to the dynamic pages I rate limited yesterday!</p></li>
|
||||
|
||||
<li><p>At least the Tomcat Crawler Session Manager Valve is working now… (a configuration sketch follows this list)</p></li>
|
||||
</ul>
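<li><p>For reference, the Valve is configured in Tomcat’s <code>server.xml</code>; a sketch of an entry that also matches the Facebook agent (the exact regular expression in our configuration may differ):</p>

<pre><code><!-- inside the <Host> element in server.xml -->
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*crawl.*|.*facebookexternalhit.*"
       sessionInactiveInterval="60"/>
</code></pre></li>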
|
||||
|
||||
<h2 id="2018-11-06">2018-11-06</h2>
|
||||
@ -410,14 +396,13 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
|
||||
<ul>
|
||||
<li>I updated all the <a href="https://github.com/ilri/DSpace/wiki/Scripts">DSpace helper Python scripts</a> to validate against PEP 8 using Flake8</li>
|
||||
<li>While I was updating the <a href="https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50">rest-find-collections.py</a> script I noticed it was using <code>expand=all</code> to get the collection and community IDs</li>
|
||||
<li>I realized I actually only need <code>expand=collections,subCommunities</code>, and I wanted to see how much overhead the extra expands created so I did three runs of each:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I realized I actually only need <code>expand=collections,subCommunities</code>, and I wanted to see how much overhead the extra expands created so I did three runs of each:</p>
|
||||
|
||||
<pre><code>$ time ./rest-find-collections.py 10568/27629 --rest-url https://dspacetest.cgiar.org/rest
|
||||
</code></pre>
|
||||
</code></pre></li>
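<li><p>The only difference between the two runs is the <code>expand</code> parameter that the script sends to the REST API, for example (the community ID here is just illustrative):</p>

<pre><code>$ curl -s 'https://dspacetest.cgiar.org/rest/communities/123?expand=all'
$ curl -s 'https://dspacetest.cgiar.org/rest/communities/123?expand=collections,subCommunities'
</code></pre></li>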
|
||||
|
||||
<ul>
|
||||
<li>Average time with all expands was 14.3 seconds, and 12.8 seconds with <code>collections,subCommunities</code>, so <strong>1.5 seconds difference</strong>!</li>
|
||||
<li><p>Average time with all expands was 14.3 seconds, and 12.8 seconds with <code>collections,subCommunities</code>, so <strong>1.5 seconds difference</strong>!</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-11-07">2018-11-07</h2>
|
||||
@ -482,55 +467,51 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
|
||||
<h2 id="2018-11-19">2018-11-19</h2>
|
||||
|
||||
<ul>
|
||||
<li>Testing corrections and deletions for AGROVOC (<code>dc.subject</code>) that Sisay and Peter were working on earlier this month:</li>
|
||||
</ul>
|
||||
<li><p>Testing corrections and deletions for AGROVOC (<code>dc.subject</code>) that Sisay and Peter were working on earlier this month:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i 2018-11-19-correct-agrovoc.csv -f dc.subject -t correct -m 57 -db dspace -u dspace -p 'fuu' -d
|
||||
$ ./delete-metadata-values.py -i 2018-11-19-delete-agrovoc.csv -f dc.subject -m 57 -db dspace -u dspace -p 'fuu' -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Then I ran them on both CGSpace and DSpace Test, and started a full Discovery re-index on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>Then I ran them on both CGSpace and DSpace Test, and started a full Discovery re-index on CGSpace:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Generate a new list of the top 1500 AGROVOC subjects on CGSpace to send to Peter and Sisay:</li>
|
||||
</ul>
|
||||
<li><p>Generate a new list of the top 1500 AGROVOC subjects on CGSpace to send to Peter and Sisay:</p>
|
||||
|
||||
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-11-19-top-1500-subject.csv WITH CSV HEADER;
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-11-20">2018-11-20</h2>
|
||||
|
||||
<ul>
|
||||
<li>The Discovery re-indexing on CGSpace never finished yesterday… the command died after six minutes</li>
|
||||
<li>The <code>dspace.log.2018-11-19</code> shows this at the time:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The <code>dspace.log.2018-11-19</code> shows this at the time:</p>
|
||||
|
||||
<pre><code>2018-11-19 15:23:04,221 ERROR com.atmire.dspace.discovery.AtmireSolrService @ DSpace kernel cannot be null
|
||||
java.lang.IllegalStateException: DSpace kernel cannot be null
|
||||
at org.dspace.utils.DSpace.getServiceManager(DSpace.java:63)
|
||||
at org.dspace.utils.DSpace.getSingletonService(DSpace.java:87)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:102)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:815)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:884)
|
||||
at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
|
||||
at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
at org.dspace.utils.DSpace.getServiceManager(DSpace.java:63)
|
||||
at org.dspace.utils.DSpace.getSingletonService(DSpace.java:87)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:102)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:815)
|
||||
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:884)
|
||||
at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
|
||||
at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
2018-11-19 15:23:04,223 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (4629 of 76007): 72731
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I looked in the Solr log around that time and I don’t see anything…</li>
|
||||
<li>Working on Udana’s WLE records from last month, first the sixteen records in <a href="https://dspacetest.cgiar.org/handle/10568/108254">2018-11-20 RDL Temp</a>
|
||||
<li><p>I looked in the Solr log around that time and I don’t see anything…</p></li>
|
||||
|
||||
<li><p>Working on Udana’s WLE records from last month, first the sixteen records in <a href="https://dspacetest.cgiar.org/handle/10568/108254">2018-11-20 RDL Temp</a></p>
|
||||
|
||||
<ul>
|
||||
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81592">Restoring Degraded Landscapes collection</a></li>
|
||||
@ -543,7 +524,8 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
|
||||
<li>remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: <code>value.replace('�','')</code></li>
|
||||
<li>add dc.rights to some fields that I noticed while checking DOIs</li>
|
||||
</ul></li>
|
||||
<li>Then the 24 records in <a href="https://dspacetest.cgiar.org/handle/10568/108271">2018-11-20 VRC Temp</a>
|
||||
|
||||
<li><p>Then the 24 records in <a href="https://dspacetest.cgiar.org/handle/10568/108271">2018-11-20 VRC Temp</a></p>
|
||||
|
||||
<ul>
|
||||
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81589">Variability, Risks and Competing Uses collection</a></li>
|
||||
@ -575,61 +557,61 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
|
||||
|
||||
<ul>
|
||||
<li><a href="https://cgspace.cgiar.org/handle/10568/97709">This WLE item</a> is issued on 2018-10 and accessioned on 2018-10-22 but does not show up in the <a href="https://cgspace.cgiar.org/handle/10568/41888">WLE R4D Learning Series</a> collection on CGSpace for some reason, and therefore does not show up on the WLE publication website</li>
|
||||
<li>I tried to remove that collection from Discovery and do a simple re-index:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I tried to remove that collection from Discovery and do a simple re-index:</p>
|
||||
|
||||
<pre><code>$ dspace index-discovery -r 10568/41888
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>… but the item still doesn’t appear in the collection</li>
|
||||
<li>Now I will try a full Discovery re-index:</li>
|
||||
</ul>
|
||||
<li><p>… but the item still doesn’t appear in the collection</p></li>
|
||||
|
||||
<li><p>Now I will try a full Discovery re-index:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Ah, Marianne had set the item as private when she uploaded it, so it was still private</li>
|
||||
<li>I made it public and now it shows up in the collection list</li>
|
||||
<li>More work on the AReS terms of reference for CodeObia</li>
|
||||
<li>Erica from AgriKnowledge emailed me to say that they have implemented the changes in their item page UI so that they include the permanent identifier on items harvested from CGSpace, for example: <a href="https://www.agriknowledge.org/concern/generics/wd375w33s">https://www.agriknowledge.org/concern/generics/wd375w33s</a></li>
|
||||
<li><p>Ah, Marianne had set the item as private when she uploaded it, so it was still private</p></li>
|
||||
|
||||
<li><p>I made it public and now it shows up in the collection list</p></li>
|
||||
|
||||
<li><p>More work on the AReS terms of reference for CodeObia</p></li>
|
||||
|
||||
<li><p>Erica from AgriKnowledge emailed me to say that they have implemented the changes in their item page UI so that they include the permanent identifier on items harvested from CGSpace, for example: <a href="https://www.agriknowledge.org/concern/generics/wd375w33s">https://www.agriknowledge.org/concern/generics/wd375w33s</a></p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-11-27">2018-11-27</h2>
|
||||
|
||||
<ul>
|
||||
<li>Linode alerted me that the outbound traffic rate on CGSpace (linode18) was very high</li>
|
||||
<li>The top users this morning are:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The top users this morning are:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "27/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
229 46.101.86.248
|
||||
261 66.249.64.61
|
||||
447 66.249.64.59
|
||||
541 207.46.13.77
|
||||
548 40.77.167.97
|
||||
564 35.237.175.180
|
||||
595 40.77.167.135
|
||||
611 157.55.39.91
|
||||
4564 205.186.128.185
|
||||
4564 70.32.83.92
|
||||
</code></pre>
|
||||
229 46.101.86.248
|
||||
261 66.249.64.61
|
||||
447 66.249.64.59
|
||||
541 207.46.13.77
|
||||
548 40.77.167.97
|
||||
564 35.237.175.180
|
||||
595 40.77.167.135
|
||||
611 157.55.39.91
|
||||
4564 205.186.128.185
|
||||
4564 70.32.83.92
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>We know 70.32.83.92 is the CCAFS harvester on MediaTemple, but 205.186.128.185 is new and appears to be another CCAFS harvester</li>
|
||||
<li>I think we might want to prune some old accounts from CGSpace, perhaps users who haven’t logged in in the last two years would be a conservative bunch:</li>
|
||||
</ul>
|
||||
<li><p>We know 70.32.83.92 is the CCAFS harvester on MediaTemple, but 205.186.128.185 is new and appears to be another CCAFS harvester</p></li>
|
||||
|
||||
<li><p>I think we might want to prune some old accounts from CGSpace, perhaps users who haven’t logged in in the last two years would be a conservative bunch:</p>
|
||||
|
||||
<pre><code>$ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 | wc -l
|
||||
409
|
||||
$ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This deleted about 380 users, skipping those who have submissions in the repository</li>
|
||||
<li>Judy Kimani was having problems taking tasks in the <a href="https://cgspace.cgiar.org/handle/10568/78">ILRI project reports, papers and documents</a> collection again
|
||||
<li><p>This deleted about 380 users, skipping those who have submissions in the repository</p></li>
|
||||
|
||||
<li><p>Judy Kimani was having problems taking tasks in the <a href="https://cgspace.cgiar.org/handle/10568/78">ILRI project reports, papers and documents</a> collection again</p>
|
||||
|
||||
<ul>
|
||||
<li>The workflow step 1 (accept/reject) is now undefined for some reason</li>
|
||||
@ -637,7 +619,8 @@ $ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
|
||||
<li>Since then it looks like the group was deleted, so she no longer had permission to take or leave the tasks in her pool</li>
|
||||
<li>We added her back to the group, then she was able to take the tasks, and then we removed the group again, as we generally don’t use this step in CGSpace</li>
|
||||
</ul></li>
|
||||
<li>Help Marianne troubleshoot an issue with items in their WLE collections and the WLE publications website</li>
|
||||
|
||||
<li><p>Help Marianne troubleshoot an issue with items in their WLE collections and the WLE publications website</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-11-28">2018-11-28</h2>
|
||||
|
@ -39,7 +39,7 @@ Then I ran all system updates and restarted the server
|
||||
|
||||
I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -133,49 +133,45 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
<li>The error when I try to manually run the media filter for one item from the command line:</li>
|
||||
</ul>
|
||||
<li><p>The error when I try to manually run the media filter for one item from the command line:</p>
|
||||
|
||||
<pre><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d" "-f/tmp/magick-129895Bmp44lvUfxo" "-f/tmp/magick-12989C0QFG51fktLF"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
|
||||
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-12989PcFN0DnJOej7%d" "-f/tmp/magick-129895Bmp44lvUfxo" "-f/tmp/magick-12989C0QFG51fktLF"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
|
||||
at org.im4java.core.Info.getBaseInfo(Info.java:360)
|
||||
at org.im4java.core.Info.<init>(Info.java:151)
|
||||
at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142)
|
||||
at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24)
|
||||
at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
</code></pre>
|
||||
at org.im4java.core.Info.getBaseInfo(Info.java:360)
|
||||
at org.im4java.core.Info.<init>(Info.java:151)
|
||||
at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142)
|
||||
at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24)
|
||||
at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>A comment on <a href="https://stackoverflow.com/questions/53560755/ghostscript-9-26-update-breaks-imagick-readimage-for-multipage-pdf">StackOverflow question</a> from yesterday suggests it might be a bug with the <code>pngalpha</code> device in Ghostscript and <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">links to an upstream bug</a></li>
|
||||
<li>I think we need to wait for a fix from Ubuntu</li>
|
||||
<li>For what it’s worth, I get the same error on my local Arch Linux environment with Ghostscript 9.26:</li>
|
||||
</ul>
|
||||
<li><p>A comment on <a href="https://stackoverflow.com/questions/53560755/ghostscript-9-26-update-breaks-imagick-readimage-for-multipage-pdf">StackOverflow question</a> from yesterday suggests it might be a bug with the <code>pngalpha</code> device in Ghostscript and <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">links to an upstream bug</a></p></li>
|
||||
|
||||
<li><p>I think we need to wait for a fix from Ubuntu</p></li>
|
||||
|
||||
<li><p>For what it’s worth, I get the same error on my local Arch Linux environment with Ghostscript 9.26:</p>
|
||||
|
||||
<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
|
||||
DEBUG: FC_WEIGHT didn't match
|
||||
zsh: segmentation fault (core dumped) gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>When I replace the <code>pngalpha</code> device with <code>png16m</code> as suggested in the StackOverflow comments it works:</li>
|
||||
</ul>
|
||||
<li><p>When I replace the <code>pngalpha</code> device with <code>png16m</code> as suggested in the StackOverflow comments it works:</p>
|
||||
|
||||
<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=png16m -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Food\ safety\ Kenya\ fruits.pdf
|
||||
DEBUG: FC_WEIGHT didn't match
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Start proofing the latest round of 226 IITA archive records that Bosede sent last week and Sisay uploaded to DSpace Test this weekend (<a href="https://dspacetest.cgiar.org/handle/10568/108298">IITA_Dec_1_1997 aka Daniel1807</a>)
|
||||
<li><p>Start proofing the latest round of 226 IITA archive records that Bosede sent last week and Sisay uploaded to DSpace Test this weekend (<a href="https://dspacetest.cgiar.org/handle/10568/108298">IITA_Dec_1_1997 aka Daniel1807</a>)</p>
|
||||
|
||||
<ul>
|
||||
<li>One item missing the authorship type</li>
|
||||
@ -189,60 +185,54 @@ DEBUG: FC_WEIGHT didn't match
|
||||
<li>Six items had encoding errors in French text so I will ask Bosede to re-do them carefully</li>
|
||||
<li>Correct and normalize a few AGROVOC subjects</li>
|
||||
</ul></li>
|
||||
<li>Expand my “encoding error” detection GREL to include <code>~</code> as I saw a lot of that in some copy pasted French text recently:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Expand my “encoding error” detection GREL to include <code>~</code> as I saw a lot of that in some copy pasted French text recently:</p>
|
||||
|
||||
<pre><code>or(
|
||||
isNotNull(value.match(/.*\uFFFD.*/)),
|
||||
isNotNull(value.match(/.*\u00A0.*/)),
|
||||
isNotNull(value.match(/.*\u200A.*/)),
|
||||
isNotNull(value.match(/.*\u2019.*/)),
|
||||
isNotNull(value.match(/.*\u00b4.*/)),
|
||||
isNotNull(value.match(/.*\u007e.*/))
|
||||
isNotNull(value.match(/.*\uFFFD.*/)),
|
||||
isNotNull(value.match(/.*\u00A0.*/)),
|
||||
isNotNull(value.match(/.*\u200A.*/)),
|
||||
isNotNull(value.match(/.*\u2019.*/)),
|
||||
isNotNull(value.match(/.*\u00b4.*/)),
|
||||
isNotNull(value.match(/.*\u007e.*/))
|
||||
)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-12-03">2018-12-03</h2>
|
||||
|
||||
<ul>
|
||||
<li>I looked at the DSpace Ghostscript issue more and it seems to only affect certain PDFs…</li>
|
||||
<li>I can successfully generate a thumbnail for another recent item (<a href="https://hdl.handle.net/10568/98394"><sup>10568</sup>⁄<sub>98394</sub></a>), but not for <a href="https://hdl.handle.net/10568/98390"><sup>10568</sup>⁄<sub>98930</sub></a></li>
|
||||
<li>Even manually on my Arch Linux desktop with ghostscript 9.26-1 and the <code>pngalpha</code> device, I can generate a thumbnail for the first one (<sup>10568</sup>⁄<sub>98394</sub>):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Even manually on my Arch Linux desktop with ghostscript 9.26-1 and the <code>pngalpha</code> device, I can generate a thumbnail for the first one (<sup>10568</sup>⁄<sub>98394</sub>):</p>
|
||||
|
||||
<pre><code>$ gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72x72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/out%d -f/home/aorth/Desktop/Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So it seems to be something about the PDFs themselves, perhaps related to alpha support?</li>
|
||||
<li>The first item (<sup>10568</sup>⁄<sub>98394</sub>) has the following information:</li>
|
||||
</ul>
|
||||
<li><p>So it seems to be something about the PDFs themselves, perhaps related to alpha support?</p></li>
|
||||
|
||||
<li><p>The first item (<sup>10568</sup>⁄<sub>98394</sub>) has the following information:</p>
|
||||
|
||||
<pre><code>$ identify Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf\[0\]
|
||||
Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf[0]=>Info Note Mainstreaming gender and social differentiation into CCAFS research activities in West Africa-converted.pdf PDF 595x841 595x841+0+0 16-bit sRGB 107443B 0.000u 0:00.000
|
||||
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And wow, I can’t even run ImageMagick’s <code>identify</code> on the first page of the second item (<sup>10568</sup>⁄<sub>98930</sub>):</li>
|
||||
</ul>
|
||||
<li><p>And wow, I can’t even run ImageMagick’s <code>identify</code> on the first page of the second item (<sup>10568</sup>⁄<sub>98930</sub>):</p>
|
||||
|
||||
<pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf\[0\]
|
||||
zsh: abort (core dumped) identify Food\ safety\ Kenya\ fruits.pdf\[0\]
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But with GraphicsMagick’s <code>identify</code> it works:</li>
|
||||
</ul>
|
||||
<li><p>But with GraphicsMagick’s <code>identify</code> it works:</p>
|
||||
|
||||
<pre><code>$ gm identify Food\ safety\ Kenya\ fruits.pdf\[0\]
|
||||
DEBUG: FC_WEIGHT didn't match
|
||||
Food safety Kenya fruits.pdf PDF 612x792+0+0 DirectClass 8-bit 1.4Mi 0.000u 0m:0.000002s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Interesting that ImageMagick’s <code>identify</code> <em>does</em> work if you do not specify a page, perhaps as <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">alluded to in the recent Ghostscript bug report</a>:</li>
|
||||
</ul>
|
||||
<li><p>Interesting that ImageMagick’s <code>identify</code> <em>does</em> work if you do not specify a page, perhaps as <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">alluded to in the recent Ghostscript bug report</a>:</p>
|
||||
|
||||
<pre><code>$ identify Food\ safety\ Kenya\ fruits.pdf
|
||||
Food safety Kenya fruits.pdf[0] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
|
||||
@ -251,69 +241,60 @@ Food safety Kenya fruits.pdf[2] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010
|
||||
Food safety Kenya fruits.pdf[3] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
|
||||
Food safety Kenya fruits.pdf[4] PDF 612x792 612x792+0+0 16-bit sRGB 64626B 0.010u 0:00.009
|
||||
identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/1746.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>As I expected, ImageMagick cannot generate a thumbnail, but GraphicsMagick can (though it looks like crap):</li>
|
||||
</ul>
|
||||
<li><p>As I expected, ImageMagick cannot generate a thumbnail, but GraphicsMagick can (though it looks like crap):</p>
|
||||
|
||||
<pre><code>$ convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
|
||||
zsh: abort (core dumped) convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten
|
||||
$ gm convert Food\ safety\ Kenya\ fruits.pdf\[0\] -thumbnail 600x600 -flatten Food\ safety\ Kenya\ fruits.pdf.jpg
|
||||
DEBUG: FC_WEIGHT didn't match
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I inspected the troublesome PDF using <a href="http://jhove.openpreservation.org/">jhove</a> and noticed that it is using <code>ISO PDF/A-1, Level B</code> and the other one doesn’t list a profile, though I don’t think this is relevant</li>
|
||||
<li>I found another item that fails when generating a thumbnail (<a href="https://hdl.handle.net/10568/98391"><sup>10568</sup>⁄<sub>98391</sub></a>), DSpace complains:</li>
|
||||
</ul>
|
||||
<li><p>I inspected the troublesome PDF using <a href="http://jhove.openpreservation.org/">jhove</a> and noticed that it is using <code>ISO PDF/A-1, Level B</code> and the other one doesn’t list a profile, though I don’t think this is relevant</p></li>
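<li><p>A jhove run like the following reports the PDF profile (module name as in the standard JHOVE distribution, file as used above):</p>

<pre><code>$ jhove -m PDF-hul Food\ safety\ Kenya\ fruits.pdf
</code></pre></li>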
|
||||
|
||||
<li><p>I found another item that fails when generating a thumbnail (<a href="https://hdl.handle.net/10568/98391"><sup>10568</sup>⁄<sub>98391</sub></a>), DSpace complains:</p>
|
||||
|
||||
<pre><code>org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d" "-f/tmp/magick-14296Q0rJjfCeIj3w" "-f/tmp/magick-14296k_K6MWqwvpDm"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
|
||||
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d" "-f/tmp/magick-14296Q0rJjfCeIj3w" "-f/tmp/magick-14296k_K6MWqwvpDm"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
|
||||
at org.im4java.core.Info.getBaseInfo(Info.java:360)
|
||||
at org.im4java.core.Info.<init>(Info.java:151)
|
||||
at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142)
|
||||
at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24)
|
||||
at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
at org.im4java.core.Info.getBaseInfo(Info.java:360)
|
||||
at org.im4java.core.Info.<init>(Info.java:151)
|
||||
at org.dspace.app.mediafilter.ImageMagickThumbnailFilter.getImageFile(ImageMagickThumbnailFilter.java:142)
|
||||
at org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.getDestinationStream(ImageMagickPdfThumbnailFilter.java:24)
|
||||
at org.dspace.app.mediafilter.FormatFilter.processBitstream(FormatFilter.java:170)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:475)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:429)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:401)
|
||||
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:237)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
||||
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
||||
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
||||
at java.lang.reflect.Method.invoke(Method.java:498)
|
||||
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
Caused by: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d" "-f/tmp/magick-14296Q0rJjfCeIj3w" "-f/tmp/magick-14296k_K6MWqwvpDm"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
|
||||
at org.im4java.core.ImageCommand.run(ImageCommand.java:219)
|
||||
at org.im4java.core.Info.getBaseInfo(Info.java:342)
|
||||
... 14 more
|
||||
at org.im4java.core.ImageCommand.run(ImageCommand.java:219)
|
||||
at org.im4java.core.Info.getBaseInfo(Info.java:342)
|
||||
... 14 more
|
||||
Caused by: org.im4java.core.CommandException: identify: FailedToExecuteCommand `"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" -dFirstPage=1 -dLastPage=1 "-sOutputFile=/tmp/magick-142966vQs5Di64ntH%d" "-f/tmp/magick-14296Q0rJjfCeIj3w" "-f/tmp/magick-14296k_K6MWqwvpDm"' (-1) @ error/delegate.c/ExternalDelegateCommand/461.
|
||||
at org.im4java.core.ImageCommand.finished(ImageCommand.java:253)
|
||||
at org.im4java.process.ProcessStarter.run(ProcessStarter.java:314)
|
||||
at org.im4java.core.ImageCommand.run(ImageCommand.java:215)
|
||||
... 15 more
|
||||
</code></pre>
|
||||
at org.im4java.core.ImageCommand.finished(ImageCommand.java:253)
|
||||
at org.im4java.process.ProcessStarter.run(ProcessStarter.java:314)
|
||||
at org.im4java.core.ImageCommand.run(ImageCommand.java:215)
|
||||
... 15 more
|
||||
</code></pre></li>
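<li><p>The delegate that fails in that stack trace is just Ghostscript, so one way to narrow things down might be to run <code>gs</code> directly on the first page of a troublesome PDF and see whether it also chokes (a sketch, re-using the same device and resolution as the delegate command, with the first problematic file as an example):</p>

<pre><code>$ gs -q -dNOPAUSE -dBATCH -sDEVICE=pngalpha -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r72 -dFirstPage=1 -dLastPage=1 -sOutputFile=/tmp/fruits-page1.png Food\ safety\ Kenya\ fruits.pdf
</code></pre></li>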
|
||||
|
||||
<ul>
|
||||
<li>And on my Arch Linux environment ImageMagick’s <code>convert</code> also segfaults:</li>
|
||||
</ul>
|
||||
<li><p>And on my Arch Linux environment ImageMagick’s <code>convert</code> also segfaults:</p>
|
||||
|
||||
<pre><code>$ convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
|
||||
zsh: abort (core dumped) convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] x60
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But GraphicsMagick’s <code>convert</code> works:</li>
|
||||
</ul>
|
||||
<li><p>But GraphicsMagick’s <code>convert</code> works:</p>
|
||||
|
||||
<pre><code>$ gm convert bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf\[0\] -thumbnail x600 -flatten bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf.jpg
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>So far the only thing that stands out is that the two files that don’t work were created with Microsoft Office 2016:</li>
|
||||
</ul>
|
||||
<li><p>So far the only thing that stands out is that the two files that don’t work were created with Microsoft Office 2016:</p>
|
||||
|
||||
<pre><code>$ pdfinfo bnfb_biofortification\ Module_Participants\ Guide\ 2018.pdf | grep -E '^(Creator|Producer)'
|
||||
Creator: Microsoft® Word 2016
|
||||
@ -321,134 +302,118 @@ Producer: Microsoft® Word 2016
|
||||
$ pdfinfo Food\ safety\ Kenya\ fruits.pdf | grep -E '^(Creator|Producer)'
|
||||
Creator: Microsoft® Word 2016
|
||||
Producer: Microsoft® Word 2016
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And the one that works was created with Office 365:</li>
|
||||
</ul>
|
||||
<li><p>And the one that works was created with Office 365:</p>
|
||||
|
||||
<pre><code>$ pdfinfo Info\ Note\ Mainstreaming\ gender\ and\ social\ differentiation\ into\ CCAFS\ research\ activities\ in\ West\ Africa-converted.pdf | grep -E '^(Creator|Producer)'
|
||||
Creator: Microsoft® Word for Office 365
|
||||
Producer: Microsoft® Word for Office 365
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I remembered an old technique I was using to generate thumbnails in 2015 using Inkscape followed by ImageMagick or GraphicsMagick:</li>
|
||||
</ul>
|
||||
<li><p>I remembered an old technique I was using to generate thumbnails in 2015 using Inkscape followed by ImageMagick or GraphicsMagick:</p>
|
||||
|
||||
<pre><code>$ inkscape Food\ safety\ Kenya\ fruits.pdf -z --export-dpi=72 --export-area-drawing --export-png='cover.png'
|
||||
$ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
|
||||
</code></pre>
|
||||
</code></pre></li>
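<li><p>If this workaround ever needs to be applied to more than one PDF, a small untested loop along the same lines would do (same Inkscape and GraphicsMagick flags as above):</p>

<pre><code>$ for pdf in *.pdf; do
    inkscape "$pdf" -z --export-dpi=72 --export-area-drawing --export-png="${pdf%.pdf}.png"
    gm convert -resize x600 -flatten -quality 85 "${pdf%.pdf}.png" "${pdf%.pdf}.jpg"
  done
</code></pre></li>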
|
||||
|
||||
<ul>
|
||||
<li>I’ve tried a few times this week to register for the <a href="https://www.evisa.gov.et/">Ethiopian eVisa website</a>, but it is never successful</li>
|
||||
<li>In the end I tried one last time to just apply without registering and it was apparently successful</li>
|
||||
<li>Testing DSpace 5.8 (<code>5_x-prod</code> branch) in an Ubuntu 18.04 VM with Tomcat 8.5 and had some issues:
|
||||
<li><p>I’ve tried a few times this week to register for the <a href="https://www.evisa.gov.et/">Ethiopian eVisa website</a>, but it is never successful</p></li>
|
||||
|
||||
<li><p>In the end I tried one last time to just apply without registering and it was apparently successful</p></li>
|
||||
|
||||
<li><p>Testing DSpace 5.8 (<code>5_x-prod</code> branch) in an Ubuntu 18.04 VM with Tomcat 8.5 and had some issues:</p>
|
||||
|
||||
<ul>
|
||||
<li>JSPUI shows an internal error (log shows something about tag cloud, though, so might be unrelated)</li>
|
||||
<li>Atmire Listings and Reports, which use JSPUI, asks you to log in again and then doesn’t work</li>
|
||||
<li>Content and Usage Analysis doesn’t show up in the sidebar after logging in</li>
|
||||
<li>I can navigate to <a href="https://dspacetest.cgiar.org/atmire/reporting-suite/usage-graph-editor">/atmire/reporting-suite/usage-graph-editor</a>, but it’s only the Atmire theme and a “page not found” message</li>
|
||||
<li>Related messages from dspace.log:</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
<li><p>Related messages from dspace.log:</p>
|
||||
|
||||
<pre><code>2018-12-03 15:44:00,030 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-datatables not found
|
||||
2018-12-03 15:44:03,390 ERROR com.atmire.app.webui.servlet.ExportServlet @ Error converter plugin not found: interface org.infoCon.ConverterPlugin
|
||||
...
|
||||
2018-12-03 15:45:01,667 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-listing-and-reports not found
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
|
||||
<ul>
|
||||
<li>I tested it on my local environment with Tomcat 8.5.34 and the JSPUI application still has an error (again, the logs show something about tag cloud, so it might be unrelated), and the Listings and Reports still asks you to log in again, despite already being logged in via XMLUI, but it does appear to work (I generated a report and exported a PDF)</li>
|
||||
<li>I think the errors about missing Atmire components must be important, here on my local machine as well (though not the one about atmire-listings-and-reports):</li>
|
||||
</ul>
|
||||
<li><p>I tested it on my local environment with Tomcat 8.5.34 and the JSPUI application still has an error (again, the logs show something about tag cloud, so it might be unrelated), and the Listings and Reports still asks you to log in again, despite already being logged in via XMLUI, but it does appear to work (I generated a report and exported a PDF)</p></li>
|
||||
|
||||
<li><p>I think the errors about missing Atmire components must be important, here on my local machine as well (though not the one about atmire-listings-and-reports):</p>
|
||||
|
||||
<pre><code>2018-12-03 16:44:00,009 WARN org.dspace.core.ConfigurationManager @ Requested configuration module: atmire-datatables not found
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This has got to be partly Ubuntu’s Tomcat packaging, and partly DSpace 5.x’s readiness for Tomcat 8.5…?</li>
|
||||
<li><p>This has got to be partly Ubuntu’s Tomcat packaging, and partly DSpace 5.x’s readiness for Tomcat 8.5…?</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-12-04">2018-12-04</h2>
|
||||
|
||||
<ul>
|
||||
<li>Last night Linode sent a message that the load on CGSpace (linode18) was too high, here’s a list of the top users at the time and throughout the day:</li>
|
||||
</ul>
|
||||
<li><p>Last night Linode sent a message that the load on CGSpace (linode18) was too high, here’s a list of the top users at the time and throughout the day:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Dec/2018:1(5|6|7|8)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
225 40.77.167.142
|
||||
226 66.249.64.63
|
||||
232 46.101.86.248
|
||||
285 45.5.186.2
|
||||
333 54.70.40.11
|
||||
411 193.29.13.85
|
||||
476 34.218.226.147
|
||||
962 66.249.70.27
|
||||
1193 35.237.175.180
|
||||
1450 2a01:4f8:140:3192::2
|
||||
225 40.77.167.142
|
||||
226 66.249.64.63
|
||||
232 46.101.86.248
|
||||
285 45.5.186.2
|
||||
333 54.70.40.11
|
||||
411 193.29.13.85
|
||||
476 34.218.226.147
|
||||
962 66.249.70.27
|
||||
1193 35.237.175.180
|
||||
1450 2a01:4f8:140:3192::2
|
||||
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Dec/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
1141 207.46.13.57
|
||||
1299 197.210.168.174
|
||||
1341 54.70.40.11
|
||||
1429 40.77.167.142
|
||||
1528 34.218.226.147
|
||||
1973 66.249.70.27
|
||||
2079 50.116.102.77
|
||||
2494 78.46.79.71
|
||||
3210 2a01:4f8:140:3192::2
|
||||
4190 35.237.175.180
|
||||
</code></pre>
|
||||
1141 207.46.13.57
|
||||
1299 197.210.168.174
|
||||
1341 54.70.40.11
|
||||
1429 40.77.167.142
|
||||
1528 34.218.226.147
|
||||
1973 66.249.70.27
|
||||
2079 50.116.102.77
|
||||
2494 78.46.79.71
|
||||
3210 2a01:4f8:140:3192::2
|
||||
4190 35.237.175.180
|
||||
</code></pre></li>
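<li><p>To see what one of these addresses is actually requesting, the same logs can be sliced by URL as well, for example (a sketch, assuming the default combined log format where the request path is field 7):</p>

<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep "03/Dec/2018" | grep 35.237.175.180 | awk '{print $7}' | sort | uniq -c | sort -n | tail -n 10
</code></pre></li>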
|
||||
|
||||
<ul>
|
||||
<li><code>35.237.175.180</code> is known to us (CCAFS?), and I’ve already added it to the list of bot IPs in nginx, which appears to be working:</li>
|
||||
</ul>
|
||||
<li><p><code>35.237.175.180</code> is known to us (CCAFS?), and I’ve already added it to the list of bot IPs in nginx, which appears to be working:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-12-03
|
||||
4772
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=35.237.175.180' dspace.log.2018-12-03 | sort | uniq | wc -l
|
||||
630
|
||||
</code></pre>
|
||||
</code></pre></li>
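<li><p>The nginx side of the bot IP list is essentially a lookup on the client address; a hypothetical sketch of the idea (not our actual config) using nginx’s <code>geo</code> and <code>map</code> blocks to override the user agent passed to Tomcat:</p>

<pre><code>geo $remote_addr $is_bot_ip {
    default        0;
    35.237.175.180 1;
}

map $is_bot_ip $ua {
    default $http_user_agent;
    1       'bot';
}

# ...then in the proxy configuration:
# proxy_set_header User-Agent $ua;
</code></pre></li>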
|
||||
|
||||
<ul>
|
||||
<li>I haven’t seen <code>2a01:4f8:140:3192::2</code> before. Its user agent is some new bot:</li>
|
||||
</ul>
|
||||
<li><p>I haven’t seen <code>2a01:4f8:140:3192::2</code> before. Its user agent is some new bot:</p>
|
||||
|
||||
<pre><code>Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>At least it seems the Tomcat Crawler Session Manager Valve is working to re-use the common bot XMLUI sessions:</li>
|
||||
</ul>
|
||||
<li><p>At least it seems the Tomcat Crawler Session Manager Valve is working to re-use the common bot XMLUI sessions:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=2a01:4f8:140:3192::2' dspace.log.2018-12-03
|
||||
5111
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a01:4f8:140:3192::2' dspace.log.2018-12-03 | sort | uniq | wc -l
|
||||
419
|
||||
</code></pre>
|
||||
</code></pre></li>
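<li><p>For reference, the Crawler Session Manager Valve lives in Tomcat’s <code>server.xml</code> and is basically just a user agent regex; a minimal sketch (Tomcat’s default regex, not our exact configuration) looks something like:</p>

<pre><code>&lt;Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*"
       sessionInactiveInterval="60"/&gt;
</code></pre></li>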
|
||||
|
||||
<ul>
|
||||
<li><code>78.46.79.71</code> is another host on Hetzner with the following user agent:</li>
|
||||
</ul>
|
||||
<li><p><code>78.46.79.71</code> is another host on Hetzner with the following user agent:</p>
|
||||
|
||||
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>This is not the first time a host on Hetzner has used a “normal” user agent to make thousands of requests</li>
|
||||
<li>At least it is re-using its Tomcat sessions somehow:</li>
|
||||
</ul>
|
||||
<li><p>This is not the first time a host on Hetzner has used a “normal” user agent to make thousands of requests</p></li>
|
||||
|
||||
<li><p>At least it is re-using its Tomcat sessions somehow:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03
|
||||
2044
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03 | sort | uniq | wc -l
|
||||
1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In other news, it’s good to see my re-work of the database connectivity in the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> actually caused a reduction of persistent database connections (from 1 to 0, but still!):</li>
|
||||
<li><p>In other news, it’s good to see my re-work of the database connectivity in the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> actually caused a reduction of persistent database connections (from 1 to 0, but still!):</p></li>
|
||||
</ul>
|
||||
|
||||
<p><img src="/cgspace-notes/2018/12/postgres_connections_db-month.png" alt="PostgreSQL connections day" /></p>
|
||||
@ -463,43 +428,40 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03
|
||||
|
||||
<ul>
|
||||
<li>Linode sent a message that the CPU usage of CGSpace (linode18) is too high last night</li>
|
||||
<li>I looked in the logs and there’s nothing particular going on:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I looked in the logs and there’s nothing particular going on:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "05/Dec/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
1225 157.55.39.177
|
||||
1240 207.46.13.12
|
||||
1261 207.46.13.101
|
||||
1411 207.46.13.157
|
||||
1529 34.218.226.147
|
||||
2085 50.116.102.77
|
||||
3334 2a01:7e00::f03c:91ff:fe0a:d645
|
||||
3733 66.249.70.27
|
||||
3815 35.237.175.180
|
||||
7669 54.70.40.11
|
||||
</code></pre>
|
||||
1225 157.55.39.177
|
||||
1240 207.46.13.12
|
||||
1261 207.46.13.101
|
||||
1411 207.46.13.157
|
||||
1529 34.218.226.147
|
||||
2085 50.116.102.77
|
||||
3334 2a01:7e00::f03c:91ff:fe0a:d645
|
||||
3733 66.249.70.27
|
||||
3815 35.237.175.180
|
||||
7669 54.70.40.11
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>54.70.40.11</code> is some new bot with the following user agent:</li>
|
||||
</ul>
|
||||
<li><p><code>54.70.40.11</code> is some new bot with the following user agent:</p>
|
||||
|
||||
<pre><code>Mozilla/5.0 (compatible) SemanticScholarBot (+https://www.semanticscholar.org/crawler)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But Tomcat is forcing them to re-use their Tomcat sessions with the Crawler Session Manager valve:</li>
|
||||
</ul>
|
||||
<li><p>But Tomcat is forcing them to re-use their Tomcat sessions with the Crawler Session Manager valve:</p>
|
||||
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
|
||||
6980
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05 | sort | uniq | wc -l
|
||||
1156
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>2a01:7e00::f03c:91ff:fe0a:d645</code> appears to be the CKM dev server where Danny is testing harvesting via Drupal</li>
|
||||
<li>It seems they are hitting the XMLUI’s OpenSearch a bit, but mostly on the REST API so no issues here yet</li>
|
||||
<li><code>Drupal</code> is already in the Tomcat Crawler Session Manager Valve’s regex so that’s good!</li>
|
||||
<li><p><code>2a01:7e00::f03c:91ff:fe0a:d645</code> appears to be the CKM dev server where Danny is testing harvesting via Drupal</p></li>
|
||||
|
||||
<li><p>It seems they are hitting the XMLUI’s OpenSearch a bit, but mostly on the REST API so no issues here yet</p></li>
|
||||
|
||||
<li><p><code>Drupal</code> is already in the Tomcat Crawler Session Manager Valve’s regex so that’s good!</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-12-10">2018-12-10</h2>
|
||||
@ -541,32 +503,30 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
|
||||
|
||||
<ul>
|
||||
<li>Linode alerted me twice today that the load on CGSpace (linode18) was very high</li>
|
||||
<li>Looking at the nginx logs I see a few new IPs in the top 10:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Looking at the nginx logs I see a few new IPs in the top 10:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "17/Dec/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
927 157.55.39.81
|
||||
975 54.70.40.11
|
||||
2090 50.116.102.77
|
||||
2121 66.249.66.219
|
||||
3811 35.237.175.180
|
||||
4590 205.186.128.185
|
||||
4590 70.32.83.92
|
||||
5436 2a01:4f8:173:1e85::2
|
||||
5438 143.233.227.216
|
||||
6706 94.71.244.172
|
||||
</code></pre>
|
||||
927 157.55.39.81
|
||||
975 54.70.40.11
|
||||
2090 50.116.102.77
|
||||
2121 66.249.66.219
|
||||
3811 35.237.175.180
|
||||
4590 205.186.128.185
|
||||
4590 70.32.83.92
|
||||
5436 2a01:4f8:173:1e85::2
|
||||
5438 143.233.227.216
|
||||
6706 94.71.244.172
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>94.71.244.172</code> and <code>143.233.227.216</code> are both in Greece and use the following user agent:</li>
|
||||
</ul>
|
||||
<li><p><code>94.71.244.172</code> and <code>143.233.227.216</code> are both in Greece and use the following user agent:</p>
|
||||
|
||||
<pre><code>Mozilla/3.0 (compatible; Indy Library)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I see that I added this bot to the Tomcat Crawler Session Manager valve in 2017-12 so its XMLUI sessions are getting re-used</li>
|
||||
<li><code>2a01:4f8:173:1e85::2</code> is some new bot called <code>BLEXBot/1.0</code> which should be matching the existing “bot” pattern in the Tomcat Crawler Session Manager regex</li>
|
||||
<li><p>I see that I added this bot to the Tomcat Crawler Session Manager valve in 2017-12 so its XMLUI sessions are getting re-used</p></li>
|
||||
|
||||
<li><p><code>2a01:4f8:173:1e85::2</code> is some new bot called <code>BLEXBot/1.0</code> which should be matching the existing “bot” pattern in the Tomcat Crawler Session Manager regex</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-12-18">2018-12-18</h2>
|
||||
@ -584,8 +544,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
|
||||
<h2 id="2018-12-20">2018-12-20</h2>
|
||||
|
||||
<ul>
|
||||
<li>Testing compression of PostgreSQL backups with xz and gzip:</li>
|
||||
</ul>
|
||||
<li><p>Testing compression of PostgreSQL backups with xz and gzip:</p>
|
||||
|
||||
<pre><code>$ time xz -c cgspace_2018-12-19.backup > cgspace_2018-12-19.backup.xz
|
||||
xz -c cgspace_2018-12-19.backup > cgspace_2018-12-19.backup.xz 48.29s user 0.19s system 99% cpu 48.579 total
|
||||
@ -595,43 +554,40 @@ $ ls -lh cgspace_2018-12-19.backup*
|
||||
-rw-r--r-- 1 aorth aorth 96M Dec 19 02:15 cgspace_2018-12-19.backup
|
||||
-rw-r--r-- 1 aorth aorth 94M Dec 20 11:36 cgspace_2018-12-19.backup.gz
|
||||
-rw-r--r-- 1 aorth aorth 93M Dec 20 11:35 cgspace_2018-12-19.backup.xz
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Looks like it’s really not worth it…</li>
|
||||
<li>Peter pointed out that Discovery filters for CTA subjects on item pages were not working</li>
|
||||
<li>It looks like there were some mismatches in the Discovery index names and the XMLUI configuration, so I fixed them (<a href="https://github.com/ilri/DSpace/pull/406">#406</a>)</li>
|
||||
<li>Peter asked if we could create a controlled vocabulary for publishers (<code>dc.publisher</code>)</li>
|
||||
<li>I see we have about 3500 distinct publishers:</li>
|
||||
</ul>
|
||||
<li><p>Looks like it’s really not worth it…</p></li>
|
||||
|
||||
<li><p>Peter pointed out that Discovery filters for CTA subjects on item pages were not working</p></li>
|
||||
|
||||
<li><p>It looks like there were some mismatches in the Discovery index names and the XMLUI configuration, so I fixed them (<a href="https://github.com/ilri/DSpace/pull/406">#406</a>)</p></li>
|
||||
|
||||
<li><p>Peter asked if we could create a controlled vocabulary for publishers (<code>dc.publisher</code>)</p></li>
|
||||
|
||||
<li><p>I see we have about 3500 distinct publishers:</p>
|
||||
|
||||
<pre><code># SELECT COUNT(DISTINCT(text_value)) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=39;
|
||||
count
|
||||
count
|
||||
-------
|
||||
3522
|
||||
3522
|
||||
(1 row)
|
||||
</code></pre>
|
||||
</code></pre></li>
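<li><p>Assuming metadata field 39 is <code>dc.publisher</code> as in the count above, the distinct values could be exported for review with something like:</p>

<pre><code>dspace=# \copy (SELECT DISTINCT text_value, count(*) AS count FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=39 GROUP BY text_value ORDER BY count DESC) to /tmp/2018-12-20-publishers.csv with csv;
</code></pre></li>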
|
||||
|
||||
<ul>
|
||||
<li>I reverted the metadata changes related to “Unrestricted Access” and “Restricted Access” on DSpace Test because we’re not pushing forward with the new status terms for now</li>
|
||||
<li>Purge remaining Oracle Java 8 stuff from CGSpace (linode18) since we migrated to OpenJDK a few months ago:</li>
|
||||
</ul>
|
||||
<li><p>I reverted the metadata changes related to “Unrestricted Access” and “Restricted Access” on DSpace Test because we’re not pushing forward with the new status terms for now</p></li>
|
||||
|
||||
<li><p>Purge remaining Oracle Java 8 stuff from CGSpace (linode18) since we migrated to OpenJDK a few months ago:</p>
|
||||
|
||||
<pre><code># dpkg -P oracle-java8-installer oracle-java8-set-default
|
||||
</code></pre>
|
||||
</code></pre></li>
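<li><p>Afterwards it might be worth double checking that nothing Oracle-related is left behind, for example:</p>

<pre><code># dpkg -l | grep -i oracle-java
</code></pre></li>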
|
||||
|
||||
<ul>
|
||||
<li>Update usage rights on CGSpace as we agreed with Maria Garruccio and Peter last month:</li>
|
||||
</ul>
|
||||
<li><p>Update usage rights on CGSpace as we agreed with Maria Garruccio and Peter last month:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-11-27-update-rights.csv -f dc.rights -t correct -m 53 -db dspace -u dspace -p 'fuu' -d
|
||||
Connected to database.
|
||||
Fixed 466 occurences of: Copyrighted; Any re-use allowed
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Upgrade PostgreSQL on CGSpace (linode18) from 9.5 to 9.6:</li>
|
||||
</ul>
|
||||
<li><p>Upgrade PostgreSQL on CGSpace (linode18) from 9.5 to 9.6:</p>
|
||||
|
||||
<pre><code># apt install postgresql-9.6 postgresql-client-9.6 postgresql-contrib-9.6 postgresql-server-dev-9.6
|
||||
# pg_ctlcluster 9.5 main stop
|
||||
@ -642,74 +598,69 @@ Fixed 466 occurences of: Copyrighted; Any re-use allowed
|
||||
# pg_upgradecluster 9.5 main
|
||||
# pg_dropcluster 9.5 main
|
||||
# dpkg -l | grep postgresql | grep 9.5 | awk '{print $2}' | xargs dpkg -r
|
||||
</code></pre>
|
||||
</code></pre></li>
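<li><p>Afterwards it’s worth double checking that only the 9.6 cluster remains online and that the database is reachable, for example:</p>

<pre><code># pg_lsclusters
$ psql -h localhost -U dspace dspace -c 'SELECT version();'
</code></pre></li>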
|
||||
|
||||
<ul>
|
||||
<li>I’ve been running PostgreSQL 9.6 for months on my local development and public DSpace Test (linode19) environments</li>
|
||||
<li>Run all system updates on CGSpace (linode18) and restart the server</li>
|
||||
<li>Try to run the DSpace cleanup script on CGSpace (linode18), but I get some errors about foreign key constraints:</li>
|
||||
</ul>
|
||||
<li><p>I’ve been running PostgreSQL 9.6 for months on my local development and public DSpace Test (linode19) environments</p></li>
|
||||
|
||||
<li><p>Run all system updates on CGSpace (linode18) and restart the server</p></li>
|
||||
|
||||
<li><p>Try to run the DSpace cleanup script on CGSpace (linode18), but I get some errors about foreign key constraints:</p>
|
||||
|
||||
<pre><code>$ dspace cleanup -v
|
||||
- Deleting bitstream information (ID: 158227)
|
||||
- Deleting bitstream record from database (ID: 158227)
|
||||
- Deleting bitstream information (ID: 158227)
|
||||
- Deleting bitstream record from database (ID: 158227)
|
||||
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
|
||||
Detail: Key (bitstream_id)=(158227) is still referenced from table "bundle".
|
||||
Detail: Key (bitstream_id)=(158227) is still referenced from table "bundle".
|
||||
...
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>As always, the solution is to delete those IDs manually in PostgreSQL:</li>
|
||||
</ul>
|
||||
<li><p>As always, the solution is to delete those IDs manually in PostgreSQL:</p>
|
||||
|
||||
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (158227, 158251);'
|
||||
UPDATE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
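<li><p>For reference, the offending references can be found first with a query like this, before setting them to NULL:</p>

<pre><code>$ psql dspace -c 'SELECT bundle_id, primary_bitstream_id FROM bundle WHERE primary_bitstream_id IN (158227, 158251);'
</code></pre></li>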
|
||||
|
||||
<ul>
|
||||
<li>After all that I started a full Discovery reindex to get the index name changes and rights updates</li>
|
||||
<li><p>After all that I started a full Discovery reindex to get the index name changes and rights updates</p></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-12-29">2018-12-29</h2>
|
||||
|
||||
<ul>
|
||||
<li>CGSpace went down today for a few minutes while I was at dinner and I quickly restarted Tomcat</li>
|
||||
<li>The top IP addresses as of this evening are:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The top IP addresses as of this evening are:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "29/Dec/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
963 40.77.167.152
|
||||
987 35.237.175.180
|
||||
1062 40.77.167.55
|
||||
1464 66.249.66.223
|
||||
1660 34.218.226.147
|
||||
1801 70.32.83.92
|
||||
2005 50.116.102.77
|
||||
3218 66.249.66.219
|
||||
4608 205.186.128.185
|
||||
5585 54.70.40.11
|
||||
</code></pre>
|
||||
963 40.77.167.152
|
||||
987 35.237.175.180
|
||||
1062 40.77.167.55
|
||||
1464 66.249.66.223
|
||||
1660 34.218.226.147
|
||||
1801 70.32.83.92
|
||||
2005 50.116.102.77
|
||||
3218 66.249.66.219
|
||||
4608 205.186.128.185
|
||||
5585 54.70.40.11
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And just around the time of the alert:</li>
|
||||
</ul>
|
||||
<li><p>And just around the time of the alert:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log.1 /var/log/nginx/*.log.2.gz | grep -E "29/Dec/2018:1(6|7|8)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
115 66.249.66.223
|
||||
118 207.46.13.14
|
||||
123 34.218.226.147
|
||||
133 95.108.181.88
|
||||
137 35.237.175.180
|
||||
164 66.249.66.219
|
||||
260 157.55.39.59
|
||||
291 40.77.167.55
|
||||
312 207.46.13.129
|
||||
1253 54.70.40.11
|
||||
</code></pre>
|
||||
115 66.249.66.223
|
||||
118 207.46.13.14
|
||||
123 34.218.226.147
|
||||
133 95.108.181.88
|
||||
137 35.237.175.180
|
||||
164 66.249.66.219
|
||||
260 157.55.39.59
|
||||
291 40.77.167.55
|
||||
312 207.46.13.129
|
||||
1253 54.70.40.11
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>All these look ok (<code>54.70.40.11</code> is known to us from earlier this month and should be reusing its Tomcat sessions)</li>
|
||||
<li>So I’m not sure what was going on last night…</li>
|
||||
<li><p>All these look ok (<code>54.70.40.11</code> is known to us from earlier this month and should be reusing its Tomcat sessions)</p></li>
|
||||
|
||||
<li><p>So I’m not sure what was going on last night…</p></li>
|
||||
</ul>
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@ -16,20 +16,19 @@ A user on the dspace-tech mailing list offered some suggestions for troubleshoot
|
||||
Apparently if the item is in the workflowitem table it is submitted to a workflow
|
||||
And if it is in the workspaceitem table it is in the pre-submitted state
|
||||
|
||||
The item seems to be in a pre-submitted state, so I tried to delete it from there:
|
||||
|
||||
The item seems to be in a pre-submitted state, so I tried to delete it from there:
|
||||
|
||||
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
|
||||
DELETE 1
|
||||
|
||||
|
||||
|
||||
But after this I tried to delete the item from the XMLUI and it is still present…
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-05/" />
|
||||
<meta property="article:published_time" content="2019-05-01T07:37:43+03:00"/>
|
||||
<meta property="article:modified_time" content="2019-05-03T10:29:01+03:00"/>
|
||||
<meta property="article:modified_time" content="2019-05-03T16:33:34+03:00"/>
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="May, 2019"/>
|
||||
@ -43,17 +42,16 @@ A user on the dspace-tech mailing list offered some suggestions for troubleshoot
|
||||
Apparently if the item is in the workflowitem table it is submitted to a workflow
|
||||
And if it is in the workspaceitem table it is in the pre-submitted state
|
||||
|
||||
The item seems to be in a pre-submitted state, so I tried to delete it from there:
|
||||
|
||||
The item seems to be in a pre-submitted state, so I tried to delete it from there:
|
||||
|
||||
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
|
||||
DELETE 1
|
||||
|
||||
|
||||
|
||||
But after this I tried to delete the item from the XMLUI and it is still present…
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -63,9 +61,9 @@ But after this I tried to delete the item from the XMLUI and it is still present
|
||||
"@type": "BlogPosting",
|
||||
"headline": "May, 2019",
|
||||
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-05\/",
|
||||
"wordCount": "568",
|
||||
"wordCount": "644",
|
||||
"datePublished": "2019-05-01T07:37:43\x2b03:00",
|
||||
"dateModified": "2019-05-03T10:29:01\x2b03:00",
|
||||
"dateModified": "2019-05-03T16:33:34\x2b03:00",
|
||||
"author": {
|
||||
"@type": "Person",
|
||||
"name": "Alan Orth"
|
||||
@ -142,113 +140,123 @@ But after this I tried to delete the item from the XMLUI and it is still present
|
||||
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
|
||||
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
|
||||
</ul></li>
|
||||
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</li>
|
||||
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</p></li>
|
||||
</ul>
|
||||
|
||||
<ul>
|
||||
<li>I managed to delete the problematic item from the database
|
||||
<li><p>I managed to delete the problematic item from the database</p>
|
||||
|
||||
<ul>
|
||||
<li>First I deleted the item’s bitstream in XMLUI and then ran <code>dspace cleanup -v</code> to remove it from the assetstore</li>
|
||||
<li>Then I ran the following SQL:</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
<li><p>Then I ran the following SQL:</p>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM metadatavalue WHERE resource_id=74648;
|
||||
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
|
||||
dspace=# DELETE FROM item WHERE item_id=74648;
|
||||
</code></pre>
|
||||
|
||||
<ul>
|
||||
<li>Now the item is (hopefully) really gone and I can continue to troubleshoot the issue with REST API’s <code>/items/find-by-metadata-field</code> endpoint
|
||||
|
||||
<ul>
|
||||
<li>Of course I run into another HTTP 401 error when I continue trying the LandPortal search from last month:</li>
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
<li><p>Now the item is (hopefully) really gone and I can continue to troubleshoot the issue with REST API’s <code>/items/find-by-metadata-field</code> endpoint</p>
|
||||
|
||||
<ul>
|
||||
<li><p>Of course I run into another HTTP 401 error when I continue trying the LandPortal search from last month:</p>
|
||||
|
||||
<pre><code>$ curl -f -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": "en_US"}'
|
||||
curl: (22) The requested URL returned error: 401 Unauthorized
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
|
||||
<ul>
|
||||
<li>The DSpace log shows the item ID (because I modified the error text):</li>
|
||||
</ul>
|
||||
<li><p>The DSpace log shows the item ID (because I modified the error text):</p>
|
||||
|
||||
<pre><code>2019-05-01 11:41:11,069 ERROR org.dspace.rest.ItemsResource @ User(anonymous) has not permission to read item(id=77708)!
|
||||
</code></pre>
|
||||
</code></pre></li>
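<li><p>A quick way to see what state an item like that is in (a sketch, checking the item, <code>workspaceitem</code>, and <code>workflowitem</code> tables mentioned above):</p>

<pre><code>dspace=# SELECT item_id, in_archive, withdrawn FROM item WHERE item_id = 77708;
dspace=# SELECT item_id FROM workspaceitem WHERE item_id = 77708;
dspace=# SELECT item_id FROM workflowitem WHERE item_id = 77708;
</code></pre></li>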
|
||||
|
||||
<ul>
|
||||
<li>If I delete that one I get another, making the list of item IDs so far:
|
||||
<li><p>If I delete that one I get another, making the list of item IDs so far:</p>
|
||||
|
||||
<ul>
|
||||
<li>74648</li>
|
||||
<li>77708</li>
|
||||
<li>85079</li>
|
||||
</ul></li>
|
||||
<li>Some are in the <code>workspaceitem</code> table (pre-submission), others are in the <code>workflowitem</code> table (submitted), and others are actually approved, but withdrawn…
|
||||
|
||||
<li><p>Some are in the <code>workspaceitem</code> table (pre-submission), others are in the <code>workflowitem</code> table (submitted), and others are actually approved, but withdrawn…</p>
|
||||
|
||||
<ul>
|
||||
<li>This is actually a worthless exercise because the real issue is that the <code>/items/find-by-metadata-field</code> endpoint is simply flawed by design and shouldn’t fatally error when the search returns items the user doesn’t have permission to access</li>
|
||||
<li>It would take way too much time to try to fix the fucked up items that are in limbo by deleting them in SQL, but also, it doesn’t actually fix the problem because some items are <em>submitted</em> but <em>withdrawn</em>, so they actually have handles and everything</li>
|
||||
<li>I think the solution is to recommend people don’t use the <code>/items/find-by-metadata-field</code> endpoint</li>
|
||||
</ul></li>
|
||||
<li>CIP is asking about embedding PDF thumbnail images in their RSS feeds again
|
||||
|
||||
<li><p>CIP is asking about embedding PDF thumbnail images in their RSS feeds again</p>
|
||||
|
||||
<ul>
|
||||
<li>They asked in 2018-09 as well and I told them it wasn’t possible</li>
|
||||
<li>To make sure, I looked at <a href="https://wiki.duraspace.org/display/DSPACE/Enable+Media+RSS+Feeds">the documentation for RSS media feeds</a> and tried it, but couldn’t get it to work</li>
|
||||
<li>It seems to be geared towards iTunes and Podcasts… I dunno</li>
|
||||
</ul></li>
|
||||
<li>CIP also asked for a way to get an XML file of all their RTB journal articles on CGSpace
|
||||
|
||||
<li><p>CIP also asked for a way to get an XML file of all their RTB journal articles on CGSpace</p>
|
||||
|
||||
<ul>
|
||||
<li>I told them to use the REST API like (where <code>1179</code> is the id of the RTB journal articles collection):</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
<li><p>I told them to use the REST API like (where <code>1179</code> is the id of the RTB journal articles collection):</p>
|
||||
|
||||
<pre><code>https://cgspace.cgiar.org/rest/collections/1179/items?limit=812&expand=metadata
|
||||
</code></pre>
|
||||
</code></pre></li>
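<li><p>And since they specifically asked for XML, the REST API should return XML instead of JSON if it is requested via the <code>Accept</code> header, for example with curl (untested sketch):</p>

<pre><code>$ curl -s -H "Accept: application/xml" "https://cgspace.cgiar.org/rest/collections/1179/items?limit=812&expand=metadata" > rtb-journal-articles.xml
</code></pre></li>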
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2019-05-03">2019-05-03</h2>
|
||||
|
||||
<ul>
|
||||
<li>A user from CIAT emailed to say that CGSpace submission emails have not been working the last few weeks
|
||||
<li><p>A user from CIAT emailed to say that CGSpace submission emails have not been working the last few weeks</p>
|
||||
|
||||
<ul>
|
||||
<li>I checked the <code>dspace test-email</code> script on CGSpace and they are indeed failing:</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
<li><p>I checked the <code>dspace test-email</code> script on CGSpace and they are indeed failing:</p>
|
||||
|
||||
<pre><code>$ dspace test-email
|
||||
|
||||
About to send test email:
|
||||
- To: woohoo@cgiar.org
|
||||
- Subject: DSpace test email
|
||||
- Server: smtp.office365.com
|
||||
- To: woohoo@cgiar.org
|
||||
- Subject: DSpace test email
|
||||
- Server: smtp.office365.com
|
||||
|
||||
Error sending email:
|
||||
- Error: javax.mail.AuthenticationFailedException
|
||||
- Error: javax.mail.AuthenticationFailedException
|
||||
|
||||
Please see the DSpace documentation for assistance.
|
||||
</code></pre>
|
||||
</code></pre></li>
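<li><p>To rule out a network or TLS problem (as opposed to the credentials themselves), one could talk to the SMTP server manually, for example:</p>

<pre><code>$ openssl s_client -connect smtp.office365.com:587 -starttls smtp
</code></pre></li>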
|
||||
</ul></li>
|
||||
|
||||
<ul>
|
||||
<li>I will ask ILRI ICT to reset the password
|
||||
<li><p>I will ask ILRI ICT to reset the password</p>
|
||||
|
||||
<ul>
|
||||
<li>They reset the password and I tested it on CGSpace</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2019-05-05">2019-05-05</h2>
|
||||
|
||||
<ul>
|
||||
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
|
||||
<li>Merge changes into the <code>5_x-prod</code> branch of CGSpace:
|
||||
|
||||
<ul>
|
||||
<li>Updates to remove deprecated social media websites (Google+ and Delicious), update Twitter share intent, and add item title to Twitter and email links (<a href="https://github.com/ilri/DSpace/pull/421">#421</a>)</li>
|
||||
<li>Add new CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/pull/420">#420</a>)</li>
|
||||
<li>Add item ID to REST API error logging (<a href="https://github.com/ilri/DSpace/pull/422">#422</a>)</li>
|
||||
</ul></li>
|
||||
<li>Re-deploy CGSpace from <code>5_x-prod</code> branch</li>
|
||||
<li>Run all system updates on CGSpace (linode18) and reboot it</li>
|
||||
</ul>
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
||||
|
||||
|
@ -14,7 +14,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="404 Page not found"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -108,15 +108,14 @@
|
||||
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
|
||||
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
|
||||
</ul></li>
|
||||
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</li>
|
||||
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
|
||||
</article>
|
||||
@ -143,27 +142,27 @@ DELETE 1
|
||||
<ul>
|
||||
<li>They asked if we had plans to enable RDF support in CGSpace</li>
|
||||
</ul></li>
|
||||
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
|
||||
|
||||
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p>
|
||||
|
||||
<ul>
|
||||
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p>
|
||||
|
||||
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
|
||||
4432 200
|
||||
</code></pre>
|
||||
4432 200
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
|
||||
<ul>
|
||||
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
|
||||
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li>
|
||||
|
||||
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
|
||||
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -218,27 +217,27 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
|
||||
<ul>
|
||||
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
|
||||
<li>The top IPs before, during, and after this latest alert tonight were:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The top IPs before, during, and after this latest alert tonight were:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre>
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</li>
|
||||
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
|
||||
<li>There were just over 3 million accesses in the nginx logs last month:</li>
|
||||
</ul>
|
||||
<li><p><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</p></li>
|
||||
|
||||
<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li>
|
||||
|
||||
<li><p>There were just over 3 million accesses in the nginx logs last month:</p>
|
||||
|
||||
<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
|
||||
3018243
|
||||
@ -246,7 +245,8 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
real 0m19.873s
|
||||
user 0m22.203s
|
||||
sys 0m1.979s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -268,21 +268,22 @@ sys 0m1.979s
|
||||
|
||||
<ul>
|
||||
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
|
||||
<li>I don’t see anything interesting in the web server logs around that time though:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I don’t see anything interesting in the web server logs around that time though:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre>
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -411,21 +412,24 @@ sys 0m1.979s
|
||||
<h2 id="2018-08-01">2018-08-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
|
||||
</ul>
|
||||
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p>
|
||||
|
||||
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
|
||||
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
|
||||
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
|
||||
<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
|
||||
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
|
||||
<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
|
||||
<li>I ran all system updates on DSpace Test and rebooted it</li>
|
||||
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li>
|
||||
|
||||
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</p></li>
|
||||
|
||||
<li><p>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</p></li>
|
||||
|
||||
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
|
||||
|
||||
<li><p>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
|
||||
|
||||
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
|
||||
</article>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -101,18 +101,16 @@
|
||||
<h2 id="2018-07-01">2018-07-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
|
||||
</ul>
|
||||
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p>
|
||||
|
||||
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
|
||||
</ul>
|
||||
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
|
||||
|
||||
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -139,23 +137,23 @@
|
||||
<li>There seems to be a problem with the CUA and L&R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
|
||||
</ul></li>
|
||||
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
|
||||
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
|
||||
<li>Time to index ~70,000 items on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li>
|
||||
|
||||
<li><p>Time to index ~70,000 items on CGSpace:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
||||
|
||||
real 74m42.646s
|
||||
user 8m5.056s
|
||||
sys 2m7.289s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -279,26 +277,25 @@ sys 2m7.289s
|
||||
<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
|
||||
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
|
||||
<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
|
||||
<li>And just before that I see this:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>And just before that I see this:</p>
|
||||
|
||||
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Ah hah! So the pool was actually empty!</li>
|
||||
<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
|
||||
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
|
||||
<li>I notice this error quite a few times in dspace.log:</li>
|
||||
</ul>
|
||||
<li><p>Ah hah! So the pool was actually empty!</p></li>
|
||||
|
||||
<li><p>I need to increase that, let’s try to bump it up from 50 to 75</p></li>
|
||||
|
||||
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</p></li>
|
||||
|
||||
<li><p>I notice this error quite a few times in dspace.log:</p>
|
||||
|
||||
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
|
||||
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
|
||||
</code></pre>
|
||||
</code></pre></li>
<ul>
|
||||
<li>And there are many of these errors every day for the past month:</li>
|
||||
</ul>
|
||||
<li><p>And there are many of these errors every day for the past month:</p>
|
||||
|
||||
<pre><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
|
||||
dspace.log.2017-11-21:4
|
||||
@ -344,10 +341,9 @@ dspace.log.2017-12-30:89
|
||||
dspace.log.2017-12-31:53
|
||||
dspace.log.2018-01-01:45
|
||||
dspace.log.2018-01-02:34
|
||||
</code></pre>
|
||||
</code></pre></li>
<ul>
|
||||
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
|
||||
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
|
||||
</article>
|
||||
@ -400,20 +396,18 @@ dspace.log.2018-01-02:34
|
||||
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
|
||||
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
|
||||
</ul>
|
||||
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p>
|
||||
|
||||
<pre><code># grep -c "CORE" /var/log/nginx/access.log
|
||||
0
|
||||
</code></pre>
|
||||
</code></pre></li>
<ul>
|
||||
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
|
||||
</ul>
|
||||
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
|
||||
COPY 54701
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
|
||||
</article>
@ -434,15 +428,14 @@ COPY 54701
|
||||
<h2 id="2017-10-01">2017-10-01</h2>
<ul>
|
||||
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
|
||||
</ul>
|
||||
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p>
|
||||
|
||||
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
|
||||
</code></pre>
|
||||
</code></pre></li>
<ul>
|
||||
<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
|
||||
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
|
||||
<li><p>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li>
|
||||
|
||||
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
|
||||
</article>
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
@ -261,11 +261,12 @@
|
||||
|
||||
<ul>
|
||||
<li>Remove redundant/duplicate text in the DSpace submission license</li>
|
||||
<li>Testing the CMYK patch on a collection with 650 items:</li>
|
||||
</ul>
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a>
|
||||
</article>
@ -300,12 +301,13 @@
|
||||
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
|
||||
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
|
||||
<li>Discovered that the ImageMagic <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
|
||||
<li>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</li>
|
||||
</ul>
<li><p>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</p>
|
||||
|
||||
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a>
|
||||
</article>
@ -326,23 +328,22 @@
|
||||
<h2 id="2017-02-07">2017-02-07</h2>
<ul>
|
||||
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
|
||||
</ul>
|
||||
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p>
|
||||
|
||||
<pre><code>dspace=# select * from collection2item where item_id = '80278';
|
||||
id | collection_id | item_id
|
||||
id | collection_id | item_id
|
||||
-------+---------------+---------
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
(3 rows)
|
||||
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
<ul>
|
||||
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
|
||||
<li>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
|
||||
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li>
|
||||
|
||||
<li><p>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
|
||||
</article>
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
@ -102,20 +102,21 @@
|
||||
|
||||
<ul>
|
||||
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
|
||||
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
|
||||
</ul>
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p>
|
||||
|
||||
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
</code></pre>
|
||||
</code></pre></li>
<ul>
|
||||
<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
|
||||
<li>I’ve raised a ticket with Atmire to ask</li>
|
||||
<li>Another worrying error from dspace.log is:</li>
|
||||
<li><p>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</p></li>
|
||||
|
||||
<li><p>I’ve raised a ticket with Atmire to ask</p></li>
|
||||
|
||||
<li><p>Another worrying error from dspace.log is:</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a>
|
||||
</article>
|
||||
@ -168,11 +169,12 @@
|
||||
<li>ORCIDs only</li>
|
||||
<li>ORCIDs plus normal authors</li>
|
||||
</ul></li>
|
||||
<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
|
||||
</ul>
<li><p>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
|
||||
|
||||
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
|
||||
</article>
@ -196,11 +198,12 @@
|
||||
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
|
||||
<li>Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
|
||||
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
|
||||
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
|
||||
</ul>
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
|
||||
|
||||
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a>
|
||||
</article>
@ -226,13 +229,14 @@
|
||||
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
|
||||
<li>bower stuff is a dead end, waste of time, too many issues</li>
|
||||
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
|
||||
<li>Start working on DSpace 5.1 → 5.5 port:</li>
|
||||
</ul>
<li><p>Start working on DSpace 5.1 → 5.5 port:</p>
|
||||
|
||||
<pre><code>$ git checkout -b 55new 5_x-prod
|
||||
$ git reset --hard ilri/5_x-prod
|
||||
$ git rebase -i dspace-5.5
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a>
|
||||
</article>
@ -254,19 +258,18 @@ $ git rebase -i dspace-5.5
|
||||
|
||||
<ul>
|
||||
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
|
||||
<li>I think this query should find and replace all authors that have “,” at the end of their names:</li>
|
||||
</ul>
<li><p>I think this query should find and replace all authors that have “,” at the end of their names:</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
UPDATE 95
|
||||
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
text_value
|
||||
text_value
|
||||
------------
|
||||
(0 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
<ul>
|
||||
<li>In this case the select query was showing 95 results before the update</li>
|
||||
<li><p>In this case the select query was showing 95 results before the update</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a>
|
||||
</article>
|
||||
@ -317,12 +320,13 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
|
||||
<ul>
|
||||
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
|
||||
<li>I have blocked access to the API now</li>
|
||||
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
|
||||
</ul>
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
|
||||
3168
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a>
|
||||
</article>
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
@ -156,15 +156,15 @@
|
||||
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
|
||||
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
|
||||
</ul>
|
||||
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p>
|
||||
|
||||
<pre><code># cd /home/dspacetest.cgiar.org/log
|
||||
# ls -lh dspace.log.2015-11-18*
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
|
||||
</article>
@ -187,12 +187,13 @@
|
||||
<ul>
|
||||
<li>CGSpace went down</li>
|
||||
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
|
||||
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
|
||||
</ul>
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
|
||||
78
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a>
|
||||
</article>
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGIAR Library Migration"/>
|
||||
<meta name="twitter:description" content="Notes on the migration of the CGIAR Library to CGSpace"/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
@ -25,7 +25,7 @@
|
||||
"@type": "BlogPosting",
|
||||
"headline": "CGIAR Library Migration",
|
||||
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/cgiar-library-migration\/",
|
||||
"wordCount": "1278",
|
||||
"wordCount": "1285",
|
||||
"datePublished": "2017-09-18T16:38:35\x2b03:00",
|
||||
"dateModified": "2018-03-09T22:10:33\x2b02:00",
|
||||
"author": {
|
||||
@ -121,8 +121,8 @@
|
||||
<li><code>SELECT * FROM pg_stat_activity;</code> seems to show ~6 extra connections used by the command line tools during import</li>
|
||||
</ul></label></li>
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Temporarily disable nightly <code>index-discovery</code> cron job because the import process will be taking place during some of this time and I don’t want them to be competing to update the Solr index</label></li>
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Copy HTTPS certificate key pair from CGIAR Library server’s Tomcat keystore:</label></li>
|
||||
</ul>
<li><p>[x] Copy HTTPS certificate key pair from CGIAR Library server’s Tomcat keystore:</p>
|
||||
|
||||
<pre><code>$ keytool -list -keystore tomcat.keystore
|
||||
$ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat
|
||||
@ -130,7 +130,8 @@ $ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pe
|
||||
$ openssl pkcs12 -in library.cgiar.org.p12 -nodes -nocerts -out library.cgiar.org.key.pem
|
||||
$ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem
|
||||
$ cat library.cgiar.org.crt.pem gdig2.crt.pem > library.cgiar.org-chained.pem
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
<h2 id="migration-process">Migration Process</h2>
@ -155,16 +156,14 @@ $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip
|
||||
|
||||
<ul class="task-list">
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Copy all exports from DSpace Test</label></li>
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Add ingestion overrides to <code>dspace.cfg</code> before import:</label></li>
|
||||
</ul>
<li><p>[x] Add ingestion overrides to <code>dspace.cfg</code> before import:</p>
|
||||
|
||||
<pre><code>mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
|
||||
mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
|
||||
</code></pre>
|
||||
</code></pre></li>
<ul class="task-list">
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Import communities and collections, paying attention to options to skip missing parents and ignore handles:</label></li>
|
||||
</ul>
|
||||
<li><p>[x] Import communities and collections, paying attention to options to skip missing parents and ignore handles:</p>
|
||||
|
||||
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
|
||||
$ export PATH=$PATH:/home/cgspace.cgiar.org/bin
|
||||
@ -182,36 +181,37 @@ $ for item in 10947-2527/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aor
|
||||
$ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip
|
||||
$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
|
||||
$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
<p>This submits AIP hierarchies recursively (-r) and suppresses errors when an item’s parent collection hasn’t been created yet—for example, if the item is mapped. The large historic archive (<sup>10947</sup>⁄<sub>1</sub>) is created in several steps because it requires a lot of memory and often crashes.</p>
|
||||
|
||||
<p><strong>Create new subcommunities and collections for content we reorganized into new hierarchies from the original:</strong></p>
|
||||
|
||||
<ul class="task-list">
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Board</em> sub-community: <code>10568/83536</code>
|
||||
<li><p>[x] Create <em>CGIAR System Management Board</em> sub-community: <code>10568/83536</code></p>
|
||||
|
||||
<ul class="task-list">
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Content from <em>CGIAR System Management Board documents</em> collection (<code>10947/4561</code>) goes here</label></li>
|
||||
<li>Import collection hierarchy first and then the items:</li>
|
||||
</ul></label></li>
|
||||
</ul>
<li><p>Import collection hierarchy first and then the items:</p>
|
||||
|
||||
<pre><code>$ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip
|
||||
$ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul></li>
<ul class="task-list">
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Office</em> sub-community: <code>10568/83537</code>
|
||||
<li><p>[x] Create <em>CGIAR System Management Office</em> sub-community: <code>10568/83537</code></p>
|
||||
|
||||
<ul class="task-list">
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Office documents</em> collection: <code>10568/83538</code></label></li>
|
||||
<li>Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents:</li>
|
||||
</ul></label></li>
|
||||
</ul>
<li><p>Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents:</p>
|
||||
|
||||
<pre><code>$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
</ul>
<p><strong>Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:</strong></p>
|
||||
|
||||
@ -219,18 +219,16 @@ $ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e
|
||||
</code></pre>
<ul>
|
||||
<li>Export them from the CGIAR Library:</li>
|
||||
</ul>
|
||||
<li><p>Export them from the CGIAR Library:</p>
|
||||
|
||||
<pre><code># for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done
|
||||
</code></pre>
|
||||
</code></pre></li>
<ul>
|
||||
<li>Import on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>Import on CGSpace:</p>
|
||||
|
||||
<pre><code>$ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
<h2 id="post-migration">Post Migration</h2>
@ -239,8 +237,8 @@ $ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Remove ingestion overrides from <code>dspace.cfg</code></label></li>
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Reset PostgreSQL <code>max_connections</code> to 183</label></li>
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Enable nightly <code>index-discovery</code> cron job</label></li>
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Adjust CGSpace’s <code>handle-server/config.dct</code> to add the new prefix alongside our existing 10568, ie:</label></li>
|
||||
</ul>
<li><p>[x] Adjust CGSpace’s <code>handle-server/config.dct</code> to add the new prefix alongside our existing 10568, ie:</p>
|
||||
|
||||
<pre><code>"server_admins" = (
|
||||
"300:0.NA/10568"
|
||||
@ -256,7 +254,8 @@ $ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e
|
||||
"300:0.NA/10568"
|
||||
"300:0.NA/10947"
|
||||
)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
<p>I had regenerated the <code>sitebndl.zip</code> file on the CGIAR Library server and sent it to the Handle.net admins but they said that there were mismatches between the public and private keys, which I suspect is due to <code>make-handle-config</code> not being very flexible. After discussing our scenario with the Handle.net admins they said we actually don’t need to send an updated <code>sitebndl.zip</code> for this type of change, and the above <code>config.dct</code> edits are all that is required. I guess they just did something on their end by setting the authoritative IP address for the 10947 prefix to be the same as ours…</p>
@ -269,13 +268,14 @@ $ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Re-deploy DSpace from freshly built <code>5_x-prod</code> branch</label></li>
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Merge <code>cgiar-library</code> branch to <code>master</code> and re-run ansible nginx templates</label></li>
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Run system updates and reboot server</label></li>
|
||||
<li><label><input type="checkbox" checked disabled class="task-list-item"> Switch to Let’s Encrypt HTTPS certificates (after DNS is updated and server isn’t busy):</label></li>
|
||||
</ul>
<li><p>[x] Switch to Let’s Encrypt HTTPS certificates (after DNS is updated and server isn’t busy):</p>
|
||||
|
||||
<pre><code>$ sudo systemctl stop nginx
|
||||
$ /opt/certbot-auto certonly --standalone -d library.cgiar.org
|
||||
$ sudo systemctl start nginx
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
<h2 id="troubleshooting">Troubleshooting</h2>

122
docs/index.html
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
@ -108,15 +108,14 @@
|
||||
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
|
||||
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
|
||||
</ul></li>
|
||||
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
|
||||
</ul>
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
<ul>
|
||||
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</li>
|
||||
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
|
||||
</article>
|
||||
@ -143,27 +142,27 @@ DELETE 1
|
||||
<ul>
|
||||
<li>They asked if we had plans to enable RDF support in CGSpace</li>
|
||||
</ul></li>
|
||||
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
|
||||
|
||||
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p>
|
||||
|
||||
<ul>
|
||||
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p>
|
||||
|
||||
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
|
||||
4432 200
|
||||
</code></pre>
|
||||
4432 200
|
||||
</code></pre></li>
|
||||
</ul></li>
<ul>
|
||||
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
|
||||
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li>
|
||||
|
||||
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
|
||||
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
|
||||
</article>
@ -218,27 +217,27 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
|
||||
<ul>
|
||||
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
|
||||
<li>The top IPs before, during, and after this latest alert tonight were:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The top IPs before, during, and after this latest alert tonight were:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre>
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</li>
|
||||
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
|
||||
<li>There were just over 3 million accesses in the nginx logs last month:</li>
|
||||
</ul>
|
||||
<li><p><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</p></li>
|
||||
|
||||
<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li>
|
||||
|
||||
<li><p>There were just over 3 million accesses in the nginx logs last month:</p>
|
||||
|
||||
<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
|
||||
3018243
|
||||
@ -246,7 +245,8 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
real 0m19.873s
|
||||
user 0m22.203s
|
||||
sys 0m1.979s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -268,21 +268,22 @@ sys 0m1.979s
|
||||
|
||||
<ul>
|
||||
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
|
||||
<li>I don’t see anything interesting in the web server logs around that time though:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I don’t see anything interesting in the web server logs around that time though:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre>
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -411,21 +412,24 @@ sys 0m1.979s
|
||||
<h2 id="2018-08-01">2018-08-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
|
||||
</ul>
|
||||
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p>
|
||||
|
||||
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
|
||||
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
|
||||
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
|
||||
<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
|
||||
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
|
||||
<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
|
||||
<li>I ran all system updates on DSpace Test and rebooted it</li>
|
||||
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li>
|
||||
|
||||
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</p></li>
|
||||
|
||||
<li><p>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</p></li>
|
||||
|
||||
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
|
||||
|
||||
<li><p>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
|
||||
|
||||
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
|
||||
</article>
|
||||
|
301
docs/index.xml
@ -27,15 +27,14 @@
|
||||
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
|
||||
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
|
||||
</ul></li>
|
||||
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
|
||||
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -53,27 +52,27 @@ DELETE 1
|
||||
<ul>
|
||||
<li>They asked if we had plans to enable RDF support in CGSpace</li>
|
||||
</ul></li>
|
||||
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
|
||||
|
||||
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p>
|
||||
|
||||
<ul>
|
||||
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p>
|
||||
|
||||
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
|
||||
4432 200
|
||||
</code></pre>
|
||||
4432 200
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
|
||||
<ul>
|
||||
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
|
||||
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li>
|
||||
|
||||
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
|
||||
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -110,27 +109,27 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
|
||||
<ul>
|
||||
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
|
||||
<li>The top IPs before, during, and after this latest alert tonight were:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The top IPs before, during, and after this latest alert tonight were:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre>
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</li>
|
||||
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
|
||||
<li>There were just over 3 million accesses in the nginx logs last month:</li>
|
||||
</ul>
|
||||
<li><p><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</p></li>
|
||||
|
||||
<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li>
|
||||
|
||||
<li><p>There were just over 3 million accesses in the nginx logs last month:</p>
|
||||
|
||||
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot;
|
||||
3018243
|
||||
@ -138,7 +137,8 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
real 0m19.873s
|
||||
user 0m22.203s
|
||||
sys 0m1.979s
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -151,21 +151,22 @@ sys 0m1.979s
|
||||
|
||||
<ul>
|
||||
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
|
||||
<li>I don&rsquo;t see anything interesting in the web server logs around that time though:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I don&rsquo;t see anything interesting in the web server logs around that time though:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre></description>
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -249,21 +250,24 @@ sys 0m1.979s
|
||||
<description><h2 id="2018-08-01">2018-08-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
|
||||
</ul>
|
||||
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p>
|
||||
|
||||
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
|
||||
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
|
||||
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</li>
|
||||
<li>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</li>
|
||||
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
|
||||
<li>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</li>
|
||||
<li>I ran all system updates on DSpace Test and rebooted it</li>
|
||||
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li>
|
||||
|
||||
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</p></li>
|
||||
|
||||
<li><p>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</p></li>
|
||||
|
||||
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
|
||||
|
||||
<li><p>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
|
||||
|
||||
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -276,18 +280,16 @@ sys 0m1.979s
|
||||
<description><h2 id="2018-07-01">2018-07-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
|
||||
</ul>
|
||||
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p>
|
||||
|
||||
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
|
||||
</ul>
|
||||
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
|
||||
|
||||
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -305,23 +307,23 @@ sys 0m1.979s
|
||||
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn&rsquo;t build</li>
|
||||
</ul></li>
|
||||
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
|
||||
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/">March, 2018</a></li>
|
||||
<li>Time to index ~70,000 items on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/">March, 2018</a></p></li>
|
||||
|
||||
<li><p>Time to index ~70,000 items on CGSpace:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
||||
|
||||
real 74m42.646s
|
||||
user 8m5.056s
|
||||
sys 2m7.289s
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -400,26 +402,25 @@ sys 2m7.289s
|
||||
<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li>
|
||||
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
|
||||
<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li>
|
||||
<li>And just before that I see this:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>And just before that I see this:</p>
|
||||
|
||||
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Ah hah! So the pool was actually empty!</li>
|
||||
<li>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</li>
|
||||
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</li>
|
||||
<li>I notice this error quite a few times in dspace.log:</li>
|
||||
</ul>
|
||||
<li><p>Ah hah! So the pool was actually empty!</p></li>
|
||||
|
||||
<li><p>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</p></li>
|
||||
|
||||
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</p></li>
|
||||
|
||||
<li><p>I notice this error quite a few times in dspace.log:</p>
|
||||
|
||||
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
|
||||
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And there are many of these errors every day for the past month:</li>
|
||||
</ul>
|
||||
<li><p>And there are many of these errors every day for the past month:</p>
|
||||
|
||||
<pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.*
|
||||
dspace.log.2017-11-21:4
|
||||
@ -465,10 +466,9 @@ dspace.log.2017-12-30:89
|
||||
dspace.log.2017-12-31:53
|
||||
dspace.log.2018-01-01:45
|
||||
dspace.log.2018-01-02:34
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</li>
|
||||
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -503,20 +503,18 @@ dspace.log.2018-01-02:34
|
||||
<h2 id="2017-11-02">2017-11-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
|
||||
</ul>
|
||||
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p>
|
||||
|
||||
<pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log
|
||||
0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
|
||||
</ul>
|
||||
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
|
||||
COPY 54701
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -528,15 +526,14 @@ COPY 54701
|
||||
<description><h2 id="2017-10-01">2017-10-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
|
||||
</ul>
|
||||
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p>
|
||||
|
||||
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
|
||||
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
|
||||
<li><p>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li>
|
||||
|
||||
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -655,11 +652,12 @@ COPY 54701

<ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
</ul>

<li><p>Testing the CMYK patch on a collection with 650 items:</p>

<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre></description>
</code></pre></li>
</ul></description>
</item>

<item>
@ -685,12 +683,13 @@ COPY 54701
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
<li>Discovered that the ImageMagic <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</li>
</ul>

<li><p>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</p>

<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre></description>
</code></pre></li>
</ul></description>
</item>

<item>
@ -702,23 +701,22 @@ COPY 54701
<description><h2 id="2017-02-07">2017-02-07</h2>

<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p>

<pre><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id
id | collection_id | item_id
-------+---------------+---------
92551 | 313 | 80278
92550 | 313 | 80278
90774 | 1051 | 80278
92551 | 313 | 80278
92550 | 313 | 80278
90774 | 1051 | 80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
</code></pre>
</code></pre></li>

<ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li>

<li><p>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
</ul></description>
</item>

@ -747,20 +745,21 @@ DELETE 1

<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
</ul>

<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p>

<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
</code></pre>
</code></pre></li>

<ul>
<li>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</li>
<li>I&rsquo;ve raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
<li><p>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</p></li>

<li><p>I&rsquo;ve raised a ticket with Atmire to ask</p></li>

<li><p>Another worrying error from dspace.log is:</p></li>
</ul></description>
</item>

@ -795,11 +794,12 @@ DELETE 1
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul></li>
<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>

<li><p>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>

<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre></description>
</code></pre></li>
</ul></description>
</item>

<item>
@ -814,11 +814,12 @@ DELETE 1
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
</ul>

<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>

<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre></description>
</code></pre></li>
</ul></description>
</item>

<item>
@ -835,13 +836,14 @@ DELETE 1
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>bower stuff is a dead end, waste of time, too many issues</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
</ul>

<li><p>Start working on DSpace 5.1 → 5.5 port:</p>

<pre><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5
</code></pre></description>
</code></pre></li>
</ul></description>
</item>

<item>
@ -854,19 +856,18 @@ $ git rebase -i dspace-5.5

<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
</ul>

<li><p>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</p>

<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
text_value
------------
(0 rows)
</code></pre>
</code></pre></li>

<ul>
<li>In this case the select query was showing 95 results before the update</li>
<li><p>In this case the select query was showing 95 results before the update</p></li>
</ul></description>
</item>

@ -899,12 +900,13 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
</ul>

<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p>

<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168
</code></pre></description>
</code></pre></li>
</ul></description>
</item>

<item>
@ -985,15 +987,15 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<description><h2 id="2015-12-02">2015-12-02</h2>

<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p>

<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre></description>
</code></pre></li>
</ul></description>
</item>

<item>
@ -1007,12 +1009,13 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
</ul>

<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p>

<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78
</code></pre></description>
</code></pre></li>
</ul></description>
</item>

</channel>

@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.55.3" />
<meta name="generator" content="Hugo 0.55.5" />



@ -101,18 +101,16 @@
|
||||
<h2 id="2018-07-01">2018-07-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
|
||||
</ul>
|
||||
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p>
|
||||
|
||||
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
|
||||
</ul>
|
||||
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
|
||||
|
||||
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -139,23 +137,23 @@
|
||||
<li>There seems to be a problem with the CUA and L&R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
|
||||
</ul></li>
|
||||
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
|
||||
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
|
||||
<li>Time to index ~70,000 items on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li>
|
||||
|
||||
<li><p>Time to index ~70,000 items on CGSpace:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
||||
|
||||
real 74m42.646s
|
||||
user 8m5.056s
|
||||
sys 2m7.289s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -279,26 +277,25 @@ sys 2m7.289s
|
||||
<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
|
||||
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
|
||||
<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
|
||||
<li>And just before that I see this:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>And just before that I see this:</p>
|
||||
|
||||
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Ah hah! So the pool was actually empty!</li>
|
||||
<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
|
||||
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
|
||||
<li>I notice this error quite a few times in dspace.log:</li>
|
||||
</ul>
|
||||
<li><p>Ah hah! So the pool was actually empty!</p></li>
|
||||
|
||||
<li><p>I need to increase that, let’s try to bump it up from 50 to 75</p></li>
|
||||
|
||||
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</p></li>
|
||||
|
||||
<li><p>I notice this error quite a few times in dspace.log:</p>
|
||||
|
||||
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
|
||||
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And there are many of these errors every day for the past month:</li>
|
||||
</ul>
|
||||
<li><p>And there are many of these errors every day for the past month:</p>
|
||||
|
||||
<pre><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
|
||||
dspace.log.2017-11-21:4
|
||||
@ -344,10 +341,9 @@ dspace.log.2017-12-30:89
|
||||
dspace.log.2017-12-31:53
|
||||
dspace.log.2018-01-01:45
|
||||
dspace.log.2018-01-02:34
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
|
||||
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
|
||||
</article>
|
||||
@ -400,20 +396,18 @@ dspace.log.2018-01-02:34
|
||||
<h2 id="2017-11-02">2017-11-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
|
||||
</ul>
|
||||
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p>
|
||||
|
||||
<pre><code># grep -c "CORE" /var/log/nginx/access.log
|
||||
0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
|
||||
</ul>
|
||||
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
|
||||
COPY 54701
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -434,15 +428,14 @@ COPY 54701
|
||||
<h2 id="2017-10-01">2017-10-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
|
||||
</ul>
|
||||
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p>
|
||||
|
||||
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
|
||||
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
|
||||
<li><p>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li>
|
||||
|
||||
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
|
||||
</article>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -261,11 +261,12 @@
|
||||
|
||||
<ul>
|
||||
<li>Remove redundant/duplicate text in the DSpace submission license</li>
|
||||
<li>Testing the CMYK patch on a collection with 650 items:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -300,12 +301,13 @@
|
||||
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
|
||||
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
|
||||
<li>Discovered that the ImageMagic <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
|
||||
<li>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</p>
|
||||
|
||||
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -326,23 +328,22 @@
|
||||
<h2 id="2017-02-07">2017-02-07</h2>
|
||||
|
||||
<ul>
|
||||
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
|
||||
</ul>
|
||||
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p>
|
||||
|
||||
<pre><code>dspace=# select * from collection2item where item_id = '80278';
|
||||
id | collection_id | item_id
|
||||
id | collection_id | item_id
|
||||
-------+---------------+---------
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
(3 rows)
|
||||
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
|
||||
<li>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
|
||||
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li>
|
||||
|
||||
<li><p>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
|
||||
</article>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -102,20 +102,21 @@
|
||||
|
||||
<ul>
|
||||
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
|
||||
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p>
|
||||
|
||||
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
|
||||
<li>I’ve raised a ticket with Atmire to ask</li>
|
||||
<li>Another worrying error from dspace.log is:</li>
|
||||
<li><p>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</p></li>
|
||||
|
||||
<li><p>I’ve raised a ticket with Atmire to ask</p></li>
|
||||
|
||||
<li><p>Another worrying error from dspace.log is:</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a>
|
||||
</article>
|
||||
@ -168,11 +169,12 @@
|
||||
<li>ORCIDs only</li>
|
||||
<li>ORCIDs plus normal authors</li>
|
||||
</ul></li>
|
||||
<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
|
||||
|
||||
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -196,11 +198,12 @@
|
||||
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
|
||||
<li>Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
|
||||
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
|
||||
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
|
||||
|
||||
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -226,13 +229,14 @@
|
||||
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
|
||||
<li>bower stuff is a dead end, waste of time, too many issues</li>
|
||||
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
|
||||
<li>Start working on DSpace 5.1 → 5.5 port:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Start working on DSpace 5.1 → 5.5 port:</p>
|
||||
|
||||
<pre><code>$ git checkout -b 55new 5_x-prod
|
||||
$ git reset --hard ilri/5_x-prod
|
||||
$ git rebase -i dspace-5.5
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -254,19 +258,18 @@ $ git rebase -i dspace-5.5
|
||||
|
||||
<ul>
|
||||
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
|
||||
<li>I think this query should find and replace all authors that have “,” at the end of their names:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I think this query should find and replace all authors that have “,” at the end of their names:</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
UPDATE 95
|
||||
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
text_value
|
||||
text_value
|
||||
------------
|
||||
(0 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In this case the select query was showing 95 results before the update</li>
|
||||
<li><p>In this case the select query was showing 95 results before the update</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a>
|
||||
</article>
|
||||
@ -317,12 +320,13 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
|
||||
<ul>
|
||||
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
|
||||
<li>I have blocked access to the API now</li>
|
||||
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
|
||||
</ul>
|
||||
|
||||
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
|
||||
3168
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -156,15 +156,15 @@
|
||||
<h2 id="2015-12-02">2015-12-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
|
||||
</ul>
|
||||
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p>
|
||||
|
||||
<pre><code># cd /home/dspacetest.cgiar.org/log
|
||||
# ls -lh dspace.log.2015-11-18*
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -187,12 +187,13 @@
|
||||
<ul>
|
||||
<li>CGSpace went down</li>
|
||||
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
|
||||
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
|
||||
78
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -108,15 +108,14 @@
|
||||
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
|
||||
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
|
||||
</ul></li>
|
||||
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</li>
|
||||
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
|
||||
</article>
|
||||
@ -143,27 +142,27 @@ DELETE 1
|
||||
<ul>
|
||||
<li>They asked if we had plans to enable RDF support in CGSpace</li>
|
||||
</ul></li>
|
||||
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
|
||||
|
||||
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p>
|
||||
|
||||
<ul>
|
||||
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p>
|
||||
|
||||
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
|
||||
4432 200
|
||||
</code></pre>
|
||||
4432 200
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
|
||||
<ul>
|
||||
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
|
||||
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li>
|
||||
|
||||
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
|
||||
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -218,27 +217,27 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
|
||||
<ul>
|
||||
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
|
||||
<li>The top IPs before, during, and after this latest alert tonight were:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The top IPs before, during, and after this latest alert tonight were:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre>
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</li>
|
||||
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
|
||||
<li>There were just over 3 million accesses in the nginx logs last month:</li>
|
||||
</ul>
|
||||
<li><p><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</p></li>
|
||||
|
||||
<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li>
|
||||
|
||||
<li><p>There were just over 3 million accesses in the nginx logs last month:</p>
|
||||
|
||||
<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
|
||||
3018243
|
||||
@ -246,7 +245,8 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
real 0m19.873s
|
||||
user 0m22.203s
|
||||
sys 0m1.979s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -268,21 +268,22 @@ sys 0m1.979s
|
||||
|
||||
<ul>
|
||||
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
|
||||
<li>I don’t see anything interesting in the web server logs around that time though:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I don’t see anything interesting in the web server logs around that time though:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre>
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -411,21 +412,24 @@ sys 0m1.979s
|
||||
<h2 id="2018-08-01">2018-08-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
|
||||
</ul>
|
||||
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p>
|
||||
|
||||
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
|
||||
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
|
||||
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
|
||||
<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
|
||||
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
|
||||
<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
|
||||
<li>I ran all system updates on DSpace Test and rebooted it</li>
|
||||
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li>
|
||||
|
||||
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</p></li>
|
||||
|
||||
<li><p>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</p></li>
|
||||
|
||||
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
|
||||
|
||||
<li><p>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
|
||||
|
||||
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
|
||||
</article>
|
||||
|
@ -27,15 +27,14 @@
|
||||
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
|
||||
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
|
||||
</ul></li>
|
||||
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
|
||||
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -53,27 +52,27 @@ DELETE 1
|
||||
<ul>
|
||||
<li>They asked if we had plans to enable RDF support in CGSpace</li>
|
||||
</ul></li>
|
||||
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
|
||||
|
||||
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p>
|
||||
|
||||
<ul>
|
||||
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p>
|
||||
|
||||
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
|
||||
4432 200
|
||||
</code></pre>
|
||||
4432 200
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
|
||||
<ul>
|
||||
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
|
||||
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li>
|
||||
|
||||
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
|
||||
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -110,27 +109,27 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
|
||||
<ul>
|
||||
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
|
||||
<li>The top IPs before, during, and after this latest alert tonight were:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The top IPs before, during, and after this latest alert tonight were:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre>
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</li>
|
||||
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
|
||||
<li>There were just over 3 million accesses in the nginx logs last month:</li>
|
||||
</ul>
|
||||
<li><p><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</p></li>
|
||||
|
||||
<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li>
|
||||
|
||||
<li><p>There were just over 3 million accesses in the nginx logs last month:</p>
|
||||
|
||||
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot;
|
||||
3018243
|
||||
@ -138,7 +137,8 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
real 0m19.873s
|
||||
user 0m22.203s
|
||||
sys 0m1.979s
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -151,21 +151,22 @@ sys 0m1.979s
|
||||
|
||||
<ul>
|
||||
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
|
||||
<li>I don&rsquo;t see anything interesting in the web server logs around that time though:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I don&rsquo;t see anything interesting in the web server logs around that time though:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre></description>
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -249,21 +250,24 @@ sys 0m1.979s
|
||||
<description><h2 id="2018-08-01">2018-08-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
|
||||
</ul>
|
||||
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p>
|
||||
|
||||
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
|
||||
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
|
||||
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</li>
|
||||
<li>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</li>
|
||||
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
|
||||
<li>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</li>
|
||||
<li>I ran all system updates on DSpace Test and rebooted it</li>
|
||||
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li>
|
||||
|
||||
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</p></li>
|
||||
|
||||
<li><p>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</p></li>
|
||||
|
||||
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
|
||||
|
||||
<li><p>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
|
||||
|
||||
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -276,18 +280,16 @@ sys 0m1.979s
|
||||
<description><h2 id="2018-07-01">2018-07-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
|
||||
</ul>
|
||||
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p>
|
||||
|
||||
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
|
||||
</ul>
|
||||
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
|
||||
|
||||
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -305,23 +307,23 @@ sys 0m1.979s
|
||||
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn&rsquo;t build</li>
|
||||
</ul></li>
|
||||
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
|
||||
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/">March, 2018</a></li>
|
||||
<li>Time to index ~70,000 items on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/">March, 2018</a></p></li>
|
||||
|
||||
<li><p>Time to index ~70,000 items on CGSpace:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
||||
|
||||
real 74m42.646s
|
||||
user 8m5.056s
|
||||
sys 2m7.289s
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
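<p>For reference, the <code>-b</code> flag forces a full rebuild of the Discovery index; leaving it off only reindexes objects that changed since the last run, which is usually much faster. A sketch with the same nice/ionice wrapping:</p>

<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery
</code></pre>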
|
||||
|
||||
<item>
|
||||
@ -400,26 +402,25 @@ sys 2m7.289s
|
||||
<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li>
|
||||
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
|
||||
<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li>
|
||||
<li>And just before that I see this:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>And just before that I see this:</p>
|
||||
|
||||
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Ah hah! So the pool was actually empty!</li>
|
||||
<li>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</li>
|
||||
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</li>
|
||||
<li>I notice this error quite a few times in dspace.log:</li>
|
||||
</ul>
|
||||
<li><p>Ah hah! So the pool was actually empty!</p></li>
|
||||
|
||||
<li><p>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</p></li>
|
||||
|
||||
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</p></li>
|
||||
|
||||
<li><p>I notice this error quite a few times in dspace.log:</p>
|
||||
|
||||
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
|
||||
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And there are many of these errors every day for the past month:</li>
|
||||
</ul>
|
||||
<li><p>And there are many of these errors every day for the past month:</p>
|
||||
|
||||
<pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.*
|
||||
dspace.log.2017-11-21:4
|
||||
@ -465,10 +466,9 @@ dspace.log.2017-12-30:89
|
||||
dspace.log.2017-12-31:53
|
||||
dspace.log.2018-01-01:45
|
||||
dspace.log.2018-01-02:34
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</li>
|
||||
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</p></li>
|
||||
</ul></description>
|
||||
</item>
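<p>That Solr parse error suggests the literal <code>+</code> signs are reaching the query parser instead of being decoded to spaces; a well-formed range query is <code>dateIssued_keyword:[1976 TO 1979]</code>, which can be sanity-checked against Solr directly (assuming Solr is reachable on localhost and the Discovery core is named <code>search</code>):</p>

<pre><code>$ curl -s 'http://localhost:8081/solr/search/select?q=dateIssued_keyword:%5B1976%20TO%201979%5D'
</code></pre>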
|
||||
|
||||
@ -503,20 +503,18 @@ dspace.log.2018-01-02:34
|
||||
<h2 id="2017-11-02">2017-11-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
|
||||
</ul>
|
||||
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p>
|
||||
|
||||
<pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log
|
||||
0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
|
||||
</ul>
|
||||
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
|
||||
COPY 54701
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -528,15 +526,14 @@ COPY 54701
|
||||
<description><h2 id="2017-10-01">2017-10-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
|
||||
</ul>
|
||||
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p>
|
||||
|
||||
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
|
||||
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
|
||||
<li><p>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li>
|
||||
|
||||
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -655,11 +652,12 @@ COPY 54701
|
||||
|
||||
<ul>
|
||||
<li>Remove redundant/duplicate text in the DSpace submission license</li>
|
||||
<li>Testing the CMYK patch on a collection with 650 items:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -685,12 +683,13 @@ COPY 54701
|
||||
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
|
||||
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
|
||||
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
|
||||
<li>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</p>
|
||||
|
||||
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
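<p>Converting one of those CMYK JPGs to sRGB by hand is straightforward with ImageMagick; this is only a one-off, manual sketch rather than a DSpace-side fix:</p>

<pre><code>$ convert ~/Desktop/alc_contrastes_desafios.jpg -colorspace sRGB /tmp/alc_contrastes_desafios-srgb.jpg
$ identify /tmp/alc_contrastes_desafios-srgb.jpg
</code></pre>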
|
||||
|
||||
<item>
|
||||
@ -702,23 +701,22 @@ COPY 54701
|
||||
<description><h2 id="2017-02-07">2017-02-07</h2>
|
||||
|
||||
<ul>
|
||||
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
|
||||
</ul>
|
||||
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p>
|
||||
|
||||
<pre><code>dspace=# select * from collection2item where item_id = '80278';
|
||||
id | collection_id | item_id
|
||||
id | collection_id | item_id
|
||||
-------+---------------+---------
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
(3 rows)
|
||||
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
|
||||
<li>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
|
||||
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li>
|
||||
|
||||
<li><p>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
|
||||
</ul></description>
|
||||
</item>
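<p>To catch this kind of double mapping proactively, a query over the same <code>collection2item</code> table can list every item mapped more than once to the same collection (a sketch, using only the columns shown above):</p>

<pre><code>dspace=# SELECT item_id, collection_id, COUNT(*) FROM collection2item GROUP BY item_id, collection_id HAVING COUNT(*) &gt; 1;
</code></pre>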
|
||||
|
||||
@ -747,20 +745,21 @@ DELETE 1
|
||||
|
||||
<ul>
|
||||
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
|
||||
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p>
|
||||
|
||||
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</li>
|
||||
<li>I&rsquo;ve raised a ticket with Atmire to ask</li>
|
||||
<li>Another worrying error from dspace.log is:</li>
|
||||
<li><p>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</p></li>
|
||||
|
||||
<li><p>I&rsquo;ve raised a ticket with Atmire to ask</p></li>
|
||||
|
||||
<li><p>Another worrying error from dspace.log is:</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -795,11 +794,12 @@ DELETE 1
|
||||
<li>ORCIDs only</li>
|
||||
<li>ORCIDs plus normal authors</li>
|
||||
</ul></li>
|
||||
<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
|
||||
|
||||
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -814,11 +814,12 @@ DELETE 1
|
||||
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
|
||||
<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
|
||||
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
|
||||
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
|
||||
|
||||
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -835,13 +836,14 @@ DELETE 1
|
||||
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
|
||||
<li>bower stuff is a dead end, waste of time, too many issues</li>
|
||||
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
|
||||
<li>Start working on DSpace 5.1 → 5.5 port:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Start working on DSpace 5.1 → 5.5 port:</p>
|
||||
|
||||
<pre><code>$ git checkout -b 55new 5_x-prod
|
||||
$ git reset --hard ilri/5_x-prod
|
||||
$ git rebase -i dspace-5.5
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -854,19 +856,18 @@ $ git rebase -i dspace-5.5
|
||||
|
||||
<ul>
|
||||
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
|
||||
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
UPDATE 95
|
||||
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
text_value
|
||||
text_value
|
||||
------------
|
||||
(0 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In this case the select query was showing 95 results before the update</li>
|
||||
<li><p>In this case the select query was showing 95 results before the update</p></li>
|
||||
</ul></description>
|
||||
</item>
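<p>One way to make that kind of bulk <code>UPDATE</code> safer is to run it inside a transaction and only commit once the reported row count matches the earlier <code>SELECT</code> (a sketch using the same query):</p>

<pre><code>dspacetest=# BEGIN;
dspacetest=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '(^.+?),$', '\1') WHERE metadata_field_id=3 AND resource_type_id=2 AND text_value ~ '^.+?,$';
dspacetest=# COMMIT; -- or ROLLBACK; if the count looks wrong
</code></pre>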
|
||||
|
||||
@ -899,12 +900,13 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
|
||||
<ul>
|
||||
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
|
||||
<li>I have blocked access to the API now</li>
|
||||
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
|
||||
</ul>
|
||||
|
||||
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
|
||||
3168
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
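<p>One caveat with that number: <code>uniq</code> only collapses adjacent duplicate lines, so the log needs to go through <code>sort</code> first to get a true count of distinct IPs (the result may come out noticeably different):</p>

<pre><code># awk '{print $1}' /var/log/nginx/rest.log | sort | uniq | wc -l
</code></pre>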
|
||||
|
||||
<item>
|
||||
@ -985,15 +987,15 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
|
||||
<description><h2 id="2015-12-02">2015-12-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
|
||||
</ul>
|
||||
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p>
|
||||
|
||||
<pre><code># cd /home/dspacetest.cgiar.org/log
|
||||
# ls -lh dspace.log.2015-11-18*
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -1007,12 +1009,13 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
|
||||
<ul>
|
||||
<li>CGSpace went down</li>
|
||||
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
|
||||
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
|
||||
78
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
</channel>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -101,18 +101,16 @@
|
||||
<h2 id="2018-07-01">2018-07-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
|
||||
</ul>
|
||||
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p>
|
||||
|
||||
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
|
||||
</ul>
|
||||
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
|
||||
|
||||
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -139,23 +137,23 @@
|
||||
<li>There seems to be a problem with the CUA and L&R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
|
||||
</ul></li>
|
||||
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
|
||||
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
|
||||
<li>Time to index ~70,000 items on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li>
|
||||
|
||||
<li><p>Time to index ~70,000 items on CGSpace:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
||||
|
||||
real 74m42.646s
|
||||
user 8m5.056s
|
||||
sys 2m7.289s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -279,26 +277,25 @@ sys 2m7.289s
|
||||
<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
|
||||
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
|
||||
<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
|
||||
<li>And just before that I see this:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>And just before that I see this:</p>
|
||||
|
||||
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Ah hah! So the pool was actually empty!</li>
|
||||
<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
|
||||
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
|
||||
<li>I notice this error quite a few times in dspace.log:</li>
|
||||
</ul>
|
||||
<li><p>Ah hah! So the pool was actually empty!</p></li>
|
||||
|
||||
<li><p>I need to increase that, let’s try to bump it up from 50 to 75</p></li>
|
||||
|
||||
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</p></li>
|
||||
|
||||
<li><p>I notice this error quite a few times in dspace.log:</p>
|
||||
|
||||
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
|
||||
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And there are many of these errors every day for the past month:</li>
|
||||
</ul>
|
||||
<li><p>And there are many of these errors every day for the past month:</p>
|
||||
|
||||
<pre><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
|
||||
dspace.log.2017-11-21:4
|
||||
@ -344,10 +341,9 @@ dspace.log.2017-12-30:89
|
||||
dspace.log.2017-12-31:53
|
||||
dspace.log.2018-01-01:45
|
||||
dspace.log.2018-01-02:34
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
|
||||
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
|
||||
</article>
|
||||
@ -400,20 +396,18 @@ dspace.log.2018-01-02:34
|
||||
<h2 id="2017-11-02">2017-11-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
|
||||
</ul>
|
||||
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p>
|
||||
|
||||
<pre><code># grep -c "CORE" /var/log/nginx/access.log
|
||||
0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
|
||||
</ul>
|
||||
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
|
||||
COPY 54701
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -434,15 +428,14 @@ COPY 54701
|
||||
<h2 id="2017-10-01">2017-10-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
|
||||
</ul>
|
||||
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p>
|
||||
|
||||
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
|
||||
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
|
||||
<li><p>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li>
|
||||
|
||||
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
|
||||
</article>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -261,11 +261,12 @@
|
||||
|
||||
<ul>
|
||||
<li>Remove redundant/duplicate text in the DSpace submission license</li>
|
||||
<li>Testing the CMYK patch on a collection with 650 items:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -300,12 +301,13 @@
|
||||
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
|
||||
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
|
||||
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
|
||||
<li>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</p>
|
||||
|
||||
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -326,23 +328,22 @@
|
||||
<h2 id="2017-02-07">2017-02-07</h2>
|
||||
|
||||
<ul>
|
||||
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
|
||||
</ul>
|
||||
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p>
|
||||
|
||||
<pre><code>dspace=# select * from collection2item where item_id = '80278';
|
||||
id | collection_id | item_id
|
||||
id | collection_id | item_id
|
||||
-------+---------------+---------
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
(3 rows)
|
||||
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
|
||||
<li>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
|
||||
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li>
|
||||
|
||||
<li><p>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
|
||||
</article>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -102,20 +102,21 @@
|
||||
|
||||
<ul>
|
||||
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
|
||||
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p>
|
||||
|
||||
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
|
||||
<li>I’ve raised a ticket with Atmire to ask</li>
|
||||
<li>Another worrying error from dspace.log is:</li>
|
||||
<li><p>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</p></li>
|
||||
|
||||
<li><p>I’ve raised a ticket with Atmire to ask</p></li>
|
||||
|
||||
<li><p>Another worrying error from dspace.log is:</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a>
|
||||
</article>
|
||||
@ -168,11 +169,12 @@
|
||||
<li>ORCIDs only</li>
|
||||
<li>ORCIDs plus normal authors</li>
|
||||
</ul></li>
|
||||
<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
|
||||
|
||||
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -196,11 +198,12 @@
|
||||
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
|
||||
<li>Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
|
||||
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
|
||||
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
|
||||
|
||||
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -226,13 +229,14 @@
|
||||
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
|
||||
<li>bower stuff is a dead end, waste of time, too many issues</li>
|
||||
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
|
||||
<li>Start working on DSpace 5.1 → 5.5 port:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Start working on DSpace 5.1 → 5.5 port:</p>
|
||||
|
||||
<pre><code>$ git checkout -b 55new 5_x-prod
|
||||
$ git reset --hard ilri/5_x-prod
|
||||
$ git rebase -i dspace-5.5
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -254,19 +258,18 @@ $ git rebase -i dspace-5.5
|
||||
|
||||
<ul>
|
||||
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
|
||||
<li>I think this query should find and replace all authors that have “,” at the end of their names:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I think this query should find and replace all authors that have “,” at the end of their names:</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
UPDATE 95
|
||||
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
text_value
|
||||
text_value
|
||||
------------
|
||||
(0 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In this case the select query was showing 95 results before the update</li>
|
||||
<li><p>In this case the select query was showing 95 results before the update</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a>
|
||||
</article>
|
||||
@ -317,12 +320,13 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
|
||||
<ul>
|
||||
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
|
||||
<li>I have blocked access to the API now</li>
|
||||
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
|
||||
</ul>
|
||||
|
||||
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
|
||||
3168
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -156,15 +156,15 @@
|
||||
<h2 id="2015-12-02">2015-12-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
|
||||
</ul>
|
||||
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p>
|
||||
|
||||
<pre><code># cd /home/dspacetest.cgiar.org/log
|
||||
# ls -lh dspace.log.2015-11-18*
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -187,12 +187,13 @@
|
||||
<ul>
|
||||
<li>CGSpace went down</li>
|
||||
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
|
||||
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
|
||||
78
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
|
@ -4,30 +4,30 @@
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
||||
<lastmod>2019-05-03T10:29:01+03:00</lastmod>
|
||||
<lastmod>2019-05-03T16:33:34+03:00</lastmod>
|
||||
<priority>0</priority>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/2019-05/</loc>
|
||||
<lastmod>2019-05-03T10:29:01+03:00</lastmod>
|
||||
<lastmod>2019-05-03T16:33:34+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
|
||||
<lastmod>2019-05-03T10:29:01+03:00</lastmod>
|
||||
<lastmod>2019-05-03T16:33:34+03:00</lastmod>
|
||||
<priority>0</priority>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
||||
<lastmod>2019-05-03T10:29:01+03:00</lastmod>
|
||||
<lastmod>2019-05-03T16:33:34+03:00</lastmod>
|
||||
<priority>0</priority>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
|
||||
<lastmod>2019-05-03T10:29:01+03:00</lastmod>
|
||||
<lastmod>2019-05-03T16:33:34+03:00</lastmod>
|
||||
<priority>0</priority>
|
||||
</url>
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Tags"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -108,15 +108,14 @@
|
||||
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
|
||||
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
|
||||
</ul></li>
|
||||
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</li>
|
||||
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
|
||||
</article>
|
||||
@ -143,27 +142,27 @@ DELETE 1
|
||||
<ul>
|
||||
<li>They asked if we had plans to enable RDF support in CGSpace</li>
|
||||
</ul></li>
|
||||
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
|
||||
|
||||
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p>
|
||||
|
||||
<ul>
|
||||
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p>
|
||||
|
||||
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
|
||||
4432 200
|
||||
</code></pre>
|
||||
4432 200
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
|
||||
<ul>
|
||||
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
|
||||
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li>
|
||||
|
||||
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
|
||||
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -218,27 +217,27 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
|
||||
<ul>
|
||||
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
|
||||
<li>The top IPs before, during, and after this latest alert tonight were:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The top IPs before, during, and after this latest alert tonight were:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre>
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
405 207.46.13.173
|
||||
405 207.46.13.75
|
||||
1117 66.249.66.219
|
||||
1121 35.237.175.180
|
||||
1546 5.9.6.51
|
||||
2474 45.5.186.2
|
||||
5490 85.25.237.71
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</li>
|
||||
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
|
||||
<li>There were just over 3 million accesses in the nginx logs last month:</li>
|
||||
</ul>
|
||||
<li><p><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</p></li>
|
||||
|
||||
<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li>
|
||||
|
||||
<li><p>There were just over 3 million accesses in the nginx logs last month:</p>
|
||||
|
||||
<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
|
||||
3018243
|
||||
@ -246,7 +245,8 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
real 0m19.873s
|
||||
user 0m22.203s
|
||||
sys 0m1.979s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -268,21 +268,22 @@ sys 0m1.979s
|
||||
|
||||
<ul>
|
||||
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
|
||||
<li>I don’t see anything interesting in the web server logs around that time though:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I don’t see anything interesting in the web server logs around that time though:</p>
|
||||
|
||||
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre>
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
177 35.237.175.180
|
||||
177 40.77.167.32
|
||||
216 66.249.75.219
|
||||
225 18.203.76.93
|
||||
261 46.101.86.248
|
||||
357 207.46.13.1
|
||||
903 54.70.40.11
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -411,21 +412,24 @@ sys 0m1.979s
|
||||
<h2 id="2018-08-01">2018-08-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
|
||||
</ul>
|
||||
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p>
|
||||
|
||||
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
|
||||
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
|
||||
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
|
||||
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
|
||||
<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
|
||||
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
|
||||
<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
|
||||
<li>I ran all system updates on DSpace Test and rebooted it</li>
|
||||
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li>
|
||||
|
||||
<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</p></li>
|
||||
|
||||
<li><p>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</p></li>
|
||||
|
||||
<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>
|
||||
|
||||
<li><p>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</p></li>
|
||||
|
||||
<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
|
||||
</article>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -93,15 +93,14 @@
|
||||
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
|
||||
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
|
||||
</ul></li>
|
||||
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</li>
|
||||
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present…</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
|
||||
</article>
|
||||
@ -128,27 +127,27 @@ DELETE 1
|
||||
<ul>
|
||||
<li>They asked if we had plans to enable RDF support in CGSpace</li>
|
||||
</ul></li>
|
||||
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
|
||||
|
||||
<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p>
|
||||
|
||||
<ul>
|
||||
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
|
||||
</ul></li>
|
||||
</ul>
|
||||
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p>
|
||||
|
||||
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
|
||||
4432 200
|
||||
</code></pre>
|
||||
4432 200
|
||||
</code></pre></li>
|
||||
</ul></li>
|
||||
|
||||
<ul>
|
||||
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
|
||||
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li>
|
||||
|
||||
<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
|
||||
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -203,27 +202,27 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace

<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
</ul>

<li><p>The top IPs before, during, and after this latest alert tonight were:</p>

<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
</code></pre>
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
</code></pre></li>

<ul>
<li><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</li>
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
</ul>
<li><p><code>85.25.237.71</code> is the “Linguee Bot” that I first saw last month</p></li>

<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li>

<li><p>There were just over 3 million accesses in the nginx logs last month:</p>

<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
3018243
@ -231,7 +230,8 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
real 0m19.873s
user 0m22.203s
sys 0m1.979s
</code></pre>
</code></pre></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
</article>

@ -253,21 +253,22 @@ sys 0m1.979s

<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don’t see anything interesting in the web server logs around that time though:</li>
</ul>

<li><p>I don’t see anything interesting in the web server logs around that time though:</p>

<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
</article>

@ -396,21 +397,24 @@ sys 0m1.979s
<h2 id="2018-08-01">2018-08-01</h2>

<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p>

<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre>
</code></pre></li>

<ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</li>
<li>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</li>
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</li>
<li>I ran all system updates on DSpace Test and rebooted it</li>
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li>

<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat’s</p></li>

<li><p>I’m not sure why Tomcat didn’t crash with an OutOfMemoryError…</p></li>

<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>

<li><p>The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes</p></li>

<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
</article>

@ -27,15 +27,14 @@
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul></li>
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>

<li><p>The item seems to be in a pre-submitted state, so I tried to delete it from there:</p>

<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
</code></pre>
</code></pre></li>

<ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
<li><p>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</p></li>
</ul></description>
</item>

@ -53,27 +52,27 @@ DELETE 1
<ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul></li>
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today

<li><p>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today</p>

<ul>
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
</ul></li>
</ul>
<li><p>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</p>

<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
</code></pre>
4432 200
</code></pre></li>
</ul></li>

<ul>
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
</ul>
<li><p>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</p></li>

<li><p>Apply country and region corrections and deletions on DSpace Test and CGSpace:</p>

<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre></description>
</code></pre></li>
</ul></description>
</item>

<item>
@ -110,27 +109,27 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace

<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
</ul>

<li><p>The top IPs before, during, and after this latest alert tonight were:</p>

<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
</code></pre>
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
</code></pre></li>

<ul>
<li><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</li>
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
</ul>
<li><p><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</p></li>

<li><p>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</p></li>

<li><p>There were just over 3 million accesses in the nginx logs last month:</p>

<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot;
3018243
@ -138,7 +137,8 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
real 0m19.873s
user 0m22.203s
sys 0m1.979s
</code></pre></description>
</code></pre></li>
</ul></description>
</item>

<item>
@ -151,21 +151,22 @@ sys 0m1.979s

<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don&rsquo;t see anything interesting in the web server logs around that time though:</li>
</ul>

<li><p>I don&rsquo;t see anything interesting in the web server logs around that time though:</p>

<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre></description>
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre></li>
</ul></description>
</item>

<item>
@ -249,21 +250,24 @@ sys 0m1.979s
<description><h2 id="2018-08-01">2018-08-01</h2>

<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<li><p>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</p>

<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre>
</code></pre></li>

<ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</li>
<li>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</li>
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</li>
<li>I ran all system updates on DSpace Test and rebooted it</li>
<li><p>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</p></li>

<li><p>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</p></li>

<li><p>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</p></li>

<li><p>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</p></li>

<li><p>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</p></li>

<li><p>I ran all system updates on DSpace Test and rebooted it</p></li>
</ul></description>
</item>

@ -276,18 +280,16 @@ sys 0m1.979s
<description><h2 id="2018-07-01">2018-07-01</h2>

<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p>

<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre>
</code></pre></li>

<ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>

<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre></description>
</code></pre></li>
</ul></description>
</item>

<item>
@ -305,23 +307,23 @@ sys 0m1.979s
|
||||
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn&rsquo;t build</li>
|
||||
</ul></li>
|
||||
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
|
||||
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/">March, 2018</a></li>
|
||||
<li>Time to index ~70,000 items on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-03/">March, 2018</a></p></li>
|
||||
|
||||
<li><p>Time to index ~70,000 items on CGSpace:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
||||
|
||||
real 74m42.646s
|
||||
user 8m5.056s
|
||||
sys 2m7.289s
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -400,26 +402,25 @@ sys 2m7.289s
|
||||
<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li>
|
||||
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
|
||||
<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li>
|
||||
<li>And just before that I see this:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>And just before that I see this:</p>
|
||||
|
||||
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Ah hah! So the pool was actually empty!</li>
|
||||
<li>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</li>
|
||||
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</li>
|
||||
<li>I notice this error quite a few times in dspace.log:</li>
|
||||
</ul>
|
||||
<li><p>Ah hah! So the pool was actually empty!</p></li>
|
||||
|
||||
<li><p>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</p></li>
|
||||
|
||||
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</p></li>
|
||||
|
||||
<li><p>I notice this error quite a few times in dspace.log:</p>
|
||||
|
||||
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
|
||||
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And there are many of these errors every day for the past month:</li>
|
||||
</ul>
|
||||
<li><p>And there are many of these errors every day for the past month:</p>
|
||||
|
||||
<pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.*
|
||||
dspace.log.2017-11-21:4
|
||||
@ -465,10 +466,9 @@ dspace.log.2017-12-30:89
|
||||
dspace.log.2017-12-31:53
|
||||
dspace.log.2018-01-01:45
|
||||
dspace.log.2018-01-02:34
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</li>
|
||||
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -503,20 +503,18 @@ dspace.log.2018-01-02:34
|
||||
<h2 id="2017-11-02">2017-11-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
|
||||
</ul>
|
||||
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p>
|
||||
|
||||
<pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log
|
||||
0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
|
||||
</ul>
|
||||
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
|
||||
COPY 54701
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -528,15 +526,14 @@ COPY 54701
|
||||
<description><h2 id="2017-10-01">2017-10-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
|
||||
</ul>
|
||||
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p>
|
||||
|
||||
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
|
||||
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
|
||||
<li><p>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li>
|
||||
|
||||
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -646,11 +643,12 @@ COPY 54701
|
||||
|
||||
<ul>
|
||||
<li>Remove redundant/duplicate text in the DSpace submission license</li>
|
||||
<li>Testing the CMYK patch on a collection with 650 items:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -676,12 +674,13 @@ COPY 54701
|
||||
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
|
||||
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
|
||||
<li>Discovered that the ImageMagic <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
|
||||
<li>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>&frasl;<sub>51999</sub></a>):</p>
|
||||
|
||||
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -693,23 +692,22 @@ COPY 54701
|
||||
<description><h2 id="2017-02-07">2017-02-07</h2>
|
||||
|
||||
<ul>
|
||||
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
|
||||
</ul>
|
||||
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p>
|
||||
|
||||
<pre><code>dspace=# select * from collection2item where item_id = '80278';
|
||||
id | collection_id | item_id
|
||||
id | collection_id | item_id
|
||||
-------+---------------+---------
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
(3 rows)
|
||||
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
|
||||
<li>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
|
||||
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li>
|
||||
|
||||
<li><p>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -738,20 +736,21 @@ DELETE 1
|
||||
|
||||
<ul>
|
||||
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
|
||||
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p>
|
||||
|
||||
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</li>
|
||||
<li>I&rsquo;ve raised a ticket with Atmire to ask</li>
|
||||
<li>Another worrying error from dspace.log is:</li>
|
||||
<li><p>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</p></li>
|
||||
|
||||
<li><p>I&rsquo;ve raised a ticket with Atmire to ask</p></li>
|
||||
|
||||
<li><p>Another worrying error from dspace.log is:</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -786,11 +785,12 @@ DELETE 1
|
||||
<li>ORCIDs only</li>
|
||||
<li>ORCIDs plus normal authors</li>
|
||||
</ul></li>
|
||||
<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
|
||||
|
||||
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -805,11 +805,12 @@ DELETE 1
|
||||
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
|
||||
<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
|
||||
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
|
||||
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
|
||||
|
||||
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -826,13 +827,14 @@ DELETE 1
|
||||
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
|
||||
<li>bower stuff is a dead end, waste of time, too many issues</li>
|
||||
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
|
||||
<li>Start working on DSpace 5.1 → 5.5 port:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Start working on DSpace 5.1 → 5.5 port:</p>
|
||||
|
||||
<pre><code>$ git checkout -b 55new 5_x-prod
|
||||
$ git reset --hard ilri/5_x-prod
|
||||
$ git rebase -i dspace-5.5
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -845,19 +847,18 @@ $ git rebase -i dspace-5.5
|
||||
|
||||
<ul>
|
||||
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
|
||||
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
UPDATE 95
|
||||
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
text_value
|
||||
text_value
|
||||
------------
|
||||
(0 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In this case the select query was showing 95 results before the update</li>
|
||||
<li><p>In this case the select query was showing 95 results before the update</p></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
@ -890,12 +891,13 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
|
||||
<ul>
|
||||
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
|
||||
<li>I have blocked access to the API now</li>
|
||||
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
|
||||
</ul>
|
||||
|
||||
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
|
||||
3168
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -976,15 +978,15 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
|
||||
<description><h2 id="2015-12-02">2015-12-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
|
||||
</ul>
|
||||
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p>
|
||||
|
||||
<pre><code># cd /home/dspacetest.cgiar.org/log
|
||||
# ls -lh dspace.log.2015-11-18*
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
<item>
|
||||
@ -998,12 +1000,13 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
|
||||
<ul>
|
||||
<li>CGSpace went down</li>
|
||||
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
|
||||
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p>
|
||||
|
||||
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
|
||||
78
|
||||
</code></pre></description>
|
||||
</code></pre></li>
|
||||
</ul></description>
|
||||
</item>
|
||||
|
||||
</channel>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -86,18 +86,16 @@
|
||||
<h2 id="2018-07-01">2018-07-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
|
||||
</ul>
|
||||
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p>
|
||||
|
||||
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
|
||||
</ul>
|
||||
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>
|
||||
|
||||
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -124,23 +122,23 @@
|
||||
<li>There seems to be a problem with the CUA and L&R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
|
||||
</ul></li>
|
||||
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
|
||||
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p>
|
||||
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
|
||||
<li>Time to index ~70,000 items on CGSpace:</li>
|
||||
</ul>
|
||||
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li>
|
||||
|
||||
<li><p>Time to index ~70,000 items on CGSpace:</p>
|
||||
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
||||
|
||||
real 74m42.646s
|
||||
user 8m5.056s
|
||||
sys 2m7.289s
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -264,26 +262,25 @@ sys 2m7.289s
|
||||
<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
|
||||
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
|
||||
<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
|
||||
<li>And just before that I see this:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>And just before that I see this:</p>
|
||||
|
||||
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Ah hah! So the pool was actually empty!</li>
|
||||
<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
|
||||
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
|
||||
<li>I notice this error quite a few times in dspace.log:</li>
|
||||
</ul>
|
||||
<li><p>Ah hah! So the pool was actually empty!</p></li>
|
||||
|
||||
<li><p>I need to increase that, let’s try to bump it up from 50 to 75</p></li>
|
||||
|
||||
<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</p></li>
|
||||
|
||||
<li><p>I notice this error quite a few times in dspace.log:</p>
|
||||
|
||||
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
|
||||
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>And there are many of these errors every day for the past month:</li>
|
||||
</ul>
|
||||
<li><p>And there are many of these errors every day for the past month:</p>
|
||||
|
||||
<pre><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
|
||||
dspace.log.2017-11-21:4
|
||||
@ -329,10 +326,9 @@ dspace.log.2017-12-30:89
|
||||
dspace.log.2017-12-31:53
|
||||
dspace.log.2018-01-01:45
|
||||
dspace.log.2018-01-02:34
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
|
||||
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
|
||||
</article>
|
||||
@ -385,20 +381,18 @@ dspace.log.2018-01-02:34
|
||||
<h2 id="2017-11-02">2017-11-02</h2>
|
||||
|
||||
<ul>
|
||||
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
|
||||
</ul>
|
||||
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p>
|
||||
|
||||
<pre><code># grep -c "CORE" /var/log/nginx/access.log
|
||||
0
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
|
||||
</ul>
|
||||
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
|
||||
COPY 54701
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -419,15 +413,14 @@ COPY 54701
|
||||
<h2 id="2017-10-01">2017-10-01</h2>
|
||||
|
||||
<ul>
|
||||
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
|
||||
</ul>
|
||||
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p>
|
||||
|
||||
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
|
||||
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
|
||||
<li><p>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li>
|
||||
|
||||
<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
|
||||
</article>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -228,11 +228,12 @@
|
||||
|
||||
<ul>
|
||||
<li>Remove redundant/duplicate text in the DSpace submission license</li>
|
||||
<li>Testing the CMYK patch on a collection with 650 items:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Testing the CMYK patch on a collection with 650 items:</p>
|
||||
|
||||
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -267,12 +268,13 @@
|
||||
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
|
||||
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
|
||||
<li>Discovered that the ImageMagic <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
|
||||
<li>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</p>
|
||||
|
||||
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -293,23 +295,22 @@
|
||||
<h2 id="2017-02-07">2017-02-07</h2>
|
||||
|
||||
<ul>
|
||||
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
|
||||
</ul>
|
||||
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p>
|
||||
|
||||
<pre><code>dspace=# select * from collection2item where item_id = '80278';
|
||||
id | collection_id | item_id
|
||||
id | collection_id | item_id
|
||||
-------+---------------+---------
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
92551 | 313 | 80278
|
||||
92550 | 313 | 80278
|
||||
90774 | 1051 | 80278
|
||||
(3 rows)
|
||||
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
|
||||
DELETE 1
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
|
||||
<li>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
|
||||
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li>
|
||||
|
||||
<li><p>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
|
||||
</article>
|
||||
@ -356,20 +357,21 @@ DELETE 1
|
||||
|
||||
<ul>
|
||||
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
|
||||
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p>
|
||||
|
||||
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
|
||||
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
|
||||
<li>I’ve raised a ticket with Atmire to ask</li>
|
||||
<li>Another worrying error from dspace.log is:</li>
|
||||
<li><p>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</p></li>
|
||||
|
||||
<li><p>I’ve raised a ticket with Atmire to ask</p></li>
|
||||
|
||||
<li><p>Another worrying error from dspace.log is:</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a>
|
||||
</article>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.55.3" />
|
||||
<meta name="generator" content="Hugo 0.55.5" />
|
||||
|
||||
|
||||
|
||||
@ -117,11 +117,12 @@
|
||||
<li>ORCIDs only</li>
|
||||
<li>ORCIDs plus normal authors</li>
|
||||
</ul></li>
|
||||
<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>
|
||||
|
||||
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -145,11 +146,12 @@
|
||||
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
|
||||
<li>Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
|
||||
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
|
||||
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>
|
||||
|
||||
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -175,13 +177,14 @@
|
||||
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
|
||||
<li>bower stuff is a dead end, waste of time, too many issues</li>
|
||||
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
|
||||
<li>Start working on DSpace 5.1 → 5.5 port:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>Start working on DSpace 5.1 → 5.5 port:</p>
|
||||
|
||||
<pre><code>$ git checkout -b 55new 5_x-prod
|
||||
$ git reset --hard ilri/5_x-prod
|
||||
$ git rebase -i dspace-5.5
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -203,19 +206,18 @@ $ git rebase -i dspace-5.5
|
||||
|
||||
<ul>
|
||||
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
|
||||
<li>I think this query should find and replace all authors that have “,” at the end of their names:</li>
|
||||
</ul>
|
||||
|
||||
<li><p>I think this query should find and replace all authors that have “,” at the end of their names:</p>
|
||||
|
||||
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
UPDATE 95
|
||||
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
|
||||
text_value
|
||||
text_value
|
||||
------------
|
||||
(0 rows)
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
|
||||
<ul>
|
||||
<li>In this case the select query was showing 95 results before the update</li>
|
||||
<li><p>In this case the select query was showing 95 results before the update</p></li>
|
||||
</ul>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a>
|
||||
</article>
|
||||
@ -266,12 +268,13 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
|
||||
<ul>
|
||||
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
|
||||
<li>I have blocked access to the API now</li>
|
||||
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
|
||||
</ul>
|
||||
|
||||
<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p>
|
||||
|
||||
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
|
||||
3168
|
||||
</code></pre>
|
||||
</code></pre></li>
|
||||
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a>
</article>


@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.55.3" />
<meta name="generator" content="Hugo 0.55.5" />



@ -110,15 +110,15 @@
<h2 id="2015-12-02">2015-12-02</h2>

<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p>

<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>
</code></pre></li>
</ul>
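<p>The cron change itself is not shown in that excerpt; a minimal sketch of what such a job might look like (the path and one-day retention are assumptions):</p>

<pre><code># Compress DSpace logs older than a day with xz; skip files that are already compressed
find /home/dspacetest.cgiar.org/log -name 'dspace.log.*' \
  ! -name '*.xz' ! -name '*.lzo' -mtime +1 -exec xz {} \;
</code></pre>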
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article>

@ -141,12 +141,13 @@
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
</ul>

<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p>

<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78
</code></pre>
</code></pre></li>
</ul>
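<p>Counting “idle” with grep works, but PostgreSQL can also report the breakdown directly. A sketch for the next time this happens, assuming a PostgreSQL version (9.2 or newer) that has the <code>state</code> column:</p>

<pre><code># Connections per database and state, busiest first
psql -c "SELECT datname, state, count(*) FROM pg_stat_activity GROUP BY datname, state ORDER BY count(*) DESC;"
</code></pre>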
<a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a>
</article>


@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.55.3" />
<meta name="generator" content="Hugo 0.55.5" />



@ -101,18 +101,16 @@
<h2 id="2018-07-01">2018-07-01</h2>

<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
<li><p>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</p>

<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre>
</code></pre></li>

<ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<li><p>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</p>

<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
</code></pre></li>
</ul>
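<p>Two sketches related to the notes above: restoring that custom-format dump into a scratch database to confirm the backup is usable, and giving Maven more heap before retrying the build (the database name and heap size are guesses, not tested values):</p>

<pre><code># Restore the dump into a throwaway database to verify it
createdb -U postgres dspacetest_restore
pg_restore -U postgres -d dspacetest_restore dspace-2018-07-01.backup

# Give the Maven build more memory before retrying mvn package
export MAVEN_OPTS="-Xmx1024m"
mvn -U clean package
</code></pre>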
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
</article>

@ -139,23 +137,23 @@
<li>There seems to be a problem with the CUA and L&R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
</ul></li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
</ul>

<li><p>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</p>

<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre>
</code></pre></li>

<ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<li><p>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></p></li>

<li><p>Time to index ~70,000 items on CGSpace:</p>

<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b

real 74m42.646s
user 8m5.056s
sys 2m7.289s
</code></pre>
</code></pre></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
</article>
@ -279,26 +277,25 @@ sys 2m7.289s
<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
<li>And just before that I see this:</li>
</ul>

<li><p>And just before that I see this:</p>

<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre>
</code></pre></li>

<ul>
<li>Ah hah! So the pool was actually empty!</li>
<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
<li>I notice this error quite a few times in dspace.log:</li>
</ul>
<li><p>Ah hah! So the pool was actually empty!</p></li>

<li><p>I need to increase that, let’s try to bump it up from 50 to 75</p></li>

<li><p>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</p></li>

<li><p>I notice this error quite a few times in dspace.log:</p>

<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
</code></pre>
</code></pre></li>

<ul>
<li>And there are many of these errors every day for the past month:</li>
</ul>
<li><p>And there are many of these errors every day for the past month:</p>

<pre><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
dspace.log.2017-11-21:4
@ -344,10 +341,9 @@ dspace.log.2017-12-30:89
dspace.log.2017-12-31:53
dspace.log.2018-01-01:45
dspace.log.2018-01-02:34
</code></pre>
</code></pre></li>

<ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
<li><p>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</p></li>
</ul>
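<p>A quick sketch for correlating that pool exhaustion with the outages: count the timeout errors per day before and after bumping the pool size (log file names assumed to follow the usual dspace.log.YYYY-MM-DD pattern):</p>

<pre><code># How often did Tomcat report an empty connection pool each day?
grep -c 'Timeout: Pool empty' dspace.log.2018-01-0*

# And how busy the pool was when it failed
grep -o 'busy:[0-9]*' dspace.log.2018-01-02 | sort | uniq -c
</code></pre>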
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
</article>
@ -400,20 +396,18 @@ dspace.log.2018-01-02:34
<h2 id="2017-11-02">2017-11-02</h2>

<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
<li><p>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</p>

<pre><code># grep -c "CORE" /var/log/nginx/access.log
0
</code></pre>
</code></pre></li>

<ul>
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
</ul>
<li><p>Generate list of authors on CGSpace for Peter to go through and correct:</p>

<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701
</code></pre>
</code></pre></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
</article>

@ -434,15 +428,14 @@ COPY 54701
<h2 id="2017-10-01">2017-10-01</h2>

<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<li><p>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</p>

<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre>
</code></pre></li>

<ul>
<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
<li><p>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</p></li>

<li><p>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</p></li>
</ul>
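<p>A sketch of how those double-handle items could be listed from PostgreSQL, following the same <code>metadatafieldregistry</code> lookup pattern as the author export above and assuming only the dc schema defines identifier.uri; the “||” separator is what the excerpt shows, so that is what we search for:</p>

<pre><code>dspace=# \copy (select resource_id, text_value from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'uri') and resource_type_id = 2 and text_value like '%||%') to /tmp/duplicate-handles.csv with csv;
</code></pre>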
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
</article>

@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.55.3" />
<meta name="generator" content="Hugo 0.55.5" />



@ -261,11 +261,12 @@

<ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
</ul>

<li><p>Testing the CMYK patch on a collection with 650 items:</p>

<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
</code></pre>
</code></pre></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a>
</article>

@ -300,12 +301,13 @@
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</li>
</ul>

<li><p>Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999"><sup>10568</sup>⁄<sub>51999</sub></a>):</p>

<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre>
</code></pre></li>
</ul>
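<p>A sketch for checking and fixing colorspaces by hand with ImageMagick, if we ever need to spot-check thumbnails outside of DSpace (file names reused from the example above):</p>

<pre><code># Report only the colorspace of a thumbnail
identify -format '%[colorspace]\n' ~/Desktop/alc_contrastes_desafios.jpg

# Convert a CMYK JPEG to sRGB for comparison
convert ~/Desktop/alc_contrastes_desafios.jpg -colorspace sRGB /tmp/alc_contrastes_desafios-srgb.jpg
</code></pre>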
<a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a>
</article>

@ -326,23 +328,22 @@
<h2 id="2017-02-07">2017-02-07</h2>

<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<li><p>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</p>

<pre><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id
id | collection_id | item_id
-------+---------------+---------
92551 | 313 | 80278
92550 | 313 | 80278
90774 | 1051 | 80278
92551 | 313 | 80278
92550 | 313 | 80278
90774 | 1051 | 80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
</code></pre>
</code></pre></li>

<ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
<li><p>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</p></li>

<li><p>Looks like we’ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</p></li>
</ul>
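<p>Since this keeps happening, a sketch of a query that would list any item mapped more than once to the same collection, so they can all be cleaned up in one pass instead of hunting for them one at a time:</p>

<pre><code>dspace=# select item_id, collection_id, count(*) from collection2item group by item_id, collection_id having count(*) > 1 order by count(*) desc;
</code></pre>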
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
</article>

@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.55.3" />
<meta name="generator" content="Hugo 0.55.5" />



@ -102,20 +102,21 @@

<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
</ul>

<li><p>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</p>

<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID =70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail="dc.title", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, Object Type=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail="THUMBNAIL", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
</code></pre>
</code></pre></li>

<ul>
<li>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</li>
<li>I’ve raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
<li><p>I see thousands of them in the logs for the last few months, so it’s not related to the DSpace 5.5 upgrade</p></li>

<li><p>I’ve raised a ticket with Atmire to ask</p></li>

<li><p>Another worrying error from dspace.log is:</p></li>
</ul>
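<p>A quick sketch for quantifying those MQM warnings per log file before sending the numbers to Atmire (log file names assumed to follow the usual dspace.log.YYYY-MM-DD pattern):</p>

<pre><code># Count the BatchEditConsumer warnings per day
grep -c 'BatchEditConsumer should not have been given' dspace.log.2016-1*
</code></pre>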
<a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a>
</article>
@ -168,11 +169,12 @@
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul></li>
<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>

<li><p>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</p>

<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
</code></pre></li>
</ul>
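<p>For reference, a sketch of how such a CSV would be applied with the batch metadata editing tool; the item id, collection handle, file name, and eperson address here are placeholders, not values from that test:</p>

<pre><code>$ cat /tmp/orcid-test.csv
id,collection,ORCID:dc.contributor.author
12345,10568/12345,0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
$ [dspace]/bin/dspace metadata-import -f /tmp/orcid-test.csv -e user@example.com
</code></pre>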
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article>

@ -196,11 +198,12 @@
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR’s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
</ul>

<li><p>It looks like we might be able to use OUs now, instead of DCs:</p>

<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
</code></pre>
</code></pre></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a>
</article>

@ -226,13 +229,14 @@
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>bower stuff is a dead end, waste of time, too many issues</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
</ul>

<li><p>Start working on DSpace 5.1 → 5.5 port:</p>

<pre><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5
</code></pre>
</code></pre></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a>
</article>
@ -254,19 +258,18 @@ $ git rebase -i dspace-5.5

<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have “,” at the end of their names:</li>
</ul>

<li><p>I think this query should find and replace all authors that have “,” at the end of their names:</p>

<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
text_value
------------
(0 rows)
</code></pre>
</code></pre></li>

<ul>
<li>In this case the select query was showing 95 results before the update</li>
<li><p>In this case the select query was showing 95 results before the update</p></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a>
</article>
@ -317,12 +320,13 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
</ul>

<li><p>There are 3,000 IPs accessing the REST API in a 24-hour period!</p>

<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168
</code></pre>
</code></pre></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a>
</article>
@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.55.3" />
<meta name="generator" content="Hugo 0.55.5" />



@ -156,15 +156,15 @@
<h2 id="2015-12-02">2015-12-02</h2>

<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<li><p>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</p>

<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>
</code></pre></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article>

@ -187,12 +187,13 @@
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
</ul>

<li><p>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</p>

<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78
</code></pre>
</code></pre></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a>
</article>