Add notes for 2020-01-27

2020-01-27 16:20:44 +02:00
parent 207ace0883
commit 8feb93be39
112 changed files with 11466 additions and 5158 deletions

@ -61,7 +61,7 @@ $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u ds
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
"/>
<meta name="generator" content="Hugo 0.62.2" />
<meta name="generator" content="Hugo 0.63.1" />
@ -91,7 +91,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.a20c1a4367639632cdb341d23c27ca44fedcc75b0f8b3cbea6203010da153d3c.css" rel="stylesheet" integrity="sha256-ogwaQ2djljLNs0HSPCfKRP7cx1sPizy&#43;piAwENoVPTw=" crossorigin="anonymous">
<link href="https://alanorth.github.io/cgspace-notes/css/style.23e2c3298bcc8c1136c19aba330c211ec94c36f7c4454ea15cf4d3548370042a.css" rel="stylesheet" integrity="sha256-I&#43;LDKYvMjBE2wZq6MwwhHslMNvfERU6hXPTTVINwBCo=" crossorigin="anonymous">
<!-- RSS 2.0 feed -->
@ -138,7 +138,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-04/">April, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-04-01T09:00:43&#43;03:00">Mon Apr 01, 2019</time> by Alan Orth in
<i class="fa fa-folder" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
</p>
@ -169,7 +169,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</code></pre><h2 id="2019-04-02">2019-04-02</h2>
<ul>
<li>CTA says the Amazon IPs are AWS gateways for real user traffic</li>
<li>I was trying to add Felix Shaw's account back to the Administrators group on DSpace Test, but I couldn't find his name in the user search of the groups page
<li>I was trying to add Felix Shaw&rsquo;s account back to the Administrators group on DSpace Test, but I couldn&rsquo;t find his name in the user search of the groups page
<ul>
<li>If I searched for &ldquo;Felix&rdquo; or &ldquo;Shaw&rdquo; I saw other matches, including one for his personal email address!</li>
<li>I ended up finding him via searching for his email address</li>
@ -192,12 +192,12 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<pre><code>$ ./resolve-orcids.py -i /tmp/2019-04-03-orcid-ids.txt -o 2019-04-03-orcid-ids.txt -d
</code></pre><ul>
<li>After that I added the XML formatting, formatted the file with tidy, and sorted the names in vim</li>
<li>One user's name has changed so I will update those using my <code>fix-metadata-values.py</code> script:</li>
<li>One user&rsquo;s name has changed so I will update those using my <code>fix-metadata-values.py</code> script:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i 2019-04-03-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
</code></pre><ul>
<li>I created a pull request and merged the changes to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/417">#417</a>)</li>
<li>A few days ago I noticed some weird update process for the statistics-2018 Solr core and I see it's still going:</li>
<li>A few days ago I noticed some weird update process for the statistics-2018 Solr core and I see it&rsquo;s still going:</li>
</ul>
<pre><code>2019-04-03 16:34:02,262 INFO org.dspace.statistics.SolrLogger @ Updating : 1754500/21701 docs in http://localhost:8081/solr//statistics-2018
</code></pre><ul>
@ -228,10 +228,10 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</ul>
<p><img src="/cgspace-notes/2019/04/cpu-week.png" alt="CPU usage week"></p>
<ul>
<li>The other thing visible there is that the past few days the load has spiked to 500% and I don't think it's a coincidence that the Solr updating thing is happening&hellip;</li>
<li>The other thing visible there is that the past few days the load has spiked to 500% and I don&rsquo;t think it&rsquo;s a coincidence that the Solr updating thing is happening&hellip;</li>
<li>I ran all system updates and rebooted the server
<ul>
<li>The load was lower on the server after reboot, but Solr didn't come back up properly according to the Solr Admin UI:</li>
<li>The load was lower on the server after reboot, but Solr didn&rsquo;t come back up properly according to the Solr Admin UI:</li>
</ul>
</li>
</ul>
@ -241,7 +241,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</ul>
<h2 id="2019-04-06">2019-04-06</h2>
<ul>
<li>Udana asked why item <a href="https://cgspace.cgiar.org/handle/10568/91278">10568/91278</a> didn't have an Altmetric badge on CGSpace, but on the <a href="https://wle.cgiar.org/food-and-agricultural-innovation-pathways-prosperity">WLE website</a> it does
<li>Udana asked why item <a href="https://cgspace.cgiar.org/handle/10568/91278">10568/91278</a> didn&rsquo;t have an Altmetric badge on CGSpace, but on the <a href="https://wle.cgiar.org/food-and-agricultural-innovation-pathways-prosperity">WLE website</a> it does
<ul>
<li>I looked and saw that the WLE website is using the Altmetric score associated with the DOI, and that the Handle has no score at all</li>
<li>I tweeted the item and I assume this will link the Handle with the DOI in the system</li>
@ -273,12 +273,12 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
4267 45.5.186.2
4893 205.186.128.185
</code></pre><ul>
<li><code>45.5.184.72</code> is in Colombia so it's probably CIAT, and I see they are indeed trying to crawl the Discover pages on CIAT's datasets collection:</li>
<li><code>45.5.184.72</code> is in Colombia so it&rsquo;s probably CIAT, and I see they are indeed trying to crawl the Discover pages on CIAT&rsquo;s datasets collection:</li>
</ul>
<pre><code>GET /handle/10568/72970/discover?filtertype_0=type&amp;filtertype_1=author&amp;filter_relational_operator_1=contains&amp;filter_relational_operator_0=equals&amp;filter_1=&amp;filter_0=Dataset&amp;filtertype=dateIssued&amp;filter_relational_operator=equals&amp;filter=2014
</code></pre><ul>
<li>Their user agent is the one I added to the badbots list in nginx last week: &ldquo;GuzzleHttp/6.3.3 curl/7.47.0 PHP/7.0.30-0ubuntu0.16.04.1&rdquo;</li>
<li>They made 22,000 requests to Discover on this collection today alone (and it's only 11AM):</li>
<li>They made 22,000 requests to Discover on this collection today alone (and it&rsquo;s only 11AM):</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep &quot;06/Apr/2019&quot; | grep 45.5.184.72 | grep -oE '/handle/[0-9]+/[0-9]+/discover' | sort | uniq -c
22077 /handle/10568/72970/discover
@ -332,7 +332,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
}
}
</code></pre><ul>
<li>Strangely I don't see many hits in 2019-04:</li>
<li>Strangely I don&rsquo;t see many hits in 2019-04:</li>
</ul>
<pre><code>$ http --print b 'http://localhost:8081/solr/statistics/select?q=type%3A0+AND+(ip%3A18.196.196.108+OR+ip%3A18.195.78.144+OR+ip%3A18.195.218.6)&amp;fq=statistics_type%3Aview&amp;fq=bundleName%3AORIGINAL&amp;fq=dateYearMonth%3A2019-04&amp;rows=0&amp;wt=json&amp;indent=true'
{
@ -417,7 +417,7 @@ X-XSS-Protection: 1; mode=block
</code></pre><ul>
<li>So definitely the <em>size</em> of the transfer is more efficient with a HEAD, but I need to wait to see if these requests show up in Solr
<ul>
<li>After twenty minutes of waiting I still don't see any new requests in the statistics core, but when I try the requests from the command line again I see the following in the DSpace log:</li>
<li>After twenty minutes of waiting I still don&rsquo;t see any new requests in the statistics core, but when I try the requests from the command line again I see the following in the DSpace log:</li>
</ul>
</li>
</ul>
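<p>The HEAD-versus-GET size difference noted above can be sketched with a small local experiment. This is only an illustration, not the DSpace setup itself: <code>DemoHandler</code>, the <code>/bitstream</code> path, and the 4096-byte payload are all hypothetical stand-ins, and the server is Python's built-in <code>http.server</code> rather than Tomcat.</p>

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAYLOAD = b"x" * 4096  # stand-in for a bitstream; the size is arbitrary


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a bitstream endpoint (not DSpace itself)."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAYLOAD)

    def do_HEAD(self):
        # Same headers as GET, but no body is written
        self._send_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def transferred_bytes(port, method):
    """Return (advertised Content-Length, body bytes actually received)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/bitstream")
        response = conn.getresponse()
        body = response.read()
        return int(response.getheader("Content-Length")), len(body)
    finally:
        conn.close()


def compare_head_and_get():
    """Serve the payload on an ephemeral local port and request it both ways."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_port
        return transferred_bytes(port, "HEAD"), transferred_bytes(port, "GET")
    finally:
        server.shutdown()
        server.server_close()
```

<p>Both requests advertise the same <code>Content-Length</code>, but only the GET actually transfers the body. Whether the server side counts the request as a view is a separate question that this sketch does not answer.</p>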
@ -426,7 +426,7 @@ X-XSS-Protection: 1; mode=block
</code></pre><ul>
<li>So my inclination is that both HEAD and GET requests are registered as views as far as Solr and DSpace are concerned
<ul>
<li>Strangely, the statistics Solr core says it hasn't been modified in 24 hours, so I tried to start the &ldquo;optimize&rdquo; process from the Admin UI and I see this in the Solr log:</li>
<li>Strangely, the statistics Solr core says it hasn&rsquo;t been modified in 24 hours, so I tried to start the &ldquo;optimize&rdquo; process from the Admin UI and I see this in the Solr log:</li>
</ul>
</li>
</ul>
@ -434,7 +434,7 @@ X-XSS-Protection: 1; mode=block
</code></pre><ul>
<li>Ugh, even after optimizing there are no Solr results for requests from my IP, and actually I only see 18 results from 2019-04 so far and none of them are <code>statistics_type:view</code>&hellip; very weird
<ul>
<li>I don't even see many hits for days after 2019-03-17, when I migrated the server to Ubuntu 18.04 and copied the statistics core from CGSpace (linode18)</li>
<li>I don&rsquo;t even see many hits for days after 2019-03-17, when I migrated the server to Ubuntu 18.04 and copied the statistics core from CGSpace (linode18)</li>
<li>I will try to re-deploy the <code>5_x-dev</code> branch and test again</li>
</ul>
</li>
@ -465,7 +465,7 @@ X-XSS-Protection: 1; mode=block
}
</code></pre><ul>
<li>I confirmed the same on CGSpace itself after making one HEAD request</li>
<li>So I'm pretty sure it's something about DSpace Test using the CGSpace statistics core, and not that I deployed Solr 4.10.4 there last week
<li>So I&rsquo;m pretty sure it&rsquo;s something about DSpace Test using the CGSpace statistics core, and not that I deployed Solr 4.10.4 there last week
<ul>
<li>I deployed Solr 4.10.4 locally and ran a bunch of requests for bitstreams and they do show up in the Solr statistics log, so the issue must be with re-using the existing Solr core from CGSpace</li>
</ul>
@ -482,12 +482,12 @@ X-XSS-Protection: 1; mode=block
<li>See: <a href="https://jira.duraspace.org/browse/DS-3986">DS-3986</a></li>
<li>See: <a href="https://jira.duraspace.org/browse/DS-4020">DS-4020</a></li>
<li>See: <a href="https://jira.duraspace.org/browse/DS-3832">DS-3832</a></li>
<li>DSpace 5.10 upgraded to use GeoIP2, but we are on 5.8 so I just copied the missing database file from another server because it has been <em>removed</em> from MaxMind's server as of 2018-04-01</li>
<li>DSpace 5.10 upgraded to use GeoIP2, but we are on 5.8 so I just copied the missing database file from another server because it has been <em>removed</em> from MaxMind&rsquo;s server as of 2018-04-01</li>
<li>Now I made 100 requests and I see them in the Solr statistics&hellip; fuck my life for wasting five hours debugging this</li>
</ul>
</li>
<li>UptimeRobot said CGSpace went down and up a few times tonight, and my first instinct was to check <code>iostat 1 10</code> and I saw that CPU steal is around 10&ndash;30 percent right now&hellip;</li>
<li>The load average is super high right now, as I've noticed the last few times UptimeRobot said that CGSpace went down:</li>
<li>The load average is super high right now, as I&rsquo;ve noticed the last few times UptimeRobot said that CGSpace went down:</li>
</ul>
<pre><code>$ cat /proc/loadavg
10.70 9.17 8.85 18/633 4198
@ -532,7 +532,7 @@ X-XSS-Protection: 1; mode=block
</ul>
<h2 id="2019-04-08">2019-04-08</h2>
<ul>
<li>Start checking IITA's last round of batch uploads from <a href="https://dspacetest.cgiar.org/handle/10568/100333">March on DSpace Test</a> (20193rd.xls)
<li>Start checking IITA&rsquo;s last round of batch uploads from <a href="https://dspacetest.cgiar.org/handle/10568/100333">March on DSpace Test</a> (20193rd.xls)
<ul>
<li>Lots of problems with affiliations, I had to correct about sixty of them</li>
<li>I used lein to host the latest CSV of our affiliations for OpenRefine to reconcile against:</li>
@ -543,7 +543,7 @@ X-XSS-Protection: 1; mode=block
</code></pre><ul>
<li>After matching the values and creating some new matches I had trouble remembering how to copy the reconciled values to a new column
<ul>
<li>The matched values can be accessed with <code>cell.recon.match.name</code>, but some of the new values don't appear, perhaps because I edited the original cell values?</li>
<li>The matched values can be accessed with <code>cell.recon.match.name</code>, but some of the new values don&rsquo;t appear, perhaps because I edited the original cell values?</li>
<li>I ended up using this GREL expression to copy all values to a new column:</li>
</ul>
</li>
@ -599,7 +599,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exe
</ul>
<p><img src="/cgspace-notes/2019/04/cpu-week2.png" alt="CPU usage week"></p>
<ul>
<li>Linode Support still didn't respond to my ticket from yesterday, so I attached a new output of <code>iostat 1 10</code> and asked them to move the VM to a less busy host</li>
<li>Linode Support still didn&rsquo;t respond to my ticket from yesterday, so I attached a new output of <code>iostat 1 10</code> and asked them to move the VM to a less busy host</li>
<li>The web server logs are not very busy:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/{access,library-access}.log /var/log/nginx/{access,library-access}.log.1 | grep -E &quot;08/Apr/2019:(17|18|19)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
@ -679,7 +679,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exe
<pre><code>$ http 'https://api.crossref.org/funders?query=mercator&amp;mailto=me@cgiar.org'
</code></pre><ul>
<li>Otherwise, they provide the funder data in <a href="https://www.crossref.org/services/funder-registry/">CSV and RDF format</a></li>
<li>I did a quick test with the recent IITA records against reconcile-csv in OpenRefine and it matched a few, but the ones that didn't match will need a human to go and do some manual checking and informed decision making&hellip;</li>
<li>I did a quick test with the recent IITA records against reconcile-csv in OpenRefine and it matched a few, but the ones that didn&rsquo;t match will need a human to go and do some manual checking and informed decision making&hellip;</li>
<li>If I want to write a script for this I could use the Python <a href="https://habanero.readthedocs.io/en/latest/modules/crossref.html">habanero library</a>:</li>
</ul>
<pre><code>from habanero import Crossref
@ -687,7 +687,7 @@ cr = Crossref(mailto=&quot;me@cgiar.org&quot;)
x = cr.funders(query = &quot;mercator&quot;)
</code></pre><h2 id="2019-04-11">2019-04-11</h2>
<ul>
<li>Continue proofing IITA's last round of batch uploads from <a href="https://dspacetest.cgiar.org/handle/10568/100333">March on DSpace Test</a> (20193rd.xls)
<li>Continue proofing IITA&rsquo;s last round of batch uploads from <a href="https://dspacetest.cgiar.org/handle/10568/100333">March on DSpace Test</a> (20193rd.xls)
<ul>
<li>One misspelled country</li>
<li>Three incorrect regions</li>
@ -711,7 +711,7 @@ x = cr.funders(query = &quot;mercator&quot;)
</li>
</ul>
</li>
<li>I captured a few general corrections and deletions for AGROVOC subjects while looking at IITA's records, so I applied them to DSpace Test and CGSpace:</li>
<li>I captured a few general corrections and deletions for AGROVOC subjects while looking at IITA&rsquo;s records, so I applied them to DSpace Test and CGSpace:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-04-11-fix-14-subjects.csv -db dspace -u dspace -p 'fuuu' -f dc.subject -m 57 -t correct -d
$ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspace -u dspace -p 'fuuu' -m 57 -f dc.subject -d
@ -719,9 +719,9 @@ $ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspac
<li>Answer more questions about DOIs and Altmetric scores from WLE</li>
<li>Answer more questions about DOIs and Altmetric scores from IWMI
<ul>
<li>They can't seem to understand the Altmetric + Twitter flow for associating Handles and DOIs</li>
<li>To make things worse, many of their items DON'T have DOIs, so when Altmetric harvests them of course there is no link! - Then, a bunch of their items don't have scores because they never tweeted them!</li>
<li>They added a DOI to this old item <a href="https://cgspace.cgiar.org/handle/10568/97087">10568/97087</a> this morning and wonder why Altmetric's score hasn't linked with the DOI magically</li>
<li>They can&rsquo;t seem to understand the Altmetric + Twitter flow for associating Handles and DOIs</li>
<li>To make things worse, many of their items DON&rsquo;T have DOIs, so when Altmetric harvests them of course there is no link! - Then, a bunch of their items don&rsquo;t have scores because they never tweeted them!</li>
<li>They added a DOI to this old item <a href="https://cgspace.cgiar.org/handle/10568/97087">10568/97087</a> this morning and wonder why Altmetric&rsquo;s score hasn&rsquo;t linked with the DOI magically</li>
<li>We should check in a week to see if Altmetric makes the association when they harvest again</li>
</ul>
</li>
@ -734,7 +734,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspac
<ul>
<li>It took about eight minutes to index 784 pages of item views and 268 of downloads, and you can see a clear &ldquo;sawtooth&rdquo; pattern in the garbage collection</li>
<li>I am curious if the GC pattern would be different if I switched from the <code>-XX:+UseConcMarkSweepGC</code> to G1GC</li>
<li>I switched to G1GC and restarted Tomcat but for some reason I couldn't see the Tomcat PID in VisualVM&hellip;
<li>I switched to G1GC and restarted Tomcat but for some reason I couldn&rsquo;t see the Tomcat PID in VisualVM&hellip;
<ul>
<li>Anyways, the indexing process took much longer, perhaps twice as long!</li>
</ul>
@ -771,10 +771,10 @@ $ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspac
<li><a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.0.0">Tag version 1.0.0</a> and deploy it on DSpace Test</li>
</ul>
</li>
<li>Pretty annoying to see CGSpace (linode18) with 20&ndash;50% CPU steal according to <code>iostat 1 10</code>, though I haven't had any Linode alerts in a few days</li>
<li>Abenet sent me a list of ILRI items that don't have CRPs added to them
<li>Pretty annoying to see CGSpace (linode18) with 20&ndash;50% CPU steal according to <code>iostat 1 10</code>, though I haven&rsquo;t had any Linode alerts in a few days</li>
<li>Abenet sent me a list of ILRI items that don&rsquo;t have CRPs added to them
<ul>
<li>The spreadsheet only had Handles (no IDs), so I'm experimenting with using Python in OpenRefine to get the IDs</li>
<li>The spreadsheet only had Handles (no IDs), so I&rsquo;m experimenting with using Python in OpenRefine to get the IDs</li>
<li>I cloned the handle column and then did a transform to get the IDs from the CGSpace REST API:</li>
</ul>
</li>
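<p>Outside of OpenRefine, the same Handle-to-ID lookup could be sketched in plain Python. This is a hedged illustration rather than the transform used in these notes: the <code>REST_BASE</code> URL and the <code>/handle/</code> path are assumptions about the DSpace 5 REST API, both function names are hypothetical, and the network fetch is left out so that only the URL building and JSON parsing are shown.</p>

```python
import json
from urllib.parse import quote

# Assumed base URL for the REST API; adjust for the actual deployment
REST_BASE = "https://cgspace.cgiar.org/rest"


def handle_lookup_url(handle, base=REST_BASE):
    """Build the (assumed) REST URL that describes the object behind a Handle."""
    return f"{base}/handle/{quote(handle.strip(), safe='/')}"


def item_id_from_response(body):
    """Extract the internal ID from a JSON response body.

    Assumes the response is a JSON object with an 'id' key.
    """
    data = json.loads(body)
    return data["id"]
```

<p>For example, <code>handle_lookup_url("10568/91278")</code> yields the URL to request, and the response body can then be passed to <code>item_id_from_response</code>.</p>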
@ -795,12 +795,12 @@ item_id = data['id']
return item_id
</code></pre><ul>
<li>Luckily none of the items already had CRPs, so I didn't have to worry about them getting removed
<li>Luckily none of the items already had CRPs, so I didn&rsquo;t have to worry about them getting removed
<ul>
<li>It would have been much trickier if I had to get the CRPs for the items first, then add the CRPs&hellip;</li>
</ul>
</li>
<li>I ran a full Discovery indexing on CGSpace because I didn't do it after all the metadata updates last week:</li>
<li>I ran a full Discovery indexing on CGSpace because I didn&rsquo;t do it after all the metadata updates last week:</li>
</ul>
<pre><code>$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
@ -809,7 +809,7 @@ user 7m33.446s
sys 2m13.463s
</code></pre><h2 id="2019-04-16">2019-04-16</h2>
<ul>
<li>Export IITA's community from CGSpace because they want to experiment with importing it into their internal DSpace for some testing or something</li>
<li>Export IITA&rsquo;s community from CGSpace because they want to experiment with importing it into their internal DSpace for some testing or something</li>
</ul>
<h2 id="2019-04-17">2019-04-17</h2>
<ul>
@ -914,8 +914,8 @@ sys 2m13.463s
</li>
<li>The biggest takeaway I have is that this workload benefits from a larger <code>filterCache</code> (for Solr fq parameter), but barely uses the <code>queryResultCache</code> (for Solr q parameter) at all
<ul>
<li>The number of hits goes up and the time taken decreases when we increase the <code>filterCache</code>, and total JVM heap memory doesn't seem to increase much at all</li>
<li>I guess the <code>queryResultCache</code> size is always 2 because I'm only doing two queries: <code>type:0</code> and <code>type:2</code> (downloads and views, respectively)</li>
<li>The number of hits goes up and the time taken decreases when we increase the <code>filterCache</code>, and total JVM heap memory doesn&rsquo;t seem to increase much at all</li>
<li>I guess the <code>queryResultCache</code> size is always 2 because I&rsquo;m only doing two queries: <code>type:0</code> and <code>type:2</code> (downloads and views, respectively)</li>
</ul>
</li>
<li>Here is the general pattern of running three sequential indexing runs as seen in VisualVM while monitoring the Tomcat process:</li>
@ -959,7 +959,7 @@ sys 2m13.463s
<p><img src="/cgspace-notes/2019/04/cpu-week3.png" alt="CPU usage week"></p>
<h2 id="2019-04-18">2019-04-18</h2>
<ul>
<li>I've been trying to copy the <code>statistics-2018</code> Solr core from CGSpace to DSpace Test since yesterday, but the network speed is like 20KiB/sec
<li>I&rsquo;ve been trying to copy the <code>statistics-2018</code> Solr core from CGSpace to DSpace Test since yesterday, but the network speed is like 20KiB/sec
<ul>
<li>I opened a support ticket to ask Linode to investigate</li>
<li>They asked me to send an <code>mtr</code> report from Fremont to Frankfurt and vice versa</li>
@ -968,10 +968,10 @@ sys 2m13.463s
<li>Deploy Tomcat 7.0.94 on DSpace Test (linode19)
<ul>
<li>Also, I realized that the CMS GC changes I deployed a few days ago were ignored by Tomcat because of something with how Ansible formatted the options string</li>
<li>I needed to use the &ldquo;folded&rdquo; YAML variable format <code>&gt;-</code> (with the dash so it doesn't add a return at the end)</li>
<li>I needed to use the &ldquo;folded&rdquo; YAML variable format <code>&gt;-</code> (with the dash so it doesn&rsquo;t add a return at the end)</li>
</ul>
</li>
<li>UptimeRobot says that CGSpace went &ldquo;down&rdquo; this afternoon, but I looked at the CPU steal with <code>iostat 1 10</code> and it's in the 50s and 60s
<li>UptimeRobot says that CGSpace went &ldquo;down&rdquo; this afternoon, but I looked at the CPU steal with <code>iostat 1 10</code> and it&rsquo;s in the 50s and 60s
<ul>
<li>The munin graph shows a lot of CPU steal (red) currently (and overall during the week):</li>
</ul>
@ -1009,13 +1009,13 @@ TCP window size: 85.0 KByte (default)
[ 5] 0.0-10.2 sec 172 MBytes 142 Mbits/sec
[ 4] 0.0-10.5 sec 202 MBytes 162 Mbits/sec
</code></pre><ul>
<li>Even with the software firewalls disabled the rsync speed was low, so it's not a rate limiting issue</li>
<li>Even with the software firewalls disabled the rsync speed was low, so it&rsquo;s not a rate limiting issue</li>
<li>I also tried to download a file over HTTPS from CGSpace to DSpace Test, but it was capped at 20KiB/sec
<ul>
<li>I updated the Linode issue with this information</li>
</ul>
</li>
<li>I'm going to try to switch the kernel to the latest upstream (5.0.8) instead of Linode's latest x86_64
<li>I&rsquo;m going to try to switch the kernel to the latest upstream (5.0.8) instead of Linode&rsquo;s latest x86_64
<ul>
<li>Nope, still 20KiB/sec</li>
</ul>
@ -1026,7 +1026,7 @@ TCP window size: 85.0 KByte (default)
<li>Deploy Solr 4.10.4 on CGSpace (linode18)</li>
<li>Deploy Tomcat 7.0.94 on CGSpace</li>
<li>Deploy dspace-statistics-api v1.0.0 on CGSpace</li>
<li>Linode support replicated the results I had from the network speed testing and said they don't know why it's so slow
<li>Linode support replicated the results I had from the network speed testing and said they don&rsquo;t know why it&rsquo;s so slow
<ul>
<li>They offered to live migrate the instance to another host to see if that helps</li>
</ul>
@ -1034,7 +1034,7 @@ TCP window size: 85.0 KByte (default)
</ul>
<h2 id="2019-04-22">2019-04-22</h2>
<ul>
<li>Abenet pointed out <a href="https://hdl.handle.net/10568/97912">an item</a> that doesn't have an Altmetric score on CGSpace, but has a score of 343 in the CGSpace Altmetric dashboard
<li>Abenet pointed out <a href="https://hdl.handle.net/10568/97912">an item</a> that doesn&rsquo;t have an Altmetric score on CGSpace, but has a score of 343 in the CGSpace Altmetric dashboard
<ul>
<li>I tweeted the Handle to see if it will pick it up&hellip;</li>
<li>Like clockwork, after fifteen minutes there was a donut showing on CGSpace</li>
@ -1062,7 +1062,7 @@ dspace.log.2019-04-20:1515
</ul>
<!-- raw HTML omitted -->
<ul>
<li>Perhaps that's why the <a href="https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/">Azure pricing</a> is so expensive!</li>
<li>Perhaps that&rsquo;s why the <a href="https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/">Azure pricing</a> is so expensive!</li>
<li>Add a privacy page to CGSpace
<ul>
<li>The work was mostly similar to the About page at <code>/page/about</code>, but in addition to adding i18n strings etc, I had to add the logic for the trail to <code>dspace-xmlui-mirage2/src/main/webapp/xsl/preprocess/general.xsl</code></li>
@ -1086,7 +1086,7 @@ dspace.log.2019-04-20:1515
</li>
<li>While I was uploading the IITA records I noticed that twenty of the records Sisay uploaded in 2018-09 had double Handles (<code>dc.identifier.uri</code>)
<ul>
<li>According to my notes in 2018-09 I had noticed this when he uploaded the records and told him to remove them, but he didn't&hellip;</li>
<li>According to my notes in 2018-09 I had noticed this when he uploaded the records and told him to remove them, but he didn&rsquo;t&hellip;</li>
<li>I exported the IITA community as a CSV then used <code>csvcut</code> to extract the two URI columns and identify and fix the records:</li>
</ul>
</li>
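<p>The check for double Handles could also be sketched in Python instead of <code>csvcut</code>. This is a minimal sketch under stated assumptions: it assumes the export has an <code>id</code> column, that the URI values live in columns named <code>dc.identifier.uri</code> with an optional language suffix such as <code>dc.identifier.uri[en_US]</code>, and that multiple values in one cell are joined with <code>||</code>. The function name and the sample rows in the usage note are hypothetical.</p>

```python
import csv
import io


def items_with_multiple_uris(csv_text, field="dc.identifier.uri", separator="||"):
    """Return {item id: [uris]} for rows carrying more than one URI value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Match the bare field name and any language-qualified variants
    uri_columns = [
        column
        for column in (reader.fieldnames or [])
        if column == field or column.startswith(field + "[")
    ]
    flagged = {}
    for row in reader:
        uris = []
        for column in uri_columns:
            cell = row[column] or ""
            uris.extend(v.strip() for v in cell.split(separator) if v.strip())
        if len(uris) > 1:
            flagged[row["id"]] = uris
    return flagged
```

<p>Passing the text of an exported CSV returns only the rows that need attention, whether the extra Handle sits in the same cell or in a second URI column.</p>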
@ -1097,14 +1097,14 @@ dspace.log.2019-04-20:1515
<ul>
<li>I told him we never finished it, and that he should try to use the <code>/items/find-by-metadata-field</code> endpoint, with the caveat that you need to match the language attribute exactly (ie &ldquo;en&rdquo;, &ldquo;en_US&rdquo;, null, etc)</li>
<li>I asked him how many terms they are interested in, as we could probably make it easier by normalizing the language attributes of these fields (it would help us anyways)</li>
<li>He says he's getting HTTP 401 errors when trying to search for CPWF subject terms, which I can reproduce:</li>
<li>He says he&rsquo;s getting HTTP 401 errors when trying to search for CPWF subject terms, which I can reproduce:</li>
</ul>
</li>
</ul>
<pre><code>$ curl -f -H &quot;accept: application/json&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;https://dspacetest.cgiar.org/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;:&quot;cg.subject.cpwf&quot;, &quot;value&quot;:&quot;WATER MANAGEMENT&quot;,&quot;language&quot;: &quot;en_US&quot;}'
curl: (22) The requested URL returned error: 401
</code></pre><ul>
<li>Note that curl only shows the HTTP 401 error if you use <code>-f</code> (fail), and only then if you <em>don't</em> include <code>-s</code>
<li>Note that curl only shows the HTTP 401 error if you use <code>-f</code> (fail), and only then if you <em>don&rsquo;t</em> include <code>-s</code>
<ul>
<li>I see there are about 1,000 items using CPWF subject &ldquo;WATER MANAGEMENT&rdquo; in the database, so there should definitely be results</li>
<li>The breakdown of <code>text_lang</code> fields used in those items is 942:</li>
@ -1129,7 +1129,7 @@ dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AN
417
(1 row)
</code></pre><ul>
<li>I see that the HTTP 401 issue seems to be a bug due to an item that the user doesn't have permission to access&hellip; from the DSpace log:</li>
<li>I see that the HTTP 401 issue seems to be a bug due to an item that the user doesn&rsquo;t have permission to access&hellip; from the DSpace log:</li>
</ul>
<pre><code>2019-04-24 08:11:51,129 INFO org.dspace.rest.ItemsResource @ Looking for item with metadata(key=cg.subject.cpwf,value=WATER MANAGEMENT, language=en_US).
2019-04-24 08:11:51,231 INFO org.dspace.usage.LoggerUsageEventListener @ anonymous::view_item:handle=10568/72448
@ -1209,7 +1209,7 @@ $ curl -f -H &quot;rest-dspace-token: b43d41a6-5ac1-455d-b49a-616b8debc25b&quot;
COPY 65752
</code></pre><h2 id="2019-04-28">2019-04-28</h2>
<ul>
<li>Still trying to figure out the issue with the items that cause the REST API's <code>/items/find-by-metadata-field</code> endpoint to throw an exception
<li>Still trying to figure out the issue with the items that cause the REST API&rsquo;s <code>/items/find-by-metadata-field</code> endpoint to throw an exception
<ul>
<li>I made the item private in the UI and then I see in the UI and PostgreSQL that it is no longer discoverable:</li>
</ul>
@ -1234,7 +1234,7 @@ COPY 65752
</ul>
<pre><code>$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
</code></pre><ul>
<li>Carlos from LandPortal asked if I could export CGSpace in a machine-readable format so I think I'll try to do a CSV
<li>Carlos from LandPortal asked if I could export CGSpace in a machine-readable format so I think I&rsquo;ll try to do a CSV
<ul>
<li>In order to make it easier for him to understand the CSV I will normalize the text languages (minus the provenance field) on my local development instance before exporting:</li>
</ul>