Add notes for 2019-12-17

2019-12-17 14:49:24 +02:00
parent d83c951532
commit d54e5b69f1
90 changed files with 1420 additions and 1377 deletions

@@ -61,7 +61,7 @@ $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u ds
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@@ -142,7 +142,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</p>
</header>
<h2 id="20190401">2019-04-01</h2>
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>
@@ -165,7 +165,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre><h2 id="20190402">2019-04-02</h2>
</code></pre><h2 id="2019-04-02">2019-04-02</h2>
<ul>
<li>CTA says the Amazon IPs are AWS gateways for real user traffic</li>
<li>I was trying to add Felix Shaw's account back to the Administrators group on DSpace Test, but I couldn't find his name in the user search of the groups page
@@ -175,7 +175,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</ul>
</li>
</ul>
<h2 id="20190403">2019-04-03</h2>
<h2 id="2019-04-03">2019-04-03</h2>
<ul>
<li>Maria from Bioversity emailed me a list of new ORCID identifiers for their researchers so I will add them to our controlled vocabulary
<ul>
@@ -209,7 +209,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</code></pre><ul>
<li>I will have to keep an eye on it because nothing should be updating 2018 stats in 2019&hellip;</li>
</ul>
<h2 id="20190405">2019-04-05</h2>
<h2 id="2019-04-05">2019-04-05</h2>
<ul>
<li>Uptime Robot reported that CGSpace (linode18) went down tonight</li>
<li>I see there are lots of PostgreSQL connections:</li>
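The command used to inspect those connections falls outside this hunk. As a hedged, illustrative alternative, the same check can be done from Python by grouping pg_stat_activity rows by application name; the connection parameters are placeholders.

```python
#!/usr/bin/env python3
# Hedged sketch: count PostgreSQL connections per DSpace pool by grouping
# pg_stat_activity. Connection parameters are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=dspace user=dspace password=fuuu host=localhost")
with conn.cursor() as cursor:
    cursor.execute(
        "SELECT application_name, state, COUNT(*) FROM pg_stat_activity "
        "GROUP BY application_name, state ORDER BY COUNT(*) DESC"
    )
    for application_name, state, count in cursor.fetchall():
        print(f"{count:5d}  {application_name or '(none)'}  {state}")
conn.close()
```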
@@ -238,7 +238,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</code></pre><ul>
<li>I restarted it again and all the Solr cores came up properly&hellip;</li>
</ul>
<h2 id="20190406">2019-04-06</h2>
<h2 id="2019-04-06">2019-04-06</h2>
<ul>
<li>Udana asked why item <a href="https://cgspace.cgiar.org/handle/10568/91278">10568/91278</a> didn't have an Altmetric badge on CGSpace, but on the <a href="https://wle.cgiar.org/food-and-agricultural-innovation-pathways-prosperity">WLE website</a> it does
<ul>
@@ -297,7 +297,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</ul>
</li>
</ul>
<h2 id="20190407">2019-04-07</h2>
<h2 id="2019-04-07">2019-04-07</h2>
<ul>
<li>Looking into the impact of harvesters like <code>45.5.184.72</code>, I see in Solr that this user is not categorized as a bot so it definitely impacts the usage stats by some tens of thousands <em>per day</em></li>
<li>Last week CTA switched their frontend code to use HEAD requests instead of GET requests for bitstreams
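For context on the harvester claim above, a quick way to gauge the impact is to ask the Solr statistics core how many non-bot view events it holds for that IP. This is only a sketch: the core URL is assumed, and the field names (ip, isBot, statistics_type) are the stock DSpace statistics schema, so verify them against your own core.

```python
#!/usr/bin/env python3
# Sketch: count non-bot view events recorded for the harvester IP in the
# Solr statistics core. The core URL is an assumption; field names follow
# the stock DSpace statistics schema.
import requests

SOLR = "http://localhost:8081/solr/statistics/select"

params = {
    "q": "ip:45.5.184.72 AND -isBot:true AND statistics_type:view",
    "rows": 0,  # we only need the hit count, not the documents
    "wt": "json",
}
response = requests.get(SOLR, params=params)
response.raise_for_status()
print(response.json()["response"]["numFound"])
```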
@@ -529,7 +529,7 @@ X-XSS-Protection: 1; mode=block
<li>It seems that the issue with CGSpace being &ldquo;down&rdquo; is actually because of CPU steal again!!!</li>
<li>I opened a ticket with support and asked them to migrate the VM to a less busy host</li>
</ul>
<h2 id="20190408">2019-04-08</h2>
<h2 id="2019-04-08">2019-04-08</h2>
<ul>
<li>Start checking IITA's last round of batch uploads from <a href="https://dspacetest.cgiar.org/handle/10568/100333">March on DSpace Test</a> (20193rd.xls)
<ul>
@@ -623,7 +623,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exe
19 157.55.39.164
20 40.77.167.132
370 51.254.16.223
</code></pre><h2 id="20190409">2019-04-09</h2>
</code></pre><h2 id="2019-04-09">2019-04-09</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was 440% CPU for the last two hours this morning</li>
<li>Here are the top IPs in the web server logs around that time:</li>
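The exact log command is not part of this hunk; a rough Python equivalent of the usual zcat/awk/sort pipeline for tallying top client IPs is sketched below. The log path and the timestamp filter for the morning of 2019-04-09 are assumptions.

```python
#!/usr/bin/env python3
# Rough Python equivalent of the usual zcat/awk/sort pipeline: tally client
# IPs in the nginx logs for the morning of 2019-04-09. Log path and the
# timestamp filter are assumptions.
import gzip
from collections import Counter
from pathlib import Path

counts = Counter()
for log in Path("/var/log/nginx").glob("*.log*"):
    opener = gzip.open if log.suffix == ".gz" else open
    with opener(log, "rt", errors="replace") as f:
        for line in f:
            if "09/Apr/2019:0" in line:  # hours 00-09 on 2019-04-09
                counts[line.split()[0]] += 1  # first field is the client IP

for ip, hits in counts.most_common(20):
    print(f"{hits:7d} {ip}")
```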
@@ -670,7 +670,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exe
</li>
<li>In other news, Linode staff identified a noisy neighbor sharing our host and migrated it elsewhere last night</li>
</ul>
<h2 id="20190410">2019-04-10</h2>
<h2 id="2019-04-10">2019-04-10</h2>
<ul>
<li>Abenet pointed out a possibility of validating funders against the <a href="https://support.crossref.org/hc/en-us/articles/215788143-Funder-data-via-the-API">CrossRef API</a></li>
<li>Note that if you use HTTPS and specify a contact address in the API request you have less likelihood of being blocked</li>
@@ -684,7 +684,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exe
<pre><code>from habanero import Crossref
cr = Crossref(mailto=&quot;me@cgiar.org&quot;)
x = cr.funders(query = &quot;mercator&quot;)
</code></pre><h2 id="20190411">2019-04-11</h2>
</code></pre><h2 id="2019-04-11">2019-04-11</h2>
<ul>
<li>Continue proofing IITA's last round of batch uploads from <a href="https://dspacetest.cgiar.org/handle/10568/100333">March on DSpace Test</a> (20193rd.xls)
<ul>
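Building on the habanero snippet above, the result can be unpacked to list candidate funder names and CrossRef funder IDs for comparison against our metadata. The message/items layout follows the standard CrossRef REST API response envelope, but treat the exact keys as something to verify.

```python
#!/usr/bin/env python3
# Follow-on to the habanero snippet above: list candidate funder names and
# CrossRef funder IDs. The message/items keys follow the standard CrossRef
# REST API response envelope; verify against a live response.
from habanero import Crossref

cr = Crossref(mailto="me@cgiar.org")  # the mailto puts requests in the polite pool
result = cr.funders(query="mercator")

for funder in result["message"]["items"]:
    print(f'{funder["id"]}: {funder["name"]}')
```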
@@ -725,7 +725,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspac
</ul>
</li>
</ul>
<h2 id="20190413">2019-04-13</h2>
<h2 id="2019-04-13">2019-04-13</h2>
<ul>
<li>I copied the <code>statistics</code> and <code>statistics-2018</code> Solr cores from CGSpace to my local machine and watched the Java process in VisualVM while indexing item views and downloads with my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a>:</li>
</ul>
@@ -741,7 +741,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspac
<li>I tried again with the GC tuning settings from the Solr 4.10.4 release:</li>
</ul>
<p><img src="/cgspace-notes/2019/04/visualvm-solr-indexing-solr-settings.png" alt="Java GC during Solr indexing Solr 4.10.4 settings"></p>
<h2 id="20190414">2019-04-14</h2>
<h2 id="2019-04-14">2019-04-14</h2>
<ul>
<li>Change DSpace Test (linode19) to use the Java GC tuning from the Solr 4.14.4 startup script:</li>
</ul>
@@ -763,7 +763,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspac
<li>I need to remember to check the Munin JVM graphs in a few days</li>
<li>It might be placebo, but the site <em>does</em> feel snappier&hellip;</li>
</ul>
<h2 id="20190415">2019-04-15</h2>
<h2 id="2019-04-15">2019-04-15</h2>
<ul>
<li>Rework the dspace-statistics-api to use the vanilla Python requests library instead of Solr client
<ul>
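To illustrate the idea behind that rework (this is not the actual dspace-statistics-api code), a plain requests call can facet view events in the statistics core by item id; the core URL and field names (type:2 for items, isBot, statistics_type) are assumptions based on the stock DSpace statistics schema.

```python
#!/usr/bin/env python3
# Sketch of the general idea, NOT the actual dspace-statistics-api code:
# query the statistics core with requests and facet view events by item id.
# Core URL and field names (type:2 = item, isBot, statistics_type) are
# assumptions based on the stock DSpace statistics schema.
import requests

SOLR = "http://localhost:8081/solr/statistics/select"

params = {
    "q": "type:2 AND -isBot:true AND statistics_type:view",
    "facet": "true",
    "facet.field": "id",     # per-item view counts come back as facet counts
    "facet.mincount": 1,
    "facet.limit": 10,
    "rows": 0,
    "wt": "json",
}
data = requests.get(SOLR, params=params).json()
facets = data["facet_counts"]["facet_fields"]["id"]
# Solr returns a flat [id, count, id, count, ...] list; pair the values up
for item_id, views in zip(facets[::2], facets[1::2]):
    print(item_id, views)
```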
@@ -806,11 +806,11 @@ return item_id
real 82m45.324s
user 7m33.446s
sys 2m13.463s
</code></pre><h2 id="20190416">2019-04-16</h2>
</code></pre><h2 id="2019-04-16">2019-04-16</h2>
<ul>
<li>Export IITA's community from CGSpace because they want to experiment with importing it into their internal DSpace for some testing or something</li>
</ul>
<h2 id="20190417">2019-04-17</h2>
<h2 id="2019-04-17">2019-04-17</h2>
<ul>
<li>Reading an interesting <a href="https://teaspoon-consulting.com/articles/solr-cache-tuning.html">blog post about Solr caching</a></li>
<li>Did some tests of the dspace-statistics-api on my local DSpace instance with 28 million documents in a sharded statistics core (<code>statistics</code> and <code>statistics-2018</code>) and monitored the memory usage of Tomcat in VisualVM</li>
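For the sharded setup mentioned above, one detail worth noting is that a single query can span both yearly cores via Solr's shards parameter. A minimal sketch, with host, port and core names assumed from the description above:

```python
#!/usr/bin/env python3
# Minimal sketch: one query spanning both yearly statistics cores via Solr's
# "shards" parameter. Host, port and core names are assumptions.
import requests

SOLR = "http://localhost:8081/solr/statistics/select"

params = {
    "q": "type:2 AND -isBot:true AND statistics_type:view",
    "shards": "localhost:8081/solr/statistics,localhost:8081/solr/statistics-2018",
    "rows": 0,
    "wt": "json",
}
print(requests.get(SOLR, params=params).json()["response"]["numFound"])
```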
@@ -956,7 +956,7 @@ sys 2m13.463s
<li>Lots of CPU steal going on still on CGSpace (linode18):</li>
</ul>
<p><img src="/cgspace-notes/2019/04/cpu-week3.png" alt="CPU usage week"></p>
<h2 id="20190418">2019-04-18</h2>
<h2 id="2019-04-18">2019-04-18</h2>
<ul>
<li>I've been trying to copy the <code>statistics-2018</code> Solr core from CGSpace to DSpace Test since yesterday, but the network speed is like 20KiB/sec
<ul>
@@ -984,7 +984,7 @@ sys 2m13.463s
</ul>
</li>
</ul>
<h2 id="20190420">2019-04-20</h2>
<h2 id="2019-04-20">2019-04-20</h2>
<ul>
<li>Linode agreed to move CGSpace (linode18) to a new machine shortly after I filed my ticket about CPU steal two days ago and now the load is much more sane:</li>
</ul>
@@ -1020,7 +1020,7 @@ TCP window size: 85.0 KByte (default)
</ul>
</li>
</ul>
<h2 id="20190421">2019-04-21</h2>
<h2 id="2019-04-21">2019-04-21</h2>
<ul>
<li>Deploy Solr 4.10.4 on CGSpace (linode18)</li>
<li>Deploy Tomcat 7.0.94 on CGSpace</li>
@@ -1031,7 +1031,7 @@ TCP window size: 85.0 KByte (default)
</ul>
</li>
</ul>
<h2 id="20190422">2019-04-22</h2>
<h2 id="2019-04-22">2019-04-22</h2>
<ul>
<li>Abenet pointed out <a href="https://hdl.handle.net/10568/97912">an item</a> that doesn't have an Altmetric score on CGSpace, but has a score of 343 in the CGSpace Altmetric dashboard
<ul>
@@ -1055,7 +1055,7 @@ dspace.log.2019-04-20:1515
</ul>
</li>
</ul>
<h2 id="20190423">2019-04-23</h2>
<h2 id="2019-04-23">2019-04-23</h2>
<ul>
<li>One blog post says that there is <a href="https://kvaes.wordpress.com/2017/07/01/what-azure-virtual-machine-size-should-i-pick/">no overprovisioning in Azure</a>:</li>
</ul>
@@ -1068,7 +1068,7 @@ dspace.log.2019-04-20:1515
</ul>
</li>
</ul>
<h2 id="20190424">2019-04-24</h2>
<h2 id="2019-04-24">2019-04-24</h2>
<ul>
<li>Linode migrated CGSpace (linode18) to a new host, but I am still getting poor performance when copying data to DSpace Test (linode19)
<ul>
@@ -1159,7 +1159,7 @@ dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AN
</code></pre><ul>
<li>I sent a message to the dspace-tech mailing list to ask for help</li>
</ul>
<h2 id="20190425">2019-04-25</h2>
<h2 id="2019-04-25">2019-04-25</h2>
<ul>
<li>Peter pointed out that we need to remove Delicious and Google+ from our social sharing links
<ul>
@@ -1200,13 +1200,13 @@ $ curl -f -H &quot;rest-dspace-token: b43d41a6-5ac1-455d-b49a-616b8debc25b&quot;
<li>Communicate with Carlos Tejo from the Land Portal about the <code>/items/find-by-metadata-value</code> endpoint</li>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
</ul>
<h2 id="20190426">2019-04-26</h2>
<h2 id="2019-04-26">2019-04-26</h2>
<ul>
<li>Export a list of authors for Peter to look through:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-04-26-all-authors.csv with csv header;
COPY 65752
</code></pre><h2 id="20190428">2019-04-28</h2>
</code></pre><h2 id="2019-04-28">2019-04-28</h2>
<ul>
<li>Still trying to figure out the issue with the items that cause the REST API's <code>/items/find-by-metadata-value</code> endpoint to throw an exception
<ul>
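For reference while chasing that exception, this is roughly the kind of request that exercises the DSpace 5.x /items/find-by-metadata-value endpoint, written with requests instead of curl; the metadata key and value are placeholders, not the specific items that trigger the error.

```python
#!/usr/bin/env python3
# Sketch of the kind of request that exercises the DSpace 5.x REST API's
# /items/find-by-metadata-value endpoint; the metadata key and value here
# are placeholders, not the specific items that trigger the exception.
import requests

url = "https://dspacetest.cgiar.org/rest/items/find-by-metadata-value"
body = {"key": "cg.subject.cpwf", "value": "WATER MANAGEMENT", "language": ""}

response = requests.post(url, json=body)
print(response.status_code)  # a 500 here is the exception being chased
print(response.json() if response.ok else response.text[:200])
```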
@@ -1226,7 +1226,7 @@ COPY 65752
</code></pre><ul>
<li>I even tried to &ldquo;expunge&rdquo; the item using an <a href="https://wiki.duraspace.org/display/DSDOC5x/Batch+Metadata+Editing#BatchMetadataEditing-Performing'actions'onitems">action in CSV</a>, and it said &ldquo;EXPUNGED!&rdquo; but the item is still there&hellip;</li>
</ul>
<h2 id="20190430">2019-04-30</h2>
<h2 id="2019-04-30">2019-04-30</h2>
<ul>
<li>Send mail to the dspace-tech mailing list to ask about the item expunge issue</li>
<li>Delete and re-create Podman container for dspacedb after pulling a new PostgreSQL container:</li>