mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2019-12-17
This commit is contained in:
@@ -69,7 +69,7 @@ real 0m19.873s
user 0m22.203s
sys 0m1.979s
"/>
-<meta name="generator" content="Hugo 0.60.1" />
+<meta name="generator" content="Hugo 0.61.0" />
@@ -150,7 +150,7 @@ sys 0m1.979s
</p>
</header>
-<h2 id="20190201">2019-02-01</h2>
+<h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
@@ -186,7 +186,7 @@ sys 0m1.979s
</ul>
</li>
</ul>
-<h2 id="20190202">2019-02-02</h2>
+<h2 id="2019-02-02">2019-02-02</h2>
<ul>
<li>Another alert from Linode about CGSpace (linode18) this morning, here are the top IPs in the web server logs before, during, and after that time:</li>
</ul>
@@ -206,7 +206,7 @@ sys 0m1.979s
<li>I will increase the Linode alert threshold from 275 to 300% because this is becoming too much!</li>
<li>I tested the Atmire Metadata Quality Module (MQM)’s duplicate checker on some <a href="https://dspacetest.cgiar.org/handle/10568/81268">WLE items</a> that I helped Udana with a few months ago on DSpace Test (linode19) and indeed it found many duplicates!</li>
</ul>
-<h2 id="20190203">2019-02-03</h2>
+<h2 id="2019-02-03">2019-02-03</h2>
<ul>
<li>This is seriously getting annoying, Linode sent another alert this morning that CGSpace (linode18) load was 377%!</li>
<li>Here are the top IPs before, during, and after that time:</li>
@@ -268,7 +268,7 @@ sys 0m1.979s
</ul>
</li>
</ul>
-<h2 id="20190204">2019-02-04</h2>
+<h2 id="2019-02-04">2019-02-04</h2>
<ul>
<li>Generate a list of CTA subjects from CGSpace for Peter:</li>
</ul>
@@ -294,7 +294,7 @@ COPY 321
<li>At this rate I think I just need to stop paying attention to these alerts—DSpace gets thrashed when people use the APIs properly and there's nothing we can do to improve REST API performance!</li>
<li>Perhaps I just need to keep increasing the Linode alert threshold (currently 300%) for this host?</li>
</ul>
-<h2 id="20190205">2019-02-05</h2>
+<h2 id="2019-02-05">2019-02-05</h2>
<ul>
<li>Peter sent me corrections and deletions for the CTA subjects and as usual, there were encoding errors with some accents in his file</li>
<li>In other news, it seems that the GREL syntax regarding booleans changed in OpenRefine recently, so I need to update some expressions like the one I use to detect encoding errors to use <code>toString()</code>:</li>
@@ -328,7 +328,7 @@ MARKETING ET COMMERCE,MARKETING||COMMERCE
NATURAL RESOURCES AND ENVIRONMENT,NATURAL RESOURCES MANAGEMENT||ENVIRONMENT
PÊCHES ET AQUACULTURE,PÊCHES||AQUACULTURE
PESCAS E AQUACULTURE,PISCICULTURA||AQUACULTURE
-</code></pre><h2 id="20190206">2019-02-06</h2>
+</code></pre><h2 id="2019-02-06">2019-02-06</h2>
<ul>
<li>I dumped the CTA community so I can try to fix the subjects with multiple subjects that Peter indicated in his corrections:</li>
</ul>
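The corrections above map one old subject to one or more new subjects separated by `||`, the multi-value separator that DSpace's batch metadata editing CSVs use. A minimal sketch of applying such a mapping, with an inline corrections string standing in for the real file:

```python
import csv
import io

# Corrections in the same shape as the CSV above: the second column can hold
# several replacement subjects separated by "||", the multi-value separator
# used by DSpace's batch metadata editing CSVs.
corrections_csv = """MARKETING ET COMMERCE,MARKETING||COMMERCE
PESCAS E AQUACULTURE,PISCICULTURA||AQUACULTURE
"""

corrections = {}
for old, new in csv.reader(io.StringIO(corrections_csv)):
    corrections[old] = new.split("||")

def correct_subjects(subjects):
    # Replace each subject via the map, expanding one-to-many corrections;
    # subjects without a correction pass through unchanged.
    fixed = []
    for s in subjects:
        fixed.extend(corrections.get(s, [s]))
    return fixed
```

This is only an illustration of the `||` expansion, not the actual workflow used on the dump.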
@@ -406,7 +406,7 @@ PESCAS E AQUACULTURE,PISCICULTURA||AQUACULTURE
4661 205.186.128.185
4661 70.32.83.92
5102 45.5.186.2
-</code></pre><h2 id="20190207">2019-02-07</h2>
+</code></pre><h2 id="2019-02-07">2019-02-07</h2>
<ul>
<li>Linode sent an alert last night that the load on CGSpace (linode18) was over 300%</li>
<li>Here are the top IPs in the web server and API logs before, during, and after that time, respectively:</li>
@@ -491,7 +491,7 @@ Please see the DSpace documentation for assistance.
<li>I can't connect to TCP port 25 on that server so I sent a mail to CGNET support to ask what's up</li>
<li>CGNET said these servers were discontinued in 2018-01 and that I should use <a href="https://docs.microsoft.com/en-us/exchange/mail-flow-best-practices/how-to-set-up-a-multifunction-device-or-application-to-send-email-using-office-3">Office 365</a></li>
</ul>
-<h2 id="20190208">2019-02-08</h2>
+<h2 id="2019-02-08">2019-02-08</h2>
<ul>
<li>I re-configured CGSpace to use the email/password for cgspace-support, but I get this error when I try the <code>test-email</code> script:</li>
</ul>
@@ -500,7 +500,7 @@ Please see the DSpace documentation for assistance.
</code></pre><ul>
<li>I tried to log into Outlook 365 with the credentials but I think the ones I have must be wrong, so I will ask ICT to reset the password</li>
</ul>
-<h2 id="20190209">2019-02-09</h2>
+<h2 id="2019-02-09">2019-02-09</h2>
<ul>
<li>Linode sent alerts about CPU load yesterday morning, yesterday night, and this morning! All over 300% CPU load!</li>
<li>This is just for this morning:</li>
@@ -535,7 +535,7 @@ Please see the DSpace documentation for assistance.
</code></pre><ul>
<li>151.80.203.180 is on OVH so I sent a message to their abuse email…</li>
</ul>
-<h2 id="20190210">2019-02-10</h2>
+<h2 id="2019-02-10">2019-02-10</h2>
<ul>
<li>Linode sent another alert about CGSpace (linode18) CPU load this morning, here are the top IPs in the web server XMLUI and API logs before, during, and after that time:</li>
</ul>
@@ -624,12 +624,12 @@ Please see the DSpace documentation for assistance.
# mkdir -p /home/aorth/.local/lib/containers/volumes/artifactory5_data
# chown 1030 /home/aorth/.local/lib/containers/volumes/artifactory5_data
# docker run --name artifactory --network dspace-build -d -v /home/aorth/.local/lib/containers/volumes/artifactory5_data:/var/opt/jfrog/artifactory -p 8081:8081 docker.bintray.io/jfrog/artifactory-oss
-</code></pre><h2 id="20190211">2019-02-11</h2>
+</code></pre><h2 id="2019-02-11">2019-02-11</h2>
<ul>
<li>Bosede from IITA said we can use “SOCIAL SCIENCE &amp; AGRIBUSINESS” in their new IITA theme field to be consistent with other places they are using it</li>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
</ul>
-<h2 id="20190212">2019-02-12</h2>
+<h2 id="2019-02-12">2019-02-12</h2>
<ul>
<li>I notice that <a href="https://jira.duraspace.org/browse/DS-3052">DSpace 6 has included a new JAR-based PDF thumbnailer based on PDFBox</a>, I wonder how good its thumbnails are and how it handles CMYK PDFs</li>
<li>On a similar note, I wonder if we could use the performance-focused <a href="https://libvips.github.io/libvips/">libvips</a> and the third-party <a href="https://github.com/codecitizen/jlibvips/">jlibvips Java library</a> in DSpace</li>
@@ -658,7 +658,7 @@ dspacestatistics=# SELECT * FROM items WHERE downloads > 0 ORDER BY downloads
</code></pre><ul>
<li>I will read the PDFBox thumbnailer documentation to see if I can change the size and quality</li>
</ul>
-<h2 id="20190213">2019-02-13</h2>
+<h2 id="2019-02-13">2019-02-13</h2>
<ul>
<li>ILRI ICT reset the password for the CGSpace mail account, but I still can't get it to send mail from DSpace's <code>test-email</code> utility</li>
<li>I even added extra mail properties to <code>dspace.cfg</code> as suggested by someone on the dspace-tech mailing list:</li>
@@ -735,7 +735,7 @@ $ podman run --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspace
<li>I increased the nginx upload limit, but she said she was having problems and couldn't really tell me why</li>
<li>I logged in as her and completed the submission with no problems…</li>
</ul>
-<h2 id="20190215">2019-02-15</h2>
+<h2 id="2019-02-15">2019-02-15</h2>
<ul>
<li>Tomcat was killed around 3AM by the kernel's OOM killer according to <code>dmesg</code>:</li>
</ul>
@@ -805,7 +805,7 @@ $ podman start artifactory
</code></pre><ul>
<li>More on the <a href="https://podman.io/blogs/2018/10/03/podman-remove-content-homedir.html">subuid permissions issue with rootless containers here</a></li>
</ul>
-<h2 id="20190217">2019-02-17</h2>
+<h2 id="2019-02-17">2019-02-17</h2>
<ul>
<li>I ran DSpace's cleanup task on CGSpace (linode18) and there were errors:</li>
</ul>
@@ -821,7 +821,7 @@ UPDATE 1
<li>I merged the Atmire Metadata Quality Module (MQM) changes to the <code>5_x-prod</code> branch and deployed it on CGSpace (<a href="https://github.com/ilri/DSpace/pull/407">#407</a>)</li>
<li>Then I ran all system updates on the CGSpace server and rebooted it</li>
</ul>
-<h2 id="20190218">2019-02-18</h2>
+<h2 id="2019-02-18">2019-02-18</h2>
<ul>
<li>Jesus fucking Christ, Linode sent an alert that CGSpace (linode18) was using 421% CPU for a few hours this afternoon (server time):</li>
<li>There seems to have been a lot of activity in XMLUI:</li>
@@ -942,7 +942,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</code></pre><ul>
<li>I merged the changes to the <code>5_x-prod</code> branch and they will go live the next time we re-deploy CGSpace (<a href="https://github.com/ilri/DSpace/pull/412">#412</a>)</li>
</ul>
-<h2 id="20190219">2019-02-19</h2>
+<h2 id="2019-02-19">2019-02-19</h2>
<ul>
<li>Linode sent another alert about CPU usage on CGSpace (linode18) averaging 417% this morning</li>
<li>Unfortunately, I don't see any strange activity in the web server API or XMLUI logs at that time in particular</li>
@@ -1028,7 +1028,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</code></pre><ul>
<li>I wrote a quick and dirty Python script called <code>resolve-addresses.py</code> to resolve IP addresses to their owning organization's name, ASN, and country using the <a href="https://ipapi.co">IPAPI.co API</a></li>
</ul>
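A rough sketch of what such a lookup does (this is not the actual `resolve-addresses.py`; the ipapi.co response field names here are assumptions based on its documented JSON endpoint):

```python
import json
import urllib.request

def lookup(ip):
    # One HTTPS request per IP; ipapi.co rate-limits unauthenticated use,
    # so batch runs should sleep between requests.
    with urllib.request.urlopen(f"https://ipapi.co/{ip}/json/") as r:
        return json.load(r)

def summarize(info):
    # Keep only the three fields the notes mention: org name, ASN, country.
    return {k: info.get(k) for k in ("org", "asn", "country_name")}

if __name__ == "__main__":
    print(summarize(lookup("151.80.203.180")))
```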
-<h2 id="20190220">2019-02-20</h2>
+<h2 id="2019-02-20">2019-02-20</h2>
<ul>
<li>Ben Hack was asking about getting authors' publications programmatically from CGSpace for the new ILRI website</li>
<li>I told him that they should probably try to use the REST API's <code>find-by-metadata-field</code> endpoint</li>
@@ -1049,7 +1049,7 @@ $ curl -s -H "accept: application/json" -H "Content-Type: applica
<li>See this <a href="https://jira.duraspace.org/browse/VIVO-1655">issue on the VIVO tracker</a> for more information about this endpoint</li>
<li>The old-school AGROVOC SOAP WSDL works with the <a href="https://python-zeep.readthedocs.io/en/master/">Zeep Python library</a>, but in my tests the results are way too broad despite trying to use an “exact match” search</li>
</ul>
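For illustration, a minimal Python version of the kind of `find-by-metadata-field` query suggested above. The DSpace 5 REST endpoint takes a POSTed metadata entry (key, value, language); the base URL and the `cg.creator.id` field/value here are illustrative:

```python
import json
import urllib.request

def build_query(key, value):
    # The endpoint expects a single metadata entry as JSON; language may be null.
    return {"key": key, "value": value, "language": None}

def find_items(base, key, value):
    req = urllib.request.Request(
        f"{base}/rest/items/find-by-metadata-field",
        data=json.dumps(build_query(key, value)).encode(),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)

if __name__ == "__main__":
    for item in find_items("https://cgspace.cgiar.org", "cg.creator.id",
                           "Alan S. Orth: 0000-0002-1735-7458"):
        print(item.get("handle"), item.get("name"))
```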
-<h2 id="20190221">2019-02-21</h2>
+<h2 id="2019-02-21">2019-02-21</h2>
<ul>
<li>I wrote a script <a href="https://github.com/ilri/DSpace/blob/5_x-prod/agrovoc-lookup.py">agrovoc-lookup.py</a> to resolve subject terms against the public AGROVOC REST API</li>
<li>It allows specifying the language the term should be queried in as well as output files to save the matched and unmatched terms to</li>
@@ -1088,7 +1088,7 @@ COPY 33
</ul>
</li>
</ul>
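In outline, a lookup like `agrovoc-lookup.py` queries the AGROVOC Skosmos search API for each term and sorts terms into matched and unmatched sets. This sketch is not the actual script; the base URL and the `results` response shape are assumptions about the Skosmos REST API:

```python
import json
import urllib.parse
import urllib.request

# Assumed Skosmos search endpoint for AGROVOC.
API = "https://agrovoc.fao.org/browse/rest/v1/search"

def search(term, lang="en"):
    # Query a single term in the given language (en/es/fr in the notes).
    qs = urllib.parse.urlencode({"query": term, "lang": lang})
    with urllib.request.urlopen(f"{API}?{qs}") as r:
        return json.load(r)

def is_matched(response):
    # Skosmos returns a "results" list; empty means no concept matched.
    return len(response.get("results", [])) > 0

if __name__ == "__main__":
    for term in ("MAIZE", "FAKE SUBJECT"):
        status = "matched" if is_matched(search(term)) else "unmatched"
        print(term, status)
```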
-<h2 id="20190222">2019-02-22</h2>
+<h2 id="2019-02-22">2019-02-22</h2>
<ul>
<li>
<p>Help Udana from WLE with some issues related to CGSpace items on their <a href="https://www.wle.cgiar.org/publications">Publications website</a></p>
@@ -1134,7 +1134,7 @@ return "unmatched"
<li>You have to make sure to URL encode the value with <code>quote_plus()</code> and it totally works, but it seems to refresh the facets (and therefore re-query everything) when you select a facet so that makes it basically unusable</li>
<li>There is a <a href="https://programminghistorian.org/en/lessons/fetch-and-parse-data-with-openrefine#example-2-url-queries-and-parsing-json">good resource discussing OpenRefine, Jython, and web scraping</a></li>
</ul>
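The `quote_plus()` step matters because subject terms contain spaces and ampersands; it percent-encodes reserved characters and turns spaces into `+`, which is the form query-string parameters expect. For example, with one of the IITA terms mentioned above:

```python
from urllib.parse import quote_plus

# Spaces become "+" and "&" becomes "%26", so the value is safe to place
# in a URL query string.
encoded = quote_plus("SOCIAL SCIENCE & AGRIBUSINESS")
print(encoded)  # SOCIAL+SCIENCE+%26+AGRIBUSINESS
```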
-<h2 id="20190224">2019-02-24</h2>
+<h2 id="2019-02-24">2019-02-24</h2>
<ul>
<li>I decided to try to validate the AGROVOC subjects in IITA's recent batch upload by dumping all their terms, checking them in en/es/fr with <code>agrovoc-lookup.py</code>, then reconciling against the final list using reconcile-csv with OpenRefine</li>
<li>I'm not sure how to deal with terms like “CORN” that are alternative labels (<code>altLabel</code>) in AGROVOC where the preferred label (<code>prefLabel</code>) would be “MAIZE”</li>
@@ -1163,7 +1163,7 @@ return "unmatched"
</ul>
</li>
</ul>
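One possible way to handle `altLabel` hits like “CORN” would be to substitute the concept's preferred label whenever a term only matches as an alternative label. This is a hypothetical normalization step, not something the notes describe doing, and the result shape mirrors an assumed Skosmos search response:

```python
def normalize_term(term, results):
    # Prefer an exact prefLabel match; otherwise take the prefLabel of the
    # first concept that matched via an altLabel; None if nothing matched.
    for r in results:
        if r.get("prefLabel", "").lower() == term.lower():
            return r["prefLabel"]
    for r in results:
        if r.get("altLabel", "").lower() == term.lower():
            return r["prefLabel"]
    return None

print(normalize_term("CORN", [{"prefLabel": "maize", "altLabel": "corn"}]))  # maize
```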
-<h2 id="20190225">2019-02-25</h2>
+<h2 id="2019-02-25">2019-02-25</h2>
<ul>
<li>There seems to be something going on with Solr on CGSpace (linode18) because statistics on communities and collections are blank for January and February this year</li>
<li>I see some errors started recently in Solr (yesterday):</li>
@@ -1257,7 +1257,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
<ul>
<li>I still have not figured out what the <em>real</em> cause for the Solr cores to not load was, though</li>
</ul>
-<h2 id="20190226">2019-02-26</h2>
+<h2 id="2019-02-26">2019-02-26</h2>
<ul>
<li>I sent a mail to the dspace-tech mailing list about the “solr_update_time_stamp” error</li>
<li>A CCAFS user sent a message saying they got this error when submitting to CGSpace:</li>
@@ -1268,7 +1268,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
<li>I looked at the <code>WORKFLOW_STEP_1</code> (Accept/Reject) and the group is of course empty</li>
<li>As we've seen several times recently, we are not using this step so it should simply be deleted</li>
</ul>
-<h2 id="20190227">2019-02-27</h2>
+<h2 id="2019-02-27">2019-02-27</h2>
<ul>
<li>Discuss batch uploads with Sisay</li>
<li>He's trying to upload some CTA records, but it's not possible to do collection mapping when using the web UI
@@ -1291,7 +1291,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
</ul>
</li>
</ul>
-<h2 id="20190228">2019-02-28</h2>
+<h2 id="2019-02-28">2019-02-28</h2>
<ul>
<li>I helped Sisay upload the nineteen CTA records from last week via the command line because they required mappings (which is not possible to do via the batch upload web interface)</li>
</ul>