Add notes for 2019-12-17

2019-12-17 14:49:24 +02:00
parent d83c951532
commit d54e5b69f1
90 changed files with 1420 additions and 1377 deletions


@ -147,7 +147,7 @@ dspace.log.2018-01-02:34
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -228,7 +228,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
</p>
</header>
<h2 id="20180102">2018-01-02</h2>
<h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
@ -295,7 +295,7 @@ dspace.log.2018-01-02:34
</code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains</li>
</ul>
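<ul>
<li>For reference, a minimal certbot sketch of what that could look like (the exact domain list and webroot path here are assumptions, not something we ran):</li>
</ul>
<pre><code># request a certificate for a handful of explicitly named domains (hypothetical list)
$ sudo certbot certonly --webroot -w /var/www/letsencrypt -d ilri.org -d www.ilri.org
# simulate renewal to make sure the automation will work later
$ sudo certbot renew --dry-run
</code></pre>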
<h2 id="20180103">2018-01-03</h2>
<h2 id="2018-01-03">2018-01-03</h2>
<ul>
<li>I woke up to more up and down of CGSpace; this time UptimeRobot noticed a few rounds of a few minutes each, and Linode also notified of high CPU load from 12 to 2 PM</li>
<li>Looks like I need to increase the database pool size again:</li>
@ -389,7 +389,7 @@ dspace.log.2018-01-03:1909
<li>I guess for now I just have to increase the database connection pool's max active</li>
<li>It's currently 75 and normally I'd just bump it by 25 but let me be a bit daring and push it by 50 to 125, because I used to see at least 121 connections in pg_stat_activity before when we were using the shitty default pooling</li>
</ul>
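<ul>
<li>For the record, a quick way to sanity check the pool before and after the change (the pool timeout string is an assumption; the log file name comes from the counts above):</li>
</ul>
<pre><code># how many connections are currently open to PostgreSQL
$ psql -U postgres dspace -c 'SELECT count(*) FROM pg_stat_activity;'
# how many pool timeouts the DSpace log recorded today (error string is an assumption)
$ grep -c 'Timeout: Pool empty' dspace.log.2018-01-03
</code></pre>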
<h2 id="20180104">2018-01-04</h2>
<h2 id="2018-01-04">2018-01-04</h2>
<ul>
<li>CGSpace went down and up a bunch of times last night and ILRI staff were complaining a lot</li>
<li>The XMLUI logs show this activity:</li>
@ -423,7 +423,7 @@ dspace.log.2018-01-04:1559
<li>Once I get back to Amman I will have to try to create different database pools for different web applications, like recently discussed on the dspace-tech mailing list</li>
<li>Create accounts on CGSpace for two CTA staff <a href="mailto:km4ard@cta.int">km4ard@cta.int</a> and <a href="mailto:bheenick@cta.int">bheenick@cta.int</a></li>
</ul>
<h2 id="20180105">2018-01-05</h2>
<h2 id="2018-01-05">2018-01-05</h2>
<ul>
<li>Peter said that CGSpace was down last night and Tsega restarted Tomcat</li>
<li>I don't see any alerts from Linode or UptimeRobot, and there are no PostgreSQL connection errors in the dspace logs for today:</li>
@ -453,7 +453,7 @@ sys 3m14.890s
</code></pre><ul>
<li>Reboot CGSpace and DSpace Test for new kernels (4.14.12-x86_64-linode92) that partially mitigate the <a href="https://blog.linode.com/2018/01/03/cpu-vulnerabilities-meltdown-spectre/">Spectre and Meltdown CPU vulnerabilities</a></li>
</ul>
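<ul>
<li>After the reboots it's worth confirming that the new kernel is actually running and that the Meltdown mitigation is active (the dmesg message text varies between kernel versions, so treat the grep as a rough check):</li>
</ul>
<pre><code># confirm the new Linode kernel is the one that booted
$ uname -r
# check whether kernel page-table isolation (the Meltdown mitigation) was enabled at boot
$ dmesg | grep -i 'page table'
</code></pre>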
<h2 id="20180106">2018-01-06</h2>
<h2 id="2018-01-06">2018-01-06</h2>
<ul>
<li>I'm still seeing Solr errors in the DSpace logs even after the full reindex yesterday:</li>
</ul>
@ -461,14 +461,14 @@ sys 3m14.890s
</code></pre><ul>
<li>I posted a message to the dspace-tech mailing list to see if anyone can help</li>
</ul>
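<ul>
<li>In the meantime, a rough way to keep an eye on how often the Solr errors recur (assuming they show up as SolrServerException entries in the DSpace logs, which is a guess about the exact exception class):</li>
</ul>
<pre><code># count Solr client errors per day across this month's DSpace logs (exception name is an assumption)
$ grep -c 'SolrServerException' dspace.log.2018-01-*
</code></pre>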
<h2 id="20180109">2018-01-09</h2>
<h2 id="2018-01-09">2018-01-09</h2>
<ul>
<li>Advise Sisay about blank lines in some IITA records</li>
<li>Generate a list of author affiliations for Peter to clean up:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
COPY 4515
</code></pre><h2 id="20180110">2018-01-10</h2>
</code></pre><h2 id="2018-01-10">2018-01-10</h2>
<ul>
<li>I looked to see what happened to this year's Solr statistics sharding task that should have run on 2018-01-01 and of course it failed:</li>
</ul>
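<ul>
<li>For reference, the sharding can also be kicked off by hand with DSpace's <code>stats-util</code> (whether <code>-s</code> is the right option on our version is an assumption to verify):</li>
</ul>
<pre><code># split the Solr statistics core into yearly shards by hand (run from the DSpace bin directory)
# the -s / --shard-solr-index option is an assumption to check against our DSpace version
$ dspace stats-util -s
</code></pre>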
@ -619,7 +619,7 @@ cache_alignment : 64
<li>Citing concerns with metadata quality, I suggested adding him on DSpace Test first</li>
<li>I opened a ticket with Atmire to ask them about DSpace 5.8 compatibility: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560</a></li>
</ul>
<h2 id="20180111">2018-01-11</h2>
<h2 id="2018-01-11">2018-01-11</h2>
<ul>
<li>The PostgreSQL and firewall graphs from this week show clearly the load from the new bot from PerfectIP.net yesterday:</li>
</ul>
@ -673,7 +673,7 @@ cache_alignment : 64
</code></pre><ul>
<li>With that it is super easy to see where PostgreSQL connections are coming from in <code>pg_stat_activity</code></li>
</ul>
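<ul>
<li>For example, grouping the current connections by application is a one-liner (standard PostgreSQL; it assumes each webapp sets its own <code>application_name</code>):</li>
</ul>
<pre><code># show which application each PostgreSQL connection belongs to
$ psql -U postgres dspace -c 'SELECT application_name, count(*) FROM pg_stat_activity GROUP BY application_name ORDER BY 2 DESC;'
</code></pre>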
<h2 id="20180112">2018-01-12</h2>
<h2 id="2018-01-12">2018-01-12</h2>
<ul>
<li>I'm looking at the <a href="https://wiki.duraspace.org/display/DSDOC6x/Installing+DSpace#InstallingDSpace-ServletEngine(ApacheTomcat7orlater,Jetty,CauchoResinorequivalent)">DSpace 6.0 Install docs</a> and notice they tweak the number of threads in their Tomcat connector:</li>
</ul>
@ -698,7 +698,7 @@ cache_alignment : 64
</code></pre><ul>
<li>That could be very interesting</li>
</ul>
<h2 id="20180113">2018-01-13</h2>
<h2 id="2018-01-13">2018-01-13</h2>
<ul>
<li>Still testing DSpace 6.2 on Tomcat 8.5.24</li>
<li>Catalina errors at Tomcat 8.5 startup:</li>
@ -741,14 +741,14 @@ Caused by: java.lang.NullPointerException
<li>Shit, this might actually be a DSpace error: <a href="https://jira.duraspace.org/browse/DS-3434">https://jira.duraspace.org/browse/DS-3434</a></li>
<li>I'll comment on that issue</li>
</ul>
<h2 id="20180114">2018-01-14</h2>
<h2 id="2018-01-14">2018-01-14</h2>
<ul>
<li>Looking at the authors Peter had corrected</li>
<li>Some had multiple corrected values, which he indicated by adding <code>||</code> in the correction column, but I can't process those this way so I will just have to flag them and do them manually later</li>
<li>Also, I can flag the values that have &ldquo;DELETE&rdquo;</li>
<li>Then I need to facet the correction column on isBlank(value) and not flagged</li>
</ul>
<h2 id="20180115">2018-01-15</h2>
<h2 id="2018-01-15">2018-01-15</h2>
<ul>
<li>Help Udana from IWMI export a CSV from DSpace Test so he can start trying a batch upload</li>
<li>I'm going to apply these ~130 corrections on CGSpace:</li>
@ -830,7 +830,7 @@ COPY 4552
real 0m25.756s
user 0m28.016s
sys 0m2.210s
</code></pre><h2 id="20180116">2018-01-16</h2>
</code></pre><h2 id="2018-01-16">2018-01-16</h2>
<ul>
<li>Meeting with CGSpace team, a few action items:
<ul>
@ -849,7 +849,7 @@ sys 0m2.210s
<li>I ended up creating a Jira issue for my <code>db.jndi</code> documentation fix: <a href="https://jira.duraspace.org/browse/DS-3803">DS-3803</a></li>
<li>The DSpace developers said they wanted each pull request to be associated with a Jira issue</li>
</ul>
<h2 id="20180117">2018-01-17</h2>
<h2 id="2018-01-17">2018-01-17</h2>
<ul>
<li>Abenet asked me to proof and upload 54 records for LIVES</li>
<li>A few records were missing countries (even though they're all from Ethiopia)</li>
@ -990,7 +990,7 @@ $ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/
<li>Overall the heap space usage in the munin graph seems ok, though I usually increase it by 512MB over the average a few times per year as usage grows</li>
<li>But maybe I should increase it by more, like 1024MB, to give a bit more head room</li>
</ul>
<h2 id="20180118">2018-01-18</h2>
<h2 id="2018-01-18">2018-01-18</h2>
<ul>
<li>UptimeRobot said CGSpace was down for 1 minute last night</li>
<li>I don't see any errors in the nginx or catalina logs, so I guess UptimeRobot just got impatient and closed the request, which caused nginx to log an HTTP 499</li>
@ -1013,7 +1013,7 @@ Jan 18 07:01:22 linode18 sudo[10812]: pam_unix(sudo:session): session opened for
<li>I had to cancel the Discovery indexing and I'll have to re-try it another time when the server isn't so busy (it had already taken two hours and wasn't even close to being done)</li>
<li>For now I've increased the Tomcat JVM heap from 5632 to 6144m, to give ~1GB of free memory over the average usage to hopefully account for spikes caused by load or background jobs</li>
</ul>
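<ul>
<li>This is the kind of change it amounts to; exactly where <code>JAVA_OPTS</code> is set depends on how Tomcat is launched, so the snippet below is only an illustrative sketch:</li>
</ul>
<pre><code># illustrative only: raise the max heap from 5632m to 6144m to leave ~1GB of headroom
JAVA_OPTS=&quot;$JAVA_OPTS -Xmx6144m&quot;
</code></pre>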
<h2 id="20180119">2018-01-19</h2>
<h2 id="2018-01-19">2018-01-19</h2>
<ul>
<li>Linode alerted and said that the CPU load was 264.1% on CGSpace</li>
<li>Start the Discovery indexing again:</li>
@ -1029,7 +1029,7 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspa
</code></pre><ul>
<li>I told Peter we should keep an eye out and try again next week</li>
</ul>
<h2 id="20180120">2018-01-20</h2>
<h2 id="2018-01-20">2018-01-20</h2>
<ul>
<li>Run the authority indexing script on CGSpace and of course it died:</li>
</ul>
@ -1072,7 +1072,7 @@ $ docker exec dspace_db psql -U postgres dspace -c 'alter user dspace nocreateus
$ docker exec dspace_db vacuumdb -U postgres dspace
$ docker cp ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspace_db:/tmp
$ docker exec dspace_db psql -U dspace -f /tmp/update-sequences.sql dspace
</code></pre><h2 id="20180122">2018-01-22</h2>
</code></pre><h2 id="2018-01-22">2018-01-22</h2>
<ul>
<li>Look over Udana's CSV of 25 WLE records from last week</li>
<li>I sent him some corrections:
@ -1106,7 +1106,7 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
<li>I'd still like to get arbitrary mbeans like activeSessions etc, though</li>
<li>I can't remember if I had to configure the jmx settings in <code>/etc/munin/plugin-conf.d/munin-node</code> or not—I think all I did was re-run the <code>munin-node-configure</code> script and of course enable JMX in Tomcat's JVM options</li>
</ul>
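<ul>
<li>Roughly what that involves, for the record (the JMX port and the systemd unit name are assumptions): expose a JMX port in Tomcat's JVM options, then let <code>munin-node-configure</code> link the <code>jmx_</code> plugins:</li>
</ul>
<pre><code># Tomcat JVM options to expose a JMX port for munin (port number is an assumption; keep it firewalled)
CATALINA_OPTS=&quot;$CATALINA_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false&quot;
# re-detect the available plugins and restart munin-node to pick them up
$ sudo munin-node-configure --shell
$ sudo systemctl restart munin-node
</code></pre>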
<h2 id="20180123">2018-01-23</h2>
<h2 id="2018-01-23">2018-01-23</h2>
<ul>
<li>Thinking about generating a jmeter test plan for DSpace, along the lines of <a href="https://github.com/Georgetown-University-Libraries/dspace-performance-test">Georgetown's dspace-performance-test</a></li>
<li>I got a list of all the GET requests on CGSpace for January 21st (the last time Linode complained the load was high), excluding admin calls:</li>
@ -1141,7 +1141,7 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
</code></pre><ul>
<li>I can definitely design a test plan on this!</li>
</ul>
<h2 id="20180124">2018-01-24</h2>
<h2 id="2018-01-24">2018-01-24</h2>
<ul>
<li>Looking at the REST requests, most of them are to expand all or metadata, but 5% are for retrieving bitstreams:</li>
</ul>
@ -1205,7 +1205,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
<li>Then I generated reports for these runs like this:</li>
</ul>
<pre><code>$ jmeter -g 2018-01-24-linode5451120-baseline.jtl -o 2018-01-24-linode5451120-baseline
</code></pre><h2 id="20180125">2018-01-25</h2>
</code></pre><h2 id="2018-01-25">2018-01-25</h2>
<ul>
<li>Run another round of tests on DSpace Test with jmeter after changing Tomcat's <code>minSpareThreads</code> to 20 (default is 10) and <code>acceptorThreadCount</code> to 2 (default is 1):</li>
</ul>
@ -1222,7 +1222,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
</code></pre><ul>
<li>I haven't had time to look at the results yet</li>
</ul>
<h2 id="20180126">2018-01-26</h2>
<h2 id="2018-01-26">2018-01-26</h2>
<ul>
<li>Peter followed up about some of the points from the Skype meeting last week</li>
<li>Regarding the ORCID field issue, I see <a href="http://repo.mel.cgiar.org/handle/20.500.11766/7668?show=full">ICARDA's MELSpace is using <code>cg.creator.ID</code></a>: 0000-0001-9156-7691</li>
@ -1246,7 +1246,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
<li>I submitted a test item with ORCiDs and dc.rights from a controlled vocabulary on DSpace Test: <a href="https://dspacetest.cgiar.org/handle/10568/97703">https://dspacetest.cgiar.org/handle/10568/97703</a></li>
<li>I will send it to Peter to check and give feedback (ie, about the ORCiD field name as well as allowing users to add ORCiDs manually or not)</li>
</ul>
<h2 id="20180128">2018-01-28</h2>
<h2 id="2018-01-28">2018-01-28</h2>
<ul>
<li>Assist Udana from WLE again to proof his 25 records and upload them to DSpace Test</li>
<li>I am playing with the <code>startStopThreads=&quot;0&quot;</code> parameter in Tomcat <code>&lt;Engine&gt;</code> and <code>&lt;Host&gt;</code> configuration</li>
@ -1254,7 +1254,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
<li>On my local test machine the startup time went from 70 to 30 seconds</li>
<li>See: <a href="https://tomcat.apache.org/tomcat-7.0-doc/config/host.html">https://tomcat.apache.org/tomcat-7.0-doc/config/host.html</a></li>
</ul>
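<ul>
<li>An easy way to compare startup times before and after the <code>startStopThreads</code> change is to read them straight from the Catalina log (log location is an assumption; adjust for the local install):</li>
</ul>
<pre><code># Tomcat prints the total startup time once all webapps are deployed
$ grep 'Server startup in' /path/to/tomcat/logs/catalina.out
</code></pre>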
<h2 id="20180129">2018-01-29</h2>
<h2 id="2018-01-29">2018-01-29</h2>
<ul>
<li>CGSpace went down this morning for a few minutes, according to UptimeRobot</li>
<li>Looking at the DSpace logs I see this error happened just before UptimeRobot noticed it going down:</li>
@ -1353,7 +1353,7 @@ Catalina:type=DataSource,class=javax.sql.DataSource,name=&quot;jdbc/dspace&quot;
</code></pre><ul>
<li>I filed a ticket with Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566</a></li>
</ul>
<h2 id="20180131">2018-01-31</h2>
<h2 id="2018-01-31">2018-01-31</h2>
<ul>
<li>UptimeRobot says CGSpace went down at 7:57 AM, and indeed I see a lot of HTTP 499 codes in nginx logs</li>
<li>PostgreSQL activity shows 222 database connections</li>