<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="April, 2021" />
<meta property="og:description" content="2021-04-01
I wrote a script to query Sherpa&rsquo;s API for our ISSNs: sherpa-issn-lookup.py
I&rsquo;m curious to see how the results compare with the results from Crossref yesterday
AReS Explorer was down since this morning, I didn&rsquo;t see anything in the systemd journal
I simply took everything down with docker-compose and then back up, and then it was OK
Perhaps one of the containers crashed, I should have looked closer but I was in a hurry
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-04/" />
<meta property="article:published_time" content="2021-04-01T09:50:54+03:00" />
<meta property="article:modified_time" content="2021-04-06T22:33:43+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="April, 2021"/>
<meta name="twitter:description" content="2021-04-01
I wrote a script to query Sherpa&rsquo;s API for our ISSNs: sherpa-issn-lookup.py
I&rsquo;m curious to see how the results compare with the results from Crossref yesterday
AReS Explorer was down since this morning, I didn&rsquo;t see anything in the systemd journal
I simply took everything down with docker-compose and then back up, and then it was OK
Perhaps one of the containers crashed, I should have looked closer but I was in a hurry
"/>
<meta name="generator" content="Hugo 0.82.0" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "April, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-04/",
"wordCount": "1824",
"datePublished": "2021-04-01T09:50:54+03:00",
"dateModified": "2021-04-06T22:33:43+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2021-04/">
<title>April, 2021 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel="stylesheet" integrity="sha256-vrgBLtwIuhC&#43;AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.ffbfea088a9a1666ec65c3a8cb4906e2a0e4f92dc70dbbf400a125ad2422123a.js" integrity="sha256-/7/qCIqaFmbsZcOoy0kG4qDk&#43;S3HDbv0AKElrSQiEjo=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2021-04/">April, 2021</a></h2>
<p class="blog-post-meta">
<time datetime="2021-04-01T09:50:54+03:00">Thu Apr 01, 2021</time>
in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2021-04-01">2021-04-01</h2>
<ul>
<li>I wrote a script to query Sherpa&rsquo;s API for our ISSNs: <code>sherpa-issn-lookup.py</code>
<ul>
<li>I&rsquo;m curious to see how the results compare with the results from Crossref yesterday</li>
</ul>
</li>
<li>AReS Explorer had been down since this morning, but I didn&rsquo;t see anything in the systemd journal
<ul>
<li>I simply took everything down with docker-compose and brought it back up, and then it was OK</li>
<li>Perhaps one of the containers crashed; I should have looked closer, but I was in a hurry</li>
</ul>
</li>
</ul>
<h2 id="2021-04-03">2021-04-03</h2>
<ul>
<li>Biruk from ICT contacted me to say that some CGSpace users still can&rsquo;t log in
<ul>
<li>I guess the CGSpace LDAP bind account is really still locked after last week&rsquo;s reset</li>
<li>He fixed the account and then I was finally able to bind and query:</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;cgspace-account&quot; -W &quot;(sAMAccountName=otheraccounttoquery)&quot;
</code></pre><h2 id="2021-04-04">2021-04-04</h2>
<ul>
<li>Check the index aliases on AReS Explorer to make sure they are sane before starting a new harvest:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool | less
</code></pre><ul>
<li>Then set the <code>openrxv-items-final</code> index to read-only so we can make a backup:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -X PUT &quot;localhost:9200/openrxv-items-final/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: true}}'
{&quot;acknowledged&quot;:true}%
$ curl -s -X POST http://localhost:9200/openrxv-items-final/_clone/openrxv-items-final-backup
{&quot;acknowledged&quot;:true,&quot;shards_acknowledged&quot;:true,&quot;index&quot;:&quot;openrxv-items-final-backup&quot;}%
$ curl -X PUT &quot;localhost:9200/openrxv-items-final/_settings&quot; -H 'Content-Type: application/json' -d'{&quot;settings&quot;: {&quot;index.blocks.write&quot;: false}}'
</code></pre><ul>
<li>Then start a harvest on AReS Explorer</li>
<li>Help Enrico get some 2020 statistics for the Roots, Tubers and Bananas (RTB) community on CGSpace
<ul>
<li>He was hitting <a href="https://github.com/ilri/OpenRXV/issues/66">a bug on AReS</a>, and in any case he only needed stats for 2020, whereas AReS currently only gives all-time stats</li>
</ul>
</li>
<li>I cleaned up about 230 ISSNs on CGSpace in OpenRefine
<ul>
<li>I had exported them last week, then filtered for anything not looking like an ISSN with this GREL: <code>isNotNull(value.match(/^\p{Alnum}{4}-\p{Alnum}{4}$/))</code></li>
<li>Then I applied them on CGSpace with the <code>fix-metadata-values.py</code> script:</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ ./ilri/fix-metadata-values.py -i /tmp/2021-04-01-ISSNs.csv -db dspace -u dspace -p 'fuuu' -f cg.issn -t 'correct' -m 253
</code></pre><ul>
<li>For now I only fixed obvious errors like &ldquo;1234-5678.&rdquo; and &ldquo;e-ISSN: 1234-5678&rdquo;, etc., but there are still lots of invalid ones that need more manual work:
<ul>
<li>Too few characters</li>
<li>Too many characters</li>
<li>ISBNs</li>
</ul>
</li>
<li>Create the CGSpace community and collection structure for the new Accelerating Impacts of CGIAR Climate Research for Africa (AICCRA) and assign all workflow steps</li>
</ul>
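<ul>
<li>As a rough sketch of the kind of cleanup described above (this is not the OpenRefine workflow or the <code>fix-metadata-values.py</code> script), the Python below pulls an ISSN-shaped token out of a messy value and verifies its check digit
<ul>
<li>The function names are hypothetical, and the pattern is stricter than the GREL above because it assumes digits only, with an optional X as the last character</li>
</ul>
</li>
</ul>
<pre><code class="language-python" data-lang="python">import re

# Four digits, a hyphen, three digits, then a digit or X as the check character
ISSN_PATTERN = re.compile(r"\b\d{4}-\d{3}[\dXx]\b")


def extract_issn(value):
    """Return the first ISSN-shaped token in value, or None if there is none."""
    match = ISSN_PATTERN.search(value)
    return match.group(0).upper() if match else None


def has_valid_check_digit(issn):
    """Check the last character using weights 8 down to 2 and modulo 11."""
    digits = issn.replace("-", "")
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    return digits[7] == ("X" if expected == 10 else str(expected))


print(extract_issn("e-ISSN: 1234-5678"))   # 1234-5678
print(extract_issn("1234-5678."))          # 1234-5678
print(has_valid_check_digit("1234-5678"))  # False, the check digit would be 9
</code></pre>
<ul>
<li>Values with too few or too many characters return <code>None</code> from <code>extract_issn</code>, so they would still need manual review</li>
</ul>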
<h2 id="2021-04-04-1">2021-04-05</h2>
<ul>
<li>The AReS Explorer harvest from yesterday finished and the results look OK, but the Elasticsearch indexes are messed up again:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
{
&quot;openrxv-items-final&quot;: {
&quot;aliases&quot;: {}
},
&quot;openrxv-items-temp&quot;: {
&quot;aliases&quot;: {
&quot;openrxv-items&quot;: {}
}
},
...
}
</code></pre><ul>
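<li>As a sketch, this check could also be scripted: the hypothetical helper below (not part of OpenRXV) takes the parsed <code>_alias</code> response and lists which indexes carry a given alias, using an abbreviated copy of the output above:
<pre><code class="language-python" data-lang="python">def indexes_with_alias(alias_response, alias):
    """List the indexes that currently carry the given alias."""
    return sorted(
        index
        for index, info in alias_response.items()
        if alias in info.get("aliases", {})
    )


# Abbreviated copy of the response shown above
response = {
    "openrxv-items-final": {"aliases": {}},
    "openrxv-items-temp": {"aliases": {"openrxv-items": {}}},
}
print(indexes_with_alias(response, "openrxv-items"))  # ['openrxv-items-temp']
</code></pre>
</li>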
<li><code>openrxv-items</code> should be an alias of <code>openrxv-items-final</code>, not <code>openrxv-items-temp</code>&hellip; I will have to fix that manually</li>
<li>Enrico asked for more information on the RTB stats I gave him yesterday
<ul>
<li>I remembered (again) that we can&rsquo;t filter Atmire&rsquo;s CUA stats by date issued</li>
<li>To show, for example, views/downloads in the year 2020 for RTB items issued in 2020, we would need to use the DSpace statistics API and post a list of IDs and a custom date range</li>
<li>I tried to do that here by exporting the RTB community and extracting the IDs for items issued in 2020:</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ ~/dspace63/bin/dspace metadata-export -i 10568/80100 -f /tmp/rtb.csv
$ csvcut -c 'id,dcterms.issued,dcterms.issued[],dcterms.issued[en_US]' /tmp/rtb.csv | \
sed '1d' | \
csvsql --no-header --no-inference --query 'SELECT a AS id,COALESCE(b, &quot;&quot;)||COALESCE(c, &quot;&quot;)||COALESCE(d, &quot;&quot;) AS issued FROM stdin' | \
csvgrep -c issued -m 2020 | \
csvcut -c id | \
sed '1d' | \
sort | \
uniq
</code></pre><ul>
<li>So I remember in the future, this basically does the following:
<ul>
<li>Use csvcut to extract the id and all date issued columns from the CSV</li>
<li>Use sed to remove the header so we can refer to the columns using default a, b, c instead of their real names (which are tricky to match due to special characters)</li>
<li>Use csvsql to concatenate the various date issued columns (coalescing where null)</li>
<li>Use csvgrep to filter items by date issued in 2020</li>
<li>Use csvcut to extract the id column</li>
<li>Use sed to delete the header row</li>
<li>Use sort and uniq to filter out any duplicate IDs (there were three)</li>
</ul>
</li>
<li>Then I have a list of 296 IDs for RTB items issued in 2020</li>
<li>I constructed a JSON file to post to the DSpace Statistics API:</li>
</ul>
<div class="highlight"><pre style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-json" data-lang="json">{
<span style="color:#f92672">&#34;limit&#34;</span>: <span style="color:#ae81ff">100</span>,
<span style="color:#f92672">&#34;page&#34;</span>: <span style="color:#ae81ff">0</span>,
<span style="color:#f92672">&#34;dateFrom&#34;</span>: <span style="color:#e6db74">&#34;2020-01-01T00:00:00Z&#34;</span>,
<span style="color:#f92672">&#34;dateTo&#34;</span>: <span style="color:#e6db74">&#34;2020-12-31T00:00:00Z&#34;</span>,
<span style="color:#f92672">&#34;items&#34;</span>: [
<span style="color:#e6db74">&#34;00358715-b70c-4fdd-aa55-730e05ba739e&#34;</span>,
<span style="color:#e6db74">&#34;004b54bb-f16f-4cec-9fbc-ab6c6345c43d&#34;</span>,
<span style="color:#e6db74">&#34;02fb7630-d71a-449e-b65d-32b4ea7d6904&#34;</span>,
<span style="color:#960050;background-color:#1e0010">...</span>
]
}
</code></pre></div><ul>
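<li>A minimal sketch of generating one such request body per page in Python, assuming that the full list of IDs is sent each time and only <code>page</code> changes; the function name and the placeholder IDs are made up for illustration:
<pre><code class="language-python" data-lang="python">def build_payloads(item_ids, date_from, date_to, limit=100):
    """Build one request body per page for a list of item IDs."""
    pages = (len(item_ids) + limit - 1) // limit  # ceiling division
    return [
        {
            "limit": limit,
            "page": page,
            "dateFrom": date_from,
            "dateTo": date_to,
            "items": item_ids,
        }
        for page in range(pages)
    ]


ids = ["id-{}".format(n) for n in range(296)]  # placeholder IDs, not real UUIDs
payloads = build_payloads(ids, "2020-01-01T00:00:00Z", "2020-12-31T00:00:00Z")
print(len(payloads), [p["page"] for p in payloads])  # 3 [0, 1, 2]
</code></pre>
</li>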
<li>Then I submitted the file three times (changing the page parameter):</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s -d @/tmp/2020-items.txt https://cgspace.cgiar.org/rest/statistics/items | json_pp &gt; /tmp/page1.json
$ curl -s -d @/tmp/2020-items.txt https://cgspace.cgiar.org/rest/statistics/items | json_pp &gt; /tmp/page2.json
$ curl -s -d @/tmp/2020-items.txt https://cgspace.cgiar.org/rest/statistics/items | json_pp &gt; /tmp/page3.json
</code></pre><ul>
<li>Then I extracted the views and downloads in the most ridiculous way:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ grep views /tmp/page*.json | grep -o -E '[0-9]+$' | sed 's/,//' | xargs | sed -e 's/ /+/g' | bc
30364
$ grep downloads /tmp/page*.json | grep -o -E '[0-9]+,' | sed 's/,//' | xargs | sed -e 's/ /+/g' | bc
9100
</code></pre><ul>
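<li>A less ridiculous alternative, as a sketch: parse the saved pages as JSON and add up the fields. This assumes each page holds a <code>statistics</code> list of objects with <code>views</code> and <code>downloads</code> keys, which is an assumption about the response shape rather than something shown in these notes; the sample pages are made up:
<pre><code class="language-python" data-lang="python">def sum_statistics(pages):
    """Total the views and downloads across parsed response pages."""
    views = 0
    downloads = 0
    for page in pages:
        for item in page.get("statistics", []):
            views += item.get("views", 0)
            downloads += item.get("downloads", 0)
    return views, downloads


# Tiny made-up pages; real ones could be read from the saved files with json.load()
pages = [
    {"statistics": [{"id": "a", "views": 10, "downloads": 2}]},
    {"statistics": [{"id": "b", "views": 5, "downloads": 1}]},
]
print(sum_statistics(pages))  # (15, 3)
</code></pre>
</li>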
<li>Out of curiosity I did the same exercise for items issued in 2019 and got the following:
<ul>
<li>Views: 30721</li>
<li>Downloads: 10205</li>
</ul>
</li>
</ul>
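<ul>
<li>For reference, the csvkit pipeline above could also be sketched in Python with only the standard library
<ul>
<li>The function name and sample rows are hypothetical; it assumes the export has an <code>id</code> column and one or more <code>dcterms.issued</code> columns, as in the <code>csvcut</code> call</li>
</ul>
</li>
</ul>
<pre><code class="language-python" data-lang="python">import csv
import io


def ids_issued_in(csv_text, year):
    """Return sorted unique item IDs whose date issued mentions the year."""
    ids = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Join every dcterms.issued variant, for example dcterms.issued[en_US]
        issued = "".join(
            value or ""
            for column, value in row.items()
            if column is not None and column.startswith("dcterms.issued")
        )
        if year in issued:
            ids.add(row["id"])
    return sorted(ids)


# Made-up rows in the shape of the metadata export
sample = (
    "id,dcterms.issued,dcterms.issued[],dcterms.issued[en_US]\n"
    "a1,2020-05,,\n"
    "b2,,,2019\n"
    "a1,,2020,\n"
    "c3,,,2020-11-30\n"
)
print(ids_issued_in(sample, "2020"))  # ['a1', 'c3']
</code></pre>
<ul>
<li>Like <code>csvgrep -m</code>, this is a plain substring match on the concatenated dates, and the set takes the place of <code>sort</code> and <code>uniq</code></li>
</ul>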
<h2 id="2021-04-06">2021-04-06</h2>
<ul>
<li>Margarita from CCAFS was having problems deleting an item from CGSpace again
<ul>
<li>The error was &ldquo;Authorization denied for action OBSOLETE (DELETE) on BITSTREAM:bd157345-448e &hellip;&rdquo;</li>
<li>This is the same issue as last month</li>
</ul>
</li>
<li>Create a new collection on CGSpace for a new CIP project at Mishel Portilla&rsquo;s request</li>
<li>I got a notice that CGSpace was down
<ul>
<li>I didn&rsquo;t see anything strange at first, but there is an insane number of database connections:</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
12413
</code></pre><ul>
<li>The system journal shows thousands of these messages; this is the first one:</li>
</ul>
<pre><code class="language-console" data-lang="console">Apr 06 07:52:13 linode18 tomcat7[556]: Apr 06, 2021 7:52:13 AM org.apache.tomcat.jdbc.pool.ConnectionPool abandon
</code></pre><ul>
<li>Around that time in the dspace log I see nothing unusual, but maybe these?</li>
</ul>
<pre><code class="language-console" data-lang="console">2021-04-06 07:52:29,409 INFO com.atmire.dspace.cua.CUASolrLoggerServiceImpl @ Updating : 200/127 docs in http://localhost:8081/solr/statistics
</code></pre><ul>
<li>(BTW what is the deal with the &ldquo;200/127&rdquo;? I should send a comment to Atmire)
<ul>
<li>I filed a ticket with Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-tickets">https://tracker.atmire.com/tickets-cgiar-ilri/view-tickets</a></li>
</ul>
</li>
<li>I restarted the PostgreSQL and Tomcat services and now I see fewer connections, but still WAY high:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
3640
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
2968
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
13
</code></pre><ul>
<li>After ten minutes or so it went back down&hellip;</li>
<li>And now it&rsquo;s back up in the thousands&hellip; I am seeing a lot of stuff in the DSpace log like this:</li>
</ul>
<pre><code class="language-console" data-lang="console">2021-04-06 11:59:34,364 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717951
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717952
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717953
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717954
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717955
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717956
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717957
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717958
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717959
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717960
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717961
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717962
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717963
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717964
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717965
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717966
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717967
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717968
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717969
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717970
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717971
</code></pre><ul>
<li>I sent some notes and a log to Atmire on our existing issue about the database stuff
<ul>
<li>Also I asked them about the possibility of doing a formal review of Hibernate</li>
</ul>
</li>
<li>Falcon 3.0.0 was released so I updated the 3.0.0 branch for dspace-statistics-api and merged it to <code>v6_x</code>
<ul>
<li>I also fixed one minor (unrelated) bug in the tests</li>
<li>Then I deployed the new version on DSpace Test</li>
</ul>
</li>
<li>I had a meeting with Peter and Abenet about CGSpace TODOs</li>
<li>CGSpace went down again and the PostgreSQL locks are through the roof:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
12154
</code></pre><ul>
<li>I don&rsquo;t see any activity on the REST API, but in the last four hours there have been 3,500 DSpace sessions:</li>
</ul>
<pre><code class="language-console" data-lang="console"># grep -a -E '2021-04-06 (13|14|15|16|17):' /home/cgspace.cgiar.org/log/dspace.log.2021-04-06 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
3547
</code></pre><ul>
<li>I looked at the same time of day for the past few weeks and it seems to be a normal number of sessions:</li>
</ul>
<pre><code class="language-console" data-lang="console"># for file in /home/cgspace.cgiar.org/log/dspace.log.2021-0{3,4}-*; do grep -a -E &quot;2021-0(3|4)-[0-9]{2} (13|14|15|16|17):&quot; &quot;$file&quot; | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l; done
...
3572
4085
3476
3128
2949
2016
1839
4513
3463
4425
3328
2783
3898
3848
7799
255
534
2755
599
4463
3547
</code></pre><ul>
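<li>The same count could be sketched in Python, which makes the hour filter a little more explicit. The function name and sample lines are made up, and it assumes every log line starts with a <code>YYYY-MM-DD HH:MM:SS</code> timestamp as in the excerpts in these notes:
<pre><code class="language-python" data-lang="python">import re

SESSION = re.compile(r"session_id=([A-Z0-9]{32})")


def count_sessions(lines, hours=None):
    """Count distinct session IDs, optionally only within the given hours."""
    seen = set()
    for line in lines:
        # Characters 11 and 12 hold the hour in "YYYY-MM-DD HH:MM:SS"
        if hours is not None and line[11:13] not in hours:
            continue
        seen.update(SESSION.findall(line))
    return len(seen)


# Made-up lines in the shape of dspace.log entries
lines = [
    "2021-04-06 13:00:01,000 INFO x @ anonymous:session_id=" + "A" * 32 + ":ip_addr=...",
    "2021-04-06 13:05:00,000 INFO x @ anonymous:session_id=" + "A" * 32,
    "2021-04-06 18:00:00,000 INFO x @ anonymous:session_id=" + "B" * 32,
]
print(count_sessions(lines, hours={"13", "14", "15", "16", "17"}))  # 1
print(count_sessions(lines))  # 2
</code></pre>
</li>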
<li>What about the total number of sessions per day?</li>
</ul>
<pre><code class="language-console" data-lang="console"># for file in /home/cgspace.cgiar.org/log/dspace.log.2021-0{3,4}-*; do echo &quot;$file:&quot;; grep -a -o -E 'session_id=[A-Z0-9]{32}' &quot;$file&quot; | sort | uniq | wc -l; done
...
/home/cgspace.cgiar.org/log/dspace.log.2021-03-28:
11784
/home/cgspace.cgiar.org/log/dspace.log.2021-03-29:
15104
/home/cgspace.cgiar.org/log/dspace.log.2021-03-30:
19396
/home/cgspace.cgiar.org/log/dspace.log.2021-03-31:
32612
/home/cgspace.cgiar.org/log/dspace.log.2021-04-01:
26037
/home/cgspace.cgiar.org/log/dspace.log.2021-04-02:
14315
/home/cgspace.cgiar.org/log/dspace.log.2021-04-03:
12530
/home/cgspace.cgiar.org/log/dspace.log.2021-04-04:
13138
/home/cgspace.cgiar.org/log/dspace.log.2021-04-05:
16756
/home/cgspace.cgiar.org/log/dspace.log.2021-04-06:
12343
</code></pre><ul>
<li>So it&rsquo;s not the number of sessions&hellip; it&rsquo;s something with the workload&hellip;</li>
<li>I had to step away for an hour or so and when I came back the site was still down and there were still 12,000 locks
<ul>
<li>I restarted postgresql and tomcat7&hellip;</li>
</ul>
</li>
<li>The locks in PostgreSQL shot up again&hellip;</li>
</ul>
<pre><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
3447
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
3527
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
4582
</code></pre><ul>
<li>I don&rsquo;t know what the hell is going on, but the PostgreSQL connections and locks are way higher than ever before:</li>
</ul>
<p><img src="/cgspace-notes/2021/04/postgres_connections_cgspace-week.png" alt="PostgreSQL connections week">
<img src="/cgspace-notes/2021/04/postgres_locks_cgspace-week.png" alt="PostgreSQL locks week">
<img src="/cgspace-notes/2021/04/jmx_tomcat_dbpools-week.png" alt="Tomcat database pool"></p>
<ul>
<li>Otherwise, the number of DSpace sessions is completely normal:</li>
</ul>
<p><img src="/cgspace-notes/2021/04/jmx_dspace_sessions-week.png" alt="DSpace sessions"></p>
<ul>
<li>While looking at the nginx logs I see that MEL is trying to log into CGSpace&rsquo;s REST API and delete items:</li>
</ul>
<pre><code class="language-console" data-lang="console">34.209.213.122 - - [06/Apr/2021:03:50:46 +0200] &quot;POST /rest/login HTTP/1.1&quot; 401 727 &quot;-&quot; &quot;MEL&quot;
34.209.213.122 - - [06/Apr/2021:03:50:48 +0200] &quot;DELETE /rest/items/95f52bf1-f082-4e10-ad57-268a76ca18ec/metadata HTTP/1.1&quot; 401 704 &quot;-&quot; &quot;-&quot;
</code></pre><ul>
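<li>To tally requests like these per day, a sketch in Python: it assumes the nginx log format shown above, and the function name is hypothetical:
<pre><code class="language-python" data-lang="python">import re
from collections import Counter

# Captures the day, the request method, the path, and the status code
LINE = re.compile(
    r'^\S+ \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]+\] '
    r'"(\S+) (\S+) [^"]*" (\d{3}) '
)


def rejected_rest_requests(lines):
    """Count 401 responses under /rest/ per (day, method)."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue
        day, method, path, status = match.groups()
        if status == "401" and path.startswith("/rest/"):
            counts[(day, method)] += 1
    return counts


# The two requests from the excerpt above
lines = [
    '34.209.213.122 - - [06/Apr/2021:03:50:46 +0200] "POST /rest/login HTTP/1.1" 401 727 "-" "MEL"',
    '34.209.213.122 - - [06/Apr/2021:03:50:48 +0200] "DELETE /rest/items/95f52bf1-f082-4e10-ad57-268a76ca18ec/metadata HTTP/1.1" 401 704 "-" "-"',
]
print(sorted(rejected_rest_requests(lines).items()))
</code></pre>
</li>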
<li>I see a few of these per day going back several months
<ul>
<li>I sent a message to Salem and Enrico to ask if they know</li>
</ul>
</li>
<li>Also annoying, I see tons of what look like penetration testing requests from Qualys:</li>
</ul>
<pre><code class="language-console" data-lang="console">2021-04-04 06:35:17,889 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:failed_login:no DN found for user &quot;'&gt;&lt;qss a=X158062356Y1_2Z&gt;
2021-04-04 06:35:17,889 INFO org.dspace.authenticate.PasswordAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:authenticate:attempting password auth of user=&quot;'&gt;&lt;qss a=X158062356Y1_2Z&gt;
2021-04-04 06:35:17,890 INFO org.dspace.app.xmlui.utils.AuthenticationUtil @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:failed_login:email=&quot;'&gt;&lt;qss a=X158062356Y1_2Z&gt;, realm=null, result=2
2021-04-04 06:35:18,145 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:auth:attempting trivial auth of user=was@qualys.com
2021-04-04 06:35:18,519 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:failed_login:no DN found for user was@qualys.com
2021-04-04 06:35:18,520 INFO org.dspace.authenticate.PasswordAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:authenticate:attempting password auth of user=was@qualys.com
</code></pre><ul>
<li>I deleted the ilri/AReS repository on GitHub since we haven&rsquo;t updated it in two years
<ul>
<li>All development is happening in <a href="https://github.com/ilri/openRXV">https://github.com/ilri/openRXV</a> now</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2021-04/">April, 2021</a></li>
<li><a href="/cgspace-notes/2021-03/">March, 2021</a></li>
<li><a href="/cgspace-notes/cgspace-cgcorev2-migration/">CGSpace CG Core v2 Migration</a></li>
<li><a href="/cgspace-notes/2021-02/">February, 2021</a></li>
<li><a href="/cgspace-notes/2021-01/">January, 2021</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>