<!DOCTYPE html> <html lang="en" > <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta property="og:title" content="April, 2021" /> <meta property="og:description" content="2021-04-01 I wrote a script to query Sherpa’s API for our ISSNs: sherpa-issn-lookup.py I’m curious to see how the results compare with the results from Crossref yesterday AReS Explorer was down since this morning, I didn’t see anything in the systemd journal I simply took everything down with docker-compose and then back up, and then it was OK Perhaps one of the containers crashed, I should have looked closer but I was in a hurry " /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-04/" /> <meta property="article:published_time" content="2021-04-01T09:50:54+03:00" /> <meta property="article:modified_time" content="2021-04-28T18:57:48+03:00" /> <meta name="twitter:card" content="summary"/> <meta name="twitter:title" content="April, 2021"/> <meta name="twitter:description" content="2021-04-01 I wrote a script to query Sherpa’s API for our ISSNs: sherpa-issn-lookup.py I’m curious to see how the results compare with the results from Crossref yesterday AReS Explorer was down since this morning, I didn’t see anything in the systemd journal I simply took everything down with docker-compose and then back up, and then it was OK Perhaps one of the containers crashed, I should have looked closer but I was in a hurry "/> <meta name="generator" content="Hugo 0.101.0" /> <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "BlogPosting", "headline": "April, 2021", "url": "https://alanorth.github.io/cgspace-notes/2021-04/", "wordCount": "4669", "datePublished": "2021-04-01T09:50:54+03:00", "dateModified": "2021-04-28T18:57:48+03:00", "author": { "@type": "Person", "name": "Alan Orth" }, "keywords": "Notes" } </script> <link rel="canonical" 
href="https://alanorth.github.io/cgspace-notes/2021-04/"> <title>April, 2021 | CGSpace Notes</title> <!-- combined, minified CSS --> <link href="https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel="stylesheet" integrity="sha256-vrgBLtwIuhC+AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin="anonymous"> <!-- minified Font Awesome for SVG icons --> <script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz+lcnA=" crossorigin="anonymous"></script> <!-- RSS 2.0 feed --> </head> <body> <div class="blog-masthead"> <div class="container"> <nav class="nav blog-nav"> <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a> </nav> </div> </div> <header class="blog-header"> <div class="container"> <h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1> <p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p> </div> </header> <div class="container"> <div class="row"> <div class="col-sm-8 blog-main"> <article class="blog-post"> <header> <h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2021-04/">April, 2021</a></h2> <p class="blog-post-meta"> <time datetime="2021-04-01T09:50:54+03:00">Thu Apr 01, 2021</time> in <span class="fas fa-folder" aria-hidden="true"></span> <a href="/categories/notes/" rel="category tag">Notes</a> </p> </header> <h2 id="2021-04-01">2021-04-01</h2> <ul> <li>I wrote a script to query Sherpa’s API for our ISSNs: <code>sherpa-issn-lookup.py</code> <ul> <li>I’m curious to see how the results compare with yesterday’s results from Crossref</li> </ul> </li> <li>AReS Explorer had been down since this morning; I didn’t see anything in the systemd
journal <ul> <li>I simply took everything down with docker-compose and then back up, and then it was OK</li> <li>Perhaps one of the containers crashed, I should have looked closer but I was in a hurry</li> </ul> </li> </ul> <h2 id="2021-04-03">2021-04-03</h2> <ul> <li>Biruk from ICT contacted me to say that some CGSpace users still can’t log in <ul> <li>I guess the CGSpace LDAP bind account is really still locked after last week’s reset</li> <li>He fixed the account and then I was finally able to bind and query:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b <span style="color:#e6db74">"dc=cgiarad,dc=org"</span> -D <span style="color:#e6db74">"cgspace-account"</span> -W <span style="color:#e6db74">"(sAMAccountName=otheraccounttoquery)"</span> </span></span></code></pre></div><h2 id="2021-04-04">2021-04-04</h2> <ul> <li>Check the index aliases on AReS Explorer to make sure they are sane before starting a new harvest:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/_alias/'</span> | python -m json.tool | less </span></span></code></pre></div><ul> <li>Then set the <code>openrxv-items-final</code> index to read-only so we can make a backup:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-final/_settings"</span> -H <span 
style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": true}}'</span> </span></span><span style="display:flex;"><span>{"acknowledged":true}% </span></span><span style="display:flex;"><span>$ curl -s -X POST http://localhost:9200/openrxv-items-final/_clone/openrxv-items-final-backup </span></span><span style="display:flex;"><span>{"acknowledged":true,"shards_acknowledged":true,"index":"openrxv-items-final-backup"}% </span></span><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-final/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": false}}'</span> </span></span></code></pre></div><ul> <li>Then start a fresh harvest on AReS Explorer</li> <li>Help Enrico get some 2020 statistics for the Roots, Tubers and Bananas (RTB) community on CGSpace <ul> <li>He was hitting <a href="https://github.com/ilri/OpenRXV/issues/66">a bug on AReS</a>, and he only needed stats for 2020, but AReS currently only gives all-time stats</li> </ul> </li> <li>I cleaned up about 230 ISSNs on CGSpace in OpenRefine <ul> <li>I had exported them last week, then filtered for anything not looking like an ISSN with this GREL: <code>isNotNull(value.match(/^\p{Alnum}{4}-\p{Alnum}{4}$/))</code></li> <li>Then I applied them on CGSpace with the <code>fix-metadata-values.py</code> script:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/fix-metadata-values.py -i /tmp/2021-04-01-ISSNs.csv -db dspace -u dspace -p <span style="color:#e6db74">'fuuu'</span> -f cg.issn -t <span style="color:#e6db74">'correct'</span> -m <span style="color:#ae81ff">253</span>
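After cloning, it is worth sanity-checking that the backup index holds the same number of documents as the original. A minimal sketch, assuming the same Elasticsearch instance on localhost:9200 and the index names used above:

```shell
# Compare document counts of the original index and its clone via the
# _cat/indices API (the host and port are assumptions)
curl -s 'http://localhost:9200/_cat/indices/openrxv-items-final*?v&h=index,docs.count'
```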
</span></span></code></pre></div><ul> <li>For now I only fixed obvious errors like “1234-5678.” and “e-ISSN: 1234-5678” etc, but there are still lots of invalid ones which need more manual work: <ul> <li>Too few characters</li> <li>Too many characters</li> <li>ISBNs</li> </ul> </li> <li>Create the CGSpace community and collection structure for the new Accelerating Impacts of CGIAR Climate Research for Africa (AICCRA) and assign all workflow steps</li> </ul> <h2 id="2021-04-05">2021-04-05</h2> <ul> <li>The AReS Explorer harvesting from yesterday finished, and the results look OK, but actually the Elasticsearch indexes are messed up again:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/_alias/'</span> | python -m json.tool </span></span><span style="display:flex;"><span>{ </span></span><span style="display:flex;"><span> "openrxv-items-final": { </span></span><span style="display:flex;"><span> "aliases": {} </span></span><span style="display:flex;"><span> }, </span></span><span style="display:flex;"><span> "openrxv-items-temp": { </span></span><span style="display:flex;"><span> "aliases": { </span></span><span style="display:flex;"><span> "openrxv-items": {} </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span> }, </span></span><span style="display:flex;"><span>... 
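When the `openrxv-items` alias ends up pointing at the wrong index, it can be repointed by hand with the `_aliases` endpoint, which applies its remove/add actions atomically. A sketch, assuming the same local Elasticsearch instance and the index names shown in the listing:

```shell
# Hypothetical manual fix: move the openrxv-items alias from the temp
# index back to the final index in one atomic operation
curl -s -X POST 'http://localhost:9200/_aliases' \
  -H 'Content-Type: application/json' \
  -d '{"actions": [
        {"remove": {"index": "openrxv-items-temp",  "alias": "openrxv-items"}},
        {"add":    {"index": "openrxv-items-final", "alias": "openrxv-items"}}
      ]}'
```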
</span></span><span style="display:flex;"><span>} </span></span></code></pre></div><ul> <li><code>openrxv-items</code> should be an alias of <code>openrxv-items-final</code>, not <code>openrxv-items-temp</code>… I will have to fix that manually</li> <li>Enrico asked for more information on the RTB stats I gave him yesterday <ul> <li>I remembered (again) that we can’t filter Atmire’s CUA stats by date issued</li> <li>To show, for example, views/downloads in the year 2020 for RTB items issued in 2020, we would need to use the DSpace statistics API and post a list of IDs and a custom date range</li> <li>I tried to do that here by exporting the RTB community and extracting the IDs for items issued in 2020:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ~/dspace63/bin/dspace metadata-export -i 10568/80100 -f /tmp/rtb.csv </span></span><span style="display:flex;"><span>$ csvcut -c <span style="color:#e6db74">'id,dcterms.issued,dcterms.issued[],dcterms.issued[en_US]'</span> /tmp/rtb.csv | <span style="color:#ae81ff">\ </span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> sed '1d' | \ </span></span><span style="display:flex;"><span> csvsql --no-header --no-inference --query 'SELECT a AS id,COALESCE(b, "")||COALESCE(c, "")||COALESCE(d, "") AS issued FROM stdin' | \ </span></span><span style="display:flex;"><span> csvgrep -c issued -m 2020 | \ </span></span><span style="display:flex;"><span> csvcut -c id | \ </span></span><span style="display:flex;"><span> sed '1d' | \ </span></span><span style="display:flex;"><span> sort | \ </span></span><span style="display:flex;"><span> uniq </span></span></code></pre></div><ul> <li>So that I remember in the future, this basically does the following: <ul> <li>Use csvcut to extract the id and all date issued columns from the
CSV</li> <li>Use sed to remove the header so we can refer to the columns using default a, b, c instead of their real names (which are tricky to match due to special characters)</li> <li>Use csvsql to concatenate the various date issued columns (coalescing where null)</li> <li>Use csvgrep to filter items by date issued in 2020</li> <li>Use csvcut to extract the id column</li> <li>Use sed to delete the header row</li> <li>Use sort and uniq to filter out any duplicate IDs (there were three)</li> </ul> </li> <li>Then I have a list of 296 IDs for RTB items issued in 2020</li> <li>I constructed a JSON file to post to the DSpace Statistics API:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-json" data-lang="json"><span style="display:flex;"><span>{ </span></span><span style="display:flex;"><span> <span style="color:#f92672">"limit"</span>: <span style="color:#ae81ff">100</span>, </span></span><span style="display:flex;"><span> <span style="color:#f92672">"page"</span>: <span style="color:#ae81ff">0</span>, </span></span><span style="display:flex;"><span> <span style="color:#f92672">"dateFrom"</span>: <span style="color:#e6db74">"2020-01-01T00:00:00Z"</span>, </span></span><span style="display:flex;"><span> <span style="color:#f92672">"dateTo"</span>: <span style="color:#e6db74">"2020-12-31T00:00:00Z"</span>, </span></span><span style="display:flex;"><span> <span style="color:#f92672">"items"</span>: [ </span></span><span style="display:flex;"><span><span style="color:#e6db74">"00358715-b70c-4fdd-aa55-730e05ba739e"</span>, </span></span><span style="display:flex;"><span><span style="color:#e6db74">"004b54bb-f16f-4cec-9fbc-ab6c6345c43d"</span>, </span></span><span style="display:flex;"><span><span style="color:#e6db74">"02fb7630-d71a-449e-b65d-32b4ea7d6904"</span>, </span></span><span style="display:flex;"><span><span 
style="color:#960050;background-color:#1e0010">...</span> </span></span><span style="display:flex;"><span> ] </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><ul> <li>Then I submitted the file three times (changing the page parameter):</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s -d @/tmp/2020-items.txt https://cgspace.cgiar.org/rest/statistics/items | json_pp > /tmp/page1.json </span></span><span style="display:flex;"><span>$ curl -s -d @/tmp/2020-items.txt https://cgspace.cgiar.org/rest/statistics/items | json_pp > /tmp/page2.json </span></span><span style="display:flex;"><span>$ curl -s -d @/tmp/2020-items.txt https://cgspace.cgiar.org/rest/statistics/items | json_pp > /tmp/page3.json </span></span></code></pre></div><ul> <li>Then I extracted the views and downloads in the most ridiculous way:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep views /tmp/page*.json | grep -o -E <span style="color:#e6db74">'[0-9]+$'</span> | sed <span style="color:#e6db74">'s/,//'</span> | xargs | sed -e <span style="color:#e6db74">'s/ /+/g'</span> | bc </span></span><span style="display:flex;"><span>30364 </span></span><span style="display:flex;"><span>$ grep downloads /tmp/page*.json | grep -o -E <span style="color:#e6db74">'[0-9]+,'</span> | sed <span style="color:#e6db74">'s/,//'</span> | xargs | sed -e <span style="color:#e6db74">'s/ /+/g'</span> | bc </span></span><span style="display:flex;"><span>9100 </span></span></code></pre></div><ul> <li>Out of curiosity I did the same exercise for items issued in 2019 and got the following: <ul> <li>Views: 30721</li> <li>Downloads:
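A tidier way to get those totals is to parse the JSON properly instead of grepping it. This sketch assumes the response shape of the dspace-statistics-api (a `statistics` array whose objects carry `views` and `downloads` counts) and the same `/tmp/page*.json` files:

```shell
# Sum views and downloads across the saved pages with python3's json
# module (assumes each page has a "statistics" array of objects with
# numeric "views" and "downloads" fields)
python3 - <<'EOF'
import glob
import json

views = downloads = 0
for path in sorted(glob.glob("/tmp/page*.json")):
    with open(path) as f:
        data = json.load(f)
    for item in data.get("statistics", []):
        views += item.get("views", 0)
        downloads += item.get("downloads", 0)
print(f"views: {views}, downloads: {downloads}")
EOF
```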
10205</li> </ul> </li> </ul> <h2 id="2021-04-06">2021-04-06</h2> <ul> <li>Margarita from CCAFS was having problems deleting an item from CGSpace again <ul> <li>The error was “Authorization denied for action OBSOLETE (DELETE) on BITSTREAM:bd157345-448e …”</li> <li>This is the same issue as last month</li> </ul> </li> <li>Create a new collection on CGSpace for a new CIP project at Mishel Portilla’s request</li> <li>I got a notice that CGSpace was down <ul> <li>I didn’t see anything strange at first, but there is an insane number of database connections:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>12413 </span></span></code></pre></div><ul> <li>The system journal shows thousands of these messages; this is the first one:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>Apr 06 07:52:13 linode18 tomcat7[556]: Apr 06, 2021 7:52:13 AM org.apache.tomcat.jdbc.pool.ConnectionPool abandon </span></span></code></pre></div><ul> <li>Around that time in the dspace log I see nothing unusual, but maybe these?</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>2021-04-06 07:52:29,409 INFO com.atmire.dspace.cua.CUASolrLoggerServiceImpl @ Updating : 200/127 docs in http://localhost:8081/solr/statistics </span></span></code></pre></div><ul> <li>(BTW
what is the deal with the “200/127”? I should send a comment to Atmire) <ul> <li>I filed a ticket with Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-tickets">https://tracker.atmire.com/tickets-cgiar-ilri/view-tickets</a></li> </ul> </li> <li>I restarted the PostgreSQL and Tomcat services and now I see fewer connections, though still WAY too high:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>3640 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>2968 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>13 </span></span></code></pre></div><ul> <li>After ten minutes or so it went back down…</li> <li>And now it’s back up in the thousands… I am seeing a lot of stuff in the dspace log like this:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>2021-04-06 11:59:34,364 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717951 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @
user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717952 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717953 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717954 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717955 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717956 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717957 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717958 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717959 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: 
metadata_value_id=5717960 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717961 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717962 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717963 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717964 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717965 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717966 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717967 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717968 </span></span><span style="display:flex;"><span>2021-04-06 
11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717969 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717970 </span></span><span style="display:flex;"><span>2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717971 </span></span></code></pre></div><ul> <li>I sent some notes and a log to Atmire on our existing issue about the database stuff <ul> <li>Also I asked them about the possibility of doing a formal review of Hibernate</li> </ul> </li> <li>Falcon 3.0.0 was released, so I updated the 3.0.0 branch for dspace-statistics-api and merged it to <code>v6_x</code> <ul> <li>I also fixed one minor (unrelated) bug in the tests</li> <li>Then I deployed the new version on DSpace Test</li> </ul> </li> <li>I had a meeting with Peter and Abenet about CGSpace TODOs</li> <li>CGSpace went down again and the PostgreSQL locks are through the roof:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>12154 </span></span></code></pre></div><ul> <li>I don’t see any activity on the REST API, but in the last four hours there have been 3,500 DSpace sessions:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code
class="language-console" data-lang="console"><span style="display:flex;"><span># grep -a -E <span style="color:#e6db74">'2021-04-06 (13|14|15|16|17):'</span> /home/cgspace.cgiar.org/log/dspace.log.2021-04-06 | grep -o -E <span style="color:#e6db74">'session_id=[A-Z0-9]{32}'</span> | sort | uniq | wc -l </span></span><span style="display:flex;"><span>3547 </span></span></code></pre></div><ul> <li>I looked at the same time of day for the past few weeks and it seems to be a normal number of sessions:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># <span style="color:#66d9ef">for</span> file in /home/cgspace.cgiar.org/log/dspace.log.2021-0<span style="color:#f92672">{</span>3,4<span style="color:#f92672">}</span>-*; <span style="color:#66d9ef">do</span> grep -a -E <span style="color:#e6db74">"2021-0(3|4)-[0-9]{2} (13|14|15|16|17):"</span> <span style="color:#e6db74">"</span>$file<span style="color:#e6db74">"</span> | grep -o -E <span style="color:#e6db74">'session_id=[A-Z0-9]{32}'</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span> </span></span><span style="display:flex;"><span>... 
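The per-window counting above boils down to extracting the `session_id` tokens and de-duplicating them; a self-contained sketch of the same idea, using a couple of fabricated log lines instead of the real dspace log:

```shell
# Extract session_id tokens from (fabricated) dspace-style log lines,
# de-duplicate, and count them — same grep/sort pipeline as above
printf '%s\n' \
  'a: session_id=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA x' \
  'b: session_id=BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB y' \
  'c: session_id=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA z' \
  | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -u | wc -l
# → 2 unique sessions
```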
</span></span><span style="display:flex;"><span>3572 </span></span><span style="display:flex;"><span>4085 </span></span><span style="display:flex;"><span>3476 </span></span><span style="display:flex;"><span>3128 </span></span><span style="display:flex;"><span>2949 </span></span><span style="display:flex;"><span>2016 </span></span><span style="display:flex;"><span>1839 </span></span><span style="display:flex;"><span>4513 </span></span><span style="display:flex;"><span>3463 </span></span><span style="display:flex;"><span>4425 </span></span><span style="display:flex;"><span>3328 </span></span><span style="display:flex;"><span>2783 </span></span><span style="display:flex;"><span>3898 </span></span><span style="display:flex;"><span>3848 </span></span><span style="display:flex;"><span>7799 </span></span><span style="display:flex;"><span>255 </span></span><span style="display:flex;"><span>534 </span></span><span style="display:flex;"><span>2755 </span></span><span style="display:flex;"><span>599 </span></span><span style="display:flex;"><span>4463 </span></span><span style="display:flex;"><span>3547 </span></span></code></pre></div><ul> <li>What about total number of sessions per day?</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># <span style="color:#66d9ef">for</span> file in /home/cgspace.cgiar.org/log/dspace.log.2021-0<span style="color:#f92672">{</span>3,4<span style="color:#f92672">}</span>-*; <span style="color:#66d9ef">do</span> echo <span style="color:#e6db74">"</span>$file<span style="color:#e6db74">:"</span>; grep -a -o -E <span style="color:#e6db74">'session_id=[A-Z0-9]{32}'</span> <span style="color:#e6db74">"</span>$file<span style="color:#e6db74">"</span> | sort | uniq | wc -l; <span style="color:#66d9ef">done</span> </span></span><span style="display:flex;"><span>... 
</span></span><span style="display:flex;"><span>/home/cgspace.cgiar.org/log/dspace.log.2021-03-28: </span></span><span style="display:flex;"><span>11784 </span></span><span style="display:flex;"><span>/home/cgspace.cgiar.org/log/dspace.log.2021-03-29: </span></span><span style="display:flex;"><span>15104 </span></span><span style="display:flex;"><span>/home/cgspace.cgiar.org/log/dspace.log.2021-03-30: </span></span><span style="display:flex;"><span>19396 </span></span><span style="display:flex;"><span>/home/cgspace.cgiar.org/log/dspace.log.2021-03-31: </span></span><span style="display:flex;"><span>32612 </span></span><span style="display:flex;"><span>/home/cgspace.cgiar.org/log/dspace.log.2021-04-01: </span></span><span style="display:flex;"><span>26037 </span></span><span style="display:flex;"><span>/home/cgspace.cgiar.org/log/dspace.log.2021-04-02: </span></span><span style="display:flex;"><span>14315 </span></span><span style="display:flex;"><span>/home/cgspace.cgiar.org/log/dspace.log.2021-04-03: </span></span><span style="display:flex;"><span>12530 </span></span><span style="display:flex;"><span>/home/cgspace.cgiar.org/log/dspace.log.2021-04-04: </span></span><span style="display:flex;"><span>13138 </span></span><span style="display:flex;"><span>/home/cgspace.cgiar.org/log/dspace.log.2021-04-05: </span></span><span style="display:flex;"><span>16756 </span></span><span style="display:flex;"><span>/home/cgspace.cgiar.org/log/dspace.log.2021-04-06: </span></span><span style="display:flex;"><span>12343 </span></span></code></pre></div><ul> <li>So it’s not the number of sessions… it’s something with the workload…</li> <li>I had to step away for an hour or so and when I came back the site was still down and there were still 12,000 locks <ul> <li>I restarted postgresql and tomcat7…</li> </ul> </li> <li>The locks in PostgreSQL shot up again…</li> </ul> <div class="highlight"><pre tabindex="0" 
style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>3447 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>3527 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>4582 </span></span></code></pre></div><ul> <li>I don’t know what the hell is going on, but the PostgreSQL connections and locks are way higher than ever before:</li> </ul> <p><img src="/cgspace-notes/2021/04/postgres_connections_cgspace-week.png" alt="PostgreSQL connections week"> <img src="/cgspace-notes/2021/04/postgres_locks_cgspace-week.png" alt="PostgreSQL locks week"> <img src="/cgspace-notes/2021/04/jmx_tomcat_dbpools-week.png" alt="Tomcat database pool"></p> <ul> <li>Otherwise, the number of DSpace sessions is completely normal:</li> </ul> <p><img src="/cgspace-notes/2021/04/jmx_dspace_sessions-week.png" alt="DSpace sessions"></p> <ul> <li>While looking at the nginx logs I see that MEL is trying to log into CGSpace’s REST API and delete items:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>34.209.213.122 - - [06/Apr/2021:03:50:46 +0200] "POST /rest/login HTTP/1.1" 401 727 "-" "MEL" </span></span><span style="display:flex;"><span>34.209.213.122 - - [06/Apr/2021:03:50:48 
+0200] "DELETE /rest/items/95f52bf1-f082-4e10-ad57-268a76ca18ec/metadata HTTP/1.1" 401 704 "-" "-" </span></span></code></pre></div><ul> <li>I see a few of these per day going back several months <ul> <li>I sent a message to Salem and Enrico to ask if they know</li> </ul> </li> <li>Also annoying, I see tons of what look like penetration testing requests from Qualys:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>2021-04-04 06:35:17,889 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:failed_login:no DN found for user "'><qss a=X158062356Y1_2Z> </span></span><span style="display:flex;"><span>2021-04-04 06:35:17,889 INFO org.dspace.authenticate.PasswordAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:authenticate:attempting password auth of user="'><qss a=X158062356Y1_2Z> </span></span><span style="display:flex;"><span>2021-04-04 06:35:17,890 INFO org.dspace.app.xmlui.utils.AuthenticationUtil @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:failed_login:email="'><qss a=X158062356Y1_2Z>, realm=null, result=2 </span></span><span style="display:flex;"><span>2021-04-04 06:35:18,145 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:auth:attempting trivial auth of user=was@qualys.com </span></span><span style="display:flex;"><span>2021-04-04 06:35:18,519 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:failed_login:no DN found for user was@qualys.com </span></span><span style="display:flex;"><span>2021-04-04 06:35:18,520 INFO org.dspace.authenticate.PasswordAuthentication @ 
anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:authenticate:attempting password auth of user=was@qualys.com </span></span></code></pre></div><ul> <li>I deleted the ilri/AReS repository on GitHub since we haven’t updated it in two years <ul> <li>All development is happening in <a href="https://github.com/ilri/openRXV">https://github.com/ilri/openRXV</a> now</li> </ul> </li> <li>10PM and the server is down again, with locks through the roof:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>12198 </span></span></code></pre></div><ul> <li>I see that there are tons of PostgreSQL connections getting abandoned today, compared to very few in the past few weeks:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ journalctl -u tomcat7 --since<span style="color:#f92672">=</span>today | grep -c <span style="color:#e6db74">'ConnectionPool abandon'</span> </span></span><span style="display:flex;"><span>1838 </span></span><span style="display:flex;"><span>$ journalctl -u tomcat7 --since<span style="color:#f92672">=</span>2021-03-20 --until<span style="color:#f92672">=</span>2021-04-05 | grep -c <span style="color:#e6db74">'ConnectionPool abandon'</span> </span></span><span style="display:flex;"><span>3 </span></span></code></pre></div><ul> <li>I even restarted the server and connections were low for a few minutes until they shot back up:</li> </ul> <div class="highlight"><pre tabindex="0" 
style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>13 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>8651 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>8940 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>10504 </span></span></code></pre></div><ul> <li>I had to go to bed and I bet it will crash and be down for hours until I wake up…</li> <li>What the hell is this user agent?</li> </ul> <pre tabindex="0"><code>54.197.119.143 - - [06/Apr/2021:19:18:11 +0200] "GET /handle/10568/16499 HTTP/1.1" 499 0 "-" "GetUrl/1.0 wdestiny@umich.edu (Linux)" </code></pre><h2 id="2021-04-07">2021-04-07</h2> <ul> <li>CGSpace was still down from last night of course, with tons of database locks:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>12168 </span></span></code></pre></div><ul> <li>I restarted 
the server again and the locks came back</li> <li>Atmire responded to the message from yesterday <ul> <li>They noticed something in the logs about emails failing to be sent</li> <li>There appears to be an issue sending mails on workflow tasks when a user in that group has an invalid email address:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>2021-04-01 12:45:11,414 WARN org.dspace.workflowbasic.BasicWorkflowServiceImpl @ a.akwarandu@cgiar.org:session_id=2F20F20D4A8C36DB53D42DE45DFA3CCE:notifyGroupofTask:cannot email user group_id=aecf811b-b7e9-4b6f-8776-3d372e6a048b workflow_item_id=33085\colon; Invalid Addresses (com.sun.mail.smtp.SMTPAddressFailedException\colon; 501 5.1.3 Invalid address </span></span></code></pre></div><ul> <li>The issue is not the named user above, but a member of the group…</li> <li>And the group does have users with invalid email addresses (probably accounts created automatically after authenticating with LDAP):</li> </ul> <p><img src="/cgspace-notes/2021/04/group-invalid-email.png" alt="DSpace group"></p> <ul> <li>I extracted all the group IDs from recent logs that had users with invalid email addresses:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ grep -a -E <span style="color:#e6db74">'email user group_id=\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b'</span> /home/cgspace.cgiar.org/log/dspace.log.* | grep -o -E <span style="color:#e6db74">'\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b'</span> | sort | uniq </span></span><span style="display:flex;"><span>0a30d6ae-74a6-4eee-a8f5-ee5d15192ee6 </span></span><span 
style="display:flex;"><span>1769137c-36d4-42b2-8fec-60585e110db7 </span></span><span style="display:flex;"><span>203c8614-8a97-4ac8-9686-d9d62cb52acc </span></span><span style="display:flex;"><span>294603de-3d09-464e-a5b0-09e452c6b5ab </span></span><span style="display:flex;"><span>35878555-9623-4679-beb8-bb3395fdf26e </span></span><span style="display:flex;"><span>3d8a5efa-5509-4bf9-9374-2bc714aceb99 </span></span><span style="display:flex;"><span>4238208a-f848-47cb-9dd2-43f9f954a4af </span></span><span style="display:flex;"><span>44939b84-1894-41e7-b3e6-8c8d1781057b </span></span><span style="display:flex;"><span>49ba087e-75a3-45ce-805c-69eeda0f786b </span></span><span style="display:flex;"><span>4a6606ce-0284-421d-bf80-4dafddba2d42 </span></span><span style="display:flex;"><span>527de6aa-9cd0-4988-bf5f-c9c92ba2ac10 </span></span><span style="display:flex;"><span>54cd1b16-65bf-4041-9d84-fb2ea3301d6d </span></span><span style="display:flex;"><span>58982847-5f7c-4b8b-a7b0-4d4de702136e </span></span><span style="display:flex;"><span>5f0b85be-bd23-47de-927d-bca368fa1fbc </span></span><span style="display:flex;"><span>646ada17-e4ef-49f6-9378-af7e58596ce1 </span></span><span style="display:flex;"><span>7e2f4bf8-fbc9-4b2f-97a4-75e5427bef90 </span></span><span style="display:flex;"><span>8029fd53-f9f5-4107-bfc3-8815507265cf </span></span><span style="display:flex;"><span>81faa934-c602-4608-bf45-de91845dfea7 </span></span><span style="display:flex;"><span>8611a462-210c-4be1-a5bb-f87a065e6113 </span></span><span style="display:flex;"><span>8855c903-ef86-433c-b0be-c12300eb0f84 </span></span><span style="display:flex;"><span>8c7ece98-3598-4de7-a885-d61fd033bea8 </span></span><span style="display:flex;"><span>8c9a0d01-2d12-4a99-84f9-cdc25ac072f9 </span></span><span style="display:flex;"><span>8f9f888a-b501-41f3-a462-4da16150eebf </span></span><span style="display:flex;"><span>94168f0e-9f45-4112-ac8d-3ba9be917842 </span></span><span 
style="display:flex;"><span>96998038-f381-47dc-8488-ff7252703627 </span></span><span style="display:flex;"><span>9768f4a8-3018-44e9-bf58-beba4296327c </span></span><span style="display:flex;"><span>9a99e8d2-558e-4fc1-8011-e4411f658414 </span></span><span style="display:flex;"><span>a34e6400-78ed-45c0-a751-abc039eed2e6 </span></span><span style="display:flex;"><span>a9da5af3-4ec7-4a9b-becb-6e3d028d594d </span></span><span style="display:flex;"><span>abf5201c-8be5-4dee-b461-132203dd51cb </span></span><span style="display:flex;"><span>adb5658c-cef3-402f-87b6-b498f580351c </span></span><span style="display:flex;"><span>aecf811b-b7e9-4b6f-8776-3d372e6a048b </span></span><span style="display:flex;"><span>ba5aae61-ea34-4ac1-9490-4645acf2382f </span></span><span style="display:flex;"><span>bf7f3638-c7c6-4a8f-893d-891a6d3dafff </span></span><span style="display:flex;"><span>c617ada0-09d1-40ed-b479-1c4860a4f724 </span></span><span style="display:flex;"><span>cff91d44-a855-458c-89e5-bd48c17d1a54 </span></span><span style="display:flex;"><span>e65171ae-a2bf-4043-8f54-f8457bc9174b </span></span><span style="display:flex;"><span>e7098b40-4701-4ca2-b9a9-3a1282f67044 </span></span><span style="display:flex;"><span>e904f122-71dc-439b-b877-313ef62486d7 </span></span><span style="display:flex;"><span>ede59734-adac-4c01-8691-b45f19088d37 </span></span><span style="display:flex;"><span>f88bd6bb-f93f-41cb-872f-ff26f6237068 </span></span><span style="display:flex;"><span>f985f5fb-be5c-430b-a8f1-cf86ae4fc49a </span></span><span style="display:flex;"><span>fe800006-aaec-4f9e-9ab4-f9475b4cbdc3 </span></span></code></pre></div><h2 id="2021-04-08">2021-04-08</h2> <ul> <li>I can’t believe it but the server has been down for twelve hours or so <ul> <li>The locks have not changed since I went to bed last night:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" 
data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>12070 </span></span></code></pre></div><ul> <li>I restarted PostgreSQL and Tomcat and the locks went straight back up!</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>13 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>986 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>1194 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>1212 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>1489 </span></span><span style="display:flex;"><span>$ psql -c <span style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>2124 </span></span><span style="display:flex;"><span>$ psql -c <span 
style="color:#e6db74">'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;'</span> | wc -l </span></span><span style="display:flex;"><span>5934 </span></span></code></pre></div><h2 id="2021-04-09">2021-04-09</h2> <ul> <li>Atmire managed to get CGSpace back up by killing all the PostgreSQL connections yesterday <ul> <li>I don’t know how they did it…</li> <li>They also think it’s weird that restarting PostgreSQL didn’t kill the connections</li> <li>They asked some more questions, for example whether there were also issues on DSpace Test</li> <li>Strangely enough, I checked DSpace Test and noticed a clear spike in PostgreSQL locks on the morning of April 6th as well!</li> </ul> </li> </ul> <p><img src="/cgspace-notes/2021/04/postgres_locks_ALL-week-PROD.png" alt="PostgreSQL locks week CGSpace"> <img src="/cgspace-notes/2021/04/postgres_locks_ALL-week-TEST.png" alt="PostgreSQL locks week DSpace Test"></p> <ul> <li>I definitely need to look into that!</li> </ul> <h2 id="2021-04-11">2021-04-11</h2> <ul> <li>I am trying to resolve the AReS Elasticsearch index issues that happened last week <ul> <li>I decided to back up the <code>openrxv-items</code> index to <code>openrxv-items-backup</code> and then delete all the others:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": true}}'</span> </span></span><span style="display:flex;"><span>$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-backup </span></span><span style="display:flex;"><span>$ curl -X PUT <span 
style="color:#e6db74">"localhost:9200/openrxv-items/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": false}}'</span> </span></span><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp'</span> </span></span><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-final'</span> </span></span></code></pre></div><ul> <li>Then I updated all Docker containers and rebooted the server (linode20) so that the correct indexes would be created again:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | xargs -L1 docker pull </span></span></code></pre></div><ul> <li>Then I realized I have to clone the backup index directly to <code>openrxv-items-final</code>, and re-create the <code>openrxv-items</code> alias:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-final'</span> </span></span><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-backup/_settings"</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"settings": {"index.blocks.write": true}}'</span> </span></span><span style="display:flex;"><span>$ curl -s -X POST http://localhost:9200/openrxv-items-backup/_clone/openrxv-items-final </span></span><span 
style="display:flex;"><span>$ curl -s -X POST <span style="color:#e6db74">'http://localhost:9200/_aliases'</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'</span> </span></span></code></pre></div><ul> <li>Now I see both <code>openrxv-items-final</code> and <code>openrxv-items</code> have the current number of items:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items/_count?q=*&pretty'</span> </span></span><span style="display:flex;"><span>{ </span></span><span style="display:flex;"><span> "count" : 103373, </span></span><span style="display:flex;"><span> "_shards" : { </span></span><span style="display:flex;"><span> "total" : 1, </span></span><span style="display:flex;"><span> "successful" : 1, </span></span><span style="display:flex;"><span> "skipped" : 0, </span></span><span style="display:flex;"><span> "failed" : 0 </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span>} </span></span><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-final/_count?q=*&pretty'</span> </span></span><span style="display:flex;"><span>{ </span></span><span style="display:flex;"><span> "count" : 103373, </span></span><span style="display:flex;"><span> "_shards" : { </span></span><span style="display:flex;"><span> "total" : 1, </span></span><span style="display:flex;"><span> "successful" : 1, </span></span><span style="display:flex;"><span> "skipped" : 0, </span></span><span style="display:flex;"><span> "failed" : 0 </span></span><span style="display:flex;"><span> } </span></span><span 
style="display:flex;"><span>} </span></span></code></pre></div><ul> <li>Then I started a fresh harvest in the AReS Explorer admin dashboard</li> </ul> <h2 id="2021-04-12">2021-04-12</h2> <ul> <li>The harvesting on AReS finished last night, but the indexes got messed up again <ul> <li>I will have to fix them manually next time…</li> </ul> </li> </ul> <h2 id="2021-04-13">2021-04-13</h2> <ul> <li>Looking into the logs on 2021-04-06 on CGSpace and DSpace Test to see if there is anything specific that stands out about the activity on those days that would cause the PostgreSQL issues <ul> <li>Digging into the Munin graphs for the last week I found a few other things happening on that morning:</li> </ul> </li> </ul> <p><img src="/cgspace-notes/2021/04/sda-week.png" alt="/dev/sda disk latency week"> <img src="/cgspace-notes/2021/04/classes_unloaded-week.png" alt="JVM classes unloaded week"> <img src="/cgspace-notes/2021/04/nginx_status-week.png" alt="Nginx status week"></p> <ul> <li>13,000 requests in the last two months from a user with user agent <code>SomeRandomText</code>, for example:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>84.33.2.97 - - [06/Apr/2021:06:25:13 +0200] "GET /bitstream/handle/10568/77776/CROP%20SCIENCE.jpg.jpg HTTP/1.1" 404 10890 "-" "SomeRandomText" </span></span></code></pre></div><ul> <li>I purged them:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-hits.sh -f /tmp/agents.txt -p </span></span><span style="display:flex;"><span>Purging 13159 hits from SomeRandomText in statistics </span></span><span style="display:flex;"><span><span 
style="color:#960050;background-color:#1e0010"> </span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 13159 </span></span></code></pre></div><ul> <li>I noticed there were 78 items submitted in the hour before CGSpace crashed:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># grep -a -E <span style="color:#e6db74">'2021-04-06 0(6|7):'</span> /home/cgspace.cgiar.org/log/dspace.log.2021-04-06 | grep -c -a add_item </span></span><span style="display:flex;"><span>78 </span></span></code></pre></div><ul> <li>Of those 78, 77 of them were from Udana</li> <li>Compared to other mornings (0 to 9 AM) this month that seems to be pretty high:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># <span style="color:#66d9ef">for</span> num in <span style="color:#f92672">{</span>01..13<span style="color:#f92672">}</span>; <span style="color:#66d9ef">do</span> grep -a -E <span style="color:#e6db74">"2021-04-</span>$num<span style="color:#e6db74"> 0"</span> /home/cgspace.cgiar.org/log/dspace.log.2021-04-$num | grep -c -a </span></span><span style="display:flex;"><span> add_item; done </span></span><span style="display:flex;"><span>32 </span></span><span style="display:flex;"><span>0 </span></span><span style="display:flex;"><span>0 </span></span><span style="display:flex;"><span>2 </span></span><span style="display:flex;"><span>8 </span></span><span style="display:flex;"><span>108 </span></span><span style="display:flex;"><span>4 </span></span><span style="display:flex;"><span>0 </span></span><span style="display:flex;"><span>29 
</span></span><span style="display:flex;"><span>0 </span></span><span style="display:flex;"><span>1 </span></span><span style="display:flex;"><span>1 </span></span><span style="display:flex;"><span>2 </span></span></code></pre></div><h2 id="2021-04-15">2021-04-15</h2> <ul> <li>Release v1.4.2 of the DSpace Statistics API on GitHub: <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.4.2">https://github.com/ilri/dspace-statistics-api/releases/tag/v1.4.2</a> <ul> <li>This has been running on DSpace Test for the last week or so, and mostly contains the Falcon 3.0.0 changes</li> </ul> </li> <li>Re-sync DSpace Test with data from CGSpace <ul> <li>Run system updates on DSpace Test (linode26) and reboot the server</li> </ul> </li> <li>Update the PostgreSQL JDBC driver on DSpace Test (linode26) to 42.2.19 <ul> <li>It has been a few months since we updated this, and there have been a few releases since 42.2.14, which we are currently using</li> </ul> </li> <li>Create a test account for Rafael from Bioversity-CIAT to submit some items to DSpace Test:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ dspace user -a -m tip-submit@cgiar.org -g CIAT -s Submit -p <span style="color:#e6db74">'fuuuuuuuu'</span> </span></span></code></pre></div><ul> <li>I added the account to the Alliance Admins group, which should allow him to submit to any Alliance collection <ul> <li>According to my notes from <a href="/cgspace-notes/2020-10/">2020-10</a> the account must be in the admin group in order to submit via the REST API</li> </ul> </li> </ul> <h2 id="2021-04-18">2021-04-18</h2> <ul> <li>Update all containers on AReS (linode20):</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code 
class="language-console" data-lang="console"><span style="display:flex;"><span>$ docker images | grep -v ^REPO | sed <span style="color:#e6db74">'s/ \+/:/g'</span> | cut -d: -f1,2 | xargs -L1 docker pull </span></span></code></pre></div><ul> <li>Then run all system updates and reboot the server</li> <li>I learned a new command for Elasticsearch:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl http://localhost:9200/_cat/indices </span></span><span style="display:flex;"><span>yellow open openrxv-values ChyhGwMDQpevJtlNWO1vcw 1 1 1579 0 537.6kb 537.6kb </span></span><span style="display:flex;"><span>yellow open openrxv-items-temp PhV5ieuxQsyftByvCxzSIw 1 1 103585 104372 482.7mb 482.7mb </span></span><span style="display:flex;"><span>yellow open openrxv-shared J_8cxIz6QL6XTRZct7UBBQ 1 1 127 0 115.7kb 115.7kb </span></span><span style="display:flex;"><span>yellow open openrxv-values-00001 jAoXTLR0R9mzivlDVbQaqA 1 1 3903 0 696.2kb 696.2kb </span></span><span style="display:flex;"><span>green open .kibana_task_manager_1 O1zgJ0YlQhKCFAwJZaNSIA 1 0 2 2 20.6kb 20.6kb </span></span><span style="display:flex;"><span>yellow open openrxv-users 1hWGXh9kS_S6YPxAaBN8ew 1 1 5 0 28.6kb 28.6kb </span></span><span style="display:flex;"><span>green open .apm-agent-configuration f3RAkSEBRGaxJZs3ePVxsA 1 0 0 0 283b 283b </span></span><span style="display:flex;"><span>yellow open openrxv-items-final sgk-s8O-RZKdcLRoWt3G8A 1 1 970 0 2.3mb 2.3mb </span></span><span style="display:flex;"><span>green open .kibana_1 HHPN7RD_T7qe0zDj4rauQw 1 0 25 7 36.8kb 36.8kb </span></span><span style="display:flex;"><span>yellow open users M0t2LaZhSm2NrF5xb64dnw 1 1 2 0 11.6kb 11.6kb </span></span></code></pre></div><ul> <li>Somehow the <code>openrxv-items-final</code> index only has a few items and the majority are in 
<code>openrxv-items-temp</code>, via the <code>openrxv-items</code> alias (which is in the temp index):</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items/_count?q=*&pretty'</span> </span></span><span style="display:flex;"><span>{ </span></span><span style="display:flex;"><span> "count" : 103585, </span></span><span style="display:flex;"><span> "_shards" : { </span></span><span style="display:flex;"><span> "total" : 1, </span></span><span style="display:flex;"><span> "successful" : 1, </span></span><span style="display:flex;"><span> "skipped" : 0, </span></span><span style="display:flex;"><span> "failed" : 0 </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><ul> <li>I found a cool tool to help with exporting and restoring Elasticsearch indexes:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --type<span style="color:#f92672">=</span>mapping </span></span><span style="display:flex;"><span>$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span> --type<span style="color:#f92672">=</span>data </span></span><span style="display:flex;"><span>... 
</span></span><span style="display:flex;"><span>Sun, 18 Apr 2021 06:27:07 GMT | Total Writes: 103585 </span></span><span style="display:flex;"><span>Sun, 18 Apr 2021 06:27:07 GMT | dump complete </span></span></code></pre></div><ul> <li>It took only two or three minutes to export everything…</li> <li>I did a test to restore the index:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ elasticdump --input<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --output<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items-test --type<span style="color:#f92672">=</span>mapping </span></span><span style="display:flex;"><span>$ elasticdump --input<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --output<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items-test --limit <span style="color:#ae81ff">1000</span> --type<span style="color:#f92672">=</span>data </span></span></code></pre></div><ul> <li>So that’s pretty cool!</li> <li>I deleted the <code>openrxv-items-final</code> and <code>openrxv-items-temp</code> indexes and then restored the mappings to <code>openrxv-items-final</code>, added the <code>openrxv-items</code> alias, and started restoring the data to <code>openrxv-items</code> with elasticdump:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-final'</span> </span></span><span style="display:flex;"><span>$ elasticdump --input<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --output<span 
style="color:#f92672">=</span>http://localhost:9200/openrxv-items-final --type<span style="color:#f92672">=</span>mapping </span></span><span style="display:flex;"><span>$ curl -s -X POST <span style="color:#e6db74">'http://localhost:9200/_aliases'</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'</span> </span></span><span style="display:flex;"><span>$ elasticdump --input<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --output<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --limit <span style="color:#ae81ff">1000</span> --type<span style="color:#f92672">=</span>data </span></span></code></pre></div><ul> <li>AReS seems to be working fine after that, so I created the <code>openrxv-items-temp</code> index and then started a fresh harvest on AReS Explorer:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -X PUT <span style="color:#e6db74">"localhost:9200/openrxv-items-temp"</span> </span></span></code></pre></div><ul> <li>Run system updates on CGSpace (linode18) and run the latest Ansible infrastructure playbook to update the DSpace Statistics API, PostgreSQL JDBC driver, etc, and then reboot the system</li> <li>I wasted a bit of time trying to get TSLint and then ESLint running for OpenRXV on GitHub Actions</li> </ul> <h2 id="2021-04-19">2021-04-19</h2> <ul> <li>The AReS harvesting last night seems to have completed successfully, but the number of results is strange:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ 
curl -s http://localhost:9200/_cat/indices | grep openrxv-items </span></span><span style="display:flex;"><span>yellow open openrxv-items-temp kNUlupUyS_i7vlBGiuVxwg 1 1 103741 105553 483.6mb 483.6mb </span></span><span style="display:flex;"><span>yellow open openrxv-items-final HFc3uytTRq2GPpn13vkbmg 1 1 970 0 2.3mb 2.3mb </span></span></code></pre></div><ul> <li>The indices endpoint doesn’t include the <code>openrxv-items</code> alias, but it is currently in the <code>openrxv-items-temp</code> index so the number of items is the same:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items/_count?q=*&pretty'</span> </span></span><span style="display:flex;"><span>{ </span></span><span style="display:flex;"><span> "count" : 103741, </span></span><span style="display:flex;"><span> "_shards" : { </span></span><span style="display:flex;"><span> "total" : 1, </span></span><span style="display:flex;"><span> "successful" : 1, </span></span><span style="display:flex;"><span> "skipped" : 0, </span></span><span style="display:flex;"><span> "failed" : 0 </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span>} </span></span></code></pre></div><ul> <li>A user was having problems resetting their password on CGSpace, with some message about SMTP etc <ul> <li>I checked and we are indeed locked out of our mailbox:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ dspace test-email </span></span><span style="display:flex;"><span>... 
</span></span><span style="display:flex;"><span>Error sending email: </span></span><span style="display:flex;"><span> - Error: javax.mail.SendFailedException: Send failure (javax.mail.AuthenticationFailedException: 550 5.2.1 Mailbox cannot be accessed [PR0P264CA0280.FRAP264.PROD.OUTLOOK.COM] </span></span><span style="display:flex;"><span>) </span></span></code></pre></div><ul> <li>I have to write to ICT…</li> <li>I decided to switch back to the G1GC garbage collector on DSpace Test <ul> <li>Reading Shawn Heisey’s discussion again: <a href="https://cwiki.apache.org/confluence/display/SOLR/ShawnHeisey">https://cwiki.apache.org/confluence/display/SOLR/ShawnHeisey</a></li> <li>I am curious to check the JVM stats in a few days to see if there is a marked change</li> </ul> </li> <li>Work on minor changes to get DSpace working on Ubuntu 20.04 for our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a></li> </ul> <h2 id="2021-04-21">2021-04-21</h2> <ul> <li>Send Abdullah feedback on the <a href="https://github.com/ilri/OpenRXV/pull/91">filter on click pull request</a> for OpenRXV <ul> <li>I see it adds a new “allow filter on click” checkbox in the layout settings, but it doesn’t modify the filters</li> <li>Also, it seems to have broken the existing clicking of the countries on the map</li> </ul> </li> <li>Atmire recently sent feedback about the CUA duplicates processor <ul> <li>Last month when I ran it, it got stuck on the storage reports, apparently, so I will try again (with a fresh Solr statistics core from production) and skip the storage reports (<code>-g</code>):</li> </ul> </li> </ul> <pre tabindex="0"><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m' $ cp atmire-cua-update.xml-20210124-132112.old /home/dspacetest.cgiar.org/config/spring/api/atmire-cua-update.xml $ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -r 100 -c statistics -t 12 -g </code></pre><ul> <li>The first run 
processed 1,439 docs, the second run processed 0 docs <ul> <li>I’m not sure if that means that it worked? I sent feedback to Atmire</li> </ul> </li> <li>Meeting with Moayad to discuss OpenRXV development progress</li> </ul> <h2 id="2021-04-25">2021-04-25</h2> <ul> <li>The indexes on AReS are messed up again <ul> <li>I made a backup of the indexes, then deleted the <code>openrxv-items-final</code> and <code>openrxv-items-temp</code> indexes, re-created the <code>openrxv-items</code> alias, and restored the data into <code>openrxv-items</code>:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --type<span style="color:#f92672">=</span>mapping </span></span><span style="display:flex;"><span>$ elasticdump --input<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --output<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --limit<span style="color:#f92672">=</span><span style="color:#ae81ff">1000</span> --type<span style="color:#f92672">=</span>data </span></span><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp'</span> </span></span><span style="display:flex;"><span>$ curl -XDELETE <span style="color:#e6db74">'http://localhost:9200/openrxv-items-final'</span> </span></span><span style="display:flex;"><span>$ elasticdump --input<span style="color:#f92672">=</span>/home/aorth/openrxv-items_mapping.json --output<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items-final --type<span style="color:#f92672">=</span>mapping </span></span><span style="display:flex;"><span>$ curl -s -X POST 
<span style="color:#e6db74">'http://localhost:9200/_aliases'</span> -H <span style="color:#e6db74">'Content-Type: application/json'</span> -d<span style="color:#e6db74">'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'</span> </span></span><span style="display:flex;"><span>$ elasticdump --input<span style="color:#f92672">=</span>/home/aorth/openrxv-items_data.json --output<span style="color:#f92672">=</span>http://localhost:9200/openrxv-items --limit <span style="color:#ae81ff">1000</span> --type<span style="color:#f92672">=</span>data </span></span></code></pre></div><ul> <li>Then I started a fresh AReS harvest</li> </ul> <h2 id="2021-04-26">2021-04-26</h2> <ul> <li>The AReS harvest last night seems to have finished successfully and the number of items looks good:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s http://localhost:9200/_cat/indices | grep openrxv-items </span></span><span style="display:flex;"><span>yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 0 0 283b 283b </span></span><span style="display:flex;"><span>yellow open openrxv-items-final ul3SKsa7Q9Cd_K7qokBY_w 1 1 103951 0 254mb 254mb </span></span></code></pre></div><ul> <li>And the aliases seem correct for once:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/_alias/'</span> | python -m json.tool </span></span><span style="display:flex;"><span>... 
</span></span><span style="display:flex;"><span> "openrxv-items-final": { </span></span><span style="display:flex;"><span> "aliases": { </span></span><span style="display:flex;"><span> "openrxv-items": {} </span></span><span style="display:flex;"><span> } </span></span><span style="display:flex;"><span> }, </span></span><span style="display:flex;"><span> "openrxv-items-temp": { </span></span><span style="display:flex;"><span> "aliases": {} </span></span><span style="display:flex;"><span> }, </span></span><span style="display:flex;"><span>... </span></span></code></pre></div><ul> <li>That’s 250 new items in the index since the last harvest!</li> <li>Re-create my local Artifactory container because I’m getting errors starting it and it has been a few months since it was updated:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ podman rm artifactory </span></span><span style="display:flex;"><span>$ podman pull docker.bintray.io/jfrog/artifactory-oss:latest </span></span><span style="display:flex;"><span>$ podman create --ulimit nofile<span style="color:#f92672">=</span>32000:32000 --name artifactory -v artifactory_data:/var/opt/jfrog/artifactory -p 8081-8082:8081-8082 docker.bintray.io/jfrog/artifactory-oss </span></span><span style="display:flex;"><span>$ podman start artifactory </span></span></code></pre></div><ul> <li>Start testing DSpace 7.0 Beta 5 so I can evaluate if it solves some of the problems we are having on DSpace 6, and if it’s missing things like multiple handle resolvers, etc <ul> <li>I see it needs Java JDK 11, Tomcat 9, Solr 8, and PostgreSQL 11</li> <li>Also, according to the <a href="https://wiki.lyrasis.org/display/DSDOC7x/Installing+DSpace">installation notes</a> I see you can install the old DSpace 6 REST API, so that’s potentially useful for us</li> <li>I see that all 
web applications on the backend are now rolled into just one “server” application</li> <li>The build process took 11 minutes the first time (due to downloading the world with Maven) and ~2 minutes the second time</li> <li>The <code>local.cfg</code> content and syntax are very similar to DSpace 6’s</li> </ul> </li> <li>I got the basic <code>fresh_install</code> up and running <ul> <li>Then I tried to import a DSpace 6 database from production</li> </ul> </li> <li>I tried to delete all the Atmire SQL migrations:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace7b5= > DELETE FROM schema_version WHERE description LIKE '%Atmire%' OR description LIKE '%CUA%' OR description LIKE '%cua%'; </span></span></code></pre></div><ul> <li>But I got an error when running <code>dspace database migrate</code>:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ~/dspace7b5/bin/dspace database migrate </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>Database URL: jdbc:postgresql://localhost:5432/dspace7b5 </span></span><span style="display:flex;"><span>Migrating database to latest version... 
(Check dspace logs for details) </span></span><span style="display:flex;"><span>Migration exception: </span></span><span style="display:flex;"><span>java.sql.SQLException: Flyway migration error occurred </span></span><span style="display:flex;"><span> at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:738) </span></span><span style="display:flex;"><span> at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:632) </span></span><span style="display:flex;"><span> at org.dspace.storage.rdbms.DatabaseUtils.main(DatabaseUtils.java:228) </span></span><span style="display:flex;"><span> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) </span></span><span style="display:flex;"><span> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) </span></span><span style="display:flex;"><span> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) </span></span><span style="display:flex;"><span> at java.base/java.lang.reflect.Method.invoke(Method.java:566) </span></span><span style="display:flex;"><span> at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:273) </span></span><span style="display:flex;"><span> at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:129) </span></span><span style="display:flex;"><span> at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:94) </span></span><span style="display:flex;"><span>Caused by: org.flywaydb.core.api.FlywayException: Validate failed: </span></span><span style="display:flex;"><span>Detected applied migration not resolved locally: 5.0.2017.09.25 </span></span><span style="display:flex;"><span>Detected applied migration not resolved locally: 6.0.2017.01.30 </span></span><span style="display:flex;"><span>Detected applied migration not resolved locally: 6.0.2017.09.25 </span></span><span 
style="display:flex;"><span> </span></span><span style="display:flex;"><span> at org.flywaydb.core.Flyway.doValidate(Flyway.java:292) </span></span><span style="display:flex;"><span> at org.flywaydb.core.Flyway.access$100(Flyway.java:73) </span></span><span style="display:flex;"><span> at org.flywaydb.core.Flyway$1.execute(Flyway.java:166) </span></span><span style="display:flex;"><span> at org.flywaydb.core.Flyway$1.execute(Flyway.java:158) </span></span><span style="display:flex;"><span> at org.flywaydb.core.Flyway.execute(Flyway.java:527) </span></span><span style="display:flex;"><span> at org.flywaydb.core.Flyway.migrate(Flyway.java:158) </span></span><span style="display:flex;"><span> at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:729) </span></span><span style="display:flex;"><span> ... 9 more </span></span></code></pre></div><ul> <li>I deleted those migrations:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace7b5= > DELETE FROM schema_version WHERE version IN ('5.0.2017.09.25', '6.0.2017.01.30', '6.0.2017.09.25'); </span></span></code></pre></div><ul> <li>Then when I ran the migration again it failed for a new reason, related to the configurable workflow:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>Database URL: jdbc:postgresql://localhost:5432/dspace7b5 </span></span><span style="display:flex;"><span>Migrating database to latest version... 
(Check dspace logs for details) </span></span><span style="display:flex;"><span>Migration exception: </span></span><span style="display:flex;"><span>java.sql.SQLException: Flyway migration error occurred </span></span><span style="display:flex;"><span> at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:738) </span></span><span style="display:flex;"><span> at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:632) </span></span><span style="display:flex;"><span> at org.dspace.storage.rdbms.DatabaseUtils.main(DatabaseUtils.java:228) </span></span><span style="display:flex;"><span> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) </span></span><span style="display:flex;"><span> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) </span></span><span style="display:flex;"><span> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) </span></span><span style="display:flex;"><span> at java.base/java.lang.reflect.Method.invoke(Method.java:566) </span></span><span style="display:flex;"><span> at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:273) </span></span><span style="display:flex;"><span> at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:129) </span></span><span style="display:flex;"><span> at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:94) </span></span><span style="display:flex;"><span>Caused by: org.flywaydb.core.internal.command.DbMigrate$FlywayMigrateException: </span></span><span style="display:flex;"><span>Migration V7.0_2019.05.02__DS-4239-workflow-xml-migration.sql failed </span></span><span style="display:flex;"><span>-------------------------------------------------------------------- </span></span><span style="display:flex;"><span>SQL State : 42P01 </span></span><span style="display:flex;"><span>Error Code : 
0 </span></span><span style="display:flex;"><span>Message : ERROR: relation "cwf_pooltask" does not exist </span></span><span style="display:flex;"><span> Position: 8 </span></span><span style="display:flex;"><span>Location : org/dspace/storage/rdbms/sqlmigration/postgres/V7.0_2019.05.02__DS-4239-workflow-xml-migration.sql (/home/aorth/src/apache-tomcat-9.0.45/file:/home/aorth/dspace7b5/lib/dspace-api-7.0-beta5.jar!/org/dspace/storage/rdbms/sqlmigration/postgres/V7.0_2019.05.02__DS-4239-workflow-xml-migration.sql) </span></span><span style="display:flex;"><span>Line : 16 </span></span><span style="display:flex;"><span>Statement : UPDATE cwf_pooltask SET workflow_id='defaultWorkflow' WHERE workflow_id='default' </span></span><span style="display:flex;"><span>... </span></span></code></pre></div><ul> <li>The <a href="https://wiki.lyrasis.org/display/DSDOC7x/Upgrading+DSpace">DSpace 7 upgrade docs</a> say I need to apply these previously optional migrations:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ~/dspace7b5/bin/dspace database migrate ignored </span></span></code></pre></div><ul> <li>Now I see all migrations have completed and DSpace actually starts up fine!</li> <li>I will try to do a full re-index to see how long it takes:</li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ time ~/dspace7b5/bin/dspace index-discovery -b </span></span><span style="display:flex;"><span>... 
</span></span><span style="display:flex;"><span>~/dspace7b5/bin/dspace index-discovery -b 25156.71s user 64.22s system 97% cpu 7:11:09.94 total </span></span></code></pre></div><ul> <li>Not good, that shit took almost seven hours!</li> </ul> <h2 id="2021-04-27">2021-04-27</h2> <ul> <li>Peter sent me a list of 500+ DOIs from CGSpace with no Altmetric score <ul> <li>I used csvgrep (with Windows encoding!) to extract those without our handle and save the DOIs to a text file, then got their handles with my <code>doi-to-handle.py</code> script:</li> </ul> </li> </ul> <div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvgrep -e <span style="color:#e6db74">'windows-1252'</span> -c <span style="color:#e6db74">'Handle.net IDs'</span> -i -m <span style="color:#e6db74">'10568/'</span> ~/Downloads/Altmetric<span style="color:#ae81ff">\ </span>-<span style="color:#ae81ff">\ </span>Research<span style="color:#ae81ff">\ </span>Outputs<span style="color:#ae81ff">\ </span>-<span style="color:#ae81ff">\ </span>CGSpace<span style="color:#ae81ff">\ </span>-<span style="color:#ae81ff">\ </span>2021-04-26.csv | csvcut -c DOI | sed <span style="color:#e6db74">'1d'</span> > /tmp/dois.txt </span></span><span style="display:flex;"><span>$ ./ilri/doi-to-handle.py -i /tmp/dois.txt -o /tmp/handles.csv -db dspace63 -u dspace -p <span style="color:#e6db74">'fuuu'</span> -d </span></span></code></pre></div><ul> <li>He will Tweet them…</li> </ul> <h2 id="2021-04-28">2021-04-28</h2> <ul> <li>Grant some IWMI colleagues access to the Atmire Content and Usage stats on CGSpace</li> </ul> <!-- raw HTML omitted --> </article> </div> <!-- /.blog-main --> <aside class="col-sm-3 ml-auto blog-sidebar"> <section class="sidebar-module"> <h4>Recent Posts</h4> <ol class="list-unstyled"> <li><a href="/cgspace-notes/2022-07/">July, 2022</a></li> 
<li><a href="/cgspace-notes/2022-06/">June, 2022</a></li> <li><a href="/cgspace-notes/2022-05/">May, 2022</a></li> <li><a href="/cgspace-notes/2022-04/">April, 2022</a></li> <li><a href="/cgspace-notes/2022-03/">March, 2022</a></li> </ol> </section> <section class="sidebar-module"> <h4>Links</h4> <ol class="list-unstyled"> <li><a href="https://cgspace.cgiar.org">CGSpace</a></li> <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li> <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li> </ol> </section> </aside> </div> <!-- /.row --> </div> <!-- /.container --> <footer class="blog-footer"> <p dir="auto"> Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>. </p> <p> <a href="#">Back to top</a> </p> </footer> </body> </html>