<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="April, 2021" />
<meta property="og:description" content="2021-04-01
I wrote a script to query Sherpa’s API for our ISSNs: sherpa-issn-lookup.py
I’m curious to see how the results compare with the results from Crossref yesterday
AReS Explorer was down since this morning, I didn’t see anything in the systemd journal
I simply took everything down with docker-compose and then back up, and then it was OK
Perhaps one of the containers crashed, I should have looked closer but I was in a hurry
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-04/" />
<meta property="article:published_time" content="2021-04-01T09:50:54+03:00" />
<meta property="article:modified_time" content="2021-04-28T18:57:48+03:00" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="April, 2021" />
<meta name="twitter:description" content="2021-04-01
I wrote a script to query Sherpa’s API for our ISSNs: sherpa-issn-lookup.py
I’m curious to see how the results compare with the results from Crossref yesterday
AReS Explorer was down since this morning, I didn’t see anything in the systemd journal
I simply took everything down with docker-compose and then back up, and then it was OK
Perhaps one of the containers crashed, I should have looked closer but I was in a hurry
"/>
<meta name="generator" content="Hugo 0.92.2" />
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "BlogPosting",
  "headline": "April, 2021",
  "url": "https://alanorth.github.io/cgspace-notes/2021-04/",
  "wordCount": "4668",
  "datePublished": "2021-04-01T09:50:54+03:00",
  "dateModified": "2021-04-28T18:57:48+03:00",
  "author": {
    "@type": "Person",
    "name": "Alan Orth"
  },
  "keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2021-04/">
<title>April, 2021 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel="stylesheet" integrity="sha256-vrgBLtwIuhC+AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz+lcnA=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2021-04/">April, 2021</a></h2>
<p class="blog-post-meta">
<time datetime="2021-04-01T09:50:54+03:00">Thu Apr 01, 2021</time>
in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2021-04-01">2021-04-01</h2>
<ul>
<li>I wrote a script to query Sherpa’s API for our ISSNs: <code>sherpa-issn-lookup.py</code>
<ul>
<li>I’m curious to see how the results compare with the results from Crossref yesterday</li>
</ul>
</li>
<li>AReS Explorer was down since this morning, I didn’t see anything in the systemd journal
<ul>
<li>I simply took everything down with docker-compose and then back up, and then it was OK</li>
<li>Perhaps one of the containers crashed, I should have looked closer but I was in a hurry</li>
</ul>
</li>
</ul>
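<ul>
<li>In essence the script does something like the following (a minimal sketch assuming the Sherpa Romeo v2 “retrieve” API and a placeholder API key; this is not the actual contents of <code>sherpa-issn-lookup.py</code>):</li>
</ul>
<pre tabindex="0"><code class="language-python">#!/usr/bin/env python3
# Sketch of a Sherpa Romeo v2 publication lookup by ISSN (assumed API,
# placeholder key); not the real sherpa-issn-lookup.py
import json
import requests

API_KEY = "changeme"  # assumption: a Sherpa Romeo API key

def lookup_issn(issn: str) -> None:
    # The v2 API takes a JSON-encoded filter expression
    params = {
        "item-type": "publication",
        "api-key": API_KEY,
        "format": "Json",
        "filter": json.dumps([["issn", "equals", issn]]),
    }
    r = requests.get("https://v2.sherpa.ac.uk/cgi/retrieve", params=params)
    r.raise_for_status()
    items = r.json().get("items", [])
    print(f"{issn}: {'found' if items else 'not found'}")

lookup_issn("0002-1962")
</code></pre>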
<h2 id="2021-04-03">2021-04-03</h2>
<ul>
<li>Biruk from ICT contacted me to say that some CGSpace users still can’t log in
<ul>
<li>I guess the CGSpace LDAP bind account is really still locked after last week’s reset</li>
<li>He fixed the account and then I was finally able to bind and query:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b "dc=cgiarad,dc=org" -D "cgspace-account" -W "(sAMAccountName=otheraccounttoquery)"
</code></pre></div>
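<ul>
<li>If I ever need to check the bind programmatically, something like this would do it (a sketch using the <code>ldap3</code> Python library, with placeholder credentials rather than our real bind DN and password):</li>
</ul>
<pre tabindex="0"><code class="language-python">#!/usr/bin/env python3
# Sketch: bind to Active Directory and search, assuming the ldap3 library
# and placeholder credentials
from ldap3 import ALL, Connection, Server

server = Server("ldaps://AZCGNEROOT2.CGIARAD.ORG:636", get_info=ALL)
# auto_bind raises an exception if the bind fails, eg if the account is locked
conn = Connection(server, user="cgspace-account", password="changeme", auto_bind=True)
conn.search(
    "dc=cgiarad,dc=org",
    "(sAMAccountName=otheraccounttoquery)",
    attributes=["cn", "mail"],
)
for entry in conn.entries:
    print(entry)
</code></pre>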
<h2 id="2021-04-04">2021-04-04</h2>
<ul>
<li>Check the index aliases on AReS Explorer to make sure they are sane before starting a new harvest:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool | less
</code></pre></div><ul>
<li>Then set the <code>openrxv-items-final</code> index to read-only so we can make a backup:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -X PUT "localhost:9200/openrxv-items-final/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
{"acknowledged":true}%
$ curl -s -X POST http://localhost:9200/openrxv-items-final/_clone/openrxv-items-final-backup
{"acknowledged":true,"shards_acknowledged":true,"index":"openrxv-items-final-backup"}%
$ curl -X PUT "localhost:9200/openrxv-items-final/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
</code></pre></div><ul>
<li>Then start a harvest on AReS Explorer</li>
<li>Help Enrico get some 2020 statistics for the Roots, Tubers and Bananas (RTB) community on CGSpace
<ul>
<li>He was hitting <a href="https://github.com/ilri/OpenRXV/issues/66">a bug on AReS</a> and also he only needed stats for 2020, and AReS currently only gives all-time stats</li>
</ul>
</li>
<li>I cleaned up about 230 ISSNs on CGSpace in OpenRefine
<ul>
<li>I had exported them last week, then filtered for anything not looking like an ISSN with this GREL: <code>isNotNull(value.match(/^\p{Alnum}{4}-\p{Alnum}{4}$/))</code></li>
<li>Then I applied them on CGSpace with the <code>fix-metadata-values.py</code> script:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/fix-metadata-values.py -i /tmp/2021-04-01-ISSNs.csv -db dspace -u dspace -p 'fuuu' -f cg.issn -t 'correct' -m 253
</code></pre></div><ul>
<li>For now I only fixed obvious errors like “1234-5678.” and “e-ISSN: 1234-5678” etc., but there are still lots of invalid ones which need more manual work (see the validation sketch after this list):
<ul>
<li>Too few characters</li>
<li>Too many characters</li>
<li>ISBNs</li>
</ul>
</li>
<li>Create the CGSpace community and collection structure for the new Accelerating Impacts of CGIAR Climate Research for Africa (AICCRA) project and assign all workflow steps</li>
</ul>
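<ul>
<li>The remaining bad values could be caught with the standard ISSN check-digit validation, for example (a sketch of the ISO 3297 checksum, not an existing ILRI script):</li>
</ul>
<pre tabindex="0"><code class="language-python">#!/usr/bin/env python3
# Sketch: validate ISSNs using the standard ISSN check digit
# (weights 8..2 over the first seven digits, modulus 11, "X" means 10)
import re

def is_valid_issn(value: str) -> bool:
    match = re.match(r"^(\d{4})-(\d{3})([\dXx])$", value.strip())
    if not match:
        return False  # too short, too long, or an ISBN-looking string
    digits = match.group(1) + match.group(2)
    check = match.group(3).upper()
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    return check == ("X" if expected == 10 else str(expected))

for issn in ["0002-1962", "1234-5678", "978-92-9060-502-4"]:
    print(issn, is_valid_issn(issn))
</code></pre>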
<h2 id="2021-04-05">2021-04-05</h2>
<ul>
<li>The AReS Explorer harvesting from yesterday finished, and the results look OK, but actually the Elasticsearch indexes are messed up again:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/_alias/' | python -m json.tool
{
    "openrxv-items-final": {
        "aliases": {}
    },
    "openrxv-items-temp": {
        "aliases": {
            "openrxv-items": {}
        }
    },
    ...
}
</code></pre></div><ul>
<li><code>openrxv-items</code> should be an alias of <code>openrxv-items-final</code>, not <code>openrxv-items-temp</code>… I will have to fix that manually</li>
<li>Enrico asked for more information on the RTB stats I gave him yesterday
<ul>
<li>I remembered (again) that we can’t filter Atmire’s CUA stats by date issued</li>
<li>To show, for example, views/downloads in the year 2020 for RTB items issued in 2020, we would need to use the DSpace Statistics API and post a list of IDs and a custom date range</li>
<li>I tried to do that here by exporting the RTB community and extracting the IDs for items issued in 2020:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ~/dspace63/bin/dspace metadata-export -i 10568/80100 -f /tmp/rtb.csv
$ csvcut -c 'id,dcterms.issued,dcterms.issued[],dcterms.issued[en_US]' /tmp/rtb.csv | \
  sed '1d' | \
  csvsql --no-header --no-inference --query 'SELECT a AS id,COALESCE(b, "")||COALESCE(c, "")||COALESCE(d, "") AS issued FROM stdin' | \
  csvgrep -c issued -m 2020 | \
  csvcut -c id | \
  sed '1d' | \
  sort | \
  uniq
</code></pre></div><ul>
<li>So I remember in the future, this basically does the following:
<ul>
<li>Use csvcut to extract the id and all date issued columns from the CSV</li>
<li>Use sed to remove the header so we can refer to the columns using default a, b, c instead of their real names (which are tricky to match due to special characters)</li>
<li>Use csvsql to concatenate the various date issued columns (coalescing where null)</li>
<li>Use csvgrep to filter items by date issued in 2020</li>
<li>Use csvcut to extract the id column</li>
<li>Use sed to delete the header row</li>
<li>Use sort and uniq to filter out any duplicate IDs (there were three)</li>
</ul>
</li>
<li>Then I have a list of 296 IDs for RTB items issued in 2020</li>
<li>I constructed a JSON file to post to the DSpace Statistics API:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-json" data-lang="json">{
    "limit": 100,
    "page": 0,
    "dateFrom": "2020-01-01T00:00:00Z",
    "dateTo": "2020-12-31T00:00:00Z",
    "items": [
        "00358715-b70c-4fdd-aa55-730e05ba739e",
        "004b54bb-f16f-4cec-9fbc-ab6c6345c43d",
        "02fb7630-d71a-449e-b65d-32b4ea7d6904",
        ...
    ]
}
</code></pre></div><ul>
<li>Then I submitted the file three times, changing the <code>page</code> parameter in the JSON each time:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s -d @/tmp/2020-items.txt https://cgspace.cgiar.org/rest/statistics/items | json_pp > /tmp/page1.json
$ curl -s -d @/tmp/2020-items.txt https://cgspace.cgiar.org/rest/statistics/items | json_pp > /tmp/page2.json
$ curl -s -d @/tmp/2020-items.txt https://cgspace.cgiar.org/rest/statistics/items | json_pp > /tmp/page3.json
</code></pre></div><ul>
<li>Then I extracted the views and downloads in the most ridiculous way (a tidier Python sketch follows at the end of this section):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep views /tmp/page*.json | grep -o -E '[0-9]+$' | sed 's/,//' | xargs | sed -e 's/ /+/g' | bc
30364
$ grep downloads /tmp/page*.json | grep -o -E '[0-9]+,' | sed 's/,//' | xargs | sed -e 's/ /+/g' | bc
9100
</code></pre></div><ul>
<li>Out of curiosity I did the same exercise for items issued in 2019 and got the following:
<ul>
<li>Views: 30721</li>
<li>Downloads: 10205</li>
</ul>
</li>
</ul>
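<ul>
<li>For reference, the tidier approach mentioned above would be to page through the Statistics API and sum the totals in one go (a sketch assuming the response format of the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a>, where each page has a <code>statistics</code> list with <code>views</code> and <code>downloads</code> fields):</li>
</ul>
<pre tabindex="0"><code class="language-python">#!/usr/bin/env python3
# Sketch: POST the item IDs to the DSpace Statistics API page by page and
# sum views/downloads, instead of grepping the JSON output into bc.
# Assumes the response has a "statistics" list with "views" and "downloads".
import json
import requests

with open("/tmp/2020-items.txt") as f:
    payload = json.load(f)

views = downloads = 0
for page in range(3):  # 296 items with a limit of 100 means three pages
    payload["page"] = page
    r = requests.post("https://cgspace.cgiar.org/rest/statistics/items", json=payload)
    r.raise_for_status()
    for stat in r.json()["statistics"]:
        views += stat["views"]
        downloads += stat["downloads"]

print(f"Views: {views}, Downloads: {downloads}")
</code></pre>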
<h2 id="2021-04-06">2021-04-06</h2>
<ul>
<li>Margarita from CCAFS was having problems deleting an item from CGSpace again
<ul>
<li>The error was “Authorization denied for action OBSOLETE (DELETE) on BITSTREAM:bd157345-448e…”</li>
<li>This is the same issue as last month</li>
</ul>
</li>
<li>Create a new collection on CGSpace for a new CIP project at Mishel Portilla’s request</li>
<li>I got a notice that CGSpace was down
<ul>
<li>I didn’t see anything strange at first, but there was an insane number of database connections:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
12413
</code></pre></div>
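<ul>
<li>Counting lines only tells me how many locks there are; next time a quick sketch like this (assuming the psycopg2 library and local access to the dspace database) would show what is actually being locked:</li>
</ul>
<pre tabindex="0"><code class="language-python">#!/usr/bin/env python3
# Sketch: group PostgreSQL locks by relation and mode, assuming psycopg2
# and a local "dspace" database we can connect to
import psycopg2

conn = psycopg2.connect("dbname=dspace")
with conn.cursor() as cursor:
    cursor.execute("""
        SELECT COALESCE(c.relname, 'none'), pl.mode, COUNT(*)
        FROM pg_locks pl
        LEFT JOIN pg_class c ON pl.relation = c.oid
        GROUP BY c.relname, pl.mode
        ORDER BY COUNT(*) DESC
        LIMIT 10;
    """)
    for relname, mode, count in cursor.fetchall():
        print(f"{count}\t{mode}\t{relname}")
conn.close()
</code></pre>
<ul>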
<li>The system journal shows thousands of these messages; this is the first one:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">Apr 06 07:52:13 linode18 tomcat7[556]: Apr 06, 2021 7:52:13 AM org.apache.tomcat.jdbc.pool.ConnectionPool abandon
</code></pre></div><ul>
<li>Around that time in the dspace log I see nothing unusual, but maybe these?</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">2021-04-06 07:52:29,409 INFO com.atmire.dspace.cua.CUASolrLoggerServiceImpl @ Updating : 200/127 docs in http://localhost:8081/solr/statistics
</code></pre></div><ul>
<li>(BTW what is the deal with the “200/127”? I should send a comment to Atmire)
<ul>
<li>I filed a ticket with Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-tickets">https://tracker.atmire.com/tickets-cgiar-ilri/view-tickets</a></li>
</ul>
</li>
<li>I restarted the PostgreSQL and Tomcat services and now I see fewer connections, but still WAY too many:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
3640
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
2968
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
13
</code></pre></div><ul>
<li>After ten minutes or so it went back down…</li>
<li>And now it’s back up in the thousands… I am seeing a lot of stuff in the dspace log like this:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">2021-04-06 11:59:34,364 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717951
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717952
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717953
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717954
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717955
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717956
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717957
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717958
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717959
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717960
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717961
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717962
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717963
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717964
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717965
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717966
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717967
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717968
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717969
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717970
2021-04-06 11:59:34,365 INFO org.dspace.content.MetadataValueServiceImpl @ user.hidden@cgiar.org:session_id=65F32E67CE8E347F64EFB5EB4E349B9B:delete_metadata_value: metadata_value_id=5717971
</code></pre></div><ul>
<li>I sent some notes and a log to Atmire on our existing issue about the database stuff
<ul>
<li>I also asked them about the possibility of doing a formal review of Hibernate</li>
</ul>
</li>
<li>Falcon 3.0.0 was released so I updated the 3.0.0 branch of the dspace-statistics-api and merged it to <code>v6_x</code>
<ul>
<li>I also fixed one minor (unrelated) bug in the tests</li>
<li>Then I deployed the new version on DSpace Test</li>
</ul>
</li>
<li>I had a meeting with Peter and Abenet about CGSpace TODOs</li>
<li>CGSpace went down again and the PostgreSQL locks are through the roof:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
12154
</code></pre></div><ul>
<li>I don’t see any activity on the REST API, but in the last four hours there have been 3,500 DSpace sessions:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># grep -a -E '2021-04-06 (13|14|15|16|17):' /home/cgspace.cgiar.org/log/dspace.log.2021-04-06 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
3547
</code></pre></div><ul>
<li>I looked at the same time of day for the past few weeks and it seems to be a normal number of sessions:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># for file in /home/cgspace.cgiar.org/log/dspace.log.2021-0{3,4}-*; do grep -a -E "2021-0(3|4)-[0-9]{2} (13|14|15|16|17):" "$file" | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l; done
...
3572
4085
3476
3128
2949
2016
1839
4513
3463
4425
3328
2783
3898
3848
7799
255
534
2755
599
4463
3547
</code></pre></div><ul>
<li>What about the total number of sessions per day?</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># for file in /home/cgspace.cgiar.org/log/dspace.log.2021-0{3,4}-*; do echo "$file:"; grep -a -o -E 'session_id=[A-Z0-9]{32}' "$file" | sort | uniq | wc -l; done
...
/home/cgspace.cgiar.org/log/dspace.log.2021-03-28:
11784
/home/cgspace.cgiar.org/log/dspace.log.2021-03-29:
15104
/home/cgspace.cgiar.org/log/dspace.log.2021-03-30:
19396
/home/cgspace.cgiar.org/log/dspace.log.2021-03-31:
32612
/home/cgspace.cgiar.org/log/dspace.log.2021-04-01:
26037
/home/cgspace.cgiar.org/log/dspace.log.2021-04-02:
14315
/home/cgspace.cgiar.org/log/dspace.log.2021-04-03:
12530
/home/cgspace.cgiar.org/log/dspace.log.2021-04-04:
13138
/home/cgspace.cgiar.org/log/dspace.log.2021-04-05:
16756
/home/cgspace.cgiar.org/log/dspace.log.2021-04-06:
12343
</code></pre></div><ul>
<li>So it’s not the number of sessions… it’s something with the workload…</li>
<li>I had to step away for an hour or so and when I came back the site was still down and there were still 12,000 locks
<ul>
<li>I restarted postgresql and tomcat7…</li>
</ul>
</li>
<li>The locks in PostgreSQL shot up again…</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
3447
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
3527
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
4582
</code></pre></div><ul>
<li>I don’t know what the hell is going on, but the PostgreSQL connections and locks are way higher than ever before:</li>
</ul>
<p><img src="/cgspace-notes/2021/04/postgres_connections_cgspace-week.png" alt="PostgreSQL connections week">
<img src="/cgspace-notes/2021/04/postgres_locks_cgspace-week.png" alt="PostgreSQL locks week">
<img src="/cgspace-notes/2021/04/jmx_tomcat_dbpools-week.png" alt="Tomcat database pool"></p>
<ul>
<li>Otherwise, the number of DSpace sessions is completely normal:</li>
</ul>
<p><img src="/cgspace-notes/2021/04/jmx_dspace_sessions-week.png" alt="DSpace sessions"></p>
<ul>
<li>While looking at the nginx logs I see that MEL is trying to log into CGSpace’s REST API and delete items:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">34.209.213.122 - - [06/Apr/2021:03:50:46 +0200] "POST /rest/login HTTP/1.1" 401 727 "-" "MEL"
34.209.213.122 - - [06/Apr/2021:03:50:48 +0200] "DELETE /rest/items/95f52bf1-f082-4e10-ad57-268a76ca18ec/metadata HTTP/1.1" 401 704 "-" "-"
</code></pre></div><ul>
<li>I see a few of these per day going back several months
<ul>
<li>I sent a message to Salem and Enrico to ask if they know anything about it</li>
</ul>
</li>
<li>Also annoying, I see tons of what look like penetration testing requests from Qualys:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">2021-04-04 06:35:17,889 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:failed_login:no DN found for user "'&gt;&lt;qss a=X158062356Y1_2Z&gt;
2021-04-04 06:35:17,889 INFO org.dspace.authenticate.PasswordAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:authenticate:attempting password auth of user="'&gt;&lt;qss a=X158062356Y1_2Z&gt;
2021-04-04 06:35:17,890 INFO org.dspace.app.xmlui.utils.AuthenticationUtil @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:failed_login:email="'&gt;&lt;qss a=X158062356Y1_2Z&gt;, realm=null, result=2
2021-04-04 06:35:18,145 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:auth:attempting trivial auth of user=was@qualys.com
2021-04-04 06:35:18,519 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:failed_login:no DN found for user was@qualys.com
2021-04-04 06:35:18,520 INFO org.dspace.authenticate.PasswordAuthentication @ anonymous:session_id=FF1E051BCA7D81CC5A807D85380D81E5:ip_addr=64.39.108.48:authenticate:attempting password auth of user=was@qualys.com
</code></pre></div><ul>
<li>I deleted the ilri/AReS repository on GitHub since we haven’t updated it in two years
<ul>
<li>All development is happening in <a href="https://github.com/ilri/openRXV">https://github.com/ilri/openRXV</a> now</li>
</ul>
</li>
<li>10 PM and the server is down again, with locks through the roof:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
12198
</code></pre></div><ul>
<li>I see that there are tons of PostgreSQL connections getting abandoned today, compared to very few in the past few weeks:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ journalctl -u tomcat7 --since=today | grep -c 'ConnectionPool abandon'
1838
$ journalctl -u tomcat7 --since=2021-03-20 --until=2021-04-05 | grep -c 'ConnectionPool abandon'
3
</code></pre></div><ul>
<li>I even restarted the server and connections were low for a few minutes until they shot back up:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
13
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
8651
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
8940
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
10504
</code></pre></div><ul>
<li>I had to go to bed and I bet it will crash and be down for hours until I wake up…</li>
<li>What the hell is this user agent?</li>
</ul>
<pre tabindex="0"><code>54.197.119.143 - - [06/Apr/2021:19:18:11 +0200] "GET /handle/10568/16499 HTTP/1.1" 499 0 "-" "GetUrl/1.0 wdestiny@umich.edu (Linux)"
</code></pre><h2 id="2021-04-07">2021-04-07</h2>
<ul>
<li>CGSpace was still down from last night of course, with tons of database locks:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
12168
</code></pre></div><ul>
<li>I restarted the server again and the locks came back</li>
<li>Atmire responded to the message from yesterday
<ul>
<li>They noticed something in the logs about emails failing to be sent</li>
<li>There appears to be an issue sending mails on workflow tasks when a user in that group has an invalid email address:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">2021-04-01 12:45:11,414 WARN org.dspace.workflowbasic.BasicWorkflowServiceImpl @ a.akwarandu@cgiar.org:session_id=2F20F20D4A8C36DB53D42DE45DFA3CCE:notifyGroupofTask:cannot email user group_id=aecf811b-b7e9-4b6f-8776-3d372e6a048b workflow_item_id=33085\colon; Invalid Addresses (com.sun.mail.smtp.SMTPAddressFailedException\colon; 501 5.1.3 Invalid address
</code></pre></div><ul>
<li>The issue is not the named user above, but a member of the group…</li>
<li>And the group does have users with invalid email addresses (probably accounts created automatically after authenticating with LDAP):</li>
</ul>
<p><img src="/cgspace-notes/2021/04/group-invalid-email.png" alt="DSpace group"></p>
<ul>
<li>I extracted all the group IDs from recent logs that had users with invalid email addresses:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ grep -a -E 'email user group_id=\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b' /home/cgspace.cgiar.org/log/dspace.log.* | grep -o -E '\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b' | sort | uniq
0a30d6ae-74a6-4eee-a8f5-ee5d15192ee6
1769137c-36d4-42b2-8fec-60585e110db7
203c8614-8a97-4ac8-9686-d9d62cb52acc
294603de-3d09-464e-a5b0-09e452c6b5ab
35878555-9623-4679-beb8-bb3395fdf26e
3d8a5efa-5509-4bf9-9374-2bc714aceb99
4238208a-f848-47cb-9dd2-43f9f954a4af
44939b84-1894-41e7-b3e6-8c8d1781057b
49ba087e-75a3-45ce-805c-69eeda0f786b
4a6606ce-0284-421d-bf80-4dafddba2d42
527de6aa-9cd0-4988-bf5f-c9c92ba2ac10
54cd1b16-65bf-4041-9d84-fb2ea3301d6d
58982847-5f7c-4b8b-a7b0-4d4de702136e
5f0b85be-bd23-47de-927d-bca368fa1fbc
646ada17-e4ef-49f6-9378-af7e58596ce1
7e2f4bf8-fbc9-4b2f-97a4-75e5427bef90
8029fd53-f9f5-4107-bfc3-8815507265cf
81faa934-c602-4608-bf45-de91845dfea7
8611a462-210c-4be1-a5bb-f87a065e6113
8855c903-ef86-433c-b0be-c12300eb0f84
8c7ece98-3598-4de7-a885-d61fd033bea8
8c9a0d01-2d12-4a99-84f9-cdc25ac072f9
8f9f888a-b501-41f3-a462-4da16150eebf
94168f0e-9f45-4112-ac8d-3ba9be917842
96998038-f381-47dc-8488-ff7252703627
9768f4a8-3018-44e9-bf58-beba4296327c
9a99e8d2-558e-4fc1-8011-e4411f658414
a34e6400-78ed-45c0-a751-abc039eed2e6
a9da5af3-4ec7-4a9b-becb-6e3d028d594d
abf5201c-8be5-4dee-b461-132203dd51cb
adb5658c-cef3-402f-87b6-b498f580351c
aecf811b-b7e9-4b6f-8776-3d372e6a048b
ba5aae61-ea34-4ac1-9490-4645acf2382f
bf7f3638-c7c6-4a8f-893d-891a6d3dafff
c617ada0-09d1-40ed-b479-1c4860a4f724
cff91d44-a855-458c-89e5-bd48c17d1a54
e65171ae-a2bf-4043-8f54-f8457bc9174b
e7098b40-4701-4ca2-b9a9-3a1282f67044
e904f122-71dc-439b-b877-313ef62486d7
ede59734-adac-4c01-8691-b45f19088d37
f88bd6bb-f93f-41cb-872f-ff26f6237068
f985f5fb-be5c-430b-a8f1-cf86ae4fc49a
fe800006-aaec-4f9e-9ab4-f9475b4cbdc3
</code></pre></div>
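<ul>
<li>To check which members of one of those groups have invalid email addresses I could query the database directly (a sketch assuming psycopg2 and the DSpace 6 schema, where group membership is in the <code>epersongroup2eperson</code> table):</li>
</ul>
<pre tabindex="0"><code class="language-python">#!/usr/bin/env python3
# Sketch: list the email addresses of a group's members, assuming the
# DSpace 6 schema (epersongroup2eperson mapping table with UUID columns)
import psycopg2

GROUP_ID = "aecf811b-b7e9-4b6f-8776-3d372e6a048b"  # group from the log above

conn = psycopg2.connect("dbname=dspace")
with conn.cursor() as cursor:
    cursor.execute(
        """
        SELECT e.email
        FROM eperson e
        JOIN epersongroup2eperson g2e ON e.uuid = g2e.eperson_id
        WHERE g2e.eperson_group_id = %s;
        """,
        (GROUP_ID,),
    )
    for (email,) in cursor.fetchall():
        print(email)
conn.close()
</code></pre>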
<h2 id="2021-04-08">2021-04-08</h2>
<ul>
<li>I can’t believe it but the server has been down for twelve hours or so
<ul>
<li>The locks have not changed since I went to bed last night:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
12070
</code></pre></div><ul>
<li>I restarted PostgreSQL and Tomcat and the locks go straight back up!</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
13
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
986
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
1194
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
1212
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
1489
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
2124
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
5934
</code></pre></div><h2 id="2021-04-09">2021-04-09</h2>
<ul>
<li>Atmire managed to get CGSpace back up by killing all the PostgreSQL connections yesterday
<ul>
<li>I don’t know how they did it…</li>
<li>They also think it’s weird that restarting PostgreSQL didn’t kill the connections</li>
<li>They asked some more questions, for example whether there were also issues on DSpace Test</li>
<li>Strangely enough, I checked DSpace Test and noticed a clear spike in PostgreSQL locks on the morning of April 6th as well!</li>
</ul>
</li>
</ul>
<p><img src="/cgspace-notes/2021/04/postgres_locks_ALL-week-PROD.png" alt="PostgreSQL locks week CGSpace">
<img src="/cgspace-notes/2021/04/postgres_locks_ALL-week-TEST.png" alt="PostgreSQL locks week DSpace Test"></p>
<ul>
<li>I definitely need to look into that!</li>
</ul>
<h2 id="2021-04-11">2021-04-11</h2>
<ul>
<li>I am trying to resolve the AReS Elasticsearch index issues that happened last week
<ul>
<li>I decided to back up the <code>openrxv-items</code> index to <code>openrxv-items-backup</code> and then delete all the others:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -X PUT "localhost:9200/openrxv-items/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-backup
$ curl -X PUT "localhost:9200/openrxv-items/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp'
$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
</code></pre></div><ul>
<li>Then I updated all Docker containers and rebooted the server (linode20) so that the correct indexes would be created again:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
</code></pre></div><ul>
<li>Then I realized I have to clone the backup index directly to <code>openrxv-items-final</code>, and re-create the <code>openrxv-items</code> alias:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -XDELETE 'http://localhost:9200/openrxv-items-final'
$ curl -X PUT "localhost:9200/openrxv-items-backup/_settings" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-backup/_clone/openrxv-items-final
$ curl -s -X POST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d'{"actions" : [{"add" : { "index" : "openrxv-items-final", "alias" : "openrxv-items"}}]}'
</code></pre></div><ul>
<li>Now I see both <code>openrxv-items-final</code> and <code>openrxv-items</code> have the current number of items:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s 'http://localhost:9200/openrxv-items/_count?q=*&amp;pretty'
{
  "count" : 103373,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*&amp;pretty'
{
  "count" : 103373,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
</code></pre></div><ul>
<li>Then I started a fresh harvest in the AReS Explorer admin dashboard</li>
</ul>
<h2 id="2021-04-12">2021-04-12</h2>
<ul>
<li>The harvesting on AReS finished last night, but the indexes got messed up again
<ul>
<li>I will have to fix them manually next time…</li>
</ul>
</li>
</ul>
<h2 id="2021-04-13">2021-04-13</h2>
<ul>
<li>Looking into the logs on 2021-04-06 on CGSpace and DSpace Test to see if there is anything specific that stands out about the activity on those days that would cause the PostgreSQL issues
<ul>
<li>Digging into the Munin graphs for the last week I found a few other things happening on that morning:</li>
</ul>
</li>
</ul>
<p><img src="/cgspace-notes/2021/04/sda-week.png" alt="/dev/sda disk latency week">
<img src="/cgspace-notes/2021/04/classes_unloaded-week.png" alt="JVM classes unloaded week">
<img src="/cgspace-notes/2021/04/nginx_status-week.png" alt="Nginx status week"></p>
<ul>
<li>13,000 requests in the last two months from a user with user agent <code>SomeRandomText</code>, for example:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">84.33.2.97 - - [06/Apr/2021:06:25:13 +0200] "GET /bitstream/handle/10568/77776/CROP%20SCIENCE.jpg.jpg HTTP/1.1" 404 10890 "-" "SomeRandomText"
</code></pre></div><ul>
<li>I purged them:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/check-spider-hits.sh -f /tmp/agents.txt -p
Purging 13159 hits from SomeRandomText in statistics

Total number of bot hits purged: 13159
</code></pre></div><ul>
<li>I noticed there were 78 items submitted in the hour before CGSpace crashed:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># grep -a -E '2021-04-06 0(6|7):' /home/cgspace.cgiar.org/log/dspace.log.2021-04-06 | grep -c -a add_item
78
</code></pre></div><ul>
<li>Of those 78, 77 of them were from Udana</li>
<li>Compared to other mornings (0 to 9 AM) this month that seems to be pretty high:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># for num in {01..13}; do grep -a -E "2021-04-$num 0" /home/cgspace.cgiar.org/log/dspace.log.2021-04-$num | grep -c -a add_item; done
32
0
0
2
8
108
4
0
29
0
1
1
2
</code></pre></div><h2 id="2021-04-15">2021-04-15</h2>
<ul>
<li>Release v1.4.2 of the DSpace Statistics API on GitHub: <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.4.2">https://github.com/ilri/dspace-statistics-api/releases/tag/v1.4.2</a>
<ul>
<li>This has been running on DSpace Test for the last week or so, and mostly contains the Falcon 3.0.0 changes</li>
</ul>
</li>
<li>Re-sync DSpace Test with data from CGSpace
<ul>
<li>Run system updates on DSpace Test (linode26) and reboot the server</li>
</ul>
</li>
<li>Update the PostgreSQL JDBC driver on DSpace Test (linode26) to 42.2.19
<ul>
<li>It has been a few months since we updated this, and there have been a few releases since 42.2.14, which we are currently using</li>
</ul>
</li>
<li>Create a test account for Rafael from Bioversity-CIAT to submit some items to DSpace Test:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ dspace user -a -m tip-submit@cgiar.org -g CIAT -s Submit -p 'fuuuuuuuu'
</code></pre></div><ul>
<li>I added the account to the Alliance Admins group, which should allow him to submit to any Alliance collection
<ul>
<li>According to my notes from <a href="/cgspace-notes/2020-10/">2020-10</a> the account must be in the admin group in order to submit via the REST API</li>
</ul>
</li>
</ul>
<h2 id="2021-04-18">2021-04-18</h2>
<ul>
<li>Update all containers on AReS (linode20):</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | xargs -L1 docker pull
</code></pre></div><ul>
<li>Then run all system updates and reboot the server</li>
<li>I learned a new command for Elasticsearch:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl http://localhost:9200/_cat/indices
yellow open openrxv-values ChyhGwMDQpevJtlNWO1vcw 1 1 1579 0 537.6kb 537.6kb
yellow open openrxv-items-temp PhV5ieuxQsyftByvCxzSIw 1 1 103585 104372 482.7mb 482.7mb
yellow open openrxv-shared J_8cxIz6QL6XTRZct7UBBQ 1 1 127 0 115.7kb 115.7kb
yellow open openrxv-values-00001 jAoXTLR0R9mzivlDVbQaqA 1 1 3903 0 696.2kb 696.2kb
green open .kibana_task_manager_1 O1zgJ0YlQhKCFAwJZaNSIA 1 0 2 2 20.6kb 20.6kb
yellow open openrxv-users 1hWGXh9kS_S6YPxAaBN8ew 1 1 5 0 28.6kb 28.6kb
green open .apm-agent-configuration f3RAkSEBRGaxJZs3ePVxsA 1 0 0 0 283b 283b
yellow open openrxv-items-final sgk-s8O-RZKdcLRoWt3G8A 1 1 970 0 2.3mb 2.3mb
green open .kibana_1 HHPN7RD_T7qe0zDj4rauQw 1 0 25 7 36.8kb 36.8kb
yellow open users M0t2LaZhSm2NrF5xb64dnw 1 1 2 0 11.6kb 11.6kb
</code></pre></div><ul>
2021-04-18 09:07:54 +02:00
< li > Somehow the < code > openrxv-items-final< / code > index only has a few items and the majority are in < code > openrxv-items-temp< / code > , via the < code > openrxv-items< / code > alias (which is in the temp index):< / li >
< / ul >
2021-11-09 05:29:52 +01:00
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ curl -s < span style = "color:#e6db74" > ' http://localhost:9200/openrxv-items/_count?q=*& pretty' < / span >
2021-04-18 09:07:54 +02:00
{
  "count" : 103585,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
}
}
< / code > < / pre > < / div > < ul >
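< li > To double-check where the alias lives, the < code > _alias< / code > endpoint can be queried directly (the same endpoint I check again later this month):< / li >
< / ul >
< pre tabindex = "0" > < code > $ curl -s 'http://localhost:9200/_alias/openrxv-items?pretty'
< / code > < / pre > < ul >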
< li > I found a cool tool to help with exporting and restoring Elasticsearch indexes:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ elasticdump --input< span style = "color:#f92672" > =< / span > http://localhost:9200/openrxv-items --output< span style = "color:#f92672" > =< / span > /home/aorth/openrxv-items_mapping.json --type< span style = "color:#f92672" > =< / span > mapping
$ elasticdump --input< span style = "color:#f92672" > =< / span > http://localhost:9200/openrxv-items --output< span style = "color:#f92672" > =< / span > /home/aorth/openrxv-items_data.json --limit< span style = "color:#f92672" > =< / span > < span style = "color:#ae81ff" > 1000< / span > --type< span style = "color:#f92672" > =< / span > data
...
Sun, 18 Apr 2021 06:27:07 GMT | Total Writes: 103585
Sun, 18 Apr 2021 06:27:07 GMT | dump complete
< / code > < / pre > < / div > < ul >
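< li > (elasticdump is an npm package, so assuming Node.js is available it can be installed with < code > npm install -g elasticdump< / code > )< / li >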
< li > It took only two or three minutes to export everything…< / li >
< li > I did a test to restore the index:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ elasticdump --input< span style = "color:#f92672" > =< / span > /home/aorth/openrxv-items_mapping.json --output< span style = "color:#f92672" > =< / span > http://localhost:9200/openrxv-items-test --type< span style = "color:#f92672" > =< / span > mapping
$ elasticdump --input< span style = "color:#f92672" > =< / span > /home/aorth/openrxv-items_data.json --output< span style = "color:#f92672" > =< / span > http://localhost:9200/openrxv-items-test --limit < span style = "color:#ae81ff" > 1000< / span > --type< span style = "color:#f92672" > =< / span > data
< / code > < / pre > < / div > < ul >
< li > So that’s pretty cool!< / li >
< li > I deleted the < code > openrxv-items-final< / code > and < code > openrxv-items-temp< / code > indexes, then restored the mappings to < code > openrxv-items-final< / code > , added the < code > openrxv-items< / code > alias, and started restoring the data to < code > openrxv-items< / code > with elasticdump:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ curl -XDELETE < span style = "color:#e6db74" > 'http://localhost:9200/openrxv-items-final'< / span >
$ elasticdump --input< span style = "color:#f92672" > =< / span > /home/aorth/openrxv-items_mapping.json --output< span style = "color:#f92672" > =< / span > http://localhost:9200/openrxv-items-final --type< span style = "color:#f92672" > =< / span > mapping
$ curl -s -X POST < span style = "color:#e6db74" > 'http://localhost:9200/_aliases'< / span > -H < span style = "color:#e6db74" > 'Content-Type: application/json'< / span > -d< span style = "color:#e6db74" > '{"actions": [{"add": {"index": "openrxv-items-final", "alias": "openrxv-items"}}]}'< / span >
$ elasticdump --input< span style = "color:#f92672" > =< / span > /home/aorth/openrxv-items_data.json --output< span style = "color:#f92672" > =< / span > http://localhost:9200/openrxv-items --limit < span style = "color:#ae81ff" > 1000< / span > --type< span style = "color:#f92672" > =< / span > data
< / code > < / pre > < / div > < ul >
< li > AReS seems to be working fine after that, so I created the < code > openrxv-items-temp< / code > index and then started a fresh harvest on AReS Explorer:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ curl -X PUT < span style = "color:#e6db74" > "localhost:9200/openrxv-items-temp"< / span >
< / code > < / pre > < / div > < ul >
< li > Run system updates on CGSpace (linode18) and run the latest Ansible infrastructure playbook to update the DSpace Statistics API, PostgreSQL JDBC driver, etc, and then reboot the system (a sketch of such a playbook run is below)< / li >
< li > I wasted a bit of time trying to get TSLint and then ESLint running for OpenRXV on GitHub Actions< / li >
< / ul >
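< ul >
< li > For reference, a hedged sketch of what such an infrastructure playbook run might look like (the playbook name, host limit, and tags here are assumptions for illustration, not the exact invocation):< / li >
< / ul >
< pre tabindex = "0" > < code > $ ansible-playbook plays/dspace.yml -l linode18 -t dspace-statistics-api,postgresql
< / code > < / pre >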
< h2 id = "2021-04-19" > 2021-04-19< / h2 >
< ul >
< li > The AReS harvesting last night seems to have completed successfully, but the number of results is strange:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp kNUlupUyS_i7vlBGiuVxwg 1 1 103741 105553 483.6mb 483.6mb
yellow open openrxv-items-final HFc3uytTRq2GPpn13vkbmg 1 1 970 0 2.3mb 2.3mb
< / code > < / pre > < / div > < ul >
< li > The indices endpoint doesn’t include the < code > openrxv-items< / code > alias, but it currently points to the < code > openrxv-items-temp< / code > index, so the number of items is the same:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ curl -s < span style = "color:#e6db74" > 'http://localhost:9200/openrxv-items/_count?q=*&pretty'< / span >
{
  "count" : 103741,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
}
}
< / code > < / pre > < / div > < ul >
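< li > The < code > _cat/aliases< / code > endpoint shows the alias-to-index mapping directly, which makes this easier to see at a glance:< / li >
< / ul >
< pre tabindex = "0" > < code > $ curl -s http://localhost:9200/_cat/aliases
< / code > < / pre > < ul >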
< li > A user was having problems resetting their password on CGSpace, with an error message about SMTP
< ul >
< li > I checked and we are indeed locked out of our mailbox:< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ dspace test-email
...
Error sending email:
- Error: javax.mail.SendFailedException: Send failure (javax.mail.AuthenticationFailedException: 550 5.2.1 Mailbox cannot be accessed [PR0P264CA0280.FRAP264.PROD.OUTLOOK.COM]
)
< / code > < / pre > < / div > < ul >
< li > I have to write to ICT…< / li >
< li > I decided to switch back to the G1GC garbage collector on DSpace Test (a sketch of the relevant flags follows this list)
< ul >
< li > Reading Shawn Heisey’s discussion again: < a href = "https://cwiki.apache.org/confluence/display/SOLR/ShawnHeisey" > https://cwiki.apache.org/confluence/display/SOLR/ShawnHeisey< / a > < / li >
< li > I am curious to check the JVM stats in a few days to see if there is a marked change< / li >
< / ul >
< / li >
< li > Work on minor changes to get DSpace working on Ubuntu 20.04 for our < a href = "https://github.com/ilri/rmg-ansible-public" > Ansible infrastructure scripts< / a > < / li >
< / ul >
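< ul >
< li > For reference, a minimal sketch of the kind of JVM flags involved in switching to G1GC (the heap sizes here are illustrative; the real values live in our Ansible templates):< / li >
< / ul >
< pre tabindex = "0" > < code > # illustrative only: enable G1GC with a fixed heap
JAVA_OPTS="-Xms2048m -Xmx2048m -XX:+UseG1GC"
< / code > < / pre >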
< h2 id = "2021-04-21" > 2021-04-21< / h2 >
< ul >
< li > Send Abdullah feedback on the < a href = "https://github.com/ilri/OpenRXV/pull/91" > filter on click pull request< / a > for OpenRXV
< ul >
< li > I see it adds a new “allow filter on click” checkbox in the layout settings, but it doesn’t modify the filters< / li >
< li > Also, it seems to have broken the existing clicking of the countries on the map< / li >
< / ul >
< / li >
< li > Atmire recently sent feedback about the CUA duplicates processor
< ul >
< li > Last month when I ran it, it apparently got stuck on the storage reports, so I will try again (with a fresh Solr statistics core from production) and skip the storage reports (< code > -g< / code > ):< / li >
< / ul >
< / li >
< / ul >
< pre tabindex = "0" > < code > $ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
$ cp atmire-cua-update.xml-20210124-132112.old /home/dspacetest.cgiar.org/config/spring/api/atmire-cua-update.xml
$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -r 100 -c statistics -t 12 -g
< / code > < / pre > < ul >
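< li > (< code > chrt -b 0< / code > runs the command under the SCHED_BATCH scheduling policy so it is friendlier to other tasks on the server)< / li >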
< li > The first run processed 1,439 docs, the second run processed 0 docs
< ul >
< li > I’m not sure if that means it worked, so I sent feedback to Atmire< / li >
< / ul >
< / li >
< li > Meeting with Moayad to discuss OpenRXV development progress< / li >
< / ul >
< h2 id = "2021-04-25" > 2021-04-25< / h2 >
< ul >
< li > The indexes on AReS are messed up again
< ul >
< li > I made a backup of the indexes, then deleted the < code > openrxv-items-final< / code > and < code > openrxv-items-temp< / code > indexes, re-created the < code > openrxv-items< / code > alias, and restored the data into < code > openrxv-items< / code > :< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ elasticdump --input< span style = "color:#f92672" > =< / span > http://localhost:9200/openrxv-items --output< span style = "color:#f92672" > =< / span > /home/aorth/openrxv-items_mapping.json --type< span style = "color:#f92672" > =< / span > mapping
$ elasticdump --input< span style = "color:#f92672" > =< / span > http://localhost:9200/openrxv-items --output< span style = "color:#f92672" > =< / span > /home/aorth/openrxv-items_data.json --limit< span style = "color:#f92672" > =< / span > < span style = "color:#ae81ff" > 1000< / span > --type< span style = "color:#f92672" > =< / span > data
$ curl -XDELETE < span style = "color:#e6db74" > 'http://localhost:9200/openrxv-items-temp'< / span >
$ curl -XDELETE < span style = "color:#e6db74" > 'http://localhost:9200/openrxv-items-final'< / span >
$ elasticdump --input< span style = "color:#f92672" > =< / span > /home/aorth/openrxv-items_mapping.json --output< span style = "color:#f92672" > =< / span > http://localhost:9200/openrxv-items-final --type< span style = "color:#f92672" > =< / span > mapping
$ curl -s -X POST < span style = "color:#e6db74" > 'http://localhost:9200/_aliases'< / span > -H < span style = "color:#e6db74" > 'Content-Type: application/json'< / span > -d< span style = "color:#e6db74" > '{"actions": [{"add": {"index": "openrxv-items-final", "alias": "openrxv-items"}}]}'< / span >
$ elasticdump --input< span style = "color:#f92672" > =< / span > /home/aorth/openrxv-items_data.json --output< span style = "color:#f92672" > =< / span > http://localhost:9200/openrxv-items --limit < span style = "color:#ae81ff" > 1000< / span > --type< span style = "color:#f92672" > =< / span > data
< / code > < / pre > < / div > < ul >
< li > Then I started a fresh AReS harvest< / li >
< / ul >
< h2 id = "2021-04-26" > 2021-04-26< / h2 >
< ul >
< li > The AReS harvest last night seems to have finished successfully and the number of items looks good:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ curl -s http://localhost:9200/_cat/indices | grep openrxv-items
yellow open openrxv-items-temp H-CGsyyLTaqAj6-nKXZ-7w 1 1 0 0 283b 283b
yellow open openrxv-items-final ul3SKsa7Q9Cd_K7qokBY_w 1 1 103951 0 254mb 254mb
< / code > < / pre > < / div > < ul >
< li > And the aliases seem correct for once:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ curl -s < span style = "color:#e6db74" > 'http://localhost:9200/_alias/'< / span > | python -m json.tool
...
    "openrxv-items-final": {
        "aliases": {
            "openrxv-items": {}
}
},
    "openrxv-items-temp": {
        "aliases": {}
},
...
< / code > < / pre > < / div > < ul >
< li > That’s 250 new items in the index since the last harvest!< / li >
< li > Re-create my local Artifactory container because I’m getting errors starting it and it has been a few months since it was updated:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ podman rm artifactory
$ podman pull docker.bintray.io/jfrog/artifactory-oss:latest
$ podman create --ulimit nofile< span style = "color:#f92672" > =< / span > 32000:32000 --name artifactory -v artifactory_data:/var/opt/jfrog/artifactory -p 8081-8082:8081-8082 docker.bintray.io/jfrog/artifactory-oss
$ podman start artifactory
< / code > < / pre > < / div > < ul >
< li > Start testing DSpace 7.0 Beta 5 so I can evaluate if it solves some of the problems we are having on DSpace 6, and if it’s missing things like multiple handle resolvers, etc
< ul >
< li > I see it needs Java JDK 11, Tomcat 9, Solr 8, and PostgreSQL 11< / li >
< li > Also, according to the < a href = "https://wiki.lyrasis.org/display/DSDOC7x/Installing+DSpace" > installation notes< / a > I see you can install the old DSpace 6 REST API, so that’ s potentially useful for us< / li >
< li > I see that all web applications on the backend are now rolled into just one “server” application< / li >
< li > The build process took 11 minutes the first time (due to downloading the world with Maven) and ~2 minutes the second time< / li >
< li > The < code > local.cfg< / code > content and syntax are very similar to DSpace 6’s< / li >
< / ul >
< / li >
< li > I got the basic < code > fresh_install< / code > up and running
< ul >
< li > Then I tried to import a DSpace 6 database from production< / li >
< / ul >
< / li >
< li > I tried to delete all the Atmire SQL migrations:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > localhost/dspace7b5=> DELETE FROM schema_version WHERE description LIKE '%Atmire%' OR description LIKE '%CUA%' OR description LIKE '%cua%';
< / code > < / pre > < / div > < ul >
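< li > (A safer way to preview what that matches first, using the same Flyway table and filters, is something like:)< / li >
< / ul >
< pre tabindex = "0" > < code > localhost/dspace7b5=> SELECT version, description FROM schema_version WHERE description LIKE '%Atmire%' OR description LIKE '%CUA%' OR description LIKE '%cua%';
< / code > < / pre > < ul >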
< li > But I got an error when running < code > dspace database migrate< / code > :< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ ~/dspace7b5/bin/dspace database migrate
Database URL: jdbc:postgresql://localhost:5432/dspace7b5
Migrating database to latest version... (Check dspace logs for details)
Migration exception:
java.sql.SQLException: Flyway migration error occurred
at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:738)
at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:632)
at org.dspace.storage.rdbms.DatabaseUtils.main(DatabaseUtils.java:228)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:273)
at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:129)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:94)
Caused by: org.flywaydb.core.api.FlywayException: Validate failed:
Detected applied migration not resolved locally: 5.0.2017.09.25
Detected applied migration not resolved locally: 6.0.2017.01.30
Detected applied migration not resolved locally: 6.0.2017.09.25
at org.flywaydb.core.Flyway.doValidate(Flyway.java:292)
at org.flywaydb.core.Flyway.access$100(Flyway.java:73)
at org.flywaydb.core.Flyway$1.execute(Flyway.java:166)
at org.flywaydb.core.Flyway$1.execute(Flyway.java:158)
at org.flywaydb.core.Flyway.execute(Flyway.java:527)
at org.flywaydb.core.Flyway.migrate(Flyway.java:158)
at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:729)
... 9 more
< / code > < / pre > < / div > < ul >
< li > I deleted those migrations:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > localhost/dspace7b5=> DELETE FROM schema_version WHERE version IN ('5.0.2017.09.25', '6.0.2017.01.30', '6.0.2017.09.25');
< / code > < / pre > < / div > < ul >
< li > Then when I ran the migration again it failed for a new reason, related to the configurable workflow:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > Database URL: jdbc:postgresql://localhost:5432/dspace7b5
Migrating database to latest version... (Check dspace logs for details)
Migration exception:
java.sql.SQLException: Flyway migration error occurred
at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:738)
at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:632)
at org.dspace.storage.rdbms.DatabaseUtils.main(DatabaseUtils.java:228)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:273)
at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:129)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:94)
Caused by: org.flywaydb.core.internal.command.DbMigrate$FlywayMigrateException:
Migration V7.0_2019.05.02__DS-4239-workflow-xml-migration.sql failed
--------------------------------------------------------------------
SQL State : 42P01
Error Code : 0
Message : ERROR: relation "cwf_pooltask" does not exist
Position: 8
Location : org/dspace/storage/rdbms/sqlmigration/postgres/V7.0_2019.05.02__DS-4239-workflow-xml-migration.sql (/home/aorth/src/apache-tomcat-9.0.45/file:/home/aorth/dspace7b5/lib/dspace-api-7.0-beta5.jar!/org/dspace/storage/rdbms/sqlmigration/postgres/V7.0_2019.05.02__DS-4239-workflow-xml-migration.sql)
Line : 16
Statement : UPDATE cwf_pooltask SET workflow_id='defaultWorkflow' WHERE workflow_id='default'
...
< / code > < / pre > < / div > < ul >
< li > The < a href = "https://wiki.lyrasis.org/display/DSDOC7x/Upgrading+DSpace" > DSpace 7 upgrade docs< / a > say I need to apply these previously optional migrations:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ ~/dspace7b5/bin/dspace database migrate ignored
< / code > < / pre > < / div > < ul >
< li > Now I see all migrations have completed and DSpace actually starts up fine!< / li >
< li > I will try to do a full re-index to see how long it takes:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ time ~/dspace7b5/bin/dspace index-discovery -b
...
~/dspace7b5/bin/dspace index-discovery -b 25156.71s user 64.22s system 97% cpu 7:11:09.94 total
< / code > < / pre > < / div > < ul >
< li > Not good, that shit took almost seven hours!< / li >
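< li > (The < code > -b< / code > flag wipes and rebuilds the entire Discovery index from scratch, which partly explains the long runtime)< / li >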
< / ul >
< h2 id = "2021-04-27" > 2021-04-27< / h2 >
< ul >
< li > Peter sent me a list of 500+ DOIs from CGSpace with no Altmetric score
< ul >
< li > I used csvgrep (with Windows encoding!) to extract those without our handle and save the DOIs to a text file, then got their handles with my < code > doi-to-handle.py< / code > script:< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4" > < code class = "language-console" data-lang = "console" > $ csvgrep -e 'windows-1252' -c 'Handle.net IDs' -i -m '10568/' ~/Downloads/Altmetric\ -\ Research\ Outputs\ -\ CGSpace\ -\ 2021-04-26.csv | csvcut -c DOI | sed '1d' > /tmp/dois.txt
$ ./ilri/doi-to-handle.py -i /tmp/dois.txt -o /tmp/handles.csv -db dspace63 -u dspace -p 'fuuu' -d
< / code > < / pre > < / div > < ul >
< li > He will tweet them…< / li >
< / ul >
< h2 id = "2021-04-28" > 2021-04-28< / h2 >
< ul >
< li > Grant some IWMI colleagues access to the Atmire Content and Usage stats on CGSpace< / li >
< / ul >
< / article >
< / div > <!-- /.blog - main -->
< aside class = "col-sm-3 ml-auto blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/2022-03/" > March, 2022< / a > < / li >
< li > < a href = "/cgspace-notes/2022-02/" > February, 2022< / a > < / li >
< li > < a href = "/cgspace-notes/2022-01/" > January, 2022< / a > < / li >
< li > < a href = "/cgspace-notes/2021-12/" > December, 2021< / a > < / li >
< li > < a href = "/cgspace-notes/2021-11/" > November, 2021< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p dir = "auto" >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >