<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="February, 2022" />
<meta property="og:description" content="2022-02-01
Meeting with Peter and Abenet about CGSpace in the One CGIAR
We agreed to buy $5,000 worth of credits from Atmire for future upgrades
We agreed to move CRPs and non-CGIAR communities off the home page, as well as some other things for the CGIAR System Organization
We agreed to make a Discovery facet for CGIAR Action Areas above the existing CGIAR Impact Areas one
We agreed to try to do more alignment of affiliations/funders with ROR
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-02/" />
<meta property="article:published_time" content="2022-02-01T14:06:54+02:00" />
<meta property="article:modified_time" content="2022-02-24T19:15:45+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="February, 2022"/>
<meta name="twitter:description" content="2022-02-01
Meeting with Peter and Abenet about CGSpace in the One CGIAR
We agreed to buy $5,000 worth of credits from Atmire for future upgrades
We agreed to move CRPs and non-CGIAR communities off the home page, as well as some other things for the CGIAR System Organization
We agreed to make a Discovery facet for CGIAR Action Areas above the existing CGIAR Impact Areas one
We agreed to try to do more alignment of affiliations/funders with ROR
"/>
<meta name="generator" content="Hugo 0.92.2" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "February, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-02/",
"wordCount": "3019",
"datePublished": "2022-02-01T14:06:54+02:00",
"dateModified": "2022-02-24T19:15:45+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2022-02/">
<title>February, 2022 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel="stylesheet" integrity="sha256-vrgBLtwIuhC&#43;AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz&#43;lcnA=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2022-02/">February, 2022</a></h2>
<p class="blog-post-meta">
<time datetime="2022-02-01T14:06:54+02:00">Tue Feb 01, 2022</time>
in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2022-02-01">2022-02-01</h2>
<ul>
<li>Meeting with Peter and Abenet about CGSpace in the One CGIAR
<ul>
<li>We agreed to buy $5,000 worth of credits from Atmire for future upgrades</li>
<li>We agreed to move CRPs and non-CGIAR communities off the home page, as well as some other things for the CGIAR System Organization</li>
<li>We agreed to make a Discovery facet for CGIAR Action Areas above the existing CGIAR Impact Areas one</li>
<li>We agreed to try to do more alignment of affiliations/funders with ROR</li>
</ul>
</li>
</ul>
<ul>
<li>I moved a bunch of communities:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ dspace community-filiator --remove --parent<span style="color:#f92672">=</span>10568/114639 --child<span style="color:#f92672">=</span>10568/115089
$ dspace community-filiator --remove --parent<span style="color:#f92672">=</span>10568/114639 --child<span style="color:#f92672">=</span>10568/115087
$ dspace community-filiator --remove --parent<span style="color:#f92672">=</span>10568/83389 --child<span style="color:#f92672">=</span>10568/108598
$ dspace community-filiator --remove --parent<span style="color:#f92672">=</span>10568/83389 --child<span style="color:#f92672">=</span>10947/1
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/35697 --child<span style="color:#f92672">=</span>10568/80211
$ dspace community-filiator --remove --parent<span style="color:#f92672">=</span>10568/83389 --child<span style="color:#f92672">=</span>10947/2517
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/97114 --child<span style="color:#f92672">=</span>10947/2517
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/97114 --child<span style="color:#f92672">=</span>10568/89416
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/97114 --child<span style="color:#f92672">=</span>10568/3530
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/97114 --child<span style="color:#f92672">=</span>10568/80099
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/97114 --child<span style="color:#f92672">=</span>10568/80100
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/97114 --child<span style="color:#f92672">=</span>10568/34494
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/117867 --child<span style="color:#f92672">=</span>10568/114644
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/117867 --child<span style="color:#f92672">=</span>10568/16573
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/117867 --child<span style="color:#f92672">=</span>10568/42211
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/117865 --child<span style="color:#f92672">=</span>10568/109945
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/117865 --child<span style="color:#f92672">=</span>10568/16498
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/117865 --child<span style="color:#f92672">=</span>10568/99453
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/117865 --child<span style="color:#f92672">=</span>10568/2983
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/117865 --child<span style="color:#f92672">=</span>10568/133
$ dspace community-filiator --remove --parent<span style="color:#f92672">=</span>10568/83389 --child<span style="color:#f92672">=</span>10568/1208
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/117865 --child<span style="color:#f92672">=</span>10568/1208
$ dspace community-filiator --remove --parent<span style="color:#f92672">=</span>10568/83389 --child<span style="color:#f92672">=</span>10568/56924
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10568/117865 --child<span style="color:#f92672">=</span>10568/56924
$ dspace community-filiator --remove --parent<span style="color:#f92672">=</span>10568/83389 --child<span style="color:#f92672">=</span>10568/91688
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10947/1 --child<span style="color:#f92672">=</span>10568/91688
$ dspace community-filiator --remove --parent<span style="color:#f92672">=</span>10568/83389 --child<span style="color:#f92672">=</span>10947/2515
$ dspace community-filiator --set --parent<span style="color:#f92672">=</span>10947/1 --child<span style="color:#f92672">=</span>10947/2515
</code></pre></div><ul>
<li>Remove CPWF and CTA subjects from the Discovery facets</li>
<li>Start a full Discovery index on CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ time chrt -b <span style="color:#ae81ff">0</span> ionice -c2 -n7 nice -n19 dspace index-discovery -b
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>real 275m15.777s
user 182m52.171s
sys 2m51.573s
</code></pre></div><ul>
<li>I got a request to confirm validation of CGSpace on openarchives.org, with the requestor&rsquo;s IP being 128.84.116.66
<ul>
<li>That is at Cornell&hellip; hmmmm who could that be?!</li>
<li>Oh, the OpenArchives initiative is at Cornell&hellip; maybe this is an automated periodic check?</li>
</ul>
</li>
</ul>
<h2 id="2022-02-02">2022-02-02</h2>
<ul>
<li>Looking at the top user agents and IP addresses in CGSpace&rsquo;s Solr statistics for 2022-01
<ul>
<li>64.39.98.40 made 26,000 requests, owned by Qualys so it&rsquo;s some kind of security scanning</li>
<li>45.134.26.171 made 8,000 requests; it&rsquo;s owned by some Russian company and makes requests like this, hmmmmm:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">45.134.26.171 - - [12/Jan/2022:06:25:27 +0100] &#34;GET /bitstream/handle/10568/81964/varietal-2faea58f.pdf?sequence=1 HTTP/1.1&#34; 200 1157807 &#34;https://cgspace.cgiar.org:443/bitstream/handle/10568/81964/varietal-2faea58f.pdf&#34; &#34;Opera/9.64 (Windows NT 6.1; U; MRA 5.5 (build 02842); ru) Presto/2.1.1)) AND 4734=CTXSYS.DRITHSX.SN(4734,(CHR(113)||CHR(120)||CHR(120)||CHR(112)||CHR(113)||(SELECT (CASE WHEN (4734=4734) THEN 1 ELSE 0 END) FROM DUAL)||CHR(113)||CHR(120)||CHR(113)||CHR(122)||CHR(113))) AND ((3917=3917&#34;
</code></pre></div><ul>
<li>3.225.28.105 made 3,000 requests mostly for one CIAT collection on the REST API and it is owned by Amazon
<ul>
<li>The user agent is sometimes a normal user one, and sometimes <code>Apache-HttpClient/4.3.4 (java 1.5)</code></li>
</ul>
</li>
<li>217.182.21.193 made 2,400 requests and is on OVH</li>
<li>I purged these hits</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt -p
Purging 26817 hits from 64.39.98.40 in statistics
Purging 9446 hits from 45.134.26.171 in statistics
Purging 6490 hits from 3.225.28.105 in statistics
Purging 11949 hits from 217.182.21.193 in statistics
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 54702
</code></pre></div><ul>
<li>Export donors and affiliations from CGSpace database:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">localhost/dspace63= ☘ \COPY (SELECT DISTINCT text_value as &#34;cg.contributor.donor&#34;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 248 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-02-02-donors.csv WITH CSV HEADER;
COPY 1036
localhost/dspace63= ☘ \COPY (SELECT DISTINCT text_value as &#34;cg.contributor.affiliation&#34;, count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-02-02-affiliations.csv WITH CSV HEADER;
COPY 7901
</code></pre></div><ul>
<li>Then check matches against the latest ROR dump:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c cg.contributor.donor /tmp/2022-02-02-donors.csv | sed <span style="color:#e6db74">&#39;1d&#39;</span> &gt; /tmp/2022-02-02-donors.txt
$ ./ilri/ror-lookup.py -i /tmp/2022-02-02-donors.txt -r 2021-09-23-ror-data.json -o /tmp/donor-ror-matches.csv
...
</code></pre></div><ul>
<li>I see we have 258/1036 (24.9%) of our donors matching ROR (as of the 2021-09-23 ROR dump)</li>
<li>I see we have 1986/7901 (25.1%) of our affiliations matching ROR (as of the 2021-09-23 ROR dump)</li>
<li>Update the PostgreSQL JDBC driver to 42.3.2 in the Ansible Infrastructure playbooks and deploy on DSpace Test</li>
<li>Mishell from CIP sent me a copy of a security scan their ICT had done on CGSpace using QualysGuard
<ul>
<li>The report was very long and generic, highlighting low-severity things like being able to post crap to search forms and have it appear on the results page</li>
<li>Also they say we&rsquo;re using old jQuery and bootstrap, etc (fair enough) but there are no exploits per se</li>
<li>At least now I know why all those Qualys IPs are scanning us all the time!!!</li>
</ul>
</li>
<li>Mishell also said she&rsquo;s having issues logging into CGSpace
<ul>
<li>According to the logs her account is failing on LDAP authentication</li>
<li>I checked CGSpace&rsquo;s LDAP credentials using ldapsearch and was able to connect so it&rsquo;s gotta be something with her account</li>
</ul>
</li>
</ul>
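<ul>
<li>For reference, a minimal Python sketch of how match rates like the donor and affiliation figures above could be computed
<ul>
<li>This assumes a simple case-insensitive exact match on names, which may not be what <code>ror-lookup.py</code> actually does</li>
<li>The <code>match_rate</code> function and the sample names are hypothetical, for illustration only:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-python" data-lang="python">def match_rate(names, ror_names):
    """Return (matched, total, percent) using case-insensitive exact matching."""
    # Assumption: exact, case-insensitive comparison; ror-lookup.py may differ
    known = {name.strip().lower() for name in ror_names}
    matched = sum(1 for name in names if name.strip().lower() in known)
    total = len(names)
    percent = round(100 * matched / total, 1) if total else 0.0
    return matched, total, percent


# Hypothetical sample data, not taken from CGSpace or the ROR dump
donors = ["Example Foundation", "Sample Agency", "Unknown Donor", "Another Org"]
ror = ["example foundation", "Sample Agency"]
print(match_rate(donors, ror))
</code></pre></div>
<ul>
<li>The same rounding applied to the counts above gives 24.9% for 258/1036 and 25.1% for 1986/7901</li>
</ul>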
<h2 id="2022-02-03">2022-02-03</h2>
<ul>
<li>I synchronized DSpace Test with a fresh snapshot of CGSpace</li>
<li>I noticed a bunch of thumbnails missing for items submitted in the last week on CGSpace so I ran the <code>dspace filter-media</code> script manually and eventually it crashed:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ JAVA_OPTS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-Xmx1024m -Dfile.encoding=UTF-8&#34;</span> dspace filter-media
...
SKIPPED: bitstream 48612de7-eec5-4990-8f1b-589a87219a39 (item: 10568/67391) because &#39;ilri_establishiment.pdf.txt&#39; already exists
Generated Thumbnail ilri_establishiment.pdf matches pattern and is replacable.
SKIPPED: bitstream 48612de7-eec5-4990-8f1b-589a87219a39 (item: 10568/67391) because &#39;ilri_establishiment.pdf.jpg&#39; already exists
File: Agreement_on_the_Estab_of_ILRI.doc.txt
Exception: org.apache.poi.util.LittleEndian.getUnsignedByte([BI)I
java.lang.NoSuchMethodError: org.apache.poi.util.LittleEndian.getUnsignedByte([BI)I
at org.textmining.extraction.word.model.FormattedDiskPage.&lt;init&gt;(FormattedDiskPage.java:66)
at org.textmining.extraction.word.model.CHPFormattedDiskPage.&lt;init&gt;(CHPFormattedDiskPage.java:62)
at org.textmining.extraction.word.model.CHPBinTable.&lt;init&gt;(CHPBinTable.java:70)
at org.textmining.extraction.word.Word97TextExtractor.getText(Word97TextExtractor.java:122)
at org.textmining.extraction.word.Word97TextExtractor.getText(Word97TextExtractor.java:63)
at org.dspace.app.mediafilter.WordFilter.getDestinationStream(WordFilter.java:83)
at com.atmire.dspace.app.mediafilter.AtmireMediaFilter.processBitstream(AtmireMediaFilter.java:103)
at com.atmire.dspace.app.mediafilter.AtmireMediaFilterServiceImpl.filterBitstream(AtmireMediaFilterServiceImpl.java:61)
at org.dspace.app.mediafilter.MediaFilterServiceImpl.filterItem(MediaFilterServiceImpl.java:181)
at org.dspace.app.mediafilter.MediaFilterServiceImpl.applyFiltersItem(MediaFilterServiceImpl.java:159)
at org.dspace.app.mediafilter.MediaFilterServiceImpl.applyFiltersAllItems(MediaFilterServiceImpl.java:111)
at org.dspace.app.mediafilter.MediaFilterCLITool.main(MediaFilterCLITool.java:212)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
</code></pre></div><ul>
<li>I should look up that issue and report a bug somewhere perhaps, but for now I just forced the JPG thumbnails with:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ JAVA_OPTS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-Xmx1024m -Dfile.encoding=UTF-8&#34;</span> dspace filter-media -p <span style="color:#e6db74">&#34;ImageMagick PDF Thumbnail&#34;</span> -v &gt;&amp; /tmp/filter-media.log
</code></pre></div><h2 id="2022-02-04">2022-02-04</h2>
<ul>
<li>I found a thread on the dspace-tech mailing list about the <code>filter-media</code> crash above
<ul>
<li>The problem is that the default filter for Word files is outdated, so we need to switch to the PoiWordFilter extractor</li>
<li>After changing that I was able to filter the Word file on that item above:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ JAVA_OPTS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-Xmx1024m -Dfile.encoding=UTF-8&#34;</span> dspace filter-media -i 10568/67391 -p <span style="color:#e6db74">&#34;Word Text Extractor&#34;</span> -v
The following MediaFilters are enabled:
Full Filter Name: org.dspace.app.mediafilter.PoiWordFilter
org.dspace.app.mediafilter.PoiWordFilter
File: Agreement_on_the_Estab_of_ILRI.doc.txt
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>FILTERED: bitstream 31db7d05-5369-4309-adeb-3b888c80b73d (item: 10568/67391) and created &#39;Agreement_on_the_Estab_of_ILRI.doc.txt&#39;
</code></pre></div><ul>
<li>Meeting with the repositories working group to discuss issues moving forward in the One CGIAR</li>
</ul>
<h2 id="2022-02-07">2022-02-07</h2>
<ul>
<li>Gaia sent me her feedback on the duplicates for the TAC and ICW items for CGSpace a few days ago
<ul>
<li>I used the IDs marked &ldquo;delete&rdquo; in her spreadsheet to create a custom text facet with this GREL in OpenRefine:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">or(
isNotNull(value.match(&#39;1&#39;)),
isNotNull(value.match(&#39;4&#39;)),
isNotNull(value.match(&#39;5&#39;)),
isNotNull(value.match(&#39;6&#39;)),
isNotNull(value.match(&#39;8&#39;)),
...
isNotNull(value.match(&#39;178&#39;)),
isNotNull(value.match(&#39;186&#39;)),
isNotNull(value.match(&#39;188&#39;)),
isNotNull(value.match(&#39;189&#39;)),
isNotNull(value.match(&#39;197&#39;))
)
</code></pre></div><ul>
<li>Then I flagged all of these (seventy-five items)&hellip;
<ul>
<li>I decided to flag the deletes instead of starring the keeps because there are some items in the original file that were not marked as duplicates, so we have to keep those too</li>
</ul>
</li>
<li>I generated the next batch of 200 items, from IDs 201 to 400, checked them for duplicates, and then added the PDF file names to the CSV for reference:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c id,dc.title,dcterms.issued,dcterms.type ~/Downloads/2022-01-21-CGSpace-TAC-ICW-batch201-400.csv &gt; /tmp/tac.csv
$ ./ilri/check-duplicates.py -i /tmp/tac.csv -db dspace63 -u dspacetest -p <span style="color:#e6db74">&#39;dom@in34sniper&#39;</span> -o /tmp/2022-02-07-tac-batch2-201-400.csv
$ csvcut -c id,filename ~/Downloads/2022-01-21-CGSpace-TAC-ICW-batch201-400.csv &gt; /tmp/batch2-filenames.csv
$ csvjoin -c id /tmp/2022-02-07-tac-batch2-201-400.csv /tmp/batch2-filenames.csv &gt; /tmp/2022-02-07-tac-batch2-201-400-filenames.csv
</code></pre></div><ul>
<li>Then I sent this second batch of items to Gaia to look at</li>
</ul>
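<ul>
<li>Typing out a long <code>or(...)</code> facet by hand is error prone, so here is a rough Python sketch that builds the same kind of GREL expression from a list of IDs
<ul>
<li>The <code>grel_or_facet</code> helper and the IDs passed to it are hypothetical; it only mirrors the shape of the facet I used above:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-python" data-lang="python">def grel_or_facet(ids):
    """Build a GREL or(...) expression matching any of the given IDs."""
    # Each value.match('N') clause mirrors the ones in the facet above
    clauses = ["isNotNull(value.match('{}'))".format(i) for i in ids]
    return "or(\n" + ",\n".join(clauses) + "\n)"


# Hypothetical subset of IDs, for illustration only
print(grel_or_facet([1, 4, 5]))
</code></pre></div>
<ul>
<li>The output can be pasted into a custom text facet in OpenRefine; generating it avoids dropped characters and unbalanced parentheses</li>
</ul>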
<h2 id="2022-02-08">2022-02-08</h2>
<ul>
<li>Create a SAF archive for the first 200 items (IDs 1 to 200) that were <em>not</em> flagged as duplicates and upload them to a <a href="https://dspacetest.cgiar.org/handle/10568/117921">new collection on DSpace Test</a>:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ JAVA_OPTS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-Xmx1024m -Dfile.encoding=UTF-8&#34;</span> dspace import --add --eperson<span style="color:#f92672">=</span>bngo@mfin.com --source /tmp/SimpleArchiveFormat --mapfile<span style="color:#f92672">=</span>./2022-02-08-tac-batch1-1to200.map
</code></pre></div><ul>
<li>Fix some occurrences of &ldquo;Hammond, Jim&rdquo; to be &ldquo;Hammond, James&rdquo; on CGSpace</li>
<li>Start a full index on AReS</li>
</ul>
<h2 id="2022-02-09">2022-02-09</h2>
<ul>
<li>UptimeRobot said that CGSpace was down yesterday evening, but when I looked it was up and I didn&rsquo;t see a high database load or anything wrong</li>
<li>Maria from Bioversity wrote to say that CGSpace was very slow also&hellip;</li>
</ul>
<h2 id="2022-02-10">2022-02-10</h2>
<ul>
<li>Looking at the Munin graphs on CGSpace I see several metrics showing that there was likely just increased load&hellip;</li>
</ul>
<p><img src="/cgspace-notes/2022/02/fw_packets-day-fs8.png" alt="Firewall packets day">
<img src="/cgspace-notes/2022/02/jmx_dspace_sessions-day-fs8.png" alt="DSpace sessions day">
<img src="/cgspace-notes/2022/02/jmx_tomcat_dbpools-day-fs8.png" alt="Tomcat pool day">
<img src="/cgspace-notes/2022/02/postgres_connections_db-day-fs8.png" alt="PostgreSQL connections day"></p>
<ul>
<li>I extracted the logs from nginx for yesterday so I could analyze the traffic:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep <span style="color:#e6db74">&#39;09/Feb/2022&#39;</span> &gt; /tmp/feb9-access.log
# zcat --force /var/log/nginx/rest.log.1 /var/log/nginx/rest.log.2.gz | grep <span style="color:#e6db74">&#39;09/Feb/2022&#39;</span> &gt; /tmp/feb9-rest.log
# awk <span style="color:#e6db74">&#39;{print $1}&#39;</span> /tmp/feb9-* | less | sort -u &gt; /tmp/feb9-ips.txt
# wc -l /tmp/feb9-ips.txt
11636 /tmp/feb9-ips.txt
</code></pre></div><ul>
<li>I started resolving them with my <code>resolve-addresses-geoip2.py</code> script</li>
<li>In the meantime I am looking at the requests and I see a new user agent: <code>1science Resolver 1.0.0</code>
<ul>
<li>Seems to be a defunct project from Elsevier (website down, Twitter account inactive since 2020)</li>
</ul>
</li>
<li>I also see 3,400 requests from <code>EyeMonIT_bot_version_0.1_(http://www.eyemon.it/)</code>, but because it has &ldquo;bot&rdquo; in the name it gets heavily throttled&hellip;
<ul>
<li>I wonder who is monitoring CGSpace with that service&hellip;</li>
</ul>
</li>
<li>Looking at the top twenty or so ASNs for the resolved IPs I see lots of bot traffic, but nothing malicious:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c asn /tmp/feb9-ips.csv | sort | uniq -c | sort -h | tail -n <span style="color:#ae81ff">20</span>
79 24940
89 36908
100 9299
107 2635
110 44546
111 16509
118 7552
120 4837
123 50245
123 55836
147 45899
173 33771
192 39832
202 32934
235 29465
260 15169
466 14618
607 24757
768 714
1214 8075
</code></pre></div><ul>
<li>The same information, but by org name:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c org /tmp/feb9-ips.csv | sort | uniq -c | sort -h | tail -n <span style="color:#ae81ff">20</span>
92 Orange
100 Hetzner Online GmbH
100 Philippine Long Distance Telephone Company
107 AUTOMATTIC
110 ALFA TELECOM s.r.o.
111 AMAZON-02
118 Viettel Group
120 CHINA UNICOM China169 Backbone
123 Reliance Jio Infocomm Limited
123 Serverel Inc.
147 VNPT Corp
173 SAFARICOM-LIMITED
192 Opera Software AS
202 FACEBOOK
235 MTN NIGERIA Communication limited
260 GOOGLE
466 AMAZON-AES
607 Ethiopian Telecommunication Corporation
768 APPLE-ENGINEERING
1214 MICROSOFT-CORP-MSN-AS-BLOCK
</code></pre></div><ul>
<li>Most of these are pretty normal except &ldquo;Serverel&rdquo; and Hetzner perhaps, but their user agents are pretending to be normal users so who knows&hellip;</li>
<li>I decided to look in the Solr stats with <code>facet.limit=1000&amp;facet.mincount=1</code> and found a few more definitely non-human agents:
<ul>
<li>scalaj-http/2.4.2</li>
<li>scpitspi-rs</li>
<li>lua-resty-http</li>
<li>AHC/2.1</li>
<li>acebookexternalhit &lt;---- typo, but purge it!!!</li>
<li>Iframely/1.3.1 (+https://iframely.com/docs/about) Atlassian</li>
<li>qbhttp/1.0.0</li>
<li>got (<a href="https://github.com/sindresorhus/got">https://github.com/sindresorhus/got</a>)</li>
<li>colly - <a href="https://github.com/gocolly/colly/v2">https://github.com/gocolly/colly/v2</a></li>
<li>article-parser/4.2.10</li>
<li>SomeRandomText</li>
<li>adreview/1.0</li>
</ul>
</li>
<li>I added them to the ILRI override in the DSpace spider list and ran the <code>check-spider-hits.sh</code> script:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/check-spider-hits.sh -f dspace/config/spiders/agents/ilri -p
Purging 234 hits from randint in statistics
Purging 337 hits from Koha in statistics
Purging 1164 hits from scalaj-http in statistics
Purging 1528 hits from scpitspi-rs in statistics
Purging 3050 hits from lua-resty-http in statistics
Purging 1683 hits from AHC in statistics
Purging 1129 hits from acebookexternalhit in statistics
Purging 534 hits from Iframely in statistics
Purging 1022 hits from qbhttp in statistics
Purging 330 hits from ^got in statistics
Purging 156 hits from ^colly in statistics
Purging 38 hits from article-parser in statistics
Purging 1148 hits from SomeRandomText in statistics
Purging 3126 hits from adreview in statistics
Purging 217 hits from 1science in statistics
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 14696
</code></pre></div><ul>
<li>I don&rsquo;t have time right now to add any of these to the COUNTER-Robots list&hellip;</li>
<li>Peter asked me to add a new item type on CGSpace: Opinion Piece</li>
<li>Map an item on CGSpace for Maria since she couldn&rsquo;t find it in the item mapper</li>
</ul>
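<ul>
<li>A small Python sketch of how user agents can be tallied against a list of spider patterns like the ones purged above
<ul>
<li>This assumes the patterns are treated as regular expressions (which entries like <code>^got</code> and <code>^colly</code> suggest); it does not reproduce how <code>check-spider-hits.sh</code> queries Solr</li>
<li>The <code>count_bot_hits</code> function and the sample agent list are hypothetical:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-python" data-lang="python">import re
from collections import Counter


def count_bot_hits(user_agents, patterns):
    """Count hits per pattern, using the first pattern that matches each agent."""
    compiled = [(p, re.compile(p)) for p in patterns]
    hits = Counter()
    for agent in user_agents:
        for pattern, regex in compiled:
            if regex.search(agent):
                hits[pattern] += 1
                break  # count each request only once
    return hits


# Hypothetical sample of logged user agents, for illustration only
patterns = ["scalaj-http", "^got", "^colly", "adreview"]
agents = [
    "scalaj-http/2.4.2",
    "got (https://github.com/sindresorhus/got)",
    "Mozilla/5.0 (forgot)",  # contains "got", but not at the start
    "colly - https://github.com/gocolly/colly/v2",
    "adreview/1.0",
]
hits = count_bot_hits(agents, patterns)
print(hits)
</code></pre></div>
<ul>
<li>Anchoring short names like <code>^got</code> matters because an unanchored <code>got</code> would also match unrelated agents that merely contain those letters</li>
</ul>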
<h2 id="2022-02-11">2022-02-11</h2>
<ul>
<li>CGSpace is slow and the load has been over 400% for a few hours
<ul>
<li>The number of DSpace sessions seems normal, even lower than a few days ago</li>
<li>The number of PostgreSQL connections is low, but I see there are lots of &ldquo;AccessShare&rdquo; locks (green on Munin, not blue like usual)</li>
<li>I will run all system updates, copy the latest config changes, and restart the server</li>
</ul>
</li>
</ul>
<h2 id="2022-02-12">2022-02-12</h2>
<ul>
<li>Install PostgreSQL 12 on my local dev environment to start testing DSpace 6.x workflows with it:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD<span style="color:#f92672">=</span>postgres -p 5432:5432 -d postgres:12-alpine
$ createuser -h localhost -p <span style="color:#ae81ff">5432</span> -U postgres --pwprompt dspacetest
$ createdb -h localhost -p <span style="color:#ae81ff">5432</span> -U postgres -O dspacetest --encoding<span style="color:#f92672">=</span>UNICODE dspacetest
$ psql -h localhost -U postgres -c <span style="color:#e6db74">&#39;ALTER USER dspacetest SUPERUSER;&#39;</span>
$ pg_restore -h localhost -U postgres -d dspacetest -O --role<span style="color:#f92672">=</span>dspacetest ~/Downloads/dspace-2022-02-12.backup
$ psql -h localhost -U postgres -c <span style="color:#e6db74">&#39;ALTER USER dspacetest NOSUPERUSER;&#39;</span>
</code></pre></div><ul>
<li>Eventually I will update DSpace Test, then CGSpace (time to start paying off some technical debt!)</li>
<li>Start a full Discovery re-index on CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ time chrt -b <span style="color:#ae81ff">0</span> ionice -c2 -n7 nice -n19 dspace index-discovery -b
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>real 292m49.263s
user 201m26.097s
sys 3m2.459s
</code></pre></div><ul>
<li>Start a full harvest on AReS</li>
</ul>
<h2 id="2022-02-14">2022-02-14</h2>
<ul>
<li>Last week Gaia sent me her notes on the second batch of TAC/ICW documents (items 201 to 400 in the spreadsheet)
<ul>
<li>I created a filter in LibreOffice and selected the IDs for items with the action &ldquo;delete&rdquo;, then I created a custom text facet in OpenRefine with this GREL:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code>or(
isNotNull(value.match('201')),
isNotNull(value.match('203')),
isNotNull(value.match('209')),
isNotNull(value.match('215')),
isNotNull(value.match('220')),
isNotNull(value.match('225')),
isNotNull(value.match('226')),
isNotNull(value.match('227')),
...
isNotNull(value.match('396'))
)
</code></pre><ul>
<li>Then I flagged all matching records and exported a CSV to use with SAFBuilder
<ul>
<li>Then I imported the SAF bundle on DSpace Test:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ JAVA_OPTS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-Xmx1024m -Dfile.encoding=UTF-8&#34;</span> dspace import --add --eperson<span style="color:#f92672">=</span>fuuu@umm.com --source /tmp/SimpleArchiveFormat --mapfile<span style="color:#f92672">=</span>./2022-02-14-tac-batch2-201to400.map
</code></pre></div><ul>
<li>Export the next batch from OpenRefine (items with ID 401 to 700), check duplicates, and then join with the file names:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ csvcut -c id,dc.title,dcterms.issued,dcterms.type ~/Downloads/2022-01-21-CGSpace-TAC-ICW-batch3-401to700.csv &gt; /tmp/tac3.csv
$ ./ilri/check-duplicates.py -i /tmp/tac3.csv -db dspacetest -u dspacetest -p <span style="color:#e6db74">&#39;dom@in34sniper&#39;</span> -o /tmp/2022-02-14-tac-batch3-401-700.csv
$ csvcut -c id,filename ~/Downloads/2022-01-21-CGSpace-TAC-ICW-batch3-401to700.csv &gt; /tmp/tac3-filenames.csv
$ csvjoin -c id /tmp/2022-02-14-tac-batch3-401-700.csv /tmp/tac3-filenames.csv &gt; /tmp/2022-02-14-tac-batch3-401-700-filenames.csv
</code></pre></div><ul>
<li>I sent these 300 items to Gaia&hellip;</li>
</ul>
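<ul>
<li>A minimal Python sketch, assuming the IDs marked &ldquo;delete&rdquo; are available as a plain list, of generating the same kind of GREL facet expression instead of typing it by hand (<code>build_id_facet</code> and the example IDs are hypothetical, not the real list from the spreadsheet):</li>
</ul>
<pre tabindex="0"><code class="language-python" data-lang="python">def build_id_facet(ids):
    """Return a GREL or(...) expression that is true when the cell matches any ID."""
    # Deduplicate while keeping the original order so the output is stable
    unique_ids = list(dict.fromkeys(str(i) for i in ids))
    clauses = ["  isNotNull(value.match('{}'))".format(i) for i in unique_ids]
    return "or(\n" + ",\n".join(clauses) + "\n)"


# Hypothetical IDs for illustration only
print(build_id_facet([201, 203, 209, 209, 215]))
</code></pre>
<ul>
<li>Because the IDs are deduplicated first, a value that appears twice in the list only produces one clause, and the closing parenthesis is always emitted</li>
</ul>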
<h2 id="2022-02-16">2022-02-16</h2>
<ul>
<li>Upgrade PostgreSQL on DSpace Test from version 10 to 12
<ul>
<li>First, I installed the new version of PostgreSQL via the Ansible playbook scripts</li>
<li>Then I stopped Tomcat and all PostgreSQL clusters and used <code>pg_upgradecluster</code> to upgrade the old cluster:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># systemctl stop tomcat7
# pg_ctlcluster <span style="color:#ae81ff">10</span> main stop
# tar -cvzpf var-lib-postgresql-10.tar.gz /var/lib/postgresql/10
# tar -cvzpf etc-postgresql-10.tar.gz /etc/postgresql/10
# pg_ctlcluster <span style="color:#ae81ff">12</span> main stop
# pg_dropcluster <span style="color:#ae81ff">12</span> main
# pg_upgradecluster <span style="color:#ae81ff">10</span> main
# pg_ctlcluster <span style="color:#ae81ff">12</span> main start
</code></pre></div><ul>
<li>After that I <a href="https://adamj.eu/tech/2021/04/13/reindexing-all-tables-after-upgrading-to-postgresql-13/">re-indexed the database indexes using a query</a>:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ su - postgres
$ cat /tmp/generate-reindex.sql
SELECT &#39;REINDEX TABLE CONCURRENTLY &#39; || quote_ident(relname) || &#39; /*&#39; || pg_size_pretty(pg_total_relation_size(C.oid)) || &#39;*/;&#39;
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname = &#39;public&#39;
AND C.relkind = &#39;r&#39;
AND nspname !~ &#39;^pg_toast&#39;
ORDER BY pg_total_relation_size(C.oid) ASC;
$ psql -At dspace &lt; /tmp/generate-reindex.sql &gt; /tmp/reindex.sql  # -A/-t: unaligned, tuples-only output, so nothing needs trimming
$ psql dspace &lt; /tmp/reindex.sql
</code></pre></div><ul>
<li>I saw that the index on <code>metadatavalue</code> shrank by about 200MB!</li>
<li>After testing a few things I dropped the old cluster:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console"># pg_dropcluster <span style="color:#ae81ff">10</span> main
# dpkg -l | grep postgresql-10 | awk <span style="color:#e6db74">&#39;{print $2}&#39;</span> | xargs dpkg -r
</code></pre></div><h2 id="2022-02-17">2022-02-17</h2>
<ul>
<li>I updated my <code>migrate-fields.sh</code> script to use field names instead of IDs
<ul>
<li>The script now looks up the appropriate <code>metadata_field_id</code> values for each field in the metadata registry</li>
</ul>
</li>
</ul>
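<ul>
<li>A rough Python sketch of how a lookup by field name could work, not the actual <code>migrate-fields.sh</code> logic. It assumes DSpace&rsquo;s <code>metadataschemaregistry</code> and <code>metadatafieldregistry</code> tables, and uses an in-memory SQLite database with made-up IDs as a stand-in for PostgreSQL (all function names are hypothetical):</li>
</ul>
<pre tabindex="0"><code class="language-python" data-lang="python">import sqlite3


def make_demo_registry():
    """Build a toy copy of the two registry tables (all IDs are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE metadataschemaregistry (
            metadata_schema_id INTEGER PRIMARY KEY,
            short_id TEXT
        );
        CREATE TABLE metadatafieldregistry (
            metadata_field_id INTEGER PRIMARY KEY,
            metadata_schema_id INTEGER,
            element TEXT,
            qualifier TEXT
        );
        INSERT INTO metadataschemaregistry VALUES (1, 'dcterms'), (2, 'cg');
        INSERT INTO metadatafieldregistry VALUES
            (10, 1, 'issued', NULL),
            (20, 2, 'identifier', 'doi');
        """
    )
    return conn


def split_field(name):
    """Split 'schema.element[.qualifier]' into parts (qualifier may be None)."""
    parts = name.split(".")
    if len(parts) not in (2, 3):
        raise ValueError("unexpected field name: " + name)
    qualifier = parts[2] if len(parts) == 3 else None
    return parts[0], parts[1], qualifier


def lookup_field_id(conn, name):
    """Return the metadata_field_id for a field name, or None if not registered."""
    schema, element, qualifier = split_field(name)
    sql = (
        "SELECT f.metadata_field_id"
        " FROM metadatafieldregistry f"
        " JOIN metadataschemaregistry s"
        " ON s.metadata_schema_id = f.metadata_schema_id"
        " WHERE s.short_id = ? AND f.element = ?"
    )
    params = [schema, element]
    # Unqualified fields store NULL, which never compares equal with "="
    if qualifier is None:
        sql += " AND f.qualifier IS NULL"
    else:
        sql += " AND f.qualifier = ?"
        params.append(qualifier)
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None


conn = make_demo_registry()
print(lookup_field_id(conn, "dcterms.issued"))     # 10 in this toy registry
print(lookup_field_id(conn, "cg.identifier.doi"))  # 20 in this toy registry
</code></pre>
<ul>
<li>Resolving IDs by name at run time means the same script can run against servers whose registries assign different numeric IDs to the same field</li>
</ul>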
<h2 id="2022-02-18">2022-02-18</h2>
<ul>
<li>Normalize the <code>text_lang</code> attributes of metadata on CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">dspace=# SELECT DISTINCT text_lang, count(text_lang) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) GROUP BY text_lang ORDER BY count DESC;
text_lang | count
-----------+---------
en_US | 2838588
en | 1082
| 801
fr | 2
vn | 2
en_US. | 1
sp | 1
| 0
(8 rows)
dspace=# UPDATE metadatavalue SET text_lang=&#39;en_US&#39; WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_lang IN (&#39;en&#39;, &#39;en_US.&#39;, &#39;&#39;);
UPDATE 1884
dspace=# UPDATE metadatavalue SET text_lang=&#39;vi&#39; WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_lang IN (&#39;vn&#39;);
UPDATE 2
dspace=# UPDATE metadatavalue SET text_lang=&#39;es&#39; WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_lang IN (&#39;sp&#39;);
UPDATE 1
</code></pre></div><ul>
<li>I then exported the entire repository and did some cleanup on DOIs
<ul>
<li>I found ~1,200 items with no <code>cg.identifier.doi</code>, but which had a DOI in their citation</li>
<li>I cleaned up and normalized a few hundred others to use the <a href="https://doi.org">https://doi.org</a> format</li>
</ul>
</li>
<li>I&rsquo;m debating using the Crossref API to search for our DOIs and improve our metadata
<ul>
<li>For example: <a href="https://api.crossref.org/works/10.1016/j.ecolecon.2008.03.011">https://api.crossref.org/works/10.1016/j.ecolecon.2008.03.011</a></li>
<li>There is good data on publishers, issue dates, volume/issue, and sometimes even licenses</li>
</ul>
</li>
<li>I cleaned up ~1,200 URLs that were using HTTP instead of HTTPS, fixed a bunch of handles, removed some handles from the DOI field, etc.</li>
</ul>
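<ul>
<li>A minimal Python sketch of the kind of DOI normalization described above, assuming a DOI is &ldquo;10.&rdquo; plus a 4 to 9 digit registrant code, a slash, and a suffix without whitespace (the pattern and the function name <code>normalize_doi</code> are assumptions, not the cleanup actually used):</li>
</ul>
<pre tabindex="0"><code class="language-python" data-lang="python">import re

# Assumed shape of a DOI: "10.", 4 to 9 digits, a slash, then a suffix
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def normalize_doi(value):
    """Return the DOI in https://doi.org/ form, or None if no DOI is found."""
    match = DOI_PATTERN.search(value)
    if match is None:
        return None
    # Strip punctuation that often trails a DOI at the end of a citation;
    # this would also damage the rare DOI that really ends in one of these
    doi = match.group(0).rstrip(".,;)")
    return "https://doi.org/" + doi


print(normalize_doi("http://dx.doi.org/10.1016/j.ecolecon.2008.03.011"))
print(normalize_doi("Some citation text. doi: 10.1016/j.ecolecon.2008.03.011."))
</code></pre>
<ul>
<li>The same function works on both an existing DOI field and a free-text citation, since it searches for the DOI anywhere in the string</li>
</ul>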
<h2 id="2022-02-20">2022-02-20</h2>
<ul>
<li>Yesterday I wrote a script to check our DOIs against Crossref&rsquo;s API and then did some investigation on dates, volumes, issues, pages, and types
<ul>
<li>While investigating issue dates in OpenRefine I created a new column using this GREL to show the number of days between Crossref&rsquo;s date and ours:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">abs(diff(toDate(cells[&#34;issued&#34;].value),toDate(cells[&#34;dcterms.issued[en_US]&#34;].value), &#34;days&#34;))
</code></pre></div><ul>
<li>In <em>most</em> cases Crossref&rsquo;s dates are more correct than ours, though there are a few odd cases where I don&rsquo;t yet know what strategy to use</li>
<li>Start a full harvest on AReS</li>
</ul>
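<ul>
<li>The same day-difference comparison as the GREL expression above can be sketched in Python. This version assumes both dates are strings in <code>YYYY</code>, <code>YYYY-MM</code>, or <code>YYYY-MM-DD</code> form and fills a missing month or day with 1, which is an assumption of the sketch rather than a statement about how OpenRefine&rsquo;s <code>toDate</code> behaves:</li>
</ul>
<pre tabindex="0"><code class="language-python" data-lang="python">from datetime import date


def parse_partial_date(text):
    """Parse YYYY, YYYY-MM or YYYY-MM-DD, filling missing parts with 1."""
    parts = [int(p) for p in text.strip().split("-")]
    if len(parts) not in (1, 2, 3):
        raise ValueError("unexpected date: " + text)
    year, month, day = (parts + [1, 1])[:3]
    return date(year, month, day)


def days_apart(crossref_issued, our_issued):
    """Absolute number of days between the two issue dates."""
    delta = parse_partial_date(crossref_issued) - parse_partial_date(our_issued)
    return abs(delta.days)


print(days_apart("2008-06-15", "2008-06-01"))  # 14
</code></pre>
<ul>
<li>Sorting by this number puts the largest disagreements first, which are the records most worth checking by hand</li>
</ul>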
<h2 id="2022-02-21">2022-02-21</h2>
<ul>
<li>I added support for checking the license of DOIs to my Crossref script
<ul>
<li>I exported ~2,800 DOIs and ran a check on them, then merged the CGSpace CSV with the results of the script to inspect in OpenRefine</li>
<li>There are hundreds of DOIs missing licenses in our data, even in this small subset of ~2,800 (out of 19,000 on CGSpace)</li>
<li>I spot-checked a few dozen in Crossref&rsquo;s data and found some incorrect ones, for example on Elsevier, Wiley, and Sage journals</li>
<li>In the end I used a series of GREL expressions in OpenRefine to filter out DOIs with these prefixes:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">or(
value.contains(&#34;10.1017&#34;),
value.contains(&#34;10.1007&#34;),
value.contains(&#34;10.1016&#34;),
value.contains(&#34;10.1098&#34;),
value.contains(&#34;10.1111&#34;),
value.contains(&#34;10.1002&#34;),
value.contains(&#34;10.1046&#34;),
value.contains(&#34;10.2135&#34;),
value.contains(&#34;10.1006&#34;),
value.contains(&#34;10.1177&#34;),
value.contains(&#34;10.1079&#34;),
value.contains(&#34;10.2298&#34;),
value.contains(&#34;10.1186&#34;),
value.contains(&#34;10.3835&#34;),
value.contains(&#34;10.1128&#34;),
value.contains(&#34;10.3732&#34;),
value.contains(&#34;10.2134&#34;)
)
</code></pre></div><ul>
<li>Many of Crossref&rsquo;s records have a correct license where we have none, and in some cases theirs is more correct where ours differs
<ul>
<li>In the end I ran license updates on ~167 DOIs on CGSpace</li>
</ul>
</li>
</ul>
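<ul>
<li>A minimal Python sketch of the same prefix filter, using the prefixes from the GREL expression above. Unlike <code>value.contains()</code>, it compares only the registrant prefix before the first slash, so a matching string inside a DOI suffix is ignored (the names <code>doi_prefix</code> and <code>is_excluded</code> are hypothetical):</li>
</ul>
<pre tabindex="0"><code class="language-python" data-lang="python"># Prefixes copied from the GREL expression in these notes
EXCLUDED_PREFIXES = {
    "10.1017", "10.1007", "10.1016", "10.1098", "10.1111", "10.1002",
    "10.1046", "10.2135", "10.1006", "10.1177", "10.1079", "10.2298",
    "10.1186", "10.3835", "10.1128", "10.3732", "10.2134",
}


def doi_prefix(doi):
    """Return the registrant prefix such as '10.1016', or None if absent."""
    # Skip any resolver like https://doi.org/ by starting at the first "10."
    start = doi.find("10.")
    if start == -1:
        return None
    return doi[start:].split("/", 1)[0]


def is_excluded(doi):
    """True when the DOI belongs to a prefix that is filtered out."""
    return doi_prefix(doi) in EXCLUDED_PREFIXES


print(is_excluded("https://doi.org/10.1016/j.ecolecon.2008.03.011"))  # True
</code></pre>
<ul>
<li>Keeping the prefixes in a set makes it easy to add or remove a publisher later without editing a long <code>or()</code> chain</li>
</ul>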
<h2 id="2022-02-24">2022-02-24</h2>
<ul>
<li>Update some audience metadata on CGSpace:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">dspace=# UPDATE metadatavalue SET text_value=&#39;Academics&#39; WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value = &#39;Academicians&#39;;
UPDATE 354
dspace=# UPDATE metadatavalue SET text_value=&#39;Scientists&#39; WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=144 AND text_value = &#39;SCIENTISTS&#39;;
UPDATE 2
</code></pre></div><h2 id="2022-02-25">2022-02-25</h2>
<ul>
<li>A few days ago Gaia sent me her notes on the third batch of TAC/ICW documents (items 401&ndash;700 in the spreadsheet)
<ul>
<li>I created a filter in LibreOffice and selected the IDs for items with the action &ldquo;delete&rdquo;, then I created a custom text facet in OpenRefine with this GREL:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code>or(
isNotNull(value.match('405')),
isNotNull(value.match('410')),
isNotNull(value.match('412')),
isNotNull(value.match('414')),
isNotNull(value.match('419')),
isNotNull(value.match('436')),
isNotNull(value.match('448')),
isNotNull(value.match('449')),
isNotNull(value.match('450')),
...
isNotNull(value.match('699'))
)
</code></pre><ul>
<li>Then I flagged all matching records, exported a CSV to use with SAFBuilder, and imported them on DSpace Test:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ JAVA_OPTS<span style="color:#f92672">=</span><span style="color:#e6db74">&#34;-Xmx1024m -Dfile.encoding=UTF-8&#34;</span> dspace import --add --eperson<span style="color:#f92672">=</span>fuuu@umm.com --source /tmp/SimpleArchiveFormat --mapfile<span style="color:#f92672">=</span>./2022-02-25-tac-batch3-401to700.map
</code></pre></div><h2 id="2022-02-26">2022-02-26</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 20.04</li>
<li>Start a full AReS harvest</li>
</ul>
<!-- raw HTML omitted -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2022-02/">February, 2022</a></li>
<li><a href="/cgspace-notes/2022-01/">January, 2022</a></li>
<li><a href="/cgspace-notes/2021-12/">December, 2021</a></li>
<li><a href="/cgspace-notes/2021-11/">November, 2021</a></li>
<li><a href="/cgspace-notes/2021-10/">October, 2021</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>