cgspace-notes/docs/page/7/index.html
2024-11-19 10:40:23 +03:00

504 lines
18 KiB
HTML

<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="CGSpace Notes" />
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2024-11-11T09:47:00+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.133.1">
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Blog",
"headline": "CGSpace Notes",
"url" : "https://alanorth.github.io/cgspace-notes/",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2024-11-11T09:47:00+03:00",
"keywords": "notes, migration, notes",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/">
<title>CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel="stylesheet" integrity="sha256-xrqAvFBmlVdkWr4F&#43;GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz&#43;lcnA=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
<link rel="alternate" type="application/rss+xml" href="https://alanorth.github.io/cgspace-notes/index.xml" title="CGSpace Notes" />
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link active" href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-01/">January, 2020</a></h2>
<p class="blog-post-meta"><time datetime="2020-01-06T10:48:30+02:00">Mon Jan 06, 2020</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2020-01-06">2020-01-06</h2>
<ul>
<li>Open <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=706">a ticket</a> with Atmire to request a quote for the upgrade to DSpace 6</li>
<li>Last week Altmetric responded about the <a href="https://hdl.handle.net/10568/97087">item</a> that had a lower score than than its DOI
<ul>
<li>The score is now linked to the DOI</li>
<li>Another <a href="https://hdl.handle.net/10568/91278">item</a> that had the same problem in 2019 has now also linked to the score for its DOI</li>
<li>Another <a href="https://hdl.handle.net/10568/81236">item</a> that had the same problem in 2019 has also been fixed</li>
</ul>
</li>
</ul>
<h2 id="2020-01-07">2020-01-07</h2>
<ul>
<li>Peter Ballantyne highlighted one more WLE <a href="https://hdl.handle.net/10568/101286">item</a> that is missing the Altmetric score that its DOI has
<ul>
<li>The DOI has a score of 259, but the Handle has no score at all</li>
<li>I <a href="https://twitter.com/mralanorth/status/1214471427157626881">tweeted</a> the CGSpace repository link</li>
</ul>
</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2020-01/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-12/">December, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-12-01T11:22:30+02:00">Sun Dec 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
<li>Check any packages that have residual configs and purge them:</li>
<li><!-- raw HTML omitted --># dpkg -l | grep -E &lsquo;^rc&rsquo; | awk &lsquo;{print $2}&rsquo; | xargs dpkg -P<!-- raw HTML omitted --></li>
<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># apt update &amp;&amp; apt full-upgrade
# apt-get autoremove &amp;&amp; apt-get autoclean
# dpkg -C
# reboot
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-12/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-11/">November, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-11-04T12:20:30+02:00">Mon Nov 04, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-11-04">2019-11-04</h2>
<ul>
<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul>
<li>I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE &#34;[0-9]{1,2}/Oct/2019&#34;
4671942
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE &#34;[0-9]{1,2}/Oct/2019&#34;
1277694
</code></pre><ul>
<li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
<li>Let&rsquo;s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E &#34;[0-9]{1,2}/Oct/2019&#34;
1183456
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E &#34;[0-9]{1,2}/Oct/2019&#34; | grep -c -E &#34;/rest/bitstreams&#34;
106781
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-10/">October, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-10-01T13:20:51+03:00">Tue Oct 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script&rsquo;s &ldquo;unneccesary Unicode&rdquo; fix: $ csvcut -c &#39;id,dc.
<a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-09/">September, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-09-01T10:17:51+03:00">Sun Sep 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &#34;01/Sep/2019:0&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &#34;01/Sep/2019:0&#34; | awk &#39;{print $1}&#39; | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-08/">August, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-08-03T12:39:51+03:00">Sat Aug 03, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity&rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul>
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
<ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li>
<li>After rebooting, all statistics cores were loaded&hellip; wow, that&rsquo;s lucky.</li>
</ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-07/">July, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-07-01T12:13:51+03:00">Mon Jul 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-06/">June, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-06-02T10:57:51+03:00">Sun Jun 02, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-06/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-05/">May, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-05-01T07:37:43+03:00">Wed May 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul>
</li>
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre tabindex="0"><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
</code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-04/">April, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-04-01T09:00:43+03:00">Mon Apr 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul>
</li>
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
<ul>
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep &#39;Spore-192-EN-web.pdf&#39; | grep -E &#39;(18.196.196.108|18.195.78.144|18.195.218.6)&#39; | awk &#39;{print $9}&#39; | sort | uniq -c | sort -n | tail -n 5
4432 200
</code></pre><ul>
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 231 -f cg.coverage.region -d
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/page/6/" rel="prev" role="button">Previous page</a>
<a class="btn btn-outline-primary" href="/cgspace-notes/page/8/" rel="next" role="button">Next page</a>
</nav>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2024-11/">November, 2024</a></li>
<li><a href="/cgspace-notes/2024-10/">October, 2024</a></li>
<li><a href="/cgspace-notes/2024-09/">September, 2024</a></li>
<li><a href="/cgspace-notes/2024-08/">August, 2024</a></li>
<li><a href="/cgspace-notes/2024-07/">July, 2024</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>