<!DOCTYPE html> <html lang="en" > <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta property="og:title" content="December, 2019" /> <meta property="og:description" content="2019-12-01 Upgrade CGSpace (linode18) to Ubuntu 18.04: Check any packages that have residual configs and purge them: # dpkg -l | grep -E ‘^rc’ | awk ‘{print $2}’ | xargs dpkg -P Make sure all packages are up to date and the package manager is up to date, then reboot: # apt update && apt full-upgrade # apt-get autoremove && apt-get autoclean # dpkg -C # reboot " /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-12/" /> <meta property="article:published_time" content="2019-12-01T11:22:30+02:00" /> <meta property="article:modified_time" content="2019-12-30T14:28:15+02:00" /> <meta name="twitter:card" content="summary"/> <meta name="twitter:title" content="December, 2019"/> <meta name="twitter:description" content="2019-12-01 Upgrade CGSpace (linode18) to Ubuntu 18.04: Check any packages that have residual configs and purge them: # dpkg -l | grep -E ‘^rc’ | awk ‘{print $2}’ | xargs dpkg -P Make sure all packages are up to date and the package manager is up to date, then reboot: # apt update && apt full-upgrade # apt-get autoremove && apt-get autoclean # dpkg -C # reboot "/> <meta name="generator" content="Hugo 0.87.0" /> <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "BlogPosting", "headline": "December, 2019", "url": "https://alanorth.github.io/cgspace-notes/2019-12/", "wordCount": "1551", "datePublished": "2019-12-01T11:22:30+02:00", "dateModified": "2019-12-30T14:28:15+02:00", "author": { "@type": "Person", "name": "Alan Orth" }, "keywords": "Notes" } </script> <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2019-12/"> <title>December, 2019 | CGSpace Notes</title> <!-- combined, minified CSS --> 
<link href="https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel="stylesheet" integrity="sha256-vrgBLtwIuhC+AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin="anonymous"> <!-- minified Font Awesome for SVG icons --> <script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.ffbfea088a9a1666ec65c3a8cb4906e2a0e4f92dc70dbbf400a125ad2422123a.js" integrity="sha256-/7/qCIqaFmbsZcOoy0kG4qDk+S3HDbv0AKElrSQiEjo=" crossorigin="anonymous"></script> <!-- RSS 2.0 feed --> </head> <body> <div class="blog-masthead"> <div class="container"> <nav class="nav blog-nav"> <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a> </nav> </div> </div> <header class="blog-header"> <div class="container"> <h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1> <p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p> </div> </header> <div class="container"> <div class="row"> <div class="col-sm-8 blog-main"> <article class="blog-post"> <header> <h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-12/">December, 2019</a></h2> <p class="blog-post-meta"> <time datetime="2019-12-01T11:22:30+02:00">Sun Dec 01, 2019</time> in <span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a> </p> </header> <h2 id="2019-12-01">2019-12-01</h2> <ul> <li>Upgrade CGSpace (linode18) to Ubuntu 18.04: <ul> <li>Check for any packages that have residual configs and purge them:</li> <li><code># dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P</code></li> <li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li> </ul> </li> </ul> <pre><code># apt update && apt full-upgrade 
# apt-get autoremove && apt-get autoclean
# dpkg -C
# reboot
</code></pre><ul> <li>Take some backups:</li> </ul> <pre><code># dpkg -l > 2019-12-01-linode18-dpkg.txt
# tar czf 2019-12-01-linode18-etc.tar.gz /etc
</code></pre><ul> <li>Then check all third-party repositories in /etc/apt to see if everything using “xenial” has packages available for “bionic” and then update the sources:</li> <li><code># sed -i 's/xenial/bionic/' /etc/apt/sources.list.d/*.list</code></li> <li>Pause the Uptime Robot monitoring for CGSpace</li> <li>Make sure the update manager is installed and do the upgrade:</li> </ul> <pre><code># apt install update-manager-core
# do-release-upgrade
</code></pre><ul> <li>After the upgrade finishes, remove Java 11, force the installation of bionic nginx, and reboot the server:</li> </ul> <pre><code># apt purge openjdk-11-jre-headless
# apt install 'nginx=1.16.1-1~bionic'
# reboot
</code></pre><ul> <li>After the server comes back up, remove Python virtualenvs that were created with Python 3.5 and re-run certbot to make sure it’s working:</li> </ul> <pre><code># rm -rf /opt/eff.org/certbot/venv/bin/letsencrypt
# rm -rf /opt/ilri/dspace-statistics-api/venv
# /opt/certbot-auto
</code></pre><ul> <li>Clear Ansible’s fact cache and re-run the playbooks to update the system’s firewalls, SSH config, etc</li> <li>Altmetric finally responded to my question about Dublin Core fields <ul> <li>They shared a <a href="https://help.altmetric.com/support/solutions/articles/6000141419-what-metadata-is-required-to-track-our-content-">list of fields they use for tracking</a>, but it only mentions HTML meta tags, and not fields considered when harvesting via OAI</li> <li>Anyway, there might be some areas where we can improve the HTML meta tags: looking at one <a href="https://hdl.handle.net/10568/101623">item with a DOI, ISSN, etc</a>, I see that we could at least add status (Open Access) and journal title</li> <li>I merged a <a 
href="https://github.com/ilri/DSpace/pull/438">pull request</a> into the <code>5_x-prod</code> branch to add status and journal title to the XHTML meta tags</li> </ul> </li> </ul> <h2 id="2019-12-02">2019-12-02</h2> <ul> <li>Raise the issue of old, low-quality thumbnails with Peter and the CGSpace team <ul> <li>I suggested that we move manually uploaded thumbnails from the <code>ORIGINAL</code> bundle to the <code>THUMBNAIL</code> bundle</li> <li>Also replace old thumbnails where an item is available on Slideshare or YouTube because those are easy to get new, high-quality thumbnails for</li> </ul> </li> <li>Continue testing CG Core v2 implementation on DSpace Test <ul> <li>Compare the OAI QDC representation of a few items on CGSpace vs DSpace Test:</li> </ul> </li> </ul> <pre><code>$ http 'https://cgspace.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:cgspace.cgiar.org:10568/104030' > /tmp/cgspace-104030.xml $ http 'https://dspacetest.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:cgspace.cgiar.org:10568/104030' > /tmp/dspacetest-104030.xml </code></pre><ul> <li>The DSpace Test ones actually now capture the DOI, where the CGSpace doesn’t…</li> <li>And the DSpace Test one doesn’t include review status as <code>dc.description</code>, but I don’t think that’s an important field</li> </ul> <h2 id="2019-12-04">2019-12-04</h2> <ul> <li>Peter noticed that there were about seventy items on CGSpace that were marked as private <ul> <li>Some have been withdrawn, but I extracted a list of the forty-eight that were not:</li> </ul> </li> </ul> <pre><code>dspace=# \COPY (SELECT handle, owning_collection FROM item, handle WHERE item.discoverable='f' AND item.in_archive='t' AND handle.resource_id = item.item_id) to /tmp/2019-12-04-CGSpace-private-items.csv WITH CSV HEADER; COPY 48 </code></pre><h2 id="2019-12-05">2019-12-05</h2> <ul> <li>Give <a href="https://hdl.handle.net/10568/106045">presentation about CG Core v2</a> to the 
MEL Developers' Retreat in Nairobi, Kenya (via Skype)</li> <li>Send some pull requests to the cg-core schema repository: <ul> <li><a href="https://github.com/AgriculturalSemantics/cg-core/pull/16">HTML syntax fixes</a></li> <li><a href="https://github.com/AgriculturalSemantics/cg-core/pull/17">Add LICENSE file</a></li> <li><a href="https://github.com/AgriculturalSemantics/cg-core/pull/18">Build main.css using npm build</a></li> </ul> </li> </ul> <h2 id="2019-12-08">2019-12-08</h2> <ul> <li>Enrico noticed that the AReS Explorer on CGSpace (linode18) was down <ul> <li>I only see HTTP 502 in the nginx logs on CGSpace… so I assume it’s something wrong with the AReS server</li> <li>I ran all system updates on the AReS server (linode20) and rebooted it</li> <li>After rebooting the Explorer was accessible again</li> </ul> </li> </ul> <h2 id="2019-12-09">2019-12-09</h2> <ul> <li>Update PostgreSQL JDBC driver to <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.9">version 42.2.9</a> in <a href="https://github.com/ilri/rmg-ansible-public">Ansible playbooks</a> <ul> <li>Deploy on DSpace Test (linode19) to test before deploying on CGSpace in a few days</li> </ul> </li> <li>Altmetric responded to my question about <a href="https://hdl.handle.net/10568/97087">the WLE item</a> that has a lower score than its DOI <ul> <li>They say that they will “reprocess” the item “before Christmas”</li> </ul> </li> </ul> <h2 id="2019-12-11">2019-12-11</h2> <ul> <li>Post <a href="https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=454830191804416">message to Yammer about good practices for thumbnails on CGSpace</a> <ul> <li>On the topic of thumbnails, I’m thinking we might want to force regenerate all PDF thumbnails on CGSpace since we upgraded it to Ubuntu 18.04 and got a new ghostscript…</li> </ul> </li> <li>More discussion about report formats for AReS</li> <li>Peter noticed that the Atmire reports weren’t showing any statistics before 2019 <ul> <li>I 
checked and indeed Solr had an issue loading some core last time it was started</li> <li>I restarted Tomcat three times before all cores came up successfully</li> </ul> </li> <li>While I was restarting the Tomcat service I upgraded the PostgreSQL JDBC driver to version 42.2.9, which had been deployed on DSpace Test earlier this week</li> </ul> <h2 id="2019-12-16">2019-12-16</h2> <ul> <li>Visit CodeObia office to discuss next phase of OpenRXV/AReS development <ul> <li>We discussed using CSV instead of Excel for tabular reports <ul> <li>OpenRXV should only have “simple” reports with Dublin Core fields</li> <li>AReS should have this as well as a customized “extended” report that has CRPs, Subjects, Sponsors, etc from CGSpace</li> </ul> </li> <li>We discussed using RTF instead of Word for graphical reports</li> </ul> </li> </ul> <h2 id="2019-12-17">2019-12-17</h2> <ul> <li>Start filing GitHub issues for the reporting features on OpenRXV and AReS <ul> <li>I created an issue for the “simple” tabular reports on OpenRXV GitHub (<a href="https://github.com/ilri/OpenRXV/issues/29">#29</a>)</li> <li>I created an issue for the “extended” tabular reports on AReS GitHub (<a href="https://github.com/ilri/AReS/issues/8">#8</a>)</li> <li>I created an issue for “simple” text reports on the OpenRXV GitHub (<a href="https://github.com/ilri/OpenRXV/issues/30">#30</a>)</li> <li>I created an issue for “extended” text reports on the AReS GitHub (<a href="https://github.com/ilri/AReS/issues/9">#9</a>)</li> </ul> </li> <li>I looked into creating RTF documents from HTML in Node.js and there is a library called <a href="https://www.npmjs.com/package/html-to-rtf">html-to-rtf</a> that works well, but doesn’t support images</li> <li>Export a list of all investors (<code>dc.description.sponsorship</code>) for Peter to look through and correct:</li> </ul> <pre><code>dspace=# \COPY (SELECT DISTINCT text_value as "dc.contributor.sponsor", count(*) FROM metadatavalue WHERE resource_type_id = 2 AND 
metadata_field_id = 29 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-12-17-investors.csv WITH CSV HEADER;
COPY 643
</code></pre><h2 id="2019-12-18">2019-12-18</h2> <ul> <li>Apply the investor corrections and deletions from Peter on CGSpace:</li> </ul> <pre><code>$ ./fix-metadata-values.py -i /tmp/2019-12-17-investors-fix-112.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29 -t correct -d
$ ./delete-metadata-values.py -i /tmp/2019-12-17-investors-delete-68.csv -db dspace -u dspace -p 'fuuu' -m 29 -f dc.description.sponsorship -d
</code></pre><ul> <li>Peter asked about the “Open Government Licence 3.0” that is used by <a href="">some items</a> <ul> <li>I notice that it <a href="https://spdx.org/licenses/OGL-UK-3.0.html">exists in SPDX as <code>OGL-UK-3.0</code></a> so I created a GitHub issue to add this to our controlled vocabulary (<a href="https://github.com/ilri/DSpace/issues/439">#439</a>)</li> <li>I only see two in our database that use this for now, so I will update them:</li> </ul> </li> </ul> <pre><code>dspace=# SELECT text_value FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%Open%';
text_value
-----------------------------
Open Government License 3.0
Open Government License 3.0
(2 rows)
dspace=# UPDATE metadatavalue SET text_value='OGL-UK-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%Open Government License 3.0%';
UPDATE 2
</code></pre><ul> <li>I created a pull request to add the license and merged it to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/440">#440</a>)</li> <li>Add three new CCAFS Phase II project tags to CGSpace (<a href="https://github.com/ilri/DSpace/pull/441">#441</a>)</li> <li>Linode said DSpace Test (linode19) had an outbound traffic rate of 73Mb/sec for the last two hours <ul> <li>I see a Russian bot active in nginx’s access logs:</li> </ul> </li> </ul> <pre><code># cat 
/var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -c MegaIndex.ru 27320 </code></pre><ul> <li>I see they <em>did</em> check <code>robots.txt</code> and their requests are only going to XMLUI item pages… so I guess I just leave them alone</li> <li>Peter wrote to ask why this <a href="https://cgspace.cgiar.org/handle/10568/101286">one WLE item</a> does <a href="https://api.altmetric.com/v1/handle/10568/101286">not have an Altmetric attention score</a>, but <a href="https://api.altmetric.com/v1/doi/10.1126/science.aaw0911">the DOI does</a> <ul> <li>I tweeted the item just in case, but Peter said that he already did it yesterday</li> <li>The item was added six months ago…</li> <li>The DOI has an Altmetric score of 259, but for the Handle it is HTTP 404!</li> <li>I emailed Altmetric support</li> </ul> </li> </ul> <h2 id="2019-12-22">2019-12-22</h2> <ul> <li>I ran the <code>dspace cleanup</code> process on CGSpace (linode18) and had an error:</li> </ul> <pre><code>Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle" Detail: Key (bitstream_id)=(179441) is still referenced from table "bundle". 
</code></pre><ul> <li>The solution is to manually clear the reference to that bitstream in the <code>bundle</code> table:</li> </ul> <pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (179441);'
UPDATE 1
</code></pre><ul> <li>Adjust <a href="/cgspace-notes/cgspace-cgcorev2-migration/">CG Core v2 migration notes</a> to use <code>cg.review-status</code> instead of <code>cg.peer-reviewed</code> <ul> <li>I had <a href="https://github.com/AgriculturalSemantics/cg-core/issues/14">raised the issue</a> with Marie-Angelique earlier this month</li> <li>It makes much more sense to use a wider scope here than a simple boolean</li> </ul> </li> <li>I also noticed another field where we should be using DCTERMS instead of CG: <code>cg.targetaudience</code> <ul> <li><a href="https://www.dublincore.org/specifications/dublin-core/dcmi-terms/2012-06-14/?v=terms#audience">DCTERMS says that <code>dcterms.audience</code> should be used to describe “A class of entity for whom the resource is intended or useful.”</a></li> <li>I will update my notes for this so that we use that field instead</li> <li>I don’t see “audience” on the <a href="https://github.com/AgriculturalSemantics/cg-core/">cg-core</a> repository so I filed <a href="https://github.com/AgriculturalSemantics/cg-core/issues/19">an issue</a> to raise it with Marie-Angelique</li> </ul> </li> </ul> <h2 id="2019-12-23">2019-12-23</h2> <ul> <li>Follow up with Altmetric on the issue where <a href="https://hdl.handle.net/10568/97087">an item</a> has a different (lower) score for its Handle despite it having a correct DOI (with a higher score) <ul> <li>I’ve raised this issue three times to Altmetric this year, and a few weeks ago they said they would re-process the item “before Christmas”</li> </ul> </li> <li>Abenet suggested we use <code>cg.reviewStatus</code> instead of <code>cg.review-status</code> and I agree that we should follow other examples like <code>DCTERMS.accessRights</code> and <code>DCTERMS.isPartOf</code> <ul> 
<li>I updated <a href="https://github.com/AgriculturalSemantics/cg-core/issues/14">my comment on the cg-core issue on GitHub</a></li> </ul> </li> </ul> <h2 id="2019-12-30">2019-12-30</h2> <ul> <li>Altmetric responded a few days ago about the <a href="https://hdl.handle.net/10568/97087">item</a> that has a different (lower) score for its Handle despite it having a correct DOI (with a higher score) <ul> <li>She tweeted the repository link and agreed that it didn’t get picked up by Altmetric</li> <li>She said she will add this to the existing ticket about the previous items I had raised an issue about</li> </ul> </li> <li>Update Tomcat to version 7.0.99 in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and deploy on DSpace Test (linode19)</li> </ul> <!-- raw HTML omitted --> </article> </div> <!-- /.blog-main --> <aside class="col-sm-3 ml-auto blog-sidebar"> <section class="sidebar-module"> <h4>Recent Posts</h4> <ol class="list-unstyled"> <li><a href="/cgspace-notes/2021-08/">August, 2021</a></li> <li><a href="/cgspace-notes/2021-07/">July, 2021</a></li> <li><a href="/cgspace-notes/2021-06/">June, 2021</a></li> <li><a href="/cgspace-notes/2021-05/">May, 2021</a></li> <li><a href="/cgspace-notes/2021-04/">April, 2021</a></li> </ol> </section> <section class="sidebar-module"> <h4>Links</h4> <ol class="list-unstyled"> <li><a href="https://cgspace.cgiar.org">CGSpace</a></li> <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li> <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li> </ol> </section> </aside> </div> <!-- /.row --> </div> <!-- /.container --> <footer class="blog-footer"> <p dir="auto"> Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>. </p> <p> <a href="#">Back to top</a> </p> </footer> </body> </html>