<!DOCTYPE html> <html lang="en" > <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta property="og:title" content="September, 2020" /> <meta property="og:description" content="2020-09-02 Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it I restarted it again now and told Moayad that the automatic indexing isn’t working Add Alliance of Bioversity International and CIAT to affiliations on CGSpace Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/39 I filed an issue on OpenRXV to make some minor edits to the admin UI: https://github.com/ilri/OpenRXV/issues/40 " /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-09/" /> <meta property="article:published_time" content="2020-09-02T15:35:54+03:00" /> <meta property="article:modified_time" content="2020-09-03T13:50:56+03:00" /> <meta name="twitter:card" content="summary"/> <meta name="twitter:title" content="September, 2020"/> <meta name="twitter:description" content="2020-09-02 Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it I restarted it again now and told Moayad that the automatic indexing isn’t working Add Alliance of Bioversity International and CIAT to affiliations on CGSpace Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/39 I filed an issue on OpenRXV to make some minor edits to the admin UI: https://github.com/ilri/OpenRXV/issues/40
"/> <meta name="generator" content="Hugo 0.74.3" /> <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "BlogPosting", "headline": "September, 2020", "url": "https://alanorth.github.io/cgspace-notes/2020-09/", "wordCount": "731", "datePublished": "2020-09-02T15:35:54+03:00", "dateModified": "2020-09-03T13:50:56+03:00", "author": { "@type": "Person", "name": "Alan Orth" }, "keywords": "Notes" } </script> <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2020-09/"> <title>September, 2020 | CGSpace Notes</title> <!-- combined, minified CSS --> <link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous"> <!-- minified Font Awesome for SVG icons --> <script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script> <!-- RSS 2.0 feed --> </head> <body> <div class="blog-masthead"> <div class="container"> <nav class="nav blog-nav"> <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a> </nav> </div> </div> <header class="blog-header"> <div class="container"> <h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1> <p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p> </div> </header> <div class="container"> <div class="row"> <div class="col-sm-8 blog-main"> <article class="blog-post"> <header> <h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-09/">September, 2020</a></h2> <p class="blog-post-meta"><time datetime="2020-09-02T15:35:54+03:00">Wed Sep 02, 
2020</time> by Alan Orth in <span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a> </p> </header>
<h2 id="2020-09-02">2020-09-02</h2>
<ul>
<li>Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS</li>
<li>The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it
<ul>
<li>I restarted it again now and told Moayad that the automatic indexing isn’t working</li>
</ul>
</li>
<li>Add <code>Alliance of Bioversity International and CIAT</code> to affiliations on CGSpace</li>
<li>Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button
<ul>
<li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/39">https://github.com/ilri/OpenRXV/issues/39</a></li>
</ul>
</li>
<li>I filed an issue on OpenRXV to make some minor edits to the admin UI: <a href="https://github.com/ilri/OpenRXV/issues/40">https://github.com/ilri/OpenRXV/issues/40</a></li>
</ul>
<ul>
<li>I ran the country code tagger on CGSpace:</li>
</ul>
<pre><code>$ time chrt -b 0 dspace curate -t countrycodetagger -i all -r - -l 500 -s object | tee /tmp/2020-09-02-countrycodetagger.log
...
real 2m10.516s
user 1m43.953s
sys 0m15.192s
$ grep -c added /tmp/2020-09-02-countrycodetagger.log
39
</code></pre><ul>
<li>I still need to create a cron job for this…</li>
<li>Sisay and Abenet said they can’t log in with LDAP on DSpace Test (DSpace 6)
<ul>
<li>I tried and I can’t either… but it is working on CGSpace</li>
<li>The error on DSpace 6 is:</li>
</ul>
</li>
</ul>
<pre><code>2020-09-02 12:03:10,666 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=A629116488DCC467E1EA2062A2E2EFD7:ip_addr=92.220.02.201:failed_login:no DN found for user aorth
</code></pre><ul>
<li>I tried to query LDAP directly using the application credentials with ldapsearch and it works:</li>
</ul>
<pre><code>$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b "dc=cgiarad,dc=org" -D "applicationaccount@cgiarad.org" -W "(sAMAccountName=me)"
</code></pre><ul>
<li>According to the <a href="https://wiki.lyrasis.org/display/DSDOC6x/Authentication+Plugins#AuthenticationPlugins-LDAPAuthentication">DSpace 6 docs</a> we need to escape commas in our LDAP parameters due to the new configuration system
<ul>
<li>I escaped the commas and restarted DSpace (though technically we shouldn’t need to restart because the new config system hot-reloads configs)</li>
<li>Run all system updates on DSpace Test (linode26) and reboot it</li>
<li>After the restart LDAP login works…</li>
</ul>
</li>
</ul>
<h2 id="2020-09-03">2020-09-03</h2>
<ul>
<li>Fix some erroneous “review status” fields that Abenet noticed on AReS
<ul>
<li>I used my <code>fix-metadata-values.py</code> and <code>delete-metadata-values.py</code> scripts with the following input files:</li>
</ul>
</li>
</ul>
<pre><code>$ cat 2020-09-03-fix-review-status.csv
dc.description.version,correct
Externally Peer Reviewed,Peer Review
Peer Reviewed,Peer Review
Peer review,Peer Review
Peer reviewed,Peer Review
Peer-Reviewed,Peer Review
Peer-reviewed,Peer Review
peer Review,Peer Review
$ cat 2020-09-03-delete-review-status.csv
dc.description.version
Report
Formally Published
Poster
Unrefereed reprint
$ ./delete-metadata-values.py -i 2020-09-03-delete-review-status.csv -db dspace -u dspace -p 'fuuu' -f dc.description.version -m 68
$ ./fix-metadata-values.py -i 2020-09-03-fix-review-status.csv -db dspace -u dspace -p 'fuuu' -f dc.description.version -t 'correct' -m 68
</code></pre><ul>
<li>Start reviewing 95 items for IITA (20201stbatch)
<ul>
<li>I used my <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> tool to check and fix some low-hanging fruit first</li>
<li>This fixed a few instances of unnecessary Unicode, excessive whitespace, invalid multi-value separators, and duplicate metadata values</li>
<li>Then I looked at the data in OpenRefine and noticed some things:
<ul>
<li>All issue dates use year only, but some have months in the citation so they could be more specific</li>
<li>I normalized all the DOIs to use “<a href="https://doi.org">https://doi.org</a>” format</li>
<li>I fixed a few AGROVOC subjects with a simple GREL: <code>value.replace("GRAINS","GRAIN").replace("SOILS","SOIL").replace("CORN","MAIZE")</code></li>
<li>But there are a few more that are invalid that she will have to look at</li>
<li>I uploaded the items to <a href="https://dspacetest.cgiar.org/handle/10568/108357">DSpace Test</a> and it was apparently successful, but I got these errors on the console:</li>
</ul>
</li>
</ul>
</li>
</ul>
<pre><code>Thu Sep 03 12:26:33 CEST 2020 | Query:containerItem:ea7a2648-180d-4fce-bdc5-c3aa2304fc58
Error while updating
java.lang.NullPointerException
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl$5.visit(SourceFile:1131)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.visitEachStatisticShard(SourceFile:212)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1104)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1093)
        at org.dspace.statistics.StatisticsLoggingConsumer.consume(SourceFile:104)
        at org.dspace.event.BasicDispatcher.consume(BasicDispatcher.java:177)
        at org.dspace.event.BasicDispatcher.dispatch(BasicDispatcher.java:123)
        at org.dspace.core.Context.dispatchEvents(Context.java:455)
        at org.dspace.core.Context.commit(Context.java:424)
        at org.dspace.core.Context.complete(Context.java:380)
        at org.dspace.app.bulkedit.MetadataImport.main(MetadataImport.java:1399)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
</code></pre><ul>
<li>There are more in the DSpace log so I will raise it with Atmire immediately</li>
</ul>
<h2 id="2020-09-04">2020-09-04</h2>
<ul>
<li>I was checking the recent IITA data for duplicates when I noticed one in CIFOR’s Archive and saw that CIFOR has updated a bunch of their website URLs, for example:
<ul>
<li><a href="http://www.cifor.org/nc/online-library/browse/view-publication/publication/151.html">http://www.cifor.org/nc/online-library/browse/view-publication/publication/151.html</a> → <a href="https://www.cifor.org/knowledge/publication/151">https://www.cifor.org/knowledge/publication/151</a></li>
<li><a href="https://www.cifor.org/library/4033">https://www.cifor.org/library/4033</a> → <a href="https://www.cifor.org/knowledge/publication/4033">https://www.cifor.org/knowledge/publication/4033</a></li>
<li><a href="https://www.cifor.org/pid/5087">https://www.cifor.org/pid/5087</a> → <a href="https://www.cifor.org/knowledge/publication/5087">https://www.cifor.org/knowledge/publication/5087</a></li>
</ul>
</li>
<li>I will update our nearly 6,000 metadata values for CIFOR in the database accordingly:</li>
</ul>
<pre><code>dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^(http://)?www\.cifor\.org/(nc/)?online-library/browse/view-publication/publication/([[:digit:]]+)\.html$', 'https://www.cifor.org/knowledge/publication/\3') WHERE metadata_field_id=219 AND text_value ~ 'www\.cifor\.org/(nc/)?online-library/browse/view-publication/publication/[[:digit:]]+';
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^https?://www\.cifor\.org/library/([[:digit:]]+)/?$', 'https://www.cifor.org/knowledge/publication/\1') WHERE metadata_field_id=219 AND text_value ~ 'https?://www\.cifor\.org/library/[[:digit:]]+/?';
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^https?://www\.cifor\.org/pid/([[:digit:]]+)/?$', 'https://www.cifor.org/knowledge/publication/\1') WHERE metadata_field_id=219 AND text_value ~ 'https?://www\.cifor\.org/pid/[[:digit:]]+';
</code></pre><ul>
<li>I did some cleanup on the author affiliations of the IITA data against our 2019-04 list using reconcile-csv and OpenRefine:
<ul>
<li><code>$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id</code></li>
<li>I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li>
</ul>
</li>
<li>I mapped one duplicate from the CIFOR Archives and re-uploaded the 94 IITA items to a new collection on <a href="https://dspacetest.cgiar.org/handle/10568/108453">DSpace Test</a></li>
</ul>
</article> </div> <!-- /.blog-main --> <aside class="col-sm-3 ml-auto blog-sidebar"> <section class="sidebar-module"> <h4>Recent Posts</h4> <ol class="list-unstyled"> <li><a href="/cgspace-notes/2020-09/">September, 2020</a></li> <li><a href="/cgspace-notes/2020-08/">August, 2020</a></li> <li><a href="/cgspace-notes/2020-07/">July, 2020</a></li> <li><a href="/cgspace-notes/2020-06/">June, 2020</a></li> <li><a
href="/cgspace-notes/2020-05/">May, 2020</a></li> </ol> </section> <section class="sidebar-module"> <h4>Links</h4> <ol class="list-unstyled"> <li><a href="https://cgspace.cgiar.org">CGSpace</a></li> <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li> <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li> </ol> </section> </aside> </div> <!-- /.row --> </div> <!-- /.container --> <footer class="blog-footer"> <p dir="auto"> Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>. </p> <p> <a href="#">Back to top</a> </p> </footer> </body> </html>