<!DOCTYPE html> <html lang="en" > <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> <meta property="og:title" content="September, 2020" /> <meta property="og:description" content="2020-09-02 Replace Marissa van Epp for Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it I restarted it again now and told Moayad that the automatic indexing isn’t working Add Alliance of Bioversity International and CIAT to affiliations on CGSpace Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/39 I filed an issue on OpenRXV to make some minor edits to the admin UI: https://github.com/ilri/OpenRXV/issues/40 " /> <meta property="og:type" content="article" /> <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-09/" /> <meta property="article:published_time" content="2020-09-02T15:35:54+03:00" /> <meta property="article:modified_time" content="2020-10-01T10:47:40+03:00" /> <meta name="twitter:card" content="summary"/> <meta name="twitter:title" content="September, 2020"/> <meta name="twitter:description" content="2020-09-02 Replace Marissa van Epp for Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it I restarted it again now and told Moayad that the automatic indexing isn’t working Add Alliance of Bioversity International and CIAT to affiliations on CGSpace Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/39 I filed an issue on OpenRXV to make some minor edits to the admin UI: https://github.com/ilri/OpenRXV/issues/40 
"/> <meta name="generator" content="Hugo 0.128.2"> <script type="application/ld+json"> { "@context": "http://schema.org", "@type": "BlogPosting", "headline": "September, 2020", "url": "https://alanorth.github.io/cgspace-notes/2020-09/", "wordCount": "2970", "datePublished": "2020-09-02T15:35:54+03:00", "dateModified": "2020-10-01T10:47:40+03:00", "author": { "@type": "Person", "name": "Alan Orth" }, "keywords": "Notes" } </script> <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2020-09/"> <title>September, 2020 | CGSpace Notes</title> <!-- combined, minified CSS --> <link href="https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel="stylesheet" integrity="sha256-xrqAvFBmlVdkWr4F+GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin="anonymous"> <!-- minified Font Awesome for SVG icons --> <script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz+lcnA=" crossorigin="anonymous"></script> <!-- RSS 2.0 feed --> </head> <body> <div class="blog-masthead"> <div class="container"> <nav class="nav blog-nav"> <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a> </nav> </div> </div> <header class="blog-header"> <div class="container"> <h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1> <p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p> </div> </header> <div class="container"> <div class="row"> <div class="col-sm-8 blog-main"> <article class="blog-post"> <header> <h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-09/">September, 2020</a></h2> <p class="blog-post-meta"> <time datetime="2020-09-02T15:35:54+03:00">Wed Sep 02, 
2020</time> in <span class="fas fa-folder" aria-hidden="true"></span> <a href="/categories/notes/" rel="category tag">Notes</a> </p> </header> <h2 id="2020-09-02">2020-09-02</h2> <ul> <li>Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS</li> <li>The AReS Explorer hasn’t updated its index since 2020-08-22 when I last forced it <ul> <li>I restarted it again and told Moayad that the automatic indexing isn’t working</li> </ul> </li> <li>Add <code>Alliance of Bioversity International and CIAT</code> to affiliations on CGSpace</li> <li>Abenet told me that the general search text on AReS doesn’t get reset when you use the “Reset Filters” button <ul> <li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/39">https://github.com/ilri/OpenRXV/issues/39</a></li> </ul> </li> <li>I filed an issue on OpenRXV to make some minor edits to the admin UI: <a href="https://github.com/ilri/OpenRXV/issues/40">https://github.com/ilri/OpenRXV/issues/40</a></li> </ul> <ul> <li>I ran the country code tagger on CGSpace:</li> </ul> <pre tabindex="0"><code>$ time chrt -b 0 dspace curate -t countrycodetagger -i all -r - -l 500 -s object | tee /tmp/2020-09-02-countrycodetagger.log
...
real    2m10.516s
user    1m43.953s
sys     0m15.192s
$ grep -c added /tmp/2020-09-02-countrycodetagger.log
39
</code></pre><ul> <li>I still need to create a cron job for this…</li> <li>Sisay and Abenet said they can’t log in with LDAP on DSpace Test (DSpace 6) <ul> <li>I tried and I can’t either… but it is working on CGSpace</li> <li>The error on DSpace 6 is:</li> </ul> </li> </ul> <pre tabindex="0"><code>2020-09-02 12:03:10,666 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=A629116488DCC467E1EA2062A2E2EFD7:ip_addr=92.220.02.201:failed_login:no DN found for user aorth
</code></pre><ul> <li>I tried to query LDAP directly using the application credentials with <code>ldapsearch</code>, and it works:</li> </ul> <pre tabindex="0"><code>$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b "dc=cgiarad,dc=org" -D "applicationaccount@cgiarad.org" -W "(sAMAccountName=me)"
</code></pre><ul> <li>According to the <a href="https://wiki.lyrasis.org/display/DSDOC6x/Authentication+Plugins#AuthenticationPlugins-LDAPAuthentication">DSpace 6 docs</a>, we need to escape commas in our LDAP parameters due to the new configuration system <ul> <li>I escaped the commas and restarted DSpace (though technically we shouldn’t need to restart, since the new configuration system hot-reloads configs)</li> <li>Run all system updates on DSpace Test (linode26) and reboot it</li> <li>After the restart, LDAP login works…</li> </ul> </li> </ul> <h2 id="2020-09-03">2020-09-03</h2> <ul> <li>Fix some erroneous “review status” fields that Abenet noticed on AReS <ul> <li>I used my <code>fix-metadata-values.py</code> and <code>delete-metadata-values.py</code> scripts with the following input files:</li> </ul> </li> </ul> <pre tabindex="0"><code>$ cat 2020-09-03-fix-review-status.csv
dc.description.version,correct
Externally Peer Reviewed,Peer Review
Peer Reviewed,Peer Review
Peer review,Peer Review
Peer reviewed,Peer Review
Peer-Reviewed,Peer Review
Peer-reviewed,Peer Review
peer Review,Peer Review
$ cat 2020-09-03-delete-review-status.csv
dc.description.version
Report
Formally Published
Poster
Unrefereed reprint
$ ./delete-metadata-values.py -i 2020-09-03-delete-review-status.csv -db dspace -u dspace -p 'fuuu' -f dc.description.version -m 68
$ ./fix-metadata-values.py -i 2020-09-03-fix-review-status.csv -db dspace -u dspace -p 'fuuu' -f dc.description.version -t 'correct' -m 68
</code></pre><ul> <li>Start reviewing 95 items for IITA (20201stbatch) <ul> <li>I used my <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> tool to check and fix some low-hanging fruit first</li> <li>This fixed a few instances of unnecessary Unicode, excessive whitespace, invalid multi-value separators, and duplicate metadata values</li> <li>Then I looked at the data in OpenRefine and noticed some things: <ul> <li>All issue dates use only the year, but some citations include the month, so the dates could be more specific</li> <li>I normalized all the DOIs to use the “<a href="https://doi.org">https://doi.org</a>” format</li> <li>I fixed a few AGROVOC subjects with a simple GREL: <code>value.replace("GRAINS","GRAIN").replace("SOILS","SOIL").replace("CORN","MAIZE")</code></li> <li>But there are a few more invalid ones that she will have to look at</li> <li>I uploaded the items to <a href="https://dspacetest.cgiar.org/handle/10568/108357">DSpace Test</a> and it was apparently successful, but I got these errors in the console:</li> </ul> </li> </ul> </li> </ul> <pre tabindex="0"><code>Thu Sep 03 12:26:33 CEST 2020 | Query:containerItem:ea7a2648-180d-4fce-bdc5-c3aa2304fc58 Error while updating
java.lang.NullPointerException
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl$5.visit(SourceFile:1131)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.visitEachStatisticShard(SourceFile:212)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1104)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1093)
        at org.dspace.statistics.StatisticsLoggingConsumer.consume(SourceFile:104)
        at org.dspace.event.BasicDispatcher.consume(BasicDispatcher.java:177)
        at org.dspace.event.BasicDispatcher.dispatch(BasicDispatcher.java:123)
        at org.dspace.core.Context.dispatchEvents(Context.java:455)
        at org.dspace.core.Context.commit(Context.java:424)
        at org.dspace.core.Context.complete(Context.java:380)
        at org.dspace.app.bulkedit.MetadataImport.main(MetadataImport.java:1399)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
</code></pre><ul> <li>There are more errors like this in the DSpace log, so I will raise it with Atmire immediately</li> </ul> <h2 id="2020-09-04">2020-09-04</h2> <ul> <li>I was checking the recent IITA data for duplicates when I noticed one in CIFOR’s Archive and saw that CIFOR has updated a bunch of their website URLs, for example: <ul> <li><a href="http://www.cifor.org/nc/online-library/browse/view-publication/publication/151.html">http://www.cifor.org/nc/online-library/browse/view-publication/publication/151.html</a> → <a href="https://www.cifor.org/knowledge/publication/151">https://www.cifor.org/knowledge/publication/151</a></li> <li><a href="https://www.cifor.org/library/4033">https://www.cifor.org/library/4033</a> → <a href="https://www.cifor.org/knowledge/publication/4033">https://www.cifor.org/knowledge/publication/4033</a></li> <li><a href="https://www.cifor.org/pid/5087">https://www.cifor.org/pid/5087</a> → <a href="https://www.cifor.org/knowledge/publication/5087">https://www.cifor.org/knowledge/publication/5087</a></li> </ul> </li> <li>I will update our nearly 6,000 metadata values for CIFOR
in the database accordingly:</li> </ul> <pre tabindex="0"><code>dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^(http://)?www\.cifor\.org/(nc/)?online-library/browse/view-publication/publication/([[:digit:]]+)\.html$', 'https://www.cifor.org/knowledge/publication/\3') WHERE metadata_field_id=219 AND text_value ~ 'www\.cifor\.org/(nc/)?online-library/browse/view-publication/publication/[[:digit:]]+';
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^https?://www\.cifor\.org/library/([[:digit:]]+)/?$', 'https://www.cifor.org/knowledge/publication/\1') WHERE metadata_field_id=219 AND text_value ~ 'https?://www\.cifor\.org/library/[[:digit:]]+/?';
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^https?://www\.cifor\.org/pid/([[:digit:]]+)/?$', 'https://www.cifor.org/knowledge/publication/\1') WHERE metadata_field_id=219 AND text_value ~ 'https?://www\.cifor\.org/pid/[[:digit:]]+';
</code></pre><ul> <li>I did some cleanup on the author affiliations of the IITA data using our 2019-04 list with reconcile-csv and OpenRefine: <ul> <li><code>$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id</code></li> <li>I always forget how to copy the reconciled values in OpenRefine: you need to make a new column and populate it using this GREL: <code>if(cell.recon.matched, cell.recon.match.name, value)</code></li> </ul> </li> <li>I mapped one duplicate from the CIFOR Archive and re-uploaded the 94 IITA items to a new collection on <a href="https://dspacetest.cgiar.org/handle/10568/108453">DSpace Test</a></li> </ul> <h2 id="2020-09-08">2020-09-08</h2> <ul> <li>I noticed that the “share” link in AReS wasn’t working properly because it omits the “explorer” part of the URI</li> </ul> <p><img src="/cgspace-notes/2020/09/ares-share-link.png" alt="AReS share link broken"></p> <ul> <li>I filed an issue on GitHub: <a
href="https://github.com/ilri/OpenRXV/issues/41">https://github.com/ilri/OpenRXV/issues/41</a></li> <li>I uploaded the 94 IITA items that I had been working on last week to CGSpace</li> <li>RTB emailed to ask why they are getting HTTP 503 errors while harvesting content for the RTB WordPress website <ul> <li>From the screenshot I can see they are requesting URLs like this:</li> </ul> </li> </ul> <pre tabindex="0"><code>https://cgspace.cgiar.org/bitstream/handle/10568/82745/Characteristics-Silage.JPG
</code></pre><ul> <li>So they end up getting rate limited by our XMLUI rate limits <ul> <li>I told them to use the REST API bitstream retrieve links instead, because we don’t have any rate limits there</li> </ul> </li> </ul> <h2 id="2020-09-09">2020-09-09</h2> <ul> <li>Wire up the systemd service/timer for the CGSpace Country Code Tagger curation task in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> <ul> <li><del>For now it won’t work on DSpace 6 because the curation task invocation needs to be slightly different (minus the <code>-l</code> parameter) and for some reason the task isn’t working on DSpace Test (version 6) right now</del></li> <li>I added DSpace 6 support to the playbook templates…</li> </ul> </li> <li>Run system updates on DSpace Test (linode26), re-deploy the DSpace 6 test branch, and reboot the server <ul> <li>After rebooting, I deleted old copies of the cgspace-java-helpers JAR from the DSpace lib directory, and then the curation task worked</li> <li>To my great surprise, the curation task also worked (and completed, albeit a few times slower) on my local DSpace 6 environment:</li> </ul> </li> </ul> <pre tabindex="0"><code>$ ~/dspace63/bin/dspace curate -t countrycodetagger -i all -s object
</code></pre><h2 id="2020-09-10">2020-09-10</h2> <ul> <li>I checked the country code tagger on CGSpace and DSpace Test and it ran fine from the systemd timer last night… w00t</li> <li>I started looking at Peter’s changes to the CGSpace regions
that were proposed in 2020-07 <ul> <li>The changes will be:</li> </ul> </li> </ul> <pre tabindex="0"><code>$ cat 2020-09-10-fix-cgspace-regions.csv
cg.coverage.region,correct
EAST AFRICA,EASTERN AFRICA
WEST AFRICA,WESTERN AFRICA
SOUTHEAST ASIA,SOUTHEASTERN ASIA
SOUTH ASIA,SOUTHERN ASIA
AFRICA SOUTH OF SAHARA,SUB-SAHARAN AFRICA
NORTH AFRICA,NORTHERN AFRICA
WEST ASIA,WESTERN ASIA
SOUTHWEST ASIA,SOUTHWESTERN ASIA
$ ./fix-metadata-values.py -i 2020-09-10-fix-cgspace-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -t 'correct' -m 227 -d -n
Connected to database.
Would fix 12227 occurences of: EAST AFRICA
Would fix 7996 occurences of: WEST AFRICA
Would fix 3515 occurences of: SOUTHEAST ASIA
Would fix 3443 occurences of: SOUTH ASIA
Would fix 1134 occurences of: AFRICA SOUTH OF SAHARA
Would fix 357 occurences of: NORTH AFRICA
Would fix 81 occurences of: WEST ASIA
Would fix 3 occurences of: SOUTHWEST ASIA
</code></pre><ul> <li>I think we need to wait for the web team, though, as they need to update their mappings <ul> <li>Not to mention that we’ll need to give WLE and CCAFS time to update their harvesters as well… hmmm</li> </ul> </li> <li>Looking at the top user agents active on CGSpace in 2020-08, I see: <ul> <li><code>Delphi 2009</code>: 235353 (this is the GARDIAN harvester, I guess, as the IP is in Greece)</li> <li><code>Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)</code>: 57004 (IP is 18.196.100.94, and the requests seem to be for CTA’s content)</li> <li><code>RTB website BOT</code>: 12282</li> <li><code>ILRI Livestock Website Publications importer BOT</code>: 9393</li> </ul> </li> <li>Shit, I meant to add Delphi to the DSpace spider agents list last month but I guess I didn’t commit the change</li> <li>HTTrack is in the agents list so I’m not sure why DSpace registers a hit from that request</li> <li>Also, I am surprised to see the RTB and ILRI bots here because they have “BOT” in their names, so they should also be dropped</li> <li>I also see hits
from <code>curl</code>, <code>Java/1.8.0_66</code>, and <code>Apache-HttpClient</code>, so WTF… those are supposed to be dropped by the default agents list</li> <li>One IP (<code>2607:f298:5:101d:f816:3eff:fed9:a484</code>) made 9,000 requests with the <code>RI/1.0</code> user agent this year… <ul> <li>That’s on DreamHost…?</li> </ul> </li> <li>I purged 448658 hits from these agents and added <code>Delphi</code> to our local agents override for Solr, as well as to Tomcat’s Crawler Session Manager Valve so that Tomcat forces them to re-use a single session</li> <li>I made a pull request to the COUNTER-Robots project for the Daum robot: <a href="https://github.com/atmire/COUNTER-Robots/pull/38">https://github.com/atmire/COUNTER-Robots/pull/38</a> <ul> <li>This bot made 8,000 requests to CGSpace this year</li> <li>I purged about 20,000 total requests from this bot from our Solr stats for the last few years</li> </ul> </li> </ul> <h2 id="2020-09-11">2020-09-11</h2> <ul> <li>Peter noticed that an export from AReS shows some items with zero views and others with zero views/downloads, but on CGSpace and in the statistics API there are views/downloads <ul> <li>I need to ask Moayad…</li> </ul> </li> </ul> <h2 id="2020-09-12">2020-09-12</h2> <ul> <li>Carlos Tejo from the LandPortal emailed to ask for advice about integrating their <a href="https://landvoc.org/">LandVoc</a> vocabulary, which is a subset of AGROVOC, into DSpace <ul> <li>I told him that they could use the DSpace authority control framework and sent an example of the VIAFAuthority from the DSpace-CRIS project: <a href="https://github.com/4Science/DSpace/blob/dspace-6_x_x-cris/dspace-api/src/main/java/org/dspace/content/authority/VIAFAuthority.java">https://github.com/4Science/DSpace/blob/dspace-6_x_x-cris/dspace-api/src/main/java/org/dspace/content/authority/VIAFAuthority.java</a></li> </ul> </li> <li>Redeploy the latest <code>5_x-prod</code> branch on CGSpace, re-run the latest Ansible DSpace playbook, run all system
updates, and reboot the server (linode18) <ul> <li>This will bring in the latest bot lists for Solr and Tomcat</li> <li>I had to restart Tomcat 7 three times before all Solr statistics cores came up OK</li> </ul> </li> <li>Leroy and Carol from CIAT/Bioversity were asking for information about posting to the CGSpace REST API from SharePoint <ul> <li>I told them that we don’t allow this yet, but that we need to check in the future whether content can be posted to a workflow</li> </ul> </li> </ul> <h2 id="2020-09-15">2020-09-15</h2> <ul> <li>Charlotte from Altmetric said they had issues parsing the XML file I sent them last month <ul> <li>I told them that it mimicked the format they had sent me (fourteen pages of XML responses concatenated together)!</li> </ul> </li> <li>A few days ago IWMI asked us if we could add a new field on CGSpace for their library identifier <ul> <li>The IDs look like this: H049940</li> <li>I suggested that we use <code>cg.identifier.iwmilibrary</code></li> <li>I added it to the input forms, pushed it to the <code>5_x-prod</code> and 6.x branches, and will re-deploy it in the next few days</li> </ul> </li> <li>Abenet asked me to import sixty-nine (69) CIP Annual Reports to CGSpace <ul> <li>I looked at the data in OpenRefine and it is of very good quality</li> <li>The only change I made was to add descriptions to the filename field (with this GREL) so that SAFBuilder will add them to the bitstreams on import:</li> </ul> </li> </ul> <pre tabindex="0"><code>value + "__description:" + cells["dc.type"].value
</code></pre><ul> <li>Then I created a SAF bundle with SAFBuilder:</li> </ul> <pre tabindex="0"><code>$ ./safbuilder.sh -c ~/Downloads/cip-annual-reports/cip-reports.csv
</code></pre><ul> <li>And imported them into my local test instance of CGSpace:</li> </ul> <pre tabindex="0"><code>$ ~/dspace/bin/dspace import -a -e y.arrr@cgiar.org -m /tmp/2020-09-15-cip-annual-reports.map -s ~/Downloads/cip-annual-reports/SimpleArchiveFormat
</code></pre><ul> <li>Then I uploaded them to
CGSpace</li> </ul> <h2 id="2020-09-16">2020-09-16</h2> <ul> <li>Looking further into Carlos Tejo’s question about integrating LandVoc (the AGROVOC subset) into DSpace <ul> <li>I see that you can actually get LandVoc concepts directly from AGROVOC’s SPARQL endpoint, for example with <a href="http://agrovoc.uniroma2.it/sparql#query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX+skos%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%3E%0A%0ASELECT+%3Fconcept%0AWHERE+%7B%0A++%3Fconcept+a+skos%3AConcept+%3B%0A+++++++++++skos%3AinScheme+%3Chttp%3A%2F%2Flandvoc.org%2Flandvoc%3E+.%0A%0A%7D+ORDER+BY+%3Fconcept&contentTypeConstruct=text%2Fturtle&contentTypeSelect=application%2Fsparql-results%2Bjson&endpoint=http%3A%2F%2Fagrovoc.uniroma2.it%2Fsparql&requestMethod=POST&tabTitle=Query&headers=%7B%7D&outputFormat=table">this query</a></li> </ul> </li> </ul> <p><img src="/cgspace-notes/2020/09/agrovoc-landvoc-sparql.png" alt="AGROVOC LandVoc SPARQL"></p> <ul> <li>So maybe we can query AGROVOC directly using a similar method to <a href="https://github.com/4Science/DSpace/blob/dspace-5_x_x-cris/dspace-api/src/main/java/org/dspace/content/authority/TGNAuthority.java">DSpace-CRIS’s GettyAuthority</a></li> <li>I wired up DSpace-CRIS’s VIAFAuthority to see how authorities for auto-suggested names get stored <ul> <li>After submission you can see the item’s VIAF identifier:</li> </ul> </li> </ul> <p><img src="/cgspace-notes/2020/09/viaf-authority.png" alt="VIAF authority"></p> <ul> <li>And this identifier is the ID on VIAF, pretty cool!</li> </ul> <p><img src="/cgspace-notes/2020/09/viaf-darwin.png" alt="VIAF entry for Charles Darwin"></p> <ul> <li>I did a similar test with the Getty Thesaurus of Geographic Names (TGN) and it stores the concept URI in the authority:</li> </ul> <p><img src="/cgspace-notes/2020/09/tgn-concept-uri.png" alt="TGNAuthority"></p> <ul> <li>But the authority values are not exposed anywhere as metadata… <ul> <li>I need to play
with it a bit more I guess…</li> </ul> </li> <li>The nice thing is that the Getty example from DSpace-CRIS uses SPARQL as well, and the TGN authority extends it <ul> <li>We could use a similar model for AGROVOC/LandVoc very easily</li> </ul> </li> </ul> <h2 id="2020-09-17">2020-09-17</h2> <ul> <li>Maria from Bioversity asked about the ORCID identifier of one of her colleagues, which seems to have been removed from our list <ul> <li>I re-added it to our controlled vocabulary and added the identifier to fifty-one of his existing items on CGSpace using my script:</li> </ul> </li> </ul> <pre tabindex="0"><code>$ cat 2020-09-17-add-bioversity-orcids.csv
dc.contributor.author,cg.creator.id
"Etten, Jacob van","Jacob van Etten: 0000-0001-7554-2558"
"van Etten, Jacob","Jacob van Etten: 0000-0001-7554-2558"
$ ./add-orcid-identifiers-csv.py -i 2020-09-17-add-bioversity-orcids.csv -db dspace -u dspace -p 'dom@in34sniper'
</code></pre><ul> <li>I sent a follow-up message to Atmire to look into the two remaining issues with the DSpace 6 upgrade <ul> <li>First, Listings and Reports returns zero results for any search</li> <li>Second, the error we get during CSV imports</li> </ul> </li> <li>Help Natalia and Cathy from Bioversity-CIAT with their OpenSearch query on “trade offs” again <ul> <li>They wanted to build a search query with multiple filters (type, crpsubject, status) and the general query “trade offs”</li> <li>I found a great <a href="https://www.kiwi.fi/pages/viewpage.action?pageId=45782169">reference for DSpace’s OpenSearch syntax</a> (albeit in Finnish, but the example URLs show the syntax clearly)</li> <li>We can use quotes, <code>AND</code>, and <code>OR</code>, and even group search parameters with parentheses!</li> <li>So I built a query for Natalia that uses these (shown without URL encoding so you can see the syntax):</li> </ul> </li> </ul> <pre tabindex="0"><code>https://cgspace.cgiar.org/open-search/discover?query=type:"Journal Article" AND status:"Open Access" AND crpsubject:"Water, Land and Ecosystems" AND "tradeoffs"&amp;rpp=100
</code></pre><ul> <li>I noticed that my <code>move-collections.sh</code> script didn’t work on DSpace 6 because of the change from IDs to UUIDs, so I modified it to quote the collection <code>resource_id</code> parameters in the PostgreSQL query</li> </ul> <h2 id="2020-09-18">2020-09-18</h2> <ul> <li>Help Natalia with her WLE “tradeoffs” search query again…</li> </ul> <h2 id="2020-09-20">2020-09-20</h2> <ul> <li>Deploy the latest 5_x-prod branch on CGSpace, run all system updates, and reboot the server <ul> <li>To my great surprise, all the Solr statistics cores came up correctly after the reboot</li> </ul> </li> <li>Deploy the latest 6_x-dev branch on DSpace Test, run all system updates, and reboot the server</li> </ul> <h2 id="2020-09-22">2020-09-22</h2> <ul> <li>Abenet sent some feedback about AReS <ul> <li>The item views and downloads are still incorrect</li> <li>I looked in the server’s API logs and there are no errors, and the database has many more views/downloads:</li> </ul> </li> </ul> <pre tabindex="0"><code>dspacestatistics=# SELECT SUM(views) FROM items;
   sum
----------
 15714024
(1 row)

dspacestatistics=# SELECT SUM(downloads) FROM items;
   sum
----------
 13979911
(1 row)
</code></pre><ul> <li>I deleted “Report” from twelve items that had it in their peer review field:</li> </ul> <pre tabindex="0"><code>dspace=# BEGIN;
BEGIN
dspace=# DELETE FROM metadatavalue WHERE text_value='Report' AND resource_type_id=2 AND metadata_field_id=68;
DELETE 12
dspace=# COMMIT;
</code></pre><ul> <li>I added all CG center- and CRP-specific subject fields and mapped them to <code>dc.subject</code> in AReS</li> <li>After forcing a re-harvest, the review status is now much cleaner and the missing subjects are available</li> <li>Last week Natalia from CIAT had asked me to download all the PDFs for a certain query: <ul> <li>items with status “Open Access”</li> <li>items with type “Journal
Article”</li> <li>items containing any of the following words: water land and ecosystems &amp; trade offs</li> <li>The resulting OpenSearch query is: <code>https://cgspace.cgiar.org/open-search/discover?query=type:"Journal Article" AND status:"Open Access" AND Water Land Ecosystems trade offs&amp;rpp=1</code></li> <li>There were 241 results with a total of 208 PDFs, which I downloaded with my <code>get-wle-pdfs.py</code> script and shared with her via bashupload.com</li> </ul> </li> </ul> <h2 id="2020-09-23">2020-09-23</h2> <ul> <li>Peter said he was having problems submitting items to CGSpace <ul> <li>On a hunch I looked at the PostgreSQL locks in Munin and indeed the usual issue with locks is back (though I haven’t seen it in a few months?)</li> </ul> </li> </ul> <p><img src="/cgspace-notes/2020/09/postgres_connections_ALL-day.png" alt="PostgreSQL connections day"></p> <ul> <li>Instead of restarting Tomcat I restarted the PostgreSQL service, and then Peter said he was able to submit the item…</li> <li>Experiment with doing direct queries for items in the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> <ul> <li>I tested querying a handful of item UUIDs with a date range and returning their hits faceted by <code>id</code></li> <li>Assuming a list of item UUIDs was posted to the REST API, we could prepare them for a Solr query by joining them into a string with “OR” and escaping the hyphens:</li> </ul> </li> </ul> <pre tabindex="0"><code>...
item_ids = ['0079470a-87a1-4373-beb1-b16e3f0c4d81', '007a9df1-0871-4612-8b28-5335982198cb']
item_ids_str = ' OR '.join(item_ids).replace('-', r'\-')
...
solr_query_params = {
    "q": f"id:({item_ids_str})",
    "fq": "type:2 AND isBot:false AND statistics_type:view AND time:[2020-01-01T00:00:00Z TO 2020-09-02T00:00:00Z]",
    "facet": "true",
    "facet.field": "id",
    "facet.mincount": 1,
    "facet.limit": 1,
    "facet.offset": 0,
    "stats": "true",
    "stats.field": "id",
    "stats.calcdistinct": "true",
    "shards": shards,
    "rows": 0,
    "wt": "json",
}
</code></pre><ul> <li>The date range format for Solr is important, but it seems we only need to append <code>T00:00:00Z</code> to normal ISO 8601 YYYY-MM-DD date strings</li> </ul> <h2 id="2020-09-25">2020-09-25</h2> <ul> <li>I did some more work on the dspace-statistics-api and finalized support for sending a POST request to <code>/items</code>:</li> </ul> <pre tabindex="0"><code>$ curl -s -d @request.json https://dspacetest.cgiar.org/rest/statistics/items | json_pp
{
   "currentPage" : 0,
   "limit" : 10,
   "statistics" : [
      {
         "downloads" : 3329,
         "id" : "b2c1bbfd-65b0-438c-9e49-d271c49b2696",
         "views" : 1565
      },
      {
         "downloads" : 3797,
         "id" : "f44cf173-2344-4eb2-8f00-ee55df32c76f",
         "views" : 48
      },
      {
         "downloads" : 11064,
         "id" : "8542f9da-9ce1-4614-abf4-f2e3fdb4b305",
         "views" : 26
      },
      {
         "downloads" : 6782,
         "id" : "2324aa41-e9de-4a2b-bc36-16241464683e",
         "views" : 19
      },
      {
         "downloads" : 48,
         "id" : "0fe573e7-042a-4240-a4d9-753b61233908",
         "views" : 12
      },
      {
         "downloads" : 0,
         "id" : "000e61ca-695d-43e5-9ab8-1f3fd7a67a32",
         "views" : 4
      },
      {
         "downloads" : 0,
         "id" : "000dc7cd-9485-424b-8ecf-78002613cc87",
         "views" : 1
      },
      {
         "downloads" : 0,
         "id" : "000e1616-3901-4431-80b1-c6bc67312d8c",
         "views" : 1
      },
      {
         "downloads" : 0,
         "id" : "000ea897-5557-49c7-9f54-9fa192c0f83b",
         "views" : 1
      },
      {
         "downloads" : 0,
         "id" : "000ec427-97e5-4766-85a5-e8dd62199ab5",
         "views" : 1
      }
   ],
   "totalPages" : 13
}
</code></pre><ul> <li>I deployed it on DSpace Test and sent a note to Salem so he can test it</li> <li>I still need to add tests…</li> <li>After that I will probably tag it as version 1.3.0</li> </ul> <h2 id="2020-09-25-1">2020-09-25</h2> <ul> <li>Atmire
responded with some notes about the issues we’re having with CUA and L&amp;R on DSpace Test <ul> <li>They think they have found the cause of the issues…</li> </ul> </li> </ul> <h2 id="2020-09-29">2020-09-29</h2> <ul> <li>Atmire sent a pull request yesterday with a potential fix for the Listings and Reports (L&amp;R) issue <ul> <li>I tried to build it on DSpace Test but I got an HTTP 401 Unauthorized for the artifact</li> <li>I sent them a message…</li> </ul> </li> </ul> <h2 id="2020-09-30">2020-09-30</h2> <ul> <li>Experiment with re-creating IWMI’s “Monthly Abstract” type report with an AReS template <ul> <li>The template library for reports is: <a href="https://docxtemplater.com">https://docxtemplater.com</a></li> <li>Conditions start with a pound sign and end with a slash: <code>{#items} {/items}</code></li> <li>An inverted section begins with a caret (hat) and ends with a slash: <code>{^citation} No citation{/citation}</code></li> <li>I found a bug: templates with a space in the file name don’t download</li> <li>It would be nice if we could use <a href="https://docxtemplater.readthedocs.io/en/latest/angular_parse.html">angular expressions</a> to make more complex templates <ul> <li>Ability to iterate over authors (to change the separator)</li> <li>Ability to get the item number in a loop (for a list)</li> <li>Ability to do things like checking whether a CRP is “WLE”</li> </ul> </li> </ul> </li> </ul> <!-- raw HTML omitted --> </article> </div> <!-- /.blog-main --> <aside class="col-sm-3 ml-auto blog-sidebar"> <section class="sidebar-module"> <h4>Recent Posts</h4> <ol class="list-unstyled"> <li><a href="/cgspace-notes/2024-07/">July, 2024</a></li> <li><a href="/cgspace-notes/2024-06/">June, 2024</a></li> <li><a href="/cgspace-notes/2024-05/">May, 2024</a></li> <li><a href="/cgspace-notes/2024-04/">April, 2024</a></li> <li><a href="/cgspace-notes/2024-03/">March, 2024</a></li> </ol> </section> <section class="sidebar-module"> <h4>Links</h4> <ol class="list-unstyled"> <li><a
href="https://cgspace.cgiar.org">CGSpace</a></li> <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li> <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li> </ol> </section> </aside> </div> <!-- /.row --> </div> <!-- /.container --> <footer class="blog-footer"> <p dir="auto"> Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>. </p> <p> <a href="#">Back to top</a> </p> </footer> </body> </html>