---
title: "September, 2020"
date: 2020-09-02T15:35:54+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2020-09-02

- Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS
- The AReS Explorer hasn't updated its index since 2020-08-22, when I last forced it
  - I restarted it again now and told Moayad that the automatic indexing isn't working
- Add `Alliance of Bioversity International and CIAT` to affiliations on CGSpace
- Abenet told me that the general search text on AReS doesn't get reset when you use the "Reset Filters" button
  - I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/39
- I filed an issue on OpenRXV to make some minor edits to the admin UI: https://github.com/ilri/OpenRXV/issues/40
- I ran the country code tagger on CGSpace:

```
$ time chrt -b 0 dspace curate -t countrycodetagger -i all -r - -l 500 -s object | tee /tmp/2020-09-02-countrycodetagger.log
...
real    2m10.516s
user    1m43.953s
sys     0m15.192s
$ grep -c added /tmp/2020-09-02-countrycodetagger.log
39
```

- I still need to create a cron job for this...
- Sisay and Abenet said they can't log in with LDAP on DSpace Test (DSpace 6), but it is working on CGSpace
  - I tried and I can't either...
  - The error on DSpace 6 is:

```
2020-09-02 12:03:10,666 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=A629116488DCC467E1EA2062A2E2EFD7:ip_addr=92.220.02.201:failed_login:no DN found for user aorth
```

- I tried to query LDAP directly using the application credentials with ldapsearch and it works:

```
$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b "dc=cgiarad,dc=org" -D "applicationaccount@cgiarad.org" -W "(sAMAccountName=me)"
```

- According to the [DSpace 6 docs](https://wiki.lyrasis.org/display/DSDOC6x/Authentication+Plugins#AuthenticationPlugins-LDAPAuthentication) we need to escape commas in our LDAP parameters due to the new configuration system
  - I escaped the commas and restarted DSpace (though technically we shouldn't need to restart, due to the new config system hot reloading configs)
- Run all system updates on DSpace Test (linode26) and reboot it
  - After the restart LDAP login works...

## 2020-09-03

- Fix some erroneous "review status" fields that Abenet noticed on AReS
  - I used my `fix-metadata-values.py` and `delete-metadata-values.py` scripts with the following input files:

```
$ cat 2020-09-03-fix-review-status.csv
dc.description.version,correct
Externally Peer Reviewed,Peer Review
Peer Reviewed,Peer Review
Peer review,Peer Review
Peer reviewed,Peer Review
Peer-Reviewed,Peer Review
Peer-reviewed,Peer Review
peer Review,Peer Review
$ cat 2020-09-03-delete-review-status.csv
dc.description.version
Report
Formally Published
Poster
Unrefereed reprint
$ ./delete-metadata-values.py -i 2020-09-03-delete-review-status.csv -db dspace -u dspace -p 'fuuu' -f dc.description.version -m 68
$ ./fix-metadata-values.py -i 2020-09-03-fix-review-status.csv -db dspace -u dspace -p 'fuuu' -f dc.description.version -t 'correct' -m 68
```

- Start reviewing 95 items for IITA (20201stbatch)
  - I used my [csv-metadata-quality](https://github.com/ilri/csv-metadata-quality) tool to check and fix some low-hanging fruit
first
  - This fixed a few unnecessary Unicode, excessive whitespace, invalid multi-value separator, and duplicate metadata values
- Then I looked at the data in OpenRefine and noticed some things:
  - All issue dates use year only, but some have months in the citation so they could be more specific
  - I normalized all the DOIs to use the "https://doi.org" format
  - I fixed a few AGROVOC subjects with a simple GREL: `value.replace("GRAINS","GRAIN").replace("SOILS","SOIL").replace("CORN","MAIZE")`
  - But there are a few more invalid ones that she will have to look at
- I uploaded the items to [DSpace Test](https://dspacetest.cgiar.org/handle/10568/108357) and it was apparently successful, but I got these errors on the console:

```
Thu Sep 03 12:26:33 CEST 2020 | Query:containerItem:ea7a2648-180d-4fce-bdc5-c3aa2304fc58
Error while updating
java.lang.NullPointerException
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl$5.visit(SourceFile:1131)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.visitEachStatisticShard(SourceFile:212)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1104)
        at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1093)
        at org.dspace.statistics.StatisticsLoggingConsumer.consume(SourceFile:104)
        at org.dspace.event.BasicDispatcher.consume(BasicDispatcher.java:177)
        at org.dspace.event.BasicDispatcher.dispatch(BasicDispatcher.java:123)
        at org.dspace.core.Context.dispatchEvents(Context.java:455)
        at org.dspace.core.Context.commit(Context.java:424)
        at org.dspace.core.Context.complete(Context.java:380)
        at org.dspace.app.bulkedit.MetadataImport.main(MetadataImport.java:1399)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
```

- There are more in the DSpace log, so I will raise it with Atmire immediately

## 2020-09-04

- I was checking the recent IITA data for duplicates when I noticed one in CIFOR's archive, and saw that CIFOR has updated a bunch of their website URLs, for example:
  - http://www.cifor.org/nc/online-library/browse/view-publication/publication/151.html → https://www.cifor.org/knowledge/publication/151
  - https://www.cifor.org/library/4033 → https://www.cifor.org/knowledge/publication/4033
  - https://www.cifor.org/pid/5087 → https://www.cifor.org/knowledge/publication/5087
- I will update our nearly 6,000 metadata values for CIFOR in the database accordingly:

```
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^(http://)?www\.cifor\.org/(nc/)?online-library/browse/view-publication/publication/([[:digit:]]+)\.html$', 'https://www.cifor.org/knowledge/publication/\3') WHERE metadata_field_id=219 AND text_value ~ 'www\.cifor\.org/(nc/)?online-library/browse/view-publication/publication/[[:digit:]]+';
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^https?://www\.cifor\.org/library/([[:digit:]]+)/?$', 'https://www.cifor.org/knowledge/publication/\1') WHERE metadata_field_id=219 AND text_value ~ 'https?://www\.cifor\.org/library/[[:digit:]]+/?';
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^https?://www\.cifor\.org/pid/([[:digit:]]+)/?$', 'https://www.cifor.org/knowledge/publication/\1') WHERE metadata_field_id=219 AND text_value ~ 'https?://www\.cifor\.org/pid/[[:digit:]]+';
```

- I did some cleanup on the author affiliations of the IITA data against our 2019-04 list using reconcile-csv and OpenRefine:
  - `$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id`
  - I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: `if(cell.recon.matched, cell.recon.match.name, value)`
- I mapped one duplicate from the CIFOR Archive and re-uploaded the 94 IITA items to a new collection on [DSpace Test](https://dspacetest.cgiar.org/handle/10568/108453)

## 2020-09-08

- I noticed that the "share" link in AReS wasn't working properly because it excludes the "explorer" part of the URI

![AReS share link broken](/cgspace-notes/2020/09/ares-share-link.png)

- I filed an issue on GitHub: https://github.com/ilri/OpenRXV/issues/41
- I uploaded the 94 IITA items that I had been working on last week to CGSpace
- RTB emailed to ask why they are getting HTTP 503 errors while harvesting CGSpace content for the RTB WordPress website
  - From the screenshot I can see they are requesting URLs like this:

```
https://cgspace.cgiar.org/bitstream/handle/10568/82745/Characteristics-Silage.JPG
```

  - So they end up getting rate limited due to the XMLUI rate limits
  - I told them to use the REST API bitstream retrieve links, because we don't have any rate limits there

## 2020-09-09

- Wire up the systemd service/timer for the CGSpace Country Code Tagger curation task in the [Ansible infrastructure scripts](https://github.com/ilri/rmg-ansible-public)
- ~~For now it won't work on DSpace 6 because the curation task invocation needs to be slightly different (minus the `-l` parameter) and for some reason the task isn't working on DSpace Test (version 6) right now~~
- I added DSpace 6 support to the playbook templates...
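A rough sketch of what such a systemd service/timer pair could look like — the unit names, user, paths, and schedule here are assumptions for illustration, not the actual Ansible templates:

```
# /etc/systemd/system/dspace-curation-tasks.service (hypothetical name and paths)
[Unit]
Description=DSpace Country Code Tagger curation task

[Service]
Type=oneshot
User=dspace
# DSpace 6 invocation; on DSpace 5 we also pass "-l 500"
ExecStart=/home/dspace/bin/dspace curate -t countrycodetagger -i all -r - -s object

# /etc/systemd/system/dspace-curation-tasks.timer (hypothetical)
[Unit]
Description=Run DSpace curation tasks nightly

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

With something like this in place the task would be enabled via `systemctl enable --now dspace-curation-tasks.timer`.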
- Run system updates on DSpace Test (linode26), re-deploy the DSpace 6 test branch, and reboot the server
  - After rebooting I deleted old copies of the cgspace-java-helpers JAR in the DSpace lib directory and then the curation worked
- To my great surprise the curation worked (and completed, albeit a few times slower) on my local DSpace 6 environment as well:

```
$ ~/dspace63/bin/dspace curate -t countrycodetagger -i all -s object
```

## 2020-09-10

- I checked the country code tagger on CGSpace and DSpace Test and it ran fine from the systemd timer last night... w00t
- I started looking at Peter's changes to the CGSpace regions that were proposed in 2020-07
  - The changes will be:

```
$ cat 2020-09-10-fix-cgspace-regions.csv
cg.coverage.region,correct
EAST AFRICA,EASTERN AFRICA
WEST AFRICA,WESTERN AFRICA
SOUTHEAST ASIA,SOUTHEASTERN ASIA
SOUTH ASIA,SOUTHERN ASIA
AFRICA SOUTH OF SAHARA,SUB-SAHARAN AFRICA
NORTH AFRICA,NORTHERN AFRICA
WEST ASIA,WESTERN ASIA
SOUTHWEST ASIA,SOUTHWESTERN ASIA
$ ./fix-metadata-values.py -i 2020-09-10-fix-cgspace-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -t 'correct' -m 227 -d -n
Connected to database.
Would fix 12227 occurences of: EAST AFRICA
Would fix 7996 occurences of: WEST AFRICA
Would fix 3515 occurences of: SOUTHEAST ASIA
Would fix 3443 occurences of: SOUTH ASIA
Would fix 1134 occurences of: AFRICA SOUTH OF SAHARA
Would fix 357 occurences of: NORTH AFRICA
Would fix 81 occurences of: WEST ASIA
Would fix 3 occurences of: SOUTHWEST ASIA
```

- I think we need to wait for the web team, though, as they need to update their mappings
- Not to mention that we'll need to give WLE and CCAFS time to update their harvesters as well... hmmm
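For reference, the heart of a fix like this is just one parameterized UPDATE per CSV row. A rough dry-run sketch of that idea (this is not the actual `fix-metadata-values.py`, which lives in the ilri/DSpace scripts repository and talks to PostgreSQL via psycopg2):

```python
import csv
import io

# A few sample rows in the same shape as 2020-09-10-fix-cgspace-regions.csv
CSV_DATA = """cg.coverage.region,correct
EAST AFRICA,EASTERN AFRICA
WEST AFRICA,WESTERN AFRICA
SOUTHWEST ASIA,SOUTHWESTERN ASIA
"""

def build_updates(csv_text, field_name, metadata_field_id):
    """Yield one (sql, params) pair per (incorrect, correct) mapping row.

    The real script executes these against PostgreSQL, counting matching
    rows first so that a dry run (-n) can report what it would change.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        sql = ("UPDATE metadatavalue SET text_value = %s "
               "WHERE metadata_field_id = %s AND text_value = %s")
        yield sql, (row["correct"], metadata_field_id, row[field_name])

if __name__ == "__main__":
    # Field ID 227 is cg.coverage.region on CGSpace
    for sql, params in build_updates(CSV_DATA, "cg.coverage.region", 227):
        print(params)
```

Keeping the values as bind parameters (rather than interpolating them into the SQL) is what lets the same script safely handle metadata values containing quotes or commas.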