---
title: "March, 2018"
date: 2018-03-02T16:07:54+02:00
author: "Alan Orth"
tags: ["Notes"]
---

## 2018-03-02

- Export a CSV of the IITA community metadata for Martin Mueller

## 2018-03-06

- Add three new CCAFS project tags to `input-forms.xml` ([#357](https://github.com/ilri/DSpace/pull/357))
- Andrea from Macaroni Bros had sent me an email that CCAFS needs them
- Give Udana more feedback on his WLE records from last month
- There were some records using a non-breaking space in their AGROVOC subject field
- I checked and tested some author corrections from Peter from last week, and then applied them on CGSpace

```
$ ./fix-metadata-values.py -i Correct-309-authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
$ ./delete-metadata-values.py -i Delete-3-Authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3
```

- This time there were no errors in whitespace but I did have to correct one incorrectly encoded accent character
- Add new CRP subject "GRAIN LEGUMES AND DRYLAND CEREALS" to `input-forms.xml` ([#358](https://github.com/ilri/DSpace/pull/358))
- Merge the ORCID integration stuff into `5_x-prod` for deployment on CGSpace soon ([#359](https://github.com/ilri/DSpace/pull/359))
- Deploy ORCID changes on CGSpace (linode18), run all system updates, and reboot the server
- Run all system updates on DSpace Test and reboot server
- I ran the [orcid-authority-to-item.py](https://gist.github.com/alanorth/24d8081a5dc25e2a4e27e548e7e2389c) script on CGSpace and mapped 2,864 ORCID identifiers from Solr to item metadata

```
$ ./orcid-authority-to-item.py -db dspace -u dspace -p 'fuuu' -s http://localhost:8081/solr -d
```

- I ran the DSpace cleanup script on CGSpace and it threw an error (as always):

```
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
  Detail: Key (bitstream_id)=(150659) is still referenced from table "bundle".
```

- The solution is, as always:

```
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (150659);'
UPDATE 1
```

- Apply the proposed PostgreSQL indexes from DS-3636 (pull request [#1791](https://github.com/DSpace/DSpace/pull/1791/)) on CGSpace (linode18)

## 2018-03-07

- Add CIAT author Mauricio Efren Sotelo Cabrera to controlled vocabulary for ORCID identifiers ([#360](https://github.com/ilri/DSpace/pull/360))
- Help Sisay proof 200 IITA records on DSpace Test
- Finally import Udana's 24 items to [IWMI Journal Articles](https://cgspace.cgiar.org/handle/10568/36185) on CGSpace
- Skype with James Stapleton to discuss CGSpace, ILRI website, CKM staff issues, etc

## 2018-03-08

- Looking at a CSV dump of the CIAT community I see there are tons of stupid text languages people add for their metadata
- This makes the CSV have tons of columns, for example `dc.title`, `dc.title[]`, `dc.title[en]`, `dc.title[eng]`, `dc.title[en_US]` and so on!
- I think I can fix, or at least normalize, them in the database:

```
dspace=# select distinct text_lang from metadatavalue where resource_type_id=2;
 text_lang
-----------
 ethnob
 en
 spa
 EN
 En
 en_
 en_US
 E.
 EN_US
 en_U
 eng
 fr
 es_ES
 es
(16 rows)

dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('en','EN','En','en_','EN_US','en_U','eng');
UPDATE 122227
dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
 text_lang
-----------
 ethnob
 en_US
 spa
 E.
 fr
 es_ES
 es
(9 rows)
```

- On second inspection it looks like `dc.description.provenance` fields use the text_lang "en" so that's probably why there are over 100,000 fields changed...
- If I skip that, there are about 2,000, which seems more like the number of fields users have edited manually, or fucked up during CSV import, etc:

```
dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
UPDATE 2309
```

- I will apply this on CGSpace right now
- In other news, I was playing with adding ORCID identifiers to a dump of CIAT's community via CSV in OpenRefine
- Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the `cg.creator.id` field
- For example, a GREL expression in a custom text facet to get all items with `dc.contributor.author[en_US]` of a certain author with several name variations (this is how you use a logical OR in OpenRefine):

```
or(value.contains('Ceballos, Hern'), value.contains('Hernández Ceballos'))
```

- Then you can flag or star matching items and then use a conditional to either set the value directly or add it to an existing value:

```
if(isBlank(value), "Hernan Ceballos: 0000-0002-8744-7918", value + "||Hernan Ceballos: 0000-0002-8744-7918")
```

- One thing that bothers me is that this won't honor author order
- It might be better to do batches of these in PostgreSQL with a script that takes the `place` column of an author into account when setting the `cg.creator.id`
- I wrote a Python script to read the author names and ORCID identifiers from CSV and create matching `cg.creator.id` fields: [add-orcid-identifiers-csv.py](https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050)
- The CSV should have two columns: author name and ORCID identifier:

```
dc.contributor.author,cg.creator.id
"Orth, Alan",Alan S. Orth: 0000-0002-1735-7458
"Orth, A.",Alan S. Orth: 0000-0002-1735-7458
```

- I didn't integrate the ORCID API lookup for author names in this script for now because I was only interested in "tagging" old items for a few given authors
- I added ORCID identifiers for 187 items by CIAT's Hernan Ceballos, because that is what Elizabeth was trying to do manually!
- Also, I decided to add ORCID identifiers for all records from Peter, Abenet, and Sisay as well

## 2018-03-09

- Give James Stapleton input on Sisay's KRAs
- Create a pull request to disable ORCID authority integration for `dc.contributor.author` in the submission forms and XMLUI display ([#363](https://github.com/ilri/DSpace/pull/363))

## 2018-03-11

- Peter also wrote to say he is having issues with the Atmire Listings and Reports module
- When I logged in to try it I got a blank white page after continuing, and I saw this in dspace.log.2018-03-11:

```
2018-03-11 11:38:15,592 WARN  org.dspace.app.webui.servlet.InternalErrorServlet @ :session_id=91C2C0C59669B33A7683570F6010603A:internal_error:-- URL Was: https://cgspace.cgiar.org/jspui/listings-and-reports
-- Method: POST
-- Parameters were:
-- selected_admin_preset: "ilri authors2"
-- load: "normal"
-- next: "NEXT STEP >>"
-- step: "1"

org.apache.jasper.JasperException: java.lang.NullPointerException
```

- Looks like I needed to remove the Humidtropics subject from Listings and Reports because it was looking for the terms and couldn't find them
- I made a quick fix and it's working now ([#364](https://github.com/ilri/DSpace/pull/364))

## 2018-03-12

- Increase upload size on CGSpace's nginx config to 85MB so Sisay can upload some data
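
Looking back at the OpenRefine conditional from 2018-03-08, the same set-or-append logic can be sketched in Python. This is only an illustration and not taken from add-orcid-identifiers-csv.py: the function name `append_orcid` is hypothetical, and treating only `None` and the empty string as blank is an assumption about how `isBlank` behaves.

```python
def append_orcid(value, orcid, separator="||"):
    """Set the cell to the ORCID entry if it is blank, otherwise append it.

    Hypothetical helper mirroring the GREL if(isBlank(value), ...) expression.
    "Blank" here means None or the empty string (an assumption).
    """
    if value is None or value == "":
        return orcid
    # Existing values are kept and the new entry goes at the end
    return value + separator + orcid


entry = "Hernan Ceballos: 0000-0002-8744-7918"
print(append_orcid("", entry))
print(append_orcid("Alan S. Orth: 0000-0002-1735-7458", entry))
```

Like the GREL version, this always appends to the end of the cell, so it still does not honor author order.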