+++ date = "2017-07-01T18:03:52+03:00" author = "Alan Orth" title = "July, 2017" tags = ["Notes"] +++ ## 2017-07-01 - Run system updates and reboot DSpace Test ## 2017-07-04 - Merge changes for WLE Phase II theme rename ([#329](https://github.com/ilri/DSpace/pull/329)) - Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace - We can use PostgreSQL's extended output format (`-x`) plus `sed` to format the output into quasi XML: ``` $ psql dspacenew -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=5 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:\n\ncg:;s:([^ ]*) +\| (.*): <\1>\2:;s:^$::;1s:\n::' ``` - The `sed` script is from a post on the [PostgreSQL mailing list](https://www.postgresql.org/message-id/437E44A5.508%40ultimeth.com) - Abenet says the ILRI board wants to be able to have "lead author" for every item, so I've whipped up a WIP test in the `5_x-lead-author` branch - It works but is still very rough and we haven't thought out the whole lifecycle yet ![Testing lead author in submission form](/cgspace-notes/2017/07/lead-author-test.png) - I assume that "lead author" would actually be the first question on the item submission form - We also need to check to see which ORCID authority core this uses, because it seems to be using an entirely new one rather than the one for `dc.contributor.author` (which makes sense of course, but fuck, all the author problems aren't bad enough?!) - Also would need to edit XMLUI item displays to incorporate this into authors list - And fuck, then anyone consuming our data via REST / OAI will not notice that we have an author outside of `dc.contributor.authors`... ugh - What if we modify the item submission form to use [`type-bind` fields to show/hide certain fields depending on the type](https://wiki.lyrasis.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ItemtypeBasedMetadataCollection)? ## 2017-07-05 - Adjust WLE Research Theme to include both Phase I and II on the submission form according to editor feedback ([#330](https://github.com/ilri/DSpace/pull/330)) - Generate list of fields in the current CGSpace `cg` scheme so we can record them properly in the metadata registry: ``` $ psql dspace -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=2 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:\n\ncg:;s:([^ ]*) +\| (.*): <\1>\2:;s:^$::;1s:\n::' > cg-types.xml ``` - CGSpace was unavailable briefly, and I saw this error in the DSpace log file: ``` 2017-07-05 13:05:36,452 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections ``` - Looking at the `pg_stat_activity` table I saw there were indeed 98 active connections to PostgreSQL, and at this time the limit is 100, so that makes sense - Tsega restarted Tomcat and it's working now - Abenet said she was generating a report with Atmire's CUA module, so it could be due to that? - Looking in the logs I see this random error again that I should report to DSpace: ``` 2017-07-05 13:50:07,196 ERROR org.dspace.statistics.SolrLogger @ COUNTRY ERROR: EU ``` - Seems to come from `dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java` ## 2017-07-06 - Sisay tried to help by making a [pull request for the RTB flagships](https://github.com/ilri/DSpace/pull/331) but there are formatting errors, unrelated changes, and the flagship names are not in the style I requested - Abenet talked to CIP and they said they are actually ok with using collection names rather than adding a new metadata field ## 2017-07-13 - Remove `UKaid` from the controlled vocabulary for `dc.description.sponsorship`, as `Department for International Development, United Kingdom` is the correct form and it is already present ([#334](https://github.com/ilri/DSpace/pull/334)) ## 2017-07-14 - Sisay sent me a patch to add "Photo Report" to `dc.type` so I've added it to the `5_x-prod` branch ## 2017-07-17 - Linode shut down our seventeen (17) VMs due to nonpayment of the July 1st invoice - It took me a few hours to find the ICT/Finance contacts to pay the bill and boot all the servers back up - Since the server was down anyways, I decided to run all system updates and re-deploy CGSpace so that the latest changes to `input-forms.xml` and the sponsors controlled vocabulary ## 2017-07-20 - Skype chat with Addis team about the status of the CGIAR Library migration - Need to add the CGIAR System Organization subjects to Discovery Facets (test first) - Tentative list of dates for the migration: - August 4: aim to finish data cleanup and then give Peter a list of authors - August 18: ready to show System Office - September 4: all feedback and decisions (including workflows) from System Office - September 10/11: go live? - Talk to Tsega and Danny about exporting/injesting the blog posts from Drupal into DSpace? - Followup meeting on August 8/9? - Sent Abenet the 2415 records from CGIAR Library's Historical Archive (10947/1) after cleaning up the author authorities and HTML entities in `dc.contributor.author` and `dc.description.abstract` using OpenRefine: - Authors: `value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,"")` - Abstracts: `replace(value,/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/,'')` ## 2017-07-24 - Move two top-level communities to be sub-communities of ILRI Projects ``` $ for community in 10568/2347 10568/25209; do /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child="$community"; done ``` - Discuss CGIAR Library data cleanup with Sisay and Abenet ## 2017-07-27 - Help Sisay with some transforms to add descriptions to the `filename` column of some CIAT Presentations he's working on in OpenRefine - Marianne emailed a few days ago to ask why "Integrating Ecosystem Solutions" was not in the list of WLE Phase I Research Themes on the input form - I told her that I only added the themes that I saw in the [WLE Phase I Research Themes](https://cgspace.cgiar.org/handle/10568/34508) community - Then Mia from WLE also emailed to ask where some WLE focal regions went, and I said I didn't understand what she was talking about, as all we did in our previous work was rename the old "Research Themes" subcommunity to "WLE Phase I Research Themes" and add a new subcommunity for "WLE Phase II Research Themes". - Discuss some modifications to the CCAFS project tags in CGSpace submission form and in the database ## 2017-07-28 - Discuss updates to the Phase II CCAFS project tags with Andrea from Macaroni Bros - I will do the renaming and untagging of items in CGSpace database, and he will update his webservice with the latest project tags and I will get the XML from here for our `input-forms.xml`: https://ccafs.cgiar.org/export/ccafsproject ## 2017-07-29 - Move some WLE items into appropriate Phase I Research Themes communities and delete some empty collections in WLE Regions community ## 2017-07-30 - Start working on CCAFS project tag cleanup - More questions about inconsistencies and spelling mistakes in their tags, so I've sent some questions for followup ## 2017-07-31 - Looks like the final list of metadata corrections for CCAFS project tags will be: ``` delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica'; update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED'; update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA'; delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions'; ``` - Now just waiting to run them on CGSpace, and then apply the modified input forms after Macaroni Bros give me an updated list - Temporarily increase the nginx upload limit to 200MB for Sisay to upload the CIAT presentations - Looking at CGSpace activity page, there are 52 Baidu bots concurrently crawling our website (I copied the activity page to a text file and grep it)! ``` $ grep 180.76. /tmp/status | awk '{print $5}' | sort | uniq | wc -l 52 ``` - From looking at the `dspace.log` I see they are all using the same session, which means our Crawler Session Manager Valve is working