cgspace-notes/content/posts/2017-07.md
Alan Orth 246538db59
Update DSpace wiki URL
They moved to Lyrasis in 2019.
2020-04-13 15:30:34 +03:00

150 lines
8.5 KiB
Markdown

+++
date = "2017-07-01T18:03:52+03:00"
author = "Alan Orth"
title = "July, 2017"
tags = ["Notes"]
+++
## 2017-07-01
- Run system updates and reboot DSpace Test
## 2017-07-04
- Merge changes for WLE Phase II theme rename ([#329](https://github.com/ilri/DSpace/pull/329))
- Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace
- We can use PostgreSQL's extended output format (`-x`) plus `sed` to format the output into quasi XML:
<!--more-->
```
$ psql dspacenew -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=5 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:</dc-type>\n<dc-type>\n<schema>cg</schema>:;s:([^ ]*) +\| (.*): <\1>\2</\1>:;s:^$:</dc-type>:;1s:</dc-type>\n::'
```
- The `sed` script is from a post on the [PostgreSQL mailing list](https://www.postgresql.org/message-id/437E44A5.508%40ultimeth.com)
- Abenet says the ILRI board wants to be able to have "lead author" for every item, so I've whipped up a WIP test in the `5_x-lead-author` branch
- It works but is still very rough and we haven't thought out the whole lifecycle yet
![Testing lead author in submission form](/cgspace-notes/2017/07/lead-author-test.png)
- I assume that "lead author" would actually be the first question on the item submission form
- We also need to check to see which ORCID authority core this uses, because it seems to be using an entirely new one rather than the one for `dc.contributor.author` (which makes sense of course, but fuck, all the author problems aren't bad enough?!)
- Also would need to edit XMLUI item displays to incorporate this into authors list
- And fuck, then anyone consuming our data via REST / OAI will not notice that we have an author outside of `dc.contributor.authors`... ugh
- What if we modify the item submission form to use [`type-bind` fields to show/hide certain fields depending on the type](https://wiki.lyrasis.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ItemtypeBasedMetadataCollection)?
## 2017-07-05
- Adjust WLE Research Theme to include both Phase I and II on the submission form according to editor feedback ([#330](https://github.com/ilri/DSpace/pull/330))
- Generate list of fields in the current CGSpace `cg` scheme so we can record them properly in the metadata registry:
```
$ psql dspace -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=2 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:</dc-type>\n<dc-type>\n<schema>cg</schema>:;s:([^ ]*) +\| (.*): <\1>\2</\1>:;s:^$:</dc-type>:;1s:</dc-type>\n::' > cg-types.xml
```
- CGSpace was unavailable briefly, and I saw this error in the DSpace log file:
```
2017-07-05 13:05:36,452 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections
```
- Looking at the `pg_stat_activity` table I saw there were indeed 98 active connections to PostgreSQL, and at this time the limit is 100, so that makes sense
- Tsega restarted Tomcat and it's working now
- Abenet said she was generating a report with Atmire's CUA module, so it could be due to that?
- Looking in the logs I see this random error again that I should report to DSpace:
```
2017-07-05 13:50:07,196 ERROR org.dspace.statistics.SolrLogger @ COUNTRY ERROR: EU
```
- Seems to come from `dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java`
## 2017-07-06
- Sisay tried to help by making a [pull request for the RTB flagships](https://github.com/ilri/DSpace/pull/331) but there are formatting errors, unrelated changes, and the flagship names are not in the style I requested
- Abenet talked to CIP and they said they are actually ok with using collection names rather than adding a new metadata field
## 2017-07-13
- Remove `UKaid` from the controlled vocabulary for `dc.description.sponsorship`, as `Department for International Development, United Kingdom` is the correct form and it is already present ([#334](https://github.com/ilri/DSpace/pull/334))
## 2017-07-14
- Sisay sent me a patch to add "Photo Report" to `dc.type` so I've added it to the `5_x-prod` branch
## 2017-07-17
- Linode shut down our seventeen (17) VMs due to nonpayment of the July 1st invoice
- It took me a few hours to find the ICT/Finance contacts to pay the bill and boot all the servers back up
- Since the server was down anyways, I decided to run all system updates and re-deploy CGSpace so that the latest changes to `input-forms.xml` and the sponsors controlled vocabulary
## 2017-07-20
- Skype chat with Addis team about the status of the CGIAR Library migration
- Need to add the CGIAR System Organization subjects to Discovery Facets (test first)
- Tentative list of dates for the migration:
- August 4: aim to finish data cleanup and then give Peter a list of authors
- August 18: ready to show System Office
- September 4: all feedback and decisions (including workflows) from System Office
- September 10/11: go live?
- Talk to Tsega and Danny about exporting/injesting the blog posts from Drupal into DSpace?
- Followup meeting on August 8/9?
- Sent Abenet the 2415 records from CGIAR Library's Historical Archive (10947/1) after cleaning up the author authorities and HTML entities in `dc.contributor.author` and `dc.description.abstract` using OpenRefine:
- Authors: `value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,"")`
- Abstracts: `replace(value,/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/,'')`
## 2017-07-24
- Move two top-level communities to be sub-communities of ILRI Projects
```
$ for community in 10568/2347 10568/25209; do /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child="$community"; done
```
- Discuss CGIAR Library data cleanup with Sisay and Abenet
## 2017-07-27
- Help Sisay with some transforms to add descriptions to the `filename` column of some CIAT Presentations he's working on in OpenRefine
- Marianne emailed a few days ago to ask why "Integrating Ecosystem Solutions" was not in the list of WLE Phase I Research Themes on the input form
- I told her that I only added the themes that I saw in the [WLE Phase I Research Themes](https://cgspace.cgiar.org/handle/10568/34508) community
- Then Mia from WLE also emailed to ask where some WLE focal regions went, and I said I didn't understand what she was talking about, as all we did in our previous work was rename the old "Research Themes" subcommunity to "WLE Phase I Research Themes" and add a new subcommunity for "WLE Phase II Research Themes".
- Discuss some modifications to the CCAFS project tags in CGSpace submission form and in the database
## 2017-07-28
- Discuss updates to the Phase II CCAFS project tags with Andrea from Macaroni Bros
- I will do the renaming and untagging of items in CGSpace database, and he will update his webservice with the latest project tags and I will get the XML from here for our `input-forms.xml`: https://ccafs.cgiar.org/export/ccafsproject
## 2017-07-29
- Move some WLE items into appropriate Phase I Research Themes communities and delete some empty collections in WLE Regions community
## 2017-07-30
- Start working on CCAFS project tag cleanup
- More questions about inconsistencies and spelling mistakes in their tags, so I've sent some questions for followup
## 2017-07-31
- Looks like the final list of metadata corrections for CCAFS project tags will be:
```
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
```
- Now just waiting to run them on CGSpace, and then apply the modified input forms after Macaroni Bros give me an updated list
- Temporarily increase the nginx upload limit to 200MB for Sisay to upload the CIAT presentations
- Looking at CGSpace activity page, there are 52 Baidu bots concurrently crawling our website (I copied the activity page to a text file and grep it)!
```
$ grep 180.76. /tmp/status | awk '{print $5}' | sort | uniq | wc -l
52
```
- From looking at the `dspace.log` I see they are all using the same session, which means our Crawler Session Manager Valve is working