+++
date = "2017-07-01T18:03:52+03:00"
author = "Alan Orth"
title = "July, 2017"
tags = ["Notes"]

+++
## 2017-07-01

- Run system updates and reboot DSpace Test

## 2017-07-04

- Merge changes for WLE Phase II theme rename ([#329](https://github.com/ilri/DSpace/pull/329))
- Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace
- We can use PostgreSQL's expanded output format (`-x`) plus `sed` to format the output into quasi XML:

<!--more-->

```
$ psql dspacenew -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=5 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:</dc-type>\n<dc-type>\n<schema>cg</schema>:;s:([^ ]*) +\| (.*): <\1>\2</\1>:;s:^$:</dc-type>:;1s:</dc-type>\n::'
```

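For comparison, the reshaping that the `sed` script performs can be sketched in Python. This is a minimal sketch rather than what was run on the server: it assumes the expanded output uses `-[ RECORD n ]` header lines followed by `name | value` rows, it ignores multi-line values, and the function name `expanded_to_xml` and the sample rows are made up for illustration.

```python
import re
from xml.sax.saxutils import escape

# Header line that psql prints before each record in expanded (-x) mode
RECORD_HEADER = re.compile(r"^-\[ RECORD \d+ \][-+]*$")


def expanded_to_xml(text, schema="cg"):
    """Convert psql expanded output into one <dc-type> block per record."""
    records = []
    fields = None
    for line in text.splitlines():
        if RECORD_HEADER.match(line):
            fields = []
            records.append(fields)
        elif "|" in line and fields is not None:
            # Split on the first "|" only, so values may contain "|"
            name, _, value = line.partition("|")
            fields.append((name.strip(), value.strip()))

    out = []
    for fields in records:
        out.append("<dc-type>")
        out.append(f"  <schema>{escape(schema)}</schema>")
        for name, value in fields:
            out.append(f"  <{name}>{escape(value)}</{name}>")
        out.append("</dc-type>")
    return "\n".join(out)


# Hypothetical sample in the assumed layout, not real registry data
sample = """-[ RECORD 1 ]----------
element    | contributor
qualifier  | crp
scope_note | 
-[ RECORD 2 ]----------
element    | coverage
qualifier  | country
scope_note | Country names
"""

print(expanded_to_xml(sample))
```

Unlike the `sed` version, this escapes `&`, `<`, and `>` in values, so the result is closer to well-formed XML.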
- The `sed` script is from a post on the [PostgreSQL mailing list](https://www.postgresql.org/message-id/437E44A5.508%40ultimeth.com)
- Abenet says the ILRI board wants to be able to have "lead author" for every item, so I've whipped up a WIP test in the `5_x-lead-author` branch
- It works but is still very rough and we haven't thought out the whole lifecycle yet

![Testing lead author in submission form](/cgspace-notes/2017/07/lead-author-test.png)

- I assume that "lead author" would actually be the first question on the item submission form
- We also need to check to see which ORCID authority core this uses, because it seems to be using an entirely new one rather than the one for `dc.contributor.author` (which makes sense of course, but fuck, all the author problems aren't bad enough?!)
- Also would need to edit XMLUI item displays to incorporate this into authors list
- And fuck, then anyone consuming our data via REST / OAI will not notice that we have an author outside of `dc.contributor.author`... ugh
- What if we modify the item submission form to use [`type-bind` fields to show/hide certain fields depending on the type](https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ItemtypeBasedMetadataCollection)?

## 2017-07-05

- Adjust WLE Research Theme to include both Phase I and II on the submission form according to editor feedback ([#330](https://github.com/ilri/DSpace/pull/330))
- Generate list of fields in the current CGSpace `cg` schema so we can record them properly in the metadata registry:

```
$ psql dspace -x -c 'select element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id=2 order by element, qualifier;' | sed -r 's:^-\[ RECORD (.*) \]-+$:</dc-type>\n<dc-type>\n<schema>cg</schema>:;s:([^ ]*) +\| (.*): <\1>\2</\1>:;s:^$:</dc-type>:;1s:</dc-type>\n::' > cg-types.xml
```

- CGSpace was unavailable briefly, and I saw this error in the DSpace log file:

```
2017-07-05 13:05:36,452 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections
```

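PostgreSQL keeps a few connection slots back for superusers, and ordinary roles see this FATAL error once the remaining slots are used up. A rough sketch of the arithmetic, assuming the limit of 100 connections noted in this entry and PostgreSQL's default `superuser_reserved_connections` of 3 (the actual setting on the server is not recorded here); `ordinary_slots_left` is a hypothetical helper:

```python
def ordinary_slots_left(max_connections, reserved_for_superusers, in_use):
    """Return how many more connections non-superuser roles can still open."""
    return max(0, max_connections - reserved_for_superusers - in_use)


# Assumed values: a limit of 100 and the default reservation of 3
for in_use in (90, 97, 98):
    print(in_use, ordinary_slots_left(100, 3, in_use))
```

The count is approximate if some of the connections in use belong to superusers, but it shows why the error can appear before the nominal limit is reached.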
- Looking at the `pg_stat_activity` view I saw there were indeed 98 active connections to PostgreSQL, and at this time the limit is 100, so that makes sense
- Tsega restarted Tomcat and it's working now
- Abenet said she was generating a report with Atmire's CUA module, so it could be due to that?
- Looking in the logs I see this random error again that I should report to DSpace:

```
2017-07-05 13:50:07,196 ERROR org.dspace.statistics.SolrLogger @ COUNTRY ERROR: EU
```

- Seems to come from `dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java`

## 2017-07-06

- Sisay tried to help by making a [pull request for the RTB flagships](https://github.com/ilri/DSpace/pull/331) but there are formatting errors, unrelated changes, and the flagship names are not in the style I requested
- Abenet talked to CIP and they said they are actually ok with using collection names rather than adding a new metadata field

## 2017-07-13

- Remove `UKaid` from the controlled vocabulary for `dc.description.sponsorship`, as `Department for International Development, United Kingdom` is the correct form and it is already present ([#334](https://github.com/ilri/DSpace/pull/334))

## 2017-07-14

- Sisay sent me a patch to add "Photo Report" to `dc.type` so I've added it to the `5_x-prod` branch

## 2017-07-17

- Linode shut down our seventeen (17) VMs due to nonpayment of the July 1st invoice
- It took me a few hours to find the ICT/Finance contacts to pay the bill and boot all the servers back up
- Since the server was down anyway, I decided to run all system updates and re-deploy CGSpace so that the latest changes to `input-forms.xml` and the sponsors controlled vocabulary go live

## 2017-07-20

- Skype chat with Addis team about the status of the CGIAR Library migration
- Need to add the CGIAR System Organization subjects to Discovery Facets (test first)
- Tentative list of dates for the migration:
  - August 4: aim to finish data cleanup and then give Peter a list of authors
  - August 18: ready to show System Office
  - September 4: all feedback and decisions (including workflows) from System Office
  - September 10/11: go live?
- Talk to Tsega and Danny about exporting/ingesting the blog posts from Drupal into DSpace?
- Follow-up meeting on August 8/9?
- Sent Abenet the 2415 records from CGIAR Library's Historical Archive (10947/1) after cleaning up the author authorities and HTML entities in `dc.contributor.author` and `dc.description.abstract` using OpenRefine:
  - Authors: `value.replace(/::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600/,"")`
  - Abstracts: `replace(value,/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/,'')`

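To make the two OpenRefine transforms easier to try outside OpenRefine, here is a rough Python equivalent using the same regular expressions. The function names and sample values are hypothetical, and `re.sub` is assumed to behave like GREL's `replace` for these patterns.

```python
import re

# Authority suffix appended to author names: "::<uuid>::600"
AUTHORITY_SUFFIX = re.compile(r"::\w{8}-\w{4}-\w{4}-\w{4}-\w{12}::600")

# Opening, closing, and self-closing HTML tags, with optional attributes
HTML_TAG = re.compile(
    r'''</?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?>'''
)


def strip_authority(value):
    """Remove the authority key and confidence from an author value."""
    return AUTHORITY_SUFFIX.sub("", value)


def strip_tags(value):
    """Remove HTML tags from an abstract, keeping the text between them."""
    return HTML_TAG.sub("", value)


# Made-up sample values for illustration
print(strip_authority("Orth, Alan::a1b2c3d4-e5f6-7890-abcd-ef1234567890::600"))
print(strip_tags('<p>Some <b>bold</b> text with a <a href="x">link</a></p>'))
```

The second pattern only strips tags. Character references such as `&amp;` would need a separate step, for example `html.unescape` from the standard library.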
## 2017-07-24

- Move two top-level communities to be sub-communities of ILRI Projects

```
$ for community in 10568/2347 10568/25209; do /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child="$community"; done
```

- Discuss CGIAR Library data cleanup with Sisay and Abenet

## 2017-07-27

- Help Sisay with some transforms to add descriptions to the `filename` column of some CIAT Presentations he's working on in OpenRefine
- Marianne emailed a few days ago to ask why "Integrating Ecosystem Solutions" was not in the list of WLE Phase I Research Themes on the input form
- I told her that I only added the themes that I saw in the [WLE Phase I Research Themes](https://cgspace.cgiar.org/handle/10568/34508) community
- Then Mia from WLE also emailed to ask where some WLE focal regions went, and I said I didn't understand what she was talking about, as all we did in our previous work was rename the old "Research Themes" subcommunity to "WLE Phase I Research Themes" and add a new subcommunity for "WLE Phase II Research Themes".
- Discuss some modifications to the CCAFS project tags in CGSpace submission form and in the database

## 2017-07-28

- Discuss updates to the Phase II CCAFS project tags with Andrea from Macaroni Bros
- I will rename and untag the items in the CGSpace database, he will update his web service with the latest project tags, and then I will get the XML for our `input-forms.xml` from here: https://ccafs.cgiar.org/export/ccafsproject

## 2017-07-29

- Move some WLE items into appropriate Phase I Research Themes communities and delete some empty collections in WLE Regions community

## 2017-07-30

- Start working on CCAFS project tag cleanup
- More questions about inconsistencies and spelling mistakes in their tags, so I've sent some questions for followup

## 2017-07-31

- Looks like the final list of metadata corrections for CCAFS project tags will be:

```
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
```

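Before running corrections like these on the production database, it can help to rehearse them inside a transaction and check how many rows each statement touches. A minimal sketch of that idea, using an in-memory SQLite table as a stand-in for PostgreSQL's `metadatavalue`; the sample rows and the `preview` helper are hypothetical, and the real table has more columns.

```python
import sqlite3

# The four corrections from the notes, unchanged
CORRECTIONS = [
    "delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica'",
    "update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED'",
    "update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA'",
    "delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions'",
]


def preview(conn, statements):
    """Run each statement, record its affected row count, then roll back."""
    counts = []
    try:
        for sql in statements:
            counts.append(conn.execute(sql).rowcount)
    finally:
        conn.rollback()
    return counts


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table metadatavalue "
    "(resource_type_id integer, metadata_field_id integer, text_value text)"
)
# Made-up rows, only to give the statements something to match
conn.executemany(
    "insert into metadatavalue values (?, ?, ?)",
    [
        (2, 235, "PII-FP4_CRMWestAfrica"),
        (2, 235, "PII-FP4_CRMWestAfrica"),
        (2, 134, "FP3_VeitnamLED"),
        (2, 235, "PII-SEA_PIRCCA"),
        (2, 235, "PII-FP1_PIRCCA"),
    ],
)
conn.commit()  # keep the sample rows; only the corrections are rolled back

print(preview(conn, CORRECTIONS))
print(conn.execute("select count(*) from metadatavalue").fetchone()[0])
```

In `psql` the same rehearsal would wrap the statements in `BEGIN;` and `ROLLBACK;`, switching to `COMMIT;` once the row counts look right.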
- Now just waiting to run them on CGSpace, and then apply the modified input forms after Macaroni Bros give me an updated list
- Temporarily increase the nginx upload limit to 200MB for Sisay to upload the CIAT presentations
- Looking at CGSpace activity page, there are 52 Baidu bots concurrently crawling our website (I copied the activity page to a text file and grepped it)!

```
$ grep 180.76. /tmp/status | awk '{print $5}' | sort | uniq | wc -l
52
```

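The same count can be sketched in Python, which also makes the match on `180.76.` a literal substring test (in the `grep` pattern the unescaped dots match any character). This assumes the saved activity page has whitespace-separated columns; the sample lines and the `count_distinct` name are invented, and the notes do not say what the fifth column holds.

```python
def count_distinct(lines, needle="180.76.", column=5):
    """Count distinct values in a 1-based column among lines containing needle.

    Lines with fewer columns than requested are skipped.
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if needle in line and len(fields) >= column:
            seen.add(fields[column - 1])
    return len(seen)


# Invented lines in an assumed layout, not real activity-page output
sample = [
    "a b c d 180.76.15.1 x",
    "a b c d 180.76.15.2 x",
    "a b c d 180.76.15.1 x",
    "a b c d 66.249.64.1 x",
]
print(count_distinct(sample))
```

To run it against the saved file, pass an open file object, for example `count_distinct(open("/tmp/status"))`.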
- From looking at the `dspace.log` I see they are all using the same session, which means our Crawler Session Manager Valve is working