mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-16 11:57:03 +01:00
223 lines
12 KiB
Markdown
223 lines
12 KiB
Markdown
---
|
|
title: "December, 2019"
|
|
date: 2019-12-01T11:22:30+02:00
|
|
author: "Alan Orth"
|
|
categories: ["Notes"]
|
|
---
|
|
|
|
## 2019-12-01
|
|
|
|
- Upgrade CGSpace (linode18) to Ubuntu 18.04:
|
|
- Check any packages that have residual configs and purge them:
|
|
- <code># dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P</code>
|
|
- Make sure all packages are up to date and the package manager is up to date, then reboot:
|
|
|
|
```
|
|
# apt update && apt full-upgrade
|
|
# apt-get autoremove && apt-get autoclean
|
|
# dpkg -C
|
|
# reboot
|
|
```
|
|
|
|
<!--more-->
|
|
|
|
- Take some backups:
|
|
|
|
```
|
|
# dpkg -l > 2019-12-01-linode18-dpkg.txt
|
|
# tar czf 2019-12-01-linode18-etc.tar.gz /etc
|
|
```
|
|
- Then check all third-party repositories in /etc/apt to see if everything using "xenial" has packages available for "bionic" and then update the sources:
|
|
- <code># sed -i 's/xenial/bionic/' /etc/apt/sources.list.d/*.list</code>
|
|
- Pause the Uptime Robot monitoring for CGSpace
|
|
- Make sure the update manager is installed and do the upgrade:
|
|
|
|
```
|
|
# apt install update-manager-core
|
|
# do-release-upgrade
|
|
```
|
|
|
|
- After the upgrade finishes, remove Java 11, force the installation of bionic nginx, and reboot the server:
|
|
|
|
```
|
|
# apt purge openjdk-11-jre-headless
|
|
# apt install 'nginx=1.16.1-1~bionic'
|
|
# reboot
|
|
```
|
|
|
|
- After the server comes back up, remove Python virtualenvs that were created with Python 3.5 and re-run certbot to make sure it's working:
|
|
|
|
```
|
|
# rm -rf /opt/eff.org/certbot/venv/bin/letsencrypt
|
|
# rm -rf /opt/ilri/dspace-statistics-api/venv
|
|
# /opt/certbot-auto
|
|
```
|
|
|
|
- Clear Ansible's fact cache and re-run the playbooks to update the system's firewalls, SSH config, etc
|
|
- Altmetric finally responded to my question about Dublin Core fields
|
|
- They shared a [list of fields they use for tracking](https://help.altmetric.com/support/solutions/articles/6000141419-what-metadata-is-required-to-track-our-content-), but it only mentions HTML meta tags, and not fields considered when harvesting via OAI
|
|
- Anyways, there might be some areas we can improve on the HTML meta tags, if I look at one [item with a DOI, ISSN, etc](https://hdl.handle.net/10568/101623) I see that we could at least add status (Open Access) and journal title
|
|
- I merged a [pull request](https://github.com/ilri/DSpace/pull/438) into the `5_x-prod` branch to add status and journal title to the XHTML meta tags
|
|
|
|
## 2019-12-02
|
|
|
|
- Raise the issue of old, low-quality thumbnails with Peter and the CGSpace team
|
|
- I suggested that we move manually uploaded thumbnails from the `ORIGINAL` bundle to the `THUMBNAIL` bundle
|
|
- Also replace old thumbnails where an item is available on Slideshare or YouTube because those are easy to get new, high-quality thumbnails for
|
|
- Continue testing CG Core v2 implementation on DSpace Test
|
|
- Compare the OAI QDC representation of a few items on CGSpace vs DSpace Test:
|
|
|
|
```
|
|
$ http 'https://cgspace.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:cgspace.cgiar.org:10568/104030' > /tmp/cgspace-104030.xml
|
|
$ http 'https://dspacetest.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:cgspace.cgiar.org:10568/104030' > /tmp/dspacetest-104030.xml
|
|
```
|
|
|
|
- The DSpace Test ones actually now capture the DOI, where the CGSpace doesn't...
|
|
- And the DSpace Test one doesn't include review status as `dc.description`, but I don't think that's an important field
|
|
|
|
## 2019-12-04
|
|
|
|
- Peter noticed that there were about seventy items on CGSpace that were marked as private
|
|
- Some have been withdrawn, but I extracted a list of the forty-eight that were not:
|
|
|
|
```
|
|
dspace=# \COPY (SELECT handle, owning_collection FROM item, handle WHERE item.discoverable='f' AND item.in_archive='t' AND handle.resource_id = item.item_id) to /tmp/2019-12-04-CGSpace-private-items.csv WITH CSV HEADER;
|
|
COPY 48
|
|
```
|
|
|
|
## 2019-12-05
|
|
|
|
- Give [presentation about CG Core v2](https://hdl.handle.net/10568/106045) to the MEL Developers' Retreat in Nairobi, Kenya (via Skype)
|
|
- Send some pull requests to the cg-core schema repository:
|
|
- [HTML syntax fixes](https://github.com/AgriculturalSemantics/cg-core/pull/16)
|
|
- [Add LICENSE file](https://github.com/AgriculturalSemantics/cg-core/pull/17)
|
|
- [Build main.css using npm build](https://github.com/AgriculturalSemantics/cg-core/pull/18)
|
|
|
|
## 2019-12-08
|
|
|
|
- Enrico noticed that the AReS Explorer on CGSpace (linode18) was down
|
|
- I only see HTTP 502 in the nginx logs on CGSpace... so I assume it's something wrong with the AReS server
|
|
- I ran all system updates on the AReS server (linode20) and rebooted it
|
|
- After rebooting the Explorer was accessible again
|
|
|
|
## 2019-12-09
|
|
|
|
- Update PostgreSQL JDBC driver to [version 42.2.9](https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.9) in [Ansible playbooks](https://github.com/ilri/rmg-ansible-public)
|
|
- Deploy on DSpace Test (linode19) to test before deploying on CGSpace in a few days
|
|
- Altmetric responded to my question about [the WLE item](https://hdl.handle.net/10568/97087) that has a lower score than its DOI
|
|
- They say that they will "reprocess" the item "before Christmas"
|
|
|
|
## 2019-12-11
|
|
|
|
- Post [message to Yammer about good practices for thumbnails on CGSpace](https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=454830191804416)
|
|
- On the topic of thumbnails, I'm thinking we might want to force regenerate all PDF thumbnails on CGSpace since we upgraded it to Ubuntu 18.04 and got a new ghostscript...
|
|
- More discussion about report formats for AReS
|
|
- Peter noticed that the Atmire reports weren't showing any statistics before 2019
|
|
- I checked and indeed Solr had an issue loading some core last time it was started
|
|
- I restarted Tomcat three times before all cores came up successfully
|
|
- While I was restarting the Tomcat service I upgraded the PostgreSQL JDBC driver to version 42.2.9, which had been deployed on DSpace Test earlier this week
|
|
|
|
## 2019-12-16
|
|
|
|
- Visit CodeObia office to discuss next phase of OpenRXV/AReS development
|
|
- We discussed using CSV instead of Excel for tabular reports
|
|
- OpenRXV should only have "simple" reports with Dublin Core fields
|
|
- AReS should have this as well as a customized "extended" report that has CRPs, Subjects, Sponsors, etc from CGSpace
|
|
- We discussed using RTF instead of Word for graphical reports
|
|
|
|
## 2019-12-17
|
|
|
|
- Start filing GitHub issues for the reporting features on OpenRXV and AReS
|
|
- I created an issue for the "simple" tabular reports on OpenRXV GitHub ([#29](https://github.com/ilri/OpenRXV/issues/29))
|
|
- I created an issue for the "extended" tabular reports on AReS GitHub ([#8](https://github.com/ilri/AReS/issues/8))
|
|
- I created an issue for "simple" text reports on the OpenRXV GitHub ([#30](https://github.com/ilri/OpenRXV/issues/30))
|
|
- I created an issue for "extended" text reports on the AReS GitHub ([#9](https://github.com/ilri/AReS/issues/9))
|
|
- I looked into creating RTF documents from HTML in Node.js and there is a library called [html-to-rtf](https://www.npmjs.com/package/html-to-rtf) that works well, but doesn't support images
|
|
- Export a list of all investors (`dc.description.sponsorship`) for Peter to look through and correct:
|
|
|
|
```
|
|
dspace=# \COPY (SELECT DISTINCT text_value as "dc.contributor.sponsor", count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 29 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-12-17-investors.csv WITH CSV HEADER;
|
|
COPY 643
|
|
```
|
|
|
|
## 2019-12-18
|
|
|
|
- Apply the investor corrections and deletions from Peter on CGSpace:
|
|
|
|
```
|
|
$ ./fix-metadata-values.py -i /tmp/2019-12-17-investors-fix-112.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29 -t correct -d
|
|
$ ./delete-metadata-values.py -i /tmp/2019-12-17-investors-delete-68.csv -db dspace -u dspace -p 'fuuu' -m 29 -f dc.description.sponsorship -d
|
|
```
|
|
|
|
- Peter asked about the "Open Government Licence 3.0" that is used by [some items]()
|
|
- I notice that it [exists in SPDX as `UGL-UK-3.0`](https://spdx.org/licenses/OGL-UK-3.0.html) so I created a GitHub issue to add this to our controlled vocabulary ([#439](https://github.com/ilri/DSpace/issues/439))
|
|
- I only see two in our database that use this for now, so I will update them:
|
|
|
|
```
|
|
dspace=# SELECT text_value FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%Open%';
|
|
text_value
|
|
-----------------------------
|
|
Open Government License 3.0
|
|
Open Government License 3.0
|
|
(2 rows)
|
|
dspace=# UPDATE metadatavalue SET text_value='OGL-UK-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%Open Government License 3.0%';
|
|
UPDATE 2
|
|
```
|
|
|
|
- I created a pull request to add the license and merged it to the `5_x-prod` branch ([#440](https://github.com/ilri/DSpace/pull/440))
|
|
- Add three new CCAFS Phase II project tags to CGSpace ([#441](https://github.com/ilri/DSpace/pull/441))
|
|
- Linode said DSpace Test (linode19) had an outbound traffic rate of 73Mb/sec for the last two hours
|
|
- I see some Russian bot active in nginx's access logs:
|
|
|
|
```
|
|
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -c MegaIndex.ru
|
|
27320
|
|
```
|
|
|
|
- I see they *did* check `robots.txt` and their requests are only going to XMLUI item pages... so I guess I just leave them alone
|
|
- Peter wrote to ask why this [one WLE item](https://cgspace.cgiar.org/handle/10568/101286) does [not have an Altmetric attention score](https://api.altmetric.com/v1/handle/10568/101286), but [the DOI does](https://api.altmetric.com/v1/doi/10.1126/science.aaw0911)
|
|
- I tweeted the item just in case, but Peter said that he already did it yesterday
|
|
- The item was added six months ago...
|
|
- The DOI has an Altmetric score of 259, but for the Handle it is HTTP 404!
|
|
- I emailed Altmetric support
|
|
|
|
## 2019-12-22
|
|
|
|
- I ran the `dspace cleanup` process on CGSpace (linode18) and had an error:
|
|
|
|
```
|
|
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
|
|
Detail: Key (bitstream_id)=(179441) is still referenced from table "bundle".
|
|
```
|
|
|
|
- The solution is to delete that bitstream manually:
|
|
|
|
```
|
|
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (179441);'
|
|
UPDATE 1
|
|
```
|
|
|
|
- Adjust [CG Core v2 migrataion notes]({{< relref "cgspace-cgcorev2-migration.md" >}}) to use `cg.review-status` instead of `cg.peer-reviewed`
|
|
- I had [raised the issue](https://github.com/AgriculturalSemantics/cg-core/issues/14) with Marie-Angelique earlier this month
|
|
- It makes much more sense to use a wider scope here than a simple boolean
|
|
- I also noticed another field that we should be using in DCTERMS instead of CG: `cg.targetaudience`
|
|
- [DCTERMS says that `dcterms.audience` should be used to describe a A class of entity for whom the resource is intended or useful."](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/2012-06-14/?v=terms#audience)
|
|
- I will update my notes for this so that we use that field instead
|
|
- I don't see "audience" on the [cg-core](https://github.com/AgriculturalSemantics/cg-core/) repository so I filed [an issue](https://github.com/AgriculturalSemantics/cg-core/issues/19) to raise it with Marie-Angelique
|
|
|
|
## 2019-12-23
|
|
|
|
- Follow up with Altmetric on the issue where [an item](https://hdl.handle.net/10568/97087) has a different (lower) score for its Handle despite it having a correct DOI (with a higher score)
|
|
- I've raised this issue three times to Altmetric this year, and a few weeks ago they said they would re-process the item "before Christmas"
|
|
- Abenet suggested we use `cg.reviewStatus` instead of `cg.review-status` and I agree that we should follow other examples like `DCTERMS.accessRights` and `DCTERMS.isPartOf`
|
|
- I updated [my comment on the cg-core issue on GitHub](https://github.com/AgriculturalSemantics/cg-core/issues/14)
|
|
|
|
## 2019-12-30
|
|
|
|
- Altmetric responded a few days ago about the [item](https://hdl.handle.net/10568/97087) that has a different (lower) score for its Handle despite it having a correct DOI (with a higher score)
|
|
- She tweeted the repository link and agreed that it didn't get picked up by Altmetric
|
|
- She said she will add this to the existing ticket about the previous items I had raised an issue about
|
|
- Update Tomcat to version 7.0.99 in the [Ansible infrastructure playbooks](https://github.com/ilri/rmg-ansible-public) and deploy on DSpace Test (linode19)
|
|
|
|
<!-- vim: set sw=2 ts=2: -->
|