---
title: "June, 2023"
date: 2023-06-02T10:29:36+03:00
author: "Alan Orth"
categories: ["Notes"]
---
## 2023-06-02
- Spend some time testing my `post_bitstreams.py` script to update thumbnails for items on CGSpace
- Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail...
- Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
- They have experience with improving the MODS interface in MELSpace's OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace
- From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk
<!--more-->
## 2023-06-04
- Upgrade CGSpace to Ubuntu 22.04
- The upgrade was mostly normal, but I had to unhold the openjdk package in order for `do-release-upgrade` to run:
```console
# apt-mark unhold openjdk-8-jdk-headless:amd64 openjdk-8-jre-headless:amd64
```
- In [2022-11]({{< relref "2022-11.md" >}}) an upstream Java update broke the DSpace 6 Handle server so we will have to pin this again after the upgrade to Ubuntu 22.04
- After the upgrade I made sure CGSpace was working, then proceeded to upgrade PostgreSQL from 12 to 14, like I did on [DSpace Test in 2023-03]({{< relref "2023-03.md" >}})
- Then I had to downgrade OpenJDK to fix the Handle server, using the packages I had previously downloaded for Ubuntu 20.04 because they no longer exist on Launchpad:
```console
# dpkg -i openjdk-8-j*8u342-b07*.deb
```
- Export CGSpace to fix missing Initiative collection mappings
- Start a harvest on AReS
- Work on the DSpace 7 migration a bit more
- I decided to rebase and drop all the submission form edits because they conflict every time upstream changes!
## 2023-06-06
- Fix some incorrect ORCID identifiers for an Alliance author on CGSpace
- Export our list of ORCID identifiers, resolve them, and update the records in CGSpace:
```console
$ cat dspace/config/controlled-vocabularies/cg-creator-identifier.xml 2022-09-22-add-orcids.csv | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2023-06-06-orcids.txt
$ ./ilri/resolve_orcids.py -i /tmp/2023-06-06-orcids.txt -o /tmp/2023-06-06-orcids-names.txt -d
$ ./ilri/update_orcids.py -i /tmp/2023-06-06-orcids-names.txt -db dspacetest -u dspace -p 'ffff' -m 247
```
- Start working on updating the MODS schema in CGSpace from 3.1 to 3.8 based on Stefano and Salem's work last year
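
For reference, the `grep -oE ... | sort -u` step in the ORCID export above can be sketched in Python. This is only an illustration of the pattern matching, not part of the `ilri` scripts; the function name and the sample identifiers are made up:

```python
import re

# Same pattern as the grep -oE call: four hyphen-separated groups of four
# characters, each drawn from A-Z or 0-9
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_orcids(text):
    """Return the sorted, de-duplicated identifiers found in text."""
    return sorted(set(ORCID_PATTERN.findall(text)))


# Hypothetical input mixing an XML vocabulary node and a CSV-style line;
# the identifiers are placeholders, not real ORCID iDs
sample = """
<node id="Doe, Jane: 0000-0000-0000-0002" label="Doe, Jane: 0000-0000-0000-0002"/>
Doe, John: 0000-0000-0000-0001
"""

for orcid in extract_orcids(sample):
    print(orcid)
```

The `set()` plays the role of `sort -u`, so an identifier that appears in both the `id` and `label` attributes is only listed once.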
## 2023-06-08
- Continue working on the MODS schema mapping
- Export CGSpace to check and update `dcterms.extent` fields
- I normalized about 1,500 to use either "p. 1-6" or "5 p." format
- Also, I used this GREL expression to extract missing pages from the citation field: `cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*(pp?\.\s?\d+[-]\d+).*/)[0]`
- This was over 4,000 items with a format like "p. 1-6" and "pp. 1-6" in the citation
- I used another GREL expression to extract another 5,000: `cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*?(\d+\s+?[Pp]+\.).*/)[0]`
- This was for the format like "1 p." (note we had to protect against the greedy `.*` in the beginning)
- I also did some work to capture a handful of missing DOIs and ISSNs, but it was only about 100 items and I will have to wait until the 10,000+ above finish importing
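
The behaviour of the two GREL expressions above can be checked outside OpenRefine with a small Python sketch. GREL's `match()` has to match the whole string, which `re.fullmatch` approximates here. The citations are invented for illustration, and the third pattern is only there to show what the greedy `.*` would do to the "1 p." format:

```python
import re

# Patterns copied from the GREL expressions
RANGE_PATTERN = re.compile(r".*(pp?\.\s?\d+[-]\d+).*")
COUNT_PATTERN = re.compile(r".*?(\d+\s+?[Pp]+\.).*")
# Greedy variant of the second pattern, for comparison only
GREEDY_COUNT_PATTERN = re.compile(r".*(\d+\s+?[Pp]+\.).*")


def extract_extent(citation, pattern):
    """Return the first capture group, or None if the citation does not match."""
    match = pattern.fullmatch(citation)
    return match.group(1) if match else None


# Hypothetical citations, not taken from CGSpace
range_citation = "Doe, J. 2020. A title. Some Journal 5(2): pp. 10-16."
count_citation = "Doe, J. 2020. A report. Nairobi: Some Institute. 24 p."

print(extract_extent(range_citation, RANGE_PATTERN))
print(extract_extent(count_citation, COUNT_PATTERN))
print(extract_extent(count_citation, GREEDY_COUNT_PATTERN))
```

With these inputs the lazy pattern returns `24 p.`, while the greedy variant swallows the first digit and returns `4 p.`. The greedy range pattern returns `p. 10-16` even for a "pp." citation, because the leading `.*` consumes the first "p".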
## 2023-06-09
- I see there are ~200 users in CGSpace that have registered with their CGIAR email address using a password as opposed to using Active Directory:
```sql
SELECT * FROM eperson WHERE email LIKE '%cgiar.org' AND netid IS NOT NULL AND password IS NOT NULL;
```
- I am wondering if I should delete their passwords and tell them to log in using LDAP
- As an initial test I will reset a few accounts including my own that have passwords and salts:
```sql
UPDATE eperson SET password=DEFAULT,salt=DEFAULT,digest_algorithm=DEFAULT WHERE netid IN ('axxxx', 'axxxx', 'bxxxx');
```
- I also decided to reset passwords/salts for CGIAR accounts that have not been active since 2021 (1.5 years ago):
```sql
UPDATE eperson SET password=DEFAULT,salt=DEFAULT,digest_algorithm=DEFAULT WHERE email LIKE '%cgiar.org' AND netid IS NOT NULL AND password IS NOT NULL AND salt IS NOT NULL AND last_active < '2022-01-01'::date;
```
- This was about 100 accounts...
- I will wait some more time before I decide what to do about the more current ones
- Add a few more ORCID identifiers to my list and tag them on CGSpace
## 2023-06-10
- Export CGSpace to check for missing Initiative mappings
- Start a harvest on AReS
## 2023-06-11
- File [an issue](https://github.com/DSpace/DSpace/issues/8900) on DSpace for the `Content-Disposition` bug causing images to get downloaded instead of opened inline
## 2023-06-12
- Export CGSpace to do some more work extracting volume and issue from citations for items where they are missing
- I found and fixed over 7,000!
- Then I found and extracted another 7,000 items with no extents (pages)
- Then I replaced all occurrences of en dashes for ranges in pages with regular hyphens
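
The en dash cleanup in page ranges could look something like this in Python. This is a sketch rather than the exact transformation used; in particular, only touching en dashes that sit directly between two digits is an assumption, and the function name is made up:

```python
import re

EN_DASH = "\u2013"
# An en dash directly between two digits, as in a page range
# (restricting the match to digit neighbours is an assumption)
DIGIT_EN_DASH_DIGIT = re.compile(rf"(?<=\d){EN_DASH}(?=\d)")


def normalize_page_range(extent):
    """Replace en dashes between digits with regular hyphens."""
    return DIGIT_EN_DASH_DIGIT.sub("-", extent)


print(normalize_page_range("p. 1\u20136"))
```

The lookbehind and lookahead keep the digits themselves out of the match, so only the dash is replaced and en dashes elsewhere in a value are left alone.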
## 2023-06-13
- Last night I finally figured out how to do basic overrides to the simple item view in Angular
- Add a handful of new ORCID identifiers to my list and tag them on CGSpace
- Extract a list of all the proposed actions for CG Core output types and create a [new issue for them on CG Core's GitHub repository](https://github.com/AgriculturalSemantics/cg-core/issues/45)
- Extract a list of all the proposed actions for CG Core output types for MARLO and create [a new issue for them on MARLO's GitHub repository](https://github.com/CCAFS/MARLO/issues/2479)
- Meeting with Indira, Ryan, and Abenet to discuss plans for the DSpace 7 focus group
## 2023-06-14
- Did some more work on the DSpace 7 Test to improve the submission forms and the look and feel
- Extract a list of all the proposed actions for CG Core output types for MEL and create [a new issue for them on MEL's GitHub repository](https://github.com/CodeObia/MEL/issues/11216)
- I filed [an issue about the yarn merge-i18n script](https://github.com/DSpace/dspace-angular/issues/2309)
- I made [a pull request for some Finnish language i18n strings](https://github.com/DSpace/dspace-angular/pull/2306)
- I made [a pull request to lint the i18n en.json5 file](https://github.com/DSpace/dspace-angular/pull/2306)
## 2023-06-15
- A lot more work on DSpace 7
- I tested some pull requests and worked on the style of the item view and homepage
## 2023-06-16
- A lot more work on DSpace 7
- I made [a pull request to adjust font weight in item counts](https://github.com/DSpace/dspace-angular/pull/2316)
- I made [a pull request to update the ESLint configuration for JSON5](https://github.com/DSpace/dspace-angular/pull/2317)
## 2023-06-17
- Export CGSpace to check for missing Initiative collection mappings
- I also spent some time doing sanity checks on countries, regions, DOIs, and more
- I lowercased all our AGROVOC keywords in `dcterms.subject`:
```sql
dspace=# BEGIN;
BEGIN
dspace=*# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
UPDATE 2392
dspace=*# COMMIT;
COMMIT
```
- Start a harvest on AReS
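
The keyword lowercasing above can be previewed on an exported list of values before running the `UPDATE`. This Python sketch is a rough analogue of the `text_value ~ '[[:upper:]]'` filter, with hypothetical function names and sample keywords; note that Python's `str.lower()` and PostgreSQL's `LOWER()` may not agree on every non-ASCII character, depending on the database locale:

```python
def needs_lowercasing(value):
    """True if any character is uppercase, like the [[:upper:]] filter."""
    return any(ch.isupper() for ch in value)


def lowercase_keywords(values):
    """Return the lowercased list and the number of values that changed."""
    result = []
    changed = 0
    for value in values:
        if needs_lowercasing(value):
            result.append(value.lower())
            changed += 1
        else:
            result.append(value)
    return result, changed


# Hypothetical keywords, for illustration only
keywords = ["Maize", "climate change", "FOOD SECURITY"]
lowered, changed = lowercase_keywords(keywords)
print(lowered, changed)
```

The `changed` count corresponds to the row count PostgreSQL reports after the `UPDATE`, so a large mismatch between the two would be worth investigating before committing.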
<!-- vim: set sw=2 ts=2: -->