mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-12-23 13:34:32 +01:00
253 lines
13 KiB
Markdown
253 lines
13 KiB
Markdown
---
|
||
title: "June, 2023"
|
||
date: 2023-06-02T10:29:36+03:00
|
||
author: "Alan Orth"
|
||
categories: ["Notes"]
|
||
---
|
||
|
||
## 2023-06-02
|
||
|
||
- Spend some time testing my `post_bitstreams.py` script to update thumbnails for items on CGSpace
|
||
- Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail...
|
||
- Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
|
||
- They have experience with improving the MODS interface in MELSpace's OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace
|
||
- From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk
|
||
|
||
<!--more-->
|
||
|
||
## 2023-06-04
|
||
|
||
- Upgrade CGSpace to Ubuntu 22.04
|
||
- The upgrade was mostly normal, but I had to unhold the openjdk package in order for `do-release-upgrade` to run:
|
||
|
||
```console
|
||
# apt-mark hold openjdk-8-jdk-headless:amd64 openjdk-8-jre-headless:amd64
|
||
```
|
||
|
||
- In [2022-11]({{< relref "2022-11.md" >}}) an upstream Java update broke the DSpace 6 Handle server so we will have to pin this again after the upgrade to Ubuntu 22.04
|
||
- After the upgrade I made sure CGSpace was working, then proceeded to upgrade PostgreSQL from 12 to 14, like I did on [DSpace Test in 2023-03]({{< relref "2023-03.md" >}})
|
||
- Then I had to downgrade OpenJDK to fix the Handle server using the ones I had previously downloaded for Ubuntu 20.04 because they no longer exist on Launchpad:
|
||
|
||
```console
|
||
# dpkg -i openjdk-8-j*8u342-b07*.deb
|
||
```
|
||
|
||
- Export CGSpace to fix missing Initiative collection mappings
|
||
- Start a harvest on AReS
|
||
- Work on the DSpace 7 migration a bit more
|
||
- I decided to rebase and drop all the submission form edits because they conflict every time upstream changes!
|
||
|
||
## 2023-06-06
|
||
|
||
- Fix some incorrect ORCID identifiers for an Alliance author on CGSpace
|
||
- Export our list of ORCID identifiers, resolve them, and update the records in CGSpace:
|
||
|
||
```console
|
||
$ cat dspace/config/controlled-vocabularies/cg-creator-identifier.xml 2022-09-22-add-orcids.csv| grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2023-06-06-orcids.txt
|
||
$ ./ilri/resolve_orcids.py -i /tmp/2023-06-06-orcids.txt -o /tmp/2023-06-06-orcids-names.txt -d
|
||
$ ./ilri/update_orcids.py -i /tmp/2023-06-06-orcids-names.txt -db dspacetest -u dspace -p 'ffff' -m 247
|
||
```
|
||
|
||
- Start working on updating the MODS schema in CGSpace from 3.1 to 3.8 based on Stefano and Salem's work last year
|
||
|
||
## 2023-06-08
|
||
|
||
- Continue working on the MODS schema mapping
|
||
- Export CGSpace to check and update `dcterms.extent` fields
|
||
- I normalized about 1,500 to use either "p. 1-6" or "5 p." format
|
||
- Also, I used this GREL expression to extract missing pages from the citation field: `cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*(pp?\.\s?\d+[-–]\d+).*/)[0]`
|
||
- This was over 4,000 items with a format like "p. 1-6" and "pp. 1-6" in the citation
|
||
- I used another GREL expression to extract another 5,000: `cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*?(\d+\s+?[Pp]+\.).*/)[0]`
|
||
- This was for the format like "1 p." (note we had to protect against the greedy `.*` in the beginning)
|
||
- I also did some work to capture a handful of missing DOIs and ISSNs, but it was only about 100 items and I will have to wait until the 10,000+ above finish importing
|
||
|
||
## 2023-06-09
|
||
|
||
- I see there are ~200 users in CGSpace that have registered with their CGIAR email address using a password as opposed to using Active Directory:
|
||
|
||
```sql
|
||
SELECT * FROM eperson WHERE email LIKE '%cgiar.org' AND netid IS NOT NULL AND password IS NOT NULL;
|
||
```
|
||
|
||
- I am wondering if I should delete their passwords and tell them use log in using LDAP
|
||
- As an initial test I will reset a few accounts including my own that have passwords and salts:
|
||
|
||
```sql
|
||
UPDATE eperson SET password=DEFAULT,salt=DEFAULT,digest_algorithm=DEFAULT WHERE netid IN ('axxxx', 'axxxx', 'bxxxx');
|
||
```
|
||
|
||
- I also decided to reset passwords/salts for CGIAR accounts that have not been active since 2021 (1.5 years ago):
|
||
|
||
```sql
|
||
UPDATE eperson SET password=DEFAULT,salt=DEFAULT,digest_algorithm=DEFAULT WHERE email LIKE '%cgiar.org' AND netid IS NOT NULL AND password IS NOT NULL AND salt IS NOT NULL AND last_active < '2022-01-01'::date;
|
||
```
|
||
|
||
- This was about 100 accounts...
|
||
- I will wait some more time before I decide what to do about the more current ones
|
||
- Add a few more ORCID identifiers to my list and tag them on CGSpace
|
||
|
||
## 2023-06-10
|
||
|
||
- Export CGSpace to check for missing Initiative mappings
|
||
- Start a harvest on AReS
|
||
|
||
## 2023-06-11
|
||
|
||
- File [an issue](https://github.com/DSpace/DSpace/issues/8900) on DSpace for the `Content-Disposition` bug causing images to get downloaded instead of opened inline
|
||
|
||
## 2023-06-12
|
||
|
||
- Export CGSpace to do some more work extracting volume and issue from citations for items where they are missing
|
||
- I found and fixed over 7,000!
|
||
- Then I found and extracted another 7,000 items with no extents (pages)
|
||
- Then I replaced all occurences of en dashes for ranges in pages with regular hyphens
|
||
|
||
## 2023-06-13
|
||
|
||
- Last night I finally figured out how to do basic overrides to the simple item view in Angular
|
||
- Add a handful of new ORCID identifiers to my list and tag them on CGSpace
|
||
- Extract a list of all the proposed actions for CG Core output types and create a [new issue for them on CG Core's GitHub repository](https://github.com/AgriculturalSemantics/cg-core/issues/45)
|
||
- Extract a list of all the proposed actions for CG Core output types for MARLO and create [a new issue for them on MARLO's GitHub repository](https://github.com/CCAFS/MARLO/issues/2479)
|
||
- Meeting with Indira, Ryan, and Abenet to discuss plans for the DSpace 7 focus group
|
||
|
||
## 2023-06-14
|
||
|
||
- Did some more work on the DSpace 7 Test to improve the submission forms and the look and feel
|
||
- Extract a list of all the proposed actions for CG Core output types for MEL and create [a new issue for them on MEL's GitHub repository](https://github.com/CodeObia/MEL/issues/11216)
|
||
- I filed [an issue about the yarn merge-i18n script](https://github.com/DSpace/dspace-angular/issues/2309)
|
||
- I made [a pull request for some Finnish language i18n strings](https://github.com/DSpace/dspace-angular/pull/2306)
|
||
- I made [a pull request to lint the i18n en.json5 file](https://github.com/DSpace/dspace-angular/pull/2306)
|
||
|
||
## 2023-06-15
|
||
|
||
- A lot more work on DSpace 7
|
||
- I tested some pull requests and worked on the style of the item view and homepage
|
||
|
||
## 2023-06-16
|
||
|
||
- A lot more work on DSpace 7
|
||
- I made [a pull request to adjust font weight in item counts ](https://github.com/DSpace/dspace-angular/pull/2316)
|
||
- I made [a pull request to update the ESLint configuration for JSON5](https://github.com/DSpace/dspace-angular/pull/2317)
|
||
|
||
## 2023-06-17
|
||
|
||
- Export CGSpace to check for missing Initiative collection mappings
|
||
- I also spent some time doing sanity checks on countries, regions, DOIs, and more
|
||
- I lowercased all our AGROVOC keywords in `dcterms.subject`:
|
||
|
||
```sql
|
||
dspace=# BEGIN;
|
||
BEGIN
|
||
dspace=*# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
|
||
UPDATE 2392
|
||
dspace=*# COMMIT;
|
||
COMMIT
|
||
```
|
||
|
||
- Start a harvest on AReS
|
||
|
||
## 2023-06-19
|
||
|
||
- Today I started getting an error on DSpace 7 Test
|
||
- The page loads, and then when it is almost done it goes blank to white with this in the console:
|
||
|
||
```console
|
||
ERROR DOMException: CSSStyleSheet.cssRules getter: Not allowed to access cross-origin stylesheet
|
||
```
|
||
|
||
- I restarted Angular, but it didn't fix it
|
||
- The `yarn test:rest` script shows everything OK, and I haven't changed anything recently...
|
||
- I re-compiled the Angular UI using the default theme and it was the same...
|
||
- I tried in Firefox Nightly and it works...
|
||
- So it must be something related to the browser
|
||
- I tried clearing all the session storage / cookies and refreshing and it worked
|
||
- I switched back to the CGSpace theme and it happened again
|
||
- I had a hunch it might be due to the GDPR cookie plugin in my browser, so I disabled that and then refreshed and it worked... hmmm
|
||
- Upload thumbnails for about 42 IITA Journal Articles after resolving their DOIs and making sure they were not CC ND
|
||
- I fixed a few bugs in `get_scihub_pdfs.py` in the process
|
||
|
||
## 2023-06-21
|
||
|
||
- Stefano got back to me about the MODS OAI-PMH schema test on DSpace Test
|
||
- He said that it's fine if we use iso8601 encoding for dates instead of w3cdtf and asked if we can create a custom end point for AGRIS that only includes types like Journal Articles similar to how Salem did it: https://melspace.loc.codeobia.com/oai/agris?verb=ListRecords&metadataPrefix=mods
|
||
- I updated DSpace Test with the new date format and said I'd work on the custom AGRIS set
|
||
|
||
## 2023-06-25
|
||
|
||
- Export CGSpace to check for missing Initiative collection mappings
|
||
- I wanted to start a harvest on AReS but I've seen the load on the server high for a few days and I'm not sure what it is
|
||
- I decided to run all updates and reboot it since it's Sunday anyway
|
||
|
||
## 2023-06-26
|
||
|
||
- Since the new DSpace 7 will respect newlines in metadata fields I am curious to see how many of our abstracts have poor newlines
|
||
- I exported CGSpace and used a custom text facet with this GREL expression in OpenRefine to count the number of newlines in each cell:
|
||
|
||
```console
|
||
value.split('\n').length()
|
||
```
|
||
|
||
- Also useful to check for general length of the text in the cell to make sure it's a reasonably long string
|
||
- I spent some time trying to find a pattern that I could use to identify "easy" targets, but there are so many exceptions that it will have to be done manually
|
||
- I fixed a few dozen
|
||
- Do a bit of work on thumbnails on CGSpace
|
||
- I'm trying to troubleshoot the Discovery error I get on DSpace 7:
|
||
|
||
```console
|
||
java.lang.NullPointerException: Cannot invoke "org.dspace.discovery.configuration.DiscoverySearchFilterFacet.getIndexFieldName()" because the return value of "org.dspace.content.authority.DSpaceControlledVocabularyIndex.getFacetConfig()" is null
|
||
```
|
||
|
||
- I reverted to the default `submission-forms.xml` and the `getFacetConfig()` error goes away...
|
||
- Kill some long-held locks on CGSpace PostgreSQL, as some users are complaining of slowness in archiving
|
||
- I did some testing of the LDAP login issue related to groupmaps
|
||
- It does seem to be a regression from the [LDAP auth patch](https://github.com/DSpace/DSpace/pull/8814) from last month, so I [filed an issue](https://github.com/DSpace/DSpace/issues/8920)
|
||
- I spent some time on working on Angular and I figured out how to add a custom Angular component to show the UN SDG Goal icons on DSpace 7
|
||
|
||
## 2023-06-27
|
||
|
||
- I debugged the NullPointerException and somehow it disappeared
|
||
- It seems to be related to the external controlled vocabularies in the submission form
|
||
- I removed them all, then added them all back, and now the issue is solved... hmmmm
|
||
- Oh now, now they are gone again, sigh...
|
||
|
||
## 2023-06-28
|
||
|
||
- Spent a lot of time debugging the browse indexes
|
||
- Looking at the [DSpace 7 demo API](https://api7.dspace.org/server/api/discover/browses) I see the four default browse indexes from `dspace.cfg` and the one default `srsc` one that gets automatically enabled from the `<vocabulary>srsc</vocabulary>` in the `submission-forms.xml`
|
||
- The same API call on my test DSpace 7 configuration results in the HTTP 500 I've been seeing for some time, and I am pretty sure it's due to the automagic configuration of hierarchical browses based on the submission form
|
||
- Yes, if I remove them all from my submission form then this works: http://localhost:8080/server/api/discover/browses
|
||
- I went through each of our vocabularies and tested them one by one:
|
||
- dcterms-subject: OK
|
||
- dc-contributor-author: NO
|
||
- cg-creator-identifier: NO
|
||
- cg-contributor-affiliation: OK (and with `facetType: "affiliation"` in API response?!)
|
||
- cg-contributor-donor: OK (`facetType: "sponsorship"`)
|
||
- cg-journal: NO
|
||
- cg-coverage-subregion: NO
|
||
- cg-species-breed: NO
|
||
- Now I need to figure out what it is about those five that causes them to not work!
|
||
- Ah, after debugging with someone on the DSpace Slack, I realized that DSpace expects these vocabularies to have corresponding indexes configured in `discovery.xml`, and they must be added as search filters AND sidebar facets.
|
||
|
||
## 2023-06-29
|
||
|
||
- I noticed there is now a [patched version of the Handle JAR for DSpace 6.x](https://github.com/DSpace/DSpace/issues/8557#issuecomment-1595340249)
|
||
- This fixes the [issue in OpenJDK 1.8.0_352](https://groups.google.com/g/dspace-tech/c/PqjfA5mqG4w/m/FhxI5oXhFwAJ?pli=1), so we can remove the apt pin on JDK now
|
||
- I deployed it on CGSpace and it's working!
|
||
- I lowercased all our AGROVOC terms because I noticed a few that were not:
|
||
|
||
```console
|
||
dspace=# BEGIN;
|
||
BEGIN
|
||
dspace=*# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
|
||
UPDATE 53
|
||
dspace=*# COMMIT;
|
||
```
|
||
|
||
- After more discussion about the NullPointerException related to browse options, I filed [an issue](https://github.com/DSpace/DSpace/issues/8927)
|
||
|
||
## 2023-06-30
|
||
|
||
- I added another custom component to display CGIAR Impact Area icons in the DSpace 7 test
|
||
|
||
<!-- vim: set sw=2 ts=2: -->
|