mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-25 16:08:19 +01:00
244 lines
12 KiB
Markdown
---
title: "September, 2023"
date: 2023-09-02T17:29:36+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2023-09-02

- Export CGSpace to check for missing Initiative collection mappings
- Start a harvest on AReS

<!--more-->

## 2023-09-03

- I figured out how to use Altmetric and Dimensions badges in the DSpace Angular frontend
- It still feels hacky, but using [AfterViewInit](https://stackoverflow.com/questions/41936631/how-to-trigger-the-function-after-dom-markup-is-loaded-in-angular-style-applicat) and importing the Altmetric `embed.js` in the component works
- The style on mobile also needs work...

## 2023-09-06

- Discussion with Marie about finalizing the output types list on GitHub
- I did some review and cleanup in preparation for publishing the new list

## 2023-09-07

- Export CGSpace to start doing a review of the metadata
- First I will start by extracting all items with DOIs, along with some fields I can compare against Crossref:

```console
$ csvgrep -c 'cg.identifier.doi[en_US]' -r 'doi.org' ~/Downloads/2023-09-07-cgspace.csv \
  | csvcut -c 'id,dc.title[en_US],dcterms.issued[en_US],dcterms.available[en_US],cg.issn[en_US],cg.isbn[en_US],cg.volume[en_US],cg.issue[en_US],cg.number[en_US],dcterms.extent[en_US],cg.identifier.doi[en_US],cg.reviewStatus[en_US],cg.isijournal[en_US],dcterms.license[en_US],dcterms.accessRights[en_US],dcterms.type[en_US],dc.identifier.uri[en_US]' \
  > /tmp/2023-09-07-cgspace-dois.csv
$ csvgrep -c 'cg.identifier.doi[en_US]' -r 'doi.org' ~/Downloads/2023-09-07-cgspace.csv | csvcut -c 'cg.identifier.doi[en_US]' | sed 1d > /tmp/2023-09-07-cgspace-dois.txt
```

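For reference, the second command above (filter on the DOI column, keep only that column, drop the header) can be sketched with Python's standard `csv` module. This is only an illustration: the sample rows and DOIs are made-up placeholders, and it matches the literal substring `doi.org` rather than the regular expression that `csvgrep -r` uses.

```python
import csv
import io

DOI_COLUMN = "cg.identifier.doi[en_US]"


def extract_dois(csv_file, column=DOI_COLUMN):
    """Return the DOI column of every row whose DOI value mentions doi.org."""
    reader = csv.DictReader(csv_file)
    dois = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if "doi.org" in value:
            dois.append(value)
    return dois


# Placeholder data standing in for the real CGSpace export
sample = io.StringIO(
    "id,dc.title[en_US],cg.identifier.doi[en_US]\n"
    "1,First item,https://doi.org/10.1000/example-1\n"
    "2,No DOI here,\n"
    "3,Second item,https://doi.org/10.1000/example-2\n"
)
dois = extract_dois(sample)
print("\n".join(dois))
```

With a real export you would pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.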

- Then I looked up the metadata for those DOIs on Crossref:

```console
$ ./ilri/crossref_doi_lookup.py -i /tmp/2023-09-07-cgspace-dois.txt -o /tmp/2023-09-07-cgspace-dois-results.csv -e a.orth@cgiar.org
```

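The script itself is not reproduced in these notes, so the following is only a rough sketch of what a lookup against the public Crossref REST API (`https://api.crossref.org/works/<doi>`, with a `mailto` parameter to identify the caller) could look like. The helper names and the choice of fields are my own assumptions, not the behavior of `crossref_doi_lookup.py`.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_API = "https://api.crossref.org/works/"
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def doi_to_api_url(doi, email):
    """Build a Crossref API URL from a bare DOI or a doi.org link."""
    bare = doi.strip()
    for prefix in DOI_PREFIXES:
        if bare.lower().startswith(prefix):
            bare = bare[len(prefix):]
            break
    query = urllib.parse.urlencode({"mailto": email})
    return CROSSREF_API + urllib.parse.quote(bare, safe="/") + "?" + query


def summarize(work):
    """Pick a few comparable fields out of a Crossref `message` object."""
    date_parts = work.get("issued", {}).get("date-parts", [[]])[0]
    date_parts = [part for part in date_parts if part is not None]
    return {
        "title": (work.get("title") or [""])[0],
        "issued": "-".join(f"{part:02d}" for part in date_parts),
        "issn": work.get("ISSN", []),
        "volume": work.get("volume", ""),
        "issue": work.get("issue", ""),
        "pages": work.get("page", ""),
    }


def lookup(doi, email):
    """Fetch one DOI from Crossref (needs network access)."""
    with urllib.request.urlopen(doi_to_api_url(doi, email), timeout=30) as response:
        return summarize(json.load(response)["message"])
```

Keeping URL construction and field extraction separate from the network call makes both easy to check offline against a saved response.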

- A user emailed to ask about uploading a 180MB PDF to CGSpace
- I used Ghostscript to try reducing it using the `screen`, `ebook` and `prepress` presets:

```console
$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=primer-screen.pdf Primer\ \(digital\)_Climate-\ smart\ and\ regenerative\ agriculture\ in\ climate\ change\ adaptation.pdf
$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile=primer-ebook.pdf Primer\ \(digital\)_Climate-\ smart\ and\ regenerative\ agriculture\ in\ climate\ change\ adaptation.pdf
$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/prepress -dNOPAUSE -dQUIET -dBATCH -sOutputFile=primer-prepress.pdf Primer\ \(digital\)_Climate-\ smart\ and\ regenerative\ agriculture\ in\ climate\ change\ adaptation.pdf
```

- The `prepress` one is 300DPI and looks visually identical to the original, so I proposed that we use that one

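Since the three Ghostscript invocations above differ only in the preset, they could be generated in a loop. A minimal sketch, assuming `gs` is on the `PATH`; the function names and the `output_prefix` parameter are mine. Passing the command as a list also avoids having to escape the spaces and parentheses in the file name.

```python
import subprocess

PRESETS = ("screen", "ebook", "prepress")


def gs_command(source, preset, output_prefix="primer"):
    """Build the Ghostscript argument list for one PDFSETTINGS preset."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={output_prefix}-{preset}.pdf",
        source,
    ]


def compress_all(source, output_prefix="primer"):
    """Run Ghostscript once per preset, stopping on the first failure."""
    for preset in PRESETS:
        subprocess.run(gs_command(source, preset, output_prefix), check=True)


# Build (but do not run) the commands for a placeholder input file
commands = [gs_command("input.pdf", preset) for preset in PRESETS]
```

Calling `compress_all("some file (with spaces).pdf")` would write `primer-screen.pdf`, `primer-ebook.pdf` and `primer-prepress.pdf` next to each other for a size and quality comparison.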
## 2023-09-08

- I did a review of the metadata for our items with DOIs, comparing with data from Crossref
- I spot-checked a handful of issue / online dates and licenses, and in every case where they differed Crossref's dates were more accurate than ours
- I also filled in some missing volumes, issues, ISSNs, and extents
- This results in 14,000 changes to existing items, which will unfortunately take several days to import
- After eight hours the first file is only about 2/3 finished... sigh
- Meet with Peter to discuss changes to DSpace 7 Test
- Minor updates to submission forms and some new ideas for the home page and item page
- I figured out how to use a themed home page component and add a cards UI to our CGSpace theme

## 2023-09-09

- I can't believe that almost 18 hours later the first CSV import with 5,000 changes is not done...
- Run all system updates on CGSpace and reboot it, as it had been two months since the last time

## 2023-09-10

- Minor work on the DSpace 7 home page

## 2023-09-11

- Export CGSpace to check for missing Initiative collection mappings
- Start a harvest on AReS

## 2023-09-12

- Minor work on DSpace 7 home page
- Minor work on CG Core types
- I published a new HTML version of the updated IPtypes and archived the current version as v2.0.0 so we can still reference it

## 2023-09-13

- Stefano reminded me about the updated OAI MODS mappings on CGSpace, so I re-applied them on DSpace Test and updated the OAI index so he could confirm
- Now I'm ready to put them on CGSpace if he confirms
- I created a basic theme for CIP on DSpace 7
- While doing that I noticed that a bunch of CIP bitstreams didn't have the latest 500px thumbnails, so I re-ran `filter-media` on a handful of their collections
- I had two occurrences of an OOM kill of the Tomcat 9 Java process on DSpace 7 Test tonight
- Once while doing a Discovery index, the other while running `filter-media`

## 2023-09-15

- Discuss issues with the Altmetric API with the Altmetric support team
- Apparently we can use a different API, the [Explorer API](https://www.altmetric.com/explorer/documentation/api), since we already have access to the Explorer dashboard
- I reduced the Solr heap size on DSpace 7 from 3GB to 2GB
- Apparently I already reduced it from 4GB to 3GB a few months ago
- The Solr admin interface was showing Solr taking ~1GB of RAM so I think this should be safe
- Mark on DSpace Slack said he uses PM2's `--max-memory-restart` so the processes restart when they hit the limit
- Also, he said he had to reduce `cache:serverSide:botCache:max` from 1000 to 500 to cache fewer SSR pages in memory
- I decided to try deploying DSpace 7 Test on a Hetzner server with 64GB RAM, 6 CPUs, and 2x512GB NVMe SSD

## 2023-09-16

- Export CGSpace to check for missing Initiative collection mappings
- Start a harvest on AReS
- Configure the privacy policy page on DSpace 7 using a themed component with the text from our DSpace 6 site
- I realized that for all my custom Angular components I should be using `routerLink` instead of `href` when I am constructing links
- The `routerLink` routes within the single page application and saves state, while the `href` reloads the page
- Using the `routerLink` way is faster and results in less flashing and jumping in the page when navigating
- See: https://stackoverflow.com/a/61588147

## 2023-09-17

- I added an About page to DSpace 7 Test using similar logic to the privacy page

## 2023-09-18

- I filed a GitHub issue for being unable to navigate dropdown lists using the keyboard on the dspace-angular submission form: https://github.com/DSpace/dspace-angular/issues/2500
- I filed a GitHub issue for the search filters capitalizing metadata values: https://github.com/DSpace/dspace-angular/issues/2501

## 2023-09-19

- Complete migration of DSpace 7 Test from Linode to Hetzner
- Export some years of Solr stats from CGSpace to import on the new DSpace 7 Test:

```console
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics -a export -o /tmp/statistics-2020-2022.json -f 'time:[2020-01-01T00\:00\:00Z TO 2022-12-31T23\:59\:59Z]' -k uid -S actingGroupId,actingGroupParentId,actorMemberGroupId,author_mtdt,author_mtdt_search,bitstreamCount,bitstreamId,complete_query,complete_query_search,containerBitstream,containerCollection,containerCommunity,containerItem,core_update_run_nb,countryCode_ngram,countryCode_search,cua_version,dateYear,dateYearMonth,file_id,filterquery,first_name,geoipcountrycode,geoIpCountryCode,group_id,group_map,group_name,ip_ngram,ip_search,isArchived,isInternal,iso_mtdt,iso_mtdt_search,isWithdrawn,last_name,name,ngram_query_search,ngram_simplequery_search,orphaned,parent_count,p_communities_id,p_communities_map,p_communities_name,p_group_id,p_group_map,p_group_name,range,rangeDescription,rangeDescription_ngram,rangeDescription_search,range_ngram,range_search,referrer_ngram,referrer_search,simple_query,simple_query_search,solr_update_time_stamp,storage_nb_of_bitstreams,storage_size,storage_statistics_type,subject_mtdt,subject_mtdt_search,text,userAgent_ngram,userAgent_search,version_id,workflowItemId
```


- Ben sent me an export of ILRI presentations from Slideshare and asked if we could see if any are missing on CGSpace
- First I exported CGSpace and extracted the `cg.identifier.url` column so I could normalize all Slideshare URLs to use "https://www.slideshare.net" instead of localized variants (es.slideshare.net, fr.slideshare.net, etc) as well as non-https links and links with query params and slashes at the end
- This was about 250 URLs
- I extracted the URL field from both our list and the Slideshare list and then used [GNU `join` to print non-matched lines](https://unix.stackexchange.com/questions/274548/join-two-files-each-with-two-columns-including-non-matching-lines):

```console
$ join -t, -v 2 -11 -21 -o auto /tmp/cgspace-ilri-slideshare-sorted-only-urls-sorted.csv /tmp/ilri-slideshare-sorted-sorted.csv | wc -l
542
```

- It is important to sort the files with GNU `sort` first, because `join` expects both inputs to be sorted on the join field in the current locale's collation order; I had tried sorting in vim and the result didn't satisfy `join`
- So it seems there are 542 Slideshare presentations we are missing

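The normalization and the `join -v 2` comparison above can also be expressed with sets, which sidesteps the sort-order requirement entirely. A rough sketch, assuming the rules are exactly the ones listed (force https and the `www` host, drop query parameters and trailing slashes); the sample URLs are made up.

```python
from urllib.parse import urlsplit


def normalize_slideshare(url):
    """Rewrite any Slideshare URL to https://www.slideshare.net/<path>."""
    cleaned = url.strip()
    parts = urlsplit(cleaned)
    host = parts.netloc.lower()
    if host != "slideshare.net" and not host.endswith(".slideshare.net"):
        return cleaned  # leave non-Slideshare URLs untouched
    # Dropping parts.query and parts.fragment removes query params
    return "https://www.slideshare.net" + parts.path.rstrip("/")


def missing_from(ours, theirs):
    """Return normalized URLs present in `theirs` but absent from `ours`."""
    ours_normalized = {normalize_slideshare(url) for url in ours}
    theirs_normalized = {normalize_slideshare(url) for url in theirs}
    return sorted(theirs_normalized - ours_normalized)


# Hypothetical examples, not real presentations
cgspace_urls = ["http://es.slideshare.net/ILRI/deck-one/?ref=home"]
slideshare_urls = [
    "https://www.slideshare.net/ILRI/deck-one",
    "https://fr.slideshare.net/ILRI/deck-two/",
]
missing = missing_from(cgspace_urls, slideshare_urls)
print(missing)
```

Like `join -v 2`, this reports entries that only appear in the second list. The comparison is case-sensitive on the path, so differently-cased slugs would count as different presentations.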
## 2023-09-20

- Regarding the incorrect city in Solr statistics, I see we have about 1.66 million of them
- Before filing a GitHub issue, I want to check if they maybe come from an Atmire module, as I see them clustered around two particular CUA versions:

```json
{
  "responseHeader": {
    "status": 0,
    "QTime": 2760,
    "params": {
      "q": "city:com.maxmind.geoip2.record.City*",
      "facet.field": "cua_version",
      "indent": "true",
      "rows": "0",
      "wt": "json",
      "facet": "true",
      "_": "1695192301927"
    }
  },
  "response": {
    "numFound": 1661863,
    "start": 0,
    "docs": []
  },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "cua_version": [
        "6.x-4.1.10-ilri-RC7",
        1112186,
        "6.x-4.1.10-ilri-RC5",
        451180,
        "6.x-4.1.10-ilri-RC9",
        0
      ]
    },
    "facet_dates": {},
    "facet_ranges": {},
    "facet_intervals": {}
  }
}
```

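Solr returns each facet field as a flat list that alternates value and count, which is awkward to read. A small sketch of pairing them up, using the numbers from the response above; the "unaccounted" figure is only meaningful if `cua_version` is single-valued, which I am assuming here.

```python
def facet_counts(flat):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


# Trimmed copy of the response above
response = {
    "response": {"numFound": 1661863},
    "facet_counts": {
        "facet_fields": {
            "cua_version": [
                "6.x-4.1.10-ilri-RC7", 1112186,
                "6.x-4.1.10-ilri-RC5", 451180,
                "6.x-4.1.10-ilri-RC9", 0,
            ]
        }
    },
}

counts = facet_counts(response["facet_counts"]["facet_fields"]["cua_version"])
# Matching documents that are not counted under any CUA version
unaccounted = response["response"]["numFound"] - sum(counts.values())
print(counts)
print(unaccounted)
```

The two RC versions account for 1,563,366 of the 1,661,863 matching documents, so 98,497 matches are not counted under any `cua_version` value, presumably because they lack the field.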

- I migrated AReS from Linode to Hetzner
- I asked on Slack and someone told me that we need to edit `src/app/menu.resolver.ts` to add new drop down menus to the top navbar
- It works, though it is unfortunate that we can't do it in a theme

## 2023-09-21

- More minor work on DSpace 7 home page and menus
- Meeting to discuss types and DSpace 7 migration plans
- Create a DSpace 7 theme for IITA

## 2023-09-22

- Create a DSpace 7 theme for IWMI
- I had some issues with pm2 on the new DSpace 7 Test
- It seems to be due to mixing systemd starting versus manually starting / stopping...
- After reading the discussion in [this pm2 issue](https://github.com/Unitech/pm2/issues/2914) I realize that we probably need to use `--no-daemon` to have systemd fully manage the processes without pm2 trying to save state

## 2023-09-23

- Export CGSpace to check for missing Initiative collection mappings
- Start a harvest on AReS

## 2023-09-25

- CGSpace metadata and community / collection cleanup
- Review some patches on DSpace Angular
- Create a basic Alliance theme for DSpace 7

## 2023-09-27

- I realized that we can get controlled vocabularies from DSpace 7's REST API, for both value-pairs and hierarchical controlled vocabularies, for example:

https://dspace7test.ilri.org/server/api/submission/vocabularies/common_iso_languages/entries

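A hedged sketch of reading such a vocabulary from a script follows. It assumes the endpoint returns a HAL-style JSON document with the entries under `_embedded.entries`, each carrying `display` and `value` keys, and it only reads the first page of results. I have not verified this against the test server, which may no longer be reachable, so treat the payload shape and the sample data as assumptions.

```python
import json
import urllib.request

VOCABULARY_URL = (
    "https://dspace7test.ilri.org/server/api/submission/"
    "vocabularies/common_iso_languages/entries"
)


def entries_to_pairs(payload):
    """Extract (display, value) pairs from an assumed HAL-style payload."""
    entries = payload.get("_embedded", {}).get("entries", [])
    return [(entry.get("display"), entry.get("value")) for entry in entries]


def fetch_pairs(url=VOCABULARY_URL):
    """Download one page of vocabulary entries (needs network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return entries_to_pairs(json.load(response))


# Made-up payload in the assumed shape, for offline illustration
sample_payload = {
    "_embedded": {
        "entries": [
            {"display": "English", "value": "en"},
            {"display": "French", "value": "fr"},
        ]
    }
}
pairs = entries_to_pairs(sample_payload)
print(pairs)
```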
## 2023-09-29

- Meeting with Aditi and others to discuss plan for using CGSpace to do a systematic review of CGIAR research on climate change
- I cleaned up metadata for a hundred or so items, and realized we will need to do more to make sure abstracts and open access status are correct since there will be a laser focus on the metadata

## 2023-09-30

- Export CGSpace to check for missing Initiative collection mappings
- Still working on checking Unpaywall for access rights and licenses for our DOIs
- Regarding Unpaywall's "evidence" metadata about whether an item is open access or not, after looking at dozens of items manually:
  - evidence: "oa journal (via doaj)" <---- yes
  - evidence: "open (via free article)" <---- hmmm, not always correct
  - evidence: "open (via page says license)" <---- noooo, can't rely on that
  - evidence: "open (via page says Open Access)" <---- yes...?
  - evidence: "open (via free pdf)" <---- hmmm, not always correct
  - evidence: "oa journal (via publisher name)" <---- noooo
- I updated access status for about four hundred more items based on this, and licenses for a dozen or so

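The verdicts in the evidence list above could be captured in a small lookup table so that a script only applies the trustworthy ones automatically. This is a sketch: the trust labels are my own paraphrase of the notes, not anything defined by Unpaywall, and unknown evidence strings are deliberately not trusted.

```python
# Trust labels are a paraphrase of the manual review above (an assumption)
EVIDENCE_TRUST = {
    "oa journal (via doaj)": "trusted",
    "open (via page says open access)": "probably",
    "open (via free article)": "check manually",
    "open (via free pdf)": "check manually",
    "open (via page says license)": "unreliable",
    "oa journal (via publisher name)": "unreliable",
}


def classify(evidence):
    """Map an Unpaywall evidence string to a trust label."""
    return EVIDENCE_TRUST.get(evidence.strip().lower(), "unknown")


def can_auto_update(evidence):
    """Only evidence rated as trusted is safe to apply without review."""
    return classify(evidence) == "trusted"


print(classify("oa journal (via doaj)"))
print(classify("open (via page says Open Access)"))
```

Keys are lowercased before the lookup so that variations in capitalization, such as "Open Access", still match.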
<!-- vim: set sw=2 ts=2: -->