mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-11-07
@ -20,4 +20,102 @@ categories: ["Notes"]
- Tim merged my [pull request to override the ImageMagick PDF density in DSpace 7](https://github.com/DSpace/DSpace/pull/8553)
- I ported it to DSpace 6.x and submitted a pull request: https://github.com/DSpace/DSpace/pull/8560

## 2022-11-02

- I joined the FAO–CGIAR AGROVOC results sharing meeting
- From June to October 2022 we suggested 39 new keywords; 27 were added to AGROVOC, 4 were rejected, and 9 are still under discussion
- While doing a duplicate check on IFPRI's batch upload I found one duplicate uploaded by IWMI earlier this year
- I will update the metadata of that item and map it to the correct Initiative collection

## 2022-11-03

- I added countries to the twenty-three IFPRI items in OpenRefine based on their titles and abstracts (using the Jython trick I learned a few months ago), then added regions using csv-metadata-quality, and uploaded them to CGSpace
- I exported a list of collections from CGSpace so I can run the thumbnail fixes on each, as we seem to have issues when doing it on (some) large communities like the CRP community:

```console
localhost/dspace= ☘ \COPY (SELECT ds6_collection2collectionhandle(uuid) AS collection FROM collection) to /tmp/collections.txt
COPY 1268
```

- Then I started a test run on DSpace Test:

```console
$ while read -r collection; do chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails $collection | tee -a /tmp/FixLowQualityThumbnails.log; done < /tmp/collections.txt
```

- I'll be curious to check the log after it's all done.
- After a few hours I see:

```console
$ grep -c 'Action: remove' /tmp/FixLowQualityThumbnails.log
626
```

- Not bad, because last week I did a more manual selection of collections and deleted ~200
- I will replicate this on CGSpace soon, and also try the FixJpgJpgThumbnails tool
- I see that the CIAT Library is still up, so I should really grab all the PDFs before they shut that old server down
- Export a list of items with PDFs linked there:

```console
localhost/dspacetest= ☘ \COPY (SELECT dspace_object_id,text_value FROM metadatavalue WHERE metadata_field_id=219 AND text_value LIKE '%ciat-library%') to /tmp/ciat-library-items.csv;
COPY 4621
```

- After stripping the page numbers off I see there are only about 2,700 unique files, and we have to filter the dead JSPUI ones...

```console
$ csvcut -c url 2022-11-03-CIAT-Library-items.csv | sed 1d | grep -v jspui | sort -u | wc -l
2752
```

- I'm not sure how we'll handle the duplicates, because many items are book chapters or similar that share a single PDF
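
The deduplication above can also be sketched in Python, which makes it easy to see which PDFs are shared by several items. This is a minimal sketch and not the method actually used: it assumes the export has `id` and `url` columns (only `url` appears in the `csvcut` command above) and that the page numbers are URL fragments such as `#page=12`, which is a guess. The function name `group_items_by_pdf` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from urllib.parse import urldefrag


def group_items_by_pdf(csv_text):
    """Map each unique PDF URL to the list of item IDs that link to it."""
    items_by_url = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: page numbers are URL fragments like "#page=12"
        url, _fragment = urldefrag(row["url"].strip())
        # Skip links to the dead JSPUI interface
        if "jspui" in url:
            continue
        items_by_url[url].append(row["id"])
    return dict(items_by_url)


# Made-up sample rows, for illustration only
sample = """id,url
a1,http://example.org/ciat-library/book.pdf#page=10
a2,http://example.org/ciat-library/book.pdf#page=42
a3,http://example.org/jspui/bitstream/old.pdf
a4,http://example.org/ciat-library/report.pdf
"""

groups = group_items_by_pdf(sample)
shared = {url: ids for url, ids in groups.items() if len(ids) > 1}
print(len(groups), "unique files;", len(shared), "shared by several items")
```

Keeping the item IDs per URL means each PDF only needs to be downloaded once and can then be attached to every item in its group.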

## 2022-11-04

- I decided to check for old pre-ImageMagick thumbnails on CGSpace by finding any bitstreams with the description "Generated Thumbnail":

```console
localhost/dspacetest= ☘ \COPY (SELECT ds6_bitstream2itemhandle(dspace_object_id) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND text_value='Generated Thumbnail') to /tmp/old-thumbnails.txt;
COPY 1147
$ grep -v '\\N' /tmp/old-thumbnails.txt > /tmp/old-thumbnail-handles.txt
$ wc -l /tmp/old-thumbnail-handles.txt
987 /tmp/old-thumbnail-handles.txt
```

- A bunch of these have `\N` for some reason when I use the `ds6_bitstream2itemhandle` function to get their handles so I had to exclude those...
- I forced the media-filter for these items on CGSpace:

```console
$ while read -r handle; do JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" dspace filter-media -p "ImageMagick PDF Thumbnail" -i $handle -f -v; done < /tmp/old-thumbnail-handles.txt
```
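
The two steps above (dropping the `\N` rows, then looping over the handles) could also be prepared in one pass. The following is a minimal Python sketch, assuming the export has one handle per line and that the same item may appear more than once when it has several old thumbnails, which the log above does not confirm. The helper name `filter_media_commands` and the sample handles are hypothetical; the `dspace filter-media` arguments are copied from the loop above, and the commands are only built here, not executed.

```python
import shlex


def filter_media_commands(lines):
    """Return one filter-media argument list per unique, non-NULL handle."""
    seen = set()
    commands = []
    for line in lines:
        handle = line.strip()
        # psql's COPY writes NULL values as \N; skip those and blank lines
        if not handle or handle == r"\N":
            continue
        # Assumption: duplicates are possible, so process each item once
        if handle in seen:
            continue
        seen.add(handle)
        commands.append([
            "dspace", "filter-media",
            "-p", "ImageMagick PDF Thumbnail",
            "-i", handle, "-f", "-v",
        ])
    return commands


# Made-up handles, for illustration only
sample = ["10568/1", r"\N", "10568/2", "10568/1", ""]
for command in filter_media_commands(sample):
    # shlex.join quotes the plugin name so the line can be pasted into a shell
    print(shlex.join(command))
```

Each argument list could be passed to `subprocess.run` with `JAVA_OPTS` set in the environment, as in the shell loop.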

- Upload some batch records via CSV for Peter
- Update the about page on CGSpace with new text from Peter
- Add a few more ORCID identifiers and names to my growing file `2022-09-22-add-orcids.csv`
- I tagged fifty-four new authors using this list
- I deleted and mapped one duplicate item for Maria Garruccio
- I updated the CG Core website from Bootstrap v4.6 to v5.2

## 2022-11-07

- I did a harvest on AReS last night but it seems that MELSpace's sitemap is broken again because we have 10,000 fewer records
- I filed [an issue](https://github.com/ecrmnn/iso-3166-1/issues/10) on the iso-3166-1 npm package to update the name of Turkey to Türkiye
- I also filed [an issue](https://github.com/flyingcircusio/pycountry/issues/148) and [a pull request](https://github.com/flyingcircusio/pycountry/pull/149) on the pycountry package
- I also filed [an issue](https://github.com/konstantinstadler/country_converter/issues/121) and [a pull request](https://github.com/konstantinstadler/country_converter/pull/122) on the country-converter package
- I also changed one item on CGSpace that had been submitted since the name was changed
- I also imported the new iso-codes 4.12.0 into cgspace-java-helpers
- I also updated it in the DSpace `input-forms.xml`
- I also forked the iso-3166-1 package from npm and updated Swaziland, Macedonia, and Turkey in my fork
- I submitted a [pull request](https://github.com/ecrmnn/iso-3166-1/pull/11) to update this upstream
- Since I was making all these pull requests I also made [one on country-converter for the UN M.49 region "South-eastern Asia"](https://github.com/konstantinstadler/country_converter/pull/123)
- Port the [ImageMagick PDF cropbox fix](https://github.com/DSpace/DSpace/pull/8550) to DSpace 6.x
- I deployed it on CGSpace, ran all updates, and rebooted the host
- I ran the filter-media script on one large collection where many of these PDFs with cropbox issues exist:

```console
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" dspace filter-media -p "ImageMagick PDF Thumbnail" -v -f -i 10568/78 >& /tmp/filter-media-cropbox.log
```

- But looking at the items it processed, I'm not sure it's working as expected
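
The country renames earlier in this entry (Turkey, plus Swaziland and Macedonia in the iso-3166-1 fork) amount to a small lookup applied to existing metadata values. Here is a minimal Python sketch, assuming the target names are the current ISO 3166-1 short names and that multiple values in a cell are separated by `||`, as in DSpace CSV exports. The mapping and the function name `rename_countries` are illustrative and are not taken from any of the packages mentioned above.

```python
# Illustrative mapping; the old names are written as they appear in the notes
RENAMED_COUNTRIES = {
    "Turkey": "Türkiye",
    "Swaziland": "Eswatini",
    "Macedonia": "North Macedonia",
}


def rename_countries(value, separator="||"):
    """Replace outdated country names in a multi-value metadata cell."""
    names = [name.strip() for name in value.split(separator)]
    return separator.join(RENAMED_COUNTRIES.get(name, name) for name in names)


print(rename_countries("Kenya||Turkey"))
```

Names that are not in the mapping pass through unchanged, so the function can be applied to a whole country column.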

<!-- vim: set sw=2 ts=2: -->