mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-03-31
@@ -271,4 +271,31 @@ $ chrt -b 0 dspace filter-media -p "ImageMagick PDF Thumbnail" -i 10947/50
- After that I did some normalization on the `cg.subject.system` metadata and extracted a few dozen countries to the country field
- Start a harvest on AReS

## 2022-03-30

- Yesterday Rafael from CIAT asked me to re-create his approver account on DSpace Test as well

```console
$ dspace user -a -m tip-approve@cgiar.org -g Rafael -s Rodriguez -p 'fuuuu'
```

- I started looking into the request regarding the CIAT Library PDFs
- There are over 4,000 links to PDFs hosted on that server in CGSpace metadata
- The links seem to be down though! I emailed Paola to ask
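
A minimal sketch of how link availability could be spot-checked from a script, assuming Python's standard library and a plain `HEAD` request. This is not how the links were checked here. The `link_status` helper and its `opener` parameter are hypothetical names introduced for illustration, and some servers reject `HEAD`, so a `GET` fallback may be needed in practice.

```python
import urllib.error
import urllib.request


def link_status(url, timeout=10, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, or None if no response arrived.

    `opener` is a hypothetical hook so the function can be exercised
    without touching the network; by default it is urllib's urlopen.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 503
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout: treat as unreachable
        return None
```

Calling `link_status()` on a handful of the extracted URLs would separate "server unreachable" (`None`) from "server up but file missing" (for example `404`), which are different things to ask the server's owners about.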

## 2022-03-31

- Switch DSpace Test (linode26) back to CMS GC so I can do some monitoring and evaluation of GC before switching to G1GC
- Leroy from CIAT said that the CIAT Library server has security issues so was limited to internal traffic
- I extracted a list of URLs from CGSpace to send him:

```console
localhost/dspacetest= ☘ \COPY (SELECT DISTINCT(text_value) FROM metadatavalue WHERE metadata_field_id=219 AND text_value ~ 'https?://ciat-library') to /tmp/2022-03-31-ciat-library-urls.csv WITH CSV HEADER;
COPY 4552
```

- I did some checks and cleanups in OpenRefine because there are some values with "#page" etc
- Once I sorted them there were only ~2,700, which means there are going to be almost two thousand items with duplicate PDFs
- I suggested that we might want to handle those cases specially and extract the chapters or whatever page range since they are probably books
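
The OpenRefine cleanup above could also be approximated in a script. The sketch below is an assumption-laden illustration, not the procedure actually used: it strips URL fragments such as `#page=12` and counts how many distinct metadata values collapse onto the same PDF. The function names are hypothetical, and the CSV is assumed to have the single `text_value` column that the export above would produce. It counts distinct URL values, not items, because the export used `DISTINCT`.

```python
import csv
from collections import Counter
from urllib.parse import urldefrag


def read_urls(csv_path):
    """Read the text_value column from a CSV export that has a header row."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [row["text_value"] for row in csv.DictReader(handle)]


def count_shared_pdfs(urls):
    """Map each fragment-free URL to the number of distinct values pointing at it."""
    counts = Counter()
    for url in set(urls):
        # urldefrag splits "file.pdf#page=12" into ("file.pdf", "page=12")
        base, _fragment = urldefrag(url.strip())
        counts[base] += 1
    return counts


def shared_only(counts):
    """Keep only the PDFs referenced by more than one distinct value."""
    return {url: n for url, n in counts.items() if n > 1}
```

Running `shared_only(count_shared_pdfs(read_urls(path)))` on the export would list the PDFs that several values point into, which are the candidates for chapter or page-range extraction.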
<!-- vim: set sw=2 ts=2: -->