Add notes for 2022-03-31

2022-03-31 16:09:14 +03:00
parent 79b5f023e1
commit 054d666fe0
26 changed files with 90 additions and 31 deletions

@@ -271,4 +271,31 @@ $ chrt -b 0 dspace filter-media -p "ImageMagick PDF Thumbnail" -i 10947/50
- After that I did some normalization on the `cg.subject.system` metadata and extracted a few dozen countries to the country field
- Start a harvest on AReS
## 2022-03-30
- Yesterday Rafael from CIAT asked me to re-create his approver account on DSpace Test as well
```console
$ dspace user -a -m tip-approve@cgiar.org -g Rafael -s Rodriguez -p 'fuuuu'
```
- I started looking into the request regarding the CIAT Library PDFs
- There are over 4,000 links to PDFs hosted on that server in CGSpace metadata
- The links seem to be down though! I emailed Paola to ask
## 2022-03-31
- Switch DSpace Test (linode26) back to CMS GC so I can do some monitoring and evaluation of GC before switching to G1GC
- Leroy from CIAT said that the CIAT Library server has security issues, so it was limited to internal traffic
- I extracted a list of URLs from CGSpace to send him:
```console
localhost/dspacetest= ☘ \COPY (SELECT DISTINCT(text_value) FROM metadatavalue WHERE metadata_field_id=219 AND text_value ~ 'https?://ciat-library') to /tmp/2022-03-31-ciat-library-urls.csv WITH CSV HEADER;
COPY 4552
```
- I did some checks and cleanups in OpenRefine because some values have suffixes like "#page"
- Once I sorted them there were only ~2,700 unique URLs, which means there are almost two thousand items with duplicate PDFs
- I suggested that we might want to handle those cases specially and extract the chapters or relevant page ranges, since they are probably books
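- For reference, a minimal Python sketch of the same kind of cleanup, assuming the "#page" values are ordinary URL fragments; the function names and the `example.org` URLs are hypothetical placeholders, not the OpenRefine steps I actually used or real values from the export:

```python
from collections import Counter
from urllib.parse import urldefrag


def strip_fragment(url: str) -> str:
    """Return the URL without its '#...' fragment (for example '#page=12')."""
    return urldefrag(url.strip()).url


def summarize(urls):
    """Count unique PDFs after stripping fragments.

    Returns (number of unique URLs, {url: count} for URLs seen more than once).
    """
    counts = Counter(strip_fragment(u) for u in urls if u.strip())
    shared = {u: n for u, n in counts.items() if n > 1}
    return len(counts), shared


# Hypothetical sample values, only to show the shape of the data
sample = [
    "http://ciat-library.example.org/books/a.pdf#page=12",
    "http://ciat-library.example.org/books/a.pdf#page=40",
    "http://ciat-library.example.org/articles/b.pdf",
]

unique_count, shared = summarize(sample)
print(unique_count)
print(shared)
```

- URLs that show up in `shared` would be the candidates for special handling, since several values point at different pages of the same PDF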
<!-- vim: set sw=2 ts=2: -->