- I added Phil Thornton and Sonal Henson's ORCID identifiers to the controlled vocabulary for `cg.creator.orcid` and then re-generated the names using my [resolve-orcids.py](https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b) script
- But in super positive news, he says they are using my new [dspace-statistics-api](https://github.com/alanorth/dspace-statistics-api) and it's MUCH faster than using Atmire CUA's internal "restlet" API
- I don't recognize the `138.201.49.199` IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams
- I tagged all of Sonal and Phil's items with their ORCID identifiers on CGSpace using my [add-orcid-identifiers.py](https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050) script
- Salem raised an issue that the dspace-statistics-api reports downloads for some items that have no bitstreams (like many limited-access items)
- Every item has at least a `LICENSE` bundle, and some have a `THUMBNAIL` bundle, but the indexing code is specifically checking for downloads from the `ORIGINAL` bundle
- [10568/97460](https://cgspace.cgiar.org/handle/10568/97460) (100550): has a thumbnail bitstream
- [10568/96112](https://cgspace.cgiar.org/handle/10568/96112) (96736): has only a LICENSE bitstream
- I see there are other bundles we might need to pay attention to: `TEXT`, `@_LOGO-COLLECTION_@`, `@_LOGO-COMMUNITY_@`, etc.
- On a hunch, I dropped the statistics table and re-indexed, and now those two items above have no downloads
- So it's fixed, but I'm not sure why!
- Peter wants to know the number of API requests per month, which was about 250,000 in September (excluding statlet requests)
- AgriKnowledge says they're going to add the `dc.identifier.uri` to their item view in November when they update their website software
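The rule described above, where only hits on bitstreams in the `ORIGINAL` bundle count as downloads, can be sketched in Python, the language of the helper scripts linked in these notes. This is a toy model and not the dspace-statistics-api's actual indexing code: `DownloadEvent` and `count_original_downloads` are hypothetical names, and each event is assumed to already carry the name of the bundle holding the downloaded bitstream. Item `12345` is made up; the other two IDs are the ones listed above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadEvent:
    """Hypothetical download hit: the owning item and the bundle that
    contains the bitstream that was downloaded."""
    item_id: int
    bundle_name: str


def count_original_downloads(events):
    """Count downloads per item, keeping only bitstreams in the ORIGINAL
    bundle; LICENSE, THUMBNAIL, TEXT, and logo bundles are ignored."""
    counts = Counter()
    for event in events:
        if event.bundle_name == "ORIGINAL":
            counts[event.item_id] += 1
    return counts


events = [
    DownloadEvent(100550, "THUMBNAIL"),  # thumbnail only: not a download
    DownloadEvent(96736, "LICENSE"),     # license only: not a download
    DownloadEvent(12345, "ORIGINAL"),
    DownloadEvent(12345, "ORIGINAL"),
]
print(dict(count_original_downloads(events)))  # {12345: 2}
```

Filtering on an allow-list of one bundle, rather than excluding known bundles, means new bundle types like `TEXT` or the logo bundles never inflate the counts.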
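A monthly figure like the September count above can be pulled from the web server logs. The sketch below assumes nginx's default timestamp format (e.g. `[05/Sep/2018:10:00:00 +0000]`), that REST API requests are logged to their own files, and that statlet requests can be recognized by the substring `statlets`; `count_api_requests` is a hypothetical helper, not the command used for the number above.

```python
import gzip
from pathlib import Path


def open_log(path):
    """Open a plain or gzip-compressed (rotated) log file as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return path.open("r", encoding="utf-8", errors="replace")


def count_api_requests(log_paths, month="Sep/2018", exclude="statlets"):
    """Count log lines whose nginx timestamp falls in the given month
    (matched as "/Sep/2018:") and that don't contain the excluded substring."""
    total = 0
    for log_path in log_paths:
        with open_log(log_path) as log:
            for line in log:
                if f"/{month}:" in line and exclude not in line:
                    total += 1
    return total
```

For example, `count_api_requests(["/var/log/nginx/rest.log", "/var/log/nginx/rest.log.1.gz"])`, where both paths are placeholders for wherever the REST API's access logs actually live.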
## 2018-10-10
- Peter noticed that some recently added PDFs don't have thumbnails
- When I tried to force them to be generated I got an error that I've never seen before:
```
$ dspace filter-media -v -f -i 10568/97613
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: not authorized `/tmp/impdfthumb5039464037201498062.pdf' @ error/constitute.c/ReadImage/412.
```
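`not authorized` is the message ImageMagick gives when a rule in its `policy.xml` denies access to a coder, which fits the policy change described in the notes that follow. Commenting out the PDF rule by hand works; the Python sketch below does the same to the file's text. It assumes the rule is written exactly as `<policy domain="coder" rights="none" pattern="PDF" />` on its own line, which is an assumption about Ubuntu's file rather than something checked here, and `comment_out_pdf_policy` is a hypothetical helper.

```python
import re

# Matches an active (not yet commented) rule denying the PDF coder, e.g.
#   <policy domain="coder" rights="none" pattern="PDF" />
# The exact attribute order is an assumption about Ubuntu's policy.xml.
PDF_POLICY = re.compile(
    r'^([ \t]*)(<policy domain="coder" rights="none" pattern="PDF"\s*/>)',
    re.MULTILINE,
)


def comment_out_pdf_policy(xml_text):
    """Wrap the PDF-denying rule in an XML comment, keeping its indentation.
    Already-commented rules don't match, so running it twice is harmless."""
    return PDF_POLICY.sub(r"\1<!-- \2 -->", xml_text)
```

Applying it means reading `/etc/ImageMagick-6/policy.xml`, passing the text through the function, and writing it back as root; rules for other coders such as `PS` are left untouched.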
- I see there was an update to Ubuntu's ImageMagick on 2018-10-05, so maybe something changed or broke?
- I get the same error when forcing `filter-media` to run on DSpace Test too, so it's gotta be an ImageMagick bug
- The ImageMagick version is currently 8:6.8.9.9-7ubuntu5.13, and there is an [Ubuntu Security Notice from 2018-10-04](https://usn.ubuntu.com/3785-1/)
- Wow, someone on [Twitter posted about this breaking his web application](https://twitter.com/rosscampbell/status/1048268966819319808) (and it was retweeted by the ImageMagick account!)
- I commented out the line that disables PDF thumbnails in `/etc/ImageMagick-6/policy.xml`
- I emailed DuraSpace to update [our entry in their DSpace registry](https://duraspace.org/registry/entry/4188/?gvid=178) (the data was still on DSpace 3, JSPUI, etc.)
- Generate a list of the top 1500 values for `dc.subject` so Sisay can start making a controlled vocabulary for it:
```
dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-10-11-top-1500-subject.csv WITH CSV HEADER;
```
- Give WorldFish advice about Handles because they are talking to some company called KnowledgeArc who recommends they do not use Handles!
- Last week I emailed Altmetric to ask if their software would notice mentions of our Handle in the format "handle:10568/80775" because I noticed that the [Land Portal does this](https://landportal.org/library/resources/handle1056880775/unlocking-farming-potential-bangladesh%E2%80%99-polders)
- Altmetric support responded to say no, but the reason is that Land Portal is doing something even stranger: they don't use `<meta>` tags in their page header, and they use the "dct:identifier" property instead of "dc:identifier"
- I re-created my local DSpace database container using [podman](https://github.com/containers/libpod) instead of Docker
- With a few changes to my local Maven `settings.xml` it is working well
- Generate a list of the top 10,000 authors for Peter Ballantyne to look through:
```
dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 3 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 10000) to /tmp/2018-10-11-top-10000-authors.csv WITH CSV HEADER;
```
- CTA uploaded some infographics that are very tall and their thumbnails disrupt the item lists on the front page and in their communities and collections
- I decided to constrain the max height of these to 200px using CSS ([#392](https://github.com/ilri/DSpace/pull/392))
- I will apply these on CGSpace when I do the other updates tomorrow, and I'll also double-check the high-scoring ones to see if they are correct in Sisay's author controlled vocabulary
- Merge the authors controlled vocabulary ([#393](https://github.com/ilri/DSpace/pull/393)), usage rights ([#394](https://github.com/ilri/DSpace/pull/394)), and the upstream DSpace 5.x cherry-picks ([#395](https://github.com/ilri/DSpace/pull/395)) into our `5_x-prod` branch
- Switch to the new CGIAR LDAP server on CGSpace, as it's been running (at least for authentication) on DSpace Test for the last few weeks, and I think the old one will be deprecated soon (today?)
- Apply Peter's 746 author corrections on CGSpace and DSpace Test using my [fix-metadata-values.py](https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897) script
- After rebooting the server I noticed that Handles are not resolving, and the `dspace-handle-server` systemd service is not running (or rather, it exited with success)
- Restarting the service with systemd works for a few seconds, then the java process quits
- I suspect that the systemd service type needs to be `forking` rather than `simple`, because the service calls the default DSpace `start-handle-server` shell script, which uses `nohup` and `&` to background the java process
- It would be nice if there was a cleaner way to start the service and then just log to the systemd journal rather than all this hiding and log redirecting
- Email the Landportal.org people to ask if they would consider Dublin Core metadata tags in their page's header, rather than the HTML properties they are using in their body
- Peter pointed out that some thumbnails were still not getting generated
- When I tried to generate them manually I noticed that the path to the CMYK profile had changed because Ubuntu upgraded Ghostscript from 9.18 to 9.25 last week... WTF?
- Looks like I can use `/usr/share/ghostscript/current` instead of `/usr/share/ghostscript/9.25`...
- I limited the tall thumbnails even further to 170px because Peter said CTA's were still too tall at 200px ([#396](https://github.com/ilri/DSpace/pull/396))
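What the Land Portal email asks for could look like the sketch below: normalize the `handle:10568/80775` style into a resolvable Handle URI and emit it as a Dublin Core `<meta>` tag for the page header. The `DC.identifier` / `DC.title` naming follows the common Dublin Core-in-HTML convention; whether that is exactly what Altmetric's crawler looks for is an assumption, and both functions are hypothetical.

```python
import html
import re

# Accepts a bare Handle ("10568/80775"), the "handle:10568/80775" style seen
# on Land Portal, or an hdl.handle.net URL; prefixes may be dotted (20.500.x).
HANDLE_RE = re.compile(r"^(?:handle:|https?://hdl\.handle\.net/)?(\d+(?:\.\d+)*/\S+)$")


def handle_to_uri(value):
    """Normalize a Handle reference to a resolvable https://hdl.handle.net/ URI."""
    match = HANDLE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a Handle: {value!r}")
    return f"https://hdl.handle.net/{match.group(1)}"


def dc_meta_tags(identifier, title):
    """Render Dublin Core <meta> tags for a page's <head>, escaping values."""
    tags = [
        ("DC.identifier", handle_to_uri(identifier)),
        ("DC.title", title),
    ]
    return "\n".join(
        f'<meta name="{name}" content="{html.escape(content, quote=True)}" />'
        for name, content in tags
    )
```

Emitting a full `https://hdl.handle.net/` URI rather than the `handle:` form keeps the identifier resolvable by anyone, not just by tools that know the Handle scheme.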
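Rather than hard-coding a versioned Ghostscript directory, the thumbnail setup can point at the `current` path mentioned above. The Python sketch below shows that lookup with a fallback to the newest versioned directory. The `iccprofiles/default_cmyk.icc` layout is an assumption about Ubuntu's Ghostscript package, and `cmyk_profile_path` is a hypothetical helper; in practice the resolved path would simply be written into the DSpace configuration.

```python
from pathlib import Path

# Assumed layout under the Ghostscript share directory:
#   <version>/iccprofiles/default_cmyk.icc  (e.g. 9.18, 9.25)
#   current/iccprofiles/default_cmyk.icc    (version-independent path)
PROFILE = Path("iccprofiles") / "default_cmyk.icc"


def cmyk_profile_path(gs_share=Path("/usr/share/ghostscript")):
    """Return the CMYK profile path, preferring the version-independent
    "current" directory so that a Ghostscript upgrade doesn't break it."""
    current = gs_share / "current" / PROFILE
    if current.exists():
        return current

    # Fall back to the newest numeric version directory; compare versions as
    # integer tuples so that e.g. 9.100 would sort after 9.25.
    versions = []
    for child in gs_share.iterdir():
        parts = child.name.split(".")
        if all(part.isdigit() for part in parts) and (child / PROFILE).exists():
            versions.append((tuple(int(part) for part in parts), child))
    if not versions:
        raise FileNotFoundError(f"no {PROFILE} under {gs_share}")
    newest = max(versions, key=lambda version: version[0])[1]
    return newest / PROFILE
```

Comparing versions numerically matters because a plain string sort would put `9.100` before `9.25`.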