cgspace-notes/content/posts/2017-03.md

238 lines
12 KiB
Markdown
Raw Normal View History

2017-03-01 16:10:08 +01:00
+++
date = "2017-03-01T17:08:52+02:00"
author = "Alan Orth"
title = "March, 2017"
tags = ["Notes"]
+++
## 2017-03-01
- Run the 279 CIAT author corrections on CGSpace
2017-03-02 18:26:39 +01:00
## 2017-03-02
2017-03-04 00:15:47 +01:00
- Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace
- CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles
- They might come in at the top level in one "CGIAR System" community, or with several communities
- I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?
- Need to send Peter and Michael some notes about this in a few days
- Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI
2017-03-02 18:26:39 +01:00
- Filed an issue on DSpace issue tracker for the `filter-media` bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: [DS-3516](https://jira.duraspace.org/browse/DS-3516)
- Discovered that the ImageMagic `filter-media` plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK
- Interestingly, it seems DSpace 4.x's thumbnails were sRGB, but forcing regeneration using DSpace 5.x's ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see [10568/51999](https://cgspace.cgiar.org/handle/10568/51999)):
```
$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
```
2017-03-01 16:10:08 +01:00
<!--more-->
2017-03-02 19:00:18 +01:00
2017-03-03 17:40:38 +01:00
- This results in discolored thumbnails when compared to the original PDF, for example sRGB and CMYK:
2017-03-02 19:00:18 +01:00
![Thumbnail in sRGB colorspace](/cgspace-notes/2017/03/thumbnail-srgb.jpg)
2017-03-02 23:57:37 +01:00
![Thumbnial in CMYK colorspace](/cgspace-notes/2017/03/thumbnail-cmyk.jpg)
2017-03-03 00:32:54 +01:00
- I filed an issue for the color space thing: [DS-3517](https://jira.duraspace.org/browse/DS-3517)
2017-03-03 17:40:38 +01:00
## 2017-03-03
- I created a patch for DS-3517 and made a pull request against upstream `dspace-5_x`: https://github.com/DSpace/DSpace/pull/1669
- Looks like `-colorspace sRGB` alone isn't enough, we need to use profiles:
```
2017-03-04 00:15:47 +01:00
$ convert alc_contrastes_desafios.pdf\[0\] -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_cmyk.icc -thumbnail 300x300 -flatten -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_rgb.icc alc_contrastes_desafios.pdf.jpg
2017-03-03 17:40:38 +01:00
```
2017-03-04 00:15:47 +01:00
- This reads the input file, applies the CMYK profile, applies the RGB profile, then writes the file
- Note that you should set the first profile immediately after the input file
- Also, it is better to use profiles than setting `-colorspace`
2017-03-03 17:40:38 +01:00
- This is a great resource describing the color stuff: http://www.imagemagick.org/Usage/formats/#profiles
2017-03-04 00:15:47 +01:00
- Somehow we need to detect the color system being used by the input file and handle each case differently (with profiles)
2017-03-03 17:40:38 +01:00
- This is trivial with `identify` (even by the [Java ImageMagick API](http://im4java.sourceforge.net/api/org/im4java/core/IMOps.html#identify)):
```
$ identify -format '%r\n' alc_contrastes_desafios.pdf\[0\]
DirectClass CMYK
$ identify -format '%r\n' Africa\ group\ of\ negotiators.pdf\[0\]
DirectClass sRGB Alpha
```
2017-03-05 11:39:09 +01:00
2017-03-05 14:30:36 +01:00
## 2017-03-04
- Spent more time looking at the ImageMagick CMYK issue
- The `default_cmyk.icc` and `default_rgb.icc` files are both part of the Ghostscript GPL distribution, but according to DSpace's `LICENSES_THIRD_PARTY` file, DSpace doesn't allow distribution of dependencies that are licensed solely under the GPL
- So this issue is kinda pointless now, as the ICC profiles are absolutely necessary to make a meaningful CMYK→sRGB conversion
2017-03-05 11:39:09 +01:00
## 2017-03-05
- Look into helping developers from landportal.info with a query for items related to LAND on the REST API
- They want something like the items that are returned by the general "LAND" query in the search interface, but we cannot do that
- We can only return specific results for metadata fields, like:
```
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "LAND REFORM", "language": null}' | json_pp
```
- But there are hundreds of combinations of fields and values (like `dc.subject` and all the center subjects), and we can't use wildcards in REST!
2017-03-05 14:30:36 +01:00
- Reading about enabling multiple handle prefixes in DSpace
- There is a mailing list thread from 2011 about it: http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html
- And a comment from Atmire's Bram about it on the DSpace wiki: https://wiki.lyrasis.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296
2017-03-15 17:10:01 +01:00
- Bram mentions an undocumented configuration option `handle.plugin.checknameauthority`, but I noticed another one in `dspace.cfg`:
2017-03-05 14:30:36 +01:00
```
# List any additional prefixes that need to be managed by this handle server
# (as for examle handle prefix coming from old dspace repository merged in
# that repository)
# handle.additional.prefixes = prefix1[, prefix2]
```
- Because of this I noticed that our Handle server's `config.dct` was potentially misconfigured!
- We had some default values still present:
```
"300:0.NA/YOUR_NAMING_AUTHORITY"
```
- I've changed them to the following and restarted the handle server:
```
"300:0.NA/10568"
```
- In looking at all the configs I just noticed that we are not providing a DOI in the Google-specific metadata crosswalk
- From `dspace/config/crosswalks/google-metadata.properties`:
```
google.citation_doi = cg.identifier.doi
```
- This works, and makes DSpace output the following metadata on the item view page:
```
<meta content="https://dx.doi.org/10.1186/s13059-017-1153-y" name="citation_doi">
```
- Submitted and merged pull request for this: https://github.com/ilri/DSpace/pull/305
2017-03-05 22:10:50 +01:00
- Submit pull request to set the author separator for XMLUI item lists to a semicolon instead of ",": https://github.com/ilri/DSpace/pull/306
- I want to show it briefly to Abenet and Peter to get feedback
2017-03-06 15:27:48 +01:00
## 2017-03-06
- Someone on the mailing list said that `handle.plugin.checknameauthority` should be false if we're using multiple handle prefixes
2017-03-07 13:03:29 +01:00
## 2017-03-07
- I set up a top-level community as a test for the CGIAR Library and imported one item with the the 10947 handle prefix
- When testing the Handle resolver locally it shows the item to be on the local repository
- So this seems to work, with the following caveats:
- New items will have the default handle
- Communities and collections will have the default handle
- Only items imported manually can have the other handles
2017-03-07 17:01:24 +01:00
- I need to talk to Michael and Peter to share the news, and discuss the structure of their community(s) and try some actual test data
- We'll need to do some data cleaning to make sure they are using the same fields we are, like `dc.type` and `cg.identifier.status`
2017-03-07 17:05:24 +01:00
- Another thing is that the import process creates new `dc.date.accessioned` and `dc.date.available` fields, so we end up with duplicates (is it important to preserve the originals for these?)
2017-03-08 15:08:47 +01:00
- Report DS-3520 issue to Atmire
## 2017-03-08
- Merge the author separator changes to `5_x-prod`, as everyone has responded positively about it, and it's the default in Mirage2 afterall!
- Cherry pick the `commons-collections` patch from DSpace's `dspace-5_x` branch to address DS-3520: https://jira.duraspace.org/browse/DS-3520
2017-03-09 13:29:50 +01:00
## 2017-03-09
- Export list of sponsors so Peter can clean it up:
```
dspace=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship') group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
COPY 285
```
2017-03-12 12:22:16 +01:00
## 2017-03-12
- Test the sponsorship fixes and deletes from Peter:
```
$ ./fix-metadata-values.py -i Investors-Fix-51.csv -f dc.description.sponsorship -t Action -m 29 -d dspace -u dspace -p fuuuu
$ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.sponsorship -m 29 -d dspace -u dspace -p fuuu
```
- Generate a new list of unique sponsors so we can update the controlled vocabulary:
```
dspace=# \copy (select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship')) to /tmp/sponsorship.csv with csv;
```
2017-03-12 12:25:28 +01:00
- Pull request for controlled vocabulary if Peter approves: https://github.com/ilri/DSpace/pull/308
2017-03-12 12:41:42 +01:00
- Review Sisay's roots, tubers, and bananas (RTB) theme, which still needs some fixes to work properly: https://github.com/ilri/DSpace/pull/307
2017-03-12 12:51:05 +01:00
- Created an issue to track the progress on the Livestock CRP theme: https://github.com/ilri/DSpace/issues/309
2017-03-12 16:32:58 +01:00
- Created a basic theme for the Livestock CRP community
![Livestock CRP theme](/cgspace-notes/2017/03/livestock-theme.png)
2017-03-15 17:10:01 +01:00
## 2017-03-15
- Merge pull request for controlled vocabulary updates for sponsor: https://github.com/ilri/DSpace/pull/308
- Merge pull request for Livestock CRP theme: https://github.com/ilri/DSpace/issues/309
- Create pull request for PABRA subjects (waiting for confirmation from Abenet before merging): https://github.com/ilri/DSpace/pull/310
- Create pull request for CCAFS Phase II migrations (waiting for confirmation from CCAFS people): https://github.com/ilri/DSpace/pull/311
2017-03-15 17:15:32 +01:00
- I also need to ask if either of these new fields need to be added to Discovery facets, search, and Atmire modules
2017-03-15 23:15:13 +01:00
- Run all system updates on DSpace Test and re-deploy CGSpace
2017-03-16 15:56:54 +01:00
## 2017-03-16
- Merge pull request for PABRA subjects: https://github.com/ilri/DSpace/pull/310
- Abenet and Peter say we can add them to Discovery, Atmire modules, etc, but I might not have time to do it now
- Help Sisay with RTB theme again
- Remove ICARDA subject from Discovery sidebar facets: https://github.com/ilri/DSpace/pull/312
- Remove ICARDA subject from Browse and item submission form: https://github.com/ilri/DSpace/pull/313
- Merge the CCAFS Phase II changes but hold off on doing the flagship metadata updates until Macaroni Bros gets their importer updated
2017-03-16 18:05:46 +01:00
- Deploy latest changes and investor fixes/deletions on CGSpace
- Run system updates on CGSpace and reboot server
2017-03-20 13:09:52 +01:00
## 2017-03-20
- Create basic XMLUI theme for PABRA community: https://github.com/ilri/DSpace/pull/315
2017-03-24 13:27:31 +01:00
## 2017-03-24
- Still helping Sisay try to figure out how to create a theme for the RTB community
2017-03-28 17:39:09 +02:00
## 2017-03-28
- CCAFS said they are ready for the flagship updates for Phase II to be run (`cg.subject.ccafs`), so I ran them on CGSpace:
```
$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
```
- We've been waiting since February to run these
- Also, I generated a list of all CCAFS flagships because there are a dozen or so more than there should be:
```
2017-03-29 14:57:01 +02:00
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=210 group by text_value order by count desc) to /tmp/ccafs.csv with csv;
2017-03-28 17:39:09 +02:00
```
- I sent a list to CCAFS people so they can tell me if some should be deleted or moved, etc
- Test, squash, and merge Sisay's RTB theme into `5_x-prod`: https://github.com/ilri/DSpace/pull/316
2017-03-29 14:57:01 +02:00
## 2017-03-29
- Dump a list of fields in the DC and CG schemas to compare with CG Core:
```
dspace=# select case when metadata_schema_id=1 then 'dc' else 'cg' end as schema, element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
```
- Ooh, a better one!
```
dspace=# select coalesce(case when metadata_schema_id=1 then 'dc.' else 'cg.' end) || concat_ws('.', element, qualifier) as field, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
```
2017-03-30 13:50:53 +02:00
## 2017-03-30
- Adjust the Linode CPU usage alerts for the CGSpace server from 150% to 200%, as generally the nightly Solr indexing causes a usage around 150190%, so this should make the alerts less regular
2017-03-31 04:36:10 +02:00
- Adjust the threshold for DSpace Test from 90 to 100%