2017-03-01
- Run the 279 CIAT author corrections on CGSpace
2017-03-02
- Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace
- CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles
- They might come in at the top level in one “CGIAR System” community, or with several communities
- I need to spend a bit of time looking at the multiple handle support in DSpace to see whether new content can be minted under both handle prefixes or just one
- Need to send Peter and Michael some notes about this in a few days
- Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI
- Filed an issue on the DSpace issue tracker for the filter-media bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516
- Discovered that the ImageMagick filter-media plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK
- Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568/51999):
$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
- This results in discolored thumbnails when compared to the original PDF, for example sRGB and CMYK:
- I filed an issue for the color space thing: DS-3517
2017-03-03
$ convert alc_contrastes_desafios.pdf\[0\] -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_cmyk.icc -thumbnail 300x300 -flatten -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_rgb.icc alc_contrastes_desafios.pdf.jpg
- This reads the input file, applies the CMYK profile, applies the RGB profile, then writes the file
- Note that you should set the first profile immediately after the input file
- Also, it is better to use profiles than to set -colorspace
- This is a great resource describing the color stuff: http://www.imagemagick.org/Usage/formats/#profiles
- Somehow we need to detect the color system being used by the input file and handle each case differently (with profiles)
- This is trivial with identify (even by the Java ImageMagick API):
$ identify -format '%r\n' alc_contrastes_desafios.pdf\[0\]
DirectClass CMYK
$ identify -format '%r\n' Africa\ group\ of\ negotiators.pdf\[0\]
DirectClass sRGB Alpha
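- A minimal sketch of that detection step, reusing the identify and convert invocations above. The function names, the CMYK_ICC and RGB_ICC variables, and the plain non-CMYK branch are assumptions for illustration, not what the filter-media plugin actually does:

```shell
# Succeed if the class/colorspace string printed by
# `identify -format '%r\n'` names a CMYK image.
needs_cmyk_conversion() {
    case "$1" in
        *CMYK*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: CMYK_ICC and RGB_ICC are assumed to hold the paths
# of suitable ICC profiles (for example the Ghostscript ones used above).
make_thumbnail() {
    input="$1"
    output="$2"
    colorspace=$(identify -format '%r\n' "${input}[0]")
    if needs_cmyk_conversion "$colorspace"; then
        # Assign the CMYK profile right after the input, then convert to RGB
        convert "${input}[0]" -profile "$CMYK_ICC" -thumbnail 300x300 \
            -flatten -profile "$RGB_ICC" "$output"
    else
        # Assumption: non-CMYK input needs no profile handling
        convert "${input}[0]" -thumbnail 300x300 -flatten "$output"
    fi
}
```

- Usage would be something like: make_thumbnail alc_contrastes_desafios.pdf alc_contrastes_desafios.pdf.jpg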
2017-03-04
- Spent more time looking at the ImageMagick CMYK issue
- The default_cmyk.icc and default_rgb.icc files are both part of the Ghostscript GPL distribution, but according to DSpace’s LICENSES_THIRD_PARTY file, DSpace doesn’t allow distribution of dependencies that are licensed solely under the GPL
- So this issue is kinda pointless now, as the ICC profiles are absolutely necessary to make a meaningful CMYK→sRGB conversion
2017-03-05
- Look into helping developers from landportal.info with a query for items related to LAND on the REST API
- They want something like the items that are returned by the general “LAND” query in the search interface, but we cannot do that
- We can only return specific results for metadata fields, like:
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "LAND REFORM", "language": null}' | json_pp
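- Since each request matches only one field and value, a small wrapper makes it easier to repeat the query for several values. This is a rough sketch: the function names are made up, and the payload builder does no JSON escaping, so it assumes keys and values without quotes or backslashes:

```shell
# Build the JSON body expected by /rest/items/find-by-metadata-field.
# No escaping is done: key and value must not contain quotes or backslashes.
metadata_query() {
    printf '{"key": "%s", "value": "%s", "language": null}' "$1" "$2"
}

# POST the query to the test server used above and print the raw response.
find_by_metadata_field() {
    curl -s -H "accept: application/json" -H "Content-Type: application/json" \
        -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" \
        -d "$(metadata_query "$1" "$2")"
}

# Example call (needs network access):
# find_by_metadata_field cg.subject.ilri "LAND REFORM" | json_pp
```

- This still only approximates the general “LAND” search, because every land-related value has to be listed and queried separately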
# List any additional prefixes that need to be managed by this handle server
# (as for examle handle prefix coming from old dspace repository merged in
# that repository)
# handle.additional.prefixes = prefix1[, prefix2]
- Because of this I noticed that our Handle server’s config.dct was potentially misconfigured!
- We had some default values still present:
"300:0.NA/YOUR_NAMING_AUTHORITY"
- I’ve changed them to the following and restarted the handle server:
"300:0.NA/10568"
- In looking at all the configs I just noticed that we are not providing a DOI in the Google-specific metadata crosswalk
- From dspace/config/crosswalks/google-metadata.properties:
google.citation_doi = cg.identifier.doi
- This works, and makes DSpace output the following metadata on the item view page:
<meta content="https://dx.doi.org/10.1186/s13059-017-1153-y" name="citation_doi">
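- A quick way to confirm the tag on other items is to pull it out of the item page HTML. This is only a sketch: the function name is made up, and the sed pattern assumes the attribute order shown above (content before name):

```shell
# Print the content attribute of a citation_doi meta tag read from stdin.
# Assumes the tag is written as <meta content="..." name="citation_doi">.
extract_citation_doi() {
    sed -n 's/.*<meta content="\([^"]*\)" name="citation_doi".*/\1/p'
}

# Example call, where ITEM_URL is a placeholder for an item view page:
# curl -s "$ITEM_URL" | extract_citation_doi
```

- Items without a cg.identifier.doi value should print nothing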
2017-03-06
- Someone on the mailing list said that handle.plugin.checknameauthority should be false if we’re using multiple handle prefixes
2017-03-07
- I set up a top-level community as a test for the CGIAR Library and imported one item with the 10947 handle prefix
- When testing the Handle resolver locally it shows the item to be on the local repository
- So this seems to work, with the following caveats:
- New items will have the default handle
- Communities and collections will have the default handle
- Only items imported manually can have the other handles
- I need to talk to Michael and Peter to share the news, and discuss the structure of their community(s) and try some actual test data
- We’ll need to do some data cleaning to make sure they are using the same fields we are, like dc.type and cg.identifier.status
- Another thing is that the import process creates new dc.date.accessioned and dc.date.available fields, so we end up with duplicates