March, 2017
2017-03-01
- Run the 279 CIAT author corrections on CGSpace
2017-03-02
- Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace
- CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles
- They might come in at the top level in one “CGIAR System” community, or with several communities
- I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?
- Need to send Peter and Michael some notes about this in a few days
- Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI
- Filed an issue on DSpace issue tracker for the
filter-media
bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: DS-3516 - Discovered that the ImageMagic
filter-media
plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing regeneration using DSpace 5.x’s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see 10568⁄51999):
$ identify ~/Desktop/alc_contrastes_desafios.jpg /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
- This results in discolored thumbnails when compared to the original PDF, for example sRGB and CMYK:
- I filed an issue for the color space thing: DS-3517
2017-03-03
- I created a patch for DS-3517 and made a pull request against upstream
dspace-5_x
: https://github.com/DSpace/DSpace/pull/1669 Looks like
-colorspace sRGB
alone isn’t enough, we need to use profiles:$ convert alc_contrastes_desafios.pdf\[0\] -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_cmyk.icc -thumbnail 300x300 -flatten -profile /opt/brew/Cellar/ghostscript/9.20/share/ghostscript/9.20/iccprofiles/default_rgb.icc alc_contrastes_desafios.pdf.jpg
This reads the input file, applies the CMYK profile, applies the RGB profile, then writes the file
Note that you should set the first profile immediately after the input file
Also, it is better to use profiles than setting
-colorspace
This is a great resource describing the color stuff: http://www.imagemagick.org/Usage/formats/#profiles
Somehow we need to detect the color system being used by the input file and handle each case differently (with profiles)
This is trivial with
identify
(even by the Java ImageMagick API):$ identify -format '%r\n' alc_contrastes_desafios.pdf\[0\] DirectClass CMYK $ identify -format '%r\n' Africa\ group\ of\ negotiators.pdf\[0\] DirectClass sRGB Alpha
2017-03-04
- Spent more time looking at the ImageMagick CMYK issue
- The
default_cmyk.icc
anddefault_rgb.icc
files are both part of the Ghostscript GPL distribution, but according to DSpace’sLICENSES_THIRD_PARTY
file, DSpace doesn’t allow distribution of dependencies that are licensed solely under the GPL - So this issue is kinda pointless now, as the ICC profiles are absolutely necessary to make a meaningful CMYK→sRGB conversion
2017-03-05
- Look into helping developers from landportal.info with a query for items related to LAND on the REST API
- They want something like the items that are returned by the general “LAND” query in the search interface, but we cannot do that
We can only return specific results for metadata fields, like:
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "https://dspacetest.cgiar.org/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "LAND REFORM", "language": null}' | json_pp
But there are hundreds of combinations of fields and values (like
dc.subject
and all the center subjects), and we can’t use wildcards in REST!Reading about enabling multiple handle prefixes in DSpace
There is a mailing list thread from 2011 about it: http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html
And a comment from Atmire’s Bram about it on the DSpace wiki: https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296
Bram mentions an undocumented configuration option
handle.plugin.checknameauthority
, but I noticed another one indspace.cfg
:# List any additional prefixes that need to be managed by this handle server # (as for examle handle prefix coming from old dspace repository merged in # that repository) # handle.additional.prefixes = prefix1[, prefix2]
Because of this I noticed that our Handle server’s
config.dct
was potentially misconfigured!We had some default values still present:
"300:0.NA/YOUR_NAMING_AUTHORITY"
I’ve changed them to the following and restarted the handle server:
"300:0.NA/10568"
In looking at all the configs I just noticed that we are not providing a DOI in the Google-specific metadata crosswalk
From
dspace/config/crosswalks/google-metadata.properties
:google.citation_doi = cg.identifier.doi
This works, and makes DSpace output the following metadata on the item view page:
<meta content="https://dx.doi.org/10.1186/s13059-017-1153-y" name="citation_doi">
Submitted and merged pull request for this: https://github.com/ilri/DSpace/pull/305
Submit pull request to set the author separator for XMLUI item lists to a semicolon instead of “,”: https://github.com/ilri/DSpace/pull/306
I want to show it briefly to Abenet and Peter to get feedback
2017-03-06
- Someone on the mailing list said that
handle.plugin.checknameauthority
should be false if we’re using multiple handle prefixes
2017-03-07
- I set up a top-level community as a test for the CGIAR Library and imported one item with the the 10947 handle prefix
- When testing the Handle resolver locally it shows the item to be on the local repository
- So this seems to work, with the following caveats:
- New items will have the default handle
- Communities and collections will have the default handle
- Only items imported manually can have the other handles
- I need to talk to Michael and Peter to share the news, and discuss the structure of their community(s) and try some actual test data
- We’ll need to do some data cleaning to make sure they are using the same fields we are, like
dc.type
andcg.identifier.status
- Another thing is that the import process creates new
dc.date.accessioned
anddc.date.available
fields, so we end up with duplicates (is it important to preserve the originals for these?) - Report DS-3520 issue to Atmire
2017-03-08
- Merge the author separator changes to
5_x-prod
, as everyone has responded positively about it, and it’s the default in Mirage2 afterall! - Cherry pick the
commons-collections
patch from DSpace’sdspace-5_x
branch to address DS-3520: https://jira.duraspace.org/browse/DS-3520
2017-03-09
Export list of sponsors so Peter can clean it up:
dspace=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship') group by text_value order by count desc) to /tmp/sponsorship.csv with csv; COPY 285
2017-03-12
Test the sponsorship fixes and deletes from Peter:
$ ./fix-metadata-values.py -i Investors-Fix-51.csv -f dc.description.sponsorship -t Action -m 29 -d dspace -u dspace -p fuuuu $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.sponsorship -m 29 -d dspace -u dspace -p fuuu
Generate a new list of unique sponsors so we can update the controlled vocabulary:
dspace=# \copy (select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship')) to /tmp/sponsorship.csv with csv;
Pull request for controlled vocabulary if Peter approves: https://github.com/ilri/DSpace/pull/308
Review Sisay’s roots, tubers, and bananas (RTB) theme, which still needs some fixes to work properly: https://github.com/ilri/DSpace/pull/307
Created an issue to track the progress on the Livestock CRP theme: https://github.com/ilri/DSpace/issues/309
Created a basic theme for the Livestock CRP community
2017-03-15
- Merge pull request for controlled vocabulary updates for sponsor: https://github.com/ilri/DSpace/pull/308
- Merge pull request for Livestock CRP theme: https://github.com/ilri/DSpace/issues/309
- Create pull request for PABRA subjects (waiting for confirmation from Abenet before merging): https://github.com/ilri/DSpace/pull/310
- Create pull request for CCAFS Phase II migrations (waiting for confirmation from CCAFS people): https://github.com/ilri/DSpace/pull/311
- I also need to ask if either of these new fields need to be added to Discovery facets, search, and Atmire modules
- Run all system updates on DSpace Test and re-deploy CGSpace
2017-03-16
- Merge pull request for PABRA subjects: https://github.com/ilri/DSpace/pull/310
- Abenet and Peter say we can add them to Discovery, Atmire modules, etc, but I might not have time to do it now
- Help Sisay with RTB theme again
- Remove ICARDA subject from Discovery sidebar facets: https://github.com/ilri/DSpace/pull/312
- Remove ICARDA subject from Browse and item submission form: https://github.com/ilri/DSpace/pull/313
- Merge the CCAFS Phase II changes but hold off on doing the flagship metadata updates until Macaroni Bros gets their importer updated
- Deploy latest changes and investor fixes/deletions on CGSpace
- Run system updates on CGSpace and reboot server
2017-03-20
- Create basic XMLUI theme for PABRA community: https://github.com/ilri/DSpace/pull/315
2017-03-24
- Still helping Sisay try to figure out how to create a theme for the RTB community
2017-03-28
CCAFS said they are ready for the flagship updates for Phase II to be run (
cg.subject.ccafs
), so I ran them on CGSpace:$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
We’ve been waiting since February to run these
Also, I generated a list of all CCAFS flagships because there are a dozen or so more than there should be:
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=210 group by text_value order by count desc) to /tmp/ccafs.csv with csv;
I sent a list to CCAFS people so they can tell me if some should be deleted or moved, etc
Test, squash, and merge Sisay’s RTB theme into
5_x-prod
: https://github.com/ilri/DSpace/pull/316
2017-03-29
Dump a list of fields in the DC and CG schemas to compare with CG Core:
dspace=# select case when metadata_schema_id=1 then 'dc' else 'cg' end as schema, element, qualifier, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
Ooh, a better one!
dspace=# select coalesce(case when metadata_schema_id=1 then 'dc.' else 'cg.' end) || concat_ws('.', element, qualifier) as field, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
2017-03-30
- Adjust the Linode CPU usage alerts for the CGSpace server from 150% to 200%, as generally the nightly Solr indexing causes a usage around 150–190%, so this should make the alerts less regular
- Adjust the threshold for DSpace Test from 90 to 100%