<meta property="og:description" content="2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it’s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire’s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace."/>
<meta name="twitter:description" content="2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it’s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire’s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace."/>
<li>Megan says some mapped items have still not appeared since last week, so I forced a full <code>index-discovery -b</code></li>
<li>Need to remember to check tomorrow whether the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test): <a href="https://cgspace.cgiar.org/handle/10568/80731">https://cgspace.cgiar.org/handle/10568/80731</a></li>
<li>Create ticket on Atmire tracker to ask about commissioning them to develop the feature to expose ORCID via REST/OAI: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510</a></li>
<li>According to the <a href="https://wiki.duraspace.org/display/DSDOC5x/Curation+System">DSpace curation docs</a>, the <code>requiredmetadata</code> curation task stops by design when it finds a missing metadata field</li>
<li>When ingesting some collections I was getting <code>java.lang.OutOfMemoryError: GC overhead limit exceeded</code>, which can be avoided by disabling the GC overhead limit check with <code>-XX:-UseGCOverheadLimit</code></li>
<li>Other times I was getting an error about heap space, so I kept bumping the heap allocation by 512MB each time it crashed (up to 4096m!)</li>
<li>These crashes leave tens of thousands of abandoned files in the assetstore, which need to be cleaned up using <code>dspace cleanup -v</code>, or else you’ll run out of disk space</li>
<li>In the end I realized it’s better to use submission mode (<code>-s</code>) to ingest the community object as a single AIP without its children, followed by each of the collections:</li>
<li>Note that in submission mode DSpace ignores the handle specified in <code>mets.xml</code> in the zip file, so you need to turn that off with <code>-o ignoreHandle=false</code></li>
<li>Give feedback to CIFOR about their data quality:
<ul>
<li>Suggestion: uppercase dc.subject, cg.coverage.region, and cg.coverage.subregion in your crosswalk so they match CGSpace and can therefore be faceted and reported on more easily</li>
<li>Suggestion: use CGSpace’s CRP names (cg.contributor.crp), see: dspace/config/input-forms.xml</li>
<li>Suggestion: clean up duplicates and errors in funders, perhaps use a controlled vocabulary like ours, see: dspace/config/controlled-vocabularies/dc-description-sponsorship.xml</li>
<li>Suggestion: use dc.type “Blog Post” instead of “Blog” for your blog post items (we are also adding a “Blog Post” type to CGSpace soon)</li>
<li>Question: many of your items use dc.document.uri AND cg.identifier.url with the same text value?</li>
</ul></li>
<li>Help Marianne from WLE with an Open Search query to show the latest WLE CRP outputs: <a href="https://cgspace.cgiar.org/open-search/discover?query=crpsubject:WATER%2C+LAND+AND+ECOSYSTEMS&sort_by=2&order=DESC">https://cgspace.cgiar.org/open-search/discover?query=crpsubject:WATER%2C+LAND+AND+ECOSYSTEMS&sort_by=2&order=DESC</a></li>
<li>This uses the webui’s item list sort options, see <code>webui.itemlist.sort-option</code> in <code>dspace.cfg</code></li>
<li>The equivalent Discovery search would be: <a href="https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&filter_relational_operator_1=equals&filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&submit_apply_filter=&query=&rpp=10&sort_by=dc.date.issued_dt&order=desc">https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&filter_relational_operator_1=equals&filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&submit_apply_filter=&query=&rpp=10&sort_by=dc.date.issued_dt&order=desc</a></li>
<li>I ended up running into issues during data cleaning and decided to wipe out the entire community and re-sync the DSpace Test assetstore and database from CGSpace rather than waiting for the cleanup task to finish</li>
<li>Hours into the re-ingestion I ran into more errors, and had to erase everything and start over <em>again</em>!</li>
<li>Now, no matter what I do I keep getting foreign key errors…</li>
<li>Atmire says they are willing to extend the ORCID implementation, and I’ve asked them to provide a quote</li>
<li>I clarified that the scope of the implementation should be that ORCIDs are stored in the database and exposed via REST / API like other fields</li>
<li>After quite a bit of troubleshooting with importing cleaned up data as CSV, it seems that there are actually <a href="https://en.wikipedia.org/wiki/Null_character">NUL</a> characters in the <code>dc.description.abstract</code> field (at least) on the lines where CSV importing was failing</li>
<li>I tried to find a way to remove the characters in vim or OpenRefine, but decided it was quicker to just remove the column temporarily and import it</li>
<li>The import was successful and detected 2022 changes, which are likely the remaining ones that were failing to import before</li>
<li>To delete the blank lines that cause issues during import we need to use a regex in vim <code>g/^$/d</code></li>
<li>After that I started looking in the <code>dc.subject</code> field to try to pull countries and regions out, but there are too many values in there</li>
<li>Bump the Academicons dependency of the Mirage 2 themes from 1.6.0 to 1.8.0 because the upstream deleted the old tag and now the build is failing: <a href="https://github.com/ilri/DSpace/pull/321">#321</a></li>
<li>Merge changes to CCAFS project identifiers and flagships: <a href="https://github.com/ilri/DSpace/pull/320">#320</a></li>
<li>GENDER AND SOCIAL DIFFERENTIATION → GENDER AND SOCIAL INCLUSION</li>
<li>MANAGING CLIMATE RISK → CLIMATE SERVICES AND SAFETY NETS</li>
</ul></li>
<li><p>Re-deploy CGSpace and DSpace Test and run system updates</p></li>
<li><p>Reboot DSpace Test</p></li>
<li><p>Fix cron jobs for log management on DSpace Test: they weren’t matching <code>dspace.log.*</code> files correctly, so more than six months of logs had accumulated and were taking up many gigabytes of disk space</p></li>
<li>Start creating a basic theme for the CGIAR System Organization’s community on CGSpace</li>
<li>Using colors from the <a href="http://library.cgiar.org/handle/10947/2699">CGIAR Branding guidelines (2014)</a></li>
<li>Make a GitHub issue to track this work: <a href="https://github.com/ilri/DSpace/issues/324">#324</a></li>
</ul>
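<p>For the log management fix above, the core of the job is picking out which rotated <code>dspace.log.*</code> files are old enough to remove. Here is a minimal Python sketch of that selection, assuming the rotated files are named <code>dspace.log.YYYY-MM-DD</code> so that sorting by name also sorts by date. The function name <code>logs_to_prune</code> and the retention count are hypothetical, and this is not the cron job that actually runs on DSpace Test:</p>
<pre><code>import fnmatch


def logs_to_prune(names, pattern="dspace.log.*", keep=30):
    """Return the matching log names that fall outside the newest `keep` files.

    Assumption: names end in an ISO date (dspace.log.YYYY-MM-DD), so sorting
    them as strings also sorts them chronologically.
    """
    matching = sorted(n for n in names if fnmatch.fnmatch(n, pattern))
    # Everything before the last `keep` entries is a candidate for removal.
    return matching[:max(len(matching) - keep, 0)]


names = [
    "dspace.log.2017-05-19",
    "dspace.log.2017-05-20",
    "dspace.log.2017-05-21",
    "dspace.log",           # the live log has no date suffix, so it is not matched
    "solr.log.2017-05-01",  # a different log family is ignored
]
print(logs_to_prune(names, keep=2))
</code></pre>
<p>Under the naming assumption above, a pattern such as <code>*.log</code> would miss these files because the date suffix follows the <code>.log</code> extension. Printing the selected names before deleting or compressing anything is an easy way to check the pattern.</p>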
<h2 id="2017-05-22">2017-05-22</h2>
<ul>
<li>Do some cleanups of community and collection names in the CGIAR System Management Office community on DSpace Test, as well as move some items as Peter requested</li>
<li>Peter wanted a list of authors in here, so I generated a list of collections using the “View Source” on each community and this hacky awk:</li>
<li>Then I joined them together and ran this old SQL query from the dspace-tech mailing list which gives you authors for items in those collections:</li>
</ul>
<pre><code>dspace=# select distinct text_value
from metadatavalue
where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author')
AND resource_type_id = 2
AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10947/2', '10947/3', '10947/1
</code></pre>
<ul>
<li>Add Affiliation to filters on Listing and Reports module (<a href="https://github.com/ilri/DSpace/pull/325">#325</a>)</li>
<li>Start looking at WLE’s Phase II metadata updates but it seems they are not tagging their items properly, as their website importer infers which theme to use based on the name of the CGSpace collection!</li>
<li>For now I’ve suggested that they just change the collection names and that we fix their metadata manually afterwards</li>
<li>Also, they have a lot of messed up values in their <code>cg.subject.wle</code> field so I will clean up some of those first:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id=119) to /tmp/wle.csv with csv;
</code></pre>
<ul>
<li>Respond to Atmire message about ORCIDs, saying that right now we’d prefer to just have them available via REST API like any other metadata field, and that I’m available for a Skype call</li>