<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
< meta property = "og:title" content = "June, 2019" / >
< meta property = "og:description" content = "2019-06-02
Merge the Solr filterCache and XMLUI ISI journal changes to the 5_x-prod branch and deploy on CGSpace
Run system updates on CGSpace (linode18) and reboot it
2019-06-03
Skype with Marie-Angélique and Abenet about CG Core v2
" />
< meta property = "og:type" content = "article" / >
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/2019-06/" / >
< meta property = "article:published_time" content = "2019-06-02T10:57:51+03:00" / >
< meta property = "article:modified_time" content = "2019-10-28T13:39:25+02:00" / >
< meta name = "twitter:card" content = "summary" / >
< meta name = "twitter:title" content = "June, 2019" / >
< meta name = "twitter:description" content = "2019-06-02
Merge the Solr filterCache and XMLUI ISI journal changes to the 5_x-prod branch and deploy on CGSpace
Run system updates on CGSpace (linode18) and reboot it
2019-06-03
Skype with Marie-Angélique and Abenet about CG Core v2
"/>
< meta name = "generator" content = "Hugo 0.74.3" / >
< script type = "application/ld+json" >
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "June, 2019",
"url": "https://alanorth.github.io/cgspace-notes/2019-06/",
"wordCount": "1057",
"datePublished": "2019-06-02T10:57:51+03:00",
"dateModified": "2019-10-28T13:39:25+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
< / script >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/2019-06/" >
< title > June, 2019 | CGSpace Notes< / title >
<!-- combined, minified CSS -->
< link href = "https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel = "stylesheet" integrity = "sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin = "anonymous" >
<!-- minified Font Awesome for SVG icons -->
< script defer src = "https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity = "sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin = "anonymous" > < / script >
<!-- RSS 2.0 feed -->
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link " href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
< header class = "blog-header" >
< div class = "container" >
< h1 class = "blog-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< p class = "lead blog-description" dir = "auto" > Documenting day-to-day work on the < a href = "https://cgspace.cgiar.org" > CGSpace< / a > repository.< / p >
< / div >
< / header >
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2019-06/" > June, 2019< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2019-06-02T10:57:51+03:00" > Sun Jun 02, 2019< / time > by Alan Orth in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/cgspace-notes/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2019-06-02" > 2019-06-02< / h2 >
< ul >
< li > Merge the < a href = "https://github.com/ilri/DSpace/pull/425" > Solr filterCache< / a > and < a href = "https://github.com/ilri/DSpace/pull/426" > XMLUI ISI journal< / a > changes to the < code > 5_x-prod< / code > branch and deploy on CGSpace< / li >
< li > Run system updates on CGSpace (linode18) and reboot it< / li >
< / ul >
< h2 id = "2019-06-03" > 2019-06-03< / h2 >
< ul >
< li > Skype with Marie-Angélique and Abenet about < a href = "https://agriculturalsemantics.github.io/cg-core/cgcore.html" > CG Core v2< / a > < / li >
< / ul >
< ul >
< li > Here is a list of proposed metadata migrations for CGSpace
< ul >
< li > dc.language.iso→DCTERMS.language (and switch to ISO 639-2 Alpha 3)< / li >
< li > dc.description.abstract→DCTERMS.abstract< / li >
< li > dc.identifier.citation→DCTERMS.bibliographicCitation< / li >
< li > dc.contributor.author→DCTERMS.creator (for people)< / li >
< li > dc.description.sponsorship→cg.contributor.donor (values from CrossRef or Grid.ac if possible)< / li >
< li > dc.rights→DCTERMS.license< / li >
< li > cg.identifier.status→DCTERMS.accessRights (values “open” or “restricted”)< / li >
< li > cg.creator.id→cg.creator.identifier?< / li >
< li > dc.relation.ispartofseries→DCTERMS.isPartOf< / li >
< li > cg.link.relation→DCTERMS.relation< / li >
< / ul >
< / li >
< li > Marie agreed that we need to adopt some controlled lists for our values, and pointed out that the MARLO team maintains a list of CRPs and Centers at < a href = "https://clarisa.cgiar.org/" > CLARISA< / a >
< ul >
< li > There is an API there but it needs a password for access… < / li >
< / ul >
< / li >
< / ul >
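< p > The proposed migrations above are essentially a field-to-field lookup. A minimal Python sketch, assuming item metadata is held as a plain dict keyed by field name. The < code > migrate_fields< / code > helper is hypothetical and not part of DSpace, and the sketch only renames fields. It does not convert values, for example to ISO 639-2 language codes.< / p >

```python
# Proposed CG Core v2 migrations from the list above (old field -> new field).
# cg.creator.id is left out because its target was still an open question.
PROPOSED_MIGRATIONS = {
    "dc.language.iso": "DCTERMS.language",
    "dc.description.abstract": "DCTERMS.abstract",
    "dc.identifier.citation": "DCTERMS.bibliographicCitation",
    "dc.contributor.author": "DCTERMS.creator",  # for people only
    "dc.description.sponsorship": "cg.contributor.donor",
    "dc.rights": "DCTERMS.license",
    "cg.identifier.status": "DCTERMS.accessRights",
    "dc.relation.ispartofseries": "DCTERMS.isPartOf",
    "cg.link.relation": "DCTERMS.relation",
}


def migrate_fields(item):
    """Return a copy of item with migrated field names (hypothetical helper).

    Fields that are not in PROPOSED_MIGRATIONS are kept unchanged.
    """
    return {PROPOSED_MIGRATIONS.get(field, field): value for field, value in item.items()}


if __name__ == "__main__":
    # Illustrative item, not real CGSpace data
    print(migrate_fields({"dc.title": "Example", "dc.rights": "CC-BY-4.0"}))
```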
< h2 id = "2019-06-04" > 2019-06-04< / h2 >
< ul >
< li > The MARLO team responded and said they will give us access to the CLARISA API< / li >
< li > Marie-Angélique < a href = "https://github.com/AgriculturalSemantics/cg-core/pull/1" > proposed< / a > to integrate < code > dcterms.isPartOf< / code > , < code > dcterms.abstract< / code > , and < code > dcterms.bibliographicCitation< / code > into the CG Core v2 schema
< ul >
< li > I told her I would attempt to integrate those and the others above into DSpace Test soon and report back< / li >
< li > We also need to discuss with the ILRI Data Portal, MEL/MELSpace, and users who consume the CGSpace API< / li >
< / ul >
< / li >
< li > Add Arabic language to input-forms.xml (< a href = "https://github.com/ilri/DSpace/pull/427" > #427< / a > ), as Bioversity is adding some Arabic items and noticed it missing< / li >
< / ul >
< h2 id = "2019-06-05" > 2019-06-05< / h2 >
< ul >
< li > Send mail to CGSpace and MELSpace people to let them know about the proposed metadata field migrations after the discussion with Marie-Angélique< / li >
< / ul >
< h2 id = "2019-06-07" > 2019-06-07< / h2 >
< ul >
< li > Thierry noticed that the CUA statistics were missing previous years again, and I see that the Solr admin UI has the following message:< / li >
< / ul >
< pre > < code > statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
< / code > < / pre > < ul >
< li > I had to restart Tomcat a few times before all the stats cores loaded without errors< / li >
< / ul >
< h2 id = "2019-06-10" > 2019-06-10< / h2 >
< ul >
< li > Rename the AReS repository on GitHub to OpenRXV: < a href = "https://github.com/ilri/OpenRXV" > https://github.com/ilri/OpenRXV< / a > < / li >
< li > Create a new AReS repository: < a href = "https://github.com/ilri/AReS" > https://github.com/ilri/AReS< / a > < / li >
< li > Start looking at the 203 IITA records on DSpace Test from last month (< a href = "https://dspacetest.cgiar.org/handle/10568/102032" > IITA_May_16< / a > aka “20194th.xls”) using OpenRefine
< ul >
< li > Trim leading, trailing, and consecutive whitespace on all columns, but I didn’t notice very many issues< / li >
< li > Validate affiliations against latest list of top 1500 terms using reconcile-csv, correcting and standardizing about twenty-seven< / li >
< li > Validate countries against latest list of countries using reconcile-csv, correcting three< / li >
< li > Convert all DOIs to “< a href = "https://dx.doi.org" > https://dx.doi.org< / a >” format< / li >
< li > Normalize all < code > cg.identifier.url< / code > Google book fields to “books.google.com”< / li >
< li > Correct some inconsistencies in IITA subjects< / li >
< li > Correct two incorrect “Peer Review” values in < code > dc.description.version< / code > < / li >
< li > About fifteen items have incorrect ISBNs (looks like an Excel error because the values are in scientific notation)< / li >
< li > Delete one blank item< / li >
< li > I managed to get to subjects, so I’ll continue from there when I start working next< / li >
< / ul >
< / li >
< li > Generate a new list of countries from the database for use with reconcile-csv
< ul >
< li > After dumping, use csvcut to add line numbers, then change the CSV header to match the column names you pass to reconcile-csv, for example < code > id< / code > and < code > name< / code > :< / li >
< / ul >
< / li >
< / ul >
< pre > < code > dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 228 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC) to /tmp/countries.csv WITH CSV HEADER
COPY 192
$ csvcut -l -c 0 /tmp/countries.csv > 2019-06-10-countries.csv
< / code > < / pre > < ul >
< li > Get a list of all the unique AGROVOC subject terms in IITA’s data and export it to a text file so I can validate them with my < code > agrovoc-lookup.py< / code > script:< / li >
< / ul >
< pre > < code > $ csvcut -c dc.subject ~/Downloads/2019-06-10-IITA-20194th-Round-2.csv| sed 's/||/\n/g' | grep -v dc.subject | sort -u > iita-agrovoc.txt
$ ./agrovoc-lookup.py -i iita-agrovoc.txt -om iita-agrovoc-matches.txt -or iita-agrovoc-rejects.txt
$ wc -l iita-agrovoc*
402 iita-agrovoc-matches.txt
29 iita-agrovoc-rejects.txt
431 iita-agrovoc.txt
< / code > < / pre > < ul >
< li > Combine these IITA matches with the subjects I matched a few months ago:< / li >
< / ul >
< pre > < code > $ csvcut -c name 2019-03-18-subjects-matched.csv | grep -v name | cat - iita-agrovoc-matches.txt | sort -u > 2019-06-10-subjects-matched.txt
< / code > < / pre > < ul >
< li > Then make a new list to use with reconcile-csv by adding line numbers with csvcut and changing the line number header to < code > id< / code > :< / li >
< / ul >
< pre > < code > $ csvcut -c name -l 2019-06-10-subjects-matched.txt | sed 's/line_number/id/' > 2019-06-10-subjects-matched.csv
< / code > < / pre > < h2 id = "2019-06-20" > 2019-06-20< / h2 >
< ul >
< li > Share some feedback about AReS v2 with colleagues and encourage them to do the same< / li >
< / ul >
< h2 id = "2019-06-23" > 2019-06-23< / h2 >
< ul >
< li > Continue reviewing the CG Core v2 standard and its implications for CGSpace and DSpace platforms in general
< ul >
< li > Update my < a href = "https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409" > list of fields to migrate< / a > < / li >
< li > Submit an < a href = "https://github.com/AgriculturalSemantics/cg-core/issues/2" > issue with my feedback to the CG Core project< / a > < / li >
< / ul >
< / li >
< li > Update my local PostgreSQL container:< / li >
< / ul >
< pre > < code > $ podman pull docker.io/library/postgres:9.6-alpine
$ podman rm dspacedb
$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
< / code > < / pre > < h2 id = "2019-06-25" > 2019-06-25< / h2 >
< ul >
< li > Normalize < code > text_lang< / code > values for metadata on DSpace Test and CGSpace:< / li >
< / ul >
< pre > < code > dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', '');
UPDATE 1551
dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL;
UPDATE 2070
dspace=# UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('es', 'spa');
UPDATE 2
< / code > < / pre > < ul >
< li > Upload 202 IITA records from earlier this month (20194th.xls) to CGSpace< / li >
< li > Communicate with Bioversity contractor in charge of their migration from Typo3 to CGSpace< / li >
< / ul >
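< p > The < code > text_lang< / code > normalization rules from the SQL above can be sketched in Python for use when cleaning a CSV before upload. This is only an illustration: < code > normalize_text_lang< / code > is a hypothetical helper, and it ignores the < code > resource_type_id< / code > and < code > metadata_field_id< / code > filters that the SQL statements apply.< / p >

```python
from typing import Optional

# Values rewritten to en_US by the first two UPDATE statements above
EN_VARIANTS = {"ethnob", "en", "*", "E.", ""}
# Values rewritten to es_ES by the third UPDATE statement above
ES_VARIANTS = {"es", "spa"}


def normalize_text_lang(text_lang: Optional[str]) -> Optional[str]:
    """Map messy text_lang values to en_US or es_ES (hypothetical helper).

    None stands in for SQL NULL. Anything not covered by the rules is
    returned unchanged.
    """
    if text_lang is None or text_lang in EN_VARIANTS:
        return "en_US"
    if text_lang in ES_VARIANTS:
        return "es_ES"
    return text_lang


if __name__ == "__main__":
    for value in ["en", None, "spa", "fr"]:
        print(repr(value), "->", normalize_text_lang(value))
```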
< h2 id = "2019-06-28" > 2019-06-28< / h2 >
< ul >
< li > Start looking at the fifty-seven AfricaRice records sent by Ibnou earlier this month
< ul >
< li > First, I see there are several items with type “Book” and “Book Chapter” that should go in an “AfricaRice books and book chapters” collection, but none exists in the AfricaRice community< / li >
< li > Trim and collapse consecutive whitespace on author, affiliation, authorship types, title, subjects, doi, issn, source, citation, country, sponsors< / li >
< li > Standardize and correct affiliations like “Africa Rice Cente” and “Africa Rice Centre”, including syntax errors with multi-value separators< / li >
< li > Lots of variation in affiliations, for example:
< ul >
< li > Université Abomey-Calavi< / li >
< li > Université d’Abomey< / li >
< li > Université d’Abomey Calavi< / li >
< li > Université d’Abomey-Calavi< / li >
< li > University of Abomey-Calavi< / li >
< / ul >
< / li >
< li > Validate and normalize affiliations against our 2019-04 list using reconcile-csv and OpenRefine:
< ul >
< li > < code > $ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id< / code > < / li >
< li > I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: < code > if(cell.recon.matched, cell.recon.match.name, value)< / code > < / li >
< / ul >
< / li >
< li > Replace smart quotes with standard ASCII ones< / li >
< li > Fix typos in authorship types< / li >
< li > Validate and normalize subjects against our 2019-06 list using reconcile-csv and OpenRefine:
< ul >
< li > < code > $ lein run ~/src/git/DSpace/2019-06-10-subjects-matched.csv name id< / code > < / li >
< li > Also add about 30 new AGROVOC subjects to our list that I verified manually< / li >
< / ul >
< / li >
< li > There is one duplicate, both have the same DOI: < a href = "https://doi.org/10.1016/j.agwat.2018.06.018" > https://doi.org/10.1016/j.agwat.2018.06.018< / a > < / li >
< li > Fix four ISBNs that were in the ISSN field< / li >
< / ul >
< / li >
< / ul >
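< p > I did the whitespace and smart quote cleanup above in OpenRefine, but the same steps can be sketched in Python. This is a rough equivalent and not what OpenRefine runs internally. The helper names are hypothetical, and the < code > ||< / code > multi-value separator is an assumption based on the separator used in the CSV exports earlier in these notes.< / p >

```python
import re

# Map curly quotes to their plain ASCII equivalents
SMART_QUOTES = str.maketrans({
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / apostrophe
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
})


def clean_value(value: str) -> str:
    """Replace smart quotes, collapse runs of whitespace, and trim the ends."""
    value = value.translate(SMART_QUOTES)
    return re.sub(r"\s+", " ", value).strip()


def clean_multi_value(value: str, separator: str = "||") -> str:
    """Clean each part of a multi-value cell and drop empty parts.

    The "||" default is an assumption about how values are separated.
    """
    parts = (clean_value(part) for part in value.split(separator))
    return separator.join(part for part in parts if part)


if __name__ == "__main__":
    print(clean_value("  Universit\u00e9  d\u2019Abomey-Calavi "))
    print(clean_multi_value("Africa Rice Center || ||University of Abomey-Calavi "))
```

< p > Spelling variants such as “Africa Rice Cente” still need reconciliation against a controlled list. This sketch only handles the mechanical cleanup.< / p >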
< h2 id = "2019-06-30" > 2019-06-30< / h2 >
< ul >
< li > Upload fifty-seven AfricaRice records to < a href = "https://dspacetest.cgiar.org/handle/10568/102274" > DSpace Test< / a >
< ul >
< li > I created the SAF bundle with SAFBuilder and then imported it via the CLI:< / li >
< / ul >
< / li >
< / ul >
< pre > < code > $ dspace import -a -e me@cgiar.org -m 2019-06-30-AfricaRice-11to73.map -s /tmp/2019-06-30-AfricaRice-11to73
< / code > < / pre > < ul >
< li > I sent feedback about a few missing PDFs and one duplicate to Ibnou to check< / li >
< li > Run all system updates on DSpace Test (linode19) and reboot it< / li >
< / ul >
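< p > Duplicates like the one mentioned above can be caught before upload by grouping records on a normalized DOI. A minimal sketch, assuming each record is a dict with hypothetical < code > id< / code > and < code > doi< / code > keys. It is not the check I actually ran, and it only strips the common resolver prefixes.< / p >

```python
from collections import defaultdict

# Resolver prefixes to strip so the same DOI compares equal in any form
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a known resolver prefix, if present."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def find_duplicate_dois(records):
    """Return {normalized DOI: [record ids]} for DOIs seen more than once.

    Records without a DOI are skipped. The "id" and "doi" keys are
    assumptions made for this sketch.
    """
    seen = defaultdict(list)
    for record in records:
        doi = record.get("doi")
        if doi:
            seen[normalize_doi(doi)].append(record["id"])
    return {doi: ids for doi, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    # Illustrative records; only the DOI comes from the notes above
    records = [
        {"id": 1, "doi": "https://doi.org/10.1016/j.agwat.2018.06.018"},
        {"id": 2, "doi": ""},
        {"id": 3, "doi": "https://dx.doi.org/10.1016/J.AGWAT.2018.06.018 "},
    ]
    print(find_duplicate_dois(records))
```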
< / article >
< / div > <!-- /.blog-main -->
< aside class = "col-sm-3 ml-auto blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/2020-07/" > August, 2020< / a > < / li >
< li > < a href = "/cgspace-notes/2020-07/" > July, 2020< / a > < / li >
< li > < a href = "/cgspace-notes/2020-06/" > June, 2020< / a > < / li >
< li > < a href = "/cgspace-notes/2020-05/" > May, 2020< / a > < / li >
< li > < a href = "/cgspace-notes/2020-04/" > April, 2020< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p dir = "auto" >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >