<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "utf-8" >
< meta http-equiv = "X-UA-Compatible" content = "IE=edge" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
< meta name = "description" content = "" >
< meta name = "author" content = "Alan Orth" >
<!-- OpenGraph Metadata: http://ogp.me/ -->
< meta property = "og:title" content = "November, 2016" >
< meta property = "og:description" content = "" >
< meta property = "og:type" content = "article" >
< meta property = "article:published_time" content = "2016-11-01T09:21:00+03:00" >
< meta property = "article:author" content = "Alan Orth" >
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/2016-11/" >
<!-- Metadata for Twitter: https://dev.twitter.com/cards/markup -->
< meta property = "twitter:card" content = "summary" >
< meta property = "twitter:title" content = "November, 2016" >
< meta property = "twitter:description" content = "" >
< meta name = "generator" content = "Hugo 0.17" / >
< base href = "https://alanorth.github.io/cgspace-notes/" >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/2016-11/" >
< title > November, 2016 | CGSpace Notes< / title >
<!-- combined, minified CSS -->
< link href = "https://alanorth.github.io/cgspace-notes/css/style.css" rel = "stylesheet" >
<!-- RSS 2.0 feed -->
< link href = "https://alanorth.github.io/cgspace-notes/index.xml" type = "application/rss+xml" rel = "alternate" >
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link " href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
< header class = "blog-header" >
< div class = "container" >
< h1 class = "blog-title" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< / div >
< / header >
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" > < a href = "https://alanorth.github.io/cgspace-notes/2016-11/" > November, 2016< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2016-11-01T09:21:00+03:00" > Tue Nov 01, 2016< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2016-11-01" > 2016-11-01< / h2 >
< ul >
< li > Add < code > dc.type< / code > to the output options for Atmire’s Listings and Reports module (< a href = "https://github.com/ilri/DSpace/pull/286" > #286< / a >)< / li >
< / ul >
< p > < img src = "2016/11/listings-and-reports.png" alt = "Listings and Reports with output type" / > < / p >
< h2 id = "2016-11-02" > 2016-11-02< / h2 >
< ul >
< li > Migrate DSpace Test to DSpace 5.5 (< a href = "https://gist.github.com/alanorth/61013895c6efe7095d7f81000953d1cf" > notes< / a > )< / li >
< li > Run all updates on DSpace Test and reboot the server< / li >
< li > Looks like the OAI bug from DSpace 5.1 that caused validation at Base Search to fail is now fixed and DSpace Test passes validation! (< a href = "https://github.com/ilri/DSpace/issues/63" > #63< / a > )< / li >
< li > Indexing Discovery on DSpace Test took 332 minutes, which is roughly four times as long as it usually takes< / li >
< li > It appeared to finish correctly, but lots of errors were logged right after it finished:< / li >
< / ul >
< pre > < code > 2016-11-02 15:09:48,578 INFO com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76454 to Index
2016-11-02 15:09:48,584 INFO com.atmire.dspace.discovery.AtmireSolrService @ Wrote Community: 10568/3202 to Index
2016-11-02 15:09:48,589 INFO com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76455 to Index
2016-11-02 15:09:48,590 INFO com.atmire.dspace.discovery.AtmireSolrService @ Wrote Community: 10568/51693 to Index
2016-11-02 15:09:48,590 INFO org.dspace.discovery.IndexClient @ Done with indexing
2016-11-02 15:09:48,600 INFO com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76456 to Index
2016-11-02 15:09:48,613 INFO org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/55536 to Index
2016-11-02 15:09:48,616 INFO com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76457 to Index
2016-11-02 15:09:48,634 ERROR com.atmire.dspace.discovery.AtmireSolrService @
java.lang.NullPointerException
at org.dspace.discovery.SearchUtils.getDiscoveryConfiguration(SourceFile:57)
at org.dspace.discovery.SolrServiceImpl.buildDocument(SolrServiceImpl.java:824)
at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:821)
at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:898)
at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
at org.dspace.storage.rdbms.DatabaseUtils$ReindexerThread.run(DatabaseUtils.java:945)
< / code > < / pre >
< ul >
< li > DSpace is still up, and a few minutes later I see the default DSpace indexer is still running< / li >
< li > Sure enough, looking back before the first one finished, I see output from both indexers interleaved in the log:< / li >
< / ul >
< pre > < code > 2016-11-02 15:09:28,545 INFO org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/47242 to Index
2016-11-02 15:09:28,633 INFO org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/60785 to Index
2016-11-02 15:09:28,678 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (55695 of 55722): 43557
2016-11-02 15:09:28,688 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (55703 of 55722): 34476
< / code > < / pre >
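< p > As a rough aid for spotting this kind of overlap, the sketch below scans log lines and reports the seconds in which both indexer classes wrote output. It is only an illustration: the regular expression assumes the line layout of the excerpts above, and the function name < code > interleaved_seconds< / code > is made up for this example.< / p >

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpts above:
# "YYYY-MM-DD HH:MM:SS,mmm LEVEL logger.class @ message"
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+[A-Z]+\s+(\S+)\s+@"
)

# The two indexer classes that appear in the log excerpts.
INDEXERS = (
    "org.dspace.discovery.SolrServiceImpl",
    "com.atmire.dspace.discovery.AtmireSolrService",
)


def interleaved_seconds(lines):
    """Return sorted timestamps (to the second) at which both indexers logged."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2) in INDEXERS:
            seen[match.group(1)].add(match.group(2))
    return sorted(ts for ts, loggers in seen.items() if len(loggers) == len(INDEXERS))


# Two lines copied from the excerpt above.
sample = [
    "2016-11-02 15:09:28,545 INFO org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/47242 to Index",
    "2016-11-02 15:09:28,678 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (55695 of 55722): 43557",
]
print(interleaved_seconds(sample))
```

< p > Running it over the whole log file (for example with < code > open(path)< / code > as the line source) would show how long the two indexers were running side by side.< / p >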
< ul >
< li > I will raise a ticket with Atmire to ask them about this< / li >
< / ul >
< h2 id = "2016-11-06" > 2016-11-06< / h2 >
< ul >
< li > After re-deploying and re-indexing I didn’t see the same issue, and the indexing completed in 85 minutes, which is about how long it is supposed to take< / li >
< / ul >
< h2 id = "2016-11-07" > 2016-11-07< / h2 >
< ul >
< li > Horrible one-liner to get the Linode IDs from certain Ansible host vars:< / li >
< / ul >
< pre > < code > $ grep -A 3 contact_info * | grep -E " (Orth|Sisay|Peter|Daniel|Tsega)" | awk -F'-' '{print $1}' | grep linode | uniq | xargs grep linode_id
< / code > < / pre >
< ul >
< li > I noticed some weird CRPs in the database, and they don’t show up in Discovery for some reason, perhaps because of the < code > :< / code > character< / li >
< li > I’ll export these and fix them in batch:< / li >
< / ul >
< pre > < code > dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=230 group by text_value order by count desc) to /tmp/crp.csv with csv;
COPY 22
< / code > < / pre >
< ul >
< li > Test running the replacements:< / li >
< / ul >
< pre > < code > $ ./fix-metadata-values.py -i /tmp/CRPs.csv -f cg.contributor.crp -t correct -m 230 -d dspace -u dspace -p 'fuuu'
< / code > < / pre >
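< p > The general idea of a CSV-driven replacement can be sketched as below. This is not the < code > fix-metadata-values.py< / code > script itself, only a minimal illustration under some assumptions: the CSV is assumed to have one column of existing values and one of corrections (named after the < code > -f< / code > and < code > -t< / code > arguments above), the helper < code > build_updates< / code > is hypothetical, and the < code > %s< / code > placeholders follow the style used by PostgreSQL drivers such as psycopg2. The sample rows are invented.< / p >

```python
import csv
import io

# Parameterized statement; values are passed separately so the driver quotes them.
UPDATE_SQL = (
    "UPDATE metadatavalue SET text_value = %s "
    "WHERE metadata_field_id = %s AND text_value = %s"
)


def build_updates(csv_file, from_column, to_column, field_id):
    """Yield (sql, params) pairs for rows whose correction differs from the original."""
    for row in csv.DictReader(csv_file):
        old = row[from_column]
        new = row[to_column].strip()
        # Skip rows with no correction or with an identical value.
        if new and new != old:
            yield UPDATE_SQL, (new, field_id, old)


# Hypothetical data for illustration only.
sample = io.StringIO(
    "cg.contributor.crp,correct\n"
    "Old CRP: Name,Old CRP Name\n"
    "Unchanged CRP,Unchanged CRP\n"
)
for sql, params in build_updates(sample, "cg.contributor.crp", "correct", 230):
    print(sql, params)
```

< p > Each pair could then be passed to a database cursor's < code > execute< / code > method inside a transaction, so that a test run can be rolled back.< / p >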
< ul >
< li > Add < code > AMR< / code > to ILRI subjects and remove one duplicate instance of IITA in the author affiliations controlled vocabulary (< a href = "https://github.com/ilri/DSpace/pull/288" > #288< / a >)< / li >
< / ul >
< h2 id = "2016-11-08" > 2016-11-08< / h2 >
< ul >
< li > Atmire’s Listings and Reports module seems to be broken on DSpace 5.5< / li >
< / ul >
< p > < img src = "2016/11/listings-and-reports-55.png" alt = "Listings and Reports broken in DSpace 5.5" / > < / p >
< ul >
< li > I’ve filed a ticket with Atmire< / li >
< li > Thinking about batch updates for ORCIDs and authors< / li >
< li > Playing with < a href = "https://github.com/moonlitesolutions/SolrClient" > SolrClient< / a > in Python to query Solr< / li >
< li > All records in the authority core are either < code > authority_type:orcid< / code > or < code > authority_type:person< / code > < / li >
< li > There is a < code > deleted< / code > field and all records seem to have it set to < code > false< / code >, but it might be an important sanity check to remember< / li >
< li > The way to go is probably to have a CSV of author names and authority IDs, then to batch update them in PostgreSQL< / li >
< li > Dump of the top ~200 authors in CGSpace:< / li >
< / ul >
< pre > < code > dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=3 group by text_value order by count desc limit 210) to /tmp/210-authors.csv with csv;
< / code > < / pre >
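< p > The observations about the authority core noted above (the < code > authority_type< / code > values and the < code > deleted< / code > field) could be rechecked with a Solr facet query. The sketch below uses only the standard library rather than SolrClient. The base URL is a placeholder, the helper names are made up, and the response fragment is hand-written with invented counts to show Solr's default flat facet layout in JSON.< / p >

```python
from urllib.parse import urlencode

# Assumption: placeholder base URL; "authority" is the core named in the notes.
SOLR_BASE = "http://localhost:8081/solr"


def facet_url(core, fields, base=SOLR_BASE):
    """Build a select URL that returns no documents, only facet counts."""
    params = [("q", "*:*"), ("rows", "0"), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", field) for field in fields]
    return f"{base}/{core}/select?{urlencode(params)}"


def facet_counts(response, field):
    """Convert Solr's flat [value, count, value, count, ...] facet list to a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


print(facet_url("authority", ["authority_type", "deleted"]))

# Hand-written response fragment; the counts are made up for illustration.
example = {"facet_counts": {"facet_fields": {"deleted": ["false", 10, "true", 0]}}}
print(facet_counts(example, "deleted"))
```

< p > If the count for < code > true< / code > under < code > deleted< / code > ever became non-zero, those records would need to be excluded before matching author names to authority IDs.< / p >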
< / article >
< / div > <!-- /.blog-main -->
< aside class = "col-sm-3 offset-sm-1 blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/2016-11/" > November, 2016< / a > < / li >
< li > < a href = "/cgspace-notes/2016-10/" > October, 2016< / a > < / li >
< li > < a href = "/cgspace-notes/2016-09/" > September, 2016< / a > < / li >
< li > < a href = "/cgspace-notes/2016-08/" > August, 2016< / a > < / li >
< li > < a href = "/cgspace-notes/2016-07/" > July, 2016< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >