<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="CGSpace Notes" />
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2022-06-18T20:39:37+03:00" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="CGSpace Notes" />
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta name="generator" content="Hugo 0.100.2" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Blog",
"headline": "CGSpace Notes",
"url" : "https://alanorth.github.io/cgspace-notes/",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2022-06-06T09:01:36+03:00",
"keywords": "notes, migration, notes",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/">
<title>CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel="stylesheet" integrity="sha256-vrgBLtwIuhC+AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz+lcnA=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
<link rel="alternate" type="application/rss+xml" href="https://alanorth.github.io/cgspace-notes/index.xml" title="CGSpace Notes" />
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link active" href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-06/">June, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-06-04T19:49:54-07:00">Mon Jun 04, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-06-04">2018-06-04</h2>
<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul>
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn’t build</li>
</ul>
</li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<pre tabindex="0"><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s
user 8m5.056s
sys 2m7.289s
</code></pre>
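<p>To put the indexing run above in perspective, here is a small Python sketch that turns the <code>time</code> output into a throughput figure. The item count is the approximate “~70,000” from the note, so the result is only a ballpark. The CPU share is only a rough hint. A low value suggests the timed command spent most of the run waiting (presumably on Solr or the database) rather than computing, although the <code>nice</code>/<code>ionice</code> wrapping may also play a part.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import re


def parse_duration(text):
    """Convert a bash `time` duration such as "74m42.646s" to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text.strip())
    if match is None:
        raise ValueError("unexpected duration: " + text)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def summarize(items, real, user, system):
    """Throughput and CPU share for an indexing run, from its `time` output."""
    wall = parse_duration(real)
    cpu = parse_duration(user) + parse_duration(system)
    return {
        "wall_seconds": wall,
        "items_per_second": items / wall,
        # Fraction of wall-clock time the timed command spent on the CPU.
        "cpu_share": cpu / wall,
    }


if __name__ == "__main__":
    # ~70,000 is the approximate item count given in the note.
    stats = summarize(70_000, "74m42.646s", "8m5.056s", "2m7.289s")
    print("{wall_seconds:.0f} s wall, {items_per_second:.1f} items/s, {cpu_share:.0%} CPU".format(**stats))
</code></pre>
<p>With the numbers above this prints <code>4483 s wall, 15.6 items/s, 14% CPU</code>.</p>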
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-05/">May, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-05-01T16:43:54+03:00">Tue May 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul>
</li>
<li>Then I reduced the JVM heap size from 6144m back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, and reworked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> so that each host can choose which Java distribution it uses</li>
</ul>
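<p>The two <code>stream.body</code> URLs above are URL-encoded Solr XML update commands. As a hedged sketch (not necessarily how the URLs were originally produced), they can be generated with the Python standard library instead of being encoded by hand. The base URL is copied from the notes, and the script only prints the URLs without sending anything. ElementTree serializes the empty element as <code>&lt;commit /&gt;</code>, so the second URL contains an encoded space (<code>%3Ccommit%20/%3E</code>). That is equivalent XML to the hand-written <code>&lt;commit/&gt;</code>.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

# Base URL copied from the notes above (DSpace Test's Solr on localhost:3000).
SOLR_UPDATE = "http://localhost:3000/solr/statistics/update"


def update_url(element):
    """Serialize an XML update command and embed it as a stream.body parameter."""
    body = ET.tostring(element, encoding="unicode")
    # Keep *, : and / unescaped so the result matches the hand-written URLs.
    return SOLR_UPDATE + "?" + urlencode({"stream.body": body}, safe="*:/", quote_via=quote)


# Delete every document in the core: a delete-by-query for *:*.
delete_all = ET.Element("delete")
ET.SubElement(delete_all, "query").text = "*:*"

# Commit, so that the deletion becomes visible to searches.
commit = ET.Element("commit")

if __name__ == "__main__":
    print(update_url(delete_all))
    print(update_url(commit))
</code></pre>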
<a href='https://alanorth.github.io/cgspace-notes/2018-05/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-04/">April, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-04-01T16:13:54+02:00">Sun Apr 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-04-01">2018-04-01</h2>
<ul>
<li>I tried to test something on DSpace Test but noticed that it’s been down since god knows when</li>
<li>Catalina logs show at least some memory errors from yesterday:</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-03/">March, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-03-02T16:07:54+02:00">Fri Mar 02, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-03-02">2018-03-02</h2>
<ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-03/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-02/">February, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-02-01T16:28:54+02:00">Thu Feb 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don’t need to distinguish between internal and external works, so that makes it just a simple list</li>
<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
<li>I copied the logic of the <code>jmx_tomcat_dbpools</code> plugin provided by Ubuntu’s <code>munin-plugins-java</code> package and applied what I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li>
</ul>
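<p>For reference, a munin plugin is just an executable that prints its graph definition when called with <code>config</code> and prints the current values otherwise. The Python sketch below shows that shape for a DSpace sessions graph. It is not the <code>jmx_tomcat_dbpools</code> code. The graph and field names are made up for illustration, and the actual JMX query is deliberately left out: <code>active_sessions</code> is a hypothetical placeholder passed in by the caller.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import sys


def munin_output(args, active_sessions):
    """Return the lines a munin plugin prints for "config" or for a normal fetch.

    Hypothetical: active_sessions would really come from Tomcat over JMX.
    """
    if args[:1] == ["config"]:
        return [
            "graph_title DSpace sessions",
            "graph_vlabel sessions",
            "graph_category tomcat",
            "sessions.label active sessions",
        ]
    return ["sessions.value {}".format(active_sessions)]


if __name__ == "__main__":
    # Placeholder value; a real plugin would query JMX here.
    print("\n".join(munin_output(sys.argv[1:], active_sessions=0)))
</code></pre>
<p>Run with <code>config</code>, the sketch prints the four graph lines. Run without arguments, it prints <code>sessions.value 0</code>.</p>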
<a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-01/">January, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-01-02T08:35:54-08:00">Tue Jan 02, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
<li>And just before that I see this:</li>
</ul>
<pre tabindex="0"><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li>I need to increase that; let’s try bumping it up from 50 to 75</li>
<li>After that, one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
<li>I notice this error quite a few times in dspace.log:</li>
</ul>
<pre tabindex="0"><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
</code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
</ul>
<pre tabindex="0"><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
dspace.log.2017-11-21:4
dspace.log.2017-11-22:1
dspace.log.2017-11-23:4
dspace.log.2017-11-24:11
dspace.log.2017-11-25:0
dspace.log.2017-11-26:1
dspace.log.2017-11-27:7
dspace.log.2017-11-28:21
dspace.log.2017-11-29:31
dspace.log.2017-11-30:15
dspace.log.2017-12-01:15
dspace.log.2017-12-02:20
dspace.log.2017-12-03:38
dspace.log.2017-12-04:65
dspace.log.2017-12-05:43
dspace.log.2017-12-06:72
dspace.log.2017-12-07:27
dspace.log.2017-12-08:15
dspace.log.2017-12-09:29
dspace.log.2017-12-10:35
dspace.log.2017-12-11:20
dspace.log.2017-12-12:44
dspace.log.2017-12-13:36
dspace.log.2017-12-14:59
dspace.log.2017-12-15:104
dspace.log.2017-12-16:53
dspace.log.2017-12-17:66
dspace.log.2017-12-18:83
dspace.log.2017-12-19:101
dspace.log.2017-12-20:74
dspace.log.2017-12-21:55
dspace.log.2017-12-22:66
dspace.log.2017-12-23:50
dspace.log.2017-12-24:85
dspace.log.2017-12-25:62
dspace.log.2017-12-26:49
dspace.log.2017-12-27:30
dspace.log.2017-12-28:54
dspace.log.2017-12-29:68
dspace.log.2017-12-30:89
dspace.log.2017-12-31:53
dspace.log.2018-01-01:45
dspace.log.2018-01-02:34
</code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
</ul>
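<p>Both kinds of log evidence above can be checked mechanically. The following Python sketch (standard library only; the function names are hypothetical helpers, not part of DSpace) pulls the counters out of a <code>PoolExhaustedException</code> line and totals the per-file <code>grep -c</code> output by month, which makes the growth from November to December easier to see. The <code>lastwait</code> value appears to be in milliseconds, matching the “5 seconds” in the message.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import re
from collections import Counter

# Counters printed by the Tomcat JDBC pool, e.g. "[size:50; busy:50; idle:0; lastwait:5000]".
POOL_STATS = re.compile(r"\[size:(\d+); busy:(\d+); idle:(\d+); lastwait:(\d+)\]")

# One line of multi-file "grep -c" output, e.g. "dspace.log.2017-12-15:104".
GREP_COUNT = re.compile(r"(\d{4}-\d{2})-\d{2}:(\d+)$")


def pool_stats(line):
    """Return the pool counters from a PoolExhaustedException line, or None."""
    match = POOL_STATS.search(line)
    if match is None:
        return None
    size, busy, idle, lastwait = map(int, match.groups())
    return {
        "size": size,
        "busy": busy,
        "idle": idle,
        "lastwait": lastwait,
        # Exhausted: every connection is checked out and none is idle.
        "exhausted": busy == size and idle == 0,
    }


def counts_per_month(grep_output):
    """Sum "file:count" lines from grep -c by YYYY-MM, skipping anything else."""
    totals = Counter()
    for line in grep_output.splitlines():
        match = GREP_COUNT.search(line.strip())
        if match:
            month, count = match.groups()
            totals[month] += int(count)
    return totals


if __name__ == "__main__":
    print(pool_stats("none available[size:50; busy:50; idle:0; lastwait:5000]."))
    print(counts_per_month("dspace.log.2017-11-30:15\ndspace.log.2017-12-01:15\ndspace.log.2017-12-02:20"))
</code></pre>
<p>Piping the real <code>grep -c</code> output into <code>counts_per_month</code> gives one total per month instead of one line per day.</p>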
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-12/">December, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-12-01T13:53:54+03:00">Fri Dec 01, 2017</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2017-12-01">2017-12-01</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say “Timeout waiting for idle object”</li>
<li>PostgreSQL activity says there are 115 connections currently</li>
<li>The list of connections to XMLUI and REST API for today:</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-12/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-11/">November, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-11-02T09:37:54+02:00">Thu Nov 02, 2017</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2017-11-01">2017-11-01</h2>
<ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul>
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
<pre tabindex="0"><code># grep -c "CORE" /var/log/nginx/access.log
0
</code></pre><ul>
<li>Generate a list of authors on CGSpace for Peter to go through and correct:</li>
</ul>
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701
</code></pre>
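<p>Because the export is a headerless two-column CSV (author, count), a first pass over it can be scripted before Peter goes through it by hand. This is only a sketch with hypothetical helper names. It loads the file and groups spellings that differ only in letter case or whitespace, which is one obvious kind of candidate for correction. It assumes the file was written in UTF-8.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">import csv
import sys
from collections import defaultdict


def load_authors(csv_file):
    """Parse the (author, count) rows from the headerless CSV exported above."""
    return [(name, int(count)) for name, count in csv.reader(csv_file)]


def case_and_space_variants(rows):
    """Group author strings that differ only in letter case or whitespace."""
    groups = defaultdict(list)
    for name, count in rows:
        key = " ".join(name.split()).casefold()
        groups[key].append((name, count))
    # Keep only the keys that have more than one distinct spelling.
    return {key: spellings for key, spellings in groups.items() if len(spellings) != 1}


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 author_variants.py /tmp/authors.csv
    for path in sys.argv[1:]:
        # Assumption: the CSV was written in UTF-8.
        with open(path, newline="", encoding="utf-8") as csv_file:
            for spellings in case_and_space_variants(load_authors(csv_file)).values():
                print(spellings)
</code></pre>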
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-10/">October, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-10-01T08:07:54+03:00">Sun Oct 01, 2017</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<pre tabindex="0"><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre><ul>
<li>There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
</ul>
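<p>Before deciding which handle to keep, it may help to list the affected items first. The following minimal Python sketch assumes the values come from a DSpace CSV export, where multiple values in one field are joined with <code>||</code> as in the example above. The <code>(item_id, value)</code> input shape and the item IDs are hypothetical. It only identifies the items and does not choose between handles.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">def split_handles(value):
    """Split a multi-value field on "||" and drop empty parts."""
    return [part.strip() for part in value.split("||") if part.strip()]


def items_with_multiple_handles(items):
    """Yield (item_id, handles) for every item whose field holds more than one handle."""
    for item_id, value in items:
        handles = split_handles(value)
        if handles[1:]:  # more than one handle
            yield item_id, handles


if __name__ == "__main__":
    # Hypothetical item IDs; the handle values are the example from the note.
    sample = [
        ("item-a", "http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336"),
        ("item-b", "http://hdl.handle.net/10568/78495"),
    ]
    for item_id, handles in items_with_multiple_handles(sample):
        print(item_id, handles)
</code></pre>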
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/cgiar-library-migration/">CGIAR Library Migration</a></h2>
<p class="blog-post-meta"><time datetime="2017-09-18T16:38:35+03:00">Mon Sep 18, 2017</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
<span class="fas fa-tag" aria-hidden="true"></span> <a href="/cgspace-notes/tags/migration/" rel="tag">Migration</a>
</p>
</header>
<p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p>
<a href='https://alanorth.github.io/cgspace-notes/cgiar-library-migration/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/page/5/" rel="prev" role="button">Previous page</a>
<a class="btn btn-outline-primary" href="/cgspace-notes/page/7/" rel="next" role="button">Next page</a>
</nav>
</div><!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2022-06/">June, 2022</a></li>
<li><a href="/cgspace-notes/2022-05/">May, 2022</a></li>
<li><a href="/cgspace-notes/2022-04/">April, 2022</a></li>
<li><a href="/cgspace-notes/2022-03/">March, 2022</a></li>
<li><a href="/cgspace-notes/2022-02/">February, 2022</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div><!-- /.row -->
</div><!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>