<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
< meta property = "og:title" content = "Posts" / >
< meta property = "og:description" content = "Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." / >
< meta property = "og:type" content = "website" / >
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/posts/" / >
< meta property = "og:updated_time" content = "2019-12-01T11:22:30+02:00" / >
< meta name = "twitter:card" content = "summary" / >
< meta name = "twitter:title" content = "Posts" / >
< meta name = "twitter:description" content = "Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." / >
< meta name = "generator" content = "Hugo 0.60.0" / >
< script type = "application/ld+json" >
{
"@context": "http://schema.org",
"@type": "Blog",
"headline": "CGSpace Notes",
"url" : "https:\/\/alanorth.github.io\/cgspace-notes\/posts\/",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2019-12-01T11:22:30+02:00",
"keywords": "notes,migration,notes,",
"description": "Documenting day-to-day work on the [CGSpace](https:\/\/cgspace.cgiar.org) repository."
}
< / script >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/posts/" >
< title > CGSpace Notes< / title >
<!-- combined, minified CSS -->
< link href = "https://alanorth.github.io/cgspace-notes/css/style.css" rel = "stylesheet" integrity = "sha384-G5B34w7DFTumWTswxYzTX7NWfbvQEg1HbFFEg6ItN03uTAAoS2qkPS/fu3LhuuSA" crossorigin = "anonymous" >
<!-- RSS 2.0 feed -->
< link rel = "alternate" type = "application/rss+xml" href = "https://alanorth.github.io/cgspace-notes/posts/index.xml" title = "CGSpace Notes" / >
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link " href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
< header class = "blog-header" >
< div class = "container" >
< h1 class = "blog-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< p class = "lead blog-description" dir = "auto" > Documenting day-to-day work on the < a href = "https://cgspace.cgiar.org" > CGSpace< / a > repository.< / p >
< / div >
< / header >
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2016-10/" > October, 2016< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2016-10-03T15:53:00+03:00" > Mon Oct 03, 2016< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "20161003" > 2016-10-03< / h2 >
< ul >
< li > Testing adding < a href = "https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing" > ORCIDs to a CSV< / a > file for a single item to see if the author orders get messed up< / li >
< li > Need to test the following scenarios to see how author order is affected:
< ul >
< li > ORCIDs only< / li >
< li > ORCIDs plus normal authors< / li >
< / ul >
< / li >
< li > I exported a random item's metadata as CSV, deleted < em > all columns< / em > except id and collection, and made a new column called < code > ORCID:dc.contributor.author< / code > with the following random ORCIDs from the ORCID registry:< / li >
< / ul >
< pre > < code > 0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
< / code > < / pre >
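< p > A minimal sketch of how such a test file could be assembled, assuming Python's standard < code > csv< / code > module. The item id and collection handle are hypothetical placeholders, not values from the real export; only the column name, the ORCIDs, and the < code > ||< / code > separator come from the notes above.< / p >

```python
import csv
import io

# ORCIDs quoted in the notes above
ORCIDS = ["0000-0002-6115-0956", "0000-0002-3812-8793", "0000-0001-7462-405X"]


def build_orcid_csv(item_id, collection, orcids):
    """Return CSV text with id, collection, and a ||-joined ORCID column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "collection", "ORCID:dc.contributor.author"])
    # Multiple values in one cell are joined with ||, as in the sample above
    writer.writerow([item_id, collection, "||".join(orcids)])
    return buffer.getvalue()


# Hypothetical item id and collection handle, for illustration only
csv_text = build_orcid_csv("12345", "10568/00000", ORCIDS)
print(csv_text)
```

< p > Keeping the list order in < code > ORCIDS< / code > fixed makes it easier to compare the author order after import against the order in the file.< / p >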
< a href = 'https://alanorth.github.io/cgspace-notes/2016-10/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2016-09/" > September, 2016< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2016-09-01T15:53:00+03:00" > Thu Sep 01, 2016< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "20160901" > 2016-09-01< / h2 >
< ul >
< li > Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors< / li >
< li > Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace< / li >
< li > We had been using < code > DC=ILRI< / code > to determine whether a user was ILRI or not< / li >
< li > It looks like we might be able to use OUs now, instead of DCs:< / li >
< / ul >
< pre > < code > $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
< / code > < / pre >
< a href = 'https://alanorth.github.io/cgspace-notes/2016-09/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2016-08/" > August, 2016< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2016-08-01T15:53:00+03:00" > Mon Aug 01, 2016< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "20160801" > 2016-08-01< / h2 >
< ul >
< li > Add updated distribution license from Sisay (< a href = "https://github.com/ilri/DSpace/issues/259" > #259< / a > )< / li >
< li > Play with upgrading Mirage 2 dependencies in < code > bower.json< / code > because most are several versions out of date< / li >
< li > Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more< / li >
< li > bower stuff is a dead end, waste of time, too many issues< / li >
< li > Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of < code > fonts< / code > )< / li >
< li > Start working on DSpace 5.1 → 5.5 port:< / li >
< / ul >
< pre > < code > $ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5
< / code > < / pre >
< a href = 'https://alanorth.github.io/cgspace-notes/2016-08/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2016-07/" > July, 2016< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2016-07-01T10:53:00+03:00" > Fri Jul 01, 2016< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "20160701" > 2016-07-01< / h2 >
< ul >
< li > Add < code > dc.description.sponsorship< / code > to Discovery sidebar facets and make investors clickable in item view (< a href = "https://github.com/ilri/DSpace/issues/232" > #232< / a > )< / li >
< li > I think this query should find and replace all authors that have “,” at the end of their names:< / li >
< / ul >
< pre > < code > dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
------------
(0 rows)
< / code > < / pre > < ul >
< li > In this case the select query was showing 95 results before the update< / li >
< / ul >
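< p > The same pattern can be tried outside the database before running an update. This is a rough sketch in which Python's < code > re< / code > module stands in for PostgreSQL's regex engine (for this particular pattern the two should behave alike); the author names are hypothetical, not rows from < code > metadatavalue< / code > .< / p >

```python
import re

# Same pattern as the SQL above: lazily capture everything before a final comma
TRAILING_COMMA = re.compile(r"(^.+?),$")


def strip_trailing_comma(name):
    """Drop one trailing comma; names without one are returned unchanged."""
    return TRAILING_COMMA.sub(r"\1", name)


# Hypothetical author names, for illustration only
samples = ["Orth, A.,", "Orth, A.", "Smith,"]
cleaned = [strip_trailing_comma(name) for name in samples]
print(cleaned)
```

< p > Commas inside a name are left alone because the pattern is anchored to the end of the value.< / p >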
< a href = 'https://alanorth.github.io/cgspace-notes/2016-07/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2016-06/" > June, 2016< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2016-06-01T10:53:00+03:00" > Wed Jun 01, 2016< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "20160601" > 2016-06-01< / h2 >
< ul >
< li > Experimenting with IFPRI OAI (we want to harvest their publications)< / li >
< li > After reading the < a href = "https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html" > ContentDM documentation< / a > I found IFPRI's OAI endpoint: < a href = "http://ebrary.ifpri.org/oai/oai.php" > http://ebrary.ifpri.org/oai/oai.php< / a > < / li >
< li > After reading the < a href = "https://www.openarchives.org/OAI/openarchivesprotocol.html" > OAI documentation< / a > and testing with an < a href = "http://validator.oaipmh.com/" > OAI validator< / a > I found out how to get their publications< / li >
< li > This is their publications set: < a href = "http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc" > http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc< / a > < / li >
< li > You can see the others by using the OAI < code > ListSets< / code > verb: < a href = "http://ebrary.ifpri.org/oai/oai.php?verb=ListSets" > http://ebrary.ifpri.org/oai/oai.php?verb=ListSets< / a > < / li >
< li > Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in < code > dc.identifier.fund< / code > to < code > cg.identifier.cpwfproject< / code > and then the rest to < code > dc.description.sponsorship< / code > < / li >
< / ul >
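< p > The OAI request URLs above are an endpoint plus a handful of query parameters, so they can be assembled programmatically. A minimal sketch, assuming Python's standard < code > urllib.parse< / code > ; the function name is made up for illustration, and the endpoint, set name, and date are the ones from the notes above. It only builds the URL and does not contact the server.< / p >

```python
from urllib.parse import urlencode

# Endpoint as given in the notes above
OAI_ENDPOINT = "http://ebrary.ifpri.org/oai/oai.php"


def list_records_url(from_date, set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for one set."""
    params = {
        "verb": "ListRecords",
        "from": from_date,
        "set": set_spec,
        "metadataPrefix": metadata_prefix,
    }
    return OAI_ENDPOINT + "?" + urlencode(params)


url = list_records_url("2016-01-01", "p15738coll2")
print(url)
```

< p > Other sets reported by < code > ListSets< / code > could be harvested by passing a different < code > set_spec< / code > .< / p >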
< a href = 'https://alanorth.github.io/cgspace-notes/2016-06/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2016-05/" > May, 2016< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2016-05-01T23:06:00+03:00" > Sun May 01, 2016< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "20160501" > 2016-05-01< / h2 >
< ul >
< li > Since yesterday there have been 10,000 REST errors and the site has been unstable again< / li >
< li > I have blocked access to the API now< / li >
< li > There are 3,000 IPs accessing the REST API in a 24-hour period!< / li >
< / ul >
< pre > < code > # awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168
< / code > < / pre >
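< p > One caveat about the pipeline above: < code > uniq< / code > without a preceding < code > sort< / code > only collapses adjacent repeats, so the figure may be higher than the number of distinct addresses. A small sketch of a count that does not depend on line order, assuming the client address is the first whitespace-separated field; the log lines below are hypothetical, not taken from < code > rest.log< / code > .< / p >

```python
def count_distinct_clients(log_lines):
    """Count distinct values in the first whitespace-separated field."""
    clients = set()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            clients.add(fields[0])
    return len(clients)


# Hypothetical log lines; the real input would be /var/log/nginx/rest.log
sample = [
    "192.0.2.1 - - GET /rest/items",
    "192.0.2.2 - - GET /rest/items",
    "192.0.2.1 - - GET /rest/collections",
]
print(count_distinct_clients(sample))
```

< p > On this sample, adjacent-only deduplication would report three lines, because the repeated address is separated by a different one.< / p >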
< a href = 'https://alanorth.github.io/cgspace-notes/2016-05/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2016-04/" > April, 2016< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2016-04-04T11:06:00+03:00" > Mon Apr 04, 2016< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "20160404" > 2016-04-04< / h2 >
< ul >
< li > Looking at log file use on CGSpace, I noticed that we need to work on our cron setup a bit< / li >
< li > We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc< / li >
< li > After running DSpace for over five years I've never needed to look in any other log file than dspace.log, let alone one from last year!< / li >
< li > This will save us a few gigs of backup space we're paying for on S3< / li >
< li > Also, I noticed the < code > checker< / code > log has some errors we should pay attention to:< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2016-04/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2016-03/" > March, 2016< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2016-03-02T16:50:00+03:00" > Wed Mar 02, 2016< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "20160302" > 2016-03-02< / h2 >
< ul >
< li > Looking at issues with author authorities on CGSpace< / li >
< li > For some reason we still have the < code > index-lucene-update< / code > cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module< / li >
< li > Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2016-03/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2016-02/" > February, 2016< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2016-02-05T13:18:00+03:00" > Fri Feb 05, 2016< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "20160205" > 2016-02-05< / h2 >
< ul >
< li > Looking at some DAGRIS data for Abenet Yabowork< / li >
< li > Lots of issues with spaces, newlines, etc causing the import to fail< / li >
< li > I noticed we have a very < em > interesting< / em > list of countries on CGSpace:< / li >
< / ul >
< p > < img src = "/cgspace-notes/2016/02/cgspace-countries.png" alt = "CGSpace country list" > < / p >
< ul >
< li > Not only are there 49,000 countries, we have some blanks (25)… < / li >
< li > Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2016-02/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2016-01/" > January, 2016< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2016-01-13T13:18:00+03:00" > Wed Jan 13, 2016< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "20160113" > 2016-01-13< / h2 >
< ul >
< li > Move ILRI collection < code > 10568/12503< / code > from < code > 10568/27869< / code > to < code > 10568/27629< / code > using the < a href = "https://gist.github.com/alanorth/392c4660e8b022d99dfa" > move_collections.sh< / a > script I wrote last year.< / li >
< li > I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.< / li >
< li > Update GitHub wiki for documentation of < a href = "https://github.com/ilri/DSpace/wiki/Maintenance-Tasks" > maintenance tasks< / a > .< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2016-01/' > Read more →< / a >
< / article >
< nav class = "blog-pagination" >
< a class = "btn btn-outline-primary" href = "/cgspace-notes/posts/page/4/" rel = "prev" role = "button" > Previous page< / a >
< a class = "btn btn-outline-primary" href = "/cgspace-notes/posts/page/6/" rel = "next" role = "button" > Next page< / a >
< / nav >
< / div > <!-- /.blog-main -->
< aside class = "col-sm-3 ml-auto blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/2019-12/" > December, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-11/" > November, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/cgspace-cgcorev2-migration/" > CGSpace CG Core v2 Migration< / a > < / li >
< li > < a href = "/cgspace-notes/2019-10/" > October, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-09/" > September, 2019< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p dir = "auto" >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >