<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

<meta property="og:title" content="July, 2016" />
<meta property="og:description" content="2016-07-01
Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
I think this query should find and replace all authors that have “,” at the end of their names:
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
------------
(0 rows)
In this case the select query was showing 95 results before the update
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-07/" />
<meta property="og:updated_time" content="2016-07-01T10:53:00+03:00" />

<meta itemprop="name" content="July, 2016">
<meta itemprop="description" content="2016-07-01
Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
I think this query should find and replace all authors that have “,” at the end of their names:
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
------------
(0 rows)
In this case the select query was showing 95 results before the update
">
<meta itemprop="dateModified" content="2016-07-01T10:53:00+03:00" />
<meta itemprop="wordCount" content="866">
<meta itemprop="keywords" content="notes," />

<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="July, 2016" />
<meta name="twitter:description" content="2016-07-01
Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
I think this query should find and replace all authors that have “,” at the end of their names:
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
------------
(0 rows)
In this case the select query was showing 95 results before the update
"/>

<meta name="generator" content="Hugo 0.18.1" />

<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2016-07/">
<title>July, 2016 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-qRVpIj9hSzsBhmO8Y7YEKF2UFra2sJQtl9V/uFKKDvy+Wjh9zgTku6VRgT8YdPoD" crossorigin="anonymous">
</head>
<body>

<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>

<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
</div>
</header>
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2016-07/">July, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-07-01T10:53:00+03:00">Fri Jul 01, 2016</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find all authors whose names end with “,” and strip that trailing comma:</li>
</ul>
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
------------
(0 rows)
</code></pre>
<ul>
<li>In this case the select query was showing 95 results before the update, matching the <code>UPDATE 95</code> count</li>
</ul>
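<p>To sanity-check a pattern like this before running it against the database, here is a small Python sketch that applies the same regular expression with the <code>re</code> module. It assumes that for this simple pattern (a lazy <code>.+?</code> followed by a literal comma anchored at the end) Python behaves like PostgreSQL's <code>regexp_replace()</code> and <code>~</code> operator; the helper names and sample author values are made up for illustration.</p>

```python
import re

# Same pattern as the SQL: capture everything before one trailing comma.
TRAILING_COMMA = re.compile(r"(^.+?),$")


def strip_trailing_comma(name):
    """Drop a single trailing comma, like regexp_replace(..., '\\1')."""
    return TRAILING_COMMA.sub(r"\1", name)


def has_trailing_comma(name):
    """Mirror of the SQL filter: text_value ~ '^.+?,$'."""
    return re.search(r"^.+?,$", name) is not None


if __name__ == "__main__":
    # Hypothetical author values, not taken from CGSpace.
    authors = ["Orth, Alan,", "Grace, D.", "Smith, J.,"]
    matches = [a for a in authors if has_trailing_comma(a)]
    print(len(matches))  # how many rows the UPDATE would touch
    print([strip_trailing_comma(a) for a in authors])
```

<p>Note that only one comma is removed per run: a value ending in <code>,,</code> keeps one comma, because the lazy group still has to reach the final comma at the end of the string.</p>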
<h2 id="2016-07-02">2016-07-02</h2>
<ul>
<li>Comment on DSpace Jira ticket about author lookup search text (<a href="https://jira.duraspace.org/browse/DS-2329">DS-2329</a>)</li>
</ul>
<h2 id="2016-07-04">2016-07-04</h2>
<ul>
<li>Seems the database’s author authority values mean nothing without the <code>authority</code> Solr core from the host where they were created!</li>
</ul>
<h2 id="2016-07-05">2016-07-05</h2>
<ul>
<li>Amend <code>backup-solr.sh</code> script so it backs up the entire Solr folder</li>
<li>We <em>really</em> only need <code>statistics</code> and <code>authority</code> but meh</li>
<li>Fix metadata for species on DSpace Test:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 94 -d dspacetest -u dspacetest -p 'fuuu'
</code></pre>
<ul>
<li>Will run later on CGSpace</li>
<li>A user is still having problems with SHERPA/RoMEO causing crashes during the submission process when the journal is “ungraded”</li>
<li>I tested the <a href="https://jira.duraspace.org/browse/DS-2740">patch for DS-2740</a> that I had found last month and it seems to work</li>
<li>I will merge it to <code>5_x-prod</code></li>
</ul>
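<p>The internals of <code>fix-metadata-values.py</code> aren't shown here, so this is only a guess at the kind of logic a CSV-driven fix performs, assuming the column named by <code>-f</code> holds the current (wrong) value, the column named by <code>-t</code> holds the correction, and <code>-m</code> is the <code>metadata_field_id</code>. The function <code>build_fixes</code>, the SQL text, and the psycopg2-style <code>%s</code> placeholders are all hypothetical; it only builds parameterized statements and does not touch a database.</p>

```python
import csv
import io

# Hypothetical UPDATE; an assumption, not the script's actual SQL.
UPDATE_SQL = (
    "update metadatavalue set text_value=%s "
    "where resource_type_id=2 and metadata_field_id=%s and text_value=%s"
)


def build_fixes(csv_file, from_column, to_column, field_id):
    """Yield (sql, params) pairs for rows whose value needs correcting."""
    for row in csv.DictReader(csv_file):
        # Keep the wrong value verbatim: stray whitespace may be the error.
        wrong = row[from_column]
        correct = row[to_column].strip()
        # Skip rows with no correction or where nothing would change.
        if not correct or correct == wrong:
            continue
        yield UPDATE_SQL, (correct, field_id, wrong)


if __name__ == "__main__":
    # Made-up sample using the same column names as the command above.
    sample = io.StringIO(
        "dc.Species,CORRECT\n"
        "Bos indicus ,Bos indicus\n"
        "bos taurus,Bos taurus\n"
        "Ovis aries,\n"
    )
    for sql, params in build_fixes(sample, "dc.Species", "CORRECT", 94):
        print(params)
```

<p>Passing values as parameters rather than formatting them into the SQL string keeps apostrophes in names (common in author and affiliation data) from breaking the statement.</p>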
<h2 id="2016-07-06">2016-07-06</h2>
<ul>
<li>Delete 23 blank metadata values from CGSpace:</li>
</ul>
<pre><code>cgspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 23
</code></pre>
<ul>
<li>Complete phase three of the metadata migration for the following fields:
<ul>
<li>dc.title.jtitle → dc.source</li>
<li>dc.crsubject.crpsubject → cg.contributor.crp</li>
<li>dc.contributor.affiliation → cg.contributor.affiliation</li>
<li>dc.Species → cg.species</li>
<li>dc.srplace.subregion → cg.coverage.subregion</li>
<li>dc.contributor.corporate → dc.contributor.author</li>
<li>dc.identifier.url → cg.identifier.url</li>
<li>dc.identifier.doi → cg.identifier.doi</li>
<li>dc.identifier.googleurl → cg.identifier.googleurl</li>
<li>dc.identifier.dataurl → cg.identifier.dataurl</li>
</ul></li>
<li>Also, run fixes and deletes for species and author affiliations (over 1000 corrections!)</li>
</ul>
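<p>A field move like the ones listed above amounts to changing <code>metadata_field_id</code> on existing values. As a sketch, assuming the DSpace 5 registry tables <code>metadatafieldregistry</code> (element, qualifier, schema id) and <code>metadataschemaregistry</code> (<code>short_id</code>), the mapping can be kept as data and turned into one <code>UPDATE</code> per pair. The helper names are hypothetical, the target fields must already exist in the registry (otherwise the subquery yields NULL), and the generated SQL should be reviewed on DSpace Test first.</p>

```python
# Phase-three moves from the list above, as (old field, new field) pairs.
MIGRATIONS = [
    ("dc.title.jtitle", "dc.source"),
    ("dc.crsubject.crpsubject", "cg.contributor.crp"),
    ("dc.contributor.affiliation", "cg.contributor.affiliation"),
    ("dc.Species", "cg.species"),
    ("dc.srplace.subregion", "cg.coverage.subregion"),
    ("dc.contributor.corporate", "dc.contributor.author"),
    ("dc.identifier.url", "cg.identifier.url"),
    ("dc.identifier.doi", "cg.identifier.doi"),
    ("dc.identifier.googleurl", "cg.identifier.googleurl"),
    ("dc.identifier.dataurl", "cg.identifier.dataurl"),
]


def split_field(name):
    """Split 'schema.element[.qualifier]' into (schema, element, qualifier)."""
    schema, element, *rest = name.split(".")
    qualifier = rest[0] if rest else None
    return schema, element, qualifier


def field_id_subquery(name):
    """SQL subquery resolving a field name to its metadata_field_id.

    Assumes DSpace 5 registry tables; names come from the trusted list
    above, so they are inlined rather than passed as parameters.
    """
    schema, element, qualifier = split_field(name)
    if qualifier is None:
        qualifier_clause = "f.qualifier is null"
    else:
        qualifier_clause = f"f.qualifier = '{qualifier}'"
    return (
        "(select f.metadata_field_id from metadatafieldregistry f"
        " join metadataschemaregistry s"
        " on s.metadata_schema_id = f.metadata_schema_id"
        f" where s.short_id = '{schema}' and f.element = '{element}'"
        f" and {qualifier_clause})"
    )


def migration_statements():
    """One UPDATE per mapping, moving item values to the new field."""
    return [
        f"update metadatavalue set metadata_field_id = {field_id_subquery(new)}"
        f" where resource_type_id = 2 and metadata_field_id = {field_id_subquery(old)};"
        for old, new in MIGRATIONS
    ]


if __name__ == "__main__":
    for statement in migration_statements():
        print(statement)
```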
<pre><code>$ ./fix-metadata-values.py -i Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 212 -d dspace -u dspace -p 'fuuu'
$ ./fix-metadata-values.py -i Affiliations-Fix-1045-Peter-Abenet.csv -f dc.contributor.affiliation -t Correct -m 211 -d dspace -u dspace -p 'fuuu'
$ ./delete-metadata-values.py -f dc.contributor.affiliation -i Affiliations-Delete-Peter-Abenet.csv -m 211 -u dspace -d dspace -p 'fuuu'
</code></pre>
<ul>
<li>I then ran all server updates and rebooted the server</li>
</ul>
<h2 id="2016-07-11">2016-07-11</h2>
<ul>
<li>Doing some author cleanups from Peter and Abenet:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Authors-Fix-205-UTF8.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
$ ./delete-metadata-values.py -f dc.contributor.author -i /tmp/Authors-Delete-UTF8.csv -m 3 -u dspacetest -d dspacetest -p fuuu
</code></pre>
<h2 id="2016-07-13">2016-07-13</h2>
<ul>
<li>Run the author cleanups on CGSpace and start a full Discovery re-index</li>
</ul>
<h2 id="2016-07-14">2016-07-14</h2>
<ul>
<li>Test LDAP settings for new root LDAP</li>
<li>Seems to work when binding as a top-level user</li>
</ul>
<h2 id="2016-07-18">2016-07-18</h2>
<ul>
<li>Adjust identifiers in XMLUI item display to be more prominent</li>
<li>Add species and breed to the XMLUI item display</li>
<li>CGSpace crashed late at night and the DSpace logs were showing:</li>
</ul>
<pre><code>2016-07-18 20:26:30,941 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
...
</code></pre>
<ul>
<li>I suspect it’s someone hitting REST too much:</li>
</ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | sort -n | uniq -c | sort -h | tail -n 3
710 66.249.78.38
1781 181.118.144.29
24904 70.32.99.142
</code></pre>
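<p>The same per-client count can be done in Python, which is handy if the log needs further filtering first. This is a minimal sketch assuming nginx's default <code>combined</code> log format, where the first whitespace-separated field is the client address; the script name <code>top_clients.py</code> and the function <code>top_clients</code> are hypothetical.</p>

```python
import sys
from collections import Counter


def top_clients(lines, n=3):
    """Return the n most frequent first fields as (value, count), busiest last.

    With nginx's default "combined" format the first field is the client
    address, so this mirrors: awk '{print $1}' | sort | uniq -c | sort -h | tail -n 3
    """
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return sorted(counts.items(), key=lambda item: item[1])[-n:]


if __name__ == "__main__":
    # Usage: python3 top_clients.py /var/log/nginx/rest.log [more logs...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for ip, count in top_clients(log):
                print(f"{count:7d} {ip}")
```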
<ul>
<li>I just blocked access to <code>/rest</code> for that last IP for now:</li>
</ul>
<pre><code># log rest requests
location /rest {
    access_log /var/log/nginx/rest.log;
    proxy_pass http://127.0.0.1:8443;
    deny 70.32.99.142;
}
</code></pre>
<h2 id="2016-07-21">2016-07-21</h2>
<ul>
<li>Mitigate the <a href="https://httpoxy.org">HTTPoxy</a> vulnerability for Tomcat etc. in nginx: <a href="https://github.com/ilri/rmg-ansible-public/pull/38">https://github.com/ilri/rmg-ansible-public/pull/38</a></li>
<li>Unblock 70.32.99.142 from <code>/rest</code> as it has been blocked for a few days</li>
</ul>
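<p>HTTPoxy works because CGI-style gateways expose each request header as an environment variable named <code>HTTP_</code> plus the upper-cased header name (hyphens become underscores), so a client-supplied <code>Proxy:</code> header becomes <code>HTTP_PROXY</code>, which many HTTP client libraries read as their outbound proxy setting. The sketch below (hypothetical helper names) shows that mapping and why dropping the header closes the hole. In nginx, the commonly recommended way to drop it when proxying is <code>proxy_set_header Proxy "";</code>; the linked pull request may differ in detail.</p>

```python
def cgi_env_name(header_name):
    """Map a request header to its CGI meta-variable name (RFC 3875 style)."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def build_cgi_env(headers, strip_proxy=False):
    """Build CGI-style environment variables from request headers.

    With strip_proxy=True the Proxy header is dropped first, which is the
    effect the httpoxy mitigation has at the nginx layer.
    """
    env = {}
    for name, value in headers.items():
        if strip_proxy and name.lower() == "proxy":
            continue
        env[cgi_env_name(name)] = value
    return env


if __name__ == "__main__":
    request = {"Host": "cgspace.cgiar.org", "Proxy": "http://attacker.example:8080"}
    print(build_cgi_env(request))                    # contains HTTP_PROXY
    print(build_cgi_env(request, strip_proxy=True))  # HTTP_PROXY is gone
```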
<h2 id="2016-07-22">2016-07-22</h2>
<ul>
<li>Help Paola from CCAFS with thumbnails for batch uploads</li>
<li>She has been struggling to get the dimensions right, and manually enlarging smaller thumbnails, renaming PNGs to JPG, etc.</li>
<li>Altmetric reports having an issue with some of our authors being doubled…</li>
<li>This is related to authority and confidence!</li>
<li>We might need to use <code>index.authority.ignore-prefered=true</code> to tell the Discovery index to prefer the variation that exists in the <code>metadatavalue</code> table rather than what it finds in the authority cache.</li>
<li>Trying these on DSpace Test after a discussion by Daniel Scharon on the dspace-tech mailing list:</li>
</ul>
<pre><code>index.authority.ignore-prefered.dc.contributor.author=true
index.authority.ignore-variants.dc.contributor.author=false
</code></pre>
<ul>
<li>After reindexing I don’t see any change in Discovery’s display of authors, and still have entries like:</li>
</ul>
<pre><code>Grace, D. (464)
Grace, D. (62)
</code></pre>
<ul>
<li>I asked for clarification of the following options on the DSpace mailing list:</li>
</ul>
<pre><code>index.authority.ignore
index.authority.ignore-prefered
index.authority.ignore-variants
</code></pre>
<ul>
<li>In the meantime, I will try these on DSpace Test (plus a reindex):</li>
</ul>
<pre><code>index.authority.ignore=true
index.authority.ignore-prefered=true
index.authority.ignore-variants=true
</code></pre>
<ul>
<li>Enabled usage of <code>X-Forwarded-For</code> in DSpace admin control panel (<a href="https://github.com/ilri/DSpace/pull/255">#255</a>)</li>
<li>It was misconfigured and disabled, but already working for some reason <em>sigh</em></li>
<li>… no luck with those authority settings. Trying with just:</li>
</ul>
<pre><code>index.authority.ignore=true
</code></pre>
<ul>
<li>After re-indexing and clearing the XMLUI cache nothing has changed</li>
</ul>
<h2 id="2016-07-25">2016-07-25</h2>
<ul>
<li>Trying a few more settings (plus reindex) for Discovery on DSpace Test:</li>
</ul>
<pre><code>index.authority.ignore-prefered.dc.contributor.author=true
index.authority.ignore-variants=true
</code></pre>
<ul>
<li>Run all OS updates and reboot DSpace Test server</li>
<li>No changes to Discovery after reindexing… hmm.</li>
<li>Integrate and massively clean up About page (<a href="https://github.com/ilri/DSpace/pull/256">#256</a>)</li>
</ul>
<p><img src="/cgspace-notes/2016/07/cgspace-about-page.png" alt="About page" /></p>
<ul>
<li>The DSpace source code mentions the configuration key <code>discovery.index.authority.ignore-prefered.*</code> (with a <code>discovery</code> prefix, despite the docs saying otherwise), so I’m trying the following on DSpace Test:</li>
</ul>
<pre><code>discovery.index.authority.ignore-prefered.dc.contributor.author=true
discovery.index.authority.ignore-variants=true
</code></pre>
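<p>To keep track of which of these keys could actually take effect, here is a small model of the lookup order I am assuming: a field-specific key such as <code>...ignore-prefered.dc.contributor.author</code> wins, otherwise the global key applies, otherwise false, and keys are only read with the <code>discovery.</code> prefix. This is a hypothetical sketch (<code>parse_properties</code>, <code>get_bool</code>, and <code>authority_option</code> are made-up names with simplified boolean parsing), not DSpace's actual implementation.</p>

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def get_bool(config, key, default):
    """Return True only for a case-insensitive 'true'; simplified assumption."""
    value = config.get(key)
    if value is None:
        return default
    return value.lower() == "true"


def authority_option(config, option, field, prefix="discovery."):
    """Resolve an index.authority.* option for one metadata field.

    Assumed order: field-specific key, then global key, then False.
    """
    base = prefix + "index.authority." + option
    global_value = get_bool(config, base, False)
    return get_bool(config, base + "." + field, global_value)


if __name__ == "__main__":
    config = parse_properties(
        "discovery.index.authority.ignore-prefered.dc.contributor.author=true\n"
        "discovery.index.authority.ignore-variants=true\n"
        "index.authority.ignore=true\n"  # unprefixed: never consulted here
    )
    for option in ("ignore", "ignore-prefered", "ignore-variants"):
        print(option, authority_option(config, option, "dc.contributor.author"))
```

<p>Under this model the unprefixed keys tried earlier in the month would simply be ignored, which would be consistent with the reindexes showing no change; that is a hypothesis, not something confirmed on the mailing list.</p>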
<ul>
<li>Still no change!</li>
<li>Deploy species, breed, and identifier changes to CGSpace, as well as About page</li>
<li>Run Linode RAM upgrade (8→12GB)</li>
<li>Re-sync DSpace Test with CGSpace</li>
<li>I noticed that our backup scripts don’t send Solr cores to S3 so I amended the script</li>
</ul>
<h2 id="2016-07-31">2016-07-31</h2>
<ul>
<li>Work on removing Dryland Systems and Humidtropics subjects from Discovery sidebar and Browse by</li>
<li>Also change “Subjects” to “AGROVOC keywords” in Discovery sidebar/search and Browse by (<a href="https://github.com/ilri/DSpace/issues/257">#257</a>)</li>
</ul>
< / article >
< / div > <!-- /.blog - main -->
<aside class="col-sm-3 offset-sm-1 blog-sidebar">

<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2017-01/">January, 2017</a></li>
<li><a href="/cgspace-notes/2016-12/">December, 2016</a></li>
<li><a href="/cgspace-notes/2016-11/">November, 2016</a></li>
<li><a href="/cgspace-notes/2016-10/">October, 2016</a></li>
<li><a href="/cgspace-notes/2016-09/">September, 2016</a></li>
</ol>
</section>

<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>

</aside>
< / div > <!-- /.row -->
< / div > <!-- /.container -->
<footer class="blog-footer">
<p>
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>