2016-07-01
Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
I think this query should find and replace all authors that have “,” at the end of their names (there were 95 matching the select before):
    dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and text_value ~ '^.+?,$';
    UPDATE 95
    dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and text_value ~ '^.+?,$';
     text_value
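To sanity-check the pattern before running it against the database, the same replacement can be sketched in Python. This is only an approximation: Python's `re` module stands in for PostgreSQL's regex engine, and the author names below are made up for illustration, not taken from the database.

```python
import re

# Same pattern as in the regexp_replace() call: lazily capture everything
# up to a single trailing comma, then keep only the captured part.
TRAILING_COMMA = re.compile(r"(^.+?),$")


def strip_trailing_comma(name: str) -> str:
    """Remove one trailing comma from a name; leave other names untouched."""
    return TRAILING_COMMA.sub(r"\1", name)


# Hypothetical author values for illustration only.
authors = ["Orth, A.,", "Smith, J.", "Doe, Jane,"]
cleaned = [strip_trailing_comma(a) for a in authors]
print(cleaned)
```

Note that only one trailing comma is removed per value, so a name ending in two commas would still match the select afterwards.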
2016-06-01
Experimenting with IFPRI OAI (we want to harvest their publications)
After reading the ContentDM documentation I found IFPRI’s OAI endpoint: http://ebrary.ifpri.org/oai/oai.php
After reading the OAI documentation and testing with an OAI validator I found out how to get their publications
This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
Working on the second phase of the metadata migration; it looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship:
    dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
    UPDATE 497
    dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75;
    UPDATE 14
Fix a few minor miscellaneous issues in dspace.cfg (#227)
2016-06-02
Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with cg.coverage.admin-unit
Seems that the Browse configuration in dspace.cfg can’t handle the ‘-’ in the field name:
    webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text
But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error
I’ve sent a message to the DSpace mailing list to ask about the Browse index definition
A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue
I found a thread on the mailing list talking about it and there is a bug report and a patch: https://jira.duraspace.org/browse/DS-2740
The patch applies successfully on DSpace 5.1 so I will try it later
2016-06-03
Investigating the CCAFS authority issue, I exported the metadata for the Videos collection
The top two authors are:
    CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500
    CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600
So the only difference is the “confidence”
Ok, well THAT is interesting:
    dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %';
     text_value |              authority               | confidence
    ------------+--------------------------------------+------------
     Orth, A.
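The two exported author values above can be compared field by field with a small parser. This is a minimal sketch that assumes the export cell has the `name::authority::confidence` shape shown in the two values; the `AuthorValue` type and `parse_author` helper are hypothetical names, not part of DSpace.

```python
from typing import NamedTuple, Optional


class AuthorValue(NamedTuple):
    text_value: str
    authority: Optional[str]
    confidence: Optional[int]


def parse_author(cell: str) -> AuthorValue:
    """Split an exported author cell into name, authority key and confidence."""
    # Split from the right so a "::" inside the name itself would not break parsing.
    parts = cell.rsplit("::", 2)
    if len(parts) == 3:
        name, authority, confidence = parts
        return AuthorValue(name, authority, int(confidence))
    # Anything without both an authority key and a confidence is returned as-is.
    return AuthorValue(cell, None, None)


name = "CGIAR Research Program on Climate Change, Agriculture and Food Security"
key = "acd00765-02f1-4b5b-92fa-bfa3877229ce"
first = parse_author(f"{name}::{key}::500")
second = parse_author(f"{name}::{key}::600")

# Same name and authority key; only the confidence differs.
print(first.text_value == second.text_value, first.authority == second.authority)
print(first.confidence, second.confidence)
```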
2016-05-01
Since yesterday there have been 10,000 REST errors and the site has been unstable again
I have blocked access to the API now
There are 3,000 IPs accessing the REST API in a 24-hour period!
    # awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
    3168
The two most frequent requesters are in Ethiopia and Colombia: 213.55.99.121 and 181.118.144.29
100% of the requests coming from Ethiopia are like this and result in an HTTP 500:
    GET /rest/handle/10568/NaN?expand=parentCommunityList,metadata HTTP/1.1
For now I’ll block just the Ethiopian IP
The owner of that application has said that the NaN (not a number) is an error in his code and he’ll fix it
2016-05-03
Update nginx to 1.10.x branch on CGSpace
Fix a reference to dc.type.output in Discovery that I had missed when we migrated to dc.type last month (#223)
2016-05-06
DSpace Test is down, catalina.out has lots of messages about heap space from some time yesterday (!)
It looks like Sisay was doing some batch imports
Hmm, also disk space is full
I decided to blow away the solr indexes, since they are 50GB and we don’t really need all the Atmire stuff there right now
I will re-generate the Discovery indexes after re-deploying
Testing renew-letsencrypt.sh script for nginx
    #!/usr/bin/env bash
    readonly SERVICE_BIN=/usr/sbin/service
    readonly LETSENCRYPT_BIN=/opt/letsencrypt/letsencrypt-auto
    # stop nginx so LE can listen on port 443
    $SERVICE_BIN nginx stop
    $LETSENCRYPT_BIN renew -nvv --standalone --standalone-supported-challenges tls-sni-01 > /var/log/letsencrypt/renew.log 2>&1
    LE_RESULT=$?
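A note on the IP count from 2016-05-01: `uniq` only collapses adjacent duplicate lines, so on an unsorted access log the pipeline can report more than the true number of distinct clients, and the figure above is best read as an upper bound. A minimal Python sketch of counting distinct clients and ranking them follows. It assumes, as the awk command does, that the client IP is the first whitespace-separated field. The sample lines are invented (only the two IPs and the NaN request come from the notes), and `count_clients` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def count_clients(log_lines: Iterable[str]) -> Counter:
    """Count requests per client IP (first whitespace-separated field)."""
    ips = (line.split()[0] for line in log_lines if line.strip())
    return Counter(ips)


# Invented sample lines for illustration; a real run would iterate over
# the lines of the nginx REST access log instead.
sample = [
    "213.55.99.121 - - GET /rest/handle/10568/NaN?expand=parentCommunityList,metadata HTTP/1.1",
    "181.118.144.29 - - GET /rest/items HTTP/1.1",
    "213.55.99.121 - - GET /rest/handle/10568/NaN?expand=parentCommunityList,metadata HTTP/1.1",
]

counts = count_clients(sample)
print(len(counts))            # number of distinct client IPs
print(counts.most_common(2))  # most frequent requesters first
```

Because the counter is keyed on the IP itself, the result does not depend on the order of lines in the log.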
2016-04-04
Looking at log file use on CGSpace, I noticed that we need to work on our cron setup a bit
We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, let alone one from last year!
This will save us a few gigs of backup space we’re paying for on S3
Also, I noticed the checker log has some errors we should pay attention to:
    Run start time: 03/06/2016 04:00:22
    Error retrieving bitstream ID 71274 from asset store.
2016-03-02
Looking at issues with author authorities on CGSpace
For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
2016-03-07
Troubleshooting the issues with the slew of commits for Atmire modules in #182
Their changes on 5_x-dev branch work, but it is messy as hell with merge commits and old branch base
When I rebase their branch on the latest 5_x-prod I get blank white pages
I identified one commit that causes the issue and let them know
Restart DSpace Test, as it seems to have crashed after Sisay tried to import some CSV or zip or something:
    Exception in thread "Lucene Merge Thread #19" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
2016-03-08
Add a few new filters to Atmire’s Listings and Reports module (#180)
We had also wanted to add a few to the Content and Usage module but I have to ask the editors which ones they were
2016-03-10
Disable the lucene cron job on CGSpace as it shouldn’t be needed anymore
Discuss ORCiD and duplicate authors on Yammer
Request new documentation for Atmire CUA and L&R modules, as ours are from 2013
Walk Sisay through some data cleaning workflows in OpenRefine
Start cleaning up the configuration for Atmire’s CUA module (#184)
It is very messed up because some labels are incorrect, fields are missing, etc
Update documentation for Atmire modules
2016-03-11
As I was looking at the CUA config I realized our Discovery config is all messed up and confusing
I’ve opened an issue to track some of that work (#186)
I did some major cleanup work on Discovery and XMLUI stuff related to the dc.type indexes (#187)
We had been confusing dc.type (a Dublin Core value) with dc.type.output (a value we invented) for a few years and it had permeated all aspects of our data, indexes, item displays, etc.
2016-02-05
Looking at some DAGRIS data for Abenet Yabowork
Lots of issues with spaces, newlines, etc causing the import to fail
I noticed we have a very interesting list of countries on CGSpace:
Not only are there 49,000 countries, we have some blanks (25)…
Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
2016-02-06
Found a way to get items with null/empty metadata values from SQL
First, find the metadata_field_id for the field you want from the metadatafieldregistry table:
    dspacetest=# select * from metadatafieldregistry;
In this case our country field is 78
Now find all resources with type 2 (item) that have null/empty values for that field:
    dspacetest=# select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=78 and (text_value='' OR text_value IS NULL);
Then you can find the handle that owns it from its resource_id:
    dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '22678';
It’s 25 items so editing in the web UI is annoying, let’s try SQL!
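The null/empty lookup can be tried out safely on a toy table before touching the real database. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table keeps only the columns used in the queries above, and every row except the item ID 22678 is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue ("
    " resource_id INTEGER, resource_type_id INTEGER,"
    " metadata_field_id INTEGER, text_value TEXT)"
)

# Invented rows: field 78 is the country field, resource type 2 is an item.
rows = [
    (22678, 2, 78, ""),       # empty country: should be found
    (22679, 2, 78, None),     # NULL country: should be found
    (22680, 2, 78, "KENYA"),  # has a value: ignored
    (22681, 2, 3, ""),        # empty, but a different field: ignored
    (5, 3, 78, ""),           # empty, but not an item: ignored
]
conn.executemany("INSERT INTO metadatavalue VALUES (?, ?, ?, ?)", rows)

query = """
    SELECT resource_id FROM metadatavalue
    WHERE resource_type_id = 2
      AND metadata_field_id = 78
      AND (text_value = '' OR text_value IS NULL)
    ORDER BY resource_id
"""
empty_ids = [row[0] for row in conn.execute(query)]
print(empty_ids)
```

The explicit `IS NULL` test matters because `text_value = ''` never matches a NULL value in SQL, so dropping it would miss the second row.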
2016-01-13
Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year.
I realized it is only necessary to clear the Cocoon cache after moving collections (rather than reindexing), as no metadata has changed, and therefore no search or browse indexes need to be updated.
Update GitHub wiki for documentation of maintenance tasks.
2016-01-14
Update CCAFS project identifiers in input-forms.xml
Run system updates and restart the server
2016-01-18
Change “Extension material” to “Extension Material” in input-forms.xml (a mistake that fell through the cracks when we fixed the others in DSpace 4 era)
2016-01-19
Work on tweaks and updates for the social sharing icons on item pages: add Delicious and Mendeley (from Academicons), make links open in new windows, and set the icon color to the theme’s primary color (#157)
Tweak date-based facets to show more values in drill-down ranges (#162)
Need to remember to clear the Cocoon cache after deployment or else you don’t see the new ranges immediately
Set up recipe on IFTTT to tweet new items from the CGSpace Atom feed to my Twitter account
Altmetrics’ support for Handles is kinda weak, so they can’t associate our items with DOIs until they are tweeted or blogged, etc first.
2015-12-02
Replace lzop with xz in log compression cron jobs on DSpace Test, as it uses less space:
    # cd /home/dspacetest.cgiar.org/log
    # ls -lh dspace.log.2015-11-18*
    -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
    -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
    -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
I had used lrzip once, but it needs more memory and is harder to use as it requires the lrztar
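For a rough sense of the savings, the sizes in the listing above can be turned into compression ratios. The numbers are the rounded values that `ls -lh` printed (2.0M taken as 2048K), so the results are approximate; the variable names are only for illustration.

```python
# Sizes in KiB as printed by `ls -lh` above (rounded, so ratios are approximate).
original_kib = 2.0 * 1024
lzo_kib = 387
xz_kib = 169

lzo_ratio = lzo_kib / original_kib   # fraction of the original size kept by lzop
xz_ratio = xz_kib / original_kib     # fraction of the original size kept by xz
xz_vs_lzo = xz_kib / lzo_kib         # size of the xz file relative to the lzo file

print(f"lzo: {lzo_ratio:.1%} of original")
print(f"xz:  {xz_ratio:.1%} of original")
print(f"xz is {xz_vs_lzo:.1%} the size of lzo")
```

For this one log file the xz output is well under half the size of the lzo output.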
2015-11-22
CGSpace went down
Looks like DSpace exhausted its PostgreSQL connection pool
Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
    $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
    78
For now I have increased the limit from 60 to 90, run updates, and rebooted the server
2015-11-24
CGSpace went down again
Getting emails from uptimeRobot and uptimeButler that it’s down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors
Looks like there are still a bunch of idle PostgreSQL connections:
    $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
    96
For some reason the number of idle connections is very high since we upgraded to DSpace 5
2015-11-25
Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config
The OAI application requests stylesheets and javascript files with the path /oai/static/css, which gets matched here:
    # static assets we can load from the file system directly with nginx
    location ~ /(themes|static|aspects/ReportingSuite) {
        try_files $uri @tomcat;
    ...
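Why /oai/static/css gets caught by that location block can be shown with a regex test. In this minimal sketch Python's `re.search` stands in for nginx's unanchored regex matching. The anchored variant is only one possible way to narrow the match and is an assumption, not the fix recorded in these notes; `matches` is a hypothetical helper.

```python
import re

# The pattern from the location block: unanchored, so it can match
# anywhere inside the request URI.
LOCATION = re.compile(r"/(themes|static|aspects/ReportingSuite)")

# Hypothetical stricter variant that only matches at the start of the URI.
ANCHORED = re.compile(r"^/(themes|static|aspects/ReportingSuite)")


def matches(pattern: re.Pattern, uri: str) -> bool:
    """Return True if the pattern matches anywhere in the URI."""
    return pattern.search(uri) is not None


for uri in ["/static/css/main.css", "/oai/static/css"]:
    print(uri, matches(LOCATION, uri), matches(ANCHORED, uri))
```

The unanchored pattern matches both paths, which is how OAI's own assets end up in the static-file location, while the anchored one matches only the top-level /static path.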