2016-06-01

Experimenting with IFPRI OAI (we want to harvest their publications)
After reading the ContentDM documentation I found IFPRI's OAI endpoint: http://ebrary.ifpri.org/oai/oai.php
After reading the OAI documentation and testing with an OAI validator I found out how to get their publications
This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&from=2016-01-01&set=p15738coll2&metadataPrefix=oai_dc
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
Working on the second phase of the metadata migration; it looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship:

    dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
    UPDATE 497
    dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75;
    UPDATE 14

Fix a few minor miscellaneous issues in dspace.cfg (#227)

2016-06-02

Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with cg.coverage.admin-unit
Seems that the Browse configuration in dspace.cfg can't handle the '-' in the field name:

    webui.browse.index.12 = subregion:metadata:cg.coverage.admin-unit:text

But actually, I think since DSpace 4 or 5 (we are 5.1) the Browse indexes come from Discovery (defined in discovery.xml) so this is really just a parsing error
I've sent a message to the DSpace mailing list to ask about the Browse index definition
A user was having problems with submission and from the stacktrace it looks like a Sherpa/Romeo issue
I found a thread on the mailing list talking about it and there is a bug report and a patch: https://jira.duraspace.org/browse/DS-2740
The patch applies successfully on DSpace 5.1 so I will try it later

2016-06-03

Investigating the CCAFS authority
issue, I exported the metadata for the Videos collection
The top two authors are:

    CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::500
    CGIAR Research Program on Climate Change, Agriculture and Food Security::acd00765-02f1-4b5b-92fa-bfa3877229ce::600

So the only difference is the "confidence"
Ok, well THAT is interesting:

    dspacetest=# select text_value, authority, confidence from metadatavalue where metadata_field_id=3 and text_value like '%Orth, %';
     text_value |              authority               | confidence
    ------------+--------------------------------------+------------
     Orth, A.

    # awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
    3168

The two most frequent requesters are in Ethiopia and Colombia
100% of the requests coming from Ethiopia look like this and result in an HTTP 500:

    GET /rest/handle/10568/NaN?expand=parentCommunityList,metadata HTTP/1.1

For now I'll block just the Ethiopian IP
The owner of that application has said that the NaN (not a number) is an error in his code and he'll fix it

2016-05-03

Update nginx to 1.10.x branch on CGSpace
Fix a reference to dc.type.output in Discovery that I had missed when we migrated to dc.type last month (#223)

2016-05-06

DSpace Test is down, catalina.out has lots of messages about heap space from some time yesterday (!)
It looks like Sisay was doing some batch imports
Hmm, also disk space is full
I decided to blow away the solr indexes, since they are 50GB and we don't really need all the Atmire stuff there right now
I will re-generate the Discovery indexes after re-deploying
Testing renew-letsencrypt.sh script for nginx

    #!/usr/bin/env bash

    readonly SERVICE_BIN=/usr/sbin/service
    readonly LETSENCRYPT_BIN=/opt/letsencrypt/letsencrypt-auto

    # stop nginx so LE can listen on port 443
    $SERVICE_BIN nginx stop

    $LETSENCRYPT_BIN renew -nvv --standalone --standalone-supported-challenges tls-sni-01 > /var/log/letsencrypt/renew.log 2>&1

    LE_RESULT=$?
This will save us a few gigs of backup space we're paying for on S3
Also, I noticed the checker log has some errors we should pay attention to:

    Run start time: 03/06/2016 04:00:22
    Error retrieving bitstream ID 71274 from asset store.

I realized it is only necessary to clear the Cocoon cache after moving collections, rather than reindexing, as no metadata has changed and therefore no search or browse indexes need to be updated.
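
To keep an eye on the checker log errors noted above, something like the following could pull the affected bitstream IDs out of a log. This is only a minimal sketch: the pattern is taken from the single error line quoted above, and I am assuming other failures are logged in the same form. The function name `failed_bitstream_ids` and the `sample` list are made up for illustration.

```python
import re

# Pattern based on the one error line quoted in these notes; other
# checker messages may be worded differently (assumption).
ERROR_RE = re.compile(r"Error retrieving bitstream ID (\d+) from asset store")


def failed_bitstream_ids(lines):
    """Return the bitstream IDs named in matching error lines, in order."""
    ids = []
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            ids.append(int(match.group(1)))
    return ids


# Hypothetical input: the two checker log lines quoted in the notes.
sample = [
    "Run start time: 03/06/2016 04:00:22",
    "Error retrieving bitstream ID 71274 from asset store.",
]
print(failed_bitstream_ids(sample))
```

In practice the lines would come from reading the checker log file instead of a hard-coded list; the path to that file is not recorded in these notes, so it is left out here.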