May, 2016
2016-05-01
- Since yesterday there have been 10,000 REST errors and the site has been unstable again
- I have blocked access to the API now
- There are 3,000 IPs accessing the REST API in a 24-hour period!
# awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168
- The two most frequent requesters are in Ethiopia and Colombia: 213.55.99.121 and 181.118.144.29
- 100% of the requests coming from Ethiopia are like this and result in an HTTP 500:
GET /rest/handle/10568/NaN?expand=parentCommunityList,metadata HTTP/1.1
- For now I’ll block just the Ethiopian IP
- The owner of that application has said that the NaN (not a number) is an error in his code and he’ll fix it
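Because uniq only collapses adjacent duplicate lines, the awk | uniq | wc -l pipeline above counts runs of consecutive requests from the same address rather than distinct addresses, so it can overstate the number of clients unless the log happens to be grouped by IP. A minimal sketch of a distinct count and a top-requesters list, assuming the client IP is the first field of each line (as in nginx's default combined log format); count_unique_ips and top_requesters are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Assumes the client IP is the first whitespace-separated field of each
# log line, as in nginx's default "combined" log format.

# Count distinct client IPs. uniq only collapses *adjacent* duplicates,
# so the input must be sorted first (sort -u sorts and dedupes in one step).
count_unique_ips() {
    awk '{print $1}' "$1" | sort -u | wc -l
}

# List the top N requesters (default 10) as "count ip", busiest first.
top_requesters() {
    awk '{print $1}' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

For example, top_requesters /var/log/nginx/rest.log 2 would list the two busiest addresses with their request counts.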
2016-05-03
- Update nginx to the 1.10.x branch on CGSpace
- Fix a reference to dc.type.output in Discovery that I had missed when we migrated to dc.type last month (#223)
2016-05-06
- DSpace Test is down; catalina.out has lots of messages about heap space from some time yesterday (!)
- It looks like Sisay was doing some batch imports
- Hmm, also disk space is full
- I decided to blow away the Solr indexes, since they are 50 GB and we don’t really need all the Atmire stuff there right now
- I will re-generate the Discovery indexes after re-deploying
- Testing a renew-letsencrypt.sh script for nginx:
#!/usr/bin/env bash
readonly SERVICE_BIN=/usr/sbin/service
readonly LETSENCRYPT_BIN=/opt/letsencrypt/letsencrypt-auto
# stop nginx so LE can listen on port 443
$SERVICE_BIN nginx stop
# renew non-interactively with the standalone authenticator, logging all output
$LETSENCRYPT_BIN renew -nvv --standalone --standalone-supported-challenges tls-sni-01 > /var/log/letsencrypt/renew.log 2>&1
# save the renewal's exit status before starting nginx again
LE_RESULT=$?
$SERVICE_BIN nginx start
if [[ "$LE_RESULT" != 0 ]]; then
    echo 'Automated renewal failed:'
    cat /var/log/letsencrypt/renew.log
    exit 1
fi
- Seems to work well
2016-05-10
- Start looking at more metadata migrations
- There are lots of fields in the dcterms namespace that look interesting, like:
- dcterms.type
- dcterms.spatial
- Not sure what dcterms is…
- Looks like these were added in DSpace 4 to allow for future work to make DSpace more flexible
- CGSpace’s dc registry has 96 items, and the default DSpace one has 73
2016-05-11
- Identify and propose the next phase of CGSpace fields to migrate:
- dc.title.jtitle → cg.title.journal
- dc.identifier.status → cg.identifier.status
- dc.river.basin → cg.river.basin
- dc.Species → cg.species
- dc.targetaudience → cg.targetaudience
- dc.fulltextstatus → cg.fulltextstatus
- dc.editon → cg.edition
- dc.isijournal → cg.isijournal
- Start a test rebase of the 5_x-prod branch on top of the dspace-5.5 tag
- There were a handful of conflicts that I didn’t understand
- After completing the rebase I tried to build with the module versions Atmire had indicated as being 5.5-ready, but I got this error:
[ERROR] Failed to execute goal on project additions: Could not resolve dependencies for project org.dspace.modules:additions:jar:5.5: Could not find artifact com.atmire:atmire-metadata-quality-api:jar:5.5-2.10.1-0 in sonatype-releases (https://oss.sonatype.org/content/repositories/releases/) -> [Help 1]
- I’ve sent them a question about it
- A user mentioned having problems with uploading a 33 MB PDF
- I told her I would increase the limit temporarily tomorrow morning
- Turns out she was able to decrease the size of the PDF so we didn’t have to do anything
2016-05-12
- Looks like the issue that Abenet was having a few days ago with “Connection Reset” in Firefox might be due to a Firefox 46 issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1268775
- I finally found a copy of the latest CG Core metadata guidelines and it looks like we can add a few more fields to our next migration:
- dc.rplace.region → cg.coverage.region
- dc.cplace.country → cg.coverage.country
- Questions for CG people:
- Could our dc.place and dc.srplace.subregion both map to cg.coverage.admin-unit?
- Should we use dc.contributor.crp or cg.contributor.crp for the CRP (ours is dc.crsubject.crpsubject)?
- Our dc.contributor.affiliation and dc.contributor.corporate could both map to dc.contributor and possibly dc.contributor.center, depending on whether it’s a CG center or not
- dc.title.jtitle could map to either dc.publisher or dc.source, depending on how you read things
- Found ~200 messed up CIAT values in dc.publisher:
# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=39 and text_value similar to '% %';
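The query above filters on a numeric metadata_field_id (39, which these notes use for dc.publisher). One way to look up the ID for a dotted field name is to query the registry tables; the sketch below only builds that query string. It assumes the DSpace 5 tables metadataschemaregistry (short_id column) and metadatafieldregistry (element and qualifier columns), treats a missing qualifier as NULL, and does no SQL escaping, so it is only meant for trusted field names. field_id_query is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a SQL query that looks up the metadata_field_id
# for a dotted field name such as dc.title.jtitle or dc.Species.
# Assumes DSpace 5 registry table/column names; input is trusted (no escaping).
field_id_query() {
    local schema element qualifier qualifier_clause
    # split "schema.element.qualifier" on dots; qualifier may be empty
    IFS=. read -r schema element qualifier <<< "$1"
    if [[ -n $qualifier ]]; then
        qualifier_clause="qualifier='${qualifier}'"
    else
        # unqualified fields store NULL in the qualifier column
        qualifier_clause="qualifier is null"
    fi
    printf "select metadata_field_id from metadatafieldregistry where metadata_schema_id=(select metadata_schema_id from metadataschemaregistry where short_id='%s') and element='%s' and %s;\n" \
        "$schema" "$element" "$qualifier_clause"
}
```

The output could then be run with something like psql -d dspace -c "$(field_id_query dc.publisher)", assuming the database is named dspace.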
2016-05-13
- More theorizing about CG Core
- Add two new fields:
- dc.srplace.subregion → cg.coverage.admin-unit
- dc.place → cg.place
- dc.place is our own field, so it’s easy to move
- I’ve removed dc.title.jtitle from the list for now because there’s no use moving it out of DC until we know where it will go (see discussion yesterday)
2016-05-18
- Work on 707 CCAFS records
- They have thumbnails on Flickr and elsewhere
- In OpenRefine I created a new filename column based on the thumbnails column with the following GREL:
if(cells['thumbnails'].value.contains('hqdefault'), cells['thumbnails'].value.split('/')[-2] + '.jpg', cells['thumbnails'].value.split('/')[-1])
- Because ~400 records had the same filename on Flickr (hqdefault.jpg) but different UUIDs in the URL
- So for the hqdefault.jpg ones I just take the UUID (index -2, the second-to-last path component) and use it as the filename
- Before importing with SAFBuilder I tested adding “__bundle:THUMBNAIL” to the filename column and it works fine
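The GREL expression above can be mirrored in shell, which is handy for checking the generated names against the files on disk. A minimal sketch using only bash parameter expansion; thumbnail_filename is a hypothetical name, and like the GREL it assumes the URLs have no trailing slash or query string:

```shell
#!/usr/bin/env bash
# Hypothetical shell equivalent of the GREL expression: for URLs containing
# "hqdefault", use the second-to-last path component plus ".jpg";
# otherwise use the last path component as-is.
thumbnail_filename() {
    local url=$1
    if [[ $url == *hqdefault* ]]; then
        local parent=${url%/*}            # drop the last path component
        printf '%s.jpg\n' "${parent##*/}" # keep only the (new) last component
    else
        printf '%s\n' "${url##*/}"        # last path component
    fi
}
```

Given a file of URLs (urls.txt is a placeholder), while IFS= read -r url; do thumbnail_filename "$url"; done < urls.txt prints one filename per line.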
2016-05-19
- More quality control on the filename field of CCAFS records to make processing in shell and SAFBuilder more reliable:
value.replace('_','').replace('-','')
- We need to hold off on moving dc.Species to cg.species because it is only used for plants, and it might be better to move it to something like cg.species.plant
- And dc.identifier.fund is MOSTLY used for CPWF project identifiers but has some other sponsorship things
- We should move PN, SG, CBA, IA, and PHASE* values to cg.identifier.cpwfproject
- The rest, like BMGF, USAID, etc., might have to go to either dc.description.sponsorship or cg.identifier.fund (not sure yet)
- There are also some mistakes in CPWF’s things, like “PN 47”
- This ought to catch all the CPWF values (there don’t appear to be any SG* values):
# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
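The split proposed above for dc.identifier.fund could be expressed as a small routing rule, for example to dry-run the migration over an exported list of values. This is a hypothetical sketch: fund_target_field is an illustrative name, it uses the PN, SG, CBA, IA, and PHASE* patterns from these notes, and it prints "undecided" for everything else because the target for sponsors like BMGF and USAID is still open.

```shell
#!/usr/bin/env bash
# Hypothetical routing rule for dc.identifier.fund values, mirroring the
# patterns in the notes (PN*, SG*, PHASE* prefixes; CBA and IA exact matches).
fund_target_field() {
    case $1 in
        PN*|SG*|PHASE*|CBA|IA)
            echo 'cg.identifier.cpwfproject' ;;
        *)
            # sponsors such as BMGF or USAID: target field not decided yet
            echo 'undecided' ;;
    esac
}
```

Note that malformed values such as “PN 47” still match PN*, so they would need their own cleanup pass before or after the move.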
2016-05-20
- More work on CCAFS Video and Images records
- For SAFBuilder we need to modify the filename column to have the thumbnail bundle:
value + "__bundle:THUMBNAIL"
- Also, I fixed some weird characters using OpenRefine’s transform with the following GREL:
value.replace(/\u0081/,'')
- Write shell script to resize thumbnails with height larger than 400: https://gist.github.com/alanorth/131401dcd39d00e0ce12e1be3ed13256
- Upload 707 CCAFS records to DSpace Test
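The character cleanup, bundle suffix, and thumbnail resizing for these records could be approximated in shell roughly as follows. This is a sketch under assumptions, not the contents of the linked gist: strip_u0081, with_thumbnail_bundle, needs_resize, and resize_thumbnails are hypothetical names, it assumes ImageMagick's identify and mogrify are installed, it only handles *.jpg files in a single directory, and mogrify overwrites the files in place.

```shell
#!/usr/bin/env bash
# Remove every U+0081 (UTF-8 bytes C2 81), like the GREL value.replace(/\u0081/,'')
strip_u0081() {
    local s=$1
    printf '%s\n' "${s//$'\xc2\x81'/}"
}

# Append the SAFBuilder bundle suffix to a filename, like value + "__bundle:THUMBNAIL"
with_thumbnail_bundle() {
    printf '%s__bundle:THUMBNAIL\n' "$1"
}

# Succeed if height $1 is larger than the maximum $2 (default 400)
needs_resize() {
    (( $1 > ${2:-400} ))
}

# Shrink each JPEG in directory $1 that is taller than $2 px (default 400)
# to exactly that height; "x<height>" keeps the aspect ratio.
resize_thumbnails() {
    local dir=$1 max=${2:-400} f h
    for f in "$dir"/*.jpg; do
        [[ -e $f ]] || continue                 # glob matched nothing
        h=$(identify -format '%h' "$f") || continue
        if needs_resize "$h" "$max"; then
            mogrify -resize "x${max}" "$f"
        fi
    done
}
```

For example, resize_thumbnails ./thumbnails would shrink any JPEG taller than 400 px in a hypothetical ./thumbnails directory. ImageMagick's "x400>" geometry (shrink only if larger) could replace the explicit height check, but spelling it out keeps the threshold visible.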