April, 2016

Mon Apr 04, 2016 in Notes

2016-04-04

Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, leave alone one from last year!
This will save us a few gigs of backup space we’re paying for on S3
Also, I noticed the checker log has some errors we should pay attention to:

Run start time: 03/06/2016 04:00:22
Error retrieving bitstream ID 71274 from asset store.
java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290601546459645925328536011917633626 (Too many open files)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at edu.sdsc.grid.io.local.LocalFileInputStream.open(LocalFileInputStream.java:171)
        at edu.sdsc.grid.io.GeneralFileInputStream.<init>(GeneralFileInputStream.java:145)
        at edu.sdsc.grid.io.local.LocalFileInputStream.<init>(LocalFileInputStream.java:139)
        at edu.sdsc.grid.io.FileFactory.newFileInputStream(FileFactory.java:630)
        at org.dspace.storage.bitstore.BitstreamStorageManager.retrieve(BitstreamStorageManager.java:525)
        at org.dspace.checker.BitstreamDAO.getBitstream(BitstreamDAO.java:60)
        at org.dspace.checker.CheckerCommand.processBitstream(CheckerCommand.java:303)
        at org.dspace.checker.CheckerCommand.checkBitstream(CheckerCommand.java:171)
        at org.dspace.checker.CheckerCommand.process(CheckerCommand.java:120)
        at org.dspace.app.checker.ChecksumChecker.main(ChecksumChecker.java:236)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:225)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:77)
******************************************************

So this would be the tomcat7 Unix user, who seems to have a default limit of 1024 files in its shell
For what it’s worth, we have been setting the actual Tomcat 7 process’ limit to 16384 for a few years (in /etc/default/tomcat7)
Looks like cron will read limits from /etc/security/limits.* so we can do something for the tomcat7 user there
Submit pull request for Tomcat 7 limits in Ansible dspace role (#30)

2016-04-05

Reduce Amazon S3 storage used for logs from 46 GB to 6GB by deleting a bunch of logs we don’t need!

# s3cmd ls s3://cgspace.cgiar.org/log/ > /tmp/s3-logs.txt
# grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep cocoon.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep handle-plugin.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep solr.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del

Also, adjust the cron jobs for backups so they only backup dspace.log and some stats files (.dat)
Try to do some metadata field migrations using the Atmire batch UI (dc.Species → cg.species) but it took several hours and even missed a few records

2016-04-06

A better way to move metadata on this scale is via SQL, for example dc.type.output → dc.type (their IDs in the metadatafieldregistry are 66 and 109, respectively):

dspacetest=# update metadatavalue set metadata_field_id=109 where metadata_field_id=66;
UPDATE 40852

After that an index-discovery -bf is required
Start working on metadata migrations, add 25 or so new metadata fields to CGSpace

2016-04-07

Write shell script to do the migration of fields: https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b
Testing with a few fields it seems to work well:

$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40883
UPDATE metadatavalue SET metadata_field_id=202 WHERE metadata_field_id=72
UPDATE 21420
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51258

2016-04-08

Discuss metadata renaming with Abenet, we decided it’s better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF
I’ve e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change

2016-04-10

Looking at the DOI issue reported by Leroy from CIAT a few weeks ago
It seems the dx.doi.org URLs are much more proper in our repository!

dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://dx.doi.org%';
 count
-------
  5638
(1 row)

dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://doi.org%';
 count
-------
     3

I will manually edit the dc.identifier.doi in 10568/72509 and tweet the link, then check back in a week to see if the donut gets updated

2016-04-11

The donut is already updated and shows the correct number now
CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we’d do it tentatively on Monday the 18th.

2016-04-12

Looking at quality of WLE data (cg.subject.iwmi) in SQL:

dspacetest=# select text_value, count(*) from metadatavalue where metadata_field_id=217 group by text_value order by count(*) desc;

Listings and Reports is still not returning reliable data for dc.type
I think we need to ask Atmire, as their documentation isn’t too clear on the format of the filter configs
Alternatively, I want to see if I move all the data from dc.type.output to dc.type and then re-index, if it behaves better
Looking at our input-forms.xml I see we have two sets of ILRI subjects, but one has a few extra subjects
Remove one set of ILRI subjects and remove duplicate VALUE CHAINS from existing list (#216)
I decided to keep the set of subjects that had FMD and RANGELANDS added, as it appears to have been requested to have been added, and might be the newer list
I found 226 blank metadatavalues:

dspacetest# select * from metadatavalue where resource_type_id=2 and text_value='';

I think we should delete them and do a full re-index:

dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 226

I deleted them on CGSpace but I’ll wait to do the re-index as we’re going to be doing one in a few days for the metadata changes anyways
In other news, moving the dc.type.output to dc.type and re-indexing seems to have fixed the Listings and Reports issue from above
Unfortunately this isn’t a very good solution, because Listings and Reports config should allow us to filter on dc.type.* but the documentation isn’t very clear and I couldn’t reach Atmire today
We want to do the dc.type.output move on CGSpace anyways, but we should wait as it might affect other external people!

2016-04-14

Communicate with Macaroni Bros again about dc.type
Help Sisay with some rsync and Linux stuff
Notify CIAT people of metadata changes (I had forgotten them last week)

2016-04-15

DSpace Test had crashed, so I ran all system updates, rebooted, and re-deployed DSpace code

2016-04-18

Talk to CIAT people about their portal again
Start looking more at the fields we want to delete
The following metadata fields have 0 items using them, so we can just remove them from the registry and any references in XMLUI, input forms, etc:
- dc.description.abstractother
- dc.whatwasknown
- dc.whatisnew
- dc.description.nationalpartners
- dc.peerreviewprocess
- cg.species.animal
Deleted!
The following fields have some items using them and I have to decide what to do with them (delete or move):
- dc.icsubject.icrafsubject: 6 items, mostly in CPWF collections
- dc.type.journal: 11 items, mostly in ILRI collections
- dc.publicationcategory: 1 item, in CPWF
- dc.GRP: 2 items, CPWF
- dc.Species.animal: 6 items, in ILRI and AnGR
- cg.livestock.agegroup: 9 items, in ILRI collections
- cg.livestock.function: 20 items, mostly in EADD
Test metadata migration on local instance again:

$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40885
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51330
UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
UPDATE 5986
UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
UPDATE 2456
UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
UPDATE 3872
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075
$ JAVA_OPTS="-Xms512m -Xmx512m -Dfile.encoding=UTF-8" ~/dspace/bin/dspace index-discovery -bf

CGSpace was down but I’m not sure why, this was in catalina.out:

Apr 18, 2016 7:32:26 PM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
        at org.dspace.rest.Resource.processFinally(Resource.java:163)
        at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
        at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
...

Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)
After restarting Tomcat a few more of these errors were logged but the application was up

2016-04-19

Get handles for items that are using a given metadata field, ie dc.Species.animal (105):

# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=105);
   handle
-------------
 10568/10298
 10568/16413
 10568/16774
 10568/34487

Delete metadata values for dc.GRP and dc.icsubject.icrafsubject:

# delete from metadatavalue where resource_type_id=2 and metadata_field_id=96;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=83;

They are old ICRAF fields and we haven’t used them since 2011 or so
Also delete them from the metadata registry
CGSpace went down again, dspace.log had this:

2016-04-19 15:02:17,025 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object

I restarted Tomcat and PostgreSQL and now it’s back up
I bet this is the same crash as yesterday, but I only saw the errors in catalina.out
Looks to be related to this, from dspace.log:

2016-04-19 15:16:34,670 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.

We have 18,000 of these errors right now…
Delete a few more old metadata values: dc.Species.animal, dc.type.journal, and dc.publicationcategory:

# delete from metadatavalue where resource_type_id=2 and metadata_field_id=105;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=85;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=95;

And then remove them from the metadata registry

2016-04-20

Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server
Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server
Field migration went well:

$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40909
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51419
UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
UPDATE 5986
UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
UPDATE 2458
UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
UPDATE 3872
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075

Also, I migrated CGSpace to using the PGDG PostgreSQL repo as the infrastructure playbooks had been using it for a while and it seemed to be working well
Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)
Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:

$ grep -c "Aborting context in finally statement" dspace.log.2016-04-20
21252

I found a recent discussion on the DSpace mailing list and I’ve asked for advice there
Looks like this issue was noted and fixed in DSpace 5.5 (we’re on 5.1): https://jira.duraspace.org/browse/DS-2936
I’ve sent a message to Atmire asking about compatibility with DSpace 5.5

2016-04-21

Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)
Atmire responded with DSpace 5.5 compatible versions for their modules, so I’ll start testing those in a few weeks

2016-04-22

Import 95 records into CTA’s Agrodok collection

2016-04-26

Test embargo during item upload
Seems to be working but the help text is misleading as to the date format
It turns out the robots.txt issue we thought we solved last month isn’t solved because you can’t use wildcards in URL patterns: https://jira.duraspace.org/browse/DS-2962
Write some nginx rules to add X-Robots-Tag HTTP headers to the dynamic requests from robots.txt instead
A few URLs to test with:

2016-04-27

I woke up to ten or fifteen “up” and “down” emails from the monitoring website
Looks like the last one was “down” from about four hours ago
I think there must be something with this REST stuff:

# grep -c "Aborting context in finally statement" dspace.log.2016-04-*
dspace.log.2016-04-01:0
dspace.log.2016-04-02:0
dspace.log.2016-04-03:0
dspace.log.2016-04-04:0
dspace.log.2016-04-05:0
dspace.log.2016-04-06:0
dspace.log.2016-04-07:0
dspace.log.2016-04-08:0
dspace.log.2016-04-09:0
dspace.log.2016-04-10:0
dspace.log.2016-04-11:0
dspace.log.2016-04-12:235
dspace.log.2016-04-13:44
dspace.log.2016-04-14:0
dspace.log.2016-04-15:35
dspace.log.2016-04-16:0
dspace.log.2016-04-17:0
dspace.log.2016-04-18:11942
dspace.log.2016-04-19:28496
dspace.log.2016-04-20:28474
dspace.log.2016-04-21:28654
dspace.log.2016-04-22:28763
dspace.log.2016-04-23:28773
dspace.log.2016-04-24:28775
dspace.log.2016-04-25:28626
dspace.log.2016-04-26:28655
dspace.log.2016-04-27:7271

I restarted tomcat and it is back up
Add Spanish XMLUI strings so those users see “CGSpace” instead of “DSpace” in the user interface (#222)
Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: https://jira.duraspace.org/browse/DS-3172
Update infrastructure playbooks for nginx 1.10.x (stable) release: https://github.com/ilri/rmg-ansible-public/issues/32
Currently running on DSpace Test, we’ll give it a few days before we adjust CGSpace
CGSpace down, restarted tomcat and it’s back up

2016-04-28

Problems with stability again. I’ve blocked access to /rest for now to see if the number of errors in the log files drop
Later we could maybe start logging access to /rest and perhaps whitelist some IPs…

2016-04-30

Logs for today and yesterday have zero references to this REST error, so I’m going to open back up the REST API but log all requests

location /rest {
	access_log /var/log/nginx/rest.log;
	proxy_pass http://127.0.0.1:8443;
}

I will check the logs again in a few days to look for patterns, see who is accessing it, etc