April, 2016
2016-04-04
- Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
- We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
- After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, let alone one from last year!
- This will save us a few gigs of backup space we’re paying for on S3
- Also, I noticed the `checker` log has some errors we should pay attention to:
```
Run start time: 03/06/2016 04:00:22
Error retrieving bitstream ID 71274 from asset store.
java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290601546459645925328536011917633626 (Too many open files)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at edu.sdsc.grid.io.local.LocalFileInputStream.open(LocalFileInputStream.java:171)
        at edu.sdsc.grid.io.GeneralFileInputStream.<init>(GeneralFileInputStream.java:145)
        at edu.sdsc.grid.io.local.LocalFileInputStream.<init>(LocalFileInputStream.java:139)
        at edu.sdsc.grid.io.FileFactory.newFileInputStream(FileFactory.java:630)
        at org.dspace.storage.bitstore.BitstreamStorageManager.retrieve(BitstreamStorageManager.java:525)
        at org.dspace.checker.BitstreamDAO.getBitstream(BitstreamDAO.java:60)
        at org.dspace.checker.CheckerCommand.processBitstream(CheckerCommand.java:303)
        at org.dspace.checker.CheckerCommand.checkBitstream(CheckerCommand.java:171)
        at org.dspace.checker.CheckerCommand.process(CheckerCommand.java:120)
        at org.dspace.app.checker.ChecksumChecker.main(ChecksumChecker.java:236)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:225)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:77)
******************************************************
```
- So this would be the `tomcat7` Unix user, who seems to have a default limit of 1024 files in its shell
- For what it’s worth, we have been setting the actual Tomcat 7 process’ limit to 16384 for a few years (in `/etc/default/tomcat7`)
- Looks like cron will read limits from `/etc/security/limits.*` so we can do something for the tomcat7 user there
- Submit pull request for Tomcat 7 limits in Ansible dspace role (#30)
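- For reference, a minimal sketch of the kind of entry that would raise the limit for the tomcat7 user’s cron jobs (the file name and values here are illustrative assumptions, not necessarily what went into the pull request):

```
# Hypothetical pam_limits entry so cron jobs run as tomcat7 get a higher open-file limit
$ cat /etc/security/limits.d/tomcat7.conf
tomcat7 soft nofile 16384
tomcat7 hard nofile 16384
```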
2016-04-05
- Reduce Amazon S3 storage used for logs from 46 GB to 6 GB by deleting a bunch of logs we don’t need!
```
# s3cmd ls s3://cgspace.cgiar.org/log/ > /tmp/s3-logs.txt
# grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep cocoon.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep handle-plugin.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep solr.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
```
- Also, adjust the cron jobs for backups so they only back up `dspace.log` and some stats files (`.dat`); see the sketch at the end of this list
- Try to do some metadata field migrations using the Atmire batch UI (`dc.Species` → `cg.species`) but it took several hours and even missed a few records
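- For illustration, the backup job could use s3cmd’s exclude/include filters so only `dspace.log*` and `*.dat` files get synced — this is just a sketch with an assumed schedule and log path, not the actual crontab entry:

```
# Hypothetical crontab entry: exclude everything, then re-include only the files we care about
0 3 * * * s3cmd sync --exclude '*' --include 'dspace.log*' --include '*.dat' /home/cgspace.cgiar.org/log/ s3://cgspace.cgiar.org/log/
```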
2016-04-06
- A better way to move metadata on this scale is via SQL, for example `dc.type.output` → `dc.type` (their IDs in the metadatafieldregistry are 66 and 109, respectively):

```
dspacetest=# update metadatavalue set metadata_field_id=109 where metadata_field_id=66;
UPDATE 40852
```
- After that an `index-discovery -bf` is required
- Start working on metadata migrations, add 25 or so new metadata fields to CGSpace
2016-04-07
- Write shell script to do the migration of fields: https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b
- Testing with a few fields it seems to work well:
```
$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40883
UPDATE metadatavalue SET metadata_field_id=202 WHERE metadata_field_id=72
UPDATE 21420
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51258
```
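- The gist has the actual script; as a minimal sketch of the approach (not necessarily how the real script is written), one could loop over old→new field ID pairs and feed the UPDATEs to psql — the database name and ID pairs below are just the examples from the output above:

```
#!/usr/bin/env bash
# Sketch of a metadata field migration script: move values from an old
# metadata_field_id to a new one, echoing each statement before running it.
fields="66:109 72:202 76:203"

for pair in $fields; do
    old_id=${pair%:*}
    new_id=${pair#*:}
    echo "UPDATE metadatavalue SET metadata_field_id=$new_id WHERE metadata_field_id=$old_id"
    psql -d dspacetest -c "UPDATE metadatavalue SET metadata_field_id=$new_id WHERE metadata_field_id=$old_id"
done
```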
2016-04-08
- Discuss metadata renaming with Abenet, we decided it’s better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF
- I’ve e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change
2016-04-10
- Looking at the DOI issue reported by Leroy from CIAT a few weeks ago
- It seems the `dx.doi.org` URLs are much more common in our repository!

```
dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://dx.doi.org%';
 count
-------
  5638
(1 row)

dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://doi.org%';
 count
-------
     3
```
- I will manually edit the `dc.identifier.doi` in 10568/72509 and tweet the link, then check back in a week to see if the donut gets updated
2016-04-11
- The donut is already updated and shows the correct number now
- CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we’d do it tentatively on Monday the 18th.
2016-04-12
- Looking at quality of WLE data (`cg.subject.iwmi`) in SQL:

```
dspacetest=# select text_value, count(*) from metadatavalue where metadata_field_id=217 group by text_value order by count(*) desc;
```
- Listings and Reports is still not returning reliable data for `dc.type`
- I think we need to ask Atmire, as their documentation isn’t too clear on the format of the filter configs
- Alternatively, I want to see if it behaves better if I move all the data from `dc.type.output` to `dc.type` and then re-index
- Looking at our `input-forms.xml` I see we have two sets of ILRI subjects, but one has a few extra subjects
- Remove one set of ILRI subjects and remove duplicate `VALUE CHAINS` from the existing list (#216)
- I decided to keep the set of subjects that had `FMD` and `RANGELANDS` added, as it appears those additions were requested and it might be the newer list
- I found 226 blank metadatavalues:
```
dspacetest# select * from metadatavalue where resource_type_id=2 and text_value='';
```
- I think we should delete them and do a full re-index:
```
dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 226
```
- I deleted them on CGSpace but I’ll wait to do the re-index as we’re going to be doing one in a few days for the metadata changes anyways
- In other news, moving the `dc.type.output` to `dc.type` and re-indexing seems to have fixed the Listings and Reports issue from above
- Unfortunately this isn’t a very good solution, because the Listings and Reports config should allow us to filter on `dc.type.*` but the documentation isn’t very clear and I couldn’t reach Atmire today
- We want to do the `dc.type.output` move on CGSpace anyways, but we should wait as it might affect other external people!
2016-04-14
- Communicate with Macaroni Bros again about `dc.type`
- Help Sisay with some rsync and Linux stuff
- Notify CIAT people of metadata changes (I had forgotten them last week)
2016-04-15
- DSpace Test had crashed, so I ran all system updates, rebooted, and re-deployed DSpace code
2016-04-18
- Talk to CIAT people about their portal again
- Start looking more at the fields we want to delete
- The following metadata fields have 0 items using them, so we can just remove them from the registry and any references in XMLUI, input forms, etc:
- dc.description.abstractother
- dc.whatwasknown
- dc.whatisnew
- dc.description.nationalpartners
- dc.peerreviewprocess
- cg.species.animal
- Deleted!
- The following fields have some items using them and I have to decide what to do with them (delete or move):
- dc.icsubject.icrafsubject: 6 items, mostly in CPWF collections
- dc.type.journal: 11 items, mostly in ILRI collections
- dc.publicationcategory: 1 item, in CPWF
- dc.GRP: 2 items, CPWF
- dc.Species.animal: 6 items, in ILRI and AnGR
- cg.livestock.agegroup: 9 items, in ILRI collections
- cg.livestock.function: 20 items, mostly in EADD
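- For reference, counts like the ones above can be checked with a quick query against `metadatavalue` (the field ID here is just an example; the real IDs come from the `metadatafieldregistry` table):

```
dspacetest=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=105;
```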
- Test metadata migration on local instance again:
```
$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40885
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51330
UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
UPDATE 5986
UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
UPDATE 2456
UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
UPDATE 3872
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075
$ JAVA_OPTS="-Xms512m -Xmx512m -Dfile.encoding=UTF-8" ~/dspace/bin/dspace index-discovery -bf
```
- CGSpace was down but I’m not sure why, this was in `catalina.out`:

```
Apr 18, 2016 7:32:26 PM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
        at org.dspace.rest.Resource.processFinally(Resource.java:163)
        at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
        at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        ...
```
- Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)
- After restarting Tomcat a few more of these errors were logged but the application was up
2016-04-19
- Get handles for items that are using a given metadata field, i.e. `dc.Species.animal` (105):

```
# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=105);
   handle
-------------
 10568/10298
 10568/16413
 10568/16774
 10568/34487
```
- Delete metadata values for `dc.GRP` and `dc.icsubject.icrafsubject`:

```
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=96;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=83;
```
- They are old ICRAF fields and we haven’t used them since 2011 or so
- Also delete them from the metadata registry
- CGSpace went down again, `dspace.log` had this:

```
2016-04-19 15:02:17,025 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
```
- I restarted Tomcat and PostgreSQL and now it’s back up
- I bet this is the same crash as yesterday, but I only saw the errors in `catalina.out`
- Looks to be related to this, from `dspace.log`:

```
2016-04-19 15:16:34,670 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
```

- We have 18,000 of these errors right now…
- Delete a few more old metadata values: `dc.Species.animal`, `dc.type.journal`, and `dc.publicationcategory`:

```
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=105;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=85;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=95;
```
- And then remove them from the metadata registry
2016-04-20
- Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server
- Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server
- Field migration went well:
```
$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40909
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51419
UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
UPDATE 5986
UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
UPDATE 2458
UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
UPDATE 3872
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075
```
- Also, I migrated CGSpace to the PGDG PostgreSQL repo, as the infrastructure playbooks had been using it for a while and it seemed to be working well
- Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)
- Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:
$ grep -c "Aborting context in finally statement" dspace.log.2016-04-20 21252
- I found a recent discussion on the DSpace mailing list and I’ve asked for advice there
- Looks like this issue was noted and fixed in DSpace 5.5 (we’re on 5.1): https://jira.duraspace.org/browse/DS-2936
- I’ve sent a message to Atmire asking about compatibility with DSpace 5.5
2016-04-21
- Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)
- Atmire responded with DSpace 5.5 compatible versions for their modules, so I’ll start testing those in a few weeks
2016-04-22
- Import 95 records into CTA’s Agrodok collection
2016-04-26
- Test embargo during item upload
- Seems to be working but the help text is misleading as to the date format
- It turns out the `robots.txt` issue we thought we solved last month isn’t actually solved, because you can’t use wildcards in URL patterns: https://jira.duraspace.org/browse/DS-2962
- Write some nginx rules to add `X-Robots-Tag` HTTP headers to the dynamic requests instead of relying on `robots.txt` (see the sketch below)
- A few URLs to test with:
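- As an illustration of the kind of nginx rule this means — the location pattern, paths, and header value here are assumptions, not the final rules:

```
# Illustrative only: ask crawlers not to index dynamic pages (e.g. Discovery searches)
location ~ ^/(discover|search-filter) {
    add_header X-Robots-Tag "none";
    # ...the normal proxy_pass to Tomcat would still be needed here
}
```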
2016-04-27
- I woke up to ten or fifteen “up” and “down” emails from the monitoring website
- Looks like the last one was “down” from about four hours ago
- I think there must be something going on with this REST stuff:
# grep -c "Aborting context in finally statement" dspace.log.2016-04-* dspace.log.2016-04-01:0 dspace.log.2016-04-02:0 dspace.log.2016-04-03:0 dspace.log.2016-04-04:0 dspace.log.2016-04-05:0 dspace.log.2016-04-06:0 dspace.log.2016-04-07:0 dspace.log.2016-04-08:0 dspace.log.2016-04-09:0 dspace.log.2016-04-10:0 dspace.log.2016-04-11:0 dspace.log.2016-04-12:235 dspace.log.2016-04-13:44 dspace.log.2016-04-14:0 dspace.log.2016-04-15:35 dspace.log.2016-04-16:0 dspace.log.2016-04-17:0 dspace.log.2016-04-18:11942 dspace.log.2016-04-19:28496 dspace.log.2016-04-20:28474 dspace.log.2016-04-21:28654 dspace.log.2016-04-22:28763 dspace.log.2016-04-23:28773 dspace.log.2016-04-24:28775 dspace.log.2016-04-25:28626 dspace.log.2016-04-26:28655 dspace.log.2016-04-27:7271
- I restarted Tomcat and it is back up
- Add Spanish XMLUI strings so those users see “CGSpace” instead of “DSpace” in the user interface (#222)
- Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: https://jira.duraspace.org/browse/DS-3172
- Update infrastructure playbooks for nginx 1.10.x (stable) release: https://github.com/ilri/rmg-ansible-public/issues/32
- Currently running on DSpace Test, we’ll give it a few days before we adjust CGSpace
- CGSpace down, restarted Tomcat and it’s back up
2016-04-28
- Problems with stability again. I’ve blocked access to `/rest` for now (see the sketch below) to see if the number of errors in the log files drops
- Later we could maybe start logging access to `/rest` and perhaps whitelist some IPs…
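- For the record, blocking `/rest` in nginx can be as simple as this (a sketch, not necessarily the exact rule I used):

```
# Temporarily block the REST API while we investigate the errors
location /rest {
    return 403;
}
```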
2016-04-30
- Logs for today and yesterday have zero references to this REST error, so I’m going to open back up the REST API but log all requests:
```
location /rest {
    access_log /var/log/nginx/rest.log;
    proxy_pass http://127.0.0.1:8443;
}
```
- I will check the logs again in a few days to look for patterns, see who is accessing it, etc