mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-24 23:50:17 +01:00
396 lines
18 KiB
Markdown
396 lines
18 KiB
Markdown
+++
|
|
date = "2016-04-04T11:06:00+03:00"
|
|
author = "Alan Orth"
|
|
title = "April, 2016"
|
|
tags = ["Notes"]
|
|
|
|
+++
|
|
## 2016-04-04
|
|
|
|
- Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
|
|
- We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
|
|
- After running DSpace for over five years I've never needed to look in any other log file than dspace.log, leave alone one from last year!
|
|
- This will save us a few gigs of backup space we're paying for on S3
|
|
- Also, I noticed the `checker` log has some errors we should pay attention to:
|
|
|
|
<!--more-->
|
|
|
|
```
|
|
Run start time: 03/06/2016 04:00:22
|
|
Error retrieving bitstream ID 71274 from asset store.
|
|
java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290601546459645925328536011917633626 (Too many open files)
|
|
at java.io.FileInputStream.open(Native Method)
|
|
at java.io.FileInputStream.<init>(FileInputStream.java:146)
|
|
at edu.sdsc.grid.io.local.LocalFileInputStream.open(LocalFileInputStream.java:171)
|
|
at edu.sdsc.grid.io.GeneralFileInputStream.<init>(GeneralFileInputStream.java:145)
|
|
at edu.sdsc.grid.io.local.LocalFileInputStream.<init>(LocalFileInputStream.java:139)
|
|
at edu.sdsc.grid.io.FileFactory.newFileInputStream(FileFactory.java:630)
|
|
at org.dspace.storage.bitstore.BitstreamStorageManager.retrieve(BitstreamStorageManager.java:525)
|
|
at org.dspace.checker.BitstreamDAO.getBitstream(BitstreamDAO.java:60)
|
|
at org.dspace.checker.CheckerCommand.processBitstream(CheckerCommand.java:303)
|
|
at org.dspace.checker.CheckerCommand.checkBitstream(CheckerCommand.java:171)
|
|
at org.dspace.checker.CheckerCommand.process(CheckerCommand.java:120)
|
|
at org.dspace.app.checker.ChecksumChecker.main(ChecksumChecker.java:236)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
|
|
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
|
at java.lang.reflect.Method.invoke(Method.java:606)
|
|
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:225)
|
|
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:77)
|
|
******************************************************
|
|
```
|
|
|
|
- So this would be the `tomcat7` Unix user, who seems to have a default limit of 1024 files in its shell
|
|
- For what it's worth, we have been setting the actual Tomcat 7 process' limit to 16384 for a few years (in `/etc/default/tomcat7`)
|
|
- Looks like cron will read limits from `/etc/security/limits.*` so we can do something for the tomcat7 user there
|
|
- Submit pull request for Tomcat 7 limits in Ansible dspace role ([#30](https://github.com/ilri/rmg-ansible-public/pull/30))
|
|
|
|
## 2016-04-05
|
|
|
|
- Reduce Amazon S3 storage used for logs from 46 GB to 6GB by deleting a bunch of logs we don't need!
|
|
|
|
```
|
|
# s3cmd ls s3://cgspace.cgiar.org/log/ > /tmp/s3-logs.txt
|
|
# grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
|
|
# grep cocoon.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
|
|
# grep handle-plugin.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
|
|
# grep solr.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
|
|
```
|
|
|
|
- Also, adjust the cron jobs for backups so they only backup `dspace.log` and some stats files (.dat)
|
|
- Try to do some metadata field migrations using the Atmire batch UI (`dc.Species` → `cg.species`) but it took several hours and even missed a few records
|
|
|
|
## 2016-04-06
|
|
|
|
- A better way to move metadata on this scale is via SQL, for example `dc.type.output` → `dc.type` (their IDs in the metadatafieldregistry are 66 and 109, respectively):
|
|
|
|
```
|
|
dspacetest=# update metadatavalue set metadata_field_id=109 where metadata_field_id=66;
|
|
UPDATE 40852
|
|
```
|
|
|
|
- After that an `index-discovery -bf` is required
|
|
- Start working on metadata migrations, add 25 or so new metadata fields to CGSpace
|
|
|
|
## 2016-04-07
|
|
|
|
- Write shell script to do the migration of fields: https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b
|
|
- Testing with a few fields it seems to work well:
|
|
|
|
```
|
|
$ ./migrate-fields.sh
|
|
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
|
|
UPDATE 40883
|
|
UPDATE metadatavalue SET metadata_field_id=202 WHERE metadata_field_id=72
|
|
UPDATE 21420
|
|
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
|
|
UPDATE 51258
|
|
```
|
|
|
|
## 2016-04-08
|
|
|
|
- Discuss metadata renaming with Abenet, we decided it's better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF
|
|
- I've e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change
|
|
|
|
## 2016-04-10
|
|
|
|
- Looking at the DOI issue [reported by Leroy from CIAT a few weeks ago](https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=678507860)
|
|
- It seems the `dx.doi.org` URLs are much more proper in our repository!
|
|
|
|
```
|
|
dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://dx.doi.org%';
|
|
count
|
|
-------
|
|
5638
|
|
(1 row)
|
|
|
|
dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://doi.org%';
|
|
count
|
|
-------
|
|
3
|
|
```
|
|
|
|
- I will manually edit the `dc.identifier.doi` in [10568/72509](https://cgspace.cgiar.org/handle/10568/72509?show=full) and tweet the link, then check back in a week to see if the donut gets updated
|
|
|
|
## 2016-04-11
|
|
|
|
- The donut is already updated and shows the correct number now
|
|
- CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we'd do it tentatively on Monday the 18th.
|
|
|
|
## 2016-04-12
|
|
|
|
- Looking at quality of WLE data (`cg.subject.iwmi`) in SQL:
|
|
|
|
```
|
|
dspacetest=# select text_value, count(*) from metadatavalue where metadata_field_id=217 group by text_value order by count(*) desc;
|
|
```
|
|
|
|
- Listings and Reports is still not returning reliable data for `dc.type`
|
|
- I think we need to ask Atmire, as their documentation isn't too clear on the format of the filter configs
|
|
- Alternatively, I want to see if I move all the data from `dc.type.output` to `dc.type` and then re-index, if it behaves better
|
|
- Looking at our `input-forms.xml` I see we have two sets of ILRI subjects, but one has a few extra subjects
|
|
- Remove one set of ILRI subjects and remove duplicate `VALUE CHAINS` from existing list ([#216](https://github.com/ilri/DSpace/pull/216))
|
|
- I decided to keep the set of subjects that had `FMD` and `RANGELANDS` added, as it appears to have been requested to have been added, and might be the newer list
|
|
- I found 226 blank metadatavalues:
|
|
|
|
```
|
|
dspacetest# select * from metadatavalue where resource_type_id=2 and text_value='';
|
|
```
|
|
|
|
- I think we should delete them and do a full re-index:
|
|
|
|
```
|
|
dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value='';
|
|
DELETE 226
|
|
```
|
|
|
|
- I deleted them on CGSpace but I'll wait to do the re-index as we're going to be doing one in a few days for the metadata changes anyways
|
|
- In other news, moving the `dc.type.output` to `dc.type` and re-indexing seems to have fixed the Listings and Reports issue from above
|
|
- Unfortunately this isn't a very good solution, because Listings and Reports config should allow us to filter on `dc.type.*` but the documentation isn't very clear and I couldn't reach Atmire today
|
|
- We want to do the `dc.type.output` move on CGSpace anyways, but we should wait as it might affect other external people!
|
|
|
|
## 2016-04-14
|
|
|
|
- Communicate with Macaroni Bros again about `dc.type`
|
|
- Help Sisay with some rsync and Linux stuff
|
|
- Notify CIAT people of metadata changes (I had forgotten them last week)
|
|
|
|
## 2016-04-15
|
|
|
|
- DSpace Test had crashed, so I ran all system updates, rebooted, and re-deployed DSpace code
|
|
|
|
## 2016-04-18
|
|
|
|
- Talk to CIAT people about their portal again
|
|
- Start looking more at the fields we want to delete
|
|
- The following metadata fields have 0 items using them, so we can just remove them from the registry and any references in XMLUI, input forms, etc:
|
|
- dc.description.abstractother
|
|
- dc.whatwasknown
|
|
- dc.whatisnew
|
|
- dc.description.nationalpartners
|
|
- dc.peerreviewprocess
|
|
- cg.species.animal
|
|
- Deleted!
|
|
- The following fields have some items using them and I have to decide what to do with them (delete or move):
|
|
- dc.icsubject.icrafsubject: 6 items, mostly in CPWF collections
|
|
- dc.type.journal: 11 items, mostly in ILRI collections
|
|
- dc.publicationcategory: 1 item, in CPWF
|
|
- dc.GRP: 2 items, CPWF
|
|
- dc.Species.animal: 6 items, in ILRI and AnGR
|
|
- cg.livestock.agegroup: 9 items, in ILRI collections
|
|
- cg.livestock.function: 20 items, mostly in EADD
|
|
- Test metadata migration on local instance again:
|
|
|
|
```
|
|
$ ./migrate-fields.sh
|
|
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
|
|
UPDATE 40885
|
|
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
|
|
UPDATE 51330
|
|
UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
|
|
UPDATE 5986
|
|
UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
|
|
UPDATE 2456
|
|
UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
|
|
UPDATE 3872
|
|
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
|
|
UPDATE 46075
|
|
$ JAVA_OPTS="-Xms512m -Xmx512m -Dfile.encoding=UTF-8" ~/dspace/bin/dspace index-discovery -bf
|
|
```
|
|
|
|
- CGSpace was down but I'm not sure why, this was in `catalina.out`:
|
|
|
|
```
|
|
Apr 18, 2016 7:32:26 PM com.sun.jersey.spi.container.ContainerResponse logException
|
|
SEVERE: Mapped exception to response: 500 (Internal Server Error)
|
|
javax.ws.rs.WebApplicationException
|
|
at org.dspace.rest.Resource.processFinally(Resource.java:163)
|
|
at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
|
|
at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source)
|
|
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
|
at java.lang.reflect.Method.invoke(Method.java:606)
|
|
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
|
|
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
|
|
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
|
|
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
|
|
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
|
|
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
|
|
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
|
|
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
|
|
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
|
|
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
|
|
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
|
|
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
|
|
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
|
|
...
|
|
```
|
|
|
|
- Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)
|
|
- After restarting Tomcat a few more of these errors were logged but the application was up
|
|
|
|
## 2016-04-19
|
|
|
|
- Get handles for items that are using a given metadata field, ie `dc.Species.animal` (105):
|
|
|
|
```
|
|
# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=105);
|
|
handle
|
|
-------------
|
|
10568/10298
|
|
10568/16413
|
|
10568/16774
|
|
10568/34487
|
|
```
|
|
|
|
- Delete metadata values for `dc.GRP` and `dc.icsubject.icrafsubject`:
|
|
|
|
```
|
|
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=96;
|
|
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=83;
|
|
```
|
|
|
|
- They are old ICRAF fields and we haven't used them since 2011 or so
|
|
- Also delete them from the metadata registry
|
|
- CGSpace went down again, `dspace.log` had this:
|
|
|
|
```
|
|
2016-04-19 15:02:17,025 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
|
|
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
|
|
```
|
|
|
|
- I restarted Tomcat and PostgreSQL and now it's back up
|
|
- I bet this is the same crash as yesterday, but I only saw the errors in `catalina.out`
|
|
- Looks to be related to this, from `dspace.log`:
|
|
|
|
```
|
|
2016-04-19 15:16:34,670 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
|
|
```
|
|
|
|
- We have 18,000 of these errors right now...
|
|
- Delete a few more old metadata values: `dc.Species.animal`, `dc.type.journal`, and `dc.publicationcategory`:
|
|
|
|
```
|
|
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=105;
|
|
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=85;
|
|
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=95;
|
|
```
|
|
|
|
- And then remove them from the metadata registry
|
|
|
|
## 2016-04-20
|
|
|
|
- Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server
|
|
- Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server
|
|
- Field migration went well:
|
|
|
|
```
|
|
$ ./migrate-fields.sh
|
|
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
|
|
UPDATE 40909
|
|
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
|
|
UPDATE 51419
|
|
UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
|
|
UPDATE 5986
|
|
UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
|
|
UPDATE 2458
|
|
UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
|
|
UPDATE 3872
|
|
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
|
|
UPDATE 46075
|
|
```
|
|
|
|
- Also, I migrated CGSpace to using the PGDG PostgreSQL repo as the infrastructure playbooks had been using it for a while and it seemed to be working well
|
|
- Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)
|
|
- Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:
|
|
|
|
```
|
|
$ grep -c "Aborting context in finally statement" dspace.log.2016-04-20
|
|
21252
|
|
```
|
|
|
|
- I found a recent discussion on the DSpace mailing list and I've asked for advice there
|
|
- Looks like this issue was noted and fixed in DSpace 5.5 (we're on 5.1): https://jira.duraspace.org/browse/DS-2936
|
|
- I've sent a message to Atmire asking about compatibility with DSpace 5.5
|
|
|
|
## 2016-04-21
|
|
|
|
- Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)
|
|
- Atmire responded with DSpace 5.5 compatible versions for their modules, so I'll start testing those in a few weeks
|
|
|
|
## 2016-04-22
|
|
|
|
- Import 95 records into [CTA's Agrodok collection](https://cgspace.cgiar.org/handle/10568/42219)
|
|
|
|
## 2016-04-26
|
|
|
|
- Test embargo during item upload
|
|
- Seems to be working but the help text is misleading as to the date format
|
|
- It turns out the `robots.txt` issue we thought we solved last month isn't solved because you can't use wildcards in URL patterns: https://jira.duraspace.org/browse/DS-2962
|
|
- Write some nginx rules to add `X-Robots-Tag` HTTP headers to the dynamic requests from `robots.txt` instead
|
|
- A few URLs to test with:
|
|
- https://dspacetest.cgiar.org/handle/10568/440/browse?type=bioversity
|
|
- https://dspacetest.cgiar.org/handle/10568/913/discover
|
|
- https://dspacetest.cgiar.org/handle/10568/1/search-filter?filtertype_0=country&filter_0=VIETNAM&filter_relational_operator_0=equals&field=country
|
|
|
|
## 2016-04-27
|
|
|
|
- I woke up to ten or fifteen "up" and "down" emails from the monitoring website
|
|
- Looks like the last one was "down" from about four hours ago
|
|
- I think there must be something with this REST stuff:
|
|
|
|
```
|
|
# grep -c "Aborting context in finally statement" dspace.log.2016-04-*
|
|
dspace.log.2016-04-01:0
|
|
dspace.log.2016-04-02:0
|
|
dspace.log.2016-04-03:0
|
|
dspace.log.2016-04-04:0
|
|
dspace.log.2016-04-05:0
|
|
dspace.log.2016-04-06:0
|
|
dspace.log.2016-04-07:0
|
|
dspace.log.2016-04-08:0
|
|
dspace.log.2016-04-09:0
|
|
dspace.log.2016-04-10:0
|
|
dspace.log.2016-04-11:0
|
|
dspace.log.2016-04-12:235
|
|
dspace.log.2016-04-13:44
|
|
dspace.log.2016-04-14:0
|
|
dspace.log.2016-04-15:35
|
|
dspace.log.2016-04-16:0
|
|
dspace.log.2016-04-17:0
|
|
dspace.log.2016-04-18:11942
|
|
dspace.log.2016-04-19:28496
|
|
dspace.log.2016-04-20:28474
|
|
dspace.log.2016-04-21:28654
|
|
dspace.log.2016-04-22:28763
|
|
dspace.log.2016-04-23:28773
|
|
dspace.log.2016-04-24:28775
|
|
dspace.log.2016-04-25:28626
|
|
dspace.log.2016-04-26:28655
|
|
dspace.log.2016-04-27:7271
|
|
```
|
|
|
|
- I restarted tomcat and it is back up
|
|
- Add Spanish XMLUI strings so those users see "CGSpace" instead of "DSpace" in the user interface ([#222](https://github.com/ilri/DSpace/pull/222))
|
|
- Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: https://jira.duraspace.org/browse/DS-3172
|
|
- Update infrastructure playbooks for nginx 1.10.x (stable) release: https://github.com/ilri/rmg-ansible-public/issues/32
|
|
- Currently running on DSpace Test, we'll give it a few days before we adjust CGSpace
|
|
- CGSpace down, restarted tomcat and it's back up
|
|
|
|
## 2016-04-28
|
|
|
|
- Problems with stability again. I've blocked access to `/rest` for now to see if the number of errors in the log files drop
|
|
- Later we could maybe start logging access to `/rest` and perhaps whitelist some IPs...
|
|
|
|
## 2016-04-30
|
|
|
|
- Logs for today and yesterday have zero references to this REST error, so I'm going to open back up the REST API but log all requests
|
|
|
|
```
|
|
location /rest {
|
|
access_log /var/log/nginx/rest.log;
|
|
proxy_pass http://127.0.0.1:8443;
|
|
}
|
|
```
|
|
|
|
- I will check the logs again in a few days to look for patterns, see who is accessing it, etc
|