mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-18 04:37:04 +01:00
+++
date = "2016-04-04T11:06:00+03:00"
author = "Alan Orth"
title = "April, 2016"
tags = ["notes"]
image = "../images/bg.jpg"

+++
## 2016-04-04

- Looking at log file usage on CGSpace, I notice that we need to work on our cron setup a bit
- We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
- After running DSpace for over five years I've never needed to look in any log file other than dspace.log, let alone one from last year!
- This will save us a few gigs of backup space we're paying for on S3
- Also, I noticed the `checker` log has some errors we should pay attention to:

```
Run start time: 03/06/2016 04:00:22
Error retrieving bitstream ID 71274 from asset store.
java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290601546459645925328536011917633626 (Too many open files)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at edu.sdsc.grid.io.local.LocalFileInputStream.open(LocalFileInputStream.java:171)
at edu.sdsc.grid.io.GeneralFileInputStream.<init>(GeneralFileInputStream.java:145)
at edu.sdsc.grid.io.local.LocalFileInputStream.<init>(LocalFileInputStream.java:139)
at edu.sdsc.grid.io.FileFactory.newFileInputStream(FileFactory.java:630)
at org.dspace.storage.bitstore.BitstreamStorageManager.retrieve(BitstreamStorageManager.java:525)
at org.dspace.checker.BitstreamDAO.getBitstream(BitstreamDAO.java:60)
at org.dspace.checker.CheckerCommand.processBitstream(CheckerCommand.java:303)
at org.dspace.checker.CheckerCommand.checkBitstream(CheckerCommand.java:171)
at org.dspace.checker.CheckerCommand.process(CheckerCommand.java:120)
at org.dspace.app.checker.ChecksumChecker.main(ChecksumChecker.java:236)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:225)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:77)
******************************************************
```

- So this would be the `tomcat7` Unix user, who seems to have a default limit of 1024 open files in its shell
- For what it's worth, we have been setting the actual Tomcat 7 process's limit to 16384 for a few years (in `/etc/default/tomcat7`)
- Looks like cron will read limits from `/etc/security/limits.*` so we can do something for the `tomcat7` user there
- Submit pull request for Tomcat 7 limits in Ansible dspace role ([#30](https://github.com/ilri/rmg-ansible-public/pull/30))
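
- A minimal sketch for verifying the open-file limit from inside a shell (for example, at the top of a cron script run as `tomcat7`); the helper name `check_nofile_limit` is hypothetical, the 16384 threshold simply mirrors the Tomcat setting mentioned above, and the limits line in the comment only illustrates the pam_limits syntax rather than the contents of the pull request:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail when the soft open-file limit of the current
# shell is lower than the wanted minimum.
#
# Illustrative pam_limits entry (assumption, not copied from the Ansible role):
#   tomcat7  soft  nofile  16384
check_nofile_limit() {
    local wanted="$1"
    local current
    current="$(ulimit -Sn)"   # soft limit for open file descriptors

    # "unlimited" satisfies any numeric minimum
    if [ "$current" = "unlimited" ]; then
        return 0
    fi

    if [ "$current" -lt "$wanted" ]; then
        echo "open files soft limit is $current, wanted at least $wanted" >&2
        return 1
    fi
    return 0
}

# Example: check_nofile_limit 16384 || exit 1
```

- Running the check as the first step of the cron job would surface a low limit right away, instead of as `Too many open files` errors partway through the checker run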

## 2016-04-05

- Reduce Amazon S3 storage used for logs from 46 GB to 6 GB by deleting a bunch of logs we don't need!

```
# s3cmd ls s3://cgspace.cgiar.org/log/ > /tmp/s3-logs.txt
# grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep cocoon.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep handle-plugin.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep solr.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
```
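
- The four near-identical pipelines above could be folded into one helper that only prints the matching URLs, so the list can be reviewed before anything is deleted; this is a sketch, and the function name `select_log_urls` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the S3 URLs (4th column of the saved
# `s3cmd ls` listing) whose name contains one of the given log names.
select_log_urls() {
    local listing="$1"
    shift
    local name
    for name in "$@"; do
        # -F matches the name as a fixed string, so the dot in
        # "solr.log" is not treated as a regex wildcard
        grep -F -- "$name" "$listing" | awk '{print $4}'
    done
}

# Review first, then delete:
#   select_log_urls /tmp/s3-logs.txt checker.log cocoon.log handle-plugin.log solr.log
#   select_log_urls /tmp/s3-logs.txt checker.log | xargs s3cmd del
```

- Printing the URLs as a separate step is a deliberate choice here: deletions on S3 can't be undone, so it's worth checking that nothing like `dspace.log` slipped into the list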

- Also, adjust the cron jobs for backups so they only back up `dspace.log` and some stats files (.dat)
- Try to do some metadata field migrations using the Atmire batch UI (`dc.Species` → `cg.species`) but it took several hours and even missed a few records

## 2016-04-06

- A better way to move metadata on this scale is via SQL, for example `dc.type.output` → `dc.type` (their IDs in the `metadatafieldregistry` table are 66 and 109, respectively):

```
dspacetest=# update metadatavalue set metadata_field_id=109 where metadata_field_id=66;
UPDATE 40852
```

- After that, an `index-discovery -bf` run is required
- Start working on metadata migrations, add 25 or so new metadata fields to CGSpace
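
- With that many fields to migrate, the SQL move shown above could be generated by a small helper instead of typed by hand each time; this is only a sketch, the function name `metadata_move_sql` is hypothetical, and wrapping the statement in a transaction with a count beforehand is my own addition rather than what was run on DSpace Test:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print SQL that moves every value from one
# metadata field ID to another, inside a single transaction.
metadata_move_sql() {
    local from_id="$1" to_id="$2"
    local id
    # Accept plain integer IDs only, so nothing else ends up in the SQL
    for id in "$from_id" "$to_id"; do
        case "$id" in
            ''|*[!0-9]*)
                echo "field IDs must be integers" >&2
                return 1
                ;;
        esac
    done
    cat <<SQL
BEGIN;
SELECT COUNT(*) FROM metadatavalue WHERE metadata_field_id=${from_id};
UPDATE metadatavalue SET metadata_field_id=${to_id} WHERE metadata_field_id=${from_id};
COMMIT;
SQL
}

# Example (database name taken from the prompt above; connection
# options are an assumption):
#   metadata_move_sql 66 109 | psql dspacetest
```

- The `SELECT COUNT(*)` output can be compared with the row count reported by `UPDATE`; swapping `COMMIT` for `ROLLBACK` would turn the whole thing into a dry run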