cgspace-notes/docs/2016-04/index.html

547 lines
26 KiB
HTML
Raw Normal View History

2018-02-11 17:28:23 +01:00
<!DOCTYPE html>
<html lang="en" >
2018-02-11 17:28:23 +01:00
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="April, 2016" />
<meta property="og:description" content="2016-04-04
Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
2020-01-27 15:20:44 +01:00
After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, leave alone one from last year!
This will save us a few gigs of backup space we&rsquo;re paying for on S3
2018-02-11 17:28:23 +01:00
Also, I noticed the checker log has some errors we should pay attention to:
" />
<meta property="og:type" content="article" />
2019-02-02 13:12:57 +01:00
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2016-04/" />
2019-08-08 17:10:44 +02:00
<meta property="article:published_time" content="2016-04-04T11:06:00+03:00" />
<meta property="article:modified_time" content="2018-03-09T22:10:33+02:00" />
2018-09-30 07:23:48 +02:00
2018-02-11 17:28:23 +01:00
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="April, 2016"/>
<meta name="twitter:description" content="2016-04-04
Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
2020-01-27 15:20:44 +01:00
After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, leave alone one from last year!
This will save us a few gigs of backup space we&rsquo;re paying for on S3
2018-02-11 17:28:23 +01:00
Also, I noticed the checker log has some errors we should pay attention to:
"/>
2020-06-04 13:43:40 +02:00
<meta name="generator" content="Hugo 0.72.0" />
2018-02-11 17:28:23 +01:00
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "April, 2016",
2020-04-02 09:55:42 +02:00
"url": "https://alanorth.github.io/cgspace-notes/2016-04/",
2018-04-30 18:05:39 +02:00
"wordCount": "2006",
"datePublished": "2016-04-04T11:06:00+03:00",
"dateModified": "2018-03-09T22:10:33+02:00",
2018-02-11 17:28:23 +01:00
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2016-04/">
<title>April, 2016 | CGSpace Notes</title>
2018-02-11 17:28:23 +01:00
<!-- combined, minified CSS -->
2020-01-23 19:19:38 +01:00
2020-01-28 11:01:42 +01:00
<link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
2018-02-11 17:28:23 +01:00
2020-01-28 11:01:42 +01:00
<!-- minified Font Awesome for SVG icons -->
2020-04-02 09:55:42 +02:00
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script>
2020-01-28 11:01:42 +01:00
2019-04-14 15:59:47 +02:00
<!-- RSS 2.0 feed -->
2018-02-11 17:28:23 +01:00
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
2018-12-19 12:20:39 +01:00
2018-02-11 17:28:23 +01:00
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
2018-02-11 17:28:23 +01:00
</div>
</header>
2018-12-19 12:20:39 +01:00
2018-02-11 17:28:23 +01:00
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-04/">April, 2016</a></h2>
2020-04-02 09:55:42 +02:00
<p class="blog-post-meta"><time datetime="2016-04-04T11:06:00+03:00">Mon Apr 04, 2016</time> by Alan Orth in
2018-02-11 17:28:23 +01:00
2020-01-28 11:01:42 +01:00
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
2018-02-11 17:28:23 +01:00
</p>
</header>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-04">2016-04-04</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
2020-01-27 15:20:44 +01:00
<li>After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, leave alone one from last year!</li>
<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li>
2018-02-11 17:28:23 +01:00
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul>
<pre><code>Run start time: 03/06/2016 04:00:22
Error retrieving bitstream ID 71274 from asset store.
java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290601546459645925328536011917633626 (Too many open files)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.&lt;init&gt;(FileInputStream.java:146)
at edu.sdsc.grid.io.local.LocalFileInputStream.open(LocalFileInputStream.java:171)
at edu.sdsc.grid.io.GeneralFileInputStream.&lt;init&gt;(GeneralFileInputStream.java:145)
at edu.sdsc.grid.io.local.LocalFileInputStream.&lt;init&gt;(LocalFileInputStream.java:139)
at edu.sdsc.grid.io.FileFactory.newFileInputStream(FileFactory.java:630)
at org.dspace.storage.bitstore.BitstreamStorageManager.retrieve(BitstreamStorageManager.java:525)
at org.dspace.checker.BitstreamDAO.getBitstream(BitstreamDAO.java:60)
at org.dspace.checker.CheckerCommand.processBitstream(CheckerCommand.java:303)
at org.dspace.checker.CheckerCommand.checkBitstream(CheckerCommand.java:171)
at org.dspace.checker.CheckerCommand.process(CheckerCommand.java:120)
at org.dspace.app.checker.ChecksumChecker.main(ChecksumChecker.java:236)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:225)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:77)
******************************************************
2019-11-28 16:30:45 +01:00
</code></pre><ul>
2018-02-11 17:28:23 +01:00
<li>So this would be the <code>tomcat7</code> Unix user, who seems to have a default limit of 1024 files in its shell</li>
2020-01-27 15:20:44 +01:00
<li>For what it&rsquo;s worth, we have been setting the actual Tomcat 7 process&rsquo; limit to 16384 for a few years (in <code>/etc/default/tomcat7</code>)</li>
2018-02-11 17:28:23 +01:00
<li>Looks like cron will read limits from <code>/etc/security/limits.*</code> so we can do something for the tomcat7 user there</li>
<li>Submit pull request for Tomcat 7 limits in Ansible dspace role (<a href="https://github.com/ilri/rmg-ansible-public/pull/30">#30</a>)</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-05">2016-04-05</h2>
2018-02-11 17:28:23 +01:00
<ul>
2020-01-27 15:20:44 +01:00
<li>Reduce Amazon S3 storage used for logs from 46 GB to 6GB by deleting a bunch of logs we don&rsquo;t need!</li>
2019-11-28 16:30:45 +01:00
</ul>
2018-02-11 17:28:23 +01:00
<pre><code># s3cmd ls s3://cgspace.cgiar.org/log/ &gt; /tmp/s3-logs.txt
# grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep cocoon.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep handle-plugin.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep solr.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>Also, adjust the cron jobs for backups so they only backup <code>dspace.log</code> and some stats files (.dat)</li>
<li>Try to do some metadata field migrations using the Atmire batch UI (<code>dc.Species</code> → <code>cg.species</code>) but it took several hours and even missed a few records</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-06">2016-04-06</h2>
2018-02-11 17:28:23 +01:00
<ul>
2019-11-28 16:30:45 +01:00
<li>A better way to move metadata on this scale is via SQL, for example <code>dc.type.output</code> → <code>dc.type</code> (their IDs in the metadatafieldregistry are 66 and 109, respectively):</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>dspacetest=# update metadatavalue set metadata_field_id=109 where metadata_field_id=66;
UPDATE 40852
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>After that an <code>index-discovery -bf</code> is required</li>
<li>Start working on metadata migrations, add 25 or so new metadata fields to CGSpace</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-07">2016-04-07</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Write shell script to do the migration of fields: <a href="https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b">https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b</a></li>
2019-11-28 16:30:45 +01:00
<li>Testing with a few fields it seems to work well:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40883
UPDATE metadatavalue SET metadata_field_id=202 WHERE metadata_field_id=72
UPDATE 21420
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51258
2019-12-17 13:49:24 +01:00
</code></pre><h2 id="2016-04-08">2016-04-08</h2>
2018-02-11 17:28:23 +01:00
<ul>
2020-01-27 15:20:44 +01:00
<li>Discuss metadata renaming with Abenet, we decided it&rsquo;s better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF</li>
<li>I&rsquo;ve e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-10">2016-04-10</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Looking at the DOI issue <a href="https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=678507860">reported by Leroy from CIAT a few weeks ago</a></li>
2019-11-28 16:30:45 +01:00
<li>It seems the <code>dx.doi.org</code> URLs are much more proper in our repository!</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://dx.doi.org%';
2019-11-28 16:30:45 +01:00
count
2018-02-11 17:28:23 +01:00
-------
2019-11-28 16:30:45 +01:00
5638
2018-02-11 17:28:23 +01:00
(1 row)
dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://doi.org%';
2019-11-28 16:30:45 +01:00
count
2018-02-11 17:28:23 +01:00
-------
2019-11-28 16:30:45 +01:00
3
</code></pre><ul>
<li>I will manually edit the <code>dc.identifier.doi</code> in <a href="https://cgspace.cgiar.org/handle/10568/72509?show=full">10568/72509</a> and tweet the link, then check back in a week to see if the donut gets updated</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-11">2016-04-11</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>The donut is already updated and shows the correct number now</li>
2020-01-27 15:20:44 +01:00
<li>CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we&rsquo;d do it tentatively on Monday the 18th.</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-12">2016-04-12</h2>
2018-02-11 17:28:23 +01:00
<ul>
2019-11-28 16:30:45 +01:00
<li>Looking at quality of WLE data (<code>cg.subject.iwmi</code>) in SQL:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>dspacetest=# select text_value, count(*) from metadatavalue where metadata_field_id=217 group by text_value order by count(*) desc;
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>Listings and Reports is still not returning reliable data for <code>dc.type</code></li>
2020-01-27 15:20:44 +01:00
<li>I think we need to ask Atmire, as their documentation isn&rsquo;t too clear on the format of the filter configs</li>
2019-11-28 16:30:45 +01:00
<li>Alternatively, I want to see if I move all the data from <code>dc.type.output</code> to <code>dc.type</code> and then re-index, if it behaves better</li>
<li>Looking at our <code>input-forms.xml</code> I see we have two sets of ILRI subjects, but one has a few extra subjects</li>
<li>Remove one set of ILRI subjects and remove duplicate <code>VALUE CHAINS</code> from existing list (<a href="https://github.com/ilri/DSpace/pull/216">#216</a>)</li>
<li>I decided to keep the set of subjects that had <code>FMD</code> and <code>RANGELANDS</code> added, as it appears to have been requested to have been added, and might be the newer list</li>
<li>I found 226 blank metadatavalues:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>dspacetest# select * from metadatavalue where resource_type_id=2 and text_value='';
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>I think we should delete them and do a full re-index:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 226
2019-11-28 16:30:45 +01:00
</code></pre><ul>
2020-01-27 15:20:44 +01:00
<li>I deleted them on CGSpace but I&rsquo;ll wait to do the re-index as we&rsquo;re going to be doing one in a few days for the metadata changes anyways</li>
2019-11-28 16:30:45 +01:00
<li>In other news, moving the <code>dc.type.output</code> to <code>dc.type</code> and re-indexing seems to have fixed the Listings and Reports issue from above</li>
2020-01-27 15:20:44 +01:00
<li>Unfortunately this isn&rsquo;t a very good solution, because Listings and Reports config should allow us to filter on <code>dc.type.*</code> but the documentation isn&rsquo;t very clear and I couldn&rsquo;t reach Atmire today</li>
2019-11-28 16:30:45 +01:00
<li>We want to do the <code>dc.type.output</code> move on CGSpace anyways, but we should wait as it might affect other external people!</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-14">2016-04-14</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Communicate with Macaroni Bros again about <code>dc.type</code></li>
<li>Help Sisay with some rsync and Linux stuff</li>
<li>Notify CIAT people of metadata changes (I had forgotten them last week)</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-15">2016-04-15</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>DSpace Test had crashed, so I ran all system updates, rebooted, and re-deployed DSpace code</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-18">2016-04-18</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Talk to CIAT people about their portal again</li>
<li>Start looking more at the fields we want to delete</li>
<li>The following metadata fields have 0 items using them, so we can just remove them from the registry and any references in XMLUI, input forms, etc:
<ul>
<li>dc.description.abstractother</li>
<li>dc.whatwasknown</li>
<li>dc.whatisnew</li>
<li>dc.description.nationalpartners</li>
<li>dc.peerreviewprocess</li>
<li>cg.species.animal</li>
2019-11-28 16:30:45 +01:00
</ul>
</li>
2018-02-11 17:28:23 +01:00
<li>Deleted!</li>
<li>The following fields have some items using them and I have to decide what to do with them (delete or move):
<ul>
<li>dc.icsubject.icrafsubject: 6 items, mostly in CPWF collections</li>
<li>dc.type.journal: 11 items, mostly in ILRI collections</li>
<li>dc.publicationcategory: 1 item, in CPWF</li>
<li>dc.GRP: 2 items, CPWF</li>
<li>dc.Species.animal: 6 items, in ILRI and AnGR</li>
<li>cg.livestock.agegroup: 9 items, in ILRI collections</li>
<li>cg.livestock.function: 20 items, mostly in EADD</li>
2019-11-28 16:30:45 +01:00
</ul>
</li>
<li>Test metadata migration on local instance again:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40885
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51330
UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
UPDATE 5986
UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
UPDATE 2456
UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
UPDATE 3872
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075
$ JAVA_OPTS=&quot;-Xms512m -Xmx512m -Dfile.encoding=UTF-8&quot; ~/dspace/bin/dspace index-discovery -bf
2019-11-28 16:30:45 +01:00
</code></pre><ul>
2020-01-27 15:20:44 +01:00
<li>CGSpace was down but I&rsquo;m not sure why, this was in <code>catalina.out</code>:</li>
2019-11-28 16:30:45 +01:00
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>Apr 18, 2016 7:32:26 PM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
2019-11-28 16:30:45 +01:00
at org.dspace.rest.Resource.processFinally(Resource.java:163)
at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
2018-02-11 17:28:23 +01:00
...
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)</li>
<li>After restarting Tomcat a few more of these errors were logged but the application was up</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-19">2016-04-19</h2>
2018-02-11 17:28:23 +01:00
<ul>
2019-11-28 16:30:45 +01:00
<li>Get handles for items that are using a given metadata field, ie <code>dc.Species.animal</code> (105):</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code># select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=105);
2019-11-28 16:30:45 +01:00
handle
2018-02-11 17:28:23 +01:00
-------------
2019-11-28 16:30:45 +01:00
10568/10298
10568/16413
10568/16774
10568/34487
</code></pre><ul>
<li>Delete metadata values for <code>dc.GRP</code> and <code>dc.icsubject.icrafsubject</code>:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code># delete from metadatavalue where resource_type_id=2 and metadata_field_id=96;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=83;
2019-11-28 16:30:45 +01:00
</code></pre><ul>
2020-01-27 15:20:44 +01:00
<li>They are old ICRAF fields and we haven&rsquo;t used them since 2011 or so</li>
2019-11-28 16:30:45 +01:00
<li>Also delete them from the metadata registry</li>
<li>CGSpace went down again, <code>dspace.log</code> had this:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>2016-04-19 15:02:17,025 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
2019-11-28 16:30:45 +01:00
</code></pre><ul>
2020-01-27 15:20:44 +01:00
<li>I restarted Tomcat and PostgreSQL and now it&rsquo;s back up</li>
2019-11-28 16:30:45 +01:00
<li>I bet this is the same crash as yesterday, but I only saw the errors in <code>catalina.out</code></li>
<li>Looks to be related to this, from <code>dspace.log</code>:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>2016-04-19 15:16:34,670 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>We have 18,000 of these errors right now&hellip;</li>
<li>Delete a few more old metadata values: <code>dc.Species.animal</code>, <code>dc.type.journal</code>, and <code>dc.publicationcategory</code>:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code># delete from metadatavalue where resource_type_id=2 and metadata_field_id=105;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=85;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=95;
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>And then remove them from the metadata registry</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-20">2016-04-20</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server</li>
<li>Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server</li>
2019-11-28 16:30:45 +01:00
<li>Field migration went well:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40909
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51419
UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
UPDATE 5986
UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
UPDATE 2458
UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
UPDATE 3872
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>Also, I migrated CGSpace to using the PGDG PostgreSQL repo as the infrastructure playbooks had been using it for a while and it seemed to be working well</li>
<li>Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)</li>
<li>Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>$ grep -c &quot;Aborting context in finally statement&quot; dspace.log.2016-04-20
21252
2019-11-28 16:30:45 +01:00
</code></pre><ul>
2020-01-27 15:20:44 +01:00
<li>I found a recent discussion on the DSpace mailing list and I&rsquo;ve asked for advice there</li>
<li>Looks like this issue was noted and fixed in DSpace 5.5 (we&rsquo;re on 5.1): <a href="https://jira.duraspace.org/browse/DS-2936">https://jira.duraspace.org/browse/DS-2936</a></li>
<li>I&rsquo;ve sent a message to Atmire asking about compatibility with DSpace 5.5</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-21">2016-04-21</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)</li>
2020-01-27 15:20:44 +01:00
<li>Atmire responded with DSpace 5.5 compatible versions for their modules, so I&rsquo;ll start testing those in a few weeks</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-22">2016-04-22</h2>
2018-02-11 17:28:23 +01:00
<ul>
2020-01-27 15:20:44 +01:00
<li>Import 95 records into <a href="https://cgspace.cgiar.org/handle/10568/42219">CTA&rsquo;s Agrodok collection</a></li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-26">2016-04-26</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>Test embargo during item upload</li>
<li>Seems to be working but the help text is misleading as to the date format</li>
2020-01-27 15:20:44 +01:00
<li>It turns out the <code>robots.txt</code> issue we thought we solved last month isn&rsquo;t solved because you can&rsquo;t use wildcards in URL patterns: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
2018-02-11 17:28:23 +01:00
<li>Write some nginx rules to add <code>X-Robots-Tag</code> HTTP headers to the dynamic requests from <code>robots.txt</code> instead</li>
<li>A few URLs to test with:
<ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/440/browse?type=bioversity">https://dspacetest.cgiar.org/handle/10568/440/browse?type=bioversity</a></li>
<li><a href="https://dspacetest.cgiar.org/handle/10568/913/discover">https://dspacetest.cgiar.org/handle/10568/913/discover</a></li>
<li><a href="https://dspacetest.cgiar.org/handle/10568/1/search-filter?filtertype_0=country&amp;filter_0=VIETNAM&amp;filter_relational_operator_0=equals&amp;field=country">https://dspacetest.cgiar.org/handle/10568/1/search-filter?filtertype_0=country&amp;filter_0=VIETNAM&amp;filter_relational_operator_0=equals&amp;field=country</a></li>
</ul>
2019-11-28 16:30:45 +01:00
</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-27">2016-04-27</h2>
2018-02-11 17:28:23 +01:00
<ul>
<li>I woke up to ten or fifteen &ldquo;up&rdquo; and &ldquo;down&rdquo; emails from the monitoring website</li>
<li>Looks like the last one was &ldquo;down&rdquo; from about four hours ago</li>
2019-11-28 16:30:45 +01:00
<li>I think there must be something with this REST stuff:</li>
</ul>
2018-02-11 17:28:23 +01:00
<pre><code># grep -c &quot;Aborting context in finally statement&quot; dspace.log.2016-04-*
dspace.log.2016-04-01:0
dspace.log.2016-04-02:0
dspace.log.2016-04-03:0
dspace.log.2016-04-04:0
dspace.log.2016-04-05:0
dspace.log.2016-04-06:0
dspace.log.2016-04-07:0
dspace.log.2016-04-08:0
dspace.log.2016-04-09:0
dspace.log.2016-04-10:0
dspace.log.2016-04-11:0
dspace.log.2016-04-12:235
dspace.log.2016-04-13:44
dspace.log.2016-04-14:0
dspace.log.2016-04-15:35
dspace.log.2016-04-16:0
dspace.log.2016-04-17:0
dspace.log.2016-04-18:11942
dspace.log.2016-04-19:28496
dspace.log.2016-04-20:28474
dspace.log.2016-04-21:28654
dspace.log.2016-04-22:28763
dspace.log.2016-04-23:28773
dspace.log.2016-04-24:28775
dspace.log.2016-04-25:28626
dspace.log.2016-04-26:28655
dspace.log.2016-04-27:7271
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>I restarted tomcat and it is back up</li>
<li>Add Spanish XMLUI strings so those users see &ldquo;CGSpace&rdquo; instead of &ldquo;DSpace&rdquo; in the user interface (<a href="https://github.com/ilri/DSpace/pull/222">#222</a>)</li>
<li>Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: <a href="https://jira.duraspace.org/browse/DS-3172">https://jira.duraspace.org/browse/DS-3172</a></li>
<li>Update infrastructure playbooks for nginx 1.10.x (stable) release: <a href="https://github.com/ilri/rmg-ansible-public/issues/32">https://github.com/ilri/rmg-ansible-public/issues/32</a></li>
2020-01-27 15:20:44 +01:00
<li>Currently running on DSpace Test, we&rsquo;ll give it a few days before we adjust CGSpace</li>
<li>CGSpace down, restarted tomcat and it&rsquo;s back up</li>
2018-02-11 17:28:23 +01:00
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-28">2016-04-28</h2>
2018-02-11 17:28:23 +01:00
<ul>
2020-01-27 15:20:44 +01:00
<li>Problems with stability again. I&rsquo;ve blocked access to <code>/rest</code> for now to see if the number of errors in the log files drop</li>
2018-02-11 17:28:23 +01:00
<li>Later we could maybe start logging access to <code>/rest</code> and perhaps whitelist some IPs&hellip;</li>
</ul>
2019-12-17 13:49:24 +01:00
<h2 id="2016-04-30">2016-04-30</h2>
2018-02-11 17:28:23 +01:00
<ul>
2020-01-27 15:20:44 +01:00
<li>Logs for today and yesterday have zero references to this REST error, so I&rsquo;m going to open back up the REST API but log all requests</li>
2019-11-28 16:30:45 +01:00
</ul>
2018-02-11 17:28:23 +01:00
<pre><code>location /rest {
access_log /var/log/nginx/rest.log;
proxy_pass http://127.0.0.1:8443;
}
2019-11-28 16:30:45 +01:00
</code></pre><ul>
<li>I will check the logs again in a few days to look for patterns, see who is accessing it, etc</li>
2018-02-11 17:28:23 +01:00
</ul>
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
2020-06-02 14:12:32 +02:00
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
2020-05-02 09:08:14 +02:00
2020-06-02 14:12:32 +02:00
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
2020-06-01 16:08:25 +02:00
2020-04-02 09:54:46 +02:00
<li><a href="/cgspace-notes/2020-04/">April, 2020</a></li>
2020-03-02 11:38:10 +01:00
<li><a href="/cgspace-notes/2020-03/">March, 2020</a></li>
2020-02-02 16:15:48 +01:00
<li><a href="/cgspace-notes/2020-02/">February, 2020</a></li>
2018-02-11 17:28:23 +01:00
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
2018-02-11 17:28:23 +01:00
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>