mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-22 19:43:24 +01:00
985 lines
47 KiB
HTML
985 lines
47 KiB
HTML
<!DOCTYPE html>
|
|
<html lang="en">
|
|
|
|
<head>
|
|
<meta charset="utf-8">
|
|
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
|
|
|
|
<meta property="og:title" content="January, 2018" />
|
|
<meta property="og:description" content="2018-01-02
|
|
|
|
|
|
Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time
|
|
I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary
|
|
The nginx logs show HTTP 200s until 02/Jan/2018:11:27:17 +0000 when Uptime Robot got an HTTP 500
|
|
In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”
|
|
And just before that I see this:
|
|
|
|
|
|
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
|
|
|
|
|
|
|
|
Ah hah! So the pool was actually empty!
|
|
I need to increase that, let’s try to bump it up from 50 to 75
|
|
After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw
|
|
I notice this error quite a few times in dspace.log:
|
|
|
|
|
|
2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
|
|
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
|
|
|
|
|
|
|
|
And there are many of these errors every day for the past month:
|
|
|
|
|
|
$ grep -c "Error while searching for sidebar facets" dspace.log.*
|
|
dspace.log.2017-11-21:4
|
|
dspace.log.2017-11-22:1
|
|
dspace.log.2017-11-23:4
|
|
dspace.log.2017-11-24:11
|
|
dspace.log.2017-11-25:0
|
|
dspace.log.2017-11-26:1
|
|
dspace.log.2017-11-27:7
|
|
dspace.log.2017-11-28:21
|
|
dspace.log.2017-11-29:31
|
|
dspace.log.2017-11-30:15
|
|
dspace.log.2017-12-01:15
|
|
dspace.log.2017-12-02:20
|
|
dspace.log.2017-12-03:38
|
|
dspace.log.2017-12-04:65
|
|
dspace.log.2017-12-05:43
|
|
dspace.log.2017-12-06:72
|
|
dspace.log.2017-12-07:27
|
|
dspace.log.2017-12-08:15
|
|
dspace.log.2017-12-09:29
|
|
dspace.log.2017-12-10:35
|
|
dspace.log.2017-12-11:20
|
|
dspace.log.2017-12-12:44
|
|
dspace.log.2017-12-13:36
|
|
dspace.log.2017-12-14:59
|
|
dspace.log.2017-12-15:104
|
|
dspace.log.2017-12-16:53
|
|
dspace.log.2017-12-17:66
|
|
dspace.log.2017-12-18:83
|
|
dspace.log.2017-12-19:101
|
|
dspace.log.2017-12-20:74
|
|
dspace.log.2017-12-21:55
|
|
dspace.log.2017-12-22:66
|
|
dspace.log.2017-12-23:50
|
|
dspace.log.2017-12-24:85
|
|
dspace.log.2017-12-25:62
|
|
dspace.log.2017-12-26:49
|
|
dspace.log.2017-12-27:30
|
|
dspace.log.2017-12-28:54
|
|
dspace.log.2017-12-29:68
|
|
dspace.log.2017-12-30:89
|
|
dspace.log.2017-12-31:53
|
|
dspace.log.2018-01-01:45
|
|
dspace.log.2018-01-02:34
|
|
|
|
|
|
|
|
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains
|
|
|
|
|
|
" />
|
|
<meta property="og:type" content="article" />
|
|
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-01/" />
|
|
|
|
|
|
|
|
<meta property="article:published_time" content="2018-01-02T08:35:54-08:00"/>
|
|
|
|
<meta property="article:modified_time" content="2018-01-12T07:55:01+02:00"/>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<meta name="twitter:card" content="summary"/><meta name="twitter:title" content="January, 2018"/>
|
|
<meta name="twitter:description" content="2018-01-02
|
|
|
|
|
|
Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time
|
|
I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary
|
|
The nginx logs show HTTP 200s until 02/Jan/2018:11:27:17 +0000 when Uptime Robot got an HTTP 500
|
|
In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”
|
|
And just before that I see this:
|
|
|
|
|
|
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
|
|
|
|
|
|
|
|
Ah hah! So the pool was actually empty!
|
|
I need to increase that, let’s try to bump it up from 50 to 75
|
|
After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw
|
|
I notice this error quite a few times in dspace.log:
|
|
|
|
|
|
2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
|
|
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
|
|
|
|
|
|
|
|
And there are many of these errors every day for the past month:
|
|
|
|
|
|
$ grep -c "Error while searching for sidebar facets" dspace.log.*
|
|
dspace.log.2017-11-21:4
|
|
dspace.log.2017-11-22:1
|
|
dspace.log.2017-11-23:4
|
|
dspace.log.2017-11-24:11
|
|
dspace.log.2017-11-25:0
|
|
dspace.log.2017-11-26:1
|
|
dspace.log.2017-11-27:7
|
|
dspace.log.2017-11-28:21
|
|
dspace.log.2017-11-29:31
|
|
dspace.log.2017-11-30:15
|
|
dspace.log.2017-12-01:15
|
|
dspace.log.2017-12-02:20
|
|
dspace.log.2017-12-03:38
|
|
dspace.log.2017-12-04:65
|
|
dspace.log.2017-12-05:43
|
|
dspace.log.2017-12-06:72
|
|
dspace.log.2017-12-07:27
|
|
dspace.log.2017-12-08:15
|
|
dspace.log.2017-12-09:29
|
|
dspace.log.2017-12-10:35
|
|
dspace.log.2017-12-11:20
|
|
dspace.log.2017-12-12:44
|
|
dspace.log.2017-12-13:36
|
|
dspace.log.2017-12-14:59
|
|
dspace.log.2017-12-15:104
|
|
dspace.log.2017-12-16:53
|
|
dspace.log.2017-12-17:66
|
|
dspace.log.2017-12-18:83
|
|
dspace.log.2017-12-19:101
|
|
dspace.log.2017-12-20:74
|
|
dspace.log.2017-12-21:55
|
|
dspace.log.2017-12-22:66
|
|
dspace.log.2017-12-23:50
|
|
dspace.log.2017-12-24:85
|
|
dspace.log.2017-12-25:62
|
|
dspace.log.2017-12-26:49
|
|
dspace.log.2017-12-27:30
|
|
dspace.log.2017-12-28:54
|
|
dspace.log.2017-12-29:68
|
|
dspace.log.2017-12-30:89
|
|
dspace.log.2017-12-31:53
|
|
dspace.log.2018-01-01:45
|
|
dspace.log.2018-01-02:34
|
|
|
|
|
|
|
|
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains
|
|
|
|
|
|
"/>
|
|
<meta name="generator" content="Hugo 0.32.4" />
|
|
|
|
|
|
|
|
<script type="application/ld+json">
|
|
{
|
|
"@context": "http://schema.org",
|
|
"@type": "BlogPosting",
|
|
"headline": "January, 2018",
|
|
"url": "https://alanorth.github.io/cgspace-notes/2018-01/",
|
|
"wordCount": "2965",
|
|
"datePublished": "2018-01-02T08:35:54-08:00",
|
|
"dateModified": "2018-01-12T07:55:01+02:00",
|
|
"author": {
|
|
"@type": "Person",
|
|
"name": "Alan Orth"
|
|
},
|
|
"keywords": "Notes"
|
|
}
|
|
</script>
|
|
|
|
|
|
|
|
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2018-01/">
|
|
|
|
<title>January, 2018 | CGSpace Notes</title>
|
|
|
|
<!-- combined, minified CSS -->
|
|
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-UajcgyCkTRMsk9Cp3QtZlbZhbw92bDlC8dFxE4Y8+qJ1OFZeNETla8sXTn83xibY" crossorigin="anonymous">
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
</head>
|
|
|
|
<body>
|
|
|
|
|
|
<div class="blog-masthead">
|
|
<div class="container">
|
|
<nav class="nav blog-nav">
|
|
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
|
|
|
|
|
|
</nav>
|
|
</div>
|
|
</div>
|
|
|
|
|
|
|
|
<header class="blog-header">
|
|
<div class="container">
|
|
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
|
|
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
|
|
</div>
|
|
</header>
|
|
|
|
|
|
|
|
<div class="container">
|
|
<div class="row">
|
|
<div class="col-sm-8 blog-main">
|
|
|
|
|
|
|
|
|
|
<article class="blog-post">
|
|
<header>
|
|
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-01/">January, 2018</a></h2>
|
|
<p class="blog-post-meta"><time datetime="2018-01-02T08:35:54-08:00">Tue Jan 02, 2018</time> by Alan Orth in
|
|
|
|
<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
|
|
|
|
</p>
|
|
</header>
|
|
<h2 id="2018-01-02">2018-01-02</h2>
|
|
|
|
<ul>
|
|
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
|
|
<li>I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary</li>
|
|
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
|
|
<li>In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”</li>
|
|
<li>And just before that I see this:</li>
|
|
</ul>
|
|
|
|
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Ah hah! So the pool was actually empty!</li>
|
|
<li>I need to increase that, let’s try to bump it up from 50 to 75</li>
|
|
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw</li>
|
|
<li>I notice this error quite a few times in dspace.log:</li>
|
|
</ul>
|
|
|
|
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
|
|
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>And there are many of these errors every day for the past month:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
|
|
dspace.log.2017-11-21:4
|
|
dspace.log.2017-11-22:1
|
|
dspace.log.2017-11-23:4
|
|
dspace.log.2017-11-24:11
|
|
dspace.log.2017-11-25:0
|
|
dspace.log.2017-11-26:1
|
|
dspace.log.2017-11-27:7
|
|
dspace.log.2017-11-28:21
|
|
dspace.log.2017-11-29:31
|
|
dspace.log.2017-11-30:15
|
|
dspace.log.2017-12-01:15
|
|
dspace.log.2017-12-02:20
|
|
dspace.log.2017-12-03:38
|
|
dspace.log.2017-12-04:65
|
|
dspace.log.2017-12-05:43
|
|
dspace.log.2017-12-06:72
|
|
dspace.log.2017-12-07:27
|
|
dspace.log.2017-12-08:15
|
|
dspace.log.2017-12-09:29
|
|
dspace.log.2017-12-10:35
|
|
dspace.log.2017-12-11:20
|
|
dspace.log.2017-12-12:44
|
|
dspace.log.2017-12-13:36
|
|
dspace.log.2017-12-14:59
|
|
dspace.log.2017-12-15:104
|
|
dspace.log.2017-12-16:53
|
|
dspace.log.2017-12-17:66
|
|
dspace.log.2017-12-18:83
|
|
dspace.log.2017-12-19:101
|
|
dspace.log.2017-12-20:74
|
|
dspace.log.2017-12-21:55
|
|
dspace.log.2017-12-22:66
|
|
dspace.log.2017-12-23:50
|
|
dspace.log.2017-12-24:85
|
|
dspace.log.2017-12-25:62
|
|
dspace.log.2017-12-26:49
|
|
dspace.log.2017-12-27:30
|
|
dspace.log.2017-12-28:54
|
|
dspace.log.2017-12-29:68
|
|
dspace.log.2017-12-30:89
|
|
dspace.log.2017-12-31:53
|
|
dspace.log.2018-01-01:45
|
|
dspace.log.2018-01-02:34
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains</li>
|
|
</ul>
|
|
|
|
<p></p>
|
|
|
|
<h2 id="2018-01-03">2018-01-03</h2>
|
|
|
|
<ul>
|
|
<li>I woke up to more up and down of CGSpace, this time UptimeRobot noticed a few rounds of up and down of a few minutes each and Linode also notified of high CPU load from 12 to 2 PM</li>
|
|
<li>Looks like I need to increase the database pool size again:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
|
|
dspace.log.2018-01-01:0
|
|
dspace.log.2018-01-02:1972
|
|
dspace.log.2018-01-03:1909
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>For some reason there were a lot of “active” connections last night:</li>
|
|
</ul>
|
|
|
|
<p><img src="/cgspace-notes/2018/01/postgres_connections-day.png" alt="CGSpace PostgreSQL connections" /></p>
|
|
|
|
<ul>
|
|
<li>The active IPs in XMLUI are:</li>
|
|
</ul>
|
|
|
|
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
|
607 40.77.167.141
|
|
611 2a00:23c3:8c94:7800:392c:a491:e796:9c50
|
|
663 188.226.169.37
|
|
759 157.55.39.245
|
|
887 68.180.229.254
|
|
1037 157.55.39.175
|
|
1068 216.244.66.245
|
|
1495 66.249.64.91
|
|
1934 104.196.152.243
|
|
2219 134.155.96.78
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>134.155.96.78 appears to be at the University of Mannheim in Germany</li>
|
|
<li>They identify as: Mozilla/5.0 (compatible; heritrix/3.2.0 +<a href="http://ifm.uni-mannheim.de">http://ifm.uni-mannheim.de</a>)</li>
|
|
<li>This appears to be the <a href="https://github.com/internetarchive/heritrix3">Internet Archive’s open source bot</a></li>
|
|
<li>They seem to be re-using their Tomcat session so I don’t need to do anything to them just yet:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
|
|
2
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>The API logs show the normal users:</li>
|
|
</ul>
|
|
|
|
<pre><code># cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
|
32 207.46.13.182
|
|
38 40.77.167.132
|
|
38 68.180.229.254
|
|
43 66.249.64.91
|
|
46 40.77.167.141
|
|
49 157.55.39.245
|
|
79 157.55.39.175
|
|
1533 50.116.102.77
|
|
4069 70.32.83.92
|
|
9355 45.5.184.196
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>In other related news I see a sizeable amount of requests coming from python-requests</li>
|
|
<li>For example, just in the last day there were 1700!</li>
|
|
</ul>
|
|
|
|
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -c python-requests
|
|
1773
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>But they come from hundreds of IPs, many of which are 54.x.x.x:</li>
|
|
</ul>
|
|
|
|
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep python-requests | awk '{print $1}' | sort -n | uniq -c | sort -h | tail -n 30
|
|
9 54.144.87.92
|
|
9 54.146.222.143
|
|
9 54.146.249.249
|
|
9 54.158.139.206
|
|
9 54.161.235.224
|
|
9 54.163.41.19
|
|
9 54.163.4.51
|
|
9 54.196.195.107
|
|
9 54.198.89.134
|
|
9 54.80.158.113
|
|
10 54.198.171.98
|
|
10 54.224.53.185
|
|
10 54.226.55.207
|
|
10 54.227.8.195
|
|
10 54.242.234.189
|
|
10 54.242.238.209
|
|
10 54.80.100.66
|
|
11 54.161.243.121
|
|
11 54.205.154.178
|
|
11 54.234.225.84
|
|
11 54.87.23.173
|
|
11 54.90.206.30
|
|
12 54.196.127.62
|
|
12 54.224.242.208
|
|
12 54.226.199.163
|
|
13 54.162.149.249
|
|
13 54.211.182.255
|
|
19 50.17.61.150
|
|
21 54.211.119.107
|
|
139 164.39.7.62
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I have no idea what these are but they seem to be coming from Amazon…</li>
|
|
<li>I guess for now I just have to increase the database connection pool’s max active</li>
|
|
<li>It’s currently 75 and normally I’d just bump it by 25 but let me be a bit daring and push it by 50 to 125, because I used to see at least 121 connections in pg_stat_activity before when we were using the shitty default pooling</li>
|
|
</ul>
|
|
|
|
<h2 id="2018-01-04">2018-01-04</h2>
|
|
|
|
<ul>
|
|
<li>CGSpace went down and up a bunch of times last night and ILRI staff were complaining a lot last night</li>
|
|
<li>The XMLUI logs show this activity:</li>
|
|
</ul>
|
|
|
|
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "4/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
|
968 197.211.63.81
|
|
981 213.55.99.121
|
|
1039 66.249.64.93
|
|
1258 157.55.39.175
|
|
1273 207.46.13.182
|
|
1311 157.55.39.191
|
|
1319 157.55.39.197
|
|
1775 66.249.64.78
|
|
2216 104.196.152.243
|
|
3366 66.249.64.91
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Again we ran out of PostgreSQL database connections, even after bumping the pool max active limit from 50 to 75 to 125 yesterday!</li>
|
|
</ul>
|
|
|
|
<pre><code>2018-01-04 07:36:08,089 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
|
|
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-256] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:125; busy:125; idle:0; lastwait:5000].
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>So for this week that is the number one problem!</li>
|
|
</ul>
|
|
|
|
<pre><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
|
|
dspace.log.2018-01-01:0
|
|
dspace.log.2018-01-02:1972
|
|
dspace.log.2018-01-03:1909
|
|
dspace.log.2018-01-04:1559
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I will just bump the connection limit to 300 because I’m fucking fed up with this shit</li>
|
|
<li>Once I get back to Amman I will have to try to create different database pools for different web applications, like recently discussed on the dspace-tech mailing list</li>
|
|
<li>Create accounts on CGSpace for two CTA staff km4ard@cta.int and bheenick@cta.int</li>
|
|
</ul>
|
|
|
|
<h2 id="2018-01-05">2018-01-05</h2>
|
|
|
|
<ul>
|
|
<li>Peter said that CGSpac was down last night and Tsega restarted Tomcat</li>
|
|
<li>I don’t see any alerts from Linode or UptimeRobot, and there are no PostgreSQL connection errors in the dspace logs for today:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
|
|
dspace.log.2018-01-01:0
|
|
dspace.log.2018-01-02:1972
|
|
dspace.log.2018-01-03:1909
|
|
dspace.log.2018-01-04:1559
|
|
dspace.log.2018-01-05:0
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Daniel asked for help with their DAGRIS server (linode2328112) that has no disk space</li>
|
|
<li>I had a look and there is one Apache 2 log file that is 73GB, with lots of this:</li>
|
|
</ul>
|
|
|
|
<pre><code>[Fri Jan 05 09:31:22.965398 2018] [:error] [pid 9340] [client 213.55.99.121:64476] WARNING: Unable to find a match for "9-16-1-RV.doc" in "/home/files/journals/6//articles/9/". Skipping this file., referer: http://dagris.info/reviewtool/index.php/index/install/upgrade
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I will delete the log file for now and tell Danny</li>
|
|
<li>Also, I’m still seeing a hundred or so of the “ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer” errors in dspace logs, I need to search the dspace-tech mailing list to see what the cause is</li>
|
|
<li>I will run a full Discovery reindex in the mean time to see if it’s something wrong with the Discovery Solr core</li>
|
|
</ul>
|
|
|
|
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
|
|
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
|
|
|
|
real 110m43.985s
|
|
user 15m24.960s
|
|
sys 3m14.890s
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Reboot CGSpace and DSpace Test for new kernels (4.14.12-x86_64-linode92) that partially mitigate the <a href="https://blog.linode.com/2018/01/03/cpu-vulnerabilities-meltdown-spectre/">Spectre and Meltdown CPU vulnerabilities</a></li>
|
|
</ul>
|
|
|
|
<h2 id="2018-01-06">2018-01-06</h2>
|
|
|
|
<ul>
|
|
<li>I’m still seeing Solr errors in the DSpace logs even after the full reindex yesterday:</li>
|
|
</ul>
|
|
|
|
<pre><code>org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1983+TO+1989]': Encountered " "]" "] "" at line 1, column 32.
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I posted a message to the dspace-tech mailing list to see if anyone can help</li>
|
|
</ul>
|
|
|
|
<h2 id="2018-01-09">2018-01-09</h2>
|
|
|
|
<ul>
|
|
<li>Advise Sisay about blank lines in some IITA records</li>
|
|
<li>Generate a list of author affiliations for Peter to clean up:</li>
|
|
</ul>
|
|
|
|
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
|
|
COPY 4515
|
|
</code></pre>
|
|
|
|
<h2 id="2018-01-10">2018-01-10</h2>
|
|
|
|
<ul>
|
|
<li>I looked to see what happened to this year’s Solr statistics sharding task that should have run on 2018-01-01 and of course it failed:</li>
|
|
</ul>
|
|
|
|
<pre><code>Moving: 81742 into core statistics-2010
|
|
Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2010
|
|
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2010
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
|
|
at org.dspace.statistics.SolrLogger.shardSolrIndex(SourceFile:2243)
|
|
at org.dspace.statistics.util.StatisticsClient.main(StatisticsClient.java:106)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
|
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
|
at java.lang.reflect.Method.invoke(Method.java:498)
|
|
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
|
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
|
Caused by: org.apache.http.client.ClientProtocolException
|
|
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:867)
|
|
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
|
|
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
|
|
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
|
|
... 10 more
|
|
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity. The cause lists the reason the original request failed.
|
|
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:659)
|
|
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
|
|
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
|
|
... 14 more
|
|
Caused by: java.net.SocketException: Connection reset
|
|
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:115)
|
|
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
|
|
at org.apache.http.impl.io.AbstractSessionOutputBuffer.flushBuffer(AbstractSessionOutputBuffer.java:159)
|
|
at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:179)
|
|
at org.apache.http.impl.io.ChunkedOutputStream.flushCacheWithAppend(ChunkedOutputStream.java:124)
|
|
at org.apache.http.impl.io.ChunkedOutputStream.write(ChunkedOutputStream.java:181)
|
|
at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:132)
|
|
at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
|
|
at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
|
|
at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
|
|
at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
|
|
at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
|
|
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
|
|
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
|
|
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
|
|
... 16 more
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>DSpace Test has the same error but with creating the 2017 core:</li>
|
|
</ul>
|
|
|
|
<pre><code>Moving: 2243021 into core statistics-2017
|
|
Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2017
|
|
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2017
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
|
|
at org.dspace.statistics.SolrLogger.shardSolrIndex(SourceFile:2243)
|
|
at org.dspace.statistics.util.StatisticsClient.main(StatisticsClient.java:106)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
|
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
|
at java.lang.reflect.Method.invoke(Method.java:498)
|
|
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
|
|
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
|
Caused by: org.apache.http.client.ClientProtocolException
|
|
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:867)
|
|
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
|
|
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
|
|
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
|
|
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
|
|
... 10 more
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>There is interesting documentation about this on the DSpace Wiki: <a href="https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance#SOLRStatisticsMaintenance-SolrShardingByYear">https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance#SOLRStatisticsMaintenance-SolrShardingByYear</a></li>
|
|
<li>I’m looking to see maybe if we’re hitting the issues mentioned in <a href="https://jira.duraspace.org/browse/DS-2212">DS-2212</a> that were apparently fixed in DSpace 5.2</li>
|
|
<li>I can apparently search for records in the Solr stats core that have an empty <code>owningColl</code> field using this in the Solr admin query: <code>-owningColl:*</code></li>
|
|
<li>On CGSpace I see 48,000,000 records that have an <code>owningColl</code> field and 34,000,000 that don’t:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ http 'http://localhost:3000/solr/statistics/select?q=owningColl%3A*&wt=json&indent=true' | grep numFound
|
|
"response":{"numFound":48476327,"start":0,"docs":[
|
|
$ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&wt=json&indent=true' | grep numFound
|
|
"response":{"numFound":34879872,"start":0,"docs":[
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I tested the <code>dspace stats-util -s</code> process on my local machine and it failed the same way</li>
|
|
<li>It doesn’t seem to be helpful, but the dspace log shows this:</li>
|
|
</ul>
|
|
|
|
<pre><code>2018-01-10 10:51:19,301 INFO org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
|
|
2018-01-10 10:51:19,301 INFO org.dspace.statistics.SolrLogger @ Moving: 3821 records into core statistics-2016
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Terry Brady has written some notes on the DSpace Wiki about Solr sharing issues: <a href="https://wiki.duraspace.org/display/%7Eterrywbrady/Statistics+Import+Export+Issues">https://wiki.duraspace.org/display/%7Eterrywbrady/Statistics+Import+Export+Issues</a></li>
|
|
<li>Uptime Robot said that CGSpace went down at around 9:43 AM</li>
|
|
<li>I looked at PostgreSQL’s <code>pg_stat_activity</code> table and saw 161 active connections, but no pool errors in the DSpace logs:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ grep -c "Timeout: Pool empty." dspace.log.2018-01-10
|
|
0
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>The XMLUI logs show quite a bit of activity today:</li>
|
|
</ul>
|
|
|
|
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
|
951 207.46.13.159
|
|
954 157.55.39.123
|
|
1217 95.108.181.88
|
|
1503 104.196.152.243
|
|
6455 70.36.107.50
|
|
11412 70.36.107.190
|
|
16730 70.36.107.49
|
|
17386 2607:fa98:40:9:26b6:fdff:feff:1c96
|
|
21566 2607:fa98:40:9:26b6:fdff:feff:195d
|
|
45384 2607:fa98:40:9:26b6:fdff:feff:1888
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>The user agent for the top six or so IPs are all the same:</li>
|
|
</ul>
|
|
|
|
<pre><code>"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36"
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li><code>whois</code> says they come from <a href="http://www.perfectip.net/">Perfect IP</a></li>
|
|
<li>I’ve never seen those top IPs before, but they have created 50,000 Tomcat sessions today:</li>
|
|
</ul>
|
|
|
|
<pre><code>$ grep -E '(2607:fa98:40:9:26b6:fdff:feff:1888|2607:fa98:40:9:26b6:fdff:feff:195d|2607:fa98:40:9:26b6:fdff:feff:1c96|70.36.107.49|70.36.107.190|70.36.107.50)' /home/cgspace.cgiar.org/log/dspace.log.2018-01-10 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
|
|
49096
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Rather than blocking their IPs, I think I might just add their user agent to the “badbots” zone with Baidu, because they seem to be the only ones using that user agent:</li>
|
|
</ul>
|
|
|
|
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari
|
|
/537.36" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
|
6796 70.36.107.50
|
|
11870 70.36.107.190
|
|
17323 70.36.107.49
|
|
19204 2607:fa98:40:9:26b6:fdff:feff:1c96
|
|
23401 2607:fa98:40:9:26b6:fdff:feff:195d
|
|
47875 2607:fa98:40:9:26b6:fdff:feff:1888
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I added the user agent to nginx’s badbots limit req zone but upon testing the config I got an error:</li>
|
|
</ul>
|
|
|
|
<pre><code># nginx -t
|
|
nginx: [emerg] could not build map_hash, you should increase map_hash_bucket_size: 64
|
|
nginx: configuration file /etc/nginx/nginx.conf test failed
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>According to nginx docs the <a href="https://nginx.org/en/docs/hash.html">bucket size should be a multiple of the CPU’s cache alignment</a>, which is 64 for us:</li>
|
|
</ul>
|
|
|
|
<pre><code># cat /proc/cpuinfo | grep cache_alignment | head -n1
|
|
cache_alignment : 64
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>On our servers that is 64, so I increased this parameter to 128 and deployed the changes to nginx</li>
|
|
<li>Almost immediately the PostgreSQL connections dropped back down to 40 or so, and UptimeRobot said the site was back up</li>
|
|
<li>So that’s interesting that we’re not out of PostgreSQL connections (current pool maxActive is 300!) but the system is “down” to UptimeRobot and very slow to use</li>
|
|
<li>Linode continues to test mitigations for Meltdown and Spectre: <a href="https://blog.linode.com/2018/01/03/cpu-vulnerabilities-meltdown-spectre/">https://blog.linode.com/2018/01/03/cpu-vulnerabilities-meltdown-spectre/</a></li>
|
|
<li>I rebooted DSpace Test to see if the kernel will be updated (currently Linux 4.14.12-x86_64-linode92)… nope.</li>
|
|
<li>It looks like Linode will reboot the KVM hosts later this week, though</li>
|
|
<li>Udana from WLE asked if we could give him permission to upload CSVs to CGSpace (which would require super admin access)</li>
|
|
<li>Citing concerns with metadata quality, I suggested adding him on DSpace Test first</li>
|
|
<li>I opened a ticket with Atmire to ask them about DSpace 5.8 compatibility: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560</a></li>
|
|
</ul>
|
|
|
|
<h2 id="2018-01-11">2018-01-11</h2>
|
|
|
|
<ul>
|
|
<li>The PostgreSQL and firewall graphs from this week show clearly the load from the new bot from PerfectIP.net yesterday:</li>
|
|
</ul>
|
|
|
|
<p><img src="/cgspace-notes/2018/01/postgres_connections-day-perfectip.png" alt="PostgreSQL load" />
|
|
<img src="/cgspace-notes/2018/01/firewall-perfectip.png" alt="Firewall load" /></p>
|
|
|
|
<ul>
|
|
<li>Linode rebooted DSpace Test and CGSpace for their host hypervisor kernel updates</li>
|
|
<li>Following up with the Solr sharding issue on the dspace-tech mailing list, I noticed this interesting snippet in the Tomcat <code>localhost_access_log</code> at the time of my sharding attempt on my test machine:</li>
|
|
</ul>
|
|
|
|
<pre><code>127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?q=type%3A2+AND+id%3A1&wt=javabin&version=2 HTTP/1.1" 200 107
|
|
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?q=*%3A*&rows=0&facet=true&facet.range=time&facet.range.start=NOW%2FYEAR-18YEARS&facet.range.end=NOW%2FYEAR%2B0YEARS&facet.range.gap=%2B1YEAR&facet.mincount=1&wt=javabin&version=2 HTTP/1.1" 200 447
|
|
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/admin/cores?action=STATUS&core=statistics-2016&indexInfo=true&wt=javabin&version=2 HTTP/1.1" 200 76
|
|
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/admin/cores?action=CREATE&name=statistics-2016&instanceDir=statistics&dataDir=%2FUsers%2Faorth%2Fdspace%2Fsolr%2Fstatistics-2016%2Fdata&wt=javabin&version=2 HTTP/1.1" 200 63
|
|
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?csv.mv.separator=%7C&q=*%3A*&fq=time%3A%28%5B2016%5C-01%5C-01T00%5C%3A00%5C%3A00Z+TO+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%5D+NOT+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%29&rows=10000&wt=csv HTTP/1.1" 200 2137630
|
|
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/admin/luke?show=schema&wt=javabin&version=2 HTTP/1.1" 200 16253
|
|
127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "POST /solr//statistics-2016/update/csv?commit=true&softCommit=false&waitSearcher=true&f.previousWorkflowStep.split=true&f.previousWorkflowStep.separator=%7C&f.previousWorkflowStep.encapsulator=%22&f.actingGroupId.split=true&f.actingGroupId.separator=%7C&f.actingGroupId.encapsulator=%22&f.containerCommunity.split=true&f.containerCommunity.separator=%7C&f.containerCommunity.encapsulator=%22&f.range.split=true&f.range.separator=%7C&f.range.encapsulator=%22&f.containerItem.split=true&f.containerItem.separator=%7C&f.containerItem.encapsulator=%22&f.p_communities_map.split=true&f.p_communities_map.separator=%7C&f.p_communities_map.encapsulator=%22&f.ngram_query_search.split=true&f.ngram_query_search.separator=%7C&f.ngram_query_search.encapsulator=%22&f.containerBitstream.split=true&f.containerBitstream.separator=%7C&f.containerBitstream.encapsulator=%22&f.owningItem.split=true&f.owningItem.separator=%7C&f.owningItem.encapsulator=%22&f.actingGroupParentId.split=true&f.actingGroupParentId.separator=%7C&f.actingGroupParentId.encapsulator=%22&f.text.split=true&f.text.separator=%7C&f.text.encapsulator=%22&f.simple_query_search.split=true&f.simple_query_search.separator=%7C&f.simple_query_search.encapsulator=%22&f.owningComm.split=true&f.owningComm.separator=%7C&f.owningComm.encapsulator=%22&f.owner.split=true&f.owner.separator=%7C&f.owner.encapsulator=%22&f.filterquery.split=true&f.filterquery.separator=%7C&f.filterquery.encapsulator=%22&f.p_group_map.split=true&f.p_group_map.separator=%7C&f.p_group_map.encapsulator=%22&f.actorMemberGroupId.split=true&f.actorMemberGroupId.separator=%7C&f.actorMemberGroupId.encapsulator=%22&f.bitstreamId.split=true&f.bitstreamId.separator=%7C&f.bitstreamId.encapsulator=%22&f.group_name.split=true&f.group_name.separator=%7C&f.group_name.encapsulator=%22&f.p_communities_name.split=true&f.p_communities_name.separator=%7C&f.p_communities_name.encapsulator=%22&f.query.split=true&f.query.separator=%7C&f.query.encapsulator=%22&f.workflowStep.split=true&f.workflowStep.separator=%7C&f.workflowStep.encapsulator=%22&f.containerCollection.split=true&f.containerCollection.separator=%7C&f.containerCollection.encapsulator=%22&f.complete_query_search.split=true&f.complete_query_search.separator=%7C&f.complete_query_search.encapsulator=%22&f.p_communities_id.split=true&f.p_communities_id.separator=%7C&f.p_communities_id.encapsulator=%22&f.rangeDescription.split=true&f.rangeDescription.separator=%7C&f.rangeDescription.encapsulator=%22&f.group_id.split=true&f.group_id.separator=%7C&f.group_id.encapsulator=%22&f.bundleName.split=true&f.bundleName.separator=%7C&f.bundleName.encapsulator=%22&f.ngram_simplequery_search.split=true&f.ngram_simplequery_search.separator=%7C&f.ngram_simplequery_search.encapsulator=%22&f.group_map.split=true&f.group_map.separator=%7C&f.group_map.encapsulator=%22&f.owningColl.split=true&f.owningColl.separator=%7C&f.owningColl.encapsulator=%22&f.p_group_id.split=true&f.p_group_id.separator=%7C&f.p_group_id.encapsulator=%22&f.p_group_name.split=true&f.p_group_name.separator=%7C&f.p_group_name.encapsulator=%22&wt=javabin&version=2 HTTP/1.1" 409 156
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>The new core is created but when DSpace attempts to POST to it there is an HTTP 409 error</li>
|
|
<li>This is apparently a common Solr error code that means “version conflict”: <a href="http://yonik.com/solr/optimistic-concurrency/">http://yonik.com/solr/optimistic-concurrency/</a></li>
|
|
<li>Looks like that bot from the PerfectIP.net host ended up making about 450,000 requests to XMLUI alone yesterday:</li>
|
|
</ul>
|
|
|
|
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
|
|
21572 70.36.107.50
|
|
30722 70.36.107.190
|
|
34566 70.36.107.49
|
|
101829 2607:fa98:40:9:26b6:fdff:feff:195d
|
|
111535 2607:fa98:40:9:26b6:fdff:feff:1c96
|
|
161797 2607:fa98:40:9:26b6:fdff:feff:1888
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Wow, I just figured out how to set the application name of each database pool in the JNDI config of Tomcat’s <code>server.xml</code>:</li>
|
|
</ul>
|
|
|
|
<pre><code><Resource name="jdbc/dspaceWeb" auth="Container" type="javax.sql.DataSource"
|
|
driverClassName="org.postgresql.Driver"
|
|
url="jdbc:postgresql://localhost:5432/dspacetest?ApplicationName=dspaceWeb"
|
|
username="dspace"
|
|
password="dspace"
|
|
initialSize='5'
|
|
maxActive='75'
|
|
maxIdle='15'
|
|
minIdle='5'
|
|
maxWait='5000'
|
|
validationQuery='SELECT 1'
|
|
testOnBorrow='true' />
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>So theoretically I could name each connection “xmlui” or “dspaceWeb” or something meaningful and it would show up in PostgreSQL’s <code>pg_stat_activity</code> table!</li>
|
|
<li>This would be super helpful for figuring out where load was coming from (now I wonder if I could figure out how to graph this)</li>
|
|
<li>Also, I realized that the <code>db.jndi</code> parameter in dspace.cfg needs to match the <code>name</code> value in your applicaiton’s context—not the <code>global</code> one</li>
|
|
<li>Ah hah! Also, I can name the default DSpace connection pool in dspace.cfg as well, like:</li>
|
|
</ul>
|
|
|
|
<pre><code>db.url = jdbc:postgresql://localhost:5432/dspacetest?ApplicationName=dspaceDefault
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>With that it is super easy to see where PostgreSQL connections are coming from in <code>pg_stat_activity</code></li>
|
|
</ul>
|
|
|
|
<h2 id="2018-01-12">2018-01-12</h2>
|
|
|
|
<ul>
|
|
<li>I’m looking at the <a href="https://wiki.duraspace.org/display/DSDOC6x/Installing+DSpace#InstallingDSpace-ServletEngine(ApacheTomcat7orlater,Jetty,CauchoResinorequivalent)">DSpace 6.0 Install docs</a> and notice they tweak the number of threads in their Tomcat connector:</li>
|
|
</ul>
|
|
|
|
<pre><code><!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
|
|
<Connector port="8080"
|
|
maxThreads="150"
|
|
minSpareThreads="25"
|
|
maxSpareThreads="75"
|
|
enableLookups="false"
|
|
redirectPort="8443"
|
|
acceptCount="100"
|
|
connectionTimeout="20000"
|
|
disableUploadTimeout="true"
|
|
URIEncoding="UTF-8"/>
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>In Tomcat 8.5 the <code>maxThreads</code> defaults to 200 which is probably fine, but tweaking <code>minSpareThreads</code> could be good</li>
|
|
<li>I don’t see a setting for <code>maxSpareThreads</code> in the docs so that might be an error</li>
|
|
<li>Looks like in Tomcat 8.5 the default URIEncoding for Connectors is UTF-8, so we don’t need to specify that manually anymore: <a href="https://tomcat.apache.org/tomcat-8.5-doc/config/http.html">https://tomcat.apache.org/tomcat-8.5-doc/config/http.html</a></li>
|
|
<li></li>
|
|
<li>Ooh, I just the <code>acceptorThreadCount</code> setting (in Tomcat 7 and 8.5):</li>
|
|
</ul>
|
|
|
|
<pre><code>The number of threads to be used to accept connections. Increase this value on a multi CPU machine, although you would never really need more than 2. Also, with a lot of non keep alive connections, you might want to increase this value as well. Default value is 1.
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>That could be very interesting</li>
|
|
</ul>
|
|
|
|
<h2 id="2018-01-13">2018-01-13</h2>
|
|
|
|
<ul>
|
|
<li>Still testing DSpace 6.2 on Tomcat 8.5.24</li>
|
|
<li>Catalina errors at Tomcat 8.5 startup:</li>
|
|
</ul>
|
|
|
|
<pre><code>13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxActive is not used in DBCP2, use maxTotal instead. maxTotal default value is 8. You have set value of "35" for "maxActive" property, which is being ignored.
|
|
13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxWait is not used in DBCP2 , use maxWaitMillis instead. maxWaitMillis default value is -1. You have set value of "5000" for "maxWait" property, which is being ignored.
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>I looked in my Tomcat 7.0.82 logs and I don’t see anything about DBCP2 errors, so I guess this a Tomcat 8.0.x or 8.5.x thing</li>
|
|
<li>DBCP2 appears to be Tomcat 8.0.x and up according to the <a href="https://tomcat.apache.org/migration-8.html">Tomcat 8.0 migration guide</a></li>
|
|
<li>I have updated our <a href="https://github.com/ilri/rmg-ansible-public/commit/246f9d7b06d53794f189f0cc57ad5ddd80f0b014">Ansible infrastructure scripts</a> so that it will be ready whenever we switch to Tomcat 8 (probably with Ubuntu 18.04 later this year)</li>
|
|
<li>When I enable the ResourceLink in the ROOT.xml context I get the following error in the Tomcat localhost log:</li>
|
|
</ul>
|
|
|
|
<pre><code>13-Jan-2018 14:14:36.017 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.app.util.DSpaceWebappListener]
|
|
java.lang.ExceptionInInitializerError
|
|
at org.dspace.app.util.AbstractDSpaceWebapp.register(AbstractDSpaceWebapp.java:74)
|
|
at org.dspace.app.util.DSpaceWebappListener.contextInitialized(DSpaceWebappListener.java:31)
|
|
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4745)
|
|
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5207)
|
|
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
|
|
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:752)
|
|
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:728)
|
|
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)
|
|
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:629)
|
|
at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1839)
|
|
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
|
|
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
|
|
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
|
|
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
|
|
at java.lang.Thread.run(Thread.java:748)
|
|
Caused by: java.lang.NullPointerException
|
|
at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:547)
|
|
at org.dspace.core.Context.<clinit>(Context.java:103)
|
|
... 15 more
|
|
</code></pre>
|
|
|
|
<ul>
|
|
<li>Interesting blog post benchmarking Tomcat JDBC vs Apache Commons DBCP2, with configuration snippets: <a href="http://www.tugay.biz/2016/07/tomcat-connection-pool-vs-apache.html">http://www.tugay.biz/2016/07/tomcat-connection-pool-vs-apache.html</a></li>
|
|
<li>The Tomcat vs Apache pool thing is confusing, but apparently we’re using Apache Commons DBCP2 because we don’t specify <code>factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"</code> in our global resource</li>
|
|
<li>So at least I know that I’m not looking for documentation or troubleshooting on the Tomcat JDBC pool!</li>
|
|
<li>I looked at <code>pg_stat_activity</code> during Tomcat’s startup and I see that the pool created in server.xml is indeed connecting, just that nothing uses it</li>
|
|
<li>Also, the fallback connection parameters specified in local.cfg (not dspace.cfg) are used</li>
|
|
<li>Shit, this might actually be a DSpace error: <a href="https://jira.duraspace.org/browse/DS-3434">https://jira.duraspace.org/browse/DS-3434</a></li>
|
|
<li>I’ll comment on that issue</li>
|
|
</ul>
|
|
|
|
|
|
|
|
|
|
|
|
</article>
|
|
|
|
|
|
|
|
</div> <!-- /.blog-main -->
|
|
|
|
<aside class="col-sm-3 ml-auto blog-sidebar">
|
|
|
|
|
|
|
|
<section class="sidebar-module">
|
|
<h4>Recent Posts</h4>
|
|
<ol class="list-unstyled">
|
|
|
|
|
|
<li><a href="/cgspace-notes/2018-01/">January, 2018</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2017-12/">December, 2017</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2017-11/">November, 2017</a></li>
|
|
|
|
<li><a href="/cgspace-notes/2017-10/">October, 2017</a></li>
|
|
|
|
<li><a href="/cgspace-notes/cgiar-library-migration/">CGIAR Library Migration</a></li>
|
|
|
|
</ol>
|
|
</section>
|
|
|
|
|
|
|
|
|
|
<section class="sidebar-module">
|
|
<h4>Links</h4>
|
|
<ol class="list-unstyled">
|
|
|
|
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
|
|
|
|
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
|
|
|
|
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
|
|
|
|
</ol>
|
|
</section>
|
|
|
|
</aside>
|
|
|
|
|
|
</div> <!-- /.row -->
|
|
</div> <!-- /.container -->
|
|
|
|
|
|
|
|
<footer class="blog-footer">
|
|
<p>
|
|
|
|
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
|
|
|
|
</p>
|
|
<p>
|
|
<a href="#">Back to top</a>
|
|
</p>
|
|
</footer>
|
|
|
|
|
|
</body>
|
|
|
|
</html>
|