Add notes for 2020-07-12
@ -411,4 +411,47 @@ $ csvgrep -c matched -m true 2020-07-09-cgspace-sponsors-crossref.csv | wc -l
|
||||
174
|
||||
```
|
||||
|
||||
## 2020-07-12
|
||||
|
||||
- On 2020-07-10 Macaroni Bros emailed to ask if there are issues with CGSpace because they are getting HTTP 504 on the REST API
|
||||
- First, I looked in Munin and I see a high number of DSpace sessions and threads on Friday evening around midnight, though that was much later than his email:
|
||||
|
||||
![DSpace sessions](/cgspace-notes/2020/07/jmx_dspace_sessions-day.png)
|
||||
![Threads](/cgspace-notes/2020/07/threads-day.png)
|
||||
![PostgreSQL locks](/cgspace-notes/2020/07/postgres_locks_ALL-day.png)
|
||||
![PostgreSQL transactions](/cgspace-notes/2020/07/postgres_transactions_ALL-day.png)
|
||||
|
||||
- CPU load and memory were not high then, but there was some load on the database and firewall...
|
||||
- Looking in the nginx logs I see a few IPs we've seen recently, like those 199.47.x.x IPs from Turnitin (which I need to remember to purge from Solr again because I didn't update the spider agents on CGSpace yet) and a new one, 186.112.8.167
|
||||
- Also, the Turnitin bot doesn't re-use its Tomcat JSESSIONID; I see this many unique sessions from it today:
|
||||
|
||||
```
|
||||
# grep 199.47.87 dspace.log.2020-07-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
2815
|
||||
```
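- To see which clients are behind these request and session counts, here is a minimal shell sketch that tallies requests per client IP in an nginx access log and distinct session IDs per client IP in a DSpace log. It assumes nginx's default `combined` log format (client address in the first field) and DSpace log lines containing `session_id=<id>:ip_addr=<IPv4>`; both function names are hypothetical helpers, not existing tools:

```shell
#!/bin/sh
# Hypothetical helpers for quick log triage (not part of DSpace or nginx).

# Requests per client IP in an nginx access log, busiest first.
# Assumes the default "combined" log format, where $remote_addr is the first field.
requests_per_ip() {
  awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' "$1" | sort -rn
}

# Distinct Tomcat session IDs per client IP in a DSpace log, busiest first.
# Assumes lines contain "session_id=<32 chars>:ip_addr=<IPv4>", like the grep above;
# IPv6 addresses are not matched by this pattern.
sessions_per_ip() {
  grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=[0-9.]+' "$1" \
    | sort -u \
    | awk -F'ip_addr=' '{ count[$2]++ } END { for (ip in count) print count[ip], ip }' \
    | sort -rn
}
```

- For example, `sessions_per_ip dspace.log.2020-07-12 | head` would list the IPs opening the most sessions; the nginx access log path to pass to `requests_per_ip` depends on the local setup.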
|
||||
|
||||
- So I need to add this alternative user-agent to the Tomcat Crawler Session Manager valve to force it to re-use a common bot session
|
||||
- There are around 9,000 requests from `186.112.8.167` in Colombia with the user agent `Java/1.8.0_241`, but those were mostly to the REST API and I don't see any hits in Solr
|
||||
- Earlier in the day Linode had alerted that there was high outgoing bandwidth
|
||||
- I see a new bot from 134.155.96.78 that made ~10,000 requests with the following user agent, but it appears to already be in our DSpace user agent list via COUNTER-Robots:
|
||||
|
||||
```
|
||||
Mozilla/5.0 (compatible; heritrix/3.4.0-SNAPSHOT-2019-02-07T13:53:20Z +http://ifm.uni-mannheim.de)
|
||||
```
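- To double-check whether a user agent is already covered, here is a hedged shell sketch of the two kinds of matching involved. Tomcat's Crawler Session Manager Valve compiles its `crawlerUserAgents` attribute as a Java regex that must match the whole user-agent string (the value below is, as far as I know, Tomcat's default); COUNTER-Robots-style lists are collections of patterns that I treat here as case-insensitive matches anywhere in the string. The functions and the patterns file are hypothetical, and `grep -E` (POSIX ERE) only approximates Java regex for simple patterns like these:

```shell
#!/bin/sh
# Default crawlerUserAgents value for Tomcat's CrawlerSessionManagerValve (as far as I know);
# an alternative agent would be appended as another "|.*<pattern>.*" alternative.
CRAWLER_UA_REGEX='.*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*'

# Succeeds if the whole user agent ($1) matches the valve-style regex.
# grep -x approximates Java's Matcher.matches() (whole-string match).
is_crawler_ua() {
  printf '%s\n' "$1" | grep -q -x -E "$CRAWLER_UA_REGEX"
}

# Succeeds if the user agent ($2) matches any pattern in a COUNTER-Robots-style
# patterns file ($1, one regex per line), case-insensitively, anywhere in the string.
# Note: an empty line in the patterns file would match every user agent.
matches_robot_list() {
  printf '%s\n' "$2" | grep -q -i -E -f "$1"
}
```

- For example, `matches_robot_list patterns.txt "$ua" && echo listed`, where `patterns.txt` is a hypothetical local copy of the pattern list.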
|
||||
|
||||
- Generate a list of sponsors to update our controlled vocabulary:
|
||||
|
||||
```
|
||||
dspace=# \COPY (SELECT DISTINCT text_value as "dc.description.sponsorship", count(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=29 GROUP BY "dc.description.sponsorship" ORDER BY count DESC LIMIT 125) TO /tmp/2020-07-12-sponsors.csv;
|
||||
COPY 125
|
||||
dspace=# \q
|
||||
$ csvcut -c 1 --tabs /tmp/2020-07-12-sponsors.csv > dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
|
||||
# add XML formatting
|
||||
$ $EDITOR dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
|
||||
$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
|
||||
```
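- The XML formatting step above was done by hand; a minimal shell sketch of automating it might look like this. It assumes the input is plain text with one sponsor per line (no CSV quoting, so quoted values from `csvcut` may need unquoting first) and the DSpace controlled-vocabulary layout of a root `<node>` wrapping an `<isComposedBy>` list of child `<node id="..." label="..."/>` elements, using the term text for both `id` and `label`. The function name and the root label are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: wrap a plain list of terms ($1, one per line) in
# DSpace controlled-vocabulary XML, with $2 as the root node's id and label.
terms_to_vocab_xml() {
  awk -v root="$2" '
    # Escape characters that are special inside XML attribute values.
    function esc(s) {
      gsub(/&/, "\\&amp;", s); gsub(/</, "\\&lt;", s); gsub(/>/, "\\&gt;", s)
      gsub(/"/, "\\&quot;", s)
      return s
    }
    BEGIN {
      print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      printf "<node id=\"%s\" label=\"%s\">\n", esc(root), esc(root)
      print "  <isComposedBy>"
    }
    # Skip blank lines; emit one child node per term.
    NF { t = esc($0); printf "    <node id=\"%s\" label=\"%s\"/>\n", t, t }
    END { print "  </isComposedBy>"; print "</node>" }
  ' "$1"
}
```

- Its output could then go through the same `tidy` command to normalize indentation.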
|
||||
|
||||
- Deploy latest `5_x-prod` branch on CGSpace (linode18), run all system updates, and reboot the server
|
||||
- After rebooting it I had to restart Tomcat 7 once to get all Solr statistics cores to come up properly
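- To confirm that all Solr statistics cores came up after a restart, one option is Solr's CoreAdmin `STATUS` action. A rough shell sketch, assuming Solr is reachable at the hypothetical `SOLR_URL` below, that the JSON response gives each loaded core a `"name"` field, and that the statistics cores share the `statistics` prefix; the grep-based extraction is crude and no substitute for a real JSON parser:

```shell
#!/bin/sh
# Hypothetical Solr base URL; override SOLR_URL to match the local Tomcat setup.
SOLR_URL="${SOLR_URL:-http://localhost:8080/solr}"

# Read a CoreAdmin STATUS JSON document on stdin and print one statistics core name per line.
stats_cores_from_json() {
  grep -o -E '"name" *: *"statistics[^"]*"' | sed -E 's/.*"(statistics[^"]*)"$/\1/' | sort -u
}

# Fetch the live status from Solr and list the loaded statistics cores.
loaded_stats_cores() {
  curl -s "$SOLR_URL/admin/cores?action=STATUS&wt=json" | stats_cores_from_json
}
```

- Comparing `loaded_stats_cores | wc -l` against the number of statistics cores you expect would show whether any failed to load.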
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
@ -20,7 +20,7 @@ Since I was restarting Tomcat anyways I decided to redeploy the latest changes f
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-07/" />
|
||||
<meta property="article:published_time" content="2020-07-01T10:53:54+03:00" />
|
||||
<meta property="article:modified_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="article:modified_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="July, 2020"/>
|
||||
@ -45,9 +45,9 @@ Since I was restarting Tomcat anyways I decided to redeploy the latest changes f
|
||||
"@type": "BlogPosting",
|
||||
"headline": "July, 2020",
|
||||
"url": "https://alanorth.github.io/cgspace-notes/2020-07/",
|
||||
"wordCount": "2550",
|
||||
"wordCount": "2891",
|
||||
"datePublished": "2020-07-01T10:53:54+03:00",
|
||||
"dateModified": "2020-07-09T09:35:58+03:00",
|
||||
"dateModified": "2020-07-09T22:32:29+03:00",
|
||||
"author": {
|
||||
"@type": "Person",
|
||||
"name": "Alan Orth"
|
||||
@ -529,7 +529,56 @@ $ wc -l 2020-07-09-cgspace-sponsors.txt
|
||||
534 2020-07-09-cgspace-sponsors.txt
|
||||
$ csvgrep -c matched -m true 2020-07-09-cgspace-sponsors-crossref.csv | wc -l
|
||||
174
|
||||
</code></pre><!-- raw HTML omitted -->
|
||||
</code></pre><h2 id="2020-07-12">2020-07-12</h2>
|
||||
<ul>
|
||||
<li>On 2020-07-10 Macaroni Bros emailed to ask if there are issues with CGSpace because they are getting HTTP 504 on the REST API
|
||||
<ul>
|
||||
<li>First, I looked in Munin and I see a high number of DSpace sessions and threads on Friday evening around midnight, though that was much later than his email:</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p><img src="/cgspace-notes/2020/07/jmx_dspace_sessions-day.png" alt="DSpace sessions">
|
||||
<img src="/cgspace-notes/2020/07/threads-day.png" alt="Threads">
|
||||
<img src="/cgspace-notes/2020/07/postgres_locks_ALL-day.png" alt="PostgreSQL locks">
|
||||
<img src="/cgspace-notes/2020/07/postgres_transactions_ALL-day.png" alt="PostgreSQL transactions"></p>
|
||||
<ul>
|
||||
<li>CPU load and memory were not high then, but there was some load on the database and firewall…
|
||||
<ul>
|
||||
<li>Looking in the nginx logs I see a few IPs we’ve seen recently, like those 199.47.x.x IPs from Turnitin (which I need to remember to purge from Solr again because I didn’t update the spider agents on CGSpace yet) and a new one, 186.112.8.167</li>
|
||||
<li>Also, the Turnitin bot doesn’t re-use its Tomcat JSESSIONID; I see this many unique sessions from it today:</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre><code># grep 199.47.87 dspace.log.2020-07-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
2815
|
||||
</code></pre><ul>
|
||||
<li>So I need to add this alternative user-agent to the Tomcat Crawler Session Manager valve to force it to re-use a common bot session</li>
|
||||
<li>There are around 9,000 requests from <code>186.112.8.167</code> in Colombia with the user agent <code>Java/1.8.0_241</code>, but those were mostly to the REST API and I don’t see any hits in Solr</li>
|
||||
<li>Earlier in the day Linode had alerted that there was high outgoing bandwidth
|
||||
<ul>
|
||||
<li>I see a new bot from 134.155.96.78 that made ~10,000 requests with the following user agent, but it appears to already be in our DSpace user agent list via COUNTER-Robots:</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre><code>Mozilla/5.0 (compatible; heritrix/3.4.0-SNAPSHOT-2019-02-07T13:53:20Z +http://ifm.uni-mannheim.de)
|
||||
</code></pre><ul>
|
||||
<li>Generate a list of sponsors to update our controlled vocabulary:</li>
|
||||
</ul>
|
||||
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value as "dc.description.sponsorship", count(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=29 GROUP BY "dc.description.sponsorship" ORDER BY count DESC LIMIT 125) TO /tmp/2020-07-12-sponsors.csv;
|
||||
COPY 125
|
||||
dspace=# \q
|
||||
$ csvcut -c 1 --tabs /tmp/2020-07-12-sponsors.csv > dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
|
||||
# add XML formatting
|
||||
$ $EDITOR dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
|
||||
$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
|
||||
</code></pre><ul>
|
||||
<li>Deploy latest <code>5_x-prod</code> branch on CGSpace (linode18), run all system updates, and reboot the server
|
||||
<ul>
|
||||
<li>After rebooting it I had to restart Tomcat 7 once to get all Solr statistics cores to come up properly</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<!-- raw HTML omitted -->
|
||||
|
||||
|
||||
|
||||
|
BIN
docs/2020/07/jmx_dspace_sessions-day.png
Normal file
After Width: | Height: | Size: 8.7 KiB |
BIN
docs/2020/07/postgres_locks_ALL-day.png
Normal file
After Width: | Height: | Size: 14 KiB |
BIN
docs/2020/07/postgres_transactions_ALL-day.png
Normal file
After Width: | Height: | Size: 7.4 KiB |
BIN
docs/2020/07/threads-day.png
Normal file
After Width: | Height: | Size: 12 KiB |
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -4,27 +4,27 @@
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
|
||||
<lastmod>2020-07-09T09:35:58+03:00</lastmod>
|
||||
<lastmod>2020-07-09T22:32:29+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
||||
<lastmod>2020-07-09T09:35:58+03:00</lastmod>
|
||||
<lastmod>2020-07-09T22:32:29+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/2020-07/</loc>
|
||||
<lastmod>2020-07-09T09:35:58+03:00</lastmod>
|
||||
<lastmod>2020-07-09T22:32:29+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
|
||||
<lastmod>2020-07-09T09:35:58+03:00</lastmod>
|
||||
<lastmod>2020-07-09T22:32:29+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
||||
<lastmod>2020-07-09T09:35:58+03:00</lastmod>
|
||||
<lastmod>2020-07-09T22:32:29+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
|
BIN
static/2020/07/jmx_dspace_sessions-day.png
Normal file
After Width: | Height: | Size: 8.7 KiB |
BIN
static/2020/07/postgres_locks_ALL-day.png
Normal file
After Width: | Height: | Size: 14 KiB |
BIN
static/2020/07/postgres_transactions_ALL-day.png
Normal file
After Width: | Height: | Size: 7.4 KiB |
BIN
static/2020/07/threads-day.png
Normal file
After Width: | Height: | Size: 12 KiB |