Add notes for 2020-07-12

Alan Orth 2020-07-12 15:52:26 +03:00
parent 1cc1e23aba
commit 425a85df5b
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
28 changed files with 118 additions and 26 deletions

@ -411,4 +411,47 @@ $ csvgrep -c matched -m true 2020-07-09-cgspace-sponsors-crossref.csv | wc -l
174
```
## 2020-07-12
- On 2020-07-10 Macaroni Bros emailed to ask if there are issues with CGSpace because they are getting HTTP 504 on the REST API
- First, I looked in Munin and saw a high number of DSpace sessions and threads on Friday evening around midnight, though that was much later than the email:
![DSpace sessions](/cgspace-notes/2020/07/jmx_dspace_sessions-day.png)
![Threads](/cgspace-notes/2020/07/threads-day.png)
![PostgreSQL locks](/cgspace-notes/2020/07/postgres_locks_ALL-day.png)
![PostgreSQL transactions](/cgspace-notes/2020/07/postgres_transactions_ALL-day.png)
- CPU load and memory were not high then, but there was some load on the database and firewall...
- Looking in the nginx logs I see a few IPs we've seen recently, like those 199.47.x.x IPs from Turnitin (which I need to remember to purge from Solr again because I didn't update the spider agents on CGSpace yet) and a new one, 186.112.8.167
- Also, the Turnitin bot doesn't re-use its Tomcat JSESSIONID; I see this from today:
```
# grep 199.47.87 dspace.log.2020-07-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
2815
```
- So I need to add this alternative user-agent to the Tomcat Crawler Session Manager valve to force it to re-use a common bot session
- There are around 9,000 requests from `186.112.8.167` in Colombia with the user agent `Java/1.8.0_241`, but those were mostly to the REST API and I don't see any hits in Solr
- Earlier in the day Linode had alerted that there was high outgoing bandwidth
- I see a new bot from 134.155.96.78 that made ~10,000 requests with the user agent below, which appears to already be in our DSpace user agent list via COUNTER-Robots:
```
Mozilla/5.0 (compatible; heritrix/3.4.0-SNAPSHOT-2019-02-07T13:53:20Z +http://ifm.uni-mannheim.de)
```
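- For repeating the Turnitin session count on other days or IPs, here is a minimal Python sketch of the same `grep | sort | uniq | wc -l` pipeline (the function name is hypothetical, and unlike `grep` it matches the IP prefix literally rather than as a regular expression):

```python
import re

# Same pattern as the grep above: "session_id=" followed by
# 32 characters drawn from A-Z and 0-9.
SESSION_RE = re.compile(r"session_id=([A-Z0-9]{32})")


def count_unique_sessions(lines, ip_prefix):
    """Count distinct session IDs on log lines that contain ip_prefix."""
    sessions = set()
    for line in lines:
        # Literal substring test (grep would treat the dots as wildcards).
        if ip_prefix not in line:
            continue
        # A line may carry more than one match, as with `grep -o`.
        sessions.update(SESSION_RE.findall(line))
    return len(sessions)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python count_sessions.py dspace.log.2020-07-12 199.47.87
    with open(sys.argv[1], errors="replace") as log:
        print(count_unique_sessions(log, sys.argv[2]))
```

- A count close to the number of requests from that IP range would suggest the client opens a new session on nearly every request, which is the behaviour the Crawler Session Manager valve is meant to contain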
- Generate a list of sponsors to update our controlled vocabulary:
```
dspace=# \COPY (SELECT DISTINCT text_value as "dc.description.sponsorship", count(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=29 GROUP BY "dc.description.sponsorship" ORDER BY count DESC LIMIT 125) TO /tmp/2020-07-12-sponsors.csv;
COPY 125
dspace=# \q
$ csvcut -c 1 --tabs /tmp/2020-07-12-sponsors.csv > dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
# add XML formatting
$ dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
```
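- The "add XML formatting" step above is done by hand and is not shown. One way it could be scripted is sketched below, assuming the vocabulary uses the nested `<node>` / `<isComposedBy>` layout of the controlled vocabularies bundled with DSpace; the function name and the choice to use the sponsor name as both `id` and `label` are assumptions, not something taken from these notes:

```python
from xml.sax.saxutils import quoteattr


def sponsors_to_vocabulary(names, vocabulary_id="dc-description-sponsorship"):
    """Wrap one sponsor name per line in controlled-vocabulary XML.

    The element layout is an assumption based on DSpace's bundled
    vocabularies; adjust it if the real file differs.
    """
    root = quoteattr(vocabulary_id)
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f"<node id={root} label={root}>",
        "<isComposedBy>",
    ]
    for name in names:
        name = name.strip()
        if not name:
            continue  # skip blank lines
        # quoteattr() adds the surrounding quotes and escapes &, < and >.
        attr = quoteattr(name)
        lines.append(f"<node id={attr} label={attr}/>")
    lines += ["</isComposedBy>", "</node>"]
    return "\n".join(lines) + "\n"
```

- The output is unindented on purpose, since the `tidy` call above re-indents the file anyway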
- Deploy latest `5_x-prod` branch on CGSpace (linode18), run all system updates, and reboot the server
- After rebooting it I had to restart Tomcat 7 once to get all Solr statistics cores to come up properly
<!-- vim: set sw=2 ts=2: -->

@ -20,7 +20,7 @@ Since I was restarting Tomcat anyways I decided to redeploy the latest changes f
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-07/" />
<meta property="article:published_time" content="2020-07-01T10:53:54+03:00" />
<meta property="article:modified_time" content="2020-07-09T09:35:58+03:00" />
<meta property="article:modified_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="July, 2020"/>
@ -45,9 +45,9 @@ Since I was restarting Tomcat anyways I decided to redeploy the latest changes f
"@type": "BlogPosting",
"headline": "July, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-07/",
"wordCount": "2550",
"wordCount": "2891",
"datePublished": "2020-07-01T10:53:54+03:00",
"dateModified": "2020-07-09T09:35:58+03:00",
"dateModified": "2020-07-09T22:32:29+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -529,7 +529,56 @@ $ wc -l 2020-07-09-cgspace-sponsors.txt
534 2020-07-09-cgspace-sponsors.txt
$ csvgrep -c matched -m true 2020-07-09-cgspace-sponsors-crossref.csv | wc -l
174
</code></pre><!-- raw HTML omitted -->
</code></pre><h2 id="2020-07-12">2020-07-12</h2>
<ul>
<li>On 2020-07-10 Macaroni Bros emailed to ask if there are issues with CGSpace because they are getting HTTP 504 on the REST API
<ul>
<li>First, I looked in Munin and saw a high number of DSpace sessions and threads on Friday evening around midnight, though that was much later than the email:</li>
</ul>
</li>
</ul>
<p><img src="/cgspace-notes/2020/07/jmx_dspace_sessions-day.png" alt="DSpace sessions">
<img src="/cgspace-notes/2020/07/threads-day.png" alt="Threads">
<img src="/cgspace-notes/2020/07/postgres_locks_ALL-day.png" alt="PostgreSQL locks">
<img src="/cgspace-notes/2020/07/postgres_transactions_ALL-day.png" alt="PostgreSQL transactions"></p>
<ul>
<li>CPU load and memory were not high then, but there was some load on the database and firewall&hellip;
<ul>
<li>Looking in the nginx logs I see a few IPs we&rsquo;ve seen recently, like those 199.47.x.x IPs from Turnitin (which I need to remember to purge from Solr again because I didn&rsquo;t update the spider agents on CGSpace yet) and a new one, 186.112.8.167</li>
<li>Also, the Turnitin bot doesn&rsquo;t re-use its Tomcat JSESSIONID; I see this from today:</li>
</ul>
</li>
</ul>
<pre><code># grep 199.47.87 dspace.log.2020-07-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
2815
</code></pre><ul>
<li>So I need to add this alternative user-agent to the Tomcat Crawler Session Manager valve to force it to re-use a common bot session</li>
<li>There are around 9,000 requests from <code>186.112.8.167</code> in Colombia with the user agent <code>Java/1.8.0_241</code>, but those were mostly to the REST API and I don&rsquo;t see any hits in Solr</li>
<li>Earlier in the day Linode had alerted that there was high outgoing bandwidth
<ul>
<li>I see a new bot from 134.155.96.78 that made ~10,000 requests with the user agent below, which appears to already be in our DSpace user agent list via COUNTER-Robots:</li>
</ul>
</li>
</ul>
<pre><code>Mozilla/5.0 (compatible; heritrix/3.4.0-SNAPSHOT-2019-02-07T13:53:20Z +http://ifm.uni-mannheim.de)
</code></pre><ul>
<li>Generate a list of sponsors to update our controlled vocabulary:</li>
</ul>
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value as &quot;dc.description.sponsorship&quot;, count(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=29 GROUP BY &quot;dc.description.sponsorship&quot; ORDER BY count DESC LIMIT 125) TO /tmp/2020-07-12-sponsors.csv;
COPY 125
dspace=# \q
$ csvcut -c 1 --tabs /tmp/2020-07-12-sponsors.csv &gt; dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
# add XML formatting
$ dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
$ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/dc-description-sponsorship.xml
</code></pre><ul>
<li>Deploy latest <code>5_x-prod</code> branch on CGSpace (linode18), run all system updates, and reboot the server
<ul>
<li>After rebooting it I had to restart Tomcat 7 once to get all Solr statistics cores to come up properly</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->

Binary file not shown (8.7 KiB).

Binary file not shown (14 KiB).

Binary file not shown (7.4 KiB).

Binary file not shown (12 KiB).

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-07-09T09:35:58+03:00" />
<meta property="og:updated_time" content="2020-07-09T22:32:29+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -4,27 +4,27 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2020-07-09T09:35:58+03:00</lastmod>
<lastmod>2020-07-09T22:32:29+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2020-07-09T09:35:58+03:00</lastmod>
<lastmod>2020-07-09T22:32:29+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2020-07/</loc>
<lastmod>2020-07-09T09:35:58+03:00</lastmod>
<lastmod>2020-07-09T22:32:29+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2020-07-09T09:35:58+03:00</lastmod>
<lastmod>2020-07-09T22:32:29+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2020-07-09T09:35:58+03:00</lastmod>
<lastmod>2020-07-09T22:32:29+03:00</lastmod>
</url>
<url>

Binary file not shown (8.7 KiB).

Binary file not shown (14 KiB).

Binary file not shown (7.4 KiB).

Binary file not shown (12 KiB).