Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-22 14:45:03 +01:00)

Update notes for 2019-04-06

Parent: 2d768f6486
Commit: 682a2c2194
@ -109,4 +109,54 @@ statistics-2017: org.apache.solr.common.SolrException:org.apache.solr.common.Sol

- I restarted it again and all the Solr cores came up properly...

## 2019-04-06

- Udana asked why item [10568/91278](https://cgspace.cgiar.org/handle/10568/91278) didn't have an Altmetric badge on CGSpace, but on the [WLE website](https://wle.cgiar.org/food-and-agricultural-innovation-pathways-prosperity) it does
  - I looked and saw that the WLE website is using the Altmetric score associated with the DOI, and that the Handle has no score at all
  - I tweeted the item and I assume this will link the Handle with the DOI in the system
- Linode sent an alert that there was high CPU usage this morning on CGSpace (linode18) and these were the top IPs in the webserver access logs around the time:

```
# zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "06/Apr/2019:(06|07|08|09)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    222 18.195.78.144
    245 207.46.13.58
    303 207.46.13.194
    328 66.249.79.33
    564 207.46.13.210
    566 66.249.79.62
    575 40.77.167.66
   1803 66.249.79.59
   2834 2a01:4f8:140:3192::2
   9623 45.5.184.72
# zcat --force /var/log/nginx/{rest,oai}.log /var/log/nginx/{rest,oai}.log.1 | grep -E "06/Apr/2019:(06|07|08|09)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
     31 66.249.79.62
     41 207.46.13.210
     42 40.77.167.66
     54 42.113.50.219
    132 66.249.79.59
    785 2001:41d0:d:1990::
   1164 45.5.184.72
   2014 50.116.102.77
   4267 45.5.186.2
   4893 205.186.128.185
```
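
For readers less used to shell pipelines, the per-client counting above can be sketched in Python roughly as follows. This is only an illustration, not part of the original notes: the function name `top_clients` and the sample lines are made up, and it assumes log lines that begin with the client address and carry a timestamp like `06/Apr/2019:06:12:01`, as nginx's usual access log format does.

```python
import re
from collections import Counter


def top_clients(lines, day, hours, n=10):
    """Return the n busiest client addresses as (address, count), busiest last."""
    # Same filter as: grep -E "06/Apr/2019:(06|07|08|09)"
    wanted = re.compile(re.escape(day) + ":(" + "|".join(hours) + ")")
    # Same field as: awk '{print $1}' (first whitespace-separated token)
    counts = Counter(line.split()[0] for line in lines if wanted.search(line))
    # Same ordering as: sort | uniq -c | sort -n | tail -n 10
    return sorted(counts.most_common(n), key=lambda pair: pair[1])


# Invented sample lines (documentation-range addresses), not real CGSpace logs.
SAMPLE_LINES = [
    '192.0.2.10 - - [06/Apr/2019:06:12:01 +0000] "GET /discover HTTP/1.1" 200 512',
    '192.0.2.10 - - [06/Apr/2019:07:30:45 +0000] "GET /discover HTTP/1.1" 200 512',
    '198.51.100.7 - - [06/Apr/2019:08:02:10 +0000] "GET / HTTP/1.1" 200 1024',
    '203.0.113.5 - - [06/Apr/2019:10:15:00 +0000] "GET / HTTP/1.1" 200 1024',
    '192.0.2.10 - - [05/Apr/2019:06:00:00 +0000] "GET / HTTP/1.1" 200 1024',
]

for address, hits in top_clients(SAMPLE_LINES, "06/Apr/2019", ["06", "07", "08", "09"]):
    print(f"{hits:7d} {address}")
```

Only lines inside the chosen hours of the chosen day are counted, so the 10:15 request and the request from the previous day are ignored.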

- `45.5.184.72` is in Colombia so it's probably CIAT, and I see they are indeed trying to crawl the Discover pages on CIAT's datasets collection:

```
GET /handle/10568/72970/discover?filtertype_0=type&filtertype_1=author&filter_relational_operator_1=contains&filter_relational_operator_0=equals&filter_1=&filter_0=Dataset&filtertype=dateIssued&filter_relational_operator=equals&filter=2014
```

- Their user agent is the one I added to the badbots list in nginx last week: "GuzzleHttp/6.3.3 curl/7.47.0 PHP/7.0.30-0ubuntu0.16.04.1"
- They made 22,000 requests to Discover on this collection today alone (and it's only 11AM):

```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "06/Apr/2019" | grep 45.5.184.72 | grep -oE '/handle/[0-9]+/[0-9]+/discover' | sort | uniq -c
  22077 /handle/10568/72970/discover
```
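
The same tally of Discover requests per handle could be written in Python along these lines. It is a rough sketch under assumptions: `discover_hits` is a hypothetical helper, the sample lines are invented, and the client filter is a plain substring match (slightly stricter than `grep 45.5.184.72`, where each `.` matches any character).

```python
import re
from collections import Counter

# Same pattern as: grep -oE '/handle/[0-9]+/[0-9]+/discover'
DISCOVER_PATH = re.compile(r"/handle/[0-9]+/[0-9]+/discover")


def discover_hits(lines, day, client):
    """Count Discover paths requested by one client on one day."""
    hits = Counter()
    for line in lines:
        # Same filters as: grep "06/Apr/2019" | grep 45.5.184.72
        if day in line and client in line:
            # Like grep -o, count every match found on the line.
            hits.update(DISCOVER_PATH.findall(line))
    return hits


# Invented sample lines (documentation-range addresses), not real CGSpace logs.
DISCOVER_SAMPLE = [
    '192.0.2.10 - - [06/Apr/2019:09:00:01 +0000] "GET /handle/10568/72970/discover?filter=2014 HTTP/1.1" 200 512',
    '192.0.2.10 - - [06/Apr/2019:09:00:02 +0000] "GET /handle/10568/72970/discover?filter=2015 HTTP/1.1" 200 512',
    '192.0.2.10 - - [06/Apr/2019:09:00:03 +0000] "GET /handle/10568/72970 HTTP/1.1" 200 512',
    '198.51.100.7 - - [06/Apr/2019:09:00:04 +0000] "GET /handle/10568/72970/discover HTTP/1.1" 200 512',
    '192.0.2.10 - - [05/Apr/2019:09:00:05 +0000] "GET /handle/10568/72970/discover HTTP/1.1" 200 512',
]

for path, count in sorted(discover_hits(DISCOVER_SAMPLE, "06/Apr/2019", "192.0.2.10").items()):
    print(f"{count:7d} {path}")
```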

- I need to find a contact at CIAT to tell them to use the REST API rather than crawling Discover
- Maria from Bioversity recommended that we use the phrase "AGROVOC subject" instead of "Subject" in Listings and Reports
  - I made a pull request to update this and merged it to the `5_x-prod` branch ([#418](https://github.com/ilri/DSpace/pull/418))
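
On the note above about pointing CIAT at the REST API instead of Discover, a harvester could page through a collection's items with a handful of requests rather than thousands of Discover queries. The sketch below only builds the request URLs and makes no network calls. It assumes the DSpace 5 REST endpoint `/rest/collections/{id}/items` with `limit` and `offset` parameters and a base URL of `https://cgspace.cgiar.org/rest`. The collection ID `1234`, the item total, and the helper name are hypothetical.

```python
from urllib.parse import urlencode

# Assumed REST base URL; adjust to the repository being harvested.
REST_BASE = "https://cgspace.cgiar.org/rest"


def collection_item_pages(collection_id, total_items, page_size=100):
    """Yield one URL per page of items in a collection."""
    for offset in range(0, total_items, page_size):
        # limit/offset paging, as assumed for the DSpace 5 REST API
        query = urlencode({"limit": page_size, "offset": offset})
        yield f"{REST_BASE}/collections/{collection_id}/items?{query}"


# Hypothetical collection ID and size, for illustration only.
for url in collection_item_pages(1234, total_items=250):
    print(url)
```

With a page size of 100, a 250-item collection needs three requests.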

<!-- vim: set sw=2 ts=2: -->

@ -38,7 +38,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-04/" />
<meta property="article:published_time" content="2019-04-01T09:00:43+03:00"/>
<meta property="article:modified_time" content="2019-04-05T22:22:41+03:00"/>
<meta property="article:modified_time" content="2019-04-05T23:07:30+03:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="April, 2019"/>
@ -81,9 +81,9 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
"@type": "BlogPosting",
"headline": "April, 2019",
"url": "https://alanorth.github.io/cgspace-notes/2019-04/",
"wordCount": "661",
"wordCount": "980",
"datePublished": "2019-04-01T09:00:43+03:00",
"dateModified": "2019-04-05T22:22:41+03:00",
"dateModified": "2019-04-05T23:07:30+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -286,6 +286,67 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<li>I restarted it again and all the Solr cores came up properly…</li>
</ul>

<h2 id="2019-04-06">2019-04-06</h2>

<ul>
<li>Udana asked why item <a href="https://cgspace.cgiar.org/handle/10568/91278"><sup>10568</sup>⁄<sub>91278</sub></a> didn’t have an Altmetric badge on CGSpace, but on the <a href="https://wle.cgiar.org/food-and-agricultural-innovation-pathways-prosperity">WLE website</a> it does

<ul>
<li>I looked and saw that the WLE website is using the Altmetric score associated with the DOI, and that the Handle has no score at all</li>
<li>I tweeted the item and I assume this will link the Handle with the DOI in the system</li>
</ul></li>
<li>Linode sent an alert that there was high CPU usage this morning on CGSpace (linode18) and these were the top IPs in the webserver access logs around the time:</li>
</ul>

<pre><code># zcat --force /var/log/nginx/{access,error,library-access}.log /var/log/nginx/{access,error,library-access}.log.1 | grep -E "06/Apr/2019:(06|07|08|09)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    222 18.195.78.144
    245 207.46.13.58
    303 207.46.13.194
    328 66.249.79.33
    564 207.46.13.210
    566 66.249.79.62
    575 40.77.167.66
   1803 66.249.79.59
   2834 2a01:4f8:140:3192::2
   9623 45.5.184.72
# zcat --force /var/log/nginx/{rest,oai}.log /var/log/nginx/{rest,oai}.log.1 | grep -E "06/Apr/2019:(06|07|08|09)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
     31 66.249.79.62
     41 207.46.13.210
     42 40.77.167.66
     54 42.113.50.219
    132 66.249.79.59
    785 2001:41d0:d:1990::
   1164 45.5.184.72
   2014 50.116.102.77
   4267 45.5.186.2
   4893 205.186.128.185
</code></pre>

<ul>
<li><code>45.5.184.72</code> is in Colombia so it’s probably CIAT, and I see they are indeed trying to crawl the Discover pages on CIAT’s datasets collection:</li>
</ul>

<pre><code>GET /handle/10568/72970/discover?filtertype_0=type&filtertype_1=author&filter_relational_operator_1=contains&filter_relational_operator_0=equals&filter_1=&filter_0=Dataset&filtertype=dateIssued&filter_relational_operator=equals&filter=2014
</code></pre>

<ul>
<li>Their user agent is the one I added to the badbots list in nginx last week: “GuzzleHttp/6.3.3 curl/7.47.0 PHP/7.0.30-0ubuntu0.16.04.1”</li>
<li>They made 22,000 requests to Discover on this collection today alone (and it’s only 11AM):</li>
</ul>

<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "06/Apr/2019" | grep 45.5.184.72 | grep -oE '/handle/[0-9]+/[0-9]+/discover' | sort | uniq -c
  22077 /handle/10568/72970/discover
</code></pre>

<ul>
<li>I need to find a contact at CIAT to tell them to use the REST API rather than crawling Discover</li>
<li>Maria from Bioversity recommended that we use the phrase “AGROVOC subject” instead of “Subject” in Listings and Reports

<ul>
<li>I made a pull request to update this and merged it to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/418">#418</a>)</li>
</ul></li>
</ul>

<!-- vim: set sw=2 ts=2: -->

@ -4,7 +4,7 @@

<url>
  <loc>https://alanorth.github.io/cgspace-notes/2019-04/</loc>
  <lastmod>2019-04-05T22:22:41+03:00</lastmod>
  <lastmod>2019-04-05T23:07:30+03:00</lastmod>
</url>

<url>
@ -219,7 +219,7 @@

<url>
  <loc>https://alanorth.github.io/cgspace-notes/</loc>
  <lastmod>2019-04-05T22:22:41+03:00</lastmod>
  <lastmod>2019-04-05T23:07:30+03:00</lastmod>
  <priority>0</priority>
</url>

@ -230,7 +230,7 @@

<url>
  <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
  <lastmod>2019-04-05T22:22:41+03:00</lastmod>
  <lastmod>2019-04-05T23:07:30+03:00</lastmod>
  <priority>0</priority>
</url>

@ -242,13 +242,13 @@

<url>
  <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
  <lastmod>2019-04-05T22:22:41+03:00</lastmod>
  <lastmod>2019-04-05T23:07:30+03:00</lastmod>
  <priority>0</priority>
</url>

<url>
  <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
  <lastmod>2019-04-05T22:22:41+03:00</lastmod>
  <lastmod>2019-04-05T23:07:30+03:00</lastmod>
  <priority>0</priority>
</url>
