mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-26 00:18:21 +01:00
Add notes for 2020-04-20
parent 3b0dbf2f78
commit 32018333d1
@ -174,4 +174,63 @@ dspace=# UPDATE metadatavalue SET text_value='Knight-Jones, Theodore J.D.' WHERE
- They said they don't think the glyphicon encoding issue is due to their changes, but I built a new clean version of the vanilla `6_x-dev` branch from before their pull request and it *does not* have the encoding issue in the Mirage 2 header trails
- Also, they said we need to use something called `AtomicStatisticsUpdateCLI` to do the Solr legacy integer ID to UUID conversion so I asked for more information about that workflow

## 2020-04-20

- Looking into a high rate of outgoing bandwidth from yesterday on CGSpace (linode18):

```
# cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Apr/2020:0[6789]" | goaccess --log-format=COMBINED -
```

- One host in Russia (91.241.19.70) downloaded 23GiB over those few hours in the morning
  - It looks like all the requests were for one single item's bitstreams:

```
# grep -c 91.241.19.70 /var/log/nginx/access.log.1
8900
# grep 91.241.19.70 /var/log/nginx/access.log.1 | grep -c '10568/35187'
8900
```

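- A minimal Python sketch of the same tally, assuming nginx's combined log format. The `summarize_ip` helper and the sample lines are made up for illustration and are not taken from the real logs:

```python
import re
from collections import Counter

# Combined log format: ip - user [time] "request" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def summarize_ip(lines, ip):
    """Return (hits, total_bytes, paths) for exact matches of one client IP."""
    hits = 0
    total_bytes = 0
    paths = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or match.group("ip") != ip:
            continue
        hits += 1
        # nginx logs "-" when no body bytes were sent
        if match.group("bytes") != "-":
            total_bytes += int(match.group("bytes"))
        # The request field looks like: METHOD PATH PROTOCOL
        parts = match.group("request").split()
        if len(parts) >= 2:
            paths[parts[1]] += 1
    return hits, total_bytes, paths


# Hypothetical sample lines, for illustration only
SAMPLE = [
    '91.241.19.70 - - [19/Apr/2020:06:00:01 +0200] "GET /bitstream/handle/10568/35187/a.pdf HTTP/1.1" 200 1000 "-" "Mozilla/5.0"',
    '91.241.19.70 - - [19/Apr/2020:06:00:02 +0200] "GET /bitstream/handle/10568/35187/b.pdf HTTP/1.1" 200 2500 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [19/Apr/2020:06:00:03 +0200] "GET /handle/10568/1 HTTP/1.1" 200 300 "-" "Mozilla/5.0"',
]

hits, total_bytes, paths = summarize_ip(SAMPLE, "91.241.19.70")
print(hits, total_bytes, paths.most_common(5))
```

- Unlike the `grep` above, where each `.` matches any character and the pattern can match anywhere in the line, this compares the client IP field exactly
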
- I thought the host might have been Yandex misbehaving, but its user agent is:

```
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_3; nl-nl) AppleWebKit/527 (KHTML, like Gecko) Version/3.1.1 Safari/525.20
```

- I will purge that IP from the Solr statistics using my `check-spider-ip-hits.sh` script:

```
$ ./check-spider-ip-hits.sh -d -f /tmp/ip -p
(DEBUG) Using spider IPs file: /tmp/ip
(DEBUG) Checking for hits from spider IP: 91.241.19.70
Purging 8909 hits from 91.241.19.70 in statistics

Total number of bot hits purged: 8909
```

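- A rough sketch of what a purge like this could look like as a Solr delete-by-query message. This is not how `check-spider-ip-hits.sh` is implemented; it assumes the statistics core stores the client address in a field named `ip`, and the `delete_by_ip_payload` helper is hypothetical:

```python
from xml.sax.saxutils import escape


def delete_by_ip_payload(ip):
    """Build a Solr XML delete-by-query message matching one client IP."""
    # Assumption: the client address is stored in a field named "ip".
    # Quote the value so Solr treats it as a literal phrase, not query syntax.
    query = 'ip:"{}"'.format(ip.replace('"', '\\"'))
    # Escape &, < and > so the query is safe inside the XML element
    return "<delete><query>{}</query></delete>".format(escape(query))


payload = delete_by_ip_payload("91.241.19.70")
print(payload)
```

- The message would then be sent to the core's `/update` handler and followed by a commit; this sketch only builds the payload and sends nothing
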
- While investigating that I noticed ORCID identifiers missing from a few authors' names, so I added them with my `add-orcid-identifiers-csv.py` script:

```
$ ./add-orcid-identifiers-csv.py -i 2020-04-20-add-orcids.csv -db dspace -u dspace -p 'fuuu' -d
```

- The contents of `2020-04-20-add-orcids.csv` were:

```
dc.contributor.author,cg.creator.id
"Schut, Marc","Marc Schut: 0000-0002-3361-4581"
"Schut, M.","Marc Schut: 0000-0002-3361-4581"
"Kamau, G.","Geoffrey Kamau: 0000-0002-6995-4801"
"Kamau, G","Geoffrey Kamau: 0000-0002-6995-4801"
"Triomphe, Bernard","Bernard Triomphe: 0000-0001-6657-3002"
"Waters-Bayer, Ann","Ann Waters-Bayer: 0000-0003-1887-7903"
"Klerkx, Laurens","Laurens Klerkx: 0000-0002-1664-886X"
```

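- A small sanity check that could be run on a CSV like this before importing it, sketched in Python. ORCID identifiers end in an ISO 7064 MOD 11-2 check character, so typos in the digits are usually detectable. The helper names are made up for illustration and are not part of `add-orcid-identifiers-csv.py`:

```python
import csv
import io
import re

ORCID_RE = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the identifier is well formed and its checksum matches."""
    if ORCID_RE.fullmatch(orcid) is None:
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def invalid_rows(csv_text):
    """Yield (author, creator_id) for rows lacking a valid ORCID identifier."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = ORCID_RE.search(row["cg.creator.id"])
        if match is None or not is_valid_orcid(match.group(0)):
            yield row["dc.contributor.author"], row["cg.creator.id"]


# Two rows copied from the CSV above
CSV_TEXT = '''dc.contributor.author,cg.creator.id
"Schut, Marc","Marc Schut: 0000-0002-3361-4581"
"Klerkx, Laurens","Laurens Klerkx: 0000-0002-1664-886X"
'''

print(list(invalid_rows(CSV_TEXT)))
```

- A passing checksum only shows the identifier is well formed, not that it belongs to the named author, so checking the profiles on ORCID.org is still needed
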
- I confirmed some of the authors' names from the report itself, then by looking at their profiles on ORCID.org
- Add new ILRI subject "COVID19" to the `5_x-prod` branch
- Add new CCAFS Phase II project tags to the `5_x-prod` branch
- I will deploy these to CGSpace in the next few days

<!-- vim: set sw=2 ts=2: -->
@ -25,7 +25,7 @@ On the same note, the one item Abenet pointed out last week now has a donut with
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-04/" />
<meta property="article:published_time" content="2020-04-02T10:53:24+03:00" />
<meta property="article:modified_time" content="2020-04-14T20:01:06+03:00" />
<meta property="article:modified_time" content="2020-04-17T19:40:30+03:00" />

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="April, 2020"/>
@ -55,9 +55,9 @@ On the same note, the one item Abenet pointed out last week now has a donut with
"@type": "BlogPosting",
"headline": "April, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-04/",
"wordCount": "1401",
"wordCount": "1660",
"datePublished": "2020-04-02T10:53:24+03:00",
"dateModified": "2020-04-14T20:01:06+03:00",
"dateModified": "2020-04-17T19:40:30+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -308,6 +308,56 @@ $ podman start artifactory
</ul>
</li>
</ul>
<h2 id="2020-04-20">2020-04-20</h2>
<ul>
<li>Looking into a high rate of outgoing bandwidth from yesterday on CGSpace (linode18):</li>
</ul>
<pre><code># cat /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Apr/2020:0[6789]" | goaccess --log-format=COMBINED -
</code></pre><ul>
<li>One host in Russia (91.241.19.70) downloaded 23GiB over those few hours in the morning
<ul>
<li>It looks like all the requests were for one single item’s bitstreams:</li>
</ul>
</li>
</ul>
<pre><code># grep -c 91.241.19.70 /var/log/nginx/access.log.1
8900
# grep 91.241.19.70 /var/log/nginx/access.log.1 | grep -c '10568/35187'
8900
</code></pre><ul>
<li>I thought the host might have been Yandex misbehaving, but its user agent is:</li>
</ul>
<pre><code>Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_3; nl-nl) AppleWebKit/527 (KHTML, like Gecko) Version/3.1.1 Safari/525.20
</code></pre><ul>
<li>I will purge that IP from the Solr statistics using my <code>check-spider-ip-hits.sh</code> script:</li>
</ul>
<pre><code>$ ./check-spider-ip-hits.sh -d -f /tmp/ip -p
(DEBUG) Using spider IPs file: /tmp/ip
(DEBUG) Checking for hits from spider IP: 91.241.19.70
Purging 8909 hits from 91.241.19.70 in statistics

Total number of bot hits purged: 8909
</code></pre><ul>
<li>While investigating that I noticed ORCID identifiers missing from a few authors’ names, so I added them with my <code>add-orcid-identifiers-csv.py</code> script:</li>
</ul>
<pre><code>$ ./add-orcid-identifiers-csv.py -i 2020-04-20-add-orcids.csv -db dspace -u dspace -p 'fuuu' -d
</code></pre><ul>
<li>The contents of <code>2020-04-20-add-orcids.csv</code> were:</li>
</ul>
<pre><code>dc.contributor.author,cg.creator.id
"Schut, Marc","Marc Schut: 0000-0002-3361-4581"
"Schut, M.","Marc Schut: 0000-0002-3361-4581"
"Kamau, G.","Geoffrey Kamau: 0000-0002-6995-4801"
"Kamau, G","Geoffrey Kamau: 0000-0002-6995-4801"
"Triomphe, Bernard","Bernard Triomphe: 0000-0001-6657-3002"
"Waters-Bayer, Ann","Ann Waters-Bayer: 0000-0003-1887-7903"
"Klerkx, Laurens","Laurens Klerkx: 0000-0002-1664-886X"
</code></pre><ul>
<li>I confirmed some of the authors’ names from the report itself, then by looking at their profiles on ORCID.org</li>
<li>Add new ILRI subject “COVID19” to the <code>5_x-prod</code> branch</li>
<li>Add new CCAFS Phase II project tags to the <code>5_x-prod</code> branch</li>
<li>I will deploy these to CGSpace in the next few days</li>
</ul>
<!-- raw HTML omitted -->
@ -4,27 +4,27 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2020-04/</loc>
<lastmod>2020-04-14T20:01:06+03:00</lastmod>
<lastmod>2020-04-17T19:40:30+03:00</lastmod>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2020-04-14T20:01:06+03:00</lastmod>
<lastmod>2020-04-17T19:40:30+03:00</lastmod>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2020-04-14T20:01:06+03:00</lastmod>
<lastmod>2020-04-17T19:40:30+03:00</lastmod>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2020-04-14T20:01:06+03:00</lastmod>
<lastmod>2020-04-17T19:40:30+03:00</lastmod>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2020-04-14T20:01:06+03:00</lastmod>
<lastmod>2020-04-17T19:40:30+03:00</lastmod>
</url>

<url>