Add notes for 2017-11-13

Alan Orth 2017-11-13 12:04:41 +02:00
parent 41bdd24079
commit e77e3a13ae
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 68 additions and 8 deletions

@ -596,3 +596,30 @@ Server: nginx
- The first request works, second is denied with an HTTP 503!
- I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them
## 2017-11-13
- Just a few hours into the day and it really looks like the Baidu rate limiting is working, HTTP 200 vs 503:
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 200 "
508
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 503 "
5462
```
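The two `grep -c` pipelines above can also be expressed as a single pass that tallies every status code for one user agent on one day. This is only a rough sketch: it assumes nginx's default "combined" log format, and `count_statuses` and `STATUS_RE` are hypothetical names, not something from the server's tooling.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the status code directly
# follows the closing quote of the request line, e.g. ... HTTP/1.1" 503 206 ...
STATUS_RE = re.compile(r'" (\d{3}) ')


def count_statuses(lines, day, agent):
    """Tally HTTP status codes for lines containing both `day` and `agent`."""
    counts = Counter()
    for line in lines:
        # Same substring filters as the chained greps
        if day in line and agent in line:
            match = STATUS_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of both log files with `day="13/Nov/2017"` and `agent="Baiduspider"` should give the 200 and 503 counts in one run. Unlike `grep -c " 200 "`, the regex only looks at the status field, so a response size of 200 bytes would not be counted by mistake.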
- Helping Sisay proof 47 records for IITA: https://dspacetest.cgiar.org/handle/10568/97029
- From looking at the data in OpenRefine I found:
- Errors in `cg.authorship.types`
- Errors in `cg.coverage.country` (smart quote in "COTE DIVOIRE", "HAWAII" is not a country)
- Whitespace issues in some `cg.contributor.affiliatio
- Whitespace issues in some `cg.identifier.doi` fields and most values are using HTTP instead of HTTPS
- Whitespace issues in some `dc.contributor.author` fields
- Issue with invalid `dc.date.issued` value "2011-3"
- Description fields are poorly copy-pasted
- Whitespace issues in `dc.description.sponsorship`
- Lots of inconsistency in `dc.format.extent` (mixed dash style, periods at the end of values)
- Whitespace errors in `dc.identifier.citation`
- Whitespace errors in `dc.subject`
- Whitespace errors in `dc.title`
- After uploading and looking at the data in DSpace Test I saw more errors with CRPs, subjects (one item had four copies of all of its subjects, another had a "." in it), affiliations, sponsors, etc.
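
Several of the problems listed above (stray whitespace, HTTP DOIs, the "2011-3" date, repeated subjects) are mechanical enough to check in code. The following is a minimal sketch rather than the cleanup that was actually done in OpenRefine: all function names are hypothetical, and the accepted date shapes (`YYYY`, `YYYY-MM`, `YYYY-MM-DD`) are an assumption about what `dc.date.issued` should look like.

```python
import re


def clean_value(value):
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def normalize_doi(value):
    """Clean whitespace and switch a DOI URL from HTTP to HTTPS."""
    value = clean_value(value)
    if value.startswith("http://"):
        value = "https://" + value[len("http://"):]
    return value


# Assumption: dates should be YYYY, YYYY-MM or YYYY-MM-DD with zero padding
DATE_RE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


def is_valid_date(value):
    """Return True if the value matches one of the assumed date shapes."""
    return bool(DATE_RE.match(value))


def dedupe(values):
    """Drop repeated values (after cleaning) while keeping first-seen order."""
    return list(dict.fromkeys(clean_value(v) for v in values))
```

Under these assumptions `is_valid_date("2011-3")` is false while `is_valid_date("2011-03")` is true, and `dedupe` would reduce an item with four copies of its subjects to one copy of each.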

@ -38,7 +38,7 @@ COPY 54701
<meta property="article:published_time" content="2017-11-02T09:37:54&#43;02:00"/>
<meta property="article:modified_time" content="2017-11-12T10:41:44&#43;02:00"/>
<meta property="article:modified_time" content="2017-11-12T18:48:52&#43;02:00"/>
@ -86,9 +86,9 @@ COPY 54701
"@type": "BlogPosting",
"headline": "November, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-11/",
"wordCount": "3351",
"wordCount": "3544",
"datePublished": "2017-11-02T09:37:54&#43;02:00",
"dateModified": "2017-11-12T10:41:44&#43;02:00",
"dateModified": "2017-11-12T18:48:52&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -812,6 +812,39 @@ Server: nginx
<li>I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them</li>
</ul>
<h2 id="2017-11-13">2017-11-13</h2>
<ul>
<li>Just a few hours into the day and it really looks like the Baidu rate limiting is working, HTTP 200 vs 503:</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep &quot;13/Nov/2017&quot; | grep &quot;Baiduspider&quot; | grep -c &quot; 200 &quot;
508
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep &quot;13/Nov/2017&quot; | grep &quot;Baiduspider&quot; | grep -c &quot; 503 &quot;
5462
</code></pre>
<ul>
<li>Helping Sisay proof 47 records for IITA: <a href="https://dspacetest.cgiar.org/handle/10568/97029">https://dspacetest.cgiar.org/handle/10568/97029</a></li>
<li>From looking at the data in OpenRefine I found:
<ul>
<li>Errors in <code>cg.authorship.types</code></li>
<li>Errors in <code>cg.coverage.country</code> (smart quote in &ldquo;COTE DIVOIRE&rdquo;, &ldquo;HAWAII&rdquo; is not a country)</li>
<li>Whitespace issues in some `cg.contributor.affiliatio</li>
<li>Whitespace issues in some <code>cg.identifier.doi</code> fields and most values are using HTTP instead of HTTPS</li>
<li>Whitespace issues in some <code>dc.contributor.author</code> fields</li>
<li>Issue with invalid <code>dc.date.issued</code> value &ldquo;2011-3&rdquo;</li>
<li>Description fields are poorly copy-pasted</li>
<li>Whitespace issues in <code>dc.description.sponsorship</code></li>
<li>Lots of inconsistency in <code>dc.format.extent</code> (mixed dash style, periods at the end of values)</li>
<li>Whitespace errors in <code>dc.identifier.citation</code></li>
<li>Whitespace errors in <code>dc.subject</code></li>
<li>Whitespace errors in <code>dc.title</code></li>
</ul></li>
<li>After uploading and looking at the data in DSpace Test I saw more errors with CRPs, subjects (one item had four copies of all of its subjects, another had a &ldquo;.&rdquo; in it), affiliations, sponsors, etc.</li>
</ul>

@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2017-11/</loc>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<lastmod>2017-11-12T18:48:52+02:00</lastmod>
</url>
<url>
@ -134,7 +134,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<lastmod>2017-11-12T18:48:52+02:00</lastmod>
<priority>0</priority>
</url>
@ -145,7 +145,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<lastmod>2017-11-12T18:48:52+02:00</lastmod>
<priority>0</priority>
</url>
@ -157,13 +157,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<lastmod>2017-11-12T18:48:52+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<lastmod>2017-11-12T18:48:52+02:00</lastmod>
<priority>0</priority>
</url>