mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-12-23 13:34:32 +01:00
Add notes for 2017-11-13
parent
41bdd24079
commit
e77e3a13ae
@@ -596,3 +596,30 @@ Server: nginx

- The first request works, second is denied with an HTTP 503!
- I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them

## 2017-11-13

- Just a few hours into the day and it really looks like the Baidu rate limiting is working, HTTP 200 vs 503:

```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 200 "
508
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 503 "
5462
```
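
The two `grep -c` counts above match ` 200 ` or ` 503 ` anywhere in the line, so a byte count or a URL fragment could in principle be counted as a status. A minimal sketch of a stricter tally follows. It assumes the logs use nginx's default `combined` format, and `count_statuses` is a hypothetical helper name, not something from these notes:

```python
import re
from collections import Counter

# Assumption: nginx "combined" log format, i.e.
#   addr - user [day:time zone] "request" status bytes "referer" "user agent"
# Adjust the pattern if the server uses a custom log_format.
LINE_RE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_statuses(lines, day, agent_substring):
    """Count HTTP status codes for one day and one user-agent substring."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like combined-format entries
        if match.group("day") == day and agent_substring in match.group("agent"):
            counts[match.group("status")] += 1
    return counts
```

Calling `count_statuses(open_log_file, "13/Nov/2017", "Baiduspider")` on each log file and adding the resulting counters should give the same kind of 200 vs 503 comparison, keyed on the actual status field.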

- Helping Sisay proof 47 records for IITA: https://dspacetest.cgiar.org/handle/10568/97029
- From looking at the data in OpenRefine I found:
  - Errors in `cg.authorship.types`
  - Errors in `cg.coverage.country` (smart quote in "COTE D’IVOIRE", "HAWAII" is not a country)
  - Whitespace issues in some `cg.contributor.affiliatio
  - Whitespace issues in some `cg.identifier.doi` fields and most values are using HTTP instead of HTTPS
  - Whitespace issues in some `dc.contributor.author` fields
  - Issue with invalid `dc.date.issued` value "2011-3"
  - Description fields are poorly copy–pasted
  - Whitespace issues in `dc.description.sponsorship`
  - Lots of inconsistency in `dc.format.extent` (mixed dash style, periods at the end of values)
  - Whitespace errors in `dc.identifier.citation`
  - Whitespace errors in `dc.subject`
  - Whitespace errors in `dc.title`
- After uploading and looking at the data in DSpace Test I saw more errors with CRPs, subjects (one item had four copies of all of its subjects, another had a "." in it), affiliations, sponsors, etc.
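
Several of the problems listed above (stray whitespace, the smart quote, HTTP DOIs, the "2011-3" date) are mechanical enough to check in code. The following is a rough sketch only, not the OpenRefine workflow used here. All function names are hypothetical, and it assumes that the wanted fixes are a straight apostrophe, an `https://` prefix, and ISO 8601 style dates (`YYYY`, `YYYY-MM` or `YYYY-MM-DD`):

```python
import re


def normalize_whitespace(value):
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def straighten_quotes(value):
    """Replace typographic single quotes with a plain apostrophe."""
    return value.replace("\u2018", "'").replace("\u2019", "'")


def force_https(value):
    """Trim the value and upgrade a leading http:// to https://."""
    return re.sub(r"^http://", "https://", value.strip())


# Assumption: issued dates should be YYYY, YYYY-MM or YYYY-MM-DD.
DATE_RE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


def is_valid_issued_date(value):
    """Return True when the value has one of the accepted date shapes."""
    return bool(DATE_RE.match(value))
```

With these, `is_valid_issued_date("2011-3")` is false while `"2011-03"` passes, which matches the `dc.date.issued` problem noted in the list. The pattern checks only the shape of the date, so an impossible day such as "2011-02-31" would still pass.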

@@ -38,7 +38,7 @@ COPY 54701

<meta property="article:published_time" content="2017-11-02T09:37:54+02:00"/>

<meta property="article:modified_time" content="2017-11-12T10:41:44+02:00"/>
<meta property="article:modified_time" content="2017-11-12T18:48:52+02:00"/>

@@ -86,9 +86,9 @@ COPY 54701
"@type": "BlogPosting",
"headline": "November, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-11/",
"wordCount": "3351",
"wordCount": "3544",
"datePublished": "2017-11-02T09:37:54+02:00",
"dateModified": "2017-11-12T10:41:44+02:00",
"dateModified": "2017-11-12T18:48:52+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -812,6 +812,39 @@ Server: nginx
<li>I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them</li>
</ul>

<h2 id="2017-11-13">2017-11-13</h2>

<ul>
<li>Just a few hours into the day and it really looks like the Baidu rate limiting is working, HTTP 200 vs 503:</li>
</ul>

<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 200 "
508
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 503 "
5462
</code></pre>

<ul>
<li>Helping Sisay proof 47 records for IITA: <a href="https://dspacetest.cgiar.org/handle/10568/97029">https://dspacetest.cgiar.org/handle/10568/97029</a></li>
<li>From looking at the data in OpenRefine I found:

<ul>
<li>Errors in <code>cg.authorship.types</code></li>
<li>Errors in <code>cg.coverage.country</code> (smart quote in “COTE D’IVOIRE”, “HAWAII” is not a country)</li>
<li>Whitespace issues in some `cg.contributor.affiliatio</li>
<li>Whitespace issues in some <code>cg.identifier.doi</code> fields and most values are using HTTP instead of HTTPS</li>
<li>Whitespace issues in some <code>dc.contributor.author</code> fields</li>
<li>Issue with invalid <code>dc.date.issued</code> value “2011-3”</li>
<li>Description fields are poorly copy–pasted</li>
<li>Whitespace issues in <code>dc.description.sponsorship</code></li>
<li>Lots of inconsistency in <code>dc.format.extent</code> (mixed dash style, periods at the end of values)</li>
<li>Whitespace errors in <code>dc.identifier.citation</code></li>
<li>Whitespace errors in <code>dc.subject</code></li>
<li>Whitespace errors in <code>dc.title</code></li>
</ul></li>
<li>After uploading and looking at the data in DSpace Test I saw more errors with CRPs, subjects (one item had four copies of all of its subjects, another had a “.” in it), affiliations, sponsors, etc.</li>
</ul>

@@ -4,7 +4,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2017-11/</loc>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<lastmod>2017-11-12T18:48:52+02:00</lastmod>
</url>

<url>
@@ -134,7 +134,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<lastmod>2017-11-12T18:48:52+02:00</lastmod>
<priority>0</priority>
</url>

@@ -145,7 +145,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<lastmod>2017-11-12T18:48:52+02:00</lastmod>
<priority>0</priority>
</url>

@@ -157,13 +157,13 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<lastmod>2017-11-12T18:48:52+02:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2017-11-12T10:41:44+02:00</lastmod>
<lastmod>2017-11-12T18:48:52+02:00</lastmod>
<priority>0</priority>
</url>