Add notes for 2017-11-13

2017-11-13 12:04:41 +02:00
parent 41bdd24079
commit e77e3a13ae
3 changed files with 68 additions and 8 deletions

@@ -596,3 +596,30 @@ Server: nginx
- The first request works, second is denied with an HTTP 503!
- I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them
## 2017-11-13
- Just a few hours into the day and it really looks like the Baidu rate limiting is working, HTTP 200 vs 503:
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 200 "
508
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 503 "
5462
```
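- A rough one-pass alternative to the two `grep -c` runs above, as a sketch only: it assumes nginx's default `combined` log format (status code in the ninth whitespace-separated field), and `count_statuses` is a hypothetical helper name, not something that exists on the server:

```shell
# Tally HTTP status codes for one user agent on one day in a single pass.
# Assumption: nginx "combined" log format, so the status code is field 9.
# count_statuses is a hypothetical helper name used only for illustration.
count_statuses() {
    day="$1"; agent="$2"; shift 2
    # Reads the given log files, or stdin when no files are passed.
    awk -v day="$day" -v agent="$agent" '
        index($0, day) && index($0, agent) { count[$9]++ }
        END { for (status in count) print status, count[status] }
    ' "$@" | sort
}

# Example invocation (same log files as above):
# count_statuses "13/Nov/2017" "Baiduspider" /var/log/nginx/access.log /var/log/nginx/access.log.1
```

- Going by the two counts above (508 vs 5462), roughly 91% of Baiduspider requests so far today were answered with HTTP 503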
- Helping Sisay proof 47 records for IITA: https://dspacetest.cgiar.org/handle/10568/97029
- From looking at the data in OpenRefine I found:
  - Errors in `cg.authorship.types`
  - Errors in `cg.coverage.country` (smart quote in "COTE DIVOIRE", "HAWAII" is not a country)
  - Whitespace issues in some `cg.contributor.affiliation` fields
  - Whitespace issues in some `cg.identifier.doi` fields and most values are using HTTP instead of HTTPS
  - Whitespace issues in some `dc.contributor.author` fields
  - Issue with invalid `dc.date.issued` value "2011-3"
  - Description fields are poorly copy-pasted
  - Whitespace issues in `dc.description.sponsorship`
  - Lots of inconsistency in `dc.format.extent` (mixed dash style, periods at the end of values)
  - Whitespace errors in `dc.identifier.citation`
  - Whitespace errors in `dc.subject`
  - Whitespace errors in `dc.title`
- After uploading and looking at the data in DSpace Test I saw more errors with CRPs, subjects (one item had four copies of all of its subjects, another had a "." in it), affiliations, sponsors, etc.
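- Several of the OpenRefine findings above (stray whitespace, HTTP DOIs, the "2011-3" date) are mechanical enough to check outside OpenRefine too. A minimal sketch, where the function names are hypothetical, the DOI in the usage line is a made-up placeholder, and the date check assumes `dc.date.issued` should be `YYYY`, `YYYY-MM`, or `YYYY-MM-DD`:

```shell
# Hypothetical helpers for illustration; each filter reads one value per line on stdin.

# Collapse runs of whitespace to a single space and trim both ends.
normalize_whitespace() {
    sed -E 's/[[:space:]]+/ /g; s/^ //; s/ $//'
}

# Rewrite HTTP DOI links to HTTPS (only for doi.org and dx.doi.org hosts).
doi_to_https() {
    sed -E 's#^http://((dx\.)?doi\.org/)#https://\1#'
}

# Succeed only for YYYY, YYYY-MM, or YYYY-MM-DD (assumed format); "2011-3" fails.
is_valid_issue_date() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12][0-9]|3[01]))?)?$'
}

# Usage (placeholder DOI):
# printf ' http://dx.doi.org/10.1000/example \n' | normalize_whitespace | doi_to_https
# is_valid_issue_date "2011-3" || echo "invalid date"
```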