mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2017-11-13
@@ -596,3 +596,30 @@ Server: nginx
- The first request works, second is denied with an HTTP 503!
- I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them
## 2017-11-13
- Just a few hours into the day and it really looks like the Baidu rate limiting is working, HTTP 200 vs 503:
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 200 "
508
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "13/Nov/2017" | grep "Baiduspider" | grep -c " 503 "
5462
```
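With the counts above, 5462 of 5970 Baiduspider requests (about 91.5%) were rejected with a 503. The two greps can also be folded into one pass that tallies every status code. This is only a sketch: `count_statuses` is a hypothetical helper, and it assumes the nginx "combined" log format, where the status code is the ninth whitespace-separated field. That may not match the actual log format on this server.

```shell
# Hypothetical helper: tally HTTP status codes for one user agent on one day.
# Assumption: nginx "combined" log format, so the status code is field 9.
count_statuses() {
    day="$1"
    agent="$2"
    shift 2
    # Remaining arguments are the log files to read.
    cat "$@" \
        | grep -F "$day" \
        | grep -F "$agent" \
        | awk '{ n[$9]++ } END { for (s in n) print s, n[s] }' \
        | sort
}

# Example call (paths taken from the commands above):
# count_statuses "13/Nov/2017" "Baiduspider" /var/log/nginx/access.log /var/log/nginx/access.log.1
```

Matching on the status field rather than on the string " 200 " avoids counting lines where the same digits happen to appear as the response size.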
- Helping Sisay proof 47 records for IITA: https://dspacetest.cgiar.org/handle/10568/97029
- From looking at the data in OpenRefine I found:
  - Errors in `cg.authorship.types`
  - Errors in `cg.coverage.country` (smart quote in "COTE D’IVOIRE", "HAWAII" is not a country)
  - Whitespace issues in some `cg.contributor.affiliatio
  - Whitespace issues in some `cg.identifier.doi` fields and most values are using HTTP instead of HTTPS
  - Whitespace issues in some `dc.contributor.author` fields
  - Issue with invalid `dc.date.issued` value "2011-3"
  - Description fields are poorly copy–pasted
  - Whitespace issues in `dc.description.sponsorship`
  - Lots of inconsistency in `dc.format.extent` (mixed dash style, periods at the end of values)
  - Whitespace errors in `dc.identifier.citation`
  - Whitespace errors in `dc.subject`
  - Whitespace errors in `dc.title`
- After uploading and looking at the data in DSpace Test I saw more errors with CRPs, subjects (one item had four copies of all of its subjects, another had a "." in it), affiliations, sponsors, etc.
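Several of the problems listed above are mechanical (stray whitespace, HTTP DOIs, a date with an unpadded month), so they could in principle be caught before upload. The notes describe doing this review in OpenRefine. The following is only a shell sketch of equivalent checks, not the method actually used. All three function names are hypothetical, the DOI rewrite assumes values are stored as full URLs, and the date check assumes `dc.date.issued` should be `YYYY`, `YYYY-MM`, or `YYYY-MM-DD`.

```shell
# Hypothetical helper: trim leading/trailing whitespace and collapse
# internal runs of whitespace to a single space (one value per line on stdin).
normalize_ws() {
    sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//'
}

# Hypothetical helper: rewrite DOI URLs that start with http:// to https://.
# Assumption: each value is a full URL, one per line on stdin.
doi_to_https() {
    sed -e 's|^http://|https://|'
}

# Hypothetical helper: succeed only if the argument looks like
# YYYY, YYYY-MM, or YYYY-MM-DD with zero-padded month and day.
is_valid_issued() {
    printf '%s\n' "$1" \
        | grep -Eq '^[0-9]{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12][0-9]|3[01]))?)?$'
}

# Example: flag the value mentioned in the notes.
if ! is_valid_issued "2011-3"; then
    echo "invalid dc.date.issued: 2011-3"
fi
```

The whitespace helper only handles characters in the POSIX `[[:space:]]` class, so other invisible characters would need a separate check.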