mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Update notes for 2020-08-10
@ -350,4 +350,22 @@ $ wc -l /tmp/2020-08-09-orcid-identifiers-uniq.csv
1949 /tmp/2020-08-09-orcid-identifiers-uniq.csv
```

- I looked into the strange Solr record above that had "{set=830}" in the communities and collections
  - There are exactly 11,724 records like this in the current CGSpace (DSpace 5.8) statistics-2018 Solr core
  - None of them have an `id` or `type` field!
  - I see 242,000 of them in the statistics-2017 core, 185,063 in the statistics-2016 core... all the way to 2010, but not in 2019 or the current statistics core
  - I decided to purge all of these records from CGSpace right now so they don't even have a chance at being an issue on the real migration:

```
$ curl -s "http://localhost:8081/solr/statistics-2018/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>owningColl:/.*set.*/</query></delete>'
...
$ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>owningColl:/.*set.*/</query></delete>'
```
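
The same purge could be scripted as a loop rather than repeated by hand. This is only a sketch: it assumes the yearly cores are named `statistics-2010` through `statistics-2018`, as the notes above suggest. `SOLR_URL`, `DRY_RUN`, and `purge_core` are hypothetical names introduced here, not part of DSpace or Solr.

```shell
#!/usr/bin/env bash
# Sketch: send the same delete-by-query to every yearly statistics core.
# Assumption: cores are named statistics-YYYY for 2010 through 2018.
# DRY_RUN (hypothetical switch) defaults to 1, so nothing is deleted
# unless it is explicitly set to 0.
set -euo pipefail

SOLR_URL="${SOLR_URL:-http://localhost:8081/solr}"
DRY_RUN="${DRY_RUN:-1}"
DELETE_QUERY='<delete><query>owningColl:/.*set.*/</query></delete>'

purge_core() {
  local core="$1"
  local url="${SOLR_URL}/${core}/update?softCommit=true"
  if [ "$DRY_RUN" = "1" ]; then
    # Only print the request that would be sent
    echo "POST ${url} ${DELETE_QUERY}"
  else
    # Same request as the manual curl commands above
    curl -s "$url" -H "Content-Type: text/xml" --data-binary "$DELETE_QUERY"
  fi
}

# Walk the yearly cores from newest to oldest
for year in $(seq 2018 -1 2010); do
  purge_core "statistics-${year}"
done
```

Running it as-is only prints the nine requests. Setting `DRY_RUN=0` would actually send them, so it is worth checking the printed URLs first.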

- I added `Googlebot` and `Twitterbot` to the list of explicitly allowed bots
  - In Google's case, they were getting lumped in with the bad bots, so important links like the sitemaps were returning HTTP 503; they generally respect `robots.txt`, so we should just allow them (perhaps we can control the crawl rate in the webmaster console)
  - In Twitter's case, they were also getting lumped in with the bad bots, but really they only make about 50 requests a day when someone posts a CGSpace link on Twitter
- I tagged the ISO 3166-1 alpha-2 country codes on all items on CGSpace using my [CountryCodeTagger](https://github.com/ilri/cgspace-java-helpers) curation task
  - I still need to set up a cron job for it...

<!-- vim: set sw=2 ts=2: -->