Update notes for 2017-11-07

- I emailed CIAT about the session issue, user agent issue, and told them they should not scrape the HTML contents of communities, instead using the REST API
- About Baidu, I found a link to their [robots.txt tester tool](http://ziyuan.baidu.com/robots/)
- It seems like our robots.txt file is valid, and their tool claims to recognize that URLs like `/discover` should be forbidden (不允许, "not allowed"):
![Baidu robots.txt tester](/cgspace-notes/2017/11/baidu-robotstxt.png)
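For reference, the kind of rule Baidu's tester is validating looks like this (a minimal sketch; the actual CGSpace robots.txt contains more rules than shown here):

```
# Keep crawlers out of the faceted discovery pages, which generate
# a near-infinite number of URL combinations and waste crawl budget
User-agent: *
Disallow: /discover
```

Blocking `/discover` matters because each combination of search facets produces a distinct URL, so a crawler that follows them can hammer the server indefinitely without indexing anything useful.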