Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2025-01-27 05:49:12 +01:00)
Update notes for 2017-11-07
@@ -377,7 +377,7 @@ $ grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=104.196.152.243' dspace.log.2017-
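The grep in the hunk header pulls `session_id=<id>:ip_addr=<ip>` pairs out of a DSpace log. A minimal bash sketch for counting how many distinct sessions a single IP opened might look like this. `count_sessions` is a hypothetical helper, and it assumes every relevant log line carries the `session_id=<32 chars>:ip_addr=<ip>` pattern matched by that grep.

```shell
# Sketch: count distinct DSpace session IDs created by one IP address.
# Assumes log lines contain "session_id=<32 chars>:ip_addr=<ip>" as matched
# by the grep above; count_sessions is a hypothetical helper name.
count_sessions() {
  local ip="$1"; shift
  local ip_re
  # Escape dots so the IP matches literally inside the extended regex
  ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
  # Require a non-IP character (or end of line) after the address so that
  # 104.196.152.24 does not also match 104.196.152.243
  grep -Iho -E "session_id=[A-Z0-9]{32}:ip_addr=${ip_re}([^0-9.]|\$)" "$@" \
    | cut -d: -f1 \
    | sort -u \
    | wc -l
}
```

For example, `count_sessions 104.196.152.243 dspace.log.*` prints a single number. Cutting each match down to its `session_id=` field before `sort -u` keeps one entry per session even when log lines differ in what follows the IP.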
- I emailed CIAT about the session issue and the user agent issue, and told them they should use the REST API instead of scraping the HTML contents of communities
- About Baidu, I found a link to their [robots.txt tester tool](http://ziyuan.baidu.com/robots/)
- It seems like our robots.txt file is valid, and they claim to recognize that URLs like `/discover` should be forbidden:
- It seems like our robots.txt file is valid, and they claim to recognize that URLs like `/discover` should be forbidden (不允许, aka "not allowed"):
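Rather than scraping community HTML, a client can ask the REST API for JSON. The sketch below assumes CGSpace serves the DSpace 5-style REST API under `https://cgspace.cgiar.org/rest` with `limit`/`offset` paging on `/communities`; `communities_url` and `fetch_communities` are hypothetical helper names, not part of DSpace.

```shell
# Sketch: list communities through the DSpace REST API instead of scraping HTML.
# Assumes a DSpace 5-style REST API under https://cgspace.cgiar.org/rest that
# accepts limit/offset paging; both function names are hypothetical.
communities_url() {
  local base="${1:-https://cgspace.cgiar.org/rest}"
  local limit="${2:-100}"
  local offset="${3:-0}"
  printf '%s/communities?limit=%s&offset=%s\n' "$base" "$limit" "$offset"
}

# Fetch one page of communities as JSON (needs network access)
fetch_communities() {
  curl -s -H 'Accept: application/json' "$(communities_url "$@")"
}
```

Under that paging assumption, `fetch_communities https://cgspace.cgiar.org/rest 50 50` would request the second page of 50 communities.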

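Baidu's tester shows how their crawler reads the file. A rough local check of the same `/discover` rule could look like the following. It is a minimal sketch that only looks at the `User-agent: *` group and does plain prefix matching (no wildcards, no `Allow` lines), and `disallowed_for_all` is a hypothetical helper.

```shell
# Sketch: does a robots.txt file disallow a path for the "User-agent: *" group?
# Plain prefix matching only: ignores wildcards, Allow lines and agent-specific
# groups, so it is a sanity check, not a full robots.txt parser.
# disallowed_for_all is a hypothetical helper name.
disallowed_for_all() {
  local file="$1" path="$2"
  awk -v path="$path" '
    {
      sub(/\r$/, "")          # tolerate CRLF line endings
      line = $0
      sub(/#.*/, "", line)    # strip comments
    }
    tolower(line) ~ /^[ \t]*user-agent[ \t]*:/ {
      # Consecutive User-agent lines form one group; a new group starts after rules
      if (!in_agents) star = 0
      in_agents = 1
      agent = line
      sub(/^[^:]*:[ \t]*/, "", agent)
      sub(/[ \t]+$/, "", agent)
      if (agent == "*") star = 1
      next
    }
    tolower(line) ~ /^[ \t]*disallow[ \t]*:/ {
      in_agents = 0
      rule = line
      sub(/^[^:]*:[ \t]*/, "", rule)
      sub(/[ \t]+$/, "", rule)
      # An empty Disallow allows everything; otherwise match as a path prefix
      if (star && rule != "" && index(path, rule) == 1) found = 1
      next
    }
    line ~ /[^ \t]/ { in_agents = 0 }   # any other directive ends the agent list
    END { exit (found ? 0 : 1) }
  ' "$file"
}
```

For example, after `curl -s -o robots.txt https://cgspace.cgiar.org/robots.txt`, running `disallowed_for_all robots.txt /discover && echo blocked` should print `blocked` if a matching `Disallow` rule is present in the `*` group.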