Update notes for 2017-08-01

2025-01-27 05:49:12 +01:00 · 2017-08-01 12:03:37 +03:00
parent e3e602881e
commit 5b11434f0f
30 changed files with 91 additions and 71 deletions
--- a/public/index.html
+++ b/public/index.html
@@ -119,6 +119,8 @@
 </ul></li>
 <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
 <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
+<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
+<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
 </ul>

 <p></p>