Update notes for 2016-11-14

This commit is contained in:
Alan Orth 2016-11-14 21:48:55 +02:00
parent 13a43b792d
commit 006d0c8d6f
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
5 changed files with 165 additions and 1 deletions


@ -222,3 +222,35 @@ $ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X P
- I applied Atmire's suggestions to fix Listings and Reports for DSpace 5.5 and now it works
- There were some issues with the `dspace/modules/jspui/pom.xml`, which is annoying because all I did was rebase our working 5.1 code on top of 5.5, meaning Atmire's installation procedure must have changed
- It turns out Tomcat has a native way to limit web crawlers to one session: the [Crawler Session Manager Valve](https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve)
- After adding that to `server.xml`, bots matching the pattern in the configuration will all share ONE session, just like normal users:
```
$ http --print h https://dspacetest.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Language: en-US
Content-Type: text/html;charset=utf-8
Date: Mon, 14 Nov 2016 19:47:29 GMT
Server: nginx
Set-Cookie: JSESSIONID=323694E079A53D5D024F839290EDD7E8; Path=/; Secure; HttpOnly
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Cocoon-Version: 2.2.0
X-Robots-Tag: none
$ http --print h https://dspacetest.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Language: en-US
Content-Type: text/html;charset=utf-8
Date: Mon, 14 Nov 2016 19:47:35 GMT
Server: nginx
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Cocoon-Version: 2.2.0
```
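For reference, the valve goes inside the `<Host>` element of `server.xml`. A minimal sketch — the `crawlerUserAgents` regex and `sessionInactiveInterval` shown here are the valve's documented defaults, not necessarily what we deployed:

```
<!-- Collapse all matching crawlers onto a single shared session.
     crawlerUserAgents and sessionInactiveInterval shown are the
     Tomcat defaults; tune them for your own bot traffic. -->
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*"
       sessionInactiveInterval="60" />
```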
- This means that when Google or Baidu slam you with tens of concurrent connections, they will all map to ONE internal session, which saves RAM!
