Update notes for 2016-11-14
- I applied Atmire's suggestions to fix Listings and Reports for DSpace 5.5 and now it works
- There were some issues with the `dspace/modules/jspui/pom.xml`, which is annoying because all I did was rebase our working 5.1 code on top of 5.5, meaning Atmire's installation procedure must have changed
- So there is apparently a native Tomcat way to limit web crawlers to one session: [Crawler Session Manager Valve](https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve)
- After adding that to `server.xml`, bots matching the pattern in the configuration will all use ONE session, just like normal users, as the sketch and test below show
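
A minimal sketch of what the valve declaration could look like inside the `<Host>` element of `server.xml` (the class name is Tomcat's; the `crawlerUserAgents` value shown here just mirrors the documented default, so the attribute is optional):

```
<!-- Force all requests from matching crawler user agents onto a single shared session -->
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*" />
```

Two consecutive requests with a Googlebot user agent confirm the behavior: the first response sets a `JSESSIONID` cookie, and the second silently reuses that session (no new `Set-Cookie`):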
```
$ http --print h https://dspacetest.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Language: en-US
Content-Type: text/html;charset=utf-8
Date: Mon, 14 Nov 2016 19:47:29 GMT
Server: nginx
Set-Cookie: JSESSIONID=323694E079A53D5D024F839290EDD7E8; Path=/; Secure; HttpOnly
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Cocoon-Version: 2.2.0
X-Robots-Tag: none

$ http --print h https://dspacetest.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Language: en-US
Content-Type: text/html;charset=utf-8
Date: Mon, 14 Nov 2016 19:47:35 GMT
Server: nginx
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Cocoon-Version: 2.2.0
```
- This means that when Google or Baidu slam you with tens of concurrent connections they will all map to ONE internal session, which saves RAM! (One way to verify this is sketched below)
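
One way to sanity-check this (a sketch with assumptions: Tomcat's manager app deployed with its JMXProxy servlet, a user with the `manager-jmx` role, and the webapp running at the root context) is to watch the `activeSessions` attribute of the context's Manager MBean while a crawler is hitting the site:

```
# Query Tomcat's JMXProxy for the active session count of the ROOT context
# (port, credentials, and context path are assumptions; the count is illustrative)
$ curl -s -u admin:secret 'http://localhost:8080/manager/jmxproxy/?get=Catalina:type=Manager,context=/,host=localhost&att=activeSessions'
OK - Attribute get 'Catalina:type=Manager,context=/,host=localhost' - activeSessions = 42
```

With the valve in place the count should stay roughly flat during a crawl burst, instead of climbing by one session per request.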