mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-09-30
@ -25,7 +25,7 @@ I also fixed a few bugs and improved the region-matching logic
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-09/" />
|
||||
<meta property="article:published_time" content="2022-09-01T09:41:36+03:00" />
|
||||
<meta property="article:modified_time" content="2022-09-28T17:10:23+03:00" />
|
||||
<meta property="article:modified_time" content="2022-09-28T21:22:59+03:00" />
|
||||
|
||||
|
||||
|
||||
@ -46,7 +46,7 @@ I also fixed a few bugs and improved the region-matching logic
|
||||
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.104.1" />
|
||||
<meta name="generator" content="Hugo 0.104.2" />
|
||||
|
||||
|
||||
|
||||
@ -56,9 +56,9 @@ I also fixed a few bugs and improved the region-matching logic
|
||||
"@type": "BlogPosting",
|
||||
"headline": "September, 2022",
|
||||
"url": "https://alanorth.github.io/cgspace-notes/2022-09/",
|
||||
"wordCount": "3112",
|
||||
"wordCount": "3621",
|
||||
"datePublished": "2022-09-01T09:41:36+03:00",
|
||||
"dateModified": "2022-09-28T17:10:23+03:00",
|
||||
"dateModified": "2022-09-28T21:22:59+03:00",
|
||||
"author": {
|
||||
"@type": "Person",
|
||||
"name": "Alan Orth"
|
||||
@ -685,7 +685,84 @@ harvesting of meat from wildlife and not from livestock.</p>
|
||||
</span></span><span style="display:flex;"><span>Fixed 3 occurences of: Alessandra Galie: 0000-0001-9868-7733
|
||||
</span></span><span style="display:flex;"><span>Fixed 1 occurences of: Amanda De Filippo: 0000-0002-1536-3221
|
||||
</span></span><span style="display:flex;"><span>...
|
||||
</span></span></code></pre></div><!-- raw HTML omitted -->
|
||||
</span></span></code></pre></div><h2 id="2022-09-29">2022-09-29</h2>
|
||||
<ul>
|
||||
<li>I’ve been checking the size of the nginx proxy cache the last few days and it always seems to hover around 14,000 entries and 385MB:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># find /var/cache/nginx/rest_cache/ -type f | wc -l
|
||||
</span></span><span style="display:flex;"><span>14202
|
||||
</span></span><span style="display:flex;"><span># du -sh /var/cache/nginx/rest_cache
|
||||
</span></span><span style="display:flex;"><span>384M /var/cache/nginx/rest_cache
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>Also on that note I’m trying to implement a workaround for a potential caching issue that causes MEL to not be able to update items on DSpace Test
|
||||
<ul>
|
||||
<li>I <em>think</em> we might need to allow requests with a JSESSIONID to bypass the cache, but I have to verify with Salem</li>
|
||||
<li>We can do this with an nginx map:</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># Check <span style="color:#66d9ef">if</span> the JSESSIONID cookie is present and contains a 32-character hex
|
||||
</span></span><span style="display:flex;"><span># value, which would mean that a user is actively attempting to re-use their
|
||||
</span></span><span style="display:flex;"><span># Tomcat session. Then we set the $active_user_session variable and use it
|
||||
</span></span><span style="display:flex;"><span># to bypass the nginx proxy cache in REST requests.
|
||||
</span></span><span style="display:flex;"><span>map $cookie_jsessionid $active_user_session {
|
||||
</span></span><span style="display:flex;"><span> # an empty value means the proxy cache is not bypassed for this request
|
||||
</span></span><span style="display:flex;"><span> # see: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_bypass
|
||||
</span></span><span style="display:flex;"><span> default '';
|
||||
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
|
||||
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span> '~[A-Z0-9]{32}' 1;
|
||||
</span></span><span style="display:flex;"><span>}
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>Then in the location block where we do the proxy cache:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span> # Don't cache when user Shift-refreshes (Cache-Control: no-cache) or
|
||||
</span></span><span style="display:flex;"><span> # when a client has an active session (see the $cookie_jsessionid map).
|
||||
</span></span><span style="display:flex;"><span> proxy_cache_bypass $http_cache_control $active_user_session;
|
||||
</span></span><span style="display:flex;"><span> proxy_no_cache $http_cache_control $active_user_session;
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>I found one client making 10,000 requests using a Windows 98 user agent:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>Mozilla/4.0 (compatible; MSIE 5.00; Windows 98)
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>They all come from one IP address (129.227.149.43) in Hong Kong
|
||||
<ul>
|
||||
<li>The IP belongs to a hosting provider called Zenlayer</li>
|
||||
<li>I will add this IP to the nginx bot networks and purge its hits</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ ./ilri/check-spider-ip-hits.sh -f /tmp/ip -p
|
||||
</span></span><span style="display:flex;"><span>Purging 33027 hits from 129.227.149.43 in statistics
|
||||
</span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010">
|
||||
</span></span></span><span style="display:flex;"><span><span style="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 33027
|
||||
</span></span></code></pre></div><ul>
|
||||
<li>So it seems we’ve seen this bot before and the total number is much higher than the 10,000 this month</li>
|
||||
<li>I had a call with Salem and we verified that the nginx cache bypass for clients who provide a JSESSIONID fixes their issue with updating items/bitstreams from MEL
|
||||
<ul>
|
||||
<li>The issue was that they delete all metadata and bitstreams, then add them again to make sure everything is up to date. During that process they also re-request the item with all expands to get the bitstreams, and that response ends up getting cached, so they then try to delete the old bitstream</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>I also noticed that someone made a <a href="https://github.com/DSpace/DSpace/pull/8343">pull request to enable POSTing bitstreams to a particular bundle</a> and it works, so that’s awesome!</li>
|
||||
</ul>
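<p>To make the cache bypass logic above easier to reason about, here is a minimal Python sketch of what the nginx map and the two proxy directives do together. It is only an illustration, not the production configuration: the names <code>active_user_session</code>, <code>should_bypass</code> and <code>TinyCache</code> are hypothetical, and the in-memory dictionary stands in for the real proxy cache.</p>

```python
import re

# Same pattern as the nginx map: a case-sensitive, unanchored match for
# 32 characters in the range A-Z or 0-9 in the JSESSIONID cookie value.
SESSION_RE = re.compile(r"[A-Z0-9]{32}")


def active_user_session(jsessionid: str) -> str:
    """Mimic the map: '1' when the cookie looks like a Tomcat session, else ''."""
    return "1" if SESSION_RE.search(jsessionid or "") else ""


def should_bypass(cache_control: str, jsessionid: str) -> bool:
    """nginx skips the cache if any parameter is non-empty and not '0'."""
    values = (cache_control, active_user_session(jsessionid))
    return any(value not in ("", "0") for value in values)


class TinyCache:
    """Hypothetical stand-in for the proxy cache in front of the REST API."""

    def __init__(self, backend):
        self.backend = backend  # callable: path -> current response
        self.store = {}

    def get(self, path, cache_control="", jsessionid=""):
        if should_bypass(cache_control, jsessionid):
            # proxy_cache_bypass + proxy_no_cache: fetch fresh, do not store
            return self.backend(path)
        if path not in self.store:
            self.store[path] = self.backend(path)
        return self.store[path]
```

<p>With this sketch, an anonymous client keeps receiving the cached bitstream list after the item changes, while a request that carries a 32-character session ID always reaches the backend. That mirrors the behavior we verified with Salem for MEL.</p>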
|
||||
<h2 id="2022-09-30">2022-09-30</h2>
|
||||
<ul>
|
||||
<li>I applied <a href="https://github.com/DSpace/DSpace/pull/8343">the patch for POSTing bitstreams to other bundles</a> on CGSpace</li>
|
||||
<li>Testing a few other DSpace 6.4 patches on DSpace Test:
|
||||
<ul>
|
||||
<li><a href="https://github.com/DSpace/DSpace/pull/1901">DS-3791 Make sure the “yearDifference” takes into account that a gap of 10 year contains 11 years</a></li>
|
||||
<li><a href="https://github.com/DSpace/DSpace/pull/2501">DS-3873 Limit the usage of PDFBoxThumbnail to PDFs</a></li>
|
||||
<li><a href="https://github.com/DSpace/DSpace/pull/2161">Reduce itemCounter init</a></li>
|
||||
<li><a href="https://github.com/DSpace/DSpace/pull/2201">ImageMagick: Only execute “identify” on first page</a></li>
|
||||
<li><a href="https://github.com/DSpace/DSpace/pull/2371">DS-3881: Show no total results on search-filter</a></li>
|
||||
<li><a href="https://github.com/DSpace/DSpace/pull/2699">pass value instead of qualifier to method</a></li>
|
||||
<li><a href="https://github.com/DSpace/DSpace/pull/7993">dspace-api: check for null AND empty qualifier in findByElement()</a></li>
|
||||
<li><a href="https://github.com/DSpace/DSpace/pull/7995">Avoid exporting mapped Item more than once</a></li>
|
||||
<li><a href="https://github.com/DSpace/DSpace/pull/3162">[DS-4574] v. 6 - Upgrade DBCP2 dependency</a></li>
|
||||
<li><a href="https://github.com/DSpace/DSpace/pull/2742">bump up pdfbox version on 6.x to match main branch</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<!-- raw HTML omitted -->