mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-11-26
@ -18,7 +18,7 @@ $ zstd statistics-2019.json
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-11/" />
<meta property="article:published_time" content="2021-11-02T22:27:07+02:00" />
<meta property="article:modified_time" content="2021-11-21T13:45:30+02:00" />
<meta property="article:modified_time" content="2021-11-22T16:47:50+02:00" />
@ -42,9 +42,9 @@ $ zstd statistics-2019.json
"@type": "BlogPosting",
"headline": "November, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-11/",
"wordCount": "1339",
"wordCount": "1604",
"datePublished": "2021-11-02T22:27:07+02:00",
"dateModified": "2021-11-21T13:45:30+02:00",
"dateModified": "2021-11-22T16:47:50+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -335,7 +335,50 @@ Purging 10893 hits from 87.203.87.141 in statistics
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
<h2 id="2021-11-23">2021-11-23</h2>
<ul>
<li>Help RTB colleagues with thumbnail issues on their <a href="https://hdl.handle.net/10568/114576">2020 Annual Report</a>
<ul>
<li>The PDF seems to be in landscape mode or something and the first page is half width, so the thumbnail renders with the left half being white</li>
<li>I generated a new one manually with libvips and it is better:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ vipsthumbnail AR<span style="color:#ae81ff">\ </span>RTB<span style="color:#ae81ff">\ </span>2020.pdf -s <span style="color:#ae81ff">600</span> -o <span style="color:#e6db74">'%s.jpg[Q=85,optimize_coding,strip]'</span>
</code></pre></div><ul>
<li>I sent an email to the OpenArchives.org contact to ask for help with the OAI validator
<ul>
<li>Someone responded to say that there have been a number of complaints about this on the oai-pmh mailing list recently…</li>
</ul>
</li>
<li>I sent an email to Pythagoras from GARDIAN to ask if they can use a more specific user agent than “Microsoft Internet Explorer” for their scraper
<ul>
<li>He said he will change the user agent</li>
</ul>
</li>
</ul>
<h2 id="2021-11-24">2021-11-24</h2>
<ul>
<li>I had an idea to check our Solr statistics for hits from all the IPs that I have listed in nginx as being bots
<ul>
<li>Other than a few that I ruled out that <em>may</em> be humans, these are all making requests within one month or with no user agent, which is highly suspicious:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ ./ilri/check-spider-ip-hits.sh -f /tmp/ips.txt
Found 8352 hits from 138.201.49.199 in statistics
Found 9374 hits from 78.46.89.18 in statistics
Found 2112 hits from 93.179.69.74 in statistics
Found 1 hits from 31.6.77.23 in statistics
Found 5 hits from 34.209.213.122 in statistics
Found 86772 hits from 163.172.68.99 in statistics
Found 77 hits from 163.172.70.248 in statistics
Found 15842 hits from 163.172.71.24 in statistics
Found 172954 hits from 104.154.216.0 in statistics
Found 3 hits from 188.134.31.88 in statistics
<span style="color:#960050;background-color:#1e0010">
</span><span style="color:#960050;background-color:#1e0010"></span>Total number of hits from bots: 295492
</code></pre></div><!-- raw HTML omitted -->
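As a cross-check on the output above, the per-IP lines can be summed and compared with the reported total. The following is only a minimal sketch in Python, not part of the original notes or of `check-spider-ip-hits.sh`: the `parse_hits` helper and the `HIT_LINE` pattern are hypothetical names, and the pattern assumes the `Found N hits from IP in statistics` line format shown in the console output.

```python
import re

# Assumed line format, taken from the console output above:
#   "Found <count> hits from <ip> in statistics"
HIT_LINE = re.compile(r"^Found (\d+) hits from (\S+) in statistics$")


def parse_hits(output: str) -> dict[str, int]:
    """Return a mapping of IP address to hit count (hypothetical helper)."""
    hits: dict[str, int] = {}
    for line in output.splitlines():
        match = HIT_LINE.match(line.strip())
        if match:
            hits[match.group(2)] = int(match.group(1))
    return hits


# Sample copied from the check-spider-ip-hits.sh output in the notes.
SAMPLE = """\
Found 8352 hits from 138.201.49.199 in statistics
Found 9374 hits from 78.46.89.18 in statistics
Found 2112 hits from 93.179.69.74 in statistics
Found 1 hits from 31.6.77.23 in statistics
Found 5 hits from 34.209.213.122 in statistics
Found 86772 hits from 163.172.68.99 in statistics
Found 77 hits from 163.172.70.248 in statistics
Found 15842 hits from 163.172.71.24 in statistics
Found 172954 hits from 104.154.216.0 in statistics
Found 3 hits from 188.134.31.88 in statistics

Total number of hits from bots: 295492
"""

hits = parse_hits(SAMPLE)
total = sum(hits.values())
busiest_ip = max(hits, key=hits.get)
print(f"{len(hits)} IPs, {total} hits, busiest: {busiest_ip}")
```

The sum of the ten per-IP counts agrees with the total printed by the script, and two addresses (104.154.216.0 and 163.172.68.99) account for most of the hits.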