mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-22 06:35:03 +01:00)
Update notes for 2020-09-10
parent 9d0f0cbfde
commit 7b3aa58055
@@ -190,5 +190,20 @@ Would fix 3 occurences of: SOUTHWEST ASIA
 
   - I think we need to wait for the web team, though, as they need to update their mappings
   - Not to mention that we'll need to give WLE and CCAFS time to update their harvesters as well... hmmm
+- Looking at the top user agents active on CGSpace in 2020-08 and I see:
+  - `Delphi 2009`: 235353 (this is GARDIAN harvester I guess, as the IP is in Greece)
+  - `Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)`: 57004 (IP is 18.196.100.94, and the requests seem to be for CTA's content)
+  - `RTB website BOT`: 12282
+  - `ILRI Livestock Website Publications importer BOT`: 9393
+- Shit, I meant to add Delphi to the DSpace spider agents list last month but I guess I didn't commit the change
+- HTTrack is in the agents list so I'm not sure why DSpace registers a hit from that request
+- Also, I am surprised to see the RTB and ILRI bots here because they have "BOT" in the name and that should also be dropped
+- I also see hits from `curl` and `Java/1.8.0_66` and `Apache-HttpClient` so WTF... those are supposed to be dropped by the default agents list
+- Some IP `2607:f298:5:101d:f816:3eff:fed9:a484` made 9,000 requests with the `RI/1.0` user agent this year...
+  - That's on DreamHost...?
+- I purged 448658 hits from these agents and added `Delphi` to our local agents overload for Solr as well as Tomcat's Crawler Session Manager Valve so that it forces them to re-use a single session
+- I made a pull request on the COUNTER-Robots project for the Daum robot: https://github.com/atmire/COUNTER-Robots/pull/38
+  - This bot made 8,000 requests to CGSpace this year
+  - I purged about 20,000 total requests from this bot from our Solr stats for the last few years
 
 <!-- vim: set sw=2 ts=2: -->
@@ -25,7 +25,7 @@ I filed an issue on OpenRXV to make some minor edits to the admin UI: https://gi
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-09/" />
 <meta property="article:published_time" content="2020-09-02T15:35:54+03:00" />
-<meta property="article:modified_time" content="2020-09-08T12:10:08+03:00" />
+<meta property="article:modified_time" content="2020-09-10T12:18:03+03:00" />
 
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="September, 2020"/>
@@ -55,9 +55,9 @@ I filed an issue on OpenRXV to make some minor edits to the admin UI: https://gi
 "@type": "BlogPosting",
 "headline": "September, 2020",
 "url": "https://alanorth.github.io/cgspace-notes/2020-09/",
-"wordCount": "1159",
+"wordCount": "1398",
 "datePublished": "2020-09-02T15:35:54+03:00",
-"dateModified": "2020-09-08T12:10:08+03:00",
+"dateModified": "2020-09-10T12:18:03+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -341,6 +341,30 @@ Would fix 3 occurences of: SOUTHWEST ASIA
 <li>Not to mention that we’ll need to give WLE and CCAFS time to update their harvesters as well… hmmm</li>
 </ul>
 </li>
+<li>Looking at the top user agents active on CGSpace in 2020-08 and I see:
+<ul>
+<li><code>Delphi 2009</code>: 235353 (this is GARDIAN harvester I guess, as the IP is in Greece)</li>
+<li><code>Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)</code>: 57004 (IP is 18.196.100.94, and the requests seem to be for CTA’s content)</li>
+<li><code>RTB website BOT</code>: 12282</li>
+<li><code>ILRI Livestock Website Publications importer BOT</code>: 9393</li>
+</ul>
+</li>
+<li>Shit, I meant to add Delphi to the DSpace spider agents list last month but I guess I didn’t commit the change</li>
+<li>HTTrack is in the agents list so I’m not sure why DSpace registers a hit from that request</li>
+<li>Also, I am surprised to see the RTB and ILRI bots here because they have “BOT” in the name and that should also be dropped</li>
+<li>I also see hits from <code>curl</code> and <code>Java/1.8.0_66</code> and <code>Apache-HttpClient</code> so WTF… those are supposed to be dropped by the default agents list</li>
+<li>Some IP <code>2607:f298:5:101d:f816:3eff:fed9:a484</code> made 9,000 requests with the <code>RI/1.0</code> user agent this year…
+<ul>
+<li>That’s on DreamHost…?</li>
+</ul>
+</li>
+<li>I purged 448658 hits from these agents and added <code>Delphi</code> to our local agents overload for Solr as well as Tomcat’s Crawler Session Manager Valve so that it forces them to re-use a single session</li>
+<li>I made a pull request on the COUNTER-Robots project for the Daum robot: <a href="https://github.com/atmire/COUNTER-Robots/pull/38">https://github.com/atmire/COUNTER-Robots/pull/38</a>
+<ul>
+<li>This bot made 8,000 requests to CGSpace this year</li>
+<li>I purged about 20,000 total requests from this bot from our Solr stats for the last few years</li>
+</ul>
+</li>
 </ul>
 <!-- raw HTML omitted -->
 
@@ -9,7 +9,7 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
-<meta property="og:updated_time" content="2020-09-08T12:10:08+03:00" />
+<meta property="og:updated_time" content="2020-09-10T12:18:03+03:00" />
 
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Categories"/>
@@ -9,7 +9,7 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
-<meta property="og:updated_time" content="2020-09-08T12:10:08+03:00" />
+<meta property="og:updated_time" content="2020-09-10T12:18:03+03:00" />
 
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Notes"/>
@@ -9,7 +9,7 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
-<meta property="og:updated_time" content="2020-09-08T12:10:08+03:00" />
+<meta property="og:updated_time" content="2020-09-10T12:18:03+03:00" />
 
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="CGSpace Notes"/>
@@ -9,7 +9,7 @@
 <meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
 <meta property="og:type" content="website" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
-<meta property="og:updated_time" content="2020-09-08T12:10:08+03:00" />
+<meta property="og:updated_time" content="2020-09-10T12:18:03+03:00" />
 
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="Posts"/>
@@ -4,27 +4,27 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
-<lastmod>2020-09-08T12:10:08+03:00</lastmod>
+<lastmod>2020-09-10T12:18:03+03:00</lastmod>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2020-09-08T12:10:08+03:00</lastmod>
+<lastmod>2020-09-10T12:18:03+03:00</lastmod>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
-<lastmod>2020-09-08T12:10:08+03:00</lastmod>
+<lastmod>2020-09-10T12:18:03+03:00</lastmod>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-<lastmod>2020-09-08T12:10:08+03:00</lastmod>
+<lastmod>2020-09-10T12:18:03+03:00</lastmod>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2020-09/</loc>
-<lastmod>2020-09-08T12:10:08+03:00</lastmod>
+<lastmod>2020-09-10T12:18:03+03:00</lastmod>
 </url>
 
 <url>
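The notes added in this commit ask why agents such as `RTB website BOT` or `curl` still register hits when the agents list should drop them. One way to sanity-check that is to test the observed user agents against a pattern list offline. The sketch below is only an illustration: it assumes the list is a set of regular expressions matched case-insensitively, `SAMPLE_PATTERNS` is a made-up stand-in rather than the real DSpace or COUNTER-Robots list, and the helper names are hypothetical. It does not reproduce how DSpace itself applies the list.

```python
import re

# Hypothetical, illustrative patterns only; NOT the real DSpace or
# COUNTER-Robots agents list.
SAMPLE_PATTERNS = ["bot", "curl", "HTTrack", "Delphi", "^Java/", "Apache-HttpClient"]


def compile_patterns(patterns):
    """Compile each pattern as a case-insensitive regular expression."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def is_spider(user_agent, compiled):
    """Return True if any pattern matches anywhere in the user agent string."""
    return any(p.search(user_agent) for p in compiled)


def split_agents(agents, compiled):
    """Partition agents into (matched, unmatched) lists, preserving order."""
    matched = [a for a in agents if is_spider(a, compiled)]
    unmatched = [a for a in agents if not is_spider(a, compiled)]
    return matched, unmatched


# User agents quoted in the notes above.
AGENTS = [
    "Delphi 2009",
    "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)",
    "RTB website BOT",
    "ILRI Livestock Website Publications importer BOT",
    "RI/1.0",
]

compiled = compile_patterns(SAMPLE_PATTERNS)
matched, unmatched = split_agents(AGENTS, compiled)
print("would be dropped:", matched)
print("would be counted:", unmatched)
```

Under these sample patterns only `RI/1.0` goes unmatched. If an agent matches here but still shows up in the statistics, the mismatch is more likely in how or when the list is loaded than in the patterns themselves.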