mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-17 20:27:05 +01:00)
Update notes for 2017-11-07
parent 950b0d3a24
commit 7e18f5e5d2
@@ -364,3 +364,15 @@ $ grep -E '2017-11-07 0[2-8]' dspace.log.2017-11-07 | grep -o -E 'ip_addr=[0-9.]
 # grep -E "07/Nov/2017:1[234]:" /var/log/nginx/access.log | grep 104.196.152.243 | grep -c pdf
 0
 ```
+
+- About CIAT, I think I need to encourage them to specify a user agent string for their requests, because they are not reusing their Tomcat session and they are creating thousands of sessions per day
+- All CIAT requests vs unique ones:
+
+```
+$ grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=104.196.152.243' dspace.log.2017-11-07 | wc -l
+3506
+$ grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=104.196.152.243' dspace.log.2017-11-07 | sort | uniq | wc -l
+3506
+```
+
+- I emailed CIAT about the session issue, user agent issue, and told them they should not scrape the HTML contents of communities, instead using the REST API
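The total-versus-unique comparison made with `grep`, `sort` and `uniq` in the note above can also be sketched in Python. This is only an illustration: the function name `count_sessions` and the sample lines are hypothetical, and the sketch assumes the log contains the same `session_id=<32 characters>:ip_addr=<address>` fragment that the grep pattern matches.

```python
import re
from typing import Iterable, Tuple


def count_sessions(lines: Iterable[str], ip: str) -> Tuple[int, int]:
    """Return (total matches, distinct session IDs) for one client IP."""
    # Same pattern as the grep in the notes, with the dots in the IP escaped.
    pattern = re.compile(r"session_id=([A-Z0-9]{32}):ip_addr=" + re.escape(ip))
    total = 0
    distinct = set()
    for line in lines:
        # Like grep -o, count every match on the line, not just the first.
        for session_id in pattern.findall(line):
            total += 1
            distinct.add(session_id)
    return total, len(distinct)


# Made-up log fragments for illustration only; not taken from the real dspace.log.
sample_lines = [
    "session_id=" + "A" * 32 + ":ip_addr=104.196.152.243:view_item",
    "session_id=" + "B" * 32 + ":ip_addr=104.196.152.243:view_item",
    "session_id=" + "A" * 32 + ":ip_addr=104.196.152.243:view_item",
    "session_id=" + "C" * 32 + ":ip_addr=192.0.2.10:view_item",
]
total, distinct = count_sessions(sample_lines, "104.196.152.243")
print(total, distinct)
```

When both numbers are equal, as with 3506 and 3506 in the notes, every matched request carried a different session ID, which is what suggests the client is not reusing its session.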
@@ -38,7 +38,7 @@ COPY 54701
 
 <meta property="article:published_time" content="2017-11-02T09:37:54+02:00"/>
 
-<meta property="article:modified_time" content="2017-11-07T14:50:01+02:00"/>
+<meta property="article:modified_time" content="2017-11-07T17:03:49+02:00"/>
 
 
 
@@ -86,9 +86,9 @@ COPY 54701
 "@type": "BlogPosting",
 "headline": "November, 2017",
 "url": "https://alanorth.github.io/cgspace-notes/2017-11/",
-"wordCount": "1905",
+"wordCount": "1997",
 "datePublished": "2017-11-02T09:37:54+02:00",
-"dateModified": "2017-11-07T14:50:01+02:00",
+"dateModified": "2017-11-07T17:03:49+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -552,6 +552,21 @@ $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{3
 0
 </code></pre>
 
+<ul>
+<li>About CIAT, I think I need to encourage them to specify a user agent string for their requests, because they are not reusing their Tomcat session and they are creating thousands of sessions per day</li>
+<li>All CIAT requests vs unique ones:</li>
+</ul>
+
+<pre><code>$ grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=104.196.152.243' dspace.log.2017-11-07 | wc -l
+3506
+$ grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=104.196.152.243' dspace.log.2017-11-07 | sort | uniq | wc -l
+3506
+</code></pre>
+
+<ul>
+<li>I emailed CIAT about the session issue, user agent issue, and told them they should not scrape the HTML contents of communities, instead using the REST API</li>
+</ul>
+
 
 
 
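The advice sent to CIAT in the note above (identify the client with a user agent string and reuse the session instead of opening a new one per request) might look roughly like this on the client side. This is a sketch using only the Python standard library; the function name `build_polite_opener`, the user agent value, and the URL in the comment are hypothetical, and nothing here is known about CIAT's actual client.

```python
import http.cookiejar
import urllib.request


def build_polite_opener(user_agent: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends a fixed User-Agent and keeps cookies."""
    # The cookie jar stores any session cookie the server sets and sends it
    # back on later requests, so the server can reuse one session.
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # Replace the default "Python-urllib/x.y" header with an identifying one.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


# Hypothetical usage (placeholder URL, not executed here):
#   opener = build_polite_opener("example-harvester/1.0")
#   with opener.open("https://repository.example.org/rest/items") as response:
#       body = response.read()
```

Making every request through the same opener is what allows the session cookie to be reused; creating a new opener per request would bring back the one-session-per-request pattern described in the notes.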
@@ -29,7 +29,7 @@ Disallow: /cgspace-notes/2015-12/
 Disallow: /cgspace-notes/2015-11/
 Disallow: /cgspace-notes/
 Disallow: /cgspace-notes/categories/
-Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/categories/notes/
+Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/post/
 Disallow: /cgspace-notes/tags/
@@ -4,7 +4,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-11/</loc>
-<lastmod>2017-11-07T14:50:01+02:00</lastmod>
+<lastmod>2017-11-07T17:03:49+02:00</lastmod>
 </url>
 
 <url>
@@ -134,7 +134,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2017-11-07T14:50:01+02:00</lastmod>
+<lastmod>2017-11-07T17:03:49+02:00</lastmod>
 <priority>0</priority>
 </url>
 
@@ -143,27 +143,27 @@
 <priority>0</priority>
 </url>
 
-<url>
-<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2017-11-07T14:50:01+02:00</lastmod>
-<priority>0</priority>
-</url>
-
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
 <lastmod>2017-09-28T12:00:49+03:00</lastmod>
 <priority>0</priority>
 </url>
 
+<url>
+<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
+<lastmod>2017-11-07T17:03:49+02:00</lastmod>
+<priority>0</priority>
+</url>
+
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/post/</loc>
-<lastmod>2017-11-07T14:50:01+02:00</lastmod>
+<lastmod>2017-11-07T17:03:49+02:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2017-11-07T14:50:01+02:00</lastmod>
+<lastmod>2017-11-07T17:03:49+02:00</lastmod>
 <priority>0</priority>
 </url>
 