From 6d071a642633516198010345303f2599541633ce Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Wed, 13 Sep 2017 09:53:54 +0300
Subject: [PATCH] Add notes for 2017-09-13

---
 content/post/2017-09.md      |  91 ++++++++++++++++++++++++++-
 public/2017-09/index.html    | 117 +++++++++++++++++++++++++++++++++--
 public/index.html            |   6 ++
 public/index.xml             |   6 ++
 public/post/index.html       |   6 ++
 public/post/index.xml        |   6 ++
 public/sitemap.xml           |  10 +--
 public/tags/notes/index.html |   6 ++
 public/tags/notes/index.xml  |   6 ++
 9 files changed, 242 insertions(+), 12 deletions(-)

diff --git a/content/post/2017-09.md b/content/post/2017-09.md
index 1c8ac0fb6..6fb603609 100644
--- a/content/post/2017-09.md
+++ b/content/post/2017-09.md
@@ -9,12 +9,12 @@ tags = ["Notes"]
 
 - Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
 
-
-
 ## 2017-09-07
 
 - Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group
 
+
+
 ## 2017-09-10
 
 - Delete 58 blank metadata values from the CGSpace database:
@@ -91,3 +91,90 @@ $ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x
 - Ideally there could also be a user interface for cleanup and merging of authorities
 - He will prepare a quote for us with keeping in mind that this could be useful to contribute back to the community for a 5.x release
 - As far as exposing ORCIDs as flat metadata along side all other metadata, he says this should be possible and will work on a quote for us
+
+## 2017-09-13
+
+- Last night Linode sent an alert about CGSpace (linode18) that it has exceeded the outbound traffic rate threshold of 10Mb/s for the last two hours
+- I wonder what was going on, and looking into the nginx logs I think maybe it's OAI...
+- Here is yesterday's top ten IP addresses making requests to `/oai`:
+
+```
+# awk '{print $1}' /var/log/nginx/oai.log | sort -n | uniq -c | sort -h | tail -n 10
+      1 213.136.89.78
+      1 66.249.66.90
+      1 66.249.66.92
+      3 68.180.229.31
+      4 35.187.22.255
+  13745 54.70.175.86
+  15814 34.211.17.113
+  15825 35.161.215.53
+  16704 54.70.51.7
+```
+
+- Compared to the previous day's logs it looks VERY high:
+
+```
+# awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
+      1 207.46.13.39
+      1 66.249.66.93
+      2 66.249.66.91
+      4 216.244.66.194
+     14 66.249.66.90
+```
+
+- The user agents for those top IPs are:
+    - 54.70.175.86: API scraper
+    - 34.211.17.113: API scraper
+    - 35.161.215.53: API scraper
+    - 54.70.51.7: API scraper
+- And this user agent has never been seen before today (or at least recently!):
+
+```
+# grep -c "API scraper" /var/log/nginx/oai.log
+62088
+# zgrep -c "API scraper" /var/log/nginx/oai.log.*.gz
+/var/log/nginx/oai.log.10.gz:0
+/var/log/nginx/oai.log.11.gz:0
+/var/log/nginx/oai.log.12.gz:0
+/var/log/nginx/oai.log.13.gz:0
+/var/log/nginx/oai.log.14.gz:0
+/var/log/nginx/oai.log.15.gz:0
+/var/log/nginx/oai.log.16.gz:0
+/var/log/nginx/oai.log.17.gz:0
+/var/log/nginx/oai.log.18.gz:0
+/var/log/nginx/oai.log.19.gz:0
+/var/log/nginx/oai.log.20.gz:0
+/var/log/nginx/oai.log.21.gz:0
+/var/log/nginx/oai.log.22.gz:0
+/var/log/nginx/oai.log.23.gz:0
+/var/log/nginx/oai.log.24.gz:0
+/var/log/nginx/oai.log.25.gz:0
+/var/log/nginx/oai.log.26.gz:0
+/var/log/nginx/oai.log.27.gz:0
+/var/log/nginx/oai.log.28.gz:0
+/var/log/nginx/oai.log.29.gz:0
+/var/log/nginx/oai.log.2.gz:0
+/var/log/nginx/oai.log.30.gz:0
+/var/log/nginx/oai.log.3.gz:0
+/var/log/nginx/oai.log.4.gz:0
+/var/log/nginx/oai.log.5.gz:0
+/var/log/nginx/oai.log.6.gz:0
+/var/log/nginx/oai.log.7.gz:0
+/var/log/nginx/oai.log.8.gz:0
+/var/log/nginx/oai.log.9.gz:0
+```
+
+- Some of these heavy users are also using XMLUI, and their user agent isn't matched by the [Tomcat Session Crawler valve](https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/tomcat/server-tomcat7.xml.j2#L158), so each request uses a different session
+- Yesterday alone the IP addresses using the `API scraper` user agent were responsible for 16,000 sessions in XMLUI:
+
+```
+# grep -a -E "(54.70.51.7|35.161.215.53|34.211.17.113|54.70.175.86)" /home/cgspace.cgiar.org/log/dspace.log.2017-09-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+15924
+```
+
+- If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex
+- Also, in looking at the DSpace logs I noticed a warning from OAI that I should look into:
+
+```
+WARN  org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
+```
diff --git a/public/2017-09/index.html b/public/2017-09/index.html
index 98ac7307d..f0446cc90 100644
--- a/public/2017-09/index.html
+++ b/public/2017-09/index.html
@@ -12,6 +12,12 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
+2017-09-07
+
+
+Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group
+
+
 " />
@@ -19,7 +25,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
-
+
@@ -38,6 +44,12 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
+2017-09-07
+
+
+Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group
+
+
 "/>
@@ -49,9 +61,9 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
   "@type": "BlogPosting",
   "headline": "September, 2017",
   "url": "https://alanorth.github.io/cgspace-notes/2017-09/",
-  "wordCount": "903",
+  "wordCount": "1241",
   "datePublished": "2017-09-07T16:54:52+07:00",
-  "dateModified": "2017-09-10T18:21:38+03:00",
+  "dateModified": "2017-09-12T16:57:19+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -120,14 +132,14 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
  • Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
  • -

    -

    2017-09-07

    +

    +

    2017-09-10

    +

    2017-09-13

    + + + +
    # awk '{print $1}' /var/log/nginx/oai.log | sort -n | uniq -c | sort -h | tail -n 10
    +      1 213.136.89.78
    +      1 66.249.66.90
    +      1 66.249.66.92
    +      3 68.180.229.31
    +      4 35.187.22.255
    +  13745 54.70.175.86
    +  15814 34.211.17.113
    +  15825 35.161.215.53
    +  16704 54.70.51.7
    +
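The per-address tally above (`awk | sort | uniq -c | sort | tail`) can also be sketched in Python. This is only an illustration, not part of the original notes: the function name `top_clients` and the sample log lines are hypothetical, and it assumes the client IP is the first whitespace-separated field of each nginx log line, as the `awk '{print $1}'` command implies.

```python
from collections import Counter


def top_clients(lines, n=10):
    """Return the n most frequent first fields (client IPs) as (ip, count) pairs."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    # most_common() sorts by count, highest first
    return counts.most_common(n)


# Hypothetical sample lines; a real run would iterate over the log file instead
sample = [
    '54.70.51.7 - - [12/Sep/2017:06:25:11 +0200] "GET /oai/request HTTP/1.1" 200 512 "-" "API scraper"',
    '66.249.66.90 - - [12/Sep/2017:06:25:12 +0200] "GET /oai/request HTTP/1.1" 200 512 "-" "Googlebot"',
    '54.70.51.7 - - [12/Sep/2017:06:25:13 +0200] "GET /oai/request HTTP/1.1" 200 512 "-" "API scraper"',
    "",
]

for ip, count in top_clients(sample):
    print(f"{count:7d} {ip}")
```

Unlike the shell pipeline, which lists the busiest clients last, this prints them first.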
    + + + +
    # awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
    +      1 207.46.13.39
    +      1 66.249.66.93
    +      2 66.249.66.91
    +      4 216.244.66.194
    +     14 66.249.66.90
    +
    + + + +
    # grep -c "API scraper" /var/log/nginx/oai.log
    +62088
    +# zgrep -c "API scraper" /var/log/nginx/oai.log.*.gz
    +/var/log/nginx/oai.log.10.gz:0
    +/var/log/nginx/oai.log.11.gz:0
    +/var/log/nginx/oai.log.12.gz:0
    +/var/log/nginx/oai.log.13.gz:0
    +/var/log/nginx/oai.log.14.gz:0
    +/var/log/nginx/oai.log.15.gz:0
    +/var/log/nginx/oai.log.16.gz:0
    +/var/log/nginx/oai.log.17.gz:0
    +/var/log/nginx/oai.log.18.gz:0
    +/var/log/nginx/oai.log.19.gz:0
    +/var/log/nginx/oai.log.20.gz:0
    +/var/log/nginx/oai.log.21.gz:0
    +/var/log/nginx/oai.log.22.gz:0
    +/var/log/nginx/oai.log.23.gz:0
    +/var/log/nginx/oai.log.24.gz:0
    +/var/log/nginx/oai.log.25.gz:0
    +/var/log/nginx/oai.log.26.gz:0
    +/var/log/nginx/oai.log.27.gz:0
    +/var/log/nginx/oai.log.28.gz:0
    +/var/log/nginx/oai.log.29.gz:0
    +/var/log/nginx/oai.log.2.gz:0
    +/var/log/nginx/oai.log.30.gz:0
    +/var/log/nginx/oai.log.3.gz:0
    +/var/log/nginx/oai.log.4.gz:0
    +/var/log/nginx/oai.log.5.gz:0
    +/var/log/nginx/oai.log.6.gz:0
    +/var/log/nginx/oai.log.7.gz:0
    +/var/log/nginx/oai.log.8.gz:0
    +/var/log/nginx/oai.log.9.gz:0
    +
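The `grep -c` / `zgrep -c` check above, counting how often a user agent string appears in the current and the rotated logs, could be approximated in Python as follows. This is a rough sketch under assumptions: the helper name `count_matches` is made up for illustration, and it simply treats any path ending in `.gz` as gzip-compressed.

```python
import glob
import gzip


def count_matches(path, needle):
    """Count lines containing needle in one file, like `grep -c` or `zgrep -c`."""
    # Assumption: rotated logs are gzip-compressed if and only if they end in .gz
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if needle in line)


def count_in_logs(pattern, needle):
    """Map every file matching a glob pattern to its match count."""
    return {path: count_matches(path, needle) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # The glob is taken from the commands above; adjust it for another host
    for path, count in count_in_logs("/var/log/nginx/oai.log*", "API scraper").items():
        print(f"{path}:{count}")
```

A count of zero in every rotated file, as in the output above, is what suggests the user agent is new.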
    + + + +
    # grep -a -E "(54.70.51.7|35.161.215.53|34.211.17.113|54.70.175.86)" /home/cgspace.cgiar.org/log/dspace.log.2017-09-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
    +15924
    +
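The session count above can be reproduced with a small Python sketch. It is an illustration only: `unique_sessions` and the sample lines are hypothetical, and the real DSpace log layout may differ. One deliberate difference from the `grep -E` command is that the dots in each IP address are escaped here; an unescaped `.` matches any character, so the shell pattern may match slightly more lines than intended.

```python
import re

# Same session pattern as the grep command: 32 upper-case letters or digits
SESSION_RE = re.compile(r"session_id=([A-Z0-9]{32})")


def unique_sessions(lines, ips):
    """Collect distinct session IDs from lines that mention any of the given IPs."""
    if not ips:
        return set()
    # re.escape makes each dot literal instead of "any character"
    ip_re = re.compile("|".join(re.escape(ip) for ip in ips))
    sessions = set()
    for line in lines:
        if ip_re.search(line):
            sessions.update(SESSION_RE.findall(line))
    return sessions


# Hypothetical log lines for illustration only
sample = [
    "INFO session_id=" + "A" * 32 + ":ip_addr=54.70.51.7:view_item",
    "INFO session_id=" + "A" * 32 + ":ip_addr=54.70.51.7:view_item",
    "INFO session_id=" + "B" * 32 + ":ip_addr=35.161.215.53:view_item",
    "INFO session_id=" + "C" * 32 + ":ip_addr=66.249.66.90:view_item",
    "INFO session_id=" + "D" * 32 + ":ip_addr=54x70x51x7:view_item",
]

print(len(unique_sessions(sample, ["54.70.51.7", "35.161.215.53"])))
```

Even with escaping this is still a substring match, so an address such as `54.70.51.7` would also match inside `54.70.51.70`; anchoring the pattern on the surrounding field delimiters would be stricter.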
    + + + +
    WARN  org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
    +
+
diff --git a/public/index.html b/public/index.html
index dc22d0c11..b3b21edb5 100644
--- a/public/index.html
+++ b/public/index.html
@@ -112,6 +112,12 @@
  • Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
  • +

    2017-09-07

    + + +

Read more →
diff --git a/public/index.xml b/public/index.xml
index a1b5a0445..f5fb2330b 100644
--- a/public/index.xml
+++ b/public/index.xml
@@ -23,6 +23,12 @@
 <li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
 </ul>
+<h2 id="2017-09-07">2017-09-07</h2>
+
+<ul>
+<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li>
+</ul>
+
 <p></p>
diff --git a/public/post/index.html b/public/post/index.html
index 3b5d28aa1..c0fce30e2 100644
--- a/public/post/index.html
+++ b/public/post/index.html
@@ -112,6 +112,12 @@
  • Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
  • +

    2017-09-07

    + + +

Read more →
diff --git a/public/post/index.xml b/public/post/index.xml
index 53fe09496..bfa9f50cd 100644
--- a/public/post/index.xml
+++ b/public/post/index.xml
@@ -23,6 +23,12 @@
 <li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
 </ul>
+<h2 id="2017-09-07">2017-09-07</h2>
+
+<ul>
+<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li>
+</ul>
+
 <p></p>
diff --git a/public/sitemap.xml b/public/sitemap.xml
index afd6ab907..fb7dd2ad1 100644
--- a/public/sitemap.xml
+++ b/public/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2017-09/
-2017-09-10T18:21:38+03:00
+2017-09-12T16:57:19+03:00
@@ -119,7 +119,7 @@
 https://alanorth.github.io/cgspace-notes/
-2017-09-10T18:21:38+03:00
+2017-09-12T16:57:19+03:00
 0
@@ -130,19 +130,19 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2017-09-10T18:21:38+03:00
+2017-09-12T16:57:19+03:00
 0
 https://alanorth.github.io/cgspace-notes/post/
-2017-09-10T18:21:38+03:00
+2017-09-12T16:57:19+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2017-09-10T18:21:38+03:00
+2017-09-12T16:57:19+03:00
 0
diff --git a/public/tags/notes/index.html b/public/tags/notes/index.html
index 51dc5052f..d9bf41dd3 100644
--- a/public/tags/notes/index.html
+++ b/public/tags/notes/index.html
@@ -112,6 +112,12 @@
  • Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
  • +

    2017-09-07

    + + +

Read more →
diff --git a/public/tags/notes/index.xml b/public/tags/notes/index.xml
index e70ea5668..b4e4b68d9 100644
--- a/public/tags/notes/index.xml
+++ b/public/tags/notes/index.xml
@@ -23,6 +23,12 @@
 <li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
 </ul>
+<h2 id="2017-09-07">2017-09-07</h2>
+
+<ul>
+<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li>
+</ul>
+
 <p></p>