diff --git a/content/posts/2019-01.md b/content/posts/2019-01.md
index ec538b7d7..abe8d8dd9 100644
--- a/content/posts/2019-01.md
+++ b/content/posts/2019-01.md
@@ -743,6 +743,35 @@ $ http 'http://localhost:8081/solr/statistics/select?indent=on&rows=0&q=type:2+i
 - Release [version 0.9.0 of the dspace-statistics-api](https://github.com/ilri/dspace-statistics-api/releases/tag/v0.9.0) to address the issue of querying multiple Solr statistics shards
 - I deployed it on DSpace Test (linode19) and restarted the indexer and now it shows all the stats from 2018 as well (756 pages of views, instead of 6)
 - I deployed it on CGSpace (linode18) and restarted the indexer as well
+- Linode sent an alert that CGSpace (linode18) was using high CPU this afternoon, the top ten IPs during that time were:
+
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "22/Jan/2019:1(4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+    155 40.77.167.106
+    176 2003:d5:fbda:1c00:1106:c7a0:4b17:3af8
+    189 107.21.16.70
+    217 54.83.93.85
+    310 46.174.208.142
+    346 83.103.94.48
+    360 45.5.186.2
+    595 154.113.73.30
+    716 196.191.127.37
+    915 35.237.175.180
+```
+
+- 35.237.175.180 is known to us
+- I don't think we've seen 196.191.127.37 before. Its user agent is:
+
+```
+Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/7.0.185.1002 Safari/537.36
+```
+
+- Interestingly this IP is located in Addis Ababa...
+- Another interesting one is 154.113.73.30, which is apparently at IITA Nigeria and uses the user agent:
+
+```
+Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
+```
 
 ## 2019-01-23
 
@@ -759,5 +788,35 @@ $ http 'http://localhost:8081/solr/statistics/select?indent=on&rows=0&q=type:2+i
 - Very interesting discussion of methods for [running Tomcat under systemd](https://jdebp.eu/FGA/systemd-house-of-horror/tomcat.html)
 - We can set the ulimit options that used to be in `/etc/default/tomcat7` with systemd's `LimitNOFILE` and `LimitAS` (see the `systemd.exec` man page)
 - Note that we need to use `infinity` instead of `unlimited` for the address space
+- Create accounts for Bosun from IITA and Valerio from ICARDA / CGMEL on DSpace Test
+- Maria Garruccio asked me for a list of author affiliations from all of their submitted items so she can clean them up
+- I got a list of their collections from the CGSpace XMLUI and then used an SQL query to dump the unique values to CSV:
+
+```
+dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/35501', '10568/41728', '10568/49622', '10568/56589', '10568/56592', '10568/65064', '10568/65718', '10568/65719', '10568/67373', '10568/67731', '10568/68235', '10568/68546', '10568/69089', '10568/69160', '10568/69419', '10568/69556', '10568/70131', '10568/70252', '10568/70978'))) group by text_value order by count desc) to /tmp/bioversity-affiliations.csv with csv;
+COPY 1109
+```
+
+- Send a mail to the dspace-tech mailing list about the OpenSearch issue we had with the Livestock CRP
+- Linode sent an alert that CGSpace (linode18) had a high load this morning, here are the top ten IPs during that time:
+
+```
+# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "23/Jan/2019:0(4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
+    222 54.226.25.74
+    241 40.77.167.13
+    272 46.101.86.248
+    297 35.237.175.180
+    332 45.5.184.72
+    355 34.218.226.147
+    404 66.249.64.155
+   4637 205.186.128.185
+   4637 70.32.83.92
+   9265 45.5.186.2
+```
+
+- I think it's the usual IPs:
+- 45.5.186.2 is CIAT
+- 70.32.83.92 is CCAFS
+- 205.186.128.185 is CCAFS or perhaps another Macaroni Bros harvester (new ILRI website?)
diff --git a/docs/2019-01/index.html b/docs/2019-01/index.html
index 0e7ae68f8..68bfdbc40 100644
--- a/docs/2019-01/index.html
+++ b/docs/2019-01/index.html
@@ -27,7 +27,7 @@ I don’t see anything interesting in the web server logs around that time t
 " />
-
+
@@ -60,9 +60,9 @@ I don’t see anything interesting in the web server logs around that time t
 "@type": "BlogPosting",
 "headline": "January, 2019",
 "url": "https://alanorth.github.io/cgspace-notes/2019-01/",
- "wordCount": "3697",
+ "wordCount": "4073",
 "datePublished": "2019-01-02T09:48:30+02:00",
- "dateModified": "2019-01-23T10:46:23+02:00",
+ "dateModified": "2019-01-23T13:38:00+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -1002,8 +1002,38 @@ $ http 'http://localhost:8081/solr/statistics/select?indent=on&rows=0&q=
  • Release version 0.9.0 of the dspace-statistics-api to address the issue of querying multiple Solr statistics shards
  • I deployed it on DSpace Test (linode19) and restarted the indexer and now it shows all the stats from 2018 as well (756 pages of views, instead of 6)
  • I deployed it on CGSpace (linode18) and restarted the indexer as well
  • +
  • Linode sent an alert that CGSpace (linode18) was using high CPU this afternoon, the top ten IPs during that time were:
  • +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "22/Jan/2019:1(4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +    155 40.77.167.106
    +    176 2003:d5:fbda:1c00:1106:c7a0:4b17:3af8
    +    189 107.21.16.70
    +    217 54.83.93.85
    +    310 46.174.208.142
    +    346 83.103.94.48
    +    360 45.5.186.2
    +    595 154.113.73.30
    +    716 196.191.127.37
    +    915 35.237.175.180
    +
    + + + +
    Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/7.0.185.1002 Safari/537.36
    +
    + + + +
    Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
    +
    +

    2019-01-23

    + +
    dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/35501', '10568/41728', '10568/49622', '10568/56589', '10568/56592', '10568/65064', '10568/65718', '10568/65719', '10568/67373', '10568/67731', '10568/68235', '10568/68546', '10568/69089', '10568/69160', '10568/69419', '10568/69556', '10568/70131', '10568/70252', '10568/70978'))) group by text_value order by count desc) to /tmp/bioversity-affiliations.csv with csv;
    +COPY 1109
    +
    + + + +
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "23/Jan/2019:0(4|5|6)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
    +    222 54.226.25.74
    +    241 40.77.167.13
    +    272 46.101.86.248
    +    297 35.237.175.180
    +    332 45.5.184.72
    +    355 34.218.226.147
    +    404 66.249.64.155
    +   4637 205.186.128.185
    +   4637 70.32.83.92
    +   9265 45.5.186.2
    +
    +
    +
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 71df5d3c1..4ca201f55 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2019-01/
- 2019-01-23T10:46:23+02:00
+ 2019-01-23T13:38:00+02:00
@@ -204,7 +204,7 @@
 https://alanorth.github.io/cgspace-notes/
- 2019-01-23T10:46:23+02:00
+ 2019-01-23T13:38:00+02:00
 0
@@ -221,19 +221,19 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
- 2019-01-23T10:46:23+02:00
+ 2019-01-23T13:38:00+02:00
 0
 https://alanorth.github.io/cgspace-notes/posts/
- 2019-01-23T10:46:23+02:00
+ 2019-01-23T13:38:00+02:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
- 2019-01-23T10:46:23+02:00
+ 2019-01-23T13:38:00+02:00
 0
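
The notes in this diff tally the top ten client IPs for a window of hours with a `zcat | grep | awk | sort | uniq -c` pipeline. The same tally could be sketched in Python roughly as follows. This is not the method used in the notes: the function name `top_clients`, the regular expression, and the sample log lines (which use documentation-range addresses) are all made up for illustration, and it assumes the nginx logs are in the default combined format with the client address first and the timestamp in square brackets.

```python
import re
from collections import Counter

# Assumption: combined-format access log lines, e.g.
# 192.0.2.1 - - [22/Jan/2019:14:00:01 +0100] "GET / HTTP/1.1" 200 10 "-" "ua"
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def top_clients(lines, day, hours, limit=10):
    """Count requests per client address for one day and a set of hours.

    `day` is a string such as "22/Jan/2019"; `hours` is an iterable of
    two-digit hour strings such as {"14", "15", "16"}.
    Returns (address, count) pairs, busiest first.
    """
    wanted_hours = set(hours)
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        address, timestamp = match.groups()
        parts = timestamp.split(":")
        if len(parts) < 2:
            continue  # malformed timestamp
        if parts[0] == day and parts[1] in wanted_hours:
            counts[address] += 1
    return counts.most_common(limit)


# Hypothetical sample lines, not taken from the real logs.
sample = [
    '192.0.2.1 - - [22/Jan/2019:14:00:01 +0100] "GET / HTTP/1.1" 200 10 "-" "ua"',
    '192.0.2.1 - - [22/Jan/2019:15:10:00 +0100] "GET / HTTP/1.1" 200 10 "-" "ua"',
    '2001:db8::1 - - [22/Jan/2019:16:59:59 +0100] "GET / HTTP/1.1" 200 10 "-" "ua"',
    '192.0.2.2 - - [22/Jan/2019:17:00:00 +0100] "GET / HTTP/1.1" 200 10 "-" "ua"',
    '192.0.2.1 - - [23/Jan/2019:14:00:00 +0100] "GET / HTTP/1.1" 200 10 "-" "ua"',
]

print(top_clients(sample, "22/Jan/2019", {"14", "15", "16"}))
# prints [('192.0.2.1', 2), ('2001:db8::1', 1)]
```

Two differences from the shell pipeline are worth keeping in mind: this sketch matches the date and hour only in the timestamp field, whereas `grep` matches the pattern anywhere in the line, and it lists the busiest address first rather than last. Reading the rotated and compressed files that `zcat --force` handles is left out.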