diff --git a/content/post/2018-01.md b/content/post/2018-01.md
index e6fe21567..ca9bcf7c1 100644
--- a/content/post/2018-01.md
+++ b/content/post/2018-01.md
@@ -79,3 +79,109 @@ dspace.log.2018-01-02:34
 - Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains
+
+## 2018-01-03
+
+- I woke up to more ups and downs on CGSpace: this time UptimeRobot noticed a few rounds of downtime of a few minutes each, and Linode also notified me of high CPU load from 12 to 2 PM
+- Looks like I need to increase the database pool size again:
+
+```
+$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
+dspace.log.2018-01-01:0
+dspace.log.2018-01-02:1972
+dspace.log.2018-01-03:1909
+```
+
+- For some reason there were a lot of "active" connections last night:
+
+![CGSpace PostgreSQL connections](/cgspace-notes/2018/01/postgres_connections-day.png)
+
+- The active IPs in XMLUI are:
+
+```
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+    607 40.77.167.141
+    611 2a00:23c3:8c94:7800:392c:a491:e796:9c50
+    663 188.226.169.37
+    759 157.55.39.245
+    887 68.180.229.254
+   1037 157.55.39.175
+   1068 216.244.66.245
+   1495 66.249.64.91
+   1934 104.196.152.243
+   2219 134.155.96.78
+```
+
+- 134.155.96.78 appears to be at the University of Mannheim in Germany
+- They identify as: Mozilla/5.0 (compatible; heritrix/3.2.0 +http://ifm.uni-mannheim.de)
+- This appears to be the [Internet Archive's open source bot](https://github.com/internetarchive/heritrix3)
+- They seem to be re-using their Tomcat session so I don't need to do anything to them just yet:
+
+```
+$ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+2
+```
+
+- The API logs show the normal users:
+
+```
+# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+     32 207.46.13.182
+     38 40.77.167.132
+     38 68.180.229.254
+     43 66.249.64.91
+     46 40.77.167.141
+     49 157.55.39.245
+     79 157.55.39.175
+   1533 50.116.102.77
+   4069 70.32.83.92
+   9355 45.5.184.196
+```
+
+- In other related news I see a sizeable number of requests coming from python-requests
+- For example, just in the last day there were more than 1,700!
+
+```
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -c python-requests
+1773
+```
+
+- But they come from hundreds of IPs, many of which are 54.x.x.x:
+
+```
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep python-requests | awk '{print $1}' | sort -n | uniq -c | sort -h | tail -n 30
+      9 54.144.87.92
+      9 54.146.222.143
+      9 54.146.249.249
+      9 54.158.139.206
+      9 54.161.235.224
+      9 54.163.41.19
+      9 54.163.4.51
+      9 54.196.195.107
+      9 54.198.89.134
+      9 54.80.158.113
+     10 54.198.171.98
+     10 54.224.53.185
+     10 54.226.55.207
+     10 54.227.8.195
+     10 54.242.234.189
+     10 54.242.238.209
+     10 54.80.100.66
+     11 54.161.243.121
+     11 54.205.154.178
+     11 54.234.225.84
+     11 54.87.23.173
+     11 54.90.206.30
+     12 54.196.127.62
+     12 54.224.242.208
+     12 54.226.199.163
+     13 54.162.149.249
+     13 54.211.182.255
+     19 50.17.61.150
+     21 54.211.119.107
+    139 164.39.7.62
+```
+
+- I have no idea what these are but they seem to be coming from Amazon...
+- I guess for now I just have to increase the database connection pool's max active
+- It's currently 75 and normally I'd just bump it by 25 but let me be a bit daring and push it by 50 to 125, because I used to see at least 121 connections in pg_stat_activity before when we were using the shitty default pooling
diff --git a/public/2015-11/index.html b/public/2015-11/index.html
index 340e79152..284e36733 100644
--- a/public/2015-11/index.html
+++ b/public/2015-11/index.html
@@ -52,7 +52,7 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
 "/>
-
+
diff --git a/public/2015-12/index.html b/public/2015-12/index.html
index ad50c4256..a8d3a692b 100644
--- a/public/2015-12/index.html
+++ b/public/2015-12/index.html
@@ -54,7 +54,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
 "/>
-
+
diff --git a/public/2016-01/index.html b/public/2016-01/index.html
index a4cf9f1bc..38475aa7c 100644
--- a/public/2016-01/index.html
+++ b/public/2016-01/index.html
@@ -44,7 +44,7 @@ Update GitHub wiki for documentation of maintenance tasks.
 "/>
-
+
diff --git a/public/2016-02/index.html b/public/2016-02/index.html
index 857ef07a1..ca6aed88f 100644
--- a/public/2016-02/index.html
+++ b/public/2016-02/index.html
@@ -58,7 +58,7 @@ Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE&r
 "/>
-
+
diff --git a/public/2016-03/index.html b/public/2016-03/index.html
index 05413344c..7137b0d80 100644
--- a/public/2016-03/index.html
+++ b/public/2016-03/index.html
@@ -44,7 +44,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
 "/>
-
+
diff --git a/public/2016-04/index.html b/public/2016-04/index.html
index e37bb565c..1d897227b 100644
--- a/public/2016-04/index.html
+++ b/public/2016-04/index.html
@@ -48,7 +48,7 @@ Also, I noticed the checker log has some errors we should pay attention to:
 "/>
-
+
diff --git a/public/2016-05/index.html b/public/2016-05/index.html
index 5c6d738ec..c7a9b5a72 100644
--- a/public/2016-05/index.html
+++ b/public/2016-05/index.html
@@ -52,7 +52,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
 "/>
-
+
diff --git a/public/2016-06/index.html b/public/2016-06/index.html
index 61bd03567..54d6afbfa 100644
--- a/public/2016-06/index.html
+++ b/public/2016-06/index.html
@@ -50,7 +50,7 @@ Working on second phase of metadata migration, looks like this will work for mov
 "/>
-
+
diff --git a/public/2016-07/index.html b/public/2016-07/index.html
index 2c9b73c2a..8cd5898c4 100644
--- a/public/2016-07/index.html
+++ b/public/2016-07/index.html
@@ -66,7 +66,7 @@ In this case the select query was showing 95 results before the update
 "/>
-
+
diff --git a/public/2016-08/index.html b/public/2016-08/index.html
index 85db56e95..5816aae36 100644
--- a/public/2016-08/index.html
+++ b/public/2016-08/index.html
@@ -60,7 +60,7 @@ $ git rebase -i dspace-5.5
 "/>
-
+
diff --git a/public/2016-09/index.html b/public/2016-09/index.html
index 6a8d6fc27..c2471a874 100644
--- a/public/2016-09/index.html
+++ b/public/2016-09/index.html
@@ -52,7 +52,7 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=or
 "/>
-
+
diff --git a/public/2016-10/index.html b/public/2016-10/index.html
index f66e3f460..1db9f8f98 100644
--- a/public/2016-10/index.html
+++ b/public/2016-10/index.html
@@ -60,7 +60,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id
 "/>
-
+
diff --git a/public/2016-11/index.html b/public/2016-11/index.html
index 57473e45f..14681969e 100644
--- a/public/2016-11/index.html
+++ b/public/2016-11/index.html
@@ -44,7 +44,7 @@ Add dc.type to the output options for Atmire’s Listings and Reports module
 "/>
-
+
diff --git a/public/2016-12/index.html b/public/2016-12/index.html
index 6364162d2..b7f3869a5 100644
--- a/public/2016-12/index.html
+++ b/public/2016-12/index.html
@@ -68,7 +68,7 @@ Another worrying error from dspace.log is:
 "/>
-
+
diff --git a/public/2017-01/index.html b/public/2017-01/index.html
index 93b8b92b6..905dbfce4 100644
--- a/public/2017-01/index.html
+++ b/public/2017-01/index.html
@@ -44,7 +44,7 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
 "/>
-
+
diff --git a/public/2017-02/index.html b/public/2017-02/index.html
index 69227bac0..a75a8c6b2 100644
--- a/public/2017-02/index.html
+++ b/public/2017-02/index.html
@@ -72,7 +72,7 @@ Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
 "/>
-
+
diff --git a/public/2017-03/index.html b/public/2017-03/index.html
index b55d129df..52f20ea80 100644
--- a/public/2017-03/index.html
+++ b/public/2017-03/index.html
@@ -76,7 +76,7 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
 "/>
-
+
diff --git a/public/2017-04/index.html b/public/2017-04/index.html
index 5e767b12b..c1f359e73 100644
--- a/public/2017-04/index.html
+++ b/public/2017-04/index.html
@@ -62,7 +62,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
 "/>
-
+
diff --git a/public/2017-05/index.html b/public/2017-05/index.html
index 4f278d76f..738713052 100644
--- a/public/2017-05/index.html
+++ b/public/2017-05/index.html
@@ -28,7 +28,7 @@
-
+
diff --git a/public/2017-06/index.html b/public/2017-06/index.html
index ad25e9436..66f2ee4a7 100644
--- a/public/2017-06/index.html
+++ b/public/2017-06/index.html
@@ -28,7 +28,7 @@
-
+
diff --git a/public/2017-07/index.html b/public/2017-07/index.html
index ec1ae5f15..03d03ce27 100644
--- a/public/2017-07/index.html
+++ b/public/2017-07/index.html
@@ -56,7 +56,7 @@ We can use PostgreSQL’s extended output format (-x) plus sed to format the
 "/>
-
+
diff --git a/public/2017-08/index.html b/public/2017-08/index.html
index cfd43ebb9..b55ebd9b9 100644
--- a/public/2017-08/index.html
+++ b/public/2017-08/index.html
@@ -76,7 +76,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
 "/>
-
+
diff --git a/public/2017-09/index.html b/public/2017-09/index.html
index 59ef2c036..34902141d 100644
--- a/public/2017-09/index.html
+++ b/public/2017-09/index.html
@@ -52,7 +52,7 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account
 "/>
-
+
diff --git a/public/2017-10/index.html b/public/2017-10/index.html
index 7e2bbcef4..2684feb71 100644
--- a/public/2017-10/index.html
+++ b/public/2017-10/index.html
@@ -56,7 +56,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
 "/>
-
+
diff --git a/public/2017-11/index.html b/public/2017-11/index.html
index 53e00d00f..204b1371f 100644
--- a/public/2017-11/index.html
+++ b/public/2017-11/index.html
@@ -76,7 +76,7 @@ COPY 54701
 "/>
-
+
diff --git a/public/2017-12/index.html b/public/2017-12/index.html
index 27b3367a4..e08a1c2ab 100644
--- a/public/2017-12/index.html
+++ b/public/2017-12/index.html
@@ -46,7 +46,7 @@ The list of connections to XMLUI and REST API for today:
 "/>
-
+
diff --git a/public/2018-01/index.html b/public/2018-01/index.html
index 58b3aa9df..726495630 100644
--- a/public/2018-01/index.html
+++ b/public/2018-01/index.html
@@ -184,7 +184,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
 "/>
-
+
@@ -194,7 +194,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
 "@type": "BlogPosting",
 "headline": "January, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-01/",
- "wordCount": "282",
+ "wordCount": "731",
 "datePublished": "2018-01-02T08:35:54-08:00",
 "dateModified": "2018-01-02T09:30:34-08:00",
 "author": {
@@ -339,6 +339,122 @@ dspace.log.2018-01-02:34

+

2018-01-03

+ + + +
$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
+dspace.log.2018-01-01:0
+dspace.log.2018-01-02:1972
+dspace.log.2018-01-03:1909
+
+ + + +

CGSpace PostgreSQL connections

+ + + +
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+    607 40.77.167.141
+    611 2a00:23c3:8c94:7800:392c:a491:e796:9c50
+    663 188.226.169.37
+    759 157.55.39.245
+    887 68.180.229.254
+   1037 157.55.39.175
+   1068 216.244.66.245
+   1495 66.249.64.91
+   1934 104.196.152.243
+   2219 134.155.96.78
+
+ + + +
$ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+2
+
+ + + +
# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+     32 207.46.13.182
+     38 40.77.167.132
+     38 68.180.229.254
+     43 66.249.64.91
+     46 40.77.167.141
+     49 157.55.39.245
+     79 157.55.39.175
+   1533 50.116.102.77
+   4069 70.32.83.92
+   9355 45.5.184.196
+
+ + + +
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -c python-requests
+1773
+
+ + + +
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep python-requests | awk '{print $1}' | sort -n | uniq -c | sort -h | tail -n 30
+      9 54.144.87.92
+      9 54.146.222.143
+      9 54.146.249.249
+      9 54.158.139.206
+      9 54.161.235.224
+      9 54.163.41.19
+      9 54.163.4.51
+      9 54.196.195.107
+      9 54.198.89.134
+      9 54.80.158.113
+     10 54.198.171.98
+     10 54.224.53.185
+     10 54.226.55.207
+     10 54.227.8.195
+     10 54.242.234.189
+     10 54.242.238.209
+     10 54.80.100.66
+     11 54.161.243.121
+     11 54.205.154.178
+     11 54.234.225.84
+     11 54.87.23.173
+     11 54.90.206.30
+     12 54.196.127.62
+     12 54.224.242.208
+     12 54.226.199.163
+     13 54.162.149.249
+     13 54.211.182.255
+     19 50.17.61.150
+     21 54.211.119.107
+    139 164.39.7.62
+
+
+
+
diff --git a/public/2018/01/postgres_connections-day.png b/public/2018/01/postgres_connections-day.png
new file mode 100644
index 000000000..da27ce7af
Binary files /dev/null and b/public/2018/01/postgres_connections-day.png differ
diff --git a/public/categories/notes/index.html b/public/categories/notes/index.html
index 3cf854632..bc35e687e 100644
--- a/public/categories/notes/index.html
+++ b/public/categories/notes/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/cgiar-library-migration/index.html b/public/cgiar-library-migration/index.html
index 49b4edca6..547e1a06f 100644
--- a/public/cgiar-library-migration/index.html
+++ b/public/cgiar-library-migration/index.html
@@ -28,7 +28,7 @@
-
+
diff --git a/public/index.html b/public/index.html
index 1da773331..4dc17edba 100644
--- a/public/index.html
+++ b/public/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/page/2/index.html b/public/page/2/index.html
index 58cb29f5a..6da36f4b5 100644
--- a/public/page/2/index.html
+++ b/public/page/2/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/page/3/index.html b/public/page/3/index.html
index 27da99f02..d79653716 100644
--- a/public/page/3/index.html
+++ b/public/page/3/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/post/index.html b/public/post/index.html
index de9a2000a..6ebbad52e 100644
--- a/public/post/index.html
+++ b/public/post/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/post/page/2/index.html b/public/post/page/2/index.html
index 309df8fec..162485f7c 100644
--- a/public/post/page/2/index.html
+++ b/public/post/page/2/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/post/page/3/index.html b/public/post/page/3/index.html
index 8c7d11022..30aca8cc9 100644
--- a/public/post/page/3/index.html
+++ b/public/post/page/3/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/robots.txt b/public/robots.txt
index 28997c233..83b9cb92f 100644
--- a/public/robots.txt
+++ b/public/robots.txt
@@ -31,7 +31,7 @@ Disallow: /cgspace-notes/2015-12/
 Disallow: /cgspace-notes/2015-11/
 Disallow: /cgspace-notes/
 Disallow: /cgspace-notes/categories/
-Disallow: /cgspace-notes/categories/notes/
 Disallow: /cgspace-notes/tags/notes/
+Disallow: /cgspace-notes/categories/notes/
 Disallow: /cgspace-notes/post/
 Disallow: /cgspace-notes/tags/
diff --git a/public/sitemap.xml b/public/sitemap.xml
index 0c962fb41..e4233e2ef 100644
--- a/public/sitemap.xml
+++ b/public/sitemap.xml
@@ -154,14 +154,14 @@
- https://alanorth.github.io/cgspace-notes/categories/notes/
- 2017-09-28T12:00:49+03:00
+ https://alanorth.github.io/cgspace-notes/tags/notes/
+ 2018-01-02T09:30:34-08:00
 0
- https://alanorth.github.io/cgspace-notes/tags/notes/
- 2018-01-02T09:30:34-08:00
+ https://alanorth.github.io/cgspace-notes/categories/notes/
+ 2017-09-28T12:00:49+03:00
 0
diff --git a/public/tags/notes/index.html b/public/tags/notes/index.html
index 858aef78d..5271c273a 100644
--- a/public/tags/notes/index.html
+++ b/public/tags/notes/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/tags/notes/page/2/index.html b/public/tags/notes/page/2/index.html
index 8d010c791..645305599 100644
--- a/public/tags/notes/page/2/index.html
+++ b/public/tags/notes/page/2/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/public/tags/notes/page/3/index.html b/public/tags/notes/page/3/index.html
index d1ca947c9..14084c060 100644
--- a/public/tags/notes/page/3/index.html
+++ b/public/tags/notes/page/3/index.html
@@ -25,7 +25,7 @@
-
+
diff --git a/static/2018/01/postgres_connections-day.png b/static/2018/01/postgres_connections-day.png
new file mode 100644
index 000000000..da27ce7af
Binary files /dev/null and b/static/2018/01/postgres_connections-day.png differ
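
The nginx pipelines in the 2018-01-03 notes filter by day with `grep -E "3/Jan/2018"`, which is an unanchored match: it would also pick up 13/Jan or 23/Jan, or the same string inside a request URL or referrer. A minimal sketch of a stricter variant follows. It assumes the logs use nginx's default "combined" format, where the fourth whitespace-separated field looks like `[03/Jan/2018:12:00:01`; the function name `count_ips_for_day` is hypothetical and not something from the notes.

```shell
# Hypothetical helper: print "count ip" pairs for one day, busiest last.
# $1 is the day as nginx writes it, e.g. 03/Jan/2018.
# Log lines are read from stdin.
# Assumption: default "combined" log format, so $1 is the client IP
# and $4 starts with "[dd/Mon/yyyy:".
count_ips_for_day() {
    awk -v day="$1" '
        # Only count lines whose timestamp field starts with the given day.
        index($4, "[" day ":") == 1 { hits[$1]++ }
        END { for (ip in hits) print hits[ip], ip }
    ' | sort -n
}

# Example use, mirroring the XMLUI pipeline in the notes:
#   cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | count_ips_for_day 03/Jan/2018 | tail
```

Counting inside awk replaces the `sort | uniq -c` step, so the only sort left is the numeric one on the counts.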