diff --git a/content/post/2018-01.md b/content/post/2018-01.md
index e6fe21567..ca9bcf7c1 100644
--- a/content/post/2018-01.md
+++ b/content/post/2018-01.md
@@ -79,3 +79,109 @@ dspace.log.2018-01-02:34
 - Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains
+
+## 2018-01-03
+
+- I woke up to more up and down of CGSpace, this time UptimeRobot noticed a few rounds of up and down of a few minutes each and Linode also notified of high CPU load from 12 to 2 PM
+- Looks like I need to increase the database pool size again:
+
+```
+$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
+dspace.log.2018-01-01:0
+dspace.log.2018-01-02:1972
+dspace.log.2018-01-03:1909
+```
+
+- For some reason there were a lot of "active" connections last night:
+
+![CGSpace PostgreSQL connections](/cgspace-notes/2018/01/postgres_connections-day.png)
+
+- The active IPs in XMLUI are:
+
+```
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+ 607 40.77.167.141
+ 611 2a00:23c3:8c94:7800:392c:a491:e796:9c50
+ 663 188.226.169.37
+ 759 157.55.39.245
+ 887 68.180.229.254
+ 1037 157.55.39.175
+ 1068 216.244.66.245
+ 1495 66.249.64.91
+ 1934 104.196.152.243
+ 2219 134.155.96.78
+```
+
+- 134.155.96.78 appears to be at the University of Mannheim in Germany
+- They identify as: Mozilla/5.0 (compatible; heritrix/3.2.0 +http://ifm.uni-mannheim.de)
+- This appears to be the [Internet Archive's open source bot](https://github.com/internetarchive/heritrix3)
+- They seem to be re-using their Tomcat session so I don't need to do anything to them just yet:
+
+```
+$ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+2
+```
+
+- The API logs show the normal users:
+
+```
+# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+ 32 207.46.13.182
+ 38 40.77.167.132
+ 38 68.180.229.254
+ 43 66.249.64.91
+ 46 40.77.167.141
+ 49 157.55.39.245
+ 79 157.55.39.175
+ 1533 50.116.102.77
+ 4069 70.32.83.92
+ 9355 45.5.184.196
+```
+
+- In other related news I see a sizeable amount of requests coming from python-requests
+- For example, just in the last day there were 1700!
+
+```
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -c python-requests
+1773
+```
+
+- But they come from hundreds of IPs, many of which are 54.x.x.x:
+
+```
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep python-requests | awk '{print $1}' | sort -n | uniq -c | sort -h | tail -n 30
+ 9 54.144.87.92
+ 9 54.146.222.143
+ 9 54.146.249.249
+ 9 54.158.139.206
+ 9 54.161.235.224
+ 9 54.163.41.19
+ 9 54.163.4.51
+ 9 54.196.195.107
+ 9 54.198.89.134
+ 9 54.80.158.113
+ 10 54.198.171.98
+ 10 54.224.53.185
+ 10 54.226.55.207
+ 10 54.227.8.195
+ 10 54.242.234.189
+ 10 54.242.238.209
+ 10 54.80.100.66
+ 11 54.161.243.121
+ 11 54.205.154.178
+ 11 54.234.225.84
+ 11 54.87.23.173
+ 11 54.90.206.30
+ 12 54.196.127.62
+ 12 54.224.242.208
+ 12 54.226.199.163
+ 13 54.162.149.249
+ 13 54.211.182.255
+ 19 50.17.61.150
+ 21 54.211.119.107
+ 139 164.39.7.62
+```
+
+- I have no idea what these are but they seem to be coming from Amazon...
+- I guess for now I just have to increase the database connection pool's max active
+- It's currently 75 and normally I'd just bump it by 25 but let me be a bit daring and push it by 50 to 125, because I used to see at least 121 connections in pg_stat_activity before when we were using the shitty default pooling
diff --git a/public/2015-11/index.html b/public/2015-11/index.html
index 340e79152..284e36733 100644
--- a/public/2015-11/index.html
+++ b/public/2015-11/index.html
@@ -52,7 +52,7 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
 "/>
-
+
diff --git a/public/2015-12/index.html b/public/2015-12/index.html
index ad50c4256..a8d3a692b 100644
--- a/public/2015-12/index.html
+++ b/public/2015-12/index.html
@@ -54,7 +54,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
 "/>
-
+
diff --git a/public/2016-01/index.html b/public/2016-01/index.html
index a4cf9f1bc..38475aa7c 100644
--- a/public/2016-01/index.html
+++ b/public/2016-01/index.html
@@ -44,7 +44,7 @@ Update GitHub wiki for documentation of maintenance tasks.
 "/>
-
+
diff --git a/public/2016-02/index.html b/public/2016-02/index.html
index 857ef07a1..ca6aed88f 100644
--- a/public/2016-02/index.html
+++ b/public/2016-02/index.html
@@ -58,7 +58,7 @@ Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE&r
 "/>
-
+
diff --git a/public/2016-03/index.html b/public/2016-03/index.html
index 05413344c..7137b0d80 100644
--- a/public/2016-03/index.html
+++ b/public/2016-03/index.html
@@ -44,7 +44,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
 "/>
-
+
diff --git a/public/2016-04/index.html b/public/2016-04/index.html
index e37bb565c..1d897227b 100644
--- a/public/2016-04/index.html
+++ b/public/2016-04/index.html
@@ -48,7 +48,7 @@ Also, I noticed the checker log has some errors we should pay attention to:
 "/>
-
+
diff --git a/public/2016-05/index.html b/public/2016-05/index.html
index 5c6d738ec..c7a9b5a72 100644
--- a/public/2016-05/index.html
+++ b/public/2016-05/index.html
@@ -52,7 +52,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
 "/>
-
+
diff --git a/public/2016-06/index.html b/public/2016-06/index.html
index 61bd03567..54d6afbfa 100644
--- a/public/2016-06/index.html
+++ b/public/2016-06/index.html
@@ -50,7 +50,7 @@ Working on second phase of metadata migration, looks like this will work for mov
 "/>
-
+
diff --git a/public/2016-07/index.html b/public/2016-07/index.html
index 2c9b73c2a..8cd5898c4 100644
--- a/public/2016-07/index.html
+++ b/public/2016-07/index.html
@@ -66,7 +66,7 @@ In this case the select query was showing 95 results before the update
 "/>
-
+
diff --git a/public/2016-08/index.html b/public/2016-08/index.html
index 85db56e95..5816aae36 100644
--- a/public/2016-08/index.html
+++ b/public/2016-08/index.html
@@ -60,7 +60,7 @@ $ git rebase -i dspace-5.5
 "/>
-
+
diff --git a/public/2016-09/index.html b/public/2016-09/index.html
index 6a8d6fc27..c2471a874 100644
--- a/public/2016-09/index.html
+++ b/public/2016-09/index.html
@@ -52,7 +52,7 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=or
 "/>
-
+
diff --git a/public/2016-10/index.html b/public/2016-10/index.html
index f66e3f460..1db9f8f98 100644
--- a/public/2016-10/index.html
+++ b/public/2016-10/index.html
@@ -60,7 +60,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id
 "/>
-
+
diff --git a/public/2016-11/index.html b/public/2016-11/index.html
index 57473e45f..14681969e 100644
--- a/public/2016-11/index.html
+++ b/public/2016-11/index.html
@@ -44,7 +44,7 @@ Add dc.type to the output options for Atmire’s Listings and Reports module
 "/>
-
+
diff --git a/public/2016-12/index.html b/public/2016-12/index.html
index 6364162d2..b7f3869a5 100644
--- a/public/2016-12/index.html
+++ b/public/2016-12/index.html
@@ -68,7 +68,7 @@ Another worrying error from dspace.log is:
 "/>
-
+
diff --git a/public/2017-01/index.html b/public/2017-01/index.html
index 93b8b92b6..905dbfce4 100644
--- a/public/2017-01/index.html
+++ b/public/2017-01/index.html
@@ -44,7 +44,7 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
 "/>
-
+
diff --git a/public/2017-02/index.html b/public/2017-02/index.html
index 69227bac0..a75a8c6b2 100644
--- a/public/2017-02/index.html
+++ b/public/2017-02/index.html
@@ -72,7 +72,7 @@ Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
 "/>
-
+
diff --git a/public/2017-03/index.html b/public/2017-03/index.html
index b55d129df..52f20ea80 100644
--- a/public/2017-03/index.html
+++ b/public/2017-03/index.html
@@ -76,7 +76,7 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
 "/>
-
+
diff --git a/public/2017-04/index.html b/public/2017-04/index.html
index 5e767b12b..c1f359e73 100644
--- a/public/2017-04/index.html
+++ b/public/2017-04/index.html
@@ -62,7 +62,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
 "/>
-
+
diff --git a/public/2017-05/index.html b/public/2017-05/index.html
index 4f278d76f..738713052 100644
--- a/public/2017-05/index.html
+++ b/public/2017-05/index.html
@@ -28,7 +28,7 @@
-
+
diff --git a/public/2017-06/index.html b/public/2017-06/index.html
index ad25e9436..66f2ee4a7 100644
--- a/public/2017-06/index.html
+++ b/public/2017-06/index.html
@@ -28,7 +28,7 @@
-
+
diff --git a/public/2017-07/index.html b/public/2017-07/index.html
index ec1ae5f15..03d03ce27 100644
--- a/public/2017-07/index.html
+++ b/public/2017-07/index.html
@@ -56,7 +56,7 @@ We can use PostgreSQL’s extended output format (-x) plus sed to format the
 "/>
-
+
diff --git a/public/2017-08/index.html b/public/2017-08/index.html
index cfd43ebb9..b55ebd9b9 100644
--- a/public/2017-08/index.html
+++ b/public/2017-08/index.html
@@ -76,7 +76,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
 "/>
-
+
diff --git a/public/2017-09/index.html b/public/2017-09/index.html
index 59ef2c036..34902141d 100644
--- a/public/2017-09/index.html
+++ b/public/2017-09/index.html
@@ -52,7 +52,7 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account
 "/>
-
+
diff --git a/public/2017-10/index.html b/public/2017-10/index.html
index 7e2bbcef4..2684feb71 100644
--- a/public/2017-10/index.html
+++ b/public/2017-10/index.html
@@ -56,7 +56,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
 "/>
-
+
diff --git a/public/2017-11/index.html b/public/2017-11/index.html
index 53e00d00f..204b1371f 100644
--- a/public/2017-11/index.html
+++ b/public/2017-11/index.html
@@ -76,7 +76,7 @@ COPY 54701
 "/>
-
+
diff --git a/public/2017-12/index.html b/public/2017-12/index.html
index 27b3367a4..e08a1c2ab 100644
--- a/public/2017-12/index.html
+++ b/public/2017-12/index.html
@@ -46,7 +46,7 @@ The list of connections to XMLUI and REST API for today:
 "/>
-
+
diff --git a/public/2018-01/index.html b/public/2018-01/index.html
index 58b3aa9df..726495630 100644
--- a/public/2018-01/index.html
+++ b/public/2018-01/index.html
@@ -184,7 +184,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
 "/>
-
+
@@ -194,7 +194,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
 "@type": "BlogPosting",
 "headline": "January, 2018",
 "url": "https://alanorth.github.io/cgspace-notes/2018-01/",
- "wordCount": "282",
+ "wordCount": "731",
 "datePublished": "2018-01-02T08:35:54-08:00",
 "dateModified": "2018-01-02T09:30:34-08:00",
 "author": {
@@ -339,6 +339,122 @@ dspace.log.2018-01-02:34
+$ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
+dspace.log.2018-01-01:0
+dspace.log.2018-01-02:1972
+dspace.log.2018-01-03:1909
+
+
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+ 607 40.77.167.141
+ 611 2a00:23c3:8c94:7800:392c:a491:e796:9c50
+ 663 188.226.169.37
+ 759 157.55.39.245
+ 887 68.180.229.254
+ 1037 157.55.39.175
+ 1068 216.244.66.245
+ 1495 66.249.64.91
+ 1934 104.196.152.243
+ 2219 134.155.96.78
+
+
+$ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
+2
+
+
+# cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "3/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+ 32 207.46.13.182
+ 38 40.77.167.132
+ 38 68.180.229.254
+ 43 66.249.64.91
+ 46 40.77.167.141
+ 49 157.55.39.245
+ 79 157.55.39.175
+ 1533 50.116.102.77
+ 4069 70.32.83.92
+ 9355 45.5.184.196
+
+
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -c python-requests
+1773
+
+
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep python-requests | awk '{print $1}' | sort -n | uniq -c | sort -h | tail -n 30
+ 9 54.144.87.92
+ 9 54.146.222.143
+ 9 54.146.249.249
+ 9 54.158.139.206
+ 9 54.161.235.224
+ 9 54.163.41.19
+ 9 54.163.4.51
+ 9 54.196.195.107
+ 9 54.198.89.134
+ 9 54.80.158.113
+ 10 54.198.171.98
+ 10 54.224.53.185
+ 10 54.226.55.207
+ 10 54.227.8.195
+ 10 54.242.234.189
+ 10 54.242.238.209
+ 10 54.80.100.66
+ 11 54.161.243.121
+ 11 54.205.154.178
+ 11 54.234.225.84
+ 11 54.87.23.173
+ 11 54.90.206.30
+ 12 54.196.127.62
+ 12 54.224.242.208
+ 12 54.226.199.163
+ 13 54.162.149.249
+ 13 54.211.182.255
+ 19 50.17.61.150
+ 21 54.211.119.107
+ 139 164.39.7.62
+
+
+