diff --git a/content/post/2018-01.md b/content/post/2018-01.md
index dccbc336d..fe972e732 100644
--- a/content/post/2018-01.md
+++ b/content/post/2018-01.md
@@ -463,3 +463,28 @@ cache_alignment : 64
 ![Firewall load](/cgspace-notes/2018/01/firewall-perfectip.png)

 - Linode rebooted DSpace Test and CGSpace for their host hypervisor kernel updates
+- Following up on the Solr sharding issue from the dspace-tech mailing list, I noticed this interesting snippet in the Tomcat `localhost_access_log` from the time of my sharding attempt on my test machine:
+
+```
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?q=type%3A2+AND+id%3A1&wt=javabin&version=2 HTTP/1.1" 200 107
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?q=*%3A*&rows=0&facet=true&facet.range=time&facet.range.start=NOW%2FYEAR-18YEARS&facet.range.end=NOW%2FYEAR%2B0YEARS&facet.range.gap=%2B1YEAR&facet.mincount=1&wt=javabin&version=2 HTTP/1.1" 200 447
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/admin/cores?action=STATUS&core=statistics-2016&indexInfo=true&wt=javabin&version=2 HTTP/1.1" 200 76
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/admin/cores?action=CREATE&name=statistics-2016&instanceDir=statistics&dataDir=%2FUsers%2Faorth%2Fdspace%2Fsolr%2Fstatistics-2016%2Fdata&wt=javabin&version=2 HTTP/1.1" 200 63
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?csv.mv.separator=%7C&q=*%3A*&fq=time%3A%28%5B2016%5C-01%5C-01T00%5C%3A00%5C%3A00Z+TO+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%5D+NOT+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%29&rows=10000&wt=csv HTTP/1.1" 200 2137630
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/admin/luke?show=schema&wt=javabin&version=2 HTTP/1.1" 200 16253
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "POST /solr//statistics-2016/update/csv?commit=true&softCommit=false&waitSearcher=true&f.previousWorkflowStep.split=true&f.previousWorkflowStep.separator=%7C&f.previousWorkflowStep.encapsulator=%22&f.actingGroupId.split=true&f.actingGroupId.separator=%7C&f.actingGroupId.encapsulator=%22&f.containerCommunity.split=true&f.containerCommunity.separator=%7C&f.containerCommunity.encapsulator=%22&f.range.split=true&f.range.separator=%7C&f.range.encapsulator=%22&f.containerItem.split=true&f.containerItem.separator=%7C&f.containerItem.encapsulator=%22&f.p_communities_map.split=true&f.p_communities_map.separator=%7C&f.p_communities_map.encapsulator=%22&f.ngram_query_search.split=true&f.ngram_query_search.separator=%7C&f.ngram_query_search.encapsulator=%22&f.containerBitstream.split=true&f.containerBitstream.separator=%7C&f.containerBitstream.encapsulator=%22&f.owningItem.split=true&f.owningItem.separator=%7C&f.owningItem.encapsulator=%22&f.actingGroupParentId.split=true&f.actingGroupParentId.separator=%7C&f.actingGroupParentId.encapsulator=%22&f.text.split=true&f.text.separator=%7C&f.text.encapsulator=%22&f.simple_query_search.split=true&f.simple_query_search.separator=%7C&f.simple_query_search.encapsulator=%22&f.owningComm.split=true&f.owningComm.separator=%7C&f.owningComm.encapsulator=%22&f.owner.split=true&f.owner.separator=%7C&f.owner.encapsulator=%22&f.filterquery.split=true&f.filterquery.separator=%7C&f.filterquery.encapsulator=%22&f.p_group_map.split=true&f.p_group_map.separator=%7C&f.p_group_map.encapsulator=%22&f.actorMemberGroupId.split=true&f.actorMemberGroupId.separator=%7C&f.actorMemberGroupId.encapsulator=%22&f.bitstreamId.split=true&f.bitstreamId.separator=%7C&f.bitstreamId.encapsulator=%22&f.group_name.split=true&f.group_name.separator=%7C&f.group_name.encapsulator=%22&f.p_communities_name.split=true&f.p_communities_name.separator=%7C&f.p_communities_name.encapsulator=%22&f.query.split=true&f.query.separator=%7C&f.query.encapsulator=%22&f.workflowStep.split=true&f.workflowStep.separator=%7C&f.workflowStep.encapsulator=%22&f.containerCollection.split=true&f.containerCollection.separator=%7C&f.containerCollection.encapsulator=%22&f.complete_query_search.split=true&f.complete_query_search.separator=%7C&f.complete_query_search.encapsulator=%22&f.p_communities_id.split=true&f.p_communities_id.separator=%7C&f.p_communities_id.encapsulator=%22&f.rangeDescription.split=true&f.rangeDescription.separator=%7C&f.rangeDescription.encapsulator=%22&f.group_id.split=true&f.group_id.separator=%7C&f.group_id.encapsulator=%22&f.bundleName.split=true&f.bundleName.separator=%7C&f.bundleName.encapsulator=%22&f.ngram_simplequery_search.split=true&f.ngram_simplequery_search.separator=%7C&f.ngram_simplequery_search.encapsulator=%22&f.group_map.split=true&f.group_map.separator=%7C&f.group_map.encapsulator=%22&f.owningColl.split=true&f.owningColl.separator=%7C&f.owningColl.encapsulator=%22&f.p_group_id.split=true&f.p_group_id.separator=%7C&f.p_group_id.encapsulator=%22&f.p_group_name.split=true&f.p_group_name.separator=%7C&f.p_group_name.encapsulator=%22&wt=javabin&version=2 HTTP/1.1" 409 156
+```
+
+- The new core is created, but when DSpace attempts to POST the exported CSV data to it, Solr responds with HTTP 409
+- HTTP 409 means "Conflict", and Solr apparently uses it to signal a document version conflict (optimistic concurrency): http://yonik.com/solr/optimistic-concurrency/
+- It looks like that bot from the PerfectIP.net host made over 460,000 requests to XMLUI alone yesterday:
+
+```
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+  21572 70.36.107.50
+  30722 70.36.107.190
+  34566 70.36.107.49
+ 101829 2607:fa98:40:9:26b6:fdff:feff:195d
+ 111535 2607:fa98:40:9:26b6:fdff:feff:1c96
+ 161797 2607:fa98:40:9:26b6:fdff:feff:1888
+```
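The per-IP counts from the nginx command above still have to be added up by hand to get the request total. Below is a minimal sketch of doing the tally and the sum in one pass. `count_requests` is a hypothetical helper, not part of the original notes; like the `awk '{print $1}'` in the command above, it assumes the client address is the first field of each nginx log line, and it matches the user agent and date as fixed strings with `grep -F`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count requests per client for one user agent and day.
# Reads nginx access-log lines on stdin; assumes the client address is the
# first whitespace-separated field (as in nginx's default "combined" format).
# Prints "COUNT ADDRESS" lines in ascending order, then "TOTAL_COUNT TOTAL".
count_requests() {
    local ua="$1" day="$2"
    # -F: match the user agent and date literally, not as regular expressions
    grep -F -- "$ua" |
        grep -F -- "$day" |
        # tally hits per client address in an associative array
        awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
        sort -n |
        # echo each line through and append the grand total (0 if no matches)
        awk '{ total += $1; print } END { print total + 0, "TOTAL" }'
}
```

Against the real logs, the same four files could be piped in with `cat ... | count_requests "$UA" "10/Jan/2018"`, where `UA` is a hypothetical variable holding the Chrome/38 user-agent string. Counting with an `awk` associative array means only the short per-IP summary is sorted, rather than every matching request line as with `sort -n | uniq -c`.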
diff --git a/public/2018-01/index.html b/public/2018-01/index.html
index 4c378b61e..8740da481 100644
--- a/public/2018-01/index.html
+++ b/public/2018-01/index.html
@@ -92,7 +92,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
-
+
@@ -194,9 +194,9 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
  "@type": "BlogPosting",
  "headline": "January, 2018",
  "url": "https://alanorth.github.io/cgspace-notes/2018-01/",
- "wordCount": "2123",
+ "wordCount": "2324",
  "datePublished": "2018-01-02T08:35:54-08:00",
- "dateModified": "2018-01-10T13:15:13+02:00",
+ "dateModified": "2018-01-11T08:36:59+02:00",
  "author": {
  "@type": "Person",
  "name": "Alan Orth"
@@ -762,8 +762,33 @@ cache_alignment : 64
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?q=type%3A2+AND+id%3A1&wt=javabin&version=2 HTTP/1.1" 200 107
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?q=*%3A*&rows=0&facet=true&facet.range=time&facet.range.start=NOW%2FYEAR-18YEARS&facet.range.end=NOW%2FYEAR%2B0YEARS&facet.range.gap=%2B1YEAR&facet.mincount=1&wt=javabin&version=2 HTTP/1.1" 200 447
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/admin/cores?action=STATUS&core=statistics-2016&indexInfo=true&wt=javabin&version=2 HTTP/1.1" 200 76
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/admin/cores?action=CREATE&name=statistics-2016&instanceDir=statistics&dataDir=%2FUsers%2Faorth%2Fdspace%2Fsolr%2Fstatistics-2016%2Fdata&wt=javabin&version=2 HTTP/1.1" 200 63
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/select?csv.mv.separator=%7C&q=*%3A*&fq=time%3A%28%5B2016%5C-01%5C-01T00%5C%3A00%5C%3A00Z+TO+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%5D+NOT+2017%5C-01%5C-01T00%5C%3A00%5C%3A00Z%29&rows=10000&wt=csv HTTP/1.1" 200 2137630
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "GET /solr/statistics/admin/luke?show=schema&wt=javabin&version=2 HTTP/1.1" 200 16253
+127.0.0.1 - - [10/Jan/2018:10:51:19 +0200] "POST /solr//statistics-2016/update/csv?commit=true&softCommit=false&waitSearcher=true&f.previousWorkflowStep.split=true&f.previousWorkflowStep.separator=%7C&f.previousWorkflowStep.encapsulator=%22&f.actingGroupId.split=true&f.actingGroupId.separator=%7C&f.actingGroupId.encapsulator=%22&f.containerCommunity.split=true&f.containerCommunity.separator=%7C&f.containerCommunity.encapsulator=%22&f.range.split=true&f.range.separator=%7C&f.range.encapsulator=%22&f.containerItem.split=true&f.containerItem.separator=%7C&f.containerItem.encapsulator=%22&f.p_communities_map.split=true&f.p_communities_map.separator=%7C&f.p_communities_map.encapsulator=%22&f.ngram_query_search.split=true&f.ngram_query_search.separator=%7C&f.ngram_query_search.encapsulator=%22&f.containerBitstream.split=true&f.containerBitstream.separator=%7C&f.containerBitstream.encapsulator=%22&f.owningItem.split=true&f.owningItem.separator=%7C&f.owningItem.encapsulator=%22&f.actingGroupParentId.split=true&f.actingGroupParentId.separator=%7C&f.actingGroupParentId.encapsulator=%22&f.text.split=true&f.text.separator=%7C&f.text.encapsulator=%22&f.simple_query_search.split=true&f.simple_query_search.separator=%7C&f.simple_query_search.encapsulator=%22&f.owningComm.split=true&f.owningComm.separator=%7C&f.owningComm.encapsulator=%22&f.owner.split=true&f.owner.separator=%7C&f.owner.encapsulator=%22&f.filterquery.split=true&f.filterquery.separator=%7C&f.filterquery.encapsulator=%22&f.p_group_map.split=true&f.p_group_map.separator=%7C&f.p_group_map.encapsulator=%22&f.actorMemberGroupId.split=true&f.actorMemberGroupId.separator=%7C&f.actorMemberGroupId.encapsulator=%22&f.bitstreamId.split=true&f.bitstreamId.separator=%7C&f.bitstreamId.encapsulator=%22&f.group_name.split=true&f.group_name.separator=%7C&f.group_name.encapsulator=%22&f.p_communities_name.split=true&f.p_communities_name.separator=%7C&f.p_communities_name.encapsulator=%22&f.query.split=true&f.query.separator=%7C&f.query.encapsulator=%22&f.workflowStep.split=true&f.workflowStep.separator=%7C&f.workflowStep.encapsulator=%22&f.containerCollection.split=true&f.containerCollection.separator=%7C&f.containerCollection.encapsulator=%22&f.complete_query_search.split=true&f.complete_query_search.separator=%7C&f.complete_query_search.encapsulator=%22&f.p_communities_id.split=true&f.p_communities_id.separator=%7C&f.p_communities_id.encapsulator=%22&f.rangeDescription.split=true&f.rangeDescription.separator=%7C&f.rangeDescription.encapsulator=%22&f.group_id.split=true&f.group_id.separator=%7C&f.group_id.encapsulator=%22&f.bundleName.split=true&f.bundleName.separator=%7C&f.bundleName.encapsulator=%22&f.ngram_simplequery_search.split=true&f.ngram_simplequery_search.separator=%7C&f.ngram_simplequery_search.encapsulator=%22&f.group_map.split=true&f.group_map.separator=%7C&f.group_map.encapsulator=%22&f.owningColl.split=true&f.owningColl.separator=%7C&f.owningColl.encapsulator=%22&f.p_group_id.split=true&f.p_group_id.separator=%7C&f.p_group_id.encapsulator=%22&f.p_group_name.split=true&f.p_group_name.separator=%7C&f.p_group_name.encapsulator=%22&wt=javabin&version=2 HTTP/1.1" 409 156
+
+
+
+
+# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36" | grep "10/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
+  21572 70.36.107.50
+  30722 70.36.107.190
+  34566 70.36.107.49
+ 101829 2607:fa98:40:9:26b6:fdff:feff:195d
+ 111535 2607:fa98:40:9:26b6:fdff:feff:1c96
+ 161797 2607:fa98:40:9:26b6:fdff:feff:1888
+
+
diff --git a/public/robots.txt b/public/robots.txt
index 83b9cb92f..28997c233 100644
--- a/public/robots.txt
+++ b/public/robots.txt
@@ -31,7 +31,7 @@ Disallow: /cgspace-notes/2015-12/
 Disallow: /cgspace-notes/2015-11/
 Disallow: /cgspace-notes/
 Disallow: /cgspace-notes/categories/
-Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/categories/notes/
+Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/post/
 Disallow: /cgspace-notes/tags/
diff --git a/public/sitemap.xml b/public/sitemap.xml
index 4273552f7..94bf36389 100644
--- a/public/sitemap.xml
+++ b/public/sitemap.xml
@@ -4,7 +4,7 @@
  https://alanorth.github.io/cgspace-notes/2018-01/
- 2018-01-10T13:15:13+02:00
+ 2018-01-11T08:36:59+02:00
@@ -144,7 +144,7 @@
  https://alanorth.github.io/cgspace-notes/
- 2018-01-10T13:15:13+02:00
+ 2018-01-11T08:36:59+02:00
  0
@@ -153,27 +153,27 @@
  0
-
- https://alanorth.github.io/cgspace-notes/tags/notes/
- 2018-01-10T13:15:13+02:00
- 0
-
-
  https://alanorth.github.io/cgspace-notes/categories/notes/
  2017-09-28T12:00:49+03:00
  0
+
+ https://alanorth.github.io/cgspace-notes/tags/notes/
+ 2018-01-11T08:36:59+02:00
+ 0
+
+
  https://alanorth.github.io/cgspace-notes/post/
- 2018-01-10T13:15:13+02:00
+ 2018-01-11T08:36:59+02:00
  0
  https://alanorth.github.io/cgspace-notes/tags/
- 2018-01-10T13:15:13+02:00
+ 2018-01-11T08:36:59+02:00
  0
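The `fq` filter in the sharding export request from the Tomcat log is URL-encoded, which hides what DSpace actually asked Solr for. Below is a small bash sketch for decoding such values. `urldecode` is a hypothetical helper, not something from DSpace or Solr; it relies on bash's `${var//pattern/replacement}` expansion and on `printf '%b'` understanding `\xHH` escapes, and it assumes the input contains no literal backslashes (those would be interpreted as escapes).

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode a URL-encoded query-string value.
# "+" becomes a space and each %XX becomes the character with hex code XX.
# Bash-specific; a literal backslash in the input would be read as an escape.
urldecode() {
    local s="${1//+/ }"           # "+" encodes a space in query strings
    printf '%b\n' "${s//%/\\x}"   # rewrite %XX as \xXX and let %b expand it
}
```

Given the `fq` value from the export request, it prints `time:([2016\-01\-01T00\:00\:00Z TO 2017\-01\-01T00\:00\:00Z] NOT 2017\-01\-01T00\:00\:00Z)`, a range covering calendar year 2016 (UTC). That is presumably the data DSpace then tried to POST into the new `statistics-2016` core.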