From ea19549164f3e8d15ed12194762d38bd024c3695 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Tue, 26 Jan 2021 15:52:39 +0200
Subject: [PATCH] Add notes for 2021-01-26

---
 content/posts/2021-01.md | 52 +++++++++++++++++++
 docs/2021-01/index.html | 66 +++++++++++++++++++++++--
 docs/categories/index.html | 2 +-
 docs/categories/notes/index.html | 2 +-
 docs/categories/notes/page/2/index.html | 2 +-
 docs/categories/notes/page/3/index.html | 2 +-
 docs/categories/notes/page/4/index.html | 2 +-
 docs/categories/notes/page/5/index.html | 2 +-
 docs/index.html | 2 +-
 docs/page/2/index.html | 2 +-
 docs/page/3/index.html | 2 +-
 docs/page/4/index.html | 2 +-
 docs/page/5/index.html | 2 +-
 docs/page/6/index.html | 2 +-
 docs/page/7/index.html | 2 +-
 docs/posts/index.html | 2 +-
 docs/posts/page/2/index.html | 2 +-
 docs/posts/page/3/index.html | 2 +-
 docs/posts/page/4/index.html | 2 +-
 docs/posts/page/5/index.html | 2 +-
 docs/posts/page/6/index.html | 2 +-
 docs/posts/page/7/index.html | 2 +-
 docs/sitemap.xml | 10 ++--
 23 files changed, 140 insertions(+), 28 deletions(-)

diff --git a/content/posts/2021-01.md b/content/posts/2021-01.md
index 58a749f86..22effff4d 100644
--- a/content/posts/2021-01.md
+++ b/content/posts/2021-01.md
@@ -346,5 +346,57 @@ $ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-01-25'
 - If you do a free-text search it works properly, but if you try to use the metadata filters it doesn't
 - I changed the default setting to make it available to any logged in user and will deploy it on CGSpace this week
+## 2021-01-26
+
+- Email some CIAT users who submitted items with upper case AGROVOC terms
+  - I will do another global replace soon after they reply
+- Add CGIAR Impact Areas and UN Sustainable Development Goals (SDGs) to the `6x_prod` branch
+- Looking into the issue with exporting search results in XMLUI again
+  - I notice that there is an HTTP 400 when you try to export search results containing a filter
+  - The Tomcat logs show:
+
+```
+Jan 26, 2021 10:47:23 AM org.apache.coyote.http11.AbstractHttp11Processor process
+INFO: Error parsing HTTP request header
+ Note: further occurrences of HTTP request parsing errors will be logged at DEBUG level.
+java.lang.IllegalArgumentException: Invalid character found in the request target [/discover/search/csv?query=*&scope=~&filters=author:(Alan\%20Orth)]. The valid characters are defined in RFC 7230 and RFC 3986
+        at org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:213)
+        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1108)
+        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
+        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
+        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
+        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
+        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
+        at java.lang.Thread.run(Thread.java:748)
+```
+
+- This actually seems to be a simple issue, as I notice DSpace is escaping the space for some reason:
+  - The URL that fails is: https://dspacetest.cgiar.org/discover/search/csv?query=*&scope=~&filters=author:(Alan\%20Orth)
+  - The URL that works is: https://dspacetest.cgiar.org/discover/search/csv?query=*&scope=~&filters=author:(Alan%20Orth)
+- I [filed a bug](https://jira.lyrasis.org/browse/DS-4566) on DSpace's issue tracker (though I accidentally hit Enter and submitted it before I finished, and there is no edit function)
+- Looking into a Linode report that the outbound traffic rate was high this morning:
+
+```console
+# grep -E '26/Jan/2021:(08|09|10|11|12)' /var/log/nginx/rest.log | goaccess --log-format=COMBINED -
+```
+
+- The culprit seems to be the ILRI publications importer, so that's OK
+- But I also see an IP in Jordan hitting the REST API 1,100 times today:
+
+```
+80.10.12.54 - - [26/Jan/2021:09:43:42 +0100] "GET /rest/rest/bitstreams/98309f17-a831-48ed-8f0a-2d3244cc5a1c/retrieve HTTP/2.0" 302 138 "http://wp.local/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
+```
+
+- Seems to be someone from CodeObia working on WordPress
+  - I told them to please use a bot user agent so it doesn't affect our stats, and to use DSpace Test if possible
+- I purged all ~3,000 statistics hits that have the "http://wp.local/" referrer:
+
+```console
+$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>referrer:http\:\/\/wp\.local\/</query></delete>"
+```
+
+- Tag version 0.4.3 of the csv-metadata-quality tool on GitHub: https://github.com/ilri/csv-metadata-quality/releases/tag/v0.4.3
+  - I just realized that I never submitted this to CGSpace as a Big Data Platform output
+  - I used my previous [DSpace Statistics API submission](https://hdl.handle.net/10568/99143) as a reference and submitted it to CGSpace
diff --git a/docs/2021-01/index.html b/docs/2021-01/index.html
index 8b663dd77..41661d44c 100644
--- a/docs/2021-01/index.html
+++ b/docs/2021-01/index.html
@@ -27,7 +27,7 @@ For example, this item has 51 views on CGSpace, but 0 on AReS
-
+
@@ -60,9 +60,9 @@ For example, this item has 51 views on CGSpace, but 0 on AReS
  "@type": "BlogPosting",
  "headline": "January, 2021",
  "url": "https://alanorth.github.io/cgspace-notes/2021-01/",
- "wordCount": "2526",
+ "wordCount": "2880",
  "datePublished": "2021-01-03T10:13:54+02:00",
- "dateModified": "2021-01-24T17:40:56+02:00",
+ "dateModified": "2021-01-25T16:37:30+02:00",
  "author": {
  "@type": "Person",
  "name": "Alan Orth"
@@ -564,6 +564,66 @@ $ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-01-25'
+

2021-01-26

+ +
Jan 26, 2021 10:47:23 AM org.apache.coyote.http11.AbstractHttp11Processor process
+INFO: Error parsing HTTP request header
+ Note: further occurrences of HTTP request parsing errors will be logged at DEBUG level.
+java.lang.IllegalArgumentException: Invalid character found in the request target [/discover/search/csv?query=*&scope=~&filters=author:(Alan\%20Orth)]. The valid characters are defined in RFC 7230 and RFC 3986
+        at org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:213)
+        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1108)
+        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
+        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
+        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
+        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
+        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
+        at java.lang.Thread.run(Thread.java:748)
+
+
# grep -E '26/Jan/2021:(08|09|10|11|12)' /var/log/nginx/rest.log | goaccess --log-format=COMBINED -
+
+
80.10.12.54 - - [26/Jan/2021:09:43:42 +0100] "GET /rest/rest/bitstreams/98309f17-a831-48ed-8f0a-2d3244cc5a1c/retrieve HTTP/2.0" 302 138 "http://wp.local/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
+
+
$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>referrer:http\:\/\/wp\.local\/</query></delete>"
+
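The Solr purge recorded in the 2021-01-26 notes posts a delete-by-query XML body to the statistics core. As a rough illustration only, the Python sketch below builds such a body without sending anything. The helper names `escape_term` and `delete_by_query_payload` are hypothetical and not part of DSpace or Solr, and the set of escaped characters is an assumption based on the characters the Solr standard query parser documents as special. Unlike the command in the notes, it leaves the dots unescaped, since `.` is not in that set.

```python
from xml.sax.saxutils import escape

# Assumption: characters treated as special by the Solr standard query parser.
SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(term):
    """Backslash-escape query syntax characters in a literal term."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in term)


def delete_by_query_payload(field, value):
    """Build the XML body for a delete-by-query on one field (nothing is sent)."""
    query = f"{field}:{escape_term(value)}"
    # XML-escape the query so characters like & or < cannot break the document.
    return f"<delete><query>{escape(query)}</query></delete>"


payload = delete_by_query_payload("referrer", "http://wp.local/")
print(payload)
```

The resulting string could then be passed to `curl --data-binary` in place of the hand-written body, which avoids shell-quoting mistakes when the referrer contains many special characters.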
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 749d05ed7..a73f9f4cf 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 0a4a59797..0670d2d5c 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 9c2a6531d..988529210 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 7f59ffbfb..f039dc2a0 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index c4eab6ec9..3c63b103a 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 0049f940e..6613eb82a 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 5dcce635a..fc77ec653 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index cfbc1697f..5c21f4307 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index c6204e807..6ce4bcaa9 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index b7b01e39b..6135cfce9 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index f14fb203d..7ea24b0de 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index ba433f334..f5e3d08c0 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index cd227f50c..74ddd753c 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 3f4a43794..f911bf8fd 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 4a6ffaaed..0406ba690 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index c46275b8f..51e533880 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index ca3fa9eba..81d1529e0 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index eae318bae..2fb32519e 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 4ff3dd303..c350091a3 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 0afa5c7ec..59c729db8 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 2c5149930..2b61c546d 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
  https://alanorth.github.io/cgspace-notes/categories/
- 2021-01-24T17:40:56+02:00
+ 2021-01-25T16:37:30+02:00
  https://alanorth.github.io/cgspace-notes/
- 2021-01-24T17:40:56+02:00
+ 2021-01-25T16:37:30+02:00
  https://alanorth.github.io/cgspace-notes/2021-01/
- 2021-01-24T17:40:56+02:00
+ 2021-01-25T16:37:30+02:00
  https://alanorth.github.io/cgspace-notes/categories/notes/
- 2021-01-24T17:40:56+02:00
+ 2021-01-25T16:37:30+02:00
  https://alanorth.github.io/cgspace-notes/posts/
- 2021-01-24T17:40:56+02:00
+ 2021-01-25T16:37:30+02:00
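The HTTP 400 described in the 2021-01-26 notes comes down to a single backslash in the request target. The Python sketch below is only an approximation of that check: it is not Tomcat's parser, the function name `invalid_target_chars` is hypothetical, and the allowed set is assumed to be the RFC 3986 unreserved and reserved characters plus `%` for percent-encoded octets. It reports which characters of a target fall outside that set, using the two request targets quoted in the notes.

```python
import re

# Assumption: RFC 3986 unreserved + reserved characters, plus "%" which
# introduces a percent-encoded octet. Anything else is reported as invalid.
ALLOWED = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]")


def invalid_target_chars(target):
    """Return the characters outside the allowed set, in order of first appearance."""
    seen = []
    for ch in target:
        if not ALLOWED.fullmatch(ch) and ch not in seen:
            seen.append(ch)
    return seen


# Request targets taken from the notes: the first fails, the second works.
failing = r"/discover/search/csv?query=*&scope=~&filters=author:(Alan\%20Orth)"
working = "/discover/search/csv?query=*&scope=~&filters=author:(Alan%20Orth)"

print(invalid_target_chars(failing))
print(invalid_target_chars(working))
```

Under these assumptions only the backslash is flagged in the failing target, which matches the observation that dropping the `\` before `%20` makes the export work.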