Add notes for 2021-01-26

Alan Orth 2021-01-26 15:52:39 +02:00
parent ce74818085
commit ea19549164
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
23 changed files with 140 additions and 28 deletions

@ -346,5 +346,57 @@ $ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-01-25'
- If you do a free-text search it works properly, but if you try to use the metadata filters it doesn't
- I changed the default setting to make it available to any logged in user and will deploy it on CGSpace this week
## 2021-01-26
- Email some CIAT users who submitted items with upper case AGROVOC terms
  - I will do another global replace soon after they reply
- Add CGIAR Impact Areas and UN Sustainable Development Goals (SDGs) to the `6x_prod` branch
- Looking into the issue with exporting search results in XMLUI again
  - I notice that there is an HTTP 400 error when you try to export search results containing a filter
  - The Tomcat logs show:
```
Jan 26, 2021 10:47:23 AM org.apache.coyote.http11.AbstractHttp11Processor process
INFO: Error parsing HTTP request header
Note: further occurrences of HTTP request parsing errors will be logged at DEBUG level.
java.lang.IllegalArgumentException: Invalid character found in the request target [/discover/search/csv?query=*&scope=~&filters=author:(Alan\%20Orth)]. The valid characters are defined in RFC 7230 and RFC 3986
at org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:213)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1108)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
```
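The rejected request target in the log above contains a literal backslash before `%20`. A rough way to spot such characters in a URL is to look for anything outside the RFC 3986 character set (unreserved characters, reserved characters, and `%`). This is a minimal sketch, and the `has_invalid_uri_chars` name is hypothetical. Tomcat's own validation differs in details, so treat this as an approximation rather than a reproduction of its check:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the argument contains any character
# outside the RFC 3986 set of unreserved characters, reserved characters
# (gen-delims and sub-delims) and '%'. Only an approximation of Tomcat's check.
has_invalid_uri_chars() {
  printf '%s' "$1" | LC_ALL=C grep -q "[^]A-Za-z0-9._~:/?#[@!\$&'()*+,;=%-]"
}

failing='/discover/search/csv?query=*&scope=~&filters=author:(Alan\%20Orth)'
working='/discover/search/csv?query=*&scope=~&filters=author:(Alan%20Orth)'

has_invalid_uri_chars "$failing" && echo "failing URL: contains invalid characters"
has_invalid_uri_chars "$working" || echo "working URL: OK"
```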
- This actually seems to be a simple issue: for some reason DSpace adds a backslash before the URL-encoded space (`\%20`), and Tomcat rejects the backslash as an invalid character:
  - The URL that fails is: `https://dspacetest.cgiar.org/discover/search/csv?query=*&scope=~&filters=author:(Alan\%20Orth)`
  - The URL that works is: `https://dspacetest.cgiar.org/discover/search/csv?query=*&scope=~&filters=author:(Alan%20Orth)`
- I [filed a bug](https://jira.lyrasis.org/browse/DS-4566) on DSpace's issue tracker (though I accidentally hit Enter and submitted it before I finished, and there is no edit function)
- Looking into a Linode report that the outbound traffic rate was high this morning:
```console
# grep -E '26/Jan/2021:(08|09|10|11|12)' /var/log/nginx/rest.log | goaccess --log-format=COMBINED -
```
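goaccess gives the full report, but a quick per-client request count for the same time window can also be had with standard tools. A minimal sketch, assuming the client IP is the first field of each line (as in nginx's combined log format); the `top_ips` name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read access log lines on stdin and print the N busiest
# client IPs (first field), most requests first. N defaults to 10.
top_ips() {
  awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Example, using the same time window as the goaccess command:
# grep -E '26/Jan/2021:(08|09|10|11|12)' /var/log/nginx/rest.log | top_ips 5
```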
- The culprit seems to be the ILRI publications importer, so that's OK
- But I also see an IP in Jordan hitting the REST API 1,100 times today:
```
80.10.12.54 - - [26/Jan/2021:09:43:42 +0100] "GET /rest/rest/bitstreams/98309f17-a831-48ed-8f0a-2d3244cc5a1c/retrieve HTTP/2.0" 302 138 "http://wp.local/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
```
- Seems to be someone from CodeObia working on WordPress
  - I told them to please use a bot user agent so it doesn't affect our stats, and to use DSpace Test if possible
- I purged all ~3,000 statistics hits that have the `http://wp.local/` referrer:
```console
$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>referrer:http\:\/\/wp\.local\/</query></delete>"
```
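The query in the command above escapes Solr's special characters by hand (escaping `.` is not strictly necessary, but it is harmless). A minimal sketch of a helper that backslash-escapes the characters that are special in the Lucene/Solr standard query parser, which could also be used to count matching documents before deleting them. The `solr_escape` name is hypothetical, it does not escape whitespace, and the `select` URL assumes the same local statistics core as the delete command:

```shell
#!/usr/bin/env bash
# Hypothetical helper: backslash-escape characters that are special in the
# Lucene/Solr standard query parser: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
# Whitespace is not escaped.
solr_escape() {
  printf '%s' "$1" | sed 's#[][+!(){}^"~*?:\\/&|-]#\\&#g'
}

# Count matching documents first (with rows=0 Solr returns numFound but no
# documents), e.g.:
# curl -s -G 'http://localhost:8081/solr/statistics/select' \
#   --data-urlencode "q=referrer:$(solr_escape 'http://wp.local/')" -d rows=0
```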
- Tag version 0.4.3 of the csv-metadata-quality tool on GitHub: https://github.com/ilri/csv-metadata-quality/releases/tag/v0.4.3
  - I just realized that I never submitted this to CGSpace as a Big Data Platform output
  - I used my previous [DSpace Statistics API submission](https://hdl.handle.net/10568/99143) as a reference and submitted it to CGSpace
<!-- vim: set sw=2 ts=2: -->

@ -27,7 +27,7 @@ For example, this item has 51 views on CGSpace, but 0 on AReS
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2021-01/" />
<meta property="article:published_time" content="2021-01-03T10:13:54+02:00" />
<meta property="article:modified_time" content="2021-01-24T17:40:56+02:00" />
<meta property="article:modified_time" content="2021-01-25T16:37:30+02:00" />
@ -60,9 +60,9 @@ For example, this item has 51 views on CGSpace, but 0 on AReS
"@type": "BlogPosting",
"headline": "January, 2021",
"url": "https://alanorth.github.io/cgspace-notes/2021-01/",
"wordCount": "2526",
"wordCount": "2880",
"datePublished": "2021-01-03T10:13:54+02:00",
"dateModified": "2021-01-24T17:40:56+02:00",
"dateModified": "2021-01-25T16:37:30+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -564,6 +564,66 @@ $ curl -XDELETE 'http://localhost:9200/openrxv-items-2021-01-25'
</ul>
</li>
</ul>
<h2 id="2021-01-26">2021-01-26</h2>
<ul>
<li>Email some CIAT users who submitted items with upper case AGROVOC terms
<ul>
<li>I will do another global replace soon after they reply</li>
</ul>
</li>
<li>Add CGIAR Impact Areas and UN Sustainable Development Goals (SDGs) to the <code>6x_prod</code> branch</li>
<li>Looking into the issue with exporting search results in XMLUI again
<ul>
<li>I notice that there is an HTTP 400 error when you try to export search results containing a filter</li>
<li>The Tomcat logs show:</li>
</ul>
</li>
</ul>
<pre><code>Jan 26, 2021 10:47:23 AM org.apache.coyote.http11.AbstractHttp11Processor process
INFO: Error parsing HTTP request header
Note: further occurrences of HTTP request parsing errors will be logged at DEBUG level.
java.lang.IllegalArgumentException: Invalid character found in the request target [/discover/search/csv?query=*&amp;scope=~&amp;filters=author:(Alan\%20Orth)]. The valid characters are defined in RFC 7230 and RFC 3986
at org.apache.coyote.http11.InternalInputBuffer.parseRequestLine(InternalInputBuffer.java:213)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1108)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:654)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
</code></pre><ul>
<li>This actually seems to be a simple issue: for some reason DSpace adds a backslash before the URL-encoded space (<code>\%20</code>), and Tomcat rejects the backslash as an invalid character:
<ul>
<li>The URL that fails is: <code>https://dspacetest.cgiar.org/discover/search/csv?query=*&amp;scope=~&amp;filters=author:(Alan\%20Orth)</code></li>
<li>The URL that works is: <code>https://dspacetest.cgiar.org/discover/search/csv?query=*&amp;scope=~&amp;filters=author:(Alan%20Orth)</code></li>
</ul>
</li>
<li>I <a href="https://jira.lyrasis.org/browse/DS-4566">filed a bug</a> on DSpace&rsquo;s issue tracker (though I accidentally hit Enter and submitted it before I finished, and there is no edit function)</li>
<li>Looking into a Linode report that the outbound traffic rate was high this morning:</li>
</ul>
<pre><code class="language-console" data-lang="console"># grep -E '26/Jan/2021:(08|09|10|11|12)' /var/log/nginx/rest.log | goaccess --log-format=COMBINED -
</code></pre><ul>
<li>The culprit seems to be the ILRI publications importer, so that&rsquo;s OK</li>
<li>But I also see an IP in Jordan hitting the REST API 1,100 times today:</li>
</ul>
<pre><code>80.10.12.54 - - [26/Jan/2021:09:43:42 +0100] &quot;GET /rest/rest/bitstreams/98309f17-a831-48ed-8f0a-2d3244cc5a1c/retrieve HTTP/2.0&quot; 302 138 &quot;http://wp.local/&quot; &quot;Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36&quot;
</code></pre><ul>
<li>Seems to be someone from CodeObia working on WordPress
<ul>
<li>I told them to please use a bot user agent so it doesn&rsquo;t affect our stats, and to use DSpace Test if possible</li>
</ul>
</li>
<li>I purged all ~3,000 statistics hits that have the <code>http://wp.local/</code> referrer:</li>
</ul>
<pre><code class="language-console" data-lang="console">$ curl -s &quot;http://localhost:8081/solr/statistics/update?softCommit=true&quot; -H &quot;Content-Type: text/xml&quot; --data-binary &quot;&lt;delete&gt;&lt;query&gt;referrer:http\:\/\/wp\.local\/&lt;/query&gt;&lt;/delete&gt;&quot;
</code></pre><ul>
<li>Tag version 0.4.3 of the csv-metadata-quality tool on GitHub: <a href="https://github.com/ilri/csv-metadata-quality/releases/tag/v0.4.3">https://github.com/ilri/csv-metadata-quality/releases/tag/v0.4.3</a>
<ul>
<li>I just realized that I never submitted this to CGSpace as a Big Data Platform output</li>
<li>I used my previous <a href="https://hdl.handle.net/10568/99143">DSpace Statistics API submission</a> as a reference and submitted it to CGSpace</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2021-01-24T17:40:56+02:00" />
<meta property="og:updated_time" content="2021-01-25T16:37:30+02:00" />

@ -4,27 +4,27 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2021-01-24T17:40:56+02:00</lastmod>
<lastmod>2021-01-25T16:37:30+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2021-01-24T17:40:56+02:00</lastmod>
<lastmod>2021-01-25T16:37:30+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2021-01/</loc>
<lastmod>2021-01-24T17:40:56+02:00</lastmod>
<lastmod>2021-01-25T16:37:30+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2021-01-24T17:40:56+02:00</lastmod>
<lastmod>2021-01-25T16:37:30+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2021-01-24T17:40:56+02:00</lastmod>
<lastmod>2021-01-25T16:37:30+02:00</lastmod>
</url>
<url>