mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2020-01-27
@ -33,7 +33,7 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list
|
||||
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
|
||||
Today these are the top 10 IPs:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.62.2" />
|
||||
<meta name="generator" content="Hugo 0.63.1" />
|
||||
|
||||
|
||||
|
||||
@ -63,7 +63,7 @@ Today these are the top 10 IPs:
|
||||
|
||||
<!-- combined, minified CSS -->
|
||||
|
||||
<link href="https://alanorth.github.io/cgspace-notes/css/style.a20c1a4367639632cdb341d23c27ca44fedcc75b0f8b3cbea6203010da153d3c.css" rel="stylesheet" integrity="sha256-ogwaQ2djljLNs0HSPCfKRP7cx1sPizy+piAwENoVPTw=" crossorigin="anonymous">
|
||||
<link href="https://alanorth.github.io/cgspace-notes/css/style.23e2c3298bcc8c1136c19aba330c211ec94c36f7c4454ea15cf4d3548370042a.css" rel="stylesheet" integrity="sha256-I+LDKYvMjBE2wZq6MwwhHslMNvfERU6hXPTTVINwBCo=" crossorigin="anonymous">
|
||||
|
||||
|
||||
<!-- RSS 2.0 feed -->
|
||||
@ -110,7 +110,7 @@ Today these are the top 10 IPs:
|
||||
<header>
|
||||
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-11/">November, 2018</a></h2>
|
||||
<p class="blog-post-meta"><time datetime="2018-11-01T16:41:30+02:00">Thu Nov 01, 2018</time> by Alan Orth in
|
||||
<i class="fa fa-folder" aria-hidden="true"></i> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||||
<span class="fas fa-folder" aria-hidden="true"></span> <a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
|
||||
|
||||
|
||||
</p>
|
||||
@ -138,7 +138,7 @@ Today these are the top 10 IPs:
|
||||
22508 66.249.64.59
|
||||
</code></pre><ul>
|
||||
<li>The <code>66.249.64.x</code> are definitely Google</li>
|
||||
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it's only a few thousand requests and always to REST API</li>
|
||||
<li><code>70.32.83.92</code> is well known, probably CCAFS or something, as it’s only a few thousand requests and always to REST API</li>
|
||||
<li><code>84.38.130.177</code> is some new IP in Latvia that is only hitting the XMLUI, using the following user agent:</li>
|
||||
</ul>
|
||||
<pre><code>Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.792.0 Safari/535.1
|
||||
@ -154,13 +154,13 @@ Today these are the top 10 IPs:
|
||||
</ul>
|
||||
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
|
||||
</code></pre><ul>
|
||||
<li>And it doesn't seem they are re-using their Tomcat sessions:</li>
|
||||
<li>And it doesn’t seem they are re-using their Tomcat sessions:</li>
|
||||
</ul>
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=138.201.52.218' dspace.log.2018-11-03
|
||||
1243
|
||||
</code></pre><ul>
|
||||
<li>Ah, we've apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…</li>
|
||||
<li>I wonder if it's worth adding them to the list of bots in the nginx config?</li>
|
||||
<li>Ah, we’ve apparently seen this server exactly a year ago in 2017-11, making 40,000 requests in one day…</li>
|
||||
<li>I wonder if it’s worth adding them to the list of bots in the nginx config?</li>
|
||||
<li>Linode sent a mail that CGSpace (linode18) is using high outgoing bandwidth</li>
|
||||
<li>Looking at the nginx logs again I see the following top ten IPs:</li>
|
||||
</ul>
|
||||
@ -176,11 +176,11 @@ Today these are the top 10 IPs:
|
||||
12557 78.46.89.18
|
||||
32152 66.249.64.59
|
||||
</code></pre><ul>
|
||||
<li><code>78.46.89.18</code> is new since I last checked a few hours ago, and it's from Hetzner with the following user agent:</li>
|
||||
<li><code>78.46.89.18</code> is new since I last checked a few hours ago, and it’s from Hetzner with the following user agent:</li>
|
||||
</ul>
|
||||
<pre><code>Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:62.0) Gecko/20100101 Firefox/62.0
|
||||
</code></pre><ul>
|
||||
<li>It's making lots of requests, though actually it does seem to be re-using its Tomcat sessions:</li>
|
||||
<li>It’s making lots of requests, though actually it does seem to be re-using its Tomcat sessions:</li>
|
||||
</ul>
|
||||
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03
|
||||
8449
|
||||
@ -190,7 +190,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03
|
||||
<li><em>Updated on 2018-12-04 to correct the grep command above, as it was inaccurate and it seems the bot was actually already re-using its Tomcat sessions</em></li>
|
||||
<li>I could add this IP to the list of bot IPs in nginx, but it seems like a futile effort when some new IP could come along and do the same thing</li>
|
||||
<li>Perhaps I should think about adding rate limits to dynamic pages like <code>/discover</code> and <code>/browse</code></li>
|
||||
<li>I think it's reasonable for a human to click one of those links five or ten times a minute…</li>
|
||||
<li>I think it’s reasonable for a human to click one of those links five or ten times a minute…</li>
|
||||
<li>To contrast, <code>78.46.89.18</code> made about 300 requests per minute for a few hours today:</li>
|
||||
</ul>
|
||||
<pre><code># grep 78.46.89.18 /var/log/nginx/access.log | grep -o -E '03/Nov/2018:[0-9][0-9]:[0-9][0-9]' | sort | uniq -c | sort -n | tail -n 20
|
||||
@ -221,7 +221,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03
|
||||
</ul>
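The session and per-minute counts from the 2018-11-03 notes can also be reproduced outside of grep. The following is a minimal Python sketch, not the method used in these notes: the function names `session_stats` and `requests_per_minute` are hypothetical, and it assumes that nginx writes the default "combined" log format and that DSpace log lines contain the same `session_id=…:ip_addr=…` pattern the grep commands match. It separates the number of matching lines (what `grep -c` reports) from the number of distinct session IDs, which is the figure that indicates whether a client is re-using its Tomcat sessions.

```python
import re
from collections import Counter

# Assumption: nginx uses the default "combined" format, so the timestamp
# looks like [03/Nov/2018:10:15:32 +0000] and the client IP comes first.
MINUTE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})")


def session_stats(lines, ip):
    """Return (matching_lines, unique_sessions) for one IP.

    matching_lines is what `grep -c` reports; unique_sessions is the
    number that shows whether a client re-uses its Tomcat sessions.
    Like the grep pattern, the IP is matched as a prefix.
    """
    pattern = re.compile(r"session_id=([A-Z0-9]{32}):ip_addr=" + re.escape(ip))
    matching_lines = 0
    sessions = set()
    for line in lines:
        match = pattern.search(line)
        if match:
            matching_lines += 1
            sessions.add(match.group(1))
    return matching_lines, len(sessions)


def requests_per_minute(lines, ip):
    """Count access-log lines from `ip`, bucketed by minute."""
    counts = Counter()
    for line in lines:
        # Only consider lines whose first field is exactly this IP.
        if not line.startswith(ip + " "):
            continue
        match = MINUTE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Both functions accept any iterable of lines, so an open file handle for `dspace.log.2018-11-03` or `/var/log/nginx/access.log` can be passed directly. A large gap between the two values returned by `session_stats` suggests session re-use, while roughly equal values suggest a new session per request.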
|
||||
<h2 id="2018-11-04">2018-11-04</h2>
|
||||
<ul>
|
||||
<li>Forward Peter's information about CGSpace financials to Modi from ICRISAT</li>
|
||||
<li>Forward Peter’s information about CGSpace financials to Modi from ICRISAT</li>
|
||||
<li>Linode emailed about the CPU load and outgoing bandwidth on CGSpace (linode18) again</li>
|
||||
<li>Here are the top ten IPs active so far this morning:</li>
|
||||
</ul>
|
||||
@ -355,7 +355,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
|
||||
<h2 id="2018-11-08">2018-11-08</h2>
|
||||
<ul>
|
||||
<li>I deployed version 0.7.0 of the dspace-statistics-api on DSpace Test (linode19) so I can test it for a few days (and check the Munin stats to see the change in database connections) before deploying on CGSpace</li>
|
||||
<li>I also enabled systemd's persistent journal by setting <a href="https://www.freedesktop.org/software/systemd/man/journald.conf.html"><code>Storage=persistent</code> in <em>journald.conf</em></a></li>
|
||||
<li>I also enabled systemd’s persistent journal by setting <a href="https://www.freedesktop.org/software/systemd/man/journald.conf.html"><code>Storage=persistent</code> in <em>journald.conf</em></a></li>
|
||||
<li>Apparently <a href="https://www.freedesktop.org/software/systemd/man/journald.conf.html">Ubuntu 16.04 defaulted to using rsyslog for boot records until early 2018</a>, so I removed <code>rsyslog</code> too</li>
|
||||
<li>Proof 277 IITA records on DSpace Test: <a href="https://dspacetest.cgiar.org/handle/10568/107871">IITA_ ALIZZY1802-csv_oct23</a>
|
||||
<ul>
|
||||
@ -371,7 +371,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
|
||||
<h2 id="2018-11-13">2018-11-13</h2>
|
||||
<ul>
|
||||
<li>Help troubleshoot an issue with Judy Kimani submitting to the <a href="https://cgspace.cgiar.org/handle/10568/78">ILRI project reports, papers and documents</a> collection on CGSpace</li>
|
||||
<li>For some reason there is an existing group for the “Accept/Reject” workflow step, but it's empty</li>
|
||||
<li>For some reason there is an existing group for the “Accept/Reject” workflow step, but it’s empty</li>
|
||||
<li>I added Judy to the group and told her to try again</li>
|
||||
<li>Sisay changed his leave to be full days until December so I need to finish the IITA records that he was working on (<a href="https://dspacetest.cgiar.org/handle/10568/107871">IITA_ ALIZZY1802-csv_oct23</a>)</li>
|
||||
<li>Sisay had said there were a few PDFs missing and Bosede sent them this week, so I had to find those items on DSpace Test and add the bitstreams to the items manually</li>
|
||||
@ -381,7 +381,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
|
||||
<h2 id="2018-11-14">2018-11-14</h2>
|
||||
<ul>
|
||||
<li>Finally import the 277 IITA (ALIZZY1802) records to CGSpace</li>
|
||||
<li>I had to export them from DSpace Test and import them into a temporary collection on CGSpace first, then export the collection as CSV to map them to new owning collections (IITA books, IITA posters, etc) with OpenRefine because DSpace's <code>dspace export</code> command doesn't include the collections for the items!</li>
|
||||
<li>I had to export them from DSpace Test and import them into a temporary collection on CGSpace first, then export the collection as CSV to map them to new owning collections (IITA books, IITA posters, etc) with OpenRefine because DSpace’s <code>dspace export</code> command doesn’t include the collections for the items!</li>
|
||||
<li>Delete all old IITA collections on DSpace Test and run <code>dspace cleanup</code> to get rid of all the bitstreams</li>
|
||||
</ul>
|
||||
<h2 id="2018-11-15">2018-11-15</h2>
|
||||
@ -428,12 +428,12 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
|
||||
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
|
||||
2018-11-19 15:23:04,223 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (4629 of 76007): 72731
|
||||
</code></pre><ul>
|
||||
<li>I looked in the Solr log around that time and I don't see anything…</li>
|
||||
<li>Working on Udana's WLE records from last month, first the sixteen records in <a href="https://dspacetest.cgiar.org/handle/10568/108254">2018-11-20 RDL Temp</a>
|
||||
<li>I looked in the Solr log around that time and I don’t see anything…</li>
|
||||
<li>Working on Udana’s WLE records from last month, first the sixteen records in <a href="https://dspacetest.cgiar.org/handle/10568/108254">2018-11-20 RDL Temp</a>
|
||||
<ul>
|
||||
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81592">Restoring Degraded Landscapes collection</a></li>
|
||||
<li>a few items missing DOIs, but they are easily available on the publication page</li>
|
||||
<li>clean up DOIs to use “<a href="https://doi.org%22">https://doi.org"</a> format</li>
|
||||
<li>clean up DOIs to use “<a href="https://doi.org">https://doi.org</a>” format</li>
|
||||
<li>clean up some cg.identifier.url to remove unnecessary query strings</li>
|
||||
<li>remove columns with no metadata (river basin, place, target audience, isbn, uri, publisher, ispartofseries, subject)</li>
|
||||
<li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li>
|
||||
@ -447,12 +447,12 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
|
||||
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81589">Variability, Risks and Competing Uses collection</a></li>
|
||||
<li>trim and collapse whitespace in all fields (lots in WLE subject!)</li>
|
||||
<li>clean up some cg.identifier.url fields that had unnecessary anchors in their links</li>
|
||||
<li>clean up DOIs to use “<a href="https://doi.org%22">https://doi.org"</a> format</li>
|
||||
<li>clean up DOIs to use “<a href="https://doi.org">https://doi.org</a>” format</li>
|
||||
<li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li>
|
||||
<li>remove columns with no metadata (place, target audience, isbn, uri, publisher, ispartofseries, subject)</li>
|
||||
<li>remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: <code>value.replace('<27>','')</code></li>
|
||||
<li>I notice a few items using DOIs pointing at ICARDA's DSpace like: <a href="https://doi.org/20.500.11766/8178,">https://doi.org/20.500.11766/8178,</a> which then points at the “real” DOI on the publisher's site… these should be using the real DOI instead of ICARDA's “fake” Handle DOI</li>
|
||||
<li>Some items missing DOIs, but they clearly have them if you look at the publisher's site</li>
|
||||
<li>I notice a few items using DOIs pointing at ICARDA’s DSpace like: <a href="https://doi.org/20.500.11766/8178,">https://doi.org/20.500.11766/8178,</a> which then points at the “real” DOI on the publisher’s site… these should be using the real DOI instead of ICARDA’s “fake” Handle DOI</li>
|
||||
<li>Some items missing DOIs, but they clearly have them if you look at the publisher’s site</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
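The cleanup steps listed above for the WLE records (DOIs in "https://doi.org" format, trimmed and collapsed whitespace, and removal of the 0xfffd replacement character) were done in OpenRefine. As a rough illustration only, the same transformations could be written in Python as below. The helper names `normalize_doi` and `clean_text` are hypothetical, and the input shapes handled (`doi:` prefixes, `http://dx.doi.org/` URLs, bare DOIs) are assumptions about what the spreadsheet might contain, not something stated in the notes.

```python
import re

# Assumption: incoming DOIs may be bare, prefixed with "doi:", or use an
# older http://dx.doi.org/ style URL. All are mapped to https://doi.org/.
DOI_PREFIX_RE = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE
)


def normalize_doi(value):
    """Return the DOI as an https://doi.org/ URL."""
    bare = DOI_PREFIX_RE.sub("", value.strip())
    return "https://doi.org/" + bare


def clean_text(value):
    """Drop U+FFFD replacement characters, then trim and collapse whitespace."""
    without_replacement = value.replace("\ufffd", "")
    return re.sub(r"\s+", " ", without_replacement).strip()
```

Note that this sketch only rewrites the prefix. It does not check whether a DOI resolves, and it would not detect the Handle-style ICARDA identifiers mentioned above, which still need to be replaced with the publisher's DOI by hand.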
|
||||
@ -463,7 +463,7 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
|
||||
<li>Judy Kimani was having issues resuming submissions in another ILRI collection recently, and the issue there was due to an empty group defined for the “accept/reject” step (aka workflow step 1)</li>
|
||||
<li>The error then was “authorization denied for workflow step 1” where “workflow step 1” was the “accept/reject” step, which had a group defined, but was empty</li>
|
||||
<li>Adding her to this group solved her issues</li>
|
||||
<li>Tezira says she's also getting the same “authorization denied” error for workflow step 1 when resuming submissions, so I told Abenet to delete the empty group</li>
|
||||
<li>Tezira says she’s also getting the same “authorization denied” error for workflow step 1 when resuming submissions, so I told Abenet to delete the empty group</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
@ -475,7 +475,7 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
|
||||
<pre><code>$ dspace index-discovery -r 10568/41888
|
||||
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery
|
||||
</code></pre><ul>
|
||||
<li>… but the item still doesn't appear in the collection</li>
|
||||
<li>… but the item still doesn’t appear in the collection</li>
|
||||
<li>Now I will try a full Discovery re-index:</li>
|
||||
</ul>
|
||||
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
@ -503,7 +503,7 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery
|
||||
4564 70.32.83.92
|
||||
</code></pre><ul>
|
||||
<li>We know 70.32.83.92 is the CCAFS harvester on MediaTemple, but 205.186.128.185 is new and appears to be another CCAFS harvester</li>
|
||||
<li>I think we might want to prune some old accounts from CGSpace, perhaps users who haven't logged in in the last two years would be a conservative bunch:</li>
|
||||
<li>I think we might want to prune some old accounts from CGSpace, perhaps users who haven’t logged in in the last two years would be a conservative bunch:</li>
|
||||
</ul>
|
||||
<pre><code>$ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 | wc -l
|
||||
409
|
||||
@ -514,15 +514,15 @@ $ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
|
||||
<ul>
|
||||
<li>The workflow step 1 (accept/reject) is now undefined for some reason</li>
|
||||
<li>Last week the group was defined, but empty, so we added her to the group and she was able to take the tasks</li>
|
||||
<li>Since then it looks like the group was deleted, so now she didn't have permission to take or leave the tasks in her pool</li>
|
||||
<li>We added her back to the group, then she was able to take the tasks, and then we removed the group again, as we generally don't use this step in CGSpace</li>
|
||||
<li>Since then it looks like the group was deleted, so now she didn’t have permission to take or leave the tasks in her pool</li>
|
||||
<li>We added her back to the group, then she was able to take the tasks, and then we removed the group again, as we generally don’t use this step in CGSpace</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>Help Marianne troubleshoot some issue with items in their WLE collections and the WLE publications website</li>
|
||||
</ul>
|
||||
<h2 id="2018-11-28">2018-11-28</h2>
|
||||
<ul>
|
||||
<li>Change the usage rights text a bit based on Maria Garruccio's feedback on “all rights reserved” (<a href="https://github.com/ilri/DSpace/pull/404">#404</a>)</li>
|
||||
<li>Change the usage rights text a bit based on Maria Garruccio’s feedback on “all rights reserved” (<a href="https://github.com/ilri/DSpace/pull/404">#404</a>)</li>
|
||||
<li>Run all system updates on DSpace Test (linode19) and reboot the server</li>
|
||||
</ul>
|
||||
<!-- raw HTML omitted -->
|
||||