Add notes for 2020-01-27

2020-01-27 16:20:44 +02:00
parent 207ace0883
commit 8feb93be39
112 changed files with 11466 additions and 5158 deletions

@@ -45,7 +45,7 @@ DELETE 1
But after this I tried to delete the item from the XMLUI and it is still present…
"/>
-<meta name="generator" content="Hugo 0.62.2" />
+<meta name="generator" content="Hugo 0.63.1" />
@@ -75,7 +75,7 @@ But after this I tried to delete the item from the XMLUI and it is still present
<!-- combined, minified CSS -->
-<link href="https://alanorth.github.io/cgspace-notes/css/style.a20c1a4367639632cdb341d23c27ca44fedcc75b0f8b3cbea6203010da153d3c.css" rel="stylesheet" integrity="sha256-ogwaQ2djljLNs0HSPCfKRP7cx1sPizy&#43;piAwENoVPTw=" crossorigin="anonymous">
+<link href="https://alanorth.github.io/cgspace-notes/css/style.23e2c3298bcc8c1136c19aba330c211ec94c36f7c4454ea15cf4d3548370042a.css" rel="stylesheet" integrity="sha256-I&#43;LDKYvMjBE2wZq6MwwhHslMNvfERU6hXPTTVINwBCo=" crossorigin="anonymous">
<!-- RSS 2.0 feed -->
@@ -122,7 +122,7 @@ But after this I tried to delete the item from the XMLUI and it is still present
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-05/">May, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-05-01T07:37:43&#43;03:00">Wed May 01, 2019</time> by Alan Orth in
-<i class="fa fa-folder" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
+<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes" rel="category tag">Notes</a>
</p>
@@ -146,7 +146,7 @@ DELETE 1
<ul>
<li>I managed to delete the problematic item from the database
<ul>
-<li>First I deleted the item's bitstream in XMLUI and then ran <code>dspace cleanup -v</code> to remove it from the assetstore</li>
+<li>First I deleted the item&rsquo;s bitstream in XMLUI and then ran <code>dspace cleanup -v</code> to remove it from the assetstore</li>
<li>Then I ran the following SQL:</li>
</ul>
</li>
@@ -155,7 +155,7 @@ DELETE 1
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
dspace=# DELETE FROM item WHERE item_id=74648;
</code></pre><ul>
-<li>Now the item is (hopefully) really gone and I can continue to troubleshoot the issue with REST API's <code>/items/find-by-metadata-value</code> endpoint
+<li>Now the item is (hopefully) really gone and I can continue to troubleshoot the issue with REST API&rsquo;s <code>/items/find-by-metadata-value</code> endpoint
<ul>
<li>Of course I run into another HTTP 401 error when I continue trying the LandPortal search from last month:</li>
</ul>
@@ -177,15 +177,15 @@ curl: (22) The requested URL returned error: 401 Unauthorized
</li>
<li>Some are in the <code>workspaceitem</code> table (pre-submission), others are in the <code>workflowitem</code> table (submitted), and others are actually approved, but withdrawn&hellip;
<ul>
-<li>This is actually a worthless exercise because the real issue is that the <code>/items/find-by-metadata-value</code> endpoint is simply designed flawed and shouldn't be fatally erroring when the search returns items the user doesn't have permission to access</li>
-<li>It would take way too much time to try to fix the fucked up items that are in limbo by deleting them in SQL, but also, it doesn't actually fix the problem because some items are <em>submitted</em> but <em>withdrawn</em>, so they actually have handles and everything</li>
-<li>I think the solution is to recommend people don't use the <code>/items/find-by-metadata-value</code> endpoint</li>
+<li>This is actually a worthless exercise because the real issue is that the <code>/items/find-by-metadata-value</code> endpoint is simply designed flawed and shouldn&rsquo;t be fatally erroring when the search returns items the user doesn&rsquo;t have permission to access</li>
+<li>It would take way too much time to try to fix the fucked up items that are in limbo by deleting them in SQL, but also, it doesn&rsquo;t actually fix the problem because some items are <em>submitted</em> but <em>withdrawn</em>, so they actually have handles and everything</li>
+<li>I think the solution is to recommend people don&rsquo;t use the <code>/items/find-by-metadata-value</code> endpoint</li>
</ul>
</li>
<li>CIP is asking about embedding PDF thumbnail images in their RSS feeds again
<ul>
-<li>They asked in 2018-09 as well and I told them it wasn't possible</li>
-<li>To make sure, I looked at <a href="https://wiki.duraspace.org/display/DSPACE/Enable+Media+RSS+Feeds">the documentation for RSS media feeds</a> and tried it, but couldn't get it to work</li>
+<li>They asked in 2018-09 as well and I told them it wasn&rsquo;t possible</li>
+<li>To make sure, I looked at <a href="https://wiki.duraspace.org/display/DSPACE/Enable+Media+RSS+Feeds">the documentation for RSS media feeds</a> and tried it, but couldn&rsquo;t get it to work</li>
<li>It seems to be geared towards iTunes and Podcasts&hellip; I dunno</li>
</ul>
</li>
@@ -273,7 +273,7 @@ Please see the DSpace documentation for assistance.
<p><img src="/cgspace-notes/2019/05/2019-05-06-postgres_connections_db-day.png" alt="linode18 postgres connections day"></p>
<p><img src="/cgspace-notes/2019/05/2019-05-06-cpu-day.png" alt="linode18 CPU day"></p>
<ul>
-<li>The number of unique sessions today is <em>ridiculously</em> high compared to the last few days considering it's only 12:30PM right now:</li>
+<li>The number of unique sessions today is <em>ridiculously</em> high compared to the last few days considering it&rsquo;s only 12:30PM right now:</li>
</ul>
<pre><code>$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-06 | sort | uniq | wc -l
101108
@@ -326,7 +326,7 @@ $ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -
2845 HEAD
98121 GET
</code></pre><ul>
-<li>I'm not exactly sure what happened this morning, but it looks like some legitimate user traffic—perhaps someone launched a new publication and it got a bunch of hits?</li>
+<li>I&rsquo;m not exactly sure what happened this morning, but it looks like some legitimate user traffic—perhaps someone launched a new publication and it got a bunch of hits?</li>
<li>Looking again, I see 84,000 requests to <code>/handle</code> this morning (not including logs for library.cgiar.org because those get HTTP 301 redirect to CGSpace and appear here in <code>access.log</code>):</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -c -o -E &quot; /handle/[0-9]+/[0-9]+&quot;
@@ -413,7 +413,7 @@ Error sending email:
Please see the DSpace documentation for assistance.
</code></pre><ul>
<li>I checked the settings and apparently I had updated it incorrectly last week after ICT reset the password</li>
-<li>Help Moayad with certbot-auto for Let's Encrypt scripts on the new AReS server (linode20)</li>
+<li>Help Moayad with certbot-auto for Let&rsquo;s Encrypt scripts on the new AReS server (linode20)</li>
<li>Normalize all <code>text_lang</code> values for metadata on CGSpace and DSpace Test (as I had tested last month):</li>
</ul>
<pre><code>UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', '');
@@ -455,7 +455,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
<!-- raw HTML omitted -->
<ul>
<li>So this was definitely an attack of some sort&hellip; only God knows why</li>
-<li>I noticed a few new bots that don't use the word &ldquo;bot&rdquo; in their user agent and therefore don't match Tomcat's Crawler Session Manager Valve:
+<li>I noticed a few new bots that don&rsquo;t use the word &ldquo;bot&rdquo; in their user agent and therefore don&rsquo;t match Tomcat&rsquo;s Crawler Session Manager Valve:
<ul>
<li><code>Blackboard Safeassign</code></li>
<li><code>Unpaywall</code></li>
@@ -486,7 +486,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
</ul>
<h2 id="2019-05-15">2019-05-15</h2>
<ul>
-<li>Tezira says she's having issues with email reports for approved submissions, but I received an email about collection subscriptions this morning, and I tested with <code>dspace test-email</code> and it's also working&hellip;</li>
+<li>Tezira says she&rsquo;s having issues with email reports for approved submissions, but I received an email about collection subscriptions this morning, and I tested with <code>dspace test-email</code> and it&rsquo;s also working&hellip;</li>
<li>Send a list of DSpace build tips to Panagis from AgroKnow</li>
<li>Finally fix the AReS v2 to work via DSpace Test and send it to Peter et al to give their feedback
<ul>
@@ -501,7 +501,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 29 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-05-16-investors.csv WITH CSV HEADER;
COPY 995
</code></pre><ul>
-<li>Fork the <a href="https://github.com/icarda-git/AReS">ICARDA AReS v1 repository</a> to <a href="https://github.com/ilri/AReS">ILRI's GitHub</a> and give access to CodeObia guys
+<li>Fork the <a href="https://github.com/icarda-git/AReS">ICARDA AReS v1 repository</a> to <a href="https://github.com/ilri/AReS">ILRI&rsquo;s GitHub</a> and give access to CodeObia guys
<ul>
<li>The plan is that we develop the v2 code here</li>
</ul>
@@ -522,7 +522,7 @@ $ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
</code></pre><ul>
<li>I was going to make a new controlled vocabulary of the top 100 terms after these corrections, but I noticed a bunch of duplicates and variations when I sorted them alphabetically</li>
<li>Instead, I exported a new list and asked Peter to look at it again</li>
-<li>Apply Peter's new corrections on DSpace Test and CGSpace:</li>
+<li>Apply Peter&rsquo;s new corrections on DSpace Test and CGSpace:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-05-17-fix-25-Investors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29 -t correct -d
$ ./delete-metadata-values.py -i /tmp/2019-05-17-delete-14-Investors.csv -db dspace -u dspace -p 'fuuu' -m 29 -f dc.description.sponsorship -d
@@ -581,7 +581,7 @@ COPY 64871
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li>Paola from CIAT asked for a way to generate a report of the top keywords for each year of their articles and journals
<ul>
-<li>I told them that the best way (even though it's low tech) is to work on a CSV dump of the collection</li>
+<li>I told them that the best way (even though it&rsquo;s low tech) is to work on a CSV dump of the collection</li>
</ul>
</li>
</ul>
@@ -600,7 +600,7 @@ COPY 64871
</ul>
<pre><code>2019-05-30 07:19:35,166 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=A5E0C836AF8F3ABB769FE47107AE1CFF:ip_addr=185.71.4.34:failed_login:no DN found for user sa.saini@cgiar.org
</code></pre><ul>
-<li>For now I just created an eperson with her personal email address until I have time to check LDAP to see what's up with her CGIAR account:</li>
+<li>For now I just created an eperson with her personal email address until I have time to check LDAP to see what&rsquo;s up with her CGIAR account:</li>
</ul>
<pre><code>$ dspace user -a -m blah@blah.com -g Sakshi -s Saini -p 'sknflksnfksnfdls'
</code></pre><!-- raw HTML omitted -->