Add notes for 2020-01-27

2020-01-27 16:20:44 +02:00
parent 207ace0883
commit 8feb93be39
112 changed files with 11466 additions and 5158 deletions

@@ -8,7 +8,7 @@
<meta property="og:title" content="November, 2016" />
<meta property="og:description" content="2016-11-01
Add dc.type to the output options for Atmire&#39;s Listings and Reports module (#286)
Add dc.type to the output options for Atmire&rsquo;s Listings and Reports module (#286)
" />
<meta property="og:type" content="article" />
@@ -20,10 +20,10 @@ Add dc.type to the output options for Atmire&#39;s Listings and Reports module (
<meta name="twitter:title" content="November, 2016"/>
<meta name="twitter:description" content="2016-11-01
Add dc.type to the output options for Atmire&#39;s Listings and Reports module (#286)
Add dc.type to the output options for Atmire&rsquo;s Listings and Reports module (#286)
"/>
<meta name="generator" content="Hugo 0.62.2" />
<meta name="generator" content="Hugo 0.63.1" />
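The substantive change in these hunks is typographic. Alongside the generator bump from Hugo 0.62.2 to 0.63.1, straight apostrophes in prose (written as `'` or `&#39;`) are now rendered as the curly `&rsquo;`. Double quotes were already rendered as `&ldquo;`/`&rdquo;` before the bump, and apostrophes inside `<code>` blocks (for example `awk -F'-'`) are left alone. The sketch below approximates only the intra-word apostrophe rule with a regular expression. It is an illustration, not Hugo's or Goldmark's actual typographer, and the helper name `smarten_apostrophes` is hypothetical.

```python
import re

# Hypothetical helper approximating ONE typographer rule (not Hugo's/Goldmark's code):
# an apostrophe between two word characters, written either as a raw ' or as the
# escaped entity &#39;, is replaced with the right single quotation mark &rsquo;.
_INTRA_WORD_APOSTROPHE = re.compile(r"(?<=\w)(?:'|&#39;)(?=\w)")


def smarten_apostrophes(prose: str) -> str:
    """Curl intra-word apostrophes in a prose string.

    Callers must skip code spans themselves; this function does not parse HTML.
    """
    return _INTRA_WORD_APOSTROPHE.sub("&rsquo;", prose)


if __name__ == "__main__":
    old = "Add dc.type to the output options for Atmire&#39;s Listings and Reports module (#286)"
    # prints: Add dc.type to the output options for Atmire&rsquo;s Listings and Reports module (#286)
    print(smarten_apostrophes(old))
    # Quotes that are not between two word characters are untouched.
    print(smarten_apostrophes("text_value='SEEDS'"))
```

A real typographer also handles opening/closing quotes and leading apostrophes (as in "'90s"), which this single rule deliberately ignores.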
@@ -53,7 +53,7 @@ Add dc.type to the output options for Atmire&#39;s Listings and Reports module (
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.a20c1a4367639632cdb341d23c27ca44fedcc75b0f8b3cbea6203010da153d3c.css" rel="stylesheet" integrity="sha256-ogwaQ2djljLNs0HSPCfKRP7cx1sPizy&#43;piAwENoVPTw=" crossorigin="anonymous">
<link href="https://alanorth.github.io/cgspace-notes/css/style.23e2c3298bcc8c1136c19aba330c211ec94c36f7c4454ea15cf4d3548370042a.css" rel="stylesheet" integrity="sha256-I&#43;LDKYvMjBE2wZq6MwwhHslMNvfERU6hXPTTVINwBCo=" crossorigin="anonymous">
<!-- RSS 2.0 feed -->
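In the stylesheet hunk, both the fingerprinted filename and the `integrity` attribute change. They carry the same SHA-256 digest in two encodings: hex in the filename and base64 in the Subresource Integrity value. Decoding the old filename's hex digest reproduces the old `integrity` value exactly, once the HTML-escaped `&#43;` is read as `+`. The minimal Python sketch below re-derives an SRI value either from a file's bytes or from such a hex fingerprint. The function names are hypothetical and not part of Hugo.

```python
import base64
import hashlib


def sri_from_bytes(data: bytes) -> str:
    """Compute a Subresource Integrity value (sha256-<base64 digest>) for raw file bytes."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def sri_from_hex(hex_digest: str) -> str:
    """Re-encode a hex SHA-256 digest (e.g. from a fingerprinted filename) as an SRI value."""
    return "sha256-" + base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


if __name__ == "__main__":
    old_css = "style.a20c1a4367639632cdb341d23c27ca44fedcc75b0f8b3cbea6203010da153d3c.css"
    fingerprint = old_css.split(".")[1]  # the hex digest between "style." and ".css"
    # prints: sha256-ogwaQ2djljLNs0HSPCfKRP7cx1sPizy+piAwENoVPTw=
    print(sri_from_hex(fingerprint))
```

Because the page markup writes `+` as `&#43;`, compare against the attribute only after unescaping it (for example with `html.unescape`).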
@@ -101,13 +101,13 @@ Add dc.type to the output options for Atmire&#39;s Listings and Reports module (
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-11/">November, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-11-01T09:21:00&#43;03:00">Tue Nov 01, 2016</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-11-01">2016-11-01</h2>
<ul>
<li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
<li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>
<h2 id="2016-11-02">2016-11-02</h2>
@@ -147,7 +147,7 @@ java.lang.NullPointerException
</ul>
<h2 id="2016-11-06">2016-11-06</h2>
<ul>
<li>After re-deploying and re-indexing I didn't see the same issue, and the indexing completed in 85 minutes, which is about how long it is supposed to take</li>
<li>After re-deploying and re-indexing I didn&rsquo;t see the same issue, and the indexing completed in 85 minutes, which is about how long it is supposed to take</li>
</ul>
<h2 id="2016-11-07">2016-11-07</h2>
<ul>
@@ -155,8 +155,8 @@ java.lang.NullPointerException
</ul>
<pre><code>$ grep -A 3 contact_info * | grep -E &quot;(Orth|Sisay|Peter|Daniel|Tsega)&quot; | awk -F'-' '{print $1}' | grep linode | uniq | xargs grep linode_id
</code></pre><ul>
<li>I noticed some weird CRPs in the database, and they don't show up in Discovery for some reason, perhaps the <code>:</code></li>
<li>I'll export these and fix them in batch:</li>
<li>I noticed some weird CRPs in the database, and they don&rsquo;t show up in Discovery for some reason, perhaps the <code>:</code></li>
<li>I&rsquo;ll export these and fix them in batch:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=230 group by text_value order by count desc) to /tmp/crp.csv with csv;
COPY 22
@@ -169,11 +169,11 @@ COPY 22
</ul>
<h2 id="2016-11-08">2016-11-08</h2>
<ul>
<li>Atmire's Listings and Reports module seems to be broken on DSpace 5.5</li>
<li>Atmire&rsquo;s Listings and Reports module seems to be broken on DSpace 5.5</li>
</ul>
<p><img src="/cgspace-notes/2016/11/listings-and-reports-55.png" alt="Listings and Reports broken in DSpace 5.5"></p>
<ul>
<li>I've filed a ticket with Atmire</li>
<li>I&rsquo;ve filed a ticket with Atmire</li>
<li>Thinking about batch updates for ORCIDs and authors</li>
<li>Playing with <a href="https://github.com/moonlitesolutions/SolrClient">SolrClient</a> in Python to query Solr</li>
<li>All records in the authority core are either <code>authority_type:orcid</code> or <code>authority_type:person</code></li>
@@ -185,7 +185,7 @@ COPY 22
</code></pre><h2 id="2016-11-09">2016-11-09</h2>
<ul>
<li>CGSpace crashed so I quickly ran system updates, applied one or two of the waiting changes from the <code>5_x-prod</code> branch, and rebooted the server</li>
<li>The error was <code>Timeout waiting for idle object</code> but I haven't looked into the Tomcat logs to see what happened</li>
<li>The error was <code>Timeout waiting for idle object</code> but I haven&rsquo;t looked into the Tomcat logs to see what happened</li>
<li>Also, I ran the corrections for CRPs from earlier this week</li>
</ul>
<h2 id="2016-11-10">2016-11-10</h2>
@@ -214,7 +214,7 @@ $ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: applica
34
$ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;http://localhost:8080/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;: &quot;cg.subject.ilri&quot;,&quot;value&quot;: &quot;SEEDS&quot;, &quot;language&quot;:&quot;en_US&quot;}' | jq length
</code></pre><ul>
<li>The results (55+34=89) don't seem to match those from the database:</li>
<li>The results (55+34=89) don&rsquo;t seem to match those from the database:</li>
</ul>
<pre><code>dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang is null;
count
@@ -230,8 +230,8 @@ dspace=# select count(text_value) from metadatavalue where resource_type_id=2 an
66
</code></pre><ul>
<li>So, querying from the API I get 55 + 34 = 89 results, but the database actually only has 85&hellip;</li>
<li>And the <code>find-by-metadata-field</code> endpoint doesn't seem to have a way to get all items with the field, or a wildcard value</li>
<li>I'll ask a question on the dspace-tech mailing list</li>
<li>And the <code>find-by-metadata-field</code> endpoint doesn&rsquo;t seem to have a way to get all items with the field, or a wildcard value</li>
<li>I&rsquo;ll ask a question on the dspace-tech mailing list</li>
<li>And speaking of <code>text_lang</code>, this is interesting:</li>
</ul>
<pre><code>dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
@@ -274,7 +274,7 @@ UPDATE 420
<pre><code>dspacetest=# update metadatavalue set text_lang=NULL where resource_type_id=2 and text_lang='';
UPDATE 183726
</code></pre><ul>
<li>After that restarted Tomcat and PostgreSQL (because I'm superstitious about caches) and now I see the following in REST API query:</li>
<li>After that restarted Tomcat and PostgreSQL (because I&rsquo;m superstitious about caches) and now I see the following in REST API query:</li>
</ul>
<pre><code>$ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;http://localhost:8080/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;: &quot;cg.subject.ilri&quot;,&quot;value&quot;: &quot;SEEDS&quot;}' | jq length
71
@@ -282,12 +282,12 @@ $ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: applica
0
$ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: application/json&quot; -X POST &quot;http://localhost:8080/rest/items/find-by-metadata-field&quot; -d '{&quot;key&quot;: &quot;cg.subject.ilri&quot;,&quot;value&quot;: &quot;SEEDS&quot;, &quot;language&quot;:&quot;en_US&quot;}' | jq length
</code></pre><ul>
<li>Not sure what's going on, but Discovery shows 83 values, and database shows 85, so I'm going to reindex Discovery just in case</li>
<li>Not sure what&rsquo;s going on, but Discovery shows 83 values, and database shows 85, so I&rsquo;m going to reindex Discovery just in case</li>
</ul>
<h2 id="2016-11-14">2016-11-14</h2>
<ul>
<li>I applied Atmire's suggestions to fix Listings and Reports for DSpace 5.5 and now it works</li>
<li>There were some issues with the <code>dspace/modules/jspui/pom.xml</code>, which is annoying because all I did was rebase our working 5.1 code on top of 5.5, meaning Atmire's installation procedure must have changed</li>
<li>I applied Atmire&rsquo;s suggestions to fix Listings and Reports for DSpace 5.5 and now it works</li>
<li>There were some issues with the <code>dspace/modules/jspui/pom.xml</code>, which is annoying because all I did was rebase our working 5.1 code on top of 5.5, meaning Atmire&rsquo;s installation procedure must have changed</li>
<li>So there is apparently this Tomcat native way to limit web crawlers to one session: <a href="https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve">Crawler Session Manager</a></li>
<li>After adding that to <code>server.xml</code> bots matching the pattern in the configuration will all use ONE session, just like normal users:</li>
</ul>
@@ -327,7 +327,7 @@ X-Cocoon-Version: 2.2.0
<p><img src="/cgspace-notes/2016/11/dspacetest-tomcat-jvm-day.png" alt="Tomcat JVM heap (day) after setting up the Crawler Session Manager">
<img src="/cgspace-notes/2016/11/dspacetest-tomcat-jvm-week.png" alt="Tomcat JVM heap (week) after setting up the Crawler Session Manager"></p>
<ul>
<li>Seems the default regex doesn't catch Baidu, though:</li>
<li>Seems the default regex doesn&rsquo;t catch Baidu, though:</li>
</ul>
<pre><code>$ http --print h https://dspacetest.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)'
HTTP/1.1 200 OK
@@ -374,7 +374,7 @@ Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)&quot; &quot;
</ul>
<pre><code>$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=localhost -P \!dspace-lni,\!dspace-rdf,\!dspace-sword,\!dspace-swordv2 clean package
</code></pre><ul>
<li>We absolutely don't use those modules, so we shouldn't build them in the first place</li>
<li>We absolutely don&rsquo;t use those modules, so we shouldn&rsquo;t build them in the first place</li>
</ul>
<h2 id="2016-11-17">2016-11-17</h2>
<ul>
@@ -394,16 +394,16 @@ UPDATE 7
<li>Had to run it twice to get all (not sure about &ldquo;global&rdquo; regex in PostgreSQL)</li>
<li>Run the updates on CGSpace as well</li>
<li>Run through some collections and manually regenerate some PDF thumbnails for items from before 2016 on DSpace Test to compare with CGSpace</li>
<li>I'm debating forcing the re-generation of ALL thumbnails, since some come from DSpace 3 and 4 when the thumbnailing wasn't as good</li>
<li>I&rsquo;m debating forcing the re-generation of ALL thumbnails, since some come from DSpace 3 and 4 when the thumbnailing wasn&rsquo;t as good</li>
<li>The results were very good, I think that after we upgrade to 5.5 I will do it, perhaps one community / collection at a time:</li>
</ul>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/67156 -p &quot;ImageMagick PDF Thumbnail&quot;
</code></pre><ul>
<li>In related news, I'm looking at thumbnails of thumbnails (the ones we uploaded manually before, and now DSpace's media filter has made thumbnails of THEM):</li>
<li>In related news, I&rsquo;m looking at thumbnails of thumbnails (the ones we uploaded manually before, and now DSpace&rsquo;s media filter has made thumbnails of THEM):</li>
</ul>
<pre><code>dspace=# select text_value from metadatavalue where text_value like '%.jpg.jpg';
</code></pre><ul>
<li>I'm not sure if there's anything we can do, actually, because we would have to remove those from the thumbnail bundles, and replace them with the regular JPGs from the content bundle, and then remove them from the assetstore&hellip;</li>
<li>I&rsquo;m not sure if there&rsquo;s anything we can do, actually, because we would have to remove those from the thumbnail bundles, and replace them with the regular JPGs from the content bundle, and then remove them from the assetstore&hellip;</li>
</ul>
<h2 id="2016-11-18">2016-11-18</h2>
<ul>
@@ -419,17 +419,17 @@ UPDATE 7
<h2 id="2016-11-23">2016-11-23</h2>
<ul>
<li>Upgrade Java from 7 to 8 on CGSpace</li>
<li>I had started planning the inplace PostgreSQL 9.3→9.5 upgrade but decided that I will have to <code>pg_dump</code> and <code>pg_restore</code> when I move to the new server soon anyways, so there's no need to upgrade the database right now</li>
<li>I had started planning the inplace PostgreSQL 9.3→9.5 upgrade but decided that I will have to <code>pg_dump</code> and <code>pg_restore</code> when I move to the new server soon anyways, so there&rsquo;s no need to upgrade the database right now</li>
<li>Chat with Carlos about CGCore and the CGSpace metadata registry</li>
<li>Dump CGSpace metadata field registry for Carlos: <a href="https://gist.github.com/alanorth/8cbd0bb2704d4bbec78025b4742f8e70">https://gist.github.com/alanorth/8cbd0bb2704d4bbec78025b4742f8e70</a></li>
<li>Send some feedback to Carlos on CG Core so they can better understand how DSpace/CGSpace uses metadata</li>
<li>Notes about PostgreSQL tuning from James: <a href="https://paste.fedoraproject.org/488776/14798952/">https://paste.fedoraproject.org/488776/14798952/</a></li>
<li>Play with Creative Commons stuff in DSpace submission step</li>
<li>It seems to work but it doesn't let you choose a version of CC (like 4.0), and we would need to customize the XMLUI item display so it doesn't display the gross CC badges</li>
<li>It seems to work but it doesn&rsquo;t let you choose a version of CC (like 4.0), and we would need to customize the XMLUI item display so it doesn&rsquo;t display the gross CC badges</li>
</ul>
<h2 id="2016-11-24">2016-11-24</h2>
<ul>
<li>Bizuwork was testing DSpace Test on DSPace 5.5 and noticed that the Listings and Reports module seems to be case sensitive, whereas CGSpace's Listings and Reports isn't (ie, a search for &ldquo;orth, alan&rdquo; vs &ldquo;Orth, Alan&rdquo; returns the same results on CGSpace, but different on DSpace Test)</li>
<li>Bizuwork was testing DSpace Test on DSPace 5.5 and noticed that the Listings and Reports module seems to be case sensitive, whereas CGSpace&rsquo;s Listings and Reports isn&rsquo;t (ie, a search for &ldquo;orth, alan&rdquo; vs &ldquo;Orth, Alan&rdquo; returns the same results on CGSpace, but different on DSpace Test)</li>
<li>I have raised a ticket with Atmire</li>
<li>Looks like this issue is actually the new Listings and Reports module honoring the Solr search queries more correctly</li>
</ul>
@@ -449,7 +449,7 @@ UPDATE 7
</ul>
</li>
<li>Need to do updates for ansible infrastructure role defaults, and switch the GitHub branch to the new 5.5 one</li>
<li>Testing DSpace 5.5 on CGSpace, it seems CUA's export as XLS works for Usage statistics, but not Content statistics</li>
<li>Testing DSpace 5.5 on CGSpace, it seems CUA&rsquo;s export as XLS works for Usage statistics, but not Content statistics</li>
<li>I will raise a bug with Atmire</li>
</ul>
<h2 id="2016-11-28">2016-11-28</h2>
@@ -481,7 +481,7 @@ $ /home/dspacetest.cgiar.org/bin/dspace registry-loader -metadata /home/dspacete
</ul>
<h2 id="2016-11-29">2016-11-29</h2>
<ul>
<li>Sisay tried deleting and re-creating Goshu's account but he still can't see any communities on the homepage after he logs in</li>
<li>Sisay tried deleting and re-creating Goshu&rsquo;s account but he still can&rsquo;t see any communities on the homepage after he logs in</li>
<li>Around the time of his login I see this in the DSpace logs:</li>
</ul>
<pre><code>2016-11-29 07:56:36,350 INFO org.dspace.authenticate.LDAPAuthentication @ g.cherinet@cgiar.org:session_id=F628E13AB4EF2BA949198A99EFD8EBE4:ip_addr=213.55.99.121:failed_login:no DN found for user g.cherinet@cgiar.org
@@ -510,7 +510,7 @@ org.dspace.discovery.SearchServiceException: Error executing query
</code></pre><ul>
<li>Which, according to some old threads on DSpace Tech, means that the user has a lot of permissions (from groups or on the individual eperson) which increases the Solr query size / query URL</li>
<li>It might be fixed by increasing the Tomcat <code>maxHttpHeaderSize</code>, which is <a href="http://tomcat.apache.org/tomcat-7.0-doc/config/http.html">8192 (or 8KB) by default</a></li>
<li>I've increased the <code>maxHttpHeaderSize</code> to 16384 on DSpace Test and the user said he is now able to see the communities on the homepage</li>
<li>I&rsquo;ve increased the <code>maxHttpHeaderSize</code> to 16384 on DSpace Test and the user said he is now able to see the communities on the homepage</li>
<li>I will make the changes on CGSpace soon</li>
<li>A few users are reporting having issues with their workflows, they get the following message: &ldquo;You are not allowed to perform this task&rdquo;</li>
<li>Might be the same as <a href="https://jira.duraspace.org/browse/DS-2920">DS-2920</a> on the bug tracker</li>
@@ -518,7 +518,7 @@ org.dspace.discovery.SearchServiceException: Error executing query
<h2 id="2016-11-30">2016-11-30</h2>
<ul>
<li>The <code>maxHttpHeaderSize</code> fix worked on CGSpace (user is able to see the community list on the homepage)</li>
<li>The &ldquo;take task&rdquo; cache fix worked on DSpace Test but it's not an official patch, so I'll have to report the bug to DSpace people and try to get advice</li>
<li>The &ldquo;take task&rdquo; cache fix worked on DSpace Test but it&rsquo;s not an official patch, so I&rsquo;ll have to report the bug to DSpace people and try to get advice</li>
<li>More work on the KM4Dev Journal article</li>
</ul>