<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="February, 2017" />
<meta property="og:description" content="2017-02-07
An item was mapped twice erroneously again, so I had to remove one of the mappings manually:
dspace=# select * from collection2item where item_id = &#39;80278&#39;;
id | collection_id | item_id
-------&#43;---------------&#43;---------
92551 | 313 | 80278
92550 | 313 | 80278
90774 | 1051 | 80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
Looks like we&rsquo;ll be using cg.identifier.ccafsprojectpii as the field name
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-02/" />
<meta property="article:published_time" content="2017-02-07T07:04:52-08:00" />
<meta property="article:modified_time" content="2020-04-13T15:30:24+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="February, 2017"/>
<meta name="twitter:description" content="2017-02-07
An item was mapped twice erroneously again, so I had to remove one of the mappings manually:
dspace=# select * from collection2item where item_id = &#39;80278&#39;;
id | collection_id | item_id
-------&#43;---------------&#43;---------
92551 | 313 | 80278
92550 | 313 | 80278
90774 | 1051 | 80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
Looks like we&rsquo;ll be using cg.identifier.ccafsprojectpii as the field name
"/>
<meta name="generator" content="Hugo 0.91.2" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "February, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-02/",
"wordCount": "2028",
"datePublished": "2017-02-07T07:04:52-08:00",
"dateModified": "2020-04-13T15:30:24+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2017-02/">
<title>February, 2017 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel="stylesheet" integrity="sha256-vrgBLtwIuhC&#43;AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity="sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz&#43;lcnA=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-02/">February, 2017</a></h2>
<p class="blog-post-meta">
<time datetime="2017-02-07T07:04:52-08:00">Tue Feb 07, 2017</time>
in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre tabindex="0"><code>dspace=# select * from collection2item where item_id = '80278';
  id   | collection_id | item_id
-------+---------------+---------
 92551 |           313 |   80278
 92550 |           313 |   80278
 90774 |          1051 |   80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
</code></pre><ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
</ul>
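<p>A minimal Python sketch of how such duplicate mappings could be spotted before deleting anything by hand. It assumes the rows have already been fetched as <code>(id, collection_id, item_id)</code> tuples; the function name <code>find_duplicate_mappings</code> and the rule of keeping the lowest id per pair are illustrative assumptions, not part of DSpace.</p>

```python
from collections import defaultdict

# Rows copied from the psql output above: (id, collection_id, item_id).
rows = [
    (92551, 313, 80278),
    (92550, 313, 80278),
    (90774, 1051, 80278),
]


def find_duplicate_mappings(rows):
    """Return mapping ids that repeat an existing (collection_id, item_id) pair.

    Assumption: the lowest id of each pair is kept and every other id
    for that pair is reported as redundant.
    """
    ids_by_pair = defaultdict(list)
    for mapping_id, collection_id, item_id in rows:
        ids_by_pair[(collection_id, item_id)].append(mapping_id)

    redundant = []
    for ids in ids_by_pair.values():
        redundant.extend(sorted(ids)[1:])
    return sorted(redundant)


print(find_duplicate_mappings(rows))
```

<p>For the three rows shown, the only id reported is the one that was removed manually above.</p>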
<h2 id="2017-02-08">2017-02-08</h2>
<ul>
<li>We also need to rename some of the CCAFS Phase I flagships:
<ul>
<li>CLIMATE-SMART AGRICULTURAL PRACTICES → CLIMATE-SMART TECHNOLOGIES AND PRACTICES</li>
<li>CLIMATE RISK MANAGEMENT → CLIMATE SERVICES AND SAFETY NETS</li>
<li>LOW EMISSIONS AGRICULTURE → LOW EMISSIONS DEVELOPMENT</li>
<li>POLICIES AND INSTITUTIONS → PRIORITIES AND POLICIES FOR CSA</li>
</ul>
</li>
<li>The climate risk management one doesn&rsquo;t exist, so I will have to ask Magdalena if they want me to add it to the input forms</li>
<li>Start testing nearly 500 author corrections that CCAFS sent me:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/CCAFS-Authors-Feb-7.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
</code></pre><h2 id="2017-02-09">2017-02-09</h2>
<ul>
<li>More work on CCAFS Phase II stuff</li>
<li>Looks like simply adding a new metadata field to <code>dspace/config/registries/cgiar-types.xml</code> and restarting DSpace causes the field to get added to the registry</li>
<li>It requires a restart but at least it allows you to manage the registry programmatically</li>
<li>It&rsquo;s not a very good way to manage the registry, though: removing a field from the file doesn&rsquo;t remove it from the registry, and since we always restore from database backups there would never be a scenario where we needed these fields to be created</li>
<li>Testing some corrections on CCAFS Phase II flagships (<code>cg.subject.ccafs</code>):</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
</code></pre><h2 id="2017-02-10">2017-02-10</h2>
<ul>
<li>CCAFS said they want to wait on the flagship updates (<code>cg.subject.ccafs</code>) on CGSpace, perhaps for a month or so</li>
<li>Help Marianne Gadeberg (WLE) with some user permissions as it seems she had previously been using a personal email account, and is now on a CGIAR one</li>
<li>I manually added her new account to ~25 authorizations that her old user was on</li>
</ul>
<h2 id="2017-02-14">2017-02-14</h2>
<ul>
<li>Add <code>SCALING</code> to ILRI subjects (<a href="https://github.com/ilri/DSpace/pull/304">#304</a>), as Sisay&rsquo;s attempts were all sloppy</li>
<li>Cherry pick some patches from the DSpace 5.7 branch:
<ul>
<li>DS-3363 CSV import error says &ldquo;row&rdquo;, means &ldquo;column&rdquo;: f7b6c83e991db099003ee4e28ca33d3c7bab48c0</li>
<li>DS-3479 avoid adding empty metadata values during import: 329f3b48a6de7fad074d825fd12118f7e181e151</li>
<li>[DS-3456] 5x Clarify command line options for statisics import/export tools (#1623): 567ec083c8a94eb2bcc1189816eb4f767745b278</li>
<li>[DS-3458]5x Allow Shard Process to Append to an existing repo: 3c8ecb5d1fd69a1dcfee01feed259e80abbb7749</li>
</ul>
</li>
<li>I still need to test these, especially the last two, which change some stuff with Solr maintenance</li>
</ul>
<h2 id="2017-02-15">2017-02-15</h2>
<ul>
<li>Update rvm on DSpace Test and CGSpace as there was a <a href="https://github.com/justinsteven/advisories/blob/master/2017_rvm_cd_command_execution.md">security disclosure about versions less than 1.28.0</a></li>
</ul>
<h2 id="2017-02-16">2017-02-16</h2>
<ul>
<li>Looking at memory info from munin on CGSpace:</li>
</ul>
<p><img src="/cgspace-notes/2017/02/meminfo_phisical-week.png" alt="CGSpace meminfo"></p>
<ul>
<li>We are using only ~8GB of RAM for applications, and 16GB for caches!</li>
<li>The Linode machine we&rsquo;re on has 24GB of RAM but only because that&rsquo;s the only instance that had enough disk space for us (384GB)&hellip;</li>
<li>We should probably look into Google Compute Engine or Digital Ocean where we can get more storage without having to follow a linear increase in instance pricing for CPU/memory as well</li>
<li>Especially because we only use 2 out of 8 CPUs basically:</li>
</ul>
<p><img src="/cgspace-notes/2017/02/cpu-week.png" alt="CGSpace CPU"></p>
<ul>
<li>Fix issue with a duplicate declaration in atmire-dspace-xmlui <code>pom.xml</code> (causing non-fatal warnings during the Maven build)</li>
<li>Experiment with making DSpace generate HTTPS handle links, first a change in dspace.cfg or the site&rsquo;s properties file:</li>
</ul>
<pre tabindex="0"><code>handle.canonical.prefix = https://hdl.handle.net/
</code></pre><ul>
<li>And then a SQL command to update existing records:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'uri');
UPDATE 58193
</code></pre><ul>
<li>Seems to work fine!</li>
<li>I noticed a few items that have incorrect DOI links (<code>dc.identifier.doi</code>), and after looking in the database I see there are over 100 that are missing the scheme or are just plain wrong:</li>
</ul>
<pre tabindex="0"><code>dspace=# select distinct text_value from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value not like 'http%://%';
</code></pre><ul>
<li>This will replace any that begin with <code>10.</code> and change them to <code>https://dx.doi.org/10.</code>:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '(^10\..+$)', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like '10.%';
</code></pre><ul>
<li>This will replace any that begin with <code>doi:10.</code> and change them to <code>https://dx.doi.org/10.</code>:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^doi:(10\..+$)', 'https://dx.doi.org/\1') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'doi:10%';
</code></pre><ul>
<li>Fix DOIs like <code>dx.doi.org/10.</code> to be <code>https://dx.doi.org/10.</code>:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^dx\.doi\.org/(.+)$', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org/%';
</code></pre><ul>
<li>Fix DOIs like <code>http//</code>:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^http//dx\.doi\.org/(.+)$', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http//%';
</code></pre><ul>
<li>Fix DOIs like <code>dx.doi.org./</code>:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '^dx\.doi\.org\./(.+)$', 'https://dx.doi.org/\1') where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'dx.doi.org./%';
</code></pre><ul>
<li>Delete some invalid DOIs:</li>
</ul>
<pre tabindex="0"><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value in ('DOI','CPWF Mekong','Bulawayo, Zimbabwe','bb');
</code></pre><ul>
<li>Fix some other random outliers:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1016/j.aquaculture.2015.09.003' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:/dx.doi.org/10.1016/j.aquaculture.2015.09.003';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.5337/2016.200' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'doi: https://dx.doi.org/10.5337/2016.200';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1371/journal.pone.0062898' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'Http://dx.doi.org/doi:10.1371/journal.pone.0062898';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1016/j.cosust.2013.11.012' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'http:dx.doi.10.1016/j.cosust.2013.11.012';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.1080/03632415.2014.883570' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'org/10.1080/03632415.2014.883570';
dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.15446/agron.colomb.v32n3.46052' where metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value = 'Doi: 10.15446/agron.colomb.v32n3.46052';
</code></pre><ul>
<li>And do another round of <code>http://</code> → <code>https://</code> cleanups:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://dx.doi.org', 'https://dx.doi.org') where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'identifier' and qualifier = 'doi') and text_value like 'http://dx.doi.org%';
</code></pre><ul>
<li>Run all DOI corrections on CGSpace</li>
<li>Something to think about here is to write a <a href="https://wiki.lyrasis.org/display/DSDOC5x/Curation+System#CurationSystem-ScriptedTasks">Curation Task</a> in Java to do these sanity checks / corrections every night</li>
<li>Then we could add a cron job for them and run them from the command line like:</li>
</ul>
<pre tabindex="0"><code>[dspace]/bin/dspace curate -t noop -i 10568/79891
</code></pre><h2 id="2017-02-20">2017-02-20</h2>
<ul>
<li>Run all system updates on DSpace Test and reboot the server</li>
<li>Run CCAFS author corrections on DSpace Test and CGSpace and force a full discovery reindex</li>
<li>Fix label of CCAFS subjects in Atmire Listings and Reports module</li>
<li>Help Sisay with SQL commands</li>
<li>Help Paola from CCAFS with the Atmire Listings and Reports module</li>
<li>Testing the <code>fix-metadata-values.py</code> script on macOS and it seems like we don&rsquo;t need to use <code>.encode('utf-8')</code> anymore when printing strings to the screen</li>
<li>It seems this might have only been a temporary problem, as both Python 3.5.2 and 3.6.0 are able to print the problematic string &ldquo;Entwicklung &amp; Ländlicher Raum&rdquo; without the <code>encode()</code> call, but print it as bytes when it <em>is</em> used:</li>
</ul>
<pre tabindex="0"><code>$ python
Python 3.6.0 (default, Dec 25 2016, 17:30:53)
&gt;&gt;&gt; print('Entwicklung &amp; Ländlicher Raum')
Entwicklung &amp; Ländlicher Raum
&gt;&gt;&gt; print('Entwicklung &amp; Ländlicher Raum'.encode())
b'Entwicklung &amp; L\xc3\xa4ndlicher Raum'
</code></pre><ul>
<li>So for now I will remove the <code>encode()</code> call from the script. It was never used in the versions on the Linux hosts anyway, which leads me to believe it really <em>was</em> a temporary problem, perhaps due to macOS or the Python build I was using.</li>
</ul>
<h2 id="2017-02-21">2017-02-21</h2>
<ul>
<li>Testing regenerating PDF thumbnails, like I started in 2016-11</li>
<li>It seems there is a bug in <code>filter-media</code> that causes it to process formats that aren&rsquo;t part of its configuration:</li>
</ul>
<pre tabindex="0"><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16856 -p &quot;ImageMagick PDF Thumbnail&quot;
File: earlywinproposal_esa_postharvest.pdf.jpg
FILTERED: bitstream 13787 (item: 10568/16881) and created 'earlywinproposal_esa_postharvest.pdf.jpg'
File: postHarvest.jpg.jpg
FILTERED: bitstream 16524 (item: 10568/24655) and created 'postHarvest.jpg.jpg'
</code></pre><ul>
<li>According to <code>dspace.cfg</code> the ImageMagick PDF Thumbnail plugin should only process PDFs:</li>
</ul>
<pre tabindex="0"><code>filter.org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter.inputFormats = BMP, GIF, image/png, JPG, TIFF, JPEG, JPEG 2000
filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = Adobe PDF
</code></pre><ul>
<li>I&rsquo;ve sent a message to the mailing list and might file a Jira issue</li>
<li>Ask Atmire about the failed interpolation of the <code>dspace.internalUrl</code> variable in <code>atmire-cua.cfg</code></li>
</ul>
<h2 id="2017-02-22">2017-02-22</h2>
<ul>
<li>Atmire said I can add <code>dspace.internalUrl</code> to my build properties and the error will go away</li>
<li>It should be the local URL for accessing Tomcat from the server&rsquo;s own perspective, i.e. <code>http://localhost:8080</code></li>
</ul>
<h2 id="2017-02-26">2017-02-26</h2>
<ul>
<li>Find all fields with &ldquo;<a href="http://hdl.handle.net">http://hdl.handle.net</a>&rdquo; values (most are in <code>dc.identifier.uri</code>, but some are in other URL-related fields like <code>cg.link.reference</code>, <code>cg.identifier.dataurl</code>, and <code>cg.identifier.url</code>):</li>
</ul>
<pre tabindex="0"><code>dspace=# select distinct metadata_field_id from metadatavalue where resource_type_id=2 and text_value like 'http://hdl.handle.net%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where resource_type_id=2 and metadata_field_id IN (25, 113, 179, 219, 220, 223) and text_value like 'http://hdl.handle.net%';
UPDATE 58633
</code></pre><ul>
<li>This works but I&rsquo;m thinking I&rsquo;ll wait on the replacement as there are perhaps some other places that rely on <code>http://hdl.handle.net</code> (grep the code, it&rsquo;s scary how many things are hard coded)</li>
<li>Send message to dspace-tech mailing list with concerns about this</li>
</ul>
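<p>The prefix rewrite used in the SQL above can also be sketched outside the database, for example to preview the effect on a few values before running the update. This is a minimal Python illustration that assumes the values are plain strings; <code>to_https_handle</code> is a hypothetical helper name. The pattern is anchored at the start of the value, mirroring the <code>like 'http://hdl.handle.net%'</code> filter, and the dots are escaped so they only match literal dots.</p>

```python
import re

# Anchored at the start, mirroring the LIKE 'http://hdl.handle.net%' filter.
HTTP_HANDLE_PREFIX = re.compile(r"^http://hdl\.handle\.net")


def to_https_handle(value):
    """Rewrite a leading http://hdl.handle.net to https://.

    Values that do not start with the HTTP handle prefix are returned
    unchanged.
    """
    return HTTP_HANDLE_PREFIX.sub("https://hdl.handle.net", value, count=1)


print(to_https_handle("http://hdl.handle.net/10568/79891"))
```

<p>Values that already use HTTPS, or that only mention a handle URL somewhere in the middle of the text, pass through untouched.</p>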
<h2 id="2017-02-27">2017-02-27</h2>
<ul>
<li>LDAP users cannot log in today, looks to be an issue with CGIAR&rsquo;s LDAP server:</li>
</ul>
<pre tabindex="0"><code>$ openssl s_client -connect svcgroot2.cgiarad.org:3269
CONNECTED(00000003)
depth=0 CN = SVCGROOT2.CGIARAD.ORG
verify error:num=20:unable to get local issuer certificate
verify return:1
depth=0 CN = SVCGROOT2.CGIARAD.ORG
verify error:num=21:unable to verify the first certificate
verify return:1
---
Certificate chain
 0 s:/CN=SVCGROOT2.CGIARAD.ORG
   i:/CN=CGIARAD-RDWA-CA
---
</code></pre><ul>
<li>For some reason it is now signed by a private certificate authority</li>
<li>This error seems to have started on 2017-02-25:</li>
</ul>
<pre tabindex="0"><code>$ grep -c &quot;unable to find valid certification path&quot; [dspace]/log/dspace.log.2017-02-*
[dspace]/log/dspace.log.2017-02-01:0
[dspace]/log/dspace.log.2017-02-02:0
[dspace]/log/dspace.log.2017-02-03:0
[dspace]/log/dspace.log.2017-02-04:0
[dspace]/log/dspace.log.2017-02-05:0
[dspace]/log/dspace.log.2017-02-06:0
[dspace]/log/dspace.log.2017-02-07:0
[dspace]/log/dspace.log.2017-02-08:0
[dspace]/log/dspace.log.2017-02-09:0
[dspace]/log/dspace.log.2017-02-10:0
[dspace]/log/dspace.log.2017-02-11:0
[dspace]/log/dspace.log.2017-02-12:0
[dspace]/log/dspace.log.2017-02-13:0
[dspace]/log/dspace.log.2017-02-14:0
[dspace]/log/dspace.log.2017-02-15:0
[dspace]/log/dspace.log.2017-02-16:0
[dspace]/log/dspace.log.2017-02-17:0
[dspace]/log/dspace.log.2017-02-18:0
[dspace]/log/dspace.log.2017-02-19:0
[dspace]/log/dspace.log.2017-02-20:0
[dspace]/log/dspace.log.2017-02-21:0
[dspace]/log/dspace.log.2017-02-22:0
[dspace]/log/dspace.log.2017-02-23:0
[dspace]/log/dspace.log.2017-02-24:0
[dspace]/log/dspace.log.2017-02-25:7
[dspace]/log/dspace.log.2017-02-26:8
[dspace]/log/dspace.log.2017-02-27:90
</code></pre><ul>
<li>Also, it seems that we need to use a different user for LDAP binds, as we&rsquo;re still using the temporary one from the root migration, so maybe we can go back to the previous user we were using</li>
<li>So it looks like the certificate is invalid AND the bind users we had been using were deleted</li>
<li>Biruk Debebe recreated the bind user and now we are just waiting for CGNET to update their certificates</li>
<li>Regarding the <code>filter-media</code> issue I found earlier, it seems that the ImageMagick PDF plugin will also process JPGs if they are in the &ldquo;Content Files&rdquo; (aka <code>ORIGINAL</code>) bundle</li>
<li>The problem likely lies in the logic of <code>ImageMagickThumbnailFilter.java</code>, as <code>ImageMagickPdfThumbnailFilter.java</code> extends it</li>
<li>Run CIAT corrections on CGSpace</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
</code></pre><ul>
<li>CGNET has fixed the certificate chain on their LDAP server</li>
<li>Redeploy CGSpace and DSpace Test on latest <code>5_x-prod</code> branch with fixes for LDAP bind user</li>
<li>Run all system updates on CGSpace server and reboot</li>
</ul>
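<p>The per-day <code>grep -c</code> check used earlier today to date the certificate error can be sketched in Python as well. This is only an illustration: the log contents below are made-up stand-ins held in memory rather than real DSpace logs, and the helper names are hypothetical. It relies on the file names ending in an ISO date, so sorting the names also sorts them chronologically.</p>

```python
NEEDLE = "unable to find valid certification path"

# Hypothetical stand-ins for the daily log files, keyed by file name.
logs = {
    "dspace.log.2017-02-24": "INFO login ok\nINFO login ok\n",
    "dspace.log.2017-02-25": "ERROR unable to find valid certification path\nINFO ok\n",
    "dspace.log.2017-02-26": "ERROR unable to find valid certification path\n" * 2,
}


def count_matching_lines(logs, needle):
    """Count the lines containing needle in each log, like `grep -c`."""
    return {
        name: sum(1 for line in text.splitlines() if needle in line)
        for name, text in logs.items()
    }


def first_day_with_matches(counts):
    """Return the earliest log name with a non-zero count, or None."""
    hits = sorted(name for name, count in counts.items() if count > 0)
    return hits[0] if hits else None


counts = count_matching_lines(logs, NEEDLE)
print(first_day_with_matches(counts))
```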
<h2 id="2017-02-28">2017-02-28</h2>
<ul>
<li>After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery</li>
<li>Ah, this is probably because some items have the <code>International Center for Tropical Agriculture</code> author twice, which I first noticed in 2016-12 but couldn&rsquo;t figure out how to fix</li>
<li>I think I can do it by first exporting all metadatavalues that have the author <code>International Center for Tropical Agriculture</code></li>
</ul>
<pre tabindex="0"><code>dspace=# \copy (select resource_id, metadata_value_id from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='International Center for Tropical Agriculture') to /tmp/ciat.csv with csv;
COPY 1968
</code></pre><ul>
<li>And then use awk to print the duplicate lines to a separate file:</li>
</ul>
<pre tabindex="0"><code>$ awk -F',' 'seen[$1]++' /tmp/ciat.csv &gt; /tmp/ciat-dupes.csv
</code></pre><ul>
<li>From that file I can create a list of 279 deletes and put them in a batch script like:</li>
</ul>
<pre tabindex="0"><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=3 and metadata_value_id=2742061;
</code></pre>
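<p>The awk step and the generation of the delete statements could be combined in a short Python script. This is a minimal sketch, assuming the export has two columns (<code>resource_id</code>, <code>metadata_value_id</code>) as in the <code>\copy</code> above; the sample rows and the helper names are hypothetical. Like <code>seen[$1]++</code>, it keeps the first row for each <code>resource_id</code> and treats every later row for the same item as a duplicate.</p>

```python
import csv
import io

DELETE_TEMPLATE = (
    "delete from metadatavalue where resource_type_id=2 "
    "and metadata_field_id=3 and metadata_value_id={};"
)


def duplicate_rows(csv_text):
    """Return (resource_id, metadata_value_id) rows whose resource_id was already seen."""
    seen = set()
    dupes = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        resource_id, metadata_value_id = row
        if resource_id in seen:
            dupes.append((resource_id, metadata_value_id))
        else:
            seen.add(resource_id)
    return dupes


def delete_statements(dupes):
    """Build one delete statement per duplicate metadata_value_id."""
    return [DELETE_TEMPLATE.format(int(value_id)) for _, value_id in dupes]


# Hypothetical sample: item 100 carries the same author three times.
sample = "100,1\n101,2\n100,3\n100,4\n"
for statement in delete_statements(duplicate_rows(sample)):
    print(statement)
```

<p>Converting each id with <code>int()</code> before formatting means a malformed row raises an error instead of ending up in the generated SQL.</p>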
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2022-01/">January, 2022</a></li>
<li><a href="/cgspace-notes/2021-12/">December, 2021</a></li>
<li><a href="/cgspace-notes/2021-11/">November, 2021</a></li>
<li><a href="/cgspace-notes/2021-10/">October, 2021</a></li>
<li><a href="/cgspace-notes/2021-09/">September, 2021</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>