From a3f0d889459e7cb399912c6e35f425f1576d5ca3 Mon Sep 17 00:00:00 2001 From: Alan Orth Date: Tue, 28 Feb 2017 18:57:31 +0200 Subject: [PATCH] Add notes for 2017-02-28 --- content/post/2017-02.md | 18 ++++++++++++++++++ public/2017-02/index.html | 24 +++++++++++++++++++++++- public/index.xml | 22 ++++++++++++++++++++++ public/post/index.xml | 22 ++++++++++++++++++++++ public/tags/notes/index.xml | 22 ++++++++++++++++++++++ 5 files changed, 107 insertions(+), 1 deletion(-) diff --git a/content/post/2017-02.md b/content/post/2017-02.md index 09b94d350..94e09b030 100644 --- a/content/post/2017-02.md +++ b/content/post/2017-02.md @@ -290,6 +290,24 @@ $ grep -c "unable to find valid certification path" [dspace]/log/dspace.log.2017 - Regarding the `filter-media` issue I found earlier, it seems that the ImageMagick PDF plugin will also process JPGs if they are in the "Content Files" (aka `ORIGINAL`) bundle - The problem likely lies in the logic of `ImageMagickThumbnailFilter.java`, as `ImageMagickPdfThumbnailFilter.java` extends it - Run CIAT corrections on CGSpace + +``` +dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture'; +``` + - CGNET has fixed the certificate chain on their LDAP server - Redeploy CGSpace and DSpace Test to on latest `5_x-prod` branch with fixes for LDAP bind user - Run all system updates on CGSpace server and reboot + +## 2017-02-28 + +- After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery +- Ah, this is probably because some items have the `International Center for Tropical Agriculture` author twice, which I first noticed in 2016-12 but couldn't figure out how to fix +- I think I can do it by first exporting all metadatavalues that have the author `International Center for Tropical 
Agriculture` + +``` +dspace=# \copy (select resource_id, metadata_value_id from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='International Center for Tropical Agriculture') to /tmp/ciat.csv with csv; +COPY 1968 +``` + +- And then using awk or uniq to either remove or print the lines that have a duplicate `resource_id` (meaning they belong to the same item in DSpace and are therefore duplicates), and then using the `metadata_value_id` to delete them diff --git a/public/2017-02/index.html b/public/2017-02/index.html index 8d2f79be1..6a8140698 100644 --- a/public/2017-02/index.html +++ b/public/2017-02/index.html @@ -90,7 +90,7 @@ Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name "headline": "February, 2017", "url": "https://alanorth.github.io/cgspace-notes/2017-02/", - "wordCount": "1862", + "wordCount": "2019", "datePublished": "2017-02-07T07:04:52-08:00", @@ -498,11 +498,33 @@ Certificate chain
  • Regarding the filter-media issue I found earlier, it seems that the ImageMagick PDF plugin will also process JPGs if they are in the “Content Files” (aka ORIGINAL) bundle
  • The problem likely lies in the logic of ImageMagickThumbnailFilter.java, as ImageMagickPdfThumbnailFilter.java extends it
  • Run CIAT corrections on CGSpace
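
    dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';

  • CGNET has fixed the certificate chain on their LDAP server
  • Redeploy CGSpace and DSpace Test to on latest 5_x-prod branch with fixes for LDAP bind user
  • Run all system updates on CGSpace server and reboot

    2017-02-28

  • After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery
  • Ah, this is probably because some items have the International Center for Tropical Agriculture author twice, which I first noticed in 2016-12 but couldn’t figure out how to fix
  • I think I can do it by first exporting all metadatavalues that have the author International Center for Tropical Agriculture

    dspace=# \copy (select resource_id, metadata_value_id from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='International Center for Tropical Agriculture') to /tmp/ciat.csv with csv;
    COPY 1968

  • And then using awk or uniq to either remove or print the lines that have a duplicate resource_id (meaning they belong to the same item in DSpace and are therefore duplicates), and then using the metadata_value_id to delete them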
    + + + diff --git a/public/index.xml b/public/index.xml index 42fc714da..e24c7fd70 100644 --- a/public/index.xml +++ b/public/index.xml @@ -348,9 +348,31 @@ Certificate chain <li>Regarding the <code>filter-media</code> issue I found earlier, it seems that the ImageMagick PDF plugin will also process JPGs if they are in the &ldquo;Content Files&rdquo; (aka <code>ORIGINAL</code>) bundle</li> <li>The problem likely lies in the logic of <code>ImageMagickThumbnailFilter.java</code>, as <code>ImageMagickPdfThumbnailFilter.java</code> extends it</li> <li>Run CIAT corrections on CGSpace</li> +</ul> + +<pre><code>dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture'; +</code></pre> + +<ul> <li>CGNET has fixed the certificate chain on their LDAP server</li> <li>Redeploy CGSpace and DSpace Test to on latest <code>5_x-prod</code> branch with fixes for LDAP bind user</li> <li>Run all system updates on CGSpace server and reboot</li> +</ul> + +<h2 id="2017-02-28">2017-02-28</h2> + +<ul> +<li>After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery</li> +<li>Ah, this is probably because some items have the <code>International Center for Tropical Agriculture</code> author twice, which I first noticed in 2016-12 but couldn&rsquo;t figure out how to fix</li> +<li>I think I can do it by first exporting all metadatavalues that have the author <code>International Center for Tropical Agriculture</code></li> +</ul> + +<pre><code>dspace=# \copy (select resource_id, metadata_value_id from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='International Center for Tropical Agriculture') to /tmp/ciat.csv with csv; +COPY 1968 +</code></pre> + +<ul> +<li>And then using awk or uniq to either remove or 
print the lines that have a duplicate <code>resource_id</code> (meaning they belong to the same item in DSpace and are therefore duplicates), and then using the <code>metadata_value_id</code> to delete them</li> </ul> diff --git a/public/post/index.xml b/public/post/index.xml index 8b507d363..baccecf8e 100644 --- a/public/post/index.xml +++ b/public/post/index.xml @@ -348,9 +348,31 @@ Certificate chain <li>Regarding the <code>filter-media</code> issue I found earlier, it seems that the ImageMagick PDF plugin will also process JPGs if they are in the &ldquo;Content Files&rdquo; (aka <code>ORIGINAL</code>) bundle</li> <li>The problem likely lies in the logic of <code>ImageMagickThumbnailFilter.java</code>, as <code>ImageMagickPdfThumbnailFilter.java</code> extends it</li> <li>Run CIAT corrections on CGSpace</li> +</ul> + +<pre><code>dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture'; +</code></pre> + +<ul> <li>CGNET has fixed the certificate chain on their LDAP server</li> <li>Redeploy CGSpace and DSpace Test to on latest <code>5_x-prod</code> branch with fixes for LDAP bind user</li> <li>Run all system updates on CGSpace server and reboot</li> +</ul> + +<h2 id="2017-02-28">2017-02-28</h2> + +<ul> +<li>After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery</li> +<li>Ah, this is probably because some items have the <code>International Center for Tropical Agriculture</code> author twice, which I first noticed in 2016-12 but couldn&rsquo;t figure out how to fix</li> +<li>I think I can do it by first exporting all metadatavalues that have the author <code>International Center for Tropical Agriculture</code></li> +</ul> + +<pre><code>dspace=# \copy (select resource_id, metadata_value_id from metadatavalue 
where resource_type_id=2 and metadata_field_id=3 and text_value='International Center for Tropical Agriculture') to /tmp/ciat.csv with csv; +COPY 1968 +</code></pre> + +<ul> +<li>And then using awk or uniq to either remove or print the lines that have a duplicate <code>resource_id</code> (meaning they belong to the same item in DSpace and are therefore duplicates), and then using the <code>metadata_value_id</code> to delete them</li> </ul> diff --git a/public/tags/notes/index.xml b/public/tags/notes/index.xml index 8f405e439..c15cf2a02 100644 --- a/public/tags/notes/index.xml +++ b/public/tags/notes/index.xml @@ -347,9 +347,31 @@ Certificate chain <li>Regarding the <code>filter-media</code> issue I found earlier, it seems that the ImageMagick PDF plugin will also process JPGs if they are in the &ldquo;Content Files&rdquo; (aka <code>ORIGINAL</code>) bundle</li> <li>The problem likely lies in the logic of <code>ImageMagickThumbnailFilter.java</code>, as <code>ImageMagickPdfThumbnailFilter.java</code> extends it</li> <li>Run CIAT corrections on CGSpace</li> +</ul> + +<pre><code>dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture'; +</code></pre> + +<ul> <li>CGNET has fixed the certificate chain on their LDAP server</li> <li>Redeploy CGSpace and DSpace Test to on latest <code>5_x-prod</code> branch with fixes for LDAP bind user</li> <li>Run all system updates on CGSpace server and reboot</li> +</ul> + +<h2 id="2017-02-28">2017-02-28</h2> + +<ul> +<li>After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery</li> +<li>Ah, this is probably because some items have the <code>International Center for Tropical Agriculture</code> author twice, which I first noticed in 2016-12 but couldn&rsquo;t figure out how to 
fix</li> +<li>I think I can do it by first exporting all metadatavalues that have the author <code>International Center for Tropical Agriculture</code></li> +</ul> + +<pre><code>dspace=# \copy (select resource_id, metadata_value_id from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value='International Center for Tropical Agriculture') to /tmp/ciat.csv with csv; +COPY 1968 +</code></pre> + +<ul> +<li>And then using awk or uniq to either remove or print the lines that have a duplicate <code>resource_id</code> (meaning they belong to the same item in DSpace and are therefore duplicates), and then using the <code>metadata_value_id</code> to delete them</li> </ul>
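
The de-duplication step described in the 2017-02-28 notes (find rows in the exported CSV that share a `resource_id`, then collect their `metadata_value_id` values) might look roughly like the sketch below. It uses Python rather than the awk or uniq mentioned in the notes, purely for illustration. The function name `duplicate_value_ids`, the sample rows, and the choice to keep the first row seen for each item are all assumptions, not something the notes specify.

```python
import csv
import io


def duplicate_value_ids(csv_file):
    """Return the metadata_value_id of every repeated row per resource_id.

    Expects two-column rows (resource_id, metadata_value_id), the layout
    produced by the \\copy export in the notes. The first row seen for a
    resource_id is kept; later rows for it are reported as duplicates.
    Keeping the first row is an assumption made for this sketch.
    """
    seen = set()
    duplicates = []
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        resource_id, metadata_value_id = row[0], row[1]
        if resource_id in seen:
            duplicates.append(int(metadata_value_id))
        else:
            seen.add(resource_id)
    return duplicates


# Made-up sample rows in the same layout as /tmp/ciat.csv (not real data)
sample = io.StringIO("10,501\n11,502\n10,503\n12,504\n10,505\n")
ids = duplicate_value_ids(sample)
print(ids)
```

The resulting list would then be reviewed by hand before using those `metadata_value_id` values to delete rows from `metadatavalue`, as the notes suggest. On the real export the same function could be called on `open('/tmp/ciat.csv', newline='')`.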