Add notes for 2019-09-27

2025-01-27 05:49:12 +01:00 · 2019-09-27 17:53:18 +03:00
parent a8f833a6c6
commit aaae1e2cfe
5 changed files with 94 additions and 26 deletions
--- a/content/posts/2019-08.md
+++ b/content/posts/2019-08.md
@@ -358,12 +358,5 @@ sys     2m27.496s
  - After reading the code I see that XSLT is reading the community titles from the DIM representation (stored in the `$dim` variable) created from METS
  - I modified the patterns in my sed script so that those lines are not replaced and then the community list works again
  - This is actually not a problem at all because this metadata is only used in the HTML meta tags in XMLUI community lists and has nothing to do with item metadata
 - Get a list of institutions from CCAFS's Clarisa API and try to parse it with `jq` and pass it through `csvcut` to add line numbers:
 ```
 $ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed 's/"//g' | csvcut -l > /tmp/investors.csv
 ```
 - I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against...
 <!-- vim: set sw=2 ts=2: -->
--- a/content/posts/2019-09.md
+++ b/content/posts/2019-09.md
@@ -319,5 +319,37 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
 - Give more feedback to Bosede about the [IITA Sept 6 (20196th.xls) records on DSpace Test](https://dspacetest.cgiar.org/handle/10568/105116)
  - I told her to delete one item that appears to be a duplicate, or to fix its citation to be correct if she thinks it is not a duplicate
  - I deleted another item that I had previously identified as a duplicate that she had fixed by incorrectly deleting the original (ugh)
 - Get a list of institutions from CCAFS's Clarisa API and try to parse it with `jq`, do some small cleanups and add a header in `sed`, and then pass it through `csvcut` to add line numbers:
 ```
 $ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed -e 's/"//g' -e 's/^ //' -e '1iname' | csvcut -l | sed '1s/line_number/id/' > /tmp/clarisa-institutions.csv
 $ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institutions-cleaned.csv -u
 ```
 - The csv-metadata-quality tool caught a few records with excessive spacing and unnecessary Unicode
 - I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against...
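
The pipeline above can also be sketched in Python, which avoids two fragile spots in the shell version: `awk -F: '{print $2}'` truncates any name that contains a colon, and names containing commas are not quoted before they reach `csvcut`. This is a minimal sketch, assuming the JSON is an array of objects with a `name` key (as the `jq` filter implies). The function name `institutions_to_csv` and the sample records are hypothetical, and the whitespace cleanup is only a rough stand-in for what csv-metadata-quality does.

```python
import csv
import io
import json


def institutions_to_csv(json_text: str) -> str:
    """Return CSV text with `id` and `name` columns from a JSON array of objects."""
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "name"])
    # Number rows from 1, mirroring the line-number column that `csvcut -l` adds
    for line_number, record in enumerate(records, start=1):
        # Collapse runs of whitespace and trim both ends of the name
        name = " ".join(str(record["name"]).split())
        # csv.writer quotes names that contain commas, so they stay in one column
        writer.writerow([line_number, name])
    return out.getvalue()


# Hypothetical sample data, not real Clarisa records
sample = '[{"name": "Example University: Main Campus"}, {"name": "  Example  Institute, Ltd "}]'
print(institutions_to_csv(sample), end="")
```

The resulting text could be written to a file such as `/tmp/clarisa-institutions.csv` and then passed to csv-metadata-quality as in the second command above.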
 ## 2019-09-27
 - Skype with Peter and Abenet about CGSpace actions
  - Peter will respond to ICARDA's request to deposit items into CGSpace, with a caveat that we agree on some vocabulary standards for institutions, countries, regions, etc
  - We discussed using ISO 3166 for countries, though Peter doesn't like the formal names like "Moldova, Republic of" and "Tanzania, United Republic of"
    - The Debian `iso-codes` package has ISO 3166-1 with "common name", "name", and "official name" representations, for example:
      - common_name: Tanzania
      - name: Tanzania, United Republic of
      - official_name: United Republic of Tanzania
    - There are still some unfortunate ones there, though:
      - name: Korea, Democratic People's Republic of
      - official_name: Democratic People's Republic of Korea
    - And this, which isn't even in English...
      - name: Côte d'Ivoire
      - official_name: Republic of Côte d'Ivoire
    - The other alternative is to just keep using the names we have, which are mostly compliant with AGROVOC
  - Peter said that a new server for DSpace Test is fine, so I can proceed with the normal process of getting approval from Michael Victor and ICT when I have time (recommend moving from $40 to $80/month Linode, with 16GB RAM)
  - I need to ask Atmire for a quote to upgrade CGSpace to DSpace 6 with all current modules so we can see how many more credits we need
 - A little bit more work on the Sept 6 IITA batch records
  - Bosede deleted the one item that I told her was a duplicate
  - I checked the AGROVOC subjects and fixed one incorrect one
  - Then I told her that I think the items are ready to go to CGSpace and asked Abenet for a final comment
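
One way to act on the ISO 3166 discussion above would be to prefer `common_name` where an entry has one and fall back to `name` otherwise. The following is only a sketch of that idea, not a decision recorded in these notes. The helper `preferred_name` is hypothetical, and the sample entries carry only the fields quoted above. Reading the real data from the `iso-codes` package (for example a JSON file under `/usr/share/iso-codes/json/`) is an assumption about how that package is laid out.

```python
def preferred_name(entry: dict) -> str:
    """Return the short common name if the entry has one, else the ISO 3166-1 name."""
    return entry.get("common_name") or entry["name"]


# Sample entries limited to the fields quoted in the notes above
entries = [
    {
        "common_name": "Tanzania",
        "name": "Tanzania, United Republic of",
        "official_name": "United Republic of Tanzania",
    },
    {
        "name": "Korea, Democratic People's Republic of",
        "official_name": "Democratic People's Republic of Korea",
    },
    {
        "name": "Côte d'Ivoire",
        "official_name": "Republic of Côte d'Ivoire",
    },
]

for entry in entries:
    print(preferred_name(entry))
```

Entries without a `common_name`, such as the Korea example, would still come out in the inverted form, so a small manual override list would probably still be needed.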
 <!-- vim: set sw=2 ts=2: -->
--- a/docs/2019-08/index.html
+++ b/docs/2019-08/index.html
@@ -27,7 +27,7 @@ Run system updates on DSpace Test (linode19) and reboot it
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-08/" />
 <meta property="article:published_time" content="2019-08-03T12:39:51+03:00" />
-<meta property="article:modified_time" content="2019-09-01T01:54:55+03:00" />
+<meta property="article:modified_time" content="2019-09-27T01:20:09+03:00" />
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="August, 2019"/>
@@ -59,9 +59,9 @@ Run system updates on DSpace Test (linode19) and reboot it
  "@type": "BlogPosting",
  "headline": "August, 2019",
  "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-08\/",
-  "wordCount": "2770",
+  "wordCount": "2703",
  "datePublished": "2019-08-03T12:39:51\x2b03:00",
-  "dateModified": "2019-09-01T01:54:55\x2b03:00",
+  "dateModified": "2019-09-27T01:20:09\x2b03:00",
  "author": {
    "@type": "Person",
    "name": "Alan Orth"
@@ -603,13 +603,6 @@ sys     2m27.496s
 <li>I modified the patterns in my sed script so that those lines are not replaced and then the community list works again</li>
 <li>This is actually not a problem at all because this metadata is only used in the HTML meta tags in XMLUI community lists and has nothing to do with item metadata</li>
 </ul></li>
 <li><p>Get a list of institutions from CCAFS&rsquo;s Clarisa API and try to parse it with <code>jq</code> and pass it through <code>csvcut</code> to add line numbers:</p>
 <pre><code>$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed 's/&quot;//g' | csvcut -l &gt; /tmp/investors.csv
 </code></pre></li>
 <li><p>I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against&hellip;</p></li>
 </ul>
 <!-- vim: set sw=2 ts=2: -->
--- a/docs/2019-09/index.html
+++ b/docs/2019-09/index.html
@@ -40,7 +40,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-09/" />
 <meta property="article:published_time" content="2019-09-01T10:17:51+03:00" />
-<meta property="article:modified_time" content="2019-09-26T14:21:41+03:00" />
+<meta property="article:modified_time" content="2019-09-27T01:20:09+03:00" />
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="September, 2019"/>
@@ -85,9 +85,9 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
  "@type": "BlogPosting",
  "headline": "September, 2019",
  "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-09\/",
-  "wordCount": "2497",
+  "wordCount": "2870",
  "datePublished": "2019-09-01T10:17:51\x2b03:00",
-  "dateModified": "2019-09-26T14:21:41\x2b03:00",
+  "dateModified": "2019-09-27T01:20:09\x2b03:00",
  "author": {
    "@type": "Person",
    "name": "Alan Orth"
@@ -561,6 +561,56 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
 <li>I told her to delete one item that appears to be a duplicate, or to fix its citation to be correct if she thinks it is not a duplicate</li>
 <li>I deleted another item that I had previously identified as a duplicate that she had fixed by incorrectly deleting the original (ugh)</li>
 </ul></li>
 <li><p>Get a list of institutions from CCAFS&rsquo;s Clarisa API and try to parse it with <code>jq</code>, do some small cleanups and add a header in <code>sed</code>, and then pass it through <code>csvcut</code> to add line numbers:</p>
 <pre><code>$ cat ~/Downloads/institutions.json| jq '.[] | {name: .name}' | grep name | awk -F: '{print $2}' | sed -e 's/&quot;//g' -e 's/^ //' -e '1iname' | csvcut -l | sed '1s/line_number/id/' &gt; /tmp/clarisa-institutions.csv
 $ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institutions-cleaned.csv -u
 </code></pre></li>
 <li><p>The csv-metadata-quality tool caught a few records with excessive spacing and unnecessary Unicode</p></li>
 <li><p>I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against&hellip;</p></li>
 </ul>
 <h2 id="2019-09-27">2019-09-27</h2>
 <ul>
 <li>Skype with Peter and Abenet about CGSpace actions
 <ul>
 <li>Peter will respond to ICARDA&rsquo;s request to deposit items into CGSpace, with a caveat that we agree on some vocabulary standards for institutions, countries, regions, etc</li>
 <li>We discussed using ISO 3166 for countries, though Peter doesn&rsquo;t like the formal names like &ldquo;Moldova, Republic of&rdquo; and &ldquo;Tanzania, United Republic of&rdquo;</li>
 <li>The Debian <code>iso-codes</code> package has ISO 3166-1 with &ldquo;common name&rdquo;, &ldquo;name&rdquo;, and &ldquo;official name&rdquo; representations, for example:
 <ul>
 <li>common_name: Tanzania</li>
 <li>name: Tanzania, United Republic of</li>
 <li>official_name: United Republic of Tanzania</li>
 </ul></li>
 <li>There are still some unfortunate ones there, though:
 <ul>
 <li>name: Korea, Democratic People&rsquo;s Republic of</li>
 <li>official_name: Democratic People&rsquo;s Republic of Korea</li>
 </ul></li>
 <li>And this, which isn&rsquo;t even in English&hellip;
 <ul>
 <li>name: Côte d&rsquo;Ivoire</li>
 <li>official_name: Republic of Côte d&rsquo;Ivoire</li>
 </ul></li>
 <li>The other alternative is to just keep using the names we have, which are mostly compliant with AGROVOC</li>
 <li>Peter said that a new server for DSpace Test is fine, so I can proceed with the normal process of getting approval from Michael Victor and ICT when I have time (recommend moving from $40 to $80/month Linode, with 16GB RAM)</li>
 <li>I need to ask Atmire for a quote to upgrade CGSpace to DSpace 6 with all current modules so we can see how many more credits we need</li>
 </ul></li>
 <li>A little bit more work on the Sept 6 IITA batch records
 <ul>
 <li>Bosede deleted the one item that I told her was a duplicate</li>
 <li>I checked the AGROVOC subjects and fixed one incorrect one</li>
 <li>Then I told her that I think the items are ready to go to CGSpace and asked Abenet for a final comment</li>
 </ul></li>
 </ul>
 <!-- vim: set sw=2 ts=2: -->
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,32 +4,32 @@
  <url>
    <loc>https://alanorth.github.io/cgspace-notes/</loc>
-    <lastmod>2019-09-26T14:21:41+03:00</lastmod>
+    <lastmod>2019-09-27T01:20:09+03:00</lastmod>
  </url>
  <url>
    <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-    <lastmod>2019-09-26T14:21:41+03:00</lastmod>
+    <lastmod>2019-09-27T01:20:09+03:00</lastmod>
  </url>
  <url>
    <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-    <lastmod>2019-09-26T14:21:41+03:00</lastmod>
+    <lastmod>2019-09-27T01:20:09+03:00</lastmod>
  </url>
  <url>
    <loc>https://alanorth.github.io/cgspace-notes/2019-09/</loc>
-    <lastmod>2019-09-26T14:21:41+03:00</lastmod>
+    <lastmod>2019-09-27T01:20:09+03:00</lastmod>
  </url>
  <url>
    <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-    <lastmod>2019-09-26T14:21:41+03:00</lastmod>
+    <lastmod>2019-09-27T01:20:09+03:00</lastmod>
  </url>
  <url>
    <loc>https://alanorth.github.io/cgspace-notes/2019-08/</loc>
-    <lastmod>2019-09-01T01:54:55+03:00</lastmod>
+    <lastmod>2019-09-27T01:20:09+03:00</lastmod>
  </url>
  <url>