mirror of
https://github.com/alanorth/cgspace-notes.git
Add notes for 2022-08-03
@@ -14,7 +14,7 @@ Our request to add CC-BY-3.0-IGO to SPDX was approved a few weeks ago
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-08/" />
<meta property="article:published_time" content="2022-08-01T10:22:36+03:00" />
<meta property="article:modified_time" content="2022-08-01T10:22:36+03:00" />
<meta property="article:modified_time" content="2022-08-01T16:36:13+03:00" />
@@ -34,9 +34,9 @@ Our request to add CC-BY-3.0-IGO to SPDX was approved a few weeks ago
"@type": "BlogPosting",
"headline": "August, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-08/",
"wordCount": "14",
"wordCount": "492",
"datePublished": "2022-08-01T10:22:36+03:00",
"dateModified": "2022-08-01T10:22:36+03:00",
"dateModified": "2022-08-01T16:36:13+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -54,7 +54,7 @@ Our request to add CC-BY-3.0-IGO to SPDX was approved a few weeks ago
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel="stylesheet" integrity="sha256-xrqAvFBmlVdkWr4F+GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin="anonymous">
<link href="https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel="stylesheet" integrity="sha256-vrgBLtwIuhC+AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
@@ -114,6 +114,65 @@ Our request to add CC-BY-3.0-IGO to SPDX was approved a few weeks ago
<ul>
<li>Our request to add <a href="https://github.com/spdx/license-list-XML/issues/1525">CC-BY-3.0-IGO to SPDX</a> was approved a few weeks ago</li>
</ul>
<h2 id="2022-08-02">2022-08-02</h2>
<ul>
<li>Resume working on the MARLO Innovations
<ul>
<li>Last week Jose sent me an updated CSV with UTF-8 encoding, but it was missing the filename column</li>
<li>I joined it with the older file (stripped down to just the <code>cg.number</code> and <code>filename</code> columns) and then did the same cleanups I had done last week</li>
<li>I noticed that six PDFs are unused, so I asked Jose about them</li>
</ul>
</li>
<li>Spent some time trying to understand the REST API submission issues that Rafael from CIAT is having with tip-approve and tip-submit
<ul>
<li>First, according to my notes from 2020-10, a user must be a <em>collection admin</em> in order to submit via the REST API</li>
<li>Second, a collection must have an “Accept/Reject/Edit Metadata” step defined in the workflow</li>
<li>Also, I referenced my notes from this gist I had made for exactly this purpose! <a href="https://gist.github.com/alanorth/40fc3092aefd78f978cca00e8abeeb7a">https://gist.github.com/alanorth/40fc3092aefd78f978cca00e8abeeb7a</a></li>
</ul>
</li>
</ul>
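<ul>
<li>For reference, here is a rough sketch of a submission through the DSpace 6 REST API, using only Python’s standard library. Several things here are assumptions and do not come from these notes: the base URL, credentials, and collection UUID are placeholders, the helper names are hypothetical, and the endpoints are DSpace 6.x’s <code>POST /rest/login</code> (form-encoded email and password, which sets a session cookie) and <code>POST /rest/collections/{uuid}/items</code>. The sketch does not try to reproduce Rafael’s tip-approve/tip-submit setup.</li>
</ul>

```python
# Hypothetical sketch: submit one item through the DSpace 6.x REST API.
# BASE_URL, the credentials, and the collection UUID are placeholders.
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

BASE_URL = "https://dspacetest.example.org/rest"  # placeholder, not a real server


def build_item_payload(metadata):
    """Flatten {field: [values]} into the DSpace 6 item JSON shape."""
    return {
        "metadata": [
            {"key": field, "value": value}
            for field, values in metadata.items()
            for value in values
        ]
    }


def submit_item(email, password, collection_uuid, metadata):
    """Log in, then POST a new item to the collection and return the parsed JSON reply."""
    # The cookie jar keeps the session cookie that /login sets for later requests
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    credentials = urllib.parse.urlencode({"email": email, "password": password})
    with opener.open(BASE_URL + "/login", data=credentials.encode("utf-8")):
        pass  # only the session cookie matters here

    # Per the notes above, this account must be a collection admin, and the
    # collection needs an "Accept/Reject/Edit Metadata" workflow step
    request = urllib.request.Request(
        BASE_URL + "/collections/" + collection_uuid + "/items",
        data=json.dumps(build_item_payload(metadata)).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with opener.open(request) as response:
        return json.load(response)
```

<ul>
<li>Only <code>build_item_payload()</code> runs offline. For example, <code>{"dc.subject": ["MAIZE", "SOILS"]}</code> becomes two separate <code>key</code>/<code>value</code> entries with the same key.</li>
</ul>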
<h2 id="2022-08-03">2022-08-03</h2>
<ul>
<li>I came up with an interesting idea to add missing countries and AGROVOC terms to the MARLO Innovation metadata
<ul>
<li>I copied the abstract column to two new fields, <code>countrytest</code> and <code>agrovoctest</code>, and then used this Jython code as a transform to drop terms that don’t match (using CGSpace’s country list and list of 1,400 AGROVOC terms):</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">with</span> open(<span style="color:#e6db74">r</span><span style="color:#e6db74">"/tmp/cgspace-countries.txt"</span>,<span style="color:#e6db74">'r'</span>) <span style="color:#66d9ef">as</span> f:
</span></span><span style="display:flex;"><span>    countries <span style="color:#f92672">=</span> [name<span style="color:#f92672">.</span>rstrip()<span style="color:#f92672">.</span>lower() <span style="color:#66d9ef">for</span> name <span style="color:#f92672">in</span> f]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">return</span> <span style="color:#e6db74">"||"</span><span style="color:#f92672">.</span>join([x <span style="color:#66d9ef">for</span> x <span style="color:#f92672">in</span> value<span style="color:#f92672">.</span>split(<span style="color:#e6db74">' '</span>) <span style="color:#66d9ef">if</span> x<span style="color:#f92672">.</span>lower() <span style="color:#f92672">in</span> countries])
</span></span></code></pre></div><ul>
<li>Then I joined them with the other country and AGROVOC columns
<ul>
<li>I had originally tried to use csv-metadata-quality to look up and drop invalid AGROVOC terms, but it was timing out every dozen or so requests</li>
<li>Then I briefly tried to use lightrdf to export a text file of labels from AGROVOC’s RDF, but I couldn’t figure it out</li>
<li>I just realized this will not match country names that contain spaces (for example, “South Africa”) because the transform splits the cell value on spaces, ugh… Jython also has weird syntax and errors, and I can’t get normal Python code to work here; I’m missing something</li>
</ul>
</li>
<li>Then I extracted the titles, dates, and types and added IDs, then ran them through <code>check-duplicates.py</code> to find the existing items on CGSpace so I can add them as <code>dcterms.relation</code> links</li>
</ul>
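<ul>
<li>One way around the space problem is to search the whole cell for each country name instead of splitting it on spaces. The sketch below does this in plain Python. It avoids Python 3-only syntax, but I have not tried it inside the Jython transform. Longer names are tried first and then blanked out, so “Guinea” is not matched a second time inside “Papua New Guinea”. The function name and the sample country list in the test are hypothetical.</li>
</ul>

```python
import re


def match_countries(text, countries):
    """Return the country names found in text, joined with "||" in order of appearance.

    Matching is case-insensitive, and a name only counts when it is not directly
    preceded or followed by a word character (letter, digit, or underscore).
    """
    matches = []  # (position, country) pairs
    remaining = text
    # Try longer names first so multi-word names win over names they contain
    for country in sorted(countries, key=len, reverse=True):
        pattern = re.compile(r"(?<!\w)" + re.escape(country) + r"(?!\w)", re.IGNORECASE)
        match = pattern.search(remaining)
        if match:
            matches.append((match.start(), country))
            # Overwrite every occurrence with spaces of the same length, so shorter
            # names cannot match inside it and character positions stay unchanged
            remaining = pattern.sub(lambda m: " " * len(m.group()), remaining)
    return "||".join(country for _, country in sorted(matches))
```

<ul>
<li>The function returns the names as spelled in the country list, not as written in the abstract. That makes it easier to join with the existing country column.</li>
</ul>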
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -l -c dc.title,dcterms.issued,dcterms.type ~/Downloads/2022-08-03-Innovations-Cleaned.csv | sed <span style="color:#e6db74">'1s/line_number/id/'</span> > /tmp/innovations-temp.csv
</span></span><span style="display:flex;"><span>$ ./ilri/check-duplicates.py -i /tmp/innovations-temp.csv -u dspacetest -db dspacetest -p <span style="color:#e6db74">'dom@in34sniper'</span> -o /tmp/ccafs-duplicates.csv
</span></span></code></pre></div><ul>
<li>About 115 of them matched existing items on CGSpace</li>
<li>Then I did some minor processing and checking of the duplicates file (for example, some titles appear more than once in both files) and joined it with the other file using a left join:</li>
</ul>
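<ul>
<li>Some titles appear more than once in both files, so it helps to list the repeated ones before joining on <code>dc.title</code>. In an SQL-style left join, a key that appears twice on the right side produces two output rows for the same left row. The sketch below is a minimal standard-library version. The function name is hypothetical, and titles are compared after trimming surrounding whitespace but without ignoring case.</li>
</ul>

```python
from collections import Counter


def duplicate_values(rows, column="dc.title"):
    """Return the sorted values of column that appear more than once in rows.

    rows can be any iterable of dicts, for example a csv.DictReader. Values are
    compared after stripping surrounding whitespace; blank or missing cells are ignored.
    """
    counts = Counter((row.get(column) or "").strip() for row in rows)
    counts.pop("", None)  # drop blank and missing cells
    return sorted(value for value, count in counts.items() if count > 1)
```

<ul>
<li>For example, open <code>~/Downloads/2022-08-03-Innovations-Cleaned.csv</code> (after expanding the <code>~</code>) with <code>newline=""</code> and pass <code>csv.DictReader(f)</code> as <code>rows</code>.</li>
</ul>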
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvjoin --left -c dc.title ~/Downloads/2022-08-03-Innovations-Cleaned.csv ~/Downloads/2022-08-03-Innovations-relations.csv > /tmp/innovations-with-relations.csv
</span></span></code></pre></div><ul>
<li>Then I used SAFBuilder to create a Simple Archive Format (SAF) package and imported it to DSpace Test:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ export JAVA_OPTS<span style="color:#f92672">=</span><span style="color:#e6db74">"-Dfile.encoding=UTF-8 -Xmx2048m"</span>
</span></span><span style="display:flex;"><span>$ dspace import --add --eperson<span style="color:#f92672">=</span>aorth@mjanja.ch --source /tmp/SimpleArchiveFormat --mapfile<span style="color:#f92672">=</span>./2022-08-03-innovations.map
</span></span></code></pre></div><ul>
<li>Meeting with Mohammed Salem about harmonizing MEL and CGSpace metadata fields
<ul>
<li>I still need to share our results and recommendations with Peter, Enrico, Sara, Svetlana, et al.</li>
</ul>
</li>
<li>I made some minor fixes to csv-metadata-quality while working on the MARLO CRP Innovations</li>
</ul>
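<ul>
<li>DSpace’s Simple Archive Format, which SAFBuilder produced for the DSpace Test import above, uses one directory per item. Each directory contains <code>dublin_core.xml</code> for the <code>dc</code> schema, a <code>metadata_&lt;schema&gt;.xml</code> file for each other schema (for example <code>dcterms</code> or <code>cg</code>), and a <code>contents</code> file listing the bitstream filenames. The sketch below is not SAFBuilder’s code. It is a minimal standard-library illustration of that layout. The function names are hypothetical, and it does not copy the PDFs themselves.</li>
</ul>

```python
# Hypothetical sketch of one item directory in DSpace's Simple Archive Format.
import os
from xml.etree import ElementTree as ET


def saf_files(metadata, bitstreams=()):
    """Return {filename: text} for one SAF item.

    metadata maps "schema.element[.qualifier]" field names to lists of values.
    """
    by_schema = {}
    for field, values in metadata.items():
        schema, element, *rest = field.split(".")
        qualifier = rest[0] if rest else "none"
        for value in values:
            by_schema.setdefault(schema, []).append((element, qualifier, value))

    files = {}
    for schema, entries in by_schema.items():
        root = ET.Element("dublin_core", schema=schema)
        for element, qualifier, value in entries:
            ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier).text = value
        # The dc schema goes in dublin_core.xml, every other schema in its own file
        name = "dublin_core.xml" if schema == "dc" else "metadata_" + schema + ".xml"
        files[name] = ET.tostring(root, encoding="unicode")
    # One bitstream filename per line; the files themselves must sit beside it
    files["contents"] = "".join(bitstream + "\n" for bitstream in bitstreams)
    return files


def write_saf_item(item_dir, metadata, bitstreams=()):
    """Write the files from saf_files() into item_dir (bitstreams are not copied)."""
    os.makedirs(item_dir, exist_ok=True)
    for name, text in saf_files(metadata, bitstreams).items():
        with open(os.path.join(item_dir, name), "w", encoding="utf-8") as f:
            f.write(text)
```

<ul>
<li>For example, a <code>dcterms.relation</code> value lands in <code>metadata_dcterms.xml</code> as a <code>dcvalue</code> with element <code>relation</code> and qualifier <code>none</code>.</li>
</ul>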
<!-- raw HTML omitted -->