Compare commits

2 Commits

SHA1 | Message | Date
2e80702de4 | Add notes for 2023-02-22 | 2023-02-22 21:37:12 +03:00
ba6f826201 | content/posts/2022-08.md: syntax fix | 2023-02-22 11:59:48 +03:00
33 changed files with 123 additions and 45 deletions

View File

@ -328,4 +328,41 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: missing re
2023 200
```
- Start reviewing and fixing metadata for Sam's ~250 CAS publications from last year
  - Both Abenet and Peter have already looked at them and Sam has been waiting for months on this

## 2023-02-22

- Continue proofing CAS records for Sam
  - I downloaded all the PDFs manually and checked the issue dates for each from the PDF, noting some that had licenses, ISBNs, etc
  - I combined the title, abstract, and system subjects into one column to mine them for AGROVOC terms:
```console
toLowercase(value) + toLowercase(cells["dcterms.abstract"].value) + toLowercase(cells["cg.subject.system"].value.replace("||", " "))
```
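- For comparison, a rough plain-Python equivalent of the GREL expression above, as a sketch only: the function name `combine_fields` and its arguments are hypothetical and not part of the original workflow, and unlike the GREL it puts a space between fields so the last word of one field does not fuse with the first word of the next:

```python
def combine_fields(title, abstract, subjects):
    """Lowercase three metadata fields and join them into one searchable string."""
    # Multi-value fields use "||" as the separator, so turn it into spaces.
    parts = [title, abstract, subjects.replace("||", " ")]
    # Skip empty fields and separate the rest with single spaces.
    return " ".join(part.lower() for part in parts if part)
```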
- Then I extracted a list of AGROVOC terms the same way I did in [August, 2022]({{< relref "2022-08.md" >}}) and used this Jython code to extract matching terms:
```python
import re

with open(r"/tmp/agrovoc-subjects.txt",'r') as f :
    terms = [name.rstrip().lower() for name in f]

return "||".join([term for term in terms if re.match(r".*\b" + term + r"\b.*", value.lower())])
```
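- A standalone sketch of the same whole-word matching that can be run outside OpenRefine. The function `match_terms` and its parameters are hypothetical, and it assumes the terms are already loaded into a list. It also uses `re.escape` and `re.search`, which the Jython above does not, so that any term containing regex metacharacters is treated literally:

```python
import re


def match_terms(text, terms):
    """Return the terms that occur as whole words in text, joined with "||"."""
    haystack = text.lower()
    matched = []
    for term in terms:
        term = term.strip().lower()
        if not term:
            # Ignore blank lines from the terms file.
            continue
        # \b anchors the match on word boundaries; re.escape keeps the
        # term literal even if it contains characters like "(" or ".".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, haystack):
            matched.append(term)
    return "||".join(matched)
```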
- Then I used [this cool Jython to remove duplicate metadata values](https://stackoverflow.com/questions/15419080/openrefine-remove-duplicates-from-list-with-jython):
```python
deduped_list = list(set(value.split("||")))
return '||'.join(map(str, deduped_list))
```
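- Note that `set()` does not keep the original order of the values. If order matters, one possible alternative keeps the first occurrence of each value. This is a sketch with a hypothetical `dedupe_values` helper, not the code from the linked answer:

```python
def dedupe_values(value, separator="||"):
    """Remove duplicate values from a separator-joined string, keeping order."""
    seen = set()
    unique = []
    for item in value.split(separator):
        # Keep only the first occurrence of each value.
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return separator.join(unique)
```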
- Then I did the same with countries, woooooo!
- I checked for duplicates and found forty-one
- I just stumbled upon UNTERM, which provides the official list of countries for the UN General Assembly, including a downloadable Excel with the short and formal names in all UN languages: https://unterm.un.org/unterm2/en/country
- I created a [pull request to add common names for Iran, Laos, and Syria on the Debian iso-codes package](https://salsa.debian.org/iso-codes-team/iso-codes/-/merge_requests/32)
  - These are remarked upon in the ISO.org online browsing platform for ISO 3166-1
<!-- vim: set sw=2 ts=2: -->

View File

@ -14,7 +14,7 @@ Our request to add CC-BY-3.0-IGO to SPDX was approved a few weeks ago
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2022-08/" />
<meta property="article:published_time" content="2022-08-01T10:22:36+03:00" />
<meta property="article:modified_time" content="2022-09-27T14:35:26+03:00" />
<meta property="article:modified_time" content="2023-02-22T11:59:48+03:00" />
@ -34,9 +34,9 @@ Our request to add CC-BY-3.0-IGO to SPDX was approved a few weeks ago
"@type": "BlogPosting",
"headline": "August, 2022",
"url": "https://alanorth.github.io/cgspace-notes/2022-08/",
"wordCount": "2706",
"wordCount": "2704",
"datePublished": "2022-08-01T10:22:36+03:00",
"dateModified": "2022-09-27T14:35:26+03:00",
"dateModified": "2023-02-22T11:59:48+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"

View File

@ -18,7 +18,7 @@ I want to try to expand my use of their data to journals, publishers, volumes, i
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2023-02/" />
<meta property="article:published_time" content="2023-02-01T10:57:36+03:00" />
<meta property="article:modified_time" content="2023-02-15T19:47:13+03:00" />
<meta property="article:modified_time" content="2023-02-21T20:46:53+03:00" />
@ -42,9 +42,9 @@ I want to try to expand my use of their data to journals, publishers, volumes, i
"@type": "BlogPosting",
"headline": "February, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-02/",
"wordCount": "2333",
"wordCount": "2566",
"datePublished": "2023-02-01T10:57:36+03:00",
"dateModified": "2023-02-15T19:47:13+03:00",
"dateModified": "2023-02-21T20:46:53+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -508,7 +508,48 @@ I want to try to expand my use of their data to journals, publishers, volumes, i
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># grep <span style="color:#e6db74">&#39;RTB website BOT&#39;</span> /var/log/nginx/rest.log | awk <span style="color:#e6db74">&#39;{print $9}&#39;</span> | sort | uniq -c | sort -h
</span></span><span style="display:flex;"><span> 2023 200
</span></span></code></pre></div><!-- raw HTML omitted -->
</span></span></code></pre></div><ul>
<li>Start reviewing and fixing metadata for Sam&rsquo;s ~250 CAS publications from last year
<ul>
<li>Both Abenet and Peter have already looked at them and Sam has been waiting for months on this</li>
</ul>
</li>
</ul>
<h2 id="2023-02-22">2023-02-22</h2>
<ul>
<li>Continue proofing CAS records for Sam
<ul>
<li>I downloaded all the PDFs manually and checked the issue dates for each from the PDF, noting some that had licenses, ISBNs, etc</li>
<li>I combined the title, abstract, and system subjects into one column to mine them for AGROVOC terms:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>toLowercase(value) + toLowercase(cells[&#34;dcterms.abstract&#34;].value) + toLowercase(cells[&#34;cg.subject.system&#34;].value.replace(&#34;||&#34;, &#34; &#34;))
</span></span></code></pre></div><ul>
<li>Then I extracted a list of AGROVOC terms the same way I did in <a href="/cgspace-notes/2022-08/">August, 2022</a> and used this Jython code to extract matching terms:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">with</span> open(<span style="color:#e6db74">r</span><span style="color:#e6db74">&#34;/tmp/agrovoc-subjects.txt&#34;</span>,<span style="color:#e6db74">&#39;r&#39;</span>) <span style="color:#66d9ef">as</span> f :
</span></span><span style="display:flex;"><span>    terms <span style="color:#f92672">=</span> [name<span style="color:#f92672">.</span>rstrip()<span style="color:#f92672">.</span>lower() <span style="color:#66d9ef">for</span> name <span style="color:#f92672">in</span> f]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">return</span> <span style="color:#e6db74">&#34;||&#34;</span><span style="color:#f92672">.</span>join([term <span style="color:#66d9ef">for</span> term <span style="color:#f92672">in</span> terms <span style="color:#66d9ef">if</span> re<span style="color:#f92672">.</span>match(<span style="color:#e6db74">r</span><span style="color:#e6db74">&#34;.*\b&#34;</span> <span style="color:#f92672">+</span> term <span style="color:#f92672">+</span> <span style="color:#e6db74">r</span><span style="color:#e6db74">&#34;\b.*&#34;</span>, value<span style="color:#f92672">.</span>lower())])
</span></span></code></pre></div><ul>
<li>Then I used <a href="https://stackoverflow.com/questions/15419080/openrefine-remove-duplicates-from-list-with-jython">this cool Jython to remove duplicate metadata values</a>:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>deduped_list <span style="color:#f92672">=</span> list(set(value<span style="color:#f92672">.</span>split(<span style="color:#e6db74">&#34;||&#34;</span>)))
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">return</span> <span style="color:#e6db74">&#39;||&#39;</span><span style="color:#f92672">.</span>join(map(str, deduped_list))
</span></span></code></pre></div><ul>
<li>Then I did the same with countries, woooooo!</li>
<li>I checked for duplicates and found forty-one</li>
<li>I just stumbled upon UNTERM, which provides the official list of countries for the UN General Assembly, including a downloadable Excel with the short and formal names in all UN languages: <a href="https://unterm.un.org/unterm2/en/country">https://unterm.un.org/unterm2/en/country</a></li>
<li>I created a <a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/merge_requests/32">pull request to add common names for Iran, Laos, and Syria on the Debian iso-codes package</a>
<ul>
<li>These are remarked upon in the ISO.org online browsing platform for ISO 3166-1</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-02-15T19:47:13+03:00" />
<meta property="og:updated_time" content="2023-02-22T11:59:48+03:00" />

View File

@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2023-02-15T19:47:13+03:00</lastmod>
<lastmod>2023-02-22T11:59:48+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2023-02-15T19:47:13+03:00</lastmod>
<lastmod>2023-02-22T11:59:48+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2023-02/</loc>
<lastmod>2023-02-15T19:47:13+03:00</lastmod>
<lastmod>2023-02-21T20:46:53+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2023-02-15T19:47:13+03:00</lastmod>
<lastmod>2023-02-22T11:59:48+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2023-02-15T19:47:13+03:00</lastmod>
<lastmod>2023-02-22T11:59:48+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2023-01/</loc>
<lastmod>2023-01-31T22:20:38+03:00</lastmod>
@ -33,7 +33,7 @@
<lastmod>2022-09-30T17:29:50+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-08/</loc>
<lastmod>2022-09-27T14:35:26+03:00</lastmod>
<lastmod>2023-02-22T11:59:48+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-07/</loc>
<lastmod>2022-07-31T15:49:35+03:00</lastmod>