mirror of https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2023-02-22
@ -18,7 +18,7 @@ I want to try to expand my use of their data to journals, publishers, volumes, i
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2023-02/" />
<meta property="article:published_time" content="2023-02-01T10:57:36+03:00" />
<meta property="article:modified_time" content="2023-02-15T19:47:13+03:00" />
<meta property="article:modified_time" content="2023-02-21T20:46:53+03:00" />
@ -42,9 +42,9 @@ I want to try to expand my use of their data to journals, publishers, volumes, i
"@type": "BlogPosting",
"headline": "February, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-02/",
"wordCount": "2333",
"wordCount": "2566",
"datePublished": "2023-02-01T10:57:36+03:00",
"dateModified": "2023-02-15T19:47:13+03:00",
"dateModified": "2023-02-21T20:46:53+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -508,7 +508,48 @@ I want to try to expand my use of their data to journals, publishers, volumes, i
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># grep <span style="color:#e6db74">'RTB website BOT'</span> /var/log/nginx/rest.log | awk <span style="color:#e6db74">'{print $9}'</span> | sort | uniq -c | sort -h
</span></span><span style="display:flex;"><span> 2023 200
</span></span></code></pre></div><!-- raw HTML omitted -->
</span></span></code></pre></div><ul>
<li>Start reviewing and fixing metadata for Sam’s ~250 CAS publications from last year
<ul>
<li>Both Abenet and Peter have already looked at them, and Sam has been waiting months for this</li>
</ul>
</li>
</ul>
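<ul>
<li>The <code>grep</code> | <code>awk '{print $9}'</code> | <code>uniq -c</code> pipeline earlier in these notes counts HTTP status codes for one user agent. Here is a minimal Python sketch of the same count, assuming <code>rest.log</code> uses nginx’s default <code>combined</code> log format (where the status is the ninth whitespace-separated field, hence <code>$9</code>). The function names are hypothetical and not from any existing tool:</li>
</ul>

```python
import re
from collections import Counter

# Assumption: the log uses nginx's default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'^(?P<addr>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_statuses(lines, agent_substring):
    """Count HTTP status codes for requests whose user agent contains agent_substring.

    Only the user-agent field is checked, so the string appearing in a URL
    or referrer is not counted. Lines that don't parse are skipped.
    """
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match and agent_substring in match.group("agent"):
            counts[match.group("status")] += 1
    return counts


def format_like_uniq(counts):
    """Render counts roughly like `sort | uniq -c | sort -h` (ascending by count)."""
    return [f"{n:7d} {status}" for status, n in sorted(counts.items(), key=lambda kv: kv[1])]
```

<ul>
<li>For example, pass an open file object for <code>/var/log/nginx/rest.log</code> and <code>'RTB website BOT'</code> to <code>count_statuses()</code>. Matching only the user-agent field is stricter than <code>grep</code>, which matches the string anywhere on the line, so the counts may differ slightly.</li>
</ul>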
<h2 id="2023-02-22">2023-02-22</h2>
<ul>
<li>Continue proofing CAS records for Sam
<ul>
<li>I downloaded all the PDFs manually and checked each record’s issue date against its PDF, noting any that had licenses, ISBNs, etc.</li>
<li>I combined the title, abstract, and system subjects into one column to mine them for AGROVOC terms:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>toLowercase(value) + " " + toLowercase(cells["dcterms.abstract"].value) + " " + toLowercase(cells["cg.subject.system"].value.replace("||", " "))
</span></span></code></pre></div><ul>
<li>Then I extracted a list of AGROVOC terms the same way I did in <a href="/cgspace-notes/2022-08/">August, 2022</a> and used this Jython code to extract matching terms:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Read the AGROVOC terms, one per line, skipping blank lines</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">with</span> open(<span style="color:#e6db74">r</span><span style="color:#e6db74">"/tmp/agrovoc-subjects.txt"</span>, <span style="color:#e6db74">'r'</span>) <span style="color:#66d9ef">as</span> f:
</span></span><span style="display:flex;"><span>    terms <span style="color:#f92672">=</span> [name<span style="color:#f92672">.</span>strip()<span style="color:#f92672">.</span>lower() <span style="color:#66d9ef">for</span> name <span style="color:#f92672">in</span> f <span style="color:#66d9ef">if</span> name<span style="color:#f92672">.</span>strip()]
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># Escape each term so regex metacharacters match literally, and require</span>
</span></span><span style="display:flex;"><span><span style="color:#75715e"># a word boundary on both sides of the term</span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">return</span> <span style="color:#e6db74">"||"</span><span style="color:#f92672">.</span>join([term <span style="color:#66d9ef">for</span> term <span style="color:#f92672">in</span> terms <span style="color:#66d9ef">if</span> re<span style="color:#f92672">.</span>search(<span style="color:#e6db74">r</span><span style="color:#e6db74">"\b"</span> <span style="color:#f92672">+</span> re<span style="color:#f92672">.</span>escape(term) <span style="color:#f92672">+</span> <span style="color:#e6db74">r</span><span style="color:#e6db74">"\b"</span>, value<span style="color:#f92672">.</span>lower())])
</span></span></code></pre></div><ul>
<li>Then I used <a href="https://stackoverflow.com/questions/15419080/openrefine-remove-duplicates-from-list-with-jython">this cool Jython snippet to remove duplicate metadata values</a>:</li>
</ul>
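<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Split the multi-value cell on "||" and keep the first occurrence of each value, in order</span>
</span></span><span style="display:flex;"><span>values <span style="color:#f92672">=</span> value<span style="color:#f92672">.</span>split(<span style="color:#e6db74">"||"</span>)
</span></span><span style="display:flex;"><span>deduped_list <span style="color:#f92672">=</span> [v <span style="color:#66d9ef">for</span> i, v <span style="color:#f92672">in</span> enumerate(values) <span style="color:#66d9ef">if</span> v <span style="color:#f92672">not</span> <span style="color:#f92672">in</span> values[:i]]
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">return</span> <span style="color:#e6db74">"||"</span><span style="color:#f92672">.</span>join(deduped_list)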
</span></span></code></pre></div><ul>
<li>Then I did the same with countries, woooooo!</li>
<li>I checked for duplicates and found forty-one</li>
<li>I just stumbled upon UNTERM, which provides the official list of countries for the UN General Assembly, including a downloadable Excel file with the short and formal names in all UN languages: <a href="https://unterm.un.org/unterm2/en/country">https://unterm.un.org/unterm2/en/country</a></li>
<li>I created a <a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/merge_requests/32">pull request to add common names for Iran, Laos, and Syria to the Debian iso-codes package</a>
<ul>
<li>These common names are noted in the remarks for those countries on ISO’s Online Browsing Platform for ISO 3166-1</li>
</ul>
</li>
</ul>
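<ul>
<li>To test the AGROVOC matching and de-duplication outside OpenRefine, here is a minimal Python 3 sketch of the same idea (not the Jython used in OpenRefine). The function names are hypothetical; it assumes a term list with one term per line, like <code>/tmp/agrovoc-subjects.txt</code>, and joins the fields with spaces so the last word of one field cannot fuse with the first word of the next:</li>
</ul>

```python
import re


def load_terms(lines):
    """Normalize vocabulary terms: strip whitespace, lowercase, skip blank lines.

    Hypothetical helper; assumes one term per line, as in a plain-text term list.
    """
    return [line.strip().lower() for line in lines if line.strip()]


def combine(title, abstract, subjects):
    """Join title, abstract, and "||"-separated subjects into one lowercase string.

    Fields are joined with spaces so adjacent words from different fields stay separate.
    """
    return " ".join([title, abstract, subjects.replace("||", " ")]).lower()


def match_terms(text, terms):
    """Return the terms that occur in text as whole words, in term-list order.

    re.escape() makes characters like "." or "(" in a term match literally.
    """
    text = text.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text)]


def dedupe(values):
    """Drop duplicate values, keeping the first occurrence of each (dicts keep insertion order)."""
    return list(dict.fromkeys(values))
```

<ul>
<li>The same functions should work for countries by loading a country list instead of AGROVOC terms. To merge new matches into a cell’s existing subjects, something like <code>"||".join(dedupe(existing.split("||") + found))</code> keeps the first occurrence of each value.</li>
</ul>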
<!-- raw HTML omitted -->