Update notes for 2018-02-25

2018-02-25 17:25:41 +02:00
parent 5750294974
commit e0dbe07c3b
4 changed files with 112 additions and 14 deletions


@@ -23,7 +23,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-pl
<meta property="article:published_time" content="2018-02-01T16:28:54&#43;02:00"/>
<meta property="article:modified_time" content="2018-02-25T11:23:54&#43;02:00"/>
<meta property="article:modified_time" content="2018-02-25T13:28:26&#43;02:00"/>
@@ -57,9 +57,9 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu&rsquo;s munin-pl
"@type": "BlogPosting",
"headline": "February, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-02/",
"wordCount": "5219",
"wordCount": "5516",
"datePublished": "2018-02-01T16:28:54&#43;02:00",
"dateModified": "2018-02-25T11:23:54&#43;02:00",
"dateModified": "2018-02-25T13:28:26&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -1074,8 +1074,58 @@ Given Names Deactivated Family Name Deactivated: 0000-0002-2614-426X
<li>Remove Dryland Systems subject from submission form because that CRP closed two years ago (<a href="https://github.com/ilri/DSpace/pull/355">#355</a>)</li>
<li>Run all system updates on DSpace Test</li>
<li>Email ICT to ask how to proceed with the OCS proforma issue for the new DSpace Test server on Linode</li>
<li>Thinking about how to preserve ORCID identifiers attached to existing items in CGSpace</li>
<li>We have over 60,000 unique author + authority combinations on CGSpace:</li>
</ul>
<pre><code>dspace=# select count(distinct (text_value, authority)) from metadatavalue where resource_type_id=2 and metadata_field_id=3;
count
-------
62464
(1 row)
</code></pre>
<ul>
<li>I know from earlier this month that there are only 624 unique ORCID identifiers in the Solr authority core, so it&rsquo;s way easier to just fetch the unique ORCID iDs from Solr and then go back to PostgreSQL and do the metadata mapping that way</li>
<li>The query in Solr would simply be <code>orcid_id:*</code></li>
<li>Assuming I know an authority record&rsquo;s ID, for example <code>id:d7ef744b-bbd4-4171-b449-00e37e1b776f</code>, I could query PostgreSQL for all metadata records using that authority:</li>
</ul>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and authority='d7ef744b-bbd4-4171-b449-00e37e1b776f';
metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
-------------------+-------------+-------------------+---------------------------+-----------+-------+--------------------------------------+------------+------------------
2726830 | 77710 | 3 | Rodríguez Chalarca, Jairo | | 2 | d7ef744b-bbd4-4171-b449-00e37e1b776f | 600 | 2
(1 row)
</code></pre>
<ul>
<li>Then I suppose I can use the <code>resource_id</code> to identify the item?</li>
<li>Actually, <code>resource_id</code> is the same id we use in CSV, so I could simply build something like this for a metadata import!</li>
</ul>
<pre><code>id,cg.creator.id
93848,Alan S. Orth: 0000-0002-1735-7458||Peter G. Ballantyne: 0000-0001-9346-2893
</code></pre>
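A minimal sketch of how the CSV above could be generated once the item IDs and ORCID pairs are known. The function name `build_orcid_csv` and the input shape (a dict mapping `resource_id` to a list of name and ORCID pairs) are assumptions for illustration, not part of the notes; only the `id` and `cg.creator.id` columns and the `||` separator come from the sample.

```python
import csv
import io


def build_orcid_csv(items):
    """Render a metadata-import CSV with one row per item.

    `items` maps a resource_id to a list of (name, orcid) pairs.
    This input shape is hypothetical and chosen only for the example.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "cg.creator.id"])
    for resource_id, creators in sorted(items.items()):
        # Multiple values in one cell are joined with "||", as in the sample.
        cell = "||".join(f"{name}: {orcid}" for name, orcid in creators)
        writer.writerow([resource_id, cell])
    return buffer.getvalue()


sample = {
    93848: [
        ("Alan S. Orth", "0000-0002-1735-7458"),
        ("Peter G. Ballantyne", "0000-0001-9346-2893"),
    ],
}
print(build_orcid_csv(sample), end="")
```

Using the `csv` module rather than string concatenation means that a value containing a comma would be quoted automatically.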
<ul>
<li>I just discovered that <a href="https://requests-cache.readthedocs.io">requests-cache</a> can transparently cache HTTP requests</li>
<li>Running <code>resolve-orcids.py</code> with my test input takes 10.5 seconds the first time, and then 3.0 seconds the second time!</li>
</ul>
<pre><code>$ time ./resolve-orcids.py -i orcid-test-values.txt -o /tmp/orcid-names
Ali Ramadhan: 0000-0001-5019-1368
Alan S. Orth: 0000-0002-1735-7458
Ibrahim Mohammed: 0000-0001-5199-5528
Nor Azwadi: 0000-0001-9634-1958
./resolve-orcids.py -i orcid-test-values.txt -o /tmp/orcid-names 0.32s user 0.07s system 3% cpu 10.530 total
$ time ./resolve-orcids.py -i orcid-test-values.txt -o /tmp/orcid-names
Ali Ramadhan: 0000-0001-5019-1368
Alan S. Orth: 0000-0002-1735-7458
Ibrahim Mohammed: 0000-0001-5199-5528
Nor Azwadi: 0000-0001-9634-1958
./resolve-orcids.py -i orcid-test-values.txt -o /tmp/orcid-names 0.23s user 0.05s system 8% cpu 3.046 total
</code></pre>
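The speedup in the second run comes from answering repeated lookups from a cache instead of the network. The following is a standard-library sketch of that idea only; it is not requests-cache and not the code of `resolve-orcids.py`. The class `CachingResolver` and the stand-in `fake_fetch` are hypothetical, and unlike requests-cache this in-memory cache does not persist between runs.

```python
from typing import Callable, Dict


class CachingResolver:
    """Wrap a slow lookup so each ORCID iD is fetched at most once."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, str] = {}
        self.misses = 0  # number of times the slow lookup actually ran

    def resolve(self, orcid: str) -> str:
        if orcid not in self._cache:
            self.misses += 1
            self._cache[orcid] = self._fetch(orcid)
        return self._cache[orcid]


def fake_fetch(orcid: str) -> str:
    # Hypothetical stand-in for an HTTP request to a name-lookup service.
    names = {"0000-0002-1735-7458": "Alan S. Orth"}
    return names.get(orcid, "Unknown")


resolver = CachingResolver(fake_fetch)
for _ in range(2):
    name = resolver.resolve("0000-0002-1735-7458")
    print(f"{name}: 0000-0002-1735-7458")
print("slow lookups:", resolver.misses)
```

The caller sees the same result on both passes; only the first pass pays for the slow lookup.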


@@ -32,7 +32,7 @@ Disallow: /cgspace-notes/2015-12/
Disallow: /cgspace-notes/2015-11/
Disallow: /cgspace-notes/
Disallow: /cgspace-notes/categories/
Disallow: /cgspace-notes/tags/notes/
Disallow: /cgspace-notes/categories/notes/
Disallow: /cgspace-notes/tags/notes/
Disallow: /cgspace-notes/post/
Disallow: /cgspace-notes/tags/


@@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-02/</loc>
<lastmod>2018-02-25T11:23:54+02:00</lastmod>
<lastmod>2018-02-25T13:28:26+02:00</lastmod>
</url>
<url>
@@ -149,7 +149,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-02-25T11:23:54+02:00</lastmod>
<lastmod>2018-02-25T13:28:26+02:00</lastmod>
<priority>0</priority>
</url>
@@ -158,27 +158,27 @@
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-02-25T11:23:54+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2017-09-28T12:00:49+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-02-25T13:28:26+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
<lastmod>2018-02-25T11:23:54+02:00</lastmod>
<lastmod>2018-02-25T13:28:26+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-02-25T11:23:54+02:00</lastmod>
<lastmod>2018-02-25T13:28:26+02:00</lastmod>
<priority>0</priority>
</url>