Update notes

Alan Orth 2018-03-08 22:47:12 +02:00
parent 8c2e314038
commit 2e5c0e3fed
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 27 additions and 8 deletions

@@ -102,6 +102,15 @@ dspacetest=# select distinct text_lang from metadatavalue where resource_type_id
(9 rows)
```
- On second inspection it looks like `dc.description.provenance` fields use the text_lang "en" so that's probably why there are over 100,000 fields changed...
- If I skip "en", about 2,000 fields remain, which seems like a more reasonable number of fields for users to have edited manually, messed up during CSV import, etc:
```
dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
UPDATE 2309
```
- I will apply this on CGSpace right now
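- To rehearse the same `UPDATE` before touching a live database, here is a minimal sketch against a throwaway in-memory SQLite database. It assumes a cut-down, hypothetical `metadatavalue` table with only the columns the statement uses (the real DSpace table lives in PostgreSQL and has more columns), and the sample rows are invented for illustration:

```python
import sqlite3

# Variants copied from the UPDATE statement above; plain "en" is deliberately
# left out because dc.description.provenance fields use it.
BAD_LANGS = ('EN', 'En', 'en_', 'EN_US', 'en_U', 'eng')


def normalize_text_lang(conn):
    """Run the same UPDATE as above and return the number of rows changed."""
    placeholders = ','.join('?' for _ in BAD_LANGS)
    cur = conn.execute(
        "UPDATE metadatavalue SET text_lang='en_US' "
        f"WHERE resource_type_id=2 AND text_lang IN ({placeholders})",
        BAD_LANGS,
    )
    return cur.rowcount


def demo():
    """Build a hypothetical, simplified table and apply the update to it."""
    conn = sqlite3.connect(':memory:')
    conn.execute(
        "CREATE TABLE metadatavalue ("
        "metadata_value_id INTEGER PRIMARY KEY, "
        "resource_type_id INTEGER, "
        "text_lang TEXT)"
    )
    # Invented sample rows: two bad variants, one "en", one already correct,
    # one bad variant on another resource type, and one NULL.
    conn.executemany(
        "INSERT INTO metadatavalue (resource_type_id, text_lang) VALUES (?, ?)",
        [(2, 'EN'), (2, 'eng'), (2, 'en'), (2, 'en_US'), (3, 'EN'), (2, None)],
    )
    changed = normalize_text_lang(conn)
    langs = [
        row[0]
        for row in conn.execute(
            "SELECT text_lang FROM metadatavalue ORDER BY metadata_value_id"
        )
    ]
    conn.close()
    return changed, langs


if __name__ == '__main__':
    changed, langs = demo()
    print(f"UPDATE {changed}")
    print(langs)
```

- In this sketch only rows with `resource_type_id=2` and one of the listed variants change; "en", `NULL`, and rows for other resource types are left alone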
- In other news, I was playing with adding ORCID identifiers to a dump of CIAT's community via CSV in OpenRefine
- Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the `cg.creator.id` field
- For example, a GREL expression in a custom text facet to get all items with `dc.contributor.author[en_US]` of a certain author with several name variations (this is how you use a logical OR in OpenRefine):

@@ -20,7 +20,7 @@ Export a CSV of the IITA community metadata for Martin Mueller
<meta property="article:published_time" content="2018-03-02T16:07:54&#43;02:00"/>
<meta property="article:modified_time" content="2018-03-08T21:20:39&#43;02:00"/>
<meta property="article:modified_time" content="2018-03-08T21:29:37&#43;02:00"/>
@@ -51,9 +51,9 @@ Export a CSV of the IITA community metadata for Martin Mueller
"@type": "BlogPosting",
"headline": "March, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-03/",
"wordCount": "709",
"wordCount": "780",
"datePublished": "2018-03-02T16:07:54&#43;02:00",
"dateModified": "2018-03-08T21:20:39&#43;02:00",
"dateModified": "2018-03-08T21:29:37&#43;02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -229,6 +229,16 @@ dspacetest=# select distinct text_lang from metadatavalue where resource_type_id
</code></pre>
<ul>
<li>On second inspection it looks like <code>dc.description.provenance</code> fields use the text_lang &ldquo;en&rdquo; so that&rsquo;s probably why there are over 100,000 fields changed&hellip;</li>
<li>If I skip &ldquo;en&rdquo;, about 2,000 fields remain, which seems like a more reasonable number of fields for users to have edited manually, messed up during CSV import, etc:</li>
</ul>
<pre><code>dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
UPDATE 2309
</code></pre>
<ul>
<li>I will apply this on CGSpace right now</li>
<li>In other news, I was playing with adding ORCID identifiers to a dump of CIAT&rsquo;s community via CSV in OpenRefine</li>
<li>Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the <code>cg.creator.id</code> field</li>
<li>For example, a GREL expression in a custom text facet to get all items with <code>dc.contributor.author[en_US]</code> of a certain author with several name variations (this is how you use a logical OR in OpenRefine):</li>

@@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-03/</loc>
<lastmod>2018-03-08T21:20:39+02:00</lastmod>
<lastmod>2018-03-08T21:29:37+02:00</lastmod>
</url>
<url>
@@ -154,7 +154,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-03-08T21:20:39+02:00</lastmod>
<lastmod>2018-03-08T21:29:37+02:00</lastmod>
<priority>0</priority>
</url>
@@ -165,7 +165,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-03-08T21:20:39+02:00</lastmod>
<lastmod>2018-03-08T21:29:37+02:00</lastmod>
<priority>0</priority>
</url>
@@ -177,13 +177,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
<lastmod>2018-03-08T21:20:39+02:00</lastmod>
<lastmod>2018-03-08T21:29:37+02:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-03-08T21:20:39+02:00</lastmod>
<lastmod>2018-03-08T21:29:37+02:00</lastmod>
<priority>0</priority>
</url>