Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-11-26 08:28:18 +01:00)
Update notes for 2019-04-08
parent 6f44a3bcdd
commit ca75050fe9
@@ -427,5 +427,34 @@ $ ./fix-metadata-values.py -i 2019-04-08-fix-13-affiliations.csv -db dspace -u d
 ```
 
 - We should create a new list of affiliations to update our controlled vocabulary again
+- I dumped a list of the top 1500 affiliations:
+
+```
+dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 211 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-04-08-top-1500-affiliations.csv WITH CSV HEADER;
+COPY 1500
+```
+
+- Fix a few more messed up affiliations that have return characters in them (use Ctrl-V Ctrl-M to re-create control character):
+
+```
+dspace=# UPDATE metadatavalue SET text_value='International Institute for Environment and Development' WHERE resource_type_id = 2 AND metadata_field_id = 211 AND text_value LIKE 'International Institute^M%';
+dspace=# UPDATE metadatavalue SET text_value='Kenya Agriculture and Livestock Research Organization' WHERE resource_type_id = 2 AND metadata_field_id = 211 AND text_value LIKE 'Kenya Agricultural and Livestock Research^M%';
+```
+
+- I noticed a bunch of subjects and affiliations that use stylized apostrophes so I will export those and then batch update them:
+
+```
+dspace=# \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 211 AND text_value LIKE '%’%') to /tmp/2019-04-08-affiliations-apostrophes.csv WITH CSV HEADER;
+COPY 60
+dspace=# \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 57 AND text_value LIKE '%’%') to /tmp/2019-04-08-subject-apostrophes.csv WITH CSV HEADER;
+COPY 20
+```
+
+- I cleaned them up in OpenRefine and then applied the fixes on CGSpace and DSpace Test:
+
+```
+$ ./fix-metadata-values.py -i /tmp/2019-04-08-fix-60-affiliations-apostrophes.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 -t correct -d
+$ ./fix-metadata-values.py -i /tmp/2019-04-08-fix-20-subject-apostrophes.csv -db dspace -u dspace -p 'fuuu' -f dc.subject -m 57 -t correct -d
+```
 
 <!-- vim: set sw=2 ts=2: -->
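The notes in this commit clean up the stylized apostrophes in OpenRefine. As a rough alternative, the same step could be scripted. The sketch below is not the author's method. It assumes the exported CSV has a single `text_value` column (as the `\COPY` commands suggest) and that the fix file needs one column named after the metadata field and one named `correct` (a guess based on the `-f` and `-t` arguments). The helper names and sample value are made up for illustration.

```python
import csv
import io

# U+2019 RIGHT SINGLE QUOTATION MARK, the "stylized apostrophe" from the notes.
CURLY_APOSTROPHE = "\u2019"


def build_fixes(rows, field_name, correct_name="correct"):
    """Return one dict per value that contains a curly apostrophe.

    Assumption: each input row is a dict with a "text_value" key, and the
    output column names mirror the -f and -t arguments seen in the notes.
    """
    fixes = []
    for row in rows:
        original = row["text_value"]
        corrected = original.replace(CURLY_APOSTROPHE, "'")
        if corrected != original:
            fixes.append({field_name: original, correct_name: corrected})
    return fixes


def write_fixes(fixes, field_name, out, correct_name="correct"):
    """Write the fixes as CSV with a header row to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=[field_name, correct_name])
    writer.writeheader()
    writer.writerows(fixes)


if __name__ == "__main__":
    # Made-up sample standing in for the exported CSV file.
    sample = io.StringIO("text_value\nFarmers\u2019 Union\nPlain Name\n")
    fixes = build_fixes(csv.DictReader(sample), "cg.contributor.affiliation")
    out = io.StringIO()
    write_fixes(fixes, "cg.contributor.affiliation", out)
    print(out.getvalue())
```

Values without a curly apostrophe are skipped, so the output only lists rows that actually need a change.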
@@ -38,7 +38,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-04/" />
 <meta property="article:published_time" content="2019-04-01T09:00:43+03:00"/>
-<meta property="article:modified_time" content="2019-04-07T21:17:16+03:00"/>
+<meta property="article:modified_time" content="2019-04-08T11:26:20+03:00"/>
 
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="April, 2019"/>
@@ -81,9 +81,9 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
 "@type": "BlogPosting",
 "headline": "April, 2019",
 "url": "https://alanorth.github.io/cgspace-notes/2019-04/",
-"wordCount": "2397",
+"wordCount": "2631",
 "datePublished": "2019-04-01T09:00:43+03:00",
-"dateModified": "2019-04-07T21:17:16+03:00",
+"dateModified": "2019-04-08T11:26:20+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -659,8 +659,39 @@ X-XSS-Protection: 1; mode=block
 
 <ul>
 <li>We should create a new list of affiliations to update our controlled vocabulary again</li>
+<li>I dumped a list of the top 1500 affiliations:</li>
 </ul>
+
+<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 211 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-04-08-top-1500-affiliations.csv WITH CSV HEADER;
+COPY 1500
+</code></pre>
+
+<ul>
+<li>Fix a few more messed up affiliations that have return characters in them (use Ctrl-V Ctrl-M to re-create control character):</li>
+</ul>
+
+<pre><code>dspace=# UPDATE metadatavalue SET text_value='International Institute for Environment and Development' WHERE resource_type_id = 2 AND metadata_field_id = 211 AND text_value LIKE 'International Institute^M%';
+dspace=# UPDATE metadatavalue SET text_value='Kenya Agriculture and Livestock Research Organization' WHERE resource_type_id = 2 AND metadata_field_id = 211 AND text_value LIKE 'Kenya Agricultural and Livestock Research^M%';
+</code></pre>
+
+<ul>
+<li>I noticed a bunch of subjects and affiliations that use stylized apostrophes so I will export those and then batch update them:</li>
+</ul>
+
+<pre><code>dspace=# \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 211 AND text_value LIKE '%’%') to /tmp/2019-04-08-affiliations-apostrophes.csv WITH CSV HEADER;
+COPY 60
+dspace=# \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 57 AND text_value LIKE '%’%') to /tmp/2019-04-08-subject-apostrophes.csv WITH CSV HEADER;
+COPY 20
+</code></pre>
+
+<ul>
+<li>I cleaned them up in OpenRefine and then applied the fixes on CGSpace and DSpace Test:</li>
+</ul>
+
+<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-04-08-fix-60-affiliations-apostrophes.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 -t correct -d
+$ ./fix-metadata-values.py -i /tmp/2019-04-08-fix-20-subject-apostrophes.csv -db dspace -u dspace -p 'fuuu' -f dc.subject -m 57 -t correct -d
+</code></pre>
 
 <!-- vim: set sw=2 ts=2: -->
 
 
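The carriage-return fix in this commit targets two affiliations by name. To find other values with stray control characters, an exported list could be scanned first. This is only a sketch, not something the notes describe. It assumes rows are dicts with a `text_value` key, and the function names and sample values are hypothetical.

```python
import unicodedata


def has_control_chars(value):
    """Return True if the string contains any Unicode control character.

    Carriage return (what ^M stands for), newline and tab all fall in the
    "Cc" general category.
    """
    return any(unicodedata.category(ch) == "Cc" for ch in value)


def find_suspect_values(rows, column="text_value"):
    """Collect the values in `column` that contain a control character."""
    return [row[column] for row in rows if has_control_chars(row[column])]


if __name__ == "__main__":
    # Made-up rows; real input would come from an exported CSV.
    rows = [
        {"text_value": "Example Institute\r for Testing"},
        {"text_value": "Clean Affiliation"},
    ]
    for value in find_suspect_values(rows):
        # repr() makes the otherwise invisible \r visible in the terminal.
        print(repr(value))
```

Printing with `repr()` avoids retyping the control character by hand when inspecting the results.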
@@ -46,7 +46,7 @@ Disallow: /cgspace-notes/2015-12/
 Disallow: /cgspace-notes/2015-11/
 Disallow: /cgspace-notes/
 Disallow: /cgspace-notes/categories/
-Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/categories/notes/
+Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/posts/
 Disallow: /cgspace-notes/tags/
@@ -4,7 +4,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2019-04/</loc>
-<lastmod>2019-04-07T21:17:16+03:00</lastmod>
+<lastmod>2019-04-08T11:26:20+03:00</lastmod>
 </url>
 
 <url>
@@ -219,7 +219,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2019-04-07T21:17:16+03:00</lastmod>
+<lastmod>2019-04-08T11:26:20+03:00</lastmod>
 <priority>0</priority>
 </url>
 
@@ -228,27 +228,27 @@
 <priority>0</priority>
 </url>
 
-<url>
-<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2019-04-07T21:17:16+03:00</lastmod>
-<priority>0</priority>
-</url>
-
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
 <lastmod>2018-03-09T22:10:33+02:00</lastmod>
 <priority>0</priority>
 </url>
 
+<url>
+<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
+<lastmod>2019-04-08T11:26:20+03:00</lastmod>
+<priority>0</priority>
+</url>
+
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-<lastmod>2019-04-07T21:17:16+03:00</lastmod>
+<lastmod>2019-04-08T11:26:20+03:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2019-04-07T21:17:16+03:00</lastmod>
+<lastmod>2019-04-08T11:26:20+03:00</lastmod>
 <priority>0</priority>
 </url>
 