mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2018-09-30
@ -18,7 +18,8 @@ I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
|
||||
" />
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-09/" /><meta property="article:published_time" content="2018-09-02T09:55:54+03:00"/>
|
||||
<meta property="article:modified_time" content="2018-09-29T15:00:03+03:00"/>
|
||||
<meta property="article:modified_time" content="2018-09-29T19:00:41+03:00"/>
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="September, 2018"/>
|
||||
<meta name="twitter:description" content="2018-09-02
|
||||
@ -31,7 +32,7 @@ I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
|
||||
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.48" />
|
||||
<meta name="generator" content="Hugo 0.49" />
|
||||
|
||||
|
||||
|
||||
@ -41,9 +42,9 @@ I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
|
||||
"@type": "BlogPosting",
|
||||
"headline": "September, 2018",
|
||||
"url": "https://alanorth.github.io/cgspace-notes/2018-09/",
|
||||
"wordCount": "4757",
|
||||
"wordCount": "5245",
|
||||
"datePublished": "2018-09-02T09:55:54+03:00",
|
||||
"dateModified": "2018-09-29T15:00:03+03:00",
|
||||
"dateModified": "2018-09-29T19:00:41+03:00",
|
||||
"author": {
|
||||
"@type": "Person",
|
||||
"name": "Alan Orth"
|
||||
@ -812,6 +813,83 @@ $ ./fix-metadata-values.py -i 2018-09-29-fix-authors.csv -db dspace -u dspace -p
|
||||
<li>He was sending too many (5 or 10) concurrent requests to the server, but still… why is this shit so slow?!</li>
|
||||
</ul>
|
||||
|
||||
<h2 id="2018-09-30">2018-09-30</h2>
|
||||
|
||||
<ul>
|
||||
<li>Valerio keeps submitting items to CGSpace that have weird or incorrect languages, authors, etc</li>
|
||||
<li>I think I should just batch export and update all languages…</li>
|
||||
</ul>
|
||||
|
||||
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2018-09-30-languages.csv with csv;
|
||||
</code></pre>
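As a rough aid for reviewing the exported list, the sketch below reads a two-column CSV in the same value,count layout as the `\copy` output above and returns the values that are missing from a reference set of language codes. The names `KNOWN_CODES` and `unexpected_languages` are hypothetical, and the reference set is a deliberately incomplete sample of ISO 639-1 codes; a real check would need the full list.

```python
import csv
import io

# Assumption: a small, incomplete sample of valid ISO 639-1 codes, for illustration only.
KNOWN_CODES = {"en", "fr", "es", "pt", "ja", "hi", "sw", "ar", "zh", "de"}


def unexpected_languages(csv_file, known=KNOWN_CODES):
    """Return (value, count) pairs whose value is not in the known set."""
    flagged = []
    for row in csv.reader(csv_file):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        value, count = row[0], int(row[1])
        if value not in known:
            flagged.append((value, count))
    return flagged


if __name__ == "__main__":
    # Made-up sample rows in the same value,count layout as the export.
    sample = io.StringIO("en,100\nother,79\nOther,6\ngh,2\nfr,40\n")
    for value, count in unexpected_languages(sample):
        print(f"{value}\t{count}")
```

To run it against the real export, the `io.StringIO` sample could be swapped for `open("/tmp/2018-09-30-languages.csv", newline="")`.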
|
||||
|
||||
<ul>
|
||||
<li>Then I can simply delete the “Other” and “other” ones because they are not useful at all:</li>
|
||||
</ul>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='Other';
|
||||
DELETE 6
|
||||
dspace=# DELETE FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='other';
|
||||
DELETE 79
|
||||
</code></pre>
|
||||
|
||||
<ul>
|
||||
<li>Looking through the list I see some weird language codes like <code>gh</code>, so I checked out those items:</li>
|
||||
</ul>
|
||||
|
||||
<pre><code>dspace=# SELECT resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='gh';
|
||||
resource_id
|
||||
-------------
|
||||
94530
|
||||
94529
|
||||
dspace=# SELECT handle,item_id FROM item, handle WHERE handle.resource_type_id=2 AND handle.resource_id = item.item_id AND handle.resource_id in (94530, 94529);
|
||||
handle | item_id
|
||||
-------------+---------
|
||||
10568/91386 | 94529
|
||||
10568/91387 | 94530
|
||||
</code></pre>
|
||||
|
||||
<ul>
|
||||
<li>Those items are from Ghana, so the submitter apparently thought <code>gh</code> was a language… I can safely delete those language values:</li>
|
||||
</ul>
|
||||
|
||||
<pre><code>dspace=# DELETE FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='gh';
|
||||
DELETE 2
|
||||
</code></pre>
|
||||
|
||||
<ul>
|
||||
<li>The next issue would be <code>jn</code>:</li>
|
||||
</ul>
|
||||
|
||||
<pre><code>dspace=# SELECT resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='jn';
|
||||
resource_id
|
||||
-------------
|
||||
94001
|
||||
94003
|
||||
dspace=# SELECT handle,item_id FROM item, handle WHERE handle.resource_type_id=2 AND handle.resource_id = item.item_id AND handle.resource_id in (94001, 94003);
|
||||
handle | item_id
|
||||
-------------+---------
|
||||
10568/90868 | 94001
|
||||
10568/90870 | 94003
|
||||
</code></pre>
|
||||
|
||||
<ul>
|
||||
<li>Those items are about Japan, so I will update them to be <code>ja</code></li>
|
||||
<li>Other replacements:</li>
|
||||
</ul>
|
||||
|
||||
<pre><code>DELETE FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='gh';
|
||||
UPDATE metadatavalue SET text_value='fr' WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='fn';
|
||||
UPDATE metadatavalue SET text_value='hi' WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='in';
|
||||
UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='Ja';
|
||||
UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='jn';
|
||||
UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'language' and qualifier = 'iso') AND text_value='jp';
|
||||
</code></pre>
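The corrections above all follow the same pattern, so they can be summarised as a lookup table. This is a minimal sketch and not what was actually run: `REPLACEMENTS`, `DELETIONS`, and `normalize_language` are hypothetical names, and the mappings are copied from the SQL statements in these notes.

```python
# Mappings copied from the UPDATE statements in these notes.
REPLACEMENTS = {"fn": "fr", "in": "hi", "Ja": "ja", "jn": "ja", "jp": "ja"}
# Values removed outright by the DELETE statements in these notes.
DELETIONS = {"Other", "other", "gh"}


def normalize_language(value):
    """Return the corrected code, or None if the value should be deleted."""
    if value in DELETIONS:
        return None
    # Values without a known correction pass through unchanged.
    return REPLACEMENTS.get(value, value)
```

Keeping the mapping in one place would make it easier to review the intended changes before turning them into SQL or a corrected CSV.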
|
||||
|
||||
<ul>
|
||||
<li>Then there are 12 items with <code>en|hi</code>, but they were all in one collection so I just exported it as a CSV and then re-imported the corrected metadata</li>
|
||||
</ul>
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->