mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-26 00:18:21 +01:00
Add notes for 2019-04-30
parent fe6eb4cf98
commit 76daa82326
@@ -1016,4 +1016,46 @@ dspace=# SELECT * FROM item WHERE item_id=74648;
- I even tried to "expunge" the item using an [action in CSV](https://wiki.duraspace.org/display/DSDOC5x/Batch+Metadata+Editing#BatchMetadataEditing-Performing'actions'onitems), and it said "EXPUNGED!" but the item is still there...

## 2019-04-30

- Send mail to the dspace-tech mailing list to ask about the item expunge issue
- Delete and re-create Podman container for dspacedb after pulling a new PostgreSQL container:

```
$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
```
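
The note above records only the final `podman run`. A minimal sketch of the whole delete and re-create sequence might look like the following; the `podman pull` and `podman rm -f` steps are assumptions about what was run, and the helper name `recreate_commands` is hypothetical. It only builds and prints the commands rather than executing them.

```python
import shlex

# Values copied from the `podman run` command in the notes.
NAME = "dspacedb"
IMAGE = "postgres:9.6-alpine"
VOLUME = "dspacedb_data:/var/lib/postgresql/data"


def recreate_commands(name=NAME, image=IMAGE, volume=VOLUME):
    """Return argv lists for pull, remove, and run, in that order.

    The pull and rm steps are assumptions: the notes only record the run.
    """
    return [
        ["podman", "pull", image],
        # -f removes the container even if it is still running.
        ["podman", "rm", "-f", name],
        [
            "podman", "run", "--name", name,
            "-v", volume,
            "-e", "POSTGRES_PASSWORD=postgres",
            "-p", "5432:5432",
            "-d", image,
        ],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for argv in recreate_commands():
        print("$ " + shlex.join(argv))
```

Since the data lives in the named volume `dspacedb_data`, removing the container this way should leave the database files in place for the new container to pick up.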

- Carlos from LandPortal asked if I could export CGSpace in a machine-readable format so I think I'll try to do a CSV
  - In order to make it easier for him to understand the CSV I will normalize the text languages (minus the provenance field) on my local development instance before exporting:

```
dspace=# SELECT DISTINCT text_lang, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id != 28 GROUP BY text_lang;
 text_lang |  count
-----------+---------
           |  358647
 *         |      11
 E.        |       1
 en        |    1635
 en_US     |  602312
 es        |      12
 es_ES     |       2
 ethnob    |       1
 fr        |       2
 spa       |       2
           | 1074345
(11 rows)
dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', '');
UPDATE 360295
dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL;
UPDATE 1074345
dspace=# UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('es', 'spa');
UPDATE 14
```
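
The NULL rows need their own `UPDATE` because `IN (...)` never matches NULL in SQL, so the empty string and NULL show up as two separate rows in the output. As a rough cross-check of the three statements above, the same mapping can be sketched in Python and applied to the counts from the `SELECT`; the function names are hypothetical, and `None` stands in for SQL NULL.

```python
# Row counts per text_lang, copied from the SELECT output in the notes.
# The empty string and None (standing in for SQL NULL) are distinct values.
COUNTS = {
    "": 358647, "*": 11, "E.": 1, "en": 1635, "en_US": 602312,
    "es": 12, "es_ES": 2, "ethnob": 1, "fr": 2, "spa": 2, None: 1074345,
}

# Values folded into each target language by the UPDATE statements.
TO_EN_US = {"ethnob", "en", "*", "E.", "", None}
TO_ES_ES = {"es", "spa"}


def normalize(text_lang):
    """Map a raw text_lang value to its normalized form."""
    if text_lang in TO_EN_US:
        return "en_US"
    if text_lang in TO_ES_ES:
        return "es_ES"
    return text_lang  # already normalized, or left alone (for example fr)


def normalized_counts(counts):
    """Sum the row counts per normalized language."""
    result = {}
    for lang, n in counts.items():
        target = normalize(lang)
        result[target] = result.get(target, 0) + n
    return result


if __name__ == "__main__":
    print(normalized_counts(COUNTS))
```

The rows matched by the first `UPDATE` add up to 358647 + 11 + 1 + 1635 + 1 = 360295, which agrees with the `UPDATE 360295` reported by PostgreSQL.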

- Then I exported the whole repository as CSV, imported it into OpenRefine, removed a few unneeded columns, exported it, zipped it down to 36MB, and emailed a link to Carlos
- In other news, while I was looking through the CSV in OpenRefine I saw lots of weird values in some fields... we should check, for example:
  - issue dates
  - items missing handles
  - authorship types
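
The first two checks could be roughed out against the exported CSV with something like the sketch below. The column names `dc.date.issued` and `dc.identifier.uri`, the `id` column, and the assumption that issue dates should look like `YYYY`, `YYYY-MM`, or `YYYY-MM-DD` are all guesses rather than something confirmed by the export, and the sample rows are made up. Authorship types are left out because the notes do not list the valid values.

```python
import csv
import io
import re

# Hypothetical column names; a real export may use language-qualified headers.
DATE_COLUMN = "dc.date.issued"
HANDLE_COLUMN = "dc.identifier.uri"

# Assumed date shapes: YYYY, YYYY-MM, or YYYY-MM-DD.
ISSUE_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def find_problems(csv_file):
    """Return (item id, problem, value) tuples for suspicious rows."""
    problems = []
    for row in csv.DictReader(csv_file):
        item_id = row.get("id", "")
        date = (row.get(DATE_COLUMN) or "").strip()
        handle = (row.get(HANDLE_COLUMN) or "").strip()
        if not ISSUE_DATE.fullmatch(date):
            problems.append((item_id, "issue date", date))
        if not handle:
            problems.append((item_id, "missing handle", handle))
    return problems


if __name__ == "__main__":
    # Made-up sample rows, only to show the shape of the output.
    sample = io.StringIO(
        "id,dc.date.issued,dc.identifier.uri\n"
        "1,2019-04-30,https://example.org/handle/1\n"
        "2,April 2019,https://example.org/handle/2\n"
        "3,2018,\n"
    )
    for problem in find_problems(sample):
        print(problem)
```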

<!-- vim: set sw=2 ts=2: -->

@@ -38,7 +38,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-04/" />
<meta property="article:published_time" content="2019-04-01T09:00:43+03:00"/>
<meta property="article:modified_time" content="2019-04-26T12:16:02+03:00"/>
<meta property="article:modified_time" content="2019-04-28T19:07:51+03:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="April, 2019"/>
@@ -81,9 +81,9 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
"@type": "BlogPosting",
"headline": "April, 2019",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-04\/",
"wordCount": "6534",
"wordCount": "6800",
"datePublished": "2019-04-01T09:00:43\x2b03:00",
"dateModified": "2019-04-26T12:16:02\x2b03:00",
"dateModified": "2019-04-28T19:07:51\x2b03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -1418,6 +1418,58 @@ COPY 65752
<li>I even tried to “expunge” the item using an <a href="https://wiki.duraspace.org/display/DSDOC5x/Batch+Metadata+Editing#BatchMetadataEditing-Performing'actions'onitems">action in CSV</a>, and it said “EXPUNGED!” but the item is still there…</li>
</ul>

<h2 id="2019-04-30">2019-04-30</h2>

<ul>
<li>Send mail to the dspace-tech mailing list to ask about the item expunge issue</li>
<li>Delete and re-create Podman container for dspacedb after pulling a new PostgreSQL container:</li>
</ul>

<pre><code>$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
</code></pre>

<ul>
<li>Carlos from LandPortal asked if I could export CGSpace in a machine-readable format so I think I’ll try to do a CSV

<ul>
<li>In order to make it easier for him to understand the CSV I will normalize the text languages (minus the provenance field) on my local development instance before exporting:</li>
</ul></li>
</ul>

<pre><code>dspace=# SELECT DISTINCT text_lang, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id != 28 GROUP BY text_lang;
 text_lang |  count
-----------+---------
           |  358647
 *         |      11
 E.        |       1
 en        |    1635
 en_US     |  602312
 es        |      12
 es_ES     |       2
 ethnob    |       1
 fr        |       2
 spa       |       2
           | 1074345
(11 rows)
dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', '');
UPDATE 360295
dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL;
UPDATE 1074345
dspace=# UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('es', 'spa');
UPDATE 14
</code></pre>

<ul>
<li>Then I exported the whole repository as CSV, imported it into OpenRefine, removed a few unneeded columns, exported it, zipped it down to 36MB, and emailed a link to Carlos</li>
<li>In other news, while I was looking through the CSV in OpenRefine I saw lots of weird values in some fields… we should check, for example:

<ul>
<li>issue dates</li>
<li>items missing handles</li>
<li>authorship types</li>
</ul></li>
</ul>

<!-- vim: set sw=2 ts=2: -->

@@ -4,30 +4,30 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-04/</loc>
<lastmod>2019-04-26T12:16:02+03:00</lastmod>
<lastmod>2019-04-28T19:07:51+03:00</lastmod>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-04-26T12:16:02+03:00</lastmod>
<lastmod>2019-04-28T19:07:51+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-04-26T12:16:02+03:00</lastmod>
<lastmod>2019-04-28T19:07:51+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-04-26T12:16:02+03:00</lastmod>
<lastmod>2019-04-28T19:07:51+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-04-26T12:16:02+03:00</lastmod>
<lastmod>2019-04-28T19:07:51+03:00</lastmod>
<priority>0</priority>
</url>
