diff --git a/content/posts/2019-04.md b/content/posts/2019-04.md
index ca223f963..d4eb7225a 100644
--- a/content/posts/2019-04.md
+++ b/content/posts/2019-04.md
@@ -1016,4 +1016,46 @@ dspace=# SELECT * FROM item WHERE item_id=74648;
 - I even tried to "expunge" the item using an [action in CSV](https://wiki.duraspace.org/display/DSDOC5x/Batch+Metadata+Editing#BatchMetadataEditing-Performing'actions'onitems), and it said "EXPUNGED!" but the item is still there...
+## 2019-04-30
+
+- Send mail to the dspace-tech mailing list to ask about the item expunge issue
+- Delete and re-create the Podman container for dspacedb after pulling a new PostgreSQL container image:
+
+```
+$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
+```
+
+- Carlos from LandPortal asked if I could export CGSpace in a machine-readable format, so I think I'll try to do a CSV
+  - To make the CSV easier for him to understand, I will normalize the text languages (except for the provenance field) on my local development instance before exporting:
+
+```
+dspace=# SELECT DISTINCT text_lang, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id != 28 GROUP BY text_lang;
+ text_lang |  count
+-----------+---------
+           |  358647
+ *         |      11
+ E.        |       1
+ en        |    1635
+ en_US     |  602312
+ es        |      12
+ es_ES     |       2
+ ethnob    |       1
+ fr        |       2
+ spa       |       2
+           | 1074345
+(11 rows)
+dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', '');
+UPDATE 360295
+dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL;
+UPDATE 1074345
+dspace=# UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('es', 'spa');
+UPDATE 14
+```
+
+- Then I exported the whole repository as CSV, imported it into OpenRefine, removed a few unneeded columns, exported it, zipped it down to 36MB, and emailed a link to Carlos
+- In other news, while looking through the CSV in OpenRefine I saw lots of weird values in some fields... we should check, for example:
+  - issue dates
+  - items missing handles
+  - authorship types
+
diff --git a/docs/2019-04/index.html b/docs/2019-04/index.html
index 8cf90cb75..0bc92f997 100644
--- a/docs/2019-04/index.html
+++ b/docs/2019-04/index.html
@@ -38,7 +38,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
-
+
@@ -81,9 +81,9 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
   "@type": "BlogPosting",
   "headline": "April, 2019",
   "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-04\/",
-  "wordCount": "6534",
+  "wordCount": "6800",
   "datePublished": "2019-04-01T09:00:43\x2b03:00",
-  "dateModified": "2019-04-26T12:16:02\x2b03:00",
+  "dateModified": "2019-04-28T19:07:51\x2b03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -1418,6 +1418,58 @@ COPY 65752
  • I even tried to “expunge” the item using an action in CSV, and it said “EXPUNGED!” but the item is still there…
    +
    +2019-04-30
    +
    +$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
    +
    +dspace=# SELECT DISTINCT text_lang, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id != 28 GROUP BY text_lang;
    + text_lang |  count
    +-----------+---------
    +           |  358647
    + *         |      11
    + E.        |       1
    + en        |    1635
    + en_US     |  602312
    + es        |      12
    + es_ES     |       2
    + ethnob    |       1
    + fr        |       2
    + spa       |       2
    +           | 1074345
    +(11 rows)
    +dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', '');
    +UPDATE 360295
    +dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL;
    +UPDATE 1074345
    +dspace=# UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('es', 'spa');
    +UPDATE 14
    +
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index de6634cde..d878a5935 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,30 +4,30 @@
 https://alanorth.github.io/cgspace-notes/2019-04/
-2019-04-26T12:16:02+03:00
+2019-04-28T19:07:51+03:00
 https://alanorth.github.io/cgspace-notes/
-2019-04-26T12:16:02+03:00
+2019-04-28T19:07:51+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2019-04-26T12:16:02+03:00
+2019-04-28T19:07:51+03:00
 0
 https://alanorth.github.io/cgspace-notes/posts/
-2019-04-26T12:16:02+03:00
+2019-04-28T19:07:51+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2019-04-26T12:16:02+03:00
+2019-04-28T19:07:51+03:00
 0
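The row counts reported by the three `UPDATE` statements in the 2019-04-30 notes can be cross-checked against the `GROUP BY text_lang` output before running them on a real database. The following is a minimal Python sketch of that arithmetic, assuming the counts are copied by hand from the psql output, with `None` standing in for SQL `NULL` and `""` for the empty string. `COUNTS`, `UPDATES`, `expected_update_counts`, and `counts_after` are hypothetical names for illustration, not part of DSpace or the author's scripts.

```python
# text_lang value -> number of rows, copied from the SELECT ... GROUP BY output.
# None stands for SQL NULL; "" is the empty string.
COUNTS = {
    "": 358647,
    "*": 11,
    "E.": 1,
    "en": 1635,
    "en_US": 602312,
    "es": 12,
    "es_ES": 2,
    "ethnob": 1,
    "fr": 2,
    "spa": 2,
    None: 1074345,
}

# Each UPDATE statement expressed as (target language, source values), in order.
UPDATES = [
    ("en_US", ["ethnob", "en", "*", "E.", ""]),
    ("en_US", [None]),  # the "text_lang IS NULL" statement
    ("es_ES", ["es", "spa"]),
]


def expected_update_counts(counts, updates):
    """Return how many rows each UPDATE should report, given the starting counts."""
    return [sum(counts.get(value, 0) for value in sources) for _, sources in updates]


def counts_after(counts, updates):
    """Return the text_lang distribution after applying the UPDATEs in order."""
    result = dict(counts)
    for target, sources in updates:
        # Move every row with a source value over to the target value.
        moved = sum(result.pop(value, 0) for value in sources)
        result[target] = result.get(target, 0) + moved
    return result


if __name__ == "__main__":
    print(expected_update_counts(COUNTS, UPDATES))  # [360295, 1074345, 14]
    print(counts_after(COUNTS, UPDATES))
```

Because the remapping only moves rows between values, the sum of `counts_after(...)` should equal the sum of `COUNTS`, which is a cheap extra check that no source value was mistyped.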
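In the 2019-04-30 notes the unneeded columns were removed by hand in OpenRefine; for a repeatable export, that step could also be scripted. This is a sketch using Python's standard `csv` module in place of OpenRefine, assuming the export is an ordinary header-first CSV. The names in `DROP_COLUMNS` are placeholders, since the notes don't say which columns were dropped, and `drop_columns` is a hypothetical helper.

```python
import csv
import io
import sys

# Placeholder column names to remove (hypothetical; adjust to the real export).
DROP_COLUMNS = {"collection", "dc.description.provenance[en]"}


def drop_columns(src, dst, drop):
    """Copy CSV rows from src to dst without the columns named in drop.

    Returns the list of column names that were kept.
    """
    reader = csv.DictReader(src)
    keep = [name for name in (reader.fieldnames or []) if name not in drop]
    # extrasaction="ignore" makes DictWriter silently skip the dropped keys.
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return keep


if __name__ == "__main__":
    sample = io.StringIO(
        "id,collection,dc.title[en_US]\n"
        "1,123,Example title\n"
    )
    drop_columns(sample, sys.stdout, DROP_COLUMNS)
```

Working row by row with `DictReader`/`DictWriter` keeps memory use flat, which matters for a whole-repository export.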
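Two of the fields flagged for follow-up at the end of the 2019-04-30 notes (issue dates and missing handles) could be screened automatically before opening the CSV in OpenRefine. This is a rough Python sketch under stated assumptions: the column names `dc.date.issued[]` and `dc.identifier.uri[]`, the `||` multi-value separator, and the rule that a well-formed issue date is `YYYY`, `YYYY-MM`, or `YYYY-MM-DD` are illustrative choices, not CGSpace policy. Authorship types are left out because the notes don't say what a correct value looks like.

```python
import csv
import io
import re

# Hypothetical column names; real exports may carry a language suffix such as [en_US].
DATE_COLUMN = "dc.date.issued[]"
HANDLE_COLUMN = "dc.identifier.uri[]"
VALUE_SEPARATOR = "||"  # assumed multi-value separator
# Assumed notion of a well-formed issue date: YYYY, YYYY-MM, or YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def split_values(cell):
    """Split a multi-value CSV cell into its non-empty values."""
    return [v for v in (cell or "").split(VALUE_SEPARATOR) if v.strip()]


def check_rows(src):
    """Yield (item id, problem) tuples for rows that deserve a closer look."""
    for row in csv.DictReader(src):
        item_id = row.get("id", "?")
        dates = split_values(row.get(DATE_COLUMN))
        if not dates:
            yield item_id, "missing issue date"
        for value in dates:
            if not DATE_PATTERN.fullmatch(value.strip()):
                yield item_id, f"odd issue date: {value!r}"
        # Treat an item as having a handle if any URI points at hdl.handle.net.
        uris = split_values(row.get(HANDLE_COLUMN))
        if not any("hdl.handle.net/" in uri for uri in uris):
            yield item_id, "missing handle"


if __name__ == "__main__":
    sample = io.StringIO(
        "id,dc.date.issued[],dc.identifier.uri[]\n"
        "1,2019-04-30,https://hdl.handle.net/10568/1\n"
        "2,30/04/2019,https://hdl.handle.net/10568/2\n"
        "3,2018,\n"
    )
    for item_id, problem in check_rows(sample):
        print(item_id, problem)
```

The checks only flag candidates for manual review; a date such as `30/04/2019` may be perfectly meaningful, it just doesn't match the assumed format.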