mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2018-07-03
@@ -30,7 +30,7 @@ There is insufficient memory for the Java Runtime Environment to continue.
<meta property="article:published_time" content="2018-07-01T12:56:54+03:00"/>
<meta property="article:modified_time" content="2018-07-01T18:05:01+03:00"/>
<meta property="article:modified_time" content="2018-07-02T17:33:38+03:00"/>
@@ -61,7 +61,7 @@ There is insufficient memory for the Java Runtime Environment to continue.
"/>
<meta name="generator" content="Hugo 0.42.1" />
<meta name="generator" content="Hugo 0.42.2" />
@@ -71,9 +71,9 @@ There is insufficient memory for the Java Runtime Environment to continue.
"@type": "BlogPosting",
"headline": "July, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-07/",
"wordCount": "210",
"wordCount": "469",
"datePublished": "2018-07-01T12:56:54+03:00",
"dateModified": "2018-07-01T18:05:01+03:00",
"dateModified": "2018-07-02T17:33:38+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -195,6 +195,54 @@ $ dspace database migrate ignored
<li>They seem to be only interested in Gates-funded outputs, for example: <a href="https://www.agriknowledge.org/files/tm70mv21t">https://www.agriknowledge.org/files/tm70mv21t</a></li>
</ul>
<h2 id="2018-07-03">2018-07-03</h2>
<ul>
<li>Finally finished with the CIFOR Archive records (a total of 2448):
<ul>
<li>I mapped the 50 items that were duplicates from elsewhere in CGSpace into <a href="https://cgspace.cgiar.org/handle/10568/16702">CIFOR Archive</a></li>
<li>I did one last check of the remaining 2398 items and found eight with a <code>cg.identifier.doi</code> that linked to a URL other than a DOI, so I moved those to <code>cg.identifier.url</code> or <code>cg.identifier.googleurl</code> as appropriate</li>
<li>Also, thirteen items had a DOI in their citation but no <code>cg.identifier.doi</code> field, so I added those</li>
<li>Then I imported those 2398 items in two batches (to deal with memory issues):</li>
</ul></li>
</ul>
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive.csv
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive2.csv
</code></pre>
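The notes do not say how the 2398 items were divided into the two CSV files. Below is a minimal Python sketch of one way to do it. It assumes a hypothetical helper, <code>split_csv_text</code> (not part of DSpace), that splits a metadata CSV into at most N parts. Each part repeats the header row, so it can be passed to <code>dspace metadata-import</code> on its own.

```python
import csv
import io
import math


def split_csv_text(text: str, batches: int) -> list[str]:
    """Hypothetical helper: split CSV text into at most `batches` CSV strings.

    Every part starts with the original header row. csv.reader is used instead
    of splitting on newlines so quoted fields containing commas or line breaks
    stay intact.
    """
    if batches < 1:
        raise ValueError("batches must be at least 1")

    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    rows = list(reader)
    if not rows:
        return []

    # Rows per part, rounded up so no more than `batches` parts are produced.
    size = math.ceil(len(rows) / batches)
    parts = []
    for start in range(0, len(rows), size):
        buf = io.StringIO(newline="")
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(rows[start:start + size])
        parts.append(buf.getvalue())
    return parts
```

You can then write each returned string to its own file and import it separately. With two batches, 2398 data rows split into two parts of 1199 rows each.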
<ul>
<li>I noticed that many items use HTTP instead of HTTPS for their Google Books URL, and some are missing the URL scheme entirely:</li>
</ul>
<pre><code>dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
count
-------
785
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
count
-------
4
</code></pre>
<ul>
<li>I think I should fix those, along with some other garbage values like “test” and “dspace.ilri.org”:</li>
</ul>
<pre><code>dspace=# begin;
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
UPDATE 785
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
UPDATE 4
dspace=# update metadatavalue set text_value='https://books.google.com/books?id=meF1CLdPSF4C' where resource_type_id=2 and metadata_field_id=222 and text_value='meF1CLdPSF4C';
UPDATE 1
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403);
DELETE 4
dspace=# commit;
</code></pre>
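To test the two bulk Google Books rewrites on sample values before running them against the database, you can mirror them in a few lines of Python. This is a hypothetical sketch, not part of the workflow in these notes:

* <code>normalize_google_books_url</code> applies the same two prefix rules as the bulk <code>UPDATE</code> statements.
* <code>looks_like_google_books_url</code> is an assumed review check. It flags values such as “test” or a bare book ID that still need manual attention.

```python
import re

# Same prefixes as the two bulk UPDATEs:
#   LIKE 'http://books.google.%'  and  ~ '^books\.google\..*'
HTTP_PREFIX = re.compile(r"^http://books\.google\.")
NO_SCHEME_PREFIX = re.compile(r"^books\.google\.")


def normalize_google_books_url(value: str) -> str:
    """Upgrade http:// to https:// and add a missing scheme; leave anything else unchanged."""
    if HTTP_PREFIX.match(value):
        return "https://" + value[len("http://"):]
    if NO_SCHEME_PREFIX.match(value):
        return "https://" + value
    return value


def looks_like_google_books_url(value: str) -> bool:
    """Hypothetical review check: True only if the normalized value is an https Google Books URL."""
    return normalize_google_books_url(value).startswith("https://books.google.")
```

Like the SQL, these rules match case-sensitively and only touch values with a Google Books prefix. One-off fixes are still handled by hand, such as expanding a bare book ID or deleting garbage values.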
<!-- vim: set sw=2 ts=2: -->