--- title: "July, 2018" date: 2018-07-01T12:56:54+03:00 author: "Alan Orth" tags: ["Notes"] --- ## 2018-07-01 - I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case: ``` $ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace ``` - During the `mvn package` stage on the 5.8 branch I kept getting issues with java running out of memory: ``` There is insufficient memory for the Java Runtime Environment to continue. ``` - As the machine only has 8GB of RAM, I reduced the Tomcat memory heap from 5120m to 4096m so I could try to allocate more to the build process: ``` $ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m" $ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=dspacetest.cgiar.org -P \!dspace-lni,\!dspace-rdf,\!dspace-sword,\!dspace-swordv2 clean package ``` - Then I stopped the Tomcat 7 service, ran the ant update, and manually ran the old and ignored SQL migrations: ``` $ sudo su - postgres $ psql dspace ... dspace=# begin; BEGIN dspace=# \i Atmire-DSpace-5.8-Schema-Migration.sql DELETE 0 UPDATE 1 DELETE 1 dspace=# commit dspace=# \q $ exit $ dspace database migrate ignored ``` - After that I started Tomcat 7 and DSpace seems to be working, now I need to tell our colleagues to try stuff and report issues they have ## 2018-07-02 - Discuss AgriKnowledge including our Handle identifier on their harvested items from CGSpace - They seem to be only interested in Gates-funded outputs, for example: https://www.agriknowledge.org/files/tm70mv21t ## 2018-07-03 - Finally finish with the CIFOR Archive records (a total of 2448): - I mapped the 50 items that were duplicates from elsewhere in CGSpace into [CIFOR Archive](https://cgspace.cgiar.org/handle/10568/16702) - I did one last check of the remaining 2398 items and found eight who have a `cg.identifier.doi` that links to some URL other than a DOI so I moved those to `cg.identifier.url` and `cg.identifier.googleurl` as appropriate - Also, thirteen items had a DOI in their citation, but did not have a `cg.identifier.doi` field, so I added those - Then I imported those 2398 items in two batches (to deal with memory issues): ``` $ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m" $ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive.csv $ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive2.csv ``` - I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some missing HTTP entirely: ``` dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%'; count ------- 785 dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*'; count ------- 4 ``` - I think I should fix that as well as some other garbage values like "test" and "dspace.ilri.org" etc: ``` dspace=# begin; dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%'; UPDATE 785 dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*'; UPDATE 4 dspace=# update metadatavalue set text_value='https://books.google.com/books?id=meF1CLdPSF4C' where resource_type_id=2 and 

- After that I started Tomcat 7 and DSpace seems to be working, so now I need to ask our colleagues to test things and report any issues they find

## 2018-07-02

- Discuss AgriKnowledge including our Handle identifiers on the items they harvest from CGSpace
- They seem to be only interested in Gates-funded outputs, for example: https://www.agriknowledge.org/files/tm70mv21t

## 2018-07-03

- Finally finish with the CIFOR Archive records (a total of 2448):
  - I mapped the 50 items that were duplicates from elsewhere in CGSpace into [CIFOR Archive](https://cgspace.cgiar.org/handle/10568/16702)
  - I did one last check of the remaining 2398 items and found eight that have a `cg.identifier.doi` linking to some URL other than a DOI, so I moved those to `cg.identifier.url` and `cg.identifier.googleurl` as appropriate
  - Also, thirteen items had a DOI in their citation but no `cg.identifier.doi` field, so I added those
- Then I imported those 2398 items in two batches (to deal with memory issues):

```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive.csv
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive2.csv
```

- I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some that are missing the protocol entirely:

```
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
 count
-------
   785

dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
 count
-------
     4
```

- I think I should fix those, as well as some other garbage values like "test" and "dspace.ilri.org", etc. (see the query sketched at the end of these notes for how such values can be spotted):

```
dspace=# begin;
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
UPDATE 785
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
UPDATE 4
dspace=# update metadatavalue set text_value='https://books.google.com/books?id=meF1CLdPSF4C' where resource_type_id=2 and metadata_field_id=222 and text_value='meF1CLdPSF4C';
UPDATE 1
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403);
DELETE 4
dspace=# commit;
```
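
- The four `metadata_value_id`s in that last DELETE had to be looked up first; something like this would list the values in that field that are not URLs at all (a sketch of the idea, not the exact query from my session):

```
-- list non-URL values in metadata_field_id 222 so their metadata_value_ids can be reviewed before deleting
dspace=# select metadata_value_id, text_value from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value !~ '^https?://';
```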