---
title: "July, 2018"
date: 2018-07-01T12:56:54+03:00
author: "Alan Orth"
tags: ["Notes"]
---

## 2018-07-01

- I want to upgrade DSpace Test to DSpace 5.8, so I took a backup of its current database just in case:

```
$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
```
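The same backup command can also be assembled programmatically, for example to date-stamp the filename automatically. This is a minimal Python sketch, not part of the original workflow. The helper name `pg_dump_command` is hypothetical, and it only builds the argument list without running `pg_dump`. The docstring summarizes what each flag does.

```python
import datetime
import shlex
from typing import List


def pg_dump_command(database: str, user: str, day: datetime.date) -> List[str]:
    """Build (but do not run) the pg_dump argv for a date-stamped backup.

    -b               include large objects (blobs)
    -v               verbose progress output
    -o               include OIDs (option removed in PostgreSQL 12)
    --format=custom  compressed archive that pg_restore can read
    """
    filename = f"{database}-{day.isoformat()}.backup"
    return ["pg_dump", "-b", "-v", "-o", "--format=custom",
            "-U", user, "-f", filename, database]


if __name__ == "__main__":
    argv = pg_dump_command("dspace", "dspace", datetime.date(2018, 7, 1))
    # Pass argv to subprocess.run(argv, check=True) to actually take the backup.
    print(shlex.join(argv))
```

A custom-format archive like this is restored with `pg_restore` rather than fed to `psql`.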

- During the `mvn package` stage on the 5.8 branch I kept running into errors about Java running out of memory:

```
There is insufficient memory for the Java Runtime Environment to continue.
```

<!--more-->

- As the machine only has 8GB of RAM, I reduced the Tomcat memory heap from 5120m to 4096m so I could try to allocate more to the build process:

```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=dspacetest.cgiar.org -P \!dspace-lni,\!dspace-rdf,\!dspace-sword,\!dspace-swordv2 clean package
```
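The reasoning behind shrinking Tomcat's heap is simple arithmetic on the 8GB of RAM. This minimal Python sketch assumes both JVMs grow to their `-Xmx` limits and ignores non-heap JVM memory (metaspace, thread stacks), so the real headroom is somewhat smaller. It also assumes the build JVM is capped at 1024m; note that the `mvn` launcher normally reads `MAVEN_OPTS` rather than `JAVA_OPTS`, so whether that cap actually applied to Maven is an assumption here.

```python
def headroom_mb(total_mb: int, *heaps_mb: int) -> int:
    """RAM left for everything else if each JVM heap grows to its -Xmx limit."""
    return total_mb - sum(heaps_mb)


TOTAL_MB = 8 * 1024   # the server has 8GB of RAM
MAVEN_HEAP_MB = 1024  # assumed build heap, from -Xmx1024m

if __name__ == "__main__":
    for tomcat_heap in (5120, 4096):
        left = headroom_mb(TOTAL_MB, tomcat_heap, MAVEN_HEAP_MB)
        print(f"Tomcat -Xmx{tomcat_heap}m + build -Xmx{MAVEN_HEAP_MB}m -> {left} MB left")
```

Under those assumptions, dropping Tomcat from 5120m to 4096m raises the headroom for the OS, PostgreSQL, and non-heap JVM memory from 2048 MB to 3072 MB.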

- Then I stopped the Tomcat 7 service, ran the ant update, and manually ran the old and ignored SQL migrations:

```
$ sudo su - postgres
$ psql dspace
...
dspace=# begin;
BEGIN
dspace=# \i Atmire-DSpace-5.8-Schema-Migration.sql
DELETE 0
UPDATE 1
DELETE 1
dspace=# commit;
dspace=# \q
$ exit
$ dspace database migrate ignored
```
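Wrapping the `\i` of the Atmire script in `begin` and `commit` makes the migration all-or-nothing: if any statement fails, nothing is applied. Here is a minimal sketch of that pattern in Python, using the standard-library `sqlite3` module with an in-memory database as a stand-in for PostgreSQL. The table `t`, its rows, and the statements are hypothetical and are not the contents of the Atmire script.

```python
import sqlite3
from typing import List


def run_migration(conn: sqlite3.Connection, statements: List[str]) -> bool:
    """Apply all statements atomically; roll everything back on any error."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            for sql in statements:
                conn.execute(sql)
        return True
    except sqlite3.Error:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    conn.execute("INSERT INTO t VALUES (1, 'a')")
    conn.commit()
    ok = run_migration(conn, [
        "UPDATE t SET v = 'b' WHERE id = 1",
        "INSERT INTO t VALUES (1, 'dup')",  # violates the primary key
    ])
    # The UPDATE was rolled back along with the failed INSERT.
    print(ok, conn.execute("SELECT v FROM t WHERE id = 1").fetchone()[0])
```

PostgreSQL also supports transactional DDL, so the same all-or-nothing behaviour holds for schema changes inside `begin`/`commit` in `psql`.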

- After that I started Tomcat 7 and DSpace seems to be working; now I need to ask our colleagues to test it and report any issues they find

## 2018-07-02

- Discuss with AgriKnowledge about including our Handle identifier in the items they harvest from CGSpace
- They seem to be interested only in Gates-funded outputs, for example: https://www.agriknowledge.org/files/tm70mv21t

## 2018-07-03

- Finally finished the CIFOR Archive records (2448 in total):
  - I mapped the 50 items that were duplicates from elsewhere in CGSpace into [CIFOR Archive](https://cgspace.cgiar.org/handle/10568/16702)
  - I did one last check of the remaining 2398 items and found eight that have a `cg.identifier.doi` pointing to a URL other than a DOI, so I moved those to `cg.identifier.url` and `cg.identifier.googleurl` as appropriate
  - Also, thirteen items had a DOI in their citation, but did not have a `cg.identifier.doi` field, so I added those
  - Then I imported those 2398 items in two batches (to deal with memory issues):

```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive.csv
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive2.csv
```
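The checks above (telling real DOIs apart from other URLs in `cg.identifier.doi`, and spotting DOIs that only appear in the citation) could be approximated with a regular expression. This is a minimal Python sketch; the DOI pattern is a simplification rather than an official DOI grammar, and the helper names `suggest_field` and `doi_from_citation` are hypothetical.

```python
import re
from typing import Optional

# Simplified DOI pattern: "10." + 4-9 digit registrant code + "/" + suffix.
# It will not match every valid DOI and may over-match in unusual URLs.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def suggest_field(value: str) -> str:
    """Suggest which field a value found in cg.identifier.doi belongs in."""
    if "books.google." in value:
        return "cg.identifier.googleurl"
    if DOI_RE.search(value):
        return "cg.identifier.doi"
    return "cg.identifier.url"


def doi_from_citation(citation: str) -> Optional[str]:
    """Return the first DOI-like string in a citation, or None.

    Trailing punctuation is stripped, which would also clip a DOI that
    legitimately ends in ")".
    """
    match = DOI_RE.search(citation)
    return match.group(0).rstrip(".,;)") if match else None


if __name__ == "__main__":
    print(suggest_field("https://doi.org/10.1234/example"))
    print(suggest_field("http://www.example.org/report.pdf"))
    print(doi_from_citation("Doe, J. 2010. A made-up report. doi:10.1234/example."))
```

Anything flagged this way still deserves a manual look before moving it between fields.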
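Splitting a metadata CSV into batches is easy to get wrong with line-based tools, because quoted metadata values can contain newlines, and each batch needs the header row so the importer knows which column is which. A minimal Python sketch using the standard `csv` module; `split_csv` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io
from typing import List


def split_csv(text: str, batches: int = 2) -> List[str]:
    """Split CSV text into roughly equal batches, repeating the header in each."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    size = max(1, -(-len(body) // batches))  # ceiling division, at least 1
    chunks = []
    for start in range(0, len(body), size):
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(body[start:start + size])
        chunks.append(buf.getvalue())
    return chunks


if __name__ == "__main__":
    sample = 'id,dc.title\n+,"Title with a\nnewline"\n+,Second\n+,Third\n'
    for i, chunk in enumerate(split_csv(sample), start=1):
        print(f"--- batch {i} ---")
        print(chunk, end="")
```

Each returned chunk could then be written to its own file and passed to `dspace metadata-import -f`.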

- I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some that lack the `http://` prefix entirely:

```
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
 count
-------
   785
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
 count
-------
     4
```
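The two `WHERE` clauses above translate directly into Python, which can be handy for checking an exported CSV before and after a cleanup. A minimal sketch; the function names and the category labels are hypothetical. `LIKE 'http://books.google.%'` treats `.` literally and `%` as "anything", so it is a plain prefix check, while `~ '^books\.google\..*'` is an anchored regular expression.

```python
import re
from collections import Counter
from typing import Iterable

# Equivalent of: text_value ~ '^books\.google\..*'
SCHEMELESS = re.compile(r"^books\.google\.")


def classify(value: str) -> str:
    """Label a Google Books URL value by the problem it has, if any."""
    if value.startswith("http://books.google."):  # like 'http://books.google.%'
        return "http"
    if SCHEMELESS.match(value):
        return "missing-scheme"
    if value.startswith("https://books.google."):
        return "https"
    return "other"


def tally(values: Iterable[str]) -> Counter:
    """Count values per category."""
    return Counter(classify(v) for v in values)


if __name__ == "__main__":
    print(tally(["http://books.google.com/x", "books.google.com/y",
                 "https://books.google.com/z", "test"]))
```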

- I decided to fix those, along with some other garbage values like "test" and "dspace.ilri.org":

```
dspace=# begin;
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
UPDATE 785
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
UPDATE 4
dspace=# update metadatavalue set text_value='https://books.google.com/books?id=meF1CLdPSF4C' where resource_type_id=2 and metadata_field_id=222 and text_value='meF1CLdPSF4C';
UPDATE 1
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403);
DELETE 4
dspace=# commit;
```
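The same normalization rules can be expressed as one Python function, for example to preview the result on an export before running the SQL. This is a minimal sketch with a hypothetical function name. Generalizing the one bare volume ID fixed above (`meF1CLdPSF4C`) into a 12-character pattern is an assumption, not a documented Google Books format. PostgreSQL's `regexp_replace` without the `'g'` flag only replaces the first match, which is why rewriting just the prefix here gives the same result.

```python
import re
from typing import Optional

# Assumption: bare Google Books volume IDs look like 12 characters of
# [A-Za-z0-9_-], e.g. "meF1CLdPSF4C".
BARE_VOLUME_ID = re.compile(r"[A-Za-z0-9_-]{12}")


def normalize_google_books(value: str) -> Optional[str]:
    """Return an https Google Books URL, or None if the value needs manual review."""
    if value.startswith("https://books.google."):
        return value  # already fine
    if value.startswith("http://books.google."):
        return "https://" + value[len("http://"):]  # upgrade the scheme
    if value.startswith("books.google."):
        return "https://" + value  # add the missing scheme
    if BARE_VOLUME_ID.fullmatch(value):
        return "https://books.google.com/books?id=" + value
    return None  # e.g. "test" or "dspace.ilri.org": review before deleting


if __name__ == "__main__":
    for v in ["http://books.google.com/books?id=abc", "books.google.co.ke/x",
              "meF1CLdPSF4C", "test"]:
        print(v, "->", normalize_google_books(v))
```

Returning `None` instead of deleting keeps the final decision manual, which matches deleting the garbage values by their specific `metadata_value_id`.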

<!-- vim: set sw=2 ts=2: -->