Add notes for 2018-07-03

2018-07-03 14:37:30 +03:00
parent 172cda49e8
commit f72d67f67a
58 changed files with 154 additions and 64 deletions

@ -53,4 +53,46 @@ $ dspace database migrate ignored
- Discuss with AgriKnowledge about including our Handle identifier on the items they harvest from CGSpace
- They seem to be only interested in Gates-funded outputs, for example: https://www.agriknowledge.org/files/tm70mv21t
## 2018-07-03
- Finally finished the CIFOR Archive records (2448 in total):
  - I mapped the 50 items that were duplicates of items elsewhere in CGSpace into [CIFOR Archive](https://cgspace.cgiar.org/handle/10568/16702)
  - I did one last check of the remaining 2398 items and found eight whose `cg.identifier.doi` links to a URL other than a DOI, so I moved those values to `cg.identifier.url` or `cg.identifier.googleurl` as appropriate
  - Also, thirteen items had a DOI in their citation but no `cg.identifier.doi` field, so I added those
- Then I imported those 2398 items in two batches (to deal with memory issues):
```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive.csv
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive2.csv
```
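For reference, the two DOI checks from the cleanup above can be sketched as shell functions. This is a minimal sketch, not what I actually ran. `classify_doi_value` and `doi_from_citation` are hypothetical names. The regular expressions are assumptions about how DOIs appear in these records (a `10.` prefix, optionally behind `doi.org` or `dx.doi.org`) and are not a complete DOI grammar.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DSpace. The regular expressions are
# assumptions about how DOIs appear in these records, not a full DOI grammar.

# Print the field a cg.identifier.doi value belongs in:
#   doi        a bare DOI, or a DOI behind doi.org / dx.doi.org
#   googleurl  a Google Books link
#   url        anything else
classify_doi_value() {
    if printf '%s\n' "$1" | grep -Eq '^(https?://(dx\.)?doi\.org/)?10\.[0-9]{4,9}/'; then
        echo doi
    elif printf '%s\n' "$1" | grep -Eq '^(https?://)?books\.google\.'; then
        echo googleurl
    else
        echo url
    fi
}

# Print the first DOI found in a citation as a https://doi.org/ link,
# trimming trailing punctuation; print nothing if there is no DOI.
doi_from_citation() {
    printf '%s\n' "$1" \
        | grep -Eo '10\.[0-9]{4,9}/[^[:space:]]+' \
        | head -n 1 \
        | sed -e 's/[.,;)]*$//' -e 's|^|https://doi.org/|'
}
```

Running `classify_doi_value` over the `cg.identifier.doi` column of the CSV would flag candidates to move. Any flagged value, and any DOI pulled from a citation, should still be checked by hand, because the punctuation trimming can cut a DOI that legitimately ends in `)`.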
- I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some that are missing the `http://` scheme entirely:
```
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
count
-------
785
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
count
-------
4
```
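The same buckets can also be checked outside the database, for example on a list of values exported with `psql -At` (unaligned, tuples-only output). This is a rough sketch. `googleurl_bucket` and `tally_googleurl_buckets` are hypothetical helpers, and their glob patterns mirror the `like` and `~` conditions in the two queries above.

```shell
#!/bin/sh
# Hypothetical helper: put a Google Books URL value into one of the buckets
# counted by the two queries above, plus "ok" and "other".
googleurl_bucket() {
    case "$1" in
        https://books.google.*) echo ok ;;
        http://books.google.*)  echo http ;;
        books.google.*)         echo no-scheme ;;
        *)                      echo other ;;
    esac
}

# Tally buckets for values read one per line on stdin, e.g. from:
#   psql -At -c "select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=222" dspace
tally_googleurl_buckets() {
    while IFS= read -r value; do
        googleurl_bucket "$value"
    done | sort | uniq -c
}
```

The `other` bucket is where garbage values like "test" end up, so it is the one to inspect by hand.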
- I think I should fix those, as well as some other garbage values like "test" and "dspace.ilri.org":
```
dspace=# begin;
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
UPDATE 785
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
UPDATE 4
dspace=# update metadatavalue set text_value='https://books.google.com/books?id=meF1CLdPSF4C' where resource_type_id=2 and metadata_field_id=222 and text_value='meF1CLdPSF4C';
UPDATE 1
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403);
DELETE 4
dspace=# commit;
```
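To apply the same fixes to values in a CSV before a future import, rather than in the database afterwards, the three updates can be mirrored in a small shell function. This is a sketch under stated assumptions. `normalize_googleurl` is a hypothetical name. Treating any bare 12-character string of `[A-Za-z0-9_-]` as a Google Books volume ID is a generalization from the single `meF1CLdPSF4C` case above.

```shell
#!/bin/sh
# Hypothetical helper mirroring the three UPDATE statements above on one value:
#   1. upgrade http://books.google. to https://books.google.
#   2. add the missing scheme to a bare books.google. value
#   3. expand a bare volume ID (ASSUMPTION: any 12-char [A-Za-z0-9_-] string)
# Anything else, including garbage like "test", is printed unchanged.
normalize_googleurl() {
    printf '%s\n' "$1" | sed -E \
        -e 's|^http://books\.google\.|https://books.google.|' \
        -e 's|^books\.google\.|https://books.google.|' \
        -e 's|^([A-Za-z0-9_-]{12})$|https://books.google.com/books?id=\1|'
}
```

Unlike the `delete` above, this leaves garbage values in place. They still have to be removed by hand.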
<!-- vim: set sw=2 ts=2: -->