Update notes

2017-05-10 23:44:44 +03:00
parent fe6726122a
commit 4d443e60e1
3 changed files with 62 additions and 12 deletions

@@ -64,8 +64,8 @@ $ ./fix-metadata-values.py -i ccafs-flagships-may7.csv -f cg.subject.ccafs -t co
```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m -XX:-UseGCOverheadLimit"
$ [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10568/87775 /home/aorth/10947-1/10947-1.zip
$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done
$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done
```
- Note that in submission mode DSpace ignores the handle specified in `mets.xml` in the zip file, so you need to turn that off with `-o ignoreHandle=false`
@@ -98,7 +98,31 @@ Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violate
Detail: Key (handle_id)=(80928) already exists.
```
- I think those errors actually come from me running the `update-sequences.sql` script while Tomcat/DSpace are running
- Apparently you need to stop Tomcat!
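- A minimal sketch of a guard for this, assuming Tomcat shows up as a process matching `org.apache.catalina` and that the script path is set in `SEQUENCES_SQL` (both the variable and the function names are hypothetical, not from my actual setup):

```shell
#!/usr/bin/env bash
# Sketch: refuse to run update-sequences.sql while Tomcat is still up.
# Assumptions: Tomcat's command line contains "org.apache.catalina",
# the database is named "dspace", and SEQUENCES_SQL points at the script
# (the default below is only a placeholder path).

SEQUENCES_SQL="${SEQUENCES_SQL:-/path/to/update-sequences.sql}"

# Succeeds if a Tomcat process appears to be running.
tomcat_is_running() {
    pgrep -f 'org.apache.catalina' >/dev/null 2>&1
}

# Runs the sequence update only when Tomcat is stopped.
update_sequences() {
    if tomcat_is_running; then
        echo "Tomcat is still running; stop it first" >&2
        return 1
    fi
    psql -d dspace -f "$SEQUENCES_SQL"
}
```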
## 2017-05-10
- Atmire says they are willing to extend the ORCID implementation, and I've asked them to provide a quote
- I clarified that the scope of the implementation should be storing ORCIDs in the database and exposing them via REST / API like other fields
- Finally finished importing all the CGIAR Library content; the final method was:
```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit"
$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2517/10947-2517.zip
$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2515/10947-2515.zip
$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-2516/10947-2516.zip
$ [dspace]/bin/dspace packager -r -a -t AIP -o skipIfParentMissing=true -e some@user.com -p 10568/80923 /home/aorth/10947-1/10947-1.zip
$ [dspace]/bin/dspace packager -s -t AIP -o ignoreHandle=false -e some@user.com -p 10568/80923 /home/aorth/10947-1/10947-1.zip
$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10947/1 $collection; done
$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e some@user.com $item; done
```
- Basically, import the smaller communities using recursive AIP import (with `skipIfParentMissing`)
- Then, for the larger collection, create the community, collections, and items separately, ingesting the items one by one
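- The `-XX:-UseGCOverheadLimit` JVM option disables the GC overhead limit check, which can otherwise abort a large import with an `OutOfMemoryError` when the JVM spends most of its time in garbage collection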
- After this I ran the `update-sequences.sql` script (with Tomcat shut down), and cleaned up the 200+ blank metadata records:
```
dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
```
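- The recursive and one-by-one import steps above could be wrapped roughly like this. This is only a sketch: `DSPACE_CMD`, `EPERSON`, `DRY_RUN`, and the function names are hypothetical, while the packager flags are copied from the commands above. It does not cover the initial `-s` submission that creates the large community itself.

```shell
#!/usr/bin/env bash
# Sketch of the import method as reusable functions.
# DSPACE_CMD, EPERSON and DRY_RUN are hypothetical names; "[dspace]" is the
# usual placeholder for the DSpace install directory and must be replaced.

DSPACE_CMD="${DSPACE_CMD:-[dspace]/bin/dspace}"
EPERSON="${EPERSON:-some@user.com}"

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Recursively restore a small community AIP under a parent handle.
restore_recursive() {
    local parent="$1" zip="$2"
    run "$DSPACE_CMD" packager -r -a -t AIP -o skipIfParentMissing=true \
        -e "$EPERSON" -p "$parent" "$zip"
}

# For a large community: submit each collection AIP under the parent handle,
# then restore each item AIP one by one.
ingest_one_by_one() {
    local parent="$1" dir="$2" f
    for f in "$dir"/COLLECTION@*; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        run "$DSPACE_CMD" packager -s -o ignoreHandle=false -t AIP \
            -e "$EPERSON" -p "$parent" "$f"
    done
    for f in "$dir"/ITEM@*; do
        [ -e "$f" ] || continue
        run "$DSPACE_CMD" packager -r -f -u -t AIP -e "$EPERSON" "$f"
    done
}
```

- With `DRY_RUN` left at its default, `restore_recursive 10568/80923 /home/aorth/10947-2517/10947-2517.zip` only prints the packager command, so the sequence can be reviewed before running it with `DRY_RUN=0`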