_Note: I'm temporarily making this a page because it seems Hugo (currently 0.27.1) cannot use a custom slug for a post when there is a permalink defined in `config.toml`_
Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called _CGIAR System Organization_.
- [x] Increase `max_connections` in `/etc/postgresql/9.5/main/postgresql.conf` by ~10
  - `SELECT * FROM pg_stat_activity;` seems to show ~6 extra connections used by the command line tools during import
- [x] Temporarily disable the nightly `index-discovery` cron job, because the import will overlap with it and I don't want the two competing to update the Solr index
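The `max_connections` bump from the checklist above can be computed rather than worked out by hand. This is a minimal sketch, assuming the setting appears as an uncommented `max_connections = N` line. The function name `bumped_max_connections` and the default margin of 10 are illustrative, not part of PostgreSQL's tooling:

```shell
# Hypothetical helper: print the max_connections value found in a
# postgresql.conf-style file, increased by a margin (default 10).
# Assumes an uncommented line of the form "max_connections = 100".
bumped_max_connections() {
    conf_file=$1
    margin=${2:-10}
    awk -v margin="$margin" '
        # Match only active (uncommented) settings; if the setting is
        # repeated, the last occurrence is the one that is kept.
        /^[[:space:]]*max_connections[[:space:]]*=/ {
            split($0, parts, "=")
            value = parts[2] + 0   # numeric coercion drops trailing comments
        }
        END {
            if (value == "") exit 1   # setting not found
            print value + margin
        }
    ' "$conf_file"
}

# Example (path taken from the checklist above):
# bumped_max_connections /etc/postgresql/9.5/main/postgresql.conf
```

The function only prints the suggested value; editing the file and restarting PostgreSQL is still a manual step.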
```
$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
```
This submits AIP hierarchies recursively (-r) and suppresses errors when an item's parent collection hasn't been created yet—for example, if the item is mapped. The large historic archive (10947/1) is created in several steps because it requires a lot of memory and often crashes.
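Because long batch imports like the ones above can crash partway through, it may help to record which packages failed so they can be retried. This is a minimal sketch, assuming the packager exits non-zero on failure. `import_all`, `import_one`, `IMPORT_CMD`, and `EPERSON` are hypothetical names introduced here, and the packager flags are copied from the item loop above:

```shell
# Hypothetical wrapper around the item import loop shown above.
# EPERSON is an assumed environment variable holding the submitter's email.
import_one() {
    dspace packager -r -f -u -t AIP -e "$EPERSON" "$1"
}

# Run the import command once per package and append the path of every
# package whose import exits non-zero to the given log file.
# IMPORT_CMD can name a different command or function (useful for dry runs).
import_all() {
    failed_log=$1
    shift
    : > "$failed_log"   # start with an empty log
    for pkg in "$@"; do
        if ! "${IMPORT_CMD:-import_one}" "$pkg"; then
            printf '%s\n' "$pkg" >> "$failed_log"
        fi
    done
}

# Example:
# EPERSON=aorth@mjanja.ch import_all failed.txt 10947-1/ITEM@10947-*
```

The packages listed in the log file can then be passed to `import_all` again once the cause of the crash (memory, for example) has been dealt with.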
**Create new subcommunities and collections for content we reorganized into new hierarchies from the original:**
- Import items to the collection individually in restore mode (-r) while explicitly preserving handles and ignoring parents:
```
$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
```
**Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:**
```
dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z');
```
- Export them from the CGIAR Library:
```
# for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done
```
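One thing to watch in the export loop above: a handle such as `10947/4658` contains a slash, so `${handle}.zip` resolves to a path under a `10947/` directory rather than a flat file name. Here is a sketch of an alternative, assuming flat `prefix-suffix.zip` names are acceptable. `handle_to_zip` is a hypothetical helper:

```shell
# Hypothetical helper: turn a handle like "10947/4658" into a flat
# file name like "10947-4658.zip" by replacing "/" with "-".
handle_to_zip() {
    printf '%s.zip\n' "$(printf '%s' "$1" | tr '/' '-')"
}

# Example (flags copied from the export loop above):
# for handle in 10947/4658 10947/4659; do
#     /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i "$handle" "$(handle_to_zip "$handle")"
# done
```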
- Import on CGSpace:
```
$ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done
```
- The handle setup script will ask for IP address, contact person, etc
- Use CGSpace's IP address, but give some contact person from the system organization
- Copy the resulting `sitebndl.zip` somewhere so we can send it to Handle.net
- Now I'm wondering how we'll do this when we move servers in the future, because `make-handle-config` basically assumes you only have one handle prefix
- Also, there are both `dspace make-handle-config` and `bin/make-handle-config`, and they behave differently (the first is interactive, the second reads your `dspace.cfg` and generates your handle config and `sitebndl.zip` accordingly)
- I'm actually not sure about the proper order of events
The cleanup script is sometimes used during import processes to clean the database and assetstore after failed AIP imports. If you see the following error with `dspace cleanup -v`:
```
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(119841) is still referenced from table "bundle".
```
The solution is to set the `primary_bitstream_id` to NULL in PostgreSQL:
```
dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (119841);
```
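The bitstream ID can be pulled out of the error text instead of being copied by hand. This is a minimal sketch, assuming the `Detail:` line has exactly the format shown above. `primary_bitstream_fix_sql` is a hypothetical name, and the function only prints the statement so it can be reviewed before being run in `psql`:

```shell
# Hypothetical helper: read `dspace cleanup -v` output on stdin and print
# one UPDATE statement per bitstream ID named in a "Detail:" line.
# Lines that do not match the expected format are ignored.
primary_bitstream_fix_sql() {
    sed -n 's/^[[:space:]]*Detail: Key (bitstream_id)=(\([0-9][0-9]*\)) is still referenced from table "bundle".*$/update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (\1);/p'
}

# Example:
# dspace cleanup -v 2>&1 | primary_bitstream_fix_sql
```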
### PSQLException During AIP Ingest
After a few rounds of ingesting, possibly with failures, you might end up with inconsistent IDs in the database. In that case, an AIP ingest of a single collection in submit mode (-s) fails with:
```
org.dspace.content.packager.PackageValidationException: Exception while ingesting 10947-2527/10947-2527.zip, Reason: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "handle_pkey"
Detail: Key (handle_id)=(86227) already exists.
```
The normal solution is to run the `update-sequences.sql` script (with Tomcat shut down) but it doesn't seem to work in this case. Finding the maximum `handle_id` and manually updating the sequence seems to work:
```
dspace=# select * from handle where handle_id=(select max(handle_id) from handle);