description = "Notes on the migration of the CGIAR Library to CGSpace"
categories = ["Notes"]
slug = "cgiar-library-migration"
+++
_Temporarily making this a page because it seems Hugo (currently 0.27.1) cannot use a custom slug for a post when there is a permalink defined in `config.toml`_
Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called _CGIAR System Organization_.
## Pre-migration Technical TODOs
Things that need to happen before the migration:
- [x] Create top-level community on CGSpace to hold the CGIAR Library content: 10568/83389
- [ ] Merge [#339](https://github.com/ilri/DSpace/pull/339) to `5_x-prod` branch and rebuild DSpace
- [x] Increase `max_connections` in `/etc/postgresql/9.5/main/postgresql.conf` by ~10
  - `SELECT * FROM pg_stat_activity;` seems to show ~6 extra connections used by the command line tools during import
- [x] Temporarily disable the nightly `index-discovery` cron job, because the import will run during part of that window and I don't want the two processes competing to update the Solr index
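
The reasoning behind the `max_connections` bump can be checked with some quick arithmetic. This is only a rough sketch: the function name and the sample values for the limit and for connections in use are hypothetical, and only the ~6 extra connections and the ~10 increase come from the notes above. It assumes PostgreSQL's default of 3 for `superuser_reserved_connections`.

```python
def connection_headroom(max_connections: int, in_use: int,
                        extra_for_import: int,
                        superuser_reserved: int = 3) -> int:
    """Return the slots left for ordinary clients once the import tools connect.

    A negative result means the import would push PostgreSQL past its limit.
    This is a simplification: it treats every connection in use as an
    ordinary (non-superuser) client.
    """
    usable = max_connections - superuser_reserved
    return usable - in_use - extra_for_import


# Hypothetical numbers: a limit of 100 with 93 connections already in use.
before = connection_headroom(100, 93, 6)   # -2: the import would not fit
after = connection_headroom(110, 93, 6)    # 8: fits after raising the limit by 10
```

With these made-up values, the ~6 connections used by the command line tools would not fit under the old limit, while raising it by ~10 leaves a small margin.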
## Migration
Process for the actual migration:
- Export all top-level communities and collections from DSpace Test:
- This submits AIP hierarchies recursively (`-r`) and suppresses errors when an item's parent collection hasn't been created yet (for example, if the item is mapped)
- The large historic archive (10947/1) is created in several steps because it requires a lot of memory and often crashes
- Create new subcommunities and collections for content we reorganized into new hierarchies from the original:
- [x] Create _CGIAR System Management Board_ sub-community: 10568/83536
- [x] Content from _CGIAR System Management Board documents_ collection (10947/4561) goes here
- Import collection hierarchy first and then the items:
dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z');
- [ ] Merge `cgiar-library` branch to `master` and re-run ansible nginx templates
## Troubleshooting
### Foreign Key Error in `dspace cleanup`
The cleanup script is sometimes used during import processes to clean the database and assetstore after failed AIP imports. If you see the following error with `dspace cleanup -v`:
```
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(119841) is still referenced from table "bundle".
```
The solution is to set the `primary_bitstream_id` to NULL in PostgreSQL:
```
dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (119841);
```
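
If the error comes up repeatedly with different IDs, it may be convenient to collect them and build the statement in one go. The following is a minimal sketch, not part of DSpace: the helper names `bitstream_ids_from_errors` and `build_update` are hypothetical, and it assumes the error text has the same shape as the sample above.

```python
import re
from typing import List

# Matches the "Detail:" line PostgreSQL prints for the foreign key violation.
KEY_PATTERN = re.compile(r"Key \(bitstream_id\)=\((\d+)\)")


def bitstream_ids_from_errors(log_text: str) -> List[int]:
    """Return the unique bitstream IDs named in the error details, in order."""
    seen: List[int] = []
    for match in KEY_PATTERN.finditer(log_text):
        bitstream_id = int(match.group(1))
        if bitstream_id not in seen:
            seen.append(bitstream_id)
    return seen


def build_update(bitstream_ids: List[int]) -> str:
    """Build the UPDATE statement from the notes for one or more IDs."""
    if not bitstream_ids:
        raise ValueError("no bitstream IDs given")
    id_list = ", ".join(str(i) for i in bitstream_ids)
    return (
        "update bundle set primary_bitstream_id=NULL "
        f"where primary_bitstream_id in ({id_list});"
    )


# Usage with the error text shown above:
error_text = (
    'Error: ERROR: update or delete on table "bitstream" violates foreign key '
    'constraint "bundle_primary_bitstream_id_fkey" on table "bundle"\n'
    'Detail: Key (bitstream_id)=(119841) is still referenced from table "bundle".'
)
statement = build_update(bitstream_ids_from_errors(error_text))
```

The resulting string is the same statement as above and can be pasted into `psql`. Review it before running it, since it modifies the `bundle` table directly.
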
### PSQLException During AIP Ingest
After a few rounds of ingesting (possibly with failures) you might end up with inconsistent IDs in the database. In this case, the following error appeared during AIP ingest of a single collection in submit mode (`-s`):
```
org.dspace.content.packager.PackageValidationException: Exception while ingesting 10947-2527/10947-2527.zip, Reason: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "handle_pkey"
Detail: Key (handle_id)=(86227) already exists.
```
The normal solution is to run the `update-sequences.sql` script (with Tomcat shut down), but it doesn't seem to work in this case. Finding the maximum `handle_id` and manually updating the sequence seems to work:
```
dspace=# select * from handle where handle_id=(select max(handle_id) from handle);