<p><em>Temporarily making this a page because it seems Hugo (currently 0.27.1) cannot use a custom slug for a post when there is a permalink defined in <code>config.toml</code></em></p>
<p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p>
<p>Things that need to happen before the migration:</p>
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create top-level community on CGSpace to hold the CGIAR Library content: 10568/83389
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Update nginx redirects in ansible templates</label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Update handle in DSpace XMLUI config</label></li>
</ul></label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Merge <a href="https://github.com/ilri/DSpace/pull/339">#339</a> to <code>5_x-prod</code> branch and rebuild DSpace</label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Increase <code>max_connections</code> in <code>/etc/postgresql/9.5/main/postgresql.conf</code> by ~10
<ul>
<li><code>SELECT * FROM pg_stat_activity;</code> seems to show ~6 extra connections used by the command line tools during import</li>
</ul></label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Temporarily disable the nightly <code>index-discovery</code> cron job, because the import will still be running at that time and I don’t want the two competing to update the Solr index</label></li>
</ul>
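<p>As a rough illustration of the <code>max_connections</code> step, the sketch below reads the current value from a config file and prints it raised by a given amount. The function name <code>bumped_max_connections</code> is hypothetical, the default increment of 10 simply mirrors the “~10” above, and the pattern assumes the setting is written with an equals sign. It only prints a number and does not edit the file.</p>

```shell
# Hypothetical helper: print max_connections from a PostgreSQL config file,
# raised by an increment (default 10, mirroring the "~10" in the notes).
bumped_max_connections() {
    conf_file="$1"
    increment="${2:-10}"
    # Assumes the "name = value" form; commented-out lines do not match.
    # The last matching line is used, since a later setting overrides an earlier one.
    current=$(sed -n 's/^[[:space:]]*max_connections[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf_file" | tail -n 1)
    if [ -z "$current" ]; then
        echo "max_connections not set in $conf_file" >&2
        return 1
    fi
    echo $((current + increment))
}
```

<p>It could be called as <code>bumped_max_connections /etc/postgresql/9.5/main/postgresql.conf</code>, with the result then written to the config by hand.</p>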
<h2 id="migration">Migration</h2>
<p>Process for the actual migration:</p>
<ul class="task-list">
<li>Export all top-level communities and collections from DSpace Test:
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Import communities and collections, paying attention to options to skip missing parents and ignore handles:
<code>
$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
</code></label></li>
<li>This submits AIP hierarchies recursively (-r) and suppresses errors when an item’s parent collection hasn’t been created yet—for example, if the item is mapped</li>
<li>The large historic archive (10947/1) is created in several steps because importing it requires a lot of memory and often crashes</li>
</ul></li>
<li><p>Create new subcommunities and collections for content we reorganized into new hierarchies from the original:</p>
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Board</em> sub-community: 10568/83536</label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Content from <em>CGIAR System Management Board documents</em> collection (10947/4561) goes here</label></li>
<li>Import collection hierarchy first and then the items:
<code>
$ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
</code></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Office</em> sub-community: 10568/83537</label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Office documents</em> collection: 10568/83538</label></li>
<li>Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents:
<code>
$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
</code></li>
<li>Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:
<code>
dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z');
</code></li>
<li><p>Export them from the CGIAR Library:</p>
<pre><code># for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done
</code></pre></li>
<li><p>Import on CGSpace:</p>
<pre><code>$ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done
</code></pre></li>
<li><p><label><input type="checkbox" disabled class="task-list-item"> Shut down Tomcat and run <code>update-sequences.sql</code> as the system’s <code>postgres</code> user</label></p></li>
</ul></li>
</ul>
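<p>Because the per-item import loops above can fail partway through, a small wrapper that records failed packages for a later retry may help. This is only a sketch: <code>import_aips</code> and the <code>PACKAGER_CMD</code> variable are hypothetical names, the default command mirrors the item import used above, and it assumes the packager exits with a non-zero status on failure.</p>

```shell
# Hypothetical wrapper: run the packager once per AIP file and log failures.
# PACKAGER_CMD is an assumed override; the default mirrors the item import above.
import_aips() {
    failed_log="$1"
    shift
    # Start with an empty failure log
    : > "$failed_log"
    for aip in "$@"; do
        # Skip glob patterns that matched nothing
        [ -e "$aip" ] || continue
        # Left unquoted on purpose so the command and its options split into words
        if ! ${PACKAGER_CMD:-dspace packager -r -f -u -t AIP -e aorth@mjanja.ch} "$aip"; then
            echo "$aip" >> "$failed_log"
        fi
    done
    # Return success only when the failure log is empty
    [ ! -s "$failed_log" ]
}
```

<p>For example, <code>import_aips failed-items.txt 10947-1/ITEM@10947-*</code> would leave the paths of any failed packages in <code>failed-items.txt</code>, which can be fed back into the same function after fixing the cause.</p>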
<h2 id="post-migration">Post Migration</h2>
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Remove ingestion overrides from <code>dspace.cfg</code></label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Reset PostgreSQL <code>max_connections</code> to 183</label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Update DNS records:
<ul>
<li>CNAME: cgspace.cgiar.org</li>
</ul></label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Re-deploy DSpace from freshly built <code>5_x-prod</code> branch</label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Run system updates and reboot server</label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Switch to Let’s Encrypt HTTPS certificates (after DNS is updated and server isn’t busy)</label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Merge <code>cgiar-library</code> branch to <code>master</code> and re-run ansible nginx templates</label></li>
</ul>
<h2 id="troubleshooting">Troubleshooting</h2>
<h3 id="foreign-key-error-in-dspace-cleanup">Foreign Key Error in <code>dspace cleanup</code></h3>
<p>The cleanup script is sometimes used during import processes to clean the database and assetstore after failed AIP imports. If you see the following error with <code>dspace cleanup -v</code>:</p>
<pre><code>Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(119841) is still referenced from table "bundle".
</code></pre>
<p>The solution is to set the <code>primary_bitstream_id</code> to NULL in PostgreSQL:</p>
<pre><code>dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (119841);
</code></pre>
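<p>When this happens repeatedly, the statement can be generated from the error text instead of being typed by hand. A minimal sketch, assuming the error detail line has the same shape as the one shown above; the function name <code>cleanup_fix_sql</code> is hypothetical, and the output is only printed so it can be reviewed before being run in <code>psql</code>.</p>

```shell
# Hypothetical helper: read cleanup error output on stdin and print the
# matching UPDATE statement for the first bitstream ID found.
cleanup_fix_sql() {
    # Extract N from "Key (bitstream_id)=(N)"; parentheses are literal in basic regex
    id=$(sed -n 's/.*Key (bitstream_id)=(\([0-9][0-9]*\)).*/\1/p' | head -n 1)
    if [ -z "$id" ]; then
        echo "no bitstream_id found in input" >&2
        return 1
    fi
    echo "update bundle set primary_bitstream_id=NULL where primary_bitstream_id in ($id);"
}
```

<p>One possible use is <code>dspace cleanup -v 2&gt;&amp;1 | cleanup_fix_sql</code>, repeated after each fix until the cleanup completes.</p>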
<h3 id="psqlexception-during-aip-ingest">PSQLException During AIP Ingest</h3>
<p>After a few rounds of ingesting—possibly with failures—you might end up with inconsistent IDs in the database. In this case, during AIP ingest of a single collection in submit mode (-s):</p>
<pre><code>org.dspace.content.packager.PackageValidationException: Exception while ingesting 10947-2527/10947-2527.zip, Reason: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "handle_pkey"
Detail: Key (handle_id)=(86227) already exists.
</code></pre>
<p>The normal solution is to run the <code>update-sequences.sql</code> script (with Tomcat shut down) but it doesn’t seem to work in this case. Finding the maximum <code>handle_id</code> and manually updating the sequence seems to work:</p>
<pre><code>dspace=# select * from handle where handle_id=(select max(handle_id) from handle);