Update formatting of CGIAR Library migration post

Alan Orth 2017-09-18 21:24:27 +03:00
parent 84067ed6c9
commit d1eed90c0a
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
4 changed files with 123 additions and 76 deletions

@ -28,7 +28,8 @@ Things that need to happen before the migration:
Process for the actual migration:
- Export all top-level communities and collections from DSpace Test:
```console
```
$ export PATH=$PATH:/home/dspacetest.cgiar.org/bin
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2515 10947-2515/10947-2515.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2516 10947-2516/10947-2516.zip
@ -43,15 +44,19 @@ $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93759 10568-93759/105
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93760 10568-93760/10568-93760.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip
```
- Import to CGSpace (also see [notes from 2017-05-10](http://alanorth.github.io/cgspace-notes/2017-05/#2017-05-10))
- [x] Copy all exports from DSpace Test
- [x] Add ingestion overrides to `dspace.cfg` before import:
```
mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
```
- [x] Import communities and collections, paying attention to options to skip missing parents and ignore handles:
```console
```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
$ export PATH=$PATH:/home/cgspace.cgiar.org/bin
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2515/10947-2515.zip
@ -69,34 +74,45 @@ $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip
$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
```
- This submits AIP hierarchies recursively (-r) and suppresses errors when an item's parent collection hasn't been created yet—for example, if the item is mapped
- The large historic archive (10947/1) is created in several steps because it requires a lot of memory and often crashes
- Create new subcommunities and collections for content we reorganized into new hierarchies from the original:
- [x] Create _CGIAR System Management Board_ sub-community: 10568/83536
- [x] Content from _CGIAR System Management Board documents_ collection (10947/4561) goes here
- Import collection hierarchy first and then the items:
```
$ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip
$ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
```
- [x] Create _CGIAR System Management Office_ sub-community: 10568/83537
- [x] Create _CGIAR System Management Office documents_ collection: 10568/83538
- Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents:
```
$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
```
- Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:
```
dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z');
```
- Export them from the CGIAR Library:
```
# for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done
```
- Import on CGSpace:
```
$ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done
```
- [ ] Shut down Tomcat and run `update-sequences.sql` as the system's `postgres` user
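
The export commands above all follow one pattern: the handle's `/` becomes `-` in both the directory and the zip file name. A minimal sketch of generating them in a loop, assuming a hypothetical helper named `export_cmd` (not part of DSpace) that only prints each command so it can be reviewed before running:

```shell
# Hypothetical helper, not part of DSpace: print the packager export
# command for one handle without running it.
export_cmd() {
    handle="$1"
    # 10947/2515 -> 10947-2515, used for both the directory and the zip name
    name=$(printf '%s' "$handle" | tr '/' '-')
    printf 'dspace packager -d -a -t AIP -e aorth@mjanja.ch -i %s %s/%s.zip\n' \
        "$handle" "$name" "$name"
}

# Print the commands for a few of the handles exported above
for handle in 10947/2515 10947/2516 10947/1; do
    export_cmd "$handle"
done
```

Piping the output to `sh` would run the commands; the target directories are assumed to exist already.
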
## Post Migration
@ -106,7 +122,8 @@ $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip
- [x] Enable nightly `index-discovery` cron job
- HTTPS certificates:
- [x] Install current certificates from their Tomcat keystore
```console
```
$ keytool -list -keystore tomcat.keystore
$ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat
$ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pem
@ -114,15 +131,18 @@ $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip
$ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem
$ cat library.cgiar.org.crt.pem gdig2.crt.pem > library.cgiar.org-chained.pem
```
- [ ] Update DNS records:
- CNAME: cgspace.cgiar.org
- [ ] Re-deploy DSpace from freshly built `5_x-prod` branch
- [ ] Run system updates and reboot server
- [ ] Switch to Let's Encrypt HTTPS certificates (after DNS is updated and server isn't busy)
```console
```
$ sudo systemctl stop tomcat7
$ ./letsencrypt-auto certonly --standalone -d library.cgiar.org
```
- [ ] Merge `cgiar-library` branch to `master` and re-run ansible nginx templates
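
The chained file built with `cat` above should hold the server certificate followed by the GoDaddy intermediate. A quick sanity check, sketched with a hypothetical `count_certs` helper: it counts PEM certificate blocks, on the assumption that at least two are expected (the exact number depends on what the keystore export contained).

```shell
# Hypothetical helper: count the certificates in a PEM bundle by
# counting its BEGIN CERTIFICATE marker lines.
count_certs() {
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Only report if the chained bundle from the steps above is present
bundle=library.cgiar.org-chained.pem
if [ -f "$bundle" ]; then
    echo "$bundle contains $(count_certs "$bundle") certificate(s)"
fi
```
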
## Troubleshooting

@ -25,7 +25,7 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account
<meta property="article:published_time" content="2017-09-07T16:54:52&#43;07:00"/>
<meta property="article:modified_time" content="2017-09-18T17:46:57&#43;03:00"/>
<meta property="article:modified_time" content="2017-09-18T18:18:09&#43;03:00"/>
@ -63,7 +63,7 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account
"url": "https://alanorth.github.io/cgspace-notes/2017-09/",
"wordCount": "2764",
"datePublished": "2017-09-07T16:54:52&#43;07:00",
"dateModified": "2017-09-18T17:46:57&#43;03:00",
"dateModified": "2017-09-18T18:18:09&#43;03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"

@ -37,7 +37,7 @@
"@type": "BlogPosting",
"headline": "CGIAR Library Migration",
"url": "https://alanorth.github.io/cgspace-notes/cgiar-library-migration/",
"wordCount": "1175",
"wordCount": "1169",
"datePublished": "2017-09-18T16:38:35&#43;03:00",
"dateModified": "2017-09-18T18:05:57&#43;03:00",
"author": {
@ -142,10 +142,11 @@
<p>Process for the actual migration:</p>
<ul class="task-list">
<li>Export all top-level communities and collections from DSpace Test:
<code>console
$ export PATH=$PATH:/home/dspacetest.cgiar.org/bin
<ul>
<li>Export all top-level communities and collections from DSpace Test:</li>
</ul>
<pre><code>$ export PATH=$PATH:/home/dspacetest.cgiar.org/bin
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2515 10947-2515/10947-2515.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2516 10947-2516/10947-2516.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2517 10947-2517/10947-2517.zip
@ -158,74 +159,94 @@ $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2527 10947-2527/10947
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93759 10568-93759/10568-93759.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93760 10568-93760/10568-93760.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip
</code></li>
</code></pre>
<ul class="task-list">
<li>Import to CGSpace (also see <a href="http://alanorth.github.io/cgspace-notes/2017-05/#2017-05-10">notes from 2017-05-10</a>)
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Copy all exports from DSpace Test</label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Add ingestion overrides to <code>dspace.cfg</code> before import:
<code>
mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
</code></label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Import communities and collections, paying attention to options to skip missing parents and ignore handles:
<code>console
$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
$ export PATH=$PATH:/home/cgspace.cgiar.org/bin
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2515/10947-2515.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2516/10947-2516.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2517/10947-2517.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2518/10947-2518.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2519/10947-2519.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2708/10947-2708.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2526/10947-2526.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2871/10947-2871.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-4467/10947-4467.zip
$ dspace packager -s -u -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-2527/10947-2527.zip
$ for item in 10947-2527/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
$ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip
$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
</code></label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Add ingestion overrides to <code>dspace.cfg</code> before import:</label></li>
</ul></li>
</ul>
<pre><code> mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
</code></pre>
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Import communities and collections, paying attention to options to skip missing parents and ignore handles:</label></li>
</ul>
<pre><code> $ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
$ export PATH=$PATH:/home/cgspace.cgiar.org/bin
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2515/10947-2515.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2516/10947-2516.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2517/10947-2517.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2518/10947-2518.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2519/10947-2519.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2708/10947-2708.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2526/10947-2526.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2871/10947-2871.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-4467/10947-4467.zip
$ dspace packager -s -u -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-2527/10947-2527.zip
$ for item in 10947-2527/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
$ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip
$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
</code></pre>
<ul class="task-list">
<li>This submits AIP hierarchies recursively (-r) and suppresses errors when an item&rsquo;s parent collection hasn&rsquo;t been created yet—for example, if the item is mapped</li>
<li>The large historic archive (<sup>10947</sup>&frasl;<sub>1</sub>) is created in several steps because it requires a lot of memory and often crashes</li>
</ul></li>
<li><p>Create new subcommunities and collections for content we reorganized into new hierarchies from the original:</p>
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Board</em> sub-community: <sup>10568</sup>&frasl;<sub>83536</sub></label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Content from <em>CGIAR System Management Board documents</em> collection (<sup>10947</sup>&frasl;<sub>4561</sub>) goes here</label></li>
<li>Import collection hierarchy first and then the items:
<code>
$ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip
<li>Import collection hierarchy first and then the items:</li>
</ul>
<pre><code>$ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip
$ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
</code></li>
</code></pre>
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Office</em> sub-community: <sup>10568</sup>&frasl;<sub>83537</sub></label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Office documents</em> collection: <sup>10568</sup>&frasl;<sub>83538</sub></label></li>
<li>Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents:
<code>
$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
</code></li>
<li>Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:
<code>
dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z');
</code></li>
<li>Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents:</li>
</ul>
<li><p>Export them from the CGIAR Library:</p>
<pre><code>$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
</code></pre>
<pre><code># for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done
</code></pre></li>
<li><p>Import on CGSpace:</p>
<pre><code>$ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done
</code></pre></li>
<li><p>[ ] Shut down Tomcat and run <code>update-sequences.sql</code> as the system&rsquo;s <code>postgres</code> user</p></li>
<ul>
<li>Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:</li>
</ul></li>
</ul>
<pre><code> dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z');
</code></pre>
<ul>
<li>Export them from the CGIAR Library:</li>
</ul>
<pre><code> # for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done
</code></pre>
<ul>
<li>Import on CGSpace:</li>
</ul>
<pre><code> $ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done
</code></pre>
<ul class="task-list">
<li><label><input type="checkbox" disabled class="task-list-item"> Shut down Tomcat and run <code>update-sequences.sql</code> as the system&rsquo;s <code>postgres</code> user</label></li>
</ul>
<h2 id="post-migration">Post Migration</h2>
<ul class="task-list">
@ -235,16 +256,19 @@ dspace=# select handle from item, handle where handle.resource_id = item.item_id
<li>HTTPS certificates:
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Install current certificates from their Tomcat keystore
<code>console
$ keytool -list -keystore tomcat.keystore
$ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat
$ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pem
$ openssl pkcs12 -in library.cgiar.org.p12 -nodes -nocerts -out library.cgiar.org.key.pem
$ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem
$ cat library.cgiar.org.crt.pem gdig2.crt.pem &gt; library.cgiar.org-chained.pem
</code></label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Install current certificates from their Tomcat keystore</label></li>
</ul></li>
</ul>
<pre><code> $ keytool -list -keystore tomcat.keystore
$ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat
$ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pem
$ openssl pkcs12 -in library.cgiar.org.p12 -nodes -nocerts -out library.cgiar.org.key.pem
$ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem
$ cat library.cgiar.org.crt.pem gdig2.crt.pem &gt; library.cgiar.org-chained.pem
</code></pre>
<ul class="task-list">
<li><label><input type="checkbox" disabled class="task-list-item"> Update DNS records:
<ul>
@ -252,11 +276,14 @@ $ cat library.cgiar.org.crt.pem gdig2.crt.pem &gt; library.cgiar.org-chained.pem
</ul></label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Re-deploy DSpace from freshly built <code>5_x-prod</code> branch</label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Run system updates and reboot server</label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Switch to Let&rsquo;s Encrypt HTTPS certificates (after DNS is updated and server isn&rsquo;t busy)
<code>console
$ sudo systemctl stop tomcat7
<li><label><input type="checkbox" disabled class="task-list-item"> Switch to Let&rsquo;s Encrypt HTTPS certificates (after DNS is updated and server isn&rsquo;t busy)</label></li>
</ul>
<pre><code>$ sudo systemctl stop tomcat7
$ ./letsencrypt-auto certonly --standalone -d library.cgiar.org
</code></label></li>
</code></pre>
<ul class="task-list">
<li><label><input type="checkbox" disabled class="task-list-item"> Merge <code>cgiar-library</code> branch to <code>master</code> and re-run ansible nginx templates</label></li>
</ul>

@ -9,7 +9,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2017-09/</loc>
<lastmod>2017-09-18T17:46:57+03:00</lastmod>
<lastmod>2017-09-18T18:18:09+03:00</lastmod>
</url>
<url>
@ -135,7 +135,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2017-09-18T17:46:57+03:00</lastmod>
<lastmod>2017-09-18T18:18:09+03:00</lastmod>
<priority>0</priority>
</url>
@ -147,13 +147,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
<lastmod>2017-09-18T17:46:57+03:00</lastmod>
<lastmod>2017-09-18T18:18:09+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2017-09-18T17:46:57+03:00</lastmod>
<lastmod>2017-09-18T18:18:09+03:00</lastmod>
<priority>0</priority>
</url>