Update notes for 2017-09-19

Alan Orth 2017-09-19 12:53:00 +03:00
parent d1eed90c0a
commit f8550d509e
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
7 changed files with 161 additions and 139 deletions

@ -6,7 +6,7 @@ categories = ["Notes"]
slug = "cgiar-library-migration" slug = "cgiar-library-migration"
+++ +++
_Temporarily making this a page because it seems Hugo (currently 0.27.1) cannot use a custom slug for a post when there is a permalink defined in `config.toml`_ _Note: I'm temporarily making this a page because it seems Hugo (currently 0.27.1) cannot use a custom slug for a post when there is a permalink defined in `config.toml`_
Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called _CGIAR System Organization_. Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called _CGIAR System Organization_.
@ -19,15 +19,14 @@ Things that need to happen before the migration:
- Set up nginx redirects for URLs like: - Set up nginx redirects for URLs like:
- [x] https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf - [x] https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf
- [x] https://library.cgiar.org/handle/10947/4258 - [x] https://library.cgiar.org/handle/10947/4258
- [ ] Merge [#339](https://github.com/ilri/DSpace/pull/339) to `5_x-prod` branch and rebuild DSpace - [x] Merge [#339](https://github.com/ilri/DSpace/pull/339) to `5_x-prod` branch and rebuild DSpace
- [x] Increase `max_connections` in `/etc/postgresql/9.5/main/postgresql.conf` by ~10 - [x] Increase `max_connections` in `/etc/postgresql/9.5/main/postgresql.conf` by ~10
- `SELECT * FROM pg_stat_activity;` seems to show ~6 extra connections used by the command line tools during import - `SELECT * FROM pg_stat_activity;` seems to show ~6 extra connections used by the command line tools during import
- [x] Temporarily disable nightly `index-discovery` cron job because the import process will be taking place during some of this time and I don't want them to be competing to update the Solr index - [x] Temporarily disable nightly `index-discovery` cron job because the import process will be taking place during some of this time and I don't want them to be competing to update the Solr index
## Migration ## Migration Process
Process for the actual migration:
- Export all top-level communities and collections from DSpace Test: **Export all top-level communities and collections from DSpace Test:**
``` ```
$ export PATH=$PATH:/home/dspacetest.cgiar.org/bin $ export PATH=$PATH:/home/dspacetest.cgiar.org/bin
@ -45,106 +44,106 @@ $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93760 10568-93760/105
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip
``` ```
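The export commands above all follow one pattern: the handle's slash becomes a dash, and that name is used for both the directory and the zip file. A minimal dry-run sketch of that pattern, assuming a POSIX shell; `export_command` and `eperson` are hypothetical names used only for illustration, and the loop prints the commands rather than running them:

```shell
# Dry run only: print the packager export command for each handle.
# export_command and eperson are illustrative names, not part of DSpace.
eperson="aorth@mjanja.ch"

export_command() {
    handle="$1"
    # 10947/2515 -> 10947-2515 (shared by the directory and the zip name)
    name=$(printf '%s' "$handle" | tr '/' '-')
    printf 'dspace packager -d -a -t AIP -e %s -i %s %s/%s.zip\n' \
        "$eperson" "$handle" "$name" "$name"
}

# Two of the handles from the notes; extend the list as needed.
for handle in 10947/2515 10947/1; do
    export_command "$handle"
done
```

Reviewing the printed commands before running them should make it easier to spot a mistyped handle.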
- Import to CGSpace (also see [notes from 2017-05-10](http://alanorth.github.io/cgspace-notes/2017-05/#2017-05-10)) **Import to CGSpace (also see [notes from 2017-05-10](http://alanorth.github.io/cgspace-notes/2017-05/#2017-05-10)):**
- [x] Copy all exports from DSpace Test
- [x] Add ingestion overrides to `dspace.cfg` before import:
``` - [x] Copy all exports from DSpace Test
mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL - [x] Add ingestion overrides to `dspace.cfg` before import:
mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
```
- [x] Import communities and collections, paying attention to options to skip missing parents and ignore handles: ```
mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
```
``` - [x] Import communities and collections, paying attention to options to skip missing parents and ignore handles:
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
$ export PATH=$PATH:/home/cgspace.cgiar.org/bin
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2515/10947-2515.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2516/10947-2516.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2517/10947-2517.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2518/10947-2518.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2519/10947-2519.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2708/10947-2708.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2526/10947-2526.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2871/10947-2871.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-4467/10947-4467.zip
$ dspace packager -s -u -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-2527/10947-2527.zip
$ for item in 10947-2527/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
$ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip
$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
```
- This submits AIP hierarchies recursively (-r) and suppresses errors when an item's parent collection hasn't been created yet—for example, if the item is mapped ```
- The large historic archive (10947/1) is created in several steps because it requires a lot of memory and often crashes $ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
- Create new subcommunities and collections for content we reorganized into new hierarchies from the original: $ export PATH=$PATH:/home/cgspace.cgiar.org/bin
- [x] Create _CGIAR System Management Board_ sub-community: 10568/83536 $ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2515/10947-2515.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2516/10947-2516.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2517/10947-2517.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2518/10947-2518.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2519/10947-2519.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2708/10947-2708.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2526/10947-2526.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2871/10947-2871.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-4467/10947-4467.zip
$ dspace packager -s -u -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-2527/10947-2527.zip
$ for item in 10947-2527/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
$ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip
$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
```
This submits AIP hierarchies recursively (-r) and suppresses errors when an item's parent collection hasn't been created yet—for example, if the item is mapped. The large historic archive (10947/1) is created in several steps because it requires a lot of memory and often crashes.
**Create new subcommunities and collections for content we reorganized into new hierarchies from the original:**
- [x] Create _CGIAR System Management Board_ sub-community: 10568/83536
- [x] Content from _CGIAR System Management Board documents_ collection (10947/4561) goes here - [x] Content from _CGIAR System Management Board documents_ collection (10947/4561) goes here
- Import collection hierarchy first and then the items: - Import collection hierarchy first and then the items:
``` ```
$ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip $ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip
$ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done $ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
``` ```
- [x] Create _CGIAR System Management Office_ sub-community: 10568/83537 - [x] Create _CGIAR System Management Office_ sub-community: 10568/83537
- [x] Create _CGIAR System Management Office documents_ collection: 10568/83538 - [x] Create _CGIAR System Management Office documents_ collection: 10568/83538
- Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents: - Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents:
``` ```
$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done $ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
``` ```
- Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May: **Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:**
``` ```
dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z'); dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z');
``` ```
- Export them from the CGIAR Library: - Export them from the CGIAR Library:
``` ```
# for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done # for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done
``` ```
- Import on CGSpace: - Import on CGSpace:
``` ```
$ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done $ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done
``` ```
- [ ] Shut down Tomcat and run `update-sequences.sql` as the system's `postgres` user
## Post Migration ## Post Migration
- [ ] Shut down Tomcat and run `update-sequences.sql` as the system's `postgres` user
- [x] Remove ingestion overrides from `dspace.cfg` - [x] Remove ingestion overrides from `dspace.cfg`
- [ ] Reset PostgreSQL `max_connections` to 183 - [x] Reset PostgreSQL `max_connections` to 183
- [x] Enable nightly `index-discovery` cron job - [x] Enable nightly `index-discovery` cron job
- HTTPS certificates: - HTTPS certificates:
- [x] Install current certificates from their Tomcat keystore - [x] Install current certificates from their Tomcat keystore
``` ```
$ keytool -list -keystore tomcat.keystore $ keytool -list -keystore tomcat.keystore
$ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat $ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat
$ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pem $ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pem
$ openssl pkcs12 -in library.cgiar.org.p12 -nodes -nocerts -out library.cgiar.org.key.pem $ openssl pkcs12 -in library.cgiar.org.p12 -nodes -nocerts -out library.cgiar.org.key.pem
$ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem $ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem
$ cat library.cgiar.org.crt.pem gdig2.crt.pem > library.cgiar.org-chained.pem $ cat library.cgiar.org.crt.pem gdig2.crt.pem > library.cgiar.org-chained.pem
``` ```
- [ ] Update DNS records: - [ ] Update DNS records:
- CNAME: cgspace.cgiar.org - CNAME: cgspace.cgiar.org
- [ ] Re-deploy DSpace from freshly built `5_x-prod` branch - [ ] Re-deploy DSpace from freshly built `5_x-prod` branch
- [ ] Merge `cgiar-library` branch to `master` and re-run ansible nginx templates
- [ ] Run system updates and reboot server - [ ] Run system updates and reboot server
- [ ] Switch to Let's Encrypt HTTPS certificates (after DNS is updated and server isn't busy) - [ ] Switch to Let's Encrypt HTTPS certificates (after DNS is updated and server isn't busy):
``` ```
$ sudo systemctl stop tomcat7 $ sudo systemctl stop tomcat7
$ ./letsencrypt-auto certonly --standalone -d library.cgiar.org $ ./letsencrypt-auto certonly --standalone -d library.cgiar.org
``` ```
- [ ] Merge `cgiar-library` branch to `master` and re-run ansible nginx templates
## Troubleshooting ## Troubleshooting
### Foreign Key Error in `dspace cleanup` ### Foreign Key Error in `dspace cleanup`

@ -398,3 +398,19 @@ $ for item in 10568-93759/ITEM@10947-46*; do ~/dspace/bin/dspace packager -r -t
![After DSpace 5.5](/cgspace-notes/2017/09/10947-2919-after.jpg) ![After DSpace 5.5](/cgspace-notes/2017/09/10947-2919-after.jpg)
- Moved the CGIAR Library Migration notes to a page[cgiar-library-migration]({{< relref "cgiar-library-migration.md" >}})as there seems to be a bug with post slugs defined in frontmatter when you have a permalink scheme defined in `config.toml` (happens currently in Hugo 0.27.1 at least) - Moved the CGIAR Library Migration notes to a page[cgiar-library-migration]({{< relref "cgiar-library-migration.md" >}})as there seems to be a bug with post slugs defined in frontmatter when you have a permalink scheme defined in `config.toml` (happens currently in Hugo 0.27.1 at least)
## 2017-09-19
- Nightly Solr indexing is working again, and it appears to be pretty quick actually:
```
2017-09-19 00:00:14,953 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (0 of 65808): 17607
...
2017-09-19 00:04:18,017 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (65807 of 65808): 83753
```
- Sisay asked if he could import 50 items for IITA that have already been checked by Bosede and Bizuwork
- I had a look at the collection and noticed a bunch of issues with item types and donors, so I asked him to fix those and import it to DSpace Test again first
- Abenet wants to be able to filter by ISI Journal in advanced search on queries like this: https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&filtertype_1=dateIssued&filter_relational_operator_1=equals&filter_relational_operator_0=equals&filter_1=%5B2010+TO+2017%5D&filter_0=2017&filtertype=type&filter_relational_operator=equals&filter=Journal+Article
- I opened an issue to track this ([#340](https://github.com/ilri/DSpace/issues/340)) and will test it on DSpace Test soon
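
To put a number on "pretty quick" for the nightly Solr indexing, the two log timestamps above can be turned into an elapsed time and a rough throughput. This is only a sketch: it assumes both lines fall on the same day and that the first and last `Processing` lines bracket the whole run. `to_seconds` is a hypothetical helper name.

```shell
# Force the C locale so awk reads and prints "." as the decimal point.
export LC_ALL=C

# Seconds since midnight for a log timestamp such as "00:04:18,017".
to_seconds() {
    printf '%s\n' "$1" | tr ',' '.' |
        awk -F: '{ printf "%.3f\n", $1 * 3600 + $2 * 60 + $3 }'
}

start=$(to_seconds "00:00:14,953")   # first "Processing" line
end=$(to_seconds "00:04:18,017")     # last "Processing" line
items=65808                          # total from "(65807 of 65808)"

# Round both figures to whole numbers for a quick summary.
elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.0f", e - s }')
rate=$(awk -v s="$start" -v e="$end" -v n="$items" 'BEGIN { printf "%.0f", n / (e - s) }')

echo "elapsed: ${elapsed}s, rate: ~${rate} items/s"
```

That works out to about four minutes for the full index, or roughly 270 items per second.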

@ -61,7 +61,7 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account
"@type": "BlogPosting", "@type": "BlogPosting",
"headline": "September, 2017", "headline": "September, 2017",
"url": "https://alanorth.github.io/cgspace-notes/2017-09/", "url": "https://alanorth.github.io/cgspace-notes/2017-09/",
"wordCount": "2764", "wordCount": "2886",
"datePublished": "2017-09-07T16:54:52&#43;07:00", "datePublished": "2017-09-07T16:54:52&#43;07:00",
"dateModified": "2017-09-18T18:18:09&#43;03:00", "dateModified": "2017-09-18T18:18:09&#43;03:00",
"author": { "author": {
@ -569,6 +569,24 @@ DELETE 207
<li>Moved the CGIAR Library Migration notes to a page<a href="/cgspace-notes/cgiar-library-migration/">cgiar-library-migration</a>as there seems to be a bug with post slugs defined in frontmatter when you have a permalink scheme defined in <code>config.toml</code> (happens currently in Hugo 0.27.1 at least)</li> <li>Moved the CGIAR Library Migration notes to a page<a href="/cgspace-notes/cgiar-library-migration/">cgiar-library-migration</a>as there seems to be a bug with post slugs defined in frontmatter when you have a permalink scheme defined in <code>config.toml</code> (happens currently in Hugo 0.27.1 at least)</li>
</ul> </ul>
<h2 id="2017-09-19">2017-09-19</h2>
<ul>
<li>Nightly Solr indexing is working again, and it appears to be pretty quick actually:</li>
</ul>
<pre><code>2017-09-19 00:00:14,953 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (0 of 65808): 17607
...
2017-09-19 00:04:18,017 INFO com.atmire.dspace.discovery.AtmireSolrService @ Processing (65807 of 65808): 83753
</code></pre>
<ul>
<li>Sisay asked if he could import 50 items for IITA that have already been checked by Bosede and Bizuwork</li>
<li>I had a look at the collection and noticed a bunch of issues with item types and donors, so I asked him to fix those and import it to DSpace Test again first</li>
<li>Abenet wants to be able to filter by ISI Journal in advanced search on queries like this: <a href="https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&amp;filtertype_1=dateIssued&amp;filter_relational_operator_1=equals&amp;filter_relational_operator_0=equals&amp;filter_1=%5B2010+TO+2017%5D&amp;filter_0=2017&amp;filtertype=type&amp;filter_relational_operator=equals&amp;filter=Journal+Article">https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&amp;filtertype_1=dateIssued&amp;filter_relational_operator_1=equals&amp;filter_relational_operator_0=equals&amp;filter_1=%5B2010+TO+2017%5D&amp;filter_0=2017&amp;filtertype=type&amp;filter_relational_operator=equals&amp;filter=Journal+Article</a></li>
<li>I opened an issue to track this (<a href="https://github.com/ilri/DSpace/issues/340">#340</a>) and will test it on DSpace Test soon</li>
</ul>

@ -17,7 +17,7 @@
<pubDate>Mon, 18 Sep 2017 16:38:35 +0300</pubDate> <pubDate>Mon, 18 Sep 2017 16:38:35 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/cgiar-library-migration/</guid> <guid>https://alanorth.github.io/cgspace-notes/cgiar-library-migration/</guid>
<description>Temporarily making this a page because it seems Hugo (currently 0.27.1) cannot use a custom slug for a post when there is a permalink defined in config.toml <description>Note: I&amp;rsquo;m temporarily making this a page because it seems Hugo (currently 0.27.1) cannot use a custom slug for a post when there is a permalink defined in config.toml
Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called CGIAR System Organization. Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called CGIAR System Organization.
Pre-migration Technical TODOs Things that need to happen before the migration: Pre-migration Technical TODOs Things that need to happen before the migration:
Create top-level community on CGSpace to hold the CGIAR Library content: 10568&amp;frasl;83389 Update nginx redirects in ansible templates Update handle in DSpace XMLUI config Set up nginx redirects for URLs like: https://library.</description> Create top-level community on CGSpace to hold the CGIAR Library content: 10568&amp;frasl;83389 Update nginx redirects in ansible templates Update handle in DSpace XMLUI config Set up nginx redirects for URLs like: https://library.</description>

@ -13,7 +13,7 @@
<meta property="article:published_time" content="2017-09-18T16:38:35&#43;03:00"/> <meta property="article:published_time" content="2017-09-18T16:38:35&#43;03:00"/>
<meta property="article:modified_time" content="2017-09-18T18:05:57&#43;03:00"/> <meta property="article:modified_time" content="2017-09-18T21:24:27&#43;03:00"/>
@ -37,9 +37,9 @@
"@type": "BlogPosting", "@type": "BlogPosting",
"headline": "CGIAR Library Migration", "headline": "CGIAR Library Migration",
"url": "https://alanorth.github.io/cgspace-notes/cgiar-library-migration/", "url": "https://alanorth.github.io/cgspace-notes/cgiar-library-migration/",
"wordCount": "1169", "wordCount": "1167",
"datePublished": "2017-09-18T16:38:35&#43;03:00", "datePublished": "2017-09-18T16:38:35&#43;03:00",
"dateModified": "2017-09-18T18:05:57&#43;03:00", "dateModified": "2017-09-18T21:24:27&#43;03:00",
"author": { "author": {
"@type": "Person", "@type": "Person",
"name": "Alan Orth" "name": "Alan Orth"
@ -108,7 +108,7 @@
</header> </header>
<p><em>Temporarily making this a page because it seems Hugo (currently 0.27.1) cannot use a custom slug for a post when there is a permalink defined in <code>config.toml</code></em></p> <p><em>Note: I&rsquo;m temporarily making this a page because it seems Hugo (currently 0.27.1) cannot use a custom slug for a post when there is a permalink defined in <code>config.toml</code></em></p>
<p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p> <p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p>
@ -129,7 +129,7 @@
<li><label><input type="checkbox" checked disabled class="task-list-item"> <a href="https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf">https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf</a></label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> <a href="https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf">https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf</a></label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> <a href="https://library.cgiar.org/handle/10947/4258">https://library.cgiar.org/handle/10947/4258</a></label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> <a href="https://library.cgiar.org/handle/10947/4258">https://library.cgiar.org/handle/10947/4258</a></label></li>
</ul></li> </ul></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Merge <a href="https://github.com/ilri/DSpace/pull/339">#339</a> to <code>5_x-prod</code> branch and rebuild DSpace</label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> Merge <a href="https://github.com/ilri/DSpace/pull/339">#339</a> to <code>5_x-prod</code> branch and rebuild DSpace</label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Increase <code>max_connections</code> in <code>/etc/postgresql/9.5/main/postgresql.conf</code> by ~10 <li><label><input type="checkbox" checked disabled class="task-list-item"> Increase <code>max_connections</code> in <code>/etc/postgresql/9.5/main/postgresql.conf</code> by ~10
<ul> <ul>
@ -138,13 +138,9 @@
<li><label><input type="checkbox" checked disabled class="task-list-item"> Temporarily disable nightly <code>index-discovery</code> cron job because the import process will be taking place during some of this time and I don&rsquo;t want them to be competing to update the Solr index</label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> Temporarily disable nightly <code>index-discovery</code> cron job because the import process will be taking place during some of this time and I don&rsquo;t want them to be competing to update the Solr index</label></li>
</ul> </ul>
<h2 id="migration">Migration</h2> <h2 id="migration-process">Migration Process</h2>
<p>Process for the actual migration:</p> <p><strong>Export all top-level communities and collections from DSpace Test:</strong></p>
<ul>
<li>Export all top-level communities and collections from DSpace Test:</li>
</ul>
<pre><code>$ export PATH=$PATH:/home/dspacetest.cgiar.org/bin <pre><code>$ export PATH=$PATH:/home/dspacetest.cgiar.org/bin
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2515 10947-2515/10947-2515.zip $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2515 10947-2515/10947-2515.zip
@ -161,51 +157,50 @@ $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93760 10568-93760/105
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip $ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip
</code></pre> </code></pre>
<ul class="task-list"> <p><strong>Import to CGSpace (also see <a href="http://alanorth.github.io/cgspace-notes/2017-05/#2017-05-10">notes from 2017-05-10</a>):</strong></p>
<li>Import to CGSpace (also see <a href="http://alanorth.github.io/cgspace-notes/2017-05/#2017-05-10">notes from 2017-05-10</a>)
<ul class="task-list"> <ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Copy all exports from DSpace Test</label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> Copy all exports from DSpace Test</label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Add ingestion overrides to <code>dspace.cfg</code> before import:</label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> Add ingestion overrides to <code>dspace.cfg</code> before import:</label></li>
</ul></li>
</ul> </ul>
<pre><code> mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL <pre><code>mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
</code></pre> </code></pre>
<ul class="task-list"> <ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Import communities and collections, paying attention to options to skip missing parents and ignore handles:</label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> Import communities and collections, paying attention to options to skip missing parents and ignore handles:</label></li>
</ul> </ul>
<pre><code> $ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot; <pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
$ export PATH=$PATH:/home/cgspace.cgiar.org/bin $ export PATH=$PATH:/home/cgspace.cgiar.org/bin
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2515/10947-2515.zip $ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2515/10947-2515.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2516/10947-2516.zip $ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2516/10947-2516.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2517/10947-2517.zip $ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2517/10947-2517.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2518/10947-2518.zip $ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2518/10947-2518.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2519/10947-2519.zip $ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2519/10947-2519.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2708/10947-2708.zip $ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2708/10947-2708.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2526/10947-2526.zip $ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2526/10947-2526.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2871/10947-2871.zip $ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2871/10947-2871.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-4467/10947-4467.zip $ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-4467/10947-4467.zip
$ dspace packager -s -u -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-2527/10947-2527.zip $ dspace packager -s -u -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-2527/10947-2527.zip
$ for item in 10947-2527/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done $ for item in 10947-2527/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
$ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip $ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip
$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done $ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done $ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
</code></pre> </code></pre>
<ul class="task-list"> <p>This submits AIP hierarchies recursively (-r) and suppresses errors when an item&rsquo;s parent collection hasn&rsquo;t been created yet—for example, if the item is mapped. The large historic archive (<sup>10947</sup>&frasl;<sub>1</sub>) is created in several steps because it requires a lot of memory and often crashes.</p>
<li>This submits AIP hierarchies recursively (-r) and suppresses errors when an item&rsquo;s parent collection hasn&rsquo;t been created yet—for example, if the item is mapped</li>
<li>The large historic archive (<sup>10947</sup>&frasl;<sub>1</sub>) is created in several steps because it requires a lot of memory and often crashes</li>
<li><p>Create new subcommunities and collections for content we reorganized into new hierarchies from the original:</p> <p><strong>Create new subcommunities and collections for content we reorganized into new hierarchies from the original:</strong></p>
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Board</em> sub-community: <sup>10568</sup>&frasl;<sub>83536</sub>
<ul class="task-list"> <ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Board</em> sub-community: <sup>10568</sup>&frasl;<sub>83536</sub></label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Content from <em>CGIAR System Management Board documents</em> collection (<sup>10947</sup>&frasl;<sub>4561</sub>) goes here</label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> Content from <em>CGIAR System Management Board documents</em> collection (<sup>10947</sup>&frasl;<sub>4561</sub>) goes here</label></li>
<li>Import collection hierarchy first and then the items:</li> <li>Import collection hierarchy first and then the items:</li>
</ul></label></li>
</ul> </ul>
<pre><code>$ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip <pre><code>$ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip
@ -213,45 +208,42 @@ $ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e
</code></pre> </code></pre>
<ul class="task-list"> <ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Office</em> sub-community: <sup>10568</sup>&frasl;<sub>83537</sub></label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Office</em> sub-community: <sup>10568</sup>&frasl;<sub>83537</sub>
<ul class="task-list">
<li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Office documents</em> collection: <sup>10568</sup>&frasl;<sub>83538</sub></label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> Create <em>CGIAR System Management Office documents</em> collection: <sup>10568</sup>&frasl;<sub>83538</sub></label></li>
<li>Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents:</li> <li>Import items to collection individually in replace mode (-r) while explicitly preserving handles and ignoring parents:</li>
</ul></label></li>
</ul> </ul>
<pre><code>$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done <pre><code>$ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
</code></pre> </code></pre>
<ul> <p><strong>Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:</strong></p>
<li>Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:</li>
</ul></li>
</ul>
<pre><code> dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z'); <pre><code>dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) &gt; '2017-05-01T00:00:00Z');
</code></pre> </code></pre>
<ul> <ul>
<li>Export them from the CGIAR Library:</li> <li>Export them from the CGIAR Library:</li>
</ul> </ul>
<pre><code> # for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done <pre><code># for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done
</code></pre> </code></pre>
<ul> <ul>
<li>Import on CGSpace:</li> <li>Import on CGSpace:</li>
</ul> </ul>
<pre><code> $ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done <pre><code>$ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done
</code></pre> </code></pre>
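The per-item import loops in these notes all share one shape. A minimal dry-run sketch follows, assuming a hypothetical helper named `preview_item_imports` (it is not part of DSpace). It prints the `dspace packager` command that would run for each item AIP, so the glob and flags can be checked before anything touches the repository. The flags are copied from the loops above; the helper name and its two arguments are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper (not part of DSpace): print, without running, the
# packager command for every item AIP found in a directory.
#   $1 = directory holding ITEM@* AIP files
#   $2 = e-mail address of the eperson performing the import
preview_item_imports() {
    dir="$1"
    eperson="$2"
    for item in "$dir"/ITEM@*; do
        # Skip the unexpanded pattern when the directory has no item AIPs
        [ -e "$item" ] || continue
        printf 'dspace packager -r -f -u -t AIP -e %s %s\n' "$eperson" "$item"
    done
}
```

For example, `preview_item_imports 10947-1 aorth@mjanja.ch` would list one command per item AIP in `10947-1`. Collection and community AIPs are skipped because only `ITEM@` files match.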
<ul class="task-list">
<li><label><input type="checkbox" disabled class="task-list-item"> Shut down Tomcat and run <code>update-sequences.sql</code> as the system&rsquo;s <code>postgres</code> user</label></li>
</ul>
<h2 id="post-migration">Post Migration</h2> <h2 id="post-migration">Post Migration</h2>
<ul class="task-list"> <ul class="task-list">
<li><label><input type="checkbox" disabled class="task-list-item"> Shut down Tomcat and run <code>update-sequences.sql</code> as the system&rsquo;s <code>postgres</code> user</label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Remove ingestion overrides from <code>dspace.cfg</code></label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> Remove ingestion overrides from <code>dspace.cfg</code></label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Reset PostgreSQL <code>max_connections</code> to 183</label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> Reset PostgreSQL <code>max_connections</code> to 183</label></li>
<li><label><input type="checkbox" checked disabled class="task-list-item"> Enable nightly <code>index-discovery</code> cron job</label></li> <li><label><input type="checkbox" checked disabled class="task-list-item"> Enable nightly <code>index-discovery</code> cron job</label></li>
<li>HTTPS certificates: <li>HTTPS certificates:
@ -260,12 +252,12 @@ $ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e
</ul></li> </ul></li>
</ul> </ul>
<pre><code> $ keytool -list -keystore tomcat.keystore <pre><code>$ keytool -list -keystore tomcat.keystore
$ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat $ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat
$ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pem $ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pem
$ openssl pkcs12 -in library.cgiar.org.p12 -nodes -nocerts -out library.cgiar.org.key.pem $ openssl pkcs12 -in library.cgiar.org.p12 -nodes -nocerts -out library.cgiar.org.key.pem
$ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem $ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem
$ cat library.cgiar.org.crt.pem gdig2.crt.pem &gt; library.cgiar.org-chained.pem $ cat library.cgiar.org.crt.pem gdig2.crt.pem &gt; library.cgiar.org-chained.pem
</code></pre> </code></pre>
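After splitting the keystore into separate PEM files, it may be worth confirming that the extracted certificate and private key still belong together before pointing nginx at them. A minimal sketch, assuming a hypothetical helper named `cert_matches_key` and assuming the leaf certificate is the first one in the certificate file: it compares the public key embedded in the certificate with the one derived from the private key.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the certificate's public key matches
# the public key derived from the private key.
#   $1 = certificate in PEM format (the first certificate in the file is read)
#   $2 = private key in PEM format
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 1
    key_pub=$(openssl pkey -in "$2" -pubout) || return 1
    [ "$cert_pub" = "$key_pub" ]
}
```

With the files from the commands above, this could be run as `cert_matches_key library.cgiar.org.crt.pem library.cgiar.org.key.pem && echo match`.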
<ul class="task-list"> <ul class="task-list">
@ -275,18 +267,15 @@ $ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e
<li>CNAME: cgspace.cgiar.org</li> <li>CNAME: cgspace.cgiar.org</li>
</ul></label></li> </ul></label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Re-deploy DSpace from freshly built <code>5_x-prod</code> branch</label></li> <li><label><input type="checkbox" disabled class="task-list-item"> Re-deploy DSpace from freshly built <code>5_x-prod</code> branch</label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Merge <code>cgiar-library</code> branch to <code>master</code> and re-run ansible nginx templates</label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Run system updates and reboot server</label></li> <li><label><input type="checkbox" disabled class="task-list-item"> Run system updates and reboot server</label></li>
<li><label><input type="checkbox" disabled class="task-list-item"> Switch to Let&rsquo;s Encrypt HTTPS certificates (after DNS is updated and server isn&rsquo;t busy)</label></li> <li><label><input type="checkbox" disabled class="task-list-item"> Switch to Let&rsquo;s Encrypt HTTPS certificates (after DNS is updated and server isn&rsquo;t busy):</label></li>
</ul> </ul>
<pre><code>$ sudo systemctl stop tomcat7 <pre><code>$ sudo systemctl stop tomcat7
$ ./letsencrypt-auto certonly --standalone -d library.cgiar.org $ ./letsencrypt-auto certonly --standalone -d library.cgiar.org
</code></pre> </code></pre>
<ul class="task-list">
<li><label><input type="checkbox" disabled class="task-list-item"> Merge <code>cgiar-library</code> branch to <code>master</code> and re-run ansible nginx templates</label></li>
</ul>
<h2 id="troubleshooting">Troubleshooting</h2> <h2 id="troubleshooting">Troubleshooting</h2>
<h3 id="foreign-key-error-in-dspace-cleanup">Foreign Key Error in <code>dspace cleanup</code></h3> <h3 id="foreign-key-error-in-dspace-cleanup">Foreign Key Error in <code>dspace cleanup</code></h3>


@ -17,7 +17,7 @@
<pubDate>Mon, 18 Sep 2017 16:38:35 +0300</pubDate> <pubDate>Mon, 18 Sep 2017 16:38:35 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/cgiar-library-migration/</guid> <guid>https://alanorth.github.io/cgspace-notes/cgiar-library-migration/</guid>
<description>Temporarily making this a page because it seems Hugo (currently 0.27.1) cannot use a custom slug for a post when there is a permalink defined in config.toml <description>Note: I&amp;rsquo;m temporarily making this a page because it seems Hugo (currently 0.27.1) cannot use a custom slug for a post when there is a permalink defined in config.toml
Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called CGIAR System Organization. Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called CGIAR System Organization.
Pre-migration Technical TODOs Things that need to happen before the migration: Pre-migration Technical TODOs Things that need to happen before the migration:
Create top-level community on CGSpace to hold the CGIAR Library content: 10568&amp;frasl;83389 Update nginx redirects in ansible templates Update handle in DSpace XMLUI config Set up nginx redirects for URLs like: https://library.</description> Create top-level community on CGSpace to hold the CGIAR Library content: 10568&amp;frasl;83389 Update nginx redirects in ansible templates Update handle in DSpace XMLUI config Set up nginx redirects for URLs like: https://library.</description>


@ -4,7 +4,7 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/cgiar-library-migration/</loc> <loc>https://alanorth.github.io/cgspace-notes/cgiar-library-migration/</loc>
<lastmod>2017-09-18T18:05:57+03:00</lastmod> <lastmod>2017-09-18T21:24:27+03:00</lastmod>
</url> </url>
<url> <url>
@ -124,7 +124,7 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2017-09-18T18:05:57+03:00</lastmod> <lastmod>2017-09-18T21:24:27+03:00</lastmod>
<priority>0</priority> <priority>0</priority>
</url> </url>
@ -141,7 +141,7 @@
<url> <url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc> <loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2017-09-18T18:05:57+03:00</lastmod> <lastmod>2017-09-18T21:24:27+03:00</lastmod>
<priority>0</priority> <priority>0</priority>
</url> </url>