<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
< meta property = "og:title" content = "CGIAR Library Migration" / >
< meta property = "og:description" content = "Notes on the migration of the CGIAR Library to CGSpace" / >
< meta property = "og:type" content = "article" / >
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/cgiar-library-migration/" / >
< meta property = "article:published_time" content = "2017-09-18T16:38:35+03:00" / >
< meta property = "article:modified_time" content = "2019-10-28T13:40:20+02:00" / >
< meta name = "twitter:card" content = "summary" / >
< meta name = "twitter:title" content = "CGIAR Library Migration" / >
< meta name = "twitter:description" content = "Notes on the migration of the CGIAR Library to CGSpace" / >
< meta name = "generator" content = "Hugo 0.62.2" / >
< script type = "application/ld+json" >
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "CGIAR Library Migration",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/cgiar-library-migration\/",
"wordCount": "1278",
"datePublished": "2017-09-18T16:38:35+03:00",
"dateModified": "2019-10-28T13:40:20+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes, Migration",
"description": "Notes on the migration of the CGIAR Library to CGSpace"
}
< / script >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/cgiar-library-migration/" >
< title > CGIAR Library Migration | CGSpace Notes< / title >
<!-- combined, minified CSS -->
< link href = "https://alanorth.github.io/cgspace-notes/css/style.a20c1a4367639632cdb341d23c27ca44fedcc75b0f8b3cbea6203010da153d3c.css" rel = "stylesheet" integrity = "sha256-ogwaQ2djljLNs0HSPCfKRP7cx1sPizy+piAwENoVPTw=" crossorigin = "anonymous" >
<!-- RSS 2.0 feed -->
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link " href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
< header class = "blog-header" >
< div class = "container" >
< h1 class = "blog-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< p class = "lead blog-description" dir = "auto" > Documenting day-to-day work on the < a href = "https://cgspace.cgiar.org" > CGSpace< / a > repository.< / p >
< / div >
< / header >
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/cgiar-library-migration/" > CGIAR Library Migration< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2017-09-18T16:38:35+03:00" > Mon Sep 18, 2017< / time > by Alan Orth in
< i class = "fa fa-folder" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/categories/notes" rel = "category tag" > Notes< / a >
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/migration" rel = "tag" > Migration< / a >
< / p >
< / header >
< p > Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called < em > CGIAR System Organization< / em > .< / p >
< h2 id = "pre-migration-technical-todos" > Pre-migration Technical TODOs< / h2 >
< p > Things that need to happen before the migration:< / p >
< ul >
< li > < input checked = "" disabled = "" type = "checkbox" > Create top-level community on CGSpace to hold the CGIAR Library content: < code > 10568/83389< / code >
< ul >
< li > < input checked = "" disabled = "" type = "checkbox" > Update nginx redirects in ansible templates< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Update handle in DSpace XMLUI config< / li >
< / ul >
< / li >
< li > Set up nginx redirects for URLs like:
< ul >
< li > < input checked = "" disabled = "" type = "checkbox" > < a href = "https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf" > https://library.cgiar.org/bitstream/handle/10947/2699/CGIAR_Branding_Guidelines_and_Toolkit.pdf< / a > < / li >
< li > < input checked = "" disabled = "" type = "checkbox" > < a href = "https://library.cgiar.org/handle/10947/4258" > https://library.cgiar.org/handle/10947/4258< / a > < / li >
< / ul >
< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Merge < a href = "https://github.com/ilri/DSpace/pull/339" > #339< / a > to < code > 5_x-prod< / code > branch and rebuild DSpace< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Increase < code > max_connections< / code > in < code > /etc/postgresql/9.5/main/postgresql.conf< / code > by ~10
< ul >
< li > < code > SELECT * FROM pg_stat_activity;< / code > seems to show ~6 extra connections used by the command line tools during import< / li >
< / ul >
< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Temporarily disable the nightly < code > index-discovery< / code > cron job, because the import will overlap with its schedule and the two should not compete to update the Solr index< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Copy HTTPS certificate key pair from CGIAR Library server's Tomcat keystore:< / li >
< / ul >
< pre > < code > $ keytool -list -keystore tomcat.keystore
$ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat
$ openssl pkcs12 -in library.cgiar.org.p12 -nokeys -out library.cgiar.org.crt.pem
$ openssl pkcs12 -in library.cgiar.org.p12 -nodes -nocerts -out library.cgiar.org.key.pem
$ wget https://certs.godaddy.com/repository/gdroot-g2.crt https://certs.godaddy.com/repository/gdig2.crt.pem
$ cat library.cgiar.org.crt.pem gdig2.crt.pem > library.cgiar.org-chained.pem
< / code > < / pre > < h2 id = "migration-process" > Migration Process< / h2 >
< p > < strong > Export all top-level communities and collections from DSpace Test:< / strong > < / p >
< pre > < code > $ export PATH=$PATH:/home/dspacetest.cgiar.org/bin
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2515 10947-2515/10947-2515.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2516 10947-2516/10947-2516.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2517 10947-2517/10947-2517.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2518 10947-2518/10947-2518.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2519 10947-2519/10947-2519.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2708 10947-2708/10947-2708.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2526 10947-2526/10947-2526.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2871 10947-2871/10947-2871.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/2527 10947-2527/10947-2527.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93759 10568-93759/10568-93759.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10568/93760 10568-93760/10568-93760.zip
$ dspace packager -d -a -t AIP -e aorth@mjanja.ch -i 10947/1 10947-1/10947-1.zip
< / code > < / pre > < p > < strong > Import to CGSpace (also see < a href = "http://alanorth.github.io/cgspace-notes/2017-05/#2017-05-10" > notes from 2017-05-10< / a > ):< / strong > < / p >
< ul >
< li > < input checked = "" disabled = "" type = "checkbox" > Copy all exports from DSpace Test< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Add ingestion overrides to < code > dspace.cfg< / code > before import:< / li >
< / ul >
< pre > < code > mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
< / code > < / pre > < ul >
< li > < input checked = "" disabled = "" type = "checkbox" > Import communities and collections, paying attention to options to skip missing parents and ignore handles:< / li >
< / ul >
< pre > < code > $ export JAVA_OPTS=" -Dfile.encoding=UTF-8 -Xmx3072m -XX:-UseGCOverheadLimit -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
$ export PATH=$PATH:/home/cgspace.cgiar.org/bin
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2515/10947-2515.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2516/10947-2516.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2517/10947-2517.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2518/10947-2518.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2519/10947-2519.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2708/10947-2708.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2526/10947-2526.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-2871/10947-2871.zip
$ dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e aorth@mjanja.ch -p 10568/83389 10947-4467/10947-4467.zip
$ dspace packager -s -u -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-2527/10947-2527.zip
$ for item in 10947-2527/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
$ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip
$ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
$ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
< / code > < / pre > < p > These commands restore (< code > -r< / code > ) the AIP hierarchies recursively (< code > -a< / code > ), and < code > skipIfParentMissing=true< / code > suppresses errors when an item's parent collection hasn't been created yet, for example if the item is mapped. The large historic archive (< code > 10947/1< / code > ) is created in several steps because it requires a lot of memory and often crashes.< / p >
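< p > The repeated restore commands above could also be generated from a list of handles. This is only a dry-run sketch: the helper name < code > print_restore_cmds< / code > is hypothetical, it assumes each export lives at < code > PREFIX-ID/PREFIX-ID.zip< / code > as in the commands above, and it prints the commands rather than running them.< / p >

```shell
# Dry-run sketch: print one restore command per exported community.
# PARENT and EPERSON are copied from the commands above;
# print_restore_cmds is a hypothetical helper, not part of DSpace.
PARENT="10568/83389"
EPERSON="aorth@mjanja.ch"

print_restore_cmds() {
    for handle in "$@"; do
        # 10947/2515 -> 10947-2515 (directory and file name of the export)
        dir=$(printf '%s' "$handle" | tr '/' '-')
        echo "dspace packager -r -u -a -t AIP -o skipIfParentMissing=true -e $EPERSON -p $PARENT $dir/$dir.zip"
    done
}

print_restore_cmds 10947/2515 10947/2516 10947/2517
```

< p > After reviewing the printed commands, the output can be piped to < code > sh< / code > to run them one by one.< / p >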
< p > < strong > Create new subcommunities and collections for content we reorganized into new hierarchies from the original:< / strong > < / p >
< ul >
< li > < input checked = "" disabled = "" type = "checkbox" > Create < em > CGIAR System Management Board< / em > sub-community: < code > 10568/83536< / code >
< ul >
< li > < input checked = "" disabled = "" type = "checkbox" > Content from < em > CGIAR System Management Board documents< / em > collection (< code > 10947/4651< / code > ) goes here< / li >
< li > Import collection hierarchy first and then the items:< / li >
< / ul >
< / li >
< / ul >
< pre > < code > $ dspace packager -r -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83536 10568-93760/COLLECTION@10947-4651.zip
$ for item in 10568-93760/ITEM@10947-465*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
< / code > < / pre > < ul >
< li > < input checked = "" disabled = "" type = "checkbox" > Create < em > CGIAR System Management Office< / em > sub-community: < code > 10568/83537< / code >
< ul >
< li > < input checked = "" disabled = "" type = "checkbox" > Create < em > CGIAR System Management Office documents< / em > collection: < code > 10568/83538< / code > < / li >
< li > Import items to collection individually in restore mode (-r) while explicitly preserving handles and ignoring parents:< / li >
< / ul >
< / li >
< / ul >
< pre > < code > $ for item in 10568-93759/ITEM@10947-46*; do dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/83538 $item; done
< / code > < / pre > < p > < strong > Get the handles for the last few items from CGIAR Library that were created since we did the migration to DSpace Test in May:< / strong > < / p >
< pre > < code > dspace=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select item_id from metadatavalue where metadata_field_id=11 and date(text_value) > '2017-05-01T00:00:00Z');
< / code > < / pre > < ul >
< li > Export them from the CGIAR Library:< / li >
< / ul >
< pre > < code > # for handle in 10947/4658 10947/4659 10947/4660 10947/4661 10947/4665 10947/4664 10947/4666 10947/4669; do /usr/local/dspace/bin/dspace packager -d -a -t AIP -e m.marus@cgiar.org -i $handle ${handle}.zip; done
< / code > < / pre > < ul >
< li > Import on CGSpace:< / li >
< / ul >
< pre > < code > $ for item in 10947-latest/*.zip; do dspace packager -r -u -t AIP -e aorth@mjanja.ch $item; done
< / code > < / pre > < h2 id = "post-migration" > Post Migration< / h2 >
< ul >
< li > < input checked = "" disabled = "" type = "checkbox" > Shut down Tomcat and run < code > update-sequences.sql< / code > as the system's < code > postgres< / code > user< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Remove ingestion overrides from < code > dspace.cfg< / code > < / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Reset PostgreSQL < code > max_connections< / code > to 183< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Enable nightly < code > index-discovery< / code > cron job< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Adjust CGSpace's < code > handle-server/config.dct< / code > to add the new prefix alongside our existing 10568, i.e.:< / li >
< / ul >
< pre > < code > "server_admins" = (
"300:0.NA/10568"
"300:0.NA/10947"
)
"replication_admins" = (
"300:0.NA/10568"
"300:0.NA/10947"
)
"backup_admins" = (
"300:0.NA/10568"
"300:0.NA/10947"
)
< / code > < / pre > < p > I had regenerated the < code > sitebndl.zip< / code > file on the CGIAR Library server and sent it to the Handle.net admins, but they said that there were mismatches between the public and private keys, which I suspect is due to < code > make-handle-config< / code > not being very flexible. After discussing our scenario with the Handle.net admins, they said we actually don't need to send an updated < code > sitebndl.zip< / code > for this type of change, and the above < code > config.dct< / code > edits are all that is required. I guess they just did something on their end by setting the authoritative IP address for the 10947 prefix to be the same as ours… < / p >
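< p > A quick sanity check for the < code > config.dct< / code > edits above, as a sketch only: it assumes one entry per line as shown, and the helper name < code > count_prefix_entries< / code > and the example path are hypothetical.< / p >

```shell
# Hypothetical check: count the lines in a config.dct that mention a prefix.
count_prefix_entries() {
    # $1 = handle prefix (e.g. 10947), $2 = path to config.dct
    # -F treats the pattern as a fixed string, -c prints the number of matching lines
    grep -cF "300:0.NA/$1" "$2"
}

# Example (the path is an assumption; adjust to your handle-server directory):
# count_prefix_entries 10947 handle-server/config.dct
```

< p > If only the three admin sections shown above reference the prefix, the count for 10947 should match the count for 10568.< / p >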
< ul >
< li > < input checked = "" disabled = "" type = "checkbox" > Update DNS records:
< ul >
< li > CNAME: cgspace.cgiar.org< / li >
< / ul >
< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Re-deploy DSpace from freshly built < code > 5_x-prod< / code > branch< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Merge < code > cgiar-library< / code > branch to < code > master< / code > and re-run ansible nginx templates< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Run system updates and reboot server< / li >
< li > < input checked = "" disabled = "" type = "checkbox" > Switch to Let's Encrypt HTTPS certificates (after DNS is updated and server isn't busy):< / li >
< / ul >
< pre > < code > $ sudo systemctl stop nginx
$ /opt/certbot-auto certonly --standalone -d library.cgiar.org
$ sudo systemctl start nginx
< / code > < / pre > < h2 id = "troubleshooting" > Troubleshooting< / h2 >
< h3 id = "foreign-key-error-in-dspace-cleanup" > Foreign Key Error in < code > dspace cleanup< / code > < / h3 >
< p > The cleanup script is sometimes used during import processes to clean the database and assetstore after failed AIP imports. If you see the following error with < code > dspace cleanup -v< / code > :< / p >
< pre > < code > Error: ERROR: update or delete on table " bitstream" violates foreign key constraint " bundle_primary_bitstream_id_fkey" on table " bundle"
Detail: Key (bitstream_id)=(119841) is still referenced from table " bundle" .
< / code > < / pre > < p > The solution is to set the < code > primary_bitstream_id< / code > to NULL in PostgreSQL:< / p >
< pre > < code > dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (119841);
< / code > < / pre > < h3 id = "psqlexception-during-aip-ingest" > PSQLException During AIP Ingest< / h3 >
< p > After a few rounds of ingesting, possibly with failures, you might end up with inconsistent IDs in the database. For example, this error appeared during AIP ingest of a single collection in submit mode (-s):< / p >
< pre > < code > org.dspace.content.packager.PackageValidationException: Exception while ingesting 10947-2527/10947-2527.zip, Reason: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint " handle_pkey"
Detail: Key (handle_id)=(86227) already exists.
< / code > < / pre > < p > The normal solution is to run the < code > update-sequences.sql< / code > script (with Tomcat shut down) but it doesn't seem to work in this case. Finding the maximum < code > handle_id< / code > and manually updating the sequence seems to work:< / p >
< pre > < code > dspace=# select * from handle where handle_id=(select max(handle_id) from handle);
dspace=# select setval('handle_seq',86873);
< / code > < / pre >
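< p > A possible shortcut, not part of the original notes: instead of copying the maximum by hand, the sequence can be set from a subquery. This is a minimal sketch that only prints the SQL; the sequence, table, and column names come from the queries above, while the helper name < code > reset_sequence_sql< / code > is hypothetical.< / p >

```shell
# Hypothetical helper: print SQL that sets a sequence to the current
# maximum of an ID column, so the next nextval() returns max + 1.
reset_sequence_sql() {
    # $1 = sequence name, $2 = table name, $3 = ID column
    printf "SELECT setval('%s', (SELECT max(%s) FROM %s));\n" "$1" "$3" "$2"
}

# Print the statement for the handle table used above
reset_sequence_sql handle_seq handle handle_id
```

< p > Review the printed statement first, then run it in the same < code > dspace=#< / code > session with Tomcat shut down.< / p >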
< / article >
< / div > <!-- /.blog-main -->
< aside class = "col-sm-3 ml-auto blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/2020-01/" > January, 2020< / a > < / li >
< li > < a href = "/cgspace-notes/2019-12/" > December, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-11/" > November, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/cgspace-cgcorev2-migration/" > CGSpace CG Core v2 Migration< / a > < / li >
< li > < a href = "/cgspace-notes/2019-10/" > October, 2019< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p dir = "auto" >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >