diff --git a/content/posts/2019-09.md b/content/posts/2019-09.md
index f73f26360..6bc4ca2eb 100644
--- a/content/posts/2019-09.md
+++ b/content/posts/2019-09.md
@@ -236,6 +236,7 @@ $ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/dc-contribut
 
 ## 2019-09-20
 
+- Deploy a fresh snapshot of CGSpace's PostgreSQL database on DSpace Test so we can get more accurate duplicate checking with the upcoming Bioversity and IITA migrations
 - Skype with Carol and Francesca to discuss the Bioveristy migration to CGSpace
   - They want to do some enrichment of the metadata to add countries and regions
   - Also, they noticed that some items have a blank ISSN in the citation like "ISSN:"
@@ -248,4 +249,46 @@ $ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/dc-contribut
 $ perl-rename -n 's/_{2,3}/_/g' *.pdf
 ```
 
+- I was preparing to run SAFBuilder for the Bioversity migration and decided to compare the list of PDFs on my local machine with the list on DSpace Test (where I had downloaded them last month)
+  - There are a *few dozen* that have completely fucked up names due to some encoding error
+  - To make matters worse, when I tried to download them, some of the links in the "URL" column that Francesca included were wrong, so I had to go to the permalink and get a link that worked
+  - After downloading everything I had to use Ubuntu's version of rename to get rid of all the double and triple underscores:
+
+```
+$ rename -v 's/___/_/g'  *.pdf
+$ rename -v 's/__/_/g'  *.pdf
+```
+
+- I'm still waiting to hear what Carol and Francesca want to do with the `1195.pdf.LCK` file (for now I've removed it from the CSV, but for future reference it has the number 630 in its permalink)
+- I wrote two fairly long GREL expressions to clean up the institutional author names in the `dc.contributor.author` and `dc.identifier.citation` fields using OpenRefine
+  - The first targets acronyms in parentheses like "International Livestock Research Institute (ILRI)":
+
+```
+value.replace(/,? ?\((ANDES|APAFRI|APFORGEN|Canada|CFC|CGRFA|China|CacaoNet|CATAS|CDU|CIAT|CIRF|CIP|CIRNMA|COSUDE|Colombia|COA|COGENT|CTDT|Denmark|DfLP|DSE|ECPGR|ECOWAS|ECP\/GR|England|EUFORGEN|FAO|France|Francia|FFTC|Germany|GEF|GFU|GGCO|GRPI|italy|Italy|Italia|India|ICCO|ICAR|ICGR|ICRISAT|IDRC|INFOODS|IPGRI|IBPGR|ICARDA|ILRI|INIBAP|INBAR|IPK|ISG|IT|Japan|JIRCAS|Kenya|LI\-BIRD|Malaysia|NARC|NBPGR|Nepal|OOAS|RDA|RISBAP|Rome|ROPPA|SEARICE|Senegal|SGRP|Sweden|Syrian Arab Republic|The Netherlands|UNDP|UK|UNEP|UoB|UoM|United Kingdom|WAHO)\)/,"")
+```
+  - The second targets cities and countries after names like "International Livestock Research Institute, Kenya":
+
+```
+value.replace(/,? ?(ali|Aleppo|Amsterdam|Beijing|Bonn|Burkina Faso|CN|Dakar|Gatersleben|London|Montpellier|Nairobi|New Delhi|Kaski|Kepong|Malaysia|Khumaltar|Lima|Ltpur|Ottawa|Patancheru|Peru|Pokhara|Rome|Uppsala|University of Mauritius|Tsukuba)/,"")
+```
+
+- I imported the 1,427 Bioversity records with bitstreams to a new collection called [2019-09-20 Bioversity Migration Test](https://dspacetest.cgiar.org/handle/10568/103688) on DSpace Test (after splitting them into two batches of about 700 each):
+
+```
+$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx768m'
+$ dspace import -a me@cgiar.org -m 2019-09-20-bioversity1.map -s /home/aorth/Bioversity/bioversity1
+$ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bioversity/bioversity2
+```
+
+- After that I exported the collection again and started doing some quality checks and cleanups:
+  - Change all DOIs to use https://doi.org format
+  - Change all bioversityinternational.org links to use https://
+  - Fix ten authors with invalid names like "Orth,." by checking the correct name in the citation
+  - Fix several invalid ISBNs, though several more items have incorrect ISBNs printed in the PDFs themselves!
+  - Fix some citations that were using "ISSN" instead of ISBN
+- The next steps are:
+  - Check for duplicates
+  - Continue with institutional author normalization
+  - Ask which collection items of type Brochure, Journal Item, and Thesis should be mapped to
+
 <!-- vim: set sw=2 ts=2: -->
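
The first GREL expression in the notes above (stripping acronyms in parentheses) can be tried outside OpenRefine too. This is only a minimal Python sketch of the same idea: the `ACRONYMS` tuple is a shortened, hypothetical subset of the full list, and `strip_acronym` is an illustrative helper name that does not appear in the notes.

```python
import re

# Hypothetical, shortened subset of the acronyms used in the GREL expression
ACRONYMS = ("ILRI", "CIAT", "FAO", "IPGRI", "LI-BIRD", "ECP/GR", "United Kingdom")

# Optional comma, optional space, then exactly one listed acronym in parentheses
PATTERN = re.compile(
    r",? ?\((" + "|".join(re.escape(a) for a in ACRONYMS) + r")\)"
)


def strip_acronym(value: str) -> str:
    """Remove every listed parenthesized acronym from an author name."""
    return PATTERN.sub("", value)


if __name__ == "__main__":
    print(strip_acronym("International Livestock Research Institute (ILRI)"))
```

Building the alternation from a tuple with `re.escape` avoids hand-escaping entries such as `LI-BIRD` and `ECP/GR`, at the cost of no longer being a single copy-and-paste expression.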
diff --git a/docs/2019-09/index.html b/docs/2019-09/index.html
index c9a220154..916e1a1b4 100644
--- a/docs/2019-09/index.html
+++ b/docs/2019-09/index.html
@@ -40,7 +40,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-09/" />
 <meta property="article:published_time" content="2019-09-01T10:17:51+03:00" />
-<meta property="article:modified_time" content="2019-09-20T12:55:11+03:00" />
+<meta property="article:modified_time" content="2019-09-20T13:25:59+03:00" />
 
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="September, 2019"/>
@@ -85,9 +85,9 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
   "@type": "BlogPosting",
   "headline": "September, 2019",
   "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-09\/",
-  "wordCount": "1778",
+  "wordCount": "2166",
   "datePublished": "2019-09-01T10:17:51\x2b03:00",
-  "dateModified": "2019-09-20T12:55:11\x2b03:00",
+  "dateModified": "2019-09-20T13:25:59\x2b03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -438,6 +438,8 @@ $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-s
 <h2 id="2019-09-20">2019-09-20</h2>
 
 <ul>
+<li>Deploy a fresh snapshot of CGSpace&rsquo;s PostgreSQL database on DSpace Test so we can get more accurate duplicate checking with the upcoming Bioversity and IITA migrations</li>
+
 <li><p>Skype with Carol and Francesca to discuss the Bioveristy migration to CGSpace</p>
 
 <ul>
@@ -452,6 +454,60 @@ $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-s
 <pre><code>$ perl-rename -n 's/_{2,3}/_/g' *.pdf
 </code></pre></li>
 </ul></li>
+
+<li><p>I was preparing to run SAFBuilder for the Bioversity migration and decided to compare the list of PDFs on my local machine with the list on DSpace Test (where I had downloaded them last month)</p>
+
+<ul>
+<li>There are a <em>few dozen</em> that have completely fucked up names due to some encoding error</li>
+<li>To make matters worse, when I tried to download them, some of the links in the &ldquo;URL&rdquo; column that Francesca included were wrong, so I had to go to the permalink and get a link that worked</li>
+
+<li><p>After downloading everything I had to use Ubuntu&rsquo;s version of rename to get rid of all the double and triple underscores:</p>
+
+<pre><code>$ rename -v 's/___/_/g'  *.pdf
+$ rename -v 's/__/_/g'  *.pdf
+</code></pre></li>
+</ul></li>
+
+<li><p>I&rsquo;m still waiting to hear what Carol and Francesca want to do with the <code>1195.pdf.LCK</code> file (for now I&rsquo;ve removed it from the CSV, but for future reference it has the number 630 in its permalink)</p></li>
+
+<li><p>I wrote two fairly long GREL expressions to clean up the institutional author names in the <code>dc.contributor.author</code> and <code>dc.identifier.citation</code> fields using OpenRefine</p>
+
+<ul>
+<li><p>The first targets acronyms in parentheses like &ldquo;International Livestock Research Institute (ILRI)&rdquo;:</p>
+
+<pre><code>value.replace(/,? ?\((ANDES|APAFRI|APFORGEN|Canada|CFC|CGRFA|China|CacaoNet|CATAS|CDU|CIAT|CIRF|CIP|CIRNMA|COSUDE|Colombia|COA|COGENT|CTDT|Denmark|DfLP|DSE|ECPGR|ECOWAS|ECP\/GR|England|EUFORGEN|FAO|France|Francia|FFTC|Germany|GEF|GFU|GGCO|GRPI|italy|Italy|Italia|India|ICCO|ICAR|ICGR|ICRISAT|IDRC|INFOODS|IPGRI|IBPGR|ICARDA|ILRI|INIBAP|INBAR|IPK|ISG|IT|Japan|JIRCAS|Kenya|LI\-BIRD|Malaysia|NARC|NBPGR|Nepal|OOAS|RDA|RISBAP|Rome|ROPPA|SEARICE|Senegal|SGRP|Sweden|Syrian Arab Republic|The Netherlands|UNDP|UK|UNEP|UoB|UoM|United Kingdom|WAHO)\)/,&quot;&quot;)
+</code></pre></li>
+
+<li><p>The second targets cities and countries after names like &ldquo;International Livestock Research Institute, Kenya&rdquo;:</p>
+
+<pre><code>value.replace(/,? ?(ali|Aleppo|Amsterdam|Beijing|Bonn|Burkina Faso|CN|Dakar|Gatersleben|London|Montpellier|Nairobi|New Delhi|Kaski|Kepong|Malaysia|Khumaltar|Lima|Ltpur|Ottawa|Patancheru|Peru|Pokhara|Rome|Uppsala|University of Mauritius|Tsukuba)/,&quot;&quot;)
+</code></pre></li>
+</ul></li>
+
+<li><p>I imported the 1,427 Bioversity records with bitstreams to a new collection called <a href="https://dspacetest.cgiar.org/handle/10568/103688">2019-09-20 Bioversity Migration Test</a> on DSpace Test (after splitting them into two batches of about 700 each):</p>
+
+<pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx768m'
+$ dspace import -a me@cgiar.org -m 2019-09-20-bioversity1.map -s /home/aorth/Bioversity/bioversity1
+$ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bioversity/bioversity2
+</code></pre></li>
+
+<li><p>After that I exported the collection again and started doing some quality checks and cleanups:</p>
+
+<ul>
+<li>Change all DOIs to use <a href="https://doi.org">https://doi.org</a> format</li>
+<li>Change all bioversityinternational.org links to use https://</li>
+<li>Fix ten authors with invalid names like &ldquo;Orth,.&rdquo; by checking the correct name in the citation</li>
+<li>Fix several invalid ISBNs, though several more items have incorrect ISBNs printed in the PDFs themselves!</li>
+<li>Fix some citations that were using &ldquo;ISSN&rdquo; instead of ISBN</li>
+</ul></li>
+
+<li><p>The next steps are:</p>
+
+<ul>
+<li>Check for duplicates</li>
+<li>Continue with institutional author normalization</li>
+<li>Ask which collection items of type Brochure, Journal Item, and Thesis should be mapped to</li>
+</ul></li>
 </ul>
 
 <!-- vim: set sw=2 ts=2: -->
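
The DOI cleanup listed in the notes (changing all DOIs to the https://doi.org format) could also be scripted rather than done by hand. The sketch below is an illustration, not the procedure actually used: it assumes DOIs arrive bare, with a `doi:` prefix, or under the older `dx.doi.org` host, and both the accepted input forms and the `normalize_doi` name are assumptions.

```python
import re

# Assumed legacy forms: "doi:", "http(s)://dx.doi.org/" and "http(s)://doi.org/"
_PREFIX = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)

# Assumed shape of a DOI name: "10.", a 4-9 digit registrant code, "/", a suffix
_DOI_NAME = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(value: str) -> str:
    """Return the DOI as an https://doi.org/ URL; leave other values untouched."""
    bare = _PREFIX.sub("", value.strip())
    if not _DOI_NAME.match(bare):
        return value
    return "https://doi.org/" + bare
```

Values that do not look like a DOI are returned unchanged, so the function can be applied to a whole identifier column without damaging unrelated URLs.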
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index c24179f62..b2883d2f6 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
   
   <url>
     <loc>https://alanorth.github.io/cgspace-notes/</loc>
-    <lastmod>2019-09-20T12:55:11+03:00</lastmod>
+    <lastmod>2019-09-20T13:25:59+03:00</lastmod>
   </url>
   
   <url>
     <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-    <lastmod>2019-09-20T12:55:11+03:00</lastmod>
+    <lastmod>2019-09-20T13:25:59+03:00</lastmod>
   </url>
   
   <url>
     <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-    <lastmod>2019-09-20T12:55:11+03:00</lastmod>
+    <lastmod>2019-09-20T13:25:59+03:00</lastmod>
   </url>
   
   <url>
     <loc>https://alanorth.github.io/cgspace-notes/2019-09/</loc>
-    <lastmod>2019-09-20T12:55:11+03:00</lastmod>
+    <lastmod>2019-09-20T13:25:59+03:00</lastmod>
   </url>
   
   <url>
     <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-    <lastmod>2019-09-20T12:55:11+03:00</lastmod>
+    <lastmod>2019-09-20T13:25:59+03:00</lastmod>
   </url>
   
   <url>