diff --git a/content/posts/2023-03.md b/content/posts/2023-03.md index 130a3db5c..53b0ff5e0 100644 --- a/content/posts/2023-03.md +++ b/content/posts/2023-03.md @@ -257,4 +257,146 @@ $ ./ilri/resolve_orcids.py -i /tmp/2023-03-14-orcids.txt -o /tmp/2023-03-14-orci $ ./ilri/update_orcids.py -i /tmp/2023-03-14-orcids-names.txt -db dspace -u dspace -p 'fuuu' -m 247 ``` +## 2023-03-15 + +- Jawoo was asking about possibilities to harvest PDFs from CGSpace for some kind of AI chatbot integration + - I see we have 45,000 PDFs (format ID 2) + +```console +localhost/dspacetest= ☘ SELECT COUNT(*) FROM bitstream WHERE NOT deleted AND bitstream_format_id=2; + count +─────── + 45281 +(1 row) +``` + +- Rework some of my Python scripts to use a common `db_connect` function from util +- I reworked my `post_bitstreams.py` script to be able to overwrite bitstreams if requested + - The use case is to upload thumbnails for all the journal articles where we have these horrible pixelated journal covers + - I replaced JPEG thumbnails for ~896 ILRI publications by exporting a list of DOIs from the 10568/3 collection that were CC-BY, getting their PDFs from Sci-Hub, and then posting them with my new script + +## 2023-03-16 + +- Continue working on the ILRI publication thumbnails + - There were about sixty-four that had existing PNG "journal cover" thumbnails that didn't get replaced because I only overwrote the JPEG ones yesterday + - Now I generated a list of those bitstream UUIDs and deleted them with a shell script via the REST API +- I made a [pull request on DSpace 7 to update the bitstream format registry for PNG, WebP, and AVIF](https://github.com/DSpace/DSpace/pull/8722) +- Export CGSpace to perform mappings to Initiatives collections +- I also used this export to find CC-BY items with DOIs that had JPEGs or PNGs in their provenance, meaning that the submitter likely submitted a low-quality "journal cover" for the item + - I found about 330 of them and got most of their PDFs from 
Sci-Hub and replaced the low-quality thumbnails with real ones where Sci-Hub had them (~245)
+- In related news, I realized you can get an [API key from Elsevier and download the PDFs from their API](https://stackoverflow.com/questions/59202176/python-download-papers-from-sciencedirect-by-doi-with-requests):
+
+```python
+import requests
+
+api_key = 'fuuuuuuuuu'
+doi = "10.1016/j.foodqual.2021.104362"
+request_url = f'https://api.elsevier.com/content/article/doi:{doi}'
+
+headers = {
+    'X-ELS-APIKEY': api_key,
+    'Accept': 'application/pdf'
+}
+
+# Stream the response so large PDFs are written to disk in chunks
+with requests.get(request_url, stream=True, headers=headers, timeout=30) as r:
+    if r.status_code == 200:
+        with open("article.pdf", "wb") as f:
+            for chunk in r.iter_content(chunk_size=1024 * 1024):
+                f.write(chunk)
+```
+
+- The question is, how do we know if a DOI is Elsevier's or not...
+- CGIAR Repositories Working Group meeting
+  - We discussed controlled vocabularies for funders
+  - I suggested checking our combined lists against Crossref and ROR
+- Export a list of donors from `cg.contributor.donor` on CGSpace:
+
+```console
+localhost/dspacetest= ☘ \COPY (SELECT DISTINCT(text_value) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=248) to /tmp/2023-03-16-donors.txt;
+COPY 1521
+```
+
+- Then resolve them against Crossref's funders API:
+
+```console
+$ ./ilri/crossref_funders_lookup.py -e fuuuu@cgiar.org -i /tmp/2023-03-16-donors.txt -o ~/Downloads/2023-03-16-cgspace-crossref-funders-results.csv -d
+$ csvgrep -c matched -m true ~/Downloads/2023-03-16-cgspace-crossref-funders-results.csv | wc -l
+472
+$ sed 1d ~/Downloads/2023-03-16-cgspace-crossref-funders-results.csv | wc -l
+1521
+```
+
+- That's a 31% hit rate (471 of 1,521, since the `csvgrep` count includes the header row), but I see some near misses like "Bill and Melinda Gates Foundation" instead of "Bill & Melinda Gates Foundation"
+
+## 2023-03-17
+
+- I did the same lookup of CGSpace donors on ROR's 2022-12-01 data dump:
+
+```console
+$ ./ilri/ror_lookup.py -i /tmp/2023-03-16-donors.txt -o ~/Downloads/2023-03-16-cgspace-ror-funders-results.csv -r v1.15-2022-12-01-ror-data.json
+$ csvgrep -c matched -m true ~/Downloads/2023-03-16-cgspace-ror-funders-results.csv | wc -l
+407
+$ sed 1d ~/Downloads/2023-03-16-cgspace-ror-funders-results.csv | wc -l
+1521
+```
+
+- That's a 26.7% hit rate (406 of 1,521)
+- As for the number of funders in each dataset:
+  - Crossref has about 34,000
+  - ROR has about 15,000 if "FundRef" data is a proxy for that:
+
+```console
+$ grep -c -rsI FundRef v1.15-2022-12-01-ror-data.json
+15162
+```
+
+- On a related note, I remembered that DOI.org has a list of DOI prefixes and publishers: https://doi.crossref.org/getPrefixPublisher
+  - In Python I can look up publishers by prefix easily, here with a list comprehension:
+
+```console
+In [10]: [publisher for publisher in publishers if '10.3390' in publisher['prefixes']]
+Out[10]:
+[{'prefixes': ['10.1989', '10.32545', '10.20944', '10.3390', '10.35995'],
+  'name': 'MDPI AG',
+  'memberId': 1968}]
+```
+
+- And in OpenRefine, if I create a new column based on the DOI using Jython:
+
+```python
+import json
+
+# Note: this expression runs for every row, so the JSON file is re-read and re-parsed each time
+with open("/home/aorth/src/git/DSpace/publisher-doi-prefixes.json", "rb") as f:
+    publishers = json.load(f)
+
+# Assumes the cell value is a DOI URL (https://doi.org/...), so component [3] is the prefix
+doi_prefix = value.split("/")[3]
+
+publisher = [publisher for publisher in publishers if doi_prefix in publisher['prefixes']]
+
+return publisher[0]['name']
+```
+
+- ... 
though this is very slow (likely because the expression re-opens and parses the JSON file for every row) and it hung OpenRefine when I tried it
+- I added the ability to overwrite multiple bitstream formats at once in `post_bitstreams.py`:
+
+```console
+$ ./ilri/post_bitstreams.py -i test.csv -u https://dspacetest.cgiar.org/rest -e fuuu@example.com -p 'fffnjnjn' -d -s 2B40C7C4E34CEFCF5AFAE4B75A8C52E2 --overwrite JPEG --overwrite PNG -n
+Session valid: 2B40C7C4E34CEFCF5AFAE4B75A8C52E2
+Opened test.csv
+384142cb-58b9-4e64-bcdc-0a8cc34888b3: checking for existing bitstreams in THUMBNAIL bundle
+> (DRY RUN) Deleting bitstream: IFPRI Malawi_Maize Market Report_February_202_anonymous.pdf.jpg (16883cb0-1fc8-4786-a04f-32132e0617d4)
+> (DRY RUN) Deleting bitstream: AgroEcol_Newsletter_2.png (7e9cd434-45a6-4d55-8d56-4efa89d73813)
+> (DRY RUN) Uploading file: 10568-129666.pdf.jpg
+```
+
+- I learned how to use Python's built-in `logging` module and it simplifies all my debug and info printing
+  - I refactored a few scripts to use the new logging
+
+## 2023-03-18
+
+- I applied the publisher changes to 16,000 items in batches of 5,000
+- While working on my `post_bitstreams.py` script I realized the Tomcat Crawler Session Manager valve that groups bot user agents into sessions is causing my login to fail the first time, every time
+  - I've disabled it for now and will check the Munin session graphs after some time to see if it makes a difference
+  - In any case I have much better spider user agent lists in DSpace now than I did years ago when I started using the Crawler Session Manager valve
+
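The OpenRefine Jython expression above is slow because it re-reads and parses the prefix JSON for every row. A minimal sketch of a faster approach builds a prefix-to-name dictionary once; `publisher_for_doi` is a hypothetical helper (not part of the scripts above), it assumes bare DOIs rather than DOI URLs, and the sample data is just the MDPI entry returned by the prefix lookup above:

```python
# Sample data mirroring the structure of https://doi.crossref.org/getPrefixPublisher;
# in practice you would json.load() the full publisher-doi-prefixes.json file once.
publishers = [
    {
        "prefixes": ["10.1989", "10.32545", "10.20944", "10.3390", "10.35995"],
        "name": "MDPI AG",
        "memberId": 1968,
    },
]

# Build a prefix -> publisher-name dict once, so each DOI lookup is O(1)
# instead of scanning the whole publisher list per row
prefix_to_name = {
    prefix: publisher["name"]
    for publisher in publishers
    for prefix in publisher["prefixes"]
}

def publisher_for_doi(doi):
    # The DOI prefix is the part before the first slash, e.g. "10.3390"
    return prefix_to_name.get(doi.split("/")[0])

print(publisher_for_doi("10.3390/foo"))  # prints "MDPI AG"
```

Unknown prefixes simply return `None`, which is easier to filter on than the `IndexError` the per-row list comprehension raises when no publisher matches.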