</span></span></span><spanstyle="display:flex;"><span><spanstyle="color:#960050;background-color:#1e0010"></span>Total number of bot hits purged: 12914
</span></span></code></pre></div><ul>
<li>I added a few from that list to the local overrides in our DSpace while I wait for feedback from the COUNTER-Robots project</li>
</ul>
<h2id="2022-03-05">2022-03-05</h2>
<ul>
<li>Start AReS harvest</li>
</ul>
<h2id="2022-03-10">2022-03-10</h2>
<ul>
<li>A few days ago Gaia sent me her notes on the fourth batch of TAC/ICW documents (items 701–980 in the spreadsheet)
<ul>
<li>I created a filter in LibreOffice and selected the IDs for items with the action “delete”, then I created a custom text facet in OpenRefine with this GREL:</li>
</ul>
</li>
</ul>
<pretabindex="0"><code>or(
isNotNull(value.match('707')),
isNotNull(value.match('709')),
isNotNull(value.match('710')),
isNotNull(value.match('711')),
isNotNull(value.match('713')),
isNotNull(value.match('717')),
isNotNull(value.match('718')),
...
isNotNull(value.match('821'))
)
</code></pre><ul>
<li>Then I flagged all matching records, exported a CSV to use with SAFBuilder, and imported them on DSpace Test:</li>
<li>Meeting with KM/KS group to start talking about the way forward for repositories and web publishing
<ul>
<li>We agreed to form a sub-group of the transition task team to put forward a recommendation for repository and web publishing</li>
</ul>
</li>
</ul>
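<ul>
<li>The long GREL facet above can be generated from the list of IDs instead of being typed by hand. This is only a minimal sketch: the function name <code>build_grel_facet</code> and the short ID list are illustrative assumptions, not something from the original workflow</li>
</ul>

```python
def build_grel_facet(ids):
    """Return a GREL expression that is true when the cell matches any ID.

    Each ID becomes one isNotNull(value.match('...')) clause, mirroring the
    hand-written facet in the notes.
    """
    if not ids:
        raise ValueError("need at least one ID")
    clauses = [f"isNotNull(value.match('{item_id}'))" for item_id in ids]
    if len(clauses) == 1:
        # A single clause does not need to be wrapped in or(...)
        return clauses[0]
    return "or(\n" + ",\n".join(clauses) + "\n)"


# Illustrative subset of the IDs marked "delete" in the spreadsheet
delete_ids = [707, 709, 710]
print(build_grel_facet(delete_ids))
```

<ul>
<li>The printed expression can be pasted into the custom text facet dialog in OpenRefine</li>
</ul>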
<h2id="2022-03-20">2022-03-20</h2>
<ul>
<li>Start a full harvest on AReS</li>
</ul>
<h2id="2022-03-21">2022-03-21</h2>
<ul>
<li>Review a few submissions for Open Repositories 2022</li>
<li>Test one tentative DSpace 6.4 patch and give feedback on a few more that Hrafn missed</li>
</ul>
<h2id="2022-03-22">2022-03-22</h2>
<ul>
<li>I accidentally dropped the PostgreSQL database on DSpace Test, forgetting that I had all the CGIAR CAS items there
<ul>
<li>I had been meaning to update my local database…</li>
</ul>
</li>
<li>I re-imported the CGIAR CAS documents to <ahref="https://dspacetest.cgiar.org/handle/10568/118432">DSpace Test</a> and generated the PDF thumbnails:</li>
</span></span><spanstyle="display:flex;"><span>Due to abuse we no longer permit requests without a user agent. Please specify a descriptive user agent, for example containing the word 'bot', if you are accessing the site programmatically. For more information see here: https://dspacetest.cgiar.org/page/about.
</span></span></code></pre></div><ul>
<li>I note that the nginx log shows ‘-’ for a request with an empty user agent, which is indistinguishable from a request that sends a literal ‘-’ as its user agent; for example, these were successful:</li>
<li>I can only assume that these requests used a literal ‘-’ so I will have to add an nginx rule to block those too</li>
<li>Otherwise, I see from my notes that 70.32.90.172 is the wle.cgiar.org REST API harvester… I should ask Macaroni Bros about that</li>
</ul>
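<ul>
<li>The check I have in mind treats a missing user agent, an empty one, and a literal ‘-’ the same way. Here is a rough sketch of that logic in Python. The function name <code>is_anonymous_user_agent</code> is hypothetical, and this is not the nginx rule itself, only the condition it would need to express</li>
</ul>

```python
def is_anonymous_user_agent(user_agent):
    """Return True if the User-Agent is missing, blank, or a literal '-'.

    nginx logs '-' for an empty header, so a client that sends '-' on
    purpose looks identical in the access log.
    """
    if user_agent is None:
        return True
    return user_agent.strip() in ("", "-")


for candidate in (None, "", "-", "curl/7.81.0"):
    print(repr(candidate), is_anonymous_user_agent(candidate))
```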
<h2id="2022-03-24">2022-03-24</h2>
<ul>
<li>Maria from ABC asked about a reporting discrepancy on AReS
<ul>
<li>I think it’s because the last harvest was over the weekend, and she was expecting to see items submitted this week</li>
</ul>
</li>
<li>Paola from ABC said they are decommissioning the server where many of their library PDFs are hosted
<ul>
<li>She asked if we can download them and upload them directly to CGSpace</li>
</ul>
</li>
<li>I re-created my local Artifactory container</li>
<li>I am doing a walkthrough of DSpace 7.3-SNAPSHOT to see how things are lately
<ul>
<li>One thing I realized is that OAI is no longer a standalone web application, it is part of the <code>server</code> app now: http://localhost:8080/server/oai/request?verb=Identify</li>
</ul>
</li>
<li>Deploy PostgreSQL 12 on CGSpace (linode18) but don’t switch over yet, because I see some users active
<ul>
<li>I did this on DSpace Test in 2022-02 so I just followed the same procedure</li>
<li>After that I ran all system updates and rebooted the server</li>
</ul>
</li>
</ul>
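<ul>
<li>Since OAI now lives under the <code>server</code> app in DSpace 7, request URLs are built from that base path. A small sketch of assembling them follows. The helper name <code>oai_request_url</code> is an assumption for illustration, and the base URL is the local one from my notes above</li>
</ul>

```python
from urllib.parse import urlencode


def oai_request_url(server_base, verb, **params):
    """Build an OAI-PMH request URL under the DSpace 7 server webapp.

    server_base is the URL of the server app, for example
    http://localhost:8080/server; extra OAI-PMH arguments go in params.
    """
    query = urlencode({"verb": verb, **params})
    return f"{server_base.rstrip('/')}/oai/request?{query}"


print(oai_request_url("http://localhost:8080/server", "Identify"))
```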
<h2id="2022-03-25">2022-03-25</h2>
<ul>
<li>Looking at the PostgreSQL database size on CGSpace after the update yesterday:</li>
<li>The space saving in indexes of recent PostgreSQL releases is awesome!</li>
<li>Import a DSpace 6.x database dump from production into my local DSpace 7 database
<ul>
<li>I still see the same errors <ahref="/cgspace-notes/2021-04/">I saw in 2021-04</a> when testing DSpace 7.0 beta 5</li>
<li>I had to delete some old migrations, as well as all Atmire ones first:</li>
</ul>
</li>
</ul>
<divclass="highlight"><pretabindex="0"style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><codeclass="language-console"data-lang="console"><spanstyle="display:flex;"><span>localhost/dspace7= ☘ DELETE FROM schema_version WHERE version IN ('5.0.2017.09.25', '6.0.2017.01.30', '6.0.2017.09.25');
</span></span><spanstyle="display:flex;"><span>localhost/dspace7= ☘ DELETE FROM schema_version WHERE description LIKE '%Atmire%' OR description LIKE '%CUA%' OR description LIKE '%cua%';
</span></span></code></pre></div><ul>
<li>Then I was able to migrate to DSpace 7 with <code>dspace database migrate ignored</code> as the <ahref="https://wiki.lyrasis.org/display/DSDOC7x/Upgrading+DSpace">DSpace upgrade notes say</a>
<ul>
<li>I see that the <ahref="https://github.com/DSpace/dspace-angular/issues/1357">flash of unstyled content bug</a> still exists on dspace-angular… ouch!</li>
</ul>
</li>
<li>Start a harvest on AReS</li>
</ul>
<h2id="2022-03-26">2022-03-26</h2>
<ul>
<li>Update dspace-statistics-api to Falcon 3.1.0 and <ahref="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.4.3">release v1.4.3</a></li>
</ul>
<h2id="2022-03-28">2022-03-28</h2>
<ul>
<li>Create another test account for Rafael from Bioversity-CIAT to submit some items to DSpace Test:</li>
</ul>
<divclass="highlight"><pretabindex="0"style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><codeclass="language-console"data-lang="console"><spanstyle="display:flex;"><span>$ dspace user -a -m tip-submit@cgiar.org -g CIAT -s Submit -p <spanstyle="color:#e6db74">'fuuuuuuuu'</span>
</span></span></code></pre></div><ul>
<li>I added the account to the Alliance Admins group, which should allow him to submit to any Alliance collection
<ul>
<li>According to my notes from <ahref="/cgspace-notes/2020-10/">2020-10</a> the account must be in the admin group in order to submit via the REST API</li>
</ul>
</li>
<li>Abenet and I noticed 1,735 items in CTA’s community that have the title “delete”
<ul>
<li>We asked Peter and he said we should delete them</li>
<li>I exported the CTA community metadata and used OpenRefine to filter all items with the “delete” title, then used the “expunge” bulkedit action to remove them</li>
</ul>
</li>
<li>I realized I forgot to clean up the old Let’s Encrypt certbot stuff after upgrading CGSpace (linode18) to Ubuntu 20.04 a few weeks ago
<ul>
<li>I also removed the pre-Ubuntu 20.04 Let’s Encrypt stuff from the Ansible infrastructure playbooks</li>
</ul>
</li>
</ul>
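<ul>
<li>I did the “delete” title filtering in OpenRefine, but the same step could be scripted. The sketch below assumes the exported metadata CSV has an <code>id</code> column and a title column named <code>dc.title[en_US]</code> (the real column name may differ), and it writes a two-column CSV with the bulkedit <code>action</code> set to <code>expunge</code>. The function name <code>build_expunge_csv</code> is hypothetical</li>
</ul>

```python
import csv
import io


def build_expunge_csv(metadata_csv, title_column="dc.title[en_US]"):
    """Return CSV text with id,action rows for items titled exactly 'delete'.

    metadata_csv is the text of a DSpace metadata export. The title column
    name is an assumption and should be adjusted to match the export.
    """
    reader = csv.DictReader(io.StringIO(metadata_csv))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "action"])
    for row in reader:
        title = (row.get(title_column) or "").strip()
        if title == "delete":
            writer.writerow([row["id"], "expunge"])
    return out.getvalue()


# Tiny made-up export to show the shape of the input and output
sample = (
    "id,collection,dc.title[en_US]\n"
    "a1,10568/1,delete\n"
    "b2,10568/1,Real report\n"
)
print(build_expunge_csv(sample))
```

<ul>
<li>Expunging is permanent, so it is worth reviewing the resulting CSV before running it through the metadata import</li>
</ul>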
<h2id="2022-03-29">2022-03-29</h2>
<ul>
<li>Gaia sent me her notes on the final review of duplicates of all TAC/ICW documents
<ul>
<li>I created a filter in LibreOffice and selected the IDs for items with the action “delete”, then I created a custom text facet in OpenRefine with this GREL:</li>
</ul>
</li>
</ul>
<pretabindex="0"><code>or(
isNotNull(value.match('33')),
isNotNull(value.match('179')),
isNotNull(value.match('452')),
isNotNull(value.match('489')),
isNotNull(value.match('541')),
isNotNull(value.match('568')),
isNotNull(value.match('646')),
isNotNull(value.match('889'))
)
</code></pre><ul>
<li>Then I flagged all matching records, exported a CSV to use with SAFBuilder, and imported the 692 items on CGSpace, and generated the thumbnails:</li>
<li>After that I did some normalization on the <code>cg.subject.system</code> metadata and extracted a few dozen countries to the country field</li>
<li>Start a harvest on AReS</li>
</ul>
<h2id="2022-03-30">2022-03-30</h2>
<ul>
<li>Yesterday Rafael from CIAT asked me to re-create his approver account on DSpace Test as well</li>
</ul>
<divclass="highlight"><pretabindex="0"style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><codeclass="language-console"data-lang="console"><spanstyle="display:flex;"><span>$ dspace user -a -m tip-approve@cgiar.org -g Rafael -s Rodriguez -p <spanstyle="color:#e6db74">'fuuuu'</span>
</span></span></code></pre></div><ul>
<li>I started looking into the request regarding the CIAT Library PDFs
<ul>
<li>There are over 4,000 links to PDFs hosted on that server in CGSpace metadata</li>
<li>The links seem to be down though! I emailed Paola to ask</li>
</ul>
</li>
</ul>
<h2id="2022-03-31">2022-03-31</h2>
<ul>
<li>Switch DSpace Test (linode26) back to CMS GC so I can do some monitoring and evaluation of GC before switching to G1GC</li>
<li>I will do the following for CMS and G1GC on DSpace Test:</li>
<li>Leroy from CIAT said that the CIAT Library server has security issues so was limited to internal traffic
<ul>
<li>I extracted a list of URLs from CGSpace to send him:</li>
</ul>
</li>
</ul>
<divclass="highlight"><pretabindex="0"style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><codeclass="language-console"data-lang="console"><spanstyle="display:flex;"><span>localhost/dspacetest= ☘ \COPY (SELECT DISTINCT(text_value) FROM metadatavalue WHERE metadata_field_id=219 AND text_value ~ 'https?://ciat-library') to /tmp/2022-03-31-ciat-library-urls.csv WITH CSV HEADER;