<li>Last week Bizu reported an issue with the “browse by issue date” drop-down
<ul>
<li>I verified it, and suspect it could be due to missing issue dates…</li>
<li>It might be this issue: <a href="https://github.com/DSpace/dspace-angular/issues/2808">https://github.com/DSpace/dspace-angular/issues/2808</a></li>
</ul>
</li>
</ul>
<ul>
<li>I spent some time trying to reproduce the bug affecting <code>onebox</code> fields that are configured to use external vocabularies and are not repeatable
<ul>
<li>I filed an issue: <a href="https://github.com/DSpace/dspace-angular/issues/2846">https://github.com/DSpace/dspace-angular/issues/2846</a></li>
</ul>
</li>
</ul>
<h2 id="2024-03-03">2024-03-03</h2>
<ul>
<li>I did some cleanups on abstracts, licenses, and dates from Crossref</li>
<li>I also did some minor cleanups to affiliations because I saw some incorrect and duplicate ones in our list</li>
<li>I tried a new technique to get some affiliations from Crossref using OpenRefine
<ul>
<li>First I split them and clustered, resolving a few hundred clusters out of 1500 (!)</li>
<li>Then I used a custom text facet with a few dozen CGIAR and other large affiliations to reduce the work</li>
<li>Then I joined them with our affiliations, paying no attention to duplicates</li>
<li>Then I deduped them using the Jython technique I learned in 2023-02</li>
</ul>
</li>
</ul>
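<p>The final deduplication step above can be sketched in plain Python. This is a minimal sketch, not necessarily the Jython expression used in OpenRefine: it assumes the joined affiliations sit in one cell separated by <code>||</code>, and the function name and sample values are hypothetical.</p>

```python
def dedupe_multivalue(value, separator="||"):
    """Drop repeated entries from a separator-joined cell, keeping first-seen order.

    The "||" separator is an assumption; adjust it to match the real column.
    """
    seen = set()
    unique = []
    for part in value.split(separator):
        part = part.strip()
        # Skip empty fragments and anything already emitted
        if part and part not in seen:
            seen.add(part)
            unique.append(part)
    return separator.join(unique)


# Hypothetical sample cell with one repeated affiliation
print(dedupe_multivalue("Org A||Org B||Org A"))  # prints Org A||Org B
```

<p>In an OpenRefine Jython transform the same loop body would operate on the built-in <code>value</code> and end with a <code>return</code> of the joined string.</p>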
<h2 id="2024-03-06">2024-03-06</h2>
<ul>
<li>Peter sent me some more corrections for the authors that I had sent him in 2023-12</li>
</ul>
<h2 id="2024-03-08">2024-03-08</h2>
<ul>
<li>IFPRI sent me their 2023 records from CONTENTdm so I started working on those
<ul>
<li>I found a way to match their ORCID identifiers in our list using Jython in OpenRefine:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#f92672">import</span> re
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">with</span> open(<span style="color:#e6db74">r</span><span style="color:#e6db74">"/tmp/cg-creator-identifier.txt"</span>,<span style="color:#e6db74">'r'</span>) <span style="color:#66d9ef">as</span> f :
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">return</span> value
</span></span></code></pre></div><ul>
<li>I realized that <a href="https://www.unicef.org/about-unicef/frequently-asked-questions#3">UNICEF was renamed to its current name in 1953</a> so I replaced all other variations in our vocabularies and metadata:</li>
<li>Experimenting with moving some of my Python scripts to the DSpace 7 REST API
<ul>
<li>I need a way to get UUIDs for Handles…</li>
<li>Seems that I can use a Discovery query like: <a href="https://dspace7test.ilri.org/server/api/discover/search/objects?dsoType=item&query=handle:10568/130864">https://dspace7test.ilri.org/server/api/discover/search/objects?dsoType=item&query=handle:10568/130864</a></li>
<li>Then just take the first result…?</li>
</ul>
</li>
<li>I spent some time working on the script to get abstracts from CGSpace, and found a bug in my logic
<ul>
<li>I also noticed that one item had two abstracts, but the first one was blank!</li>
<li>Looking deeper, I found 113 blank metadata values so I deleted those:</li>
<li>I deployed the change to disable Angular SSR’s <code>inlineCriticalCss</code> on production because we had heavy load on the frontend and I’ve been meaning to do this permanently for some time</li>
<li>Maria asked me for a CSV with all the broken Bioversity permalinks so I exported them for her:</li>
<li>Run the duplicate checker for IFPRI 2023 batch upload</li>
</ul>
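<p>The Handle-to-UUID lookup via Discovery mentioned earlier could be sketched roughly as follows. This is only a sketch under assumptions: the helper names are hypothetical, and the nested response layout (<code>_embedded.searchResult._embedded.objects[…]._embedded.indexableObject.uuid</code>) is assumed and should be checked against a real response before relying on it.</p>

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the Discovery query shown in the notes
API_BASE = "https://dspace7test.ilri.org/server/api"


def build_handle_query(handle):
    """Build a Discovery search URL restricted to items with the given Handle."""
    params = urlencode({"dsoType": "item", "query": f"handle:{handle}"})
    return f"{API_BASE}/discover/search/objects?{params}"


def first_uuid(response):
    """Return the UUID of the first search hit, or None when there are no hits.

    The key path below is an assumed response layout, not confirmed by the notes.
    """
    objects = (
        response.get("_embedded", {})
        .get("searchResult", {})
        .get("_embedded", {})
        .get("objects", [])
    )
    if not objects:
        return None
    return objects[0].get("_embedded", {}).get("indexableObject", {}).get("uuid")


def handle_to_uuid(handle):
    """Query the REST API and take the first result, as suggested in the notes."""
    with urlopen(build_handle_query(handle), timeout=30) as resp:
        return first_uuid(json.load(resp))


if __name__ == "__main__":
    print(handle_to_uuid("10568/130864"))
```

<p>Returning <code>None</code> for an empty result set keeps the "just take the first result" shortcut from raising on Handles that Discovery does not index.</p>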
<h2 id="2024-03-13">2024-03-13</h2>
<ul>
<li>I found about 428 duplicates in the IFPRI 2023 batch records
<ul>
<li>Alarmingly, I found about 18 that are duplicated on CGSpace as well!</li>
<li>I looked closer and decided that 11 were duplicates, so I merged the metadata and withdrew the later ones</li>
</ul>
</li>
<li>Alliance asked me to get him the Handles for items submitted by TIP that are not discoverable
<ul>
<li>I found it easiest to use the <code>ds6_item2itemhandle</code> <a href="https://wiki.lyrasis.org/display/DSPACE/Helper+SQL+functions+for+DSpace+6">DSpace SQL helper function</a> with a nested query on the provenance:</li>
<li>IWMI sent me some new author ORCID identifiers so I updated our list</li>
<li>Started working on updating my data for the Ontology CoP webinar on CGIAR and AGROVOC
<ul>
<li>First extracting all unique subjects on CGSpace:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code>localhost/dspace7= ☘ \COPY (SELECT DISTINCT(lower(text_value)) AS "subject" FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (187, 120, 210, 122, 215, 127, 208, 124, 128, 123, 125, 135, 203, 236, 238, 119)) to /tmp/2024-03-19-cgspace-subjects.csv WITH CSV HEADER;
COPY 28024
</code></pre><ul>
<li>Then I extracted the subjects and looked them up against AGROVOC:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ csvcut -c subject /tmp/2024-03-19-cgspace-subjects.csv | sed <span style="color:#e6db74">'1d'</span> > /tmp/2024-03-19-cgspace-subjects.txt
</span></span></code></pre></div><ul>
<li>Identify seven duplicates on CGSpace from the PRMS results and withdraw them from CGSpace</li>
</ul>
<h2 id="2024-03-21">2024-03-21</h2>
<ul>
<li>Look more closely at duplicates on CGSpace based on a fresh export
<ul>
<li>Using DOIs I found ~842 that occur more than once for journal articles alone, so probably around 400 duplicates</li>
<li>I did a handful of them, merging the metadata and withdrawing the duplicate, and decided to add <code>dcterms.replaces</code> with the handle in the original</li>
</ul>
</li>
</ul>
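<p>Finding DOIs that occur more than once in an export can be sketched like this. It is a minimal illustration, not the exact method used here: the column name <code>cg.identifier.doi</code>, the <code>||</code> multi-value separator, and the sample rows are all assumptions.</p>

```python
import csv
import io
from collections import Counter

# URL-style prefixes stripped so the same DOI written differently compares equal
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Lowercase a DOI and strip a leading resolver prefix, if any."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def repeated_dois(rows, column="cg.identifier.doi"):
    """Return {doi: count} for every normalized DOI seen more than once."""
    counts = Counter()
    for row in rows:
        # Assumed: multiple values in one cell are joined with "||"
        for value in (row.get(column) or "").split("||"):
            doi = normalize_doi(value)
            if doi:
                counts[doi] += 1
    return {doi: n for doi, n in counts.items() if n > 1}


# Hypothetical export with one DOI written two different ways
sample = io.StringIO(
    "id,cg.identifier.doi\n"
    "1,https://doi.org/10.1000/ABC\n"
    "2,10.1000/abc\n"
    "3,10.1000/xyz\n"
)
print(repeated_dois(csv.DictReader(sample)))  # prints {'10.1000/abc': 2}
```

<p>Each repeated DOI still needs a manual look before merging metadata and withdrawing an item, since a shared DOI does not by itself prove two records are true duplicates.</p>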
<h2 id="2024-03-22">2024-03-22</h2>
<ul>
<li>Look at duplicate DOIs on CGSpace and address a dozen or so</li>
</ul>
<h2 id="2024-03-23">2024-03-23</h2>
<ul>
<li>Look at duplicate DOIs on CGSpace and address a dozen or so</li>
<li>Update Tomcat and Solr to latest versions
<ul>
<li>I had done some tests with these last week, and did a last-minute test on DSpace 7 Test to make sure submission and searching worked</li>
</ul>
</li>
</ul>
<h2 id="2024-03-24">2024-03-24</h2>
<ul>
<li>Slowly process several dozen more duplicate DOIs on CGSpace, sigh…</li>