<meta property="og:description" content="2023-12-01 There is still high load on CGSpace and I don’t know why I don’t see a high number of sessions compared to previous days in the last few weeks $ for file in dspace.log.2023-11-[23]*; do echo &quot;$file&quot;; grep -a -oE 'session_id=[A-Z0-9]{32}' &quot;$file&quot; | sort | uniq | wc -l; done dspace.log.2023-11-20 22865 dspace.log.2023-11-21 20296 dspace.log.2023-11-22 19688 dspace.log.2023-11-23 17906 dspace.log.2023-11-24 18453 dspace.log.2023-11-25 17513 dspace.log.2023-11-26 19037 dspace.log.2023-11-27 21103 dspace.log.2023-11-28 23023 dspace.log.2023-11-29 23545 dspace."/>
<li>Send a message to Altmetric support because the item IWMI highlighted last month still doesn’t show the attention score for the Handle after I tweeted it several times weeks ago</li>
<li>Spent some time writing a Python script to fix the literal MaxMind City JSON objects in our Solr statistics
<ul>
<li>There are about 1.6 million of these, so I exported them using solr-import-export-json with the query <code>city:com*</code> but ended up finding many that have missing bundles, container bitstreams, etc:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code>city:com* AND -bundleName:[* TO *] AND -containerBitstream:[* TO *] AND -file_id:[* TO *] AND -owningItem:[* TO *] AND -version_id:[* TO *]
</code></pre><ul>
<li>(Note the negation to find fields that are missing)</li>
<li>I don’t know what I want to do with these yet</li>
</ul>
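<p>The negated range clauses in the query above can be generated from a list of field names. This is only a minimal Python sketch: the field names come from the query itself, while the helper name <code>missing_fields_query</code> is hypothetical and not part of any script mentioned here.</p>

```python
def missing_fields_query(base_query, fields):
    """Build a Solr query matching documents where none of `fields` has a value.

    Hypothetical helper for illustration only.
    """
    # -field:[* TO *] excludes every document that has any value for the field
    negations = [f"-{field}:[* TO *]" for field in fields]
    return " AND ".join([base_query, *negations])


fields = ["bundleName", "containerBitstream", "file_id", "owningItem", "version_id"]
query = missing_fields_query("city:com*", fields)
print(query)
```

<p>Keeping the field names in a list makes it easy to add or drop a field while deciding what to do with these records.</p>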
<h2 id="2023-12-05">2023-12-05</h2>
<ul>
<li>I finished the <code>fix_maxmind_stats.py</code> script and fixed 1.6 million records and imported them on CGSpace after testing on DSpace 7 Test</li>
<li>Altmetric said there was a glitch regarding the Handle and DOI linking and they successfully re-scraped the item page and linked them
<ul>
<li>They sent me a list of current production IPs and I notice that some of them are in our nginx bot network list:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ <span style="color:#66d9ef">for</span> network in <span style="color:#66d9ef">$(</span>csvcut -c network /tmp/ips.csv | sed 1d | sort -u<span style="color:#66d9ef">)</span>; <span style="color:#66d9ef">do</span> grepcidr $network ~/src/git/rmg-ansible-public/roles/dspace/files/nginx/bot-networks.conf; <span style="color:#66d9ef">done</span>
<li>Finalized the script to generate Solr statistics for Alliance researcher Mirjam
<ul>
<li>The script is <code>ilri/generate_solr_statistics.py</code></li>
<li>I generated ~3,200 statistics based on her records of the download statistics of <a href="https://hdl.handle.net/10568/131997">that item</a> and imported them on CGSpace</li>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace7= ☘ \COPY (SELECT DISTINCT text_value AS "dc.contributor.author", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 3 GROUP BY "dc.contributor.author" ORDER BY count DESC) to /tmp/2023-12-08-authors.csv WITH CSV HEADER;
<li>The Alliance TIP team is testing deposits to the DSpace 7 REST API and getting an HTTP 500 error
<ul>
<li>In the DSpace logs I see this after they log in, create the item, and update the metadata:</li>
</ul>
</li>
</ul>
<pre tabindex="0"><code>2023-12-19 17:49:28,022 ERROR unknown unknown org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
</code></pre><ul>
<li>I found some messages on the dspace-tech mailing list suggesting this might be an old bug: <a href="https://groups.google.com/g/dspace-tech/c/My1GUFYFGoU/m/tS7-WAJPAwAJ">https://groups.google.com/g/dspace-tech/c/My1GUFYFGoU/m/tS7-WAJPAwAJ</a>
<ul>
<li>I restarted Tomcat and told the Alliance TIP team to try again</li>
</ul>
</li>
</ul>
<h2 id="2023-12-20">2023-12-20</h2>
<ul>
<li>The Alliance guys said that submitting via REST works now… sigh, so that’s just some old DSpace 5/6 REST API bug</li>
<li>I lowercased all our AGROVOC keywords in <code>dcterms.subject</code> in SQL:</li>
</span></span><span style="display:flex;"><span>dspace=*# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=187 AND text_value ~ '[[:upper:]]';
<li>Then I saw the size of the snapshot reach the size of the index…</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># du -sh /var/solr/data/configsets/statistics/data/*
<li>Interestingly the import worked fine, but created a new data index:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># du -sh /var/solr/data/configsets/statistics/data/*
<li>Not sure of the implications of that, but Solr uses the data just fine</li>
<li>I can surely use this for atomic Solr backups</li>
</ul>
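<p>For reference, a rough Python sketch of how snapshot and restore requests could be built, assuming the snapshot was taken through Solr's replication handler (these notes don't say which mechanism was used). The host, port, core name, snapshot name, and backup location below are all assumptions for illustration, and the helper <code>replication_url</code> is hypothetical. It only builds the URLs and does not send any request.</p>

```python
from urllib.parse import urlencode


def replication_url(base_url, core, command, **params):
    """Build a URL for Solr's replication handler (nothing is sent).

    Hypothetical helper; base_url and core are assumptions.
    """
    query = urlencode({"command": command, **params})
    return f"{base_url.rstrip('/')}/{core}/replication?{query}"


# Assumed values: a local Solr on the default port and a core named "statistics"
base = "http://localhost:8983/solr"
backup_url = replication_url(
    base, "statistics", "backup",
    name="statistics-2023-12-20",   # hypothetical snapshot name
    location="/tmp/solr-backups",   # hypothetical backup directory
)
restore_url = replication_url(
    base, "statistics", "restore",
    name="statistics-2023-12-20",
    location="/tmp/solr-backups",
)
print(backup_url)
print(restore_url)
```

<p>Requesting the first URL would start a snapshot and the second would restore from it; progress can be checked with the handler's <code>details</code> and <code>restorestatus</code> commands.</p>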
<h2 id="2023-12-27">2023-12-27</h2>
<ul>
<li>Delete duplicate metadata as described in my DSpace issue from last year: <a href="https://github.com/DSpace/DSpace/issues/8253">https://github.com/DSpace/DSpace/issues/8253</a></li>
<li>Do some other metadata cleanups on CGSpace
<ul>
<li>I also looked up our DOIs on Crossref to get some missing abstracts and correct licenses and dates</li>
</ul>
</li>
<li>Some minor work on the CGSpace DSpace 7 theme to fix the navbar on mobile</li>
<li>Some work on the IFPRI ISNAR archive</li>
</ul>
<h2 id="2023-12-28">2023-12-28</h2>
<ul>
<li>I started porting the <a href="https://github.com/ilri/cgspace-java-helpers">cgspace-java-helpers</a> to DSpace 7</li>
<li>Some work on the IFPRI ISNAR archive
<ul>
<li>I ended up going through most of the PDFs to get better dates and abstracts</li>
</ul>
</li>
</ul>
<h2 id="2023-12-29">2023-12-29</h2>
<ul>
<li>I created a new Hetzner server to replace the current DSpace 6 CGSpace next week when we migrate to DSpace 7</li>
<li>Interesting, I haven’t checked for content pointing to legacy domains in several years (!)
<ul>
<li><code>inurl:mahider.cgiar.org</code>: 0 results on Google!</li>
<li><code>inurl:mahider.ilri.org</code>: 2,100 results on Google</li>
<li><code>inurl:mahider.ilri.org inurl:https</code>: 2 results on Google (!)</li>
<li><code>inurl:dspace.ilri.org</code>: 1,390 results on Google</li>
<li><code>inurl:dspace.ilri.org inurl:https</code>: 0 results on Google (!)</li>
</ul>
</li>
<li>So it seems I can do away with the HTTPS virtual hosts finally
<ul>
<li>Well my current certificates expired on 2021-02-13 and nobody noticed… so…</li>