Last month Sisay asked why the following “most popular” statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
DSpace Test
CGSpace
Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community
<li>Last month Sisay asked why the following “most popular” statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul>
<ul>
<li>If I change the parameters to 2019 I see stats, so I’m really thinking it has something to do with the sharded yearly Solr statistics cores
<ul>
<li>I checked the Solr admin UI and I see all Solr cores loaded, so I don’t know what it could be</li>
<li>When I check the Atmire content and usage module it seems obvious that there is a problem with the old cores because I don’t have anything before 2019-01</li>
<li>I don’t see anyone logged in right now so I’m going to try to restart Tomcat and see if the stats are accessible after Solr comes back up</li>
<li><p>I decided to run all system updates on the server (linode18) and reboot it</p>
<ul>
<li>After rebooting, Tomcat came back up, but the Solr statistics cores were not all loaded</li>
<li><p>The error is always (with a different core):</p>
<li><p>I stopped Tomcat, deleted the old locks, and will try to use the “simple” lock file type in <code>solr/statistics/conf/solrconfig.xml</code>:</p>
<li><p>Now the cores seem to load, but I still see an error in the Solr Admin UI and I still can’t access any stats before 2018</p></li>
<li><p>I filed an <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">issue with Atmire</a>, so let’s see if they can help</p></li>
<li><p>And since I’m annoyed and it’s been a few months, I’m going to move the JVM heap settings that I’ve been testing on DSpace Test to CGSpace</p></li>
<li><p>Sisay had already done the SAFBundle so I made some minor corrections and uploaded them to a temporary collection so I could check them in OpenRefine:</p>
<li><p>I noticed that all twenty-seven items had double dates like “2019-05||2019-05” so I fixed those, but the rest of the metadata looked good so I unmapped them from the temporary collection</p></li>
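<li><p>A minimal sketch of how doubled values like these could be collapsed in a CSV before upload. It assumes <code>||</code> is the multi-value separator, and <code>dedupe_multivalue</code> is a hypothetical helper, not what I actually used in OpenRefine:</p>

```python
def dedupe_multivalue(cell, separator="||"):
    """Remove repeated values from a multi-value cell, keeping first-seen order."""
    seen = []
    for value in cell.split(separator):
        value = value.strip()
        # Skip empty fragments and anything we have already kept
        if value and value not in seen:
            seen.append(value)
    return separator.join(seen)


print(dedupe_multivalue("2019-05||2019-05"))
```
</li>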
<li><p>Finish looking at the fifty-six AfricaRice items and upload them to CGSpace:</p>
<pre><code>$ dspace import -a -e me@cgiar.org -m 2019-07-02-AfricaRice-11to73.map -s /tmp/SimpleArchiveFormat
</code></pre></li>
<li><p>Peter pointed out that the Sharefair dates I fixed were not actually fixed</p>
<ul>
<li>It seems there is a bug that causes DSpace to not detect changes if the values are the same like “2019-05||2019-05” and you try to remove one</li>
<li>To get it to work I had to change some of them to 2019-01, then remove them</li>
<li>Atmire responded about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr issue</a> and said they would be willing to help</li>
<li>Send a reminder to Marie about my notes on the <a href="https://github.com/AgriculturalSemantics/cg-core/issues/2">CG Core v2 issue I created two weeks ago</a></li>
<li><p>Playing with the idea of using <a href="https://github.com/BurntSushi/xsv">xsv</a> to do some basic batch quality checks on CSVs, for example to find items that might be duplicates if they have the same DOI or title:</p>
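<li><p>A rough Python equivalent of that duplicate check, as a sketch only (this is not xsv). The column names <code>title</code> and <code>doi</code>, the sample rows, and the <code>find_duplicates</code> helper are all made up for illustration:</p>

```python
import csv
import io
from collections import defaultdict


def find_duplicates(rows, column):
    """Return {normalized value: [row numbers]} for values seen more than once."""
    seen = defaultdict(list)
    for number, row in enumerate(rows, start=1):
        # Compare case-insensitively and ignore surrounding whitespace
        value = (row.get(column) or "").strip().lower()
        if value:
            seen[value].append(number)
    return {value: numbers for value, numbers in seen.items() if len(numbers) > 1}


# Tiny made-up sample standing in for a real metadata CSV
sample = io.StringIO(
    "title,doi\n"
    "Rice in Africa,https://doi.org/10.1000/example1\n"
    "Maize in Asia,https://doi.org/10.1000/example2\n"
    "rice in africa,https://doi.org/10.1000/EXAMPLE1\n"
)
rows = list(csv.DictReader(sample))

print(find_duplicates(rows, "doi"))
print(find_duplicates(rows, "title"))
```

<p>Matching is deliberately naive (lowercase and trim only), so it will flag candidates to review by hand rather than prove two items are the same.</p></li>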
<li><p>Or perhaps items with invalid ISSNs (according to the <a href="https://en.wikipedia.org/wiki/International_Standard_Serial_Number#Code_format">ISSN code format</a>):</p>
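<li><p>For reference, a sketch of an ISSN check in plain Python. It tests both the <code>NNNN-NNNC</code> shape and the modulus 11 check digit described on that page; <code>is_valid_issn</code> is a hypothetical helper name, and requiring the hyphen is my assumption:</p>

```python
import re

# Four digits, a hyphen, three digits, then a digit or X as the check character
ISSN_PATTERN = re.compile(r"\d{4}-\d{3}[\dX]")


def is_valid_issn(issn):
    """Check the ISSN format and its modulus 11 check digit."""
    if not ISSN_PATTERN.fullmatch(issn):
        return False
    digits = issn.replace("-", "")
    # Weight the first seven digits 8, 7, 6, 5, 4, 3, 2 and sum them
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


for candidate in ["0378-5955", "2434-561X", "0378-5954", "03785955"]:
    print(candidate, is_valid_issn(candidate))
```
</li>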
<li>Skype call with Marie Angelique about CG Core v2
<ul>
<li>We discussed my comments and suggestions from last week</li>
<li>One comment she had was that we should try to move our center-specific subjects into <code>DCTERMS.subject</code> and normalize them against AGROVOC</li>
<li>I updated my <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">gist about CGSpace metadata changes</a></li>
<li>Yesterday Thierry from CTA asked me about an error he was getting while submitting an item on CGSpace: “Unable to load Submission Information, since WorkspaceID (ID:S106658) is not a valid in-process submission.”</li>
<li><p>I looked in the DSpace logs and found this right around the time of the screenshot he sent me:</p>
<pre><code>2019-07-10 11:50:27,433 INFO org.dspace.submit.step.CompleteStep @ lewyllie@cta.int:session_id=A920730003BCAECE8A3B31DCDE11A97E:submission_complete:Completed submission with id=106658
</code></pre></li>
<li><p>I’m assuming something happened in his browser (like a refresh) after the item was submitted…</p></li>
<li>Atmire responded with some initial feedback about our Tomcat configuration related to the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr issue I raised recently</a>
<ul>
<li>Unfortunately there is no concrete feedback yet</li>
<li>I think we need to upgrade our DSpace Test server so we can fit all the Solr cores…</li>
<li>Actually, I looked and there were over 40 GB free on DSpace Test so I copied the Solr statistics cores for the years 2017 to 2010 from CGSpace to DSpace Test because they weren’t actually very large</li>
<li>I re-deployed DSpace for good measure, and I think all Solr cores are loading… I will do more tests later</li>
</ul></li>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li><p>Tried to run <code>dspace cleanup -v</code> on CGSpace and ran into an error:</p>
<pre><code>Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(167394) is still referenced from table "bundle".
</code></pre></li>
<li><p>The solution is, as always:</p>
<pre><code># su - postgres
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (167394);'
</code></pre></li>
<li><p>Completely reset the Podman configuration on my laptop because there were some layers that I couldn’t delete and it had been some time since I did a cleanup:</p>
<pre><code>$ podman system prune -a -f --volumes
$ sudo rm -rf ~/.local/share/containers
</code></pre></li>
<li><p>Then pull a new PostgreSQL 9.6 image and load a CGSpace database dump into a new local test container:</p>
<li><p>Start working on implementing the <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">CG Core v2 changes</a> on my local DSpace test environment</p></li>
<li><p>Make a pull request to CG Core v2 with some fixes for typos in the specification (<a href="https://github.com/AgriculturalSemantics/cg-core/pull/5">#5</a>)</p></li>
<li>Talk to Moayad about the remaining issues for OpenRXV / AReS
<ul>
<li>He sent a pull request with some changes for the bar chart and documentation about configuration, and said he’d finish the export feature next week</li>
</ul></li>
<li><p>Sisay said a user was having problems registering on CGSpace and it looks like the email account expired again:</p>
<pre><code>$ dspace test-email
About to send test email:
- To: blahh@cgiar.org
- Subject: DSpace test email
- Server: smtp.office365.com
Error sending email:
- Error: javax.mail.AuthenticationFailedException
Please see the DSpace documentation for assistance.
</code></pre></li>
<li><p>I emailed ICT to ask them to reset it and make the expiration period longer if possible</p></li>
<li><p>Raise an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/8">issue on CG Core v2 spec regarding country and region coverage</a></p>
<ul>
<li><p>The current standard has them implemented as a class like this:</p>
<li>Generate a list of the ORCID identifiers that we added to CGSpace in 2019 for Sara Jani at ICARDA</li>
<li><p>Bioversity sent a new file for their migration to CGSpace</p>
<ul>
<li>There is always a blank row and blank column at the end</li>
<li>One invalid type (Brie)</li>
<li>824 items with leading/trailing spaces in dc.identifier.citation</li>
<li>175 items with a trailing comma in dc.identifier.citation (using custom text facet with GREL <code>value.endsWith(',').toString()</code>)</li>
<li>Fix them with GREL transform: <code>value.replace(/,$/, '')</code></li>
<li>A few strange publishers after splitting multi-value cells, like “(Belgium)”</li>
<li>Deleted four ISSNs that are actually ISBNs and are already present in the ISBN field</li>
<li>Eight invalid ISBNs</li>
<li>Convert all DOIs to “<a href="https://doi.org">https://doi.org</a>” format and fix one invalid DOI</li>
<li>Fix a handful of incorrect CRPs that seem to have been split on comma “,”</li>
<li>Lots of strange values in cg.link.reference, and I normalized all DOIs to <a href="https://doi.org">https://doi.org</a> format</li>
<li>There are lots of invalid links here, like “36” and “recordlink:publications:2606” and “t3://record?identifier=publications&uid=2606”</li>
<li>Also there are hundreds of items that use the same value for cg.link.reference AND cg.link.dataurl</li>
<li>Use https:// for all Bioversity links (reference, data url, permalink)</li>
</ul></li>
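<li><p>The citation and DOI fixes above could also be scripted. A minimal sketch in Python, assuming the DOI variants to handle are <code>http(s)://doi.org/</code>, <code>http(s)://dx.doi.org/</code>, <code>doi:</code> prefixes and bare DOIs (the source file may contain other forms). <code>clean_citation</code> and <code>normalize_doi</code> are hypothetical helpers, and the sample values are made up:</p>

```python
import re

# Prefixes to strip before re-adding the canonical one (an assumed list)
DOI_PREFIXES = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def clean_citation(value):
    """Trim leading/trailing whitespace and drop one trailing comma."""
    value = value.strip()
    # Same idea as the GREL transform value.replace(/,$/, '')
    value = re.sub(r",$", "", value)
    return value.strip()


def normalize_doi(value):
    """Rewrite a DOI in any of the assumed forms to https://doi.org/ format."""
    bare = DOI_PREFIXES.sub("", value.strip())
    return "https://doi.org/" + bare


print(clean_citation("  Author, A. 2019. Some title, "))
print(normalize_doi("http://dx.doi.org/10.1000/example"))
```

<p>This does not check that the DOI itself is valid, only that the prefix is consistent.</p></li>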
<li><p>I might be able to use <a href="https://pypi.org/project/isbnlib/">isbnlib</a> to validate ISBNs in Python:</p>
<pre><code>import isbnlib

if isbnlib.is_isbn10('9966-955-07-0') or isbnlib.is_isbn13('9966-955-07-0'):
    print("Yes")
else:
    print("No")
</code></pre></li>
<li><p>Or with <a href="https://github.com/arthurdejong/python-stdnum">python-stdnum</a>:</p>
<li>Add support for removing newlines (line feeds) to <a href="https://git.sr.ht/~alanorth/csv-metadata-quality">csv-metadata-quality</a></li>
<li>On the subject of validating some of our fields like countries and regions, Abenet pointed out that these should all be valid AGROVOC terms, so we can actually try to validate against that!</li>
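<li><p>A sketch of what that validation could look like once a list of accepted terms is available locally. The <code>accepted_terms</code> set here is a hand-written stand-in, not fetched from AGROVOC, and <code>find_invalid_terms</code> is a hypothetical helper:</p>

```python
def find_invalid_terms(cell, accepted_terms, separator="||"):
    """Return the values in a multi-value cell that are not in the accepted set."""
    # Compare case-insensitively so "Kenya" and "KENYA" are treated alike
    accepted = {term.lower() for term in accepted_terms}
    values = [value.strip() for value in cell.split(separator)]
    return [value for value in values if value and value.lower() not in accepted]


# Stand-in vocabulary for illustration only
accepted_terms = {"Kenya", "Ethiopia", "Eastern Africa"}

print(find_invalid_terms("KENYA||Ethopia||Eastern Africa", accepted_terms))
```

<p>In practice the accepted set would have to come from AGROVOC itself, which this sketch does not attempt.</p></li>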