Last month Sisay asked why the following “most popular” statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
DSpace Test
CGSpace
Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community
<li>Last month Sisay asked why the following “most popular” statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul>
<ul>
<li>If I change the parameters to 2019, I see stats, so I’m really thinking it has something to do with the sharded yearly Solr statistics cores
<ul>
<li>I checked the Solr admin UI and I see all Solr cores loaded, so I don’t know what it could be</li>
<li>When I check the Atmire content and usage module, it seems obvious that there is a problem with the old cores, because I don’t have anything before 2019-01</li>
<li>I don’t see anyone logged in right now so I’m going to try to restart Tomcat and see if the stats are accessible after Solr comes back up</li>
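<li><p>To see which yearly shards actually loaded without clicking through the admin UI, Solr’s CoreAdmin STATUS action can be queried directly: it lists loaded cores under <code>status</code> and, where the Solr version reports them, cores that failed to start under <code>initFailures</code>. This is a minimal sketch, assuming the shards use DSpace’s <code>statistics-YYYY</code> naming and that you pass in the right Solr base URL (the <code>localhost:8081</code> in the comment is only a guess):</p>

```python
import json
import urllib.request


def fetch_core_status(solr_url):
    """Fetch the CoreAdmin STATUS report for all cores as a dict.

    solr_url is an assumption, e.g. "http://localhost:8081/solr".
    """
    url = solr_url.rstrip("/") + "/admin/cores?action=STATUS&wt=json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def check_statistics_cores(status, first_year, last_year):
    """Return (missing, failed) core names from a STATUS report.

    Assumes yearly shards are named "statistics-YYYY". The current year
    stays in the main "statistics" core, so pass only past years.
    """
    loaded = set(status.get("status", {}))
    # Some Solr versions may not report initFailures, hence the default
    failed = sorted(status.get("initFailures", {}))
    expected = [f"statistics-{year}" for year in range(first_year, last_year + 1)]
    missing = [core for core in expected if core not in loaded]
    return missing, failed


if __name__ == "__main__":
    # Hand-written report standing in for a live Solr response
    report = {
        "initFailures": {"statistics-2016": "example failure message"},
        "status": {"statistics": {}, "statistics-2017": {}, "statistics-2018": {}},
    }
    print(check_statistics_cores(report, 2016, 2018))
```

<p>Against a live server this would be something like <code>check_statistics_cores(fetch_core_status(url), 2010, 2018)</code>.</p></li>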
<li><p>I decided to run all system updates on the server (linode18) and reboot it</p>
<ul>
<li>After rebooting, Tomcat came back up, but the Solr statistics cores were not all loaded</li>
<li><p>The error is always (with a different core):</p>
<li><p>I stopped Tomcat, deleted the old locks, and will try to use the “simple” lock file type in <code>solr/statistics/conf/solrconfig.xml</code>:</p>
<li><p>Now the cores seem to load, but I still see an error in the Solr Admin UI and I still can’t access any stats before 2018</p></li>
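<li><p>With one core per year there are a lot of stale locks to delete by hand. A minimal sketch that lists, and optionally removes, Lucene <code>write.lock</code> files, assuming each core keeps its index in a <code>data/index</code> directory inside the core directory (check the real paths on the server first). It is only safe while Tomcat is stopped:</p>

```python
import tempfile
from pathlib import Path


def find_lock_files(solr_home):
    """Return Lucene write.lock files in each core's index directory.

    Assumes the layout solr_home/CORE/data/index/write.lock.
    """
    return sorted(Path(solr_home).glob("*/data/index/write.lock"))


def remove_lock_files(solr_home, dry_run=True):
    """Delete the lock files; only do this while Tomcat (and Solr) is stopped."""
    locks = find_lock_files(solr_home)
    for lock in locks:
        if dry_run:
            print(f"would remove {lock}")
        else:
            lock.unlink()
    return locks


if __name__ == "__main__":
    # Build a throwaway Solr home with two cores to demonstrate
    with tempfile.TemporaryDirectory() as home:
        for core in ("statistics", "statistics-2018"):
            index = Path(home, core, "data", "index")
            index.mkdir(parents=True)
            (index / "write.lock").touch()
        remove_lock_files(home)  # dry run only prints
        remove_lock_files(home, dry_run=False)
        print(find_lock_files(home))  # prints []
```

<p>Defaulting to a dry run means nothing is deleted unless you ask for it explicitly.</p></li>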
<li><p>I filed an <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">issue with Atmire</a>, so let’s see if they can help</p></li>
<li><p>And since I’m annoyed and it’s been a few months, I’m going to move the JVM heap settings that I’ve been testing on DSpace Test to CGSpace</p></li>
<li><p>Sisay had already prepared the SAFBundle, so I made some minor corrections to it and uploaded the items to a temporary collection so I could check them in OpenRefine:</p>
<li><p>I noticed that all twenty-seven items had double dates like “2019-05||2019-05” so I fixed those, but the rest of the metadata looked good so I unmapped them from the temporary collection</p></li>
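<li><p>Repeated values like “2019-05||2019-05” are easy to catch before upload by splitting each multi-value cell on <code>||</code> and keeping only the first occurrence of each value. A minimal sketch over a DSpace-style metadata CSV; the <code>dc.date.issued</code> column and the sample rows are made up for illustration:</p>

```python
import csv
import io


def dedupe_multivalue(value, separator="||"):
    """Remove repeated values from a multi-value CSV cell, keeping their order."""
    parts = [part.strip() for part in value.split(separator)]
    # dict.fromkeys keeps the first occurrence of each non-empty value
    return separator.join(dict.fromkeys(part for part in parts if part))


def dedupe_column(rows, column):
    """Apply dedupe_multivalue to one column of csv.DictReader rows in place."""
    for row in rows:
        if row.get(column):
            row[column] = dedupe_multivalue(row[column])
    return rows


if __name__ == "__main__":
    # Tiny stand-in for a DSpace metadata CSV; the column name is an example
    sample = io.StringIO("id,dc.date.issued\n1,2019-05||2019-05\n2,2019-06\n")
    rows = dedupe_column(list(csv.DictReader(sample)), "dc.date.issued")
    print([row["dc.date.issued"] for row in rows])  # prints ['2019-05', '2019-06']
```

</li>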
<li><p>Finish looking at the fifty-six AfricaRice items and upload them to CGSpace:</p>
<pre><code>$ dspace import -a -e me@cgiar.org -m 2019-07-02-AfricaRice-11to73.map -s /tmp/SimpleArchiveFormat
</code></pre></li>
<li><p>Peter pointed out that the Sharefair dates I fixed were not actually fixed</p>
<ul>
<li>It seems there is a bug where DSpace does not detect the change when a field has two identical values like “2019-05||2019-05” and you try to remove one of them</li>
<li>To get it to work I had to change some of them to 2019-01, then remove them</li>
<li>Atmire responded about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr issue</a> and said they would be willing to help</li>
<li>Send a reminder to Marie about my notes on the <a href="https://github.com/AgriculturalSemantics/cg-core/issues/2">CG Core v2 issue I created two weeks ago</a></li>
<li><p>Playing with the idea of using <a href="https://github.com/BurntSushi/xsv">xsv</a> to do some basic batch quality checks on CSVs, for example to find items that might be duplicates if they have the same DOI or title:</p>
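<li><p>The same duplicate check can be sketched with Python’s standard <code>csv</code> module: group rows by a trimmed, case-folded value and report any value shared by more than one row (case-folding is fine for DOIs, which are case-insensitive). This assumes the export has an <code>id</code> column; the <code>dc.title</code> and <code>cg.identifier.doi</code> column names and the sample rows are hypothetical:</p>

```python
import csv
import io
from collections import defaultdict


def find_duplicates(rows, column):
    """Map each normalized value to the ids of rows sharing it (2+ rows only)."""
    groups = defaultdict(list)
    for row in rows:
        # Case-fold and trim so "Soil health" and "soil health " collide
        value = (row.get(column) or "").strip().casefold()
        if value:
            groups[value].append(row["id"])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    # Made-up rows; the DOIs are placeholders, not real records
    sample = io.StringIO(
        "id,dc.title,cg.identifier.doi\n"
        "1,Soil health,10.1234/abc\n"
        "2,soil health ,\n"
        "3,Maize yields,10.1234/ABC\n"
    )
    rows = list(csv.DictReader(sample))
    print(find_duplicates(rows, "dc.title"))
    print(find_duplicates(rows, "cg.identifier.doi"))
```

</li>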
<li><p>Or perhaps items with invalid ISSNs (according to the <a href="https://en.wikipedia.org/wiki/International_Standard_Serial_Number#Code_format">ISSN code format</a>):</p>
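<li><p>Checking the code format alone misses typos, so a sketch that also verifies the check digit may help: the first seven digits are weighted 8 down to 2, and the check character is (11 − sum mod 11) mod 11, written as X when it is 10. The sample values are only for illustration:</p>

```python
import re

# Four digits, hyphen, three digits, then a check character (0-9 or X)
ISSN_FORMAT = re.compile(r"[0-9]{4}-[0-9]{3}[0-9X]")


def is_valid_issn(issn):
    """Check both the NNNN-NNNC format and the mod 11 check digit."""
    issn = issn.strip().upper()
    if not ISSN_FORMAT.fullmatch(issn):
        return False
    digits = issn.replace("-", "")
    # Weights 8 down to 2 for the first seven digits
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))


if __name__ == "__main__":
    for value in ["0378-5955", "0378-5954", "978-92-9146"]:
        print(value, is_valid_issn(value))
```

</li>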
<li>Skype call with Marie Angelique about CG Core v2
<ul>
<li>We discussed my comments and suggestions from last week</li>
<li>One comment she had was that we should try to move our center-specific subjects into <code>DCTERMS.subject</code> and normalize them against AGROVOC</li>
<li>I updated my <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">gist about CGSpace metadata changes</a></li>
<li>Yesterday Thierry from CTA asked me about an error he was getting while submitting an item on CGSpace: “Unable to load Submission Information, since WorkspaceID (ID:S106658) is not a valid in-process submission.”</li>
<li><p>I looked in the DSpace logs and found this right around the time of the screenshot he sent me:</p>
<pre><code>2019-07-10 11:50:27,433 INFO org.dspace.submit.step.CompleteStep @ lewyllie@cta.int:session_id=A920730003BCAECE8A3B31DCDE11A97E:submission_complete:Completed submission with id=106658
</code></pre></li>
<li><p>I’m assuming something happened in his browser (like a refresh) after the item was submitted…</p></li>
<li>Atmire responded with some initial feedback about our Tomcat configuration related to the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr issue I raised recently</a>
<ul>
<li>Unfortunately there is no concrete feedback yet</li>
<li>I think we need to upgrade our DSpace Test server so we can fit all the Solr cores…</li>
<li>Actually, I looked and there were over 40 GB free on DSpace Test so I copied the Solr statistics cores for the years 2017 to 2010 from CGSpace to DSpace Test because they weren’t actually very large</li>
<li>I re-deployed DSpace for good measure, and I think all Solr cores are loading… I will do more tests later</li>
</ul></li>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li><p>Tried to run <code>dspace cleanup -v</code> on CGSpace and ran into an error:</p>
<pre><code>Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(167394) is still referenced from table "bundle".
</code></pre></li>
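<li><p>The usual fix for this error is to set <code>primary_bitstream_id</code> to NULL in the <code>bundle</code> table for the offending bitstream. When several of these errors have been collected, the statement could be generated from the error text; a minimal sketch, assuming the PostgreSQL detail line always has the <code>Key (bitstream_id)=(N)</code> shape seen here. Review the output before pasting it into <code>psql</code>:</p>

```python
import re

# Matches the "Key (bitstream_id)=(167394)" detail line from PostgreSQL
KEY_DETAIL = re.compile(r"Key \(bitstream_id\)=\((\d+)\)")


def primary_bitstream_fix(error_text):
    """Return the bundle UPDATE for every bitstream id in the error text, or None."""
    ids = sorted({int(match) for match in KEY_DETAIL.findall(error_text)})
    if not ids:
        return None
    id_list = ", ".join(str(bitstream_id) for bitstream_id in ids)
    return ("update bundle set primary_bitstream_id=NULL "
            f"where primary_bitstream_id in ({id_list});")


if __name__ == "__main__":
    error = ('Detail: Key (bitstream_id)=(167394) is still referenced '
             'from table "bundle".')
    print(primary_bitstream_fix(error))
```

</li>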
<li><p>The solution is, as always:</p>
<pre><code># su - postgres
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (167394);'
</code></pre></li>
<li><p>Completely reset the Podman configuration on my laptop because there were some layers that I couldn’t delete and it had been some time since I did a cleanup:</p>
<pre><code>$ podman system prune -a -f --volumes
$ sudo rm -rf ~/.local/share/containers
</code></pre></li>
<li><p>Then pull a new PostgreSQL 9.6 image and load a CGSpace database dump into a new local test container:</p>
<li><p>Start working on implementing the <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">CG Core v2 changes</a> on my local DSpace test environment</p></li>
<li><p>Make a pull request to CG Core v2 with some fixes for typos in the specification (<a href="https://github.com/AgriculturalSemantics/cg-core/pull/5">#5</a>)</p></li>
<li>Talk to Moayad about the remaining issues for OpenRXV / AReS
<ul>
<li>He sent a pull request with some changes for the bar chart and documentation about configuration, and said he’d finish the export feature next week</li>
</ul></li>
<li><p>Sisay said a user was having problems registering on CGSpace and it looks like the email account expired again:</p>
<pre><code>$ dspace test-email
About to send test email:
- To: blahh@cgiar.org
- Subject: DSpace test email
- Server: smtp.office365.com
Error sending email:
- Error: javax.mail.AuthenticationFailedException
Please see the DSpace documentation for assistance.
</code></pre></li>
<li><p>I emailed ICT to ask them to reset it and make the expiration period longer if possible</p></li>
<li><p>Raise an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/8">issue on CG Core v2 spec regarding country and region coverage</a></p>
<ul>
<li><p>The current standard has them implemented as a class like this:</p>
<li>Generate a list of the ORCID identifiers that we added to CGSpace in 2019 for Sara Jani at ICARDA</li>
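<li><p>Most of the work in generating such a list is pulling ORCID iDs out of the author identifier values in a metadata export and dropping malformed ones. A minimal sketch, assuming values in a hypothetical “Name: iD” style and validating the ISO 7064 MOD 11-2 check character that ORCID uses; restricting the list to identifiers added in 2019 is not covered. The sample iD is ORCID’s public test record for Josiah Carberry:</p>

```python
import re

# Four blocks of four characters; the last character may be X
ORCID_FORMAT = re.compile(r"\b[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]\b")


def orcid_checksum_ok(orcid):
    """Validate the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return digits[-1] == ("X" if result == 10 else str(result))


def unique_orcids(values):
    """Collect distinct, checksum-valid ORCID iDs from free-text values."""
    found = set()
    for value in values:
        for orcid in ORCID_FORMAT.findall(value):
            if orcid_checksum_ok(orcid):
                found.add(orcid)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical values in a "Name: iD" style
    values = [
        "Josiah Carberry: 0000-0002-1825-0097",
        "Carberry, Josiah: 0000-0002-1825-0097",
        "Typo: 0000-0002-1825-0098",
    ]
    print(unique_orcids(values))  # prints ['0000-0002-1825-0097']
```

</li>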
<li><p>Bioversity sent a new file for their migration to CGSpace</p>
<ul>
<li>There is always a blank row and blank column at the end</li>
<li>One invalid type (Brie)</li>
<li>824 items with leading/trailing spaces in dc.identifier.citation</li>
<li>175 items with a trailing comma in dc.identifier.citation (using custom text facet with GREL <code>value.endsWith(',').toString()</code>)</li>
<li>Fix them with GREL transform: <code>value.replace(/,$/, '')</code></li>
<li>A few strange publishers after splitting multi-value cells, like “(Belgium)”</li>
<li>Deleted four ISSNs that are actually ISBNs and are already present in the ISBN field</li>
<li>Eight invalid ISBNs</li>
<li>Convert all DOIs to “<a href="https://doi.org">https://doi.org</a>” format and fix one invalid DOI</li>
<li>Fix a handful of incorrect CRPs that seem to have been split on comma “,”</li>
<li>Lots of strange values in cg.link.reference, and I normalized all DOIs to <a href="https://doi.org">https://doi.org</a> format</li>
<li>There are lots of invalid links here, like “36” and “recordlink:publications:2606” and “t3://record?identifier=publications&uid=2606”</li>
<li>Also there are hundreds of items that use the same value for cg.link.reference AND cg.link.dataurl</li>
<li>Use https:// for all Bioversity links (reference, data url, permalink)</li>
</ul></li>
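<li><p>Some of these clean-ups could be scripted for the next file Bioversity sends. A rough Python equivalent of trimming whitespace plus the GREL <code>value.replace(/,$/, '')</code> transform, and of normalizing DOIs to the https://doi.org form. It assumes any string containing <code>10.</code>, a 4–9 digit registrant code, and a slash is a DOI, so invalid links like “36” come back as <code>None</code>; the sample values are made up:</p>

```python
import re

# Loose DOI matcher: "10." prefix, 4-9 digit registrant code, slash, suffix
DOI_PATTERN = re.compile(r"10\.[0-9]{4,9}/\S+")


def clean_citation(value):
    """Trim surrounding whitespace, then drop one trailing comma and any space before it."""
    value = value.strip()
    if value.endswith(","):
        value = value[:-1].rstrip()
    return value


def normalize_doi(value):
    """Return the DOI as https://doi.org/..., or None if no DOI is found."""
    match = DOI_PATTERN.search(value)
    if match is None:
        return None
    return "https://doi.org/" + match.group(0)


if __name__ == "__main__":
    print(clean_citation(" Smith, J. 2019. Example title, "))
    for link in ["doi:10.1016/j.example.2019.01.001",
                 "http://dx.doi.org/10.1016/j.example.2019.01.001",
                 "recordlink:publications:2606"]:
        print(link, "->", normalize_doi(link))
```

</li>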
<li><p>I might be able to use <a href="https://pypi.org/project/isbnlib/">isbnlib</a> to validate ISBNs in Python:</p>
<pre><code>import isbnlib

# Accept the value if it is either a valid ISBN-10 or a valid ISBN-13
if isbnlib.is_isbn10('9966-955-07-0') or isbnlib.is_isbn13('9966-955-07-0'):
    print("Yes")
else:
    print("No")
</code></pre></li>
<li><p>Or with <a href="https://github.com/arthurdejong/python-stdnum">python-stdnum</a>:</p>