Add notes for 2022-03-04

2022-03-04 15:30:06 +03:00
parent 7453499827
commit 27acbac859
115 changed files with 6550 additions and 6444 deletions

@ -38,7 +38,7 @@ CGSpace
Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@ -153,13 +153,13 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
</ul>
</li>
</ul>
<pre tabindex="0"><code>org.apache.solr.common.SolrException: Error CREATEing SolrCore 'statistics-2010': Unable to create core [statistics-2010] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2010/data/index/write.lock
<pre tabindex="0"><code>org.apache.solr.common.SolrException: Error CREATEing SolrCore &#39;statistics-2010&#39;: Unable to create core [statistics-2010] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2010/data/index/write.lock
</code></pre><ul>
<li>I restarted Tomcat <em>ten times</em> and it never worked&hellip;</li>
<li>I tried to stop Tomcat and delete the write locks:</li>
</ul>
<pre tabindex="0"><code># systemctl stop tomcat7
# find /dspace/solr/statistics* -iname &quot;*.lock&quot; -print -delete
# find /dspace/solr/statistics* -iname &#34;*.lock&#34; -print -delete
/dspace/solr/statistics/data/index/write.lock
/dspace/solr/statistics-2010/data/index/write.lock
/dspace/solr/statistics-2011/data/index/write.lock
@ -170,7 +170,7 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
/dspace/solr/statistics-2016/data/index/write.lock
/dspace/solr/statistics-2017/data/index/write.lock
/dspace/solr/statistics-2018/data/index/write.lock
# find /dspace/solr/statistics* -iname &quot;*.lock&quot; -print -delete
# find /dspace/solr/statistics* -iname &#34;*.lock&#34; -print -delete
# systemctl start tomcat7
</code></pre><ul>
<li>But it still didn&rsquo;t work!</li>
@ -221,8 +221,8 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
</ul>
</li>
</ul>
<pre tabindex="0"><code>$ sed -i 's/CC-BY 4.0/CC-BY-4.0/' item_*/dublin_core.xml
$ echo &quot;10568/101992&quot; &gt;&gt; item_*/collections
<pre tabindex="0"><code>$ sed -i &#39;s/CC-BY 4.0/CC-BY-4.0/&#39; item_*/dublin_core.xml
$ echo &#34;10568/101992&#34; &gt;&gt; item_*/collections
$ dspace import -a -e me@cgiar.org -m 2019-07-02-Sharefair.map -s /tmp/Sharefair_mapped
</code></pre><ul>
<li>I noticed that all twenty-seven items had double dates like &ldquo;2019-05||2019-05&rdquo; so I fixed those, but the rest of the metadata looked good so I unmapped them from the temporary collection</li>
@ -249,20 +249,20 @@ $ dspace import -a -e me@cgiar.org -m 2019-07-02-Sharefair.map -s /tmp/Sharefair
</ul>
</li>
</ul>
<pre tabindex="0"><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/new-bioversity-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u &gt; /tmp/2019-07-04-orcid-ids.txt
<pre tabindex="0"><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/new-bioversity-orcids.txt | grep -oE &#39;[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}&#39; | sort -u &gt; /tmp/2019-07-04-orcid-ids.txt
$ ./resolve-orcids.py -i /tmp/2019-07-04-orcid-ids.txt -o 2019-07-04-orcid-names.txt -d
</code></pre><ul>
<li>Send and merge a pull request for the new ORCID identifiers (<a href="https://github.com/ilri/DSpace/pull/428">#428</a>)</li>
<li>I created a CSV with some ORCID identifiers that I had seen change so I could update any existing ones in the database:</li>
</ul>
<pre tabindex="0"><code>cg.creator.id,correct
&quot;Marius Ekué: 0000-0002-5829-6321&quot;,&quot;Marius R.M. Ekué: 0000-0002-5829-6321&quot;
&quot;Mwungu: 0000-0001-6181-8445&quot;,&quot;Chris Miyinzi Mwungu: 0000-0001-6181-8445&quot;
&quot;Mwungu: 0000-0003-1658-287X&quot;,&quot;Chris Miyinzi Mwungu: 0000-0003-1658-287X&quot;
&#34;Marius Ekué: 0000-0002-5829-6321&#34;,&#34;Marius R.M. Ekué: 0000-0002-5829-6321&#34;
&#34;Mwungu: 0000-0001-6181-8445&#34;,&#34;Chris Miyinzi Mwungu: 0000-0001-6181-8445&#34;
&#34;Mwungu: 0000-0003-1658-287X&#34;,&#34;Chris Miyinzi Mwungu: 0000-0003-1658-287X&#34;
</code></pre><ul>
<li>But when I ran <code>fix-metadata-values.py</code> I didn&rsquo;t see any changes:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i 2019-07-04-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i 2019-07-04-update-orcids.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.creator.id -m 240 -t correct -d
</code></pre><h2 id="2019-07-06">2019-07-06</h2>
<ul>
<li>Send a reminder to Marie about my notes on the <a href="https://github.com/AgriculturalSemantics/cg-core/issues/2">CG Core v2 issue I created two weeks ago</a></li>
@ -282,22 +282,22 @@ $ ./resolve-orcids.py -i /tmp/2019-07-04-orcid-ids.txt -o 2019-07-04-orcid-names
</li>
<li>Playing with the idea of using <a href="https://github.com/BurntSushi/xsv">xsv</a> to do some basic batch quality checks on CSVs, for example to find items that might be duplicates if they have the same DOI or title:</li>
</ul>
<pre tabindex="0"><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E ',1$'
<pre tabindex="0"><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E &#39;,1$&#39;
field,value,count
cg.identifier.doi,https://doi.org/10.1016/j.agwat.2018.06.018,2
$ xsv frequency --select dc.title --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E ',1$'
$ xsv frequency --select dc.title --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E &#39;,1$&#39;
field,value,count
dc.title,Reference evapotranspiration prediction using hybridized fuzzy model with firefly algorithm: Regional case study in Burkina Faso,2
</code></pre><ul>
<li>Or perhaps to check whether DOIs are valid (that is, whether they have doi.org in the URL):</li>
</ul>
<pre tabindex="0"><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E 'doi.org'
<pre tabindex="0"><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E &#39;doi.org&#39;
field,value,count
cg.identifier.doi,https://hdl.handle.net/10520/EJC-1236ac700f,1
</code></pre><ul>
<li>Or perhaps items with invalid ISSNs (according to the <a href="https://en.wikipedia.org/wiki/International_Standard_Serial_Number#Code_format">ISSN code format</a>):</li>
</ul>
<pre tabindex="0"><code>$ xsv select dc.identifier.issn cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v '&quot;' | grep -v -E '^[0-9]{4}-[0-9]{3}[0-9xX]$'
<pre tabindex="0"><code>$ xsv select dc.identifier.issn cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v &#39;&#34;&#39; | grep -v -E &#39;^[0-9]{4}-[0-9]{3}[0-9xX]$&#39;
dc.identifier.issn
978-3-319-71997-9
978-3-319-71997-9
@ -350,13 +350,13 @@ dc.identifier.issn
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li>Tried to run <code>dspace cleanup -v</code> on CGSpace and ran into an error:</li>
</ul>
<pre tabindex="0"><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(167394) is still referenced from table &quot;bundle&quot;.
<pre tabindex="0"><code>Error: ERROR: update or delete on table &#34;bitstream&#34; violates foreign key constraint &#34;bundle_primary_bitstream_id_fkey&#34; on table &#34;bundle&#34;
Detail: Key (bitstream_id)=(167394) is still referenced from table &#34;bundle&#34;.
</code></pre><ul>
<li>The solution is, as always:</li>
</ul>
<pre tabindex="0"><code># su - postgres
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (167394);'
$ psql dspace -c &#39;update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (167394);&#39;
UPDATE 1
</code></pre><h2 id="2019-07-16">2019-07-16</h2>
<ul>
@ -371,9 +371,9 @@ $ sudo rm -rf ~/.local/share/containers
$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
$ createuser -h localhost -U postgres --pwprompt dspacetest
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ psql -h localhost -U postgres dspacetest -c &#39;alter user dspacetest superuser;&#39;
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2019-07-16.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres dspacetest -c &#39;alter user dspacetest nosuperuser;&#39;
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
</code></pre><ul>
<li>Start working on implementing the <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">CG Core v2 changes</a> on my local DSpace test environment</li>
@ -414,7 +414,7 @@ Please see the DSpace documentation for assistance.
<ul>
<li>Create an account for Lionelle Samnick on CGSpace because the registration isn&rsquo;t working for some reason:</li>
</ul>
<pre tabindex="0"><code>$ dspace user --add --givenname Lionelle --surname Samnick --email blah@blah.com --password 'blah'
<pre tabindex="0"><code>$ dspace user --add --givenname Lionelle --surname Samnick --email blah@blah.com --password &#39;blah&#39;
</code></pre><ul>
<li>I added her as a submitter to <a href="https://cgspace.cgiar.org/handle/10568/74536">CTA ISF Pro-Agro series</a></li>
<li>Start looking at 1429 records for the Bioversity batch import
@ -484,18 +484,18 @@ Please see the DSpace documentation for assistance.
<p>I might be able to use <a href="https://pypi.org/project/isbnlib/">isbnlib</a> to validate ISBNs in Python:</p>
</li>
</ul>
<pre tabindex="0"><code>if isbnlib.is_isbn10('9966-955-07-0') or isbnlib.is_isbn13('9966-955-07-0'):
    print(&quot;Yes&quot;)
<pre tabindex="0"><code>if isbnlib.is_isbn10(&#39;9966-955-07-0&#39;) or isbnlib.is_isbn13(&#39;9966-955-07-0&#39;):
    print(&#34;Yes&#34;)
else:
    print(&quot;No&quot;)
    print(&#34;No&#34;)
</code></pre><ul>
<li>Or with <a href="https://github.com/arthurdejong/python-stdnum">python-stdnum</a>:</li>
</ul>
<pre tabindex="0"><code>from stdnum import isbn
from stdnum import issn
isbn.validate('978-92-9043-389-7')
issn.validate('1020-3362')
isbn.validate(&#39;978-92-9043-389-7&#39;)
issn.validate(&#39;1020-3362&#39;)
</code></pre><h2 id="2019-07-26">2019-07-26</h2>
<ul>
<li>
@ -510,7 +510,7 @@ issn.validate('1020-3362')
<p>I figured out a GREL to trim spaces in multi-value cells without splitting them:</p>
</li>
</ul>
<pre tabindex="0"><code>value.replace(/\s+\|\|/,&quot;||&quot;).replace(/\|\|\s+/,&quot;||&quot;)
<pre tabindex="0"><code>value.replace(/\s+\|\|/,&#34;||&#34;).replace(/\|\|\s+/,&#34;||&#34;)
</code></pre><ul>
<li>I whipped up a quick script using Python Pandas to do whitespace cleanup</li>
</ul>
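<p>As a rough illustration of the same idea outside OpenRefine, the GREL expression above can be mirrored in plain Python. This is only a sketch and not the script mentioned in the notes: it uses the standard <code>re</code> module rather than Pandas, the function name <code>trim_multivalue</code> is hypothetical, and the final <code>strip()</code> of the whole cell is an added assumption that the GREL expression itself does not perform.</p>

```python
import re

# Patterns equivalent to the GREL regexes /\s+\|\|/ and /\|\|\s+/
SPACE_BEFORE_SEPARATOR = re.compile(r"\s+\|\|")
SPACE_AFTER_SEPARATOR = re.compile(r"\|\|\s+")


def trim_multivalue(value: str) -> str:
    """Remove whitespace around each "||" separator without splitting the cell.

    Hypothetical helper for illustration only.
    """
    value = SPACE_BEFORE_SEPARATOR.sub("||", value)
    value = SPACE_AFTER_SEPARATOR.sub("||", value)
    # Assumption: also trim the ends of the cell, which the GREL leaves alone
    return value.strip()


if __name__ == "__main__":
    print(trim_multivalue("Kenya ||  Uganda|| Ethiopia"))
```

<p>With Pandas, a function like this could presumably be applied to a text column via <code>Series.map</code>, though that step is not shown here.</p>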