mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-03-04
@ -38,7 +38,7 @@ CGSpace
Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@ -153,13 +153,13 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
</ul>
</li>
</ul>
<pre tabindex="0"><code>org.apache.solr.common.SolrException: Error CREATEing SolrCore 'statistics-2010': Unable to create core [statistics-2010] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2010/data/index/write.lock
<pre tabindex="0"><code>org.apache.solr.common.SolrException: Error CREATEing SolrCore 'statistics-2010': Unable to create core [statistics-2010] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2010/data/index/write.lock
</code></pre><ul>
<li>I restarted Tomcat <em>ten times</em> and it never worked…</li>
<li>I tried to stop Tomcat and delete the write locks:</li>
</ul>
<pre tabindex="0"><code># systemctl stop tomcat7
# find /dspace/solr/statistics* -iname "*.lock" -print -delete
# find /dspace/solr/statistics* -iname "*.lock" -print -delete
/dspace/solr/statistics/data/index/write.lock
/dspace/solr/statistics-2010/data/index/write.lock
/dspace/solr/statistics-2011/data/index/write.lock
@ -170,7 +170,7 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
/dspace/solr/statistics-2016/data/index/write.lock
/dspace/solr/statistics-2017/data/index/write.lock
/dspace/solr/statistics-2018/data/index/write.lock
# find /dspace/solr/statistics* -iname "*.lock" -print -delete
# find /dspace/solr/statistics* -iname "*.lock" -print -delete
# systemctl start tomcat7
</code></pre><ul>
<li>But it still didn’t work!</li>
@ -221,8 +221,8 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
</ul>
</li>
</ul>
<pre tabindex="0"><code>$ sed -i 's/CC-BY 4.0/CC-BY-4.0/' item_*/dublin_core.xml
$ echo "10568/101992" >> item_*/collections
<pre tabindex="0"><code>$ sed -i 's/CC-BY 4.0/CC-BY-4.0/' item_*/dublin_core.xml
$ echo "10568/101992" >> item_*/collections
$ dspace import -a -e me@cgiar.org -m 2019-07-02-Sharefair.map -s /tmp/Sharefair_mapped
</code></pre><ul>
<li>I noticed that all twenty-seven items had double dates like “2019-05||2019-05” so I fixed those, but the rest of the metadata looked good so I unmapped them from the temporary collection</li>
@ -249,20 +249,20 @@ $ dspace import -a -e me@cgiar.org -m 2019-07-02-Sharefair.map -s /tmp/Sharefair
</ul>
</li>
</ul>
<pre tabindex="0"><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/new-bioversity-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2019-07-04-orcid-ids.txt
<pre tabindex="0"><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/new-bioversity-orcids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2019-07-04-orcid-ids.txt
$ ./resolve-orcids.py -i /tmp/2019-07-04-orcid-ids.txt -o 2019-07-04-orcid-names.txt -d
</code></pre><ul>
<li>Send and merge a pull request for the new ORCID identifiers (<a href="https://github.com/ilri/DSpace/pull/428">#428</a>)</li>
<li>I created a CSV with some ORCID identifiers that I had seen change so I could update any existing ones in the database:</li>
</ul>
<pre tabindex="0"><code>cg.creator.id,correct
"Marius Ekué: 0000-0002-5829-6321","Marius R.M. Ekué: 0000-0002-5829-6321"
"Mwungu: 0000-0001-6181-8445","Chris Miyinzi Mwungu: 0000-0001-6181-8445"
"Mwungu: 0000-0003-1658-287X","Chris Miyinzi Mwungu: 0000-0003-1658-287X"
"Marius Ekué: 0000-0002-5829-6321","Marius R.M. Ekué: 0000-0002-5829-6321"
"Mwungu: 0000-0001-6181-8445","Chris Miyinzi Mwungu: 0000-0001-6181-8445"
"Mwungu: 0000-0003-1658-287X","Chris Miyinzi Mwungu: 0000-0003-1658-287X"
</code></pre><ul>
<li>But when I ran <code>fix-metadata-values.py</code> I didn’t see any changes:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i 2019-07-04-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i 2019-07-04-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
</code></pre><h2 id="2019-07-06">2019-07-06</h2>
<ul>
<li>Send a reminder to Marie about my notes on the <a href="https://github.com/AgriculturalSemantics/cg-core/issues/2">CG Core v2 issue I created two weeks ago</a></li>
@ -282,22 +282,22 @@ $ ./resolve-orcids.py -i /tmp/2019-07-04-orcid-ids.txt -o 2019-07-04-orcid-names
</li>
<li>Playing with the idea of using <a href="https://github.com/BurntSushi/xsv">xsv</a> to do some basic batch quality checks on CSVs, for example to find items that might be duplicates if they have the same DOI or title:</li>
</ul>
<pre tabindex="0"><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E ',1'
<pre tabindex="0"><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E ',1'
field,value,count
cg.identifier.doi,https://doi.org/10.1016/j.agwat.2018.06.018,2
$ xsv frequency --select dc.title --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E ',1'
$ xsv frequency --select dc.title --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E ',1'
field,value,count
dc.title,Reference evapotranspiration prediction using hybridized fuzzy model with firefly algorithm: Regional case study in Burkina Faso,2
</code></pre><ul>
<li>Or perhaps if DOIs are valid or not (having doi.org in the URL):</li>
</ul>
<pre tabindex="0"><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E 'doi.org'
<pre tabindex="0"><code>$ xsv frequency --select cg.identifier.doi --no-nulls cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v -E 'doi.org'
field,value,count
cg.identifier.doi,https://hdl.handle.net/10520/EJC-1236ac700f,1
</code></pre><ul>
<li>Or perhaps items with invalid ISSNs (according to the <a href="https://en.wikipedia.org/wiki/International_Standard_Serial_Number#Code_format">ISSN code format</a>):</li>
</ul>
<pre tabindex="0"><code>$ xsv select dc.identifier.issn cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v '"' | grep -v -E '^[0-9]{4}-[0-9]{3}[0-9xX]$'
<pre tabindex="0"><code>$ xsv select dc.identifier.issn cgspace_metadata_africaRice-11to73_ay_id.csv | grep -v '"' | grep -v -E '^[0-9]{4}-[0-9]{3}[0-9xX]$'
dc.identifier.issn
978-3-319-71997-9
978-3-319-71997-9
@ -350,13 +350,13 @@ dc.identifier.issn
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li>Try to run <code>dspace cleanup -v</code> on CGSpace and ran into an error:</li>
</ul>
<pre tabindex="0"><code>Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(167394) is still referenced from table "bundle".
<pre tabindex="0"><code>Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(167394) is still referenced from table "bundle".
</code></pre><ul>
<li>The solution is, as always:</li>
</ul>
<pre tabindex="0"><code># su - postgres
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (167394);'
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (167394);'
UPDATE 1
</code></pre><h2 id="2019-07-16">2019-07-16</h2>
<ul>
@ -371,9 +371,9 @@ $ sudo rm -rf ~/.local/share/containers
$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
$ createuser -h localhost -U postgres --pwprompt dspacetest
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/cgspace_2019-07-16.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
</code></pre><ul>
<li>Start working on implementing the <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">CG Core v2 changes</a> on my local DSpace test environment</li>
@ -414,7 +414,7 @@ Please see the DSpace documentation for assistance.
<ul>
<li>Create an account for Lionelle Samnick on CGSpace because the registration isn’t working for some reason:</li>
</ul>
<pre tabindex="0"><code>$ dspace user --add --givenname Lionelle --surname Samnick --email blah@blah.com --password 'blah'
<pre tabindex="0"><code>$ dspace user --add --givenname Lionelle --surname Samnick --email blah@blah.com --password 'blah'
</code></pre><ul>
<li>I added her as a submitter to <a href="https://cgspace.cgiar.org/handle/10568/74536">CTA ISF Pro-Agro series</a></li>
<li>Start looking at 1429 records for the Bioversity batch import
@ -484,18 +484,18 @@ Please see the DSpace documentation for assistance.
<p>I might be able to use <a href="https://pypi.org/project/isbnlib/">isbnlib</a> to validate ISBNs in Python:</p>
</li>
</ul>
<pre tabindex="0"><code>if isbnlib.is_isbn10('9966-955-07-0') or isbnlib.is_isbn13('9966-955-07-0'):
    print("Yes")
<pre tabindex="0"><code>if isbnlib.is_isbn10('9966-955-07-0') or isbnlib.is_isbn13('9966-955-07-0'):
    print("Yes")
else:
    print("No")
    print("No")
</code></pre><ul>
<li>Or with <a href="https://github.com/arthurdejong/python-stdnum">python-stdnum</a>:</li>
</ul>
<pre tabindex="0"><code>from stdnum import isbn
from stdnum import issn
isbn.validate('978-92-9043-389-7')
issn.validate('1020-3362')
isbn.validate('978-92-9043-389-7')
issn.validate('1020-3362')
</code></pre><h2 id="2019-07-26">2019-07-26</h2>
<ul>
<li>
@ -510,7 +510,7 @@ issn.validate('1020-3362')
<p>I figured out a GREL to trim spaces in multi-value cells without splitting them:</p>
</li>
</ul>
<pre tabindex="0"><code>value.replace(/\s+\|\|/,"||").replace(/\|\|\s+/,"||")
<pre tabindex="0"><code>value.replace(/\s+\|\|/,"||").replace(/\|\|\s+/,"||")
</code></pre><ul>
<li>I whipped up a quick script using Python Pandas to do whitespace cleanup</li>
</ul>
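<p>The whitespace cleanup described above could be sketched in plain Python roughly as follows. This is only an illustration, not the script mentioned in the notes: it uses the standard <code>re</code> module rather than Pandas, the function name <code>trim_multi_value</code> is hypothetical, and unlike the GREL expression it also strips leading and trailing whitespace from the whole cell.</p>

```python
import re

# Matches the "||" multi-value separator together with any whitespace
# on either side of it.
SEPARATOR_WITH_SPACES = re.compile(r"\s*\|\|\s*")


def trim_multi_value(cell: str) -> str:
    """Collapse whitespace around "||" separators without splitting the cell.

    Hypothetical helper for illustration; it additionally strips whitespace
    at both ends of the cell, which the GREL expression does not do.
    """
    return SEPARATOR_WITH_SPACES.sub("||", cell).strip()


if __name__ == "__main__":
    print(trim_multi_value("2019-05 || 2019-05"))
```

<p>Applied to every cell of a CSV column, a function like this would normalize the separators in place, so the values never need to be split and rejoined.</p>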