mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-03-04
@ -58,7 +58,7 @@ real 74m42.646s
user 8m5.056s
sys 2m7.289s
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@ -154,7 +154,7 @@ sys 2m7.289s
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li>Time to index ~70,000 items on CGSpace:</li>
@ -181,7 +181,7 @@ sys 2m7.289s
<li>Institut National des Recherches Agricoles du B nin</li>
<li>Centre de Coop ration Internationale en Recherche Agronomique pour le D veloppement</li>
<li>Institut des Recherches Agricoles du B nin</li>
<li>Institut des Savannes, C te d' Ivoire</li>
<li>Institut des Savannes, C te d’ Ivoire</li>
<li>Institut f r Pflanzenpathologie und Pflanzenschutz der Universit t, Germany</li>
<li>Projet de Gestion des Ressources Naturelles, B nin</li>
<li>Universit t Hannover</li>
@ -193,9 +193,9 @@ sys 2m7.289s
<li>I uploaded fixes for all those now, but I will continue with the rest of the data later</li>
<li>Regarding the SQL migration errors, Atmire told me I need to run some migrations manually in PostgreSQL:</li>
</ul>
<pre tabindex="0"><code>delete from schema_version where version = '5.6.2015.12.03.2';
update schema_version set version = '5.6.2015.12.03.2' where version = '5.5.2015.12.03.2';
update schema_version set version = '5.8.2015.12.03.3' where version = '5.5.2015.12.03.3';
</code></pre><ul>
<li>And then I need to ignore the ignored ones:</li>
</ul>
@ -205,7 +205,7 @@ update schema_version set version = '5.8.2015.12.03.3' where version = '5.5.2015
<li>Gabriela from CIP got back to me about the author names we were correcting on CGSpace</li>
<li>I did a quick sanity check on them and then did a test import with my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897"><code>fix-metadata-values.py</code></a> script:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-06-08-CIP-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
</code></pre><ul>
<li>I will apply them on CGSpace tomorrow I think…</li>
</ul>
@ -221,7 +221,7 @@ update schema_version set version = '5.8.2015.12.03.3' where version = '5.5.2015
<li>After removing all code mentioning MQM, mqm, metadata-quality, batchedit, duplicatechecker, etc., I think I got most of it, but there is a Spring error during Tomcat startup:</li>
</ul>
<pre tabindex="0"><code> INFO [org.dspace.servicemanager.DSpaceServiceManager] Shutdown DSpace core service manager
Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'org.dspace.servicemanager.spring.DSpaceBeanPostProcessor#0' defined in class path resource [spring/spring-dspace-applicationContext.xml]: Unsatisfied dependency expressed through constructor argument with index 0 of type [org.dspace.servicemanager.config.DSpaceConfigurationService]: : Cannot find class [com.atmire.dspace.discovery.ItemCollectionPlugin] for bean with name 'itemCollectionPlugin' defined in file [/home/aorth/dspace/config/spring/api/discovery.xml];
</code></pre><ul>
<li>I can fix this by commenting out the <code>ItemCollectionPlugin</code> line of <code>discovery.xml</code>, but from looking at the git log I’m not actually sure if that is related to MQM or not</li>
<li>I will have to ask Atmire</li>
@ -336,11 +336,11 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
</li>
</ul>
<pre tabindex="0"><code>or(
value.contains('€'),
value.contains('6g'),
value.contains('6m'),
value.contains('6d'),
value.contains('6e')
)
</code></pre><ul>
<li>So IITA should double check the abstracts for these:
@ -357,24 +357,24 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
<li>Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Robin Buruchara’s items</li>
<li>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</li>
</ul>
<pre tabindex="0"><code>$ ./add-orcid-identifiers-csv.py -i 2018-06-13-Robin-Buruchara.csv -db dspace -u dspace -p 'fuuu'
</code></pre><ul>
<li>The contents of <code>2018-06-13-Robin-Buruchara.csv</code> were:</li>
</ul>
<pre tabindex="0"><code>dc.contributor.author,cg.creator.id
"Buruchara, Robin",Robin Buruchara: 0000-0003-0934-1218
"Buruchara, Robin A.",Robin Buruchara: 0000-0003-0934-1218
</code></pre><ul>
<li>On a hunch I checked to see if CGSpace’s bitstream cleanup was working properly and of course it’s broken:</li>
</ul>
<pre tabindex="0"><code>$ dspace cleanup -v
...
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(152402) is still referenced from table "bundle".
</code></pre><ul>
<li>As always, the solution is to clear that ID from the <code>bundle</code> table manually in PostgreSQL:</li>
</ul>
<pre tabindex="0"><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (152402);'
UPDATE 1
</code></pre><h2 id="2018-06-14">2018-06-14</h2>
<ul>
@ -389,9 +389,9 @@ UPDATE 1
</ul>
<pre tabindex="0"><code>$ dropdb -h localhost -U postgres dspacetest
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest /tmp/cgspace_2018-06-24.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
</code></pre><ul>
<li>The <code>-O</code> option to <code>pg_restore</code> makes the import process ignore ownership specified in the dump itself, and instead makes the schema owned by the user doing the restore</li>
<li>I always prefer to use the <code>postgres</code> user locally because it’s just easier than remembering the <code>dspacetest</code> user’s password, but then I couldn’t figure out why the resulting schema was owned by <code>postgres</code></li>
@ -413,13 +413,13 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser
<li>So I need to make sure to run the following during the DSpace 5.8 upgrade:</li>
</ul>
<pre tabindex="0"><code>-- Delete existing CUA 4 migration if it exists
delete from schema_version where version = '5.6.2015.12.03.2';
-- Update version of CUA 4 migration
update schema_version set version = '5.6.2015.12.03.2' where version = '5.5.2015.12.03.2';
-- Delete MQM migration since we're no longer using it
delete from schema_version where version = '5.5.2015.12.03.3';
</code></pre><ul>
<li>After that you can run the migrations manually and then DSpace should work fine:</li>
</ul>
@ -427,17 +427,17 @@ delete from schema_version where version = '5.5.2015.12.03.3';
...
Done.
</code></pre><ul>
<li>Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Andy Jarvis' items on CGSpace</li>
<li>Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Andy Jarvis’ items on CGSpace</li>
<li>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</li>
</ul>
<pre tabindex="0"><code>$ ./add-orcid-identifiers-csv.py -i 2018-06-24-andy-jarvis-orcid.csv -db dspacetest -u dspacetest -p 'fuuu'
</code></pre><ul>
<li>The contents of <code>2018-06-24-andy-jarvis-orcid.csv</code> were:</li>
</ul>
<pre tabindex="0"><code>dc.contributor.author,cg.creator.id
"Jarvis, A.",Andy Jarvis: 0000-0001-6543-0798
"Jarvis, Andy",Andy Jarvis: 0000-0001-6543-0798
"Jarvis, Andrew",Andy Jarvis: 0000-0001-6543-0798
</code></pre><h2 id="2018-06-26">2018-06-26</h2>
<ul>
<li>Atmire got back to me to say that we can remove the <code>itemCollectionPlugin</code> and <code>HasBitstreamsSSIPlugin</code> beans from DSpace’s <code>discovery.xml</code> file, as they are used by the Metadata Quality Module (MQM) that we are not using anymore</li>
@ -455,19 +455,19 @@ Done.
<li>I’ll have to figure out how to separate those we’re keeping, deleting, and mapping into CIFOR’s archive collection</li>
<li>First, get the 62 deletes from Vika’s file and remove them from the collection:</li>
</ul>
<pre tabindex="0"><code>$ grep delete 2018-06-22-cifor-duplicates.txt | grep -o -E '[0-9]{5}\/[0-9]{5}' > cifor-handle-to-delete.txt
$ wc -l cifor-handle-to-delete.txt
62 cifor-handle-to-delete.txt
$ wc -l 10568-92904.csv
2461 10568-92904.csv
$ while read line; do sed -i "\#$line#d" 10568-92904.csv; done < cifor-handle-to-delete.txt
$ wc -l 10568-92904.csv
2399 10568-92904.csv
</code></pre><ul>
<li>This iterates over the handles for deletion and uses <code>sed</code> with an alternative pattern delimiter of ‘#’ (which must be escaped), because the pattern itself contains a ‘/’</li>
<li>The mapped ones will be difficult because we need their internal IDs in order to map them, and there are 50 of them:</li>
</ul>
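As a rough cross-check of the <code>sed</code> deletion loop described above, the same filtering can be sketched in Python. This is only an illustration, not the method used on CGSpace: the function name <code>drop_lines_matching</code> and the sample rows are hypothetical, and it uses plain substring matching, which behaves like the <code>sed</code> pattern here because handles contain only digits and a slash.

```python
def drop_lines_matching(lines, handles):
    """Return the lines that do not contain any of the given handles.

    Hypothetical helper: mirrors `sed -i "\\#$line#d"` run once per handle.
    """
    # Ignore blank entries and trailing newlines from the handle file
    wanted = [h.strip() for h in handles if h.strip()]
    return [line for line in lines if not any(h in line for h in wanted)]


# Made-up sample data standing in for the exported collection CSV
csv_lines = [
    "id,collection,dc.title\n",
    "1,10568/92904,Keep me\n",
    "2,10568/92904,Duplicate of 10568/11111\n",
]
handles_to_delete = ["10568/11111\n", "\n"]

kept = drop_lines_matching(csv_lines, handles_to_delete)
print(len(kept))  # 2
```

Because no delimiter is involved, the ‘/’ in the handle needs no escaping in this version.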
<pre tabindex="0"><code>$ grep map 2018-06-22-cifor-duplicates.txt | grep -o -E '[0-9]{5}\/[0-9]{5}' > cifor-handle-to-map.txt
$ wc -l cifor-handle-to-map.txt
50 cifor-handle-to-map.txt
</code></pre><ul>
@ -475,7 +475,7 @@ $ wc -l cifor-handle-to-map.txt
<li>Oooh, I can export the items one by one, concatenate them together, remove the headers, and extract the <code>id</code> and <code>collection</code> columns using <a href="https://csvkit.readthedocs.io/">csvkit</a>:</li>
</ul>
<pre tabindex="0"><code>$ while read line; do filename=${line/\//-}.csv; dspace metadata-export -i $line -f $filename; done < /tmp/cifor-handle-to-map.txt
$ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 > map-to-cifor-archive.csv
</code></pre><ul>
<li>Then I can use OpenRefine to add the “CIFOR Archive” collection to the mappings</li>
<li>Importing the 2398 items via <code>dspace metadata-import</code> ends up with a Java garbage collection error, so I think I need to do it in batches of 1,000</li>
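One way to prepare batches of 1,000 for <code>dspace metadata-import</code> is to split the CSV while repeating the header row in every batch. The following is a minimal sketch under assumptions: the function name <code>split_csv_rows</code> is hypothetical, the sample data is generated rather than taken from the real export, and writing each batch to its own file is left out.

```python
import csv
import io


def split_csv_rows(text, batch_size=1000):
    """Yield lists of rows, each starting with the header row.

    Hypothetical helper; uses the csv module so that quoted fields
    containing commas or newlines stay intact.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield [header] + batch
            batch = []
    if batch:
        # Final, shorter batch
        yield [header] + batch


# Generated stand-in for the 2398-item export (not real CGSpace data)
sample = "id,collection\n" + "".join(
    f"{i},10568/92904\n" for i in range(2398)
)

# Subtract 1 so the counts exclude the repeated header row
sizes = [len(batch) - 1 for batch in split_csv_rows(sample)]
print(sizes)  # [1000, 1000, 398]
```

Each batch could then be written out with <code>csv.writer</code> and imported separately.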