mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-09-13
@ -42,7 +42,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
"/>
<meta name="generator" content="Hugo 0.87.0" />
<meta name="generator" content="Hugo 0.88.1" />
@ -139,7 +139,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id
</li>
<li>I exported a random item’s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
<pre tabindex="0"><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre><ul>
<li>Hmm, with the <code>dc.contributor.author</code> column removed, DSpace doesn’t detect any changes</li>
<li>With a blank <code>dc.contributor.author</code> column, DSpace wants to remove all non-ORCID authors and add the new ORCID authors</li>
@ -161,14 +161,14 @@ I exported a random item’s metadata as CSV, deleted all columns except id
</li>
<li>That left us with 3,180 valid corrections and 3 deletions:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i authors-fix-3180.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i authors-fix-3180.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
$ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -m 3 -d dspacetest -u dspacetest -p fuuu
</code></pre><ul>
<li>Remove old about page (<a href="https://github.com/ilri/DSpace/pull/284">#284</a>)</li>
<li>CGSpace crashed a few times today</li>
<li>Generate list of unique authors in CCAFS collections:</li>
</ul>
<pre><code>dspacetest=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/32729', '10568/5472', '10568/5473', '10568/10288', '10568/70974', '10568/3547', '10568/3549', '10568/3531','10568/16890','10568/5470','10568/3546', '10568/36024', '10568/66581', '10568/21789', '10568/5469', '10568/5468', '10568/3548', '10568/71053', '10568/25167'))) group by text_value order by count desc) to /tmp/ccafs-authors.csv with csv;
<pre tabindex="0"><code>dspacetest=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/32729', '10568/5472', '10568/5473', '10568/10288', '10568/70974', '10568/3547', '10568/3549', '10568/3531','10568/16890','10568/5470','10568/3546', '10568/36024', '10568/66581', '10568/21789', '10568/5469', '10568/5468', '10568/3548', '10568/71053', '10568/25167'))) group by text_value order by count desc) to /tmp/ccafs-authors.csv with csv;
</code></pre><h2 id="2016-10-05">2016-10-05</h2>
<ul>
<li>Work on more infrastructure cleanups for Ansible DSpace role</li>
@ -190,13 +190,13 @@ $ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -
<li>Re-deploy CGSpace with latest changes from late September and early October</li>
<li>Run fixes for ILRI subjects and delete blank metadata values:</li>
</ul>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
<pre tabindex="0"><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 11
</code></pre><ul>
<li>Run all system updates and reboot CGSpace</li>
<li>Delete ten gigs of old 2015 Tomcat logs that never got rotated (WTF?):</li>
</ul>
<pre><code>root@linode01:~# ls -lh /var/log/tomcat7/localhost_access_log.2015* | wc -l
<pre tabindex="0"><code>root@linode01:~# ls -lh /var/log/tomcat7/localhost_access_log.2015* | wc -l
47
</code></pre><ul>
<li>Delete 2GB <code>cron-filter-media.log</code> file, as it is just a log from a cron job and it doesn’t get rotated like normal log files (almost a year now maybe)</li>
@ -211,7 +211,7 @@ DELETE 11
<ul>
<li>A bit more cleanup on the CCAFS authors, and run the corrections on DSpace Test:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i ccafs-authors-oct-16.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i ccafs-authors-oct-16.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
</code></pre><ul>
<li>One observation is that there are still some old versions of names in the author lookup because authors appear in other communities (as we only corrected authors from CCAFS for this round)</li>
</ul>
@ -219,7 +219,7 @@ DELETE 11
<ul>
<li>Start working on DSpace 5.5 porting work again:</li>
</ul>
<pre><code>$ git checkout -b 5_x-55 5_x-prod
<pre tabindex="0"><code>$ git checkout -b 5_x-55 5_x-prod
$ git rebase -i dspace-5.5
</code></pre><ul>
<li>Have to fix about ten merge conflicts, mostly in the SCSS for the CGIAR theme</li>
@ -248,25 +248,25 @@ $ git rebase -i dspace-5.5
<ul>
<li>Move the LIVES community from the top level to the ILRI projects community</li>
</ul>
<pre><code>$ /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child=10568/25101
<pre tabindex="0"><code>$ /home/cgspace.cgiar.org/bin/dspace community-filiator --set --parent=10568/27629 --child=10568/25101
</code></pre><ul>
<li>Start testing some things for DSpace 5.5, like command line metadata import, PDF media filter, and Atmire CUA</li>
<li>Start looking at batch fixing of “old” ILRI website links without www or https, for example:</li>
</ul>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ilri.org%';
<pre tabindex="0"><code>dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ilri.org%';
</code></pre><ul>
<li>Also CCAFS has HTTPS and their links should use it where possible:</li>
</ul>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ccafs.cgiar.org%';
<pre tabindex="0"><code>dspace=# select * from metadatavalue where resource_type_id=2 and text_value like 'http://ccafs.cgiar.org%';
</code></pre><ul>
<li>And this will find community and collection HTML text that is using the old style PNG/JPG icons for RSS and email (we should be using Font Awesome icons instead):</li>
</ul>
<pre><code>dspace=# select text_value from metadatavalue where resource_type_id in (3,4) and text_value like '%Iconrss2.png%';
<pre tabindex="0"><code>dspace=# select text_value from metadatavalue where resource_type_id in (3,4) and text_value like '%Iconrss2.png%';
</code></pre><ul>
<li>Turns out there are shit tons of varieties of this, like with http, https, www, separate <code>&lt;/img&gt;</code> tags, alignments, etc</li>
<li>Had to find all variations and replace them individually:</li>
</ul>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://www.ilri.org/images/Iconrss2.png"/>','<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://www.ilri.org/images/Iconrss2.png"/>%';
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://www.ilri.org/images/Iconrss2.png"/>','<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://www.ilri.org/images/Iconrss2.png"/>%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="https://www.ilri.org/images/email.jpg"/>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="https://www.ilri.org/images/email.jpg"/>%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="http://www.ilri.org/images/Iconrss2.png"/>', '<span class="fa fa-rss fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="http://www.ilri.org/images/Iconrss2.png"/>%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<img align="left" src="http://www.ilri.org/images/email.jpg"/>', '<span class="fa fa-at fa-2x" aria-hidden="true"></span>') where resource_type_id in (3,4) and text_value like '%<img align="left" src="http://www.ilri.org/images/email.jpg"/>%';
@ -291,7 +291,7 @@ dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<i
<ul>
<li>Run Font Awesome fixes on DSpace Test:</li>
</ul>
<pre><code>dspace=# \i /tmp/font-awesome-text-replace.sql
<pre tabindex="0"><code>dspace=# \i /tmp/font-awesome-text-replace.sql
UPDATE 17
UPDATE 17
UPDATE 3
@ -321,7 +321,7 @@ UPDATE 0
<ul>
<li>Fix some messed up authors on CGSpace:</li>
</ul>
<pre><code>dspace=# update metadatavalue set authority='799da1d8-22f3-43f5-8233-3d2ef5ebf8a8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Charleston, B.%';
<pre tabindex="0"><code>dspace=# update metadatavalue set authority='799da1d8-22f3-43f5-8233-3d2ef5ebf8a8', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Charleston, B.%';
UPDATE 10
dspace=# update metadatavalue set authority='e936f5c5-343d-4c46-aa91-7a1fff6277ed', confidence=600 where metadata_field_id=3 and resource_type_id=2 and text_value like 'Knight-Jones%';
UPDATE 36
@ -332,20 +332,20 @@ UPDATE 36
<li>Talk to Carlos Quiros about CG Core metadata in CGSpace</li>
<li>Get a list of countries from CGSpace so I can do some batch corrections:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=228 group by text_value order by count desc) to /tmp/countries.csv with csv;
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=228 group by text_value order by count desc) to /tmp/countries.csv with csv;
</code></pre><ul>
<li>Fix a bunch of countries in Open Refine and run the corrections on CGSpace:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i countries-fix-18.csv -f dc.coverage.country -t 'correct' -m 228 -d dspace -u dspace -p fuuu
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i countries-fix-18.csv -f dc.coverage.country -t 'correct' -m 228 -d dspace -u dspace -p fuuu
$ ./delete-metadata-values.py -i countries-delete-2.csv -f dc.coverage.country -m 228 -d dspace -u dspace -p fuuu
</code></pre><ul>
<li>Run a shit ton of author fixes from Peter Ballantyne that we’ve been cleaning up for two months:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/authors-fix-pb2.csv -f dc.contributor.author -t correct -m 3 -u dspace -d dspace -p fuuu
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/authors-fix-pb2.csv -f dc.contributor.author -t correct -m 3 -u dspace -d dspace -p fuuu
</code></pre><ul>
<li>Run a few URL corrections for ilri.org and doi.org, etc:</li>
</ul>
<pre><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://www.ilri.org','https://www.ilri.org') where resource_type_id=2 and text_value like '%http://www.ilri.org%';
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://www.ilri.org','https://www.ilri.org') where resource_type_id=2 and text_value like '%http://www.ilri.org%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://mahider.ilri.org', 'https://cgspace.cgiar.org') where resource_type_id=2 and text_value like '%http://mahider.%.org%' and metadata_field_id not in (28);
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://dx.doi.org', 'https://dx.doi.org') where resource_type_id=2 and text_value like '%http://dx.doi.org%' and metadata_field_id not in (18,26,28,111);
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://doi.org', 'https://dx.doi.org') where resource_type_id=2 and text_value like '%http://doi.org%' and metadata_field_id not in (18,26,28,111);
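
The four Font Awesome icon replacements in the hunk at -248,25 all follow one pattern, differing only in URL scheme and image name. As a minimal sketch (not the author's method), the statements could be generated rather than written by hand. The names `ICON_SPANS`, `SCHEMES`, and `build_updates` are hypothetical; only the SQL text mirrors the diff.

```python
from itertools import product

# Replacement markup copied from the UPDATE statements shown in the diff.
# The dictionary and tuple names are illustrative assumptions.
ICON_SPANS = {
    "Iconrss2.png": '<span class="fa fa-rss fa-2x" aria-hidden="true"></span>',
    "email.jpg": '<span class="fa fa-at fa-2x" aria-hidden="true"></span>',
}
SCHEMES = ("https", "http")


def build_updates():
    """Return one UPDATE statement per (scheme, icon) combination."""
    statements = []
    for scheme, (image, span) in product(SCHEMES, ICON_SPANS.items()):
        # The old <img> markup, in the only layout the diff shows.
        old = f'<img align="left" src="{scheme}://www.ilri.org/images/{image}"/>'
        statements.append(
            "update metadatavalue set text_value = "
            f"regexp_replace(text_value, '{old}', '{span}') "
            f"where resource_type_id in (3,4) and text_value like '%{old}%';"
        )
    return statements


if __name__ == "__main__":
    # Print the statements so they can be reviewed before being run in psql.
    for statement in build_updates():
        print(statement)
```

The other variations the notes mention (no www, separate closing tags, different alignments) would each need their own template; they are left out here because the diff does not show their exact markup.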