mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-21 22:25:02 +01:00
Add notes for 2018-02-11
parent d312304729
commit 3441bd7128
@ -302,6 +302,25 @@ $ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|ds
- I cherry-picked all the commits for DS-3551 but it won't build on our current DSpace 5.5!
- I sent a message to the dspace-tech mailing list asking why DSpace thinks these connections are busy when PostgreSQL says they are idle

## 2018-02-10

- I tried to disable ORCID lookups but keep the existing authorities
- This item has an ORCID for Ralf Kiese: http://localhost:8080/handle/10568/89897
- Switch authority.controlled off and change authorLookup to lookup, and the ORCID badge doesn't show up on the item
- Leave all settings but change choices.presentation to lookup: the ORCID badge is there, but item submission uses LC Name Authority and breaks with this error:

```
Field dc_contributor_author has choice presentation of type "select", it may NOT be authority-controlled.
```

- If I change choices.presentation to suggest it gives this error:

```
xmlui.mirage2.forms.instancedCompositeFields.noSuggestionError
```

- So I don't think we can disable the ORCID lookup function and keep the ORCID badges

## 2018-02-11

- Magdalena from CCAFS emailed to ask why one of their items has such a weird thumbnail: [10568/90735](https://cgspace.cgiar.org/handle/10568/90735)
@ -315,3 +334,64 @@ $ convert CCAFS_WP_223.pdf\[0\] -profile /usr/local/share/ghostscript/9.22/iccpr
```

![Manual thumbnail](/cgspace-notes/2018/02/CCAFS_WP_223.jpg)

- Peter sent me corrected author names last week but the file encoding is messed up:

```
$ isutf8 authors-2018-02-05.csv
authors-2018-02-05.csv: line 100, char 18, byte 4179: After a first byte between E1 and EC, expecting the 2nd byte between 80 and BF.
```

- The `isutf8` program comes from `moreutils`
- Line 100 contains: Galiè, Alessandra
- In other news, psycopg2 is splitting their package in pip, so to install the binary wheel distribution you need to use `pip install psycopg2-binary`
- See: http://initd.org/psycopg/articles/2018/02/08/psycopg-274-released/
- I updated my `fix-metadata-values.py` and `delete-metadata-values.py` scripts on the scripts page: https://github.com/ilri/DSpace/wiki/Scripts
- I ran the 342 author corrections (after trimming whitespace and excluding those with `||` and other syntax errors) on CGSpace:

```
$ ./fix-metadata-values.py -i Correct-342-Authors-2018-02-11.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
```
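
The pre-flight checks mentioned above (finding the line that is not valid UTF-8, trimming whitespace, and skipping rows that contain `||`) could be sketched in Python roughly as follows. This is only an illustration and not what `isutf8` or `fix-metadata-values.py` actually do: the function names are hypothetical, and the two-column layout (original value, corrected value) is an assumption. The error reported for line 100 would be consistent with a Latin-1 encoded "è" (byte E8) followed by a comma, which is what the sample below reproduces.

```python
import csv
import io


def find_invalid_utf8(data: bytes):
    """Return (line_number, byte_offset) for each line that is not valid UTF-8.

    Line numbers are 1-based; byte offsets are 0-based from the start of data.
    """
    problems = []
    offset = 0
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as error:
            problems.append((number, offset + error.start))
        offset += len(line) + 1  # +1 for the newline removed by split()
    return problems


def clean_corrections(text: str):
    """Trim whitespace and drop rows with '||', empty cells, or a wrong column count.

    Assumes (hypothetically) two columns: original value, corrected value.
    """
    kept = []
    for row in csv.reader(io.StringIO(text)):
        row = [cell.strip() for cell in row]
        if len(row) != 2 or not all(row) or any("||" in cell for cell in row):
            continue
        kept.append(tuple(row))
    return kept


if __name__ == "__main__":
    # A Latin-1 encoded name in an otherwise ASCII file trips the UTF-8 check
    sample = b"ok line\n" + "Gali\u00e8, Alessandra".encode("latin-1") + b"\n"
    print(find_invalid_utf8(sample))
    print(clean_corrections('"Duncan, Alan J. ","Duncan, Alan"\n"A||B","A"\n'))
```

Note that a header row, if present, passes through `clean_corrections` like any other row.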

- Then I ran a full Discovery re-indexing:

```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
```

- That reminds me that Bizu had asked me to fix some of Alan Duncan's names in December
- I see he actually has some variations with "Duncan, Alan J.": https://cgspace.cgiar.org/discover?filtertype_1=author&filter_relational_operator_1=contains&filter_1=Duncan%2C+Alan&submit_apply_filter=&query=
- I will just update those for her too and then restart the indexing:

```
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
   text_value    |              authority               | confidence
-----------------+--------------------------------------+------------
 Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |        600
 Duncan, Alan J. | 62298c84-4d9d-4b83-a932-4a9dd4046db7 |         -1
 Duncan, Alan J. |                                      |         -1
 Duncan, Alan    | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
 Duncan, Alan J. | cd0e03bf-92c3-475f-9589-60c5b042ea60 |         -1
 Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |         -1
 Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |         -1
 Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
(8 rows)

dspace=# begin;
dspace=# update metadatavalue set text_value='Duncan, Alan', authority='a6486522-b08a-4f7a-84f9-3a73ce56034d', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Duncan, Alan%';
UPDATE 216
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
  text_value  |              authority               | confidence
--------------+--------------------------------------+------------
 Duncan, Alan | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
(1 row)
dspace=# commit;
```
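
The pattern used in the session above (update inside a transaction, re-check the distinct variants, and only then commit) could be scripted along these lines. This is a minimal sketch under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory table as a stand-in for PostgreSQL so that it runs on its own, and `normalize_author` and `demo_connection` are hypothetical helper names. Only the table and column names come from the queries above.

```python
import sqlite3


def demo_connection():
    """Build an in-memory stand-in for the metadatavalue table (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadatavalue (resource_type_id INTEGER, metadata_field_id INTEGER, "
        "text_value TEXT, authority TEXT, confidence INTEGER)"
    )
    conn.executemany(
        "INSERT INTO metadatavalue VALUES (2, 3, ?, ?, ?)",
        [
            ("Duncan, Alan J.", "5ff35043-942e-4d0a-b377-4daed6e3c1a3", 600),
            ("Duncan, Alan J.", None, -1),
            ("Duncan, Alan", "a6486522-b08a-4f7a-84f9-3a73ce56034d", 600),
        ],
    )
    conn.commit()
    return conn


def normalize_author(conn, update_pattern, check_pattern, name, authority, confidence=600):
    """Rewrite matching author rows; commit only if exactly one variant remains."""
    cur = conn.cursor()
    cur.execute(
        "UPDATE metadatavalue SET text_value = ?, authority = ?, confidence = ? "
        "WHERE resource_type_id = 2 AND metadata_field_id = 3 AND text_value LIKE ?",
        (name, authority, confidence, update_pattern),
    )
    updated = cur.rowcount
    # Re-check with the broader pattern, as in the manual session
    cur.execute(
        "SELECT DISTINCT text_value, authority, confidence FROM metadatavalue "
        "WHERE resource_type_id = 2 AND metadata_field_id = 3 AND text_value LIKE ?",
        (check_pattern,),
    )
    variants = cur.fetchall()
    if len(variants) != 1:
        conn.rollback()  # something unexpected matched, so leave the data untouched
        raise RuntimeError(f"expected 1 variant after update, found {len(variants)}")
    conn.commit()
    return updated


if __name__ == "__main__":
    conn = demo_connection()
    count = normalize_author(
        conn, "Duncan, Alan%", "%Duncan, Alan%",
        "Duncan, Alan", "a6486522-b08a-4f7a-84f9-3a73ce56034d",
    )
    print(f"UPDATE {count}")
```

Against the real database the same idea would apply with psycopg2, where the placeholders are `%s` rather than `?`. Also note that SQLite's `LIKE` is case-insensitive for ASCII by default, unlike PostgreSQL's.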

- Run all system updates on DSpace Test (linode02) and reboot it
- I wrote a Python script ([`resolve-orcids-from-solr.py`](https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b)) using SolrClient to parse the Solr authority cache for ORCID IDs
- We currently have 1562 authority records with ORCID IDs, and 624 unique IDs
- We can use this to build a controlled vocabulary of ORCID IDs for new item submissions
- I don't know how to add ORCID IDs to existing items yet... some more querying of PostgreSQL for authority values perhaps?
- I added the script to the [ILRI DSpace wiki on GitHub](https://github.com/ilri/DSpace/wiki/Scripts)
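
The difference between "records with ORCID IDs" and "unique IDs" comes down to de-duplicating the identifiers found in the authority records. The sketch below is not the linked `resolve-orcids-from-solr.py` script: it assumes the Solr documents have already been fetched into plain dictionaries, and the `orcid_id` field name and the `unique_orcids` helper are assumptions made for illustration.

```python
import re
from collections import Counter

# ORCID iDs are four groups of four characters; the last one may be a check character X
ORCID_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{3}[\dX]\b")


def unique_orcids(docs, field="orcid_id"):
    """Count how many authority records carry each ORCID iD.

    docs is an iterable of dicts; field is the (assumed) key holding the identifier.
    """
    counts = Counter()
    for doc in docs:
        match = ORCID_PATTERN.search(str(doc.get(field, "")))
        if match:
            counts[match.group(0)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample records, not real authority data
    docs = [
        {"orcid_id": "0000-0002-1825-0097", "value": "Example, Author"},
        {"orcid_id": "0000-0002-1825-0097", "value": "Example, A."},
        {"value": "No ORCID, Person"},
    ]
    counts = unique_orcids(docs)
    print(sum(counts.values()), "records with ORCID IDs,", len(counts), "unique IDs")
```

The sorted keys of the returned counter are one possible starting point for the controlled vocabulary mentioned above.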
@ -23,7 +23,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-pl

<meta property="article:published_time" content="2018-02-01T16:28:54+02:00"/>

<meta property="article:modified_time" content="2018-02-08T01:08:36+02:00"/>
<meta property="article:modified_time" content="2018-02-11T10:01:13+02:00"/>

@ -57,9 +57,9 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-pl
"@type": "BlogPosting",
"headline": "February, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-02/",
"wordCount": "2147",
"wordCount": "2666",
"datePublished": "2018-02-01T16:28:54+02:00",
"dateModified": "2018-02-08T01:08:36+02:00",
"dateModified": "2018-02-11T10:01:13+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -455,6 +455,30 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
<li>I sent a message to the dspace-tech mailing list asking why DSpace thinks these connections are busy when PostgreSQL says they are idle</li>
</ul>

<h2 id="2018-02-10">2018-02-10</h2>

<ul>
<li>I tried to disable ORCID lookups but keep the existing authorities</li>
<li>This item has an ORCID for Ralf Kiese: <a href="http://localhost:8080/handle/10568/89897">http://localhost:8080/handle/10568/89897</a></li>
<li>Switch authority.controlled off and change authorLookup to lookup, and the ORCID badge doesn’t show up on the item</li>
<li>Leave all settings but change choices.presentation to lookup: the ORCID badge is there, but item submission uses LC Name Authority and breaks with this error:
<br /></li>
</ul>

<pre><code>Field dc_contributor_author has choice presentation of type "select", it may NOT be authority-controlled.
</code></pre>

<ul>
<li>If I change choices.presentation to suggest it gives this error:</li>
</ul>

<pre><code>xmlui.mirage2.forms.instancedCompositeFields.noSuggestionError
</code></pre>

<ul>
<li>So I don’t think we can disable the ORCID lookup function and keep the ORCID badges</li>
</ul>

<h2 id="2018-02-11">2018-02-11</h2>

<ul>
@ -472,6 +496,73 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |

<p><img src="/cgspace-notes/2018/02/CCAFS_WP_223.jpg" alt="Manual thumbnail" /></p>

<ul>
<li>Peter sent me corrected author names last week but the file encoding is messed up:</li>
</ul>

<pre><code>$ isutf8 authors-2018-02-05.csv
authors-2018-02-05.csv: line 100, char 18, byte 4179: After a first byte between E1 and EC, expecting the 2nd byte between 80 and BF.
</code></pre>

<ul>
<li>The <code>isutf8</code> program comes from <code>moreutils</code></li>
<li>Line 100 contains: Galiè, Alessandra</li>
<li>In other news, psycopg2 is splitting their package in pip, so to install the binary wheel distribution you need to use <code>pip install psycopg2-binary</code></li>
<li>See: <a href="http://initd.org/psycopg/articles/2018/02/08/psycopg-274-released/">http://initd.org/psycopg/articles/2018/02/08/psycopg-274-released/</a></li>
<li>I updated my <code>fix-metadata-values.py</code> and <code>delete-metadata-values.py</code> scripts on the scripts page: <a href="https://github.com/ilri/DSpace/wiki/Scripts">https://github.com/ilri/DSpace/wiki/Scripts</a></li>
<li>I ran the 342 author corrections (after trimming whitespace and excluding those with <code>||</code> and other syntax errors) on CGSpace:</li>
</ul>

<pre><code>$ ./fix-metadata-values.py -i Correct-342-Authors-2018-02-11.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
</code></pre>

<ul>
<li>Then I ran a full Discovery re-indexing:</li>
</ul>

<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
</code></pre>

<ul>
<li>That reminds me that Bizu had asked me to fix some of Alan Duncan’s names in December</li>
<li>I see he actually has some variations with “Duncan, Alan J.”: <a href="https://cgspace.cgiar.org/discover?filtertype_1=author&filter_relational_operator_1=contains&filter_1=Duncan%2C+Alan&submit_apply_filter=&query=">https://cgspace.cgiar.org/discover?filtertype_1=author&filter_relational_operator_1=contains&filter_1=Duncan%2C+Alan&submit_apply_filter=&query=</a></li>
<li>I will just update those for her too and then restart the indexing:</li>
</ul>

<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
   text_value    |              authority               | confidence
-----------------+--------------------------------------+------------
 Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |        600
 Duncan, Alan J. | 62298c84-4d9d-4b83-a932-4a9dd4046db7 |         -1
 Duncan, Alan J. |                                      |         -1
 Duncan, Alan    | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
 Duncan, Alan J. | cd0e03bf-92c3-475f-9589-60c5b042ea60 |         -1
 Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |         -1
 Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |         -1
 Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
(8 rows)

dspace=# begin;
dspace=# update metadatavalue set text_value='Duncan, Alan', authority='a6486522-b08a-4f7a-84f9-3a73ce56034d', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Duncan, Alan%';
UPDATE 216
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
  text_value  |              authority               | confidence
--------------+--------------------------------------+------------
 Duncan, Alan | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
(1 row)
dspace=# commit;
</code></pre>

<ul>
<li>Run all system updates on DSpace Test (linode02) and reboot it</li>
<li>I wrote a Python script (<a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b"><code>resolve-orcids-from-solr.py</code></a>) using SolrClient to parse the Solr authority cache for ORCID IDs</li>
<li>We currently have 1562 authority records with ORCID IDs, and 624 unique IDs</li>
<li>We can use this to build a controlled vocabulary of ORCID IDs for new item submissions</li>
<li>I don’t know how to add ORCID IDs to existing items yet… some more querying of PostgreSQL for authority values perhaps?</li>
<li>I added the script to the <a href="https://github.com/ilri/DSpace/wiki/Scripts">ILRI DSpace wiki on GitHub</a></li>
</ul>

@ -32,7 +32,7 @@ Disallow: /cgspace-notes/2015-12/
Disallow: /cgspace-notes/2015-11/
Disallow: /cgspace-notes/
Disallow: /cgspace-notes/categories/
Disallow: /cgspace-notes/categories/notes/
Disallow: /cgspace-notes/tags/notes/
Disallow: /cgspace-notes/categories/notes/
Disallow: /cgspace-notes/post/
Disallow: /cgspace-notes/tags/
@ -4,7 +4,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-02/</loc>
<lastmod>2018-02-08T01:08:36+02:00</lastmod>
<lastmod>2018-02-11T10:01:13+02:00</lastmod>
</url>

<url>
@ -149,7 +149,7 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-02-08T01:08:36+02:00</lastmod>
<lastmod>2018-02-11T10:01:13+02:00</lastmod>
<priority>0</priority>
</url>

@ -158,27 +158,27 @@
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-02-11T10:01:13+02:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2017-09-28T12:00:49+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-02-08T01:08:36+02:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
<lastmod>2018-02-08T01:08:36+02:00</lastmod>
<lastmod>2018-02-11T10:01:13+02:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-02-08T01:08:36+02:00</lastmod>
<lastmod>2018-02-11T10:01:13+02:00</lastmod>
<priority>0</priority>
</url>
