mirror of https://github.com/alanorth/cgspace-notes.git
synced 2024-11-21 22:25:02 +01:00
Add notes for 2018-02-11
parent d312304729
commit 3441bd7128
@ -302,6 +302,25 @@ $ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|ds
|
|||||||
- I cherry-picked all the commits for DS-3551 but it won't build on our current DSpace 5.5!
|
- I cherry-picked all the commits for DS-3551 but it won't build on our current DSpace 5.5!
|
||||||
- I sent a message to the dspace-tech mailing list asking why DSpace thinks these connections are busy when PostgreSQL says they are idle
|
- I sent a message to the dspace-tech mailing list asking why DSpace thinks these connections are busy when PostgreSQL says they are idle
|
||||||
|
|
||||||
|
## 2018-02-10
|
||||||
|
|
||||||
|
- I tried to disable ORCID lookups but keep the existing authorities
|
||||||
|
- This item has an ORCID for Ralf Kiese: http://localhost:8080/handle/10568/89897
|
||||||
|
- With authority.controlled switched off and authorLookup changed to lookup, the ORCID badge doesn't show up on the item
|
||||||
|
- With all other settings left alone and choices.presentation changed to lookup, the ORCID badge is there, but item submission uses the LC Name Authority and breaks with this error:
|
||||||
|
|
||||||
|
```
|
||||||
|
Field dc_contributor_author has choice presentation of type "select", it may NOT be authority-controlled.
|
||||||
|
```
|
||||||
|
|
||||||
|
- If I change choices.presentation to suggest it gives this error:
|
||||||
|
|
||||||
|
```
|
||||||
|
xmlui.mirage2.forms.instancedCompositeFields.noSuggestionError
|
||||||
|
```
|
||||||
|
|
||||||
|
- So I don't think we can disable the ORCID lookup function and keep the ORCID badges
|
||||||
|
|
||||||
## 2018-02-11
|
## 2018-02-11
|
||||||
|
|
||||||
- Magdalena from CCAFS emailed to ask why one of their items has such a weird thumbnail: [10568/90735](https://cgspace.cgiar.org/handle/10568/90735)
|
- Magdalena from CCAFS emailed to ask why one of their items has such a weird thumbnail: [10568/90735](https://cgspace.cgiar.org/handle/10568/90735)
|
||||||
@ -315,3 +334,64 @@ $ convert CCAFS_WP_223.pdf\[0\] -profile /usr/local/share/ghostscript/9.22/iccpr
|
|||||||
```
|
```
|
||||||
|
|
||||||
![Manual thumbnail](/cgspace-notes/2018/02/CCAFS_WP_223.jpg)
|
![Manual thumbnail](/cgspace-notes/2018/02/CCAFS_WP_223.jpg)
|
||||||
|
|
||||||
|
- Peter sent me corrected author names last week but the file encoding is messed up:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ isutf8 authors-2018-02-05.csv
|
||||||
|
authors-2018-02-05.csv: line 100, char 18, byte 4179: After a first byte between E1 and EC, expecting the 2nd byte between 80 and BF.
|
||||||
|
```
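A similar check can be approximated in Python on a machine without the `isutf8` tool. This is a minimal sketch, not how `isutf8` itself works: the function name `first_invalid_utf8` and the sample bytes are hypothetical, and it reports only the line and byte offset of the first bad byte. A lone byte 0xE8 (è in Latin-1 or Windows-1252) followed by an ASCII character fails in the same way as the message above, because 0xE8 is a UTF-8 lead byte that must be followed by a continuation byte.

```python
def first_invalid_utf8(data: bytes):
    """Return (line, byte_offset) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the 0-based offset of the first undecodable byte
        line = data.count(b"\n", 0, err.start) + 1
        return line, err.start
    return None


# Hypothetical sample: "è" written as the single Latin-1 byte 0xE8
sample = b"Doe, Jane\nGali\xe8, Alessandra\n"
bad = first_invalid_utf8(sample)
good = first_invalid_utf8("Galiè, Alessandra\n".encode("utf-8"))
print(bad, good)
```

For a real file, the bytes would come from something like `open("authors-2018-02-05.csv", "rb").read()`.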
|
||||||
|
|
||||||
|
- The `isutf8` program comes from `moreutils`
|
||||||
|
- Line 100 contains: Galiè, Alessandra
|
||||||
|
- In other news, psycopg2 is splitting their package in pip, so to install the binary wheel distribution you need to use `pip install psycopg2-binary`
|
||||||
|
- See: http://initd.org/psycopg/articles/2018/02/08/psycopg-274-released/
|
||||||
|
- I updated my `fix-metadata-values.py` and `delete-metadata-values.py` scripts on the scripts page: https://github.com/ilri/DSpace/wiki/Scripts
|
||||||
|
- I ran the 342 author corrections (after trimming whitespace and excluding those with `||` and other syntax errors) on CGSpace:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ ./fix-metadata-values.py -i Correct-342-Authors-2018-02-11.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
|
||||||
|
```
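The cleanup done before running the script (trimming whitespace and skipping rows that contain `||`) could be sketched as below. This assumes the CSV has columns named `dc.contributor.author` and `correct`, matching the `-f` and `-t` arguments above; the function name `clean_corrections` and the sample rows are hypothetical, and other syntax errors would still need checking by hand.

```python
import csv
import io


def clean_corrections(src, field="dc.contributor.author", target="correct"):
    """Yield rows with whitespace trimmed, skipping rows that contain '||'."""
    for row in csv.DictReader(src):
        old = (row.get(field) or "").strip()
        new = (row.get(target) or "").strip()
        # Skip rows with an empty value or a multi-value separator
        if not old or not new or "||" in old or "||" in new:
            continue
        yield {field: old, target: new}


# Hypothetical sample input
sample = io.StringIO(
    "dc.contributor.author,correct\n"
    '"Doe, J. ","Doe, Jane"\n'
    '"Roe, R.||Poe, P.","Roe, Richard"\n'
)
cleaned = list(clean_corrections(sample))
print(cleaned)
```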
|
||||||
|
|
||||||
|
- Then I ran a full Discovery re-indexing:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
|
||||||
|
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||||
|
```
|
||||||
|
|
||||||
|
- That reminds me that Bizu had asked me to fix some of Alan Duncan's names in December
|
||||||
|
- I see he actually has some variations with "Duncan, Alan J.": https://cgspace.cgiar.org/discover?filtertype_1=author&filter_relational_operator_1=contains&filter_1=Duncan%2C+Alan&submit_apply_filter=&query=
|
||||||
|
- I will just update those for her too and then restart the indexing:
|
||||||
|
|
||||||
|
```
|
||||||
|
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
|
||||||
|
text_value | authority | confidence
|
||||||
|
-----------------+--------------------------------------+------------
|
||||||
|
Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 | 600
|
||||||
|
Duncan, Alan J. | 62298c84-4d9d-4b83-a932-4a9dd4046db7 | -1
|
||||||
|
Duncan, Alan J. | | -1
|
||||||
|
Duncan, Alan | a6486522-b08a-4f7a-84f9-3a73ce56034d | 600
|
||||||
|
Duncan, Alan J. | cd0e03bf-92c3-475f-9589-60c5b042ea60 | -1
|
||||||
|
Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d | -1
|
||||||
|
Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 | -1
|
||||||
|
Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d | 600
|
||||||
|
(8 rows)
|
||||||
|
|
||||||
|
dspace=# begin;
|
||||||
|
dspace=# update metadatavalue set text_value='Duncan, Alan', authority='a6486522-b08a-4f7a-84f9-3a73ce56034d', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Duncan, Alan%';
|
||||||
|
UPDATE 216
|
||||||
|
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
|
||||||
|
text_value | authority | confidence
|
||||||
|
--------------+--------------------------------------+------------
|
||||||
|
Duncan, Alan | a6486522-b08a-4f7a-84f9-3a73ce56034d | 600
|
||||||
|
(1 row)
|
||||||
|
dspace=# commit;
|
||||||
|
```
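The pattern in the session above (update inside a transaction, check the distinct values, then commit) can also be scripted. The sketch below uses an in-memory SQLite database rather than PostgreSQL so that it runs anywhere; the table is a cut-down, hypothetical stand-in for `metadatavalue` without the `resource_type_id` and `metadata_field_id` filters, the authority values are placeholders, and `normalize_author` is an invented helper name.

```python
import sqlite3


def normalize_author(conn, name, authority, confidence=600):
    """Rewrite all values starting with `name`; commit only if one variant remains."""
    cur = conn.cursor()
    cur.execute(
        "update metadatavalue set text_value=?, authority=?, confidence=? "
        "where text_value like ?",
        (name, authority, confidence, name + "%"),
    )
    updated = cur.rowcount  # rows touched by the update
    remaining = cur.execute(
        "select distinct text_value, authority, confidence "
        "from metadatavalue where text_value like ?",
        ("%" + name + "%",),
    ).fetchall()
    if len(remaining) == 1:
        conn.commit()
    else:
        conn.rollback()  # leave the data untouched if variants remain
    return updated, remaining


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table metadatavalue (text_value text, authority text, confidence integer)"
)
conn.executemany(
    "insert into metadatavalue values (?, ?, ?)",
    [
        ("Duncan, Alan J.", "authority-1", -1),
        ("Duncan, Alan J.", None, -1),
        ("Duncan, Alan", "authority-2", 600),
    ],
)
conn.commit()
result = normalize_author(conn, "Duncan, Alan", "authority-2")
print(result)
```

Checking the remaining variants before committing mirrors the manual `select distinct` step in the session, so a pattern that matches too little can be rolled back.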
|
||||||
|
|
||||||
|
- Run all system updates on DSpace Test (linode02) and reboot it
|
||||||
|
- I wrote a Python script ([`resolve-orcids-from-solr.py`](https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b)) using SolrClient to parse the Solr authority cache for ORCID IDs
|
||||||
|
- We currently have 1562 authority records with ORCID IDs, and 624 unique IDs
|
||||||
|
- We can use this to build a controlled vocabulary of ORCID IDs for new item submissions
|
||||||
|
- I don't know how to add ORCID IDs to existing items yet... some more querying of PostgreSQL for authority values perhaps?
|
||||||
|
- I added the script to the [ILRI DSpace wiki on GitHub](https://github.com/ilri/DSpace/wiki/Scripts)
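The two counts above (records with an ORCID ID versus unique IDs) come down to a de-duplication step. Here is a minimal sketch of that step, independent of how the documents are fetched from Solr. It assumes each authority document is a dict with an `orcid_id` key, which is an assumption about the authority core's schema; the sample records and the name `summarize_orcids` are made up, and this is not the code in the linked gist.

```python
def summarize_orcids(docs, key="orcid_id"):
    """Return (records_with_orcid, sorted_unique_ids) for a list of dicts."""
    ids = [doc[key].strip() for doc in docs if doc.get(key)]
    return len(ids), sorted(set(ids))


# Made-up authority documents; the ORCID iDs are placeholders
docs = [
    {"value": "Doe, Jane", "orcid_id": "0000-0000-0000-0001"},
    {"value": "Doe, J.", "orcid_id": "0000-0000-0000-0001"},
    {"value": "Roe, Richard", "orcid_id": "0000-0000-0000-0002"},
    {"value": "Poe, P."},
]
total, unique = summarize_orcids(docs)
print(total, len(unique))
```

The sorted list of unique IDs is the natural starting point for a controlled vocabulary, one entry per ID.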
|
||||||
|
@ -23,7 +23,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-pl
|
|||||||
|
|
||||||
<meta property="article:published_time" content="2018-02-01T16:28:54+02:00"/>
|
<meta property="article:published_time" content="2018-02-01T16:28:54+02:00"/>
|
||||||
|
|
||||||
<meta property="article:modified_time" content="2018-02-08T01:08:36+02:00"/>
|
<meta property="article:modified_time" content="2018-02-11T10:01:13+02:00"/>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@ -57,9 +57,9 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-pl
|
|||||||
"@type": "BlogPosting",
|
"@type": "BlogPosting",
|
||||||
"headline": "February, 2018",
|
"headline": "February, 2018",
|
||||||
"url": "https://alanorth.github.io/cgspace-notes/2018-02/",
|
"url": "https://alanorth.github.io/cgspace-notes/2018-02/",
|
||||||
"wordCount": "2147",
|
"wordCount": "2666",
|
||||||
"datePublished": "2018-02-01T16:28:54+02:00",
|
"datePublished": "2018-02-01T16:28:54+02:00",
|
||||||
"dateModified": "2018-02-08T01:08:36+02:00",
|
"dateModified": "2018-02-11T10:01:13+02:00",
|
||||||
"author": {
|
"author": {
|
||||||
"@type": "Person",
|
"@type": "Person",
|
||||||
"name": "Alan Orth"
|
"name": "Alan Orth"
|
||||||
@ -455,6 +455,30 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
|
|||||||
<li>I sent a message to the dspace-tech mailing list asking why DSpace thinks these connections are busy when PostgreSQL says they are idle</li>
|
<li>I sent a message to the dspace-tech mailing list asking why DSpace thinks these connections are busy when PostgreSQL says they are idle</li>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
|
<h2 id="2018-02-10">2018-02-10</h2>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>I tried to disable ORCID lookups but keep the existing authorities</li>
|
||||||
|
<li>This item has an ORCID for Ralf Kiese: <a href="http://localhost:8080/handle/10568/89897">http://localhost:8080/handle/10568/89897</a></li>
|
||||||
|
<li>With authority.controlled switched off and authorLookup changed to lookup, the ORCID badge doesn’t show up on the item</li>
|
||||||
|
<li>With all other settings left alone and choices.presentation changed to lookup, the ORCID badge is there, but item submission uses the LC Name Authority and breaks with this error:
|
||||||
|
<br /></li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code>Field dc_contributor_author has choice presentation of type "select", it may NOT be authority-controlled.
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>If I change choices.presentation to suggest it gives this error:</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code>xmlui.mirage2.forms.instancedCompositeFields.noSuggestionError
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>So I don’t think we can disable the ORCID lookup function and keep the ORCID badges</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
<h2 id="2018-02-11">2018-02-11</h2>
|
<h2 id="2018-02-11">2018-02-11</h2>
|
||||||
|
|
||||||
<ul>
|
<ul>
|
||||||
@ -472,6 +496,73 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
|
|||||||
|
|
||||||
<p><img src="/cgspace-notes/2018/02/CCAFS_WP_223.jpg" alt="Manual thumbnail" /></p>
|
<p><img src="/cgspace-notes/2018/02/CCAFS_WP_223.jpg" alt="Manual thumbnail" /></p>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>Peter sent me corrected author names last week but the file encoding is messed up:</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code>$ isutf8 authors-2018-02-05.csv
|
||||||
|
authors-2018-02-05.csv: line 100, char 18, byte 4179: After a first byte between E1 and EC, expecting the 2nd byte between 80 and BF.
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>The <code>isutf8</code> program comes from <code>moreutils</code></li>
|
||||||
|
<li>Line 100 contains: Galiè, Alessandra</li>
|
||||||
|
<li>In other news, psycopg2 is splitting their package in pip, so to install the binary wheel distribution you need to use <code>pip install psycopg2-binary</code></li>
|
||||||
|
<li>See: <a href="http://initd.org/psycopg/articles/2018/02/08/psycopg-274-released/">http://initd.org/psycopg/articles/2018/02/08/psycopg-274-released/</a></li>
|
||||||
|
<li>I updated my <code>fix-metadata-values.py</code> and <code>delete-metadata-values.py</code> scripts on the scripts page: <a href="https://github.com/ilri/DSpace/wiki/Scripts">https://github.com/ilri/DSpace/wiki/Scripts</a></li>
|
||||||
|
<li>I ran the 342 author corrections (after trimming whitespace and excluding those with <code>||</code> and other syntax errors) on CGSpace:</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code>$ ./fix-metadata-values.py -i Correct-342-Authors-2018-02-11.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>Then I ran a full Discovery re-indexing:</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
|
||||||
|
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>That reminds me that Bizu had asked me to fix some of Alan Duncan’s names in December</li>
|
||||||
|
<li>I see he actually has some variations with “Duncan, Alan J.”: <a href="https://cgspace.cgiar.org/discover?filtertype_1=author&filter_relational_operator_1=contains&filter_1=Duncan%2C+Alan&submit_apply_filter=&query=">https://cgspace.cgiar.org/discover?filtertype_1=author&filter_relational_operator_1=contains&filter_1=Duncan%2C+Alan&submit_apply_filter=&query=</a></li>
|
||||||
|
<li>I will just update those for her too and then restart the indexing:</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
|
||||||
|
text_value | authority | confidence
|
||||||
|
-----------------+--------------------------------------+------------
|
||||||
|
Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 | 600
|
||||||
|
Duncan, Alan J. | 62298c84-4d9d-4b83-a932-4a9dd4046db7 | -1
|
||||||
|
Duncan, Alan J. | | -1
|
||||||
|
Duncan, Alan | a6486522-b08a-4f7a-84f9-3a73ce56034d | 600
|
||||||
|
Duncan, Alan J. | cd0e03bf-92c3-475f-9589-60c5b042ea60 | -1
|
||||||
|
Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d | -1
|
||||||
|
Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 | -1
|
||||||
|
Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d | 600
|
||||||
|
(8 rows)
|
||||||
|
|
||||||
|
dspace=# begin;
|
||||||
|
dspace=# update metadatavalue set text_value='Duncan, Alan', authority='a6486522-b08a-4f7a-84f9-3a73ce56034d', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Duncan, Alan%';
|
||||||
|
UPDATE 216
|
||||||
|
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
|
||||||
|
text_value | authority | confidence
|
||||||
|
--------------+--------------------------------------+------------
|
||||||
|
Duncan, Alan | a6486522-b08a-4f7a-84f9-3a73ce56034d | 600
|
||||||
|
(1 row)
|
||||||
|
dspace=# commit;
|
||||||
|
</code></pre>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li>Run all system updates on DSpace Test (linode02) and reboot it</li>
|
||||||
|
<li>I wrote a Python script (<a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b"><code>resolve-orcids-from-solr.py</code></a>) using SolrClient to parse the Solr authority cache for ORCID IDs</li>
|
||||||
|
<li>We currently have 1562 authority records with ORCID IDs, and 624 unique IDs</li>
|
||||||
|
<li>We can use this to build a controlled vocabulary of ORCID IDs for new item submissions</li>
|
||||||
|
<li>I don’t know how to add ORCID IDs to existing items yet… some more querying of PostgreSQL for authority values perhaps?</li>
|
||||||
|
<li>I added the script to the <a href="https://github.com/ilri/DSpace/wiki/Scripts">ILRI DSpace wiki on GitHub</a></li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@ -32,7 +32,7 @@ Disallow: /cgspace-notes/2015-12/
|
|||||||
Disallow: /cgspace-notes/2015-11/
|
Disallow: /cgspace-notes/2015-11/
|
||||||
Disallow: /cgspace-notes/
|
Disallow: /cgspace-notes/
|
||||||
Disallow: /cgspace-notes/categories/
|
Disallow: /cgspace-notes/categories/
|
||||||
Disallow: /cgspace-notes/categories/notes/
|
|
||||||
Disallow: /cgspace-notes/tags/notes/
|
Disallow: /cgspace-notes/tags/notes/
|
||||||
|
Disallow: /cgspace-notes/categories/notes/
|
||||||
Disallow: /cgspace-notes/post/
|
Disallow: /cgspace-notes/post/
|
||||||
Disallow: /cgspace-notes/tags/
|
Disallow: /cgspace-notes/tags/
|
||||||
|
@ -4,7 +4,7 @@
|
|||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/2018-02/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/2018-02/</loc>
|
||||||
<lastmod>2018-02-08T01:08:36+02:00</lastmod>
|
<lastmod>2018-02-11T10:01:13+02:00</lastmod>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
@ -149,7 +149,7 @@
|
|||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
||||||
<lastmod>2018-02-08T01:08:36+02:00</lastmod>
|
<lastmod>2018-02-11T10:01:13+02:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
@ -158,27 +158,27 @@
|
|||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
|
<url>
|
||||||
|
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
|
||||||
|
<lastmod>2018-02-11T10:01:13+02:00</lastmod>
|
||||||
|
<priority>0</priority>
|
||||||
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
|
||||||
<lastmod>2017-09-28T12:00:49+03:00</lastmod>
|
<lastmod>2017-09-28T12:00:49+03:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
|
|
||||||
<lastmod>2018-02-08T01:08:36+02:00</lastmod>
|
|
||||||
<priority>0</priority>
|
|
||||||
</url>
|
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/post/</loc>
|
||||||
<lastmod>2018-02-08T01:08:36+02:00</lastmod>
|
<lastmod>2018-02-11T10:01:13+02:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
<url>
|
<url>
|
||||||
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
|
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
|
||||||
<lastmod>2018-02-08T01:08:36+02:00</lastmod>
|
<lastmod>2018-02-11T10:01:13+02:00</lastmod>
|
||||||
<priority>0</priority>
|
<priority>0</priority>
|
||||||
</url>
|
</url>
|
||||||
|
|
||||||
|