diff --git a/content/post/2018-02.md b/content/post/2018-02.md
index 28d875df7..472ab9453 100644
--- a/content/post/2018-02.md
+++ b/content/post/2018-02.md
@@ -302,6 +302,25 @@ $ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|ds
 - I cherry-picked all the commits for DS-3551 but it won't build on our current DSpace 5.5!
 - I sent a message to the dspace-tech mailing list asking why DSpace thinks these connections are busy when PostgreSQL says they are idle
 
+## 2018-02-10
+
+- I tried to disable ORCID lookups but keep the existing authorities
+- This item has an ORCID for Ralf Kiese: http://localhost:8080/handle/10568/89897
+- Switch authority.controlled off and change authorLookup to lookup, and the ORCID badge doesn't show up on the item
+- Leave all settings but change choices.presentation to lookup and ORCID badge is there and item submission uses LC Name Authority and it breaks with this error:
+
+```
+Field dc_contributor_author has choice presentation of type "select", it may NOT be authority-controlled.
+```
+
+- If I change choices.presentation to suggest it gives this error:
+
+```
+xmlui.mirage2.forms.instancedCompositeFields.noSuggestionError
+```
+
+- So I don't think we can disable the ORCID lookup function and keep the ORCID badges
+
 ## 2018-02-11
 
 - Magdalena from CCAFS emailed to ask why one of their items has such a weird thumbnail: [10568/90735](https://cgspace.cgiar.org/handle/10568/90735)
@@ -315,3 +334,64 @@ $ convert CCAFS_WP_223.pdf\[0\] -profile /usr/local/share/ghostscript/9.22/iccpr
 ```
 
 ![Manual thumbnail](/cgspace-notes/2018/02/CCAFS_WP_223.jpg)
+
+- Peter sent me corrected author names last week but the file encoding is messed up:
+
+```
+$ isutf8 authors-2018-02-05.csv
+authors-2018-02-05.csv: line 100, char 18, byte 4179: After a first byte between E1 and EC, expecting the 2nd byte between 80 and BF.
+```
+
+- The `isutf8` program comes from `moreutils`
+- Line 100 contains: Galiè, Alessandra
+- In other news, psycopg2 is splitting their package in pip, so to install the binary wheel distribution you need to use `pip install psycopg2-binary`
+- See: http://initd.org/psycopg/articles/2018/02/08/psycopg-274-released/
+- I updated my `fix-metadata-values.py` and `delete-metadata-values.py` scripts on the scripts page: https://github.com/ilri/DSpace/wiki/Scripts
+- I ran the 342 author corrections (after trimming whitespace and excluding those with `||` and other syntax errors) on CGSpace:
+
+```
+$ ./fix-metadata-values.py -i Correct-342-Authors-2018-02-11.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
+```
+
+- Then I ran a full Discovery re-indexing:
+
+```
+$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
+$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
+```
+
+- That reminds me that Bizu had asked me to fix some of Alan Duncan's names in December
+- I see he actually has some variations with "Duncan, Alan J.": https://cgspace.cgiar.org/discover?filtertype_1=author&filter_relational_operator_1=contains&filter_1=Duncan%2C+Alan&submit_apply_filter=&query=
+- I will just update those for her too and then restart the indexing:
+
+```
+dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
+   text_value    |              authority               | confidence
+-----------------+--------------------------------------+------------
+ Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |        600
+ Duncan, Alan J. | 62298c84-4d9d-4b83-a932-4a9dd4046db7 |         -1
+ Duncan, Alan J. |                                      |         -1
+ Duncan, Alan    | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
+ Duncan, Alan J. | cd0e03bf-92c3-475f-9589-60c5b042ea60 |         -1
+ Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |         -1
+ Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |         -1
+ Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
+(8 rows)
+
+dspace=# begin;
+dspace=# update metadatavalue set text_value='Duncan, Alan', authority='a6486522-b08a-4f7a-84f9-3a73ce56034d', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Duncan, Alan%';
+UPDATE 216
+dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
+  text_value  |              authority               | confidence
+--------------+--------------------------------------+------------
+ Duncan, Alan | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
+(1 row)
+dspace=# commit;
+```
+
+- Run all system updates on DSpace Test (linode02) and reboot it
+- I wrote a Python script ([`resolve-orcids-from-solr.py`](https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b)) using SolrClient to parse the Solr authority cache for ORCID IDs
+- We currently have 1562 authority records with ORCID IDs, and 624 unique IDs
+- We can use this to build a controlled vocabulary of ORCID IDs for new item submissions
+- I don't know how to add ORCID IDs to existing items yet... some more querying of PostgreSQL for authority values perhaps?
+- I added the script to the [ILRI DSpace wiki on GitHub](https://github.com/ilri/DSpace/wiki/Scripts)
diff --git a/public/2018-02/index.html b/public/2018-02/index.html
index 7580b8818..b1f9779ff 100644
--- a/public/2018-02/index.html
+++ b/public/2018-02/index.html
@@ -23,7 +23,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-pl
-
+
@@ -57,9 +57,9 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-pl
   "@type": "BlogPosting",
   "headline": "February, 2018",
   "url": "https://alanorth.github.io/cgspace-notes/2018-02/",
-  "wordCount": "2147",
+  "wordCount": "2666",
   "datePublished": "2018-02-01T16:28:54+02:00",
-  "dateModified": "2018-02-08T01:08:36+02:00",
+  "dateModified": "2018-02-11T10:01:13+02:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -455,6 +455,30 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
Field dc_contributor_author has choice presentation of type "select", it may NOT be authority-controlled.
+
+
+xmlui.mirage2.forms.instancedCompositeFields.noSuggestionError
+
+
+$ isutf8 authors-2018-02-05.csv
+authors-2018-02-05.csv: line 100, char 18, byte 4179: After a first byte between E1 and EC, expecting the 2nd byte between 80 and BF.
+
+
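+The isutf8 program comes from moreutils
+In other news, psycopg2 is splitting their package in pip, so to install the binary wheel distribution you need to use pip install psycopg2-binary
+I updated my fix-metadata-values.py and delete-metadata-values.py scripts on the scripts page: https://github.com/ilri/DSpace/wiki/Scripts
+I ran the 342 author corrections (after trimming whitespace and excluding those with || and other syntax errors) on CGSpace:
+$ ./fix-metadata-values.py -i Correct-342-Authors-2018-02-11.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'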
+
+
+$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
+$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
+
+
+dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
+   text_value    |              authority               | confidence
+-----------------+--------------------------------------+------------
+ Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |        600
+ Duncan, Alan J. | 62298c84-4d9d-4b83-a932-4a9dd4046db7 |         -1
+ Duncan, Alan J. |                                      |         -1
+ Duncan, Alan    | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
+ Duncan, Alan J. | cd0e03bf-92c3-475f-9589-60c5b042ea60 |         -1
+ Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |         -1
+ Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |         -1
+ Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
+(8 rows)
+
+dspace=# begin;
+dspace=# update metadatavalue set text_value='Duncan, Alan', authority='a6486522-b08a-4f7a-84f9-3a73ce56034d', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Duncan, Alan%';
+UPDATE 216
+dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
+  text_value  |              authority               | confidence
+--------------+--------------------------------------+------------
+ Duncan, Alan | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
+(1 row)
+dspace=# commit;
+
+
+I wrote a Python script (resolve-orcids-from-solr.py) using SolrClient to parse the Solr authority cache for ORCID IDs