Add notes for 2018-02-11

2018-02-11 18:21:39 +02:00
parent d312304729
commit 3441bd7128
4 changed files with 185 additions and 14 deletions


@@ -302,6 +302,25 @@ $ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|ds
- I cherry-picked all the commits for DS-3551 but it won't build on our current DSpace 5.5!
- I sent a message to the dspace-tech mailing list asking why DSpace thinks these connections are busy when PostgreSQL says they are idle
## 2018-02-10
- I tried to disable ORCID lookups but keep the existing authorities
- This item has an ORCID for Ralf Kiese: http://localhost:8080/handle/10568/89897
- Switching authority.controlled off and changing authorLookup to lookup makes the ORCID badge disappear from the item
- Leaving all other settings alone and changing only choices.presentation to lookup keeps the ORCID badge, but item submission then uses LC Name Authority and breaks with this error:
```
Field dc_contributor_author has choice presentation of type "select", it may NOT be authority-controlled.
```
- If I change choices.presentation to suggest it gives this error:
```
xmlui.mirage2.forms.instancedCompositeFields.noSuggestionError
```
- So I don't think we can disable the ORCID lookup function and keep the ORCID badges
## 2018-02-11
- Magdalena from CCAFS emailed to ask why one of their items has such a weird thumbnail: [10568/90735](https://cgspace.cgiar.org/handle/10568/90735)
@@ -315,3 +334,64 @@ $ convert CCAFS_WP_223.pdf\[0\] -profile /usr/local/share/ghostscript/9.22/iccpr
```
![Manual thumbnail](/cgspace-notes/2018/02/CCAFS_WP_223.jpg)
- Peter sent me corrected author names last week but the file encoding is messed up:
```
$ isutf8 authors-2018-02-05.csv
authors-2018-02-05.csv: line 100, char 18, byte 4179: After a first byte between E1 and EC, expecting the 2nd byte between 80 and BF.
```
- The `isutf8` program comes from `moreutils`
- Line 100 contains: Galiè, Alessandra
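- The reported byte pattern (a lead byte in the E1 to EC range followed by a non-continuation byte) is consistent with a Latin-1 encoded `è` (0xE8), though that is only a guess. A minimal Python sketch of a similar check, assuming the file is small enough to read into memory; the function name `find_invalid_utf8` is hypothetical and the offsets are 0-based, so they may not line up exactly with what `isutf8` prints:

```python
def find_invalid_utf8(data: bytes):
    """Return a list of (line_number, byte_offset) for lines that are not valid UTF-8.

    line_number is 1-based; byte_offset is the 0-based position in the whole
    input of the first byte that could not be decoded on that line.
    """
    problems = []
    offset = 0  # byte offset of the start of the current line
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the bad byte within this line
            problems.append((lineno, offset + err.start))
        offset += len(line) + 1  # +1 for the newline removed by split()
    return problems
```

- It could be called as `find_invalid_utf8(open("authors-2018-02-05.csv", "rb").read())`; an empty list means the file decoded cleanly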
- In other news, psycopg2 is splitting its package on PyPI, so to install the binary wheel distribution you need to use `pip install psycopg2-binary`
- See: http://initd.org/psycopg/articles/2018/02/08/psycopg-274-released/
- I updated my `fix-metadata-values.py` and `delete-metadata-values.py` scripts on the scripts page: https://github.com/ilri/DSpace/wiki/Scripts
- I ran the 342 author corrections (after trimming whitespace and excluding those with `||` and other syntax errors) on CGSpace:
```
$ ./fix-metadata-values.py -i Correct-342-Authors-2018-02-11.csv -f dc.contributor.author -t correct -m 3 -d dspace -u dspace -p 'fuuu'
```
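- The cleanup before that run (trimming whitespace and dropping rows containing `||`) could be sketched in Python as below. This is an illustration rather than the actual procedure used: the column names are assumed from the `-f` and `-t` arguments above, the helper name `clean_corrections` is hypothetical, skipping rows with an empty value is an added assumption, and the "other syntax errors" are not handled because the notes do not say what they were:

```python
import csv
import io


def clean_corrections(csv_text, from_field="dc.contributor.author", to_field="correct"):
    """Return cleaned correction rows from CSV text.

    Values are stripped of surrounding whitespace. Rows are dropped when either
    value is empty (an assumption) or contains the '||' multi-value separator.
    """
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        old = (row.get(from_field) or "").strip()
        new = (row.get(to_field) or "").strip()
        if not old or not new:
            continue  # nothing to correct
        if "||" in old or "||" in new:
            continue  # multi-value cell, needs manual review
        cleaned.append({from_field: old, to_field: new})
    return cleaned
```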
- Then I ran a full Discovery re-indexing:
```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
```
- That reminds me that Bizu had asked me to fix some of Alan Duncan's names in December
- I see he actually has some variations with "Duncan, Alan J.": https://cgspace.cgiar.org/discover?filtertype_1=author&filter_relational_operator_1=contains&filter_1=Duncan%2C+Alan&submit_apply_filter=&query=
- I will just update those for her too and then restart the indexing:
```
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
   text_value    |              authority               | confidence
-----------------+--------------------------------------+------------
 Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |        600
 Duncan, Alan J. | 62298c84-4d9d-4b83-a932-4a9dd4046db7 |         -1
 Duncan, Alan J. |                                      |         -1
 Duncan, Alan    | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
 Duncan, Alan J. | cd0e03bf-92c3-475f-9589-60c5b042ea60 |         -1
 Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |         -1
 Duncan, Alan J. | 5ff35043-942e-4d0a-b377-4daed6e3c1a3 |         -1
 Duncan, Alan J. | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
(8 rows)
dspace=# begin;
dspace=# update metadatavalue set text_value='Duncan, Alan', authority='a6486522-b08a-4f7a-84f9-3a73ce56034d', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Duncan, Alan%';
UPDATE 216
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Duncan, Alan%';
  text_value  |              authority               | confidence
--------------+--------------------------------------+------------
 Duncan, Alan | a6486522-b08a-4f7a-84f9-3a73ce56034d |        600
(1 row)
dspace=# commit;
```
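- The same update-verify-commit pattern can be sketched in Python. To keep the example self-contained it uses the standard library's `sqlite3` instead of PostgreSQL, and a simplified `metadatavalue` table without the `resource_type_id` and `metadata_field_id` columns; the function name `normalize_author` is hypothetical:

```python
import sqlite3


def normalize_author(conn, pattern, new_value, new_authority, confidence=600):
    """Rewrite all rows matching `pattern` to a single name/authority pair.

    The change is committed only if exactly one distinct variant remains
    afterwards; otherwise it is rolled back. Returns the number of rows updated.
    """
    cur = conn.cursor()
    cur.execute(
        "UPDATE metadatavalue SET text_value = ?, authority = ?, confidence = ? "
        "WHERE text_value LIKE ?",
        (new_value, new_authority, confidence, pattern),
    )
    updated = cur.rowcount
    # Re-run the same kind of check as the manual psql session
    cur.execute(
        "SELECT DISTINCT text_value, authority, confidence "
        "FROM metadatavalue WHERE text_value LIKE ?",
        (pattern,),
    )
    remaining = cur.fetchall()
    if len(remaining) != 1:
        conn.rollback()
        raise RuntimeError(f"expected 1 distinct variant, found {len(remaining)}")
    conn.commit()
    return updated
```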
- Run all system updates on DSpace Test (linode02) and reboot it
- I wrote a Python script ([`resolve-orcids-from-solr.py`](https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b)) using SolrClient to parse the Solr authority cache for ORCID IDs
- We currently have 1562 authority records with ORCID IDs, and 624 unique IDs
- We can use this to build a controlled vocabulary of ORCID IDs for new item submissions
- I don't know how to add ORCID IDs to existing items yet... some more querying of PostgreSQL for authority values perhaps?
- I added the script to the [ILRI DSpace wiki on GitHub](https://github.com/ilri/DSpace/wiki/Scripts)
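- Reducing the authority records to a list of unique IDs for a controlled vocabulary could look roughly like the sketch below. It is not the gist's code: it works on plain dictionaries rather than SolrClient results, the field name `orcid_id` is an assumption about the authority core's schema, and `unique_orcids` is a hypothetical name:

```python
import re

# ORCID iDs are four groups of four characters; the last one may be a digit or X
ORCID_RE = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def unique_orcids(records, field="orcid_id"):
    """Return the sorted, de-duplicated ORCID iDs found in `records`.

    `records` is an iterable of dicts (for example, documents returned by a
    Solr query). Records without the field or with a malformed value are skipped.
    """
    ids = set()
    for record in records:
        value = (record.get(field) or "").strip()
        if ORCID_RE.fullmatch(value):
            ids.add(value)
    return sorted(ids)
```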