Update notes for 2019-10-13

2019-10-13 21:17:22 +03:00
parent be1e2a283c
commit 0171ace573
3 changed files with 87 additions and 8 deletions

@@ -136,5 +136,44 @@ $ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuu
- More cleanup work on the authors in the Bioversity migration
- I have now sent the final feedback to Francesca, Carol, and Abenet
- Peter is still seeing some authors listed with "|" in the "Top Authors" statistics for some collections
- I looked at some of the listed items, and their author fields do not contain those invalid separators
- I decided to try doing a full Discovery re-indexing on CGSpace (linode18):
```
$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 82m35.993s
```
- After the re-indexing, the "Top Authors" statistics still list the following:
```
Jagwe, J.|Ouma, E.A.|Brandes-van Dorresteijn, D.|Kawuma, Brian|Smith, J.
```
- I looked in the database to find authors that had "|" in them:
```
dspace=# SELECT text_value, resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value LIKE '%|%';
            text_value            | resource_id
----------------------------------+-------------
 Anandajayasekeram, P.|Puskur, R. |         157
 Morales, J.|Renner, I.           |       22779
 Zahid, A.|Haque, M.A.            |       25492
(3 rows)
```
- Then I found their handles and corrected them, for example:
```
dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '157' and handle.resource_type_id=2;
  handle
-----------
 10568/129
(1 row)
```
- So I'm still not sure where these weird authors in the "Top Authors" statistics are coming from
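- The two lookups above (find author values containing "|", then find each item's handle) could probably be combined into one join. This is a minimal sketch, not something I ran on CGSpace: it uses an in-memory SQLite database with toy versions of the `metadatavalue` and `handle` tables that contain only the columns used in the queries above, and the second sample row is made up for illustration. The real DSpace schema on PostgreSQL has more columns.

```python
import sqlite3

# Toy stand-ins for the DSpace tables named in the notes (assumption: only
# the columns used in the queries above; SQLite instead of PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadatavalue (text_value TEXT, resource_id INTEGER,
                            resource_type_id INTEGER, metadata_field_id INTEGER);
CREATE TABLE handle (handle TEXT, resource_id INTEGER, resource_type_id INTEGER);
""")

# First row mirrors item 157 from the notes; the second row is made-up sample data.
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?, ?, 2, 3)",
    [("Anandajayasekeram, P.|Puskur, R.", 157), ("Example, A.", 999999)],
)
conn.executemany(
    "INSERT INTO handle VALUES (?, ?, 2)",
    [("10568/129", 157), ("example/1", 999999)],
)

# Join author values containing "|" to the handles of the items they belong to.
QUERY = """
SELECT h.handle, m.text_value
FROM metadatavalue m
JOIN handle h
  ON h.resource_id = m.resource_id AND h.resource_type_id = 2
WHERE m.resource_type_id = 2
  AND m.metadata_field_id = 3
  AND m.text_value LIKE '%|%'
"""
rows = conn.execute(QUERY).fetchall()

for item_handle, text_value in rows:
    # Split the combined value into the individual author names.
    print(item_handle, text_value.split("|"))
```

- The `SELECT` statement itself uses only the table and column names from the queries above, so it should translate to `psql` directly, but I have not verified that against the live database.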
<!-- vim: set sw=2 ts=2: -->