diff --git a/content/posts/2019-10.md b/content/posts/2019-10.md index f3a9a27ba..2ffceab83 100644 --- a/content/posts/2019-10.md +++ b/content/posts/2019-10.md @@ -136,5 +136,44 @@ $ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuu - More cleanup work on the authors in the Bioversity migration - Now I sent the final feedback to Francesca, Carol, and Abenet +- Peter is still seeing some authors listed with "|" in the "Top Authors" statistics for some collections + - I looked in some of the items that are listed and the author field does not contain those invalid separators + - I decided to try doing a full Discovery re-indexing on CGSpace (linode18): + +``` +$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b + +real 82m35.993s +``` + +- After the re-indexing the top authors still list the following: + +``` +Jagwe, J.|Ouma, E.A.|Brandes-van Dorresteijn, D.|Kawuma, Brian|Smith, J. +``` + +- I looked in the database to find authors that had "|" in them: + +``` +dspace=# SELECT text_value, resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value LIKE '%|%'; + text_value | resource_id +----------------------------------+------------- + Anandajayasekeram, P.|Puskur, R. | 157 + Morales, J.|Renner, I. | 22779 + Zahid, A.|Haque, M.A. 
| 25492 +(3 rows) +``` + +- Then I found their handles and corrected them, for example: + +``` +dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '157' and handle.resource_type_id=2; + handle +----------- + 10568/129 +(1 row) +``` + +- So I'm still not sure where these weird authors in the "Top Author" stats are coming from diff --git a/docs/2019-10/index.html b/docs/2019-10/index.html index 3a10d043a..d637f0563 100644 --- a/docs/2019-10/index.html +++ b/docs/2019-10/index.html @@ -11,7 +11,7 @@ - + @@ -27,9 +27,9 @@ "@type": "BlogPosting", "headline": "October, 2019", "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-10\/", - "wordCount": "1073", + "wordCount": "1250", "datePublished": "2019-10-01T13:20:51+03:00", - "dateModified": "2019-10-12T23:28:50+03:00", + "dateModified": "2019-10-13T11:59:11+03:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -286,6 +286,46 @@ International Maize and Wheat Improvement Centre,International Maize and Wheat I
Peter is still seeing some authors listed with “|” in the “Top Authors” statistics for some collections
+ +I decided to try doing a full Discovery re-indexing on CGSpace (linode18):
+ +$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
+
+real 82m35.993s
+
After the re-indexing the top authors still list the following:
+ +Jagwe, J.|Ouma, E.A.|Brandes-van Dorresteijn, D.|Kawuma, Brian|Smith, J.
+
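That "author" looks like five names joined with a single "|" rather than one person. Below is a minimal Python sketch of splitting such a value back into individual names. It assumes "|" never appears inside a legitimate name, and the `split_authors` helper is hypothetical, not part of DSpace:

```python
def split_authors(value: str, sep: str = "|") -> list[str]:
    """Split a concatenated author string into individual names.

    Assumes `sep` never occurs inside a real author name. Empty fragments
    (from doubled or trailing separators) are discarded, and surrounding
    whitespace is stripped from each name.
    """
    return [name.strip() for name in value.split(sep) if name.strip()]


# The bogus "Top Authors" entry from above
top_author = "Jagwe, J.|Ouma, E.A.|Brandes-van Dorresteijn, D.|Kawuma, Brian|Smith, J."
authors = split_authors(top_author)
print(len(authors))
for name in authors:
    print(name)
```

Dropping empty fragments means a value joined with "||" (the multi-value separator used in DSpace's CSV batch editing) would also split cleanly. Nothing here says that is how these values were created.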
I looked in the database to find authors that had “|” in them:
+ +dspace=# SELECT text_value, resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value LIKE '%|%';
+ text_value | resource_id
+----------------------------------+-------------
+Anandajayasekeram, P.|Puskur, R. | 157
+Morales, J.|Renner, I. | 22779
+Zahid, A.|Haque, M.A. | 25492
+(3 rows)
+
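In PostgreSQL `LIKE`, only `%` and `_` are wildcards, so `'%|%'` is a plain substring match on a literal "|". Below is a rough Python sketch of the same filter plus the obvious correction (one author per value). It uses the three rows above, pasted in by hand rather than fetched from the database. The names `find_bad_authors` and `propose_fixes` are hypothetical, and this is not necessarily how the items were actually fixed:

```python
from typing import Iterable

# Rows as returned by the metadatavalue query above: (text_value, resource_id)
ROWS = [
    ("Anandajayasekeram, P.|Puskur, R.", 157),
    ("Morales, J.|Renner, I.", 22779),
    ("Zahid, A.|Haque, M.A.", 25492),
]


def find_bad_authors(rows: Iterable[tuple[str, int]], sep: str = "|") -> list[tuple[str, int]]:
    """Keep only rows whose value contains sep (like `text_value LIKE '%|%'`)."""
    return [(value, resource_id) for value, resource_id in rows if sep in value]


def propose_fixes(rows: Iterable[tuple[str, int]], sep: str = "|") -> dict[int, list[str]]:
    """Map each item (resource_id) to the separate author names it should have."""
    fixes: dict[int, list[str]] = {}
    for value, resource_id in find_bad_authors(rows, sep):
        names = [name.strip() for name in value.split(sep) if name.strip()]
        fixes.setdefault(resource_id, []).extend(names)
    return fixes


for resource_id, names in propose_fixes(ROWS).items():
    print(resource_id, names)
```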
Then I found their handles and corrected them, for example:
+ +dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '157' and handle.resource_type_id=2;
+handle
+-----------
+10568/129
+(1 row)
+
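Communities, collections, and items all share the `handle` table. The `resource_type_id=2` condition (items, per DSpace's `Constants.ITEM`) is what keeps the lookup from matching a community or collection that happens to have the same numeric ID. Below is a minimal Python sketch of the same lookup over in-memory `(handle, resource_type_id, resource_id)` rows, with the one row taken from the output above. `handle_for_item` and `resolver_url` are hypothetical helpers, and the resolver URL assumes the public Handle.Net proxy at hdl.handle.net:

```python
from typing import Iterable, Optional

ITEM = 2  # DSpace resource type ID for items (Constants.ITEM)

# (handle, resource_type_id, resource_id) rows; this one is from the query above
HANDLE_ROWS = [
    ("10568/129", ITEM, 157),
]


def handle_for_item(item_id: int, rows: Iterable[tuple[str, int, int]]) -> Optional[str]:
    """Return the handle registered for an item, or None if there is none."""
    for handle, resource_type_id, resource_id in rows:
        if resource_type_id == ITEM and resource_id == item_id:
            return handle
    return None


def resolver_url(handle: str) -> str:
    """Build a URL on the global Handle.Net proxy for a handle."""
    return f"https://hdl.handle.net/{handle}"


handle = handle_for_item(157, HANDLE_ROWS)
if handle is not None:
    print(handle, resolver_url(handle))
```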
So I’m still not sure where these weird authors in the “Top Authors” stats are coming from