diff --git a/content/posts/2019-10.md b/content/posts/2019-10.md index f3a9a27ba..2ffceab83 100644 --- a/content/posts/2019-10.md +++ b/content/posts/2019-10.md @@ -136,5 +136,44 @@ $ ./fix-metadata-values.py -i /tmp/affiliations.csv -db dspace -u dspace -p 'fuu - More cleanup work on the authors in the Bioversity migration - Now I sent the final feedback to Francesca, Carol, and Abenet +- Peter is still seeing some authors listed with "|" in the "Top Authors" statistics for some collections + - I looked in some of the items that are listed and the author field does not contain those invalid separators + - I decided to try doing a full Discovery re-indexing on CGSpace (linode18): + +``` +$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b + +real 82m35.993s +``` + +- After the re-indexing the top authors still list the following: + +``` +Jagwe, J.|Ouma, E.A.|Brandes-van Dorresteijn, D.|Kawuma, Brian|Smith, J. +``` + +- I looked in the database to find authors that had "|" in them: + +``` +dspace=# SELECT text_value, resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value LIKE '%|%'; + text_value | resource_id +----------------------------------+------------- + Anandajayasekeram, P.|Puskur, R. | 157 + Morales, J.|Renner, I. | 22779 + Zahid, A.|Haque, M.A. 
| 25492 +(3 rows) +``` + +- Then I found their handles and corrected them, for example: + +``` +dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '157' and handle.resource_type_id=2; + handle +----------- + 10568/129 +(1 row) +``` + +- So I'm still not sure where these weird authors in the "Top Authors" stats are coming from diff --git a/docs/2019-10/index.html b/docs/2019-10/index.html index 3a10d043a..d637f0563 100644 --- a/docs/2019-10/index.html +++ b/docs/2019-10/index.html @@ -11,7 +11,7 @@ - + @@ -27,9 +27,9 @@ "@type": "BlogPosting", "headline": "October, 2019", "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-10\/", - "wordCount": "1073", + "wordCount": "1250", "datePublished": "2019-10-01T13:20:51+03:00", - "dateModified": "2019-10-12T23:28:50+03:00", + "dateModified": "2019-10-13T11:59:11+03:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -286,6 +286,46 @@ International Maize and Wheat Improvement Centre,International Maize and Wheat I + +
  • Peter is still seeing some authors listed with “|” in the “Top Authors” statistics for some collections

    + +
  • + +
  • After the re-indexing the top authors still list the following:

    + +
    Jagwe, J.|Ouma, E.A.|Brandes-van Dorresteijn, D.|Kawuma, Brian|Smith, J.
    +
  • + +
  • I looked in the database to find authors that had “|” in them:

    + +
    dspace=# SELECT text_value, resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value LIKE '%|%';
    +        text_value            | resource_id 
    +----------------------------------+-------------
    +Anandajayasekeram, P.|Puskur, R. |         157
    +Morales, J.|Renner, I.           |       22779
    +Zahid, A.|Haque, M.A.            |       25492
    +(3 rows)
    +
  • + +
  • Then I found their handles and corrected them, for example:

    + +
    dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '157' and handle.resource_type_id=2;
    +handle   
    +-----------
    +10568/129
    +(1 row)
    +
  • + +
  • So I’m still not sure where these weird authors in the “Top Authors” stats are coming from

  • diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 2683beaca..fe20fe416 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -4,27 +4,27 @@ https://alanorth.github.io/cgspace-notes/ - 2019-10-12T23:28:50+03:00 + 2019-10-13T11:59:11+03:00 https://alanorth.github.io/cgspace-notes/tags/notes/ - 2019-10-12T23:28:50+03:00 + 2019-10-13T11:59:11+03:00 https://alanorth.github.io/cgspace-notes/2019-10/ - 2019-10-12T23:28:50+03:00 + 2019-10-13T11:59:11+03:00 https://alanorth.github.io/cgspace-notes/posts/ - 2019-10-12T23:28:50+03:00 + 2019-10-13T11:59:11+03:00 https://alanorth.github.io/cgspace-notes/tags/ - 2019-10-12T23:28:50+03:00 + 2019-10-13T11:59:11+03:00