From 3fa0fd445829f95e57b35641aff59ef028135a77 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Wed, 3 Apr 2019 17:40:05 +0300
Subject: [PATCH] Update notes for 2019-04-03

---
 content/posts/2019-04.md | 27 ++++++++++++++++++++++++++-
 docs/2019-04/index.html  | 38 ++++++++++++++++++++++++++++++++++----
 docs/sitemap.xml         | 10 +++++-----
 3 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/content/posts/2019-04.md b/content/posts/2019-04.md
index bf835dbd4..a69b6bcc5 100644
--- a/content/posts/2019-04.md
+++ b/content/posts/2019-04.md
@@ -42,7 +42,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
 - First I need to extract the ones that are unique from their list compared to our existing one:
 ```
-$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2019-04-03-orcid-ids.txt
+$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2019-04-03-orcid-ids.txt
 ```
 - We currently have 1177 unique ORCID identifiers, and this brings our total to 1237!
@@ -52,4 +52,29 @@ $ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.tx
 $ ./resolve-orcids.py -i /tmp/2019-04-03-orcid-ids.txt -o 2019-04-03-orcid-ids.txt -d
 ```
+- After that I added the XML formatting, formatted the file with tidy, and sorted the names in vim
+- One user's name has changed so I will update those using my `fix-metadata-values.py` script:
+
+```
+$ ./fix-metadata-values.py -i 2019-04-03-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
+```
+
+- I created a pull request and merged the changes to the 5_x-prod branch ([#417](https://github.com/ilri/DSpace/pull/417))
+- A few days ago I noticed some weird update process for the statistics-2018 Solr core and I see it's still going:
+
+```
+2019-04-03 16:34:02,262 INFO  org.dspace.statistics.SolrLogger @ Updating : 1754500/21701 docs in http://localhost:8081/solr//statistics-2018
+```
+
+- Interestingly, there are 5666 occurrences, and they are mostly for the 2018 core:
+
+```
+$ grep 'org.dspace.statistics.SolrLogger @ Updating' /home/cgspace.cgiar.org/log/dspace.log.2019-04-03 | awk '{print $11}' | sort | uniq -c
+      1 
+      3 http://localhost:8081/solr//statistics-2017
+   5662 http://localhost:8081/solr//statistics-2018
+```
+
+- I will have to keep an eye on it because nothing should be updating 2018 stats in 2019...
+
diff --git a/docs/2019-04/index.html b/docs/2019-04/index.html
index 8465547c6..0cbfd2edd 100644
--- a/docs/2019-04/index.html
+++ b/docs/2019-04/index.html
@@ -38,7 +38,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
-
+
@@ -81,9 +81,9 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
 "@type": "BlogPosting",
 "headline": "April, 2019",
 "url": "https://alanorth.github.io/cgspace-notes/2019-04/",
-"wordCount": "347",
+"wordCount": "492",
 "datePublished": "2019-04-01T09:00:43+03:00",
-"dateModified": "2019-04-02T20:32:18+03:00",
+"dateModified": "2019-04-03T17:01:31+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -203,7 +203,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
-
$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2019-04-03-orcid-ids.txt
+
$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2019-04-03-orcid-ids.txt
 
    @@ -214,6 +214,36 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
    $ ./resolve-orcids.py -i /tmp/2019-04-03-orcid-ids.txt -o 2019-04-03-orcid-ids.txt -d
     
    +
      +
    • After that I added the XML formatting, formatted the file with tidy, and sorted the names in vim
    • +
    • One user’s name has changed so I will update those using my fix-metadata-values.py script:
    • +
    + +
    $ ./fix-metadata-values.py -i 2019-04-03-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
    +
    + +
      +
    • I created a pull request and merged the changes to the 5_x-prod branch (#417)
    • +
    • A few days ago I noticed some weird update process for the statistics-2018 Solr core and I see it’s still going:
    • +
    + +
    2019-04-03 16:34:02,262 INFO  org.dspace.statistics.SolrLogger @ Updating : 1754500/21701 docs in http://localhost:8081/solr//statistics-2018
    +
    + +
      +
    • Interestingly, there are 5666 occurrences, and they are mostly for the 2018 core:
    • +
    + +
    $ grep 'org.dspace.statistics.SolrLogger @ Updating' /home/cgspace.cgiar.org/log/dspace.log.2019-04-03 | awk '{print $11}' | sort | uniq -c
    +      1 
    +      3 http://localhost:8081/solr//statistics-2017
    +   5662 http://localhost:8081/solr//statistics-2018
    +
    + +
      +
    • I will have to keep an eye on it because nothing should be updating 2018 stats in 2019…
    • +
    +
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 18cb60d6c..03dc066df 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
 https://alanorth.github.io/cgspace-notes/2019-04/
-2019-04-02T20:32:18+03:00
+2019-04-03T17:01:31+03:00
@@ -219,7 +219,7 @@
 https://alanorth.github.io/cgspace-notes/
-2019-04-02T20:32:18+03:00
+2019-04-03T17:01:31+03:00
 0
@@ -230,7 +230,7 @@
 https://alanorth.github.io/cgspace-notes/tags/notes/
-2019-04-02T20:32:18+03:00
+2019-04-03T17:01:31+03:00
 0
@@ -242,13 +242,13 @@
 https://alanorth.github.io/cgspace-notes/posts/
-2019-04-02T20:32:18+03:00
+2019-04-03T17:01:31+03:00
 0
 https://alanorth.github.io/cgspace-notes/tags/
-2019-04-02T20:32:18+03:00
+2019-04-03T17:01:31+03:00
 0
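
Reviewer note on the `grep ... | awk '{print $11}' | sort | uniq -c` tally added in this patch: the same count can be sketched in Python, which makes it easier to see where the single empty entry comes from. This is only a minimal sketch, not the author's tooling. The function name `count_update_targets` and all sample lines except the first are made up for illustration, and the marker is matched as a plain substring rather than as a regular expression.

```python
from collections import Counter

# Same text the grep in the notes searches for (matched here as a plain substring)
MARKER = "org.dspace.statistics.SolrLogger @ Updating"


def count_update_targets(lines):
    """Tally the 11th whitespace-separated field of each matching log line."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        fields = line.split()
        # awk prints an empty string for $11 when a line has fewer than 11 fields
        target = fields[10] if len(fields) > 10 else ""
        counts[target] += 1
    return counts


# The first line is copied from the notes; the other two are hypothetical samples
sample = [
    "2019-04-03 16:34:02,262 INFO  org.dspace.statistics.SolrLogger @ Updating : 1754500/21701 docs in http://localhost:8081/solr//statistics-2018",
    "2019-04-03 16:35:10,001 INFO  org.dspace.statistics.SolrLogger @ Updating : 100/200 docs in http://localhost:8081/solr//statistics-2017",
    "2019-04-03 16:36:00,000 INFO  org.dspace.app.Other @ unrelated message",
]

for target, n in sorted(count_update_targets(sample).items()):
    print(f"{n:7d} {target}")
```

In the logged line the Solr URL is the 11th field, so the one blank row in the `uniq -c` output means one matching line had fewer than 11 fields. That line is likely worth reading on its own before drawing conclusions about the 2018 core.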