diff --git a/content/posts/2019-04.md b/content/posts/2019-04.md
index bf835dbd4..a69b6bcc5 100644
--- a/content/posts/2019-04.md
+++ b/content/posts/2019-04.md
@@ -42,7 +42,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
 - First I need to extract the ones that are unique from their list compared to our existing one:
 
 ```
-$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2019-04-03-orcid-ids.txt
+$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2019-04-03-orcid-ids.txt
 ```
 
 - We currently have 1177 unique ORCID identifiers, and this brings our total to 1237!
@@ -52,4 +52,29 @@ $ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.tx
 $ ./resolve-orcids.py -i /tmp/2019-04-03-orcid-ids.txt -o 2019-04-03-orcid-ids.txt -d
 ```
 
+- After that I added the XML formatting, formatted the file with tidy, and sorted the names in vim
+- One user's name has changed so I will update those using my `fix-metadata-values.py` script:
+
+```
+$ ./fix-metadata-values.py -i 2019-04-03-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
+```
+
+- I created a pull request and merged the changes to the 5_x-prod branch ([#417](https://github.com/ilri/DSpace/pull/417))
+- A few days ago I noticed some weird update process for the statistics-2018 Solr core and I see it's still going:
+
+```
+2019-04-03 16:34:02,262 INFO org.dspace.statistics.SolrLogger @ Updating : 1754500/21701 docs in http://localhost:8081/solr//statistics-2018
+```
+
+- Interestingly, there are 5666 occurrences, and they are mostly for the 2018 core:
+
+```
+$ grep 'org.dspace.statistics.SolrLogger @ Updating' /home/cgspace.cgiar.org/log/dspace.log.2019-04-03 | awk '{print $11}' | sort | uniq -c
+      1 
+      3 http://localhost:8081/solr//statistics-2017
+   5662 http://localhost:8081/solr//statistics-2018
+```
+
+- I will have to keep an eye on it because nothing should be updating 2018 stats in 2019...
+
diff --git a/docs/2019-04/index.html b/docs/2019-04/index.html
index 8465547c6..0cbfd2edd 100644
--- a/docs/2019-04/index.html
+++ b/docs/2019-04/index.html
@@ -38,7 +38,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
-
+
@@ -81,9 +81,9 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
 "@type": "BlogPosting",
 "headline": "April, 2019",
 "url": "https://alanorth.github.io/cgspace-notes/2019-04/",
- "wordCount": "347",
+ "wordCount": "492",
 "datePublished": "2019-04-01T09:00:43+03:00",
- "dateModified": "2019-04-02T20:32:18+03:00",
+ "dateModified": "2019-04-03T17:01:31+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -203,7 +203,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
-$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2019-04-03-orcid-ids.txt
+$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2019-04-03-orcid-ids.txt