<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
< meta property = "og:title" content = "April, 2019" / >
< meta property = "og:description" content = "2019-04-01
Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
They asked if we had plans to enable RDF support in CGSpace
There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
In the last two weeks there have been 47,000 downloads of this same exact PDF by these three IP addresses
Apply country and region corrections and deletions on DSpace Test and CGSpace:
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
" />
< meta property = "og:type" content = "article" / >
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/2019-04/" / >
< meta property = "article:published_time" content = "2019-04-01T09:00:43+03:00" / >
< meta property = "article:modified_time" content = "2019-04-03T17:01:31+03:00" / >
< meta name = "twitter:card" content = "summary" / >
< meta name = "twitter:title" content = "April, 2019" / >
< meta name = "twitter:description" content = "2019-04-01
Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
They asked if we had plans to enable RDF support in CGSpace
There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
In the last two weeks there have been 47,000 downloads of this same exact PDF by these three IP addresses
Apply country and region corrections and deletions on DSpace Test and CGSpace:
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
"/>
< meta name = "generator" content = "Hugo 0.54.0" / >
< script type = "application/ld+json" >
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "April, 2019",
"url": "https://alanorth.github.io/cgspace-notes/2019-04/",
"wordCount": "492",
"datePublished": "2019-04-01T09:00:43+03:00",
"dateModified": "2019-04-03T17:01:31+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
< / script >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/2019-04/" >
< title > April, 2019 | CGSpace Notes< / title >
<!-- combined, minified CSS -->
< link href = "https://alanorth.github.io/cgspace-notes/css/style.css" rel = "stylesheet" integrity = "sha384-G5B34w7DFTumWTswxYzTX7NWfbvQEg1HbFFEg6ItN03uTAAoS2qkPS/fu3LhuuSA" crossorigin = "anonymous" >
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link " href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
< header class = "blog-header" >
< div class = "container" >
< h1 class = "blog-title" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< p class = "lead blog-description" > Documenting day-to-day work on the < a href = "https://cgspace.cgiar.org" > CGSpace< / a > repository.< / p >
< / div >
< / header >
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" > < a href = "https://alanorth.github.io/cgspace-notes/2019-04/" > April, 2019< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2019-04-01T09:00:43+03:00" > Mon Apr 01, 2019< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2019-04-01" > 2019-04-01< / h2 >
< ul >
< li > Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
< ul >
< li > They asked if we had plans to enable RDF support in CGSpace< / li >
< / ul > < / li >
< li > There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
< ul >
< li > I suspected that some might not be successful, because the stats show fewer, but today they were all HTTP 200!< / li >
< / ul > < / li >
< / ul >
< pre > < code > # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
< / code > < / pre >
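< p > As a rough cross-check of the shell pipeline above, the same tally can be sketched in Python. This assumes nginx's default "combined" log format, where the client IP is the first whitespace-separated field and the status code is the ninth (the field that < code > awk '{print $9}'< / code > selects). The function name and the sample log lines are made up for illustration. Unlike the < code > grep -E< / code > pattern, this sketch compares the client IP field exactly rather than matching it anywhere in the line.< / p >

```python
from collections import Counter

# The three Amazon IP addresses from the notes above.
IPS = {"18.196.196.108", "18.195.78.144", "18.195.218.6"}


def count_statuses(lines, needle, ips):
    """Count HTTP status codes for log lines that mention `needle`
    and whose client IP (first field) is in `ips`."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        fields = line.split()
        # Assumption: "combined" format, so field 1 is the client IP
        # and field 9 is the status code.
        if len(fields) >= 9 and fields[0] in ips:
            counts[fields[8]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the real access log.
    sample = [
        '18.196.196.108 - - [01/Apr/2019:08:00:00 +0200] '
        '"GET /Spore-192-EN-web.pdf HTTP/1.1" 200 1234 "-" "-"',
        '10.0.0.1 - - [01/Apr/2019:08:00:01 +0200] '
        '"GET /Spore-192-EN-web.pdf HTTP/1.1" 200 1234 "-" "-"',
    ]
    for status, n in count_statuses(sample, "Spore-192-EN-web.pdf", IPS).items():
        print(n, status)
```

< p > To run it against the real logs, the lines could be read from the two access log files and passed to < code > count_statuses< / code > in one call.< / p >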
< ul >
< li > In the last two weeks there have been 47,000 downloads of this < em > same exact PDF< / em > by these three IP addresses< / li >
< li > Apply country and region corrections and deletions on DSpace Test and CGSpace:< / li >
< / ul >
< pre > < code > $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
< / code > < / pre >
< h2 id = "2019-04-02" > 2019-04-02< / h2 >
< ul >
< li > CTA says the Amazon IPs are AWS gateways for real user traffic< / li >
< li > I was trying to add Felix Shaw’s account back to the Administrators group on DSpace Test, but I couldn’t find his name in the user search of the groups page
< ul >
< li > If I searched for “Felix” or “Shaw” I saw other matches, including one for his personal email address!< / li >
< li > I ended up finding him by searching for his email address< / li >
< / ul > < / li >
< / ul >
< h2 id = "2019-04-03" > 2019-04-03< / h2 >
< ul >
< li > Maria from Bioversity emailed me a list of new ORCID identifiers for their researchers so I will add them to our controlled vocabulary
< ul >
< li > First I need to combine their list with our existing one and extract the unique identifiers:< / li >
< / ul > < / li >
< / ul >
< pre > < code > $ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort -u > /tmp/2019-04-03-orcid-ids.txt
< / code > < / pre >
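< p > The extraction step above can be sketched in Python with the same regular expression that the < code > grep -oE< / code > command uses. This is only an illustration: the function name is hypothetical, and the identifiers in the example are placeholders rather than values from the Bioversity list.< / p >

```python
import re

# Same pattern as the grep -oE command: four dash-separated groups of
# four uppercase letters or digits.
ORCID_PATTERN = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def unique_orcids(*texts):
    """Return the sorted, de-duplicated identifiers found in all texts,
    mirroring `cat ... | grep -oE ... | sort -u`."""
    found = set()
    for text in texts:
        found.update(ORCID_PATTERN.findall(text))
    return sorted(found)


if __name__ == "__main__":
    # Placeholder inputs standing in for the vocabulary XML and the new list.
    existing = "Some Author: 0000-0002-1825-0097"
    incoming = "0000-0001-5109-3700\n0000-0002-1825-0097\n"
    for orcid in unique_orcids(existing, incoming):
        print(orcid)
```

< p > Because the results are collected in a set, an identifier that appears in both inputs is only listed once, which is what < code > sort -u< / code > does in the shell version.< / p >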
< ul >
< li > We currently have 1177 unique ORCID identifiers, and this brings our total to 1237!< / li >
< li > Next I will resolve all their names using my < code > resolve-orcids.py< / code > script:< / li >
< / ul >
< pre > < code > $ ./resolve-orcids.py -i /tmp/2019-04-03-orcid-ids.txt -o 2019-04-03-orcid-ids.txt -d
< / code > < / pre >
< ul >
< li > After that I added the XML formatting, formatted the file with tidy, and sorted the names in vim< / li >
< li > One user’s name has changed so I will update it using my < code > fix-metadata-values.py< / code > script:< / li >
< / ul >
< pre > < code > $ ./fix-metadata-values.py -i 2019-04-03-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
< / code > < / pre >
< ul >
< li > I created a pull request and merged the changes to the 5_x-prod branch (< a href = "https://github.com/ilri/DSpace/pull/417" > #417< / a > )< / li >
< li > A few days ago I noticed some weird update process for the statistics-2018 Solr core and I see it’s still going:< / li >
< / ul >
< pre > < code > 2019-04-03 16:34:02,262 INFO org.dspace.statistics.SolrLogger @ Updating : 1754500/21701 docs in http://localhost:8081/solr//statistics-2018
< / code > < / pre >
< ul >
< li > Interestingly, there are 5666 occurrences, and they are mostly for the 2018 core:< / li >
< / ul >
< pre > < code > $ grep 'org.dspace.statistics.SolrLogger @ Updating' /home/cgspace.cgiar.org/log/dspace.log.2019-04-03 | awk '{print $11}' | sort | uniq -c
1
3 http://localhost:8081/solr//statistics-2017
5662 http://localhost:8081/solr//statistics-2018
< / code > < / pre >
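< p > The per-core tally above can also be sketched in Python, which makes it easier to keep an eye on over several days. This assumes the log line layout shown earlier, where the Solr core URL is the eleventh whitespace-separated field (the one < code > awk '{print $11}'< / code > selects). The function name is hypothetical.< / p >

```python
from collections import Counter

MARKER = "org.dspace.statistics.SolrLogger @ Updating"


def count_updates_per_core(lines):
    """Count 'Updating' log lines per Solr core URL (field 11).
    Lines with fewer than 11 fields are counted under an empty key,
    as awk does when $11 is missing."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        fields = line.split()
        core = fields[10] if len(fields) >= 11 else ""
        counts[core] += 1
    return counts


if __name__ == "__main__":
    # The log line quoted in the notes above.
    sample = [
        "2019-04-03 16:34:02,262 INFO org.dspace.statistics.SolrLogger @ "
        "Updating : 1754500/21701 docs in http://localhost:8081/solr//statistics-2018",
    ]
    for core, n in sorted(count_updates_per_core(sample).items()):
        print(n, core)
```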
< ul >
< li > I will have to keep an eye on it because nothing should be updating 2018 stats in 2019… < / li >
< / ul >
<!-- vim: set sw=2 ts=2: -->
< / article >
< / div > <!-- /.blog-main -->
< aside class = "col-sm-3 ml-auto blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/2019-04/" > April, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-03/" > March, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-02/" > February, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-01/" > January, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2018-12/" > December, 2018< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >