2019-04-01 08:02:18 +02:00
<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
< meta property = "og:title" content = "April, 2019" / >
2019-04-01 16:02:54 +02:00
< meta property = "og:description" content = "2019-04-01
Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
They asked if we had plans to enable RDF support in CGSpace
There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep ' Spore-192-EN-web.pdf' | grep -E ' (18.196.196.108|18.195.78.144|18.195.218.6)' | awk ' {print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
In the last two weeks there have been 47,000 downloads of this same exact PDF by these three IP addresses
Apply country and region corrections and deletions on DSpace Test and CGSpace:
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p ' fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p ' fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p ' fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p ' fuuu' -m 231 -f cg.coverage.region -d
" />
2019-04-01 08:02:18 +02:00
< meta property = "og:type" content = "article" / >
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/2019-04/" / >
< meta property = "article:published_time" content = "2019-04-01T09:00:43+03:00" / >
2019-04-03 16:01:31 +02:00
< meta property = "article:modified_time" content = "2019-04-02T20:32:18+03:00" / >
2019-04-01 08:02:18 +02:00
< meta name = "twitter:card" content = "summary" / >
< meta name = "twitter:title" content = "April, 2019" / >
2019-04-01 16:02:54 +02:00
< meta name = "twitter:description" content = "2019-04-01
Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
They asked if we had plans to enable RDF support in CGSpace
There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep ' Spore-192-EN-web.pdf' | grep -E ' (18.196.196.108|18.195.78.144|18.195.218.6)' | awk ' {print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
In the last two weeks there have been 47,000 downloads of this same exact PDF by these three IP addresses
Apply country and region corrections and deletions on DSpace Test and CGSpace:
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p ' fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p ' fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p ' fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p ' fuuu' -m 231 -f cg.coverage.region -d
"/>
2019-04-01 08:02:18 +02:00
< meta name = "generator" content = "Hugo 0.54.0" / >
< script type = "application/ld+json" >
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "April, 2019",
"url": "https://alanorth.github.io/cgspace-notes/2019-04/",
2019-04-03 16:01:31 +02:00
"wordCount": "347",
2019-04-01 08:02:18 +02:00
"datePublished": "2019-04-01T09:00:43+ 03:00",
2019-04-03 16:01:31 +02:00
"dateModified": "2019-04-02T20:32:18+ 03:00",
2019-04-01 08:02:18 +02:00
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
< / script >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/2019-04/" >
< title > April, 2019 | CGSpace Notes< / title >
<!-- combined, minified CSS -->
< link href = "https://alanorth.github.io/cgspace-notes/css/style.css" rel = "stylesheet" integrity = "sha384-G5B34w7DFTumWTswxYzTX7NWfbvQEg1HbFFEg6ItN03uTAAoS2qkPS/fu3LhuuSA" crossorigin = "anonymous" >
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link " href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
< header class = "blog-header" >
< div class = "container" >
< h1 class = "blog-title" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< p class = "lead blog-description" > Documenting day-to-day work on the < a href = "https://cgspace.cgiar.org" > CGSpace< / a > repository.< / p >
< / div >
< / header >
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" > < a href = "https://alanorth.github.io/cgspace-notes/2019-04/" > April, 2019< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2019-04-01T09:00:43+03:00" > Mon Apr 01, 2019< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2019-04-01" > 2019-04-01< / h2 >
2019-04-01 16:02:54 +02:00
< ul >
< li > Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
< ul >
< li > They asked if we had plans to enable RDF support in CGSpace< / li >
< / ul > < / li >
< li > There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
< ul >
< li > I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!< / li >
< / ul > < / li >
< / ul >
< pre > < code > # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
< / code > < / pre >
< ul >
< li > In the last two weeks there have been 47,000 downloads of this < em > same exact PDF< / em > by these three IP addresses< / li >
< li > Apply country and region corrections and deletions on DSpace Test and CGSpace:< / li >
< / ul >
< pre > < code > $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
< / code > < / pre >
2019-04-01 08:02:18 +02:00
< h2 id = "2019-04-02" > 2019-04-02< / h2 >
2019-04-02 11:44:18 +02:00
< ul >
< li > CTA says the Amazon IPs are AWS gateways for real user traffic< / li >
2019-04-02 19:32:18 +02:00
< li > I was trying to add Felix Shaw’ s account back to the Administrators group on DSpace Test, but I couldn’ t find his name in the user search of the groups page
< ul >
< li > If I searched for “ Felix” or “ Shaw” I saw other matches, included one for his personal email address!< / li >
< li > I ended up finding him via searching for his email address< / li >
< / ul > < / li >
2019-04-02 11:44:18 +02:00
< / ul >
2019-04-03 16:01:31 +02:00
< h2 id = "2019-04-03" > 2019-04-03< / h2 >
< ul >
< li > Maria from Bioversity emailed me a list of new ORCID identifiers for their researchers so I will add them to our controlled vocabulary
< ul >
< li > First I need to extract the ones that are unique from their list compared to our existing one:< / li >
< / ul > < / li >
< / ul >
< pre > < code > $ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2019-04-03-orcid-ids.txt
< / code > < / pre >
< ul >
< li > We currently have 1177 unique ORCID identifiers, and this brings our total to 1237!< / li >
< li > Next I will resolve all their names using my < code > resolve-orcids.py< / code > script:< / li >
< / ul >
< pre > < code > $ ./resolve-orcids.py -i /tmp/2019-04-03-orcid-ids.txt -o 2019-04-03-orcid-ids.txt -d
< / code > < / pre >
2019-04-01 08:02:18 +02:00
<!-- vim: set sw=2 ts=2: -->
< / article >
< / div > <!-- /.blog - main -->
< aside class = "col-sm-3 ml-auto blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/2019-04/" > April, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-03/" > March, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-02/" > February, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-01/" > January, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2018-12/" > December, 2018< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >