<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="October, 2018" />
<meta property="og:description" content="2018-10-01 Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now 2018-10-03 I see Moayad was busy collecting item views and downloads from CGSpace yesterday: # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Oct/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 933 40." />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2018-10/" /><meta property="article:published_time" content="2018-10-01T22:31:54+03:00" />
<meta property="article:modified_time" content="2018-10-03T17:54:58+03:00" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="October, 2018" />
<meta name="twitter:description" content="2018-10-01 Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now 2018-10-03 I see Moayad was busy collecting item views and downloads from CGSpace yesterday: # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Oct/2018&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 933 40." />
<meta name="generator" content="Hugo 0.49" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "October, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-10/",
"wordCount": "460",
"datePublished": "2018-10-01T22:31:54+03:00",
"dateModified": "2018-10-03T17:54:58+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2018-10/">
<title>October, 2018 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-Upm5uY/SXdvbjuIGH6fBjF5vOYUr9DguqBskM+EQpLBzO9U+9fMVmWEt+TTlGrWQ" crossorigin="anonymous">
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2018-10/">October, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-10-01T22:31:54+03:00">Mon Oct 01, 2018</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i> <a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2018-10-01">2018-10-01</h2>
<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this (<a href="https://github.com/ilri/DSpace/issues/389">#389</a>), because I’m super busy in Nairobi right now</li>
</ul>
<h2 id="2018-10-03">2018-10-03</h2>
<ul>
<li>I see Moayad was busy collecting item views and downloads from CGSpace yesterday:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
933 40.77.167.90
971 95.108.181.88
1043 41.204.190.40
1454 157.55.39.54
1538 207.46.13.69
1719 66.249.64.61
2048 50.116.102.77
4639 66.249.64.59
4736 35.237.175.180
150362 34.218.226.147
</code></pre>
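<ul>
<li>For reference, the same per-client tally can be sketched in Python. This is only an illustration: it assumes the default nginx “combined” log format (client IP as the first whitespace-separated field), and the <code>top_clients</code> name and the sample lines with documentation-range IPs are made up for the example:</li>
</ul>

```python
from collections import Counter


def top_clients(lines, day, n=10):
    """Count requests per client IP for log lines that mention `day`."""
    counts = Counter(
        line.split()[0]  # first whitespace field, like awk's $1
        for line in lines
        if day in line
    )
    # Busiest clients first (the shell pipeline above lists them last).
    return counts.most_common(n)


# Hypothetical sample lines, not taken from the real CGSpace logs.
sample = [
    '192.0.2.10 - - [02/Oct/2018:10:00:00 +0200] "GET /handle/10568/1 HTTP/1.1" 200 512 "-" "curl"',
    '192.0.2.10 - - [02/Oct/2018:10:00:01 +0200] "GET /handle/10568/2 HTTP/1.1" 200 512 "-" "curl"',
    '192.0.2.20 - - [02/Oct/2018:10:00:02 +0200] "GET /handle/10568/3 HTTP/1.1" 200 512 "-" "curl"',
    '192.0.2.10 - - [03/Oct/2018:09:00:00 +0200] "GET /handle/10568/4 HTTP/1.1" 200 512 "-" "curl"',
]

for ip, hits in top_clients(sample, "02/Oct/2018"):
    print(hits, ip)
```

<ul>
<li>Unlike <code>zcat --force</code>, this sketch expects already-decompressed lines, so rotated <code>.gz</code> logs would need to be opened with Python’s <code>gzip</code> module first</li>
</ul>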
<ul>
<li>Of the requests from <code>34.218.226.147</code>, about 21% were HTTP 500 responses (!):</li>
</ul>
<pre><code>$ zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Oct/2018" | grep 34.218.226.147 | awk '{print $9}' | sort -n | uniq -c
118927 200
31435 500
</code></pre>
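<ul>
<li>A rough Python equivalent of the status breakdown, again assuming the nginx “combined” log format where the status code is the ninth whitespace-separated field. The <code>status_share</code> and <code>fake_line</code> helpers and the sample data are hypothetical; only the two counts in the last line come from the output above:</li>
</ul>

```python
from collections import Counter


def status_share(lines, ip):
    """Return the fraction of each HTTP status among requests from `ip`."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # fields[0] is the client IP and fields[8] the status (awk's $1 and $9).
        if len(fields) > 8 and fields[0] == ip:
            counts[fields[8]] += 1
    total = sum(counts.values())
    return {status: hits / total for status, hits in counts.items()} if total else {}


def fake_line(ip, status):
    """Build a made-up access log line for the example."""
    return f'{ip} - - [02/Oct/2018:10:00:00 +0200] "GET /rest/items HTTP/1.1" {status} 512 "-" "curl"'


sample = [fake_line("192.0.2.10", s) for s in (200, 200, 200, 500)]
sample.append(fake_line("192.0.2.20", 200))

print(status_share(sample, "192.0.2.10"))

# Share of HTTP 500 responses using the real counts shown above.
print(round(31435 / (118927 + 31435) * 100, 1))  # → 20.9
```

<ul>
<li>Note that this matches the client field exactly, whereas <code>grep 34.218.226.147</code> matches the string anywhere in the line (and treats the dots as wildcards)</li>
</ul>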
<ul>
<li>I added Phil Thornton and Sonal Henson’s ORCID identifiers to the controlled vocabulary for <code>cg.creator.orcid</code> and then re-generated the names using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
</ul>
<pre><code>$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml | sort | uniq > 2018-10-03-orcids.txt
$ ./resolve-orcids.py -i 2018-10-03-orcids.txt -o 2018-10-03-names.txt -d
</code></pre>
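<ul>
<li>The extraction step (the <code>grep -oE … | sort | uniq</code> part) could be sketched in Python as follows. The regular expression is the one from the command above; the <code>unique_orcids</code> name and the XML snippet are invented for illustration and do not reflect the real layout of <code>cg-creator-id.xml</code>:</li>
</ul>

```python
import re

# Same pattern as the grep command: four dash-separated groups of four characters.
ORCID_RE = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def unique_orcids(text):
    """Return the sorted, de-duplicated ORCID iDs found in `text`."""
    return sorted(set(ORCID_RE.findall(text)))


# Hypothetical vocabulary fragment; element and attribute names are assumptions.
sample_xml = """
<node id="A" label="Person One: 0000-0002-1825-0097"/>
<node id="B" label="Person Two: 0000-0001-7930-5752"/>
<node id="C" label="Person One (duplicate): 0000-0002-1825-0097"/>
"""

for orcid in unique_orcids(sample_xml):
    print(orcid)
```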
<ul>
<li>I found a new corner-case error that I need to check, where both the given <em>and</em> family names are deactivated:</li>
</ul>
<pre><code>Looking up the names associated with ORCID iD: 0000-0001-7930-5752
Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
</code></pre>
<ul>
<li>It appears to be Jim Lorenzen… I need to check that later!</li>
<li>I merged the changes to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/390">#390</a>)</li>
<li>Linode sent another alert about CPU usage on CGSpace (linode18) this evening</li>
<li>It seems that Moayad is making quite a lot of requests today:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1594 157.55.39.160
1627 157.55.39.173
1774 136.243.6.84
4228 35.237.175.180
4497 70.32.83.92
4856 66.249.64.59
7120 50.116.102.77
12518 138.201.49.199
87646 34.218.226.147
111729 213.139.53.62
</code></pre>
<ul>
<li>But in super positive news, he says they are using my new <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> and it’s MUCH faster than using Atmire CUA’s internal “restlet” API</li>
<li>I don’t recognize the <code>138.201.49.199</code> IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:</li>
</ul>
<pre><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /[a-z]+' | sort | uniq -c
8324 GET /bitstream
4193 GET /handle
</code></pre>
<ul>
<li>Suspiciously, it’s almost exclusively grabbing the CGIAR System Office community (handle prefix 10947):</li>
</ul>
<pre><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /handle/[0-9]{5}' | sort | uniq -c
7 GET /handle/10568
4186 GET /handle/10947
</code></pre>
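<ul>
<li>A small Python sketch of the same handle-prefix tally, in case it is useful for repeating the check on other clients. The <code>handle_prefix_counts</code> name and the sample lines are hypothetical; it counts only the first <code>GET /handle/NNNNN</code> match per line, which should be equivalent for normal access log entries:</li>
</ul>

```python
import re
from collections import Counter

# Capture the five-digit handle prefix, as in the grep pattern above.
HANDLE_RE = re.compile(r"GET /handle/([0-9]{5})")


def handle_prefix_counts(lines, ip):
    """Count requested handle prefixes for log lines containing `ip`."""
    counts = Counter()
    for line in lines:
        if ip in line:
            match = HANDLE_RE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


# Made-up log lines using a documentation-range IP.
sample = [
    '192.0.2.30 - - [03/Oct/2018:18:00:00 +0200] "GET /handle/10947/100 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.30 - - [03/Oct/2018:18:00:01 +0200] "GET /handle/10947/101 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.30 - - [03/Oct/2018:18:00:02 +0200] "GET /handle/10568/7 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.30 - - [03/Oct/2018:18:00:03 +0200] "GET /bitstream/handle/10947/100/file.pdf HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

for prefix, hits in sorted(handle_prefix_counts(sample, "192.0.2.30").items()):
    print(hits, prefix)

# Share of handle requests under prefix 10947, from the real counts above.
print(round(4186 / (4186 + 7) * 100, 1))  # → 99.8
```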
<ul>
<li>The user agent is suspicious too:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
</code></pre>
<ul>
<li>It’s clearly a bot and it’s not re-using its Tomcat session, so I will add its IP to the nginx bad bot list</li>
<li>I looked in Solr’s statistics core and these hits were actually all counted as <code>isBot:false</code> (of course)… hmmm</li>
</ul>
<!-- vim: set sw=2 ts=2: -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2018-10/">October, 2018</a></li>
<li><a href="/cgspace-notes/2018-09/">September, 2018</a></li>
<li><a href="/cgspace-notes/2018-08/">August, 2018</a></li>
<li><a href="/cgspace-notes/2018-07/">July, 2018</a></li>
<li><a href="/cgspace-notes/2018-06/">June, 2018</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p>
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>