<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
< meta property = "og:title" content = "September, 2019" / >
< meta property = "og:description" content = "2019-09-01
Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning
Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
" />
< meta property = "og:type" content = "article" / >
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/2019-09/" / >
< meta property = "article:published_time" content = "2019-09-01T10:17:51+03:00" / >
< meta property = "article:modified_time" content = "2019-09-10T16:59:18+03:00" / >
< meta name = "twitter:card" content = "summary" / >
< meta name = "twitter:title" content = "September, 2019" / >
< meta name = "twitter:description" content = "2019-09-01
Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning
Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
"/>
< meta name = "generator" content = "Hugo 0.58.1" / >
< script type = "application/ld+json" >
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "September, 2019",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-09\/",
"wordCount": "534",
"datePublished": "2019-09-01T10:17:51\x2b03:00",
"dateModified": "2019-09-10T16:59:18\x2b03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
< / script >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/2019-09/" >
< title > September, 2019 | CGSpace Notes< / title >
<!-- combined, minified CSS -->
< link href = "https://alanorth.github.io/cgspace-notes/css/style.css" rel = "stylesheet" integrity = "sha384-G5B34w7DFTumWTswxYzTX7NWfbvQEg1HbFFEg6ItN03uTAAoS2qkPS/fu3LhuuSA" crossorigin = "anonymous" >
<!-- RSS 2.0 feed -->
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link " href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
< header class = "blog-header" >
< div class = "container" >
< h1 class = "blog-title" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< p class = "lead blog-description" > Documenting day-to-day work on the < a href = "https://cgspace.cgiar.org" > CGSpace< / a > repository.< / p >
< / div >
< / header >
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" > < a href = "https://alanorth.github.io/cgspace-notes/2019-09/" > September, 2019< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2019-09-01T10:17:51+03:00" > Sun Sep 01, 2019< / time > by Alan Orth in
< i class = "fa fa-tag" aria-hidden = "true" > < / i > < a href = "/cgspace-notes/tags/notes" rel = "tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2019-09-01" > 2019-09-01< / h2 >
< ul >
< li > Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning< / li >
< li > < p > Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:< / p >
< pre > < code > # zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
< / code > < / pre > < / li >
< / ul >
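< p > The counting pipelines above can also be sketched in Python, which may be handy if the tally needs to be reused or extended. This is only a rough equivalent and not what was run on the server: the function name < code > top_clients< / code > and its parameters are made up for illustration, and it assumes the client address is the first whitespace-separated field of each log line, as < code > awk '{print $1}'< / code > does.< / p >

```python
from collections import Counter


def top_clients(lines, time_prefix, n=10):
    """Return the n busiest client addresses as (address, count) pairs.

    Roughly mirrors:
        grep <time_prefix> | awk '{print $1}' | sort | uniq -c | sort -n | tail -n <n>
    """
    counts = Counter(
        line.split()[0]  # first whitespace-separated field, like awk's $1
        for line in lines
        if time_prefix in line and line.strip()  # skip non-matching and blank lines
    )
    # most_common() is ordered highest first; reverse it so the busiest
    # client comes last, as in the `sort -n | tail` output.
    return list(reversed(counts.most_common(n)))
```

< p > Given the decompressed lines of the access logs and the prefix < code > 01/Sep/2019:0< / code > , this should produce the same ranking as the first command, lowest count first (ties may be ordered differently).< / p >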
< ul >
< li > < code > 3.94.211.189< / code > is MauiBot, and most of its requests are to Discovery and get rate limited with HTTP 503< / li >
< li > < p > < code > 163.172.71.23< / code > is some IP on Online SAS in France and its user agent is:< / p >
< pre > < code > Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)
< / code > < / pre > < / li >
< li > < p > It actually got mostly HTTP 200 responses:< / p >
< pre > < code > # zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | awk '{print $9}' | sort | uniq -c
1775 200
703 499
72 503
< / code > < / pre > < / li >
< li > < p > And it was mostly requesting Discover pages:< / p >
< pre > < code > # zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c
2350 discover
71 handle
< / code > < / pre > < / li >
< li > < p > I’m not sure why the outbound traffic rate was so high…< / p > < / li >
< / ul >
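< p > The two per-client breakdowns above (response codes and page types) could be collected in a single pass. The following is a minimal sketch under a few assumptions: the logs use nginx's < code > combined< / code > format, so the status code is the ninth whitespace-separated field as in < code > awk '{print $9}'< / code > , and the name < code > summarize_client< / code > is hypothetical. Unlike < code > grep 163.172.71.23< / code > , which matches the address anywhere in the line, it compares the first field exactly.< / p >

```python
import re
from collections import Counter

# Same alternation as the grep -o -E pattern used above.
SECTION_RE = re.compile(r"bitstream|discover|handle")


def summarize_client(lines, client_ip, time_prefix):
    """Tally status codes and matched URL keywords for one client address."""
    statuses = Counter()
    sections = Counter()
    for line in lines:
        fields = line.split()
        # Assumption: combined log format, so at least nine fields are present.
        if len(fields) < 9 or fields[0] != client_ip or time_prefix not in line:
            continue
        statuses[fields[8]] += 1  # ninth field, like awk's $9
        # findall() yields every match on the line, as grep -o does.
        sections.update(SECTION_RE.findall(line))
    return statuses, sections
```

< p > Calling it with the address < code > 163.172.71.23< / code > and the prefix < code > 01/Sep/2019:0< / code > returns two counters that can be printed or compared across clients.< / p >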
< h2 id = "2019-09-02" > 2019-09-02< / h2 >
< ul >
< li > Follow up with Carol and Francesca from Bioversity as they were on holiday during mid-to-late August
< ul >
< li > I told them to check the < a href = "https://dspacetest.cgiar.org/handle/10568/103999" > temporary collection on DSpace Test< / a > where I uploaded the 1,427 items so they can see how it will look< / li >
< li > Also, I told them to advise me about the strange file extensions (.7z, .zip, .lck)< / li >
< li > Also, I reminded Abenet to check the metadata, as the institutional authors at least will need some modification< / li >
< / ul > < / li >
< / ul >
< h2 id = "2019-09-10" > 2019-09-10< / h2 >
< ul >
< li > Altmetric responded to say that they have fixed an issue with their badge code so now research outputs with multiple handles are showing badges!
< ul >
< li > See: < a href = "https://hdl.handle.net/handle/10568/97825" > https://hdl.handle.net/handle/10568/97825< / a > < / li >
< / ul > < / li >
< li > Follow up with Bosede about the mixup with PDFs in the items uploaded in 2018-12 (aka Daniel1807)
< ul >
< li > These are the same ones that Peter noticed last week, and that Bosede and I discussed earlier this year but never sorted out< / li >
< li > It looks like these items were uploaded by Sisay on 2018-12-19 so we can use the < a href = "https://cgspace.cgiar.org/handle/10568/68616/discover?filtertype_1=dateAccessioned&filter_relational_operator_1=contains&filter_1=2018-12-19&submit_apply_filter=&query=" > accession date as a filter< / a > to narrow it down to 230 items (of which only 104 have PDFs, according to the Daniel1807.xls input file)< / li >
< / ul > < / li >
< li > Continue working on CG Core v2 migration, focusing on the crosswalk mappings
< ul >
< li > I think we can skip the MODS crosswalk for now because it is only used in < a href = "https://wiki.duraspace.org/display/DSDOC5x/DSpace+AIP+Format#DSpaceAIPFormat-MODSSchema" > AIP exports that are meant for non-DSpace systems< / a > < / li >
< li > We should probably do the QDC crosswalk as well as those in < code > xhtml-head-item.properties< / code > … < / li >
< li > Ouch, there is potentially a lot of work in the OAI metadata formats like DIM, METS, and QDC (see < code > dspace/config/crosswalks/oai/*.xsl< / code > )< / li >
< li > In general I think I should only modify the left side of the crosswalk mappings (i.e., where the metadata comes from) so that we maintain exactly the same output for search engines, etc.< / li >
< / ul > < / li >
< / ul >
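< p > To illustrate the idea of changing only the left side of a mapping, here is a minimal sketch for properties-style crosswalks such as < code > xhtml-head-item.properties< / code > . It assumes each mapping is a < code > source = output< / code > line, and it does not apply to the XSL-based OAI crosswalks. The function name < code > remap_sources< / code > and the field names in the usage lines are hypothetical placeholders, not the actual CG Core v2 mappings.< / p >

```python
def remap_sources(properties_text, field_map):
    """Rename the source (left-hand) field of `source = output` lines.

    Lines whose source field is not in field_map, including comments and
    blank lines, are passed through unchanged, so the output side of every
    mapping stays exactly as it was.
    """
    remapped = []
    for line in properties_text.splitlines():
        source, sep, target = line.partition("=")
        key = source.strip()
        if sep and not key.startswith("#") and key in field_map:
            remapped.append(f"{field_map[key]} = {target.strip()}")
        else:
            remapped.append(line)
    return "\n".join(remapped)


# Hypothetical field names, for illustration only.
example = "# comment\ndc.title = DC.title\ndc.example.old = DC.example"
print(remap_sources(example, {"dc.example.old": "dcterms.exampleNew"}))
```

< p > Only the line for the renamed source field changes. The right-hand side, which is what search engines see, is left untouched.< / p >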
<!-- vim: set sw=2 ts=2: -->
< / article >
< / div > <!-- /.blog-main -->
< aside class = "col-sm-3 ml-auto blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/posts/" > Posts< / a > < / li >
< li > < a href = "/cgspace-notes/2019-09/" > September, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-08/" > August, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-07/" > July, 2019< / a > < / li >
< li > < a href = "/cgspace-notes/2019-06/" > June, 2019< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >