<!DOCTYPE html>
<html lang="en">

  <head>
    <meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

<meta property="og:title" content="April, 2019" />
<meta property="og:description" content="2019-04-01


Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc.


They asked if we had plans to enable RDF support in CGSpace

There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today


I suspected that some of these requests might not be successful because the usage statistics show fewer downloads, but today they were all HTTP 200!


# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep &#39;Spore-192-EN-web.pdf&#39; | grep -E &#39;(18.196.196.108|18.195.78.144|18.195.218.6)&#39; | awk &#39;{print $9}&#39; | sort | uniq -c | sort -n | tail -n 5
   4432 200


In the last two weeks there have been 47,000 downloads of this same exact PDF by these three IP addresses
Apply country and region corrections and deletions on DSpace Test and CGSpace:


$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 231 -f cg.coverage.region -d
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-04/" />
<meta property="article:published_time" content="2019-04-01T09:00:43&#43;03:00"/>
<meta property="article:modified_time" content="2019-04-02T20:32:18&#43;03:00"/>

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="April, 2019"/>
<meta name="twitter:description" content="2019-04-01


Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc.


They asked if we had plans to enable RDF support in CGSpace

There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today


I suspected that some of these requests might not be successful because the usage statistics show fewer downloads, but today they were all HTTP 200!


# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep &#39;Spore-192-EN-web.pdf&#39; | grep -E &#39;(18.196.196.108|18.195.78.144|18.195.218.6)&#39; | awk &#39;{print $9}&#39; | sort | uniq -c | sort -n | tail -n 5
   4432 200


In the last two weeks there have been 47,000 downloads of this same exact PDF by these three IP addresses
Apply country and region corrections and deletions on DSpace Test and CGSpace:


$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p &#39;fuuu&#39; -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 231 -f cg.coverage.region -d
"/>
<meta name="generator" content="Hugo 0.54.0" />


<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "BlogPosting",
  "headline": "April, 2019",
  "url": "https://alanorth.github.io/cgspace-notes/2019-04/",
  "wordCount": "347",
  "datePublished": "2019-04-01T09:00:43&#43;03:00",
  "dateModified": "2019-04-02T20:32:18&#43;03:00",
  "author": {
    "@type": "Person",
    "name": "Alan Orth"
  },
  "keywords": "Notes"
}
</script>


    <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2019-04/">

    <title>April, 2019 | CGSpace Notes</title>

    <!-- combined, minified CSS -->
    <link href="https://alanorth.github.io/cgspace-notes/css/style.css" rel="stylesheet" integrity="sha384-G5B34w7DFTumWTswxYzTX7NWfbvQEg1HbFFEg6ItN03uTAAoS2qkPS/fu3LhuuSA" crossorigin="anonymous">

    
  </head>

  <body>

    
    <div class="blog-masthead">
      <div class="container">
        <nav class="nav blog-nav">
          <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
        </nav>
      </div>
    </div>
    

    <header class="blog-header">
      <div class="container">
        <h1 class="blog-title"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
        <p class="lead blog-description">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
      </div>
    </header>
    
    
    <div class="container">
      <div class="row">
        <div class="col-sm-8 blog-main">

          
<article class="blog-post">
  <header>
    <h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2019-04/">April, 2019</a></h2>
    <p class="blog-post-meta"><time datetime="2019-04-01T09:00:43&#43;03:00">Mon Apr 01, 2019</time> by Alan Orth in 

<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>

</p>
  </header>
  <h2 id="2019-04-01">2019-04-01</h2>

<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc.

<ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul></li>
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today

<ul>
<li>I suspected that some of these requests might not be successful because the usage statistics show fewer downloads, but today they were all HTTP 200!</li>
</ul></li>
</ul>

<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18\.196\.196\.108|18\.195\.78\.144|18\.195\.218\.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
   4432 200
</code></pre>
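<ul>
<li>To see the status codes per IP address instead of in aggregate, and to include older rotated logs as well (for example when checking a multi-week total), a small function along these lines could help. It is a sketch that assumes nginx&rsquo;s default <code>combined</code> log format (client address in field 1, status code in field 9) and GNU <code>zcat</code>, whose <code>-f</code> flag passes uncompressed files through unchanged; the function name is my own:</li>
</ul>

<pre><code># Hypothetical helper: count requests for the Spore PDF from the three AWS
# addresses, grouped by client IP and HTTP status code.
# zcat -f reads both plain and gzipped (rotated) logs.
# Assumes nginx "combined" format: $1 = client IP, $9 = status code.
spore_hits_by_ip() {
  zcat -f "$@" |
    grep -F 'Spore-192-EN-web.pdf' |
    awk '$1 == "18.196.196.108" || $1 == "18.195.78.144" || $1 == "18.195.218.6" { print $1, $9 }' |
    sort | uniq -c | sort -rn
}
</code></pre>

<ul>
<li>For example, <code>spore_hits_by_ip /var/log/nginx/access.log*</code> would read whatever rotated logs logrotate has kept, so the period covered depends on the server&rsquo;s rotation settings</li>
</ul>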

<ul>
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
</ul>

<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre>
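<ul>
<li>CSV column names are case-sensitive, and the two <code>fix-metadata-values.py</code> commands above pass <code>-t ACTION</code> and <code>-t action</code> respectively. Assuming the scripts look up CSV columns by the names given to <code>-f</code> and <code>-t</code> (my reading of the flags, not verified here), a quick pre-flight check of each header could catch a mismatch before anything touches the database. A minimal POSIX shell sketch with a hypothetical helper name, assuming a simple unquoted header row:</li>
</ul>

<pre><code># Hypothetical pre-flight check: succeed only if the header (first line) of a
# CSV file contains every given column name, matched exactly and case-sensitively.
# Assumes a plain header with no quoted fields or embedded commas.
csv_has_columns() {
  csv="$1"
  shift
  header=$(head -n 1 "$csv" | tr -d '\r')   # tolerate CRLF line endings
  for col in "$@"; do
    # Surround with commas so "action" does not match inside e.g. "action2"
    case ",$header," in
      *",$col,"*) ;;
      *) echo "$csv: missing column '$col'"; return 1 ;;
    esac
  done
  return 0
}

# Example:
# csv_has_columns /tmp/2019-02-21-fix-9-countries.csv cg.coverage.country ACTION
</code></pre>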

<h2 id="2019-04-02">2019-04-02</h2>

<ul>
<li>CTA says the Amazon IPs are AWS gateways for real user traffic</li>
<li>I was trying to add Felix Shaw&rsquo;s account back to the Administrators group on DSpace Test, but I couldn&rsquo;t find his name in the user search of the groups page

<ul>
<li>If I searched for &ldquo;Felix&rdquo; or &ldquo;Shaw&rdquo; I saw other matches, including one for his personal email address!</li>
<li>I ended up finding him by searching for his email address</li>
</ul></li>
</ul>

<h2 id="2019-04-03">2019-04-03</h2>

<ul>
<li>Maria from Bioversity emailed me a list of new ORCID identifiers for their researchers, so I will add them to our controlled vocabulary

<ul>
<li>First I need to combine their list with our existing one and extract the unique identifiers:</li>
</ul></li>
</ul>

<pre><code>$ cat dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq &gt; /tmp/2019-04-03-orcid-ids.txt
</code></pre>
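<ul>
<li>The command above merges both lists into a single deduplicated file. To list only the Bioversity identifiers that are not already in our vocabulary, something like the following should work. It is a sketch: the function name is hypothetical, and it uses a stricter ORCID pattern than the one above (digits only, with an optional trailing <code>X</code> check character):</li>
</ul>

<pre><code># Hypothetical helper: print, sorted and deduplicated, the ORCID iDs that occur
# in the second file ($2) but not in the first ($1).
# grep -H prefixes each match with its file name ("file:iD") and reads the
# files in argument order, so awk sees all of our iDs before any of theirs.
# Assumes neither file name contains a colon.
new_orcids() {
  ours="$1"
  theirs="$2"
  grep -H -oE '[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]' "$ours" "$theirs" |
    awk -F: -v ours="$ours" '
      $1 == ours     { known[$2] = 1; next }       # an iD we already have
      !($2 in known) { known[$2] = 1; print $2 }   # new iD, print it once
    ' |
    sort
}

# Example:
# new_orcids dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity.txt
</code></pre>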

<ul>
<li>We currently have 1177 unique ORCID identifiers, and this brings our total to 1237!</li>
<li>Next I will resolve all their names using my <code>resolve-orcids.py</code> script:</li>
</ul>

<pre><code>$ ./resolve-orcids.py -i /tmp/2019-04-03-orcid-ids.txt -o 2019-04-03-orcid-ids.txt -d
</code></pre>
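<ul>
<li>Because the identifiers arrived by email, a mistyped digit is possible. ORCID iDs end in an ISO 7064 MOD 11-2 check character (a digit or <code>X</code>), which catches any single wrong digit, so the list can be validated locally before or after resolving. A minimal bash sketch; the function name is my own and this is not part of <code>resolve-orcids.py</code>:</li>
</ul>

<pre><code># Hypothetical helper: return success if an ORCID iD is well formed and its
# last character matches the ISO 7064 MOD 11-2 check computed over the first
# 15 digits (a check value of 10 is written as "X"). Requires bash.
orcid_is_valid() {
  local id="$1"
  [[ $id =~ ^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$ ]] || return 1
  local digits="${id//-/}"   # drop the hyphens, leaving 16 characters
  local total=0 i
  for i in {0..14}; do
    total=$(( (total + ${digits:i:1}) * 2 ))
  done
  local expected=$(( (12 - total % 11) % 11 ))
  if (( expected == 10 )); then
    expected="X"
  fi
  [[ "${digits:15:1}" == "$expected" ]]
}

# Example: report any malformed iDs in the list before resolving it
# cat /tmp/2019-04-03-orcid-ids.txt | while read -r id; do orcid_is_valid "$id" || echo "invalid: $id"; done
</code></pre>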


  
</article> 


        </div> <!-- /.blog-main -->

        <aside class="col-sm-3 ml-auto blog-sidebar">
  

        <section class="sidebar-module">
    <h4>Recent Posts</h4>
    <ol class="list-unstyled">


<li><a href="/cgspace-notes/2019-04/">April, 2019</a></li>

<li><a href="/cgspace-notes/2019-03/">March, 2019</a></li>

<li><a href="/cgspace-notes/2019-02/">February, 2019</a></li>

<li><a href="/cgspace-notes/2019-01/">January, 2019</a></li>

<li><a href="/cgspace-notes/2018-12/">December, 2018</a></li>

    </ol>
  </section>

  
  <section class="sidebar-module">
    <h4>Links</h4>
    <ol class="list-unstyled">
      
      <li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
      
      <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
      
      <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
      
    </ol>
  </section>
  
</aside>


      </div> <!-- /.row -->
    </div> <!-- /.container -->
    

    <footer class="blog-footer">
      <p>
      
      Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
      
      </p>
      <p>
      <a href="#">Back to top</a>
      </p>
    </footer>
    

  </body>

</html>