Update notes for 2019-04-15

Alan Orth 2019-04-15 17:42:53 +03:00
parent 36eb0ec636
commit 1fe5b9b7f7
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 65 additions and 8 deletions

@@ -649,5 +649,30 @@ GC_TUNE="-XX:NewRatio=3 \
- Rework the dspace-statistics-api to use the vanilla Python requests library instead of Solr client
- [Tag version 1.0.0](https://github.com/ilri/dspace-statistics-api/releases/tag/v1.0.0) and deploy it on DSpace Test
- Pretty annoying to see CGSpace (linode18) with 20–50% CPU steal according to `iostat 1 10`, though I haven't had any Linode alerts in a few days
- Abenet sent me a list of ILRI items that don't have CRPs added to them
- The spreadsheet only had Handles (no IDs), so I'm experimenting with using Python in OpenRefine to get the IDs
- I cloned the handle column and then did a transform to get the IDs from the CGSpace REST API:
```
import json
import re
import urllib
import urllib2
handle = re.findall('[0-9]+/[0-9]+', value)
url = 'https://cgspace.cgiar.org/rest/handle/' + handle[0]
req = urllib2.Request(url)
req.add_header('User-agent', 'Alan Python bot')
res = urllib2.urlopen(req)
data = json.load(res)
item_id = data['id']
return item_id
```
- Luckily none of the items already had CRPs, so I didn't have to worry about them getting removed
- It would have been much trickier if I had to get the CRPs for the items first, then add the CRPs...
<!-- vim: set sw=2 ts=2: -->
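
The OpenRefine transform in the notes above runs under Jython (Python 2), which is why it uses `urllib2`. As a rough sketch only, the same Handle-to-ID lookup could be written for standalone Python 3 as follows. The function names are illustrative and not part of the original notes; the regular expression, REST URL, and User-Agent string are taken from the transform, and the response is assumed to be a JSON object with an `id` key, as the transform implies.

```python
import json
import re
import urllib.request

# Base URL and handle pattern as used in the OpenRefine transform above
REST_BASE = "https://cgspace.cgiar.org/rest/handle/"
HANDLE_RE = re.compile(r"[0-9]+/[0-9]+")


def extract_handle(value):
    """Return the first prefix/suffix handle found in value, or None."""
    match = HANDLE_RE.search(value)
    return match.group(0) if match else None


def handle_url(handle):
    """Build the REST URL for a handle (illustrative helper)."""
    return REST_BASE + handle


def parse_item_id(body):
    """Pull the item ID out of a JSON response body (assumes an 'id' key)."""
    return json.loads(body)["id"]


def lookup_item_id(value, timeout=10):
    """Resolve a cell value containing a handle to an item ID, or None."""
    handle = extract_handle(value)
    if handle is None:
        return None
    req = urllib.request.Request(
        handle_url(handle), headers={"User-Agent": "Alan Python bot"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as res:
        return parse_item_id(res.read().decode("utf-8"))
```

Unlike the transform, this version returns `None` when a cell contains no handle instead of raising an `IndexError` on `handle[0]`, which may be preferable when some spreadsheet rows are blank.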

@@ -38,7 +38,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-04/" />
<meta property="article:published_time" content="2019-04-01T09:00:43&#43;03:00"/>
-<meta property="article:modified_time" content="2019-04-14T16:59:47&#43;03:00"/>
+<meta property="article:modified_time" content="2019-04-15T12:58:07&#43;03:00"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="April, 2019"/>
@@ -81,9 +81,9 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
"@type": "BlogPosting",
"headline": "April, 2019",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-04\/",
-"wordCount": "3748",
+"wordCount": "3901",
"datePublished": "2019-04-01T09:00:43\x2b03:00",
-"dateModified": "2019-04-14T16:59:47\x2b03:00",
+"dateModified": "2019-04-15T12:58:07\x2b03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -924,6 +924,38 @@ $ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspac
<ul>
<li><a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.0.0">Tag version 1.0.0</a> and deploy it on DSpace Test</li>
</ul></li>
<li>Pretty annoying to see CGSpace (linode18) with 20–50% CPU steal according to <code>iostat 1 10</code>, though I haven&rsquo;t had any Linode alerts in a few days</li>
<li>Abenet sent me a list of ILRI items that don&rsquo;t have CRPs added to them
<ul>
<li>The spreadsheet only had Handles (no IDs), so I&rsquo;m experimenting with using Python in OpenRefine to get the IDs</li>
<li>I cloned the handle column and then did a transform to get the IDs from the CGSpace REST API:</li>
</ul></li>
</ul>
<pre><code>import json
import re
import urllib
import urllib2
handle = re.findall('[0-9]+/[0-9]+', value)
url = 'https://cgspace.cgiar.org/rest/handle/' + handle[0]
req = urllib2.Request(url)
req.add_header('User-agent', 'Alan Python bot')
res = urllib2.urlopen(req)
data = json.load(res)
item_id = data['id']
return item_id
</code></pre>
<ul>
<li>Luckily none of the items already had CRPs, so I didn&rsquo;t have to worry about them getting removed
<ul>
<li>It would have been much trickier if I had to get the CRPs for the items first, then add the CRPs&hellip;</li>
</ul></li>
</ul>
<!-- vim: set sw=2 ts=2: -->

@@ -4,30 +4,30 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-04/</loc>
-<lastmod>2019-04-14T16:59:47+03:00</lastmod>
+<lastmod>2019-04-15T12:58:07+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2019-04-14T16:59:47+03:00</lastmod>
+<lastmod>2019-04-15T12:58:07+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2019-04-14T16:59:47+03:00</lastmod>
+<lastmod>2019-04-15T12:58:07+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-<lastmod>2019-04-14T16:59:47+03:00</lastmod>
+<lastmod>2019-04-15T12:58:07+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2019-04-14T16:59:47+03:00</lastmod>
+<lastmod>2019-04-15T12:58:07+03:00</lastmod>
<priority>0</priority>
</url>