Add notes for 2017-08-01

2017-08-01 11:57:37 +03:00
parent ff336ce2ba
commit e3e602881e
38 changed files with 787 additions and 345 deletions


@ -12,7 +12,7 @@
<meta property="og:updated_time" content="2017-07-01T18:03:52&#43;03:00"/>
<meta property="og:updated_time" content="2017-08-01T11:51:52&#43;03:00"/>
@ -37,7 +37,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2017-07-01T18:03:52&#43;03:00",
"dateModified": "2017-08-01T11:51:52&#43;03:00",
"keywords": "notes,",
"description": "Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,41 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2017-08/">August, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-08-01T11:51:52&#43;03:00">Tue Aug 01, 2017</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like:
<ul>
<li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li>
</ul></li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
</ul>
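<p>The gap described above can be illustrated with a minimal Python sketch. It assumes the <code>*</code> and <code>$</code> wildcard syntax that the major crawlers honour in <code>robots.txt</code>. The function names and the two extra <code>/handle/*/…</code> rules are hypothetical and only show one possible approach; they are not what was deployed on CGSpace.</p>

```python
import re
from typing import Iterable


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the path; everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: Iterable[str]) -> bool:
    """Return True if any Disallow pattern matches the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


# Rules as described in the notes: only the top-level paths are blocked.
CURRENT_RULES = ["/discover", "/browse"]

# Hypothetical additional rules (an assumption, not taken from the notes).
PROPOSED_RULES = CURRENT_RULES + ["/handle/*/discover", "/handle/*/browse"]

for path in ("/handle/10568/3353/discover", "/handle/10568/16510/browse"):
    print(path, is_disallowed(path, CURRENT_RULES), is_disallowed(path, PROPOSED_RULES))
```

<p>With the top-level rules alone, neither of the two example URLs is matched, which is consistent with the bots being able to crawl them. Wildcard rules like these only affect crawlers that respect <code>robots.txt</code> and support the wildcard syntax.</p>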
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2017-08/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2017-07/">July, 2017</a></h2>
@ -369,40 +404,6 @@ DELETE 1
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2016-10/">October, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-10-03T15:53:00&#43;03:00">Mon Oct 03, 2016</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
<ul>
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul></li>
<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article>
<nav class="blog-pagination">
@ -426,6 +427,8 @@ DELETE 1
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2017-08/">August, 2017</a></li>
<li><a href="/cgspace-notes/2017-07/">July, 2017</a></li>
<li><a href="/cgspace-notes/2017-06/">June, 2017</a></li>
@ -434,8 +437,6 @@ DELETE 1
<li><a href="/cgspace-notes/2017-04/">April, 2017</a></li>
<li><a href="/cgspace-notes/2017-03/">March, 2017</a></li>
</ol>
</section>


@ -6,11 +6,37 @@
<description>Recent content in Posts on CGSpace Notes</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Sat, 01 Jul 2017 18:03:52 +0300</lastBuildDate>
<lastBuildDate>Tue, 01 Aug 2017 11:51:52 +0300</lastBuildDate>
<atom:link href="https://alanorth.github.io/cgspace-notes/post/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>August, 2017</title>
<link>https://alanorth.github.io/cgspace-notes/2017-08/</link>
<pubDate>Tue, 01 Aug 2017 11:51:52 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-08/</guid>
<description>&lt;h2 id=&#34;2017-08-01&#34;&gt;2017-08-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours&lt;/li&gt;
&lt;li&gt;I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)&lt;/li&gt;
&lt;li&gt;The good thing is that, according to &lt;code&gt;dspace.log.2017-08-01&lt;/code&gt;, they are all using the same Tomcat session&lt;/li&gt;
&lt;li&gt;This means our Tomcat Crawler Session Valve is working&lt;/li&gt;
&lt;li&gt;But many of the bots are browsing dynamic URLs like:
&lt;ul&gt;
&lt;li&gt;/handle/10568/3353/discover&lt;/li&gt;
&lt;li&gt;/handle/10568/16510/browse&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;robots.txt&lt;/code&gt; only blocks the top-level &lt;code&gt;/discover&lt;/code&gt; and &lt;code&gt;/browse&lt;/code&gt; URLs&amp;hellip; we will need to find a way to forbid them from accessing these!&lt;/li&gt;
&lt;li&gt;Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): &lt;a href=&#34;https://jira.duraspace.org/browse/DS-2962&#34;&gt;https://jira.duraspace.org/browse/DS-2962&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;</description>
</item>
<item>
<title>July, 2017</title>
<link>https://alanorth.github.io/cgspace-notes/2017-07/</link>


@ -12,7 +12,7 @@
<meta property="og:updated_time" content="2016-09-01T15:53:00&#43;03:00"/>
<meta property="og:updated_time" content="2016-10-03T15:53:00&#43;03:00"/>
@ -37,7 +37,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2016-09-01T15:53:00&#43;03:00",
"dateModified": "2016-10-03T15:53:00&#43;03:00",
"keywords": "notes,",
"description": "Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,40 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2016-10/">October, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-10-03T15:53:00&#43;03:00">Mon Oct 03, 2016</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
<ul>
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul></li>
<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
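<p>The CSV described above could be assembled roughly as follows. This is a sketch only: the item id and collection handle are hypothetical placeholders, the helper name is made up for illustration, and the three ORCID iDs are the ones quoted above.</p>

```python
import csv
import io

# ORCID iDs quoted in the notes above.
ORCIDS = ["0000-0002-6115-0956", "0000-0002-3812-8793", "0000-0001-7462-405X"]


def build_orcid_csv(item_id: str, collection: str, orcids: list) -> str:
    """Build a two-line CSV with id, collection, and one ORCID author column.

    Multiple values in the author column are joined with '||', matching
    the value shown in the notes.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "collection", "ORCID:dc.contributor.author"])
    writer.writerow([item_id, collection, "||".join(orcids)])
    return buffer.getvalue()


# Hypothetical item id and collection handle, used purely as placeholders.
csv_text = build_orcid_csv("1234", "10568/0000", ORCIDS)
print(csv_text)
```

<p>Keeping the iDs in a list preserves the intended author order in the joined value, which is the property the test scenarios above are meant to check after import.</p>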
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2016-09/">September, 2016</a></h2>
@ -367,37 +401,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2015-12/">December, 2015</a></h2>
<p class="blog-post-meta"><time datetime="2015-12-02T13:18:00&#43;03:00">Wed Dec 02, 2015</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/post/" rel="prev" role="button">Previous page</a>
@ -421,6 +424,8 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2017-08/">August, 2017</a></li>
<li><a href="/cgspace-notes/2017-07/">July, 2017</a></li>
<li><a href="/cgspace-notes/2017-06/">June, 2017</a></li>
@ -429,8 +434,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<li><a href="/cgspace-notes/2017-04/">April, 2017</a></li>
<li><a href="/cgspace-notes/2017-03/">March, 2017</a></li>
</ol>
</section>


@ -12,7 +12,7 @@
<meta property="og:updated_time" content="2015-11-23T17:00:57&#43;03:00"/>
<meta property="og:updated_time" content="2015-12-02T13:18:00&#43;03:00"/>
@ -37,7 +37,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2015-11-23T17:00:57&#43;03:00",
"dateModified": "2015-12-02T13:18:00&#43;03:00",
"keywords": "notes,",
"description": "Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,37 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2015-12/">December, 2015</a></h2>
<p class="blog-post-meta"><time datetime="2015-12-02T13:18:00&#43;03:00">Wed Dec 02, 2015</time> by Alan Orth in
<i class="fa fa-tag" aria-hidden="true"></i>&nbsp;<a href="/cgspace-notes/tags/notes" rel="tag">Notes</a>
</p>
</header>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>
<p></p>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title"><a href="https://alanorth.github.io/cgspace-notes/2015-11/">November, 2015</a></h2>
@ -147,6 +178,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2017-08/">August, 2017</a></li>
<li><a href="/cgspace-notes/2017-07/">July, 2017</a></li>
<li><a href="/cgspace-notes/2017-06/">June, 2017</a></li>
@ -155,8 +188,6 @@
<li><a href="/cgspace-notes/2017-04/">April, 2017</a></li>
<li><a href="/cgspace-notes/2017-03/">March, 2017</a></li>
</ol>
</section>