Add notes for 2019-12-17

Alan Orth 2019-12-17 14:49:24 +02:00
parent d83c951532
commit d54e5b69f1
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
90 changed files with 1420 additions and 1377 deletions

View File

@ -117,4 +117,21 @@ COPY 48
- I restarted Tomcat three times before all cores came up successfully
- While I was restarting the Tomcat service I upgraded the PostgreSQL JDBC driver to version 42.2.9, which had been deployed on DSpace Test earlier this week

## 2019-12-16
- Visit CodeObia office to discuss next phase of OpenRXV/AReS development
- We discussed using CSV instead of Excel for tabular reports
- OpenRXV should only have "simple" reports with Dublin Core fields
- AReS should have this as well as a customized "extended" report that has CRPs, Subjects, Sponsors, etc from CGSpace
- We discussed using RTF instead of Word for graphical reports

## 2019-12-17
- Start filing GitHub issues for the reporting features on OpenRXV and AReS
- I created an issue for the "simple" tabular reports on OpenRXV GitHub ([#29](https://github.com/ilri/OpenRXV/issues/29))
- I created an issue for the "extended" tabular reports on AReS GitHub ([#8](https://github.com/ilri/AReS/issues/8))
- I created an issue for "simple" text reports on the OpenRXV GitHub ([#30](https://github.com/ilri/OpenRXV/issues/30))
- I created an issue for "extended" text reports on the AReS GitHub ([#9](https://github.com/ilri/AReS/issues/9))
- I looked into creating RTF documents from HTML in Node.js and there is a library called [html-to-rtf](https://www.npmjs.com/package/html-to-rtf) that works well, but doesn't support images
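- A quick local test of that library, as a sketch (assuming its `convertHtmlToRtf()` API and installation from npm):

```
$ npm install html-to-rtf
$ node -e "const htmlToRtf = require('html-to-rtf'); console.log(htmlToRtf.convertHtmlToRtf('<p><b>Hello</b> CGSpace</p>'))"
```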
<!-- vim: set sw=2 ts=2: -->

View File

@ -31,7 +31,7 @@ Last week I had increased the limit from 30 to 60, which seemed to help, but now
$ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspace
78
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -112,7 +112,7 @@ $ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspac
</p>
</header>
<h2 id="20151122">2015-11-22</h2>
<h2 id="2015-11-22">2015-11-22</h2>
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
@ -123,7 +123,7 @@ $ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspac
</code></pre><ul>
<li>For now I have increased the limit from 60 to 90, run updates, and rebooted the server</li>
</ul>
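<p>A sketch of the corresponding <code>dspace.cfg</code> change (assuming the pool limit is the stock DSpace <code>db.maxconnections</code> setting):</p>
<pre><code>db.maxconnections = 90
</code></pre>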
<h2 id="20151124">2015-11-24</h2>
<h2 id="2015-11-24">2015-11-24</h2>
<ul>
<li>CGSpace went down again</li>
<li>Getting emails from uptimeRobot and uptimeButler that it's down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors</li>
@ -134,7 +134,7 @@ $ psql -c &#39;SELECT * from pg_stat_activity;&#39; | grep idle | grep -c cgspac
</code></pre><ul>
<li>For some reason the number of idle connections is very high since we upgraded to DSpace 5</li>
</ul>
<h2 id="20151125">2015-11-25</h2>
<h2 id="2015-11-25">2015-11-25</h2>
<ul>
<li>Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config</li>
<li>The OAI application requests stylesheets and javascript files with the path <code>/oai/static/css</code>, which gets matched here:</li>
@ -177,7 +177,7 @@ datid | datname | pid | usesysid | usename | application_name | client_addr
<li>Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well</li>
<li>Also deploy the nginx fixes for the <code>try_files</code> location block as well as the expires block</li>
</ul>
<h2 id="20151126">2015-11-26</h2>
<h2 id="2015-11-26">2015-11-26</h2>
<ul>
<li>CGSpace behaving much better since changing <code>db.maxidle</code> yesterday, but still two up/down notices from monitoring this morning (better than 50!)</li>
<li>CCAFS colleagues mentioned that the REST API is very slow, 24 seconds for one item</li>
@ -195,7 +195,7 @@ datid | datname | pid | usesysid | usename | application_name | client_addr
<li>At the time, the current DSpace pool size was 50&hellip;</li>
<li>I reduced the pool back to the default of 30, and reduced the <code>db.maxidle</code> settings from 10 to 8</li>
</ul>
<h2 id="20151129">2015-11-29</h2>
<h2 id="2015-11-29">2015-11-29</h2>
<ul>
<li>Still more alerts that CGSpace has been up and down all day</li>
<li>Current database settings for DSpace:</li>

View File

@ -33,7 +33,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -114,7 +114,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
</p>
</header>
<h2 id="20151202">2015-12-02</h2>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
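<p>A minimal sketch of the change, assuming a cron job that compresses the previous day's log (paths are examples):</p>
<pre><code>$ xz /home/dspacetest.cgiar.org/log/dspace.log.2015-12-01
</code></pre>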
@ -176,7 +176,7 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
<li>I filed a ticket on Atmire's issue tracker</li>
<li>I also filed a ticket on Atmire's issue tracker for the PostgreSQL stuff</li>
</ul>
<h2 id="20151203">2015-12-03</h2>
<h2 id="2015-12-03">2015-12-03</h2>
<ul>
<li>CGSpace very slow, and monitoring emailing me to say it's down, even though I can load the page (very slowly)</li>
<li>Idle postgres connections look like this (with no change in DSpace db settings lately):</li>
@ -201,7 +201,7 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
0.806
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.854
</code></pre><h2 id="20151205">2015-12-05</h2>
</code></pre><h2 id="2015-12-05">2015-12-05</h2>
<ul>
<li>CGSpace has been up and down all day and REST API is completely unresponsive</li>
<li>PostgreSQL idle connections are currently:</li>
@ -216,7 +216,7 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
<img src="/cgspace-notes/2015/12/postgres_cache_cgspace-year.png" alt="PostgreSQL cache (year)">
<img src="/cgspace-notes/2015/12/postgres_locks_cgspace-year.png" alt="PostgreSQL locks (year)">
<img src="/cgspace-notes/2015/12/postgres_scans_cgspace-year.png" alt="PostgreSQL scans (year)"></p>
<h2 id="20151207">2015-12-07</h2>
<h2 id="2015-12-07">2015-12-07</h2>
<ul>
<li>Atmire sent <a href="https://github.com/ilri/DSpace/pull/161">some fixes</a> to DSpace's REST API code that was leaving contexts open (causing the slow performance and database issues)</li>
<li>After deploying the fix to CGSpace the REST API is consistently faster:</li>
@ -231,7 +231,7 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
0.566
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.497
</code></pre><h2 id="20151208">2015-12-08</h2>
</code></pre><h2 id="2015-12-08">2015-12-08</h2>
<ul>
<li>Switch CGSpace log compression cron jobs from using lzop to xz—the compression isn't as good, but it's much faster and causes less IO/CPU load</li>
<li>Since we figured out (and fixed) the cause of the performance issue, I reverted Google Bot's crawl rate to the &ldquo;Let Google optimize&rdquo; setting</li>

View File

@ -25,7 +25,7 @@ Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_
I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
Update GitHub wiki for documentation of maintenance tasks.
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -106,22 +106,22 @@ Update GitHub wiki for documentation of maintenance tasks.
</p>
</header>
<h2 id="20160113">2016-01-13</h2>
<h2 id="2016-01-13">2016-01-13</h2>
<ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
<li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li>
</ul>
<h2 id="20160114">2016-01-14</h2>
<h2 id="2016-01-14">2016-01-14</h2>
<ul>
<li>Update CCAFS project identifiers in input-forms.xml</li>
<li>Run system updates and restart the server</li>
</ul>
<h2 id="20160118">2016-01-18</h2>
<h2 id="2016-01-18">2016-01-18</h2>
<ul>
<li>Change &ldquo;Extension material&rdquo; to &ldquo;Extension Material&rdquo; in input-forms.xml (a mistake that fell through the cracks when we fixed the others in DSpace 4 era)</li>
</ul>
<h2 id="20160119">2016-01-19</h2>
<h2 id="2016-01-19">2016-01-19</h2>
<ul>
<li>Work on tweaks and updates for the social sharing icons on item pages: add Delicious and Mendeley (from Academicons), make links open in new windows, and set the icon color to the theme's primary color (<a href="https://github.com/ilri/DSpace/issues/157">#157</a>)</li>
<li>Tweak date-based facets to show more values in drill-down ranges (<a href="https://github.com/ilri/DSpace/issues/162">#162</a>)</li>
@ -129,7 +129,7 @@ Update GitHub wiki for documentation of maintenance tasks.
<li>Set up recipe on IFTTT to tweet new items from the CGSpace Atom feed to my twitter account</li>
<li>Altmetrics&rsquo; support for Handles is kinda weak, so they can't associate our items with DOIs until they are tweeted or blogged, etc first.</li>
</ul>
<h2 id="20160121">2016-01-21</h2>
<h2 id="2016-01-21">2016-01-21</h2>
<ul>
<li>Still waiting for my IFTTT recipe to fire, two days later</li>
<li>It looks like the Atom feed on CGSpace hasn't changed in two days, but there have definitely been new items</li>
@ -139,17 +139,17 @@ Update GitHub wiki for documentation of maintenance tasks.
<li>In any case, we should change this cache to be something more like 6 hours, as we publish new items several times per day.</li>
<li>Work around a CSS issue with long URLs in the item view (<a href="https://github.com/ilri/DSpace/issues/172">#172</a>)</li>
</ul>
<h2 id="20160125">2016-01-25</h2>
<h2 id="2016-01-25">2016-01-25</h2>
<ul>
<li>Re-deploy CGSpace and DSpace Test with latest <code>5_x-prod</code> branch</li>
<li>This included the social icon fixes/updates, date-based facet tweaks, reducing the feed cache age, and fixing a layout issue in XMLUI item view when an item had long URLs</li>
</ul>
<h2 id="20160126">2016-01-26</h2>
<h2 id="2016-01-26">2016-01-26</h2>
<ul>
<li>Run nginx updates on CGSpace and DSpace Test (<a href="http://mailman.nginx.org/pipermail/nginx/2016-January/049700.html">1.8.1 and 1.9.10, respectively</a>)</li>
<li>Run updates on DSpace Test and reboot for new Linode kernel <code>Linux 4.4.0-x86_64-linode63</code> (first update in months)</li>
</ul>
<h2 id="20160128">2016-01-28</h2>
<h2 id="2016-01-28">2016-01-28</h2>
<ul>
<li>
<p>Start looking at importing some Bioversity data that had been prepared earlier this week</p>
@ -162,7 +162,7 @@ $ find SimpleArchiveForBio/ -iname &ldquo;*.pdf&rdquo; -exec basename {} ; | sor
8</p>
</li>
</ul>
<h2 id="20160129">2016-01-29</h2>
<h2 id="2016-01-29">2016-01-29</h2>
<ul>
<li>Add five missing center-specific subjects to XMLUI item view (<a href="https://github.com/ilri/DSpace/issues/174">#174</a>)</li>
<li>This <a href="https://cgspace.cgiar.org/handle/10568/67062">CCAFS item</a> Before:</li>

View File

@ -35,7 +35,7 @@ I noticed we have a very interesting list of countries on CGSpace:
Not only are there 49,000 countries, we have some blanks (25)&hellip;
Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -116,7 +116,7 @@ Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&r
</p>
</header>
<h2 id="20160205">2016-02-05</h2>
<h2 id="2016-02-05">2016-02-05</h2>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
@ -127,7 +127,7 @@ Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&r
<li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li>
<li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li>
</ul>
<h2 id="20160206">2016-02-06</h2>
<h2 id="2016-02-06">2016-02-06</h2>
<ul>
<li>Found a way to get items with null/empty metadata values from SQL</li>
<li>First, find the <code>metadata_field_id</code> for the field you want from the <code>metadatafieldregistry</code> table:</li>
@ -154,7 +154,7 @@ DELETE 25
<li>Yep! The full re-index seems to work.</li>
<li>Process the empty countries on CGSpace</li>
</ul>
<h2 id="20160207">2016-02-07</h2>
<h2 id="2016-02-07">2016-02-07</h2>
<ul>
<li>Working on cleaning up Abenet's DAGRIS data with OpenRefine</li>
<li>I discovered two really nice functions in OpenRefine: <code>value.trim()</code> and <code>value.escape(&quot;javascript&quot;)</code> which shows whitespace characters like <code>\r\n</code>!</li>
@ -195,14 +195,14 @@ $ /opt/brew/Cellar/tomcat/8.0.30/bin/catalina start
<li>After verifying that the site is working, start a full index:</li>
</ul>
<pre><code>$ ~/dspace/bin/dspace index-discovery -b
</code></pre><h2 id="20160208">2016-02-08</h2>
</code></pre><h2 id="2016-02-08">2016-02-08</h2>
<ul>
<li>Finish cleaning up and importing ~400 DAGRIS items into CGSpace</li>
<li>Whip up some quick CSS to make the button in the submission workflow use the XMLUI theme's brand colors (<a href="https://github.com/ilri/DSpace/issues/154">#154</a>)</li>
</ul>
<p><img src="/cgspace-notes/2016/02/submit-button-ilri.png" alt="ILRI submission buttons">
<img src="/cgspace-notes/2016/02/submit-button-drylands.png" alt="Drylands submission buttons"></p>
<h2 id="20160209">2016-02-09</h2>
<h2 id="2016-02-09">2016-02-09</h2>
<ul>
<li>Re-sync DSpace Test with CGSpace</li>
<li>Help Sisay with OpenRefine</li>
@ -239,7 +239,7 @@ Swap: 255 57 198
</code></pre><ul>
<li>So I'll bump up the Tomcat heap to 2048 (CGSpace production server is using 3GB)</li>
</ul>
<h2 id="20160211">2016-02-11</h2>
<h2 id="2016-02-11">2016-02-11</h2>
<ul>
<li>Massaging some CIAT data in OpenRefine</li>
<li>There are 1200 records that have PDFs, and will need to be imported into CGSpace</li>
@ -256,7 +256,7 @@ Processing 64661.pdf
Processing 64195.pdf
&gt; Downloading 64195.pdf
&gt; Creating thumbnail for 64195.pdf
</code></pre><h2 id="20160212">2016-02-12</h2>
</code></pre><h2 id="2016-02-12">2016-02-12</h2>
<ul>
<li>Looking at CIAT's records again, there are some problems with a dozen or so files (out of 1200)</li>
<li>A few items are using the same exact PDF</li>
@ -265,7 +265,7 @@ Processing 64195.pdf
<li>A few items have no item</li>
<li>Also, I'm not sure: if we import these items, will we remove the <code>dc.identifier.url</code> field from the records?</li>
</ul>
<h2 id="201602121">2016-02-12</h2>
<h2 id="2016-02-12-1">2016-02-12</h2>
<ul>
<li>Looking at CIAT's records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I'm not sure if we can use those</li>
<li>265 items have dirty, URL-encoded filenames:</li>
@ -282,7 +282,7 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
<li>Merge pull requests for submission form theming (<a href="https://github.com/ilri/DSpace/pull/178">#178</a>) and missing center subjects in XMLUI item views (<a href="https://github.com/ilri/DSpace/pull/176">#176</a>)</li>
<li>They will be deployed on CGSpace the next time I re-deploy</li>
</ul>
<h2 id="20160216">2016-02-16</h2>
<h2 id="2016-02-16">2016-02-16</h2>
<ul>
<li>Turns out OpenRefine has an unescape function!</li>
</ul>
@ -296,14 +296,14 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
<li>To get filenames from <code>dc.identifier.url</code>, create a new column based on this transform: <code>forEach(value.split('||'), v, v.split('/')[-1]).join('||')</code></li>
<li>This also works for records that have multiple URLs (separated by &ldquo;||&rdquo;)</li>
</ul>
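<p>For example, a hypothetical multi-value cell would be transformed like this:</p>
<pre><code>before: http://example.org/docs/one.pdf||http://example.org/docs/two.pdf
after:  one.pdf||two.pdf
</code></pre>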
<h2 id="20160217">2016-02-17</h2>
<h2 id="2016-02-17">2016-02-17</h2>
<ul>
<li>Re-deploy CGSpace, run all system updates, and reboot</li>
<li>More work on CIAT data, cleaning and doing a last metadata-only import into DSpace Test</li>
<li>SAFBuilder has a bug preventing it from processing filenames containing more than one underscore</li>
<li>Need to re-process the filename column to replace multiple underscores with one: <code>value.replace(/_{2,}/, &quot;_&quot;)</code></li>
</ul>
<h2 id="20160220">2016-02-20</h2>
<h2 id="2016-02-20">2016-02-20</h2>
<ul>
<li>Turns out the &ldquo;bug&rdquo; in SAFBuilder isn't a bug, it's a feature that allows you to encode extra information like the destination bundle in the filename</li>
<li>Also, it seems DSpace's SAF import tool doesn't like importing filenames that have accents in them:</li>
@ -313,7 +313,7 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
<li>Need to rename files to have no accents or umlauts, etc&hellip;</li>
<li>Useful custom text facet for URLs ending with &ldquo;.pdf&rdquo;: <code>value.endsWith(&quot;.pdf&quot;)</code></li>
</ul>
<h2 id="20160222">2016-02-22</h2>
<h2 id="2016-02-22">2016-02-22</h2>
<ul>
<li>To change Spanish accents to ASCII in OpenRefine:</li>
</ul>
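<p>One way to sketch it in GREL is to chain <code>replace()</code> calls for the common accented characters:</p>
<pre><code>value.replace('á','a').replace('é','e').replace('í','i').replace('ó','o').replace('ú','u').replace('ñ','n')
</code></pre>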
@ -330,7 +330,7 @@ Bitstream: tést señora alimentación.pdf
<li>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux's ext4 stores them as an array of bytes</li>
<li>Running the SAFBuilder on Mac OS X works if you're going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem's encoding matches</li>
</ul>
<h2 id="20160229">2016-02-29</h2>
<h2 id="2016-02-29">2016-02-29</h2>
<ul>
<li>Got notified by some CIFOR colleagues that the Google Scholar team had contacted them about CGSpace's incorrect ordering of authors in Google Scholar metadata</li>
<li>Turns out there is a patch, and it was merged in DSpace 5.4: <a href="https://jira.duraspace.org/browse/DS-2679">https://jira.duraspace.org/browse/DS-2679</a></li>

View File

@ -25,7 +25,7 @@ Looking at issues with author authorities on CGSpace
For some reason we still have the index-lucene-update cron job active on CGSpace, but I&#39;m pretty sure we don&#39;t need it as of the latest few versions of Atmire&#39;s Listings and Reports module
Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -106,13 +106,13 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
</p>
</header>
<h2 id="20160302">2016-03-02</h2>
<h2 id="2016-03-02">2016-03-02</h2>
<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul>
<h2 id="20160307">2016-03-07</h2>
<h2 id="2016-03-07">2016-03-07</h2>
<ul>
<li>Troubleshooting the issues with the slew of commits for Atmire modules in <a href="https://github.com/ilri/DSpace/pull/182">#182</a></li>
<li>Their changes on <code>5_x-dev</code> branch work, but it is messy as hell with merge commits and old branch base</li>
@ -121,12 +121,12 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<li>Restart DSpace Test, as it seems to have crashed after Sisay tried to import some CSV or zip or something:</li>
</ul>
<pre><code>Exception in thread &quot;Lucene Merge Thread #19&quot; org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
</code></pre><h2 id="20160308">2016-03-08</h2>
</code></pre><h2 id="2016-03-08">2016-03-08</h2>
<ul>
<li>Add a few new filters to Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/issues/180">#180</a>)</li>
<li>We had also wanted to add a few to the Content and Usage module but I have to ask the editors which ones they were</li>
</ul>
<h2 id="20160310">2016-03-10</h2>
<h2 id="2016-03-10">2016-03-10</h2>
<ul>
<li>Disable the lucene cron job on CGSpace as it shouldn't be needed anymore</li>
<li>Discuss ORCiD and duplicate authors on Yammer</li>
@ -139,7 +139,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<ul>
<li>Update documentation for Atmire modules</li>
</ul>
<h2 id="20160311">2016-03-11</h2>
<h2 id="2016-03-11">2016-03-11</h2>
<ul>
<li>As I was looking at the CUA config I realized our Discovery config is all messed up and confusing</li>
<li>I've opened an issue to track some of that work (<a href="https://github.com/ilri/DSpace/issues/186">#186</a>)</li>
@ -147,7 +147,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<li>We had been confusing <code>dc.type</code> (a Dublin Core value) with <code>dc.type.output</code> (a value we invented) for a few years and it had permeated all aspects of our data, indexes, item displays, etc.</li>
<li>There is still some more work to be done to remove references to old <code>outputtype</code> and <code>output</code></li>
</ul>
<h2 id="20160314">2016-03-14</h2>
<h2 id="2016-03-14">2016-03-14</h2>
<ul>
<li>Fix some items that had invalid dates (I noticed them in the log during a re-indexing)</li>
<li>Reset <code>search.index.*</code> to the default, as it is only used by Lucene (deprecated by Discovery in DSpace 5.x): <a href="https://github.com/ilri/DSpace/pull/188">#188</a></li>
@ -155,11 +155,11 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<li>Also four or so center-specific subject strings were missing for Discovery</li>
</ul>
<p><img src="/cgspace-notes/2016/03/missing-xmlui-string.png" alt="Missing XMLUI string"></p>
<h2 id="20160315">2016-03-15</h2>
<h2 id="2016-03-15">2016-03-15</h2>
<ul>
<li>Create simple theme for new AVCD community just for a unique Google Tracking ID (<a href="https://github.com/ilri/DSpace/pull/191">#191</a>)</li>
</ul>
<h2 id="20160316">2016-03-16</h2>
<h2 id="2016-03-16">2016-03-16</h2>
<ul>
<li>Still having problems deploying Atmire's CUA updates and fixes from January!</li>
<li>More discussion on the GitHub issue here: <a href="https://github.com/ilri/DSpace/pull/182">https://github.com/ilri/DSpace/pull/182</a></li>
@ -185,7 +185,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<li>It seems this <code>dc.language</code> field isn't really used, but we should delete these values</li>
<li>Also, <code>dc.language.iso</code> has some weird values, like &ldquo;En&rdquo; and &ldquo;English&rdquo;</li>
</ul>
<h2 id="20160317">2016-03-17</h2>
<h2 id="2016-03-17">2016-03-17</h2>
<ul>
<li>It turns out <code>hi</code> is the ISO 639 language code for Hindi, but these should be in <code>dc.language.iso</code> instead of <code>dc.language</code></li>
<li>I fixed the eleven items with <code>hi</code> as well as some using the incorrect <code>vn</code> for Vietnamese</li>
@ -193,7 +193,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<li>Re-sync CGSpace database to DSpace Test for Atmire to do some tests about the problematic CUA patches</li>
<li>The patches work fine with a clean database, so the error was caused by some mismatch in CUA versions and the database during my testing</li>
</ul>
<h2 id="20160318">2016-03-18</h2>
<h2 id="2016-03-18">2016-03-18</h2>
<ul>
<li>Merge Atmire fixes into <code>5_x-prod</code></li>
<li>Discuss thumbnails with Francesca from Bioversity</li>
@ -211,7 +211,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
</code></pre><ul>
<li>Also, it looks like adding <code>-sharpen 0x1.0</code> really improves the quality of the image for only a few KB</li>
</ul>
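<p>A sketch of that kind of ImageMagick invocation (filenames and geometry are examples):</p>
<pre><code>$ convert 'cover.pdf[0]' -thumbnail x600 -sharpen 0x1.0 cover.jpg
</code></pre>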
<h2 id="20160321">2016-03-21</h2>
<h2 id="2016-03-21">2016-03-21</h2>
<ul>
<li>Fix 66 site errors in Google's webmaster tools</li>
<li>I looked at a bunch of them and they were old URLs, weird things linked from non-existent items, etc, so I just marked them all as fixed</li>
@ -245,11 +245,11 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<li>Run updates on CGSpace and reboot server (new kernel, <code>4.5.0</code>)</li>
<li>Deploy Let's Encrypt certificate for cgspace.cgiar.org, but still need to work it into the ansible playbooks</li>
</ul>
<h2 id="20160322">2016-03-22</h2>
<h2 id="2016-03-22">2016-03-22</h2>
<ul>
<li>Merge robots.txt patch and disallow indexing of browse pages as our sitemap is consumed correctly (<a href="https://github.com/ilri/DSpace/issues/198">#198</a>)</li>
</ul>
<h2 id="20160323">2016-03-23</h2>
<h2 id="2016-03-23">2016-03-23</h2>
<ul>
<li>Abenet is having problems saving group memberships, and she gets this error: <a href="https://gist.github.com/alanorth/87281c061c2de57b773e">https://gist.github.com/alanorth/87281c061c2de57b773e</a></li>
</ul>
@ -258,18 +258,18 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<li>I can reproduce the same error on DSpace Test and on my Mac</li>
<li>Looks to be an issue with the Atmire modules, I've submitted a ticket to their tracker.</li>
</ul>
<h2 id="20160324">2016-03-24</h2>
<h2 id="2016-03-24">2016-03-24</h2>
<ul>
<li>Atmire sent a patch for the group saving issue: <a href="https://github.com/ilri/DSpace/pull/201">https://github.com/ilri/DSpace/pull/201</a></li>
<li>I tested it locally and it works, so I merged it to <code>5_x-prod</code> and will deploy on CGSpace this week</li>
</ul>
<h2 id="20160325">2016-03-25</h2>
<h2 id="2016-03-25">2016-03-25</h2>
<ul>
<li>Having problems with Listings and Reports, seems to be caused by a rogue reference to <code>dc.type.output</code></li>
<li>This is the error we get when we proceed to the second page of Listings and Reports: <a href="https://gist.github.com/alanorth/b2d7fb5b82f94898caaf">https://gist.github.com/alanorth/b2d7fb5b82f94898caaf</a></li>
<li>Commenting out the line works, but I haven't figured out the proper syntax for referring to <code>dc.type.*</code></li>
</ul>
<h2 id="20160328">2016-03-28</h2>
<h2 id="2016-03-28">2016-03-28</h2>
<ul>
<li>Look into enabling the embargo during item submission, see: <a href="https://wiki.duraspace.org/display/DSDOC5x/Embargo#Embargo-SubmissionProcess">https://wiki.duraspace.org/display/DSDOC5x/Embargo#Embargo-SubmissionProcess</a></li>
<li>Seems we only want <code>AccessStep</code> because <code>UploadWithEmbargoStep</code> disables the ability to edit embargos at the item level</li>
@ -281,7 +281,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<li>This pull request simply updates the config for the dc.type.output → dc.type change that was made last week: <a href="https://github.com/ilri/DSpace/pull/204">https://github.com/ilri/DSpace/pull/204</a></li>
<li>Deploy robots.txt fix, embargo for item submissions, and listings and reports fix on CGSpace</li>
</ul>
<h2 id="20160329">2016-03-29</h2>
<h2 id="2016-03-29">2016-03-29</h2>
<ul>
<li>Skype meeting with Peter and Addis team to discuss metadata changes for Dublin Core, CGcore, and CGSpace-specific fields</li>
<li>We decided to proceed with some deletes first, then identify CGSpace-specific fields to clean/move to <code>cg.*</code>, and then worry about broader changes to DC</li>

View File

@ -29,7 +29,7 @@ After running DSpace for over five years I&#39;ve never needed to look in any ot
This will save us a few gigs of backup space we&#39;re paying for on S3
Also, I noticed the checker log has some errors we should pay attention to:
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -110,7 +110,7 @@ Also, I noticed the checker log has some errors we should pay attention to:
</p>
</header>
<h2 id="20160404">2016-04-04</h2>
<h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
@ -146,7 +146,7 @@ java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290
<li>Looks like cron will read limits from <code>/etc/security/limits.*</code> so we can do something for the tomcat7 user there (see the sketch below)</li>
<li>Submit pull request for Tomcat 7 limits in Ansible dspace role (<a href="https://github.com/ilri/rmg-ansible-public/pull/30">#30</a>)</li>
</ul>
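<p>A sketch of such an entry, assuming the limit in question is open files (<code>nofile</code>) and a drop-in under <code>/etc/security/limits.d/</code>:</p>
<pre><code>tomcat7 soft nofile 16384
tomcat7 hard nofile 16384
</code></pre>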
<h2 id="20160405">2016-04-05</h2>
<h2 id="2016-04-05">2016-04-05</h2>
<ul>
<li>Reduce Amazon S3 storage used for logs from 46 GB to 6 GB by deleting a bunch of logs we don't need!</li>
</ul>
@ -159,7 +159,7 @@ java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290
<li>Also, adjust the cron jobs for backups so they only backup <code>dspace.log</code> and some stats files (.dat)</li>
<li>Try to do some metadata field migrations using the Atmire batch UI (<code>dc.Species</code> → <code>cg.species</code>) but it took several hours and even missed a few records</li>
</ul>
<h2 id="20160406">2016-04-06</h2>
<h2 id="2016-04-06">2016-04-06</h2>
<ul>
<li>A better way to move metadata on this scale is via SQL, for example <code>dc.type.output</code> → <code>dc.type</code> (their IDs in the metadatafieldregistry are 66 and 109, respectively):</li>
</ul>
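<p>A sketch of that migration in psql, using the IDs above (a <code>resource_type_id=2</code> filter could be added to restrict it to items):</p>
<pre><code>dspace=# UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66;
</code></pre>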
@ -169,7 +169,7 @@ UPDATE 40852
<li>After that an <code>index-discovery -bf</code> is required</li>
<li>Start working on metadata migrations, add 25 or so new metadata fields to CGSpace</li>
</ul>
<h2 id="20160407">2016-04-07</h2>
<h2 id="2016-04-07">2016-04-07</h2>
<ul>
<li>Write shell script to do the migration of fields: <a href="https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b">https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b</a></li>
<li>Testing with a few fields it seems to work well:</li>
@ -181,12 +181,12 @@ UPDATE metadatavalue SET metadata_field_id=202 WHERE metadata_field_id=72
UPDATE 21420
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51258
</code></pre><h2 id="20160408">2016-04-08</h2>
</code></pre><h2 id="2016-04-08">2016-04-08</h2>
<ul>
<li>Discuss metadata renaming with Abenet, we decided it's better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF</li>
<li>I've e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change</li>
</ul>
<h2 id="20160410">2016-04-10</h2>
<h2 id="2016-04-10">2016-04-10</h2>
<ul>
<li>Looking at the DOI issue <a href="https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=678507860">reported by Leroy from CIAT a few weeks ago</a></li>
<li>It seems the <code>dx.doi.org</code> URLs are much more proper in our repository!</li>
@ -204,12 +204,12 @@ dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and t
</code></pre><ul>
<li>I will manually edit the <code>dc.identifier.doi</code> in <a href="https://cgspace.cgiar.org/handle/10568/72509?show=full">10568/72509</a> and tweet the link, then check back in a week to see if the donut gets updated</li>
</ul>
<h2 id="20160411">2016-04-11</h2>
<h2 id="2016-04-11">2016-04-11</h2>
<ul>
<li>The donut is already updated and shows the correct number now</li>
<li>CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we'd do it tentatively on Monday the 18th.</li>
</ul>
<h2 id="20160412">2016-04-12</h2>
<h2 id="2016-04-12">2016-04-12</h2>
<ul>
<li>Looking at quality of WLE data (<code>cg.subject.iwmi</code>) in SQL:</li>
</ul>
@ -235,17 +235,17 @@ DELETE 226
<li>Unfortunately this isn't a very good solution, because Listings and Reports config should allow us to filter on <code>dc.type.*</code> but the documentation isn't very clear and I couldn't reach Atmire today</li>
<li>We want to do the <code>dc.type.output</code> move on CGSpace anyways, but we should wait as it might affect other external people!</li>
</ul>
<h2 id="20160414">2016-04-14</h2>
<h2 id="2016-04-14">2016-04-14</h2>
<ul>
<li>Communicate with Macaroni Bros again about <code>dc.type</code></li>
<li>Help Sisay with some rsync and Linux stuff</li>
<li>Notify CIAT people of metadata changes (I had forgotten them last week)</li>
</ul>
<h2 id="20160415">2016-04-15</h2>
<h2 id="2016-04-15">2016-04-15</h2>
<ul>
<li>DSpace Test had crashed, so I ran all system updates, rebooted, and re-deployed DSpace code</li>
</ul>
<h2 id="20160418">2016-04-18</h2>
<h2 id="2016-04-18">2016-04-18</h2>
<ul>
<li>Talk to CIAT people about their portal again</li>
<li>Start looking more at the fields we want to delete</li>
@ -316,7 +316,7 @@ javax.ws.rs.WebApplicationException
<li>Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)</li>
<li>After restarting Tomcat a few more of these errors were logged but the application was up</li>
</ul>
<h2 id="20160419">2016-04-19</h2>
<h2 id="2016-04-19">2016-04-19</h2>
<ul>
<li>Get handles for items that are using a given metadata field, ie <code>dc.Species.animal</code> (105):</li>
</ul>
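<p>A sketch of such a query against the DSpace 5 schema (assuming the usual <code>handle</code> and <code>metadatavalue</code> tables):</p>
<pre><code>dspace=# select handle from handle where resource_type_id=2 and resource_id in (select resource_id from metadatavalue where metadata_field_id=105);
</code></pre>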
@ -355,7 +355,7 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
</code></pre><ul>
<li>And then remove them from the metadata registry</li>
</ul>
<h2 id="20160420">2016-04-20</h2>
<h2 id="2016-04-20">2016-04-20</h2>
<ul>
<li>Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server</li>
<li>Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server</li>
@ -386,16 +386,16 @@ UPDATE 46075
<li>Looks like this issue was noted and fixed in DSpace 5.5 (we're on 5.1): <a href="https://jira.duraspace.org/browse/DS-2936">https://jira.duraspace.org/browse/DS-2936</a></li>
<li>I've sent a message to Atmire asking about compatibility with DSpace 5.5</li>
</ul>
<h2 id="20160421">2016-04-21</h2>
<h2 id="2016-04-21">2016-04-21</h2>
<ul>
<li>Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)</li>
<li>Atmire responded with DSpace 5.5 compatible versions for their modules, so I'll start testing those in a few weeks</li>
</ul>
<h2 id="20160422">2016-04-22</h2>
<h2 id="2016-04-22">2016-04-22</h2>
<ul>
<li>Import 95 records into <a href="https://cgspace.cgiar.org/handle/10568/42219">CTA's Agrodok collection</a></li>
</ul>
<h2 id="20160426">2016-04-26</h2>
<h2 id="2016-04-26">2016-04-26</h2>
<ul>
<li>Test embargo during item upload</li>
<li>Seems to be working but the help text is misleading as to the date format</li>
@ -409,7 +409,7 @@ UPDATE 46075
</ul>
</li>
</ul>
<h2 id="20160427">2016-04-27</h2>
<h2 id="2016-04-27">2016-04-27</h2>
<ul>
<li>I woke up to ten or fifteen &ldquo;up&rdquo; and &ldquo;down&rdquo; emails from the monitoring website</li>
<li>Looks like the last one was &ldquo;down&rdquo; from about four hours ago</li>
@ -451,12 +451,12 @@ dspace.log.2016-04-27:7271
<li>Currently running on DSpace Test, we'll give it a few days before we adjust CGSpace</li>
<li>CGSpace down, restarted tomcat and it's back up</li>
</ul>
<h2 id="20160428">2016-04-28</h2>
<h2 id="2016-04-28">2016-04-28</h2>
<ul>
<li>Problems with stability again. I've blocked access to <code>/rest</code> for now to see if the number of errors in the log files drop</li>
<li>Later we could maybe start logging access to <code>/rest</code> and perhaps whitelist some IPs&hellip;</li>
</ul>
<h2 id="20160430">2016-04-30</h2>
<h2 id="2016-04-30">2016-04-30</h2>
<ul>
<li>Logs for today and yesterday have zero references to this REST error, so I'm going to open back up the REST API but log all requests</li>
</ul>

View File

@ -31,7 +31,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
# awk &#39;{print $1}&#39; /var/log/nginx/rest.log | uniq | wc -l
3168
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -112,7 +112,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
</p>
</header>
<h2 id="20160501">2016-05-01</h2>
<h2 id="2016-05-01">2016-05-01</h2>
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
@ -129,13 +129,13 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
<li>For now I'll block just the Ethiopian IP</li>
<li>The owner of that application has said that the <code>NaN</code> (not a number) is an error in his code and he'll fix it</li>
</ul>
<h2 id="20160503">2016-05-03</h2>
<h2 id="2016-05-03">2016-05-03</h2>
<ul>
<li>Update nginx to 1.10.x branch on CGSpace</li>
<li>Fix a reference to <code>dc.type.output</code> in Discovery that I had missed when we migrated to <code>dc.type</code> last month (<a href="https://github.com/ilri/DSpace/pull/223">#223</a>)</li>
</ul>
<p><img src="/cgspace-notes/2016/05/discovery-types.png" alt="Item type in Discovery results"></p>
<h2 id="20160506">2016-05-06</h2>
<h2 id="2016-05-06">2016-05-06</h2>
<ul>
<li>DSpace Test is down, <code>catalina.out</code> has lots of messages about heap space from some time yesterday (!)</li>
<li>It looks like Sisay was doing some batch imports</li>
@ -168,7 +168,7 @@ fi
</code></pre><ul>
<li>Seems to work well</li>
</ul>
<h2 id="20160510">2016-05-10</h2>
<h2 id="2016-05-10">2016-05-10</h2>
<ul>
<li>Start looking at more metadata migrations</li>
<li>There are lots of fields in <code>dcterms</code> namespace that look interesting, like:
@ -181,7 +181,7 @@ fi
<li>Looks like these were <a href="https://wiki.duraspace.org/display/DSDOC5x/Metadata+and+Bitstream+Format+Registries#MetadataandBitstreamFormatRegistries-DublinCoreTermsRegistry(DCTERMS)">added in DSpace 4</a> to allow for future work to make DSpace more flexible</li>
<li>CGSpace's <code>dc</code> registry has 96 items, and the default DSpace one has 73.</li>
</ul>
<h2 id="20160511">2016-05-11</h2>
<h2 id="2016-05-11">2016-05-11</h2>
<ul>
<li>
<p>Identify and propose the next phase of CGSpace fields to migrate:</p>
@ -213,7 +213,7 @@ fi
<li>I told her I would increase the limit temporarily tomorrow morning</li>
<li>Turns out she was able to decrease the size of the PDF so we didn't have to do anything</li>
</ul>
<h2 id="20160512">2016-05-12</h2>
<h2 id="2016-05-12">2016-05-12</h2>
<ul>
<li>Looks like the issue that Abenet was having a few days ago with &ldquo;Connection Reset&rdquo; in Firefox might be due to a Firefox 46 issue: <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1268775">https://bugzilla.mozilla.org/show_bug.cgi?id=1268775</a></li>
<li>I finally found a copy of the latest CG Core metadata guidelines and it looks like we can add a few more fields to our next migration:
@ -233,7 +233,7 @@ fi
<li>Found ~200 messed up CIAT values in <code>dc.publisher</code>:</li>
</ul>
<pre><code># select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=39 and text_value similar to &quot;% %&quot;;
</code></pre><h2 id="20160513">2016-05-13</h2>
</code></pre><h2 id="2016-05-13">2016-05-13</h2>
<ul>
<li>More theorizing about CGcore</li>
<li>Add two new fields:
@ -245,7 +245,7 @@ fi
<li><code>dc.place</code> is our own field, so it's easy to move</li>
<li>I've removed <code>dc.title.jtitle</code> from the list for now because there's no use moving it out of DC until we know where it will go (see discussion yesterday)</li>
</ul>
<h2 id="20160518">2016-05-18</h2>
<h2 id="2016-05-18">2016-05-18</h2>
<ul>
<li>Work on 707 CCAFS records</li>
<li>They have thumbnails on Flickr and elsewhere</li>
@ -257,7 +257,7 @@ fi
<li>So for the <code>hqdefault.jpg</code> ones I just take the UUID (-2) and use it as the filename</li>
<li>Before importing with SAFBuilder I tested adding &ldquo;__bundle:THUMBNAIL&rdquo; to the <code>filename</code> column and it works fine</li>
</ul>
<h2 id="20160519">2016-05-19</h2>
<h2 id="2016-05-19">2016-05-19</h2>
<ul>
<li>More quality control on <code>filename</code> field of CCAFS records to make processing in shell and SAFBuilder more reliable:</li>
</ul>
@ -274,7 +274,7 @@ fi
</li>
</ul>
<pre><code># select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
</code></pre><h2 id="20160520">2016-05-20</h2>
</code></pre><h2 id="2016-05-20">2016-05-20</h2>
<ul>
<li>More work on CCAFS Video and Images records</li>
<li>For SAFBuilder we need to modify filename column to have the thumbnail bundle:</li>
@ -290,14 +290,14 @@ fi
<li>A few miscellaneous fixes for XMLUI display niggles (spaces in item lists and link target <code>_black</code>): <a href="https://github.com/ilri/DSpace/pull/224">#224</a></li>
<li>Work on configuration changes for Phase 2 metadata migrations</li>
</ul>
<h2 id="20160523">2016-05-23</h2>
<h2 id="2016-05-23">2016-05-23</h2>
<ul>
<li>Try to import the CCAFS Images and Videos to CGSpace but had some issues with LibreOffice and OpenRefine</li>
<li>LibreOffice excludes empty cells when it exports and all the fields shift over to the left and cause URLs to go to Subjects, etc.</li>
<li>Google Docs does this better, but somehow reorders the rows and when I paste the thumbnail/filename row in they don't match!</li>
<li>I will have to try later</li>
</ul>
<h2 id="20160530">2016-05-30</h2>
<h2 id="2016-05-30">2016-05-30</h2>
<ul>
<li>Export CCAFS video and image records from DSpace Test using the migrate option (<code>-m</code>):</li>
</ul>
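<p>A sketch of that export invocation (the handle and destination are examples):</p>
<pre><code>$ /home/dspacetest.cgiar.org/bin/dspace export -t COLLECTION -i 10568/79355 -d /tmp/ccafs-images -n 0 -m
</code></pre>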
@ -320,7 +320,7 @@ $ /home/cgspace.cgiar.org/bin/dspace metadata-import -e aorth@mjanja.ch -f ~/CTA
<li>Discovery indexing took a few hours for some reason, and after that I started the <code>index-authority</code> script</li>
</ul>
<pre><code>$ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; /home/cgspace.cgiar.org/bin/dspace index-authority
</code></pre><h2 id="20160531">2016-05-31</h2>
</code></pre><h2 id="2016-05-31">2016-05-31</h2>
<ul>
<li>The <code>index-authority</code> script ran over night and was finished in the morning</li>
<li>Hopefully this was because we haven't been running it regularly and it will speed up next time</li>

View File

@ -31,7 +31,7 @@ This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRec
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -112,7 +112,7 @@ Working on second phase of metadata migration, looks like this will work for mov
</p>
</header>
<h2 id="20160601">2016-06-01</h2>
<h2 id="2016-06-01">2016-06-01</h2>
<ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
@ -128,7 +128,7 @@ UPDATE 14
</code></pre><ul>
<li>Fix a few minor miscellaneous issues in <code>dspace.cfg</code> (<a href="https://github.com/ilri/DSpace/pull/227">#227</a>)</li>
</ul>
<h2 id="20160602">2016-06-02</h2>
<h2 id="2016-06-02">2016-06-02</h2>
<ul>
<li>Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with <code>cg.coverage.admin-unit</code></li>
<li>Seems that the Browse configuration in <code>dspace.cfg</code> can't handle the &lsquo;-&rsquo; in the field name:</li>
@ -141,7 +141,7 @@ UPDATE 14
<li>I found a thread on the mailing list talking about it and there is bug report and a patch: <a href="https://jira.duraspace.org/browse/DS-2740">https://jira.duraspace.org/browse/DS-2740</a></li>
<li>The patch applies successfully on DSpace 5.1 so I will try it later</li>
</ul>
<h2 id="20160603">2016-06-03</h2>
<h2 id="2016-06-03">2016-06-03</h2>
<ul>
<li>Investigating the CCAFS authority issue, I exported the metadata for the Videos collection</li>
<li>The top two authors are:</li>
@ -197,13 +197,13 @@ UPDATE 960
</code></pre><ul>
<li>That would only be for the &ldquo;Browse by&rdquo; function&hellip; so we'll have to see what effect that has later</li>
</ul>
<h2 id="20160604">2016-06-04</h2>
<h2 id="2016-06-04">2016-06-04</h2>
<ul>
<li>Re-sync DSpace Test with CGSpace and perform test of metadata migration again</li>
<li>Run phase two of metadata migrations on CGSpace (see the <a href="https://gist.github.com/alanorth/1a730bec5ac9457a8fb0e3e72c98d09c">migration notes</a>)</li>
<li>Run all system updates and reboot CGSpace server</li>
</ul>
<h2 id="20160607">2016-06-07</h2>
<h2 id="2016-06-07">2016-06-07</h2>
<ul>
<li>Figured out how to export a list of the unique values from a metadata field ordered by count:</li>
</ul>
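<p>One way to sketch that in psql (the field ID here is just an example):</p>
<pre><code>dspace=# \copy (select text_value, count(*) from metadatavalue where metadata_field_id=203 group by text_value order by count desc) to /tmp/values.csv with csv;
</code></pre>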
@ -230,7 +230,7 @@ UPDATE 960
<p>Looks like OAI is kinda obtuse for this, and if we use ContentDM's API we'll be able to access their internal field names (rather than trying to figure out how they stuffed them into various, repeated Dublin Core fields)</p>
</li>
</ul>
<h2 id="20160608">2016-06-08</h2>
<h2 id="2016-06-08">2016-06-08</h2>
<ul>
<li>Discuss controlled vocabularies for ~28 fields</li>
<li>Looks like this is all we need: <a href="https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies">https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ConfiguringControlledVocabularies</a></li>
@ -243,13 +243,13 @@ UPDATE 960
<li>In other news, I found out that the About page that we haven't been using lives in <code>dspace/config/about.xml</code>, so now we can update the text</li>
<li>File bug about <code>closed=&quot;true&quot;</code> attribute of controlled vocabularies not working: <a href="https://jira.duraspace.org/browse/DS-3238">https://jira.duraspace.org/browse/DS-3238</a></li>
</ul>
<h2 id="20160609">2016-06-09</h2>
<h2 id="2016-06-09">2016-06-09</h2>
<ul>
<li>Atmire explained that the <code>atmire.orcid.id</code> field doesn't exist in the schema, as it actually comes from the authority cache during XMLUI run time</li>
<li>This means we don't see it when harvesting via OAI or REST, for example</li>
<li>They opened a feature ticket on the DSpace tracker to ask for support of this: <a href="https://jira.duraspace.org/browse/DS-3239">https://jira.duraspace.org/browse/DS-3239</a></li>
</ul>
<h2 id="20160610">2016-06-10</h2>
<h2 id="2016-06-10">2016-06-10</h2>
<ul>
<li>Investigating authority confidences</li>
<li>It looks like the values are documented in <code>Choices.java</code></li>
@ -269,16 +269,16 @@ UPDATE 960
<li>Merge item display tweaks from earlier this week (<a href="https://github.com/ilri/DSpace/pull/231">#231</a>)</li>
<li>Merge controlled vocabulary functionality for subregions (<a href="https://github.com/ilri/DSpace/pull/238">#238</a>)</li>
</ul>
<h2 id="20160611">2016-06-11</h2>
<h2 id="2016-06-11">2016-06-11</h2>
<ul>
<li>Merge controlled vocabulary for sponsorship field (<a href="https://github.com/ilri/DSpace/pull/239">#239</a>)</li>
<li>Fix character encoding issues for animal breed lookup that I merged yesterday</li>
</ul>
<h2 id="20160617">2016-06-17</h2>
<h2 id="2016-06-17">2016-06-17</h2>
<ul>
<li>Linode has free RAM upgrades for their 13th birthday so I migrated DSpace Test (4→8GB of RAM)</li>
</ul>
<h2 id="20160618">2016-06-18</h2>
<h2 id="2016-06-18">2016-06-18</h2>
<ul>
<li>
<p>Clean up titles and hints in <code>input-forms.xml</code> to use title/sentence case and a few more consistency things (<a href="https://github.com/ilri/DSpace/pull/241">#241</a>)</p>
@ -308,7 +308,7 @@ UPDATE 960
<p>Need to run <code>fix-metadata-values.py</code> and then <code>delete-metadata-values.py</code></p>
</li>
</ul>
<h2 id="20160620">2016-06-20</h2>
<h2 id="2016-06-20">2016-06-20</h2>
<ul>
<li>CGSpace's HTTPS certificate expired last night and I didn't notice, had to renew:</li>
</ul>
@ -316,7 +316,7 @@ UPDATE 960
</code></pre><ul>
<li>I really need to fix that cron job&hellip;</li>
</ul>
<h2 id="20160624">2016-06-24</h2>
<h2 id="2016-06-24">2016-06-24</h2>
<ul>
<li>Run the replacements/deletes for <code>dc.description.sponsorship</code> (investors) on CGSpace:</li>
</ul>
@ -332,7 +332,7 @@ $ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.spons
<li>Add new sponsors to controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/244">#244</a>)</li>
<li>Refine submission form labels and hints</li>
</ul>
<h2 id="20160628">2016-06-28</h2>
<h2 id="2016-06-28">2016-06-28</h2>
<ul>
<li>Testing the cleanup of <code>dc.contributor.corporate</code> with 13 deletions and 121 replacements</li>
<li>There are still ~97 fields that weren't indicated to do anything</li>
@ -342,7 +342,7 @@ $ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.spons
</code></pre><ul>
<li>Re-evaluate <code>dc.contributor.corporate</code> and it seems we will move it to <code>dc.contributor.author</code> as this is more in line with how editors are actually using it</li>
</ul>
<h2 id="20160629">2016-06-29</h2>
<h2 id="2016-06-29">2016-06-29</h2>
<ul>
<li>Test run of <code>migrate-fields.sh</code> with the following re-mappings:</li>
</ul>
@ -371,7 +371,7 @@ $ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-D
<li>Run all system updates on the servers and reboot</li>
<li>Start working on config changes for phase three of the metadata migrations</li>
</ul>
<h2 id="20160630">2016-06-30</h2>
<h2 id="2016-06-30">2016-06-30</h2>
<ul>
<li>Wow, there are 95 authors in the database who have &lsquo;,&rsquo; at the end of their name:</li>
</ul>

View File

@ -41,7 +41,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
In this case the select query was showing 95 results before the update
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -122,7 +122,7 @@ In this case the select query was showing 95 results before the update
</p>
</header>
<h2 id="20160701">2016-07-01</h2>
<h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
@ -136,15 +136,15 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
</ul>
<h2 id="20160702">2016-07-02</h2>
<h2 id="2016-07-02">2016-07-02</h2>
<ul>
<li>Comment on DSpace Jira ticket about author lookup search text (<a href="https://jira.duraspace.org/browse/DS-2329">DS-2329</a>)</li>
</ul>
<h2 id="20160704">2016-07-04</h2>
<h2 id="2016-07-04">2016-07-04</h2>
<ul>
<li>Seems the database's author authority values mean nothing without the <code>authority</code> Solr core from the host where they were created!</li>
</ul>
<h2 id="20160705">2016-07-05</h2>
<h2 id="2016-07-05">2016-07-05</h2>
<ul>
<li>Amend <code>backup-solr.sh</code> script so it backs up the entire Solr folder</li>
<li>We <em>really</em> only need <code>statistics</code> and <code>authority</code> but meh</li>
@ -157,7 +157,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<li>I tested the <a href="https://jira.duraspace.org/browse/DS-2740">patch for DS-2740</a> that I had found last month and it seems to work</li>
<li>I will merge it to <code>5_x-prod</code></li>
</ul>
<h2 id="20160706">2016-07-06</h2>
<h2 id="2016-07-06">2016-07-06</h2>
<ul>
<li>Delete 23 blank metadata values from CGSpace:</li>
</ul>
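<p>A sketch of that delete, assuming the blanks are empty strings in <code>text_value</code>:</p>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 23
</code></pre>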
@ -186,22 +186,22 @@ $ ./delete-metadata-values.py -f dc.contributor.affiliation -i Affiliations-Dele
</code></pre><ul>
<li>I then ran all server updates and rebooted the server</li>
</ul>
<h2 id="20160711">2016-07-11</h2>
<h2 id="2016-07-11">2016-07-11</h2>
<ul>
<li>Doing some author cleanups from Peter and Abenet:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/Authors-Fix-205-UTF8.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
$ ./delete-metadata-values.py -f dc.contributor.author -i /tmp/Authors-Delete-UTF8.csv -m 3 -u dspacetest -d dspacetest -p fuuu
</code></pre><h2 id="20160713">2016-07-13</h2>
</code></pre><h2 id="2016-07-13">2016-07-13</h2>
<ul>
<li>Run the author cleanups on CGSpace and start a full Discovery re-index</li>
</ul>
<h2 id="20160714">2016-07-14</h2>
<h2 id="2016-07-14">2016-07-14</h2>
<ul>
<li>Test LDAP settings for new root LDAP</li>
<li>Seems to work when binding as a top-level user</li>
</ul>
<h2 id="20160718">2016-07-18</h2>
<h2 id="2016-07-18">2016-07-18</h2>
<ul>
<li>Adjust identifiers in XMLUI item display to be more prominent</li>
<li>Add species and breed to the XMLUI item display</li>
@ -226,12 +226,12 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
proxy_pass http://127.0.0.1:8443;
deny 70.32.99.142;
}
</code></pre><h2 id="20160721">2016-07-21</h2>
</code></pre><h2 id="2016-07-21">2016-07-21</h2>
<ul>
<li>Mitigate the <a href="https://httpoxy.org">HTTPoxy</a> vulnerability for Tomcat etc in nginx: <a href="https://github.com/ilri/rmg-ansible-public/pull/38">https://github.com/ilri/rmg-ansible-public/pull/38</a></li>
<li>Unblock 70.32.99.142 from <code>/rest</code> as it has been blocked for a few days</li>
</ul>
<h2 id="20160722">2016-07-22</h2>
<h2 id="2016-07-22">2016-07-22</h2>
<ul>
<li>Help Paola from CCAFS with thumbnails for batch uploads</li>
<li>She has been struggling to get the dimensions right, and manually enlarging smaller thumbnails, renaming PNGs to JPG, etc</li>
@ -268,7 +268,7 @@ index.authority.ignore-variants=true
</code></pre><ul>
<li>After re-indexing and clearing the XMLUI cache nothing has changed</li>
</ul>
<h2 id="20160725">2016-07-25</h2>
<h2 id="2016-07-25">2016-07-25</h2>
<ul>
<li>Trying a few more settings (plus reindex) for Discovery on DSpace Test:</li>
</ul>
@ -292,7 +292,7 @@ discovery.index.authority.ignore-variants=true
<li>Re-sync DSpace Test with CGSpace</li>
<li>I noticed that our backup scripts don't send Solr cores to S3 so I amended the script</li>
</ul>
<h2 id="20160731">2016-07-31</h2>
<h2 id="2016-07-31">2016-07-31</h2>
<ul>
<li>Work on removing Dryland Systems and Humidtropics subjects from Discovery sidebar and Browse by</li>
<li>Also change &ldquo;Subjects&rdquo; to &ldquo;AGROVOC keywords&rdquo; in Discovery sidebar/search and Browse by (<a href="https://github.com/ilri/DSpace/issues/257">#257</a>)</li>

View File

@ -39,7 +39,7 @@ $ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -120,7 +120,7 @@ $ git rebase -i dspace-5.5
</p>
</header>
<h2 id="20160801">2016-08-01</h2>
<h2 id="2016-08-01">2016-08-01</h2>
<ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
@ -141,33 +141,33 @@ $ git rebase -i dspace-5.5
<li>Eventually I just turned on git rerere and solved the conflicts and completed the 403 commit rebase</li>
<li>The 5.5 code now builds but doesn't run (white page in Tomcat)</li>
</ul>
<h2 id="20160802">2016-08-02</h2>
<h2 id="2016-08-02">2016-08-02</h2>
<ul>
<li>Ask Atmire for help with DSpace 5.5 issue</li>
<li>Vanilla DSpace 5.5 deploys and runs fine</li>
<li>Playing with DSpace in Ubuntu 16.04 and Tomcat 7</li>
<li>Everything is still fucked up, even vanilla DSpace 5.5</li>
</ul>
<h2 id="20160804">2016-08-04</h2>
<h2 id="2016-08-04">2016-08-04</h2>
<ul>
<li>Ask on DSpace mailing list about duplicate authors, Discovery and author text values</li>
<li>Atmire responded with some new DSpace 5.5 ready versions to try for their modules</li>
</ul>
<h2 id="20160805">2016-08-05</h2>
<h2 id="2016-08-05">2016-08-05</h2>
<ul>
<li>Fix item display incorrectly displaying Species when Breeds were present (<a href="https://github.com/ilri/DSpace/pull/260">#260</a>)</li>
<li>Experiment with fixing more authors, like Delia Grace:</li>
</ul>
<pre><code>dspacetest=# update metadatavalue set authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where metadata_field_id=3 and text_value='Grace, D.';
</code></pre><h2 id="20160806">2016-08-06</h2>
</code></pre><h2 id="2016-08-06">2016-08-06</h2>
<ul>
<li>Finally figured out how to remove &ldquo;View/Open&rdquo; and &ldquo;Bitstreams&rdquo; from the item view</li>
</ul>
<h2 id="20160807">2016-08-07</h2>
<h2 id="2016-08-07">2016-08-07</h2>
<ul>
<li>Start working on Ubuntu 16.04 Ansible playbook for Tomcat 8, PostgreSQL 9.5, Oracle Java 8, etc</li>
</ul>
<h2 id="20160808">2016-08-08</h2>
<h2 id="2016-08-08">2016-08-08</h2>
<ul>
<li>Still troubleshooting Atmire modules on DSpace 5.5</li>
<li>Vanilla DSpace 5.5 works on Tomcat 7&hellip;</li>
@ -190,13 +190,13 @@ $ ln -sv ~/dspace/webapps/oai /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/oai
$ ln -sv ~/dspace/webapps/jspui /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/jspui
$ ln -sv ~/dspace/webapps/rest /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/rest
$ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/solr
</code></pre><h2 id="20160809">2016-08-09</h2>
</code></pre><h2 id="2016-08-09">2016-08-09</h2>
<ul>
<li>More tests of Atmire's 5.5 modules on a clean, working instance of <code>5_x-prod</code></li>
<li>Still fails, though perhaps differently than before (Flyway): <a href="https://gist.github.com/alanorth/5d49c45a16efd7c6bc1e6642e66118b2">https://gist.github.com/alanorth/5d49c45a16efd7c6bc1e6642e66118b2</a></li>
<li>More work on Tomcat 8 and Java 8 stuff for Ansible playbooks</li>
</ul>
<h2 id="20160810">2016-08-10</h2>
<h2 id="2016-08-10">2016-08-10</h2>
<ul>
<li>Turns out DSpace 5.x isn't ready for Tomcat 8: <a href="https://jira.duraspace.org/browse/DS-3092">https://jira.duraspace.org/browse/DS-3092</a></li>
<li>So we'll need to use Tomcat 7 + Java 8 on Ubuntu 16.04</li>
@ -204,27 +204,27 @@ $ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/sol
<li>Merge pull request for fixing the type Discovery index to use <code>dc.type</code> (<a href="https://github.com/ilri/DSpace/pull/262">#262</a>)</li>
<li>Merge pull request for removing &ldquo;Bitstream&rdquo; text from item display, as it confuses users and isn't necessary (<a href="https://github.com/ilri/DSpace/pull/263">#263</a>)</li>
</ul>
<h2 id="20160811">2016-08-11</h2>
<h2 id="2016-08-11">2016-08-11</h2>
<ul>
<li>Finally got DSpace (5.5) running on Ubuntu 16.04, Tomcat 7, Java 8, PostgreSQL 9.5 via the updated Ansible stuff</li>
</ul>
<p><img src="/cgspace-notes/2016/08/dspace55-ubuntu16.04.png" alt="DSpace 5.5 on Ubuntu 16.04, Tomcat 7, Java 8, PostgreSQL 9.5"></p>
<h2 id="20160814">2016-08-14</h2>
<h2 id="2016-08-14">2016-08-14</h2>
<ul>
<li>Update Mirage 2 build notes for Ubuntu 16.04: <a href="https://gist.github.com/alanorth/2cf9c15834dc68a514262fcb04004cb0">https://gist.github.com/alanorth/2cf9c15834dc68a514262fcb04004cb0</a></li>
</ul>
<h2 id="20160815">2016-08-15</h2>
<h2 id="2016-08-15">2016-08-15</h2>
<ul>
<li>Notes on NodeJS + nginx + systemd: <a href="https://gist.github.com/alanorth/51acd476891c67dfe27725848cf5ace1">https://gist.github.com/alanorth/51acd476891c67dfe27725848cf5ace1</a></li>
</ul>
<p><img src="/cgspace-notes/2016/08/nodejs-nginx.png" alt="ExpressJS running behind nginx"></p>
<h2 id="20160816">2016-08-16</h2>
<h2 id="2016-08-16">2016-08-16</h2>
<ul>
<li>Troubleshoot Paramiko connection issues with Ansible on ILRI servers: <a href="https://github.com/ilri/rmg-ansible-public/issues/37">#37</a></li>
<li>Turns out we need to add some MACs to our <code>sshd_config</code>: <code>hmac-sha2-512,hmac-sha2-256</code> (see the sketch after this list)</li>
<li>Update DSpace Test's Java to version 8 to start testing this configuration (<a href="https://wiki.apache.org/solr/ShawnHeisey">seeing as Solr recommends it</a>)</li>
</ul>
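<ul>
<li>A sketch of the relevant <code>sshd_config</code> line (path assumes the stock OpenSSH location):</li>
</ul>
<pre><code># /etc/ssh/sshd_config
MACs hmac-sha2-512,hmac-sha2-256
</code></pre>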
<h2 id="20160817">2016-08-17</h2>
<h2 id="2016-08-17">2016-08-17</h2>
<ul>
<li>More work on Let's Encrypt stuff for Ansible roles</li>
<li>Yesterday Atmire responded about DSpace 5.5 issues and asked me to try the <code>dspace database repair</code> command to fix Flyway issues</li>
@ -233,7 +233,7 @@ $ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/sol
<li>After removing the spring folder and running ant install again, <code>dspace database</code> works</li>
<li>I see there are missing and pending Flyway migrations, but running <code>dspace database repair</code> and <code>dspace database migrate</code> does nothing: <a href="https://gist.github.com/alanorth/41ed5abf2ff32d8ac9eedd1c3d015d70">https://gist.github.com/alanorth/41ed5abf2ff32d8ac9eedd1c3d015d70</a></li>
</ul>
<h2 id="20160818">2016-08-18</h2>
<h2 id="2016-08-18">2016-08-18</h2>
<ul>
<li>Fix &ldquo;CONGO,DR&rdquo; country name in <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/264">#264</a>)</li>
<li>Also need to fix existing records using the incorrect form in the database:</li>
@ -242,7 +242,7 @@ $ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/sol
</code></pre><ul>
<li>I asked a question on the DSpace mailing list about updating &ldquo;preferred&rdquo; forms of author names from ORCID</li>
</ul>
<h2 id="20160821">2016-08-21</h2>
<h2 id="2016-08-21">2016-08-21</h2>
<ul>
<li>A few days ago someone on the DSpace mailing list suggested I try <code>dspace dsrun org.dspace.authority.UpdateAuthorities</code> to update preferred author names from ORCID</li>
<li>If you set <code>auto-update-items=true</code> in <code>dspace/config/modules/solrauthority.cfg</code> it is supposed to update records it finds automatically (see the sketch after this list)</li>
@ -250,7 +250,7 @@ $ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/sol
<li>Still troubleshooting Atmire modules on DSpace 5.5</li>
<li>I sent them some new verbose logs: <a href="https://gist.github.com/alanorth/700748995649688148ceba89d760253e">https://gist.github.com/alanorth/700748995649688148ceba89d760253e</a></li>
</ul>
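<ul>
<li>A sketch of the configuration and manual run described above (the property name came from the mailing list, so treat it as unverified):</li>
</ul>
<pre><code># dspace/config/modules/solrauthority.cfg
auto-update-items = true

$ [dspace]/bin/dspace dsrun org.dspace.authority.UpdateAuthorities
</code></pre>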
<h2 id="20160822">2016-08-22</h2>
<h2 id="2016-08-22">2016-08-22</h2>
<ul>
<li>Database migrations are fine on DSpace 5.1:</li>
</ul>
@ -286,7 +286,7 @@ Database Driver: PostgreSQL Native Driver version PostgreSQL 9.1 JDBC4 (build 90
</code></pre><ul>
<li>So I'm not sure why they have problems when we move to DSpace 5.5 (even the 5.1 migrations themselves show as &ldquo;Missing&rdquo;)</li>
</ul>
<h2 id="20160823">2016-08-23</h2>
<h2 id="2016-08-23">2016-08-23</h2>
<ul>
<li>Help Paola from CCAFS with her thumbnails again</li>
<li>Talk to Atmire about the DSpace 5.5 issue, and it seems to be caused by a bug in FlywayDB</li>
@ -311,13 +311,13 @@ context:/jndi:/localhost/themes/0_CGIAR/sitemap.xmap - 136:77
<li>I tried with a small version bump to CUA but it didn't work (version <code>5.5-4.1.1-0</code>)</li>
<li>Also, I started looking into huge pages to prepare for PostgreSQL 9.5, but it seems Linode's kernels don't enable them</li>
</ul>
<h2 id="20160824">2016-08-24</h2>
<h2 id="2016-08-24">2016-08-24</h2>
<ul>
<li>Clean up and import 48 CCAFS records into DSpace Test</li>
<li>SQL to get all journal titles from dc.source (metadata field 55), excluding filename-like values, since DSpace apparently uses that field internally for filename shit, but we moved all our journal titles there a few months ago:</li>
</ul>
<pre><code>dspacetest=# select distinct text_value from metadatavalue where metadata_field_id=55 and text_value !~ '.*(\.pdf|\.png|\.PDF|\.Pdf|\.JPEG|\.jpg|\.JPG|\.jpeg|\.xls|\.rtf|\.docx?|\.potx|\.dotx|\.eqa|\.tiff|\.mp4|\.mp3|\.gif|\.zip|\.txt|\.pptx|\.indd|\.PNG|\.bmp|\.exe|org\.dspace\.app\.mediafilter).*';
</code></pre><h2 id="20160825">2016-08-25</h2>
</code></pre><h2 id="2016-08-25">2016-08-25</h2>
<ul>
<li>Atmire suggested adding a missing bean to <code>dspace/config/spring/api/atmire-cua.xml</code> but it doesn't help:</li>
</ul>
@ -347,7 +347,7 @@ $ JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot; /home/cgspace.cgiar.org/b
</code></pre><ul>
<li>Finally got DSpace 5.5 working with the Atmire modules after a few rounds of back and forth with Atmire devs</li>
</ul>
<h2 id="20160826">2016-08-26</h2>
<h2 id="2016-08-26">2016-08-26</h2>
<ul>
<li>CGSpace had issues tonight, not entirely crashing, but becoming unresponsive</li>
<li>The dspace log had this:</li>
@ -356,7 +356,7 @@ $ JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot; /home/cgspace.cgiar.org/b
</code></pre><ul>
<li>Related to /rest no doubt</li>
</ul>
<h2 id="20160827">2016-08-27</h2>
<h2 id="2016-08-27">2016-08-27</h2>
<ul>
<li>Run corrections for Delia Grace and <code>CONGO, DR</code>, and deploy August changes to CGSpace</li>
<li>Run all system updates and reboot the server</li>


@ -31,7 +31,7 @@ It looks like we might be able to use OUs now, instead of DCs:
$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -112,7 +112,7 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=or
</p>
</header>
<h2 id="20160901">2016-09-01</h2>
<h2 id="2016-09-01">2016-09-01</h2>
<ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
@ -203,7 +203,7 @@ dspacetest=# select distinct text_value, authority, confidence from metadatavalu
<li>After updating the Authority indexes (<code>bin/dspace index-authority</code>) everything looks good</li>
<li>Run authority updates on CGSpace</li>
</ul>
<h2 id="20160905">2016-09-05</h2>
<h2 id="2016-09-05">2016-09-05</h2>
<ul>
<li>After one week of logging TLS connections on CGSpace:</li>
</ul>
@ -222,7 +222,7 @@ TLSv1/EDH-RSA-DES-CBC3-SHA
</code></pre><ul>
<li>This gives you, for example: <code>Mainstreaming gender in agricultural R&amp;D.pdf__description:Brief</code></li>
</ul>
<h2 id="20160906">2016-09-06</h2>
<h2 id="2016-09-06">2016-09-06</h2>
<ul>
<li>Trying to import the records for CIAT from yesterday, but having filename encoding issues from their zip file</li>
<li>Create a zip on Mac OS X from a SAF bundle containing only one record with one PDF:
@ -258,7 +258,7 @@ TLSv1/EDH-RSA-DES-CBC3-SHA
<pre><code>$ ./safbuilder.sh -c /home/aorth/ciat-gender-2016-09-06/66601.csv
$ JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot; /home/cgspace.cgiar.org/bin/dspace import -a -e aorth@mjanja.ch -c 10568/66601 -s /home/aorth/ciat-gender-2016-09-06/SimpleArchiveFormat -m 66601.map
$ rm -rf ~/ciat-gender-2016-09-06/SimpleArchiveFormat/
</code></pre><h2 id="20160907">2016-09-07</h2>
</code></pre><h2 id="2016-09-07">2016-09-07</h2>
<ul>
<li>Erase and rebuild DSpace Test based on latest Ubuntu 16.04, PostgreSQL 9.5, and Java 8 stuff</li>
<li>Reading about PostgreSQL maintenance and it seems manual vacuuming is only for certain workloads, such as heavy update/write loads</li>
@ -272,7 +272,7 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
</code></pre><ul>
<li>Since CGSpace had crashed I quickly deployed the new LDAP settings before restarting Tomcat</li>
</ul>
<h2 id="20160913">2016-09-13</h2>
<h2 id="2016-09-13">2016-09-13</h2>
<ul>
<li>CGSpace crashed twice today, errors from <code>catalina.out</code>:</li>
</ul>
@ -281,7 +281,7 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
</code></pre><ul>
<li>I enabled logging of requests to <code>/rest</code> again</li>
</ul>
<h2 id="20160914">2016-09-14</h2>
<h2 id="2016-09-14">2016-09-14</h2>
<ul>
<li>CGSpace crashed again, errors from <code>catalina.out</code>:</li>
</ul>
@ -399,12 +399,12 @@ java.util.Map does not have a no-arg default constructor.
<li>So I'm going to bump the heap +512m and remove all the other experimental shit (and update Ansible!)</li>
<li>Increased JVM heap to 4096m on CGSpace (linode01)</li>
</ul>
<h2 id="20160915">2016-09-15</h2>
<h2 id="2016-09-15">2016-09-15</h2>
<ul>
<li>Looking at Google Webmaster Tools again, it seems the work I did on URL query parameters and blocking via the <code>X-Robots-Tag</code> HTTP header in March 2016 has had a positive effect on Google's index for CGSpace</li>
</ul>
<p><img src="/cgspace-notes/2016/09/google-webmaster-tools-index.png" alt="Google Webmaster Tools for CGSpace"></p>
<h2 id="20160916">2016-09-16</h2>
<h2 id="2016-09-16">2016-09-16</h2>
<ul>
<li>CGSpace crashed again, and there are TONS of heap space errors but the datestamps aren't on those lines so I'm not sure if they were yesterday:</li>
</ul>
@ -440,7 +440,7 @@ Exception in thread &quot;Thread-54216&quot; org.apache.solr.client.solrj.impl.H
</code></pre><ul>
<li>I've sent a message to Atmire about the Solr error to see if it's related to their batch update module</li>
</ul>
<h2 id="20160919">2016-09-19</h2>
<h2 id="2016-09-19">2016-09-19</h2>
<ul>
<li>Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:</li>
</ul>
@ -450,7 +450,7 @@ $ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2
<li>After that we need to take the top ~300 and make a controlled vocabulary for it</li>
<li>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>)</li>
</ul>
<h2 id="20160920">2016-09-20</h2>
<h2 id="2016-09-20">2016-09-20</h2>
<ul>
<li>Run all system updates on DSpace Test and reboot the server</li>
<li>Merge changes for sponsorship and affiliation controlled vocabularies (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>, <a href="https://github.com/ilri/DSpace/pull/268">#268</a>)</li>
@ -461,7 +461,7 @@ $ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2
<li>I need to read the docs and ask on the mailing list to see if we can tweak that</li>
<li>Generate a new list of sponsors from the database for Peter Ballantyne so we can clean them up and update the controlled vocabulary</li>
</ul>
<h2 id="20160921">2016-09-21</h2>
<h2 id="2016-09-21">2016-09-21</h2>
<ul>
<li>Turns out the Solr search logic switched from OR to AND in DSpace 6.0 and the change is easy to backport: <a href="https://jira.duraspace.org/browse/DS-2809">https://jira.duraspace.org/browse/DS-2809</a></li>
<li>We just need to set this in <code>dspace/solr/search/conf/schema.xml</code>:</li>
@ -490,11 +490,11 @@ $ ./delete-metadata-values.py -i sponsors-delete-8.csv -f dc.description.sponsor
<li>I need to run these and the others from a few days ago on CGSpace the next time we run updates</li>
<li>Also, I need to update the controlled vocab for sponsors based on these</li>
</ul>
<h2 id="20160922">2016-09-22</h2>
<h2 id="2016-09-22">2016-09-22</h2>
<ul>
<li>Update controlled vocabulary for sponsorship based on the latest corrected values from the database</li>
</ul>
<h2 id="20160925">2016-09-25</h2>
<h2 id="2016-09-25">2016-09-25</h2>
<ul>
<li>Merge accession date improvements for CUA module (<a href="https://github.com/ilri/DSpace/pull/275">#275</a>)</li>
<li>Merge addition of accession date to Discovery search filters (<a href="https://github.com/ilri/DSpace/pull/276">#276</a>)</li>
@ -520,7 +520,7 @@ OCSP Response Data:
</code></pre><ul>
<li>I've been monitoring this for almost two years in this GitHub issue: <a href="https://github.com/ilri/DSpace/issues/38">https://github.com/ilri/DSpace/issues/38</a></li>
</ul>
<h2 id="20160927">2016-09-27</h2>
<h2 id="2016-09-27">2016-09-27</h2>
<ul>
<li>Discuss fixing some ORCIDs for CCAFS author Sonja Vermeulen with Magdalena Haman</li>
<li>This author has a few variations:</li>
@ -546,7 +546,7 @@ UPDATE 101
<li>We can also replace the RSS and mail icons in community text!</li>
<li>Fix reference to <code>dc.type.*</code> in Atmire CUA module, as we now only index <code>dc.type</code> for &ldquo;Output type&rdquo;</li>
</ul>
<h2 id="20160928">2016-09-28</h2>
<h2 id="2016-09-28">2016-09-28</h2>
<ul>
<li>Make a placeholder pull request for <code>discovery.xml</code> changes (<a href="https://github.com/ilri/DSpace/pull/278">#278</a>), as I still need to test their effect on Atmire content analysis module</li>
<li>Make a placeholder pull request for Font Awesome changes (<a href="https://github.com/ilri/DSpace/pull/279">#279</a>), which replaces the GitHub image in the footer with an icon, and add style for RSS and @ icons that I will start replacing in community/collection HTML intros</li>
@ -565,7 +565,7 @@ dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d
</ul>
<pre><code>$ ./fix-metadata-values.py -i ilrisubjects-fix-32.csv -f cg.subject.ilri -t correct -m 203 -d dspace -u dspace -p fuuuu
$ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -m 203 -d dspace -u dspace -p fuuu
</code></pre><h2 id="20160929">2016-09-29</h2>
</code></pre><h2 id="2016-09-29">2016-09-29</h2>
<ul>
<li>Add <code>cg.identifier.ciatproject</code> to metadata registry in preparation for CIAT project tag</li>
<li>Merge changes for CIAT project tag (<a href="https://github.com/ilri/DSpace/pull/282">#282</a>)</li>
@ -573,7 +573,7 @@ $ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -
<li>People on DSpace mailing list gave me a query to get authors from certain collections:</li>
</ul>
<pre><code>dspacetest=# select distinct text_value from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/5472', '10568/5473')));
</code></pre><h2 id="20160930">2016-09-30</h2>
</code></pre><h2 id="2016-09-30">2016-09-30</h2>
<ul>
<li>Deny access to REST API's <code>find-by-metadata-field</code> endpoint to protect against an upstream security issue (DS-3250)</li>
<li>There is a patch but it is only for 5.5 and doesn't apply cleanly to 5.1</li>


@ -39,7 +39,7 @@ I exported a random item&#39;s metadata as CSV, deleted all columns except id an
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -120,7 +120,7 @@ I exported a random item&#39;s metadata as CSV, deleted all columns except id an
</p>
</header>
<h2 id="20161003">2016-10-03</h2>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
@ -141,7 +141,7 @@ I exported a random item&#39;s metadata as CSV, deleted all columns except id an
<ul>
<li>Looks like we'll just have to add the text to the About page (without a link) or add a separate page</li>
</ul>
<h2 id="20161004">2016-10-04</h2>
<h2 id="2016-10-04">2016-10-04</h2>
<ul>
<li>Start testing cleanups of authors that Peter sent last week</li>
<li>Out of 40,000+ rows, Peter had indicated corrections for ~3,200 of them—too many to look through carefully, so I did some basic quality checking:
@ -161,12 +161,12 @@ $ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -
<li>Generate list of unique authors in CCAFS collections:</li>
</ul>
<pre><code>dspacetest=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/32729', '10568/5472', '10568/5473', '10568/10288', '10568/70974', '10568/3547', '10568/3549', '10568/3531','10568/16890','10568/5470','10568/3546', '10568/36024', '10568/66581', '10568/21789', '10568/5469', '10568/5468', '10568/3548', '10568/71053', '10568/25167'))) group by text_value order by count desc) to /tmp/ccafs-authors.csv with csv;
</code></pre><h2 id="20161005">2016-10-05</h2>
</code></pre><h2 id="2016-10-05">2016-10-05</h2>
<ul>
<li>Work on more infrastructure cleanups for Ansible DSpace role</li>
<li>Clean up Let's Encrypt plumbing and submit pull request for rmg-ansible-public (<a href="https://github.com/ilri/rmg-ansible-public/pull/60">#60</a>)</li>
</ul>
<h2 id="20161006">2016-10-06</h2>
<h2 id="2016-10-06">2016-10-06</h2>
<ul>
<li>Nice! DSpace Test (linode02) is now having <code>java.lang.OutOfMemoryError: Java heap space</code> errors&hellip;</li>
<li>Heap space is 2048m, and we have 5GB of RAM being used for OS cache (Solr!) so let's just bump the memory to 3072m</li>
@ -177,7 +177,7 @@ $ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -
<li>Turns out the first PDF was exported from InDesign using CMYK and the second one was using sRGB</li>
<li>Run all system updates on DSpace Test and reboot it</li>
</ul>
<h2 id="20161008">2016-10-08</h2>
<h2 id="2016-10-08">2016-10-08</h2>
<ul>
<li>Re-deploy CGSpace with latest changes from late September and early October</li>
<li>Run fixes for ILRI subjects and delete blank metadata values:</li>
@ -193,13 +193,13 @@ DELETE 11
</code></pre><ul>
<li>Delete the 2GB <code>cron-filter-media.log</code> file, as it is just a log from a cron job that doesn't get rotated like normal log files (it has been accumulating for almost a year now, maybe)</li>
</ul>
<h2 id="20161014">2016-10-14</h2>
<h2 id="2016-10-14">2016-10-14</h2>
<ul>
<li>Run all system updates on DSpace Test and reboot server</li>
<li>Looking into some issues with Discovery filters in Atmire's content and usage analysis module after adjusting the filter class</li>
<li>Looks like changing the filters from <code>configuration.DiscoverySearchFilterFacet</code> to <code>configuration.DiscoverySearchFilter</code> breaks them in Atmire CUA module</li>
</ul>
<h2 id="20161017">2016-10-17</h2>
<h2 id="2016-10-17">2016-10-17</h2>
<ul>
<li>A bit more cleanup on the CCAFS authors, and run the corrections on DSpace Test:</li>
</ul>
@ -207,7 +207,7 @@ DELETE 11
</code></pre><ul>
<li>One observation is that there are still some old versions of names in the author lookup because authors appear in other communities (as we only corrected authors from CCAFS for this round)</li>
</ul>
<h2 id="20161018">2016-10-18</h2>
<h2 id="2016-10-18">2016-10-18</h2>
<ul>
<li>Start working on DSpace 5.5 porting work again:</li>
</ul>
@ -221,7 +221,7 @@ $ git rebase -i dspace-5.5
<li>Merge the <code>discovery.xml</code> cleanups (<a href="https://github.com/ilri/DSpace/pull/278">#278</a>)</li>
<li>Merge some minor edits to the distribution license (<a href="https://github.com/ilri/DSpace/pull/285">#285</a>)</li>
</ul>
<h2 id="20161019">2016-10-19</h2>
<h2 id="2016-10-19">2016-10-19</h2>
<ul>
<li>When we move to DSpace 5.5 we should also cherry pick some patches from 5.6 branch:
<ul>
@ -231,12 +231,12 @@ $ git rebase -i dspace-5.5
</ul>
</li>
</ul>
<h2 id="20161020">2016-10-20</h2>
<h2 id="2016-10-20">2016-10-20</h2>
<ul>
<li>Run CCAFS author corrections on CGSpace</li>
<li>Discovery reindexing took forever and kinda caused CGSpace to crash, so I ran all system updates and rebooted the server</li>
</ul>
<h2 id="20161025">2016-10-25</h2>
<h2 id="2016-10-25">2016-10-25</h2>
<ul>
<li>Move the LIVES community from the top level to the ILRI projects community</li>
</ul>
@ -279,7 +279,7 @@ dspace=# update metadatavalue set text_value = regexp_replace(text_value, '&lt;i
<li>And now that I start looking, I want to fix a bunch of links to popular sites that should be using HTTPS, like Twitter, Facebook, Google, Feed Burner, DOI, etc</li>
<li>I should look to see if any of those domains is sending an HTTP 301 or setting HSTS headers to their HTTPS domains, then just replace them</li>
</ul>
<h2 id="20161027">2016-10-27</h2>
<h2 id="2016-10-27">2016-10-27</h2>
<ul>
<li>Run Font Awesome fixes on DSpace Test:</li>
</ul>
@ -309,7 +309,7 @@ UPDATE 0
<ul>
<li>Run the same replacements on CGSpace</li>
</ul>
<h2 id="20161030">2016-10-30</h2>
<h2 id="2016-10-30">2016-10-30</h2>
<ul>
<li>Fix some messed up authors on CGSpace:</li>
</ul>


@ -23,7 +23,7 @@ Add dc.type to the output options for Atmire&#39;s Listings and Reports module (
Add dc.type to the output options for Atmire&#39;s Listings and Reports module (#286)
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -104,12 +104,12 @@ Add dc.type to the output options for Atmire&#39;s Listings and Reports module (
</p>
</header>
<h2 id="20161101">2016-11-01</h2>
<h2 id="2016-11-01">2016-11-01</h2>
<ul>
<li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>
<h2 id="20161102">2016-11-02</h2>
<h2 id="2016-11-02">2016-11-02</h2>
<ul>
<li>Migrate DSpace Test to DSpace 5.5 (<a href="https://gist.github.com/alanorth/61013895c6efe7095d7f81000953d1cf">notes</a>)</li>
<li>Run all updates on DSpace Test and reboot the server</li>
@ -144,11 +144,11 @@ java.lang.NullPointerException
</code></pre><ul>
<li>I will raise a ticket with Atmire to ask them</li>
</ul>
<h2 id="20161106">2016-11-06</h2>
<h2 id="2016-11-06">2016-11-06</h2>
<ul>
<li>After re-deploying and re-indexing I didn't see the same issue, and the indexing completed in 85 minutes, which is about how long it is supposed to take</li>
</ul>
<h2 id="20161107">2016-11-07</h2>
<h2 id="2016-11-07">2016-11-07</h2>
<ul>
<li>Horrible one-liner to get the Linode ID from certain Ansible host vars:</li>
</ul>
@ -166,7 +166,7 @@ COPY 22
</code></pre><ul>
<li>Add <code>AMR</code> to ILRI subjects and remove one duplicate instance of IITA in author affiliations controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/288">#288</a>)</li>
</ul>
<h2 id="20161108">2016-11-08</h2>
<h2 id="2016-11-08">2016-11-08</h2>
<ul>
<li>Atmire's Listings and Reports module seems to be broken on DSpace 5.5</li>
</ul>
@ -181,13 +181,13 @@ COPY 22
<li>Dump of the top ~200 authors in CGSpace:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=3 group by text_value order by count desc limit 210) to /tmp/210-authors.csv with csv;
</code></pre><h2 id="20161109">2016-11-09</h2>
</code></pre><h2 id="2016-11-09">2016-11-09</h2>
<ul>
<li>CGSpace crashed so I quickly ran system updates, applied one or two of the waiting changes from the <code>5_x-prod</code> branch, and rebooted the server</li>
<li>The error was <code>Timeout waiting for idle object</code> but I haven't looked into the Tomcat logs to see what happened</li>
<li>Also, I ran the corrections for CRPs from earlier this week</li>
</ul>
<h2 id="20161110">2016-11-10</h2>
<h2 id="2016-11-10">2016-11-10</h2>
<ul>
<li>Helping Megan Zandstra and CIAT with some questions about the REST API</li>
<li>Playing with <code>find-by-metadata-field</code>, this works:</li>
@ -283,7 +283,7 @@ $ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: applica
</code></pre><ul>
<li>Not sure what's going on, but Discovery shows 83 values, and database shows 85, so I'm going to reindex Discovery just in case</li>
</ul>
<h2 id="20161114">2016-11-14</h2>
<h2 id="2016-11-14">2016-11-14</h2>
<ul>
<li>I applied Atmire's suggestions to fix Listings and Reports for DSpace 5.5 and now it works</li>
<li>There were some issues with the <code>dspace/modules/jspui/pom.xml</code>, which is annoying because all I did was rebase our working 5.1 code on top of 5.5, meaning Atmire's installation procedure must have changed</li>
@ -319,7 +319,7 @@ X-Cocoon-Version: 2.2.0
<li>The first one gets a session, and any after that (within 60 seconds) will be internally mapped to the same session by Tomcat</li>
<li>This means that when Google or Baidu slam you with tens of concurrent connections they will all map to ONE internal session, which saves RAM!</li>
</ul>
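<ul>
<li>For reference, enabling the valve is a one-liner in Tomcat's <code>server.xml</code> (the user agent regex shown here is Tomcat's default, so adjust as needed):</li>
</ul>
<pre><code>&lt;Valve className=&quot;org.apache.catalina.valves.CrawlerSessionManagerValve&quot;
       crawlerUserAgents=&quot;.*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*&quot; /&gt;
</code></pre>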
<h2 id="20161115">2016-11-15</h2>
<h2 id="2016-11-15">2016-11-15</h2>
<ul>
<li>The Tomcat JVM heap looks really good after applying the Crawler Session Manager fix on DSpace Test last night:</li>
</ul>
@ -375,7 +375,7 @@ Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)&quot; &quot;
</code></pre><ul>
<li>We absolutely don't use those modules, so we shouldn't build them in the first place</li>
</ul>
<h2 id="20161117">2016-11-17</h2>
<h2 id="2016-11-17">2016-11-17</h2>
<ul>
<li>Generate a list of journal titles for Peter and Abenet to look through so we can make a controlled vocabulary out of them:</li>
</ul>
@ -404,18 +404,18 @@ UPDATE 7
</code></pre><ul>
<li>I'm not sure if there's anything we can do, actually, because we would have to remove those from the thumbnail bundles, and replace them with the regular JPGs from the content bundle, and then remove them from the assetstore&hellip;</li>
</ul>
<h2 id="20161118">2016-11-18</h2>
<h2 id="2016-11-18">2016-11-18</h2>
<ul>
<li>Enable Tomcat Crawler Session Manager on CGSpace</li>
</ul>
<h2 id="20161121">2016-11-21</h2>
<h2 id="2016-11-21">2016-11-21</h2>
<ul>
<li>More work on Ansible playbooks for the PostgreSQL 9.3→9.5 and Java 7→8 upgrades</li>
<li>CGSpace virtual managers meeting</li>
<li>I need to look into making the item thumbnail clickable</li>
<li>Macaroni Bros said they tested the DSpace Test (DSpace 5.5) REST API for CCAFS and WLE sites and it works as expected</li>
</ul>
<h2 id="20161123">2016-11-23</h2>
<h2 id="2016-11-23">2016-11-23</h2>
<ul>
<li>Upgrade Java from 7 to 8 on CGSpace</li>
<li>I had started planning the in-place PostgreSQL 9.3→9.5 upgrade but decided that I will have to <code>pg_dump</code> and <code>pg_restore</code> when I move to the new server soon anyway, so there's no need to upgrade the database right now (a sketch of the dump and restore follows this list)</li>
@ -426,13 +426,13 @@ UPDATE 7
<li>Play with Creative Commons stuff in DSpace submission step</li>
<li>It seems to work but it doesn't let you choose a version of CC (like 4.0), and we would need to customize the XMLUI item display so it doesn't display the gross CC badges</li>
</ul>
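<ul>
<li>For the record, a minimal sketch of the dump and restore I have in mind for the migration (database name, user, and paths are placeholders):</li>
</ul>
<pre><code>$ pg_dump -U postgres -Fc dspace &gt; /tmp/dspace.dump
$ pg_restore -U postgres -d dspace -O /tmp/dspace.dump
</code></pre>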
<h2 id="20161124">2016-11-24</h2>
<h2 id="2016-11-24">2016-11-24</h2>
<ul>
<li>Bizuwork was testing DSpace Test on DSpace 5.5 and noticed that the Listings and Reports module seems to be case sensitive, whereas CGSpace's Listings and Reports isn't (ie, a search for &ldquo;orth, alan&rdquo; vs &ldquo;Orth, Alan&rdquo; returns the same results on CGSpace, but different on DSpace Test)</li>
<li>I have raised a ticket with Atmire</li>
<li>Looks like this issue is actually the new Listings and Reports module honoring the Solr search queries more correctly</li>
</ul>
<h2 id="20161127">2016-11-27</h2>
<h2 id="2016-11-27">2016-11-27</h2>
<ul>
<li>Run system updates on DSpace Test and reboot the server</li>
<li>Deploy DSpace 5.5 on CGSpace:
@ -451,7 +451,7 @@ UPDATE 7
<li>Testing DSpace 5.5 on CGSpace, it seems CUA's export as XLS works for Usage statistics, but not Content statistics</li>
<li>I will raise a bug with Atmire</li>
</ul>
<h2 id="20161128">2016-11-28</h2>
<h2 id="2016-11-28">2016-11-28</h2>
<ul>
<li>One user says he is still getting a blank page when he logs in (just the CGSpace header, but no community list)</li>
<li>Looking at the Catalina logs I see there is some super long-running indexing process going on:</li>
@ -478,7 +478,7 @@ $ /home/dspacetest.cgiar.org/bin/dspace registry-loader -metadata /home/dspacete
<li>Wow, Bram from Atmire pointed out this solution for using multiple handles with one DSpace instance: <a href="https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296">https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296</a></li>
<li>We might be able to migrate the <a href="http://library.cgiar.org/">CGIAR Library</a> now, as they had wanted to keep their handles</li>
</ul>
<h2 id="20161129">2016-11-29</h2>
<h2 id="2016-11-29">2016-11-29</h2>
<ul>
<li>Sisay tried deleting and re-creating Goshu's account but he still can't see any communities on the homepage after he logs in</li>
<li>Around the time of his login I see this in the DSpace logs:</li>
@ -514,7 +514,7 @@ org.dspace.discovery.SearchServiceException: Error executing query
<li>A few users are reporting having issues with their workflows, they get the following message: &ldquo;You are not allowed to perform this task&rdquo;</li>
<li>Might be the same as <a href="https://jira.duraspace.org/browse/DS-2920">DS-2920</a> on the bug tracker</li>
</ul>
<h2 id="20161130">2016-11-30</h2>
<h2 id="2016-11-30">2016-11-30</h2>
<ul>
<li>The <code>maxHttpHeaderSize</code> fix worked on CGSpace (user is able to see the community list on the homepage)</li>
<li>The &ldquo;take task&rdquo; cache fix worked on DSpace Test but it's not an official patch, so I'll have to report the bug to DSpace people and try to get advice</li>


@ -43,7 +43,7 @@ I see thousands of them in the logs for the last few months, so it&#39;s not rel
I&#39;ve raised a ticket with Atmire to ask
Another worrying error from dspace.log is:
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -124,7 +124,7 @@ Another worrying error from dspace.log is:
</p>
</header>
<h2 id="20161202">2016-12-02</h2>
<h2 id="2016-12-02">2016-12-02</h2>
<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
@ -242,7 +242,7 @@ org.apache.solr.client.solrj.SolrServerException: Server refused connection at:
<li>Also, the disk is nearly full because of log file issues, so I'm running some compression on DSpace logs</li>
<li>Normally these stay uncompressed for a month just in case we need to look at them, so now I've just compressed anything older than 2 weeks so we can get some disk space back (see the sketch below)</li>
</ul>
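<ul>
<li>A sketch of the kind of one-off compression I ran (the log path matches our DSpace install, but the exact find expression and compression tool are assumptions):</li>
</ul>
<pre><code>$ find /home/cgspace.cgiar.org/log -iname &quot;*.log.*&quot; -mtime +14 ! -iname &quot;*.gz&quot; -exec gzip -v {} \;
</code></pre>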
<h2 id="20161204">2016-12-04</h2>
<h2 id="2016-12-04">2016-12-04</h2>
<ul>
<li>I got a weird report from the CGSpace checksum checker this morning</li>
<li>It says 732 bitstreams have potential issues, for example:</li>
@ -293,7 +293,7 @@ GC_TUNE=&quot;-XX:-UseSuperWord \
<li>I need to try these because they are recommended by the Solr project itself</li>
<li>Also, as always, I need to read <a href="https://wiki.apache.org/solr/ShawnHeisey">Shawn Heisey's wiki page on Solr</a></li>
</ul>
<h2 id="20161205">2016-12-05</h2>
<h2 id="2016-12-05">2016-12-05</h2>
<ul>
<li>I did some basic benchmarking on a local DSpace before and after the JVM settings above, but there wasn't anything amazingly obvious</li>
<li>I want to make the changes on DSpace Test and monitor the JVM heap graphs for a few days to see if they change the JVM GC patterns or anything (munin graphs)</li>
@ -307,7 +307,7 @@ GC_TUNE=&quot;-XX:-UseSuperWord \
</code></pre><ul>
<li>I haven't tested it yet, but I created a pull request: <a href="https://github.com/ilri/DSpace/pull/289">#289</a></li>
</ul>
<h2 id="20161206">2016-12-06</h2>
<h2 id="2016-12-06">2016-12-06</h2>
<ul>
<li>Some author authority corrections and name standardizations for Peter:</li>
</ul>
@ -360,7 +360,7 @@ java.lang.NullPointerException
real 8m39.913s
user 1m54.190s
sys 0m22.647s
</code></pre><h2 id="20161207">2016-12-07</h2>
</code></pre><h2 id="2016-12-07">2016-12-07</h2>
<ul>
<li>For what it's worth, after running the same SQL updates on my local test server, <code>index-authority</code> runs and completes just fine</li>
<li>I will have to test more</li>
@ -459,7 +459,7 @@ update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-
update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
</code></pre><h2 id="20161208">2016-12-08</h2>
</code></pre><h2 id="2016-12-08">2016-12-08</h2>
<ul>
<li>Something weird happened and Peter Thorne's names all ended up as &ldquo;Thorne&rdquo;, I guess because the original authority had that as its name value:</li>
</ul>
@ -506,7 +506,7 @@ UPDATE 362
<li>In other news, I think we should really be using more RAM for PostgreSQL's <code>shared_buffers</code></li>
<li>The <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html">PostgreSQL documentation</a> recommends using 25% of the system's RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and benefit from some RAM being used by the OS cache (a configuration sketch follows this list)</li>
</ul>
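<ul>
<li>The change itself is a single line in <code>postgresql.conf</code>; a sketch with the roughly 10% of system RAM I'm leaning towards rather than the full 25%:</li>
</ul>
<pre><code># postgresql.conf
shared_buffers = 1200MB
</code></pre>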
<h2 id="20161209">2016-12-09</h2>
<h2 id="2016-12-09">2016-12-09</h2>
<ul>
<li>More work on finishing rough draft of KM4Dev article</li>
<li>Set PostgreSQL's <code>shared_buffers</code> on CGSpace to 10% of system RAM (1200MB)</li>
@ -517,7 +517,7 @@ dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab76
</code></pre><ul>
<li>The authority IDs were different now than when I was looking a few days ago so I had to adjust them here</li>
</ul>
<h2 id="20161211">2016-12-11</h2>
<h2 id="2016-12-11">2016-12-11</h2>
<ul>
<li>After enabling a sizable <code>shared_buffers</code> for CGSpace's PostgreSQL configuration the number of connections to the database dropped significantly</li>
</ul>
@ -553,7 +553,7 @@ UPDATE 35
</code></pre><ul>
<li>Work on article for KM4Dev journal</li>
</ul>
<h2 id="20161213">2016-12-13</h2>
<h2 id="2016-12-13">2016-12-13</h2>
<ul>
<li>Checking in on CGSpace postgres stats again, looks like the <code>shared_buffers</code> change from a few days ago really made a big impact:</li>
</ul>
@ -640,7 +640,7 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
<li>It happens on development and production, so I will have to ask Atmire</li>
<li>Most likely an issue with installation/configuration</li>
</ul>
<h2 id="20161214">2016-12-14</h2>
<h2 id="2016-12-14">2016-12-14</h2>
<ul>
<li>Atmire sent a quick fix for the <code>last-update.txt</code> file not found error</li>
<li>After applying pull request <a href="https://github.com/ilri/DSpace/pull/291">#291</a> on DSpace Test I no longer see the error in the logs after the <code>UpdateSolrStorageReports</code> task runs</li>
@ -648,7 +648,7 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
<li>Made a pull request with a template for the cron jobs (<a href="https://github.com/ilri/rmg-ansible-public/pull/75">#75</a>)</li>
<li>Testing SMTP from the new CGSpace server and it's not working, I'll have to tell James</li>
</ul>
<h2 id="20161215">2016-12-15</h2>
<h2 id="2016-12-15">2016-12-15</h2>
<ul>
<li>Start planning for server migration this weekend, letting users know</li>
<li>I am trying to figure out what the process is to <a href="http://handle.net/hnr_support.html">update the server's IP in the Handle system</a>, and emailing the hdladmin account bounces(!)</li>
@ -662,7 +662,7 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
</ul>
<p><img src="/cgspace-notes/2016/12/batch-edit1.png" alt="Select all items with &ldquo;rangelands&rdquo; in metadata">
<img src="/cgspace-notes/2016/12/batch-edit2.png" alt="Add RANGELANDS ILRI subject"></p>
<h2 id="20161218">2016-12-18</h2>
<h2 id="2016-12-18">2016-12-18</h2>
<ul>
<li>Add four new CRP subjects for 2017 and sort the input forms alphabetically (<a href="https://github.com/ilri/DSpace/pull/294">#294</a>)</li>
<li>Test the SMTP on the new server and it's working</li>
@ -737,13 +737,13 @@ $ exit
</ul>
</li>
</ul>
<h2 id="20161222">2016-12-22</h2>
<h2 id="2016-12-22">2016-12-22</h2>
<ul>
<li>Abenet wanted a CSV of the IITA community, but the web export doesn't include the <code>dc.date.accessioned</code> field</li>
<li>I had to export it from the command line using the <code>-a</code> flag:</li>
</ul>
<pre><code>$ [dspace]/bin/dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616
</code></pre><h2 id="20161228">2016-12-28</h2>
</code></pre><h2 id="2016-12-28">2016-12-28</h2>
<ul>
<li>We've been getting two alerts per day about CPU usage on the new server from Linode</li>
<li>These are caused by the batch jobs for Solr etc that run in the early morning hours</li>


@ -25,7 +25,7 @@ I checked to see if the Solr sharding task that is supposed to run on January 1s
I tested on DSpace Test as well and it doesn&#39;t work there either
I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&#39;m not sure if we&#39;ve ever had the sharding task run successfully over all these years
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -106,13 +106,13 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
</p>
</header>
<h2 id="20170102">2017-01-02</h2>
<h2 id="2017-01-02">2017-01-02</h2>
<ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn't work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years</li>
</ul>
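<ul>
<li>For reference, the sharding task can also be triggered manually, which is how I've been testing it (a sketch; I believe the flag is <code>-s</code>, but check <code>stats-util</code>'s help output):</li>
</ul>
<pre><code>$ [dspace]/bin/dspace stats-util -s
</code></pre>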
<h2 id="20170104">2017-01-04</h2>
<h2 id="2017-01-04">2017-01-04</h2>
<ul>
<li>I tried to shard my local dev instance and it fails the same way:</li>
</ul>
@ -183,17 +183,17 @@ Caused by: java.net.SocketException: Broken pipe (Write failed)
</code></pre><ul>
<li>Very interesting&hellip; it creates the core and then fails somehow</li>
</ul>
<h2 id="20170108">2017-01-08</h2>
<h2 id="2017-01-08">2017-01-08</h2>
<ul>
<li>Put Sisay's <code>item-view.xsl</code> code to show mapped collections on CGSpace (<a href="https://github.com/ilri/DSpace/pull/295">#295</a>)</li>
</ul>
<h2 id="20170109">2017-01-09</h2>
<h2 id="2017-01-09">2017-01-09</h2>
<ul>
<li>A user wrote to tell me that the new display of an item's mappings had a crazy bug for at least one item: <a href="https://cgspace.cgiar.org/handle/10568/78596">https://cgspace.cgiar.org/handle/10568/78596</a></li>
<li>She said she only mapped it once, but it appears to be mapped 184 times</li>
</ul>
<p><img src="/cgspace-notes/2017/01/mapping-crazy-duplicate.png" alt="Crazy item mapping"></p>
<h2 id="20170110">2017-01-10</h2>
<h2 id="2017-01-10">2017-01-10</h2>
<ul>
<li>I tried to clean up the duplicate mappings by exporting the item's metadata to CSV, editing, and re-importing, but DSpace said &ldquo;no changes were detected&rdquo;</li>
<li>I've asked on the dspace-tech mailing list to see if anyone can help</li>
@ -210,7 +210,7 @@ Caused by: java.net.SocketException: Broken pipe (Write failed)
<li>I will have to ask the DSpace people if this is a valid approach</li>
<li>Finish looking at the Journal Title corrections of the top 500 Journal Titles so we can make a controlled vocabulary from it</li>
</ul>
<h2 id="20170111">2017-01-11</h2>
<h2 id="2017-01-11">2017-01-11</h2>
<ul>
<li>Maria found another item with duplicate mappings: <a href="https://cgspace.cgiar.org/handle/10568/78658">https://cgspace.cgiar.org/handle/10568/78658</a></li>
<li>Error in <code>fix-metadata-values.py</code> when it tries to print the value for Entwicklung &amp; Ländlicher Raum:</li>
@ -238,11 +238,11 @@ UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15:
<li>I will have to go through these and fix some more before making the controlled vocabulary</li>
<li>Added 30 more corrections or so, now there are 49 total and I'll have to get the top 500 after applying them</li>
</ul>
<h2 id="20170113">2017-01-13</h2>
<h2 id="2017-01-13">2017-01-13</h2>
<ul>
<li>Add <code>FOOD SYSTEMS</code> to CIAT subjects, waiting to merge: <a href="https://github.com/ilri/DSpace/pull/296">https://github.com/ilri/DSpace/pull/296</a></li>
</ul>
<h2 id="20170116">2017-01-16</h2>
<h2 id="2017-01-16">2017-01-16</h2>
<ul>
<li>Fix the two items Maria found with duplicate mappings with this script:</li>
</ul>
@ -250,7 +250,7 @@ UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15:
delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
/* 1 incorrect mapping: https://cgspace.cgiar.org/handle/10568/78658 */
delete from collection2item where id = '91082';
</code></pre><h2 id="20170117">2017-01-17</h2>
</code></pre><h2 id="2017-01-17">2017-01-17</h2>
<ul>
<li>Helping clean up some file names in the 232 CIAT records that Sisay worked on last week</li>
<li>There are about 30 files with <code>%20</code> (space) and Spanish accents in the file name</li>
@ -276,18 +276,18 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
</code></pre><ul>
<li>Someone on the Internet suggested using a DPI of 144</li>
</ul>
<h2 id="20170119">2017-01-19</h2>
<h2 id="2017-01-19">2017-01-19</h2>
<ul>
<li>In testing a random sample of CIAT's PDFs for compressibility, it looks like all of these methods generally increase the file size, so we will just import them as they are</li>
<li>Import 232 CIAT records into CGSpace:</li>
</ul>
<pre><code>$ JAVA_OPTS=&quot;-Xmx512m -Dfile.encoding=UTF-8&quot; /home/cgspace.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/68704 --source /home/aorth/CIAT_232/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;&gt; /tmp/ciat.log
</code></pre><h2 id="20170122">2017-01-22</h2>
</code></pre><h2 id="2017-01-22">2017-01-22</h2>
<ul>
<li>Looking at some records that Sisay is having problems importing into DSpace Test (seems to be because of copious whitespace and carriage return characters from Excel's CSV exporter)</li>
<li>There were also some issues with an invalid dc.date.issued field, and I trimmed leading / trailing whitespace and cleaned up some URLs with unneeded parameters like ?show=full (see the sketch below)</li>
</ul>
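<ul>
<li>I did these by hand, but for future batches an OpenRefine GREL transform on the affected columns could handle both issues at once (a sketch; using OpenRefine here is my assumption):</li>
</ul>
<pre><code>value.trim().replace(&quot;?show=full&quot;, &quot;&quot;)
</code></pre>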
<h2 id="20170123">2017-01-23</h2>
<h2 id="2017-01-23">2017-01-23</h2>
<ul>
<li>I merged Atmire's pull request into the development branch so they can deploy it on DSpace Test</li>
<li>Move some old ILRI Program communities to a new subcommunity for former programs (10568/79164):</li>
@ -298,7 +298,7 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
</ul>
<pre><code>10568/42161 10568/171 10568/79341
10568/41914 10568/171 10568/79340
</code></pre><h2 id="20170124">2017-01-24</h2>
</code></pre><h2 id="2017-01-24">2017-01-24</h2>
<ul>
<li>Run all updates on DSpace Test and reboot the server</li>
<li>Run fixes for Journal titles on CGSpace:</li>
@ -312,7 +312,7 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
<li>Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request (<a href="https://github.com/ilri/DSpace/pull/298">#298</a>)</li>
<li>This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (<a href="https://github.com/ilri/DSpace/pull/69">#69</a>)</li>
</ul>
<h2 id="20170125">2017-01-25</h2>
<h2 id="2017-01-25">2017-01-25</h2>
<ul>
<li>Atmire says the <code>com.atmire.statistics.util.UpdateSolrStorageReports</code> and <code>com.atmire.utils.ReportSender</code> are no longer necessary because they are using a Spring scheduler for these tasks now</li>
<li>Pull request to remove them from the Ansible templates: <a href="https://github.com/ilri/rmg-ansible-public/pull/80">https://github.com/ilri/rmg-ansible-public/pull/80</a></li>
@ -325,18 +325,18 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
</li>
<li>But now we have a new issue with the &ldquo;Types&rdquo; in Content statistics not being respected—we only get the defaults, despite having custom settings in <code>dspace/config/modules/atmire-cua.cfg</code></li>
</ul>
<h2 id="20170127">2017-01-27</h2>
<h2 id="2017-01-27">2017-01-27</h2>
<ul>
<li>Magdalena pointed out that somehow the Anonymous group had been added to the Administrators group on CGSpace (!)</li>
<li>Discuss plans to update CCAFS metadata and communities for their new flagships and phase II project identifiers</li>
<li>The flagships are in <code>cg.subject.ccafs</code>, and we need to probably make a new field for the phase II project identifiers</li>
</ul>
<h2 id="20170128">2017-01-28</h2>
<h2 id="2017-01-28">2017-01-28</h2>
<ul>
<li>Merge controlled vocabulary for journal titles (<code>dc.source</code>) into CGSpace (<a href="https://github.com/ilri/DSpace/pull/298">#298</a>)</li>
<li>Merge new CIAT subject into CGSpace (<a href="https://github.com/ilri/DSpace/pull/296">#296</a>)</li>
</ul>
<h2 id="20170129">2017-01-29</h2>
<h2 id="2017-01-29">2017-01-29</h2>
<ul>
<li>Run all system updates on DSpace Test, redeploy DSpace code, and reboot the server</li>
<li>Run all system updates on CGSpace, redeploy DSpace code, and reboot the server</li>


@ -47,7 +47,7 @@ DELETE 1
Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
Looks like we&#39;ll be using cg.identifier.ccafsprojectpii as the field name
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -128,7 +128,7 @@ Looks like we&#39;ll be using cg.identifier.ccafsprojectpii as the field name
</p>
</header>
<h2 id="20170207">2017-02-07</h2>
<h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
@ -145,7 +145,7 @@ DELETE 1
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li>Looks like we'll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
</ul>
<h2 id="20170208">2017-02-08</h2>
<h2 id="2017-02-08">2017-02-08</h2>
<ul>
<li>We also need to rename some of the CCAFS Phase I flagships:
<ul>
@ -159,7 +159,7 @@ DELETE 1
<li>Start testing some nearly 500 author corrections that CCAFS sent me:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/CCAFS-Authors-Feb-7.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
</code></pre><h2 id="20170209">2017-02-09</h2>
</code></pre><h2 id="2017-02-09">2017-02-09</h2>
<ul>
<li>More work on CCAFS Phase II stuff</li>
<li>Looks like simply adding a new metadata field to <code>dspace/config/registries/cgiar-types.xml</code> and restarting DSpace causes the field to get added to the registry</li>
@ -168,13 +168,13 @@ DELETE 1
<li>Testing some corrections on CCAFS Phase II flagships (<code>cg.subject.ccafs</code>):</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
</code></pre><h2 id="20170210">2017-02-10</h2>
</code></pre><h2 id="2017-02-10">2017-02-10</h2>
<ul>
<li>CCAFS said they want to wait on the flagship updates (<code>cg.subject.ccafs</code>) on CGSpace, perhaps for a month or so</li>
<li>Help Marianne Gadeberg (WLE) with some user permissions as it seems she had previously been using a personal email account, and is now on a CGIAR one</li>
<li>I manually added her new account to ~25 authorizations that her old user was on</li>
</ul>
<h2 id="20170214">2017-02-14</h2>
<h2 id="2017-02-14">2017-02-14</h2>
<ul>
<li>Add <code>SCALING</code> to ILRI subjects (<a href="https://github.com/ilri/DSpace/pull/304">#304</a>), as Sisay's attempts were all sloppy</li>
<li>Cherry pick some patches from the DSpace 5.7 branch:
@ -187,11 +187,11 @@ DELETE 1
</li>
<li>I still need to test these, especially the last two, which change some stuff with Solr maintenance</li>
</ul>
<h2 id="20170215">2017-02-15</h2>
<h2 id="2017-02-15">2017-02-15</h2>
<ul>
<li>Update rvm on DSpace Test and CGSpace as there was a <a href="https://github.com/justinsteven/advisories/blob/master/2017_rvm_cd_command_execution.md">security disclosure about versions less than 1.28.0</a></li>
</ul>
<h2 id="20170216">2017-02-16</h2>
<h2 id="2017-02-16">2017-02-16</h2>
<ul>
<li>Looking at memory info from munin on CGSpace:</li>
</ul>
@ -262,7 +262,7 @@ dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.15446/agro
<li>Then we could add a cron job for them and run them from the command line like:</li>
</ul>
<pre><code>[dspace]/bin/dspace curate -t noop -i 10568/79891
</code></pre><h2 id="20170220">2017-02-20</h2>
</code></pre><h2 id="2017-02-20">2017-02-20</h2>
<ul>
<li>Run all system updates on DSpace Test and reboot the server</li>
<li>Run CCAFS author corrections on DSpace Test and CGSpace and force a full discovery reindex</li>
@ -281,7 +281,7 @@ b'Entwicklung &amp; L\xc3\xa4ndlicher Raum'
</code></pre><ul>
<li>So for now I will remove the encode call from the script (though it was never used on the versions on the Linux hosts), leading me to believe it really <em>was</em> a temporary problem, perhaps due to macOS or the Python build I was using.</li>
</ul>
<h2 id="20170221">2017-02-21</h2>
<h2 id="2017-02-21">2017-02-21</h2>
<ul>
<li>Testing regenerating PDF thumbnails, like I started in 2016-11</li>
<li>It seems there is a bug in <code>filter-media</code> that causes it to process formats that aren't part of its configuration:</li>
@ -300,14 +300,14 @@ filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = A
<li>I've sent a message to the mailing list and might file a Jira issue</li>
<li>Ask Atmire about the failed interpolation of the <code>dspace.internalUrl</code> variable in <code>atmire-cua.cfg</code></li>
</ul>
<h2 id="20170222">2017-02-22</h2>
<h2 id="2017-02-22">2017-02-22</h2>
<ul>
<li>Atmire said I can add <code>dspace.internalUrl</code> to my build properties and the error will go away</li>
<li>It should be the local URL for accessing Tomcat from the server's own perspective, ie: http://localhost:8080 (see the sketch below)</li>
</ul>
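<ul>
<li>A sketch of the build properties line (the value is what Atmire described; the exact properties file depends on the build setup):</li>
</ul>
<pre><code># build.properties
dspace.internalUrl = http://localhost:8080
</code></pre>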
<h2 id="20170226">2017-02-26</h2>
<h2 id="2017-02-26">2017-02-26</h2>
<ul>
<li>Find all fields with &ldquo;<a href="http://hdl.handle.net">http://hdl.handle.net</a>&rdquo; values (most are in <code>dc.identifier.uri</code>, but some are in other URL-related fields like <code>cg.link.reference</code>, <code>cg.identifier.dataurl</code>, and <code>cg.identifier.url</code>):</li>
<li>Find all fields with &ldquo;<a href="http://hdl.handle.net%22">http://hdl.handle.net&quot;</a> values (most are in <code>dc.identifier.uri</code>, but some are in other URL-related fields like <code>cg.link.reference</code>, <code>cg.identifier.dataurl</code>, and <code>cg.identifier.url</code>):</li>
</ul>
<pre><code>dspace=# select distinct metadata_field_id from metadatavalue where resource_type_id=2 and text_value like 'http://hdl.handle.net%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where resource_type_id=2 and metadata_field_id IN (25, 113, 179, 219, 220, 223) and text_value like 'http://hdl.handle.net%';
@ -316,7 +316,7 @@ UPDATE 58633
<li>This works but I'm thinking I'll wait on the replacement as there are perhaps some other places that rely on <code>http://hdl.handle.net</code> (grep the code, it's scary how many things are hard coded)</li>
<li>Send message to dspace-tech mailing list with concerns about this</li>
</ul>
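<ul>
<li>For example, a rough way to count the hard-coded references in the DSpace source (a sketch, with the source path as a placeholder):</li>
</ul>
<pre><code>$ grep -rsI 'http://hdl.handle.net' [dspace-source]/dspace-api/src | wc -l
</code></pre>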
<h2 id="20170227">2017-02-27</h2>
<h2 id="2017-02-27">2017-02-27</h2>
<ul>
<li>LDAP users cannot log in today, looks to be an issue with CGIAR's LDAP server:</li>
</ul>
@ -379,7 +379,7 @@ Certificate chain
<li>Redeploy CGSpace and DSpace Test to on latest <code>5_x-prod</code> branch with fixes for LDAP bind user</li>
<li>Run all system updates on CGSpace server and reboot</li>
</ul>
<h2 id="20170228">2017-02-28</h2>
<h2 id="2017-02-28">2017-02-28</h2>
<ul>
<li>After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery</li>
<li>Ah, this is probably because some items have the <code>International Center for Tropical Agriculture</code> author twice, which I first noticed in 2016-12 but couldn't figure out how to fix</li>

View File

@ -51,7 +51,7 @@ Interestingly, it seems DSpace 4.x&#39;s thumbnails were sRGB, but forcing regen
$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600&#43;0&#43;0 8-bit CMYK 168KB 0.000u 0:00.000
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -132,11 +132,11 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
</p>
</header>
<h2 id="20170301">2017-03-01</h2>
<h2 id="2017-03-01">2017-03-01</h2>
<ul>
<li>Run the 279 CIAT author corrections on CGSpace</li>
</ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -158,7 +158,7 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
<ul>
<li>I filed an issue for the color space thing: <a href="https://jira.duraspace.org/browse/DS-3517">DS-3517</a></li>
</ul>
<h2 id="20170303">2017-03-03</h2>
<h2 id="2017-03-03">2017-03-03</h2>
<ul>
<li>I created a patch for DS-3517 and made a pull request against upstream <code>dspace-5_x</code>: <a href="https://github.com/DSpace/DSpace/pull/1669">https://github.com/DSpace/DSpace/pull/1669</a></li>
<li>Looks like <code>-colorspace sRGB</code> alone isn't enough, we need to use profiles:</li>
@ -176,13 +176,13 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
DirectClass CMYK
$ identify -format '%r\n' Africa\ group\ of\ negotiators.pdf\[0\]
DirectClass sRGB Alpha
</code></pre><h2 id="20170304">2017-03-04</h2>
</code></pre><h2 id="2017-03-04">2017-03-04</h2>
<ul>
<li>Spent more time looking at the ImageMagick CMYK issue</li>
<li>The <code>default_cmyk.icc</code> and <code>default_rgb.icc</code> files are both part of the Ghostscript GPL distribution, but according to DSpace's <code>LICENSES_THIRD_PARTY</code> file, DSpace doesn't allow distribution of dependencies that are licensed solely under the GPL</li>
<li>So this issue is kinda pointless now, as the ICC profiles are absolutely necessary to make a meaningful CMYK→sRGB conversion</li>
</ul>
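<ul>
<li>For reference, the kind of profile-based conversion meant here is something like this sketch (file names hypothetical, ICC profile paths assumed):</li>
</ul>
<pre><code>$ convert input-cmyk.pdf\[0\] -profile /path/to/default_cmyk.icc -thumbnail 300x300 -profile /path/to/default_rgb.icc output.jpg
</code></pre>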
<h2 id="20170305">2017-03-05</h2>
<h2 id="2017-03-05">2017-03-05</h2>
<ul>
<li>Look into helping developers from landportal.info with a query for items related to LAND on the REST API</li>
<li>They want something like the items that are returned by the general &ldquo;LAND&rdquo; query in the search interface, but we cannot do that</li>
@ -223,11 +223,11 @@ DirectClass sRGB Alpha
<li>Submit pull request to set the author separator for XMLUI item lists to a semicolon instead of &ldquo;,&rdquo;: <a href="https://github.com/ilri/DSpace/pull/306">https://github.com/ilri/DSpace/pull/306</a></li>
<li>I want to show it briefly to Abenet and Peter to get feedback</li>
</ul>
<h2 id="20170306">2017-03-06</h2>
<h2 id="2017-03-06">2017-03-06</h2>
<ul>
<li>Someone on the mailing list said that <code>handle.plugin.checknameauthority</code> should be false if we're using multiple handle prefixes</li>
</ul>
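<ul>
<li>For reference, that would be a one-line change in <code>dspace.cfg</code> (a sketch):</li>
</ul>
<pre><code>handle.plugin.checknameauthority = false
</code></pre>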
<h2 id="20170307">2017-03-07</h2>
<h2 id="2017-03-07">2017-03-07</h2>
<ul>
<li>I set up a top-level community as a test for the CGIAR Library and imported one item with the 10947 handle prefix</li>
<li>When testing the Handle resolver locally it shows the item to be on the local repository</li>
@ -243,18 +243,18 @@ DirectClass sRGB Alpha
<li>Another thing is that the import process creates new <code>dc.date.accessioned</code> and <code>dc.date.available</code> fields, so we end up with duplicates (is it important to preserve the originals for these?)</li>
<li>Report DS-3520 issue to Atmire</li>
</ul>
<h2 id="20170308">2017-03-08</h2>
<h2 id="2017-03-08">2017-03-08</h2>
<ul>
<li>Merge the author separator changes to <code>5_x-prod</code>, as everyone has responded positively about it, and it's the default in Mirage2 after all!</li>
<li>Cherry pick the <code>commons-collections</code> patch from DSpace's <code>dspace-5_x</code> branch to address DS-3520: <a href="https://jira.duraspace.org/browse/DS-3520">https://jira.duraspace.org/browse/DS-3520</a></li>
</ul>
<h2 id="20170309">2017-03-09</h2>
<h2 id="2017-03-09">2017-03-09</h2>
<ul>
<li>Export list of sponsors so Peter can clean it up:</li>
</ul>
<pre><code>dspace=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship') group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
COPY 285
</code></pre><h2 id="20170312">2017-03-12</h2>
</code></pre><h2 id="2017-03-12">2017-03-12</h2>
<ul>
<li>Test the sponsorship fixes and deletes from Peter:</li>
</ul>
@ -271,7 +271,7 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
<li>Created a basic theme for the Livestock CRP community</li>
</ul>
<p><img src="/cgspace-notes/2017/03/livestock-theme.png" alt="Livestock CRP theme"></p>
<h2 id="20170315">2017-03-15</h2>
<h2 id="2017-03-15">2017-03-15</h2>
<ul>
<li>Merge pull request for controlled vocabulary updates for sponsor: <a href="https://github.com/ilri/DSpace/pull/308">https://github.com/ilri/DSpace/pull/308</a></li>
<li>Merge pull request for Livestock CRP theme: <a href="https://github.com/ilri/DSpace/issues/309">https://github.com/ilri/DSpace/issues/309</a></li>
@ -280,7 +280,7 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
<li>I also need to ask if either of these new fields need to be added to Discovery facets, search, and Atmire modules</li>
<li>Run all system updates on DSpace Test and re-deploy CGSpace</li>
</ul>
<h2 id="20170316">2017-03-16</h2>
<h2 id="2017-03-16">2017-03-16</h2>
<ul>
<li>Merge pull request for PABRA subjects: <a href="https://github.com/ilri/DSpace/pull/310">https://github.com/ilri/DSpace/pull/310</a></li>
<li>Abenet and Peter say we can add them to Discovery, Atmire modules, etc, but I might not have time to do it now</li>
@ -291,15 +291,15 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
<li>Deploy latest changes and investor fixes/deletions on CGSpace</li>
<li>Run system updates on CGSpace and reboot server</li>
</ul>
<h2 id="20170320">2017-03-20</h2>
<h2 id="2017-03-20">2017-03-20</h2>
<ul>
<li>Create basic XMLUI theme for PABRA community: <a href="https://github.com/ilri/DSpace/pull/315">https://github.com/ilri/DSpace/pull/315</a></li>
</ul>
<h2 id="20170324">2017-03-24</h2>
<h2 id="2017-03-24">2017-03-24</h2>
<ul>
<li>Still helping Sisay try to figure out how to create a theme for the RTB community</li>
</ul>
<h2 id="20170328">2017-03-28</h2>
<h2 id="2017-03-28">2017-03-28</h2>
<ul>
<li>CCAFS said they are ready for the flagship updates for Phase II to be run (<code>cg.subject.ccafs</code>), so I ran them on CGSpace:</li>
</ul>
@ -313,7 +313,7 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
<li>I sent a list to CCAFS people so they can tell me if some should be deleted or moved, etc</li>
<li>Test, squash, and merge Sisay's RTB theme into <code>5_x-prod</code>: <a href="https://github.com/ilri/DSpace/pull/316">https://github.com/ilri/DSpace/pull/316</a></li>
</ul>
<h2 id="20170329">2017-03-29</h2>
<h2 id="2017-03-29">2017-03-29</h2>
<ul>
<li>Dump a list of fields in the DC and CG schemas to compare with CG Core:</li>
</ul>
@ -322,7 +322,7 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
<li>Ooh, a better one!</li>
</ul>
<pre><code>dspace=# select coalesce(case when metadata_schema_id=1 then 'dc.' else 'cg.' end) || concat_ws('.', element, qualifier) as field, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
</code></pre><h2 id="20170330">2017-03-30</h2>
</code></pre><h2 id="2017-03-30">2017-03-30</h2>
<ul>
<li>Adjust the Linode CPU usage alerts for the CGSpace server from 150% to 200%, as generally the nightly Solr indexing causes usage around 150&ndash;190%, so this should make the alerts less regular</li>
<li>Adjust the threshold for DSpace Test from 90 to 100%</li>

View File

@ -37,7 +37,7 @@ Testing the CMYK patch on a collection with 650 items:
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -118,7 +118,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Th
</p>
</header>
<h2 id="20170402">2017-04-02</h2>
<h2 id="2017-04-02">2017-04-02</h2>
<ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
@ -129,7 +129,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Th
<li>Testing the CMYK patch on a collection with 650 items:</li>
</ul>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre><h2 id="20170403">2017-04-03</h2>
</code></pre><h2 id="2017-04-03">2017-04-03</h2>
<ul>
<li>Continue testing the CMYK patch on more communities:</li>
</ul>
@ -150,7 +150,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Th
<li>Also, I'm noticing some weird outliers in <code>cg.coverage.region</code>, need to remember to go correct these later:</li>
</ul>
<pre><code>dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=227;
</code></pre><h2 id="20170404">2017-04-04</h2>
</code></pre><h2 id="2017-04-04">2017-04-04</h2>
<ul>
<li>The <code>filter-media</code> script has been running on more large communities and now there are many more CMYK PDFs that have been fixed:</li>
</ul>
@ -177,13 +177,13 @@ ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0
<li>In that case it might just be better to see how many the user submitted (both <em>with</em> and <em>without</em> bitstreams):</li>
</ul>
<pre><code>dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^Submitted.*giampieri.*2016-.*';
</code></pre><h2 id="20170405">2017-04-05</h2>
</code></pre><h2 id="2017-04-05">2017-04-05</h2>
<ul>
<li>After doing a few more large communities it seems this is the final count of CMYK PDFs:</li>
</ul>
<pre><code>$ grep -c profile /tmp/filter-media-cmyk.txt
2505
</code></pre><h2 id="20170406">2017-04-06</h2>
</code></pre><h2 id="2017-04-06">2017-04-06</h2>
<ul>
<li>After reading the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">notes for DCAT April 2017</a> I am testing some new settings for PostgreSQL on DSpace Test:
<ul>
@ -198,7 +198,7 @@ ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0
<li>Sisay added their OAI as a source to a new collection, but using the Simple Dublin Core method, so many fields are unqualified and duplicated</li>
<li>Looking at the <a href="https://wiki.duraspace.org/display/DSDOC5x/XMLUI+Configuration+and+Customization">documentation</a> it seems that we probably want to be using DSpace Intermediate Metadata</li>
</ul>
<h2 id="20170410">2017-04-10</h2>
<h2 id="2017-04-10">2017-04-10</h2>
<ul>
<li>Adjust Linode CPU usage alerts on DSpace servers
<ul>
@ -216,12 +216,12 @@ ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0
<li>I added <code>cg.subject.cifor</code> to the metadata registry and I'm waiting for the harvester to re-harvest to see if it picks up more data now</li>
<li>Another possibility is that we could use a crosswalk&hellip; but I've never done one.</li>
</ul>
<h2 id="20170411">2017-04-11</h2>
<h2 id="2017-04-11">2017-04-11</h2>
<ul>
<li>Looking at the item from CIFOR it hasn't been updated yet, maybe they aren't running the cron job</li>
<li>I emailed Usman from CIFOR to ask if he's running the cron job</li>
</ul>
<h2 id="20170412">2017-04-12</h2>
<h2 id="2017-04-12">2017-04-12</h2>
<ul>
<li>CIFOR says they have cleaned their OAI cache and that the cron job for OAI import is enabled</li>
<li>Now I see updated fields, like <code>dc.date.issued</code> but none from the CG or CIFOR namespaces</li>
@ -281,7 +281,7 @@ sys 1m29.310s
<li>Perhaps I need to file a bug for this, or at least ask on the dspace-tech mailing list?</li>
<li>I wonder if we could use a crosswalk to convert to a format that CG Core wants, like <code>&lt;date Type=&quot;Available&quot;&gt;</code></li>
</ul>
<h2 id="20170413">2017-04-13</h2>
<h2 id="2017-04-13">2017-04-13</h2>
<ul>
<li>Checking the <a href="https://dspacetest.cgiar.org/handle/11463/947?show=full">CIFOR item on DSpace Test</a>, it still doesn't have the new metadata</li>
<li>The collection status shows this message from the harvester:</li>
@ -297,7 +297,7 @@ sys 1m29.310s
<li>It seems like they have done a full metadata migration with <code>dc.date.issued</code> and <code>cg.coverage.country</code> etc</li>
<li>Submit pull request to upstream DSpace for the PDF thumbnail bug (DS-3516): <a href="https://github.com/DSpace/DSpace/pull/1709">https://github.com/DSpace/DSpace/pull/1709</a></li>
</ul>
<h2 id="20170414">2017-04-14</h2>
<h2 id="2017-04-14">2017-04-14</h2>
<ul>
<li>DSpace committers reviewed my patch for DS-3516 and proposed a simpler idea involving incorrect use of <code>SelfRegisteredInputFormats</code></li>
<li>I tested the idea and it works, so I made a new patch: <a href="https://github.com/DSpace/DSpace/pull/1709">https://github.com/DSpace/DSpace/pull/1709</a></li>
@ -311,7 +311,7 @@ sys 1m29.310s
</li>
<li>Reboot DSpace Test server to get new Linode kernel</li>
</ul>
<h2 id="20170417">2017-04-17</h2>
<h2 id="2017-04-17">2017-04-17</h2>
<ul>
<li>CIFOR has now implemented a new &ldquo;cgiar&rdquo; context in their OAI that exposes CG fields, so I am re-harvesting that to see how it looks in the Discovery sidebars and searches</li>
<li>See: <a href="https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947">https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947</a></li>
@ -320,7 +320,7 @@ sys 1m29.310s
</ul>
<pre><code>Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign key constraint &quot;bundle_primary_bitstream_id_fkey&quot; on table &quot;bundle&quot;
Detail: Key (bitstream_id)=(435) is still referenced from table &quot;bundle&quot;.
</code></pre><h2 id="20170418">2017-04-18</h2>
</code></pre><h2 id="2017-04-18">2017-04-18</h2>
<ul>
<li>Helping Tsega test his new <a href="https://github.com/ilri/ckm-cgspace-rest-api">CGSpace REST API Rails app</a> on DSpace Test</li>
<li>Setup and run with:</li>
@ -340,7 +340,7 @@ $ rails -s
<li>This is interesting for creating runnable commands from <code>bundle</code>:</li>
</ul>
<pre><code>$ bundle binstubs puma --path ./sbin
</code></pre><h2 id="20170419">2017-04-19</h2>
</code></pre><h2 id="2017-04-19">2017-04-19</h2>
<ul>
<li>Usman sent another link to their OAI interface, where the country names are now capitalized: <a href="https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947">https://data.cifor.org/dspace/oai/cgiar?verb=GetRecord&amp;metadataPrefix=dim&amp;identifier=oai:data.cifor.org:11463/947</a></li>
<li>Looking at the same item in XMLUI, the countries are not capitalized: <a href="https://data.cifor.org/dspace/xmlui/handle/11463/947?show=full">https://data.cifor.org/dspace/xmlui/handle/11463/947?show=full</a></li>
@ -366,7 +366,7 @@ $ rails -s
<li>Also, we need to only use the PDF on the item corresponding with page 1, so we don't end up with literally hundreds of duplicate PDFs</li>
<li>Alternatively, I could export each page to a standalone PDF&hellip;</li>
</ul>
<h2 id="20170420">2017-04-20</h2>
<h2 id="2017-04-20">2017-04-20</h2>
<ul>
<li>Atmire responded about the Workflow Statistics, saying that it had been disabled because many environments needed customization to be useful</li>
<li>I re-enabled it with a hidden config key <code>workflow.stats.enabled = true</code> on DSpace Test and will evaluate adding it on CGSpace</li>
@ -403,14 +403,14 @@ $ wc -l /tmp/ciat
</ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m&quot;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace filter-media -f -v -i 10568/71249 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre><h2 id="20170422">2017-04-22</h2>
</code></pre><h2 id="2017-04-22">2017-04-22</h2>
<ul>
<li>Someone on the dspace-tech mailing list responded with a suggestion about the foreign key violation in the <code>cleanup</code> task</li>
<li>The solution is to remove the ID (ie set to NULL) from the <code>primary_bitstream_id</code> column in the <code>bundle</code> table</li>
<li>After doing that and running the <code>cleanup</code> task again I find more bitstreams that are affected and end up with a long list of IDs that need to be fixed:</li>
</ul>
<pre><code>dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (435, 1136, 1132, 1220, 1236, 3002, 3255, 5322);
</code></pre><h2 id="20170424">2017-04-24</h2>
</code></pre><h2 id="2017-04-24">2017-04-24</h2>
<ul>
<li>Two users mentioned some items they recently approved not showing up in the search / XMLUI</li>
<li>I looked at the logs from yesterday and it seems the Discovery indexing has been crashing:</li>
@ -476,7 +476,7 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: this Index
</code></pre><ul>
<li>Now running the cleanup script on DSpace Test and already seeing 11GB freed from the assetstore—it's likely we haven't had a cleanup task complete successfully in years&hellip;</li>
</ul>
<h2 id="20170425">2017-04-25</h2>
<h2 id="2017-04-25">2017-04-25</h2>
<ul>
<li>Finally finished running the PDF thumbnail re-processing on CGSpace, the final count of CMYK PDFs is about 2751</li>
<li>Preparing to run the cleanup task on CGSpace, I want to see how many files are in the assetstore:</li>
@ -544,7 +544,7 @@ Caused by: java.lang.ClassNotFoundException: org.dspace.statistics.content.DSpac
<li>So that is 30,000 files, and about 7GB</li>
<li>Add logging to the cleanup cron task</li>
</ul>
<h2 id="20170426">2017-04-26</h2>
<h2 id="2017-04-26">2017-04-26</h2>
<ul>
<li>The size of the CGSpace database dump went from 111MB to 96MB, not sure about actual database size though</li>
<li>Update RVM's Ruby from 2.3.0 to 2.4.0 on DSpace Test:</li>

View File

@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="May, 2017"/>
<meta name="twitter:description" content="2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&#39;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&#39;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -96,7 +96,7 @@
</p>
</header>
<h2 id="20170501">2017-05-01</h2>
<h2 id="2017-05-01">2017-05-01</h2>
<ul>
<li>ICARDA apparently started working on CG Core on their MEL repository</li>
<li>They have done a few <code>cg.*</code> fields, but not very consistently, and they even copy some of CGSpace's items:
@ -106,11 +106,11 @@
</ul>
</li>
</ul>
<h2 id="20170502">2017-05-02</h2>
<h2 id="2017-05-02">2017-05-02</h2>
<ul>
<li>Atmire got back about the Workflow Statistics issue, and apparently it's a bug in the CUA module so they will send us a pull request</li>
</ul>
<h2 id="20170504">2017-05-04</h2>
<h2 id="2017-05-04">2017-05-04</h2>
<ul>
<li>Sync DSpace Test with database and assetstore from CGSpace</li>
<li>Re-deploy DSpace Test with Atmire's CUA patch for workflow statistics, run system updates, and restart the server</li>
@ -118,7 +118,7 @@
<li>Megan says some mapped items are still not appearing since last week, so I forced a full <code>index-discovery -b</code></li>
<li>Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: <a href="https://cgspace.cgiar.org/handle/10568/80731">https://cgspace.cgiar.org/handle/10568/80731</a></li>
</ul>
<h2 id="20170505">2017-05-05</h2>
<h2 id="2017-05-05">2017-05-05</h2>
<ul>
<li>Discovered that CGSpace has ~700 items that are missing the <code>cg.identifier.status</code> field</li>
<li>Need to perhaps try using the &ldquo;required metadata&rdquo; curation task to find items missing these fields:</li>
@ -127,13 +127,13 @@
</code></pre><ul>
<li>It seems the curation task dies when it finds an item which has missing metadata</li>
</ul>
<h2 id="20170506">2017-05-06</h2>
<h2 id="2017-05-06">2017-05-06</h2>
<ul>
<li>Add &ldquo;Blog Post&rdquo; to <code>dc.type</code></li>
<li>Create ticket on Atmire tracker to ask about commissioning them to develop the feature to expose ORCID via REST/OAI: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510</a></li>
<li>According to the <a href="https://wiki.duraspace.org/display/DSDOC5x/Curation+System">DSpace curation docs</a> the fact that the <code>requiredmetadata</code> curation task stops when it finds a missing metadata field is by design</li>
</ul>
<h2 id="20170507">2017-05-07</h2>
<h2 id="2017-05-07">2017-05-07</h2>
<ul>
<li>Testing one replacement for CCAFS Flagships (<code>cg.subject.ccafs</code>), first changed in the submission forms, and then in the database:</li>
</ul>
@ -142,7 +142,7 @@
<li>Also, CCAFS wants to re-order their flagships to prioritize the Phase II ones</li>
<li>Waiting for feedback from CCAFS, then I can merge <a href="https://github.com/ilri/DSpace/pull/320">#320</a></li>
</ul>
<h2 id="20170508">2017-05-08</h2>
<h2 id="2017-05-08">2017-05-08</h2>
<ul>
<li>Start working on CGIAR Library migration</li>
<li>We decided to use AIP export to preserve the hierarchies and handles of communities and collections</li>
@ -171,7 +171,7 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
<li>This uses the webui's item list sort options; see <code>webui.itemlist.sort-option</code> in <code>dspace.cfg</code> (sketched after this list)</li>
<li>The equivalent Discovery search would be: <a href="https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&amp;filter_relational_operator_1=equals&amp;filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&amp;submit_apply_filter=&amp;query=&amp;rpp=10&amp;sort_by=dc.date.issued_dt&amp;order=desc">https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&amp;filter_relational_operator_1=equals&amp;filter_1=WATER%2C+LAND+AND+ECOSYSTEMS&amp;submit_apply_filter=&amp;query=&amp;rpp=10&amp;sort_by=dc.date.issued_dt&amp;order=desc</a></li>
</ul>
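<ul>
<li>A sketch of what those options look like in <code>dspace.cfg</code> (values assumed from the DSpace 5 defaults):</li>
</ul>
<pre><code>webui.itemlist.sort-option.1 = title:dc.title:title
webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
</code></pre>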
<h2 id="20170509">2017-05-09</h2>
<h2 id="2017-05-09">2017-05-09</h2>
<ul>
<li>The CGIAR Library metadata has some blank metadata values, which leads to <code>|||</code> in the Discovery facets</li>
<li>Clean these up in the database using:</li>
@ -188,7 +188,7 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
<li>I think those errors actually come from me running the <code>update-sequences.sql</code> script while Tomcat/DSpace are running</li>
<li>Apparently you need to stop Tomcat!</li>
</ul>
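<ul>
<li>In other words, something like this sketch (service name and script path assumed):</li>
</ul>
<pre><code>$ sudo systemctl stop tomcat7
$ psql -U dspace -f [dspace]/etc/postgres/update-sequences.sql dspace
$ sudo systemctl start tomcat7
</code></pre>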
<h2 id="20170510">2017-05-10</h2>
<h2 id="2017-05-10">2017-05-10</h2>
<ul>
<li>Atmire says they are willing to extend the ORCID implementation, and I've asked them to provide a quote</li>
<li>I clarified that the scope of the implementation should be that ORCIDs are stored in the database and exposed via REST / API like other fields</li>
@ -208,13 +208,13 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
<li>After this I ran the <code>update-sequences.sql</code> script (with Tomcat shut down), and cleaned up the 200+ blank metadata records:</li>
</ul>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
</code></pre><h2 id="20170513">2017-05-13</h2>
</code></pre><h2 id="2017-05-13">2017-05-13</h2>
<ul>
<li>After quite a bit of troubleshooting with importing cleaned up data as CSV, it seems that there are actually <a href="https://en.wikipedia.org/wiki/Null_character">NUL</a> characters in the <code>dc.description.abstract</code> field (at least) on the lines where CSV importing was failing</li>
<li>I tried to find a way to remove the characters in vim or OpenRefine, but decided it was quicker to just remove the column temporarily and import it (a shell approach is sketched below)</li>
<li>The import was successful and detected 2022 changes, which should likely be the rest that were failing to import before</li>
</ul>
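<ul>
<li>For future reference, something like this should strip NUL characters from a CSV before import (an untested sketch, file names hypothetical):</li>
</ul>
<pre><code>$ tr -d '\000' &lt; cgiar-library.csv &gt; cgiar-library-no-nuls.csv
</code></pre>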
<h2 id="20170515">2017-05-15</h2>
<h2 id="2017-05-15">2017-05-15</h2>
<ul>
<li>To delete the blank lines that cause issues during import we need to use a regex in vim: <code>g/^$/d</code></li>
<li>After that I started looking in the <code>dc.subject</code> field to try to pull countries and regions out, but there are too many values in there</li>
@ -241,12 +241,12 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
<p>Fix cron jobs for log management on DSpace Test, as they weren't catching <code>dspace.log.*</code> files correctly and we had over six months of them taking up many gigs of disk space</p>
</li>
</ul>
<h2 id="20170516">2017-05-16</h2>
<h2 id="2017-05-16">2017-05-16</h2>
<ul>
<li>Discuss updates to WLE themes for their Phase II</li>
<li>Make an issue to track the changes to <code>cg.subject.wle</code>: <a href="https://github.com/ilri/DSpace/issues/322">#322</a></li>
</ul>
<h2 id="20170517">2017-05-17</h2>
<h2 id="2017-05-17">2017-05-17</h2>
<ul>
<li>Looking into the error I get when trying to create a new collection on DSpace Test:</li>
</ul>
@ -275,13 +275,13 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
</code></pre><ul>
<li>After that I can create collections just fine, though I'm not sure if it has other side effects</li>
</ul>
<h2 id="20170521">2017-05-21</h2>
<h2 id="2017-05-21">2017-05-21</h2>
<ul>
<li>Start creating a basic theme for the CGIAR System Organization's community on CGSpace</li>
<li>Using colors from the <a href="http://library.cgiar.org/handle/10947/2699">CGIAR Branding guidelines (2014)</a></li>
<li>Make a GitHub issue to track this work: <a href="https://github.com/ilri/DSpace/issues/324">#324</a></li>
</ul>
<h2 id="20170522">2017-05-22</h2>
<h2 id="2017-05-22">2017-05-22</h2>
<ul>
<li>Do some cleanups of community and collection names in CGIAR System Management Office community on DSpace Test, as well as move some items as Peter requested</li>
<li>Peter wanted a list of authors in here, so I generated a list of collections using the &ldquo;View Source&rdquo; on each community and this hacky awk:</li>
@ -311,7 +311,7 @@ from metadatavalue
where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author')
AND resource_type_id = 2
AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10947/2', '10947/3', '10947/10', '10947/4', '10947/5', '10947/6', '10947/7', '10947/8', '10947/9', '10947/11', '10947/25', '10947/12', '10947/26', '10947/27', '10947/28', '10947/29', '10947/30', '10947/13', '10947/14', '10947/15', '10947/16', '10947/31', '10947/32', '10947/33', '10947/34', '10947/35', '10947/36', '10947/37', '10947/17', '10947/18', '10947/38', '10947/19', '10947/39', '10947/40', '10947/41', '10947/42', '10947/43', '10947/2512', '10947/44', '10947/20', '10947/21', '10947/45', '10947/46', '10947/47', '10947/48', '10947/49', '10947/22', '10947/23', '10947/24', '10947/50', '10947/51', '10947/2518', '10947/2776', '10947/2790', '10947/2521', '10947/2522', '10947/2782', '10947/2525', '10947/2836', '10947/2524', '10947/2878', '10947/2520', '10947/2523', '10947/2786', '10947/2631', '10947/2589', '10947/2519', '10947/2708', '10947/2526', '10947/2871', '10947/2527', '10947/4467', '10947/3457', '10947/2528', '10947/2529', '10947/2533', '10947/2530', '10947/2531', '10947/2532', '10947/2538', '10947/2534', '10947/2540', '10947/2900', '10947/2539', '10947/2784', '10947/2536', '10947/2805', '10947/2541', '10947/2535', '10947/2537', '10568/93761'))) group by text_value order by count desc) to /tmp/cgiar-librar-authors.csv with csv;
</code></pre><h2 id="20170523">2017-05-23</h2>
</code></pre><h2 id="2017-05-23">2017-05-23</h2>
<ul>
<li>Add Affiliation to filters on Listing and Reports module (<a href="https://github.com/ilri/DSpace/pull/325">#325</a>)</li>
<li>Start looking at WLE's Phase II metadata updates but it seems they are not tagging their items properly, as their website importer infers which theme to use based on the name of the CGSpace collection!</li>
@ -323,12 +323,12 @@ COPY 111
</code></pre><ul>
<li>Respond to Atmire message about ORCIDs, saying that right now we'd prefer to just have them available via REST API like any other metadata field, and that I'm available for a Skype</li>
</ul>
<h2 id="20170526">2017-05-26</h2>
<h2 id="2017-05-26">2017-05-26</h2>
<ul>
<li>Increase max file size in nginx so that CIP can upload some larger PDFs (see the sketch below)</li>
<li>Agree to talk with Atmire after the June DSpace developers meeting where they will be discussing exposing ORCIDs via REST/OAI</li>
</ul>
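<ul>
<li>That is a one-line nginx change, something like this (the value here is assumed):</li>
</ul>
<pre><code>client_max_body_size 100m;
</code></pre>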
<h2 id="20170528">2017-05-28</h2>
<h2 id="2017-05-28">2017-05-28</h2>
<ul>
<li>File an issue on GitHub to explore/track migration to proper country/region codes (ISO 2/3 and UN M.49): <a href="https://github.com/ilri/DSpace/issues/326">#326</a></li>
<li>Ask Peter how the Landportal.info people should acknowledge us as the source of data on their website</li>
@ -354,7 +354,7 @@ UPDATE 187
<li>Run the corrections on CGSpace and then update discovery / authority</li>
<li>I notice that there are a handful of <code>java.lang.OutOfMemoryError: Java heap space</code> errors in the Catalina logs on CGSpace, I should go look into that&hellip;</li>
</ul>
<h2 id="20170529">2017-05-29</h2>
<h2 id="2017-05-29">2017-05-29</h2>
<ul>
<li>Discuss WLE themes and subjects with Mia and Macaroni Bros</li>
<li>We decided we need to create metadata fields for Phase I and II themes</li>

View File

@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="June, 2017"/>
<meta name="twitter:description" content="2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&#39;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -96,7 +96,7 @@
</p>
</header>
<h2 id="20170601">2017-06-01</h2>
<h2 id="2017-06-01">2017-06-01</h2>
<ul>
<li>After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes</li>
<li>The <code>cg.identifier.wletheme</code> field will be used for both Phase I and Phase II Research Themes</li>
@ -106,7 +106,7 @@
<li>Create pull request to add Phase II research themes to the submission form: <a href="https://github.com/ilri/DSpace/pull/328">#328</a></li>
<li>Add <code>cg.subject.system</code> to CGSpace metadata registry, for subject from the upcoming CGIAR Library migration</li>
</ul>
<h2 id="20170604">2017-06-04</h2>
<h2 id="2017-06-04">2017-06-04</h2>
<ul>
<li>After adding <code>cg.identifier.wletheme</code> to 1106 WLE items I can see the field on XMLUI but not in REST!</li>
<li>Strangely it happens on DSpace Test AND on CGSpace!</li>
@ -115,7 +115,7 @@
<li>After rebooting the server (and therefore restarting Tomcat) the new metadata field is available</li>
<li>I've sent a message to the dspace-tech mailing list to ask if this is a bug and whether I should file a Jira ticket</li>
</ul>
<h2 id="20160605">2016-06-05</h2>
<h2 id="2016-06-05">2016-06-05</h2>
<ul>
<li>Rename WLE's &ldquo;Research Themes&rdquo; sub-community to &ldquo;WLE Phase I Research Themes&rdquo; on DSpace Test so Macaroni Bros can continue their testing</li>
<li>Macaroni Bros tested it and said it's fine, so I renamed it on CGSpace as well</li>
@ -151,7 +151,7 @@
<li>Total items in CIAT Book Chapters is 914, with the others being flagged for some reason, and we should send that back to CIAT</li>
<li>Restart Tomcat on CGSpace so that the <code>cg.identifier.wletheme</code> field is available on REST API for Macaroni Bros</li>
</ul>
<h2 id="20170607">2017-06-07</h2>
<h2 id="2017-06-07">2017-06-07</h2>
<ul>
<li>Testing <a href="https://github.com/ilri/DSpace/pull/319">Atmire's patch for the CUA Workflow Statistics again</a></li>
<li>Still doesn't seem to give results I'd expect, like there are no results for Maria Garruccio, or for the ILRI community!</li>
@ -186,12 +186,12 @@
</code></pre><ul>
<li>Merge the pull request for <a href="https://github.com/ilri/DSpace/pull/328">WLE Phase II themes</a></li>
</ul>
<h2 id="20170618">2017-06-18</h2>
<h2 id="2017-06-18">2017-06-18</h2>
<ul>
<li>Redeploy CGSpace with latest changes from <code>5_x-prod</code>, run system updates, and reboot the server</li>
<li>Continue working on ansible infrastructure changes for CGIAR Library</li>
</ul>
<h2 id="20170620">2017-06-20</h2>
<h2 id="2017-06-20">2017-06-20</h2>
<ul>
<li>Import Abenet and Peter's changes to the CGIAR Library CRP community</li>
<li>Due to them using Windows and renaming some columns, there were formatting, encoding, and duplicate metadata value issues</li>
@ -207,7 +207,7 @@
</ul>
<pre><code>$ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &amp;&gt; /tmp/ciat-books.log
$ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books2.map &amp;&gt; /tmp/ciat-books2.log
</code></pre><h2 id="20170625">2017-06-25</h2>
</code></pre><h2 id="2017-06-25">2017-06-25</h2>
<ul>
<li>WLE has said that one of their Phase II research themes is being renamed from <code>Regenerating Degraded Landscapes</code> to <code>Restoring Degraded Landscapes</code></li>
<li>Pull request with the changes to <code>input-forms.xml</code>: <a href="https://github.com/ilri/DSpace/pull/329">#329</a></li>
@ -221,7 +221,7 @@ $ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace impo
<li>Marianne from WLE asked if they can have both Phase I and II research themes together in the item submission form</li>
<li>Perhaps we can add them together in the same question for <code>cg.identifier.wletheme</code></li>
</ul>
<h2 id="20170630">2017-06-30</h2>
<h2 id="2017-06-30">2017-06-30</h2>
<ul>
<li>CGSpace went down briefly, I see lots of these errors in the dspace logs:</li>
</ul>

View File

@ -33,7 +33,7 @@ Merge changes for WLE Phase II theme rename (#329)
Looking at extracting the metadata registries from ICARDA&#39;s MEL DSpace database so we can compare fields with CGSpace
We can use PostgreSQL&#39;s extended output format (-x) plus sed to format the output into quasi XML:
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -114,11 +114,11 @@ We can use PostgreSQL&#39;s extended output format (-x) plus sed to format the o
</p>
</header>
<h2 id="20170701">2017-07-01</h2>
<h2 id="2017-07-01">2017-07-01</h2>
<ul>
<li>Run system updates and reboot DSpace Test</li>
</ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
@ -138,7 +138,7 @@ We can use PostgreSQL&#39;s extended output format (-x) plus sed to format the o
<li>And fuck, then anyone consuming our data via REST / OAI will not notice that we have an author outside of <code>dc.contributor.authors</code>&hellip; ugh</li>
<li>What if we modify the item submission form to use <a href="https://wiki.duraspace.org/display/DSDOC5x/Submission+User+Interface#SubmissionUserInterface-ItemtypeBasedMetadataCollection"><code>type-bind</code> fields to show/hide certain fields depending on the type</a>?</li>
</ul>
<h2 id="20170705">2017-07-05</h2>
<h2 id="2017-07-05">2017-07-05</h2>
<ul>
<li>Adjust WLE Research Theme to include both Phase I and II on the submission form according to editor feedback (<a href="https://github.com/ilri/DSpace/pull/330">#330</a>)</li>
<li>Generate list of fields in the current CGSpace <code>cg</code> scheme so we can record them properly in the metadata registry:</li>
@ -159,26 +159,26 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
</code></pre><ul>
<li>Seems to come from <code>dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java</code></li>
</ul>
<h2 id="20170706">2017-07-06</h2>
<h2 id="2017-07-06">2017-07-06</h2>
<ul>
<li>Sisay tried to help by making a <a href="https://github.com/ilri/DSpace/pull/331">pull request for the RTB flagships</a> but there are formatting errors, unrelated changes, and the flagship names are not in the style I requested</li>
<li>Abenet talked to CIP and they said they are actually ok with using collection names rather than adding a new metadata field</li>
</ul>
<h2 id="20170713">2017-07-13</h2>
<h2 id="2017-07-13">2017-07-13</h2>
<ul>
<li>Remove <code>UKaid</code> from the controlled vocabulary for <code>dc.description.sponsorship</code>, as <code>Department for International Development, United Kingdom</code> is the correct form and it is already present (<a href="https://github.com/ilri/DSpace/pull/334">#334</a>)</li>
</ul>
<h2 id="20170714">2017-07-14</h2>
<h2 id="2017-07-14">2017-07-14</h2>
<ul>
<li>Sisay sent me a patch to add &ldquo;Photo Report&rdquo; to <code>dc.type</code> so I've added it to the <code>5_x-prod</code> branch</li>
</ul>
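<ul>
<li>For reference, such a change is just a new <code>&lt;pair&gt;</code> under the <code>dc.type</code> value pairs in <code>input-forms.xml</code>, along these lines (a sketch):</li>
</ul>
<pre><code>&lt;pair&gt;
  &lt;displayed-value&gt;Photo Report&lt;/displayed-value&gt;
  &lt;stored-value&gt;Photo Report&lt;/stored-value&gt;
&lt;/pair&gt;
</code></pre>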
<h2 id="20170717">2017-07-17</h2>
<h2 id="2017-07-17">2017-07-17</h2>
<ul>
<li>Linode shut down our seventeen (17) VMs due to nonpayment of the July 1st invoice</li>
<li>It took me a few hours to find the ICT/Finance contacts to pay the bill and boot all the servers back up</li>
<li>Since the server was down anyway, I decided to run all system updates and re-deploy CGSpace to pick up the latest changes to <code>input-forms.xml</code> and the sponsors controlled vocabulary</li>
</ul>
<h2 id="20170720">2017-07-20</h2>
<h2 id="2017-07-20">2017-07-20</h2>
<ul>
<li>Skype chat with Addis team about the status of the CGIAR Library migration</li>
<li>Need to add the CGIAR System Organization subjects to Discovery Facets (test first)</li>
@ -199,7 +199,7 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
</ul>
</li>
</ul>
<h2 id="20170724">2017-07-24</h2>
<h2 id="2017-07-24">2017-07-24</h2>
<ul>
<li>Move two top-level communities to be sub-communities of ILRI Projects</li>
</ul>
@ -207,7 +207,7 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
</code></pre><ul>
<li>Discuss CGIAR Library data cleanup with Sisay and Abenet</li>
</ul>
<h2 id="20170727">2017-07-27</h2>
<h2 id="2017-07-27">2017-07-27</h2>
<ul>
<li>Help Sisay with some transforms to add descriptions to the <code>filename</code> column of some CIAT Presentations he's working on in OpenRefine</li>
<li>Marianne emailed a few days ago to ask why &ldquo;Integrating Ecosystem Solutions&rdquo; was not in the list of WLE Phase I Research Themes on the input form</li>
@ -215,21 +215,21 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
<li>Then Mia from WLE also emailed to ask where some WLE focal regions went, and I said I didn't understand what she was talking about, as all we did in our previous work was rename the old &ldquo;Research Themes&rdquo; subcommunity to &ldquo;WLE Phase I Research Themes&rdquo; and add a new subcommunity for &ldquo;WLE Phase II Research Themes&rdquo;.</li>
<li>Discuss some modifications to the CCAFS project tags in CGSpace submission form and in the database</li>
</ul>
<h2 id="20170728">2017-07-28</h2>
<h2 id="2017-07-28">2017-07-28</h2>
<ul>
<li>Discuss updates to the Phase II CCAFS project tags with Andrea from Macaroni Bros</li>
<li>I will do the renaming and untagging of items in the CGSpace database; he will update his webservice with the latest project tags, and I will get the XML for our <code>input-forms.xml</code> from here: <a href="https://ccafs.cgiar.org/export/ccafsproject">https://ccafs.cgiar.org/export/ccafsproject</a></li>
</ul>
<h2 id="20170729">2017-07-29</h2>
<h2 id="2017-07-29">2017-07-29</h2>
<ul>
<li>Move some WLE items into appropriate Phase I Research Themes communities and delete some empty collections in WLE Regions community</li>
</ul>
<h2 id="20170730">2017-07-30</h2>
<h2 id="2017-07-30">2017-07-30</h2>
<ul>
<li>Start working on CCAFS project tag cleanup</li>
<li>More questions about inconsistencies and spelling mistakes in their tags, so I've sent some questions for followup</li>
</ul>
<h2 id="20170731">2017-07-31</h2>
<h2 id="2017-07-31">2017-07-31</h2>
<ul>
<li>Looks like the final list of metadata corrections for CCAFS project tags will be:</li>
</ul>

View File

@ -57,7 +57,7 @@ This was due to newline characters in the dc.description.abstract column, which
I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -138,7 +138,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
</p>
</header>
<h2 id="20170801">2017-08-01</h2>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
@ -160,7 +160,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
</ul>
<h2 id="20170802">2017-08-02</h2>
<h2 id="2017-08-02">2017-08-02</h2>
<ul>
<li>Magdalena from CCAFS asked if there was a way to get the top ten items published in 2016 (note: not the top items in 2016!)</li>
<li>I think Atmire's Content and Usage Analysis module should be able to do this but I will have to look at the configuration and maybe email Atmire if I can't figure it out</li>
@ -168,7 +168,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<li>Atmire responded about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=500">missing workflow statistics issue</a> a few weeks ago but I didn't see it for some reason</li>
<li>They said they added a publication and saw the workflow stat for the user, so I should try again and let them know</li>
</ul>
<h2 id="20170805">2017-08-05</h2>
<h2 id="2017-08-05">2017-08-05</h2>
<ul>
<li>Usman from CIFOR emailed to ask about the status of our OAI tests for harvesting their DSpace repository</li>
<li>I told him that the OAI appears to not be harvesting properly after the first sync, and that the control panel shows an &ldquo;Internal error&rdquo; for that collection:</li>
@ -178,18 +178,18 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<li>I don't see anything related in our logs, so I asked him to check for our server's IP in their logs</li>
<li>Also, in the mean time I stopped the harvesting process, reset the status, and restarted the process via the Admin control panel (note: I didn't reset the collection, just the harvester status!)</li>
</ul>
<h2 id="20170807">2017-08-07</h2>
<h2 id="2017-08-07">2017-08-07</h2>
<ul>
<li>Apply Abenet's corrections for the CGIAR Library's Consortium subcommunity (697 records)</li>
<li>I had to fix a few small things, like moving the <code>dc.title</code> column away from the beginning of the row, deleting blank lines in the abstract in vim using <code>:g/^$/d</code>, and adding the <code>dc.subject[en_US]</code> column back, as she had deleted it and DSpace didn't detect the changes made there (we needed to blank the values instead)</li>
</ul>
<h2 id="20170808">2017-08-08</h2>
<h2 id="2017-08-08">2017-08-08</h2>
<ul>
<li>Apply Abenet's corrections for the CGIAR Library's historic archive subcommunity (2415 records)</li>
<li>I had to add the <code>dc.subject[en_US]</code> column back with blank values so that DSpace could detect the changes</li>
<li>I applied the changes in 500 item batches</li>
</ul>
<h2 id="20170809">2017-08-09</h2>
<h2 id="2017-08-09">2017-08-09</h2>
<ul>
<li>Run system updates on DSpace Test and reboot server</li>
<li>Help ICARDA upgrade their MELSpace to DSpace 5.7 using the <a href="https://github.com/alanorth/docker-dspace">docker-dspace</a> container
@ -199,7 +199,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
</ul>
</li>
</ul>
<h2 id="20170810">2017-08-10</h2>
<h2 id="2017-08-10">2017-08-10</h2>
<ul>
<li>Apply last updates to the CGIAR Library's Fund community (812 items)</li>
<li>Had to do some quality checks and column renames before importing, as either Sisay or Abenet renamed a few columns and the metadata importer wanted to remove/add new metadata for title, abstract, etc.</li>
@ -220,7 +220,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<li>Follow up with Atmire on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">ticket about ORCID metadata in DSpace</a></li>
<li>Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates</li>
</ul>
<h2 id="20170811">2017-08-11</h2>
<h2 id="2017-08-11">2017-08-11</h2>
<ul>
<li>CGSpace had load issues and was throwing errors related to PostgreSQL</li>
<li>I told Tsega to reduce the max connections from 70 to 40 because each web application actually gets its own pool of that size, so with xmlui, oai, jspui, rest, etc it could be 70 x 4 = 280 connections depending on the load, while the PostgreSQL config itself only allows 100!</li>
@ -229,7 +229,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
<li>Also, I need to find out where the load is coming from (rest?) and possibly block bots from accessing dynamic pages like Browse and Discover instead of just sending an X-Robots-Tag HTTP header</li>
<li>I noticed that Google has bitstreams from the <code>rest</code> interface in the search index. I need to ask on the dspace-tech mailing list to see what other people are doing about this, and maybe start issuing an <code>X-Robots-Tag: none</code> there!</li>
</ul>
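<ul>
<li>A minimal nginx sketch for that, assuming the existing Tomcat proxy setup:</li>
</ul>
<pre><code>location /rest {
    add_header X-Robots-Tag &quot;none&quot;;
    proxy_pass http://tomcat_http;
}
</code></pre>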
<h2 id="20170812">2017-08-12</h2>
<h2 id="2017-08-12">2017-08-12</h2>
<ul>
<li>I sent a message to the mailing list about the duplicate content issue with <code>/rest</code> and <code>/bitstream</code> URLs</li>
<li>Looking at the logs for the REST API on <code>/rest</code>, it looks like there is someone hammering it, doing testing or something&hellip;</li>
@ -249,12 +249,12 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
access_log /var/log/nginx/oai.log;
proxy_pass http://tomcat_http;
}
</code></pre><h2 id="20170813">2017-08-13</h2>
</code></pre><h2 id="2017-08-13">2017-08-13</h2>
<ul>
<li>Macaroni Bros say that CCAFS wants them to check once every hour for changes</li>
<li>I told them to check every four or six hours</li>
</ul>
<h2 id="20170814">2017-08-14</h2>
<h2 id="2017-08-14">2017-08-14</h2>
<ul>
<li>Run author corrections on CGIAR Library community from Peter</li>
</ul>
@ -300,7 +300,7 @@ $ grep -rsI SQLException dspace-xmlui | wc -l
<li>Apply 223 more author corrections from Peter on CGIAR Library</li>
<li>Help Magdalena from CCAFS with some CUA statistics questions</li>
</ul>
<h2 id="20170815">2017-08-15</h2>
<h2 id="2017-08-15">2017-08-15</h2>
<ul>
<li>Increase the nginx upload limit on CGSpace (linode18) so Sisay can upload 23 CIAT reports</li>
<li>Do some last minute cleanups and de-duplications of the CGIAR Library data, as I need to send it to Peter this week</li>
@ -308,7 +308,7 @@ $ grep -rsI SQLException dspace-xmlui | wc -l
<li>Also, a few dozen <code>dc.description.abstract</code> fields still had various HTML tags and entities in them</li>
<li>Also, a bunch of <code>dc.subject</code> fields that were not AGROVOC had not been moved properly to <code>cg.subject.system</code></li>
</ul>
<h2 id="20170816">2017-08-16</h2>
<h2 id="2017-08-16">2017-08-16</h2>
<ul>
<li>I wanted to merge the various field variations like <code>cg.subject.system</code> and <code>cg.subject.system[en_US]</code> in OpenRefine but I realized it would be easier in PostgreSQL:</li>
</ul>
@ -351,7 +351,7 @@ UPDATE 4899
<li>I think we could use <code>harvest.includerestricted.rss = false</code> but the items might need to be 100% restricted, not just the metadata</li>
<li>Adjust Ansible postgres role to use <code>max_connections</code> from a template variable and deploy a new limit of 123 on CGSpace</li>
</ul>
<h2 id="20170817">2017-08-17</h2>
<h2 id="2017-08-17">2017-08-17</h2>
<ul>
<li>Run Peter's edits to the CGIAR System Organization community on DSpace Test</li>
<li>Uptime Robot said CGSpace went down for 1 minute, not sure why</li>
@ -395,7 +395,7 @@ dspace.log.2017-08-17:584
</li>
<li>Peter responded and said that he doesn't want to limit items to be restricted just so we can change the RSS feeds</li>
</ul>
<h2 id="20170818">2017-08-18</h2>
<h2 id="2017-08-18">2017-08-18</h2>
<ul>
<li>Someone on the dspace-tech mailing list responded with some tips about using the authority framework to do external queries from the submission form</li>
<li>He linked to some examples from DSpace-CRIS that use this functionality: <a href="https://github.com/4Science/DSpace/blob/dspace-5_x_x-cris/dspace-api/src/main/java/org/dspace/content/authority/VIAFAuthority.java">VIAFAuthority</a></li>
@ -432,14 +432,14 @@ WHERE {
<li>I found this blog post about speeding up the Tomcat startup time: <a href="http://skybert.net/java/improve-tomcat-startup-time/">http://skybert.net/java/improve-tomcat-startup-time/</a></li>
<li>The startup time went from ~80s to 40s!</li>
</ul>
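<ul>
<li>If I recall correctly, the main trick from that post is skipping JAR scanning at startup, something like this in <code>catalina.properties</code> (a sketch; property name per Tomcat 7):</li>
</ul>
<pre><code>tomcat.util.scan.DefaultJarScanner.jarsToSkip=*.jar
</code></pre>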
<h2 id="20170819">2017-08-19</h2>
<h2 id="2017-08-19">2017-08-19</h2>
<ul>
<li>More examples of SPARQL queries: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></li>
<li>Specifically the explanation of the <code>FILTER</code> regex</li>
<li>Might want to <code>SELECT DISTINCT</code> or increase the <code>LIMIT</code> to get terms like &ldquo;wheat&rdquo; and &ldquo;fish&rdquo; to be visible</li>
<li>Test queries online on the AGROVOC SPARQL portal: http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc</li>
</ul>
<h2 id="20170820">2017-08-20</h2>
<h2 id="2017-08-20">2017-08-20</h2>
<ul>
<li>Since I cleared the XMLUI cache on 2017-08-17 there haven't been any more <code>ERROR net.sf.ehcache.store.DiskStore</code> errors</li>
<li>Look at the CGIAR Library to see if I can find the items that have been submitted since May:</li>
@ -466,16 +466,16 @@ WHERE {
10947/4661
10947/4664
(5 rows)
</code></pre><h2 id="20170823">2017-08-23</h2>
</code></pre><h2 id="2017-08-23">2017-08-23</h2>
<ul>
<li>Start testing the nginx configs for the CGIAR Library migration as well as start making a checklist</li>
</ul>
<h2 id="20170828">2017-08-28</h2>
<h2 id="2017-08-28">2017-08-28</h2>
<ul>
<li>Bram had written to me two weeks ago to set up a chat about ORCID stuff, but the email apparently bounced and I only found out when he emailed me on another account</li>
<li>I told him I can chat in a few weeks when I'm back</li>
</ul>
<h2 id="20170831">2017-08-31</h2>
<h2 id="2017-08-31">2017-08-31</h2>
<ul>
<li>I notice that in many WLE collections Marianne Gadeberg is in the edit or approval steps, but she is also in the groups for those steps.</li>
<li>I think we need to have a process to go back and check / fix some of these scenarios—to remove her user from the step and instead add her to the group—because we have way too many authorizations and in late 2016 we had <a href="https://github.com/ilri/rmg-ansible-public/commit/358b5ea43f9e5820986f897c9d560937c702ac6e">performance issues with Solr</a> because of this</li>

View File

@ -29,7 +29,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
Ask Sisay to clean up the WLE approvers a bit, as Marianne&#39;s user account is both in the approvers step as well as the group
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -110,15 +110,15 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne&#39;s user account is
</p>
</header>
<h2 id="20170906">2017-09-06</h2>
<h2 id="2017-09-06">2017-09-06</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group</li>
</ul>
<h2 id="20170910">2017-09-10</h2>
<h2 id="2017-09-10">2017-09-10</h2>
<ul>
<li>Delete 58 blank metadata values from the CGSpace database:</li>
</ul>
@ -155,12 +155,12 @@ dspace.log.2017-09-10:0
<li>I updated both CGSpace and DSpace Test to use these new settings (60 connections per web app and 183 for system PostgreSQL limit)</li>
<li>I'm expecting to see 0 connection errors for the next few months</li>
</ul>
<h2 id="20170911">2017-09-11</h2>
<h2 id="2017-09-11">2017-09-11</h2>
<ul>
<li>Lots of work testing the CGIAR Library migration</li>
<li>Many technical notes and TODOs here: <a href="https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c">https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c</a></li>
</ul>
<h2 id="20170912">2017-09-12</h2>
<h2 id="2017-09-12">2017-09-12</h2>
<ul>
<li>I was testing the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-AIPConfigurationsToImproveIngestionSpeedwhileValidating">METS XSD caching during AIP ingest</a> but it doesn't seem to help actually</li>
<li>The import process takes the same amount of time with and without the caching</li>
@ -190,7 +190,7 @@ dspace.log.2017-09-10:0
</ul>
</li>
</ul>
<h2 id="20170913">2017-09-13</h2>
<h2 id="2017-09-13">2017-09-13</h2>
<ul>
<li>Last night Linode sent an alert about CGSpace (linode18) that it has exceeded the outbound traffic rate threshold of 10Mb/s for the last two hours</li>
<li>I wonder what was going on, and looking into the nginx logs I think maybe it's OAI&hellip;</li>
@ -406,7 +406,7 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134
</code></pre><ul>
<li>It added <em>another</em> authority&hellip; surely this is not the desired behavior, or maybe we are not using this as intended?</li>
</ul>
<h2 id="20170914">2017-09-14</h2>
<h2 id="2017-09-14">2017-09-14</h2>
<ul>
<li>Communicate with Handle.net admins to try to get some guidance about the 10947 prefix</li>
<li>Michael Marus is the contact for their prefix, but he has left CGIAR; as I actually have access to the CGIAR Library server I think I can just generate a new <code>sitebndl.zip</code> file from their server and send it to Handle.net</li>
@ -415,7 +415,7 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134
<li>I didn't see any abnormally high usage in the REST or OAI logs, but looking at Munin I see the average JVM usage was at 4.9GB and the heap is only 5GB (5120M), so I think it's just normal growing pains</li>
<li>Every few months I generally try to increase the JVM heap to be 512M higher than the average usage reported by Munin, so now I adjusted it to 5632M</li>
</ul>
<h2 id="20170915">2017-09-15</h2>
<h2 id="2017-09-15">2017-09-15</h2>
<ul>
<li>Apply CCAFS project tag corrections on CGSpace:</li>
</ul>
@ -425,7 +425,7 @@ UPDATE 4
UPDATE 1
DELETE 1
DELETE 207
</code></pre><h2 id="20170917">2017-09-17</h2>
</code></pre><h2 id="2017-09-17">2017-09-17</h2>
<ul>
<li>Create pull request for CGSpace to be able to resolve multiple handles (<a href="https://github.com/ilri/DSpace/pull/339">#339</a>)</li>
<li>We still need to do the changes to <code>config.dct</code> and regenerate the <code>sitebndl.zip</code> to send to the Handle.net admins</li>
@ -456,7 +456,7 @@ DELETE 207
<li>I decided to start the import process in the evening rather than waiting for the morning, and right as the first community was finished importing I started seeing <code>Timeout waiting for idle object</code> errors</li>
<li>I had to cancel the import, clean up a bunch of database entries, increase the PostgreSQL <code>max_connections</code> as a precaution, restart PostgreSQL and Tomcat, and then finally completed the import</li>
</ul>
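<ul>
<li>For reference, the sort of commands that involves (a sketch, assuming a superuser <code>psql</code> session and systemd service names, which may differ on this host):</li>
</ul>
<pre><code># psql -U postgres -c 'ALTER SYSTEM SET max_connections = 183;'
# systemctl restart postgresql tomcat7
</code></pre>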
<h2 id="20170918">2017-09-18</h2>
<h2 id="2017-09-18">2017-09-18</h2>
<ul>
<li>I think we should force regeneration of all thumbnails in the CGIAR Library community, as their DSpace is version 1.7 and CGSpace is running DSpace 5.5 so they should look much better</li>
<li>One item for comparison:</li>
@ -466,7 +466,7 @@ DELETE 207
<ul>
<li>Moved the CGIAR Library Migration notes to a page (<a href="/cgspace-notes/cgiar-library-migration/">cgiar-library-migration</a>) as there seems to be a bug with post slugs defined in frontmatter when you have a permalink scheme defined in <code>config.toml</code> (happens currently in Hugo 0.27.1 at least)</li>
</ul>
<h2 id="20170919">2017-09-19</h2>
<h2 id="2017-09-19">2017-09-19</h2>
<ul>
<li>Nightly Solr indexing is working again, and it appears to be pretty quick actually:</li>
</ul>
@ -481,7 +481,7 @@ DELETE 207
<li>Marianne Gadeberg from WLE asked if I would add an account for Adam Hunt on CGSpace and give him permissions to approve all WLE publications</li>
<li>I told him to register first, as he's a CGIAR user and needs an account to be created before I can add him to the groups</li>
</ul>
<h2 id="20170920">2017-09-20</h2>
<h2 id="2017-09-20">2017-09-20</h2>
<ul>
<li>Abenet and I noticed that hdl.handle.net is blocked by ETC at ILRI Addis so I asked Biruk Debebe to route it over the satellite</li>
<li>Force thumbnail regeneration for the CGIAR System Organization's Historic Archive community (2000 items):</li>
@ -490,19 +490,19 @@ DELETE 207
</code></pre><ul>
<li>I'm still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org</li>
</ul>
<h2 id="20170921">2017-09-21</h2>
<h2 id="2017-09-21">2017-09-21</h2>
<ul>
<li>Switch to OpenJDK 8 from Oracle JDK on DSpace Test</li>
<li>I want to test this for a while to see if we can start using it instead</li>
<li>I need to look at the JVM graphs in Munin, test the Atmire modules, build the source, etc to get some impressions</li>
</ul>
<h2 id="20170922">2017-09-22</h2>
<h2 id="2017-09-22">2017-09-22</h2>
<ul>
<li>Experimenting with setting up a global JNDI database resource that can be pooled among all the DSpace webapps (reference the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting</a> comments)</li>
<li>See: <a href="https://www.journaldev.com/2513/tomcat-datasource-jndi-example-java">https://www.journaldev.com/2513/tomcat-datasource-jndi-example-java</a></li>
<li>See: <a href="http://memorynotfound.com/configure-jndi-datasource-tomcat/">http://memorynotfound.com/configure-jndi-datasource-tomcat/</a></li>
</ul>
<h2 id="20170924">2017-09-24</h2>
<h2 id="2017-09-24">2017-09-24</h2>
<ul>
<li>Start investigating other platforms for CGSpace due to linear instance pricing on Linode</li>
<li>We need to figure out how much memory is used by applications, caches, etc, and how much disk space the asset store needs</li>
@ -538,7 +538,7 @@ DELETE 207
<li>I ended up having to kill the import and wait until he was done</li>
<li>I exported a clean CSV and applied the changes from that one, which was a hundred or two less than I thought there should be (at least compared to the current state of DSpace Test, which is a few months old)</li>
</ul>
<h2 id="20170925">2017-09-25</h2>
<h2 id="2017-09-25">2017-09-25</h2>
<ul>
<li>Email Rosemary Kande from ICT to ask about the administrative / finance procedure for moving DSpace Test from EU to US region on Linode</li>
<li>Communicate (finally) with Tania and Tunji from the CGIAR System Organization office to tell them to request CGNET make the DNS updates for library.cgiar.org</li>
@ -602,7 +602,7 @@ INFO org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Da
</code></pre><ul>
<li>So it's good to know that <em>something</em> gets printed when it fails because I didn't see <em>any</em> mention of JNDI before when I was testing!</li>
</ul>
<h2 id="20170926">2017-09-26</h2>
<h2 id="2017-09-26">2017-09-26</h2>
<ul>
<li>Adam Hunt from WLE finally registered so I added him to the editor and approver groups</li>
<li>Then I noticed that Sisay never removed Marianne's user accounts from the approver steps in the workflow because she is already in the WLE groups, which are in those steps</li>
@ -613,7 +613,7 @@ INFO org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Da
<li>Start discussing with ICT about the Linode server update for DSpace Test</li>
<li>Rosemary said I need to work with Robert Okal to destroy/create the server, and then let her and Lilian Masigah from finance know the updated Linode asset names for their records</li>
</ul>
<h2 id="20170928">2017-09-28</h2>
<h2 id="2017-09-28">2017-09-28</h2>
<ul>
<li>Tunji from the System Organization finally sent the DNS request for library.cgiar.org to CGNET</li>
<li>Now the redirects work</li>


@ -31,7 +31,7 @@ http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
There appears to be a pattern but I&#39;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -112,7 +112,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
</p>
</header>
<h2 id="20171001">2017-10-01</h2>
<h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
@ -121,7 +121,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
<li>There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
</ul>
<h2 id="20171002">2017-10-02</h2>
<h2 id="2017-10-02">2017-10-02</h2>
<ul>
<li>Peter Ballantyne said he was having problems logging into CGSpace with &ldquo;both&rdquo; of his accounts (CGIAR LDAP and personal, apparently)</li>
<li>I looked in the logs and saw some LDAP lookup failures due to timeout but also strangely a &ldquo;no DN found&rdquo; error:</li>
@ -138,7 +138,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
<li>For what it's worth, there are no errors on any other recent days, so it must have been some network issue on Linode or CGNET's LDAP server</li>
<li>Linode emailed to say that linode578611 (DSpace Test) needs to migrate to a new host for a security update so I initiated the migration immediately rather than waiting for the scheduled time in two weeks</li>
</ul>
<h2 id="20171004">2017-10-04</h2>
<h2 id="2017-10-04">2017-10-04</h2>
<ul>
<li>Twice in the last twenty-four hours Linode has alerted about high CPU usage on CGSpace (linode2533629)</li>
<li>Communicate with Sam from the CGIAR System Organization about some broken links coming from their CGIAR Library domain to CGSpace</li>
@ -152,7 +152,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
<li>Lots of inconsistencies and errors in subjects, dc.format.extent, regions, countries</li>
<li>Merge the Discovery search changes for ISI Journal (<a href="https://github.com/ilri/DSpace/pull/341">#341</a>)</li>
</ul>
<h2 id="20171005">2017-10-05</h2>
<h2 id="2017-10-05">2017-10-05</h2>
<ul>
<li>Twice in the past twenty-four hours Linode has warned that CGSpace's outbound traffic rate was exceeding the notification threshold</li>
<li>I had a look at yesterday's OAI and REST logs in <code>/var/log/nginx</code> but didn't see anything unusual:</li>
@ -188,7 +188,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
<li>I used OpenRefine to isolate them and then fixed and re-imported them into CGSpace</li>
<li>I manually checked a dozen of them and it appeared that the correct handle was always the second one, so I just deleted the first one</li>
</ul>
<h2 id="20171006">2017-10-06</h2>
<h2 id="2017-10-06">2017-10-06</h2>
<ul>
<li>I saw a nice tweak to thumbnail presentation on the Cardiff Metropolitan University DSpace: <a href="https://repository.cardiffmet.ac.uk/handle/10369/8780">https://repository.cardiffmet.ac.uk/handle/10369/8780</a></li>
<li>It adds a subtle border and box shadow, before and after:</li>
@ -203,7 +203,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
<li>This method is kind of a hack but at least we can put all the pieces into git to be reproducible</li>
<li>I will tell Tunji to send me the verification file</li>
</ul>
<h2 id="20171010">2017-10-10</h2>
<h2 id="2017-10-10">2017-10-10</h2>
<ul>
<li>Deploy logic to allow verification of the library.cgiar.org domain in the Google Search Console (<a href="https://github.com/ilri/DSpace/pull/343">#343</a>)</li>
<li>After verifying both the HTTP and HTTPS domains and submitting a sitemap it will be interesting to see how the stats in the console as well as the search results change (currently 28,500 results):</li>
@ -226,7 +226,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
<li>Delete community 10568/174 (Sustainable livestock futures)</li>
<li>Delete collections in 10568/27629 that have zero items (33 of them!)</li>
</ul>
<h2 id="20171011">2017-10-11</h2>
<h2 id="2017-10-11">2017-10-11</h2>
<ul>
<li>Peter added me as an owner on the CGSpace property on Google Search Console and I tried to submit a &ldquo;Change of Address&rdquo; request for the CGIAR Library but got an error:</li>
</ul>
@ -235,25 +235,25 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
<li>We are sending top-level CGIAR Library traffic to their specific community hierarchy in CGSpace so this type of change of address won't work—we'll just need to wait for Google to slowly index everything and take note of the HTTP 301 redirects</li>
<li>Also the Google Search Console doesn't work very well with Google Analytics being blocked, so I had to turn off my ad blocker to get the &ldquo;Change of Address&rdquo; tool to work!</li>
</ul>
<h2 id="20171012">2017-10-12</h2>
<h2 id="2017-10-12">2017-10-12</h2>
<ul>
<li>Finally finish (I think) working on the myriad nginx redirects for all the CGIAR Library browse stuff—it ended up getting pretty complicated!</li>
<li>I still need to commit the DSpace changes (add browse index, XMLUI strings, Discovery index, etc), but I should be able to deploy that on CGSpace soon</li>
</ul>
<h2 id="20171014">2017-10-14</h2>
<h2 id="2017-10-14">2017-10-14</h2>
<ul>
<li>Run system updates on DSpace Test and reboot server</li>
<li>Merge changes adding a search/browse index for CGIAR System subject to <code>5_x-prod</code> (<a href="https://github.com/ilri/DSpace/pull/344">#344</a>)</li>
<li>I checked the top browse links in Google's search results for <code>site:library.cgiar.org inurl:browse</code> and they are all redirected appropriately by the nginx rewrites I worked on last week</li>
</ul>
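<ul>
<li>A quick way to spot-check one of those redirects from the shell (the browse URL here is just an illustrative example, not one of the actual links):</li>
</ul>
<pre><code>$ curl -sI 'https://library.cgiar.org/browse?type=title' | grep -E '^(HTTP|Location)'
</code></pre>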
<h2 id="20171022">2017-10-22</h2>
<h2 id="2017-10-22">2017-10-22</h2>
<ul>
<li>Run system updates on DSpace Test and reboot server</li>
<li>Re-deploy CGSpace from latest <code>5_x-prod</code> (adds ISI Journal to search filters and adds Discovery index for CGIAR Library <code>systemsubject</code>)</li>
<li>Deploy nginx redirect fixes to catch CGIAR Library browse links (redirect to their community and translate subject→systemsubject)</li>
<li>Run migration of CGSpace server (linode18) for Linode security alert, which took 42 minutes of downtime</li>
</ul>
<h2 id="20171026">2017-10-26</h2>
<h2 id="2017-10-26">2017-10-26</h2>
<ul>
<li>In the last 24 hours we've gotten a few alerts from Linode that there was high CPU and outgoing traffic on CGSpace</li>
<li>Uptime Robot even noticed CGSpace go &ldquo;down&rdquo; for a few minutes</li>
@ -280,15 +280,15 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
<li>I told her about the possibility to use per-collection item templates, and asked if her items in question were all from a single collection</li>
<li>We've never used it but it could be worth looking at</li>
</ul>
<h2 id="20171027">2017-10-27</h2>
<h2 id="2017-10-27">2017-10-27</h2>
<ul>
<li>Linode alerted about high CPU usage again (twice) on CGSpace in the last 24 hours, around 2AM and 2PM</li>
</ul>
<h2 id="20171028">2017-10-28</h2>
<h2 id="2017-10-28">2017-10-28</h2>
<ul>
<li>Linode alerted about high CPU usage again on CGSpace around 2AM this morning</li>
</ul>
<h2 id="20171029">2017-10-29</h2>
<h2 id="2017-10-29">2017-10-29</h2>
<ul>
<li>Linode alerted about high CPU usage again on CGSpace around 2AM and 4AM</li>
<li>I'm still not sure why this started causing alerts so repeatedly the past week</li>
@ -310,7 +310,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
<li>After browsing the CORE site it seems that the CGIAR Library is somehow a member of CORE, so they have probably only been harvesting CGSpace since we did the migration, as library.cgiar.org directs to us now</li>
<li>For now I will just contact them to have them update their contact info in the bot's user agent, but eventually I think I'll tell them to swap out the CGIAR Library entry for CGSpace</li>
</ul>
<h2 id="20171030">2017-10-30</h2>
<h2 id="2017-10-30">2017-10-30</h2>
<ul>
<li>Like clock work, Linode alerted about high CPU usage on CGSpace again this morning (this time at 8:13 AM)</li>
<li>Uptime Robot noticed that CGSpace went down around 10:15 AM, and I saw that there were 93 PostgreSQL connections:</li>
@ -385,7 +385,7 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
</code></pre><ul>
<li>I will check again tomorrow</li>
</ul>
<h2 id="20171031">2017-10-31</h2>
<h2 id="2017-10-31">2017-10-31</h2>
<ul>
<li>Very nice, Linode alerted that CGSpace had high CPU usage at 2AM again</li>
<li>Ask on the dspace-tech mailing list if it's possible to use an existing item as a template for a new item</li>


@ -45,7 +45,7 @@ Generate list of authors on CGSpace for Peter to go through and correct:
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = &#39;contributor&#39; and qualifier = &#39;author&#39;) AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -126,11 +126,11 @@ COPY 54701
</p>
</header>
<h2 id="20171101">2017-11-01</h2>
<h2 id="2017-11-01">2017-11-01</h2>
<ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul>
<h2 id="20171102">2017-11-02</h2>
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
@ -156,12 +156,12 @@ COPY 54701
<li>Also, some dates with completely invalid formats like &ldquo;2010- 06&rdquo; and &ldquo;2011-3-28&rdquo;</li>
<li>I also collapsed some consecutive whitespace on a handful of fields</li>
</ul>
<h2 id="20171103">2017-11-03</h2>
<h2 id="2017-11-03">2017-11-03</h2>
<ul>
<li>Atmire got back to us to say that they estimate it will take two days of labor to implement the change to Listings and Reports</li>
<li>I said I'd ask Abenet if she wants that feature</li>
</ul>
<h2 id="20171104">2017-11-04</h2>
<h2 id="2017-11-04">2017-11-04</h2>
<ul>
<li>I finished looking through Sisay's CIAT records for the &ldquo;Alianzas de Aprendizaje&rdquo; data</li>
<li>I corrected about half of the authors to standardize them</li>
@ -198,7 +198,7 @@ COPY 54701
</code></pre><ul>
<li>For now I don't know what this user is!</li>
</ul>
<h2 id="20171105">2017-11-05</h2>
<h2 id="2017-11-05">2017-11-05</h2>
<ul>
<li>Peter asked if I could fix the appearance of &ldquo;International Livestock Research Institute&rdquo; in the author lookup during item submission</li>
<li>It looks to be just an issue with the user interface expecting authors to have both a first and last name:</li>
@ -226,7 +226,7 @@ COPY 54701
<li>This guide shows how to <a href="https://geekflare.com/enable-jmx-tomcat-to-monitor-administer/">enable JMX in Tomcat</a> by modifying <code>CATALINA_OPTS</code></li>
<li>I was able to successfully connect to my local Tomcat with jconsole!</li>
</ul>
<h2 id="20171107">2017-11-07</h2>
<h2 id="2017-11-07">2017-11-07</h2>
<ul>
<li>CGSpace went down and up a few times this morning, first around 3AM, then around 7</li>
<li>Tsega had to restart Tomcat 7 to fix it temporarily</li>
@ -464,7 +464,7 @@ $ grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=104.196.152.243' dspace.log.2017-
</ul>
<pre><code># grep &quot;Baiduspider/2.0&quot; /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq | wc -l
164
</code></pre><h2 id="20171108">2017-11-08</h2>
</code></pre><h2 id="2017-11-08">2017-11-08</h2>
<ul>
<li>Linode sent several alerts last night about CPU usage and outbound traffic rate at 6:13PM</li>
<li>Linode sent another alert about CPU usage in the morning at 6:12AM</li>
@ -526,7 +526,7 @@ proxy_set_header User-Agent $ua;
<li>Run system updates on CGSpace and reboot the server</li>
<li>Re-deploy latest <code>5_x-prod</code> branch on CGSpace and DSpace Test (includes the clickable thumbnails, CCAFS phase II project tags, and updated news text)</li>
</ul>
<h2 id="20171109">2017-11-09</h2>
<h2 id="2017-11-09">2017-11-09</h2>
<ul>
<li>Awesome, it seems my bot mapping stuff in nginx actually reduced the number of Tomcat sessions used by the CIAT scraper today, total requests and unique sessions:</li>
</ul>
@ -550,13 +550,13 @@ $ grep 104.196.152.243 dspace.log.2017-11-07 | grep -o -E 'session_id=[A-Z0-9]{3
<li>This gets me thinking, I wonder if I can use something like nginx's rate limiter to automatically change the user agent of clients who make too many requests</li>
<li>Perhaps using a combination of geo and map, like illustrated here: <a href="https://www.nginx.com/blog/rate-limiting-nginx/">https://www.nginx.com/blog/rate-limiting-nginx/</a></li>
</ul>
<h2 id="20171111">2017-11-11</h2>
<h2 id="2017-11-11">2017-11-11</h2>
<ul>
<li>I was looking at the Google index and noticed there are 4,090 search results for dspace.ilri.org but only seven for mahider.ilri.org</li>
<li>Search with something like: inurl:dspace.ilri.org inurl:https</li>
<li>I want to get rid of those legacy domains eventually!</li>
</ul>
<h2 id="20171112">2017-11-12</h2>
<h2 id="2017-11-12">2017-11-12</h2>
<ul>
<li>Update the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure templates</a> to be a little more modular and flexible</li>
<li>Looking at the top client IPs on CGSpace so far this morning, even though it's only been eight hours:</li>
@ -630,7 +630,7 @@ Server: nginx
<li>The first request works, second is denied with an HTTP 503!</li>
<li>I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them</li>
</ul>
<h2 id="20171113">2017-11-13</h2>
<h2 id="2017-11-13">2017-11-13</h2>
<ul>
<li>At the end of the day I checked the logs and it really looks like the Baidu rate limiting is working, HTTP 200 vs 503:</li>
</ul>
@ -659,7 +659,7 @@ Server: nginx
<li>After uploading and looking at the data in DSpace Test I saw more errors with CRPs, subjects (one item had four copies of all of its subjects, another had a &ldquo;.&rdquo; in it), affiliations, sponsors, etc.</li>
<li>Atmire responded to the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">ticket about ORCID stuff</a> a few days ago, today I told them that I need to talk to Peter and the partners to see what we would like to do</li>
</ul>
<h2 id="20171114">2017-11-14</h2>
<h2 id="2017-11-14">2017-11-14</h2>
<ul>
<li>Deploy some nginx configuration updates to CGSpace</li>
<li>They had been waiting on a branch for a few months and I think I just forgot about them</li>
@ -674,13 +674,13 @@ dspace6=# CREATE EXTENSION pgcrypto;
<li>I'm not sure if we can use separate profiles like we did before with <code>mvn -Denv=blah</code> to use blah.properties</li>
<li>It seems we need to use &ldquo;system properties&rdquo; to override settings, ie: <code>-Ddspace.dir=/Users/aorth/dspace6</code></li>
</ul>
<h2 id="20171115">2017-11-15</h2>
<h2 id="2017-11-15">2017-11-15</h2>
<ul>
<li>Send Adam Hunt an invite to the DSpace Developers network on Yammer</li>
<li>He is the new head of communications at WLE, since Michael left</li>
<li>Merge changes to item view's wording of link metadata (<a href="https://github.com/ilri/DSpace/pull/348">#348</a>)</li>
</ul>
<h2 id="20171117">2017-11-17</h2>
<h2 id="2017-11-17">2017-11-17</h2>
<ul>
<li>Uptime Robot said that CGSpace went down today and I see lots of <code>Timeout waiting for idle object</code> errors in the DSpace logs</li>
<li>I looked in PostgreSQL using <code>SELECT * FROM pg_stat_activity;</code> and saw that there were 73 active connections</li>
@ -724,7 +724,7 @@ dspace6=# CREATE EXTENSION pgcrypto;
<ul>
<li>Switch DSpace Test to using the G1GC for JVM so I can see what the JVM graph looks like eventually, and start evaluating it for production</li>
</ul>
<h2 id="20171119">2017-11-19</h2>
<h2 id="2017-11-19">2017-11-19</h2>
<ul>
<li>Linode sent an alert that CGSpace was using a lot of CPU around 4 to 6 AM</li>
<li>Looking in the nginx access logs I see the most active XMLUI users between 4 and 6 AM:</li>
@ -762,18 +762,18 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
<li>It's been a few days since I enabled the G1GC on DSpace Test and the JVM graph definitely changed:</li>
</ul>
<p><img src="/cgspace-notes/2017/11/tomcat-jvm-g1gc.png" alt="Tomcat G1GC"></p>
<h2 id="20171120">2017-11-20</h2>
<h2 id="2017-11-20">2017-11-20</h2>
<ul>
<li>I found <a href="https://www.cakesolutions.net/teamblogs/low-pause-gc-on-the-jvm">an article about JVM tuning</a> that gives some pointers how to enable logging and tools to analyze logs for you</li>
<li>Also notes on <a href="https://blog.gceasy.io/2016/11/15/rotating-gc-log-files/">rotating GC logs</a></li>
<li>I decided to switch DSpace Test back to the CMS garbage collector because it is designed for low pauses and high throughput (like G1GC!) and because we haven't even tried to monitor or tune it</li>
</ul>
<h2 id="20171121">2017-11-21</h2>
<h2 id="2017-11-21">2017-11-21</h2>
<ul>
<li>Magdalena was having problems logging in via LDAP and it seems to be a problem with the CGIAR LDAP server:</li>
</ul>
<pre><code>2017-11-21 11:11:09,621 WARN org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=2FEC0E5286C17B6694567FFD77C3171C:ip_addr=77.241.141.58:ldap_authentication:type=failed_auth javax.naming.CommunicationException\colon; simple bind failed\colon; svcgroot2.cgiarad.org\colon;3269 [Root exception is javax.net.ssl.SSLHandshakeException\colon; sun.security.validator.ValidatorException\colon; PKIX path validation failed\colon; java.security.cert.CertPathValidatorException\colon; validity check failed]
</code></pre><h2 id="20171122">2017-11-22</h2>
</code></pre><h2 id="2017-11-22">2017-11-22</h2>
<ul>
<li>Linode sent an alert that the CPU usage on the CGSpace server was very high around 4 to 6 AM</li>
<li>The logs don't show anything particularly abnormal between those hours:</li>
@ -794,7 +794,7 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
<li>In other news, it looks like the JVM garbage collection pattern is back to its standard jigsaw pattern after switching back to CMS a few days ago:</li>
</ul>
<p><img src="/cgspace-notes/2017/11/tomcat-jvm-cms.png" alt="Tomcat JVM with CMS GC"></p>
<h2 id="20171123">2017-11-23</h2>
<h2 id="2017-11-23">2017-11-23</h2>
<ul>
<li>Linode alerted again that CPU usage was high on CGSpace from 4:13 to 6:13 AM</li>
<li>I see a lot of Googlebot (66.249.66.90) in the XMLUI access logs</li>
@ -838,7 +838,7 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
<li>Apparently setting <code>random_page_cost</code> to 1 is &ldquo;common&rdquo; advice for systems running PostgreSQL on SSD (the default is 4)</li>
<li>So I deployed this on DSpace Test and will check the Munin PostgreSQL graphs in a few days to see if anything changes</li>
</ul>
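<ul>
<li>The change itself is a one-liner, and since <code>random_page_cost</code> doesn't require a restart a reload is enough (a sketch, assuming a superuser session):</li>
</ul>
<pre><code>postgres=# ALTER SYSTEM SET random_page_cost = 1;
postgres=# SELECT pg_reload_conf();
</code></pre>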
<h2 id="20171124">2017-11-24</h2>
<h2 id="2017-11-24">2017-11-24</h2>
<ul>
<li>It's too early to tell for sure, but after I made the <code>random_page_cost</code> change on DSpace Test's PostgreSQL yesterday the number of connections dropped drastically:</li>
</ul>
@ -857,7 +857,7 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
</code></pre><ul>
<li>I should probably tell CGIAR people to have CGNET stop that</li>
</ul>
<h2 id="20171126">2017-11-26</h2>
<h2 id="2017-11-26">2017-11-26</h2>
<ul>
<li>Linode alerted that CGSpace server was using too much CPU from 5:18 to 7:18 AM</li>
<li>Yet another mystery because the load for all domains looks fine at that time:</li>
@ -873,7 +873,7 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
298 157.55.39.206
379 66.249.66.70
1855 66.249.66.90
</code></pre><h2 id="20171129">2017-11-29</h2>
</code></pre><h2 id="2017-11-29">2017-11-29</h2>
<ul>
<li>Linode alerted that CGSpace was using 279% CPU from 6 to 8 AM this morning</li>
<li>About an hour later Uptime Robot said that the server was down</li>
@ -911,7 +911,7 @@ $ cat dspace.log.2017-11-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | u
<li>I will bump DSpace's <code>db.maxconnections</code> from 60 to 90, and PostgreSQL's <code>max_connections</code> from 183 to 273 (which is using my loose formula of 90 * webapps + 3)</li>
<li>I really need to figure out how to get DSpace to use a PostgreSQL connection pool</li>
</ul>
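<ul>
<li>Sanity-checking that formula, assuming the three main webapps (XMLUI, OAI, and REST) are what I'm counting:</li>
</ul>
<pre><code>$ echo $((90 * 3 + 3))
273
</code></pre>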
<h2 id="20171130">2017-11-30</h2>
<h2 id="2017-11-30">2017-11-30</h2>
<ul>
<li>Linode alerted about high CPU usage on CGSpace again around 6 to 8 AM</li>
<li>Then Uptime Robot said CGSpace was down a few minutes later, but it resolved itself I think (or Tsega restarted Tomcat, I don't know)</li>

View File

@ -27,7 +27,7 @@ The logs say &ldquo;Timeout waiting for idle object&rdquo;
PostgreSQL activity says there are 115 connections currently
The list of connections to XMLUI and REST API for today:
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -108,7 +108,7 @@ The list of connections to XMLUI and REST API for today:
</p>
</header>
<h2 id="20171201">2017-12-01</h2>
<h2 id="2017-12-01">2017-12-01</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
@ -166,11 +166,11 @@ The list of connections to XMLUI and REST API for today:
14 2a01:7e00::f03c:91ff:fe18:7396
46 2001:4b99:1:1:216:3eff:fe2c:dc6c
319 2001:4b99:1:1:216:3eff:fe76:205b
</code></pre><h2 id="20171203">2017-12-03</h2>
</code></pre><h2 id="2017-12-03">2017-12-03</h2>
<ul>
<li>Linode alerted that CGSpace's load was 327.5% from 6 to 8 AM again</li>
</ul>
<h2 id="20171204">2017-12-04</h2>
<h2 id="2017-12-04">2017-12-04</h2>
<ul>
<li>Linode alerted that CGSpace's load was 255.5% from 8 to 10 AM again</li>
<li>I looked at the Munin stats on DSpace Test (linode02) again to see how the PostgreSQL tweaks from a few weeks ago were holding up:</li>
@ -184,13 +184,13 @@ The list of connections to XMLUI and REST API for today:
<li>For reference, here is the past month's connections:</li>
</ul>
<p><img src="/cgspace-notes/2017/12/postgres-connections-month-cgspace.png" alt="CGSpace PostgreSQL connections month"></p>
<h2 id="20171205">2017-12-05</h2>
<h2 id="2017-12-05">2017-12-05</h2>
<ul>
<li>Linode alerted again that the CPU usage on CGSpace was high this morning from 8 to 10 AM</li>
<li>CORE updated the entry for CGSpace on their index: <a href="https://core.ac.uk/search?q=repositories.id:(1016)&amp;fullTextOnly=false">https://core.ac.uk/search?q=repositories.id:(1016)&amp;fullTextOnly=false</a></li>
<li>Linode alerted again that the CPU usage on CGSpace was high this evening from 8 to 10 PM</li>
</ul>
<h2 id="20171206">2017-12-06</h2>
<h2 id="2017-12-06">2017-12-06</h2>
<ul>
<li>Linode alerted again that the CPU usage on CGSpace was high this morning from 6 to 8 AM</li>
<li>Uptime Robot alerted that the server went down and up around 8:53 this morning</li>
@ -212,7 +212,7 @@ The list of connections to XMLUI and REST API for today:
</code></pre><ul>
<li>50.116.102.77 is apparently in the US on websitewelcome.com</li>
</ul>
<h2 id="20171207">2017-12-07</h2>
<h2 id="2017-12-07">2017-12-07</h2>
<ul>
<li>Uptime Robot reported a few times today that CGSpace was down and then up</li>
<li>At one point Tsega restarted Tomcat</li>
@ -254,17 +254,17 @@ Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign k
</ul>
<pre><code>dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (144666);
UPDATE 1
</code></pre><h2 id="20171213">2017-12-13</h2>
</code></pre><h2 id="2017-12-13">2017-12-13</h2>
<ul>
<li>Linode alerted that CGSpace was using high CPU from 10:13 to 12:13 this morning</li>
</ul>
<h2 id="20171216">2017-12-16</h2>
<h2 id="2017-12-16">2017-12-16</h2>
<ul>
<li>Re-work the XMLUI base theme to allow child themes to override the header logo's image and link destination: <a href="https://github.com/ilri/DSpace/pull/349">#349</a></li>
<li>This required a little bit of work to restructure the XSL templates</li>
<li>Optimize PNG and SVG image assets in the CGIAR base theme using pngquant and svgo: <a href="https://github.com/ilri/DSpace/pull/350">#350</a></li>
</ul>
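<ul>
<li>The invocations are roughly as follows (a sketch; the theme path is an assumption, and both tools overwrite files in place here, so best run on a clean git checkout):</li>
</ul>
<pre><code>$ cd dspace/modules/xmlui-mirage2/src/main/webapp/themes/0_CGIAR/images
$ pngquant --ext .png --force *.png
$ for f in *.svg; do svgo "$f"; done
</code></pre>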
<h2 id="20171217">2017-12-17</h2>
<h2 id="2017-12-17">2017-12-17</h2>
<ul>
<li>Reboot DSpace Test to get new Linode Linux kernel</li>
<li>Looking at CCAFS bulk import for Magdalena Haman (she originally sent them in November but some of the thumbnails were missing and dates were messed up so she resent them now)</li>
@ -358,7 +358,7 @@ Elapsed time: 2 secs (2559 msecs)
<li>I will apply it on our branch but I need to make a note to NOT cherry-pick it when I rebase on to the latest 5.x upstream later</li>
<li>Pull request: <a href="https://github.com/ilri/DSpace/pull/351">#351</a></li>
</ul>
<h2 id="20171218">2017-12-18</h2>
<h2 id="2017-12-18">2017-12-18</h2>
<ul>
<li>Linode alerted this morning that there was high outbound traffic from 6 to 8 AM</li>
<li>The XMLUI logs show that the CORE bot from last night (137.108.70.7) is very active still:</li>
@ -453,7 +453,7 @@ $ schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery
</code></pre><ul>
<li>The PostgreSQL issues are getting out of control, I need to figure out how to enable connection pools in Tomcat!</li>
</ul>
<h2 id="20171219">2017-12-19</h2>
<h2 id="2017-12-19">2017-12-19</h2>
<ul>
<li>Briefly had PostgreSQL connection issues on CGSpace for the millionth time</li>
<li>I'm fucking sick of this!</li>
@ -651,7 +651,7 @@ javax.naming.NoInitialContextException: Need to specify class name in environmen
<li>If you monitor the <code>pg_stat_activity</code> while you run <code>dspace database info</code> you can see that it doesn't use the JNDI and creates ~9 extra PostgreSQL connections!</li>
<li>And in the middle of all of this Linode sends an alert that CGSpace has high CPU usage from 2 to 4 PM</li>
</ul>
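<ul>
<li>One way to watch that connection creep from a second terminal while the CLI runs (a sketch, following the same counting pattern as the other <code>pg_stat_activity</code> checks in these notes):</li>
</ul>
<pre><code>$ watch -n 1 "psql -c 'SELECT * FROM pg_stat_activity;' | grep -c dspace"
</code></pre>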
<h2 id="20171220">2017-12-20</h2>
<h2 id="2017-12-20">2017-12-20</h2>
<ul>
<li>The database connection pooling is definitely better!</li>
</ul>
@ -674,7 +674,7 @@ $ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -i 10568/89287
</code></pre><ul>
<li>The final code for the JNDI work in the Ansible infrastructure scripts is here: <a href="https://github.com/ilri/rmg-ansible-public/commit/1959d9cb7a0e7a7318c77f769253e5e029bdfa3b">https://github.com/ilri/rmg-ansible-public/commit/1959d9cb7a0e7a7318c77f769253e5e029bdfa3b</a></li>
</ul>
<h2 id="20171224">2017-12-24</h2>
<h2 id="2017-12-24">2017-12-24</h2>
<ul>
<li>Linode alerted that CGSpace was using high CPU this morning around 6 AM</li>
<li>I'm playing with reading all of a month's nginx logs into goaccess:</li>
@ -690,13 +690,13 @@ $ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -i 10568/89287
</ul>
</li>
</ul>
<h2 id="20171225">2017-12-25</h2>
<h2 id="2017-12-25">2017-12-25</h2>
<ul>
<li>The PostgreSQL connection pooling is much better when using the Tomcat JNDI pool</li>
<li>Here are the Munin stats for the past week on CGSpace:</li>
</ul>
<p><img src="/cgspace-notes/2017/12/postgres-connections-cgspace.png" alt="CGSpace PostgreSQL connections week"></p>
<h2 id="20171229">2017-12-29</h2>
<h2 id="2017-12-29">2017-12-29</h2>
<ul>
<li>Looking at some old notes for metadata to clean up, I found a few hundred corrections in <code>cg.fulltextstatus</code> and <code>dc.language.iso</code>:</li>
</ul>
@ -721,7 +721,7 @@ DELETE 20
</code></pre><ul>
<li>I need to figure out why we have records with language <code>in</code> because that's not a language!</li>
</ul>
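<ul>
<li>Finding the offending records is straightforward with the same <code>metadatafieldregistry</code> lookup pattern used elsewhere in these notes:</li>
</ul>
<pre><code>dspace=# SELECT resource_id, text_value FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (SELECT metadata_field_id FROM metadatafieldregistry WHERE element = 'language' AND qualifier = 'iso') AND text_value = 'in';
</code></pre>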
<h2 id="20171230">2017-12-30</h2>
<h2 id="2017-12-30">2017-12-30</h2>
<ul>
<li>Linode alerted that CGSpace was using 259% CPU from 4 to 6 AM</li>
<li>Uptime Robot noticed that the server went down for 1 minute a few hours later, around 9AM</li>
@ -748,7 +748,7 @@ DELETE 20
</code></pre><ul>
<li>216.244.66.245 seems to be moz.com's DotBot</li>
</ul>
<h2 id="20171231">2017-12-31</h2>
<h2 id="2017-12-31">2017-12-31</h2>
<ul>
<li>I finished working on the 42 records for CCAFS after Magdalena sent the remaining corrections</li>
<li>After that I uploaded them to CGSpace:</li>


@ -147,7 +147,7 @@ dspace.log.2018-01-02:34
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&#39;s Encrypt if it&#39;s just a handful of domains
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -228,7 +228,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
</p>
</header>
<h2 id="20180102">2018-01-02</h2>
<h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
@ -295,7 +295,7 @@ dspace.log.2018-01-02:34
</code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains</li>
</ul>
<h2 id="20180103">2018-01-03</h2>
<h2 id="2018-01-03">2018-01-03</h2>
<ul>
<li>I woke up to more instability on CGSpace: this time UptimeRobot noticed a few rounds of downtime of a few minutes each, and Linode also sent an alert about high CPU load from 12 to 2 PM</li>
<li>Looks like I need to increase the database pool size again:</li>
@ -389,7 +389,7 @@ dspace.log.2018-01-03:1909
<li>I guess for now I just have to increase the database connection pool's max active</li>
<li>It's currently 75 and normally I'd just bump it by 25 but let me be a bit daring and push it by 50 to 125, because I used to see at least 121 connections in pg_stat_activity before when we were using the shitty default pooling</li>
</ul>
<h2 id="20180104">2018-01-04</h2>
<h2 id="2018-01-04">2018-01-04</h2>
<ul>
<li>CGSpace went down and up a bunch of times last night, and ILRI staff were complaining a lot</li>
<li>The XMLUI logs show this activity:</li>
@ -423,7 +423,7 @@ dspace.log.2018-01-04:1559
<li>Once I get back to Amman I will have to try to create different database pools for different web applications, like recently discussed on the dspace-tech mailing list</li>
<li>Create accounts on CGSpace for two CTA staff <a href="mailto:km4ard@cta.int">km4ard@cta.int</a> and <a href="mailto:bheenick@cta.int">bheenick@cta.int</a></li>
</ul>
<h2 id="20180105">2018-01-05</h2>
<h2 id="2018-01-05">2018-01-05</h2>
<ul>
<li>Peter said that CGSpace was down last night and Tsega restarted Tomcat</li>
<li>I don't see any alerts from Linode or UptimeRobot, and there are no PostgreSQL connection errors in the dspace logs for today:</li>
@ -453,7 +453,7 @@ sys 3m14.890s
</code></pre><ul>
<li>Reboot CGSpace and DSpace Test for new kernels (4.14.12-x86_64-linode92) that partially mitigate the <a href="https://blog.linode.com/2018/01/03/cpu-vulnerabilities-meltdown-spectre/">Spectre and Meltdown CPU vulnerabilities</a></li>
</ul>
<h2 id="20180106">2018-01-06</h2>
<h2 id="2018-01-06">2018-01-06</h2>
<ul>
<li>I'm still seeing Solr errors in the DSpace logs even after the full reindex yesterday:</li>
</ul>
@ -461,14 +461,14 @@ sys 3m14.890s
</code></pre><ul>
<li>I posted a message to the dspace-tech mailing list to see if anyone can help</li>
</ul>
<h2 id="20180109">2018-01-09</h2>
<h2 id="2018-01-09">2018-01-09</h2>
<ul>
<li>Advise Sisay about blank lines in some IITA records</li>
<li>Generate a list of author affiliations for Peter to clean up:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
COPY 4515
</code></pre><h2 id="20180110">2018-01-10</h2>
</code></pre><h2 id="2018-01-10">2018-01-10</h2>
<ul>
<li>I looked to see what happened to this year's Solr statistics sharding task that should have run on 2018-01-01 and of course it failed:</li>
</ul>
@ -619,7 +619,7 @@ cache_alignment : 64
<li>Citing concerns with metadata quality, I suggested adding him on DSpace Test first</li>
<li>I opened a ticket with Atmire to ask them about DSpace 5.8 compatibility: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560</a></li>
</ul>
<h2 id="20180111">2018-01-11</h2>
<h2 id="2018-01-11">2018-01-11</h2>
<ul>
<li>The PostgreSQL and firewall graphs from this week show clearly the load from the new bot from PerfectIP.net yesterday:</li>
</ul>
@ -673,7 +673,7 @@ cache_alignment : 64
</code></pre><ul>
<li>With that it is super easy to see where PostgreSQL connections are coming from in <code>pg_stat_activity</code></li>
</ul>
<h2 id="20180112">2018-01-12</h2>
<h2 id="2018-01-12">2018-01-12</h2>
<ul>
<li>I'm looking at the <a href="https://wiki.duraspace.org/display/DSDOC6x/Installing+DSpace#InstallingDSpace-ServletEngine(ApacheTomcat7orlater,Jetty,CauchoResinorequivalent)">DSpace 6.0 Install docs</a> and notice they tweak the number of threads in their Tomcat connector:</li>
</ul>
@ -698,7 +698,7 @@ cache_alignment : 64
</code></pre><ul>
<li>That could be very interesting</li>
</ul>
<h2 id="20180113">2018-01-13</h2>
<h2 id="2018-01-13">2018-01-13</h2>
<ul>
<li>Still testing DSpace 6.2 on Tomcat 8.5.24</li>
<li>Catalina errors at Tomcat 8.5 startup:</li>
@ -741,14 +741,14 @@ Caused by: java.lang.NullPointerException
<li>Shit, this might actually be a DSpace error: <a href="https://jira.duraspace.org/browse/DS-3434">https://jira.duraspace.org/browse/DS-3434</a></li>
<li>I'll comment on that issue</li>
</ul>
<h2 id="20180114">2018-01-14</h2>
<h2 id="2018-01-14">2018-01-14</h2>
<ul>
<li>Looking at the authors Peter had corrected</li>
<li>Some had multiple corrections, which he indicated by adding <code>||</code> in the correction column, but I can't process those this way so I will just have to flag them and do them manually later</li>
<li>Also, I can flag the values that have &ldquo;DELETE&rdquo;</li>
<li>Then I need to facet the correction column on isBlank(value) and not flagged</li>
</ul>
<h2 id="20180115">2018-01-15</h2>
<h2 id="2018-01-15">2018-01-15</h2>
<ul>
<li>Help Udana from IWMI export a CSV from DSpace Test so he can start trying a batch upload</li>
<li>I'm going to apply these ~130 corrections on CGSpace:</li>
@ -830,7 +830,7 @@ COPY 4552
real 0m25.756s
user 0m28.016s
sys 0m2.210s
</code></pre><h2 id="20180116">2018-01-16</h2>
</code></pre><h2 id="2018-01-16">2018-01-16</h2>
<ul>
<li>Meeting with CGSpace team, a few action items:
<ul>
@ -849,7 +849,7 @@ sys 0m2.210s
<li>I ended up creating a Jira issue for my <code>db.jndi</code> documentation fix: <a href="https://jira.duraspace.org/browse/DS-3803">DS-3803</a></li>
<li>The DSpace developers said they wanted each pull request to be associated with a Jira issue</li>
</ul>
<h2 id="20180117">2018-01-17</h2>
<h2 id="2018-01-17">2018-01-17</h2>
<ul>
<li>Abenet asked me to proof and upload 54 records for LIVES</li>
<li>A few records were missing countries (even though they're all from Ethiopia)</li>
@ -990,7 +990,7 @@ $ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/
<li>Overall the heap space usage in the munin graph seems ok, though I usually increase it by 512MB over the average a few times per year as usage grows</li>
<li>But maybe I should increase it by more, like 1024MB, to give a bit more head room</li>
</ul>
<h2 id="20180118">2018-01-18</h2>
<h2 id="2018-01-18">2018-01-18</h2>
<ul>
<li>UptimeRobot said CGSpace was down for 1 minute last night</li>
<li>I don't see any errors in the nginx or catalina logs, so I guess UptimeRobot just got impatient and closed the request, which caused nginx to send an HTTP 499</li>
@ -1013,7 +1013,7 @@ Jan 18 07:01:22 linode18 sudo[10812]: pam_unix(sudo:session): session opened for
<li>I had to cancel the Discovery indexing and I'll have to re-try it another time when the server isn't so busy (it had already taken two hours and wasn't even close to being done)</li>
<li>For now I've increased the Tomcat JVM heap from 5632 to 6144m, to give ~1GB of free memory over the average usage to hopefully account for spikes caused by load or background jobs</li>
</ul>
<h2 id="20180119">2018-01-19</h2>
<h2 id="2018-01-19">2018-01-19</h2>
<ul>
<li>Linode alerted and said that the CPU load was 264.1% on CGSpace</li>
<li>Start the Discovery indexing again:</li>
@ -1029,7 +1029,7 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspa
</code></pre><ul>
<li>I told Peter we should keep an eye out and try again next week</li>
</ul>
<h2 id="20180120">2018-01-20</h2>
<h2 id="2018-01-20">2018-01-20</h2>
<ul>
<li>Run the authority indexing script on CGSpace and of course it died:</li>
</ul>
@ -1072,7 +1072,7 @@ $ docker exec dspace_db psql -U postgres dspace -c 'alter user dspace nocreateus
$ docker exec dspace_db vacuumdb -U postgres dspace
$ docker cp ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspace_db:/tmp
$ docker exec dspace_db psql -U dspace -f /tmp/update-sequences.sql dspace
</code></pre><h2 id="20180122">2018-01-22</h2>
</code></pre><h2 id="2018-01-22">2018-01-22</h2>
<ul>
<li>Look over Udana's CSV of 25 WLE records from last week</li>
<li>I sent him some corrections:
@ -1106,7 +1106,7 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
<li>I'd still like to get arbitrary mbeans like activeSessions etc, though</li>
<li>I can't remember if I had to configure the jmx settings in <code>/etc/munin/plugin-conf.d/munin-node</code> or not—I think all I did was re-run the <code>munin-node-configure</code> script and of course enable JMX in Tomcat's JVM options</li>
</ul>
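<ul>
<li>For the record, the JVM side is just a few system properties (in Tomcat's <code>setenv.sh</code> or the service's defaults file), after which Munin's wizard can discover the plugins (a sketch; the port choice and disabling authentication are assumptions that only make sense for localhost access):</li>
</ul>
<pre><code>CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
# munin-node-configure --shell | sh
# systemctl restart munin-node
</code></pre>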
<h2 id="20180123">2018-01-23</h2>
<h2 id="2018-01-23">2018-01-23</h2>
<ul>
<li>Thinking about generating a jmeter test plan for DSpace, along the lines of <a href="https://github.com/Georgetown-University-Libraries/dspace-performance-test">Georgetown's dspace-performance-test</a></li>
<li>I got a list of all the GET requests on CGSpace for January 21st (the last time Linode complained the load was high), excluding admin calls:</li>
@ -1141,7 +1141,7 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
</code></pre><ul>
<li>I can definitely design a test plan on this!</li>
</ul>
<h2 id="20180124">2018-01-24</h2>
<h2 id="2018-01-24">2018-01-24</h2>
<ul>
<li>Looking at the REST requests, most of them are to expand all or metadata, but 5% are for retrieving bitstreams:</li>
</ul>
@ -1205,7 +1205,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
<li>Then I generated reports for these runs like this:</li>
</ul>
<pre><code>$ jmeter -g 2018-01-24-linode5451120-baseline.jtl -o 2018-01-24-linode5451120-baseline
</code></pre><h2 id="20180125">2018-01-25</h2>
</code></pre><h2 id="2018-01-25">2018-01-25</h2>
<ul>
<li>Run another round of tests on DSpace Test with jmeter after changing Tomcat's <code>minSpareThreads</code> to 20 (default is 10) and <code>acceptorThreadCount</code> to 2 (default is 1):</li>
</ul>
@ -1222,7 +1222,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
</code></pre><ul>
<li>I haven't had time to look at the results yet</li>
</ul>
<h2 id="20180126">2018-01-26</h2>
<h2 id="2018-01-26">2018-01-26</h2>
<ul>
<li>Peter followed up about some of the points from the Skype meeting last week</li>
<li>Regarding the ORCID field issue, I see <a href="http://repo.mel.cgiar.org/handle/20.500.11766/7668?show=full">ICARDA's MELSpace is using <code>cg.creator.ID</code></a>: 0000-0001-9156-7691</li>
@ -1246,7 +1246,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
<li>I submitted a test item with ORCiDs and dc.rights from a controlled vocabulary on DSpace Test: <a href="https://dspacetest.cgiar.org/handle/10568/97703">https://dspacetest.cgiar.org/handle/10568/97703</a></li>
<li>I will send it to Peter to check and give feedback (ie, about the ORCiD field name as well as allowing users to add ORCiDs manually or not)</li>
</ul>
<h2 id="20180128">2018-01-28</h2>
<h2 id="2018-01-28">2018-01-28</h2>
<ul>
<li>Assist Udana from WLE again to proof his 25 records and upload them to DSpace Test</li>
<li>I am playing with the <code>startStopThreads=&quot;0&quot;</code> parameter in Tomcat <code>&lt;Engine&gt;</code> and <code>&lt;Host&gt;</code> configuration</li>
@ -1254,7 +1254,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
<li>On my local test machine the startup time went from 70 to 30 seconds</li>
<li>See: <a href="https://tomcat.apache.org/tomcat-7.0-doc/config/host.html">https://tomcat.apache.org/tomcat-7.0-doc/config/host.html</a></li>
</ul>
<h2 id="20180129">2018-01-29</h2>
<h2 id="2018-01-29">2018-01-29</h2>
<ul>
<li>CGSpace went down this morning for a few minutes, according to UptimeRobot</li>
<li>Looking at the DSpace logs I see this error happened just before UptimeRobot noticed it going down:</li>
@ -1353,7 +1353,7 @@ Catalina:type=DataSource,class=javax.sql.DataSource,name=&quot;jdbc/dspace&quot;
</code></pre><ul>
<li>I filed a ticket with Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566</a></li>
</ul>
<h2 id="20180131">2018-01-31</h2>
<h2 id="2018-01-31">2018-01-31</h2>
<ul>
<li>UptimeRobot says CGSpace went down at 7:57 AM, and indeed I see a lot of HTTP 499 codes in nginx logs</li>
<li>PostgreSQL activity shows 222 database connections</li>


@ -27,7 +27,7 @@ We don&#39;t need to distinguish between internal and external works, so that ma
Yesterday I figured out how to monitor DSpace sessions using JMX
I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu&#39;s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -108,7 +108,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu&#39;s munin-plug
</p>
</header>
<h2 id="20180201">2018-02-01</h2>
<h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don't need to distinguish between internal and external works, so that makes it just a simple list</li>
@ -124,7 +124,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu&#39;s munin-plug
v_.value 223
v_jspui.value 1
v_oai.value 0
</code></pre><h2 id="20180203">2018-02-03</h2>
</code></pre><h2 id="2018-02-03">2018-02-03</h2>
<ul>
<li>Bram from Atmire responded about the high load caused by the Solr updater script and said it will be fixed with the updates to DSpace 5.8 compatibility: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=566</a></li>
<li>We will close that ticket for now and wait for the 5.8 stuff: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560</a></li>
@ -155,7 +155,7 @@ COPY 3723
real 0m23.839s
user 0m27.225s
sys 0m1.905s
</code></pre><h2 id="20180205">2018-02-05</h2>
</code></pre><h2 id="2018-02-05">2018-02-05</h2>
<ul>
<li>Toying with correcting authors with trailing spaces via PostgreSQL:</li>
</ul>
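<ul>
<li>The general shape of that cleanup (a sketch; check the matches with a <code>SELECT</code> before running the <code>UPDATE</code>):</li>
</ul>
<pre><code>dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '\s+$', '') WHERE resource_type_id=2 AND metadata_field_id = (SELECT metadata_field_id FROM metadatafieldregistry WHERE element = 'contributor' AND qualifier = 'author') AND text_value ~ '\s+$';
</code></pre>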
@ -168,7 +168,7 @@ UPDATE 20
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors-2018-02-05.csv with csv;
COPY 55630
</code></pre><h2 id="20180206">2018-02-06</h2>
</code></pre><h2 id="2018-02-06">2018-02-06</h2>
<ul>
<li>UptimeRobot says CGSpace is down this morning around 9:15</li>
<li>I see 308 PostgreSQL connections in <code>pg_stat_activity</code></li>
@ -213,7 +213,7 @@ Tue Feb 6 09:30:32 UTC 2018
<li>I'm not actually sure if the Solr web application uses the database though, so I'll have to check later and remove it if necessary</li>
<li>I deployed the changes on DSpace Test only for now, so I will monitor and make them on CGSpace later this week</li>
</ul>
<h2 id="20180207">2018-02-07</h2>
<h2 id="2018-02-07">2018-02-07</h2>
<ul>
<li>Abenet wrote to ask a question about the ORCiD lookup not working for one CIAT user on CGSpace</li>
<li>I tried on DSpace Test and indeed the lookup just doesn't work!</li>
@ -363,7 +363,7 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
<li>I cherry-picked all the commits for DS-3551 but it won't build on our current DSpace 5.5!</li>
<li>I sent a message to the dspace-tech mailing list asking why DSpace thinks these connections are busy when PostgreSQL says they are idle</li>
</ul>
<h2 id="20180210">2018-02-10</h2>
<h2 id="2018-02-10">2018-02-10</h2>
<ul>
<li>I tried to disable ORCID lookups but keep the existing authorities</li>
<li>This item has an ORCID for Ralf Kiese: http://localhost:8080/handle/10568/89897</li>
@ -378,7 +378,7 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
</code></pre><ul>
<li>So I don't think we can disable the ORCID lookup function and keep the ORCID badges</li>
</ul>
<h2 id="20180211">2018-02-11</h2>
<h2 id="2018-02-11">2018-02-11</h2>
<ul>
<li>Magdalena from CCAFS emailed to ask why one of their items has such a weird thumbnail: <a href="https://cgspace.cgiar.org/handle/10568/90735">10568/90735</a></li>
</ul>
@ -442,7 +442,7 @@ dspace=# commit;
<li>I don't know how to add ORCID IDs to existing items yet&hellip; some more querying of PostgreSQL for authority values perhaps?</li>
<li>I added the script to the <a href="https://github.com/ilri/DSpace/wiki/Scripts">ILRI DSpace wiki on GitHub</a></li>
</ul>
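<ul>
<li>As for the PostgreSQL querying, a starting point might be to look at which author values already carry authority records (a sketch):</li>
</ul>
<pre><code>dspace=# SELECT text_value, authority, confidence FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = (SELECT metadata_field_id FROM metadatafieldregistry WHERE element = 'contributor' AND qualifier = 'author') AND authority IS NOT NULL LIMIT 10;
</code></pre>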
<h2 id="20180212">2018-02-12</h2>
<h2 id="2018-02-12">2018-02-12</h2>
<ul>
<li>Follow up with Atmire on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 Compatibility ticket</a> to ask again if they want me to send them a DSpace 5.8 branch to work on</li>
<li>Abenet asked if there was a way to get the number of submissions she and Bizuwork did</li>
@ -464,7 +464,7 @@ dspace=# commit;
<li>I think I'd probably just attach the block storage volume and mount it on /home/dspace</li>
<li>Ask Peter about <code>dc.rights</code> on DSpace Test again, if he likes it then we should move it to CGSpace soon</li>
</ul>
<h2 id="20180213">2018-02-13</h2>
<h2 id="2018-02-13">2018-02-13</h2>
<ul>
<li>Peter said he was getting a &ldquo;socket closed&rdquo; error on CGSpace</li>
<li>I looked in the dspace.log.2018-02-13 and saw one recent one:</li>
@ -497,7 +497,7 @@ dspace.log.2018-02-13:4
</ul>
<pre><code>Feb 13, 2018 2:05:42 PM org.apache.tomcat.jdbc.pool.ConnectionPool abandon
WARNING: Connection has been abandoned PooledConnection[org.postgresql.jdbc.PgConnection@22e107be]:java.lang.Exception
</code></pre><h2 id="20180214">2018-02-14</h2>
</code></pre><h2 id="2018-02-14">2018-02-14</h2>
<ul>
<li>Skype with Peter and the Addis team to discuss what we need to do for the ORCIDs in the immediate future</li>
<li>We said we'd start with a controlled vocabulary for <code>cg.creator.id</code> on the DSpace Test submission form, where we store the author name and the ORCID in some format like: Alan S. Orth (0000-0002-1735-7458)</li>
@ -552,7 +552,7 @@ UPDATE 1
</code></pre><ul>
<li>Then the cleanup process will continue for a while and hit another foreign key conflict, and eventually it will complete after you manually resolve them all</li>
</ul>
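<ul>
<li>Each conflict can be cleared by nulling the offending primary bitstream reference and re-running the cleanup, a sketch (the bitstream ID here is a placeholder):</li>
</ul>
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (12345);'
$ dspace cleanup -v
</code></pre>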
<h2 id="20180215">2018-02-15</h2>
<h2 id="2018-02-15">2018-02-15</h2>
<ul>
<li>Altmetric seems to be indexing DSpace Test for some reason:
<ul>
@ -596,7 +596,7 @@ UPDATE 1
1512 207.46.13.59
1554 207.46.13.157
2018 104.196.152.243
</code></pre><h2 id="20180217">2018-02-17</h2>
</code></pre><h2 id="2018-02-17">2018-02-17</h2>
<ul>
<li>Peter pointed out that we had an incorrect sponsor in the controlled vocabulary: <code>U.S. Agency for International Development</code> → <code>United States Agency for International Development</code></li>
<li>I made a pull request to fix it (<a href="https://github.com/ilri/DSpace/pull/354">#354</a>)</li>
@ -604,7 +604,7 @@ UPDATE 1
</ul>
<pre><code>dspace=# update metadatavalue set text_value='United States Agency for International Development' where resource_type_id=2 and metadata_field_id=29 and text_value like '%U.S. Agency for International Development%';
UPDATE 2
</code></pre><h2 id="20180218">2018-02-18</h2>
</code></pre><h2 id="2018-02-18">2018-02-18</h2>
<ul>
<li>ICARDA's Mohamed Salem pointed out that it would be easiest to format the <code>cg.creator.id</code> field like &ldquo;Alan Orth: 0000-0002-1735-7458&rdquo; because no name will have a &ldquo;:&rdquo; so it's easier to split on</li>
<li>I finally figured out a few ways to extract ORCID iDs from metadata using XSLT and display them in the XMLUI:</li>
@ -665,7 +665,7 @@ org.springframework.web.util.NestedServletException: Handler processing failed;
<li>I have no idea what caused this crash</li>
<li>In other news, I adjusted the ORCID badge size on the XMLUI item display and sent it back to Peter for feedback</li>
</ul>
<h2 id="20180219">2018-02-19</h2>
<h2 id="2018-02-19">2018-02-19</h2>
<ul>
<li>Combined list of CGIAR author ORCID iDs is up to 1,500:</li>
</ul>
@ -708,7 +708,7 @@ TypeError: 'NoneType' object is not subscriptable
</code></pre><ul>
<li>According to ORCID that identifier's entire name block is null!</li>
</ul>
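<ul>
<li>A quick way to inspect a given identifier's public name block is the ORCID public API, a sketch using my own iD from the examples above:</li>
</ul>
<pre><code>$ curl -s -H 'Accept: application/json' https://pub.orcid.org/v2.1/0000-0002-1735-7458/person | jq '.name'
</code></pre>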
<h2 id="20180220">2018-02-20</h2>
<h2 id="2018-02-20">2018-02-20</h2>
<ul>
<li>Send Abenet an email about getting a purchase requisition for a new DSpace Test server on Linode</li>
<li>Discuss some of the issues with null values and poor-quality names in some ORCID identifiers with Abenet and I think we'll now only use ORCID iDs that have been sent to us from partners, not those extracted via keyword searches on orcid.org</li>
@ -756,7 +756,7 @@ TypeError: 'NoneType' object is not subscriptable
<li>Remove CPWF project number and Humidtropics subject from submission form (<a href="https://github.com/alanorth/DSpace/pull/3">#3</a>)</li>
<li>I accidentally merged it into my own repository, oops</li>
</ul>
<h2 id="20180222">2018-02-22</h2>
<h2 id="2018-02-22">2018-02-22</h2>
<ul>
<li>CGSpace was apparently down today around 13:00 server time and I didn't get any emails on my phone, but saw them later on the computer</li>
<li>It looks like Sisay restarted Tomcat because I was offline</li>
@ -803,11 +803,11 @@ TypeError: 'NoneType' object is not subscriptable
</code></pre><ul>
<li>It seems to re-use its user agent but makes tons of useless requests and I wonder if I should add &ldquo;.*spider.*&rdquo; to the Tomcat Crawler Session Manager valve?</li>
</ul>
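<ul>
<li>One way to gauge whether the valve would help is to count the distinct sessions a crawler creates in a day, a sketch with a placeholder IP:</li>
</ul>
<pre><code>$ grep 192.0.2.1 dspace.log.2018-02-22 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -u | wc -l
</code></pre>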
<h2 id="20180223">2018-02-23</h2>
<h2 id="2018-02-23">2018-02-23</h2>
<ul>
<li>Atmire got back to us with a quote about their DSpace 5.8 upgrade</li>
</ul>
<h2 id="20180225">2018-02-25</h2>
<h2 id="2018-02-25">2018-02-25</h2>
<ul>
<li>A few days ago Abenet sent me the list of ORCID iDs from CCAFS</li>
<li>We currently have 988 unique identifiers:</li>
@ -872,7 +872,7 @@ Alan S. Orth: 0000-0002-1735-7458
Ibrahim Mohammed: 0000-0001-5199-5528
Nor Azwadi: 0000-0001-9634-1958
./resolve-orcids.py -i orcid-test-values.txt -o /tmp/orcid-names 0.23s user 0.05s system 8% cpu 3.046 total
</code></pre><h2 id="20180226">2018-02-26</h2>
</code></pre><h2 id="2018-02-26">2018-02-26</h2>
<ul>
<li>Peter is having problems with &ldquo;Socket closed&rdquo; on his submissions page again</li>
<li>He says his personal account loads much faster than his CGIAR account, which could be because the CGIAR account has potentially thousands of submissions over the last few years</li>
@ -880,7 +880,7 @@ Nor Azwadi: 0000-0001-9634-1958
<li>I think I should increase the <code>removeAbandonedTimeout</code> from 90 to something like 180 and continue observing</li>
<li>I also reduced the timeout for the API pool back to 60 because those interfaces are only used by bots</li>
</ul>
<h2 id="20180227">2018-02-27</h2>
<h2 id="2018-02-27">2018-02-27</h2>
<ul>
<li>Peter is still having problems with &ldquo;Socket closed&rdquo; on his submissions page</li>
<li>I have disabled <code>removeAbandoned</code> for now because that's the only thing I changed in the last few weeks since he started having issues</li>
@ -923,7 +923,7 @@ COPY 263
<li>It successfully mapped 2600 ORCID identifiers to items in my tests</li>
<li>I will run it on DSpace Test</li>
</ul>
<h2 id="20180228">2018-02-28</h2>
<h2 id="2018-02-28">2018-02-28</h2>
<ul>
<li>CGSpace crashed today, the first HTTP 499 in nginx's access.log was around 09:12</li>
<li>There's nothing interesting going on in nginx's logs around that time:</li>

View File

@ -21,7 +21,7 @@ Export a CSV of the IITA community metadata for Martin Mueller
Export a CSV of the IITA community metadata for Martin Mueller
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -102,11 +102,11 @@ Export a CSV of the IITA community metadata for Martin Mueller
</p>
</header>
<h2 id="20180302">2018-03-02</h2>
<h2 id="2018-03-02">2018-03-02</h2>
<ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul>
<h2 id="20180306">2018-03-06</h2>
<h2 id="2018-03-06">2018-03-06</h2>
<ul>
<li>Add three new CCAFS project tags to <code>input-forms.xml</code> (<a href="https://github.com/ilri/DSpace/pull/357">#357</a>)</li>
<li>Andrea from Macaroni Bros had sent me an email that CCAFS needs them</li>
@ -138,14 +138,14 @@ UPDATE 1
</code></pre><ul>
<li>Apply the proposed PostgreSQL indexes from DS-3636 (pull request <a href="https://github.com/DSpace/DSpace/pull/1791/">#1791</a>) on CGSpace (linode18)</li>
</ul>
<h2 id="20180307">2018-03-07</h2>
<h2 id="2018-03-07">2018-03-07</h2>
<ul>
<li>Add CIAT author Mauricio Efren Sotelo Cabrera to controlled vocabulary for ORCID identifiers (<a href="https://github.com/ilri/DSpace/pull/360">#360</a>)</li>
<li>Help Sisay proof 200 IITA records on DSpace Test</li>
<li>Finally import Udana's 24 items to <a href="https://cgspace.cgiar.org/handle/10568/36185">IWMI Journal Articles</a> on CGSpace</li>
<li>Skype with James Stapleton to discuss CGSpace, ILRI website, CKM staff issues, etc</li>
</ul>
<h2 id="20180308">2018-03-08</h2>
<h2 id="2018-03-08">2018-03-08</h2>
<ul>
<li>Looking at a CSV dump of the CIAT community I see there are tons of stupid text languages people add for their metadata</li>
<li>This makes the CSV have tons of columns, for example <code>dc.title</code>, <code>dc.title[]</code>, <code>dc.title[en]</code>, <code>dc.title[eng]</code>, <code>dc.title[en_US]</code> and so on!</li>
@ -218,12 +218,12 @@ UPDATE 2309
<li>I added ORCID identifiers for 187 items by CIAT's Hernan Ceballos, because that is what Elizabeth was trying to do manually!</li>
<li>Also, I decided to add ORCID identifiers for all records from Peter, Abenet, and Sisay as well</li>
</ul>
<h2 id="20180309">2018-03-09</h2>
<h2 id="2018-03-09">2018-03-09</h2>
<ul>
<li>Give James Stapleton input on Sisay's KRAs</li>
<li>Create a pull request to disable ORCID authority integration for <code>dc.contributor.author</code> in the submission forms and XMLUI display (<a href="https://github.com/ilri/DSpace/pull/363">#363</a>)</li>
</ul>
<h2 id="20180311">2018-03-11</h2>
<h2 id="2018-03-11">2018-03-11</h2>
<ul>
<li>Peter also wrote to say he is having issues with the Atmire Listings and Reports module</li>
<li>When I logged in to try it I got a blank white page after continuing and I saw this in dspace.log.2018-03-11:</li>
@ -242,11 +242,11 @@ org.apache.jasper.JasperException: java.lang.NullPointerException
<li>Looks like I needed to remove the Humidtropics subject from Listings and Reports because it was looking for the terms and couldn't find them</li>
<li>I made a quick fix and it's working now (<a href="https://github.com/ilri/DSpace/pull/364">#364</a>)</li>
</ul>
<h2 id="20180312">2018-03-12</h2>
<h2 id="2018-03-12">2018-03-12</h2>
<ul>
<li>Increase upload size on CGSpace's nginx config to 85MB so Sisay can upload some data</li>
</ul>
<h2 id="20180313">2018-03-13</h2>
<h2 id="2018-03-13">2018-03-13</h2>
<ul>
<li>I created a new Linode server for DSpace Test (linode6623840) so I could try the block storage stuff, but when I went to add a 300GB volume it said that block storage capacity was exceeded in that datacenter (Newark, NJ)</li>
<li>I deleted the Linode and created another one (linode6624164) in the Fremont, CA region</li>
@ -258,14 +258,14 @@ org.apache.jasper.JasperException: java.lang.NullPointerException
<li>CCAFS publication page: <a href="https://ccafs.cgiar.org/publications/can-scenario-planning-catalyse-transformational-change-evaluating-climate-change-policy">https://ccafs.cgiar.org/publications/can-scenario-planning-catalyse-transformational-change-evaluating-climate-change-policy</a></li>
<li>Peter tweeted the Handle link and now Altmetric shows the donut for both the DOI and the Handle</li>
</ul>
<h2 id="20180314">2018-03-14</h2>
<h2 id="2018-03-14">2018-03-14</h2>
<ul>
<li>Help Abenet with a troublesome Listings and Report question for CIAT author Steve Beebe</li>
<li>Continue migrating DSpace Test to the new server (linode6624164)</li>
<li>I emailed ILRI service desk to update the DNS records for dspacetest.cgiar.org</li>
<li>Abenet was having problems saving Listings and Reports configurations or layouts but I tested it and it works</li>
</ul>
<h2 id="20180315">2018-03-15</h2>
<h2 id="2018-03-15">2018-03-15</h2>
<ul>
<li>Help Abenet troubleshoot the Listings and Reports issue again</li>
<li>It looks like it's an issue with the layouts, if you create a new layout that only has one type (<code>dc.identifier.citation</code>):</li>
@ -281,7 +281,7 @@ org.apache.jasper.JasperException: java.lang.NullPointerException
<li>I submitted a ticket to Atmire: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589</a></li>
<li>Small fix to the example citation text in Listings and Reports (<a href="https://github.com/ilri/DSpace/pull/365">#365</a>)</li>
</ul>
<h2 id="20180316">2018-03-16</h2>
<h2 id="2018-03-16">2018-03-16</h2>
<ul>
<li>ICT made the DNS updates for dspacetest.cgiar.org late last night</li>
<li>I have removed the old server (linode02 aka linode578611) in favor of linode19 aka linode6624164</li>
@ -300,7 +300,7 @@ COPY 21
</code></pre><ul>
<li>Create a pull request to update the input forms for the new CRP subject style (<a href="https://github.com/ilri/DSpace/pull/366">#366</a>)</li>
</ul>
<h2 id="20180319">2018-03-19</h2>
<h2 id="2018-03-19">2018-03-19</h2>
<ul>
<li>Tezira has been having problems accessing CGSpace from the ILRI Nairobi campus since last week</li>
<li>She is getting an HTTPS error apparently</li>
@ -355,7 +355,7 @@ Exception in thread &quot;http-bio-127.0.0.1-8081-exec-280&quot; java.lang.OutOf
<li>The title is &ldquo;Untitled&rdquo; and there is some metadata but indeed the citation is missing</li>
<li>I don't know what would cause that</li>
</ul>
<h2 id="20180320">2018-03-20</h2>
<h2 id="2018-03-20">2018-03-20</h2>
<ul>
<li>DSpace Test has been down for a few hours with SQL and memory errors starting this morning:</li>
</ul>
@ -401,7 +401,7 @@ java.lang.IllegalArgumentException: No choices plugin was configured for field
</code></pre><ul>
<li>I have to figure that one out&hellip;</li>
</ul>
<h2 id="20180321">2018-03-21</h2>
<h2 id="2018-03-21">2018-03-21</h2>
<ul>
<li>Looks like the indexing gets confused that there is still data in the <code>authority</code> column</li>
<li>Unfortunately this causes those items to simply not be indexed, which users noticed because item counts were cut in half and old items showed up in RSS!</li>
@ -466,17 +466,17 @@ sys 2m45.135s
</code></pre><ul>
<li>I need to be able to add many common characters though so that it is useful to copy and paste into a new project to find issues</li>
</ul>
<h2 id="20180322">2018-03-22</h2>
<h2 id="2018-03-22">2018-03-22</h2>
<ul>
<li>Add ORCID identifier for Silvia Alonso</li>
<li>Update my Mirage 2 setup notes for Ubuntu 18.04: <a href="https://gist.github.com/alanorth/9bfd29feb7d2e836a9d417633319b3f5">https://gist.github.com/alanorth/9bfd29feb7d2e836a9d417633319b3f5</a></li>
</ul>
<h2 id="20180324">2018-03-24</h2>
<h2 id="2018-03-24">2018-03-24</h2>
<ul>
<li>More work on the Ubuntu 18.04 readiness stuff for the <a href="https://github.com/ilri/rmg-ansible-public">Ansible playbooks</a></li>
<li>The playbook now uses the system's Ruby and Node.js so I don't have to manually install RVM and NVM after</li>
</ul>
<h2 id="20180325">2018-03-25</h2>
<h2 id="2018-03-25">2018-03-25</h2>
<ul>
<li>Looking at Peter's author corrections and trying to work out a way to find errors in OpenRefine easily</li>
<li>I can find all names that have acceptable characters using a GREL expression like:</li>
@ -520,16 +520,16 @@ $ ./delete-metadata-values.py -i /tmp/Delete-8-Authors-2018-03-21.csv -f dc.cont
<li>CGSpace took 76m28.292s</li>
<li>DSpace Test took 194m56.048s</li>
</ul>
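<ul>
<li>Assuming these timings are full Discovery reindexes, the invocation would be along these lines:</li>
</ul>
<pre><code>$ time dspace index-discovery -b
</code></pre>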
<h2 id="20180326">2018-03-26</h2>
<h2 id="2018-03-26">2018-03-26</h2>
<ul>
<li>Atmire got back to me about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">Listings and Reports issue</a> and said it's caused by items that have missing <code>dc.identifier.citation</code> fields</li>
<li>They will send a fix</li>
</ul>
<h2 id="20180327">2018-03-27</h2>
<h2 id="2018-03-27">2018-03-27</h2>
<ul>
<li>Atmire got back with an updated quote about the DSpace 5.8 compatibility so I've forwarded it to Peter</li>
</ul>
<h2 id="20180328">2018-03-28</h2>
<h2 id="2018-03-28">2018-03-28</h2>
<ul>
<li>DSpace Test crashed due to heap space so I've increased it from 4096m to 5120m</li>
<li>The error in Tomcat's <code>catalina.out</code> was:</li>

View File

@ -23,7 +23,7 @@ Catalina logs at least show some memory errors yesterday:
I tried to test something on DSpace Test but noticed that it&#39;s down since god knows when
Catalina logs at least show some memory errors yesterday:
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -104,7 +104,7 @@ Catalina logs at least show some memory errors yesterday:
</p>
</header>
<h2 id="20180401">2018-04-01</h2>
<h2 id="2018-04-01">2018-04-01</h2>
<ul>
<li>I tried to test something on DSpace Test but noticed that it's down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li>
@ -121,7 +121,7 @@ Exception in thread &quot;ContainerBackgroundProcessor[StandardEngine[Catalina]]
<li>I posted a message on Yammer to ask if people are using the Duplicate Check step from the Metadata Quality Module</li>
<li>Help Lili Szilagyi with a question about statistics on some CCAFS items</li>
</ul>
<h2 id="20180404">2018-04-04</h2>
<h2 id="2018-04-04">2018-04-04</h2>
<ul>
<li>Peter noticed that there were still some old CRP names on CGSpace, because I hadn't forced the Discovery index to be updated after I fixed the others last week</li>
<li>For completeness I re-ran the CRP corrections on CGSpace:</li>
@ -168,7 +168,7 @@ $ git rebase -i dspace-5.8
<li>I need to send this branch to Atmire and also arrange payment (see <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">ticket #560</a> in their tracker)</li>
<li>Fix Sisay's SSH access to the new DSpace Test server (linode19)</li>
</ul>
<h2 id="20180405">2018-04-05</h2>
<h2 id="2018-04-05">2018-04-05</h2>
<ul>
<li>Fix Sisay's sudo access on the new DSpace Test server (linode19)</li>
<li>The reindexing process on DSpace Test took <em>forever</em> yesterday:</li>
@ -192,7 +192,7 @@ sys 2m52.585s
<li>Proof some records on DSpace Test for Udana from IWMI</li>
<li>He has done better with the small syntax and consistency issues but then there are larger concerns with not linking to DOIs, copying titles incorrectly, etc</li>
</ul>
<h2 id="20180410">2018-04-10</h2>
<h2 id="2018-04-10">2018-04-10</h2>
<ul>
<li>I got a notice that CGSpace CPU usage was very high this morning</li>
<li>Looking at the nginx logs, here are the top users today so far:</li>
@ -344,7 +344,7 @@ UPDATE 1
<li>I told Udana to fix the citation and abstract of the one item, and to correct the <code>dc.language.iso</code> for the five Spanish items in his Book Chapters collection</li>
<li>Then we can import the records to CGSpace</li>
</ul>
<h2 id="20180411">2018-04-11</h2>
<h2 id="2018-04-11">2018-04-11</h2>
<ul>
<li>DSpace Test (linode19) crashed again some time since yesterday:</li>
</ul>
@ -353,16 +353,16 @@ UPDATE 1
</code></pre><ul>
<li>I ran all system updates and rebooted the server</li>
</ul>
<h2 id="20180412">2018-04-12</h2>
<h2 id="2018-04-12">2018-04-12</h2>
<ul>
<li>I caught wind of an interesting XMLUI performance optimization coming in DSpace 6.3: <a href="https://jira.duraspace.org/browse/DS-3883">https://jira.duraspace.org/browse/DS-3883</a></li>
<li>I asked for it to be ported to DSpace 5.x</li>
</ul>
<h2 id="20180413">2018-04-13</h2>
<h2 id="2018-04-13">2018-04-13</h2>
<ul>
<li>Add <code>PII-LAM_CSAGender</code> to CCAFS Phase II project tags in <code>input-forms.xml</code></li>
</ul>
<h2 id="20180415">2018-04-15</h2>
<h2 id="2018-04-15">2018-04-15</h2>
<ul>
<li>While testing an XMLUI patch for <a href="https://jira.duraspace.org/browse/DS-3883">DS-3883</a> I noticed that there is still some remaining Authority / Solr configuration left that we need to remove:</li>
</ul>
@ -385,11 +385,11 @@ Total time: 4 minutes 12 seconds
<li>The Linode block storage is much slower than the instance storage</li>
<li>I ran all system updates and rebooted DSpace Test (linode19)</li>
</ul>
<h2 id="20180416">2018-04-16</h2>
<h2 id="2018-04-16">2018-04-16</h2>
<ul>
<li>Communicate with Bioversity about their project to migrate their e-Library (Typo3) and Sci-lit databases to CGSpace</li>
</ul>
<h2 id="20180418">2018-04-18</h2>
<h2 id="2018-04-18">2018-04-18</h2>
<ul>
<li>IWMI people are asking about building a search query that outputs RSS for their reports</li>
<li>They want the same results as this Discovery query: <a href="https://cgspace.cgiar.org/discover?filtertype_1=dateAccessioned&amp;filter_relational_operator_1=contains&amp;filter_1=2018&amp;submit_apply_filter=&amp;query=&amp;scope=10568%2F16814&amp;rpp=100&amp;sort_by=dc.date.issued_dt&amp;order=desc">https://cgspace.cgiar.org/discover?filtertype_1=dateAccessioned&amp;filter_relational_operator_1=contains&amp;filter_1=2018&amp;submit_apply_filter=&amp;query=&amp;scope=10568%2F16814&amp;rpp=100&amp;sort_by=dc.date.issued_dt&amp;order=desc</a></li>
@ -422,7 +422,7 @@ webui.itemlist.sort-option.4 = type:dc.type:text
<li>I got a list of all the CIP collections manually and use the same query that I used in <a href="/cgspace-notes/2017-08">August, 2017</a>:</li>
</ul>
<pre><code>dspace#= \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/89347', '10568/88229', '10568/53086', '10568/53085', '10568/69069', '10568/53087', '10568/53088', '10568/53089', '10568/53090', '10568/53091', '10568/53092', '10568/70150', '10568/53093', '10568/64874', '10568/53094'))) group by text_value order by count desc) to /tmp/cip-authors.csv with csv;
</code></pre><h2 id="20180419">2018-04-19</h2>
</code></pre><h2 id="2018-04-19">2018-04-19</h2>
<ul>
<li>Run updates on DSpace Test (linode19) and reboot the server</li>
<li>Also try deploying updated GeoLite database during ant update while re-deploying code:</li>
@ -442,7 +442,7 @@ sys 2m2.687s
</code></pre><ul>
<li>This time is with about 70,000 items in the repository</li>
</ul>
<h2 id="20180420">2018-04-20</h2>
<h2 id="2018-04-20">2018-04-20</h2>
<ul>
<li>Gabriela from CIP emailed to say that CGSpace was returning a white page, but I haven't seen any emails from UptimeRobot</li>
<li>I confirm that it's just giving a white page around 4:16</li>
@ -515,7 +515,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Time
</code></pre><ul>
<li>Very suspect!</li>
</ul>
<h2 id="20180424">2018-04-24</h2>
<h2 id="2018-04-24">2018-04-24</h2>
<ul>
<li>Testing my Ansible playbooks with a clean and updated installation of Ubuntu 18.04 and I fixed some issues that I hadn't run into a few weeks ago</li>
<li>There seems to be a new issue with Java dependencies, though</li>
@ -529,7 +529,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Time
<li>Also, I started porting PostgreSQL 9.6 into the Ansible infrastructure scripts</li>
<li>This should be a drop-in replacement, I believe, though I will definitely test it more locally as well as on DSpace Test once we move to DSpace 5.8 and Ubuntu 18.04 in the coming months</li>
</ul>
<h2 id="20180425">2018-04-25</h2>
<h2 id="2018-04-25">2018-04-25</h2>
<ul>
<li>Still testing the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> for Ubuntu 18.04, Tomcat 8.5, and PostgreSQL 9.6</li>
<li>One other new thing I notice is that PostgreSQL 9.6 no longer uses <code>createuser</code> and <code>nocreateuser</code>, as those have actually meant <code>superuser</code> and <code>nosuperuser</code> and have been deprecated for <em>ten years</em></li>
@ -556,12 +556,12 @@ $ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-
<li>There's a <a href="https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=895866">Debian bug about this from a few weeks ago</a></li>
<li>Apparently Tomcat was compiled with Java 9, so doesn't work with Java 8</li>
</ul>
<h2 id="20180429">2018-04-29</h2>
<h2 id="2018-04-29">2018-04-29</h2>
<ul>
<li>DSpace Test crashed again, looks like memory issues again</li>
<li>JVM heap size was last increased to 6144m but the system only has 8GB total so there's not much we can do here other than get a bigger Linode instance or remove the massive Solr Statistics data</li>
</ul>
<h2 id="20180430">2018-04-30</h2>
<h2 id="2018-04-30">2018-04-30</h2>
<ul>
<li>DSpace Test crashed again</li>
<li>I will email the CGSpace team to ask them whether or not we want to commit to having a public test server that accurately mirrors CGSpace (ie, to upgrade to the next largest Linode)</li>

View File

@ -35,7 +35,7 @@ http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E
Then I reduced the JVM heap size from 6144 back to 5120m
Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -116,7 +116,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
</p>
</header>
<h2 id="20180501">2018-05-01</h2>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
@ -127,7 +127,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
<h2 id="20180502">2018-05-02</h2>
<h2 id="2018-05-02">2018-05-02</h2>
<ul>
<li>Advise Fabio Fidanza about integrating CGSpace content in the new CGIAR corporate website</li>
<li>I think they can mostly rely on using the <code>cg.contributor.crp</code> field</li>
@ -161,7 +161,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
</ul>
</li>
</ul>
<h2 id="20180503">2018-05-03</h2>
<h2 id="2018-05-03">2018-05-03</h2>
<ul>
<li>It turns out that the IITA records that I was helping Sisay with in March were imported in 2018-04 without a final check by Abenet or me</li>
<li>There are lots of errors on language, CRP, and even some encoding errors on abstract fields</li>
@ -172,7 +172,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
<li>Abenet sent a list of 46 ORCID identifiers for ILRI authors so I need to get their names using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script and merge them into our controlled vocabulary</li>
<li>On the messed up IITA records from 2018-04 I see sixty DOIs in incorrect format (cg.identifier.doi)</li>
</ul>
<h2 id="20180506">2018-05-06</h2>
<h2 id="2018-05-06">2018-05-06</h2>
<ul>
<li>Fixing the IITA records from Sisay, sixty DOIs have completely invalid format like <code>http:dx.doi.org10.1016j.cropro.2008.07.003</code></li>
<li>I corrected all the DOIs and then checked them for validity with a quick bash loop (a sketch of such a check follows after this list):</li>
@ -218,7 +218,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
<li>I made a pull request (<a href="https://github.com/ilri/DSpace/pull/373">#373</a>) for this that I'll merge some time next week (I'm expecting Atmire to get back to us about DSpace 5.8 soon)</li>
<li>After testing quickly I just decided to merge it, and I noticed that I don't even need to restart Tomcat for the changes to get loaded</li>
</ul>
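<ul>
<li>A minimal sketch of such a DOI validity check, assuming the corrected DOIs are in /tmp/dois.txt one per line (valid DOIs redirect with HTTP 302, invalid ones return 404):</li>
</ul>
<pre><code>$ while read -r doi; do echo "$(curl -s -o /dev/null -w '%{http_code}' "https://doi.org/$doi") $doi"; done &lt; /tmp/dois.txt
</code></pre>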
<h2 id="20180507">2018-05-07</h2>
<h2 id="2018-05-07">2018-05-07</h2>
<ul>
<li>I spent a bit of time playing with <a href="https://github.com/codeforkjeff/conciliator">conciliator</a> and Solr, trying to figure out how to reconcile columns in OpenRefine with data in our existing Solr cores (like CRP subjects)</li>
<li>The documentation regarding the Solr stuff is limited, and I cannot figure out what all the fields in <code>conciliator.properties</code> are supposed to be</li>
@ -226,7 +226,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
<li>That, combined with splitting our multi-value fields on &ldquo;||&rdquo; in OpenRefine is amaaaaazing, because after reconciliation you can just join them again</li>
<li>Oh wow, you can also facet on the individual values once you've split them! That's going to be amazing for proofing CRPs, subjects, etc.</li>
</ul>
<h2 id="20180509">2018-05-09</h2>
<h2 id="2018-05-09">2018-05-09</h2>
<ul>
<li>Udana asked about the Book Chapters we had been proofing on DSpace Test in 2018-04</li>
<li>I told him that there were still some TODO items for him on that data, for example to update the <code>dc.language.iso</code> field for the Spanish items</li>
@ -271,7 +271,7 @@ Livestock and Fish
</code></pre><ul>
<li>I tried to reconcile against a CSV of our countries but reconcile-csv crashes</li>
</ul>
<h2 id="20180513">2018-05-13</h2>
<h2 id="2018-05-13">2018-05-13</h2>
<ul>
<li>It turns out there was a space in my &ldquo;country&rdquo; header that was causing reconcile-csv to crash</li>
<li>After removing that it works fine!</li>
@ -291,12 +291,12 @@ Livestock and Fish
</ul>
</li>
</ul>
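<ul>
<li>A quick way to spot stray whitespace in a CSV header like that one (file name assumed):</li>
</ul>
<pre><code>$ head -n1 countries.csv | od -c
</code></pre>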
<h2 id="20180514">2018-05-14</h2>
<h2 id="2018-05-14">2018-05-14</h2>
<ul>
<li>Send a message to the OpenRefine mailing list about the bug with reconciling multi-value cells</li>
<li>Help Silvia Alonso get a list of all her publications since 2013 from Listings and Reports</li>
</ul>
<h2 id="20180515">2018-05-15</h2>
<h2 id="2018-05-15">2018-05-15</h2>
<ul>
<li>Turns out I was doing the OpenRefine reconciliation wrong: I needed to copy the matched values to a new column!</li>
<li>Also, I learned how to do something cool with Jython expressions in OpenRefine</li>
@ -358,7 +358,7 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
<li>I copied over the DSpace <code>search_text</code> field type from the DSpace Solr config (had to remove some properties so Solr would start) but it doesn't seem to be any better at matching than the <code>text_en</code> type</li>
<li>I think I need to focus on trying to return scores with conciliator</li>
</ul>
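<ul>
<li>The matching can be tested directly against the Solr core populated with bin/post above, a sketch assuming the default Solr port:</li>
</ul>
<pre><code>$ curl -s 'http://localhost:8983/solr/countries/select?q=country:%22COTE+D%27IVOIRE%22&amp;wt=json'
</code></pre>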
<h2 id="20180516">2018-05-16</h2>
<h2 id="2018-05-16">2018-05-16</h2>
<ul>
<li>Discuss GDPR with James Stapleton
<ul>
@ -381,7 +381,7 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
<li>According to the <a href="https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference#anonymizeIp">analytics.js protocol parameter documentation</a> this means that IPs are being anonymized</li>
<li>After finding and fixing some duplicates in IITA's <code>IITA_April_27</code> test collection on DSpace Test (10568/92703) I told Sisay that he can move them to IITA's Journal Articles collection on CGSpace</li>
</ul>
<h2 id="20180517">2018-05-17</h2>
<h2 id="2018-05-17">2018-05-17</h2>
<ul>
<li>Testing reconciliation of countries against Solr via conciliator, I notice that <code>CÔTE D'IVOIRE</code> doesn't match <code>COTE D'IVOIRE</code>, whereas with reconcile-csv it does</li>
<li>Also, when reconciling regions against Solr via conciliator <code>EASTERN AFRICA</code> doesn't match <code>EAST AFRICA</code>, whereas with reconcile-csv it does</li>
@ -401,23 +401,23 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
<li>This cookie could be set by a user clicking a link in a privacy policy, for example</li>
<li>The additional Javascript could be easily added to our existing <code>googleAnalytics</code> template in each XMLUI theme</li>
</ul>
<h2 id="20180518">2018-05-18</h2>
<h2 id="2018-05-18">2018-05-18</h2>
<ul>
<li>Do a final check on the thirty (30) IWMI Book Chapters for Udana and upload them to CGSpace</li>
<li>These were previously on <a href="https://dspacetest.cgiar.org/handle/10568/91679">DSpace Test as &ldquo;IWMI test collection&rdquo;</a> in 2018-04</li>
</ul>
<h2 id="20180520">2018-05-20</h2>
<h2 id="2018-05-20">2018-05-20</h2>
<ul>
<li>Run all system updates on DSpace Test (linode19), re-deploy DSpace with latest <code>5_x-dev</code> branch (including GDPR IP anonymization), and reboot the server</li>
<li>Run all system updates on CGSpace (linode18), re-deploy DSpace with latest <code>5_x-dev</code> branch (including GDPR IP anonymization), and reboot the server</li>
</ul>
<h2 id="20180521">2018-05-21</h2>
<h2 id="2018-05-21">2018-05-21</h2>
<ul>
<li>Geoffrey from IITA got back with more questions about depositing items programmatically into the CGSpace workflow</li>
<li>I pointed out that <a href="http://swordapp.org/">SWORD</a> might be an option, as <a href="https://wiki.duraspace.org/display/DSDOC5x/SWORDv2+Server">DSpace supports the SWORDv2 protocol</a> (although we have never tested it)</li>
<li>Work on implementing <a href="https://cookieconsent.insites.com">cookie consent</a> popup for all XMLUI themes (SASS theme with primary / secondary branding from Bootstrap)</li>
</ul>
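<ul>
<li>If we ever do test SWORDv2, the first step would be fetching the service document, a sketch assuming DSpace's default endpoint and placeholder credentials:</li>
</ul>
<pre><code>$ curl -u 'user@cgiar.org:password' https://dspacetest.cgiar.org/swordv2/servicedocument
</code></pre>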
<h2 id="20180522">2018-05-22</h2>
<h2 id="2018-05-22">2018-05-22</h2>
<ul>
<li>Skype with James Stapleton about last minute GDPR wording</li>
<li>After spending yesterday working on integration and theming of the cookieconsent popup, today I cannot get the damn &ldquo;Agree&rdquo; button to dismiss the popup!</li>
@ -427,7 +427,7 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
<li>This is a waste of TWO full days of work</li>
<li>Marissa Van Epp asked if I could add <code>PII-FP1_PACCA2</code> to the CCAFS phase II project tags on CGSpace so I created a ticket to track it (<a href="https://github.com/ilri/DSpace/issues/376">#376</a>)</li>
</ul>
<h2 id="20180523">2018-05-23</h2>
<h2 id="2018-05-23">2018-05-23</h2>
<ul>
<li>I'm investigating how many non-CGIAR users we have registered on CGSpace:</li>
</ul>
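<ul>
<li>A sketch of one way to count them from the eperson table:</li>
</ul>
<pre><code>dspace=# select count(*) from eperson where email not like '%cgiar.org';
</code></pre>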
@ -439,14 +439,14 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
<li>I made a pull request for the GDPR compliance popup (<a href="https://github.com/ilri/DSpace/pull/377">#377</a>) and merged it to the <code>5_x-prod</code> branch</li>
<li>I will deploy it to CGSpace tonight</li>
</ul>
<h2 id="20180528">2018-05-28</h2>
<h2 id="2018-05-28">2018-05-28</h2>
<ul>
<li>Daniel Haile-Michael sent a message that CGSpace was down (I am currently in Oregon so the time difference is ~10 hours)</li>
<li>I looked in the logs but didn't see anything that would be the cause of the crash</li>
<li>Atmire finalized the DSpace 5.8 testing and sent a pull request: <a href="https://github.com/ilri/DSpace/pull/378">https://github.com/ilri/DSpace/pull/378</a></li>
<li>They have asked if I can test this and get back to them by June 11th</li>
</ul>
<h2 id="20180530">2018-05-30</h2>
<h2 id="2018-05-30">2018-05-30</h2>
<ul>
<li>Talk to Samantha from Bioversity about something related to Google Analytics, I'm still not sure what they want</li>
<li>DSpace Test crashed last night, seems to be related to system memory (not JVM heap)</li>
@ -479,7 +479,7 @@ $ sed 's/.*Item1.*/\n&amp;/g' ~/cifor-duplicates.txt &gt; ~/cifor-duplicates-cle
<li>Then I format the list of handles and put it into this SQL query to export authors from items ONLY in those collections (too many to list here):</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/67236','10568/67274',...))) group by text_value order by count desc) to /tmp/ilri-authors.csv with csv;
</code></pre><h2 id="20180531">2018-05-31</h2>
</code></pre><h2 id="2018-05-31">2018-05-31</h2>
<ul>
<li>Clarify CGSpace's usage of Google Analytics and personally identifiable information during user registration for Bioversity team who had been asking about GDPR compliance</li>
<li>Testing running PostgreSQL in a Docker container on localhost because when I'm on Arch Linux there isn't an easily installable package for particular PostgreSQL versions</li>
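<li>A sketch of running a specific PostgreSQL version in Docker for that (image tag and credentials are examples):
<pre><code>$ docker run --name dspacedb -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
</code></pre></li>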

View File

@ -55,7 +55,7 @@ real 74m42.646s
user 8m5.056s
sys 2m7.289s
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -136,7 +136,7 @@ sys 2m7.289s
</p>
</header>
<h2 id="20180604">2018-06-04</h2>
<h2 id="2018-06-04">2018-06-04</h2>
<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul>
@ -156,13 +156,13 @@ sys 2m7.289s
real 74m42.646s
user 8m5.056s
sys 2m7.289s
</code></pre><h2 id="20180606">2018-06-06</h2>
</code></pre><h2 id="2018-06-06">2018-06-06</h2>
<ul>
<li>It turns out that I needed to add a server block for <code>atmire.com-snapshots</code> to my Maven settings, so now the Atmire code builds</li>
<li>Now Maven and Ant run properly, but I'm getting SQL migration errors in <code>dspace.log</code> after starting Tomcat</li>
<li>I've updated my ticket on Atmire's bug tracker: <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560</a></li>
</ul>
<h2 id="20180607">2018-06-07</h2>
<h2 id="2018-06-07">2018-06-07</h2>
<ul>
<li>Proofing 200 IITA records on DSpace Test for Sisay: <a href="https://dspacetest.cgiar.org/handle/10568/95391">IITA_Junel_06 (10568/95391)</a>
<ul>
@ -201,13 +201,13 @@ update schema_version set version = '5.8.2015.12.03.3' where version = '5.5.2015
</code></pre><ul>
<li>I will apply them on CGSpace tomorrow I think&hellip;</li>
</ul>
<h2 id="20180609">2018-06-09</h2>
<h2 id="2018-06-09">2018-06-09</h2>
<ul>
<li>It's pretty annoying, but the JVM monitoring for Munin was never set up when I migrated DSpace Test to its new server a few months ago</li>
<li>I ran the tomcat and munin-node tags in Ansible again and now the stuff is all wired up and recording stats properly</li>
<li>I applied the CIP author corrections on CGSpace and DSpace Test and re-ran the Discovery indexing</li>
</ul>
<h2 id="20180610">2018-06-10</h2>
<h2 id="2018-06-10">2018-06-10</h2>
<ul>
<li>I spent some time removing the Atmire Metadata Quality Module (MQM) from the proposed DSpace 5.8 changes</li>
<li>After removing all code mentioning MQM, mqm, metadata-quality, batchedit, duplicatechecker, etc, I think I got most of it removed, but there is a Spring error during Tomcat startup:</li>
@ -237,7 +237,7 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
</li>
<li>I will have to tell IITA people to redo these entirely I think&hellip;</li>
</ul>
<h2 id="20180611">2018-06-11</h2>
<h2 id="2018-06-11">2018-06-11</h2>
<ul>
<li>Sisay sent a new version of the last IITA records that he created from the original CSV from IITA</li>
<li>The 200 records are in the <a href="https://dspacetest.cgiar.org/handle/10568/95870">IITA_Junel_11 (10568/95870)</a> collection</li>
@ -265,7 +265,7 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
<li>I always use the built-in trim and collapse transformations anyways, but this seems to work to find the offending cells: <code>isNotNull(value.match(/.*?\s{2,}.*?/))</code></li>
<li>I wonder if I should start checking for &ldquo;smart&rdquo; quotes like &rsquo; (hex 2019)</li>
</ul>
<h2 id="20180612">2018-06-12</h2>
<h2 id="2018-06-12">2018-06-12</h2>
<ul>
<li>Udana from IWMI asked about the OAI base URL for their community on CGSpace</li>
<li>I think it should be this: <a href="https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=oai_dc&amp;set=com_10568_16814">https://cgspace.cgiar.org/oai/request?verb=ListRecords&amp;metadataPrefix=oai_dc&amp;set=com_10568_16814</a></li>
@ -341,7 +341,7 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
</ul>
</li>
</ul>
<h2 id="20180613">2018-06-13</h2>
<h2 id="2018-06-13">2018-06-13</h2>
<ul>
<li>Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Robin Buruchara's items</li>
<li>I used my <a href="https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py">add-orcid-identifiers-csv.py</a> script:</li>
@ -365,14 +365,14 @@ Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign k
</ul>
<pre><code>$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (152402);'
UPDATE 1
</code></pre><h2 id="20180614">2018-06-14</h2>
</code></pre><h2 id="2018-06-14">2018-06-14</h2>
<ul>
<li>Check through Udana's IWMI records from last week on DSpace Test</li>
<li>There were only some minor whitespace and one or two syntax errors, but they look very good otherwise</li>
<li>I uploaded the twenty-four reports to the IWMI Reports collection: <a href="https://cgspace.cgiar.org/handle/10568/36188">https://cgspace.cgiar.org/handle/10568/36188</a></li>
<li>I uploaded the seventy-six book chapters to the IWMI Book Chapters collection: <a href="https://cgspace.cgiar.org/handle/10568/36178">https://cgspace.cgiar.org/handle/10568/36178</a></li>
</ul>
<h2 id="20180624">2018-06-24</h2>
<h2 id="2018-06-24">2018-06-24</h2>
<ul>
<li>I was restoring a PostgreSQL dump on my test machine and found a way to restore the CGSpace dump as the <code>postgres</code> user, but have the owner of the schema be the <code>dspacetest</code> user:</li>
</ul>
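<ul>
<li>A sketch of the approach, temporarily granting superuser so the restore can create extensions (file name assumed):</li>
</ul>
<pre><code>$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest /tmp/cgspace.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
</code></pre>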
@ -427,7 +427,7 @@ Done.
&quot;Jarvis, A.&quot;,Andy Jarvis: 0000-0001-6543-0798
&quot;Jarvis, Andy&quot;,Andy Jarvis: 0000-0001-6543-0798
&quot;Jarvis, Andrew&quot;,Andy Jarvis: 0000-0001-6543-0798
</code></pre><h2 id="20180626">2018-06-26</h2>
</code></pre><h2 id="2018-06-26">2018-06-26</h2>
<ul>
<li>Atmire got back to me to say that we can remove the <code>itemCollectionPlugin</code> and <code>HasBitstreamsSSIPlugin</code> beans from DSpace's <code>discovery.xml</code> file, as they are used by the Metadata Quality Module (MQM) that we are not using anymore</li>
<li>I removed both those beans and did some simple tests to check item submission, media-filter of PDFs, REST API, but got an error &ldquo;No matches for the query&rdquo; when listing records in OAI</li>
@ -438,7 +438,7 @@ Done.
<li>It's actually only a warning and it also appears in the logs on DSpace Test (which is currently running DSpace 5.5), so I need to keep troubleshooting</li>
<li>Ah, I think I just need to run <code>dspace oai import</code></li>
</ul>
<h2 id="20180627">2018-06-27</h2>
<h2 id="2018-06-27">2018-06-27</h2>
<ul>
<li>Vika from CIFOR sent back his annotations on the duplicates for the &ldquo;CIFOR_May_9&rdquo; archive import that I sent him last week</li>
<li>I'll have to figure out how to separate those we're keeping, deleting, and mapping into CIFOR's archive collection</li>
@ -471,7 +471,7 @@ $ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 &gt; map-to-cifor-archive.csv
<li>After deleting the 62 duplicates, mapping the 50 items from elsewhere in CGSpace, and uploading 2,398 unique items, there are a total of 2,448 items added in this batch</li>
<li>I'll let Abenet take one last look and then move them to CGSpace</li>
</ul>
<h2 id="20180628">2018-06-28</h2>
<h2 id="2018-06-28">2018-06-28</h2>
<ul>
<li>DSpace Test appears to have crashed last night</li>
<li>There is nothing in the Tomcat or DSpace logs, but I see the following in <code>dmesg -T</code>:</li>

View File

@ -33,7 +33,7 @@ During the mvn package stage on the 5.8 branch I kept getting issues with java r
There is insufficient memory for the Java Runtime Environment to continue.
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -114,7 +114,7 @@ There is insufficient memory for the Java Runtime Environment to continue.
</p>
</header>
<h2 id="20180701">2018-07-01</h2>
<h2 id="2018-07-01">2018-07-01</h2>
<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
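<ul>
<li>A sketch of such a backup in PostgreSQL's custom format (file name assumed):</li>
</ul>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f /tmp/dspace_test.backup dspace
</code></pre>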
@ -147,12 +147,12 @@ $ dspace database migrate ignored
</code></pre><ul>
<li>After that I started Tomcat 7 and DSpace seems to be working, now I need to tell our colleagues to try stuff and report issues they have</li>
</ul>
<h2 id="20180702">2018-07-02</h2>
<h2 id="2018-07-02">2018-07-02</h2>
<ul>
<li>Discuss AgriKnowledge including our Handle identifier on their harvested items from CGSpace</li>
<li>They seem to be only interested in Gates-funded outputs, for example: <a href="https://www.agriknowledge.org/files/tm70mv21t">https://www.agriknowledge.org/files/tm70mv21t</a></li>
</ul>
<h2 id="20180703">2018-07-03</h2>
<h2 id="2018-07-03">2018-07-03</h2>
<ul>
<li>Finally finish with the CIFOR Archive records (a total of 2448):
<ul>
@ -213,7 +213,7 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
</code></pre><ul>
<li>Gotta check that out later&hellip;</li>
</ul>
<h2 id="20180704">2018-07-04</h2>
<h2 id="2018-07-04">2018-07-04</h2>
<ul>
<li>I verified that the autowire error indeed only occurs on Tomcat 8.5, but the application works fine on Tomcat 7</li>
<li>I have raised this in the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 compatibility ticket on Atmire's tracker</a></li>
@ -221,12 +221,12 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
<li>Also, Udana wants me to add &ldquo;Enhancing Sustainability Across Agricultural Systems&rdquo; to the WLE Phase II research themes so I created a ticket to track that (<a href="https://github.com/ilri/DSpace/issues/382">#382</a>)</li>
<li>I need to try to finish this DSpace 5.8 business first because I have too many branches with cherry-picks going on right now!</li>
</ul>
<h2 id="20180706">2018-07-06</h2>
<h2 id="2018-07-06">2018-07-06</h2>
<ul>
<li>CCAFS want me to add &ldquo;PII-FP2_MSCCCAFS&rdquo; to their Phase II project tags on CGSpace (<a href="https://github.com/ilri/DSpace/issues/383">#383</a>)</li>
<li>I'll do it in a batch with all the other metadata updates next week</li>
</ul>
<h2 id="20180708">2018-07-08</h2>
<h2 id="2018-07-08">2018-07-08</h2>
<ul>
<li>I was tempted to do the Linode instance upgrade on CGSpace (linode18), but after looking closely at the system backups I noticed that Solr isn't being backed up to S3</li>
<li>I apparently noticed this—and fixed it!—in <a href="/cgspace-notes/2016-07/">2016-07</a>, but it doesn't look like the backup has been updated since then!</li>
@ -246,7 +246,7 @@ $ ./resolve-orcids.py -i /tmp/2018-07-08-orcids.txt -o /tmp/2018-07-08-names.txt
</code></pre><ul>
<li>But after comparing to the existing list of names I didn't see much change, so I just ignored it</li>
</ul>
<h2 id="20180709">2018-07-09</h2>
<h2 id="2018-07-09">2018-07-09</h2>
<ul>
<li>Uptime Robot said that CGSpace was down for two minutes early this morning but I don't see anything in Tomcat logs or dmesg</li>
<li>Uptime Robot said that CGSpace was down for two minutes again later in the day, and this time I saw a memory error in Tomcat's <code>catalina.out</code>:</li>
@ -295,7 +295,7 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
<li>Interestingly, the first time that I see <code>35.227.26.162</code> was on 2018-06-08</li>
<li>I've added <code>35.227.26.162</code> to the bot tagging logic in the nginx vhost</li>
</ul>
<h2 id="20180710">2018-07-10</h2>
<h2 id="2018-07-10">2018-07-10</h2>
<ul>
<li>Add &ldquo;United Kingdom government&rdquo; to sponsors (<a href="https://github.com/ilri/DSpace/issues/381">#381</a>)</li>
<li>Add &ldquo;Enhancing Sustainability Across Agricultural Systems&rdquo; to WLE Phase II Research Themes (<a href="https://github.com/ilri/DSpace/issues/382">#382</a>)</li>
@ -325,7 +325,7 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
<li>He said there was a bug that caused his app to request a bunch of invalid URLs</li>
<li>I'll have to keep an eye on this and see how their platform evolves</li>
</ul>
<h2 id="20180711">2018-07-11</h2>
<h2 id="2018-07-11">2018-07-11</h2>
<ul>
<li>Skype meeting with Peter and Addis CGSpace team
<ul>
@ -336,7 +336,7 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
</ul>
</li>
</ul>
<h2 id="20180712">2018-07-12</h2>
<h2 id="2018-07-12">2018-07-12</h2>
<ul>
<li>Uptime Robot said that CGSpace went down a few times last night, around 10:45 PM and 12:30 AM</li>
<li>Here are the top ten IPs from last night and this morning:</li>
@ -396,13 +396,13 @@ $ csvcut -c 1 &lt; /tmp/affiliations.csv &gt; /tmp/affiliations-1.csv
</code></pre><ul>
<li>We also need to discuss standardizing our countries and comparing our ORCID iDs</li>
</ul>
<h2 id="20180713">2018-07-13</h2>
<h2 id="2018-07-13">2018-07-13</h2>
<ul>
<li>Generate a list of affiliations for Peter and Abenet to go over so we can batch correct them before we deploy the new data visualization dashboard:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv header;
COPY 4518
</code></pre><h2 id="20180715">2018-07-15</h2>
</code></pre><h2 id="2018-07-15">2018-07-15</h2>
<ul>
<li>Run all system updates on CGSpace, add latest metadata changes from last week, and start the Linode instance upgrade</li>
<li>After the upgrade I see we have more disk space available in the instance's dashboard, so I shut the instance down and resized it from 392GB to 650GB</li>
@ -447,7 +447,7 @@ $ ./resolve-orcids.py -i /tmp/2018-07-15-orcid-ids.txt -o /tmp/2018-07-15-resolv
<li>I will check with the CGSpace team to see if they want me to add these to CGSpace</li>
<li>Help Udana from WLE understand some Altmetrics concepts</li>
</ul>
<h2 id="20180718">2018-07-18</h2>
<h2 id="2018-07-18">2018-07-18</h2>
<ul>
<li>ICARDA sent me another refined list of ORCID iDs so I sorted and formatted them into our controlled vocabulary again</li>
<li>Participate in call with IWMI and WLE to discuss Altmetric, CGSpace, and social media</li>
@ -486,7 +486,7 @@ Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
</code></pre><h2 id="20180719">2018-07-19</h2>
</code></pre><h2 id="2018-07-19">2018-07-19</h2>
<ul>
<li>I tested a submission via SAF bundle to DSpace 5.8 and it worked fine</li>
<li>In addition to testing DSpace 5.8, I specifically wanted to see if the issue with specifying collections in metadata instead of on the command line would work (<a href="https://jira.duraspace.org/browse/DS-3583">DS-3583</a>)</li>
@ -497,7 +497,7 @@ X-XSS-Protection: 1; mode=block
<li>I told her that they need to start using more accurate dates for their issue dates</li>
<li>In the example item I looked at the DOI has a publish date of 2018-03-16, so they should really try to capture that</li>
</ul>
<h2 id="20180722">2018-07-22</h2>
<h2 id="2018-07-22">2018-07-22</h2>
<ul>
<li>I told the IWMI people that they can use <code>sort_by=3</code> in their OpenSearch query to sort the results by <code>dc.date.accessioned</code> instead of <code>dc.date.issued</code></li>
<li>They say that it is a burden for them to capture the issue dates, so I cautioned them that this is to their own benefit for future posterity and that everyone else on CGSpace manages to capture the issue dates!</li>
@ -510,7 +510,7 @@ X-XSS-Protection: 1; mode=block
<li>I finally informed Atmire that we're ready to proceed with deploying this to CGSpace and that they should advise whether we should wait about the SNAPSHOT versions in <code>pom.xml</code></li>
<li>There is no word on the issue I reported with Tomcat 8.5.32 yet, though&hellip;</li>
</ul>
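<ul>
<li>For reference, a sketch of an OpenSearch URL using the <code>sort_by=3</code> option mentioned above (scope and query are examples):</li>
</ul>
<pre><code>https://cgspace.cgiar.org/open-search/discover?query=*&amp;scope=10568/16814&amp;sort_by=3&amp;order=desc&amp;rpp=100
</code></pre>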
<h2 id="20180723">2018-07-23</h2>
<h2 id="2018-07-23">2018-07-23</h2>
<ul>
<li>Still discussing dates with IWMI</li>
<li>I looked in the database to see the breakdown of date formats used in <code>dc.date.issued</code>, ie YYYY, YYYY-MM, or YYYY-MM-DD:</li>
@ -532,11 +532,11 @@ dspace=# select count(text_value) from metadatavalue where resource_type_id=2 an
</code></pre><ul>
<li>So it looks like YYYY is the most numerous, followed by YYYY-MM-DD, then YYYY-MM</li>
</ul>
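<ul>
<li>The per-format counts were queries along these lines, a sketch assuming <code>dc.date.issued</code> is metadata field 15 as in the default registry:</li>
</ul>
<pre><code>dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=15 and text_value ~ '^[0-9]{4}$';
</code></pre>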
<h2 id="20180726">2018-07-26</h2>
<h2 id="2018-07-26">2018-07-26</h2>
<ul>
<li>Run system updates on DSpace Test (linode19) and reboot the server</li>
</ul>
<h2 id="20180727">2018-07-27</h2>
<h2 id="2018-07-27">2018-07-27</h2>
<ul>
<li>Follow up with Atmire again about the SNAPSHOT versions in our <code>pom.xml</code> because I want to finalize the DSpace 5.8 upgrade soon and I haven't heard from them in a month (<a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">ticket 560</a>)</li>
</ul>

View File

@ -43,7 +43,7 @@ Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did
The server only has 8GB of RAM so we&#39;ll eventually need to upgrade to a larger one because we&#39;ll start starving the OS, PostgreSQL, and command line batch processes
I ran all system updates on DSpace Test and rebooted it
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -124,7 +124,7 @@ I ran all system updates on DSpace Test and rebooted it
</p>
</header>
<h2 id="20180801">2018-08-01</h2>
<h2 id="2018-08-01">2018-08-01</h2>
<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
@ -149,7 +149,7 @@ I ran all system updates on DSpace Test and rebooted it
</ul>
</li>
</ul>
<h2 id="20180802">2018-08-02</h2>
<h2 id="2018-08-02">2018-08-02</h2>
<ul>
<li>DSpace Test crashed again and the only error I see is this in <code>dmesg</code>:</li>
</ul>
@ -165,7 +165,7 @@ I ran all system updates on DSpace Test and rebooted it
<li>I just tried to enable the stats again on DSpace Test now that we're on DSpace 5.8 with updated Atmire modules, but every user I search for shows &ldquo;No data available&rdquo;</li>
<li>As a test I submitted a new item and I was able to see it in the workflow statistics &ldquo;data&rdquo; tab, but not in the graph</li>
</ul>
<h2 id="20180815">2018-08-15</h2>
<h2 id="2018-08-15">2018-08-15</h2>
<ul>
<li>Run through Peter's list of author affiliations from earlier this month</li>
<li>I did some quick sanity checks and small cleanups in Open Refine, checking for spaces, weird accents, and encoding errors</li>
@ -173,7 +173,7 @@ I ran all system updates on DSpace Test and rebooted it
</ul>
<pre><code>$ ./fix-metadata-values.py -i 2018-08-15-Correct-1083-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t correct -m 211
$ ./delete-metadata-values.py -i 2018-08-15-Remove-11-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
</code></pre><h2 id="20180816">2018-08-16</h2>
</code></pre><h2 id="2018-08-16">2018-08-16</h2>
<ul>
<li>Generate a list of the top 1,500 authors on CGSpace for Sisay so he can create the controlled vocabulary:</li>
</ul>
@ -194,7 +194,7 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/cgspace_2018-08-16.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
</code></pre><h2 id="20180819">2018-08-19</h2>
</code></pre><h2 id="2018-08-19">2018-08-19</h2>
<ul>
<li>Keep working on the CIAT ORCID identifiers from Elizabeth</li>
<li>In the spreadsheet she sent me there are some names with other versions in the database, so when it is obviously the same one (ie &ldquo;Schultze-Kraft, Rainer&rdquo; and &ldquo;Schultze-Kraft, R.&rdquo;) I will just tag them with ORCID identifiers too</li>
@ -296,7 +296,7 @@ sys 2m20.248s
</code></pre><ul>
<li>So I'm thinking we should add &ldquo;crawl&rdquo; to the Tomcat Crawler Session Manager valve, as we already have &ldquo;bot&rdquo; that catches Googlebot, Bingbot, etc.</li>
</ul>
<h2 id="20180820">2018-08-20</h2>
<h2 id="2018-08-20">2018-08-20</h2>
<ul>
<li>Help Sisay with some UTF-8 encoding issues in a file Peter sent him</li>
<li>Finish up reconciling Atmire's pull request for DSpace 5.8 changes with the latest status of our <code>5_x-prod</code> branch</li>
@ -313,7 +313,7 @@ sys 2m20.248s
<li>Instead, I will archive the current <code>5_x-prod</code> DSpace 5.5 branch as <code>5_x-prod-dspace-5.5</code> and then hard reset <code>5_x-prod</code> based on <code>5_x-dspace-5.8</code></li>
<li>Unfortunately this will mess up the references in pull requests and issues on GitHub</li>
</ul>
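<ul>
<li>The branch shuffle would look something like this (a sketch):</li>
</ul>
<pre><code>$ git checkout -b 5_x-prod-dspace-5.5 5_x-prod
$ git push origin 5_x-prod-dspace-5.5
$ git checkout 5_x-prod
$ git reset --hard 5_x-dspace-5.8
$ git push origin 5_x-prod --force
</code></pre>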
<h2 id="20180821">2018-08-21</h2>
<h2 id="2018-08-21">2018-08-21</h2>
<ul>
<li>Something must have happened, as the <code>mvn package</code> <em>always</em> takes about two hours now, stopping for a very long time near the end at this step:</li>
</ul>
@ -335,7 +335,7 @@ sys 2m20.248s
<li>I need to test to see if this has any side effects when deployed&hellip;</li>
<li>In other news, I see there was a pull request in DSpace 5.9 that fixes the issue with not being able to have blank lines in CSVs when importing via command line or webui (<a href="https://jira.duraspace.org/browse/DS-3245">DS-3245</a>)</li>
</ul>
<h2 id="20180823">2018-08-23</h2>
<h2 id="2018-08-23">2018-08-23</h2>
<ul>
<li>Skype meeting with CKM people to meet new web dev guy Tariku</li>
<li>They say they want to start working on the ContentDM harvester middleware again</li>
@ -345,7 +345,7 @@ sys 2m20.248s
<li>I imported the CTA items on CGSpace for Sisay:</li>
</ul>
<pre><code>$ dspace import -a -e s.webshet@cgiar.org -s /home/swebshet/ictupdates_uploads_August_21 -m /tmp/2018-08-23-cta-ictupdates.map
</code></pre><h2 id="20180826">2018-08-26</h2>
</code></pre><h2 id="2018-08-26">2018-08-26</h2>
<ul>
<li>Doing the DSpace 5.8 upgrade on CGSpace (linode18)</li>
<li>I already finished the Maven build, now I'll take a backup of the PostgreSQL database and do a database cleanup just in case:</li>
@ -401,14 +401,14 @@ $ dspace database migrate ignored
<li>I just checked to see if the Listings and Reports issue with using the CGSpace citation field was fixed as planned alongside the DSpace 5.8 upgrades (<a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589">#589</a>)</li>
<li>I was able to create a new layout containing only the citation field, so I closed the ticket</li>
</ul>
<h2 id="20180829">2018-08-29</h2>
<h2 id="2018-08-29">2018-08-29</h2>
<ul>
<li>Discuss <a href="https://copo-project.org/copo/">COPO</a> with Martin Mueller</li>
<li>His and the consortium's idea is to use this for metadata annotation (submission?) to all repositories</li>
<li>It is somehow related to adding events as items in the repository, and then linking related papers, presentations, etc to the event item using <code>dc.relation</code>, etc.</li>
<li>Discuss Linode server charges with Abenet, apparently we want to start charging these to Big Data</li>
</ul>
<h2 id="20180830">2018-08-30</h2>
<h2 id="2018-08-30">2018-08-30</h2>
<ul>
<li>I fixed the graphical glitch in the cookieconsent popup (the dismiss bug is still there) by pinning the last known good version (3.0.6) in <code>bower.json</code> of each XMLUI theme</li>
<li>I guess cookieconsent got updated without me realizing it and the previous expression <code>^3.0.6</code> made bower install version 3.1.0</li>

View File

@ -27,7 +27,7 @@ I&#39;ll update the DSpace role in our Ansible infrastructure playbooks and run
Also, I&#39;ll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system&#39;s RAM, and we never re-ran them after migrating to larger Linodes last month
I&#39;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&#39;m getting those autowire errors in Tomcat 8.5.30 again:
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -108,7 +108,7 @@ I&#39;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&#
</p>
</header>
<h2 id="20180902">2018-09-02</h2>
<h2 id="2018-09-02">2018-09-02</h2>
<ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
@ -139,7 +139,7 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
<li>And the <code>5_x-prod</code> DSpace 5.8 branch does work in Tomcat 8.5.x on my Arch Linux laptop&hellip;</li>
<li>I'm not sure where the issue is then!</li>
</ul>
<h2 id="20180903">2018-09-03</h2>
<h2 id="2018-09-03">2018-09-03</h2>
<ul>
<li>Abenet says she's getting three emails about periodic statistics reports every day since the DSpace 5.8 upgrade last week</li>
<li>They are from the CUA module</li>
@ -148,7 +148,7 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
<li>She will try to click the &ldquo;Unsubscribe&rdquo; link in the first two to see if it works, otherwise we should contact Atmire</li>
<li>The only one she remembers subscribing to is the top downloads one</li>
</ul>
<h2 id="20180904">2018-09-04</h2>
<h2 id="2018-09-04">2018-09-04</h2>
<ul>
<li>I'm looking over the latest round of IITA records from Sisay: <a href="https://dspacetest.cgiar.org/handle/10568/104230">Mercy1806_August_29</a>
<ul>
@ -171,7 +171,7 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
</li>
<li>Abenet says she hasn't received any more subscription emails from the CUA module since she unsubscribed yesterday, so I think we don't need to create an issue on Atmire's bug tracker anymore</li>
</ul>
<h2 id="20180910">2018-09-10</h2>
<h2 id="2018-09-10">2018-09-10</h2>
<ul>
<li>Playing with <a href="https://github.com/eykhagen/strest">strest</a> to test the DSpace REST API programmatically</li>
<li>For example, given this <code>test.yaml</code>:</li>
@ -287,7 +287,7 @@ X-XSS-Protection: 1; mode=block
</code></pre><ul>
<li>I will have to keep an eye on it and perhaps add it to the list of &ldquo;bad bots&rdquo; that get rate limited</li>
</ul>
<h2 id="20180912">2018-09-12</h2>
<h2 id="2018-09-12">2018-09-12</h2>
<ul>
<li>Merge AReS explorer changes to nginx config and deploy on CGSpace so CodeObia can start testing more</li>
<li>Re-create my local Docker container for PostgreSQL data, but using a volume for the database data:</li>
@ -301,7 +301,7 @@ $ sudo docker run --name dspacedb -v dspacetest_data:/var/lib/postgresql/data -e
<li>I told Sisay to run the XML file through tidy</li>
<li>More testing of the access and usage rights changes</li>
</ul>
<h2 id="20180913">2018-09-13</h2>
<h2 id="2018-09-13">2018-09-13</h2>
<ul>
<li>Peter was communicating with Altmetric about the OAI mapping issue for item <a href="https://cgspace.cgiar.org/oai/request?verb=GetRecord&amp;metadataPrefix=oai_dc&amp;identifier=oai:cgspace.cgiar.org:10568/82810">10568/82810</a> again</li>
<li>Altmetric said it was somehow related to the OAI <code>dateStamp</code> not getting updated when the mappings changed, but I said that back in <a href="/cgspace-notes/2018-07/">2018-07</a> when this happened it was because the OAI was actually just not reflecting all the item's mappings</li>
@ -348,12 +348,12 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
<li>Must have been something like an old DSpace 5.5 file in the spring folder&hellip; weird</li>
<li>But yay, this means we can update DSpace Test to Ubuntu 18.04, Tomcat 8, PostgreSQL 9.6, etc&hellip;</li>
</ul>
<h2 id="20180914">2018-09-14</h2>
<h2 id="2018-09-14">2018-09-14</h2>
<ul>
<li>Sisay uploaded the IITA records to CGSpace, but forgot to remove the old Handles</li>
<li>I explicitly told him not to forget to remove them yesterday!</li>
</ul>
<h2 id="20180916">2018-09-16</h2>
<h2 id="2018-09-16">2018-09-16</h2>
<ul>
<li>Add the DSpace build.properties as a template into my <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> for configuring DSpace machines</li>
<li>One stupid thing there is that I add all the variables in a private vars file, which is apparently higher precedence than host vars, meaning that I can't override them (like SMTP server) on a per-host basis</li>
@ -361,7 +361,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
<li>I suggested that we leave access rights (<code>cg.identifier.access</code>) as it is now, with &ldquo;Open Access&rdquo; or &ldquo;Limited Access&rdquo;, and then simply re-brand that as &ldquo;Access rights&rdquo; in the UIs and relevant drop downs</li>
<li>Then we continue as planned to add <code>dc.rights</code> as &ldquo;Usage rights&rdquo;</li>
</ul>
<h2 id="20180917">2018-09-17</h2>
<h2 id="2018-09-17">2018-09-17</h2>
<ul>
<li>Skype meeting with CGSpace team in Addis</li>
<li>Change <code>cg.identifier.status</code> &ldquo;Access rights&rdquo; options to:
@ -418,7 +418,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
<li>That one returns 766, which is exactly 1655 minus 889&hellip;</li>
<li>Also, Solr's <code>fq</code> is similar to the regular <code>q</code> query parameter, but its results are cached in Solr's <code>filterCache</code>, so repeated queries with the same filter should be faster</li>
</ul>
<h2 id="20180918">2018-09-18</h2>
<h2 id="2018-09-18">2018-09-18</h2>
<ul>
<li>I managed to create a simple proof of concept REST API to expose item view and download statistics: <a href="https://github.com/alanorth/cgspace-statistics-api">cgspace-statistics-api</a></li>
<li>It uses the Python-based <a href="https://falcon.readthedocs.io">Falcon</a> web framework and talks to Solr directly using the <a href="https://github.com/moonlitesolutions/SolrClient">SolrClient</a> library (which seems to have issues in Python 3.7 currently)</li>
@ -439,12 +439,12 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
</code></pre><ul>
<li>The rest of the Falcon tooling will be more difficult&hellip;</li>
</ul>
<h2 id="20180919">2018-09-19</h2>
<h2 id="2018-09-19">2018-09-19</h2>
<ul>
<li>I emailed Jane Poole to ask if there is some money we can use from the Big Data Platform (BDP) to fund the purchase of some Atmire credits for CGSpace</li>
<li>I learned that there is an efficient way to do <a href="http://yonik.com/solr/paging-and-deep-paging/">&ldquo;deep paging&rdquo; in large Solr results sets by using <code>cursorMark</code></a>, but it doesn't work with faceting</li>
</ul>
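<ul>
<li>A sketch of what cursor-based paging looks like with httpie (cursor paging requires sorting on the core's uniqueKey, which I'm assuming is <code>uid</code> in our statistics schema; each response returns a <code>nextCursorMark</code> to pass in the next request):</li>
</ul>
<pre><code>$ http 'http://localhost:8081/solr/statistics/select?q=*:*&amp;rows=1000&amp;sort=uid+asc&amp;cursorMark=*'
</code></pre>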
<h2 id="20180920">2018-09-20</h2>
<h2 id="2018-09-20">2018-09-20</h2>
<ul>
<li>Contact Atmire to ask how we can buy more credits for future development</li>
<li>I researched the Solr <code>filterCache</code> size and I found out that the formula for calculating the potential memory use of <strong>each entry</strong> in the cache is:</li>
@ -460,7 +460,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
<li><a href="https://docs.google.com/document/d/1vl-nmlprSULvNZKQNrqp65eLnLhG9s_ydXQtg9iML10/edit">Article discussing testing methodology for different <code>filterCache</code> sizes</a></li>
<li>Discuss Handle links on Twitter with IWMI</li>
</ul>
<h2 id="20180921">2018-09-21</h2>
<h2 id="2018-09-21">2018-09-21</h2>
<ul>
<li>I see that there was a nice optimization to the ImageMagick PDF CMYK detection in the upstream <code>dspace-5_x</code> branch: <a href="https://github.com/DSpace/DSpace/pull/2204">DS-3664</a></li>
<li>The fix will go into DSpace 5.10, and we are currently on DSpace 5.8, but I think I'll cherry-pick that fix into our <code>5_x-prod</code> branch (see the sketch after this list):
@ -475,14 +475,14 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
</ul>
</li>
</ul>
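<ul>
<li>A sketch of the cherry-pick itself (assuming <code>upstream</code> is the DSpace remote; the commit reference is a placeholder for the DS-3664 fix):</li>
</ul>
<pre><code>$ git checkout 5_x-prod
$ git fetch upstream
$ git cherry-pick &lt;commit-of-DS-3664-fix&gt;
</code></pre>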
<h2 id="20190923">2019-09-23</h2>
<h2 id="2019-09-23">2019-09-23</h2>
<ul>
<li>I did more work on my <a href="https://github.com/alanorth/cgspace-statistics-api">cgspace-statistics-api</a>, fixing some item view counts and adding indexing via SQLite (I'm trying to avoid having to set up <em>yet another</em> database, user, password, etc) during deployment</li>
<li>I created a new branch called <code>5_x-upstream-cherry-picks</code> to test and track those cherry-picks from the upstream 5.x branch</li>
<li>Also, I need to test the new LDAP server, so I will deploy that on DSpace Test today</li>
<li>Rename my cgspace-statistics-api to <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> on GitHub</li>
</ul>
<h2 id="20180924">2018-09-24</h2>
<h2 id="2018-09-24">2018-09-24</h2>
<ul>
<li>Trying to figure out how to get item views and downloads from SQLite in a join</li>
<li>It appears SQLite doesn't support <code>FULL OUTER JOIN</code> so some people on StackOverflow have emulated it with <code>LEFT JOIN</code> and <code>UNION</code>:</li>
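<li>A minimal sketch of that emulation, with hypothetical <code>views</code> and <code>downloads</code> tables keyed on <code>id</code>:</li>
</ul>
<pre><code>SELECT v.id, v.views, d.downloads
FROM views v
LEFT JOIN downloads d ON v.id = d.id
UNION ALL
SELECT d.id, v.views, d.downloads
FROM downloads d
LEFT JOIN views v ON v.id = d.id
WHERE v.id IS NULL;
</code></pre>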
@ -539,7 +539,7 @@ $ createuser -h localhost -U postgres --pwprompt dspacestatistics
$ psql -h localhost -U postgres dspacestatistics
dspacestatistics=&gt; CREATE TABLE IF NOT EXISTS items
dspacestatistics-&gt; (id INT PRIMARY KEY, views INT DEFAULT 0, downloads INT DEFAULT 0)
</code></pre><h2 id="20180925">2018-09-25</h2>
</code></pre><h2 id="2018-09-25">2018-09-25</h2>
<ul>
<li>I deployed the DSpace statistics API on CGSpace, but when I ran the indexer it wanted to index 180,000 pages of item views</li>
<li>I'm not even sure how that's possible, as we only have 74,000 items!</li>
@ -586,7 +586,7 @@ Indexing item downloads (page 260 of 260)
</code></pre><ul>
<li>And now it's fast as hell due to the muuuuch smaller Solr statistics core</li>
</ul>
<h2 id="20180926">2018-09-26</h2>
<h2 id="2018-09-26">2018-09-26</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) was using 30Mb/sec of outward bandwidth for two hours around midnight</li>
<li>I don't see anything unusual in the nginx logs, so perhaps it was the cron job that syncs the Solr database to Amazon S3?</li>
@ -616,7 +616,7 @@ sys 2m18.485s
<li>I updated the dspace-statistics-api to use psycopg2's <code>execute_values()</code> to insert batches of 100 values into PostgreSQL instead of doing every insert individually</li>
<li>On CGSpace this reduces the total run time of <code>indexer.py</code> from 432 seconds to 400 seconds (most of the time is actually spent in getting the data from Solr though)</li>
</ul>
<h2 id="20180927">2018-09-27</h2>
<h2 id="2018-09-27">2018-09-27</h2>
<ul>
<li>Linode emailed to say that CGSpace's (linode19) CPU load was high for a few hours last night</li>
<li>Looking in the nginx logs around that time I see some new IPs that look like they are harvesting things:</li>
@ -645,7 +645,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=68.6.87.12' dspace.log.2018-09-26
<li>I will add their IPs to the list of bad bots in nginx so we can add a &ldquo;bot&rdquo; user agent to them and let Tomcat's Crawler Session Manager Valve handle them (a sketch of the nginx approach follows below)</li>
<li>I asked Atmire to prepare an invoice for 125 credits</li>
</ul>
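<ul>
<li>A sketch of the nginx side of that (the IP is one from today's logs; <code>geo</code> maps client addresses to a flag, and <code>map</code> swaps in a fake user agent that the valve's &ldquo;bot&rdquo; pattern will match):</li>
</ul>
<pre><code># in the http block:
geo $bad_bot {
    default     0;
    68.6.87.12  1;
}
map $bad_bot $ua {
    0    $http_user_agent;
    1    'bot';
}
# in the location block that proxies to Tomcat:
proxy_set_header User-Agent $ua;
</code></pre>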
<h2 id="20180929">2018-09-29</h2>
<h2 id="2018-09-29">2018-09-29</h2>
<ul>
<li>I merged some changes to author affiliations from Sisay as well as some corrections to organizational names using smart quotes like <code>Université d’Abomey Calavi</code> (<a href="https://github.com/ilri/DSpace/pull/388">#388</a>)</li>
<li>Peter sent me a list of 43 author names to fix, but it had some encoding errors like <code>Belalcázar, John</code> like usual (I will tell him to stop trying to export as UTF-8 because it never seems to work)</li>
@ -662,7 +662,7 @@ $ ./fix-metadata-values.py -i 2018-09-29-fix-authors.csv -db dspace -u dspace -p
<li>It seems to be Moayad trying to do the AReS explorer indexing</li>
<li>He was sending too many (5 or 10) concurrent requests to the server, but still&hellip; why is this shit so slow?!</li>
</ul>
<h2 id="20180930">2018-09-30</h2>
<h2 id="2018-09-30">2018-09-30</h2>
<ul>
<li>Valerio keeps sending items on CGSpace that have weird or incorrect languages, authors, etc</li>
<li>I think I should just batch export and update all languages&hellip;</li>

View File

@ -23,7 +23,7 @@ I created a GitHub issue to track this #389, because I&#39;m super busy in Nairo
Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
I created a GitHub issue to track this #389, because I&#39;m super busy in Nairobi right now
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -104,12 +104,12 @@ I created a GitHub issue to track this #389, because I&#39;m super busy in Nairo
</p>
</header>
<h2 id="20181001">2018-10-01</h2>
<h2 id="2018-10-01">2018-10-01</h2>
<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
</ul>
<h2 id="20181003">2018-10-03</h2>
<h2 id="2018-10-03">2018-10-03</h2>
<ul>
<li>I see Moayad was busy collecting item views and downloads from CGSpace yesterday:</li>
</ul>
@ -193,7 +193,7 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
&quot;Thornton, Philip K.&quot;,Philip Thornton: 0000-0002-1854-0182
&quot;Thornton, Phillip&quot;,Philip Thornton: 0000-0002-1854-0182
&quot;Thornton, Phillip K.&quot;,Philip Thornton: 0000-0002-1854-0182
</code></pre><h2 id="20181004">2018-10-04</h2>
</code></pre><h2 id="2018-10-04">2018-10-04</h2>
<ul>
<li>Salem raised an issue that the dspace-statistics-api reports downloads for some items that have no bitstreams (like many limited access items)</li>
<li>Every item has at least a <code>LICENSE</code> bundle, and some have a <code>THUMBNAIL</code> bundle, but the indexing code is specifically checking for downloads from the <code>ORIGINAL</code> bundle
@ -213,24 +213,24 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
<li>I found a logic error in the dspace-statistics-api <code>indexer.py</code> script that was causing item views to be inserted into downloads</li>
<li>I tagged version 0.4.2 of the tool and redeployed it on CGSpace</li>
</ul>
<h2 id="20181005">2018-10-05</h2>
<h2 id="2018-10-05">2018-10-05</h2>
<ul>
<li>Meet with Peter, Abenet, and Sisay to discuss CGSpace meeting in Nairobi and Sisay's work plan</li>
<li>We agreed that he would do monthly updates of the controlled vocabularies and generate a new one for the top 1,000 AGROVOC terms</li>
<li>Add a link to <a href="https://cgspace.cgiar.org/explorer/">AReS explorer</a> to the CGSpace homepage introduction text</li>
</ul>
<h2 id="20181006">2018-10-06</h2>
<h2 id="2018-10-06">2018-10-06</h2>
<ul>
<li>Follow up with AgriKnowledge about including Handle links (<code>dc.identifier.uri</code>) on their item pages</li>
<li>In July 2018 they had said their programmers would include the field in the next update of their website software</li>
<li><a href="https://repository.cimmyt.org/">CIMMYT's DSpace repository</a> is now running DSpace 5.x!</li>
<li>It's running OAI, but not REST, so I need to talk to Richard about that!</li>
</ul>
<h2 id="20181008">2018-10-08</h2>
<h2 id="2018-10-08">2018-10-08</h2>
<ul>
<li>AgriKnowledge says they're going to add the <code>dc.identifier.uri</code> to their item view in November when they update their website software</li>
</ul>
<h2 id="20181010">2018-10-10</h2>
<h2 id="2018-10-10">2018-10-10</h2>
<ul>
<li>Peter noticed that some recently added PDFs don't have thumbnails</li>
<li>When I tried to force them to be generated I got an error that I've never seen before:</li>
@ -249,7 +249,7 @@ org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.c
<li>This works, but I'm not sure what ImageMagick's long-term plan is if they are going to disable ALL image formats&hellip;</li>
<li>I suppose I need to enable a workaround for this in Ansible?</li>
</ul>
<h2 id="20181011">2018-10-11</h2>
<h2 id="2018-10-11">2018-10-11</h2>
<ul>
<li>I emailed DuraSpace to update <a href="https://duraspace.org/registry/entry/4188/?gvid=178">our entry in their DSpace registry</a> (the data was still on DSpace 3, JSPUI, etc)</li>
<li>Generate a list of the top 1500 values for <code>dc.subject</code> so Sisay can start making a controlled vocabulary for it:</li>
@ -288,7 +288,7 @@ COPY 10000
<li>CTA uploaded some infographics that are very tall and their thumbnails disrupt the item lists on the front page and in their communities and collections</li>
<li>I decided to constrain the max height of these to 200px using CSS (<a href="https://github.com/ilri/DSpace/pull/392">#392</a>)</li>
</ul>
<h2 id="20181013">2018-10-13</h2>
<h2 id="2018-10-13">2018-10-13</h2>
<ul>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li>Look through Peter's list of 746 author corrections in OpenRefine</li>
@ -308,7 +308,7 @@ COPY 10000
</code></pre><ul>
<li>I will apply these on CGSpace when I do the other updates tomorrow, as well as double check the high scoring ones to see if they are correct in Sisay's author controlled vocabulary</li>
</ul>
<h2 id="20181014">2018-10-14</h2>
<h2 id="2018-10-14">2018-10-14</h2>
<ul>
<li>Merge the authors controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/393">#393</a>), usage rights (<a href="https://github.com/ilri/DSpace/pull/394">#394</a>), and the upstream DSpace 5.x cherry-picks (<a href="https://github.com/ilri/DSpace/pull/395">#395</a>) into our <code>5_x-prod</code> branch</li>
<li>Switch to new CGIAR LDAP server on CGSpace, as it's been running (at least for authentication) on DSpace Test for the last few weeks, and I think the old one will be deprecated soon (today?)</li>
@ -330,7 +330,7 @@ COPY 10000
</li>
<li>I limited the tall thumbnails even further to 170px because Peter said CTA's were still too tall at 200px (<a href="https://github.com/ilri/DSpace/pull/396">#396</a>)</li>
</ul>
<h2 id="20181015">2018-10-15</h2>
<h2 id="2018-10-15">2018-10-15</h2>
<ul>
<li>Tomcat on DSpace Test (linode19) has somehow stopped running all the DSpace applications</li>
<li>I don't see anything in the Catalina logs or <code>dmesg</code>, and the Tomcat manager shows XMLUI, REST, OAI, etc all &ldquo;Running: false&rdquo;</li>
@ -353,7 +353,7 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2018-10-11.backup
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
</code></pre><h2 id="20181016">2018-10-16</h2>
</code></pre><h2 id="2018-10-16">2018-10-16</h2>
<ul>
<li>Generate a list of the schema on CGSpace so CodeObia can compare with MELSpace:</li>
</ul>
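<ul>
<li>A sketch of the registry query for that (DSpace 5.x table and column names):</li>
</ul>
<pre><code>dspace=# \COPY (SELECT ms.short_id || '.' || mfr.element || COALESCE('.' || mfr.qualifier, '') AS field FROM metadatafieldregistry mfr JOIN metadataschemaregistry ms ON mfr.metadata_schema_id = ms.metadata_schema_id ORDER BY field) to /tmp/cgspace-schema.csv WITH CSV HEADER;
</code></pre>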
@ -401,7 +401,7 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
</code></pre><ul>
<li>I sent a mail to dspace-tech to ask how to profile this&hellip;</li>
</ul>
<h2 id="20181017">2018-10-17</h2>
<h2 id="2018-10-17">2018-10-17</h2>
<ul>
<li>I decided to update most of the existing metadata values that we have in <code>dc.rights</code> on CGSpace to be machine readable in SPDX format (with Creative Commons version if it was included)</li>
<li>Most of them are from Bioversity, and I asked Maria for permission before updating them</li>
@ -444,7 +444,7 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
<li>I made a pull request and merged the ORCID updates into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/397">#397</a>)</li>
<li>Improve the logic of name checking in my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script</li>
</ul>
<h2 id="20181018">2018-10-18</h2>
<h2 id="2018-10-18">2018-10-18</h2>
<ul>
<li>I granted MEL's deposit user admin access to IITA, CIP, Bioversity, and RTB communities on DSpace Test so they can start testing real depositing</li>
<li>After they do some tests and we check the values Enrico will send a formal email to Peter et al to ask that they start depositing officially</li>
@ -455,7 +455,7 @@ $ /usr/lib/postgresql/9.6/bin/pg_upgrade -b /usr/lib/postgresql/9.5/bin -B /usr/
$ exit
# systemctl start postgresql
# dpkg -r postgresql-9.5 postgresql-client-9.5 postgresql-contrib-9.5
</code></pre><h2 id="20181019">2018-10-19</h2>
</code></pre><h2 id="2018-10-19">2018-10-19</h2>
<ul>
<li>Help Francesca from Bioversity generate a report about items they uploaded in 2015 through 2018</li>
<li>Linode emailed me to say that CGSpace (linode18) had high CPU usage for a few hours this afternoon</li>
@ -475,7 +475,7 @@ $ exit
</code></pre><ul>
<li>5.9.6.51 is MegaIndex, which I've seen before&hellip;</li>
</ul>
<h2 id="20181020">2018-10-20</h2>
<h2 id="2018-10-20">2018-10-20</h2>
<ul>
<li>I was going to try to run Solr in Docker because I learned I can run Docker on Travis-CI (for testing my dspace-statistics-api), but the oldest official Solr images are for 5.5, and DSpace's Solr configuration is for 4.9</li>
<li>This means our existing Solr configuration doesn't run in Solr 5.5:</li>
@ -522,11 +522,11 @@ ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics]
</code></pre><ul>
<li>So I'm not sure why this bot uses so many sessions&hellip; is it because it requests very slowly?</li>
</ul>
<h2 id="20181021">2018-10-21</h2>
<h2 id="2018-10-21">2018-10-21</h2>
<ul>
<li>Discuss AfricaRice joining CGSpace</li>
</ul>
<h2 id="20181022">2018-10-22</h2>
<h2 id="2018-10-22">2018-10-22</h2>
<ul>
<li>Post message to Yammer about usage rights (dc.rights)</li>
<li>Change <code>build.properties</code> to use HTTPS for Handles in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li>
@ -546,7 +546,7 @@ UPDATE 76608
<li>Skype with Peter about ToRs for the AReS open source work and future plans to develop tools around the DSpace ecosystem</li>
<li>Help CGSpace users with some issues related to usage rights</li>
</ul>
<h2 id="20181023">2018-10-23</h2>
<h2 id="2018-10-23">2018-10-23</h2>
<ul>
<li>Improve the usage rights (dc.rights) on CGSpace again by adding the long names in the submission form, as well as adding version 3.0 and Creative Commons Zero (CC0) public domain license (<a href="https://github.com/ilri/DSpace/pull/399">#399</a>)</li>
<li>Add &ldquo;usage rights&rdquo; to the XMLUI item display (<a href="https://github.com/ilri/DSpace/pull/400">#400</a>)</li>
@ -571,14 +571,14 @@ $ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: app
<li>Improve the documentation of my <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a></li>
<li>Email Modi and Jayashree from ICRISAT to ask if they want to join CGSpace as partners</li>
</ul>
<h2 id="20181024">2018-10-24</h2>
<h2 id="2018-10-24">2018-10-24</h2>
<ul>
<li>I deployed the new Creative Commons choices to the usage rights on the CGSpace submission form</li>
<li>Also, I deployed the changes to show usage rights on the item view</li>
<li>Re-work the <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> to use Python's native json instead of ujson to make it easier to deploy in places where we don't have (or don't want to have) Python headers and a compiler (like containers)</li>
<li>Re-work the deployment of the API to use systemd's <code>EnvironmentFile</code> to read the environment variables instead of <code>Environment</code> in the <a href="https://github.com/ilri/rmg-ansible-public">RMG Ansible infrastructure scripts</a></li>
</ul>
<h2 id="20181025">2018-10-25</h2>
<h2 id="2018-10-25">2018-10-25</h2>
<ul>
<li>Send Peter and Jane a list of technical ToRs for AReS open source work:</li>
<li>Basic version of AReS that works with metadata fields present in default DSpace 5.x/6.x (for example author, date issued, type, subjects)
@ -595,7 +595,7 @@ $ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: app
</li>
<li>Maria asked if we can add publisher (<code>dc.publisher</code>) to the advanced search filters, so I created a <a href="https://github.com/ilri/DSpace/issues/401">GitHub issue</a> to track it</li>
</ul>
<h2 id="20181028">2018-10-28</h2>
<h2 id="2018-10-28">2018-10-28</h2>
<ul>
<li>I forked the <a href="https://github.com/alanorth/SolrClient/tree/kazoo-2.5.0">SolrClient library and updated its kazoo dependency to be version 2.5.0</a> so we stop getting errors about &ldquo;async&rdquo; being a reserved keyword in Python 3.7</li>
<li>Then I re-generated the <code>requirements.txt</code> in the <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> and released version 0.5.2</li>
@ -606,12 +606,12 @@ $ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: app
<li>I merged the changes for adding versionless Creative Commons licenses to the submission form to the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/403">#403</a>)</li>
<li>I will deploy them later this week</li>
</ul>
<h2 id="20181029">2018-10-29</h2>
<h2 id="2018-10-29">2018-10-29</h2>
<ul>
<li>I deployed the publisher and Creative Commons changes to CGSpace, ran all system updates, and rebooted the server</li>
<li>I sent the email to Jane Poole and ILRI ICT and Finance to start the admin process of getting a new Linode server for AReS</li>
</ul>
<h2 id="20181030">2018-10-30</h2>
<h2 id="2018-10-30">2018-10-30</h2>
<ul>
<li>Meet with the COPO guys to walk them through the CGSpace submission workflow and discuss CG core, REST API, etc
<ul>
@ -621,7 +621,7 @@ $ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: app
</ul>
</li>
</ul>
<h2 id="20181031">2018-10-31</h2>
<h2 id="2018-10-31">2018-10-31</h2>
<ul>
<li>More discussion and planning for AReS open sourcing and the Amman meeting in 2019-01</li>
<li>I did some work to clean up and improve the dspace-statistics-api README.md and project structure and <a href="https://github.com/ilri/dspace-statistics-api">moved it to the ILRI organization on GitHub</a></li>

View File

@ -33,7 +33,7 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
Today these are the top 10 IPs:
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -114,12 +114,12 @@ Today these are the top 10 IPs:
</p>
</header>
<h2 id="20181101">2018-11-01</h2>
<h2 id="2018-11-01">2018-11-01</h2>
<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
@ -218,7 +218,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03
<li>I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later</li>
<li>Also, this is the third (?) time a mysterious IP on Hetzner has done this&hellip; who is this?</li>
</ul>
<h2 id="20181104">2018-11-04</h2>
<h2 id="2018-11-04">2018-11-04</h2>
<ul>
<li>Forward Peter's information about CGSpace financials to Modi from ICRISAT</li>
<li>Linode emailed about the CPU load and outgoing bandwidth on CGSpace (linode18) again</li>
@ -313,7 +313,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
<li>I added the &ldquo;most-popular&rdquo; pages to the list that return <code>X-Robots-Tag: none</code> to try to inform bots not to index or follow those pages</li>
<li>Also, I implemented an nginx rate limit of twelve requests per minute on all dynamic pages&hellip; I figure a human user might legitimately request one every five seconds</li>
</ul>
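<ul>
<li>The rate limit uses nginx's <code>limit_req</code> module, roughly like this sketch (the zone name, size, and burst value are illustrative):</li>
</ul>
<pre><code># in the http block:
limit_req_zone $binary_remote_addr zone=dynamicpages:16m rate=12r/m;

# in the location block matching the dynamic pages:
limit_req zone=dynamicpages burst=5;
</code></pre>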
<h2 id="20181105">2018-11-05</h2>
<h2 id="2018-11-05">2018-11-05</h2>
<ul>
<li>I wrote a small Python script <a href="https://gist.github.com/alanorth/4ff81d5f65613814a66cb6f84fdf1fc5">add-dc-rights.py</a> to add usage rights (<code>dc.rights</code>) to CGSpace items based on the CSV Hector gave me from MARLO:</li>
</ul>
@ -336,7 +336,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
<li>29,000 requests from Facebook and none of the requests are to the dynamic pages I rate limited yesterday!</li>
<li>At least the Tomcat Crawler Session Manager Valve is working now&hellip;</li>
</ul>
<h2 id="20181106">2018-11-06</h2>
<h2 id="2018-11-06">2018-11-06</h2>
<ul>
<li>I updated all the <a href="https://github.com/ilri/DSpace/wiki/Scripts">DSpace helper Python scripts</a> to validate against PEP 8 using Flake8</li>
<li>While I was updating the <a href="https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50">rest-find-collections.py</a> script I noticed it was using <code>expand=all</code> to get the collection and community IDs</li>
@ -346,12 +346,12 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
</code></pre><ul>
<li>Average time with all expands was 14.3 seconds, and 12.8 seconds with <code>collections,subCommunities</code>, so <strong>1.5 seconds difference</strong>!</li>
</ul>
<h2 id="20181107">2018-11-07</h2>
<h2 id="2018-11-07">2018-11-07</h2>
<ul>
<li>Update my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to use a database management class with Python contexts so that connections and cursors are automatically opened and closed</li>
<li>Tag version 0.7.0 of the dspace-statistics-api</li>
</ul>
<h2 id="20181108">2018-11-08</h2>
<h2 id="2018-11-08">2018-11-08</h2>
<ul>
<li>I deployed version 0.7.0 of the dspace-statistics-api on DSpace Test (linode19) so I can test it for a few days (and check the Munin stats to see the change in database connections) before deploying on CGSpace</li>
<li>I also enabled systemd's persistent journal by setting <a href="https://www.freedesktop.org/software/systemd/man/journald.conf.html"><code>Storage=persistent</code> in <em>journald.conf</em></a> (see the snippet below)</li>
@ -362,12 +362,12 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
</ul>
</li>
</ul>
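<ul>
<li>For reference, the journald change is just this snippet, followed by a restart of <code>systemd-journald</code>:</li>
</ul>
<pre><code># /etc/systemd/journald.conf
[Journal]
Storage=persistent
</code></pre>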
<h2 id="20181111">2018-11-11</h2>
<h2 id="2018-11-11">2018-11-11</h2>
<ul>
<li>I added tests to the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a>!</li>
<li>It runs with Python 3.5, 3.6, and 3.7 using pytest, including automatically on Travis CI!</li>
</ul>
<h2 id="20181113">2018-11-13</h2>
<h2 id="2018-11-13">2018-11-13</h2>
<ul>
<li>Help troubleshoot an issue with Judy Kimani submitting to the <a href="https://cgspace.cgiar.org/handle/10568/78">ILRI project reports, papers and documents</a> collection on CGSpace</li>
<li>For some reason there is an existing group for the &ldquo;Accept/Reject&rdquo; workflow step, but it's empty</li>
@ -377,21 +377,21 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
<li>As for the collection mappings I think I need to export the CSV from DSpace Test, add mappings for each type (ie Books go to IITA books collection, etc), then re-import to DSpace Test, then export from DSpace command line in &ldquo;migrate&rdquo; mode&hellip;</li>
<li>From there I should be able to script the removal of the old DSpace Test collection so they just go to the correct IITA collections on import into CGSpace</li>
</ul>
<h2 id="20181114">2018-11-14</h2>
<h2 id="2018-11-14">2018-11-14</h2>
<ul>
<li>Finally import the 277 IITA (ALIZZY1802) records to CGSpace</li>
<li>I had to export them from DSpace Test and import them into a temporary collection on CGSpace first, then export the collection as CSV to map them to new owning collections (IITA books, IITA posters, etc) with OpenRefine because DSpace's <code>dspace export</code> command doesn't include the collections for the items!</li>
<li>Delete all old IITA collections on DSpace Test and run <code>dspace cleanup</code> to get rid of all the bitstreams</li>
</ul>
<h2 id="20181115">2018-11-15</h2>
<h2 id="2018-11-15">2018-11-15</h2>
<ul>
<li>Deploy version 0.8.1 of the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to CGSpace (linode18)</li>
</ul>
<h2 id="20181118">2018-11-18</h2>
<h2 id="2018-11-18">2018-11-18</h2>
<ul>
<li>Request invoice from Wild Jordan for their meeting venue in January</li>
</ul>
<h2 id="20181119">2018-11-19</h2>
<h2 id="2018-11-19">2018-11-19</h2>
<ul>
<li>Testing corrections and deletions for AGROVOC (<code>dc.subject</code>) that Sisay and Peter were working on earlier this month:</li>
</ul>
@ -405,7 +405,7 @@ $ ./delete-metadata-values.py -i 2018-11-19-delete-agrovoc.csv -f dc.subject -m
<li>Generate a new list of the top 1500 AGROVOC subjects on CGSpace to send to Peter and Sisay:</li>
</ul>
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-11-19-top-1500-subject.csv WITH CSV HEADER;
</code></pre><h2 id="20181120">2018-11-20</h2>
</code></pre><h2 id="2018-11-20">2018-11-20</h2>
<ul>
<li>The Discovery re-indexing on CGSpace never finished yesterday&hellip; the command died after six minutes</li>
<li>The <code>dspace.log.2018-11-19</code> shows this at the time:</li>
@ -432,7 +432,7 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
<ul>
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81592">Restoring Degraded Landscapes collection</a></li>
<li>a few items missing DOIs, but they are easily available on the publication page</li>
<li>clean up DOIs to use &ldquo;<a href="https://doi.org">https://doi.org</a>&rdquo; format</li>
<li>clean up DOIs to use &ldquo;<a href="https://doi.org%22">https://doi.org&quot;</a> format</li>
<li>clean up some cg.identifier.url to remove unnecessary query strings</li>
<li>remove columns with no metadata (river basin, place, target audience, isbn, uri, publisher, ispartofseries, subject)</li>
<li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li>
@ -446,16 +446,16 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
<li>these items will go to the <a href="https://dspacetest.cgiar.org/handle/10568/81589">Variability, Risks and Competing Uses collection</a></li>
<li>trim and collapse whitespace in all fields (lots in WLE subject!)</li>
<li>clean up some cg.identifier.url fields that had unnecessary anchors in their links</li>
<li>clean up DOIs to use &ldquo;<a href="https://doi.org">https://doi.org</a>&rdquo; format</li>
<li>clean up DOIs to use &ldquo;<a href="https://doi.org%22">https://doi.org&quot;</a> format</li>
<li>fix column with invalid spaces in metadata field name (cg. subject. wle)</li>
<li>remove columns with no metadata (place, target audience, isbn, uri, publisher, ispartofseries, subject)</li>
<li>remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine: <code>value.replace('�','')</code></li>
<li>I notice a few items using DOIs pointing at ICARDA's DSpace like: <a href="https://doi.org/20.500.11766/8178">https://doi.org/20.500.11766/8178</a>, which then points at the &ldquo;real&rdquo; DOI on the publisher's site&hellip; these should be using the real DOI instead of ICARDA's &ldquo;fake&rdquo; Handle DOI</li>
<li>I notice a few items using DOIs pointing at ICARDA's DSpace like: <a href="https://doi.org/20.500.11766/8178,">https://doi.org/20.500.11766/8178,</a> which then points at the &ldquo;real&rdquo; DOI on the publisher's site&hellip; these should be using the real DOI instead of ICARDA's &ldquo;fake&rdquo; Handle DOI</li>
<li>Some items missing DOIs, but they clearly have them if you look at the publisher's site</li>
</ul>
</li>
</ul>
<h2 id="20181122">2018-11-22</h2>
<h2 id="2018-11-22">2018-11-22</h2>
<ul>
<li>Tezira is having problems submitting to the <a href="https://cgspace.cgiar.org/handle/10568/24452">ILRI brochures</a> collection for some reason
<ul>
@ -466,7 +466,7 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
</ul>
</li>
</ul>
<h2 id="20181126">2018-11-26</h2>
<h2 id="2018-11-26">2018-11-26</h2>
<ul>
<li><a href="https://cgspace.cgiar.org/handle/10568/97709">This WLE item</a> is issued on 2018-10 and accessioned on 2018-10-22 but does not show up in the <a href="https://cgspace.cgiar.org/handle/10568/41888">WLE R4D Learning Series</a> collection on CGSpace for some reason, and therefore does not show up on the WLE publication website</li>
<li>I tried to remove that collection from Discovery and do a simple re-index:</li>
@ -484,7 +484,7 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery
<li>More work on the AReS terms of reference for CodeObia</li>
<li>Erica from AgriKnowledge emailed me to say that they have implemented the changes in their item page UI so that they include the permanent identifier on items harvested from CGSpace, for example: <a href="https://www.agriknowledge.org/concern/generics/wd375w33s">https://www.agriknowledge.org/concern/generics/wd375w33s</a></li>
</ul>
<h2 id="20181127">2018-11-27</h2>
<h2 id="2018-11-27">2018-11-27</h2>
<ul>
<li>Linode alerted me that the outbound traffic rate on CGSpace (linode19) was very high</li>
<li>The top users this morning are:</li>
@ -519,7 +519,7 @@ $ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
</li>
<li>Help Marianne troubleshoot some issue with items in their WLE collections and the WLE publications website</li>
</ul>
<h2 id="20181128">2018-11-28</h2>
<h2 id="2018-11-28">2018-11-28</h2>
<ul>
<li>Change the usage rights text a bit based on Maria Garruccio's feedback on &ldquo;all rights reserved&rdquo; (<a href="https://github.com/ilri/DSpace/pull/404">#404</a>)</li>
<li>Run all system updates on DSpace Test (linode19) and reboot the server</li>

View File

@ -33,7 +33,7 @@ Then I ran all system updates and restarted the server
I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -114,13 +114,13 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
</p>
</header>
<h2 id="20181201">2018-12-01</h2>
<h2 id="2018-12-01">2018-12-01</h2>
<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>
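<ul>
<li>A sketch of that kind of swap (the exact package names are assumptions; the Oracle JDK here is the webupd8team PPA packaging):</li>
</ul>
<pre><code># apt install openjdk-8-jdk-headless
# apt remove oracle-java8-installer oracle-java8-set-default
# update-alternatives --config java
</code></pre>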
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
@ -182,7 +182,7 @@ DEBUG: FC_WEIGHT didn't match
isNotNull(value.match(/.*\u00b4.*/)),
isNotNull(value.match(/.*\u007e.*/))
)
</code></pre><h2 id="20181203">2018-12-03</h2>
</code></pre><h2 id="2018-12-03">2018-12-03</h2>
<ul>
<li>I looked at the DSpace Ghostscript issue more and it seems to only affect certain PDFs&hellip;</li>
<li>I can successfully generate a thumbnail for another recent item (<a href="https://hdl.handle.net/10568/98394">10568/98394</a>), but not for <a href="https://hdl.handle.net/10568/98390">10568/98930</a></li>
@ -308,7 +308,7 @@ $ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
</code></pre><ul>
<li>This has got to be part Ubuntu Tomcat packaging, and part DSpace 5.x Tomcat 8.5 readiness&hellip;?</li>
</ul>
<h2 id="20181204">2018-12-04</h2>
<h2 id="2018-12-04">2018-12-04</h2>
<ul>
<li>Last night Linode sent a message that the load on CGSpace (linode18) was too high, here's a list of the top users at the time and throughout the day:</li>
</ul>
@ -368,11 +368,11 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03
<li>In other news, it's good to see my re-work of the database connectivity in the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> actually caused a reduction of persistent database connections (from 1 to 0, but still!):</li>
</ul>
<p><img src="/cgspace-notes/2018/12/postgres_connections_db-month.png" alt="PostgreSQL connections day"></p>
<h2 id="20181205">2018-12-05</h2>
<h2 id="2018-12-05">2018-12-05</h2>
<ul>
<li>Discuss RSS issues with IWMI and WLE people</li>
</ul>
<h2 id="20181206">2018-12-06</h2>
<h2 id="2018-12-06">2018-12-06</h2>
<ul>
<li>Linode sent a message that the CPU usage of CGSpace (linode18) is too high last night</li>
<li>I looked in the logs and there's nothing particular going on:</li>
@ -404,7 +404,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
<li>It seems they are hitting the XMLUI's OpenSearch a bit, but mostly on the REST API so no issues here yet</li>
<li><code>Drupal</code> is already in the Tomcat Crawler Session Manager Valve's regex so that's good!</li>
</ul>
<h2 id="20181210">2018-12-10</h2>
<h2 id="2018-12-10">2018-12-10</h2>
<ul>
<li>I ran into Mia Signs in Addis and we discussed Altmetric as well as RSS feeds again
<ul>
@ -417,7 +417,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
</ul>
</li>
</ul>
<h2 id="20181211">2018-12-11</h2>
<h2 id="2018-12-11">2018-12-11</h2>
<ul>
<li>I checked the <a href="https://twitter.com/mralanorth/status/1072198292182892545">latest tweet of the IWMI item with a DOI</a> and it was <a href="https://cgspace.altmetric.com/details/50160871/twitter">picked up by Altmetric</a>
<ul>
@ -426,14 +426,14 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
</ul>
</li>
</ul>
<h2 id="20181213">2018-12-13</h2>
<h2 id="2018-12-13">2018-12-13</h2>
<ul>
<li>Oh this is very interesting: <a href="https://digitalarchive.worldfishcenter.org">WorldFish's repository is live now</a></li>
<li>It's running DSpace 5.9-SNAPSHOT on KnowledgeArc, and the OAI and REST interfaces are active at least</li>
<li>Also, I notice they ended up registering a Handle (they had been considering taking KnowledgeArc's advice to <em>not</em> use Handles!)</li>
<li>Did some coordination work on the hotel bookings for the January AReS workshop in Amman</li>
</ul>
<h2 id="20181217">2018-12-17</h2>
<h2 id="2018-12-17">2018-12-17</h2>
<ul>
<li>Linode alerted me twice today that the load on CGSpace (linode18) was very high</li>
<li>Looking at the nginx logs I see a few new IPs in the top 10:</li>
@ -457,15 +457,15 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
<li>I see that I added this bot to the Tomcat Crawler Session Manager valve in 2017-12 so its XMLUI sessions are getting re-used</li>
<li><code>2a01:4f8:173:1e85::2</code> is some new bot called <code>BLEXBot/1.0</code> which should be matching the existing &ldquo;bot&rdquo; pattern in the Tomcat Crawler Session Manager regex</li>
</ul>
<h2 id="20181218">2018-12-18</h2>
<h2 id="2018-12-18">2018-12-18</h2>
<ul>
<li>Open a <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=657">ticket</a> with Atmire to ask them to prepare the Metadata Quality Module for our DSpace 5.8 code</li>
</ul>
<h2 id="20181219">2018-12-19</h2>
<h2 id="2018-12-19">2018-12-19</h2>
<ul>
<li>Update Atmire Listings and Reports to add the journal title (<code>dc.source</code>) to bibliography and update example bibliography values (<a href="https://github.com/ilri/DSpace/pull/405">#405</a>)</li>
</ul>
<h2 id="20181220">2018-12-20</h2>
<h2 id="2018-12-20">2018-12-20</h2>
<ul>
<li>Testing compression of PostgreSQL backups with xz and gzip:</li>
</ul>
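<ul>
<li>The comparison is just timing both compressors on the same dump, something like this sketch (the backup filename is hypothetical):</li>
</ul>
<pre><code>$ time gzip -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.gz
$ time xz -c cgspace_2018-12-19.backup &gt; cgspace_2018-12-19.backup.xz
$ ls -lh cgspace_2018-12-19.backup*
</code></pre>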
@ -531,7 +531,7 @@ UPDATE 1
</code></pre><ul>
<li>After all that I started a full Discovery reindex to get the index name changes and rights updates</li>
</ul>
<h2 id="20181229">2018-12-29</h2>
<h2 id="2018-12-29">2018-12-29</h2>
<ul>
<li>CGSpace went down today for a few minutes while I was at dinner and I quickly restarted Tomcat</li>
<li>The top IP addresses as of this evening are:</li>

View File

@ -47,7 +47,7 @@ I don&#39;t see anything interesting in the web server logs around that time tho
357 207.46.13.1
903 54.70.40.11
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -128,7 +128,7 @@ I don&#39;t see anything interesting in the web server logs around that time tho
</p>
</header>
<h2 id="20190102">2019-01-02</h2>
<h2 id="2019-01-02">2019-01-02</h2>
<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don't see anything interesting in the web server logs around that time though:</li>
@ -173,7 +173,7 @@ Moving: 18497180 into core statistics-2018
<li>This could be why the outbound traffic rate was high, due to the S3 backup that ran at 3:30AM&hellip;</li>
<li>Run all system updates on DSpace Test (linode19) and reboot the server</li>
</ul>
<h2 id="20190103">2019-01-03</h2>
<h2 id="2019-01-03">2019-01-03</h2>
<ul>
<li>Update local Docker image for DSpace PostgreSQL, re-using the existing data volume:</li>
</ul>
@ -271,7 +271,7 @@ org.apache.jasper.JasperException: /home.jsp (line: [214], column: [1]) /discove
</li>
<li>I sent a message to the dspace-tech mailing list to ask</li>
</ul>
<h2 id="20190104">2019-01-04</h2>
<h2 id="2019-01-04">2019-01-04</h2>
<ul>
<li>Linode sent a message last night that CGSpace (linode18) had high CPU usage, but I don't see anything around that time in the web server logs:</li>
</ul>
@ -403,7 +403,7 @@ In [14]: for row in result.fetchone():
</code></pre><ul>
<li>The SPARQL query comes from my notes in <a href="/cgspace-notes/2017-08/">2017-08</a></li>
</ul>
<h2 id="20190106">2019-01-06</h2>
<h2 id="2019-01-06">2019-01-06</h2>
<ul>
<li>I built a clean DSpace 5.8 installation from the upstream <code>dspace-5.8</code> tag and the issue with the XMLUI/JSPUI login is still there with Tomcat 8.5.37
<ul>
@ -413,7 +413,7 @@ In [14]: for row in result.fetchone():
</ul>
</li>
</ul>
<h2 id="20190107">2019-01-07</h2>
<h2 id="2019-01-07">2019-01-07</h2>
<ul>
<li>I built a clean DSpace 6.3 installation from the upstream <code>dspace-6.3</code> tag and the issue with the XMLUI/JSPUI login is still there with Tomcat 8.5.37
<ul>
@ -423,7 +423,7 @@ In [14]: for row in result.fetchone():
</ul>
</li>
</ul>
<h2 id="20190108">2019-01-08</h2>
<h2 id="2019-01-08">2019-01-08</h2>
<ul>
<li>Tim Donohue responded to my thread about the cookies on the dspace-tech mailing list
<ul>
@ -433,7 +433,7 @@ In [14]: for row in result.fetchone():
</ul>
</li>
</ul>
<h2 id="20190111">2019-01-11</h2>
<h2 id="2019-01-11">2019-01-11</h2>
<ul>
<li>Tezira wrote to say she has stopped receiving the <code>DSpace Submission Approved and Archived</code> emails from CGSpace as of January 2nd
<ul>
@ -442,11 +442,11 @@ In [14]: for row in result.fetchone():
</ul>
</li>
</ul>
<h2 id="20190114">2019-01-14</h2>
<h2 id="2019-01-14">2019-01-14</h2>
<ul>
<li>Day one of CGSpace AReS meeting in Amman</li>
</ul>
<h2 id="20190115">2019-01-15</h2>
<h2 id="2019-01-15">2019-01-15</h2>
<ul>
<li>Day two of CGSpace AReS meeting in Amman
<ul>
@ -477,7 +477,7 @@ In [14]: for row in result.fetchone():
1211 35.237.175.180
1830 66.249.64.155
2482 45.5.186.2
</code></pre><h2 id="20190116">2019-01-16</h2>
</code></pre><h2 id="2019-01-16">2019-01-16</h2>
<ul>
<li>Day three of CGSpace AReS meeting in Amman
<ul>
@ -719,7 +719,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
real 0m17.161s
user 0m16.205s
sys 0m2.396s
</code></pre><h2 id="20190117">2019-01-17</h2>
</code></pre><h2 id="2019-01-17">2019-01-17</h2>
<ul>
<li>Send reminder to Atmire about purchasing the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=657">MQM module</a></li>
<li>Trying to decide the solid action points for CGSpace on the CG Core 2.0 metadata&hellip;</li>
@ -758,7 +758,7 @@ sys 0m2.396s
</ul>
</li>
</ul>
<h2 id="20190119">2019-01-19</h2>
<h2 id="2019-01-19">2019-01-19</h2>
<ul>
<li>
<p>There's no official set of Dublin Core qualifiers so I can't tell if things like <code>dc.contributor.author</code> that are used by DSpace are official</p>
@ -774,7 +774,7 @@ sys 0m2.396s
<p>These terms conform with the DCMI Abstract Model and may be used in DCMI application profiles. DCMI endorses their use with Dublin Core elements as indicated.</p>
</li>
</ul>
<h2 id="20190120">2019-01-20</h2>
<h2 id="2019-01-20">2019-01-20</h2>
<ul>
<li>That's weird, I logged into DSpace Test (linode19) and it says it has been up for 213 days:</li>
</ul>
@ -790,7 +790,7 @@ sys 0m2.396s
<li>The query currently shows 3023 items, but a <a href="https://cgspace.cgiar.org/discover?filtertype_1=crpsubject&amp;filter_relational_operator_1=equals&amp;filter_1=Livestock&amp;submit_apply_filter=&amp;query=">Discovery search for Livestock CRP only returns 858 items</a></li>
<li>That query seems to return items tagged with <code>Livestock and Fish</code> CRP as well&hellip; hmm.</li>
</ul>
<h2 id="20190121">2019-01-21</h2>
<h2 id="2019-01-21">2019-01-21</h2>
<ul>
<li>Investigating running Tomcat 7 on Ubuntu 18.04 with the tarball and a custom systemd package instead of waiting for our DSpace to get compatible with Ubuntu 18.04's Tomcat 8.5</li>
<li>I could either run with a simple <code>tomcat7.service</code> like this:</li>
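<li>A minimal sketch of such a unit (the <code>CATALINA_HOME</code> path and <code>dspace</code> user are assumptions):</li>
</ul>
<pre><code>[Unit]
Description=Apache Tomcat 7
After=network.target

[Service]
Type=forking
User=dspace
Environment=CATALINA_HOME=/opt/tomcat7
ExecStart=/opt/tomcat7/bin/startup.sh
ExecStop=/opt/tomcat7/bin/shutdown.sh

[Install]
WantedBy=multi-user.target
</code></pre>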
@ -909,7 +909,7 @@ $ http 'http://localhost:3000/solr/statistics/select?&amp;shards=localhost:8081/
&lt;result name=&quot;response&quot; numFound=&quot;275&quot; start=&quot;0&quot; maxScore=&quot;12.205825&quot;&gt;
$ http 'http://localhost:8081/solr/statistics/select?indent=on&amp;rows=0&amp;q=type:2+id:11576&amp;fq=isBot:false&amp;fq=statistics_type:view&amp;shards=localhost:8081/solr/statistics-2018' | grep numFound
&lt;result name=&quot;response&quot; numFound=&quot;241&quot; start=&quot;0&quot; maxScore=&quot;12.205825&quot;&gt;
</code></pre><h2 id="20190122">2019-01-22</h2>
</code></pre><h2 id="2019-01-22">2019-01-22</h2>
<ul>
<li>Release <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v0.9.0">version 0.9.0 of the dspace-statistics-api</a> to address the issue of querying multiple Solr statistics shards</li>
<li>I deployed it on DSpace Test (linode19) and restarted the indexer and now it shows all the stats from 2018 as well (756 pages of views, instead of 6)</li>
@ -937,7 +937,7 @@ $ http 'http://localhost:8081/solr/statistics/select?indent=on&amp;rows=0&amp;q=
<li>Another interesting one is 154.113.73.30, which is apparently at IITA Nigeria and uses the user agent:</li>
</ul>
<pre><code>Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36
</code></pre><h2 id="20190123">2019-01-23</h2>
</code></pre><h2 id="2019-01-23">2019-01-23</h2>
<ul>
<li>Peter noticed that some goo.gl links in our tweets from Feedburner are broken, for example this one from last week:</li>
</ul>
@ -1019,7 +1019,7 @@ $ schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace fi
<li>I think this Launchpad discussion is relevant: <a href="https://bugs.launchpad.net/ubuntu/+source/ghostscript/+bug/1806517">https://bugs.launchpad.net/ubuntu/+source/ghostscript/+bug/1806517</a></li>
<li>As well as the original Ghostscript bug report: <a href="https://bugs.ghostscript.com/show_bug.cgi?id=699815">https://bugs.ghostscript.com/show_bug.cgi?id=699815</a></li>
</ul>
<h2 id="20190124">2019-01-24</h2>
<h2 id="2019-01-24">2019-01-24</h2>
<ul>
<li>I noticed Ubuntu's Ghostscript 9.26 works on some troublesome PDFs where Arch's Ghostscript 9.26 doesn't, so the fix for the first/last page crash is not the patch I found yesterday</li>
<li>Ubuntu's Ghostscript uses another <a href="http://git.ghostscript.com/?p=ghostpdl.git;h=fae21f1668d2b44b18b84cf0923a1d5f3008a696">patch from Ghostscript git</a> (<a href="https://bugs.ghostscript.com/show_bug.cgi?id=700315">upstream bug report</a>)</li>
@ -1078,7 +1078,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
</li>
<li>I sent a message titled &ldquo;<a href="https://groups.google.com/forum/#!topic/dspace-tech/phV_t51TGuE">DC, QDC, and DCTERMS: reviewing our metadata practices</a>&rdquo; to the dspace-tech mailing list to ask about some of this</li>
</ul>
<h2 id="20190125">2019-01-25</h2>
<h2 id="2019-01-25">2019-01-25</h2>
<ul>
<li>A little bit more work on getting Tomcat to run from a tarball on our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a>
<ul>
@ -1090,7 +1090,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
</ul>
</li>
</ul>
<h2 id="20190127">2019-01-27</h2>
<h2 id="2019-01-27">2019-01-27</h2>
<ul>
<li>Linode sent an email that the server was using a lot of CPU this morning, and these were the top IPs in the web server logs at the time:</li>
</ul>
@ -1113,7 +1113,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
</ul>
</li>
</ul>
<h2 id="20190128">2019-01-28</h2>
<h2 id="2019-01-28">2019-01-28</h2>
<ul>
<li>Udana from WLE asked me about the interaction between their publication website and their items on CGSpace
<ul>
@ -1161,7 +1161,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
<li><code>199.47.87.140</code> and <code>199.47.87.141</code> is TurnItIn with the following user agent:</li>
</ul>
<pre><code>TurnitinBot (https://turnitin.com/robot/crawlerinfo.html)
</code></pre><h2 id="20190129">2019-01-29</h2>
</code></pre><h2 id="2019-01-29">2019-01-29</h2>
<ul>
<li>Linode sent an alert about CGSpace (linode18) CPU usage this morning, here are the top IPs in the web server logs just before, during, and after the alert:</li>
</ul>
@ -1186,7 +1186,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
</ul>
</li>
</ul>
<h2 id="20190130">2019-01-30</h2>
<h2 id="2019-01-30">2019-01-30</h2>
<ul>
<li>Got another alert from Linode about CGSpace (linode18) this morning, here are the top IPs before, during, and after the alert:</li>
</ul>
@ -1204,7 +1204,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
</code></pre><ul>
<li>I might need to adjust the threshold again, because the load average this morning was 296% and the activity looks pretty normal (as always recently)</li>
</ul>
<h2 id="20190131">2019-01-31</h2>
<h2 id="2019-01-31">2019-01-31</h2>
<ul>
<li>Linode sent alerts about CGSpace (linode18) last night and this morning, here are the top IPs before, during, and after those times:</li>
</ul>

View File

@ -69,7 +69,7 @@ real 0m19.873s
user 0m22.203s
sys 0m1.979s
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -150,7 +150,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20190201">2019-02-01</h2>
<h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
@ -186,7 +186,7 @@ sys 0m1.979s
</ul>
</li>
</ul>
<h2 id="20190202">2019-02-02</h2>
<h2 id="2019-02-02">2019-02-02</h2>
<ul>
<li>Another alert from Linode about CGSpace (linode18) this morning, here are the top IPs in the web server logs before, during, and after that time:</li>
</ul>
@ -206,7 +206,7 @@ sys 0m1.979s
<li>I will increase the Linode alert threshold from 275 to 300% because this is becoming too much!</li>
<li>I tested the Atmire Metadata Quality Module (MQM)&rsquo;s duplicate checker on some <a href="https://dspacetest.cgiar.org/handle/10568/81268">WLE items</a> that I helped Udana with a few months ago on DSpace Test (linode19) and indeed it found many duplicates!</li>
</ul>
<h2 id="20190203">2019-02-03</h2>
<h2 id="2019-02-03">2019-02-03</h2>
<ul>
<li>This is seriously getting annoying, Linode sent another alert this morning that CGSpace (linode18) load was 377%!</li>
<li>Here are the top IPs before, during, and after that time:</li>
@ -268,7 +268,7 @@ sys 0m1.979s
</ul>
</li>
</ul>
<h2 id="20190204">2019-02-04</h2>
<h2 id="2019-02-04">2019-02-04</h2>
<ul>
<li>Generate a list of CTA subjects from CGSpace for Peter:</li>
</ul>
@ -294,7 +294,7 @@ COPY 321
<li>At this rate I think I just need to stop paying attention to these alerts—DSpace gets thrashed when people use the APIs properly and there's nothing we can do to improve REST API performance!</li>
<li>Perhaps I just need to keep increasing the Linode alert threshold (currently 300%) for this host?</li>
</ul>
<h2 id="20190205">2019-02-05</h2>
<h2 id="2019-02-05">2019-02-05</h2>
<ul>
<li>Peter sent me corrections and deletions for the CTA subjects and as usual, there were encoding errors with some accents in his file</li>
<li>In other news, it seems that the GREL syntax regarding booleans changed in OpenRefine recently, so I need to update some expressions like the one I use to detect encoding errors to use <code>toString()</code>:</li>
@ -328,7 +328,7 @@ MARKETING ET COMMERCE,MARKETING||COMMERCE
NATURAL RESOURCES AND ENVIRONMENT,NATURAL RESOURCES MANAGEMENT||ENVIRONMENT
PÊCHES ET AQUACULTURE,PÊCHES||AQUACULTURE
PESCAS E AQUACULTURE,PISCICULTURA||AQUACULTURE
</code></pre><h2 id="20190206">2019-02-06</h2>
</code></pre><h2 id="2019-02-06">2019-02-06</h2>
<ul>
<li>I dumped the CTA community so I can try to fix the subjects with multiple subjects that Peter indicated in his corrections:</li>
</ul>
@ -406,7 +406,7 @@ PESCAS E AQUACULTURE,PISCICULTURA||AQUACULTURE
4661 205.186.128.185
4661 70.32.83.92
5102 45.5.186.2
</code></pre><h2 id="20190207">2019-02-07</h2>
</code></pre><h2 id="2019-02-07">2019-02-07</h2>
<ul>
<li>Linode sent an alert last night that the load on CGSpace (linode18) was over 300%</li>
<li>Here are the top IPs in the web server and API logs before, during, and after that time, respectively:</li>
@ -491,7 +491,7 @@ Please see the DSpace documentation for assistance.
<li>I can't connect to TCP port 25 on that server so I sent a mail to CGNET support to ask what's up</li>
<li>CGNET said these servers were discontinued in 2018-01 and that I should use <a href="https://docs.microsoft.com/en-us/exchange/mail-flow-best-practices/how-to-set-up-a-multifunction-device-or-application-to-send-email-using-office-3">Office 365</a></li>
</ul>
<h2 id="20190208">2019-02-08</h2>
<h2 id="2019-02-08">2019-02-08</h2>
<ul>
<li>I re-configured CGSpace to use the email/password for cgspace-support, but I get this error when I try the <code>test-email</code> script:</li>
</ul>
@ -500,7 +500,7 @@ Please see the DSpace documentation for assistance.
</code></pre><ul>
<li>I tried to log into Outlook 365 with the credentials but I think the ones I have must be wrong, so I will ask ICT to reset the password</li>
</ul>
<h2 id="20190209">2019-02-09</h2>
<h2 id="2019-02-09">2019-02-09</h2>
<ul>
<li>Linode sent alerts about CPU load yesterday morning, yesterday night, and this morning! All over 300% CPU load!</li>
<li>This is just for this morning:</li>
@ -535,7 +535,7 @@ Please see the DSpace documentation for assistance.
</code></pre><ul>
<li>151.80.203.180 is on OVH so I sent a message to their abuse email&hellip;</li>
</ul>
<h2 id="20190210">2019-02-10</h2>
<h2 id="2019-02-10">2019-02-10</h2>
<ul>
<li>Linode sent another alert about CGSpace (linode18) CPU load this morning, here are the top IPs in the web server XMLUI and API logs before, during, and after that time:</li>
</ul>
@ -624,12 +624,12 @@ Please see the DSpace documentation for assistance.
# mkdir -p /home/aorth/.local/lib/containers/volumes/artifactory5_data
# chown 1030 /home/aorth/.local/lib/containers/volumes/artifactory5_data
# docker run --name artifactory --network dspace-build -d -v /home/aorth/.local/lib/containers/volumes/artifactory5_data:/var/opt/jfrog/artifactory -p 8081:8081 docker.bintray.io/jfrog/artifactory-oss
</code></pre><h2 id="20190211">2019-02-11</h2>
</code></pre><h2 id="2019-02-11">2019-02-11</h2>
<ul>
<li>Bosede from IITA said we can use &ldquo;SOCIAL SCIENCE &amp; AGRIBUSINESS&rdquo; in their new IITA theme field to be consistent with other places they are using it</li>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
</ul>
<h2 id="20190212">2019-02-12</h2>
<h2 id="2019-02-12">2019-02-12</h2>
<ul>
<li>I notice that <a href="https://jira.duraspace.org/browse/DS-3052">DSpace 6 has included a new JAR-based PDF thumbnailer based on PDFBox</a>, I wonder how good its thumbnails are and how it handles CMYK PDFs</li>
<li>On a similar note, I wonder if we could use the performance-focused <a href="https://libvips.github.io/libvips/">libvips</a> and the third-party <a href="https://github.com/codecitizen/jlibvips/">jlibvips Java library</a> in DSpace</li>
@ -658,7 +658,7 @@ dspacestatistics=# SELECT * FROM items WHERE downloads &gt; 0 ORDER BY downloads
</code></pre><ul>
<li>I will read the PDFBox thumbnailer documentation to see if I can change the size and quality</li>
</ul>
<h2 id="20190213">2019-02-13</h2>
<h2 id="2019-02-13">2019-02-13</h2>
<ul>
<li>ILRI ICT reset the password for the CGSpace mail account, but I still can't get it to send mail from DSpace's <code>test-email</code> utility</li>
<li>I even added extra mail properties to <code>dspace.cfg</code> as suggested by someone on the dspace-tech mailing list:</li>
@ -735,7 +735,7 @@ $ podman run --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspace
<li>I increased the nginx upload limit, but she said she was having problems and couldn't really tell me why</li>
<li>I logged in as her and completed the submission with no problems&hellip;</li>
</ul>
<h2 id="20190215">2019-02-15</h2>
<h2 id="2019-02-15">2019-02-15</h2>
<ul>
<li>Tomcat was killed around 3AM by the kernel's OOM killer according to <code>dmesg</code>:</li>
</ul>
@ -805,7 +805,7 @@ $ podman start artifactory
</code></pre><ul>
<li>More on the <a href="https://podman.io/blogs/2018/10/03/podman-remove-content-homedir.html">subuid permissions issue with rootless containers here</a></li>
</ul>
<h2 id="20190217">2019-02-17</h2>
<h2 id="2019-02-17">2019-02-17</h2>
<ul>
<li>I ran DSpace's cleanup task on CGSpace (linode18) and there were errors:</li>
</ul>
@ -821,7 +821,7 @@ UPDATE 1
<li>I merged the Atmire Metadata Quality Module (MQM) changes to the <code>5_x-prod</code> branch and deployed it on CGSpace (<a href="https://github.com/ilri/DSpace/pull/407">#407</a>)</li>
<li>Then I ran all system updates on the CGSpace server and rebooted it</li>
</ul>
<h2 id="20190218">2019-02-18</h2>
<h2 id="2019-02-18">2019-02-18</h2>
<ul>
<li>Jesus fucking Christ, Linode sent an alert that CGSpace (linode18) was using 421% CPU for a few hours this afternoon (server time):</li>
<li>There seems to have been a lot of activity in XMLUI:</li>
@ -942,7 +942,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</code></pre><ul>
<li>I merged the changes to the <code>5_x-prod</code> branch and they will go live the next time we re-deploy CGSpace (<a href="https://github.com/ilri/DSpace/pull/412">#412</a>)</li>
</ul>
<h2 id="20190219">2019-02-19</h2>
<h2 id="2019-02-19">2019-02-19</h2>
<ul>
<li>Linode sent another alert about CPU usage on CGSpace (linode18) averaging 417% this morning</li>
<li>Unfortunately, I don't see any strange activity in the web server API or XMLUI logs at that time in particular</li>
@ -1028,7 +1028,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</code></pre><ul>
<li>I wrote a quick and dirty Python script called <code>resolve-addresses.py</code> to resolve IP addresses to their owning organization's name, ASN, and country using the <a href="https://ipapi.co">IPAPI.co API</a></li>
</ul>
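<ul>
<li>The script is basically just one GET request per IP address; a minimal, untested sketch of the idea (the JSON field names are assumptions based on ipapi.co's documentation):</li>
</ul>
<pre><code># Minimal sketch of the resolve-addresses.py idea: look up one IP address
# with ipapi.co and print its organization, ASN, and country
import requests

def resolve_address(ip):
    # the org, asn, and country_name fields are assumed from ipapi.co's docs
    r = requests.get(f'https://ipapi.co/{ip}/json/')
    r.raise_for_status()
    data = r.json()
    return data.get('org'), data.get('asn'), data.get('country_name')

print(resolve_address('199.47.87.140'))
</code></pre>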
<h2 id="20190220">2019-02-20</h2>
<h2 id="2019-02-20">2019-02-20</h2>
<ul>
<li>Ben Hack was asking about getting authors publications programmatically from CGSpace for the new ILRI website</li>
<li>I told him that they should probably try to use the REST API's <code>find-by-metadata-field</code> endpoint</li>
@ -1049,7 +1049,7 @@ $ curl -s -H &quot;accept: application/json&quot; -H &quot;Content-Type: applica
<li>See this <a href="https://jira.duraspace.org/browse/VIVO-1655">issue on the VIVO tracker</a> for more information about this endpoint</li>
<li>The old-school AGROVOC SOAP WSDL works with the <a href="https://python-zeep.readthedocs.io/en/master/">Zeep Python library</a>, but in my tests the results are way too broad despite trying to use an &ldquo;exact match&rdquo; search</li>
</ul>
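<ul>
<li>For what it's worth, the <code>find-by-metadata-field</code> endpoint I suggested to Ben just takes a JSON body with the field and value; a rough, untested Python sketch (the metadata key and value below are only illustrative):</li>
</ul>
<pre><code># Rough sketch of querying DSpace 5's REST API for items by metadata
# field (the key and value below are illustrative only)
import requests

payload = {'key': 'cg.coverage.country', 'value': 'KENYA'}
r = requests.post('https://cgspace.cgiar.org/rest/items/find-by-metadata-field',
                  headers={'Accept': 'application/json'},
                  json=payload)
r.raise_for_status()
for item in r.json():
    print(item['handle'], item['name'])
</code></pre>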
<h2 id="20190221">2019-02-21</h2>
<h2 id="2019-02-21">2019-02-21</h2>
<ul>
<li>I wrote a script <a href="https://github.com/ilri/DSpace/blob/5_x-prod/agrovoc-lookup.py">agrovoc-lookup.py</a> to resolve subject terms against the public AGROVOC REST API</li>
<li>It allows specifying the language the term should be queried in as well as output files to save the matched and unmatched terms to</li>
@ -1088,7 +1088,7 @@ COPY 33
</ul>
</li>
</ul>
<h2 id="20190222">2019-02-22</h2>
<h2 id="2019-02-22">2019-02-22</h2>
<ul>
<li>
<p>Help Udana from WLE with some issues related to CGSpace items on their <a href="https://www.wle.cgiar.org/publications">Publications website</a></p>
@ -1134,7 +1134,7 @@ return &quot;unmatched&quot;
<li>You have to make sure to URL encode the value with <code>quote_plus()</code> and it totally works, but it seems to refresh the facets (and therefore re-query everything) when you select a facet so that makes it basically unusable</li>
<li>There is a <a href="https://programminghistorian.org/en/lessons/fetch-and-parse-data-with-openrefine#example-2-url-queries-and-parsing-json">good resource discussing OpenRefine, Jython, and web scraping</a></li>
</ul>
<h2 id="20190224">2019-02-24</h2>
<h2 id="2019-02-24">2019-02-24</h2>
<ul>
<li>I decided to try to validate the AGROVOC subjects in IITA's recent batch upload by dumping all their terms, checking them in en/es/fr with <code>agrovoc-lookup.py</code>, then reconciling against the final list using reconcile-csv with OpenRefine</li>
<li>I'm not sure how to deal with terms like &ldquo;CORN&rdquo; that are alternative labels (<code>altLabel</code>) in AGROVOC where the preferred label (<code>prefLabel</code>) would be &ldquo;MAIZE&rdquo;</li>
@ -1163,7 +1163,7 @@ return &quot;unmatched&quot;
</ul>
</li>
</ul>
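<ul>
<li>The lookup in <code>agrovoc-lookup.py</code> boils down to one request per term against the Skosmos search API; a stripped-down, untested sketch (the base URL and response shape are assumptions from the Skosmos REST documentation, and note that a match might be an <code>altLabel</code> rather than a <code>prefLabel</code>):</li>
</ul>
<pre><code># Stripped-down sketch of checking one subject term against the AGROVOC
# REST API (Skosmos); a result may match an altLabel like "CORN"
import requests

def agrovoc_match(term, lang='en'):
    # endpoint and parameters assumed from the Skosmos REST API docs
    r = requests.get('http://agrovoc.uniroma2.it/agrovoc/rest/v1/search',
                     params={'query': term, 'lang': lang})
    r.raise_for_status()
    return len(r.json()['results']) &gt; 0

print(agrovoc_match('MAIZE'))
</code></pre>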
<h2 id="20190225">2019-02-25</h2>
<h2 id="2019-02-25">2019-02-25</h2>
<ul>
<li>There seems to be something going on with Solr on CGSpace (linode18) because statistics on communities and collections are blank for January and February this year</li>
<li>I see some errors started recently in Solr (yesterday):</li>
@ -1257,7 +1257,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
<ul>
<li>I still have not figured out what the <em>real</em> cause for the Solr cores to not load was, though</li>
</ul>
<h2 id="20190226">2019-02-26</h2>
<h2 id="2019-02-26">2019-02-26</h2>
<ul>
<li>I sent a mail to the dspace-tech mailing list about the &ldquo;solr_update_time_stamp&rdquo; error</li>
<li>A CCAFS user sent a message saying they got this error when submitting to CGSpace:</li>
@ -1268,7 +1268,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
<li>I looked at the <code>WORKFLOW_STEP_1</code> (Accept/Reject) and the group is of course empty</li>
<li>As we've seen several times recently, we are not using this step so it should simply be deleted</li>
</ul>
<h2 id="20190227">2019-02-27</h2>
<h2 id="2019-02-27">2019-02-27</h2>
<ul>
<li>Discuss batch uploads with Sisay</li>
<li>He's trying to upload some CTA records, but it's not possible to do collection mapping when using the web UI
@ -1291,7 +1291,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
</ul>
</li>
</ul>
<h2 id="20190228">2019-02-28</h2>
<h2 id="2019-02-28">2019-02-28</h2>
<ul>
<li>I helped Sisay upload the nineteen CTA records from last week via the command line because they required mappings (which is not possible to do via the batch upload web interface)</li>
</ul>

View File

@ -43,7 +43,7 @@ Most worryingly, there are encoding errors in the abstracts for eleven items, fo
I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -124,7 +124,7 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
</p>
</header>
<h2 id="20190301">2019-03-01</h2>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
@ -139,7 +139,7 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
<h2 id="20190303">2019-03-03</h2>
<h2 id="2019-03-03">2019-03-03</h2>
<ul>
<li>Trying to finally upload IITA's 259 Feb 14 items to CGSpace so I exported them from DSpace Test:</li>
</ul>
@ -166,7 +166,7 @@ $ dspace export -i 10568/108684 -t COLLECTION -m -n 0 -d 2019-03-03-IITA-Feb14
</li>
<li>Deploy Tomcat 7.0.93 on CGSpace (linode18) after having tested it on DSpace Test (linode19) for a week</li>
</ul>
<h2 id="20190306">2019-03-06</h2>
<h2 id="2019-03-06">2019-03-06</h2>
<ul>
<li>Abenet was having problems with a CIP user account, I think that the user could not register</li>
<li>I suspect it's related to the email issue that ICT hasn't responded about since last week</li>
@ -184,7 +184,7 @@ Error sending email:
</code></pre><ul>
<li>I will send a follow-up to ICT to ask them to reset the password</li>
</ul>
<h2 id="20190307">2019-03-07</h2>
<h2 id="2019-03-07">2019-03-07</h2>
<ul>
<li>ICT reset the email password and I confirmed that it is working now</li>
<li>Generate a controlled vocabulary of 1187 AGROVOC subjects from the top 1500 that I checked last month, dumping the terms themselves using <code>csvcut</code> and then applying XML controlled vocabulary format in vim and then checking with tidy for good measure:</li>
@ -200,7 +200,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/dc-subject.x
</ul>
</li>
</ul>
<h2 id="20190308">2019-03-08</h2>
<h2 id="2019-03-08">2019-03-08</h2>
<ul>
<li>There's an issue with CGSpace right now where all items are giving a blank page in the XMLUI
<ul>
@ -223,7 +223,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/dc-subject.x
</ul>
</li>
</ul>
<h2 id="20190309">2019-03-09</h2>
<h2 id="2019-03-09">2019-03-09</h2>
<ul>
<li>I shared a post on Yammer informing our editors to try the AGROVOC controlled list</li>
<li>The SPDX legal committee had a meeting and discussed the addition of CC-BY-ND-3.0-IGO and other IGO licenses to their list, but it seems unlikely (<a href="https://github.com/spdx/license-list-XML/issues/767#issuecomment-470709673">spdx/license-list-XML/issues/767</a>)</li>
@ -241,7 +241,7 @@ UPDATE 44
</code></pre><ul>
<li>I ran the corrections on CGSpace and DSpace Test</li>
</ul>
<h2 id="20190310">2019-03-10</h2>
<h2 id="2019-03-10">2019-03-10</h2>
<ul>
<li>Working on tagging IITA's items with their new research theme (<code>cg.identifier.iitatheme</code>) based on their existing IITA subjects (see <a href="/cgspace-notes/2019-02/">notes from 2019-02</a>)</li>
<li>I exported the entire IITA community from CGSpace and then used <code>csvcut</code> to extract only the needed fields:</li>
@ -261,15 +261,15 @@ UPDATE 44
<li>In total this would add research themes to 1,755 items</li>
<li>I want to double check one last time with Bosede that they would like to do this, because I also see that this will tag a few hundred items from the 1970s and 1980s</li>
</ul>
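<ul>
<li>The tagging itself can be done offline on the exported CSV before re-importing; a hypothetical sketch with pandas (the field name <code>cg.subject.iita</code> and the subject-to-theme mapping are illustrative, not the real ones):</li>
</ul>
<pre><code># Hypothetical sketch: derive cg.identifier.iitatheme from existing IITA
# subjects in an exported CSV (field name and mapping are illustrative)
import pandas as pd

theme_map = {
    'FARMING SYSTEMS': 'SOCIAL SCIENCE &amp; AGRIBUSINESS',
}

df = pd.read_csv('2019-03-10-iita.csv')

def subjects_to_themes(subjects):
    # multi-value fields use "||" as a separator
    if pd.isna(subjects):
        return ''
    themes = {theme_map[s] for s in subjects.split('||') if s in theme_map}
    return '||'.join(sorted(themes))

df['cg.identifier.iitatheme'] = df['cg.subject.iita'].apply(subjects_to_themes)
df.to_csv('2019-03-10-iita-themes.csv', index=False)
</code></pre>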
<h2 id="20190311">2019-03-11</h2>
<h2 id="2019-03-11">2019-03-11</h2>
<ul>
<li>Bosede said that she would like the IITA research theme tagging only for items since 2015, which would be 256 items</li>
</ul>
<h2 id="20190312">2019-03-12</h2>
<h2 id="2019-03-12">2019-03-12</h2>
<ul>
<li>I imported the changes to 256 of IITA's records on CGSpace</li>
</ul>
<h2 id="20190314">2019-03-14</h2>
<h2 id="2019-03-14">2019-03-14</h2>
<ul>
<li>CGSpace had the same issue with blank items like earlier this month and I restarted Tomcat to fix it</li>
<li>Create a pull request to change Swaziland to Eswatini and Macedonia to North Macedonia (<a href="https://github.com/ilri/DSpace/pull/414">#414</a>)
@ -301,7 +301,7 @@ done
<li>Run all system updates and reboot linode20</li>
<li>Follow up with Felix from Earlham to see if he's done testing DSpace Test with COPO so I can re-sync the server from CGSpace</li>
</ul>
<h2 id="20190315">2019-03-15</h2>
<h2 id="2019-03-15">2019-03-15</h2>
<ul>
<li>CGSpace (linode18) has the blank page error again</li>
<li>I'm not sure if it's related, but I see the following error in DSpace's log:</li>
@ -402,7 +402,7 @@ java.util.EmptyStackException
</code></pre><ul>
<li>For now I will just restart Tomcat&hellip;</li>
</ul>
<h2 id="20190317">2019-03-17</h2>
<h2 id="2019-03-17">2019-03-17</h2>
<ul>
<li>Last week Felix from Earlham said that they finished testing on DSpace Test (linode19) so I made backups of some things there and re-deployed the system on Ubuntu 18.04
<ul>
@ -437,7 +437,7 @@ Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign k
<pre><code># su - postgres
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (164496);'
UPDATE 1
</code></pre><h2 id="20190318">2019-03-18</h2>
</code></pre><h2 id="2019-03-18">2019-03-18</h2>
<ul>
<li>I noticed that the regular expression for validating lines from input files in my <code>agrovoc-lookup.py</code> script was skipping characters with accents, etc, so I changed it to use the <code>\w</code> character class for words instead of trying to match <code>[A-Z]</code> etc&hellip;
<ul>
@ -568,7 +568,7 @@ $ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|ds
</code></pre><ul>
<li>I'm not sure if it's cocoon or that's just a symptom of something else</li>
</ul>
<h2 id="20190319">2019-03-19</h2>
<h2 id="2019-03-19">2019-03-19</h2>
<ul>
<li>I found a handful of AGROVOC subjects that use a non-breaking space (0x00a0) instead of a regular space, which makes for some pretty confusing debugging&hellip;</li>
<li>I will replace these in the database immediately to save myself the headache later:</li>
@ -640,7 +640,7 @@ Max realtime timeout unlimited unlimited us
</ul>
</li>
</ul>
<h2 id="20190320">2019-03-20</h2>
<h2 id="2019-03-20">2019-03-20</h2>
<ul>
<li>Create a branch for Solr 4.10.4 changes so I can test on DSpace Test (linode19)
<ul>
@ -648,7 +648,7 @@ Max realtime timeout unlimited unlimited us
</ul>
</li>
</ul>
<h2 id="20190321">2019-03-21</h2>
<h2 id="2019-03-21">2019-03-21</h2>
<ul>
<li>It's been two days since we had the blank page issue on CGSpace, and looking in the Cocoon logs I see very low numbers of the errors that we were seeing the last time the issue occurred:</li>
</ul>
@ -687,12 +687,12 @@ $ grep 'Can not load requested doc' cocoon.log.2019-03-21 | grep -oE '2019-03-21
</ul>
</li>
</ul>
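<ul>
<li>If the grep pipeline for counting those Cocoon errors by hour ever gets unwieldy, the same count is a few lines of Python (a sketch; it assumes each error line starts with the usual log4j timestamp):</li>
</ul>
<pre><code># Count "Can not load requested doc" errors in the Cocoon log by hour,
# same as the grep | grep -oE | sort | uniq -c pipeline
from collections import Counter

hours = Counter()
with open('cocoon.log.2019-03-21') as log:
    for line in log:
        if 'Can not load requested doc' in line:
            hours[line[:13]] += 1  # for example '2019-03-21 21'

for hour, count in sorted(hours.items()):
    print(hour, count)
</code></pre>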
<h2 id="20190322">2019-03-22</h2>
<h2 id="2019-03-22">2019-03-22</h2>
<ul>
<li>Share the initial list of invalid AGROVOC terms on Yammer to ask the editors for help in correcting them</li>
<li>Advise Phanuel Ayuka from IITA about using controlled vocabularies in DSpace</li>
</ul>
<h2 id="20190323">2019-03-23</h2>
<h2 id="2019-03-23">2019-03-23</h2>
<ul>
<li>CGSpace (linode18) is having the blank page issue again and it seems to have started last night around 21:00:</li>
</ul>
@ -811,7 +811,7 @@ org.postgresql.util.PSQLException: This statement has been closed.
</ul>
</li>
</ul>
<h2 id="20190324">2019-03-24</h2>
<h2 id="2019-03-24">2019-03-24</h2>
<ul>
<li>I did some more tests with the <a href="https://github.com/gnosly/TomcatJdbcConnectionTest">TomcatJdbcConnectionTest</a> thing and while monitoring the number of active connections in jconsole and after adjusting the limits quite low I eventually saw some connections get abandoned</li>
<li>I forgot that to connect to a remote JMX session with jconsole you need to use a dynamic SSH SOCKS proxy (as I originally <a href="/cgspace-notes/2017-11/">discovered in 2017-11</a>):</li>
@ -831,7 +831,7 @@ org.postgresql.util.PSQLException: This statement has been closed.
</ul>
</li>
</ul>
<h2 id="20190325">2019-03-25</h2>
<h2 id="2019-03-25">2019-03-25</h2>
<ul>
<li>Finish looking over the 175 invalid AGROVOC terms
<ul>
@ -918,7 +918,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-03-22 | sort -u | wc -l
</li>
<li>According the Uptime Robot the server was up and down a few more times over the next hour so I restarted Tomcat again</li>
</ul>
<h2 id="20190326">2019-03-26</h2>
<h2 id="2019-03-26">2019-03-26</h2>
<ul>
<li>UptimeRobot says CGSpace went down again and I see the load is again at 14.0!</li>
<li>Here are the top IPs in nginx logs in the last hour:</li>
@ -1032,7 +1032,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-03-26-AGROVOC-79-deletions.csv -db ds
</ul>
<pre><code>$ grep -I -c 45.5.184.72 dspace.log.2019-03-26
0
</code></pre><h2 id="20190328">2019-03-28</h2>
</code></pre><h2 id="2019-03-28">2019-03-28</h2>
<ul>
<li>Run the corrections and deletions to AGROVOC (dc.subject) on DSpace Test and CGSpace, and then start a full re-index of Discovery</li>
<li>What the hell is going on with this CTA publication?</li>
@ -1074,7 +1074,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-03-26-AGROVOC-79-deletions.csv -db ds
</code></pre><ul>
<li>In other news I see that DSpace has no statistics for years before 2019 currently, yet when I connect to Solr I see all the cores up</li>
</ul>
<h2 id="20190329">2019-03-29</h2>
<h2 id="2019-03-29">2019-03-29</h2>
<ul>
<li>Sent Linode more information from <code>top</code> and <code>iostat</code> about the resource usage on linode18
<ul>
@ -1088,7 +1088,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-03-26-AGROVOC-79-deletions.csv -db ds
</ul>
</li>
</ul>
<h2 id="20190331">2019-03-31</h2>
<h2 id="2019-03-31">2019-03-31</h2>
<ul>
<li>After a few days of the CGSpace VM (linode18) being migrated to a new host the CPU steal is gone and the site is much more responsive</li>
</ul>

View File

@ -61,7 +61,7 @@ $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u ds
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 231 -f cg.coverage.region -d
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -142,7 +142,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</p>
</header>
<h2 id="20190401">2019-04-01</h2>
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>
@ -165,7 +165,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre><h2 id="20190402">2019-04-02</h2>
</code></pre><h2 id="2019-04-02">2019-04-02</h2>
<ul>
<li>CTA says the Amazon IPs are AWS gateways for real user traffic</li>
<li>I was trying to add Felix Shaw's account back to the Administrators group on DSpace Test, but I couldn't find his name in the user search of the groups page
@ -175,7 +175,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</ul>
</li>
</ul>
<h2 id="20190403">2019-04-03</h2>
<h2 id="2019-04-03">2019-04-03</h2>
<ul>
<li>Maria from Bioversity emailed me a list of new ORCID identifiers for their researchers so I will add them to our controlled vocabulary
<ul>
@ -209,7 +209,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</code></pre><ul>
<li>I will have to keep an eye on it because nothing should be updating 2018 stats in 2019&hellip;</li>
</ul>
<h2 id="20190405">2019-04-05</h2>
<h2 id="2019-04-05">2019-04-05</h2>
<ul>
<li>Uptime Robot reported that CGSpace (linode18) went down tonight</li>
<li>I see there are lots of PostgreSQL connections:</li>
@ -238,7 +238,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</code></pre><ul>
<li>I restarted it again and all the Solr cores came up properly&hellip;</li>
</ul>
<h2 id="20190406">2019-04-06</h2>
<h2 id="2019-04-06">2019-04-06</h2>
<ul>
<li>Udana asked why item <a href="https://cgspace.cgiar.org/handle/10568/91278">10568/91278</a> didn't have an Altmetric badge on CGSpace, but on the <a href="https://wle.cgiar.org/food-and-agricultural-innovation-pathways-prosperity">WLE website</a> it does
<ul>
@ -297,7 +297,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
</ul>
</li>
</ul>
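<ul>
<li>One way to check what Altmetric itself knows about an item is to hit their API directly; a quick, untested sketch (the <code>/handle</code> endpoint and the <code>score</code> field are assumptions based on the Altmetric API documentation):</li>
</ul>
<pre><code># Quick sketch: look up an item's Altmetric score by its Handle
import requests

r = requests.get('https://api.altmetric.com/v1/handle/10568/91278')
if r.status_code == 200:
    print(r.json().get('score'))
else:
    print(f'Altmetric returned HTTP {r.status_code} for this handle')
</code></pre>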
<h2 id="20190407">2019-04-07</h2>
<h2 id="2019-04-07">2019-04-07</h2>
<ul>
<li>Looking into the impact of harvesters like <code>45.5.184.72</code>, I see in Solr that this user is not categorized as a bot so it definitely impacts the usage stats by some tens of thousands <em>per day</em></li>
<li>Last week CTA switched their frontend code to use HEAD requests instead of GET requests for bitstreams
@ -529,7 +529,7 @@ X-XSS-Protection: 1; mode=block
<li>It seems that the issue with CGSpace being &ldquo;down&rdquo; is actually because of CPU steal again!!!</li>
<li>I opened a ticket with support and asked them to migrate the VM to a less busy host</li>
</ul>
<h2 id="20190408">2019-04-08</h2>
<h2 id="2019-04-08">2019-04-08</h2>
<ul>
<li>Start checking IITA's last round of batch uploads from <a href="https://dspacetest.cgiar.org/handle/10568/100333">March on DSpace Test</a> (20193rd.xls)
<ul>
@ -623,7 +623,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exe
19 157.55.39.164
20 40.77.167.132
370 51.254.16.223
</code></pre><h2 id="20190409">2019-04-09</h2>
</code></pre><h2 id="2019-04-09">2019-04-09</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was 440% CPU for the last two hours this morning</li>
<li>Here are the top IPs in the web server logs around that time:</li>
@ -670,7 +670,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exe
</li>
<li>In other news, Linode staff identified a noisy neighbor sharing our host and migrated it elsewhere last night</li>
</ul>
<h2 id="20190410">2019-04-10</h2>
<h2 id="2019-04-10">2019-04-10</h2>
<ul>
<li>Abenet pointed out a possibility of validating funders against the <a href="https://support.crossref.org/hc/en-us/articles/215788143-Funder-data-via-the-API">CrossRef API</a></li>
<li>Note that if you use HTTPS and specify a contact address in the API request you have less likelihood of being blocked</li>
@ -684,7 +684,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exe
<pre><code>from habanero import Crossref
cr = Crossref(mailto=&quot;me@cgiar.org&quot;)
x = cr.funders(query = &quot;mercator&quot;)
</code></pre><h2 id="20190411">2019-04-11</h2>
</code></pre><h2 id="2019-04-11">2019-04-11</h2>
<ul>
<li>Continue proofing IITA's last round of batch uploads from <a href="https://dspacetest.cgiar.org/handle/10568/100333">March on DSpace Test</a> (20193rd.xls)
<ul>
@ -725,7 +725,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspac
</ul>
</li>
</ul>
<h2 id="20190413">2019-04-13</h2>
<h2 id="2019-04-13">2019-04-13</h2>
<ul>
<li>I copied the <code>statistics</code> and <code>statistics-2018</code> Solr cores from CGSpace to my local machine and watched the Java process in VisualVM while indexing item views and downloads with my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a>:</li>
</ul>
@ -741,7 +741,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspac
<li>I tried again with the GC tuning settings from the Solr 4.10.4 release:</li>
</ul>
<p><img src="/cgspace-notes/2019/04/visualvm-solr-indexing-solr-settings.png" alt="Java GC during Solr indexing Solr 4.10.4 settings"></p>
<h2 id="20190414">2019-04-14</h2>
<h2 id="2019-04-14">2019-04-14</h2>
<ul>
<li>Change DSpace Test (linode19) to use the Java GC tuning from the Solr 4.10.4 startup script:</li>
</ul>
@ -763,7 +763,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspac
<li>I need to remember to check the Munin JVM graphs in a few days</li>
<li>It might be placebo, but the site <em>does</em> feel snappier&hellip;</li>
</ul>
<h2 id="20190415">2019-04-15</h2>
<h2 id="2019-04-15">2019-04-15</h2>
<ul>
<li>Rework the dspace-statistics-api to use the vanilla Python requests library instead of a Solr client library
<ul>
@ -806,11 +806,11 @@ return item_id
real 82m45.324s
user 7m33.446s
sys 2m13.463s
</code></pre><h2 id="20190416">2019-04-16</h2>
</code></pre><h2 id="2019-04-16">2019-04-16</h2>
<ul>
<li>Export IITA's community from CGSpace because they want to experiment with importing it into their internal DSpace for some testing or something</li>
</ul>
<h2 id="20190417">2019-04-17</h2>
<h2 id="2019-04-17">2019-04-17</h2>
<ul>
<li>Reading an interesting <a href="https://teaspoon-consulting.com/articles/solr-cache-tuning.html">blog post about Solr caching</a></li>
<li>Did some tests of the dspace-statistics-api on my local DSpace instance with 28 million documents in a sharded statistics core (<code>statistics</code> and <code>statistics-2018</code>) and monitored the memory usage of Tomcat in VisualVM</li>
@ -956,7 +956,7 @@ sys 2m13.463s
<li>Lots of CPU steal going on still on CGSpace (linode18):</li>
</ul>
<p><img src="/cgspace-notes/2019/04/cpu-week3.png" alt="CPU usage week"></p>
<h2 id="20190418">2019-04-18</h2>
<h2 id="2019-04-18">2019-04-18</h2>
<ul>
<li>I've been trying to copy the <code>statistics-2018</code> Solr core from CGSpace to DSpace Test since yesterday, but the network speed is like 20KiB/sec
<ul>
@ -984,7 +984,7 @@ sys 2m13.463s
</ul>
</li>
</ul>
<h2 id="20190420">2019-04-20</h2>
<h2 id="2019-04-20">2019-04-20</h2>
<ul>
<li>Linode agreed to move CGSpace (linode18) to a new machine shortly after I filed my ticket about CPU steal two days ago and now the load is much more sane:</li>
</ul>
@ -1020,7 +1020,7 @@ TCP window size: 85.0 KByte (default)
</ul>
</li>
</ul>
<h2 id="20190421">2019-04-21</h2>
<h2 id="2019-04-21">2019-04-21</h2>
<ul>
<li>Deploy Solr 4.10.4 on CGSpace (linode18)</li>
<li>Deploy Tomcat 7.0.94 on CGSpace</li>
@ -1031,7 +1031,7 @@ TCP window size: 85.0 KByte (default)
</ul>
</li>
</ul>
<h2 id="20190422">2019-04-22</h2>
<h2 id="2019-04-22">2019-04-22</h2>
<ul>
<li>Abenet pointed out <a href="https://hdl.handle.net/10568/97912">an item</a> that doesn't have an Altmetric score on CGSpace, but has a score of 343 in the CGSpace Altmetric dashboard
<ul>
@ -1055,7 +1055,7 @@ dspace.log.2019-04-20:1515
</ul>
</li>
</ul>
<h2 id="20190423">2019-04-23</h2>
<h2 id="2019-04-23">2019-04-23</h2>
<ul>
<li>One blog post says that there is <a href="https://kvaes.wordpress.com/2017/07/01/what-azure-virtual-machine-size-should-i-pick/">no overprovisioning in Azure</a>:</li>
</ul>
@ -1068,7 +1068,7 @@ dspace.log.2019-04-20:1515
</ul>
</li>
</ul>
<h2 id="20190424">2019-04-24</h2>
<h2 id="2019-04-24">2019-04-24</h2>
<ul>
<li>Linode migrated CGSpace (linode18) to a new host, but I am still getting poor performance when copying data to DSpace Test (linode19)
<ul>
@ -1159,7 +1159,7 @@ dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AN
</code></pre><ul>
<li>I sent a message to the dspace-tech mailing list to ask for help</li>
</ul>
<h2 id="20190425">2019-04-25</h2>
<h2 id="2019-04-25">2019-04-25</h2>
<ul>
<li>Peter pointed out that we need to remove Delicious and Google+ from our social sharing links
<ul>
@ -1200,13 +1200,13 @@ $ curl -f -H &quot;rest-dspace-token: b43d41a6-5ac1-455d-b49a-616b8debc25b&quot;
<li>Communicate with Carlos Tejo from the Land Portal about the <code>/items/find-by-metadata-value</code> endpoint</li>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
</ul>
<h2 id="20190426">2019-04-26</h2>
<h2 id="2019-04-26">2019-04-26</h2>
<ul>
<li>Export a list of authors for Peter to look through:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-04-26-all-authors.csv with csv header;
COPY 65752
</code></pre><h2 id="20190428">2019-04-28</h2>
</code></pre><h2 id="2019-04-28">2019-04-28</h2>
<ul>
<li>Still trying to figure out the issue with the items that cause the REST API's <code>/items/find-by-metadata-value</code> endpoint to throw an exception
<ul>
@ -1226,7 +1226,7 @@ COPY 65752
</code></pre><ul>
<li>I even tried to &ldquo;expunge&rdquo; the item using an <a href="https://wiki.duraspace.org/display/DSDOC5x/Batch+Metadata+Editing#BatchMetadataEditing-Performing'actions'onitems">action in CSV</a>, and it said &ldquo;EXPUNGED!&rdquo; but the item is still there&hellip;</li>
</ul>
<h2 id="20190430">2019-04-30</h2>
<h2 id="2019-04-30">2019-04-30</h2>
<ul>
<li>Send mail to the dspace-tech mailing list to ask about the item expunge issue</li>
<li>Delete and re-create Podman container for dspacedb after pulling a new PostgreSQL container:</li>

View File

@ -45,7 +45,7 @@ DELETE 1
But after this I tried to delete the item from the XMLUI and it is still present&hellip;
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -126,7 +126,7 @@ But after this I tried to delete the item from the XMLUI and it is still present
</p>
</header>
<h2 id="20190501">2019-05-01</h2>
<h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
@ -195,7 +195,7 @@ curl: (22) The requested URL returned error: 401 Unauthorized
</li>
</ul>
<pre><code>https://cgspace.cgiar.org/rest/collections/1179/items?limit=812&amp;expand=metadata
</code></pre><h2 id="20190503">2019-05-03</h2>
</code></pre><h2 id="2019-05-03">2019-05-03</h2>
<ul>
<li>A user from CIAT emailed to say that CGSpace submission emails have not been working the last few weeks
<ul>
@ -221,7 +221,7 @@ Please see the DSpace documentation for assistance.
</ul>
</li>
</ul>
<h2 id="20190505">2019-05-05</h2>
<h2 id="2019-05-05">2019-05-05</h2>
<ul>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li>Merge changes into the <code>5_x-prod</code> branch of CGSpace:
@ -239,7 +239,7 @@ Please see the DSpace documentation for assistance.
</ul>
</li>
</ul>
<h2 id="20190506">2019-05-06</h2>
<h2 id="2019-05-06">2019-05-06</h2>
<ul>
<li>Peter pointed out that Solr stats are only showing 2019 stats
<ul>
@ -351,7 +351,7 @@ $ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -
</ul>
</li>
</ul>
<h2 id="20190507">2019-05-07</h2>
<h2 id="2019-05-07">2019-05-07</h2>
<ul>
<li>The total number of unique IPs on CGSpace yesterday was almost 14,000, which is several thousand higher than previous day totals:</li>
</ul>
@ -391,7 +391,7 @@ $ cat dspace.log.2019-05-01 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq |
</li>
<li>Add requests cache to <code>resolve-addresses.py</code> script</li>
</ul>
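<ul>
<li>The caching is basically a two-line change with the requests-cache library, which transparently stores GET responses in a local SQLite database so repeated lookups of the same IP are served locally (a sketch):</li>
</ul>
<pre><code># Sketch of the requests-cache idea: install a transparent cache so
# repeated lookups of the same IP don't hit ipapi.co again
import requests
import requests_cache

requests_cache.install_cache('resolve-addresses')

r = requests.get('https://ipapi.co/199.47.87.140/json/')
print(r.from_cache)  # False on the first request, True afterwards
</code></pre>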
<h2 id="20190508">2019-05-08</h2>
<h2 id="2019-05-08">2019-05-08</h2>
<ul>
<li>A user said that CGSpace emails have stopped sending again
<ul>
@ -425,7 +425,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
</ul>
</li>
</ul>
<h2 id="20190510">2019-05-10</h2>
<h2 id="2019-05-10">2019-05-10</h2>
<ul>
<li>I finally had time to analyze the 7,000 IPs from the major traffic spike on 2019-05-06 after several runs of my <code>resolve-addresses.py</code> script (ipapi.co has a limit of 1,000 requests per day)</li>
<li>Resolving the unique IP addresses to organization and AS names reveals some pretty big abusers:
@ -461,7 +461,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
</ul>
</li>
</ul>
<h2 id="20190512">2019-05-12</h2>
<h2 id="2019-05-12">2019-05-12</h2>
<ul>
<li>I see that the Unpaywall bot is responsible for a few thousand XMLUI sessions every day (IP addresses come from nginx access.log):</li>
</ul>
@ -474,7 +474,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
<li>Also, there is 10 to 20% CPU steal on that VM, so I will ask Linode to move it to another host</li>
<li>Commit changes to the <code>resolve-addresses.py</code> script to add proper CSV output support</li>
</ul>
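<ul>
<li>The CSV output doesn't need anything fancy; a sketch with the standard library's csv module (the column names and values here are made up for illustration):</li>
</ul>
<pre><code># Sketch of CSV output for resolved IPs with the standard library
# (column names and values are made up for illustration)
import csv

resolved = [
    {'ip': '199.47.87.140', 'org': 'Turnitin', 'asn': 'AS0000', 'country': 'US'},
]

with open('2019-05-12-resolved-addresses.csv', 'w') as f:
    writer = csv.DictWriter(f, fieldnames=['ip', 'org', 'asn', 'country'])
    writer.writeheader()
    writer.writerows(resolved)
</code></pre>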
<h2 id="20190514">2019-05-14</h2>
<h2 id="2019-05-14">2019-05-14</h2>
<ul>
<li>Skype with Peter and AgroKnow about a CTA storytelling modification they want to do on the CTA ICT Update collection on CGSpace
<ul>
@ -483,7 +483,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
</ul>
</li>
</ul>
<h2 id="20190515">2019-05-15</h2>
<h2 id="2019-05-15">2019-05-15</h2>
<ul>
<li>Tezira says she's having issues with email reports for approved submissions, but I received an email about collection subscriptions this morning, and I tested with <code>dspace test-email</code> and it's also working&hellip;</li>
<li>Send a list of DSpace build tips to Panagis from AgroKnow</li>
@ -493,7 +493,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
</ul>
</li>
</ul>
<h2 id="20190516">2019-05-16</h2>
<h2 id="2019-05-16">2019-05-16</h2>
<ul>
<li>Export a list of all investors (<code>dc.description.sponsorship</code>) for Peter to look through and correct:</li>
</ul>
@ -506,7 +506,7 @@ COPY 995
</ul>
</li>
</ul>
<h2 id="20190517">2019-05-17</h2>
<h2 id="2019-05-17">2019-05-17</h2>
<ul>
<li>Peter sent me a bunch of fixes for investors from yesterday</li>
<li>I did a quick check in Open Refine (trim and collapse whitespace, clean smart quotes, etc) and then applied them on CGSpace:</li>
@ -532,17 +532,17 @@ $ ./delete-metadata-values.py -i /tmp/2019-05-17-delete-14-Investors.csv -db dsp
</ul>
</li>
</ul>
<h2 id="20190519">2019-05-19</h2>
<h2 id="2019-05-19">2019-05-19</h2>
<ul>
<li>Add &ldquo;ISI journal&rdquo; to item view sidebar at the request of Maria Garruccio</li>
<li>Update <code>fix-metadata-values.py</code> and <code>delete-metadata-values.py</code> scripts to add some basic checking of CSV fields and colorize shell output using Colorama</li>
</ul>
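<ul>
<li>Colorama is pleasantly simple; the gist of the colorized output is something like this (the messages are invented, though the &ldquo;correct&rdquo; column is the one the scripts already use):</li>
</ul>
<pre><code># Gist of the Colorama usage: color the fix/skip/error messages so they
# stand out when scanning long output (messages invented for illustration)
from colorama import Fore, init

init()  # needed for ANSI colors on Windows; harmless elsewhere
print(Fore.GREEN + 'Fixed: SWAZILAND -&gt; ESWATINI' + Fore.RESET)
print(Fore.YELLOW + 'Skipping: no match for MACEDONIA' + Fore.RESET)
print(Fore.RED + 'Error: CSV is missing the "correct" column' + Fore.RESET)
</code></pre>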
<h2 id="20190524">2019-05-24</h2>
<h2 id="2019-05-24">2019-05-24</h2>
<ul>
<li>Update AReS README.md on GitHub repository to add a proper introduction, credits, requirements, installation instructions, and legal information</li>
<li>Update CIP subjects in input forms on CGSpace (<a href="https://github.com/ilri/DSpace/pull/424">#424</a>)</li>
</ul>
<h2 id="20190525">2019-05-25</h2>
<h2 id="2019-05-25">2019-05-25</h2>
<ul>
<li>Help Abenet proof ten AfricaRice publications
<ul>
@ -557,7 +557,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-05-17-delete-14-Investors.csv -db dsp
<li>Generate Simple Archive Format bundle with SAFBuilder and import into the <a href="https://cgspace.cgiar.org/handle/10568/101106">AfricaRice Articles in Journals</a> collection on CGSpace:</li>
</ul>
<pre><code>$ dspace import -a -e me@cgiar.org -m 2019-05-25-AfricaRice.map -s /tmp/SimpleArchiveFormat
</code></pre><h2 id="20190527">2019-05-27</h2>
</code></pre><h2 id="2019-05-27">2019-05-27</h2>
<ul>
<li>Peter sent me over two thousand corrections for the authors on CGSpace that I had dumped last month
<ul>
@ -584,7 +584,7 @@ COPY 64871
</ul>
</li>
</ul>
<h2 id="20190529">2019-05-29</h2>
<h2 id="2019-05-29">2019-05-29</h2>
<ul>
<li>A CIMMYT user was having problems registering or logging into CGSpace
<ul>
@ -593,7 +593,7 @@ COPY 64871
</ul>
</li>
</ul>
<h2 id="20190530">2019-05-30</h2>
<h2 id="2019-05-30">2019-05-30</h2>
<ul>
<li>I see the following error in the DSpace log when the user tries to log in with her CGIAR email and password on the LDAP login:</li>
</ul>

View File

@ -31,7 +31,7 @@ Run system updates on CGSpace (linode18) and reboot it
Skype with Marie-Angélique and Abenet about CG Core v2
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -112,12 +112,12 @@ Skype with Marie-Angélique and Abenet about CG Core v2
</p>
</header>
<h2 id="20190602">2019-06-02</h2>
<h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>
@ -142,7 +142,7 @@ Skype with Marie-Angélique and Abenet about CG Core v2
</ul>
</li>
</ul>
<h2 id="20190604">2019-06-04</h2>
<h2 id="2019-06-04">2019-06-04</h2>
<ul>
<li>The MARLO team responded and said they will give us access to the CLARISA API</li>
<li>Marie-Angélique <a href="https://github.com/AgriculturalSemantics/cg-core/pull/1">proposed</a> to integrate <code>dcterms.isPartOf</code>, <code>dcterms.abstract</code>, and <code>dcterms.bibliographicCitation</code> into the CG Core v2 schema
@ -153,11 +153,11 @@ Skype with Marie-Angélique and Abenet about CG Core v2
</li>
<li>Add Arabic language to input-forms.xml (<a href="https://github.com/ilri/DSpace/pull/427">#427</a>), as Bioversity is adding some Arabic items and noticed it missing</li>
</ul>
<h2 id="20190605">2019-06-05</h2>
<h2 id="2019-06-05">2019-06-05</h2>
<ul>
<li>Send mail to CGSpace and MELSpace people to let them know about the proposed metadata field migrations after the discussion with Marie-Angélique</li>
</ul>
<h2 id="20190607">2019-06-07</h2>
<h2 id="2019-06-07">2019-06-07</h2>
<ul>
<li>Thierry noticed that the CUA statistics were missing previous years again, and I see that the Solr admin UI has the following message:</li>
</ul>
@ -165,7 +165,7 @@ Skype with Marie-Angélique and Abenet about CG Core v2
</code></pre><ul>
<li>I had to restart Tomcat a few times for all the stats cores to get loaded with no issue</li>
</ul>
<h2 id="20190610">2019-06-10</h2>
<h2 id="2019-06-10">2019-06-10</h2>
<ul>
<li>Rename the AReS repository on GitHub to OpenRXV: <a href="https://github.com/ilri/OpenRXV">https://github.com/ilri/OpenRXV</a></li>
<li>Create a new AReS repository: <a href="https://github.com/ilri/AReS">https://github.com/ilri/AReS</a></li>
@ -174,7 +174,7 @@ Skype with Marie-Angélique and Abenet about CG Core v2
<li>Trim leading, trailing, and consecutive whitespace on all columns, but I didn't notice very many issues</li>
<li>Validate affiliations against latest list of top 1500 terms using reconcile-csv, correcting and standardizing about twenty-seven</li>
<li>Validate countries against latest list of countries using reconcile-csv, correcting three</li>
<li>Convert all DOIs to &ldquo;<a href="https://dx.doi.org">https://dx.doi.org</a>&rdquo; format</li>
<li>Convert all DOIs to &ldquo;<a href="https://dx.doi.org%22">https://dx.doi.org&quot;</a> format</li>
<li>Normalize all <code>cg.identifier.url</code> Google book fields to &ldquo;books.google.com&rdquo;</li>
<li>Correct some inconsistencies in IITA subjects</li>
<li>Correct two incorrect &ldquo;Peer Review&rdquo; in <code>dc.description.version</code></li>
@ -209,11 +209,11 @@ $ wc -l iita-agrovoc*
<li>Then make a new list to use with reconcile-csv by adding line numbers with csvcut and changing the line number header to <code>id</code>:</li>
</ul>
<pre><code>$ csvcut -c name -l 2019-06-10-subjects-matched.txt | sed 's/line_number/id/' &gt; 2019-06-10-subjects-matched.csv
</code></pre><h2 id="20190620">2019-06-20</h2>
</code></pre><h2 id="2019-06-20">2019-06-20</h2>
<ul>
<li>Share some feedback about AReS v2 with the colleagues and encourage them to do the same</li>
</ul>
<h2 id="20190623">2019-06-23</h2>
<h2 id="2019-06-23">2019-06-23</h2>
<ul>
<li>Continue work on reviewing the CG Core v2 standard and its implications for the CGSpace and DSpace platforms in general
<ul>
@ -226,7 +226,7 @@ $ wc -l iita-agrovoc*
<pre><code>$ podman pull docker.io/library/postgres:9.6-alpine
$ podman rm dspacedb
$ podman run --name dspacedb -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
</code></pre><h2 id="20190625">2019-06-25</h2>
</code></pre><h2 id="2019-06-25">2019-06-25</h2>
<ul>
<li>Normalize <code>text_lang</code> values for metadata on DSpace Test and CGSpace:</li>
</ul>
@ -240,7 +240,7 @@ UPDATE 2
<li>Upload 202 IITA records from earlier this month (20194th.xls) to CGSpace</li>
<li>Communicate with Bioversity contractor in charge of their migration from Typo3 to CGSpace</li>
</ul>
<h2 id="20190628">2019-06-28</h2>
<h2 id="2019-06-28">2019-06-28</h2>
<ul>
<li>Start looking at the fifty-seven AfricaRice records sent by Ibnou earlier this month
<ul>
@ -275,7 +275,7 @@ UPDATE 2
</ul>
</li>
</ul>
<h2 id="20190630">2019-06-30</h2>
<h2 id="2019-06-30">2019-06-30</h2>
<ul>
<li>Upload fifty-seven AfricaRice records to <a href="https://dspacetest.cgiar.org/handle/10568/102274">DSpace Test</a>
<ul>

View File

@ -35,7 +35,7 @@ CGSpace
Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -116,7 +116,7 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
</p>
</header>
<h2 id="20190701">2019-07-01</h2>
<h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
@ -205,7 +205,7 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
-Dcom.sun.management.jmxremote.port=1337
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
</code></pre><h2 id="20190702">2019-07-02</h2>
</code></pre><h2 id="2019-07-02">2019-07-02</h2>
<ul>
<li>Help upload twenty-seven posters from the 2019-05 Sharefair to CGSpace
<ul>
@ -229,11 +229,11 @@ $ dspace import -a -e me@cgiar.org -m 2019-07-02-Sharefair.map -s /tmp/Sharefair
</ul>
</li>
</ul>
<h2 id="20190703">2019-07-03</h2>
<h2 id="2019-07-03">2019-07-03</h2>
<ul>
<li>Atmire responded about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr issue</a> and said they would be willing to help</li>
</ul>
<h2 id="20190704">2019-07-04</h2>
<h2 id="2019-07-04">2019-07-04</h2>
<ul>
<li>Maria Garruccio sent me some new ORCID identifiers for Bioversity authors
<ul>
@ -255,11 +255,11 @@ $ ./resolve-orcids.py -i /tmp/2019-07-04-orcid-ids.txt -o 2019-07-04-orcid-names
<li>But when I ran <code>fix-metadata-values.py</code> I didn't see any changes:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i 2019-07-04-update-orcids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -m 240 -t correct -d
</code></pre><h2 id="20190706">2019-07-06</h2>
</code></pre><h2 id="2019-07-06">2019-07-06</h2>
<ul>
<li>Send a reminder to Marie about my notes on the <a href="https://github.com/AgriculturalSemantics/cg-core/issues/2">CG Core v2 issue I created two weeks ago</a></li>
</ul>
<h2 id="20190708">2019-07-08</h2>
<h2 id="2019-07-08">2019-07-08</h2>
<ul>
<li>Communicate with Atmire about the Solr statistics cores issue
<ul>
@ -297,7 +297,7 @@ dc.identifier.issn
978-3-319-58789-9
2320-7035
2593-9173
</code></pre><h2 id="20190709">2019-07-09</h2>
</code></pre><h2 id="2019-07-09">2019-07-09</h2>
<ul>
<li>Thinking about data cleaning automation again and found some resources about Python and Pandas:
<ul>
@ -306,7 +306,7 @@ dc.identifier.issn
</ul>
</li>
</ul>
<h2 id="20190711">2019-07-11</h2>
<h2 id="2019-07-11">2019-07-11</h2>
<ul>
<li>Skype call with Marie Angelique about CG Core v2
<ul>
@ -329,7 +329,7 @@ dc.identifier.issn
</code></pre><ul>
<li>I'm assuming something happened in his browser (like a refresh) after the item was submitted&hellip;</li>
</ul>
<h2 id="20190712">2019-07-12</h2>
<h2 id="2019-07-12">2019-07-12</h2>
<ul>
<li>Atmire responded with some initial feedback about our Tomcat configuration related to the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685">Solr issue I raised recently</a>
<ul>
@ -350,7 +350,7 @@ dc.identifier.issn
<pre><code># su - postgres
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (167394);'
UPDATE 1
</code></pre><h2 id="20190716">2019-07-16</h2>
</code></pre><h2 id="2019-07-16">2019-07-16</h2>
<ul>
<li>Completely reset the Podman configuration on my laptop because there were some layers that I couldn't delete and it had been some time since I did a cleanup:</li>
</ul>
@ -371,7 +371,7 @@ $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-s
<li>Start working on implementing the <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">CG Core v2 changes</a> on my local DSpace test environment</li>
<li>Make a pull request to CG Core v2 with some fixes for typos in the specification (<a href="https://github.com/AgriculturalSemantics/cg-core/pull/5">#5</a>)</li>
</ul>
<h2 id="20190718">2019-07-18</h2>
<h2 id="2019-07-18">2019-07-18</h2>
<ul>
<li>Talk to Moayad about the remaining issues for OpenRXV / AReS
<ul>
@ -394,7 +394,7 @@ Please see the DSpace documentation for assistance.
</code></pre><ul>
<li>I emailed ICT to ask them to reset it and make the expiration period longer if possible</li>
</ul>
<h2 id="20190719">2019-07-19</h2>
<h2 id="2019-07-19">2019-07-19</h2>
<ul>
<li>ICT reset the password for the CGSpace support account and apparently removed the expiry requirement
<ul>
@ -402,7 +402,7 @@ Please see the DSpace documentation for assistance.
</ul>
</li>
</ul>
<h2 id="20190720">2019-07-20</h2>
<h2 id="2019-07-20">2019-07-20</h2>
<ul>
<li>Create an account for Lionelle Samnick on CGSpace because the registration isn't working for some reason:</li>
</ul>
@ -417,7 +417,7 @@ Please see the DSpace documentation for assistance.
<li>Some invalid ISSNs in dc.identifier.issn (they look like ISBNs)</li>
<li>I see some ISSNs in the dc.identifier.isbn field</li>
<li>I see some invalid ISBNs that look like Excel errors (9,78E+12)</li>
<li>For DOI we just use the URL, not &ldquo;DOI: <a href="https://doi.org..">https://doi.org..</a>.&rdquo;</li>
<li>For DOI we just use the URL, not &ldquo;DOI: <a href="https://doi.org...%22">https://doi.org...&quot;</a></li>
<li>I see an invalid &ldquo;LEAVE BLANK&rdquo; in the cg.contributor.crp field</li>
<li>Country field is using &ldquo;,&rdquo; for multiple values instead of &ldquo;||&rdquo;</li>
<li>Region field is using &ldquo;,&rdquo; for multiple values instead of &ldquo;||&rdquo;</li>
@ -426,7 +426,7 @@ Please see the DSpace documentation for assistance.
</ul>
</li>
</ul>
<h2 id="20190722">2019-07-22</h2>
<h2 id="2019-07-22">2019-07-22</h2>
<ul>
<li>Raise an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/8">issue on CG Core v2 spec regarding country and region coverage</a>
<ul>
@ -445,7 +445,7 @@ Please see the DSpace documentation for assistance.
<li>I left a note saying that DSpace is technically limited to a flat schema so we use <code>cg.coverage.country: Kenya</code></li>
<li>Do a little more work on CG Core v2 in the input forms</li>
</ul>
<h2 id="20190725">2019-07-25</h2>
<h2 id="2019-07-25">2019-07-25</h2>
<ul>
<li>
<p>Generate a list of the ORCID identifiers that we added to CGSpace in 2019 for Sara Jani at ICARDA</p>
@ -461,7 +461,7 @@ Please see the DSpace documentation for assistance.
<li>A few strange publishers after splitting multi-value cells, like &ldquo;(Belgium)&rdquo;</li>
<li>Deleted four ISSNs that are actually ISBNs and are already present in the ISBN field</li>
<li>Eight invalid ISBNs</li>
<li>Convert all DOIs to &ldquo;<a href="https://doi.org">https://doi.org</a>&rdquo; format and fix one invalid DOI</li>
<li>Convert all DOIs to &ldquo;<a href="https://doi.org%22">https://doi.org&quot;</a> format and fix one invalid DOI</li>
<li>Fix a handful of incorrect CRPs that seem to have been split on comma &ldquo;,&rdquo;</li>
<li>Lots of strange values in cg.link.reference, and I normalized all DOIs to <a href="https://doi.org">https://doi.org</a> format
<ul>
@ -488,7 +488,7 @@ from stdnum import issn
isbn.validate('978-92-9043-389-7')
issn.validate('1020-3362')
</code></pre><h2 id="20190726">2019-07-26</h2>
</code></pre><h2 id="2019-07-26">2019-07-26</h2>
<ul>
<li>
<p>Bioversity sent me an updated CSV file that fixes some of the issues I pointed out yesterday</p>
@ -506,7 +506,7 @@ issn.validate('1020-3362')
</code></pre><ul>
<li>I whipped up a quick script using Python Pandas to do whitespace cleanup</li>
</ul>
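<ul>
<li>The core of the whitespace cleanup fits in a few lines of Pandas; a minimal, untested version (the filename is illustrative):</li>
</ul>
<pre><code># Minimal version of the Pandas whitespace cleanup: strip leading and
# trailing whitespace and collapse consecutive whitespace in every cell
import pandas as pd

df = pd.read_csv('bioversity-migration.csv', dtype=str)
df = df.apply(lambda column: column.str.strip().str.replace(r'\s{2,}', ' ', regex=True))
df.to_csv('bioversity-migration-cleaned.csv', index=False)
</code></pre>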
<h2 id="20190729">2019-07-29</h2>
<h2 id="2019-07-29">2019-07-29</h2>
<ul>
<li>I turned the Pandas script into a proper Python package called <a href="https://git.sr.ht/~alanorth/csv-metadata-quality">csv-metadata-quality</a>
<ul>
@ -520,7 +520,7 @@ issn.validate('1020-3362')
</li>
<li>Inform Bioversity that there is an error in their CSV, seemingly caused by quotes in the citation field</li>
</ul>
<h2 id="20190730">2019-07-30</h2>
<h2 id="2019-07-30">2019-07-30</h2>
<ul>
<li>Add support for removing newlines (line feeds) to <a href="https://git.sr.ht/~alanorth/csv-metadata-quality">csv-metadata-quality</a></li>
<li>On the subject of validating some of our fields like countries and regions, Abenet pointed out that these should all be valid AGROVOC terms, so we can actually try to validate against that!</li>

View File

@ -43,7 +43,7 @@ After rebooting, all statistics cores were loaded&hellip; wow, that&#39;s lucky.
Run system updates on DSpace Test (linode19) and reboot it
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -124,11 +124,11 @@ Run system updates on DSpace Test (linode19) and reboot it
</p>
</header>
<h2 id="20190803">2019-08-03</h2>
<h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
@ -139,7 +139,7 @@ Run system updates on DSpace Test (linode19) and reboot it
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul>
<h2 id="20190805">2019-08-05</h2>
<h2 id="2019-08-05">2019-08-05</h2>
<ul>
<li>Update Tomcat to 7.0.96 in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li>
<li>Update PostgreSQL JDBC driver to 42.2.6 in the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li>
@ -201,7 +201,7 @@ Run system updates on DSpace Test (linode19) and reboot it
<li>I tried to extract the filenames and construct a URL to download the PDFs with my <code>generate-thumbnails.py</code> script, but there seem to be several paths for PDFs so I can't guess it properly</li>
<li>I will have to wait for Francesco to respond about the PDFs, or perhaps proceed with a metadata-only upload so we can do other checks on DSpace Test</li>
</ul>
<h2 id="20190806">2019-08-06</h2>
<h2 id="2019-08-06">2019-08-06</h2>
<ul>
<li>Francesca responded to address my feedback yesterday
<ul>
@ -213,11 +213,11 @@ Run system updates on DSpace Test (linode19) and reboot it
</ul>
</li>
</ul>
<h2 id="20190807">2019-08-07</h2>
<h2 id="2019-08-07">2019-08-07</h2>
<ul>
<li>Daniel Haile-Michael asked about using a logical OR with the DSpace OpenSearch, but I looked in the DSpace manual and it does not seem to be possible</li>
</ul>
<h2 id="20190808">2019-08-08</h2>
<h2 id="2019-08-08">2019-08-08</h2>
<ul>
<li>Moayad noticed that the HTTPS certificate expired on the AReS dev server (linode20)
<ul>
@ -274,7 +274,7 @@ $ ./generate-thumbnails.py -i /tmp/user-upload2.csv -w --url-field-name url -d |
<li>Though I am really wondering why this happened now, because the configuration has been working for months&hellip;</li>
<li>Improve the output of the suspicious characters check in <a href="https://github.com/alanorth/csv-metadata-quality">csv-metadata-quality</a> script and tag version 0.2.0</li>
</ul>
<h2 id="20190809">2019-08-09</h2>
<h2 id="2019-08-09">2019-08-09</h2>
<ul>
<li>Looking at the 128 IITA records (20195TH.xls) that Sisay uploaded to DSpace Test last month: <a href="https://dspacetest.cgiar.org/handle/10568/102361">IITA_July_29</a>
<ul>
@ -294,11 +294,11 @@ $ ./generate-thumbnails.py -i /tmp/user-upload2.csv -w --url-field-name url -d |
</ul>
</li>
</ul>
<h2 id="20190810">2019-08-10</h2>
<h2 id="2019-08-10">2019-08-10</h2>
<ul>
<li>Add checks for uncommon filename extensions and replacements for unnecessary Unicode to the csv-metadata-quality script</li>
</ul>
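<ul>
<li>The extension check can be as simple as a whitelist comparison; a sketch of the idea (this list of extensions is illustrative, not the script's actual list):</li>
</ul>
<pre><code>import os

# Illustrative whitelist; the real script's list may differ
COMMON_EXTENSIONS = {'.pdf', '.doc', '.docx', '.xls', '.xlsx', '.jpg', '.png'}

def check_filename(filename):
    extension = os.path.splitext(filename)[1].lower()
    if extension and extension not in COMMON_EXTENSIONS:
        print(f'Uncommon file extension: {filename}')

check_filename('report.pdf.lck')  # flagged
check_filename('report.pdf')      # passes silently
</code></pre>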
<h2 id="20190812">2019-08-12</h2>
<h2 id="2019-08-12">2019-08-12</h2>
<ul>
<li>Looking at the 128 IITA records again:
<ul>
@ -317,7 +317,7 @@ $ ./generate-thumbnails.py -i /tmp/user-upload2.csv -w --url-field-name url -d |
</ul>
</li>
</ul>
<h2 id="20190813">2019-08-13</h2>
<h2 id="2019-08-13">2019-08-13</h2>
<ul>
<li>Create a test user on DSpace Test for Mohammad Salem to attempt depositing:</li>
</ul>
@ -343,7 +343,7 @@ $ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com
<li>This time it succeeded, and using VisualVM I noticed that the import process used a maximum of 620MB of RAM</li>
<li>(oops, I realize that actually I forgot to delete items I had flagged as duplicates, so the total should be 1,427 items)</li>
</ul>
<h2 id="20190814">2019-08-14</h2>
<h2 id="2019-08-14">2019-08-14</h2>
<ul>
<li>I imported the 1,427 Bioversity records into DSpace Test
<ul>
@ -359,11 +359,11 @@ $ dspace metadata-import -f /tmp/bioversity2.csv -e blah@blah.com
</code></pre><ul>
<li>The next step is to check these items for duplicates</li>
</ul>
<h2 id="20190816">2019-08-16</h2>
<h2 id="2019-08-16">2019-08-16</h2>
<ul>
<li>Email Bioversity to let them know that the 1,427 records are on DSpace Test and that Abenet should look over them</li>
</ul>
<h2 id="20190818">2019-08-18</h2>
<h2 id="2019-08-18">2019-08-18</h2>
<ul>
<li>Deploy latest <code>5_x-prod</code> branch on CGSpace (linode18), including the <a href="https://github.com/ilri/DSpace/pull/429">new CCAFS project tags</a></li>
<li>Deploy Tomcat 7.0.96 and PostgreSQL JDBC 42.2.6 driver on CGSpace (linode18)</li>
@ -375,7 +375,7 @@ $ dspace metadata-import -f /tmp/bioversity2.csv -e blah@blah.com
<li>After reboot the statistics-2018 core failed to load so I restarted <code>tomcat7</code> again</li>
<li>After this last restart all Solr cores seem to be up and running</li>
</ul>
<h2 id="20190820">2019-08-20</h2>
<h2 id="2019-08-20">2019-08-20</h2>
<ul>
<li>Francesco sent me a new CSV with the raw filenames and paths for the Bioversity migration
<ul>
@ -392,7 +392,7 @@ return os.path.basename(value)
<li>Then I can try to download all the files again with the script</li>
<li>I also asked Francesco about the strange filenames (.LCK, .zip, and .7z)</li>
</ul>
<h2 id="20190821">2019-08-21</h2>
<h2 id="2019-08-21">2019-08-21</h2>
<ul>
<li>Upload <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality repository to ILRI's GitHub organization</a></li>
<li>Fix a few invalid countries in IITA's <a href="https://dspacetest.cgiar.org/handle/10568/102361">July 29</a> records (aka &ldquo;20195TH.xls&rdquo;)
@ -402,16 +402,16 @@ return os.path.basename(value)
</ul>
</li>
</ul>
<h2 id="20190822">2019-08-22</h2>
<h2 id="2019-08-22">2019-08-22</h2>
<ul>
<li>Transfer original <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> repository to ILRI organization on GitHub</li>
</ul>
<h2 id="20190823">2019-08-23</h2>
<h2 id="2019-08-23">2019-08-23</h2>
<ul>
<li>Run system updates on AReS / OpenRXV dev server (linode20) and reboot it</li>
<li>Fix AReS exports on DSpace Test by adding a new nginx proxy pass</li>
</ul>
<h2 id="20190826">2019-08-26</h2>
<h2 id="2019-08-26">2019-08-26</h2>
<ul>
<li>Peter sent 2,943 corrections to the author dump I had originally sent him on 2019-05-27
<ul>
@ -448,7 +448,7 @@ sys 2m24.715s
</ul>
</li>
</ul>
<h2 id="20190827">2019-08-27</h2>
<h2 id="2019-08-27">2019-08-27</h2>
<ul>
<li>File <a href="https://github.com/ilri/OpenRXV/issues/11">an issue on OpenRXV</a> for the bug when selecting communities</li>
<li>Peter approved the related citation changes so I merged the <a href="https://github.com/ilri/DSpace/pull/430">pull request on GitHub</a> and will deploy it to CGSpace this weekend</li>
@ -461,7 +461,7 @@ sys 2m24.715s
</li>
<li>Add a fix for missing space after commas to my <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> script and tag version 0.2.2</li>
</ul>
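<ul>
<li>The comma-space fix boils down to a single regular expression; a sketch of the idea (not the exact implementation):</li>
</ul>
<pre><code>import re

def fix_comma_space(value):
    # Insert a space after any comma that is directly followed by
    # a non-whitespace character, e.g. 'Orth,Alan' becomes 'Orth, Alan'
    return re.sub(r',(\S)', r', \1', value)

print(fix_comma_space('Orth,Alan'))
</code></pre>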
<h2 id="20190828">2019-08-28</h2>
<h2 id="2019-08-28">2019-08-28</h2>
<ul>
<li>Skype with Jane about AReS Phase III priorities</li>
<li>I did a test to automatically fix some authors in the database using my csv-metadata-quality script
@ -488,7 +488,7 @@ COPY 65597
</code></pre><ul>
<li>I very well might run these on CGSpace soon&hellip;</li>
</ul>
<h2 id="20190829">2019-08-29</h2>
<h2 id="2019-08-29">2019-08-29</h2>
<ul>
<li>Resume working on the CG Core v2 changes in the <code>5_x-cgcorev2</code> branch again
<ul>
@ -522,7 +522,7 @@ COPY 65597
</code></pre><ul>
<li>So this is the same issue we had before, where Altmetric <em>knows</em> this Handle is associated with a DOI that has a score, but the client-side JavaScript code doesn't show it because it seems to be a secondary handle or something</li>
</ul>
<h2 id="20190831">2019-08-31</h2>
<h2 id="2019-08-31">2019-08-31</h2>
<ul>
<li>Run system updates on DSpace Test (linode19) and reboot the server</li>
<li>Run the author fixes on DSpace Test and CGSpace and start a full Discovery re-index:</li>

View File

@ -69,7 +69,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -150,7 +150,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
</p>
</header>
<h2 id="20190901">2019-09-01</h2>
<h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
@ -198,7 +198,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
</code></pre><ul>
<li>I'm not sure why the outbound traffic rate was so high&hellip;</li>
</ul>
<h2 id="20190902">2019-09-02</h2>
<h2 id="2019-09-02">2019-09-02</h2>
<ul>
<li>Follow up with Carol and Francesca from Bioversity as they were on holiday during the mid-to-late August
<ul>
@ -208,7 +208,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
</ul>
</li>
</ul>
<h2 id="20190910">2019-09-10</h2>
<h2 id="2019-09-10">2019-09-10</h2>
<ul>
<li>Altmetric responded to say that they have fixed an issue with their badge code so now research outputs with multiple handles are showing badges!
<ul>
@ -232,7 +232,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
</ul>
</li>
</ul>
<h2 id="20190911">2019-09-11</h2>
<h2 id="2019-09-11">2019-09-11</h2>
<ul>
<li>Maria Garruccio asked me to add two new Bioversity ORCID identifiers to CGSpace so I created a <a href="https://github.com/ilri/DSpace/pull/431">pull request</a></li>
<li>Marissa Van Epp asked me to add new CCAFS Phase II project tags to CGSpace so I created a <a href="https://github.com/ilri/DSpace/pull/432">pull request</a>
@ -246,11 +246,11 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
</ul>
</li>
</ul>
<h2 id="20190912">2019-09-12</h2>
<h2 id="2019-09-12">2019-09-12</h2>
<ul>
<li>Deploy <a href="https://jdbc.postgresql.org/">PostgreSQL JDBC driver</a> version 42.2.7 on DSpace Test and update the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a></li>
</ul>
<h2 id="20190915">2019-09-15</h2>
<h2 id="2019-09-15">2019-09-15</h2>
<ul>
<li>Deploy Bioversity ORCID identifier updates to CGSpace</li>
<li>Deploy PostgreSQL JDBC driver 42.2.7 on CGSpace</li>
@ -309,7 +309,7 @@ dspace.log.2019-09-15:808
</ul>
</li>
</ul>
<h2 id="20190919">2019-09-19</h2>
<h2 id="2019-09-19">2019-09-19</h2>
<ul>
<li>For some reason my podman PostgreSQL container isn't working so I had to use Docker to re-create it for my testing work today:</li>
</ul>
@ -394,7 +394,7 @@ $ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-s
</ul>
</li>
</ul>
<h2 id="20190920">2019-09-20</h2>
<h2 id="2019-09-20">2019-09-20</h2>
<ul>
<li>Deploy a fresh snapshot of CGSpace's PostgreSQL database on DSpace Test so we can get more accurate duplicate checking with the upcoming Bioversity and IITA migrations</li>
<li>Skype with Carol and Francesca to discuss the Bioversity migration to CGSpace
@ -457,7 +457,7 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
</ul>
</li>
</ul>
<h2 id="20190921">2019-09-21</h2>
<h2 id="2019-09-21">2019-09-21</h2>
<ul>
<li>Re-upload the <a href="https://dspacetest.cgiar.org/handle/10568/105116">IITA Sept 6 (20196th.xls) records to DSpace Test</a> after I did the re-sync yesterday
<ul>
@ -480,7 +480,7 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
</ul>
</li>
</ul>
<h2 id="20190924">2019-09-24</h2>
<h2 id="2019-09-24">2019-09-24</h2>
<ul>
<li>Bosede fixed a few of the things I mentioned in her Sept 6 batch records, but there were still issues
<ul>
@ -489,7 +489,7 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
</ul>
</li>
</ul>
<h2 id="20190926">2019-09-26</h2>
<h2 id="2019-09-26">2019-09-26</h2>
<ul>
<li>Release <a href="https://github.com/ilri/csv-metadata-quality/releases/tag/v0.3.0">version 0.3.0 of the csv-metadata-quality</a> tool
<ul>
@ -511,7 +511,7 @@ $ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institut
<li>The csv-metadata-quality tool caught a few records with excessive spacing and unnecessary Unicode</li>
<li>I could potentially use this with reconcile-csv and OpenRefine as a source to validate our institutional authors against&hellip;</li>
</ul>
<h2 id="20190927">2019-09-27</h2>
<h2 id="2019-09-27">2019-09-27</h2>
<ul>
<li>Skype with Peter and Abenet about CGSpace actions
<ul>

View File

@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2019"/>
<meta name="twitter:description" content="2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U&#43;00A0) there that would otherwise be removed by the csv-metadata-quality script&#39;s &ldquo;unneccesary Unicode&rdquo; fix: $ csvcut -c &#39;id,dc."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -96,7 +96,7 @@
</p>
</header>
<h2 id="20191001">2019-10-01</h2>
<h2 id="2019-10-01">2019-10-01</h2>
<ul>
<li>Udana from IWMI asked me for a CSV export of their community on CGSpace
<ul>
@ -118,11 +118,11 @@
<li>That fixed 153 items (unnecessary Unicode, duplicates, commaspace fixes, etc)</li>
<li>Release <a href="https://github.com/ilri/csv-metadata-quality/releases/tag/v0.3.1">version 0.3.1 of the csv-metadata-quality script</a> with the non-breaking spaces change</li>
</ul>
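<ul>
<li>For reference, the non-breaking space handling is a targeted replacement rather than outright removal; a sketch with a hypothetical title:</li>
</ul>
<pre><code># U+00A0 is a non-breaking space; replace it with a regular space
title = 'Integrated water resource management\u00a0in the basin'
print(title.replace('\u00a0', ' '))
</code></pre>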
<h2 id="20191003">2019-10-03</h2>
<h2 id="2019-10-03">2019-10-03</h2>
<ul>
<li>Upload the 117 IITA records that we had been working on last month (aka 20196th.xls aka Sept 6) to CGSpace</li>
</ul>
<h2 id="20191004">2019-10-04</h2>
<h2 id="2019-10-04">2019-10-04</h2>
<ul>
<li>Create an account for Bioversity's ICT consultant Francesco on DSpace Test:</li>
</ul>
@ -135,7 +135,7 @@
</ul>
</li>
</ul>
<h2 id="20191006">2019-10-06</h2>
<h2 id="2019-10-06">2019-10-06</h2>
<ul>
<li>Hector from CCAFS responded about my feedback of their CLARISA API
<ul>
@ -149,7 +149,7 @@
</ul>
</li>
</ul>
<h2 id="20191008">2019-10-08</h2>
<h2 id="2019-10-08">2019-10-08</h2>
<ul>
<li>Fix 108 more issues with authors in the ongoing Bioversity migration on DSpace Test, for example:
<ul>
@ -165,7 +165,7 @@
</ul>
</li>
</ul>
<h2 id="20191009">2019-10-09</h2>
<h2 id="2019-10-09">2019-10-09</h2>
<ul>
<li>Continue working on identifying duplicates in the Bioversity migration
<ul>
@ -176,7 +176,7 @@
</li>
<li>Run all system updates on DSpace Test (linode19) and reboot the server</li>
</ul>
<h2 id="20191010">2019-10-10</h2>
<h2 id="2019-10-10">2019-10-10</h2>
<ul>
<li>Felix Shaw from Earlham emailed me to ask about his admin account on DSpace Test
<ul>
@ -186,7 +186,7 @@
</li>
</ul>
<pre><code>$ dspace user -a -m wow@me.com -g Felix -s Shaw -p 'fuananaaa'
</code></pre><h2 id="20191011">2019-10-11</h2>
</code></pre><h2 id="2019-10-11">2019-10-11</h2>
<ul>
<li>I ran the DSpace cleanup function on CGSpace and it found some errors:</li>
</ul>
@ -200,7 +200,7 @@ Error: ERROR: update or delete on table &quot;bitstream&quot; violates foreign k
<pre><code># su - postgres
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (171221);'
UPDATE 1
</code></pre><h2 id="20191012">2019-10-12</h2>
</code></pre><h2 id="2019-10-12">2019-10-12</h2>
<ul>
<li>More work on identifying duplicates in the Bioversity migration data on DSpace Test
<ul>
@ -238,7 +238,7 @@ International Maize and Wheat Improvement Centre,International Maize and Wheat I
</ul>
</li>
</ul>
<h2 id="20191013">2019-10-13</h2>
<h2 id="2019-10-13">2019-10-13</h2>
<ul>
<li>More cleanup work on the authors in the Bioversity migration
<ul>
@ -280,7 +280,7 @@ real 82m35.993s
</code></pre><ul>
<li>So I'm still not sure where these weird authors in the &ldquo;Top Author&rdquo; stats are coming from</li>
</ul>
<h2 id="20191014">2019-10-14</h2>
<h2 id="2019-10-14">2019-10-14</h2>
<ul>
<li>I talked to Peter about the Bioversity items and he said that we should add the institutional authors back to <code>dc.contributor.author</code>, because I had moved them to <code>cg.contributor.affiliation</code>
<ul>
@ -288,7 +288,7 @@ real 82m35.993s
</ul>
</li>
</ul>
<h2 id="20191015">2019-10-15</h2>
<h2 id="2019-10-15">2019-10-15</h2>
<ul>
<li>I did a test export / import of the Bioversity migration items on DSpace Test
<ul>
@ -314,16 +314,16 @@ $ dspace import -a -c 10568/104057 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map
</code></pre><ul>
<li>After importing the 1,367 items I re-exported the metadata, changed the owning collections to those based on their type, then re-imported them</li>
</ul>
<h2 id="20191021">2019-10-21</h2>
<h2 id="2019-10-21">2019-10-21</h2>
<ul>
<li>Re-sync the DSpace Test database and assetstore with CGSpace</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul>
<h2 id="20191024">2019-10-24</h2>
<h2 id="2019-10-24">2019-10-24</h2>
<ul>
<li>Create a test user for Mohammad Salem to test depositing from MEL to DSpace Test, as the last one I had created in 2019-08 was cleared when we re-synchronized DSpace Test with CGSpace recently.</li>
</ul>
<h2 id="20191025">2019-10-25</h2>
<h2 id="2019-10-25">2019-10-25</h2>
<ul>
<li>Give a presentation (via WebEx) about open source software to the ILRI Open Access Week
<ul>
@ -332,7 +332,7 @@ $ dspace import -a -c 10568/104057 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map
</ul>
</li>
</ul>
<h2 id="20191028">2019-10-28</h2>
<h2 id="2019-10-28">2019-10-28</h2>
<ul>
<li>Move the CGSpace CG Core v2 notes from a <a href="https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409">GitHub Gist</a> to a <a href="/cgspace-notes/cgspace-cgcorev2-migration/">page</a> on this site for archive and searchability sake</li>
<li>Work on the CG Core v2 implementation testing
@ -345,7 +345,7 @@ $ dspace import -a -c 10568/104057 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map
</ul>
</li>
</ul>
<h2 id="20191029">2019-10-29</h2>
<h2 id="2019-10-29">2019-10-29</h2>
<ul>
<li>After more digging in the source I found out why the <code>dcterms.title</code> and <code>dcterms.creator</code> fields are not present in the DRI <code>pageMeta</code>&hellip;
<ul>

View File

@ -55,7 +55,7 @@ Let&#39;s see how many of the REST API requests were for bitstreams (because the
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E &quot;[0-9]{1,2}/Oct/2019&quot; | grep -c -E &quot;/rest/bitstreams&quot;
106781
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -136,7 +136,7 @@ Let&#39;s see how many of the REST API requests were for bitstreams (because the
</p>
</header>
<h2 id="20191104">2019-11-04</h2>
<h2 id="2019-11-04">2019-11-04</h2>
<ul>
<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul>
@ -251,7 +251,7 @@ $ http --print Hh 'https://dspacetest.cgiar.org/bitstream/handle/10568/105487/cs
</ul>
</li>
</ul>
<h2 id="20191105">2019-11-05</h2>
<h2 id="2019-11-05">2019-11-05</h2>
<ul>
<li>I added &ldquo;alanfuu2&rdquo; to the example spiders file, restarted Tomcat, then made two requests to DSpace Test:</li>
</ul>
@ -271,7 +271,7 @@ $ http --print b 'http://localhost:8081/solr/statistics/select?q=userAgent:alanf
<li>Even though the &ldquo;mark by user agent&rdquo; function is not working (see email to dspace-tech mailing list) DSpace will still not log Solr events from these user agents</li>
</ul>
</li>
<li>I'm curious how the special character matching is in Solr, so I will test two requests: one with &ldquo;<a href="http://www.gnip.com">www.gnip.com</a>&rdquo; which is in the spider list, and one with &ldquo;<a href="http://www.gnyp.com">www.gnyp.com</a>&rdquo; which isn't:</li>
<li>I'm curious how the special character matching is in Solr, so I will test two requests: one with &ldquo;<a href="http://www.gnip.com%22">www.gnip.com&quot;</a> which is in the spider list, and one with &ldquo;<a href="http://www.gnyp.com%22">www.gnyp.com&quot;</a> which isn't:</li>
</ul>
<pre><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;www.gnip.com&quot;
$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/105487' User-Agent:&quot;www.gnyp.com&quot;
@ -286,7 +286,7 @@ $ http --print b 'http://localhost:8081/solr/statistics/select?q=userAgent:www.g
</code></pre><ul>
<li>So the blocking seems to be working because &ldquo;www.gnip.com&rdquo; is one of the new patterns added to the spiders file&hellip;</li>
</ul>
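<ul>
<li>The same check is easy to script; a sketch using Python's requests library against the local Solr, assuming the statistics core is reachable on port 8081 as above:</li>
</ul>
<pre><code>import requests

# Count hits recorded for a given user agent in the statistics core
params = {
    'q': 'userAgent:"www.gnip.com"',
    'rows': 0,
    'wt': 'json',
}
r = requests.get('http://localhost:8081/solr/statistics/select', params=params)
print(r.json()['response']['numFound'])
</code></pre>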
<h2 id="20191107">2019-11-07</h2>
<h2 id="2019-11-07">2019-11-07</h2>
<ul>
<li>CCAFS finally confirmed that they do indeed need the confusing new project tag that looks like a duplicate
<ul>
@ -353,7 +353,7 @@ $ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&a
</code></pre><ul>
<li>That answers Peter's question about why the stats jumped in October&hellip;</li>
</ul>
<h2 id="20191108">2019-11-08</h2>
<h2 id="2019-11-08">2019-11-08</h2>
<ul>
<li>I saw a bunch of user agents that have the literal string <code>User-Agent</code> in their user agent HTTP header, for example:
<ul>
@ -367,7 +367,7 @@ $ http --print b 'http://localhost:8081/solr/statistics-2018/select?facet=true&a
</li>
<li>I filed <a href="https://github.com/atmire/COUNTER-Robots/issues/27">an issue</a> on the COUNTER-Robots project to see if they agree to add <code>User-Agent:</code> to the list of robot user agents</li>
</ul>
<h2 id="20191109">2019-11-09</h2>
<h2 id="2019-11-09">2019-11-09</h2>
<ul>
<li>Deploy the latest <code>5_x-prod</code> branch on CGSpace (linode18)
<ul>
@ -391,7 +391,7 @@ istics-2014 statistics-2013 statistics-2012 statistics-2011 statistics-2010; do
</code></pre><ul>
<li>Open a <a href="https://github.com/atmire/COUNTER-Robots/pull/28">pull request</a> against COUNTER-Robots to remove unnecessary escaping of dashes</li>
</ul>
<h2 id="20191112">2019-11-12</h2>
<h2 id="2019-11-12">2019-11-12</h2>
<ul>
<li>Udana and Chandima emailed me to ask why <a href="https://hdl.handle.net/10568/81236">one of their WLE items</a> that is mapped from IWMI only shows up in the IWMI &ldquo;department&rdquo; on the Altmetric dashboard
<ul>
@ -406,7 +406,7 @@ istics-2014 statistics-2013 statistics-2012 statistics-2011 statistics-2010; do
</ul>
</li>
</ul>
<h2 id="20191113">2019-11-13</h2>
<h2 id="2019-11-13">2019-11-13</h2>
<ul>
<li>The <a href="https://hdl.handle.net/10568/97087">item with a low Altmetric score for its Handle</a> that I tweeted yesterday still hasn't linked with the DOI's score
<ul>
@ -437,7 +437,7 @@ $ http &quot;http://localhost:8081/solr/statistics/select?q=userAgent:/Scrapoo\/
</code></pre><ul>
<li>I updated the <code>check-spider-hits.sh</code> script to use the POST syntax, and I'm evaluating the feasibility of including the regex search patterns from the spider agent file, as I had been filtering them out due to differences in PCRE and Solr regex syntax and issues with shell handling</li>
</ul>
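<ul>
<li>For reference, the purge itself is a Solr delete-by-query; a sketch of the POST syntax in Python (the user agent here is just an example):</li>
</ul>
<pre><code>import requests

# Delete all statistics hits matching a spider user agent and commit
payload = {'delete': {'query': 'userAgent:"www.gnip.com"'}}
r = requests.post(
    'http://localhost:8081/solr/statistics/update?commit=true',
    json=payload,
)
print(r.status_code)
</code></pre>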
<h2 id="20191114">2019-11-14</h2>
<h2 id="2019-11-14">2019-11-14</h2>
<ul>
<li>IWMI sent a few new ORCID identifiers for us to add to our controlled vocabulary</li>
<li>I will merge them with our existing list and then resolve their names using my <code>resolve-orcids.py</code> script:</li>
@ -459,7 +459,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</ul>
</li>
</ul>
<h2 id="20191115">2019-11-15</h2>
<h2 id="2019-11-15">2019-11-15</h2>
<ul>
<li>Run the new version of <code>check-spider-hits.sh</code> on CGSpace's Solr statistics cores one by one, starting from the oldest just in case something goes wrong</li>
<li>But then I noticed that some (all?) of the hits weren't actually getting purged, all of which were using regular expressions like:
@ -509,7 +509,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</code></pre><ul>
<li>Run system updates on DSpace Test and reboot the server</li>
</ul>
<h2 id="20191117">2019-11-17</h2>
<h2 id="2019-11-17">2019-11-17</h2>
<ul>
<li>Altmetric support responded about our dashboard question, asking if the second &ldquo;department&rdquo; (aka WLE's collection) was added recently and might not have been in the last harvesting yet
<ul>
@ -529,13 +529,13 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</li>
<li>Finally deploy <code>5_x-cgcorev2</code> branch on DSpace Test</li>
</ul>
<h2 id="20191118">2019-11-18</h2>
<h2 id="2019-11-18">2019-11-18</h2>
<ul>
<li>I sent a mail to the CGSpace partners in Addis about the CG Core v2 changes on DSpace Test</li>
<li>Then I filed an <a href="https://github.com/AgriculturalSemantics/cg-core/issues/11">issue on the CG Core GitHub</a> to let the metadata people know about our progress</li>
<li>It seems like I will do a session about CG Core v2 implementation and limitations in DSpace for the data workshop in December in Nairobi (?)</li>
</ul>
<h2 id="20191119">2019-11-19</h2>
<h2 id="2019-11-19">2019-11-19</h2>
<ul>
<li>Export IITA's community from CGSpace because they want to experiment with importing it into their internal DSpace for some testing or something
<ul>
@ -560,11 +560,11 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</code></pre><ul>
<li>All in all that's about 85,000 more hits purged, in addition to the 3.4 million I purged last week</li>
</ul>
<h2 id="20191120">2019-11-20</h2>
<h2 id="2019-11-20">2019-11-20</h2>
<ul>
<li>Email Usman Muchlish from CIFOR to see what he's doing with their DSpace lately</li>
</ul>
<h2 id="20191121">2019-11-21</h2>
<h2 id="2019-11-21">2019-11-21</h2>
<ul>
<li>Discuss bugs and issues with AReS v2 that are limiting its adoption
<ul>
@ -583,7 +583,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</li>
<li>We have a meeting about AReS future developments with Jane, Abenet, Peter, and Enrico tomorrow</li>
</ul>
<h2 id="20191122">2019-11-22</h2>
<h2 id="2019-11-22">2019-11-22</h2>
<ul>
<li>Skype with Jane, Abenet, Peter, and Enrico about AReS v2 future development
<ul>
@ -594,7 +594,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</ul>
</li>
</ul>
<h2 id="20191124">2019-11-24</h2>
<h2 id="2019-11-24">2019-11-24</h2>
<ul>
<li>I rebooted DSpace Test (linode19) and it kernel panicked at boot
<ul>
@ -609,7 +609,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</ul>
</li>
</ul>
<h2 id="20191125">2019-11-25</h2>
<h2 id="2019-11-25">2019-11-25</h2>
<ul>
<li>The migration of DSpace Test from Fremont, CA (USA) to Frankfurt (DE) region completed
<ul>
@ -617,7 +617,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</ul>
</li>
</ul>
<h2 id="20191126">2019-11-26</h2>
<h2 id="2019-11-26">2019-11-26</h2>
<ul>
<li>Visit CodeObia to discuss future of OpenRXV and AReS
<ul>
@ -627,7 +627,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</ul>
</li>
</ul>
<h2 id="20191127">2019-11-27</h2>
<h2 id="2019-11-27">2019-11-27</h2>
<ul>
<li>Minor updates on the <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a>
<ul>
@ -652,7 +652,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
</ul>
</li>
</ul>
<h2 id="20191128">2019-11-28</h2>
<h2 id="2019-11-28">2019-11-28</h2>
<ul>
<li>File an issue with CG Core v2 project to ask Marie-Angelique about expanding the scope of <code>cg.peer-reviewed</code> to include other types of review, and possibly to change the field name to something more generic like <code>cg.review-status</code> (<a href="https://github.com/AgriculturalSemantics/cg-core/issues/14">#14</a>)</li>
<li>More review of AReS feedback

View File

@ -24,7 +24,7 @@ Make sure all packages are up to date and the package manager is up to date, the
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-12/" />
<meta property="article:published_time" content="2019-12-01T11:22:30+02:00" />
<meta property="article:modified_time" content="2019-12-11T18:20:20+02:00" />
<meta property="article:modified_time" content="2019-12-11T19:02:05+02:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="December, 2019"/>
@ -43,7 +43,7 @@ Make sure all packages are up to date and the package manager is up to date, the
# dpkg -C
# reboot
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -53,9 +53,9 @@ Make sure all packages are up to date and the package manager is up to date, the
"@type": "BlogPosting",
"headline": "December, 2019",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-12\/",
"wordCount": "731",
"wordCount": "880",
"datePublished": "2019-12-01T11:22:30+02:00",
"dateModified": "2019-12-11T18:20:20+02:00",
"dateModified": "2019-12-11T19:02:05+02:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -124,7 +124,7 @@ Make sure all packages are up to date and the package manager is up to date, the
</p>
</header>
<h2 id="20191201">2019-12-01</h2>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
@ -173,7 +173,7 @@ Make sure all packages are up to date and the package manager is up to date, the
</ul>
</li>
</ul>
<h2 id="20191202">2019-12-02</h2>
<h2 id="2019-12-02">2019-12-02</h2>
<ul>
<li>Raise the issue of old, low-quality thumbnails with Peter and the CGSpace team
<ul>
@ -193,7 +193,7 @@ $ http 'https://dspacetest.cgiar.org/oai/request?verb=GetRecord&amp;metadataPref
<li>The DSpace Test ones actually now capture the DOI, where the CGSpace doesn't&hellip;</li>
<li>And the DSpace Test one doesn't include review status as <code>dc.description</code>, but I don't think that's an important field</li>
</ul>
<h2 id="20191204">2019-12-04</h2>
<h2 id="2019-12-04">2019-12-04</h2>
<ul>
<li>Peter noticed that there were about seventy items on CGSpace that were marked as private
<ul>
@ -203,7 +203,7 @@ $ http 'https://dspacetest.cgiar.org/oai/request?verb=GetRecord&amp;metadataPref
</ul>
<pre><code>dspace=# \COPY (SELECT handle, owning_collection FROM item, handle WHERE item.discoverable='f' AND item.in_archive='t' AND handle.resource_id = item.item_id) to /tmp/2019-12-04-CGSpace-private-items.csv WITH CSV HEADER;
COPY 48
</code></pre><h2 id="20191205">2019-12-05</h2>
</code></pre><h2 id="2019-12-05">2019-12-05</h2>
<ul>
<li>Give <a href="https://hdl.handle.net/10568/106045">presentation about CG Core v2</a> to the MEL Developers&rsquo; Retreat in Nairobi, Kenya (via Skype)</li>
<li>Send some pull requests to the cg-core schema repository:
@ -214,7 +214,7 @@ COPY 48
</ul>
</li>
</ul>
<h2 id="20191208">2019-12-08</h2>
<h2 id="2019-12-08">2019-12-08</h2>
<ul>
<li>Enrico noticed that the AReS Explorer on CGSpace (linode18) was down
<ul>
@ -224,7 +224,7 @@ COPY 48
</ul>
</li>
</ul>
<h2 id="20191209">2019-12-09</h2>
<h2 id="2019-12-09">2019-12-09</h2>
<ul>
<li>Update PostgreSQL JDBC driver to <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.9">version 42.2.9</a> in <a href="https://github.com/ilri/rmg-ansible-public">Ansible playbooks</a>
<ul>
@ -237,7 +237,7 @@ COPY 48
</ul>
</li>
</ul>
<h2 id="20191211">2019-12-11</h2>
<h2 id="2019-12-11">2019-12-11</h2>
<ul>
<li>Post <a href="https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=454830191804416">message to Yammer about good practices for thumbnails on CGSpace</a>
<ul>
@ -253,6 +253,32 @@ COPY 48
</li>
<li>While I was restarting the Tomcat service I upgraded the PostgreSQL JDBC driver to version 42.2.9, which had been deployed on DSpace Test earlier this week</li>
</ul>
<h2 id="2019-12-16">2019-12-16</h2>
<ul>
<li>Visit CodeObia office to discuss next phase of OpenRXV/AReS development
<ul>
<li>We discussed using CSV instead of Excel for tabular reports
<ul>
<li>OpenRXV should only have &ldquo;simple&rdquo; reports with Dublin Core fields</li>
<li>AReS should have this as well as a customized &ldquo;extended&rdquo; report that has CRPs, Subjects, Sponsors, etc from CGSpace</li>
</ul>
</li>
<li>We discussed using RTF instead of Word for graphical reports</li>
</ul>
</li>
</ul>
<h2 id="2019-12-17">2019-12-17</h2>
<ul>
<li>Start filing GitHub issues for the reporting features on OpenRXV and AReS
<ul>
<li>I created an issue for the &ldquo;simple&rdquo; tabular reports on OpenRXV GitHub (<a href="https://github.com/ilri/OpenRXV/issues/29">#29</a>)</li>
<li>I created an issue for the &ldquo;extended&rdquo; tabular reports on AReS GitHub (<a href="https://github.com/ilri/AReS/issues/8">#8</a>)</li>
<li>I created an issue for &ldquo;simple&rdquo; text reports on the OpenRXV GitHub (<a href="https://github.com/ilri/OpenRXV/issues/30">#30</a>)</li>
<li>I created an issue for &ldquo;extended&rdquo; text reports on the AReS GitHub (<a href="https://github.com/ilri/AReS/issues/9">#9</a>)</li>
</ul>
</li>
<li>I looked into creating RTF documents from HTML in Node.js and there is a library called <a href="https://www.npmjs.com/package/html-to-rtf">html-to-rtf</a> that works well, but doesn't support images</li>
</ul>
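<ul>
<li>As an alternative to the Node.js route, Pandoc can also write RTF; a hedged sketch via the pypandoc wrapper (assuming the pandoc binary is installed, and noting I haven't checked how it handles images either):</li>
</ul>
<pre><code>import pypandoc

html = '&lt;h1&gt;Report&lt;/h1&gt;&lt;p&gt;Some &lt;em&gt;formatted&lt;/em&gt; text.&lt;/p&gt;'

# Convert an HTML fragment to RTF; pandoc does the heavy lifting
rtf = pypandoc.convert_text(html, 'rtf', format='html')

with open('/tmp/report.rtf', 'w') as f:
    f.write(rtf)
</code></pre>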
<!-- raw HTML omitted -->

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="404 Page not found"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20191201">2019-12-01</h2>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
@ -131,7 +131,7 @@
</p>
</header>
<h2 id="20191104">2019-11-04</h2>
<h2 id="2019-11-04">2019-11-04</h2>
<ul>
<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul>
@ -208,7 +208,7 @@
</p>
</header>
<h2 id="20190901">2019-09-01</h2>
<h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
@ -253,11 +253,11 @@
</p>
</header>
<h2 id="20190803">2019-08-03</h2>
<h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
@ -285,7 +285,7 @@
</p>
</header>
<h2 id="20190701">2019-07-01</h2>
<h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
@ -313,12 +313,12 @@
</p>
</header>
<h2 id="20190602">2019-06-02</h2>
<h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>
@ -339,7 +339,7 @@
</p>
</header>
<h2 id="20190501">2019-05-01</h2>
<h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
@ -372,7 +372,7 @@ DELETE 1
</p>
</header>
<h2 id="20190401">2019-04-01</h2>
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -84,7 +84,7 @@
</p>
</header>
<h2 id="20191201">2019-12-01</h2>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
@ -116,7 +116,7 @@
</p>
</header>
<h2 id="20191104">2019-11-04</h2>
<h2 id="2019-11-04">2019-11-04</h2>
<ul>
<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul>
@ -193,7 +193,7 @@
</p>
</header>
<h2 id="20190901">2019-09-01</h2>
<h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
@ -238,11 +238,11 @@
</p>
</header>
<h2 id="20190803">2019-08-03</h2>
<h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
@ -270,7 +270,7 @@
</p>
</header>
<h2 id="20190701">2019-07-01</h2>
<h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
@ -298,12 +298,12 @@
</p>
</header>
<h2 id="20190602">2019-06-02</h2>
<h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>
@ -324,7 +324,7 @@
</p>
</header>
<h2 id="20190501">2019-05-01</h2>
<h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
@ -357,7 +357,7 @@ DELETE 1
</p>
</header>
<h2 id="20190401">2019-04-01</h2>
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>

View File

@ -17,7 +17,7 @@
<pubDate>Sun, 01 Dec 2019 11:22:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-12/</guid>
<description>&lt;h2 id=&#34;20191201&#34;&gt;2019-12-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-12-01&#34;&gt;2019-12-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Upgrade CGSpace (linode18) to Ubuntu 18.04:
&lt;ul&gt;
@ -40,7 +40,7 @@
<pubDate>Mon, 04 Nov 2019 12:20:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-11/</guid>
<description>&lt;h2 id=&#34;20191104&#34;&gt;2019-11-04&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-11-04&#34;&gt;2019-11-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
&lt;ul&gt;
@ -88,7 +88,7 @@
<pubDate>Sun, 01 Sep 2019 10:17:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-09/</guid>
<description>&lt;h2 id=&#34;20190901&#34;&gt;2019-09-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-09-01&#34;&gt;2019-09-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning&lt;/li&gt;
&lt;li&gt;Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:&lt;/li&gt;
@ -124,11 +124,11 @@
<pubDate>Sat, 03 Aug 2019 12:39:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-08/</guid>
<description>&lt;h2 id=&#34;20190803&#34;&gt;2019-08-03&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-08-03&#34;&gt;2019-08-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Look at Bioversity&#39;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&amp;hellip;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20190804&#34;&gt;2019-08-04&lt;/h2&gt;
&lt;h2 id=&#34;2019-08-04&#34;&gt;2019-08-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Deploy ORCID identifier updates requested by Bioversity to CGSpace&lt;/li&gt;
&lt;li&gt;Run system updates on CGSpace (linode18) and reboot it
@ -147,7 +147,7 @@
<pubDate>Mon, 01 Jul 2019 12:13:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-07/</guid>
<description>&lt;h2 id=&#34;20190701&#34;&gt;2019-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-07-01&#34;&gt;2019-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Create an &amp;ldquo;AfricaRice books and book chapters&amp;rdquo; collection on CGSpace for AfricaRice&lt;/li&gt;
&lt;li&gt;Last month Sisay asked why the following &amp;ldquo;most popular&amp;rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
@ -166,12 +166,12 @@
<pubDate>Sun, 02 Jun 2019 10:57:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-06/</guid>
<description>&lt;h2 id=&#34;20190602&#34;&gt;2019-06-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-06-02&#34;&gt;2019-06-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Merge the &lt;a href=&#34;https://github.com/ilri/DSpace/pull/425&#34;&gt;Solr filterCache&lt;/a&gt; and &lt;a href=&#34;https://github.com/ilri/DSpace/pull/426&#34;&gt;XMLUI ISI journal&lt;/a&gt; changes to the &lt;code&gt;5_x-prod&lt;/code&gt; branch and deploy on CGSpace&lt;/li&gt;
&lt;li&gt;Run system updates on CGSpace (linode18) and reboot it&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20190603&#34;&gt;2019-06-03&lt;/h2&gt;
&lt;h2 id=&#34;2019-06-03&#34;&gt;2019-06-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Skype with Marie-Angélique and Abenet about &lt;a href=&#34;https://agriculturalsemantics.github.io/cg-core/cgcore.html&#34;&gt;CG Core v2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
@ -183,7 +183,7 @@
<pubDate>Wed, 01 May 2019 07:37:43 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-05/</guid>
<description>&lt;h2 id=&#34;20190501&#34;&gt;2019-05-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-05-01&#34;&gt;2019-05-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace&lt;/li&gt;
&lt;li&gt;A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
@ -207,7 +207,7 @@ DELETE 1
<pubDate>Mon, 01 Apr 2019 09:00:43 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-04/</guid>
<description>&lt;h2 id=&#34;20190401&#34;&gt;2019-04-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-04-01&#34;&gt;2019-04-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
&lt;ul&gt;
@ -239,7 +239,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<pubDate>Fri, 01 Mar 2019 12:16:30 +0100</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-03/</guid>
<description>&lt;h2 id=&#34;20190301&#34;&gt;2019-03-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-03-01&#34;&gt;2019-03-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I checked IITA&#39;s 259 Feb 14 records from last month for duplicates using Atmire&#39;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good&lt;/li&gt;
&lt;li&gt;I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&amp;hellip;&lt;/li&gt;
@ -262,7 +262,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<pubDate>Fri, 01 Feb 2019 21:37:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-02/</guid>
<description>&lt;h2 id=&#34;20190201&#34;&gt;2019-02-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-02-01&#34;&gt;2019-02-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!&lt;/li&gt;
&lt;li&gt;The top IPs before, during, and after this latest alert tonight were:&lt;/li&gt;
@ -298,7 +298,7 @@ sys 0m1.979s
<pubDate>Wed, 02 Jan 2019 09:48:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-01/</guid>
<description>&lt;h2 id=&#34;20190102&#34;&gt;2019-01-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-01-02&#34;&gt;2019-01-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning&lt;/li&gt;
&lt;li&gt;I don&#39;t see anything interesting in the web server logs around that time though:&lt;/li&gt;
@ -323,13 +323,13 @@ sys 0m1.979s
<pubDate>Sun, 02 Dec 2018 02:09:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-12/</guid>
<description>&lt;h2 id=&#34;20181201&#34;&gt;2018-12-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-12-01&#34;&gt;2018-12-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK&lt;/li&gt;
&lt;li&gt;I manually installed OpenJDK, then removed Oracle JDK, then re-ran the &lt;a href=&#34;http://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible playbook&lt;/a&gt; to update all configuration files, etc&lt;/li&gt;
&lt;li&gt;Then I ran all system updates and restarted the server&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20181202&#34;&gt;2018-12-02&lt;/h2&gt;
&lt;h2 id=&#34;2018-12-02&#34;&gt;2018-12-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another &lt;a href=&#34;https://usn.ubuntu.com/3831-1/&#34;&gt;Ghostscript vulnerability last week&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
@ -341,12 +341,12 @@ sys 0m1.979s
<pubDate>Thu, 01 Nov 2018 16:41:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-11/</guid>
<description>&lt;h2 id=&#34;20181101&#34;&gt;2018-11-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-11-01&#34;&gt;2018-11-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Finalize AReS Phase I and Phase II ToRs&lt;/li&gt;
&lt;li&gt;Send a note about my &lt;a href=&#34;https://github.com/ilri/dspace-statistics-api&#34;&gt;dspace-statistics-api&lt;/a&gt; to the dspace-tech mailing list&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20181103&#34;&gt;2018-11-03&lt;/h2&gt;
&lt;h2 id=&#34;2018-11-03&#34;&gt;2018-11-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage&lt;/li&gt;
&lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt;
@ -359,7 +359,7 @@ sys 0m1.979s
<pubDate>Mon, 01 Oct 2018 22:31:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-10/</guid>
<description>&lt;h2 id=&#34;20181001&#34;&gt;2018-10-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-10-01&#34;&gt;2018-10-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items&lt;/li&gt;
&lt;li&gt;I created a GitHub issue to track this &lt;a href=&#34;https://github.com/ilri/DSpace/issues/389&#34;&gt;#389&lt;/a&gt;, because I&#39;m super busy in Nairobi right now&lt;/li&gt;
@ -372,7 +372,7 @@ sys 0m1.979s
<pubDate>Sun, 02 Sep 2018 09:55:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-09/</guid>
<description>&lt;h2 id=&#34;20180902&#34;&gt;2018-09-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-09-02&#34;&gt;2018-09-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;a href=&#34;https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5&#34;&gt;PostgreSQL JDBC driver version 42.2.5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;I&#39;ll update the DSpace role in our &lt;a href=&#34;https://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible infrastructure playbooks&lt;/a&gt; and run the updated playbooks on CGSpace and DSpace Test&lt;/li&gt;
@ -387,7 +387,7 @@ sys 0m1.979s
<pubDate>Wed, 01 Aug 2018 11:52:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-08/</guid>
<description>&lt;h2 id=&#34;20180801&#34;&gt;2018-08-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-08-01&#34;&gt;2018-08-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;DSpace Test had crashed at some point yesterday morning and I see the following in &lt;code&gt;dmesg&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
@ -410,7 +410,7 @@ sys 0m1.979s
<pubDate>Sun, 01 Jul 2018 12:56:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-07/</guid>
<description>&lt;h2 id=&#34;20180701&#34;&gt;2018-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-07-01&#34;&gt;2018-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:&lt;/li&gt;
&lt;/ul&gt;
@ -428,7 +428,7 @@ sys 0m1.979s
<pubDate>Mon, 04 Jun 2018 19:49:54 -0700</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-06/</guid>
<description>&lt;h2 id=&#34;20180604&#34;&gt;2018-06-04&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-06-04&#34;&gt;2018-06-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Test the &lt;a href=&#34;https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560&#34;&gt;DSpace 5.8 module upgrades from Atmire&lt;/a&gt; (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/378&#34;&gt;#378&lt;/a&gt;)
&lt;ul&gt;
@ -457,7 +457,7 @@ sys 2m7.289s
<pubDate>Tue, 01 May 2018 16:43:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-05/</guid>
<description>&lt;h2 id=&#34;20180501&#34;&gt;2018-05-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-05-01&#34;&gt;2018-05-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
&lt;ul&gt;
@ -476,7 +476,7 @@ sys 2m7.289s
<pubDate>Sun, 01 Apr 2018 16:13:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-04/</guid>
<description>&lt;h2 id=&#34;20180401&#34;&gt;2018-04-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-04-01&#34;&gt;2018-04-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I tried to test something on DSpace Test but noticed that it&#39;s down since god knows when&lt;/li&gt;
&lt;li&gt;Catalina logs at least show some memory errors yesterday:&lt;/li&gt;
@ -489,7 +489,7 @@ sys 2m7.289s
<pubDate>Fri, 02 Mar 2018 16:07:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-03/</guid>
<description>&lt;h2 id=&#34;20180302&#34;&gt;2018-03-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-03-02&#34;&gt;2018-03-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Export a CSV of the IITA community metadata for Martin Mueller&lt;/li&gt;
&lt;/ul&gt;</description>
@ -501,7 +501,7 @@ sys 2m7.289s
<pubDate>Thu, 01 Feb 2018 16:28:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-02/</guid>
<description>&lt;h2 id=&#34;20180201&#34;&gt;2018-02-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-02-01&#34;&gt;2018-02-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Peter gave feedback on the &lt;code&gt;dc.rights&lt;/code&gt; proof of concept that I had sent him last week&lt;/li&gt;
&lt;li&gt;We don&#39;t need to distinguish between internal and external works, so that makes it just a simple list&lt;/li&gt;
@ -516,7 +516,7 @@ sys 2m7.289s
<pubDate>Tue, 02 Jan 2018 08:35:54 -0800</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-01/</guid>
<description>&lt;h2 id=&#34;20180102&#34;&gt;2018-01-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-01-02&#34;&gt;2018-01-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time&lt;/li&gt;
&lt;li&gt;I didn&#39;t get any load alerts from Linode and the REST and XMLUI logs don&#39;t show anything out of the ordinary&lt;/li&gt;
@ -591,7 +591,7 @@ dspace.log.2018-01-02:34
<pubDate>Fri, 01 Dec 2017 13:53:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-12/</guid>
<description>&lt;h2 id=&#34;20171201&#34;&gt;2017-12-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-12-01&#34;&gt;2017-12-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Uptime Robot noticed that CGSpace went down&lt;/li&gt;
&lt;li&gt;The logs say &amp;ldquo;Timeout waiting for idle object&amp;rdquo;&lt;/li&gt;
@ -606,11 +606,11 @@ dspace.log.2018-01-02:34
<pubDate>Thu, 02 Nov 2017 09:37:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-11/</guid>
<description>&lt;h2 id=&#34;20171101&#34;&gt;2017-11-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-11-01&#34;&gt;2017-11-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The CORE developers responded to say they are looking into their bot not respecting our robots.txt&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20171102&#34;&gt;2017-11-02&lt;/h2&gt;
&lt;h2 id=&#34;2017-11-02&#34;&gt;2017-11-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Today there have been no hits by CORE and no alerts from Linode (coincidence?)&lt;/li&gt;
&lt;/ul&gt;
@ -630,7 +630,7 @@ COPY 54701
<pubDate>Sun, 01 Oct 2017 08:07:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-10/</guid>
<description>&lt;h2 id=&#34;20171001&#34;&gt;2017-10-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-10-01&#34;&gt;2017-10-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Peter emailed to point out that many items in the &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/2703&#34;&gt;ILRI archive collection&lt;/a&gt; have multiple handles:&lt;/li&gt;
&lt;/ul&gt;

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -84,7 +84,7 @@
</p>
</header>
<h2 id="20190301">2019-03-01</h2>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
@ -116,7 +116,7 @@
</p>
</header>
<h2 id="20190201">2019-02-01</h2>
<h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
@ -161,7 +161,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20190102">2019-01-02</h2>
<h2 id="2019-01-02">2019-01-02</h2>
<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don't see anything interesting in the web server logs around that time though:</li>
@ -195,13 +195,13 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181201">2018-12-01</h2>
<h2 id="2018-12-01">2018-12-01</h2>
<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
@ -222,12 +222,12 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181101">2018-11-01</h2>
<h2 id="2018-11-01">2018-11-01</h2>
<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
@ -249,7 +249,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181001">2018-10-01</h2>
<h2 id="2018-10-01">2018-10-01</h2>
<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
@ -271,7 +271,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180902">2018-09-02</h2>
<h2 id="2018-09-02">2018-09-02</h2>
<ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
@ -295,7 +295,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180801">2018-08-01</h2>
<h2 id="2018-08-01">2018-08-01</h2>
<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
@ -327,7 +327,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180701">2018-07-01</h2>
<h2 id="2018-07-01">2018-07-01</h2>
<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
@ -354,7 +354,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180604">2018-06-04</h2>
<h2 id="2018-06-04">2018-06-04</h2>
<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul>
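
The 2018-11-03 excerpt above is cut off just before its "top 10 IPs" listing; elsewhere in these notes that kind of list comes from a one-liner over the nginx access logs. A sketch, assuming the default log location and combined log format (zcat --force passes plain, uncompressed files through untouched):

# zcat --force /var/log/nginx/access.log* | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 10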

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -84,7 +84,7 @@
</p>
</header>
<h2 id="20180501">2018-05-01</h2>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
@ -112,7 +112,7 @@
</p>
</header>
<h2 id="20180401">2018-04-01</h2>
<h2 id="2018-04-01">2018-04-01</h2>
<ul>
<li>I tried to test something on DSpace Test but noticed that it's down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li>
@ -134,7 +134,7 @@
</p>
</header>
<h2 id="20180302">2018-03-02</h2>
<h2 id="2018-03-02">2018-03-02</h2>
<ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul>
@ -155,7 +155,7 @@
</p>
</header>
<h2 id="20180201">2018-02-01</h2>
<h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don't need to distinguish between internal and external works, so that makes it just a simple list</li>
@ -179,7 +179,7 @@
</p>
</header>
<h2 id="20180102">2018-01-02</h2>
<h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
@ -263,7 +263,7 @@ dspace.log.2018-01-02:34
</p>
</header>
<h2 id="20171201">2017-12-01</h2>
<h2 id="2017-12-01">2017-12-01</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
@ -287,11 +287,11 @@ dspace.log.2018-01-02:34
</p>
</header>
<h2 id="20171101">2017-11-01</h2>
<h2 id="2017-11-01">2017-11-01</h2>
<ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul>
<h2 id="20171102">2017-11-02</h2>
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
@ -320,7 +320,7 @@ COPY 54701
</p>
</header>
<h2 id="20171001">2017-10-01</h2>
<h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
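
The last item above, about items in the ILRI archive collection having multiple handles, is the kind of thing that can be checked with one query against the metadata table. A sketch for DSpace 5.x, where metadata_field_id=25 (dc.identifier.uri) and resource_type_id=2 (items) are assumptions based on the default metadata registry:

dspace=# SELECT resource_id, COUNT(*) FROM metadatavalue WHERE metadata_field_id=25 AND resource_type_id=2 GROUP BY resource_id HAVING COUNT(*) > 1;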

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20190301">2019-03-01</h2>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
@ -131,7 +131,7 @@
</p>
</header>
<h2 id="20190201">2019-02-01</h2>
<h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
@ -176,7 +176,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20190102">2019-01-02</h2>
<h2 id="2019-01-02">2019-01-02</h2>
<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don't see anything interesting in the web server logs around that time though:</li>
@ -210,13 +210,13 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181201">2018-12-01</h2>
<h2 id="2018-12-01">2018-12-01</h2>
<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
@ -237,12 +237,12 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181101">2018-11-01</h2>
<h2 id="2018-11-01">2018-11-01</h2>
<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
@ -264,7 +264,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181001">2018-10-01</h2>
<h2 id="2018-10-01">2018-10-01</h2>
<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
@ -286,7 +286,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180902">2018-09-02</h2>
<h2 id="2018-09-02">2018-09-02</h2>
<ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
@ -310,7 +310,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180801">2018-08-01</h2>
<h2 id="2018-08-01">2018-08-01</h2>
<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
@ -342,7 +342,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180701">2018-07-01</h2>
<h2 id="2018-07-01">2018-07-01</h2>
<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
@ -369,7 +369,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180604">2018-06-04</h2>
<h2 id="2018-06-04">2018-06-04</h2>
<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul>

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20180501">2018-05-01</h2>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
@ -127,7 +127,7 @@
</p>
</header>
<h2 id="20180401">2018-04-01</h2>
<h2 id="2018-04-01">2018-04-01</h2>
<ul>
<li>I tried to test something on DSpace Test but noticed that it's down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li>
@ -149,7 +149,7 @@
</p>
</header>
<h2 id="20180302">2018-03-02</h2>
<h2 id="2018-03-02">2018-03-02</h2>
<ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul>
@ -170,7 +170,7 @@
</p>
</header>
<h2 id="20180201">2018-02-01</h2>
<h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don't need to distinguish between internal and external works, so that makes it just a simple list</li>
@ -194,7 +194,7 @@
</p>
</header>
<h2 id="20180102">2018-01-02</h2>
<h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
@ -278,7 +278,7 @@ dspace.log.2018-01-02:34
</p>
</header>
<h2 id="20171201">2017-12-01</h2>
<h2 id="2017-12-01">2017-12-01</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
@ -302,11 +302,11 @@ dspace.log.2018-01-02:34
</p>
</header>
<h2 id="20171101">2017-11-01</h2>
<h2 id="2017-11-01">2017-11-01</h2>
<ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul>
<h2 id="20171102">2017-11-02</h2>
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
@ -335,7 +335,7 @@ COPY 54701
</p>
</header>
<h2 id="20171001">2017-10-01</h2>
<h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
@ -381,11 +381,11 @@ COPY 54701
</p>
</header>
<h2 id="20170906">2017-09-06</h2>
<h2 id="2017-09-06">2017-09-06</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group</li>
</ul>
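
The 2018-05-01 excerpt further up mentions clearing the Solr statistics core on DSpace Test with two commands issued directly to the Solr admin interface. The excerpt trails off before showing them, but the standard pair is a delete-by-query followed by a commit; the port and core URL here are assumptions:

$ curl http://localhost:8081/solr/statistics/update -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
$ curl http://localhost:8081/solr/statistics/update -H "Content-Type: text/xml" --data-binary '<commit/>'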

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20170801">2017-08-01</h2>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
@ -138,11 +138,11 @@
</p>
</header>
<h2 id="20170701">2017-07-01</h2>
<h2 id="2017-07-01">2017-07-01</h2>
<ul>
<li>Run system updates and reboot DSpace Test</li>
</ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
@ -201,7 +201,7 @@
</p>
</header>
<h2 id="20170402">2017-04-02</h2>
<h2 id="2017-04-02">2017-04-02</h2>
<ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
@ -230,11 +230,11 @@
</p>
</header>
<h2 id="20170301">2017-03-01</h2>
<h2 id="2017-03-01">2017-03-01</h2>
<ul>
<li>Run the 279 CIAT author corrections on CGSpace</li>
</ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -266,7 +266,7 @@
</p>
</header>
<h2 id="20170207">2017-02-07</h2>
<h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
@ -300,7 +300,7 @@ DELETE 1
</p>
</header>
<h2 id="20170102">2017-01-02</h2>
<h2 id="2017-01-02">2017-01-02</h2>
<ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn't work there either</li>
@ -323,7 +323,7 @@ DELETE 1
</p>
</header>
<h2 id="20161202">2016-12-02</h2>
<h2 id="2016-12-02">2016-12-02</h2>
<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
@ -355,7 +355,7 @@ DELETE 1
</p>
</header>
<h2 id="20161101">2016-11-01</h2>
<h2 id="2016-11-01">2016-11-01</h2>
<ul>
<li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>
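
The 2017-02-07 excerpt above stops right before the manual fix, but the DELETE 1 in the following hunk header suggests a single row was removed from the collection-to-item mapping table. A sketch of that kind of cleanup with made-up ids (the real ones are elided in the diff):

dspace=# DELETE FROM collection2item WHERE id = 12345 AND item_id = 67890;
DELETE 1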

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20161003">2016-10-03</h2>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
@ -129,7 +129,7 @@
</p>
</header>
<h2 id="20160901">2016-09-01</h2>
<h2 id="2016-09-01">2016-09-01</h2>
<ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
@ -155,7 +155,7 @@
</p>
</header>
<h2 id="20160801">2016-08-01</h2>
<h2 id="2016-08-01">2016-08-01</h2>
<ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
@ -185,7 +185,7 @@ $ git rebase -i dspace-5.5
</p>
</header>
<h2 id="20160701">2016-07-01</h2>
<h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
@ -216,7 +216,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160601">2016-06-01</h2>
<h2 id="2016-06-01">2016-06-01</h2>
<ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
@ -242,7 +242,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160501">2016-05-01</h2>
<h2 id="2016-05-01">2016-05-01</h2>
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
@ -268,7 +268,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160404">2016-04-04</h2>
<h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
@ -293,7 +293,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160302">2016-03-02</h2>
<h2 id="2016-03-02">2016-03-02</h2>
<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
@ -316,7 +316,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160205">2016-02-05</h2>
<h2 id="2016-02-05">2016-02-05</h2>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
@ -344,7 +344,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160113">2016-01-13</h2>
<h2 id="2016-01-13">2016-01-13</h2>
<ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
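
The 2016-07-01 excerpt further up mentions a query to find and replace authors with a trailing ",", and the truncated hunk context shows it targets metadata_field_id=3 (dc.contributor.author in the default registry). A hedged sketch of what such a fix might look like; the exact regex is an assumption:

dspacetest=# UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, ',\s*$', '') WHERE metadata_field_id=3 AND text_value ~ ',\s*$';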

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20151202">2015-12-02</h2>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
@ -126,7 +126,7 @@
</p>
</header>
<h2 id="20151122">2015-11-22</h2>
<h2 id="2015-11-22">2015-11-22</h2>
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
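
Further up, the 2015-12-02 entry replaces lzop with xz in the log compression cron jobs because it uses less space. A hypothetical crontab entry in that spirit; the path and age threshold are made up:

0 3 * * * find /home/dspacetest.cgiar.org/log -name 'dspace.log.*' ! -name '*.xz' -mtime +7 -exec xz {} \;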

View File

@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGIAR Library Migration"/>
<meta name="twitter:description" content="Notes on the migration of the CGIAR Library to CGSpace"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -100,7 +100,7 @@
</p>
</header>
<p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p>
<h2 id="premigration-technical-todos">Pre-migration Technical TODOs</h2>
<h2 id="pre-migration-technical-todos">Pre-migration Technical TODOs</h2>
<p>Things that need to happen before the migration:</p>
<ul>
<li><input checked="" disabled="" type="checkbox">Create top-level community on CGSpace to hold the CGIAR Library content: <code>10568/83389</code>

View File

@ -15,7 +15,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace CG Core v2 Migration"/>
<meta name="twitter:description" content="Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20191201">2019-12-01</h2>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
@ -131,7 +131,7 @@
</p>
</header>
<h2 id="20191104">2019-11-04</h2>
<h2 id="2019-11-04">2019-11-04</h2>
<ul>
<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul>
@ -208,7 +208,7 @@
</p>
</header>
<h2 id="20190901">2019-09-01</h2>
<h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
@ -253,11 +253,11 @@
</p>
</header>
<h2 id="20190803">2019-08-03</h2>
<h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
@ -285,7 +285,7 @@
</p>
</header>
<h2 id="20190701">2019-07-01</h2>
<h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
@ -313,12 +313,12 @@
</p>
</header>
<h2 id="20190602">2019-06-02</h2>
<h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>
@ -339,7 +339,7 @@
</p>
</header>
<h2 id="20190501">2019-05-01</h2>
<h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
@ -372,7 +372,7 @@ DELETE 1
</p>
</header>
<h2 id="20190401">2019-04-01</h2>
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>
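
The 2019-05-01 excerpt above mentions regenerating item thumbnails after CCAFS uploaded new PDFs; in DSpace 5.x that is normally done by re-running the media filter over the affected items. A sketch, where the handle and the plugin name are assumptions:

$ dspace filter-media -f -i 10568/12345 -p "ImageMagick PDF Thumbnail"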

View File

@ -17,7 +17,7 @@
<pubDate>Sun, 01 Dec 2019 11:22:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-12/</guid>
<description>&lt;h2 id=&#34;20191201&#34;&gt;2019-12-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-12-01&#34;&gt;2019-12-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Upgrade CGSpace (linode18) to Ubuntu 18.04:
&lt;ul&gt;
@ -40,7 +40,7 @@
<pubDate>Mon, 04 Nov 2019 12:20:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-11/</guid>
<description>&lt;h2 id=&#34;20191104&#34;&gt;2019-11-04&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-11-04&#34;&gt;2019-11-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
&lt;ul&gt;
@ -88,7 +88,7 @@
<pubDate>Sun, 01 Sep 2019 10:17:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-09/</guid>
<description>&lt;h2 id=&#34;20190901&#34;&gt;2019-09-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-09-01&#34;&gt;2019-09-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning&lt;/li&gt;
&lt;li&gt;Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:&lt;/li&gt;
@ -124,11 +124,11 @@
<pubDate>Sat, 03 Aug 2019 12:39:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-08/</guid>
<description>&lt;h2 id=&#34;20190803&#34;&gt;2019-08-03&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-08-03&#34;&gt;2019-08-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Look at Bioversity&#39;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&amp;hellip;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20190804&#34;&gt;2019-08-04&lt;/h2&gt;
&lt;h2 id=&#34;2019-08-04&#34;&gt;2019-08-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Deploy ORCID identifier updates requested by Bioversity to CGSpace&lt;/li&gt;
&lt;li&gt;Run system updates on CGSpace (linode18) and reboot it
@ -147,7 +147,7 @@
<pubDate>Mon, 01 Jul 2019 12:13:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-07/</guid>
<description>&lt;h2 id=&#34;20190701&#34;&gt;2019-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-07-01&#34;&gt;2019-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Create an &amp;ldquo;AfricaRice books and book chapters&amp;rdquo; collection on CGSpace for AfricaRice&lt;/li&gt;
&lt;li&gt;Last month Sisay asked why the following &amp;ldquo;most popular&amp;rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
@ -166,12 +166,12 @@
<pubDate>Sun, 02 Jun 2019 10:57:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-06/</guid>
<description>&lt;h2 id=&#34;20190602&#34;&gt;2019-06-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-06-02&#34;&gt;2019-06-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Merge the &lt;a href=&#34;https://github.com/ilri/DSpace/pull/425&#34;&gt;Solr filterCache&lt;/a&gt; and &lt;a href=&#34;https://github.com/ilri/DSpace/pull/426&#34;&gt;XMLUI ISI journal&lt;/a&gt; changes to the &lt;code&gt;5_x-prod&lt;/code&gt; branch and deploy on CGSpace&lt;/li&gt;
&lt;li&gt;Run system updates on CGSpace (linode18) and reboot it&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20190603&#34;&gt;2019-06-03&lt;/h2&gt;
&lt;h2 id=&#34;2019-06-03&#34;&gt;2019-06-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Skype with Marie-Angélique and Abenet about &lt;a href=&#34;https://agriculturalsemantics.github.io/cg-core/cgcore.html&#34;&gt;CG Core v2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
@ -183,7 +183,7 @@
<pubDate>Wed, 01 May 2019 07:37:43 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-05/</guid>
<description>&lt;h2 id=&#34;20190501&#34;&gt;2019-05-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-05-01&#34;&gt;2019-05-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace&lt;/li&gt;
&lt;li&gt;A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
@ -207,7 +207,7 @@ DELETE 1
<pubDate>Mon, 01 Apr 2019 09:00:43 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-04/</guid>
<description>&lt;h2 id=&#34;20190401&#34;&gt;2019-04-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-04-01&#34;&gt;2019-04-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
&lt;ul&gt;
@ -239,7 +239,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<pubDate>Fri, 01 Mar 2019 12:16:30 +0100</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-03/</guid>
<description>&lt;h2 id=&#34;20190301&#34;&gt;2019-03-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-03-01&#34;&gt;2019-03-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I checked IITA&#39;s 259 Feb 14 records from last month for duplicates using Atmire&#39;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good&lt;/li&gt;
&lt;li&gt;I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&amp;hellip;&lt;/li&gt;
@ -262,7 +262,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<pubDate>Fri, 01 Feb 2019 21:37:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-02/</guid>
<description>&lt;h2 id=&#34;20190201&#34;&gt;2019-02-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-02-01&#34;&gt;2019-02-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!&lt;/li&gt;
&lt;li&gt;The top IPs before, during, and after this latest alert tonight were:&lt;/li&gt;
@ -298,7 +298,7 @@ sys 0m1.979s
<pubDate>Wed, 02 Jan 2019 09:48:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-01/</guid>
<description>&lt;h2 id=&#34;20190102&#34;&gt;2019-01-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-01-02&#34;&gt;2019-01-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning&lt;/li&gt;
&lt;li&gt;I don&#39;t see anything interesting in the web server logs around that time though:&lt;/li&gt;
@ -323,13 +323,13 @@ sys 0m1.979s
<pubDate>Sun, 02 Dec 2018 02:09:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-12/</guid>
<description>&lt;h2 id=&#34;20181201&#34;&gt;2018-12-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-12-01&#34;&gt;2018-12-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK&lt;/li&gt;
&lt;li&gt;I manually installed OpenJDK, then removed Oracle JDK, then re-ran the &lt;a href=&#34;http://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible playbook&lt;/a&gt; to update all configuration files, etc&lt;/li&gt;
&lt;li&gt;Then I ran all system updates and restarted the server&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20181202&#34;&gt;2018-12-02&lt;/h2&gt;
&lt;h2 id=&#34;2018-12-02&#34;&gt;2018-12-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another &lt;a href=&#34;https://usn.ubuntu.com/3831-1/&#34;&gt;Ghostscript vulnerability last week&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
@ -341,12 +341,12 @@ sys 0m1.979s
<pubDate>Thu, 01 Nov 2018 16:41:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-11/</guid>
<description>&lt;h2 id=&#34;20181101&#34;&gt;2018-11-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-11-01&#34;&gt;2018-11-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Finalize AReS Phase I and Phase II ToRs&lt;/li&gt;
&lt;li&gt;Send a note about my &lt;a href=&#34;https://github.com/ilri/dspace-statistics-api&#34;&gt;dspace-statistics-api&lt;/a&gt; to the dspace-tech mailing list&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20181103&#34;&gt;2018-11-03&lt;/h2&gt;
&lt;h2 id=&#34;2018-11-03&#34;&gt;2018-11-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage&lt;/li&gt;
&lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt;
@ -359,7 +359,7 @@ sys 0m1.979s
<pubDate>Mon, 01 Oct 2018 22:31:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-10/</guid>
<description>&lt;h2 id=&#34;20181001&#34;&gt;2018-10-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-10-01&#34;&gt;2018-10-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items&lt;/li&gt;
&lt;li&gt;I created a GitHub issue to track this &lt;a href=&#34;https://github.com/ilri/DSpace/issues/389&#34;&gt;#389&lt;/a&gt;, because I&#39;m super busy in Nairobi right now&lt;/li&gt;
@ -372,7 +372,7 @@ sys 0m1.979s
<pubDate>Sun, 02 Sep 2018 09:55:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-09/</guid>
<description>&lt;h2 id=&#34;20180902&#34;&gt;2018-09-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-09-02&#34;&gt;2018-09-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;a href=&#34;https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5&#34;&gt;PostgreSQL JDBC driver version 42.2.5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;I&#39;ll update the DSpace role in our &lt;a href=&#34;https://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible infrastructure playbooks&lt;/a&gt; and run the updated playbooks on CGSpace and DSpace Test&lt;/li&gt;
@ -387,7 +387,7 @@ sys 0m1.979s
<pubDate>Wed, 01 Aug 2018 11:52:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-08/</guid>
<description>&lt;h2 id=&#34;20180801&#34;&gt;2018-08-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-08-01&#34;&gt;2018-08-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;DSpace Test had crashed at some point yesterday morning and I see the following in &lt;code&gt;dmesg&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
@ -410,7 +410,7 @@ sys 0m1.979s
<pubDate>Sun, 01 Jul 2018 12:56:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-07/</guid>
<description>&lt;h2 id=&#34;20180701&#34;&gt;2018-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-07-01&#34;&gt;2018-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:&lt;/li&gt;
&lt;/ul&gt;
@ -428,7 +428,7 @@ sys 0m1.979s
<pubDate>Mon, 04 Jun 2018 19:49:54 -0700</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-06/</guid>
<description>&lt;h2 id=&#34;20180604&#34;&gt;2018-06-04&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-06-04&#34;&gt;2018-06-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Test the &lt;a href=&#34;https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560&#34;&gt;DSpace 5.8 module upgrades from Atmire&lt;/a&gt; (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/378&#34;&gt;#378&lt;/a&gt;)
&lt;ul&gt;
@ -457,7 +457,7 @@ sys 2m7.289s
<pubDate>Tue, 01 May 2018 16:43:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-05/</guid>
<description>&lt;h2 id=&#34;20180501&#34;&gt;2018-05-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-05-01&#34;&gt;2018-05-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
&lt;ul&gt;
@ -476,7 +476,7 @@ sys 2m7.289s
<pubDate>Sun, 01 Apr 2018 16:13:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-04/</guid>
<description>&lt;h2 id=&#34;20180401&#34;&gt;2018-04-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-04-01&#34;&gt;2018-04-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I tried to test something on DSpace Test but noticed that it&#39;s down since god knows when&lt;/li&gt;
&lt;li&gt;Catalina logs at least show some memory errors yesterday:&lt;/li&gt;
@ -489,7 +489,7 @@ sys 2m7.289s
<pubDate>Fri, 02 Mar 2018 16:07:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-03/</guid>
<description>&lt;h2 id=&#34;20180302&#34;&gt;2018-03-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-03-02&#34;&gt;2018-03-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Export a CSV of the IITA community metadata for Martin Mueller&lt;/li&gt;
&lt;/ul&gt;</description>
@ -501,7 +501,7 @@ sys 2m7.289s
<pubDate>Thu, 01 Feb 2018 16:28:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-02/</guid>
<description>&lt;h2 id=&#34;20180201&#34;&gt;2018-02-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-02-01&#34;&gt;2018-02-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Peter gave feedback on the &lt;code&gt;dc.rights&lt;/code&gt; proof of concept that I had sent him last week&lt;/li&gt;
&lt;li&gt;We don&#39;t need to distinguish between internal and external works, so that makes it just a simple list&lt;/li&gt;
@ -516,7 +516,7 @@ sys 2m7.289s
<pubDate>Tue, 02 Jan 2018 08:35:54 -0800</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-01/</guid>
<description>&lt;h2 id=&#34;20180102&#34;&gt;2018-01-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-01-02&#34;&gt;2018-01-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time&lt;/li&gt;
&lt;li&gt;I didn&#39;t get any load alerts from Linode and the REST and XMLUI logs don&#39;t show anything out of the ordinary&lt;/li&gt;
@ -591,7 +591,7 @@ dspace.log.2018-01-02:34
<pubDate>Fri, 01 Dec 2017 13:53:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-12/</guid>
<description>&lt;h2 id=&#34;20171201&#34;&gt;2017-12-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-12-01&#34;&gt;2017-12-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Uptime Robot noticed that CGSpace went down&lt;/li&gt;
&lt;li&gt;The logs say &amp;ldquo;Timeout waiting for idle object&amp;rdquo;&lt;/li&gt;
@ -606,11 +606,11 @@ dspace.log.2018-01-02:34
<pubDate>Thu, 02 Nov 2017 09:37:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-11/</guid>
<description>&lt;h2 id=&#34;20171101&#34;&gt;2017-11-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-11-01&#34;&gt;2017-11-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The CORE developers responded to say they are looking into their bot not respecting our robots.txt&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20171102&#34;&gt;2017-11-02&lt;/h2&gt;
&lt;h2 id=&#34;2017-11-02&#34;&gt;2017-11-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Today there have been no hits by CORE and no alerts from Linode (coincidence?)&lt;/li&gt;
&lt;/ul&gt;
@ -630,7 +630,7 @@ COPY 54701
<pubDate>Sun, 01 Oct 2017 08:07:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-10/</guid>
<description>&lt;h2 id=&#34;20171001&#34;&gt;2017-10-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-10-01&#34;&gt;2017-10-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Peter emailed to point out that many items in the &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/2703&#34;&gt;ILRI archive collection&lt;/a&gt; have multiple handles:&lt;/li&gt;
&lt;/ul&gt;
@ -656,11 +656,11 @@ COPY 54701
<pubDate>Thu, 07 Sep 2017 16:54:52 +0700</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-09/</guid>
<description>&lt;h2 id=&#34;20170906&#34;&gt;2017-09-06&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-09-06&#34;&gt;2017-09-06&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20170907&#34;&gt;2017-09-07&lt;/h2&gt;
&lt;h2 id=&#34;2017-09-07&#34;&gt;2017-09-07&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Ask Sisay to clean up the WLE approvers a bit, as Marianne&#39;s user account is both in the approvers step as well as the group&lt;/li&gt;
&lt;/ul&gt;</description>
@ -672,7 +672,7 @@ COPY 54701
<pubDate>Tue, 01 Aug 2017 11:51:52 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-08/</guid>
<description>&lt;h2 id=&#34;20170801&#34;&gt;2017-08-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-08-01&#34;&gt;2017-08-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours&lt;/li&gt;
&lt;li&gt;I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)&lt;/li&gt;
@ -702,11 +702,11 @@ COPY 54701
<pubDate>Sat, 01 Jul 2017 18:03:52 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-07/</guid>
<description>&lt;h2 id=&#34;20170701&#34;&gt;2017-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-07-01&#34;&gt;2017-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Run system updates and reboot DSpace Test&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20170704&#34;&gt;2017-07-04&lt;/h2&gt;
&lt;h2 id=&#34;2017-07-04&#34;&gt;2017-07-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Merge changes for WLE Phase II theme rename (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/329&#34;&gt;#329&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Looking at extracting the metadata registries from ICARDA&#39;s MEL DSpace database so we can compare fields with CGSpace&lt;/li&gt;
@ -738,7 +738,7 @@ COPY 54701
<pubDate>Sun, 02 Apr 2017 17:08:52 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-04/</guid>
<description>&lt;h2 id=&#34;20170402&#34;&gt;2017-04-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-04-02&#34;&gt;2017-04-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Merge one change to CCAFS flagships that I had forgotten to remove last month (&amp;ldquo;MANAGING CLIMATE RISK&amp;rdquo;): &lt;a href=&#34;https://github.com/ilri/DSpace/pull/317&#34;&gt;https://github.com/ilri/DSpace/pull/317&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Quick proof-of-concept hack to add &lt;code&gt;dc.rights&lt;/code&gt; to the input form, including some inline instructions/hints:&lt;/li&gt;
@ -758,11 +758,11 @@ COPY 54701
<pubDate>Wed, 01 Mar 2017 17:08:52 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-03/</guid>
<description>&lt;h2 id=&#34;20170301&#34;&gt;2017-03-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-03-01&#34;&gt;2017-03-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Run the 279 CIAT author corrections on CGSpace&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20170302&#34;&gt;2017-03-02&lt;/h2&gt;
&lt;h2 id=&#34;2017-03-02&#34;&gt;2017-03-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace&lt;/li&gt;
&lt;li&gt;CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles&lt;/li&gt;
@ -785,7 +785,7 @@ COPY 54701
<pubDate>Tue, 07 Feb 2017 07:04:52 -0800</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-02/</guid>
<description>&lt;h2 id=&#34;20170207&#34;&gt;2017-02-07&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-02-07&#34;&gt;2017-02-07&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;An item was mapped twice erroneously again, so I had to remove one of the mappings manually:&lt;/li&gt;
&lt;/ul&gt;
@ -810,7 +810,7 @@ DELETE 1
<pubDate>Mon, 02 Jan 2017 10:43:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-01/</guid>
<description>&lt;h2 id=&#34;20170102&#34;&gt;2017-01-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-01-02&#34;&gt;2017-01-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error&lt;/li&gt;
&lt;li&gt;I tested on DSpace Test as well and it doesn&#39;t work there either&lt;/li&gt;
@ -824,7 +824,7 @@ DELETE 1
<pubDate>Fri, 02 Dec 2016 10:43:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-12/</guid>
<description>&lt;h2 id=&#34;20161202&#34;&gt;2016-12-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-12-02&#34;&gt;2016-12-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace was down for five hours in the morning while I was sleeping&lt;/li&gt;
&lt;li&gt;While looking in the logs for errors, I see tons of warnings about Atmire MQM:&lt;/li&gt;
@ -847,7 +847,7 @@ DELETE 1
<pubDate>Tue, 01 Nov 2016 09:21:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-11/</guid>
<description>&lt;h2 id=&#34;20161101&#34;&gt;2016-11-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-11-01&#34;&gt;2016-11-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add &lt;code&gt;dc.type&lt;/code&gt; to the output options for Atmire&#39;s Listings and Reports module (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/286&#34;&gt;#286&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
@ -860,7 +860,7 @@ DELETE 1
<pubDate>Mon, 03 Oct 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-10/</guid>
<description>&lt;h2 id=&#34;20161003&#34;&gt;2016-10-03&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-10-03&#34;&gt;2016-10-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Testing adding &lt;a href=&#34;https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing&#34;&gt;ORCIDs to a CSV&lt;/a&gt; file for a single item to see if the author orders get messed up&lt;/li&gt;
&lt;li&gt;Need to test the following scenarios to see how author order is affected:
@ -881,7 +881,7 @@ DELETE 1
<pubDate>Thu, 01 Sep 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-09/</guid>
<description>&lt;h2 id=&#34;20160901&#34;&gt;2016-09-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-09-01&#34;&gt;2016-09-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors&lt;/li&gt;
&lt;li&gt;Discuss how the migration of CGIAR&#39;s Active Directory to a flat structure will break our LDAP groups in DSpace&lt;/li&gt;
@ -898,7 +898,7 @@ DELETE 1
<pubDate>Mon, 01 Aug 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-08/</guid>
<description>&lt;h2 id=&#34;20160801&#34;&gt;2016-08-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-08-01&#34;&gt;2016-08-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add updated distribution license from Sisay (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/259&#34;&gt;#259&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Play with upgrading Mirage 2 dependencies in &lt;code&gt;bower.json&lt;/code&gt; because most are several versions out of date&lt;/li&gt;
@ -919,7 +919,7 @@ $ git rebase -i dspace-5.5
<pubDate>Fri, 01 Jul 2016 10:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-07/</guid>
<description>&lt;h2 id=&#34;20160701&#34;&gt;2016-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-07-01&#34;&gt;2016-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add &lt;code&gt;dc.description.sponsorship&lt;/code&gt; to Discovery sidebar facets and make investors clickable in item view (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/232&#34;&gt;#232&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;I think this query should find and replace all authors that have &amp;ldquo;,&amp;rdquo; at the end of their names:&lt;/li&gt;
@ -941,7 +941,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 01 Jun 2016 10:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-06/</guid>
<description>&lt;h2 id=&#34;20160601&#34;&gt;2016-06-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-06-01&#34;&gt;2016-06-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Experimenting with IFPRI OAI (we want to harvest their publications)&lt;/li&gt;
&lt;li&gt;After reading the &lt;a href=&#34;https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html&#34;&gt;ContentDM documentation&lt;/a&gt; I found IFPRI&#39;s OAI endpoint: &lt;a href=&#34;http://ebrary.ifpri.org/oai/oai.php&#34;&gt;http://ebrary.ifpri.org/oai/oai.php&lt;/a&gt;&lt;/li&gt;
@ -958,7 +958,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Sun, 01 May 2016 23:06:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-05/</guid>
<description>&lt;h2 id=&#34;20160501&#34;&gt;2016-05-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-05-01&#34;&gt;2016-05-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Since yesterday there have been 10,000 REST errors and the site has been unstable again&lt;/li&gt;
&lt;li&gt;I have blocked access to the API now&lt;/li&gt;
@ -975,7 +975,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Mon, 04 Apr 2016 11:06:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-04/</guid>
<description>&lt;h2 id=&#34;20160404&#34;&gt;2016-04-04&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-04-04&#34;&gt;2016-04-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit&lt;/li&gt;
&lt;li&gt;We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc&lt;/li&gt;
@ -991,7 +991,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 02 Mar 2016 16:50:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-03/</guid>
<description>&lt;h2 id=&#34;20160302&#34;&gt;2016-03-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-03-02&#34;&gt;2016-03-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at issues with author authorities on CGSpace&lt;/li&gt;
&lt;li&gt;For some reason we still have the &lt;code&gt;index-lucene-update&lt;/code&gt; cron job active on CGSpace, but I&#39;m pretty sure we don&#39;t need it as of the latest few versions of Atmire&#39;s Listings and Reports module&lt;/li&gt;
@ -1005,7 +1005,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Fri, 05 Feb 2016 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-02/</guid>
<description>&lt;h2 id=&#34;20160205&#34;&gt;2016-02-05&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-02-05&#34;&gt;2016-02-05&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at some DAGRIS data for Abenet Yabowork&lt;/li&gt;
&lt;li&gt;Lots of issues with spaces, newlines, etc causing the import to fail&lt;/li&gt;
@ -1024,7 +1024,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 13 Jan 2016 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-01/</guid>
<description>&lt;h2 id=&#34;20160113&#34;&gt;2016-01-13&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-01-13&#34;&gt;2016-01-13&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Move ILRI collection &lt;code&gt;10568/12503&lt;/code&gt; from &lt;code&gt;10568/27869&lt;/code&gt; to &lt;code&gt;10568/27629&lt;/code&gt; using the &lt;a href=&#34;https://gist.github.com/alanorth/392c4660e8b022d99dfa&#34;&gt;move_collections.sh&lt;/a&gt; script I wrote last year.&lt;/li&gt;
&lt;li&gt;I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.&lt;/li&gt;
@ -1038,7 +1038,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 02 Dec 2015 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2015-12/</guid>
<description>&lt;h2 id=&#34;20151202&#34;&gt;2015-12-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2015-12-02&#34;&gt;2015-12-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Replace &lt;code&gt;lzop&lt;/code&gt; with &lt;code&gt;xz&lt;/code&gt; in log compression cron jobs on DSpace Test—it uses less space:&lt;/li&gt;
&lt;/ul&gt;
@ -1056,7 +1056,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Mon, 23 Nov 2015 17:00:57 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2015-11/</guid>
<description>&lt;h2 id=&#34;20151122&#34;&gt;2015-11-22&lt;/h2&gt;
<description>&lt;h2 id=&#34;2015-11-22&#34;&gt;2015-11-22&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace went down&lt;/li&gt;
&lt;li&gt;Looks like DSpace exhausted its PostgreSQL connection pool&lt;/li&gt;

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20190301">2019-03-01</h2>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
@ -131,7 +131,7 @@
</p>
</header>
<h2 id="20190201">2019-02-01</h2>
<h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
@ -176,7 +176,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20190102">2019-01-02</h2>
<h2 id="2019-01-02">2019-01-02</h2>
<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don't see anything interesting in the web server logs around that time though:</li>
@ -210,13 +210,13 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181201">2018-12-01</h2>
<h2 id="2018-12-01">2018-12-01</h2>
<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
@ -237,12 +237,12 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181101">2018-11-01</h2>
<h2 id="2018-11-01">2018-11-01</h2>
<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
@ -264,7 +264,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181001">2018-10-01</h2>
<h2 id="2018-10-01">2018-10-01</h2>
<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
@ -286,7 +286,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180902">2018-09-02</h2>
<h2 id="2018-09-02">2018-09-02</h2>
<ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
@ -310,7 +310,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180801">2018-08-01</h2>
<h2 id="2018-08-01">2018-08-01</h2>
<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
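The dmesg output itself is cut off by this hunk. A generic way to check whether the Linux OOM killer was responsible (an illustrative command, not one recorded in these notes) is:

$ dmesg | grep -iE 'oom|killed process'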
@ -342,7 +342,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180701">2018-07-01</h2>
<h2 id="2018-07-01">2018-07-01</h2>
<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
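The backup command is truncated here. A minimal pg_dump in PostgreSQL's custom format would look like this sketch, where the database name, user, and output file are placeholder assumptions:

$ pg_dump -U postgres -O --format=custom -f dspacetest-2018-07-01.backup dspacetest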
@ -369,7 +369,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180604">2018-06-04</h2>
<h2 id="2018-06-04">2018-06-04</h2>
<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul>

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20180501">2018-05-01</h2>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
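The two commands are truncated by the diff. Clearing a Solr core is typically a delete-by-query followed by a commit, roughly like this sketch (the statistics core URL and port are assumptions):

$ curl http://localhost:8081/solr/statistics/update -H 'Content-Type: text/xml' --data-binary '<delete><query>*:*</query></delete>'
$ curl http://localhost:8081/solr/statistics/update -H 'Content-Type: text/xml' --data-binary '<commit/>'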
@ -127,7 +127,7 @@
</p>
</header>
<h2 id="20180401">2018-04-01</h2>
<h2 id="2018-04-01">2018-04-01</h2>
<ul>
<li>I tried to test something on DSpace Test but noticed that it's been down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li>
@ -149,7 +149,7 @@
</p>
</header>
<h2 id="20180302">2018-03-02</h2>
<h2 id="2018-03-02">2018-03-02</h2>
<ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul>
@ -170,7 +170,7 @@
</p>
</header>
<h2 id="20180201">2018-02-01</h2>
<h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don't need to distinguish between internal and external works, so that makes it just a simple list</li>
@ -194,7 +194,7 @@
</p>
</header>
<h2 id="20180102">2018-01-02</h2>
<h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
@ -278,7 +278,7 @@ dspace.log.2018-01-02:34
</p>
</header>
<h2 id="20171201">2017-12-01</h2>
<h2 id="2017-12-01">2017-12-01</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
@ -302,11 +302,11 @@ dspace.log.2018-01-02:34
</p>
</header>
<h2 id="20171101">2017-11-01</h2>
<h2 id="2017-11-01">2017-11-01</h2>
<ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul>
<h2 id="20171102">2017-11-02</h2>
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
@ -335,7 +335,7 @@ COPY 54701
</p>
</header>
<h2 id="20171001">2017-10-01</h2>
<h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
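The example handles are truncated here. In a DSpace 5 database, items with more than one handle can be found with a query along these lines (a sketch, with resource_type_id 2 meaning item):

dspace=# SELECT resource_id, COUNT(*) FROM handle WHERE resource_type_id=2 GROUP BY resource_id HAVING COUNT(*) > 1;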
@ -381,11 +381,11 @@ COPY 54701
</p>
</header>
<h2 id="20170906">2017-09-06</h2>
<h2 id="2017-09-06">2017-09-06</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is in both the approvers step and the group</li>
</ul>

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20170801">2017-08-01</h2>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
@ -138,11 +138,11 @@
</p>
</header>
<h2 id="20170701">2017-07-01</h2>
<h2 id="2017-07-01">2017-07-01</h2>
<ul>
<li>Run system updates and reboot DSpace Test</li>
</ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
@ -201,7 +201,7 @@
</p>
</header>
<h2 id="20170402">2017-04-02</h2>
<h2 id="2017-04-02">2017-04-02</h2>
<ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
@ -230,11 +230,11 @@
</p>
</header>
<h2 id="20170301">2017-03-01</h2>
<h2 id="2017-03-01">2017-03-01</h2>
<ul>
<li>Run the 279 CIAT author corrections on CGSpace</li>
</ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -266,7 +266,7 @@
</p>
</header>
<h2 id="20170207">2017-02-07</h2>
<h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
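Only the DELETE 1 result survives in the hunk header below. Removing a duplicate mapping in DSpace 5 means deleting one row from the collection2item table, roughly like this sketch (the row id is a placeholder):

dspace=# DELETE FROM collection2item WHERE id=<duplicate mapping id>;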
@ -300,7 +300,7 @@ DELETE 1
</p>
</header>
<h2 id="20170102">2017-01-02</h2>
<h2 id="2017-01-02">2017-01-02</h2>
<ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn't work there either</li>
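For reference, the yearly sharding task is DSpace's stats-util; the stock invocation would be something like the following sketch, not necessarily the exact logged run:

$ [dspace]/bin/dspace stats-util -s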
@ -323,7 +323,7 @@ DELETE 1
</p>
</header>
<h2 id="20161202">2016-12-02</h2>
<h2 id="2016-12-02">2016-12-02</h2>
<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
@ -355,7 +355,7 @@ DELETE 1
</p>
</header>
<h2 id="20161101">2016-11-01</h2>
<h2 id="2016-11-01">2016-11-01</h2>
<ul>
<li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20161003">2016-10-03</h2>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
@ -129,7 +129,7 @@
</p>
</header>
<h2 id="20160901">2016-09-01</h2>
<h2 id="2016-09-01">2016-09-01</h2>
<ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
@ -155,7 +155,7 @@
</p>
</header>
<h2 id="20160801">2016-08-01</h2>
<h2 id="2016-08-01">2016-08-01</h2>
<ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
@ -185,7 +185,7 @@ $ git rebase -i dspace-5.5
</p>
</header>
<h2 id="20160701">2016-07-01</h2>
<h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
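The query itself is cut off by the diff (a fragment survives in the hunk header below). Against DSpace 5's metadatavalue table it would plausibly be a sketch like this, with metadata_field_id 3 being dc.contributor.author:

dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, ',$', '') WHERE metadata_field_id=3 AND text_value LIKE '%,';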
@ -216,7 +216,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160601">2016-06-01</h2>
<h2 id="2016-06-01">2016-06-01</h2>
<ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
@ -242,7 +242,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160501">2016-05-01</h2>
<h2 id="2016-05-01">2016-05-01</h2>
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
@ -268,7 +268,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160404">2016-04-04</h2>
<h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file use on CGSpace and noticing that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
@ -293,7 +293,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160302">2016-03-02</h2>
<h2 id="2016-03-02">2016-03-02</h2>
<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
@ -316,7 +316,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160205">2016-02-05</h2>
<h2 id="2016-02-05">2016-02-05</h2>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
@ -344,7 +344,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160113">2016-01-13</h2>
<h2 id="2016-01-13">2016-01-13</h2>
<ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20151202">2015-12-02</h2>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
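The cron entries themselves are truncated. The change amounts to swapping the compressor in the log rotation job, as in this illustrative line (path and age threshold are assumptions):

$ find [dspace]/log -name '*.log' -mtime +7 -exec xz {} \;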
@ -126,7 +126,7 @@
</p>
</header>
<h2 id="20151122">2015-11-22</h2>
<h2 id="2015-11-22">2015-11-22</h2>
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
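As context for the pool exhaustion, idle connections can be counted directly from PostgreSQL's pg_stat_activity view; a generic check (not the exact command from these notes) is:

dspace=# SELECT COUNT(*) FROM pg_stat_activity WHERE state = 'idle';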

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20191201">2019-12-01</h2>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
@ -131,7 +131,7 @@
</p>
</header>
<h2 id="20191104">2019-11-04</h2>
<h2 id="2019-11-04">2019-11-04</h2>
<ul>
<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul>
@ -208,7 +208,7 @@
</p>
</header>
<h2 id="20190901">2019-09-01</h2>
<h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
@ -253,11 +253,11 @@
</p>
</header>
<h2 id="20190803">2019-08-03</h2>
<h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
@ -285,7 +285,7 @@
</p>
</header>
<h2 id="20190701">2019-07-01</h2>
<h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
@ -313,12 +313,12 @@
</p>
</header>
<h2 id="20190602">2019-06-02</h2>
<h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>
@ -339,7 +339,7 @@
</p>
</header>
<h2 id="20190501">2019-05-01</h2>
<h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
@ -372,7 +372,7 @@ DELETE 1
</p>
</header>
<h2 id="20190401">2019-04-01</h2>
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>

View File

@ -17,7 +17,7 @@
<pubDate>Sun, 01 Dec 2019 11:22:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-12/</guid>
<description>&lt;h2 id=&#34;20191201&#34;&gt;2019-12-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-12-01&#34;&gt;2019-12-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Upgrade CGSpace (linode18) to Ubuntu 18.04:
&lt;ul&gt;
@ -40,7 +40,7 @@
<pubDate>Mon, 04 Nov 2019 12:20:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-11/</guid>
<description>&lt;h2 id=&#34;20191104&#34;&gt;2019-11-04&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-11-04&#34;&gt;2019-11-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
&lt;ul&gt;
@ -88,7 +88,7 @@
<pubDate>Sun, 01 Sep 2019 10:17:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-09/</guid>
<description>&lt;h2 id=&#34;20190901&#34;&gt;2019-09-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-09-01&#34;&gt;2019-09-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning&lt;/li&gt;
&lt;li&gt;Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:&lt;/li&gt;
@ -124,11 +124,11 @@
<pubDate>Sat, 03 Aug 2019 12:39:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-08/</guid>
<description>&lt;h2 id=&#34;20190803&#34;&gt;2019-08-03&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-08-03&#34;&gt;2019-08-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Look at Bioversity&#39;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&amp;hellip;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20190804&#34;&gt;2019-08-04&lt;/h2&gt;
&lt;h2 id=&#34;2019-08-04&#34;&gt;2019-08-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Deploy ORCID identifier updates requested by Bioversity to CGSpace&lt;/li&gt;
&lt;li&gt;Run system updates on CGSpace (linode18) and reboot it
@ -147,7 +147,7 @@
<pubDate>Mon, 01 Jul 2019 12:13:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-07/</guid>
<description>&lt;h2 id=&#34;20190701&#34;&gt;2019-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-07-01&#34;&gt;2019-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Create an &amp;ldquo;AfricaRice books and book chapters&amp;rdquo; collection on CGSpace for AfricaRice&lt;/li&gt;
&lt;li&gt;Last month Sisay asked why the following &amp;ldquo;most popular&amp;rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
@ -166,12 +166,12 @@
<pubDate>Sun, 02 Jun 2019 10:57:51 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-06/</guid>
<description>&lt;h2 id=&#34;20190602&#34;&gt;2019-06-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-06-02&#34;&gt;2019-06-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Merge the &lt;a href=&#34;https://github.com/ilri/DSpace/pull/425&#34;&gt;Solr filterCache&lt;/a&gt; and &lt;a href=&#34;https://github.com/ilri/DSpace/pull/426&#34;&gt;XMLUI ISI journal&lt;/a&gt; changes to the &lt;code&gt;5_x-prod&lt;/code&gt; branch and deploy on CGSpace&lt;/li&gt;
&lt;li&gt;Run system updates on CGSpace (linode18) and reboot it&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20190603&#34;&gt;2019-06-03&lt;/h2&gt;
&lt;h2 id=&#34;2019-06-03&#34;&gt;2019-06-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Skype with Marie-Angélique and Abenet about &lt;a href=&#34;https://agriculturalsemantics.github.io/cg-core/cgcore.html&#34;&gt;CG Core v2&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
@ -183,7 +183,7 @@
<pubDate>Wed, 01 May 2019 07:37:43 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-05/</guid>
<description>&lt;h2 id=&#34;20190501&#34;&gt;2019-05-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-05-01&#34;&gt;2019-05-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace&lt;/li&gt;
&lt;li&gt;A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
@ -207,7 +207,7 @@ DELETE 1
<pubDate>Mon, 01 Apr 2019 09:00:43 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-04/</guid>
<description>&lt;h2 id=&#34;20190401&#34;&gt;2019-04-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-04-01&#34;&gt;2019-04-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
&lt;ul&gt;
@ -239,7 +239,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<pubDate>Fri, 01 Mar 2019 12:16:30 +0100</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-03/</guid>
<description>&lt;h2 id=&#34;20190301&#34;&gt;2019-03-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-03-01&#34;&gt;2019-03-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I checked IITA&#39;s 259 Feb 14 records from last month for duplicates using Atmire&#39;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good&lt;/li&gt;
&lt;li&gt;I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&amp;hellip;&lt;/li&gt;
@ -262,7 +262,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<pubDate>Fri, 01 Feb 2019 21:37:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-02/</guid>
<description>&lt;h2 id=&#34;20190201&#34;&gt;2019-02-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-02-01&#34;&gt;2019-02-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!&lt;/li&gt;
&lt;li&gt;The top IPs before, during, and after this latest alert tonight were:&lt;/li&gt;
@ -298,7 +298,7 @@ sys 0m1.979s
<pubDate>Wed, 02 Jan 2019 09:48:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2019-01/</guid>
<description>&lt;h2 id=&#34;20190102&#34;&gt;2019-01-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2019-01-02&#34;&gt;2019-01-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning&lt;/li&gt;
&lt;li&gt;I don&#39;t see anything interesting in the web server logs around that time though:&lt;/li&gt;
@ -323,13 +323,13 @@ sys 0m1.979s
<pubDate>Sun, 02 Dec 2018 02:09:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-12/</guid>
<description>&lt;h2 id=&#34;20181201&#34;&gt;2018-12-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-12-01&#34;&gt;2018-12-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK&lt;/li&gt;
&lt;li&gt;I manually installed OpenJDK, then removed Oracle JDK, then re-ran the &lt;a href=&#34;http://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible playbook&lt;/a&gt; to update all configuration files, etc&lt;/li&gt;
&lt;li&gt;Then I ran all system updates and restarted the server&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20181202&#34;&gt;2018-12-02&lt;/h2&gt;
&lt;h2 id=&#34;2018-12-02&#34;&gt;2018-12-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another &lt;a href=&#34;https://usn.ubuntu.com/3831-1/&#34;&gt;Ghostscript vulnerability last week&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
@ -341,12 +341,12 @@ sys 0m1.979s
<pubDate>Thu, 01 Nov 2018 16:41:30 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-11/</guid>
<description>&lt;h2 id=&#34;20181101&#34;&gt;2018-11-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-11-01&#34;&gt;2018-11-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Finalize AReS Phase I and Phase II ToRs&lt;/li&gt;
&lt;li&gt;Send a note about my &lt;a href=&#34;https://github.com/ilri/dspace-statistics-api&#34;&gt;dspace-statistics-api&lt;/a&gt; to the dspace-tech mailing list&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20181103&#34;&gt;2018-11-03&lt;/h2&gt;
&lt;h2 id=&#34;2018-11-03&#34;&gt;2018-11-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage&lt;/li&gt;
&lt;li&gt;Today these are the top 10 IPs:&lt;/li&gt;
@ -359,7 +359,7 @@ sys 0m1.979s
<pubDate>Mon, 01 Oct 2018 22:31:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-10/</guid>
<description>&lt;h2 id=&#34;20181001&#34;&gt;2018-10-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-10-01&#34;&gt;2018-10-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items&lt;/li&gt;
&lt;li&gt;I created a GitHub issue to track this &lt;a href=&#34;https://github.com/ilri/DSpace/issues/389&#34;&gt;#389&lt;/a&gt;, because I&#39;m super busy in Nairobi right now&lt;/li&gt;
@ -372,7 +372,7 @@ sys 0m1.979s
<pubDate>Sun, 02 Sep 2018 09:55:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-09/</guid>
<description>&lt;h2 id=&#34;20180902&#34;&gt;2018-09-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-09-02&#34;&gt;2018-09-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;New &lt;a href=&#34;https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5&#34;&gt;PostgreSQL JDBC driver version 42.2.5&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;I&#39;ll update the DSpace role in our &lt;a href=&#34;https://github.com/ilri/rmg-ansible-public&#34;&gt;Ansible infrastructure playbooks&lt;/a&gt; and run the updated playbooks on CGSpace and DSpace Test&lt;/li&gt;
@ -387,7 +387,7 @@ sys 0m1.979s
<pubDate>Wed, 01 Aug 2018 11:52:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-08/</guid>
<description>&lt;h2 id=&#34;20180801&#34;&gt;2018-08-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-08-01&#34;&gt;2018-08-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;DSpace Test had crashed at some point yesterday morning and I see the following in &lt;code&gt;dmesg&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
@ -410,7 +410,7 @@ sys 0m1.979s
<pubDate>Sun, 01 Jul 2018 12:56:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-07/</guid>
<description>&lt;h2 id=&#34;20180701&#34;&gt;2018-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-07-01&#34;&gt;2018-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:&lt;/li&gt;
&lt;/ul&gt;
@ -428,7 +428,7 @@ sys 0m1.979s
<pubDate>Mon, 04 Jun 2018 19:49:54 -0700</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-06/</guid>
<description>&lt;h2 id=&#34;20180604&#34;&gt;2018-06-04&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-06-04&#34;&gt;2018-06-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Test the &lt;a href=&#34;https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560&#34;&gt;DSpace 5.8 module upgrades from Atmire&lt;/a&gt; (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/378&#34;&gt;#378&lt;/a&gt;)
&lt;ul&gt;
@ -457,7 +457,7 @@ sys 2m7.289s
<pubDate>Tue, 01 May 2018 16:43:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-05/</guid>
<description>&lt;h2 id=&#34;20180501&#34;&gt;2018-05-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-05-01&#34;&gt;2018-05-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
&lt;ul&gt;
@ -476,7 +476,7 @@ sys 2m7.289s
<pubDate>Sun, 01 Apr 2018 16:13:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-04/</guid>
<description>&lt;h2 id=&#34;20180401&#34;&gt;2018-04-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-04-01&#34;&gt;2018-04-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I tried to test something on DSpace Test but noticed that it&#39;s been down since god knows when&lt;/li&gt;
&lt;li&gt;Catalina logs at least show some memory errors yesterday:&lt;/li&gt;
@ -489,7 +489,7 @@ sys 2m7.289s
<pubDate>Fri, 02 Mar 2018 16:07:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-03/</guid>
<description>&lt;h2 id=&#34;20180302&#34;&gt;2018-03-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-03-02&#34;&gt;2018-03-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Export a CSV of the IITA community metadata for Martin Mueller&lt;/li&gt;
&lt;/ul&gt;</description>
@ -501,7 +501,7 @@ sys 2m7.289s
<pubDate>Thu, 01 Feb 2018 16:28:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-02/</guid>
<description>&lt;h2 id=&#34;20180201&#34;&gt;2018-02-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-02-01&#34;&gt;2018-02-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Peter gave feedback on the &lt;code&gt;dc.rights&lt;/code&gt; proof of concept that I had sent him last week&lt;/li&gt;
&lt;li&gt;We don&#39;t need to distinguish between internal and external works, so that makes it just a simple list&lt;/li&gt;
@ -516,7 +516,7 @@ sys 2m7.289s
<pubDate>Tue, 02 Jan 2018 08:35:54 -0800</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2018-01/</guid>
<description>&lt;h2 id=&#34;20180102&#34;&gt;2018-01-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2018-01-02&#34;&gt;2018-01-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time&lt;/li&gt;
&lt;li&gt;I didn&#39;t get any load alerts from Linode and the REST and XMLUI logs don&#39;t show anything out of the ordinary&lt;/li&gt;
@ -591,7 +591,7 @@ dspace.log.2018-01-02:34
<pubDate>Fri, 01 Dec 2017 13:53:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-12/</guid>
<description>&lt;h2 id=&#34;20171201&#34;&gt;2017-12-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-12-01&#34;&gt;2017-12-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Uptime Robot noticed that CGSpace went down&lt;/li&gt;
&lt;li&gt;The logs say &amp;ldquo;Timeout waiting for idle object&amp;rdquo;&lt;/li&gt;
@ -606,11 +606,11 @@ dspace.log.2018-01-02:34
<pubDate>Thu, 02 Nov 2017 09:37:54 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-11/</guid>
<description>&lt;h2 id=&#34;20171101&#34;&gt;2017-11-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-11-01&#34;&gt;2017-11-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The CORE developers responded to say they are looking into their bot not respecting our robots.txt&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20171102&#34;&gt;2017-11-02&lt;/h2&gt;
&lt;h2 id=&#34;2017-11-02&#34;&gt;2017-11-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Today there have been no hits by CORE and no alerts from Linode (coincidence?)&lt;/li&gt;
&lt;/ul&gt;
@ -630,7 +630,7 @@ COPY 54701
<pubDate>Sun, 01 Oct 2017 08:07:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-10/</guid>
<description>&lt;h2 id=&#34;20171001&#34;&gt;2017-10-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-10-01&#34;&gt;2017-10-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Peter emailed to point out that many items in the &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/2703&#34;&gt;ILRI archive collection&lt;/a&gt; have multiple handles:&lt;/li&gt;
&lt;/ul&gt;
@ -656,11 +656,11 @@ COPY 54701
<pubDate>Thu, 07 Sep 2017 16:54:52 +0700</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-09/</guid>
<description>&lt;h2 id=&#34;20170906&#34;&gt;2017-09-06&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-09-06&#34;&gt;2017-09-06&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20170907&#34;&gt;2017-09-07&lt;/h2&gt;
&lt;h2 id=&#34;2017-09-07&#34;&gt;2017-09-07&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Ask Sisay to clean up the WLE approvers a bit, as Marianne&#39;s user account is in both the approvers step and the group&lt;/li&gt;
&lt;/ul&gt;</description>
@ -672,7 +672,7 @@ COPY 54701
<pubDate>Tue, 01 Aug 2017 11:51:52 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-08/</guid>
<description>&lt;h2 id=&#34;20170801&#34;&gt;2017-08-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-08-01&#34;&gt;2017-08-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours&lt;/li&gt;
&lt;li&gt;I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)&lt;/li&gt;
@ -702,11 +702,11 @@ COPY 54701
<pubDate>Sat, 01 Jul 2017 18:03:52 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-07/</guid>
<description>&lt;h2 id=&#34;20170701&#34;&gt;2017-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-07-01&#34;&gt;2017-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Run system updates and reboot DSpace Test&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20170704&#34;&gt;2017-07-04&lt;/h2&gt;
&lt;h2 id=&#34;2017-07-04&#34;&gt;2017-07-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Merge changes for WLE Phase II theme rename (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/329&#34;&gt;#329&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Looking at extracting the metadata registries from ICARDA&#39;s MEL DSpace database so we can compare fields with CGSpace&lt;/li&gt;
@ -738,7 +738,7 @@ COPY 54701
<pubDate>Sun, 02 Apr 2017 17:08:52 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-04/</guid>
<description>&lt;h2 id=&#34;20170402&#34;&gt;2017-04-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-04-02&#34;&gt;2017-04-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Merge one change to CCAFS flagships that I had forgotten to remove last month (&amp;ldquo;MANAGING CLIMATE RISK&amp;rdquo;): &lt;a href=&#34;https://github.com/ilri/DSpace/pull/317&#34;&gt;https://github.com/ilri/DSpace/pull/317&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Quick proof-of-concept hack to add &lt;code&gt;dc.rights&lt;/code&gt; to the input form, including some inline instructions/hints:&lt;/li&gt;
@ -758,11 +758,11 @@ COPY 54701
<pubDate>Wed, 01 Mar 2017 17:08:52 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-03/</guid>
<description>&lt;h2 id=&#34;20170301&#34;&gt;2017-03-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-03-01&#34;&gt;2017-03-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Run the 279 CIAT author corrections on CGSpace&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20170302&#34;&gt;2017-03-02&lt;/h2&gt;
&lt;h2 id=&#34;2017-03-02&#34;&gt;2017-03-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace&lt;/li&gt;
&lt;li&gt;CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles&lt;/li&gt;
@ -785,7 +785,7 @@ COPY 54701
<pubDate>Tue, 07 Feb 2017 07:04:52 -0800</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-02/</guid>
<description>&lt;h2 id=&#34;20170207&#34;&gt;2017-02-07&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-02-07&#34;&gt;2017-02-07&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;An item was mapped twice erroneously again, so I had to remove one of the mappings manually:&lt;/li&gt;
&lt;/ul&gt;
@ -810,7 +810,7 @@ DELETE 1
<pubDate>Mon, 02 Jan 2017 10:43:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-01/</guid>
<description>&lt;h2 id=&#34;20170102&#34;&gt;2017-01-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-01-02&#34;&gt;2017-01-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error&lt;/li&gt;
&lt;li&gt;I tested on DSpace Test as well and it doesn&#39;t work there either&lt;/li&gt;
@ -824,7 +824,7 @@ DELETE 1
<pubDate>Fri, 02 Dec 2016 10:43:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-12/</guid>
<description>&lt;h2 id=&#34;20161202&#34;&gt;2016-12-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-12-02&#34;&gt;2016-12-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace was down for five hours in the morning while I was sleeping&lt;/li&gt;
&lt;li&gt;While looking in the logs for errors, I see tons of warnings about Atmire MQM:&lt;/li&gt;
@ -847,7 +847,7 @@ DELETE 1
<pubDate>Tue, 01 Nov 2016 09:21:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-11/</guid>
<description>&lt;h2 id=&#34;20161101&#34;&gt;2016-11-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-11-01&#34;&gt;2016-11-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add &lt;code&gt;dc.type&lt;/code&gt; to the output options for Atmire&#39;s Listings and Reports module (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/286&#34;&gt;#286&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
@ -860,7 +860,7 @@ DELETE 1
<pubDate>Mon, 03 Oct 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-10/</guid>
<description>&lt;h2 id=&#34;20161003&#34;&gt;2016-10-03&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-10-03&#34;&gt;2016-10-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Testing adding &lt;a href=&#34;https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing&#34;&gt;ORCIDs to a CSV&lt;/a&gt; file for a single item to see if the author orders get messed up&lt;/li&gt;
&lt;li&gt;Need to test the following scenarios to see how author order is affected:
@ -881,7 +881,7 @@ DELETE 1
<pubDate>Thu, 01 Sep 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-09/</guid>
<description>&lt;h2 id=&#34;20160901&#34;&gt;2016-09-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-09-01&#34;&gt;2016-09-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors&lt;/li&gt;
&lt;li&gt;Discuss how the migration of CGIAR&#39;s Active Directory to a flat structure will break our LDAP groups in DSpace&lt;/li&gt;
@ -898,7 +898,7 @@ DELETE 1
<pubDate>Mon, 01 Aug 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-08/</guid>
<description>&lt;h2 id=&#34;20160801&#34;&gt;2016-08-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-08-01&#34;&gt;2016-08-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add updated distribution license from Sisay (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/259&#34;&gt;#259&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Play with upgrading Mirage 2 dependencies in &lt;code&gt;bower.json&lt;/code&gt; because most are several versions out of date&lt;/li&gt;
@ -919,7 +919,7 @@ $ git rebase -i dspace-5.5
<pubDate>Fri, 01 Jul 2016 10:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-07/</guid>
<description>&lt;h2 id=&#34;20160701&#34;&gt;2016-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-07-01&#34;&gt;2016-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add &lt;code&gt;dc.description.sponsorship&lt;/code&gt; to Discovery sidebar facets and make investors clickable in item view (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/232&#34;&gt;#232&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;I think this query should find and replace all authors that have &amp;ldquo;,&amp;rdquo; at the end of their names:&lt;/li&gt;
@ -941,7 +941,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 01 Jun 2016 10:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-06/</guid>
<description>&lt;h2 id=&#34;20160601&#34;&gt;2016-06-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-06-01&#34;&gt;2016-06-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Experimenting with IFPRI OAI (we want to harvest their publications)&lt;/li&gt;
&lt;li&gt;After reading the &lt;a href=&#34;https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html&#34;&gt;ContentDM documentation&lt;/a&gt; I found IFPRI&#39;s OAI endpoint: &lt;a href=&#34;http://ebrary.ifpri.org/oai/oai.php&#34;&gt;http://ebrary.ifpri.org/oai/oai.php&lt;/a&gt;&lt;/li&gt;
@ -958,7 +958,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Sun, 01 May 2016 23:06:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-05/</guid>
<description>&lt;h2 id=&#34;20160501&#34;&gt;2016-05-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-05-01&#34;&gt;2016-05-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Since yesterday there have been 10,000 REST errors and the site has been unstable again&lt;/li&gt;
&lt;li&gt;I have blocked access to the API now&lt;/li&gt;
@ -975,7 +975,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Mon, 04 Apr 2016 11:06:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-04/</guid>
<description>&lt;h2 id=&#34;20160404&#34;&gt;2016-04-04&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-04-04&#34;&gt;2016-04-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at log file use on CGSpace and noticing that we need to work on our cron setup a bit&lt;/li&gt;
&lt;li&gt;We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc&lt;/li&gt;
@ -991,7 +991,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 02 Mar 2016 16:50:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-03/</guid>
<description>&lt;h2 id=&#34;20160302&#34;&gt;2016-03-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-03-02&#34;&gt;2016-03-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at issues with author authorities on CGSpace&lt;/li&gt;
&lt;li&gt;For some reason we still have the &lt;code&gt;index-lucene-update&lt;/code&gt; cron job active on CGSpace, but I&#39;m pretty sure we don&#39;t need it as of the latest few versions of Atmire&#39;s Listings and Reports module&lt;/li&gt;
@ -1005,7 +1005,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Fri, 05 Feb 2016 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-02/</guid>
<description>&lt;h2 id=&#34;20160205&#34;&gt;2016-02-05&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-02-05&#34;&gt;2016-02-05&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at some DAGRIS data for Abenet Yabowork&lt;/li&gt;
&lt;li&gt;Lots of issues with spaces, newlines, etc causing the import to fail&lt;/li&gt;
@ -1024,7 +1024,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 13 Jan 2016 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-01/</guid>
<description>&lt;h2 id=&#34;20160113&#34;&gt;2016-01-13&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-01-13&#34;&gt;2016-01-13&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Move ILRI collection &lt;code&gt;10568/12503&lt;/code&gt; from &lt;code&gt;10568/27869&lt;/code&gt; to &lt;code&gt;10568/27629&lt;/code&gt; using the &lt;a href=&#34;https://gist.github.com/alanorth/392c4660e8b022d99dfa&#34;&gt;move_collections.sh&lt;/a&gt; script I wrote last year.&lt;/li&gt;
&lt;li&gt;I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.&lt;/li&gt;
@ -1038,7 +1038,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 02 Dec 2015 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2015-12/</guid>
<description>&lt;h2 id=&#34;20151202&#34;&gt;2015-12-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2015-12-02&#34;&gt;2015-12-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Replace &lt;code&gt;lzop&lt;/code&gt; with &lt;code&gt;xz&lt;/code&gt; in log compression cron jobs on DSpace Test—it uses less space:&lt;/li&gt;
&lt;/ul&gt;
@ -1056,7 +1056,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Mon, 23 Nov 2015 17:00:57 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2015-11/</guid>
<description>&lt;h2 id=&#34;20151122&#34;&gt;2015-11-22&lt;/h2&gt;
<description>&lt;h2 id=&#34;2015-11-22&#34;&gt;2015-11-22&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace went down&lt;/li&gt;
&lt;li&gt;Looks like DSpace exhausted its PostgreSQL connection pool&lt;/li&gt;

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20190301">2019-03-01</h2>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
@ -131,7 +131,7 @@
</p>
</header>
<h2 id="20190201">2019-02-01</h2>
<h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
@ -176,7 +176,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20190102">2019-01-02</h2>
<h2 id="2019-01-02">2019-01-02</h2>
<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don't see anything interesting in the web server logs around that time though:</li>
@ -210,13 +210,13 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181201">2018-12-01</h2>
<h2 id="2018-12-01">2018-12-01</h2>
<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
@ -237,12 +237,12 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181101">2018-11-01</h2>
<h2 id="2018-11-01">2018-11-01</h2>
<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
@ -264,7 +264,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181001">2018-10-01</h2>
<h2 id="2018-10-01">2018-10-01</h2>
<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
@ -286,7 +286,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180902">2018-09-02</h2>
<h2 id="2018-09-02">2018-09-02</h2>
<ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
@ -310,7 +310,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180801">2018-08-01</h2>
<h2 id="2018-08-01">2018-08-01</h2>
<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
@ -342,7 +342,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180701">2018-07-01</h2>
<h2 id="2018-07-01">2018-07-01</h2>
<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
@ -369,7 +369,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180604">2018-06-04</h2>
<h2 id="2018-06-04">2018-06-04</h2>
<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul>

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20180501">2018-05-01</h2>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
@ -127,7 +127,7 @@
</p>
</header>
<h2 id="20180401">2018-04-01</h2>
<h2 id="2018-04-01">2018-04-01</h2>
<ul>
<li>I tried to test something on DSpace Test but noticed that it's been down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li>
@ -149,7 +149,7 @@
</p>
</header>
<h2 id="20180302">2018-03-02</h2>
<h2 id="2018-03-02">2018-03-02</h2>
<ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul>
@ -170,7 +170,7 @@
</p>
</header>
<h2 id="20180201">2018-02-01</h2>
<h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don't need to distinguish between internal and external works, so that makes it just a simple list</li>
@ -194,7 +194,7 @@
</p>
</header>
<h2 id="20180102">2018-01-02</h2>
<h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
@ -278,7 +278,7 @@ dspace.log.2018-01-02:34
</p>
</header>
<h2 id="20171201">2017-12-01</h2>
<h2 id="2017-12-01">2017-12-01</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
@ -302,11 +302,11 @@ dspace.log.2018-01-02:34
</p>
</header>
<h2 id="20171101">2017-11-01</h2>
<h2 id="2017-11-01">2017-11-01</h2>
<ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul>
<h2 id="20171102">2017-11-02</h2>
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
@ -335,7 +335,7 @@ COPY 54701
</p>
</header>
<h2 id="20171001">2017-10-01</h2>
<h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
@ -381,11 +381,11 @@ COPY 54701
</p>
</header>
<h2 id="20170906">2017-09-06</h2>
<h2 id="2017-09-06">2017-09-06</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is in both the approvers step and the group</li>
</ul>

View File

@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20170801">2017-08-01</h2>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
@ -138,11 +138,11 @@
</p>
</header>
<h2 id="20170701">2017-07-01</h2>
<h2 id="2017-07-01">2017-07-01</h2>
<ul>
<li>Run system updates and reboot DSpace Test</li>
</ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
@ -201,7 +201,7 @@
</p>
</header>
<h2 id="20170402">2017-04-02</h2>
<h2 id="2017-04-02">2017-04-02</h2>
<ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
@ -230,11 +230,11 @@
</p>
</header>
<h2 id="20170301">2017-03-01</h2>
<h2 id="2017-03-01">2017-03-01</h2>
<ul>
<li>Run the 279 CIAT author corrections on CGSpace</li>
</ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -266,7 +266,7 @@
</p>
</header>
<h2 id="20170207">2017-02-07</h2>
<h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
@ -300,7 +300,7 @@ DELETE 1
</p>
</header>
<h2 id="20170102">2017-01-02</h2>
<h2 id="2017-01-02">2017-01-02</h2>
<ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn't work there either</li>
@ -323,7 +323,7 @@ DELETE 1
</p>
</header>
<h2 id="20161202">2016-12-02</h2>
<h2 id="2016-12-02">2016-12-02</h2>
<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
@ -355,7 +355,7 @@ DELETE 1
</p>
</header>
<h2 id="20161101">2016-11-01</h2>
<h2 id="2016-11-01">2016-11-01</h2>
<ul>
<li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20161003">2016-10-03</h2>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
@ -129,7 +129,7 @@
</p>
</header>
<h2 id="20160901">2016-09-01</h2>
<h2 id="2016-09-01">2016-09-01</h2>
<ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
@ -155,7 +155,7 @@
</p>
</header>
<h2 id="20160801">2016-08-01</h2>
<h2 id="2016-08-01">2016-08-01</h2>
<ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
@ -185,7 +185,7 @@ $ git rebase -i dspace-5.5
</p>
</header>
<h2 id="20160701">2016-07-01</h2>
<h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
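<p>Such a find-and-replace could look roughly like this in psql (a sketch only: the <code>regexp_replace</code> pattern and the use of <code>metadata_field_id=3</code> for authors are assumptions, not the exact query):</p>
<pre><code>$ psql -d dspacetest -c "update metadatavalue set text_value = regexp_replace(text_value, ',$', '') where metadata_field_id=3 and text_value ~ ',$';"
</code></pre>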
@ -216,7 +216,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160601">2016-06-01</h2>
<h2 id="2016-06-01">2016-06-01</h2>
<ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
@ -242,7 +242,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160501">2016-05-01</h2>
<h2 id="2016-05-01">2016-05-01</h2>
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
@ -268,7 +268,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160404">2016-04-04</h2>
<h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
@ -293,7 +293,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160302">2016-03-02</h2>
<h2 id="2016-03-02">2016-03-02</h2>
<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
@ -316,7 +316,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160205">2016-02-05</h2>
<h2 id="2016-02-05">2016-02-05</h2>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
@ -344,7 +344,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160113">2016-01-13</h2>
<h2 id="2016-01-13">2016-01-13</h2>
<ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
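<p>The script invocation might look like this (argument order is an assumption based on the description above, not confirmed from the gist):</p>
<pre><code>$ ./move_collections.sh 10568/12503 10568/27869 10568/27629
</code></pre>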


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20151202">2015-12-02</h2>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
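<p>A sketch of the equivalent <code>xz</code> invocation the cron job could run (log path and filename pattern are hypothetical); <code>xz</code> compresses noticeably smaller than <code>lzop</code> at the cost of more CPU time:</p>
<pre><code>$ find /home/dspace/log -name "dspace.log.2015-*" ! -name "*.xz" -exec xz {} \;
</code></pre>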
@ -126,7 +126,7 @@
</p>
</header>
<h2 id="20151122">2015-11-22</h2>
<h2 id="2015-11-22">2015-11-22</h2>
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>


@ -4,27 +4,27 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2019-12-11T18:20:20+02:00</lastmod>
<lastmod>2019-12-11T19:02:05+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-12-11T18:20:20+02:00</lastmod>
<lastmod>2019-12-11T19:02:05+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-12/</loc>
<lastmod>2019-12-11T18:20:20+02:00</lastmod>
<lastmod>2019-12-11T19:02:05+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2019-12-11T18:20:20+02:00</lastmod>
<lastmod>2019-12-11T19:02:05+02:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-12-11T18:20:20+02:00</lastmod>
<lastmod>2019-12-11T19:02:05+02:00</lastmod>
</url>
<url>


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20191201">2019-12-01</h2>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
@ -131,7 +131,7 @@
</p>
</header>
<h2 id="20191104">2019-11-04</h2>
<h2 id="2019-11-04">2019-11-04</h2>
<ul>
<li>Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics
<ul>
@ -208,7 +208,7 @@
</p>
</header>
<h2 id="20190901">2019-09-01</h2>
<h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
@ -253,11 +253,11 @@
</p>
</header>
<h2 id="20190803">2019-08-03</h2>
<h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity's latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul>
<h2 id="20190804">2019-08-04</h2>
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
@ -285,7 +285,7 @@
</p>
</header>
<h2 id="20190701">2019-07-01</h2>
<h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
@ -313,12 +313,12 @@
</p>
</header>
<h2 id="20190602">2019-06-02</h2>
<h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
<h2 id="20190603">2019-06-03</h2>
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>
@ -339,7 +339,7 @@
</p>
</header>
<h2 id="20190501">2019-05-01</h2>
<h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
@ -372,7 +372,7 @@ DELETE 1
</p>
</header>
<h2 id="20190401">2019-04-01</h2>
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Migration"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -84,11 +84,11 @@
</p>
</header>
<h2 id="20170906">2017-09-06</h2>
<h2 id="2017-09-06">2017-09-06</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is in both the approvers step and the group</li>
</ul>
@ -109,7 +109,7 @@
</p>
</header>
<h2 id="20170801">2017-08-01</h2>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
@ -148,11 +148,11 @@
</p>
</header>
<h2 id="20170701">2017-07-01</h2>
<h2 id="2017-07-01">2017-07-01</h2>
<ul>
<li>Run system updates and reboot DSpace Test</li>
</ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
@ -211,7 +211,7 @@
</p>
</header>
<h2 id="20170402">2017-04-02</h2>
<h2 id="2017-04-02">2017-04-02</h2>
<ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
@ -240,11 +240,11 @@
</p>
</header>
<h2 id="20170301">2017-03-01</h2>
<h2 id="2017-03-01">2017-03-01</h2>
<ul>
<li>Run the 279 CIAT author corrections on CGSpace</li>
</ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -276,7 +276,7 @@
</p>
</header>
<h2 id="20170207">2017-02-07</h2>
<h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
@ -310,7 +310,7 @@ DELETE 1
</p>
</header>
<h2 id="20170102">2017-01-02</h2>
<h2 id="2017-01-02">2017-01-02</h2>
<ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn't work there either</li>
@ -333,7 +333,7 @@ DELETE 1
</p>
</header>
<h2 id="20161202">2016-12-02</h2>
<h2 id="2016-12-02">2016-12-02</h2>
<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>


@ -17,11 +17,11 @@
<pubDate>Thu, 07 Sep 2017 16:54:52 +0700</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-09/</guid>
<description>&lt;h2 id=&#34;20170906&#34;&gt;2017-09-06&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-09-06&#34;&gt;2017-09-06&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20170907&#34;&gt;2017-09-07&lt;/h2&gt;
&lt;h2 id=&#34;2017-09-07&#34;&gt;2017-09-07&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Ask Sisay to clean up the WLE approvers a bit, as Marianne&#39;s user account is in both the approvers step and the group&lt;/li&gt;
&lt;/ul&gt;</description>
@ -33,7 +33,7 @@
<pubDate>Tue, 01 Aug 2017 11:51:52 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-08/</guid>
<description>&lt;h2 id=&#34;20170801&#34;&gt;2017-08-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-08-01&#34;&gt;2017-08-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours&lt;/li&gt;
&lt;li&gt;I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)&lt;/li&gt;
@ -63,11 +63,11 @@
<pubDate>Sat, 01 Jul 2017 18:03:52 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-07/</guid>
<description>&lt;h2 id=&#34;20170701&#34;&gt;2017-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-07-01&#34;&gt;2017-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Run system updates and reboot DSpace Test&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20170704&#34;&gt;2017-07-04&lt;/h2&gt;
&lt;h2 id=&#34;2017-07-04&#34;&gt;2017-07-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Merge changes for WLE Phase II theme rename (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/329&#34;&gt;#329&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Looking at extracting the metadata registries from ICARDA&#39;s MEL DSpace database so we can compare fields with CGSpace&lt;/li&gt;
@ -99,7 +99,7 @@
<pubDate>Sun, 02 Apr 2017 17:08:52 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-04/</guid>
<description>&lt;h2 id=&#34;20170402&#34;&gt;2017-04-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-04-02&#34;&gt;2017-04-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Merge one change to CCAFS flagships that I had forgotten to remove last month (&amp;ldquo;MANAGING CLIMATE RISK&amp;rdquo;): &lt;a href=&#34;https://github.com/ilri/DSpace/pull/317&#34;&gt;https://github.com/ilri/DSpace/pull/317&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Quick proof-of-concept hack to add &lt;code&gt;dc.rights&lt;/code&gt; to the input form, including some inline instructions/hints:&lt;/li&gt;
@ -119,11 +119,11 @@
<pubDate>Wed, 01 Mar 2017 17:08:52 +0200</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-03/</guid>
<description>&lt;h2 id=&#34;20170301&#34;&gt;2017-03-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-03-01&#34;&gt;2017-03-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Run the 279 CIAT author corrections on CGSpace&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;20170302&#34;&gt;2017-03-02&lt;/h2&gt;
&lt;h2 id=&#34;2017-03-02&#34;&gt;2017-03-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace&lt;/li&gt;
&lt;li&gt;CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles&lt;/li&gt;
@ -146,7 +146,7 @@
<pubDate>Tue, 07 Feb 2017 07:04:52 -0800</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-02/</guid>
<description>&lt;h2 id=&#34;20170207&#34;&gt;2017-02-07&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-02-07&#34;&gt;2017-02-07&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;An item was mapped twice erroneously again, so I had to remove one of the mappings manually:&lt;/li&gt;
&lt;/ul&gt;
@ -171,7 +171,7 @@ DELETE 1
<pubDate>Mon, 02 Jan 2017 10:43:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2017-01/</guid>
<description>&lt;h2 id=&#34;20170102&#34;&gt;2017-01-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2017-01-02&#34;&gt;2017-01-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error&lt;/li&gt;
&lt;li&gt;I tested on DSpace Test as well and it doesn&#39;t work there either&lt;/li&gt;
@ -185,7 +185,7 @@ DELETE 1
<pubDate>Fri, 02 Dec 2016 10:43:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-12/</guid>
<description>&lt;h2 id=&#34;20161202&#34;&gt;2016-12-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-12-02&#34;&gt;2016-12-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace was down for five hours in the morning while I was sleeping&lt;/li&gt;
&lt;li&gt;While looking in the logs for errors, I see tons of warnings about Atmire MQM:&lt;/li&gt;
@ -208,7 +208,7 @@ DELETE 1
<pubDate>Tue, 01 Nov 2016 09:21:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-11/</guid>
<description>&lt;h2 id=&#34;20161101&#34;&gt;2016-11-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-11-01&#34;&gt;2016-11-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add &lt;code&gt;dc.type&lt;/code&gt; to the output options for Atmire&#39;s Listings and Reports module (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/286&#34;&gt;#286&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
@ -221,7 +221,7 @@ DELETE 1
<pubDate>Mon, 03 Oct 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-10/</guid>
<description>&lt;h2 id=&#34;20161003&#34;&gt;2016-10-03&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-10-03&#34;&gt;2016-10-03&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Testing adding &lt;a href=&#34;https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing&#34;&gt;ORCIDs to a CSV&lt;/a&gt; file for a single item to see if the author orders get messed up&lt;/li&gt;
&lt;li&gt;Need to test the following scenarios to see how author order is affected:
@ -242,7 +242,7 @@ DELETE 1
<pubDate>Thu, 01 Sep 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-09/</guid>
<description>&lt;h2 id=&#34;20160901&#34;&gt;2016-09-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-09-01&#34;&gt;2016-09-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors&lt;/li&gt;
&lt;li&gt;Discuss how the migration of CGIAR&#39;s Active Directory to a flat structure will break our LDAP groups in DSpace&lt;/li&gt;
@ -259,7 +259,7 @@ DELETE 1
<pubDate>Mon, 01 Aug 2016 15:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-08/</guid>
<description>&lt;h2 id=&#34;20160801&#34;&gt;2016-08-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-08-01&#34;&gt;2016-08-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add updated distribution license from Sisay (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/259&#34;&gt;#259&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Play with upgrading Mirage 2 dependencies in &lt;code&gt;bower.json&lt;/code&gt; because most are several versions out of date&lt;/li&gt;
@ -280,7 +280,7 @@ $ git rebase -i dspace-5.5
<pubDate>Fri, 01 Jul 2016 10:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-07/</guid>
<description>&lt;h2 id=&#34;20160701&#34;&gt;2016-07-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-07-01&#34;&gt;2016-07-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add &lt;code&gt;dc.description.sponsorship&lt;/code&gt; to Discovery sidebar facets and make investors clickable in item view (&lt;a href=&#34;https://github.com/ilri/DSpace/issues/232&#34;&gt;#232&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;I think this query should find and replace all authors that have &amp;ldquo;,&amp;rdquo; at the end of their names:&lt;/li&gt;
@ -302,7 +302,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 01 Jun 2016 10:53:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-06/</guid>
<description>&lt;h2 id=&#34;20160601&#34;&gt;2016-06-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-06-01&#34;&gt;2016-06-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Experimenting with IFPRI OAI (we want to harvest their publications)&lt;/li&gt;
&lt;li&gt;After reading the &lt;a href=&#34;https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html&#34;&gt;ContentDM documentation&lt;/a&gt; I found IFPRI&#39;s OAI endpoint: &lt;a href=&#34;http://ebrary.ifpri.org/oai/oai.php&#34;&gt;http://ebrary.ifpri.org/oai/oai.php&lt;/a&gt;&lt;/li&gt;
@ -319,7 +319,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Sun, 01 May 2016 23:06:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-05/</guid>
<description>&lt;h2 id=&#34;20160501&#34;&gt;2016-05-01&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-05-01&#34;&gt;2016-05-01&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Since yesterday there have been 10,000 REST errors and the site has been unstable again&lt;/li&gt;
&lt;li&gt;I have blocked access to the API now&lt;/li&gt;
@ -336,7 +336,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Mon, 04 Apr 2016 11:06:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-04/</guid>
<description>&lt;h2 id=&#34;20160404&#34;&gt;2016-04-04&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-04-04&#34;&gt;2016-04-04&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit&lt;/li&gt;
&lt;li&gt;We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc&lt;/li&gt;
@ -352,7 +352,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 02 Mar 2016 16:50:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-03/</guid>
<description>&lt;h2 id=&#34;20160302&#34;&gt;2016-03-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-03-02&#34;&gt;2016-03-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at issues with author authorities on CGSpace&lt;/li&gt;
&lt;li&gt;For some reason we still have the &lt;code&gt;index-lucene-update&lt;/code&gt; cron job active on CGSpace, but I&#39;m pretty sure we don&#39;t need it as of the latest few versions of Atmire&#39;s Listings and Reports module&lt;/li&gt;
@ -366,7 +366,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Fri, 05 Feb 2016 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-02/</guid>
<description>&lt;h2 id=&#34;20160205&#34;&gt;2016-02-05&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-02-05&#34;&gt;2016-02-05&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at some DAGRIS data for Abenet Yabowork&lt;/li&gt;
&lt;li&gt;Lots of issues with spaces, newlines, etc causing the import to fail&lt;/li&gt;
@ -385,7 +385,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 13 Jan 2016 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2016-01/</guid>
<description>&lt;h2 id=&#34;20160113&#34;&gt;2016-01-13&lt;/h2&gt;
<description>&lt;h2 id=&#34;2016-01-13&#34;&gt;2016-01-13&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Move ILRI collection &lt;code&gt;10568/12503&lt;/code&gt; from &lt;code&gt;10568/27869&lt;/code&gt; to &lt;code&gt;10568/27629&lt;/code&gt; using the &lt;a href=&#34;https://gist.github.com/alanorth/392c4660e8b022d99dfa&#34;&gt;move_collections.sh&lt;/a&gt; script I wrote last year.&lt;/li&gt;
&lt;li&gt;I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.&lt;/li&gt;
@ -399,7 +399,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Wed, 02 Dec 2015 13:18:00 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2015-12/</guid>
<description>&lt;h2 id=&#34;20151202&#34;&gt;2015-12-02&lt;/h2&gt;
<description>&lt;h2 id=&#34;2015-12-02&#34;&gt;2015-12-02&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Replace &lt;code&gt;lzop&lt;/code&gt; with &lt;code&gt;xz&lt;/code&gt; in log compression cron jobs on DSpace Test—it uses less space:&lt;/li&gt;
&lt;/ul&gt;
@ -417,7 +417,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<pubDate>Mon, 23 Nov 2015 17:00:57 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2015-11/</guid>
<description>&lt;h2 id=&#34;20151122&#34;&gt;2015-11-22&lt;/h2&gt;
<description>&lt;h2 id=&#34;2015-11-22&#34;&gt;2015-11-22&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CGSpace went down&lt;/li&gt;
&lt;li&gt;Looks like DSpace exhausted its PostgreSQL connection pool&lt;/li&gt;


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -84,7 +84,7 @@
</p>
</header>
<h2 id="20161101">2016-11-01</h2>
<h2 id="2016-11-01">2016-11-01</h2>
<ul>
<li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>
@ -106,7 +106,7 @@
</p>
</header>
<h2 id="20161003">2016-10-03</h2>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
@ -136,7 +136,7 @@
</p>
</header>
<h2 id="20160901">2016-09-01</h2>
<h2 id="2016-09-01">2016-09-01</h2>
<ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
@ -162,7 +162,7 @@
</p>
</header>
<h2 id="20160801">2016-08-01</h2>
<h2 id="2016-08-01">2016-08-01</h2>
<ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
@ -192,7 +192,7 @@ $ git rebase -i dspace-5.5
</p>
</header>
<h2 id="20160701">2016-07-01</h2>
<h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
@ -223,7 +223,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160601">2016-06-01</h2>
<h2 id="2016-06-01">2016-06-01</h2>
<ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
@ -249,7 +249,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160501">2016-05-01</h2>
<h2 id="2016-05-01">2016-05-01</h2>
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
@ -275,7 +275,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160404">2016-04-04</h2>
<h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
@ -300,7 +300,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160302">2016-03-02</h2>
<h2 id="2016-03-02">2016-03-02</h2>
<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
@ -323,7 +323,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160205">2016-02-05</h2>
<h2 id="2016-02-05">2016-02-05</h2>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -84,7 +84,7 @@
</p>
</header>
<h2 id="20160113">2016-01-13</h2>
<h2 id="2016-01-13">2016-01-13</h2>
<ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
@ -107,7 +107,7 @@
</p>
</header>
<h2 id="20151202">2015-12-02</h2>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
@ -134,7 +134,7 @@
</p>
</header>
<h2 id="20151122">2015-11-22</h2>
<h2 id="2015-11-22">2015-11-22</h2>
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20190301">2019-03-01</h2>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
@ -131,7 +131,7 @@
</p>
</header>
<h2 id="20190201">2019-02-01</h2>
<h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
@ -176,7 +176,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20190102">2019-01-02</h2>
<h2 id="2019-01-02">2019-01-02</h2>
<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don't see anything interesting in the web server logs around that time though:</li>
@ -210,13 +210,13 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181201">2018-12-01</h2>
<h2 id="2018-12-01">2018-12-01</h2>
<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>
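<p>On Ubuntu the manual swap described above would be roughly the following (package names are assumptions for a Java 8 setup of that era):</p>
<pre><code>$ sudo apt install openjdk-8-jdk-headless
$ sudo apt remove oracle-java8-installer
</code></pre>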
<h2 id="20181202">2018-12-02</h2>
<h2 id="2018-12-02">2018-12-02</h2>
<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
@ -237,12 +237,12 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181101">2018-11-01</h2>
<h2 id="2018-11-01">2018-11-01</h2>
<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="20181103">2018-11-03</h2>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
@ -264,7 +264,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20181001">2018-10-01</h2>
<h2 id="2018-10-01">2018-10-01</h2>
<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I'm super busy in Nairobi right now</li>
@ -286,7 +286,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180902">2018-09-02</h2>
<h2 id="2018-09-02">2018-09-02</h2>
<ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
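<p>Running the updated playbooks would look something like this (playbook name, host limit, and tag are assumptions about the repository's layout):</p>
<pre><code>$ ansible-playbook dspace.yml --limit cgspace --tags dspace
</code></pre>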
@ -310,7 +310,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180801">2018-08-01</h2>
<h2 id="2018-08-01">2018-08-01</h2>
<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
@ -342,7 +342,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180701">2018-07-01</h2>
<h2 id="2018-07-01">2018-07-01</h2>
<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
@ -369,7 +369,7 @@ sys 0m1.979s
</p>
</header>
<h2 id="20180604">2018-06-04</h2>
<h2 id="2018-06-04">2018-06-04</h2>
<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul>


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20180501">2018-05-01</h2>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
@ -127,7 +127,7 @@
</p>
</header>
<h2 id="20180401">2018-04-01</h2>
<h2 id="2018-04-01">2018-04-01</h2>
<ul>
<li>I tried to test something on DSpace Test but noticed that it's down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li>
@ -149,7 +149,7 @@
</p>
</header>
<h2 id="20180302">2018-03-02</h2>
<h2 id="2018-03-02">2018-03-02</h2>
<ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul>
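<p>DSpace's built-in exporter handles this directly; a sketch, with a placeholder handle for the IITA community:</p>
<pre><code>$ /home/dspace/bin/dspace metadata-export -i 10568/12345 -f /tmp/iita.csv
</code></pre>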
@ -170,7 +170,7 @@
</p>
</header>
<h2 id="20180201">2018-02-01</h2>
<h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don't need to distinguish between internal and external works, so that makes it just a simple list</li>
@ -194,7 +194,7 @@
</p>
</header>
<h2 id="20180102">2018-01-02</h2>
<h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary</li>
@ -278,7 +278,7 @@ dspace.log.2018-01-02:34
</p>
</header>
<h2 id="20171201">2017-12-01</h2>
<h2 id="2017-12-01">2017-12-01</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
@ -302,11 +302,11 @@ dspace.log.2018-01-02:34
</p>
</header>
<h2 id="20171101">2017-11-01</h2>
<h2 id="2017-11-01">2017-11-01</h2>
<ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul>
<h2 id="20171102">2017-11-02</h2>
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
@ -335,7 +335,7 @@ COPY 54701
</p>
</header>
<h2 id="20171001">2017-10-01</h2>
<h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
@ -381,11 +381,11 @@ COPY 54701
</p>
</header>
<h2 id="20170906">2017-09-06</h2>
<h2 id="2017-09-06">2017-09-06</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul>
<h2 id="20170907">2017-09-07</h2>
<h2 id="2017-09-07">2017-09-07</h2>
<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is in both the approvers step and the group</li>
</ul>


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20170801">2017-08-01</h2>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
@ -138,11 +138,11 @@
</p>
</header>
<h2 id="20170701">2017-07-01</h2>
<h2 id="2017-07-01">2017-07-01</h2>
<ul>
<li>Run system updates and reboot DSpace Test</li>
</ul>
<h2 id="20170704">2017-07-04</h2>
<h2 id="2017-07-04">2017-07-04</h2>
<ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li>
@ -201,7 +201,7 @@
</p>
</header>
<h2 id="20170402">2017-04-02</h2>
<h2 id="2017-04-02">2017-04-02</h2>
<ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
@ -230,11 +230,11 @@
</p>
</header>
<h2 id="20170301">2017-03-01</h2>
<h2 id="2017-03-01">2017-03-01</h2>
<ul>
<li>Run the 279 CIAT author corrections on CGSpace</li>
</ul>
<h2 id="20170302">2017-03-02</h2>
<h2 id="2017-03-02">2017-03-02</h2>
<ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
@ -266,7 +266,7 @@
</p>
</header>
<h2 id="20170207">2017-02-07</h2>
<h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
@ -300,7 +300,7 @@ DELETE 1
</p>
</header>
<h2 id="20170102">2017-01-02</h2>
<h2 id="2017-01-02">2017-01-02</h2>
<ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn't work there either</li>
@ -323,7 +323,7 @@ DELETE 1
</p>
</header>
<h2 id="20161202">2016-12-02</h2>
<h2 id="2016-12-02">2016-12-02</h2>
<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
@ -355,7 +355,7 @@ DELETE 1
</p>
</header>
<h2 id="20161101">2016-11-01</h2>
<h2 id="2016-11-01">2016-11-01</h2>
<ul>
<li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20161003">2016-10-03</h2>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.duraspace.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
@ -129,7 +129,7 @@
</p>
</header>
<h2 id="20160901">2016-09-01</h2>
<h2 id="2016-09-01">2016-09-01</h2>
<ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li>
@ -155,7 +155,7 @@
</p>
</header>
<h2 id="20160801">2016-08-01</h2>
<h2 id="2016-08-01">2016-08-01</h2>
<ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
@ -185,7 +185,7 @@ $ git rebase -i dspace-5.5
</p>
</header>
<h2 id="20160701">2016-07-01</h2>
<h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
@ -216,7 +216,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160601">2016-06-01</h2>
<h2 id="2016-06-01">2016-06-01</h2>
<ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
@ -242,7 +242,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160501">2016-05-01</h2>
<h2 id="2016-05-01">2016-05-01</h2>
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
@ -268,7 +268,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160404">2016-04-04</h2>
<h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
@ -293,7 +293,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160302">2016-03-02</h2>
<h2 id="2016-03-02">2016-03-02</h2>
<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li>
@ -316,7 +316,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160205">2016-02-05</h2>
<h2 id="2016-02-05">2016-02-05</h2>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
@ -344,7 +344,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
</p>
</header>
<h2 id="20160113">2016-01-13</h2>
<h2 id="2016-01-13">2016-01-13</h2>
<ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>


@ -14,7 +14,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -99,7 +99,7 @@
</p>
</header>
<h2 id="20151202">2015-12-02</h2>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
@ -126,7 +126,7 @@
</p>
</header>
<h2 id="20151122">2015-11-22</h2>
<h2 id="2015-11-22">2015-11-22</h2>
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>