<!DOCTYPE html>
<html lang="en" >

  <head>
    <meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

<meta property="og:title" content="September, 2017" />
<meta property="og:description" content="2017-09-06

Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours

2017-09-07

Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is in both the approvers step and the group
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2017-09/" />
<meta property="article:published_time" content="2017-09-07T16:54:52+07:00" />
<meta property="article:modified_time" content="2018-03-09T22:10:33+02:00" />

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="September, 2017"/>
<meta name="twitter:description" content="2017-09-06

Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours

2017-09-07

Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is in both the approvers step and the group
"/>
<meta name="generator" content="Hugo 0.63.2" />


<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "BlogPosting",
  "headline": "September, 2017",
  "url": "https:\/\/alanorth.github.io\/cgspace-notes\/2017-09\/",
  "wordCount": "4199",
  "datePublished": "2017-09-07T16:54:52+07:00",
  "dateModified": "2018-03-09T22:10:33+02:00",
  "author": {
    "@type": "Person",
    "name": "Alan Orth"
  },
  "keywords": "Notes"
}
</script>


    <link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2017-09/">

    <title>September, 2017 | CGSpace Notes</title>

    
    <!-- combined, minified CSS -->
    
    <link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
    

    <!-- minified Font Awesome for SVG icons -->
    
    <script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.90e14c13cee52929ac33e1c21694a3cc95063a194eb22aad9f7976434e1a9125.js" integrity="sha256-kOFME87lKSmsM&#43;HCFpSjzJUGOhlOsiqtn3l2Q04akSU=" crossorigin="anonymous"></script>

    <!-- RSS 2.0 feed -->
    

  </head>

  <body>

    
    <div class="blog-masthead">
      <div class="container">
        <nav class="nav blog-nav">
          <a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
        </nav>
      </div>
    </div>
    

    <header class="blog-header">
      <div class="container">
        <h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
        <p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
      </div>
    </header>
    
    
    <div class="container">
      <div class="row">
        <div class="col-sm-8 blog-main">

          
<article class="blog-post">
  <header>
    <h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-09/">September, 2017</a></h2>
    <p class="blog-post-meta"><time datetime="2017-09-07T16:54:52&#43;07:00">Thu Sep 07, 2017</time> by Alan Orth in 

<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>

</p>
  </header>
  <h2 id="2017-09-06">2017-09-06</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul>
<h2 id="2017-09-07">2017-09-07</h2>
<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is in both the approvers step and the group</li>
</ul>
<h2 id="2017-09-10">2017-09-10</h2>
<ul>
<li>Delete 58 blank metadata values from the CGSpace database:</li>
</ul>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 58
</code></pre><ul>
<li>I also ran it on DSpace Test because we&rsquo;ll be migrating the CGIAR Library soon and it would be good to catch these before we migrate</li>
<li>Run system updates and restart DSpace Test</li>
<li>We only have 7.7GB of free space on DSpace Test so I need to copy some data off of it before doing the CGIAR Library migration (requires lots of exporting and creating temp files)</li>
<li>I still have the original data from the CGIAR Library so I&rsquo;ve zipped it up and sent it off to linode18 for now</li>
<li>sha256sum of <code>original-cgiar-library-6.6GB.tar.gz</code> is: bcfabb52f51cbdf164b61b7e9b3a0e498479e4c1ed1d547d32d11f44c0d5eb8a</li>
<li>Start doing a test run of the CGIAR Library migration locally</li>
<li>Notes and todo checklist here for now: <a href="https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c">https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c</a></li>
<li>Create pull request for Phase I and II changes to CCAFS Project Tags: <a href="https://github.com/ilri/DSpace/pull/336">#336</a></li>
<li>We&rsquo;ve been discussing this with Macaroni Bros and CCAFS for the past month or so, and the list of tags was recently finalized</li>
<li>There will need to be some metadata updates for that as well (if I recall correctly, only about seven records); I made some notes about it in <a href="/cgspace-notes/2017-07">2017-07</a>, but I&rsquo;ve asked Lili for more clarification just in case</li>
<li>Looking at the DSpace logs to see if we&rsquo;ve had a change in the &ldquo;Cannot get a connection&rdquo; errors since last month when we adjusted the <code>db.maxconnections</code> parameter on CGSpace:</li>
</ul>
<pre><code># grep -c &quot;Cannot get a connection, pool error Timeout waiting for idle object&quot; dspace.log.2017-09-*
dspace.log.2017-09-01:0
dspace.log.2017-09-02:0
dspace.log.2017-09-03:9
dspace.log.2017-09-04:17
dspace.log.2017-09-05:752
dspace.log.2017-09-06:0
dspace.log.2017-09-07:0
dspace.log.2017-09-08:10
dspace.log.2017-09-09:0
dspace.log.2017-09-10:0
</code></pre><ul>
<li>Also, since last month (2017-08) Macaroni Bros no longer runs their REST API scraper every hour, so I&rsquo;m sure that helped</li>
<li>There are still some errors, though, so maybe I should bump the connection limit up a bit</li>
<li>I remember Munin showing an average of 50 connections (probably mostly from the XMLUI) while we&rsquo;re currently allowing 40 connections per app, so it would probably be good to bump that value up to 50 or 60 along with the system&rsquo;s PostgreSQL <code>max_connections</code> (the formula should be webapps * 60 + 3, or 3 * 60 + 3 = 183 in our case)</li>
<li>I updated both CGSpace and DSpace Test to use these new settings (60 connections per web app and 183 for system PostgreSQL limit)</li>
<li>I&rsquo;m expecting to see 0 connection errors for the next few months</li>
</ul>
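<ul>
<li>A minimal Python sketch of the two checks above: counting pool timeout errors per log file (roughly what the <code>grep -c</code> reports) and sizing PostgreSQL&rsquo;s <code>max_connections</code> from the per-webapp pool. Reading the <code>+ 3</code> as headroom for PostgreSQL&rsquo;s <code>superuser_reserved_connections</code> (which defaults to 3) is an assumption, and the function names are hypothetical:</li>
</ul>
<pre><code>import glob
import os

POOL_ERROR = 'Cannot get a connection, pool error Timeout waiting for idle object'


def count_pool_errors(pattern):
    """Map each log file matching the glob pattern to the number of lines
    containing the pool timeout message (roughly what grep -c reports)."""
    counts = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding='utf-8', errors='replace') as log:
            counts[os.path.basename(path)] = sum(POOL_ERROR in line for line in log)
    return counts


def postgres_max_connections(webapps, per_app, reserved=3):
    """PostgreSQL max_connections needed for every webapp to open its full
    pool at once, plus a few reserved slots (assumed: superuser slots)."""
    return webapps * per_app + reserved


# e.g. count_pool_errors('/home/cgspace.cgiar.org/log/dspace.log.2017-09-*')
print(postgres_max_connections(webapps=3, per_app=60))  # 183
</code></pre><ul>
<li>Keeping the webapp count and the per-app pool as separate parameters makes it easy to recompute the limit whenever either one changes</li>
</ul>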
<h2 id="2017-09-11">2017-09-11</h2>
<ul>
<li>Lots of work testing the CGIAR Library migration</li>
<li>Many technical notes and TODOs here: <a href="https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c">https://gist.github.com/alanorth/3579b74e116ab13418d187ed379abd9c</a></li>
</ul>
<h2 id="2017-09-12">2017-09-12</h2>
<ul>
<li>I was testing the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-AIPConfigurationsToImproveIngestionSpeedwhileValidating">METS XSD caching during AIP ingest</a>, but it doesn&rsquo;t actually seem to help</li>
<li>The import process takes the same amount of time with and without the caching</li>
<li>Also, I captured TCP packets destined for port 80 and both imports only captured ONE packet (an update check from some component in Java):</li>
</ul>
<pre><code>$ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x47455420'
</code></pre><ul>
<li>Great TCP dump guide here: <a href="https://danielmiessler.com/study/tcpdump">https://danielmiessler.com/study/tcpdump</a></li>
<li>The last part of that command filters for HTTP GET requests, of which there should have been many to fetch all the XSD files for validation</li>
<li>I sent a message to the mailing list to see if anyone knows more about this</li>
<li>Looking at the tcpdump results, I noticed that there is an update check to the ehcache server on <em>every</em> iteration of the ingest loop, for example:</li>
</ul>
<pre><code>09:39:36.008956 IP 192.168.8.124.50515 &gt; 157.189.192.67.http: Flags [P.], seq 1736833672:1736834103, ack 147469926, win 4120, options [nop,nop,TS val 1175113331 ecr 550028064], length 431: HTTP: GET /kit/reflector?kitID=ehcache.default&amp;pageID=update.properties&amp;id=2130706433&amp;os-name=Mac+OS+X&amp;jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&amp;jvm-version=1.8.0_144&amp;platform=x86_64&amp;tc-version=UNKNOWN&amp;tc-product=Ehcache+Core+1.7.2&amp;source=Ehcache+Core&amp;uptime-secs=0&amp;patch=UNKNOWN HTTP/1.1
</code></pre><ul>
<li>Turns out this is a known issue and Ehcache has refused to make it opt-in: <a href="https://jira.terracotta.org/jira/browse/EHC-461">https://jira.terracotta.org/jira/browse/EHC-461</a></li>
<li>But we can disable it by adding an <code>updateCheck=&quot;false&quot;</code> attribute to the main <code>&lt;ehcache &gt;</code> tag in <code>dspace-services/src/main/resources/caching/ehcache-config.xml</code></li>
<li>After re-compiling and re-deploying DSpace I no longer see those update checks during item submission</li>
<li>I had a Skype call with Bram Luyten from Atmire to discuss various issues related to ORCID in DSpace
<ul>
<li>First, ORCID is deprecating their version 1 API (which DSpace uses) and in version 2 API they have removed the ability to search for users by name</li>
<li>The logic is that searching by name actually isn&rsquo;t very useful because ORCID is essentially a global phonebook and there are tons of legitimately duplicate and ambiguous names</li>
<li>Atmire&rsquo;s proposed integration would work by having users look up and add authors to the authority core directly using their ORCID ID itself (this would happen during the item submission process or perhaps as a standalone / batch process, for example to populate the authority core with a list of known ORCIDs)</li>
<li>Once the association between name and ORCID is made in the authority then it can be autocompleted in the lookup field</li>
<li>Ideally there could also be a user interface for cleanup and merging of authorities</li>
<li>He will prepare a quote for us, keeping in mind that this could be useful to contribute back to the community for a 5.x release</li>
<li>As far as exposing ORCIDs as flat metadata alongside all other metadata, he says this should be possible and will work on a quote for us</li>
</ul>
</li>
</ul>
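<ul>
<li>Why the tcpdump filter above matches GET requests: <code>0x47455420</code> is the ASCII string <code>GET </code>, and <code>tcp[32:4]</code> reads four bytes starting at offset 32 of the TCP header, which is where the payload begins only when the header is 32 bytes long (a 20-byte base header plus 12 bytes of options, like the <code>nop,nop,TS</code> options visible in the capture). A minimal Python sketch of both the fixed-offset check and a variant that reads the header length from the TCP data offset field; the function names and the synthetic segment are illustrative assumptions:</li>
</ul>
<pre><code>import struct

GET_MAGIC = 0x47455420  # the constant compared in the tcpdump filter


def magic_as_text(value):
    """Decode a 32-bit big-endian integer into its four ASCII characters."""
    return struct.pack('!I', value).decode('ascii')


def tcp_header_length(segment):
    """TCP header length in bytes: the upper four bits of byte 12 hold the
    data offset in 32-bit words, so drop the low four bits and multiply by 4."""
    return segment[12] // 16 * 4


def is_http_get(segment, fixed_offset=None):
    """True if the TCP payload starts with 'GET '. fixed_offset=32 mimics
    tcp[32:4] = 0x47455420; None reads the offset from the header instead."""
    offset = tcp_header_length(segment) if fixed_offset is None else fixed_offset
    word = segment[offset:offset + 4]
    return len(word) == 4 and struct.unpack('!I', word)[0] == GET_MAGIC


print(repr(magic_as_text(GET_MAGIC)))  # 'GET '

# Synthetic segment: 20-byte base header whose data offset says 8 words
# (32 bytes), 12 zero bytes standing in for options, then a request line.
header = bytearray(20)
header[12] = 8 * 16
segment = bytes(header) + bytes(12) + b'GET /kit/reflector HTTP/1.1\r\n'
print(is_http_get(segment), is_http_get(segment, fixed_offset=32))  # True True
</code></pre><ul>
<li>Reading the offset from the header avoids silently missing requests whose TCP header carries no options (20 bytes) or a different set of them</li>
</ul>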
<h2 id="2017-09-13">2017-09-13</h2>
<ul>
<li>Last night Linode sent an alert that CGSpace (linode18) had exceeded the outbound traffic rate threshold of 10Mb/s for the last two hours</li>
<li>I wondered what was going on, and after looking at the nginx logs I think it might be OAI&hellip;</li>
<li>Here are yesterday&rsquo;s top ten IP addresses making requests to <code>/oai</code>:</li>
</ul>
<pre><code># awk '{print $1}' /var/log/nginx/oai.log | sort -n | uniq -c | sort -h | tail -n 10
      1 213.136.89.78
      1 66.249.66.90
      1 66.249.66.92
      3 68.180.229.31
      4 35.187.22.255
  13745 54.70.175.86
  15814 34.211.17.113
  15825 35.161.215.53
  16704 54.70.51.7
</code></pre><ul>
<li>Compared to the previous day&rsquo;s logs it looks VERY high:</li>
</ul>
<pre><code># awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
      1 207.46.13.39
      1 66.249.66.93
      2 66.249.66.91
      4 216.244.66.194
     14 66.249.66.90
</code></pre><ul>
<li>The user agents for those top IPs are:
<ul>
<li>54.70.175.86: API scraper</li>
<li>34.211.17.113: API scraper</li>
<li>35.161.215.53: API scraper</li>
<li>54.70.51.7: API scraper</li>
</ul>
</li>
<li>And this user agent has never been seen before today (or at least recently!):</li>
</ul>
<pre><code># grep -c &quot;API scraper&quot; /var/log/nginx/oai.log
62088
# zgrep -c &quot;API scraper&quot; /var/log/nginx/oai.log.*.gz
/var/log/nginx/oai.log.10.gz:0
/var/log/nginx/oai.log.11.gz:0
/var/log/nginx/oai.log.12.gz:0
/var/log/nginx/oai.log.13.gz:0
/var/log/nginx/oai.log.14.gz:0
/var/log/nginx/oai.log.15.gz:0
/var/log/nginx/oai.log.16.gz:0
/var/log/nginx/oai.log.17.gz:0
/var/log/nginx/oai.log.18.gz:0
/var/log/nginx/oai.log.19.gz:0
/var/log/nginx/oai.log.20.gz:0
/var/log/nginx/oai.log.21.gz:0
/var/log/nginx/oai.log.22.gz:0
/var/log/nginx/oai.log.23.gz:0
/var/log/nginx/oai.log.24.gz:0
/var/log/nginx/oai.log.25.gz:0
/var/log/nginx/oai.log.26.gz:0
/var/log/nginx/oai.log.27.gz:0
/var/log/nginx/oai.log.28.gz:0
/var/log/nginx/oai.log.29.gz:0
/var/log/nginx/oai.log.2.gz:0
/var/log/nginx/oai.log.30.gz:0
/var/log/nginx/oai.log.3.gz:0
/var/log/nginx/oai.log.4.gz:0
/var/log/nginx/oai.log.5.gz:0
/var/log/nginx/oai.log.6.gz:0
/var/log/nginx/oai.log.7.gz:0
/var/log/nginx/oai.log.8.gz:0
/var/log/nginx/oai.log.9.gz:0
</code></pre><ul>
<li>Some of these heavy users are also using XMLUI, and their user agent isn&rsquo;t matched by the <a href="https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/tomcat/server-tomcat7.xml.j2#L158">Tomcat Session Crawler valve</a>, so each request uses a different session</li>
<li>Yesterday alone, the IP addresses using the <code>API scraper</code> user agent were responsible for nearly 16,000 sessions in XMLUI:</li>
</ul>
<pre><code># grep -a -E &quot;(54.70.51.7|35.161.215.53|34.211.17.113|54.70.175.86)&quot; /home/cgspace.cgiar.org/log/dspace.log.2017-09-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
15924
</code></pre><ul>
<li>If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex</li>
<li>A search for &ldquo;API scraper&rdquo; user agent on Google returns a <code>robots.txt</code> with a comment that this is the Yewno bot: <a href="http://www.escholarship.org/robots.txt">http://www.escholarship.org/robots.txt</a></li>
<li>Also, while looking at the DSpace logs I noticed a warning from OAI that I should look into:</li>
</ul>
<pre><code>WARN  org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
</code></pre><ul>
<li>Looking at the spreadsheet with deletions and corrections that CCAFS sent last week</li>
<li>It appears they want to delete a lot of metadata, and I&rsquo;m not sure they realize the implications:</li>
</ul>
<pre><code>dspace=# select text_value, count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange') group by text_value;
        text_value        | count
--------------------------+-------
 FP4_ClimateModels        |     6
 FP1_CSAEvidence          |     7
 SEA_UpscalingInnovation  |     7
 FP4_Baseline             |    69
 WA_Partnership           |     1
 WA_SciencePolicyExchange |     6
 SA_GHGMeasurement        |     2
 SA_CSV                   |     7
 EA_PAR                   |    18
 FP4_Livestock            |     7
 FP4_GenderPolicy         |     4
 FP2_CRMWestAfrica        |    12
 FP4_ClimateData          |    24
 FP4_CCPAG                |     2
 SEA_mitigationSAMPLES    |     2
 SA_Biodiversity          |     1
 FP4_PolicyEngagement     |    20
 FP3_Gender               |     9
 FP4_GenderToolbox        |     3
(19 rows)
</code></pre><ul>
<li>I sent CCAFS people an email to ask if they really want to remove these 200+ tags</li>
<li>One of them responded yes, so I&rsquo;ll at least need to do these deletes in PostgreSQL:</li>
</ul>
<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
DELETE 207
</code></pre><ul>
<li>When we discussed this in late July there were some other renames they had requested, but I don&rsquo;t see them in the current spreadsheet, so I will have to follow up on that</li>
<li>I talked to Macaroni Bros and they said to just go ahead with the other corrections as well, as their spreadsheet had evolved organically rather than systematically!</li>
<li>The final list of corrections and deletes should therefore be:</li>
</ul>
<pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
</code></pre><ul>
<li>Create and merge pull request to shut up the Ehcache update check (<a href="https://github.com/ilri/DSpace/pull/337">#337</a>)</li>
<li>It looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0, although it only affects XMLUI: <a href="https://jira.duraspace.org/browse/DS-1492">https://jira.duraspace.org/browse/DS-1492</a></li>
<li>I commented there suggesting that we disable it globally</li>
<li>I merged the changes to the CCAFS project tags (<a href="https://github.com/ilri/DSpace/pull/336">#336</a>) but still need to finalize the metadata deletions/renames</li>
<li>I merged the CGIAR Library theme changes (<a href="https://github.com/ilri/DSpace/pull/338">#338</a>) to the <code>5_x-prod</code> branch in preparation for next week&rsquo;s migration</li>
<li>I emailed the Handle administrators (<a href="mailto:hdladmin@cnri.reston.va.us">hdladmin@cnri.reston.va.us</a>) to ask what the process is for having the CGIAR Library&rsquo;s prefix resolved by our resolver</li>
<li>They responded and said that they need email confirmation from the contact of record of the other prefix, so I should have the CGIAR System Organization people email them before I send the new <code>sitebndl.zip</code></li>
<li>Testing to see how we end up with all these new authorities even after cleaning and merging them in the database</li>
<li>Here are all my distinct authority combinations in the database before:</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence 
------------+--------------------------------------+------------
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
 Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
 Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
 Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
(8 rows)
</code></pre><ul>
<li>And then after adding a new item and selecting an existing &ldquo;Orth, Alan&rdquo; with an ORCID in the author lookup:</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence 
------------+--------------------------------------+------------
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
 Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
 Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
 Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
 Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
(9 rows)
</code></pre><ul>
<li>It created a new authority&hellip; let&rsquo;s try to add another item and select the same existing author and see what happens in the database:</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence 
------------+--------------------------------------+------------
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
 Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
 Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
 Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
 Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
(9 rows)
</code></pre><ul>
<li>No new one&hellip; so now let me try to add another item and select the italicized result from the ORCID lookup and see what happens in the database:</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence 
------------+--------------------------------------+------------
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f |        600
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
 Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
 Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
 Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
 Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
(10 rows)
</code></pre><ul>
<li>Shit, it created another authority! Let&rsquo;s try it again!</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence
------------+--------------------------------------+------------
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f |        600
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
 Orth, Alan | 9aed566a-a248-4878-9577-0caedada43db |        600
 Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
 Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
 Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
 Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
(11 rows)
</code></pre><ul>
<li>It added <em>another</em> authority&hellip; surely this is not the desired behavior, or maybe we are not using this as intended?</li>
</ul>
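<ul>
<li>The same log checks sketched in Python, for when the shell pipelines get unwieldy. This assumes the client IP is the first whitespace-separated field of each nginx log line (as the <code>awk '{print $1}'</code> above does) and reuses the <code>session_id=[A-Z0-9]{32}</code> pattern from the grep; the function names and the sample lines (loosely modeled on the nginx and DSpace log formats) are hypothetical:</li>
</ul>
<pre><code>import re
from collections import Counter

SESSION_RE = re.compile(r'session_id=([A-Z0-9]{32})')


def top_ips(lines, n=10):
    """Requests per client IP (first field of each nginx log line), most
    frequent first; like awk '{print $1}' | sort | uniq -c | sort -h."""
    return Counter(line.split(None, 1)[0] for line in lines if line.strip()).most_common(n)


def distinct_sessions(lines, ips):
    """Distinct DSpace session IDs on lines mentioning any of the given IPs.
    Addresses are escaped and wrapped in word boundaries."""
    ip_re = re.compile(r'\b(?:' + '|'.join(re.escape(ip) for ip in ips) + r')\b')
    sessions = set()
    for line in lines:
        if ip_re.search(line):
            sessions.update(SESSION_RE.findall(line))
    return sessions


sample_nginx = [
    '54.70.51.7 - - "GET /oai/request?verb=ListRecords" 200 "API scraper"',
    '54.70.51.7 - - "GET /oai/request?verb=Identify" 200 "API scraper"',
    '66.249.66.90 - - "GET /oai/request" 200 "Googlebot"',
]
print(top_ips(sample_nginx, n=2))  # [('54.70.51.7', 2), ('66.249.66.90', 1)]

sample_dspace = [
    'INFO anonymous:session_id=' + 'A' * 32 + ':ip_addr=54.70.51.7:view_item',
    'INFO anonymous:session_id=' + 'B' * 32 + ':ip_addr=54.70.51.7:search',
    'INFO anonymous:session_id=' + 'A' * 32 + ':ip_addr=54.70.51.7:view_item',
    'INFO anonymous:session_id=' + 'C' * 32 + ':ip_addr=154.70.51.71:view_item',
]
print(len(distinct_sessions(sample_dspace, ['54.70.51.7', '34.211.17.113'])))  # 2
</code></pre><ul>
<li>Unlike the <code>grep -E</code> above, where the unescaped dots in the IP addresses match any character, the sketch escapes each address and requires word boundaries around it, so <code>54.70.51.7</code> does not also match <code>154.70.51.71</code></li>
</ul>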
<h2 id="2017-09-14">2017-09-14</h2>
<ul>
<li>Communicate with Handle.net admins to try to get some guidance about the 10947 prefix</li>
<li>Michael Marus is the contact for their prefix, but he has left CGIAR; since I actually have access to the CGIAR Library server, I think I can just generate a new <code>sitebndl.zip</code> file from their server and send it to Handle.net</li>
<li>Also, Handle.net says their prefix is up for annual renewal next month so we might want to just pay for it and take it over</li>
<li>CGSpace was very slow and Uptime Robot even said it was down at one time</li>
<li>I didn&rsquo;t see any abnormally high usage in the REST or OAI logs, but looking at Munin I see the average JVM usage was at 4.9GB and the heap is only 5GB (5120M), so I think it&rsquo;s just normal growing pains</li>
<li>Every few months I generally try to increase the JVM heap to be 512M higher than the average usage reported by Munin, so now I adjusted it to 5632M</li>
</ul>
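<ul>
<li>The heap arithmetic above, sketched in Python: Munin&rsquo;s average of about 4.9GB is roughly 5018M, plus 512M of headroom is about 5530M, and rounding up to a multiple of 512M gives 5632M (which also happens to equal the old 5120M heap plus 512M). The rounding step and the function name are assumptions, not part of the rule stated above:</li>
</ul>
<pre><code>import math


def next_heap_mb(avg_used_gb, headroom_mb=512, step_mb=512):
    """Average heap usage plus headroom, rounded up to a multiple of step_mb
    (the rounding is an assumption, not part of the rule above)."""
    target_mb = avg_used_gb * 1024 + headroom_mb
    return math.ceil(target_mb / step_mb) * step_mb


print(next_heap_mb(4.9))  # 5632
</code></pre>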
<h2 id="2017-09-15">2017-09-15</h2>
<ul>
<li>Apply CCAFS project tag corrections on CGSpace:</li>
</ul>
<pre><code>dspace=# \i /tmp/ccafs-projects.sql 
DELETE 5
UPDATE 4
UPDATE 1
DELETE 1
DELETE 207
</code></pre><h2 id="2017-09-17">2017-09-17</h2>
<ul>
<li>Create pull request for CGSpace to be able to resolve multiple handles (<a href="https://github.com/ilri/DSpace/pull/339">#339</a>)</li>
<li>We still need to do the changes to <code>config.dct</code> and regenerate the <code>sitebndl.zip</code> to send to the Handle.net admins</li>
<li>According to this <a href="http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html">dspace-tech mailing list entry from 2011</a>, we need to add the extra handle prefixes to <code>config.dct</code> like this:</li>
</ul>
<pre><code>&quot;server_admins&quot; = (
&quot;300:0.NA/10568&quot;
&quot;300:0.NA/10947&quot;
)

&quot;replication_admins&quot; = (
&quot;300:0.NA/10568&quot;
&quot;300:0.NA/10947&quot;
)

&quot;backup_admins&quot; = (
&quot;300:0.NA/10568&quot;
&quot;300:0.NA/10947&quot;
)
</code></pre><ul>
<li>More work on the CGIAR Library migration test run locally, as I was having problems importing the last fourteen items from the CGIAR System Management Office community</li>
<li>The problem was that we remapped the items to new collections after the initial import, so the items were using the 10947 prefix but the community and collection were using 10568</li>
<li>I ended up having to read the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-ForceReplaceMode">AIP Backup and Restore</a> documentation closely a few times and then explicitly preserve handles and ignore parents:</li>
</ul>
<pre><code>$ for item in 10568-93759/ITEM@10947-46*; do ~/dspace/bin/dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/87738 $item; done
</code></pre><ul>
<li>Also, this was in replace mode (-r) rather than submit mode (-s), because submit mode always generated a new handle even if I told it not to!</li>
<li>I decided to start the import process in the evening rather than waiting for the morning, and right as the first community was finished importing I started seeing <code>Timeout waiting for idle object</code> errors</li>
<li>I had to cancel the import, clean up a bunch of database entries, increase the PostgreSQL <code>max_connections</code> as a precaution, restart PostgreSQL and Tomcat, and then finally completed the import</li>
</ul>
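<ul>
<li>Each admin group in the <code>config.dct</code> fragment above lists one <code>300:0.NA/</code> entry per prefix, so the stanzas could also be generated rather than edited by hand if more prefixes are ever added. A minimal Python sketch; the function name is hypothetical, and it only produces the three groups shown above, not the rest of <code>config.dct</code>:</li>
</ul>
<pre><code>ADMIN_GROUPS = ('server_admins', 'replication_admins', 'backup_admins')


def admin_stanzas(prefixes, index=300):
    """Build the config.dct admin lists so that the admin entry
    (index:0.NA/prefix) of every prefix appears in each group."""
    blocks = []
    for group in ADMIN_GROUPS:
        entries = ''.join(f'"{index}:0.NA/{prefix}"\n' for prefix in prefixes)
        blocks.append(f'"{group}" = (\n{entries})\n')
    return '\n'.join(blocks)


print(admin_stanzas(['10568', '10947']))
</code></pre>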
<h2 id="2017-09-18">2017-09-18</h2>
<ul>
<li>I think we should force regeneration of all thumbnails in the CGIAR Library community: their DSpace is version 1.7 and CGSpace is running DSpace 5.5, so the regenerated thumbnails should look much better</li>
<li>One item for comparison:</li>
</ul>
<p><img src="/cgspace-notes/2017/09/10947-2919-before.jpg" alt="With original DSpace 1.7 thumbnail"></p>
<p><img src="/cgspace-notes/2017/09/10947-2919-after.jpg" alt="After DSpace 5.5"></p>
<ul>
<li>Moved the CGIAR Library Migration notes to a page (<a href="/cgspace-notes/cgiar-library-migration/">cgiar-library-migration</a>), as there seems to be a bug with post slugs defined in front matter when you have a permalink scheme defined in <code>config.toml</code> (this happens in Hugo 0.27.1, at least)</li>
</ul>
<h2 id="2017-09-19">2017-09-19</h2>
<ul>
<li>Nightly Solr indexing is working again, and it appears to be pretty quick actually:</li>
</ul>
<pre><code>2017-09-19 00:00:14,953 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (0 of 65808): 17607
...
2017-09-19 00:04:18,017 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (65807 of 65808): 83753
</code></pre><ul>
<li>Sisay asked if he could import 50 items for IITA that have already been checked by Bosede and Bizuwork</li>
<li>I had a look at the collection and noticed a bunch of issues with item types and donors, so I asked him to fix those and import it to DSpace Test again first</li>
<li>Abenet wants to be able to filter by ISI Journal in advanced search on queries like this: <a href="https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&amp;filtertype_1=dateIssued&amp;filter_relational_operator_1=equals&amp;filter_relational_operator_0=equals&amp;filter_1=%5B2010+TO+2017%5D&amp;filter_0=2017&amp;filtertype=type&amp;filter_relational_operator=equals&amp;filter=Journal+Article">https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&amp;filtertype_1=dateIssued&amp;filter_relational_operator_1=equals&amp;filter_relational_operator_0=equals&amp;filter_1=%5B2010+TO+2017%5D&amp;filter_0=2017&amp;filtertype=type&amp;filter_relational_operator=equals&amp;filter=Journal+Article</a></li>
<li>I opened an issue to track this (<a href="https://github.com/ilri/DSpace/issues/340">#340</a>) and will test it on DSpace Test soon</li>
<li>Marianne Gadeberg from WLE asked if I would add an account for Adam Hunt on CGSpace and give him permissions to approve all WLE publications</li>
<li>I told him to register first, as he&rsquo;s a CGIAR user and needs an account to be created before I can add him to the groups</li>
</ul>
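<ul>
<li>The duration of a Discovery run can be read off the first and last <code>Processing</code> lines of the log. A minimal sketch in Python, assuming the <code>YYYY-MM-DD HH:MM:SS,mmm</code> timestamp prefix shown in the excerpt; <code>log_timestamp</code> and <code>elapsed_seconds</code> are hypothetical helpers, not part of DSpace or the Atmire modules:</li>
</ul>
<pre><code class="language-python">from datetime import datetime

# Timestamp prefix of the dspace.log lines quoted above, e.g. "2017-09-19 00:00:14,953"
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_timestamp(line):
    """Parse the date and time in the first two whitespace-separated fields of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(date_part + " " + time_part, LOG_TIME_FORMAT)


def elapsed_seconds(first_line, last_line):
    """Seconds elapsed between two log lines (hypothetical helper, not part of DSpace)."""
    return (log_timestamp(last_line) - log_timestamp(first_line)).total_seconds()


first = "2017-09-19 00:00:14,953 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (0 of 65808): 17607"
last = "2017-09-19 00:04:18,017 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (65807 of 65808): 83753"

seconds = elapsed_seconds(first, last)
items = 65808  # total from the "of 65808" counter in the log
print(f"{seconds:.3f} s for {items} items, about {items / seconds:.0f} items/s")
</code></pre><ul>
<li>For the two lines above this gives roughly 243 seconds, i.e. about 270 items per second for the whole 65,808-item index</li>
</ul>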
<h2 id="2017-09-20">2017-09-20</h2>
<ul>
<li>Abenet and I noticed that hdl.handle.net is blocked by ETC at ILRI Addis, so I asked Biruk Debebe to route it over the satellite</li>
<li>Force thumbnail regeneration for the CGIAR System Organization&rsquo;s Historic Archive community (2000 items):</li>
</ul>
<pre><code>$ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -f -i 10947/1 -p &quot;ImageMagick PDF Thumbnail&quot;
</code></pre><ul>
<li>I&rsquo;m still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org</li>
</ul>
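<ul>
<li>The <code>schedtool</code>/<code>ionice</code>/<code>nice</code> prefix is meant to keep a long batch job from competing with Tomcat: <code>schedtool -D</code> runs it under the <code>SCHED_IDLEPRIO</code> CPU policy, <code>ionice -c2 -n7</code> puts it at the lowest priority of the best-effort I/O class, and <code>nice -n19</code> gives it the lowest CPU priority. In the <code>filter-media</code> call, <code>-f</code> forces all bitstreams to be processed, <code>-i</code> limits it to one handle, and <code>-p</code> selects the media filter plugin. A minimal Python sketch that builds the same argument list for any command; <code>low_priority</code> is a hypothetical helper, and actually running the result assumes these tools are on the <code>PATH</code>:</li>
</ul>
<pre><code class="language-python">import shlex

# Prefix mirroring the command above: idle CPU scheduling, lowest best-effort I/O priority, lowest niceness
LOW_PRIORITY_PREFIX = ["schedtool", "-D", "-e", "ionice", "-c2", "-n7", "nice", "-n19"]


def low_priority(command):
    """Return an argv list that runs `command` (a list of strings) at the lowest CPU and I/O priority."""
    return LOW_PRIORITY_PREFIX + list(command)


# Force thumbnail regeneration for one handle using a single media filter plugin
argv = low_priority(["dspace", "filter-media", "-f", "-i", "10947/1", "-p", "ImageMagick PDF Thumbnail"])

# Show the equivalent shell command line
print(shlex.join(argv))

# To actually run it (requires schedtool, ionice, nice and dspace on PATH):
#   import subprocess; subprocess.run(argv, check=True)
</code></pre><ul>
<li>Building an argument list rather than a shell string means the plugin name, which contains spaces, needs no extra quoting when passed to <code>subprocess.run</code></li>
</ul>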
<h2 id="2017-09-21">2017-09-21</h2>
<ul>
<li>Switch to OpenJDK 8 from Oracle JDK on DSpace Test</li>
<li>I want to test this for a while to see if we can start using it instead</li>
<li>I need to look at the JVM graphs in Munin, test the Atmire modules, build the source, etc. to get some impressions</li>
</ul>
<h2 id="2017-09-22">2017-09-22</h2>
<ul>
<li>Experimenting with setting up a global JNDI database resource that can be pooled among all the DSpace webapps (reference the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting</a> comments)</li>
<li>See: <a href="https://www.journaldev.com/2513/tomcat-datasource-jndi-example-java">https://www.journaldev.com/2513/tomcat-datasource-jndi-example-java</a></li>
<li>See: <a href="http://memorynotfound.com/configure-jndi-datasource-tomcat/">http://memorynotfound.com/configure-jndi-datasource-tomcat/</a></li>
</ul>
<h2 id="2017-09-24">2017-09-24</h2>
<ul>
<li>Start investigating other platforms for CGSpace due to linear instance pricing on Linode</li>
<li>We need to figure out how much memory is used by applications, caches, etc., and how much disk space the asset store needs</li>
<li>First, here&rsquo;s the last week of memory usage on CGSpace and DSpace Test:</li>
</ul>
<p><img src="/cgspace-notes/2017/09/cgspace-memory-week.png" alt="CGSpace memory week">
<img src="/cgspace-notes/2017/09/dspace-test-memory-week.png" alt="DSpace Test memory week"></p>
<ul>
<li>8GB of RAM seems to be good for DSpace Test for now, with Tomcat&rsquo;s JVM heap taking 3GB, caches and buffers taking 3–4GB, and then ~1GB unused</li>
<li>24GB of RAM is <em>way</em> too much for CGSpace, with Tomcat&rsquo;s JVM heap taking 5.5GB and caches and buffers happily using 14GB or so</li>
<li>As for disk space, the CGSpace assetstore currently uses 51GB and the Solr cores use 86GB (mostly in the statistics core)</li>
<li>DSpace Test currently doesn&rsquo;t even have enough space to store a full copy of CGSpace, as its Linode instance only has 96GB of disk space</li>
<li>I&rsquo;ve heard Google Cloud is nice (cheap and performant), but it&rsquo;s definitely more complicated than Linode and the instances aren&rsquo;t <em>that</em> much cheaper, so it doesn&rsquo;t seem worth it</li>
<li>Here are some theoretical instances on Google Cloud:
<ul>
<li>DSpace Test, <code>n1-standard-2</code> with 2 vCPUs, 7.5GB RAM, 300GB persistent SSD: $99/month</li>
<li>CGSpace, <code>n1-standard-4</code> with 4 vCPUs, 15GB RAM, 300GB persistent SSD: $148/month</li>
</ul>
</li>
<li>Looking at <a href="https://www.linode.com/pricing#all">Linode&rsquo;s instance pricing</a>, for DSpace Test it seems we could use the same 8GB instance for $40/month, and then add <a href="https://www.linode.com/docs/platform/how-to-use-block-storage-with-your-linode">block storage</a> of ~300GB for $30 (block storage is currently in beta and priced at $0.10/GiB)</li>
<li>For CGSpace we could use the cheaper 12GB instance for $80 and then add block storage of 500GB for $50</li>
<li>I&rsquo;ve sent Peter a message about moving DSpace Test to the New Jersey data center so we can test the block storage beta</li>
<li>Create pull request for adding ISI Journal to search filters (<a href="https://github.com/ilri/DSpace/pull/341">#341</a>)</li>
<li>Peter asked if we could map all the items of type <code>Journal Article</code> in <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI Archive</a> to <a href="https://cgspace.cgiar.org/handle/10568/3">ILRI articles in journals and newsletters</a></li>
<li>It is easy to do via CSV using OpenRefine, but I noticed that on CGSpace ~1,000 of the expected 2,500 were already mapped, while on DSpace Test they were not</li>
<li>I&rsquo;ve asked Peter if he knows what&rsquo;s going on (or who mapped them)</li>
<li>Turns out he had already mapped some, but requested that I finish the rest</li>
<li>With this GREL in OpenRefine I can find items that are mapped, i.e. those that have <code>10568/3||</code> or <code>10568/3$</code> in their <code>collection</code> field:</li>
</ul>
<pre><code>isNotNull(value.match(/.+?10568\/3(\|\|.+|$)/))
</code></pre><ul>
<li>Peter also made a lot of changes to the data in the Archives collections while I was attempting to import the changes, so we were essentially competing for PostgreSQL and Solr connections</li>
<li>I ended up having to kill the import and wait until he was done</li>
<li>I exported a clean CSV and applied the changes from that one, which came to a hundred or two fewer changes than I expected (at least compared to the current state of DSpace Test, which is a few months old)</li>
</ul>
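<ul>
<li>OpenRefine&rsquo;s GREL <code>match()</code> only succeeds if the regular expression matches the entire cell value, so the expression above behaves like Python&rsquo;s <code>re.fullmatch</code>. The leading <code>.+?</code> requires at least one character before <code>10568/3</code>, so a value whose first collection is <code>10568/3</code> does not count; that is what we want if, as in DSpace&rsquo;s CSV export, the owning collection is listed first and mapped collections follow. A minimal sketch for checking the pattern outside OpenRefine (<code>is_mapped</code> is a hypothetical name and <code>10568/99999</code> is a placeholder handle):</li>
</ul>
<pre><code class="language-python">import re

# Same pattern as the GREL expression: something before 10568/3, then either "||more" or the end of the value
MAPPED_TO_ILRI_ARTICLES = re.compile(r".+?10568/3(\|\|.+|$)")


def is_mapped(collection_field):
    """True if the ||-separated collection field lists 10568/3 after the first (owning) collection."""
    # fullmatch mirrors GREL's match(), which must match the entire cell value
    return MAPPED_TO_ILRI_ARTICLES.fullmatch(collection_field) is not None


# Sample values; 10568/99999 is a placeholder handle
for value in ["10568/2703", "10568/2703||10568/3", "10568/2703||10568/3||10568/99999", "10568/2703||10568/35"]:
    print(value, is_mapped(value))
</code></pre><ul>
<li>Note that <code>10568/35</code> is correctly rejected, because <code>10568/3</code> must be followed by <code>||</code> or the end of the value</li>
</ul>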
<h2 id="2017-09-25">2017-09-25</h2>
<ul>
<li>Email Rosemary Kande from ICT to ask about the administrative / finance procedure for moving DSpace Test from EU to US region on Linode</li>
<li>Communicate (finally) with Tania and Tunji from the CGIAR System Organization office to tell them to request CGNET make the DNS updates for library.cgiar.org</li>
<li>Peter wants me to clean up the text values for Delia Grace&rsquo;s metadata, as the authorities are all messed up again since we cleaned them up in <a href="/cgspace-notes/2016-12">2016-12</a>:</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';                                  
  text_value  |              authority               | confidence              
--------------+--------------------------------------+------------             
 Grace, Delia |                                      |        600              
 Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c |        600              
 Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c |         -1              
 Grace, D.    | 6a8ddca3-33c1-45f9-aa00-6fa9fc91e3fc |         -1
</code></pre><ul>
<li>Strangely, none of her authority entries have ORCIDs anymore&hellip;</li>
<li>I&rsquo;ll just fix the text values and forget about it for now:</li>
</ul>
<pre><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 610
</code></pre><ul>
<li>After this we have to reindex the Discovery and Authority cores (as <code>tomcat7</code> user):</li>
</ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b

real    83m56.895s
user    13m16.320s
sys     2m17.917s
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-authority -b
Retrieving all data
Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
Exception: null
java.lang.NullPointerException
        at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
        at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
        at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)

real    6m6.447s
user    1m34.010s
sys     0m12.113s
</code></pre><ul>
<li>The <code>index-authority</code> script always seems to fail; I think it&rsquo;s the same old bug</li>
<li>I couldn&rsquo;t determine whether the JNDI database pool was working when I tried it locally the other day, so for my notes, here is an interesting error message that I just saw in the DSpace logs today:</li>
</ul>
<pre><code>ERROR org.dspace.storage.rdbms.DatabaseManager @ Error retrieving JNDI context: jdbc/dspaceLocal
...
INFO  org.dspace.storage.rdbms.DatabaseManager @ Unable to locate JNDI dataSource: jdbc/dspaceLocal
INFO  org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Database pool
</code></pre><ul>
<li>So it&rsquo;s good to know that <em>something</em> gets printed when it fails because I didn&rsquo;t see <em>any</em> mention of JNDI before when I was testing!</li>
</ul>
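<ul>
<li>To see exactly what the Delia Grace cleanup does, here is a minimal sketch that replays the <code>select distinct</code> and <code>update</code> from earlier in this entry against a toy in-memory SQLite table. SQLite stands in for PostgreSQL, the table only has the columns those queries use, and there is one row per variant from the query output rather than the real 610 rows. For reference, confidence 600 is DSpace&rsquo;s <code>CF_ACCEPTED</code> and -1 is <code>CF_UNSET</code>. Also note that <code>like 'Grace, D%'</code> would match any other author whose given name starts with D, which is why it&rsquo;s worth running the <code>select</code> first (SQLite&rsquo;s <code>LIKE</code> is case-insensitive for ASCII, unlike PostgreSQL&rsquo;s, but that makes no difference here):</li>
</ul>
<pre><code class="language-python">import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for DSpace's metadatavalue table, with only the columns used in the queries above
conn.execute(
    "create table metadatavalue (resource_type_id integer, metadata_field_id integer,"
    " text_value text, authority text, confidence integer)"
)
# One row per variant seen in the psql output (the empty authority is modelled as NULL)
rows = [
    ("Grace, Delia", None, 600),
    ("Grace, Delia", "bfa61d7c-7583-4175-991c-2e7315000f0c", 600),
    ("Grace, Delia", "bfa61d7c-7583-4175-991c-2e7315000f0c", -1),
    ("Grace, D.", "6a8ddca3-33c1-45f9-aa00-6fa9fc91e3fc", -1),
]
# resource_type_id=2 (items) and metadata_field_id=3, as in the queries above
conn.executemany("insert into metadatavalue values (2, 3, ?, ?, ?)", rows)

select_sql = (
    "select distinct text_value, authority, confidence from metadatavalue"
    " where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'"
)
before = conn.execute(select_sql).fetchall()

# Point every variant at one text value, one authority and confidence 600 (accepted)
updated = conn.execute(
    "update metadatavalue set text_value='Grace, Delia',"
    " authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600"
    " where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'"
).rowcount
after = conn.execute(select_sql).fetchall()

print(len(before), "distinct variants before,", updated, "rows updated,", len(after), "after")
print(after)
</code></pre><ul>
<li>The four distinct variants collapse into a single <code>(text_value, authority, confidence)</code> row, which is the state the real <code>UPDATE 610</code> leaves behind before the Discovery and Authority cores are reindexed</li>
</ul>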
<h2 id="2017-09-26">2017-09-26</h2>
<ul>
<li>Adam Hunt from WLE finally registered so I added him to the editor and approver groups</li>
<li>Then I noticed that Sisay never removed Marianne&rsquo;s user accounts from the approver steps in the workflow because she is already in the WLE groups, which are in those steps</li>
<li>For what it&rsquo;s worth, I had asked him to remove them on 2017-09-14</li>
<li>I also went and added the WLE approvers and editors groups to the appropriate steps of all the Phase I and Phase II research theme collections</li>
<li>A lot of CIAT&rsquo;s items have manually generated thumbnails which have an incorrect aspect ratio and an ugly black border</li>
<li>I told Elizabeth from CIAT that she should use DSpace&rsquo;s automatically generated thumbnails</li>
<li>Start discussing with ICT about the Linode server update for DSpace Test</li>
<li>Rosemary said I need to work with Robert Okal to destroy/create the server, and then let her and Lilian Masigah from finance know the updated Linode asset names for their records</li>
</ul>
<h2 id="2017-09-28">2017-09-28</h2>
<ul>
<li>Tunji from the System Organization finally sent the DNS request for library.cgiar.org to CGNET</li>
<li>Now the redirects work</li>
<li>I quickly registered a Let&rsquo;s Encrypt certificate for the domain:</li>
</ul>
<pre><code># systemctl stop nginx
# /opt/certbot-auto certonly --standalone --email aorth@mjanja.ch -d library.cgiar.org
# systemctl start nginx
</code></pre><ul>
<li>I modified the nginx configuration in the Ansible playbooks to use the new certificate, and now it is enabled and OCSP stapling is working:</li>
</ul>
<pre><code>$ openssl s_client -connect cgspace.cgiar.org:443 -servername library.cgiar.org  -tls1_2 -tlsextdebug -status
...
OCSP Response Data:
...
Cert Status: good
</code></pre>
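<ul>
<li><code>-status</code> asks the server to staple an OCSP response during the handshake, and <code>-servername</code> sends SNI so that nginx presents the library.cgiar.org certificate even though the connection goes to cgspace.cgiar.org. A minimal Python sketch that automates the check by running a similar <code>openssl s_client</code> command and reading the <code>Cert Status</code> line; <code>ocsp_status</code> and <code>check_stapling</code> are hypothetical helpers, and running the real check assumes the <code>openssl</code> CLI is installed (it is fed empty input so that it exits after the handshake):</li>
</ul>
<pre><code class="language-python">import subprocess


def ocsp_status(s_client_output):
    """Return the 'Cert Status' value from openssl s_client -status output, or None if there is none."""
    for line in s_client_output.splitlines():
        line = line.strip()
        if line.startswith("Cert Status:"):
            return line.split(":", 1)[1].strip()
    return None


def check_stapling(host, servername):
    """Run openssl s_client against host:443 with SNI servername and return the stapled OCSP status."""
    result = subprocess.run(
        ["openssl", "s_client", "-connect", host + ":443", "-servername", servername, "-status"],
        input="",  # empty stdin so s_client exits after the handshake
        capture_output=True,
        text=True,
        timeout=30,
    )
    return ocsp_status(result.stdout)


# Parse the excerpt quoted above
sample = "OCSP Response Data:\n...\nCert Status: good\n"
print(ocsp_status(sample))

# check_stapling("cgspace.cgiar.org", "library.cgiar.org") would run the real check
</code></pre><ul>
<li>Anything other than <code>good</code> (or <code>None</code>, meaning no <code>Cert Status</code> line was found, e.g. because no OCSP response was stapled) would be worth investigating, for example after renewing the certificate</li>
</ul>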

  
</article> 


        </div> <!-- /.blog-main -->

        <aside class="col-sm-3 ml-auto blog-sidebar">
  

        <section class="sidebar-module">
    <h4>Recent Posts</h4>
    <ol class="list-unstyled">


<li><a href="/cgspace-notes/2020-02/">February, 2020</a></li>

<li><a href="/cgspace-notes/2020-01/">January, 2020</a></li>

<li><a href="/cgspace-notes/2019-12/">December, 2019</a></li>

<li><a href="/cgspace-notes/2019-11/">November, 2019</a></li>

<li><a href="/cgspace-notes/cgspace-cgcorev2-migration/">CGSpace CG Core v2 Migration</a></li>

    </ol>
  </section>

  
  <section class="sidebar-module">
    <h4>Links</h4>
    <ol class="list-unstyled">
      
      <li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
      
      <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
      
      <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
      
    </ol>
  </section>
  
</aside>


      </div> <!-- /.row -->
    </div> <!-- /.container -->
    

    <footer class="blog-footer">
      <p dir="auto">
      
      Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
      
      </p>
      <p>
      <a href="#">Back to top</a>
      </p>
    </footer>
    

  </body>

</html>
+								<html lang="en" >
 								  <body>
 								    <div class="container">
 								      <div class="row">
 								        <div class="col-sm-8 blog-main">
 								<article class="blog-post">
+								<pre><code>09:39:36.008956 IP 192.168.8.124.50515 &gt; 157.189.192.67.http: Flags [P.], seq 1736833672:1736834103, ack 147469926, win 4120, options [nop,nop,TS val 1175113331 ecr 550028064], length 431: HTTP: GET /kit/reflector?kitID=ehcache.default&amp;pageID=update.properties&amp;id=2130706433&amp;os-name=Mac+OS+X&amp;jvm-name=Java+HotSpot%28TM%29+64-Bit+Server+VM&amp;jvm-version=1.8.0_144&amp;platform=x86_64&amp;tc-version=UNKNOWN&amp;tc-product=Ehcache+Core+1.7.2&amp;source=Ehcache+Core&amp;uptime-secs=0&amp;patch=UNKNOWN HTTP/1.1
+								</code></pre><ul>
 								<li>Turns out this is a known issue and Ehcache has refused to make it opt-in: <a href="https://jira.terracotta.org/jira/browse/EHC-461">https://jira.terracotta.org/jira/browse/EHC-461</a></li>
								<li>But we can disable it by adding an <code>updateCheck=&quot;false&quot;</code> attribute to the main <code>&lt;ehcache&gt;</code> tag in <code>dspace-services/src/main/resources/caching/ehcache-config.xml</code></li>
 								<li>After re-compiling and re-deploying DSpace I no longer see those update checks during item submission</li>
 								<li>I had a Skype call with Bram Luyten from Atmire to discuss various issues related to ORCID in DSpace
								<ul>
								<li>First, ORCID is deprecating version 1 of their API (which DSpace uses), and in version 2 they have removed the ability to search for users by name</li>
								<li>The logic is that searching by name actually isn&rsquo;t very useful because ORCID is essentially a global phonebook and there are tons of legitimately duplicate and ambiguous names</li>
								<li>Atmire&rsquo;s proposed integration would work by having users look up and add authors to the authority core directly using their ORCID ID itself (this would happen during the item submission process or perhaps as a standalone / batch process, for example to populate the authority core with a list of known ORCIDs)</li>
								<li>Once the association between name and ORCID is made in the authority then it can be autocompleted in the lookup field</li>
 								<li>Ideally there could also be a user interface for cleanup and merging of authorities</li>
								<li>He will prepare a quote for us, keeping in mind that this could be useful to contribute back to the community for a 5.x release</li>
								<li>As far as exposing ORCIDs as flat metadata alongside all other metadata, he says this should be possible and will work on a quote for us</li>
 								</ul>
								</li>
 								</ul>
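<p>Why the capture filter works: <code>0x47455420</code> is the ASCII string &ldquo;GET &rdquo; (G, E, T and a space) read as one 32-bit big-endian number. <code>tcp[32:4]</code> reads the four bytes starting at offset 32 of the TCP segment. That offset is only the start of the HTTP payload when the TCP header is exactly 32 bytes long, i.e. the 20-byte base header plus 12 bytes of options such as the <code>nop,nop,TS</code> visible in the captured packet above. A commonly used variant reads the real header length from the data offset field instead: <code>'tcp[((tcp[12:1] &amp; 0xf0) &gt;&gt; 2):4] = 0x47455420'</code>. The following is a minimal Python sketch of both checks, not something that was run during this investigation. <code>build_segment()</code> is a hypothetical helper that fakes a TCP segment whose header fields are all zero except the data offset:</p>
<pre><code>import struct

# ASCII "GET " packed as a big-endian 32-bit integer, as in the tcpdump filter
GET_MAGIC = 0x47455420


def build_segment(header_len, payload):
    # Hypothetical test helper: a zeroed TCP header of header_len bytes whose
    # data offset (high nibble of byte 12, counted in 32-bit words) is set
    header = bytearray(header_len)
    header[12] = (header_len // 4) * 16
    return bytes(header) + payload


def first_word_is_get(segment, offset):
    # Compare the four bytes at offset with GET_MAGIC; too-short packets fail
    word = segment[offset:offset + 4]
    if len(word) != 4:
        return False
    return struct.unpack('!I', word)[0] == GET_MAGIC


def matches_fixed_offset(segment):
    # Equivalent of tcp[32:4] = 0x47455420
    return first_word_is_get(segment, 32)


def matches_data_offset(segment):
    # Equivalent of tcp[((tcp[12:1] &amp; 0xf0) &gt;&gt; 2):4] = 0x47455420
    header_len = (segment[12] // 16) * 4
    return first_word_is_get(segment, header_len)


request = b'GET /kit/reflector HTTP/1.1\r\n'
print(matches_fixed_offset(build_segment(32, request)))  # True
print(matches_fixed_offset(build_segment(20, request)))  # False
print(matches_data_offset(build_segment(20, request)))   # True
</code></pre>
<p>The data offset field counts 32-bit words, so its high nibble times four is the header length in bytes. That is what <code>&gt;&gt; 2</code> on the masked byte computes in the tcpdump expression.</p>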
								<h2 id="2017-09-13">2017-09-13</h2>
								<ul>
								<li>Last night Linode sent an alert that CGSpace (linode18) had exceeded the outbound traffic rate threshold of 10Mb/s for the last two hours</li>
								<li>I wonder what was going on, and looking into the nginx logs I think maybe it&rsquo;s OAI&hellip;</li>
								<li>Here are yesterday&rsquo;s top ten IP addresses making requests to <code>/oai</code>:</li>
								</ul>
								<pre><code># awk '{print $1}' /var/log/nginx/oai.log | sort -n | uniq -c | sort -h | tail -n 10
213.136.89.78
66.249.66.90
66.249.66.92
68.180.229.31
35.187.22.255
54.70.175.86
34.211.17.113
35.161.215.53
54.70.51.7
</code></pre><ul>
								<li>Compared to the previous day&rsquo;s logs it looks VERY high:</li>
								</ul>
								<pre><code># awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
207.46.13.39
66.249.66.93
66.249.66.91
216.244.66.194
66.249.66.90
</code></pre><ul>
 								<li>The user agents for those top IPs are:
								<ul>
 								<li>54.70.175.86: API scraper</li>
 								<li>34.211.17.113: API scraper</li>
 								<li>35.161.215.53: API scraper</li>
 								<li>54.70.51.7: API scraper</li>
								</ul>
 								</li>
 								<li>And this user agent has never been seen before today (or at least recently!):</li>
 								</ul>
								<pre><code># grep -c &quot;API scraper&quot; /var/log/nginx/oai.log

# zgrep -c &quot;API scraper&quot; /var/log/nginx/oai.log.*.gz
/var/log/nginx/oai.log.10.gz:0
/var/log/nginx/oai.log.11.gz:0
/var/log/nginx/oai.log.12.gz:0
/var/log/nginx/oai.log.13.gz:0
/var/log/nginx/oai.log.14.gz:0
/var/log/nginx/oai.log.15.gz:0
/var/log/nginx/oai.log.16.gz:0
/var/log/nginx/oai.log.17.gz:0
/var/log/nginx/oai.log.18.gz:0
/var/log/nginx/oai.log.19.gz:0
/var/log/nginx/oai.log.20.gz:0
/var/log/nginx/oai.log.21.gz:0
/var/log/nginx/oai.log.22.gz:0
/var/log/nginx/oai.log.23.gz:0
/var/log/nginx/oai.log.24.gz:0
/var/log/nginx/oai.log.25.gz:0
/var/log/nginx/oai.log.26.gz:0
/var/log/nginx/oai.log.27.gz:0
/var/log/nginx/oai.log.28.gz:0
/var/log/nginx/oai.log.29.gz:0
/var/log/nginx/oai.log.2.gz:0
/var/log/nginx/oai.log.30.gz:0
/var/log/nginx/oai.log.3.gz:0
/var/log/nginx/oai.log.4.gz:0
/var/log/nginx/oai.log.5.gz:0
/var/log/nginx/oai.log.6.gz:0
/var/log/nginx/oai.log.7.gz:0
/var/log/nginx/oai.log.8.gz:0
/var/log/nginx/oai.log.9.gz:0
</code></pre><ul>
								<li>Some of these heavy users are also using XMLUI, and their user agent isn&rsquo;t matched by the <a href="https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/tomcat/server-tomcat7.xml.j2#L158">Tomcat Session Crawler valve</a>, so each request uses a different session</li>
								<li>Yesterday alone the IP addresses using the <code>API scraper</code> user agent were responsible for 16,000 sessions in XMLUI:</li>
 								</ul>
								<pre><code># grep -a -E &quot;(54.70.51.7|35.161.215.53|34.211.17.113|54.70.175.86)&quot; /home/cgspace.cgiar.org/log/dspace.log.2017-09-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l

</code></pre><ul>
 								<li>If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex</li>
 								<li>A search for &ldquo;API scraper&rdquo; user agent on Google returns a <code>robots.txt</code> with a comment that this is the Yewno bot: <a href="http://www.escholarship.org/robots.txt">http://www.escholarship.org/robots.txt</a></li>
 								<li>Also, in looking at the DSpace logs I noticed a warning from OAI that I should look into:</li>
 								</ul>
								<pre><code>WARN  org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
</code></pre><ul>
 								<li>Looking at the spreadsheet with deletions and corrections that CCAFS sent last week</li>
								<li>It appears they want to delete a lot of metadata, and I&rsquo;m not sure they realize the implications:</li>
								</ul>
								<pre><code>dspace=# select text_value, count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange') group by text_value;
        text_value        | count
--------------------------+-------
 FP4_ClimateModels        |     6
 FP1_CSAEvidence          |     7
 SEA_UpscalingInnovation  |     7
 FP4_Baseline             |    69
 WA_Partnership           |     1
 WA_SciencePolicyExchange |     6
 SA_GHGMeasurement        |     2
 SA_CSV                   |     7
 EA_PAR                   |    18
 FP4_Livestock            |     7
 FP4_GenderPolicy         |     4
 FP2_CRMWestAfrica        |    12
 FP4_ClimateData          |    24
 FP4_CCPAG                |     2
 SEA_mitigationSAMPLES    |     2
 SA_Biodiversity          |     1
 FP4_PolicyEngagement     |    20
 FP3_Gender               |     9
 FP4_GenderToolbox        |     3
(19 rows)
</code></pre><ul>
 								<li>I sent CCAFS people an email to ask if they really want to remove these 200+ tags</li>
								<li>She responded yes, so I&rsquo;ll at least need to do these deletes in PostgreSQL:</li>
								</ul>
								<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
DELETE 207
</code></pre><ul>
								<li>When we discussed this in late July there were some other renames they had requested, but I don&rsquo;t see them in the current spreadsheet, so I will have to follow up on that</li>
								<li>I talked to Macaroni Bros and they said to just go ahead with the other corrections as well, as their spreadsheet had evolved organically rather than systematically!</li>
 								<li>The final list of corrections and deletes should therefore be:</li>
 								</ul>
								<pre><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
</code></pre><ul>
 								<li>Create and merge pull request to shut up the Ehcache update check (<a href="https://github.com/ilri/DSpace/pull/337">#337</a>)</li>
								<li>It looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0, although it only affects XMLUI: <a href="https://jira.duraspace.org/browse/DS-1492">https://jira.duraspace.org/browse/DS-1492</a></li>
 								<li>I commented there suggesting that we disable it globally</li>
 								<li>I merged the changes to the CCAFS project tags (<a href="https://github.com/ilri/DSpace/pull/336">#336</a>) but still need to finalize the metadata deletions/renames</li>
								<li>I merged the CGIAR Library theme changes (<a href="https://github.com/ilri/DSpace/pull/338">#338</a>) to the <code>5_x-prod</code> branch in preparation for next week&rsquo;s migration</li>
								<li>I emailed the Handle administrators (<a href="mailto:hdladmin@cnri.reston.va.us">hdladmin@cnri.reston.va.us</a>) to ask them what the process is for having the CGIAR Library&rsquo;s prefix resolved by our resolver</li>
 								<li>They responded and said that they need email confirmation from the contact of record of the other prefix, so I should have the CGIAR System Organization people email them before I send the new <code>sitebndl.zip</code></li>
 								<li>Testing to see how we end up with all these new authorities after we keep cleaning and merging them in the database</li>
 								<li>Here are all my distinct authority combinations in the database before:</li>
 								</ul>
								<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence
------------+--------------------------------------+------------
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
 Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
 Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
 Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
(8 rows)
</code></pre><ul>
 								<li>And then after adding a new item and selecting an existing &ldquo;Orth, Alan&rdquo; with an ORCID in the author lookup:</li>
 								</ul>
								<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence
------------+--------------------------------------+------------
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
 Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
 Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
 Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
 Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
(9 rows)
</code></pre><ul>
								<li>It created a new authority&hellip; let&rsquo;s try to add another item and select the same existing author and see what happens in the database:</li>
								</ul>
								<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence
------------+--------------------------------------+------------
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
 Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
 Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
 Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
 Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
(9 rows)
</code></pre><ul>
 								<li>No new one&hellip; so now let me try to add another item and select the italicized result from the ORCID lookup and see what happens in the database:</li>
 								</ul>
								<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence
------------+--------------------------------------+------------
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f |        600
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
 Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
 Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
 Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
 Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
(10 rows)
</code></pre><ul>
								<li>Shit, it created another authority! Let&rsquo;s try it again!</li>
								</ul>
								<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
 text_value |              authority               | confidence
------------+--------------------------------------+------------
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |         -1
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | d85a8a5b-9b82-4aaf-8033-d7e0c7d9cb8f |        600
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |        600
 Orth, Alan | 9aed566a-a248-4878-9577-0caedada43db |        600
 Orth, A.   | 1a1943a0-3f87-402f-9afe-e52fb46a513e |        600
 Orth, Alan | 1a1943a0-3f87-402f-9afe-e52fb46a513e |         -1
 Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad |          0
 Orth, Alan | cb3aa5ae-906f-4902-97b1-2667cf148dde |        600
 Orth, Alan | 0d575fa3-8ac4-4763-a90a-1248d4791793 |         -1
 Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
(11 rows)
</code></pre><ul>
								<li>It added <em>another</em> authority&hellip; surely this is not the desired behavior, or maybe we are not using this as intended?</li>
								</ul>
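<p>The plan is to add the &ldquo;API scraper&rdquo; user agent to the session crawler valve regex. Tomcat&rsquo;s <code>CrawlerSessionManagerValve</code> compares the whole <code>User-Agent</code> header against its <code>crawlerUserAgents</code> regular expression. This is a full match, not a substring search. The documented default pattern is <code>.*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*</code>, and a user agent like &ldquo;API scraper&rdquo; contains none of those strings. The Python sketch below imitates that check with <code>re.fullmatch</code>. It uses Tomcat&rsquo;s default pattern rather than the one in our Ansible template. The extra <code>.*API scraper.*</code> alternative is a hypothetical addition, not something that is deployed:</p>
<pre><code>import re

# Default crawlerUserAgents documented for Tomcat's CrawlerSessionManagerValve
DEFAULT_CRAWLER_UA = r'.*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*'

# Hypothetical extension with one more alternative for the Yewno scraper
EXTENDED_CRAWLER_UA = DEFAULT_CRAWLER_UA + r'|.*API scraper.*'


def is_crawler(user_agent, pattern):
    # The valve requires the pattern to match the entire header value,
    # which corresponds to re.fullmatch rather than re.search
    return re.fullmatch(pattern, user_agent) is not None


googlebot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
print(is_crawler('API scraper', DEFAULT_CRAWLER_UA))   # False
print(is_crawler('API scraper', EXTENDED_CRAWLER_UA))  # True
print(is_crawler(googlebot, DEFAULT_CRAWLER_UA))       # True
</code></pre>
<p>In the Tomcat configuration the pattern is set with the <code>crawlerUserAgents</code> attribute of the <code>org.apache.catalina.valves.CrawlerSessionManagerValve</code> valve. Requests whose user agent matches then share one session per client IP instead of creating a new session for every request.</p>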
								<h2 id="2017-09-14">2017-09-14</h2>
								<ul>
 								<li>Communicate with Handle.net admins to try to get some guidance about the 10947 prefix</li>
								<li>Michael Marus is the contact for their prefix, but he has left CGIAR; since I actually have access to the CGIAR Library server, I think I can just generate a new <code>sitebndl.zip</code> file from their server and send it to Handle.net</li>
 								<li>Also, Handle.net says their prefix is up for annual renewal next month so we might want to just pay for it and take it over</li>
 								<li>CGSpace was very slow and Uptime Robot even said it was down at one time</li>
								<li>I didn&rsquo;t see any abnormally high usage in the REST or OAI logs, but looking at Munin I see the average JVM usage was at 4.9GB and the heap is only 5GB (5120M), so I think it&rsquo;s just normal growing pains</li>
								<li>Every few months I generally try to increase the JVM heap to be 512M higher than the average usage reported by Munin, so now I adjusted it to 5632M</li>
 								</ul>
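<p>The heap sizing rule of thumb above, worked through as a small calculation. This Python sketch makes two assumptions. First, Munin&rsquo;s 4.9GB means 4.9 &times; 1024 MiB. Second, the result is rounded up to the next multiple of 512M. The rounding step is only an assumption, though it does reproduce the 5632M chosen here:</p>
<pre><code>import math


def next_heap_mb(avg_usage_gb, headroom_mb=512, step_mb=512):
    # Convert the average JVM usage reported by Munin from GiB to MiB
    # and add the headroom on top of it
    target_mb = avg_usage_gb * 1024 + headroom_mb
    # Assumed rounding rule: round up to the next multiple of step_mb
    return math.ceil(target_mb / step_mb) * step_mb


print(next_heap_mb(4.9))  # 5632
</code></pre>
<p>For comparison, 4.9GB is roughly 5018M, so the old 5120M heap left only about 100M between the average usage and the limit.</p>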
								<h2 id="2017-09-15">2017-09-15</h2>
								<ul>
								<li>Apply CCAFS project tag corrections on CGSpace:</li>
 								</ul>
								<pre><code>dspace=# \i /tmp/ccafs-projects.sql
DELETE 5
UPDATE 4
UPDATE 1
DELETE 1
DELETE 207
</code></pre><h2 id="2017-09-17">2017-09-17</h2>
								<ul>
 								<li>Create pull request for CGSpace to be able to resolve multiple handles (<a href="https://github.com/ilri/DSpace/pull/339">#339</a>)</li>
 								<li>We still need to do the changes to <code>config.dct</code> and regenerate the <code>sitebndl.zip</code> to send to the Handle.net admins</li>
								<li>According to this <a href="http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html">dspace-tech mailing list entry from 2011</a>, we need to add the extra handle prefixes to <code>config.dct</code> like this:</li>
 								</ul>
								<pre><code>&quot;server_admins&quot; = (
&quot;300:0.NA/10568&quot;
&quot;300:0.NA/10947&quot;
)
&quot;replication_admins&quot; = (
&quot;300:0.NA/10568&quot;
&quot;300:0.NA/10947&quot;
)
&quot;backup_admins&quot; = (
&quot;300:0.NA/10568&quot;
&quot;300:0.NA/10947&quot;
)
</code></pre><ul>
								<li>More work on the CGIAR Library migration test run locally, as I was having problems importing the last fourteen items from the CGIAR System Management Office community</li>
								<li>The problem was that we remapped the items to new collections after the initial import, so the items were using the 10947 prefix but the community and collection were using 10568</li>
								<li>I ended up having to read the <a href="https://wiki.duraspace.org/display/DSDOC5x/AIP+Backup+and+Restore#AIPBackupandRestore-ForceReplaceMode">AIP Backup and Restore</a> documentation closely a few times and then explicitly preserve handles and ignore parents:</li>
 								</ul>
								<pre><code>$ for item in 10568-93759/ITEM@10947-46*; do ~/dspace/bin/dspace packager -r -t AIP -o ignoreHandle=false -o ignoreParent=true -e aorth@mjanja.ch -p 10568/87738 $item; done
</code></pre><ul>
 								<li>Also, this was in replace mode (-r) rather than submit mode (-s), because submit mode always generated a new handle even if I told it not to!</li>
 								<li>I decided to start the import process in the evening rather than waiting for the morning, and right as the first community was finished importing I started seeing <code>Timeout waiting for idle object</code> errors</li>
 								<li>I had to cancel the import, clean up a bunch of database entries, increase the PostgreSQL <code>max_connections</code> as a precaution, restart PostgreSQL and Tomcat, and then finally completed the import</li>
								</ul>
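<ul>
<li>The same restore loop as a Python sketch, in case it helps to preview the commands before running them. The flags are copied from the shell loop above; the function names and the dry-run default are hypothetical, and the script assumes the <code>dspace</code> launcher lives at <code>~/dspace/bin/dspace</code> as in that loop:</li>
</ul>
<pre><code>import glob
import os
import subprocess

# Same launcher path as the shell loop (assumption: ~/dspace/bin/dspace)
DSPACE_BIN = os.path.expanduser("~/dspace/bin/dspace")


def packager_args(aip_path, eperson="aorth@mjanja.ch", parent="10568/87738"):
    """Build the argument list for restoring one item AIP in replace mode."""
    return [
        DSPACE_BIN, "packager",
        "-r",                        # replace mode; submit mode (-s) minted new handles
        "-t", "AIP",
        "-o", "ignoreHandle=false",  # keep the handle recorded in the AIP
        "-o", "ignoreParent=true",   # ignore the AIP's parent, use -p instead
        "-e", eperson,
        "-p", parent,
        aip_path,
    ]


def restore_items(pattern="10568-93759/ITEM@10947-46*", dry_run=True):
    """Restore every item AIP matching the glob (hypothetical helper)."""
    for aip in sorted(glob.glob(pattern)):
        args = packager_args(aip)
        if dry_run:
            # Only print the command that would be run
            print(" ".join(args))
        else:
            # check=True raises CalledProcessError on the first failed import
            subprocess.run(args, check=True)


if __name__ == "__main__":
    restore_items()
</code></pre><ul>
<li>With <code>restore_items(dry_run=False)</code> the packages are imported for real; unlike the shell loop, <code>check=True</code> makes it stop at the first package that fails instead of carrying on with the rest</li>
</ul>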
								<h2 id="2017-09-18">2017-09-18</h2>
								<ul>
								<li>I think we should force regeneration of all thumbnails in the CGIAR Library community: their DSpace is version 1.7 and CGSpace is running DSpace 5.5, so the regenerated thumbnails should look much better</li>
 								<li>One item for comparison:</li>
 								</ul>
								<p><img src="/cgspace-notes/2017/09/10947-2919-before.jpg" alt="With original DSpace 1.7 thumbnail"></p>
 								<p><img src="/cgspace-notes/2017/09/10947-2919-after.jpg" alt="After DSpace 5.5"></p>
								<ul>
 								<li>Moved the CGIAR Library Migration notes to a page — <a href="/cgspace-notes/cgiar-library-migration/">cgiar-library-migration</a> — as there seems to be a bug with post slugs defined in frontmatter when you have a permalink scheme defined in <code>config.toml</code> (happens currently in Hugo 0.27.1 at least)</li>
 								</ul>
								<h2 id="2017-09-19">2017-09-19</h2>
								<ul>
								<li>Nightly Solr indexing is working again, and it appears to be pretty quick actually:</li>
 								</ul>
								<pre><code>2017-09-19 00:00:14,953 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (0 of 65808): 17607
 								...
2017-09-19 00:04:18,017 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (65807 of 65808): 83753
								</code></pre><ul>
 								<li>Sisay asked if he could import 50 items for IITA that have already been checked by Bosede and Bizuwork</li>
 								<li>I had a look at the collection and noticed a bunch of issues with item types and donors, so I asked him to fix those and import it to DSpace Test again first</li>
 								<li>Abenet wants to be able to filter by ISI Journal in advanced search on queries like this: <a href="https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&amp;filtertype_1=dateIssued&amp;filter_relational_operator_1=equals&amp;filter_relational_operator_0=equals&amp;filter_1=%5B2010+TO+2017%5D&amp;filter_0=2017&amp;filtertype=type&amp;filter_relational_operator=equals&amp;filter=Journal+Article">https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&amp;filtertype_1=dateIssued&amp;filter_relational_operator_1=equals&amp;filter_relational_operator_0=equals&amp;filter_1=%5B2010+TO+2017%5D&amp;filter_0=2017&amp;filtertype=type&amp;filter_relational_operator=equals&amp;filter=Journal+Article</a></li>
 								<li>I opened an issue to track this (<a href="https://github.com/ilri/DSpace/issues/340">#340</a>) and will test it on DSpace Test soon</li>
 								<li>Marianne Gadeberg from WLE asked if I would add an account for Adam Hunt on CGSpace and give him permissions to approve all WLE publications</li>
								<li>I told him to register first, as he&rsquo;s a CGIAR user and needs an account to be created before I can add him to the groups</li>
								</ul>
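<ul>
<li>The first and last lines of the Solr indexing log from 2017-09-19 are enough for a rough throughput figure. A minimal Python sketch, assuming both lines are from that same day and that the counter in <code>Processing (n of N)</code> starts at zero, so the total is <code>N</code>:</li>
</ul>
<pre><code>import re
from datetime import datetime

# First and last "Processing" lines of the nightly Discovery indexing log
LOG = """\
2017-09-19 00:00:14,953 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (0 of 65808): 17607
2017-09-19 00:04:18,017 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (65807 of 65808): 83753
"""

# Captures the timestamp, the zero-based counter and the total
LINE_RE = re.compile(r"^(\S+ \S+) INFO .*Processing \((\d+) of (\d+)\)", re.MULTILINE)


def indexing_rate(log_text):
    """Return (items, seconds, items per second) from the first and last lines."""
    matches = LINE_RE.findall(log_text)
    first, last = matches[0], matches[-1]
    fmt = "%Y-%m-%d %H:%M:%S,%f"  # log4j style: comma before milliseconds
    start = datetime.strptime(first[0], fmt)
    end = datetime.strptime(last[0], fmt)
    seconds = (end - start).total_seconds()
    items = int(last[2])  # the "of N" total
    return items, seconds, items / seconds


items, seconds, rate = indexing_rate(LOG)
print(f"{items} items in {seconds:.1f} s, about {rate:.0f} items/s")
</code></pre><ul>
<li>For these two lines that works out to 65,808 items in about 243 seconds, or roughly 271 items per second</li>
</ul>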
								<h2 id="2017-09-20">2017-09-20</h2>
								<ul>
 								<li>Abenet and I noticed that hdl.handle.net is blocked by ETC at ILRI Addis so I asked Biruk Debebe to route it over the satellite</li>
								<li>Force thumbnail regeneration for the CGIAR System Organization&rsquo;s Historic Archive community (2000 items):</li>
								</ul>
								<pre><code>$ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -f -i 10947/1 -p &quot;ImageMagick PDF Thumbnail&quot;
								</code></pre><ul>
								<li>I&rsquo;m still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org</li>
								</ul>
								<h2 id="2017-09-21">2017-09-21</h2>
								<ul>
 								<li>Switch to OpenJDK 8 from Oracle JDK on DSpace Test</li>
								<li>I want to test this for a while to see if we can start using it instead</li>
								<li>I need to look at the JVM graphs in Munin, test the Atmire modules, build the source, etc. to get some impressions</li>
 								</ul>
								<h2 id="2017-09-22">2017-09-22</h2>
								<ul>
 								<li>Experimenting with setting up a global JNDI database resource that can be pooled among all the DSpace webapps (reference the <a href="https://wiki.duraspace.org/display/cmtygp/DCAT+Meeting+April+2017">April, 2017 DCAT meeting</a> comments)</li>
 								<li>See: <a href="https://www.journaldev.com/2513/tomcat-datasource-jndi-example-java">https://www.journaldev.com/2513/tomcat-datasource-jndi-example-java</a></li>
 								<li>See: <a href="http://memorynotfound.com/configure-jndi-datasource-tomcat/">http://memorynotfound.com/configure-jndi-datasource-tomcat/</a></li>
 								</ul>
								<h2 id="2017-09-24">2017-09-24</h2>
								<ul>
 								<li>Start investigating other platforms for CGSpace due to linear instance pricing on Linode</li>
 								<li>We need to figure out how much memory is used by applications, caches, etc, and how much disk space the asset store needs</li>
								<li>First, here&rsquo;s the last week of memory usage on CGSpace and DSpace Test:</li>
								</ul>
								<p><img src="/cgspace-notes/2017/09/cgspace-memory-week.png" alt="CGSpace memory week">
 								<img src="/cgspace-notes/2017/09/dspace-test-memory-week.png" alt="DSpace Test memory week"></p>
								<ul>
								<li>8GB of RAM seems to be good for DSpace Test for now, with Tomcat&rsquo;s JVM heap taking 3GB, caches and buffers taking 3–4GB, and then ~1GB unused</li>
 								<li>24GB of RAM is <em>way</em> too much for CGSpace, with Tomcat&rsquo;s JVM heap taking 5.5GB and caches and buffers happily using 14GB or so</li>
								<li>As for disk space, the CGSpace assetstore currently uses 51GB and Solr cores use 86GB (mostly in the statistics core)</li>
								<li>DSpace Test currently doesn&rsquo;t even have enough space to store a full copy of CGSpace, as its Linode instance only has 96GB of disk space</li>
 								<li>I&rsquo;ve heard Google Cloud is nice (cheap and performant) but it&rsquo;s definitely more complicated than Linode and instances aren&rsquo;t <em>that</em> much cheaper to make it worth it</li>
								<li>Here are some theoretical instances on Google Cloud:
 								<ul>
								<li>DSpace Test, <code>n1-standard-2</code> with 2 vCPUs, 7.5GB RAM, 300GB persistent SSD: $99/month</li>
								<li>CGSpace, <code>n1-standard-4</code> with 4 vCPUs, 15GB RAM, 300GB persistent SSD: $148/month</li>
 								</ul>
 								</li>
								<li>Looking at <a href="https://www.linode.com/pricing#all">Linode&rsquo;s instance pricing</a>, for DSpace Test it seems we could use the same 8GB instance for $40/month, and then add <a href="https://www.linode.com/docs/platform/how-to-use-block-storage-with-your-linode">block storage</a> of ~300GB for $30 (block storage is currently in beta and priced at $0.10/GiB)</li>
								<li>For CGSpace we could use the cheaper 12GB instance for $80 and then add block storage of 500GB for $50</li>
								<li>I&rsquo;ve sent Peter a message about moving DSpace Test to the New Jersey data center so we can test the block storage beta</li>
								<li>Create pull request for adding ISI Journal to search filters (<a href="https://github.com/ilri/DSpace/pull/341">#341</a>)</li>
 								<li>Peter asked if we could map all the items of type <code>Journal Article</code> in <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI Archive</a> to <a href="https://cgspace.cgiar.org/handle/10568/3">ILRI articles in journals and newsletters</a></li>
 								<li>It is easy to do via CSV using OpenRefine but I noticed that on CGSpace ~1,000 of the expected 2,500 are already mapped, while on DSpace Test they were not</li>
								<li>I&rsquo;ve asked Peter if he knows what&rsquo;s going on (or who mapped them)</li>
								<li>Turns out he had already mapped some, but requested that I finish the rest</li>
								<li>With this GREL in OpenRefine I can find items that are mapped, i.e. they have <code>10568/3||</code> or <code>10568/3$</code> in their <code>collection</code> field:</li>
 								</ul>
								<pre><code>isNotNull(value.match(/.+?10568\/3(\|\|.+|$)/))
								</code></pre><ul>
								<li>Peter also made a lot of changes to the data in the Archives collections while I was attempting to import the changes, so we were essentially competing for PostgreSQL and Solr connections</li>
								<li>I ended up having to kill the import and wait until he was done</li>
								<li>I exported a clean CSV and applied the changes from that one, which came to a hundred or two fewer than I thought there should be (at least compared to the current state of DSpace Test, which is a few months old)</li>
								</ul>
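<ul>
<li>The same mapped-item check in Python, as a sketch assuming that the <code>collection</code> column of a DSpace CSV export lists the owning collection first and any mapped collections after it, separated by <code>||</code>. GREL&rsquo;s <code>match()</code> has to match the whole string, so <code>re.fullmatch</code> is the closest equivalent; the split-based function expresses the same intent without a regex. Handles other than 10568/2703 and 10568/3 in the samples are hypothetical:</li>
</ul>
<pre><code>import re

ILRI_ARTICLES = "10568/3"  # ILRI articles in journals and newsletters

# The GREL regex, applied with fullmatch because GREL's match() is anchored
GREL_EQUIVALENT = re.compile(r".+?10568/3(\|\|.+|$)")


def mapped_by_regex(collection_field):
    return GREL_EQUIVALENT.fullmatch(collection_field) is not None


def mapped_by_split(collection_field, target=ILRI_ARTICLES):
    """True if target is one of the mapped (non-owning) collections.

    Assumes the first ||-separated handle is the owning collection.
    """
    _owning, *mapped = collection_field.split("||")
    return target in mapped


samples = [
    "10568/2703",                        # only in ILRI Archive
    "10568/2703||10568/3",               # already mapped
    "10568/2703||10568/3||10568/16814",  # mapped, plus a hypothetical collection
    "10568/2703||10568/35",              # hypothetical handle sharing the prefix
]
for value in samples:
    print(value, mapped_by_regex(value), mapped_by_split(value))
</code></pre><ul>
<li>Both functions treat an item whose owning collection is 10568/3 as not mapped, since the regex needs at least one character before the handle and the split version skips the first value</li>
</ul>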
								<h2 id="2017-09-25">2017-09-25</h2>
								<ul>
 								<li>Email Rosemary Kande from ICT to ask about the administrative / finance procedure for moving DSpace Test from EU to US region on Linode</li>
 								<li>Communicate (finally) with Tania and Tunji from the CGIAR System Organization office to tell them to request CGNET make the DNS updates for library.cgiar.org</li>
								<li>Peter wants me to clean up the text values for Delia Grace&rsquo;s metadata, as the authorities are all messed up again since we cleaned them up in <a href="/cgspace-notes/2016-12">2016-12</a>:</li>
								</ul>
								<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
								  text_value  |              authority               | confidence
								--------------+--------------------------------------+------------
								 Grace, Delia |                                      |        600
 								 Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c |        600
 								 Grace, Delia | bfa61d7c-7583-4175-991c-2e7315000f0c |         -1
 								 Grace, D.    | 6a8ddca3-33c1-45f9-aa00-6fa9fc91e3fc |         -1
 								</code></pre><ul>
 								<li>Strangely, none of her authority entries have ORCIDs anymore&hellip;</li>
								<li>I&rsquo;ll just fix the text values and forget about it for now:</li>
								</ul>
								<pre><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
 								UPDATE 610
								</code></pre><ul>
 								<li>After this we have to reindex the Discovery and Authority cores (as <code>tomcat7</code> user):</li>
 								</ul>
								<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1&quot;
 								$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
 								real    83m56.895s
 								user    13m16.320s
 								sys     2m17.917s
 								$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-authority -b
 								Retrieving all data
 								Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
 								Exception: null
 								java.lang.NullPointerException
								        at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
 								        at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
 								        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
 								        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
 								        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
 								        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
 								        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
 								        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
 								        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
 								        at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
 								        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 								        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 								        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 								        at java.lang.reflect.Method.invoke(Method.java:498)
 								        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
 								        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
 								real    6m6.447s
 								user    1m34.010s
 								sys     0m12.113s
								</code></pre><ul>
								<li>The <code>index-authority</code> script always seems to fail; I think it&rsquo;s the same old bug</li>
								<li>Something interesting for my notes about the JNDI database pool (since I couldn&rsquo;t determine whether it was working when I tried it locally the other day) is this error message that I just saw in the DSpace logs today:</li>
								</ul>
								<pre><code>ERROR org.dspace.storage.rdbms.DatabaseManager @ Error retrieving JNDI context: jdbc/dspaceLocal
 								...
 								INFO  org.dspace.storage.rdbms.DatabaseManager @ Unable to locate JNDI dataSource: jdbc/dspaceLocal
 								INFO  org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Database pool
								</code></pre><ul>
								<li>So it&rsquo;s good to know that <em>something</em> gets printed when it fails because I didn&rsquo;t see <em>any</em> mention of JNDI before when I was testing!</li>
								</ul>
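<ul>
<li>A small reproduction of why the single <code>update</code> for Delia Grace is enough, using SQLite instead of PostgreSQL so it runs anywhere: a stand-in <code>metadatavalue</code> table with only the columns the query uses and one row per distinct combination from the 2017-09-25 <code>select distinct</code> output, assuming the blank authority is NULL (the real table matched 610 rows):</li>
</ul>
<pre><code>import sqlite3

# Stand-in for DSpace's metadatavalue table (only the columns the query uses)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue (resource_type_id INTEGER, metadata_field_id INTEGER,"
    " text_value TEXT, authority TEXT, confidence INTEGER)"
)
# One row per distinct combination; 600 is DSpace's "accepted" confidence, -1 "unset"
rows = [
    ("Grace, Delia", None, 600),
    ("Grace, Delia", "bfa61d7c-7583-4175-991c-2e7315000f0c", 600),
    ("Grace, Delia", "bfa61d7c-7583-4175-991c-2e7315000f0c", -1),
    ("Grace, D.", "6a8ddca3-33c1-45f9-aa00-6fa9fc91e3fc", -1),
]
conn.executemany("INSERT INTO metadatavalue VALUES (2, 3, ?, ?, ?)", rows)

# Same statement as on CGSpace: one text value, one authority, confidence 600
cur = conn.execute(
    "UPDATE metadatavalue SET text_value='Grace, Delia',"
    " authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600"
    " WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value LIKE 'Grace, D%'"
)
updated = cur.rowcount

# The distinct query now collapses to a single combination
distinct = conn.execute(
    "SELECT DISTINCT text_value, authority, confidence FROM metadatavalue"
    " WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value LIKE 'Grace, D%'"
).fetchall()
print(updated, distinct)
</code></pre><ul>
<li>SQLite&rsquo;s <code>LIKE</code> is case-insensitive for ASCII while PostgreSQL&rsquo;s is case-sensitive, which makes no difference for these values</li>
</ul>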
								<h2 id="2017-09-26">2017-09-26</h2>
								<ul>
 								<li>Adam Hunt from WLE finally registered so I added him to the editor and approver groups</li>
								<li>Then I noticed that Sisay never removed Marianne&rsquo;s user accounts from the approver steps in the workflow, which is redundant because she is already in the WLE groups assigned to those steps</li>
 								<li>For what it&rsquo;s worth, I had asked him to remove them on 2017-09-14</li>
								<li>I also went and added the WLE approvers and editors groups to the appropriate steps of all the Phase I and Phase II research theme collections</li>
								<li>A lot of CIAT&rsquo;s items have manually generated thumbnails which have an incorrect aspect ratio and an ugly black border</li>
 								<li>I communicated with Elizabeth from CIAT to tell her she should use DSpace&rsquo;s automatically generated thumbnails</li>
								<li>Start discussing with ICT about the Linode server update for DSpace Test</li>
 								<li>Rosemary said I need to work with Robert Okal to destroy/create the server, and then let her and Lilian Masigah from finance know the updated Linode asset names for their records</li>
 								</ul>
								<h2 id="2017-09-28">2017-09-28</h2>
								<ul>
 								<li>Tunji from the System Organization finally sent the DNS request for library.cgiar.org to CGNET</li>
 								<li>Now the redirects work</li>
								<li>I quickly registered a Let&rsquo;s Encrypt certificate for the domain:</li>
								</ul>
								<pre><code># systemctl stop nginx
 								# /opt/certbot-auto certonly --standalone --email aorth@mjanja.ch -d library.cgiar.org
 								# systemctl start nginx
								</code></pre><ul>
 								<li>I modified the nginx configuration of the ansible playbooks to use this new certificate and now the certificate is enabled and OCSP stapling is working:</li>
 								</ul>
								<pre><code>$ openssl s_client -connect cgspace.cgiar.org:443 -servername library.cgiar.org  -tls1_2 -tlsextdebug -status
 								...
 								OCSP Response Data:
 								...
 								Cert Status: good
								</code></pre>
 								</article>
 								        </div> <!-- /.blog-main -->
 								        <aside class="col-sm-3 ml-auto blog-sidebar">
 								        <section class="sidebar-module">
 								    <h4>Recent Posts</h4>
 								    <ol class="list-unstyled">
								<li><a href="/cgspace-notes/2020-02/">February, 2020</a></li>
								<li><a href="/cgspace-notes/2020-01/">January, 2020</a></li>
								<li><a href="/cgspace-notes/2019-12/">December, 2019</a></li>
								<li><a href="/cgspace-notes/2019-11/">November, 2019</a></li>
								<li><a href="/cgspace-notes/cgspace-cgcorev2-migration/">CGSpace CG Core v2 Migration</a></li>
								    </ol>
 								  </section>
 								  <section class="sidebar-module">
 								    <h4>Links</h4>
 								    <ol class="list-unstyled">
 								      <li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
 								      <li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
 								      <li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
 								    </ol>
 								  </section>
 								</aside>
 								      </div> <!-- /.row -->
 								    </div> <!-- /.container -->
 								    <footer class="blog-footer">
								      <p dir="auto">
 								      Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
 								      </p>
 								      <p>
 								      <a href="#">Back to top</a>
 								      </p>
 								    </footer>
 								  </body>
 								</html>