<h2 id="2019-05-01">2019-05-01</h2>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list suggested some ways to troubleshoot why certain items cannot be deleted
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
<pre tabindex="0"><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
<li>I managed to delete the problematic item from the database
<li>First I deleted the item&rsquo;s bitstream in XMLUI and then ran <code>dspace cleanup -v</code> to remove it from the assetstore</li>
<li>Then I ran the following SQL:</li>
<pre tabindex="0"><code>dspace=# DELETE FROM metadatavalue WHERE resource_type_id=2 AND resource_id=74648;
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
dspace=# DELETE FROM item WHERE item_id=74648;
<li>Now the item is (hopefully) really gone and I can continue to troubleshoot the issue with the REST API&rsquo;s <code>/items/find-by-metadata-field</code> endpoint
<li>Of course I run into another HTTP 401 error when I continue trying the LandPortal search from last month:</li>
<pre tabindex="0"><code>$ curl -f -H &#34;Content-Type: application/json&#34; -X POST &#34;http://localhost:8080/rest/items/find-by-metadata-field&#34; -d &#39;{&#34;key&#34;:&#34;cg.subject.cpwf&#34;, &#34;value&#34;:&#34;WATER MANAGEMENT&#34;,&#34;language&#34;: &#34;en_US&#34;}&#39;
curl: (22) The requested URL returned error: 401 Unauthorized
<li>The DSpace log shows the item ID (because I modified the error text):</li>
<pre tabindex="0"><code>2019-05-01 11:41:11,069 ERROR @ User(anonymous) has not permission to read item(id=77708)!
<li>If I delete that one I get another, making the list of item IDs so far:
<li>Some are in the <code>workspaceitem</code> table (pre-submission), others are in the <code>workflowitem</code> table (submitted), and others are actually approved, but withdrawn&hellip;
<li>This is actually a worthless exercise because the real issue is that the <code>/items/find-by-metadata-field</code> endpoint is simply poorly designed: it shouldn&rsquo;t fail with a fatal error when the search returns items the user doesn&rsquo;t have permission to access</li>
<li>It would take way too much time to try to fix the fucked up items that are in limbo by deleting them in SQL, but also, it doesn&rsquo;t actually fix the problem because some items are <em>submitted</em> but <em>withdrawn</em>, so they actually have handles and everything</li>
<li>I think the solution is to recommend that people don&rsquo;t use the <code>/items/find-by-metadata-field</code> endpoint</li>
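<p>For reference, the same query as the curl command above can be sketched in Python using only the standard library. The base URL matches the local instance in the curl example; the function names <code>build_query</code> and <code>find_items</code> are hypothetical, and this is only an illustration of the request shape, not a fix for the endpoint&rsquo;s behavior.</p>

```python
import json
import urllib.request

# Assumption: a local DSpace 5 REST API, as in the curl command above.
REST_BASE = "http://localhost:8080/rest"


def build_query(key, value, language="en_US"):
    """Build (but do not send) the POST request for find-by-metadata-field."""
    payload = {"key": key, "value": value, "language": language}
    return urllib.request.Request(
        f"{REST_BASE}/items/find-by-metadata-field",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def find_items(key, value, language="en_US"):
    """Send the query and return the decoded JSON response.

    urllib raises urllib.error.HTTPError for error responses, so the
    HTTP 401 described above would surface as an exception here.
    """
    request = build_query(key, value, language)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

<p>Calling <code>find_items("cg.subject.cpwf", "WATER MANAGEMENT")</code> would send the same body as the curl example.</p>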
<li>CIP is asking about embedding PDF thumbnail images in their RSS feeds again
<li>They asked in 2018-09 as well and I told them it wasn&rsquo;t possible</li>
<li>To make sure, I looked at <a href="">the documentation for RSS media feeds</a> and tried it, but couldn&rsquo;t get it to work</li>
<li>It seems to be geared towards iTunes and Podcasts&hellip; I dunno</li>
<li>CIP also asked for a way to get an XML file of all their RTB journal articles on CGSpace
<li>I told them to use the REST API like this (where <code>1179</code> is the id of the RTB journal articles collection):</li>
<pre tabindex="0"><code>;expand=metadata
</code></pre><h2 id="2019-05-03">2019-05-03</h2>
<li>A user from CIAT emailed to say that CGSpace submission emails have not been working the last few weeks
<li>I ran the <code>dspace test-email</code> script on CGSpace and it is indeed failing:</li>
<pre tabindex="0"><code>$ dspace test-email
About to send test email:
- To:
- Subject: DSpace test email
- Server:
Error sending email:
- Error: javax.mail.AuthenticationFailedException
Please see the DSpace documentation for assistance.
<li>I will ask ILRI ICT to reset the password
<li>They reset the password and I tested it on CGSpace</li>
<h2 id="2019-05-05">2019-05-05</h2>
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li>Merge changes into the <code>5_x-prod</code> branch of CGSpace:
<li>Updates to remove deprecated social media websites (Google+ and Delicious), update Twitter share intent, and add item title to Twitter and email links (<a href="">#421</a>)</li>
<li>Add new CCAFS Phase II project tags (<a href="">#420</a>)</li>
<li>Add item ID to REST API error logging (<a href="">#422</a>)</li>
<li>Re-deploy CGSpace from <code>5_x-prod</code> branch</li>
<li>Run all system updates on CGSpace (linode18) and reboot it</li>
<li>Tag version 1.1.0 of the <a href="">dspace-statistics-api</a> (with Falcon 2.0.0)
<li>Deploy on DSpace Test</li>
<h2 id="2019-05-06">2019-05-06</h2>
<li>Peter pointed out that Solr stats are only showing 2019 stats
<li>I looked at the Solr Admin UI and I see:</li>
<pre tabindex="0"><code>statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
<li>As well as this error in the logs:</li>
<pre tabindex="0"><code>Caused by: Lock obtain timed out: NativeFSLock@/home/
<li>Strangely enough, I <em>do</em> see the statistics-2018, statistics-2017, etc cores in the Admin UI&hellip;</li>
<li>I restarted Tomcat a few times (and even deleted all the Solr write locks) and at least five times there were issues loading one statistics core, causing the Atmire stats to be incomplete
<li>Also, I tried to increase the <code>writeLockTimeout</code> in <code>solrconfig.xml</code> from the default of 1000ms to 10000ms</li>
<li>Eventually the Atmire stats started working, despite errors about &ldquo;Error opening new searcher&rdquo; in the Solr Admin UI</li>
<li>I wrote to the dspace-tech mailing list again on the thread from March, 2019</li>
<li>There were a few alerts from UptimeRobot about CGSpace going up and down this morning, along with an alert from Linode about 596% load
<li>Looking at the Munin stats I see a sharp rise in DSpace XMLUI sessions, firewall activity, and PostgreSQL connections this morning:</li>
<p><img src="/cgspace-notes/2019/05/2019-05-06-jmx_dspace_sessions-day.png" alt="CGSpace XMLUI sessions day"></p>
<p><img src="/cgspace-notes/2019/05/2019-05-06-fw_conntrack-day.png" alt="linode18 firewall connections day"></p>
<p><img src="/cgspace-notes/2019/05/2019-05-06-postgres_connections_db-day.png" alt="linode18 postgres connections day"></p>
<p><img src="/cgspace-notes/2019/05/2019-05-06-cpu-day.png" alt="linode18 CPU day"></p>
<li>The number of unique sessions today is <em>ridiculously</em> high compared to the last few days considering it&rsquo;s only 12:30PM right now:</li>
<pre tabindex="0"><code>$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-06 | sort | uniq | wc -l
$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-05 | sort | uniq | wc -l
$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-04 | sort | uniq | wc -l
$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-03 | sort | uniq | wc -l
$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-02 | sort | uniq | wc -l
$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-01 | sort | uniq | wc -l
<li>The number of unique IP addresses from 2 to 6 AM this morning is already several times higher than the average for that time of the morning this past week:</li>
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &#39;06/May/2019:(02|03|04|05|06)&#39; | awk &#39;{print $1}&#39; | sort | uniq | wc -l
# zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E &#39;05/May/2019:(02|03|04|05|06)&#39; | awk &#39;{print $1}&#39; | sort | uniq | wc -l
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E &#39;04/May/2019:(02|03|04|05|06)&#39; | awk &#39;{print $1}&#39; | sort | uniq | wc -l
# zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz | grep -E &#39;03/May/2019:(02|03|04|05|06)&#39; | awk &#39;{print $1}&#39; | sort | uniq | wc -l
# zcat --force /var/log/nginx/access.log.4.gz /var/log/nginx/access.log.5.gz | grep -E &#39;02/May/2019:(02|03|04|05|06)&#39; | awk &#39;{print $1}&#39; | sort | uniq | wc -l
# zcat --force /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.6.gz | grep -E &#39;01/May/2019:(02|03|04|05|06)&#39; | awk &#39;{print $1}&#39; | sort | uniq | wc -l
<li>Just this morning between the hours of 2 and 6 the number of unique sessions was <em>very</em> high compared to previous mornings:</li>
<pre tabindex="0"><code>$ cat dspace.log.2019-05-06 | grep -E &#39;2019-05-06 (02|03|04|05|06):&#39; | grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; | sort | uniq | wc -l
$ cat dspace.log.2019-05-05 | grep -E &#39;2019-05-05 (02|03|04|05|06):&#39; | grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; | sort | uniq | wc -l
$ cat dspace.log.2019-05-04 | grep -E &#39;2019-05-04 (02|03|04|05|06):&#39; | grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; | sort | uniq | wc -l
$ cat dspace.log.2019-05-03 | grep -E &#39;2019-05-03 (02|03|04|05|06):&#39; | grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; | sort | uniq | wc -l
$ cat dspace.log.2019-05-02 | grep -E &#39;2019-05-02 (02|03|04|05|06):&#39; | grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; | sort | uniq | wc -l
$ cat dspace.log.2019-05-01 | grep -E &#39;2019-05-01 (02|03|04|05|06):&#39; | grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; | sort | uniq | wc -l
<li>Most of the requests were GETs:</li>
<pre tabindex="0"><code># cat /var/log/nginx/{access,library-access}.log /var/log/nginx/{access,library-access}.log.1 | grep -E &#39;06/May/2019:(02|03|04|05|06)&#39; | grep -o -E &#34;(GET|HEAD|POST|PUT)&#34; | sort | uniq -c | sort -n
2845 HEAD
98121 GET
<li>I&rsquo;m not exactly sure what happened this morning, but it looks like some legitimate user traffic—perhaps someone launched a new publication and it got a bunch of hits?</li>
<li>Looking again, I see 84,000 requests to <code>/handle</code> this morning (not including logs for because those get HTTP 301 redirect to CGSpace and appear here in <code>access.log</code>):</li>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &#39;06/May/2019:(02|03|04|05|06)&#39; | grep -c -o -E &#34; /handle/[0-9]+/[0-9]+&#34;
<li>But it would be difficult to find a pattern for those requests because they cover 78,000 <em>unique</em> Handles (ie direct browsing of items, collections, or communities) and only 2,492 discover/browse (total, not unique):</li>
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &#39;06/May/2019:(02|03|04|05|06)&#39; | grep -o -E &#34; /handle/[0-9]+/[0-9]+ HTTP&#34; | sort | uniq | wc -l
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &#39;06/May/2019:(02|03|04|05|06)&#39; | grep -o -E &#34; /handle/[0-9]+/[0-9]+/(discover|browse)&#34; | wc -l
<li>In other news, I see some IP is making several requests per second to the exact same REST API endpoints, for example:</li>
<pre tabindex="0"><code># grep -F &#39;/rest/handle/10568/3703?expand=all&#39; rest.log | awk &#39;{print $1}&#39; | sort | uniq -c
3 2a01:7e00::f03c:91ff:fe0a:d645
<li>According to <a href=";t=1"></a> that server belongs to Macaroni Brothers
<li>The user agent of their non-REST API requests from the same IP is Drupal</li>
<li>This is one very good reason to limit REST API requests, and perhaps to enable caching via nginx</li>
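<p>The nginx checks above (unique IPs, request methods, and unique Handles within a window of hours) could also be done in one pass with a small script. This is a rough sketch that assumes the default nginx &ldquo;combined&rdquo; log format; the function name <code>summarize</code> and its return keys are made up for illustration.</p>

```python
import re

# Start of an nginx "combined" log line (assumed format):
# IP, identity, user, [day:hour:min:sec zone], "METHOD path ..."
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)'
)
# Same idea as the " /handle/[0-9]+/[0-9]+" grep pattern above.
HANDLE_RE = re.compile(r"^/handle/(\d+/\d+)")


def summarize(lines, day, hours):
    """Summarize requests on `day` (e.g. "06/May/2019") during `hours`."""
    ips, handles, methods = set(), set(), {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match["day"] != day or match["hour"] not in hours:
            continue
        ips.add(match["ip"])
        methods[match["method"]] = methods.get(match["method"], 0) + 1
        handle = HANDLE_RE.match(match["path"])
        if handle:
            handles.add(handle.group(1))
    return {
        "unique_ips": len(ips),
        "unique_handles": len(handles),
        "methods": methods,
    }
```

<p>For example, <code>summarize(open("/var/log/nginx/access.log"), "06/May/2019", {"02", "03", "04", "05", "06"})</code> would cover the same window as the commands above for a single uncompressed log file.</p>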
<h2 id="2019-05-07">2019-05-07</h2>
<li>The total number of unique IPs on CGSpace yesterday was almost 14,000, which is several thousand higher than previous day totals:</li>
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E &#39;06/May/2019&#39; | awk &#39;{print $1}&#39; | sort | uniq | wc -l
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E &#39;05/May/2019&#39; | awk &#39;{print $1}&#39; | sort | uniq | wc -l
# zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz | grep -E &#39;04/May/2019&#39; | awk &#39;{print $1}&#39; | sort | uniq | wc -l
# zcat --force /var/log/nginx/access.log.4.gz /var/log/nginx/access.log.5.gz | grep -E &#39;03/May/2019&#39; | awk &#39;{print $1}&#39; | sort | uniq | wc -l
<li>Total number of sessions yesterday was <em>much</em> higher compared to days last week:</li>
<pre tabindex="0"><code>$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-06 | sort | uniq | wc -l
$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-05 | sort | uniq | wc -l
$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-04 | sort | uniq | wc -l
$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-03 | sort | uniq | wc -l
$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-02 | sort | uniq | wc -l
$ grep -o -E &#39;session_id=[A-Z0-9]{32}&#39; dspace.log.2019-05-01 | sort | uniq | wc -l
<li>The usage statistics seem to agree that yesterday was crazy:</li>
<p><img src="/cgspace-notes/2019/05/2019-05-07-atmire-usage-week.png" alt="Atmire Usage statistics spike 2019-05-06"></p>
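<p>The unique session counts could be computed the same way in Python. A minimal sketch, assuming DSpace log lines begin with a timestamp like <code>2019-05-06 02:15:01,069</code> (as in the log excerpts in these notes); the function name <code>unique_sessions</code> is hypothetical.</p>

```python
import re

# Same pattern as the grep commands: 32 uppercase hex-like characters.
SESSION_RE = re.compile(r"session_id=([A-Z0-9]{32})")


def unique_sessions(lines, hours=None):
    """Count distinct session IDs in DSpace log lines.

    If `hours` is given (e.g. {"02", "03"}), only lines whose timestamp
    hour is in that set are counted. The hour is read from characters
    11-12 of the line, which assumes the "YYYY-MM-DD HH:MM:SS" prefix.
    """
    seen = set()
    for line in lines:
        if hours is not None and line[11:13] not in hours:
            continue
        seen.update(SESSION_RE.findall(line))
    return len(seen)
```

<p>Collecting the IDs in a set is what makes this a count of sessions rather than a count of log lines that mention a session.</p>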
<li>Sarah from RTB asked me about the RSS / XML link for the website again
<li>Apparently Sam Stacey is trying to add an RSS feed so the items get automatically syndicated to the CGIAR website</li>
<li>I sent her the link to the collection RSS feed</li>
<li>Add requests cache to <code></code> script</li>
<h2 id="2019-05-08">2019-05-08</h2>
<li>A user said that CGSpace emails have stopped sending again
<li>Indeed, the <code>dspace test-email</code> script is showing an authentication failure:</li>
<pre tabindex="0"><code>$ dspace test-email
About to send test email:
- To:
- Subject: DSpace test email
- Server:
Error sending email:
- Error: javax.mail.AuthenticationFailedException
Please see the DSpace documentation for assistance.
<li>I checked the settings and apparently I had updated it incorrectly last week after ICT reset the password</li>
<li>Help Moayad with certbot-auto for Let&rsquo;s Encrypt scripts on the new AReS server (linode20)</li>
<li>Normalize all <code>text_lang</code> values for metadata on CGSpace and DSpace Test (as I had tested last month):</li>
<pre tabindex="0"><code>UPDATE metadatavalue SET text_lang=&#39;en_US&#39; WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN (&#39;ethnob&#39;, &#39;en&#39;, &#39;*&#39;, &#39;E.&#39;, &#39;&#39;);
UPDATE metadatavalue SET text_lang=&#39;en_US&#39; WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL;
UPDATE metadatavalue SET text_lang=&#39;es_ES&#39; WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN (&#39;es&#39;, &#39;spa&#39;);
<li>Send Francesca Giampieri from Bioversity a CSV export of all their items issued in 2018
<li>They will be doing a migration of 1500 items from their TYPO3 database into CGSpace soon and want an example CSV with all required metadata columns</li>
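<p>The <code>text_lang</code> normalization rules from the SQL above can be written as a small function, for example to check values in a CSV before import. This sketch only mirrors the three <code>UPDATE</code> statements; it does not know about the <code>metadata_field_id != 28</code> exclusion, and the function name is made up.</p>

```python
# Values mapped by the UPDATE statements above.
ENGLISH_VARIANTS = {"ethnob", "en", "*", "E.", ""}
SPANISH_VARIANTS = {"es", "spa"}


def normalize_text_lang(value):
    """Return the normalized language code for a text_lang value.

    None stands in for SQL NULL. Values not covered by the rules
    above are returned unchanged.
    """
    if value is None or value in ENGLISH_VARIANTS:
        return "en_US"
    if value in SPANISH_VARIANTS:
        return "es_ES"
    return value
```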
<h2 id="2019-05-10">2019-05-10</h2>
<li>I finally had time to analyze the 7,000 IPs from the major traffic spike on 2019-05-06 after several runs of my <code></code> script ( has a limit of 1,000 requests per day)</li>
<li>Resolving the unique IP addresses to organization and AS names reveals some pretty big abusers:
<li>1213 from Region40 LLC (AS200557)</li>
<li>697 from Trusov Ilya Igorevych (AS50896)</li>
<li>687 from UGB Hosting OU (AS206485)</li>
<li>620 from UAB Rakrejus (AS62282)</li>
<li>491 from Dedipath (AS35913)</li>
<li>476 from Global Layer B.V. (AS49453)</li>
<li>333 from QuadraNet Enterprises LLC (AS8100)</li>
<li>278 from GigeNET (AS32181)</li>
<li>261 from Psychz Networks (AS40676)</li>
<li>196 from Cogent Communications (AS174)</li>
<li>125 from Blockchain Network Solutions Ltd (AS43444)</li>
<li>118 from Silverstar Invest Limited (AS35624)</li>
<li>All of the IPs from these networks use generic user agents like the one below, along with many others, and they rotate them frequently:</li>
<pre tabindex="0"><code>&#34;Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2703.0 Safari/537.36&#34;
<li>I found a <a href="">blog post from 2018 detailing an attack from a DDoS service</a> that matches our pattern exactly</li>
<li>They specifically mention:</li>
<!-- raw HTML omitted -->
<li>So this was definitely an attack of some sort&hellip; only God knows why</li>
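<p>The per-network tally above is essentially a group-and-count over the resolved IPs. A minimal sketch, assuming each lookup result is a dict with <code>ip</code>, <code>org</code>, and <code>asn</code> keys (hypothetical field names; the actual output of the resolving script is not shown in these notes):</p>

```python
from collections import Counter


def top_networks(records, minimum=1):
    """Count unique IPs per (organization, AS number).

    Returns (org, asn, count) tuples sorted by count, highest first,
    keeping only networks with at least `minimum` unique IPs.
    """
    # Deduplicate by IP so repeated lookups are not counted twice.
    network_by_ip = {r["ip"]: (r["org"], r["asn"]) for r in records}
    counts = Counter(network_by_ip.values())
    return [
        (org, asn, n)
        for (org, asn), n in counts.most_common()
        if n >= minimum
    ]
```

<p>With the real data, something like <code>top_networks(records, minimum=100)</code> would produce a list shaped like the one above.</p>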
<li>I noticed a few new bots that don&rsquo;t use the word &ldquo;bot&rdquo; in their user agent and therefore don&rsquo;t match Tomcat&rsquo;s Crawler Session Manager Valve:
<li><code>Blackboard Safeassign</code></li>
<h2 id="2019-05-12">2019-05-12</h2>
<li>I see that the Unpaywall bot is responsible for a few thousand XMLUI sessions every day (IP addresses come from nginx access.log):</li>
<pre tabindex="0"><code>$ cat dspace.log.2019-05-11 | grep -E &#39;ip_addr=(|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||; | grep -E &#39;session_id=[A-Z0-9]{32}&#39; | sort | uniq | wc -l
<li>I added &ldquo;Unpaywall&rdquo; to the list of bots in the Tomcat Crawler Session Manager Valve</li>
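<p>To see why such bots slip past the valve, the matching can be imitated in Python. The valve compares the whole user agent string against a regular expression; the pattern below is what I believe is Tomcat&rsquo;s default <code>crawlerUserAgents</code> value, but treat it as an assumption and check the actual valve configuration. The extended pattern and the sample user agents in the usage note are illustrative only.</p>

```python
import re

# Assumed default pattern of Tomcat's Crawler Session Manager Valve.
DEFAULT_CRAWLERS = r".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*"

# Illustrative extension covering the agents mentioned in these notes.
EXTENDED_CRAWLERS = (
    DEFAULT_CRAWLERS + r"|.*Unpaywall.*|.*Blackboard Safeassign.*"
)


def is_crawler(user_agent, pattern=DEFAULT_CRAWLERS):
    """Return True if the entire user agent string matches `pattern`.

    re.fullmatch is used because Java's Matcher.matches(), which the
    valve is assumed to rely on, also requires a full-string match.
    """
    return re.fullmatch(pattern, user_agent) is not None
```

<p>Under these assumptions, a user agent containing &ldquo;Googlebot&rdquo; matches the default pattern, while one containing only &ldquo;Unpaywall&rdquo; matches only after the pattern is extended.</p>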
<li>Set up nginx to use TLS and proxy pass to NodeJS on the AReS development server (linode20)</li>
<li>Run all system updates on linode20 and reboot it</li>
<li>Also, there is 10 to 20% CPU steal on that VM, so I will ask Linode to move it to another host</li>
<li>Commit changes to the <code></code> script to add proper CSV output support</li>
<h2 id="2019-05-14">2019-05-14</h2>
<li>Skype with Peter and AgroKnow about the CTA storytelling modifications they want to make to the CTA ICT Update collection on CGSpace
<li>I told them they should aim for modifying the collection theme and insert some custom HTML / JS</li>
<li>I need to send Panagis some documentation about Mirage 2 and the DSpace build process, as well as the Maven settings for build</li>
<h2 id="2019-05-15">2019-05-15</h2>
<li>Tezira says she&rsquo;s having issues with email reports for approved submissions, but I received an email about collection subscriptions this morning, and I tested with <code>dspace test-email</code> and it&rsquo;s also working&hellip;</li>
<li>Send a list of DSpace build tips to Panagis from AgroKnow</li>
<li>Finally fix AReS v2 to work via DSpace Test and send it to Peter et al for feedback
<li>We had issues with CORS due to Moayad using a hard-coded domain name rather than a relative URL</li>
<h2 id="2019-05-16">2019-05-16</h2>
<li>Export a list of all investors (<code>dc.description.sponsorship</code>) for Peter to look through and correct:</li>
<pre tabindex="0"><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 29 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-05-16-investors.csv WITH CSV HEADER;
COPY 995
<li>Fork the <a href="">ICARDA AReS v1 repository</a> to <a href="">ILRI&rsquo;s GitHub</a> and give access to CodeObia guys
<li>The plan is that we develop the v2 code here</li>
<h2 id="2019-05-17">2019-05-17</h2>
<li>Peter sent me a bunch of fixes for investors from yesterday</li>
<li>I did a quick check in OpenRefine (trim and collapse whitespace, clean smart quotes, etc) and then applied them on CGSpace:</li>
<pre tabindex="0"><code>$ ./ -i /tmp/2019-05-16-fix-306-Investors.csv -db dspace -u dspace -p &#39;fuuu&#39; -f dc.description.sponsorship -m 29 -t correct -d
$ ./ -i /tmp/2019-05-16-delete-297-Investors.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 29 -f dc.description.sponsorship -d
<li>Then I started a full Discovery re-indexing:</li>
<pre tabindex="0"><code>$ export JAVA_OPTS=&#34;-Dfile.encoding=UTF-8 -Xmx1024m&#34;
$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
<li>I was going to make a new controlled vocabulary of the top 100 terms after these corrections, but I noticed a bunch of duplicates and variations when I sorted them alphabetically</li>
<li>Instead, I exported a new list and asked Peter to look at it again</li>
<li>Apply Peter&rsquo;s new corrections on DSpace Test and CGSpace:</li>
<pre tabindex="0"><code>$ ./ -i /tmp/2019-05-17-fix-25-Investors.csv -db dspace -u dspace -p &#39;fuuu&#39; -f dc.description.sponsorship -m 29 -t correct -d
$ ./ -i /tmp/2019-05-17-delete-14-Investors.csv -db dspace -u dspace -p &#39;fuuu&#39; -m 29 -f dc.description.sponsorship -d
<li>Then I re-exported the sponsors and took the top 100 to update the existing controlled vocabulary (<a href="">#423</a>)
<li>I will deploy the changes on CGSpace the next time we re-deploy</li>
<h2 id="2019-05-19">2019-05-19</h2>
<li>Add &ldquo;ISI journal&rdquo; to item view sidebar at the request of Maria Garruccio</li>
<li>Update <code></code> and <code></code> scripts to add some basic checking of CSV fields and colorize shell output using Colorama</li>
<h2 id="2019-05-24">2019-05-24</h2>
<li>Update the AReS repository on GitHub to add a proper introduction, credits, requirements, installation instructions, and legal information</li>
<li>Update CIP subjects in input forms on CGSpace (<a href="">#424</a>)</li>
<h2 id="2019-05-25">2019-05-25</h2>
<li>Help Abenet proof ten Africa Rice publications
<li>Convert some dates to string (from number in Excel)</li>
<li>Trim whitespace on all fields</li>
<li>Correct and standardize affiliations</li>
<li>Validate subject terms against AGROVOC</li>
<li>Add rights information to all items</li>
<li>Correct and standardize sponsors</li>
<li>Generate Simple Archive Format bundle with SAFBuilder and import into the <a href="">AfricaRice Articles in Journals</a> collection on CGSpace:</li>
<pre tabindex="0"><code>$ dspace import -a -e -m -s /tmp/SimpleArchiveFormat
</code></pre><h2 id="2019-05-27">2019-05-27</h2>
<li>Peter sent me over two thousand corrections for the authors on CGSpace that I had dumped last month
<li>I proofed them for whitespace and invalid special characters in OpenRefine and then applied them on CGSpace and DSpace Test:</li>
<pre tabindex="0"><code>$ ./ -i /tmp/2019-05-27-fix-2472-Authors.csv -db dspace -u dspace -p &#39;fuuu&#39; -f -m 3 -t corrections -d
<li>Then start a full Discovery re-indexing on each server:</li>
<pre tabindex="0"><code>$ export JAVA_OPTS=&#34;-Dfile.encoding=UTF-8 -Xmx1024m&#34;
$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
<li>Export new list of all authors from CGSpace database to send to Peter:</li>
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = &#39;contributor&#39; and qualifier = &#39;author&#39;) AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-05-27-all-authors.csv with csv header;
COPY 64871
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
<li>Paola from CIAT asked for a way to generate a report of the top keywords for each year of their articles and journals
<li>I told them that the best way (even though it&rsquo;s low tech) is to work on a CSV dump of the collection</li>
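<p>A low-tech report like that can be produced from the CSV dump with a few lines of Python. This is a sketch under assumptions: the column names <code>dc.date.issued</code> and <code>dc.subject</code> are placeholders (real exports may use other fields or language-suffixed headers), and multiple values in a cell are assumed to be separated by <code>||</code> as in DSpace&rsquo;s metadata CSV exports.</p>

```python
import csv
import io
from collections import Counter, defaultdict


def top_keywords_by_year(csv_text, date_col="dc.date.issued",
                         subject_col="dc.subject", n=10):
    """Return {year: [(keyword, count), ...]} with the top n per year."""
    per_year = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Dates may be "2018", "2018-03" or "2018-03-01"; keep the year.
        year = (row.get(date_col) or "")[:4]
        if not year.isdigit():
            continue
        for term in (row.get(subject_col) or "").split("||"):
            term = term.strip()
            if term:
                per_year[year][term] += 1
    return {
        year: counts.most_common(n)
        for year, counts in sorted(per_year.items())
    }
```

<p>Rows without a usable issue date are skipped rather than guessed at, so the totals only reflect items with a date.</p>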
<h2 id="2019-05-29">2019-05-29</h2>
<li>A CIMMYT user was having problems registering or logging into CGSpace
<li>I tried to register her and it gave an error, then I remembered that CGIAR LDAP users just need to log in, which automatically creates an eperson</li>
<li>I told her to try to log in with the LDAP login method and let me know what happens (then I can look in the logs too)</li>
<h2 id="2019-05-30">2019-05-30</h2>
<li>I see the following error in the DSpace log when the user tries to log in with her CGIAR email and password on the LDAP login:</li>
<pre tabindex="0"><code>2019-05-30 07:19:35,166 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=A5E0C836AF8F3ABB769FE47107AE1CFF:ip_addr= DN found for user
<li>For now I just created an eperson with her personal email address until I have time to check LDAP to see what&rsquo;s up with her CGIAR account:</li>
<pre tabindex="0"><code>$ dspace user -a -m -g Sakshi -s Saini -p &#39;sknflksnfksnfdls&#39;
</code></pre>
</div> <!-- /.blog-main -->
</div> <!-- /.row -->
</div> <!-- /.container -->
