mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2019-12-17
@ -57,7 +57,7 @@ This was due to newline characters in the dc.description.abstract column, which
|
||||
I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
|
||||
Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.60.1" />
|
||||
<meta name="generator" content="Hugo 0.61.0" />
|
||||
@ -138,7 +138,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
|
||||
</p>
|
||||
</header>
|
||||
<h2 id="20170801">2017-08-01</h2>
|
||||
<h2 id="2017-08-01">2017-08-01</h2>
|
||||
<ul>
|
||||
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
|
||||
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
|
||||
@ -160,7 +160,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
|
||||
<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
|
||||
</ul>
|
||||
<h2 id="20170802">2017-08-02</h2>
|
||||
<h2 id="2017-08-02">2017-08-02</h2>
|
||||
<ul>
|
||||
<li>Magdalena from CCAFS asked if there was a way to get the top ten items published in 2016 (note: not the top items in 2016!)</li>
|
||||
<li>I think Atmire's Content and Usage Analysis module should be able to do this but I will have to look at the configuration and maybe email Atmire if I can't figure it out</li>
|
||||
@ -168,7 +168,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
<li>Atmire responded about the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=500">missing workflow statistics issue</a> a few weeks ago but I didn't see it for some reason</li>
|
||||
<li>They said they added a publication and saw the workflow stat for the user, so I should try again and let them know</li>
|
||||
</ul>
|
||||
<h2 id="20170805">2017-08-05</h2>
|
||||
<h2 id="2017-08-05">2017-08-05</h2>
|
||||
<ul>
|
||||
<li>Usman from CIFOR emailed to ask about the status of our OAI tests for harvesting their DSpace repository</li>
|
||||
<li>I told him that the OAI harvester does not appear to be harvesting properly after the first sync, and that the control panel shows an “Internal error” for that collection:</li>
|
||||
@ -178,18 +178,18 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
<li>I don't see anything related in our logs, so I asked him to check for our server's IP in their logs</li>
|
||||
<li>Also, in the meantime I stopped the harvesting process, reset the status, and restarted the process via the Admin control panel (note: I didn't reset the collection, just the harvester status!)</li>
|
||||
</ul>
|
||||
<h2 id="20170807">2017-08-07</h2>
|
||||
<h2 id="2017-08-07">2017-08-07</h2>
|
||||
<ul>
|
||||
<li>Apply Abenet's corrections for the CGIAR Library's Consortium subcommunity (697 records)</li>
|
||||
<li>I had to fix a few small things: moving the <code>dc.title</code> column away from the beginning of the row, deleting blank lines in the abstract in vim using <code>:g/^$/d</code>, and adding the <code>dc.subject[en_US]</code> column back, as she had deleted it and DSpace didn't detect the changes made there (we needed to blank the values instead)</li>
|
||||
</ul>
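For readers who do not use vim, the <code>:g/^$/d</code> command mentioned above deletes every line that is completely empty. A minimal Python sketch of the same idea follows; the function name `drop_empty_lines` is hypothetical and not part of any DSpace tooling. Like the vim pattern, it keeps lines that contain only whitespace.

```python
def drop_empty_lines(text: str) -> str:
    """Remove lines that are completely empty, like vim's :g/^$/d.

    Lines containing only spaces or tabs are kept, because the
    pattern ^$ matches only lines with no characters at all.
    """
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if line.strip("\r\n") != ""
    ]
    return "".join(kept)


if __name__ == "__main__":
    sample = "first line\n\nsecond line\n \nthird line\n"
    print(drop_empty_lines(sample), end="")
```

This only covers the blank-line cleanup; the column changes described above would still need to be made in the CSV itself.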
|
||||
<h2 id="20170808">2017-08-08</h2>
|
||||
<h2 id="2017-08-08">2017-08-08</h2>
|
||||
<ul>
|
||||
<li>Apply Abenet's corrections for the CGIAR Library's historic archive subcommunity (2415 records)</li>
|
||||
<li>I had to add the <code>dc.subject[en_US]</code> column back with blank values so that DSpace could detect the changes</li>
|
||||
<li>I applied the changes in 500 item batches</li>
|
||||
</ul>
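Applying the changes in batches of 500 items could be prepared with something like the sketch below. It is only an illustration under assumptions: the helper name `csv_batches` and the sample column names are hypothetical, and the notes do not say how the batches were actually produced. Each batch repeats the header row so it can be imported on its own.

```python
import csv
import io
from typing import Iterator


def csv_batches(
    header: list[str], rows: list[list[str]], batch_size: int = 500
) -> Iterator[str]:
    """Yield CSV text for each batch of rows, repeating the header each time."""
    for start in range(0, len(rows), batch_size):
        buffer = io.StringIO()
        writer = csv.writer(buffer)
        writer.writerow(header)
        writer.writerows(rows[start:start + batch_size])
        yield buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical data standing in for the 2415 exported records.
    header = ["id", "dc.title"]
    rows = [[str(i), f"Title {i}"] for i in range(2415)]
    for number, batch in enumerate(csv_batches(header, rows), start=1):
        record_count = len(batch.splitlines()) - 1  # subtract the header row
        print(f"batch {number}: {record_count} records")
```

With 2415 records and a batch size of 500, this gives four full batches and a final batch of 415.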
|
||||
<h2 id="20170809">2017-08-09</h2>
|
||||
<h2 id="2017-08-09">2017-08-09</h2>
|
||||
<ul>
|
||||
<li>Run system updates on DSpace Test and reboot server</li>
|
||||
<li>Help ICARDA upgrade their MELSpace to DSpace 5.7 using the <a href="https://github.com/alanorth/docker-dspace">docker-dspace</a> container
|
||||
@ -199,7 +199,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<h2 id="20170810">2017-08-10</h2>
|
||||
<h2 id="2017-08-10">2017-08-10</h2>
|
||||
<ul>
|
||||
<li>Apply last updates to the CGIAR Library's Fund community (812 items)</li>
|
||||
<li>Had to do some quality checks and column renames before importing, as either Sisay or Abenet renamed a few columns and the metadata importer wanted to remove/add new metadata for title, abstract, etc.</li>
|
||||
@ -220,7 +220,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
<li>Follow up with Atmire on the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=510">ticket about ORCID metadata in DSpace</a></li>
|
||||
<li>Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates</li>
|
||||
</ul>
|
||||
<h2 id="20170811">2017-08-11</h2>
|
||||
<h2 id="2017-08-11">2017-08-11</h2>
|
||||
<ul>
|
||||
<li>CGSpace had load issues and was throwing errors related to PostgreSQL</li>
|
||||
<li>I told Tsega to reduce the max connections from 70 to 40 because each web application gets that limit, so for xmlui, oai, jspui, rest, etc. it could be 70 x 4 = 280 connections depending on the load, while the PostgreSQL config itself only allows 100!</li>
|
||||
@ -229,7 +229,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
<li>Also, I need to find out where the load is coming from (rest?) and possibly block bots from accessing dynamic pages like Browse and Discover instead of just sending an X-Robots-Tag HTTP header</li>
|
||||
<li>I noticed that Google has bitstreams from the <code>rest</code> interface in the search index. I need to ask on the dspace-tech mailing list to see what other people are doing about this, and maybe start issuing an <code>X-Robots-Tag: none</code> there!</li>
|
||||
</ul>
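The connection arithmetic from the PostgreSQL note above can be checked with a short sketch. It assumes, as the note does, that each web application gets its own pool with the same limit. The function and constant names are hypothetical, and the result is a worst case rather than a measurement of real usage.

```python
def worst_case_connections(per_app_limit: int, webapps: list[str]) -> int:
    """Upper bound on database connections if every webapp fills its own pool."""
    return per_app_limit * len(webapps)


# Web applications named in the note; each one gets its own pool limit.
WEBAPPS = ["xmlui", "oai", "jspui", "rest"]
# PostgreSQL max_connections value mentioned in the note.
POSTGRES_MAX_CONNECTIONS = 100

if __name__ == "__main__":
    for limit in (70, 40):
        total = worst_case_connections(limit, WEBAPPS)
        if total > POSTGRES_MAX_CONNECTIONS:
            verdict = "exceeds"
        else:
            verdict = "fits within"
        print(
            f"limit {limit}: up to {total} connections, "
            f"{verdict} max_connections={POSTGRES_MAX_CONNECTIONS}"
        )
```

Even at a limit of 40 the theoretical worst case is 160, still above 100, so whether the lower limit is enough depends on how many pools are busy at the same time.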
|
||||
<h2 id="20170812">2017-08-12</h2>
|
||||
<h2 id="2017-08-12">2017-08-12</h2>
|
||||
<ul>
|
||||
<li>I sent a message to the mailing list about the duplicate content issue with <code>/rest</code> and <code>/bitstream</code> URLs</li>
|
||||
<li>Looking at the logs for the REST API on <code>/rest</code>, it looks like someone is hammering it, perhaps doing testing or something…</li>
|
||||
@ -249,12 +249,12 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
|
||||
access_log /var/log/nginx/oai.log;
|
||||
proxy_pass http://tomcat_http;
|
||||
}
|
||||
</code></pre><h2 id="20170813">2017-08-13</h2>
|
||||
</code></pre><h2 id="2017-08-13">2017-08-13</h2>
|
||||
<ul>
|
||||
<li>Macaroni Bros say that CCAFS wants them to check once every hour for changes</li>
|
||||
<li>I told them to check every four or six hours</li>
|
||||
</ul>
|
||||
<h2 id="20170814">2017-08-14</h2>
|
||||
<h2 id="2017-08-14">2017-08-14</h2>
|
||||
<ul>
|
||||
<li>Run author corrections on CGIAR Library community from Peter</li>
|
||||
</ul>
|
||||
@ -300,7 +300,7 @@ $ grep -rsI SQLException dspace-xmlui | wc -l
|
||||
<li>Apply 223 more author corrections from Peter on CGIAR Library</li>
|
||||
<li>Help Magdalena from CCAFS with some CUA statistics questions</li>
|
||||
</ul>
|
||||
<h2 id="20170815">2017-08-15</h2>
|
||||
<h2 id="2017-08-15">2017-08-15</h2>
|
||||
<ul>
|
||||
<li>Increase the nginx upload limit on CGSpace (linode18) so Sisay can upload 23 CIAT reports</li>
|
||||
<li>Do some last minute cleanups and de-duplications of the CGIAR Library data, as I need to send it to Peter this week</li>
|
||||
@ -308,7 +308,7 @@ $ grep -rsI SQLException dspace-xmlui | wc -l
|
||||
<li>Also, a few dozen <code>dc.description.abstract</code> fields still had various HTML tags and entities in them</li>
|
||||
<li>Also, a bunch of <code>dc.subject</code> fields that were not AGROVOC had not been moved properly to <code>cg.system.subject</code></li>
|
||||
</ul>
|
||||
<h2 id="20170816">2017-08-16</h2>
|
||||
<h2 id="2017-08-16">2017-08-16</h2>
|
||||
<ul>
|
||||
<li>I wanted to merge the various field variations like <code>cg.subject.system</code> and <code>cg.subject.system[en_US]</code> in OpenRefine but I realized it would be easier in PostgreSQL:</li>
|
||||
</ul>
|
||||
@ -351,7 +351,7 @@ UPDATE 4899
|
||||
<li>I think we could use <code>harvest.includerestricted.rss = false</code> but the items might need to be 100% restricted, not just the metadata</li>
|
||||
<li>Adjust Ansible postgres role to use <code>max_connections</code> from a template variable and deploy a new limit of 123 on CGSpace</li>
|
||||
</ul>
|
||||
<h2 id="20170817">2017-08-17</h2>
|
||||
<h2 id="2017-08-17">2017-08-17</h2>
|
||||
<ul>
|
||||
<li>Run Peter's edits to the CGIAR System Organization community on DSpace Test</li>
|
||||
<li>Uptime Robot said CGSpace went down for 1 minute, not sure why</li>
|
||||
@ -395,7 +395,7 @@ dspace.log.2017-08-17:584
|
||||
</li>
|
||||
<li>Peter responded and said that he doesn't want to limit items to be restricted just so we can change the RSS feeds</li>
|
||||
</ul>
|
||||
<h2 id="20170818">2017-08-18</h2>
|
||||
<h2 id="2017-08-18">2017-08-18</h2>
|
||||
<ul>
|
||||
<li>Someone on the dspace-tech mailing list responded with some tips about using the authority framework to do external queries from the submission form</li>
|
||||
<li>He linked to some examples from DSpace-CRIS that use this functionality: <a href="https://github.com/4Science/DSpace/blob/dspace-5_x_x-cris/dspace-api/src/main/java/org/dspace/content/authority/VIAFAuthority.java">VIAFAuthority</a></li>
|
||||
@ -432,14 +432,14 @@ WHERE {
|
||||
<li>I found this blog post about speeding up the Tomcat startup time: <a href="http://skybert.net/java/improve-tomcat-startup-time/">http://skybert.net/java/improve-tomcat-startup-time/</a></li>
|
||||
<li>The startup time went from ~80s to 40s!</li>
|
||||
</ul>
|
||||
<h2 id="20170819">2017-08-19</h2>
|
||||
<h2 id="2017-08-19">2017-08-19</h2>
|
||||
<ul>
|
||||
<li>More examples of SPARQL queries: <a href="https://github.com/rsinger/openlcsh/wiki/Sparql-Examples">https://github.com/rsinger/openlcsh/wiki/Sparql-Examples</a></li>
|
||||
<li>Specifically the explanation of the <code>FILTER</code> regex</li>
|
||||
<li>Might want to <code>SELECT DISTINCT</code> or increase the <code>LIMIT</code> to get terms like “wheat” and “fish” to be visible</li>
|
||||
<li>Test queries online on the AGROVOC SPARQL portal: http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc</li>
|
||||
</ul>
|
||||
<h2 id="20170820">2017-08-20</h2>
|
||||
<h2 id="2017-08-20">2017-08-20</h2>
|
||||
<ul>
|
||||
<li>Since I cleared the XMLUI cache on 2017-08-17 there haven't been any more <code>ERROR net.sf.ehcache.store.DiskStore</code> errors</li>
|
||||
<li>Look at the CGIAR Library to see if I can find the items that have been submitted since May:</li>
|
||||
@ -466,16 +466,16 @@ WHERE {
|
||||
10947/4661
|
||||
10947/4664
|
||||
(5 rows)
|
||||
</code></pre><h2 id="20170823">2017-08-23</h2>
|
||||
</code></pre><h2 id="2017-08-23">2017-08-23</h2>
|
||||
<ul>
|
||||
<li>Start testing the nginx configs for the CGIAR Library migration as well as start making a checklist</li>
|
||||
</ul>
|
||||
<h2 id="20170828">2017-08-28</h2>
|
||||
<h2 id="2017-08-28">2017-08-28</h2>
|
||||
<ul>
|
||||
<li>Bram had written to me two weeks ago to set up a chat about ORCID stuff but the email apparently bounced and I only found out when he emailed me on another account</li>
|
||||
<li>I told him I can chat in a few weeks when I'm back</li>
|
||||
</ul>
|
||||
<h2 id="20170831">2017-08-31</h2>
|
||||
<h2 id="2017-08-31">2017-08-31</h2>
|
||||
<ul>
|
||||
<li>I notice that in many WLE collections Marianne Gadeberg is in the edit or approval steps, but she is also in the groups for those steps.</li>
|
||||
<li>I think we need to have a process to go back and check / fix some of these scenarios—to remove her user from the step and instead add her to the group—because we have way too many authorizations and in late 2016 we had <a href="https://github.com/ilri/rmg-ansible-public/commit/358b5ea43f9e5820986f897c9d560937c702ac6e">performance issues with Solr</a> because of this</li>
|
||||
|