Mirror of https://github.com/alanorth/cgspace-notes.git
Add notes for 2019-12-17
@@ -27,7 +27,7 @@ I'll update the DSpace role in our Ansible infrastructure playbooks and run
 Also, I'll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system's RAM, and we never re-ran them after migrating to larger Linodes last month
 I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I'm getting those autowire errors in Tomcat 8.5.30 again:
 "/>
-<meta name="generator" content="Hugo 0.60.1" />
+<meta name="generator" content="Hugo 0.61.0" />
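The remark about PostgreSQL variables being "dynamic according to the system's RAM" refers to the Ansible playbooks deriving settings from available memory. A minimal Python sketch of that kind of calculation, using common rules of thumb (shared_buffers at roughly 25% of RAM, effective_cache_size at 50-75%); the exact formulas in the real playbooks are not shown here and are an assumption:

```python
# Hypothetical sketch: derive PostgreSQL memory settings from system RAM,
# similar in spirit to the dynamic variables in the Ansible postgresql tasks.
# The ratios below are common rules of thumb, not the playbooks' actual values.
import os

def pg_settings_from_ram():
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    total_mb = page_size * phys_pages // (1024 * 1024)

    return {
        # ~25% of RAM for shared_buffers
        "shared_buffers": f"{total_mb // 4}MB",
        # ~75% of RAM as a planner hint
        "effective_cache_size": f"{total_mb * 3 // 4}MB",
    }

if __name__ == "__main__":
    print(pg_settings_from_ram())
```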
@@ -108,7 +108,7 @@ I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&#
 </p>
 </header>
-<h2 id="20180902">2018-09-02</h2>
+<h2 id="2018-09-02">2018-09-02</h2>
 <ul>
 <li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
 <li>I'll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
@@ -139,7 +139,7 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
 <li>And the <code>5_x-prod</code> DSpace 5.8 branch does work in Tomcat 8.5.x on my Arch Linux laptop…</li>
 <li>I'm not sure where the issue is then!</li>
 </ul>
-<h2 id="20180903">2018-09-03</h2>
+<h2 id="2018-09-03">2018-09-03</h2>
 <ul>
 <li>Abenet says she's getting three emails about periodic statistics reports every day since the DSpace 5.8 upgrade last week</li>
 <li>They are from the CUA module</li>
@@ -148,7 +148,7 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
 <li>She will try to click the “Unsubscribe” link in the first two to see if it works, otherwise we should contact Atmire</li>
 <li>The only one she remembers subscribing to is the top downloads one</li>
 </ul>
-<h2 id="20180904">2018-09-04</h2>
+<h2 id="2018-09-04">2018-09-04</h2>
 <ul>
 <li>I'm looking over the latest round of IITA records from Sisay: <a href="https://dspacetest.cgiar.org/handle/10568/104230">Mercy1806_August_29</a>
 <ul>
@@ -171,7 +171,7 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
 </li>
 <li>Abenet says she hasn't received any more subscription emails from the CUA module since she unsubscribed yesterday, so I think we don't need to create an issue on Atmire's bug tracker anymore</li>
 </ul>
-<h2 id="20180910">2018-09-10</h2>
+<h2 id="2018-09-10">2018-09-10</h2>
 <ul>
 <li>Playing with <a href="https://github.com/eykhagen/strest">strest</a> to test the DSpace REST API programmatically</li>
 <li>For example, given this <code>test.yaml</code>:</li>
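The `test.yaml` itself falls outside this hunk. For comparison, a rough equivalent of that kind of programmatic check in Python with `requests`, assuming the stock DSpace 5.x REST endpoints (`/rest/status`, `/rest/items`) and the DSpace Test hostname; this is a sketch, not the strest file used at the time:

```python
# Sketch only: a requests-based version of the strest-style check against
# the DSpace 5.x REST API. Endpoints and hostname are assumptions.
import requests

BASE = "https://dspacetest.cgiar.org/rest"

def check_rest_api():
    # /rest/status reports whether the REST webapp is up (and auth status)
    status = requests.get(f"{BASE}/status", timeout=30)
    status.raise_for_status()

    # fetch a few items to confirm the API returns JSON as expected
    items = requests.get(f"{BASE}/items", params={"limit": 5}, timeout=30)
    items.raise_for_status()
    for item in items.json():
        print(item.get("handle"), item.get("name"))

if __name__ == "__main__":
    check_rest_api()
```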
@@ -287,7 +287,7 @@ X-XSS-Protection: 1; mode=block
 </code></pre><ul>
 <li>I will have to keep an eye on it and perhaps add it to the list of “bad bots” that get rate limited</li>
 </ul>
-<h2 id="20180912">2018-09-12</h2>
+<h2 id="2018-09-12">2018-09-12</h2>
 <ul>
 <li>Merge AReS explorer changes to nginx config and deploy on CGSpace so CodeObia can start testing more</li>
 <li>Re-create my local Docker container for PostgreSQL data, but using a volume for the database data:</li>
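The actual `docker run` command continues in the next hunk's context line. Roughly the same container setup can be expressed with the Docker SDK for Python; the PostgreSQL image tag, password, and published port below are assumptions:

```python
# Sketch: recreate the local PostgreSQL container with a named volume for
# /var/lib/postgresql/data, analogous to the `docker run` in the notes.
# Image tag, password, and host port are assumptions.
import docker

client = docker.from_env()

container = client.containers.run(
    "postgres:9.6",
    name="dspacedb",
    detach=True,
    environment={"POSTGRES_PASSWORD": "postgres"},
    ports={"5432/tcp": 5432},
    volumes={
        # named volume so the database survives container re-creation
        "dspacetest_data": {"bind": "/var/lib/postgresql/data", "mode": "rw"}
    },
)
print(container.short_id)
```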
@@ -301,7 +301,7 @@ $ sudo docker run --name dspacedb -v dspacetest_data:/var/lib/postgresql/data -e
 <li>I told Sisay to run the XML file through tidy</li>
 <li>More testing of the access and usage rights changes</li>
 </ul>
-<h2 id="20180913">2018-09-13</h2>
+<h2 id="2018-09-13">2018-09-13</h2>
 <ul>
 <li>Peter was communicating with Altmetric about the OAI mapping issue for item <a href="https://cgspace.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:cgspace.cgiar.org:10568/82810">10568/82810</a> again</li>
 <li>Altmetric said it was somehow related to the OAI <code>dateStamp</code> not getting updated when the mappings changed, but I said that back in <a href="/cgspace-notes/2018-07/">2018-07</a> when this happened it was because the OAI was actually just not reflecting all the item's mappings</li>
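A quick way to see what OAI is actually exposing for that item is to fetch the GetRecord response and inspect the header's datestamp and setSpec elements (the setSpecs correspond to the collections the item is mapped to). A hedged sketch with `requests` and the standard library XML parser:

```python
# Sketch: inspect the OAI-PMH GetRecord header for an item to see its
# datestamp and which sets (mapped collections) it appears in.
import requests
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
URL = "https://cgspace.cgiar.org/oai/request"

params = {
    "verb": "GetRecord",
    "metadataPrefix": "oai_dc",
    "identifier": "oai:cgspace.cgiar.org:10568/82810",
}

response = requests.get(URL, params=params, timeout=30)
response.raise_for_status()

root = ET.fromstring(response.content)
header = root.find(".//oai:header", OAI_NS)
print("datestamp:", header.findtext("oai:datestamp", namespaces=OAI_NS))
for set_spec in header.findall("oai:setSpec", OAI_NS):
    print("setSpec:", set_spec.text)
```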
@@ -348,12 +348,12 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
 <li>Must have been something like an old DSpace 5.5 file in the spring folder… weird</li>
 <li>But yay, this means we can update DSpace Test to Ubuntu 18.04, Tomcat 8, PostgreSQL 9.6, etc…</li>
 </ul>
-<h2 id="20180914">2018-09-14</h2>
+<h2 id="2018-09-14">2018-09-14</h2>
 <ul>
 <li>Sisay uploaded the IITA records to CGSpace, but forgot to remove the old Handles</li>
 <li>I explicitly told him not to forget to remove them yesterday!</li>
 </ul>
-<h2 id="20180916">2018-09-16</h2>
+<h2 id="2018-09-16">2018-09-16</h2>
 <ul>
 <li>Add the DSpace build.properties as a template into my <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> for configuring DSpace machines</li>
 <li>One stupid thing there is that I add all the variables in a private vars file, which apparently has higher precedence than host vars, meaning that I can't override them (like SMTP server) on a per-host basis</li>
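The `grep -c` in the hunk context counts Tomcat sessions for a single IP in dspace.log. A Python sketch of the same idea that tallies distinct session IDs per IP across a whole log file; the log line format is taken from the grep pattern above, anything beyond that is an assumption:

```python
# Sketch: count distinct DSpace session IDs per client IP in a dspace.log,
# mirroring the `grep -c -E 'session_id=...:ip_addr=...'` checks in the notes.
import re
import sys
from collections import defaultdict

PATTERN = re.compile(r"session_id=([A-Z0-9]{32}):ip_addr=(\d+\.\d+\.\d+\.\d+)")

def sessions_per_ip(logfile):
    sessions = defaultdict(set)
    with open(logfile, errors="replace") as f:
        for line in f:
            match = PATTERN.search(line)
            if match:
                session_id, ip = match.groups()
                sessions[ip].add(session_id)
    return sessions

if __name__ == "__main__":
    for ip, ids in sorted(sessions_per_ip(sys.argv[1]).items(),
                          key=lambda kv: len(kv[1]), reverse=True)[:10]:
        print(f"{len(ids):6d} sessions  {ip}")
```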
@@ -361,7 +361,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
 <li>I suggested that we leave access rights (<code>cg.identifier.access</code>) as it is now, with “Open Access” or “Limited Access”, and then simply re-brand that as “Access rights” in the UIs and relevant drop downs</li>
 <li>Then we continue as planned to add <code>dc.rights</code> as “Usage rights”</li>
 </ul>
-<h2 id="20180917">2018-09-17</h2>
+<h2 id="2018-09-17">2018-09-17</h2>
 <ul>
 <li>Skype meeting with CGSpace team in Addis</li>
 <li>Change <code>cg.identifier.status</code> “Access rights” options to:
@@ -418,7 +418,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
 <li>That one returns 766, which is exactly 1655 minus 889…</li>
 <li>Also, Solr's <code>fq</code> is similar to the regular <code>q</code> query parameter, but its results are cached in Solr's filterCache, so it should be faster for repeated queries</li>
 </ul>
-<h2 id="20180918">2018-09-18</h2>
+<h2 id="2018-09-18">2018-09-18</h2>
 <ul>
 <li>I managed to create a simple proof of concept REST API to expose item view and download statistics: <a href="https://github.com/alanorth/cgspace-statistics-api">cgspace-statistics-api</a></li>
 <li>It uses the Python-based <a href="https://falcon.readthedocs.io">Falcon</a> web framework and talks to Solr directly using the <a href="https://github.com/moonlitesolutions/SolrClient">SolrClient</a> library (which seems to have issues in Python 3.7 currently)</li>
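A minimal sketch of what such a Falcon proof of concept looks like structurally; the route, response fields, and in-memory data source here are assumptions, not the actual cgspace-statistics-api code:

```python
# Sketch of a Falcon resource exposing item view/download counts.
# The route, field names, and the fake "database" are illustrative only.
import falcon

FAKE_STATS = {42: {"views": 100, "downloads": 7}}

class ItemStatsResource:
    def on_get(self, req, resp, item_id):
        stats = FAKE_STATS.get(int(item_id))
        if stats is None:
            raise falcon.HTTPNotFound()
        resp.media = {"id": int(item_id), **stats}

# Falcon 1.x/2.x used falcon.API(); newer releases use falcon.App()
app = falcon.App()
app.add_route("/item/{item_id}/stats", ItemStatsResource())

# Run with any WSGI server, e.g.: gunicorn module_name:app
```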
@@ -439,12 +439,12 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
 </code></pre><ul>
 <li>The rest of the Falcon tooling will be more difficult…</li>
 </ul>
-<h2 id="20180919">2018-09-19</h2>
+<h2 id="2018-09-19">2018-09-19</h2>
 <ul>
 <li>I emailed Jane Poole to ask if there is some money we can use from the Big Data Platform (BDP) to fund the purchase of some Atmire credits for CGSpace</li>
 <li>I learned that there is an efficient way to do <a href="http://yonik.com/solr/paging-and-deep-paging/">“deep paging” in large Solr result sets by using <code>cursorMark</code></a>, but it doesn't work with faceting</li>
 </ul>
-<h2 id="20180920">2018-09-20</h2>
+<h2 id="2018-09-20">2018-09-20</h2>
 <ul>
 <li>Contact Atmire to ask how we can buy more credits for future development</li>
 <li>I researched the Solr <code>filterCache</code> size and I found out that the formula for calculating the potential memory use of <strong>each entry</strong> in the cache is:</li>
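A hedged sketch of the cursorMark pattern against a Solr core using `requests`: the query must sort on the uniqueKey field, pass `cursorMark=*` on the first request, and keep requesting with the returned `nextCursorMark` until it stops changing. The core URL, query, and uniqueKey field name below are assumptions:

```python
# Sketch: "deep paging" through a large Solr result set with cursorMark.
# The Solr URL, query, and uniqueKey field (id) are assumptions.
import requests

SOLR = "http://localhost:8081/solr/statistics/select"

def iterate_all(query="type:2", rows=1000):
    cursor = "*"
    while True:
        params = {
            "q": query,
            "rows": rows,
            "sort": "id asc",        # cursorMark requires a sort on the uniqueKey
            "cursorMark": cursor,
            "wt": "json",
        }
        data = requests.get(SOLR, params=params, timeout=60).json()
        yield from data["response"]["docs"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:    # cursor stops changing: no more results
            break
        cursor = next_cursor

if __name__ == "__main__":
    total = sum(1 for _ in iterate_all())
    print(f"fetched {total} documents")
```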
@@ -460,7 +460,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
 <li><a href="https://docs.google.com/document/d/1vl-nmlprSULvNZKQNrqp65eLnLhG9s_ydXQtg9iML10/edit">Article discussing testing methodology for different <code>filterCache</code> sizes</a></li>
 <li>Discuss Handle links on Twitter with IWMI</li>
 </ul>
-<h2 id="20180921">2018-09-21</h2>
+<h2 id="2018-09-21">2018-09-21</h2>
 <ul>
 <li>I see that there was a nice optimization to the ImageMagick PDF CMYK detection in the upstream <code>dspace-5_x</code> branch: <a href="https://github.com/DSpace/DSpace/pull/2204">DS-3664</a></li>
 <li>The fix will go into DSpace 5.10, and we are currently on DSpace 5.8 but I think I'll cherry-pick that fix into our <code>5_x-prod</code> branch:
@@ -475,14 +475,14 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
 </ul>
 </li>
 </ul>
-<h2 id="20190923">2019-09-23</h2>
+<h2 id="2019-09-23">2019-09-23</h2>
 <ul>
 <li>I did more work on my <a href="https://github.com/alanorth/cgspace-statistics-api">cgspace-statistics-api</a>, fixing some item view counts and adding indexing via SQLite (I'm trying to avoid having to set up <em>yet another</em> database, user, password, etc) during deployment</li>
 <li>I created a new branch called <code>5_x-upstream-cherry-picks</code> to test and track those cherry-picks from the upstream 5.x branch</li>
 <li>Also, I need to test the new LDAP server, so I will deploy that on DSpace Test today</li>
 <li>Rename my cgspace-statistics-api to <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> on GitHub</li>
 </ul>
-<h2 id="20180924">2018-09-24</h2>
+<h2 id="2018-09-24">2018-09-24</h2>
 <ul>
 <li>Trying to figure out how to get item views and downloads from SQLite in a join</li>
 <li>It appears SQLite doesn't support <code>FULL OUTER JOIN</code> so some people on StackOverflow have emulated it with <code>LEFT JOIN</code> and <code>UNION</code>:</li>
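The SQL that followed that last bullet falls outside this hunk. The usual emulation pattern, sketched with Python's built-in `sqlite3` module; the table and column names here are assumptions, not the actual indexer schema:

```python
# Sketch: emulate FULL OUTER JOIN in SQLite by UNIONing two LEFT JOINs,
# so items with only views or only downloads still appear exactly once.
# Table/column names are illustrative, not the real indexer schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE itemviews     (id INTEGER PRIMARY KEY, views INTEGER);
    CREATE TABLE itemdownloads (id INTEGER PRIMARY KEY, downloads INTEGER);
    INSERT INTO itemviews     VALUES (1, 10), (2, 5);
    INSERT INTO itemdownloads VALUES (2, 3), (3, 8);
""")

rows = conn.execute("""
    SELECT v.id, v.views, d.downloads
      FROM itemviews v LEFT JOIN itemdownloads d ON v.id = d.id
    UNION
    SELECT d.id, v.views, d.downloads
      FROM itemdownloads d LEFT JOIN itemviews v ON d.id = v.id
""").fetchall()

for item_id, views, downloads in rows:
    print(item_id, views or 0, downloads or 0)
```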
@@ -539,7 +539,7 @@ $ createuser -h localhost -U postgres --pwprompt dspacestatistics
 $ psql -h localhost -U postgres dspacestatistics
 dspacestatistics=> CREATE TABLE IF NOT EXISTS items
 dspacestatistics-> (id INT PRIMARY KEY, views INT DEFAULT 0, downloads INT DEFAULT 0)
-</code></pre><h2 id="20180925">2018-09-25</h2>
+</code></pre><h2 id="2018-09-25">2018-09-25</h2>
 <ul>
 <li>I deployed the DSpace statistics API on CGSpace, but when I ran the indexer it wanted to index 180,000 pages of item views</li>
 <li>I'm not even sure how that's possible, as we only have 74,000 items!</li>
@@ -586,7 +586,7 @@ Indexing item downloads (page 260 of 260)
 </code></pre><ul>
 <li>And now it's fast as hell due to the muuuuch smaller Solr statistics core</li>
 </ul>
-<h2 id="20180926">2018-09-26</h2>
+<h2 id="2018-09-26">2018-09-26</h2>
 <ul>
 <li>Linode emailed to say that CGSpace (linode18) was using 30Mb/sec of outward bandwidth for two hours around midnight</li>
 <li>I don't see anything unusual in the nginx logs, so perhaps it was the cron job that syncs the Solr database to Amazon S3?</li>
@@ -616,7 +616,7 @@ sys 2m18.485s
 <li>I updated the dspace-statistics-api to use psycopg2's <code>execute_values()</code> to insert batches of 100 values into PostgreSQL instead of doing every insert individually</li>
 <li>On CGSpace this reduces the total run time of <code>indexer.py</code> from 432 seconds to 400 seconds (most of the time is actually spent in getting the data from Solr though)</li>
 </ul>
-<h2 id="20180927">2018-09-27</h2>
+<h2 id="2018-09-27">2018-09-27</h2>
 <ul>
 <li>Linode emailed to say that CGSpace's (linode19) CPU load was high for a few hours last night</li>
 <li>Looking in the nginx logs around that time I see some new IPs that look like they are harvesting things:</li>
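For reference, the `execute_values()` batching pattern looks roughly like this; the connection details and the exact upsert statement are assumptions, the real indexer.py may differ:

```python
# Sketch: batch-insert (id, views, downloads) rows with psycopg2's
# execute_values() instead of one INSERT per row. The DSN and the exact
# ON CONFLICT clause are assumptions about the real indexer.py.
import psycopg2
import psycopg2.extras

rows = [(1, 10, 2), (2, 5, 0), (3, 0, 7)]  # (id, views, downloads)

conn = psycopg2.connect("dbname=dspacestatistics user=dspacestatistics host=localhost")
with conn, conn.cursor() as cur:
    psycopg2.extras.execute_values(
        cur,
        """INSERT INTO items (id, views, downloads) VALUES %s
           ON CONFLICT (id) DO UPDATE
           SET views = excluded.views, downloads = excluded.downloads""",
        rows,
        page_size=100,  # send 100 rows per statement
    )
conn.close()
```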
@@ -645,7 +645,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=68.6.87.12' dspace.log.2018-09-26
 <li>I will add their IPs to the list of bad bots in nginx so we can add a “bot” user agent to them and let Tomcat's Crawler Session Manager Valve handle them</li>
 <li>I asked Atmire to prepare an invoice for 125 credits</li>
 </ul>
-<h2 id="20180929">2018-09-29</h2>
+<h2 id="2018-09-29">2018-09-29</h2>
 <ul>
 <li>I merged some changes to author affiliations from Sisay as well as some corrections to organizational names using smart quotes like <code>Université d’Abomey Calavi</code> (<a href="https://github.com/ilri/DSpace/pull/388">#388</a>)</li>
 <li>Peter sent me a list of 43 author names to fix, but it had some encoding errors like <code>Belalcázar, John</code> like usual (I will tell him to stop trying to export as UTF-8 because it never seems to work)</li>
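To spot harvesting IPs like the ones mentioned in the previous hunk, the usual approach is to tally requests per client IP in the nginx access log. A small sketch assuming the default combined log format, where the remote address is the first field; the log path is also an assumption:

```python
# Sketch: find the busiest client IPs in an nginx access log, assuming the
# default "combined" format where the remote address is the first field.
import sys
from collections import Counter

def top_clients(logfile, n=10):
    hits = Counter()
    with open(logfile, errors="replace") as f:
        for line in f:
            ip = line.split(" ", 1)[0]
            hits[ip] += 1
    return hits.most_common(n)

if __name__ == "__main__":
    logfile = sys.argv[1] if len(sys.argv) > 1 else "/var/log/nginx/access.log"
    for ip, count in top_clients(logfile):
        print(f"{count:8d}  {ip}")
```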
@@ -662,7 +662,7 @@ $ ./fix-metadata-values.py -i 2018-09-29-fix-authors.csv -db dspace -u dspace -p
 <li>It seems to be Moayad trying to do the AReS explorer indexing</li>
 <li>He was sending too many (5 or 10) concurrent requests to the server, but still… why is this shit so slow?!</li>
 </ul>
-<h2 id="20180930">2018-09-30</h2>
+<h2 id="2018-09-30">2018-09-30</h2>
 <ul>
 <li>Valerio keeps sending items on CGSpace that have weird or incorrect languages, authors, etc</li>
 <li>I think I should just batch export and update all languages…</li>
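Since the fix-metadata-values.py input is a CSV of value corrections, a quick sanity check for the kind of encoding problems mentioned above is to scan the file for typical mojibake sequences before applying the fixes. A sketch, assuming a plain CSV readable as UTF-8; the column layout is an assumption:

```python
# Sketch: flag rows in a corrections CSV that contain typical UTF-8
# mojibake (e.g. "Ã¡" where "á" was intended) before feeding the file
# to fix-metadata-values.py. Column names/layout are assumptions.
import csv
import sys

# character sequences that usually indicate UTF-8 text decoded as Latin-1
MOJIBAKE_HINTS = ("Ã", "Â", "â€")

def check_csv(path):
    with open(path, newline="", encoding="utf-8") as f:
        for lineno, row in enumerate(csv.DictReader(f), start=2):
            for column, value in row.items():
                if value and any(hint in value for hint in MOJIBAKE_HINTS):
                    print(f"line {lineno}: suspicious value in '{column}': {value}")

if __name__ == "__main__":
    check_csv(sys.argv[1])
```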