IWMI notified me that AReS was down with an HTTP 502 error
Looking at UptimeRobot I see it has been down for 33 hours, but I never got a notification
I don’t see anything in the Elasticsearch container logs, or the systemd journal on the host, but I notice that the angular_nginx container isn’t running
IWMI notified me that AReS was down with an HTTP 502 error
Looking at UptimeRobot I see it has been down for 33 hours, but I never got a notification
I don’t see anything in the Elasticsearch container logs, or the systemd journal on the host, but I notice that the angular_nginx container isn’t running
<li>IWMI notified me that AReS was down with an HTTP 502 error
<ul>
<li>Looking at UptimeRobot I see it has been down for 33 hours, but I never got a notification</li>
<li>I don’t see anything in the Elasticsearch container logs, or the systemd journal on the host, but I notice that the <code>angular_nginx</code> container isn’t running</li>
<li>I simply started it and AReS was running again:</li>
<li>Margarita from CCAFS emailed me to say that workflow alerts haven’t been working lately
<ul>
<li>I guess this is related to the SMTP issues last week</li>
<li>I had fixed the config, but didn’t restart Tomcat so DSpace didn’t load the new variables</li>
<li>I ran all system updates on CGSpace (linode18) and DSpace Test (linode26) and rebooted the servers</li>
</ul>
</li>
</ul>
<h2id="2021-06-03">2021-06-03</h2>
<ul>
<li>Meeting with AMCOW and IWMI to discuss AMCOW getting IWMI’s content into the new AMCOW Knowledge Hub
<ul>
<li>At first we spent some time talking about DSpace communities/collections and the REST API, but then they said they actually prefer to send queries to sites on the fly and cache them in Redis for some time</li>
<li>That’s when I thought they could perhaps use the OpenSearch, but I can’t remember if it’s possible to limit by community, or only collection…</li>
<li>Looking now, I see there is a “scope” parameter that can be used for community or collection, for example:</li>
<li>Looking at the Solr statistics on CGSpace for last month I see many requests from hosts using seemingly normal Windows browser user agents, but using the MSN bot’s DNS
<ul>
<li>For example, user agent <code>Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0; Trident/5.0)</code> with DNS <code>msnbot-131-253-25-91.search.msn.com.</code></li>
<li>I queried Solr for all hits using the MSN bot DNS (<code>dns:*msnbot* AND dns:*.msn.com.</code>) and found 457,706</li>
<li>I extracted their IPs using Solr’s CSV format and ran them through my <code>resolve-addresses.py</code> script and found that they all belong to MICROSOFT-CORP-MSN-AS-BLOCK (AS8075)</li>
<li>Note that <ahref="https://www.bing.com/webmasters/help/how-to-verify-bingbot-3905dc26">Microsoft’s docs say that reverse lookups on Bingbot IPs will always have “search.msn.com”</a> so it is safe to purge these as non-human traffic</li>
<li>I purged the hits with <code>ilri/check-spider-ip-hits.sh</code> (though I had to do it in 3 batches because I forgot to increase the <code>facet.limit</code> so I was only getting them 100 at a time)</li>
</ul>
</li>
<li>Moayad sent a pull request a few days ago to re-work the harvesting on OpenRXV
<ul>
<li>It will hopefully also fix the duplicate and missing items issues</li>
<li>I had a Skype with him to discuss</li>
<li>I got it running on podman-compose, but I had to fix the storage permissions on the Elasticsearch volume after the first time it tries (and fails) to run:</li>