mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2021-06-27
- I thought of using `redis-cli --pipe`, but then you have to construct the commands in the Redis protocol (RESP) format, with the number of arguments and the byte length of each argument
- There is clearly something wrong with the new DSpace health check plugin, as it creates WAY too many jobs every time we run the plugins
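The RESP encoding itself is not much work to generate in Bash. Each command becomes an array header, `*<number of arguments>`. Each argument then follows as a bulk string: `$<length in bytes>`, then the argument itself, every line ending in CRLF. The sketch below shows this; the function name `resp_command` is hypothetical and not part of redis-cli.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Redis command, given as separate arguments,
# in the RESP format that `redis-cli --pipe` reads on stdin.
resp_command() {
  local arg len
  # Array header: "*" followed by the number of arguments
  printf '*%d\r\n' "$#"
  for arg in "$@"; do
    # Bulk string: "$" + length in bytes (not characters), then the argument
    len=$(printf '%s' "$arg" | wc -c)
    # $((len)) drops the leading spaces some wc implementations print
    printf '$%d\r\n%s\r\n' "$((len))" "$arg"
  done
}
```

Deleting the `bull:plugins:*` keys could then look something like `redis-cli KEYS "bull:plugins:*" | while read -r key; do resp_command DEL "$key"; done | redis-cli --pipe`. This is untested here. Also note that `KEYS` scans the whole keyspace in one blocking call, so `redis-cli --scan --pattern "bull:plugins:*"` may be gentler on a busy instance.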
## 2021-06-27

- Looking into the spike in PostgreSQL connections last week
- I see the same things that I always see (a large number of connections waiting for locks, a large number of threads, high CPU usage, etc.), but I also see almost 10,000 DSpace sessions on 2021-06-25

- Looking at the DSpace logs, I see there was definitely a higher number of sessions in the 2021-06-23 log, perhaps twice the normal:

```console
$ for file in dspace.log.2021-06-[12]*; do echo "$file"; grep -oE 'session_id=[A-Z0-9]{32}' "$file" | sort | uniq | wc -l; done
dspace.log.2021-06-10
19072
dspace.log.2021-06-11
19224
dspace.log.2021-06-12
19215
dspace.log.2021-06-13
16721
dspace.log.2021-06-14
17880
dspace.log.2021-06-15
12103
dspace.log.2021-06-16
4651
dspace.log.2021-06-17
22785
dspace.log.2021-06-18
21406
dspace.log.2021-06-19
25967
dspace.log.2021-06-20
20850
dspace.log.2021-06-21
6388
dspace.log.2021-06-22
5945
dspace.log.2021-06-23
46371
dspace.log.2021-06-24
9024
dspace.log.2021-06-25
12521
dspace.log.2021-06-26
16163
dspace.log.2021-06-27
5886
```
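"Perhaps twice the normal" can be checked by splitting the loop above into two small functions. The first prints one `file count` line per log, using the same `grep` pattern. The second flags days whose count exceeds a multiple of the median. This is a sketch: the function names and the default factor of 2 are illustrative assumptions, not part of any DSpace tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "FILE COUNT" for each DSpace log given, where COUNT
# is the number of unique session IDs (same grep pattern as the loop above).
count_sessions() {
  local file n
  for file in "$@"; do
    # "|| true" so a log without any sessions reports 0 instead of failing
    n=$( { grep -oE 'session_id=[A-Z0-9]{32}' "$file" || true; } | sort -u | wc -l)
    # $((n)) drops the leading spaces some wc implementations print
    printf '%s %d\n' "$file" "$((n))"
  done
}

# Hypothetical helper: read "FILE COUNT" lines on stdin and print the ones whose
# count is more than FACTOR (default 2) times the median count.
flag_spikes() {
  local factor=${1:-2}
  awk -v f="$factor" '
    { name[NR] = $1; n[NR] = $2 + 0; sorted[NR] = $2 + 0 }
    END {
      # Insertion sort of the counts; a month of daily logs is small
      for (i = 2; i <= NR; i++) {
        v = sorted[i]
        for (j = i - 1; j >= 1 && sorted[j] > v; j--) sorted[j + 1] = sorted[j]
        sorted[j + 1] = v
      }
      # Median: middle value, or mean of the two middle values
      if (NR % 2) median = sorted[(NR + 1) / 2]
      else median = (sorted[NR / 2] + sorted[NR / 2 + 1]) / 2
      for (i = 1; i <= NR; i++) if (n[i] > f * median) print name[i], n[i]
    }'
}
```

Usage would be `count_sessions dspace.log.2021-06-[12]* | flag_spikes`. Fed the 18 counts listed above, only `dspace.log.2021-06-23 46371` is flagged. The median of those days is 17300.5, and the next-highest day (2021-06-19, 25967) stays below the 34601 threshold.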
- I see over 15,000 unique IPs in the XMLUI logs alone on 2021-06-23:

```console
# zcat /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.4.gz | grep '23/Jun/2021' | awk '{print $1}' | sort | uniq | wc -l
15835
```
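The count above can be wrapped in a slightly more reusable form, sketched here as a hypothetical `unique_ips` function. It assumes nginx's default combined log format: the client IP is the first field and the timestamp looks like `[23/Jun/2021:...`. It matches the date only at the start of the bracketed timestamp rather than anywhere in the line. It also uses `gzip -cdf`, so rotated `.gz` files and the current uncompressed log can be passed in one call.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count unique client IPs for one day across nginx logs.
# Usage: unique_ips DD/Mon/YYYY FILE...
# Assumes the combined log format: "IP - - [23/Jun/2021:10:00:00 +0200] ..."
unique_ips() {
  local day=$1
  shift
  # gzip -cdf decompresses .gz files and copies plain files through unchanged
  gzip -cdf -- "$@" |
    grep -F "[$day:" |   # only lines whose timestamp falls on that day
    awk '{ print $1 }' | # first field is the client IP
    sort -u |
    wc -l
}
```

For example, `unique_ips 23/Jun/2021 /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.4.gz` should give the same count as the command above. The exception is any line where `23/Jun/2021` appears only in the request or referrer.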
- Annoyingly, I found 37,000 more hits from Bing using `dns:*msnbot* AND dns:*.msn.com.` as a Solr filter
- WTF, they are using a normal user agent: `Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko`
- I will purge the IPs and add this user agent to the nginx config so that we can rate limit it
- I signed up for Bing Webmaster Tools and verified cgspace.cgiar.org with the BingSiteAuth.xml file
- I also adjusted the nginx config to explicitly allow access to `robots.txt` even when bots are rate limited
- I also found that Bing was auto-discovering all our RSS and Atom feeds as "sitemaps", so I deleted 750 of them and submitted the real sitemap
- I need to see if I can adjust the nginx config further to map clients with DNS names like msnbot to the `bot` user agent...
- Review Abdullah's filter on click pull request
- I rebased his code on the latest master branch and tested adding filter on click to the map and list components, and it works fine
- There seems to be a bug that breaks scrolling on the page, though...
- Abdullah fixed the bug in the filter on click branch
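These Bing hits only stand out by their DNS names. The sketch below checks an IP the way the Solr filter above does, then adds a forward lookup to confirm that the name resolves back to the same IP. This forward-confirmed reverse DNS check is the usual way to verify a crawler's identity. Both function names are hypothetical, and `verify_msnbot_ip` assumes `dig` (from bind-utils or dnsutils) is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a PTR name matches the same pattern as the
# Solr filter dns:*msnbot* AND dns:*.msn.com. (trailing dot optional here)
looks_like_msnbot() {
  case $1 in
    *msnbot*.msn.com. | *msnbot*.msn.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: forward-confirmed reverse DNS check for one IPv4 address.
# Requires dig; fails if the PTR name does not look like msnbot or does not
# resolve back to the same address.
verify_msnbot_ip() {
  local ip=$1 name
  # dig +short -x prints the PTR name(s), each with a trailing dot
  name=$(dig +short -x "$ip" | head -n 1)
  looks_like_msnbot "$name" || return 1
  # The A records for that name must include the original IP
  dig +short "$name" | grep -qxF -- "$ip"
}
```

A check like this happens outside nginx, for example when deciding which IPs to purge from Solr. nginx's `map` only sees the request, not the client's reverse DNS.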
<!-- vim: set sw=2 ts=2: -->