mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-25 08:00:18 +01:00
Add notes for 2018-04-10
parent f45ab64261
commit 6f3b199d9f
@ -540,9 +540,9 @@ $ grep 104.196.152.243 dspace.log.2017-11-07 | grep -o -E 'session_id=[A-Z0-9]{3
- What's amazing is that it seems to reuse its Java session across all requests:

```
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' /home/cgspace.cgiar.org/log/dspace.log.2017-11-12
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2017-11-12
1558
$ grep 5.9.6.51 /home/cgspace.cgiar.org/log/dspace.log.2017-11-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
$ grep 5.9.6.51 dspace.log.2017-11-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
1
```
@ -552,7 +552,7 @@ $ grep 5.9.6.51 /home/cgspace.cgiar.org/log/dspace.log.2017-11-12 | grep -o -E '
```
# grep 95.108.181.88 /var/log/nginx/access.log | tail -n 1
95.108.181.88 - - [12/Nov/2017:08:33:17 +0000] "GET /bitstream/handle/10568/57004/GenebankColombia_23Feb2015.pdf HTTP/1.1" 200 972019 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' /home/cgspace.cgiar.org/log/dspace.log.2017-11-12
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2017-11-12
991
```
@ -78,3 +78,145 @@ $ git rebase -i dspace-5.8
- DS-3583 Usage of correct Collection Array (#1731) (upstream commit on dspace-5_x: c8f62e6f496fa86846bfa6bcf2d16811087d9761)
- ... but somehow git knew, and didn't include them in my interactive rebase!
- I need to send this branch to Atmire and also arrange payment (see [ticket #560](https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560) in their tracker)
- Fix Sisay's SSH access to the new DSpace Test server (linode19)

## 2018-04-05

- Fix Sisay's sudo access on the new DSpace Test server (linode19)
- The reindexing process on DSpace Test took _forever_ yesterday:

```
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b

real 599m32.961s
user 9m3.947s
sys 2m52.585s
```
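To put the timing above in perspective, here is a rough sketch of how much of the wall-clock time was actually spent on the CPU. The helper names `to_seconds` and `cpu_share` are hypothetical, and the parsing assumes the `NmS.SSSs` format that `time` printed here:

```shell
# Convert a value such as 599m32.961s (as printed by `time`) to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the percentage of wall-clock time spent on the CPU (user + sys).
# Usage: cpu_share REAL USER SYS
cpu_share() {
    real=$(to_seconds "$1")
    user=$(to_seconds "$2")
    sys=$(to_seconds "$3")
    awk -v r="$real" -v u="$user" -v s="$sys" \
        'BEGIN { printf "%.1f\n", (u + s) / r * 100 }'
}

cpu_share 599m32.961s 9m3.947s 2m52.585s
```

This prints `2.0`, so only about 2% of the ten hours was CPU time. That is consistent with the process mostly waiting on storage, though on its own it does not prove the block storage was the cause.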

- So we really should not use this Linode block storage for Solr
- Assetstore might be fine but would complicate things with configuration and deployment (ughhh)
- Better to use Linode block storage only for backup
- Help Peter with the GDPR compliance / reporting form for CGSpace
- DSpace Test crashed due to memory issues again:

```
# grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
16
```

- I ran all system updates on DSpace Test and rebooted it
- Proof some records on DSpace Test for Udana from IWMI
- He has done better with the small syntax and consistency issues but then there are larger concerns with not linking to DOIs, copying titles incorrectly, etc

## 2018-04-10

- I got a notice that CGSpace CPU usage was very high this morning
- Looking at the nginx logs, here are the top users today so far:

```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Apr/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
282 207.46.13.112
286 54.175.208.220
287 207.46.13.113
298 66.249.66.153
322 207.46.13.114
780 104.196.152.243
3994 178.154.200.38
4295 70.32.83.92
4388 95.108.181.88
7653 45.5.186.2
```

- 45.5.186.2 is of course CIAT
- 95.108.181.88 appears to be Yandex:

```
95.108.181.88 - - [09/Apr/2018:06:34:16 +0000] "GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1" 200 2638 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
```

- And for some reason Yandex created a lot of Tomcat sessions today:

```
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-04-10
4363
```

- 70.32.83.92 appears to be some harvester we've seen before, but on a new IP
- They are not creating new Tomcat sessions so there is no problem there
- 178.154.200.38 also appears to be Yandex, and is also creating many Tomcat sessions:

```
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=178.154.200.38' dspace.log.2018-04-10
3982
```
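Note that `grep -c` counts matching log lines, not distinct sessions, so a client reusing one session for thousands of requests still produces a large number. A minimal sketch for counting distinct session IDs per IP follows; the function name `distinct_sessions` is hypothetical, and it assumes the `session_id=...:ip_addr=...` log format matched by the commands above:

```shell
# Count distinct session IDs logged for one IP address in a DSpace log.
# Usage: distinct_sessions IP LOGFILE
distinct_sessions() {
    # Escape the dots so they match literally rather than any character.
    ip_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    # Require a non-digit (or end of line) after the IP so that
    # 1.2.3.4 does not also match 1.2.3.45.
    grep -o -E "session_id=[A-Z0-9]{32}:ip_addr=${ip_re}([^0-9]|\$)" "$2" \
        | cut -d: -f1 | sort -u | wc -l
}
```

For example, `distinct_sessions 95.108.181.88 dspace.log.2018-04-10` would show whether the 4363 matching lines come from many sessions or from a handful of shared ones.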

- I'm not sure why Yandex creates so many Tomcat sessions, as its user agent should match the Crawler Session Manager valve
- Let's try a manual request with and without their user agent:

```
$ http --print Hh https://cgspace.cgiar.org/bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg 'User-Agent:Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)'
GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: cgspace.cgiar.org
User-Agent: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2638
Content-Type: image/jpeg;charset=ISO-8859-1
Date: Tue, 10 Apr 2018 05:18:37 GMT
Expires: Tue, 10 Apr 2018 06:18:37 GMT
Last-Modified: Tue, 25 Apr 2017 07:05:54 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block

$ http --print Hh https://cgspace.cgiar.org/bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg
GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: cgspace.cgiar.org
User-Agent: HTTPie/0.9.9

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2638
Content-Type: image/jpeg;charset=ISO-8859-1
Date: Tue, 10 Apr 2018 05:20:08 GMT
Expires: Tue, 10 Apr 2018 06:20:08 GMT
Last-Modified: Tue, 25 Apr 2017 07:05:54 GMT
Server: nginx
Set-Cookie: JSESSIONID=31635DB42B66D6A4208CFCC96DD96875; Path=/; Secure; HttpOnly
Strict-Transport-Security: max-age=15768000
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
```
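The only difference between the two responses above is the `Set-Cookie: JSESSIONID=...` header, which appears only for the non-crawler user agent. A small sketch for making that check repeatable follows; the function name `issues_session` is hypothetical, and it simply inspects response headers passed on standard input:

```shell
# Read HTTP response headers on stdin and report whether the server
# issued a new JSESSIONID cookie. Header names are matched
# case-insensitively.
issues_session() {
    if grep -q -i -E '^Set-Cookie:[[:space:]]*JSESSIONID='; then
        echo "new session"
    else
        echo "no new session"
    fi
}
```

Assuming `curl` is available, the headers could be piped in with something like `curl -s -o /dev/null -D - -A 'Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)' URL | issues_session`, where `URL` is the bitstream address used above.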

- So it definitely looks like Yandex requests are getting assigned a session from the Crawler Session Manager valve
- And if I look at the DSpace log I see its IP sharing a session with other crawlers like Google (66.249.66.153)
- Indeed the number of Tomcat sessions appears to be normal:

![Tomcat sessions week](/cgspace-notes/2018/04/jmx_dspace_sessions-week.png)

- Looks like the number of total requests processed by nginx in March went down from the previous months:

```
# time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Mar/2018"
2266594

real 0m13.658s
user 0m16.533s
sys 0m1.087s
```
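To compare March against other months in one pass, the count above can be generalized. This is a sketch only: the function name `requests_per_month` is hypothetical, it assumes the default nginx timestamp format (`[10/Apr/2018:06:34:16 +0000]`), and it uses `gzip -cdf`, which on GNU gzip behaves like the `zcat --force` used above:

```shell
# Count nginx access log lines per month for a given year.
# Reads both plain and gzip-compressed log files.
# Usage: requests_per_month YEAR FILE...
requests_per_month() {
    year=$1
    shift
    gzip -cdf "$@" \
        | grep -o -E "\[[0-9]{1,2}/[A-Z][a-z]{2}/${year}:" \
        | cut -d/ -f2 | sort | uniq -c
}
```

Running `requests_per_month 2018 /var/log/nginx/*` would print one count per month name, sorted alphabetically by month. Only months still present in the rotated logs are counted.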
@ -53,7 +53,7 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -55,7 +55,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -45,7 +45,7 @@ Update GitHub wiki for documentation of maintenance tasks.
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -59,7 +59,7 @@ Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE&r
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -45,7 +45,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -49,7 +49,7 @@ Also, I noticed the checker log has some errors we should pay attention to:
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -53,7 +53,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -51,7 +51,7 @@ Working on second phase of metadata migration, looks like this will work for mov
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -67,7 +67,7 @@ In this case the select query was showing 95 results before the update
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -61,7 +61,7 @@ $ git rebase -i dspace-5.5
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -53,7 +53,7 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=or
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -61,7 +61,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -45,7 +45,7 @@ Add dc.type to the output options for Atmire’s Listings and Reports module
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -69,7 +69,7 @@ Another worrying error from dspace.log is:
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -45,7 +45,7 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -73,7 +73,7 @@ Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -77,7 +77,7 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -63,7 +63,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -29,7 +29,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="May, 2017"/>
<meta name="twitter:description" content="2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it’s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire’s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -29,7 +29,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="June, 2017"/>
<meta name="twitter:description" content="2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we’ll create a new sub-community for Phase II and create collections for the research themes there The current “Research Themes” community will be renamed to “WLE Phase I Research Themes” Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -57,7 +57,7 @@ We can use PostgreSQL’s extended output format (-x) plus sed to format the
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -77,7 +77,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -53,7 +53,7 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -57,7 +57,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -77,7 +77,7 @@ COPY 54701
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -754,9 +754,9 @@ $ grep 104.196.152.243 dspace.log.2017-11-07 | grep -o -E 'session_id=[A-Z0-9]{3
<li>What’s amazing is that it seems to reuse its Java session across all requests:</li>
</ul>

<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' /home/cgspace.cgiar.org/log/dspace.log.2017-11-12
<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2017-11-12
1558
$ grep 5.9.6.51 /home/cgspace.cgiar.org/log/dspace.log.2017-11-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
$ grep 5.9.6.51 dspace.log.2017-11-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
1
</code></pre>
@ -767,7 +767,7 @@ $ grep 5.9.6.51 /home/cgspace.cgiar.org/log/dspace.log.2017-11-12 | grep -o -E '
<pre><code># grep 95.108.181.88 /var/log/nginx/access.log | tail -n 1
95.108.181.88 - - [12/Nov/2017:08:33:17 +0000] "GET /bitstream/handle/10568/57004/GenebankColombia_23Feb2015.pdf HTTP/1.1" 200 972019 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' /home/cgspace.cgiar.org/log/dspace.log.2017-11-12
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2017-11-12
991
</code></pre>
@ -47,7 +47,7 @@ The list of connections to XMLUI and REST API for today:
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -185,7 +185,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -47,7 +47,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-pl
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -41,7 +41,7 @@ Export a CSV of the IITA community metadata for Martin Mueller
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -21,7 +21,7 @@ Catalina logs at least show some memory errors yesterday:
<meta property="article:published_time" content="2018-04-01T16:13:54+02:00"/>

<meta property="article:modified_time" content="2018-04-04T15:57:34+03:00"/>
<meta property="article:modified_time" content="2018-04-04T17:01:08+03:00"/>
@ -43,7 +43,7 @@ Catalina logs at least show some memory errors yesterday:
"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -53,9 +53,9 @@ Catalina logs at least show some memory errors yesterday:
"@type": "BlogPosting",
"headline": "April, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-04/",
"wordCount": "423",
"wordCount": "1005",
"datePublished": "2018-04-01T16:13:54+02:00",
"dateModified": "2018-04-04T15:57:34+03:00",
"dateModified": "2018-04-04T17:01:08+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -206,8 +206,162 @@ $ git rebase -i dspace-5.8
</ul></li>
<li>… but somehow git knew, and didn’t include them in my interactive rebase!</li>
<li>I need to send this branch to Atmire and also arrange payment (see <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">ticket #560</a> in their tracker)</li>
<li>Fix Sisay’s SSH access to the new DSpace Test server (linode19)</li>
</ul>

<h2 id="2018-04-05">2018-04-05</h2>

<ul>
<li>Fix Sisay’s sudo access on the new DSpace Test server (linode19)</li>
<li>The reindexing process on DSpace Test took <em>forever</em> yesterday:</li>
</ul>

<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b

real 599m32.961s
user 9m3.947s
sys 2m52.585s
</code></pre>

<ul>
<li>So we really should not use this Linode block storage for Solr</li>
<li>Assetstore might be fine but would complicate things with configuration and deployment (ughhh)</li>
<li>Better to use Linode block storage only for backup</li>
<li>Help Peter with the GDPR compliance / reporting form for CGSpace</li>
<li>DSpace Test crashed due to memory issues again:</li>
</ul>

<pre><code># grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
16
</code></pre>

<ul>
<li>I ran all system updates on DSpace Test and rebooted it</li>
<li>Proof some records on DSpace Test for Udana from IWMI</li>
<li>He has done better with the small syntax and consistency issues but then there are larger concerns with not linking to DOIs, copying titles incorrectly, etc</li>
</ul>

<h2 id="2018-04-10">2018-04-10</h2>

<ul>
<li>I got a notice that CGSpace CPU usage was very high this morning</li>
<li>Looking at the nginx logs, here are the top users today so far:</li>
</ul>

<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Apr/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
282 207.46.13.112
286 54.175.208.220
287 207.46.13.113
298 66.249.66.153
322 207.46.13.114
780 104.196.152.243
3994 178.154.200.38
4295 70.32.83.92
4388 95.108.181.88
7653 45.5.186.2
</code></pre>

<ul>
<li>45.5.186.2 is of course CIAT</li>
<li>95.108.181.88 appears to be Yandex:</li>
</ul>

<pre><code>95.108.181.88 - - [09/Apr/2018:06:34:16 +0000] "GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1" 200 2638 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
</code></pre>

<ul>
<li>And for some reason Yandex created a lot of Tomcat sessions today:</li>
</ul>

<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-04-10
4363
</code></pre>

<ul>
<li>70.32.83.92 appears to be some harvester we’ve seen before, but on a new IP</li>
<li>They are not creating new Tomcat sessions so there is no problem there</li>
<li>178.154.200.38 also appears to be Yandex, and is also creating many Tomcat sessions:</li>
</ul>

<pre><code>$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=178.154.200.38' dspace.log.2018-04-10
3982
</code></pre>

<ul>
<li>I’m not sure why Yandex creates so many Tomcat sessions, as its user agent should match the Crawler Session Manager valve</li>
<li>Let’s try a manual request with and without their user agent:</li>
</ul>

<pre><code>$ http --print Hh https://cgspace.cgiar.org/bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg 'User-Agent:Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)'
GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: cgspace.cgiar.org
User-Agent: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2638
Content-Type: image/jpeg;charset=ISO-8859-1
Date: Tue, 10 Apr 2018 05:18:37 GMT
Expires: Tue, 10 Apr 2018 06:18:37 GMT
Last-Modified: Tue, 25 Apr 2017 07:05:54 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block

$ http --print Hh https://cgspace.cgiar.org/bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg
GET /bitstream/handle/10568/21794/ILRI_logo_usage.jpg.jpg HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: cgspace.cgiar.org
User-Agent: HTTPie/0.9.9

HTTP/1.1 200 OK
Connection: keep-alive
Content-Language: en-US
Content-Length: 2638
Content-Type: image/jpeg;charset=ISO-8859-1
Date: Tue, 10 Apr 2018 05:20:08 GMT
Expires: Tue, 10 Apr 2018 06:20:08 GMT
Last-Modified: Tue, 25 Apr 2017 07:05:54 GMT
Server: nginx
Set-Cookie: JSESSIONID=31635DB42B66D6A4208CFCC96DD96875; Path=/; Secure; HttpOnly
Strict-Transport-Security: max-age=15768000
Vary: User-Agent
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
</code></pre>

<ul>
<li>So it definitely looks like Yandex requests are getting assigned a session from the Crawler Session Manager valve</li>
<li>And if I look at the DSpace log I see its IP sharing a session with other crawlers like Google (66.249.66.153)</li>
<li>Indeed the number of Tomcat sessions appears to be normal:</li>
</ul>

<p><img src="/cgspace-notes/2018/04/jmx_dspace_sessions-week.png" alt="Tomcat sessions week" /></p>

<ul>
<li>Looks like the number of total requests processed by nginx in March went down from the previous months:</li>
</ul>

<pre><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Mar/2018"
2266594

real 0m13.658s
user 0m16.533s
sys 0m1.087s
</code></pre>
BIN
docs/2018/04/jmx_dspace_sessions-week.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 12 KiB |
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="404 Page not found"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -29,7 +29,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGIAR Library Migration"/>
<meta name="twitter:description" content="Notes on the migration of the CGIAR Library to CGSpace"/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.38" />
<meta name="generator" content="Hugo 0.38.2" />
@ -26,7 +26,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.38" />
|
||||
<meta name="generator" content="Hugo 0.38.2" />
|
||||
|
||||
|
||||
|
||||
|
@ -26,7 +26,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.38" />
|
||||
<meta name="generator" content="Hugo 0.38.2" />
|
||||
|
||||
|
||||
|
||||
|
@ -4,7 +4,7 @@
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/2018-04/</loc>
|
||||
<lastmod>2018-04-04T15:57:34+03:00</lastmod>
|
||||
<lastmod>2018-04-04T17:01:08+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
@ -159,7 +159,7 @@
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
||||
<lastmod>2018-04-04T15:57:34+03:00</lastmod>
|
||||
<lastmod>2018-04-04T17:01:08+03:00</lastmod>
|
||||
<priority>0</priority>
|
||||
</url>
|
||||
|
||||
@ -170,7 +170,7 @@
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
|
||||
<lastmod>2018-04-04T15:57:34+03:00</lastmod>
|
||||
<lastmod>2018-04-04T17:01:08+03:00</lastmod>
|
||||
<priority>0</priority>
|
||||
</url>
|
||||
|
||||
@ -182,13 +182,13 @@
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
||||
<lastmod>2018-04-04T15:57:34+03:00</lastmod>
|
||||
<lastmod>2018-04-04T17:01:08+03:00</lastmod>
|
||||
<priority>0</priority>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
|
||||
<lastmod>2018-04-04T15:57:34+03:00</lastmod>
|
||||
<lastmod>2018-04-04T17:01:08+03:00</lastmod>
|
||||
<priority>0</priority>
|
||||
</url>
|
||||
|
||||
|
@ -26,7 +26,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Tags"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.38" />
|
||||
<meta name="generator" content="Hugo 0.38.2" />
|
||||
|
||||
|
||||
|
||||
|
@ -26,7 +26,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.38" />
|
||||
<meta name="generator" content="Hugo 0.38.2" />
|
||||
|
||||
|
||||
|
||||
|
@ -26,7 +26,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.38" />
|
||||
<meta name="generator" content="Hugo 0.38.2" />
|
||||
|
||||
|
||||
|
||||
|
@ -26,7 +26,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.38" />
|
||||
<meta name="generator" content="Hugo 0.38.2" />
|
||||
|
||||
|
||||
|
||||
|
@ -26,7 +26,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.38" />
|
||||
<meta name="generator" content="Hugo 0.38.2" />
|
||||
|
||||
|
||||
|
||||
|
@ -26,7 +26,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Tags"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.38" />
|
||||
<meta name="generator" content="Hugo 0.38.2" />
|
||||
|
||||
|
||||
|
||||
|
@ -26,7 +26,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Tags"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.38" />
|
||||
<meta name="generator" content="Hugo 0.38.2" />
|
||||
|
||||
|
||||
|
||||
|
@ -26,7 +26,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Tags"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.38" />
|
||||
<meta name="generator" content="Hugo 0.38.2" />
|
||||
|
||||
|
||||
|
||||
|
BIN
static/2018/04/jmx_dspace_sessions-week.png
Normal file
Binary file not shown.
Size: 12 KiB