---
title: "May, 2019"
date: 2019-05-01T07:37:43+03:00
author: "Alan Orth"
tags: ["Notes"]
---
## 2019-05-01
- Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace
- A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
- Apparently if the item is in the `workflowitem` table it is submitted to a workflow
- And if it is in the `workspaceitem` table it is in the pre-submitted state
- The item seems to be in a pre-submitted state, so I tried to delete it from there:
```
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
```
- But after this I tried to delete the item from the XMLUI and it is *still* present...
<!--more-->
- I managed to delete the problematic item from the database
- First I deleted the item's bitstream in XMLUI and then ran `dspace cleanup -v` to remove it from the assetstore
- Then I ran the following SQL:
```
dspace=# DELETE FROM metadatavalue WHERE resource_id=74648;
dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
dspace=# DELETE FROM item WHERE item_id=74648;
```
- Now the item is (hopefully) really gone and I can continue to troubleshoot the issue with the REST API's `/items/find-by-metadata-field` endpoint
- Of course I run into another HTTP 401 error when I continue trying the LandPortal search from last month:
```
$ curl -f -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": "en_US"}'
curl: (22) The requested URL returned error: 401 Unauthorized
```
- The DSpace log shows the item ID (because I modified the error text):
```
2019-05-01 11:41:11,069 ERROR org.dspace.rest.ItemsResource @ User(anonymous) has not permission to read item(id=77708)!
```
- If I delete that one I get another, making the list of item IDs so far:
- 74648
- 77708
- 85079
- Some are in the `workspaceitem` table (pre-submission), others are in the `workflowitem` table (submitted), and others are actually approved, but withdrawn...
- This is actually a worthless exercise, though, because the real issue is that the `/items/find-by-metadata-field` endpoint is simply flawed by design: it shouldn't fail fatally just because the search matches items the user doesn't have permission to read
- It would take way too much time to clean up the broken items that are stuck in limbo by deleting them in SQL, and it wouldn't actually fix the problem anyway, because some items are *submitted* but *withdrawn*, so they have handles and everything
- I think the solution is to recommend that people don't use the `/items/find-by-metadata-field` endpoint
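- For future reference, a quick way to check which state an item is in before deciding what to do with it (a sketch against the DSpace 5 schema, using one of the item IDs above):
```
dspace=# SELECT item_id, in_archive, withdrawn FROM item WHERE item_id=77708;
dspace=# SELECT item_id FROM workspaceitem WHERE item_id=77708;
dspace=# SELECT item_id FROM workflowitem WHERE item_id=77708;
```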
- CIP is asking about embedding PDF thumbnail images in their RSS feeds again
- They asked in 2018-09 as well and I told them it wasn't possible
- To make sure, I looked at [the documentation for RSS media feeds](https://wiki.duraspace.org/display/DSPACE/Enable+Media+RSS+Feeds) and tried it, but couldn't get it to work
- It seems to be geared towards iTunes and Podcasts... I dunno
- CIP also asked for a way to get an XML file of all their RTB journal articles on CGSpace
- I told them to use the REST API, like this (where `1179` is the ID of the RTB journal articles collection):
```
https://cgspace.cgiar.org/rest/collections/1179/items?limit=812&expand=metadata
```
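- Since they want an XML file specifically, they could request it explicitly with curl (a sketch; depending on the `Accept` header the REST API returns either JSON or XML, and the output filename is arbitrary):
```
$ curl -s -H "Accept: application/xml" "https://cgspace.cgiar.org/rest/collections/1179/items?limit=812&expand=metadata" > rtb-journal-articles.xml
```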
## 2019-05-03
- A user from CIAT emailed to say that CGSpace submission emails have not been working the last few weeks
- I checked the `dspace test-email` script on CGSpace and they are indeed failing:
```
$ dspace test-email
About to send test email:
- To: woohoo@cgiar.org
- Subject: DSpace test email
- Server: smtp.office365.com
Error sending email:
- Error: javax.mail.AuthenticationFailedException
Please see the DSpace documentation for assistance.
```
- I will ask ILRI ICT to reset the password
- They reset the password and I tested it on CGSpace
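- For reference, the SMTP settings DSpace uses for these emails live in `dspace.cfg` (a sketch with placeholder account and password):
```
# placeholder values; the real account and password are managed by ILRI ICT
mail.server = smtp.office365.com
mail.server.port = 587
mail.server.username = cgspace-account@cgiar.org
mail.server.password = changeme
mail.extraproperties = mail.smtp.starttls.enable=true
```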
## 2019-05-05
- Run all system updates on DSpace Test (linode19) and reboot it
- Merge changes into the `5_x-prod` branch of CGSpace:
- Updates to remove deprecated social media websites (Google+ and Delicious), update Twitter share intent, and add item title to Twitter and email links ([#421](https://github.com/ilri/DSpace/pull/421))
- Add new CCAFS Phase II project tags ([#420](https://github.com/ilri/DSpace/pull/420))
- Add item ID to REST API error logging ([#422](https://github.com/ilri/DSpace/pull/422))
- Re-deploy CGSpace from `5_x-prod` branch
- Run all system updates on CGSpace (linode18) and reboot it
- Tag version 1.1.0 of the [dspace-statistics-api](https://github.com/ilri/dspace-statistics-api) (with Falcon 2.0.0)
- Deploy on DSpace Test
## 2019-05-06
- Peter pointed out that Solr stats are only showing 2019 stats
- I looked at the Solr Admin UI and I see:
```
statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error opening new searcher
```
- As well as this error in the logs:
```
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock
```
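- A quick way to check which cores actually loaded, without the Admin UI (a sketch, assuming Solr runs in the same Tomcat on port 8080):
```
$ curl -s "http://localhost:8080/solr/admin/cores?action=STATUS&wt=json" | python -m json.tool | grep '"name"'
```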
- Strangely enough, I *do* see the statistics-2018, statistics-2017, etc. cores in the Admin UI...
- I restarted Tomcat a few times (and even deleted all the Solr write locks), and at least five times one of the statistics cores failed to load, causing the Atmire stats to be incomplete
- Also, I tried to increase the `writeLockTimeout` in `solrconfig.xml` from the default of 1000ms to 10000ms
- Eventually the Atmire stats started working, despite errors about "Error opening new searcher" in the Solr Admin UI
- I wrote to the dspace-tech mailing list again on the thread from March, 2019
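- For reference, the `writeLockTimeout` I adjusted lives in the `<indexConfig>` block of each core's `solrconfig.xml` (a sketch):
```
<indexConfig>
  <writeLockTimeout>10000</writeLockTimeout>
</indexConfig>
```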
- There were a few alerts from UptimeRobot about CGSpace going up and down this morning, along with an alert from Linode about 596% load
- Looking at the Munin stats I see an exponential rise in DSpace XMLUI sessions, firewall activity, and PostgreSQL connections this morning:
![CGSpace XMLUI sessions day](/cgspace-notes/2019/05/2019-05-06-jmx_dspace_sessions-day.png)
![linode18 firewall connections day](/cgspace-notes/2019/05/2019-05-06-fw_conntrack-day.png)
![linode18 postgres connections day](/cgspace-notes/2019/05/2019-05-06-postgres_connections_db-day.png)
![linode18 CPU day](/cgspace-notes/2019/05/2019-05-06-cpu-day.png)
- The number of unique sessions today is *ridiculously* high compared to the last few days considering it's only 12:30PM right now:
```
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-06 | sort | uniq | wc -l
101108
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-05 | sort | uniq | wc -l
14618
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-04 | sort | uniq | wc -l
14946
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-03 | sort | uniq | wc -l
6410
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-02 | sort | uniq | wc -l
7758
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-01 | sort | uniq | wc -l
20528
```
- The number of unique IP addresses from 2 to 6 AM this morning is already several times higher than the average for that time of the morning this past week:
```
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
7127
# zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E '05/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
1231
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E '04/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
1255
# zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz | grep -E '03/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
1736
# zcat --force /var/log/nginx/access.log.4.gz /var/log/nginx/access.log.5.gz | grep -E '02/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
1573
# zcat --force /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.6.gz | grep -E '01/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
1410
```
- Just this morning between the hours of 2 and 6 the number of unique sessions was *very* high compared to previous mornings:
```
$ cat dspace.log.2019-05-06 | grep -E '2019-05-06 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
83650
$ cat dspace.log.2019-05-05 | grep -E '2019-05-05 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
2547
$ cat dspace.log.2019-05-04 | grep -E '2019-05-04 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
2574
$ cat dspace.log.2019-05-03 | grep -E '2019-05-03 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
2911
$ cat dspace.log.2019-05-02 | grep -E '2019-05-02 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
2704
$ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
3699
```
- Most of the requests were GETs:
```
# cat /var/log/nginx/{access,library-access}.log /var/log/nginx/{access,library-access}.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E "(GET|HEAD|POST|PUT)" | sort | uniq -c | sort -n
1 PUT
98 POST
2845 HEAD
98121 GET
```
- I'm not exactly sure what happened this morning, but it looks like some legitimate user traffic—perhaps someone launched a new publication and it got a bunch of hits?
- Looking again, I see 84,000 requests to `/handle` this morning (not including the logs for library.cgiar.org, because those requests get an HTTP 301 redirect to CGSpace and then appear in `access.log` anyway):
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -c -o -E " /handle/[0-9]+/[0-9]+"
84350
```
- But it would be difficult to find a pattern in those requests because they cover 78,000 *unique* Handles (i.e. direct browsing of items, collections, or communities) and only 2,492 are discover/browse (total, not unique):
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E " /handle/[0-9]+/[0-9]+ HTTP" | sort | uniq | wc -l
78104
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E " /handle/[0-9]+/[0-9]+/(discover|browse)" | wc -l
2492
```
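- Sorting the same grep by frequency would show whether any single Handle dominated the traffic:
```
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E " /handle/[0-9]+/[0-9]+ HTTP" | sort | uniq -c | sort -rn | head
```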
- In other news, I see some IP is making several requests per second to the exact same REST API endpoints, for example:
```
# grep /rest/handle/10568/3703?expand=all rest.log | awk '{print $1}' | sort | uniq -c
3 2a01:7e00::f03c:91ff:fe0a:d645
113 63.32.242.35
```
- According to [viewdns.info](https://viewdns.info/reverseip/?host=63.32.242.35&t=1) that server belongs to Macaroni Brothers
- The user agent of their non-REST API requests from the same IP is Drupal
- This is one very good reason to limit REST API requests, and perhaps to enable caching via nginx
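- A minimal sketch of what that could look like in the nginx configuration (the zone names and cache path are hypothetical; nothing like this is deployed yet):
```
# in the http block: define a request rate limit zone and a cache for REST responses
limit_req_zone $binary_remote_addr zone=rest_limit:10m rate=5r/s;
proxy_cache_path /var/cache/nginx/rest levels=1:2 keys_zone=rest_cache:10m max_size=1g inactive=60m;

# in the server block: apply them to the REST API location
location /rest/ {
    limit_req zone=rest_limit burst=10 nodelay;
    proxy_cache rest_cache;
    proxy_cache_valid 200 10m;
    proxy_pass http://localhost:8080;
}
```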
<!-- vim: set sw=2 ts=2: -->