Add notes for 2019-03-23

This commit is contained in:
2019-03-23 12:50:02 +02:00
parent bfeea010c4
commit 4748025d7a
3 changed files with 293 additions and 98 deletions

View File

@ -202,52 +202,35 @@ java.sql.SQLException: Connection org.postgresql.jdbc.PgConnection@55ba10b5 is c
- Interestingly, I see a pattern of these errors increasing, with single and double digit numbers over the past month, ~~but spikes of over 1,000 today~~, yesterday, and on 2019-03-08, which was exactly the first time we saw this blank page error recently
```
# grep -c 'SQL QueryTable Error' dspace.log.2019-0*
dspace.log.2019-02-01:6
dspace.log.2019-02-02:1
dspace.log.2019-02-03:0
dspace.log.2019-02-04:4
dspace.log.2019-02-05:4
dspace.log.2019-02-06:3
dspace.log.2019-02-07:0
dspace.log.2019-02-08:2
dspace.log.2019-02-09:0
dspace.log.2019-02-10:0
dspace.log.2019-02-11:1
dspace.log.2019-02-12:2
dspace.log.2019-02-13:1
dspace.log.2019-02-14:74
dspace.log.2019-02-15:47
dspace.log.2019-02-16:44
dspace.log.2019-02-17:37
dspace.log.2019-02-18:191
dspace.log.2019-02-19:51
dspace.log.2019-02-20:32
dspace.log.2019-02-21:64
dspace.log.2019-02-22:48
dspace.log.2019-02-23:32
dspace.log.2019-02-24:25
dspace.log.2019-02-25:53
dspace.log.2019-02-26:63
dspace.log.2019-02-27:28
dspace.log.2019-02-28:31
dspace.log.2019-03-01:29
dspace.log.2019-03-02:24
dspace.log.2019-03-03:41
dspace.log.2019-03-04:25
dspace.log.2019-03-05:48
dspace.log.2019-03-06:48
dspace.log.2019-03-07:42
dspace.log.2019-03-08:1781
dspace.log.2019-03-09:22
dspace.log.2019-03-10:23
dspace.log.2019-03-11:52
dspace.log.2019-03-12:474
dspace.log.2019-03-13:37
dspace.log.2019-03-14:924
dspace.log.2019-03-15:770
$ grep -I 'SQL QueryTable Error' dspace.log.2019-0* | awk -F: '{print $1}' | sort | uniq -c | tail -n 25
5 dspace.log.2019-02-27
11 dspace.log.2019-02-28
29 dspace.log.2019-03-01
24 dspace.log.2019-03-02
41 dspace.log.2019-03-03
11 dspace.log.2019-03-04
9 dspace.log.2019-03-05
15 dspace.log.2019-03-06
7 dspace.log.2019-03-07
9 dspace.log.2019-03-08
22 dspace.log.2019-03-09
23 dspace.log.2019-03-10
18 dspace.log.2019-03-11
13 dspace.log.2019-03-12
10 dspace.log.2019-03-13
25 dspace.log.2019-03-14
12 dspace.log.2019-03-15
67 dspace.log.2019-03-16
72 dspace.log.2019-03-17
8 dspace.log.2019-03-18
15 dspace.log.2019-03-19
21 dspace.log.2019-03-20
29 dspace.log.2019-03-21
41 dspace.log.2019-03-22
4807 dspace.log.2019-03-23
```
- (Update on 2019-03-23 to use correct grep query)
- There are not too many connections currently in PostgreSQL:
```
@ -375,7 +358,7 @@ dspace.log.2019-03-17:72
dspace.log.2019-03-18:1038
```
- Though WTF, this grep seems to be giving weird inaccurate results actually, and the real number of errors is much lower:
- Though WTF, this grep seems to be giving weird inaccurate results actually, and the real number of errors is much lower if I exclude the "binary file matches" result with `-I`:
```
$ grep -I 'SQL QueryTable Error' dspace.log.2019-03-18 | wc -l
@ -601,4 +584,114 @@ $ grep 'Can not load requested doc' cocoon.log.2019-03-21 | grep -oE '2019-03-21
- Share the initial list of invalid AGROVOC terms on Yammer to ask the editors for help in correcting them
- Advise Phanuel Ayuka from IITA about using controlled vocabularies in DSpace
## 2019-03-23
- CGSpace (linode18) is having the blank page issue again and it seems to have started last night around 21:00:
```
$ grep 'Can not load requested doc' cocoon.log.2019-03-22 | grep -oE '2019-03-22 [0-9]{2}:' | sort | uniq -c
2 2019-03-22 00:
69 2019-03-22 01:
1 2019-03-22 02:
13 2019-03-22 03:
2 2019-03-22 05:
2 2019-03-22 06:
8 2019-03-22 07:
4 2019-03-22 08:
12 2019-03-22 09:
7 2019-03-22 10:
1 2019-03-22 11:
2 2019-03-22 12:
14 2019-03-22 13:
4 2019-03-22 15:
7 2019-03-22 16:
7 2019-03-22 17:
3 2019-03-22 18:
3 2019-03-22 19:
7 2019-03-22 20:
323 2019-03-22 21:
685 2019-03-22 22:
357 2019-03-22 23:
$ grep 'Can not load requested doc' cocoon.log.2019-03-23 | grep -oE '2019-03-23 [0-9]{2}:' | sort | uniq -c
575 2019-03-23 00:
445 2019-03-23 01:
518 2019-03-23 02:
436 2019-03-23 03:
387 2019-03-23 04:
593 2019-03-23 05:
468 2019-03-23 06:
541 2019-03-23 07:
440 2019-03-23 08:
260 2019-03-23 09:
```
- I was curious to see if clearing the Cocoon cache in the XMLUI control panel would fix it, but it didn't
- Trying to drill down more, I see that the bulk of the errors started aroundi 21:20:
```
$ grep 'Can not load requested doc' cocoon.log.2019-03-22 | grep -oE '2019-03-22 21:[0-9]' | sort | uniq -c
1 2019-03-22 21:0
1 2019-03-22 21:1
59 2019-03-22 21:2
69 2019-03-22 21:3
89 2019-03-22 21:4
104 2019-03-22 21:5
```
- Looking at the Cocoon log around that time I see the full error is:
```
2019-03-22 21:21:34,378 WARN org.apache.cocoon.components.xslt.TraxErrorListener - Can not load requested doc: unknown protocol: cocoon at jndi:/localhost/themes/CIAT/xsl/../../0_CGIAR/xsl//aspect/artifactbrowser/common.xsl:141:90
```
- A few milliseconds before that time I see this in the DSpace log:
```
2019-03-22 21:21:34,356 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
org.postgresql.util.PSQLException: This statement has been closed.
at org.postgresql.jdbc.PgStatement.checkClosed(PgStatement.java:694)
at org.postgresql.jdbc.PgStatement.getMaxRows(PgStatement.java:501)
at org.postgresql.jdbc.PgStatement.createResultSet(PgStatement.java:153)
at org.postgresql.jdbc.PgStatement$StatementResultHandler.handleResultRows(PgStatement.java:204)
at org.postgresql.core.ResultHandlerDelegate.handleResultRows(ResultHandlerDelegate.java:29)
at org.postgresql.core.v3.QueryExecutorImpl$1.handleResultRows(QueryExecutorImpl.java:528)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2120)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:143)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:106)
at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
at org.apache.tomcat.dbcp.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
at org.dspace.storage.rdbms.DatabaseManager.queryTable(DatabaseManager.java:224)
at org.dspace.storage.rdbms.DatabaseManager.querySingleTable(DatabaseManager.java:375)
at org.dspace.storage.rdbms.DatabaseManager.findByUnique(DatabaseManager.java:544)
at org.dspace.storage.rdbms.DatabaseManager.find(DatabaseManager.java:501)
at org.dspace.eperson.Group.find(Group.java:706)
...
2019-03-22 21:21:34,381 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL query singleTable Error -
org.postgresql.util.PSQLException: This statement has been closed.
at org.postgresql.jdbc.PgStatement.checkClosed(PgStatement.java:694)
at org.postgresql.jdbc.PgStatement.getMaxRows(PgStatement.java:501)
at org.postgresql.jdbc.PgStatement.createResultSet(PgStatement.java:153)
...
2019-03-22 21:21:34,386 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL findByUnique Error -
org.postgresql.util.PSQLException: This statement has been closed.
at org.postgresql.jdbc.PgStatement.checkClosed(PgStatement.java:694)
at org.postgresql.jdbc.PgStatement.getMaxRows(PgStatement.java:501)
at org.postgresql.jdbc.PgStatement.createResultSet(PgStatement.java:153)
...
2019-03-22 21:21:34,395 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL find Error -
org.postgresql.util.PSQLException: This statement has been closed.
at org.postgresql.jdbc.PgStatement.checkClosed(PgStatement.java:694)
at org.postgresql.jdbc.PgStatement.getMaxRows(PgStatement.java:501)
at org.postgresql.jdbc.PgStatement.createResultSet(PgStatement.java:153)
at org.postgresql.jdbc.PgStatement$StatementResultHandler.handleResultRows(PgStatement.java:204)
```
- I restarted Tomcat and now the item displays are working again for now
- I am wondering if this is an issue with removing abandoned connections in Tomcat's JDBC pooling?
- It's hard to tell because we have `logAbanded` enabled, but I don't see anything in the `tomcat7` service logs in the systemd journal
- I sent another mail to the dspace-tech mailing list with my observations
<!-- vim: set sw=2 ts=2: -->