cgspace-notes/content/posts/2020-12.md

245 lines
14 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
title: "December, 2020"
date: 2020-12-01T11:32:54+02:00
author: "Alan Orth"
categories: ["Notes"]
---
## 2020-12-01
- Atmire responded about the issue with duplicate data in our Solr statistics
- They noticed that some records in the statistics-2015 core haven't been migrated with the AtomicStatisticsUpdateCLI tool yet and assumed that I haven't migrated any of the records yet
- That's strange, as I checked all ten cores and 2015 is the only one with some unmigrated documents, as according to the `cua_version` field
- I started processing those (about 411,000 records):
<!--more-->
```console
$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics-2015
```
- AReS went down when the `renew-letsencrypt` service stopped the `angular_nginx` container in the pre-update hook and failed to bring it back up
- I ran all system updates on the host and rebooted it and AReS came back up OK
## 2020-12-02
- Udana emailed me yesterday to ask why the CGSpace usage statistics were showing "No Data"
- I noticed a message in the Solr Admin UI that one of the statistics cores failed to load, but it is up and I can query it...
- Nevertheless, I restarted Tomcat a few times to see if all cores would come up without an error message, but had no success (despite that all cores ARE up and I can query them, _sigh_)
- I think I will move all the Solr yearly statistics back into the main statistics core
- Start testing export/import of yearly Solr statistics data into the main statistics core on DSpace Test, for example:
```
$ ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o statistics-2010.json -k uid
$ ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-2010.json -k uid
$ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
```
- I deployed Tomcat 7.0.107 on DSpace Test (CGSpace is still Tomcat 7.0.104)
- I finished migrating all the statistics from the yearly shards back to the main core
## 2020-12-05
- I deleted all the yearly statistics shards and restarted Tomcat on DSpace Test (linode26)
## 2020-12-06
- Looking into the statistics on DSpace Test after I migrated them back to the main core
- All stats are working as expected... indexing time for the DSpace Statistics API is the same... and I don't even see a difference in the JVM or memory stats in Munin other than a minor jump last week when I was processing them
- I will migrate them on CGSpace too I think
- First I will start with the statistics-2010 and statistics-2015 cores because they were the ones that were failing to load recently (despite actually being available in Solr WTF)
![Error message in Solr admin UI about the statistics-2010 core failing to load](/cgspace-notes/2020/12/solr-statistics-2010-failed.png)
- First the 2010 core:
```console
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o statistics-2010.json -k uid
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-2010.json -k uid
$ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
```
- Judging by the DSpace logs all these cores had a problem starting up in the last month:
```console
# grep -rsI "Unable to create core" [dspace]/log/dspace.log.2020-* | grep -o -E "statistics-[0-9]+" | sort | uniq -c
24 statistics-2010
24 statistics-2015
18 statistics-2016
6 statistics-2018
```
- The message is always this:
```
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'statistics-2016': Unable to create core [statistics-2016] Caused by: Lock obtain timed out: NativeFSLock@/[dspace]/solr/statistics-2016/data/index/write.lock
```
- I will migrate all these cores and see if it makes a difference, then probably end up migrating all of them
- I removed the statistics-2010, statistics-2015, statistics-2016, and statistics-2018 cores and restarted Tomcat and _all the statistics cores came up OK and the CUA statistics are OK_!
## 2020-12-07
- Run `dspace cleanup -v` on CGSpace to clean up deleted bitstreams
- Atmire sent a [pull request](https://github.com/ilri/DSpace/pull/457) to address the duplicate owningComm and owningColl
- Built and deployed it on DSpace Test but I am not sure how to run it yet
- I sent feedback to Atmire on their tracker: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=839
- Abenet and Tezira are having issues with committing to the archive in their workflow
- I looked at the server and indeed the locks and transactions are back up:
![PostgreSQL Transactions day](/cgspace-notes/2020/12/postgres_transactions_ALL-day.png)
![PostgreSQL Locks day](/cgspace-notes/2020/12/postgres_locks_ALL-day.png)
![PostgreSQL Locks day](/cgspace-notes/2020/12/postgres_querylength_ALL-day.png)
![PostgreSQL Connections day](/cgspace-notes/2020/12/postgres_connections_ALL-day.png)
- There are apparently 1,700 locks right now:
```console
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
1739
```
## 2020-12-08
- Atmire sent some instructions for using the DeduplicateValuesProcessor
- I modified `atmire-cua-update.xml` as they instructed, but I get a million errors like this when I run AtomicStatisticsUpdateCLI with that configuration:
```
Record uid: 64387815-d9a7-4605-8024-1c0a5c7520e0 couldn't be processed
com.atmire.statistics.util.update.atomic.ProcessingException: something went wrong while processing record uid: 64387815-d9a7-4605-8024-1c0a5c7520e0, an error occured in the com.atmire.statistics.util.update.atomic.processor.DeduplicateValuesProcessor
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(SourceFile:304)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(SourceFile:176)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:161)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
Caused by: java.lang.UnsupportedOperationException
at org.apache.solr.common.SolrDocument$1.entrySet(SolrDocument.java:256)
at java.util.HashMap.putMapEntries(HashMap.java:512)
at java.util.HashMap.<init>(HashMap.java:490)
at com.atmire.statistics.util.update.atomic.record.Record.getFieldValuesMap(SourceFile:86)
at com.atmire.statistics.util.update.atomic.processor.DeduplicateValuesProcessor.process(SourceFile:38)
at com.atmire.statistics.util.update.atomic.processor.DeduplicateValuesProcessor.visit(SourceFile:34)
at com.atmire.statistics.util.update.atomic.record.UsageRecord.accept(SourceFile:23)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(SourceFile:301)
... 10 more
```
- I sent some feedback to Atmire
- They responded with an updated CUA (6.x-4.1.10-ilri-RC7) that has a fix for the duplicates processor _and_ a possible fix for the database locking issues (a bug in CUASolrLoggerServiceImpl that causes an infinite loop and a Tomcat timeout)
- I deployed the changes on DSpace Test and CGSpace, hopefully it will fix both issues!
- In other news, after I restarted Tomcat on CGSpace the statistics-2013 core didn't come back up properly, so I exported it and imported it into the main statistics core like I did for the others a few days ago
- Sync DSpace Test with CGSpace's Solr, PostgreSQL database, and assetstore...
## 2020-12-09
- I was running the AtomicStatisticsUpdateCLI to remove duplicates on DSpace Test but it failed near the end of the statistics core (after 20 hours or so) with a memory error:
```
Successfully finished updating Solr Storage Reports | Wed Dec 09 15:25:11 CET 2020
Run 1 —  67% — 10,000/14,935 docs — 6m 6s — 6m 6s
Exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.noggit.CharArr.toString(CharArr.java:164)
```
- I increased the JVM heap to 2048m and tried again, but it failed with a memory error again...
- I increased the JVM heap to 4096m and tried again, but it failed with another error:
```
Successfully finished updating Solr Storage Reports | Wed Dec 09 15:53:40 CET 2020
Exception: parsing error
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: parsing error
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:530)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.getNextSetOfSolrDocuments(SourceFile:392)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:157)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
Caused by: org.apache.solr.common.SolrException: parsing error
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:45)
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:528)
... 14 more
Caused by: org.apache.http.TruncatedChunkException: Truncated chunk ( expected size: 8192; actual size: 2843)
at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:200)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:125)
at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:152)
...
```
## 2020-12-10
- The statistics-2019 core finished processing the duplicate removal so I started the statistics-2017 core
- Peter asked me to add ONE HEALTH to ILRI subjects on CGSpace
- A few items that got "lost" after approval during the database issues earlier this week seem to have gone back into their workflows
- Abenet approved them again and they got new handles, phew
- Abenet was having an issue with the date filter on AReS and it turns out that it's the same `.keyword` issue I had noticed before that causes the filter to stop working
- I fixed the filter to use the correct field name and filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/63
- I checked the Solr statistics on DSpace Test to see if the Atmire duplicates remover was working, but now I see a comical amount of duplicates...
![Solr stats with dozens of duplicates](/cgspace-notes/2020/12/solr-stats-duplicates.png)
- I sent feedback about this to Atmire
- I will re-sync the Solr stats from CGSpace so we can try again...
- In other news, it has been a few days since we deployed the fix for the database locking issue and things seem much better now:
![PostgreSQL connections all week](/cgspace-notes/2020/12/postgres_connections_ALL-week.png)
![PostgreSQL locks all week](/cgspace-notes/2020/12/postgres_locks_ALL-week.png)
## 2020-12-13
- I tried to harvest a few times on OpenRXV in the last few days and every time it appends all the new records to the items index instead of overwriting it:
![OpenRXV duplicates](/cgspace-notes/2020/12/openrxv-duplicates.png)
- I can see it in the `openrxv-items-final` index:
```console
$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*' | json_pp
{
"_shards" : {
"failed" : 0,
"skipped" : 0,
"successful" : 1,
"total" : 1
},
"count" : 299922
}
```
- I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/64
- For now I will try to delete the index and start a re-harvest in the Admin UI:
```
$ curl -XDELETE http://localhost:9200/openrxv-items-final
{"acknowledged":true}%
```
- Moayad said he's working on the harvesting so I stopped it for now to re-deploy his latest changes
- I updated Tomcat to version 7.0.107 on CGSpace (linode18), ran all updates, and restarted the server
- I deleted both items indexes and restarted the harvesting:
```
$ curl -XDELETE http://localhost:9200/openrxv-items-final
$ curl -XDELETE http://localhost:9200/openrxv-items-temp
```
<!-- vim: set sw=2 ts=2: -->