December, 2020
2020-12-01
- Atmire responded about the issue with duplicate data in our Solr statistics
- They noticed that some records in the statistics-2015 core haven’t been migrated with the AtomicStatisticsUpdateCLI tool yet and assumed that I haven’t migrated any of the records yet
- That’s strange, as I checked all ten cores and 2015 is the only one with some unmigrated documents, according to the `cua_version` field
- I started processing those (about 411,000 records):
$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics-2015
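- For reference, counting how many documents in a core are still missing the `cua_version` field can be done with a Solr query along these lines (a sketch; I didn’t record the exact query I used):
$ curl -s -G 'http://localhost:8081/solr/statistics-2015/select' --data-urlencode 'q=-cua_version:[* TO *]' -d 'rows=0' -d 'wt=json'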
- AReS went down when the `renew-letsencrypt` service stopped the `angular_nginx` container in the pre-update hook and failed to bring it back up
- I ran all system updates on the host and rebooted it and AReS came back up OK
2020-12-02
- Udana emailed me yesterday to ask why the CGSpace usage statistics were showing “No Data”
- I noticed a message in the Solr Admin UI that one of the statistics cores failed to load, but it is up and I can query it…
- Nevertheless, I restarted Tomcat a few times to see if all cores would come up without an error message, but had no success (despite the fact that all the cores ARE up and I can query them, sigh)
- I think I will move all the Solr yearly statistics back into the main statistics core
- Start testing export/import of yearly Solr statistics data into the main statistics core on DSpace Test, for example:
$ ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o statistics-2010.json -k uid
$ ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-2010.json -k uid
$ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
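- A quick sanity check that everything made it across is to compare numFound in the target core before and after the import, for example:
$ curl -s 'http://localhost:8081/solr/statistics/select?q=*:*&rows=0&wt=json'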
- I deployed Tomcat 7.0.107 on DSpace Test (CGSpace is still Tomcat 7.0.104)
- I finished migrating all the statistics from the yearly shards back to the main core
2020-12-05
- I deleted all the yearly statistics shards and restarted Tomcat on DSpace Test (linode26)
2020-12-06
- Looking into the statistics on DSpace Test after I migrated them back to the main core
- All stats are working as expected… indexing time for the DSpace Statistics API is the same… and I don’t even see a difference in the JVM or memory stats in Munin other than a minor jump last week when I was processing them
- I will migrate them on CGSpace too I think
- First I will start with the statistics-2010 and statistics-2015 cores because they were the ones that were failing to load recently (despite actually being available in Solr WTF)
- First the 2010 core:
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics-2010 -a export -o statistics-2010.json -k uid
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-2010.json -k uid
$ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
- Judging by the DSpace logs all these cores had a problem starting up in the last month:
# grep -rsI "Unable to create core" [dspace]/log/dspace.log.2020-* | grep -o -E "statistics-[0-9]+" | sort | uniq -c
24 statistics-2010
24 statistics-2015
18 statistics-2016
6 statistics-2018
- The message is always this:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'statistics-2016': Unable to create core [statistics-2016] Caused by: Lock obtain timed out: NativeFSLock@/[dspace]/solr/statistics-2016/data/index/write.lock
- I will migrate all these cores and see if it makes a difference, then probably end up migrating all of them
- I removed the statistics-2010, statistics-2015, statistics-2016, and statistics-2018 cores and restarted Tomcat and all the statistics cores came up OK and the CUA statistics are OK!
2020-12-07
- Run `dspace cleanup -v` on CGSpace to clean up deleted bitstreams
- Atmire sent a pull request to address the duplicate owningComm and owningColl values in Solr
- Built and deployed it on DSpace Test but I am not sure how to run it yet
- I sent feedback to Atmire on their tracker: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=839
- Abenet and Tezira are having issues with committing to the archive in their workflow
- I looked at the server and indeed the locks and transactions are back up:
- There are apparently 1,700 locks right now:
$ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid;' | wc -l
1739
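- To see which of those connections are actually doing something rather than sitting idle, a standard query against pg_stat_activity works, something like:
$ psql -c "SELECT pid, state, wait_event_type, query_start FROM pg_stat_activity WHERE state != 'idle' ORDER BY query_start;"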
2020-12-08
- Atmire sent some instructions for using the DeduplicateValuesProcessor
- I modified `atmire-cua-update.xml` as they instructed, but I get a million errors like this when I run AtomicStatisticsUpdateCLI with that configuration:
Record uid: 64387815-d9a7-4605-8024-1c0a5c7520e0 couldn't be processed
com.atmire.statistics.util.update.atomic.ProcessingException: something went wrong while processing record uid: 64387815-d9a7-4605-8024-1c0a5c7520e0, an error occured in the com.atmire.statistics.util.update.atomic.processor.DeduplicateValuesProcessor
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(SourceFile:304)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(SourceFile:176)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:161)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
Caused by: java.lang.UnsupportedOperationException
at org.apache.solr.common.SolrDocument$1.entrySet(SolrDocument.java:256)
at java.util.HashMap.putMapEntries(HashMap.java:512)
at java.util.HashMap.<init>(HashMap.java:490)
at com.atmire.statistics.util.update.atomic.record.Record.getFieldValuesMap(SourceFile:86)
at com.atmire.statistics.util.update.atomic.processor.DeduplicateValuesProcessor.process(SourceFile:38)
at com.atmire.statistics.util.update.atomic.processor.DeduplicateValuesProcessor.visit(SourceFile:34)
at com.atmire.statistics.util.update.atomic.record.UsageRecord.accept(SourceFile:23)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(SourceFile:301)
... 10 more
- I sent some feedback to Atmire
- They responded with an updated CUA (6.x-4.1.10-ilri-RC7) that has a fix for the duplicates processor and a possible fix for the database locking issues (a bug in CUASolrLoggerServiceImpl that causes an infinite loop and a Tomcat timeout)
- I deployed the changes on DSpace Test and CGSpace, hopefully it will fix both issues!
- In other news, after I restarted Tomcat on CGSpace the statistics-2013 core didn’t come back up properly, so I exported it and imported it into the main statistics core like I did for the others a few days ago
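- That would be the same export, import, and delete pattern as for the other cores, something like:
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics-2013 -a export -o statistics-2013.json -k uid
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-2013.json -k uid
$ curl -s "http://localhost:8081/solr/statistics-2013/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"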
- Sync DSpace Test with CGSpace’s Solr, PostgreSQL database, and assetstore…
2020-12-09
- I was running the AtomicStatisticsUpdateCLI to remove duplicates on DSpace Test but it failed near the end of the statistics core (after 20 hours or so) with a memory error:
Successfully finished updating Solr Storage Reports | Wed Dec 09 15:25:11 CET 2020
Run 1 — 67% — 10,000/14,935 docs — 6m 6s — 6m 6s
Exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.noggit.CharArr.toString(CharArr.java:164)
- I increased the JVM heap to 2048m and tried again, but it failed with a memory error again…
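- The heap for the `dspace` launcher is controlled via JAVA_OPTS (same as the Discovery re-indexing below), so the retries looked roughly like this (a sketch; I didn’t record the exact invocation):
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m"
$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics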
- I increased the JVM heap to 4096m and tried again, but it failed with another error:
Successfully finished updating Solr Storage Reports | Wed Dec 09 15:53:40 CET 2020
Exception: parsing error
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: parsing error
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:530)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.getNextSetOfSolrDocuments(SourceFile:392)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:157)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
Caused by: org.apache.solr.common.SolrException: parsing error
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:45)
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:528)
... 14 more
Caused by: org.apache.http.TruncatedChunkException: Truncated chunk ( expected size: 8192; actual size: 2843)
at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:200)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:89)
at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:125)
at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:152)
...
2020-12-10
- The statistics-2019 core finished processing the duplicate removal so I started the statistics-2017 core
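- That would be the same AtomicStatisticsUpdateCLI invocation as before, just pointed at the 2017 core:
$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics-2017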
- Peter asked me to add ONE HEALTH to ILRI subjects on CGSpace
- A few items that got “lost” after approval during the database issues earlier this week seem to have gone back into their workflows
- Abenet approved them again and they got new handles, phew
- Abenet was having an issue with the date filter on AReS and it turns out that it’s the same `.keyword` issue I had noticed before that causes the filter to stop working
- I fixed the filter to use the correct field name and filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/63
- I checked the Solr statistics on DSpace Test to see if the Atmire duplicates remover was working, but now I see a comical amount of duplicates…
- I sent feedback about this to Atmire
- I will re-sync the Solr stats from CGSpace so we can try again…
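- One way to do that with the same solr-import-export tooling as above would be to export the main statistics core on CGSpace, clear the core on DSpace Test, and import the dump there after copying the JSON file across (a rough sketch, not the exact procedure):
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics -a export -o statistics-cgspace.json -k uid
$ curl -s "http://localhost:8081/solr/statistics/update?softCommit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"
$ ./run.sh -s http://localhost:8081/solr/statistics -a import -o statistics-cgspace.json -k uid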
- In other news, it has been a few days since we deployed the fix for the database locking issue and things seem much better now:
2020-12-13
- I tried to harvest a few times on OpenRXV in the last few days and every time it appends all the new records to the items index instead of overwriting it:
- I can see it in the `openrxv-items-final` index:
$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*' | json_pp
{
"_shards" : {
"failed" : 0,
"skipped" : 0,
"successful" : 1,
"total" : 1
},
"count" : 299922
}
- I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/64
- For now I will try to delete the index and start a re-harvest in the Admin UI:
$ curl -XDELETE http://localhost:9200/openrxv-items-final
{"acknowledged":true}%
- Moayad said he’s working on the harvesting so I stopped it for now to re-deploy his latest changes
- I updated Tomcat to version 7.0.107 on CGSpace (linode18), ran all updates, and restarted the server
- I deleted both items indexes and restarted the harvesting:
$ curl -XDELETE http://localhost:9200/openrxv-items-final
$ curl -XDELETE http://localhost:9200/openrxv-items-temp
- Peter asked me for a list of all submitters and approvers that were active recently on CGSpace
- I can probably extract that from the `dc.description.provenance` field, for example any that contains a 2020 date:
localhost/dspace63= > SELECT * FROM metadatavalue WHERE metadata_field_id=28 AND text_value ~ '^.*on 2020-[0-9]{2}-*';
2020-12-14
- The re-harvesting finished last night on AReS but there are no records in the `openrxv-items-final` index
- Strangely, there are 99,000 items in the temp index:
$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*' | json_pp
{
"count" : 99992,
"_shards" : {
"skipped" : 0,
"total" : 1,
"failed" : 0,
"successful" : 1
}
}
- I’m going to try to clone the temp index to the final one…
- First, set the `openrxv-items-temp` index to block writes (read only) and then clone it to `openrxv-items-final`:
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-final
{"acknowledged":true,"shards_acknowledged":true,"index":"openrxv-items-final"}
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
- Now I see that the `openrxv-items-final` index has items, but there are still none in AReS Explorer UI!
$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*&pretty'
{
"count" : 99992,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
- The api logs show this from last night after the harvesting:
[Nest] 92 - 12/13/2020, 1:58:52 PM [HarvesterService] Starting Harvest
[Nest] 92 - 12/13/2020, 10:50:20 PM [FetchConsumer] OnGlobalQueueDrained
[Nest] 92 - 12/13/2020, 11:00:20 PM [PluginsConsumer] OnGlobalQueueDrained
[Nest] 92 - 12/13/2020, 11:00:20 PM [HarvesterService] reindex function is called
(node:92) UnhandledPromiseRejectionWarning: ResponseError: index_not_found_exception
at IncomingMessage.<anonymous> (/backend/node_modules/@elastic/elasticsearch/lib/Transport.js:232:25)
at IncomingMessage.emit (events.js:326:22)
at endReadableNT (_stream_readable.js:1223:12)
at processTicksAndRejections (internal/process/task_queues.js:84:21)
- But I’m not sure why the frontend doesn’t show any data despite there being documents in the index…
- I talked to Moayad and he reminded me that OpenRXV uses an alias to point to the temp and final indexes, but the UI actually uses the `openrxv-items` index
- I cloned the `openrxv-items-final` index to the `openrxv-items` index and now I see items in the explorer UI
- The PDF report was broken and I looked in the API logs and saw this:
(node:94) UnhandledPromiseRejectionWarning: Error: Error: Could not find soffice binary
at ExportService.downloadFile (/backend/dist/export/services/export/export.service.js:51:19)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
- I installed `unoconv` in the backend api container and now it works… but I wonder why this changed…
- Skype with Abenet and Peter to discuss AReS that will be shown to ILRI scientists this week
- Peter noticed that this item from the ILRI policy and research briefs collection is missing in AReS, despite it being added one month ago in CGSpace and me harvesting on AReS last night
- The item appears fine in the REST API when I check the items in that collection
- Peter also noticed that this item appears twice in AReS
- The item is not duplicated on CGSpace or in the REST API
- We noticed that there are 136 items in the ILRI policy and research briefs collection according to AReS, yet on CGSpace there are only 132
- This is confirmed in the REST API (using query-json):
$ http --print b 'https://cgspace.cgiar.org/rest/collections/defee001-8cc8-4a6c-8ac8-21bb5adab2db?expand=all&limit=100&offset=0' | json_pp > /tmp/policy1.json
$ http --print b 'https://cgspace.cgiar.org/rest/collections/defee001-8cc8-4a6c-8ac8-21bb5adab2db?expand=all&limit=100&offset=100' | json_pp > /tmp/policy2.json
$ query-json '.items | length' /tmp/policy1.json
100
$ query-json '.items | length' /tmp/policy2.json
32
- I realized that the issue of missing/duplicate items in AReS might be because of this REST API bug that causes /items to return items in non-deterministic order
- I decided to cherry-pick the following two patches from DSpace 6.4 into our `6_x-prod` (6.3) branch:
  - High CPU usage when calling the collection_id/items REST endpoint
    - Jira: https://jira.lyrasis.org/browse/DS-4342
    - c2e6719fa763e291b81b2d61da2f8c758fe38ff3
  - REST API items resource returns items in non-deterministic order
    - Jira: https://jira.lyrasis.org/browse/DS-3849
    - 2a2ea0cb5d03e6da9355a2eff12aad667e465433
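- The cherry-picks themselves would be something like:
$ git checkout 6_x-prod
$ git cherry-pick c2e6719fa763e291b81b2d61da2f8c758fe38ff3
$ git cherry-pick 2a2ea0cb5d03e6da9355a2eff12aad667e465433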
- After deploying the REST API fixes I decided to harvest from AReS again to see if the missing and duplicate items get fixed
- I made a backup of the current `openrxv-items-temp` index just in case:
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -s -X POST http://localhost:9200/openrxv-items-temp/_clone/openrxv-items-2020-12-14
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
2020-12-15
- After the re-harvest last night there were 200,000 items in the `openrxv-items-temp` index again
- I cleared the index and started a re-harvest, but Peter sent me a bunch of author corrections for CGSpace so I decided to cancel it until after I apply them and re-index Discovery
- I checked the 1,534 fixes in OpenRefine (had to fix a few UTF-8 errors, as always from Peter’s CSVs) and then applied them using the `fix-metadata-values.py` script:
$ ./fix-metadata-values.py -i /tmp/2020-10-28-fix-1534-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t 'correct' -m 3
$ ./delete-metadata-values.py -i /tmp/2020-10-28-delete-2-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3
- Since I was re-indexing Discovery anyways I decided to check for any uppercase AGROVOC and lowercase them:
dspace=# BEGIN;
BEGIN
dspace=# UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=57 AND text_value ~ '[[:upper:]]';
UPDATE 406
dspace=# COMMIT;
COMMIT
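- A count with the same WHERE clause is a handy way to preview how many rows an UPDATE like that will touch before committing, for example:
dspace=# SELECT COUNT(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id=57 AND text_value ~ '[[:upper:]]';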
- I also updated the Font Awesome icon classes for version 5 syntax:
dspace=# BEGIN;
dspace=# UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'fa fa-rss','fas fa-rss', 'g') WHERE text_value LIKE '%fa fa-rss%';
UPDATE 74
dspace=# UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'fa fa-at','fas fa-at', 'g') WHERE text_value LIKE '%fa fa-at%';
UPDATE 74
dspace=# COMMIT;
- Then I started a full Discovery re-index:
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m"
$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 265m11.224s
user 171m29.141s
sys 2m41.097s
- Udana sent a report that the WLE approver is experiencing the same issue Peter highlighted a few weeks ago: they are unable to save metadata edits in the workflow
- Yesterday Atmire responded about the owningComm and owningColl duplicates in Solr saying they didn’t see any anymore…
- Indeed I spent a few minutes looking randomly and I didn’t find any either…
- I did, however, see lots of duplicates in countryCode_search, countryCode_ngram, ip_search, ip_ngram, userAgent_search, userAgent_ngram, referrer_search, referrer_ngram fields
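- One way to spot them is to pull a single statistics document and look at those fields directly, something like:
$ curl -s 'http://localhost:8081/solr/statistics/select?q=*:*&rows=1&fl=countryCode_search,countryCode_ngram,ip_search,userAgent_search,referrer_search&wt=json'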
- I sent feedback to them
- On the database locking front we haven’t had issues in over a week and the Munin graphs look normal:
- After the Discovery re-indexing finished on CGSpace I prepared to start re-harvesting AReS by making sure the `openrxv-items-temp` index was empty and that the backup index I made yesterday was still there:
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp?pretty'
{
"acknowledged" : true
}
$ curl -s 'http://localhost:9200/openrxv-items-final/_count?q=*&pretty'
{
"count" : 0,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
$ curl -s 'http://localhost:9200/openrxv-items-2020-12-14/_count?q=*&pretty'
{
"count" : 99992,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
2020-12-16
- The harvesting on AReS finished last night so this morning I manually cloned the `openrxv-items-temp` index to `openrxv-items`
- First check the number of items in the temp index, then set it to read only, delete the items index, clone the temp index to `openrxv-items`, and finally delete the temp index:
$ curl -s 'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'
{
"count" : 100046,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": true}}'
$ curl -XDELETE 'http://localhost:9200/openrxv-items?pretty'
$ curl -s -X POST "http://localhost:9200/openrxv-items-temp/_clone/openrxv-items?pretty"
$ curl -s 'http://localhost:9200/openrxv-items/_count?q=*&pretty'
{
"count" : 100046,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
$ curl -X PUT "localhost:9200/openrxv-items-temp/_settings?pretty" -H 'Content-Type: application/json' -d'{"settings": {"index.blocks.write": false}}'
$ curl -XDELETE 'http://localhost:9200/openrxv-items-temp?pretty'
- Interestingly the item that we noticed was duplicated now only appears once
- The missing item is still missing
- Jane Poole noticed that the “previous page” and “next page” buttons are not working on AReS
- I filed a bug on GitHub: https://github.com/ilri/OpenRXV/issues/65
- Generate a list of submitters and approvers active in the last few months using the Provenance field on CGSpace:
$ psql -h localhost -U postgres dspace -c "SELECT text_value FROM metadatavalue WHERE metadata_field_id=28 AND text_value ~ '^.*on 2020-(06|07|08|09|10|11|12)-*'" > /tmp/provenance.txt
$ grep -o -E 'by .*)' /tmp/provenance.txt | grep -v -E "( on |checksum)" | sed -e 's/by //' -e 's/ (/,/' -e 's/)//' | sort | uniq > /tmp/recent-submitters-approvers.csv
- Peter wanted the list so he could send some mail to the users…