Add notes for 2023-07-25

2023-07-25 23:54:53 +03:00
parent e4dc8a3ed0
commit 6e701ee9c2
133 changed files with 281 additions and 171 deletions

@ -154,4 +154,56 @@ $ psql < locks-age.sql | grep -E " (19|18|17|16|12):" | awk -F"|" '{print $10}'
- Export CGSpace to fix missing Initiative collections
- Start a harvest on AReS
## 2023-07-24
- Test Salem's new JavaScript-based DSpace Statistics API and send him some feedback
- I noticed a few times that the Solr service on my DSpace 7 instance is getting OOM killed
- I had been using a 4g Solr heap, but maybe we don't need that much
- Tomcat is also using 4.6GB, and then there's PostgreSQL... so perhaps it's all a bit much on this system now
## 2023-07-25
- Start testing exporting DSpace 6 Solr cores to import on DSpace 7:
```console
$ chrt -b 0 dspace solr-export-statistics -i statistics
```
- I'm curious how long it takes and how much data there will be
- The size of the Solr data directory is currently 82GB
- The export took about 2.5 hours and created 6,000 individual CSVs, one for each day of Solr stats
- The size of the exported CSVs is about 88GB
- I will copy just a few years to import on the DSpace 7 test server
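Picking out a few years from the daily CSVs could be sketched as follows. The export directory and the file naming (an ISO `YYYY-MM-DD` date somewhere in each file name) are assumptions that the notes above do not confirm, and `select_export_files` is a hypothetical helper:

```python
import re
from pathlib import Path

# Assumption: each exported CSV has an ISO date (YYYY-MM-DD) in its file name.
DATE_RE = re.compile(r"(\d{4})-\d{2}-\d{2}")


def select_export_files(names, years):
    """Return the CSV names whose embedded date falls in one of the given years."""
    wanted = {int(y) for y in years}
    selected = []
    for name in names:
        if not name.endswith(".csv"):
            continue
        match = DATE_RE.search(name)
        if match and int(match.group(1)) in wanted:
            selected.append(name)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical export directory; adjust to wherever the CSVs were written.
    export_dir = Path("/tmp/solr-export")
    if export_dir.is_dir():
        names = [p.name for p in export_dir.iterdir()]
        for name in select_export_files(names, [2021, 2022]):
            print(name)
```

The printed names could then be passed to `rsync` or `scp` to copy only those files to the test server.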
- Importing these is going to require removing the Atmire custom fields, because the DSpace 7 statistics core rejects them as unknown:
```console
$ dspace solr-import-statistics -i statistics
Exception: Error from server at http://localhost:8983/solr/statistics: ERROR: [doc=1a92472e-e39d-4602-9b4d-da022df8f233] unknown field 'containerCommunity'
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/statistics: ERROR: [doc=1a92472e-e39d-4602-9b4d-da022df8f233] unknown field 'containerCommunity'
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
at org.dspace.util.SolrImportExport.importIndex(SolrImportExport.java:465)
at org.dspace.util.SolrImportExport.main(SolrImportExport.java:148)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:277)
at org.dspace.app.launcher.ScriptLauncher.handleScript(ScriptLauncher.java:133)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:98)
```
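The `unknown field 'containerCommunity'` error comes from a field that the target core does not define. One way to strip such fields from an exported CSV before importing is sketched below. It assumes each CSV starts with a header row naming the Solr fields, and `drop_columns` is a hypothetical helper, not part of DSpace:

```python
import csv
import io


def drop_columns(csv_text, unwanted):
    """Return csv_text without the columns whose header is in `unwanted`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return ""
    # Indexes of the columns we want to keep, in their original order.
    keep = [i for i, name in enumerate(header) if name not in unwanted]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in reader:
        # Tolerate short rows by skipping indexes they do not have.
        writer.writerow([row[i] for i in keep if i < len(row)])
    return out.getvalue()
```

With 6,000 files at about 88GB in total, a real run would stream each file row by row instead of holding the text in memory.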
- I will try using solr-import-export-json, which I've used in the past to skip Atmire custom fields in Solr:
```console
$ chrt -b 0 ./run.sh -s http://localhost:8081/solr/statistics -a export -o /tmp/statistics-2022.json -f 'time:[2022-01-01T00\:00\:00Z TO 2022-12-31T23\:59\:59Z]' -k uid -S author_mtdt,author_mtdt_search,iso_mtdt_search,iso_mtdt,subject_mtdt,subject_mtdt_search,containerCollection,containerCommunity,containerItem,countryCode_ngram,countryCode_search,cua_version,dateYear,dateYearMonth,geoipcountrycode,geoIpCountryCode,ip_ngram,ip_search,isArchived,isInternal,isWithdrawn,containerBitstream,file_id,referrer_ngram,referrer_search,userAgent_ngram,userAgent_search,version_id,complete_query,complete_query_search,filterquery,ngram_query_search,ngram_simplequery_search,simple_query,simple_query_search,range,rangeDescription,rangeDescription_ngram,rangeDescription_search,range_ngram,range_search,actingGroupId,actorMemberGroupId,bitstreamCount,solr_update_time_stamp,bitstreamId,core_update_run_nb
```
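To repeat this export for other years, the `-f` time filter can be generated instead of typed by hand. A minimal sketch, where `year_filter` is a hypothetical helper and the escaped-colon form simply mirrors the command above:

```python
def year_filter(year):
    """Build a Solr range query on `time` covering one calendar year (UTC)."""
    # Escape the colons inside the timestamps, as in the hand-written filter.
    start = f"{year:04d}-01-01T00:00:00Z".replace(":", r"\:")
    end = f"{year:04d}-12-31T23:59:59Z".replace(":", r"\:")
    return f"time:[{start} TO {end}]"


if __name__ == "__main__":
    for year in (2020, 2021, 2022):
        print(year_filter(year))
```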
- Some users complained that CGSpace was slow and I found a handful of PostgreSQL locks that were hours or even days old
- I killed those and asked the users to try again
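Spotting old locks can be automated rather than done with `grep`. The sketch below assumes the lock report yields `(pid, age)` pairs with the age printed as PostgreSQL interval text (for example `19:12:33.1` or `2 days 01:00:00`). Both the input shape and the helper names are hypothetical:

```python
import re
from datetime import timedelta

# Matches interval text such as "00:05:12", "19:12:33.1" or "2 days 01:00:00".
AGE_RE = re.compile(r"^(?:(\d+) days? )?(\d+):(\d{2}):(\d{2})(?:\.\d+)?$")


def parse_age(text):
    """Convert interval text to a timedelta (fractional seconds are dropped)."""
    match = AGE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized age: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return timedelta(
        days=int(days or 0),
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
    )


def stale_pids(rows, threshold):
    """Return the pids whose age is at least `threshold`."""
    return [pid for pid, age in rows if parse_age(age) >= threshold]
```

The resulting pids would still need a manual review before anything is terminated.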
- After importing the Solr statistics into DSpace 7 I realized that my DSpace Statistics API will work fine
- I made some minor modifications to the Ansible infrastructure scripts to make sure it is enabled and then activated it on DSpace 7 Test
<!-- vim: set sw=2 ts=2: -->