2020-08-06 16:24:01 +03:00
parent a0890eaaf4
commit 811a12cb5e
20 changed files with 481 additions and 23 deletions

@ -58,7 +58,7 @@ $ http 'http://localhost:8080/rest/collections/1445/items' jq '. | length'
61
```
- Also on DSpace Test (which is running DSpace 6!), though the issue is slightly different there:
```
$ http 'https://dspacetest.cgiar.org/rest/collections/5471c3aa-202e-42f0-96c2-497a18e3b708' | json_pp | grep numberItems
@ -181,4 +181,26 @@ on_id=[A-Z0-9]{32}' | sort | uniq | wc -l
- I will add `Turnitin` to the Tomcat Crawler Session Manager Valve regex as well...
## 2020-08-06
- I have been working on processing the Solr statistics with the Atmire tool on DSpace Test the last few days:
- statistics:
- 2,040,385 docs: 2h 28m 49s
- statistics-2019:
- 8,960,000 docs: 12h 7s
- 1,780,575 docs: 2h 7m 29s
- statistics-2018:
- 1,970,000 docs: 12h 1m 28s
- 360,000 docs: 2h 54m 56s (Linode rebooted)
- 1,110,000 docs: 7h 1m 44s (Restarted Tomcat, oops)
- I decided to start the 2018 core over, so I re-synced it from CGSpace and began again with the solr-upgrade-statistics-6x tool; now I'm hitting the same Java heap space issues that I had last month
- The process kept crashing because it ran out of memory, so I increased the memory to 3072m and then 4096m...
- Also, I decided to try to purge all the `-unmigrated` docs that it had found so far to see if that helps...
- About 466,000 records were unmigrated so far, most of them `type: 5` (SITE statistics)
- Now it is processing again...
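
To compare the runs listed above, here is a minimal Python sketch that turns each batch into a rough docs-per-second rate. The numbers are copied from the notes; reading "12h 7s" as 12h 0m 7s is an assumption, and the helper name `docs_per_second` is only illustrative. The interrupted 2018 runs may not reflect a clean run, so treat the rates as approximate.

```python
# Batch sizes and durations (hours, minutes, seconds) copied from the notes.
# "12h 7s" is assumed to mean 12h 0m 7s.
runs = [
    ("statistics", 2_040_385, (2, 28, 49)),
    ("statistics-2019", 8_960_000, (12, 0, 7)),
    ("statistics-2019", 1_780_575, (2, 7, 29)),
    ("statistics-2018", 1_970_000, (12, 1, 28)),
    ("statistics-2018", 360_000, (2, 54, 56)),    # Linode rebooted
    ("statistics-2018", 1_110_000, (7, 1, 44)),   # Tomcat restarted
]


def docs_per_second(docs, duration):
    """Return the average number of docs processed per second."""
    hours, minutes, seconds = duration
    return docs / (hours * 3600 + minutes * 60 + seconds)


for core, docs, duration in runs:
    print(f"{core}: {docs_per_second(docs, duration):.0f} docs/s")
```

Under these assumptions the 2018 batches come out several times slower than the others, which would be consistent with the memory problems described above.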
- I developed a small Java class called `FixJpgJpgThumbnails` to remove ".jpg.jpg" thumbnails from the `THUMBNAIL` bundle and replace them with their originals from the `ORIGINAL` bundle
- The code is based on [RemovePNGThumbnailsForPDFs.java](https://github.com/UoW-IRRs/DSpace-Scripts/blob/master/src/main/java/nz/ac/waikato/its/irr/scripts/RemovePNGThumbnailsForPDFs.java) by Andrea Schweer
- I incorporated it into my dspace-curation-tasks repository, then renamed the repository to [cgspace-java-helpers](https://github.com/ilri/cgspace-java-helpers)
- In testing I found that I can replace ~3,500 thumbnails on CGSpace!
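
The matching step can be sketched in Python as follows. This is not the `FixJpgJpgThumbnails` code and does not use the DSpace API; it only illustrates the file name logic on plain lists, assuming thumbnails are named by appending ".jpg" to the original file name. The function name and sample file names are hypothetical.

```python
def find_jpg_jpg_thumbnails(original_names, thumbnail_names):
    """Map each ".jpg.jpg" thumbnail to its original, if one exists.

    Assumes a thumbnail is named "<original name>.jpg", so the thumbnail
    of "photo.jpg" would be "photo.jpg.jpg".
    """
    originals = set(original_names)
    matches = {}
    for thumb in thumbnail_names:
        if thumb.lower().endswith(".jpg.jpg"):
            # Strip the trailing ".jpg" to recover the original's name
            candidate = thumb[: -len(".jpg")]
            if candidate in originals:
                matches[thumb] = candidate
    return matches


# Hypothetical contents of an item's ORIGINAL and THUMBNAIL bundles
originals = ["photo.jpg", "report.pdf"]
thumbnails = ["photo.jpg.jpg", "report.pdf.jpg"]
print(find_jpg_jpg_thumbnails(originals, thumbnails))
```

Only thumbnails with a matching file in the `ORIGINAL` bundle are returned, so PDF thumbnails such as `report.pdf.jpg` are left alone.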
<!-- vim: set sw=2 ts=2: -->