2023-11-01
- Work a bit on the ETL pipeline for the CGIAR Climate Change Synthesis
- I improved the filtering and wrote some Python using pandas to merge my sources more reliably
2023-11-02
- Export CGSpace to check missing Initiative collection mappings
- Start a harvest on AReS
- IFPRI contacted us about importing their Slideshare presentations to CGSpace
- There are ~1,700 of them and date back to as early as 2008
- I did a quick cleanup of the metadata export from Slideshare (including tagging with some AGROVOC in OpenRefine) and uploaded to DSpace Test
2023-11-03
- A little bit of work on the CGIAR Climate Change Synthesis
- Discuss some CGSpace migration plans with Leigh from IFPRI
- For their Slideshare content we agreed:
- Exclude private
- Exclude deleted
- Exclude non presentation types
- Exclude duplicates within the collection for now until we can sort them out
- That leaves about 1,500 items out of the 1,700
- I did a duplicate check against CGSpace and found 44 items with 1.0 similarity so I removed those
2023-11-04
- Export CGSpace to check for missing Initiative collection mappings
- I ran through the list of potential duplicates on the IFPRI Slideshare presentations
2023-11-05
- Work with Salem to migrate AReS to the new version
2023-11-07
- DSpace 7 Test went down and there is very high load on the server
- I saw very high load from Java but didn’t have time to check exactly what was wrong so I just rebooted the host
- A few hours after restarting the system went down again, with very high load from Java again
- I see lots of messages like this in the Tomcat log:
tomcat9[732]: [9955.662s][info ][gc] GC(6291) Pause Full (G1 Compaction Pause) 4085M->4080M(4096M) 677.251ms
tomcat9[732]: [9955.662s][info ][gc] GC(6290) Concurrent Mark Cycle 677.558ms
tomcat9[732]: [9955.666s][info ][gc] GC(6292) To-space exhausted
- I see some messages in
dspace.log
about heap space:
Caused by: java.lang.OutOfMemoryError: Java heap space
- I will increase Tomcat’s heap from 4096m to 5120m
- A few hours later it happened again, so I increased the heap from 5120m to 6144m
- Not sure what’s going on today…
- I tested moving the CGIAR Fund Council community to the CGIAR historic archive on DSpace Test:
$ dspace community-filiator -r -p 10568/83389 -c 10947/2516
$ dspace community-filiator -s -p 10947/2515 -c 10947/2516
$ dspace index-discovery -r 10947/2516
$ dspace index-discovery -r 10947/2515
$ dspace index-discovery -r 10568/83389
$ dspace index-discovery
- I think this is the minimal we can do to avoid a full Discovery reindex which is very expensive
- I helped Maria resize some massive PDFs for upload to CGSpace using GhostScript prepress mode as I had done before in September, 2023,
2023-11-08
- DSpace 7 Test has very high load again and I see more Java heap space errors in the log
# grep -c 'Caused by: java.lang.OutOfMemoryError: Java heap space' /home/dspace7/log/dspace.log-2023-11-07
35
# grep -c 'Caused by: java.lang.OutOfMemoryError: Java heap space' /home/dspace7/log/dspace.log
7
- I don’t know what is happening… I will increase the heap size from 6144m to 7168m again…