+
+2023-11-01
+
+- Work a bit on the ETL pipeline for the CGIAR Climate Change Synthesis
+
+- I improved the filtering and wrote some Python using pandas to merge my sources more reliably
+
+
+
+2023-11-02
+
+- Export CGSpace to check for missing Initiative collection mappings
+- Start a harvest on AReS
+
+
+- IFPRI contacted us about importing their Slideshare presentations to CGSpace
+
+- There are ~1,700 of them, dating back as early as 2008
+- I did a quick cleanup of the metadata export from Slideshare (including tagging with some AGROVOC in OpenRefine) and uploaded to DSpace Test
+
+
+
+2023-11-03
+
+- A little bit of work on the CGIAR Climate Change Synthesis
+- Discuss some CGSpace migration plans with Leigh from IFPRI
+
+- For their Slideshare content we agreed:
+
+- Exclude private
+- Exclude deleted
+- Exclude non-presentation types
+- Exclude duplicates within the collection for now until we can sort them out
+
+
+- That leaves about 1,500 items out of the 1,700
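The exclusion rules agreed above could be sketched as a simple filter over the metadata export. This is only an illustration: the field names (`privacy`, `status`, `type`, `title`) and their values are hypothetical and not the real Slideshare export schema. It also assumes that "exclude duplicates" means keeping the first copy of each title, although dropping every copy is an equally plausible reading.

```python
def filter_presentations(rows):
    """Apply the agreed exclusion rules to a list of metadata rows (dicts).

    All field names and values here are hypothetical placeholders.
    """
    seen_titles = set()
    kept = []
    for row in rows:
        if row.get("privacy") == "private":    # exclude private
            continue
        if row.get("status") == "deleted":     # exclude deleted
            continue
        if row.get("type") != "presentation":  # exclude non-presentation types
            continue
        # Normalize case and whitespace so trivially different titles collide
        key = " ".join(row.get("title", "").lower().split())
        if key in seen_titles:                 # exclude duplicates within the collection
            continue
        seen_titles.add(key)
        kept.append(row)
    return kept
```

Keeping the rules as separate early exits makes it easy to relax one of them later, for example once the internal duplicates have been sorted out.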
+
+
+- I did a duplicate check against CGSpace and found 44 items with 1.0 similarity so I removed those
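These notes don't say which tool or similarity measure produced the 1.0 scores, so the following is a stand-in rather than the method actually used. It sketches the idea with `difflib.SequenceMatcher` from the Python standard library: titles are normalized, compared against the existing titles, and anything that reaches the threshold is set aside as a duplicate. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(title.lower().split())


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means the normalized titles are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b), autojunk=False).ratio()


def split_duplicates(candidates, existing_titles, threshold=1.0):
    """Split candidate titles into (unique, duplicates) against existing titles."""
    unique, duplicates = [], []
    for title in candidates:
        best = max((similarity(title, e) for e in existing_titles), default=0.0)
        if best >= threshold:
            duplicates.append(title)
        else:
            unique.append(title)
    return unique, duplicates
```

Comparing every candidate against every existing title is quadratic, so for a repository the size of CGSpace a database-side index would be more practical. Lowering `threshold` would surface near-duplicates for manual review instead of automatic removal.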
+
+2023-11-04
+
+- Export CGSpace to check for missing Initiative collection mappings
+- I ran through the list of potential duplicates on the IFPRI Slideshare presentations
+
+2023-11-05
+
+- Work with Salem to migrate AReS to the new version
+
+2023-11-07
+
+- DSpace 7 Test went down and there is very high load on the server
+
+- I saw very high load from Java but didn’t have time to check exactly what was wrong so I just rebooted the host
+- A few hours after restarting, the system went down again, once more with very high load from Java
+- I see lots of messages like this in the Tomcat log:
+
+
+
+tomcat9[732]: [9955.662s][info ][gc] GC(6291) Pause Full (G1 Compaction Pause) 4085M->4080M(4096M) 677.251ms
+tomcat9[732]: [9955.662s][info ][gc] GC(6290) Concurrent Mark Cycle 677.558ms
+tomcat9[732]: [9955.666s][info ][gc] GC(6292) To-space exhausted
+
+- I see some messages in dspace.log about heap space:
+
+Caused by: java.lang.OutOfMemoryError: Java heap space
+
+- I will increase Tomcat’s heap from 4096m to 5120m
+
+- A few hours later it happened again, so I increased the heap from 5120m to 6144m
+- Not sure what’s going on today…
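The Full GC line quoted above already shows the problem: the heap holds 4085M of 4096M and a full compaction frees only 5M. Below is a small sketch for pulling those numbers out of such lines. The regular expression is fitted to the sample line in these notes and is an assumption for any other GC log format. The function name is hypothetical.

```python
import re

# Matches e.g. "Pause Full (G1 Compaction Pause) 4085M->4080M(4096M) 677.251ms"
FULL_GC = re.compile(r"Pause Full \(.*?\) (\d+)M->(\d+)M\((\d+)M\) ([\d.]+)ms")


def parse_full_gc(line):
    """Return heap statistics for a Full GC log line, or None if it is not one."""
    match = FULL_GC.search(line)
    if match is None:
        return None
    before, after, total = (int(match.group(i)) for i in (1, 2, 3))
    return {
        "reclaimed_mb": before - after,     # memory freed by this collection
        "occupancy_after": after / total,   # fraction of the heap still in use
        "pause_ms": float(match.group(4)),  # stop-the-world pause duration
    }


sample = (
    "tomcat9[732]: [9955.662s][info ][gc] GC(6291) Pause Full "
    "(G1 Compaction Pause) 4085M->4080M(4096M) 677.251ms"
)
stats = parse_full_gc(sample)
print(stats["reclaimed_mb"])  # 5
```

An occupancy that stays above 99% after a full collection means nearly all of the heap is live data. Raising the heap limit delays the `OutOfMemoryError` but does not explain what is holding the memory.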
+
+
+- I tested moving the CGIAR Fund Council community to the CGIAR historic archive on DSpace Test:
+
+$ dspace community-filiator -r -p 10568/83389 -c 10947/2516
+$ dspace community-filiator -s -p 10947/2515 -c 10947/2516
+$ dspace index-discovery -r 10947/2516
+$ dspace index-discovery -r 10947/2515
+$ dspace index-discovery -r 10568/83389
+$ dspace index-discovery
+
+- I think this is the minimum we can do to avoid a full Discovery reindex, which is very expensive
+- I helped Maria resize some massive PDFs for upload to CGSpace using Ghostscript prepress mode, as I had done before in September 2023
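The exact Ghostscript invocation is not recorded in this entry, so the following is an assumption: a common `pdfwrite` command line with the `/prepress` preset, wrapped in Python for batch use. The function names and file paths are hypothetical.

```python
import shutil
import subprocess


def ghostscript_prepress_command(src, dst):
    """Build a Ghostscript argv that rewrites src to dst with the prepress preset."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",        # write a new PDF
        "-dPDFSETTINGS=/prepress",  # the prepress quality preset
        "-dNOPAUSE",                # do not pause between pages
        "-dQUIET",                  # suppress routine messages
        "-dBATCH",                  # exit when processing is done
        f"-sOutputFile={dst}",
        src,
    ]


def resize_pdf(src, dst):
    """Run Ghostscript on one file; raises if gs is missing or fails."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    subprocess.run(ghostscript_prepress_command(src, dst), check=True)
```

Building the argument list separately from running it makes the command easy to inspect or print before touching any files.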
+
+2023-11-08
+
+- DSpace 7 Test has very high load again and I see more Java heap space errors in the log
+
+# grep -c 'Caused by: java.lang.OutOfMemoryError: Java heap space' /home/dspace7/log/dspace.log-2023-11-07
+35
+# grep -c 'Caused by: java.lang.OutOfMemoryError: Java heap space' /home/dspace7/log/dspace.log
+7
+
+- I don’t know what is happening… I will increase the heap size again, from 6144m to 7168m…
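To track whether the heap increases are helping, the `grep -c` counts above could be collected per log file. This is a minimal sketch that behaves roughly like `grep -c` with a fixed string. The function names are hypothetical, and only the marker text comes from these notes.

```python
OOM_MARKER = "Caused by: java.lang.OutOfMemoryError: Java heap space"


def count_oom(lines, marker=OOM_MARKER):
    """Count lines containing the marker in any iterable of strings."""
    return sum(1 for line in lines if marker in line)


def count_oom_in_file(path):
    """Count marker lines in one log file, tolerating odd bytes in the log."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_oom(handle)
```

For example, `count_oom_in_file("/home/dspace7/log/dspace.log")` would give the count for the current day's log. Running it over each dated log would show whether the errors become less frequent after each heap increase.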
+
+
+
+
+
+
+
+