diff --git a/content/posts/2023-12.md b/content/posts/2023-12.md
index 95041eaae..67f1b9dea 100644
--- a/content/posts/2023-12.md
+++ b/content/posts/2023-12.md
@@ -185,4 +185,87 @@ dspace=*# COMMIT;
 COMMIT
 ```
 
+## 2023-12-25
+
+- Looking into [Solr backups](https://solr.apache.org/guide/8_11/making-and-restoring-backups.html)
+  - Since we are not running in SolrCloud mode, we need to use the replication endpoint for Solr standalone
+  - This works:
+
+```console
+$ curl 'http://localhost:8983/solr/statistics/replication?command=backup'
+{
+  "responseHeader":{
+    "status":0,
+    "QTime":26},
+  "status":"OK"}
+```
+
+- Then I watched the snapshot grow until it reached the size of the index:
+
+```console
+# du -sh /var/solr/data/configsets/statistics/data/*
+22G	/var/solr/data/configsets/statistics/data/index
+16G	/var/solr/data/configsets/statistics/data/snapshot.20231225074111671
+4.0K	/var/solr/data/configsets/statistics/data/snapshot_metadata
+# du -sh /var/solr/data/configsets/statistics/data/*
+22G	/var/solr/data/configsets/statistics/data/index
+20G	/var/solr/data/configsets/statistics/data/snapshot.20231225074111671
+4.0K	/var/solr/data/configsets/statistics/data/snapshot_metadata
+# du -sh /var/solr/data/configsets/statistics/data/*
+22G	/var/solr/data/configsets/statistics/data/index
+21G	/var/solr/data/configsets/statistics/data/snapshot.20231225074111671
+4.0K	/var/solr/data/configsets/statistics/data/snapshot_metadata
+# du -sh /var/solr/data/configsets/statistics/data/*
+22G	/var/solr/data/configsets/statistics/data/index
+22G	/var/solr/data/configsets/statistics/data/snapshot.20231225074111671
+4.0K	/var/solr/data/configsets/statistics/data/snapshot_metadata
+```
+
+- Then I deleted all documents in the core and restored it from the snapshot backup:
+
+```console
+$ curl http://localhost:8983/solr/statistics/update -H "Content-type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
+$ curl http://localhost:8983/solr/statistics/update -H "Content-type: text/xml" --data-binary '<commit />'
+$ curl 'http://localhost:8983/solr/statistics/replication?command=restore&name=statistics'
+```
+
+- Interestingly, the restore worked fine, but it created a new data index directory:
+
+```console
+# du -sh /var/solr/data/configsets/statistics/data/*
+4.0K	/var/solr/data/configsets/statistics/data/index.properties
+22G	/var/solr/data/configsets/statistics/data/restore.20231225154626463
+4.0K	/var/solr/data/configsets/statistics/data/snapshot_metadata
+22G	/var/solr/data/configsets/statistics/data/snapshot.statistics
+```
+
+- I'm not sure of the implications of that, but Solr uses the data just fine
+- I can surely use this for atomic Solr backups
+
+## 2023-12-27
+
+- Delete duplicate metadata as described in my DSpace issue from last year: https://github.com/DSpace/DSpace/issues/8253
+- Do some other metadata cleanups on CGSpace
+  - I also looked up our DOIs on Crossref to get some missing abstracts and correct licenses and dates
+- Some minor work on the CGSpace DSpace 7 theme to fix the navbar on mobile
+- Some work on the IFPRI ISNAR archive
+
+## 2023-12-28
+
+- I started porting the [cgspace-java-helpers](https://github.com/ilri/cgspace-java-helpers) to DSpace 7
+- Some work on the IFPRI ISNAR archive
+  - I ended up going through most of the PDFs to get better dates and abstracts
+
+## 2023-12-29
+
+- I created a new Hetzner server to replace the current DSpace 6 CGSpace next week when we migrate to DSpace 7
+- Interestingly, I haven't checked for content pointing to legacy domains in several years (!)
+  - `inurl:mahider.cgiar.org`: 0 results on Google!
+  - `inurl:mahider.ilri.org`: 2,100 results on Google
+  - `inurl:mahider.ilri.org inurl:https`: 2 results on Google (!)
+  - `inurl:dspace.ilri.org`: 1,390 results on Google
+  - `inurl:dspace.ilri.org inurl:https`: 0 results on Google (!)
+- So it seems I can finally do away with the HTTPS virtual hosts
+  - Well, my current certificates expired on 2021-02-13 and nobody noticed... so...
+
diff --git a/docs/2023-07/index.html b/docs/2023-07/index.html
index 9fb91b3c1..1dad980f2 100644
--- a/docs/2023-07/index.html
+++ b/docs/2023-07/index.html
@@ -7,17 +7,17 @@
-
+
-
+
-
+
@@ -30,7 +30,7 @@
   "url": "https://alanorth.github.io/cgspace-notes/2023-07/",
   "wordCount": "2255",
   "datePublished": "2023-07-01T17:14:36+03:00",
-  "dateModified": "2023-08-02T23:04:11+03:00",
+  "dateModified": "2023-12-27T10:48:32+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -170,7 +170,7 @@
  • I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses
-  • Delete duplicate metadata as describe in my DSpace issue from last year: https://github.com/DSpace/DSpace/issues/8253
+  • Delete duplicate metadata as described in my DSpace issue from last year: https://github.com/DSpace/DSpace/issues/8253
  • Start working on some statistics on AGROVOC usage for my presentation next week
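The 2023-12-25 notes drive Solr's replication handler by hand with curl and track backup progress with `du`. The following is a minimal Python sketch of the same backup, wipe, and restore sequence. It assumes a standalone (non-SolrCloud) Solr at `localhost:8983` with a core named `statistics`. It also assumes that `command=restorestatus` (documented in the Solr 8.11 backup guide) returns JSON shaped like `{"restorestatus": {"status": ...}}`. The helper names `replication_url`, `delete_all_request`, `fetch_json`, and `wait_for_restore` are hypothetical.

```python
# Sketch only: helper names are hypothetical. Assumes a standalone (non-SolrCloud)
# Solr at localhost:8983 and that the replication handler answers with JSON (wt=json).
import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict

SOLR_BASE = "http://localhost:8983/solr"  # assumed base URL


def replication_url(core: str, command: str, **params: str) -> str:
    """Build a /replication URL, e.g. command=backup, restore, or restorestatus."""
    query = urllib.parse.urlencode({"command": command, "wt": "json", **params})
    return f"{SOLR_BASE}/{core}/replication?{query}"


def delete_all_request(core: str) -> urllib.request.Request:
    """POST request that deletes every document in the core and commits in one go."""
    return urllib.request.Request(
        f"{SOLR_BASE}/{core}/update?commit=true",
        data=b"<delete><query>*:*</query></delete>",
        headers={"Content-Type": "text/xml"},
    )


def fetch_json(url: str) -> Dict[str, Any]:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_restore(
    core: str,
    fetch: Callable[[str], Dict[str, Any]] = fetch_json,
    interval: float = 10.0,
    timeout: float = 3600.0,
) -> str:
    """Poll command=restorestatus until the status is no longer "In Progress".

    Assumes a response shaped like {"restorestatus": {"status": ...}}.
    """
    deadline = time.monotonic() + timeout
    while True:
        reply = fetch(replication_url(core, "restorestatus"))
        status = reply["restorestatus"]["status"]
        if status != "In Progress":
            return status  # e.g. "success" or "failed"
        if time.monotonic() > deadline:
            raise TimeoutError(f"restore of core {core!r} is still in progress")
        time.sleep(interval)
```

Used in sequence, that would be `fetch_json(replication_url("statistics", "backup", name="statistics"))`, then `urllib.request.urlopen(delete_all_request("statistics"))`, then `fetch_json(replication_url("statistics", "restore", name="statistics"))`, and finally `wait_for_restore("statistics")`. The backup `name` matters: Solr writes the backup to `snapshot.<name>`, which is where the restore with the same `name` looks, and without a name it falls back to a timestamped `snapshot.<yyyyMMddHHmmssSSS>` directory. The `fetch` parameter is injectable so the polling logic can be exercised without a live Solr.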
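The 2023-12-27 entry mentions looking up DOIs on Crossref to fill in missing abstracts and correct licenses and dates. The following is a minimal Python sketch of that lookup, assuming the public Crossref REST API works endpoint (`https://api.crossref.org/works/{doi}`). Its `message` object may carry `abstract` (JATS XML), a `license` list, and `issued.date-parts`, and any of these can be absent. The helper names are hypothetical, and preferring the license tagged `content-version: "vor"` (version of record) is a choice made for this sketch, not necessarily what was done on CGSpace.

```python
# Sketch only: helper names are hypothetical. Uses the Crossref REST API works
# endpoint; abstract, license, and issued are optional and often missing.
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Optional


def crossref_url(doi: str) -> str:
    """URL of the Crossref works record for a DOI (slashes are kept as-is)."""
    return "https://api.crossref.org/works/" + urllib.parse.quote(doi)


def fetch_work(doi: str) -> Dict[str, Any]:
    """Fetch the "message" object of a Crossref works record."""
    with urllib.request.urlopen(crossref_url(doi)) as resp:
        return json.load(resp)["message"]


def format_date_parts(parts: List[Optional[int]]) -> Optional[str]:
    """Turn Crossref date-parts such as [2020, 5, 1] into "2020-05-01"."""
    if not parts or parts[0] is None:
        return None
    year, *rest = parts
    return "-".join([str(year)] + [f"{p:02d}" for p in rest])


def summarize(work: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pick out the abstract, one license URL, and the issued date, if present."""
    licenses = work.get("license", [])
    # Prefer the license attached to the version of record ("vor"), if any.
    vor = [lic["URL"] for lic in licenses if lic.get("content-version") == "vor"]
    license_url = vor[0] if vor else (licenses[0]["URL"] if licenses else None)
    date_parts = work.get("issued", {}).get("date-parts", [[]])[0]
    return {
        "abstract": work.get("abstract"),  # JATS XML, e.g. "<jats:p>...</jats:p>"
        "license": license_url,
        "issued": format_date_parts(date_parts),
    }
```

For a real record this would be `summarize(fetch_work(doi))`. The returned JATS abstract still needs its tags stripped before it goes into DSpace metadata.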