CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

May, 2024

2024-05-01

  • I dumped all the CGSpace DOIs and resolved them with my crossref_doi_lookup.py script
    • Then I did some work to add missing abstracts (about 900!), along with volumes, issues, licenses, publishers, types, and so on
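
As a rough illustration of the fields involved, the sketch below pulls those values out of a Crossref work record, assumed here to be the `message` object returned by the Crossref REST API at `https://api.crossref.org/works/{doi}`. This is not the crossref_doi_lookup.py script itself: the function name `extract_fields`, the sample record, and its DOI are hypothetical.

```python
def extract_fields(message: dict) -> dict:
    """Pick a few metadata fields out of a Crossref work record.

    `message` is assumed to be the "message" object of a Crossref
    REST API /works/{doi} response. Fields that Crossref does not
    have for a DOI come back as None (or an empty list for licenses).
    """
    # Crossref lists licenses as objects that carry a "URL" key
    licenses = [lic.get("URL") for lic in message.get("license", [])]
    return {
        "doi": message.get("DOI"),
        "abstract": message.get("abstract"),
        "volume": message.get("volume"),
        "issue": message.get("issue"),
        "publisher": message.get("publisher"),
        "type": message.get("type"),
        "licenses": licenses,
    }


# Hypothetical record, trimmed to the fields used above (no abstract present)
sample = {
    "DOI": "10.1234/example",
    "type": "journal-article",
    "publisher": "Example Publisher",
    "volume": "12",
    "issue": "3",
    "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
}

fields = extract_fields(sample)
```

Comparing a dictionary like `fields` against the values already in the repository is one way to spot which items are missing an abstract, volume, or issue.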
Read more →

January, 2024

2024-01-02

  • Work on preparation of new server for DSpace 7 migration
    • I’m not quite sure what we need to do for the Handle server
    • For now I just ran the dspace make-handle-config script and diffed it with the one from DSpace 6
    • I sent the bundle to the Handle admins to make sure it’s OK before we do the migration
  • Continue testing and debugging the cgspace-java-helpers on DSpace 7
  • Work on IFPRI ISNAR archive cleanup
Read more →

December, 2023

2023-12-01

  • There is still high load on CGSpace and I don’t know why
    • I don’t see a high number of sessions compared to previous days in the last few weeks

    $ for file in dspace.log.2023-11-[23]*; do echo "$file"; grep -a -oE 'session_id=[A-Z0-9]{32}' "$file" | sort | uniq | wc -l; done
    dspace.log.2023-11-20
    22865
    dspace.log.2023-11-21
    20296
    dspace.log.2023-11-22
    19688
    dspace.log.2023-11-23
    17906
    dspace.log.2023-11-24
    18453
    dspace.log.2023-11-25
    17513
    dspace.log.2023-11-26
    19037
    dspace.log.2023-11-27
    21103
    dspace.log.2023-11-28
    23023
    dspace.log.2023-11-29
    23545
    dspace.
Read more →
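
For comparison, the same unique-session count from the shell loop above can be sketched in Python. This is only an illustrative equivalent, not something from the original notes: the function names are hypothetical, and reading with `errors="replace"` is an assumption that roughly stands in for `grep -a`.

```python
import re
from pathlib import Path

# Same pattern as the grep call: "session_id=" plus 32 uppercase letters or digits
SESSION_RE = re.compile(r"session_id=[A-Z0-9]{32}")


def count_unique_sessions(lines):
    """Count distinct session_id=... matches across an iterable of lines."""
    seen = set()
    for line in lines:
        seen.update(SESSION_RE.findall(line))
    return len(seen)


def count_sessions_in_file(path):
    """Count distinct sessions in one log file, tolerating stray binary bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_unique_sessions(handle)


if __name__ == "__main__":
    # Assumes the DSpace logs are in the current directory
    for log in sorted(Path(".").glob("dspace.log.2023-11-[23]*")):
        print(log.name)
        print(count_sessions_in_file(log))
```

Collecting matches in a set replaces the `sort | uniq | wc -l` stage, so the log is read only once and nothing needs to be sorted.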

November, 2023

2023-11-01

  • Work a bit on the ETL pipeline for the CGIAR Climate Change Synthesis
    • I improved the filtering and wrote some Python using pandas to merge my sources more reliably

2023-11-02

  • Export CGSpace to check for missing Initiative collection mappings
  • Start a harvest on AReS
Read more →

October, 2023

2023-10-02

  • Export CGSpace to check DOIs against Crossref
    • I found that Crossref’s metadata is in the public domain under the CC0 license
    • One interesting exception is the abstracts: they are copyrighted by their copyright owners, so Crossref cannot waive that copyright under the terms of the CC0 license, because it is not theirs to waive
    • We can be on the safe side by using only abstracts for items that are licensed under Creative Commons
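
A minimal sketch of that "safe side" rule, assuming the license is judged from the `license` URLs in a Crossref work record (the notes do not say whether the check used Crossref's license data or the item's own license metadata). The function names `is_creative_commons` and `abstract_if_cc` are hypothetical.

```python
from urllib.parse import urlparse


def is_creative_commons(url: str) -> bool:
    """Return True if the URL points at creativecommons.org or a subdomain."""
    host = (urlparse(url).hostname or "").lower()
    return host == "creativecommons.org" or host.endswith(".creativecommons.org")


def abstract_if_cc(message: dict):
    """Return the abstract only when the record carries a Creative Commons license.

    `message` is assumed to be the "message" object of a Crossref
    REST API work record. Returns None when there is no CC license
    (or no abstract), so the abstract is simply not reused.
    """
    licenses = message.get("license", [])
    if any(is_creative_commons(lic.get("URL", "")) for lic in licenses):
        return message.get("abstract")
    return None
```

Matching on the hostname rather than on a substring avoids treating an unrelated URL that merely mentions "creativecommons" as a Creative Commons license.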
Read more →

September, 2023

2023-09-02

  • Export CGSpace to check for missing Initiative collection mappings
  • Start a harvest on AReS
Read more →

August, 2023

2023-08-03

  • I finally got around to working on Peter’s cleanups for affiliations, authors, and donors from last week
    • I did some minor cleanups myself and applied them to CGSpace
  • Start working on some batch uploads for IFPRI
Read more →