CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

November, 2023

2023-11-01

  • Work a bit on the ETL pipeline for the CGIAR Climate Change Synthesis
    • I improved the filtering and wrote some Python using pandas to merge my sources more reliably

2023-11-02

  • Export CGSpace to check missing Initiative collection mappings
  • Start a harvest on AReS
Read more →

October, 2023

2023-10-02

  • Export CGSpace to check DOIs against Crossref
    • I found that Crossref’s metadata is in the public domain under the CC0 license
    • One interesting thing is the abstracts, which are copyrighted by the copyright owner, meaning Crossref cannot waive the copyright under the terms of the CC0 license, because it is not theirs to waive
    • We can be on the safe side by using only abstracts for items that are licensed under Creative Commons
Read more →

September, 2023

2023-09-02

  • Export CGSpace to check for missing Initiative collection mappings
  • Start a harvest on AReS
Read more →

August, 2023

2023-08-03

  • I finally got around to working on Peter’s cleanups for affiliations, authors, and donors from last week
    • I did some minor cleanups myself and applied them to CGSpace
  • Start working on some batch uploads for IFPRI
Read more →

July, 2023

2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as “Copyrighted; all rights reserved” based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it’s usually copyrighted (could still be open access, but we can’t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status… In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don’t like the Impact Area icons as a component because they don’t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several hours and even over one day old so I killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I’ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be “acceptedVersion”, which is presumably the author’s version, as opposed to the “publishedVersion”, which means it’s available as open access on the publisher’s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as describe in my DSpace issue from last year: https://github. Read more →

June, 2023

2023-06-02

  • Spend some time testing my post_bitstreams.py script to update thumbnails for items on CGSpace
    • Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail…
  • Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
    • They have experience with improving the MODS interface in MELSpace’s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace
    • From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk
Read more →

May, 2023

2023-05-03

  • Alliance’s TIP team emailed me to ask about issues authenticating on CGSpace
    • It seems their password expired, which is annoying
  • I continued looking at the CGSpace subjects for the FAO / AGROVOC exercise that I started last week
    • There are many of our subjects that would match if they added a “-” like “high yielding varieties” or used singular…
    • Also I found at least two spelling mistakes, for example “decison support systems”, which would match if it was spelled correctly
  • Work on cleaning, proofing, and uploading twenty-seven records for IFPRI to CGSpace
Read more →

April, 2023

2023-04-02

  • Run all system updates on CGSpace and reboot it
  • I exported CGSpace to CSV to check for any missing Initiative collection mappings
    • I also did a check for missing country/region mappings with csv-metadata-quality
  • Start a harvest on AReS
Read more →

March, 2023

2023-03-01

  • Remove cg.subject.wle and cg.identifier.wletheme from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)
  • iso-codes 4.13.0 was released, which incorporates my changes to the common names for Iran, Laos, and Syria
  • I finally got through with porting the input form from DSpace 6 to DSpace 7
Read more →

February, 2023

2023-02-01

  • Export CGSpace to cross check the DOI metadata with Crossref
    • I want to try to expand my use of their data to journals, publishers, volumes, issues, etc…
Read more →