CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

July, 2024

2024-07-01

  • A bit of work to clean up duplicate DOIs on CGSpace
    • A handful of book chapters, working papers, and journal articles using the wrong DOI
  • I tried to delete all users who have been inactive since six years ago (July 1, 2018):
$ dspace dsrun org.dspace.eperson.Groomer -a -b 07/01/2018 -d

2024-07-11

  • Minor fixes to normalize the IFPRI CONTENTdm URLs in provenance fields:
dspace=# BEGIN;
BEGIN
dspace=*# UPDATE metadatavalue SET text_value = replace(text_value, 'cdm/ref', 'digital') WHERE text_value LIKE '%CONTENTdm%cdm/ref/%';
UPDATE 1876
dspace=*# UPDATE metadatavalue SET text_value = replace(text_value, 'CONTENTdm:  ', 'CONTENTdm: ') WHERE text_value LIKE '%CONTENTdm:  %';
UPDATE 21
dspace=*# COMMIT;
COMMIT
  • Then export a new list of CONTENTdm redirects, excluding withdrawn items:
dspace= ☘ \COPY (SELECT m.text_value, h.handle FROM metadatavalue m JOIN handle h on m.dspace_object_id = h.resource_id WHERE m.metadata_field_id=28 AND m.text_value LIKE '%URL from IFPRI CONTENTdm%' AND h.resource_type_id=2 AND m.dspace_object_id IN (SELECT uuid FROM item WHERE in_archive AND NOT withdrawn)) to /tmp/ifpri.csv CSV HEADER;
COPY 8568
  • Similarly, get a list of withdrawn item redirects:
dspace= ☘ \COPY (SELECT m.text_value AS handle_from, h.handle AS handle_to FROM metadatavalue m JOIN handle h on m.dspace_object_id = h.resource_id WHERE m.metadata_field_
id=181 AND h.resource_type_id=2 AND h.resource_id IN (SELECT uuid FROM item WHERE in_archive AND NOT withdrawn)) to /tmp/handle-redirects.csv CSV HEADER;
COPY 396

2024-07-18

  • I experimented with adding a regular expression to validate DOIs to the submission form