2023-10-02
- Export CGSpace to check DOIs against Crossref
- I found that Crossref’s metadata is in the public domain under the CC0 license
- One interesting thing is the abstracts: they remain copyrighted by their respective owners, so Crossref cannot waive that copyright under the terms of the CC0 license, because it is not theirs to waive
- We can be on the safe side by using only abstracts for items that are licensed under Creative Commons
- This GREL extracts the text content of the <jats:p> tags (i.e., no other JATS XML markup tags like <jats:i>, <jats:sub>, etc.):
forEach(value.parseXml().select("jats|p"),i,i.xmlText()).join("")
- Note that we need to use select("jats|p") instead of select("jats:p") for OpenRefine’s parseXml, and we need to join() on the end
- I updated metadata for about 3,000 items using Crossref metadata
- I stripped trailing periods from our titles where the Crossref titles did not have them
- I copied abstracts for about 600 Creative Commons-licensed items that were missing them
- I updated publishers for a few thousand more where ours and Crossref disagreed, checking a handful manually first
- I also added subjects to the crossref_doi_lookup.py script to see if they will be useful for us
- When checking with csv-metadata-quality I can validate those subjects against AGROVOC and add them if they are valid
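The abstract handling above (only use abstracts from Creative Commons items, and keep only the text of the <jats:p> tags) could also be sketched in Python. This is a rough illustration, not what crossref_doi_lookup.py actually does. It assumes a Crossref work record shaped like a dict with an "abstract" string and a "license" list whose entries carry a "URL" key; the function names and the sample record are made up for the example.

```python
import xml.etree.ElementTree as ET

# Crossref abstracts use the "jats:" prefix without declaring it, so we wrap
# the fragment in a root element that declares the prefix ourselves.
JATS_NS = "http://www.ncbi.nlm.nih.gov/JATS1"


def is_creative_commons(work: dict) -> bool:
    """True if any license URL on the record points at creativecommons.org.

    Assumption: licenses look like [{"URL": "https://..."}, ...].
    """
    return any(
        "creativecommons.org" in lic.get("URL", "")
        for lic in work.get("license", [])
    )


def jats_paragraph_text(abstract: str) -> str:
    """Join the text of all <jats:p> elements, dropping inline markup
    such as <jats:i> and <jats:sub> but keeping their text."""
    wrapped = f'<root xmlns:jats="{JATS_NS}">{abstract}</root>'
    root = ET.fromstring(wrapped)
    return "".join(
        "".join(p.itertext()) for p in root.iter(f"{{{JATS_NS}}}p")
    )


def safe_abstract(work: dict):
    """Return the plain-text abstract only for Creative Commons items."""
    if "abstract" not in work or not is_creative_commons(work):
        return None
    return jats_paragraph_text(work["abstract"])


# Hypothetical record for illustration only
example = {
    "abstract": "<jats:title>Abstract</jats:title>"
    "<jats:p>Yields of <jats:i>Oryza sativa</jats:i> rose.</jats:p>",
    "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
}
print(safe_abstract(example))
```

Like the GREL, this skips anything outside the <jats:p> tags (for example a <jats:title>), and it returns None when the item has no Creative Commons license, so nothing gets copied by accident.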
2023-10-03
- I added the item type to the collection subscription email on DSpace 6
- It’s done differently on DSpace 7 so I’ll have to see how to do it there…
- Test a patch that fixes a bug with item versioning disabled in DSpace 7
- I hadn’t realized that DSpace 7 defaulted to versioning being enabled, whereas we never used this in DSpace 6 (yet)
- Submit an issue regarding duplicate Discovery sort fields in DSpace 7
2023-10-05
- Some discussion this week about issue and online dates for journal articles, with regard to PRMS
- I looked more closely at the Crossref API docs and realized (again) that their “issue” date is not the same as our issue date—they take the earlier of the print and online dates!
- Also, very many items have no print date at all, perhaps due to delays, errors, or simply because the journal is “online only”!
- I suggested again that PRMS should consider both, take the earlier of the two, and then check whether that date falls in the current reporting period
- I managed to find 80 items with print publishing dates from 2023 and updated those from Crossref, but for the rest we will have to think about how we handle them
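The "take the earlier of the two" logic I suggested could look roughly like this in Python. It is only a sketch under a few assumptions: the record is a dict with optional "published-print" and "published-online" keys, each holding Crossref-style "date-parts" such as [[2023, 5, 12]] (possibly with the month or day missing); a missing month or day is padded with 1; and the reporting period is a calendar year. The function names are hypothetical and this is not how PRMS does it.

```python
from datetime import date


def date_from_parts(field):
    """Convert {"date-parts": [[Y, M, D]]} to a date, or None if unusable.

    Assumption: a missing month or day is treated as 1.
    """
    if not field:
        return None
    parts_list = field.get("date-parts") or [[]]
    parts = parts_list[0]
    if not parts or parts[0] is None:
        return None
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 1
    day = parts[2] if len(parts) > 2 else 1
    return date(year, month, day)


def earliest_publication_date(work: dict):
    """Earlier of the print and online dates, or None if neither exists."""
    candidates = [
        date_from_parts(work.get(key))
        for key in ("published-print", "published-online")
    ]
    candidates = [d for d in candidates if d is not None]
    return min(candidates) if candidates else None


def in_reporting_period(work: dict, year: int) -> bool:
    """True if the earliest publication date falls in the given year."""
    earliest = earliest_publication_date(work)
    return earliest is not None and earliest.year == year


# Hypothetical record: online in late 2022, in print in early 2023
example = {
    "published-print": {"date-parts": [[2023, 1]]},
    "published-online": {"date-parts": [[2022, 11, 30]]},
}
print(earliest_publication_date(example), in_reporting_period(example, 2023))
```

An "online only" item with no print date simply falls back to its online date, and an item with neither date returns None so it can be set aside for manual review.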