Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2025-01-27 05:49:12 +01:00)
Add notes for 2023-06-08
- Start working on updating the MODS schema in CGSpace from 3.1 to 3.8 based on Stefano and Salem's work last year
## 2023-06-08
- Continue working on the MODS schema mapping
- Export CGSpace to check and update `dcterms.extent` fields
  - I normalized about 1,500 values to use either the "p. 1-6" or the "5 p." format
  - Also, I used this GREL expression to extract missing pages from the citation field: `cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*(pp?\.\s?\d+[-–]\d+).*/)[0]`
    - This covered over 4,000 items with a format like "p. 1-6" or "pp. 1-6" in the citation
  - I used another GREL expression to extract pages for another 5,000 items: `cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*?(\d+\s+?[Pp]+\.).*/)[0]`
    - This was for formats like "1 p." (note we had to protect against the greedy `.*` at the beginning by making it lazy with `.*?`, otherwise it swallows all but the last digit of the count and "10 p." becomes "0 p.")
  - I also did some work to capture a handful of missing DOIs and ISSNs, but it was only about 100 items and I will have to wait until the 10,000+ above finish importing
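
The notes don't say exactly how the 1,500 extent values were normalized (presumably in OpenRefine, alongside the GREL work). As a minimal sketch, assuming the only goal is to coerce common variants into the two target forms "p. 1-6" and "5 p.", a hypothetical Python helper `normalize_extent` could look like this; the variants it recognizes are assumptions, not an inventory of what CGSpace actually contained.

```python
import re

# Assumed page-range variants: "pp. 1-6", "pp 1 - 6", "p.1–6" (en dash), "Pages 1-6"
RANGE_RE = re.compile(r"^(?:pp?\.?|pages?)\s*(\d+)\s*[-–]\s*(\d+)$", re.IGNORECASE)

# Assumed page-count variants: "5 pp.", "5p", "5 p", "5 pages"
COUNT_RE = re.compile(r"^(\d+)\s*(?:pp?\.?|pages?)$", re.IGNORECASE)


def normalize_extent(value: str) -> str:
    """Return "p. X-Y" or "N p." for recognized variants, otherwise the input unchanged."""
    text = value.strip()
    m = RANGE_RE.match(text)
    if m:
        return f"p. {m.group(1)}-{m.group(2)}"
    m = COUNT_RE.match(text)
    if m:
        return f"{m.group(1)} p."
    return value


for raw in ["pp. 1-6", "p.1–6", "5 pp.", "5 pages", "xii, 45 p."]:
    print(f"{raw!r} -> {normalize_extent(raw)!r}")
```

Values that match neither pattern (for example "xii, 45 p.") are returned unchanged, so odd cases are left for manual review instead of being silently rewritten.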
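
GREL's `match()` only succeeds if the regular expression matches the whole string, and it returns an array of the capture groups. That is why both expressions are wrapped in `.*` and indexed with `[0]`. Python's `re.fullmatch()` has the same whole-string semantics, so the first expression can be checked outside OpenRefine with a sketch like this (the function name `extract_page_range` is hypothetical):

```python
import re
from typing import Optional

# The first GREL expression, unchanged. re.fullmatch() mirrors GREL's match(),
# which must match the entire string and returns the capture groups.
PAGE_RANGE = re.compile(r".*(pp?\.\s?\d+[-–]\d+).*")


def extract_page_range(citation: str) -> Optional[str]:
    """Python analogue of value.match(/.../)[0]; returns None when nothing matches."""
    m = PAGE_RANGE.fullmatch(citation)
    return m.group(1) if m else None


print(extract_page_range("Nairobi: ILRI. pp. 1-6."))
```

One side effect worth knowing: because the leading `.*` is greedy, the capture group starts at the last possible "p", so "pp. 1-6" comes back as "p. 1-6", which happens to be the normalized form (and if a citation contains several ranges, the last one wins). Values with no space or an en dash, such as "p.12–18", still need normalizing afterwards.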
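
The greedy versus lazy problem in the second expression is easy to reproduce. This sketch runs the same pattern with a greedy `.*` and a lazy `.*?` prefix against one made-up citation; `GREEDY`, `LAZY` and `page_count` are illustrative names.

```python
import re
from typing import Optional

# Greedy prefix: .* first takes everything, then gives characters back from the
# end, so the group starts at the rightmost place it can match and \d+ is left
# with only the last digit of a multi-digit count.
GREEDY = re.compile(r".*(\d+\s+?[Pp]+\.).*")

# Lazy prefix (as in the second GREL expression): .*? takes as little as
# possible, so \d+ starts at the first digit of the count.
LAZY = re.compile(r".*?(\d+\s+?[Pp]+\.).*")


def page_count(pattern: "re.Pattern[str]", citation: str) -> Optional[str]:
    m = pattern.fullmatch(citation)
    return m.group(1) if m else None


citation = "ILRI Research Report 12. Nairobi: ILRI. 124 p."
print("greedy:", page_count(GREEDY, citation))
print("lazy:  ", page_count(LAZY, citation))
```

With the greedy prefix "124 p." is captured as "4 p.". The lazy prefix fixes that, but it also means the first "N p."-like token in a citation wins, so the extracted values are still worth a spot check.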
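
The export, fill-in and re-import round trip was presumably done in OpenRefine. For reference, the same fill-in logic can be sketched with Python's standard `csv` module: try the page-range expression first, fall back to the page-count expression, and only touch rows whose extent is empty. The citation column name comes from the GREL expressions; the extent column name `dcterms.extent[en_US]` is an assumption about the export, and the helper names are hypothetical.

```python
import csv
import io
import re
from typing import Optional

# The citation column name is taken from the GREL expressions; the extent
# column name is an assumption about the export format.
CITATION = "dcterms.bibliographicCitation[en_US]"
EXTENT = "dcterms.extent[en_US]"

PAGE_RANGE = re.compile(r".*(pp?\.\s?\d+[-–]\d+).*")  # e.g. "p. 1-6", "pp. 1-6"
PAGE_COUNT = re.compile(r".*?(\d+\s+?[Pp]+\.).*")  # e.g. "1 p.", "24 pp."


def pages_from_citation(citation: str) -> Optional[str]:
    """Try the page-range pattern first, then fall back to the page-count pattern."""
    for pattern in (PAGE_RANGE, PAGE_COUNT):
        m = pattern.fullmatch(citation)
        if m:
            return m.group(1)
    return None


def fill_missing_extents(csv_text: str) -> str:
    """Fill empty extent cells from the citation; all other cells are copied as-is."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if not row.get(EXTENT) and row.get(CITATION):
            pages = pages_from_citation(row[CITATION])
            if pages:
                row[EXTENT] = pages
        writer.writerow(row)
    return out.getvalue()
```

Leaving rows that already have an extent alone keeps the change set limited to the missing values; the extracted strings can then go through the same normalization as the existing ones.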
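
The notes don't record how the missing DOIs and ISSNs were captured. One hypothetical way to pull them out of a citation string is sketched below: the DOI pattern is the one Crossref suggests for modern DOIs (a heuristic that misses some older ones), and ISSN-shaped candidates are filtered with the standard ISSN check digit (weights 8 down to 2, modulo 11, with 10 written as X). The function names and sample text are made up for illustration.

```python
import re
from typing import List

# Crossref's suggested pattern for modern DOIs; a heuristic, not a full DOI grammar.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

# ISSN shape: four digits, hyphen, three digits, then a check digit or X.
ISSN_RE = re.compile(r"\b(\d{4})-(\d{3}[\dXx])\b")


def issn_is_valid(issn: str) -> bool:
    """Verify the ISSN check digit: weights 8..2, modulo 11, with 10 written as X."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return chars[7] == ("X" if check == 10 else str(check))


def find_dois(text: str) -> List[str]:
    """Return DOI-like strings, minus trailing punctuation from the citation."""
    return [m.group(0).rstrip(".,;") for m in DOI_RE.finditer(text)]


def find_issns(text: str) -> List[str]:
    """Return ISSN-shaped strings whose check digit is valid."""
    candidates = (f"{a}-{b.upper()}" for a, b in ISSN_RE.findall(text))
    return [issn for issn in candidates if issn_is_valid(issn)]


sample = "Made-up citation. ISSN 0378-5955, 1050-124X; pp. 1234-5678. doi: 10.1234/example.5678."
print(find_dois(sample))
print(find_issns(sample))
```

The check digit rejects most lookalikes, such as a page range like "1234-5678", but not all of them, so matches still deserve a glance before they are imported.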
<!-- vim: set sw=2 ts=2: -->