mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2024-05-20
https://dspace7test.ilri.org/server/api/pid/find?id=10568/118424
```

## 2024-05-15

- I got journal titles from Crossref for 2,900 journal articles that were missing them
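The notes don't say how the titles were retrieved. A minimal sketch, assuming each article has a DOI, could use the public Crossref REST API (`/works/{doi}`), where the journal name is in the `container-title` list of the response's `message`. The helper names here are hypothetical:

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public Crossref REST API endpoint for a single work, looked up by DOI
CROSSREF_WORKS = "https://api.crossref.org/works/"


def work_url(doi: str, mailto: str) -> str:
    """Build the Crossref URL for a DOI, adding a mailto for the "polite" pool."""
    # Keep the "/" in the DOI as-is and percent-encode anything else that needs it
    path = urllib.parse.quote(doi, safe="/")
    return CROSSREF_WORKS + path + "?" + urllib.parse.urlencode({"mailto": mailto})


def journal_title(work: dict) -> Optional[str]:
    """Return the first journal title ("container-title") of a Crossref work, if any."""
    titles = work.get("message", {}).get("container-title", [])
    return titles[0] if titles else None


def fetch_journal_title(doi: str, mailto: str) -> Optional[str]:
    """Fetch one DOI from Crossref and return its journal title (network call)."""
    with urllib.request.urlopen(work_url(doi, mailto), timeout=30) as response:
        return journal_title(json.load(response))
```

`fetch_journal_title()` returns `None` when Crossref has no container title and raises `urllib.error.HTTPError` for a DOI that Crossref doesn't know. A batch run over a few thousand items should catch that error and pause between requests.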

## 2024-05-16

Helping IFPRI with some DSpace 7 API support; these are two queries for items issued in 2024:
- https://dspace7test.ilri.org/server/api/discover/search/objects?query=dcterms.issued:2024
- https://dspace7test.ilri.org/server/api/discover/search/objects?query=dcterms.issued_dt%3A%5B2024-01-01T00%3A00%3A00Z%20TO%20%2A%5D (the query is the URL-encoded form of the Lucene range syntax `dcterms.issued_dt:[2024-01-01T00:00:00Z TO *]`)

Both of them return the same number of results and seem identical as far as I can see, but the second one uses the Solr date index and requires the full Lucene datetime and range syntax
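Here is a small sketch of how the two query URLs above could be built and their counts compared from a script. It assumes that the DSpace 7 HAL search response exposes the hit count at `_embedded.searchResult.page.totalElements`. The function names are hypothetical:

```python
import json
import urllib.parse
import urllib.request

# Discovery endpoint used in the queries above
SEARCH_URL = "https://dspace7test.ilri.org/server/api/discover/search/objects"


def search_url(query: str) -> str:
    """Build a discovery search URL, percent-encoding the Lucene query."""
    # safe="" makes quote() also encode ':' '[' ']' '*' and spaces (e.g. ':' -> %3A)
    return SEARCH_URL + "?query=" + urllib.parse.quote(query, safe="")


def total_results(url: str) -> int:
    """Return the total number of hits reported by the search endpoint.

    Assumption: the response layout is _embedded.searchResult.page.totalElements.
    """
    with urllib.request.urlopen(url, timeout=30) as response:
        data = json.load(response)
    return data["_embedded"]["searchResult"]["page"]["totalElements"]
```

To compare the counts, call `total_results()` on `search_url("dcterms.issued:2024")` and on `search_url("dcterms.issued_dt:[2024-01-01T00:00:00Z TO *]")`. Note that `quote(..., safe="")` also encodes the `:` in the first query. The server should decode that the same way as the literal `:` used in the first URL above.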

I wrote a new version of the `check_duplicates.py` script to help identify duplicates that have different item types
- Initially I called it `check_duplicates_fast.py`, but it's actually not faster
- I need to find a way to deal with duplicates from IFPRI's repository, because some of their item types don't match ours...

## 2024-05-20

Continue working through alternative duplicate matching for IFPRI
- Their item types are sometimes different from ours...
- One thing I think I can say for sure is that the default similarity factor in my script is 0.6, but I rarely see legitimate duplicates with similarity that low, so I might increase it to 0.7 to reduce the number of items I have to check
- Also, the maximum difference in issue dates is currently 365 days, but I should reduce that a bit, perhaps to 270 days (about 9 months)
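The two threshold changes can be illustrated with a minimal sketch. This is not the logic of `check_duplicates.py`. It swaps in Python's `difflib.SequenceMatcher` ratio as a stand-in similarity measure, and the script's own measure may differ, which would change what 0.6 or 0.7 means. `Item` and the function names are hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher

# Proposed thresholds from the notes: 0.7 (up from 0.6) and 270 days (down from 365)
SIMILARITY_THRESHOLD = 0.7
MAX_DAYS_APART = 270


@dataclass
class Item:
    """Hypothetical minimal record: title, issue date, and item type."""
    title: str
    issued: date
    item_type: str


def title_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] from difflib, comparing case-folded titles."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def is_possible_duplicate(a: Item, b: Item) -> bool:
    """True if two items look like duplicates; item type is deliberately ignored."""
    # Cheap check first: issue dates must be within the allowed window
    if abs((a.issued - b.issued).days) > MAX_DAYS_APART:
        return False
    return title_similarity(a.title, b.title) >= SIMILARITY_THRESHOLD
```

Because the item type is left out of the match, a journal article on our side and a working paper on IFPRI's side can still be flagged. Printing `a.item_type` and `b.item_type` next to each flagged pair would make those mismatches easy to review by hand.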

<!-- vim: set sw=2 ts=2: -->