2023-03-01
- Remove cg.subject.wle and cg.identifier.wletheme from the CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)
- iso-codes 4.13.0 was released, which incorporates my changes to the common names for Iran, Laos, and Syria
- I finally finished porting the input form from DSpace 6 to DSpace 7
2023-02-01
- Export CGSpace to cross-check the DOI metadata with Crossref
- I want to try to expand my use of their data to journals, publishers, volumes, issues, etc…
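Before cross-checking, the DOIs from the export probably need normalizing, since they may be stored as resolver URLs (https://doi.org/…), with a doi: prefix, or bare, and DOI names are case-insensitive. A minimal sketch, assuming those three storage forms; normalize_doi and crossref_works_url are hypothetical helpers, and only the https://api.crossref.org/works/{doi} endpoint belongs to Crossref:

```python
import re
from urllib.parse import quote

# Hypothetical: strip common resolver prefixes so every DOI is in bare form.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(value: str) -> str:
    """Return a bare, lowercased DOI (DOI names are case-insensitive)."""
    return _DOI_PREFIX.sub("", value.strip()).lower()


def crossref_works_url(doi: str) -> str:
    """Build the Crossref REST API URL for one work's metadata."""
    # Keep "/" unescaped because it separates the DOI prefix and suffix.
    return "https://api.crossref.org/works/" + quote(normalize_doi(doi), safe="/")
```

Comparing normalized DOIs on both sides avoids false mismatches that are only differences in case or prefix.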
2023-01-01
- Apply some more ORCID identifiers to items on CGSpace using my 2022-09-22-add-orcids.csv file
- I want to update all ORCID names and refresh them in the database
- I see we have some new ones that aren’t in our list if I combine with this file:
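Finding the identifiers that aren't in our list yet is a set difference once the ORCID iDs are pulled out of each file. A minimal sketch, assuming the iDs appear in the standard hyphenated form (four groups of four characters, the last possibly the checksum "X"); the orcids and new_orcids helpers are hypothetical and work on any text, not a specific file format:

```python
import re

# Standard ORCID iD form: 0000-0000-0000-000X, where the final character
# may be the checksum "X".
ORCID_RE = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{3}[\dX]\b")


def orcids(text: str) -> set[str]:
    """Extract all ORCID iDs that appear anywhere in the text."""
    return set(ORCID_RE.findall(text))


def new_orcids(our_list: str, other: str) -> set[str]:
    """ORCID iDs present in `other` but missing from `our_list`."""
    return orcids(other) - orcids(our_list)
```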
2022-12-01
- Fix some incorrect regions on CGSpace
- I exported the CCAFS and IITA communities, extracted just the country and region columns, then ran them through csv-metadata-quality to fix the regions
- Add a few more authors to my CSV with author names and ORCID identifiers and tag 283 items!
- Replace “East Asia” with “Eastern Asia” region on CGSpace (UN M.49 region)
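The region replacement should match whole values rather than substrings, so that a value that merely contains the phrase is left alone, and it should not leave duplicates if an item already has "Eastern Asia". A minimal sketch, assuming the regions for one item sit in a single DSpace CSV cell using the usual || multi-value separator; fix_regions is a hypothetical helper:

```python
def fix_regions(value: str, old: str = "East Asia", new: str = "Eastern Asia") -> str:
    """Replace one region term in a ||-separated field and drop duplicates."""
    result: list[str] = []
    for region in value.split("||"):
        region = region.strip()
        # Exact match on the whole value, never a substring replacement.
        if region == old:
            region = new
        # Skip empty values and keep only the first occurrence of each region.
        if region and region not in result:
            result.append(region)
    return "||".join(result)
```

Applied to each row of the exported region column, this keeps the original order of the remaining regions.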
2022-11-01
- Last night I re-synced DSpace 7 Test from CGSpace
- I also updated all my local 7_x-dev branches to the latest upstreams
- I spent some time updating the authorizations in Alliance collections
- I want to make sure they use groups instead of individuals where possible!
- I reverted the Cocoon autosave change because Peter not being able to upload CSVs from the web interface was more of a nuisance than the security issue, which is of very low severity
2022-10-01
- Started a harvest on AReS last night
- Yesterday I figured out how to use GraphicsMagick with im4java and I want to revisit some of my thumbnail tests
2022-09-01
- A bit of work on the “Mapping CG Core–CGSpace–MEL–MARLO Types” spreadsheet
- I tested an item submission on DSpace Test with the Cocoon org.apache.cocoon.uploads.autosave=false change
- The submission works as expected
- Start debugging some region-related issues with csv-metadata-quality
- I created a new test file test-geography.csv with some different scenarios
- I also fixed a few bugs and improved the region-matching logic
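One simple form of region matching is to look up the expected region for each country on an item and report the regions that are missing from its region field. The sketch below is not csv-metadata-quality's actual logic: COUNTRY_TO_REGION is a hypothetical, hand-written excerpt of the UN M.49 country-to-region mapping, and missing_regions is an invented helper that assumes ||-separated multi-value fields:

```python
# Hypothetical excerpt of a UN M.49 country -> region mapping; a real check
# would need the complete table.
COUNTRY_TO_REGION = {
    "Kenya": "Eastern Africa",
    "Nigeria": "Western Africa",
    "Viet Nam": "South-eastern Asia",
    "China": "Eastern Asia",
}


def missing_regions(countries: str, regions: str) -> list[str]:
    """Regions implied by the item's countries but absent from its regions."""
    present = {r.strip() for r in regions.split("||") if r.strip()}
    missing: list[str] = []
    for country in countries.split("||"):
        region = COUNTRY_TO_REGION.get(country.strip())
        # Unknown countries are skipped; each missing region is reported once.
        if region and region not in present and region not in missing:
            missing.append(region)
    return missing
```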
2022-08-01
2022-07-02
- I learned how to use the Levenshtein functions in PostgreSQL
- There is a limit of 255 characters for these functions in PostgreSQL, so you need to truncate the strings before comparing them
- Also, the trgm functions I've used before are case insensitive, but Levenshtein is not, so you need to lowercase both strings first
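The two workarounds amount to comparing truncated, lowercased strings; in SQL that would look something like levenshtein(lower(left(a, 255)), lower(left(b, 255))). To show what is being computed, here is a minimal Python sketch of the classic dynamic-programming edit distance with the same truncation and lowercasing applied first; it is an illustration, not PostgreSQL's fuzzystrmatch code:

```python
def levenshtein(a: str, b: str, max_len: int = 255) -> int:
    """Edit distance after the same workarounds used in PostgreSQL:
    truncate both strings to max_len characters and lowercase them."""
    a, b = a[:max_len].lower(), b[:max_len].lower()
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]
```

Because both sides are lowercased, "Orth, Alan" and "orth, alan" come out with a distance of 0, matching the case-insensitive behaviour of the trgm functions.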
2022-06-06
- Look at the Solr statistics on CGSpace
- I see 167,000 hits from a bunch of Microsoft IPs with reverse DNS “msnbot-” using the Solr query dns:*msnbot* AND dns:*.msn.com
- I purged these first so I could see the other “real” IPs in the Solr facets
- I see 47,500 hits from 80.248.237.167 on a data center ISP in Sweden, using a normal user agent
- I see 13,000 hits from 163.237.216.11 on a data center ISP in Australia, using a normal user agent
- I see 7,300 hits from 208.185.238.57 from Britannica, using a normal user agent
- There seem to be many more of these:
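The workflow above (drop the msnbot hits, then look at which IPs remain at the top) can be mimicked offline on exported statistics. A minimal sketch, assuming the hits are available as (ip, dns) pairs, which is a hypothetical shape rather than the real Solr document format; fnmatchcase is a rough stand-in for the Solr wildcard query and is case-sensitive, whereas Solr's behaviour depends on the field type:

```python
from collections import Counter
from fnmatch import fnmatchcase


def is_msnbot(dns: str) -> bool:
    """Rough equivalent of the Solr query: dns:*msnbot* AND dns:*.msn.com"""
    return fnmatchcase(dns, "*msnbot*") and fnmatchcase(dns, "*.msn.com")


def top_ips(hits, n=10):
    """Count hits per IP after dropping msnbot, most frequent first.

    `hits` is an iterable of (ip, dns) pairs (a hypothetical export shape).
    """
    counts = Counter(ip for ip, dns in hits if not is_msnbot(dns))
    return counts.most_common(n)
```

This is the same idea as purging the known bot first so the Solr facets show the remaining "real" IPs.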