Add notes for 2020-07-15

2020-07-15 15:42:23 +03:00
parent b143ab3e5b
commit 49d08e2db9
20 changed files with 156 additions and 26 deletions


@ -475,4 +475,69 @@ $ psql -d dspace -U dspace -c 'update bundle set primary_bitstream_id=NULL where
UPDATE 1
```
- Udana from WLE asked me about some items that didn't show Altmetric donuts
- I checked his list and at least three of them actually *did* show donuts, so I manually tweeted four of the others to see if they would get a donut within a few hours:
- https://hdl.handle.net/10568/108477
- https://hdl.handle.net/10568/108475
- https://hdl.handle.net/10568/108361
- https://hdl.handle.net/10568/108360
## 2020-07-15
- All four IWMI items that I tweeted yesterday have Altmetric donuts with a score of 1 now...
- Export CGSpace countries to check them against ISO 3166-1 and ISO 3166-3 (historic countries):
```
dspace=# \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=228) TO /tmp/2020-07-15-countries.csv;
COPY 194
```
- I wrote a script `iso3166-1-lookup.py` to check them:
```
$ ./iso3166-1-lookup.py -i /tmp/2020-07-15-countries.csv -o /tmp/2020-07-15-countries-resolved.csv
$ csvgrep -c matched -m false /tmp/2020-07-15-countries-resolved.csv
country,match type,matched
CAPE VERDE,,false
"KOREA, REPUBLIC",,false
PALESTINE,,false
"CONGO, DR",,false
COTE D'IVOIRE,,false
RUSSIA,,false
SYRIA,,false
"KOREA, DPR",,false
SWAZILAND,,false
MICRONESIA,,false
TIBET,,false
ZAIRE,,false
COCOS ISLANDS,,false
LAOS,,false
IRAN,,false
```
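- A minimal sketch of the kind of lookup described above, not the actual `iso3166-1-lookup.py`: the `ISO_3166_1_SAMPLE` dictionary is a tiny hand-picked subset used only for illustration (a real check would load the full ISO 3166-1 and ISO 3166-3 lists), and the `name` match type and the function names are hypothetical:

```python
import csv
import io

# Assumption: a tiny illustrative subset of ISO 3166-1 names. A real check
# would load the complete ISO 3166-1 and ISO 3166-3 lists instead.
ISO_3166_1_SAMPLE = {
    "KENYA": "name",
    "CABO VERDE": "name",
    "RUSSIAN FEDERATION": "name",
    "ESWATINI": "name",
}


def resolve_countries(names, reference):
    """Return (country, match type, matched) rows for each input name."""
    rows = []
    for name in names:
        # Compare case-insensitively and ignore surrounding whitespace
        key = name.strip().upper()
        match_type = reference.get(key, "")
        rows.append((name, match_type, "true" if match_type else "false"))
    return rows


def to_csv(rows):
    """Serialize rows using the same header as the resolved CSV above."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["country", "match type", "matched"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = ["KENYA", "CAPE VERDE", "RUSSIA", "SWAZILAND"]
    print(to_csv(resolve_countries(sample, ISO_3166_1_SAMPLE)), end="")
```

- An exact-name comparison like this would explain why short forms such as "RUSSIA" or older names such as "CAPE VERDE" end up unmatched.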
- Check the database for DOIs that are not in the preferred "https://doi.org/" format:
```
dspace=# \COPY (SELECT text_value as "cg.identifier.doi" FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=220 AND text_value NOT LIKE 'https://doi.org/%') TO /tmp/2020-07-15-doi.csv WITH CSV HEADER;
COPY 186
```
- Then I imported them into OpenRefine and replaced them in a new "correct" column using this GREL transform:
```
value.replace("dx.doi.org", "doi.org").replace("http://", "https://").replace("https://dx,doi,org", "https://doi.org").replace("https://doi.dx.org", "https://doi.org").replace("https://dx.doi:", "https://doi.org").replace("DOI: ", "https://doi.org/").replace("doi: ", "https://doi.org/").replace("http://dx.doi.org", "https://doi.org").replace("https://dx. doi.org. ", "https://doi.org").replace("https://dx.doi", "https://doi.org").replace("https://dx.doi:", "https://doi.org/").replace("hdl.handle.net", "doi.org")
```
- Then I fixed the DOIs on CGSpace:
```
$ ./fix-metadata-values.py -i /tmp/2020-07-15-fix-164-DOIs.csv -db dspace -u dspace -p 'fuuu' -f cg.identifier.doi -t 'correct' -m 220
```
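- For comparison, a rough Python sketch of the same kind of DOI normalization. This is not the GREL transform I ran; it is an alternative approach that extracts the `10.xxxx/...` part with a regular expression and prepends the preferred prefix. The `normalize_doi` name, the pattern, and the example values are all assumptions for illustration, and the pattern would not catch every malformed value:

```python
import re

# Assumption: DOIs start with "10.", a 4-9 digit registrant code, a slash,
# and then a suffix with no whitespace. This covers common DOIs only.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")

PREFERRED_PREFIX = "https://doi.org/"


def normalize_doi(value):
    """Return the DOI in https://doi.org/ form, or None if none is found."""
    match = DOI_PATTERN.search(value)
    if match is None:
        return None
    # Drop trailing punctuation that often gets pasted along with the DOI
    doi = match.group(0).rstrip(".,;")
    return PREFERRED_PREFIX + doi


if __name__ == "__main__":
    # Hypothetical example values, not taken from CGSpace
    for raw in [
        "http://dx.doi.org/10.1000/xyz123",
        "DOI: 10.1000/xyz123",
        "https://dx.doi.org/10.1000/xyz123.",
        "no identifier here",
    ]:
        print(raw, "->", normalize_doi(raw))
```

- Values where it returns `None` would still need to be checked by hand.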
- I filed [an issue on Debian's iso-codes](https://salsa.debian.org/iso-codes-team/iso-codes/-/issues/10) project to ask why "Swaziland" does not appear in the ISO 3166-3 list of historical country names even though the country's name was changed to "Eswatini" in 2018.
- Atmire responded about the Solr issue
- They said that it seems like a DSpace issue, so it's not their responsibility, and nobody responded to my question on the dspace-tech mailing list...
- I said I would try to do a migration on DSpace Test with more of CGSpace's Solr data to try to approximate how much of our data would be affected
- I also asked them about the Tomcat 8.5 issue with CUA as well as the CUA group name issue that I had originally asked about in April
<!-- vim: set sw=2 ts=2: -->