mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-22 06:35:03 +01:00
Add notes for 2020-07-15
This commit is contained in:
parent
b143ab3e5b
commit
49d08e2db9
@ -475,4 +475,69 @@ $ psql -d dspace -U dspace -c 'update bundle set primary_bitstream_id=NULL where
|
||||
UPDATE 1
|
||||
```
|
||||
|
||||
- Udana from WLE asked me about some items that didn't show Altmetric donuts
|
||||
- I checked his list and at least three of them actually *did* show donuts, and for four others I tweeted them manually to see if they would get a donut in a few hours:
|
||||
- https://hdl.handle.net/10568/108477
|
||||
- https://hdl.handle.net/10568/108475
|
||||
- https://hdl.handle.net/10568/108361
|
||||
- https://hdl.handle.net/10568/108360
|
||||
|
||||
## 2020-07-15
|
||||
|
||||
- All four IWMI items that I tweeted yesterday have Altmetric donuts with a score of 1 now...
|
||||
- Export CGSpace countries to check them against ISO 3166-1 and ISO 3166-3 (historic countries):
|
||||
|
||||
```
|
||||
dspace=# \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=228) TO /tmp/2020-07-15-countries.csv;
|
||||
COPY 194
|
||||
```
|
||||
|
||||
- I wrote a script `iso3166-1-lookup.py` to check them:
|
||||
|
||||
```
|
||||
$ ./iso3166-1-lookup.py -i /tmp/2020-07-15-countries.csv -o /tmp/2020-07-15-countries-resolved.csv
|
||||
$ csvgrep -c matched -m false /tmp/2020-07-15-countries-resolved.csv
|
||||
country,match type,matched
|
||||
CAPE VERDE,,false
|
||||
"KOREA, REPUBLIC",,false
|
||||
PALESTINE,,false
|
||||
"CONGO, DR",,false
|
||||
COTE D'IVOIRE,,false
|
||||
RUSSIA,,false
|
||||
SYRIA,,false
|
||||
"KOREA, DPR",,false
|
||||
SWAZILAND,,false
|
||||
MICRONESIA,,false
|
||||
TIBET,,false
|
||||
ZAIRE,,false
|
||||
COCOS ISLANDS,,false
|
||||
LAOS,,false
|
||||
IRAN,,false
|
||||
```
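
- The matching step can be illustrated with a minimal Python sketch. This is not the lookup script used above, whose internals are not shown here; the function names and the four-name reference set are hypothetical, and a real check would load the full ISO 3166-1 and ISO 3166-3 lists:

```python
# Hypothetical, tiny excerpt of ISO 3166-1 short names (assumption for
# illustration only; a real check needs the complete list).
ISO_3166_1_NAMES = {"Kenya", "Cabo Verde", "Eswatini", "Russian Federation"}


def resolve_countries(names, reference):
    """Yield (name, matched) pairs using case-insensitive exact comparison."""
    wanted = {ref.upper() for ref in reference}
    for name in names:
        yield name, name.strip().upper() in wanted


def unmatched(names, reference):
    """Return the names that have no exact match in the reference set."""
    return [name for name, matched in resolve_countries(names, reference) if not matched]


if __name__ == "__main__":
    sample = ["KENYA", "SWAZILAND", "RUSSIA", "CAPE VERDE"]
    for name in unmatched(sample, ISO_3166_1_NAMES):
        print(f"{name},,false")
```

- Exact comparison like this would flag common short forms such as "RUSSIA" or former names such as "SWAZILAND", which is consistent with the kind of values reported as unmatched above.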
|
||||
|
||||
- Check the database for DOIs that are not in the preferred "https://doi.org/" format:
|
||||
|
||||
```
|
||||
dspace=# \COPY (SELECT text_value as "cg.identifier.doi" FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=220 AND text_value NOT LIKE 'https://doi.org/%') TO /tmp/2020-07-15-doi.csv WITH CSV HEADER;
|
||||
COPY 186
|
||||
```
|
||||
|
||||
- Then I imported them into OpenRefine and replaced them in a new "correct" column using this GREL transform:
|
||||
|
||||
```
|
||||
value.replace("dx.doi.org", "doi.org").replace("http://", "https://").replace("https://dx,doi,org", "https://doi.org").replace("https://doi.dx.org", "https://doi.org").replace("https://dx.doi:", "https://doi.org").replace("DOI: ", "https://doi.org/").replace("doi: ", "https://doi.org/").replace("http://dx.doi.org", "https://doi.org").replace("https://dx. doi.org. ", "https://doi.org").replace("https://dx.doi", "https://doi.org").replace("https://dx.doi:", "https://doi.org/").replace("hdl.handle.net", "doi.org")
|
||||
```
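
- As an alternative to chaining many literal replacements, the same cleanup could be sketched in Python by extracting the DOI name and re-adding the preferred prefix. This is only an illustration under assumptions: the regular expression is a common heuristic rather than an official DOI grammar, the function name is hypothetical, and the sample values are made up:

```python
import re
from typing import Optional

DOI_PREFIX = "https://doi.org/"

# Heuristic: a DOI name starts with "10.", then a numeric registrant code,
# a slash, and a suffix that runs until the next whitespace.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def normalize_doi(value: str) -> Optional[str]:
    """Return the value in https://doi.org/ form, or None if no DOI is found."""
    match = DOI_PATTERN.search(value)
    if match is None:
        return None
    return DOI_PREFIX + match.group(0)


if __name__ == "__main__":
    samples = [
        "http://dx.doi.org/10.1000/xyz123",
        "DOI: 10.1000/xyz123",
        "https://dx,doi,org/10.1000/xyz123",
        "not a doi",
    ]
    for sample in samples:
        print(sample, "->", normalize_doi(sample))
```

- Values for which this returns `None` would still need manual review, and a suffix followed directly by punctuation would keep that punctuation, so the output should be checked before applying it to the database.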
|
||||
|
||||
- Then I fixed the DOIs on CGSpace:
|
||||
|
||||
```
|
||||
$ ./fix-metadata-values.py -i /tmp/2020-07-15-fix-164-DOIs.csv -db dspace -u dspace -p 'fuuu' -f cg.identifier.doi -t 'correct' -m 220
|
||||
```
|
||||
|
||||
- I filed [an issue on Debian's iso-codes](https://salsa.debian.org/iso-codes-team/iso-codes/-/issues/10) project to ask why "Swaziland" does not appear in the ISO 3166-3 list of historical country names despite it being changed to "Eswatini" in 2018.
|
||||
- Atmire responded about the Solr issue
|
||||
- They said that it seems like a DSpace issue, so it's not their responsibility, and nobody responded to my question on the dspace-tech mailing list...
|
||||
- I said I would try to do a migration on DSpace Test with more of CGSpace's Solr data to approximate how much of our data would be affected
|
||||
- I also asked them about the Tomcat 8.5 issue with CUA as well as the CUA group name issue that I had originally asked about in April
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
@ -20,7 +20,7 @@ Since I was restarting Tomcat anyways I decided to redeploy the latest changes f
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-07/" />
|
||||
<meta property="article:published_time" content="2020-07-01T10:53:54+03:00" />
|
||||
<meta property="article:modified_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="article:modified_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="July, 2020"/>
|
||||
@ -45,9 +45,9 @@ Since I was restarting Tomcat anyways I decided to redeploy the latest changes f
|
||||
"@type": "BlogPosting",
|
||||
"headline": "July, 2020",
|
||||
"url": "https://alanorth.github.io/cgspace-notes/2020-07/",
|
||||
"wordCount": "2991",
|
||||
"wordCount": "3347",
|
||||
"datePublished": "2020-07-01T10:53:54+03:00",
|
||||
"dateModified": "2020-07-13T12:31:34+03:00",
|
||||
"dateModified": "2020-07-14T10:57:49+03:00",
|
||||
"author": {
|
||||
"@type": "Person",
|
||||
"name": "Alan Orth"
|
||||
@ -594,7 +594,72 @@ $ tidy -xml -utf8 -m -iq -w 0 dspace/config/controlled-vocabularies/dc-descripti
|
||||
</ul>
|
||||
<pre><code>$ psql -d dspace -U dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (189618, 188837);'
|
||||
UPDATE 1
|
||||
</code></pre><!-- raw HTML omitted -->
|
||||
</code></pre><ul>
|
||||
<li>Udana from WLE asked me about some items that didn’t show Altmetric donuts
|
||||
<ul>
|
||||
<li>I checked his list and at least three of them actually <em>did</em> show donuts, and for four others I tweeted them manually to see if they would get a donut in a few hours:
|
||||
<ul>
|
||||
<li><a href="https://hdl.handle.net/10568/108477">https://hdl.handle.net/10568/108477</a></li>
|
||||
<li><a href="https://hdl.handle.net/10568/108475">https://hdl.handle.net/10568/108475</a></li>
|
||||
<li><a href="https://hdl.handle.net/10568/108361">https://hdl.handle.net/10568/108361</a></li>
|
||||
<li><a href="https://hdl.handle.net/10568/108360">https://hdl.handle.net/10568/108360</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<h2 id="2020-07-15">2020-07-15</h2>
|
||||
<ul>
|
||||
<li>All four IWMI items that I tweeted yesterday have Altmetric donuts with a score of 1 now…</li>
|
||||
<li>Export CGSpace countries to check them against ISO 3166-1 and ISO 3166-3 (historic countries):</li>
|
||||
</ul>
|
||||
<pre><code>dspace=# \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=228) TO /tmp/2020-07-15-countries.csv;
|
||||
COPY 194
|
||||
</code></pre><ul>
|
||||
<li>I wrote a script <code>iso3166-1-lookup.py</code> to check them:</li>
|
||||
</ul>
|
||||
<pre><code>$ ./iso3166-1-lookup.py -i /tmp/2020-07-15-countries.csv -o /tmp/2020-07-15-countries-resolved.csv
|
||||
$ csvgrep -c matched -m false /tmp/2020-07-15-countries-resolved.csv
|
||||
country,match type,matched
|
||||
CAPE VERDE,,false
|
||||
"KOREA, REPUBLIC",,false
|
||||
PALESTINE,,false
|
||||
"CONGO, DR",,false
|
||||
COTE D'IVOIRE,,false
|
||||
RUSSIA,,false
|
||||
SYRIA,,false
|
||||
"KOREA, DPR",,false
|
||||
SWAZILAND,,false
|
||||
MICRONESIA,,false
|
||||
TIBET,,false
|
||||
ZAIRE,,false
|
||||
COCOS ISLANDS,,false
|
||||
LAOS,,false
|
||||
IRAN,,false
|
||||
</code></pre><ul>
|
||||
<li>Check the database for DOIs that are not in the preferred “<a href="https://doi.org/">https://doi.org/</a>” format:</li>
|
||||
</ul>
|
||||
<pre><code>dspace=# \COPY (SELECT text_value as "cg.identifier.doi" FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=220 AND text_value NOT LIKE 'https://doi.org/%') TO /tmp/2020-07-15-doi.csv WITH CSV HEADER;
|
||||
COPY 186
|
||||
</code></pre><ul>
|
||||
<li>Then I imported them into OpenRefine and replaced them in a new “correct” column using this GREL transform:</li>
|
||||
</ul>
|
||||
<pre><code>value.replace("dx.doi.org", "doi.org").replace("http://", "https://").replace("https://dx,doi,org", "https://doi.org").replace("https://doi.dx.org", "https://doi.org").replace("https://dx.doi:", "https://doi.org").replace("DOI: ", "https://doi.org/").replace("doi: ", "https://doi.org/").replace("http://dx.doi.org", "https://doi.org").replace("https://dx. doi.org. ", "https://doi.org").replace("https://dx.doi", "https://doi.org").replace("https://dx.doi:", "https://doi.org/").replace("hdl.handle.net", "doi.org")
|
||||
</code></pre><ul>
|
||||
<li>Then I fixed the DOIs on CGSpace:</li>
|
||||
</ul>
|
||||
<pre><code>$ ./fix-metadata-values.py -i /tmp/2020-07-15-fix-164-DOIs.csv -db dspace -u dspace -p 'fuuu' -f cg.identifier.doi -t 'correct' -m 220
|
||||
</code></pre><ul>
|
||||
<li>I filed <a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/issues/10">an issue on Debian’s iso-codes</a> project to ask why “Swaziland” does not appear in the ISO 3166-3 list of historical country names despite it being changed to “Eswatini” in 2018.</li>
|
||||
<li>Atmire responded about the Solr issue
|
||||
<ul>
|
||||
<li>They said that it seems like a DSpace issue, so it’s not their responsibility, and nobody responded to my question on the dspace-tech mailing list…</li>
|
||||
<li>I said I would try to do a migration on DSpace Test with more of CGSpace’s Solr data to approximate how much of our data would be affected</li>
|
||||
<li>I also asked them about the Tomcat 8.5 issue with CUA as well as the CUA group name issue that I had originally asked about in April</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<!-- raw HTML omitted -->
|
||||
|
||||
|
||||
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-07-13T12:31:34+03:00" />
|
||||
<meta property="og:updated_time" content="2020-07-14T10:57:49+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -4,27 +4,27 @@
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
|
||||
<lastmod>2020-07-13T12:31:34+03:00</lastmod>
|
||||
<lastmod>2020-07-14T10:57:49+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
||||
<lastmod>2020-07-13T12:31:34+03:00</lastmod>
|
||||
<lastmod>2020-07-14T10:57:49+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/2020-07/</loc>
|
||||
<lastmod>2020-07-13T12:31:34+03:00</lastmod>
|
||||
<lastmod>2020-07-14T10:57:49+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
|
||||
<lastmod>2020-07-13T12:31:34+03:00</lastmod>
|
||||
<lastmod>2020-07-14T10:57:49+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
||||
<lastmod>2020-07-13T12:31:34+03:00</lastmod>
|
||||
<lastmod>2020-07-14T10:57:49+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
|