Update notes for 2019-06-10

Alan Orth 2019-06-10 17:17:58 +03:00
parent 70fd7c4ac7
commit a83e238541
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
2 changed files with 23 additions and 5 deletions


@@ -56,5 +56,23 @@ statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.Sol
- Rename the AReS repository on GitHub to OpenRXV: https://github.com/ilri/OpenRXV
- Create a new AReS repository: https://github.com/ilri/AReS
- Start looking at the 203 IITA records on DSpace Test from last month ([IITA_May_16](https://dspacetest.cgiar.org/handle/10568/102032) aka "20194th.xls") using OpenRefine
- Trim leading, trailing, and consecutive whitespace on all columns, though I didn't notice many issues
- Validate affiliations against latest list of top 1500 terms using reconcile-csv, correcting and standardizing about twenty-seven
- Validate countries against latest list of countries using reconcile-csv, correcting three
- Convert all DOIs to "https://dx.doi.org" format
- Normalize all `cg.identifier.url` Google book fields to "books.google.com"
- Correct some inconsistencies in IITA subjects
- Correct two incorrect "Peer Review" values in `dc.description.version`
- About fifteen items have incorrect ISBNs (looks like an Excel error because the values appear in scientific notation)
- I managed to get to subjects, so I'll continue from there when I start working next
- Generate a new list of countries from the database for use with reconcile-csv
- After dumping, use csvcut to add line numbers, then change the CSV header columns to match the names you use in reconcile-csv, for example `id` and `name`:
```
dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 228 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC) to /tmp/countries.csv WITH CSV HEADER
COPY 192
$ csvcut -l -c 1 /tmp/countries.csv > 2019-06-10-countries.csv
```
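
The header change mentioned above is a manual step, so here is a minimal Python sketch of one way to do the numbering and the renaming in a single pass. It assumes the dump has the value in the first column (as in the `\COPY` above) and that reconcile-csv will be pointed at columns called `id` and `name`. The helper name `renumber_and_rename` and the sample rows are hypothetical and only there for illustration.

```python
import csv
import io


def renumber_and_rename(src, dst, header=("id", "name")):
    """Write a two-column CSV: a 1-based line number and the first column of src.

    `src` and `dst` are open text file objects. The original header row
    (for example `text_value,count`) is dropped and replaced by `header`.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")

    next(reader, None)  # discard the original header row
    writer.writerow(header)

    # Ignore blank lines so the numbering has no gaps
    rows = (row for row in reader if row)
    for line_number, row in enumerate(rows, start=1):
        writer.writerow([line_number, row[0]])


if __name__ == "__main__":
    # Made-up sample in the same shape as /tmp/countries.csv
    sample = io.StringIO("text_value,count\nKenya,120\nNigeria,95\n")
    result = io.StringIO()
    renumber_and_rename(sample, result)
    print(result.getvalue(), end="")
```

With real files, the same function could be called with `open("/tmp/countries.csv", newline="")` as the source and the dated output file as the destination.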
<!-- vim: set sw=2 ts=2: -->


@@ -4,30 +4,30 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-06-07T15:01:38+03:00</lastmod>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-05/</loc>
<lastmod>2019-06-07T15:01:38+03:00</lastmod>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-06-07T15:01:38+03:00</lastmod>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-06-07T15:01:38+03:00</lastmod>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-06-07T15:01:38+03:00</lastmod>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<priority>0</priority>
</url>