Update notes for 2019-06-10

Alan Orth 2019-06-11 02:36:19 +03:00
parent a83e238541
commit 63f94aa5cb
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
2 changed files with 29 additions and 5 deletions

@ -65,6 +65,7 @@ statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.Sol
- Correct some inconsistencies in IITA subjects
- Correct two incorrect "Peer Review" values in `dc.description.version`
- About fifteen items have incorrect ISBNs (it looks like an Excel error, because the values are in scientific notation)
- Delete one blank item
- I got as far as the subjects, so I'll continue from there next time I work on this
- Generate a new list of countries from the database for use with reconcile-csv
- After dumping, use csvcut to add line numbers, then change the CSV header to match the column names you pass to reconcile-csv, for example `id` and `name`:
@ -75,4 +76,27 @@ COPY 192
$ csvcut -l -c 0 /tmp/countries.csv > 2019-06-10-countries.csv
```
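- A minimal sketch for flagging the ISBNs that Excel turned into scientific notation (the ISBN item above). The function name is hypothetical, and the `dc.identifier.isbn` column in the usage comment is an assumption. If the CSV only holds the rounded value, the original digits are probably lost, so flagged ISBNs would need to be looked up again rather than converted back:

```shell
# Hypothetical helper: read one value per line on stdin and print only the
# values that look like Excel scientific notation (e.g. 9.78929E+12) rather
# than a real ISBN such as 9789290604662 or 978-92-9060-466-2.
find_sci_notation() {
  grep -E '^[0-9]+(\.[0-9]+)?[Ee][+-]?[0-9]+$'
}

# Example usage, assuming the ISBN column is dc.identifier.isbn and that
# multiple values are separated by "||" like dc.subject:
#   csvcut -c dc.identifier.isbn ~/Downloads/2019-06-10-IITA-20194th-Round-2.csv \
#     | sed 's/||/\n/g' | find_sci_notation
```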
- Get a list of all the unique AGROVOC subject terms in IITA's data and export it to a text file so I can validate them with my `agrovoc-lookup.py` script:
```
$ csvcut -c dc.subject ~/Downloads/2019-06-10-IITA-20194th-Round-2.csv | sed 's/||/\n/g' | grep -v dc.subject | sort -u > iita-agrovoc.txt
$ ./agrovoc-lookup.py -i iita-agrovoc.txt -om iita-agrovoc-matches.txt -or iita-agrovoc-rejects.txt
$ wc -l iita-agrovoc*
402 iita-agrovoc-matches.txt
29 iita-agrovoc-rejects.txt
431 iita-agrovoc.txt
```
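- The counts add up (402 + 29 = 431). A sketch to check that the matches and rejects cover exactly the input terms, assuming `agrovoc-lookup.py` writes each term back unchanged, one per line. The `check_partition` name is hypothetical, and it needs bash for process substitution:

```shell
# Hypothetical sanity check: print any term that is in the input list but in
# neither output file (indented with a tab), or in an output file but not in
# the input (not indented). No output means the union of matches and rejects
# equals the input. It does not detect a term listed in both output files.
check_partition() {
  local all=$1 matches=$2 rejects=$3
  sort -u "$matches" "$rejects" | comm -3 - <(sort -u "$all")
}

# Example usage:
#   check_partition iita-agrovoc.txt iita-agrovoc-matches.txt iita-agrovoc-rejects.txt
```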
- Combine these IITA matches with the subjects I matched a few months ago:
```
$ csvcut -c name 2019-03-18-subjects-matched.csv | grep -v name | cat - iita-agrovoc-matches.txt | sort -u > 2019-06-10-subjects-matched.txt
```
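- A sketch for listing the terms that the IITA data added on top of the March list, using `comm -13` (print only lines unique to the second input). The `new_terms` name is hypothetical, and it needs bash for process substitution:

```shell
# Hypothetical helper: print terms in the second list that are not in the first.
# Both arguments are plain text files (or process substitutions) with one term
# per line and no header.
new_terms() {
  comm -13 <(sort -u "$1") <(sort -u "$2")
}

# Example usage; tail -n +2 skips only the CSV header line:
#   new_terms <(csvcut -c name 2019-03-18-subjects-matched.csv | tail -n +2) \
#     2019-06-10-subjects-matched.txt
```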
- Then make a new list to use with reconcile-csv by adding line numbers with csvcut and changing the line number header to `id`:
```
$ csvcut -c name -l 2019-06-10-subjects-matched.txt | sed 's/line_number/id/' > 2019-06-10-subjects-matched.csv
```
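- A sketch of the same step without csvkit, assuming the input has one term per line and no header row. The `to_reconcile_csv` name is hypothetical; it writes the `id,name` header itself, numbers terms from 1, and quotes any name containing a comma or double quote:

```shell
# Hypothetical helper: turn a plain list of terms (files or stdin) into an
# id,name CSV for reconcile-csv.
to_reconcile_csv() {
  awk 'BEGIN { print "id,name" }
  {
    name = $0
    # CSV quoting: double any embedded quotes and wrap the field in quotes
    if (name ~ /[",]/) {
      gsub(/"/, "\"\"", name)
      name = "\"" name "\""
    }
    print NR "," name
  }' "$@"
}

# Example usage:
#   to_reconcile_csv 2019-06-10-subjects-matched.txt > 2019-06-10-subjects-matched.csv
```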
<!-- vim: set sw=2 ts=2: -->

@ -4,30 +4,30 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<lastmod>2019-06-10T17:17:58+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-05/</loc>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<lastmod>2019-06-10T17:17:58+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<lastmod>2019-06-10T17:17:58+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<lastmod>2019-06-10T17:17:58+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<lastmod>2019-06-10T17:17:58+03:00</lastmod>
<priority>0</priority>
</url>