mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-12-22 21:22:19 +01:00)
Update notes for 2019-06-10
parent a83e238541
commit 63f94aa5cb
@@ -65,6 +65,7 @@ statistics-2018: org.apache.solr.common.SolrException:org.apache.solr.common.Sol
- Correct some inconsistencies in IITA subjects
- Correct two incorrect "Peer Review" in `dc.description.version`
- About fifteen items have incorrect ISBNs (looks like an Excel error because the values look like scientific numbers)
- Delete one blank item
- I managed to get to subjects, so I'll continue from there when I start working next
- Generate a new list of countries from the database for use with reconcile-csv
- After dumping, use csvcut to add line numbers, then change the csv header to match those you use in reconcile-csv, for example `id` and `name`:
@@ -75,4 +76,27 @@ COPY 192
$ csvcut -l -c 0 /tmp/countries.csv > 2019-06-10-countries.csv
```

- Get a list of all the unique AGROVOC subject terms in IITA's data and export it to a text file so I can validate them with my `agrovoc-lookup.py` script:

```
$ csvcut -c dc.subject ~/Downloads/2019-06-10-IITA-20194th-Round-2.csv| sed 's/||/\n/g' | grep -v dc.subject | sort -u > iita-agrovoc.txt
$ ./agrovoc-lookup.py -i iita-agrovoc.txt -om iita-agrovoc-matches.txt -or iita-agrovoc-rejects.txt
$ wc -l iita-agrovoc*
402 iita-agrovoc-matches.txt
29 iita-agrovoc-rejects.txt
431 iita-agrovoc.txt
```
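
The first command above splits multi-value `dc.subject` cells on `||` and keeps one copy of each term. A minimal Python sketch of the same idea follows, using only the standard library. The function name `unique_subjects` and the sample rows are hypothetical, and unlike the shell pipeline this version also trims whitespace and drops empty values.

```python
import csv
import io


def unique_subjects(csv_text, column="dc.subject", separator="||"):
    """Return the sorted, de-duplicated values found in a multi-value column."""
    terms = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A cell may hold several terms joined by the separator.
        for term in (row.get(column) or "").split(separator):
            term = term.strip()
            if term:
                terms.add(term)
    return sorted(terms)


# Hypothetical sample data, not taken from the IITA file.
sample = (
    "id,dc.subject\n"
    "1,CASSAVA||MAIZE\n"
    "2,MAIZE||SOIL FERTILITY\n"
    "3,\n"
)
print("\n".join(unique_subjects(sample)))
```

Python's `sorted` orders by code point, so the result can differ from `sort -u` under some locales.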

- Combine these IITA matches with the subjects I matched a few months ago:

```
$ csvcut -c name 2019-03-18-subjects-matched.csv | grep -v name | cat - iita-agrovoc-matches.txt | sort -u > 2019-06-10-subjects-matched.txt
```
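
Combining the two lists amounts to a set union followed by a sort. A small Python sketch is shown here under that assumption. The helper name `merge_terms` and the example terms are hypothetical, and blank entries are skipped, which the shell pipeline does not do.

```python
def merge_terms(*term_lists):
    """Merge any number of term lists into one sorted list without duplicates."""
    merged = set()
    for terms in term_lists:
        # Strip surrounding whitespace and ignore blank entries.
        merged.update(t.strip() for t in terms if t.strip())
    return sorted(merged)


# Hypothetical example terms standing in for the two matched lists.
earlier_matches = ["MAIZE", "RICE"]
new_matches = ["CASSAVA", "MAIZE"]
print(merge_terms(earlier_matches, new_matches))
```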

- Then make a new list to use with reconcile-csv by adding line numbers with csvcut and changing the line number header to `id`:

```
$ csvcut -c name -l 2019-06-10-subjects-matched.txt | sed 's/line_number/id/' > 2019-06-10-subjects-matched.csv
```
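
The numbering step can also be sketched in Python. This version assumes the input is a plain list of terms with no header row, so it writes the `id` and `name` header itself and numbers rows from 1, as `csvcut -l` does. The function name `number_terms` and the example terms are hypothetical.

```python
import csv
import io


def number_terms(terms):
    """Return CSV text with an `id,name` header and 1-based row numbers."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "name"])
    for line_number, term in enumerate(terms, start=1):
        writer.writerow([line_number, term])
    return out.getvalue()


# Hypothetical example terms.
print(number_terms(["CASSAVA", "MAIZE"]), end="")
```

Using the `csv` writer rather than string joins means terms that contain commas get quoted correctly.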

<!-- vim: set sw=2 ts=2: -->
@@ -4,30 +4,30 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<lastmod>2019-06-10T17:17:58+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-05/</loc>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<lastmod>2019-06-10T17:17:58+03:00</lastmod>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<lastmod>2019-06-10T17:17:58+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<lastmod>2019-06-10T17:17:58+03:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-06-10T12:58:33+03:00</lastmod>
<lastmod>2019-06-10T17:17:58+03:00</lastmod>
<priority>0</priority>
</url>