mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2023-02-22
@@ -328,4 +328,41 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: missing re
2023 200
```
- Start reviewing and fixing metadata for Sam's ~250 CAS publications from last year
- Both Abenet and Peter have already looked at them and Sam has been waiting for months on this

## 2023-02-22
- Continue proofing CAS records for Sam
- I downloaded all the PDFs manually and checked the issue dates for each from the PDF, noting some that had licenses, ISBNs, etc
- I combined the title, abstract, and system subjects into one column to mine them for AGROVOC terms:

```console
toLowercase(value) + toLowercase(cells["dcterms.abstract"].value) + toLowercase(cells["cg.subject.system"].value.replace("||", " "))
```
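- As written, the expression above joins the three fields with no separator, so the last word of one field runs into the first word of the next. A minimal standalone Python sketch of the same combination with a space between fields (the function name `combine_fields` and the sample values are hypothetical, not from these notes):

```python
def combine_fields(title, abstract, subjects):
    """Join title, abstract, and multi-value subjects into one lowercase string."""
    # Multi-value fields use "||" as the separator; turn it into a space
    subjects = (subjects or "").replace("||", " ")
    parts = [title or "", abstract or "", subjects]
    # Join with a space so words at field boundaries stay separate
    return " ".join(part.lower() for part in parts if part)


# Hypothetical sample record
combined = combine_fields(
    "Maize yields in Kenya",
    "We study drought effects.",
    "CLIMATE CHANGE||FOOD SECURITY",
)
print(combined)
```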
- Then I extracted a list of AGROVOC terms the same way I did in [August, 2022]({{< relref "2022-08.md" >}}) and used this Jython code to extract matching terms:

```python
import re

# Load the AGROVOC terms, one per line, lowercased for matching
with open(r"/tmp/agrovoc-subjects.txt", 'r') as f:
    terms = [name.rstrip().lower() for name in f]

# Keep each term that appears as a whole word in the cell value
return "||".join([term for term in terms if re.match(r".*\b" + term + r"\b.*", value.lower())])
```
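- For reference, a minimal sketch of the same whole-word matching as a standalone Python function, outside OpenRefine. The function name `match_terms` and the sample vocabulary are hypothetical. It swaps in `re.search` with `re.escape`, which should behave the same on single-line values while treating any regex characters in a term literally:

```python
import re


def match_terms(text, terms):
    """Return the "||"-joined vocabulary terms that occur as whole words in text."""
    text = text.lower()
    matched = []
    for term in terms:
        # re.escape keeps characters such as "(" or "." in a term literal
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text):
            matched.append(term)
    return "||".join(matched)


# Hypothetical sample vocabulary; the notes load theirs from a text file
terms = ["maize", "climate change", "rice"]
result = match_terms("Effects of climate change on maize price", terms)
print(result)
```

  The `\b` word boundaries are what stop `rice` from matching inside `price` here.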
- Then I used [this cool Jython to remove duplicate metadata values](https://stackoverflow.com/questions/15419080/openrefine-remove-duplicates-from-list-with-jython):

```python
# Split the multi-value cell, drop repeats via set() (order is not preserved), and rejoin
deduped_list = list(set(value.split("||")))
return '||'.join(map(str, deduped_list))
```
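- If the original order of the values matters, a possible variant for standalone Python 3.7+ (not OpenRefine's Jython, which is Python 2 and does not guarantee dictionary order) is sketched below. The function name `dedupe_values` and the sample string are hypothetical:

```python
def dedupe_values(value, separator="||"):
    """Remove repeated values from a multi-value string, keeping first-seen order."""
    # dict.fromkeys keeps insertion order (Python 3.7+), unlike set()
    return separator.join(dict.fromkeys(value.split(separator)))


# Hypothetical sample cell value
deduped = dedupe_values("maize||rice||maize||wheat||rice")
print(deduped)
```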
- Then I did the same with countries, woooooo!
- I checked for duplicates and found forty-one
- I just stumbled upon UNTERM, which provides the official list of countries for the UN General Assembly, including a downloadable Excel with the short and formal names in all UN languages: https://unterm.un.org/unterm2/en/country
- I created a [pull request to add common names for Iran, Laos, and Syria on the Debian iso-codes package](https://salsa.debian.org/iso-codes-team/iso-codes/-/merge_requests/32)
- These are remarked upon in the ISO.org online browsing platform for ISO 3166-1

<!-- vim: set sw=2 ts=2: -->