Add notes for 2023-02-22

2023-02-22 21:37:12 +03:00
2023 200
```
- Start reviewing and fixing metadata for Sam's ~250 CAS publications from last year
  - Both Abenet and Peter have already looked at them and Sam has been waiting for months on this
## 2023-02-22
- Continue proofing CAS records for Sam
  - I downloaded all the PDFs manually and checked each one's issue date in the PDF, noting the ones that had licenses, ISBNs, etc.
  - I combined the title, abstract, and system subjects into one column to mine them for AGROVOC terms:
```console
toLowercase(value) + " " + toLowercase(cells["dcterms.abstract"].value) + " " + toLowercase(cells["cg.subject.system"].value.replace("||", " "))
```
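Outside OpenRefine, the same concatenation can be sketched in plain Python. This is a minimal sketch rather than a port of the GREL: the `combine_fields` helper is hypothetical, the record is assumed to be a dict keyed by column name, and `dc.title` is an assumed name for the title column (the GREL expression simply uses `value`). Joining with a space keeps the last word of one field from fusing with the first word of the next, which would break word-boundary matching later.

```python
def combine_fields(record):
    """Lowercase and join title, abstract, and system subjects into one string.

    Hypothetical helper: `record` is assumed to be a dict keyed by column
    name, and "dc.title" is an assumed name for the title column.
    """
    title = record.get("dc.title") or ""
    abstract = record.get("dcterms.abstract") or ""
    # Multi-value fields use "||" as a separator; turn it into a space
    subjects = (record.get("cg.subject.system") or "").replace("||", " ")
    # Join with spaces so words from adjacent fields stay separate
    return " ".join([title, abstract, subjects]).lower()
```

Missing fields fall back to an empty string, so a record without an abstract still produces a usable (if slightly space-padded) string.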
- Then I extracted a list of AGROVOC terms the same way I did in [August, 2022]({{< relref "2022-08.md" >}}) and used this Jython code to extract matching terms:
```python
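import re

# One AGROVOC term per line; skip blank lines, which would match almost anything
with open(r"/tmp/agrovoc-subjects.txt", "r") as f:
    terms = [name.strip().lower() for name in f if name.strip()]

# OpenRefine wraps Jython expressions in a function, so `return` sets the new cell value
# Keep each term found as a whole word; re.escape() stops characters like "(" acting as regex syntax
return "||".join([term for term in terms if re.search(r"\b" + re.escape(term) + r"\b", value.lower())])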
```
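To test the matching logic outside OpenRefine, here is a minimal Python 3 sketch of the same whole-word check. `match_terms` is a hypothetical helper and `terms` stands in for the lines of `/tmp/agrovoc-subjects.txt`; escaping each term and using `re.search()` are choices made for this sketch, not necessarily what the Jython step does.

```python
import re


def match_terms(text, terms):
    """Return the vocabulary terms found as whole words in `text`, joined with "||".

    Hypothetical helper: `terms` is any iterable of vocabulary strings,
    e.g. the lines of an AGROVOC term list.
    """
    text = text.lower()
    matches = []
    for term in terms:
        term = term.strip().lower()
        # Skip blank lines, which would otherwise match almost any text
        if not term:
            continue
        # re.escape() keeps characters like "(" or "." from acting as regex syntax,
        # and \b on both sides requires the term to appear as a whole word
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            matches.append(term)
    return "||".join(matches)
```

For example, `match_terms("Maize yields under drought stress", ["maize", "drought", "rice"])` should return `"maize||drought"`, and `"rice"` will not match inside a word like `"ricefields"`.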
- Then I used [this cool Jython to remove duplicate metadata values](https://stackoverflow.com/questions/15419080/openrefine-remove-duplicates-from-list-with-jython):
```python
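# Split the multi-value field, drop duplicates with set(), and rejoin
# (note that set() does not keep the original order of the values)
deduped_list = list(set(value.split("||")))
return "||".join(deduped_list)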
```
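Because `set()` throws away the original order, an order-preserving alternative can be nicer when the values are reviewed by hand afterwards. This is a minimal sketch with a hypothetical `dedupe` helper; it uses only a plain loop and a set rather than relying on ordered dictionaries, which should keep it usable in Python 2 style interpreters as well as Python 3. It also drops empty values left behind by stray `||` separators.

```python
def dedupe(value, sep="||"):
    """Remove duplicate values from a multi-value field, keeping first occurrences.

    Hypothetical helper: empty values (e.g. from "a||||b") are dropped too.
    """
    seen = set()
    kept = []
    for item in value.split(sep):
        # Skip empties and anything already kept, preserving the original order
        if item and item not in seen:
            seen.add(item)
            kept.append(item)
    return sep.join(kept)
```

For example, `dedupe("maize||soil||maize||water")` returns `"maize||soil||water"`.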
- Then I did the same with countries, woooooo!
- I checked for duplicates and found forty-one
- I just stumbled upon UNTERM, which provides the official list of countries for the UN General Assembly, including a downloadable Excel file with the short and formal names in all the official UN languages: https://unterm.un.org/unterm2/en/country
- I created a [pull request to add common names for Iran, Laos, and Syria to the Debian iso-codes package](https://salsa.debian.org/iso-codes-team/iso-codes/-/merge_requests/32)
  - These common names are mentioned in the remarks for ISO 3166-1 on ISO's Online Browsing Platform
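If the extracted countries use the official short names, a small lookup could swap in the common names from the pull request. This is a sketch: `COMMON_NAMES`, `to_common_name`, and `normalize_countries` are hypothetical, and the keys are the ISO 3166-1 short names as I understand them, so they should be checked against UNTERM or iso-codes before use.

```python
# Hypothetical lookup from official short names (lowercased) to common names;
# verify the keys against UNTERM / iso-codes and extend as needed
COMMON_NAMES = {
    "iran (islamic republic of)": "Iran",
    "lao people's democratic republic": "Laos",
    "syrian arab republic": "Syria",
}


def to_common_name(country):
    """Return the common name for an official country name, or the input unchanged."""
    return COMMON_NAMES.get(country.strip().lower(), country)


def normalize_countries(value, sep="||"):
    """Apply to_common_name() to every value in a multi-value country field."""
    return sep.join(to_common_name(c) for c in value.split(sep))
```

For example, `normalize_countries("Syrian Arab Republic||Kenya||Iran (Islamic Republic of)")` returns `"Syria||Kenya||Iran"`; names not in the lookup pass through untouched.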
<!-- vim: set sw=2 ts=2: -->