---
title: "March, 2023"
date: 2023-03-01T07:58:36+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2023-03-01

- Remove `cg.subject.wle` and `cg.identifier.wletheme` from the CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)
- [iso-codes 4.13.0 was released](https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28), which incorporates my changes to the common names for Iran, Laos, and Syria
- I finally finished porting the input form from DSpace 6 to DSpace 7

<!--more-->

- I can't put my finger on it, but the input form has to be formatted very particularly: for example, if a row has more than two fields without a sufficient Bootstrap grid style, or if you use a `twobox`, etc., the entire form step appears blank

## 2023-03-02

- I did some experiments with the new [Pandas 2.0.0rc0 Apache Arrow support](https://datapythonista.me/blog/pandas-20-and-the-arrow-revolution-part-i)
- There is a change to the way nulls are handled and it causes my tests for `pd.isna(field)` to fail
- I think we need to consider blanks as null, but I'm not sure
- I made some adjustments to the Discovery sidebar facets on DSpace 6 while I was looking at the DSpace 7 configuration
- I downgraded CIFOR subject, Humidtropics subject, Drylands subject, ICARDA subject, and Language from `DiscoverySearchFilterFacet` to `DiscoverySearchFilter` in `discovery.xml` since we are no longer using them in sidebar facets

## 2023-03-03

- Atmire merged one of my old pull requests into COUNTER-Robots:
- [COUNTER_Robots_list.json: Add new bots](https://github.com/atmire/COUNTER-Robots/pull/54)
- I will update the local ILRI overrides in our DSpace spider agents file

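- A minimal sketch of how such a pattern list can be used to flag robot user agents. The sample entries, the `"pattern"` key, the case-insensitive matching, and the helper names `load_patterns` and `is_robot` are assumptions for illustration, not the actual contents of the COUNTER-Robots list or of our spider agents file:

```python
import json
import re

# Hypothetical excerpt shaped like a JSON robots list: each entry is an
# object whose "pattern" value is a regular expression (assumed layout).
SAMPLE_JSON = """
[
  {"pattern": "bot"},
  {"pattern": "^curl"},
  {"pattern": "spider"}
]
"""


def load_patterns(text):
    """Compile every "pattern" entry, ignoring case (an assumption)."""
    return [re.compile(entry["pattern"], re.IGNORECASE) for entry in json.loads(text)]


def is_robot(user_agent, patterns):
    """Return True if any pattern matches anywhere in the user agent."""
    return any(pattern.search(user_agent) for pattern in patterns)


PATTERNS = load_patterns(SAMPLE_JSON)

for agent in [
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:110.0) Gecko/20100101 Firefox/110.0",
]:
    print(agent, "->", is_robot(agent, PATTERNS))
```
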
## 2023-03-04

- Submit a [pull request on pycountry to use iso-codes 4.13.0](https://github.com/flyingcircusio/pycountry/pull/156)

## 2023-03-05

- Start a harvest on AReS

## 2023-03-06

- Export CGSpace to do Initiative collection mappings
- There were thirty-three that needed updating
- Send Abenet and Sam a list of twenty-one CAS publications that had been marked as "multiple documents" that we uploaded as metadata-only items
- Goshu will download the PDFs for each and upload them to the items on CGSpace manually
- I spent some time trying to get csv-metadata-quality working with the new Arrow backend for Pandas 2.0.0rc0
- It seems there is a problem recognizing empty strings as NA with `pd.isna()`
- If I do `pd.isna(field) or field == ""` then it works as expected, but that feels hacky
- I'm going to test again on the next release...
- Note that I had been setting both of these global options:

```
pd.options.mode.dtype_backend = 'pyarrow'
pd.options.mode.nullable_dtypes = True
```

- Then reading the CSV like this:

```
df = pd.read_csv(args.input_file, engine='pyarrow', dtype='string[pyarrow]')
```

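- A minimal sketch of wrapping the `pd.isna(field) or field == ""` workaround in a helper so the check lives in one place. The name `is_blank_or_na`, the extra handling of whitespace-only strings, and the fallback used when pandas is not installed are assumptions for illustration, not what csv-metadata-quality actually does:

```python
import math

try:
    from pandas import isna  # the same check as pd.isna() in the notes above
except ImportError:
    # Assumption: a stand-in so the sketch still runs without pandas.
    def isna(value):
        return value is None or (isinstance(value, float) and math.isnan(value))


def is_blank_or_na(field):
    """Treat missing scalars and empty or whitespace-only strings as null."""
    if isna(field):
        return True
    # Empty strings are not reported as missing by isna(), so check them here.
    return isinstance(field, str) and field.strip() == ""


for value in ["", "   ", None, float("nan"), "Kenya"]:
    print(repr(value), is_blank_or_na(value))
```

- This only handles scalar values, which matches checking one field at a time
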
<!-- vim: set sw=2 ts=2: -->