---
title: "March, 2023"
date: 2023-03-01T07:58:36+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2023-03-01

- Remove `cg.subject.wle` and `cg.identifier.wletheme` from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)
- [iso-codes 4.13.0 was released](https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28), which incorporates my changes to the common names for Iran, Laos, and Syria
- I finally got through with porting the input form from DSpace 6 to DSpace 7
  - I can't put my finger on it, but the input form has to be formatted very particularly, for example if your rows have more than two fields in them without a sufficient Bootstrap grid style, or if you use a `twobox`, etc, the entire form step appears blank

## 2023-03-02

- I did some experiments with the new [Pandas 2.0.0rc0 Apache Arrow support](https://datapythonista.me/blog/pandas-20-and-the-arrow-revolution-part-i)
  - There is a change to the way nulls are handled and it causes my tests for `pd.isna(field)` to fail
  - I think we need to consider blanks as null, but I'm not sure
- I made some adjustments to the Discovery sidebar facets on DSpace 6 while I was looking at the DSpace 7 configuration
  - I downgraded CIFOR subject, Humidtropics subject, Drylands subject, ICARDA subject, and Language from DiscoverySearchFilterFacet to DiscoverySearchFilter in `discovery.xml` since we are no longer using them in sidebar facets

## 2023-03-03

- Atmire merged one of my old pull requests into COUNTER-Robots:
  - [COUNTER_Robots_list.json: Add new bots](https://github.com/atmire/COUNTER-Robots/pull/54)
- I will update the local ILRI overrides in our DSpace spider agents file

## 2023-03-04

- Submit a [pull request on pycountry to use iso-codes 4.13.0](https://github.com/flyingcircusio/pycountry/pull/156)

## 2023-03-05

- Start a harvest on AReS

## 2023-03-06

- Export CGSpace to do Initiative collection mappings
  - There were
thirty-three that needed updating
- Send Abenet and Sam a list of twenty-one CAS publications that had been marked as "multiple documents" that we uploaded as metadata-only items
  - Goshu will download the PDFs for each and upload them to the items on CGSpace manually
- I spent some time trying to get csv-metadata-quality working with the new Arrow backend for Pandas 2.0.0rc0
  - It seems there is a problem recognizing empty strings as NA with `pd.isna()`
  - If I do `pd.isna(field) or field == ""` then it works as expected, but that feels hacky
  - I'm going to test again on the next release...
  - Note that I had been setting both of these global options:

```
pd.options.mode.dtype_backend = 'pyarrow'
pd.options.mode.nullable_dtypes = True
```

  - Then reading the CSV like this:

```
df = pd.read_csv(args.input_file, engine='pyarrow', dtype='string[pyarrow]')
```

## 2023-03-07

- Create a PostgreSQL 14 instance on my local environment to start testing compatibility with DSpace 6 as well as all my scripts:

```console
$ podman pull docker.io/library/postgres:14-alpine
$ podman run --name dspacedb14 -v dspacedb14_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:14-alpine
$ createuser -h localhost -p 5432 -U postgres --pwprompt dspacetest
$ createdb -h localhost -p 5432 -U postgres -O dspacetest --encoding=UNICODE dspacetest
```

- Peter sent me a list of items that had ILRI affiliation on Altmetric, but that didn't have Handles
  - I ran a duplicate check on them to see whether they already exist or if we can import them
  - There were about ninety matches, but a few dozen of those were pre-prints!
  - After excluding those there were about sixty-one items we already have on CGSpace so I will add their DOIs to the existing items
    - After joining these with the records from CGSpace and inspecting the DOIs I found that only forty-four were new DOIs
    - Surprisingly some of the DOIs on Altmetric were not working, though we also had some that were not working (specifically the Journal of Agricultural Economics seems to have reassigned DOIs)
- An unscientific comparison of duplicate checking Peter's file with ~500 titles on PostgreSQL 12 and PostgreSQL 14:
  - PostgreSQL 12: `0.11s user 0.04s system 0% cpu 19:24.65 total`
  - PostgreSQL 14: `0.12s user 0.04s system 0% cpu 18:13.47 total`
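
To put a number on the two timings above, here is a minimal sketch that converts the `total` field to seconds and compares them. It assumes the totals are in the `MM:SS.ss` (or `H:MM:SS.ss`) form that the shell's `time` printed; `elapsed_seconds` is a hypothetical helper written for this note, not part of any of my scripts.

```python
def elapsed_seconds(total: str) -> float:
    """Convert a `time` total such as '19:24.65' to seconds.

    Assumes colon-separated fields ([H:]MM:SS.ss), most significant first.
    """
    seconds = 0.0
    for part in total.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Totals copied from the two runs above
pg12 = elapsed_seconds("19:24.65")
pg14 = elapsed_seconds("18:13.47")
saved = pg12 - pg14

print(f"PostgreSQL 12: {pg12:.2f}s, PostgreSQL 14: {pg14:.2f}s")
print(f"Difference: {saved:.2f}s ({saved / pg12:.1%} less time on PostgreSQL 14)")
```

That works out to roughly 71 seconds, or about 6%, in favor of PostgreSQL 14. It is a single run on each version, so it could easily be noise.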