cgspace-notes/2023-03.md at 19f8de4481b30a426de656412688e14c702aa622

mirror of https://github.com/alanorth/cgspace-notes.git synced 2024-06-29 01:23:47 +02:00

Alan Orth 19f8de4481

2023-03-07 09:53:31 +03:00

2023-03-01

Remove cg.subject.wle and cg.identifier.wletheme from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)
iso-codes 4.13.0 was released, which incorporates my changes to the common names for Iran, Laos, and Syria
I finally got through with porting the input form from DSpace 6 to DSpace 7

I can't put my finger on it, but the input form has to be formatted very particularly: for example, if your rows have more than two fields in them without a sufficient Bootstrap grid style, or if you use a twobox, etc., the entire form step appears blank

I did some experiments with the new Pandas 2.0.0rc0 Apache Arrow support
- There is a change to the way nulls are handled and it causes my tests for pd.isna(field) to fail
- I think we need to consider blanks as null, but I'm not sure
I made some adjustments to the Discovery sidebar facets on DSpace 6 while I was looking at the DSpace 7 configuration
- I downgraded CIFOR subject, Humidtropics subject, Drylands subject, ICARDA subject, and Language from DiscoverySearchFilterFacet to DiscoverySearchFilter in discovery.xml since we are no longer using them in sidebar facets

Atmire merged one of my old pull requests into COUNTER-Robots:
- COUNTER_Robots_list.json: Add new bots
I will update the local ILRI overrides in our DSpace spider agents file

Export CGSpace to do Initiative collection mappings
- There were thirty-three that needed updating
Send Abenet and Sam a list of twenty-one CAS publications that had been marked as "multiple documents" that we uploaded as metadata-only items
- Goshu will download the PDFs for each and upload them to the items on CGSpace manually
I spent some time trying to get csv-metadata-quality working with the new Arrow backend for Pandas 2.0.0rc0
- It seems there is a problem recognizing empty strings as NA with pd.isna()
- If I do pd.isna(field) or field == "" then it works as expected, but that feels hacky
- I'm going to test again on the next release...
- Note that I had been setting both of these global options:

pd.options.mode.dtype_backend = 'pyarrow'
pd.options.mode.nullable_dtypes = True

df = pd.read_csv(args.input_file, engine='pyarrow', dtype='string[pyarrow]')
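
The `pd.isna(field) or field == ""` workaround mentioned above could be wrapped in a small helper so the check lives in one place. This is only a minimal sketch: the name `is_missing` is hypothetical and not part of csv-metadata-quality, and treating whitespace-only strings as blank is my assumption rather than something the notes settle.

```python
import pandas as pd


def is_missing(field) -> bool:
    """Return True for null-like scalars and for blank strings.

    Hypothetical helper: treating whitespace-only strings as blank is an
    assumption, not confirmed behavior of csv-metadata-quality.
    """
    # Handle strings first so that "" counts as missing no matter how the
    # active dtype backend represents empty CSV cells
    if isinstance(field, str):
        return field.strip() == ""

    # Everything else (None, NaN, pd.NA) is left to pandas
    return bool(pd.isna(field))


# Made-up sample values, not taken from a real CGSpace export
samples = ["", "   ", None, float("nan"), pd.NA, "Kenya"]
results = [is_missing(value) for value in samples]
print(results)
```

Checking for strings before calling `pd.isna()` keeps the helper independent of whether empty cells arrive as `""` or as a null value, which is the behavior that seemed to change with the Arrow backend.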