diff --git a/content/posts/2023-03.md b/content/posts/2023-03.md
index e7f3ed611..3eb92d67e 100644
--- a/content/posts/2023-03.md
+++ b/content/posts/2023-03.md
@@ -9,7 +9,55 @@ categories: ["Notes"]
 - Remove `cg.subject.wle` and `cg.identifier.wletheme` from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)
 - [iso-codes 4.13.0 was released](https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28), which incorporates my changes to the common names for Iran, Laos, and Syria
+- I finally got through with porting the input form from DSpace 6 to DSpace 7
+- I can't put my finger on it, but the input form has to be formatted very particularly, for example if your rows have more than two fields in them without a sufficient Bootstrap grid style, or if you use a `twobox`, etc, the entire form step appears blank
+
+## 2023-03-02
+
+- I did some experiments with the new [Pandas 2.0.0rc0 Apache Arrow support](https://datapythonista.me/blog/pandas-20-and-the-arrow-revolution-part-i)
+  - There is a change to the way nulls are handled and it causes my tests for `pd.isna(field)` to fail
+  - I think we need to consider blanks as null, but I'm not sure
+- I made some adjustments to the Discovery sidebar facets on DSpace 6 while I was looking at the DSpace 7 configuration
+  - I downgraded CIFOR subject, Humidtropics subject, Drylands subject, ICARDA subject, and Language from DiscoverySearchFilterFacet to DiscoverySearchFilter in `discovery.xml` since we are no longer using them in sidebar facets
+
+## 2023-03-03
+
+- Atmire merged one of my old pull requests into COUNTER-Robots:
+  - [COUNTER_Robots_list.json: Add new bots](https://github.com/atmire/COUNTER-Robots/pull/54)
+- I will update the local ILRI overrides in our DSpace spider agents file
+
+## 2023-03-04
+
+- Submit a [pull request on pycountry to use iso-codes 4.13.0](https://github.com/flyingcircusio/pycountry/pull/156)
+
+## 2023-03-05
+
+- Start a harvest on AReS
+
+## 2023-03-06
+
+- Export CGSpace to do Initiative collection mappings
+  - There were thirty-three that needed updating
+- Send Abenet and Sam a list of twenty-one CAS publications that had been marked as "multiple documents" that we uploaded as metadata-only items
+  - Goshu will download the PDFs for each and upload them to the items on CGSpace manually
+- I spent some time trying to get csv-metadata-quality working with the new Arrow backend for Pandas 2.0.0rc0
+  - It seems there is a problem recognizing empty strings as na with `pd.isna()`
+  - If I do `pd.isna(field) or field == ""` then it works as expected, but that feels hacky
+  - I'm going to test again on the next release...
+  - Note that I had been setting both of these global options:
+
+```
+pd.options.mode.dtype_backend = 'pyarrow'
+pd.options.mode.nullable_dtypes = True
+```
+
+- Then reading the CSV like this:
+
+```
+df = pd.read_csv(args.input_file, engine='pyarrow', dtype='string[pyarrow]')
+```
+
diff --git a/docs/2023-02/index.html b/docs/2023-02/index.html
index 5d8757162..97275f069 100644
--- a/docs/2023-02/index.html
+++ b/docs/2023-02/index.html
@@ -18,7 +18,7 @@ I want to try to expand my use of their data to journals, publishers, volumes, i
-
+
@@ -44,7 +44,7 @@ I want to try to expand my use of their data to journals, publishers, volumes, i
 "url": "https://alanorth.github.io/cgspace-notes/2023-02/",
 "wordCount": "3087",
 "datePublished": "2023-02-01T10:57:36+03:00",
- "dateModified": "2023-02-26T19:59:12+03:00",
+ "dateModified": "2023-03-01T08:30:25+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
diff --git a/docs/2023-03/index.html b/docs/2023-03/index.html
index 20a88fc70..1eef17327 100644
--- a/docs/2023-03/index.html
+++ b/docs/2023-03/index.html
@@ -11,11 +11,12 @@
 Remove cg.subject.wle and cg.identifier.wletheme from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)
 iso-codes 4.13.0 was released, which incorporates my changes to the common names for Iran, Laos, and Syria
+I finally got through with porting the input form from DSpace 6 to DSpace 7
 " />
-
+
@@ -25,6 +26,7 @@ iso-codes 4.13.0 was released, which incorporates my changes to the common names
 Remove cg.subject.wle and cg.identifier.wletheme from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)
 iso-codes 4.13.0 was released, which incorporates my changes to the common names for Iran, Laos, and Syria
+I finally got through with porting the input form from DSpace 6 to DSpace 7
 "/>
@@ -36,9 +38,9 @@ iso-codes 4.13.0 was released, which incorporates my changes to the common names
 "@type": "BlogPosting",
 "headline": "March, 2023",
 "url": "https://alanorth.github.io/cgspace-notes/2023-03/",
- "wordCount": "41",
+ "wordCount": "380",
 "datePublished": "2023-03-01T07:58:36+03:00",
- "dateModified": "2023-03-01T07:58:36+03:00",
+ "dateModified": "2023-03-01T08:30:25+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -116,8 +118,70 @@ iso-codes 4.13.0 was released, which incorporates my changes to the common names
-
+
+

2023-03-02

+ +

2023-03-03

+ +

2023-03-04

+ +

2023-03-05

+ +

2023-03-06

+ +
pd.options.mode.dtype_backend = 'pyarrow'
+pd.options.mode.nullable_dtypes = True
+
+
df = pd.read_csv(args.input_file, engine='pyarrow', dtype='string[pyarrow]')
+
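The `pd.isna(field) or field == ""` workaround from the 2023-03-06 notes can be wrapped in a small helper so the check lives in one place. This is only a minimal sketch, assuming `field` is a single scalar cell value; the name `is_blank` is hypothetical and does not come from csv-metadata-quality.

```python
import pandas as pd


def is_blank(field) -> bool:
    """Return True for missing values (None, NaN, pd.NA) and for empty strings.

    Hypothetical helper for illustration; assumes `field` is a scalar.
    """
    # pd.isna() covers None, float NaN, and pd.NA. Checking it first also
    # avoids comparing pd.NA to a string, which would yield pd.NA, not a bool.
    if pd.isna(field):
        return True
    # Empty strings are not treated as missing by pd.isna(), so test explicitly.
    return field == ""
```

Calling `is_blank(field)` in place of a bare `pd.isna(field)` keeps the "blank counts as null" decision explicit, so it is easy to drop if a later Pandas release changes how empty strings are read.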
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 9c793776b..b4165aa13 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 7c3651842..8312068bc 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
@@ -94,6 +94,7 @@ Read more →
diff --git a/docs/categories/notes/index.xml b/docs/categories/notes/index.xml
index 337db26e8..009b65e22 100644
--- a/docs/categories/notes/index.xml
+++ b/docs/categories/notes/index.xml
@@ -17,6 +17,7 @@
 <ul>
 <li>Remove <code>cg.subject.wle</code> and <code>cg.identifier.wletheme</code> from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)</li>
 <li><a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28">iso-codes 4.13.0 was released</a>, which incorporates my changes to the common names for Iran, Laos, and Syria</li>
+<li>I finally got through with porting the input form from DSpace 6 to DSpace 7</li>
 </ul>
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 22623cff5..dc5774aa3 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 202fcff6d..99dcd4cfa 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 4720ab2d5..70526d4bc 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index cbf0ebf4a..28d933159 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index 5e8945aaa..060fcb840 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index a1958a4ed..e0df8f45a 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 713b4532f..3a86ef2ea 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
@@ -109,6 +109,7 @@ Read more →
diff --git a/docs/index.xml b/docs/index.xml
index 98ecae208..13807d77a 100644
--- a/docs/index.xml
+++ b/docs/index.xml
@@ -17,6 +17,7 @@
 <ul>
 <li>Remove <code>cg.subject.wle</code> and <code>cg.identifier.wletheme</code> from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)</li>
 <li><a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28">iso-codes 4.13.0 was released</a>, which incorporates my changes to the common names for Iran, Laos, and Syria</li>
+<li>I finally got through with porting the input form from DSpace 6 to DSpace 7</li>
 </ul>
diff --git a/docs/page/10/index.html b/docs/page/10/index.html
index 1450fb16a..66c5e1fe5 100644
--- a/docs/page/10/index.html
+++ b/docs/page/10/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 69c5add40..c45958ec8 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 099d45e76..5a5f1b29a 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index c710d6fff..8fc7216bf 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 5bbcd6103..feeebb159 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index c6cb290a8..d6a0492ec 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 207b74195..5ffdd7056 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 657916443..a835003c8 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index 3838a9ea7..3e1c5abe9 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 07f62275c..40aab7f5d 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
@@ -109,6 +109,7 @@ Read more →
diff --git a/docs/posts/index.xml b/docs/posts/index.xml
index 88ab1fcf5..dc10ad138 100644
--- a/docs/posts/index.xml
+++ b/docs/posts/index.xml
@@ -17,6 +17,7 @@
 <ul>
 <li>Remove <code>cg.subject.wle</code> and <code>cg.identifier.wletheme</code> from CGSpace input form after confirming with IWMI colleagues that they no longer need them (WLE closed in 2021)</li>
 <li><a href="https://salsa.debian.org/iso-codes-team/iso-codes/-/blob/main/CHANGELOG.md#4130-2023-02-28">iso-codes 4.13.0 was released</a>, which incorporates my changes to the common names for Iran, Laos, and Syria</li>
+<li>I finally got through with porting the input form from DSpace 6 to DSpace 7</li>
 </ul>
diff --git a/docs/posts/page/10/index.html b/docs/posts/page/10/index.html
index c7c505278..9d8e307b1 100644
--- a/docs/posts/page/10/index.html
+++ b/docs/posts/page/10/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 596226365..95ac64580 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 809e5204c..3fbed0292 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 733f9e590..d53637c70 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index f5a91a257..68a6bdd2d 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index dcc139da0..de9fdb37f 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 0825fc27a..4198ebae7 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index c7c08ac22..0c33aee03 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 63b6185c4..146e3f483 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 647c298c1..2cbc06a9b 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,22 +3,22 @@ xmlns:xhtml="http://www.w3.org/1999/xhtml">
 https://alanorth.github.io/cgspace-notes/categories/
- 2023-03-01T07:58:36+03:00
+ 2023-03-01T08:30:25+03:00
 https://alanorth.github.io/cgspace-notes/
- 2023-03-01T07:58:36+03:00
+ 2023-03-01T08:30:25+03:00
 https://alanorth.github.io/cgspace-notes/2023-03/
- 2023-03-01T07:58:36+03:00
+ 2023-03-01T08:30:25+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2023-03-01T07:58:36+03:00
+ 2023-03-01T08:30:25+03:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2023-03-01T07:58:36+03:00
+ 2023-03-01T08:30:25+03:00
 https://alanorth.github.io/cgspace-notes/2023-02/
- 2023-02-26T19:59:12+03:00
+ 2023-03-01T08:30:25+03:00
 https://alanorth.github.io/cgspace-notes/2023-01/
 2023-01-31T22:20:38+03:00