From 9efd56b4059afc2c2c181f1b4060177a182f43c6 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Thu, 27 Jan 2022 16:58:05 +0300
Subject: [PATCH] Add notes

---
 content/posts/2022-01.md                |  88 +++++++++++++++++++
 docs/2022-01/index.html                 | 109 +++++++++++++++++++++++-
 docs/categories/index.html              |   2 +-
 docs/categories/notes/index.html        |   2 +-
 docs/categories/notes/page/2/index.html |   2 +-
 docs/categories/notes/page/3/index.html |   2 +-
 docs/categories/notes/page/4/index.html |   2 +-
 docs/categories/notes/page/5/index.html |   2 +-
 docs/categories/notes/page/6/index.html |   2 +-
 docs/index.html                         |   2 +-
 docs/page/2/index.html                  |   2 +-
 docs/page/3/index.html                  |   2 +-
 docs/page/4/index.html                  |   2 +-
 docs/page/5/index.html                  |   2 +-
 docs/page/6/index.html                  |   2 +-
 docs/page/7/index.html                  |   2 +-
 docs/page/8/index.html                  |   2 +-
 docs/posts/index.html                   |   2 +-
 docs/posts/page/2/index.html            |   2 +-
 docs/posts/page/3/index.html            |   2 +-
 docs/posts/page/4/index.html            |   2 +-
 docs/posts/page/5/index.html            |   2 +-
 docs/posts/page/6/index.html            |   2 +-
 docs/posts/page/7/index.html            |   2 +-
 docs/posts/page/8/index.html            |   2 +-
 docs/sitemap.xml                        |  10 +--
 26 files changed, 222 insertions(+), 31 deletions(-)

diff --git a/content/posts/2022-01.md b/content/posts/2022-01.md
index be9d02dfc..af30ffe38 100644
--- a/content/posts/2022-01.md
+++ b/content/posts/2022-01.md
@@ -88,4 +88,92 @@ $ grep -E '^2022-01*' /var/log/postgresql/postgresql-10-main.log | grep -c 'stil
 - I set a system alert on DSpace and then restarted the server
+## 2022-01-20
+
+- Abenet gave me a thumbs up for Gaia's eighteen CAS Green Cover items from last month
+  - I created a SimpleArchiveFormat bundle with SAFBuilder and then imported them on CGSpace:
+
+```console
+$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" dspace import --add --eperson=aorth@mjanja.ch --source /tmp/SimpleArchiveFormat --mapfile=./2022-01-20-green-covers.map
+```
+
+## 2022-01-21
+
+- Start working on the rest of the ~980 CGIAR TAC and ICW documents from Gaia
+  - I did some cleanups and standardization of author names
+  - I also noticed that a few dozen items had no dates at all, so I checked the PDFs and found dates for them in the text
+  - Otherwise all items have only a year, which is not great...
+- Proof of concept upgrade of OpenRXV from Angular 9 to Angular 10
+  - I did some basic tests and created a [pull request](https://github.com/ilri/OpenRXV/pull/128)
+
+## 2022-01-22
+
+- Spend some time adding months to the CGIAR TAC and ICW records from Gaia
+  - Most of the PDFs have months so this is annoying...
+
+## 2022-01-23
+
+- Finalize cleaning up the dates on the CGIAR TAC and ICW records from Gaia
+- Rebuild AReS and start a fresh harvest
+
+## 2022-01-25
+
+- Help Udana from IWMI answer some questions about licenses on their journal articles
+  - I was surprised to see they have 921 total, but only about 200 have a `dcterms.license` field
+  - I updated about thirty manually, but really Udana should do more...
+- Normalize the metadata `text_lang` attributes in the CGSpace database:
+
+```console
+dspace=# SELECT DISTINCT text_lang, count(text_lang) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) GROUP BY text_lang ORDER BY count DESC;
+ text_lang |  count
+-----------+---------
+ en_US     | 2803350
+ en        |    6232
+           |    3200
+ fr        |       2
+ vn        |       2
+ 92        |       1
+ sp        |       1
+           |       0
+(8 rows)
+dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_lang IN ('en', '92', '');
+UPDATE 9433
+```
+
+- Then export the WLE Journal Articles collection again so there are fewer columns to mess with
+
+## 2022-01-26
+
+- Send Gaia an example of the duplicate report for the first 200 TAC items to see what she thinks
+
+## 2022-01-27
+
+- Work on WLE's Journal Articles a bit more
+  - I realized that ~130 items have DOIs in their citation, but no `cg.identifier.doi` field
+  - I used this OpenRefine GREL to copy them:
+
+```
+cells['dcterms.bibliographicCitation[en_US]'].value.split("doi: ")[1]
+```
+
+- I also spent a bit of time cleaning up ILRI Journal Articles, but I noticed that we don't put DOIs in the citation so it's not possible to fix items that are missing DOIs that way
+  - And I cleaned up and normalized some licenses
+- Francesca from Bioversity was having issues with a submission on CGSpace again
+  - I looked at PostgreSQL and saw an increasing number of locks:
+
+```console
+$ psql -c "SELECT application_name FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid" | sort | uniq -c | sort -n
+      1
+      1 ------------------
+      1 (537 rows)
+      1  application_name
+      9  psql
+     51  dspaceApi
+    477  dspaceWeb
+$ grep -E '^2022-01*' /var/log/postgresql/postgresql-10-main.log | grep -c 'still waiting for'
+3
+```
+
+- I set a system alert on CGSpace and then restarted Tomcat and PostgreSQL
+
diff --git a/docs/2022-01/index.html b/docs/2022-01/index.html
index 9ffd653b5..15c775bbb 100644
--- a/docs/2022-01/index.html
+++ b/docs/2022-01/index.html
@@ -14,7 +14,7 @@ Start a full harvest on AReS
-
+
@@ -34,9 +34,9 @@ Start a full harvest on AReS
   "@type": "BlogPosting",
   "headline": "January, 2022",
   "url": "https://alanorth.github.io/cgspace-notes/2022-01/",
-  "wordCount": "366",
+  "wordCount": "855",
   "datePublished": "2022-01-01T15:20:54+02:00",
-  "dateModified": "2022-01-12T19:55:47+02:00",
+  "dateModified": "2022-01-19T18:14:26+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -196,6 +196,109 @@ node_modules/@elastic/elasticsearch/api/types.d.ts:3209:13 - error TS2456: Type
+

2022-01-20

+ +
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" dspace import --add --eperson=aorth@mjanja.ch --source /tmp/SimpleArchiveFormat --mapfile=./2022-01-20-green-covers.map
+

2022-01-21

+ +

2022-01-22

+ +

2022-01-23

+ +

2022-01-25

+ +
dspace=# SELECT DISTINCT text_lang, count(text_lang) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) GROUP BY text_lang ORDER BY count DESC;
+ text_lang |  count  
+-----------+---------
+ en_US     | 2803350
+ en        |    6232
+           |    3200
+ fr        |       2
+ vn        |       2
+ 92        |       1
+ sp        |       1
+           |       0
+(8 rows)
+dspace=# UPDATE metadatavalue SET text_lang='en_US' WHERE dspace_object_id IN (SELECT uuid FROM item) AND text_lang IN ('en', '92', '');
+UPDATE 9433
+
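As a sanity check on the `UPDATE 9433` result above, the row count should equal the sum of the tallies for the three values being normalized. A small Python sketch using the counts copied from the query output; the dictionary and variable names are only for illustration, and it assumes the blank row with 3200 is the empty string while the blank row with 0 is NULL (which `count(text_lang)` skips):

```python
# Counts copied from the SELECT output above. None stands in for NULL,
# assumed to be the row reported as 0 because count() ignores NULLs.
counts = {
    "en_US": 2803350,
    "en": 6232,
    "": 3200,
    "fr": 2,
    "vn": 2,
    "92": 1,
    "sp": 1,
    None: 0,
}

# Values targeted by the UPDATE's IN ('en', '92', '') clause.
to_normalize = ["en", "92", ""]

expected_rows = sum(counts[value] for value in to_normalize)
print(expected_rows)
```

The NULL row is not touched by the `IN (...)` clause, which is consistent with it not contributing to the total.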
+

2022-01-26

+ +

2022-01-27

+ +
cells['dcterms.bibliographicCitation[en_US]'].value.split("doi: ")[1]
+
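For reference, the same split can be sketched in Python, for example to spot-check a CSV export outside OpenRefine. The function name and the sample citation are made up for illustration; like the GREL, it assumes the citation contains the literal text `doi: `.

```python
from typing import Optional


def doi_from_citation(citation: str, marker: str = "doi: ") -> Optional[str]:
    """Return the text after the first "doi: " marker, or None if absent."""
    parts = citation.split(marker)
    if len(parts) < 2:
        return None
    # Mirrors split("doi: ")[1] in the GREL, plus trimming whitespace.
    return parts[1].strip()


# Hypothetical citation, not a real record.
example = "Doe, J. 2020. An example title. Some Journal 1(2):3-4. doi: 10.1234/example.5678"
print(doi_from_citation(example))
print(doi_from_citation("A citation without a DOI"))
```

Returning `None` for citations without the marker makes it easy to skip rows rather than write an empty `cg.identifier.doi` value.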
+
$ psql -c "SELECT application_name FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid = psa.pid" | sort | uniq -c | sort -n
+      1 
+      1 ------------------
+      1 (537 rows)
+      1  application_name 
+      9  psql
+     51  dspaceApi
+    477  dspaceWeb
+$ grep -E '^2022-01*' /var/log/postgresql/postgresql-10-main.log | grep -c 'still waiting for'
+3
+
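One thing to keep in mind about the `grep -E '^2022-01*'` pattern above: `1*` means zero or more `1` characters, so the pattern is effectively `^2022-0` and would also match other months. If the log only contains January entries the count is the same, but `'^2022-01'` is stricter. A quick Python illustration (Python's `re` treats this particular pattern the same way as an extended regular expression; the sample log lines are invented):

```python
import re

loose = re.compile(r"^2022-01*")  # as used in the grep above
strict = re.compile(r"^2022-01")  # literal January prefix

# Invented log lines for illustration only.
lines = [
    "2022-01-27 10:00:00 UTC process 123 still waiting for ShareLock",
    "2022-02-03 11:00:00 UTC process 456 still waiting for ShareLock",
]

loose_hits = [line for line in lines if loose.search(line)]
strict_hits = [line for line in lines if strict.search(line)]
print(len(loose_hits), len(strict_hits))
```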
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 3a4125df9..e3eb4917c 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 6c7fbf2ca..351a7264d 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 3a89c20a9..dd284517b 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 6b1fd1452..aaddb825e 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 5a34a63fb..09b9245fe 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 2be2e3264..5e81354d4 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index 08906e18e..a1b263f35 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index ea1de9b3b..2cbee5ca3 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index db6f770ff..a94d58b59 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index c1da31a11..679340690 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 9ff1af37c..a7e315887 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 668110310..439a149bc 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 8b88ee4c5..606bea2e3 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index ea015b373..9f6b79bdf 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 0755baaa4..97b0432cb 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 0087c7ed8..3f27688cd 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 8f1ccdc82..645849975 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index da204b934..411530231 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 276c1d9b2..e44db00dd 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 08dab30b3..3f5deafdb 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index e572c59c9..4a7ee09af 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 0a0bb55ab..d7b63b113 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index b20e0817c..4aef71cb0 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index b271efc51..8a2ffbb1e 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
 xmlns:xhtml="http://www.w3.org/1999/xhtml">
 https://alanorth.github.io/cgspace-notes/categories/
- 2022-01-12T19:55:47+02:00
+ 2022-01-19T18:14:26+03:00
 https://alanorth.github.io/cgspace-notes/
- 2022-01-12T19:55:47+02:00
+ 2022-01-19T18:14:26+03:00
 https://alanorth.github.io/cgspace-notes/2022-01/
- 2022-01-12T19:55:47+02:00
+ 2022-01-19T18:14:26+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2022-01-12T19:55:47+02:00
+ 2022-01-19T18:14:26+03:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2022-01-12T19:55:47+02:00
+ 2022-01-19T18:14:26+03:00
 https://alanorth.github.io/cgspace-notes/2021-12/
 2022-01-09T10:39:51+02:00