From f07c04bd7e05e3775b56a9938b3a8fb61467c42b Mon Sep 17 00:00:00 2001 From: Alan Orth Date: Fri, 13 May 2022 08:39:15 +0300 Subject: [PATCH] Add notes for 2022-05-12 --- content/posts/2022-05.md | 39 ++++++++++++++++ docs/2022-01/index.html | 8 ++-- docs/2022-05/index.html | 60 +++++++++++++++++++++++-- docs/categories/index.html | 2 +- docs/categories/notes/index.html | 2 +- docs/categories/notes/page/2/index.html | 2 +- docs/categories/notes/page/3/index.html | 2 +- docs/categories/notes/page/4/index.html | 2 +- docs/categories/notes/page/5/index.html | 2 +- docs/categories/notes/page/6/index.html | 2 +- docs/index.html | 2 +- docs/page/2/index.html | 2 +- docs/page/3/index.html | 2 +- docs/page/4/index.html | 2 +- docs/page/5/index.html | 2 +- docs/page/6/index.html | 2 +- docs/page/7/index.html | 2 +- docs/page/8/index.html | 2 +- docs/page/9/index.html | 2 +- docs/posts/index.html | 2 +- docs/posts/page/2/index.html | 2 +- docs/posts/page/3/index.html | 2 +- docs/posts/page/4/index.html | 2 +- docs/posts/page/5/index.html | 2 +- docs/posts/page/6/index.html | 2 +- docs/posts/page/7/index.html | 2 +- docs/posts/page/8/index.html | 2 +- docs/posts/page/9/index.html | 2 +- docs/sitemap.xml | 12 ++--- 29 files changed, 130 insertions(+), 39 deletions(-) diff --git a/content/posts/2022-05.md b/content/posts/2022-05.md index 7520f082f..e87d7b285 100644 --- a/content/posts/2022-05.md +++ b/content/posts/2022-05.md @@ -96,4 +96,43 @@ localhost/dspacetest= ☘ SELECT EXTRACT(year from TO_DATE(text_value, 'YYYY-MM- - This one is better than the previous one because it uses npm directly, which comes with the Node.js distribution, rather than requiring the user to install yarn - I also updated a bunch of grunt build deps +## 2022-05-12 + +- CGSpace meeting with Abenet and Peter + - We discussed the future of CGSpace and DSpace in general in the new One CGIAR + - We discussed how to prepare for bringing in content from the Initiatives, whether we need new metadata fields to 
support people from IFPRI etc + - We discussed the need for good quality Drupal and WordPress modules so sites can harvest content from the repository + - Peter asked me to send him a list of investors/funders/donors so he can clean it up, but also to try to align it with RoR and eventually do something like we do with country codes, adding the RoR IDs and potentially showing the badge on item views + - We also discussed removing some Mirage 2 themes for old programs and CRPs that don't have custom branding, i.e. only Google Analytics +- Export a list of donors for Peter to clean up: + +```console +localhost/dspacetest= ☘ \COPY (SELECT DISTINCT text_value as "cg.contributor.donor", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 248 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-05-12-donors.csv WITH CSV HEADER; +COPY 1184 +``` + +- Then I created a CSV from our `cg-creator-identifier.xml` controlled vocabulary and ran it against our database with `add-orcid-identifiers-csv.py` to see if any author names by chance matched that are missing ORCIDs in CGSpace + +```console +$ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2022-05-12-add-orcids.csv -db dspace -u dspace -p 'fuuu' | tee /tmp/add-orcids.log +$ grep -c "Adding ORCID" /tmp/add-orcids.log +85 +``` + +- So it's only eighty-five, but better than nothing... +- I removed the custom Mirage 2 themes for some old projects: + - AgriFood + - AVCD + - LIVES + - FeedTheFuture + - DrylandSystems + - TechnicalConsortium + - EADD +- That should knock off a few minutes of the maven build time! 
+- I generated a report from the AReS nginx logs on linode18: + +```console +# zcat --force /var/log/nginx/access.log.* | grep 'GET /explorer' | goaccess --log-format=COMBINED - -o /tmp/ares_report.html +``` + diff --git a/docs/2022-01/index.html b/docs/2022-01/index.html index 8287c06a0..d0cddb09c 100644 --- a/docs/2022-01/index.html +++ b/docs/2022-01/index.html @@ -14,7 +14,7 @@ Start a full harvest on AReS - + @@ -36,7 +36,7 @@ Start a full harvest on AReS "url": "https://alanorth.github.io/cgspace-notes/2022-01/", "wordCount": "1224", "datePublished": "2022-01-01T15:20:54+02:00", - "dateModified": "2022-02-07T09:49:34+03:00", + "dateModified": "2022-05-12T12:51:45+03:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -122,11 +122,11 @@ Start a full harvest on AReS -
$ cat 2022-01-06-add-orcids.csv 
+
$ cat 2022-01-06-add-orcids.csv
 dc.contributor.author,cg.creator.identifier
 "Jones, Chris","Chris Jones: 0000-0001-9096-9728"
 "Jones, Christopher S.","Chris Jones: 0000-0001-9096-9728"
-$ ./ilri/add-orcid-identifiers-csv.py -i 2022-01-06-add-orcids.csv -db dspace63 -u dspacetest -p 'dom@in34sniper' 
+$ ./ilri/add-orcid-identifiers-csv.py -i 2022-01-06-add-orcids.csv -db dspace63 -u dspacetest -p 'dom@in34sniper'
 

2022-01-09

  • Validate and register CGSpace on OpenArchives diff --git a/docs/2022-05/index.html b/docs/2022-05/index.html index 0e098e239..f93c70757 100644 --- a/docs/2022-05/index.html +++ b/docs/2022-05/index.html @@ -35,7 +35,7 @@ I purged 93,974 hits from these IPs using my check-spider-ip-hits.sh script - + @@ -76,9 +76,9 @@ I purged 93,974 hits from these IPs using my check-spider-ip-hits.sh script "@type": "BlogPosting", "headline": "May, 2022", "url": "https://alanorth.github.io/cgspace-notes/2022-05/", - "wordCount": "564", + "wordCount": "947", "datePublished": "2022-05-04T09:13:39+03:00", - "dateModified": "2022-05-05T12:47:48+03:00", + "dateModified": "2022-05-10T16:35:50+03:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -243,12 +243,64 @@ I purged 93,974 hits from these IPs using my check-spider-ip-hits.sh script
  • But it seems PostgreSQL is smart enough to recognize date formatting in strings automatically when we cast so we don’t need to convert to date first
  • Another thing I noticed is that a few hundred items have accession dates from decades ago, perhaps this is due to importing items from the CGIAR Library?
  • I spent some time merging a few pull requests for DSpace 6.4 and porting one to main for DSpace 7.x
  • +
  • I also submitted a pull request to migrate Mirage 2’s build from bower and compass to yarn and node-sass

2022-05-07

  • Start a harvest on AReS
- +

2022-05-09

+
    +
  • Submit an issue to Atmire’s bug tracker inquiring about DSpace 6.4 support
  • +
+

2022-05-10

+ +

2022-05-12

+
    +
  • CGSpace meeting with Abenet and Peter +
      +
    • We discussed the future of CGSpace and DSpace in general in the new One CGIAR
    • +
    • We discussed how to prepare for bringing in content from the Initiatives, whether we need new metadata fields to support people from IFPRI etc
    • +
    • We discussed the need for good quality Drupal and WordPress modules so sites can harvest content from the repository
    • +
    • Peter asked me to send him a list of investors/funders/donors so he can clean it up, but also to try to align it with RoR and eventually do something like we do with country codes, adding the RoR IDs and potentially showing the badge on item views
    • +
    • We also discussed removing some Mirage 2 themes for old programs and CRPs that don’t have custom branding, i.e. only Google Analytics
    • +
    +
  • +
  • Export a list of donors for Peter to clean up:
  • +
+
localhost/dspacetest= ☘ \COPY (SELECT DISTINCT text_value as "cg.contributor.donor", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 248 GROUP BY text_value ORDER BY count DESC) to /tmp/2022-05-12-donors.csv WITH CSV HEADER;
+COPY 1184
+
    +
  • Then I created a CSV from our cg-creator-identifier.xml controlled vocabulary and ran it against our database with add-orcid-identifiers-csv.py to see if any author names by chance matched that are missing ORCIDs in CGSpace
  • +
+
$ ./ilri/add-orcid-identifiers-csv.py -i /tmp/2022-05-12-add-orcids.csv -db dspace -u dspace -p 'fuuu' | tee /tmp/add-orcids.log
+$ grep -c "Adding ORCID" /tmp/add-orcids.log
+85
+
    +
  • So it’s only eighty-five, but better than nothing…
  • +
  • I removed the custom Mirage 2 themes for some old projects: +
      +
    • AgriFood
    • +
    • AVCD
    • +
    • LIVES
    • +
    • FeedTheFuture
    • +
    • DrylandSystems
    • +
    • TechnicalConsortium
    • +
    • EADD
    • +
    +
  • +
  • That should knock off a few minutes of the maven build time!
  • +
  • I generated a report from the AReS nginx logs on linode18:
  • +
+
# zcat --force /var/log/nginx/access.log.* | grep 'GET /explorer' | goaccess --log-format=COMBINED - -o /tmp/ares_report.html
+
diff --git a/docs/categories/index.html b/docs/categories/index.html index 113cbaf64..59402a126 100644 --- a/docs/categories/index.html +++ b/docs/categories/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html index 0e4286376..f77e20a31 100644 --- a/docs/categories/notes/index.html +++ b/docs/categories/notes/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html index 25591cb30..c44c3b677 100644 --- a/docs/categories/notes/page/2/index.html +++ b/docs/categories/notes/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html index 190c5e2e5..aa0b8e515 100644 --- a/docs/categories/notes/page/3/index.html +++ b/docs/categories/notes/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html index d33ef09a4..b2b9cbac1 100644 --- a/docs/categories/notes/page/4/index.html +++ b/docs/categories/notes/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html index 12ddf9243..1e952fb58 100644 --- a/docs/categories/notes/page/5/index.html +++ b/docs/categories/notes/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html index e1f2fe39e..697f2932a 100644 --- a/docs/categories/notes/page/6/index.html +++ b/docs/categories/notes/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/index.html b/docs/index.html index ded8087de..aade452cd 100644 --- a/docs/index.html +++ b/docs/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/2/index.html b/docs/page/2/index.html index b7ebdd791..350f5846e 100644 --- a/docs/page/2/index.html +++ b/docs/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/3/index.html 
b/docs/page/3/index.html index 4897bdbba..2f0dc8803 100644 --- a/docs/page/3/index.html +++ b/docs/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/4/index.html b/docs/page/4/index.html index 4fe6b7385..87c511c0d 100644 --- a/docs/page/4/index.html +++ b/docs/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/5/index.html b/docs/page/5/index.html index 02d655613..8e014cc9d 100644 --- a/docs/page/5/index.html +++ b/docs/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/6/index.html b/docs/page/6/index.html index 2b3a48446..215e74a44 100644 --- a/docs/page/6/index.html +++ b/docs/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/7/index.html b/docs/page/7/index.html index 17d0c2168..00da4ea35 100644 --- a/docs/page/7/index.html +++ b/docs/page/7/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/8/index.html b/docs/page/8/index.html index 1714410dc..94afe79b3 100644 --- a/docs/page/8/index.html +++ b/docs/page/8/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/page/9/index.html b/docs/page/9/index.html index 7769c6eee..afc9ba68b 100644 --- a/docs/page/9/index.html +++ b/docs/page/9/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/index.html b/docs/posts/index.html index 09fbb6042..59ad55b16 100644 --- a/docs/posts/index.html +++ b/docs/posts/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html index 32ce150e8..c6741a771 100644 --- a/docs/posts/page/2/index.html +++ b/docs/posts/page/2/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html index ed516f5c2..09bfe6979 100644 --- a/docs/posts/page/3/index.html +++ b/docs/posts/page/3/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html index f499c3dde..7814164b1 100644 --- a/docs/posts/page/4/index.html +++ b/docs/posts/page/4/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/5/index.html 
b/docs/posts/page/5/index.html index d70c6895c..56b64b855 100644 --- a/docs/posts/page/5/index.html +++ b/docs/posts/page/5/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html index 4faf81d6f..96b3bd433 100644 --- a/docs/posts/page/6/index.html +++ b/docs/posts/page/6/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html index edcd25b9c..d1e03257a 100644 --- a/docs/posts/page/7/index.html +++ b/docs/posts/page/7/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html index e89ec0ba1..b33d15de7 100644 --- a/docs/posts/page/8/index.html +++ b/docs/posts/page/8/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html index 9d16fc9fb..697f3b4c8 100644 --- a/docs/posts/page/9/index.html +++ b/docs/posts/page/9/index.html @@ -10,7 +10,7 @@ - + diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 87aeef51a..791cc1e0c 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -3,19 +3,19 @@ xmlns:xhtml="http://www.w3.org/1999/xhtml"> https://alanorth.github.io/cgspace-notes/categories/ - 2022-05-05T16:50:10+03:00 + 2022-05-12T12:51:45+03:00 https://alanorth.github.io/cgspace-notes/ - 2022-05-05T16:50:10+03:00 + 2022-05-12T12:51:45+03:00 https://alanorth.github.io/cgspace-notes/2022-05/ - 2022-05-05T12:47:48+03:00 + 2022-05-10T16:35:50+03:00 https://alanorth.github.io/cgspace-notes/categories/notes/ - 2022-05-05T16:50:10+03:00 + 2022-05-12T12:51:45+03:00 https://alanorth.github.io/cgspace-notes/posts/ - 2022-05-05T16:50:10+03:00 + 2022-05-12T12:51:45+03:00 https://alanorth.github.io/cgspace-notes/2022-04/ 2022-05-04T11:09:45+03:00 @@ -27,7 +27,7 @@ 2022-03-01T17:17:27+03:00 https://alanorth.github.io/cgspace-notes/2022-01/ - 2022-02-07T09:49:34+03:00 + 2022-05-12T12:51:45+03:00 https://alanorth.github.io/cgspace-notes/2021-12/ 2022-01-09T10:39:51+02:00