From cbc18b83c53878aa1effea2415db752311488c3e Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Wed, 21 Oct 2020 15:36:31 +0300
Subject: [PATCH] Add notes for 2020-10-21

---
 content/posts/2020-10.md                | 19 +++++++++++++++++
 docs/2020-10/index.html                 | 28 ++++++++++++++++++++++---
 docs/categories/index.html              |  2 +-
 docs/categories/notes/index.html        |  2 +-
 docs/categories/notes/page/2/index.html |  2 +-
 docs/categories/notes/page/3/index.html |  2 +-
 docs/categories/notes/page/4/index.html |  2 +-
 docs/index.html                         |  2 +-
 docs/page/2/index.html                  |  2 +-
 docs/page/3/index.html                  |  2 +-
 docs/page/4/index.html                  |  2 +-
 docs/page/5/index.html                  |  2 +-
 docs/page/6/index.html                  |  2 +-
 docs/page/7/index.html                  |  2 +-
 docs/posts/index.html                   |  2 +-
 docs/posts/page/2/index.html            |  2 +-
 docs/posts/page/3/index.html            |  2 +-
 docs/posts/page/4/index.html            |  2 +-
 docs/posts/page/5/index.html            |  2 +-
 docs/posts/page/6/index.html            |  2 +-
 docs/posts/page/7/index.html            |  2 +-
 docs/sitemap.xml                        | 10 ++++-----
 22 files changed, 68 insertions(+), 27 deletions(-)

diff --git a/content/posts/2020-10.md b/content/posts/2020-10.md
index c6f8087e6..143b39216 100644
--- a/content/posts/2020-10.md
+++ b/content/posts/2020-10.md
@@ -562,6 +562,7 @@ $ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H "Content-Type:
 - Bosede said they were having problems with the "Access" step during item submission
   - I looked at the Munin graphs for PostgreSQL and both connections and locks look normal so I'm not sure what it could be
   - I restarted the PostgreSQL service just to see if that would help
+  - She said she was still experiencing the issue...
 - I ran the `dspace cleanup -v` process on CGSpace and got an error:
 
 ```
@@ -609,4 +610,22 @@ $ curl -s 'http://localhost:8083/solr/statistics/update?softCommit=true'
   - Is this an issue with Atmire's modules?
   - I sent them feedback on the ticket
 
+## 2020-10-21
+
+- Peter needs to do some reporting on gender across the entirety of CGSpace so he asked me to tag a bunch of items with the AGROVOC "gender" subject (in CGIAR Gender Platform community, all ILRI items with subject "gender" or "women", all CCAFS with "gender and social inclusion" etc)
+  - First I exported the Gender Platform community and tagged all the items there with "gender" in OpenRefine
+  - Then I exported all of CGSpace and extracted just the ILRI and other center-specific tags with `csvcut`:
+
+
+```
+$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m"
+$ dspace metadata-export -f /tmp/cgspace.csv
+$ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri[en_US],cg.subject.alliancebiovciat[],cg.subject.alliancebiovciat[en_US],cg.subject.bioversity[en_US],cg.subject.ccafs[],cg.subject.ccafs[en_US],cg.subject.ciat[],cg.subject.ciat[en_US],cg.subject.cip[],cg.subject.cip[en_US],cg.subject.cpwf[en_US],cg.subject.iita,cg.subject.iita[en_US],cg.subject.iwmi[en_US]' /tmp/cgspace.csv > /tmp/cgspace-subjects.csv
+```
+
+- Then I went through all center subjects looking for "WOMEN" or "GENDER" and checking if they were missing the associated AGROVOC subject
+  - To reduce the size of the CSV file I removed all center subject columns after filtering them, and I flagged all rows that I changed so I could upload a CSV with only the items that were modified
+  - In total it was about 1,100 items that I tagged across the Gender Platform community and elsewhere
+  - Also, I ran the CSVs through my `csv-metadata-quality` checker to do basic sanity checks, which ended up removing a few dozen duplicated subjects
+
diff --git a/docs/2020-10/index.html b/docs/2020-10/index.html
index 203f9ac12..5b08ae3b9 100644
--- a/docs/2020-10/index.html
+++ b/docs/2020-10/index.html
@@ -23,7 +23,7 @@ During the FlywayDB migration I got an error:
-
+
@@ -51,9 +51,9 @@ During the FlywayDB migration I got an error:
   "@type": "BlogPosting",
   "headline": "October, 2020",
   "url": "https://alanorth.github.io/cgspace-notes/2020-10/",
-  "wordCount": "3963",
+  "wordCount": "4171",
   "datePublished": "2020-10-06T16:55:54+03:00",
-  "dateModified": "2020-10-19T15:47:59+03:00",
+  "dateModified": "2020-10-19T17:22:49+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -754,6 +754,7 @@ $ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H "Content-T
 I ran the dspace cleanup -v process on CGSpace and got an error:
@@ -804,6 +805,27 @@ $ curl -s 'http://localhost:8083/solr/statistics/update?softCommit=true'
+2020-10-21
+
+$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m"
+$ dspace metadata-export -f /tmp/cgspace.csv
+$ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri[en_US],cg.subject.alliancebiovciat[],cg.subject.alliancebiovciat[en_US],cg.subject.bioversity[en_US],cg.subject.ccafs[],cg.subject.ccafs[en_US],cg.subject.ciat[],cg.subject.ciat[en_US],cg.subject.cip[],cg.subject.cip[en_US],cg.subject.cpwf[en_US],cg.subject.iita,cg.subject.iita[en_US],cg.subject.iwmi[en_US]' /tmp/cgspace.csv > /tmp/cgspace-subjects.csv
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index e2f98f93d..23919c019 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index a86093597..ac43efee3 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index a8ed41f3a..a436dc308 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 1557aa375..13daeb466 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index f56c170e6..4a4dcb234 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index c4892de8c..f5cf2ce5d 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index a632067f7..bc0069ab8 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 9c61141c1..7b3efc786 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 9d74f825d..356d0ebf3 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index dd4b08a51..ca5d945d3 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index 11e8124e7..f9e0d45bb 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 0cfc235e5..6d0e8fdca 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 14a4f9ad2..6b2618bf3 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index b36180d28..7da802bb4 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index e1984de5a..a4865e0e3 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 44afe66c0..3990db0e7 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 89ad54bc4..cf68c8e61 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 14df487d7..acd3aae16 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index f1e72f571..13ad80d4d 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -9,7 +9,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 4dd39fc51..78fbbdb9a 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
 https://alanorth.github.io/cgspace-notes/categories/
-2020-10-19T15:47:59+03:00
+2020-10-19T17:22:49+03:00
 https://alanorth.github.io/cgspace-notes/
-2020-10-19T15:47:59+03:00
+2020-10-19T17:22:49+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
-2020-10-19T15:47:59+03:00
+2020-10-19T17:22:49+03:00
 https://alanorth.github.io/cgspace-notes/2020-10/
-2020-10-19T15:47:59+03:00
+2020-10-19T17:22:49+03:00
 https://alanorth.github.io/cgspace-notes/posts/
-2020-10-19T15:47:59+03:00
+2020-10-19T17:22:49+03:00
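
Note on the 2020-10-21 entry in this patch: the notes describe going through the center subjects looking for "WOMEN" or "GENDER", adding the AGROVOC subject where it was missing, and keeping only the rows that changed. The notes say this was done in OpenRefine, so the following is only a rough Python sketch of the same logic, not the method actually used. Assumptions: the function name `tag_missing_gender` is hypothetical, the AGROVOC subject is taken to live in `dc.subject[en_US]`, its capitalisation is a guess, and the sample rows and IDs are made up. The `||` separator is the DSpace default for multi-value cells in metadata CSVs.

```python
import csv
import io

AGROVOC_TERM = "gender"          # as named in the notes; capitalisation in the repository is an assumption
KEYWORDS = ("WOMEN", "GENDER")   # center-subject keywords mentioned in the notes
SEPARATOR = "||"                 # DSpace default multi-value separator in metadata CSVs


def tag_missing_gender(rows, agrovoc_column="dc.subject[en_US]"):
    """Yield a reduced row (id + AGROVOC column) for every item that needed tagging.

    `rows` is any iterable of dicts, for example a csv.DictReader over the
    output of the csvcut step. Unchanged items are skipped so that the result
    contains only modified rows.
    """
    for row in rows:
        # Collect the non-empty values of all center-specific subject columns
        center_values = [
            value
            for column, value in row.items()
            if column and column.startswith("cg.subject.") and value
        ]
        mentions_gender = any(
            keyword in value.upper() for value in center_values for keyword in KEYWORDS
        )
        if not mentions_gender:
            continue

        subjects = [s for s in (row.get(agrovoc_column) or "").split(SEPARATOR) if s]
        if any(s.strip().upper() == AGROVOC_TERM.upper() for s in subjects):
            continue  # already tagged, nothing to change

        subjects.append(AGROVOC_TERM)
        # Drop the center subject columns to keep the upload small
        yield {"id": row["id"], agrovoc_column: SEPARATOR.join(subjects)}


# Made-up sample standing in for /tmp/cgspace-subjects.csv
sample = io.StringIO(
    "id,dc.subject[en_US],cg.subject.ilri[en_US]\n"
    "1,LIVESTOCK,GENDER||DAIRYING\n"
    "2,gender||LIVESTOCK,WOMEN\n"
    "3,CROPS,MAIZE\n"
)
changed = list(tag_missing_gender(csv.DictReader(sample)))
print(changed)
```

Only item 1 is emitted here: item 2 already carries the subject and item 3 has no matching center subject. The resulting rows would still need a pass through a checker such as `csv-metadata-quality` before upload, as the notes describe.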