Add notes for 2020-10-21

Alan Orth 2020-10-21 15:36:31 +03:00
parent 7cdb9f31e6
commit cbc18b83c5
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
22 changed files with 68 additions and 27 deletions

@ -562,6 +562,7 @@ $ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H "Content-Type:
- Bosede said they were having problems with the "Access" step during item submission
- I looked at the Munin graphs for PostgreSQL and both connections and locks look normal so I'm not sure what it could be
- I restarted the PostgreSQL service just to see if that would help
- She said she was still experiencing the issue...
- I ran the `dspace cleanup -v` process on CGSpace and got an error:
```
@ -609,4 +610,22 @@ $ curl -s 'http://localhost:8083/solr/statistics/update?softCommit=true'
- Is this an issue with Atmire's modules?
- I sent them feedback on the ticket
## 2020-10-21
- Peter needs to do some reporting on gender across the entirety of CGSpace so he asked me to tag a bunch of items with the AGROVOC "gender" subject (everything in the CGIAR Gender Platform community, all ILRI items with subject "gender" or "women", all CCAFS items with "gender and social inclusion", etc)
- First I exported the Gender Platform community and tagged all the items there with "gender" in OpenRefine
- Then I exported all of CGSpace and extracted just the ILRI and other center-specific tags with `csvcut`:
```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m"
$ dspace metadata-export -f /tmp/cgspace.csv
$ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri[en_US],cg.subject.alliancebiovciat[],cg.subject.alliancebiovciat[en_US],cg.subject.bioversity[en_US],cg.subject.ccafs[],cg.subject.ccafs[en_US],cg.subject.ciat[],cg.subject.ciat[en_US],cg.subject.cip[],cg.subject.cip[en_US],cg.subject.cpwf[en_US],cg.subject.iita,cg.subject.iita[en_US],cg.subject.iwmi[en_US]' /tmp/cgspace.csv > /tmp/cgspace-subjects.csv
```
- Then I went through all center subjects looking for "WOMEN" or "GENDER" and checking if they were missing the associated AGROVOC subject
- To reduce the size of the CSV file I removed all center subject columns after filtering them, and I flagged all rows that I changed so I could upload a CSV with only the items that were modified
- In total I tagged about 1,100 items across the Gender Platform community and elsewhere
- Also, I ran the CSVs through my `csv-metadata-quality` checker to do basic sanity checks, which ended up removing a few dozen duplicated subjects
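- For illustration only, the check described above (a center subject mentions "WOMEN" or "GENDER" but the AGROVOC subject is missing) could be sketched in Python roughly like this. The notes do not say how the filtering was actually done, so this is a sketch under assumptions: the AGROVOC term is stored upper-case as `GENDER`, multiple values are joined with DSpace's default `||` separator, and the output file name is hypothetical:

```python
import csv

SEPARATOR = "||"  # DSpace's default multi-value separator in CSV exports
KEYWORDS = ("WOMEN", "GENDER")
AGROVOC_TERM = "GENDER"  # assumption: the AGROVOC subject is stored upper-case


def split_values(cell):
    """Split a multi-value cell into a list of non-empty, stripped values."""
    return [v.strip() for v in (cell or "").split(SEPARATOR) if v.strip()]


def needs_gender_tag(row):
    """Return True if any center subject mentions WOMEN or GENDER while no
    dc.subject column already carries the AGROVOC term."""
    agrovoc, center = [], []
    for column, cell in row.items():
        if not isinstance(column, str):
            continue  # skip overflow fields that csv.DictReader files under None
        if column.startswith("dc.subject"):
            agrovoc += split_values(cell)
        elif column.startswith("cg.subject."):
            center += split_values(cell)
    has_keyword = any(k in value.upper() for value in center for k in KEYWORDS)
    has_agrovoc = any(value.upper() == AGROVOC_TERM for value in agrovoc)
    return has_keyword and not has_agrovoc


def flag_rows(in_path, out_path):
    """Write the ids of all rows that still need the AGROVOC term."""
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=["id"])
        writer.writeheader()
        for row in csv.DictReader(src):
            if needs_gender_tag(row):
                writer.writerow({"id": row["id"]})


if __name__ == "__main__":
    # The output path is a hypothetical name chosen for this example
    flag_rows("/tmp/cgspace-subjects.csv", "/tmp/cgspace-gender-missing.csv")
```

- The substring match means a center subject like "GENDER AND SOCIAL INCLUSION" is also caught, and the result is only a list of item ids to review, not a file ready for `dspace metadata-import`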
<!-- vim: set sw=2 ts=2: -->

@ -23,7 +23,7 @@ During the FlywayDB migration I got an error:
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-10/" />
<meta property="article:published_time" content="2020-10-06T16:55:54+03:00" />
<meta property="article:modified_time" content="2020-10-19T15:47:59+03:00" />
<meta property="article:modified_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2020"/>
@ -51,9 +51,9 @@ During the FlywayDB migration I got an error:
"@type": "BlogPosting",
"headline": "October, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-10/",
"wordCount": "3963",
"wordCount": "4171",
"datePublished": "2020-10-06T16:55:54+03:00",
"dateModified": "2020-10-19T15:47:59+03:00",
"dateModified": "2020-10-19T17:22:49+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -754,6 +754,7 @@ $ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H &quot;Content-T
<ul>
<li>I looked at the Munin graphs for PostgreSQL and both connections and locks look normal so I&rsquo;m not sure what it could be</li>
<li>I restarted the PostgreSQL service just to see if that would help</li>
<li>She said she was still experiencing the issue&hellip;</li>
</ul>
</li>
<li>I ran the <code>dspace cleanup -v</code> process on CGSpace and got an error:</li>
@ -804,6 +805,27 @@ $ curl -s 'http://localhost:8083/solr/statistics/update?softCommit=true'
</ul>
</li>
</ul>
<h2 id="2020-10-21">2020-10-21</h2>
<ul>
<li>Peter needs to do some reporting on gender across the entirety of CGSpace so he asked me to tag a bunch of items with the AGROVOC &ldquo;gender&rdquo; subject (everything in the CGIAR Gender Platform community, all ILRI items with subject &ldquo;gender&rdquo; or &ldquo;women&rdquo;, all CCAFS items with &ldquo;gender and social inclusion&rdquo;, etc)
<ul>
<li>First I exported the Gender Platform community and tagged all the items there with &ldquo;gender&rdquo; in OpenRefine</li>
<li>Then I exported all of CGSpace and extracted just the ILRI and other center-specific tags with <code>csvcut</code>:</li>
</ul>
</li>
</ul>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx2048m&quot;
$ dspace metadata-export -f /tmp/cgspace.csv
$ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri[en_US],cg.subject.alliancebiovciat[],cg.subject.alliancebiovciat[en_US],cg.subject.bioversity[en_US],cg.subject.ccafs[],cg.subject.ccafs[en_US],cg.subject.ciat[],cg.subject.ciat[en_US],cg.subject.cip[],cg.subject.cip[en_US],cg.subject.cpwf[en_US],cg.subject.iita,cg.subject.iita[en_US],cg.subject.iwmi[en_US]' /tmp/cgspace.csv &gt; /tmp/cgspace-subjects.csv
</code></pre><ul>
<li>Then I went through all center subjects looking for &ldquo;WOMEN&rdquo; or &ldquo;GENDER&rdquo; and checking if they were missing the associated AGROVOC subject
<ul>
<li>To reduce the size of the CSV file I removed all center subject columns after filtering them, and I flagged all rows that I changed so I could upload a CSV with only the items that were modified</li>
<li>In total I tagged about 1,100 items across the Gender Platform community and elsewhere</li>
<li>Also, I ran the CSVs through my <code>csv-metadata-quality</code> checker to do basic sanity checks, which ended up removing a few dozen duplicated subjects</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-10-19T15:47:59+03:00" />
<meta property="og:updated_time" content="2020-10-19T17:22:49+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>

@ -4,27 +4,27 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2020-10-19T15:47:59+03:00</lastmod>
<lastmod>2020-10-19T17:22:49+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2020-10-19T15:47:59+03:00</lastmod>
<lastmod>2020-10-19T17:22:49+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2020-10-19T15:47:59+03:00</lastmod>
<lastmod>2020-10-19T17:22:49+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2020-10/</loc>
<lastmod>2020-10-19T15:47:59+03:00</lastmod>
<lastmod>2020-10-19T17:22:49+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2020-10-19T15:47:59+03:00</lastmod>
<lastmod>2020-10-19T17:22:49+03:00</lastmod>
</url>
<url>