mirror of
https://github.com/alanorth/cgspace-notes.git
Add notes for 2020-10-21
@ -23,7 +23,7 @@ During the FlywayDB migration I got an error:
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-10/" />
<meta property="article:published_time" content="2020-10-06T16:55:54+03:00" />
<meta property="article:modified_time" content="2020-10-19T15:47:59+03:00" />
<meta property="article:modified_time" content="2020-10-19T17:22:49+03:00" />

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2020"/>
@ -51,9 +51,9 @@ During the FlywayDB migration I got an error:
"@type": "BlogPosting",
"headline": "October, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-10/",
"wordCount": "3963",
"wordCount": "4171",
"datePublished": "2020-10-06T16:55:54+03:00",
"dateModified": "2020-10-19T15:47:59+03:00",
"dateModified": "2020-10-19T17:22:49+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -754,6 +754,7 @@ $ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H "Content-T
<ul>
<li>I looked at the Munin graphs for PostgreSQL and both connections and locks look normal, so I’m not sure what the cause could be</li>
<li>I restarted the PostgreSQL service just to see if that would help</li>
<li>She said she was still experiencing the issue…</li>
</ul>
</li>
<li>I ran the <code>dspace cleanup -v</code> process on CGSpace and got an error:</li>
@ -804,6 +805,27 @@ $ curl -s 'http://localhost:8083/solr/statistics/update?softCommit=true'
</ul>
</li>
</ul>
<h2 id="2020-10-21">2020-10-21</h2>
<ul>
<li>Peter needs to do some reporting on gender across all of CGSpace, so he asked me to tag a batch of items with the AGROVOC “gender” subject (items in the CGIAR Gender Platform community, all ILRI items with the subject “gender” or “women”, all CCAFS items with “gender and social inclusion”, etc.)
<ul>
<li>First I exported the Gender Platform community and tagged all of its items with “gender” in OpenRefine</li>
<li>Then I exported all of CGSpace and used <code>csvcut</code> to extract just the item IDs, the generic subjects, and the ILRI and other center-specific subjects:</li>
</ul>
</li>
</ul>
<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m"
$ dspace metadata-export -f /tmp/cgspace.csv
$ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri[en_US],cg.subject.alliancebiovciat[],cg.subject.alliancebiovciat[en_US],cg.subject.bioversity[en_US],cg.subject.ccafs[],cg.subject.ccafs[en_US],cg.subject.ciat[],cg.subject.ciat[en_US],cg.subject.cip[],cg.subject.cip[en_US],cg.subject.cpwf[en_US],cg.subject.iita,cg.subject.iita[en_US],cg.subject.iwmi[en_US]' /tmp/cgspace.csv > /tmp/cgspace-subjects.csv
</code></pre><ul>
<li>Then I went through all the center subjects looking for “WOMEN” or “GENDER” and checked whether those items were missing the corresponding AGROVOC subject
<ul>
<li>To reduce the size of the CSV file, I removed all the center subject columns after filtering them, and I flagged every row I changed so I could upload a CSV containing only the modified items</li>
<li>In total I tagged about 1,100 items across the Gender Platform community and elsewhere</li>
<li>I also ran the CSVs through my <code>csv-metadata-quality</code> checker to do basic sanity checks, which removed a few dozen duplicated subjects</li>
</ul>
</li>
</ul>
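<ul>
<li>For illustration, the center-subject filtering and row flagging described above can be sketched in Python. This is a minimal sketch under stated assumptions, not necessarily how the tagging was actually done: it assumes the <code>||</code> multi-value separator used in DSpace CSV exports, that every center subject column starts with <code>cg.subject.</code>, and that the new term is appended in lowercase to <code>dc.subject[en_US]</code> (all three are assumptions)</li>
</ul>

```python
import csv
import re
import sys

# Assumptions (not taken from CGSpace itself): DSpace CSV exports join
# multiple values with "||", AGROVOC terms live in the dc.subject columns,
# and the new term is written in lowercase to the [en_US] column.
SEPARATOR = "||"
AGROVOC_COLUMNS = ("dc.subject[]", "dc.subject[en_US]")
TARGET_COLUMN = "dc.subject[en_US]"
AGROVOC_TERM = "gender"
# Case-insensitive, so "WOMEN", "GENDER" and "GENDER AND SOCIAL INCLUSION" all match.
CENTER_PATTERN = re.compile(r"women|gender", re.IGNORECASE)


def split_values(field):
    """Split a multi-value CSV cell into its non-empty values."""
    return [value for value in (field or "").split(SEPARATOR) if value]


def tag_gender(rows, center_columns):
    """Return copies of the rows that need the AGROVOC term, with it appended.

    Rows whose center subjects never mention women/gender, or that already
    carry the AGROVOC term in any dc.subject column, are skipped, so the
    result holds only the items that were actually modified.
    """
    changed = []
    for row in rows:
        center_values = [
            value for column in center_columns for value in split_values(row.get(column))
        ]
        if not any(CENTER_PATTERN.search(value) for value in center_values):
            continue
        existing = [
            value.lower() for column in AGROVOC_COLUMNS for value in split_values(row.get(column))
        ]
        if AGROVOC_TERM in existing:
            continue
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[TARGET_COLUMN] = SEPARATOR.join(split_values(row.get(TARGET_COLUMN)) + [AGROVOC_TERM])
        changed.append(new_row)
    return changed


def write_changes(fh, rows):
    """Write only the id and dc.subject columns, dropping the center columns."""
    writer = csv.DictWriter(fh, fieldnames=["id", *AGROVOC_COLUMNS], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


def main(in_path, out_path):
    with open(in_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        center_columns = [c for c in reader.fieldnames or [] if c.startswith("cg.subject.")]
        changed = tag_gender(reader, center_columns)
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        write_changes(fh, changed)
    print(f"{len(changed)} items flagged for the AGROVOC term")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

<ul>
<li>It could be run as <code>python3 tag-gender.py /tmp/cgspace-subjects.csv /tmp/cgspace-gender.csv</code> (the script name and output path are hypothetical); the output keeps only the <code>id</code> and <code>dc.subject</code> columns of the flagged rows, in the spirit of uploading just the modified items</li>
</ul>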
<!-- raw HTML omitted -->
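<ul>
<li>The duplicate cleanup itself was done by <code>csv-metadata-quality</code>; as a rough illustration of what such a fix involves, the sketch below drops exact repeats from a multi-value cell while keeping the first occurrence. It is a hypothetical stand-in, not the checker's actual code, and it assumes the same <code>||</code> separator</li>
</ul>

```python
# Illustrative stand-in for a duplicate-value fix; not csv-metadata-quality's code.
SEPARATOR = "||"  # assumption: DSpace CSV multi-value separator


def remove_duplicates(field):
    """Drop exact repeats from a multi-value cell, keeping first-seen order."""
    seen = set()
    kept = []
    for value in field.split(SEPARATOR):
        if value and value not in seen:
            seen.add(value)
            kept.append(value)
    return SEPARATOR.join(kept)


def clean_row(row, columns):
    """Return a copy of row with duplicates removed from the given columns."""
    cleaned = dict(row)
    for column in columns:
        if cleaned.get(column):
            cleaned[column] = remove_duplicates(cleaned[column])
    return cleaned
```

<ul>
<li>Matching here is exact and case-sensitive, so “gender” and “GENDER” in the same cell would both be kept</li>
</ul>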