Add notes for 2020-10-21

2020-10-21 15:36:31 +03:00
parent 7cdb9f31e6
commit cbc18b83c5
22 changed files with 68 additions and 27 deletions

@ -562,6 +562,7 @@ $ curl -XPOST http://localhost:9200/openrxv-values/_doc/_bulk -H "Content-Type:
- Bosede said they were having problems with the "Access" step during item submission
- I looked at the Munin graphs for PostgreSQL and both connections and locks look normal, so I'm not sure what it could be
- I restarted the PostgreSQL service just to see if that would help
- She said she was still experiencing the issue...
- I ran the `dspace cleanup -v` process on CGSpace and got an error:
```
@ -609,4 +610,22 @@ $ curl -s 'http://localhost:8083/solr/statistics/update?softCommit=true'
- Is this an issue with Atmire's modules?
- I sent them feedback on the ticket
## 2020-10-21
- Peter needs to do some reporting on gender across the entirety of CGSpace, so he asked me to tag a bunch of items with the AGROVOC "gender" subject (all items in the CGIAR Gender Platform community, all ILRI items with subject "gender" or "women", all CCAFS items with "gender and social inclusion", etc.)
- First I exported the Gender Platform community and tagged all the items there with "gender" in OpenRefine
- Then I exported all of CGSpace and extracted just the item IDs, the general subjects, and the ILRI and other center-specific subject columns with `csvcut`:
```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m"
$ dspace metadata-export -f /tmp/cgspace.csv
$ csvcut -c 'id,dc.subject[],dc.subject[en_US],cg.subject.ilri[],cg.subject.ilri[en_US],cg.subject.alliancebiovciat[],cg.subject.alliancebiovciat[en_US],cg.subject.bioversity[en_US],cg.subject.ccafs[],cg.subject.ccafs[en_US],cg.subject.ciat[],cg.subject.ciat[en_US],cg.subject.cip[],cg.subject.cip[en_US],cg.subject.cpwf[en_US],cg.subject.iita,cg.subject.iita[en_US],cg.subject.iwmi[en_US]' /tmp/cgspace.csv > /tmp/cgspace-subjects.csv
```
- Then I went through all the center subjects looking for "WOMEN" or "GENDER", checking whether each item was missing the associated AGROVOC subject
- To reduce the size of the CSV file, I removed all center subject columns after filtering them, and I flagged every row I changed so I could upload a CSV containing only the modified items
- In total I tagged about 1,100 items across the Gender Platform community and elsewhere
- Also, I ran the CSVs through my `csv-metadata-quality` checker to do basic sanity checks, which ended up removing a few dozen duplicated subjects
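The checks above were actually done in OpenRefine and with `csv-metadata-quality`, not in the shell. As a rough illustration of the same logic, here is a minimal bash (4+) sketch. It assumes DSpace's `||` multi-value separator in CSV cells; `needs_gender_tag` and `dedupe_values` are hypothetical helper names, and the matching rules (substring match on center subjects, exact-value match on AGROVOC subjects) are assumptions rather than what either tool does internally.

```bash
#!/usr/bin/env bash
# Minimal sketch (bash 4+). Assumes DSpace CSV cells hold multiple values
# separated by "||". Helper names are hypothetical, not part of any tool.

# Return 0 if any center subject mentions GENDER or WOMEN (case-insensitive)
# but the AGROVOC subjects do not already contain the exact value GENDER.
needs_gender_tag() {
    local center_upper=${1^^} agrovoc_upper=${2^^}
    [[ $center_upper == *GENDER* || $center_upper == *WOMEN* ]] || return 1
    # Pad with delimiters so only a whole value matches ("GENDER EQUALITY" does not)
    [[ "||${agrovoc_upper}||" != *"||GENDER||"* ]]
}

# Print a "||"-separated cell with empty and duplicate values removed,
# keeping the first occurrence of each value.
dedupe_values() {
    local rest=$1 value joined=""
    local -A seen=()
    while [[ -n $rest ]]; do
        value=${rest%%"||"*}            # text before the first delimiter
        if [[ $rest == *"||"* ]]; then
            rest=${rest#*"||"}          # drop that value and its delimiter
        else
            rest=""
        fi
        if [[ -n $value && -z ${seen[$value]+x} ]]; then
            seen[$value]=1
            joined+="${joined:+||}$value"
        fi
    done
    printf '%s\n' "$joined"
}

# Example usage
needs_gender_tag "Gender and social inclusion" "CLIMATE CHANGE" && echo "add GENDER"
dedupe_values 'GENDER||WOMEN||GENDER'   # prints GENDER||WOMEN
```

Matching the AGROVOC value as a whole (padded with `||`) avoids treating a broader term like "GENDER EQUALITY" as if the item were already tagged, while the center-subject check is deliberately looser so that phrases like "gender and social inclusion" still count.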
<!-- vim: set sw=2 ts=2: -->