CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

February, 2016

2016-02-05

  • Looking at some DAGRIS data for Abenet Yabowork
  • Lots of issues with spaces, newlines, etc. causing the import to fail
  • I noticed we have a very interesting list of countries on CGSpace:

CGSpace country list

  • Not only are there 49,000 countries, we have some blanks (25)…
  • Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
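
The stray whitespace and blank values noted above could be surfaced before an import with a small script. This is only a minimal sketch, assuming the values are already available as a list of strings; `clean_value` and `summarize` are hypothetical helper names, not part of any DSpace tooling, and the sample values are made up for illustration.

```python
import re
from collections import Counter


def clean_value(value):
    """Collapse runs of whitespace (including newlines) and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def summarize(values):
    """Return (Counter of cleaned non-blank values, number of blank values)."""
    cleaned = [clean_value(v) for v in values]
    blanks = sum(1 for v in cleaned if not v)
    counts = Counter(v for v in cleaned if v)
    return counts, blanks


# Made-up sample values for illustration, not real CGSpace data
sample = ["KENYA", " KENYA\n", "COTE D IVOIRE", "COTE  D IVOIRE", "", "  "]
counts, blanks = summarize(sample)
print(counts.most_common())
print("blank values:", blanks)
```

Whitespace cleanup only merges variants that differ in spacing. Misspellings such as “COTE D`LVOIRE” would still need a manual mapping or a controlled vocabulary.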

January, 2016

2016-01-13

  • Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year.
  • I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
  • Update GitHub wiki for documentation of maintenance tasks.

December, 2015

2015-12-02

  • Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
# cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
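
To put numbers on "less space", the sizes from the listing above can be turned into ratios. This is a rough sketch: `ls -lh` rounds its output, so the results are approximate, and `parse_size` is a hypothetical helper that assumes the 1024-based K/M/G suffixes that `ls -h` prints.

```python
def parse_size(text):
    """Convert an ls -h style size such as '387K' or '2.0M' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text[-1] in units:
        return float(text[:-1]) * units[text[-1]]
    return float(text)


# Sizes copied from the ls -lh output above (already rounded by ls)
sizes = {"raw": "2.0M", "lzo": "387K", "xz": "169K"}
raw = parse_size(sizes["raw"])

for name in ("lzo", "xz"):
    ratio = parse_size(sizes[name]) / raw
    print(f"{name}: {ratio:.1%} of the original size")

xz_vs_lzo = parse_size(sizes["xz"]) / parse_size(sizes["lzo"])
print(f"xz output is {xz_vs_lzo:.1%} the size of the lzo output")
```

On this one log file, the xz output is less than half the size of the lzo output.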

November, 2015

2015-11-22

  • CGSpace went down
  • Looks like DSpace exhausted its PostgreSQL connection pool
  • Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78
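
One caveat with the `grep idle` count above is that it also matches sessions in the `idle in transaction` state, which are a different problem from plain idle connections. A minimal sketch of separating the states, assuming the rows come from something like `SELECT datname, state FROM pg_stat_activity`; the function name `count_states` is hypothetical and the rows below are made up for illustration.

```python
from collections import Counter


def count_states(rows, database):
    """Count connections per state for a single database.

    rows is an iterable of (datname, state) pairs.
    """
    return Counter(state for datname, state in rows if datname == database)


# Made-up rows for illustration, not real output from CGSpace
rows = [
    ("cgspace", "idle"),
    ("cgspace", "idle in transaction"),
    ("cgspace", "active"),
    ("dspacetest", "idle"),
    ("cgspace", "idle"),
]

states = count_states(rows, "cgspace")
for state, number in states.most_common():
    print(f"{state}: {number}")
```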

---
title: "October, 2021"
date: 2021-10-01T11:14:07+03:00
author: "Alan Orth"
categories: ["Notes"]
---

2021-10-01

  • Export all affiliations on CGSpace and run them against the latest RoR data dump:
localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
$ csvcut -c 1 /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
$ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affiliations-matching.csv
$ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l 
1879
$ wc -l /tmp/2021-10-01-affiliations.txt 
7100 /tmp/2021-10-01-affiliations.txt
  • So we have 1879/7100 (26.46%) matching already
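
The match rate above can be reproduced without csvkit. This is a sketch only: it assumes the matching CSV has a `matched` column holding `true` or `false`, as the `csvgrep` call suggests. The `organization` column name and the sample rows are made up, and `match_rate` is a hypothetical helper.

```python
import csv
import io


def match_rate(csv_file):
    """Return (matched, total) row counts from a CSV with a 'matched' column."""
    matched = total = 0
    for row in csv.DictReader(csv_file):
        total += 1
        if row["matched"].strip().lower() == "true":
            matched += 1
    return matched, total


# Made-up sample standing in for /tmp/2021-10-01-affiliations-matching.csv
sample = io.StringIO(
    "organization,matched\n"
    "Org A,true\n"
    "Org B,false\n"
    "Org C,false\n"
    "Org D,true\n"
)
matched, total = match_rate(sample)
print(f"{matched}/{total} ({matched / total:.2%})")

# The figures reported in the notes above
print(f"1879/7100 ({1879 / 7100:.2%})")
```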