Add notes for 2020-10-06

Alan Orth 2020-10-06 16:59:31 +03:00
parent eccec41aca
commit bda0324ed9
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
104 changed files with 1028 additions and 5244 deletions

content/posts/2020-10.md Normal file
@ -0,0 +1,15 @@
---
title: "October, 2020"
date: 2020-10-06T16:55:54+03:00
author: "Alan Orth"
categories: ["Notes"]
---
## 2020-10-06
- Add tests for the new `/items` POST handlers to the DSpace 6.x branch of my [dspace-statistics-api](https://github.com/ilri/dspace-statistics-api/tree/v6_x)
- It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available
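
A minimal sketch of that kind of mock, not the actual dspace-statistics-api test code. It assumes a hypothetical `solr_is_available()` helper and a hypothetical `handle_items_post()` handler, uses only the Python standard library, and patches `urllib.request.urlopen` so that no real Solr is contacted. The Solr URL, the 503 status, and the payload shape are assumptions for illustration.

```python
import urllib.error
import urllib.request
from unittest import mock

# Hypothetical Solr endpoint, used only for illustration
SOLR_URL = "http://localhost:8983/solr/statistics/select"


def solr_is_available(url=SOLR_URL):
    """Return True if Solr answers with HTTP 200, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except urllib.error.URLError:
        return False


def handle_items_post(body):
    """Hypothetical handler: return an (HTTP status, payload) tuple."""
    if not solr_is_available():
        return 503, {"title": "Solr is not available"}
    return 200, {"limit": body.get("limit", 100), "statistics": []}


def test_items_post_without_solr():
    # Simulate a refused connection instead of contacting a real Solr
    refused = urllib.error.URLError("connection refused")
    with mock.patch("urllib.request.urlopen", side_effect=refused):
        status, payload = handle_items_post({"limit": 10})
    assert status == 503
    assert payload["title"] == "Solr is not available"


def test_items_post_with_solr():
    # Simulate a healthy Solr: the context manager yields a response with status 200
    fake_response = mock.MagicMock()
    fake_response.__enter__.return_value.status = 200
    with mock.patch("urllib.request.urlopen", return_value=fake_response):
        status, payload = handle_items_post({"limit": 10})
    assert status == 200
    assert payload["limit"] == 10
```

Both functions can be called directly or collected by pytest from a `test_*.py` file. Patching the network call rather than the handler keeps the handler's own error path under test.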
<!--more-->
<!-- vim: set sw=2 ts=2: -->

@ -239,6 +239,8 @@ db.statementpool = true
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -247,8 +249,6 @@ db.statementpool = true
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -261,6 +261,8 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -269,8 +271,6 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -197,6 +197,8 @@ $ find SimpleArchiveForBio/ -iname &ldquo;*.pdf&rdquo; -exec basename {} ; | sor
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -205,8 +207,6 @@ $ find SimpleArchiveForBio/ -iname &ldquo;*.pdf&rdquo; -exec basename {} ; | sor
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -375,6 +375,8 @@ Bitstream: tést señora alimentación.pdf
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -383,8 +385,6 @@ Bitstream: tést señora alimentación.pdf
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -313,6 +313,8 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -321,8 +323,6 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -492,6 +492,8 @@ dspace.log.2016-04-27:7271
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -500,8 +502,6 @@ dspace.log.2016-04-27:7271
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -368,6 +368,8 @@ sys 0m20.540s
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -376,8 +378,6 @@ sys 0m20.540s
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -406,6 +406,8 @@ $ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-D
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -414,8 +416,6 @@ $ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-D
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -322,6 +322,8 @@ discovery.index.authority.ignore-variants=true
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -330,8 +332,6 @@ discovery.index.authority.ignore-variants=true
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -386,6 +386,8 @@ $ JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot; /home/cgspace.cgiar.org/b
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -394,8 +396,6 @@ $ JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx512m&quot; /home/cgspace.cgiar.org/b
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -603,6 +603,8 @@ $ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -611,8 +613,6 @@ $ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -369,6 +369,8 @@ dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http:
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -377,8 +379,6 @@ dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http:
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -545,6 +545,8 @@ org.dspace.discovery.SearchServiceException: Error executing query
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -553,8 +555,6 @@ org.dspace.discovery.SearchServiceException: Error executing query
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -781,6 +781,8 @@ $ exit
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -789,8 +791,6 @@ $ exit
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -366,6 +366,8 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -374,8 +376,6 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -421,6 +421,8 @@ COPY 1968
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -429,8 +431,6 @@ COPY 1968
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -352,6 +352,8 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -360,8 +362,6 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -582,6 +582,8 @@ $ gem install compass -v 1.0.3
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -590,8 +592,6 @@ $ gem install compass -v 1.0.3
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -388,6 +388,8 @@ UPDATE 187
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -396,8 +398,6 @@ UPDATE 187
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -267,6 +267,8 @@ $ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace impo
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -275,8 +277,6 @@ $ JAVA_OPTS=&quot;-Xmx1024m -Dfile.encoding=UTF-8&quot; [dspace]/bin/dspace impo
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -272,6 +272,8 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -280,8 +282,6 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -514,6 +514,8 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -522,8 +524,6 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -656,6 +656,8 @@ Cert Status: good
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -664,8 +666,6 @@ Cert Status: good
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -440,6 +440,8 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -448,8 +450,6 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -941,6 +941,8 @@ $ cat dspace.log.2017-11-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | u
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -949,8 +951,6 @@ $ cat dspace.log.2017-11-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | u
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -780,6 +780,8 @@ DELETE 20
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -788,8 +790,6 @@ DELETE 20
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -1449,6 +1449,8 @@ Catalina:type=Manager,context=/,host=localhost activeSessions 8
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -1457,8 +1459,6 @@ Catalina:type=Manager,context=/,host=localhost activeSessions 8
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -1036,6 +1036,8 @@ UPDATE 3
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -1044,8 +1046,6 @@ UPDATE 3
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -582,6 +582,8 @@ Fixed 5 occurences of: GENEBANKS
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -590,8 +592,6 @@ Fixed 5 occurences of: GENEBANKS
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -591,6 +591,8 @@ $ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -599,8 +601,6 @@ $ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -520,6 +520,8 @@ $ psql -h localhost -U postgres dspacetest
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -528,8 +530,6 @@ $ psql -h localhost -U postgres dspacetest
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -514,6 +514,8 @@ $ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 &gt; map-to-cifor-archive.csv
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -522,8 +524,6 @@ $ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 &gt; map-to-cifor-archive.csv
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -566,6 +566,8 @@ dspace=# select count(text_value) from metadatavalue where resource_type_id=2 an
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -574,8 +576,6 @@ dspace=# select count(text_value) from metadatavalue where resource_type_id=2 an
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -439,6 +439,8 @@ $ dspace database migrate ignored
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -447,8 +449,6 @@ $ dspace database migrate ignored
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -745,6 +745,8 @@ UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_f
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -753,8 +755,6 @@ UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_f
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -653,6 +653,8 @@ $ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: app
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -661,8 +663,6 @@ $ curl -X GET -H &quot;Content-Type: application/json&quot; -H &quot;Accept: app
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -550,6 +550,8 @@ $ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -558,8 +560,6 @@ $ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -591,6 +591,8 @@ UPDATE 1
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -599,8 +601,6 @@ UPDATE 1
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -1261,6 +1261,8 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -1269,8 +1271,6 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -1341,6 +1341,8 @@ Please see the DSpace documentation for assistance.
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -1349,8 +1351,6 @@ Please see the DSpace documentation for assistance.
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -1205,6 +1205,8 @@ sys 0m2.551s
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -1213,8 +1215,6 @@ sys 0m2.551s
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -1296,6 +1296,8 @@ UPDATE 14
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -1304,8 +1306,6 @@ UPDATE 14
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -628,6 +628,8 @@ COPY 64871
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -636,8 +638,6 @@ COPY 64871
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -314,6 +314,8 @@ UPDATE 2
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -322,8 +324,6 @@ UPDATE 2
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -551,6 +551,8 @@ issn.validate('1020-3362')
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -559,8 +561,6 @@ issn.validate('1020-3362')
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -570,6 +570,8 @@ sys 2m27.496s
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -578,8 +580,6 @@ sys 2m27.496s
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -578,6 +578,8 @@ $ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institut
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -586,8 +588,6 @@ $ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institut
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -382,6 +382,8 @@ $ dspace import -a -c 10568/104057 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -390,8 +392,6 @@ $ dspace import -a -c 10568/104057 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -689,6 +689,8 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -697,8 +699,6 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -401,6 +401,8 @@ UPDATE 1
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -409,8 +411,6 @@ UPDATE 1
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -601,6 +601,8 @@ COPY 2900
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -609,8 +611,6 @@ COPY 2900
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -1272,6 +1272,8 @@ Moving: 21993 into core statistics-2019
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -1280,8 +1282,6 @@ Moving: 21993 into core statistics-2019
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -481,6 +481,8 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -489,8 +491,6 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -655,6 +655,8 @@ $ psql -c 'select * from pg_stat_activity' | wc -l
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -663,8 +665,6 @@ $ psql -c 'select * from pg_stat_activity' | wc -l
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -474,6 +474,8 @@ Caused by: java.lang.NullPointerException
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -482,8 +484,6 @@ Caused by: java.lang.NullPointerException
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -808,6 +808,8 @@ $ csvcut -c 'id,cg.subject.ilri[],cg.subject.ilri[en_US],dc.subject[en_US]' /tmp
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -816,8 +818,6 @@ $ csvcut -c 'id,cg.subject.ilri[],cg.subject.ilri[en_US],dc.subject[en_US]' /tmp
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -1139,6 +1139,8 @@ Fixed 4 occurences of: Muloi, D.M.
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -1147,8 +1149,6 @@ Fixed 4 occurences of: Muloi, D.M.
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -795,6 +795,8 @@ $ grep -c added /tmp/2020-08-27-countrycodetagger.log
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -803,8 +805,6 @@ $ grep -c added /tmp/2020-08-27-countrycodetagger.log
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

@ -25,7 +25,7 @@ I filed an issue on OpenRXV to make some minor edits to the admin UI: https://gi
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-09/" />
<meta property="article:published_time" content="2020-09-02T15:35:54+03:00" />
<meta property="article:modified_time" content="2020-09-29T14:58:35+03:00" />
<meta property="article:modified_time" content="2020-10-01T10:47:40+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="September, 2020"/>
@ -57,7 +57,7 @@ I filed an issue on OpenRXV to make some minor edits to the admin UI: https://gi
"url": "https://alanorth.github.io/cgspace-notes/2020-09/",
"wordCount": "2970",
"datePublished": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-09-29T14:58:35+03:00",
"dateModified": "2020-10-01T10:47:40+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -714,6 +714,8 @@ solr_query_params = {
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -722,8 +724,6 @@ solr_query_params = {
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

docs/2020-10/index.html Normal file
@ -0,0 +1,198 @@
<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="October, 2020" />
<meta property="og:description" content="2020-10-06
Add tests for the new /items POST handlers to the DSpace 6.x branch of my dspace-statistics-api
It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-10/" />
<meta property="article:published_time" content="2020-10-06T16:55:54+03:00" />
<meta property="article:modified_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2020"/>
<meta name="twitter:description" content="2020-10-06
Add tests for the new /items POST handlers to the DSpace 6.x branch of my dspace-statistics-api
It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available
"/>
<meta name="generator" content="Hugo 0.75.1" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "October, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-10/",
"wordCount": "40",
"datePublished": "2020-10-06T16:55:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2020-10/">
<title>October, 2020 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-10/">October, 2020</a></h2>
<p class="blog-post-meta"><time datetime="2020-10-06T16:55:54+03:00">Tue Oct 06, 2020</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2020-10-06">2020-10-06</h2>
<ul>
<li>Add tests for the new <code>/items</code> POST handlers to the DSpace 6.x branch of my <a href="https://github.com/ilri/dspace-statistics-api/tree/v6_x">dspace-statistics-api</a>
<ul>
<li>It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available</li>
</ul>
</li>
</ul>
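The note above mentions mocking responses for when Solr is not available. The following is a minimal sketch of that idea using only Python's standard library. The function name `count_item_views`, the Solr URL, and the query string are hypothetical and are not taken from dspace-statistics-api, whose actual tests may be structured quite differently.

```python
import json
import urllib.error
import urllib.request
from unittest import mock

# Hypothetical Solr location, for illustration only.
SOLR_URL = "http://localhost:8983/solr/statistics/select"


def count_item_views(item_id):
    """Return (status, views); status is 503 when Solr cannot be reached."""
    query = f"{SOLR_URL}?q=id:{item_id}&rows=0&wt=json"
    try:
        with urllib.request.urlopen(query, timeout=5) as response:
            body = json.load(response)
    except urllib.error.URLError:
        # Solr is down or unreachable: report it instead of crashing.
        return 503, None
    return 200, body["response"]["numFound"]


def check_solr_unavailable():
    """Simulate an unreachable Solr by making urlopen raise URLError."""
    with mock.patch(
        "urllib.request.urlopen",
        side_effect=urllib.error.URLError("connection refused"),
    ):
        return count_item_views("abc-123")


def check_solr_available():
    """Simulate a healthy Solr by returning a canned JSON response."""
    fake = mock.MagicMock()
    fake.__enter__.return_value = fake  # support the "with" statement
    fake.read.return_value = b'{"response": {"numFound": 42}}'
    with mock.patch("urllib.request.urlopen", return_value=fake):
        return count_item_views("abc-123")
```

Patching the network call rather than the handler keeps the error-handling path under test, so the same check works whether or not a real Solr instance is running.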
<!-- raw HTML omitted -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
<li><a href="/cgspace-notes/2020-07/">July, 2020</a></li>
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>


@ -94,6 +94,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -102,8 +104,6 @@
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>


@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
@ -83,7 +83,7 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/categories/notes/">Notes</a></h2>
<p class="blog-post-meta"><time datetime="2020-09-02T15:35:54+03:00">Wed Sep 02, 2020</time> by Alan Orth</p>
<p class="blog-post-meta"><time datetime="2020-10-06T16:55:54+03:00">Tue Oct 06, 2020</time> by Alan Orth</p>
</header>
<a href='https://alanorth.github.io/cgspace-notes/categories/notes/'>Read more →</a>
@ -107,6 +107,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -115,8 +117,6 @@
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>


@ -6,11 +6,11 @@
<description>Recent content in Categories on CGSpace Notes</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Wed, 02 Sep 2020 15:35:54 +0300</lastBuildDate><atom:link href="https://alanorth.github.io/cgspace-notes/categories/index.xml" rel="self" type="application/rss+xml" />
<lastBuildDate>Tue, 06 Oct 2020 16:55:54 +0300</lastBuildDate><atom:link href="https://alanorth.github.io/cgspace-notes/categories/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>Notes</title>
<link>https://alanorth.github.io/cgspace-notes/categories/notes/</link>
<pubDate>Wed, 02 Sep 2020 15:35:54 +0300</pubDate>
<pubDate>Tue, 06 Oct 2020 16:55:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/categories/notes/</guid>
<description></description>


@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
@ -80,6 +80,31 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-10/">October, 2020</a></h2>
<p class="blog-post-meta"><time datetime="2020-10-06T16:55:54+03:00">Tue Oct 06, 2020</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2020-10-06">2020-10-06</h2>
<ul>
<li>Add tests for the new <code>/items</code> POST handlers to the DSpace 6.x branch of my <a href="https://github.com/ilri/dspace-statistics-api/tree/v6_x">dspace-statistics-api</a>
<ul>
<li>It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available</li>
</ul>
</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2020-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-09/">September, 2020</a></h2>
@ -348,38 +373,6 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-12/">December, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-12-01T11:22:30+02:00">Sun Dec 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
<li>Check any packages that have residual configs and purge them:</li>
<li><!-- raw HTML omitted --># dpkg -l | grep -E &lsquo;^rc&rsquo; | awk &lsquo;{print $2}&rsquo; | xargs dpkg -P<!-- raw HTML omitted --></li>
<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
</ul>
</li>
</ul>
<pre><code># apt update &amp;&amp; apt full-upgrade
# apt-get autoremove &amp;&amp; apt-get autoclean
# dpkg -C
# reboot
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-12/'>Read more →</a>
</article>
<nav class="blog-pagination">
@ -404,6 +397,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -412,8 +407,6 @@
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>


@ -6,7 +6,23 @@
<description>Recent content in Notes on CGSpace Notes</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Wed, 02 Sep 2020 15:35:54 +0300</lastBuildDate><atom:link href="https://alanorth.github.io/cgspace-notes/categories/notes/index.xml" rel="self" type="application/rss+xml" />
<lastBuildDate>Tue, 06 Oct 2020 16:55:54 +0300</lastBuildDate><atom:link href="https://alanorth.github.io/cgspace-notes/categories/notes/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>October, 2020</title>
<link>https://alanorth.github.io/cgspace-notes/2020-10/</link>
<pubDate>Tue, 06 Oct 2020 16:55:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2020-10/</guid>
<description>&lt;h2 id=&#34;2020-10-06&#34;&gt;2020-10-06&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add tests for the new &lt;code&gt;/items&lt;/code&gt; POST handlers to the DSpace 6.x branch of my &lt;a href=&#34;https://github.com/ilri/dspace-statistics-api/tree/v6_x&#34;&gt;dspace-statistics-api&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;</description>
</item>
<item>
<title>September, 2020</title>
<link>https://alanorth.github.io/cgspace-notes/2020-09/</link>


@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
@ -80,6 +80,38 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-12/">December, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-12-01T11:22:30+02:00">Sun Dec 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
<li>Check any packages that have residual configs and purge them:</li>
<li><!-- raw HTML omitted --># dpkg -l | grep -E &lsquo;^rc&rsquo; | awk &lsquo;{print $2}&rsquo; | xargs dpkg -P<!-- raw HTML omitted --></li>
<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
</ul>
</li>
</ul>
<pre><code># apt update &amp;&amp; apt full-upgrade
# apt-get autoremove &amp;&amp; apt-get autoclean
# dpkg -C
# reboot
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-12/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-11/">November, 2019</a></h2>
@ -361,38 +393,6 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-03/">March, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-03-01T12:16:30+01:00">Fri Mar 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 20032013</li>
</ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/categories/notes/" rel="prev" role="button">Previous page</a>
@ -417,6 +417,8 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -425,8 +427,6 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>


@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
@ -80,6 +80,38 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-03/">March, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-03-01T12:16:30+01:00">Fri Mar 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 20032013</li>
</ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-02/">February, 2019</a></h2>
@ -355,34 +387,6 @@ sys 2m7.289s
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-05/">May, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-05-01T16:43:54+03:00">Tue May 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul>
</li>
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-05/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/categories/notes/page/2/" rel="prev" role="button">Previous page</a>
@ -407,6 +411,8 @@ sys 2m7.289s
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -415,8 +421,6 @@ sys 2m7.289s
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>


@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
@ -80,6 +80,34 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-05/">May, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-05-01T16:43:54+03:00">Tue May 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul>
</li>
<li>Then I reduced the JVM heap size from 6144 back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-05/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-04/">April, 2018</a></h2>
@ -357,6 +385,8 @@ COPY 54701
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -365,8 +395,6 @@ COPY 54701
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>


@ -1,515 +0,0 @@
<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="Categories" />
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-06-23T16:13:27+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.72.0" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Blog",
"headline": "CGSpace Notes",
"url" : "https://alanorth.github.io/cgspace-notes/categories/",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-06-01T13:55:39+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/categories/">
<title>CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
<link rel="alternate" type="application/rss+xml" href="https://alanorth.github.io/cgspace-notes/categories/index.xml" title="CGSpace Notes" />
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-09/">September, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-09-01T10:17:51+03:00">Sun Sep 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-08/">August, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-08-03T12:39:51+03:00">Sat Aug 03, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity&rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul>
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
<ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li>
<li>After rebooting, all statistics cores were loaded&hellip; wow, that&rsquo;s lucky.</li>
</ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-07/">July, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-07-01T12:13:51+03:00">Mon Jul 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-06/">June, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-06-02T10:57:51+03:00">Sun Jun 02, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-06/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-05/">May, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-05-01T07:37:43+03:00">Wed May 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul>
</li>
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
</code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-04/">April, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-04-01T09:00:43+03:00">Mon Apr 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul>
</li>
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
<ul>
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
</ul>
</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
</code></pre><ul>
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre>
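<p>The status-code tally from the nginx logs earlier in this entry can also be sketched in Python. This is only a rough equivalent under stated assumptions: it assumes the nginx "combined" log format (so the status is the ninth whitespace-separated field, like <code>awk</code>'s <code>$9</code>), the function name <code>tally_statuses</code> and the sample lines are hypothetical, and it matches the client IP exactly instead of grepping for it anywhere in the line.</p>

```python
from collections import Counter


def tally_statuses(lines, needle, ips):
    """Count HTTP status codes for requests that mention `needle` and come from `ips`.

    Assumes the nginx "combined" log format: client IP is field 1 and the
    status code is field 9 when the line is split on whitespace.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 9 or needle not in line:
            continue
        if fields[0] in ips:
            counts[fields[8]] += 1
    return counts


# Hypothetical sample lines for illustration, not real log data
sample = [
    '18.196.196.108 - - [01/Apr/2019:10:00:00 +0000] "GET /bitstream/Spore-192-EN-web.pdf HTTP/1.1" 200 1234 "-" "-"',
    '18.195.78.144 - - [01/Apr/2019:10:00:01 +0000] "GET /bitstream/Spore-192-EN-web.pdf HTTP/1.1" 200 1234 "-" "-"',
    '18.195.218.6 - - [01/Apr/2019:10:00:02 +0000] "GET /bitstream/Spore-192-EN-web.pdf HTTP/1.1" 404 0 "-" "-"',
    '10.0.0.1 - - [01/Apr/2019:10:00:03 +0000] "GET /bitstream/Spore-192-EN-web.pdf HTTP/1.1" 200 1234 "-" "-"',
]
ips = {"18.196.196.108", "18.195.78.144", "18.195.218.6"}
result = tally_statuses(sample, "Spore-192-EN-web.pdf", ips)
```

<p>Using a set for the IPs keeps the membership check cheap and avoids the unescaped dots in the <code>grep -E</code> pattern matching more than intended.</p>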
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-03/">March, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-03-01T12:16:30+01:00">Fri Mar 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 20032013</li>
</ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
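<p>Encoding damage of this kind often shows up as the Unicode replacement character (U+FFFD), so the affected abstracts could be flagged before import. A minimal sketch, assuming the abstracts are already available as Python strings; the function name <code>find_replacement_chars</code> and the <code>context</code> parameter are hypothetical, not part of any existing script.</p>

```python
REPLACEMENT = "\ufffd"  # U+FFFD, inserted by decoders when bytes cannot be decoded


def find_replacement_chars(text, context=10):
    """Return a short snippet of surrounding text for every U+FFFD in `text`."""
    hits = []
    for i, ch in enumerate(text):
        if ch == REPLACEMENT:
            start = max(0, i - context)
            hits.append(text[start:i + context + 1])
    return hits


damaged = "68.15% \ufffd 9.45"
clean = "68.15% ± 9.45"
snippets = find_replacement_chars(damaged, context=3)
```

<p>This only catches text that was already decoded with replacement; mojibake that decoded "successfully" into the wrong characters would need a different check.</p>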
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-02/">February, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-02-01T21:37:30+02:00">Fri Feb 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
</code></pre><ul>
<li><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</li>
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
</ul>
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot;
3018243
real 0m19.873s
user 0m22.203s
sys 0m1.979s
</code></pre>
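<p>The "top IPs in a time window" pipeline above can be approximated in Python when the logs need further processing. A rough sketch, assuming the client IP is the first field of each line and the timestamp looks like <code>01/Feb/2019:17:05:01</code>; <code>top_ips</code> and the sample lines are hypothetical.</p>

```python
import re
from collections import Counter


def top_ips(lines, day, hours, n=10):
    """Rank client IPs by request count for the given day and list of hours.

    `day` is in nginx's dd/Mon/yyyy form and `hours` is a list of two-digit
    strings, mirroring the grep pattern "01/Feb/2019:(17|18|19|20|21)".
    """
    pattern = re.compile(re.escape(day) + ":(" + "|".join(hours) + "):")
    counts = Counter(line.split()[0] for line in lines if pattern.search(line))
    return counts.most_common(n)


# Hypothetical sample lines for illustration, not real log data
sample_log = [
    '85.25.237.71 - - [01/Feb/2019:17:05:01 +0000] "GET / HTTP/1.1" 200 10 "-" "-"',
    '85.25.237.71 - - [01/Feb/2019:18:10:00 +0000] "GET / HTTP/1.1" 200 10 "-" "-"',
    '45.5.186.2 - - [01/Feb/2019:21:59:59 +0000] "GET / HTTP/1.1" 200 10 "-" "-"',
    '45.5.186.2 - - [01/Feb/2019:09:00:00 +0000] "GET / HTTP/1.1" 200 10 "-" "-"',
]
ranking = top_ips(sample_log, "01/Feb/2019", ["17", "18", "19", "20", "21"])
```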
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-01/">January, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-01-02T09:48:30+02:00">Wed Jan 02, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-01-02">2019-01-02</h2>
<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don&rsquo;t see anything interesting in the web server logs around that time though:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-12/">December, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-12-02T02:09:30+02:00">Sun Dec 02, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-12-01">2018-12-01</h2>
<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>
<h2 id="2018-12-02">2018-12-02</h2>
<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-12/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/categories/" rel="prev" role="button">Previous page</a>
<a class="btn btn-outline-primary" href="/cgspace-notes/categories/page/3/" rel="next" role="button">Next page</a>
</nav>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
<li><a href="/cgspace-notes/2020-04/">April, 2020</a></li>
<li><a href="/cgspace-notes/2020-03/">March, 2020</a></li>
<li><a href="/cgspace-notes/2020-02/">February, 2020</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>

View File

@ -1,437 +0,0 @@
<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="Categories" />
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-06-23T16:13:27+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.72.0" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Blog",
"headline": "CGSpace Notes",
"url" : "https://alanorth.github.io/cgspace-notes/categories/",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-06-01T13:55:39+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/categories/">
<title>CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
<link rel="alternate" type="application/rss+xml" href="https://alanorth.github.io/cgspace-notes/categories/index.xml" title="CGSpace Notes" />
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-11/">November, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-11-01T16:41:30+02:00">Thu Nov 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-11-01">2018-11-01</h2>
<ul>
<li>Finalize AReS Phase I and Phase II ToRs</li>
<li>Send a note about my <a href="https://github.com/ilri/dspace-statistics-api">dspace-statistics-api</a> to the dspace-tech mailing list</li>
</ul>
<h2 id="2018-11-03">2018-11-03</h2>
<ul>
<li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
<li>Today these are the top 10 IPs:</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-11/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-10/">October, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-10-01T22:31:54+03:00">Mon Oct 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-10-01">2018-10-01</h2>
<ul>
<li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li>
<li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I&rsquo;m super busy in Nairobi right now</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-09/">September, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-09-02T09:55:54+03:00">Sun Sep 02, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-09-02">2018-09-02</h2>
<ul>
<li>New <a href="https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5">PostgreSQL JDBC driver version 42.2.5</a></li>
<li>I&rsquo;ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li>
<li>Also, I&rsquo;ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system&rsquo;s RAM, and we never re-ran them after migrating to larger Linodes last month</li>
<li>I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&rsquo;m getting those autowire errors in Tomcat 8.5.30 again:</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-09/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-08/">August, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-08-01T11:52:54+03:00">Wed Aug 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-08-01">2018-08-01</h2>
<ul>
<li>DSpace Test had crashed at some point yesterday morning and I see the following in <code>dmesg</code>:</li>
</ul>
<pre><code>[Tue Jul 31 00:00:41 2018] Out of memory: Kill process 1394 (java) score 668 or sacrifice child
[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB
[Tue Jul 31 00:00:41 2018] oom_reaper: reaped process 1394 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
</code></pre><ul>
<li>Judging from the time of the crash it was probably related to the Discovery indexing that starts at midnight</li>
<li>From the DSpace log I see that eventually Solr stopped responding, so I guess the <code>java</code> process that was OOM killed above was Tomcat&rsquo;s</li>
<li>I&rsquo;m not sure why Tomcat didn&rsquo;t crash with an OutOfMemoryError&hellip;</li>
<li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li>
<li>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</li>
<li>I ran all system updates on DSpace Test and rebooted it</li>
</ul>
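<p>To compare the killed process against the configured JVM heap, the sizes in the <code>dmesg</code> line can be converted to GiB. A minimal sketch, assuming the kernel's <code>kB</code> values are units of 1024 bytes and that the line has the same layout as the one quoted above; <code>parse_oom_kill</code> is a hypothetical helper.</p>

```python
import re

# Matches the "Killed process" line layout quoted in the notes above
KILLED = re.compile(
    r"Killed process (\d+) \((\S+)\) total-vm:(\d+)kB, anon-rss:(\d+)kB"
)


def parse_oom_kill(line):
    """Extract pid, process name and memory sizes (in GiB) from an OOM kill line."""
    match = KILLED.search(line)
    if match is None:
        return None
    pid, name, total_vm_kb, anon_rss_kb = match.groups()
    return {
        "pid": int(pid),
        "name": name,
        "total_vm_gib": int(total_vm_kb) / 1024**2,  # kB -> GiB, assuming 1 kB = 1024 bytes
        "anon_rss_gib": int(anon_rss_kb) / 1024**2,
    }


line = (
    "[Tue Jul 31 00:00:41 2018] Killed process 1394 (java) "
    "total-vm:15601860kB, anon-rss:5355528kB, file-rss:0kB, shmem-rss:0kB"
)
info = parse_oom_kill(line)
```

<p>Under that assumption the resident size works out to a little over 5 GiB, which is in the same range as the 5120m heap mentioned above.</p>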
<a href='https://alanorth.github.io/cgspace-notes/2018-08/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-07/">July, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-07-01T12:56:54+03:00">Sun Jul 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-07-01">2018-07-01</h2>
<ul>
<li>I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:</li>
</ul>
<pre><code>$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
</code></pre><ul>
<li>During the <code>mvn package</code> stage on the 5.8 branch I kept getting issues with java running out of memory:</li>
</ul>
<pre><code>There is insufficient memory for the Java Runtime Environment to continue.
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2018-07/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-06/">June, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-06-04T19:49:54-07:00">Mon Jun 04, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-06-04">2018-06-04</h2>
<ul>
<li>Test the <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560">DSpace 5.8 module upgrades from Atmire</a> (<a href="https://github.com/ilri/DSpace/pull/378">#378</a>)
<ul>
<li>There seems to be a problem with the CUA and L&amp;R versions in <code>pom.xml</code> because they are using SNAPSHOT and it doesn&rsquo;t build</li>
</ul>
</li>
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
</code></pre><ul>
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
<li>Time to index ~70,000 items on CGSpace:</li>
</ul>
<pre><code>$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 74m42.646s
user 8m5.056s
sys 2m7.289s
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2018-06/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-05/">May, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-05-01T16:43:54+03:00">Tue May 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul>
</li>
<li>Then I reduced the JVM heap size from 6144m back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
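<p>The two Solr URLs above are just the XML commands percent-encoded into the <code>stream.body</code> parameter. A small sketch of how they could be built with Python's standard library; the helper name <code>solr_update_url</code> is hypothetical, and the host and port are simply the ones from the notes.</p>

```python
from urllib.parse import quote


def solr_update_url(base, core, body):
    """Build a Solr update URL with `body` percent-encoded into stream.body.

    "/", "*" and ":" are left unescaped so the result matches the URLs
    written out in the notes.
    """
    return f"{base}/solr/{core}/update?stream.body=" + quote(body, safe="/*:")


delete_url = solr_update_url(
    "http://localhost:3000", "statistics", "<delete><query>*:*</query></delete>"
)
commit_url = solr_update_url("http://localhost:3000", "statistics", "<commit/>")
```

<p>The sketch only constructs the URLs; actually requesting them deletes every document in the core, so that step is deliberately left out.</p>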
<a href='https://alanorth.github.io/cgspace-notes/2018-05/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-04/">April, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-04-01T16:13:54+02:00">Sun Apr 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-04-01">2018-04-01</h2>
<ul>
<li>I tried to test something on DSpace Test but noticed that it has been down since god knows when</li>
<li>Catalina logs at least show some memory errors yesterday:</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-04/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-03/">March, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-03-02T16:07:54+02:00">Fri Mar 02, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-03-02">2018-03-02</h2>
<ul>
<li>Export a CSV of the IITA community metadata for Martin Mueller</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-03/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-02/">February, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-02-01T16:28:54+02:00">Thu Feb 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-02-01">2018-02-01</h2>
<ul>
<li>Peter gave feedback on the <code>dc.rights</code> proof of concept that I had sent him last week</li>
<li>We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list</li>
<li>Yesterday I figured out how to monitor DSpace sessions using JMX</li>
<li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu&rsquo;s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="/cgspace-notes/2018-01/">in 2018-01</a></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-02/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/categories/page/2/" rel="prev" role="button">Previous page</a>
<a class="btn btn-outline-primary" href="/cgspace-notes/categories/page/4/" rel="next" role="button">Next page</a>
</nav>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
<li><a href="/cgspace-notes/2020-04/">April, 2020</a></li>
<li><a href="/cgspace-notes/2020-03/">March, 2020</a></li>
<li><a href="/cgspace-notes/2020-02/">February, 2020</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>

View File

@ -1,486 +0,0 @@
<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="Categories" />
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-06-23T16:13:27+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.72.0" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Blog",
"headline": "CGSpace Notes",
"url" : "https://alanorth.github.io/cgspace-notes/categories/",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-06-01T13:55:39+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/categories/">
<title>CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
<link rel="alternate" type="application/rss+xml" href="https://alanorth.github.io/cgspace-notes/categories/index.xml" title="CGSpace Notes" />
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-01/">January, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-01-02T08:35:54-08:00">Tue Jan 02, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-01-02">2018-01-02</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time</li>
<li>I didn&rsquo;t get any load alerts from Linode and the REST and XMLUI logs don&rsquo;t show anything out of the ordinary</li>
<li>The nginx logs show HTTP 200s until <code>02/Jan/2018:11:27:17 +0000</code> when Uptime Robot got an HTTP 500</li>
<li>In dspace.log around that time I see many errors like &ldquo;Client closed the connection before file download was complete&rdquo;</li>
<li>And just before that I see this:</li>
</ul>
<pre><code>Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
</code></pre><ul>
<li>Ah hah! So the pool was actually empty!</li>
<li>I need to increase that, let&rsquo;s try to bump it up from 50 to 75</li>
<li>After that one client got an HTTP 499 but then the rest were HTTP 200, so I don&rsquo;t know what the hell Uptime Robot saw</li>
<li>I notice this error quite a few times in dspace.log:</li>
</ul>
<pre><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered &quot; &quot;]&quot; &quot;] &quot;&quot; at line 1, column 32.
</code></pre><ul>
<li>And there are many of these errors every day for the past month:</li>
</ul>
<pre><code>$ grep -c &quot;Error while searching for sidebar facets&quot; dspace.log.*
dspace.log.2017-11-21:4
dspace.log.2017-11-22:1
dspace.log.2017-11-23:4
dspace.log.2017-11-24:11
dspace.log.2017-11-25:0
dspace.log.2017-11-26:1
dspace.log.2017-11-27:7
dspace.log.2017-11-28:21
dspace.log.2017-11-29:31
dspace.log.2017-11-30:15
dspace.log.2017-12-01:15
dspace.log.2017-12-02:20
dspace.log.2017-12-03:38
dspace.log.2017-12-04:65
dspace.log.2017-12-05:43
dspace.log.2017-12-06:72
dspace.log.2017-12-07:27
dspace.log.2017-12-08:15
dspace.log.2017-12-09:29
dspace.log.2017-12-10:35
dspace.log.2017-12-11:20
dspace.log.2017-12-12:44
dspace.log.2017-12-13:36
dspace.log.2017-12-14:59
dspace.log.2017-12-15:104
dspace.log.2017-12-16:53
dspace.log.2017-12-17:66
dspace.log.2017-12-18:83
dspace.log.2017-12-19:101
dspace.log.2017-12-20:74
dspace.log.2017-12-21:55
dspace.log.2017-12-22:66
dspace.log.2017-12-23:50
dspace.log.2017-12-24:85
dspace.log.2017-12-25:62
dspace.log.2017-12-26:49
dspace.log.2017-12-27:30
dspace.log.2017-12-28:54
dspace.log.2017-12-29:68
dspace.log.2017-12-30:89
dspace.log.2017-12-31:53
dspace.log.2018-01-01:45
dspace.log.2018-01-02:34
</code></pre><ul>
<li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</li>
</ul>
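<p>The per-day counts from <code>grep -c</code> above are easier to compare when rolled up by month. A rough sketch, assuming each line has the form <code>dspace.log.YYYY-MM-DD:count</code> as in the output above; <code>monthly_totals</code> is a hypothetical helper and the sample uses only three of the lines shown.</p>

```python
from collections import defaultdict


def monthly_totals(grep_lines):
    """Sum `grep -c` output lines like "dspace.log.2017-12-15:104" per month."""
    totals = defaultdict(int)
    for line in grep_lines:
        name, _, count = line.strip().rpartition(":")
        if not name:
            continue  # skip blank or malformed lines
        month = name.rsplit(".", 1)[-1][:7]  # "2017-12-15" -> "2017-12"
        totals[month] += int(count)
    return dict(totals)


# Three lines copied from the grep output above
sample_counts = [
    "dspace.log.2017-11-30:15",
    "dspace.log.2017-12-01:15",
    "dspace.log.2017-12-02:20",
]
totals = monthly_totals(sample_counts)
```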
<a href='https://alanorth.github.io/cgspace-notes/2018-01/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-12/">December, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-12-01T13:53:54+03:00">Fri Dec 01, 2017</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2017-12-01">2017-12-01</h2>
<ul>
<li>Uptime Robot noticed that CGSpace went down</li>
<li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li>
<li>PostgreSQL activity says there are 115 connections currently</li>
<li>The list of connections to XMLUI and REST API for today:</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-12/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-11/">November, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-11-02T09:37:54+02:00">Thu Nov 02, 2017</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2017-11-01">2017-11-01</h2>
<ul>
<li>The CORE developers responded to say they are looking into their bot not respecting our robots.txt</li>
</ul>
<h2 id="2017-11-02">2017-11-02</h2>
<ul>
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
</ul>
<pre><code># grep -c &quot;CORE&quot; /var/log/nginx/access.log
0
</code></pre><ul>
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
</ul>
<pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-10/">October, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-10-01T08:07:54+03:00">Sun Oct 01, 2017</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2017-10-01">2017-10-01</h2>
<ul>
<li>Peter emailed to point out that many items in the <a href="https://cgspace.cgiar.org/handle/10568/2703">ILRI archive collection</a> have multiple handles:</li>
</ul>
<pre><code>http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
</code></pre><ul>
<li>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li>
<li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li>
</ul>
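<p>Before cleaning these up in SQL or OpenRefine, it may help to see how such a value breaks apart. A minimal sketch, assuming multiple handles are joined with <code>||</code> as in the example above; <code>split_handles</code> and <code>handle_id</code> are hypothetical helpers, not existing scripts.</p>

```python
HANDLE_PREFIX = "http://hdl.handle.net/"


def split_handles(value, separator="||"):
    """Split a multi-valued handle field into individual, non-empty URLs."""
    return [part.strip() for part in value.split(separator) if part.strip()]


def handle_id(url):
    """Return the bare handle (e.g. "10568/78495") from a hdl.handle.net URL."""
    return url[len(HANDLE_PREFIX):] if url.startswith(HANDLE_PREFIX) else url


value = "http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336"
urls = split_handles(value)
ids = [handle_id(u) for u in urls]
```

<p>Which of the handles should be kept is a separate decision that this sketch does not attempt to make.</p>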
<a href='https://alanorth.github.io/cgspace-notes/2017-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/cgiar-library-migration/">CGIAR Library Migration</a></h2>
<p class="blog-post-meta"><time datetime="2017-09-18T16:38:35+03:00">Mon Sep 18, 2017</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/migration/" rel="tag">Migration</a>
</p>
</header>
<p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p>
<a href='https://alanorth.github.io/cgspace-notes/cgiar-library-migration/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-09/">September, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-09-07T16:54:52+07:00">Thu Sep 07, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-09-06">2017-09-06</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours</li>
</ul>
<h2 id="2017-09-07">2017-09-07</h2>
<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is in both the approvers step and the group</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-09/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-08/">August, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-08-01T11:51:52+03:00">Tue Aug 01, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like:
<ul>
<li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li>
</ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li>
<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
</ul>
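<p>The idea of blocking bots on the dynamic <code>/discover</code> and <code>/browse</code> pages could be sketched roughly as follows. This is only an illustration of the decision logic, not the actual web server configuration: the user agent substrings and the function name <code>status_for</code> are assumptions.</p>

```python
import re

# Assumption: these substrings identify the crawlers mentioned in the notes.
BOT_PATTERN = re.compile(r"googlebot|baiduspider|bingbot|slurp", re.IGNORECASE)

# Dynamic pages below a handle, for example /handle/10568/3353/discover
DYNAMIC_PATH = re.compile(r"^/handle/[\d.]+/\d+/(discover|browse)(/|\?|$)")


def status_for(path, user_agent):
    """Return 403 for known bots on dynamic handle pages, otherwise 200."""
    if DYNAMIC_PATH.search(path) and BOT_PATTERN.search(user_agent):
        return 403
    return 200


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(status_for("/handle/10568/3353/discover", googlebot))
print(status_for("/handle/10568/3353", googlebot))
```

<p>Unlike the <code>X-Robots-Tag</code> header, a check like this rejects the request before the expensive page is rendered.</p>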
<a href='https://alanorth.github.io/cgspace-notes/2017-08/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-07/">July, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-07-01T18:03:52+03:00">Sat Jul 01, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-07-01">2017-07-01</h2>
<ul>
<li>Run system updates and reboot DSpace Test</li>
</ul>
<h2 id="2017-07-04">2017-07-04</h2>
<ul>
<li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li>
<li>Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li>
<li>We can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-07/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-06/">June, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-06-01T10:14:52+03:00">Thu Jun 01, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-06-01">2017-06-01</h2>
<ul>
<li>After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes</li>
<li>The <code>cg.identifier.wletheme</code> field will be used for both Phase I and Phase II Research Themes</li>
<li>Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there</li>
<li>The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo;</li>
<li>Tagged all items in the current Phase I collections with their appropriate themes</li>
<li>Create pull request to add Phase II research themes to the submission form: #328</li>
<li>Add cg.</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-06/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-05/">May, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-05-01T16:21:52+02:00">Mon May 01, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-05-01">2017-05-01</h2>
<ul>
<li>ICARDA apparently started working on CG Core on their MEL repository</li>
<li>They have done a few <code>cg.*</code> fields, but not very consistently, and they even copy some of CGSpace&rsquo;s items:
<ul>
<li>https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full</li>
<li>https://cgspace.cgiar.org/handle/10568/73683</li>
</ul>
</li>
</ul>
<h2 id="2017-05-02">2017-05-02</h2>
<ul>
<li>Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request</li>
</ul>
<h2 id="2017-05-04">2017-05-04</h2>
<ul>
<li>Sync DSpace Test with database and assetstore from CGSpace</li>
<li>Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server</li>
<li>Now I can see the workflow statistics and am able to select users, but everything returns 0 items</li>
<li>Megan says some mapped items have still not appeared since last week, so I forced a full <code>index-discovery -b</code></li>
<li>Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace.</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-05/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/categories/page/3/" rel="prev" role="button">Previous page</a>
<a class="btn btn-outline-primary" href="/cgspace-notes/categories/page/5/" rel="next" role="button">Next page</a>
</nav>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
<li><a href="/cgspace-notes/2020-04/">April, 2020</a></li>
<li><a href="/cgspace-notes/2020-03/">March, 2020</a></li>
<li><a href="/cgspace-notes/2020-02/">February, 2020</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>

View File

@ -1,465 +0,0 @@
<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="Categories" />
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-06-23T16:13:27+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.72.0" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Blog",
"headline": "CGSpace Notes",
"url" : "https://alanorth.github.io/cgspace-notes/categories/",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-06-01T13:55:39+03:00",
"keywords": "notes, migration",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/categories/">
<title>CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
<link rel="alternate" type="application/rss+xml" href="https://alanorth.github.io/cgspace-notes/categories/index.xml" title="CGSpace Notes" />
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-04/">April, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-04-02T17:08:52+02:00">Sun Apr 02, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-04-02">2017-04-02</h2>
<ul>
<li>Merge one change to CCAFS flagships that I had forgotten to remove last month (&ldquo;MANAGING CLIMATE RISK&rdquo;): <a href="https://github.com/ilri/DSpace/pull/317">https://github.com/ilri/DSpace/pull/317</a></li>
<li>Quick proof-of-concept hack to add <code>dc.rights</code> to the input form, including some inline instructions/hints:</li>
</ul>
<p><img src="/cgspace-notes/2017/04/dc-rights.png" alt="dc.rights in the submission form"></p>
<ul>
<li>Remove redundant/duplicate text in the DSpace submission license</li>
<li>Testing the CMYK patch on a collection with 650 items:</li>
</ul>
<pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2017-04/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-03/">March, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-03-01T17:08:52+02:00">Wed Mar 01, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-03-01">2017-03-01</h2>
<ul>
<li>Run the 279 CIAT author corrections on CGSpace</li>
</ul>
<h2 id="2017-03-02">2017-03-02</h2>
<ul>
<li>Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace</li>
<li>CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles</li>
<li>They might come in at the top level in one &ldquo;CGIAR System&rdquo; community, or with several communities</li>
<li>I need to spend a bit of time looking at the multiple handle support in DSpace and see if new content can be minted in both handles, or just one?</li>
<li>Need to send Peter and Michael some notes about this in a few days</li>
<li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li>
<li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li>
<li>Discovered that the ImageMagick <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li>
<li>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li>
</ul>
<pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2017-03/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-02/">February, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-02-07T07:04:52-08:00">Tue Feb 07, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-02-07">2017-02-07</h2>
<ul>
<li>An item was mapped twice erroneously again, so I had to remove one of the mappings manually:</li>
</ul>
<pre><code>dspace=# select * from collection2item where item_id = '80278';
id | collection_id | item_id
-------+---------------+---------
92551 | 313 | 80278
92550 | 313 | 80278
90774 | 1051 | 80278
(3 rows)
dspace=# delete from collection2item where id = 92551 and item_id = 80278;
DELETE 1
</code></pre><ul>
<li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li>
<li>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li>
</ul>
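<p>A small sketch of how duplicate mappings like the one above could be spotted automatically. It works on rows shaped like the <code>collection2item</code> output shown above (<code>id</code>, <code>collection_id</code>, <code>item_id</code>); the function name <code>duplicate_mappings</code> is hypothetical.</p>

```python
from collections import Counter

# Rows copied from the query output above: (id, collection_id, item_id).
rows = [
    (92551, 313, 80278),
    (92550, 313, 80278),
    (90774, 1051, 80278),
]


def duplicate_mappings(rows):
    """Return (collection_id, item_id) pairs that appear more than once."""
    counts = Counter((collection_id, item_id) for _id, collection_id, item_id in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


print(duplicate_mappings(rows))
```

<p>Only one row of each reported pair should be deleted, as in the manual <code>delete</code> above.</p>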
<a href='https://alanorth.github.io/cgspace-notes/2017-02/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-01/">January, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-01-02T10:43:00+03:00">Mon Jan 02, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-01-02">2017-01-02</h2>
<ul>
<li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li>
<li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li>
<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-01/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-12/">December, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-12-02T10:43:00+03:00">Fri Dec 02, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-12-02">2016-12-02</h2>
<ul>
<li>CGSpace was down for five hours in the morning while I was sleeping</li>
<li>While looking in the logs for errors, I see tons of warnings about Atmire MQM:</li>
</ul>
<pre><code>2016-12-02 03:00:32,352 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=CREATE, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632305, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY_METADATA, SubjectType=BUNDLE, SubjectID=70316, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632309, dispatcher=1544803905, detail=&quot;dc.title&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=ITEM, SubjectID=80044, ObjectType=BUNDLE, ObjectID=70316, TimeStamp=1480647632311, dispatcher=1544803905, detail=&quot;THUMBNAIL&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;)
</code></pre><ul>
<li>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</li>
<li>I&rsquo;ve raised a ticket with Atmire to ask</li>
<li>Another worrying error from dspace.log is:</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-12/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-11/">November, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-11-01T09:21:00+03:00">Tue Nov 01, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-11-01">2016-11-01</h2>
<ul>
<li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li>
</ul>
<p><img src="/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p>
<a href='https://alanorth.github.io/cgspace-notes/2016-11/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-10/">October, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-10-03T15:53:00+03:00">Mon Oct 03, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.lyrasis.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
<ul>
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul>
</li>
<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
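<p>A rough sketch of what such a test CSV might look like when built programmatically. The three ORCIDs are the ones listed above; the <code>item_id</code> and <code>collection</code> values and the function name <code>build_csv</code> are placeholders invented for illustration.</p>

```python
import csv
import io

# The ORCIDs from the notes above, in the order they should be kept.
orcids = ["0000-0002-6115-0956", "0000-0002-3812-8793", "0000-0001-7462-405X"]


def build_csv(item_id, collection, orcids):
    """Build a minimal batch-edit CSV with only id, collection and ORCIDs."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "collection", "ORCID:dc.contributor.author"])
    # Multiple values in one cell are joined with the "||" separator.
    writer.writerow([item_id, collection, "||".join(orcids)])
    return buf.getvalue()


# Placeholder item id and collection handle, not real CGSpace values.
print(build_csv("1234", "10568/0000", orcids))
```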
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-09/">September, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-09-01T15:53:00+03:00">Thu Sep 01, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-09-01">2016-09-01</h2>
<ul>
<li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li>
<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li>
<li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li>
<li>It looks like we might be able to use OUs now, instead of DCs:</li>
</ul>
<pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot;
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-09/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-08/">August, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-08-01T15:53:00+03:00">Mon Aug 01, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-08-01">2016-08-01</h2>
<ul>
<li>Add updated distribution license from Sisay (<a href="https://github.com/ilri/DSpace/issues/259">#259</a>)</li>
<li>Play with upgrading Mirage 2 dependencies in <code>bower.json</code> because most are several versions out of date</li>
<li>Bootstrap is at 3.3.0 but upstream is at 3.3.7, and upgrading to anything beyond 3.3.1 breaks glyphicons and probably more</li>
<li>bower stuff is a dead end, waste of time, too many issues</li>
<li>Anything after Bootstrap 3.3.1 makes glyphicons disappear (HTTP 404 trying to access from incorrect path of <code>fonts</code>)</li>
<li>Start working on DSpace 5.1 → 5.5 port:</li>
</ul>
<pre><code>$ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-08/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-07/">July, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-07-01T10:53:00+03:00">Fri Jul 01, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-07-01">2016-07-01</h2>
<ul>
<li>Add <code>dc.description.sponsorship</code> to Discovery sidebar facets and make investors clickable in item view (<a href="https://github.com/ilri/DSpace/issues/232">#232</a>)</li>
<li>I think this query should find and replace all authors that have &ldquo;,&rdquo; at the end of their names:</li>
</ul>
<pre><code>dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
UPDATE 95
dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
text_value
------------
(0 rows)
</code></pre><ul>
<li>In this case the select query was showing 95 results before the update</li>
</ul>
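<p>The same pattern can be tried out on a few sample values before running the update. This is a sketch using Python&rsquo;s <code>re</code> module, which treats this particular expression in the same way for simple cases like these; the sample names are made up.</p>

```python
import re

# Same pattern as in the SQL above: capture everything before a trailing comma.
TRAILING_COMMA = re.compile(r"(^.+?),$")


def strip_trailing_comma(name):
    """Remove a single trailing comma from an author name, if present."""
    return TRAILING_COMMA.sub(r"\1", name)


# Made-up sample values for illustration.
for name in ["Doe, Jane,", "Doe, Jane", "Doe, J.,"]:
    print(repr(name), "->", repr(strip_trailing_comma(name)))
```

<p>Names without a trailing comma pass through unchanged, which matches the select query returning zero rows after the update.</p>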
<a href='https://alanorth.github.io/cgspace-notes/2016-07/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/categories/page/4/" rel="prev" role="button">Previous page</a>
<a class="btn btn-outline-primary" href="/cgspace-notes/categories/page/6/" rel="next" role="button">Next page</a>
</nav>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
<li><a href="/cgspace-notes/2020-04/">April, 2020</a></li>
<li><a href="/cgspace-notes/2020-03/">March, 2020</a></li>
<li><a href="/cgspace-notes/2020-02/">February, 2020</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>

View File

@ -1,376 +0,0 @@
<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="Categories" />
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-06-23T16:13:27+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.72.0" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Blog",
"headline": "CGSpace Notes",
"url" : "https://alanorth.github.io/cgspace-notes/categories/",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-06-01T13:55:39+03:00",
"keywords": "notes, migration",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/categories/">
<title>CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
<link rel="alternate" type="application/rss+xml" href="https://alanorth.github.io/cgspace-notes/categories/index.xml" title="CGSpace Notes" />
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-06/">June, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-06-01T10:53:00+03:00">Wed Jun 01, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-06-01">2016-06-01</h2>
<ul>
<li>Experimenting with IFPRI OAI (we want to harvest their publications)</li>
<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li>
<li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li>
<li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li>
<li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li>
<li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li>
</ul>
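<p>For reference, the <code>ListRecords</code> request above can be assembled from its parts like this. The endpoint, set name and parameters are taken from the URLs above; the function name <code>list_records_url</code> is just an illustrative choice.</p>

```python
from urllib.parse import urlencode

# IFPRI's OAI endpoint as found in the notes above.
BASE_URL = "http://ebrary.ifpri.org/oai/oai.php"


def list_records_url(oai_set, from_date, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords URL for one set, starting at from_date."""
    params = {
        "verb": "ListRecords",
        "from": from_date,
        "set": oai_set,
        "metadataPrefix": metadata_prefix,
    }
    return BASE_URL + "?" + urlencode(params)


print(list_records_url("p15738coll2", "2016-01-01"))
```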
<a href='https://alanorth.github.io/cgspace-notes/2016-06/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-05/">May, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-05-01T23:06:00+03:00">Sun May 01, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-05-01">2016-05-01</h2>
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
</ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168
</code></pre>
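<p>One caveat about the count above: <code>uniq</code> only collapses <em>adjacent</em> duplicate lines, so without a preceding <code>sort</code> the figure is an upper bound on the number of distinct IPs rather than an exact count. A minimal sketch of an exact count, assuming (as the <code>awk</code> command does) that the client IP is the first whitespace-separated field; the function name and sample lines are hypothetical:</p>

```python
def count_distinct_ips(lines):
    """Count distinct values of the first whitespace-separated field."""
    seen = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            seen.add(fields[0])
    return len(seen)


if __name__ == "__main__":
    # Made-up log lines; the same IP appears twice, but not on adjacent lines
    sample = [
        "192.0.2.1 - - GET /rest/items",
        "192.0.2.2 - - GET /rest/items",
        "192.0.2.1 - - GET /rest/collections",
    ]
    print(count_distinct_ips(sample))
```

<p>For a real log file the same function can be given the open file object, since iterating over it yields one line at a time.</p>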
<a href='https://alanorth.github.io/cgspace-notes/2016-05/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-04/">April, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-04-04T11:06:00+03:00">Mon Apr 04, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-04-04">2016-04-04</h2>
<ul>
<li>Looking at log file use on CGSpace, I noticed that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I&rsquo;ve never needed to look in any log file other than dspace.log, let alone one from last year!</li>
<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li>
<li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-04/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-03/">March, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-03-02T16:50:00+03:00">Wed Mar 02, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-03-02">2016-03-02</h2>
<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-03/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-02/">February, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-02-05T13:18:00+03:00">Fri Feb 05, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-02-05">2016-02-05</h2>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
<li>I noticed we have a very <em>interesting</em> list of countries on CGSpace:</li>
</ul>
<p><img src="/cgspace-notes/2016/02/cgspace-countries.png" alt="CGSpace country list"></p>
<ul>
<li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li>
<li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-02/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-01/">January, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-01-13T13:18:00+03:00">Wed Jan 13, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-01-13">2016-01-13</h2>
<ul>
<li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li>
<li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2016-01/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2015-12/">December, 2015</a></h2>
<p class="blog-post-meta"><time datetime="2015-12-02T13:18:00+03:00">Wed Dec 02, 2015</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2015-11/">November, 2015</a></h2>
<p class="blog-post-meta"><time datetime="2015-11-23T17:00:57+03:00">Mon Nov 23, 2015</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2015-11-22">2015-11-22</h2>
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
</ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2015-11/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/categories/page/5/" rel="prev" role="button">Previous page</a>
<a class="btn btn-outline-primary disabled" href="#" role="button" aria-disabled="true">Next page</a>
</nav>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
<li><a href="/cgspace-notes/2020-04/">April, 2020</a></li>
<li><a href="/cgspace-notes/2020-03/">March, 2020</a></li>
<li><a href="/cgspace-notes/2020-02/">February, 2020</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>

View File

@ -279,6 +279,8 @@ dspace=# select setval('handle_seq',86873);
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -287,8 +289,6 @@ dspace=# select setval('handle_seq',86873);
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -451,6 +451,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -459,8 +461,6 @@
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,31 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-10/">October, 2020</a></h2>
<p class="blog-post-meta"><time datetime="2020-10-06T16:55:54+03:00">Tue Oct 06, 2020</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2020-10-06">2020-10-06</h2>
<ul>
<li>Add tests for the new <code>/items</code> POST handlers to the DSpace 6.x branch of my <a href="https://github.com/ilri/dspace-statistics-api/tree/v6_x">dspace-statistics-api</a>
<ul>
<li>It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available</li>
</ul>
</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2020-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-09/">September, 2020</a></h2>
@ -363,38 +388,6 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-12/">December, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-12-01T11:22:30+02:00">Sun Dec 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
<li>Check any packages that have residual configs and purge them:</li>
<li><code># dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P</code></li>
<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
</ul>
</li>
</ul>
<pre><code># apt update &amp;&amp; apt full-upgrade
# apt-get autoremove &amp;&amp; apt-get autoclean
# dpkg -C
# reboot
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-12/'>Read more →</a>
</article>
<nav class="blog-pagination">
@ -419,6 +412,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -427,8 +422,6 @@
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -6,7 +6,23 @@
<description>Recent content on CGSpace Notes</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Wed, 02 Sep 2020 15:35:54 +0300</lastBuildDate><atom:link href="https://alanorth.github.io/cgspace-notes/index.xml" rel="self" type="application/rss+xml" />
<lastBuildDate>Tue, 06 Oct 2020 16:55:54 +0300</lastBuildDate><atom:link href="https://alanorth.github.io/cgspace-notes/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>October, 2020</title>
<link>https://alanorth.github.io/cgspace-notes/2020-10/</link>
<pubDate>Tue, 06 Oct 2020 16:55:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2020-10/</guid>
<description>&lt;h2 id=&#34;2020-10-06&#34;&gt;2020-10-06&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add tests for the new &lt;code&gt;/items&lt;/code&gt; POST handlers to the DSpace 6.x branch of my &lt;a href=&#34;https://github.com/ilri/dspace-statistics-api/tree/v6_x&#34;&gt;dspace-statistics-api&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;</description>
</item>
<item>
<title>September, 2020</title>
<link>https://alanorth.github.io/cgspace-notes/2020-09/</link>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,38 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-12/">December, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-12-01T11:22:30+02:00">Sun Dec 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
<li>Check any packages that have residual configs and purge them:</li>
<li><code># dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P</code></li>
<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
</ul>
</li>
</ul>
<pre><code># apt update &amp;&amp; apt full-upgrade
# apt-get autoremove &amp;&amp; apt-get autoclean
# dpkg -C
# reboot
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-12/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-11/">November, 2019</a></h2>
@ -376,38 +408,6 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-03/">March, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-03-01T12:16:30+01:00">Fri Mar 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 20032013</li>
</ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/" rel="prev" role="button">Previous page</a>
@ -432,6 +432,8 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -440,8 +442,6 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,38 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-03/">March, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-03-01T12:16:30+01:00">Fri Mar 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 20032013</li>
</ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-02/">February, 2019</a></h2>
@ -370,34 +402,6 @@ sys 2m7.289s
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-05/">May, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-05-01T16:43:54+03:00">Tue May 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul>
</li>
<li>Then I reduced the JVM heap size from 6144m back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-05/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/page/2/" rel="prev" role="button">Previous page</a>
@ -422,6 +426,8 @@ sys 2m7.289s
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -430,8 +436,6 @@ sys 2m7.289s
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,34 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-05/">May, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-05-01T16:43:54+03:00">Tue May 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul>
</li>
<li>Then I reduced the JVM heap size from 6144m back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-05/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-04/">April, 2018</a></h2>
@ -373,45 +401,6 @@ COPY 54701
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-08/">August, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-08-01T11:51:52+03:00">Tue Aug 01, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like:
<ul>
<li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li>
</ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li>
<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-08/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/page/3/" rel="prev" role="button">Previous page</a>
@ -436,6 +425,8 @@ COPY 54701
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -444,8 +435,6 @@ COPY 54701
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,45 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-08/">August, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-08-01T11:51:52+03:00">Tue Aug 01, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like:
<ul>
<li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li>
</ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li>
<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-08/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-07/">July, 2017</a></h2>
@ -333,36 +372,6 @@ DELETE 1
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-10/">October, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-10-03T15:53:00+03:00">Mon Oct 03, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.lyrasis.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
<ul>
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul>
</li>
<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/page/4/" rel="prev" role="button">Previous page</a>
@ -387,6 +396,8 @@ DELETE 1
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -395,8 +406,6 @@ DELETE 1
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,36 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-10/">October, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-10-03T15:53:00+03:00">Mon Oct 03, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.lyrasis.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
<ul>
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul>
</li>
<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-09/">September, 2016</a></h2>
@ -332,33 +362,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2015-12/">December, 2015</a></h2>
<p class="blog-post-meta"><time datetime="2015-12-02T13:18:00+03:00">Wed Dec 02, 2015</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/page/5/" rel="prev" role="button">Previous page</a>
@ -383,6 +386,8 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -391,8 +396,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,33 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2015-12/">December, 2015</a></h2>
<p class="blog-post-meta"><time datetime="2015-12-02T13:18:00+03:00">Wed Dec 02, 2015</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2015-11/">November, 2015</a></h2>
@ -144,6 +171,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -152,8 +181,6 @@
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,31 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-10/">October, 2020</a></h2>
<p class="blog-post-meta"><time datetime="2020-10-06T16:55:54+03:00">Tue Oct 06, 2020</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2020-10-06">2020-10-06</h2>
<ul>
<li>Add tests for the new <code>/items</code> POST handlers to the DSpace 6.x branch of my <a href="https://github.com/ilri/dspace-statistics-api/tree/v6_x">dspace-statistics-api</a>
<ul>
<li>It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available</li>
</ul>
</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2020-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-09/">September, 2020</a></h2>
@ -363,38 +388,6 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-12/">December, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-12-01T11:22:30+02:00">Sun Dec 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
<li>Check any packages that have residual configs and purge them:</li>
<li><code># dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P</code></li>
<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
</ul>
</li>
</ul>
<pre><code># apt update &amp;&amp; apt full-upgrade
# apt-get autoremove &amp;&amp; apt-get autoclean
# dpkg -C
# reboot
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-12/'>Read more →</a>
</article>
<nav class="blog-pagination">
@ -419,6 +412,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -427,8 +422,6 @@
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -6,7 +6,23 @@
<description>Recent content in Posts on CGSpace Notes</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Wed, 02 Sep 2020 15:35:54 +0300</lastBuildDate><atom:link href="https://alanorth.github.io/cgspace-notes/posts/index.xml" rel="self" type="application/rss+xml" />
<lastBuildDate>Tue, 06 Oct 2020 16:55:54 +0300</lastBuildDate><atom:link href="https://alanorth.github.io/cgspace-notes/posts/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>October, 2020</title>
<link>https://alanorth.github.io/cgspace-notes/2020-10/</link>
<pubDate>Tue, 06 Oct 2020 16:55:54 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2020-10/</guid>
<description>&lt;h2 id=&#34;2020-10-06&#34;&gt;2020-10-06&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Add tests for the new &lt;code&gt;/items&lt;/code&gt; POST handlers to the DSpace 6.x branch of my &lt;a href=&#34;https://github.com/ilri/dspace-statistics-api/tree/v6_x&#34;&gt;dspace-statistics-api&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;</description>
</item>
<item>
<title>September, 2020</title>
<link>https://alanorth.github.io/cgspace-notes/2020-09/</link>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,38 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-12/">December, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-12-01T11:22:30+02:00">Sun Dec 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-12-01">2019-12-01</h2>
<ul>
<li>Upgrade CGSpace (linode18) to Ubuntu 18.04:
<ul>
<li>Check any packages that have residual configs and purge them:</li>
<li><code># dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P</code></li>
<li>Make sure all packages are up to date and the package manager is up to date, then reboot:</li>
</ul>
</li>
</ul>
<pre><code># apt update &amp;&amp; apt full-upgrade
# apt-get autoremove &amp;&amp; apt-get autoclean
# dpkg -C
# reboot
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-12/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-11/">November, 2019</a></h2>
@ -376,38 +408,6 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-03/">March, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-03-01T12:16:30+01:00">Fri Mar 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 20032013</li>
</ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/posts/" rel="prev" role="button">Previous page</a>
@ -432,6 +432,8 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -440,8 +442,6 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,38 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-03/">March, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-03-01T12:16:30+01:00">Fri Mar 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% <20> 9.45 instead of 68.15% ± 9.45</li>
<li>2003<EFBFBD>2013 instead of 20032013</li>
</ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-02/">February, 2019</a></h2>
@ -370,34 +402,6 @@ sys 2m7.289s
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-05/">May, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-05-01T16:43:54+03:00">Tue May 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul>
</li>
<li>Then I reduced the JVM heap size from 6144m back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-05/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/posts/page/2/" rel="prev" role="button">Previous page</a>
@ -422,6 +426,8 @@ sys 2m7.289s
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -430,8 +436,6 @@ sys 2m7.289s
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,34 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-05/">May, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-05-01T16:43:54+03:00">Tue May 01, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-05-01">2018-05-01</h2>
<ul>
<li>I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
<ul>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E</li>
<li>http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E</li>
</ul>
</li>
<li>Then I reduced the JVM heap size from 6144m back to 5120m</li>
<li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-05/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-04/">April, 2018</a></h2>
@ -373,45 +401,6 @@ COPY 54701
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-08/">August, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-08-01T11:51:52+03:00">Tue Aug 01, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like:
<ul>
<li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li>
</ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li>
<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-08/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/posts/page/3/" rel="prev" role="button">Previous page</a>
@ -436,6 +425,8 @@ COPY 54701
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -444,8 +435,6 @@ COPY 54701
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,45 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-08/">August, 2017</a></h2>
<p class="blog-post-meta"><time datetime="2017-08-01T11:51:52+03:00">Tue Aug 01, 2017</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2017-08-01">2017-08-01</h2>
<ul>
<li>Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours</li>
<li>I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)</li>
<li>The good thing is that, according to <code>dspace.log.2017-08-01</code>, they are all using the same Tomcat session</li>
<li>This means our Tomcat Crawler Session Valve is working</li>
<li>But many of the bots are browsing dynamic URLs like:
<ul>
<li>/handle/10568/3353/discover</li>
<li>/handle/10568/16510/browse</li>
</ul>
</li>
<li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
<li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
<li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li>
<li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li>
<li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li>
<li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li>
<li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2017-08/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2017-07/">July, 2017</a></h2>
@ -333,36 +372,6 @@ DELETE 1
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-10/">October, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-10-03T15:53:00+03:00">Mon Oct 03, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.lyrasis.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
<ul>
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul>
</li>
<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/posts/page/4/" rel="prev" role="button">Previous page</a>
@ -387,6 +396,8 @@ DELETE 1
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -395,8 +406,6 @@ DELETE 1
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,36 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-10/">October, 2016</a></h2>
<p class="blog-post-meta"><time datetime="2016-10-03T15:53:00+03:00">Mon Oct 03, 2016</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2016-10-03">2016-10-03</h2>
<ul>
<li>Testing adding <a href="https://wiki.lyrasis.org/display/DSDOC5x/ORCID+Integration#ORCIDIntegration-EditingexistingitemsusingBatchCSVEditing">ORCIDs to a CSV</a> file for a single item to see if the author orders get messed up</li>
<li>Need to test the following scenarios to see how author order is affected:
<ul>
<li>ORCIDs only</li>
<li>ORCIDs plus normal authors</li>
</ul>
</li>
<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new column called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li>
</ul>
<pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2016-10/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2016-09/">September, 2016</a></h2>
@ -332,33 +362,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2015-12/">December, 2015</a></h2>
<p class="blog-post-meta"><time datetime="2015-12-02T13:18:00+03:00">Wed Dec 02, 2015</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/posts/page/5/" rel="prev" role="button">Previous page</a>
@ -383,6 +386,8 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -391,8 +396,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-09-29T14:58:35+03:00" />
<meta property="og:updated_time" content="2020-10-06T16:55:54+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
@ -28,7 +28,7 @@
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2020-09-02T15:35:54+03:00",
"dateModified": "2020-10-06T16:55:54+03:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
@ -95,6 +95,33 @@
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2015-12/">December, 2015</a></h2>
<p class="blog-post-meta"><time datetime="2015-12-02T13:18:00+03:00">Wed Dec 02, 2015</time> by Alan Orth in
<span class="fas fa-tag" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/tags/notes/" rel="tag">Notes</a>
</p>
</header>
<h2 id="2015-12-02">2015-12-02</h2>
<ul>
<li>Replace <code>lzop</code> with <code>xz</code> in log compression cron jobs on DSpace Test—it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2015-12/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2015-11/">November, 2015</a></h2>
@ -144,6 +171,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -152,8 +181,6 @@
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -4,6 +4,7 @@ User-agent: *
Disallow: /cgspace-notes/categories/
Disallow: /cgspace-notes/
Disallow: /cgspace-notes/categories/notes/
Disallow: /cgspace-notes/2020-10/
Disallow: /cgspace-notes/posts/
Disallow: /cgspace-notes/2020-09/
Disallow: /cgspace-notes/2020-08/

View File

@ -4,27 +4,32 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2020-09-29T14:58:35+03:00</lastmod>
<lastmod>2020-10-06T16:55:54+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2020-09-29T14:58:35+03:00</lastmod>
<lastmod>2020-10-06T16:55:54+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2020-09-29T14:58:35+03:00</lastmod>
<lastmod>2020-10-06T16:55:54+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2020-10/</loc>
<lastmod>2020-10-06T16:55:54+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2020-09-29T14:58:35+03:00</lastmod>
<lastmod>2020-10-06T16:55:54+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/2020-09/</loc>
<lastmod>2020-09-29T14:58:35+03:00</lastmod>
<lastmod>2020-10-01T10:47:40+03:00</lastmod>
</url>
<url>

View File

@ -121,6 +121,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -129,8 +131,6 @@
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -134,6 +134,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -142,8 +144,6 @@
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -384,6 +384,8 @@ DELETE 1
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -392,8 +394,6 @@ DELETE 1
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -370,6 +370,8 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -378,8 +380,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -179,6 +179,8 @@
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
@ -187,8 +189,6 @@
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
</ol>
</section>

View File

@ -1,515 +0,0 @@
<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="Tags" />
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/tags/" />
<meta property="og:updated_time" content="2020-04-13T15:30:24+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Tags"/>
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
<meta name="generator" content="Hugo 0.72.0" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Blog",
"headline": "CGSpace Notes",
"url" : "https://alanorth.github.io/cgspace-notes/tags/",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2019-10-28T13:27:35+02:00",
"keywords": "notes,""migration,""notes,",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/tags/">
<title>CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
<link rel="alternate" type="application/rss+xml" href="https://alanorth.github.io/cgspace-notes/tags/index.xml" title="CGSpace Notes" />
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-09/">September, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-09-01T10:17:51+03:00">Sun Sep 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-09-01">2019-09-01</h2>
<ul>
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E &quot;01/Sep/2019:0&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-09/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-08/">August, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-08-03T12:39:51+03:00">Sat Aug 03, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-08-03">2019-08-03</h2>
<ul>
<li>Look at Bioversity&rsquo;s latest migration CSV and now I see that Francesco has cleaned up the extra columns and the newline at the end of the file, but many of the column headers have an extra space in the name&hellip;</li>
</ul>
<h2 id="2019-08-04">2019-08-04</h2>
<ul>
<li>Deploy ORCID identifier updates requested by Bioversity to CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it
<ul>
<li>Before updating it I checked Solr and verified that all statistics cores were loaded properly&hellip;</li>
<li>After rebooting, all statistics cores were loaded&hellip; wow, that&rsquo;s lucky.</li>
</ul>
</li>
<li>Run system updates on DSpace Test (linode19) and reboot it</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-08/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-07/">July, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-07-01T12:13:51+03:00">Mon Jul 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-07-01">2019-07-01</h2>
<ul>
<li>Create an &ldquo;AfricaRice books and book chapters&rdquo; collection on CGSpace for AfricaRice</li>
<li>Last month Sisay asked why the following &ldquo;most popular&rdquo; statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
<ul>
<li><a href="https://dspacetest.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">DSpace Test</a></li>
<li><a href="https://cgspace.cgiar.org/handle/10568/35697/most-popular/item#simplefilter=custom&amp;time_filter_end_date=01%2F12%2F2018">CGSpace</a></li>
</ul>
</li>
<li>Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-07/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-06/">June, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-06-02T10:57:51+03:00">Sun Jun 02, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-06-02">2019-06-02</h2>
<ul>
<li>Merge the <a href="https://github.com/ilri/DSpace/pull/425">Solr filterCache</a> and <a href="https://github.com/ilri/DSpace/pull/426">XMLUI ISI journal</a> changes to the <code>5_x-prod</code> branch and deploy on CGSpace</li>
<li>Run system updates on CGSpace (linode18) and reboot it</li>
</ul>
<h2 id="2019-06-03">2019-06-03</h2>
<ul>
<li>Skype with Marie-Angélique and Abenet about <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2</a></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-06/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-05/">May, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-05-01T07:37:43+03:00">Wed May 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-05-01">2019-05-01</h2>
<ul>
<li>Help CCAFS with regenerating some item thumbnails after they uploaded new PDFs to some items on CGSpace</li>
<li>A user on the dspace-tech mailing list offered some suggestions for troubleshooting the problem with the inability to delete certain items
<ul>
<li>Apparently if the item is in the <code>workflowitem</code> table it is submitted to a workflow</li>
<li>And if it is in the <code>workspaceitem</code> table it is in the pre-submitted state</li>
</ul>
</li>
<li>The item seems to be in a pre-submitted state, so I tried to delete it from there:</li>
</ul>
<pre><code>dspace=# DELETE FROM workspaceitem WHERE item_id=74648;
DELETE 1
</code></pre><ul>
<li>But after this I tried to delete the item from the XMLUI and it is <em>still</em> present&hellip;</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-05/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-04/">April, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-04-01T09:00:43+03:00">Mon Apr 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-04-01">2019-04-01</h2>
<ul>
<li>Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
<ul>
<li>They asked if we had plans to enable RDF support in CGSpace</li>
</ul>
</li>
<li>There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today
<ul>
<li>I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!</li>
</ul>
</li>
</ul>
<pre><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
4432 200
</code></pre><ul>
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
</ul>
<pre><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-03/">March, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-03-01T12:16:30+01:00">Fri Mar 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-03-01">2019-03-01</h2>
<ul>
<li>I checked IITA&rsquo;s 259 Feb 14 records from last month for duplicates using Atmire&rsquo;s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good</li>
<li>I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc&hellip;</li>
<li>Looking at the other half of Udana&rsquo;s WLE records from 2018-11
<ul>
<li>I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)</li>
<li>I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items</li>
<li>Most worryingly, there are encoding errors in the abstracts for eleven items, for example:</li>
<li>68.15% &#xFFFD; 9.45 instead of 68.15% ± 9.45</li>
<li>2003&#xFFFD;2013 instead of 2003–2013</li>
</ul>
</li>
<li>I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2019-03/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-02/">February, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-02-01T21:37:30+02:00">Fri Feb 01, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-02-01">2019-02-01</h2>
<ul>
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
<li>The top IPs before, during, and after this latest alert tonight were:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;01/Feb/2019:(17|18|19|20|21)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
245 207.46.13.5
332 54.70.40.11
385 5.143.231.38
405 207.46.13.173
405 207.46.13.75
1117 66.249.66.219
1121 35.237.175.180
1546 5.9.6.51
2474 45.5.186.2
5490 85.25.237.71
</code></pre><ul>
<li><code>85.25.237.71</code> is the &ldquo;Linguee Bot&rdquo; that I first saw last month</li>
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
<li>There were just over 3 million accesses in the nginx logs last month:</li>
</ul>
<pre><code># time zcat --force /var/log/nginx/* | grep -cE &quot;[0-9]{1,2}/Jan/2019&quot;
3018243
real 0m19.873s
user 0m22.203s
sys 0m1.979s
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-02/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2019-01/">January, 2019</a></h2>
<p class="blog-post-meta"><time datetime="2019-01-02T09:48:30+02:00">Wed Jan 02, 2019</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2019-01-02">2019-01-02</h2>
<ul>
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
<li>I don&rsquo;t see anything interesting in the web server logs around that time though:</li>
</ul>
<pre><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E &quot;02/Jan/2019:0(1|2|3)&quot; | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.4
99 210.7.29.100
120 38.126.157.45
177 35.237.175.180
177 40.77.167.32
216 66.249.75.219
225 18.203.76.93
261 46.101.86.248
357 207.46.13.1
903 54.70.40.11
</code></pre>
<a href='https://alanorth.github.io/cgspace-notes/2019-01/'>Read more →</a>
</article>
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2018-12/">December, 2018</a></h2>
<p class="blog-post-meta"><time datetime="2018-12-02T02:09:30+02:00">Sun Dec 02, 2018</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2018-12-01">2018-12-01</h2>
<ul>
<li>Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK</li>
<li>I manually installed OpenJDK, then removed Oracle JDK, then re-ran the <a href="http://github.com/ilri/rmg-ansible-public">Ansible playbook</a> to update all configuration files, etc</li>
<li>Then I ran all system updates and restarted the server</li>
</ul>
<h2 id="2018-12-02">2018-12-02</h2>
<ul>
<li>I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another <a href="https://usn.ubuntu.com/3831-1/">Ghostscript vulnerability last week</a></li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2018-12/'>Read more →</a>
</article>
<nav class="blog-pagination">
<a class="btn btn-outline-primary" href="/cgspace-notes/tags/" rel="prev" role="button">Previous page</a>
<a class="btn btn-outline-primary" href="/cgspace-notes/tags/page/3/" rel="next" role="button">Next page</a>
</nav>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
<li><a href="/cgspace-notes/2020-05/">May, 2020</a></li>
<li><a href="/cgspace-notes/2020-04/">April, 2020</a></li>
<li><a href="/cgspace-notes/2020-03/">March, 2020</a></li>
<li><a href="/cgspace-notes/2020-02/">February, 2020</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>

Some files were not shown because too many files have changed in this diff