cgspace-notes/content/posts/2018-03.md

---
title: "March, 2018"
date: 2018-03-02T16:07:54+02:00
author: "Alan Orth"
tags: ["Notes"]
---

## 2018-03-02

- Export a CSV of the IITA community metadata for Martin Mueller

<!--more-->

## 2018-03-06

- Add three new CCAFS project tags to `input-forms.xml` ([#357](https://github.com/ilri/DSpace/pull/357))
- Andrea from Macaroni Bros had sent me an email that CCAFS needs them
- Give Udana more feedback on his WLE records from last month
- There were some records using a non-breaking space in their AGROVOC subject field
- I checked and tested some author corrections from Peter from last week, and then applied them on CGSpace

```
$ ./fix-metadata-values.py -i Correct-309-authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3      
$ ./delete-metadata-values.py -i Delete-3-Authors-2018-03-06.csv -db dspace -u dspace-p 'fuuu' -f dc.contributor.author -m 3
```

- This time there were no errors in whitespace but I did have to correct one incorrectly encoded accent character
- Add new CRP subject "GRAIN LEGUMES AND DRYLAND CEREALS" to `input-forms.xml` ([#358](https://github.com/ilri/DSpace/pull/358))
- Merge the ORCID integration stuff in to `5_x-prod` for deployment on CGSpace soon ([#359](https://github.com/ilri/DSpace/pull/359))
- Deploy ORCID changes on CGSpace (linode18), run all system updates, and reboot the server
- Run all system updates on DSpace Test and reboot server
- I ran the [orcid-authority-to-item.py](https://gist.github.com/alanorth/24d8081a5dc25e2a4e27e548e7e2389c) script on CGSpace and mapped 2,864 ORCID identifiers from Solr to item metadata

```
$ ./orcid-authority-to-item.py -db dspace -u dspace -p 'fuuu' -s http://localhost:8081/solr -d
```

- I ran the DSpace cleanup script on CGSpace and it threw an error (as always):

```
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
  Detail: Key (bitstream_id)=(150659) is still referenced from table "bundle".
```

- The solution is, as always:
```
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (150659);'
UPDATE 1
```

- Apply the proposed PostgreSQL indexes from DS-3636 (pull request [#1791](https://github.com/DSpace/DSpace/pull/1791/) on CGSpace (linode18)

## 2018-03-07

- Add CIAT author Mauricio Efren Sotelo Cabrera to controlled vocabulary for ORCID identifiers ([#360](https://github.com/ilri/DSpace/pull/360))
- Help Sisay proof 200 IITA records on DSpace Test
- Finally import Udana's 24 items to [IWMI Journal Articles](https://cgspace.cgiar.org/handle/10568/36185) on CGSpace
- Skype with James Stapleton to discuss CGSpace, ILRI website, CKM staff issues, etc

## 2018-03-08

- Looking at a CSV dump of the CIAT community I see there are tons of stupid text languages people add for their metadata
- This makes the CSV have tons of columns, for example `dc.title`, `dc.title[]`, `dc.title[en]`, `dc.title[eng]`, `dc.title[en_US]` and so on!
- I think I can fix — or at least normalize — them in the database:

```
dspace=# select distinct text_lang from metadatavalue where resource_type_id=2;
 text_lang 
-----------
 
 ethnob
 en
 spa
 EN
 En
 en_
 en_US
 E.
 
 EN_US
 en_U
 eng
 fr
 es_ES
 es
(16 rows)

dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('en','EN','En','en_','EN_US','en_U','eng');
UPDATE 122227
dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
 text_lang
-----------

 ethnob
 en_US
 spa
 E.

 fr
 es_ES
 es
(9 rows)
```

- On second inspection it looks like `dc.description.provenance` fields use the text_lang "en" so that's probably why there are over 100,000 fields changed...
- If I skip that, there are about 2,000, which seems more reasonably like the amount of fields users have edited manually, or fucked up during CSV import, etc:

```
dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
UPDATE 2309
```

- I will apply this on CGSpace right now
- In other news, I was playing with adding ORCID identifiers to a dump of CIAT's community via CSV in OpenRefine
- Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the `cg.creator.id` field
- For example, a GREL expression in a custom text facet to get all items with `dc.contributor.author[en_US]` of a certain author with several name variations (this is how you use a logical OR in OpenRefine):

```
or(value.contains('Ceballos, Hern'), value.contains('Hernández Ceballos'))
```

- Then you can flag or star matching items and then use a conditional to either set the value directly or add it to an existing value:

```
if(isBlank(value), "Hernan Ceballos: 0000-0002-8744-7918", value + "||Hernan Ceballos: 0000-0002-8744-7918")
```

- One thing that bothers me is that this won't honor author order
- It might be better to do batches of these in PostgreSQL with a script that takes the `place` column of an author into account when setting the `cg.creator.id`
- I wrote a Python script to read the author names and ORCID identifiers from CSV and create matching `cg.creator.id` fields: [add-orcid-identifiers-csv.py ](https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050)
- The CSV should have two columns: author name and ORCID identifier:

```
dc.contributor.author,cg.creator.id
"Orth, Alan",Alan S. Orth: 0000-0002-1735-7458
"Orth, A.",Alan S. Orth: 0000-0002-1735-7458
```

- I didn't integrate the ORCID API lookup for author names in this script for now because I was only interested in "tagging" old items for a few given authors
- I added ORCID identifers for 187 items by CIAT's Hernan Ceballos, because that is what Elizabeth was trying to do manually!
- Also, I decided to add ORCID identifiers for all records from Peter, Abenet, and Sisay as well

## 2018-03-09

- Give James Stapleton input on Sisay's KRAs
- Create a pull request to disable ORCID authority integration for `dc.contributor.author` in the submission forms and XMLUI display ([#363](https://github.com/ilri/DSpace/pull/363))

## 2018-03-11

- Peter also wrote to say he is having issues with the Atmire Listings and Reports module
- When I logged in to try it I get a blank white page after continuing and I see this in dspace.log.2018-03-11:

```
2018-03-11 11:38:15,592 WARN  org.dspace.app.webui.servlet.InternalErrorServlet @ :session_id=91C2C0C59669B33A7683570F6010603A:internal_error:-- URL Was: https://cgspace.cgiar.or
g/jspui/listings-and-reports
-- Method: POST
-- Parameters were:
-- selected_admin_preset: "ilri authors2"
-- load: "normal"
-- next: "NEXT STEP >>"
-- step: "1"

org.apache.jasper.JasperException: java.lang.NullPointerException
```

- Looks like I needed to remove the Humidtropics subject from Listings and Reports because it was looking for the terms and couldn't find them
- I made a quick fix and it's working now ([#364](https://github.com/ilri/DSpace/pull/364))

## 2018-03-12

- Increase upload size on CGSpace's nginx config to 85MB so Sisay can upload some data

## 2018-03-13

- I created a new Linode server for DSpace Test (linode6623840) so I could try the block storage stuff, but when I went to add a 300GB volume it said that block storage capacity was exceeded in that datacenter (Newark, NJ)
- I deleted the Linode and created another one (linode6624164) in the Fremont, CA region
- After that I deployed the Ubuntu 16.04 image and attached a 300GB block storage volume to the image
- Magdalena wrote to ask why there was no Altmetric donut for an item on CGSpace, but there was one on the related CCAFS publication page
- It looks the the CCAFS publications page fetches the donut using its DOI, whereas CGSpace queries via Handle
- I will write to Altmetric support and ask them, as perhaps its part of a larger issue
- CGSpace item: https://cgspace.cgiar.org/handle/10568/89643
- CCAFS publication page: https://ccafs.cgiar.org/publications/can-scenario-planning-catalyse-transformational-change-evaluating-climate-change-policy
- Peter tweeted the Handle link and now Altmetric shows the donut for both the DOI and the Handle

## 2018-03-14

- Help Abenet with a troublesome Listings and Report question for CIAT author Steve Beebe
- Continue migrating DSpace Test to the new server (linode6624164)
- I emailed ILRI service desk to update the DNS records for dspacetest.cgiar.org
- Abenet was having problems saving Listings and Reports configurations or layouts but I tested it and it works

## 2018-03-15

- Help Abenet troubleshoot the Listings and Reports issue again
- It looks like it's an issue with the layouts, if you create a new layout that only has one type (`dc.identifier.citation`):

![Listing and Reports layout](/cgspace-notes/2018/03/layout-only-citation.png)

- The error in the DSpace log is:

```
org.apache.jasper.JasperException: java.lang.ArrayIndexOutOfBoundsException: -1
```

- The full error is here: https://gist.github.com/alanorth/ea47c092725960e39610db9b0c13f6ca
- If I do a report for "Orth, Alan" with the same custom layout it works!
- I submitted a ticket to Atmire: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589
- Small fix to the example citation text in Listings and Reports ([#365](https://github.com/ilri/DSpace/pull/365))

## 2018-03-16

- ICT made the DNS updates for dspacetest.cgiar.org late last night
- I have removed the old server (linode02 aka linode578611) in favor of linode19 aka linode6624164
- Looking at the CRP subjects on CGSpace I see there is one blank one so I'll just fix it:

```
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=230 and text_value='';
```

- Copy all CRP subjects to a CSV to do the mass updates:

```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=230 group by text_value order by count desc) to /tmp/crps.csv with csv header;
COPY 21
```

- Once I prepare the new input forms ([#362](https://github.com/ilri/DSpace/issues/362)) I will need to do the batch corrections:

```
$ ./fix-metadata-values.py -i Correct-21-CRPs-2018-03-16.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.crp -t correct -m 230 -n -d
```

- Create a pull request to update the input forms for the new CRP subject style ([#366](https://github.com/ilri/DSpace/pull/366))

## 2018-03-19

- Tezira has been having problems accessing CGSpace from the ILRI Nairobi campus since last week
- She is getting an HTTPS error apparently
- It's working outside, and Ethiopian users seem to be having no issues so I've asked ICT to have a look
- CGSpace crashed this morning for about seven minutes and Dani restarted Tomcat
- Around that time there were an increase of SQL errors:

```
2018-03-19 09:10:54,856 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
...
2018-03-19 09:10:54,862 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL query singleTable Error -
```

- But these errors, I don't even know what they mean, because a handful of them happen every day:

```
$ grep -c 'ERROR org.dspace.storage.rdbms.DatabaseManager' dspace.log.2018-03-1*
dspace.log.2018-03-10:13
dspace.log.2018-03-11:15
dspace.log.2018-03-12:13
dspace.log.2018-03-13:13
dspace.log.2018-03-14:14
dspace.log.2018-03-15:13
dspace.log.2018-03-16:13
dspace.log.2018-03-17:13
dspace.log.2018-03-18:15
dspace.log.2018-03-19:90
```

- There wasn't even a lot of traffic at the time (8–9 AM):

```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Mar/2018:0[89]:" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
     92 40.77.167.197
     92 83.103.94.48
     96 40.77.167.175
    116 207.46.13.178
    122 66.249.66.153
    140 95.108.181.88
    196 213.55.99.121
    206 197.210.168.174
    207 104.196.152.243
    294 54.198.169.202
```

- Well there is a hint in Tomcat's `catalina.out`:

```
Mon Mar 19 09:05:28 UTC 2018 | Query:id: 92032 AND type:2
Exception in thread "http-bio-127.0.0.1-8081-exec-280" java.lang.OutOfMemoryError: Java heap space
```

- So someone was doing something heavy somehow... my guess is content and usage stats!
- ICT responded that they "fixed" the CGSpace connectivity issue in Nairobi without telling me the problem
- When I asked, Robert Okal said CGNET messed up when updating the DNS for cgspace.cgiar.org last week
- I told him that my request last week was for dspacetest.cgiar.org, not cgspace.cgiar.org!
- So they updated the wrong fucking DNS records
- Magdalena from CCAFS wrote to ask about one record that has a bunch of metadata missing in her Listings and Reports export
- It appears to be this one: https://cgspace.cgiar.org/handle/10568/83473?show=full
- The title is "Untitled" and there is some metadata but indeed the citation is missing
- I don't know what would cause that

## 2018-03-20

- DSpace Test has been down for a few hours with SQL and memory errors starting this morning:

```
2018-03-20 08:47:10,177 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
...
2018-03-20 08:53:11,624 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space
```

- I have no idea why it crashed
- I ran all system updates and rebooted it
- Abenet told me that one of Lance Robinson's ORCID iDs on CGSpace is incorrect
- I will remove it from the controlled vocabulary ([#367](https://github.com/ilri/DSpace/pull/367)) and update any items using the old one:

```
dspace=# update metadatavalue set text_value='Lance W. Robinson: 0000-0002-5224-8644' where resource_type_id=2 and metadata_field_id=240 and text_value like '%0000-0002-6344-195X%';
UPDATE 1
```

- Communicate with DSpace editors on Yammer about being more careful about spaces and character editing when doing manual metadata edits
- Merge the changes to CRP names to the `5_x-prod` branch and deploy on CGSpace ([#363](https://github.com/ilri/DSpace/pull/363))
- Run corrections for CRP names in the database:

```
$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db dspace -u dspace -p 'fuuu'
```

- Run all system updates on CGSpace (linode18) and reboot the server
- I started a full Discovery re-index on CGSpace because of the updated CRPs
- I see this error in the DSpace log:

```
2018-03-20 19:03:14,844 ERROR com.atmire.dspace.discovery.AtmireSolrService @ No choices plugin was configured for  field "dc_contributor_author".
java.lang.IllegalArgumentException: No choices plugin was configured for  field "dc_contributor_author".
        at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:261)
        at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:249)
        at org.dspace.browse.SolrBrowseCreateDAO.additionalIndex(SolrBrowseCreateDAO.java:215)
        at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:662)
        at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:807)
        at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:876)
        at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
        at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
```

- I have to figure that one out...

## 2018-03-21

- Looks like the indexing gets confused that there is still data in the `authority` column
- Unfortunately this causes those items to simply not be indexed, which users noticed because item counts were cut in half and old items showed up in RSS!
- Since we've migrated the ORCID identifiers associated with the authority data to the `cg.creator.id` field we can nullify the authorities remaining in the database:

```sql
dspace=# UPDATE metadatavalue SET authority=NULL WHERE resource_type_id=2 AND metadata_field_id=3 AND authority IS NOT NULL;
UPDATE 195463
```

- After this the indexing works as usual and item counts and facets are back to normal
- Send Peter a list of all authors to correct:

```sql
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element
= 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv header;
COPY 56156
```

- Afterwards we'll want to do some batch tagging of ORCID identifiers to these names
- CGSpace crashed again this afternoon, I'm not sure of the cause but there are a lot of SQL errors in the DSpace log:

```
2018-03-21 15:11:08,166 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error - 
java.sql.SQLException: Connection has already been closed.
```

- I have no idea why so many connections were abandoned this afternoon:

```
# grep 'Mar 21, 2018' /var/log/tomcat7/catalina.out | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon'
268
```

- DSpace Test crashed again due to Java heap space, this is from the DSpace log:

```
2018-03-21 15:18:48,149 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space
```

- And this is from the Tomcat Catalina log:

```
Mar 21, 2018 11:20:00 AM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor run
SEVERE: Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]]
java.lang.OutOfMemoryError: Java heap space
```

- But there are tons of heap space errors on DSpace Test actually:

```
# grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
319
```

- I guess we need to give it more RAM because it now has CGSpace's large Solr core
- I will increase the memory from 3072m to 4096m
- Update [Ansible playbooks](https://github.com/ilri/rmg-ansible-public) to use [PostgreSQL JBDC driver](https://jdbc.postgresql.org/) 42.2.2
- Deploy the new JDBC driver on DSpace Test
- I'm also curious to see how long the `dspace index-discovery -b` takes on DSpace Test where the DSpace installation directory is on one of Linode's new block storage volumes

```
$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b

real    208m19.155s
user    8m39.138s
sys     2m45.135s
```

- So that's about three times as long as it took on CGSpace this morning
- I should also check the raw read speed with `hdparm -tT /dev/sdc`
- Looking at Peter's author corrections there are some mistakes due to Windows 1252 encoding
- I need to find a way to filter these easily with OpenRefine
- For example, Peter has inadvertantly introduced Unicode character 0xfffd into several fields
- I can search for Unicode values by their hex code in OpenRefine using the following GREL expression:

```
isNotNull(value.match(/.*\ufffd.*/))
```

- I need to be able to add many common characters though so that it is useful to copy and paste into a new project to find issues

## 2018-03-22

- Add ORCID identifier for Silvia Alonso
- Update my Mirage 2 setup notes for Ubuntu 18.04: https://gist.github.com/alanorth/9bfd29feb7d2e836a9d417633319b3f5

## 2018-03-24

- More work on the Ubuntu 18.04 readiness stuff for the [Ansible playbooks](https://github.com/ilri/rmg-ansible-public)
- The playbook now uses the system's Ruby and Node.js so I don't have to manually install RVM and NVM after

## 2018-03-25

- Looking at Peter's author corrections and trying to work out a way to find errors in OpenRefine easily
- I can find all names that have acceptable characters using a GREL expression like:

```
isNotNull(value.match(/.*[a-zA-ZáÁéèïíñØøöóúü].*/))
```

- But it's probably better to just say which characters I know for sure are not valid (like parentheses, pipe, or weird Unicode characters):

```
or(
  isNotNull(value.match(/.*[(|)].*/)),
  isNotNull(value.match(/.*\uFFFD.*/)),
  isNotNull(value.match(/.*\u00A0.*/)),
  isNotNull(value.match(/.*\u200A.*/))
)
```

- And here's one combined GREL expression to check for items marked as to delete or check so I can flag them and export them to a separate CSV (though perhaps it's time to add delete support to my `fix-metadata-values.py` script:

```
or(
  isNotNull(value.match(/.*delete.*/i)),
  isNotNull(value.match(/.*remove.*/i)),
  isNotNull(value.match(/.*check.*/i))
)
```

- So I guess the routine is in OpenRefine is:
  - Transform: trim leading/trailing whitespace
  - Transform: collapse consecutive whitespace
  - Custom text facet for items to delete/check
  - Custom text facet for illegal characters

- Test the corrections and deletions locally, then run them on CGSpace:

```
$ ./fix-metadata-values.py -i /tmp/Correct-2928-Authors-2018-03-21.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
$ ./delete-metadata-values.py -i /tmp/Delete-8-Authors-2018-03-21.csv -f dc.contributor.author -m 3 -db dspacetest -u dspace -p 'fuuu'
```

- Afterwards I started a full Discovery reindexing on both CGSpace and DSpace Test
- CGSpace took 76m28.292s
- DSpace Test took 194m56.048s

## 2018-03-26

- Atmire got back to me about the [Listings and Reports issue](https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589) and said it's caused by items that have missing `dc.identifier.citation` fields
- The will send a fix

## 2018-03-27

- Atmire got back with an updated quote about the DSpace 5.8 compatibility so I've forwarded it to Peter

## 2018-03-28

- DSpace Test crashed due to heap space so I've increased it from 4096m to 5120m
- The error in Tomcat's `catalina.out` was:

```
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: Java heap space
```

- Add ISI Journal (cg.isijournal) as an option in Atmire's Listing and Reports layout ([#370](https://github.com/ilri/DSpace/pull/370)) for Abenet
- I noticed a few hundred CRPs using the old capitalized formatting so I corrected them:

```
$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db cgspace -u cgspace -p 'fuuu'
Fixed 29 occurences of: CLIMATE CHANGE, AGRICULTURE AND FOOD SECURITY
Fixed 7 occurences of: WATER, LAND AND ECOSYSTEMS
Fixed 19 occurences of: AGRICULTURE FOR NUTRITION AND HEALTH
Fixed 100 occurences of: ROOTS, TUBERS AND BANANAS
Fixed 31 occurences of: HUMIDTROPICS
Fixed 21 occurences of: MAIZE
Fixed 11 occurences of: POLICIES, INSTITUTIONS, AND MARKETS
Fixed 28 occurences of: GRAIN LEGUMES
Fixed 3 occurences of: FORESTS, TREES AND AGROFORESTRY
Fixed 5 occurences of: GENEBANKS
```

- That's weird because we just updated them last week...
- Create a pull request to enable searching by ORCID identifier (`cg.creator.id`) in Discovery and Listings and Reports ([#371](https://github.com/ilri/DSpace/pull/371))
- I will test it on DSpace Test first!
- Fix one missing XMLUI string for "Access Status" (cg.identifier.status)
- Run all system updates on DSpace Test and reboot the machine
-												Add notes for 2018-03-02

											
										
										
											2018-03-02 15:09:18 +01:00
+								---
 								title: "March, 2018"
 								date: 2018-03-02T16:07:54+02:00
 								author: "Alan Orth"
 								tags: ["Notes"]
 								---
-												Update notes

											
										
										
											2018-03-06 09:17:14 +01:00
+								## 2018-03-02
-												Add notes for 2018-03-02

											
										
										
											2018-03-02 15:09:18 +01:00
 								- Export a CSV of the IITA community metadata for Martin Mueller
 								<!--more-->
-												Update notes

											
										
										
											2018-03-06 09:17:14 +01:00
 								## 2018-03-06
 								- Add three new CCAFS project tags to `input-forms.xml` ([#357](https://github.com/ilri/DSpace/pull/357))
 								- Andrea from Macaroni Bros had sent me an email that CCAFS needs them
-												Update notes for 2018-03-06

											
										
										
											2018-03-06 11:25:26 +01:00
+								- Give Udana more feedback on his WLE records from last month
 								- There were some records using a non-breaking space in their AGROVOC subject field
 								- I checked and tested some author corrections from Peter from last week, and then applied them on CGSpace
 								```
 								$ ./fix-metadata-values.py -i Correct-309-authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
 								$ ./delete-metadata-values.py -i Delete-3-Authors-2018-03-06.csv -db dspace -u dspace-p 'fuuu' -f dc.contributor.author -m 3
 								```
 								- This time there were no errors in whitespace but I did have to correct one incorrectly encoded accent character
-												Update notes for 2018-03-06

											
										
										
											2018-03-06 13:45:51 +01:00
+								- Add new CRP subject "GRAIN LEGUMES AND DRYLAND CEREALS" to `input-forms.xml` ([#358](https://github.com/ilri/DSpace/pull/358))
-												Update notes for 2018-03-06

											
										
										
											2018-03-06 20:48:10 +01:00
+								- Merge the ORCID integration stuff in to `5_x-prod` for deployment on CGSpace soon ([#359](https://github.com/ilri/DSpace/pull/359))
 								- Deploy ORCID changes on CGSpace (linode18), run all system updates, and reboot the server
 								- Run all system updates on DSpace Test and reboot server
 								- I ran the [orcid-authority-to-item.py](https://gist.github.com/alanorth/24d8081a5dc25e2a4e27e548e7e2389c) script on CGSpace and mapped 2,864 ORCID identifiers from Solr to item metadata
 								```
 								$ ./orcid-authority-to-item.py -db dspace -u dspace -p 'fuuu' -s http://localhost:8081/solr -d
 								```
 								- I ran the DSpace cleanup script on CGSpace and it threw an error (as always):
 								```
 								Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
 								  Detail: Key (bitstream_id)=(150659) is still referenced from table "bundle".
 								```
 								- The solution is, as always:
 								```
 								$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (150659);'
 								UPDATE 1
 								```
-												Update notes

											
										
										
											2018-03-06 23:24:03 +01:00
 								- Apply the proposed PostgreSQL indexes from DS-3636 (pull request [#1791](https://github.com/DSpace/DSpace/pull/1791/) on CGSpace (linode18)
-												Update notes for 2018-03-07

											
										
										
											2018-03-07 11:29:24 +01:00
 								## 2018-03-07
 								- Add CIAT author Mauricio Efren Sotelo Cabrera to controlled vocabulary for ORCID identifiers ([#360](https://github.com/ilri/DSpace/pull/360))
 								- Help Sisay proof 200 IITA records on DSpace Test
 								- Finally import Udana's 24 items to [IWMI Journal Articles](https://cgspace.cgiar.org/handle/10568/36185) on CGSpace
-												Update notes

											
										
										
											2018-03-08 14:05:29 +01:00
+								- Skype with James Stapleton to discuss CGSpace, ILRI website, CKM staff issues, etc
-												Update

											
										
										
											2018-03-08 16:32:38 +01:00
 								## 2018-03-08
 								- Looking at a CSV dump of the CIAT community I see there are tons of stupid text languages people add for their metadata
 								- This makes the CSV have tons of columns, for example `dc.title`, `dc.title[]`, `dc.title[en]`, `dc.title[eng]`, `dc.title[en_US]` and so on!
-												Update notes for 2018-03-20

											
										
										
											2018-03-20 19:31:54 +01:00
+								- I think I can fix — or at least normalize — them in the database:
-												Update

											
										
										
											2018-03-08 16:32:38 +01:00
 								```
 								dspace=# select distinct text_lang from metadatavalue where resource_type_id=2;
 								 text_lang
 								-----------
 								 ethnob
 								 en
 								 spa
 								 EN
 								 En
 								 en_
 								 en_US
 								 E.
 								 EN_US
 								 en_U
 								 eng
 								 fr
 								 es_ES
 								 es
 								(16 rows)
 								dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('en','EN','En','en_','EN_US','en_U','eng');
 								UPDATE 122227
 								dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
 								 text_lang
 								-----------
 								 ethnob
 								 en_US
 								 spa
 								 E.
 								 fr
 								 es_ES
 								 es
 								(9 rows)
 								```
-												Update notes

											
										
										
											2018-03-08 21:47:12 +01:00
+								- On second inspection it looks like `dc.description.provenance` fields use the text_lang "en" so that's probably why there are over 100,000 fields changed...
 								- If I skip that, there are about 2,000, which seems more reasonably like the amount of fields users have edited manually, or fucked up during CSV import, etc:
 								```
 								dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
 								UPDATE 2309
 								```
 								- I will apply this on CGSpace right now
-												Update

											
										
										
											2018-03-08 16:32:38 +01:00
+								- In other news, I was playing with adding ORCID identifiers to a dump of CIAT's community via CSV in OpenRefine
 								- Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the `cg.creator.id` field
 								- For example, a GREL expression in a custom text facet to get all items with `dc.contributor.author[en_US]` of a certain author with several name variations (this is how you use a logical OR in OpenRefine):
 								```
 								or(value.contains('Ceballos, Hern'), value.contains('Hernández Ceballos'))
 								```
 								- Then you can flag or star matching items and then use a conditional to either set the value directly or add it to an existing value:
 								```
 								if(isBlank(value), "Hernan Ceballos: 0000-0002-8744-7918", value + "||Hernan Ceballos: 0000-0002-8744-7918")
 								```
 								- One thing that bothers me is that this won't honor author order
 								- It might be better to do batches of these in PostgreSQL with a script that takes the `place` column of an author into account when setting the `cg.creator.id`
-												Add notes for 2018-04-04

											
										
										
											2018-04-04 14:57:34 +02:00
+								- I wrote a Python script to read the author names and ORCID identifiers from CSV and create matching `cg.creator.id` fields: [add-orcid-identifiers-csv.py ](https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050)
-												Add notes for 2018-03-08

											
										
										
											2018-03-08 20:10:16 +01:00
+								- The CSV should have two columns: author name and ORCID identifier:
 								```
 								dc.contributor.author,cg.creator.id
 								"Orth, Alan",Alan S. Orth: 0000-0002-1735-7458
 								"Orth, A.",Alan S. Orth: 0000-0002-1735-7458
 								```
 								- I didn't integrate the ORCID API lookup for author names in this script for now because I was only interested in "tagging" old items for a few given authors
-												Update

											
										
										
											2018-03-08 20:20:39 +01:00
+								- I added ORCID identifers for 187 items by CIAT's Hernan Ceballos, because that is what Elizabeth was trying to do manually!
-												Update notes

											
										
										
											2018-03-08 20:29:37 +01:00
+								- Also, I decided to add ORCID identifiers for all records from Peter, Abenet, and Sisay as well
-												Add notes for 2018-03-09

											
										
										
											2018-03-09 21:16:20 +01:00
 								## 2018-03-09
 								- Give James Stapleton input on Sisay's KRAs
 								- Create a pull request to disable ORCID authority integration for `dc.contributor.author` in the submission forms and XMLUI display ([#363](https://github.com/ilri/DSpace/pull/363))
-												Add notes for 2018-03-11

											
										
										
											2018-03-11 12:56:20 +01:00
 								## 2018-03-11
 								- Peter also wrote to say he is having issues with the Atmire Listings and Reports module
 								- When I logged in to try it I get a blank white page after continuing and I see this in dspace.log.2018-03-11:
 								```
 -03-11 11:38:15,592 WARN  org.dspace.app.webui.servlet.InternalErrorServlet @ :session_id=91C2C0C59669B33A7683570F6010603A:internal_error:-- URL Was: https://cgspace.cgiar.or
 								g/jspui/listings-and-reports
 								-- Method: POST
 								-- Parameters were:
 								-- selected_admin_preset: "ilri authors2"
 								-- load: "normal"
 								-- next: "NEXT STEP >>"
 								-- step: "1"
 								org.apache.jasper.JasperException: java.lang.NullPointerException
 								```
 								- Looks like I needed to remove the Humidtropics subject from Listings and Reports because it was looking for the terms and couldn't find them
 								- I made a quick fix and it's working now ([#364](https://github.com/ilri/DSpace/pull/364))
-												Add notes for 2018-03-12

											
										
										
											2018-03-12 20:53:42 +01:00
 								## 2018-03-12
 								- Increase upload size on CGSpace's nginx config to 85MB so Sisay can upload some data
-												Update notes for 2018-03-13

											
										
										
											2018-03-13 18:21:40 +01:00
 								## 2018-03-13
 								- I created a new Linode server for DSpace Test (linode6623840) so I could try the block storage stuff, but when I went to add a 300GB volume it said that block storage capacity was exceeded in that datacenter (Newark, NJ)
 								- I deleted the Linode and created another one (linode6624164) in the Fremont, CA region
 								- After that I deployed the Ubuntu 16.04 image and attached a 300GB block storage volume to the image
 								- Magdalena wrote to ask why there was no Altmetric donut for an item on CGSpace, but there was one on the related CCAFS publication page
 								- It looks the the CCAFS publications page fetches the donut using its DOI, whereas CGSpace queries via Handle
 								- I will write to Altmetric support and ask them, as perhaps its part of a larger issue
 								- CGSpace item: https://cgspace.cgiar.org/handle/10568/89643
 								- CCAFS publication page: https://ccafs.cgiar.org/publications/can-scenario-planning-catalyse-transformational-change-evaluating-climate-change-policy
-												Update notes

											
										
										
											2018-03-14 21:33:09 +01:00
+								- Peter tweeted the Handle link and now Altmetric shows the donut for both the DOI and the Handle
 								## 2018-03-14
 								- Help Abenet with a troublesome Listings and Report question for CIAT author Steve Beebe
 								- Continue migrating DSpace Test to the new server (linode6624164)
 								- I emailed ILRI service desk to update the DNS records for dspacetest.cgiar.org
-												Update

											
										
										
											2018-03-14 21:45:37 +01:00
+								- Abenet was having problems saving Listings and Reports configurations or layouts but I tested it and it works
-												Add notes for 2018-03-15

											
										
										
											2018-03-15 20:41:38 +01:00
 								## 2018-03-15
 								- Help Abenet troubleshoot the Listings and Reports issue again
 								- It looks like it's an issue with the layouts, if you create a new layout that only has one type (`dc.identifier.citation`):
 								![Listing and Reports layout](/cgspace-notes/2018/03/layout-only-citation.png)
 								- The error in the DSpace log is:
 								```
 								org.apache.jasper.JasperException: java.lang.ArrayIndexOutOfBoundsException: -1
 								```
-												Update notes for 2018-03-15

											
										
										
											2018-03-15 20:44:01 +01:00
+								- The full error is here: https://gist.github.com/alanorth/ea47c092725960e39610db9b0c13f6ca
-												Add notes for 2018-03-15

											
										
										
											2018-03-15 20:41:38 +01:00
+								- If I do a report for "Orth, Alan" with the same custom layout it works!
 								- I submitted a ticket to Atmire: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589
-												Update notes for 2018-03-15

											
										
										
											2018-03-15 20:49:36 +01:00
+								- Small fix to the example citation text in Listings and Reports ([#365](https://github.com/ilri/DSpace/pull/365))
-												Update notes for 2018-03-16

											
										
										
											2018-03-16 15:48:39 +01:00
 								## 2018-03-16
 								- ICT made the DNS updates for dspacetest.cgiar.org late last night
 								- I have removed the old server (linode02 aka linode578611) in favor of linode19 aka linode6624164
 								- Looking at the CRP subjects on CGSpace I see there is one blank one so I'll just fix it:
 								```
 								dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=230 and text_value='';
 								```
 								- Copy all CRP subjects to a CSV to do the mass updates:
 								```
 								dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=230 group by text_value order by count desc) to /tmp/crps.csv with csv header;
 								COPY 21
 								```
 								- Once I prepare the new input forms ([#362](https://github.com/ilri/DSpace/issues/362)) I will need to do the batch corrections:
 								```
-												Update notes for 2018-03-16

											
										
										
											2018-03-16 17:44:32 +01:00
+								$ ./fix-metadata-values.py -i Correct-21-CRPs-2018-03-16.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.crp -t correct -m 230 -n -d
-												Update notes for 2018-03-16

											
										
										
											2018-03-16 15:48:39 +01:00
+								```
-												Update notes for 2018-03-16

											
										
										
											2018-03-16 17:58:36 +01:00
 								- Create a pull request to update the input forms for the new CRP subject style ([#366](https://github.com/ilri/DSpace/pull/366))
-												Add notes for 2018-03-19

											
										
										
											2018-03-19 07:30:32 +01:00
 								## 2018-03-19
 								- Tezira has been having problems accessing CGSpace from the ILRI Nairobi campus since last week
 								- She is getting an HTTPS error apparently
 								- It's working outside, and Ethiopian users seem to be having no issues so I've asked ICT to have a look
-												Update notes for 2018-03-19

											
										
										
											2018-03-19 16:56:41 +01:00
+								- CGSpace crashed this morning for about seven minutes and Dani restarted Tomcat
 								- Around that time there were an increase of SQL errors:
 								```
 -03-19 09:10:54,856 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
 								...
 -03-19 09:10:54,862 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL query singleTable Error -
 								```
 								- But these errors, I don't even know what they mean, because a handful of them happen every day:
 								```
 								$ grep -c 'ERROR org.dspace.storage.rdbms.DatabaseManager' dspace.log.2018-03-1*
 								dspace.log.2018-03-10:13
 								dspace.log.2018-03-11:15
 								dspace.log.2018-03-12:13
 								dspace.log.2018-03-13:13
 								dspace.log.2018-03-14:14
 								dspace.log.2018-03-15:13
 								dspace.log.2018-03-16:13
 								dspace.log.2018-03-17:13
 								dspace.log.2018-03-18:15
 								dspace.log.2018-03-19:90
 								```
 								- There wasn't even a lot of traffic at the time (8–9 AM):
 								```
 								# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Mar/2018:0[89]:" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
 40.77.167.197
 83.103.94.48
 40.77.167.175
 207.46.13.178
 66.249.66.153
 95.108.181.88
 213.55.99.121
 197.210.168.174
 104.196.152.243
 54.198.169.202
 								```
-												Update notes for 2018-03-19

											
										
										
											2018-03-19 17:18:28 +01:00
+								- Well there is a hint in Tomcat's `catalina.out`:
 								```
 								Mon Mar 19 09:05:28 UTC 2018 | Query:id: 92032 AND type:2
 								Exception in thread "http-bio-127.0.0.1-8081-exec-280" java.lang.OutOfMemoryError: Java heap space
 								```
 								- So someone was doing something heavy somehow... my guess is content and usage stats!
-												Update notes for 2018-03-19

											
										
										
											2018-03-19 16:56:41 +01:00
+								- ICT responded that they "fixed" the CGSpace connectivity issue in Nairobi without telling me the problem
 								- When I asked, Robert Okal said CGNET messed up when updating the DNS for cgspace.cgiar.org last week
 								- I told him that my request last week was for dspacetest.cgiar.org, not cgspace.cgiar.org!
 								- So they updated the wrong fucking DNS records
 								- Magdalena from CCAFS wrote to ask about one record that has a bunch of metadata missing in her Listings and Reports export
 								- It appears to be this one: https://cgspace.cgiar.org/handle/10568/83473?show=full
 								- The title is "Untitled" and there is some metadata but indeed the citation is missing
 								- I don't know what would cause that
-												Update notes for 2018-03-20

											
										
										
											2018-03-20 16:37:20 +01:00
 								## 2018-03-20
 								- DSpace Test has been down for a few hours with SQL and memory errors starting this morning:
 								```
 -03-20 08:47:10,177 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
 								...
 -03-20 08:53:11,624 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
 								org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space
 								```
 								- I have no idea why it crashed
 								- I ran all system updates and rebooted it
 								- Abenet told me that one of Lance Robinson's ORCID iDs on CGSpace is incorrect
 								- I will remove it from the controlled vocabulary ([#367](https://github.com/ilri/DSpace/pull/367)) and update any items using the old one:
 								```
 								dspace=# update metadatavalue set text_value='Lance W. Robinson: 0000-0002-5224-8644' where resource_type_id=2 and metadata_field_id=240 and text_value like '%0000-0002-6344-195X%';
 								UPDATE 1
 								```
 								- Communicate with DSpace editors on Yammer about being more careful about spaces and character editing when doing manual metadata edits
-												Update notes for 2018-03-20

											
										
										
											2018-03-20 19:31:54 +01:00
+								- Merge the changes to CRP names to the `5_x-prod` branch and deploy on CGSpace ([#363](https://github.com/ilri/DSpace/pull/363))
 								- Run corrections for CRP names in the database:
 								```
 								$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db dspace -u dspace -p 'fuuu'
 								```
 								- Run all system updates on CGSpace (linode18) and reboot the server
 								- I started a full Discovery re-index on CGSpace because of the updated CRPs
-												Update notes for 2018-03-20

											
										
										
											2018-03-20 20:04:10 +01:00
+								- I see this error in the DSpace log:
 								```
 -03-20 19:03:14,844 ERROR com.atmire.dspace.discovery.AtmireSolrService @ No choices plugin was configured for  field "dc_contributor_author".
 								java.lang.IllegalArgumentException: No choices plugin was configured for  field "dc_contributor_author".
 								        at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:261)
 								        at org.dspace.content.authority.ChoiceAuthorityManager.getLabel(ChoiceAuthorityManager.java:249)
 								        at org.dspace.browse.SolrBrowseCreateDAO.additionalIndex(SolrBrowseCreateDAO.java:215)
 								        at com.atmire.dspace.discovery.AtmireSolrService.buildDocument(AtmireSolrService.java:662)
 								        at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:807)
 								        at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:876)
 								        at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
 								        at org.dspace.discovery.IndexClient.main(IndexClient.java:117)
 								        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 								        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 								        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 								        at java.lang.reflect.Method.invoke(Method.java:498)
 								        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
 								        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
 								```
 								- I have to figure that one out...
-												Add notes for 2018-03-21

											
										
										
											2018-03-21 10:44:06 +01:00
 								## 2018-03-21
 								- Looks like the indexing gets confused that there is still data in the `authority` column
 								- Unfortunately this causes those items to simply not be indexed, which users noticed because item counts were cut in half and old items showed up in RSS!
 								- Since we've migrated the ORCID identifiers associated with the authority data to the `cg.creator.id` field we can nullify the authorities remaining in the database:
 								```sql
 								dspace=# UPDATE metadatavalue SET authority=NULL WHERE resource_type_id=2 AND metadata_field_id=3 AND authority IS NOT NULL;
 								UPDATE 195463
 								```
 								- After this the indexing works as usual and item counts and facets are back to normal
 								- Send Peter a list of all authors to correct:
 								```sql
 								dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element
 								= 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv header;
 								COPY 56156
 								```
 								- Afterwards we'll want to do some batch tagging of ORCID identifiers to these names
-												Update notes for 2018-03-21

											
										
										
											2018-03-21 17:11:22 +01:00
+								- CGSpace crashed again this afternoon, I'm not sure of the cause but there are a lot of SQL errors in the DSpace log:
 								```
 -03-21 15:11:08,166 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
 								java.sql.SQLException: Connection has already been closed.
 								```
 								- I have no idea why so many connections were abandoned this afternoon:
 								```
 								# grep 'Mar 21, 2018' /var/log/tomcat7/catalina.out | grep -c 'org.apache.tomcat.jdbc.pool.ConnectionPool abandon'
 
 								```
 								- DSpace Test crashed again due to Java heap space, this is from the DSpace log:
 								```
 -03-21 15:18:48,149 ERROR org.dspace.app.xmlui.cocoon.DSpaceCocoonServletFilter @ Serious Error Occurred Processing Request!
 								org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Java heap space
 								```
 								- And this is from the Tomcat Catalina log:
 								```
 								Mar 21, 2018 11:20:00 AM org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor run
 								SEVERE: Unexpected death of background thread ContainerBackgroundProcessor[StandardEngine[Catalina]]
 								java.lang.OutOfMemoryError: Java heap space
 								```
 								- But there are tons of heap space errors on DSpace Test actually:
 								```
 								# grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
 
 								```
 								- I guess we need to give it more RAM because it now has CGSpace's large Solr core
 								- I will increase the memory from 3072m to 4096m
 								- Update [Ansible playbooks](https://github.com/ilri/rmg-ansible-public) to use [PostgreSQL JBDC driver](https://jdbc.postgresql.org/) 42.2.2
 								- Deploy the new JDBC driver on DSpace Test
 								- I'm also curious to see how long the `dspace index-discovery -b` takes on DSpace Test where the DSpace installation directory is on one of Linode's new block storage volumes
-												Update notes for 2018-03-21

											
										
										
											2018-03-22 00:04:49 +01:00
 								```
 								$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
 								real    208m19.155s
 								user    8m39.138s
 								sys     2m45.135s
 								```
 								- So that's about three times as long as it took on CGSpace this morning
-												Update notes for 2018-03-21

											
										
										
											2018-03-21 17:11:22 +01:00
+								- I should also check the raw read speed with `hdparm -tT /dev/sdc`
-												Update notes for 2018-03-21

											
										
										
											2018-03-22 00:04:49 +01:00
+								- Looking at Peter's author corrections there are some mistakes due to Windows 1252 encoding
 								- I need to find a way to filter these easily with OpenRefine
 								- For example, Peter has inadvertantly introduced Unicode character 0xfffd into several fields
 								- I can search for Unicode values by their hex code in OpenRefine using the following GREL expression:
 								```
 								isNotNull(value.match(/.*\ufffd.*/))
 								```
 								- I need to be able to add many common characters though so that it is useful to copy and paste into a new project to find issues
-												Add notes for 2018-03-22

											
										
										
											2018-03-22 09:45:56 +01:00
 								## 2018-03-22
 								- Add ORCID identifier for Silvia Alonso
-												Update notes for 2018-03-22

											
										
										
											2018-03-22 22:07:03 +01:00
+								- Update my Mirage 2 setup notes for Ubuntu 18.04: https://gist.github.com/alanorth/9bfd29feb7d2e836a9d417633319b3f5
-												Update notes for 2018-03-24

											
										
										
											2018-03-24 21:03:00 +01:00
 								## 2018-03-24
 								- More work on the Ubuntu 18.04 readiness stuff for the [Ansible playbooks](https://github.com/ilri/rmg-ansible-public)
 								- The playbook now uses the system's Ruby and Node.js so I don't have to manually install RVM and NVM after
-												Add notes for 2018-03-25

											
										
										
											2018-03-25 21:46:48 +02:00
 								## 2018-03-25
 								- Looking at Peter's author corrections and trying to work out a way to find errors in OpenRefine easily
 								- I can find all names that have acceptable characters using a GREL expression like:
 								```
 								isNotNull(value.match(/.*[a-zA-ZáÁéèïíñØøöóúü].*/))
 								```
 								- But it's probably better to just say which characters I know for sure are not valid (like parentheses, pipe, or weird Unicode characters):
 								```
 								or(
 								  isNotNull(value.match(/.*[(|)].*/)),
 								  isNotNull(value.match(/.*\uFFFD.*/)),
 								  isNotNull(value.match(/.*\u00A0.*/)),
 								  isNotNull(value.match(/.*\u200A.*/))
 								)
 								```
 								- And here's one combined GREL expression to check for items marked as to delete or check so I can flag them and export them to a separate CSV (though perhaps it's time to add delete support to my `fix-metadata-values.py` script:
 								```
 								or(
 								  isNotNull(value.match(/.*delete.*/i)),
 								  isNotNull(value.match(/.*remove.*/i)),
 								  isNotNull(value.match(/.*check.*/i))
 								)
 								```
 								- So I guess the routine is in OpenRefine is:
 								  - Transform: trim leading/trailing whitespace
 								  - Transform: collapse consecutive whitespace
 								  - Custom text facet for items to delete/check
 								  - Custom text facet for illegal characters
 								- Test the corrections and deletions locally, then run them on CGSpace:
 								```
 								$ ./fix-metadata-values.py -i /tmp/Correct-2928-Authors-2018-03-21.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
 								$ ./delete-metadata-values.py -i /tmp/Delete-8-Authors-2018-03-21.csv -f dc.contributor.author -m 3 -db dspacetest -u dspace -p 'fuuu'
 								```
-												Update notes for 2018-03-25

											
										
										
											2018-03-26 09:11:38 +02:00
+								- Afterwards I started a full Discovery reindexing on both CGSpace and DSpace Test
 								- CGSpace took 76m28.292s
 								- DSpace Test took 194m56.048s
-												Add notes

											
										
										
											2018-03-28 08:48:08 +02:00
 								## 2018-03-26
 								- Atmire got back to me about the [Listings and Reports issue](https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589) and said it's caused by items that have missing `dc.identifier.citation` fields
 								- The will send a fix
 								## 2018-03-27
 								- Atmire got back with an updated quote about the DSpace 5.8 compatibility so I've forwarded it to Peter
 								## 2018-03-28
 								- DSpace Test crashed due to heap space so I've increased it from 4096m to 5120m
-												Update notes for 2018-03-28

											
										
										
											2018-03-28 16:48:23 +02:00
+								- The error in Tomcat's `catalina.out` was:
 								```
 								Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: Java heap space
 								```
-												Update notes for 2018-03-28

											
										
										
											2018-03-28 11:20:58 +02:00
+								- Add ISI Journal (cg.isijournal) as an option in Atmire's Listing and Reports layout ([#370](https://github.com/ilri/DSpace/pull/370)) for Abenet
-												Update notes for 2018-03-28

											
										
										
											2018-03-28 16:48:23 +02:00
+								- I noticed a few hundred CRPs using the old capitalized formatting so I corrected them:
 								```
 								$ ./fix-metadata-values.py -i /tmp/Correct-21-CRPs-2018-03-16.csv -f cg.contributor.crp -t correct -m 230 -db cgspace -u cgspace -p 'fuuu'
 								Fixed 29 occurences of: CLIMATE CHANGE, AGRICULTURE AND FOOD SECURITY
 								Fixed 7 occurences of: WATER, LAND AND ECOSYSTEMS
 								Fixed 19 occurences of: AGRICULTURE FOR NUTRITION AND HEALTH
 								Fixed 100 occurences of: ROOTS, TUBERS AND BANANAS
 								Fixed 31 occurences of: HUMIDTROPICS
 								Fixed 21 occurences of: MAIZE
 								Fixed 11 occurences of: POLICIES, INSTITUTIONS, AND MARKETS
 								Fixed 28 occurences of: GRAIN LEGUMES
 								Fixed 3 occurences of: FORESTS, TREES AND AGROFORESTRY
 								Fixed 5 occurences of: GENEBANKS
 								```
 								- That's weird because we just updated them last week...
 								- Create a pull request to enable searching by ORCID identifier (`cg.creator.id`) in Discovery and Listings and Reports ([#371](https://github.com/ilri/DSpace/pull/371))
 								- I will test it on DSpace Test first!
 								- Fix one missing XMLUI string for "Access Status" (cg.identifier.status)
-												Update notes for 2018-03-28

											
										
										
											2018-03-28 21:01:13 +02:00
+								- Run all system updates on DSpace Test and reboot the machine