---
title: "March, 2018"
date: 2018-03-02T16:07:54+02:00
author: "Alan Orth"
tags: ["Notes"]
---

## 2018-03-02

- Export a CSV of the IITA community metadata for Martin Mueller

<!--more-->

## 2018-03-06

- Add three new CCAFS project tags to `input-forms.xml` ([#357](https://github.com/ilri/DSpace/pull/357))
- Andrea from Macaroni Bros had sent me an email that CCAFS needs them
- Give Udana more feedback on his WLE records from last month
- There were some records using a non-breaking space in their AGROVOC subject field
- I checked and tested some author corrections from Peter from last week, and then applied them on CGSpace

```
$ ./fix-metadata-values.py -i Correct-309-authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3
$ ./delete-metadata-values.py -i Delete-3-Authors-2018-03-06.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3
```

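
The whitespace and encoding problems mentioned in this entry (a non-breaking space in a subject field, a badly encoded accent in an author name) can be caught before a corrections file is applied. The following is only a minimal sketch of such a pre-flight check, not part of the scripts used above; the function names and the sample values are hypothetical.

```python
# Characters that render like (or as nothing next to) a normal space.
# This list is illustrative, not exhaustive.
SUSPECT_CHARS = {
    "\u00a0": "no-break space",
    "\u200b": "zero width space",
}


def find_whitespace_issues(value):
    """Return a list of whitespace problems found in one metadata value."""
    issues = [name for char, name in SUSPECT_CHARS.items() if char in value]
    if value != value.strip():
        issues.append("leading or trailing whitespace")
    if "  " in value:
        issues.append("double space")
    return issues


def normalize_whitespace(value):
    """Replace suspect characters and collapse runs of whitespace."""
    value = value.replace("\u00a0", " ").replace("\u200b", "")
    return " ".join(value.split())


def looks_like_mojibake(value):
    """Heuristic: True if the text looks like UTF-8 that was decoded as Latin-1."""
    try:
        repaired = value.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either not representable in Latin-1 or not valid UTF-8 afterwards,
        # so this particular kind of damage is unlikely.
        return False
    return repaired != value


if __name__ == "__main__":
    for sample in ["SOIL\u00a0FERTILITY", "Hern\u00c3\u00a1ndez Ceballos"]:
        print(repr(sample), find_whitespace_issues(sample), looks_like_mojibake(sample))
```

The mojibake check is deliberately narrow: it only recognizes one common failure mode, so a value that passes it may still be wrongly encoded in some other way.
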
- This time there were no errors in whitespace but I did have to correct one incorrectly encoded accent character
- Add new CRP subject "GRAIN LEGUMES AND DRYLAND CEREALS" to `input-forms.xml` ([#358](https://github.com/ilri/DSpace/pull/358))
- Merge the ORCID integration stuff into `5_x-prod` for deployment on CGSpace soon ([#359](https://github.com/ilri/DSpace/pull/359))
- Deploy ORCID changes on CGSpace (linode18), run all system updates, and reboot the server
- Run all system updates on DSpace Test and reboot server
- I ran the [orcid-authority-to-item.py](https://gist.github.com/alanorth/24d8081a5dc25e2a4e27e548e7e2389c) script on CGSpace and mapped 2,864 ORCID identifiers from Solr to item metadata

```
$ ./orcid-authority-to-item.py -db dspace -u dspace -p 'fuuu' -s http://localhost:8081/solr -d
```

- I ran the DSpace cleanup script on CGSpace and it threw an error (as always):

```
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(150659) is still referenced from table "bundle".
```

- The solution is, as always:
```
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (150659);'
UPDATE 1
```

- Apply the proposed PostgreSQL indexes from DS-3636 (pull request [#1791](https://github.com/DSpace/DSpace/pull/1791/)) on CGSpace (linode18)

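
Since the cleanup error and its fix recur, the step from error message to `UPDATE` statement can be sketched in a few lines. This is an illustration only, assuming the error text always has the shape shown above; the function names are hypothetical and the sketch only builds the SQL string, it does not run it.

```python
import re

# Matches the "Detail:" line of the foreign key error shown above
DETAIL_PATTERN = re.compile(
    r'Key \(bitstream_id\)=\((\d+)\) is still referenced from table "bundle"'
)


def bitstream_ids_from_error(error_text):
    """Extract the offending bitstream IDs from the cleanup error output."""
    return [int(match) for match in DETAIL_PATTERN.findall(error_text)]


def build_fix_sql(bitstream_ids):
    """Build the UPDATE statement that clears the primary bitstream reference."""
    if not bitstream_ids:
        raise ValueError("no bitstream IDs given")
    # The IDs are integers parsed by the regex, so joining them is safe here
    id_list = ", ".join(str(i) for i in sorted(set(bitstream_ids)))
    return (
        "update bundle set primary_bitstream_id=NULL "
        f"where primary_bitstream_id in ({id_list});"
    )


if __name__ == "__main__":
    error = (
        'Error: ERROR: update or delete on table "bitstream" violates foreign key '
        'constraint "bundle_primary_bitstream_id_fkey" on table "bundle"\n'
        'Detail: Key (bitstream_id)=(150659) is still referenced from table "bundle".'
    )
    print(build_fix_sql(bitstream_ids_from_error(error)))
```
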
## 2018-03-07

- Add CIAT author Mauricio Efren Sotelo Cabrera to controlled vocabulary for ORCID identifiers ([#360](https://github.com/ilri/DSpace/pull/360))
- Help Sisay proof 200 IITA records on DSpace Test
- Finally import Udana's 24 items to [IWMI Journal Articles](https://cgspace.cgiar.org/handle/10568/36185) on CGSpace
- Skype with James Stapleton to discuss CGSpace, ILRI website, CKM staff issues, etc

## 2018-03-08

- Looking at a CSV dump of the CIAT community I see there are tons of stupid text languages people add for their metadata
- This makes the CSV have tons of columns, for example `dc.title`, `dc.title[]`, `dc.title[en]`, `dc.title[eng]`, `dc.title[en_US]` and so on!
- I think I can fix, or at least normalize, them in the database:

```
dspace=# select distinct text_lang from metadatavalue where resource_type_id=2;
text_lang
-----------

ethnob
en
spa
EN
En
en_
en_US
E.

EN_US
en_U
eng
fr
es_ES
es
(16 rows)

dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('en','EN','En','en_','EN_US','en_U','eng');
UPDATE 122227
dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
text_lang
-----------

ethnob
en_US
spa
E.

fr
es_ES
es
(9 rows)
```

- On second inspection it looks like `dc.description.provenance` fields use the text_lang "en" so that's probably why there are over 100,000 fields changed...
- If I skip that, there are about 2,000, which seems more like a reasonable number of fields that users have edited manually, or fucked up during CSV import, etc:

```
dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
UPDATE 2309
```

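
The difference between the two `update` statements above is only whether plain `en` is folded into `en_US`. As a rough sketch of the same mapping outside the database (the function name is hypothetical, and which of the two blank rows is NULL and which is an empty string is an assumption):

```python
# English variants taken from the second, narrower update statement
ENGLISH_VARIANTS = {"EN", "En", "en_", "EN_US", "en_U", "eng"}


def normalize_text_lang(text_lang, include_plain_en=False):
    """Map known English variants to en_US and leave everything else alone."""
    if text_lang in ENGLISH_VARIANTS:
        return "en_US"
    if include_plain_en and text_lang == "en":
        return "en_US"
    return text_lang


if __name__ == "__main__":
    # The 16 distinct values from the first query (None and "" are assumed
    # to be the two blank rows)
    distinct_values = [
        None, "ethnob", "en", "spa", "EN", "En", "en_", "en_US",
        "E.", "", "EN_US", "en_U", "eng", "fr", "es_ES", "es",
    ]
    broad = {normalize_text_lang(v, include_plain_en=True) for v in distinct_values}
    narrow = {normalize_text_lang(v) for v in distinct_values}
    print(len(broad), len(narrow))
```

With plain `en` included, the 16 values collapse to the same 9 shown in the second query; leaving `en` alone keeps one more distinct value but avoids touching the provenance fields.
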
- I will apply this on CGSpace right now
- In other news, I was playing with adding ORCID identifiers to a dump of CIAT's community via CSV in OpenRefine
- Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the `cg.creator.id` field
- For example, a GREL expression in a custom text facet to get all items with `dc.contributor.author[en_US]` of a certain author with several name variations (this is how you use a logical OR in OpenRefine):

```
or(value.contains('Ceballos, Hern'), value.contains('Hernández Ceballos'))
```

- Then you can flag or star matching items and then use a conditional to either set the value directly or add it to an existing value:

```
if(isBlank(value), "Hernan Ceballos: 0000-0002-8744-7918", value + "||Hernan Ceballos: 0000-0002-8744-7918")
```

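
For readers who do not use OpenRefine, the two GREL expressions above translate roughly to the following Python. This is a sketch of the logic only; the helper names are hypothetical, and the blank check here (None, empty, or whitespace-only) may not match GREL's `isBlank` exactly.

```python
def matches_any_variant(value, variants):
    """Logical OR over substring checks, like or(value.contains(a), value.contains(b))."""
    return any(variant in value for variant in variants)


def append_multivalue(existing, new_value, separator="||"):
    """Set the value if the cell is blank, otherwise append it after the separator."""
    if existing is None or existing.strip() == "":
        return new_value
    return existing + separator + new_value


if __name__ == "__main__":
    variants = ["Ceballos, Hern", "Hernández Ceballos"]
    orcid = "Hernan Ceballos: 0000-0002-8744-7918"
    # A made-up author cell with two authors joined by the multi-value separator
    authors = "Ceballos, Hernán||Smith, J."
    if matches_any_variant(authors, variants):
        print(append_multivalue("", orcid))
```
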
- One thing that bothers me is that this won't honor author order
- It might be better to do batches of these in PostgreSQL with a script that takes the `place` column of an author into account when setting the `cg.creator.id`
- I wrote a Python script to read the author names and ORCID identifiers from CSV and create matching `cg.creator.id` fields: [add-orcid-identifiers-csv.py](https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050)
- The CSV should have two columns: author name and ORCID identifier:

```
dc.contributor.author,cg.creator.id
"Orth, Alan",Alan S. Orth: 0000-0002-1735-7458
"Orth, A.",Alan S. Orth: 0000-0002-1735-7458
```

- I didn't integrate the ORCID API lookup for author names in this script for now because I was only interested in "tagging" old items for a few given authors
- I added ORCID identifiers for 187 items by CIAT's Hernan Ceballos, because that is what Elizabeth was trying to do manually!
- Also, I decided to add ORCID identifiers for all records from Peter, Abenet, and Sisay as well

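
To illustrate the idea of honoring author order, here is a small sketch that reads the two-column CSV format described above and emits identifiers in the order given by each author's `place`. It is not the linked script and makes no claim about how that script works; the function names, the `(place, name)` input shape, and the author "Doe, Jane" are all made up for the example.

```python
import csv
import io


def load_orcid_map(csv_file):
    """Read the two-column CSV into a dict of author name -> cg.creator.id value."""
    reader = csv.DictReader(csv_file)
    return {row["dc.contributor.author"]: row["cg.creator.id"] for row in reader}


def creator_ids_in_author_order(authors, orcid_map):
    """Return (place, identifier) pairs, ordered by place, for known authors only.

    authors is a list of (place, name) tuples for a single item.
    """
    ordered = []
    for place, name in sorted(authors):
        identifier = orcid_map.get(name)
        if identifier is not None:
            ordered.append((place, identifier))
    return ordered


if __name__ == "__main__":
    sample_csv = io.StringIO(
        "dc.contributor.author,cg.creator.id\n"
        '"Orth, Alan",Alan S. Orth: 0000-0002-1735-7458\n'
        '"Orth, A.",Alan S. Orth: 0000-0002-1735-7458\n'
    )
    orcid_map = load_orcid_map(sample_csv)
    item_authors = [(2, "Orth, Alan"), (1, "Doe, Jane")]
    print(creator_ids_in_author_order(item_authors, orcid_map))
```

Authors without an entry in the CSV are skipped rather than given an empty identifier, which keeps the output usable even when only a few authors are being tagged.
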
## 2018-03-09

- Give James Stapleton input on Sisay's KRAs
- Create a pull request to disable ORCID authority integration for `dc.contributor.author` in the submission forms and XMLUI display ([#363](https://github.com/ilri/DSpace/pull/363))

## 2018-03-11

- Peter also wrote to say he is having issues with the Atmire Listings and Reports module
- When I logged in to try it I got a blank white page after continuing, and I saw this in dspace.log.2018-03-11:

```
2018-03-11 11:38:15,592 WARN org.dspace.app.webui.servlet.InternalErrorServlet @ :session_id=91C2C0C59669B33A7683570F6010603A:internal_error:-- URL Was: https://cgspace.cgiar.org/jspui/listings-and-reports
-- Method: POST
-- Parameters were:
-- selected_admin_preset: "ilri authors2"
-- load: "normal"
-- next: "NEXT STEP >>"
-- step: "1"

org.apache.jasper.JasperException: java.lang.NullPointerException
```

- Looks like I needed to remove the Humidtropics subject from Listings and Reports because it was looking for the terms and couldn't find them
- I made a quick fix and it's working now ([#364](https://github.com/ilri/DSpace/pull/364))

## 2018-03-12

- Increase upload size on CGSpace's nginx config to 85MB so Sisay can upload some data

## 2018-03-13

- I created a new Linode server for DSpace Test (linode6623840) so I could try the block storage stuff, but when I went to add a 300GB volume it said that block storage capacity was exceeded in that datacenter (Newark, NJ)
- I deleted the Linode and created another one (linode6624164) in the Fremont, CA region
- After that I deployed the Ubuntu 16.04 image and attached a 300GB block storage volume to the image
- Magdalena wrote to ask why there was no Altmetric donut for an item on CGSpace, but there was one on the related CCAFS publication page
- It looks like the CCAFS publications page fetches the donut using its DOI, whereas CGSpace queries via Handle
- I will write to Altmetric support and ask them, as perhaps it's part of a larger issue
- CGSpace item: https://cgspace.cgiar.org/handle/10568/89643
- CCAFS publication page: https://ccafs.cgiar.org/publications/can-scenario-planning-catalyse-transformational-change-evaluating-climate-change-policy
- Peter tweeted the Handle link and now Altmetric shows the donut for both the DOI and the Handle

## 2018-03-14

- Help Abenet with a troublesome Listings and Reports question for CIAT author Steve Beebe
- Continue migrating DSpace Test to the new server (linode6624164)
- I emailed ILRI service desk to update the DNS records for dspacetest.cgiar.org
- Abenet was having problems saving Listings and Reports configurations or layouts but I tested it and it works

## 2018-03-15

- Help Abenet troubleshoot the Listings and Reports issue again
- It looks like an issue with the layouts, which happens if you create a new layout that only has one type (`dc.identifier.citation`):

![Listings and Reports layout](/cgspace-notes/2018/03/layout-only-citation.png)

- The error in the DSpace log is:

```
org.apache.jasper.JasperException: java.lang.ArrayIndexOutOfBoundsException: -1
```

- The full error is here: https://gist.github.com/alanorth/ea47c092725960e39610db9b0c13f6ca
- If I do a report for "Orth, Alan" with the same custom layout it works!
- I submitted a ticket to Atmire: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589
- Small fix to the example citation text in Listings and Reports ([#365](https://github.com/ilri/DSpace/pull/365))

## 2018-03-16

- ICT made the DNS updates for dspacetest.cgiar.org late last night
- I have removed the old server (linode02 aka linode578611) in favor of linode19 aka linode6624164
- Looking at the CRP subjects on CGSpace I see there is one blank one so I'll just fix it:

```
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=230 and text_value='';
```

- Copy all CRP subjects to a CSV to do the mass updates:

```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=230 group by text_value order by count desc) to /tmp/crps.csv with csv header;
COPY 21
```

- Once I prepare the new input forms ([#362](https://github.com/ilri/DSpace/issues/362)) I will need to do the batch corrections:

```
$ ./fix-metadata-values.py -i Correct-21-CRPs-2018-03-16.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.crp -t correct -m 230 -n -d
```

- Create a pull request to update the input forms for the new CRP subject style ([#366](https://github.com/ilri/DSpace/pull/366))

## 2018-03-19

- Tezira has been having problems accessing CGSpace from the ILRI Nairobi campus since last week
- She is getting an HTTPS error apparently
- It's working outside, and Ethiopian users seem to be having no issues so I've asked ICT to have a look
- CGSpace crashed this morning for about seven minutes and Dani restarted Tomcat
- Around that time there was an increase in SQL errors:

```
2018-03-19 09:10:54,856 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
...
2018-03-19 09:10:54,862 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL query singleTable Error -
```

- But I don't even know what these errors mean, because a handful of them happen every day:

```
$ grep -c 'ERROR org.dspace.storage.rdbms.DatabaseManager' dspace.log.2018-03-1*
dspace.log.2018-03-10:13
dspace.log.2018-03-11:15
dspace.log.2018-03-12:13
dspace.log.2018-03-13:13
dspace.log.2018-03-14:14
dspace.log.2018-03-15:13
dspace.log.2018-03-16:13
dspace.log.2018-03-17:13
dspace.log.2018-03-18:15
dspace.log.2018-03-19:90
```

- There wasn't even a lot of traffic at the time (8–9 AM):

```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Mar/2018:0[89]:" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
92 40.77.167.197
92 83.103.94.48
96 40.77.167.175
116 207.46.13.178
122 66.249.66.153
140 95.108.181.88
196 213.55.99.121
206 197.210.168.174
207 104.196.152.243
294 54.198.169.202
```

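
The shell pipeline above filters log lines by hour, takes the first field (the client address), and counts occurrences. A rough Python equivalent is sketched below for comparison; it works on an iterable of already-decompressed lines, so it does not reproduce what `zcat --force` does, and the function name and sample log lines are made up.

```python
import re
from collections import Counter

# Same time window as the grep -E expression: 08:xx and 09:xx on 19 March 2018
HOUR_PATTERN = re.compile(r"19/Mar/2018:0[89]:")


def top_clients(log_lines, pattern=HOUR_PATTERN, limit=10):
    """Count requests per first field (client address) for lines matching pattern."""
    counter = Counter()
    for line in log_lines:
        if pattern.search(line):
            fields = line.split()
            if fields:
                counter[fields[0]] += 1
    # most_common() sorts from highest to lowest, the reverse of `sort -n | tail`
    return counter.most_common(limit)


if __name__ == "__main__":
    sample_lines = [
        '54.198.169.202 - - [19/Mar/2018:08:15:01 +0000] "GET / HTTP/1.1" 200 512',
        '54.198.169.202 - - [19/Mar/2018:09:02:44 +0000] "GET / HTTP/1.1" 200 512',
        '66.249.66.153 - - [19/Mar/2018:09:59:59 +0000] "GET / HTTP/1.1" 200 512',
        '66.249.66.153 - - [19/Mar/2018:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    ]
    for address, count in top_clients(sample_lines):
        print(count, address)
```
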
- ICT responded that they "fixed" the CGSpace connectivity issue in Nairobi without telling me the problem
- When I asked, Robert Okal said CGNET messed up when updating the DNS for cgspace.cgiar.org last week
- I told him that my request last week was for dspacetest.cgiar.org, not cgspace.cgiar.org!
- So they updated the wrong fucking DNS records
- Magdalena from CCAFS wrote to ask about one record that has a bunch of metadata missing in her Listings and Reports export
- It appears to be this one: https://cgspace.cgiar.org/handle/10568/83473?show=full
- The title is "Untitled" and there is some metadata but indeed the citation is missing
- I don't know what would cause that