cgspace-notes/content/posts/2018-09.md

218 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
title: "September, 2018"
date: 2018-09-02T09:55:54+03:00
author: "Alan Orth"
tags: ["Notes"]
---
## 2018-09-02
- New [PostgreSQL JDBC driver version 42.2.5](https://jdbc.postgresql.org/documentation/changelog.html#version_42.2.5)
- I'll update the DSpace role in our [Ansible infrastructure playbooks](https://github.com/ilri/rmg-ansible-public) and run the updated playbooks on CGSpace and DSpace Test
- Also, I'll re-run the `postgresql` tasks because the custom PostgreSQL variables are dynamic according to the system's RAM, and we never re-ran them after migrating to larger Linodes last month
- I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I'm getting those autowire errors in Tomcat 8.5.30 again:
<!--more-->
```
02-Sep-2018 11:18:52.678 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener]
java.lang.RuntimeException: Failure during filter init: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/dspacetest.cgiar.org/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#4c5d5a2' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#4c5d5a2': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
at org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener.contextInitialized(DSpaceKernelServletContextListener.java:92)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4776)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5240)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:754)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:730)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:629)
at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1838)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/dspacetest.cgiar.org/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#4c5d5a2' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#4c5d5a2': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations:
```
- Full log here: https://gist.github.com/alanorth/1e4ae567b853fea9d9dbf1a030ecd8c2
- XMLUI fails to load, but the REST, SOLR, JSPUI, etc work
- The old `5_x-prod-dspace-5.5` branch does work in Ubuntu 18.04 with Tomcat 8.5.30-1ubuntu1.4, however!
- And the `5_x-prod` DSpace 5.8 branch does work in Tomcat 8.5.x on my Arch Linux laptop...
- I'm not sure where the issue is then!
## 2018-09-03
- Abenet says she's getting three emails about periodic statistics reports every day since the DSpace 5.8 upgrade last week
- They are from the CUA module
- Two of them have "no data" and one has a "null" title
- The last one is a report of the top downloaded items, and includes a graph
- She will try to click the "Unsubscribe" link in the first two to see if it works, otherwise we should contact Atmire
- The only one she remembers subscribing to is the top downloads one
## 2018-09-04
- I'm looking over the latest round of IITA records from Sisay: [Mercy1806_August_29](https://dspacetest.cgiar.org/handle/10568/104230)
- All fields are split with multiple columns like `cg.authorship.types` and `cg.authorship.types[]`
- This makes it super annoying to do the checks and cleanup, so I will merge them (also time consuming)
- Five items had `dc.date.issued` values like `2013-5` so I corrected them to be `2013-05`
- Several metadata fields had values with newlines in them (even in some titles!), which I fixed by trimming the consecutive whitespaces in Open Refine
- Many (91!) items from before 2011 are indicated as having a CRP, but CRPs didn't exist then so this is impossible
- I got all items that were from 2011 and onwards using a custom facet with this GREL on the `dc.date.issued` column: `isNotNull(value.match(/201[1-8].*/))` and then blanking their CRPs
- Some affiliations with only one separator (|) for multiple values
- I replaced smart quotes like `` with plain ones
- Some inconsistencies in `cg.subject.iita` like COWPEA and COWPEAS, and YAM and YAMS, etc, as well as some spelling mistakes like IMPACT ASSESSMENTN
- Some values in the `dc.identifier.isbn` are actually ISSNs so I moved them to the `dc.identifier.issn` column
- I found one invalid ISSN using a custom text facet with the regex from the [ISSN page on Wikipedia](https://en.wikipedia.org/wiki/International_Standard_Serial_Number#Code_format): `isNotBlank(value.match(/^\d{4}-\d{3}[\dxX]$/))`
- One invalid value for `dc.type`
- Abenet says she hasn't received any more subscription emails from the CUA module since she unsubscribed yesterday, so I think we don't need create an issue on Atmire's bug tracker anymore
## 2018-09-10
- Playing with [strest](https://github.com/eykhagen/strest) to test the DSpace REST API programatically
- For example, given this `test.yaml`:
```
version: 1
requests:
test:
method: GET
url: https://dspacetest.cgiar.org/rest/test
validate:
raw: "REST api is running."
login:
url: https://dspacetest.cgiar.org/rest/login
method: POST
data:
json: {"email":"test@dspace","password":"thepass"}
status:
url: https://dspacetest.cgiar.org/rest/status
method: GET
headers:
rest-dspace-token: Value(login)
logout:
url: https://dspacetest.cgiar.org/rest/logout
method: POST
headers:
rest-dspace-token: Value(login)
# vim: set sw=2 ts=2:
```
- Works pretty well, though the DSpace `logout` always returns an HTTP 415 error for some reason
- We could eventually use this to test sanity of the API for creating collections etc
- A user is getting an error in her workflow:
```
2018-09-10 07:26:35,551 ERROR org.dspace.submit.step.CompleteStep @ Caught exception in submission step:
org.dspace.authorize.AuthorizeException: Authorization denied for action WORKFLOW_STEP_1 on COLLECTION:2 by user 3819
```
- Seems to be during submit step, because it's workflow step 1...?
- Move some top-level CRP communities to be below the new [CGIAR Research Programs and Platforms](https://cgspace.cgiar.org/handle/10568/97114) community:
```
$ dspace community-filiator --set -p 10568/97114 -c 10568/51670
$ dspace community-filiator --set -p 10568/97114 -c 10568/35409
$ dspace community-filiator --set -p 10568/97114 -c 10568/3112
```
- Valerio contacted me to point out some issues with metadata on CGSpace, which I corrected in PostgreSQL:
```
update metadatavalue set text_value='ISI Journal' where resource_type_id=2 and metadata_field_id=226 and text_value='ISI Juornal';
UPDATE 1
update metadatavalue set text_value='ISI Journal' where resource_type_id=2 and metadata_field_id=226 and text_value='ISI journal';
UPDATE 23
update metadatavalue set text_value='ISI Journal' where resource_type_id=2 and metadata_field_id=226 and text_value='YES';
UPDATE 1
delete from metadatavalue where resource_type_id=2 and metadata_field_id=226 and text_value='NO';
DELETE 17
update metadatavalue set text_value='ISI Journal' where resource_type_id=2 and metadata_field_id=226 and text_value='ISI';
UPDATE 15
```
- Start working on adding metadata for access and usage rights that we started earlier in 2018 (and also in 2017)
- The current `cg.identifier.status` field will become "Access rights" and `dc.rights` will become "Usage rights"
- I have some work in progress on the [`5_x-rights` branch](https://github.com/alanorth/DSpace/tree/5_x-rights)
- Linode said that CGSpace (linode18) had a high CPU load earlier today
- When I looked, I see it's the same Russian IP that I noticed last month:
```
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Sep/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1459 157.55.39.202
1579 95.108.181.88
1615 157.55.39.147
1714 66.249.64.91
1924 50.116.102.77
3696 157.55.39.106
3763 157.55.39.148
4470 70.32.83.92
4724 35.237.175.180
14132 5.9.6.51
```
- And this bot is still creating more Tomcat sessions than Nginx requests (WTF?):
```
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-09-10
14133
```
- The user agent is still the same:
```
Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
```
- I added `.*crawl.*` to the Tomcat Session Crawler Manager Valve, so I'm not sure why the bot is creating so many sessions...
- I just tested that user agent on CGSpace and it *does not* create a new session:
```
$ http --print Hh https://cgspace.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)'
GET / HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: cgspace.cgiar.org
User-Agent: Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Language: en-US
Content-Type: text/html;charset=utf-8
Date: Mon, 10 Sep 2018 20:43:04 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
```
- I will have to keep an eye on it and perhaps add it to the list of "bad bots" that get rate limited
## 2018-09-12
- Merge AReS explorer changes to nginx config and deploy on CGSpace so CodeObia can start testing more
- Re-create my local Docker container for PostgreSQL data, but using a volume for the database data:
```
$ sudo docker volume create --name dspacetest_data
$ sudo docker run --name dspacedb -v dspacetest_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
```
- Sisay is still having problems with the controlled vocabulary for top authors
- I took a look at the submission template and Firefox complains that the XML file is missing a root element
- I guess it's because Firefox is receiving an empty XML file
- I told Sisay to run the XML file through tidy
- More testing of the access and usage rights changes
<!-- vim: set sw=2 ts=2: -->