CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

September, 2018

2018-09-02

  • New PostgreSQL JDBC driver version 42.2.5
  • I’ll update the DSpace role in our Ansible infrastructure playbooks and run the updated playbooks on CGSpace and DSpace Test
  • Also, I’ll re-run the postgresql tasks because the custom PostgreSQL variables are calculated dynamically based on the system’s RAM, and we never re-ran them after migrating to larger Linodes last month
  • I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:

02-Sep-2018 11:18:52.678 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener]
 java.lang.RuntimeException: Failure during filter init: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/dspacetest.cgiar.org/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#4c5d5a2' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#4c5d5a2': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
    at org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener.contextInitialized(DSpaceKernelServletContextListener.java:92)
    at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4776)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5240)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:754)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:730)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:629)
    at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1838)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/dspacetest.cgiar.org/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#4c5d5a2' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#4c5d5a2': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations:
  • Full log here: https://gist.github.com/alanorth/1e4ae567b853fea9d9dbf1a030ecd8c2
  • XMLUI fails to load, but REST, Solr, JSPUI, etc. all work
  • The old 5_x-prod-dspace-5.5 branch does work in Ubuntu 18.04 with Tomcat 8.5.30-1ubuntu1.4, however!
  • And the 5_x-prod DSpace 5.8 branch does work in Tomcat 8.5.x on my Arch Linux laptop…
  • I’m not sure where the issue is then!

2018-09-03

  • Abenet says she has been getting three emails with periodic statistics reports every day since the DSpace 5.8 upgrade last week
  • They are from the CUA module
  • Two of them have “no data” and one has a “null” title
  • The last one is a report of the top downloaded items, and includes a graph
  • She will try to click the “Unsubscribe” link in the first two to see if it works, otherwise we should contact Atmire
  • The only one she remembers subscribing to is the top downloads one

2018-09-04

  • I’m looking over the latest round of IITA records from Sisay: Mercy1806_August_29
    • All fields are split with multiple columns like cg.authorship.types and cg.authorship.types[]
    • This makes it super annoying to do the checks and cleanup, so I will merge them (also time consuming)
    • Five items had dc.date.issued values like 2013-5 so I corrected them to be 2013-05
    • Several metadata fields had values with newlines in them (even in some titles!), which I fixed by collapsing the consecutive whitespace in OpenRefine
    • Many (91!) items from before 2011 are indicated as having a CRP, but CRPs didn’t exist then so this is impossible
    • I used a custom facet with this GREL on the dc.date.issued column to identify the items from 2011 onwards: isNotNull(value.match(/201[1-8].*/)), and then blanked the CRPs on the pre-2011 items
    • Some affiliations with only one separator (|) for multiple values
    • I replaced smart quotes with plain ASCII ones
    • Some inconsistencies in cg.subject.iita like COWPEA and COWPEAS, and YAM and YAMS, etc, as well as some spelling mistakes like IMPACT ASSESSMENTN
    • Some values in the dc.identifier.isbn are actually ISSNs so I moved them to the dc.identifier.issn column
    • I found one invalid ISSN using a custom text facet with the regex from the ISSN page on Wikipedia: isNotBlank(value.match(/^\d{4}-\d{3}[\dxX]$/)) (a command-line version of this check is sketched after this list)
    • One invalid value for dc.type
  • Abenet says she hasn’t received any more subscription emails from the CUA module since she unsubscribed yesterday, so I think we don’t need to create an issue on Atmire’s bug tracker anymore
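  • As mentioned above, here is a rough command-line version of the ISSN check, assuming the records are exported as a CSV and csvkit is available (the .csv filename and the use of csvcut are my assumptions — I actually did the check with a custom text facet in OpenRefine):
$ csvcut -c dc.identifier.issn Mercy1806_August_29.csv | tail -n +2 | grep -v '^$' | grep -Ev '^[0-9]{4}-[0-9]{3}[0-9xX]$'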

2018-09-10

  • Playing with strest to test the DSpace REST API programmatically
  • For example, given this test.yaml:
version: 1

requests:
  test:
    method: GET
    url: https://dspacetest.cgiar.org/rest/test
    validate:
      raw: "REST api is running."

  login:
    url: https://dspacetest.cgiar.org/rest/login
    method: POST
    data:
      json: {"email":"test@dspace","password":"thepass"}

  status:
    url: https://dspacetest.cgiar.org/rest/status
    method: GET
    headers:
      rest-dspace-token: Value(login)

  logout:
    url: https://dspacetest.cgiar.org/rest/logout
    method: POST
    headers:
      rest-dspace-token: Value(login)

# vim: set sw=2 ts=2:
  • Works pretty well, though the DSpace logout always returns an HTTP 415 error for some reason
  • We could eventually use this to test the sanity of the API for creating collections, etc.
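  • As a sanity check, the same login → status → logout flow can be approximated with plain curl (a rough sketch using the same placeholder credentials as above; DSpace 5’s /rest/login returns the token as plain text):
$ token=$(curl -s -X POST -H 'Content-Type: application/json' -d '{"email":"test@dspace","password":"thepass"}' https://dspacetest.cgiar.org/rest/login)
$ curl -s -H "rest-dspace-token: $token" https://dspacetest.cgiar.org/rest/status
$ curl -s -X POST -H "rest-dspace-token: $token" https://dspacetest.cgiar.org/rest/logout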
  • A user is getting an error in her workflow:
2018-09-10 07:26:35,551 ERROR org.dspace.submit.step.CompleteStep @ Caught exception in submission step: 
org.dspace.authorize.AuthorizeException: Authorization denied for action WORKFLOW_STEP_1 on COLLECTION:2 by user 3819
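  • I also ran community-filiator to set community 10568/97114 as the parent of three existing communities: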
$ dspace community-filiator --set -p 10568/97114 -c 10568/51670
$ dspace community-filiator --set -p 10568/97114 -c 10568/35409
$ dspace community-filiator --set -p 10568/97114 -c 10568/3112
  • Valerio contacted me to point out some issues with metadata on CGSpace, which I corrected in PostgreSQL:
update metadatavalue set text_value='ISI Journal' where resource_type_id=2 and metadata_field_id=226 and text_value='ISI Juornal';
UPDATE 1
update metadatavalue set text_value='ISI Journal' where resource_type_id=2 and metadata_field_id=226 and text_value='ISI journal';
UPDATE 23
update metadatavalue set text_value='ISI Journal' where resource_type_id=2 and metadata_field_id=226 and text_value='YES';
UPDATE 1
delete from metadatavalue where resource_type_id=2 and metadata_field_id=226 and text_value='NO';
DELETE 17
update metadatavalue set text_value='ISI Journal' where resource_type_id=2 and metadata_field_id=226 and text_value='ISI';
UPDATE 15
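  • Note that metadata changed directly in PostgreSQL like this doesn’t show up in Discovery search and facets until a reindex, something like:
$ dspace index-discovery -b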
  • Start working on adding metadata for access and usage rights that we started earlier in 2018 (and also in 2017)
  • The current cg.identifier.status field will become “Access rights” and dc.rights will become “Usage rights”
  • I have some work in progress on the 5_x-rights branch
  • Linode said that CGSpace (linode18) had a high CPU load earlier today
  • When I looked, I saw that it’s the same Russian IP that I noticed last month:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Sep/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
   1459 157.55.39.202
   1579 95.108.181.88
   1615 157.55.39.147
   1714 66.249.64.91
   1924 50.116.102.77
   3696 157.55.39.106
   3763 157.55.39.148
   4470 70.32.83.92
   4724 35.237.175.180
  14132 5.9.6.51
  • And this bot is still creating more Tomcat sessions than Nginx requests (WTF?):
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-09-10 
14133
  • The user agent is still the same:
Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
  • I added .*crawl.* to the Tomcat Crawler Session Manager Valve, so I’m not sure why the bot is creating so many sessions…
  • I just tested that user agent on CGSpace and it does not create a new session:
$ http --print Hh https://cgspace.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)'
GET / HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: cgspace.cgiar.org
User-Agent: Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)

HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Language: en-US
Content-Type: text/html;charset=utf-8
Date: Mon, 10 Sep 2018 20:43:04 GMT
Server: nginx
Strict-Transport-Security: max-age=15768000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Cocoon-Version: 2.2.0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
  • I will have to keep an eye on it and perhaps add it to the list of “bad bots” that get rate limited
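  • For reference, the Crawler Session Manager Valve lives in Tomcat’s server.xml; ours should look roughly like this (a sketch — the exact crawlerUserAgents regex in our config may differ):
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*|.*crawl.*" />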

2018-09-12

  • Merge AReS explorer changes to nginx config and deploy on CGSpace so CodeObia can start testing more
  • Re-create my local Docker container for PostgreSQL data, but using a volume for the database data:
$ sudo docker volume create --name dspacetest_data
$ sudo docker run --name dspacedb -v dspacetest_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
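  • After that the DSpace user and database can be created in the container with the normal PostgreSQL client tools, something like (the dspacetest names are from my local setup):
$ createuser -h localhost -U postgres --pwprompt dspacetest
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest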
  • Sisay is still having problems with the controlled vocabulary for top authors
  • I took a look at the submission template and Firefox complains that the XML file is missing a root element
  • I guess it’s because Firefox is receiving an empty XML file
  • I told Sisay to run the XML file through tidy (see an example invocation after this list)
  • More testing of the access and usage rights changes
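  • For reference, a typical tidy invocation for cleaning up an XML file like that is something along these lines (the filename is just a placeholder):
$ tidy -xml -utf8 -indent -quiet -modify some-vocabulary.xml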

2018-09-13

  • Peter was communicating with Altmetric about the OAI mapping issue for item 10568/82810 again
  • Altmetric said it was somehow related to the OAI dateStamp not getting updated when the mappings changed, but I pointed out that when this happened back in 2018-07 it was because OAI was simply not reflecting all of the item’s mappings
  • After forcing a complete re-indexing of OAI the mappings were fine
  • The dateStamp is most probably only updated when the item’s metadata changes, not its mappings, so if Altmetric is relying on that we’re in a tricky spot
  • We need to make sure that our OAI isn’t publicizing stale data… I was going to post something on the dspace-tech mailing list, but never did
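  • For reference, a complete OAI re-index can be forced by clearing the OAI cache and re-importing, something like:
$ dspace oai import -c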
  • Linode says that CGSpace (linode18) has had high CPU for the past two hours
  • The top IP addresses today are:
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E "13/Sep/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10                                                                                                
     32 46.229.161.131
     38 104.198.9.108
     39 66.249.64.91
     56 157.55.39.224
     57 207.46.13.49
     58 40.77.167.120
     78 169.255.105.46
    702 54.214.112.202
   1840 50.116.102.77
   4469 70.32.83.92
  • And the top two addresses seem to be re-using their Tomcat sessions properly:
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=70.32.83.92' dspace.log.2018-09-13 | sort | uniq
7
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-13 | sort | uniq
2
  • So I’m not sure what’s going on
  • Valerio asked me if there’s a way to get the page views and downloads from CGSpace
  • I said no, but that we might be able to piggyback on the Atmire statlet REST API
  • For example, when you expand the “statlet” at the bottom of an item like 10568/97103 you can see the following request in the browser console:
https://cgspace.cgiar.org/rest/statlets?handle=10568/97103&_=1536844046540
  • That JSON response has the total page views and item downloads for the item…
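  • So we could probably fetch those numbers directly with something like this (untested — the trailing _ parameter just looks like a cache buster):
$ curl -s 'https://cgspace.cgiar.org/rest/statlets?handle=10568/97103'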