- Peter noticed that there were still some old CRP names on CGSpace, because I hadn't forced the Discovery index to be updated after I fixed the others last week
- For completeness I re-ran the CRP corrections on CGSpace:
- Elizabeth from CIAT emailed to ask if I could help her by adding ORCID identifiers to all of Joseph Tohme's items
- I used my [add-orcid-identifiers-csv.py](https://gist.githubusercontent.com/alanorth/a49d85cd9c5dea89cddbe809813a7050/raw/f67b6e45a9a940732882ae4bb26897a9b245ef31/add-orcid-identifiers-csv.py) script:
- There was a quoting error in my CRP CSV and the replacements for `Forests, Trees and Agroforestry` got messed up
- So I fixed them and had to re-index again!
- I started preparing the git branch for the DSpace 5.5→5.8 upgrade:
```
$ git checkout -b 5_x-dspace-5.8 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.8
```
- I was prepared to skip some commits that I had cherry-picked from the upstream `dspace-5_x` branch when we did the DSpace 5.5 upgrade (see notes on 2016-10-19 and 2017-12-17):
- [DS-3246] Improve cleanup in recyclable components (upstream commit on dspace-5_x: 9f0f5940e7921765c6a22e85337331656b18a403)
- [DS-3250] applying patch provided by Atmire (upstream commit on dspace-5_x: c6fda557f731dbc200d7d58b8b61563f86fe6d06)
- bump up to latest minor pdfbox version (upstream commit on dspace-5_x: b5330b78153b2052ed3dc2fd65917ccdbfcc0439)
- DS-3583 Usage of correct Collection Array (#1731) (upstream commit on dspace-5_x: c8f62e6f496fa86846bfa6bcf2d16811087d9761)
- ... but somehow git knew, and didn't include them in my interactive rebase!
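- For reference, this matches how `git rebase` normally behaves: it omits commits whose changes already exist upstream as an equivalent patch, and `git cherry` marks such commits with `-`. A minimal sketch in a throwaway repository (the branch names `upstream` and `local`, the file names, and the commit messages are all made up for the demonstration and are not the real DSpace branches):

```shell
#!/bin/sh
# Throwaway demo repository; nothing here touches a real DSpace checkout
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.org"
git config user.name "Demo User"

# Common starting point for both branches
echo "base" > file.txt
git add file.txt
git commit -q -m "base"
git branch upstream

# Local branch with a customization of its own
git checkout -q -b local
echo "custom" > custom.txt
git add custom.txt
git commit -q -m "local customization"

# The hypothetical "upstream" branch gets a fix...
git checkout -q upstream
echo "fix" >> file.txt
git commit -q -a -m "upstream fix"

# ...which is then cherry-picked onto the local branch
git checkout -q local
git cherry-pick upstream > /dev/null

# "-" marks commits with an equivalent patch in upstream, "+" marks the rest
cherry_output=$(git cherry -v upstream local)
echo "$cherry_output"
```

- Commits listed with `-` are the ones a rebase onto `upstream` would leave out, so running `git cherry -v` against the real upgrade target before rebasing should show in advance which cherry-picks will disappear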
- I need to send this branch to Atmire and also arrange payment (see [ticket #560](https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560) in their tracker)
- I ran all system updates on DSpace Test and rebooted it
- Proof some records on DSpace Test for Udana from IWMI
- He has done better with the small syntax and consistency issues but then there are larger concerns with not linking to DOIs, copying titles incorrectly, etc
## 2018-04-10
- I got a notice that CGSpace CPU usage was very high this morning
- Looking at the nginx logs, here are the top users today so far:
- I assume we want `removeAbandonedOnBorrow` and make updates to the Tomcat 8 templates in Ansible
- After reading more documentation I see that Tomcat 8.5's default DBCP seems to now be Commons DBCP2 instead of Tomcat DBCP
- It can be overridden in Tomcat's _server.xml_ by setting `factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"` in the `<Resource>`
- I think we should use this default, so we'll need to remove some other settings that are specific to Tomcat's DBCP like `jdbcInterceptors` and `abandonWhenPercentageFull`
- Merge the changes adding ORCID identifier to advanced search and Atmire Listings and Reports ([#371](https://github.com/ilri/DSpace/pull/371))
- Fix one more issue of missing XMLUI strings (for CRP subject when clicking "view more" in the Discovery sidebar)
- I told Udana to fix the citation and abstract of the one item, and to correct the `dc.language.iso` for the five Spanish items in his Book Chapters collection
- Then we can import the records to CGSpace
## 2018-04-11
- DSpace Test (linode19) crashed again some time since yesterday:
- While testing an XMLUI patch for [DS-3883](https://jira.duraspace.org/browse/DS-3883) I noticed that there is still some remaining Authority / Solr configuration left that we need to remove:
```
2018-04-14 18:55:25,841 ERROR org.dspace.authority.AuthoritySolrServiceImpl @ Authority solr is not correctly configured, check "solr.authority.server" property in the dspace.cfg
java.lang.NullPointerException
```
- I assume we need to remove `authority` from the consumers in `dspace/config/dspace.cfg`:
- IWMI people are asking about building a search query that outputs RSS for their reports
- They want the same results as this Discovery query: https://cgspace.cgiar.org/discover?filtertype_1=dateAccessioned&filter_relational_operator_1=contains&filter_1=2018&submit_apply_filter=&query=&scope=10568%2F16814&rpp=100&sort_by=dc.date.issued_dt&order=desc
- They will need to use OpenSearch, but I can't remember all the parameters
- Apparently search sort options for OpenSearch are in `dspace.cfg`:
- They want items by issue date, so we need to use sort option 2
- According to the DSpace Manual there are only the following parameters to OpenSearch: format, scope, rpp, start, and sort_by
- The OpenSearch `query` parameter expects a Discovery search filter that is defined in `dspace/config/spring/api/discovery.xml`
- So for IWMI they should be able to use something like this: https://cgspace.cgiar.org/open-search/discover?query=dateIssued:2018&scope=10568/16814&sort_by=2&order=DESC&format=rss
- There are also `rpp` (results per page) and `start` parameters but in my testing now on DSpace 5.5 they behave very strangely
- For example, set `rpp=1` and then check the results for `start` values of 0, 1, and 2 and they are all the same!
- If I have time I will check if this behavior persists on DSpace 6.x on the official DSpace demo and file a bug
- Also, the DSpace Manual as of 5.x has very poor documentation for OpenSearch
- They don't tell you to use Discovery search filters in the `query` (with format `query=dateIssued:2018`)
- They don't tell you that the sort options are actually defined in `dspace.cfg` (ie, you need to use `2` instead of `dc.date.issued_dt`)
- They are missing the `order` parameter (ASC vs DESC)
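- The parameters above can be put together with a small helper. This is only a sketch: the function name `opensearch_url` and its argument order are my own invention, and the values are passed through without URL encoding, so it only suits simple queries like the IWMI one:

```shell
#!/bin/sh
# Build a DSpace OpenSearch URL from the parameters discussed above.
# opensearch_url is a hypothetical helper, not part of DSpace.
opensearch_url() {
    base="$1"      # repository base URL
    query="$2"     # Discovery search filter, e.g. dateIssued:2018
    scope="$3"     # handle of the community or collection
    sort_by="$4"   # number of the sort option in dspace.cfg
    order="$5"     # ASC or DESC
    format="$6"    # e.g. rss
    printf '%s/open-search/discover?query=%s&scope=%s&sort_by=%s&order=%s&format=%s\n' \
        "$base" "$query" "$scope" "$sort_by" "$order" "$format"
}

url=$(opensearch_url "https://cgspace.cgiar.org" "dateIssued:2018" "10568/16814" 2 DESC rss)
echo "$url"
```

- The resulting URL can then be fetched with something like `curl -s "$url"` to check the feed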
- I notice that DSpace Test has crashed again, due to memory:
- I will increase the JVM heap size from 5120M to 6144M, though we don't have much room left to grow as DSpace Test (linode19) is using a smaller instance size than CGSpace
- Gabriela from CIP asked if I could send her a list of all CIP authors so she can do some replacements on the name formats
- I got a list of all the CIP collections manually and used the same query that I used in [August, 2017](/cgspace-notes/2017-08):
```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/89347', '10568/88229', '10568/53086', '10568/53085', '10568/69069', '10568/53087', '10568/53088', '10568/53089', '10568/53090', '10568/53091', '10568/53092', '10568/70150', '10568/53093', '10568/64874', '10568/53094'))) group by text_value order by count desc) to /tmp/cip-authors.csv with csv;
```
- Gabriela from CIP emailed to say that CGSpace was returning a white page, but I haven't seen any emails from UptimeRobot
- I confirm that it's just giving a white page around 4:16
- The DSpace logs show that there are no database connections:
```
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-715] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle:0; lastwait:5000].
```
- And there have been tons of errors recently (luckily starting only 20 minutes ago):
- I tried to reboot the server from the command line but after a few minutes it didn't come back up
- Looking at the Linode console I see that it is stuck trying to shut down
- Even "Reboot" via Linode console doesn't work!
- After shutting it down a few times via the Linode console it finally rebooted
- Everything is back but I have no idea what caused this—I suspect something with the hosting provider
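- When comparing several `PoolExhaustedException` messages it helps to pull out the pool counters. A minimal sketch with `sed`, using the message from the DSpace log above as sample input (the variable names are arbitrary):

```shell
#!/bin/sh
# Sample line copied from the DSpace log excerpt above
line='org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-715] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle:0; lastwait:5000].'

# Print the number that follows "<field>:" in the bracketed pool status
pool_field() {
    printf '%s\n' "$line" | sed -n "s/.*[[ ]$1:\([0-9]*\);.*/\1/p"
}

pool_size=$(pool_field size)
pool_busy=$(pool_field busy)
pool_idle=$(pool_field idle)
echo "size=$pool_size busy=$pool_busy idle=$pool_idle"
```

- For this message the script prints `size=250 busy=18 idle=0`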
- Also super weird, the last entry in the DSpace log file is from `2018-04-20 16:35:09`, and then immediately it goes to `2018-04-20 19:15:04` (three hours later!):
```
2018-04-20 16:35:09,144 ERROR org.dspace.app.util.AbstractDSpaceWebapp @ Failed to record shutdown in Webapp table.
org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle:0; lastwait:5000].
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:685)
at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:187)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:128)
at org.dspace.storage.rdbms.DatabaseManager.getConnection(DatabaseManager.java:632)
at org.dspace.core.Context.init(Context.java:121)
at org.dspace.core.Context.<init>(Context.java:95)
at org.dspace.app.util.AbstractDSpaceWebapp.deregister(AbstractDSpaceWebapp.java:97)
at org.dspace.app.util.DSpaceContextListener.contextDestroyed(DSpaceContextListener.java:146)
at org.apache.catalina.core.StandardContext.listenerStop(StandardContext.java:5115)
at org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5779)
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:224)
at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1588)
at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1577)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-04-20 19:15:04,006 INFO org.dspace.core.ConfigurationManager @ Loading from classloader: file:/home/cgspace.cgiar.org/config/dspace.cfg
```
- Testing my Ansible playbooks with a clean and updated installation of Ubuntu 18.04 and I fixed some issues that I hadn't run into a few weeks ago
- There seems to be a new issue with Java dependencies, though
- The `default-jre` package is going to be Java 10 on Ubuntu 18.04, but I want to use `openjdk-8-jre-headless` (well, the JDK actually, but it uses this JRE)
- Tomcat and Ant are fine with Java 8, but the `maven` package wants to pull in Java 10 for some reason
- Looking closer, I see that `maven` depends on `java7-runtime-headless`, which is indeed provided by `openjdk-8-jre-headless`
- So it must be one of Maven's dependencies...
- I will watch it for a few days because it could be an issue that will be resolved before Ubuntu 18.04's release
- Otherwise I will post a bug to the ubuntu-release mailing list
- Looks like the only way to fix this is to install `openjdk-8-jdk-headless` beforehand in a separate transaction (so it pulls in the JRE), or to manually install `openjdk-8-jre-headless` in the same apt transaction as `maven`
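- The second option can be checked without changing the system by asking apt to simulate the transaction. This is a sketch that assumes the package names above are available in the configured repositories; the `packages` variable is just for illustration:

```shell
#!/bin/sh
# Java 8 JRE and Maven requested together, in one apt transaction
packages="openjdk-8-jre-headless maven"

if command -v apt-get > /dev/null 2>&1; then
    # --simulate only prints what would be installed; no root needed.
    # $packages is intentionally unquoted so it splits into two arguments.
    apt-get install --simulate $packages || echo "simulation failed"
else
    echo "apt-get not found, skipping simulation"
fi
```

- If the simulated output still lists a Java 10 package among the ones to be installed then the workaround is not sufficient on that release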
- Also, I started porting PostgreSQL 9.6 into the Ansible infrastructure scripts
- This should be a drop-in replacement I believe, though I will definitely test it more locally as well as on DSpace Test once we move to DSpace 5.8 and Ubuntu 18.04 in the coming months
- Still testing the [Ansible infrastructure playbooks](https://github.com/ilri/rmg-ansible-public) for Ubuntu 18.04, Tomcat 8.5, and PostgreSQL 9.6
- One other new thing I notice is that PostgreSQL 9.6 no longer uses `createuser` and `nocreateuser`, as those have actually meant `superuser` and `nosuperuser` and have been deprecated for *ten years*
- So when I'm importing a CGSpace database dump I need to amend my notes to give superuser permission to a user, rather than `createuser`:
```
$ psql dspacetest -c 'alter user dspacetest superuser;'
```

```
at org.apache.coyote.http11.Http11InputBuffer.init(Http11InputBuffer.java:688)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:672)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
```
- There's a [Debian bug about this from a few weeks ago](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=895866)
- Apparently Tomcat was compiled with Java 9, so doesn't work with Java 8
## 2018-04-29
- DSpace Test crashed again, looks like memory issues again
- JVM heap size was last increased to 6144m but the system only has 8GB total so there's not much we can do here other than get a bigger Linode instance or remove the massive Solr Statistics data
- I will email the CGSpace team to ask them whether or not we want to commit to having a public test server that accurately mirrors CGSpace (ie, to upgrade to the next largest Linode)