2018-07-01
- I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:
$ pg_dump -b -v -o --format=custom -U dspace -f dspace-2018-07-01.backup dspace
- During the
mvn package
stage on the 5.8 branch I kept getting issues with java running out of memory:
There is insufficient memory for the Java Runtime Environment to continue.
- As the machine only has 8GB of RAM, I reduced the Tomcat memory heap from 5120m to 4096m so I could try to allocate more to the build process:
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=dspacetest.cgiar.org -P \!dspace-lni,\!dspace-rdf,\!dspace-sword,\!dspace-swordv2 clean package
- Then I stopped the Tomcat 7 service, ran the ant update, and manually ran the old and ignored SQL migrations:
$ sudo su - postgres
$ psql dspace
...
dspace=# begin;
BEGIN
dspace=# \i Atmire-DSpace-5.8-Schema-Migration.sql
DELETE 0
UPDATE 1
DELETE 1
dspace=# commit
dspace=# \q
$ exit
$ dspace database migrate ignored
- After that I started Tomcat 7 and DSpace seems to be working, now I need to tell our colleagues to try stuff and report issues they have
2018-07-02
2018-07-03
- Finally finish with the CIFOR Archive records (a total of 2448):
- I mapped the 50 items that were duplicates from elsewhere in CGSpace into CIFOR Archive
- I did one last check of the remaining 2398 items and found eight who have a
cg.identifier.doi
that links to some URL other than a DOI so I moved those to cg.identifier.url
and cg.identifier.googleurl
as appropriate
- Also, thirteen items had a DOI in their citation, but did not have a
cg.identifier.doi
field, so I added those
- Then I imported those 2398 items in two batches (to deal with memory issues):
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive.csv
$ dspace metadata-import -e aorth@mjanja.ch -f /tmp/2018-06-27-New-CIFOR-Archive2.csv
- I noticed there are many items that use HTTP instead of HTTPS for their Google Books URL, and some missing HTTP entirely:
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
count
-------
785
dspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
count
-------
4
- I think I should fix that as well as some other garbage values like “test” and “dspace.ilri.org” etc:
dspace=# begin;
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value like 'http://books.google.%';
UPDATE 785
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'books.google', 'https://books.google') where resource_type_id=2 and metadata_field_id=222 and text_value ~ '^books\.google\..*';
UPDATE 4
dspace=# update metadatavalue set text_value='https://books.google.com/books?id=meF1CLdPSF4C' where resource_type_id=2 and metadata_field_id=222 and text_value='meF1CLdPSF4C';
UPDATE 1
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id=222 and metadata_value_id in (2299312, 10684, 10700, 996403);
DELETE 4
dspace=# commit;
- Testing DSpace 5.8 with PostgreSQL 9.6 and Tomcat 8.5.32 (instead of my usual 7.0.88) and for some reason I get autowire errors on Catalina startup with 8.5.32:
03-Jul-2018 19:51:37.272 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener]
java.lang.RuntimeException: Failure during filter init: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
at org.dspace.servicemanager.servlet.DSpaceKernelServletContextListener.contextInitialized(DSpaceKernelServletContextListener.java:92)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4792)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5256)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:754)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:730)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:629)
at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1839)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Manager: failure starting up spring service manager: Error creating bean with name 'conversionService' defined in file [/home/aorth/dspace/config/spring/xmlui/spring-dspace-addon-cua-services.xml]: Cannot create inner bean 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a' of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter] while setting bean property 'converters' with key [1]; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter#3f6c3e6a': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$ColumnsConverter.filterConverter; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No matching bean of type [com.atmire.app.xmlui.aspect.statistics.mostpopular.MostPopularConfig$FilterConverter] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {@org.springframework.beans.factory.annotation.Autowired(required=true)}
- Gotta check that out later…
2018-07-04
- I verified that the autowire error indeed only occurs on Tomcat 8.5, but the application works fine on Tomcat 7
- I have raised this in the DSpace 5.8 compatibility ticket on Atmire’s tracker
- Abenet wants me to add “United Kingdom government” to the sponsors on CGSpace so I created a ticket to track it (#381)
- Also, Udana wants me to add “Enhancing Sustainability Across Agricultural Systems” to the WLE Phase II research themes so I created a ticket to track that (#382
- I need to try to finish this DSpace 5.8 business first because I have too many branches with cherry-picks going on right now!
2018-07-06
- CCAFS want me to add “PII-FP2_MSCCCAFS” to their Phase II project tags on CGSpace (#383
- I’ll do it in a batch with all the other metadata updates next week
2018-07-08
- I was tempted to do the Linode instance upgrade on CGSpace (linode18), but after looking closely at the system backups I noticed that Solr isn’t being backed up to S3
- I apparently noticed this—and fixed it!—in 2016-07, but it doesn’t look like the backup has been updated since then!
- It looks like I added Solr to the
backup_to_s3.sh
script, but that script is not even being used (s3cmd
is run directly from root’s crontab)
- For now I have just initiated a manual S3 backup of the Solr data:
# s3cmd sync --delete-removed /home/backup/solr/ s3://cgspace.cgiar.org/solr/
- But I need to add this to cron!
- I wonder if I should convert some of the cron jobs to systemd services / timers…
- I sent a note to all our users on Yammer to ask them about possible maintenance on Sunday, July 14th
- Abenet wants to be able to search by journal title (dc.source) in the advanced Discovery search so I opened an issue for it (#384)
- I regenerated the list of names for all our ORCID iDs using my resolve-orcids.py script:
$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml | sort | uniq > /tmp/2018-07-08-orcids.txt
$ ./resolve-orcids.py -i /tmp/2018-07-08-orcids.txt -o /tmp/2018-07-08-names.txt -d
- But after comparing to the existing list of names I didn’t see much change, so I just ignored it
2018-07-09
- Uptime Robot said that CGSpace was down for two minutes early this morning but I don’t see anything in Tomcat logs or dmesg
- Uptime Robot said that CGSpace was down for two minutes again later in the day, and this time I saw a memory error in Tomcat’s
catalina.out
:
Exception in thread "http-bio-127.0.0.1-8081-exec-557" java.lang.OutOfMemoryError: Java heap space
- I’m not sure if it’s the same error, but I see this in DSpace’s
solr.log
:
2018-07-09 06:25:09,913 ERROR org.apache.solr.servlet.SolrDispatchFilter @ null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
- I see a strange error around that time in
dspace.log.2018-07-08
:
2018-07-09 06:23:43,510 ERROR com.atmire.statistics.SolrLogThread @ IOException occured when talking to server at: http://localhost:8081/solr/statistics
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8081/solr/statistics
- But not sure what caused that…
- I got a message from Linode tonight that CPU usage was high on CGSpace for the past few hours around 8PM GMT
- Looking in the nginx logs I see the top ten IP addresses active today:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "09/Jul/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1691 40.77.167.84
1701 40.77.167.69
1718 50.116.102.77
1872 137.108.70.6
2172 157.55.39.234
2190 207.46.13.47
2848 178.154.200.38
4367 35.227.26.162
4387 70.32.83.92
4738 95.108.181.88
- Of those, all except
70.32.83.92
and 50.116.102.77
are NOT re-using their Tomcat sessions, for example from the XMLUI logs:
$ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=95.108.181.88' dspace.log.2018-07-09
4435
95.108.181.88
appears to be Yandex, so I dunno why it’s creating so many sessions, as its user agent should match Tomcat’s Crawler Session Manager Valve
70.32.83.92
is on MediaTemple but I’m not sure who it is. They are mostly hitting REST so I guess that’s fine
35.227.26.162
doesn’t declare a user agent and is on Google Cloud, so I should probably mark them as a bot in nginx
178.154.200.38
is Yandex again
207.46.13.47
is Bing
157.55.39.234
is Bing
137.108.70.6
is our old friend CORE bot
50.116.102.77
doesn’t declare a user agent and lives on HostGator, but mostly just hits the REST API so I guess that’s fine
40.77.167.84
is Bing again
- Interestingly, the first time that I see
35.227.26.162
was on 2018-06-08
- I’ve added
35.227.26.162
to the bot tagging logic in the nginx vhost
2018-07-10
- Add “United Kingdom government” to sponsors (#381)
- Add “Enhancing Sustainability Across Agricultural Systems” to WLE Phase II Research Themes (#382)
- Add “PII-FP2_MSCCCAFS” to CCAFS Phase II Project Tags (#383)
- Add journal title (dc.source) to Discovery search filters (#384)
- All were tested and merged to the
5_x-prod
branch and will be deployed on CGSpace this coming weekend when I do the Linode server upgrade
- I need to get them onto the 5.8 testing branch too, either via cherry-picking or by rebasing after we finish testing Atmire’s 5.8 pull request (#378)
- Linode sent an alert about CPU usage on CGSpace again, about 13:00UTC
- These are the top ten users in the last two hours:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "10/Jul/2018:(11|12|13)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
81 193.95.22.113
82 50.116.102.77
112 40.77.167.90
117 196.190.95.98
120 178.154.200.38
215 40.77.167.96
243 41.204.190.40
415 95.108.181.88
695 35.227.26.162
697 213.139.52.250
- Looks like
213.139.52.250
is Moayad testing his new CGSpace vizualization thing:
213.139.52.250 - - [10/Jul/2018:13:39:41 +0000] "GET /bitstream/handle/10568/75668/dryad.png HTTP/2.0" 200 53750 "http://localhost:4200/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36"
- He said there was a bug that caused his app to request a bunch of invalid URLs
- I’ll have to keep and eye on this and see how their platform evolves
2018-07-11
- Skype meeting with Peter and Addis CGSpace team
- We need to look at doing the
dc.rights
stuff again, which we last worked on in 2018-01 and 2018-02
- Abenet suggested that we do a controlled vocabulary for the authors, perhaps with the top 1,500 or so on CGSpace?
- Peter told Sisay to test this controlled vocabulary
- Discuss meeting in Nairobi in October
2018-07-12
- Uptime Robot said that CGSpace went down a few times last night, around 10:45 PM and 12:30 AM
- Here are the top ten IPs from last night and this morning:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "11/Jul/2018:22" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
48 66.249.64.91
50 35.227.26.162
57 157.55.39.234
59 157.55.39.71
62 147.99.27.190
82 95.108.181.88
92 40.77.167.90
97 183.128.40.185
97 240e:f0:44:fa53:745a:8afe:d221:1232
3634 208.110.72.10
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "12/Jul/2018:00" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
25 216.244.66.198
38 40.77.167.185
46 66.249.64.93
56 157.55.39.71
60 35.227.26.162
65 157.55.39.234
83 95.108.181.88
87 66.249.64.91
96 40.77.167.90
7075 208.110.72.10
- We have never seen
208.110.72.10
before… so that’s interesting!
- The user agent for these requests is: Pcore-HTTP/v0.44.0
- A brief Google search doesn’t turn up any information about what this bot is, but lots of users complaining about it
- This bot does make a lot of requests all through the day, although it seems to re-use its Tomcat session:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "Pcore-HTTP" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
17098 208.110.72.10
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=208.110.72.10' dspace.log.2018-07-11
1161
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=208.110.72.10' dspace.log.2018-07-12
1885
- I think the problem is that, despite the bot requesting
robots.txt
, it almost exlusively requests dynamic pages from /discover
:
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "Pcore-HTTP" | grep -o -E "GET /(browse|discover|search-filter)" | sort -n | uniq -c | sort -rn
13364 GET /discover
993 GET /search-filter
804 GET /browse
# zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "Pcore-HTTP" | grep robots
208.110.72.10 - - [12/Jul/2018:00:22:28 +0000] "GET /robots.txt HTTP/1.1" 200 1301 "https://cgspace.cgiar.org/robots.txt" "Pcore-HTTP/v0.44.0"
- So this bot is just like Baiduspider, and I need to add it to the nginx rate limiting
- I’ll also add it to Tomcat’s Crawler Session Manager Valve to force the re-use of a common Tomcat sesssion for all crawlers just in case
- Generate a list of all affiliations in CGSpace to send to Mohamed Salem to compare with the list on MEL (sorting the list by most occurrences):
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv header
COPY 4518
dspace=# \q
$ csvcut -c 1 < /tmp/affiliations.csv > /tmp/affiliations-1.csv
- We also need to discuss standardizing our countries and comparing our ORCID iDs