March, 2019
2019-03-01
- I checked IITA’s 259 Feb 14 records from last month for duplicates using Atmire’s Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good
- I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc…
- Looking at the other half of Udana’s WLE records from 2018-11
- I finished the ones for Restoring Degraded Landscapes (RDL), but these are for Variability, Risks and Competing Uses (VRC)
- I did the usual cleanups for whitespace, added regions where they made sense for certain countries, cleaned up the DOI link formats, added rights information based on the publications page for a few items
- Most worryingly, there are encoding errors in the abstracts for eleven items, for example:
- 68.15% � 9.45 instead of 68.15% ± 9.45
- 2003�2013 instead of 2003–2013
- I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs
2019-03-03
- Trying to finally upload IITA’s 259 Feb 14 items to CGSpace so I exported them from DSpace Test:
$ mkdir 2019-03-03-IITA-Feb14
$ dspace export -i 10568/108684 -t COLLECTION -m -n 0 -d 2019-03-03-IITA-Feb14
- As I was inspecting the archive I noticed that there were some problems with the bitsreams:
- First, Sisay didn’t include the bitstream descriptions
- Second, only five items had bitstreams and I remember in the discussion with IITA that there should have been nine!
- I had to refer to the original CSV from January to find the file names, then download and add them to the export contents manually!
- After adding the missing bitstreams and descriptions manually I tested them again locally, then imported them to a temporary collection on CGSpace:
$ dspace import -a -c 10568/99832 -e aorth@stfu.com -m 2019-03-03-IITA-Feb14.map -s /tmp/2019-03-03-IITA-Feb14
- DSpace’s export function doesn’t include the collections for some reason, so you need to import them somewhere first, then export the collection metadata and re-map the items to proper owning collections based on their types using OpenRefine or something
- After re-importing to CGSpace to apply the mappings, I deleted the collection on DSpace Test and ran the
dspace cleanup
script - Merge the IITA research theme changes from last month to the
5_x-prod
branch (#413)- I will deploy to CGSpace soon and then think about how to batch tag all IITA’s existing items with this metadata
- Deploy Tomcat 7.0.93 on CGSpace (linode18) after having tested it on DSpace Test (linode19) for a week
2019-03-06
- Abenet was having problems with a CIP user account, I think that the user could not register
- I suspect it’s related to the email issue that ICT hasn’t responded about since last week
- As I thought, I still cannot send emails from CGSpace:
$ dspace test-email
About to send test email:
- To: blah@stfu.com
- Subject: DSpace test email
- Server: smtp.office365.com
Error sending email:
- Error: javax.mail.AuthenticationFailedException
- I will send a follow-up to ICT to ask them to reset the password
2019-03-07
- ICT reset the email password and I confirmed that it is working now
- Generate a controlled vocabulary of 1187 AGROVOC subjects from the top 1500 that I checked last month, dumping the terms themselves using
csvcut
and then applying XML controlled vocabulary format in vim and then checking with tidy for good measure:
$ csvcut -c name 2019-02-22-subjects.csv > dspace/config/controlled-vocabularies/dc-contributor-author.xml
$ # apply formatting in XML file
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/dc-subject.xml
- I tested the AGROVOC controlled vocabulary locally and will deploy it on DSpace Test soon so people can see it
- Atmire noticed my message about the “solr_update_time_stamp” error on the dspace-tech mailing list and created an issue on their tracker to discuss it with me
- They say the error is harmless, but has nevertheless been fixed in their newer module versions
2019-03-08
- There’s an issue with CGSpace right now where all items are giving a blank page in the XMLUI
Interestingly, if I check an item in the REST API it is also mostly blank: only the title and the ID!On second thought I realize I probably was just seeing the default view without any “expands”- I don’t see anything unusual in the Tomcat logs, though there are thousands of those
solr_update_time_stamp
errors:
# journalctl -u tomcat7 | grep -c 'Multiple update components target the same field:solr_update_time_stamp'
1076
- I restarted Tomcat and it’s OK now…
- Skype meeting with Peter and Abenet and Sisay
- We want to try to crowd source the correction of invalid AGROVOC terms starting with the ~313 invalid ones from our top 1500
- We will share a Google Docs spreadsheet with the partners and ask them to mark the deletions and corrections
- Abenet and Alan to spend some time identifying correct DCTERMS fields to move to, with preference over CG Core 2.0 as we want to be globally compliant (use information from SEO crosswalks)
- I need to follow up on the privacy page that Sisay worked on
- We want to try to migrate the 600 Livestock CRP blog posts to CGSpace, Peter will try to export the XML from WordPress so I can try to parse it with a script
2019-03-09
- I shared a post on Yammer informing our editors to try to AGROVOC controlled list
- The SPDX legal committee had a meeting and discussed the addition of CC-BY-ND-3.0-IGO and other IGO licenses to their list, but it seems unlikely (spdx/license-list-XML/issues/767)
- The FireOak report highlights the fact that several CGSpace collections have mixed-content errors due to the use of HTTP links in the Feedburner forms
- I see 46 occurrences of these with this query:
dspace=# SELECT text_value FROM metadatavalue WHERE resource_type_id in (3,4) AND (text_value LIKE '%http://feedburner.%' OR text_value LIKE '%http://feeds.feedburner.%');
- I can replace these globally using the following SQL:
dspace=# UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'http://feedburner.','https//feedburner.', 'g') WHERE resource_type_id in (3,4) AND text_value LIKE '%http://feedburner.%';
UPDATE 43
dspace=# UPDATE metadatavalue SET text_value = REGEXP_REPLACE(text_value, 'http://feeds.feedburner.','https//feeds.feedburner.', 'g') WHERE resource_type_id in (3,4) AND text_value LIKE '%http://feeds.feedburner.%';
UPDATE 44
- I ran the corrections on CGSpace and DSpace Test
2019-03-10
- Working on tagging IITA’s items with their new research theme (
cg.identifier.iitatheme
) based on their existing IITA subjects (see notes from 2019-02) - I exported the entire IITA community from CGSpace and then used
csvcut
to extract only the needed fields:
$ csvcut -c 'id,cg.subject.iita,cg.subject.iita[],cg.subject.iita[en],cg.subject.iita[en_US]' ~/Downloads/10568-68616.csv > /tmp/iita.csv
After importing to OpenRefine I realized that tagging items based on their subjects is tricky because of the row/record mode of OpenRefine when you split the multi-value cells as well as the fact that some items might need to be tagged twice (thus needing a
||
)I think it might actually be easier to filter by IITA subject, then by IITA theme (if needed), and then do transformations with some conditional values in GREL expressions like:
if(isBlank(value), 'PLANT PRODUCTION & HEALTH', value + '||PLANT PRODUCTION & HEALTH')
- Then it’s more annoying because there are four IITA subject columns…
- In total this would add research themes to 1,755 items
- I want to double check one last time with Bosede that they would like to do this, because I also see that this will tag a few hundred items from the 1970s and 1980s
2019-03-11
- Bosede said that she would like the IITA research theme tagging only for items since 2015, which would be 256 items
2019-03-12
- I imported the changes to 256 of IITA’s records on CGSpace
2019-03-14
- CGSpace had the same issue with blank items like earlier this month and I restarted Tomcat to fix it
- Create a pull request to change Swaziland to Eswatini and Macedonia to North Macedonia (#414)
- I see thirty-six items using Swaziland country metadata, and Peter says we should change only those from 2018 and 2019
- I think that I could get the resource IDs from SQL and then export them using
dspace metadata-export
…
- This is a bit ugly, but it works (using the DSpace 5 SQL helper function to resolve ID to handle):
for id in $(psql -U postgres -d dspacetest -h localhost -c "SELECT resource_id FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=228 AND text_value LIKE '%SWAZILAND%'" | grep -oE '[0-9]{3,}'); do
echo "Getting handle for id: ${id}"
handle=$(psql -U postgres -d dspacetest -h localhost -c "SELECT ds5_item2itemhandle($id)" | grep -oE '[0-9]{5}/[0-9]+')
~/dspace/bin/dspace metadata-export -f /tmp/${id}.csv -i $handle
done
- Then I couldn’t figure out a clever way to join all the CSVs, so I just grepped them to find the IDs with dates from 2018 and 2019 and there are apparently only three:
$ grep -oE '201[89]' /tmp/*.csv | sort -u
/tmp/94834.csv:2018
/tmp/95615.csv:2018
/tmp/96747.csv:2018
- And looking at those items more closely, only one of them has an issue date of after 2018-04, so I will only update that one (as the countrie’s name only changed in 2018-04)
- Run all system updates and reboot linode20
- Follow up with Felix from Earlham to see if he’s done testing DSpace Test with COPO so I can re-sync the server from CGSpace
2019-03-15
- CGSpace (linode18) has the blank page error again
- I’m not sure if it’s related, but I see the following error in DSpace’s log:
2019-03-15 14:09:32,685 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
java.sql.SQLException: Connection org.postgresql.jdbc.PgConnection@55ba10b5 is closed.
at org.apache.tomcat.dbcp.dbcp.DelegatingConnection.checkOpen(DelegatingConnection.java:398)
at org.apache.tomcat.dbcp.dbcp.DelegatingConnection.prepareStatement(DelegatingConnection.java:279)
at org.apache.tomcat.dbcp.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.prepareStatement(PoolingDataSource.java:313)
at org.dspace.storage.rdbms.DatabaseManager.queryTable(DatabaseManager.java:220)
at org.dspace.authorize.AuthorizeManager.getPolicies(AuthorizeManager.java:612)
at org.dspace.content.crosswalk.METSRightsCrosswalk.disseminateElement(METSRightsCrosswalk.java:154)
at org.dspace.content.crosswalk.METSRightsCrosswalk.disseminateElement(METSRightsCrosswalk.java:300)
- Interestingly, I see a pattern of these errors increasing, with single and double digit numbers over the past month,
but spikes of over 1,000 today, yesterday, and on 2019-03-08, which was exactly the first time we saw this blank page error recently
# grep -c 'SQL QueryTable Error' dspace.log.2019-0*
dspace.log.2019-02-01:6
dspace.log.2019-02-02:1
dspace.log.2019-02-03:0
dspace.log.2019-02-04:4
dspace.log.2019-02-05:4
dspace.log.2019-02-06:3
dspace.log.2019-02-07:0
dspace.log.2019-02-08:2
dspace.log.2019-02-09:0
dspace.log.2019-02-10:0
dspace.log.2019-02-11:1
dspace.log.2019-02-12:2
dspace.log.2019-02-13:1
dspace.log.2019-02-14:74
dspace.log.2019-02-15:47
dspace.log.2019-02-16:44
dspace.log.2019-02-17:37
dspace.log.2019-02-18:191
dspace.log.2019-02-19:51
dspace.log.2019-02-20:32
dspace.log.2019-02-21:64
dspace.log.2019-02-22:48
dspace.log.2019-02-23:32
dspace.log.2019-02-24:25
dspace.log.2019-02-25:53
dspace.log.2019-02-26:63
dspace.log.2019-02-27:28
dspace.log.2019-02-28:31
dspace.log.2019-03-01:29
dspace.log.2019-03-02:24
dspace.log.2019-03-03:41
dspace.log.2019-03-04:25
dspace.log.2019-03-05:48
dspace.log.2019-03-06:48
dspace.log.2019-03-07:42
dspace.log.2019-03-08:1781
dspace.log.2019-03-09:22
dspace.log.2019-03-10:23
dspace.log.2019-03-11:52
dspace.log.2019-03-12:474
dspace.log.2019-03-13:37
dspace.log.2019-03-14:924
dspace.log.2019-03-15:770
- There are not too many connections currently in PostgreSQL:
$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
6 dspaceApi
10 dspaceCli
15 dspaceWeb
- I didn’t see anything interesting in the PostgreSQL logs, though this stack trace from the Tomcat logs (in the systemd journal) from earlier today might be related?
SEVERE: Servlet.service() for servlet [spring] in context with path [] threw exception [org.springframework.web.util.NestedServletException: Request processing failed; nested exception is java.util.EmptyStackException] with root cause
java.util.EmptyStackException
at java.util.Stack.peek(Stack.java:102)
at java.util.Stack.pop(Stack.java:84)
at org.apache.cocoon.callstack.CallStack.leave(CallStack.java:54)
at org.apache.cocoon.servletservice.CallStackHelper.leaveServlet(CallStackHelper.java:85)
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:484)
at org.apache.cocoon.servletservice.ServletServiceContext$PathDispatcher.forward(ServletServiceContext.java:443)
at org.apache.cocoon.servletservice.spring.ServletFactoryBean$ServiceInterceptor.invoke(ServletFactoryBean.java:264)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
at com.sun.proxy.$Proxy90.service(Unknown Source)
at org.dspace.springmvc.CocoonView.render(CocoonView.java:113)
at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1180)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:950)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:624)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.rdf.negotiation.NegotiationFilter.doFilter(NegotiationFilter.java:59)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dspace.utils.servlet.DSpaceWebappServletFilter.doFilter(DSpaceWebappServletFilter.java:78)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:110)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:494)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.valves.CrawlerSessionManagerValve.invoke(CrawlerSessionManagerValve.java:234)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:1025)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:445)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1137)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:637)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
- For now I will just restart Tomcat…
2019-03-17
- Last week Felix from Earlham said that they finished testing on DSpace Test (linode19) so I made backups of some things there and re-deployed the system on Ubuntu 18.04
- During re-deployment I hit a few issues with the Ansible playbooks and made some minor improvements
- There seems to be an issue with nodejs’s dependencies now, which causes npm to get uninstalled when installing the certbot dependencies (due to a conflict in libssl dependencies)
- I re-worked the playbooks to use Node.js from the upstream official repository for now
- Create and merge pull request for the AGROVOC controlled list (#415)
- Run all system updates on CGSpace (linode18) and re-deploy the
5_x-prod
branch and reboot the server
- Run all system updates on CGSpace (linode18) and re-deploy the
- Re-sync DSpace Test with a fresh database snapshot and assetstore from CGSpace
- After restarting Tomcat, Solr was giving the “Error opening new searcher” error for all cores
- I stopped Tomcat, added
ulimit -v unlimited
to thecatalina.sh
script and deleted all old locks in the DSpacesolr
directory and then DSpace started up normally - I’m still not exactly sure why I see this error and if the
ulimit
trick actually helps, as thetomcat7.service
hasLimitAS=infinity
anyways (and from checking the PID’s limits file in/proc
it seems to be applied) - Then I noticed that the item displays were blank… so I checked the database info and saw there were some unfinished migrations
- I’m not entirely sure if it’s related, but I tried to delete the old migrations and then force running the ignored ones like when we upgraded to DSpace 5.8 in 2018-06 and then after restarting Tomcat I could see the item displays again
- I copied the 2019 Solr statistics core from CGSpace to DSpace Test and it works (and is only 5.5GB currently), so now we have some useful stats on DSpace Test for the CUA module and the dspace-statistics-api
- I ran DSpace’s cleanup task on CGSpace (linode18) and there were errors:
$ dspace cleanup -v
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(164496) is still referenced from table "bundle".
- The solution is, as always:
# su - postgres
$ psql dspacetest -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (164496);'
UPDATE 1
2019-03-18
- I noticed that the regular expression for validating lines from input files in my
agrovoc-lookup.py
script was skipping characters with accents, etc, so I changed it to use the\w
character class for words instead of trying to match[A-Z]
etc…- We have a Spanish and French subjects so this is very important
- Also there were some subjects with apostrophes, dashes, and periods… these are probably invalid AGROVOC subject terms, but we should save them to the rejects file instead of skipping them nevertheless
- Dump top 1500 subjects from CGSpace to try one more time to generate a list of invalid terms using my
agrovoc-lookup.py
script:
dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2019-03-18-top-1500-subject.csv WITH CSV HEADER;
COPY 1500
dspace=# \q
$ csvcut -c text_value /tmp/2019-03-18-top-1500-subject.csv > 2019-03-18-top-1500-subject.csv
$ ./agrovoc-lookup.py -l en -i 2019-03-18-top-1500-subject.csv -om /tmp/en-subjects-matched.txt -or /tmp/en-subjects-unmatched.txt
$ ./agrovoc-lookup.py -l es -i 2019-03-18-top-1500-subject.csv -om /tmp/es-subjects-matched.txt -or /tmp/es-subjects-unmatched.txt
$ ./agrovoc-lookup.py -l fr -i 2019-03-18-top-1500-subject.csv -om /tmp/fr-subjects-matched.txt -or /tmp/fr-subjects-unmatched.txt
$ cat /tmp/*-subjects-matched.txt | sort -u > /tmp/subjects-matched-sorted.txt
$ wc -l /tmp/subjects-matched-sorted.txt
1318 /tmp/subjects-matched-sorted.txt
$ sort -u 2019-03-18-top-1500-subject.csv > /tmp/1500-subjects-sorted.txt
$ comm -13 /tmp/subjects-matched-sorted.txt /tmp/1500-subjects-sorted.txt > 2019-03-18-subjects-unmatched.txt
$ wc -l 2019-03-18-subjects-unmatched.txt
182 2019-03-18-subjects-unmatched.txt
- So the new total of matched terms with the updated regex is 1317 and unmatched is 183 (previous number of matched terms was 1187)
- Create and merge a pull request to update the controlled vocabulary for AGROVOC terms (#416)
- We are getting the blank page issue on CGSpace again today and I see a
large numberof the “SQL QueryTable Error” in the DSpace log again (last time was 2019-03-15):
$ grep -c 'SQL QueryTable Error' dspace.log.2019-03-1[5678]
dspace.log.2019-03-15:929
dspace.log.2019-03-16:67
dspace.log.2019-03-17:72
dspace.log.2019-03-18:1038
- Though WTF, this grep seems to be giving weird inaccurate results actually, and the real number of errors is much lower:
$ grep -I 'SQL QueryTable Error' dspace.log.2019-03-18 | wc -l
8
$ grep -I 'SQL QueryTable Error' dspace.log.2019-03-{08,14,15,16,17,18} | awk -F: '{print $1}' | sort | uniq -c
9 dspace.log.2019-03-08
25 dspace.log.2019-03-14
12 dspace.log.2019-03-15
67 dspace.log.2019-03-16
72 dspace.log.2019-03-17
8 dspace.log.2019-03-18
- It seems to be something with grep doing binary matching on some log files for some reason, so I guess I need to always use
-I
to say binary files don’t match - Anyways, the full error in DSpace’s log is:
2019-03-18 12:26:23,331 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL QueryTable Error -
java.sql.SQLException: Connection org.postgresql.jdbc.PgConnection@75eaa668 is closed.
at org.apache.tomcat.dbcp.dbcp.DelegatingConnection.checkOpen(DelegatingConnection.java:398)
at org.apache.tomcat.dbcp.dbcp.DelegatingConnection.prepareStatement(DelegatingConnection.java:279)
at org.apache.tomcat.dbcp.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.prepareStatement(PoolingDataSource.java:313)
at org.dspace.storage.rdbms.DatabaseManager.queryTable(DatabaseManager.java:220)
- There is a low number of connections to PostgreSQL currently:
$ psql -c 'select * from pg_stat_activity' | wc -l
33
$ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c
6 dspaceApi
7 dspaceCli
15 dspaceWeb
- I looked in the PostgreSQL logs, but all I see are a bunch of these errors going back two months to January:
2019-01-13 06:25:13.062 CET [9157] postgres@template1 ERROR: column "waiting" does not exist at character 217
- This is unrelated and apparently due to Munin checking a column that was changed in PostgreSQL 9.6
- I suspect that this issue with the blank pages might not be PostgreSQL after all, perhaps it’s a Cocoon thing?
- Looking in the cocoon logs I see a large number of warnings about “Can not load requested doc” around 11AM and 12PM:
$ grep 'Can not load requested doc' cocoon.log.2019-03-18 | grep -oE '2019-03-18 [0-9]{2}:' | sort | uniq -c
2 2019-03-18 00:
6 2019-03-18 02:
3 2019-03-18 04:
1 2019-03-18 05:
1 2019-03-18 07:
2 2019-03-18 08:
4 2019-03-18 09:
5 2019-03-18 10:
863 2019-03-18 11:
203 2019-03-18 12:
14 2019-03-18 13:
1 2019-03-18 14:
- And a few days ago on 2019-03-15 when I happened last it was in the afternoon when it happened and the same pattern occurs then around 1–2PM:
$ xzgrep 'Can not load requested doc' cocoon.log.2019-03-15.xz | grep -oE '2019-03-15 [0-9]{2}:' | sort | uniq -c
4 2019-03-15 01:
3 2019-03-15 02:
1 2019-03-15 03:
13 2019-03-15 04:
1 2019-03-15 05:
2 2019-03-15 06:
3 2019-03-15 07:
27 2019-03-15 09:
9 2019-03-15 10:
3 2019-03-15 11:
2 2019-03-15 12:
531 2019-03-15 13:
274 2019-03-15 14:
4 2019-03-15 15:
75 2019-03-15 16:
5 2019-03-15 17:
5 2019-03-15 18:
6 2019-03-15 19:
2 2019-03-15 20:
4 2019-03-15 21:
3 2019-03-15 22:
1 2019-03-15 23:
- And again on 2019-03-08, surprise surprise, it happened in the morning:
$ xzgrep 'Can not load requested doc' cocoon.log.2019-03-08.xz | grep -oE '2019-03-08 [0-9]{2}:' | sort | uniq -c
11 2019-03-08 01:
3 2019-03-08 02:
1 2019-03-08 03:
2 2019-03-08 04:
1 2019-03-08 05:
1 2019-03-08 06:
1 2019-03-08 08:
425 2019-03-08 09:
432 2019-03-08 10:
717 2019-03-08 11:
59 2019-03-08 12:
- I’m not sure if it’s cocoon or that’s just a symptom of something else
2019-03-19
- I found a handful of AGROVOC subjects that use a non-breaking space (0x00a0) instead of a regular space, which makes for a pretty confusing debugging…
- I will replace these in the database immediately to save myself the headache later:
dspace=# SELECT count(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id = 57 AND text_value ~ '.+\u00a0.+';
count
-------
84
(1 row)
- Perhaps my
agrovoc-lookup.py
script could notify if it finds these because they potentially give false negatives - CGSpace (linode18) is having problems with Solr again, I’m seeing “Error opening new searcher” in the Solr logs and there are no stats for previous years
- Apparently the Solr statistics shards didn’t load properly when we restarted Tomcat yesterday:
2019-03-18 12:32:39,799 ERROR org.apache.solr.core.CoreContainer @ Error creating core [statistics-2018]: Error opening new searcher
...
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:845)
... 31 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2018/data/index/write.lock
- For reference, I don’t see the
ulimit -v unlimited
in thecatalina.sh
script, though thetomcat7
systemd service hasLimitAS=infinity
- The limits of the current Tomcat java process are:
# cat /proc/27182/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 128589 128589 processes
Max open files 16384 16384 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 128589 128589 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
- I will try to add
ulimit -v unlimited
to the Catalina startup script and check the output of the limits to see if it’s different in practice, as some wisdom on Stack Overflow says this solves the Solr core issues and I’ve superstitiously tried it various times in the past- The result is the same before and after, so adding the ulimit directly is unneccessary (whether or not unlimited address space is useful or not is another question)
- For now I will just stop Tomcat, delete Solr locks, then start Tomcat again:
# systemctl stop tomcat7
# find /home/cgspace.cgiar.org/solr/ -iname "*.lock" -delete
# systemctl start tomcat7
- After restarting I confirmed that all Solr statistics cores were loaded successfully…
- Another avenue might be to look at point releases in Solr 4.10.x, as we’re running 4.10.2 and they released 4.10.3 and 4.10.4 back in 2014 or 2015
- I see several issues regarding locks and IndexWriter that were fixed in Solr and Lucene 4.10.3 and 4.10.4…
- I sent a mail to the dspace-tech mailing list to ask about Solr issues