2018-02-11 17:28:23 +01:00
<!DOCTYPE html>
2019-10-11 10:19:42 +02:00
< html lang = "en" >
2018-02-11 17:28:23 +01:00
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
2020-12-06 15:53:29 +01:00
2018-02-11 17:28:23 +01:00
< meta property = "og:title" content = "April, 2016" / >
< meta property = "og:description" content = "2016-04-04
Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
2020-01-27 15:20:44 +01:00
After running DSpace for over five years I’ ve never needed to look in any other log file than dspace.log, leave alone one from last year!
This will save us a few gigs of backup space we’ re paying for on S3
2018-02-11 17:28:23 +01:00
Also, I noticed the checker log has some errors we should pay attention to:
" />
< meta property = "og:type" content = "article" / >
2019-02-02 13:12:57 +01:00
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/2016-04/" / >
2019-08-08 17:10:44 +02:00
< meta property = "article:published_time" content = "2016-04-04T11:06:00+03:00" / >
< meta property = "article:modified_time" content = "2018-03-09T22:10:33+02:00" / >
2018-09-30 07:23:48 +02:00
2020-12-06 15:53:29 +01:00
2018-02-11 17:28:23 +01:00
< meta name = "twitter:card" content = "summary" / >
< meta name = "twitter:title" content = "April, 2016" / >
< meta name = "twitter:description" content = "2016-04-04
Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
2020-01-27 15:20:44 +01:00
After running DSpace for over five years I’ ve never needed to look in any other log file than dspace.log, leave alone one from last year!
This will save us a few gigs of backup space we’ re paying for on S3
2018-02-11 17:28:23 +01:00
Also, I noticed the checker log has some errors we should pay attention to:
"/>
2021-06-30 11:21:16 +02:00
< meta name = "generator" content = "Hugo 0.84.2" / >
2018-02-11 17:28:23 +01:00
< script type = "application/ld+json" >
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "April, 2016",
2020-04-02 09:55:42 +02:00
"url": "https://alanorth.github.io/cgspace-notes/2016-04/",
2018-04-30 18:05:39 +02:00
"wordCount": "2006",
2019-10-11 10:19:42 +02:00
"datePublished": "2016-04-04T11:06:00+03:00",
"dateModified": "2018-03-09T22:10:33+02:00",
2018-02-11 17:28:23 +01:00
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
< / script >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/2016-04/" >
< title > April, 2016 | CGSpace Notes< / title >
2019-10-11 10:19:42 +02:00
2018-02-11 17:28:23 +01:00
<!-- combined, minified CSS -->
2020-01-23 19:19:38 +01:00
2021-01-24 08:46:27 +01:00
< link href = "https://alanorth.github.io/cgspace-notes/css/style.beb8012edc08ba10be012f079d618dc243812267efe62e11f22fe49618f976a4.css" rel = "stylesheet" integrity = "sha256-vrgBLtwIuhC+AS8HnWGNwkOBImfv5i4R8i/klhj5dqQ=" crossorigin = "anonymous" >
2019-10-11 10:19:42 +02:00
2018-02-11 17:28:23 +01:00
2020-01-28 11:01:42 +01:00
<!-- minified Font Awesome for SVG icons -->
2021-01-24 08:46:27 +01:00
< script defer src = "https://alanorth.github.io/cgspace-notes/js/fontawesome.min.ffbfea088a9a1666ec65c3a8cb4906e2a0e4f92dc70dbbf400a125ad2422123a.js" integrity = "sha256-/7/qCIqaFmbsZcOoy0kG4qDk+S3HDbv0AKElrSQiEjo=" crossorigin = "anonymous" > < / script >
2020-01-28 11:01:42 +01:00
2019-04-14 15:59:47 +02:00
<!-- RSS 2.0 feed -->
2018-02-11 17:28:23 +01:00
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link " href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
2018-12-19 12:20:39 +01:00
2018-02-11 17:28:23 +01:00
< header class = "blog-header" >
< div class = "container" >
2019-10-11 10:19:42 +02:00
< h1 class = "blog-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< p class = "lead blog-description" dir = "auto" > Documenting day-to-day work on the < a href = "https://cgspace.cgiar.org" > CGSpace< / a > repository.< / p >
2018-02-11 17:28:23 +01:00
< / div >
< / header >
2018-12-19 12:20:39 +01:00
2018-02-11 17:28:23 +01:00
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
2019-10-11 10:19:42 +02:00
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2016-04/" > April, 2016< / a > < / h2 >
2020-11-16 09:54:00 +01:00
< p class = "blog-post-meta" >
< time datetime = "2016-04-04T11:06:00+03:00" > Mon Apr 04, 2016< / time >
in
2018-02-11 17:28:23 +01:00
2020-01-28 11:01:42 +01:00
< span class = "fas fa-tag" aria-hidden = "true" > < / span > < a href = "/cgspace-notes/tags/notes/" rel = "tag" > Notes< / a >
2018-02-11 17:28:23 +01:00
< / p >
< / header >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-04" > 2016-04-04< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
< li > Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit< / li >
< li > We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc< / li >
2020-01-27 15:20:44 +01:00
< li > After running DSpace for over five years I’ ve never needed to look in any other log file than dspace.log, leave alone one from last year!< / li >
< li > This will save us a few gigs of backup space we’ re paying for on S3< / li >
2018-02-11 17:28:23 +01:00
< li > Also, I noticed the < code > checker< / code > log has some errors we should pay attention to:< / li >
< / ul >
< pre > < code > Run start time: 03/06/2016 04:00:22
Error retrieving bitstream ID 71274 from asset store.
java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290601546459645925328536011917633626 (Too many open files)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.< init> (FileInputStream.java:146)
at edu.sdsc.grid.io.local.LocalFileInputStream.open(LocalFileInputStream.java:171)
at edu.sdsc.grid.io.GeneralFileInputStream.< init> (GeneralFileInputStream.java:145)
at edu.sdsc.grid.io.local.LocalFileInputStream.< init> (LocalFileInputStream.java:139)
at edu.sdsc.grid.io.FileFactory.newFileInputStream(FileFactory.java:630)
at org.dspace.storage.bitstore.BitstreamStorageManager.retrieve(BitstreamStorageManager.java:525)
at org.dspace.checker.BitstreamDAO.getBitstream(BitstreamDAO.java:60)
at org.dspace.checker.CheckerCommand.processBitstream(CheckerCommand.java:303)
at org.dspace.checker.CheckerCommand.checkBitstream(CheckerCommand.java:171)
at org.dspace.checker.CheckerCommand.process(CheckerCommand.java:120)
at org.dspace.app.checker.ChecksumChecker.main(ChecksumChecker.java:236)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:225)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:77)
******************************************************
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
2018-02-11 17:28:23 +01:00
< li > So this would be the < code > tomcat7< / code > Unix user, who seems to have a default limit of 1024 files in its shell< / li >
2020-09-16 12:47:13 +02:00
< li > For what it’ s worth, we have been setting the actual Tomcat 7 process' limit to 16384 for a few years (in < code > /etc/default/tomcat7< / code > )< / li >
2018-02-11 17:28:23 +01:00
< li > Looks like cron will read limits from < code > /etc/security/limits.*< / code > so we can do something for the tomcat7 user there< / li >
< li > Submit pull request for Tomcat 7 limits in Ansible dspace role (< a href = "https://github.com/ilri/rmg-ansible-public/pull/30" > #30< / a > )< / li >
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-05" > 2016-04-05< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
2020-01-27 15:20:44 +01:00
< li > Reduce Amazon S3 storage used for logs from 46 GB to 6GB by deleting a bunch of logs we don’ t need!< / li >
2019-11-28 16:30:45 +01:00
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > # s3cmd ls s3://cgspace.cgiar.org/log/ > /tmp/s3-logs.txt
# grep checker.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep cocoon.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep handle-plugin.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
# grep solr.log /tmp/s3-logs.txt | awk '{print $4}' | xargs s3cmd del
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
< li > Also, adjust the cron jobs for backups so they only backup < code > dspace.log< / code > and some stats files (.dat)< / li >
< li > Try to do some metadata field migrations using the Atmire batch UI (< code > dc.Species< / code > → < code > cg.species< / code > ) but it took several hours and even missed a few records< / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-06" > 2016-04-06< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
2019-11-28 16:30:45 +01:00
< li > A better way to move metadata on this scale is via SQL, for example < code > dc.type.output< / code > → < code > dc.type< / code > (their IDs in the metadatafieldregistry are 66 and 109, respectively):< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > dspacetest=# update metadatavalue set metadata_field_id=109 where metadata_field_id=66;
UPDATE 40852
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
< li > After that an < code > index-discovery -bf< / code > is required< / li >
< li > Start working on metadata migrations, add 25 or so new metadata fields to CGSpace< / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-07" > 2016-04-07< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
< li > Write shell script to do the migration of fields: < a href = "https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b" > https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b< / a > < / li >
2019-11-28 16:30:45 +01:00
< li > Testing with a few fields it seems to work well:< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > $ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40883
UPDATE metadatavalue SET metadata_field_id=202 WHERE metadata_field_id=72
UPDATE 21420
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51258
2019-12-17 13:49:24 +01:00
< / code > < / pre > < h2 id = "2016-04-08" > 2016-04-08< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
2020-01-27 15:20:44 +01:00
< li > Discuss metadata renaming with Abenet, we decided it’ s better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF< / li >
< li > I’ ve e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change< / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-10" > 2016-04-10< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
< li > Looking at the DOI issue < a href = "https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=678507860" > reported by Leroy from CIAT a few weeks ago< / a > < / li >
2019-11-28 16:30:45 +01:00
< li > It seems the < code > dx.doi.org< / code > URLs are much more proper in our repository!< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://dx.doi.org%';
2019-11-28 16:30:45 +01:00
count
2018-02-11 17:28:23 +01:00
-------
2019-11-28 16:30:45 +01:00
5638
2018-02-11 17:28:23 +01:00
(1 row)
dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://doi.org%';
2019-11-28 16:30:45 +01:00
count
2018-02-11 17:28:23 +01:00
-------
2019-11-28 16:30:45 +01:00
3
< / code > < / pre > < ul >
< li > I will manually edit the < code > dc.identifier.doi< / code > in < a href = "https://cgspace.cgiar.org/handle/10568/72509?show=full" > 10568/72509< / a > and tweet the link, then check back in a week to see if the donut gets updated< / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-11" > 2016-04-11< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
< li > The donut is already updated and shows the correct number now< / li >
2020-01-27 15:20:44 +01:00
< li > CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we’ d do it tentatively on Monday the 18th.< / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-12" > 2016-04-12< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
2019-11-28 16:30:45 +01:00
< li > Looking at quality of WLE data (< code > cg.subject.iwmi< / code > ) in SQL:< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > dspacetest=# select text_value, count(*) from metadatavalue where metadata_field_id=217 group by text_value order by count(*) desc;
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
< li > Listings and Reports is still not returning reliable data for < code > dc.type< / code > < / li >
2020-01-27 15:20:44 +01:00
< li > I think we need to ask Atmire, as their documentation isn’ t too clear on the format of the filter configs< / li >
2019-11-28 16:30:45 +01:00
< li > Alternatively, I want to see if I move all the data from < code > dc.type.output< / code > to < code > dc.type< / code > and then re-index, if it behaves better< / li >
< li > Looking at our < code > input-forms.xml< / code > I see we have two sets of ILRI subjects, but one has a few extra subjects< / li >
< li > Remove one set of ILRI subjects and remove duplicate < code > VALUE CHAINS< / code > from existing list (< a href = "https://github.com/ilri/DSpace/pull/216" > #216< / a > )< / li >
< li > I decided to keep the set of subjects that had < code > FMD< / code > and < code > RANGELANDS< / code > added, as it appears to have been requested to have been added, and might be the newer list< / li >
< li > I found 226 blank metadatavalues:< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > dspacetest# select * from metadatavalue where resource_type_id=2 and text_value='';
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
< li > I think we should delete them and do a full re-index:< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 226
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
2020-01-27 15:20:44 +01:00
< li > I deleted them on CGSpace but I’ ll wait to do the re-index as we’ re going to be doing one in a few days for the metadata changes anyways< / li >
2019-11-28 16:30:45 +01:00
< li > In other news, moving the < code > dc.type.output< / code > to < code > dc.type< / code > and re-indexing seems to have fixed the Listings and Reports issue from above< / li >
2020-01-27 15:20:44 +01:00
< li > Unfortunately this isn’ t a very good solution, because Listings and Reports config should allow us to filter on < code > dc.type.*< / code > but the documentation isn’ t very clear and I couldn’ t reach Atmire today< / li >
2019-11-28 16:30:45 +01:00
< li > We want to do the < code > dc.type.output< / code > move on CGSpace anyways, but we should wait as it might affect other external people!< / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-14" > 2016-04-14< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
< li > Communicate with Macaroni Bros again about < code > dc.type< / code > < / li >
< li > Help Sisay with some rsync and Linux stuff< / li >
< li > Notify CIAT people of metadata changes (I had forgotten them last week)< / li >
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-15" > 2016-04-15< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
< li > DSpace Test had crashed, so I ran all system updates, rebooted, and re-deployed DSpace code< / li >
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-18" > 2016-04-18< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
< li > Talk to CIAT people about their portal again< / li >
< li > Start looking more at the fields we want to delete< / li >
< li > The following metadata fields have 0 items using them, so we can just remove them from the registry and any references in XMLUI, input forms, etc:
< ul >
< li > dc.description.abstractother< / li >
< li > dc.whatwasknown< / li >
< li > dc.whatisnew< / li >
< li > dc.description.nationalpartners< / li >
< li > dc.peerreviewprocess< / li >
< li > cg.species.animal< / li >
2019-11-28 16:30:45 +01:00
< / ul >
< / li >
2018-02-11 17:28:23 +01:00
< li > Deleted!< / li >
< li > The following fields have some items using them and I have to decide what to do with them (delete or move):
< ul >
< li > dc.icsubject.icrafsubject: 6 items, mostly in CPWF collections< / li >
< li > dc.type.journal: 11 items, mostly in ILRI collections< / li >
< li > dc.publicationcategory: 1 item, in CPWF< / li >
< li > dc.GRP: 2 items, CPWF< / li >
< li > dc.Species.animal: 6 items, in ILRI and AnGR< / li >
< li > cg.livestock.agegroup: 9 items, in ILRI collections< / li >
< li > cg.livestock.function: 20 items, mostly in EADD< / li >
2019-11-28 16:30:45 +01:00
< / ul >
< / li >
< li > Test metadata migration on local instance again:< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > $ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40885
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51330
UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
UPDATE 5986
UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
UPDATE 2456
UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
UPDATE 3872
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075
$ JAVA_OPTS=" -Xms512m -Xmx512m -Dfile.encoding=UTF-8" ~/dspace/bin/dspace index-discovery -bf
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
2020-01-27 15:20:44 +01:00
< li > CGSpace was down but I’ m not sure why, this was in < code > catalina.out< / code > :< / li >
2019-11-28 16:30:45 +01:00
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > Apr 18, 2016 7:32:26 PM com.sun.jersey.spi.container.ContainerResponse logException
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
2019-11-28 16:30:45 +01:00
at org.dspace.rest.Resource.processFinally(Resource.java:163)
at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
2018-02-11 17:28:23 +01:00
...
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
< li > Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)< / li >
< li > After restarting Tomcat a few more of these errors were logged but the application was up< / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-19" > 2016-04-19< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
2019-11-28 16:30:45 +01:00
< li > Get handles for items that are using a given metadata field, ie < code > dc.Species.animal< / code > (105):< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > # select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=105);
2019-11-28 16:30:45 +01:00
handle
2018-02-11 17:28:23 +01:00
-------------
2019-11-28 16:30:45 +01:00
10568/10298
10568/16413
10568/16774
10568/34487
< / code > < / pre > < ul >
< li > Delete metadata values for < code > dc.GRP< / code > and < code > dc.icsubject.icrafsubject< / code > :< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > # delete from metadatavalue where resource_type_id=2 and metadata_field_id=96;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=83;
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
2020-01-27 15:20:44 +01:00
< li > They are old ICRAF fields and we haven’ t used them since 2011 or so< / li >
2019-11-28 16:30:45 +01:00
< li > Also delete them from the metadata registry< / li >
< li > CGSpace went down again, < code > dspace.log< / code > had this:< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > 2016-04-19 15:02:17,025 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error -
org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
2020-01-27 15:20:44 +01:00
< li > I restarted Tomcat and PostgreSQL and now it’ s back up< / li >
2019-11-28 16:30:45 +01:00
< li > I bet this is the same crash as yesterday, but I only saw the errors in < code > catalina.out< / code > < / li >
< li > Looks to be related to this, from < code > dspace.log< / code > :< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > 2016-04-19 15:16:34,670 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
< li > We have 18,000 of these errors right now… < / li >
< li > Delete a few more old metadata values: < code > dc.Species.animal< / code > , < code > dc.type.journal< / code > , and < code > dc.publicationcategory< / code > :< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > # delete from metadatavalue where resource_type_id=2 and metadata_field_id=105;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=85;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=95;
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
< li > And then remove them from the metadata registry< / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-20" > 2016-04-20< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
< li > Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server< / li >
< li > Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server< / li >
2019-11-28 16:30:45 +01:00
< li > Field migration went well:< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > $ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40909
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51419
UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
UPDATE 5986
UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
UPDATE 2458
UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
UPDATE 3872
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
< li > Also, I migrated CGSpace to using the PGDG PostgreSQL repo as the infrastructure playbooks had been using it for a while and it seemed to be working well< / li >
< li > Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)< / li >
< li > Looking into the REST API errors again, it looks like these started appearing a few days ago in the tens of thousands:< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > $ grep -c " Aborting context in finally statement" dspace.log.2016-04-20
21252
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
2020-01-27 15:20:44 +01:00
< li > I found a recent discussion on the DSpace mailing list and I’ ve asked for advice there< / li >
< li > Looks like this issue was noted and fixed in DSpace 5.5 (we’ re on 5.1): < a href = "https://jira.duraspace.org/browse/DS-2936" > https://jira.duraspace.org/browse/DS-2936< / a > < / li >
< li > I’ ve sent a message to Atmire asking about compatibility with DSpace 5.5< / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-21" > 2016-04-21< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
< li > Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)< / li >
2020-01-27 15:20:44 +01:00
< li > Atmire responded with DSpace 5.5 compatible versions for their modules, so I’ ll start testing those in a few weeks< / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-22" > 2016-04-22< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
2020-01-27 15:20:44 +01:00
< li > Import 95 records into < a href = "https://cgspace.cgiar.org/handle/10568/42219" > CTA’ s Agrodok collection< / a > < / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-26" > 2016-04-26< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
< li > Test embargo during item upload< / li >
< li > Seems to be working but the help text is misleading as to the date format< / li >
2020-01-27 15:20:44 +01:00
< li > It turns out the < code > robots.txt< / code > issue we thought we solved last month isn’ t solved because you can’ t use wildcards in URL patterns: < a href = "https://jira.duraspace.org/browse/DS-2962" > https://jira.duraspace.org/browse/DS-2962< / a > < / li >
2018-02-11 17:28:23 +01:00
< li > Write some nginx rules to add < code > X-Robots-Tag< / code > HTTP headers to the dynamic requests from < code > robots.txt< / code > instead< / li >
< li > A few URLs to test with:
< ul >
< li > < a href = "https://dspacetest.cgiar.org/handle/10568/440/browse?type=bioversity" > https://dspacetest.cgiar.org/handle/10568/440/browse?type=bioversity< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org/handle/10568/913/discover" > https://dspacetest.cgiar.org/handle/10568/913/discover< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org/handle/10568/1/search-filter?filtertype_0=country&filter_0=VIETNAM&filter_relational_operator_0=equals&field=country" > https://dspacetest.cgiar.org/handle/10568/1/search-filter?filtertype_0=country& filter_0=VIETNAM& filter_relational_operator_0=equals& field=country< / a > < / li >
< / ul >
2019-11-28 16:30:45 +01:00
< / li >
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-27" > 2016-04-27< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
< li > I woke up to ten or fifteen “ up” and “ down” emails from the monitoring website< / li >
< li > Looks like the last one was “ down” from about four hours ago< / li >
2019-11-28 16:30:45 +01:00
< li > I think there must be something with this REST stuff:< / li >
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > # grep -c " Aborting context in finally statement" dspace.log.2016-04-*
dspace.log.2016-04-01:0
dspace.log.2016-04-02:0
dspace.log.2016-04-03:0
dspace.log.2016-04-04:0
dspace.log.2016-04-05:0
dspace.log.2016-04-06:0
dspace.log.2016-04-07:0
dspace.log.2016-04-08:0
dspace.log.2016-04-09:0
dspace.log.2016-04-10:0
dspace.log.2016-04-11:0
dspace.log.2016-04-12:235
dspace.log.2016-04-13:44
dspace.log.2016-04-14:0
dspace.log.2016-04-15:35
dspace.log.2016-04-16:0
dspace.log.2016-04-17:0
dspace.log.2016-04-18:11942
dspace.log.2016-04-19:28496
dspace.log.2016-04-20:28474
dspace.log.2016-04-21:28654
dspace.log.2016-04-22:28763
dspace.log.2016-04-23:28773
dspace.log.2016-04-24:28775
dspace.log.2016-04-25:28626
dspace.log.2016-04-26:28655
dspace.log.2016-04-27:7271
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
< li > I restarted tomcat and it is back up< / li >
< li > Add Spanish XMLUI strings so those users see “ CGSpace” instead of “ DSpace” in the user interface (< a href = "https://github.com/ilri/DSpace/pull/222" > #222< / a > )< / li >
< li > Submit patch to upstream DSpace for the misleading help text in the embargo step of the item submission: < a href = "https://jira.duraspace.org/browse/DS-3172" > https://jira.duraspace.org/browse/DS-3172< / a > < / li >
< li > Update infrastructure playbooks for nginx 1.10.x (stable) release: < a href = "https://github.com/ilri/rmg-ansible-public/issues/32" > https://github.com/ilri/rmg-ansible-public/issues/32< / a > < / li >
2020-01-27 15:20:44 +01:00
< li > Currently running on DSpace Test, we’ ll give it a few days before we adjust CGSpace< / li >
< li > CGSpace down, restarted tomcat and it’ s back up< / li >
2018-02-11 17:28:23 +01:00
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-28" > 2016-04-28< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
2020-01-27 15:20:44 +01:00
< li > Problems with stability again. I’ ve blocked access to < code > /rest< / code > for now to see if the number of errors in the log files drop< / li >
2018-02-11 17:28:23 +01:00
< li > Later we could maybe start logging access to < code > /rest< / code > and perhaps whitelist some IPs… < / li >
< / ul >
2019-12-17 13:49:24 +01:00
< h2 id = "2016-04-30" > 2016-04-30< / h2 >
2018-02-11 17:28:23 +01:00
< ul >
2020-01-27 15:20:44 +01:00
< li > Logs for today and yesterday have zero references to this REST error, so I’ m going to open back up the REST API but log all requests< / li >
2019-11-28 16:30:45 +01:00
< / ul >
2018-02-11 17:28:23 +01:00
< pre > < code > location /rest {
access_log /var/log/nginx/rest.log;
proxy_pass http://127.0.0.1:8443;
}
2019-11-28 16:30:45 +01:00
< / code > < / pre > < ul >
< li > I will check the logs again in a few days to look for patterns, see who is accessing it, etc< / li >
2018-02-11 17:28:23 +01:00
< / ul >
< / article >
< / div > <!-- /.blog - main -->
< aside class = "col-sm-3 ml-auto blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
2021-06-03 20:54:49 +02:00
< li > < a href = "/cgspace-notes/2021-06/" > June, 2021< / a > < / li >
2021-05-02 18:55:06 +02:00
< li > < a href = "/cgspace-notes/2021-05/" > May, 2021< / a > < / li >
2021-04-05 18:36:44 +02:00
< li > < a href = "/cgspace-notes/2021-04/" > April, 2021< / a > < / li >
2021-03-04 21:46:05 +01:00
< li > < a href = "/cgspace-notes/2021-03/" > March, 2021< / a > < / li >
2021-04-01 08:49:08 +02:00
< li > < a href = "/cgspace-notes/cgspace-cgcorev2-migration/" > CGSpace CG Core v2 Migration< / a > < / li >
2018-02-11 17:28:23 +01:00
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
2019-10-11 10:19:42 +02:00
< p dir = "auto" >
2018-02-11 17:28:23 +01:00
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >