diff --git a/content/posts/2018-11.md b/content/posts/2018-11.md
index 60d69d1a1..cabad2045 100644
--- a/content/posts/2018-11.md
+++ b/content/posts/2018-11.md
@@ -274,4 +274,10 @@ $ time ./rest-find-collections.py 10568/27629 --rest-url https://dspacetest.cgia
 - Update my [dspace-statistics-api](https://github.com/ilri/dspace-statistics-api) to use a database management class with Python contexts so that connections and cursors are automatically opened and closed
 - Tag version 0.7.0 of the dspace-statistics-api
+## 2018-11-08
+
+- I deployed version 0.7.0 of the dspace-statistics-api on DSpace Test (linode19) so I can test it for a few days (and check the Munin stats to see the change in database connections) before deploying it on CGSpace
+- I also enabled systemd's persistent journal by setting [`Storage=persistent` in *journald.conf*](https://www.freedesktop.org/software/systemd/man/journald.conf.html)
+- Apparently [Ubuntu 16.04 defaulted to using rsyslog for boot records until early 2018](https://www.freedesktop.org/software/systemd/man/journald.conf.html), so I removed `rsyslog` too
+
diff --git a/docs/2015-11/index.html b/docs/2015-11/index.html
index 7777389e1..0bd512169 100644
--- a/docs/2015-11/index.html
+++ b/docs/2015-11/index.html
@@ -16,8 +16,6 @@ Last week I had increased the limit from 30 to 60, which seemed to help, but now
 $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
 78
-
-
 " />
@@ -35,10 +33,8 @@ Last week I had increased the limit from 30 to 60, which seemed to help, but now
 $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
 78
-
-
 "/>
-
+
@@ -128,8 +124,6 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
 78
-
-
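For reference, the "database management class with Python contexts" mentioned in the 2018-11 post above works roughly like the sketch below. This is a minimal, hypothetical example — the class, table, and DSN names are assumptions, not the actual dspace-statistics-api code — but it shows how `with` blocks make connections and cursors open and close automatically:

```python
# Minimal sketch of a context-managed database helper (assumed names;
# the real implementation lives in the dspace-statistics-api repository).
import psycopg2


class DatabaseManager:
    """Open a PostgreSQL connection on entry and close it on exit."""

    def __init__(self, dsn):
        self.dsn = dsn

    def __enter__(self):
        self._connection = psycopg2.connect(self.dsn)
        return self._connection

    def __exit__(self, exc_type, exc_value, exc_traceback):
        self._connection.close()


# The connection and cursor are both cleaned up automatically when the
# with-blocks exit, even if an exception is raised inside them.
with DatabaseManager("dbname=dspacestatistics") as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM items")  # hypothetical table
        print(cursor.fetchone())
```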
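Likewise, enabling the persistent journal and dropping rsyslog only takes a couple of commands. A sketch, assuming the stock Ubuntu 16.04 journald.conf with its commented-out `#Storage=auto` default:

```console
$ sudo sed -i 's/^#Storage=auto/Storage=persistent/' /etc/systemd/journald.conf
$ sudo systemctl restart systemd-journald
$ sudo apt remove rsyslog
```

With `Storage=persistent` journald keeps its logs under /var/log/journal across reboots, so the duplicate copies rsyslog was writing to /var/log/syslog are no longer needed.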

diff --git a/docs/2015-12/index.html b/docs/2015-12/index.html
index e20e43735..9f44e2584 100644
--- a/docs/2015-12/index.html
+++ b/docs/2015-12/index.html
@@ -17,8 +17,6 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
 -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
 -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
 -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
-
-
 " />
@@ -37,10 +35,8 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
 -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
 -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
 -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
-
-
 "/>
-
+
@@ -131,8 +127,6 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
 -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
-

Run start time: 03/06/2016 04:00:22
 Error retrieving bitstream ID 71274 from asset store.
 java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290601546459645925328536011917633626 (Too many open files)
diff --git a/docs/2016-05/index.html b/docs/2016-05/index.html
index b2bef7aa9..2175f48ab 100644
--- a/docs/2016-05/index.html
+++ b/docs/2016-05/index.html
@@ -16,8 +16,6 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
 
 # awk '{print $1}' /var/log/nginx/rest.log  | uniq | wc -l
 3168
-
-
 " />
 
 
@@ -35,10 +33,8 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
 
 # awk '{print $1}' /var/log/nginx/rest.log  | uniq | wc -l
 3168
-
-
 "/>
-
+
 
 
     
@@ -128,8 +124,6 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
 3168
 
-

- -

-
dspacetest=# update metadatavalue set metadata_field_id=130 where metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
 UPDATE 497
 dspacetest=# update metadatavalue set metadata_field_id=29 where metadata_field_id=75;
diff --git a/docs/2016-07/index.html b/docs/2016-07/index.html
index a8977b88e..dcdbd6e3e 100644
--- a/docs/2016-07/index.html
+++ b/docs/2016-07/index.html
@@ -23,8 +23,6 @@ dspacetest=# select text_value from  metadatavalue where metadata_field_id=3 and
 
 
 In this case the select query was showing 95 results before the update
-
-
 " />
 
 
@@ -49,10 +47,8 @@ dspacetest=# select text_value from  metadatavalue where metadata_field_id=3 and
 
 
 In this case the select query was showing 95 results before the update
-
-
 "/>
-
+
 
 
     
@@ -149,8 +145,6 @@ dspacetest=# select text_value from  metadatavalue where metadata_field_id=3 and
 
 In this case the select query was showing 95 results before the update
-
-
 # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Nov/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
     1300 66.249.64.63
     1384 35.237.175.180
@@ -420,6 +414,14 @@ Today these are the top 10 IPs:
 Tag version 0.7.0 of the dspace-statistics-api
+
+2018-11-08
+
diff --git a/docs/404.html b/docs/404.html
index 303ed1f8f..0ad28f0bf 100644
@@ -13,7 +13,7 @@
-
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index f0adf6a89..d6bb6de30 100644
@@ -13,7 +13,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index b0585df82..35ab84955 100644
diff --git a/docs/categories/notes/index.xml b/docs/categories/notes/index.xml
index 88149ec10..15ff6958d 100644
@@ -17,9 +17,7 @@
 Mon, 18 Sep 2017 16:38:35 +0300
 https://alanorth.github.io/cgspace-notes/cgiar-library-migration/
-<p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p>
-
-<p></p>
+<p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p>
diff --git a/docs/categories/page/2/index.html b/docs/categories/page/2/index.html
index 5e57bf1be..acd4ad6e9 100644
diff --git a/docs/categories/page/3/index.html b/docs/categories/page/3/index.html
index 0b42823c2..8b0fa09d7 100644
diff --git a/docs/categories/page/4/index.html b/docs/categories/page/4/index.html
index b1f8d25a0..09828816f 100644
diff --git a/docs/cgiar-library-migration/index.html b/docs/cgiar-library-migration/index.html
index 60ebec85c..60aae80f0 100644
diff --git a/docs/index.html b/docs/index.html
index 23bec0f5d..154e1cc35 100644
diff --git a/docs/index.xml b/docs/index.xml
index 6ca38da91..698443a04 100644
@@ -29,9 +29,7 @@
 <ul>
 <li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li>
 <li>Today these are the top 10 IPs:</li>
-</ul>
-
-<p></p>
+</ul>
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index a9f723757..3ab66ce9a 100644

    Read more → @@ -202,8 +200,6 @@ dspace.log.2018-01-02:34
  • PostgreSQL activity says there are 115 connections currently
  • The list of connections to XMLUI and REST API for today:
  • - -

    Read more → @@ -244,8 +240,6 @@ dspace.log.2018-01-02:34
    dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
     COPY 54701
     
    - -

    Read more → @@ -276,8 +270,6 @@ COPY 54701
  • There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
  • Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
  • - -

    Read more → @@ -296,8 +288,6 @@ COPY 54701

    Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called CGIAR System Organization.

    - -

    Read more → @@ -326,8 +316,6 @@ COPY 54701 - -

    Read more → @@ -368,8 +356,6 @@ COPY 54701
  • I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
  • Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
  • - -

    Read more → @@ -400,8 +386,6 @@ COPY 54701
  • Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
  • We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:
  • - -

    Read more → diff --git a/docs/page/3/index.html b/docs/page/3/index.html index 7b5b2da3b..e32f297e9 100644 --- a/docs/page/3/index.html +++ b/docs/page/3/index.html @@ -14,7 +14,7 @@ - + @@ -114,8 +114,6 @@
    $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
     
    - -

    Read more → @@ -156,8 +154,6 @@
    $ identify ~/Desktop/alc_contrastes_desafios.jpg
     /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
     
    - -

    Read more → @@ -196,8 +192,6 @@ DELETE 1
  • Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
  • Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
  • - -

    Read more → @@ -222,8 +216,6 @@ DELETE 1
  • I tested on DSpace Test as well and it doesn’t work there either
  • I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years
  • - -

    Read more → @@ -260,8 +252,6 @@ DELETE 1
  • I’ve raised a ticket with Atmire to ask
  • Another worrying error from dspace.log is:
  • - -

    Read more → @@ -286,8 +276,6 @@ DELETE 1

    Listings and Reports with output type

    - -

    Read more → @@ -320,8 +308,6 @@ DELETE 1
    0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
     
    - -

    Read more → @@ -350,8 +336,6 @@ DELETE 1
    $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
     
    - -

    Read more → @@ -384,8 +368,6 @@ DELETE 1 $ git reset --hard ilri/5_x-prod $ git rebase -i dspace-5.5 - -

    Read more → @@ -421,8 +403,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and - -

    Read more → diff --git a/docs/page/4/index.html b/docs/page/4/index.html index 85344c20f..caf0a14c1 100644 --- a/docs/page/4/index.html +++ b/docs/page/4/index.html @@ -14,7 +14,7 @@ - + @@ -108,8 +108,6 @@
  • You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
  • Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship
  • - -

    Read more → @@ -138,8 +136,6 @@
    # awk '{print $1}' /var/log/nginx/rest.log  | uniq | wc -l
     3168
     
    - -

    Read more → @@ -166,8 +162,6 @@
  • This will save us a few gigs of backup space we’re paying for on S3
  • Also, I noticed the checker log has some errors we should pay attention to:
  • - -

    Read more → @@ -192,8 +186,6 @@
  • For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
  • Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
  • - -

    Read more → @@ -225,8 +217,6 @@
  • Not only are there 49,000 countries, we have some blanks (25)…
  • Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
  • - -

    Read more → @@ -251,8 +241,6 @@
  • I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
  • Update GitHub wiki for documentation of maintenance tasks.
  • - -

    Read more → @@ -282,8 +270,6 @@ -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz - -

    Read more → @@ -312,8 +298,6 @@
    $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
     78
     
    - -

    Read more → diff --git a/docs/posts/index.html b/docs/posts/index.html index 3cef573e3..8b858b9ed 100644 --- a/docs/posts/index.html +++ b/docs/posts/index.html @@ -14,7 +14,7 @@ - + @@ -111,8 +111,6 @@
  • Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
  • Today these are the top 10 IPs:
  • - -

    Read more → @@ -136,8 +134,6 @@
  • Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
  • I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now
  • - -

    Read more → @@ -163,8 +159,6 @@
  • Also, I’ll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month
  • I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:
  • - -

    Read more → @@ -201,8 +195,6 @@
  • The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes
  • I ran all system updates on DSpace Test and rebooted it
  • - -

    Read more → @@ -235,8 +227,6 @@
    There is insufficient memory for the Java Runtime Environment to continue.
     
    - -

    Read more → @@ -280,8 +270,6 @@ real 74m42.646s user 8m5.056s sys 2m7.289s - -

    Read more → @@ -311,8 +299,6 @@ sys 2m7.289s
  • Then I reduced the JVM heap size from 6144 back to 5120m
  • Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use
  • - -

    Read more → @@ -336,8 +322,6 @@ sys 2m7.289s
  • I tried to test something on DSpace Test but noticed that it’s down since god knows when
  • Catalina logs at least show some memory errors yesterday:
  • - -

    Read more → @@ -360,8 +344,6 @@ sys 2m7.289s - -

    Read more → @@ -387,8 +369,6 @@ sys 2m7.289s
  • Yesterday I figured out how to monitor DSpace sessions using JMX
  • I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
  • - -

    Read more → diff --git a/docs/posts/index.xml b/docs/posts/index.xml index f84ea936a..ff20bc24c 100644 --- a/docs/posts/index.xml +++ b/docs/posts/index.xml @@ -29,9 +29,7 @@ <ul> <li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li> <li>Today these are the top 10 IPs:</li> -</ul> - -<p></p> +</ul> @@ -45,9 +43,7 @@ <ul> <li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li> <li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I&rsquo;m super busy in Nairobi right now</li> -</ul> - -<p></p> +</ul> @@ -63,9 +59,7 @@ <li>I&rsquo;ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li> <li>Also, I&rsquo;ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system&rsquo;s RAM, and we never re-ran them after migrating to larger Linodes last month</li> <li>I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&rsquo;m getting those autowire errors in Tomcat 8.5.30 again:</li> -</ul> - -<p></p> +</ul> @@ -92,9 +86,7 @@ <li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li> <li>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</li> <li>I ran all system updates on DSpace Test and rebooted it</li> -</ul> - -<p></p> +</ul> @@ -117,9 +109,7 @@ </ul> <pre><code>There is insufficient memory for the Java Runtime Environment to continue. 
-</code></pre> - -<p></p> +</code></pre> @@ -153,9 +143,7 @@ real 74m42.646s user 8m5.056s sys 2m7.289s -</code></pre> - -<p></p> +</code></pre> @@ -175,9 +163,7 @@ sys 2m7.289s </ul></li> <li>Then I reduced the JVM heap size from 6144 back to 5120m</li> <li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li> -</ul> - -<p></p> +</ul> @@ -191,9 +177,7 @@ sys 2m7.289s <ul> <li>I tried to test something on DSpace Test but noticed that it&rsquo;s down since god knows when</li> <li>Catalina logs at least show some memory errors yesterday:</li> -</ul> - -<p></p> +</ul> @@ -206,9 +190,7 @@ sys 2m7.289s <ul> <li>Export a CSV of the IITA community metadata for Martin Mueller</li> -</ul> - -<p></p> +</ul> @@ -224,9 +206,7 @@ sys 2m7.289s <li>We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list</li> <li>Yesterday I figured out how to monitor DSpace sessions using JMX</li> <li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu&rsquo;s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-01/">in 2018-01</a></li> -</ul> - -<p></p> +</ul> @@ -311,9 +291,7 @@ dspace.log.2018-01-02:34 <ul> <li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</li> -</ul> - -<p></p> +</ul> @@ -329,9 +307,7 @@ dspace.log.2018-01-02:34 <li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li> <li>PostgreSQL activity says there are 115 connections currently</li> <li>The list of connections to XMLUI and REST API for today:</li> -</ul> - -<p></p> +</ul> @@ -362,9 +338,7 @@ dspace.log.2018-01-02:34 <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; COPY 54701 -</code></pre> - -<p></p> +</code></pre> @@ -385,9 +359,7 @@ COPY 54701 <ul> <li>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li> <li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li> -</ul> - -<p></p> +</ul> @@ -396,9 +368,7 @@ COPY 54701 Mon, 18 Sep 2017 16:38:35 +0300 https://alanorth.github.io/cgspace-notes/cgiar-library-migration/ - <p>Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p> - -<p></p> + <p>Rough notes for importing the CGIAR Library content. 
It was decided that this content would go to a new top-level community called <em>CGIAR System Organization</em>.</p> @@ -417,9 +387,7 @@ COPY 54701 <ul> <li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li> -</ul> - -<p></p> +</ul> @@ -450,9 +418,7 @@ COPY 54701 <li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li> <li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li> <li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li> -</ul> - -<p></p> +</ul> @@ -473,9 +439,7 @@ COPY 54701 <li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li> <li>Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li> <li>We can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> -</ul> - -<p></p> +</ul> @@ -517,9 +481,7 @@ COPY 54701 </ul> <pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt -</code></pre> - -<p></p> +</code></pre> @@ -550,9 +512,7 @@ COPY 54701 <pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 -</code></pre> - -<p></p> +</code></pre> @@ -581,9 +541,7 @@ DELETE 1 <ul> <li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li> <li>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li> -</ul> - -<p></p> +</ul> @@ -598,9 +556,7 @@ DELETE 1 <li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li> <li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li> <li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li> -</ul> - -<p></p> +</ul> @@ -627,9 +583,7 @@ DELETE 1 <li>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</li> <li>I&rsquo;ve raised a ticket with Atmire to ask</li> <li>Another worrying error from dspace.log is:</li> -</ul> - -<p></p> +</ul> @@ -644,9 +598,7 @@ DELETE 1 <li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li> </ul> -<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p> - -<p></p> +<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p> @@ -669,9 +621,7 @@ DELETE 1 </ul> <pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X -</code></pre> - -<p></p> +</code></pre> @@ -690,9 +640,7 @@ DELETE 1 </ul> <pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D 
&quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot; -</code></pre> - -<p></p> +</code></pre> @@ -715,9 +663,7 @@ DELETE 1 <pre><code>$ git checkout -b 55new 5_x-prod $ git reset --hard ilri/5_x-prod $ git rebase -i dspace-5.5 -</code></pre> - -<p></p> +</code></pre> @@ -743,9 +689,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <ul> <li>In this case the select query was showing 95 results before the update</li> -</ul> - -<p></p> +</ul> @@ -763,9 +707,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> <li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> <li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li> -</ul> - -<p></p> +</ul> @@ -784,9 +726,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l 3168 -</code></pre> - -<p></p> +</code></pre> @@ -803,9 +743,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <li>After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, leave alone one from last year!</li> <li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li> <li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li> -</ul> - -<p></p> +</ul> @@ -820,9 +758,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <li>Looking at issues with author authorities on CGSpace</li> <li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li> <li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li> -</ul> - -<p></p> +</ul> @@ -844,9 +780,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <ul> <li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li> <li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li> -</ul> - -<p></p> +</ul> @@ -861,9 +795,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li> <li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li> <li>Update GitHub wiki for documentation of <a 
href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li> -</ul> - -<p></p> +</ul> @@ -883,9 +815,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -</code></pre> - -<p></p> +</code></pre> @@ -904,9 +834,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace 78 -</code></pre> - -<p></p> +</code></pre> diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html index c85a3bcff..2657609a0 100644 --- a/docs/posts/page/2/index.html +++ b/docs/posts/page/2/index.html @@ -14,7 +14,7 @@ - + @@ -175,8 +175,6 @@ dspace.log.2018-01-02:34 - -

    Read more → @@ -202,8 +200,6 @@ dspace.log.2018-01-02:34
  • PostgreSQL activity says there are 115 connections currently
  • The list of connections to XMLUI and REST API for today:
  • - -

    Read more → @@ -244,8 +240,6 @@ dspace.log.2018-01-02:34
    dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
     COPY 54701
     
    - -

    Read more → @@ -276,8 +270,6 @@ COPY 54701
  • There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
  • Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
  • - -

    Read more → @@ -296,8 +288,6 @@ COPY 54701

    Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called CGIAR System Organization.

    - -

    Read more → @@ -326,8 +316,6 @@ COPY 54701 - -

    Read more → @@ -368,8 +356,6 @@ COPY 54701
  • I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
  • Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
  • - -

    Read more → @@ -400,8 +386,6 @@ COPY 54701
  • Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
  • We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:
  • - -

    Read more → diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html index 026a26f14..3f54449a2 100644 --- a/docs/posts/page/3/index.html +++ b/docs/posts/page/3/index.html @@ -14,7 +14,7 @@ - + @@ -114,8 +114,6 @@
    $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
     
    - -

    Read more → @@ -156,8 +154,6 @@
    $ identify ~/Desktop/alc_contrastes_desafios.jpg
     /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
     
    - -

    Read more → @@ -196,8 +192,6 @@ DELETE 1
  • Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
  • Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
  • - -

    Read more → @@ -222,8 +216,6 @@ DELETE 1
  • I tested on DSpace Test as well and it doesn’t work there either
  • I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years
  • - -

    Read more → @@ -260,8 +252,6 @@ DELETE 1
  • I’ve raised a ticket with Atmire to ask
  • Another worrying error from dspace.log is:
  • - -

    Read more → @@ -286,8 +276,6 @@ DELETE 1

    Listings and Reports with output type

    - -

    Read more → @@ -320,8 +308,6 @@ DELETE 1
    0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
     
    - -

    Read more → @@ -350,8 +336,6 @@ DELETE 1
    $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
     
    - -

    Read more → @@ -384,8 +368,6 @@ DELETE 1 $ git reset --hard ilri/5_x-prod $ git rebase -i dspace-5.5 - -

    Read more → @@ -421,8 +403,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and - -

    Read more → diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html index 3236a49f0..7400f0ccd 100644 --- a/docs/posts/page/4/index.html +++ b/docs/posts/page/4/index.html @@ -14,7 +14,7 @@ - + @@ -108,8 +108,6 @@
  • You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
  • Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship
  • - -

    Read more → @@ -138,8 +136,6 @@
    # awk '{print $1}' /var/log/nginx/rest.log  | uniq | wc -l
     3168
     
    - -

    Read more → @@ -166,8 +162,6 @@
  • This will save us a few gigs of backup space we’re paying for on S3
  • Also, I noticed the checker log has some errors we should pay attention to:
  • - -

    Read more → @@ -192,8 +186,6 @@
  • For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
  • Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
  • - -

    Read more → @@ -225,8 +217,6 @@
  • Not only are there 49,000 countries, we have some blanks (25)…
  • Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
  • - -

    Read more → @@ -251,8 +241,6 @@
  • I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
  • Update GitHub wiki for documentation of maintenance tasks.
  • - -

    Read more → @@ -282,8 +270,6 @@ -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz - -

    Read more → @@ -312,8 +298,6 @@
    $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
     78
     
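    grep -c only yields a single number; on PostgreSQL 9.2+ the state column makes it easy to break the connections down by database and state in one query instead:

    $ psql -c 'SELECT datname, state, count(*) AS count FROM pg_stat_activity GROUP BY datname, state ORDER BY count DESC;'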

    Read more → diff --git a/docs/robots.txt b/docs/robots.txt index 3eebe0dcf..b8338064a 100644 --- a/docs/robots.txt +++ b/docs/robots.txt @@ -41,7 +41,7 @@ Disallow: /cgspace-notes/2015-12/ Disallow: /cgspace-notes/2015-11/ Disallow: /cgspace-notes/ Disallow: /cgspace-notes/categories/ -Disallow: /cgspace-notes/tags/notes/ Disallow: /cgspace-notes/categories/notes/ +Disallow: /cgspace-notes/tags/notes/ Disallow: /cgspace-notes/posts/ Disallow: /cgspace-notes/tags/ diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 6bff85414..c00bfbc92 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -4,7 +4,7 @@ https://alanorth.github.io/cgspace-notes/2018-11/ - 2018-11-06T18:03:44+02:00 + 2018-11-07T19:20:25+02:00 @@ -194,7 +194,7 @@ https://alanorth.github.io/cgspace-notes/ - 2018-11-06T18:03:44+02:00 + 2018-11-07T19:20:25+02:00 0 @@ -203,27 +203,27 @@ 0 - - https://alanorth.github.io/cgspace-notes/tags/notes/ - 2018-11-06T18:03:44+02:00 - 0 - - https://alanorth.github.io/cgspace-notes/categories/notes/ 2018-03-09T22:10:33+02:00 0 + + https://alanorth.github.io/cgspace-notes/tags/notes/ + 2018-11-07T19:20:25+02:00 + 0 + + https://alanorth.github.io/cgspace-notes/posts/ - 2018-11-06T18:03:44+02:00 + 2018-11-07T19:20:25+02:00 0 https://alanorth.github.io/cgspace-notes/tags/ - 2018-11-06T18:03:44+02:00 + 2018-11-07T19:20:25+02:00 0 diff --git a/docs/tags/index.html b/docs/tags/index.html index a4d0067b8..343c20866 100644 --- a/docs/tags/index.html +++ b/docs/tags/index.html @@ -14,7 +14,7 @@ - + @@ -111,8 +111,6 @@
  • Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
  • Today these are the top 10 IPs:
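    The IP list itself is trimmed from this excerpt, but such a top ten is typically pulled out of the nginx logs with a pipeline along these lines (log path assumed):

    # awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -n 10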

    Read more → @@ -136,8 +134,6 @@
  • Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
  • I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now

    Read more → @@ -163,8 +159,6 @@
  • Also, I’ll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month
  • I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:
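    The autowire stack trace is trimmed from this excerpt. As for re-running only the postgresql tasks, with the Ansible infrastructure playbooks that is roughly the following, though the playbook name and host pattern here are assumptions that depend on the inventory:

    $ ansible-playbook dspace.yml --limit linode18 --tags postgresql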

    Read more → @@ -201,8 +195,6 @@
  • The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes
  • I ran all system updates on DSpace Test and rebooted it

    Read more → @@ -235,8 +227,6 @@
    There is insufficient memory for the Java Runtime Environment to continue.
     
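    That message means the JVM failed to get memory from the operating system itself, so it is about total RAM on the box rather than the -Xmx ceiling. A sketch of the usual response, with paths and sizes assumed for an 8GB host:

    # free -m
    # then cap the heap in e.g. /etc/default/tomcat7 so Tomcat, PostgreSQL, and batch jobs all fit:
    JAVA_OPTS="-Djava.awt.headless=true -Xms3072m -Xmx3072m"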

    Read more → @@ -280,8 +270,6 @@
    real    74m42.646s
    user    8m5.056s
    sys     2m7.289s

    Read more → @@ -311,8 +299,6 @@ sys 2m7.289s
  • Then I reduced the JVM heap size from 6144 back to 5120m
  • Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use

    Read more → @@ -336,8 +322,6 @@ sys 2m7.289s
  • I tried to test something on DSpace Test but noticed that it’s down since god knows when
  • Catalina logs at least show some memory errors yesterday:

    Read more → @@ -360,8 +344,6 @@ sys 2m7.289s

    Read more → @@ -387,8 +369,6 @@ sys 2m7.289s
  • Yesterday I figured out how to monitor DSpace sessions using JMX
  • I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
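    For reference, Tomcat has to expose JMX before munin's jmx_ plugins can poll it; a minimal sketch for e.g. /etc/default/tomcat7, with the port being an arbitrary choice and local.only keeping it off the network:

    JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote \
        -Dcom.sun.management.jmxremote.port=9010 \
        -Dcom.sun.management.jmxremote.local.only=true \
        -Dcom.sun.management.jmxremote.authenticate=false \
        -Dcom.sun.management.jmxremote.ssl=false"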

    Read more → diff --git a/docs/tags/notes/index.html b/docs/tags/notes/index.html index 6854409e6..f47e3e53e 100644 --- a/docs/tags/notes/index.html +++ b/docs/tags/notes/index.html @@ -14,7 +14,7 @@ - + @@ -96,8 +96,6 @@
  • Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
  • Today these are the top 10 IPs:

    Read more → @@ -121,8 +119,6 @@
  • Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
  • I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now

    Read more → @@ -148,8 +144,6 @@
  • Also, I’ll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month
  • I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:

    Read more → @@ -186,8 +180,6 @@
  • The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes
  • I ran all system updates on DSpace Test and rebooted it

    Read more → @@ -220,8 +212,6 @@
    There is insufficient memory for the Java Runtime Environment to continue.
     

    Read more → @@ -265,8 +255,6 @@
    real    74m42.646s
    user    8m5.056s
    sys     2m7.289s

    Read more → @@ -296,8 +284,6 @@ sys 2m7.289s
  • Then I reduced the JVM heap size from 6144 back to 5120m
  • Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use

    Read more → @@ -321,8 +307,6 @@ sys 2m7.289s
  • I tried to test something on DSpace Test but noticed that it’s down since god knows when
  • Catalina logs at least show some memory errors yesterday:

    Read more → @@ -345,8 +329,6 @@ sys 2m7.289s

    Read more → @@ -372,8 +354,6 @@ sys 2m7.289s
  • Yesterday I figured out how to monitor DSpace sessions using JMX
  • I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01

    Read more → diff --git a/docs/tags/notes/index.xml b/docs/tags/notes/index.xml index b6f0a76d6..b0f93b0c9 100644 --- a/docs/tags/notes/index.xml +++ b/docs/tags/notes/index.xml @@ -29,9 +29,7 @@ <ul> <li>Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage</li> <li>Today these are the top 10 IPs:</li> -</ul> - -<p></p> +</ul> @@ -45,9 +43,7 @@ <ul> <li>Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items</li> <li>I created a GitHub issue to track this <a href="https://github.com/ilri/DSpace/issues/389">#389</a>, because I&rsquo;m super busy in Nairobi right now</li> -</ul> - -<p></p> +</ul> @@ -63,9 +59,7 @@ <li>I&rsquo;ll update the DSpace role in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a> and run the updated playbooks on CGSpace and DSpace Test</li> <li>Also, I&rsquo;ll re-run the <code>postgresql</code> tasks because the custom PostgreSQL variables are dynamic according to the system&rsquo;s RAM, and we never re-ran them after migrating to larger Linodes last month</li> <li>I&rsquo;m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I&rsquo;m getting those autowire errors in Tomcat 8.5.30 again:</li> -</ul> - -<p></p> +</ul> @@ -92,9 +86,7 @@ <li>Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did a few months ago when we tried to run the whole CGSpace Solr core</li> <li>The server only has 8GB of RAM so we&rsquo;ll eventually need to upgrade to a larger one because we&rsquo;ll start starving the OS, PostgreSQL, and command line batch processes</li> <li>I ran all system updates on DSpace Test and rebooted it</li> -</ul> - -<p></p> +</ul> @@ -117,9 +109,7 @@ </ul> <pre><code>There is insufficient memory for the Java Runtime Environment to continue. 
-</code></pre> - -<p></p> +</code></pre> @@ -153,9 +143,7 @@ real 74m42.646s user 8m5.056s sys 2m7.289s -</code></pre> - -<p></p> +</code></pre> @@ -175,9 +163,7 @@ sys 2m7.289s </ul></li> <li>Then I reduced the JVM heap size from 6144 back to 5120m</li> <li>Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> to support hosts choosing which distribution they want to use</li> -</ul> - -<p></p> +</ul> @@ -191,9 +177,7 @@ sys 2m7.289s <ul> <li>I tried to test something on DSpace Test but noticed that it&rsquo;s down since god knows when</li> <li>Catalina logs at least show some memory errors yesterday:</li> -</ul> - -<p></p> +</ul> @@ -206,9 +190,7 @@ sys 2m7.289s <ul> <li>Export a CSV of the IITA community metadata for Martin Mueller</li> -</ul> - -<p></p> +</ul> @@ -224,9 +206,7 @@ sys 2m7.289s <li>We don&rsquo;t need to distinguish between internal and external works, so that makes it just a simple list</li> <li>Yesterday I figured out how to monitor DSpace sessions using JMX</li> <li>I copied the logic in the <code>jmx_tomcat_dbpools</code> provided by Ubuntu&rsquo;s <code>munin-plugins-java</code> package and used the stuff I discovered about JMX <a href="https://alanorth.github.io/cgspace-notes/cgspace-notes/2018-01/">in 2018-01</a></li> -</ul> - -<p></p> +</ul> @@ -311,9 +291,7 @@ dspace.log.2018-01-02:34 <ul> <li>Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let&rsquo;s Encrypt if it&rsquo;s just a handful of domains</li> -</ul> - -<p></p> +</ul> @@ -329,9 +307,7 @@ dspace.log.2018-01-02:34 <li>The logs say &ldquo;Timeout waiting for idle object&rdquo;</li> <li>PostgreSQL activity says there are 115 connections currently</li> <li>The list of connections to XMLUI and REST API for today:</li> -</ul> - -<p></p> +</ul> @@ -362,9 +338,7 @@ dspace.log.2018-01-02:34 <pre><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; COPY 54701 -</code></pre> - -<p></p> +</code></pre> @@ -385,9 +359,7 @@ COPY 54701 <ul> <li>There appears to be a pattern but I&rsquo;ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine</li> <li>Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections</li> -</ul> - -<p></p> +</ul> @@ -406,9 +378,7 @@ COPY 54701 <ul> <li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li> -</ul> - -<p></p> +</ul> @@ -439,9 +409,7 @@ COPY 54701 <li>This was due to newline characters in the <code>dc.description.abstract</code> column, which caused OpenRefine to choke when exporting the CSV</li> <li>I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using <code>g/^$/d</code></li> <li>Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet</li> -</ul> - -<p></p> +</ul> @@ -462,9 +430,7 @@ COPY 54701 <li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li> <li>Looking at extracting the metadata registries from 
ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li> <li>We can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> -</ul> - -<p></p> +</ul> @@ -506,9 +472,7 @@ COPY 54701 </ul> <pre><code>$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p &quot;ImageMagick PDF Thumbnail&quot; -v &gt;&amp; /tmp/filter-media-cmyk.txt -</code></pre> - -<p></p> +</code></pre> @@ -539,9 +503,7 @@ COPY 54701 <pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 -</code></pre> - -<p></p> +</code></pre> @@ -570,9 +532,7 @@ DELETE 1 <ul> <li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li> <li>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li> -</ul> - -<p></p> +</ul> @@ -587,9 +547,7 @@ DELETE 1 <li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li> <li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li> <li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li> -</ul> - -<p></p> +</ul> @@ -616,9 +574,7 @@ DELETE 1 <li>I see thousands of them in the logs for the last few months, so it&rsquo;s not related to the DSpace 5.5 upgrade</li> <li>I&rsquo;ve raised a ticket with Atmire to ask</li> <li>Another worrying error from dspace.log is:</li> -</ul> - -<p></p> +</ul> @@ -633,9 +589,7 @@ DELETE 1 <li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li> </ul> -<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p> - -<p></p> +<p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type" /></p> @@ -658,9 +612,7 @@ DELETE 1 </ul> <pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X -</code></pre> - -<p></p> +</code></pre> @@ -679,9 +631,7 @@ DELETE 1 </ul> <pre><code>$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b &quot;dc=cgiarad,dc=org&quot; -D &quot;admigration1@cgiarad.org&quot; -W &quot;(sAMAccountName=admigration1)&quot; -</code></pre> - -<p></p> +</code></pre> @@ -704,9 +654,7 @@ DELETE 1 <pre><code>$ git checkout -b 55new 5_x-prod $ git reset --hard ilri/5_x-prod $ git rebase -i dspace-5.5 -</code></pre> - -<p></p> +</code></pre> @@ -732,9 +680,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <ul> <li>In this case the select query was showing 95 results before the update</li> -</ul> - -<p></p> +</ul> @@ -752,9 +698,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> <li>You can see the others by using the OAI <code>ListSets</code> verb: <a 
href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> <li>Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in <code>dc.identifier.fund</code> to <code>cg.identifier.cpwfproject</code> and then the rest to <code>dc.description.sponsorship</code></li> -</ul> - -<p></p> +</ul> @@ -773,9 +717,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l 3168 -</code></pre> - -<p></p> +</code></pre> @@ -792,9 +734,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <li>After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, leave alone one from last year!</li> <li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li> <li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li> -</ul> - -<p></p> +</ul> @@ -809,9 +749,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <li>Looking at issues with author authorities on CGSpace</li> <li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li> <li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li> -</ul> - -<p></p> +</ul> @@ -833,9 +771,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <ul> <li>Not only are there 49,000 countries, we have some blanks (25)&hellip;</li> <li>Also, lots of things like &ldquo;COTE D`LVOIRE&rdquo; and &ldquo;COTE D IVOIRE&rdquo;</li> -</ul> - -<p></p> +</ul> @@ -850,9 +786,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <li>Move ILRI collection <code>10568/12503</code> from <code>10568/27869</code> to <code>10568/27629</code> using the <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">move_collections.sh</a> script I wrote last year.</li> <li>I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.</li> <li>Update GitHub wiki for documentation of <a href="https://github.com/ilri/DSpace/wiki/Maintenance-Tasks">maintenance tasks</a>.</li> -</ul> - -<p></p> +</ul> @@ -872,9 +806,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and -rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18 -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz -</code></pre> - -<p></p> +</code></pre> @@ -893,9 +825,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace 78 -</code></pre> - -<p></p> +</code></pre> diff --git a/docs/tags/notes/page/2/index.html b/docs/tags/notes/page/2/index.html index e3d0340ee..5a9ecff78 100644 --- a/docs/tags/notes/page/2/index.html +++ b/docs/tags/notes/page/2/index.html @@ -14,7 +14,7 @@ - + @@ -160,8 +160,6 @@ dspace.log.2018-01-02:34 - -

    Read more → @@ -187,8 +185,6 @@ dspace.log.2018-01-02:34
  • PostgreSQL activity says there are 115 connections currently
  • The list of connections to XMLUI and REST API for today:

    Read more → @@ -229,8 +225,6 @@ dspace.log.2018-01-02:34
    dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
     COPY 54701
     

    Read more → @@ -261,8 +255,6 @@ COPY 54701
  • There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
  • Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections

    Read more → @@ -291,8 +283,6 @@ COPY 54701

    Read more → @@ -333,8 +323,6 @@ COPY 54701
  • I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
  • Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet

    Read more → @@ -365,8 +353,6 @@ COPY 54701
  • Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
  • We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:
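    The exact command is trimmed from this excerpt, but the idea is something like this GNU sed incantation over psql's expanded output (the tag names here are just a sketch):

    $ psql -x -d dspacetest -c 'SELECT * FROM metadatafieldregistry' \
        | sed -e 's/^-\[ RECORD \([0-9]*\) \].*/<row id="\1">/' \
              -e 's/^\([a-z_]*\) *| \(.*\)/  <\1>\2<\/\1>/'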

    Read more → @@ -436,8 +422,6 @@ COPY 54701
    $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
     

    Read more → diff --git a/docs/tags/notes/page/3/index.html b/docs/tags/notes/page/3/index.html index 6bc16f95a..492efa756 100644 --- a/docs/tags/notes/page/3/index.html +++ b/docs/tags/notes/page/3/index.html @@ -14,7 +14,7 @@ - + @@ -106,8 +106,6 @@
    $ identify ~/Desktop/alc_contrastes_desafios.jpg
     /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
     

    Read more → @@ -146,8 +144,6 @@ DELETE 1
  • Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
  • Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name

    Read more → @@ -172,8 +168,6 @@ DELETE 1
  • I tested on DSpace Test as well and it doesn’t work there either
  • I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years

    Read more → @@ -210,8 +204,6 @@ DELETE 1
  • I’ve raised a ticket with Atmire to ask
  • Another worrying error from dspace.log is:

    Read more → @@ -236,8 +228,6 @@ DELETE 1

    Listings and Reports with output type


    Read more → @@ -270,8 +260,6 @@ DELETE 1
    0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
     

    Read more → @@ -300,8 +288,6 @@ DELETE 1
    $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
     

    Read more → @@ -334,8 +320,6 @@ DELETE 1
    $ git checkout -b 55new 5_x-prod
    $ git reset --hard ilri/5_x-prod
    $ git rebase -i dspace-5.5

    Read more → @@ -371,8 +355,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and

    Read more → @@ -400,8 +382,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
  • You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
  • Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship

    Read more → diff --git a/docs/tags/notes/page/4/index.html b/docs/tags/notes/page/4/index.html index 80c7edf2b..29d495f54 100644 --- a/docs/tags/notes/page/4/index.html +++ b/docs/tags/notes/page/4/index.html @@ -14,7 +14,7 @@ - + @@ -94,8 +94,6 @@
    # awk '{print $1}' /var/log/nginx/rest.log  | uniq | wc -l
     3168
     

    Read more → @@ -122,8 +120,6 @@
  • This will save us a few gigs of backup space we’re paying for on S3
  • Also, I noticed the checker log has some errors we should pay attention to:

    Read more → @@ -148,8 +144,6 @@
  • For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
  • Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server

    Read more → @@ -181,8 +175,6 @@
  • Not only are there 49,000 countries, we have some blanks (25)…
  • Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”

    Read more → @@ -207,8 +199,6 @@
  • I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
  • Update GitHub wiki for documentation of maintenance tasks.

    Read more → @@ -238,8 +228,6 @@
    -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
    -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz

    Read more → @@ -268,8 +256,6 @@
    $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
     78
     

    Read more → diff --git a/docs/tags/page/2/index.html b/docs/tags/page/2/index.html index a573329dc..8bd9c4fa3 100644 --- a/docs/tags/page/2/index.html +++ b/docs/tags/page/2/index.html @@ -14,7 +14,7 @@ - + @@ -175,8 +175,6 @@ dspace.log.2018-01-02:34 - -

    Read more → @@ -202,8 +200,6 @@ dspace.log.2018-01-02:34
  • PostgreSQL activity says there are 115 connections currently
  • The list of connections to XMLUI and REST API for today:

    Read more → @@ -244,8 +240,6 @@ dspace.log.2018-01-02:34
    dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
     COPY 54701
     

    Read more → @@ -276,8 +270,6 @@ COPY 54701
  • There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
  • Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections

    Read more → @@ -296,8 +288,6 @@ COPY 54701

    Rough notes for importing the CGIAR Library content. It was decided that this content would go to a new top-level community called CGIAR System Organization.


    Read more → @@ -326,8 +316,6 @@ COPY 54701

    Read more → @@ -368,8 +356,6 @@ COPY 54701
  • I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
  • Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet

    Read more → @@ -400,8 +386,6 @@ COPY 54701
  • Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
  • We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:

    Read more → diff --git a/docs/tags/page/3/index.html b/docs/tags/page/3/index.html index adf78f4d0..4d893c72b 100644 --- a/docs/tags/page/3/index.html +++ b/docs/tags/page/3/index.html @@ -14,7 +14,7 @@ - + @@ -114,8 +114,6 @@
    $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
     

    Read more → @@ -156,8 +154,6 @@
    $ identify ~/Desktop/alc_contrastes_desafios.jpg
     /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
     

    Read more → @@ -196,8 +192,6 @@ DELETE 1
  • Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
  • Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name

    Read more → @@ -222,8 +216,6 @@ DELETE 1
  • I tested on DSpace Test as well and it doesn’t work there either
  • I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years

    Read more → @@ -260,8 +252,6 @@ DELETE 1
  • I’ve raised a ticket with Atmire to ask
  • Another worrying error from dspace.log is:

    Read more → @@ -286,8 +276,6 @@ DELETE 1

    Listings and Reports with output type


    Read more → @@ -320,8 +308,6 @@ DELETE 1
    0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
     

    Read more → @@ -350,8 +336,6 @@ DELETE 1
    $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
     

    Read more → @@ -384,8 +368,6 @@ DELETE 1
    $ git checkout -b 55new 5_x-prod
    $ git reset --hard ilri/5_x-prod
    $ git rebase -i dspace-5.5

    Read more → @@ -421,8 +403,6 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and

    Read more → diff --git a/docs/tags/page/4/index.html b/docs/tags/page/4/index.html index 043952b68..e88417e62 100644 --- a/docs/tags/page/4/index.html +++ b/docs/tags/page/4/index.html @@ -14,7 +14,7 @@ - + @@ -108,8 +108,6 @@
  • You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
  • Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship

    Read more → @@ -138,8 +136,6 @@
    # awk '{print $1}' /var/log/nginx/rest.log  | uniq | wc -l
     3168
     

    Read more → @@ -166,8 +162,6 @@
  • This will save us a few gigs of backup space we’re paying for on S3
  • Also, I noticed the checker log has some errors we should pay attention to:

    Read more → @@ -192,8 +186,6 @@
  • For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
  • Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server

    Read more → @@ -225,8 +217,6 @@
  • Not only are there 49,000 countries, we have some blanks (25)…
  • Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”

    Read more → @@ -251,8 +241,6 @@
  • I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
  • Update GitHub wiki for documentation of maintenance tasks.

    Read more → @@ -282,8 +270,6 @@
    -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
    -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz

    Read more → @@ -312,8 +298,6 @@
    $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
     78
     

    Read more →