diff --git a/content/posts/2020-01.md b/content/posts/2020-01.md index a76efdd54..2bfdfd1d1 100644 --- a/content/posts/2020-01.md +++ b/content/posts/2020-01.md @@ -264,4 +264,41 @@ $ ./fix-metadata-values.py -i /tmp/2020-01-22-fix-1113-affiliations.csv -db dspa $ ./delete-metadata-values.py -i /tmp/2020-01-22-delete-36-affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211 ``` +## 2020-01-26 + +- Add "Gender" to controlled vocabulary for CRPs ([#442](https://github.com/ilri/DSpace/pull/442)) +- Deploy the changes on CGSpace and run all updates on the server and reboot it + - I had to restart the `tomcat7` service several times until all Solr statistics cores came up OK +- I spent a few hours writing a script ([create-thumbnails](https://gist.github.com/alanorth/1c7c8b2131a19559e273fbc1e58d6a71)) to compare the default DSpace thumbnails with the improved parameters above and actually when comparing them at size 600px I don't really notice much difference, other than the new ones have slightly crisper text + - So that was a waste of time, though I think our 300px thumbnails are a bit small now + - [Another thread on the ImageMagick forum](https://www.imagemagick.org/discourse-server/viewtopic.php?t=14561) mentions that you need to set the density, then read the image, then set the density again: + +``` +$ convert -density 288 10568-97925.pdf\[0\] -density 72 -filter lagrange -flatten 10568-97925-density.jpg +``` + +- One thing worth mentioning was this syntax for extracting bits from JSON in bash using `jq`: + +``` +$ RESPONSE=$(curl -s 'https://dspacetest.cgiar.org/rest/handle/10568/103447?expand=bitstreams') +$ echo $RESPONSE | jq '.bitstreams[] | select(.bundleName=="ORIGINAL") | .retrieveLink' +"/bitstreams/172559/retrieve" +``` + +## 2020-01-27 + +- Bizu has been having problems when she logs into CGSpace, she can't see the community list on the front page + - This last happened for another user in [2016-11]({{< ref "2016-11.md" >}}), and it was related to the Tomcat `maxHttpHeaderSize` being too small because the user was in too many groups + - I see that it is similar, with this message appearing in the DSpace log just after she logs in: + +``` +2020-01-27 06:02:23,681 ERROR org.dspace.app.xmlui.aspect.discovery.AbstractRecentSubmissionTransformer @ Caught SearchServiceException while retrieving recent submission for: home page +org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'read:(g0 OR e610 OR g0 OR g3 OR g5 OR g4102 OR g9 OR g4105 OR g10 OR g4107 OR g4108 OR g13 OR g4109 OR g14 OR g15 OR g16 OR g18 OR g20 OR g23 OR g24 OR g2072 OR g2074 OR g28 OR g2076 OR g29 OR g2078 OR g2080 OR g34 OR g2082 OR g2084 OR g38 OR g2086 OR g2088 OR g43 OR g2093 OR g2095 OR g2097 OR g50 OR g51 OR g2101 OR g2103 OR g62 OR g65 OR g77 OR g78 OR g2127 OR g2142 OR g2151 OR g2152 OR g2153 OR g2154 OR g2156 OR g2165 OR g2171 OR g2174 OR g2175 OR g129 OR g2178 OR g2182 OR g2186 OR g153 OR g155 OR g158 OR g166 OR g167 OR g168 OR g169 OR g2225 OR g179 OR g2227 OR g2229 OR g183 OR g2231 OR g184 OR g2233 OR g186 OR g2235 OR g2237 OR g191 OR g192 OR g193 OR g2242 OR g2244 OR g2246 OR g2250 OR g204 OR g205 OR g207 OR g208 OR g2262 OR g2265 OR g218 OR g2268 OR g222 OR g223 OR g2271 OR g2274 OR g2277 OR g230 OR g231 OR g2280 OR g2283 OR g238 OR g2286 OR g241 OR g2289 OR g244 OR g2292 OR g2295 OR g2298 OR g2301 OR g254 OR g255 OR g2305 OR g2308 OR g262 OR g2311 OR g265 OR g268 OR g269 OR g273 OR g276 OR g277 OR g279 OR g282 OR g292 OR g293 OR 
g296 OR g297 OR g301 OR g303 OR g305 OR g2353 OR g310 OR g311 OR g313 OR g321 OR g325 OR g328 OR g333 OR g334 OR g342 OR g343 OR g345 OR g348 OR g2409 [...] ': too many boolean clauses
+```
+
+- Now this appears to be a Solr limit of some kind ("too many boolean clauses")
+  - I changed the `maxBooleanClauses` for all Solr cores on DSpace Test from 1024 to 2048 and then she was able to see her communities...
+  - I made a [pull request](https://github.com/ilri/DSpace/pull/443) and merged it to the `5_x-prod` branch and will deploy on CGSpace later tonight
+  - I am curious if anyone on the dspace-tech mailing list has run into this, so I will send a message there when I get a chance
+
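As a reference for the `maxBooleanClauses` change above, a rough sketch of how the limit could be bumped by hand on a test server before making it permanent in the repository. The path and the service name are assumptions (the pull request above is the actual change); Solr only reads `solrconfig.xml` when a core is loaded, so the cores need to be reloaded or Tomcat restarted afterwards:

```
$ # assumption: each DSpace Solr core keeps its config in ~/dspace/solr/<core>/conf/solrconfig.xml
$ grep -H '<maxBooleanClauses>' ~/dspace/solr/*/conf/solrconfig.xml
$ # raise the limit from the default 1024 to 2048 in every core
$ sed -i 's|<maxBooleanClauses>1024</maxBooleanClauses>|<maxBooleanClauses>2048</maxBooleanClauses>|' ~/dspace/solr/*/conf/solrconfig.xml
$ # reload the cores by restarting Tomcat so the new limit takes effect
$ sudo systemctl restart tomcat7
```

The value only needs to be comfortably larger than the number of group clauses in the failing `read:(...)` query, so 2048 is plenty for now.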

diff --git a/docs/2015-11/index.html b/docs/2015-11/index.html
index c9cfc9cd9..1aa3d7293 100644
--- a/docs/2015-11/index.html
+++ b/docs/2015-11/index.html
diff --git a/docs/2016-02/index.html b/docs/2016-02/index.html
index 142e6cead..47cdbd618 100644
--- a/docs/2016-02/index.html
+++ b/docs/2016-02/index.html

      diff --git a/docs/2017-05/index.html b/docs/2017-05/index.html
      index b1871605f..2b84b6759 100644
      --- a/docs/2017-05/index.html
      +++ b/docs/2017-05/index.html
diff --git a/docs/2017-06/index.html b/docs/2017-06/index.html
index bad2692d0..4380d0109 100644
--- a/docs/2017-06/index.html
+++ b/docs/2017-06/index.html


diff --git a/docs/2017-07/index.html b/docs/2017-07/index.html
index 752ad42e3..9704ee0b4 100644
--- a/docs/2017-07/index.html
+++ b/docs/2017-07/index.html


diff --git a/docs/2017-08/index.html b/docs/2017-08/index.html
index 6fb0bb62e..536c124b9 100644
--- a/docs/2017-08/index.html
+++ b/docs/2017-08/index.html

diff --git a/docs/2017-09/index.html b/docs/2017-09/index.html
index 315b5b58b..a6d29bbfe 100644
--- a/docs/2017-09/index.html
+++ b/docs/2017-09/index.html

      • -
      • I'm expecting to see 0 connection errors for the next few months
      • +
      • I’m expecting to see 0 connection errors for the next few months
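• For my own reference, the two knobs in question look roughly like this (a sketch; the PostgreSQL config path is from the stock Ubuntu packaging and may differ):

```
# dspace/config/dspace.cfg — per-webapp connection pool
db.maxconnections = 60

# /etc/postgresql/9.5/main/postgresql.conf — system-wide limit (webapps * 60 + 3)
max_connections = 183
```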

      2017-09-11

        @@ -163,7 +163,7 @@ dspace.log.2017-09-10:0

      2017-09-12

        -
      • I was testing the METS XSD caching during AIP ingest but it doesn't seem to help actually
      • +
      • I was testing the METS XSD caching during AIP ingest but it doesn’t seem to help actually
      • The import process takes the same amount of time with and without the caching
      • Also, I captured TCP packets destined for port 80 and both imports only captured ONE packet (an update check from some component in Java):
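• The capture itself was just something along these lines (a sketch, not the exact invocation I used):

```
# watch for any outbound HTTP traffic during the AIP import
$ sudo tcpdump -i any -nn 'tcp port 80'
```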
      @@ -182,8 +182,8 @@ dspace.log.2017-09-10:0
    • I had a Skype call with Bram Luyten from Atmire to discuss various issues related to ORCID in DSpace
      • First, ORCID is deprecating their version 1 API (which DSpace uses) and in version 2 API they have removed the ability to search for users by name
      • -
      • The logic is that searching by name actually isn't very useful because ORCID is essentially a global phonebook and there are tons of legitimately duplicate and ambiguous names
      • -
      • Atmire's proposed integration would work by having users lookup and add authors to the authority core directly using their ORCID ID itself (this would happen during the item submission process or perhaps as a standalone / batch process, for example to populate the authority core with a list of known ORCIDs)
      • +
      • The logic is that searching by name actually isn’t very useful because ORCID is essentially a global phonebook and there are tons of legitimately duplicate and ambiguous names
      • +
      • Atmire’s proposed integration would work by having users lookup and add authors to the authority core directly using their ORCID ID itself (this would happen during the item submission process or perhaps as a standalone / batch process, for example to populate the authority core with a list of known ORCIDs)
      • Once the association between name and ORCID is made in the authority then it can be autocompleted in the lookup field
      • Ideally there could also be a user interface for cleanup and merging of authorities
      • He will prepare a quote for us with keeping in mind that this could be useful to contribute back to the community for a 5.x release
      • @@ -194,8 +194,8 @@ dspace.log.2017-09-10:0

        2017-09-13

        • Last night Linode sent an alert about CGSpace (linode18) that it has exceeded the outbound traffic rate threshold of 10Mb/s for the last two hours
        • -
        • I wonder what was going on, and looking into the nginx logs I think maybe it's OAI…
        • -
        • Here is yesterday's top ten IP addresses making requests to /oai:
        • +
        • I wonder what was going on, and looking into the nginx logs I think maybe it’s OAI…
        • +
        • Here is yesterday’s top ten IP addresses making requests to /oai:
        # awk '{print $1}' /var/log/nginx/oai.log | sort -n | uniq -c | sort -h | tail -n 10
               1 213.136.89.78
        @@ -208,7 +208,7 @@ dspace.log.2017-09-10:0
           15825 35.161.215.53
           16704 54.70.51.7
         
          -
        • Compared to the previous day's logs it looks VERY high:
        • +
        • Compared to the previous day’s logs it looks VERY high:
        # awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
               1 207.46.13.39
        @@ -260,7 +260,7 @@ dspace.log.2017-09-10:0
         /var/log/nginx/oai.log.8.gz:0
         /var/log/nginx/oai.log.9.gz:0
         
          -
        • Some of these heavy users are also using XMLUI, and their user agent isn't matched by the Tomcat Session Crawler valve, so each request uses a different session
        • +
        • Some of these heavy users are also using XMLUI, and their user agent isn’t matched by the Tomcat Session Crawler valve, so each request uses a different session
        • Yesterday alone the IP addresses using the API scraper user agent were responsible for 16,000 sessions in XMLUI:
        # grep -a -E "(54.70.51.7|35.161.215.53|34.211.17.113|54.70.175.86)" /home/cgspace.cgiar.org/log/dspace.log.2017-09-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
        @@ -273,7 +273,7 @@ dspace.log.2017-09-10:0
         
        WARN  org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
         
        • Looking at the spreadsheet with deletions and corrections that CCAFS sent last week
        • -
        • It appears they want to delete a lot of metadata, which I'm not sure they realize the implications of:
        • +
        • It appears they want to delete a lot of metadata, which I’m not sure they realize the implications of:
        dspace=# select text_value, count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange') group by text_value;                                                                                                                                                                                                                  
                 text_value        | count                              
        @@ -300,12 +300,12 @@ dspace.log.2017-09-10:0
         (19 rows)
         
        • I sent CCAFS people an email to ask if they really want to remove these 200+ tags
        • -
        • She responded yes, so I'll at least need to do these deletes in PostgreSQL:
        • +
        • She responded yes, so I’ll at least need to do these deletes in PostgreSQL:
        dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
         DELETE 207
         
          -
        • When we discussed this in late July there were some other renames they had requested, but I don't see them in the current spreadsheet so I will have to follow that up
        • +
        • When we discussed this in late July there were some other renames they had requested, but I don’t see them in the current spreadsheet so I will have to follow that up
• I talked to Macaroni Bros and they said to just go ahead with the other corrections as well, since their spreadsheet evolved organically rather than systematically!
        • The final list of corrections and deletes should therefore be:
        @@ -319,7 +319,7 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134
      • Although it looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0 (although it only affects XMLUI): https://jira.duraspace.org/browse/DS-1492
      • I commented there suggesting that we disable it globally
      • I merged the changes to the CCAFS project tags (#336) but still need to finalize the metadata deletions/renames
      • -
      • I merged the CGIAR Library theme changes (#338) to the 5_x-prod branch in preparation for next week's migration
      • +
      • I merged the CGIAR Library theme changes (#338) to the 5_x-prod branch in preparation for next week’s migration
• I emailed the Handle administrators (hdladmin@cnri.reston.va.us) to ask them what the process is for changing their prefix to be resolved by our resolver
      • They responded and said that they need email confirmation from the contact of record of the other prefix, so I should have the CGIAR System Organization people email them before I send the new sitebndl.zip
      • Testing to see how we end up with all these new authorities after we keep cleaning and merging them in the database
@@ -354,7 +354,7 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134
 Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 | 600
 (9 rows)
          -
        • It created a new authority… let's try to add another item and select the same existing author and see what happens in the database:
        • +
        • It created a new authority… let’s try to add another item and select the same existing author and see what happens in the database:
        dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
          text_value |              authority               | confidence 
        @@ -387,7 +387,7 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134
          Orth, Alan | 67a9588f-d86a-4155-81a2-af457e9d13f9 |        600
         (10 rows)
         
          -
        • Shit, it created another authority! Let's try it again!
        • +
        • Shit, it created another authority! Let’s try it again!
        dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';                                                                                             
          text_value |              authority               | confidence
        @@ -413,7 +413,7 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134
         
• Michael Marus is the contact for their prefix, but he has left CGIAR; as I actually have access to the CGIAR Library server I think I can just generate a new sitebndl.zip file from their server and send it to Handle.net
      • Also, Handle.net says their prefix is up for annual renewal next month so we might want to just pay for it and take it over
      • CGSpace was very slow and Uptime Robot even said it was down at one time
      • -
      • I didn't see any abnormally high usage in the REST or OAI logs, but looking at Munin I see the average JVM usage was at 4.9GB and the heap is only 5GB (5120M), so I think it's just normal growing pains
      • +
      • I didn’t see any abnormally high usage in the REST or OAI logs, but looking at Munin I see the average JVM usage was at 4.9GB and the heap is only 5GB (5120M), so I think it’s just normal growing pains
      • Every few months I generally try to increase the JVM heap to be 512M higher than the average usage reported by Munin, so now I adjusted it to 5632M
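• For reference, that heap setting lives in the Tomcat JVM options, roughly like this (a sketch; the real values and GC flags are whatever our Ansible templates set):

```
# /etc/default/tomcat7
JAVA_OPTS="-Djava.awt.headless=true -Xms5632m -Xmx5632m"
```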

      2017-09-15

      @@ -480,16 +480,16 @@ DELETE 207
    • Abenet wants to be able to filter by ISI Journal in advanced search on queries like this: https://cgspace.cgiar.org/discover?filtertype_0=dateIssued&filtertype_1=dateIssued&filter_relational_operator_1=equals&filter_relational_operator_0=equals&filter_1=%5B2010+TO+2017%5D&filter_0=2017&filtertype=type&filter_relational_operator=equals&filter=Journal+Article
    • I opened an issue to track this (#340) and will test it on DSpace Test soon
    • Marianne Gadeberg from WLE asked if I would add an account for Adam Hunt on CGSpace and give him permissions to approve all WLE publications
    • -
    • I told him to register first, as he's a CGIAR user and needs an account to be created before I can add him to the groups
    • +
    • I told him to register first, as he’s a CGIAR user and needs an account to be created before I can add him to the groups

    2017-09-20

    • Abenet and I noticed that hdl.handle.net is blocked by ETC at ILRI Addis so I asked Biruk Debebe to route it over the satellite
    • -
    • Force thumbnail regeneration for the CGIAR System Organization's Historic Archive community (2000 items):
    • +
    • Force thumbnail regeneration for the CGIAR System Organization’s Historic Archive community (2000 items):
    $ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -f -i 10947/1 -p "ImageMagick PDF Thumbnail"
     
      -
    • I'm still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org
    • +
    • I’m still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org

    2017-09-21

      @@ -507,29 +507,29 @@ DELETE 207
      • Start investigating other platforms for CGSpace due to linear instance pricing on Linode
      • We need to figure out how much memory is used by applications, caches, etc, and how much disk space the asset store needs
      • -
      • First, here's the last week of memory usage on CGSpace and DSpace Test:
      • +
      • First, here’s the last week of memory usage on CGSpace and DSpace Test:

      CGSpace memory week DSpace Test memory week

        -
      • 8GB of RAM seems to be good for DSpace Test for now, with Tomcat's JVM heap taking 3GB, caches and buffers taking 3–4GB, and then ~1GB unused
      • -
      • 24GB of RAM is way too much for CGSpace, with Tomcat's JVM heap taking 5.5GB and caches and buffers happily using 14GB or so
      • +
      • 8GB of RAM seems to be good for DSpace Test for now, with Tomcat’s JVM heap taking 3GB, caches and buffers taking 3–4GB, and then ~1GB unused
      • +
      • 24GB of RAM is way too much for CGSpace, with Tomcat’s JVM heap taking 5.5GB and caches and buffers happily using 14GB or so
      • As far as disk space, the CGSpace assetstore currently uses 51GB and Solr cores use 86GB (mostly in the statistics core)
      • -
      • DSpace Test currently doesn't even have enough space to store a full copy of CGSpace, as its Linode instance only has 96GB of disk space
      • -
      • I've heard Google Cloud is nice (cheap and performant) but it's definitely more complicated than Linode and instances aren't that much cheaper to make it worth it
      • +
      • DSpace Test currently doesn’t even have enough space to store a full copy of CGSpace, as its Linode instance only has 96GB of disk space
      • +
      • I’ve heard Google Cloud is nice (cheap and performant) but it’s definitely more complicated than Linode and instances aren’t that much cheaper to make it worth it
      • Here are some theoretical instances on Google Cloud:
        • DSpace Test, n1-standard-2 with 2 vCPUs, 7.5GB RAM, 300GB persistent SSD: $99/month
        • CGSpace, n1-standard-4 with 4 vCPUs, 15GB RAM, 300GB persistent SSD: $148/month
      • -
      • Looking at Linode's instance pricing, for DSpace Test it seems we could use the same 8GB instance for $40/month, and then add block storage of ~300GB for $30 (block storage is currently in beta and priced at $0.10/GiB)
      • +
      • Looking at Linode’s instance pricing, for DSpace Test it seems we could use the same 8GB instance for $40/month, and then add block storage of ~300GB for $30 (block storage is currently in beta and priced at $0.10/GiB)
      • For CGSpace we could use the cheaper 12GB instance for $80 and then add block storage of 500GB for $50
      • -
      • I've sent Peter a message about moving DSpace Test to the New Jersey data center so we can test the block storage beta
      • +
      • I’ve sent Peter a message about moving DSpace Test to the New Jersey data center so we can test the block storage beta
      • Create pull request for adding ISI Journal to search filters (#341)
      • Peter asked if we could map all the items of type Journal Article in ILRI Archive to ILRI articles in journals and newsletters
      • It is easy to do via CSV using OpenRefine but I noticed that on CGSpace ~1,000 of the expected 2,500 are already mapped, while on DSpace Test they were not
      • -
      • I've asked Peter if he knows what's going on (or who mapped them)
      • +
      • I’ve asked Peter if he knows what’s going on (or who mapped them)
      • Turns out he had already mapped some, but requested that I finish the rest
      • With this GREL in OpenRefine I can find items that are mapped, ie they have 10568/3|| or 10568/3$ in their collection field:
      @@ -543,7 +543,7 @@ DELETE 207
      • Email Rosemary Kande from ICT to ask about the administrative / finance procedure for moving DSpace Test from EU to US region on Linode
      • Communicate (finally) with Tania and Tunji from the CGIAR System Organization office to tell them to request CGNET make the DNS updates for library.cgiar.org
      • -
      • Peter wants me to clean up the text values for Delia Grace's metadata, as the authorities are all messed up again since we cleaned them up in 2016-12:
      • +
      • Peter wants me to clean up the text values for Delia Grace’s metadata, as the authorities are all messed up again since we cleaned them up in 2016-12:
      dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';                                  
         text_value  |              authority               | confidence              
      @@ -554,7 +554,7 @@ DELETE 207
        Grace, D.    | 6a8ddca3-33c1-45f9-aa00-6fa9fc91e3fc |         -1
       
      • Strangely, none of her authority entries have ORCIDs anymore…
      • -
      • I'll just fix the text values and forget about it for now:
      • +
      • I’ll just fix the text values and forget about it for now:
      dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
       UPDATE 610
      @@ -593,24 +593,24 @@ real    6m6.447s
       user    1m34.010s
       sys     0m12.113s
       
        -
      • The index-authority script always seems to fail, I think it's the same old bug
      • -
      • Something interesting for my notes about JNDI database pool—since I couldn't determine if it was working or not when I tried it locally the other day—is this error message that I just saw in the DSpace logs today:
      • +
      • The index-authority script always seems to fail, I think it’s the same old bug
      • +
      • Something interesting for my notes about JNDI database pool—since I couldn’t determine if it was working or not when I tried it locally the other day—is this error message that I just saw in the DSpace logs today:
      ERROR org.dspace.storage.rdbms.DatabaseManager @ Error retrieving JNDI context: jdbc/dspaceLocal
       ...
       INFO  org.dspace.storage.rdbms.DatabaseManager @ Unable to locate JNDI dataSource: jdbc/dspaceLocal
       INFO  org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Database pool
       
        -
      • So it's good to know that something gets printed when it fails because I didn't see any mention of JNDI before when I was testing!
      • +
      • So it’s good to know that something gets printed when it fails because I didn’t see any mention of JNDI before when I was testing!
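• For my notes, the kind of Tomcat Resource definition that jdbc/dspaceLocal is supposed to resolve to looks roughly like this (a sketch with placeholder credentials and pool sizes, not our actual config):

```
<!-- e.g. in the webapp context or conf/server.xml GlobalNamingResources -->
<Resource name="jdbc/dspaceLocal" auth="Container" type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://localhost:5432/dspace"
          username="dspace" password="dspace"
          maxActive="60" maxIdle="10" validationQuery="SELECT 1" />
```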

      2017-09-26

      • Adam Hunt from WLE finally registered so I added him to the editor and approver groups
      • -
      • Then I noticed that Sisay never removed Marianne's user accounts from the approver steps in the workflow because she is already in the WLE groups, which are in those steps
      • -
      • For what it's worth, I had asked him to remove them on 2017-09-14
      • +
      • Then I noticed that Sisay never removed Marianne’s user accounts from the approver steps in the workflow because she is already in the WLE groups, which are in those steps
      • +
      • For what it’s worth, I had asked him to remove them on 2017-09-14
      • I also went and added the WLE approvers and editors groups to the appropriate steps of all the Phase I and Phase II research theme collections
      • -
      • A lot of CIAT's items have manually generated thumbnails which have an incorrect aspect ratio and an ugly black border
      • -
      • I communicated with Elizabeth from CIAT to tell her she should use DSpace's automatically generated thumbnails
      • +
      • A lot of CIAT’s items have manually generated thumbnails which have an incorrect aspect ratio and an ugly black border
      • +
      • I communicated with Elizabeth from CIAT to tell her she should use DSpace’s automatically generated thumbnails
• Start discussing with ICT about Linode server update for DSpace Test
      • Rosemary said I need to work with Robert Okal to destroy/create the server, and then let her and Lilian Masigah from finance know the updated Linode asset names for their records
      @@ -618,7 +618,7 @@ INFO org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Da
      • Tunji from the System Organization finally sent the DNS request for library.cgiar.org to CGNET
      • Now the redirects work
      • -
      • I quickly registered a Let's Encrypt certificate for the domain:
      • +
      • I quickly registered a Let’s Encrypt certificate for the domain:
      # systemctl stop nginx
       # /opt/certbot-auto certonly --standalone --email aorth@mjanja.ch -d library.cgiar.org
diff --git a/docs/2017-10/index.html b/docs/2017-10/index.html
      index 5e9be4429..732b802c2 100644
      --- a/docs/2017-10/index.html
      +++ b/docs/2017-10/index.html
      @@ -12,7 +12,7 @@ Peter emailed to point out that many items in the ILRI archive collection have m
       
       http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
       
      -There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
      +There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
       Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
       " />
       
      @@ -28,10 +28,10 @@ Peter emailed to point out that many items in the ILRI archive collection have m
       
       http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
       
      -There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
      +There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
       Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
       "/>
@@ -61,7 +61,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
      @@ -108,7 +108,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
         

      October, 2017

      @@ -119,7 +119,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
    http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
     
      -
    • There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
    • +
    • There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
    • Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections

    2017-10-02

    @@ -130,13 +130,13 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
    2017-10-01 20:24:57,928 WARN  org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:ldap_attribute_lookup:type=failed_search javax.naming.CommunicationException\colon; svcgroot2.cgiarad.org\colon;3269 [Root exception is java.net.ConnectException\colon; Connection timed out (Connection timed out)]
     2017-10-01 20:22:37,982 INFO  org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:failed_login:no DN found for user pballantyne
     
      -
    • I thought maybe his account had expired (seeing as it's was the first of the month) but he says he was finally able to log in today
    • +
• I thought maybe his account had expired (seeing as it was the first of the month) but he says he was finally able to log in today
    • The logs for yesterday show fourteen errors related to LDAP auth failures:
    $ grep -c "ldap_authentication:type=failed_auth" dspace.log.2017-10-01
     14
     
      -
    • For what it's worth, there are no errors on any other recent days, so it must have been some network issue on Linode or CGNET's LDAP server
    • +
    • For what it’s worth, there are no errors on any other recent days, so it must have been some network issue on Linode or CGNET’s LDAP server
    • Linode emailed to say that linode578611 (DSpace Test) needs to migrate to a new host for a security update so I initiated the migration immediately rather than waiting for the scheduled time in two weeks

    2017-10-04

    @@ -147,7 +147,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
    http://library.cgiar.org/browse?value=Intellectual%20Assets%20Reports&type=subject → https://cgspace.cgiar.org/browse?value=Intellectual%20Assets%20Reports&type=subject
     
      -
    • We'll need to check for browse links and handle them properly, including swapping the subject parameter for systemsubject (which doesn't exist in Discovery yet, but we'll need to add it) as we have moved their poorly curated subjects from dc.subject to cg.subject.system
    • +
    • We’ll need to check for browse links and handle them properly, including swapping the subject parameter for systemsubject (which doesn’t exist in Discovery yet, but we’ll need to add it) as we have moved their poorly curated subjects from dc.subject to cg.subject.system
    • The second link was a direct link to a bitstream which has broken due to the sequence being updated, so I told him he should link to the handle of the item instead
    • Help Sisay proof sixty-two IITA records on DSpace Test
    • Lots of inconsistencies and errors in subjects, dc.format.extent, regions, countries
    • @@ -155,8 +155,8 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG

    2017-10-05

      -
    • Twice in the past twenty-four hours Linode has warned that CGSpace's outbound traffic rate was exceeding the notification threshold
    • -
    • I had a look at yesterday's OAI and REST logs in /var/log/nginx but didn't see anything unusual:
    • +
    • Twice in the past twenty-four hours Linode has warned that CGSpace’s outbound traffic rate was exceeding the notification threshold
    • +
    • I had a look at yesterday’s OAI and REST logs in /var/log/nginx but didn’t see anything unusual:
    # awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 10
         141 157.55.39.240
    @@ -183,7 +183,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
     
    • Working on the nginx redirects for CGIAR Library
    • We should start using 301 redirects and also allow for /sitemap to work on the library.cgiar.org domain so the CGIAR System Organization people can update their Google Search Console and allow Google to find their content in a structured way
    • -
    • Remove eleven occurrences of ACP in IITA's cg.coverage.region using the Atmire batch edit module from Discovery
    • +
    • Remove eleven occurrences of ACP in IITA’s cg.coverage.region using the Atmire batch edit module from Discovery
    • Need to investigate how we can verify the library.cgiar.org using the HTML or DNS methods
    • Run corrections on 143 ILRI Archive items that had two dc.identifier.uri values (Handle) that Peter had pointed out earlier this week
    • I used OpenRefine to isolate them and then fixed and re-imported them into CGSpace
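• To expand on the 301 redirect idea above, the general shape of the library.cgiar.org vhost would be something like this (a simplified sketch; the real config also has to handle the browse and sitemap cases mentioned above):

```
server {
    listen 443 ssl;
    server_name library.cgiar.org;

    # send everything to CGSpace with a permanent redirect so Google updates its index
    location / {
        return 301 https://cgspace.cgiar.org$request_uri;
    }
}
```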
    • @@ -197,7 +197,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG

      Original flat thumbnails Tweaked with border and box shadow

        -
      • I'll post it to the Yammer group to see what people think
      • +
      • I’ll post it to the Yammer group to see what people think
      • I figured out at way to do the HTML verification for Google Search console for library.cgiar.org
      • We can drop the HTML file in their XMLUI theme folder and it will get copied to the webapps directory during build/install
      • Then we add an nginx alias for that URL in the library.cgiar.org vhost
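• The nginx alias itself is just something like this (a sketch; the file name is a hypothetical verification token and the theme path is illustrative, not our real one):

```
location = /google1234567890abcdef.html {
    alias /home/cgspace.cgiar.org/webapps/xmlui/themes/CGIAR/google1234567890abcdef.html;
}
```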
@@ -213,7 +213,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
 Google Search Console 2 Google Search results

          -
        • I tried to submit a “Change of Address” request in the Google Search Console but I need to be an owner on CGSpace's console (currently I'm just a user) in order to do that
        • +
        • I tried to submit a “Change of Address” request in the Google Search Console but I need to be an owner on CGSpace’s console (currently I’m just a user) in order to do that
        • Manually clean up some communities and collections that Peter had requested a few weeks ago
        • Delete Community 10568/102 (ILRI Research and Development Issues)
        • Move five collections to 10568/27629 (ILRI Projects) using move-collections.sh with the following configuration:
        • @@ -233,8 +233,8 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG

        Change of Address error

          -
        • We are sending top-level CGIAR Library traffic to their specific community hierarchy in CGSpace so this type of change of address won't work—we'll just need to wait for Google to slowly index everything and take note of the HTTP 301 redirects
        • -
        • Also the Google Search Console doesn't work very well with Google Analytics being blocked, so I had to turn off my ad blocker to get the “Change of Address” tool to work!
        • +
        • We are sending top-level CGIAR Library traffic to their specific community hierarchy in CGSpace so this type of change of address won’t work—we’ll just need to wait for Google to slowly index everything and take note of the HTTP 301 redirects
        • +
        • Also the Google Search Console doesn’t work very well with Google Analytics being blocked, so I had to turn off my ad blocker to get the “Change of Address” tool to work!

        2017-10-12

          @@ -245,7 +245,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
          • Run system updates on DSpace Test and reboot server
          • Merge changes adding a search/browse index for CGIAR System subject to 5_x-prod (#344)
          • -
          • I checked the top browse links in Google's search results for site:library.cgiar.org inurl:browse and they are all redirected appropriately by the nginx rewrites I worked on last week
          • +
          • I checked the top browse links in Google’s search results for site:library.cgiar.org inurl:browse and they are all redirected appropriately by the nginx rewrites I worked on last week

          2017-10-22

            @@ -256,12 +256,12 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG

          2017-10-26

            -
          • In the last 24 hours we've gotten a few alerts from Linode that there was high CPU and outgoing traffic on CGSpace
          • +
          • In the last 24 hours we’ve gotten a few alerts from Linode that there was high CPU and outgoing traffic on CGSpace
          • Uptime Robot even noticed CGSpace go “down” for a few minutes
          • In other news, I was trying to look at a question about stats raised by Magdalena and then CGSpace went down due to SQL connection pool
          • Looking at the PostgreSQL activity I see there are 93 connections, but after a minute or two they went down and CGSpace came back up
          • Annnd I reloaded the Atmire Usage Stats module and the connections shot back up and CGSpace went down again
          • -
          • Still not sure where the load is coming from right now, but it's clear why there were so many alerts yesterday on the 25th!
          • +
          • Still not sure where the load is coming from right now, but it’s clear why there were so many alerts yesterday on the 25th!
          # grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2017-10-25 | sort -n | uniq | wc -l
           18022
          @@ -274,12 +274,12 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
           7851
           
          • I still have no idea what was causing the load to go up today
          • -
          • I finally investigated Magdalena's issue with the item download stats and now I can't reproduce it: I get the same number of downloads reported in the stats widget on the item page, the “Most Popular Items” page, and in Usage Stats
          • +
          • I finally investigated Magdalena’s issue with the item download stats and now I can’t reproduce it: I get the same number of downloads reported in the stats widget on the item page, the “Most Popular Items” page, and in Usage Stats
          • I think it might have been an issue with the statistics not being fresh
          • I added the admin group for the systems organization to the admin role of the top-level community of CGSpace because I guess Sisay had forgotten
          • Magdalena asked if there was a way to reuse data in item submissions where items have a lot of similar data
          • I told her about the possibility to use per-collection item templates, and asked if her items in question were all from a single collection
          • -
          • We've never used it but it could be worth looking at
          • +
          • We’ve never used it but it could be worth looking at

          2017-10-27

            @@ -292,24 +292,24 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG

            2017-10-29

            • Linode alerted about high CPU usage again on CGSpace around 2AM and 4AM
            • -
            • I'm still not sure why this started causing alerts so repeatadely the past week
            • -
            • I don't see any tell tale signs in the REST or OAI logs, so trying to do rudimentary analysis in DSpace logs:
            • +
• I’m still not sure why this started causing alerts so repeatedly the past week
• +
• I don’t see any telltale signs in the REST or OAI logs, so trying to do rudimentary analysis in DSpace logs:
            # grep '2017-10-29 02:' dspace.log.2017-10-29 | grep -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
             2049
             
            • So there were 2049 unique sessions during the hour of 2AM
            • Looking at my notes, the number of unique sessions was about the same during the same hour on other days when there were no alerts
            • -
            • I think I'll need to enable access logging in nginx to figure out what's going on
            • -
            • After enabling logging on requests to XMLUI on / I see some new bot I've never seen before:
            • +
            • I think I’ll need to enable access logging in nginx to figure out what’s going on
            • +
            • After enabling logging on requests to XMLUI on / I see some new bot I’ve never seen before:
            137.108.70.6 - - [29/Oct/2017:07:39:49 +0000] "GET /discover?filtertype_0=type&filter_relational_operator_0=equals&filter_0=Internal+Document&filtertype=author&filter_relational_operator=equals&filter=CGIAR+Secretariat HTTP/1.1" 200 7776 "-" "Mozilla/5.0 (compatible; CORE/0.6; +http://core.ac.uk; http://core.ac.uk/intro/contact)"
             
            • CORE seems to be some bot that is “Aggregating the world’s open access research papers”
            • -
            • The contact address listed in their bot's user agent is incorrect, correct page is simply: https://core.ac.uk/contact
            • -
            • I will check the logs in a few days to see if they are harvesting us regularly, then add their bot's user agent to the Tomcat Crawler Session Valve
            • +
• The contact address listed in their bot’s user agent is incorrect; the correct page is simply: https://core.ac.uk/contact
            • +
            • I will check the logs in a few days to see if they are harvesting us regularly, then add their bot’s user agent to the Tomcat Crawler Session Valve
            • After browsing the CORE site it seems that the CGIAR Library is somehow a member of CORE, so they have probably only been harvesting CGSpace since we did the migration, as library.cgiar.org directs to us now
            • -
            • For now I will just contact them to have them update their contact info in the bot's user agent, but eventually I think I'll tell them to swap out the CGIAR Library entry for CGSpace
            • +
            • For now I will just contact them to have them update their contact info in the bot’s user agent, but eventually I think I’ll tell them to swap out the CGIAR Library entry for CGSpace
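• When I do add them, it would be a one-line change to the Crawler Session Manager Valve in Tomcat’s server.xml, roughly like this (a sketch — note that setting crawlerUserAgents replaces the valve’s default .*[bB]ot.* pattern, so that has to be included again):

```
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*CORE.*" />
```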

            2017-10-30

@@ -333,7 +333,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
 137.108.70.6
 137.108.70.7
      -
    • I will add their user agent to the Tomcat Session Crawler Valve but it won't help much because they are only using two sessions:
    • +
    • I will add their user agent to the Tomcat Session Crawler Valve but it won’t help much because they are only using two sessions:
    # grep 137.108.70 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq
     session_id=5771742CABA3D0780860B8DA81E0551B
    @@ -346,7 +346,7 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
     # grep 137.108.70 /var/log/nginx/access.log | grep -c "GET /discover"
     24055
     
      -
    • Just because I'm curious who the top IPs are:
    • +
    • Just because I’m curious who the top IPs are:
    # awk '{print $1}' /var/log/nginx/access.log | sort -n | uniq -c | sort -h | tail
         496 62.210.247.93
    @@ -362,7 +362,7 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
     
    • At least we know the top two are CORE, but who are the others?
    • 190.19.92.5 is apparently in Argentina, and 104.196.152.243 is from Google Cloud Engine
    • -
    • Actually, these two scrapers might be more responsible for the heavy load than the CORE bot, because they don't reuse their session variable, creating thousands of new sessions!
    • +
    • Actually, these two scrapers might be more responsible for the heavy load than the CORE bot, because they don’t reuse their session variable, creating thousands of new sessions!
    # grep 190.19.92.5 dspace.log.2017-10-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
     1419
    @@ -372,7 +372,7 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
     
  • From looking at the requests, it appears these are from CIAT and CCAFS
  • I wonder if I could somehow instruct them to use a user agent so that we could apply a crawler session manager valve to them
  • Actually, according to the Tomcat docs, we could use an IP with crawlerIps: https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve
  • -
  • Ah, wait, it looks like crawlerIps only came in 2017-06, so probably isn't in Ubuntu 16.04's 7.0.68 build!
  • +
  • Ah, wait, it looks like crawlerIps only came in 2017-06, so probably isn’t in Ubuntu 16.04’s 7.0.68 build!
  • That would explain the errors I was getting when trying to set it:
  • WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Valve} Setting property 'crawlerIps' to '190\.19\.92\.5|104\.196\.152\.243' did not find a matching property.
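• For reference, the attribute I was trying to set looks like this on a Tomcat new enough to support it (a sketch, using the same IPs as above):

```
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerIps="190\.19\.92\.5|104\.196\.152\.243" />
```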
    @@ -389,14 +389,14 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
     

    2017-10-31

    • Very nice, Linode alerted that CGSpace had high CPU usage at 2AM again
    • -
    • Ask on the dspace-tech mailing list if it's possible to use an existing item as a template for a new item
    • +
    • Ask on the dspace-tech mailing list if it’s possible to use an existing item as a template for a new item
• To follow up on the CORE bot traffic, there were almost 300,000 requests yesterday:
    # grep "CORE/0.6" /var/log/nginx/access.log.1 | awk '{print $1}' | sort -n | uniq -c | sort -h
      139109 137.108.70.6
      139253 137.108.70.7
     
      -
    • I've emailed the CORE people to ask if they can update the repository information from CGIAR Library to CGSpace
    • +
    • I’ve emailed the CORE people to ask if they can update the repository information from CGIAR Library to CGSpace
    • Also, I asked if they could perhaps use the sitemap.xml, OAI-PMH, or REST APIs to index us more efficiently, because they mostly seem to be crawling the nearly endless Discovery facets
• I added GoAccess to the list of packages to install in the DSpace role of the Ansible infrastructure scripts
    • It makes it very easy to analyze nginx logs from the command line, to see where traffic is coming from:
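• Basic usage is something like this (a sketch; nginx’s default combined log format is what GoAccess calls COMBINED):

```
# goaccess /var/log/nginx/access.log --log-format=COMBINED
```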
    • @@ -406,14 +406,14 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
    • According to Uptime Robot CGSpace went down and up a few times
    • I had a look at goaccess and I saw that CORE was actively indexing
    • Also, PostgreSQL connections were at 91 (with the max being 60 per web app, hmmm)
    • -
    • I'm really starting to get annoyed with these guys, and thinking about blocking their IP address for a few days to see if CGSpace becomes more stable
    • -
    • Actually, come to think of it, they aren't even obeying robots.txt, because we actually disallow /discover and /search-filter URLs but they are hitting those massively:
    • +
    • I’m really starting to get annoyed with these guys, and thinking about blocking their IP address for a few days to see if CGSpace becomes more stable
    • +
    • Actually, come to think of it, they aren’t even obeying robots.txt, because we actually disallow /discover and /search-filter URLs but they are hitting those massively:
    # grep "CORE/0.6" /var/log/nginx/access.log | grep -o -E "GET /(discover|search-filter)" | sort -n | uniq -c | sort -rn 
      158058 GET /discover
       14260 GET /search-filter
     
      -
    • I tested a URL of pattern /discover in Google's webmaster tools and it was indeed identified as blocked
    • +
    • I tested a URL of pattern /discover in Google’s webmaster tools and it was indeed identified as blocked
    • I will send feedback to the CORE bot team
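• For reference, the relevant part of our robots.txt is essentially this (a sketch of the rules they are ignoring):

```
User-agent: *
Disallow: /discover
Disallow: /search-filter
```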
    diff --git a/docs/2017-11/index.html b/docs/2017-11/index.html index 90438795f..00db8fa24 100644 --- a/docs/2017-11/index.html +++ b/docs/2017-11/index.html @@ -45,7 +45,7 @@ Generate list of authors on CGSpace for Peter to go through and correct: dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; COPY 54701 "/> - + @@ -75,7 +75,7 @@ COPY 54701 - + @@ -122,7 +122,7 @@ COPY 54701

    November, 2017

    @@ -160,15 +160,15 @@ COPY 54701

    2017-11-03

    • Atmire got back to us to say that they estimate it will take two days of labor to implement the change to Listings and Reports
    • -
    • I said I'd ask Abenet if she wants that feature
    • +
    • I said I’d ask Abenet if she wants that feature

    2017-11-04

      -
    • I finished looking through Sisay's CIAT records for the “Alianzas de Aprendizaje” data
    • +
    • I finished looking through Sisay’s CIAT records for the “Alianzas de Aprendizaje” data
    • I corrected about half of the authors to standardize them
    • Linode emailed this morning to say that the CPU usage was high again, this time at 6:14AM
    • -
    • It's the first time in a few days that this has happened
    • -
    • I had a look to see what was going on, but it isn't the CORE bot:
    • +
    • It’s the first time in a few days that this has happened
    • +
    • I had a look to see what was going on, but it isn’t the CORE bot:
    # awk '{print $1}' /var/log/nginx/access.log | sort -n | uniq -c | sort -h | tail
         306 68.180.229.31
    @@ -193,11 +193,11 @@ COPY 54701
     /var/log/nginx/access.log.5.gz:0
     /var/log/nginx/access.log.6.gz:0
     
      -
    • It's clearly a bot as it's making tens of thousands of requests, but it's using a “normal” user agent:
    • +
    • It’s clearly a bot as it’s making tens of thousands of requests, but it’s using a “normal” user agent:
    Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36
     
      -
    • For now I don't know what this user is!
    • +
    • For now I don’t know what this user is!

    2017-11-05

@@ -222,8 +222,8 @@ COPY 54701
 International Livestock Research Institute | 8f3865dc-d056-4aec-90b7-77f49ab4735c | 500
 (8 rows)
      -
    • So I'm not sure if this is just a graphical glitch or if editors have to edit this metadata field prior to approval
    • -
    • Looking at monitoring Tomcat's JVM heap with Prometheus, it looks like we need to use JMX + jmx_exporter
    • +
    • So I’m not sure if this is just a graphical glitch or if editors have to edit this metadata field prior to approval
    • +
    • Looking at monitoring Tomcat’s JVM heap with Prometheus, it looks like we need to use JMX + jmx_exporter
    • This guide shows how to enable JMX in Tomcat by modifying CATALINA_OPTS
    • I was able to successfully connect to my local Tomcat with jconsole!
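• The CATALINA_OPTS change to expose JMX is roughly this (a sketch; the port is arbitrary and authentication is off, so it should only be reachable locally):

```
CATALINA_OPTS="$CATALINA_OPTS -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9010 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```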
@@ -268,8 +268,8 @@ $ grep 104.196.152.243 dspace.log.2017-11-03 | grep -o -E 'session_id=[A-Z0-9]{3
 $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
 7051
      -
    • The worst thing is that this user never specifies a user agent string so we can't lump it in with the other bots using the Tomcat Session Crawler Manager Valve
    • -
    • They don't request dynamic URLs like “/discover” but they seem to be fetching handles from XMLUI instead of REST (and some with //handle, note the regex below):
    • +
    • The worst thing is that this user never specifies a user agent string so we can’t lump it in with the other bots using the Tomcat Session Crawler Manager Valve
    • +
    • They don’t request dynamic URLs like “/discover” but they seem to be fetching handles from XMLUI instead of REST (and some with //handle, note the regex below):
    # grep -c 104.196.152.243 /var/log/nginx/access.log.1
     4681
    @@ -277,7 +277,7 @@ $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{3
     4618
     
    • I just realized that ciat.cgiar.org points to 104.196.152.243, so I should contact Leroy from CIAT to see if we can change their scraping behavior
    • -
    • The next IP (207.46.13.36) seem to be Microsoft's bingbot, but all its requests specify the “bingbot” user agent and there are no requests for dynamic URLs that are forbidden, like “/discover”:
    • +
• The next IP (207.46.13.36) seems to be Microsoft’s bingbot, but all its requests specify the “bingbot” user agent and there are no requests for dynamic URLs that are forbidden, like “/discover”:
    $ grep -c 207.46.13.36 /var/log/nginx/access.log.1 
     2034
    @@ -328,18 +328,18 @@ $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{3
     
  • Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)
  • -
  • I'll just keep an eye on that one for now, as it only made a few hundred requests to dynamic discovery URLs
  • -
  • While it's not in the top ten, Baidu is one bot that seems to not give a fuck:
  • +
  • I’ll just keep an eye on that one for now, as it only made a few hundred requests to dynamic discovery URLs
  • +
  • While it’s not in the top ten, Baidu is one bot that seems to not give a fuck:
  • # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "7/Nov/2017" | grep -c Baiduspider
     8912
     # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "7/Nov/2017" | grep Baiduspider | grep -c -E "GET /(browse|discover|search-filter)"
     2521
     
      -
    • According to their documentation their bot respects robots.txt, but I don't see this being the case
    • +
    • According to their documentation their bot respects robots.txt, but I don’t see this being the case
    • I think I will end up blocking Baidu as well…
    • Next is for me to look and see what was happening specifically at 3AM and 7AM when the server crashed
    • -
    • I should look in nginx access.log, rest.log, oai.log, and DSpace's dspace.log.2017-11-07
    • +
    • I should look in nginx access.log, rest.log, oai.log, and DSpace’s dspace.log.2017-11-07
    • Here are the top IPs making requests to XMLUI from 2 to 8 AM:
    # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '07/Nov/2017:0[2-8]' | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
    @@ -389,8 +389,8 @@ $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{3
         462 ip_addr=104.196.152.243
         488 ip_addr=66.249.66.90
     
      -
    • These aren't actually very interesting, as the top few are Google, CIAT, Bingbot, and a few other unknown scrapers
    • -
    • The number of requests isn't even that high to be honest
    • +
    • These aren’t actually very interesting, as the top few are Google, CIAT, Bingbot, and a few other unknown scrapers
    • +
    • The number of requests isn’t even that high to be honest
    • As I was looking at these logs I noticed another heavy user (124.17.34.59) that was not active during this time period, but made many requests today alone:
    # zgrep -c 124.17.34.59 /var/log/nginx/access.log*
    @@ -405,13 +405,13 @@ $ grep 104.196.152.243 dspace.log.2017-11-01 | grep -o -E 'session_id=[A-Z0-9]{3
     /var/log/nginx/access.log.8.gz:0
     /var/log/nginx/access.log.9.gz:1
     
      -
    • The whois data shows the IP is from China, but the user agent doesn't really give any clues:
    • +
    • The whois data shows the IP is from China, but the user agent doesn’t really give any clues:
    # grep 124.17.34.59 /var/log/nginx/access.log | awk -F'" ' '{print $3}' | sort | uniq -c | sort -h
         210 "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
       22610 "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.2; Win64; x64; Trident/7.0; LCTE)"
     
      -
    • A Google search for “LCTE bot” doesn't return anything interesting, but this Stack Overflow discussion references the lack of information
    • +
    • A Google search for “LCTE bot” doesn’t return anything interesting, but this Stack Overflow discussion references the lack of information
    • So basically after a few hours of looking at the log files I am not closer to understanding what is going on!
    • I do know that we want to block Baidu, though, as it does not respect robots.txt
    • And as we speak Linode alerted that the outbound traffic rate is very high for the past two hours (about 12–14 hours)
    • @@ -479,13 +479,13 @@ $ grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=104.196.152.243' dspace.log.2017-
      $ cat dspace.log.2017-11-07 dspace.log.2017-11-08 | grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=124.17.34.59' | sort | uniq | wc -l
       20733
       
        -
      • I'm getting really sick of this
      • +
      • I’m getting really sick of this
      • Sisay re-uploaded the CIAT records that I had already corrected earlier this week, erasing all my corrections
      • I had to re-correct all the publishers, places, names, dates, etc and apply the changes on DSpace Test
      • Run system updates on DSpace Test and reboot the server
      • Magdalena had written to say that two of their Phase II project tags were missing on CGSpace, so I added them (#346)
      • I figured out a way to use nginx’s map function to assign a “bot” user agent to misbehaving clients who don’t define a user agent
      • Most bots are automatically lumped into one generic session by Tomcat’s Crawler Session Manager Valve but this only works if their user agent matches a pre-defined regular expression like .*[bB]ot.*
      • Some clients send thousands of requests without a user agent which ends up creating thousands of Tomcat sessions, wasting precious memory, CPU, and database resources in the process
      • Basically, we modify the nginx config to add a mapping with a modified user agent $ua:
    map $remote_addr $ua {
        # ... IP addresses of misbehaving clients go here ...
        default $http_user_agent;
    }
    • If the client’s address matches then the user agent is set, otherwise the default $http_user_agent variable is used
    • Then, in the server’s / block we pass this header to Tomcat:
    proxy_pass http://tomcat_http;
     proxy_set_header User-Agent $ua;
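    • Once this is deployed, a rough way to sanity check it end to end (just a sketch; the hostname, the handle, the 192.0.2.1 client IP, and the log date are all placeholders): send a request with the User-Agent header stripped entirely, then confirm that repeated requests from that IP still only use one Tomcat session, because the Crawler Session Manager Valve matched the mapped “bot” agent:

```
$ curl -s -o /dev/null -H 'User-Agent:' 'https://dspacetest.cgiar.org/handle/10568/1'
$ grep 192.0.2.1 dspace.log.2017-11-08 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
```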
     
    • Note to self: the $ua variable won’t show up in nginx access logs because the default combined log format doesn’t show it, so don’t run around pulling your hair out wondering why the modified user agents aren’t showing in the logs!
    • If a client matching one of these IPs connects without a session, it will be assigned one by the Crawler Session Manager Valve
    • You can verify by cross referencing nginx’s access.log and DSpace’s dspace.log.2017-11-08, for example
    • I will deploy this on CGSpace later this week
    • I am interested to check how this affects the number of sessions used by the CIAT and Chinese bots (see above on 2017-11-07 for example)
    • I merged the clickable thumbnails code to 5_x-prod (#347) and will deploy it later along with the new bot mapping stuff (and re-run the Ansible nginx and tomcat tags)
    • I have been looking for a reason to ban Baidu and this is definitely a good one
    • Disallowing Baiduspider in robots.txt probably won’t work because this bot doesn’t seem to respect the robot exclusion standard anyways!
    • I will whip up something in nginx later
    • Run system updates on CGSpace and reboot the server
    • Re-deploy latest 5_x-prod branch on CGSpace and DSpace Test (includes the clickable thumbnails, CCAFS phase II project tags, and updated news text)
    • The number of sessions is over ten times less!
    • This gets me thinking, I wonder if I can use something like nginx’s rate limiter to automatically change the user agent of clients who make too many requests
    • Perhaps using a combination of geo and map, like illustrated here: https://www.nginx.com/blog/rate-limiting-nginx/

    2017-11-11


    2017-11-12

    • Update the Ansible infrastructure templates to be a little more modular and flexible
    • Looking at the top client IPs on CGSpace so far this morning, even though it’s only been eight hours:
    # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep "12/Nov/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
         243 5.83.120.111
     
    # grep 5.9.6.51 /var/log/nginx/access.log | tail -n 1
     5.9.6.51 - - [12/Nov/2017:08:13:13 +0000] "GET /handle/10568/16515/recent-submissions HTTP/1.1" 200 5097 "-" "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)"
     
    • What’s amazing is that it seems to reuse its Java session across all requests:
    $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2017-11-12
     1558
$ grep 5.9.6.51 dspace.log.2017-11-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
     1
     
    • Bravo to MegaIndex.ru!
    • The same cannot be said for 95.108.181.88, which appears to be YandexBot, even though Tomcat’s Crawler Session Manager valve regex should match ‘YandexBot’:
    # grep 95.108.181.88 /var/log/nginx/access.log | tail -n 1
     95.108.181.88 - - [12/Nov/2017:08:33:17 +0000] "GET /bitstream/handle/10568/57004/GenebankColombia_23Feb2015.pdf HTTP/1.1" 200 972019 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
     10947/34   10947/1 10568/83389
     10947/2512 10947/1 10568/83389
     
    • Deploy some nginx configuration updates to CGSpace
    • They had been waiting on a branch for a few months and I think I just forgot about them
    • I have been running them on DSpace Test for a few days and haven’t seen any issues there
    • Started testing DSpace 6.2 and a few things have changed
    • Now PostgreSQL needs pgcrypto:
dspace6=# CREATE EXTENSION pgcrypto;
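    • The same thing from the shell, assuming the database is called dspace6 as above and the postgres superuser can create extensions (a sketch):

```
$ psql -U postgres dspace6 -c 'CREATE EXTENSION pgcrypto;'
```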
    • Also, local settings are no longer in build.properties, they are now in local.cfg
    • I’m not sure if we can use separate profiles like we did before with mvn -Denv=blah to use blah.properties
    • It seems we need to use “system properties” to override settings, ie: -Ddspace.dir=/Users/aorth/dspace6
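    • For the command line tools that presumably means passing the property via JAVA_OPTS, the same pattern used elsewhere in these notes (a sketch, untested; paths are illustrative):

```
$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8 -Ddspace.dir=/Users/aorth/dspace6" /Users/aorth/dspace6/bin/dspace database info
```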

    2017-11-15

    • Send Adam Hunt an invite to the DSpace Developers network on Yammer
    • He is the new head of communications at WLE, since Michael left
    • Merge changes to item view’s wording of link metadata (#348)

    2017-11-17

    • Uptime Robot said that CGSpace went down today and I see lots of Timeout waiting for idle object errors in the DSpace logs
    • I looked in PostgreSQL using SELECT * FROM pg_stat_activity; and saw that there were 73 active connections
      • After a few minutes the connections went down to 44 and CGSpace was kinda back up, it seems like Tsega restarted Tomcat
    • Looking at the REST and XMLUI log files, I don’t see anything too crazy:
    # cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep "17/Nov/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
          13 66.249.66.223
        2020 66.249.66.219
     
    • I need to look into using JMX to analyze active sessions I think, rather than looking at log files
    • After adding appropriate JMX listener options to Tomcat’s JAVA_OPTS and restarting Tomcat, I can connect remotely using an SSH dynamic port forward (SOCKS) on port 7777 for example, and then start jconsole locally like:
    $ jconsole -J-DsocksProxyHost=localhost -J-DsocksProxyPort=7777 service:jmx:rmi:///jndi/rmi://localhost:9000/jmxrmi -J-DsocksNonProxyHosts=
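      • For reference, the JMX listener options I mean are the standard com.sun.management ones, something like this appended to Tomcat’s JAVA_OPTS (a sketch: port 9000 to match the jconsole command above, and no SSL or authentication because it is only reachable through the SSH tunnel, so adjust to taste):

```
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9000 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Djava.rmi.server.hostname=localhost"
```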
     
      2017-11-19 03:00:32,806 INFO  org.apache.pdfbox.pdfparser.PDFParser @ Document is encrypted
       2017-11-19 03:00:32,807 ERROR org.apache.pdfbox.filter.FlateFilter @ FlateFilter: stop reading corrupt stream due to a DataFormatException
       
      • It’s been a few days since I enabled the G1GC on DSpace Test and the JVM graph definitely changed:

      Tomcat G1GC

      2017-11-20

      • I found an article about JVM tuning that gives some pointers how to enable logging and tools to analyze logs for you
      • Also notes on rotating GC logs
      • I decided to switch DSpace Test back to the CMS garbage collector because it is designed for low pauses and high throughput (like G1GC!) and because we haven’t even tried to monitor or tune it

      2017-11-21


    2017-11-22

    # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "22/Nov/2017:0[456]" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
         136 31.6.77.23
         696 66.249.66.90
         707 104.196.152.243
     

    Tomcat JVM with CMS GC

     942 45.5.184.196
    3995 70.32.83.92
    $ grep 70.32.83.92 dspace.log.2017-11-23 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
     2
     

    2017-11-24

    PostgreSQL connections after tweak (week)

    PostgreSQL connections after tweak (month)

    6053 45.5.184.196
    $ cat dspace.log.2017-11-29 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
     10037
     
    $ cat dspace.log.2017-11-27 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
     12377
     $ cat dspace.log.2017-11-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
     16984
     

    2017-11-30


    December, 2017

    by Alan Orth in Notes

    4007 70.32.83.92
    6061 45.5.184.196
    $ cat /home/cgspace.cgiar.org/log/dspace.log.2017-12-01 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
     5815
         314 2.86.122.76
     
    $ grep 2.86.122.76 /home/cgspace.cgiar.org/log/dspace.log.2017-12-01 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
     822
         319 2001:4b99:1:1:216:3eff:fe76:205b
     

    2017-12-03

    2017-12-04

    DSpace Test PostgreSQL connections month

    CGSpace PostgreSQL connections month

    2017-12-05

  • Linode alerted again that the CPU usage on CGSpace was high this morning from 6 to 8 AM
  • Uptime Robot alerted that the server went down and up around 8:53 this morning
  • Uptime Robot alerted that CGSpace was down and up again a few minutes later
  • I don’t see any errors in the DSpace logs but I see in nginx’s access.log that UptimeRobot was returned with HTTP 499 status (Client Closed Request)
  • Looking at the REST API logs I see some new client IP I haven’t noticed before:
  # cat /var/log/nginx/rest.log /var/log/nginx/rest.log.1 | grep -E "6/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
          18 95.108.181.88
        2662 66.249.66.219
        5110 124.17.34.60
     
    Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.2; Win64; x64; Trident/7.0; LCTE)
     
    $ grep 124.17.34.60 /home/cgspace.cgiar.org/log/dspace.log.2017-12-07 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
     4574
     
    • I’ve adjusted the nginx IP mapping that I set up last month to account for 124.17.34.60 and 124.17.34.59 using a regex, as it’s the same bot on the same subnet
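    • The regex is nothing fancy, just something that matches both addresses on that subnet, which is easy to sanity check with grep before putting it in the map (a sketch, not necessarily the exact expression I used):

```
$ echo -e '124.17.34.59\n124.17.34.60\n124.17.33.1' | grep -E '^124\.17\.34\.[0-9]{1,3}$'
124.17.34.59
124.17.34.60
```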
    • I was running the DSpace cleanup task manually and it hit an error:
    $ /home/cgspace.cgiar.org/bin/dspace cleanup -v
     
     

    2017-12-16

    • Re-work the XMLUI base theme to allow child themes to override the header logo’s image and link destination: #349
    • This required a little bit of work to restructure the XSL templates
    • Optimize PNG and SVG image assets in the CGIAR base theme using pngquant and svgo: #350
  • I also had to add the .jpg to the thumbnail string in the CSV
  • The thumbnail11.jpg is missing
  • The dates are in super long ISO8601 format (from Excel?) like 2016-02-07T00:00:00Z so I converted them to simpler forms in GREL: value.toString("yyyy-MM-dd")
  • I trimmed the whitespaces in a few fields but it wasn’t many
  • Rename her thumbnail column to filename, and format it so SAFBuilder adds the files to the thumbnail bundle with this GREL in OpenRefine: value + "__bundle:THUMBNAIL"
  • Rename dc.identifier.status and dc.identifier.url columns to cg.identifier.status and cg.identifier.url
  • Item 4 has weird characters in citation, ie: Nagoya et de Trait
    $ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" ~/dspace/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/89338 --source /Users/aorth/Downloads/2016\ bulk\ upload\ thumbnails/SimpleArchiveFormat --mapfile=/tmp/ccafs.map &> /tmp/ccafs.log
     
    • It’s the same on DSpace Test, I can’t import the SAF bundle without specifying the collection:
    $ dspace import --add --eperson=aorth@mjanja.ch --mapfile=/tmp/ccafs.map --source=/tmp/ccafs-2016/SimpleArchiveFormat
     No collections given. Assuming 'collections' file inside item directory
     
    -Dlog4j.configuration=file:/Users/aorth/dspace/config/log4j-console.properties -Ddspace.log.init.disable=true
     
    • … but the error message was the same, just with more INFO noise around it
    • For now I’ll import into a collection in DSpace Test but I’m really not sure what’s up with this!
    • Linode alerted that CGSpace was using high CPU from 4 to 6 PM
    • The logs for today show the CORE bot (137.108.70.7) being active in XMLUI:
    4014 70.32.83.92
   11030 45.5.184.196
    • That’s probably ok, as I don’t think the REST API connections use up a Tomcat session…
    • CIP emailed a few days ago to ask about unique IDs for authors and organizations, and if we can provide them via an API
    • Regarding the import issue above it seems to be a known issue that has a patch in DSpace 5.7:
    • We’re on DSpace 5.5 but there is a one-word fix to the addItem() function here: https://github.com/DSpace/DSpace/pull/1731
    • I will apply it on our branch but I need to make a note to NOT cherry-pick it when I rebase on to the latest 5.x upstream later
    • Pull request: #351
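    • For reference, one way to pull a single upstream pull request onto our branch is to grab GitHub’s .patch version of it and apply it with git am (a sketch; I would still review the result, and remember not to cherry-pick it again when rebasing later):

```
$ git checkout 5_x-prod
$ wget https://github.com/DSpace/DSpace/pull/1731.patch
$ git am 1731.patch
```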
  • I need to keep an eye on this issue because it has nice fixes for reducing the number of database connections in DSpace 5.7: https://jira.duraspace.org/browse/DS-3551
  • Update text on CGSpace about page to give some tips to developers about using the resources more wisely (#352)
  • Linode alerted that CGSpace was using 396.3% CPU from 12 to 2 PM
  • The REST and OAI API logs look pretty much the same as earlier this morning, but there’s a new IP harvesting XMLUI:
  # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "18/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
         360 95.108.181.88
     
    $ grep 2.86.72.181 dspace.log.2017-12-18 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l                                                                                          
     1
     
    • I guess there’s nothing I can do to them for now
    • In other news, I am curious how many PostgreSQL connection pool errors we’ve had in the last month:
    $ grep -c "Cannot get a connection, pool error Timeout waiting for idle object" dspace.log.2017-1* | grep -v :0
     dspace.log.2017-11-07:15695
 dspace.log.2017-12-01:1601
     dspace.log.2017-12-02:1274
     dspace.log.2017-12-07:2769
     
    • I made a small fix to my move-collections.sh script so that it handles the case when a “to” or “from” community doesn’t exist
    • The script lives here: https://gist.github.com/alanorth/e60b530ed4989df0c731afbb0c640515
    • Major reorganization of four of CTA’s French collections
    • Basically moving their items into the English ones, then moving the English ones to the top-level of the CTA community, and deleting the old sub-communities
    • Move collection 10568/51821 from 10568/42212 to 10568/42211
    • Move collection 10568/51400 from 10568/42214 to 10568/42211

      2017-12-19

      • Briefly had PostgreSQL connection issues on CGSpace for the millionth time
      • I’m fucking sick of this!
      • The connection graph on CGSpace shows shit tons of connections idle

      Idle PostgreSQL connections on CGSpace

      • And I only now just realized that DSpace’s db.maxidle parameter is not seconds, but number of idle connections to allow.
      • So theoretically, because each webapp has its own pool, this could be 20 per app—so no wonder we have 50 idle connections!
      • I notice that this number will be set to 10 by default in DSpace 6.1 and 7.0: https://jira.duraspace.org/browse/DS-3564
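      • A quick way to see how bad the idling actually is at any given moment (a sketch; the database name is a placeholder and you need a role that can read pg_stat_activity):

```
$ psql -d dspace -c "SELECT state, count(*) FROM pg_stat_activity GROUP BY state;"
```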
      • So I’m going to reduce ours from 20 to 10 and start trying to figure out how the hell to supply a database pool using Tomcat JNDI
      • I re-deployed the 5_x-prod branch on CGSpace, applied all system updates, and restarted the server
      • Looking through the dspace.log I see this error:
      2017-12-19 08:17:15,740 ERROR org.dspace.statistics.SolrLogger @ Error CREATEing SolrCore 'statistics-2010': Unable to create core [statistics-2010] Caused by: Lock obtain timed out: NativeFSLock@/home/cgspace.cgiar.org/solr/statistics-2010/data/index/write.lock
       
      • I don’t have time now to look into this but the Solr sharding has long been an issue!
      • Looking into using JDBC / JNDI to provide a database pool to DSpace
      • The DSpace 6.x configuration docs have more notes about setting up the database pool than the 5.x ones (which actually have none!)
      • First, I uncomment db.jndi in dspace/config/dspace.cfg
        <ResourceLink global="jdbc/dspace" name="jdbc/dspace" type="javax.sql.DataSource"/>
         
        • I am not sure why several guides show configuration snippets for server.xml and web application contexts that use a Local and Global jdbc…
        • When DSpace can’t find the JNDI context (for whatever reason) you will see this in the dspace logs:
        2017-12-19 13:12:08,796 ERROR org.dspace.storage.rdbms.DatabaseManager @ Error retrieving JNDI context: jdbc/dspace
         javax.naming.NameNotFoundException: Name [jdbc/dspace] is not bound in this Context. Unable to find [jdbc].
            <version>9.1-901-1.jdbc4</version>
         </dependency>
         
        • So WTF? Let’s try copying one to Tomcat’s lib folder and restarting Tomcat:
        $ cp ~/dspace/lib/postgresql-9.1-901-1.jdbc4.jar /usr/local/opt/tomcat@7/libexec/lib
         
        • Oh that’s fantastic, now at least Tomcat doesn’t print an error during startup so I guess it succeeds to create the JNDI pool
        • DSpace starts up but I have no idea if it’s using the JNDI configuration because I see this in the logs:
        2017-12-19 13:26:54,271 INFO  org.dspace.storage.rdbms.DatabaseManager @ DBMS is '{}'PostgreSQL
         2017-12-19 13:26:54,277 INFO  org.dspace.storage.rdbms.DatabaseManager @ DBMS driver version is '{}'9.5.10
         2017-12-19 13:26:54,293 INFO  org.dspace.storage.rdbms.DatabaseUtils @ Loading Flyway DB migrations from: filesystem:/Users/aorth/dspace/etc/postgres, classpath:org.dspace.storage.rdbms.sqlmigration.postgres, classpath:org.dspace.storage.rdbms.migration
         2017-12-19 13:26:54,306 INFO  org.flywaydb.core.internal.dbsupport.DbSupportFactory @ Database: jdbc:postgresql://localhost:5432/dspacetest (PostgreSQL 9.5)
         
        • Let’s try again, but this time explicitly blank the PostgreSQL connection parameters in dspace.cfg and see if DSpace starts…
        • Wow, ok, that works, but having to copy the PostgreSQL JDBC JAR to Tomcat’s lib folder totally blows
        • Also, it’s likely this is only a problem on my local macOS + Tomcat test environment
        • Ubuntu’s Tomcat distribution will probably handle this differently
        • So for reference I have:
          • a <Resource> defined globally in server.xml
          • a <ResourceLink> defined in each web application’s context XML
          • unset the db.url, db.username, and db.password parameters in dspace.cfg
          • set the db.jndi in dspace.cfg to the name specified in the web application context
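          • A quick way to double check that dspace.cfg really ended up in that state (a sketch):

```
$ grep -E '^db\.(jndi|url|username|password)' [dspace]/config/dspace.cfg
```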
        • -
        • After adding the Resource to server.xml on Ubuntu I get this in Catalina's logs:
        • +
        • After adding the Resource to server.xml on Ubuntu I get this in Catalina’s logs:
        SEVERE: Unable to create initial connections of pool.
         java.sql.SQLException: org.postgresql.Driver
        @@ -579,8 +579,8 @@ java.sql.SQLException: org.postgresql.Driver
         Caused by: java.lang.ClassNotFoundException: org.postgresql.Driver
         
        • The username and password are correct, but maybe I need to copy the fucking lib there too?
        • I tried installing Ubuntu’s libpostgresql-jdbc-java package but Tomcat still can’t find the class
        • Let me try to symlink the lib into Tomcat’s libs:
        # ln -sv /usr/share/java/postgresql.jar /usr/share/tomcat7/lib
         
          SEVERE: Exception sending context initialized event to listener instance of class org.dspace.app.util.DSpaceContextListener
           java.lang.AbstractMethodError: Method org/postgresql/jdbc3/Jdbc3ResultSet.isClosed()Z is abstract
           
          • Could be a version issue or something since the Ubuntu package provides 9.2 and DSpace’s are 9.1…
          • Let me try to remove it and copy in DSpace’s:
          # rm /usr/share/tomcat7/lib/postgresql.jar
           # cp [dspace]/webapps/xmlui/WEB-INF/lib/postgresql-9.1-901-1.jdbc4.jar /usr/share/tomcat7/lib/
           
          • Wow, I think that actually works…
          • I wonder if I could get the JDBC driver from postgresql.org instead of relying on the one from the DSpace build: https://jdbc.postgresql.org/
          • I notice our version is 9.1-901, which isn’t even available anymore! The latest in the archived versions is 9.1-903
          • Also, since I commented out all the db parameters in DSpace.cfg, how does the command line dspace tool work?
          • Let’s try the upstream JDBC driver first:
          # rm /usr/share/tomcat7/lib/postgresql-9.1-901-1.jdbc4.jar
           # wget https://jdbc.postgresql.org/download/postgresql-42.1.4.jar -O /usr/share/tomcat7/lib/postgresql-42.1.4.jar
           
          • If I add the db values back to dspace.cfg the dspace database info command succeeds but the log still shows errors retrieving the JNDI connection
          • Perhaps something to report to the dspace-tech mailing list when I finally send my comments
          • Oh cool! select * from pg_stat_activity shows “PostgreSQL JDBC Driver” for the application name! That’s how you know it’s working!
          • If you monitor the pg_stat_activity while you run dspace database info you can see that it doesn’t use the JNDI and creates ~9 extra PostgreSQL connections!
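          • To actually watch that happen while dspace database info runs, something like this in another terminal (a sketch, assuming your shell user can read pg_stat_activity on the dspacetest database used above):

```
$ watch -n 1 'psql -d dspacetest -c "SELECT application_name, state, count(*) FROM pg_stat_activity GROUP BY application_name, state;"'
```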
          • And in the middle of all of this Linode sends an alert that CGSpace has high CPU usage from 2 to 4 PM

          2017-12-20


          2017-12-24

          • Linode alerted that CGSpace was using high CPU this morning around 6 AM
          • I’m playing with reading all of a month’s nginx logs into goaccess:
          # find /var/log/nginx -type f -newermt "2017-12-01" | xargs zcat --force | goaccess --log-format=COMBINED -
           
          • I can see interesting things using this approach, for example:
            • 50.116.102.77 checked our status almost 40,000 times so far this month—I think it’s the CGNet uptime tool
            • Also, we’ve handled 2.9 million requests this month from 172,000 unique IP addresses!
            • Total bandwidth so far this month is 640GiB
            • The user that made the most requests so far this month is 45.5.184.196 (267,000 requests)
UPDATE 5
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(dc.language.iso|CGIAR Challenge Program on Water and Food)';
DELETE 20
    • I need to figure out why we have records with language “in” because that’s not a language!

    2017-12-30

    • Linode alerted that CGSpace was using 259% CPU from 4 to 6 AM
    • Uptime Robot noticed that the server went down for 1 minute a few hours later, around 9AM
    • Here’s the XMLUI logs:
    # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "30/Dec/2017" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
         637 207.46.13.106
        1586 66.249.64.78
        3653 66.249.64.91
     
    • Looks pretty normal actually, but I don’t know who 54.175.208.220 is
    • They identify as “com.plumanalytics”, which Google says is associated with Elsevier
    • They only seem to have used one Tomcat session so that’s good, I guess I don’t need to add them to the Tomcat Crawler Session Manager valve:
    $ grep 54.175.208.220 dspace.log.2017-12-30 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l          
     1 
     
    • 216.244.66.245 seems to be moz.com’s DotBot

    2017-12-31


      January, 2018


      2018-01-02

      • Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time
      • I didn’t get any load alerts from Linode and the REST and XMLUI logs don’t show anything out of the ordinary
      • The nginx logs show HTTP 200s until 02/Jan/2018:11:27:17 +0000 when Uptime Robot got an HTTP 500
      • In dspace.log around that time I see many errors like “Client closed the connection before file download was complete”
      • And just before that I see this:
        Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-980] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:50; busy:50; idle:0; lastwait:5000].
         
        • Ah hah! So the pool was actually empty!
        • I need to increase that, let’s try to bump it up from 50 to 75
        • After that one client got an HTTP 499 but then the rest were HTTP 200, so I don’t know what the hell Uptime Robot saw
        • I notice this error quite a few times in dspace.log:
        2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
 dspace.log.2017-12-31:53
         dspace.log.2018-01-01:45
         dspace.log.2018-01-02:34
         
        • Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains

        2018-01-03

    • 134.155.96.78 appears to be at the University of Mannheim in Germany
    • They identify as: Mozilla/5.0 (compatible; heritrix/3.2.0 +http://ifm.uni-mannheim.de)
    • This appears to be the Internet Archive’s open source bot
    • They seem to be re-using their Tomcat session so I don’t need to do anything to them just yet:
    $ grep 134.155.96.78 dspace.log.2018-01-03 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
     2
         139 164.39.7.62
     
    • I have no idea what these are but they seem to be coming from Amazon…
    • I guess for now I just have to increase the database connection pool’s max active
    • It’s currently 75 and normally I’d just bump it by 25 but let me be a bit daring and push it by 50 to 125, because I used to see at least 121 connections in pg_stat_activity before when we were using the shitty default pooling

    2018-01-04


    2018-01-05

    $ grep -c "Timeout: Pool empty." dspace.log.2018-01-*
     dspace.log.2018-01-01:0
 dspace.log.2018-01-05:0
     
    [Fri Jan 05 09:31:22.965398 2018] [:error] [pid 9340] [client 213.55.99.121:64476] WARNING: Unable to find a match for "9-16-1-RV.doc" in "/home/files/journals/6//articles/9/". Skipping this file., referer: http://dagris.info/reviewtool/index.php/index/install/upgrade
     
    • I will delete the log file for now and tell Danny
    • Also, I’m still seeing a hundred or so of the “ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer” errors in dspace logs, I need to search the dspace-tech mailing list to see what the cause is
    • I will run a full Discovery reindex in the mean time to see if it’s something wrong with the Discovery Solr core
    $ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
     $ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
sys     3m14.890s
     
     

    2018-01-06

    • I’m still seeing Solr errors in the DSpace logs even after the full reindex yesterday:
    org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1983+TO+1989]': Encountered " "]" "] "" at line 1, column 32.
     

    2018-01-10

    • I looked to see what happened to this year’s Solr statistics sharding task that should have run on 2018-01-01 and of course it failed:
    Moving: 81742 into core statistics-2010
     Exception: IOException occured when talking to server at: http://localhost:8081/solr//statistics-2010
Caused by: org.apache.http.client.ClientProtocolException
             ... 10 more
     
    • There is interesting documentation about this on the DSpace Wiki: https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance#SOLRStatisticsMaintenance-SolrShardingByYear
    • I’m looking to see maybe if we’re hitting the issues mentioned in DS-2212 that were apparently fixed in DSpace 5.2
    • I can apparently search for records in the Solr stats core that have an empty owningColl field using this in the Solr admin query: -owningColl:*
    • On CGSpace I see 48,000,000 records that have an owningColl field and 34,000,000 that don’t:
    $ http 'http://localhost:3000/solr/statistics/select?q=owningColl%3A*&wt=json&indent=true' | grep numFound 
       "response":{"numFound":48476327,"start":0,"docs":[
$ http 'http://localhost:3000/solr/statistics/select?q=-owningColl%3A*&wt=json&indent=true' | grep numFound
       "response":{"numFound":34879872,"start":0,"docs":[
     
    • I tested the dspace stats-util -s process on my local machine and it failed the same way
    • It doesn’t seem to be helpful, but the dspace log shows this:
    2018-01-10 10:51:19,301 INFO  org.dspace.statistics.SolrLogger @ Created core with name: statistics-2016
     2018-01-10 10:51:19,301 INFO  org.dspace.statistics.SolrLogger @ Moving: 3821 records into core statistics-2016
     
    • Terry Brady has written some notes on the DSpace Wiki about Solr sharing issues: https://wiki.duraspace.org/display/%7Eterrywbrady/Statistics+Import+Export+Issues
    • Uptime Robot said that CGSpace went down at around 9:43 AM
    • I looked at PostgreSQL’s pg_stat_activity table and saw 161 active connections, but no pool errors in the DSpace logs:
    $ grep -c "Timeout: Pool empty." dspace.log.2018-01-10 
     0
     
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.111 Safari/537.36"
     
    • whois says they come from Perfect IP
    • I’ve never seen those top IPs before, but they have created 50,000 Tomcat sessions today:
    $ grep -E '(2607:fa98:40:9:26b6:fdff:feff:1888|2607:fa98:40:9:26b6:fdff:feff:195d|2607:fa98:40:9:26b6:fdff:feff:1c96|70.36.107.49|70.36.107.190|70.36.107.50)' /home/cgspace.cgiar.org/log/dspace.log.2018-01-10 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l                                                                                                                                                                                                  
     49096
       23401 2607:fa98:40:9:26b6:fdff:feff:195d 
       47875 2607:fa98:40:9:26b6:fdff:feff:1888
     
    • I added the user agent to nginx’s badbots limit req zone but upon testing the config I got an error:
    # nginx -t
     nginx: [emerg] could not build map_hash, you should increase map_hash_bucket_size: 64
     nginx: configuration file /etc/nginx/nginx.conf test failed
     
    # cat /proc/cpuinfo | grep cache_alignment | head -n1
     cache_alignment : 64
     
    • On our servers that is 64, so I increased this parameter to 128 and deployed the changes to nginx
    • Almost immediately the PostgreSQL connections dropped back down to 40 or so, and UptimeRobot said the site was back up
    • So that’s interesting that we’re not out of PostgreSQL connections (current pool maxActive is 300!) but the system is “down” to UptimeRobot and very slow to use
    • Linode continues to test mitigations for Meltdown and Spectre: https://blog.linode.com/2018/01/03/cpu-vulnerabilities-meltdown-spectre/
    • I rebooted DSpace Test to see if the kernel will be updated (currently Linux 4.14.12-x86_64-linode92)… nope.
    • It looks like Linode will reboot the KVM hosts later this week, though
    • Wow, I just figured out how to set the application name of each database pool in the JNDI config of Tomcat’s server.xml:
    <Resource name="jdbc/dspaceWeb" auth="Container" type="javax.sql.DataSource"
               driverClassName="org.postgresql.Driver"
               validationQuery='SELECT 1'
               testOnBorrow='true' />
     
    • So theoretically I could name each connection “xmlui” or “dspaceWeb” or something meaningful and it would show up in PostgreSQL’s pg_stat_activity table!
    • This would be super helpful for figuring out where load was coming from (now I wonder if I could figure out how to graph this)
    • Also, I realized that the db.jndi parameter in dspace.cfg needs to match the name value in your application’s context—not the global one
    • Ah hah! Also, I can name the default DSpace connection pool in dspace.cfg as well, like:
    db.url = jdbc:postgresql://localhost:5432/dspacetest?ApplicationName=dspaceDefault
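    • Then it should be easy to see which pool each connection belongs to (a sketch, against the dspacetest database above):

```
$ psql -d dspacetest -c 'SELECT DISTINCT application_name FROM pg_stat_activity;'
```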
     
     

    2018-01-12

    • I’m looking at the DSpace 6.0 Install docs and notice they tweak the number of threads in their Tomcat connector:
    <!-- Define a non-SSL HTTP/1.1 Connector on port 8080 -->
     <Connector port="8080"
                URIEncoding="UTF-8"/>
     
    • In Tomcat 8.5 the maxThreads defaults to 200 which is probably fine, but tweaking minSpareThreads could be good
    • I don’t see a setting for maxSpareThreads in the docs so that might be an error
    • Looks like in Tomcat 8.5 the default URIEncoding for Connectors is UTF-8, so we don’t need to specify that manually anymore: https://tomcat.apache.org/tomcat-8.5-doc/config/http.html
    • Ooh, I just saw the acceptorThreadCount setting (in Tomcat 7 and 8.5):
    The number of threads to be used to accept connections. Increase this value on a multi CPU machine, although you would never really need more than 2. Also, with a lot of non keep alive connections, you might want to increase this value as well. Default value is 1.
     
    13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxActive is not used in DBCP2, use maxTotal instead. maxTotal default value is 8. You have set value of "35" for "maxActive" property, which is being ignored.
     13-Jan-2018 13:59:05.245 WARNING [main] org.apache.tomcat.dbcp.dbcp2.BasicDataSourceFactory.getObjectInstance Name = dspace6 Property maxWait is not used in DBCP2 , use maxWaitMillis instead. maxWaitMillis default value is -1. You have set value of "5000" for "maxWait" property, which is being ignored.
     
    • I looked in my Tomcat 7.0.82 logs and I don’t see anything about DBCP2 errors, so I guess this is a Tomcat 8.0.x or 8.5.x thing
    • DBCP2 appears to be Tomcat 8.0.x and up according to the Tomcat 8.0 migration guide
    • I have updated our Ansible infrastructure scripts so that it will be ready whenever we switch to Tomcat 8 (probably with Ubuntu 18.04 later this year)
    • When I enable the ResourceLink in the ROOT.xml context I get the following error in the Tomcat localhost log:
Caused by: java.lang.NullPointerException
        ... 15 more
    • Interesting blog post benchmarking Tomcat JDBC vs Apache Commons DBCP2, with configuration snippets: http://www.tugay.biz/2016/07/tomcat-connection-pool-vs-apache.html
    • The Tomcat vs Apache pool thing is confusing, but apparently we’re using Apache Commons DBCP2 because we don’t specify factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" in our global resource
    • So at least I know that I’m not looking for documentation or troubleshooting on the Tomcat JDBC pool!
    • I looked at pg_stat_activity during Tomcat’s startup and I see that the pool created in server.xml is indeed connecting, just that nothing uses it
    • Also, the fallback connection parameters specified in local.cfg (not dspace.cfg) are used
    • Shit, this might actually be a DSpace error: https://jira.duraspace.org/browse/DS-3434
    • I’ll comment on that issue

    2018-01-14

    • Looking at the authors Peter had corrected
    • Some had multiple and he’s corrected them by adding || in the correction column, but I can’t process those this way so I will just have to flag them and do those manually later
    • Also, I can flag the values that have “DELETE”
    • Then I need to facet the correction column on isBlank(value) and not flagged

    2018-01-15

    • Help Udana from IWMI export a CSV from DSpace Test so he can start trying a batch upload
    • I’m going to apply these ~130 corrections on CGSpace:
    update metadatavalue set text_value='Formally Published' where resource_type_id=2 and metadata_field_id=214 and text_value like 'Formally published';
     delete from metadatavalue where resource_type_id=2 and metadata_field_id=214 and text_value like 'NO';
     update metadatavalue set text_value='in' where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(IN|In)';
     delete from metadatavalue where resource_type_id=2 and metadata_field_id=38 and text_value ~ '(dc.language.iso|CGIAR Challenge Program on Water and Food)';
     
    • Continue proofing Peter’s author corrections that I started yesterday, faceting on non blank, non flagged, and briefly scrolling through the values of the corrections to find encoding errors for French and Spanish names

    OpenRefine Authors

    • Looking over the affiliations again I see dozens of CIAT ones with their affiliation formatted like: International Center for Tropical Agriculture (CIAT)
    • For example, this one is from just last month: https://cgspace.cgiar.org/handle/10568/89930
    • Our controlled vocabulary has this in the format without the abbreviation: International Center for Tropical Agriculture
    • So some submitters don’t know to use the controlled vocabulary lookup
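    • A rough way to see how widespread that is would be to count them directly in the database (a sketch; the database name and the metadata_field_id are placeholders, so look up the real field ID for cg.contributor.affiliation in the metadatafieldregistry first):

```
$ psql -d dspace -c "SELECT count(*) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=211 AND text_value LIKE '%(CIAT)%';"
```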
    • Help Sisay with some thumbnails for book chapters in Open Refine and SAFBuilder
    • CGSpace users were having problems logging in, I think something’s wrong with LDAP because I see this in the logs:
    2018-01-15 12:53:15,810 WARN  org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=2386749547D03E0AA4EC7E44181A7552:ip_addr=x.x.x.x:ldap_authentication:type=failed_auth javax.naming.AuthenticationException\colon; [LDAP\colon; error code 49 - 80090308\colon; LdapErr\colon; DSID-0C090400, comment\colon; AcceptSecurityContext error, data 775, v1db1^@]
     
      • Meeting with CGSpace team, a few action items:
        • Discuss standardized names for CRPs and centers with ICARDA (don’t wait for CG Core)
        • Re-send DC rights implementation and forward to everyone so we can move forward with it (without the URI field for now)
        • Start looking at where I was with the AGROVOC API
        • Have a controlled vocabulary for CGIAR authors’ names and ORCIDs? Perhaps values like: Orth, Alan S. (0000-0002-1735-7458)
        • @@ -845,15 +845,15 @@ sys 0m2.210s
        • Add Sisay and Danny to Uptime Robot and allow them to restart Tomcat on CGSpace ✔
      • I removed Tsega’s SSH access to the web and DSpace servers, and asked Danny to check whether there is anything he needs from Tsega’s home directories so we can delete the accounts completely
      • I removed Tsega’s access to Linode dashboard as well
      • I ended up creating a Jira issue for my db.jndi documentation fix: DS-3803
      • The DSpace developers said they wanted each pull request to be associated with a Jira issue

      2018-01-17

      • Abenet asked me to proof and upload 54 records for LIVES
• A few records were missing countries (even though they’re all from Ethiopia)
      • Also, there are whitespace issues in many columns, and the items are mapped to the LIVES and ILRI articles collections, not Theses
      • In any case, importing them like this:
      @@ -862,7 +862,7 @@ $ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFor
    • And fantastic, before I started the import there were 10 PostgreSQL connections, and then CGSpace crashed during the upload
    • When I looked there were 210 PostgreSQL connections!
• I don’t see any high load in XMLUI or REST/OAI:
    # cat /var/log/nginx/access.log /var/log/nginx/access.log.1 /var/log/nginx/library-access.log /var/log/nginx/library-access.log.1 | grep -E "17/Jan/2018" | awk '{print $1}' | sort -n | uniq -c | sort -h | tail
         381 40.77.167.124
    @@ -892,8 +892,8 @@ $ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFor
     
    2018-01-17 07:59:25,856 INFO  org.apache.http.impl.client.SystemDefaultHttpClient @ I/O exception (org.apache.http.NoHttpResponseException) caught when processing request to {}->http://localhost:8081: The target server failed to respond
     2018-01-17 07:59:25,856 INFO  org.apache.http.impl.client.SystemDefaultHttpClient @ Retrying request to {}->http://localhost:8081
     
• I have NEVER seen this error before, and there is no error before or after that in DSpace’s solr.log
• Tomcat’s catalina.out does show something interesting, though, right at that time:
    [====================>                              ]40% time remaining: 7 hour(s) 14 minute(s) 45 seconds. timestamp: 2018-01-17 07:57:02
     [====================>                              ]40% time remaining: 7 hour(s) 14 minute(s) 45 seconds. timestamp: 2018-01-17 07:57:11
    @@ -933,7 +933,7 @@ Exception in thread "http-bio-127.0.0.1-8081-exec-627" java.lang.OutOf
             at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
             at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     
• You can see the timestamp above, which is some Atmire nightly task I think, but I can’t figure out which one
    • So I restarted Tomcat and tried the import again, which finished very quickly and without errors!
    $ dspace import -a -e aorth@mjanja.ch -s /tmp/2018-01-16\ LIVES/SimpleArchiveFormat -m lives2.map &> lives2.log
    @@ -942,7 +942,7 @@ Exception in thread "http-bio-127.0.0.1-8081-exec-627" java.lang.OutOf
     
     

    Tomcat JVM Heap

    $ docker pull docker.bintray.io/jfrog/artifactory-oss:latest
     $ docker volume create --name artifactory5_data
    @@ -961,10 +961,10 @@ $ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/
     
    $ mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false -Denv=localhost -P \!dspace-sword,\!dspace-swordv2 clean package
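For reference, the full docker run that the truncated hunk above refers to looks roughly like this (a sketch; the container data path and published port are assumptions):

```
# create the build network and start Artifactory with the data volume attached
$ docker network create dspace-build
$ docker run --network dspace-build --name artifactory -d \
    -v artifactory5_data:/var/opt/jfrog/artifactory \
    -p 8081:8081 docker.bintray.io/jfrog/artifactory-oss:latest
```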
     
    • UptimeRobot said CGSpace went down for a few minutes
• I didn’t do anything but it came back up on its own
• I don’t see anything unusual in the XMLUI or REST/OAI logs
    • Now Linode alert says the CPU load is high, sigh
• Regarding the heap space error earlier today, it looks like it does happen a few times a week or month (I’m not sure how far these logs go back, as they are not strictly daily):
    # zgrep -c java.lang.OutOfMemoryError /var/log/tomcat7/catalina.out* | grep -v :0
     /var/log/tomcat7/catalina.out:2
    @@ -994,14 +994,14 @@ $ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/
     

    2018-01-18

    • UptimeRobot said CGSpace was down for 1 minute last night
• I don’t see any errors in the nginx or catalina logs, so I guess UptimeRobot just got impatient and closed the request, which caused nginx to send an HTTP 499
    • I realize I never did a full re-index after the SQL author and affiliation updates last week, so I should force one now:
    $ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
     $ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-discovery -b
     
    • Maria from Bioversity asked if I could remove the abstracts from all of their Limited Access items in the Bioversity Journal Articles collection
• It’s easy enough to do in OpenRefine, but you have to be careful to only get those items that are uploaded into Bioversity’s collection, not the ones that are mapped from others!
    • Use this GREL in OpenRefine after isolating all the Limited Access items: value.startsWith("10568/35501")
    • UptimeRobot said CGSpace went down AGAIN and both Sisay and Danny immediately logged in and restarted Tomcat without talking to me or each other!
@@ -1011,8 +1011,8 @@ Jan 18 07:01:22 linode18 systemd[1]: Stopping LSB: Start Tomcat....
Jan 18 07:01:22 linode18 sudo[10812]: swebshet : TTY=pts/3 ; PWD=/home/swebshet ; USER=root ; COMMAND=/bin/systemctl restart tomcat7
Jan 18 07:01:22 linode18 sudo[10812]: pam_unix(sudo:session): session opened for user root by swebshet(uid=0)
• I had to cancel the Discovery indexing and I’ll have to re-try it another time when the server isn’t so busy (it had already taken two hours and wasn’t even close to being done)
• For now I’ve increased the Tomcat JVM heap from 5632 to 6144m, to give ~1GB of free memory over the average usage to hopefully account for spikes caused by load or background jobs
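A quick sanity check that the new heap limit actually took effect after restarting Tomcat (a sketch; it should print the 6144m value mentioned above):

```
$ sudo systemctl restart tomcat7
# confirm the running JVM picked up the new -Xmx value
$ ps -ef | grep '[t]omcat7' | grep -o -E 'Xmx[0-9]+m'
```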

    2018-01-19

@@ -1023,8 +1023,8 @@ Jan 18 07:01:22 linode18 sudo[10812]: pam_unix(sudo:session): session opened for
$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace index-discovery -b
    • Linode alerted again and said that CGSpace was using 301% CPU
• Peter emailed to ask why this item doesn’t have an Altmetric badge on CGSpace but does have one on the Altmetric dashboard
• Looks like our badge code calls the handle endpoint which doesn’t exist:
    https://api.altmetric.com/v1/handle/10568/88090
     
      @@ -1060,7 +1060,7 @@ real 7m2.241s user 1m33.198s sys 0m12.317s
• I tested the abstract cleanups on Bioversity’s Journal Articles collection again that I had started a few days ago
    • In the end there were 324 items in the collection that were Limited Access, but only 199 had abstracts
    • I want to document the workflow of adding a production PostgreSQL database to a development instance of DSpace in Docker:
    @@ -1075,7 +1075,7 @@ $ docker cp ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspace_db: $ docker exec dspace_db psql -U dspace -f /tmp/update-sequences.sql dspace
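The hunk above only shows the tail end of that workflow; end to end it looks roughly like this (a sketch; the image tag, dump file name, and passwords are assumptions):

```
# run PostgreSQL in a container with a named volume for its data
$ docker volume create dspace_data
$ docker run --name dspace_db -v dspace_data:/var/lib/postgresql/data \
    -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.5
# create the role and database, then restore the production dump into it
$ createuser -h localhost -U postgres --pwprompt dspace
$ createdb -h localhost -U postgres -O dspace --encoding=UNICODE dspace
$ pg_restore -h localhost -U postgres -d dspace -O --role=dspace /tmp/cgspace_db.backup
# finally, fix the sequences as shown in the hunk above
$ docker cp ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspace_db:/tmp
$ docker exec dspace_db psql -U dspace -f /tmp/update-sequences.sql dspace
```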

    2018-01-22

• Look over Udana’s CSV of 25 WLE records from last week
    • I sent him some corrections:
      • The file encoding is Windows-1252
      • @@ -1090,7 +1090,7 @@ $ docker exec dspace_db psql -U dspace -f /tmp/update-sequences.sql dspace
      • I wrote a quick Python script to use the DSpace REST API to find all collections under a given community
      • The source code is here: rest-find-collections.py
• Peter had said that he found a bunch of ILRI collections that were called “untitled”, but I don’t see any:
      $ ./rest-find-collections.py 10568/1 | wc -l
       308
      @@ -1099,17 +1099,17 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
       
    • Looking at the Tomcat connector docs I think we really need to increase maxThreads
• The default is 200, which can easily be taken up by bots, considering that Google and Bing sometimes browse with fifty (50) connections each!
    • Before I increase this I want to see if I can measure and graph this, and then benchmark
• I’ll probably also increase minSpareThreads to 20 (its default is 10)
    • I still want to bump up acceptorThreadCount from 1 to 2 as well, as the documentation says this should be increased on multi-core systems
    • I spent quite a bit of time looking at jvisualvm and jconsole today
    • Run system updates on DSpace Test and reboot it
    • I see I can monitor the number of Tomcat threads and some detailed JVM memory stuff if I install munin-plugins-java
• I’d still like to get arbitrary mbeans like activeSessions etc, though
• I can’t remember if I had to configure the jmx settings in /etc/munin/plugin-conf.d/munin-node or not—I think all I did was re-run the munin-node-configure script and of course enable JMX in Tomcat’s JVM options

    2018-01-23

• Thinking about generating a jmeter test plan for DSpace, along the lines of Georgetown’s dspace-performance-test
    • I got a list of all the GET requests on CGSpace for January 21st (the last time Linode complained the load was high), excluding admin calls:
    # zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz /var/log/nginx/library-access.log.2.gz /var/log/nginx/library-access.log.3.gz /var/log/nginx/rest.log.2.gz /var/log/nginx/rest.log.3.gz /var/log/nginx/oai.log.2.gz /var/log/nginx/oai.log.3.gz /var/log/nginx/error.log.2.gz /var/log/nginx/error.log.3.gz | grep "21/Jan/2018" | grep "GET " | grep -c -v "/admin"
    @@ -1208,7 +1208,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
     
    $ jmeter -g 2018-01-24-linode5451120-baseline.jtl -o 2018-01-24-linode5451120-baseline
     

    2018-01-25

• Run another round of tests on DSpace Test with jmeter after changing Tomcat’s minSpareThreads to 20 (default is 10) and acceptorThreadCount to 2 (default is 1):
    $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads.log
     $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads2.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-tomcat-threads2.log
    @@ -1221,18 +1221,18 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
     $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-g1gc2.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-g1gc2.log
     $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.jmx -l ~/dspace-performance-test/2018-01-25-linode5451120-g1gc3.jtl -j ~/dspace-performance-test/2018-01-25-linode5451120-g1gc3.log
     
• I haven’t had time to look at the results yet
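For context, the g1gc runs above imply Tomcat was restarted with the G1 collector enabled; the exact flags used for those runs are not recorded here, but the change amounts to something like this in /etc/default/tomcat7 (values are placeholders):

```
# heap sizes and GC choice are placeholders for the test configuration
JAVA_OPTS="-Djava.awt.headless=true -Xms6144m -Xmx6144m -XX:+UseG1GC -Dfile.encoding=UTF-8"
```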

    2018-01-26

    • Peter followed up about some of the points from the Skype meeting last week
• Regarding the ORCID field issue, I see ICARDA’s MELSpace is using cg.creator.ID: 0000-0001-9156-7691
    • I had floated the idea of using a controlled vocabulary with values formatted something like: Orth, Alan S. (0000-0002-1735-7458)
    • Update PostgreSQL JDBC driver version from 42.1.4 to 42.2.1 on DSpace Test, see: https://jdbc.postgresql.org/
    • Reboot DSpace Test to get new Linode kernel (Linux 4.14.14-x86_64-linode94)
    • I am testing my old work on the dc.rights field, I had added a branch for it a few months ago
    • I added a list of Creative Commons and other licenses in input-forms.xml
• The problem is that Peter wanted to use two questions, one for CG centers and one for other, but using the same metadata value, which isn’t possible (?)
• So I used some creativity and made several fields display values, but not store any, i.e.:
    <pair>
    @@ -1240,7 +1240,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
       <stored-value></stored-value>
     </pair>
     
• I was worried that if a user selected this field for some reason that DSpace would store an empty value, but it simply doesn’t register that as a valid option:

    Rights

      @@ -1286,9 +1286,9 @@ Was expecting one of: Maximum: 2771268 Average: 210483
• I guess responses that don’t fit in RAM get saved to disk (a default of 1024M), so this is definitely not the issue here, and that warning is totally unrelated
• My best guess is that the Solr search error is related somehow but I can’t figure it out
• We definitely have enough database connections, as I haven’t seen a pool error in weeks:
    $ grep -c "Timeout: Pool empty." dspace.log.2018-01-2*
     dspace.log.2018-01-20:0
    @@ -1305,7 +1305,7 @@ dspace.log.2018-01-29:0
     
  • Adam Hunt from WLE complained that pages take “1-2 minutes” to load each, from France and Sri Lanka
  • I asked him which particular pages, as right now pages load in 2 or 3 seconds for me
  • UptimeRobot said CGSpace went down again, and I looked at PostgreSQL and saw 211 active database connections
• If it’s not memory and it’s not the database, it’s got to be Tomcat threads; seeing as the default maxThreads is only 200, that actually makes sense
  • I decided to change the Tomcat thread settings on CGSpace:
    • maxThreads from 200 (default) to 400
@@ -1333,8 +1333,8 @@
busy.value 0
idle.value 20
max.value 400
• Apparently you can’t monitor more than one connector, so I guess the most important to monitor would be the one that nginx is sending stuff to
• So for now I think I’ll just monitor these and skip trying to configure the jmx plugins
    • Although following the logic of /usr/share/munin/plugins/jmx_tomcat_dbpools could be useful for getting the active Tomcat sessions
    • From debugging the jmx_tomcat_db_pools script from the munin-plugins-java package, I see that this is how you call arbitrary mbeans:
    @@ -1343,7 +1343,7 @@ Catalina:type=DataSource,class=javax.sql.DataSource,name="jdbc/dspace"
    [===================>                               ]38% time remaining: 5 hour(s) 21 minute(s) 47 seconds. timestamp: 2018-01-29 06:25:16
     
    @@ -1411,18 +1411,18 @@ javax.ws.rs.WebApplicationException

    CPU usage week

    # port=5400 ip="127.0.0.1" /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans Catalina:type=Manager,context=/,host=localhost activeSessions
     Catalina:type=Manager,context=/,host=localhost  activeSessions  8
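The same invocation pattern works for other mbeans, for example the connector thread pool (the connector name here is an assumption based on the http-bio-127.0.0.1-8443 threads seen in the stack traces above):

```
# port=5400 ip="127.0.0.1" /usr/bin/java -cp /usr/share/munin/munin-jmx-plugins.jar org.munin.plugin.jmx.Beans 'Catalina:type=ThreadPool,name="http-bio-127.0.0.1-8443"' currentThreadsBusy
```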
     

    MBeans in JVisualVM

diff --git a/docs/2018-02/index.html b/docs/2018-02/index.html
index 4adbe47d5..f39111521 100644
--- a/docs/2018-02/index.html
+++ b/docs/2018-02/index.html
@@ -23,11 +23,11 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu's munin-plugins-java package

    February, 2018

    @@ -112,9 +112,9 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu's munin-plug

    2018-02-01

    DSpace Sessions

    cgspace=# update metadatavalue set text_value='United States Agency for International Development' where resource_type_id=2 and metadata_field_id=29 and text_value like '%U.S. Agency for International Development%';
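Before running an update like that it is worth checking how many rows will actually match the same criteria, for example:

```
cgspace=# select count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=29 and text_value like '%U.S. Agency for International Development%';
```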
    diff --git a/docs/2018-03/index.html b/docs/2018-03/index.html
    index 7882881d0..8ee63ebc8 100644
    --- a/docs/2018-03/index.html
    +++ b/docs/2018-03/index.html
    @@ -21,7 +21,7 @@ Export a CSV of the IITA community metadata for Martin Mueller
     

    March, 2018

    @@ -143,7 +143,7 @@ UPDATE 1
    • Add CIAT author Mauricio Efren Sotelo Cabrera to controlled vocabulary for ORCID identifiers (#360)
    • Help Sisay proof 200 IITA records on DSpace Test
• Finally import Udana’s 24 items to IWMI Journal Articles on CGSpace
    • Skype with James Stapleton to discuss CGSpace, ILRI website, CKM staff issues, etc

    2018-03-08

    @@ -189,14 +189,14 @@ dspacetest=# select distinct text_lang from metadatavalue where resource_type_id es (9 rows)
• On second inspection it looks like dc.description.provenance fields use the text_lang “en” so that’s probably why there are over 100,000 fields changed…
    • If I skip that, there are about 2,000, which seems more reasonably like the amount of fields users have edited manually, or fucked up during CSV import, etc:
    dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and text_lang in ('EN','En','en_','EN_US','en_U','eng');
     UPDATE 2309
     
    • I will apply this on CGSpace right now
• In other news, I was playing with adding ORCID identifiers to a dump of CIAT’s community via CSV in OpenRefine
    • Using a series of filters, flags, and GREL expressions to isolate items for a certain author, I figured out how to add ORCID identifiers to the cg.creator.id field
    • For example, a GREL expression in a custom text facet to get all items with dc.contributor.author[en_US] of a certain author with several name variations (this is how you use a logical OR in OpenRefine):
    @@ -206,7 +206,7 @@ UPDATE 2309
    if(isBlank(value), "Hernan Ceballos: 0000-0002-8744-7918", value + "||Hernan Ceballos: 0000-0002-8744-7918")
     
    # grep -c 'java.lang.OutOfMemoryError: Java heap space' /var/log/tomcat7/catalina.out
     178
     
• I will increase the JVM heap size from 5120M to 6144M, though we don’t have much room left to grow as DSpace Test (linode19) is using a smaller instance size than CGSpace
    • Gabriela from CIP asked if I could send her a list of all CIP authors so she can do some replacements on the name formats
• I got a list of all the CIP collections manually and used the same query that I used in August, 2017:
    @@ -445,8 +445,8 @@ sys 2m2.687s

    2018-04-20

• Gabriela from CIP emailed to say that CGSpace was returning a white page, but I haven’t seen any emails from UptimeRobot
• I confirm that it’s just giving a white page around 4:16
    • The DSpace logs show that there are no database connections:
    org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-bio-127.0.0.1-8443-exec-715] Timeout: Pool empty. Unable to fetch a connection in 5 seconds, none available[size:250; busy:18; idle:0; lastwait:5000].
    @@ -456,7 +456,7 @@ sys     2m2.687s
     
    # grep -c 'org.apache.tomcat.jdbc.pool.PoolExhaustedException' /home/cgspace.cgiar.org/log/dspace.log.2018-04-20
     32147
     
• I can’t even log into PostgreSQL as the postgres user, WTF?
    $ psql -c 'select * from pg_stat_activity' | grep -o -E '(dspaceWeb|dspaceApi|dspaceCli)' | sort | uniq -c 
     ^C
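For future reference, when the server is responsive it is cheaper to let PostgreSQL do the counting itself instead of grepping the full pg_stat_activity output (assuming the DSpace pools set application_name, which is what the grep above matches on):

```
$ psql -c "select application_name, count(*) from pg_stat_activity group by application_name order by 2 desc;"
```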
    @@ -475,7 +475,7 @@ sys     2m2.687s
        4325 70.32.83.92
       10718 45.5.184.2
     
• It doesn’t even seem like there is a lot of traffic compared to the previous days:
    # zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "20/Apr/2018" | wc -l
     74931
    @@ -485,9 +485,9 @@ sys     2m2.687s
     93459
     
    • I tried to restart Tomcat but systemctl hangs
• I tried to reboot the server from the command line but after a few minutes it didn’t come back up
    • Looking at the Linode console I see that it is stuck trying to shut down
• Even “Reboot” via Linode console doesn’t work!
    • After shutting it down a few times via the Linode console it finally rebooted
    • Everything is back but I have no idea what caused this—I suspect something with the hosting provider
    • Also super weird, the last entry in the DSpace log file is from 2018-04-20 16:35:09, and then immediately it goes to 2018-04-20 19:15:04 (three hours later!):
    • @@ -518,13 +518,13 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Time
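One way to see that three-hour gap at a glance is to count dspace.log entries per hour for that day (same log path as the grep above):

```
$ grep -o -E '^2018-04-20 [0-9]{2}' /home/cgspace.cgiar.org/log/dspace.log.2018-04-20 | sort | uniq -c
```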

    2018-04-24

• Testing my Ansible playbooks with a clean and updated installation of Ubuntu 18.04 and I fixed some issues that I hadn’t run into a few weeks ago
    • There seems to be a new issue with Java dependencies, though
    • The default-jre package is going to be Java 10 on Ubuntu 18.04, but I want to use openjdk-8-jre-headless (well, the JDK actually, but it uses this JRE)
    • Tomcat and Ant are fine with Java 8, but the maven package wants to pull in Java 10 for some reason
    • Looking closer, I see that maven depends on java7-runtime-headless, which is indeed provided by openjdk-8-jre-headless
• So it must be one of Maven’s dependencies…
• I will watch it for a few days because it could be an issue that will be resolved before Ubuntu 18.04’s release
    • Otherwise I will post a bug to the ubuntu-release mailing list
    • Looks like the only way to fix this is to install openjdk-8-jdk-headless before (so it pulls in the JRE) in a separate transaction, or to manually install openjdk-8-jre-headless in the same apt transaction as maven
    • Also, I started porting PostgreSQL 9.6 into the Ansible infrastructure scripts
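In practice the Java and Maven workaround described above boils down to one of these two apt invocations (a sketch of the two options):

```
# option 1: install the JDK in its own transaction first, then Maven
$ sudo apt install openjdk-8-jdk-headless
$ sudo apt install maven
# option 2: pull in the Java 8 JRE explicitly in the same transaction as Maven
$ sudo apt install openjdk-8-jre-headless maven
```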
    • @@ -534,12 +534,12 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Time
      • Still testing the Ansible infrastructure playbooks for Ubuntu 18.04, Tomcat 8.5, and PostgreSQL 9.6
      • One other new thing I notice is that PostgreSQL 9.6 no longer uses createuser and nocreateuser, as those have actually meant superuser and nosuperuser and have been deprecated for ten years
• So for my notes, when I’m importing a CGSpace database dump I need to amend my notes to give super user permission to a user, rather than create user:
      $ psql dspacetest -c 'alter user dspacetest superuser;'
       $ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-18.backup
       
• There’s another issue with Tomcat in Ubuntu 18.04:
      25-Apr-2018 13:26:21.493 SEVERE [http-nio-127.0.0.1-8443-exec-1] org.apache.coyote.AbstractProtocol$ConnectionHandler.process Error reading request, ignored
        java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;
      @@ -554,13 +554,13 @@ $ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-
               at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
               at java.lang.Thread.run(Thread.java:748)
       

      2018-04-29

      • DSpace Test crashed again, looks like memory issues again
• JVM heap size was last increased to 6144m but the system only has 8GB total so there’s not much we can do here other than get a bigger Linode instance or remove the massive Solr Statistics data

      2018-04-30

diff --git a/docs/2018-05/index.html b/docs/2018-05/index.html
index 8436ba410..f6f2e7a0f 100644
--- a/docs/2018-05/index.html
+++ b/docs/2018-05/index.html
@@ -35,7 +35,7 @@ http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E
Then I reduced the JVM heap size from 6144 back to 5120m
Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use

        May, 2018

        @@ -135,7 +135,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
      • Looking over some IITA records for Sisay
        • Other than trimming and collapsing consecutive whitespace, I made some other corrections
• I need to check the correct formatting of COTE D'IVOIRE vs COTE D’IVOIRE
        • I replaced all DOIs with HTTPS
        • I checked a few DOIs and found at least one that was missing, so I Googled the title of the paper and found the correct DOI
        • Also, I found an FAQ for DOI that says the dx.doi.org syntax is older, so I will replace all the DOIs with doi.org instead
        • @@ -180,7 +180,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
        $ for line in $(< /tmp/links.txt); do echo $line; http --print h $line; done
         
• Most of the links are good, though one is duplicate and one seems to even be incorrect in the publisher’s site so…
        • Also, there are some duplicates:
          • 10568/92241 and 10568/92230 (same DOI)
@@ -216,8 +216,8 @@ $ ./resolve-orcids.py -i /tmp/2018-05-06-combined.txt -o /tmp/2018-05-06-combine
# sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents)
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
• I made a pull request (#373) for this that I’ll merge some time next week (I’m expecting Atmire to get back to us about DSpace 5.8 soon)
• After testing quickly I just decided to merge it, and I noticed that I don’t even need to restart Tomcat for the changes to get loaded

    2018-05-07

      @@ -225,7 +225,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
    • The documentation regarding the Solr stuff is limited, and I cannot figure out what all the fields in conciliator.properties are supposed to be
    • But then I found reconcile-csv, which allows you to reconcile against values in a CSV file!
    • That, combined with splitting our multi-value fields on “||” in OpenRefine is amaaaaazing, because after reconciliation you can just join them again
• Oh wow, you can also facet on the individual values once you’ve split them! That’s going to be amazing for proofing CRPs, subjects, etc.
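For the record, reconcile-csv is just a small jar that serves a reconciliation endpoint from a CSV file, which OpenRefine can then be pointed at; a minimal launch sketch per its README (the jar version, file name, and column names are placeholders):

```
# serve a reconciliation endpoint from crps.csv, matching on "name" and returning "id"
$ java -Xmx2g -jar reconcile-csv-0.1.2.jar crps.csv name id
```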

    2018-05-09

    2018-12-13

• Oh this is very interesting: WorldFish’s repository is live now
• It’s running DSpace 5.9-SNAPSHOT on KnowledgeArc and the OAI and REST interfaces are active at least
• Also, I notice they ended up registering a Handle (they had been considering taking KnowledgeArc’s advice to not use Handles!)
    • Did some coordination work on the hotel bookings for the January AReS workshop in Amman

    2018-12-17

@@ -479,7 +479,7 @@ $ ls -lh cgspace_2018-12-19.backup*
-rw-r--r-- 1 aorth aorth 94M Dec 20 11:36 cgspace_2018-12-19.backup.gz
-rw-r--r-- 1 aorth aorth 93M Dec 20 11:35 cgspace_2018-12-19.backup.xz

    2016-06-01

    @@ -287,7 +287,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and

    April, 2016


@@ -295,8 +295,8 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
@@ -312,14 +312,14 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and

    March, 2016


    2016-03-02

@@ -335,7 +335,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and

    February, 2016


diff --git a/docs/categories/page/6/index.html b/docs/categories/page/6/index.html
index 9982f5817..0ee3d3153 100644
--- a/docs/categories/page/6/index.html
+++ b/docs/categories/page/6/index.html

    January, 2016


    @@ -119,7 +119,7 @@

    December, 2015


    @@ -146,7 +146,7 @@

    November, 2015


diff --git a/docs/cgiar-library-migration/index.html b/docs/cgiar-library-migration/index.html
index 3b461491c..6edb9b537 100644
--- a/docs/cgiar-library-migration/index.html
+++ b/docs/cgiar-library-migration/index.html

    CGIAR Library Migration

by Alan Orth in Notes

    @@ -122,8 +122,8 @@
  • SELECT * FROM pg_stat_activity; seems to show ~6 extra connections used by the command line tools during import
• Temporarily disable nightly index-discovery cron job because the import process will be taking place during some of this time and I don’t want them to be competing to update the Solr index
• Copy HTTPS certificate key pair from CGIAR Library server’s Tomcat keystore:
  • $ keytool -list -keystore tomcat.keystore
     $ keytool -importkeystore -srckeystore tomcat.keystore -destkeystore library.cgiar.org.p12 -deststoretype PKCS12 -srcalias tomcat
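If the key and certificate are later needed in PEM format (for example for nginx), they can be extracted from the PKCS12 file with openssl (a sketch; the output file names are arbitrary):

```
# certificate only, then the unencrypted private key
$ openssl pkcs12 -in library.cgiar.org.p12 -clcerts -nokeys -out library.cgiar.org.crt.pem
$ openssl pkcs12 -in library.cgiar.org.p12 -nocerts -nodes -out library.cgiar.org.key.pem
```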
    @@ -172,7 +172,7 @@ $ for item in 10947-2527/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aor
     $ dspace packager -s -t AIP -o ignoreHandle=false -e aorth@mjanja.ch -p 10568/83389 10947-1/10947-1.zip
     $ for collection in 10947-1/COLLECTION@10947-*; do dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
     $ for item in 10947-1/ITEM@10947-*; do dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
This submits AIP hierarchies recursively (-r) and suppresses errors when an item’s parent collection hasn’t been created yet—for example, if the item is mapped. The large historic archive (10947/1) is created in several steps because it requires a lot of memory and often crashes.

    Create new subcommunities and collections for content we reorganized into new hierarchies from the original:

• I exported a random item’s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry:
  • 0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
     
    @@ -148,14 +148,14 @@

    September, 2016


    2016-09-01

    @@ -174,7 +174,7 @@

    August, 2016


    @@ -204,7 +204,7 @@ $ git rebase -i dspace-5.5

    July, 2016


    @@ -235,14 +235,14 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and

    June, 2016


    2016-06-01

• I exported a random item’s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry:
  • 0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
     
    @@ -148,14 +148,14 @@

    September, 2016


    2016-09-01

    @@ -174,7 +174,7 @@

    August, 2016


    @@ -204,7 +204,7 @@ $ git rebase -i dspace-5.5

    July, 2016


    @@ -235,14 +235,14 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and

    June, 2016


    2016-06-01

    $ identify ~/Desktop/alc_contrastes_desafios.jpg
     /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
    @@ -273,7 +273,7 @@
         

    February, 2017

    @@ -292,7 +292,7 @@ dspace=# delete from collection2item where id = 92551 and item_id = 80278; DELETE 1
@@ -307,15 +307,15 @@ DELETE 1

    January, 2017


    2017-01-02

@@ -330,7 +330,7 @@ DELETE 1

    December, 2016


@@ -345,8 +345,8 @@ DELETE 1
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, ObjectType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail="-1", transactionID="TX157907838689377964651674089851855413607")
2016-12-02 03:00:32,353 WARN  com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID="TX157907838689377964651674089851855413607")
    Read more → diff --git a/docs/tags/notes/index.xml b/docs/tags/notes/index.xml index d1210445e..fb1a87cc0 100644 --- a/docs/tags/notes/index.xml +++ b/docs/tags/notes/index.xml @@ -23,7 +23,7 @@ </ul> <h2 id="2017-09-07">2017-09-07</h2> <ul> -<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group</li> +<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne&rsquo;s user account is both in the approvers step as well as the group</li> </ul> @@ -47,7 +47,7 @@ </li> <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li> <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li> -<li>It turns out that we're already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li> +<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li> <li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li> <li>We might actually have to <em>block</em> these requests with HTTP 403 depending on the user agent</li> <li>Abenet pointed out that the CGIAR Library Historical Archive collection I sent July 20th only had ~100 entries, instead of 2415</li> @@ -70,8 +70,8 @@ <h2 id="2017-07-04">2017-07-04</h2> <ul> <li>Merge changes for WLE Phase II theme rename (<a href="https://github.com/ilri/DSpace/pull/329">#329</a>)</li> -<li>Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace</li> -<li>We can use PostgreSQL's extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> +<li>Looking at extracting the metadata registries from ICARDA&rsquo;s MEL DSpace database so we can compare fields with CGSpace</li> +<li>We can use PostgreSQL&rsquo;s extended output format (<code>-x</code>) plus <code>sed</code> to format the output into quasi XML:</li> </ul> @@ -81,7 +81,7 @@ Thu, 01 Jun 2017 10:14:52 +0300 https://alanorth.github.io/cgspace-notes/2017-06/ - 2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we'll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg. 
+ 2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we&rsquo;ll create a new sub-community for Phase II and create collections for the research themes there The current &ldquo;Research Themes&rdquo; community will be renamed to &ldquo;WLE Phase I Research Themes&rdquo; Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg. @@ -90,7 +90,7 @@ Mon, 01 May 2017 16:21:52 +0200 https://alanorth.github.io/cgspace-notes/2017-05/ - 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it's a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire's CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace. + 2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it&rsquo;s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire&rsquo;s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace. 
@@ -133,7 +133,7 @@ <li>Also, need to consider talking to Atmire about hiring them to bring ORCiD metadata to REST / OAI</li> <li>Filed an issue on DSpace issue tracker for the <code>filter-media</code> bug that causes it to process JPGs even when limiting to the PDF thumbnail plugin: <a href="https://jira.duraspace.org/browse/DS-3516">DS-3516</a></li> <li>Discovered that the ImageMagic <code>filter-media</code> plugin creates JPG thumbnails with the CMYK colorspace when the source PDF is using CMYK</li> -<li>Interestingly, it seems DSpace 4.x's thumbnails were sRGB, but forcing regeneration using DSpace 5.x's ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li> +<li>Interestingly, it seems DSpace 4.x&rsquo;s thumbnails were sRGB, but forcing regeneration using DSpace 5.x&rsquo;s ImageMagick plugin creates CMYK JPGs if the source PDF was CMYK (see <a href="https://cgspace.cgiar.org/handle/10568/51999">10568/51999</a>):</li> </ul> <pre><code>$ identify ~/Desktop/alc_contrastes_desafios.jpg /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 @@ -161,7 +161,7 @@ dspace=# delete from collection2item where id = 92551 and item_id = 80278; DELETE 1 </code></pre><ul> <li>Create issue on GitHub to track the addition of CCAFS Phase II project tags (<a href="https://github.com/ilri/DSpace/issues/301">#301</a>)</li> -<li>Looks like we'll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li> +<li>Looks like we&rsquo;ll be using <code>cg.identifier.ccafsprojectpii</code> as the field name</li> </ul> @@ -174,8 +174,8 @@ DELETE 1 <h2 id="2017-01-02">2017-01-02</h2> <ul> <li>I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error</li> -<li>I tested on DSpace Test as well and it doesn't work there either</li> -<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years</li> +<li>I tested on DSpace Test as well and it doesn&rsquo;t work there either</li> +<li>I asked on the dspace-tech mailing list because it seems to be broken, and actually now I&rsquo;m not sure if we&rsquo;ve ever had the sharding task run successfully over all these years</li> </ul> @@ -196,8 +196,8 @@ DELETE 1 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=ADD, SubjectType=BUNDLE, SubjectID=70316, Obje ctType=BITSTREAM, ObjectID=86715, TimeStamp=1480647632318, dispatcher=1544803905, detail=&quot;-1&quot;, transactionID=&quot;TX157907838689377964651674089851855413607&quot;) 2016-12-02 03:00:32,353 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=80044, ObjectType=(Unknown), ObjectID=-1, TimeStamp=1480647632351, dispatcher=1544803905, detail=[null], transactionID=&quot;TX157907838689377964651674089851855413607&quot;) </code></pre><ul> -<li>I see thousands of them in the logs for the last few months, so it's not related to the DSpace 5.5 upgrade</li> -<li>I've raised a ticket with Atmire to ask</li> +<li>I see thousands of them in the logs for the last few months, so 
it&rsquo;s not related to the DSpace 5.5 upgrade</li> +<li>I&rsquo;ve raised a ticket with Atmire to ask</li> <li>Another worrying error from dspace.log is:</li> </ul> @@ -210,7 +210,7 @@ DELETE 1 https://alanorth.github.io/cgspace-notes/2016-11/ <h2 id="2016-11-01">2016-11-01</h2> <ul> -<li>Add <code>dc.type</code> to the output options for Atmire's Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li> +<li>Add <code>dc.type</code> to the output options for Atmire&rsquo;s Listings and Reports module (<a href="https://github.com/ilri/DSpace/pull/286">#286</a>)</li> </ul> <p><img src="https://alanorth.github.io/cgspace-notes/cgspace-notes/2016/11/listings-and-reports.png" alt="Listings and Reports with output type"></p> @@ -230,7 +230,7 @@ DELETE 1 <li>ORCIDs plus normal authors</li> </ul> </li> -<li>I exported a random item's metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li> +<li>I exported a random item&rsquo;s metadata as CSV, deleted <em>all columns</em> except id and collection, and made a new coloum called <code>ORCID:dc.contributor.author</code> with the following random ORCIDs from the ORCID registry:</li> </ul> <pre><code>0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X </code></pre> @@ -245,7 +245,7 @@ DELETE 1 <h2 id="2016-09-01">2016-09-01</h2> <ul> <li>Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors</li> -<li>Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace</li> +<li>Discuss how the migration of CGIAR&rsquo;s Active Directory to a flat structure will break our LDAP groups in DSpace</li> <li>We had been using <code>DC=ILRI</code> to determine whether a user was ILRI or not</li> <li>It looks like we might be able to use OUs now, instead of DCs:</li> </ul> @@ -305,7 +305,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <h2 id="2016-06-01">2016-06-01</h2> <ul> <li>Experimenting with IFPRI OAI (we want to harvest their publications)</li> -<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI's OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> +<li>After reading the <a href="https://www.oclc.org/support/services/contentdm/help/server-admin-help/oai-support.en.html">ContentDM documentation</a> I found IFPRI&rsquo;s OAI endpoint: <a href="http://ebrary.ifpri.org/oai/oai.php">http://ebrary.ifpri.org/oai/oai.php</a></li> <li>After reading the <a href="https://www.openarchives.org/OAI/openarchivesprotocol.html">OAI documentation</a> and testing with an <a href="http://validator.oaipmh.com/">OAI validator</a> I found out how to get their publications</li> <li>This is their publications set: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc">http://ebrary.ifpri.org/oai/oai.php?verb=ListRecords&amp;from=2016-01-01&amp;set=p15738coll2&amp;metadataPrefix=oai_dc</a></li> <li>You can see the others by using the OAI <code>ListSets</code> verb: <a href="http://ebrary.ifpri.org/oai/oai.php?verb=ListSets">http://ebrary.ifpri.org/oai/oai.php?verb=ListSets</a></li> @@ -340,8 +340,8 @@ dspacetest=# select text_value from metadatavalue 
where metadata_field_id=3 and <ul> <li>Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit</li> <li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li> -<li>After running DSpace for over five years I've never needed to look in any other log file than dspace.log, leave alone one from last year!</li> -<li>This will save us a few gigs of backup space we're paying for on S3</li> +<li>After running DSpace for over five years I&rsquo;ve never needed to look in any other log file than dspace.log, leave alone one from last year!</li> +<li>This will save us a few gigs of backup space we&rsquo;re paying for on S3</li> <li>Also, I noticed the <code>checker</code> log has some errors we should pay attention to:</li> </ul> @@ -355,7 +355,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and <h2 id="2016-03-02">2016-03-02</h2> <ul> <li>Looking at issues with author authorities on CGSpace</li> -<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module</li> +<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I&rsquo;m pretty sure we don&rsquo;t need it as of the latest few versions of Atmire&rsquo;s Listings and Reports module</li> <li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li> </ul> diff --git a/docs/tags/notes/page/2/index.html b/docs/tags/notes/page/2/index.html index ee46ae5f9..7e0f55936 100644 --- a/docs/tags/notes/page/2/index.html +++ b/docs/tags/notes/page/2/index.html @@ -14,7 +14,7 @@ - + @@ -28,7 +28,7 @@ - + @@ -81,13 +81,13 @@

    November, 2016


    2016-11-01

    Listings and Reports with output type

@@ -103,7 +103,7 @@

    October, 2016


    @@ -116,7 +116,7 @@
  • ORCIDs plus normal authors
• I exported a random item’s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry:
  • 0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
     
    @@ -133,14 +133,14 @@

    September, 2016


    2016-09-01

    @@ -159,7 +159,7 @@

    August, 2016


    @@ -189,7 +189,7 @@ $ git rebase -i dspace-5.5

    July, 2016


    @@ -220,14 +220,14 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and

    June, 2016


    2016-06-01

    $ identify ~/Desktop/alc_contrastes_desafios.jpg
     /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
    @@ -288,7 +288,7 @@
         

    February, 2017

    @@ -307,7 +307,7 @@ dspace=# delete from collection2item where id = 92551 and item_id = 80278; DELETE 1
@@ -322,15 +322,15 @@ DELETE 1

    January, 2017


    2017-01-02

@@ -345,7 +345,7 @@ DELETE 1

    December, 2016


@@ -360,8 +360,8 @@ DELETE 1
diff --git a/docs/tags/page/5/index.html b/docs/tags/page/5/index.html
index b1a9fa521..12cae9196 100644
--- a/docs/tags/page/5/index.html
+++ b/docs/tags/page/5/index.html

    November, 2016


    2016-11-01

    Listings and Reports with output type

@@ -118,7 +118,7 @@

    October, 2016


    @@ -131,7 +131,7 @@
  • ORCIDs plus normal authors
• I exported a random item’s metadata as CSV, deleted all columns except id and collection, and made a new column called ORCID:dc.contributor.author with the following random ORCIDs from the ORCID registry:
  • 0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
     
    @@ -148,14 +148,14 @@

    September, 2016


    2016-09-01

    @@ -174,7 +174,7 @@

    August, 2016


    @@ -204,7 +204,7 @@ $ git rebase -i dspace-5.5

    July, 2016


    @@ -235,14 +235,14 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and

    June, 2016


    2016-06-01