From ba5755d4413bbc95a30922b564dee076c945c802 Mon Sep 17 00:00:00 2001 From: Alan Orth Date: Tue, 14 Jan 2020 20:40:41 +0200 Subject: [PATCH] Add notes for 2020-01-14 --- content/posts/2020-01.md | 32 ++- docs/2015-11/index.html | 6 +- docs/2015-12/index.html | 6 +- docs/2016-01/index.html | 6 +- docs/2016-02/index.html | 6 +- docs/2016-03/index.html | 6 +- docs/2016-04/index.html | 6 +- docs/2016-05/index.html | 6 +- docs/2016-06/index.html | 6 +- docs/2016-07/index.html | 6 +- docs/2016-08/index.html | 6 +- docs/2016-09/index.html | 6 +- docs/2016-10/index.html | 6 +- docs/2016-11/index.html | 6 +- docs/2016-12/index.html | 6 +- docs/2017-01/index.html | 6 +- docs/2017-02/index.html | 6 +- docs/2017-03/index.html | 6 +- docs/2017-04/index.html | 6 +- docs/2017-05/index.html | 6 +- docs/2017-06/index.html | 6 +- docs/2017-07/index.html | 6 +- docs/2017-08/index.html | 6 +- docs/2017-09/index.html | 6 +- docs/2017-10/index.html | 6 +- docs/2017-11/index.html | 6 +- docs/2017-12/index.html | 6 +- docs/2018-01/index.html | 6 +- docs/2018-02/index.html | 6 +- docs/2018-03/index.html | 6 +- docs/2018-04/index.html | 6 +- docs/2018-05/index.html | 6 +- docs/2018-06/index.html | 6 +- docs/2018-07/index.html | 6 +- docs/2018-08/index.html | 6 +- docs/2018-09/index.html | 6 +- docs/2018-10/index.html | 6 +- docs/2018-11/index.html | 6 +- docs/2018-12/index.html | 6 +- docs/2019-01/index.html | 6 +- docs/2019-02/index.html | 6 +- docs/2019-03/index.html | 6 +- docs/2019-04/index.html | 6 +- docs/2019-05/index.html | 6 +- docs/2019-06/index.html | 6 +- docs/2019-07/index.html | 6 +- docs/2019-08/index.html | 6 +- docs/2019-09/index.html | 6 +- docs/2019-10/index.html | 6 +- docs/2019-11/index.html | 6 +- docs/2019-12/index.html | 6 +- docs/2020-01/index.html | 297 +++++++++++++++++++++ docs/404.html | 6 +- docs/categories/index.html | 88 +++--- docs/categories/index.xml | 4 +- docs/categories/notes/index.html | 86 +++--- docs/categories/notes/index.xml | 58 ++-- docs/categories/notes/page/2/index.html | 86 +++--- docs/categories/notes/page/3/index.html | 8 +- docs/categories/page/2/index.html | 88 +++--- docs/categories/page/3/index.html | 10 +- docs/categories/page/4/index.html | 10 +- docs/categories/page/5/index.html | 10 +- docs/categories/page/6/index.html | 10 +- docs/cgiar-library-migration/index.html | 6 +- docs/cgspace-cgcorev2-migration/index.html | 6 +- docs/index.html | 88 +++--- docs/index.xml | 58 ++-- docs/page/2/index.html | 88 +++--- docs/page/3/index.html | 10 +- docs/page/4/index.html | 10 +- docs/page/5/index.html | 10 +- docs/page/6/index.html | 10 +- docs/posts/index.html | 88 +++--- docs/posts/index.xml | 58 ++-- docs/posts/page/2/index.html | 88 +++--- docs/posts/page/3/index.html | 10 +- docs/posts/page/4/index.html | 10 +- docs/posts/page/5/index.html | 10 +- docs/posts/page/6/index.html | 10 +- docs/robots.txt | 4 +- docs/sitemap.xml | 34 +-- docs/tags/index.html | 84 +++--- docs/tags/migration/index.html | 6 +- docs/tags/notes/index.html | 6 +- docs/tags/notes/page/2/index.html | 6 +- docs/tags/notes/page/3/index.html | 6 +- docs/tags/page/2/index.html | 84 +++--- docs/tags/page/3/index.html | 6 +- docs/tags/page/4/index.html | 6 +- docs/tags/page/5/index.html | 6 +- docs/tags/page/6/index.html | 6 +- 92 files changed, 1116 insertions(+), 791 deletions(-) create mode 100644 docs/2020-01/index.html diff --git a/content/posts/2020-01.md b/content/posts/2020-01.md index 89baa28d6..05ee007e2 100644 --- a/content/posts/2020-01.md +++ b/content/posts/2020-01.md @@ -1,6 +1,6 @@ 
 ---
 title: "January, 2020"
-date: 2019-01-06T10:48:30+02:00
+date: 2020-01-06T10:48:30+02:00
 author: "Alan Orth"
 categories: ["Notes"]
 ---
@@ -53,7 +53,7 @@ $ sed -n '5227p' /tmp/2020-01-08-authors.csv | xxd -c1
 00000007: 72  r
 ```
-- According to the blog post linked above the troublesome character is probably the "High Octect Preset" (81), which vim identifies (using `ga` on the character) as:
+- ~~According to the blog post linked above the troublesome character is probably the "High Octect Preset" (81)~~, which vim identifies (using `ga` on the character) as:
 ```
 <e>  101,  Hex 65,  Octal 145 < ́> 769, Hex 0301, Octal 1401
@@ -65,4 +65,32 @@ $ sed -n '5227p' /tmp/2020-01-08-authors.csv | xxd -c1
 - I think the solution is to upload it to Google Docs, or just send it to him and deal with each case manually in the corrections he sends me
 - Re-deploy DSpace Test (linode19) with a fresh snapshot of the CGSpace database and assetstore, and using the `5_x-prod` (no CG Core v2) branch
+## 2020-01-14
+
+- I checked the yearly Solr statistics sharding cron job that should have run on 2020-01 on CGSpace (linode18) and saw that there was an error
+  - I manually ran it on the server as the DSpace user and it said "Moving: 51633080 into core statistics-2019"
+  - After a few hours it died with the same error that I had seen in the log from the first run:
+
+```
+Exception: Read timed out
+java.net.SocketTimeoutException: Read timed out
+```
+
+- I am not sure how I will fix that shard...
+- I discovered a very interesting tool called [ftfy](https://github.com/LuminosoInsight/python-ftfy) that attempts to fix errors in UTF-8
+  - I'm curious to start checking input files with this to see what it highlights
+  - I ran it on the authors file from last week and it converted characters like those with Spanish accents from multi-byte sequences (I don't know what it's called?) to digraphs (é→é), which vim identifies as:
+  - `<e> 101, Hex 65, Octal 145 < ́> 769, Hex 0301, Octal 1401`
+  - `<é> 233, Hex 00e9, Oct 351, Digr e'`
+- Ah hah! We need to be [normalizing characters into their canonical forms](https://withblue.ink/2019/03/11/why-you-need-to-normalize-unicode-strings.html)!
+  - In Python 3.8 we can even [check if the string is normalized using the `unicodedata` library](https://docs.python.org/3/library/unicodedata.html):
+
+```
+In [7]: unicodedata.is_normalized('NFC', 'é')
+Out[7]: False
+
+In [8]: unicodedata.is_normalized('NFC', 'é')
+Out[8]: True
+```
+
diff --git a/docs/2015-11/index.html b/docs/2015-11/index.html
index 5144140de..18f6072bd 100644
--- a/docs/2015-11/index.html
+++ b/docs/2015-11/index.html
@@ -31,7 +31,7 @@ Last week I had increased the limit from 30 to 60, which seemed to help, but now
 $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
 78
 "/>
-
+
@@ -234,6 +234,8 @@ db.statementpool = true
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -242,8 +244,6 @@ db.statementpool = true
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2015-12/index.html b/docs/2015-12/index.html index c1209bce6..45df480c7 100644 --- a/docs/2015-12/index.html +++ b/docs/2015-12/index.html @@ -33,7 +33,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz "/> - + @@ -256,6 +256,8 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -264,8 +266,6 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-01/index.html b/docs/2016-01/index.html index 1fe1f4868..1f50449f8 100644 --- a/docs/2016-01/index.html +++ b/docs/2016-01/index.html @@ -25,7 +25,7 @@ Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_ I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated. Update GitHub wiki for documentation of maintenance tasks. "/> - + @@ -192,6 +192,8 @@ $ find SimpleArchiveForBio/ -iname “*.pdf” -exec basename {} ; | sor
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -200,8 +202,6 @@ $ find SimpleArchiveForBio/ -iname “*.pdf” -exec basename {} ; | sor
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-02/index.html b/docs/2016-02/index.html index 55ccfc366..9cad7af2a 100644 --- a/docs/2016-02/index.html +++ b/docs/2016-02/index.html @@ -35,7 +35,7 @@ I noticed we have a very interesting list of countries on CGSpace: Not only are there 49,000 countries, we have some blanks (25)… Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE” "/> - + @@ -370,6 +370,8 @@ Bitstream: tést señora alimentación.pdf
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -378,8 +380,6 @@ Bitstream: tést señora alimentación.pdf
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-03/index.html b/docs/2016-03/index.html index e6acc09ca..88c08a732 100644 --- a/docs/2016-03/index.html +++ b/docs/2016-03/index.html @@ -25,7 +25,7 @@ Looking at issues with author authorities on CGSpace For some reason we still have the index-lucene-update cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server "/> - + @@ -308,6 +308,8 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -316,8 +318,6 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-04/index.html b/docs/2016-04/index.html index bf33023d6..53cbae4c4 100644 --- a/docs/2016-04/index.html +++ b/docs/2016-04/index.html @@ -29,7 +29,7 @@ After running DSpace for over five years I've never needed to look in any ot This will save us a few gigs of backup space we're paying for on S3 Also, I noticed the checker log has some errors we should pay attention to: "/> - + @@ -487,6 +487,8 @@ dspace.log.2016-04-27:7271
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -495,8 +497,6 @@ dspace.log.2016-04-27:7271
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-05/index.html b/docs/2016-05/index.html index e3957d245..fd8227fc5 100644 --- a/docs/2016-05/index.html +++ b/docs/2016-05/index.html @@ -31,7 +31,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period! # awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l 3168 "/> - + @@ -363,6 +363,8 @@ sys 0m20.540s
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -371,8 +373,6 @@ sys 0m20.540s
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-06/index.html b/docs/2016-06/index.html index bc77e56d1..64a2ab5c4 100644 --- a/docs/2016-06/index.html +++ b/docs/2016-06/index.html @@ -31,7 +31,7 @@ This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRec You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship "/> - + @@ -401,6 +401,8 @@ $ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-D
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -409,8 +411,6 @@ $ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-D
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-07/index.html b/docs/2016-07/index.html index 586b8a0c8..059f243e9 100644 --- a/docs/2016-07/index.html +++ b/docs/2016-07/index.html @@ -41,7 +41,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and In this case the select query was showing 95 results before the update "/> - + @@ -317,6 +317,8 @@ discovery.index.authority.ignore-variants=true
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -325,8 +327,6 @@ discovery.index.authority.ignore-variants=true
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-08/index.html b/docs/2016-08/index.html index 365a93a96..7eaa58c83 100644 --- a/docs/2016-08/index.html +++ b/docs/2016-08/index.html @@ -39,7 +39,7 @@ $ git checkout -b 55new 5_x-prod $ git reset --hard ilri/5_x-prod $ git rebase -i dspace-5.5 "/> - + @@ -381,6 +381,8 @@ $ JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m" /home/cgspace.cgiar.org/b
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -389,8 +391,6 @@ $ JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m" /home/cgspace.cgiar.org/b
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-09/index.html b/docs/2016-09/index.html index 2a7904d93..3d52961c8 100644 --- a/docs/2016-09/index.html +++ b/docs/2016-09/index.html @@ -31,7 +31,7 @@ It looks like we might be able to use OUs now, instead of DCs: $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)" "/> - + @@ -598,6 +598,8 @@ $ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -606,8 +608,6 @@ $ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-10/index.html b/docs/2016-10/index.html index 0e109dd98..a9533ab25 100644 --- a/docs/2016-10/index.html +++ b/docs/2016-10/index.html @@ -39,7 +39,7 @@ I exported a random item's metadata as CSV, deleted all columns except id an 0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X "/> - + @@ -364,6 +364,8 @@ dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http:
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -372,8 +374,6 @@ dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http:
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-11/index.html b/docs/2016-11/index.html index 3e9c9d695..d058ee71f 100644 --- a/docs/2016-11/index.html +++ b/docs/2016-11/index.html @@ -23,7 +23,7 @@ Add dc.type to the output options for Atmire's Listings and Reports module ( Add dc.type to the output options for Atmire's Listings and Reports module (#286) "/> - + @@ -540,6 +540,8 @@ org.dspace.discovery.SearchServiceException: Error executing query
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -548,8 +550,6 @@ org.dspace.discovery.SearchServiceException: Error executing query
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2016-12/index.html b/docs/2016-12/index.html index 5465b0cda..9920645b9 100644 --- a/docs/2016-12/index.html +++ b/docs/2016-12/index.html @@ -43,7 +43,7 @@ I see thousands of them in the logs for the last few months, so it's not rel I've raised a ticket with Atmire to ask Another worrying error from dspace.log is: "/> - + @@ -776,6 +776,8 @@ $ exit
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -784,8 +786,6 @@ $ exit
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-01/index.html b/docs/2017-01/index.html index 52ddc0322..b06a16f60 100644 --- a/docs/2017-01/index.html +++ b/docs/2017-01/index.html @@ -25,7 +25,7 @@ I checked to see if the Solr sharding task that is supposed to run on January 1s I tested on DSpace Test as well and it doesn't work there either I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years "/> - + @@ -361,6 +361,8 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -369,8 +371,6 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-02/index.html b/docs/2017-02/index.html index c60ae30c2..0f60d4d90 100644 --- a/docs/2017-02/index.html +++ b/docs/2017-02/index.html @@ -47,7 +47,7 @@ DELETE 1 Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301) Looks like we'll be using cg.identifier.ccafsprojectpii as the field name "/> - + @@ -416,6 +416,8 @@ COPY 1968
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -424,8 +426,6 @@ COPY 1968
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-03/index.html b/docs/2017-03/index.html index 5a6b9efb0..154454c9f 100644 --- a/docs/2017-03/index.html +++ b/docs/2017-03/index.html @@ -51,7 +51,7 @@ Interestingly, it seems DSpace 4.x's thumbnails were sRGB, but forcing regen $ identify ~/Desktop/alc_contrastes_desafios.jpg /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 "/> - + @@ -347,6 +347,8 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -355,8 +357,6 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-04/index.html b/docs/2017-04/index.html index 674c35813..18aa303af 100644 --- a/docs/2017-04/index.html +++ b/docs/2017-04/index.html @@ -37,7 +37,7 @@ Testing the CMYK patch on a collection with 650 items: $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt "/> - + @@ -577,6 +577,8 @@ $ gem install compass -v 1.0.3
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -585,8 +587,6 @@ $ gem install compass -v 1.0.3
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-05/index.html b/docs/2017-05/index.html index 1b16dde12..c21a0d008 100644 --- a/docs/2017-05/index.html +++ b/docs/2017-05/index.html @@ -15,7 +15,7 @@ - + @@ -383,6 +383,8 @@ UPDATE 187
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -391,8 +393,6 @@ UPDATE 187
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-06/index.html b/docs/2017-06/index.html index d83e55d7f..e41a8ed62 100644 --- a/docs/2017-06/index.html +++ b/docs/2017-06/index.html @@ -15,7 +15,7 @@ - + @@ -262,6 +262,8 @@ $ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace impo
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -270,8 +272,6 @@ $ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace impo
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-07/index.html b/docs/2017-07/index.html index 883f5b26a..b6fa11d97 100644 --- a/docs/2017-07/index.html +++ b/docs/2017-07/index.html @@ -33,7 +33,7 @@ Merge changes for WLE Phase II theme rename (#329) Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace We can use PostgreSQL's extended output format (-x) plus sed to format the output into quasi XML: "/> - + @@ -267,6 +267,8 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -275,8 +277,6 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-08/index.html b/docs/2017-08/index.html index e213f1af1..034b791a5 100644 --- a/docs/2017-08/index.html +++ b/docs/2017-08/index.html @@ -57,7 +57,7 @@ This was due to newline characters in the dc.description.abstract column, which I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet "/> - + @@ -509,6 +509,8 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -517,8 +519,6 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-09/index.html b/docs/2017-09/index.html index 0a9945e94..16a8cdd8b 100644 --- a/docs/2017-09/index.html +++ b/docs/2017-09/index.html @@ -29,7 +29,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group "/> - + @@ -651,6 +651,8 @@ Cert Status: good
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -659,8 +661,6 @@ Cert Status: good
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-10/index.html b/docs/2017-10/index.html index f83c9e3b9..c23d91197 100644 --- a/docs/2017-10/index.html +++ b/docs/2017-10/index.html @@ -31,7 +31,7 @@ http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections "/> - + @@ -435,6 +435,8 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -443,8 +445,6 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-11/index.html b/docs/2017-11/index.html index 84f9b5d87..f89f5b602 100644 --- a/docs/2017-11/index.html +++ b/docs/2017-11/index.html @@ -45,7 +45,7 @@ Generate list of authors on CGSpace for Peter to go through and correct: dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; COPY 54701 "/> - + @@ -936,6 +936,8 @@ $ cat dspace.log.2017-11-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | u
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -944,8 +946,6 @@ $ cat dspace.log.2017-11-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | u
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2017-12/index.html b/docs/2017-12/index.html index 27e2ba488..3089f445a 100644 --- a/docs/2017-12/index.html +++ b/docs/2017-12/index.html @@ -27,7 +27,7 @@ The logs say “Timeout waiting for idle object” PostgreSQL activity says there are 115 connections currently The list of connections to XMLUI and REST API for today: "/> - + @@ -775,6 +775,8 @@ DELETE 20
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -783,8 +785,6 @@ DELETE 20
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-01/index.html b/docs/2018-01/index.html index cf004a782..310891c32 100644 --- a/docs/2018-01/index.html +++ b/docs/2018-01/index.html @@ -147,7 +147,7 @@ dspace.log.2018-01-02:34 Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains "/> - + @@ -1444,6 +1444,8 @@ Catalina:type=Manager,context=/,host=localhost activeSessions 8
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -1452,8 +1454,6 @@ Catalina:type=Manager,context=/,host=localhost activeSessions 8
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-02/index.html b/docs/2018-02/index.html index cf7139c2e..e1868f85d 100644 --- a/docs/2018-02/index.html +++ b/docs/2018-02/index.html @@ -27,7 +27,7 @@ We don't need to distinguish between internal and external works, so that ma Yesterday I figured out how to monitor DSpace sessions using JMX I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu's munin-plugins-java package and used the stuff I discovered about JMX in 2018-01 "/> - + @@ -1031,6 +1031,8 @@ UPDATE 3
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -1039,8 +1041,6 @@ UPDATE 3
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-03/index.html b/docs/2018-03/index.html index a73abe11b..3da5aadba 100644 --- a/docs/2018-03/index.html +++ b/docs/2018-03/index.html @@ -21,7 +21,7 @@ Export a CSV of the IITA community metadata for Martin Mueller Export a CSV of the IITA community metadata for Martin Mueller "/> - + @@ -577,6 +577,8 @@ Fixed 5 occurences of: GENEBANKS
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -585,8 +587,6 @@ Fixed 5 occurences of: GENEBANKS
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-04/index.html b/docs/2018-04/index.html index 9b21b2cad..b90d95986 100644 --- a/docs/2018-04/index.html +++ b/docs/2018-04/index.html @@ -23,7 +23,7 @@ Catalina logs at least show some memory errors yesterday: I tried to test something on DSpace Test but noticed that it's down since god knows when Catalina logs at least show some memory errors yesterday: "/> - + @@ -586,6 +586,8 @@ $ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -594,8 +596,6 @@ $ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-05/index.html b/docs/2018-05/index.html index c22dfb70f..789413703 100644 --- a/docs/2018-05/index.html +++ b/docs/2018-05/index.html @@ -35,7 +35,7 @@ http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E Then I reduced the JVM heap size from 6144 back to 5120m Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use "/> - + @@ -515,6 +515,8 @@ $ psql -h localhost -U postgres dspacetest
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -523,8 +525,6 @@ $ psql -h localhost -U postgres dspacetest
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-06/index.html b/docs/2018-06/index.html index 9991d1a96..3ca371c00 100644 --- a/docs/2018-06/index.html +++ b/docs/2018-06/index.html @@ -55,7 +55,7 @@ real 74m42.646s user 8m5.056s sys 2m7.289s "/> - + @@ -506,6 +506,8 @@ $ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 > map-to-cifor-archive.csv
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -514,8 +516,6 @@ $ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 > map-to-cifor-archive.csv
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-07/index.html b/docs/2018-07/index.html index 76a559ac9..70b60beca 100644 --- a/docs/2018-07/index.html +++ b/docs/2018-07/index.html @@ -33,7 +33,7 @@ During the mvn package stage on the 5.8 branch I kept getting issues with java r There is insufficient memory for the Java Runtime Environment to continue. "/> - + @@ -561,6 +561,8 @@ dspace=# select count(text_value) from metadatavalue where resource_type_id=2 an
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -569,8 +571,6 @@ dspace=# select count(text_value) from metadatavalue where resource_type_id=2 an
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-08/index.html b/docs/2018-08/index.html index 8e6dbc989..405668410 100644 --- a/docs/2018-08/index.html +++ b/docs/2018-08/index.html @@ -43,7 +43,7 @@ Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did The server only has 8GB of RAM so we'll eventually need to upgrade to a larger one because we'll start starving the OS, PostgreSQL, and command line batch processes I ran all system updates on DSpace Test and rebooted it "/> - + @@ -434,6 +434,8 @@ $ dspace database migrate ignored
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -442,8 +444,6 @@ $ dspace database migrate ignored
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-09/index.html b/docs/2018-09/index.html index 8ae6d9688..edddd7d20 100644 --- a/docs/2018-09/index.html +++ b/docs/2018-09/index.html @@ -27,7 +27,7 @@ I'll update the DSpace role in our Ansible infrastructure playbooks and run Also, I'll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system's RAM, and we never re-ran them after migrating to larger Linodes last month I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I'm getting those autowire errors in Tomcat 8.5.30 again: "/> - + @@ -740,6 +740,8 @@ UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_f
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -748,8 +750,6 @@ UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_f
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-10/index.html b/docs/2018-10/index.html index a0997e910..6f87de332 100644 --- a/docs/2018-10/index.html +++ b/docs/2018-10/index.html @@ -23,7 +23,7 @@ I created a GitHub issue to track this #389, because I'm super busy in Nairo Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items I created a GitHub issue to track this #389, because I'm super busy in Nairobi right now "/> - + @@ -649,6 +649,8 @@ $ curl -X GET -H "Content-Type: application/json" -H "Accept: app
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -657,8 +659,6 @@ $ curl -X GET -H "Content-Type: application/json" -H "Accept: app
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-11/index.html b/docs/2018-11/index.html index 111c9782e..547d30ea4 100644 --- a/docs/2018-11/index.html +++ b/docs/2018-11/index.html @@ -33,7 +33,7 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage Today these are the top 10 IPs: "/> - + @@ -545,6 +545,8 @@ $ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -553,8 +555,6 @@ $ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2018-12/index.html b/docs/2018-12/index.html index 7b0ec999e..9a84b1f02 100644 --- a/docs/2018-12/index.html +++ b/docs/2018-12/index.html @@ -33,7 +33,7 @@ Then I ran all system updates and restarted the server I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week "/> - + @@ -586,6 +586,8 @@ UPDATE 1
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -594,8 +596,6 @@ UPDATE 1
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-01/index.html b/docs/2019-01/index.html index 223827286..0c9aae24d 100644 --- a/docs/2019-01/index.html +++ b/docs/2019-01/index.html @@ -47,7 +47,7 @@ I don't see anything interesting in the web server logs around that time tho 357 207.46.13.1 903 54.70.40.11 "/> - + @@ -1256,6 +1256,8 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -1264,8 +1266,6 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-02/index.html b/docs/2019-02/index.html index e255df5a6..eb1416687 100644 --- a/docs/2019-02/index.html +++ b/docs/2019-02/index.html @@ -69,7 +69,7 @@ real 0m19.873s user 0m22.203s sys 0m1.979s "/> - + @@ -1336,6 +1336,8 @@ Please see the DSpace documentation for assistance.
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -1344,8 +1346,6 @@ Please see the DSpace documentation for assistance.
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-03/index.html b/docs/2019-03/index.html index 8386e00fa..6fdf95126 100644 --- a/docs/2019-03/index.html +++ b/docs/2019-03/index.html @@ -43,7 +43,7 @@ Most worryingly, there are encoding errors in the abstracts for eleven items, fo I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs "/> - + @@ -1200,6 +1200,8 @@ sys 0m2.551s
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -1208,8 +1210,6 @@ sys 0m2.551s
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-04/index.html b/docs/2019-04/index.html index 7a9f24460..4f8a4d04a 100644 --- a/docs/2019-04/index.html +++ b/docs/2019-04/index.html @@ -61,7 +61,7 @@ $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u ds $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d "/> - + @@ -1291,6 +1291,8 @@ UPDATE 14
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -1299,8 +1301,6 @@ UPDATE 14
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-05/index.html b/docs/2019-05/index.html index 929bf4ab1..0d549ae8f 100644 --- a/docs/2019-05/index.html +++ b/docs/2019-05/index.html @@ -45,7 +45,7 @@ DELETE 1 But after this I tried to delete the item from the XMLUI and it is still present… "/> - + @@ -623,6 +623,8 @@ COPY 64871
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -631,8 +633,6 @@ COPY 64871
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-06/index.html b/docs/2019-06/index.html index c36fdb287..ca9429fb8 100644 --- a/docs/2019-06/index.html +++ b/docs/2019-06/index.html @@ -31,7 +31,7 @@ Run system updates on CGSpace (linode18) and reboot it Skype with Marie-Angélique and Abenet about CG Core v2 "/> - + @@ -309,6 +309,8 @@ UPDATE 2
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -317,8 +319,6 @@ UPDATE 2
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-07/index.html b/docs/2019-07/index.html index 3bce25028..586a5d4fb 100644 --- a/docs/2019-07/index.html +++ b/docs/2019-07/index.html @@ -35,7 +35,7 @@ CGSpace Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community "/> - + @@ -546,6 +546,8 @@ issn.validate('1020-3362')
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -554,8 +556,6 @@ issn.validate('1020-3362')
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-08/index.html b/docs/2019-08/index.html index f4a1cc968..63dc776dd 100644 --- a/docs/2019-08/index.html +++ b/docs/2019-08/index.html @@ -43,7 +43,7 @@ After rebooting, all statistics cores were loaded… wow, that's lucky. Run system updates on DSpace Test (linode19) and reboot it "/> - + @@ -565,6 +565,8 @@ sys 2m27.496s
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -573,8 +575,6 @@ sys 2m27.496s
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-09/index.html b/docs/2019-09/index.html index dcc3ee34b..6c181a550 100644 --- a/docs/2019-09/index.html +++ b/docs/2019-09/index.html @@ -69,7 +69,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning: 7249 2a01:7e00::f03c:91ff:fe18:7396 9124 45.5.186.2 "/> - + @@ -573,6 +573,8 @@ $ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institut
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -581,8 +583,6 @@ $ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institut
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-10/index.html b/docs/2019-10/index.html index 1adacd22f..6278eb582 100644 --- a/docs/2019-10/index.html +++ b/docs/2019-10/index.html @@ -15,7 +15,7 @@ - + @@ -377,6 +377,8 @@ $ dspace import -a -c 10568/104057 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -385,8 +387,6 @@ $ dspace import -a -c 10568/104057 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-11/index.html b/docs/2019-11/index.html index e679d4cdb..342d1aff1 100644 --- a/docs/2019-11/index.html +++ b/docs/2019-11/index.html @@ -55,7 +55,7 @@ Let's see how many of the REST API requests were for bitstreams (because the # zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams" 106781 "/> - + @@ -684,6 +684,8 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -692,8 +694,6 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2019-12/index.html b/docs/2019-12/index.html index b27df16e5..073bec0bc 100644 --- a/docs/2019-12/index.html +++ b/docs/2019-12/index.html @@ -43,7 +43,7 @@ Make sure all packages are up to date and the package manager is up to date, the # dpkg -C # reboot "/> - + @@ -396,6 +396,8 @@ UPDATE 1
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -404,8 +406,6 @@ UPDATE 1
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/2020-01/index.html b/docs/2020-01/index.html new file mode 100644 index 000000000..d8b52f38c --- /dev/null +++ b/docs/2020-01/index.html @@ -0,0 +1,297 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + January, 2020 | CGSpace Notes + + + + + + + + + + + + + + + + + + +
+
+ +
+
+ + + + +
+
+

CGSpace Notes

+

Documenting day-to-day work on the CGSpace repository.

+
+
+ + + + +
+
+
+ + + + +
+
+

January, 2020

+ +
+

2020-01-06

+
    +
  • Open a ticket with Atmire to request a quote for the upgrade to DSpace 6
  • +
  • Last week Altmetric responded about the item that had a lower score than its DOI +
      +
    • The score is now linked to the DOI
    • +
    • Another item that had the same problem in 2019 has now also linked to the score for its DOI
    • +
    • Another item that had the same problem in 2019 has also been fixed
    • +
    +
  • +
+

2020-01-07

+
    +
  • Peter Ballantyne highlighted one more WLE item that is missing the Altmetric score that its DOI has +
      +
    • The DOI has a score of 259, but the Handle has no score at all
    • +
    • I tweeted the CGSpace repository link
    • +
    +
  • +
+

2020-01-08

+
    +
  • Export a list of authors from CGSpace for Peter Ballantyne to look through and correct:
  • +
+
dspace=# \COPY (SELECT DISTINCT text_value as "dc.contributor.author", count(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 3 GROUP BY text_value ORDER BY count DESC) to /tmp/2020-01-08-authors.csv WITH CSV HEADER;
+COPY 68790
+
    +
  • As I always have encoding issues with files Peter sends, I tried to convert it to some Windows encoding, but got an error:
  • +
+
$ iconv -f utf-8 -t windows-1252 /tmp/2020-01-08-authors.csv -o /tmp/2020-01-08-authors-windows.csv
+iconv: illegal input sequence at position 104779
+
    +
  • According to this trick the troublesome character is on line 5227:
  • +
+
$ awk 'END {print NR": "$0}' /tmp/2020-01-08-authors-windows.csv                                   
+5227: "Oue
+$ sed -n '5227p' /tmp/2020-01-08-authors.csv | xxd -c1
+00000000: 22  "
+00000001: 4f  O
+00000002: 75  u
+00000003: 65  e
+00000004: cc  .
+00000005: 81  .
+00000006: 64  d
+00000007: 72  r
+
    +
  • According to the blog post linked above the troublesome character is probably the “High Octet Preset” (81), which vim identifies (using ga on the character) as:
  • +
+
<e>  101,  Hex 65,  Octal 145 < ́> 769, Hex 0301, Octal 1401
+
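Incidentally, the 769 / Hex 0301 that vim shows is the Unicode combining acute accent, and the cc 81 bytes in the xxd output above look like they are simply its UTF-8 encoding — a quick check in a Python shell (a hypothetical session, not from the server) seems to bear that out:

```
In [1]: import unicodedata

In [2]: b'\x4f\x75\x65\xcc\x81'.decode('utf-8')
Out[2]: 'Oué'

In [3]: unicodedata.name('\u0301')
Out[3]: 'COMBINING ACUTE ACCENT'
```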
    +
  • If I understand the situation correctly, this means the character is not actually encoded as UTF-8, so it's stored incorrectly in the database…
  • +
  • Other encodings like windows-1251 and windows-1257 also fail on different characters like “ž” and “é” that are legitimate UTF-8 characters
  • +
  • Then there is the issue of Russian, Chinese, etc. characters, which are simply not representable in any of those encodings
  • +
  • I think the solution is to upload it to Google Docs, or just send it to him and deal with each case manually in the corrections he sends me
  • +
  • Re-deploy DSpace Test (linode19) with a fresh snapshot of the CGSpace database and assetstore, and using the 5_x-prod (no CG Core v2) branch
  • +
+
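Before sending files like this to Peter it might be worth listing every line that a Windows encoding will reject, rather than just the first one iconv stops at — a rough sketch, assuming the export is still at /tmp/2020-01-08-authors.csv:

```
# Sketch: print each line of the UTF-8 CSV that windows-1252 cannot represent
with open('/tmp/2020-01-08-authors.csv', encoding='utf-8') as f:
    for lineno, line in enumerate(f, start=1):
        try:
            line.encode('windows-1252')
        except UnicodeEncodeError as err:
            # err.object[err.start:err.end] is the offending character
            print(f"{lineno}: {err.object[err.start:err.end]!r} ({err.reason})")
```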

2020-01-14

+
    +
  • I checked the yearly Solr statistics sharding cron job that should have run on 2020-01 on CGSpace (linode18) and saw that there was an error +
      +
    • I manually ran it on the server as the DSpace user and it said “Moving: 51633080 into core statistics-2019”
    • +
    • After a few hours it died with the same error that I had seen in the log from the first run:
    • +
    +
  • +
+
Exception: Read timed out
+java.net.SocketTimeoutException: Read timed out
+
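While I think about what to do with the shard I can at least check how far the move got — a sketch using Solr's core admin API (assuming Solr is still listening on localhost:8081, which is from memory):

```
# Sketch: list the statistics cores and their document counts via Solr's cores API
import json
from urllib.request import urlopen

url = 'http://localhost:8081/solr/admin/cores?action=STATUS&wt=json'
with urlopen(url) as response:
    cores = json.load(response)['status']

for name, core in cores.items():
    if name.startswith('statistics'):
        print(name, core['index']['numDocs'])
```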
    +
  • I am not sure how I will fix that shard…
  • +
  • I discovered a very interesting tool called ftfy that attempts to fix errors in UTF-8 +
      +
    • I'm curious to start checking input files with this to see what it highlights
    • +
    • I ran it on the authors file from last week and it converted characters like those with Spanish accents from multi-byte sequences (I don't know what it's called?) to digraphs (é→é), which vim identifies as:
    • +
    • <e> 101, Hex 65, Octal 145 < ́> 769, Hex 0301, Octal 1401
    • +
    • <é> 233, Hex 00e9, Oct 351, Digr e'
    • +
    +
  • +
  • Ah hah! We need to be normalizing characters into their canonical forms! + +
  • +
+
In [7]: unicodedata.is_normalized('NFC', 'é')
+Out[7]: False
+
+In [8]: unicodedata.is_normalized('NFC', 'é')
+Out[8]: True
+
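So the practical fix for files like the authors export is probably just to normalize them to NFC before converting or sending them anywhere (I believe ftfy's fix_text() applies NFC normalization as part of its repairs as well) — a minimal sketch, assuming the same /tmp/2020-01-08-authors.csv file and an output path chosen only for illustration:

```
# Sketch: normalize a UTF-8 file to NFC so "e" + combining accent becomes a single "é"
import unicodedata

with open('/tmp/2020-01-08-authors.csv', encoding='utf-8') as f:
    text = f.read()

with open('/tmp/2020-01-08-authors-nfc.csv', 'w', encoding='utf-8') as f:
    f.write(unicodedata.normalize('NFC', text))
```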
+ + + + + +
+ + + +
+ + + + +
+
+ + + + + + + + + diff --git a/docs/404.html b/docs/404.html index f1494bcc6..8a9bc5df5 100644 --- a/docs/404.html +++ b/docs/404.html @@ -14,7 +14,7 @@ - + @@ -89,6 +89,8 @@
    +
  1. January, 2020
  2. +
  3. December, 2019
  4. November, 2019
  5. @@ -97,8 +99,6 @@
  6. October, 2019
  7. -
  8. September, 2019
  9. -
diff --git a/docs/categories/index.html b/docs/categories/index.html index 18b83aa2b..f6a60f65a 100644 --- a/docs/categories/index.html +++ b/docs/categories/index.html @@ -9,12 +9,12 @@ - + - + @@ -28,7 +28,7 @@ "@type": "Person", "name": "Alan Orth" }, - "dateModified": "2019-12-01T11:22:30+02:00", + "dateModified": "2020-01-06T10:48:30+02:00", "keywords": "notes,migration,notes,", "description": "Documenting day-to-day work on the [CGSpace](https:\/\/cgspace.cgiar.org) repository." } @@ -90,6 +90,43 @@ +
+
+

January, 2020

+ +
+

2020-01-06

+
    +
  • Open a ticket with Atmire to request a quote for the upgrade to DSpace 6
  • +
  • Last week Altmetric responded about the item that had a lower score than its DOI +
      +
    • The score is now linked to the DOI
    • +
    • Another item that had the same problem in 2019 has now also linked to the score for its DOI
    • +
    • Another item that had the same problem in 2019 has also been fixed
    • +
    +
  • +
+

2020-01-07

+
    +
  • Peter Ballantyne highlighted one more WLE item that is missing the Altmetric score that its DOI has +
      +
    • The DOI has a score of 259, but the Handle has no score at all
    • +
    • I tweeted the CGSpace repository link
    • +
    +
  • +
+ Read more → +
+ + + + + +

December, 2019

@@ -362,47 +399,6 @@ DELETE 1 - -
-
-

April, 2019

- -
-

2019-04-01

-
    -
  • Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc -
      -
    • They asked if we had plans to enable RDF support in CGSpace
    • -
    -
  • -
  • There have been 4,400 more downloads of the CTA Spore publication from those strange Amazon IP addresses today -
      -
    • I suspected that some might not be successful, because the stats show less, but today they were all HTTP 200!
    • -
    -
  • -
-
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
-   4432 200
-
    -
  • In the last two weeks there have been 47,000 downloads of this same exact PDF by these three IP addresses
  • -
  • Apply country and region corrections and deletions on DSpace Test and CGSpace:
  • -
-
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
-$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
-$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
-$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
-
- Read more → -
- - - - -