diff --git a/content/posts/2020-10.md b/content/posts/2020-10.md index 77f7cc92e..7d9a53f4a 100644 --- a/content/posts/2020-10.md +++ b/content/posts/2020-10.md @@ -359,4 +359,55 @@ sys 2m22.713s - I found a new setting in DSpace 6's `usage-statistics.cfg` about case insensitive matching of bots that defaults to false, so I enabled it in our DSpace 6 branch - I am curious to see if that resolves the strange issues I noticed yesterday about bot matching of patterns in the spider agents file completely not working +## 2020-10-15 + +- Re-deploy latest code on both CGSpace and DSpace Test to get the input forms changes + - Run system updates and reboot each server (linode18 and linode26) + - I had to restart Tomcat seven times on CGSpace before all Solr stats cores came up OK +- Skype with Peter and Abenet about AReS and CGSpace + - We agreed to lower case the AGROVOC subjects on CGSpace to make it harmonized with MELSpace and WorldFish + - We agreed to separate the AGROVOC from the other center- and CRP-specific subjects so that the search and tag clouds are cleaner and more useful + - We added a filter for journal title +- I enabled anonymous access to the "Export search metadata" option on DSpace Test + - If I search for author containing "Orth, Alan" or "Orth Alan" the export search metadata returns HTTP 400 + - If I search for author containing "Orth" it exports a CSV properly... 
+- I created issues on the OpenRXV repository: + - [Can't download templates that have spaces in their file name](https://github.com/ilri/OpenRXV/issues/42) + - [Can't search for text values with a space in "Mapping Values" interface](https://github.com/ilri/OpenRXV/issues/43) +- Atmire responded about the Listings and Reports and Content and Usage Statistics issues with DSpace 6 that I reported last week + - They said that the CUA issue was a mistake and should be fixed in a minor version bump + - They asked me to confirm if the L&R version bump from last week did not solve the issue there (which I had tested locally, but not on DSpace Test) + - I will test them both again on DSpace Test and report back +- I posted a message on Yammer to inform all our users about the changes to countries, regions, and AGROVOC subjects +- I modified all AGROVOC subjects to be lower case in PostgreSQL and then exported a list of the top 1500 to update the controlled vocabulary in our submission form: + +``` +dspace=> BEGIN; +dspace=> UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE resource_type_id=2 AND metadata_field_id=57; +UPDATE 335063 +dspace=> COMMIT; +dspace=> \COPY (SELECT DISTINCT text_value as "dc.subject", count(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=57 GROUP BY "dc.subject" ORDER BY count DESC LIMIT 1500) TO /tmp/2020-10-15-top-1500-agrovoc-subject.csv WITH CSV HEADER; +COPY 1500 +``` + +- Use my `agrovoc-lookup.py` script to validate subject terms against the AGROVOC REST API, extract matches with `csvgrep`, and then update and format the controlled vocabulary: + +``` +$ csvcut -c 1 /tmp/2020-10-15-top-1500-agrovoc-subject.csv | tail -n 1500 > /tmp/subjects.txt +$ ./agrovoc-lookup.py -i /tmp/subjects.txt -o /tmp/subjects.csv -d +$ csvgrep -c 4 -m 0 -i /tmp/subjects.csv | csvcut -c 1 | sed '1d' > dspace/config/controlled-vocabularies/dc-subject.xml +# apply formatting in XML file +$ tidy -xml -utf8 -iq -m -w 0 
dspace/config/controlled-vocabularies/dc-subject.xml +``` + +- Then I started a full re-indexing on CGSpace: + +``` +$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b + +real 88m21.678s +user 7m59.182s +sys 2m22.713s +``` + diff --git a/docs/2015-11/index.html b/docs/2015-11/index.html index 0e0aacbc2..1d24d30da 100644 --- a/docs/2015-11/index.html +++ b/docs/2015-11/index.html @@ -31,7 +31,7 @@ Last week I had increased the limit from 30 to 60, which seemed to help, but now $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace 78 "/> - + diff --git a/docs/2015-12/index.html b/docs/2015-12/index.html index fc05b5eac..a34acf64a 100644 --- a/docs/2015-12/index.html +++ b/docs/2015-12/index.html @@ -33,7 +33,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz "/> - + diff --git a/docs/2016-01/index.html b/docs/2016-01/index.html index 1c0105de9..36d0901ca 100644 --- a/docs/2016-01/index.html +++ b/docs/2016-01/index.html @@ -25,7 +25,7 @@ Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_ I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated. Update GitHub wiki for documentation of maintenance tasks. 
"/> - + diff --git a/docs/2016-02/index.html b/docs/2016-02/index.html index fb3f4c768..ac00c4587 100644 --- a/docs/2016-02/index.html +++ b/docs/2016-02/index.html @@ -35,7 +35,7 @@ I noticed we have a very interesting list of countries on CGSpace: Not only are there 49,000 countries, we have some blanks (25)… Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE” "/> - + diff --git a/docs/2016-03/index.html b/docs/2016-03/index.html index 91b22733d..f0a58bbc0 100644 --- a/docs/2016-03/index.html +++ b/docs/2016-03/index.html @@ -25,7 +25,7 @@ Looking at issues with author authorities on CGSpace For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server "/> - + diff --git a/docs/2016-04/index.html b/docs/2016-04/index.html index 148554bc1..2a11fe9d1 100644 --- a/docs/2016-04/index.html +++ b/docs/2016-04/index.html @@ -29,7 +29,7 @@ After running DSpace for over five years I’ve never needed to look in any This will save us a few gigs of backup space we’re paying for on S3 Also, I noticed the checker log has some errors we should pay attention to: "/> - + diff --git a/docs/2016-05/index.html b/docs/2016-05/index.html index c68e02f51..c97dfb9bd 100644 --- a/docs/2016-05/index.html +++ b/docs/2016-05/index.html @@ -31,7 +31,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period! 
# awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l 3168 "/> - + diff --git a/docs/2016-06/index.html b/docs/2016-06/index.html index bf9777660..cf2311d15 100644 --- a/docs/2016-06/index.html +++ b/docs/2016-06/index.html @@ -31,7 +31,7 @@ This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRec You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship "/> - + diff --git a/docs/2016-07/index.html b/docs/2016-07/index.html index 62672cc63..fabe0281a 100644 --- a/docs/2016-07/index.html +++ b/docs/2016-07/index.html @@ -41,7 +41,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and In this case the select query was showing 95 results before the update "/> - + diff --git a/docs/2016-08/index.html b/docs/2016-08/index.html index 51952bbac..7a6cfc067 100644 --- a/docs/2016-08/index.html +++ b/docs/2016-08/index.html @@ -39,7 +39,7 @@ $ git checkout -b 55new 5_x-prod $ git reset --hard ilri/5_x-prod $ git rebase -i dspace-5.5 "/> - + diff --git a/docs/2016-09/index.html b/docs/2016-09/index.html index 0734e89dd..00ee24e6c 100644 --- a/docs/2016-09/index.html +++ b/docs/2016-09/index.html @@ -31,7 +31,7 @@ It looks like we might be able to use OUs now, instead of DCs: $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)" "/> - + diff --git a/docs/2016-10/index.html b/docs/2016-10/index.html index b1ad5148f..8de7ac4ad 100644 --- a/docs/2016-10/index.html +++ b/docs/2016-10/index.html @@ -39,7 +39,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id 0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X "/> - + diff --git a/docs/2016-11/index.html 
b/docs/2016-11/index.html index 5cfaaa5e8..1f0625883 100644 --- a/docs/2016-11/index.html +++ b/docs/2016-11/index.html @@ -23,7 +23,7 @@ Add dc.type to the output options for Atmire’s Listings and Reports module Add dc.type to the output options for Atmire’s Listings and Reports module (#286) "/> - + diff --git a/docs/2016-12/index.html b/docs/2016-12/index.html index 96909c062..dd70f2195 100644 --- a/docs/2016-12/index.html +++ b/docs/2016-12/index.html @@ -43,7 +43,7 @@ I see thousands of them in the logs for the last few months, so it’s not r I’ve raised a ticket with Atmire to ask Another worrying error from dspace.log is: "/> - + diff --git a/docs/2017-01/index.html b/docs/2017-01/index.html index 72a89d13c..a7cfc84a3 100644 --- a/docs/2017-01/index.html +++ b/docs/2017-01/index.html @@ -25,7 +25,7 @@ I checked to see if the Solr sharding task that is supposed to run on January 1s I tested on DSpace Test as well and it doesn’t work there either I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years "/> - + diff --git a/docs/2017-02/index.html b/docs/2017-02/index.html index a1953edd3..831c3e496 100644 --- a/docs/2017-02/index.html +++ b/docs/2017-02/index.html @@ -47,7 +47,7 @@ DELETE 1 Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301) Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name "/> - + diff --git a/docs/2017-03/index.html b/docs/2017-03/index.html index 184aa8db3..2719f710a 100644 --- a/docs/2017-03/index.html +++ b/docs/2017-03/index.html @@ -51,7 +51,7 @@ Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing reg $ identify ~/Desktop/alc_contrastes_desafios.jpg /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 "/> - + diff --git a/docs/2017-04/index.html b/docs/2017-04/index.html index 
20cb5fdf6..3a50c1468 100644 --- a/docs/2017-04/index.html +++ b/docs/2017-04/index.html @@ -37,7 +37,7 @@ Testing the CMYK patch on a collection with 650 items: $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt "/> - + diff --git a/docs/2017-05/index.html b/docs/2017-05/index.html index 3c5c1be36..8ecf371e2 100644 --- a/docs/2017-05/index.html +++ b/docs/2017-05/index.html @@ -15,7 +15,7 @@ - + diff --git a/docs/2017-06/index.html b/docs/2017-06/index.html index 90c73e9a7..f945c9b9f 100644 --- a/docs/2017-06/index.html +++ b/docs/2017-06/index.html @@ -15,7 +15,7 @@ - + diff --git a/docs/2017-07/index.html b/docs/2017-07/index.html index c9b8d6e24..38eeebb9e 100644 --- a/docs/2017-07/index.html +++ b/docs/2017-07/index.html @@ -33,7 +33,7 @@ Merge changes for WLE Phase II theme rename (#329) Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML: "/> - + diff --git a/docs/2017-08/index.html b/docs/2017-08/index.html index 565673eb6..784cc54c5 100644 --- a/docs/2017-08/index.html +++ b/docs/2017-08/index.html @@ -57,7 +57,7 @@ This was due to newline characters in the dc.description.abstract column, which I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet "/> - + diff --git a/docs/2017-09/index.html b/docs/2017-09/index.html index fd31a691b..d2886c24c 100644 --- a/docs/2017-09/index.html +++ b/docs/2017-09/index.html @@ -29,7 +29,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group "/> - + diff --git 
a/docs/2017-10/index.html b/docs/2017-10/index.html index 1457e584a..b39394ff2 100644 --- a/docs/2017-10/index.html +++ b/docs/2017-10/index.html @@ -31,7 +31,7 @@ http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections "/> - + diff --git a/docs/2017-11/index.html b/docs/2017-11/index.html index 3272b3b51..9e8c3ca3d 100644 --- a/docs/2017-11/index.html +++ b/docs/2017-11/index.html @@ -45,7 +45,7 @@ Generate list of authors on CGSpace for Peter to go through and correct: dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; COPY 54701 "/> - + diff --git a/docs/2017-12/index.html b/docs/2017-12/index.html index 17dde80c2..ebd5a7053 100644 --- a/docs/2017-12/index.html +++ b/docs/2017-12/index.html @@ -27,7 +27,7 @@ The logs say “Timeout waiting for idle object” PostgreSQL activity says there are 115 connections currently The list of connections to XMLUI and REST API for today: "/> - + diff --git a/docs/2018-01/index.html b/docs/2018-01/index.html index c86632784..6243bc019 100644 --- a/docs/2018-01/index.html +++ b/docs/2018-01/index.html @@ -147,7 +147,7 @@ dspace.log.2018-01-02:34 Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains "/> - + diff --git a/docs/2018-02/index.html b/docs/2018-02/index.html index 2c958830c..82e145bb5 100644 --- a/docs/2018-02/index.html +++ b/docs/2018-02/index.html @@ -27,7 +27,7 @@ We don’t need to distinguish between internal 
and external works, so that Yesterday I figured out how to monitor DSpace sessions using JMX I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01 "/> - + diff --git a/docs/2018-03/index.html b/docs/2018-03/index.html index 928b3cae2..2657b628d 100644 --- a/docs/2018-03/index.html +++ b/docs/2018-03/index.html @@ -21,7 +21,7 @@ Export a CSV of the IITA community metadata for Martin Mueller Export a CSV of the IITA community metadata for Martin Mueller "/> - + diff --git a/docs/2018-04/index.html b/docs/2018-04/index.html index aafb5b42e..c3e240277 100644 --- a/docs/2018-04/index.html +++ b/docs/2018-04/index.html @@ -23,7 +23,7 @@ Catalina logs at least show some memory errors yesterday: I tried to test something on DSpace Test but noticed that it’s down since god knows when Catalina logs at least show some memory errors yesterday: "/> - + diff --git a/docs/2018-05/index.html b/docs/2018-05/index.html index 4b30ff27a..fc8b7bfca 100644 --- a/docs/2018-05/index.html +++ b/docs/2018-05/index.html @@ -35,7 +35,7 @@ http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E Then I reduced the JVM heap size from 6144 back to 5120m Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use "/> - + diff --git a/docs/2018-06/index.html b/docs/2018-06/index.html index c7b5275fb..e2ac02333 100644 --- a/docs/2018-06/index.html +++ b/docs/2018-06/index.html @@ -55,7 +55,7 @@ real 74m42.646s user 8m5.056s sys 2m7.289s "/> - + diff --git a/docs/2018-07/index.html b/docs/2018-07/index.html index 12199723c..883241308 100644 --- a/docs/2018-07/index.html +++ b/docs/2018-07/index.html @@ -33,7 +33,7 @@ During the mvn package stage on the 5.8 branch I kept getting issues with java r There is insufficient memory for the Java Runtime Environment to continue. 
"/> - + diff --git a/docs/2018-08/index.html b/docs/2018-08/index.html index 1f33bdb75..c515cf480 100644 --- a/docs/2018-08/index.html +++ b/docs/2018-08/index.html @@ -43,7 +43,7 @@ Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes I ran all system updates on DSpace Test and rebooted it "/> - + diff --git a/docs/2018-09/index.html b/docs/2018-09/index.html index 6bb0bc93b..d0d3d3868 100644 --- a/docs/2018-09/index.html +++ b/docs/2018-09/index.html @@ -27,7 +27,7 @@ I’ll update the DSpace role in our Ansible infrastructure playbooks and ru Also, I’ll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again: "/> - + diff --git a/docs/2018-10/index.html b/docs/2018-10/index.html index df93749b9..f27067626 100644 --- a/docs/2018-10/index.html +++ b/docs/2018-10/index.html @@ -23,7 +23,7 @@ I created a GitHub issue to track this #389, because I’m super busy in Nai Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now "/> - + diff --git a/docs/2018-11/index.html b/docs/2018-11/index.html index 5c79d4e31..f44af01b3 100644 --- a/docs/2018-11/index.html +++ b/docs/2018-11/index.html @@ -33,7 +33,7 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage Today these are the top 10 IPs: "/> - + diff --git a/docs/2018-12/index.html b/docs/2018-12/index.html index 
c9d4484ea..ee632bcf6 100644 --- a/docs/2018-12/index.html +++ b/docs/2018-12/index.html @@ -33,7 +33,7 @@ Then I ran all system updates and restarted the server I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week "/> - + diff --git a/docs/2019-01/index.html b/docs/2019-01/index.html index 81de9316e..c014515b6 100644 --- a/docs/2019-01/index.html +++ b/docs/2019-01/index.html @@ -47,7 +47,7 @@ I don’t see anything interesting in the web server logs around that time t 357 207.46.13.1 903 54.70.40.11 "/> - + diff --git a/docs/2019-02/index.html b/docs/2019-02/index.html index 4c9fd64fe..9353e269a 100644 --- a/docs/2019-02/index.html +++ b/docs/2019-02/index.html @@ -69,7 +69,7 @@ real 0m19.873s user 0m22.203s sys 0m1.979s "/> - + diff --git a/docs/2019-03/index.html b/docs/2019-03/index.html index 5c23817ee..da0c95780 100644 --- a/docs/2019-03/index.html +++ b/docs/2019-03/index.html @@ -43,7 +43,7 @@ Most worryingly, there are encoding errors in the abstracts for eleven items, fo I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs "/> - + diff --git a/docs/2019-04/index.html b/docs/2019-04/index.html index ecddf7083..f59de9d7a 100644 --- a/docs/2019-04/index.html +++ b/docs/2019-04/index.html @@ -61,7 +61,7 @@ $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u ds $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d "/> - + diff --git a/docs/2019-05/index.html b/docs/2019-05/index.html index 174dc7980..c4447ebd2 100644 --- a/docs/2019-05/index.html +++ b/docs/2019-05/index.html @@ -45,7 +45,7 @@ DELETE 1 But after this I tried to delete the item from the XMLUI and it is still present… "/> - + diff 
--git a/docs/2019-06/index.html b/docs/2019-06/index.html index b55c49ba0..77ff44771 100644 --- a/docs/2019-06/index.html +++ b/docs/2019-06/index.html @@ -31,7 +31,7 @@ Run system updates on CGSpace (linode18) and reboot it Skype with Marie-Angélique and Abenet about CG Core v2 "/> - + diff --git a/docs/2019-07/index.html b/docs/2019-07/index.html index e15c5a4f9..c6826610f 100644 --- a/docs/2019-07/index.html +++ b/docs/2019-07/index.html @@ -35,7 +35,7 @@ CGSpace Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community "/> - + diff --git a/docs/2019-08/index.html b/docs/2019-08/index.html index 4f8d7ce1e..c39dd59be 100644 --- a/docs/2019-08/index.html +++ b/docs/2019-08/index.html @@ -43,7 +43,7 @@ After rebooting, all statistics cores were loaded… wow, that’s luck Run system updates on DSpace Test (linode19) and reboot it "/> - + diff --git a/docs/2019-09/index.html b/docs/2019-09/index.html index 44a48b6eb..24493de2e 100644 --- a/docs/2019-09/index.html +++ b/docs/2019-09/index.html @@ -69,7 +69,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning: 7249 2a01:7e00::f03c:91ff:fe18:7396 9124 45.5.186.2 "/> - + diff --git a/docs/2019-10/index.html b/docs/2019-10/index.html index 85138db84..fc927265e 100644 --- a/docs/2019-10/index.html +++ b/docs/2019-10/index.html @@ -15,7 +15,7 @@ - + diff --git a/docs/2019-11/index.html b/docs/2019-11/index.html index a83f134dd..bc61fd690 100644 --- a/docs/2019-11/index.html +++ b/docs/2019-11/index.html @@ -55,7 +55,7 @@ Let’s see how many of the REST API requests were for bitstreams (because t # zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams" 106781 "/> - + diff --git a/docs/2019-12/index.html b/docs/2019-12/index.html index 4428e81bf..625255931 100644 --- a/docs/2019-12/index.html +++ b/docs/2019-12/index.html @@ -43,7 +43,7 @@ Make sure all packages are up to date and the package manager is 
up to date, the # dpkg -C # reboot "/> - + diff --git a/docs/2020-01/index.html b/docs/2020-01/index.html index d1ffa10ab..ebb1224ab 100644 --- a/docs/2020-01/index.html +++ b/docs/2020-01/index.html @@ -53,7 +53,7 @@ I tweeted the CGSpace repository link "/> - + diff --git a/docs/2020-02/index.html b/docs/2020-02/index.html index 28fc84540..e7a8d90c3 100644 --- a/docs/2020-02/index.html +++ b/docs/2020-02/index.html @@ -35,7 +35,7 @@ The code finally builds and runs with a fresh install "/> - + diff --git a/docs/2020-03/index.html b/docs/2020-03/index.html index 9c7f72185..e9e54fac9 100644 --- a/docs/2020-03/index.html +++ b/docs/2020-03/index.html @@ -39,7 +39,7 @@ You need to download this into the DSpace 6.x source and compile it "/> - + diff --git a/docs/2020-04/index.html b/docs/2020-04/index.html index 6435205fa..e7d9e4219 100644 --- a/docs/2020-04/index.html +++ b/docs/2020-04/index.html @@ -45,7 +45,7 @@ The third item now has a donut with score 1 since I tweeted it last week On the same note, the one item Abenet pointed out last week now has a donut with score of 104 after I tweeted it last week "/> - + diff --git a/docs/2020-05/index.html b/docs/2020-05/index.html index 9e3cd2a9f..d5b2a77ea 100644 --- a/docs/2020-05/index.html +++ b/docs/2020-05/index.html @@ -31,7 +31,7 @@ I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2 "/> - + diff --git a/docs/2020-06/index.html b/docs/2020-06/index.html index 5ba9c116f..ea2dab17a 100644 --- a/docs/2020-06/index.html +++ b/docs/2020-06/index.html @@ -33,7 +33,7 @@ I sent Atmire the dspace.log from today and told them to log into the server to In other news, I checked the statistics API on DSpace 6 and it’s working I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error: "/> - + diff --git a/docs/2020-07/index.html b/docs/2020-07/index.html index 3d0caef77..5b6e0e66f 100644 --- a/docs/2020-07/index.html +++ b/docs/2020-07/index.html @@ -35,7 
+35,7 @@ I restarted Tomcat and PostgreSQL and the issue was gone Since I was restarting Tomcat anyways I decided to redeploy the latest changes from the 5_x-prod branch and I added a note about COVID-19 items to the CGSpace frontpage at Peter’s request "/> - + diff --git a/docs/2020-08/index.html b/docs/2020-08/index.html index ddd7bd203..efed00975 100644 --- a/docs/2020-08/index.html +++ b/docs/2020-08/index.html @@ -33,7 +33,7 @@ It is class based so I can easily add support for other vocabularies, and the te "/> - + diff --git a/docs/2020-09/index.html b/docs/2020-09/index.html index c19c500bf..fe7d32f24 100644 --- a/docs/2020-09/index.html +++ b/docs/2020-09/index.html @@ -45,7 +45,7 @@ I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/39 I filed an issue on OpenRXV to make some minor edits to the admin UI: https://github.com/ilri/OpenRXV/issues/40 "/> - + diff --git a/docs/2020-10/index.html b/docs/2020-10/index.html index 109c27155..bccfbdce1 100644 --- a/docs/2020-10/index.html +++ b/docs/2020-10/index.html @@ -23,7 +23,7 @@ During the FlywayDB migration I got an error: - + @@ -41,7 +41,7 @@ During the FlywayDB migration I got an error: "/> - + @@ -51,9 +51,9 @@ During the FlywayDB migration I got an error: "@type": "BlogPosting", "headline": "October, 2020", "url": "https://alanorth.github.io/cgspace-notes/2020-10/", - "wordCount": "2381", + "wordCount": "2831", "datePublished": "2020-10-06T16:55:54+03:00", - "dateModified": "2020-10-12T17:53:24+03:00", + "dateModified": "2020-10-14T22:21:03+03:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -539,7 +539,66 @@ sys 2m22.713s - +

2020-10-15

+ +
dspace=> BEGIN;
+dspace=> UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE resource_type_id=2 AND metadata_field_id=57;
+UPDATE 335063
+dspace=> COMMIT;
+dspace=> \COPY (SELECT DISTINCT text_value as "dc.subject", count(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=57 GROUP BY "dc.subject" ORDER BY count DESC LIMIT 1500) TO /tmp/2020-10-15-top-1500-agrovoc-subject.csv WITH CSV HEADER;
+COPY 1500
+
+
$ csvcut -c 1 /tmp/2020-10-15-top-1500-agrovoc-subject.csv | tail -n 1500 > /tmp/subjects.txt
+$ ./agrovoc-lookup.py -i /tmp/subjects.txt -o /tmp/subjects.csv -d
+$ csvgrep -c 4 -m 0 -i /tmp/subjects.csv | csvcut -c 1 | sed '1d' > dspace/config/controlled-vocabularies/dc-subject.xml
+# apply formatting in XML file
+$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/dc-subject.xml
+
+
$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
+
+real    88m21.678s
+user    7m59.182s
+sys     2m22.713s
+
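
For reference, the `csvgrep -c 4 -m 0 -i ... | csvcut -c 1 | sed '1d'` step in the pipeline above can be sketched in Python. This is only a rough equivalent under stated assumptions: the column names in the sample data are hypothetical (the notes do not show the actual output format of `agrovoc-lookup.py`), and the fourth column is assumed to hold a match count. As far as I understand, `csvgrep -m` does a substring match, so the sketch also drops any row whose fourth column merely contains a "0" (for example "10"), not just rows where it equals "0".

```python
import csv
import io


def matched_terms(csv_text, flag_column=3, term_column=0):
    """Return first-column values of rows whose flag column does not contain "0".

    Mirrors: csvgrep -c 4 -m 0 -i | csvcut -c 1 | sed '1d'
    (the column indexes here are zero-based, csvkit's are one-based).
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, like sed '1d'
    terms = []
    for row in reader:
        # Ignore blank or short rows rather than failing on them
        if len(row) <= max(flag_column, term_column):
            continue
        # Substring test, inverted: keep rows with no "0" in the flag column
        if "0" not in row[flag_column]:
            terms.append(row[term_column])
    return terms


# Hypothetical sample data, not real output from agrovoc-lookup.py
sample = (
    "subject,language,match type,number of matches\n"
    "maize,en,exact,1\n"
    "not a real term,en,none,0\n"
    "livestock,en,exact,1\n"
)

for term in matched_terms(sample):
    print(term)
```

If the match count can legitimately reach 10 or more, an exact comparison (`row[flag_column] != "0"`) would probably be the safer filter than the substring test.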
diff --git a/docs/404.html b/docs/404.html index 3d73b91b1..291dd08c1 100644 --- a/docs/404.html +++ b/docs/404.html @@ -14,7 +14,7 @@ - + diff --git a/docs/categories/index.html b/docs/categories/index.html index bec313a04..e096a24f9 100644 --- a/docs/categories/index.html +++ b/docs/categories/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html index c444d821d..3c1f85b50 100644 --- a/docs/categories/notes/index.html +++ b/docs/categories/notes/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html index 5882c5446..ff1a79473 100644 --- a/docs/categories/notes/page/2/index.html +++ b/docs/categories/notes/page/2/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html index 4d68d88d8..27625e880 100644 --- a/docs/categories/notes/page/3/index.html +++ b/docs/categories/notes/page/3/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html index 04518f6ba..6fc3c7b2a 100644 --- a/docs/categories/notes/page/4/index.html +++ b/docs/categories/notes/page/4/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/cgiar-library-migration/index.html b/docs/cgiar-library-migration/index.html index c3456a302..6f3441bae 100644 --- a/docs/cgiar-library-migration/index.html +++ b/docs/cgiar-library-migration/index.html @@ -15,7 +15,7 @@ - + diff --git a/docs/cgspace-cgcorev2-migration/index.html b/docs/cgspace-cgcorev2-migration/index.html index 015e2e4eb..776ad0d68 100644 --- a/docs/cgspace-cgcorev2-migration/index.html +++ b/docs/cgspace-cgcorev2-migration/index.html @@ -15,7 +15,7 @@ - + diff --git a/docs/index.html b/docs/index.html index 174204c3c..eff6d5540 100644 --- a/docs/index.html +++ b/docs/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/page/2/index.html 
b/docs/page/2/index.html index 6a23500a2..85d6199db 100644 --- a/docs/page/2/index.html +++ b/docs/page/2/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/page/3/index.html b/docs/page/3/index.html index a057aef1a..8bdb75626 100644 --- a/docs/page/3/index.html +++ b/docs/page/3/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/page/4/index.html b/docs/page/4/index.html index 642157dd1..2833756bc 100644 --- a/docs/page/4/index.html +++ b/docs/page/4/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/page/5/index.html b/docs/page/5/index.html index 79b976185..713b123d1 100644 --- a/docs/page/5/index.html +++ b/docs/page/5/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/page/6/index.html b/docs/page/6/index.html index 3c47eaaed..79edd2199 100644 --- a/docs/page/6/index.html +++ b/docs/page/6/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/page/7/index.html b/docs/page/7/index.html index 166207a86..8ad89a52c 100644 --- a/docs/page/7/index.html +++ b/docs/page/7/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/posts/index.html b/docs/posts/index.html index 39af2cf42..525fe91c6 100644 --- a/docs/posts/index.html +++ b/docs/posts/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html index cc5a4d543..e9177ae99 100644 --- a/docs/posts/page/2/index.html +++ b/docs/posts/page/2/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html index ae94ab1c1..d5aa8b89e 100644 --- a/docs/posts/page/3/index.html +++ b/docs/posts/page/3/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html index 49b6521cc..2deb354b1 100644 --- a/docs/posts/page/4/index.html +++ b/docs/posts/page/4/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html index 8d06197d3..9a98147b0 100644 --- a/docs/posts/page/5/index.html +++ b/docs/posts/page/5/index.html 
@@ -9,12 +9,12 @@ - + - + diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html index f9c90b521..5152784f9 100644 --- a/docs/posts/page/6/index.html +++ b/docs/posts/page/6/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html index ff0d954e5..68df87e7f 100644 --- a/docs/posts/page/7/index.html +++ b/docs/posts/page/7/index.html @@ -9,12 +9,12 @@ - + - + diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 68b8c0727..77eba8b3e 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -4,27 +4,27 @@ https://alanorth.github.io/cgspace-notes/categories/ - 2020-10-12T17:53:24+03:00 + 2020-10-14T22:21:03+03:00 https://alanorth.github.io/cgspace-notes/ - 2020-10-12T17:53:24+03:00 + 2020-10-14T22:21:03+03:00 https://alanorth.github.io/cgspace-notes/categories/notes/ - 2020-10-12T17:53:24+03:00 + 2020-10-14T22:21:03+03:00 https://alanorth.github.io/cgspace-notes/2020-10/ - 2020-10-12T17:53:24+03:00 + 2020-10-14T22:21:03+03:00 https://alanorth.github.io/cgspace-notes/posts/ - 2020-10-12T17:53:24+03:00 + 2020-10-14T22:21:03+03:00 diff --git a/docs/tags/index.html b/docs/tags/index.html index f9fdce67f..b0435844a 100644 --- a/docs/tags/index.html +++ b/docs/tags/index.html @@ -14,7 +14,7 @@ - + diff --git a/docs/tags/migration/index.html b/docs/tags/migration/index.html index 02bc13d34..3d1cda6cf 100644 --- a/docs/tags/migration/index.html +++ b/docs/tags/migration/index.html @@ -14,7 +14,7 @@ - + diff --git a/docs/tags/notes/index.html b/docs/tags/notes/index.html index 9504a0078..956599ea8 100644 --- a/docs/tags/notes/index.html +++ b/docs/tags/notes/index.html @@ -14,7 +14,7 @@ - + diff --git a/docs/tags/notes/page/2/index.html b/docs/tags/notes/page/2/index.html index 19546ba2d..cc23fd786 100644 --- a/docs/tags/notes/page/2/index.html +++ b/docs/tags/notes/page/2/index.html @@ -14,7 +14,7 @@ - + diff --git a/docs/tags/notes/page/3/index.html b/docs/tags/notes/page/3/index.html index 
810bbd493..e663161b6 100644 --- a/docs/tags/notes/page/3/index.html +++ b/docs/tags/notes/page/3/index.html @@ -14,7 +14,7 @@ - +