Add notes for 2020-11-22

2025-01-27 05:49:12 +01:00 · 2020-11-22 23:08:49 +02:00
parent 05c4b236f4
commit 26f17edd92
97 changed files with 232 additions and 123 deletions
--- a/content/posts/2020-11.md
+++ b/content/posts/2020-11.md
@ -409,4 +409,54 @@ $ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid =

 - Very curious that there was such a high number of rolled back transactions after the update

+## 2020-11-22
+
+- PostgreSQL situation on CGSpace (linode18) looks much better now:
+
+![PostgreSQL locks week](/cgspace-notes/2020/11/postgres_locks_ALL-week3.png)
+![PostgreSQL transaction log week](/cgspace-notes/2020/11/postgres_xlog-week2.png)
+
+- In other news, I noticed that harvesting DSpace 6 works fine in OpenRXV, but the statistics fail on page 1
+  - I filed an issue: https://github.com/ilri/OpenRXV/issues/59
+- Abenet asked for help trying to add a new user to the Bioversity and CIAT groups on CGSpace
+  - I see that the user search is paginated at five results per page, so the user in question appears on page 2
+  - I asked Abenet whether she was getting an error or had simply not seen the second page of results...
+- Maria Garuccio sent me an example report that she wants to be able to generate from AReS
+  - First, she would like to have the option to group by output type
+  - Second, she would like to be able to control the sorting in the template, like sorting the citation alphabetically
+  - I filed an issue: https://github.com/ilri/OpenRXV/issues/60
+- Mohammad Salem had asked if there was an item ID to UUID mapping for CGSpace
+  - I found a thread on the dspace-tech mailing list that pointed out that there is a new `uuid` column in the item table
+  - Only old items have an `item_id` so we can get a mapping easily:
+
+```
+dspace=# \COPY (SELECT item_id,uuid FROM item WHERE in_archive='t' AND withdrawn='f' AND item_id IS NOT NULL) TO /tmp/2020-11-22-item-id2uuid.csv WITH CSV HEADER;
+COPY 87411
+```
+
+- Saving some notes I wrote down about faceting by community and collection in Solr, for potential use in the future in the DSpace Statistics API
+- Facet by owningComm to see total number of distinct communities (136):
+
+```
+facet=true&facet.mincount=1&facet.field=owningComm&facet.limit=1&facet.offset=0&stats=true&stats.field=id&stats.calcdistinct=true
+```
+
+- Facet by owningComm and get the first 5 distinct:
+
+```
+facet=true&facet.mincount=1&facet.field=owningComm&facet.limit=5&facet.offset=0&facet.pivot=id,countryCode
+```
+
+- Facet by owningComm and countryCode using facet.pivot (maybe I can just skip the normal facet params?):
+
+```
+facet=true&f.owningComm.facet.limit=5&f.owningComm.facet.offset=5&facet.pivot=owningComm,countryCode
+```
+
+- Facet by owningComm and countryCode using facet.pivot, limiting each community to its top five countries (it turns out this is possible!):
+
+```
+facet=true&f.owningComm.facet.limit=5&f.owningComm.facet.offset=5&f.countryCode.facet.limit=5&facet.pivot=owningComm,countryCode
+```
+
 <!-- vim: set sw=2 ts=2: -->
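
A minimal sketch of how the per-field `facet.pivot` parameters from the notes above could be assembled in Python, for example if they end up in the DSpace Statistics API. The function name `pivot_facet_query` and its `limits`/`offsets` arguments are hypothetical and not part of any existing code; only the Solr parameter names (`facet`, `f.<field>.facet.limit`, `f.<field>.facet.offset`, `facet.pivot`) are taken from the queries in the notes.

```python
from urllib.parse import urlencode


def pivot_facet_query(pivot_fields, limits=None, offsets=None):
    """Build a Solr query string for a pivot facet.

    pivot_fields: field names in pivot order, e.g. ["owningComm", "countryCode"]
    limits/offsets: optional dicts mapping a field name to its per-field
    facet.limit / facet.offset (hypothetical helper arguments).
    """
    limits = limits or {}
    offsets = offsets or {}

    params = [("facet", "true")]
    for field in pivot_fields:
        # Per-field overrides use Solr's f.<field>.<param> syntax
        if field in limits:
            params.append((f"f.{field}.facet.limit", str(limits[field])))
        if field in offsets:
            params.append((f"f.{field}.facet.offset", str(offsets[field])))
    params.append(("facet.pivot", ",".join(pivot_fields)))

    # Keep the comma in facet.pivot unescaped so the output stays readable
    return urlencode(params, safe=",")


# Second page of five communities, top five countries for each
query = pivot_facet_query(
    ["owningComm", "countryCode"],
    limits={"owningComm": 5, "countryCode": 5},
    offsets={"owningComm": 5},
)
print(query)
```

Called this way, the helper reproduces the last query string in the notes; paging through communities would then only mean changing the `owningComm` offset.
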
--- a/content/posts/cgspace-dspace6-upgrade.md
+++ b/content/posts/cgspace-dspace6-upgrade.md
@ -410,8 +410,14 @@ $ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H

 ### Processing Solr statistics with AtomicStatisticsUpdateCLI

-On 2020-11-18 I finished processing the Solr statistics with solr-upgrade-statistics-6x and I started processing them with AtomicStatisticsUpdateCLI:
+On 2020-11-18 I finished processing the Solr statistics with solr-upgrade-statistics-6x and I started processing them with AtomicStatisticsUpdateCLI.
+
+## statistics
+
+First the current year's statistics core, in 12-hour batches:

 ```
 $ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics
 ```
+
+It took ~38 hours to finish processing this core.