Add notes for 2020-11-22

2020-11-22 23:08:49 +02:00
parent 05c4b236f4
commit 26f17edd92
97 changed files with 232 additions and 123 deletions

@ -409,4 +409,54 @@ $ psql -c 'SELECT * FROM pg_locks pl LEFT JOIN pg_stat_activity psa ON pl.pid =
- Very curious that there was such a high number of rolled back transactions after the update
## 2020-11-22
- PostgreSQL situation on CGSpace (linode18) looks much better now:
![PostgreSQL locks week](/cgspace-notes/2020/11/postgres_locks_ALL-week3.png)
![PostgreSQL transaction log week](/cgspace-notes/2020/11/postgres_xlog-week2.png)
- In other news, I noticed that harvesting DSpace 6 works fine in OpenRXV, but the statistics fail on page 1
- I filed an issue: https://github.com/ilri/OpenRXV/issues/59
- Abenet asked for help trying to add a new user to the Bioversity and CIAT groups on CGSpace
- I see that the user search is paginated at five results per page, so the user in question appears on page 2
- I asked Abenet whether she was getting an error or had simply not seen the second page of results
- Maria Garuccio sent me an example report that she wants to be able to generate from AReS
- First, she would like to have the option to group by output type
- Second, she would like to be able to control the sorting in the template, for example sorting the citations alphabetically
- I filed an issue: https://github.com/ilri/OpenRXV/issues/60
- Mohammad Salem had asked if there was an item ID to UUID mapping for CGSpace
- I found a thread on the dspace-tech mailing list that pointed out that there is a new `uuid` column in the item table
- Only old items have an `item_id` so we can get a mapping easily:
```
dspace=# \COPY (SELECT item_id,uuid FROM item WHERE in_archive='t' AND withdrawn='f' AND item_id IS NOT NULL) TO /tmp/2020-11-22-item-id2uuid.csv WITH CSV HEADER;
COPY 87411
```
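For whoever consumes that export, here is a minimal Python sketch of loading it into a lookup table. It assumes the header row is `item_id,uuid` (the column names from the query above, as written by `WITH CSV HEADER`). The function name `load_item_id_map` and the sample rows are hypothetical, not real CGSpace data:

```python
import csv
import io


def load_item_id_map(csv_file):
    """Return a dict mapping legacy integer item_id to its UUID string."""
    reader = csv.DictReader(csv_file)
    return {int(row["item_id"]): row["uuid"] for row in reader}


# Made-up rows in the same shape as the export, for illustration only
sample = io.StringIO(
    "item_id,uuid\n"
    "101,0a1b2c3d-0000-4000-8000-000000000001\n"
    "102,0a1b2c3d-0000-4000-8000-000000000002\n"
)
mapping = load_item_id_map(sample)
```

With the real file, the same function would be called on `open("/tmp/2020-11-22-item-id2uuid.csv", newline="")`.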
- Saving some notes I wrote down about faceting by community and collection in Solr, for potential use in the future in the DSpace Statistics API
- Facet by owningComm to see total number of distinct communities (136):
```
facet=true&facet.mincount=1&facet.field=owningComm&facet.limit=1&facet.offset=0&stats=true&stats.field=id&stats.calcdistinct=true
```
- Facet by owningComm and get the first 5 distinct:
```
facet=true&facet.mincount=1&facet.field=owningComm&facet.limit=5&facet.offset=0&facet.pivot=id,countryCode
```
- Facet by owningComm and countryCode using facet.pivot (maybe I can just skip the normal facet params?):
```
facet=true&f.owningComm.facet.limit=5&f.owningComm.facet.offset=5&facet.pivot=owningComm,countryCode
```
- Facet by owningComm and countryCode using facet.pivot and limiting to top five countries... fuck it's possible!
```
facet=true&f.owningComm.facet.limit=5&f.owningComm.facet.offset=5&f.countryCode.facet.limit=5&facet.pivot=owningComm,countryCode
```
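As a rough sketch of how the last pivot query might be used from code (for example in the DSpace Statistics API), the Python below builds the same parameters and flattens a pivot response into rows. The helper names `pivot_params` and `flatten_pivot`, the extra `q`, `rows`, and `wt` parameters, and the sample response are my assumptions for illustration; the response layout follows Solr's usual `facet_counts.facet_pivot` JSON structure:

```python
from urllib.parse import urlencode


def pivot_params(limit=5, offset=0, country_limit=5):
    """Build Solr params for a community/country pivot facet."""
    return {
        "q": "*:*",   # assumption: match all statistics documents
        "rows": 0,    # assumption: only facet counts are needed
        "wt": "json",
        "facet": "true",
        "f.owningComm.facet.limit": limit,
        "f.owningComm.facet.offset": offset,
        "f.countryCode.facet.limit": country_limit,
        "facet.pivot": "owningComm,countryCode",
    }


def flatten_pivot(response):
    """Turn a pivot response into (community, country, count) tuples."""
    pivots = response["facet_counts"]["facet_pivot"]["owningComm,countryCode"]
    rows = []
    for comm in pivots:
        for country in comm.get("pivot", []):
            rows.append((comm["value"], country["value"], country["count"]))
    return rows


query = urlencode(pivot_params(offset=5))

# Made-up response in the shape Solr returns for facet.pivot
sample_response = {
    "facet_counts": {
        "facet_pivot": {
            "owningComm,countryCode": [
                {
                    "field": "owningComm",
                    "value": "community-a",
                    "count": 30,
                    "pivot": [
                        {"field": "countryCode", "value": "KE", "count": 20},
                        {"field": "countryCode", "value": "US", "count": 10},
                    ],
                }
            ]
        }
    }
}
rows = flatten_pivot(sample_response)
```

The `offset` argument pages through communities five at a time, which matches the `f.owningComm.facet.offset=5` used above.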
<!-- vim: set sw=2 ts=2: -->

@ -410,8 +410,14 @@ $ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H
### Processing Solr statistics with AtomicStatisticsUpdateCLI
On 2020-11-18 I finished processing the Solr statistics with solr-upgrade-statistics-6x and I started processing them with AtomicStatisticsUpdateCLI:
On 2020-11-18 I finished processing the Solr statistics with solr-upgrade-statistics-6x and I started processing them with AtomicStatisticsUpdateCLI.
## statistics
First the current year's statistics core, in 12-hour batches:
```
$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics
```
It took ~38 hours to finish processing this core.