Add notes for 2020-10-15

2020-10-15 18:11:00 +03:00
parent ae2c5bd8f6
commit d44615936a
89 changed files with 225 additions and 115 deletions

@@ -359,4 +359,55 @@ sys 2m22.713s
- I found a new setting in DSpace 6's `usage-statistics.cfg` for case-insensitive matching of bots, which defaults to false, so I enabled it in our DSpace 6 branch
- I am curious to see whether that resolves the strange issues I noticed yesterday, where bot matching against the patterns in the spider agents file was not working at all
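
To illustrate why a case-insensitive option could matter for bot matching, here is a minimal sketch. The patterns and the `is_bot` helper are hypothetical and only imitate the style of a spider agents file. This is not DSpace's actual implementation.

```python
import re

# Hypothetical patterns in the style of a spider agents file (one regex per entry)
AGENT_PATTERNS = ["Googlebot", "^curl/", "spider"]


def is_bot(user_agent: str, ignore_case: bool = False) -> bool:
    """Return True if any pattern matches somewhere in the user agent."""
    flags = re.IGNORECASE if ignore_case else 0
    return any(re.search(pattern, user_agent, flags) for pattern in AGENT_PATTERNS)


ua = "Mozilla/5.0 (compatible; googlebot/2.1)"
print(is_bot(ua))                    # False: "googlebot" differs in case from "Googlebot"
print(is_bot(ua, ignore_case=True))  # True
```

With case-sensitive matching, a user agent that differs from the pattern only in capitalization slips through, which is the kind of miss the setting is meant to prevent.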
## 2020-10-15
- Re-deploy latest code on both CGSpace and DSpace Test to get the input forms changes
- Run system updates and reboot each server (linode18 and linode26)
  - I had to restart Tomcat seven times on CGSpace before all Solr stats cores came up OK
- Skype with Peter and Abenet about AReS and CGSpace
  - We agreed to lowercase the AGROVOC subjects on CGSpace to harmonize them with MELSpace and WorldFish
  - We agreed to separate the AGROVOC subjects from the other center- and CRP-specific subjects so that the search and tag clouds are cleaner and more useful
  - We added a filter for journal title
- I enabled anonymous access to the "Export search metadata" option on DSpace Test
  - If I search for author containing "Orth, Alan" or "Orth Alan" the export search metadata returns HTTP 400
  - If I search for author containing "Orth" it exports a CSV properly...
- I created issues on the OpenRXV repository:
  - [Can't download templates that have spaces in their file name](https://github.com/ilri/OpenRXV/issues/42)
  - [Can't search for text values with a space in "Mapping Values" interface](https://github.com/ilri/OpenRXV/issues/43)
- Atmire responded about the Listings and Reports (L&R) and Content and Usage Statistics (CUA) issues with DSpace 6 that I reported last week
  - They said that the CUA issue was a mistake and should be fixed in a minor version bump
  - They asked me to confirm whether the L&R version bump from last week solved the issue there (I had tested it locally, but not on DSpace Test)
  - I will test them both again on DSpace Test and report back
- I posted a message on Yammer to inform all our users about the changes to countries, regions, and AGROVOC subjects
- I modified all AGROVOC subjects to be lowercase in PostgreSQL and then exported a list of the top 1500 to update the controlled vocabulary in our submission form:
```
dspace=> BEGIN;
dspace=> UPDATE metadatavalue SET text_value=LOWER(text_value) WHERE resource_type_id=2 AND metadata_field_id=57;
UPDATE 335063
dspace=> COMMIT;
dspace=> \COPY (SELECT DISTINCT text_value as "dc.subject", count(text_value) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=57 GROUP BY "dc.subject" ORDER BY count DESC LIMIT 1500) TO /tmp/2020-10-15-top-1500-agrovoc-subject.csv WITH CSV HEADER;
COPY 1500
```
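
The effect of the two SQL steps above (lowercase everything, then rank by frequency) can be sketched in plain Python. The sample values and the `top_subjects` helper are made up for illustration and do not come from the database.

```python
from collections import Counter

# Hypothetical subject values as they might look before the UPDATE
subjects = ["MAIZE", "Maize", "maize", "LIVESTOCK", "livestock", "Gender"]


def top_subjects(values, limit):
    """Lowercase every value, then return the `limit` most frequent ones."""
    counts = Counter(value.lower() for value in values)
    return counts.most_common(limit)


print(top_subjects(subjects, 2))  # [('maize', 3), ('livestock', 2)]
```

Lowercasing first merges variants such as "MAIZE" and "Maize" into one term, so the counts reflect real usage rather than capitalization differences.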
- Use my `agrovoc-lookup.py` script to validate subject terms against the AGROVOC REST API, extract matches with `csvgrep`, and then update and format the controlled vocabulary:
```
$ csvcut -c 1 /tmp/2020-10-15-top-1500-agrovoc-subject.csv | tail -n 1500 > /tmp/subjects.txt
$ ./agrovoc-lookup.py -i /tmp/subjects.txt -o /tmp/subjects.csv -d
$ csvgrep -c 4 -m 0 -i /tmp/subjects.csv | csvcut -c 1 | sed '1d' > dspace/config/controlled-vocabularies/dc-subject.xml
# apply formatting in XML file
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/dc-subject.xml
```
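
A rough Python equivalent of the filter-and-format step is sketched below. The column layout of `subjects.csv` is not shown in these notes, so the sample data, the assumption that the fourth column holds a match count, and the `dc-subject` root node are all hypothetical. The nested `node` / `isComposedBy` layout follows the general shape of DSpace controlled vocabulary files, but check it against an existing vocabulary before relying on it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical lookup results; the real column names are not known
SAMPLE = """term,col2,col3,matches
maize,x,x,1
notaterm,x,x,0
livestock,x,x,10
"""


def matched_terms(csv_text, column=3):
    """Return first-column terms whose match column is not exactly "0"."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows)  # skip the header row
    # Exact comparison, so a count such as "10" is kept
    return [row[0] for row in rows if row[column] != "0"]


def build_vocabulary(terms):
    """Wrap the terms in a nested <node> structure and return it as a string."""
    root = ET.Element("node", id="dc-subject", label="AGROVOC")  # hypothetical root
    composed = ET.SubElement(root, "isComposedBy")
    for term in terms:
        ET.SubElement(composed, "node", id=term, label=term)
    return ET.tostring(root, encoding="unicode")


print(build_vocabulary(matched_terms(SAMPLE)))
```

Building the XML with a library rather than by redirecting a term list into the file means the terms are escaped correctly, and the result can still be passed through `tidy` for indentation.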
- Then I started a full re-indexing on CGSpace:
```
$ time chrt -b 0 ionice -c2 -n7 nice -n19 dspace index-discovery -b
real 88m21.678s
user 7m59.182s
sys 2m22.713s
```
<!-- vim: set sw=2 ts=2: -->