2021-12-01
- Atmire merged some changes I had submitted to the COUNTER-Robots project
- I updated our local spider user agents and then re-ran the list with my check-spider-hits.sh script on CGSpace:
$ ./ilri/check-spider-hits.sh -f /tmp/agents -p
Purging 1989 hits from The Knowledge AI in statistics
Purging 1235 hits from MaCoCu in statistics
Purging 455 hits from WhatsApp in statistics
Total number of bot hits purged: 3679
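As a quick sanity check on the total, the per-agent lines can be summed again from the script output. This is a minimal sketch that only assumes the "Purging N hits from AGENT in statistics" line format shown above; the `sum_purged` helper is hypothetical and not part of check-spider-hits.sh.

```python
import re

# Matches the per-agent lines printed in the output above.
PURGE_LINE = re.compile(r"^Purging (\d+) hits from (.+) in statistics$")


def sum_purged(output: str) -> int:
    """Add up the hit counts from every 'Purging N hits from ...' line."""
    total = 0
    for line in output.splitlines():
        match = PURGE_LINE.match(line.strip())
        if match:
            total += int(match.group(1))
    return total


sample = """Purging 1989 hits from The Knowledge AI in statistics
Purging 1235 hits from MaCoCu in statistics
Purging 455 hits from WhatsApp in statistics"""

print(sum_purged(sample))  # 3679, the same total the script reported
```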
Read more →
2021-11-02
- I experimented with manually sharding the Solr statistics on DSpace Test
- First I exported all the 2019 stats from CGSpace:
$ ./run.sh -s http://localhost:8081/solr/statistics -f 'time:2019-*' -a export -o statistics-2019.json -k uid
$ zstd statistics-2019.json
Read more →
2021-10-01
- Export all affiliations on CGSpace and run them against the latest ROR data dump:
localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
$ csvcut -c 1 /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
$ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affiliations-matching.csv
$ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l
1879
$ wc -l /tmp/2021-10-01-affiliations.txt
7100 /tmp/2021-10-01-affiliations.txt
- So we have 1879/7100 (26.46%) matching already
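The match rate above can also be computed directly from the matching CSV instead of combining csvgrep and wc by hand. This is a rough sketch, assuming the file has one row per affiliation and a `matched` column containing `true` or `false`; the `organization` column name and the sample rows are made up for illustration.

```python
import csv
import io


def match_rate(csv_text: str, column: str = "matched") -> tuple[int, int, float]:
    """Return (matched rows, total rows, percentage matched)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    matched = sum(1 for row in rows if row[column].strip().lower() == "true")
    total = len(rows)
    percent = 100 * matched / total if total else 0.0
    return matched, total, percent


# Hypothetical sample data, not real rows from the affiliations export.
sample = """organization,matched
Example University,true
Example Institute,false
Another Example Centre,false
Yet Another Example,false
"""

matched, total, percent = match_rate(sample)
print(f"{matched}/{total} ({percent:.2f}%)")

# The figures from the notes work out the same way:
print(round(100 * 1879 / 7100, 2))  # 26.46
```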
Read more →
2021-09-02
- Troubleshooting the missing Altmetric scores on AReS
- Turns out that I didn’t actually fix them last month because the check for content.altmetric still exists, and I can’t access the DOIs using _h.source.DOI for some reason
- I can access all other kinds of item metadata using the Elasticsearch label, but not DOI!!!
- I will change DOI to tomato in the repository setup and start a re-harvest… I need to see if this is some kind of reserved word or something…
- Even as tomato I can’t access that field as _h.source.tomato in Angular, but it does work as a filter source… sigh
- I’m having problems using the OpenRXV API
- The syntax Moayad showed me last month doesn’t seem to honor the search query properly…
Read more →
2021-08-01
- Update Docker images on AReS server (linode20) and reboot the server:
# docker images | grep -v ^REPO | sed 's/ \+/:/g' | cut -d: -f1,2 | grep -v none | xargs -L1 docker pull
- I decided to upgrade linode20 from Ubuntu 18.04 to 20.04
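The image update pipeline above skips the header row, joins the first two columns as repository:tag, drops untagged `<none>` entries, and pulls whatever is left. The same selection logic, sketched in Python for readability; the `images_to_pull` helper, the image names, and the image IDs in the sample are all hypothetical, and the sketch only builds the list rather than calling docker.

```python
def images_to_pull(docker_images_output: str) -> list[str]:
    """Build repository:tag references from `docker images` style output."""
    refs = []
    for line in docker_images_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (the `grep -v ^REPO` step).
        if len(fields) < 2 or fields[0] == "REPOSITORY":
            continue
        repo, tag = fields[0], fields[1]
        # Skip dangling images (the `grep -v none` step).
        if "none" in repo or "none" in tag:
            continue
        refs.append(f"{repo}:{tag}")
    return refs


# Made-up listing for illustration only.
sample = """REPOSITORY   TAG      IMAGE ID       CREATED       SIZE
nginx        stable   0123456789ab   2 weeks ago   133MB
<none>       <none>   ba9876543210   3 weeks ago   1.1GB
redis        6        aabbccddeeff   4 weeks ago   105MB
"""

for ref in images_to_pull(sample):
    print("docker pull", ref)
```

One difference from the shell version: `cut -d: -f1,2` would truncate repository names that already contain a colon, such as a registry with a port, while splitting on whitespace keeps them whole.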
Read more →
2021-07-01
- Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:
localhost/dspace63= > \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;
COPY 20994
Read more →
2021-06-01
- IWMI notified me that AReS was down with an HTTP 502 error
- Looking at UptimeRobot I see it has been down for 33 hours, but I never got a notification
- I don’t see anything in the Elasticsearch container logs, or the systemd journal on the host, but I notice that the angular_nginx container isn’t running
- I simply started it and AReS was running again:
Read more →
2021-05-01
- I looked at the top user agents and IPs in the Solr statistics for last month and I see these user agents:
- “RI/1.0”, 1337
- “Microsoft Office Word 2014”, 941
- I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…
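To illustrate the decision above, here is a minimal sketch of checking user agents against a pattern list before purging. It assumes the patterns are regular expressions searched anywhere in the user agent string, which may not be exactly how DSpace or check-spider-hits.sh applies them; `hits_to_purge` is a hypothetical helper.

```python
import re


def hits_to_purge(agent_hits: dict[str, int], patterns: list[str]) -> dict[str, int]:
    """Return only the user agents that match at least one bot pattern."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return {
        agent: hits
        for agent, hits in agent_hits.items()
        if any(regex.search(agent) for regex in compiled)
    }


# Hit counts taken from the notes above.
agent_hits = {"RI/1.0": 1337, "Microsoft Office Word 2014": 941}

# Only the RI/1.0 pattern is added; Microsoft Word stays because it is a real user.
patterns = [r"RI/1\.0"]

print(hits_to_purge(agent_hits, patterns))  # {'RI/1.0': 1337}
```

An unanchored pattern like this also matches any longer agent string that merely contains "RI/1.0", so anchoring it with `^` may be worth considering if false positives show up.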
Read more →
2021-04-01
- I wrote a script to query Sherpa’s API for our ISSNs: sherpa-issn-lookup.py
- I’m curious to see how the results compare with the results from Crossref yesterday
- AReS Explorer had been down since this morning, and I didn’t see anything in the systemd journal
- I simply took everything down with docker-compose and then back up, and then it was OK
- Perhaps one of the containers crashed, I should have looked closer but I was in a hurry
Read more →
2021-03-01
- Discuss some OpenRXV issues with Abdullah from CodeObia
- He’s trying to work on the DSpace 6+ metadata schema autoimport using the DSpace 6+ REST API
- Also, we found some issues building and running OpenRXV currently due to an ecosystem shift in the Node.js dependencies
Read more →