cgspace-notes/content/posts/2019-09.md

101 lines
5.1 KiB
Markdown
Raw Normal View History

2019-09-01 09:41:30 +02:00
---
title: "September, 2019"
date: 2019-09-01T10:17:51+03:00
author: "Alan Orth"
tags: ["Notes"]
---
## 2019-09-01
- Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning
- Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
```
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
440 17.58.101.255
441 157.55.39.101
485 207.46.13.43
728 169.60.128.125
730 207.46.13.108
758 157.55.39.9
808 66.160.140.179
814 207.46.13.212
2472 163.172.71.23
6092 3.94.211.189
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
33 2a01:7e00::f03c:91ff:fe16:fcb
57 3.83.192.124
57 3.87.77.25
57 54.82.1.8
822 2a01:9cc0:47:1:1a:4:0:2
1223 45.5.184.72
1633 172.104.229.92
5112 205.186.128.185
7249 2a01:7e00::f03c:91ff:fe18:7396
9124 45.5.186.2
```
<!--more-->
- `3.94.211.189` is MauiBot, and most of its requests are to Discovery and get rate limited with HTTP 503
- `163.172.71.23` is some IP on Online SAS in France and its user agent is:
```
Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6)
```
- It actually got mostly HTTP 200 responses:
```
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | awk '{print $9}' | sort | uniq -c
1775 200
703 499
72 503
```
- And it was mostly requesting Discover pages:
```
# zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c
2350 discover
71 handle
```
- I'm not sure why the outbound traffic rate was so high...
2019-09-03 00:03:06 +02:00
## 2019-09-02
- Follow up with Carol and Francesca from Bioversity as they were on holiday during the mid-to-late August
- I told them to check the [temporary collection on DSpace Test](https://dspacetest.cgiar.org/handle/10568/103999) where I uploaded the 1,427 items so they can see how it will look
- Also, I told them to advise me about the strange file extensions (.7z, .zip, .lck)
- Also, I reminded Abenet to check the metadata, as the institutional authors at least will need some modification
2019-09-10 15:59:18 +02:00
## 2019-09-10
- Altmetric responded to say that they have fixed an issue with their badge code so now research outputs with multiple handles are showing badges!
- See: https://hdl.handle.net/handle/10568/97825
2019-09-12 17:21:43 +02:00
- Follow up with Bosede about the mixup with PDFs in the items uploaded in 2018-12 (aka Daniel1807.xsl)
2019-09-10 15:59:18 +02:00
- These are the same ones that Peter noticed last week, that Bosede and I had been discussing earlier this year that we never sorted out
2019-09-10 16:20:42 +02:00
- It looks like these items were uploaded by Sisay on 2018-12-19 so we can use the [accession date as a filter](https://cgspace.cgiar.org/handle/10568/68616/discover?filtertype_1=dateAccessioned&filter_relational_operator_1=contains&filter_1=2018-12-19&submit_apply_filter=&query=) to narrow it down to 230 items (of which only 104 have PDFs, according to the Daniel1807.xls input input file)
2019-09-12 17:21:43 +02:00
- Now I just checked a few manually and they are correct in the original input file, so something must have happened when Sisay was processing them for upload
- I have asked Sisay to fix them...
2019-09-10 15:59:18 +02:00
- Continue working on CG Core v2 migration, focusing on the crosswalk mappings
- I think we can skip the MODS crosswalk for now because it is only used in [AIP exports that are meant for non-DSpace systems](https://wiki.duraspace.org/display/DSDOC5x/DSpace+AIP+Format#DSpaceAIPFormat-MODSSchema)
- We should probably do the QDC crosswalk as well as those in `xhtml-head-item.properties`...
- Ouch, there is potentially a lot of work in the OAI metadata formats like DIM, METS, and QDC (see `dspace/config/crosswalks/oai/*.xsl`)
- In general I think I should only modify the left side of the crosswalk mappings (ie, where metadata is coming from) so we maintain the same exact output for search engines, etc
2019-09-12 17:21:43 +02:00
## 2019-09-11
- Maria Garruccio asked me to add two new Bioversity ORCID identifiers to CGSpace so I created a [pull request](https://github.com/ilri/DSpace/pull/431)
- Marissa Van Epp asked me to add new CCAFS Phase II project tags to CGSpace so I created a [pull request](https://github.com/ilri/DSpace/pull/432)
- I will wait until I hear from her to merge it because there is one tag that seems to be a duplicate because its name (PII-WA_agrosylvopast) is similar to one that already exists (PII-WA_AgroSylvopastoralSystems)
- More work on the CG Core v2 migrations
- I have updated my [notes on the possible changes](https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409) and done more work on the XMLUI replacements
## 2019-09-12
- Deploy [PostgreSQL JDBC driver](https://jdbc.postgresql.org/) version 42.2.7 on DSpace Test and update the [Ansible infrastructure scripts](https://github.com/ilri/rmg-ansible-public)
2019-09-01 09:41:30 +02:00
<!-- vim: set sw=2 ts=2: -->