--- title: "September, 2019" date: 2019-09-01T10:17:51+03:00 author: "Alan Orth" tags: ["Notes"] --- ## 2019-09-01 - Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning - Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning: ``` # zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 440 17.58.101.255 441 157.55.39.101 485 207.46.13.43 728 169.60.128.125 730 207.46.13.108 758 157.55.39.9 808 66.160.140.179 814 207.46.13.212 2472 163.172.71.23 6092 3.94.211.189 # zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10 33 2a01:7e00::f03c:91ff:fe16:fcb 57 3.83.192.124 57 3.87.77.25 57 54.82.1.8 822 2a01:9cc0:47:1:1a:4:0:2 1223 45.5.184.72 1633 172.104.229.92 5112 205.186.128.185 7249 2a01:7e00::f03c:91ff:fe18:7396 9124 45.5.186.2 ``` - `3.94.211.189` is MauiBot, and most of its requests are to Discovery and get rate limited with HTTP 503 - `163.172.71.23` is some IP on Online SAS in France and its user agent is: ``` Mozilla/5.0 ((Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6) ``` - It actually got mostly HTTP 200 responses: ``` # zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | awk '{print $9}' | sort | uniq -c 1775 200 703 499 72 503 ``` - And it was mostly requesting Discover pages: ``` # zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | grep 163.172.71.23 | grep -o -E "(bitstream|discover|handle)" | sort | uniq -c 2350 discover 71 handle ``` - I'm not sure why the outbound traffic rate was so high... ## 2019-09-02 - Follow up with Carol and Francesca from Bioversity as they were on holiday during the mid-to-late August - I told them to check the [temporary collection on DSpace Test](https://dspacetest.cgiar.org/handle/10568/103999) where I uploaded the 1,427 items so they can see how it will look - Also, I told them to advise me about the strange file extensions (.7z, .zip, .lck) - Also, I reminded Abenet to check the metadata, as the institutional authors at least will need some modification ## 2019-09-10 - Altmetric responded to say that they have fixed an issue with their badge code so now research outputs with multiple handles are showing badges! - See: https://hdl.handle.net/handle/10568/97825 - Follow up with Bosede about the mixup with PDFs in the items uploaded in 2018-12 (aka Daniel1807.xsl) - These are the same ones that Peter noticed last week, that Bosede and I had been discussing earlier this year that we never sorted out - It looks like these items were uploaded by Sisay on 2018-12-19 so we can use the [accession date as a filter](https://cgspace.cgiar.org/handle/10568/68616/discover?filtertype_1=dateAccessioned&filter_relational_operator_1=contains&filter_1=2018-12-19&submit_apply_filter=&query=) to narrow it down to 230 items (of which only 104 have PDFs, according to the Daniel1807.xls input input file) - Now I just checked a few manually and they are correct in the original input file, so something must have happened when Sisay was processing them for upload - I have asked Sisay to fix them... - Continue working on CG Core v2 migration, focusing on the crosswalk mappings - I think we can skip the MODS crosswalk for now because it is only used in [AIP exports that are meant for non-DSpace systems](https://wiki.duraspace.org/display/DSDOC5x/DSpace+AIP+Format#DSpaceAIPFormat-MODSSchema) - We should probably do the QDC crosswalk as well as those in `xhtml-head-item.properties`... - Ouch, there is potentially a lot of work in the OAI metadata formats like DIM, METS, and QDC (see `dspace/config/crosswalks/oai/*.xsl`) - In general I think I should only modify the left side of the crosswalk mappings (ie, where metadata is coming from) so we maintain the same exact output for search engines, etc ## 2019-09-11 - Maria Garruccio asked me to add two new Bioversity ORCID identifiers to CGSpace so I created a [pull request](https://github.com/ilri/DSpace/pull/431) - Marissa Van Epp asked me to add new CCAFS Phase II project tags to CGSpace so I created a [pull request](https://github.com/ilri/DSpace/pull/432) - I will wait until I hear from her to merge it because there is one tag that seems to be a duplicate because its name (PII-WA_agrosylvopast) is similar to one that already exists (PII-WA_AgroSylvopastoralSystems) - More work on the CG Core v2 migrations - I have updated my [notes on the possible changes](https://gist.github.com/alanorth/2db39e91f48d116e00a4edffd6ba6409) and done more work on the XMLUI replacements ## 2019-09-12 - Deploy [PostgreSQL JDBC driver](https://jdbc.postgresql.org/) version 42.2.7 on DSpace Test and update the [Ansible infrastructure scripts](https://github.com/ilri/rmg-ansible-public)