From 16ba5723eb283b4ef3976d10de500d0170a8fc05 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Tue, 31 Jan 2023 22:20:38 +0300
Subject: [PATCH] Add notes for 2023-01-31

---
 content/posts/2023-01.md                | 55 ++++++++++++++++++++
 docs/2023-01/index.html                 | 68 +++++++++++++++++++++++--
 docs/categories/index.html              |  2 +-
 docs/categories/notes/index.html        |  2 +-
 docs/categories/notes/page/2/index.html |  2 +-
 docs/categories/notes/page/3/index.html |  2 +-
 docs/categories/notes/page/4/index.html |  2 +-
 docs/categories/notes/page/5/index.html |  2 +-
 docs/categories/notes/page/6/index.html |  2 +-
 docs/categories/notes/page/7/index.html |  2 +-
 docs/index.html                         |  2 +-
 docs/page/2/index.html                  |  2 +-
 docs/page/3/index.html                  |  2 +-
 docs/page/4/index.html                  |  2 +-
 docs/page/5/index.html                  |  2 +-
 docs/page/6/index.html                  |  2 +-
 docs/page/7/index.html                  |  2 +-
 docs/page/8/index.html                  |  2 +-
 docs/page/9/index.html                  |  2 +-
 docs/posts/index.html                   |  2 +-
 docs/posts/page/2/index.html            |  2 +-
 docs/posts/page/3/index.html            |  2 +-
 docs/posts/page/4/index.html            |  2 +-
 docs/posts/page/5/index.html            |  2 +-
 docs/posts/page/6/index.html            |  2 +-
 docs/posts/page/7/index.html            |  2 +-
 docs/posts/page/8/index.html            |  2 +-
 docs/posts/page/9/index.html            |  2 +-
 docs/sitemap.xml                        | 10 ++--
 29 files changed, 151 insertions(+), 34 deletions(-)

diff --git a/content/posts/2023-01.md b/content/posts/2023-01.md
index 609e224ef..168277dc4 100644
--- a/content/posts/2023-01.md
+++ b/content/posts/2023-01.md
@@ -550,5 +550,60 @@ $ csvjoin -c doi /tmp/cgspace-temp.csv /tmp/crossref-results.csv \
 - The above was done with just 5,000 DOIs because it was taking a long time, but after the last step I imported into OpenRefine to clean up the license URLs
 - Then I imported 635 new licenses to CGSpace woooo
 - After checking the remaining 6,500 DOIs there were another 852 new licenses, woooo
+- Peter finished the corrections on affiliations, authors, and donors
+  - I quickly checked them and applied each on CGSpace
+- Start a harvest on AReS
+
+## 2023-01-30
+
+- Run the thumbnail fixer tasks on the Initiatives collections:
+
+```console
+$ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails 10568/115087 | tee -a /tmp/FixLowQualityThumbnails.log
+$ grep -c remove /tmp/FixLowQualityThumbnails.log
+16
+$ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixJpgJpgThumbnails 10568/115087 | tee -a /tmp/FixJpgJpgThumbnails.log
+$ grep -c replacing /tmp/FixJpgJpgThumbnails.log
+13
+```
+
+## 2023-01-31
+
+- Someone from the Google Scholar team contacted us to ask why Googlebot is blocked from crawling CGSpace
+  - I said that I blocked them because they crawl haphazardly and we had high load during PRMS reporting
+  - Now I will unblock their ASN15169 in nginx...
+  - I urged them to be smarter about crawling since we're a small team and they are a huge engineering company
+- I removed their ASN and regenerated my list from 2023-01-17:
+
+```console
+$ wget https://asn.ipinfo.app/api/text/list/AS714 \
+     https://asn.ipinfo.app/api/text/list/AS16276 \
+     https://asn.ipinfo.app/api/text/list/AS23576 \
+     https://asn.ipinfo.app/api/text/list/AS24940 \
+     https://asn.ipinfo.app/api/text/list/AS13238 \
+     https://asn.ipinfo.app/api/text/list/AS32934 \
+     https://asn.ipinfo.app/api/text/list/AS14061 \
+     https://asn.ipinfo.app/api/text/list/AS12876 \
+     https://asn.ipinfo.app/api/text/list/AS55286 \
+     https://asn.ipinfo.app/api/text/list/AS203020 \
+     https://asn.ipinfo.app/api/text/list/AS204287 \
+     https://asn.ipinfo.app/api/text/list/AS50245 \
+     https://asn.ipinfo.app/api/text/list/AS6939 \
+     https://asn.ipinfo.app/api/text/list/AS16509 \
+     https://asn.ipinfo.app/api/text/list/AS14618
+$ cat AS* | sort | uniq | wc -l
+17134
+$ cat /tmp/AS* | ~/go/bin/mapcidr -a > /tmp/networks.txt
+```
+
+- Then I updated nginx...
+- Re-run the scripts to delete duplicate metadata values and update item timestamps that I originally used in 2022-11
+  - This was about 650 duplicate metadata values...
+- Exported CGSpace to do some metadata interrogation in OpenRefine
+  - I looked at items that are set as `Limited Access` but have Creative Commons licenses
+  - I filtered ~150 that had DOIs and checked them on the Crossref API using `crossref-doi-lookup.py`
+  - Of those, only about five or so were incorrectly marked as having Creative Commons licenses, so I set those to copyrighted
+  - For the rest, I set them to Open Access
+- Start a harvest on AReS
diff --git a/docs/2023-01/index.html b/docs/2023-01/index.html
index c7efc4b61..47fd7101e 100644
--- a/docs/2023-01/index.html
+++ b/docs/2023-01/index.html
@@ -19,7 +19,7 @@ I see we have some new ones that aren’t in our list if I combine with this
-
+
@@ -44,9 +44,9 @@ I see we have some new ones that aren’t in our list if I combine with this
   "@type": "BlogPosting",
   "headline": "January, 2023",
   "url": "https://alanorth.github.io/cgspace-notes/2023-01/",
-  "wordCount": "4065",
+  "wordCount": "4361",
   "datePublished": "2023-01-01T08:44:36+03:00",
-  "dateModified": "2023-01-22T21:53:45+03:00",
+  "dateModified": "2023-01-29T18:19:31+03:00",
   "author": {
     "@type": "Person",
     "name": "Alan Orth"
@@ -743,6 +743,68 @@ I see we have some new ones that aren’t in our list if I combine with this
  • After checking the remaining 6,500 DOIs there were another 852 new licenses, woooo
  • +
  • Peter finished the corrections on affiliations, authors, and donors + +
  • +
  • Start a harvest on AReS
  • + +

    2023-01-30

    + +
    $ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails 10568/115087 | tee -a /tmp/FixLowQualityThumbnails.log
    +$ grep -c remove /tmp/FixLowQualityThumbnails.log
    +16
    +$ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixJpgJpgThumbnails 10568/115087 | tee -a /tmp/FixJpgJpgThumbnails.log
    +$ grep -c replacing /tmp/FixJpgJpgThumbnails.log 
    +13
    +

    2023-01-31

    + +
    $ wget https://asn.ipinfo.app/api/text/list/AS714 \
    +     https://asn.ipinfo.app/api/text/list/AS16276 \
    +     https://asn.ipinfo.app/api/text/list/AS23576 \
    +     https://asn.ipinfo.app/api/text/list/AS24940 \
    +     https://asn.ipinfo.app/api/text/list/AS13238 \
    +     https://asn.ipinfo.app/api/text/list/AS32934 \
    +     https://asn.ipinfo.app/api/text/list/AS14061 \
    +     https://asn.ipinfo.app/api/text/list/AS12876 \
    +     https://asn.ipinfo.app/api/text/list/AS55286 \
    +     https://asn.ipinfo.app/api/text/list/AS203020 \
    +     https://asn.ipinfo.app/api/text/list/AS204287 \
    +     https://asn.ipinfo.app/api/text/list/AS50245 \
    +     https://asn.ipinfo.app/api/text/list/AS6939 \
    +     https://asn.ipinfo.app/api/text/list/AS16509 \
    +     https://asn.ipinfo.app/api/text/list/AS14618
    +$ cat AS* | sort | uniq | wc -l
    +17134
    +$ cat /tmp/AS* | ~/go/bin/mapcidr -a > /tmp/networks.txt
    +
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 73c8a3eeb..03f3ba99d 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 1626b17fb..d7091129d 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 4ad407437..ff56dbae0 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 0c79c126a..97019f27c 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 13b8ca7a0..120c651a2 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 1ec88c9fc..b995b0937 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index e25b5ebbe..6639f5bcd 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index 908c89644..0bea343cc 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 23c02d19e..387d3a5d6 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 617be7b7c..4e0f7758a 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index f8081fcb2..2cfd330b2 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 25d4d3fe2..d8f682464 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 545d403d9..aa745d2b9 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index cc687a047..3291c70ca 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 1916f88be..914c04b0d 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 95664ba46..9abfaa26e 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index b1bad4638..753894726 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 9ddaadcb3..3c80b2e3e 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index dce297d43..ed0bd4483 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 1bea9df1e..5070c074d 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 229963897..861fa2358 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 32cbbad34..773e6f61a 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index fa351f861..2afbfa7d5 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 315577044..59dd432c8 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index b3d68b506..858f07088 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 451fd1b2c..06b165e60 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 713719b80..b216d8835 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@ xmlns:xhtml="http://www.w3.org/1999/xhtml">
 https://alanorth.github.io/cgspace-notes/categories/
-2023-01-22T21:53:45+03:00
+2023-01-29T18:19:31+03:00
 https://alanorth.github.io/cgspace-notes/
-2023-01-22T21:53:45+03:00
+2023-01-29T18:19:31+03:00
 https://alanorth.github.io/cgspace-notes/2023-01/
-2023-01-22T21:53:45+03:00
+2023-01-29T18:19:31+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
-2023-01-22T21:53:45+03:00
+2023-01-29T18:19:31+03:00
 https://alanorth.github.io/cgspace-notes/posts/
-2023-01-22T21:53:45+03:00
+2023-01-29T18:19:31+03:00
 https://alanorth.github.io/cgspace-notes/2022-12/
 2023-01-01T10:12:13+02:00