diff --git a/content/posts/2023-01.md b/content/posts/2023-01.md
index 609e224ef..168277dc4 100644
--- a/content/posts/2023-01.md
+++ b/content/posts/2023-01.md
@@ -550,5 +550,60 @@ $ csvjoin -c doi /tmp/cgspace-temp.csv /tmp/crossref-results.csv \
- The above was done with just 5,000 DOIs because it was taking a long time, but after the last step I imported into OpenRefine to clean up the license URLs
- Then I imported 635 new licenses to CGSpace woooo
- After checking the remaining 6,500 DOIs there were another 852 new licenses, woooo
+- Peter finished the corrections on affiliations, authors, and donors
+ - I quickly checked them and applied each on CGSpace
+- Start a harvest on AReS
+
+## 2023-01-30
+
+- Run the thumbnail fixer tasks on the Initiatives collections:
+
+```console
+$ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails 10568/115087 | tee -a /tmp/FixLowQualityThumbnails.log
+$ grep -c remove /tmp/FixLowQualityThumbnails.log
+16
+$ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixJpgJpgThumbnails 10568/115087 | tee -a /tmp/FixJpgJpgThumbnails.log
+$ grep -c replacing /tmp/FixJpgJpgThumbnails.log
+13
+```
+
+## 2023-01-31
+
+- Someone from the Google Scholar team contacted us to ask why Googlebot is blocked from crawling CGSpace
+ - I said that I blocked them because they crawl haphazardly and we had high load during PRMS reporting
+ - Now I will unblock their ASN (AS15169) in nginx...
+ - I urged them to be smarter about crawling since we're a small team and they are a huge engineering company
+- I removed their ASN and regenerated my list from 2023-01-17:
+
+```console
+$ wget https://asn.ipinfo.app/api/text/list/AS714 \
+ https://asn.ipinfo.app/api/text/list/AS16276 \
+ https://asn.ipinfo.app/api/text/list/AS23576 \
+ https://asn.ipinfo.app/api/text/list/AS24940 \
+ https://asn.ipinfo.app/api/text/list/AS13238 \
+ https://asn.ipinfo.app/api/text/list/AS32934 \
+ https://asn.ipinfo.app/api/text/list/AS14061 \
+ https://asn.ipinfo.app/api/text/list/AS12876 \
+ https://asn.ipinfo.app/api/text/list/AS55286 \
+ https://asn.ipinfo.app/api/text/list/AS203020 \
+ https://asn.ipinfo.app/api/text/list/AS204287 \
+ https://asn.ipinfo.app/api/text/list/AS50245 \
+ https://asn.ipinfo.app/api/text/list/AS6939 \
+ https://asn.ipinfo.app/api/text/list/AS16509 \
+ https://asn.ipinfo.app/api/text/list/AS14618
+$ cat AS* | sort | uniq | wc -l
+17134
+$ cat /tmp/AS* | ~/go/bin/mapcidr -a > /tmp/networks.txt
+```
+
+- Then I updated nginx...
+- Re-run the scripts to delete duplicate metadata values and update item timestamps that I originally used in 2022-11
+ - This was about 650 duplicate metadata values...
+- Exported CGSpace to do some metadata interrogation in OpenRefine
+ - I looked at items that are set as `Limited Access` but have Creative Commons licenses
+ - I filtered ~150 that had DOIs and checked them on the Crossref API using `crossref-doi-lookup.py`
+ - Of those, only about five were incorrectly marked as having Creative Commons licenses, so I set those to copyrighted
+ - For the rest, I set them to Open Access
+- Start a harvest on AReS
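The `sort | uniq` and `mapcidr -a` steps in the 2023-01-31 entry deduplicate the downloaded prefixes and aggregate them into fewer networks. As a rough illustration of that idea only, here is a minimal Python sketch that uses the standard library's `ipaddress.collapse_addresses` as a stand-in for `mapcidr`. The function name `aggregate_networks` and the sample prefixes (documentation ranges) are hypothetical and not part of the original workflow, and the result will not necessarily match what `mapcidr` produces.

```python
import ipaddress


def aggregate_networks(lines):
    """Deduplicate CIDR strings and collapse adjacent or overlapping networks.

    Hypothetical helper: a stdlib stand-in for `sort | uniq` plus `mapcidr -a`.
    """
    v4, v6 = set(), set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        net = ipaddress.ip_network(line, strict=False)
        # collapse_addresses() cannot mix IPv4 and IPv6, so keep them apart
        (v4 if net.version == 4 else v6).add(net)
    collapsed = list(ipaddress.collapse_addresses(v4))
    collapsed += list(ipaddress.collapse_addresses(v6))
    return [str(net) for net in collapsed]


# Sample input using documentation prefixes, including one duplicate
sample = [
    "192.0.2.0/25",
    "192.0.2.128/25",
    "192.0.2.0/25",
    "2001:db8::/33",
    "2001:db8:8000::/33",
]
print(aggregate_networks(sample))
```

In this sketch the two IPv4 halves merge into one `/24` and the two IPv6 halves into one `/32`, which is the kind of reduction that keeps an nginx block list short.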
diff --git a/docs/2023-01/index.html b/docs/2023-01/index.html
index c7efc4b61..47fd7101e 100644
--- a/docs/2023-01/index.html
+++ b/docs/2023-01/index.html
@@ -19,7 +19,7 @@ I see we have some new ones that aren’t in our list if I combine with this
-
+
@@ -44,9 +44,9 @@ I see we have some new ones that aren’t in our list if I combine with this
"@type": "BlogPosting",
"headline": "January, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-01/",
- "wordCount": "4065",
+ "wordCount": "4361",
"datePublished": "2023-01-01T08:44:36+03:00",
- "dateModified": "2023-01-22T21:53:45+03:00",
+ "dateModified": "2023-01-29T18:19:31+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@@ -743,6 +743,68 @@ I see we have some new ones that aren’t in our list if I combine with this
After checking the remaining 6,500 DOIs there were another 852 new licenses, woooo
+Peter finished the corrections on affiliations, authors, and donors
+
+- I quickly checked them and applied each on CGSpace
+
+
+Start a harvest on AReS
+
+2023-01-30
+
+- Run the thumbnail fixer tasks on the Initiatives collections:
+
+$ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails 10568/115087 | tee -a /tmp/FixLowQualityThumbnails.log
+$ grep -c remove /tmp/FixLowQualityThumbnails.log
+16
+$ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixJpgJpgThumbnails 10568/115087 | tee -a /tmp/FixJpgJpgThumbnails.log
+$ grep -c replacing /tmp/FixJpgJpgThumbnails.log
+13
+
2023-01-31
+
+- Someone from the Google Scholar team contacted us to ask why Googlebot is blocked from crawling CGSpace
+
+- I said that I blocked them because they crawl haphazardly and we had high load during PRMS reporting
+- Now I will unblock their ASN (AS15169) in nginx…
+- I urged them to be smarter about crawling since we’re a small team and they are a huge engineering company
+
+
+- I removed their ASN and regenerated my list from 2023-01-17:
+
+$ wget https://asn.ipinfo.app/api/text/list/AS714 \
+ https://asn.ipinfo.app/api/text/list/AS16276 \
+ https://asn.ipinfo.app/api/text/list/AS23576 \
+ https://asn.ipinfo.app/api/text/list/AS24940 \
+ https://asn.ipinfo.app/api/text/list/AS13238 \
+ https://asn.ipinfo.app/api/text/list/AS32934 \
+ https://asn.ipinfo.app/api/text/list/AS14061 \
+ https://asn.ipinfo.app/api/text/list/AS12876 \
+ https://asn.ipinfo.app/api/text/list/AS55286 \
+ https://asn.ipinfo.app/api/text/list/AS203020 \
+ https://asn.ipinfo.app/api/text/list/AS204287 \
+ https://asn.ipinfo.app/api/text/list/AS50245 \
+ https://asn.ipinfo.app/api/text/list/AS6939 \
+ https://asn.ipinfo.app/api/text/list/AS16509 \
+ https://asn.ipinfo.app/api/text/list/AS14618
+$ cat AS* | sort | uniq | wc -l
+17134
+$ cat /tmp/AS* | ~/go/bin/mapcidr -a > /tmp/networks.txt
+
+- Then I updated nginx…
+- Re-run the scripts to delete duplicate metadata values and update item timestamps that I originally used in 2022-11
+
+- This was about 650 duplicate metadata values…
+
+
+- Exported CGSpace to do some metadata interrogation in OpenRefine
+
+- I looked at items that are set as `Limited Access` but have Creative Commons licenses
+- I filtered ~150 that had DOIs and checked them on the Crossref API using `crossref-doi-lookup.py`
+- Of those, only about five were incorrectly marked as having Creative Commons licenses, so I set those to copyrighted
+- For the rest, I set them to Open Access
+
+
+- Start a harvest on AReS
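The Crossref check in the 2023-01-31 entry compares the license recorded on CGSpace with what the publisher deposited for each DOI. The script `crossref-doi-lookup.py` is not shown here, so the following is only a minimal sketch of one way such a check could look, not the author's implementation. It assumes the Crossref REST API `works/{doi}` endpoint returns a `message` object whose optional `license` list carries a `URL` per entry. The function names and the sample record are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def fetch_work(doi):
    """Fetch one work record from the Crossref REST API (needs network access)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def creative_commons_licenses(work):
    """Return the Creative Commons license URLs found in a work record."""
    urls = [entry.get("URL", "") for entry in work.get("license", [])]
    return [url for url in urls if "creativecommons.org" in url.lower()]


# Hypothetical record shaped like the assumed Crossref response, not real data
sample_work = {
    "DOI": "10.1234/example",
    "license": [
        {"URL": "https://creativecommons.org/licenses/by/4.0/"},
        {"URL": "https://www.example.org/tdm-license"},
    ],
}
print(creative_commons_licenses(sample_work))
```

An empty result for a DOI whose item is tagged with a Creative Commons license would flag it for manual review. In practice, `fetch_work` would be called once per DOI, with a pause between requests.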
diff --git a/docs/categories/index.html b/docs/categories/index.html
index 73c8a3eeb..03f3ba99d 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 1626b17fb..d7091129d 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index 4ad407437..ff56dbae0 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 0c79c126a..97019f27c 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 13b8ca7a0..120c651a2 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/5/index.html b/docs/categories/notes/page/5/index.html
index 1ec88c9fc..b995b0937 100644
--- a/docs/categories/notes/page/5/index.html
+++ b/docs/categories/notes/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/6/index.html b/docs/categories/notes/page/6/index.html
index e25b5ebbe..6639f5bcd 100644
--- a/docs/categories/notes/page/6/index.html
+++ b/docs/categories/notes/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/categories/notes/page/7/index.html b/docs/categories/notes/page/7/index.html
index 908c89644..0bea343cc 100644
--- a/docs/categories/notes/page/7/index.html
+++ b/docs/categories/notes/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index 23c02d19e..387d3a5d6 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 617be7b7c..4e0f7758a 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index f8081fcb2..2cfd330b2 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index 25d4d3fe2..d8f682464 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 545d403d9..aa745d2b9 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index cc687a047..3291c70ca 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/7/index.html b/docs/page/7/index.html
index 1916f88be..914c04b0d 100644
--- a/docs/page/7/index.html
+++ b/docs/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/8/index.html b/docs/page/8/index.html
index 95664ba46..9abfaa26e 100644
--- a/docs/page/8/index.html
+++ b/docs/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/page/9/index.html b/docs/page/9/index.html
index b1bad4638..753894726 100644
--- a/docs/page/9/index.html
+++ b/docs/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 9ddaadcb3..3c80b2e3e 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index dce297d43..ed0bd4483 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 1bea9df1e..5070c074d 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index 229963897..861fa2358 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index 32cbbad34..773e6f61a 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index fa351f861..2afbfa7d5 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/7/index.html b/docs/posts/page/7/index.html
index 315577044..59dd432c8 100644
--- a/docs/posts/page/7/index.html
+++ b/docs/posts/page/7/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/8/index.html b/docs/posts/page/8/index.html
index b3d68b506..858f07088 100644
--- a/docs/posts/page/8/index.html
+++ b/docs/posts/page/8/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/posts/page/9/index.html b/docs/posts/page/9/index.html
index 451fd1b2c..06b165e60 100644
--- a/docs/posts/page/9/index.html
+++ b/docs/posts/page/9/index.html
@@ -10,7 +10,7 @@
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index 713719b80..b216d8835 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
https://alanorth.github.io/cgspace-notes/categories/
- 2023-01-22T21:53:45+03:00
+ 2023-01-29T18:19:31+03:00
https://alanorth.github.io/cgspace-notes/
- 2023-01-22T21:53:45+03:00
+ 2023-01-29T18:19:31+03:00
https://alanorth.github.io/cgspace-notes/2023-01/
- 2023-01-22T21:53:45+03:00
+ 2023-01-29T18:19:31+03:00
https://alanorth.github.io/cgspace-notes/categories/notes/
- 2023-01-22T21:53:45+03:00
+ 2023-01-29T18:19:31+03:00
https://alanorth.github.io/cgspace-notes/posts/
- 2023-01-22T21:53:45+03:00
+ 2023-01-29T18:19:31+03:00
https://alanorth.github.io/cgspace-notes/2022-12/
2023-01-01T10:12:13+02:00