Add notes for 2023-01-31

Alan Orth 2023-01-31 22:20:38 +03:00
parent 81f04f48ad
commit 16ba5723eb
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
29 changed files with 151 additions and 34 deletions

View File

@ -550,5 +550,60 @@ $ csvjoin -c doi /tmp/cgspace-temp.csv /tmp/crossref-results.csv \
- The above was done with just 5,000 DOIs because it was taking a long time, but after the last step I imported into OpenRefine to clean up the license URLs
- Then I imported 635 new licenses to CGSpace woooo
- After checking the remaining 6,500 DOIs there were another 852 new licenses, woooo
- Peter finished the corrections on affiliations, authors, and donors
- I quickly checked them and applied each on CGSpace
- Start a harvest on AReS
## 2023-01-30
- Run the thumbnail fixer tasks on the Initiatives collections:
```console
$ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails 10568/115087 | tee -a /tmp/FixLowQualityThumbnails.log
$ grep -c remove /tmp/FixLowQualityThumbnails.log
16
$ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixJpgJpgThumbnails 10568/115087 | tee -a /tmp/FixJpgJpgThumbnails.log
$ grep -c replacing /tmp/FixJpgJpgThumbnails.log
13
```
## 2023-01-31
- Someone from the Google Scholar team contacted us to ask why Googlebot is blocked from crawling CGSpace
- I said that I blocked them because they crawl haphazardly and we had high load during PRMS reporting
- Now I will unblock their ASN15169 in nginx...
- I urged them to be smarter about crawling since we're a small team and they are a huge engineering company
- I removed their ASN and regenerated my list from 2023-01-17:
```console
$ wget https://asn.ipinfo.app/api/text/list/AS714 \
https://asn.ipinfo.app/api/text/list/AS16276 \
https://asn.ipinfo.app/api/text/list/AS23576 \
https://asn.ipinfo.app/api/text/list/AS24940 \
https://asn.ipinfo.app/api/text/list/AS13238 \
https://asn.ipinfo.app/api/text/list/AS32934 \
https://asn.ipinfo.app/api/text/list/AS14061 \
https://asn.ipinfo.app/api/text/list/AS12876 \
https://asn.ipinfo.app/api/text/list/AS55286 \
https://asn.ipinfo.app/api/text/list/AS203020 \
https://asn.ipinfo.app/api/text/list/AS204287 \
https://asn.ipinfo.app/api/text/list/AS50245 \
https://asn.ipinfo.app/api/text/list/AS6939 \
https://asn.ipinfo.app/api/text/list/AS16509 \
https://asn.ipinfo.app/api/text/list/AS14618
$ cat AS* | sort | uniq | wc -l
17134
$ cat /tmp/AS* | ~/go/bin/mapcidr -a > /tmp/networks.txt
```
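A minimal sketch of one way to keep the AS list in a single place when regenerating it. The `asn_urls` helper and the `asns` variable are hypothetical names, not part of the original notes; only the `asn.ipinfo.app` endpoint and the AS numbers are taken from the command above. It prints the download URLs and does not fetch anything:

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original notes): print one
# download URL per AS number, using the same endpoint as the wget call above.
asn_urls() {
    for asn in "$@"; do
        printf 'https://asn.ipinfo.app/api/text/list/AS%s\n' "$asn"
    done
}

# AS numbers copied from the wget command above; AS15169 (Google) is left out.
asns="714 16276 23576 24940 13238 32934 14061 12876 55286 203020 204287 50245 6939 16509 14618"

# Word splitting of $asns is intentional here: one argument per AS number.
# This only prints the URLs; nothing is downloaded.
asn_urls $asns
```

Assuming GNU wget, the output could be piped to `wget -q -i - -P /tmp/asn` so that all lists land in one directory (`/tmp/asn` is an example path), which would keep the later `cat` globs pointing at the same place.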
- Then I updated nginx...
- Re-run the scripts to delete duplicate metadata values and update item timestamps that I originally used in 2022-11
- This was about 650 duplicate metadata values...
- Exported CGSpace to do some metadata interrogation in OpenRefine
- I looked at items that are set as `Limited Access` but have Creative Commons licenses
- I filtered ~150 that had DOIs and checked them on the Crossref API using `crossref-doi-lookup.py`
- Of those, only about five or so were incorrectly marked as having Creative Commons licenses, so I set those to copyrighted
- For the rest, I set them to Open Access
- Start a harvest on AReS
<!-- vim: set sw=2 ts=2: -->

View File

@ -19,7 +19,7 @@ I see we have some new ones that aren&rsquo;t in our list if I combine with this
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2023-01/" />
<meta property="article:published_time" content="2023-01-01T08:44:36+03:00" />
<meta property="article:modified_time" content="2023-01-22T21:53:45+03:00" />
<meta property="article:modified_time" content="2023-01-29T18:19:31+03:00" />
@ -44,9 +44,9 @@ I see we have some new ones that aren&rsquo;t in our list if I combine with this
"@type": "BlogPosting",
"headline": "January, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-01/",
"wordCount": "4065",
"wordCount": "4361",
"datePublished": "2023-01-01T08:44:36+03:00",
"dateModified": "2023-01-22T21:53:45+03:00",
"dateModified": "2023-01-29T18:19:31+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -743,6 +743,68 @@ I see we have some new ones that aren&rsquo;t in our list if I combine with this
<li>After checking the remaining 6,500 DOIs there were another 852 new licenses, woooo</li>
</ul>
</li>
<li>Peter finished the corrections on affiliations, authors, and donors
<ul>
<li>I quickly checked them and applied each on CGSpace</li>
</ul>
</li>
<li>Start a harvest on AReS</li>
</ul>
<h2 id="2023-01-30">2023-01-30</h2>
<ul>
<li>Run the thumbnail fixer tasks on the Initiatives collections:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ chrt -b <span style="color:#ae81ff">0</span> dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails 10568/115087 | tee -a /tmp/FixLowQualityThumbnails.log
</span></span><span style="display:flex;"><span>$ grep -c remove /tmp/FixLowQualityThumbnails.log
</span></span><span style="display:flex;"><span>16
</span></span><span style="display:flex;"><span>$ chrt -b <span style="color:#ae81ff">0</span> dspace dsrun io.github.ilri.cgspace.scripts.FixJpgJpgThumbnails 10568/115087 | tee -a /tmp/FixJpgJpgThumbnails.log
</span></span><span style="display:flex;"><span>$ grep -c replacing /tmp/FixJpgJpgThumbnails.log
</span></span><span style="display:flex;"><span>13
</span></span></code></pre></div><h2 id="2023-01-31">2023-01-31</h2>
<ul>
<li>Someone from the Google Scholar team contacted us to ask why Googlebot is blocked from crawling CGSpace
<ul>
<li>I said that I blocked them because they crawl haphazardly and we had high load during PRMS reporting</li>
<li>Now I will unblock their ASN15169 in nginx&hellip;</li>
<li>I urged them to be smarter about crawling since we&rsquo;re a small team and they are a huge engineering company</li>
</ul>
</li>
<li>I removed their ASN and regenerated my list from 2023-01-17:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ wget https://asn.ipinfo.app/api/text/list/AS714 <span style="color:#ae81ff">\
</span></span></span><span style="display:flex;"><span><span style="color:#ae81ff"></span> https://asn.ipinfo.app/api/text/list/AS16276 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS23576 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS24940 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS13238 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS32934 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS14061 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS12876 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS55286 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS203020 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS204287 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS50245 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS6939 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS16509 \
</span></span><span style="display:flex;"><span> https://asn.ipinfo.app/api/text/list/AS14618
</span></span><span style="display:flex;"><span>$ cat AS* | sort | uniq | wc -l
</span></span><span style="display:flex;"><span>17134
</span></span><span style="display:flex;"><span>$ cat /tmp/AS* | ~/go/bin/mapcidr -a &gt; /tmp/networks.txt
</span></span></code></pre></div><ul>
<li>Then I updated nginx&hellip;</li>
<li>Re-run the scripts to delete duplicate metadata values and update item timestamps that I originally used in 2022-11
<ul>
<li>This was about 650 duplicate metadata values&hellip;</li>
</ul>
</li>
<li>Exported CGSpace to do some metadata interrogation in OpenRefine
<ul>
<li>I looked at items that are set as <code>Limited Access</code> but have Creative Commons licenses</li>
<li>I filtered ~150 that had DOIs and checked them on the Crossref API using <code>crossref-doi-lookup.py</code></li>
<li>Of those, only about five or so were incorrectly marked as having Creative Commons licenses, so I set those to copyrighted</li>
<li>For the rest, I set them to Open Access</li>
</ul>
</li>
<li>Start a harvest on AReS</li>
</ul>
<!-- raw HTML omitted -->

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2023-01-22T21:53:45+03:00" />
<meta property="og:updated_time" content="2023-01-29T18:19:31+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-01-22T21:53:45+03:00" />
<meta property="og:updated_time" content="2023-01-29T18:19:31+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-01-22T21:53:45+03:00" />
<meta property="og:updated_time" content="2023-01-29T18:19:31+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-22T21:53:45+03:00" />
<meta property="og:updated_time" content="2023-01-29T18:19:31+03:00" />

View File

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-01-22T21:53:45+03:00" />
<meta property="og:updated_time" content="2023-01-29T18:19:31+03:00" />

View File

@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2023-01-22T21:53:45+03:00</lastmod>
<lastmod>2023-01-29T18:19:31+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2023-01-22T21:53:45+03:00</lastmod>
<lastmod>2023-01-29T18:19:31+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2023-01/</loc>
<lastmod>2023-01-22T21:53:45+03:00</lastmod>
<lastmod>2023-01-29T18:19:31+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2023-01-22T21:53:45+03:00</lastmod>
<lastmod>2023-01-29T18:19:31+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2023-01-22T21:53:45+03:00</lastmod>
<lastmod>2023-01-29T18:19:31+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2022-12/</loc>
<lastmod>2023-01-01T10:12:13+02:00</lastmod>