- The above was done with just 5,000 DOIs because it was taking a long time, but after the last step I imported the results into OpenRefine to clean up the license URLs
- Then I imported 635 new licenses to CGSpace woooo
- After checking the remaining 6,500 DOIs there were another 852 new licenses, woooo
- Peter finished the corrections on affiliations, authors, and donors
- I quickly checked them and applied each on CGSpace
- Start a harvest on AReS

## 2023-01-30

- Run the thumbnail fixer tasks on the Initiatives collections:

```console
$ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixLowQualityThumbnails 10568/115087 | tee -a /tmp/FixLowQualityThumbnails.log
$ grep -c remove /tmp/FixLowQualityThumbnails.log
16
$ chrt -b 0 dspace dsrun io.github.ilri.cgspace.scripts.FixJpgJpgThumbnails 10568/115087 | tee -a /tmp/FixJpgJpgThumbnails.log
$ grep -c replacing /tmp/FixJpgJpgThumbnails.log
13
```

## 2023-01-31

- Someone from the Google Scholar team contacted us to ask why Googlebot is blocked from crawling CGSpace
- I said that I blocked them because they crawl haphazardly and we had high load during PRMS reporting
- Now I will unblock their ASN (AS15169) in nginx...
- I urged them to be smarter about crawling since we're a small team and they are a huge engineering company
- I removed their ASN and regenerated my list from 2023-01-17:

```console
$ wget https://asn.ipinfo.app/api/text/list/AS714 \
    https://asn.ipinfo.app/api/text/list/AS16276 \
    https://asn.ipinfo.app/api/text/list/AS23576 \
    https://asn.ipinfo.app/api/text/list/AS24940 \
    https://asn.ipinfo.app/api/text/list/AS13238 \
    https://asn.ipinfo.app/api/text/list/AS32934 \
    https://asn.ipinfo.app/api/text/list/AS14061 \
    https://asn.ipinfo.app/api/text/list/AS12876 \
    https://asn.ipinfo.app/api/text/list/AS55286 \
    https://asn.ipinfo.app/api/text/list/AS203020 \
    https://asn.ipinfo.app/api/text/list/AS204287 \
    https://asn.ipinfo.app/api/text/list/AS50245 \
    https://asn.ipinfo.app/api/text/list/AS6939 \
    https://asn.ipinfo.app/api/text/list/AS16509 \
    https://asn.ipinfo.app/api/text/list/AS14618
$ cat AS* | sort | uniq | wc -l
17134
$ cat /tmp/AS* | ~/go/bin/mapcidr -a > /tmp/networks.txt
```
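To actually apply the list, the aggregated networks have to end up in the nginx configuration; a minimal sketch of that step, assuming the CIDRs feed a `geo` map via an include file (the `/etc/nginx/bot-networks.conf` path is just a placeholder):

```console
$ sed -e 's/$/ 1;/' /tmp/networks.txt | sudo tee /etc/nginx/bot-networks.conf > /dev/null
$ sudo nginx -t && sudo systemctl reload nginx
```

Each line of `/tmp/networks.txt` becomes a `CIDR 1;` entry, which is the format an nginx `geo` block expects in its include.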
- Then I updated nginx...
- Re-run the scripts to delete duplicate metadata values and update item timestamps that I originally used in 2022-11 (a rough sketch of the idea is after this list)
- This was about 650 duplicate metadata values...
- Exported CGSpace to do some metadata interrogation in OpenRefine
- I looked at items that are set as `Limited Access` but have Creative Commons licenses
- I filtered ~150 that had DOIs and checked them on the Crossref API using `crossref-doi-lookup.py` (see the example after this list)
- Of those, only about five or so were incorrectly marked as having Creative Commons licenses, so I set those to copyrighted
- For the rest, I set them to Open Access
- Start a harvest on AReS
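The duplicate metadata cleanup mentioned above is not reproduced here, but the general shape of it is a self-join delete in PostgreSQL; a rough sketch only, assuming the DSpace 6+ `metadatavalue` schema, a database named `dspace`, and no scoping (the real scripts from 2022-11 presumably scope this more carefully, and the timestamp side presumably just bumps `last_modified` on the affected items):

```console
$ psql -d dspace <<'SQL'
DELETE FROM metadatavalue a
  USING metadatavalue b
  WHERE a.metadata_value_id > b.metadata_value_id
    AND a.dspace_object_id = b.dspace_object_id
    AND a.metadata_field_id = b.metadata_field_id
    AND a.text_value = b.text_value;
SQL
```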
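For the license checks, `crossref-doi-lookup.py` is presumably reading the `license` array from the Crossref works API; a quick manual spot check of a single DOI looks something like this (the DOI is a placeholder):

```console
$ curl -s 'https://api.crossref.org/works/10.1234/placeholder' | jq '.message.license[]?.URL'
```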
<!-- vim: set sw=2 ts=2: -->