diff --git a/content/posts/2021-10.md b/content/posts/2021-10.md
new file mode 100644
index 000000000..f9109890f
--- /dev/null
+++ b/content/posts/2021-10.md
@@ -0,0 +1,109 @@
+---
+title: "October, 2021"
+date: 2021-10-01T11:14:07+03:00
+author: "Alan Orth"
+categories: ["Notes"]
+---
+
+## 2021-10-01
+
+- Export all affiliations on CGSpace and run them against the latest RoR data dump:
+
+```console
+localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
+$ csvcut -c 1 /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
+$ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affiliations-matching.csv
+$ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l
+1879
+$ wc -l /tmp/2021-10-01-affiliations.txt
+7100 /tmp/2021-10-01-affiliations.txt
+```
+
+- So we have 1879/7100 (26.46%) matching already
+
+
+
+## 2021-10-03
+
+- Dominique from IWMI asked me for information about how CGSpace partners are using CGSpace APIs to feed their websites
+- Start a fresh indexing on AReS
+- Udana sent me his file of 292 non-IWMI publications for the Virtual library on water management
+  - He added licenses
+  - I want to clean up the `dcterms.extent` field though because it has volume, issue, and pages there
+  - I cloned the column several times and extracted values based on their positions, for example:
+    - Volume: `value.partition(":")[0]`
+    - Issue: `value.partition("(")[2].partition(")")[0]`
+    - Page: `"p. " + value.replace(".", "")`
+
+## 2021-10-04
+
+- Start looking at the last month of Solr statistics on CGSpace
+  - I see a number of IPs with "normal" user agents who clearly behave like bots
+    - 198.15.130.18: 21,000 requests to /discover with a normal-looking user agent, from ASN 11282 (SERVERYOU, US)
+    - 93.158.90.107: 8,500 requests to handle and browse links with a Firefox 84.0 user agent, from ASN 12552 (IPO-EU, SE)
+    - 193.235.141.162: 4,800 requests to handle, browse, and discovery links with a Firefox 84.0 user agent, from ASN 51747 (INTERNETBOLAGET, SE)
+    - 3.225.28.105: 2,900 requests to REST API for the CIAT Story Maps collection with a normal user agent, from ASN 14618 (AMAZON-AES, US)
+    - 34.228.236.6: 2,800 requests to discovery for the CGIAR System community with user agent `Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)`, from ASN 14618 (AMAZON-AES, US)
+    - 18.212.137.2: 2,800 requests to discovery for the CGIAR System community with user agent `Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)`, from ASN 14618 (AMAZON-AES, US)
+    - 3.81.123.72: 2,800 requests to discovery and handles for the CGIAR System community with user agent `Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)`, from ASN 14618 (AMAZON-AES, US)
+    - 3.227.16.188: 2,800 requests to discovery and handles for the CGIAR System community with user agent `Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)`, from ASN 14618 (AMAZON-AES, US)
+  - Looking closer into the requests with this Mozilla/4.0 user agent, I see 500+ IPs using it:
+
+```console
+# zcat --force /var/log/nginx/*.log* | grep 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)' | awk '{print $1}' | sort | uniq > /tmp/mozilla-4.0-ips.txt
+# wc -l /tmp/mozilla-4.0-ips.txt
+543 /tmp/mozilla-4.0-ips.txt
+```
+
+- Then I resolved the IPs and extracted the ones belonging to Amazon:
+
+```console
+$ ./ilri/resolve-addresses-geoip2.py -i /tmp/mozilla-4.0-ips.txt -k "$ABUSEIPDB_API_KEY" -o /tmp/mozilla-4.0-ips.csv
+$ csvgrep -c asn -m 14618 /tmp/mozilla-4.0-ips.csv | csvcut -c ip | sed 1d | tee /tmp/amazon-ips.txt | wc -l
+```
+
+- I am thinking I will purge them all, as I have several indicators that they are bots: mysterious user agent, IP owned by Amazon
+- Even more interesting, these requests are weighted VERY heavily on the CGIAR System community:
+
+```console
+   1592 GET /handle/10947/2526
+   1592 GET /handle/10947/2527
+   1592 GET /handle/10947/34
+   1593 GET /handle/10947/6
+   1594 GET /handle/10947/1
+   1598 GET /handle/10947/2515
+   1598 GET /handle/10947/2516
+   1599 GET /handle/10568/101335
+   1599 GET /handle/10568/91688
+   1599 GET /handle/10947/2517
+   1599 GET /handle/10947/2518
+   1599 GET /handle/10947/2519
+   1599 GET /handle/10947/2708
+   1599 GET /handle/10947/2871
+   1600 GET /handle/10568/89342
+   1600 GET /handle/10947/4467
+   1607 GET /handle/10568/103816
+ 290382 GET /handle/10568/83389
+```
+
+- Before I purge all those I will ask Samuel Stacey from the System office to hopefully get some insight...
+- Meeting with Michael Victor, Peter, Jane, and Abenet about the future of repositories in the One CGIAR
+- Meeting with Michelle from Altmetric about their new CSV upload system
+  - I sent her some examples of Handles that have DOIs, but no linked score (yet) to see if an association will be created when she uploads them
+
+```csv
+doi,handle
+10.1016/j.agsy.2021.103263,10568/115288
+10.3389/fgene.2021.723360,10568/115287
+10.3389/fpls.2021.720670,10568/115285
+```
+
+- Extract the AGROVOC subjects from the 292 non-IWMI publications to validate them against AGROVOC:
+
+```console
+$ csvcut -c 'dcterms.subject[en_US]' ~/Downloads/2021-10-03-non-IWMI-publications.csv | sed -e 1d -e 's/||/\n/g' -e 's/"//g' | sort -u > /tmp/agrovoc.txt
+$ ./ilri/agrovoc-lookup.py -i /tmp/agrovoc.txt -o /tmp/agrovoc-matches.csv
+$ csvgrep -c 'number of matches' -m '0' /tmp/agrovoc-matches.csv | csvcut -c 1 > /tmp/invalid-agrovoc.csv
+```
+
+
diff --git a/docs/2015-11/index.html b/docs/2015-11/index.html
index c73e40093..28245c15e 100644
--- a/docs/2015-11/index.html
+++ b/docs/2015-11/index.html
@@ -242,6 +242,8 @@ db.statementpool = true
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -250,8 +252,6 @@ db.statementpool = true
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2015-12/index.html b/docs/2015-12/index.html index 178620a8c..2d6b25906 100644 --- a/docs/2015-12/index.html +++ b/docs/2015-12/index.html @@ -264,6 +264,8 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -272,8 +274,6 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-01/index.html b/docs/2016-01/index.html index 0ce80fb91..c4dc25ec6 100644 --- a/docs/2016-01/index.html +++ b/docs/2016-01/index.html @@ -200,6 +200,8 @@ $ find SimpleArchiveForBio/ -iname “*.pdf” -exec basename {} ; | sor
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -208,8 +210,6 @@ $ find SimpleArchiveForBio/ -iname “*.pdf” -exec basename {} ; | sor
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-02/index.html b/docs/2016-02/index.html index 504a6a86d..7e169ab6a 100644 --- a/docs/2016-02/index.html +++ b/docs/2016-02/index.html @@ -378,6 +378,8 @@ Bitstream: tést señora alimentación.pdf
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -386,8 +388,6 @@ Bitstream: tést señora alimentación.pdf
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-03/index.html b/docs/2016-03/index.html index ccaefb927..1702472d8 100644 --- a/docs/2016-03/index.html +++ b/docs/2016-03/index.html @@ -316,6 +316,8 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -324,8 +326,6 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-04/index.html b/docs/2016-04/index.html index 0e5aa9cb2..7d6828e81 100644 --- a/docs/2016-04/index.html +++ b/docs/2016-04/index.html @@ -495,6 +495,8 @@ dspace.log.2016-04-27:7271
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -503,8 +505,6 @@ dspace.log.2016-04-27:7271
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-05/index.html b/docs/2016-05/index.html index 17a484b6c..663946faf 100644 --- a/docs/2016-05/index.html +++ b/docs/2016-05/index.html @@ -371,6 +371,8 @@ sys 0m20.540s
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -379,8 +381,6 @@ sys 0m20.540s
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-06/index.html b/docs/2016-06/index.html index f1d3fc087..ab470edad 100644 --- a/docs/2016-06/index.html +++ b/docs/2016-06/index.html @@ -409,6 +409,8 @@ $ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-D
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -417,8 +419,6 @@ $ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-D
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-07/index.html b/docs/2016-07/index.html index 90b72301d..8249eaf86 100644 --- a/docs/2016-07/index.html +++ b/docs/2016-07/index.html @@ -325,6 +325,8 @@ discovery.index.authority.ignore-variants=true
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -333,8 +335,6 @@ discovery.index.authority.ignore-variants=true
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-08/index.html b/docs/2016-08/index.html index 570128f13..d0386ee41 100644 --- a/docs/2016-08/index.html +++ b/docs/2016-08/index.html @@ -389,6 +389,8 @@ $ JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m" /home/cgspace.cgiar.org/b
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -397,8 +399,6 @@ $ JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m" /home/cgspace.cgiar.org/b
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-09/index.html b/docs/2016-09/index.html index a8e91369f..879053e2e 100644 --- a/docs/2016-09/index.html +++ b/docs/2016-09/index.html @@ -606,6 +606,8 @@ $ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -614,8 +616,6 @@ $ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-10/index.html b/docs/2016-10/index.html index 47153d9d1..d9d709b2b 100644 --- a/docs/2016-10/index.html +++ b/docs/2016-10/index.html @@ -372,6 +372,8 @@ dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http:
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -380,8 +382,6 @@ dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http:
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-11/index.html b/docs/2016-11/index.html index 5ca90d089..82b56640a 100644 --- a/docs/2016-11/index.html +++ b/docs/2016-11/index.html @@ -548,6 +548,8 @@ org.dspace.discovery.SearchServiceException: Error executing query
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -556,8 +558,6 @@ org.dspace.discovery.SearchServiceException: Error executing query
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2016-12/index.html b/docs/2016-12/index.html index 9396e1345..2741113d7 100644 --- a/docs/2016-12/index.html +++ b/docs/2016-12/index.html @@ -784,6 +784,8 @@ $ exit
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -792,8 +794,6 @@ $ exit
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-01/index.html b/docs/2017-01/index.html index cd0e07a62..3b4744616 100644 --- a/docs/2017-01/index.html +++ b/docs/2017-01/index.html @@ -369,6 +369,8 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -377,8 +379,6 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-02/index.html b/docs/2017-02/index.html index 13499998f..835a29d0a 100644 --- a/docs/2017-02/index.html +++ b/docs/2017-02/index.html @@ -424,6 +424,8 @@ COPY 1968
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -432,8 +434,6 @@ COPY 1968
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-03/index.html b/docs/2017-03/index.html index 7408e1417..9a4a09aa8 100644 --- a/docs/2017-03/index.html +++ b/docs/2017-03/index.html @@ -355,6 +355,8 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -363,8 +365,6 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-04/index.html b/docs/2017-04/index.html index ee008fbfd..54b8f9d67 100644 --- a/docs/2017-04/index.html +++ b/docs/2017-04/index.html @@ -585,6 +585,8 @@ $ gem install compass -v 1.0.3
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -593,8 +595,6 @@ $ gem install compass -v 1.0.3
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-05/index.html b/docs/2017-05/index.html index 3c1a901de..3a70bd966 100644 --- a/docs/2017-05/index.html +++ b/docs/2017-05/index.html @@ -391,6 +391,8 @@ UPDATE 187
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -399,8 +401,6 @@ UPDATE 187
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-06/index.html b/docs/2017-06/index.html index 06b58f0d1..0be7708ec 100644 --- a/docs/2017-06/index.html +++ b/docs/2017-06/index.html @@ -270,6 +270,8 @@ $ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace impo
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -278,8 +280,6 @@ $ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace impo
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-07/index.html b/docs/2017-07/index.html index b9cfdac85..1241bd813 100644 --- a/docs/2017-07/index.html +++ b/docs/2017-07/index.html @@ -275,6 +275,8 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -283,8 +285,6 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-08/index.html b/docs/2017-08/index.html index 1a725e3d3..9a8b2154d 100644 --- a/docs/2017-08/index.html +++ b/docs/2017-08/index.html @@ -517,6 +517,8 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -525,8 +527,6 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-09/index.html b/docs/2017-09/index.html index bd99ecd89..00d703d47 100644 --- a/docs/2017-09/index.html +++ b/docs/2017-09/index.html @@ -659,6 +659,8 @@ Cert Status: good
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -667,8 +669,6 @@ Cert Status: good
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-10/index.html b/docs/2017-10/index.html index ab4f13ad0..92a81aec3 100644 --- a/docs/2017-10/index.html +++ b/docs/2017-10/index.html @@ -443,6 +443,8 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -451,8 +453,6 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-11/index.html b/docs/2017-11/index.html index 71a72e278..c427b9a15 100644 --- a/docs/2017-11/index.html +++ b/docs/2017-11/index.html @@ -944,6 +944,8 @@ $ cat dspace.log.2017-11-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | u
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -952,8 +954,6 @@ $ cat dspace.log.2017-11-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | u
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2017-12/index.html b/docs/2017-12/index.html index 9933f152f..b21fb73e4 100644 --- a/docs/2017-12/index.html +++ b/docs/2017-12/index.html @@ -783,6 +783,8 @@ DELETE 20
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -791,8 +793,6 @@ DELETE 20
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-01/index.html b/docs/2018-01/index.html index 4f913c911..46b02c9ed 100644 --- a/docs/2018-01/index.html +++ b/docs/2018-01/index.html @@ -1452,6 +1452,8 @@ Catalina:type=Manager,context=/,host=localhost activeSessions 8
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -1460,8 +1462,6 @@ Catalina:type=Manager,context=/,host=localhost activeSessions 8
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-02/index.html b/docs/2018-02/index.html index b34decf95..b408aa1a1 100644 --- a/docs/2018-02/index.html +++ b/docs/2018-02/index.html @@ -1039,6 +1039,8 @@ UPDATE 3
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -1047,8 +1049,6 @@ UPDATE 3
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-03/index.html b/docs/2018-03/index.html index ea67312bb..5316ae9a0 100644 --- a/docs/2018-03/index.html +++ b/docs/2018-03/index.html @@ -585,6 +585,8 @@ Fixed 5 occurences of: GENEBANKS
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -593,8 +595,6 @@ Fixed 5 occurences of: GENEBANKS
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-04/index.html b/docs/2018-04/index.html index b3572033a..9a76213ee 100644 --- a/docs/2018-04/index.html +++ b/docs/2018-04/index.html @@ -594,6 +594,8 @@ $ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -602,8 +604,6 @@ $ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-05/index.html b/docs/2018-05/index.html index 42290356e..3eb8ef507 100644 --- a/docs/2018-05/index.html +++ b/docs/2018-05/index.html @@ -523,6 +523,8 @@ $ psql -h localhost -U postgres dspacetest
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -531,8 +533,6 @@ $ psql -h localhost -U postgres dspacetest
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-06/index.html b/docs/2018-06/index.html index b49717c26..d58f3eb3b 100644 --- a/docs/2018-06/index.html +++ b/docs/2018-06/index.html @@ -517,6 +517,8 @@ $ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 > map-to-cifor-archive.csv
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -525,8 +527,6 @@ $ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 > map-to-cifor-archive.csv
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-07/index.html b/docs/2018-07/index.html index 19a7e8789..80173e051 100644 --- a/docs/2018-07/index.html +++ b/docs/2018-07/index.html @@ -569,6 +569,8 @@ dspace=# select count(text_value) from metadatavalue where resource_type_id=2 an
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -577,8 +579,6 @@ dspace=# select count(text_value) from metadatavalue where resource_type_id=2 an
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-08/index.html b/docs/2018-08/index.html index a2af57b3a..38d4c20ad 100644 --- a/docs/2018-08/index.html +++ b/docs/2018-08/index.html @@ -442,6 +442,8 @@ $ dspace database migrate ignored
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -450,8 +452,6 @@ $ dspace database migrate ignored
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-09/index.html b/docs/2018-09/index.html index 006c3de5e..19f8c6b9a 100644 --- a/docs/2018-09/index.html +++ b/docs/2018-09/index.html @@ -748,6 +748,8 @@ UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_f
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -756,8 +758,6 @@ UPDATE metadatavalue SET text_value='ja' WHERE resource_type_id=2 AND metadata_f
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-10/index.html b/docs/2018-10/index.html index d5d1445bc..05ff9d6ac 100644 --- a/docs/2018-10/index.html +++ b/docs/2018-10/index.html @@ -656,6 +656,8 @@ $ curl -X GET -H "Content-Type: application/json" -H "Accept: app
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -664,8 +666,6 @@ $ curl -X GET -H "Content-Type: application/json" -H "Accept: app
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-11/index.html b/docs/2018-11/index.html index 730a39aee..c44643610 100644 --- a/docs/2018-11/index.html +++ b/docs/2018-11/index.html @@ -553,6 +553,8 @@ $ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -561,8 +563,6 @@ $ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2018-12/index.html b/docs/2018-12/index.html index fea550695..980e29f49 100644 --- a/docs/2018-12/index.html +++ b/docs/2018-12/index.html @@ -594,6 +594,8 @@ UPDATE 1
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -602,8 +604,6 @@ UPDATE 1
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-01/index.html b/docs/2019-01/index.html index bb95b8217..948545473 100644 --- a/docs/2019-01/index.html +++ b/docs/2019-01/index.html @@ -1264,6 +1264,8 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -1272,8 +1274,6 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-02/index.html b/docs/2019-02/index.html index 452251d27..e39520c88 100644 --- a/docs/2019-02/index.html +++ b/docs/2019-02/index.html @@ -1344,6 +1344,8 @@ Please see the DSpace documentation for assistance.
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -1352,8 +1354,6 @@ Please see the DSpace documentation for assistance.
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-03/index.html b/docs/2019-03/index.html index 61a70de7c..ebaffba69 100644 --- a/docs/2019-03/index.html +++ b/docs/2019-03/index.html @@ -1208,6 +1208,8 @@ sys 0m2.551s
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -1216,8 +1218,6 @@ sys 0m2.551s
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-04/index.html b/docs/2019-04/index.html index 3ed955dfd..d3f509431 100644 --- a/docs/2019-04/index.html +++ b/docs/2019-04/index.html @@ -1299,6 +1299,8 @@ UPDATE 14
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -1307,8 +1309,6 @@ UPDATE 14
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-05/index.html b/docs/2019-05/index.html index 88d128925..f29b61496 100644 --- a/docs/2019-05/index.html +++ b/docs/2019-05/index.html @@ -631,6 +631,8 @@ COPY 64871
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -639,8 +641,6 @@ COPY 64871
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-06/index.html b/docs/2019-06/index.html index 97cac967b..d5335bcf8 100644 --- a/docs/2019-06/index.html +++ b/docs/2019-06/index.html @@ -317,6 +317,8 @@ UPDATE 2
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -325,8 +327,6 @@ UPDATE 2
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-07/index.html b/docs/2019-07/index.html index e45253809..f34fa2385 100644 --- a/docs/2019-07/index.html +++ b/docs/2019-07/index.html @@ -554,6 +554,8 @@ issn.validate('1020-3362')
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -562,8 +564,6 @@ issn.validate('1020-3362')
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-08/index.html b/docs/2019-08/index.html index e23779849..c064abc30 100644 --- a/docs/2019-08/index.html +++ b/docs/2019-08/index.html @@ -573,6 +573,8 @@ sys 2m27.496s
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -581,8 +583,6 @@ sys 2m27.496s
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-09/index.html b/docs/2019-09/index.html index aed774983..5fdc58ada 100644 --- a/docs/2019-09/index.html +++ b/docs/2019-09/index.html @@ -581,6 +581,8 @@ $ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institut
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -589,8 +591,6 @@ $ csv-metadata-quality -i /tmp/clarisa-institutions.csv -o /tmp/clarisa-institut
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-10/index.html b/docs/2019-10/index.html index 71118ee79..cbb091686 100644 --- a/docs/2019-10/index.html +++ b/docs/2019-10/index.html @@ -385,6 +385,8 @@ $ dspace import -a -c 10568/104057 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -393,8 +395,6 @@ $ dspace import -a -c 10568/104057 -e fuu@cgiar.org -m 2019-10-15-Bioversity.map
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-11/index.html b/docs/2019-11/index.html index aa065533d..df2cce998 100644 --- a/docs/2019-11/index.html +++ b/docs/2019-11/index.html @@ -692,6 +692,8 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -700,8 +702,6 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2019-12/index.html b/docs/2019-12/index.html index 4bf1c8666..2049dcd62 100644 --- a/docs/2019-12/index.html +++ b/docs/2019-12/index.html @@ -404,6 +404,8 @@ UPDATE 1
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -412,8 +414,6 @@ UPDATE 1
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-01/index.html b/docs/2020-01/index.html index 263e72ba2..3f1dfd4d2 100644 --- a/docs/2020-01/index.html +++ b/docs/2020-01/index.html @@ -604,6 +604,8 @@ COPY 2900
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -612,8 +614,6 @@ COPY 2900
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-02/index.html b/docs/2020-02/index.html index 608ab59c9..bbf569298 100644 --- a/docs/2020-02/index.html +++ b/docs/2020-02/index.html @@ -1275,6 +1275,8 @@ Moving: 21993 into core statistics-2019
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -1283,8 +1285,6 @@ Moving: 21993 into core statistics-2019
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-03/index.html b/docs/2020-03/index.html index 7f077df8d..0ccd80969 100644 --- a/docs/2020-03/index.html +++ b/docs/2020-03/index.html @@ -484,6 +484,8 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -492,8 +494,6 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-04/index.html b/docs/2020-04/index.html index 7f12b3abc..013a419eb 100644 --- a/docs/2020-04/index.html +++ b/docs/2020-04/index.html @@ -658,6 +658,8 @@ $ psql -c 'select * from pg_stat_activity' | wc -l
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -666,8 +668,6 @@ $ psql -c 'select * from pg_stat_activity' | wc -l
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-05/index.html b/docs/2020-05/index.html index c1d187a7f..c9cacaa7a 100644 --- a/docs/2020-05/index.html +++ b/docs/2020-05/index.html @@ -477,6 +477,8 @@ Caused by: java.lang.NullPointerException
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -485,8 +487,6 @@ Caused by: java.lang.NullPointerException
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-06/index.html b/docs/2020-06/index.html index 41a05733e..438209ebe 100644 --- a/docs/2020-06/index.html +++ b/docs/2020-06/index.html @@ -811,6 +811,8 @@ $ csvcut -c 'id,cg.subject.ilri[],cg.subject.ilri[en_US],dc.subject[en_US]' /tmp
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -819,8 +821,6 @@ $ csvcut -c 'id,cg.subject.ilri[],cg.subject.ilri[en_US],dc.subject[en_US]' /tmp
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-07/index.html b/docs/2020-07/index.html index 699bcce35..7666e35ae 100644 --- a/docs/2020-07/index.html +++ b/docs/2020-07/index.html @@ -1142,6 +1142,8 @@ Fixed 4 occurences of: Muloi, D.M.
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -1150,8 +1152,6 @@ Fixed 4 occurences of: Muloi, D.M.
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-08/index.html b/docs/2020-08/index.html index ebe877fe4..58a50a640 100644 --- a/docs/2020-08/index.html +++ b/docs/2020-08/index.html @@ -798,6 +798,8 @@ $ grep -c added /tmp/2020-08-27-countrycodetagger.log
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -806,8 +808,6 @@ $ grep -c added /tmp/2020-08-27-countrycodetagger.log
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-09/index.html b/docs/2020-09/index.html index fba740918..68349a2ef 100644 --- a/docs/2020-09/index.html +++ b/docs/2020-09/index.html @@ -717,6 +717,8 @@ solr_query_params = {
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -725,8 +727,6 @@ solr_query_params = {
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-10/index.html b/docs/2020-10/index.html index 60eaa91b2..d81b01427 100644 --- a/docs/2020-10/index.html +++ b/docs/2020-10/index.html @@ -1241,6 +1241,8 @@ $ ./delete-metadata-values.py -i 2020-10-31-delete-74-sponsors.csv -db dspace -u
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -1249,8 +1251,6 @@ $ ./delete-metadata-values.py -i 2020-10-31-delete-74-sponsors.csv -db dspace -u
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-11/index.html b/docs/2020-11/index.html index 0003e5039..1fa4a42af 100644 --- a/docs/2020-11/index.html +++ b/docs/2020-11/index.html @@ -731,6 +731,8 @@ $ ./fix-metadata-values.py -i 2020-11-30-fix-hung-orcid.csv -db dspace63 -u dspa
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -739,8 +741,6 @@ $ ./fix-metadata-values.py -i 2020-11-30-fix-hung-orcid.csv -db dspace63 -u dspa
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2020-12/index.html b/docs/2020-12/index.html index 9042af4eb..b0e960af4 100644 --- a/docs/2020-12/index.html +++ b/docs/2020-12/index.html @@ -869,6 +869,8 @@ $ curl -XDELETE 'http://localhost:9200/openrxv-items-2020-12-29?pretty'
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -877,8 +879,6 @@ $ curl -XDELETE 'http://localhost:9200/openrxv-items-2020-12-29?pretty'
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2021-01/index.html b/docs/2021-01/index.html index 03c74ef7d..4da34e3de 100644 --- a/docs/2021-01/index.html +++ b/docs/2021-01/index.html @@ -688,6 +688,8 @@ java.lang.IllegalArgumentException: Invalid character found in the request targe
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -696,8 +698,6 @@ java.lang.IllegalArgumentException: Invalid character found in the request targe
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2021-02/index.html b/docs/2021-02/index.html index 9b0a51188..46a2d7ff2 100644 --- a/docs/2021-02/index.html +++ b/docs/2021-02/index.html @@ -898,6 +898,8 @@ dspace.log.2021-02-28:0
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -906,8 +908,6 @@ dspace.log.2021-02-28:0
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2021-03/index.html b/docs/2021-03/index.html index f55866a7a..eed5eb900 100644 --- a/docs/2021-03/index.html +++ b/docs/2021-03/index.html @@ -875,6 +875,8 @@ COPY 3081
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -883,8 +885,6 @@ COPY 3081
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2021-04/index.html b/docs/2021-04/index.html index abc93bf64..873a1ee58 100644 --- a/docs/2021-04/index.html +++ b/docs/2021-04/index.html @@ -1042,6 +1042,8 @@ $ ./ilri/doi-to-handle.py -i /tmp/dois.txt -o /tmp/handles.csv -db dspace63 -u d
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -1050,8 +1052,6 @@ $ ./ilri/doi-to-handle.py -i /tmp/dois.txt -o /tmp/handles.csv -db dspace63 -u d
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2021-05/index.html b/docs/2021-05/index.html index b330a88a5..98d679b4f 100644 --- a/docs/2021-05/index.html +++ b/docs/2021-05/index.html @@ -685,6 +685,8 @@ Please see the DSpace documentation for assistance.
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -693,8 +695,6 @@ Please see the DSpace documentation for assistance.
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2021-06/index.html b/docs/2021-06/index.html index 1153dda36..0f1682064 100644 --- a/docs/2021-06/index.html +++ b/docs/2021-06/index.html @@ -693,6 +693,8 @@ COPY 1710
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -701,8 +703,6 @@ COPY 1710
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2021-07/index.html b/docs/2021-07/index.html index 115839278..73465dbc3 100644 --- a/docs/2021-07/index.html +++ b/docs/2021-07/index.html @@ -715,6 +715,8 @@ $ cat AS* /tmp/ddos-networks-to-block.txt | sed -e '/^$/d' -e '/^#/d' -e '/^{/d'
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -723,8 +725,6 @@ $ cat AS* /tmp/ddos-networks-to-block.txt | sed -e '/^$/d' -e '/^#/d' -e '/^{/d'
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2021-08/index.html b/docs/2021-08/index.html index c8e0e76ab..4e6f7bd92 100644 --- a/docs/2021-08/index.html +++ b/docs/2021-08/index.html @@ -606,6 +606,8 @@ $ ./ilri/add-orcid-identifiers-csv.py -i 2021-08-25-add-orcids.csv -db dspace -u
    +
  1. October, 2021
  2. +
  3. September, 2021
  4. August, 2021
  5. @@ -614,8 +616,6 @@ $ ./ilri/add-orcid-identifiers-csv.py -i 2021-08-25-add-orcids.csv -db dspace -u
  6. June, 2021
  7. -
  8. May, 2021
  9. -
diff --git a/docs/2021-09/index.html b/docs/2021-09/index.html index cf012b820..1f7948979 100644 --- a/docs/2021-09/index.html +++ b/docs/2021-09/index.html @@ -26,7 +26,7 @@ The syntax Moayad showed me last month doesn’t seem to honor the search qu - + @@ -58,9 +58,9 @@ The syntax Moayad showed me last month doesn’t seem to honor the search qu "@type": "BlogPosting", "headline": "September, 2021", "url": "https://alanorth.github.io/cgspace-notes/2021-09/", - "wordCount": "2812", + "wordCount": "2864", "datePublished": "2021-09-01T09:14:07+03:00", - "dateModified": "2021-09-28T22:00:36+03:00", + "dateModified": "2021-10-04T11:10:54+03:00", "author": { "@type": "Person", "name": "Alan Orth" @@ -223,7 +223,7 @@ $ docker-compose build
  • Some people from the Alliance contacted me last week about AICCRA metadata
  • @@ -558,6 +558,15 @@ $ csvcut -c subject,'match type' /tmp/2021-09-29-ilri-subjects.csv | sed -e 's/m +

    2021-09-30

    + @@ -579,6 +588,8 @@ $ csvcut -c subject,'match type' /tmp/2021-09-29-ilri-subjects.csv | sed -e 's/m
      +
    1. October, 2021
    2. +
    3. September, 2021
    4. August, 2021
    5. @@ -587,8 +598,6 @@ $ csvcut -c subject,'match type' /tmp/2021-09-29-ilri-subjects.csv | sed -e 's/m
    6. June, 2021
    7. -
    8. May, 2021
    9. -
    diff --git a/docs/2021-10/index.html b/docs/2021-10/index.html new file mode 100644 index 000000000..5e6507691 --- /dev/null +++ b/docs/2021-10/index.html @@ -0,0 +1,309 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + October, 2021 | CGSpace Notes + + + + + + + + + + + + + + + + + + + + + +
    +
    + +
    +
    + + + + +
    +
    +

    CGSpace Notes

    +

    Documenting day-to-day work on the CGSpace repository.

    +
    +
    + + + + +
    +
    +
    + + + + +
    +
    +

    October, 2021

    + +
    +

    2021-10-01

    +
      +
    • Export all affiliations on CGSpace and run them against the latest RoR data dump:
    • +
    +
    localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
    +$ csvcut -c 1 /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
    +$ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affiliations-matching.csv
    +$ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l 
    +1879
    +$ wc -l /tmp/2021-10-01-affiliations.txt 
    +7100 /tmp/2021-10-01-affiliations.txt
    +
      +
    • So we have 1879/7100 (26.46%) matching already
    • +
    +
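The match rate above can also be computed directly from the lookup output. This is a minimal Python sketch, assuming the CSV has a `matched` column holding `true` or `false` (as the `csvgrep -c matched -m true` call suggests); the `organization` column name in the usage example is hypothetical.

```python
import csv
import io


def match_rate(csv_text):
    """Return (matched, total, percent) for a CSV with a 'matched' column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Assumption: matches are recorded as the literal string "true"
    matched = sum(1 for row in rows if row["matched"].strip().lower() == "true")
    percent = 100.0 * matched / len(rows) if rows else 0.0
    return matched, len(rows), percent


# Hypothetical sample data, not taken from the real export
sample = "organization,matched\nA,true\nB,false\nC,false\nD,true\n"
print(match_rate(sample))
```

With the numbers from the notes, `round(100 * 1879 / 7100, 2)` gives 26.46.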

    2021-10-03

    +
      +
    • Dominique from IWMI asked me for information about how CGSpace partners are using CGSpace APIs to feed their websites
    • +
    • Start a fresh indexing on AReS
    • +
    • Udana sent me his file of 292 non-IWMI publications for the Virtual library on water management +
        +
      • He added licenses
      • +
      • I want to clean up the dcterms.extent field though because it has volume, issue, and pages there
      • +
      • I cloned the column several times and extracted values based on their positions, for example: +
          +
        • Volume: value.partition(":")[0]
        • +
        • Issue: value.partition("(")[2].partition(")")[0]
        • +
        • Page: "p. " + value.replace(".", "")
        • +
        +
      • +
      +
    • +
    +
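The `partition` expressions above can be combined into one function. This is only a sketch: it assumes extent values shaped like `12(3):45-67`, which is a guess at the format and not confirmed by the notes, and `split_extent` is a hypothetical helper name.

```python
def split_extent(value):
    """Split an extent such as '12(3):45-67' into volume, issue and pages.

    Assumption: the volume comes first, the issue is in parentheses and
    the pages follow a colon. Real data may well differ.
    """
    before_colon, _, after_colon = value.partition(":")
    volume = before_colon.partition("(")[0].strip()
    issue = before_colon.partition("(")[2].partition(")")[0].strip()
    pages = after_colon.strip().rstrip(".")
    return {
        "volume": volume,
        "issue": issue,
        "pages": "p. " + pages if pages else "",
    }


print(split_extent("12(3):45-67"))
```

Because `str.partition` returns empty strings when the separator is missing, a value without an issue or without pages yields empty fields instead of raising an error.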

    2021-10-04

    +
      +
    • Start looking at the last month of Solr statistics on CGSpace +
        +
      • I see a number of IPs with “normal” user agents that clearly behave like bots +
          +
        • 198.15.130.18: 21,000 requests to /discover with a normal-looking user agent, from ASN 11282 (SERVERYOU, US)
        • +
        • 93.158.90.107: 8,500 requests to handle and browse links with a Firefox 84.0 user agent, from ASN 12552 (IPO-EU, SE)
        • +
        • 193.235.141.162: 4,800 requests to handle, browse, and discovery links with a Firefox 84.0 user agent, from ASN 51747 (INTERNETBOLAGET, SE)
        • +
        • 3.225.28.105: 2,900 requests to REST API for the CIAT Story Maps collection with a normal user agent, from ASN 14618 (AMAZON-AES, US)
        • +
        • 34.228.236.6: 2,800 requests to discovery for the CGIAR System community with user agent Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1), from ASN 14618 (AMAZON-AES, US)
        • +
        • 18.212.137.2: 2,800 requests to discovery for the CGIAR System community with user agent Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1), from ASN 14618 (AMAZON-AES, US)
        • +
        • 3.81.123.72: 2,800 requests to discovery and handles for the CGIAR System community with user agent Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1), from ASN 14618 (AMAZON-AES, US)
        • +
        • 3.227.16.188: 2,800 requests to discovery and handles for the CGIAR System community with user agent Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1), from ASN 14618 (AMAZON-AES, US)
        • +
        +
      • +
      • Looking closer into the requests with this Mozilla/4.0 user agent, I see 500+ IPs using it:
      • +
      +
    • +
    +
    # zcat --force /var/log/nginx/*.log* | grep 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)' | awk '{print $1}' | sort | uniq > /tmp/mozilla-4.0-ips.txt
    +# wc -l /tmp/mozilla-4.0-ips.txt 
    +543 /tmp/mozilla-4.0-ips.txt
    +
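For reference, the same filtering can be done in Python. This is a minimal sketch, assuming the client IP is the first whitespace-separated field of each log line (which is what `awk '{print $1}'` relies on); `unique_ips` is a hypothetical helper and the log lines in the example are made up.

```python
def unique_ips(lines, user_agent):
    """Return the sorted, de-duplicated client IPs of lines containing user_agent."""
    ips = set()
    for line in lines:
        if user_agent in line:
            fields = line.split()
            if fields:
                # Assumption: the client IP is the first field of the line
                ips.add(fields[0])
    return sorted(ips)


UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
# Made-up log lines in a combined-log-like layout
log = [
    '3.81.123.72 - - [04/Oct/2021:10:00:00 +0200] "GET /discover HTTP/1.1" 200 512 "-" "' + UA + '"',
    '3.81.123.72 - - [04/Oct/2021:10:00:01 +0200] "GET /handle/10947/1 HTTP/1.1" 200 512 "-" "' + UA + '"',
    '10.0.0.1 - - [04/Oct/2021:10:00:02 +0200] "GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"',
]
print(unique_ips(log, UA))
```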
      +
    • Then I resolved the IPs and extracted the ones belonging to Amazon:
    • +
    +
    $ ./ilri/resolve-addresses-geoip2.py -i /tmp/mozilla-4.0-ips.txt -k "$ABUSEIPDB_API_KEY" -o /tmp/mozilla-4.0-ips.csv
    +$ csvgrep -c asn -m 14618 /tmp/mozilla-4.0-ips.csv | csvcut -c ip | sed 1d | tee /tmp/amazon-ips.txt | wc -l
    +
      +
    • I am thinking I will purge them all, as I have several indicators that they are bots: mysterious user agent, IP owned by Amazon
    • +
    • Even more interesting, these requests are weighted VERY heavily on the CGIAR System community:
    • +
    +
       1592 GET /handle/10947/2526
    +   1592 GET /handle/10947/2527
    +   1592 GET /handle/10947/34
    +   1593 GET /handle/10947/6
    +   1594 GET /handle/10947/1
    +   1598 GET /handle/10947/2515
    +   1598 GET /handle/10947/2516
    +   1599 GET /handle/10568/101335
    +   1599 GET /handle/10568/91688
    +   1599 GET /handle/10947/2517
    +   1599 GET /handle/10947/2518
    +   1599 GET /handle/10947/2519
    +   1599 GET /handle/10947/2708
    +   1599 GET /handle/10947/2871
    +   1600 GET /handle/10568/89342
    +   1600 GET /handle/10947/4467
    +   1607 GET /handle/10568/103816
    + 290382 GET /handle/10568/83389
    +
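The command that produced the counts above is not shown in the notes. One way to get a similar tally is sketched below in Python, assuming log lines that contain a quoted request such as `"GET /handle/10947/1 HTTP/1.1"`; `count_requests` is a hypothetical helper and the sample lines are made up.

```python
import re
from collections import Counter

# Matches the method and path inside a quoted request line
REQUEST_RE = re.compile(r'"([A-Z]+) (\S+) HTTP/')


def count_requests(lines):
    """Count occurrences of each 'METHOD path' pair in the given log lines."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[f"{match.group(1)} {match.group(2)}"] += 1
    return counts


# Made-up sample lines, not real CGSpace traffic
sample_log = [
    '3.81.123.72 - - [04/Oct/2021:10:00:00 +0200] "GET /handle/10568/83389 HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
    '3.81.123.72 - - [04/Oct/2021:10:00:01 +0200] "GET /handle/10568/83389 HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
    '3.81.123.72 - - [04/Oct/2021:10:00:02 +0200] "GET /handle/10947/1 HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
]
for request, count in count_requests(sample_log).most_common():
    print(count, request)
```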
      +
    • Before I purge all of those I will ask Samuel Stacey from the System office, hopefully to get some insight…
    • +
    • Meeting with Michael Victor, Peter, Jane, and Abenet about the future of repositories in the One CGIAR
    • +
    • Meeting with Michelle from Altmetric about their new CSV upload system +
        +
      • I sent her some examples of Handles that have DOIs, but no linked score (yet) to see if an association will be created when she uploads them
      • +
      +
    • +
    +
    doi,handle
    +10.1016/j.agsy.2021.103263,10568/115288
    +10.3389/fgene.2021.723360,10568/115287
    +10.3389/fpls.2021.720670,10568/115285
    +
      +
    • Extract the AGROVOC subjects from IWMI’s 292 publications to validate them against AGROVOC:
    • +
    +
    $ csvcut -c 'dcterms.subject[en_US]' ~/Downloads/2021-10-03-non-IWMI-publications.csv | sed -e 1d -e 's/||/\n/g' -e 's/"//g' | sort -u > /tmp/agrovoc.txt
    +$ ./ilri/agrovoc-lookup.py -i /tmp/agrovoc.txt -o /tmp/agrovoc-matches.csv
    +$ csvgrep -c 'number of matches' -m '0' /tmp/agrovoc-matches.csv | csvcut -c 1 > /tmp/invalid-agrovoc.csv
    +
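The subject extraction step splits multi-value cells on `||`, strips quotes, and de-duplicates the result. A rough Python equivalent is sketched below; `unique_subjects` is a hypothetical helper, the sample values are made up, and unlike the `sed` pipeline it also trims whitespace and skips empty terms.

```python
def unique_subjects(cells):
    """Split '||'-separated cells into a sorted list of unique subject terms."""
    subjects = set()
    for cell in cells:
        for term in cell.split("||"):
            # Drop quoting left over from CSV and surrounding whitespace
            term = term.replace('"', "").strip()
            if term:
                subjects.add(term)
    return sorted(subjects)


# Made-up sample cells from a dcterms.subject column
cells = ['"water management||irrigation"', "irrigation||groundwater", ""]
print(unique_subjects(cells))
```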
    + + + + + +
    + + + +
    + + + + +
    +
    + + + + + + + + + diff --git a/docs/404.html b/docs/404.html index 3351ab790..05ca42b65 100644 --- a/docs/404.html +++ b/docs/404.html @@ -95,6 +95,8 @@
      +
    1. October, 2021
    2. +
    3. September, 2021
    4. August, 2021
    5. @@ -103,8 +105,6 @@
    6. June, 2021
    7. -
    8. May, 2021
    9. -
    diff --git a/docs/categories/index.html b/docs/categories/index.html index d70e41366..8785f87b2 100644 --- a/docs/categories/index.html +++ b/docs/categories/index.html @@ -10,7 +10,7 @@ - + @@ -84,7 +84,7 @@

    Notes

    - +
    Read more → @@ -108,6 +108,8 @@
      +
    1. October, 2021
    2. +
    3. September, 2021
    4. August, 2021
    5. @@ -116,8 +118,6 @@
    6. June, 2021
    7. -
    8. May, 2021
    9. -
    diff --git a/docs/categories/index.xml b/docs/categories/index.xml index 2ef3455b2..1753fc18c 100644 --- a/docs/categories/index.xml +++ b/docs/categories/index.xml @@ -6,11 +6,11 @@ Recent content in Categories on CGSpace Notes Hugo -- gohugo.io en-us - Wed, 01 Sep 2021 09:14:07 +0300 + Fri, 01 Oct 2021 11:14:07 +0300 Notes https://alanorth.github.io/cgspace-notes/categories/notes/ - Wed, 01 Sep 2021 09:14:07 +0300 + Fri, 01 Oct 2021 11:14:07 +0300 https://alanorth.github.io/cgspace-notes/categories/notes/ diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html index 192993971..c7631b7e0 100644 --- a/docs/categories/notes/index.html +++ b/docs/categories/notes/index.html @@ -10,7 +10,7 @@ - + @@ -81,6 +81,38 @@ +
    +
    +

    October, 2021

    + +
    +

    2021-10-01

    +
      +
    • Export all affiliations on CGSpace and run them against the latest RoR data dump:
    • +
    +
    localhost/dspace63= > \COPY (SELECT DISTINCT text_value as "cg.contributor.affiliation", count(*) FROM metadatavalue WHERE dspace_object_id IN (SELECT uuid FROM item) AND metadata_field_id = 211 GROUP BY text_value ORDER BY count DESC) to /tmp/2021-10-01-affiliations.csv WITH CSV HEADER;
    +$ csvcut -c 1 /tmp/2021-10-01-affiliations.csv | sed 1d > /tmp/2021-10-01-affiliations.txt
    +$ ./ilri/ror-lookup.py -i /tmp/2021-10-01-affiliations.txt -r 2021-09-23-ror-data.json -o /tmp/2021-10-01-affiliations-matching.csv
    +$ csvgrep -c matched -m true /tmp/2021-10-01-affiliations-matching.csv | sed 1d | wc -l 
    +1879
    +$ wc -l /tmp/2021-10-01-affiliations.txt 
    +7100 /tmp/2021-10-01-affiliations.txt
    +
      +
    • So we have 1879/7100 (26.46%) matching already
    • +
    + Read more → +
    + + + + + +

    September, 2021

    @@ -333,40 +365,6 @@ COPY 20994 - -
    -
    -

    January, 2021

    - -
    -

    2021-01-03

    -
      -
    • Peter notified me that some filters on AReS were broken again -
        -
      • It’s the same issue with the field names getting .keyword appended to the end that I already filed an issue on OpenRXV about last month
      • -
      • I fixed the broken filters (careful to not edit any others, lest they break too!)
      • -
      -
    • -
    • Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV -
        -
      • The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API
      • -
      • I adjusted it to default to 0 and added a note to the admin screen
      • -
      • I realized that this issue was actually causing the first page of 100 statistics to be missing…
      • -
      • For example, this item has 51 views on CGSpace, but 0 on AReS
      • -
      -
    • -
    - Read more → -
    - - - - -