From a23b442c7791c5b0078bf356ecb74e36097705dd Mon Sep 17 00:00:00 2001 From: Alan Orth Date: Tue, 30 Jun 2020 15:47:18 +0300 Subject: [PATCH] Add notes for 2020-06-30 --- content/posts/2020-06.md | 162 ++++++++++ docs/2015-11/index.html | 2 +- docs/2015-12/index.html | 2 +- docs/2016-01/index.html | 2 +- docs/2016-02/index.html | 2 +- docs/2016-03/index.html | 2 +- docs/2016-04/index.html | 2 +- docs/2016-05/index.html | 2 +- docs/2016-06/index.html | 2 +- docs/2016-07/index.html | 2 +- docs/2016-08/index.html | 2 +- docs/2016-09/index.html | 2 +- docs/2016-10/index.html | 2 +- docs/2016-11/index.html | 2 +- docs/2016-12/index.html | 2 +- docs/2017-01/index.html | 2 +- docs/2017-02/index.html | 2 +- docs/2017-03/index.html | 2 +- docs/2017-04/index.html | 2 +- docs/2017-05/index.html | 2 +- docs/2017-06/index.html | 2 +- docs/2017-07/index.html | 2 +- docs/2017-08/index.html | 2 +- docs/2017-09/index.html | 2 +- docs/2017-10/index.html | 2 +- docs/2017-11/index.html | 2 +- docs/2017-12/index.html | 2 +- docs/2018-01/index.html | 2 +- docs/2018-02/index.html | 2 +- docs/2018-03/index.html | 2 +- docs/2018-04/index.html | 2 +- docs/2018-05/index.html | 2 +- docs/2018-06/index.html | 2 +- docs/2018-07/index.html | 2 +- docs/2018-08/index.html | 2 +- docs/2018-09/index.html | 2 +- docs/2018-10/index.html | 2 +- docs/2018-11/index.html | 2 +- docs/2018-12/index.html | 2 +- docs/2019-01/index.html | 2 +- docs/2019-02/index.html | 2 +- docs/2019-03/index.html | 2 +- docs/2019-04/index.html | 2 +- docs/2019-05/index.html | 2 +- docs/2019-06/index.html | 2 +- docs/2019-07/index.html | 2 +- docs/2019-08/index.html | 2 +- docs/2019-09/index.html | 2 +- docs/2019-10/index.html | 2 +- docs/2019-11/index.html | 2 +- docs/2019-12/index.html | 2 +- docs/2020-01/index.html | 2 +- docs/2020-02/index.html | 2 +- docs/2020-03/index.html | 2 +- docs/2020-04/index.html | 2 +- docs/2020-05/index.html | 2 +- docs/2020-06/index.html | 193 +++++++++++- docs/404.html | 2 +- 
docs/categories/index.html | 323 +------------------- docs/categories/notes/index.html | 4 +- docs/categories/notes/page/2/index.html | 4 +- docs/categories/notes/page/3/index.html | 4 +- docs/categories/notes/page/4/index.html | 4 +- docs/cgiar-library-migration/index.html | 2 +- docs/cgspace-cgcorev2-migration/index.html | 2 +- docs/index.html | 4 +- docs/page/2/index.html | 4 +- docs/page/3/index.html | 4 +- docs/page/4/index.html | 4 +- docs/page/5/index.html | 4 +- docs/page/6/index.html | 4 +- docs/posts/index.html | 4 +- docs/posts/page/2/index.html | 4 +- docs/posts/page/3/index.html | 4 +- docs/posts/page/4/index.html | 4 +- docs/posts/page/5/index.html | 4 +- docs/posts/page/6/index.html | 4 +- docs/sitemap.xml | 10 +- docs/tags/index.html | 333 ++------------------- docs/tags/migration/index.html | 2 +- docs/tags/notes/index.html | 2 +- docs/tags/notes/page/2/index.html | 2 +- docs/tags/notes/page/3/index.html | 2 +- 83 files changed, 484 insertions(+), 725 deletions(-) diff --git a/content/posts/2020-06.md b/content/posts/2020-06.md index 88063e5cc..2c3c5ff8e 100644 --- a/content/posts/2020-06.md +++ b/content/posts/2020-06.md @@ -437,5 +437,167 @@ COPY 3917 - Email GRID.ac to ask them about where old names for institutes are stores, as I see them in the "Disambiguate" search function online, but not in the standalone data - For example, both "International Laboratory for Research on Animal Diseases" (ILRAD) and "International Livestock Centre for Africa" (ILCA) correctly return a hit for "International Livestock Research Institute", but it's nowhere in the data +- I discovered two interesting OpenRefine reconciliation services: + - [OpenRefine reconciler for the Research Organization Registry](https://github.com/ror-community/ror-reconciler) + - [Getty Vocabularies OpenRefine Reconciliation](https://www.getty.edu/research/tools/vocabularies/obtain/openrefine.html) (see the Getty Thesaurus of Geographic Names ® (TGN)) + +## 2020-06-29 + +- I stumbled 
upon a sort of [standard for rights statements](https://rightsstatements.org/page/1.0/) that we might want to use for `dc.rights` eventually +- I'm trying to understand the difference between `dcterms.coverage`, `dcterms.spatial`, and `dcterms.temporal` + - According to the [Dublin Core specification for coverage](https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/coverage/), the more specific spatial and temporal subproperties are preferred: + +> Because coverage is so broadly defined, it is preferable to use the more specific subproperties Temporal Coverage and Spatial Coverage. + +- So I guess we should be using `dcterms.spatial` for countries... but then all regions, countries, etc. get merged together into that one field when you use DCTERMS + - Perhaps better to use `cg.coverage.country` and crosswalk to `dcterms.spatial` + - Another thing is that these values are not supposed to be literals; you are supposed to embed classes... +- I also noticed that there is a [CrossRef funders registry](https://www.crossref.org/services/funder-registry/) with 23,000+ funders that you can [download as RDF](https://gitlab.com/crossref/open_funder_registry) or [access via an API](https://www.crossref.org/education/funder-registry/accessing-the-funder-registry/) + +``` +$ http 'https://api.crossref.org/funders?query=Bill+and+Melinda+Gates&mailto=a.orth@cgiar.org' +``` + +- Searching for "Bill and Melinda Gates", we can see the `name` literal and a list of `alt-names` literals + - This could be good for checking our funders + - The API results link to a page for each funder in the vocabulary, but those pages are returning HTTP 404 right now: https://data.crossref.org/fundingdata/vocabulary/Label-599174 + - I sent an email to the CrossRef Funders Registry team +- See the [CrossRef API docs](https://github.com/CrossRef/rest-api-doc) (specifically the parameters and filters) +- I made a pull request on CG Core v2 to recommend using persistent identifiers for DOIs and ORCID iDs 
([#26](https://github.com/AgriculturalSemantics/cg-core/pull/26)) +- I exported sponsors/funders from CGSpace and wrote a script to query the CrossRef API for matches: + +``` +dspace=# \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=29) TO /tmp/2020-06-29-sponsors.csv; +COPY 682 +``` + +- The script is `crossref-funders-lookup.py` and it is based on `agrovoc-lookup.py` + - On that note, I realized I need to URL-encode the funder name before making the search request with requests because, while the requests library *does* do URL encoding, it seems to treat characters like `&` in the URL as query parameter separators, which causes searches for funders like `Bill & Melinda Gates Foundation` to be misinterpreted + - Then I noticed that I had worked around this in `agrovoc-lookup.py` a few years ago by just ignoring subjects with special characters like apostrophes and accents! +- I tested the script on our funders: + +``` +$ ./crossref-funders-lookup.py -i /tmp/2020-06-29-sponsors.csv -om /tmp/sponsors-matched.txt -or /tmp/sponsors-rejected.txt -d -e blah@blah.com +$ wc -l /tmp/2020-06-29-sponsors.csv +682 /tmp/2020-06-29-sponsors.csv +$ wc -l /tmp/sponsors-* + 180 /tmp/sponsors-matched.txt + 502 /tmp/sponsors-rejected.txt + 682 total +``` + +- It seems that about 26% of our funders (180 of 682) already match... 
I bet a few more will match if I check for simple errors + - Interestingly, I found a few funders whose names we have correct, but which I can't figure out how to match in the API: + - `Claussen-Simon-Stiftung` + - `H2020 Marie Skłodowska-Curie Actions` + +## 2020-06-30 + +- GRID responded to my question about historical names + - They said the information is not part of the public GRID or ROR lists, but you can access it with a license to the Dimensions API +- Gabriela from CIP sent me a list of erroneously added CIP subjects to remove from CGSpace: + +``` +$ cat /tmp/2020-06-30-remove-cip-subjects.csv +cg.subject.cip +INTEGRATED PEST MANAGEMENT +ORANGE FLESH SWEET POTATOES +AEROPONICS +FOOD SUPPLY +SASHA +SPHI +INSECT LIFE CYCLE MODELLING +SUSTAIN +AGRICULTURAL INNOVATIONS +NATIVE VARIETIES +PHYTOPHTHORA INFESTANS +$ ./delete-metadata-values.py -i /tmp/2020-06-30-remove-cip-subjects.csv -db dspace -u dspace -p 'fuuu' -f cg.subject.cip -m 127 -d +``` + +- She also wants to change their `SWEET POTATOES` term to `SWEETPOTATOES`, both in the CIP subject list and in existing items, so I updated those too: + +``` +$ cat /tmp/2020-06-30-fix-cip-subjects.csv +cg.subject.cip,correct +SWEET POTATOES,SWEETPOTATOES +$ ./fix-metadata-values.py -i /tmp/2020-06-30-fix-cip-subjects.csv -db dspace -u dspace -p 'fuuu' -f cg.subject.cip -t correct -m 127 -d +``` + +- She also finished doing all the corrections to authors that I had sent her last week, but many of the changes remove Spanish accents from authors' names, so I asked if she's really sure she wants to do that +- I ran the fixes and deletes on CGSpace, but not on DSpace Test yet because those scripts need updating for DSpace 6 UUIDs +- I spent about two hours manually checking the sponsors that were rejected by the CrossRef lookup and found about fifty-five corrections that I ran on CGSpace: + +``` +$ cat 2020-06-29-fix-sponsors.csv +dc.description.sponsorship,correct +"Conselho Nacional de Desenvolvimento Científico e Tecnológico, 
Brazil","Conselho Nacional de Desenvolvimento Científico e Tecnológico" +"Claussen Simon Stiftung","Claussen-Simon-Stiftung" +"Fonds pour la formation á la Recherche dans l'Industrie et dans l'Agriculture, Belgium","Fonds pour la Formation à la Recherche dans l’Industrie et dans l’Agriculture" +"Fundação de Amparo à Pesquisa do Estado de São Paulo, Brazil","Fundação de Amparo à Pesquisa do Estado de São Paulo" +"Schlumberger Foundation Faculty for the Future","Schlumberger Foundation" +"Wildlife Conservation Society, United States","Wildlife Conservation Society" +"Portuguese Foundation for Science and Technology","Portuguese Science and Technology Foundation" +"Wageningen University and Research","Wageningen University and Research Centre" +"Leverhulme Centre for Integrative Research in Agriculture and Health","Leverhulme Centre for Integrative Research on Agriculture and Health" +"Natural Science and Engineering Research Council of Canada","Natural Sciences and Engineering Research Council of Canada" +"Biotechnology and Biological Sciences Research Council, United Kingdom","Biotechnology and Biological Sciences Research Council" +"Home Grown Ceraels Authority United Kingdom","Home-Grown Cereals Authority" +"Fiat Panis Foundation","Foundation fiat panis" +"Defence Science and Technology Laboratory, United Kingdom","Defence Science and Technology Laboratory" +"African Development Bank","African Development Bank Group" +"Ministry of Health, Labour, and Welfare, Japan","Ministry of Health, Labour and Welfare" +"World Academy of Sciences","The World Academy of Sciences" +"Agricultural Research Council, South Africa","Agricultural Research Council" +"Department of Homeland Security, USA","U.S. 
Department of Homeland Security" +"Quadram Institute","Quadram Institute Bioscience" +"Google.org","Google" +"Department for Environment, Food and Rural Affairs, United Kingdom","Department for Environment, Food and Rural Affairs, UK Government" +"National Commission for Science, Technology and Innovation, Kenya","National Commission for Science, Technology and Innovation" +"Hainan Province Natural Science Foundation of China","Natural Science Foundation of Hainan Province" +"German Society for International Cooperation (GIZ)","GIZ" +"German Federal Ministry of Food and Agriculture","Federal Ministry of Food and Agriculture" +"State Key Laboratory of Environmental Geochemistry, China","State Key Laboratory of Environmental Geochemistry" +"QUT student scholarship","Queensland University of Technology" +"Australia Centre for International Agricultural Research","Australian Centre for International Agricultural Research" +"Belgian Science Policy","Belgian Federal Science Policy Office" +"U.S. Department of Agriculture USDA","U.S. Department of Agriculture" +"U.S.. Department of Agriculture (USDA)","U.S. Department of Agriculture" +"Fundação de Amparo à Pesquisa do Estado de São Paulo ( FAPESP)","Fundação de Amparo à Pesquisa do Estado de São Paulo" +"Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul, Brazil","Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul" +"Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro, Brazil","Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro" +"Swedish University of Agricultural Sciences (SLU)","Swedish University of Agricultural Sciences" +"U.S. Department of Agriculture (USDA)","U.S. 
Department of Agriculture" +"Swedish International Development Cooperation Agency (Sida)","Sida" +"Swedish International Development Agency","Sida" +"Federal Ministry for Economic Cooperation and Development, Germany","Federal Ministry for Economic Cooperation and Development" +"Natural Environment Research Council, United Kingdom","Natural Environment Research Council" +"Economic and Social Research Council, United Kingdom","Economic and Social Research Council" +"Medical Research Council, United Kingdom","Medical Research Council" +"Federal Ministry for Education and Research, Germany","Federal Ministry for Education, Science, Research and Technology" +"UK Government’s Department for International Development","Department for International Development, UK Government" +"Department for International Development, United Kingdom","Department for International Development, UK Government" +"United Nations Children's Fund","United Nations Children's Emergency Fund" +"Swedish Research Council for Environment, Agricultural Science and Spatial Planning","Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning" +"Agence Nationale de la Recherche, France","French National Research Agency" +"Fondation pour la recherche sur la biodiversité","Foundation for Research on Biodiversity" +"Programa Nacional de Innovacion Agraria, Peru","Programa Nacional de Innovación Agraria, Peru" +"United States Agency for International Development (USAID)","United States Agency for International Development" +"West Africa Agricultural Productivity Programme","West Africa Agricultural Productivity Program" +"West African Agricultural Productivity Project","West Africa Agricultural Productivity Program" +"Rural Development Administration, Republic of Korea","Rural Development Administration" +"UK’s Biotechnology and Biological Sciences Research Council (BBSRC)","Biotechnology and Biological Sciences Research Council" +$ ./fix-metadata-values.py -i 
/tmp/2020-06-29-fix-sponsors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -t correct -m 29 +``` + +- Peter wants me to add "CORONAVIRUS DISEASE" to all ILRI items that have ILRI subject "COVID19" + - I exported the ILRI community and cut the columns I needed, then opened the file in OpenRefine: + +``` +$ export JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" +$ dspace metadata-export -i 10568/1 -f /tmp/ilri.csv +$ csvcut -c 'id,cg.subject.ilri[],cg.subject.ilri[en_US],dc.subject[en_US]' /tmp/ilri.csv > /tmp/ilri-covid19.csv +``` + +- I see that all items with "COVID19" already have "CORONAVIRUS DISEASE", so I don't need to do anything diff --git a/docs/2015-11/index.html b/docs/2015-11/index.html index 6727f1607..27f805324 100644 --- a/docs/2015-11/index.html +++ b/docs/2015-11/index.html @@ -31,7 +31,7 @@ Last week I had increased the limit from 30 to 60, which seemed to help, but now $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace 78 "/> - + diff --git a/docs/2015-12/index.html b/docs/2015-12/index.html index 88907a060..008ac611c 100644 --- a/docs/2015-12/index.html +++ b/docs/2015-12/index.html @@ -33,7 +33,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less -rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo -rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz "/> - + diff --git a/docs/2016-01/index.html b/docs/2016-01/index.html index edc6c8f26..d7f1d5255 100644 --- a/docs/2016-01/index.html +++ b/docs/2016-01/index.html @@ -25,7 +25,7 @@ Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_ I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated. Update GitHub wiki for documentation of maintenance tasks. 
"/> - + diff --git a/docs/2016-02/index.html b/docs/2016-02/index.html index 92a11eb42..dff24eb78 100644 --- a/docs/2016-02/index.html +++ b/docs/2016-02/index.html @@ -35,7 +35,7 @@ I noticed we have a very interesting list of countries on CGSpace: Not only are there 49,000 countries, we have some blanks (25)… Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE” "/> - + diff --git a/docs/2016-03/index.html b/docs/2016-03/index.html index 55f91468d..2f711102a 100644 --- a/docs/2016-03/index.html +++ b/docs/2016-03/index.html @@ -25,7 +25,7 @@ Looking at issues with author authorities on CGSpace For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server "/> - + diff --git a/docs/2016-04/index.html b/docs/2016-04/index.html index 3005fc651..191dccb98 100644 --- a/docs/2016-04/index.html +++ b/docs/2016-04/index.html @@ -29,7 +29,7 @@ After running DSpace for over five years I’ve never needed to look in any This will save us a few gigs of backup space we’re paying for on S3 Also, I noticed the checker log has some errors we should pay attention to: "/> - + diff --git a/docs/2016-05/index.html b/docs/2016-05/index.html index 17fccd92b..42851cc9d 100644 --- a/docs/2016-05/index.html +++ b/docs/2016-05/index.html @@ -31,7 +31,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period! 
# awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l 3168 "/> - + diff --git a/docs/2016-06/index.html b/docs/2016-06/index.html index 37201c1fe..9d5510d83 100644 --- a/docs/2016-06/index.html +++ b/docs/2016-06/index.html @@ -31,7 +31,7 @@ This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRec You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship "/> - + diff --git a/docs/2016-07/index.html b/docs/2016-07/index.html index bc4f09e20..9efe989c9 100644 --- a/docs/2016-07/index.html +++ b/docs/2016-07/index.html @@ -41,7 +41,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and In this case the select query was showing 95 results before the update "/> - + diff --git a/docs/2016-08/index.html b/docs/2016-08/index.html index 106d28f7a..a5ba742ec 100644 --- a/docs/2016-08/index.html +++ b/docs/2016-08/index.html @@ -39,7 +39,7 @@ $ git checkout -b 55new 5_x-prod $ git reset --hard ilri/5_x-prod $ git rebase -i dspace-5.5 "/> - + diff --git a/docs/2016-09/index.html b/docs/2016-09/index.html index d410a9ba4..b83d6e94d 100644 --- a/docs/2016-09/index.html +++ b/docs/2016-09/index.html @@ -31,7 +31,7 @@ It looks like we might be able to use OUs now, instead of DCs: $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)" "/> - + diff --git a/docs/2016-10/index.html b/docs/2016-10/index.html index 64ef930d8..930753f24 100644 --- a/docs/2016-10/index.html +++ b/docs/2016-10/index.html @@ -39,7 +39,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id 0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X "/> - + diff --git a/docs/2016-11/index.html 
b/docs/2016-11/index.html index 150af92e2..8588e95d3 100644 --- a/docs/2016-11/index.html +++ b/docs/2016-11/index.html @@ -23,7 +23,7 @@ Add dc.type to the output options for Atmire’s Listings and Reports module Add dc.type to the output options for Atmire’s Listings and Reports module (#286) "/> - + diff --git a/docs/2016-12/index.html b/docs/2016-12/index.html index b0b521336..b20fa03f4 100644 --- a/docs/2016-12/index.html +++ b/docs/2016-12/index.html @@ -43,7 +43,7 @@ I see thousands of them in the logs for the last few months, so it’s not r I’ve raised a ticket with Atmire to ask Another worrying error from dspace.log is: "/> - + diff --git a/docs/2017-01/index.html b/docs/2017-01/index.html index 35c94e4a7..f568e906b 100644 --- a/docs/2017-01/index.html +++ b/docs/2017-01/index.html @@ -25,7 +25,7 @@ I checked to see if the Solr sharding task that is supposed to run on January 1s I tested on DSpace Test as well and it doesn’t work there either I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years "/> - + diff --git a/docs/2017-02/index.html b/docs/2017-02/index.html index 2f4162092..e5d5a6ca8 100644 --- a/docs/2017-02/index.html +++ b/docs/2017-02/index.html @@ -47,7 +47,7 @@ DELETE 1 Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301) Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name "/> - + diff --git a/docs/2017-03/index.html b/docs/2017-03/index.html index 947ca7a4f..f5409fe49 100644 --- a/docs/2017-03/index.html +++ b/docs/2017-03/index.html @@ -51,7 +51,7 @@ Interestingly, it seems DSpace 4.x’s thumbnails were sRGB, but forcing reg $ identify ~/Desktop/alc_contrastes_desafios.jpg /Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000 "/> - + diff --git a/docs/2017-04/index.html b/docs/2017-04/index.html index 
a72e761fb..b026a67cf 100644 --- a/docs/2017-04/index.html +++ b/docs/2017-04/index.html @@ -37,7 +37,7 @@ Testing the CMYK patch on a collection with 650 items: $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt "/> - + diff --git a/docs/2017-05/index.html b/docs/2017-05/index.html index e35d3982e..b58e80a03 100644 --- a/docs/2017-05/index.html +++ b/docs/2017-05/index.html @@ -15,7 +15,7 @@ - + diff --git a/docs/2017-06/index.html b/docs/2017-06/index.html index a0329e7c2..5f3a51712 100644 --- a/docs/2017-06/index.html +++ b/docs/2017-06/index.html @@ -15,7 +15,7 @@ - + diff --git a/docs/2017-07/index.html b/docs/2017-07/index.html index 22871ba64..663bef437 100644 --- a/docs/2017-07/index.html +++ b/docs/2017-07/index.html @@ -33,7 +33,7 @@ Merge changes for WLE Phase II theme rename (#329) Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML: "/> - + diff --git a/docs/2017-08/index.html b/docs/2017-08/index.html index 6b211678d..b2ea6c6a8 100644 --- a/docs/2017-08/index.html +++ b/docs/2017-08/index.html @@ -57,7 +57,7 @@ This was due to newline characters in the dc.description.abstract column, which I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet "/> - + diff --git a/docs/2017-09/index.html b/docs/2017-09/index.html index e89f05cca..4c23c0ba7 100644 --- a/docs/2017-09/index.html +++ b/docs/2017-09/index.html @@ -29,7 +29,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group "/> - + diff --git 
a/docs/2017-10/index.html b/docs/2017-10/index.html index 0fd178ffd..239d30f8a 100644 --- a/docs/2017-10/index.html +++ b/docs/2017-10/index.html @@ -31,7 +31,7 @@ http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336 There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections "/> - + diff --git a/docs/2017-11/index.html b/docs/2017-11/index.html index 415c1912a..9fe942a2d 100644 --- a/docs/2017-11/index.html +++ b/docs/2017-11/index.html @@ -45,7 +45,7 @@ Generate list of authors on CGSpace for Peter to go through and correct: dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv; COPY 54701 "/> - + diff --git a/docs/2017-12/index.html b/docs/2017-12/index.html index 8bf43293f..5f84d32f0 100644 --- a/docs/2017-12/index.html +++ b/docs/2017-12/index.html @@ -27,7 +27,7 @@ The logs say “Timeout waiting for idle object” PostgreSQL activity says there are 115 connections currently The list of connections to XMLUI and REST API for today: "/> - + diff --git a/docs/2018-01/index.html b/docs/2018-01/index.html index 174553fdd..c526f704a 100644 --- a/docs/2018-01/index.html +++ b/docs/2018-01/index.html @@ -147,7 +147,7 @@ dspace.log.2018-01-02:34 Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains "/> - + diff --git a/docs/2018-02/index.html b/docs/2018-02/index.html index 470e5f044..69d902cee 100644 --- a/docs/2018-02/index.html +++ b/docs/2018-02/index.html @@ -27,7 +27,7 @@ We don’t need to distinguish between internal 
and external works, so that Yesterday I figured out how to monitor DSpace sessions using JMX I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01 "/> - + diff --git a/docs/2018-03/index.html b/docs/2018-03/index.html index 3e1ff310c..eaf5a901f 100644 --- a/docs/2018-03/index.html +++ b/docs/2018-03/index.html @@ -21,7 +21,7 @@ Export a CSV of the IITA community metadata for Martin Mueller Export a CSV of the IITA community metadata for Martin Mueller "/> - + diff --git a/docs/2018-04/index.html b/docs/2018-04/index.html index 18eb43d89..6031adbac 100644 --- a/docs/2018-04/index.html +++ b/docs/2018-04/index.html @@ -23,7 +23,7 @@ Catalina logs at least show some memory errors yesterday: I tried to test something on DSpace Test but noticed that it’s down since god knows when Catalina logs at least show some memory errors yesterday: "/> - + diff --git a/docs/2018-05/index.html b/docs/2018-05/index.html index ba8d74fc0..330ee64a0 100644 --- a/docs/2018-05/index.html +++ b/docs/2018-05/index.html @@ -35,7 +35,7 @@ http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E Then I reduced the JVM heap size from 6144 back to 5120m Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use "/> - + diff --git a/docs/2018-06/index.html b/docs/2018-06/index.html index 5c399d2d5..8e29315f6 100644 --- a/docs/2018-06/index.html +++ b/docs/2018-06/index.html @@ -55,7 +55,7 @@ real 74m42.646s user 8m5.056s sys 2m7.289s "/> - + diff --git a/docs/2018-07/index.html b/docs/2018-07/index.html index 3bb6b62b0..855622b5f 100644 --- a/docs/2018-07/index.html +++ b/docs/2018-07/index.html @@ -33,7 +33,7 @@ During the mvn package stage on the 5.8 branch I kept getting issues with java r There is insufficient memory for the Java Runtime Environment to continue. 
 "/>
-
+
diff --git a/docs/2018-08/index.html b/docs/2018-08/index.html
index b86fcc92e..5f673505f 100644
--- a/docs/2018-08/index.html
+++ b/docs/2018-08/index.html
@@ -43,7 +43,7 @@ Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did
 The server only has 8GB of RAM so we’ll eventually need to upgrade to a larger one because we’ll start starving the OS, PostgreSQL, and command line batch processes
 I ran all system updates on DSpace Test and rebooted it
 "/>
-
+
diff --git a/docs/2018-09/index.html b/docs/2018-09/index.html
index 600d1c7c3..33ed34e7c 100644
--- a/docs/2018-09/index.html
+++ b/docs/2018-09/index.html
@@ -27,7 +27,7 @@ I’ll update the DSpace role in our Ansible infrastructure playbooks and ru
 Also, I’ll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month
 I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:
 "/>
-
+
diff --git a/docs/2018-10/index.html b/docs/2018-10/index.html
index 06fce9e4d..058612a6d 100644
--- a/docs/2018-10/index.html
+++ b/docs/2018-10/index.html
@@ -23,7 +23,7 @@ I created a GitHub issue to track this #389, because I’m super busy in Nai
 Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
 I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now
 "/>
-
+
diff --git a/docs/2018-11/index.html b/docs/2018-11/index.html
index 6f2bad265..b3e9778eb 100644
--- a/docs/2018-11/index.html
+++ b/docs/2018-11/index.html
@@ -33,7 +33,7 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list
 Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
 Today these are the top 10 IPs:
 "/>
-
+
diff --git a/docs/2018-12/index.html b/docs/2018-12/index.html
index 955209081..2bb878ce6 100644
--- a/docs/2018-12/index.html
+++ b/docs/2018-12/index.html
@@ -33,7 +33,7 @@ Then I ran all system updates and restarted the server
 I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
 "/>
-
+
diff --git a/docs/2019-01/index.html b/docs/2019-01/index.html
index f8d25ee7a..2a6eb93c2 100644
--- a/docs/2019-01/index.html
+++ b/docs/2019-01/index.html
@@ -47,7 +47,7 @@ I don’t see anything interesting in the web server logs around that time t
 357 207.46.13.1
 903 54.70.40.11
 "/>
-
+
diff --git a/docs/2019-02/index.html b/docs/2019-02/index.html
index 05118c2c6..2e340d993 100644
--- a/docs/2019-02/index.html
+++ b/docs/2019-02/index.html
@@ -69,7 +69,7 @@ real 0m19.873s
 user 0m22.203s
 sys 0m1.979s
 "/>
-
+
diff --git a/docs/2019-03/index.html b/docs/2019-03/index.html
index ca8f4abbe..a86d3684b 100644
--- a/docs/2019-03/index.html
+++ b/docs/2019-03/index.html
@@ -43,7 +43,7 @@ Most worryingly, there are encoding errors in the abstracts for eleven items, fo
 I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs
 "/>
-
+
diff --git a/docs/2019-04/index.html b/docs/2019-04/index.html
index 83caa3754..7faa5cf5f 100644
--- a/docs/2019-04/index.html
+++ b/docs/2019-04/index.html
@@ -61,7 +61,7 @@ $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u ds
 $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
 $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
 "/>
-
+
diff --git a/docs/2019-05/index.html b/docs/2019-05/index.html
index 9c20c4f58..4b62886e6 100644
--- a/docs/2019-05/index.html
+++ b/docs/2019-05/index.html
@@ -45,7 +45,7 @@ DELETE 1
 But after this I tried to delete the item from the XMLUI and it is still present…
 "/>
-
+
diff --git a/docs/2019-06/index.html b/docs/2019-06/index.html
index 097dd2e64..9b5aa570d 100644
--- a/docs/2019-06/index.html
+++ b/docs/2019-06/index.html
@@ -31,7 +31,7 @@ Run system updates on CGSpace (linode18) and reboot it
 Skype with Marie-Angélique and Abenet about CG Core v2
 "/>
-
+
diff --git a/docs/2019-07/index.html b/docs/2019-07/index.html
index 55db346a7..7bbac007e 100644
--- a/docs/2019-07/index.html
+++ b/docs/2019-07/index.html
@@ -35,7 +35,7 @@ CGSpace
 Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community
 "/>
-
+
diff --git a/docs/2019-08/index.html b/docs/2019-08/index.html
index f430c9068..6df28d9a8 100644
--- a/docs/2019-08/index.html
+++ b/docs/2019-08/index.html
@@ -43,7 +43,7 @@ After rebooting, all statistics cores were loaded… wow, that’s luck
 Run system updates on DSpace Test (linode19) and reboot it
 "/>
-
+
diff --git a/docs/2019-09/index.html b/docs/2019-09/index.html
index 84de320ce..086f36371 100644
--- a/docs/2019-09/index.html
+++ b/docs/2019-09/index.html
@@ -69,7 +69,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
 7249 2a01:7e00::f03c:91ff:fe18:7396
 9124 45.5.186.2
 "/>
-
+
diff --git a/docs/2019-10/index.html b/docs/2019-10/index.html
index ef1828b3e..312de5af4 100644
--- a/docs/2019-10/index.html
+++ b/docs/2019-10/index.html
@@ -15,7 +15,7 @@
-
+
diff --git a/docs/2019-11/index.html b/docs/2019-11/index.html
index db694cc8c..a23f4eeec 100644
--- a/docs/2019-11/index.html
+++ b/docs/2019-11/index.html
@@ -55,7 +55,7 @@ Let’s see how many of the REST API requests were for bitstreams (because t
 # zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
 106781
 "/>
-
+
diff --git a/docs/2019-12/index.html b/docs/2019-12/index.html
index 86ea90f9a..a0785ca17 100644
--- a/docs/2019-12/index.html
+++ b/docs/2019-12/index.html
@@ -43,7 +43,7 @@ Make sure all packages are up to date and the package manager is up to date, the
 # dpkg -C
 # reboot
 "/>
-
+
diff --git a/docs/2020-01/index.html b/docs/2020-01/index.html
index 0d4d46c85..e25f83a8f 100644
--- a/docs/2020-01/index.html
+++ b/docs/2020-01/index.html
@@ -53,7 +53,7 @@ I tweeted the CGSpace repository link
 "/>
-
+
diff --git a/docs/2020-02/index.html b/docs/2020-02/index.html
index 45d1f5783..42c9aff3b 100644
--- a/docs/2020-02/index.html
+++ b/docs/2020-02/index.html
@@ -35,7 +35,7 @@ The code finally builds and runs with a fresh install
 "/>
-
+
diff --git a/docs/2020-03/index.html b/docs/2020-03/index.html
index a536c0ff6..d47879cb6 100644
--- a/docs/2020-03/index.html
+++ b/docs/2020-03/index.html
@@ -39,7 +39,7 @@ You need to download this into the DSpace 6.x source and compile it
 "/>
-
+
diff --git a/docs/2020-04/index.html b/docs/2020-04/index.html
index 07bbe1d6c..b89ed4bb6 100644
--- a/docs/2020-04/index.html
+++ b/docs/2020-04/index.html
@@ -45,7 +45,7 @@ The third item now has a donut with score 1 since I tweeted it last week
 On the same note, the one item Abenet pointed out last week now has a donut with score of 104 after I tweeted it last week
 "/>
-
+
diff --git a/docs/2020-05/index.html b/docs/2020-05/index.html
index 8f25f64a3..b127c1d70 100644
--- a/docs/2020-05/index.html
+++ b/docs/2020-05/index.html
@@ -31,7 +31,7 @@ I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2
 "/>
-
+
diff --git a/docs/2020-06/index.html b/docs/2020-06/index.html
index 9c41e0a7c..5d61e3e65 100644
--- a/docs/2020-06/index.html
+++ b/docs/2020-06/index.html
@@ -19,7 +19,7 @@ I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Tes
-
+
@@ -33,7 +33,7 @@ I sent Atmire the dspace.log from today and told them to log into the server to
 In other news, I checked the statistics API on DSpace 6 and it’s working
 I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:
 "/>
-
+
@@ -43,9 +43,9 @@ I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Tes
 "@type": "BlogPosting",
 "headline": "June, 2020",
 "url": "https://alanorth.github.io/cgspace-notes/2020-06/",
- "wordCount": "3319",
+ "wordCount": "4764",
 "datePublished": "2020-06-01T13:55:39+03:00",
- "dateModified": "2020-06-23T16:13:27+03:00",
+ "dateModified": "2020-06-28T18:13:44+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -594,6 +594,191 @@ COPY 3917
+

2020-06-28

+ +

2020-06-29

+ +
+

Because coverage is so broadly defined, it is preferable to use the more specific subproperties Temporal Coverage and Spatial Coverage.

+
+ +
$ http 'https://api.crossref.org/funders?query=Bill+and+Melinda+Gates&mailto=a.orth@cgiar.org'
+
+
dspace=# \COPY (SELECT DISTINCT text_value FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=29) TO /tmp/2020-06-29-sponsors.csv;
+COPY 682
+
+
$ ./crossref-funders-lookup.py -i /tmp/2020-06-29-sponsors.csv -om /tmp/sponsors-matched.txt -or /tmp/sponsors-rejected.txt -d -e blah@blah.com
+$ wc -l /tmp/2020-06-29-sponsors.csv 
+682 /tmp/2020-06-29-sponsors.csv
+$ wc -l /tmp/sponsors-*
+  180 /tmp/sponsors-matched.txt
+  502 /tmp/sponsors-rejected.txt
+  682 total
+
+
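The `crossref-funders-lookup.py` run above splits the exported sponsors into matched and rejected lists by querying the Crossref funders API. That script's internals are not shown here, so the following is only a hypothetical Python sketch of the idea. It assumes that the `/funders` endpoint returns `message.items` entries carrying `name` and `alt-names` fields, and that a sponsor counts as matched only on an exact, case-insensitive name match. The helper names (`funder_query_url`, `has_exact_match`, `split_sponsors`) are illustrative. The `fetch` parameter can be replaced with a stub so the matching logic can be tried without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable, List, Tuple

CROSSREF_FUNDERS = "https://api.crossref.org/funders"


def funder_query_url(name: str, email: str) -> str:
    """Build a /funders search URL like the httpie query above (mailto identifies the client)."""
    return CROSSREF_FUNDERS + "?" + urllib.parse.urlencode({"query": name, "mailto": email})


def fetch_json(url: str) -> dict:
    """Fetch and decode a JSON response (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def has_exact_match(name: str, response: dict) -> bool:
    """True if `name` equals a returned funder's name or one of its alt-names (case-insensitive).

    Assumed response shape: {"message": {"items": [{"name": ..., "alt-names": [...]}, ...]}}
    """
    wanted = name.casefold()
    for item in response.get("message", {}).get("items", []):
        candidates = [item.get("name", "")] + list(item.get("alt-names", []))
        if any(candidate.casefold() == wanted for candidate in candidates):
            return True
    return False


def split_sponsors(names: Iterable[str], email: str,
                   fetch: Callable[[str], dict] = fetch_json) -> Tuple[List[str], List[str]]:
    """Partition sponsor names into (matched, rejected), one API query per non-blank name."""
    matched: List[str] = []
    rejected: List[str] = []
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines in the exported file
        if has_exact_match(name, fetch(funder_query_url(name, email))):
            matched.append(name)
        else:
            rejected.append(name)
    return matched, rejected
```

Because the exported file has one value per line, an open file handle can be passed directly as `names`. Note that the real script may apply a looser or stricter matching rule than the exact match assumed here.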

2020-06-30

+ +
$ cat /tmp/2020-06-30-remove-cip-subjects.csv 
+cg.subject.cip
+INTEGRATED PEST MANAGEMENT
+ORANGE FLESH SWEET POTATOES
+AEROPONICS
+FOOD SUPPLY
+SASHA
+SPHI
+INSECT LIFE CYCLE MODELLING
+SUSTAIN
+AGRICULTURAL INNOVATIONS
+NATIVE VARIETIES
+PHYTOPHTHORA INFESTANS
+$ ./delete-metadata-values.py -i /tmp/2020-06-30-remove-cip-subjects.csv -db dspace -u dspace -p 'fuuu' -f cg.subject.cip -m 127 -d
+
+
$ cat /tmp/2020-06-30-fix-cip-subjects.csv 
+cg.subject.cip,correct
+SWEET POTATOES,SWEETPOTATOES
+$ ./fix-metadata-values.py -i /tmp/2020-06-30-fix-cip-subjects.csv -db dspace -u dspace -p 'fuuu' -f cg.subject.cip -t correct -m 127 -d
+
+
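The `delete-metadata-values.py` and `fix-metadata-values.py` runs above take a CSV whose first column holds existing values (plus a `correct` column for fixes) and apply it to a single metadata field ID, for example `-m 127` for `cg.subject.cip`. The following is a hypothetical sketch of that idea, not the scripts' actual code. It uses Python's built-in `sqlite3` and a simplified two-column `metadatavalue` table as stand-ins for DSpace's PostgreSQL schema so it can run anywhere. Parameterized queries keep values that contain commas or apostrophes, such as the sponsor names in the next CSV, safe.

```python
import csv
import io
import sqlite3


def apply_corrections(conn: sqlite3.Connection, csv_text: str, field_id: int,
                      from_field: str, to_field: str) -> int:
    """Replace each `from_field` value with its `to_field` value; return rows changed."""
    changed = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        cursor = conn.execute(
            "UPDATE metadatavalue SET text_value = ? "
            "WHERE metadata_field_id = ? AND text_value = ?",
            (row[to_field], field_id, row[from_field]),
        )
        changed += cursor.rowcount
    conn.commit()
    return changed


def apply_deletions(conn: sqlite3.Connection, csv_text: str, field_id: int,
                    from_field: str) -> int:
    """Delete every value listed in `from_field` for one field; return rows deleted."""
    deleted = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        cursor = conn.execute(
            "DELETE FROM metadatavalue WHERE metadata_field_id = ? AND text_value = ?",
            (field_id, row[from_field]),
        )
        deleted += cursor.rowcount
    conn.commit()
    return deleted


if __name__ == "__main__":
    # Toy in-memory table standing in for DSpace's metadatavalue
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadatavalue (metadata_field_id INTEGER, text_value TEXT)")
    conn.executemany("INSERT INTO metadatavalue VALUES (?, ?)",
                     [(127, "SWEET POTATOES"), (127, "AEROPONICS"), (29, "Google.org")])
    print(apply_corrections(conn, "cg.subject.cip,correct\nSWEET POTATOES,SWEETPOTATOES\n",
                            127, "cg.subject.cip", "correct"))  # 1
    print(apply_deletions(conn, "cg.subject.cip\nAEROPONICS\n", 127, "cg.subject.cip"))  # 1
```

Scoping every statement to one `metadata_field_id` mirrors the `-m` flag. It prevents a value such as `SWEET POTATOES` from being rewritten in some unrelated field that happens to contain the same text.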
$ cat 2020-06-29-fix-sponsors.csv
+dc.description.sponsorship,correct
+"Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil","Conselho Nacional de Desenvolvimento Científico e Tecnológico"
+"Claussen Simon Stiftung","Claussen-Simon-Stiftung"
+"Fonds pour la formation á la Recherche dans l'Industrie et dans l'Agriculture, Belgium","Fonds pour la Formation à la Recherche dans l’Industrie et dans l’Agriculture"
+"Fundação de Amparo à Pesquisa do Estado de São Paulo, Brazil","Fundação de Amparo à Pesquisa do Estado de São Paulo"
+"Schlumberger Foundation Faculty for the Future","Schlumberger Foundation"
+"Wildlife Conservation Society, United States","Wildlife Conservation Society"
+"Portuguese Foundation for Science and Technology","Portuguese Science and Technology Foundation"
+"Wageningen University and Research","Wageningen University and Research Centre"
+"Leverhulme Centre for Integrative Research in Agriculture and Health","Leverhulme Centre for Integrative Research on Agriculture and Health"
+"Natural Science and Engineering Research Council of Canada","Natural Sciences and Engineering Research Council of Canada"
+"Biotechnology and Biological Sciences Research Council, United Kingdom","Biotechnology and Biological Sciences Research Council"
+"Home Grown Ceraels Authority United Kingdom","Home-Grown Cereals Authority"
+"Fiat Panis Foundation","Foundation fiat panis"
+"Defence Science and Technology Laboratory, United Kingdom","Defence Science and Technology Laboratory"
+"African Development Bank","African Development Bank Group"
+"Ministry of Health, Labour, and Welfare, Japan","Ministry of Health, Labour and Welfare"
+"World Academy of Sciences","The World Academy of Sciences"
+"Agricultural Research Council, South Africa","Agricultural Research Council"
+"Department of Homeland Security, USA","U.S. Department of Homeland Security"
+"Quadram Institute","Quadram Institute Bioscience"
+"Google.org","Google"
+"Department for Environment, Food and Rural Affairs, United Kingdom","Department for Environment, Food and Rural Affairs, UK Government"
+"National Commission for Science, Technology and Innovation, Kenya","National Commission for Science, Technology and Innovation"
+"Hainan Province Natural Science Foundation of China","Natural Science Foundation of Hainan Province"
+"German Society for International Cooperation (GIZ)","GIZ"
+"German Federal Ministry of Food and Agriculture","Federal Ministry of Food and Agriculture"
+"State Key Laboratory of Environmental Geochemistry, China","State Key Laboratory of Environmental Geochemistry"
+"QUT student scholarship","Queensland University of Technology"
+"Australia Centre for International Agricultural Research","Australian Centre for International Agricultural Research"
+"Belgian Science Policy","Belgian Federal Science Policy Office"
+"U.S. Department of Agriculture USDA","U.S. Department of Agriculture"
+"U.S.. Department of Agriculture (USDA)","U.S. Department of Agriculture"
+"Fundação de Amparo à Pesquisa do Estado de São Paulo ( FAPESP)","Fundação de Amparo à Pesquisa do Estado de São Paulo"
+"Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul, Brazil","Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul"
+"Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro, Brazil","Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro"
+"Swedish University of Agricultural Sciences (SLU)","Swedish University of Agricultural Sciences"
+"U.S. Department of Agriculture (USDA)","U.S. Department of Agriculture"
+"Swedish International Development Cooperation Agency (Sida)","Sida"
+"Swedish International Development Agency","Sida"
+"Federal Ministry for Economic Cooperation and Development, Germany","Federal Ministry for Economic Cooperation and Development"
+"Natural Environment Research Council, United Kingdom","Natural Environment Research Council"
+"Economic and Social Research Council, United Kingdom","Economic and Social Research Council"
+"Medical Research Council, United Kingdom","Medical Research Council"
+"Federal Ministry for Education and Research, Germany","Federal Ministry for Education, Science, Research and Technology"
+"UK Government’s Department for International Development","Department for International Development, UK Government"
+"Department for International Development, United Kingdom","Department for International Development, UK Government"
+"United Nations Children's Fund","United Nations Children's Emergency Fund"
+"Swedish Research Council for Environment, Agricultural Science and Spatial Planning","Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning"
+"Agence Nationale de la Recherche, France","French National Research Agency"
+"Fondation pour la recherche sur la biodiversité","Foundation for Research on Biodiversity"
+"Programa Nacional de Innovacion Agraria, Peru","Programa Nacional de Innovación Agraria, Peru"
+"United States Agency for International Development (USAID)","United States Agency for International Development"
+"West Africa Agricultural Productivity Programme","West Africa Agricultural Productivity Program"
+"West African Agricultural Productivity Project","West Africa Agricultural Productivity Program"
+"Rural Development Administration, Republic of Korea","Rural Development Administration"
+"UK’s Biotechnology and Biological Sciences Research Council (BBSRC)","Biotechnology and Biological Sciences Research Council"
+$ ./fix-metadata-values.py -i /tmp/2020-06-29-fix-sponsors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -t correct -m 29
+
+
$ export JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8"
+$ dspace metadata-export -i 10568/1 -f /tmp/ilri.csv
+$ csvcut -c 'id,cg.subject.ilri[],cg.subject.ilri[en_US],dc.subject[en_US]' /tmp/ilri.csv > /tmp/ilri-covid19.csv
+
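Without csvkit, the column selection done by `csvcut -c` above can be approximated with Python's standard `csv` module. This is a minimal sketch that assumes the metadata export has a header row containing the requested column names. `cut_columns` is a hypothetical helper that, unlike csvcut, reads the whole file into memory.

```python
import csv
import io
from typing import List


def cut_columns(csv_text: str, columns: List[str]) -> str:
    """Keep only `columns`, in the given order, similar to `csvcut -c a,b,c`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not in CSV header: {missing}")
    out = io.StringIO()
    # extrasaction="ignore" drops every column that was not requested
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

For example, passing the text of `/tmp/ilri.csv` (opened with `newline=""` and UTF-8 encoding) together with `["id", "cg.subject.ilri[]", "cg.subject.ilri[en_US]", "dc.subject[en_US]"]` should yield roughly the same four columns as the command above.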
diff --git a/docs/404.html b/docs/404.html
index d32928073..b75b1ade1 100644
--- a/docs/404.html
+++ b/docs/404.html
@@ -14,7 +14,7 @@
-
+
diff --git a/docs/categories/index.html b/docs/categories/index.html
index d42f5649c..c799ba7a6 100644
--- a/docs/categories/index.html
+++ b/docs/categories/index.html
@@ -9,31 +9,17 @@
-
+
-
+
-
-
+
+
@@ -94,304 +80,17 @@
-
-

June, 2020

- +

Notes

+
-

2020-06-01

- - Read more → -
- - - - - - -
-
-

May, 2020

- -
-

2020-05-02

- - Read more → -
- - - - - - -
-
-

April, 2020

- -
-

2020-04-02

- - Read more → -
- - - - - - -
-
-

March, 2020

- -
-

2020-03-02

- - Read more → -
- - - - - - -
-
-

February, 2020

- -
-

2020-02-02

- - Read more → -
- - - - - - -
-
-

January, 2020

- -
-

2020-01-06

- -

2020-01-07

- - Read more → -
- - - - - - -
-
-

December, 2019

- -
-

2019-12-01

- -
# apt update && apt full-upgrade
-# apt-get autoremove && apt-get autoclean
-# dpkg -C
-# reboot
-
- Read more → -
- - - - - - -
-
-

November, 2019

- -
-

2019-11-04

- -
# zcat --force /var/log/nginx/*access.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
-4671942
-# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
-1277694
-
-
# zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E "[0-9]{1,2}/Oct/2019"
-1183456 
-# zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
-106781
-
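The `zcat --force ... | grep -c` pipelines above count October 2019 requests in the nginx logs and then narrow that count to `/rest/bitstreams`. Assuming one request per log line and the same `[0-9]{1,2}/Oct/2019` date pattern, a rough Python equivalent might look like the following. `read_log_lines` and `count_requests` are illustrative names. The gzip magic-number check mimics `zcat --force`, which passes plain-text files through unchanged.

```python
import gzip
import re
from typing import Iterable, Iterator, Optional, Pattern

# Same pattern as the grep above: a one- or two-digit day followed by /Oct/2019
OCT_2019 = re.compile(r"[0-9]{1,2}/Oct/2019")


def read_log_lines(path: str) -> Iterator[str]:
    """Yield lines from a log file, gunzipping only if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        yield from fh


def count_requests(lines: Iterable[str], date_re: Pattern[str] = OCT_2019,
                   path_substring: Optional[str] = None) -> int:
    """Count lines matching the date pattern and, optionally, a request path substring."""
    return sum(
        1
        for line in lines
        if date_re.search(line) and (path_substring is None or path_substring in line)
    )
```

Summing `count_requests(read_log_lines(p), path_substring="/rest/bitstreams")` over `glob.glob("/var/log/nginx/rest.log.*.gz")` should correspond to the second pipeline. Like grep, this counts matching lines rather than parsing the log format.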
- Read more → -
- - - - - - -
-
-

CGSpace CG Core v2 Migration

- -
-

Possible changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2.

-

With reference to CG Core v2 draft standard by Marie-Angélique as well as DCMI DCTERMS.

- Read more → -
- - - - - - -
-
-

October, 2019

- -
- 2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script’s “unnecessary Unicode” fix: $ csvcut -c 'id,dc. - Read more → -
-
-
-
-
-
-
+ Read more →
+
+
+
+
diff --git a/docs/categories/notes/index.html b/docs/categories/notes/index.html
index 1258d54ad..2e163fbe8 100644
--- a/docs/categories/notes/index.html
+++ b/docs/categories/notes/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
index aa805359b..849490120 100644
--- a/docs/categories/notes/page/2/index.html
+++ b/docs/categories/notes/page/2/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
index 4036cefe9..4aeb51860 100644
--- a/docs/categories/notes/page/3/index.html
+++ b/docs/categories/notes/page/3/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
index 6d3a2b9a1..2d7787f95 100644
--- a/docs/categories/notes/page/4/index.html
+++ b/docs/categories/notes/page/4/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/cgiar-library-migration/index.html b/docs/cgiar-library-migration/index.html
index c8cd0a02e..86a695c0a 100644
--- a/docs/cgiar-library-migration/index.html
+++ b/docs/cgiar-library-migration/index.html
@@ -15,7 +15,7 @@
-
+
diff --git a/docs/cgspace-cgcorev2-migration/index.html b/docs/cgspace-cgcorev2-migration/index.html
index 4d74b84f5..b065f0d96 100644
--- a/docs/cgspace-cgcorev2-migration/index.html
+++ b/docs/cgspace-cgcorev2-migration/index.html
@@ -15,7 +15,7 @@
-
+
diff --git a/docs/index.html b/docs/index.html
index ff916230d..6173f42a8 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/page/2/index.html b/docs/page/2/index.html
index 59c42a50b..ee1aeb4ed 100644
--- a/docs/page/2/index.html
+++ b/docs/page/2/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/page/3/index.html b/docs/page/3/index.html
index 7027de137..90c76cadd 100644
--- a/docs/page/3/index.html
+++ b/docs/page/3/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/page/4/index.html b/docs/page/4/index.html
index d8ee6cb49..7e65c5178 100644
--- a/docs/page/4/index.html
+++ b/docs/page/4/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/page/5/index.html b/docs/page/5/index.html
index 441c0f5dd..918015225 100644
--- a/docs/page/5/index.html
+++ b/docs/page/5/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/page/6/index.html b/docs/page/6/index.html
index c945dffc4..c870329fa 100644
--- a/docs/page/6/index.html
+++ b/docs/page/6/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/posts/index.html b/docs/posts/index.html
index 8ceb2fcab..bec4b1708 100644
--- a/docs/posts/index.html
+++ b/docs/posts/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/posts/page/2/index.html b/docs/posts/page/2/index.html
index 6f4e5a4df..9049e4554 100644
--- a/docs/posts/page/2/index.html
+++ b/docs/posts/page/2/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/posts/page/3/index.html b/docs/posts/page/3/index.html
index 157f0217a..289ad9f09 100644
--- a/docs/posts/page/3/index.html
+++ b/docs/posts/page/3/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/posts/page/4/index.html b/docs/posts/page/4/index.html
index b63b7ecc5..6905b7a1a 100644
--- a/docs/posts/page/4/index.html
+++ b/docs/posts/page/4/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/posts/page/5/index.html b/docs/posts/page/5/index.html
index d17b49ffa..0cdda3905 100644
--- a/docs/posts/page/5/index.html
+++ b/docs/posts/page/5/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/posts/page/6/index.html b/docs/posts/page/6/index.html
index 9fdd4a120..620b638f2 100644
--- a/docs/posts/page/6/index.html
+++ b/docs/posts/page/6/index.html
@@ -9,12 +9,12 @@
-
+
-
+
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index ba61a3106..3f74095be 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,27 +4,27 @@
 https://alanorth.github.io/cgspace-notes/categories/
- 2020-06-23T16:13:27+03:00
+ 2020-06-28T18:13:44+03:00
 https://alanorth.github.io/cgspace-notes/
- 2020-06-23T16:13:27+03:00
+ 2020-06-28T18:13:44+03:00
 https://alanorth.github.io/cgspace-notes/2020-06/
- 2020-06-23T16:13:27+03:00
+ 2020-06-28T18:13:44+03:00
 https://alanorth.github.io/cgspace-notes/categories/notes/
- 2020-06-23T16:13:27+03:00
+ 2020-06-28T18:13:44+03:00
 https://alanorth.github.io/cgspace-notes/posts/
- 2020-06-23T16:13:27+03:00
+ 2020-06-28T18:13:44+03:00
diff --git a/docs/tags/index.html b/docs/tags/index.html
index 77b939c23..08a8c4447 100644
--- a/docs/tags/index.html
+++ b/docs/tags/index.html
@@ -14,26 +14,12 @@
-
+
-
-
+
+
@@ -94,304 +80,31 @@
-
-

June, 2020

- +

Migration

+
-

2020-06-01

-
    -
  • I tried to run the AtomicStatisticsUpdateCLI CUA migration script on DSpace Test (linode26) again and it is still going very slowly and has tons of errors like I noticed yesterday -
      -
    • I sent Atmire the dspace.log from today and told them to log into the server to debug the process
    • -
    -
  • -
  • In other news, I checked the statistics API on DSpace 6 and it’s working
  • -
  • I tried to build the OAI registry on the freshly migrated DSpace 6 on DSpace Test and I get an error:
  • -
- Read more → -
- - - - - - -
-
-

May, 2020

- -
-

2020-05-02

-
    -
  • Peter said that CTA is having problems submitting an item to CGSpace -
      -
    • Looking at the PostgreSQL stats it seems to be the same issue that Tezira was having last week, as I see the number of connections in ‘idle in transaction’ and ‘waiting for lock’ state are increasing again
    • -
    • I see that CGSpace (linode18) is still using PostgreSQL JDBC driver version 42.2.11, and there were some bugs related to transactions fixed in 42.2.12 (which I had updated in the Ansible playbooks, but not deployed yet)
    • -
    -
  • -
- Read more → -
- - - - - - -
-
-

April, 2020

- -
-

2020-04-02

-
    -
  • Maria asked me to update Charles Staver’s ORCID iD in the submission template and on CGSpace, as his name was lower case before, and now he has corrected it -
      -
    • I updated the fifty-eight existing items on CGSpace
    • -
    -
  • -
  • Looking into the items Udana had asked about last week that were missing Altmetric donuts: - -
  • -
  • On the same note, the one item Abenet pointed out last week now has a donut with score of 104 after I tweeted it last week
  • -
- Read more → -
- - - - - - - - - - - - - -
-
-

February, 2020

- -
-

2020-02-02

-
    -
  • Continue working on porting CGSpace’s DSpace 5 code to DSpace 6.3 that I started yesterday -
      -
    • Sign up for an account with MaxMind so I can get the GeoLite2-City.mmdb database
    • -
    • I still need to wire up the API credentials and cron job into the Ansible infrastructure playbooks
    • -
    • Fix some minor issues in the config and XMLUI themes, like removing Atmire stuff
    • -
    • The code finally builds and runs with a fresh install
    • -
    -
  • -
- Read more → -
- - - - - - -
-
-

January, 2020

- -
-

2020-01-06

-
    -
  • Open a ticket with Atmire to request a quote for the upgrade to DSpace 6
  • -
  • Last week Altmetric responded about the item that had a lower score than its DOI -
      -
    • The score is now linked to the DOI
    • -
    • Another item that had the same problem in 2019 has now also linked to the score for its DOI
    • -
    • Another item that had the same problem in 2019 has also been fixed
    • -
    -
  • -
-

2020-01-07

-
    -
  • Peter Ballantyne highlighted one more WLE item that is missing the Altmetric score that its DOI has -
      -
    • The DOI has a score of 259, but the Handle has no score at all
    • -
    • I tweeted the CGSpace repository link
    • -
    -
  • -
- Read more → -
- - - - - - -
-
-

December, 2019

- -
-

2019-12-01

-
    -
  • Upgrade CGSpace (linode18) to Ubuntu 18.04: -
      -
    • Check any packages that have residual configs and purge them:
    • -
    • # dpkg -l | grep -E '^rc' | awk '{print $2}' | xargs dpkg -P
    • -
    • Make sure all packages are up to date and the package manager is up to date, then reboot:
    • -
    -
  • -
-
# apt update && apt full-upgrade
-# apt-get autoremove && apt-get autoclean
-# dpkg -C
-# reboot
-
- Read more → -
- - - - - - -
-
-

November, 2019

- -
-

2019-11-04

-
    -
  • Peter noticed that there were 5.2 million hits on CGSpace in 2019-10 according to the Atmire usage statistics -
      -
    • I looked in the nginx logs and see 4.6 million in the access logs, and 1.2 million in the API logs:
    • -
    -
  • -
-
# zcat --force /var/log/nginx/*access.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
-4671942
-# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
-1277694
-
    -
  • So 4.6 million from XMLUI and another 1.2 million from API requests
  • -
  • Let’s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):
  • -
-
# zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E "[0-9]{1,2}/Oct/2019"
-1183456 
-# zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
-106781
-
- Read more → -
- - - - - - - - - - - - - -
-
-

October, 2019

- -
- 2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script’s “unnecessary Unicode” fix: $ csvcut -c 'id,dc. - Read more → -
-
-
-
-
-
-
+ Read more →
+
+
+
+
diff --git a/docs/tags/migration/index.html b/docs/tags/migration/index.html
index 8d7d0f6d4..79f7c8326 100644
--- a/docs/tags/migration/index.html
+++ b/docs/tags/migration/index.html
@@ -14,7 +14,7 @@
-
+
diff --git a/docs/tags/notes/index.html b/docs/tags/notes/index.html
index 367ae0aeb..97b485c50 100644
--- a/docs/tags/notes/index.html
+++ b/docs/tags/notes/index.html
@@ -14,7 +14,7 @@
-
+
diff --git a/docs/tags/notes/page/2/index.html b/docs/tags/notes/page/2/index.html
index 5622c5f35..cc6e18380 100644
--- a/docs/tags/notes/page/2/index.html
+++ b/docs/tags/notes/page/2/index.html
@@ -14,7 +14,7 @@
-
+
diff --git a/docs/tags/notes/page/3/index.html b/docs/tags/notes/page/3/index.html
index f128f61e5..33335a91a 100644
--- a/docs/tags/notes/page/3/index.html
+++ b/docs/tags/notes/page/3/index.html
@@ -14,7 +14,7 @@
-
+