mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-03-04
@ -26,7 +26,7 @@ I created a GitHub issue to track this #389, because I’m super busy in Nai
Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now
"/>
<meta name="generator" content="Hugo 0.92.2" />
<meta name="generator" content="Hugo 0.93.1" />
@ -121,7 +121,7 @@ I created a GitHub issue to track this #389, because I’m super busy in Nai
<ul>
<li>I see Moayad was busy collecting item views and downloads from CGSpace yesterday:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
933 40.77.167.90
971 95.108.181.88
1043 41.204.190.40
@ -135,13 +135,13 @@ I created a GitHub issue to track this #389, because I’m super busy in Nai
</code></pre><ul>
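<li>For reference, the per-IP count above translates to a few lines of Python. This is a minimal sketch that assumes nginx’s default “combined” log format (client IP in the first field). The helper names <code>read_log_lines</code> and <code>top_ips</code> and the sample lines are hypothetical, and unlike <code>zcat --force</code> the reader only decompresses files whose names end in <code>.gz</code>.</li>
</ul>

```python
import gzip
import re
from collections import Counter


def read_log_lines(paths):
    """Yield lines from plain or gzip-compressed log files (not used in the demo below)."""
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as handle:
            yield from handle


def top_ips(lines, date_pattern, n=10):
    """Count requests per client IP (first field) on lines matching date_pattern,
    roughly: grep -E DATE | awk '{print $1}' | sort | uniq -c | sort -n | tail -n N"""
    date_re = re.compile(date_pattern)
    counts = Counter(line.split()[0] for line in lines if date_re.search(line))
    # most_common() lists the busiest IP first; reverse it to match `sort -n | tail`
    return list(reversed(counts.most_common(n)))


# Hypothetical sample lines in the combined log format
SAMPLE = [
    '40.77.167.90 - - [02/Oct/2018:10:00:01 +0000] "GET /handle/10568/1 HTTP/1.1" 200 512 "-" "bingbot"',
    '41.204.190.40 - - [02/Oct/2018:10:00:02 +0000] "GET /rest/items HTTP/1.1" 200 1024 "-" "curl"',
    '41.204.190.40 - - [02/Oct/2018:10:00:03 +0000] "GET /rest/items HTTP/1.1" 500 0 "-" "curl"',
    '95.108.181.88 - - [01/Oct/2018:23:59:59 +0000] "GET / HTTP/1.1" 200 256 "-" "YandexBot"',
]

for ip, count in top_ips(SAMPLE, r"02/Oct/2018"):
    print(count, ip)
```

<ul>
<li>On the server the input could come from something like <code>read_log_lines(glob.glob("/var/log/nginx/*.log"))</code>. The <code>Counter</code> replaces <code>sort | uniq -c</code>, so the whole log never has to be sorted.</li>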
<li>Of those, about 20% were HTTP 500 responses (!):</li>
</ul>
<pre tabindex="0"><code>$ zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Oct/2018" | grep 34.218.226.147 | awk '{print $9}' | sort -n | uniq -c
118927 200
31435 500
</code></pre><ul>
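<li>To put a number on “about 20%”: 31435 / (118927 + 31435) = 31435 / 150362 ≈ 20.9%. The sketch below computes the same tally and share in Python. It assumes the combined log format, where the status code is awk’s <code>$9</code>, and <code>status_counts</code> and <code>share</code> are hypothetical helpers. It also compares the client IP field exactly, whereas <code>grep 34.218.226.147</code> matches the string anywhere in the line and treats each <code>.</code> as a wildcard.</li>
</ul>

```python
from collections import Counter


def status_counts(lines, ip):
    """Tally HTTP status codes for one client IP.
    In the combined log format the status is the 9th whitespace-separated
    field, i.e. awk's $9 (index 8 here)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 9 and fields[0] == ip:
            counts[fields[8]] += 1
    return counts


def share(counts, status):
    """Fraction of all counted requests that returned `status`."""
    total = sum(counts.values())
    return counts[status] / total if total else 0.0


# The totals from the output above
oct_02 = Counter({"200": 118927, "500": 31435})
print(f"{share(oct_02, '500'):.1%}")  # 20.9%
```

<ul>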
<li>I added Phil Thornton and Sonal Henson’s ORCID identifiers to the controlled vocabulary for <code>cg.creator.orcid</code> and then re-generated the names using my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
</ul>
<pre tabindex="0"><code>$ grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml | sort | uniq > 2018-10-03-orcids.txt
$ ./resolve-orcids.py -i 2018-10-03-orcids.txt -o 2018-10-03-names.txt -d
</code></pre><ul>
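<li>The <code>grep -oE</code> pattern above is loose. <code>[A-Z0-9]{4}</code> also accepts letters, while a real ORCID iD is 15 digits plus a check character (a digit or <code>X</code>) computed with ISO 7064 MOD 11-2. The minimal Python sketch below does the same extraction and then validates the check character. <code>extract_orcids</code> and <code>orcid_checksum_ok</code> are hypothetical helpers, not part of resolve-orcids.py.</li>
</ul>

```python
import re

# Same pattern as the grep -oE command above
ORCID_RE = re.compile(r"[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}")


def extract_orcids(text):
    """Unique, sorted identifiers, like grep -oE PATTERN | sort | uniq."""
    return sorted(set(ORCID_RE.findall(text)))


def orcid_checksum_ok(orcid):
    """Check the final character using ISO 7064 MOD 11-2, as ORCID does."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected


text = """Philip Thornton: 0000-0002-1854-0182
Sonal Henson: 0000-0002-2002-5462
Philip Thornton: 0000-0002-1854-0182"""
for orcid in extract_orcids(text):
    print(orcid, orcid_checksum_ok(orcid))
```

<ul>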
<li>I found a new corner case error that I need to check, given <em>and</em> family names deactivated:</li>
@ -154,7 +154,7 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
<li>Linode sent another alert about CPU usage on CGSpace (linode18) this evening</li>
<li>It seems that Moayad is making quite a lot of requests today:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "03/Oct/2018" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
1594 157.55.39.160
1627 157.55.39.173
1774 136.243.6.84
@ -169,13 +169,13 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
<li>But in super positive news, he says they are using my new <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> and it’s MUCH faster than using Atmire CUA’s internal “restlet” API</li>
<li>I don’t recognize the <code>138.201.49.199</code> IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:</li>
</ul>
<pre tabindex="0"><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /[a-z]+' | sort | uniq -c
8324 GET /bitstream
4193 GET /handle
</code></pre><ul>
<li>Suspiciously, it’s only grabbing the CGIAR System Office community (handle prefix 10947):</li>
</ul>
<pre tabindex="0"><code># grep 138.201.49.199 /var/log/nginx/access.log | grep -o -E 'GET /handle/[0-9]{5}' | sort | uniq -c
7 GET /handle/10568
4186 GET /handle/10947
</code></pre><ul>
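<li>Both <code>grep -o -E</code> counts above can be produced with one small Python helper. This is a sketch with hypothetical sample lines and a hypothetical <code>count_matches</code> function. <code>findall()</code> mirrors <code>grep -o</code>, which prints every match on a line. <code>ip in line</code> is a literal substring test, whereas the dots in <code>grep 138.201.49.199</code> are regex wildcards.</li>
</ul>

```python
import re
from collections import Counter

PATH_RE = re.compile(r"GET /[a-z]+")
HANDLE_RE = re.compile(r"GET /handle/[0-9]{5}")


def count_matches(lines, ip, pattern):
    """Like: grep IP access.log | grep -o -E PATTERN | sort | uniq -c
    (grep -o prints every match on a line, so findall() is used here)."""
    counts = Counter()
    for line in lines:
        if ip in line:
            counts.update(pattern.findall(line))
    return counts


# Hypothetical access log lines
SAMPLE = [
    '138.201.49.199 - - [03/Oct/2018:10:00:00 +0000] "GET /bitstream/handle/10947/1/a.pdf HTTP/1.1" 200 1 "-" "-"',
    '138.201.49.199 - - [03/Oct/2018:10:00:01 +0000] "GET /handle/10947/4453 HTTP/1.1" 200 1 "-" "-"',
    '138.201.49.199 - - [03/Oct/2018:10:00:02 +0000] "GET /handle/10568/1 HTTP/1.1" 200 1 "-" "-"',
    '40.77.167.90 - - [03/Oct/2018:10:00:03 +0000] "GET /handle/10568/2 HTTP/1.1" 200 1 "-" "-"',
]

print(count_matches(SAMPLE, "138.201.49.199", PATH_RE))
print(count_matches(SAMPLE, "138.201.49.199", HANDLE_RE))
```

<ul>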
@ -187,19 +187,19 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
<li>I looked in Solr’s statistics core and these hits were actually all counted as <code>isBot:false</code> (of course)… hmm</li>
<li>I tagged all of Sonal and Phil’s items with their ORCID identifiers on CGSpace using my <a href="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers.py</a> script:</li>
</ul>
<pre tabindex="0"><code>$ ./add-orcid-identifiers-csv.py -i 2018-10-03-add-orcids.csv -db dspace -u dspace -p 'fuuu'
</code></pre><ul>
<li>Where <code>2018-10-03-add-orcids.csv</code> contained:</li>
</ul>
<pre tabindex="0"><code>dc.contributor.author,cg.creator.id
"Henson, Sonal P.",Sonal Henson: 0000-0002-2002-5462
"Henson, S.",Sonal Henson: 0000-0002-2002-5462
"Thornton, P.K.",Philip Thornton: 0000-0002-1854-0182
"Thornton, Philip K",Philip Thornton: 0000-0002-1854-0182
"Thornton, Phil",Philip Thornton: 0000-0002-1854-0182
"Thornton, Philip K.",Philip Thornton: 0000-0002-1854-0182
"Thornton, Phillip",Philip Thornton: 0000-0002-1854-0182
"Thornton, Phillip K.",Philip Thornton: 0000-0002-1854-0182
</code></pre><h2 id="2018-10-04">2018-10-04</h2>
<ul>
<li>Salem raised an issue that the dspace-statistics-api reports downloads for some items that have no bitstreams (like many limited access items)</li>
@ -214,7 +214,7 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
<li>So it’s fixed, but I’m not sure why!</li>
<li>Peter wants to know the number of API requests per month, which was about 250,000 in September (excluding statlet requests):</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/{oai,rest}.log* | grep -E 'Sep/2018' | grep -c -v 'statlets'
251226
</code></pre><ul>
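<li>The monthly figure is just a filtered line count, so the same count is easy to reproduce in Python. In this sketch <code>count_api_requests</code> and the sample lines (including the statlets request path) are hypothetical. On the server, the input would be the lines of <code>/var/log/nginx/oai.log*</code> and <code>rest.log*</code>, with the rotated <code>.gz</code> files decompressed. Like the shell version, it counts request lines, not distinct clients.</li>
</ul>

```python
import re


def count_api_requests(lines, month=r"Sep/2018", exclude="statlets"):
    """Like: grep -E 'Sep/2018' | grep -c -v 'statlets'"""
    month_re = re.compile(month)
    return sum(1 for line in lines if month_re.search(line) and exclude not in line)


# Hypothetical lines from rest.log
SAMPLE = [
    '1.2.3.4 - - [01/Sep/2018:00:00:01 +0000] "GET /rest/items HTTP/1.1" 200 1 "-" "-"',
    '1.2.3.4 - - [01/Sep/2018:00:00:02 +0000] "GET /rest/statlets HTTP/1.1" 200 1 "-" "-"',
    '1.2.3.4 - - [01/Oct/2018:00:00:03 +0000] "GET /rest/items HTTP/1.1" 200 1 "-" "-"',
]
print(count_api_requests(SAMPLE))
```

<ul>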
<li>I found a logic error in the dspace-statistics-api <code>indexer.py</code> script that was causing item views to be inserted into downloads</li>
@ -243,7 +243,7 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
<li>When I tried to force them to be generated I got an error that I’ve never seen before:</li>
</ul>
<pre tabindex="0"><code>$ dspace filter-media -v -f -i 10568/97613
org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.core.CommandException: identify: not authorized `/tmp/impdfthumb5039464037201498062.pdf' @ error/constitute.c/ReadImage/412.
</code></pre><ul>
<li>I see there was an update to Ubuntu’s ImageMagick on 2018-10-05, so maybe something changed or broke?</li>
<li>I get the same error when forcing <code>filter-media</code> to run on DSpace Test too, so it’s gotta be an ImageMagick bug</li>
@ -251,7 +251,7 @@ org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.c
<li>Wow, someone on <a href="https://twitter.com/rosscampbell/status/1048268966819319808">Twitter posted about this breaking his web application</a> (and it was retweeted by the ImageMagick account!)</li>
<li>I commented out the line that disables PDF thumbnails in <code>/etc/ImageMagick-6/policy.xml</code>:</li>
</ul>
<pre tabindex="0"><code> <!--<policy domain="coder" rights="none" pattern="PDF" />-->
</code></pre><ul>
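<li>Python’s standard <code>xml.etree.ElementTree</code> can read the policies to check which formats a given policy file blocks (for example after an ImageMagick package update, or from an Ansible check). This sketch builds a hypothetical policy map in memory instead of reading <code>/etc/ImageMagick-6/policy.xml</code>, and <code>disabled_coders</code> is a hypothetical helper. XML comments are not elements, which is why commenting out the PDF line is enough to lift the restriction.</li>
</ul>

```python
import xml.etree.ElementTree as ET


def disabled_coders(root):
    """Return the patterns of coder policies with rights="none".
    XML comments are not elements, so a commented-out policy is not returned."""
    return [
        policy.get("pattern")
        for policy in root.iter("policy")
        if policy.get("domain") == "coder" and policy.get("rights") == "none"
    ]


# Hypothetical policy map; on a server, something like
# ET.parse("/etc/ImageMagick-6/policy.xml").getroot() could be used instead.
root = ET.Element("policymap")
ET.SubElement(root, "policy", domain="resource", name="memory", value="256MiB")
ET.SubElement(root, "policy", domain="coder", rights="none", pattern="PS")
ET.SubElement(root, "policy", domain="coder", rights="none", pattern="PDF")
print(disabled_coders(root))  # ['PS', 'PDF']
```

<ul>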
<li>This works, but I’m not sure what ImageMagick’s long-term plan is if they are going to disable ALL image formats…</li>
<li>I suppose I need to enable a workaround for this in Ansible?</li>
@ -274,9 +274,9 @@ $ sudo podman create --name dspacedb -v /home/aorth/.local/lib/containers/volume
$ sudo podman start dspacedb
$ createuser -h localhost -U postgres --pwprompt dspacetest
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/cgspace_2018-10-11.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
</code></pre><ul>
<li>I tried to make an Artifactory in podman, but it seems to have problems because Artifactory is distributed on the Bintray repository</li>
@ -311,7 +311,7 @@ COPY 10000
</code></pre><ul>
<li>Then I exported and applied them on my local test server:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i 2018-10-11-top-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t CORRECT -m 3
</code></pre><ul>
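<li>These notes do not show the internals of fix-metadata-values.py, so the following is only a hypothetical sketch of the idea behind the <code>-f</code>, <code>-t</code> and <code>-m</code> options. It treats the <code>-f</code> column as the current value and the <code>-t</code> column as the correction, and turns each changed row into parameters for an UPDATE on metadata field <code>-m</code>. <code>UPDATE_SQL</code>, its <code>resource_type_id=2</code> (items) filter, and <code>planned_updates</code> are assumptions, not the script’s actual code.</li>
</ul>

```python
import csv
import io

# Hypothetical statement; %s placeholders are psycopg2's parameter style.
UPDATE_SQL = (
    "UPDATE metadatavalue SET text_value=%s "
    "WHERE resource_type_id=2 AND metadata_field_id=%s AND text_value=%s"
)


def planned_updates(csv_file, from_field, to_field, field_id):
    """Yield (new_value, field_id, old_value) tuples for UPDATE_SQL, skipping
    rows with an empty correction or one identical to the current value."""
    for row in csv.DictReader(csv_file):
        old = row[from_field]
        new = (row.get(to_field) or "").strip()
        if new and new != old:
            yield (new, field_id, old)


# Hypothetical corrections file in the shape the -f/-t options suggest
corrections = io.StringIO(
    "dc.contributor.author,CORRECT\n"
    '"Thornton, P.K.","Thornton, Philip K."\n'
    '"Henson, S.",\n'
    '"Henson, Sonal P.","Henson, Sonal P."\n'
)
for params in planned_updates(corrections, "dc.contributor.author", "CORRECT", 3):
    print(params)
```

<ul>
<li>With psycopg2, each tuple could then be applied with <code>cursor.execute(UPDATE_SQL, params)</code> inside a transaction.</li>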
<li>I will apply these on CGSpace when I do the other updates tomorrow, as well as double check the high scoring ones to see if they are correct in Sisay’s author controlled vocabulary</li>
</ul>
@ -321,7 +321,7 @@ COPY 10000
<li>Switch to new CGIAR LDAP server on CGSpace, as it’s been running (at least for authentication) on DSpace Test for the last few weeks, and I think the old one will be deprecated soon (today?)</li>
<li>Apply Peter’s 746 author corrections on CGSpace and DSpace Test using my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a> script:</li>
</ul>
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-10-11-top-authors.csv -f dc.contributor.author -t CORRECT -m 3 -db dspace -u dspace -p 'fuuu'
</code></pre><ul>
<li>Run all system updates on CGSpace (linode18) and reboot the server</li>
<li>After rebooting the server I noticed that Handles are not resolving, and the <code>dspace-handle-server</code> systemd service is not running (or rather, it exited with success)</li>
@ -356,20 +356,20 @@ COPY 10000
$ sudo docker run --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:9.6-alpine
$ createuser -h localhost -U postgres --pwprompt dspacetest
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/cgspace_2018-10-11.backup
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
</code></pre><h2 id="2018-10-16">2018-10-16</h2>
<ul>
<li>Generate a list of the schema on CGSpace so CodeObia can compare with MELSpace:</li>
</ul>
<pre tabindex="0"><code>dspace=# \copy (SELECT (CASE when metadata_schema_id=1 THEN 'dc' WHEN metadata_schema_id=2 THEN 'cg' END) AS schema, element, qualifier, scope_note FROM metadatafieldregistry where metadata_schema_id IN (1,2)) TO /tmp/cgspace-schema.csv WITH CSV HEADER;
</code></pre><ul>
<li>Talking to the CodeObia guys about the REST API I started to wonder why it’s so slow and how I can quantify it in order to ask the dspace-tech mailing list for help profiling it</li>
<li>Interestingly, the speed doesn’t get better after you request the same thing multiple times; it’s consistently bad on both CGSpace and DSpace Test!</li>
</ul>
<pre tabindex="0"><code>$ time http --print h 'https://cgspace.cgiar.org/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0'
...
0.35s user 0.06s system 1% cpu 25.133 total
0.31s user 0.04s system 1% cpu 25.223 total
@ -377,7 +377,7 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser
0.20s user 0.05s system 1% cpu 23.838 total
0.30s user 0.05s system 1% cpu 24.301 total

$ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0'
...
0.22s user 0.03s system 1% cpu 17.248 total
0.23s user 0.02s system 1% cpu 16.856 total
@ -389,7 +389,7 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
<li>I wonder if the Java garbage collector is important here, or if there are missing indexes in PostgreSQL?</li>
<li>I switched DSpace Test to the G1GC garbage collector and tried again and now the results are worse!</li>
</ul>
<pre tabindex="0"><code>$ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0'
...
0.20s user 0.03s system 0% cpu 25.017 total
0.23s user 0.02s system 1% cpu 23.299 total
@ -399,7 +399,7 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
</code></pre><ul>
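<li>To quantify the slowness for the dspace-tech mailing list, it helps to repeat each request several times and report the spread rather than single runs. Below is a small Python timing sketch. <code>time_calls</code> and <code>summarize</code> are hypothetical helpers, and the demo times a trivial function instead of calling the REST API. For a real measurement, <code>func</code> could be something like <code>lambda: urllib.request.urlopen(url).read()</code>.</li>
</ul>

```python
import statistics
import time


def time_calls(func, repeats=5):
    """Call func() `repeats` times and return the wall-clock durations in
    seconds, like running `time http ...` several times in a row."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Min, median and max of a list of durations."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# The two G1GC totals recorded above for DSpace Test, in seconds
print(summarize([25.017, 23.299]))
print(summarize(time_calls(lambda: sum(range(1000)), repeats=3)))
```

<ul>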
<li>If I make a request without the expands it is about ten times faster:</li>
</ul>
<pre tabindex="0"><code>$ time http --print h 'https://dspacetest.cgiar.org/rest/items?limit=100&offset=0'
...
0.20s user 0.03s system 7% cpu 3.098 total
0.22s user 0.03s system 8% cpu 2.896 total
@ -414,29 +414,29 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
<li>Most of them are from Bioversity, and I asked Maria for permission before updating them</li>
<li>I manually went through and looked at the existing values and updated them in several batches:</li>
</ul>
<pre tabindex="0"><code>UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%CC BY %';
UPDATE metadatavalue SET text_value='CC-BY-NC-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-ND%' AND text_value LIKE '%by-nc-nd%';
UPDATE metadatavalue SET text_value='CC-BY-NC-SA-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-SA%' AND text_value LIKE '%by-nc-sa%';
UPDATE metadatavalue SET text_value='CC-BY-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value LIKE '%/by/%';
UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%/by/%' AND text_value NOT LIKE '%zero%';
UPDATE metadatavalue SET text_value='CC-BY-NC-2.5' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%/by-nc%' AND text_value LIKE '%2.5%';
UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%/by-nc%' AND text_value LIKE '%4.0%';
UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%Attribution %' AND text_value NOT LIKE '%zero%';
UPDATE metadatavalue SET text_value='CC-BY-NC-SA-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value NOT LIKE '%zero%' AND text_value LIKE '%4.0%' AND text_value LIKE '%Attribution-NonCommercial-ShareAlike%';
UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution-NonCommercial %';
UPDATE metadatavalue SET text_value='CC-BY-NC-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution-NonCommercial %';
UPDATE metadatavalue SET text_value='CC-BY-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution %';
UPDATE metadatavalue SET text_value='CC-BY-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78184;
UPDATE metadatavalue SET text_value='CC-BY' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value NOT LIKE '%zero%' AND text_value NOT LIKE '%CC0%' AND text_value LIKE '%Attribution %' AND text_value NOT LIKE '%CC-%';
UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78564;
</code></pre><ul>
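<li>Each UPDATE above rewrites a value to a short identifier that the later patterns no longer match. Running them in sequence should therefore behave like an ordered, first-match rule list. The Python sketch below implements that idea and is useful for previewing the effect on an exported CSV before touching the database. It covers only a subset of the statements, leaving out the <code>CC BY</code>, Attribution and per-item rules, and <code>RULES</code> and <code>normalize_license</code> are hypothetical. The substring tests are case-sensitive, like PostgreSQL’s <code>LIKE</code>.</li>
</ul>

```python
# (result, substrings that must all be present, substrings that must be absent),
# checked in the same order as the corresponding UPDATE statements above.
RULES = [
    ("CC-BY-NC-ND-4.0", ["4.0", "BY-NC-ND", "by-nc-nd"], []),
    ("CC-BY-NC-SA-4.0", ["4.0", "BY-NC-SA", "by-nc-sa"], []),
    ("CC-BY-3.0", ["3.0", "/by/"], []),
    ("CC-BY-4.0", ["4.0", "/by/"], ["zero"]),
    ("CC-BY-NC-2.5", ["/by-nc", "2.5"], []),
    ("CC-BY-NC-4.0", ["/by-nc", "4.0"], []),
]


def normalize_license(value):
    """Return the identifier of the first matching rule, or the value unchanged."""
    for result, required, forbidden in RULES:
        if all(s in value for s in required) and not any(s in value for s in forbidden):
            return result
    return value


for text in [
    "CC BY-NC-ND 4.0 https://creativecommons.org/licenses/by-nc-nd/4.0/",
    "https://creativecommons.org/licenses/by/3.0/",
    "https://creativecommons.org/licenses/by-nc/2.5/",
    "https://creativecommons.org/publicdomain/zero/1.0/",
]:
    print(normalize_license(text))
```

<ul>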
<li>I updated the fields on CGSpace and then started a re-index of Discovery</li>
<li>We also need to re-think the <code>dc.rights</code> field in the submission form: we should probably use a popup controlled vocabulary and list the Creative Commons values with version numbers and allow the user to enter their own (like the ORCID identifier field)</li>
<li>Ask Jane if we can use some of the BDP money to host AReS explorer on a more powerful server</li>
<li>IWMI sent me a list of new ORCID identifiers for their staff so I combined them with our list, updated the names with my <a href="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script, and regenerated the controlled vocabulary:</li>
</ul>
<pre tabindex="0"><code>$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml MEL\ ORCID.json MEL\ ORCID_V2.json 2018-10-17-IWMI-ORCIDs.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > 2018-10-17-orcids.txt
$ ./resolve-orcids.py -i 2018-10-17-orcids.txt -o 2018-10-17-names.txt -d
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
@ -458,7 +458,7 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
<li>I upgraded PostgreSQL to 9.6 on DSpace Test using Ansible, then had to manually <a href="https://wiki.postgresql.org/wiki/Using_pg_upgrade_on_Ubuntu/Debian">migrate from 9.5 to 9.6</a>:</li>
</ul>
<pre tabindex="0"><code># su - postgres
$ /usr/lib/postgresql/9.6/bin/pg_upgrade -b /usr/lib/postgresql/9.5/bin -B /usr/lib/postgresql/9.6/bin -d /var/lib/postgresql/9.5/main -D /var/lib/postgresql/9.6/main -o ' -c config_file=/etc/postgresql/9.5/main/postgresql.conf' -O ' -c config_file=/etc/postgresql/9.6/main/postgresql.conf'
$ exit
# systemctl start postgresql
# dpkg -r postgresql-9.5 postgresql-client-9.5 postgresql-contrib-9.5
@ -468,7 +468,7 @@ $ exit
<li>Linode emailed me to say that CGSpace (linode18) had high CPU usage for a few hours this afternoon</li>
<li>Looking at the nginx logs around that time I see the following IPs making the most requests:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "19/Oct/2018:(12|13|14|15)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
361 207.46.13.179
395 181.115.248.74
485 66.249.64.93
@ -491,14 +491,14 @@ $ exit
$ sudo docker run --name my_solr -v ~/dspace/solr/statistics/conf:/tmp/conf -d -p 8983:8983 -t solr:5
$ sudo docker logs my_solr
...
ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics] Caused by: solr.IntField
</code></pre><ul>
<li>Apparently a bunch of field types (like <code>solr.IntField</code>) were removed in <a href="https://issues.apache.org/jira/browse/SOLR-5936">Solr 5</a></li>
<li>So for now it’s actually a huge pain in the ass to run the tests for my dspace-statistics-api</li>
<li>Linode sent a message that the CPU usage was high on CGSpace (linode18) last night</li>
<li>According to the nginx logs around that time it was 5.9.6.51 (MegaIndex) again:</li>
</ul>
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "20/Oct/2018:(14|15|16)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
249 207.46.13.179
250 157.55.39.173
@ -520,12 +520,12 @@ ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics]
/var/log/nginx/oai.log:0
/var/log/nginx/rest.log:0
/var/log/nginx/statistics.log:0
# grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-10-20 | sort | uniq
8915
</code></pre><ul>
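<li>One caveat about the count above: <code>grep -c</code> prints the number of matching log lines, so the <code>| sort | uniq</code> after it has nothing to deduplicate. 8915 is therefore not necessarily the number of distinct sessions. Something like <code>grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=5.9.6.51' dspace.log.2018-10-20 | sort -u | wc -l</code> would count unique sessions instead. The Python sketch below computes both numbers, using simplified hypothetical log lines and a hypothetical <code>session_stats</code> helper.</li>
</ul>

```python
import re


def session_stats(lines, ip):
    """Return (matching_lines, distinct_sessions) for one client IP in a DSpace log."""
    pattern = re.compile(r"session_id=([A-Z0-9]{32}):ip_addr=" + re.escape(ip))
    matching = 0
    sessions = set()
    for line in lines:
        match = pattern.search(line)
        if match:
            matching += 1
            sessions.add(match.group(1))
    return matching, len(sessions)


# Simplified, hypothetical dspace.log lines with made-up 32-character session IDs
SAMPLE = [
    "2018-10-20 INFO anonymous:session_id=" + "A" * 32 + ":ip_addr=5.9.6.51:view_item:handle=10568/1",
    "2018-10-20 INFO anonymous:session_id=" + "A" * 32 + ":ip_addr=5.9.6.51:view_item:handle=10568/2",
    "2018-10-20 INFO anonymous:session_id=" + "B" * 32 + ":ip_addr=5.9.6.51:view_item:handle=10568/3",
    "2018-10-20 INFO anonymous:session_id=" + "C" * 32 + ":ip_addr=66.249.64.93:view_item:handle=10568/4",
]
print(session_stats(SAMPLE, "5.9.6.51"))  # (3, 2)
```

<ul>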
<li>Last month I added “crawl” to the Tomcat Crawler Session Manager Valve’s regular expression matching, and it seems to be working for MegaIndex’s user agent:</li>
</ul>
<pre tabindex="0"><code>$ http --print Hh 'https://dspacetest.cgiar.org/handle/10568/1' User-Agent:'"Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)"'
</code></pre><ul>
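<li>Tomcat’s Crawler Session Manager Valve compares the whole User-Agent header against its <code>crawlerUserAgents</code> regular expression (Java’s <code>Matcher.matches()</code>). Python’s <code>re.fullmatch</code> therefore offers a quick way to test a candidate expression offline. The expression below is an assumption: Tomcat’s documented default with <code>.*crawl.*</code> appended. It is not necessarily the exact value configured on CGSpace. Python and Java regex syntax agree for a pattern this simple.</li>
</ul>

```python
import re

# Assumption: Tomcat's default crawlerUserAgents with ".*crawl.*" appended
CRAWLER_UA = re.compile(
    r".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*|.*crawl.*"
)


def is_crawler(user_agent):
    """True if the whole User-Agent matches, mirroring Matcher.matches()."""
    return CRAWLER_UA.fullmatch(user_agent) is not None


megaindex = "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)"
print(is_crawler(megaindex))  # True, via the "crawl" alternative
print(is_crawler("Mozilla/5.0 (X11; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"))  # False
```

<ul>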
<li>So I’m not sure why this bot uses so many sessions; is it because it requests very slowly?</li>
</ul>
@ -539,7 +539,7 @@ ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics]
<li>Change <code>build.properties</code> to use HTTPS for Handles in our <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li>
<li>We will still need to do a batch update of the <code>dc.identifier.uri</code> and other fields in the database:</li>
</ul>
<pre tabindex="0"><code>dspace=# UPDATE metadatavalue SET text_value=replace(text_value, 'http://', 'https://') WHERE resource_type_id=2 AND text_value LIKE 'http://hdl.handle.net%';
</code></pre><ul>
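<li>One subtlety in the UPDATE above: PostgreSQL’s <code>replace()</code> rewrites every occurrence of <code>http://</code> in the value, not only the leading scheme. That is harmless when a field holds exactly one Handle URL, but if a value also contained a second URL, that URL would be changed too. <code>regexp_replace(text_value, '^http://', 'https://')</code> would limit the change to the start of the string. The Python sketch below shows the prefix-only behaviour with a hypothetical <code>https_handle</code> helper and a made-up example value.</li>
</ul>

```python
HANDLE_PREFIX = "http://hdl.handle.net"


def https_handle(value):
    """Upgrade only a leading http://hdl.handle.net URL to HTTPS."""
    if value.startswith(HANDLE_PREFIX):
        return "https://" + value[len("http://"):]
    return value


value = "http://hdl.handle.net/10568/97613; http://example.org/report"
print(https_handle(value))                   # only the Handle URL changes
print(value.replace("http://", "https://"))  # like SQL replace(): both change
```

<ul>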
<li>While I was doing that I found two items using CGSpace URLs instead of handles in their <code>dc.identifier.uri</code> so I corrected those</li>
<li>I also found several items that had invalid characters or multiple Handles in some related URL field like <code>cg.link.reference</code> so I corrected those too</li>
@ -547,7 +547,7 @@ ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics]
<li>I deployed the changes on CGSpace, ran all system updates, and rebooted the server</li>
<li>Also, I updated all Handles in the database to use HTTPS:</li>
</ul>
<pre tabindex="0"><code>dspace=# UPDATE metadatavalue SET text_value=replace(text_value, 'http://', 'https://') WHERE resource_type_id=2 AND text_value LIKE 'http://hdl.handle.net%';
UPDATE 76608
</code></pre><ul>
<li>Skype with Peter about ToRs for the AReS open source work and future plans to develop tools around the DSpace ecosystem</li>
@ -560,20 +560,20 @@ UPDATE 76608
<li>I emailed the MARLO guys to ask if they can send us a dump of rights data and Handles from their system so we can tag our older items on CGSpace</li>
<li>Testing REST login and logout via httpie because Felix from Earlham says he’s having issues:</li>
</ul>
<pre tabindex="0"><code>$ http --print b POST 'https://dspacetest.cgiar.org/rest/login' email='testdeposit@cgiar.org' password=deposit
acef8a4a-41f3-4392-b870-e873790f696b

$ http POST 'https://dspacetest.cgiar.org/rest/logout' rest-dspace-token:acef8a4a-41f3-4392-b870-e873790f696b
</code></pre><ul>
<li>Also works via curl (login, check status, logout, check status):</li>
</ul>
<pre tabindex="0"><code>$ curl -H "Content-Type: application/json" --data '{"email":"testdeposit@cgiar.org", "password":"deposit"}' https://dspacetest.cgiar.org/rest/login
e09fb5e1-72b0-4811-a2e5-5c1cd78293cc
$ curl -X GET -H "Content-Type: application/json" -H "Accept: application/json" -H "rest-dspace-token: e09fb5e1-72b0-4811-a2e5-5c1cd78293cc" https://dspacetest.cgiar.org/rest/status
{"okay":true,"authenticated":true,"email":"testdeposit@cgiar.org","fullname":"Test deposit","token":"e09fb5e1-72b0-4811-a2e5-5c1cd78293cc"}
$ curl -X POST -H "Content-Type: application/json" -H "rest-dspace-token: e09fb5e1-72b0-4811-a2e5-5c1cd78293cc" https://dspacetest.cgiar.org/rest/logout
$ curl -X GET -H "Content-Type: application/json" -H "Accept: application/json" -H "rest-dspace-token: e09fb5e1-72b0-4811-a2e5-5c1cd78293cc" https://dspacetest.cgiar.org/rest/status
{"okay":true,"authenticated":false,"email":null,"fullname":null,"token":null}
</code></pre><ul>
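<li>The same login, status and logout sequence can be scripted with Python’s standard <code>urllib.request</code>, which may be easier to hand to partners like Felix than a series of curl commands. This is a sketch. <code>BASE</code>, the helper names and the 30-second timeout are assumptions, and the demo only builds the requests without sending them. The endpoints, the headers and the plain-text token returned by <code>/rest/login</code> come from the curl session above.</li>
</ul>

```python
import json
import urllib.request

BASE = "https://dspacetest.cgiar.org/rest"


def login_request(email, password):
    """POST the credentials as JSON; the response body is the token."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        BASE + "/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_request(token):
    return urllib.request.Request(
        BASE + "/status",
        headers={"Accept": "application/json", "rest-dspace-token": token},
        method="GET",
    )


def logout_request(token):
    return urllib.request.Request(
        BASE + "/logout",
        data=b"",
        headers={"Content-Type": "application/json", "rest-dspace-token": token},
        method="POST",
    )


def send(request):
    """Send a request and return the decoded response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


req = login_request("testdeposit@cgiar.org", "deposit")
print(req.get_method(), req.full_url)
```

<ul>
<li>Against a live server, the flow would be <code>token = send(login_request(email, password)).strip()</code>, after which <code>json.loads(send(status_request(token)))["authenticated"]</code> should be <code>True</code> before <code>send(logout_request(token))</code> and <code>False</code> after it.</li>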
<li>Improve the documentation of my <a href="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a></li>
<li>Email Modi and Jayashree from ICRISAT to ask if they want to join CGSpace as partners</li>