mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-03-04
This commit is contained in:
@ -10,14 +10,14 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2022-03-01T17:17:27+03:00" />
|
||||
<meta property="og:updated_time" content="2022-03-01T17:48:40+03:00" />
|
||||
|
||||
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.92.2" />
|
||||
<meta name="generator" content="Hugo 0.93.1" />
|
||||
|
||||
|
||||
|
||||
@ -206,17 +206,17 @@
|
||||
<li>I had a call with CodeObia to discuss the work on OpenRXV</li>
|
||||
<li>Check the results of the AReS harvesting from last night:</li>
|
||||
</ul>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-console" data-lang="console">$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'</span>
|
||||
{
|
||||
"count" : 100875,
|
||||
"_shards" : {
|
||||
"total" : 1,
|
||||
"successful" : 1,
|
||||
"skipped" : 0,
|
||||
"failed" : 0
|
||||
}
|
||||
}
|
||||
</code></pre></div>
|
||||
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl -s <span style="color:#e6db74">'http://localhost:9200/openrxv-items-temp/_count?q=*&pretty'</span>
|
||||
</span></span><span style="display:flex;"><span>{
|
||||
</span></span><span style="display:flex;"><span> "count" : 100875,
|
||||
</span></span><span style="display:flex;"><span> "_shards" : {
|
||||
</span></span><span style="display:flex;"><span> "total" : 1,
|
||||
</span></span><span style="display:flex;"><span> "successful" : 1,
|
||||
</span></span><span style="display:flex;"><span> "skipped" : 0,
|
||||
</span></span><span style="display:flex;"><span> "failed" : 0
|
||||
</span></span><span style="display:flex;"><span> }
|
||||
</span></span><span style="display:flex;"><span>}
|
||||
</span></span></code></pre></div>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2021-02/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
|
@ -10,14 +10,14 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2022-03-01T17:17:27+03:00" />
|
||||
<meta property="og:updated_time" content="2022-03-01T17:48:40+03:00" />
|
||||
|
||||
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.92.2" />
|
||||
<meta name="generator" content="Hugo 0.93.1" />
|
||||
|
||||
|
||||
|
||||
|
@ -10,14 +10,14 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2022-03-01T17:17:27+03:00" />
|
||||
<meta property="og:updated_time" content="2022-03-01T17:48:40+03:00" />
|
||||
|
||||
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.92.2" />
|
||||
<meta name="generator" content="Hugo 0.93.1" />
|
||||
|
||||
|
||||
|
||||
@ -98,17 +98,17 @@
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/*access.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
|
||||
4671942
|
||||
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
|
||||
# zcat --force /var/log/nginx/{rest,oai,statistics}.log.*.gz | grep -cE "[0-9]{1,2}/Oct/2019"
|
||||
1277694
|
||||
</code></pre><ul>
|
||||
<li>So 4.6 million from XMLUI and another 1.2 million from API requests</li>
|
||||
<li>Let’s see how many of the REST API requests were for bitstreams (because they are counted in Solr stats):</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E "[0-9]{1,2}/Oct/2019"
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/rest.log.*.gz | grep -c -E "[0-9]{1,2}/Oct/2019"
|
||||
1183456
|
||||
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
|
||||
# zcat --force /var/log/nginx/rest.log.*.gz | grep -E "[0-9]{1,2}/Oct/2019" | grep -c -E "/rest/bitstreams"
|
||||
106781
|
||||
</code></pre>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-11/'>Read more →</a>
|
||||
@ -128,7 +128,7 @@
|
||||
|
||||
</p>
|
||||
</header>
|
||||
2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script’s “unneccesary Unicode” fix: $ csvcut -c 'id,dc.
|
||||
2019-10-01 Udana from IWMI asked me for a CSV export of their community on CGSpace I exported it, but a quick run through the csv-metadata-quality tool shows that there are some low-hanging fruits we can fix before I send him the data I will limit the scope to the titles, regions, subregions, and river basins for now to manually fix some non-breaking spaces (U+00A0) there that would otherwise be removed by the csv-metadata-quality script’s “unneccesary Unicode” fix: $ csvcut -c 'id,dc.
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-10/'>Read more →</a>
|
||||
</article>
|
||||
|
||||
@ -151,7 +151,7 @@
|
||||
<li>Linode emailed to say that CGSpace (linode18) had a high rate of outbound traffic for several hours this morning</li>
|
||||
<li>Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
440 17.58.101.255
|
||||
441 157.55.39.101
|
||||
485 207.46.13.43
|
||||
@ -162,7 +162,7 @@
|
||||
814 207.46.13.212
|
||||
2472 163.172.71.23
|
||||
6092 3.94.211.189
|
||||
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
# zcat --force /var/log/nginx/rest.log /var/log/nginx/rest.log.1 /var/log/nginx/oai.log /var/log/nginx/oai.log.1 | grep -E "01/Sep/2019:0" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
33 2a01:7e00::f03c:91ff:fe16:fcb
|
||||
57 3.83.192.124
|
||||
57 3.87.77.25
|
||||
@ -323,16 +323,16 @@ DELETE 1
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
|
||||
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep 'Spore-192-EN-web.pdf' | grep -E '(18.196.196.108|18.195.78.144|18.195.218.6)' | awk '{print $9}' | sort | uniq -c | sort -n | tail -n 5
|
||||
4432 200
|
||||
</code></pre><ul>
|
||||
<li>In the last two weeks there have been 47,000 downloads of this <em>same exact PDF</em> by these three IP addresses</li>
|
||||
<li>Apply country and region corrections and deletions on DSpace Test and CGSpace:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
|
||||
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
|
||||
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-9-countries.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.country -m 228 -t ACTION -d
|
||||
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
|
||||
</code></pre>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2019-04/'>Read more →</a>
|
||||
</article>
|
||||
@ -388,7 +388,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
<li>Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!</li>
|
||||
<li>The top IPs before, during, and after this latest alert tonight were:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "01/Feb/2019:(17|18|19|20|21)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
245 207.46.13.5
|
||||
332 54.70.40.11
|
||||
385 5.143.231.38
|
||||
@ -404,7 +404,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
|
||||
<li>The Solr statistics the past few months have been very high and I was wondering if the web server logs also showed an increase</li>
|
||||
<li>There were just over 3 million accesses in the nginx logs last month:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
|
||||
<pre tabindex="0"><code># time zcat --force /var/log/nginx/* | grep -cE "[0-9]{1,2}/Jan/2019"
|
||||
3018243
|
||||
|
||||
real 0m19.873s
|
||||
|
@ -10,14 +10,14 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2022-03-01T17:17:27+03:00" />
|
||||
<meta property="og:updated_time" content="2022-03-01T17:48:40+03:00" />
|
||||
|
||||
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.92.2" />
|
||||
<meta name="generator" content="Hugo 0.93.1" />
|
||||
|
||||
|
||||
|
||||
@ -95,7 +95,7 @@
|
||||
<li>Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning</li>
|
||||
<li>I don’t see anything interesting in the web server logs around that time though:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/*.log /var/log/nginx/*.log.1 | grep -E "02/Jan/2019:0(1|2|3)" | awk '{print $1}' | sort | uniq -c | sort -n | tail -n 10
|
||||
92 40.77.167.4
|
||||
99 210.7.29.100
|
||||
120 38.126.157.45
|
||||
@ -293,7 +293,7 @@
|
||||
<li>I added the new CCAFS Phase II Project Tag <code>PII-FP1_PACCA2</code> and merged it into the <code>5_x-prod</code> branch (<a href="https://github.com/ilri/DSpace/pull/379">#379</a>)</li>
|
||||
<li>I proofed and tested the ILRI author corrections that Peter sent back to me this week:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2018-05-30-Correct-660-authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -t correct -m 3 -n
|
||||
</code></pre><ul>
|
||||
<li>I think a sane proofing workflow in OpenRefine is to apply the custom text facets for check/delete/remove and illegal characters that I developed in <a href="/cgspace-notes/2018-03/">March, 2018</a></li>
|
||||
<li>Time to index ~70,000 items on CGSpace:</li>
|
||||
|
@ -10,14 +10,14 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2022-03-01T17:17:27+03:00" />
|
||||
<meta property="og:updated_time" content="2022-03-01T17:48:40+03:00" />
|
||||
|
||||
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.92.2" />
|
||||
<meta name="generator" content="Hugo 0.93.1" />
|
||||
|
||||
|
||||
|
||||
@ -151,11 +151,11 @@
|
||||
<li>I notice this error quite a few times in dspace.log:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>2018-01-02 01:21:19,137 ERROR org.dspace.app.xmlui.aspect.discovery.SidebarFacetsTransformer @ Error while searching for sidebar facets
|
||||
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
|
||||
org.dspace.discovery.SearchServiceException: org.apache.solr.search.SyntaxError: Cannot parse 'dateIssued_keyword:[1976+TO+1979]': Encountered " "]" "] "" at line 1, column 32.
|
||||
</code></pre><ul>
|
||||
<li>And there are many of these errors every day for the past month:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
|
||||
<pre tabindex="0"><code>$ grep -c "Error while searching for sidebar facets" dspace.log.*
|
||||
dspace.log.2017-11-21:4
|
||||
dspace.log.2017-11-22:1
|
||||
dspace.log.2017-11-23:4
|
||||
@ -251,12 +251,12 @@ dspace.log.2018-01-02:34
|
||||
<ul>
|
||||
<li>Today there have been no hits by CORE and no alerts from Linode (coincidence?)</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># grep -c "CORE" /var/log/nginx/access.log
|
||||
<pre tabindex="0"><code># grep -c "CORE" /var/log/nginx/access.log
|
||||
0
|
||||
</code></pre><ul>
|
||||
<li>Generate list of authors on CGSpace for Peter to go through and correct:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
|
||||
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
|
||||
COPY 54701
|
||||
</code></pre>
|
||||
<a href='https://alanorth.github.io/cgspace-notes/2017-11/'>Read more →</a>
|
||||
|
Reference in New Issue
Block a user