mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2022-03-04
@ -48,7 +48,7 @@ DELETE 1
|
||||
|
||||
But after this I tried to delete the item from the XMLUI and it is still present…
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.92.2" />
|
||||
<meta name="generator" content="Hugo 0.93.1" />
|
||||
@ -168,7 +168,7 @@ dspace=# DELETE FROM item WHERE item_id=74648;
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ curl -f -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": "en_US"}'
|
||||
<pre tabindex="0"><code>$ curl -f -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key":"cg.subject.cpwf", "value":"WATER MANAGEMENT","language": "en_US"}'
|
||||
curl: (22) The requested URL returned error: 401 Unauthorized
|
||||
</code></pre><ul>
|
||||
<li>The DSpace log shows the item ID (because I modified the error text):</li>
|
||||
@ -282,52 +282,52 @@ Please see the DSpace documentation for assistance.
|
||||
<ul>
|
||||
<li>The number of unique sessions today is <em>ridiculously</em> high compared to the last few days considering it’s only 12:30PM right now:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-06 | sort | uniq | wc -l
|
||||
<pre tabindex="0"><code>$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-06 | sort | uniq | wc -l
|
||||
101108
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-05 | sort | uniq | wc -l
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-05 | sort | uniq | wc -l
|
||||
14618
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-04 | sort | uniq | wc -l
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-04 | sort | uniq | wc -l
|
||||
14946
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-03 | sort | uniq | wc -l
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-03 | sort | uniq | wc -l
|
||||
6410
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-02 | sort | uniq | wc -l
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-02 | sort | uniq | wc -l
|
||||
7758
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-01 | sort | uniq | wc -l
|
||||
$ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-05-01 | sort | uniq | wc -l
|
||||
20528
|
||||
</code></pre><ul>
|
||||
<li>The number of unique IP addresses from 2 to 6 AM this morning is already several times higher than the average for that time of the morning this past week:</li>
|
||||
</ul>
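<p>The same unique-IP count can be sketched in Python, which makes the date and hour filter explicit. This is only an illustration: the function name <code>unique_ips_in_window</code> is hypothetical, and it assumes the nginx "combined" log format, where the client IP is the first field and the timestamp looks like <code>[06/May/2019:02:15:01 +0200]</code>.</p>

```python
import re

# Assumption: nginx "combined" log format. The client IP is the first
# space-separated field and the timestamp is in square brackets.
TIMESTAMP = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):(\d{2}):")


def unique_ips_in_window(lines, day, hours):
    """Return the set of client IPs seen on `day` during the given hours.

    `day` is a string like "06/May/2019" and `hours` is a collection of
    two-digit hour strings such as {"02", "03"}.
    """
    ips = set()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a recognizable timestamp
        if match.group(1) == day and match.group(2) in hours:
            ips.add(line.split(" ", 1)[0])
    return ips


if __name__ == "__main__":
    import sys

    # Hypothetical usage: zcat --force access.log access.log.1 | python3 script.py
    window = {"02", "03", "04", "05", "06"}
    print(len(unique_ips_in_window(sys.stdin, "06/May/2019", window)))
```

<p>Unlike the <code>grep</code> pattern, this only matches the date and hour inside the bracketed timestamp, so a similar string elsewhere in the request line would not be counted.</p>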
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
7127
|
||||
# zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E '05/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
# zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E '05/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
1231
|
||||
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E '04/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E '04/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
1255
|
||||
# zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz | grep -E '03/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
# zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz | grep -E '03/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
1736
|
||||
# zcat --force /var/log/nginx/access.log.4.gz /var/log/nginx/access.log.5.gz | grep -E '02/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
# zcat --force /var/log/nginx/access.log.4.gz /var/log/nginx/access.log.5.gz | grep -E '02/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
1573
|
||||
# zcat --force /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.6.gz | grep -E '01/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
# zcat --force /var/log/nginx/access.log.5.gz /var/log/nginx/access.log.6.gz | grep -E '01/May/2019:(02|03|04|05|06)' | awk '{print $1}' | sort | uniq | wc -l
|
||||
1410
|
||||
</code></pre><ul>
|
||||
<li>Just this morning between the hours of 2 and 6 the number of unique sessions was <em>very</em> high compared to previous mornings:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ cat dspace.log.2019-05-06 | grep -E '2019-05-06 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
<pre tabindex="0"><code>$ cat dspace.log.2019-05-06 | grep -E '2019-05-06 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
83650
|
||||
$ cat dspace.log.2019-05-05 | grep -E '2019-05-05 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
$ cat dspace.log.2019-05-05 | grep -E '2019-05-05 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
2547
|
||||
$ cat dspace.log.2019-05-04 | grep -E '2019-05-04 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
$ cat dspace.log.2019-05-04 | grep -E '2019-05-04 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
2574
|
||||
$ cat dspace.log.2019-05-03 | grep -E '2019-05-03 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
$ cat dspace.log.2019-05-03 | grep -E '2019-05-03 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
2911
|
||||
$ cat dspace.log.2019-05-02 | grep -E '2019-05-02 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
$ cat dspace.log.2019-05-02 | grep -E '2019-05-02 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
2704
|
||||
$ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
$ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
3699
|
||||
</code></pre><ul>
|
||||
<li>Most of the requests were GETs:</li>
|
||||
</ul>
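<p>A per-method tally can also be sketched in Python. Because <code>grep -o</code> prints every occurrence of a method name anywhere in the line, a URL or user agent that happens to contain one of those words would be counted as well. The sketch below only looks at the start of the quoted request field. The function name <code>count_methods</code> is hypothetical, and the nginx "combined" log format is assumed.</p>

```python
import re
from collections import Counter

# Assumption: the request is logged as "METHOD /path HTTP/x.y" in double
# quotes, as in the nginx "combined" log format.
REQUEST = re.compile(r'"(GET|HEAD|POST|PUT|DELETE|OPTIONS|PATCH) ')


def count_methods(lines):
    """Count requests per HTTP method, at most one method per log line."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match is not None:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Print the least frequent methods first, like `sort | uniq -c | sort -n`.
    for method, total in sorted(count_methods(sys.stdin).items(), key=lambda kv: kv[1]):
        print(f"{total:7d} {method}")
```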
|
||||
<pre tabindex="0"><code># cat /var/log/nginx/{access,library-access}.log /var/log/nginx/{access,library-access}.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E "(GET|HEAD|POST|PUT)" | sort | uniq -c | sort -n
|
||||
<pre tabindex="0"><code># cat /var/log/nginx/{access,library-access}.log /var/log/nginx/{access,library-access}.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E "(GET|HEAD|POST|PUT)" | sort | uniq -c | sort -n
|
||||
1 PUT
|
||||
98 POST
|
||||
2845 HEAD
|
||||
@ -336,19 +336,19 @@ $ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -
|
||||
<li>I’m not exactly sure what happened this morning, but it looks like some legitimate user traffic—perhaps someone launched a new publication and it got a bunch of hits?</li>
|
||||
<li>Looking again, I see 84,000 requests to <code>/handle</code> this morning (not including logs for library.cgiar.org because those get HTTP 301 redirect to CGSpace and appear here in <code>access.log</code>):</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -c -o -E " /handle/[0-9]+/[0-9]+"
|
||||
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -c -o -E " /handle/[0-9]+/[0-9]+"
|
||||
84350
|
||||
</code></pre><ul>
|
||||
<li>But it would be difficult to find a pattern for those requests because they cover 78,000 <em>unique</em> Handles (i.e. direct browsing of items, collections, or communities) and only 2,492 discover/browse requests (total, not unique):</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E " /handle/[0-9]+/[0-9]+ HTTP" | sort | uniq | wc -l
|
||||
<pre tabindex="0"><code># cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E " /handle/[0-9]+/[0-9]+ HTTP" | sort | uniq | wc -l
|
||||
78104
|
||||
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E " /handle/[0-9]+/[0-9]+/(discover|browse)" | wc -l
|
||||
# cat /var/log/nginx/access.log /var/log/nginx/access.log.1 | grep -E '06/May/2019:(02|03|04|05|06)' | grep -o -E " /handle/[0-9]+/[0-9]+/(discover|browse)" | wc -l
|
||||
2492
|
||||
</code></pre><ul>
|
||||
<li>In other news, I see some IP is making several requests per second to the exact same REST API endpoints, for example:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># grep '/rest/handle/10568/3703?expand=all' rest.log | awk '{print $1}' | sort | uniq -c
|
||||
<pre tabindex="0"><code># grep '/rest/handle/10568/3703?expand=all' rest.log | awk '{print $1}' | sort | uniq -c
|
||||
3 2a01:7e00::f03c:91ff:fe0a:d645
|
||||
113 63.32.242.35
|
||||
</code></pre><ul>
|
||||
@ -363,28 +363,28 @@ $ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -
|
||||
<ul>
|
||||
<li>The total number of unique IPs on CGSpace yesterday was almost 14,000, which is several thousand higher than previous day totals:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E '06/May/2019' | awk '{print $1}' | sort | uniq | wc -l
|
||||
<pre tabindex="0"><code># zcat --force /var/log/nginx/access.log.1 /var/log/nginx/access.log.2.gz | grep -E '06/May/2019' | awk '{print $1}' | sort | uniq | wc -l
|
||||
13969
|
||||
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E '05/May/2019' | awk '{print $1}' | sort | uniq | wc -l
|
||||
# zcat --force /var/log/nginx/access.log.2.gz /var/log/nginx/access.log.3.gz | grep -E '05/May/2019' | awk '{print $1}' | sort | uniq | wc -l
|
||||
5936
|
||||
# zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz | grep -E '04/May/2019' | awk '{print $1}' | sort | uniq | wc -l
|
||||
# zcat --force /var/log/nginx/access.log.3.gz /var/log/nginx/access.log.4.gz | grep -E '04/May/2019' | awk '{print $1}' | sort | uniq | wc -l
|
||||
6229
|
||||
# zcat --force /var/log/nginx/access.log.4.gz /var/log/nginx/access.log.5.gz | grep -E '03/May/2019' | awk '{print $1}' | sort | uniq | wc -l
|
||||
# zcat --force /var/log/nginx/access.log.4.gz /var/log/nginx/access.log.5.gz | grep -E '03/May/2019' | awk '{print $1}' | sort | uniq | wc -l
|
||||
8051
|
||||
</code></pre><ul>
|
||||
<li>Total number of sessions yesterday was <em>much</em> higher compared to days last week:</li>
|
||||
</ul>
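<p>One caveat when reading these totals: the commands that follow use <code>grep -E</code> without <code>-o</code>, so <code>sort | uniq | wc -l</code> de-duplicates whole log lines rather than session IDs. The numbers are therefore probably not directly comparable with counts produced using <code>-o</code>. A minimal Python sketch of the difference, with the hypothetical function name <code>count_sessions</code>:</p>

```python
import re

# Same pattern as in the grep commands: a 32-character session identifier.
SESSION = re.compile(r"session_id=[A-Z0-9]{32}")


def count_sessions(lines):
    """Return (unique matching lines, unique session IDs).

    The first number mirrors `grep -E ... | sort | uniq | wc -l`,
    the second mirrors `grep -o -E ... | sort | uniq | wc -l`.
    """
    matching_lines = set()
    session_ids = set()
    for line in lines:
        found = SESSION.findall(line)
        if found:
            matching_lines.add(line)
            session_ids.update(found)
    return len(matching_lines), len(session_ids)


if __name__ == "__main__":
    import sys

    unique_lines, unique_ids = count_sessions(sys.stdin)
    print(f"unique lines: {unique_lines}, unique session IDs: {unique_ids}")
```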
|
||||
<pre tabindex="0"><code>$ cat dspace.log.2019-05-06 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
<pre tabindex="0"><code>$ cat dspace.log.2019-05-06 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
144160
|
||||
$ cat dspace.log.2019-05-05 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
$ cat dspace.log.2019-05-05 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
57269
|
||||
$ cat dspace.log.2019-05-04 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
$ cat dspace.log.2019-05-04 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
58648
|
||||
$ cat dspace.log.2019-05-03 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
$ cat dspace.log.2019-05-03 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
27883
|
||||
$ cat dspace.log.2019-05-02 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
$ cat dspace.log.2019-05-02 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
26996
|
||||
$ cat dspace.log.2019-05-01 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
$ cat dspace.log.2019-05-01 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
61866
|
||||
</code></pre><ul>
|
||||
<li>The usage statistics seem to agree that yesterday was crazy:</li>
|
||||
@ -423,9 +423,9 @@ Please see the DSpace documentation for assistance.
|
||||
<li>Help Moayad with certbot-auto for Let’s Encrypt scripts on the new AReS server (linode20)</li>
|
||||
<li>Normalize all <code>text_lang</code> values for metadata on CGSpace and DSpace Test (as I had tested last month):</li>
|
||||
</ul>
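<p>The mapping applied by the SQL statements that follow can be written as a small pure function, which is handy for checking a CSV export before or after the update. This is only a sketch: the name <code>normalize_text_lang</code> is hypothetical, and it ignores the <code>resource_type_id</code> and <code>metadata_field_id</code> conditions in the <code>WHERE</code> clauses.</p>

```python
# Values taken from the IN (...) lists in the UPDATE statements.
EN_VARIANTS = {"ethnob", "en", "*", "E.", ""}
ES_VARIANTS = {"es", "spa"}


def normalize_text_lang(value):
    """Map a raw text_lang value to its normalized form.

    None stands in for SQL NULL. Values not covered by the UPDATE
    statements are returned unchanged.
    """
    if value is None or value in EN_VARIANTS:
        return "en_US"
    if value in ES_VARIANTS:
        return "es_ES"
    return value


if __name__ == "__main__":
    for raw in ["en", "", None, "spa", "fr"]:
        print(repr(raw), "->", normalize_text_lang(raw))
```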
|
||||
<pre tabindex="0"><code>UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', '');
|
||||
UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL;
|
||||
UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('es', 'spa');
|
||||
<pre tabindex="0"><code>UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('ethnob', 'en', '*', 'E.', '');
|
||||
UPDATE metadatavalue SET text_lang='en_US' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IS NULL;
|
||||
UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata_field_id != 28 AND text_lang IN ('es', 'spa');
|
||||
</code></pre><ul>
|
||||
<li>Send Francesca Giampieri from Bioversity a CSV export of all their items issued in 2018
|
||||
<ul>
|
||||
@ -454,7 +454,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
|
||||
</li>
|
||||
<li>All of the IPs from these networks are using generic user agents like this, but MANY more, and they change many times:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2703.0 Safari/537.36"
|
||||
<pre tabindex="0"><code>"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2703.0 Safari/537.36"
|
||||
</code></pre><ul>
|
||||
<li>I found a <a href="https://www.qurium.org/alerts/azerbaijan/azerbaijan-and-the-region40-ddos-service/">blog post from 2018 detailing an attack from a DDoS service</a> that matches our pattern exactly</li>
|
||||
<li>They specifically mention:</li>
|
||||
@ -473,7 +473,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
|
||||
<ul>
|
||||
<li>I see that the Unpaywall bot is responsible for a few thousand XMLUI sessions every day (IP addresses come from nginx access.log):</li>
|
||||
</ul>
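<p>In the hand-written alternation that follows, each unescaped <code>.</code> matches any character, and a short address such as <code>3.87.250.3</code> would also match the prefix of <code>3.87.250.30</code>. A stricter pattern could be generated from the IP list. The sketch below is one way to do that; the function name <code>build_ip_pattern</code> is hypothetical, and it assumes only that the DSpace log contains <code>ip_addr=</code> immediately followed by the address.</p>

```python
import re


def build_ip_pattern(ips):
    """Compile a regex matching `ip_addr=<ip>` for exactly the given IPs.

    Dots are escaped with re.escape, and the negative lookahead stops an
    address from matching as a prefix of a longer one.
    """
    alternatives = "|".join(re.escape(ip) for ip in ips)
    return re.compile(r"ip_addr=(?:" + alternatives + r")(?!\d)")


if __name__ == "__main__":
    # Two addresses from the list in these notes, used for illustration.
    pattern = build_ip_pattern(["3.87.250.3", "18.212.5.59"])
    print(bool(pattern.search("ip_addr=3.87.250.3:view_item")))
    print(bool(pattern.search("ip_addr=3.87.250.30:view_item")))
```

<p>The compiled pattern can then be used to filter log lines before extracting session IDs, in place of the long <code>grep -E</code> expression.</p>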
|
||||
<pre tabindex="0"><code>$ cat dspace.log.2019-05-11 | grep -E 'ip_addr=(100.26.206.188|100.27.19.233|107.22.98.199|174.129.156.41|18.205.243.110|18.205.245.200|18.207.176.164|18.207.209.186|18.212.126.89|18.212.5.59|18.213.4.150|18.232.120.6|18.234.180.224|18.234.81.13|3.208.23.222|34.201.121.183|34.201.241.214|34.201.39.122|34.203.188.39|34.207.197.154|34.207.232.63|34.207.91.147|34.224.86.47|34.227.205.181|34.228.220.218|34.229.223.120|35.171.160.166|35.175.175.202|3.80.201.39|3.81.120.70|3.81.43.53|3.84.152.19|3.85.113.253|3.85.237.139|3.85.56.100|3.87.23.95|3.87.248.240|3.87.250.3|3.87.62.129|3.88.13.9|3.88.57.237|3.89.71.15|3.90.17.242|3.90.68.247|3.91.44.91|3.92.138.47|3.94.250.180|52.200.78.128|52.201.223.200|52.90.114.186|52.90.48.73|54.145.91.243|54.160.246.228|54.165.66.180|54.166.219.216|54.166.238.172|54.167.89.152|54.174.94.223|54.196.18.211|54.198.234.175|54.208.8.172|54.224.146.147|54.234.169.91|54.235.29.216|54.237.196.147|54.242.68.231|54.82.6.96|54.87.12.181|54.89.217.141|54.89.234.182|54.90.81.216|54.91.104.162)' | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
<pre tabindex="0"><code>$ cat dspace.log.2019-05-11 | grep -E 'ip_addr=(100.26.206.188|100.27.19.233|107.22.98.199|174.129.156.41|18.205.243.110|18.205.245.200|18.207.176.164|18.207.209.186|18.212.126.89|18.212.5.59|18.213.4.150|18.232.120.6|18.234.180.224|18.234.81.13|3.208.23.222|34.201.121.183|34.201.241.214|34.201.39.122|34.203.188.39|34.207.197.154|34.207.232.63|34.207.91.147|34.224.86.47|34.227.205.181|34.228.220.218|34.229.223.120|35.171.160.166|35.175.175.202|3.80.201.39|3.81.120.70|3.81.43.53|3.84.152.19|3.85.113.253|3.85.237.139|3.85.56.100|3.87.23.95|3.87.248.240|3.87.250.3|3.87.62.129|3.88.13.9|3.88.57.237|3.89.71.15|3.90.17.242|3.90.68.247|3.91.44.91|3.92.138.47|3.94.250.180|52.200.78.128|52.201.223.200|52.90.114.186|52.90.48.73|54.145.91.243|54.160.246.228|54.165.66.180|54.166.219.216|54.166.238.172|54.167.89.152|54.174.94.223|54.196.18.211|54.198.234.175|54.208.8.172|54.224.146.147|54.234.169.91|54.235.29.216|54.237.196.147|54.242.68.231|54.82.6.96|54.87.12.181|54.89.217.141|54.89.234.182|54.90.81.216|54.91.104.162)' | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq | wc -l
|
||||
2206
|
||||
</code></pre><ul>
|
||||
<li>I added “Unpaywall” to the list of bots in the Tomcat Crawler Session Manager Valve</li>
|
||||
@ -519,20 +519,20 @@ COPY 995
|
||||
<li>Peter sent me a bunch of fixes for investors from yesterday</li>
|
||||
<li>I did a quick check in Open Refine (trim and collapse whitespace, clean smart quotes, etc) and then applied them on CGSpace:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-05-16-fix-306-Investors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29 -t correct -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-05-16-delete-297-Investors.csv -db dspace -u dspace -p 'fuuu' -m 29 -f dc.description.sponsorship -d
|
||||
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-05-16-fix-306-Investors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29 -t correct -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-05-16-delete-297-Investors.csv -db dspace -u dspace -p 'fuuu' -m 29 -f dc.description.sponsorship -d
|
||||
</code></pre><ul>
|
||||
<li>Then I started a full Discovery re-indexing:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
|
||||
<pre tabindex="0"><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
|
||||
$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
</code></pre><ul>
|
||||
<li>I was going to make a new controlled vocabulary of the top 100 terms after these corrections, but I noticed a bunch of duplicates and variations when I sorted them alphabetically</li>
|
||||
<li>Instead, I exported a new list and asked Peter to look at it again</li>
|
||||
<li>Apply Peter’s new corrections on DSpace Test and CGSpace:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-05-17-fix-25-Investors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29 -t correct -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-05-17-delete-14-Investors.csv -db dspace -u dspace -p 'fuuu' -m 29 -f dc.description.sponsorship -d
|
||||
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-05-17-fix-25-Investors.csv -db dspace -u dspace -p 'fuuu' -f dc.description.sponsorship -m 29 -t correct -d
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-05-17-delete-14-Investors.csv -db dspace -u dspace -p 'fuuu' -m 29 -f dc.description.sponsorship -d
|
||||
</code></pre><ul>
|
||||
<li>Then I re-exported the sponsors and took the top 100 to update the existing controlled vocabulary (<a href="https://github.com/ilri/DSpace/pull/423">#423</a>)
|
||||
<ul>
|
||||
@ -573,16 +573,16 @@ $ ./delete-metadata-values.py -i /tmp/2019-05-17-delete-14-Investors.csv -db dsp
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-05-27-fix-2472-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t corrections -d
|
||||
<pre tabindex="0"><code>$ ./fix-metadata-values.py -i /tmp/2019-05-27-fix-2472-Authors.csv -db dspace -u dspace -p 'fuuu' -f dc.contributor.author -m 3 -t corrections -d
|
||||
</code></pre><ul>
|
||||
<li>Then start a full Discovery re-indexing on each server:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
|
||||
<pre tabindex="0"><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
|
||||
$ time schedtool -B -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
|
||||
</code></pre><ul>
|
||||
<li>Export new list of all authors from CGSpace database to send to Peter:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-05-27-all-authors.csv with csv header;
|
||||
<pre tabindex="0"><code>dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-05-27-all-authors.csv with csv header;
|
||||
COPY 64871
|
||||
</code></pre><ul>
|
||||
<li>Run all system updates on DSpace Test (linode19) and reboot it</li>
|
||||
@ -609,7 +609,7 @@ COPY 64871
|
||||
</code></pre><ul>
|
||||
<li>For now I just created an eperson with her personal email address until I have time to check LDAP to see what’s up with her CGIAR account:</li>
|
||||
</ul>
|
||||
<pre tabindex="0"><code>$ dspace user -a -m blah@blah.com -g Sakshi -s Saini -p 'sknflksnfksnfdls'
|
||||
<pre tabindex="0"><code>$ dspace user -a -m blah@blah.com -g Sakshi -s Saini -p 'sknflksnfksnfdls'
|
||||
</code></pre><!-- raw HTML omitted -->