<ul>
<li>Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is in both the approvers step and the group</li>
<li>Delete 58 blank metadata values from the CGSpace database:</li>
</ul>
<pre tabindex="0"><code>dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
DELETE 58
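-- (Illustrative addition, not from the original session: a count like this could be run
-- first to see how many blank values a delete would touch.)
dspace=# select count(*) from metadatavalue where resource_type_id=2 and text_value='';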
</code></pre><ul>
<li>I also ran it on DSpace Test because we’ll be migrating the CGIAR Library soon and it would be good to catch these before we migrate</li>
<li>There will need to be some metadata updates — though if I recall correctly it is only about seven records — for that as well; I had made some notes about it in <a href="/cgspace-notes/2017-07">2017-07</a>, but I’ve asked Lili for more clarification just in case</li>
<li>Looking at the DSpace logs to see if we’ve had a change in the “Cannot get a connection” errors since last month when we adjusted the <code>db.maxconnections</code> parameter on CGSpace:</li>
</ul>
<pre tabindex="0"><code># grep -c "Cannot get a connection, pool error Timeout waiting for idle object" dspace.log.2017-09-*
dspace.log.2017-09-01:0
dspace.log.2017-09-02:0
dspace.log.2017-09-03:9
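# (Illustrative addition, not from the original notes: the per-file counts above could be
# summed for a monthly total with something like the following.)
# grep -c "Cannot get a connection, pool error Timeout waiting for idle object" dspace.log.2017-09-* | awk -F: '{sum += $2} END {print sum}'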
</code></pre><ul>
<li>The import process takes the same amount of time with and without the caching</li>
<li>Also, I captured TCP packets destined for port 80 and both imports only captured ONE packet (an update check from some component in Java):</li>
</ul>
<pre tabindex="0"><code>$ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x47455420'
</code></pre><ul>
<li>Great TCP dump guide here: <a href="https://danielmiessler.com/study/tcpdump">https://danielmiessler.com/study/tcpdump</a></li>
<li>The last part of that command filters for HTTP GET requests, of which there should have been many to fetch all the XSD files for validation</li>
<li>I wonder what was going on, and looking into the nginx logs I think maybe it’s OAI…</li>
<li>Here is yesterday’s top ten IP addresses making requests to <code>/oai</code>:</li>
</ul>
<pre tabindex="0"><code># awk '{print $1}' /var/log/nginx/oai.log | sort -n | uniq -c | sort -h | tail -n 10
1 213.136.89.78
1 66.249.66.90
1 66.249.66.92
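# (Illustrative addition, not from the original notes: the total number of OAI requests that
# day gives context for the per-IP counts above.)
# wc -l /var/log/nginx/oai.log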
</code></pre><ul>
<li>Compared to the previous day’s logs it looks VERY high:</li>
</ul>
<pre tabindex="0"><code># awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
1 207.46.13.39
1 66.249.66.93
2 66.249.66.91
</code></pre><ul>
<li>And this user agent has never been seen before today (or at least recently!):</li>
</ul>
<pre tabindex="0"><code># grep -c "API scraper" /var/log/nginx/oai.log
62088
# zgrep -c "API scraper" /var/log/nginx/oai.log.*.gz
/var/log/nginx/oai.log.10.gz:0
/var/log/nginx/oai.log.11.gz:0
/var/log/nginx/oai.log.12.gz:0
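# (Hypothetical follow-up, not in the original notes: assuming the default combined log
# format, the IP addresses behind that user agent could be listed like this.)
# grep "API scraper" /var/log/nginx/oai.log | awk '{print $1}' | sort | uniq -c | sort -h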
</code></pre><ul>
<li>Some of these heavy users are also using XMLUI, and their user agent isn’t matched by the <a href="https://github.com/ilri/rmg-ansible-public/blob/master/roles/dspace/templates/tomcat/server-tomcat7.xml.j2#L158">Tomcat Session Crawler valve</a>, so each request uses a different session</li>
<li>Yesterday alone the IP addresses using the <code>API scraper</code> user agent were responsible for 16,000 sessions in XMLUI:</li>
</ul>
<pre tabindex="0"><code># grep -a -E "(54.70.51.7|35.161.215.53|34.211.17.113|54.70.175.86)" /home/cgspace.cgiar.org/log/dspace.log.2017-09-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | uniq | wc -l
15924
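# (Illustrative addition, not from the original notes: a per-IP breakdown of those sessions
# could be produced with a small loop like this.)
# for ip in 54.70.51.7 35.161.215.53 34.211.17.113 54.70.175.86; do echo -n "$ip "; grep -a "$ip" /home/cgspace.cgiar.org/log/dspace.log.2017-09-12 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -u | wc -l; done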
</code></pre><ul>
<li>If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex</li>
<li>Looking at the spreadsheet with deletions and corrections that CCAFS sent last week</li>
<li>It appears they want to delete a lot of metadata, which I’m not sure they realize the implications of:</li>
</ul>
<pre tabindex="0"><code>dspace=# select text_value, count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange') group by text_value;
text_value | count
--------------------------+-------
FP4_ClimateModels | 6
</code></pre><ul>
<li>I sent CCAFS people an email to ask if they really want to remove these 200+ tags</li>
<li>She responded yes, so I’ll at least need to do these deletes in PostgreSQL:</li>
</ul>
<pre tabindex="0"><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
DELETE 207
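-- (Illustrative, not from the original session: wrapping a delete like this in a transaction
-- allows checking the reported row count and rolling back if it looks wrong.)
dspace=# begin;
dspace=# -- ...run the delete above, confirm it reports DELETE 207...
dspace=# commit;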
</code></pre><ul>
<li>When we discussed this in late July there were some other renames they had requested, but I don’t see them in the current spreadsheet so I will have to follow that up</li>
<li>I talked to Macaroni Bros and they said to just go ahead with the other corrections as well, since their spreadsheet evolved organically rather than systematically!</li>
<li>The final list of corrections and deletes should therefore be:</li>
</ul>
<pre tabindex="0"><code>delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-FP4_CRMWestAfrica';
update metadatavalue set text_value='FP3_VietnamLED' where resource_type_id=2 and metadata_field_id=134 and text_value='FP3_VeitnamLED';
update metadatavalue set text_value='PII-FP1_PIRCCA' where resource_type_id=2 and metadata_field_id=235 and text_value='PII-SEA_PIRCCA';
delete from metadatavalue where resource_type_id=2 and metadata_field_id=235 and text_value='PII-WA_IntegratedInterventions';
delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
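-- (Illustrative addition, not in the original notes: the whole batch could be saved to a
-- file, say ccafs-fixes.sql, and applied atomically with psql, e.g.
-- psql -d dspace --single-transaction -f ccafs-fixes.sql
-- where the database name and file name are assumptions.)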
</code></pre><ul>
<li>Create and merge pull request to shut up the Ehcache update check (<a href="https://github.com/ilri/DSpace/pull/337">#337</a>)</li>
<li>It looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0, though it only affects XMLUI: <a href="https://jira.duraspace.org/browse/DS-1492">https://jira.duraspace.org/browse/DS-1492</a></li>
<li>Testing to see how we end up with all these new authorities after we keep cleaning and merging them in the database</li>
<li>Here are all my distinct authority combinations in the database before:</li>
</ul>
<pre tabindex="0"><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
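-- (Illustrative addition, not from the original session: the number of distinct authority
-- keys for this name can be tracked with a count like this.)
dspace=# select count(distinct authority) from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';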
</code></pre><ul>
<li>And then after adding a new item and selecting an existing “Orth, Alan” with an ORCID in the author lookup:</li>
</ul>
<pre tabindex="0"><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
</code></pre><ul>
<li>It created a new authority… let’s try to add another item and select the same existing author and see what happens in the database:</li>
</ul>
<pre tabindex="0"><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
</code></pre><ul>
<li>No new one… so now let me try to add another item and select the italicized result from the ORCID lookup and see what happens in the database:</li>
</ul>
<pre tabindex="0"><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
</code></pre><ul>
<li>Shit, it created another authority! Let’s try it again!</li>
</ul>
<pre tabindex="0"><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like '%Orth, %';
text_value | authority | confidence
------------+--------------------------------------+------------
Orth, Alan | 7c2bffb8-58c9-4bc8-b102-ebe8aec200ad | -1
</code></pre><ul>
<li>We still need to do the changes to <code>config.dct</code> and regenerate the <code>sitebndl.zip</code> to send to the Handle.net admins</li>
<li>According to this <a href="http://dspace.2283337.n4.nabble.com/Multiple-handle-prefixes-merged-DSpace-instances-td3427192.html">dspace-tech mailing list entry from 2011</a>, we need to add the extra handle prefixes to <code>config.dct</code> like this:</li>
</ul>
<pre tabindex="0"><code>"server_admins" = (
|
||||
"300:0.NA/10568"
|
||||
"300:0.NA/10947"
|
||||
<pre tabindex="0"><code>"server_admins" = (
|
||||
"300:0.NA/10568"
|
||||
"300:0.NA/10947"
|
||||
)
|
||||
|
||||
"replication_admins" = (
|
||||
"300:0.NA/10568"
|
||||
"300:0.NA/10947"
|
||||
"replication_admins" = (
|
||||
"300:0.NA/10568"
|
||||
"300:0.NA/10947"
|
||||
)
|
||||
|
||||
"backup_admins" = (
|
||||
"300:0.NA/10568"
|
||||
"300:0.NA/10947"
|
||||
"backup_admins" = (
|
||||
"300:0.NA/10568"
|
||||
"300:0.NA/10947"
|
||||
)
|
||||
</code></pre><ul>
<li>More work on the CGIAR Library migration test run locally, as I was having problems importing the last fourteen items from the CGIAR System Management Office community</li>
<li>Abenet and I noticed that hdl.handle.net is blocked by ETC at ILRI Addis so I asked Biruk Debebe to route it over the satellite</li>
<li>Force thumbnail regeneration for the CGIAR System Organization’s Historic Archive community (2000 items):</li>
</ul>
<pre tabindex="0"><code>$ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -f -i 10947/1 -p "ImageMagick PDF Thumbnail"
</code></pre><ul>
<li>I’m still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org</li>
</ul>
<ul>
<li>Communicate (finally) with Tania and Tunji from the CGIAR System Organization office to tell them to request that CGNET make the DNS updates for library.cgiar.org</li>
<li>Peter wants me to clean up the text values for Delia Grace’s metadata, as the authorities are all messed up again since we cleaned them up in <a href="/cgspace-notes/2016-12">2016-12</a>:</li>
</ul>
<pre tabindex="0"><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
text_value | authority | confidence
--------------+--------------------------------------+------------
Grace, Delia | | 600
</code></pre><ul>
<li>Strangely, none of her authority entries have ORCIDs anymore…</li>
<li>I’ll just fix the text values and forget about it for now:</li>
</ul>
<pre tabindex="0"><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
UPDATE 610
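-- (Illustrative check, not from the original session: re-running the earlier distinct query
-- should now show a single, consistent row for her.)
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';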
</code></pre><ul>
<li>After this we have to reindex the Discovery and Authority cores (as <code>tomcat7</code> user):</li>
</ul>
<pre tabindex="0"><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m -XX:+TieredCompilation -XX:TieredStopAtLevel=1"
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-discovery -b
real 83m56.895s
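# (Assumed step, not shown in this excerpt: the Authority core would be reindexed the same
# way with the index-authority launcher command, if it is available in this DSpace version.)
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace index-authority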
</code></pre>