mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-26 16:38:19 +01:00
Add notes for 2017-09-13
parent
6d071a6426
commit
a13c5e93b6
@@ -173,8 +173,54 @@ $ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x
 ```
 
 - If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex
+- A Google search for the "API scraper" user agent returns a `robots.txt` with a comment that this is the Yewno bot: http://www.escholarship.org/robots.txt
 - Also, in looking at the DSpace logs I noticed a warning from OAI that I should look into:
 
 ```
 WARN org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
 ```
+
+- Looking at the spreadsheet with deletions and corrections that CCAFS sent last week
+- It appears they want to delete a lot of metadata, and I'm not sure they realize the implications:
+
+```
+dspace=# select text_value, count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange') group by text_value;
+        text_value        | count
+--------------------------+-------
+ FP4_ClimateModels        |     6
+ FP1_CSAEvidence          |     7
+ SEA_UpscalingInnovation  |     7
+ FP4_Baseline             |    69
+ WA_Partnership           |     1
+ WA_SciencePolicyExchange |     6
+ SA_GHGMeasurement        |     2
+ SA_CSV                   |     7
+ EA_PAR                   |    18
+ FP4_Livestock            |     7
+ FP4_GenderPolicy         |     4
+ FP2_CRMWestAfrica        |    12
+ FP4_ClimateData          |    24
+ FP4_CCPAG                |     2
+ SEA_mitigationSAMPLES    |     2
+ SA_Biodiversity          |     1
+ FP4_PolicyEngagement     |    20
+ FP3_Gender               |     9
+ FP4_GenderToolbox        |     3
+(19 rows)
+```
+
+- I sent CCAFS people an email to ask if they really want to remove these 200+ tags
+- She responded yes, so I'll at least need to do these deletes in PostgreSQL:
+
+```
+dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
+DELETE 207
+```
+
+- When we discussed this in late July there were some other renames they had requested, but I don't see them in the current spreadsheet so I will have to follow that up
+- Create and merge pull request to shut up the Ehcache update check ([#337](https://github.com/ilri/DSpace/pull/337))
+- It looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0, though it only affects XMLUI: https://jira.duraspace.org/browse/DS-1492
+- I commented there suggesting that we disable it globally
+- I merged the changes to the CCAFS project tags ([#336](https://github.com/ilri/DSpace/pull/336)) but still need to finalize the metadata deletions/renames
+- I merged the CGIAR Library theme changes ([#338](https://github.com/ilri/DSpace/pull/338)) to the `5_x-prod` branch in preparation for next week's migration
+- I emailed the Handle administrators (hdladmin@cnri.reston.va.us) to ask them what the process is for changing their prefix to be resolved by our resolver
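As a rough sketch of the count-then-delete step recorded in the notes above: the function below counts the rows that match a tag list and deletes them in the same transaction, rolling back if the two numbers disagree. This is an illustration under stated assumptions, not what was run on CGSpace. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, a three-column toy version of `metadatavalue`, and a hypothetical helper named `delete_tags`. Only the column names, the field IDs 134 and 235, and the tag names come from the notes.

```python
import sqlite3


def delete_tags(conn, tags, field_ids=(134, 235)):
    """Count matching item metadata rows, delete them, and return (expected, deleted).

    Hypothetical helper for illustration; sqlite3 stands in for PostgreSQL here.
    """
    unique = sorted(set(tags))  # a repeated tag such as 'FP_GII' collapses to one entry
    field_marks = ",".join("?" * len(field_ids))
    tag_marks = ",".join("?" * len(unique))
    where = (
        f"resource_type_id = 2 AND metadata_field_id IN ({field_marks}) "
        f"AND text_value IN ({tag_marks})"
    )
    params = [*field_ids, *unique]
    with conn:  # commits on success, rolls back if an exception is raised
        expected = conn.execute(
            f"SELECT COUNT(*) FROM metadatavalue WHERE {where}", params
        ).fetchone()[0]
        deleted = conn.execute(
            f"DELETE FROM metadatavalue WHERE {where}", params
        ).rowcount
        if deleted != expected:
            raise RuntimeError(f"expected {expected} rows, deleted {deleted}")
    return expected, deleted


# Toy data: three rows should match, three should survive.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue "
    "(resource_type_id INTEGER, metadata_field_id INTEGER, text_value TEXT)"
)
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?, ?, ?)",
    [
        (2, 134, "EA_PAR"),
        (2, 235, "EA_PAR"),
        (2, 134, "SA_CSV"),
        (2, 134, "KeepMe"),  # tag not in the deletion list
        (2, 999, "EA_PAR"),  # different metadata field
        (3, 134, "EA_PAR"),  # different resource type
    ],
)

expected, deleted = delete_tags(conn, ["EA_PAR", "SA_CSV", "FP_GII", "FP_GII"])
remaining = conn.execute("SELECT COUNT(*) FROM metadatavalue").fetchone()[0]
print(expected, deleted, remaining)  # prints: 3 3 3
```

Deduplicating the list first is only tidiness: SQL's `IN` already ignores a repeated value, so the second `'FP_GII'` in the recorded `delete` has no effect on the result.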
@@ -25,7 +25,7 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account
 
 
 <meta property="article:published_time" content="2017-09-07T16:54:52+07:00"/>
-<meta property="article:modified_time" content="2017-09-12T16:57:19+03:00"/>
+<meta property="article:modified_time" content="2017-09-13T09:53:54+03:00"/>
 
 
 
@@ -61,9 +61,9 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account
 "@type": "BlogPosting",
 "headline": "September, 2017",
 "url": "https://alanorth.github.io/cgspace-notes/2017-09/",
-"wordCount": "1241",
+"wordCount": "1566",
 "datePublished": "2017-09-07T16:54:52+07:00",
-"dateModified": "2017-09-12T16:57:19+03:00",
+"dateModified": "2017-09-13T09:53:54+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -319,12 +319,62 @@ dspace.log.2017-09-10:0
 
 <ul>
 <li>If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex</li>
+<li>A Google search for the “API scraper” user agent returns a <code>robots.txt</code> with a comment that this is the Yewno bot: <a href="http://www.escholarship.org/robots.txt">http://www.escholarship.org/robots.txt</a></li>
 <li>Also, in looking at the DSpace logs I noticed a warning from OAI that I should look into:</li>
 </ul>
 
 <pre><code>WARN org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
 </code></pre>
+
+<ul>
+<li>Looking at the spreadsheet with deletions and corrections that CCAFS sent last week</li>
+<li>It appears they want to delete a lot of metadata, and I’m not sure they realize the implications:</li>
+</ul>
+
+<pre><code>dspace=# select text_value, count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange') group by text_value;
+        text_value        | count
+--------------------------+-------
+ FP4_ClimateModels        |     6
+ FP1_CSAEvidence          |     7
+ SEA_UpscalingInnovation  |     7
+ FP4_Baseline             |    69
+ WA_Partnership           |     1
+ WA_SciencePolicyExchange |     6
+ SA_GHGMeasurement        |     2
+ SA_CSV                   |     7
+ EA_PAR                   |    18
+ FP4_Livestock            |     7
+ FP4_GenderPolicy         |     4
+ FP2_CRMWestAfrica        |    12
+ FP4_ClimateData          |    24
+ FP4_CCPAG                |     2
+ SEA_mitigationSAMPLES    |     2
+ SA_Biodiversity          |     1
+ FP4_PolicyEngagement     |    20
+ FP3_Gender               |     9
+ FP4_GenderToolbox        |     3
+(19 rows)
+</code></pre>
+
+<ul>
+<li>I sent CCAFS people an email to ask if they really want to remove these 200+ tags</li>
+<li>She responded yes, so I’ll at least need to do these deletes in PostgreSQL:</li>
+</ul>
+
+<pre><code>dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
+DELETE 207
+</code></pre>
+
+<ul>
+<li>When we discussed this in late July there were some other renames they had requested, but I don’t see them in the current spreadsheet so I will have to follow that up</li>
+<li>Create and merge pull request to shut up the Ehcache update check (<a href="https://github.com/ilri/DSpace/pull/337">#337</a>)</li>
+<li>It looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0, though it only affects XMLUI: <a href="https://jira.duraspace.org/browse/DS-1492">https://jira.duraspace.org/browse/DS-1492</a></li>
+<li>I commented there suggesting that we disable it globally</li>
+<li>I merged the changes to the CCAFS project tags (<a href="https://github.com/ilri/DSpace/pull/336">#336</a>) but still need to finalize the metadata deletions/renames</li>
+<li>I merged the CGIAR Library theme changes (<a href="https://github.com/ilri/DSpace/pull/338">#338</a>) to the <code>5_x-prod</code> branch in preparation for next week’s migration</li>
+<li>I emailed the Handle administrators (hdladmin@cnri.reston.va.us) to ask them what the process is for changing their prefix to be resolved by our resolver</li>
+</ul>
 
 
 
 
@@ -4,7 +4,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-09/</loc>
-<lastmod>2017-09-12T16:57:19+03:00</lastmod>
+<lastmod>2017-09-13T09:53:54+03:00</lastmod>
 </url>
 
 <url>
@@ -119,7 +119,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2017-09-12T16:57:19+03:00</lastmod>
+<lastmod>2017-09-13T09:53:54+03:00</lastmod>
 <priority>0</priority>
 </url>
 
@@ -130,19 +130,19 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2017-09-12T16:57:19+03:00</lastmod>
+<lastmod>2017-09-13T09:53:54+03:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/post/</loc>
-<lastmod>2017-09-12T16:57:19+03:00</lastmod>
+<lastmod>2017-09-13T09:53:54+03:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2017-09-12T16:57:19+03:00</lastmod>
+<lastmod>2017-09-13T09:53:54+03:00</lastmod>
 <priority>0</priority>
 </url>
 