Add notes for 2017-09-13

This commit is contained in:
2017-09-13 16:42:17 +03:00
parent 6d071a6426
commit a13c5e93b6
3 changed files with 104 additions and 8 deletions

View File

@ -173,8 +173,54 @@ $ sudo tcpdump -i en0 -w without-cached-xsd.dump dst port 80 and 'tcp[32:4] = 0x
```
- If this continues I will definitely need to figure out who is responsible for this scraper and add their user agent to the session crawler valve regex
- A search for "API scraper" user agent on Google returns a `robots.txt` with a comment that this is the Yewno bot: http://www.escholarship.org/robots.txt
- Also, in looking at the DSpace logs I noticed a warning from OAI that I should look into:
```
WARN org.dspace.xoai.services.impl.xoai.DSpaceRepositoryConfiguration @ { OAI 2.0 :: DSpace } Not able to retrieve the dspace.oai.url property from oai.cfg. Falling back to request address
```
- Looking at the spreadsheet with deletions and corrections that CCAFS sent last week
- It appears they want to delete a lot of metadata, which I'm not sure they realize the implications of:
```
dspace=# select text_value, count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange') group by text_value;
text_value | count
--------------------------+-------
FP4_ClimateModels | 6
FP1_CSAEvidence | 7
SEA_UpscalingInnovation | 7
FP4_Baseline | 69
WA_Partnership | 1
WA_SciencePolicyExchange | 6
SA_GHGMeasurement | 2
SA_CSV | 7
EA_PAR | 18
FP4_Livestock | 7
FP4_GenderPolicy | 4
FP2_CRMWestAfrica | 12
FP4_ClimateData | 24
FP4_CCPAG | 2
SEA_mitigationSAMPLES | 2
SA_Biodiversity | 1
FP4_PolicyEngagement | 20
FP3_Gender | 9
FP4_GenderToolbox | 3
(19 rows)
```
- I sent CCAFS people an email to ask if they really want to remove these 200+ tags
- She responded yes, so I'll at least need to do these deletes in PostgreSQL:
```
dspace=# delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134, 235) and text_value in ('EA_PAR','FP1_CSAEvidence','FP2_CRMWestAfrica','FP3_Gender','FP4_Baseline','FP4_CCPAG','FP4_CCPG','FP4_CIATLAM IMPACT','FP4_ClimateData','FP4_ClimateModels','FP4_GenderPolicy','FP4_GenderToolbox','FP4_Livestock','FP4_PolicyEngagement','FP_GII','SA_Biodiversity','SA_CSV','SA_GHGMeasurement','SEA_mitigationSAMPLES','SEA_UpscalingInnovation','WA_Partnership','WA_SciencePolicyExchange','FP_GII');
DELETE 207
```
- When we discussed this in late July there were some other renames they had requested, but I don't see them in the current spreadsheet so I will have to follow that up
- Create and merge pull request to shut up the Ehcache update check ([#337](https://github.com/ilri/DSpace/pull/337))
- Although it looks like there was a previous attempt to disable these update checks that was merged in DSpace 4.0 (although it only affects XMLUI): https://jira.duraspace.org/browse/DS-1492
- I commented there suggesting that we disable it globally
- I merged the changes to the CCAFS project tags ([#336](https://github.com/ilri/DSpace/pull/336)) but still need to finalize the metadata deletions/renames
- I merged the CGIAR Library theme changes ([#338](https://github.com/ilri/DSpace/pull/338)) to the `5_x-prod` branch in preparation for next week's migration
- I emailed the Handle administrators (hdladmin@cnri.reston.va.us) to ask them what the process for changing their prefix to be resolved by our resolver