Add notes for 2016-03-21
Signed-off-by: Alan Orth <alan.orth@gmail.com>
@ -118,3 +118,30 @@ $ gm convert -trim -quality 82 -thumbnail x300 -flatten Descriptor\ for\ Butia_E
```
- Also, it looks like adding `-sharpen 0x1.0` really improves the quality of the image at a cost of only a few KB
## 2016-03-21
- Fix 66 site errors in Google's webmaster tools
- I looked at a bunch of them and they were old URLs, weird things linked from non-existent items, etc, so I just marked them all as fixed
- We also have 1,300 "soft 404" errors for URLs like: https://cgspace.cgiar.org/handle/10568/440/browse?type=bioversity
- I've marked them as fixed as well since the ones I tested were working fine
- This raises another question, as many of these pages are linked from Discovery search results and might create a duplicate content problem...
- Results pages like this give items that Google already knows from the sitemap: https://cgspace.cgiar.org/discover?filtertype=author&filter_relational_operator=equals&filter=Orth%2C+A.
- There are some access denied errors on JSPUI links (of course! we forbid them!), but I'm not sure why Google is trying to index them...
- For example:
- This: https://cgspace.cgiar.org/jspui/bitstream/10568/809/1/main-page.pdf
- Linked from: https://cgspace.cgiar.org/jspui/handle/10568/809
- I will mark these errors as resolved because they have been returning HTTP 403 on purpose for a long time!
- Google says the first time it saw this particular error was September 29, 2015... so maybe it accidentally saw it somehow...
- On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content
- Turns out this is a problem with DSpace's `robots.txt`, and there has been a Jira ticket open since December 2015: https://jira.duraspace.org/browse/DS-2962
- I am not sure if I want to apply the fix proposed in that ticket yet
- For now I've just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools
- Move AVCD collection to new community and update `move_collection.sh` script: https://gist.github.com/alanorth/392c4660e8b022d99dfa
- It seems Feedburner can do HTTPS now, so we might be able to update our feeds and simplify the nginx configs
- Re-deploy CGSpace with latest `5_x-prod` branch
- Run updates on CGSpace and reboot server (new kernel, `4.5.0`)
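
The duplicate-content problem noted above comes down to which dynamic URLs `robots.txt` lets crawlers fetch. As a minimal sketch, assuming a hypothetical rule set (the `Disallow` lines below are illustrative only; they are not the actual DSpace `robots.txt` or the change proposed in DS-2962), Python's standard `urllib.robotparser` can report which URLs a crawler would be blocked from:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real DSpace robots.txt
# and NOT the patch attached to DS-2962.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /discover
Disallow: /search-filter
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text forbids for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Sample URLs taken from the notes above.
urls = [
    "https://cgspace.cgiar.org/discover?filtertype=author"
    "&filter_relational_operator=equals&filter=Orth%2C+A.",
    "https://cgspace.cgiar.org/handle/10568/440/browse?type=bioversity",
    "https://cgspace.cgiar.org/handle/10568/809",
]

blocked = blocked_urls(HYPOTHETICAL_ROBOTS_TXT, urls)
for url in blocked:
    print("blocked:", url)
```

Under these assumed rules only the Discovery results URL is blocked. `urllib.robotparser` matches rules by simple path prefix, so a pattern-based rule (for example one targeting `browse?type=` under every handle) would need to be checked with a different tool.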