diff --git a/content/2016-03.md b/content/2016-03.md
index feeb7b3a3..330426f38 100644
--- a/content/2016-03.md
+++ b/content/2016-03.md
@@ -134,6 +134,9 @@ $ gm convert -trim -quality 82 -thumbnail x300 -flatten Descriptor\ for\ Butia_E
 - I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!
 - Google says the first time it saw this particular error was September 29, 2015... so maybe it accidentally saw it somehow...
 - On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content
+
+![CGSpace pages in Google index](../images/2016/03/google-index.png)
+
 - Turns out this is a problem with DSpace's `robots.txt`, and there's a Jira ticket since December, 2015: https://jira.duraspace.org/browse/DS-2962
 - I am not sure if I want to apply it yet
 - For now I've just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools
diff --git a/public/2016-03/index.html b/public/2016-03/index.html
index bb42a6edc..53c70b6c2 100644
--- a/public/2016-03/index.html
+++ b/public/2016-03/index.html
@@ -238,6 +238,11 @@
  • I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!
  • Google says the first time it saw this particular error was September 29, 2015… so maybe it accidentally saw it somehow…
  • On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content
+ [image: CGSpace pages in Google index]