Update notes for 2016-03-21

Signed-off-by: Alan Orth <alan.orth@gmail.com>
Alan Orth 2016-03-22 10:42:18 +02:00
parent c613704234
commit 8c9dc9e310
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
6 changed files with 18 additions and 0 deletions

@@ -134,6 +134,9 @@ $ gm convert -trim -quality 82 -thumbnail x300 -flatten Descriptor\ for\ Butia_E
- I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!
- Google says the first time it saw this particular error was September 29, 2015... so maybe it accidentally saw it somehow...
- On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content
![CGSpace pages in Google index](../images/2016/03/google-index.png)
- Turns out this is a problem with DSpace's `robots.txt`, and there's a Jira ticket since December, 2015: https://jira.duraspace.org/browse/DS-2962
- I am not sure if I want to apply it yet
- For now I've just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools
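
A minimal sketch of how one could check which URLs a given `robots.txt` would keep crawlers away from, using Python's standard `urllib.robotparser`. The rules, the `/discover` and `/search-filter` paths, the `example.org` host, and the `is_crawlable` helper are all assumptions for illustration; the actual change proposed in DS-2962 may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not the real DSpace robots.txt
# and not necessarily what DS-2962 proposes.
ROBOTS_TXT = """\
User-agent: *
Disallow: /discover
Disallow: /search-filter
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical paths: one item page and two dynamic pages.
    for path in (
        "/handle/123/456",
        "/discover?filtertype=subject",
        "/search-filter?field=author",
    ):
        print(path, is_crawlable(ROBOTS_TXT, "https://example.org" + path))
```

Running something like this against the live `robots.txt` and a sample of the dynamic URLs would show whether a stricter file covers the pages currently being hidden by hand in Webmaster Tools.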

@@ -238,6 +238,11 @@
<li>I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!</li>
<li>Google says the first time it saw this particular error was September 29, 2015&hellip; so maybe it accidentally saw it somehow&hellip;</li>
<li>On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content</li>
</ul>
<p><img src="../images/2016/03/google-index.png" alt="CGSpace pages in Google index" /></p>
<ul>
<li>Turns out this is a problem with DSpace&rsquo;s <code>robots.txt</code>, and there&rsquo;s a Jira ticket since December, 2015: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>I am not sure if I want to apply it yet</li>
<li>For now I&rsquo;ve just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools</li>

Binary file not shown (92 KiB).

@@ -176,6 +176,11 @@
&lt;li&gt;I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!&lt;/li&gt;
&lt;li&gt;Google says the first time it saw this particular error was September 29, 2015&amp;hellip; so maybe it accidentally saw it somehow&amp;hellip;&lt;/li&gt;
&lt;li&gt;On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;../images/2016/03/google-index.png&#34; alt=&#34;CGSpace pages in Google index&#34; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Turns out this is a problem with DSpace&amp;rsquo;s &lt;code&gt;robots.txt&lt;/code&gt;, and there&amp;rsquo;s a Jira ticket since December, 2015: &lt;a href=&#34;https://jira.duraspace.org/browse/DS-2962&#34;&gt;https://jira.duraspace.org/browse/DS-2962&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;I am not sure if I want to apply it yet&lt;/li&gt;
&lt;li&gt;For now I&amp;rsquo;ve just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools&lt;/li&gt;

@@ -176,6 +176,11 @@
&lt;li&gt;I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!&lt;/li&gt;
&lt;li&gt;Google says the first time it saw this particular error was September 29, 2015&amp;hellip; so maybe it accidentally saw it somehow&amp;hellip;&lt;/li&gt;
&lt;li&gt;On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;../images/2016/03/google-index.png&#34; alt=&#34;CGSpace pages in Google index&#34; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Turns out this is a problem with DSpace&amp;rsquo;s &lt;code&gt;robots.txt&lt;/code&gt;, and there&amp;rsquo;s a Jira ticket since December, 2015: &lt;a href=&#34;https://jira.duraspace.org/browse/DS-2962&#34;&gt;https://jira.duraspace.org/browse/DS-2962&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;I am not sure if I want to apply it yet&lt;/li&gt;
&lt;li&gt;For now I&amp;rsquo;ve just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools&lt;/li&gt;

Binary file not shown (92 KiB).