mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-21 22:25:02 +01:00
Update notes for 2016-03-21
Signed-off-by: Alan Orth <alan.orth@gmail.com>
parent
c613704234
commit
8c9dc9e310
@ -134,6 +134,9 @@ $ gm convert -trim -quality 82 -thumbnail x300 -flatten Descriptor\ for\ Butia_E
|
||||
- I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!
|
||||
- Google says the first time it saw this particular error was September 29, 2015... so maybe it accidentally saw it somehow...
|
||||
- On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content
|
||||
|
||||
![CGSpace pages in Google index](../images/2016/03/google-index.png)
|
||||
|
||||
- Turns out this is a problem with DSpace's `robots.txt`, and there's a Jira ticket since December, 2015: https://jira.duraspace.org/browse/DS-2962
|
||||
- I am not sure if I want to apply it yet
|
||||
- For now I've just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools
|
||||
|
@ -238,6 +238,11 @@
|
||||
<li>I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!</li>
|
||||
<li>Google says the first time it saw this particular error was September 29, 2015… so maybe it accidentally saw it somehow…</li>
|
||||
<li>On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content</li>
|
||||
</ul>
|
||||
|
||||
<p><img src="../images/2016/03/google-index.png" alt="CGSpace pages in Google index" /></p>
|
||||
|
||||
<ul>
|
||||
<li>Turns out this is a problem with DSpace’s <code>robots.txt</code>, and there’s a Jira ticket since December, 2015: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
|
||||
<li>I am not sure if I want to apply it yet</li>
|
||||
<li>For now I’ve just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools</li>
|
||||
|
BIN
public/images/2016/03/google-index.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 92 KiB |
@ -176,6 +176,11 @@
|
||||
<li>I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!</li>
|
||||
<li>Google says the first time it saw this particular error was September 29, 2015&hellip; so maybe it accidentally saw it somehow&hellip;</li>
|
||||
<li>On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content</li>
|
||||
</ul>
|
||||
|
||||
<p><img src="../images/2016/03/google-index.png" alt="CGSpace pages in Google index" /></p>
|
||||
|
||||
<ul>
|
||||
<li>Turns out this is a problem with DSpace&rsquo;s <code>robots.txt</code>, and there&rsquo;s a Jira ticket since December, 2015: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
|
||||
<li>I am not sure if I want to apply it yet</li>
|
||||
<li>For now I&rsquo;ve just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools</li>
|
||||
|
@ -176,6 +176,11 @@
|
||||
<li>I will mark these errors as resolved because they are returning HTTP 403 on purpose, for a long time!</li>
|
||||
<li>Google says the first time it saw this particular error was September 29, 2015&hellip; so maybe it accidentally saw it somehow&hellip;</li>
|
||||
<li>On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content</li>
|
||||
</ul>
|
||||
|
||||
<p><img src="../images/2016/03/google-index.png" alt="CGSpace pages in Google index" /></p>
|
||||
|
||||
<ul>
|
||||
<li>Turns out this is a problem with DSpace&rsquo;s <code>robots.txt</code>, and there&rsquo;s a Jira ticket since December, 2015: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
|
||||
<li>I am not sure if I want to apply it yet</li>
|
||||
<li>For now I&rsquo;ve just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools</li>
|
||||
|
BIN
static/images/2016/03/google-index.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 92 KiB |