Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2024-12-24 14:04:29 +01:00)
Update notes for 2017-08-01
parent e3e602881e
commit 5b11434f0f
@@ -16,5 +16,7 @@ tags = ["Notes"]
 - /handle/10568/16510/browse
 - The `robots.txt` only blocks the top-level `/discover` and `/browse` URLs... we will need to find a way to forbid them from accessing these!
 - Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.duraspace.org/browse/DS-2962
+- It turns out that we're already adding the `X-Robots-Tag "none"` HTTP header, but this only forbids the search engine from _indexing_ the page, not crawling it!
+- Also, the bot has to successfully browse the page first so it can receive the HTTP header...
 <!--more-->
@@ -25,7 +25,7 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
 <meta property="article:published_time" content="2015-11-23T17:00:57+03:00"/>
-<meta property="article:modified_time" content="2015-11-23T17:00:57+03:00"/>
+<meta property="article:modified_time" content="2016-09-28T17:02:30+03:00"/>
@@ -71,7 +71,7 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
 "url": "https://alanorth.github.io/cgspace-notes/2015-11/",
 "wordCount": "798",
 "datePublished": "2015-11-23T17:00:57+03:00",
-"dateModified": "2015-11-23T17:00:57+03:00",
+"dateModified": "2016-09-28T17:02:30+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -26,7 +26,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
 <meta property="article:published_time" content="2015-12-02T13:18:00+03:00"/>
-<meta property="article:modified_time" content="2015-12-02T13:18:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-09T16:18:07+02:00"/>
@@ -73,7 +73,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
 "url": "https://alanorth.github.io/cgspace-notes/2015-12/",
 "wordCount": "753",
 "datePublished": "2015-12-02T13:18:00+03:00",
-"dateModified": "2015-12-02T13:18:00+03:00",
+"dateModified": "2017-01-09T16:18:07+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -21,7 +21,7 @@ Update GitHub wiki for documentation of maintenance tasks.
 <meta property="article:published_time" content="2016-01-13T13:18:00+03:00"/>
-<meta property="article:modified_time" content="2016-01-13T13:18:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-09T16:18:07+02:00"/>
@@ -63,7 +63,7 @@ Update GitHub wiki for documentation of maintenance tasks.
 "url": "https://alanorth.github.io/cgspace-notes/2016-01/",
 "wordCount": "466",
 "datePublished": "2016-01-13T13:18:00+03:00",
-"dateModified": "2016-01-13T13:18:00+03:00",
+"dateModified": "2017-01-09T16:18:07+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -28,7 +28,7 @@ Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE&r
 <meta property="article:published_time" content="2016-02-05T13:18:00+03:00"/>
-<meta property="article:modified_time" content="2016-02-05T13:18:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-09T16:18:07+02:00"/>
@@ -77,7 +77,7 @@ Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE&r
 "url": "https://alanorth.github.io/cgspace-notes/2016-02/",
 "wordCount": "1657",
 "datePublished": "2016-02-05T13:18:00+03:00",
-"dateModified": "2016-02-05T13:18:00+03:00",
+"dateModified": "2017-01-09T16:18:07+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -21,7 +21,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
 <meta property="article:published_time" content="2016-03-02T16:50:00+03:00"/>
-<meta property="article:modified_time" content="2016-03-02T16:50:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-09T16:18:07+02:00"/>
@@ -63,7 +63,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
 "url": "https://alanorth.github.io/cgspace-notes/2016-03/",
 "wordCount": "1581",
 "datePublished": "2016-03-02T16:50:00+03:00",
-"dateModified": "2016-03-02T16:50:00+03:00",
+"dateModified": "2017-01-09T16:18:07+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -23,7 +23,7 @@ Also, I noticed the checker log has some errors we should pay attention to:
 <meta property="article:published_time" content="2016-04-04T11:06:00+03:00"/>
-<meta property="article:modified_time" content="2016-04-04T11:06:00+03:00"/>
+<meta property="article:modified_time" content="2016-09-28T17:02:30+03:00"/>
@@ -67,7 +67,7 @@ Also, I noticed the checker log has some errors we should pay attention to:
 "url": "https://alanorth.github.io/cgspace-notes/2016-04/",
 "wordCount": "2006",
 "datePublished": "2016-04-04T11:06:00+03:00",
-"dateModified": "2016-04-04T11:06:00+03:00",
+"dateModified": "2016-09-28T17:02:30+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -25,7 +25,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
 <meta property="article:published_time" content="2016-05-01T23:06:00+03:00"/>
-<meta property="article:modified_time" content="2016-05-01T23:06:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-09T16:18:07+02:00"/>
@@ -71,7 +71,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
 "url": "https://alanorth.github.io/cgspace-notes/2016-05/",
 "wordCount": "1349",
 "datePublished": "2016-05-01T23:06:00+03:00",
-"dateModified": "2016-05-01T23:06:00+03:00",
+"dateModified": "2017-01-09T16:18:07+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -24,7 +24,7 @@ Working on second phase of metadata migration, looks like this will work for mov
 <meta property="article:published_time" content="2016-06-01T10:53:00+03:00"/>
-<meta property="article:modified_time" content="2016-06-01T10:53:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-09T16:18:07+02:00"/>
@@ -69,7 +69,7 @@ Working on second phase of metadata migration, looks like this will work for mov
 "url": "https://alanorth.github.io/cgspace-notes/2016-06/",
 "wordCount": "1549",
 "datePublished": "2016-06-01T10:53:00+03:00",
-"dateModified": "2016-06-01T10:53:00+03:00",
+"dateModified": "2017-01-09T16:18:07+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -32,7 +32,7 @@ In this case the select query was showing 95 results before the update
 <meta property="article:published_time" content="2016-07-01T10:53:00+03:00"/>
-<meta property="article:modified_time" content="2016-07-01T10:53:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-09T16:18:07+02:00"/>
@@ -85,7 +85,7 @@ In this case the select query was showing 95 results before the update
 "url": "https://alanorth.github.io/cgspace-notes/2016-07/",
 "wordCount": "866",
 "datePublished": "2016-07-01T10:53:00+03:00",
-"dateModified": "2016-07-01T10:53:00+03:00",
+"dateModified": "2017-01-09T16:18:07+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -29,7 +29,7 @@ $ git rebase -i dspace-5.5
 <meta property="article:published_time" content="2016-08-01T15:53:00+03:00"/>
-<meta property="article:modified_time" content="2016-08-01T15:53:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-09T16:18:07+02:00"/>
@@ -79,7 +79,7 @@ $ git rebase -i dspace-5.5
 "url": "https://alanorth.github.io/cgspace-notes/2016-08/",
 "wordCount": "1514",
 "datePublished": "2016-08-01T15:53:00+03:00",
-"dateModified": "2016-08-01T15:53:00+03:00",
+"dateModified": "2017-01-09T16:18:07+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -25,7 +25,7 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=or
 <meta property="article:published_time" content="2016-09-01T15:53:00+03:00"/>
-<meta property="article:modified_time" content="2016-09-01T15:53:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-09T16:18:07+02:00"/>
@@ -71,7 +71,7 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=or
 "url": "https://alanorth.github.io/cgspace-notes/2016-09/",
 "wordCount": "3298",
 "datePublished": "2016-09-01T15:53:00+03:00",
-"dateModified": "2016-09-01T15:53:00+03:00",
+"dateModified": "2017-01-09T16:18:07+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -29,7 +29,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id
 <meta property="article:published_time" content="2016-10-03T15:53:00+03:00"/>
-<meta property="article:modified_time" content="2016-10-03T15:53:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-10T16:21:47+02:00"/>
@@ -79,7 +79,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id
 "url": "https://alanorth.github.io/cgspace-notes/2016-10/",
 "wordCount": "1828",
 "datePublished": "2016-10-03T15:53:00+03:00",
-"dateModified": "2016-10-03T15:53:00+03:00",
+"dateModified": "2017-01-10T16:21:47+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -21,7 +21,7 @@ Add dc.type to the output options for Atmire’s Listings and Reports module
 <meta property="article:published_time" content="2016-11-01T09:21:00+03:00"/>
-<meta property="article:modified_time" content="2016-11-01T09:21:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-10T16:21:47+02:00"/>
@@ -63,7 +63,7 @@ Add dc.type to the output options for Atmire’s Listings and Reports module
 "url": "https://alanorth.github.io/cgspace-notes/2016-11/",
 "wordCount": "2825",
 "datePublished": "2016-11-01T09:21:00+03:00",
-"dateModified": "2016-11-01T09:21:00+03:00",
+"dateModified": "2017-01-10T16:21:47+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -33,7 +33,7 @@ Another worrying error from dspace.log is:
 <meta property="article:published_time" content="2016-12-02T10:43:00+03:00"/>
-<meta property="article:modified_time" content="2016-12-02T10:43:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-10T16:21:47+02:00"/>
@@ -87,7 +87,7 @@ Another worrying error from dspace.log is:
 "url": "https://alanorth.github.io/cgspace-notes/2016-12/",
 "wordCount": "4078",
 "datePublished": "2016-12-02T10:43:00+03:00",
-"dateModified": "2016-12-02T10:43:00+03:00",
+"dateModified": "2017-01-10T16:21:47+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -21,7 +21,7 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
 <meta property="article:published_time" content="2017-01-02T10:43:00+03:00"/>
-<meta property="article:modified_time" content="2017-01-02T10:43:00+03:00"/>
+<meta property="article:modified_time" content="2017-01-29T13:18:32+02:00"/>
@@ -63,7 +63,7 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
 "url": "https://alanorth.github.io/cgspace-notes/2017-01/",
 "wordCount": "1594",
 "datePublished": "2017-01-02T10:43:00+03:00",
-"dateModified": "2017-01-02T10:43:00+03:00",
+"dateModified": "2017-01-29T13:18:32+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -35,7 +35,7 @@ Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
 <meta property="article:published_time" content="2017-02-07T07:04:52-08:00"/>
-<meta property="article:modified_time" content="2017-02-07T07:04:52-08:00"/>
+<meta property="article:modified_time" content="2017-02-28T22:58:29+02:00"/>
@@ -91,7 +91,7 @@ Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
 "url": "https://alanorth.github.io/cgspace-notes/2017-02/",
 "wordCount": "2028",
 "datePublished": "2017-02-07T07:04:52-08:00",
-"dateModified": "2017-02-07T07:04:52-08:00",
+"dateModified": "2017-02-28T22:58:29+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -37,7 +37,7 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
 <meta property="article:published_time" content="2017-03-01T17:08:52+02:00"/>
-<meta property="article:modified_time" content="2017-03-01T17:08:52+02:00"/>
+<meta property="article:modified_time" content="2017-03-31T05:36:10+03:00"/>
@@ -95,7 +95,7 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
 "url": "https://alanorth.github.io/cgspace-notes/2017-03/",
 "wordCount": "1538",
 "datePublished": "2017-03-01T17:08:52+02:00",
-"dateModified": "2017-03-01T17:08:52+02:00",
+"dateModified": "2017-03-31T05:36:10+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -30,7 +30,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
 <meta property="article:published_time" content="2017-04-02T17:08:52+02:00"/>
-<meta property="article:modified_time" content="2017-04-02T17:08:52+02:00"/>
+<meta property="article:modified_time" content="2017-04-26T13:35:10+03:00"/>
@@ -81,7 +81,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
 "url": "https://alanorth.github.io/cgspace-notes/2017-04/",
 "wordCount": "2917",
 "datePublished": "2017-04-02T17:08:52+02:00",
-"dateModified": "2017-04-02T17:08:52+02:00",
+"dateModified": "2017-04-26T13:35:10+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -13,7 +13,7 @@
 <meta property="article:published_time" content="2017-05-01T16:21:52+02:00"/>
-<meta property="article:modified_time" content="2017-05-01T16:21:52+02:00"/>
+<meta property="article:modified_time" content="2017-05-29T13:15:22+03:00"/>
@@ -47,7 +47,7 @@
 "url": "https://alanorth.github.io/cgspace-notes/2017-05/",
 "wordCount": "2412",
 "datePublished": "2017-05-01T16:21:52+02:00",
-"dateModified": "2017-05-01T16:21:52+02:00",
+"dateModified": "2017-05-29T13:15:22+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -13,7 +13,7 @@
 <meta property="article:published_time" content="2017-06-01T10:14:52+03:00"/>
-<meta property="article:modified_time" content="2017-06-01T10:14:52+03:00"/>
+<meta property="article:modified_time" content="2017-06-30T18:34:51+03:00"/>
@@ -47,7 +47,7 @@
 "url": "https://alanorth.github.io/cgspace-notes/2017-06/",
 "wordCount": "1261",
 "datePublished": "2017-06-01T10:14:52+03:00",
-"dateModified": "2017-06-01T10:14:52+03:00",
+"dateModified": "2017-06-30T18:34:51+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -27,7 +27,7 @@ We can use PostgreSQL’s extended output format (-x) plus sed to format the
 <meta property="article:published_time" content="2017-07-01T18:03:52+03:00"/>
-<meta property="article:modified_time" content="2017-07-01T18:03:52+03:00"/>
+<meta property="article:modified_time" content="2017-08-01T08:55:37+03:00"/>
@@ -75,7 +75,7 @@ We can use PostgreSQL’s extended output format (-x) plus sed to format the
 "url": "https://alanorth.github.io/cgspace-notes/2017-07/",
 "wordCount": "1151",
 "datePublished": "2017-07-01T18:03:52+03:00",
-"dateModified": "2017-07-01T18:03:52+03:00",
+"dateModified": "2017-08-01T08:55:37+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -21,6 +21,8 @@ But many of the bots are browsing dynamic URLs like:
 The robots.txt only blocks the top-level /discover and /browse URLs… we will need to find a way to forbid them from accessing these!
 Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.duraspace.org/browse/DS-2962
+It turns out that we’re already adding the X-Robots-Tag "none" HTTP header, but this only forbids the search engine from indexing the page, not crawling it!
+Also, the bot has to successfully browse the page first so it can receive the HTTP header…
 " />
@@ -30,7 +32,7 @@ Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.dura
 <meta property="article:published_time" content="2017-08-01T11:51:52+03:00"/>
-<meta property="article:modified_time" content="2017-08-01T11:51:52+03:00"/>
+<meta property="article:modified_time" content="2017-08-01T11:57:37+03:00"/>
@@ -65,6 +67,8 @@ But many of the bots are browsing dynamic URLs like:
 The robots.txt only blocks the top-level /discover and /browse URLs… we will need to find a way to forbid them from accessing these!
 Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.duraspace.org/browse/DS-2962
+It turns out that we’re already adding the X-Robots-Tag "none" HTTP header, but this only forbids the search engine from indexing the page, not crawling it!
+Also, the bot has to successfully browse the page first so it can receive the HTTP header…
 "/>
@@ -79,9 +83,9 @@ Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.dura
 "@type": "BlogPosting",
 "headline": "August, 2017",
 "url": "https://alanorth.github.io/cgspace-notes/2017-08/",
-"wordCount": "123",
+"wordCount": "166",
 "datePublished": "2017-08-01T11:51:52+03:00",
-"dateModified": "2017-08-01T11:51:52+03:00",
+"dateModified": "2017-08-01T11:57:37+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -159,6 +163,8 @@ Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): https://jira.dura
 </ul></li>
 <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs… we will need to find a way to forbid them from accessing these!</li>
 <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
+<li>It turns out that we’re already adding the <code>X-Robots-Tag "none"</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
+<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header…</li>
 </ul>
 <p></p>
@@ -119,6 +119,8 @@
 </ul></li>
 <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs… we will need to find a way to forbid them from accessing these!</li>
 <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
+<li>It turns out that we’re already adding the <code>X-Robots-Tag "none"</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
+<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header…</li>
 </ul>
 <p></p>
@@ -32,6 +32,8 @@
 </ul></li>
 <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
 <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
+<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
+<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
 </ul>
 <p></p></description>
@@ -119,6 +119,8 @@
 </ul></li>
 <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs… we will need to find a way to forbid them from accessing these!</li>
 <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
+<li>It turns out that we’re already adding the <code>X-Robots-Tag "none"</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
+<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header…</li>
 </ul>
 <p></p>
@@ -32,6 +32,8 @@
 </ul></li>
 <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
 <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
+<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
+<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
 </ul>
 <p></p></description>
@@ -4,117 +4,117 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-08/</loc>
-<lastmod>2017-08-01T11:51:52+03:00</lastmod>
+<lastmod>2017-08-01T11:57:37+03:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-07/</loc>
-<lastmod>2017-07-01T18:03:52+03:00</lastmod>
+<lastmod>2017-08-01T08:55:37+03:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-06/</loc>
-<lastmod>2017-06-01T10:14:52+03:00</lastmod>
+<lastmod>2017-06-30T18:34:51+03:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-05/</loc>
-<lastmod>2017-05-01T16:21:52+02:00</lastmod>
+<lastmod>2017-05-29T13:15:22+03:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-04/</loc>
-<lastmod>2017-04-02T17:08:52+02:00</lastmod>
+<lastmod>2017-04-26T13:35:10+03:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-03/</loc>
-<lastmod>2017-03-01T17:08:52+02:00</lastmod>
+<lastmod>2017-03-31T05:36:10+03:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-02/</loc>
-<lastmod>2017-02-07T07:04:52-08:00</lastmod>
+<lastmod>2017-02-28T22:58:29+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-01/</loc>
-<lastmod>2017-01-02T10:43:00+03:00</lastmod>
+<lastmod>2017-01-29T13:18:32+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-12/</loc>
-<lastmod>2016-12-02T10:43:00+03:00</lastmod>
+<lastmod>2017-01-10T16:21:47+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-11/</loc>
-<lastmod>2016-11-01T09:21:00+03:00</lastmod>
+<lastmod>2017-01-10T16:21:47+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-10/</loc>
-<lastmod>2016-10-03T15:53:00+03:00</lastmod>
+<lastmod>2017-01-10T16:21:47+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-09/</loc>
-<lastmod>2016-09-01T15:53:00+03:00</lastmod>
+<lastmod>2017-01-09T16:18:07+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-08/</loc>
-<lastmod>2016-08-01T15:53:00+03:00</lastmod>
+<lastmod>2017-01-09T16:18:07+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-07/</loc>
-<lastmod>2016-07-01T10:53:00+03:00</lastmod>
+<lastmod>2017-01-09T16:18:07+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-06/</loc>
-<lastmod>2016-06-01T10:53:00+03:00</lastmod>
+<lastmod>2017-01-09T16:18:07+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-05/</loc>
-<lastmod>2016-05-01T23:06:00+03:00</lastmod>
+<lastmod>2017-01-09T16:18:07+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-04/</loc>
-<lastmod>2016-04-04T11:06:00+03:00</lastmod>
+<lastmod>2016-09-28T17:02:30+03:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-03/</loc>
-<lastmod>2016-03-02T16:50:00+03:00</lastmod>
+<lastmod>2017-01-09T16:18:07+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-02/</loc>
-<lastmod>2016-02-05T13:18:00+03:00</lastmod>
+<lastmod>2017-01-09T16:18:07+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2016-01/</loc>
-<lastmod>2016-01-13T13:18:00+03:00</lastmod>
+<lastmod>2017-01-09T16:18:07+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2015-12/</loc>
-<lastmod>2015-12-02T13:18:00+03:00</lastmod>
+<lastmod>2017-01-09T16:18:07+02:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2015-11/</loc>
-<lastmod>2015-11-23T17:00:57+03:00</lastmod>
+<lastmod>2016-09-28T17:02:30+03:00</lastmod>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2017-08-01T11:51:52+03:00</lastmod>
+<lastmod>2017-08-01T11:57:37+03:00</lastmod>
 <priority>0</priority>
 </url>
@@ -125,19 +125,19 @@
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2017-08-01T11:51:52+03:00</lastmod>
+<lastmod>2017-08-01T11:57:37+03:00</lastmod>
 <priority>0</priority>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/post/</loc>
-<lastmod>2017-08-01T11:51:52+03:00</lastmod>
+<lastmod>2017-08-01T11:57:37+03:00</lastmod>
 <priority>0</priority>
 </url>
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2017-08-01T11:51:52+03:00</lastmod>
+<lastmod>2017-08-01T11:57:37+03:00</lastmod>
 <priority>0</priority>
 </url>
@@ -119,6 +119,8 @@
 </ul></li>
 <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs… we will need to find a way to forbid them from accessing these!</li>
 <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
+<li>It turns out that we’re already adding the <code>X-Robots-Tag "none"</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
+<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header…</li>
 </ul>
 <p></p>
@@ -32,6 +32,8 @@
 </ul></li>
 <li>The <code>robots.txt</code> only blocks the top-level <code>/discover</code> and <code>/browse</code> URLs&hellip; we will need to find a way to forbid them from accessing these!</li>
 <li>Relevant issue from DSpace Jira (semi resolved in DSpace 6.0): <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
+<li>It turns out that we&rsquo;re already adding the <code>X-Robots-Tag &quot;none&quot;</code> HTTP header, but this only forbids the search engine from <em>indexing</em> the page, not crawling it!</li>
+<li>Also, the bot has to successfully browse the page first so it can receive the HTTP header&hellip;</li>
 </ul>
<p></p></description>
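
The note in this commit that `robots.txt` only blocks the top-level `/discover` and `/browse` URLs can be illustrated with a minimal sketch using Python's standard-library `urllib.robotparser`, which matches `Disallow` rules as plain path prefixes. The rules below are an assumption reconstructed from the note, not the site's actual `robots.txt`, and `ExampleBot` and `is_allowed` are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules mirroring the note: only top-level paths are disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /discover
Disallow: /browse
"""


def is_allowed(path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the rules above let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # The handle-scoped browse URL from the notes does not start with
    # /browse, so a prefix rule on /browse does not cover it.
    for path in ("/browse", "/discover", "/handle/10568/16510/browse"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Under these assumed rules a crawler is still allowed to request the per-handle browse pages, which fits the observation that bots keep fetching dynamic URLs and only see the `X-Robots-Tag` header after the request has already been served.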