From 6a4cb0aca6b546e87a6ee423362d63b59a5db4f9 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Mon, 15 Feb 2016 11:36:31 +0200
Subject: [PATCH] Add notes for 2016-02-15

Signed-off-by: Alan Orth
---
 content/2016-02.md          | 12 ++++++++++++
 public/2016-02/index.html   | 15 +++++++++++++++
 public/index.xml            | 15 +++++++++++++++
 public/tags/notes/index.xml | 15 +++++++++++++++
 4 files changed, 57 insertions(+)

diff --git a/content/2016-02.md b/content/2016-02.md
index 778af306a..6b3598f9b 100644
--- a/content/2016-02.md
+++ b/content/2016-02.md
@@ -185,3 +185,15 @@ Processing 64195.pdf
 - A few items link to PDFs on IFPRI's e-Library or Research Gate
 - A few items have no item
 - Also, I'm not sure if we import these items, will be remove the `dc.identifier.url` field from the records?
+
+## 2016-02-12
+
+- Looking at CIAT's records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I'm not sure if we can use those
+- 265 items have dirty, URL-encoded filenames:
+
+```
+$ ls | grep -c -E "%"
+265
+```
+
+- I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames
diff --git a/public/2016-02/index.html b/public/2016-02/index.html
index e87417e06..399d96a53 100644
--- a/public/2016-02/index.html
+++ b/public/2016-02/index.html
@@ -283,6 +283,21 @@ Processing 64195.pdf
  • A few items link to PDFs on IFPRI’s e-Library or Research Gate
  • A few items have no item
  • Also, I’m not sure if we import these items, will be remove the dc.identifier.url field from the records?
  • + + +

    2016-02-12

    + + + +
    $ ls | grep -c -E "%"
    +265
    +
+ +
diff --git a/public/index.xml b/public/index.xml
index 6b0891cba..fcaae8613 100644
--- a/public/index.xml
+++ b/public/index.xml
@@ -222,6 +222,21 @@ Processing 64195.pdf
 <li>A few items have no item</li>
 <li>Also, I&rsquo;m not sure if we import these items, will be remove the <code>dc.identifier.url</code> field from the records?</li>
 </ul>
+
+<h2 id="2016-02-12-1:124a59adbaa8ef13e1518d003fc03981">2016-02-12</h2>
+
+<ul>
+<li>Looking at CIAT&rsquo;s records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I&rsquo;m not sure if we can use those</li>
+<li>265 items have dirty, URL-encoded filenames:</li>
+</ul>
+
+<pre><code>$ ls | grep -c -E &quot;%&quot;
+265
+</code></pre>
+
+<ul>
+<li>I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames</li>
+</ul>
diff --git a/public/tags/notes/index.xml b/public/tags/notes/index.xml
index c128a6e8e..532faaf7a 100644
--- a/public/tags/notes/index.xml
+++ b/public/tags/notes/index.xml
@@ -222,6 +222,21 @@ Processing 64195.pdf
 <li>A few items have no item</li>
 <li>Also, I&rsquo;m not sure if we import these items, will be remove the <code>dc.identifier.url</code> field from the records?</li>
 </ul>
+
+<h2 id="2016-02-12-1:124a59adbaa8ef13e1518d003fc03981">2016-02-12</h2>
+
+<ul>
+<li>Looking at CIAT&rsquo;s records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I&rsquo;m not sure if we can use those</li>
+<li>265 items have dirty, URL-encoded filenames:</li>
+</ul>
+
+<pre><code>$ ls | grep -c -E &quot;%&quot;
+265
+</code></pre>
+
+<ul>
+<li>I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames</li>
+</ul>
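The 2016-02-12 notes leave open how to decode the 265 URL-encoded filenames cleanly and reliably. One possible approach is sketched below. It is a minimal Python sketch, not the method actually used, and it assumes each name was percent-encoded exactly once as UTF-8, the way a URL path is. Under that assumption, the standard library's `urllib.parse.unquote` can preview what every name would become without renaming anything. `plan_renames` and `find_problems` are hypothetical helper names.

A caveat about the `grep -c -E "%"` count: it also matches names that contain a literal `%` that is not an escape. `unquote` leaves such names unchanged, so they drop out of the plan.

```python
"""Preview how percent-encoded (URL-encoded) filenames would decode.

Assumption: each name was percent-encoded exactly once, as UTF-8, the way a
URL path is. Nothing is renamed; the script only reports the planned names.
"""
import os
import sys
from urllib.parse import unquote


def plan_renames(names):
    """Return (old, new) pairs for names that change when percent-decoded once."""
    plan = []
    for old in names:
        # '%20' -> ' ', '%C3%A9' -> 'é'; a '%' not followed by two hex digits is kept as-is
        new = unquote(old)
        if new != old:
            plan.append((old, new))
    return plan


def find_problems(plan, existing):
    """Flag decoded names that are not plain filenames or that clash with another name."""
    problems = []
    targets = set()
    for old, new in plan:
        if "/" in new or "\x00" in new or new in (".", ".."):
            problems.append((old, new, "not a valid filename once decoded"))
        elif new in existing or new in targets:
            problems.append((old, new, "collides with another file"))
        targets.add(new)
    return problems


if __name__ == "__main__":
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    names = sorted(os.listdir(directory))
    plan = plan_renames(names)
    for old, new in plan:
        print(f"{old} -> {new}")
    for old, new, reason in find_problems(plan, set(names)):
        print(f"WARNING: {old!r} -> {new!r}: {reason}", file=sys.stderr)
    print(f"{len(plan)} of {len(names)} names would change")
```

The sketch deliberately decodes only once. A name that still contains a `%XX` sequence afterwards may have been encoded twice and is worth checking by hand. Only after reviewing the warnings would a real rename pass (for example with `os.rename`) make sense.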