Add notes for 2016-02-15

Signed-off-by: Alan Orth <alan.orth@gmail.com>
Alan Orth 2016-02-15 11:36:31 +02:00
parent 450965091c
commit 6a4cb0aca6
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
4 changed files with 57 additions and 0 deletions

@@ -185,3 +185,15 @@ Processing 64195.pdf
- A few items link to PDFs on IFPRI's e-Library or Research Gate
- A few items have no item
- Also, I'm not sure: if we import these items, will we remove the `dc.identifier.url` field from the records?
## 2016-02-12
- Looking at CIAT's records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I'm not sure if we can use those
- 265 items have dirty, URL-encoded filenames:
```
$ ls | grep -c -E "%"
265
```
- I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames
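
One possible way to decode the filenames, sketched here in Python rather than taken from these notes: `urllib.parse.unquote` from the standard library turns escapes like `%20` back into the characters they stand for. The helper names `plan_renames` and `apply_renames` are hypothetical, and the sketch assumes the files sit in a single flat directory and that the escapes are UTF-8.

```python
from pathlib import Path
from urllib.parse import unquote


def plan_renames(directory):
    """Return (old, new) path pairs for files whose names contain URL escapes."""
    plan = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or "%" not in path.name:
            continue
        decoded = unquote(path.name)  # assumes UTF-8 percent-encoding
        # Skip names that do not change (e.g. a literal "%") or that would
        # decode to something unsafe to use as a filename.
        unsafe = "/" in decoded or "\x00" in decoded or decoded in ("", ".", "..")
        if decoded == path.name or unsafe:
            continue
        plan.append((path, path.with_name(decoded)))
    return plan


def apply_renames(plan):
    """Rename each file, never overwriting; return the pairs that were skipped."""
    skipped = []
    for old, new in plan:
        if new.exists():
            skipped.append((old, new))
        else:
            old.rename(new)
    return skipped
```

Printing the result of `plan_renames(".")` before calling `apply_renames` gives a dry run, which makes it easier to spot names that decode to something unexpected or collide with an existing file.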

@@ -283,6 +283,21 @@ Processing 64195.pdf
<li>A few items link to PDFs on IFPRI&rsquo;s e-Library or Research Gate</li>
<li>A few items have no item</li>
<li>Also, I&rsquo;m not sure: if we import these items, will we remove the <code>dc.identifier.url</code> field from the records?</li>
</ul>
<h2 id="2016-02-12-1:124a59adbaa8ef13e1518d003fc03981">2016-02-12</h2>
<ul>
<li>Looking at CIAT&rsquo;s records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I&rsquo;m not sure if we can use those</li>
<li>265 items have dirty, URL-encoded filenames:</li>
</ul>
<pre><code>$ ls | grep -c -E &quot;%&quot;
265
</code></pre>
<ul>
<li>I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames</li>
</ul>
</section>

@@ -222,6 +222,21 @@ Processing 64195.pdf
&lt;li&gt;A few items have no item&lt;/li&gt;
&lt;li&gt;Also, I&amp;rsquo;m not sure: if we import these items, will we remove the &lt;code&gt;dc.identifier.url&lt;/code&gt; field from the records?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-02-12-1:124a59adbaa8ef13e1518d003fc03981&#34;&gt;2016-02-12&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at CIAT&amp;rsquo;s records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I&amp;rsquo;m not sure if we can use those&lt;/li&gt;
&lt;li&gt;265 items have dirty, URL-encoded filenames:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ ls | grep -c -E &amp;quot;%&amp;quot;
265
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames&lt;/li&gt;
&lt;/ul&gt;
</description>
</item>

@@ -222,6 +222,21 @@ Processing 64195.pdf
&lt;li&gt;A few items have no item&lt;/li&gt;
&lt;li&gt;Also, I&amp;rsquo;m not sure: if we import these items, will we remove the &lt;code&gt;dc.identifier.url&lt;/code&gt; field from the records?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-02-12-1:124a59adbaa8ef13e1518d003fc03981&#34;&gt;2016-02-12&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Looking at CIAT&amp;rsquo;s records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I&amp;rsquo;m not sure if we can use those&lt;/li&gt;
&lt;li&gt;265 items have dirty, URL-encoded filenames:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;$ ls | grep -c -E &amp;quot;%&amp;quot;
265
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;I suggest that we import ~850 or so of the clean ones first, then do the rest after I can find a clean/reliable way to decode the filenames&lt;/li&gt;
&lt;/ul&gt;
</description>
</item>