mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2016-02-11 and 2016-02-12
Signed-off-by: Alan Orth <alan.orth@gmail.com>
@ -248,6 +248,40 @@ Swap: 255 57 198
<ul>
<li>So I’ll bump up the Tomcat heap to 2048 (CGSpace production server is using 3GB)</li>
</ul>
<h2 id="2016-02-11:124a59adbaa8ef13e1518d003fc03981">2016-02-11</h2>
<ul>
<li>Massaging some CIAT data in OpenRefine</li>
<li>There are 1200 records that have PDFs, and will need to be imported into CGSpace</li>
<li>I created a <code>filename</code> column based on the <code>dc.identifier.url</code> column using the following transform:</li>
</ul>
<pre><code>value.split('/')[-1]
</code></pre>
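<ul>
<li>For comparison, a minimal Python sketch of the same idea outside OpenRefine. The function name <code>filename_from_url</code> and the sample URL are hypothetical, and parsing the URL first (so a query string is not kept as part of the name) is an extra step that the one-line transform does not perform:</li>
</ul>

```python
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, e.g. '64661.pdf'."""
    # Parse first so a query string or fragment is not mistaken for part of the name
    path = urlsplit(url).path
    # Same idea as value.split('/')[-1]: keep whatever follows the final slash
    return path.split("/")[-1]


if __name__ == "__main__":
    # Hypothetical URL, for illustration only
    print(filename_from_url("https://example.org/reports/64661.pdf?download=1"))
```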
<ul>
<li>Then I wrote a tool called <a href="https://gist.github.com/alanorth/2206f24483fe5f0454fc"><code>generate-thumbnails.py</code></a> to download the PDFs and generate thumbnails for them, for example:</li>
</ul>
<pre><code>$ ./generate-thumbnails.py ciat-reports.csv
Processing 64661.pdf
> Downloading 64661.pdf
> Creating thumbnail for 64661.pdf
Processing 64195.pdf
> Downloading 64195.pdf
> Creating thumbnail for 64195.pdf
</code></pre>
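<ul>
<li>A rough sketch of how a download-and-thumbnail loop like this could look. This is not the linked gist's code: it assumes the CSV has the <code>filename</code> and <code>dc.identifier.url</code> columns described above, that ImageMagick's <code>convert</code> (with Ghostscript for PDF support) is installed, and that a 300x300 JPEG of the first page is wanted. The function names and the thumbnail size are hypothetical choices:</li>
</ul>

```python
import csv
import subprocess
import urllib.request
from pathlib import Path


def thumbnail_command(pdf_path):
    """Build an ImageMagick command that renders page 1 of a PDF as a JPEG."""
    pdf = Path(pdf_path)
    # "[0]" selects the first page; the 300x300 bounding box is an arbitrary choice
    return ["convert", f"{pdf}[0]", "-thumbnail", "300x300", "-flatten",
            str(pdf.with_suffix(".jpg"))]


def process(csv_path, url_column="dc.identifier.url", name_column="filename"):
    """Download each PDF listed in the CSV (if not already present) and thumbnail it."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            name, url = row[name_column], row[url_column]
            print(f"Processing {name}")
            if not Path(name).exists():
                print(f"> Downloading {name}")
                urllib.request.urlretrieve(url, name)
            print(f"> Creating thumbnail for {name}")
            # check=True raises CalledProcessError if convert exits with an error
            subprocess.run(thumbnail_command(name), check=True)


if __name__ == "__main__":
    import sys
    process(sys.argv[1])
```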
<h2 id="2016-02-12:124a59adbaa8ef13e1518d003fc03981">2016-02-12</h2>
<ul>
<li>Looking at CIAT’s records again, there are some problems with a dozen or so files (out of 1200)</li>
<li>A few items are using the exact same PDF</li>
<li>A few items are using HTM or DOC files</li>
<li>A few items link to PDFs on IFPRI’s e-Library or ResearchGate</li>
<li>A few items have no file</li>
</ul>
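<ul>
<li>Problems like these can be flagged before import with a quick pass over the URL column. A minimal sketch, assuming the URLs have already been read into a list. The function name <code>find_problems</code>, the label strings and the host names in the usage example are all hypothetical, and treating an empty URL as a record without a file is my reading of the last point above:</li>
</ul>

```python
from collections import Counter
from urllib.parse import urlsplit


def find_problems(urls, expected_host):
    """Map each problematic URL to a list of labels; clean records are omitted."""
    counts = Counter(url for url in urls if url)
    problems = {}
    for url in urls:
        labels = []
        if not url:
            # No URL at all, so there is nothing to download
            labels.append("missing")
        else:
            parts = urlsplit(url)
            if counts[url] > 1:
                # More than one record points at the same file
                labels.append("duplicate")
            if not parts.path.lower().endswith(".pdf"):
                # For example HTM or DOC files
                labels.append("not-pdf")
            if parts.hostname != expected_host:
                # File is hosted somewhere other than the expected server
                labels.append("external")
        if labels:
            problems[url] = labels
    return problems


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only
    sample = [
        "http://ciat.example/a.pdf",
        "http://ciat.example/a.pdf",
        "http://ciat.example/b.doc",
        "http://elib.example/c.pdf",
        "",
    ]
    for url, labels in find_problems(sample, "ciat.example").items():
        print(url or "(empty)", labels)
```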
</section>