mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes for 2019-12-17
@ -35,7 +35,7 @@ I noticed we have a very interesting list of countries on CGSpace:
Not only are there 49,000 countries, we have some blanks (25)…
Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
"/>
<meta name="generator" content="Hugo 0.60.1" />
<meta name="generator" content="Hugo 0.61.0" />
@ -116,7 +116,7 @@ Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE&r
</p>
</header>
<h2 id="20160205">2016-02-05</h2>
<h2 id="2016-02-05">2016-02-05</h2>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
@ -127,7 +127,7 @@ Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE&r
<li>Not only are there 49,000 countries, we have some blanks (25)…</li>
<li>Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”</li>
</ul>
<h2 id="20160206">2016-02-06</h2>
<h2 id="2016-02-06">2016-02-06</h2>
<ul>
<li>Found a way to get items with null/empty metadata values from SQL</li>
<li>First, find the <code>metadata_field_id</code> for the field you want from the <code>metadatafieldregistry</code> table:</li>
@ -154,7 +154,7 @@ DELETE 25
<li>Yep! The full re-index seems to work.</li>
<li>Process the empty countries on CGSpace</li>
</ul>
<h2 id="20160207">2016-02-07</h2>
<h2 id="2016-02-07">2016-02-07</h2>
<ul>
<li>Working on cleaning up Abenet's DAGRIS data with OpenRefine</li>
<li>I discovered two really nice functions in OpenRefine: <code>value.trim()</code> and <code>value.escape("javascript")</code> which shows whitespace characters like <code>\r\n</code>!</li>
@ -195,14 +195,14 @@ $ /opt/brew/Cellar/tomcat/8.0.30/bin/catalina start
<li>After verifying that the site is working, start a full index:</li>
</ul>
<pre><code>$ ~/dspace/bin/dspace index-discovery -b
</code></pre><h2 id="20160208">2016-02-08</h2>
</code></pre><h2 id="2016-02-08">2016-02-08</h2>
<ul>
<li>Finish cleaning up and importing ~400 DAGRIS items into CGSpace</li>
<li>Whip up some quick CSS to make the button in the submission workflow use the XMLUI theme's brand colors (<a href="https://github.com/ilri/DSpace/issues/154">#154</a>)</li>
</ul>
<p><img src="/cgspace-notes/2016/02/submit-button-ilri.png" alt="ILRI submission buttons">
<img src="/cgspace-notes/2016/02/submit-button-drylands.png" alt="Drylands submission buttons"></p>
<h2 id="20160209">2016-02-09</h2>
<h2 id="2016-02-09">2016-02-09</h2>
<ul>
<li>Re-sync DSpace Test with CGSpace</li>
<li>Help Sisay with OpenRefine</li>
@ -239,7 +239,7 @@ Swap: 255 57 198
</code></pre><ul>
<li>So I'll bump up the Tomcat heap to 2048 (CGSpace production server is using 3GB)</li>
</ul>
<h2 id="20160211">2016-02-11</h2>
<h2 id="2016-02-11">2016-02-11</h2>
<ul>
<li>Massaging some CIAT data in OpenRefine</li>
<li>There are 1200 records that have PDFs, and will need to be imported into CGSpace</li>
@ -256,7 +256,7 @@ Processing 64661.pdf
Processing 64195.pdf
> Downloading 64195.pdf
> Creating thumbnail for 64195.pdf
</code></pre><h2 id="20160212">2016-02-12</h2>
</code></pre><h2 id="2016-02-12">2016-02-12</h2>
<ul>
<li>Looking at CIAT's records again, there are some problems with a dozen or so files (out of 1200)</li>
<li>A few items are using the same exact PDF</li>
@ -265,7 +265,7 @@ Processing 64195.pdf
<li>A few items have no item</li>
<li>Also, I'm not sure: if we import these items, will we remove the <code>dc.identifier.url</code> field from the records?</li>
</ul>
<h2 id="201602121">2016-02-12</h2>
<h2 id="2016-02-12-1">2016-02-12</h2>
<ul>
<li>Looking at CIAT's records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I'm not sure if we can use those</li>
<li>265 items have dirty, URL-encoded filenames:</li>
@ -282,7 +282,7 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
<li>Merge pull requests for submission form theming (<a href="https://github.com/ilri/DSpace/pull/178">#178</a>) and missing center subjects in XMLUI item views (<a href="https://github.com/ilri/DSpace/pull/176">#176</a>)</li>
<li>They will be deployed on CGSpace the next time I re-deploy</li>
</ul>
<h2 id="20160216">2016-02-16</h2>
<h2 id="2016-02-16">2016-02-16</h2>
<ul>
<li>Turns out OpenRefine has an unescape function!</li>
</ul>
@ -296,14 +296,14 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
<li>To get filenames from <code>dc.identifier.url</code>, create a new column based on this transform: <code>forEach(value.split('||'), v, v.split('/')[-1]).join('||')</code></li>
<li>This also works for records that have multiple URLs (separated by “||”)</li>
</ul>
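The filename extraction and unescaping described in these notes can also be sketched outside OpenRefine. The following is a minimal Python illustration, not the method used in the notes: the function names and the example URLs are hypothetical, and it assumes the multi-value separator is "||" as stated above.

```python
from urllib.parse import unquote


def filenames_from_urls(value: str, sep: str = "||") -> str:
    """Keep only the last path segment of each URL in a multi-value field.

    Mirrors the GREL transform
    forEach(value.split('||'), v, v.split('/')[-1]).join('||').
    """
    return sep.join(url.split("/")[-1] for url in value.split(sep))


def unescape_filename(name: str) -> str:
    """Decode URL-encoded (percent-escaped) characters, assuming UTF-8."""
    return unquote(name)


# Hypothetical example values, not taken from the CIAT data
urls = "http://example.org/docs/one.pdf||http://example.org/other/T%C3%A9cnicas.pdf"
names = filenames_from_urls(urls)
print(unescape_filename(names))
```

Splitting on the separator first and rejoining afterwards keeps records with several URLs aligned with their original order.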
<h2 id="20160217">2016-02-17</h2>
<h2 id="2016-02-17">2016-02-17</h2>
<ul>
<li>Re-deploy CGSpace, run all system updates, and reboot</li>
<li>More work on CIAT data, cleaning and doing a last metadata-only import into DSpace Test</li>
<li>SAFBuilder has a bug preventing it from processing filenames containing more than one underscore</li>
<li>Need to re-process the filename column to replace multiple underscores with one: <code>value.replace(/_{2,}/, "_")</code></li>
</ul>
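For reference, the underscore clean-up can be expressed the same way in Python. This is only a sketch of the equivalent regular expression, with a hypothetical function name and a made-up filename; the notes themselves do this step in OpenRefine.

```python
import re


def collapse_underscores(filename: str) -> str:
    """Replace every run of two or more underscores with a single one."""
    return re.sub(r"_{2,}", "_", filename)


# Hypothetical filename for illustration
print(collapse_underscores("CIAT__report___final_v2.pdf"))
```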
<h2 id="20160220">2016-02-20</h2>
<h2 id="2016-02-20">2016-02-20</h2>
<ul>
<li>Turns out the “bug” in SAFBuilder isn't a bug, it's a feature that allows you to encode extra information like the destination bundle in the filename</li>
<li>Also, it seems DSpace's SAF import tool doesn't like importing filenames that have accents in them:</li>
@ -313,7 +313,7 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
<li>Need to rename files to have no accents or umlauts, etc…</li>
<li>Useful custom text facet for URLs ending with “.pdf”: <code>value.endsWith(".pdf")</code></li>
</ul>
<h2 id="20160222">2016-02-22</h2>
<h2 id="2016-02-22">2016-02-22</h2>
<ul>
<li>To change Spanish accents to ASCII in OpenRefine:</li>
</ul>
@ -330,7 +330,7 @@ Bitstream: tést señora alimentación.pdf
<li>HFS+ stores filenames as a string, and filenames with accents get stored as <a href="https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/">character+accent</a> whereas Linux's ext4 stores them as an array of bytes</li>
<li>Running the SAFBuilder on Mac OS X works if you're going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem's encoding matches</li>
</ul>
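The character+accent storage mentioned above corresponds to Unicode's decomposed form (NFD), while a single precomposed character is the composed form (NFC). The sketch below uses Python's standard `unicodedata` module to show the difference and to strip accents down to ASCII letters. It is an illustration under those assumptions, with hypothetical function names, and is not the OpenRefine expression used in the notes.

```python
import unicodedata


def to_nfc(name: str) -> str:
    """Compose base letter + combining accent into a single code point."""
    return unicodedata.normalize("NFC", name)


def strip_accents(name: str) -> str:
    """Decompose the string, then drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


composed = "tést señora alimentación.pdf"
decomposed = unicodedata.normalize("NFD", composed)

# The two forms look identical on screen but are different strings,
# which is why a filename lookup can fail across filesystems.
print(composed == decomposed)
print(to_nfc(decomposed) == composed)
print(strip_accents(composed))
```

Normalizing to one form before comparing names avoids the mismatch; stripping accents entirely is the blunter option the notes settle on for the import.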
<h2 id="20160229">2016-02-29</h2>
<h2 id="2016-02-29">2016-02-29</h2>
<ul>
<li>Got notified by some CIFOR colleagues that the Google Scholar team had contacted them about CGSpace's incorrect ordering of authors in Google Scholar metadata</li>
<li>Turns out there is a patch, and it was merged in DSpace 5.4: <a href="https://jira.duraspace.org/browse/DS-2679">https://jira.duraspace.org/browse/DS-2679</a></li>