<!DOCTYPE html>
<html lang="en-us">
<head prefix="og: http://ogp.me/ns#">
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1" />
<meta property="og:title" content=" March, 2016 · CGSpace Notes" />

<meta property="og:site_name" content="CGSpace Notes" />
<meta property="og:url" content="/cgspace-notes/2016-03/" />

<meta property="og:type" content="article" />

<meta property="og:article:published_time" content="2016-03-02T16:50:00+03:00" />

<meta property="og:article:tag" content="notes" />

<title>
March, 2016 · CGSpace Notes
</title>

<link rel="stylesheet" href="/cgspace-notes/css/bootstrap.min.css" />
<link rel="stylesheet" href="/cgspace-notes/css/main.css" />
<link rel="stylesheet" href="/cgspace-notes/css/font-awesome.min.css" />
<link rel="stylesheet" href="/cgspace-notes/css/github.css" />
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Source+Sans+Pro:200,300,400" type="text/css">
<link rel="shortcut icon" href="/cgspace-notes/images/favicon.ico" />
<link rel="apple-touch-icon" href="/cgspace-notes/images/apple-touch-icon.png" />

</head>
<body>
<header class="global-header" style="background-image:url(../images/bg.jpg )">
<section class="header-text">
<h1><a href="/cgspace-notes/">CGSpace Notes</a></h1>

<div class="sns-links hidden-print">
</div>

<a href="/cgspace-notes/" class="btn-header btn-back hidden-xs">
<i class="fa fa-angle-left" aria-hidden="true"></i>
Home
</a>

</section>
</header>
<main class="container">

<article>
<header>
<h1 class="text-primary">March, 2016</h1>
<div class="post-meta clearfix">
<div class="post-date pull-left">
Posted on
<time datetime="2016-03-02T16:50:00+03:00">
Mar 2, 2016
</time>
</div>
<div class="pull-right">

<span class="post-tag small"><a href="/cgspace-notes//tags/notes">#notes</a></span>

</div>
</div>
</header>
<section>

<h2 id="2016-03-02:5a28ddf3ee658c043c064ccddb151717">2016-03-02</h2>

<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the <code>index-lucene-update</code> cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server</li>
</ul>

<h2 id="2016-03-07:5a28ddf3ee658c043c064ccddb151717">2016-03-07</h2>

<ul>
<li>Troubleshooting the issues with the slew of commits for Atmire modules in <a href="https://github.com/ilri/DSpace/pull/182">#182</a></li>
<li>Their changes on <code>5_x-dev</code> branch work, but it is messy as hell with merge commits and old branch base</li>
<li>When I rebase their branch on the latest <code>5_x-prod</code> I get blank white pages</li>
<li>I identified one commit that causes the issue and let them know</li>
<li>Restart DSpace Test, as it seems to have crashed after Sisay tried to import some CSV or zip or something:</li>
</ul>

<pre><code>Exception in thread "Lucene Merge Thread #19" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
</code></pre>

<h2 id="2016-03-08:5a28ddf3ee658c043c064ccddb151717">2016-03-08</h2>

<ul>
<li>Add a few new filters to Atmire’s Listings and Reports module (<a href="https://github.com/ilri/DSpace/issues/180">#180</a>)</li>
<li>We had also wanted to add a few to the Content and Usage module but I have to ask the editors which ones they were</li>
</ul>

<h2 id="2016-03-10:5a28ddf3ee658c043c064ccddb151717">2016-03-10</h2>

<ul>
<li>Disable the lucene cron job on CGSpace as it shouldn’t be needed anymore</li>
<li>Discuss ORCiD and duplicate authors on Yammer</li>
<li>Request new documentation for Atmire CUA and L&amp;R modules, as ours are from 2013</li>
<li>Walk Sisay through some data cleaning workflows in OpenRefine</li>
<li>Start cleaning up the configuration for Atmire’s CUA module (<a href="https://github.com/ilri/DSpace/issues/185">#184</a>)</li>
<li>It is very messed up because some labels are incorrect, fields are missing, etc</li>
</ul>

<p><img src="../images/2016/03/cua-label-mixup.png" alt="Mixed up label in Atmire CUA" /></p>

<ul>
<li>Update documentation for Atmire modules</li>
</ul>

<h2 id="2016-03-11:5a28ddf3ee658c043c064ccddb151717">2016-03-11</h2>

<ul>
<li>As I was looking at the CUA config I realized our Discovery config is all messed up and confusing</li>
<li>I’ve opened an issue to track some of that work (<a href="https://github.com/ilri/DSpace/issues/186">#186</a>)</li>
<li>I did some major cleanup work on Discovery and XMLUI stuff related to the <code>dc.type</code> indexes (<a href="https://github.com/ilri/DSpace/pull/187">#187</a>)</li>
<li>We had been confusing <code>dc.type</code> (a Dublin Core value) with <code>dc.type.output</code> (a value we invented) for a few years and it had permeated all aspects of our data, indexes, item displays, etc.</li>
<li>There is still some more work to be done to remove references to old <code>outputtype</code> and <code>output</code></li>
</ul>

<h2 id="2016-03-14:5a28ddf3ee658c043c064ccddb151717">2016-03-14</h2>

<ul>
<li>Fix some items that had invalid dates (I noticed them in the log during a re-indexing)</li>
<li>Reset <code>search.index.*</code> to the default, as it is only used by Lucene (deprecated by Discovery in DSpace 5.x): <a href="https://github.com/ilri/DSpace/pull/188">#188</a></li>
<li>Make titles in Discovery and Browse by more consistent (singular, sentence case, etc) (<a href="https://github.com/ilri/DSpace/issues/186">#186</a>)</li>
<li>Also four or so center-specific subject strings were missing for Discovery</li>
</ul>

<p><img src="../images/2016/03/missing-xmlui-string.png" alt="Missing XMLUI string" /></p>

<h2 id="2016-03-15:5a28ddf3ee658c043c064ccddb151717">2016-03-15</h2>

<ul>
<li>Create simple theme for new AVCD community just for a unique Google Tracking ID (<a href="https://github.com/ilri/DSpace/pull/191">#191</a>)</li>
</ul>

<h2 id="2016-03-16:5a28ddf3ee658c043c064ccddb151717">2016-03-16</h2>

<ul>
<li>Still having problems deploying Atmire’s CUA updates and fixes from January!</li>
<li>More discussion on the GitHub issue here: <a href="https://github.com/ilri/DSpace/pull/182">https://github.com/ilri/DSpace/pull/182</a></li>
<li>Clean up Atmire CUA config (<a href="https://github.com/ilri/DSpace/pull/193">#193</a>)</li>
<li>Help Sisay with some PostgreSQL queries to clean up the incorrect <code>dc.contributor.corporateauthor</code> field</li>
<li>I noticed that we have some weird values in <code>dc.language</code>:</li>
</ul>

<pre><code># select * from metadatavalue where metadata_field_id=37;
 metadata_value_id | resource_id | metadata_field_id | text_value | text_lang | place | authority | confidence | resource_type_id
-------------------+-------------+-------------------+------------+-----------+-------+-----------+------------+------------------
           1942571 |       35342 |                37 | hi         |           |     1 |           |         -1 |                2
           1942468 |       35345 |                37 | hi         |           |     1 |           |         -1 |                2
           1942479 |       35337 |                37 | hi         |           |     1 |           |         -1 |                2
           1942505 |       35336 |                37 | hi         |           |     1 |           |         -1 |                2
           1942519 |       35338 |                37 | hi         |           |     1 |           |         -1 |                2
           1942535 |       35340 |                37 | hi         |           |     1 |           |         -1 |                2
           1942555 |       35341 |                37 | hi         |           |     1 |           |         -1 |                2
           1942588 |       35343 |                37 | hi         |           |     1 |           |         -1 |                2
           1942610 |       35346 |                37 | hi         |           |     1 |           |         -1 |                2
           1942624 |       35347 |                37 | hi         |           |     1 |           |         -1 |                2
           1942639 |       35339 |                37 | hi         |           |     1 |           |         -1 |                2
</code></pre>

<ul>
<li>It seems this <code>dc.language</code> field isn’t really used, but we should delete these values</li>
<li>Also, <code>dc.language.iso</code> has some weird values, like “En” and “English”</li>
</ul>

<h2 id="2016-03-17:5a28ddf3ee658c043c064ccddb151717">2016-03-17</h2>

<ul>
<li>It turns out <code>hi</code> is the ISO 639 language code for Hindi, but these should be in <code>dc.language.iso</code> instead of <code>dc.language</code></li>
<li>I fixed the eleven items with <code>hi</code> as well as some using the incorrect <code>vn</code> for Vietnamese</li>
<li>Start discussing CG core with Abenet and Sisay</li>
<li>Re-sync CGSpace database to DSpace Test for Atmire to do some tests about the problematic CUA patches</li>
<li>The patches work fine with a clean database, so the error was caused by some mismatch in CUA versions and the database during my testing</li>
</ul>
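
<ul>
<li>A minimal Python sketch of the language clean-up described above, mapping the weird values to two-letter ISO 639-1 codes. Only the values themselves (“En”, “English”, <code>hi</code>, <code>vn</code>) come from these notes; the <code>LANGUAGE_FIXES</code> table, the <code>normalize_language</code> name, and returning <code>None</code> for unknown values are assumptions for illustration, not what was actually run against the database:</li>
</ul>

<pre><code class="python"># Hypothetical lookup table for cleaning values before they go into
# dc.language.iso. The table and function names are illustrative only.
LANGUAGE_FIXES = {
    "en": "en",
    "english": "en",
    "hi": "hi",  # ISO 639-1 code for Hindi
    "vn": "vi",  # "vn" is not a language code; Vietnamese is "vi"
    "vi": "vi",
}


def normalize_language(value):
    """Return a two-letter ISO 639-1 code, or None if the value is unknown."""
    return LANGUAGE_FIXES.get(value.strip().lower())


for raw in ["En", "English", "hi", "vn", "Klingon"]:
    print(repr(raw), "becomes", repr(normalize_language(raw)))
</code></pre>

<ul>
<li>Returning <code>None</code> for anything not in the table means unknown values get flagged for a human to look at rather than guessed</li>
</ul>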

<h2 id="2016-03-18:5a28ddf3ee658c043c064ccddb151717">2016-03-18</h2>

<ul>
<li>Merge Atmire fixes into <code>5_x-prod</code></li>
<li>Discuss thumbnails with Francesca from Bioversity</li>
<li>Some of their items end up with thumbnails that have a big white border around them:</li>
</ul>

<p><img src="../images/2016/03/bioversity-thumbnail-bad.jpg" alt="Excessive whitespace in thumbnail" /></p>

<ul>
<li>Turns out we can add <code>-trim</code> to the GraphicsMagick options to trim the whitespace</li>
</ul>

<p><img src="../images/2016/03/bioversity-thumbnail-good.jpg" alt="Trimmed thumbnail" /></p>

<ul>
<li>Command used:</li>
</ul>

<pre><code>$ gm convert -trim -quality 82 -thumbnail x300 -flatten Descriptor\ for\ Butia_EN-2015_2021.pdf\[0\] cover.jpg
</code></pre>

<ul>
<li>Also, it looks like adding <code>-sharpen 0x1.0</code> really improves the quality of the image for only a few KB</li>
</ul>
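
<ul>
<li>A rough Python sketch of how the command above could be assembled for scripting, with <code>-sharpen 0x1.0</code> as an optional extra. The <code>build_thumbnail_command</code> helper is hypothetical, and where <code>-sharpen</code> sits among the other options is my assumption, since the notes do not say. It only builds the argument list and does not run <code>gm</code>:</li>
</ul>

<pre><code class="python">import shlex


def build_thumbnail_command(pdf_path, output_path, sharpen=True):
    """Build (but do not run) a GraphicsMagick command for a PDF cover thumbnail."""
    command = ["gm", "convert", "-trim", "-quality", "82", "-thumbnail", "x300"]
    if sharpen:
        # Assumption: the position of -sharpen in the option order is a guess.
        command += ["-sharpen", "0x1.0"]
    # "[0]" selects the first page of the PDF, as in the command above.
    command += ["-flatten", pdf_path + "[0]", output_path]
    return command


command = build_thumbnail_command("Descriptor for Butia_EN-2015_2021.pdf", "cover.jpg")
print(shlex.join(command))
</code></pre>

<ul>
<li>Passing the list to <code>subprocess.run(command, check=True)</code> would avoid having to escape the spaces and brackets in the file name by hand</li>
</ul>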

<h2 id="2016-03-21:5a28ddf3ee658c043c064ccddb151717">2016-03-21</h2>

<ul>
<li>Fix 66 site errors in Google’s webmaster tools</li>
<li>I looked at a bunch of them and they were old URLs, weird things linked from non-existent items, etc, so I just marked them all as fixed</li>
<li>We also have 1,300 “soft 404” errors for URLs like: <a href="https://cgspace.cgiar.org/handle/10568/440/browse?type=bioversity">https://cgspace.cgiar.org/handle/10568/440/browse?type=bioversity</a></li>
<li>I’ve marked them as fixed as well since the ones I tested were working fine</li>
<li>This raises another question, as many of these pages are linked from Discovery search results and might create a duplicate content problem…</li>
<li>Results pages like this give items that Google already knows from the sitemap: <a href="https://cgspace.cgiar.org/discover?filtertype=author&amp;filter_relational_operator=equals&amp;filter=Orth%2C+A">https://cgspace.cgiar.org/discover?filtertype=author&amp;filter_relational_operator=equals&amp;filter=Orth%2C+A</a>.</li>
<li>There are some access denied errors on JSPUI links (of course! we forbid them!), but I’m not sure why Google is trying to index them…</li>
<li>For example:

<ul>
<li>This: <a href="https://cgspace.cgiar.org/jspui/bitstream/10568/809/1/main-page.pdf">https://cgspace.cgiar.org/jspui/bitstream/10568/809/1/main-page.pdf</a></li>
<li>Linked from: <a href="https://cgspace.cgiar.org/jspui/handle/10568/809">https://cgspace.cgiar.org/jspui/handle/10568/809</a></li>
</ul></li>
<li>I will mark these errors as resolved because they have been returning HTTP 403 on purpose for a long time!</li>
<li>Google says the first time it saw this particular error was September 29, 2015… so maybe it accidentally saw it somehow…</li>
<li>On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content</li>
</ul>

<p><img src="../images/2016/03/google-index.png" alt="CGSpace pages in Google index" /></p>

<ul>
<li>Turns out this is a problem with DSpace’s <code>robots.txt</code>, and there has been a Jira ticket open since December 2015: <a href="https://jira.duraspace.org/browse/DS-2962">https://jira.duraspace.org/browse/DS-2962</a></li>
<li>I am not sure if I want to apply it yet</li>
<li>For now I’ve just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools</li>
</ul>
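
<ul>
<li>A small Python sketch for sanity-checking <code>robots.txt</code> rules against example URLs before deploying them, using the standard library’s <code>urllib.robotparser</code>. The two <code>Disallow</code> rules below are illustrative assumptions and not the contents of the DS-2962 patch, and <code>is_crawlable</code> is a made-up helper name. Note that this parser only does simple prefix matching, so it cannot model wildcard rules:</li>
</ul>

<pre><code class="python">from urllib import robotparser

# Illustrative rules only: an assumption for this sketch, not the DS-2962 patch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /discover
Disallow: /search-filter
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url):
    """Return True if a generic crawler may fetch the URL under the rules above."""
    return parser.can_fetch("*", url)


for url in [
    "https://cgspace.cgiar.org/discover?filtertype=author",
    "https://cgspace.cgiar.org/handle/10568/809",
]:
    print(url, "crawlable:", is_crawlable(url))
</code></pre>

<ul>
<li>With rules like these the dynamic Discovery pages would be blocked while item pages under <code>/handle/</code> stay crawlable, which is the point as long as the sitemap already covers the items</li>
</ul>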

<p><img src="../images/2016/03/url-parameters.png" alt="URL parameters cause millions of dynamic pages" />
<img src="../images/2016/03/url-parameters2.png" alt="Setting pages with the filter_0 param not to show in search results" /></p>

<ul>
<li>Move AVCD collection to new community and update <code>move_collection.sh</code> script: <a href="https://gist.github.com/alanorth/392c4660e8b022d99dfa">https://gist.github.com/alanorth/392c4660e8b022d99dfa</a></li>
<li>It seems Feedburner can do HTTPS now, so we might be able to update our feeds and simplify the nginx configs</li>
<li>Re-deploy CGSpace with latest <code>5_x-prod</code> branch</li>
<li>Run updates on CGSpace and reboot server (new kernel, <code>4.5.0</code>)</li>
<li>Deploy Let’s Encrypt certificate for cgspace.cgiar.org, but still need to work it into the ansible playbooks</li>
</ul>

<h2 id="2016-03-22:5a28ddf3ee658c043c064ccddb151717">2016-03-22</h2>

<ul>
<li>Merge robots.txt patch and disallow indexing of browse pages as our sitemap is consumed correctly (<a href="https://github.com/ilri/DSpace/issues/198">#198</a>)</li>
</ul>

<h2 id="2016-03-23:5a28ddf3ee658c043c064ccddb151717">2016-03-23</h2>

<ul>
<li>Abenet is having problems saving group memberships, and she gets this error: <a href="https://gist.github.com/alanorth/87281c061c2de57b773e">https://gist.github.com/alanorth/87281c061c2de57b773e</a></li>
</ul>

<pre><code>Can't find method org.dspace.app.xmlui.aspect.administrative.FlowGroupUtils.processSaveGroup(org.dspace.core.Context,number,string,[Ljava.lang.String;,[Ljava.lang.String;,org.apache.cocoon.environment.wrapper.RequestWrapper). (resource://aspects/Administrative/administrative.js#967)
</code></pre>

<ul>
<li>I can reproduce the same error on DSpace Test and on my Mac</li>
<li>Looks to be an issue with the Atmire modules, so I’ve submitted a ticket to their tracker</li>
</ul>

<h2 id="2016-03-24:5a28ddf3ee658c043c064ccddb151717">2016-03-24</h2>

<ul>
<li>Atmire sent a patch for the group saving issue: <a href="https://github.com/ilri/DSpace/pull/201">https://github.com/ilri/DSpace/pull/201</a></li>
<li>I tested it locally and it works, so I merged it to <code>5_x-prod</code> and will deploy on CGSpace this week</li>
</ul>

<h2 id="2016-03-25:5a28ddf3ee658c043c064ccddb151717">2016-03-25</h2>

<ul>
<li>Having problems with Listings and Reports, seems to be caused by a rogue reference to <code>dc.type.output</code></li>
<li>This is the error we get when we proceed to the second page of Listings and Reports: <a href="https://gist.github.com/alanorth/b2d7fb5b82f94898caaf">https://gist.github.com/alanorth/b2d7fb5b82f94898caaf</a></li>
<li>Commenting out the line works, but I haven’t figured out the proper syntax for referring to <code>dc.type.*</code></li>
</ul>

<h2 id="2016-03-28:5a28ddf3ee658c043c064ccddb151717">2016-03-28</h2>

<ul>
<li>Look into enabling the embargo during item submission, see: <a href="https://wiki.duraspace.org/display/DSDOC5x/Embargo#Embargo-SubmissionProcess">https://wiki.duraspace.org/display/DSDOC5x/Embargo#Embargo-SubmissionProcess</a></li>
<li>Seems we only want <code>AccessStep</code> because <code>UploadWithEmbargoStep</code> disables the ability to edit embargoes at the item level</li>
<li>This pull request enables the ability to set an item-level embargo during submission: <a href="https://github.com/ilri/DSpace/pull/203">https://github.com/ilri/DSpace/pull/203</a></li>
<li>I figured out that the problem with Listings and Reports was because I disabled the <code>search.index.*</code> last week, and they are still used by JSPUI apparently</li>
<li>This pull request re-enables them: <a href="https://github.com/ilri/DSpace/pull/202">https://github.com/ilri/DSpace/pull/202</a></li>
<li>Re-deploy DSpace Test, run all system updates, and restart the server</li>
<li>Looks like the Listings and Reports problem was NOT due to the search indexes (which are actually not used), but rather to the filter configuration in the Listings and Reports config</li>
<li>This pull request simply updates the config for the <code>dc.type.output</code> → <code>dc.type</code> change that was made last week: <a href="https://github.com/ilri/DSpace/pull/204">https://github.com/ilri/DSpace/pull/204</a></li>
</ul>

</section>
<footer>

<section class="author-info row">
<div class="author-avatar col-md-2">

</div>
<div class="author-meta col-md-6">

<h1 class="author-name text-primary">Alan Orth</h1>

</div>

</section>
<ul class="pager">

<li class="previous"><a href="/cgspace-notes/2016-02/"><span aria-hidden="true">←</span> Older</a></li>

<li class="next disabled"><a href="#">Newer <span aria-hidden="true">→</span></a></li>

</ul>
</footer>
</article>

</main>
<footer class="container global-footer">
<div class="copyright-note pull-left">

</div>
<div class="sns-links hidden-print">
</div>

</footer>

<script src="/cgspace-notes/js/highlight.pack.js"></script>
<script>
hljs.initHighlightingOnLoad();
</script>

</body>
</html>