mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-22 14:45:03 +01:00
291 lines
13 KiB
HTML
<!DOCTYPE html>
<html lang="en-us">
<head prefix="og: http://ogp.me/ns#">
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1" />
<meta property="og:title" content="CGSpace Notes" />

<meta property="og:site_name" content="CGSpace Notes" />
<meta property="og:url" content="/cgspace-notes/" />

<meta property="og:type" content="website" />

<title>
CGSpace Notes
</title>

<link rel="stylesheet" href="/cgspace-notes/css/bootstrap.min.css" />
<link rel="stylesheet" href="/cgspace-notes/css/main.css" />
<link rel="stylesheet" href="/cgspace-notes/css/font-awesome.min.css" />
<link rel="stylesheet" href="/cgspace-notes/css/github.css" />
<link rel="stylesheet" href="//fonts.googleapis.com/css?family=Source+Sans+Pro:200,300,400" type="text/css">
<link rel="shortcut icon" href="/cgspace-notes/images/favicon.ico" />
<link rel="apple-touch-icon" href="/cgspace-notes/images/apple-touch-icon.png" />

<link href="/cgspace-notes/index.xml" rel="alternate" type="application/rss+xml" title="CGSpace Notes" />

</head>
<body>
<header class="global-header" style="background-image:url( /images/bg.jpg )">
<section class="header-text">
<h1><a href="/cgspace-notes/">CGSpace Notes</a></h1>

<div class="sns-links hidden-print">
</div>

<a href="/cgspace-notes/index.xml" class="btn-header btn-subscribe hidden-xs">
<i class="fa fa-rss" aria-hidden="true"></i>
Subscribe
</a>

</section>
</header>
<main class="container">

<div class="article-list">

<article>
<header>
<h2><a href="/cgspace-notes/2016-05/">May, 2016</a></h2>
<div class="post-meta clearfix">
<div class="post-date pull-left">
Posted on
<time datetime="2016-05-01T23:06:00+03:00">
May 1, 2016
</time>
</div>
</div>
</header>
<div>
<h3>2016-05-01</h3>
<ul>
<li>Since yesterday there have been 10,000 REST errors and the site has been unstable again</li>
<li>I have blocked access to the API now</li>
<li>There are 3,000 IPs accessing the REST API in a 24-hour period!</li>
</ul>
<pre><code># awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168</code></pre>
<ul>
<li>The two most frequent requesters are in Ethiopia and Colombia: 213.55.99.121 and 181.118.144.29</li>
<li>100% of the requests coming from Ethiopia are like this</li>
</ul>
</div>
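<p>One caveat about the count above: <code>uniq</code> only collapses adjacent duplicate lines, so without a <code>sort</code> in front of it the same IP can be counted several times. Below is a minimal Python sketch of counting distinct clients, assuming the client IP is the first whitespace-separated field of each line (which is what the <code>awk</code> command relies on). The sample log lines are made up for illustration.</p>
<pre><code class="language-python">from collections import Counter


def count_clients(lines):
    """Tally requests per client IP (first whitespace-separated field)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


# Hypothetical log excerpt; only the leading IP field matters here.
sample = [
    '213.55.99.121 - - "GET /rest/items"',
    '181.118.144.29 - - "GET /rest/items"',
    '213.55.99.121 - - "GET /rest/items"',
]
counts = count_clients(sample)
print(len(counts))            # number of distinct IPs
print(counts.most_common(2))  # busiest requesters first
</code></pre>
<p>On the real log, the function would be fed the lines of <code>/var/log/nginx/rest.log</code> instead of the sample list.</p>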
<footer>
<ul class="pager">
<li class="next"><a href="/cgspace-notes/2016-05/">Read more <span aria-hidden="true">»</span></a></li>
</ul>
</footer>
</article>

<hr/>

<article>
<header>
<h2><a href="/cgspace-notes/2016-04/">April, 2016</a></h2>
<div class="post-meta clearfix">
<div class="post-date pull-left">
Posted on
<time datetime="2016-04-04T11:06:00+03:00">
Apr 4, 2016
</time>
</div>
</div>
</header>
<div>
<h3>2016-04-04</h3>
<ul>
<li>Looking at log file usage on CGSpace, I noticed that we need to work on our cron setup a bit</li>
<li>We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc</li>
<li>After running DSpace for over five years I’ve never needed to look in any other log file than dspace.log, let alone one from last year!</li>
<li>Dropping those from the backups will save us a few gigs of backup space we’re paying for on S3</li>
<li>Also, I noticed the checker log has some errors we should pay attention to:</li>
</ul>
<pre><code>Run start time: 03/06/2016 04:00:22
Error retrieving bitstream ID 71274 from asset store.</code></pre>
</div>

<footer>
<ul class="pager">
<li class="next"><a href="/cgspace-notes/2016-04/">Read more <span aria-hidden="true">»</span></a></li>
</ul>
</footer>
</article>

<hr/>

<article>
<header>
<h2><a href="/cgspace-notes/2016-03/">March, 2016</a></h2>
<div class="post-meta clearfix">
<div class="post-date pull-left">
Posted on
<time datetime="2016-03-02T16:50:00+03:00">
Mar 2, 2016
</time>
</div>
</div>
</header>
<div>
<h3>2016-03-02</h3>
<ul>
<li>Looking at issues with author authorities on CGSpace</li>
<li>For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module</li>
<li>Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match the environment on the CGSpace server</li>
</ul>
<h3>2016-03-07</h3>
<ul>
<li>Troubleshooting the issues with the slew of commits for Atmire modules in #182</li>
<li>Their changes on the 5_x-dev branch work, but the history is messy as hell, with merge commits and an old branch base</li>
<li>When I rebase their branch on the latest 5_x-prod I get blank white pages</li>
<li>I identified one commit that causes the issue and let them know</li>
<li>Restart DSpace Test, as it seems to have crashed after Sisay tried to import some CSV or zip or something:</li>
</ul>
<pre><code>Exception in thread "Lucene Merge Thread #19" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device</code></pre>
<h3>2016-03-08</h3>
<ul>
<li>Add a few new filters to Atmire’s Listings and Reports module (#180)</li>
<li>We had also wanted to add a few to the Content and Usage module but I have to ask the editors which ones they were</li>
</ul>
<h3>2016-03-10</h3>
<ul>
<li>Disable the lucene cron job on CGSpace as it shouldn’t be needed anymore</li>
<li>Discuss ORCID and duplicate authors on Yammer</li>
<li>Request new documentation for Atmire CUA and L&amp;R modules, as ours are from 2013</li>
<li>Walk Sisay through some data cleaning workflows in OpenRefine</li>
<li>Start cleaning up the configuration for Atmire’s CUA module (#184)</li>
<li>It is very messed up because some labels are incorrect, fields are missing, etc</li>
<li>Update documentation for Atmire modules</li>
</ul>
<h3>2016-03-11</h3>
<ul>
<li>As I was looking at the CUA config I realized our Discovery config is all messed up and confusing</li>
<li>I’ve opened an issue to track some of that work (#186)</li>
<li>I did some major cleanup work on Discovery and XMLUI stuff related to the dc.type indexes (#187)</li>
<li>We had been confusing dc.type (a Dublin Core value) with dc.type.output (a value we invented) for a few years and it had permeated all aspects of our data, indexes, item displays, etc.</li>
</ul>
</div>

<footer>
<ul class="pager">
<li class="next"><a href="/cgspace-notes/2016-03/">Read more <span aria-hidden="true">»</span></a></li>
</ul>
</footer>
</article>

<hr/>

<article>
<header>
<h2><a href="/cgspace-notes/2016-02/">February, 2016</a></h2>
<div class="post-meta clearfix">
<div class="post-date pull-left">
Posted on
<time datetime="2016-02-05T13:18:00+03:00">
Feb 5, 2016
</time>
</div>
</div>
</header>
<div>
<h3>2016-02-05</h3>
<ul>
<li>Looking at some DAGRIS data for Abenet Yabowork</li>
<li>Lots of issues with spaces, newlines, etc causing the import to fail</li>
<li>I noticed we have a very interesting list of countries on CGSpace:</li>
<li>Not only are there 49,000 countries, we have some blanks (25)…</li>
<li>Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”</li>
</ul>
<h3>2016-02-06</h3>
<ul>
<li>Found a way to get items with null/empty metadata values from SQL</li>
<li>First, find the metadata_field_id for the field you want from the metadatafieldregistry table:</li>
</ul>
<pre><code>dspacetest=# select * from metadatafieldregistry;</code></pre>
<ul>
<li>In this case our country field is 78</li>
<li>Now find all resources with type 2 (item) that have null/empty values for that field:</li>
</ul>
<pre><code>dspacetest=# select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=78 and (text_value='' OR text_value IS NULL);</code></pre>
<ul>
<li>Then you can find the handle that owns it from its resource_id:</li>
</ul>
<pre><code>dspacetest=# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id = '22678';</code></pre>
<ul>
<li>It’s 25 items so editing in the web UI is annoying, let’s try SQL!</li>
</ul>
</div>
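<p>The null-or-empty check is the part of that query worth remembering, since an empty string and NULL are different things in SQL and a test for only one of them misses the other. Here is a small self-contained sketch using Python's <code>sqlite3</code> module as a stand-in for PostgreSQL. The table is a cut-down, hypothetical version of <code>metadatavalue</code>, and the rows are invented.</p>
<pre><code class="language-python">import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue ("
    "resource_id INTEGER, resource_type_id INTEGER, "
    "metadata_field_id INTEGER, text_value TEXT)"
)
# Toy rows; the IDs and values are invented for illustration.
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?, ?, ?, ?)",
    [
        (1, 2, 78, "KENYA"),
        (2, 2, 78, ""),    # empty string
        (3, 2, 78, None),  # NULL
        (4, 2, 64, ""),    # different field, should be ignored
    ],
)
# Same predicate as the query above: type 2 (item), field 78, null or empty.
rows = conn.execute(
    "SELECT resource_id FROM metadatavalue "
    "WHERE resource_type_id = ? AND metadata_field_id = ? "
    "AND (text_value = '' OR text_value IS NULL) "
    "ORDER BY resource_id",
    (2, 78),
).fetchall()
empty_ids = [row[0] for row in rows]
print(empty_ids)
</code></pre>
<p>Passing the type and field IDs as parameters rather than pasting them into the SQL string makes the same query easy to reuse for other fields.</p>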
<footer>
<ul class="pager">
<li class="next"><a href="/cgspace-notes/2016-02/">Read more <span aria-hidden="true">»</span></a></li>
</ul>
</footer>
</article>

<hr/>

<article>
<header>
<h2><a href="/cgspace-notes/2016-01/">January, 2016</a></h2>
<div class="post-meta clearfix">
<div class="post-date pull-left">
Posted on
<time datetime="2016-01-13T13:18:00+03:00">
Jan 13, 2016
</time>
</div>
</div>
</header>
<div>
<h3>2016-01-13</h3>
<ul>
<li>Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year.</li>
<li>I realized it is only necessary to clear the Cocoon cache after moving collections, rather than reindexing, as no metadata has changed and therefore no search or browse indexes need to be updated.</li>
<li>Update GitHub wiki for documentation of maintenance tasks.</li>
</ul>
<h3>2016-01-14</h3>
<ul>
<li>Update CCAFS project identifiers in input-forms.xml</li>
<li>Run system updates and restart the server</li>
</ul>
<h3>2016-01-18</h3>
<ul>
<li>Change “Extension material” to “Extension Material” in input-forms.xml (a mistake that fell through the cracks when we fixed the others in the DSpace 4 era)</li>
</ul>
<h3>2016-01-19</h3>
<ul>
<li>Work on tweaks and updates for the social sharing icons on item pages: add Delicious and Mendeley (from Academicons), make links open in new windows, and set the icon color to the theme’s primary color (#157)</li>
<li>Tweak date-based facets to show more values in drill-down ranges (#162)</li>
<li>Need to remember to clear the Cocoon cache after deployment or else you don’t see the new ranges immediately</li>
<li>Set up recipe on IFTTT to tweet new items from the CGSpace Atom feed to my Twitter account</li>
<li>Altmetrics’ support for Handles is kinda weak, so they can’t associate our items with DOIs until they are tweeted or blogged, etc first.</li>
</ul>
</div>

<footer>
<ul class="pager">
<li class="next"><a href="/cgspace-notes/2016-01/">Read more <span aria-hidden="true">»</span></a></li>
</ul>
</footer>
</article>

<hr/>

<article>
<header>
<h2><a href="/cgspace-notes/2015-12/">December, 2015</a></h2>
<div class="post-meta clearfix">
<div class="post-date pull-left">
Posted on
<time datetime="2015-12-02T13:18:00+03:00">
Dec 2, 2015
</time>
</div>
</div>
</header>
<div>
<h3>2015-12-02</h3>
<ul>
<li>Replace lzop with xz in log compression cron jobs on DSpace Test, as it uses less space:</li>
</ul>
<pre><code># cd /home/dspacetest.cgiar.org/log
# ls -lh dspace.log.2015-11-18*
-rw-rw-r-- 1 tomcat7 tomcat7 2.0M Nov 18 23:59 dspace.log.2015-11-18
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz</code></pre>
<ul>
<li>I had used lrzip once, but it needs more memory and is harder to use as it requires the lrztar</li>
</ul>
</div>
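<p>In the listing above the .xz file (169K) is a little under half the size of the .lzo one (387K). For a rough feel of that kind of difference without touching the server, here is a sketch using Python's standard library. <code>lzma.compress</code> writes the same .xz container format by default. <code>zlib</code> is only a stand-in for a faster, lighter codec, since lzop is not in the standard library. The log line is synthetic, so the numbers will not match the listing.</p>
<pre><code class="language-python">import lzma
import zlib

# Synthetic, repetitive log lines standing in for a real dspace.log.
line = b"2015-11-18 23:59:00,000 INFO  org.dspace.example @ request handled\n"
raw = line * 5000

xz_size = len(lzma.compress(raw))    # default container is .xz
zlib_size = len(zlib.compress(raw))  # stand-in for a faster codec, not lzop

# Sizes in bytes: uncompressed, zlib, xz
print(len(raw), zlib_size, xz_size)
</code></pre>
<p>Real logs are far less repetitive than this sample, so the ratios here are more flattering than what a production log would give.</p>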
<footer>
<ul class="pager">
<li class="next"><a href="/cgspace-notes/2015-12/">Read more <span aria-hidden="true">»</span></a></li>
</ul>
</footer>
</article>

<hr/>

<article>
<header>
<h2><a href="/cgspace-notes/2015-11/">November, 2015</a></h2>
<div class="post-meta clearfix">
<div class="post-date pull-left">
Posted on
<time datetime="2015-11-23T17:00:57+03:00">
Nov 23, 2015
</time>
</div>
</div>
</header>
<div>
<h3>2015-11-22</h3>
<ul>
<li>CGSpace went down</li>
<li>Looks like DSpace exhausted its PostgreSQL connection pool</li>
<li>Last week I had increased the limit from 30 to 60, which seemed to help, but now there are many more idle connections:</li>
</ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78</code></pre>
<ul>
<li>For now I have increased the limit from 60 to 90, run updates, and rebooted the server</li>
</ul>
<h3>2015-11-24</h3>
<ul>
<li>CGSpace went down again</li>
<li>Getting emails from uptimeRobot and uptimeButler that it’s down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors</li>
<li>Looks like there are still a bunch of idle PostgreSQL connections:</li>
</ul>
<pre><code>$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
96</code></pre>
<ul>
<li>For some reason the number of idle connections is very high since we upgraded to DSpace 5</li>
</ul>
<h3>2015-11-25</h3>
<ul>
<li>Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config</li>
<li>The OAI application requests stylesheets and JavaScript files with the path /oai/static/css, which gets matched here:</li>
</ul>
<pre><code># static assets we can load from the file system directly with nginx
location ~ /(themes|static|aspects/ReportingSuite) {
    try_files $uri @tomcat;
...</code></pre>
</div>
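<p>The reason /oai/static/css gets caught is that nginx regex locations (<code>location ~</code>) are not anchored, so the pattern can match anywhere in the URI. Here is a sketch of that behavior with Python's <code>re</code> module, which treats this particular pattern the same way. The anchored variant and the example paths are hypothetical, included only to show the difference, and are not necessarily the fix that was applied.</p>
<pre><code class="language-python">import re

# Pattern copied from the location block; regex locations are unanchored.
pattern = re.compile(r"/(themes|static|aspects/ReportingSuite)")

# Hypothetical stricter variant that only matches at the start of the path.
anchored = re.compile(r"^/(themes|static|aspects/ReportingSuite)")

# Example paths are invented for illustration.
for path in ["/static/css/main.css", "/oai/static/css/style.css"]:
    print(path, bool(pattern.search(path)), bool(anchored.search(path)))
</code></pre>
<p>With the unanchored pattern both paths match, which is how OAI asset requests end up in the static-file location. The anchored one matches only the first path.</p>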
<footer>
<ul class="pager">
<li class="next"><a href="/cgspace-notes/2015-11/">Read more <span aria-hidden="true">»</span></a></li>
</ul>
</footer>
</article>

</div>
<nav class="pagination" role="navigation">

<span class="page-number">Page 1 of 1</span>

</nav>

</main>
<footer class="container global-footer">
<div class="copyright-note pull-left">
</div>
<div class="sns-links hidden-print">
</div>

</footer>

<script src="/cgspace-notes/js/highlight.pack.js"></script>
<script>
hljs.initHighlightingOnLoad();
</script>

</body>
</html>