<!DOCTYPE html>
<html lang="en" >
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:title" content="October, 2020" />
<meta property="og:description" content="2020-10-06
Add tests for the new /items POST handlers to the DSpace 6.x branch of my dspace-statistics-api
It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available
Tag and release version 1.3.0 on GitHub: https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0
Trying to test the changes Atmire sent last week but I had to re-create my local database from a recent CGSpace dump
During the FlywayDB migration I got an error:
" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-10/" />
<meta property="article:published_time" content="2020-10-06T16:55:54+03:00" />
<meta property="article:modified_time" content="2020-10-07T14:44:39+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="October, 2020"/>
<meta name="twitter:description" content="2020-10-06
Add tests for the new /items POST handlers to the DSpace 6.x branch of my dspace-statistics-api
It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available
Tag and release version 1.3.0 on GitHub: https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0
Trying to test the changes Atmire sent last week but I had to re-create my local database from a recent CGSpace dump
During the FlywayDB migration I got an error:
"/>
<meta name="generator" content="Hugo 0.76.2" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "October, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-10/",
"wordCount": "1077",
"datePublished": "2020-10-06T16:55:54+03:00",
"dateModified": "2020-10-07T14:44:39+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
</script>
<link rel="canonical" href="https://alanorth.github.io/cgspace-notes/2020-10/">
<title>October, 2020 | CGSpace Notes</title>
<!-- combined, minified CSS -->
<link href="https://alanorth.github.io/cgspace-notes/css/style.6da5c906cc7a8fbb93f31cd2316c5dbe3f19ac4aa6bfb066f1243045b8f6061e.css" rel="stylesheet" integrity="sha256-baXJBsx6j7uT8xzSMWxdvj8ZrEqmv7Bm8SQwRbj2Bh4=" crossorigin="anonymous">
<!-- minified Font Awesome for SVG icons -->
<script defer src="https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f3d2a1f5980bab30ddd0d8cadbd496475309fc48e2b1d052c5c09e6facffcb0f.js" integrity="sha256-89Kh9ZgLqzDd0NjK29SWR1MJ/EjisdBSxcCeb6z/yw8=" crossorigin="anonymous"></script>
<!-- RSS 2.0 feed -->
</head>
<body>
<div class="blog-masthead">
<div class="container">
<nav class="nav blog-nav">
<a class="nav-link " href="https://alanorth.github.io/cgspace-notes/">Home</a>
</nav>
</div>
</div>
<header class="blog-header">
<div class="container">
<h1 class="blog-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/" rel="home">CGSpace Notes</a></h1>
<p class="lead blog-description" dir="auto">Documenting day-to-day work on the <a href="https://cgspace.cgiar.org">CGSpace</a> repository.</p>
</div>
</header>
<div class="container">
<div class="row">
<div class="col-sm-8 blog-main">
<article class="blog-post">
<header>
<h2 class="blog-post-title" dir="auto"><a href="https://alanorth.github.io/cgspace-notes/2020-10/">October, 2020</a></h2>
<p class="blog-post-meta"><time datetime="2020-10-06T16:55:54+03:00">Tue Oct 06, 2020</time> by Alan Orth in
<span class="fas fa-folder" aria-hidden="true"></span>&nbsp;<a href="/cgspace-notes/categories/notes/" rel="category tag">Notes</a>
</p>
</header>
<h2 id="2020-10-06">2020-10-06</h2>
<ul>
<li>Add tests for the new <code>/items</code> POST handlers to the DSpace 6.x branch of my <a href="https://github.com/ilri/dspace-statistics-api/tree/v6_x">dspace-statistics-api</a>
<ul>
<li>It took a bit of extra work because I had to learn how to mock the responses for when Solr is not available</li>
<li>Tag and release version 1.3.0 on GitHub: <a href="https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0">https://github.com/ilri/dspace-statistics-api/releases/tag/v1.3.0</a></li>
</ul>
</li>
<li>Trying to test the changes Atmire sent last week but I had to re-create my local database from a recent CGSpace dump
<ul>
<li>During the FlywayDB migration I got an error:</li>
</ul>
</li>
</ul>
<pre><code>2020-10-06 21:36:04,138 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ Batch entry 0 update public.bitstreamformatregistry set description='Electronic publishing', internal='FALSE', mimetype='application/epub+zip', short_description='EPUB', support_level=1 where bitstream_format_id=78 was aborted: ERROR: duplicate key value violates unique constraint &quot;bitstreamformatregistry_short_description_key&quot;
Detail: Key (short_description)=(EPUB) already exists. Call getNextException to see other errors in the batch.
2020-10-06 21:36:04,138 WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ SQL Error: 0, SQLState: 23505
2020-10-06 21:36:04,138 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper @ ERROR: duplicate key value violates unique constraint &quot;bitstreamformatregistry_short_description_key&quot;
Detail: Key (short_description)=(EPUB) already exists.
2020-10-06 21:36:04,142 ERROR org.hibernate.engine.jdbc.batch.internal.BatchingBatch @ HHH000315: Exception executing batch [could not execute batch]
2020-10-06 21:36:04,143 ERROR org.dspace.storage.rdbms.DatabaseRegistryUpdater @ Error attempting to update Bitstream Format and/or Metadata Registries
org.hibernate.exception.ConstraintViolationException: could not execute batch
at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:129)
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49)
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:124)
at org.hibernate.engine.jdbc.batch.internal.BatchingBatch.performExecution(BatchingBatch.java:122)
at org.hibernate.engine.jdbc.batch.internal.BatchingBatch.doExecuteBatch(BatchingBatch.java:101)
at org.hibernate.engine.jdbc.batch.internal.AbstractBatchImpl.execute(AbstractBatchImpl.java:161)
at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.executeBatch(JdbcCoordinatorImpl.java:207)
at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:390)
at org.hibernate.engine.spi.ActionQueue.executeActions(ActionQueue.java:304)
at org.hibernate.event.internal.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:349)
at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:56)
at org.hibernate.internal.SessionImpl.flush(SessionImpl.java:1195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.hibernate.context.internal.ThreadLocalSessionContext$TransactionProtectionWrapper.invoke(ThreadLocalSessionContext.java:352)
at com.sun.proxy.$Proxy162.flush(Unknown Source)
at org.dspace.core.HibernateDBConnection.commit(HibernateDBConnection.java:83)
at org.dspace.core.Context.commit(Context.java:435)
at org.dspace.core.Context.complete(Context.java:380)
at org.dspace.administer.MetadataImporter.loadRegistry(MetadataImporter.java:164)
at org.dspace.storage.rdbms.DatabaseRegistryUpdater.updateRegistries(DatabaseRegistryUpdater.java:72)
at org.dspace.storage.rdbms.DatabaseRegistryUpdater.afterMigrate(DatabaseRegistryUpdater.java:121)
at org.flywaydb.core.internal.command.DbMigrate$3.doInTransaction(DbMigrate.java:250)
at org.flywaydb.core.internal.util.jdbc.TransactionTemplate.execute(TransactionTemplate.java:72)
at org.flywaydb.core.internal.command.DbMigrate.migrate(DbMigrate.java:246)
at org.flywaydb.core.Flyway$1.execute(Flyway.java:959)
at org.flywaydb.core.Flyway$1.execute(Flyway.java:917)
at org.flywaydb.core.Flyway.execute(Flyway.java:1373)
at org.flywaydb.core.Flyway.migrate(Flyway.java:917)
at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:663)
at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:575)
at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:551)
at org.dspace.core.Context.&lt;clinit&gt;(Context.java:103)
at org.dspace.app.util.AbstractDSpaceWebapp.register(AbstractDSpaceWebapp.java:74)
at org.dspace.app.util.DSpaceWebappListener.contextInitialized(DSpaceWebappListener.java:31)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:5197)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5720)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:1016)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:992)
</code></pre><ul>
<li>I checked the database migrations with <code>dspace database info</code> and they were all OK
<ul>
<li>Then I restarted Tomcat and it started up OK&hellip;</li>
</ul>
</li>
<li>There were two issues I had reported to Atmire last month:
<ul>
<li>Importing items from the command line throws a <code>NullPointerException</code> from <code>com.atmire.dspace.cua.CUASolrLoggerServiceImpl</code> for every item, but the item still gets imported</li>
<li>No results for author name in Listings and Reports, despite there being hits in Discovery search</li>
</ul>
</li>
<li>To test the first one I imported a very simple CSV file with one item with minimal data
<ul>
<li>There is a new error now (but the item does get imported):</li>
</ul>
</li>
</ul>
<pre><code>$ dspace metadata-import -f /tmp/2020-10-06-import-test.csv -e aorth@mjanja.ch
Loading @mire database changes for module MQM
Changes have been processed
-----------------------------------------------------------
New item:
+ New owning collection (10568/3): ILRI articles in journals
+ Add (dc.contributor.author): Orth, Alan
+ Add (dc.date.issued): 2020-09-01
+ Add (dc.title): Testing CUA import NPE
1 item(s) will be changed
Do you want to make these changes? [y/n] y
-----------------------------------------------------------
New item: aff5e78d-87c9-438d-94f8-1050b649961c (10568/108548)
+ New owning collection (10568/3): ILRI articles in journals
+ Added (dc.contributor.author): Orth, Alan
+ Added (dc.date.issued): 2020-09-01
+ Added (dc.title): Testing CUA import NPE
Tue Oct 06 22:06:14 CEST 2020 | Query:containerItem:aff5e78d-87c9-438d-94f8-1050b649961c
Error while updating
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Expected mime type application/octet-stream but got text/html. &lt;!doctype html&gt;&lt;html lang=&quot;en&quot;&gt;&lt;head&gt;&lt;title&gt;HTTP Status 404 Not Found&lt;/title&gt;&lt;style type=&quot;text/css&quot;&gt;body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}&lt;/style&gt;&lt;/head&gt;&lt;body&gt;&lt;h1&gt;HTTP Status 404 Not Found&lt;/h1&gt;&lt;hr class=&quot;line&quot; /&gt;&lt;p&gt;&lt;b&gt;Type&lt;/b&gt; Status Report&lt;/p&gt;&lt;p&gt;&lt;b&gt;Message&lt;/b&gt; The requested resource [/solr/update] is not available&lt;/p&gt;&lt;p&gt;&lt;b&gt;Description&lt;/b&gt; The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.&lt;/p&gt;&lt;hr class=&quot;line&quot; /&gt;&lt;h3&gt;Apache Tomcat/7.0.104&lt;/h3&gt;&lt;/body&gt;&lt;/html&gt;
at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:512)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
at com.atmire.dspace.cua.CUASolrLoggerServiceImpl$5.visit(SourceFile:1131)
at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.visitEachStatisticShard(SourceFile:212)
at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1104)
at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1093)
at org.dspace.statistics.StatisticsLoggingConsumer.consume(SourceFile:104)
at org.dspace.event.BasicDispatcher.consume(BasicDispatcher.java:177)
at org.dspace.event.BasicDispatcher.dispatch(BasicDispatcher.java:123)
at org.dspace.core.Context.dispatchEvents(Context.java:455)
at org.dspace.core.Context.commit(Context.java:424)
at org.dspace.core.Context.complete(Context.java:380)
at org.dspace.app.bulkedit.MetadataImport.main(MetadataImport.java:1399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
</code></pre><ul>
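<li>For reference, a one-item CSV like the one imported above could be generated with a sketch like this. The actual test file is not shown in these notes, so the column layout is an assumption based on DSpace's batch metadata editing format (a <code>+</code> in the <code>id</code> column marks a new item), and the <code>build_import_csv</code> name is hypothetical:</li>
</ul>
<pre><code>import csv
import io


def build_import_csv():
    """Return CSV text for one new item with minimal metadata."""
    # Assumed layout: "+" in the id column asks DSpace to create a new item,
    # and the collection column holds the owning collection's handle.
    header = ["id", "collection", "dc.contributor.author", "dc.date.issued", "dc.title"]
    row = ["+", "10568/3", "Orth, Alan", "2020-09-01", "Testing CUA import NPE"]
    buffer = io.StringIO()
    # The csv module quotes "Orth, Alan" automatically because it contains a comma.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerow(row)
    return buffer.getvalue()
</code></pre><ul>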
<li>Also, I tested Listings and Reports and there are still no hits for &ldquo;Orth, Alan&rdquo; as a contributor, despite there being dozens of items in the repository and the Solr query generated by Listings and Reports actually returning hits:</li>
</ul>
<pre><code>2020-10-06 22:23:44,116 INFO org.apache.solr.core.SolrCore @ [search] webapp=/solr path=/select params={q=*:*&amp;fl=handle,search.resourcetype,search.resourceid,search.uniqueid&amp;start=0&amp;fq=NOT(withdrawn:true)&amp;fq=NOT(discoverable:false)&amp;fq=search.resourcetype:2&amp;fq=author_keyword:Orth,\+A.+OR+author_keyword:Orth,\+Alan&amp;fq=dateIssued.year:[2013+TO+2021]&amp;rows=500&amp;wt=javabin&amp;version=2} hits=18 status=0 QTime=10
</code></pre><ul>
<li>Solr returns <code>hits=18</code> for the L&amp;R query, but there are no results shown in the L&amp;R UI</li>
<li>I sent all this feedback to Atmire&hellip;</li>
</ul>
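<ul>
<li>To double-check the <code>hits=18</code> result outside of L&amp;R, the same query could be replayed against Solr directly. This is only a sketch: the <code>localhost:8080</code> base URL and the function names are assumptions, and it asks for <code>wt=json</code> instead of the <code>javabin</code> format from the log so the response can be read with the standard library:</li>
</ul>
<pre><code>import json
import urllib.request
from urllib.parse import urlencode

# Assumption: Solr is reachable locally; "search" is the core named in the log line.
SOLR_SELECT_URL = "http://localhost:8080/solr/search/select"


def build_lr_query_url(base_url=SOLR_SELECT_URL):
    """Rebuild the logged Listings and Reports query as a Solr select URL."""
    params = [
        ("q", "*:*"),
        ("fl", "handle,search.resourcetype,search.resourceid,search.uniqueid"),
        ("start", "0"),
        ("fq", "NOT(withdrawn:true)"),
        ("fq", "NOT(discoverable:false)"),
        ("fq", "search.resourcetype:2"),
        # The log shows "\+" for a backslash-escaped space in the author names.
        ("fq", r"author_keyword:Orth,\ A. OR author_keyword:Orth,\ Alan"),
        ("fq", "dateIssued.year:[2013 TO 2021]"),
        ("rows", "500"),
        # JSON instead of javabin, so no SolrJ client is needed to decode it.
        ("wt", "json"),
    ]
    # urlencode keeps repeated keys, so all five fq filters are sent.
    return base_url + "?" + urlencode(params)


def count_hits(url):
    """Fetch the query and return the numFound value from Solr's JSON response."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["response"]["numFound"]
</code></pre>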
<h2 id="2020-10-07">2020-10-07</h2>
<ul>
<li>Udana from IWMI had asked about stats discrepancies from reports they had generated in previous months or years
<ul>
<li>I told him that we very often purge bots and the number of stats can change drastically</li>
<li>Also, I told him that it is not possible to compare stats from previous exports and that the stats should be taken with a grain of salt</li>
</ul>
</li>
<li>Testing POSTing items to the DSpace 6 REST API
<ul>
<li>We need to authenticate to get a JSESSIONID cookie first:</li>
</ul>
</li>
</ul>
<pre><code>$ http -f POST https://dspacetest.cgiar.org/rest/login email=aorth@fuuu.com 'password=fuuuu'
$ http https://dspacetest.cgiar.org/rest/status Cookie:JSESSIONID=EABAC9EFF942028AA52DFDA16DBCAFDE
</code></pre><ul>
<li>Then we post an item in JSON format to <code>/rest/collections/{uuid}/items</code>:</li>
</ul>
<pre><code>$ http POST https://dspacetest.cgiar.org/rest/collections/f10ad667-2746-4705-8b16-4439abe61d22/items Cookie:JSESSIONID=EABAC9EFF942028AA52DFDA16DBCAFDE &lt; item-object.json
</code></pre><ul>
<li>Format of JSON is:</li>
</ul>
<pre><code>{ &quot;metadata&quot;: [
{
&quot;key&quot;: &quot;dc.title&quot;,
&quot;value&quot;: &quot;Testing REST API post&quot;,
&quot;language&quot;: &quot;en_US&quot;
},
{
&quot;key&quot;: &quot;dc.contributor.author&quot;,
&quot;value&quot;: &quot;Orth, Alan&quot;,
&quot;language&quot;: &quot;en_US&quot;
},
{
&quot;key&quot;: &quot;dc.date.issued&quot;,
&quot;value&quot;: &quot;2020-09-01&quot;,
&quot;language&quot;: &quot;en_US&quot;
}
],
&quot;archived&quot;:&quot;false&quot;,
&quot;withdrawn&quot;:&quot;false&quot;
}
</code></pre><ul>
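<li>The login and POST steps above could also be scripted. A minimal sketch using only the Python standard library, assuming the login endpoint sets the JSESSIONID cookie on its response as described above; the function names are hypothetical and the payload mirrors the JSON above:</li>
</ul>
<pre><code>import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

REST_URL = "https://dspacetest.cgiar.org/rest"


def build_item_payload(title, author, date_issued, language="en_US"):
    """Build the item JSON in the same shape as the example above."""
    fields = [
        ("dc.title", title),
        ("dc.contributor.author", author),
        ("dc.date.issued", date_issued),
    ]
    return {
        "metadata": [
            {"key": key, "value": value, "language": language}
            for key, value in fields
        ],
        # Sent as strings, exactly as in the JSON above.
        "archived": "false",
        "withdrawn": "false",
    }


def post_item(collection_uuid, payload, email, password):
    """Log in, then POST the item to the collection and return the parsed reply."""
    # The cookie jar stores JSESSIONID from the login and replays it on the POST.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    credentials = urllib.parse.urlencode({"email": email, "password": password})
    # Passing bytes as data makes this a form-encoded POST, like "http -f POST".
    with opener.open(REST_URL + "/login", data=credentials.encode("utf-8")) as login:
        login.read()
    request = urllib.request.Request(
        REST_URL + "/collections/" + collection_uuid + "/items",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with opener.open(request) as response:
        return json.load(response)
</code></pre><ul>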
<li>What is unclear to me is the <code>archived</code> parameter, which seems to do nothing&hellip; perhaps it is only used by the <code>/items</code> endpoint when printing information about an item
<ul>
<li>If I submit to a collection that has a workflow, even as a super admin and with &ldquo;archived=false&rdquo; in the JSON, the item enters the workflow (&ldquo;Awaiting editor&rsquo;s attention&rdquo;)</li>
<li>If I submit to a new collection without a workflow the item gets archived immediately</li>
<li>I created <a href="https://gist.github.com/alanorth/40fc3092aefd78f978cca00e8abeeb7a">some notes</a> to share with Salem and Leroy for future reference when we start discussing POSTing items to the REST API</li>
</ul>
</li>
<li>I created an account for Salem on DSpace Test and added it to the submitters group of an ICARDA collection with no other workflow steps so we can see what happens
<ul>
<li>We are curious to see if he gets a UUID when posting from MEL</li>
</ul>
</li>
<li>I did some tests by adding his account to certain workflow steps and trying to POST the item</li>
<li>Member of collection &ldquo;Submitters&rdquo; step:
<ul>
<li>HTTP Status 401 Unauthorized</li>
<li>The request has not been applied because it lacks valid authentication credentials for the target resource.</li>
</ul>
</li>
<li>Member of collection &ldquo;Accept/Reject&rdquo; step:
<ul>
<li>Same error&hellip;</li>
</ul>
</li>
<li>Member of collection &ldquo;Accept/Reject/Edit Metadata&rdquo; step:
<ul>
<li>Same error&hellip;</li>
</ul>
</li>
<li>Member of collection Administrators with no other workflow steps:
<ul>
<li>Posts straight to archive</li>
</ul>
</li>
<li>Member of collection Administrators with empty &ldquo;Accept/Reject/Edit Metadata&rdquo; step:
<ul>
<li>Posts straight to archive</li>
</ul>
</li>
<li>Member of collection Administrators with populated &ldquo;Accept/Reject/Edit Metadata&rdquo; step:
<ul>
<li>Does <em>not</em> post straight to archive, goes to workflow</li>
</ul>
</li>
<li>Note that community administrators have no role in item submission other than being able to create/manage collection groups</li>
</ul>
<!-- raw HTML omitted -->
</article>
</div> <!-- /.blog-main -->
<aside class="col-sm-3 ml-auto blog-sidebar">
<section class="sidebar-module">
<h4>Recent Posts</h4>
<ol class="list-unstyled">
<li><a href="/cgspace-notes/2020-10/">October, 2020</a></li>
<li><a href="/cgspace-notes/2020-09/">September, 2020</a></li>
<li><a href="/cgspace-notes/2020-08/">August, 2020</a></li>
<li><a href="/cgspace-notes/2020-07/">July, 2020</a></li>
<li><a href="/cgspace-notes/2020-06/">June, 2020</a></li>
</ol>
</section>
<section class="sidebar-module">
<h4>Links</h4>
<ol class="list-unstyled">
<li><a href="https://cgspace.cgiar.org">CGSpace</a></li>
<li><a href="https://dspacetest.cgiar.org">DSpace Test</a></li>
<li><a href="https://github.com/ilri/DSpace">CGSpace @ GitHub</a></li>
</ol>
</section>
</aside>
</div> <!-- /.row -->
</div> <!-- /.container -->
<footer class="blog-footer">
<p dir="auto">
Blog template created by <a href="https://twitter.com/mdo">@mdo</a>, ported to Hugo by <a href='https://twitter.com/mralanorth'>@mralanorth</a>.
</p>
<p>
<a href="#">Back to top</a>
</p>
</footer>
</body>
</html>