diff --git a/content/post/2016-12.md b/content/post/2016-12.md index 13770f1d6..d0e025e9d 100644 --- a/content/post/2016-12.md +++ b/content/post/2016-12.md @@ -227,3 +227,134 @@ UPDATE 561 - In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work - I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there - Paola from CCAFS mentioned she also has the "take task" bug on CGSpace +- Reading about [`shared_buffers` in PostgreSQL configuration](https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html) (default is 128MB) +- Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres +- The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn't dedicated (also runs Solr, which can benefit from OS cache) so let's try 1024MB +- In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above): + +``` +$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority +Retrieving all data +Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer +Exception: null +java.lang.NullPointerException + at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82) + at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at 
org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61) + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) + at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) + at java.lang.reflect.Method.invoke(Method.java:498) + at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) + at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) + +real 8m39.913s +user 1m54.190s +sys 0m22.647s +``` + +## 2016-12-07 + +- For what it's worth, after running the same SQL updates on my local test server, `index-authority` runs and completes just fine +- I will have to test more +- Anyways, I noticed that some of the authority values I set actually have versions of author names we don't want, ie "Grace, D." +- For example, do a Solr query for "first_name:Grace" and look at the results +- Querying that ID shows the fields that need to be changed: + +``` +{ + "responseHeader": { + "status": 0, + "QTime": 1, + "params": { + "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b", + "indent": "true", + "wt": "json", + "_": "1481102189244" + } + }, + "response": { + "numFound": 1, + "start": 0, + "docs": [ + { + "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b", + "field": "dc_contributor_author", + "value": "Grace, D.", + "deleted": false, + "creation_date": "2016-11-10T15:13:40.318Z", + "last_modified_date": "2016-11-10T15:13:40.318Z", + "authority_type": "person", + "first_name": "D.", + "last_name": "Grace" + } + ] + } +} +``` + +- I think I can just update the `value`, `first_name`, and `last_name` fields... 
+- The update syntax should be something like this, but I'm getting errors from Solr: + +``` +$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]' +{ + "responseHeader":{ + "status":400, + "QTime":0}, + "error":{ + "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]", + "code":400}} +``` + +- When I try using the XML format I get an error that the `updateLog` needs to be configured for that core +- Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly? + + +``` +dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; +UPDATE 561 +``` + +- Then I'll reindex discovery and authority and see how the authority Solr core looks +- After this, now there are authorities for some of the "Grace, D." and "Grace, Delia" text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new): + +``` +$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true' +{ + "responseHeader":{ + "status":0, + "QTime":0, + "params":{ + "q":"id:18ea1525-2513-430a-8817-a834cd733fbc", + "indent":"true", + "wt":"json"}}, + "response":{"numFound":1,"start":0,"docs":[ + { + "id":"18ea1525-2513-430a-8817-a834cd733fbc", + "field":"dc_contributor_author", + "value":"Grace, Delia", + "deleted":false, + "creation_date":"2016-12-07T10:54:34.356Z", + "last_modified_date":"2016-12-07T10:54:34.356Z", + "authority_type":"person", + "first_name":"Delia", + "last_name":"Grace"}] + }} +``` +- So now I could set them all to this ID and the name would be ok, but there has to be a better way! 
+- In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one! +- Better to use: + +``` +dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; +``` + +- This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky! diff --git a/public/2016-12/index.html b/public/2016-12/index.html index fb602b748..197c38057 100644 --- a/public/2016-12/index.html +++ b/public/2016-12/index.html @@ -30,7 +30,7 @@ - + @@ -350,6 +350,143 @@ UPDATE 561
  • In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work
  • I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there
  • Paola from CCAFS mentioned she also has the “take task” bug on CGSpace
  • Reading about shared_buffers in PostgreSQL configuration (default is 128MB)
  • Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres
  • The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn’t dedicated (also runs Solr, which can benefit from OS cache) so let’s try 1024MB
  • In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):

    $ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
    +Retrieving all data
    +Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
    +Exception: null
    +java.lang.NullPointerException
    +        at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
    +        at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
    +        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
    +        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
    +        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
    +        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
    +        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
    +        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
    +        at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
    +        at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
    +        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    +        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    +        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    +        at java.lang.reflect.Method.invoke(Method.java:498)
    +        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
    +        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
    +
    +real    8m39.913s
    +user    1m54.190s
    +sys     0m22.647s
    +
    + +

    2016-12-07

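  • For what it’s worth, after running the same SQL updates on my local test server, index-authority runs and completes just fine
  • I will have to test more
  • Anyway, I noticed that some of the authority values I set actually use versions of author names we don’t want, e.g. “Grace, D.”
  • For example, do a Solr query for “first_name:Grace” and look at the results
  • Querying that ID shows the fields that need to be changed:
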
    {
    +  "responseHeader": {
    +    "status": 0,
    +    "QTime": 1,
    +    "params": {
    +      "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
    +      "indent": "true",
    +      "wt": "json",
    +      "_": "1481102189244"
    +    }
    +  },
    +  "response": {
    +    "numFound": 1,
    +    "start": 0,
    +    "docs": [
    +      {
    +        "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
    +        "field": "dc_contributor_author",
    +        "value": "Grace, D.",
    +        "deleted": false,
    +        "creation_date": "2016-11-10T15:13:40.318Z",
    +        "last_modified_date": "2016-11-10T15:13:40.318Z",
    +        "authority_type": "person",
    +        "first_name": "D.",
    +        "last_name": "Grace"
    +      }
    +    ]
    +  }
    +}
    +
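  • I think I can just update the value, first_name, and last_name fields…
  • The update syntax should be something like this, but I’m getting errors from Solr:
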
    $ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
    +{
    +  "responseHeader":{
    +    "status":400,
    +    "QTime":0},
    +  "error":{
    +    "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
    +    "code":400}}
    +
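
The request above still carries placeholder fields (`"id":"1"`, `price`), and the `expected '<'` message suggests the body was being parsed as XML rather than JSON. As a rough sketch only, this is how an atomic-update body for the actual authority record could be assembled; the helper name `build_atomic_update` is made up for illustration, the target values are the ones these notes want ("Grace, Delia"), and nothing here is sent to Solr. Atomic updates generally also need the core's `updateLog` to be enabled, so the payload alone may not be enough on this core.

```python
import json

# Document ID taken from the Solr query result shown earlier in these notes.
AUTHORITY_ID = "0b4fcbc1-d930-4319-9b4d-ea1553cca70b"


def build_atomic_update(doc_id, changes):
    """Hypothetical helper: wrap each changed field in Solr's {"set": value} modifier."""
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    # This style of update is sent as a JSON array of documents.
    return json.dumps([doc])


payload = build_atomic_update(
    AUTHORITY_ID,
    {"value": "Grace, Delia", "first_name": "Delia", "last_name": "Grace"},
)
print(payload)
```

The printed string is what would go after `-d` in the curl command, in place of the placeholder document.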
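  • When I try using the XML format I get an error that the updateLog needs to be configured for that core
  • Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?
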
    dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
    +UPDATE 561
    +
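
One thing worth checking before a reset like the one above is which distinct names the `like 'Grace, D%'` pattern actually catches, since it would also match any other author whose name starts the same way. The following is a minimal sketch of that check against an in-memory SQLite stand-in, not the real DSpace PostgreSQL database; the sample rows (including "Grace, Daniel") are invented, and only the columns named in these notes are modelled.

```python
import sqlite3

# In-memory stand-in for the real table; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue ("
    "resource_type_id INTEGER, metadata_field_id INTEGER, "
    "text_value TEXT, authority TEXT, confidence INTEGER)"
)
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?, ?, ?, ?, ?)",
    [
        (2, 3, "Grace, D.", "0b4fcbc1-d930-4319-9b4d-ea1553cca70b", 600),
        (2, 3, "Grace, Delia", "0b4fcbc1-d930-4319-9b4d-ea1553cca70b", 600),
        (2, 3, "Grace, Daniel", None, -1),  # hypothetical unrelated author
        (2, 3, "Graham, D.", None, -1),
    ],
)

PATTERN = "Grace, D%"
WHERE = "resource_type_id = 2 AND metadata_field_id = 3 AND text_value LIKE ?"

# Preview: which distinct names would the update touch, and how many rows each?
matches = conn.execute(
    f"SELECT text_value, COUNT(*) FROM metadatavalue WHERE {WHERE} "
    "GROUP BY text_value ORDER BY text_value",
    (PATTERN,),
).fetchall()
for text_value, count in matches:
    print(text_value, count)

# The reset itself, same shape as the statement in the notes.
cur = conn.execute(
    f"UPDATE metadatavalue SET authority = NULL, confidence = -1 WHERE {WHERE}",
    (PATTERN,),
)
print("rows updated:", cur.rowcount)
```

Note that SQLite's `LIKE` is case-insensitive for ASCII while PostgreSQL's is case-sensitive, so the preview is only indicative; running the same `SELECT` in `psql` first would be the real check.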
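  • Then I’ll reindex discovery and authority and see how the authority Solr core looks
  • After this, there are now authorities for some of the “Grace, D.” and “Grace, Delia” text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):
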
    $ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
    +{
    +  "responseHeader":{
    +    "status":0,
    +    "QTime":0,
    +    "params":{
    +      "q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
    +      "indent":"true",
    +      "wt":"json"}},
    +  "response":{"numFound":1,"start":0,"docs":[
    +      {
    +        "id":"18ea1525-2513-430a-8817-a834cd733fbc",
    +        "field":"dc_contributor_author",
    +        "value":"Grace, Delia",
    +        "deleted":false,
    +        "creation_date":"2016-12-07T10:54:34.356Z",
    +        "last_modified_date":"2016-12-07T10:54:34.356Z",
    +        "authority_type":"person",
    +        "first_name":"Delia",
    +        "last_name":"Grace"}]
    +  }}
    +
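
For looking up other authority IDs the same way, the select URL above can be built rather than typed by hand. This is a small sketch assuming the same host, port, and core as the curl command; the function name `authority_select_url` is hypothetical. It only constructs the URL (including the `%3A` escaping of the colon in `id:…`) and does not contact Solr.

```python
from urllib.parse import urlencode

# Same endpoint as the curl command in the notes, with an explicit scheme.
SOLR_SELECT = "http://localhost:8081/solr/authority/select"


def authority_select_url(authority_id):
    """Hypothetical helper: build the select URL for one authority record."""
    params = {"q": f"id:{authority_id}", "wt": "json", "indent": "true"}
    # urlencode percent-encodes the colon in the query as %3A.
    return f"{SOLR_SELECT}?{urlencode(params)}"


url = authority_select_url("18ea1525-2513-430a-8817-a834cd733fbc")
print(url)
```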
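  • So now I could set them all to this ID and the name would be ok, but there has to be a better way!
  • In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!
  • Better to use:
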
    dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
    +
  • This proves that unifying author name variants in authorities is easy, but fixing the name in the authority is tricky!
    + + diff --git a/public/index.xml b/public/index.xml index 08e4ac7aa..b6ed0a98b 100644 --- a/public/index.xml +++ b/public/index.xml @@ -254,6 +254,143 @@ UPDATE 561 <li>In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work</li> <li>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</li> <li>Paola from CCAFS mentioned she also has the &ldquo;take task&rdquo; bug on CGSpace</li> +<li>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</li> +<li>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</li> +<li>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn&rsquo;t dedicated (also runs Solr, which can benefit from OS cache) so let&rsquo;s try 1024MB</li> +<li>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</li> +</ul> + +<pre><code>$ time JAVA_OPTS=&quot;-Xms768m -Xmx768m -Dfile.encoding=UTF-8&quot; /home/dspacetest.cgiar.org/bin/dspace index-authority +Retrieving all data +Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer +Exception: null +java.lang.NullPointerException + at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82) + at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at 
org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61) + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) + at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) + at java.lang.reflect.Method.invoke(Method.java:498) + at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) + at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) + +real 8m39.913s +user 1m54.190s +sys 0m22.647s +</code></pre> + +<h2 id="2016-12-07">2016-12-07</h2> + +<ul> +<li>For what it&rsquo;s worth, after running the same SQL updates on my local test server, <code>index-authority</code> runs and completes just fine</li> +<li>I will have to test more</li> +<li>Anyways, I noticed that some of the authority values I set actually have versions of author names we don&rsquo;t want, ie &ldquo;Grace, D.&rdquo;</li> +<li>For example, do a Solr query for &ldquo;first_name:Grace&rdquo; and look at the results</li> +<li>Querying that ID shows the fields that need to be changed:</li> +</ul> + +<pre><code>{ + &quot;responseHeader&quot;: { + &quot;status&quot;: 0, + &quot;QTime&quot;: 1, + &quot;params&quot;: { + &quot;q&quot;: &quot;id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;, + &quot;indent&quot;: &quot;true&quot;, + &quot;wt&quot;: &quot;json&quot;, + &quot;_&quot;: &quot;1481102189244&quot; + } + }, + &quot;response&quot;: { + &quot;numFound&quot;: 1, + &quot;start&quot;: 0, + &quot;docs&quot;: [ + { + &quot;id&quot;: 
&quot;0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;, + &quot;field&quot;: &quot;dc_contributor_author&quot;, + &quot;value&quot;: &quot;Grace, D.&quot;, + &quot;deleted&quot;: false, + &quot;creation_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;, + &quot;last_modified_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;, + &quot;authority_type&quot;: &quot;person&quot;, + &quot;first_name&quot;: &quot;D.&quot;, + &quot;last_name&quot;: &quot;Grace&quot; + } + ] + } +} +</code></pre> + +<ul> +<li>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields&hellip;</li> +<li>The update syntax should be something like this, but I&rsquo;m getting errors from Solr:</li> +</ul> + +<pre><code>$ curl 'localhost:8081/solr/authority/update?commit=true&amp;wt=json&amp;indent=true' -H 'Content-type:application/json' -d '[{&quot;id&quot;:&quot;1&quot;,&quot;price&quot;:{&quot;set&quot;:100}}]' +{ + &quot;responseHeader&quot;:{ + &quot;status&quot;:400, + &quot;QTime&quot;:0}, + &quot;error&quot;:{ + &quot;msg&quot;:&quot;Unexpected character '[' (code 91) in prolog; expected '&lt;'\n at [row,col {unknown-source}]: [1,1]&quot;, + &quot;code&quot;:400}} +</code></pre> + +<ul> +<li>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</li> +<li>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</li> +</ul> + +<pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; +UPDATE 561 +</code></pre> + +<ul> +<li>Then I&rsquo;ll reindex discovery and authority and see how the authority Solr core looks</li> +<li>After this, now there are authorities for some of the &ldquo;Grace, D.&rdquo; and &ldquo;Grace, Delia&rdquo; text_values in the database (the first version is actually the 
same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</li> +</ul> + +<pre><code>$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&amp;wt=json&amp;indent=true' +{ + &quot;responseHeader&quot;:{ + &quot;status&quot;:0, + &quot;QTime&quot;:0, + &quot;params&quot;:{ + &quot;q&quot;:&quot;id:18ea1525-2513-430a-8817-a834cd733fbc&quot;, + &quot;indent&quot;:&quot;true&quot;, + &quot;wt&quot;:&quot;json&quot;}}, + &quot;response&quot;:{&quot;numFound&quot;:1,&quot;start&quot;:0,&quot;docs&quot;:[ + { + &quot;id&quot;:&quot;18ea1525-2513-430a-8817-a834cd733fbc&quot;, + &quot;field&quot;:&quot;dc_contributor_author&quot;, + &quot;value&quot;:&quot;Grace, Delia&quot;, + &quot;deleted&quot;:false, + &quot;creation_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;, + &quot;last_modified_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;, + &quot;authority_type&quot;:&quot;person&quot;, + &quot;first_name&quot;:&quot;Delia&quot;, + &quot;last_name&quot;:&quot;Grace&quot;}] + }} +</code></pre> + +<ul> +<li>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</li> +<li>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</li> +<li>Better to use:</li> +</ul> + +<pre><code>dspace#= update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; +</code></pre> + +<ul> +<li>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</li> </ul> diff --git a/public/post/index.xml b/public/post/index.xml index 999807ae6..05670ece0 100644 --- a/public/post/index.xml +++ b/public/post/index.xml @@ -254,6 +254,143 @@ UPDATE 561 <li>In other news, it seems the batch edit patch is 
working, there are no more WARN errors in the logs and the batch edit seems to work</li> <li>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</li> <li>Paola from CCAFS mentioned she also has the &ldquo;take task&rdquo; bug on CGSpace</li> +<li>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</li> +<li>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</li> +<li>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn&rsquo;t dedicated (also runs Solr, which can benefit from OS cache) so let&rsquo;s try 1024MB</li> +<li>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</li> +</ul> + +<pre><code>$ time JAVA_OPTS=&quot;-Xms768m -Xmx768m -Dfile.encoding=UTF-8&quot; /home/dspacetest.cgiar.org/bin/dspace index-authority +Retrieving all data +Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer +Exception: null +java.lang.NullPointerException + at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82) + at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159) + at 
org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61) + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) + at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) + at java.lang.reflect.Method.invoke(Method.java:498) + at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) + at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) + +real 8m39.913s +user 1m54.190s +sys 0m22.647s +</code></pre> + +<h2 id="2016-12-07">2016-12-07</h2> + +<ul> +<li>For what it&rsquo;s worth, after running the same SQL updates on my local test server, <code>index-authority</code> runs and completes just fine</li> +<li>I will have to test more</li> +<li>Anyways, I noticed that some of the authority values I set actually have versions of author names we don&rsquo;t want, ie &ldquo;Grace, D.&rdquo;</li> +<li>For example, do a Solr query for &ldquo;first_name:Grace&rdquo; and look at the results</li> +<li>Querying that ID shows the fields that need to be changed:</li> +</ul> + +<pre><code>{ + &quot;responseHeader&quot;: { + &quot;status&quot;: 0, + &quot;QTime&quot;: 1, + &quot;params&quot;: { + &quot;q&quot;: &quot;id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;, + &quot;indent&quot;: &quot;true&quot;, + &quot;wt&quot;: &quot;json&quot;, + &quot;_&quot;: &quot;1481102189244&quot; + } + }, + &quot;response&quot;: { + &quot;numFound&quot;: 1, + &quot;start&quot;: 0, + &quot;docs&quot;: [ + { + &quot;id&quot;: &quot;0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;, + &quot;field&quot;: &quot;dc_contributor_author&quot;, + &quot;value&quot;: &quot;Grace, D.&quot;, + &quot;deleted&quot;: false, + 
&quot;creation_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;, + &quot;last_modified_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;, + &quot;authority_type&quot;: &quot;person&quot;, + &quot;first_name&quot;: &quot;D.&quot;, + &quot;last_name&quot;: &quot;Grace&quot; + } + ] + } +} +</code></pre> + +<ul> +<li>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields&hellip;</li> +<li>The update syntax should be something like this, but I&rsquo;m getting errors from Solr:</li> +</ul> + +<pre><code>$ curl 'localhost:8081/solr/authority/update?commit=true&amp;wt=json&amp;indent=true' -H 'Content-type:application/json' -d '[{&quot;id&quot;:&quot;1&quot;,&quot;price&quot;:{&quot;set&quot;:100}}]' +{ + &quot;responseHeader&quot;:{ + &quot;status&quot;:400, + &quot;QTime&quot;:0}, + &quot;error&quot;:{ + &quot;msg&quot;:&quot;Unexpected character '[' (code 91) in prolog; expected '&lt;'\n at [row,col {unknown-source}]: [1,1]&quot;, + &quot;code&quot;:400}} +</code></pre> + +<ul> +<li>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</li> +<li>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</li> +</ul> + +<pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; +UPDATE 561 +</code></pre> + +<ul> +<li>Then I&rsquo;ll reindex discovery and authority and see how the authority Solr core looks</li> +<li>After this, now there are authorities for some of the &ldquo;Grace, D.&rdquo; and &ldquo;Grace, Delia&rdquo; text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</li> +</ul> + +<pre><code>$ curl 
'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&amp;wt=json&amp;indent=true' +{ + &quot;responseHeader&quot;:{ + &quot;status&quot;:0, + &quot;QTime&quot;:0, + &quot;params&quot;:{ + &quot;q&quot;:&quot;id:18ea1525-2513-430a-8817-a834cd733fbc&quot;, + &quot;indent&quot;:&quot;true&quot;, + &quot;wt&quot;:&quot;json&quot;}}, + &quot;response&quot;:{&quot;numFound&quot;:1,&quot;start&quot;:0,&quot;docs&quot;:[ + { + &quot;id&quot;:&quot;18ea1525-2513-430a-8817-a834cd733fbc&quot;, + &quot;field&quot;:&quot;dc_contributor_author&quot;, + &quot;value&quot;:&quot;Grace, Delia&quot;, + &quot;deleted&quot;:false, + &quot;creation_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;, + &quot;last_modified_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;, + &quot;authority_type&quot;:&quot;person&quot;, + &quot;first_name&quot;:&quot;Delia&quot;, + &quot;last_name&quot;:&quot;Grace&quot;}] + }} +</code></pre> + +<ul> +<li>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</li> +<li>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</li> +<li>Better to use:</li> +</ul> + +<pre><code>dspace#= update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; +</code></pre> + +<ul> +<li>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</li> </ul> diff --git a/public/tags/notes/index.xml b/public/tags/notes/index.xml index d7adc0975..4ac57e874 100644 --- a/public/tags/notes/index.xml +++ b/public/tags/notes/index.xml @@ -253,6 +253,143 @@ UPDATE 561 <li>In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work</li> <li>I need to check the CGSpace logs to see if 
there are still errors there, and then deploy/monitor it there</li> <li>Paola from CCAFS mentioned she also has the &ldquo;take task&rdquo; bug on CGSpace</li> +<li>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</li> +<li>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</li> +<li>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn&rsquo;t dedicated (also runs Solr, which can benefit from OS cache) so let&rsquo;s try 1024MB</li> +<li>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</li> +</ul> + +<pre><code>$ time JAVA_OPTS=&quot;-Xms768m -Xmx768m -Dfile.encoding=UTF-8&quot; /home/dspacetest.cgiar.org/bin/dspace index-authority +Retrieving all data +Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer +Exception: null +java.lang.NullPointerException + at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82) + at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144) + at 
org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61) + at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) + at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) + at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) + at java.lang.reflect.Method.invoke(Method.java:498) + at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226) + at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78) + +real 8m39.913s +user 1m54.190s +sys 0m22.647s +</code></pre> + +<h2 id="2016-12-07">2016-12-07</h2> + +<ul> +<li>For what it&rsquo;s worth, after running the same SQL updates on my local test server, <code>index-authority</code> runs and completes just fine</li> +<li>I will have to test more</li> +<li>Anyways, I noticed that some of the authority values I set actually have versions of author names we don&rsquo;t want, ie &ldquo;Grace, D.&rdquo;</li> +<li>For example, do a Solr query for &ldquo;first_name:Grace&rdquo; and look at the results</li> +<li>Querying that ID shows the fields that need to be changed:</li> +</ul> + +<pre><code>{ + &quot;responseHeader&quot;: { + &quot;status&quot;: 0, + &quot;QTime&quot;: 1, + &quot;params&quot;: { + &quot;q&quot;: &quot;id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;, + &quot;indent&quot;: &quot;true&quot;, + &quot;wt&quot;: &quot;json&quot;, + &quot;_&quot;: &quot;1481102189244&quot; + } + }, + &quot;response&quot;: { + &quot;numFound&quot;: 1, + &quot;start&quot;: 0, + &quot;docs&quot;: [ + { + &quot;id&quot;: &quot;0b4fcbc1-d930-4319-9b4d-ea1553cca70b&quot;, + &quot;field&quot;: &quot;dc_contributor_author&quot;, + &quot;value&quot;: &quot;Grace, D.&quot;, + &quot;deleted&quot;: false, + &quot;creation_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;, + &quot;last_modified_date&quot;: &quot;2016-11-10T15:13:40.318Z&quot;, + &quot;authority_type&quot;: &quot;person&quot;, + &quot;first_name&quot;: 
&quot;D.&quot;, + &quot;last_name&quot;: &quot;Grace&quot; + } + ] + } +} +</code></pre> + +<ul> +<li>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields&hellip;</li> +<li>The update syntax should be something like this, but I&rsquo;m getting errors from Solr:</li> +</ul> + +<pre><code>$ curl 'localhost:8081/solr/authority/update?commit=true&amp;wt=json&amp;indent=true' -H 'Content-type:application/json' -d '[{&quot;id&quot;:&quot;1&quot;,&quot;price&quot;:{&quot;set&quot;:100}}]' +{ + &quot;responseHeader&quot;:{ + &quot;status&quot;:400, + &quot;QTime&quot;:0}, + &quot;error&quot;:{ + &quot;msg&quot;:&quot;Unexpected character '[' (code 91) in prolog; expected '&lt;'\n at [row,col {unknown-source}]: [1,1]&quot;, + &quot;code&quot;:400}} +</code></pre> + +<ul> +<li>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</li> +<li>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</li> +</ul> + +<pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; +UPDATE 561 +</code></pre> + +<ul> +<li>Then I&rsquo;ll reindex discovery and authority and see how the authority Solr core looks</li> +<li>After this, now there are authorities for some of the &ldquo;Grace, D.&rdquo; and &ldquo;Grace, Delia&rdquo; text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</li> +</ul> + +<pre><code>$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&amp;wt=json&amp;indent=true' +{ + &quot;responseHeader&quot;:{ + &quot;status&quot;:0, + &quot;QTime&quot;:0, + &quot;params&quot;:{ + 
&quot;q&quot;:&quot;id:18ea1525-2513-430a-8817-a834cd733fbc&quot;, + &quot;indent&quot;:&quot;true&quot;, + &quot;wt&quot;:&quot;json&quot;}}, + &quot;response&quot;:{&quot;numFound&quot;:1,&quot;start&quot;:0,&quot;docs&quot;:[ + { + &quot;id&quot;:&quot;18ea1525-2513-430a-8817-a834cd733fbc&quot;, + &quot;field&quot;:&quot;dc_contributor_author&quot;, + &quot;value&quot;:&quot;Grace, Delia&quot;, + &quot;deleted&quot;:false, + &quot;creation_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;, + &quot;last_modified_date&quot;:&quot;2016-12-07T10:54:34.356Z&quot;, + &quot;authority_type&quot;:&quot;person&quot;, + &quot;first_name&quot;:&quot;Delia&quot;, + &quot;last_name&quot;:&quot;Grace&quot;}] + }} +</code></pre> + +<ul> +<li>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</li> +<li>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</li> +<li>Better to use:</li> +</ul> + +<pre><code>dspace#= update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%'; +</code></pre> + +<ul> +<li>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</li> </ul>