diff --git a/content/post/2016-12.md b/content/post/2016-12.md
index 13770f1d6..d0e025e9d 100644
--- a/content/post/2016-12.md
+++ b/content/post/2016-12.md
@@ -227,3 +227,134 @@ UPDATE 561
- In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work
- I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there
- Paola from CCAFS mentioned she also has the "take task" bug on CGSpace
+- Reading about [`shared_buffers` in PostgreSQL configuration](https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html) (default is 128MB)
+- Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres
+- The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn't dedicated (also runs Solr, which can benefit from OS cache) so let's try 1024MB
+- In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):
+
+```
+$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
+Retrieving all data
+Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
+Exception: null
+java.lang.NullPointerException
+ at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
+ at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
+ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+ at java.lang.reflect.Method.invoke(Method.java:498)
+ at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
+ at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
+
+real 8m39.913s
+user 1m54.190s
+sys 0m22.647s
+```
+
+## 2016-12-07
+
+- For what it's worth, after running the same SQL updates on my local test server, `index-authority` runs and completes just fine
+- I will have to test more
+- Anyways, I noticed that some of the authority values I set actually have versions of author names we don't want, i.e. "Grace, D."
+- For example, do a Solr query for "first_name:Grace" and look at the results
+- Querying that ID shows the fields that need to be changed:
+
+```
+{
+ "responseHeader": {
+ "status": 0,
+ "QTime": 1,
+ "params": {
+ "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
+ "indent": "true",
+ "wt": "json",
+ "_": "1481102189244"
+ }
+ },
+ "response": {
+ "numFound": 1,
+ "start": 0,
+ "docs": [
+ {
+ "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
+ "field": "dc_contributor_author",
+ "value": "Grace, D.",
+ "deleted": false,
+ "creation_date": "2016-11-10T15:13:40.318Z",
+ "last_modified_date": "2016-11-10T15:13:40.318Z",
+ "authority_type": "person",
+ "first_name": "D.",
+ "last_name": "Grace"
+ }
+ ]
+ }
+}
+```
+
+- I think I can just update the `value`, `first_name`, and `last_name` fields...
+- The update syntax should be something like this, but I'm getting errors from Solr:
+
+```
+$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
+{
+ "responseHeader":{
+ "status":400,
+ "QTime":0},
+ "error":{
+ "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
+ "code":400}}
+```
+
+- When I try using the XML format I get an error that the `updateLog` needs to be configured for that core
+- Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?
+
+
+```
+dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
+UPDATE 561
+```
+
+- Then I'll reindex discovery and authority and see how the authority Solr core looks
+- After this, now there are authorities for some of the "Grace, D." and "Grace, Delia" text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):
+
+```
+$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
+{
+ "responseHeader":{
+ "status":0,
+ "QTime":0,
+ "params":{
+ "q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
+ "indent":"true",
+ "wt":"json"}},
+ "response":{"numFound":1,"start":0,"docs":[
+ {
+ "id":"18ea1525-2513-430a-8817-a834cd733fbc",
+ "field":"dc_contributor_author",
+ "value":"Grace, Delia",
+ "deleted":false,
+ "creation_date":"2016-12-07T10:54:34.356Z",
+ "last_modified_date":"2016-12-07T10:54:34.356Z",
+ "authority_type":"person",
+ "first_name":"Delia",
+ "last_name":"Grace"}]
+ }}
+```
+- So now I could set them all to this ID and the name would be ok, but there has to be a better way!
+- In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!
+- Better to use:
+
+```
+dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
+```
+
+- This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!
diff --git a/public/2016-12/index.html b/public/2016-12/index.html
index fb602b748..197c38057 100644
--- a/public/2016-12/index.html
+++ b/public/2016-12/index.html
@@ -30,7 +30,7 @@
-
+
@@ -350,6 +350,143 @@ UPDATE 561
In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work
I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there
Paola from CCAFS mentioned she also has the “take task” bug on CGSpace
+Reading about shared_buffers in PostgreSQL configuration (default is 128MB)
+Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres
+The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn’t dedicated (also runs Solr, which can benefit from OS cache) so let’s try 1024MB
+In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):
+
+
+$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
+Retrieving all data
+Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
+Exception: null
+java.lang.NullPointerException
+ at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
+ at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
+ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+ at java.lang.reflect.Method.invoke(Method.java:498)
+ at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
+ at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
+
+real 8m39.913s
+user 1m54.190s
+sys 0m22.647s
+
+
+2016-12-07
+
+
+- For what it’s worth, after running the same SQL updates on my local test server, index-authority runs and completes just fine
+- I will have to test more
+- Anyways, I noticed that some of the authority values I set actually have versions of author names we don’t want, i.e. “Grace, D.”
+- For example, do a Solr query for “first_name:Grace” and look at the results
+- Querying that ID shows the fields that need to be changed:
+
+
+{
+ "responseHeader": {
+ "status": 0,
+ "QTime": 1,
+ "params": {
+ "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
+ "indent": "true",
+ "wt": "json",
+ "_": "1481102189244"
+ }
+ },
+ "response": {
+ "numFound": 1,
+ "start": 0,
+ "docs": [
+ {
+ "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
+ "field": "dc_contributor_author",
+ "value": "Grace, D.",
+ "deleted": false,
+ "creation_date": "2016-11-10T15:13:40.318Z",
+ "last_modified_date": "2016-11-10T15:13:40.318Z",
+ "authority_type": "person",
+ "first_name": "D.",
+ "last_name": "Grace"
+ }
+ ]
+ }
+}
+
+
+
+- I think I can just update the value, first_name, and last_name fields…
+- The update syntax should be something like this, but I’m getting errors from Solr:
+
+
+$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
+{
+ "responseHeader":{
+ "status":400,
+ "QTime":0},
+ "error":{
+ "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
+ "code":400}}
+
+
+
+- When I try using the XML format I get an error that the updateLog needs to be configured for that core
+- Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?
+
+
+dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
+UPDATE 561
+
+
+
+- Then I’ll reindex discovery and authority and see how the authority Solr core looks
+- After this, now there are authorities for some of the “Grace, D.” and “Grace, Delia” text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):
+
+
+$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
+{
+ "responseHeader":{
+ "status":0,
+ "QTime":0,
+ "params":{
+ "q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
+ "indent":"true",
+ "wt":"json"}},
+ "response":{"numFound":1,"start":0,"docs":[
+ {
+ "id":"18ea1525-2513-430a-8817-a834cd733fbc",
+ "field":"dc_contributor_author",
+ "value":"Grace, Delia",
+ "deleted":false,
+ "creation_date":"2016-12-07T10:54:34.356Z",
+ "last_modified_date":"2016-12-07T10:54:34.356Z",
+ "authority_type":"person",
+ "first_name":"Delia",
+ "last_name":"Grace"}]
+ }}
+
+
+
+- So now I could set them all to this ID and the name would be ok, but there has to be a better way!
+- In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!
+- Better to use:
+
+
+dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
+
+
+
+- This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!
diff --git a/public/index.xml b/public/index.xml
index 08e4ac7aa..b6ed0a98b 100644
--- a/public/index.xml
+++ b/public/index.xml
@@ -254,6 +254,143 @@ UPDATE 561
<li>In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work</li>
<li>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</li>
<li>Paola from CCAFS mentioned she also has the “take task” bug on CGSpace</li>
+<li>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</li>
+<li>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</li>
+<li>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn’t dedicated (also runs Solr, which can benefit from OS cache) so let’s try 1024MB</li>
+<li>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</li>
+</ul>
+
+<pre><code>$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
+Retrieving all data
+Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
+Exception: null
+java.lang.NullPointerException
+ at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
+ at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
+ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+ at java.lang.reflect.Method.invoke(Method.java:498)
+ at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
+ at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
+
+real 8m39.913s
+user 1m54.190s
+sys 0m22.647s
+</code></pre>
+
+<h2 id="2016-12-07">2016-12-07</h2>
+
+<ul>
+<li>For what it’s worth, after running the same SQL updates on my local test server, <code>index-authority</code> runs and completes just fine</li>
+<li>I will have to test more</li>
+<li>Anyways, I noticed that some of the authority values I set actually have versions of author names we don’t want, i.e. “Grace, D.”</li>
+<li>For example, do a Solr query for “first_name:Grace” and look at the results</li>
+<li>Querying that ID shows the fields that need to be changed:</li>
+</ul>
+
+<pre><code>{
+ "responseHeader": {
+ "status": 0,
+ "QTime": 1,
+ "params": {
+ "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
+ "indent": "true",
+ "wt": "json",
+ "_": "1481102189244"
+ }
+ },
+ "response": {
+ "numFound": 1,
+ "start": 0,
+ "docs": [
+ {
+ "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
+ "field": "dc_contributor_author",
+ "value": "Grace, D.",
+ "deleted": false,
+ "creation_date": "2016-11-10T15:13:40.318Z",
+ "last_modified_date": "2016-11-10T15:13:40.318Z",
+ "authority_type": "person",
+ "first_name": "D.",
+ "last_name": "Grace"
+ }
+ ]
+ }
+}
+</code></pre>
+
+<ul>
+<li>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields…</li>
+<li>The update syntax should be something like this, but I’m getting errors from Solr:</li>
+</ul>
+
+<pre><code>$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
+{
+ "responseHeader":{
+ "status":400,
+ "QTime":0},
+ "error":{
+ "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
+ "code":400}}
+</code></pre>
+
+<ul>
+<li>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</li>
+<li>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</li>
+</ul>
+
+<pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
+UPDATE 561
+</code></pre>
+
+<ul>
+<li>Then I’ll reindex discovery and authority and see how the authority Solr core looks</li>
+<li>After this, now there are authorities for some of the “Grace, D.” and “Grace, Delia” text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</li>
+</ul>
+
+<pre><code>$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
+{
+ "responseHeader":{
+ "status":0,
+ "QTime":0,
+ "params":{
+ "q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
+ "indent":"true",
+ "wt":"json"}},
+ "response":{"numFound":1,"start":0,"docs":[
+ {
+ "id":"18ea1525-2513-430a-8817-a834cd733fbc",
+ "field":"dc_contributor_author",
+ "value":"Grace, Delia",
+ "deleted":false,
+ "creation_date":"2016-12-07T10:54:34.356Z",
+ "last_modified_date":"2016-12-07T10:54:34.356Z",
+ "authority_type":"person",
+ "first_name":"Delia",
+ "last_name":"Grace"}]
+ }}
+</code></pre>
+
+<ul>
+<li>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</li>
+<li>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</li>
+<li>Better to use:</li>
+</ul>
+
+<pre><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
+</code></pre>
+
+<ul>
+<li>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</li>
</ul>
diff --git a/public/post/index.xml b/public/post/index.xml
index 999807ae6..05670ece0 100644
--- a/public/post/index.xml
+++ b/public/post/index.xml
@@ -254,6 +254,143 @@ UPDATE 561
<li>In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work</li>
<li>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</li>
<li>Paola from CCAFS mentioned she also has the “take task” bug on CGSpace</li>
+<li>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</li>
+<li>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</li>
+<li>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn’t dedicated (also runs Solr, which can benefit from OS cache) so let’s try 1024MB</li>
+<li>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</li>
+</ul>
+
+<pre><code>$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
+Retrieving all data
+Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
+Exception: null
+java.lang.NullPointerException
+ at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
+ at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
+ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+ at java.lang.reflect.Method.invoke(Method.java:498)
+ at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
+ at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
+
+real 8m39.913s
+user 1m54.190s
+sys 0m22.647s
+</code></pre>
+
+<h2 id="2016-12-07">2016-12-07</h2>
+
+<ul>
+<li>For what it’s worth, after running the same SQL updates on my local test server, <code>index-authority</code> runs and completes just fine</li>
+<li>I will have to test more</li>
+<li>Anyways, I noticed that some of the authority values I set actually have versions of author names we don’t want, i.e. “Grace, D.”</li>
+<li>For example, do a Solr query for “first_name:Grace” and look at the results</li>
+<li>Querying that ID shows the fields that need to be changed:</li>
+</ul>
+
+<pre><code>{
+ "responseHeader": {
+ "status": 0,
+ "QTime": 1,
+ "params": {
+ "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
+ "indent": "true",
+ "wt": "json",
+ "_": "1481102189244"
+ }
+ },
+ "response": {
+ "numFound": 1,
+ "start": 0,
+ "docs": [
+ {
+ "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
+ "field": "dc_contributor_author",
+ "value": "Grace, D.",
+ "deleted": false,
+ "creation_date": "2016-11-10T15:13:40.318Z",
+ "last_modified_date": "2016-11-10T15:13:40.318Z",
+ "authority_type": "person",
+ "first_name": "D.",
+ "last_name": "Grace"
+ }
+ ]
+ }
+}
+</code></pre>
+
+<ul>
+<li>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields…</li>
+<li>The update syntax should be something like this, but I’m getting errors from Solr:</li>
+</ul>
+
+<pre><code>$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
+{
+ "responseHeader":{
+ "status":400,
+ "QTime":0},
+ "error":{
+ "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
+ "code":400}}
+</code></pre>
+
+<ul>
+<li>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</li>
+<li>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</li>
+</ul>
+
+<pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
+UPDATE 561
+</code></pre>
+
+<ul>
+<li>Then I’ll reindex discovery and authority and see how the authority Solr core looks</li>
+<li>After this, now there are authorities for some of the “Grace, D.” and “Grace, Delia” text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</li>
+</ul>
+
+<pre><code>$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
+{
+ "responseHeader":{
+ "status":0,
+ "QTime":0,
+ "params":{
+ "q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
+ "indent":"true",
+ "wt":"json"}},
+ "response":{"numFound":1,"start":0,"docs":[
+ {
+ "id":"18ea1525-2513-430a-8817-a834cd733fbc",
+ "field":"dc_contributor_author",
+ "value":"Grace, Delia",
+ "deleted":false,
+ "creation_date":"2016-12-07T10:54:34.356Z",
+ "last_modified_date":"2016-12-07T10:54:34.356Z",
+ "authority_type":"person",
+ "first_name":"Delia",
+ "last_name":"Grace"}]
+ }}
+</code></pre>
+
+<ul>
+<li>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</li>
+<li>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</li>
+<li>Better to use:</li>
+</ul>
+
+<pre><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
+</code></pre>
+
+<ul>
+<li>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</li>
</ul>
diff --git a/public/tags/notes/index.xml b/public/tags/notes/index.xml
index d7adc0975..4ac57e874 100644
--- a/public/tags/notes/index.xml
+++ b/public/tags/notes/index.xml
@@ -253,6 +253,143 @@ UPDATE 561
<li>In other news, it seems the batch edit patch is working, there are no more WARN errors in the logs and the batch edit seems to work</li>
<li>I need to check the CGSpace logs to see if there are still errors there, and then deploy/monitor it there</li>
<li>Paola from CCAFS mentioned she also has the “take task” bug on CGSpace</li>
+<li>Reading about <a href="https://www.postgresql.org/docs/9.5/static/runtime-config-resource.html"><code>shared_buffers</code> in PostgreSQL configuration</a> (default is 128MB)</li>
+<li>Looks like we have ~5GB of memory used by caches on the test server (after OS and JVM heap!), so we might as well bump up the buffers for Postgres</li>
+<li>The docs say a good starting point for a dedicated server is 25% of the system RAM, and our server isn’t dedicated (also runs Solr, which can benefit from OS cache) so let’s try 1024MB</li>
+<li>In other news, the authority reindexing keeps crashing (I was manually running it after the author updates above):</li>
+</ul>
+
+<pre><code>$ time JAVA_OPTS="-Xms768m -Xmx768m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace index-authority
+Retrieving all data
+Initialize org.dspace.authority.indexer.DSpaceAuthorityIndexer
+Exception: null
+java.lang.NullPointerException
+ at org.dspace.authority.AuthorityValueGenerator.generateRaw(AuthorityValueGenerator.java:82)
+ at org.dspace.authority.AuthorityValueGenerator.generate(AuthorityValueGenerator.java:39)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.prepareNextValue(DSpaceAuthorityIndexer.java:201)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:132)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:159)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.DSpaceAuthorityIndexer.hasMore(DSpaceAuthorityIndexer.java:144)
+ at org.dspace.authority.indexer.AuthorityIndexClient.main(AuthorityIndexClient.java:61)
+ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
+ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
+ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
+ at java.lang.reflect.Method.invoke(Method.java:498)
+ at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:226)
+ at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:78)
+
+real 8m39.913s
+user 1m54.190s
+sys 0m22.647s
+</code></pre>
+
+<h2 id="2016-12-07">2016-12-07</h2>
+
+<ul>
+<li>For what it’s worth, after running the same SQL updates on my local test server, <code>index-authority</code> runs and completes just fine</li>
+<li>I will have to test more</li>
+<li>Anyways, I noticed that some of the authority values I set actually have versions of author names we don’t want, i.e. “Grace, D.”</li>
+<li>For example, do a Solr query for “first_name:Grace” and look at the results</li>
+<li>Querying that ID shows the fields that need to be changed:</li>
+</ul>
+
+<pre><code>{
+ "responseHeader": {
+ "status": 0,
+ "QTime": 1,
+ "params": {
+ "q": "id:0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
+ "indent": "true",
+ "wt": "json",
+ "_": "1481102189244"
+ }
+ },
+ "response": {
+ "numFound": 1,
+ "start": 0,
+ "docs": [
+ {
+ "id": "0b4fcbc1-d930-4319-9b4d-ea1553cca70b",
+ "field": "dc_contributor_author",
+ "value": "Grace, D.",
+ "deleted": false,
+ "creation_date": "2016-11-10T15:13:40.318Z",
+ "last_modified_date": "2016-11-10T15:13:40.318Z",
+ "authority_type": "person",
+ "first_name": "D.",
+ "last_name": "Grace"
+ }
+ ]
+ }
+}
+</code></pre>
+
+<ul>
+<li>I think I can just update the <code>value</code>, <code>first_name</code>, and <code>last_name</code> fields…</li>
+<li>The update syntax should be something like this, but I’m getting errors from Solr:</li>
+</ul>
+
+<pre><code>$ curl 'localhost:8081/solr/authority/update?commit=true&wt=json&indent=true' -H 'Content-type:application/json' -d '[{"id":"1","price":{"set":100}}]'
+{
+ "responseHeader":{
+ "status":400,
+ "QTime":0},
+ "error":{
+ "msg":"Unexpected character '[' (code 91) in prolog; expected '<'\n at [row,col {unknown-source}]: [1,1]",
+ "code":400}}
+</code></pre>
+
+<ul>
+<li>When I try using the XML format I get an error that the <code>updateLog</code> needs to be configured for that core</li>
+<li>Maybe I can just remove the authority UUID from the records, run the indexing again so it creates a new one for each name variant, then match them correctly?</li>
+</ul>
+
+<pre><code>dspace=# update metadatavalue set authority=null, confidence=-1 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
+UPDATE 561
+</code></pre>
+
+<ul>
+<li>Then I’ll reindex discovery and authority and see how the authority Solr core looks</li>
+<li>After this, now there are authorities for some of the “Grace, D.” and “Grace, Delia” text_values in the database (the first version is actually the same authority that already exists in the core, so it was just added back to some text_values, but the second one is new):</li>
+</ul>
+
+<pre><code>$ curl 'localhost:8081/solr/authority/select?q=id%3A18ea1525-2513-430a-8817-a834cd733fbc&wt=json&indent=true'
+{
+ "responseHeader":{
+ "status":0,
+ "QTime":0,
+ "params":{
+ "q":"id:18ea1525-2513-430a-8817-a834cd733fbc",
+ "indent":"true",
+ "wt":"json"}},
+ "response":{"numFound":1,"start":0,"docs":[
+ {
+ "id":"18ea1525-2513-430a-8817-a834cd733fbc",
+ "field":"dc_contributor_author",
+ "value":"Grace, Delia",
+ "deleted":false,
+ "creation_date":"2016-12-07T10:54:34.356Z",
+ "last_modified_date":"2016-12-07T10:54:34.356Z",
+ "authority_type":"person",
+ "first_name":"Delia",
+ "last_name":"Grace"}]
+ }}
+</code></pre>
+
+<ul>
+<li>So now I could set them all to this ID and the name would be ok, but there has to be a better way!</li>
+<li>In this case it seems that since there were also two different IDs in the original database, I just picked the wrong one!</li>
+<li>Better to use:</li>
+</ul>
+
+<pre><code>dspace=# update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
+</code></pre>
+
+<ul>
+<li>This proves that unifying author name varieties in authorities is easy, but fixing the name in the authority is tricky!</li>
</ul>