Add notes for 2016-12-11

Alan Orth 2016-12-11 16:07:48 +02:00
parent 4c76dfda8d
commit ddde0ad075
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
9 changed files with 225 additions and 1 deletions

@@ -439,3 +439,43 @@ dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab76
```
- The authority IDs were different now than when I was looking a few days ago so I had to adjust them here
## 2016-12-11
- After enabling a sizable `shared_buffers` for CGSpace's PostgreSQL configuration the number of connections to the database dropped significantly
![postgres_bgwriter-week](2016/12/postgres_bgwriter-week.png)
![postgres_connections_ALL-week](2016/12/postgres_connections_ALL-week.png)
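A rough way to confirm the setting and watch the connection count from `psql`, assuming access to the same database and a PostgreSQL version (9.2 or later) where `pg_stat_activity` has a `state` column. These queries are only an illustration and are not where the graphs above came from:

```sql
-- Show the shared_buffers value PostgreSQL is actually running with
show shared_buffers;

-- Count current connections, grouped by state (active, idle, ...)
select state, count(*)
  from pg_stat_activity
 group by state
 order by count(*) desc;
```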
- Looking at CIAT records from last week again, they have a lot of double authors like:
```
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
```
- Some in the same `dc.contributor.author` field, and some in others like `dc.contributor.author[en_US]` etc
- Removing the duplicates in OpenRefine and uploading a CSV to DSpace says "no changes detected"
- Seems like the only way to sort of clean these up would be to start in SQL:
```
dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Center for Tropical Agriculture';
text_value | authority | confidence
-----------------------------------------------+--------------------------------------+------------
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | -1
International Center for Tropical Agriculture | | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 500
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 600
International Center for Tropical Agriculture | | -1
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 500
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | -1
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 0
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
UPDATE 1693
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', text_value='International Center for Tropical Agriculture', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%CIAT%';
UPDATE 35
```
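A more cautious variant of the session above would be to run it inside a transaction and look at the result before committing. This is only a sketch: it reuses the table, columns, and values from the queries above, while the `resource_id` column in the last query is an assumption about the schema that does not appear in the original session. Note that these updates only align `authority` and `confidence`; they would not by themselves remove repeated rows on the same item.

```sql
begin;

-- How many rows carry each authority/confidence pair before the change
select authority, confidence, count(*)
  from metadatavalue
 where resource_type_id=2 and metadata_field_id=3
   and text_value = 'International Center for Tropical Agriculture'
 group by authority, confidence
 order by count(*) desc;

-- Preview which values the broad '%CIAT%' pattern would overwrite
select distinct text_value
  from metadatavalue
 where resource_type_id=2 and metadata_field_id=3
   and text_value like '%CIAT%';

-- Same update as above, now reversible until the transaction ends
update metadatavalue
   set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600
 where resource_type_id=2 and metadata_field_id=3
   and text_value = 'International Center for Tropical Agriculture';

-- Items that still list the name more than once (assumes a resource_id column)
select resource_id, count(*)
  from metadatavalue
 where resource_type_id=2 and metadata_field_id=3
   and text_value = 'International Center for Tropical Agriculture'
 group by resource_id
having count(*) > 1;

-- Replace with "commit;" once the output looks right
rollback;
```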
- Work on article for KM4Dev journal

@@ -30,7 +30,7 @@
<meta itemprop="dateModified" content="2016-12-02T10:43:00&#43;03:00" />
<meta itemprop="wordCount" content="2376"> <meta itemprop="wordCount" content="2622">
@@ -579,6 +579,52 @@ dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab76
<li>The authority IDs were different now than when I was looking a few days ago so I had to adjust them here</li>
</ul>
<h2 id="2016-12-11">2016-12-11</h2>
<ul>
<li>After enabling a sizable <code>shared_buffers</code> for CGSpace&rsquo;s PostgreSQL configuration the number of connections to the database dropped significantly</li>
</ul>
<p><img src="2016/12/postgres_bgwriter-week.png" alt="postgres_bgwriter-week" />
<img src="2016/12/postgres_connections_ALL-week.png" alt="postgres_connections_ALL-week" /></p>
<ul>
<li>Looking at CIAT records from last week again, they have a lot of double authors like:</li>
</ul>
<pre><code>International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
</code></pre>
<ul>
<li>Some in the same <code>dc.contributor.author</code> field, and some in others like <code>dc.contributor.author[en_US]</code> etc</li>
<li>Removing the duplicates in OpenRefine and uploading a CSV to DSpace says &ldquo;no changes detected&rdquo;</li>
<li>Seems like the only way to sort of clean these up would be to start in SQL:</li>
</ul>
<pre><code>dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like 'International Center for Tropical Agriculture';
text_value | authority | confidence
-----------------------------------------------+--------------------------------------+------------
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | -1
International Center for Tropical Agriculture | | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 500
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 600
International Center for Tropical Agriculture | | -1
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 500
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | -1
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 0
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = 'International Center for Tropical Agriculture';
UPDATE 1693
dspace=# update metadatavalue set authority='3026b1de-9302-4f3e-85ab-ef48da024eb2', text_value='International Center for Tropical Agriculture', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like '%CIAT%';
UPDATE 35
</code></pre>
<ul>
<li>Work on article for KM4Dev journal</li>
</ul>

Binary file not shown. Size: 38 KiB

Binary file not shown. Size: 26 KiB

@@ -482,6 +482,52 @@ dspace=# update metadatavalue set authority=&#39;2df8136e-d8f4-4142-b58c-562337c
&lt;ul&gt;
&lt;li&gt;The authority IDs were different now than when I was looking a few days ago so I had to adjust them here&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-12-11&#34;&gt;2016-12-11&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;After enabling a sizable &lt;code&gt;shared_buffers&lt;/code&gt; for CGSpace&amp;rsquo;s PostgreSQL configuration the number of connections to the database dropped significantly&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;2016/12/postgres_bgwriter-week.png&#34; alt=&#34;postgres_bgwriter-week&#34; /&gt;
&lt;img src=&#34;2016/12/postgres_connections_ALL-week.png&#34; alt=&#34;postgres_connections_ALL-week&#34; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Looking at CIAT records from last week again, they have a lot of double authors like:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Some in the same &lt;code&gt;dc.contributor.author&lt;/code&gt; field, and some in others like &lt;code&gt;dc.contributor.author[en_US]&lt;/code&gt; etc&lt;/li&gt;
&lt;li&gt;Removing the duplicates in OpenRefine and uploading a CSV to DSpace says &amp;ldquo;no changes detected&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Seems like the only way to sort of clean these up would be to start in SQL:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like &#39;International Center for Tropical Agriculture&#39;;
text_value | authority | confidence
-----------------------------------------------+--------------------------------------+------------
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | -1
International Center for Tropical Agriculture | | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 500
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 600
International Center for Tropical Agriculture | | -1
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 500
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | -1
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 0
dspace=# update metadatavalue set authority=&#39;3026b1de-9302-4f3e-85ab-ef48da024eb2&#39;, confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = &#39;International Center for Tropical Agriculture&#39;;
UPDATE 1693
dspace=# update metadatavalue set authority=&#39;3026b1de-9302-4f3e-85ab-ef48da024eb2&#39;, text_value=&#39;International Center for Tropical Agriculture&#39;, confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like &#39;%CIAT%&#39;;
UPDATE 35
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Work on article for KM4Dev journal&lt;/li&gt;
&lt;/ul&gt;
</description>
</item>

@@ -482,6 +482,52 @@ dspace=# update metadatavalue set authority=&#39;2df8136e-d8f4-4142-b58c-562337c
&lt;ul&gt;
&lt;li&gt;The authority IDs were different now than when I was looking a few days ago so I had to adjust them here&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-12-11&#34;&gt;2016-12-11&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;After enabling a sizable &lt;code&gt;shared_buffers&lt;/code&gt; for CGSpace&amp;rsquo;s PostgreSQL configuration the number of connections to the database dropped significantly&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;2016/12/postgres_bgwriter-week.png&#34; alt=&#34;postgres_bgwriter-week&#34; /&gt;
&lt;img src=&#34;2016/12/postgres_connections_ALL-week.png&#34; alt=&#34;postgres_connections_ALL-week&#34; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Looking at CIAT records from last week again, they have a lot of double authors like:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Some in the same &lt;code&gt;dc.contributor.author&lt;/code&gt; field, and some in others like &lt;code&gt;dc.contributor.author[en_US]&lt;/code&gt; etc&lt;/li&gt;
&lt;li&gt;Removing the duplicates in OpenRefine and uploading a CSV to DSpace says &amp;ldquo;no changes detected&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Seems like the only way to sort of clean these up would be to start in SQL:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like &#39;International Center for Tropical Agriculture&#39;;
text_value | authority | confidence
-----------------------------------------------+--------------------------------------+------------
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | -1
International Center for Tropical Agriculture | | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 500
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 600
International Center for Tropical Agriculture | | -1
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 500
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | -1
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 0
dspace=# update metadatavalue set authority=&#39;3026b1de-9302-4f3e-85ab-ef48da024eb2&#39;, confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = &#39;International Center for Tropical Agriculture&#39;;
UPDATE 1693
dspace=# update metadatavalue set authority=&#39;3026b1de-9302-4f3e-85ab-ef48da024eb2&#39;, text_value=&#39;International Center for Tropical Agriculture&#39;, confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like &#39;%CIAT%&#39;;
UPDATE 35
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Work on article for KM4Dev journal&lt;/li&gt;
&lt;/ul&gt;
</description>
</item>

@@ -481,6 +481,52 @@ dspace=# update metadatavalue set authority=&#39;2df8136e-d8f4-4142-b58c-562337c
&lt;ul&gt;
&lt;li&gt;The authority IDs were different now than when I was looking a few days ago so I had to adjust them here&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;2016-12-11&#34;&gt;2016-12-11&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;After enabling a sizable &lt;code&gt;shared_buffers&lt;/code&gt; for CGSpace&amp;rsquo;s PostgreSQL configuration the number of connections to the database dropped significantly&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;2016/12/postgres_bgwriter-week.png&#34; alt=&#34;postgres_bgwriter-week&#34; /&gt;
&lt;img src=&#34;2016/12/postgres_connections_ALL-week.png&#34; alt=&#34;postgres_connections_ALL-week&#34; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Looking at CIAT records from last week again, they have a lot of double authors like:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::600
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::500
International Center for Tropical Agriculture::3026b1de-9302-4f3e-85ab-ef48da024eb2::0
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Some in the same &lt;code&gt;dc.contributor.author&lt;/code&gt; field, and some in others like &lt;code&gt;dc.contributor.author[en_US]&lt;/code&gt; etc&lt;/li&gt;
&lt;li&gt;Removing the duplicates in OpenRefine and uploading a CSV to DSpace says &amp;ldquo;no changes detected&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Seems like the only way to sort of clean these up would be to start in SQL:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;dspace=# select distinct text_value, authority, confidence from metadatavalue where resource_type_id=2 and metadata_field_id=3 and text_value like &#39;International Center for Tropical Agriculture&#39;;
text_value | authority | confidence
-----------------------------------------------+--------------------------------------+------------
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | -1
International Center for Tropical Agriculture | | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 500
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 600
International Center for Tropical Agriculture | | -1
International Center for Tropical Agriculture | cc726b78-a2f4-4ee9-af98-855c2ea31c36 | 500
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 600
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | -1
International Center for Tropical Agriculture | 3026b1de-9302-4f3e-85ab-ef48da024eb2 | 0
dspace=# update metadatavalue set authority=&#39;3026b1de-9302-4f3e-85ab-ef48da024eb2&#39;, confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value = &#39;International Center for Tropical Agriculture&#39;;
UPDATE 1693
dspace=# update metadatavalue set authority=&#39;3026b1de-9302-4f3e-85ab-ef48da024eb2&#39;, text_value=&#39;International Center for Tropical Agriculture&#39;, confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like &#39;%CIAT%&#39;;
UPDATE 35
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Work on article for KM4Dev journal&lt;/li&gt;
&lt;/ul&gt;
</description>
</item>

Binary file not shown. Size: 38 KiB

Binary file not shown. Size: 26 KiB