Update notes for 2017-01-11

2025-01-27 05:49:12 +01:00 · 2017-01-11 18:19:57 +02:00
parent b15d1f7892
commit a01175f5ce
5 changed files with 177 additions and 1 deletions
--- a/content/post/2017-01.md
+++ b/content/post/2017-01.md
@@ -123,3 +123,35 @@ dspace=# delete from collection2item where item_id = '80596' and id not in (9079
 ## 2017-01-11
 - Maria found another item with duplicate mappings: https://cgspace.cgiar.org/handle/10568/78658
 - Error in `fix-metadata-values.py` when it tries to print the value for Entwicklung & Ländlicher Raum:
 ```
 Traceback (most recent call last):
  File "./fix-metadata-values.py", line 80, in <module>
    print("Fixing {} occurences of: {}".format(records_to_fix, record[0]))
 UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)
 ```
 - Seems we need to encode as UTF-8 before printing to the screen, i.e.:
 ```
 print("Fixing {} occurences of: {}".format(records_to_fix, record[0].encode('utf-8')))
 ```
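A minimal Python 3 sketch of what is going on (an illustration, not code from `fix-metadata-values.py`): `ä` is U+00E4, which the ASCII codec cannot represent, and in this value it sits at index 15, matching the traceback. The `u'\xe4'` in the traceback suggests the script ran under Python 2, where printing a unicode value typically falls back to ASCII when stdout is piped or the locale is not UTF-8; calling `.encode('utf-8')` first hands the stream bytes that no longer depend on its codec. Under Python 3, formatting a bytes value prints its `b'...'` repr instead, so there the usual fix is running with `PYTHONIOENCODING=utf-8`.

```python
import io

value = "Entwicklung & Ländlicher Raum"

def try_encode(text, encoding):
    """Return (bytes, None) on success, or (None, error) if `encoding` can't represent `text`."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as err:
        # Same error class as the traceback: 'ä' has no code point below 128
        return None, err

ascii_bytes, ascii_err = try_encode(value, "ascii")
utf8_bytes, utf8_err = try_encode(value, "utf-8")

print(ascii_err)  # 'ascii' codec can't encode character '\xe4' in position 15: ordinal not in range(128)

# Writing already-encoded UTF-8 bytes to a binary stream bypasses the stream's
# default codec, which is what .encode('utf-8') achieves under Python 2
out = io.BytesIO()
out.write(utf8_bytes)
```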
 - See: http://stackoverflow.com/a/36427358/487333
 - I'm actually not sure if we need to encode() the strings to UTF-8 before writing them to the database... I've never had this issue before
 - Now back to cleaning up some journal titles so we can make the controlled vocabulary:
 ```
 $ ./fix-metadata-values.py -i /tmp/fix-27-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'fuuu'
 ```
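The internals of `fix-metadata-values.py` aren't shown here, so the following is a hypothetical sketch of the general technique only. It assumes the CSV has a column named after the field (`dc.source`, holding the current value) and one named by `-t` (`correct`, holding the replacement), and that `-m 55` restricts updates to that `metadata_field_id`. An in-memory `sqlite3` table stands in for DSpace's PostgreSQL `metadatavalue` table so it runs standalone; the journal titles are made up.

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout inferred from the -f/-t flags; titles are invented
CSV_TEXT = """dc.source,correct
Example Journal ,Example Journal
Exmple Journal,Example Journal
Example Journal,Example Journal
"""

def load_fixes(text, from_field, to_field):
    """Yield (old, new) pairs, skipping rows where nothing would change."""
    for row in csv.DictReader(io.StringIO(text)):
        old, new = row[from_field], row[to_field]
        if old != new:
            yield old, new

def apply_fixes(conn, fixes, metadata_field_id):
    """Run one parameterized UPDATE per fix and return the total rows changed."""
    changed = 0
    for old, new in fixes:
        cur = conn.execute(
            "UPDATE metadatavalue SET text_value = ? "
            "WHERE metadata_field_id = ? AND text_value = ?",
            (new, metadata_field_id, old),
        )
        changed += cur.rowcount
    conn.commit()
    return changed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadatavalue (metadata_field_id INTEGER, text_value TEXT)")
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?, ?)",
    [(55, "Example Journal "), (55, "Exmple Journal"), (55, "Example Journal"), (64, "Exmple Journal")],
)
changed = apply_fixes(conn, load_fixes(CSV_TEXT, "dc.source", "correct"), 55)
```

Using parameterized queries rather than string formatting also keeps values like `Entwicklung & Ländlicher Raum` (or titles containing quotes) from breaking the SQL.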
 - Now get the top 500 journal titles:
 ```
 dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
 ```
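psql takes the entire remainder of the line as the arguments to `\copy`, so the query has to stay on a single line; the `distinct` is also redundant alongside `group by text_value`. For checking the exported counts outside the database, here is a minimal Python sketch of the same aggregation, assuming the titles are already in a list. Ties are ordered by first appearance here, whereas SQL leaves their order unspecified.

```python
import csv
import io
from collections import Counter

def top_values(values, limit=500):
    """(value, count) pairs, most frequent first: GROUP BY ... ORDER BY count DESC LIMIT n."""
    return Counter(values).most_common(limit)

def to_csv(rows):
    """Serialize rows the way `with csv` would: no header, comma-separated."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```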
 - The values are a bit dirty and outdated, since the file I had given to Abenet and Peter was from November
 - I will have to go through these and fix some more before making the controlled vocabulary
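One hedged way to triage the dirty values before building the vocabulary is to group titles that differ only in case, punctuation, or whitespace. The normalization rules below are an assumption for illustration, not CGSpace policy, and they won't catch abbreviations such as "J." versus "Journal", which still need a manual pass.

```python
import re
import unicodedata
from collections import defaultdict

def normalize(title):
    """Hypothetical matching key: NFKC, casefold, punctuation to spaces, collapsed whitespace."""
    title = unicodedata.normalize("NFKC", title).casefold()
    title = re.sub(r"[^\w\s]", " ", title)
    return " ".join(title.split())

def variant_groups(titles):
    """Map each normalized key to its sorted spellings, keeping only keys with more than one."""
    groups = defaultdict(set)
    for t in titles:
        groups[normalize(t)].add(t)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}
```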
--- a/public/2017-01/index.html
+++ b/public/2017-01/index.html
@@ -28,7 +28,7 @@
 <meta itemprop="dateModified" content="2017-01-02T10:43:00&#43;03:00" />
-<meta itemprop="wordCount" content="614">
+<meta itemprop="wordCount" content="804">
@@ -244,6 +244,42 @@ Caused by: java.net.SocketException: Broken pipe (Write failed)
 <ul>
 <li>Maria found another item with duplicate mappings: <a href="https://cgspace.cgiar.org/handle/10568/78658">https://cgspace.cgiar.org/handle/10568/78658</a></li>
 <li>Error in <code>fix-metadata-values.py</code> when it tries to print the value for Entwicklung &amp; Ländlicher Raum:</li>
 </ul>
 <pre><code>Traceback (most recent call last):
  File &quot;./fix-metadata-values.py&quot;, line 80, in &lt;module&gt;
    print(&quot;Fixing {} occurences of: {}&quot;.format(records_to_fix, record[0]))
 UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15: ordinal not in range(128)
 </code></pre>
 <ul>
 <li>Seems we need to encode as UTF-8 before printing to the screen, i.e.:</li>
 </ul>
 <pre><code>print(&quot;Fixing {} occurences of: {}&quot;.format(records_to_fix, record[0].encode('utf-8')))
 </code></pre>
 <ul>
 <li>See: <a href="http://stackoverflow.com/a/36427358/487333">http://stackoverflow.com/a/36427358/487333</a></li>
 <li>I&rsquo;m actually not sure if we need to encode() the strings to UTF-8 before writing them to the database&hellip; I&rsquo;ve never had this issue before</li>
 <li>Now back to cleaning up some journal titles so we can make the controlled vocabulary:</li>
 </ul>
 <pre><code>$ ./fix-metadata-values.py -i /tmp/fix-27-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'fuuu'
 </code></pre>
 <ul>
 <li>Now get the top 500 journal titles:</li>
 </ul>
 <pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
 </code></pre>
 <ul>
 <li>The values are a bit dirty and outdated, since the file I had given to Abenet and Peter was from November</li>
 <li>I will have to go through these and fix some more before making the controlled vocabulary</li>
 </ul>
--- a/public/index.xml
+++ b/public/index.xml
@@ -151,6 +151,42 @@ Caused by: java.net.SocketException: Broken pipe (Write failed)
 &lt;ul&gt;
 &lt;li&gt;Maria found another item with duplicate mappings: &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/78658&#34;&gt;https://cgspace.cgiar.org/handle/10568/78658&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;Error in &lt;code&gt;fix-metadata-values.py&lt;/code&gt; when it tries to print the value for Entwicklung &amp;amp; Ländlicher Raum:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;Traceback (most recent call last):
  File &amp;quot;./fix-metadata-values.py&amp;quot;, line 80, in &amp;lt;module&amp;gt;
    print(&amp;quot;Fixing {} occurences of: {}&amp;quot;.format(records_to_fix, record[0]))
 UnicodeEncodeError: &#39;ascii&#39; codec can&#39;t encode character u&#39;\xe4&#39; in position 15: ordinal not in range(128)
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;Seems we need to encode as UTF-8 before printing to the screen, i.e.:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;print(&amp;quot;Fixing {} occurences of: {}&amp;quot;.format(records_to_fix, record[0].encode(&#39;utf-8&#39;)))
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;See: &lt;a href=&#34;http://stackoverflow.com/a/36427358/487333&#34;&gt;http://stackoverflow.com/a/36427358/487333&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;I&amp;rsquo;m actually not sure if we need to encode() the strings to UTF-8 before writing them to the database&amp;hellip; I&amp;rsquo;ve never had this issue before&lt;/li&gt;
 &lt;li&gt;Now back to cleaning up some journal titles so we can make the controlled vocabulary:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;$ ./fix-metadata-values.py -i /tmp/fix-27-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p &#39;fuuu&#39;
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;Now get the top 500 journal titles:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;The values are a bit dirty and outdated, since the file I had given to Abenet and Peter was from November&lt;/li&gt;
 &lt;li&gt;I will have to go through these and fix some more before making the controlled vocabulary&lt;/li&gt;
 &lt;/ul&gt;
 </description>
    </item>
--- a/public/post/index.xml
+++ b/public/post/index.xml
@@ -151,6 +151,42 @@ Caused by: java.net.SocketException: Broken pipe (Write failed)
 &lt;ul&gt;
 &lt;li&gt;Maria found another item with duplicate mappings: &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/78658&#34;&gt;https://cgspace.cgiar.org/handle/10568/78658&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;Error in &lt;code&gt;fix-metadata-values.py&lt;/code&gt; when it tries to print the value for Entwicklung &amp;amp; Ländlicher Raum:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;Traceback (most recent call last):
  File &amp;quot;./fix-metadata-values.py&amp;quot;, line 80, in &amp;lt;module&amp;gt;
    print(&amp;quot;Fixing {} occurences of: {}&amp;quot;.format(records_to_fix, record[0]))
 UnicodeEncodeError: &#39;ascii&#39; codec can&#39;t encode character u&#39;\xe4&#39; in position 15: ordinal not in range(128)
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;Seems we need to encode as UTF-8 before printing to the screen, i.e.:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;print(&amp;quot;Fixing {} occurences of: {}&amp;quot;.format(records_to_fix, record[0].encode(&#39;utf-8&#39;)))
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;See: &lt;a href=&#34;http://stackoverflow.com/a/36427358/487333&#34;&gt;http://stackoverflow.com/a/36427358/487333&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;I&amp;rsquo;m actually not sure if we need to encode() the strings to UTF-8 before writing them to the database&amp;hellip; I&amp;rsquo;ve never had this issue before&lt;/li&gt;
 &lt;li&gt;Now back to cleaning up some journal titles so we can make the controlled vocabulary:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;$ ./fix-metadata-values.py -i /tmp/fix-27-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p &#39;fuuu&#39;
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;Now get the top 500 journal titles:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;The values are a bit dirty and outdated, since the file I had given to Abenet and Peter was from November&lt;/li&gt;
 &lt;li&gt;I will have to go through these and fix some more before making the controlled vocabulary&lt;/li&gt;
 &lt;/ul&gt;
 </description>
    </item>
--- a/public/tags/notes/index.xml
+++ b/public/tags/notes/index.xml
@@ -150,6 +150,42 @@ Caused by: java.net.SocketException: Broken pipe (Write failed)
 &lt;ul&gt;
 &lt;li&gt;Maria found another item with duplicate mappings: &lt;a href=&#34;https://cgspace.cgiar.org/handle/10568/78658&#34;&gt;https://cgspace.cgiar.org/handle/10568/78658&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;Error in &lt;code&gt;fix-metadata-values.py&lt;/code&gt; when it tries to print the value for Entwicklung &amp;amp; Ländlicher Raum:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;Traceback (most recent call last):
  File &amp;quot;./fix-metadata-values.py&amp;quot;, line 80, in &amp;lt;module&amp;gt;
    print(&amp;quot;Fixing {} occurences of: {}&amp;quot;.format(records_to_fix, record[0]))
 UnicodeEncodeError: &#39;ascii&#39; codec can&#39;t encode character u&#39;\xe4&#39; in position 15: ordinal not in range(128)
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;Seems we need to encode as UTF-8 before printing to the screen, i.e.:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;print(&amp;quot;Fixing {} occurences of: {}&amp;quot;.format(records_to_fix, record[0].encode(&#39;utf-8&#39;)))
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;See: &lt;a href=&#34;http://stackoverflow.com/a/36427358/487333&#34;&gt;http://stackoverflow.com/a/36427358/487333&lt;/a&gt;&lt;/li&gt;
 &lt;li&gt;I&amp;rsquo;m actually not sure if we need to encode() the strings to UTF-8 before writing them to the database&amp;hellip; I&amp;rsquo;ve never had this issue before&lt;/li&gt;
 &lt;li&gt;Now back to cleaning up some journal titles so we can make the controlled vocabulary:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;$ ./fix-metadata-values.py -i /tmp/fix-27-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p &#39;fuuu&#39;
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;Now get the top 500 journal titles:&lt;/li&gt;
 &lt;/ul&gt;
 &lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
 &lt;/code&gt;&lt;/pre&gt;
 &lt;ul&gt;
 &lt;li&gt;The values are a bit dirty and outdated, since the file I had given to Abenet and Peter was from November&lt;/li&gt;
 &lt;li&gt;I will have to go through these and fix some more before making the controlled vocabulary&lt;/li&gt;
 &lt;/ul&gt;
 </description>
    </item>