Update notes for 2017-01-24

2025-01-27 05:49:12 +01:00 · 2017-01-24 12:41:58 +02:00
parent dad9c406f6
commit 54c60de7d1
5 changed files with 70 additions and 22 deletions
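
The notes added in this commit export the top 500 journal titles to /tmp/journal-titles.csv and then build a controlled vocabulary by adding the XML markup by hand. As a rough sketch only, that manual step could be scripted along the following lines. The function names, the `dc-source` root id, and the "Journal titles" label are hypothetical. The `<node>`/`<isComposedBy>` layout is an assumption about the DSpace controlled vocabulary format and should be checked against the vocabulary files shipped with the DSpace version in use.

```python
import csv
import io
import xml.etree.ElementTree as ET


def build_vocabulary(csv_text, root_id="dc-source", root_label="Journal titles"):
    """Build a nested <node> tree from 'title,count' CSV rows.

    Assumption: the vocabulary is one root <node> whose <isComposedBy>
    child holds one <node id=... label=...> per journal title.
    """
    root = ET.Element("node", id=root_id, label=root_label)
    composed = ET.SubElement(root, "isComposedBy")

    # Keep only the first column (the title) and drop blanks and duplicates.
    titles = {
        row[0].strip()
        for row in csv.reader(io.StringIO(csv_text))
        if row and row[0].strip()
    }

    # Sort case-insensitively, similar to sorting the column in OpenRefine.
    for title in sorted(titles, key=lambda t: (t.casefold(), t)):
        ET.SubElement(composed, "node", id=title, label=title)
    return root


def to_xml(root):
    """Serialize the tree to a string; ElementTree escapes &, < and quotes."""
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample rows standing in for /tmp/journal-titles.csv.
    sample = 'Nature,10\nagronomy Journal,3\n"Food, Policy",2\n'
    print(to_xml(build_vocabulary(sample)))
```

Using the `csv` module rather than splitting on commas keeps quoted titles that contain commas intact, and letting ElementTree write the attributes avoids hand-escaping ampersands in journal names.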
--- a/content/post/2017-01.md
+++ b/content/post/2017-01.md
@@ -194,8 +194,7 @@ value + "__description:" + cells["dc.type"].value
 - Test importing of the new CIAT records (actually there are 232, not 234):

 ```
-$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568
-/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &> /tmp/ciat.log
+$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &> /tmp/ciat.log
 ```

 - Many of the PDFs are 20, 30, 40, 50+ MB, which makes a total of 4GB
@@ -246,3 +245,12 @@ $ for community in 10568/171 10568/27868 10568/231 10568/27869 10568/150 10568/2
 ```
 $ ./fix-metadata-values.py -i /tmp/fix-49-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'password'
 ```
+
+- Create a new list of the top 500 journal titles from the database:
+
+```
+dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
+```
+
+- Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request ([#298](https://github.com/ilri/DSpace/pull/298))
+- This would be the last issue remaining to close the meta issue about switching to controlled vocabularies ([#69](https://github.com/ilri/DSpace/pull/69))
--- a/public/2017-01/index.html
+++ b/public/2017-01/index.html
@@ -59,7 +59,7 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
  
  "headline": "January, 2017",
  "url": "https://alanorth.github.io/cgspace-notes/2017-01/",
-  "wordCount": "1327",
+  "wordCount": "1400",
  
  
  "datePublished": "2017-01-02T10:43:00+03:00",
@@ -299,8 +299,7 @@ UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15:
 <li>Now get the top 500 journal titles:</li>
 </ul>

-<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by cou
-nt desc limit 500) to /tmp/journal-titles.csv with csv;
+<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
 </code></pre>

 <ul>
@@ -351,8 +350,7 @@ delete from collection2item where id = '91082';
 <li>Test importing of the new CIAT records (actually there are 232, not 234):</li>
 </ul>

-<pre><code>$ JAVA_OPTS=&quot;-Xmx512m -Dfile.encoding=UTF-8&quot; /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568
-/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;&gt; /tmp/ciat.log
+<pre><code>$ JAVA_OPTS=&quot;-Xmx512m -Dfile.encoding=UTF-8&quot; /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;&gt; /tmp/ciat.log
 </code></pre>

 <ul>
@@ -413,6 +411,18 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
 <pre><code>$ ./fix-metadata-values.py -i /tmp/fix-49-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'password'
 </code></pre>

+<ul>
+<li>Create a new list of the top 500 journal titles from the database:</li>
+</ul>
+
+<pre><code>dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
+</code></pre>
+
+<ul>
+<li>Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request (<a href="https://github.com/ilri/DSpace/pull/298">#298</a>)</li>
+<li>This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (<a href="https://github.com/ilri/DSpace/pull/69">#69</a>)</li>
+</ul>
+
  

  
--- a/public/index.xml
+++ b/public/index.xml
@@ -180,8 +180,7 @@ UnicodeEncodeError: &#39;ascii&#39; codec can&#39;t encode character u&#39;\xe4&
 &lt;li&gt;Now get the top 500 journal titles:&lt;/li&gt;
 &lt;/ul&gt;

-&lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by cou
-nt desc limit 500) to /tmp/journal-titles.csv with csv;
+&lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
 &lt;/code&gt;&lt;/pre&gt;

 &lt;ul&gt;
@@ -232,8 +231,7 @@ delete from collection2item where id = &#39;91082&#39;;
 &lt;li&gt;Test importing of the new CIAT records (actually there are 232, not 234):&lt;/li&gt;
 &lt;/ul&gt;

-&lt;pre&gt;&lt;code&gt;$ JAVA_OPTS=&amp;quot;-Xmx512m -Dfile.encoding=UTF-8&amp;quot; /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568
-/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;amp;&amp;gt; /tmp/ciat.log
+&lt;pre&gt;&lt;code&gt;$ JAVA_OPTS=&amp;quot;-Xmx512m -Dfile.encoding=UTF-8&amp;quot; /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;amp;&amp;gt; /tmp/ciat.log
 &lt;/code&gt;&lt;/pre&gt;

 &lt;ul&gt;
@@ -292,7 +290,19 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
 &lt;/ul&gt;

 &lt;pre&gt;&lt;code&gt;$ ./fix-metadata-values.py -i /tmp/fix-49-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p &#39;password&#39;
-&lt;/code&gt;&lt;/pre&gt;</description>
+&lt;/code&gt;&lt;/pre&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Create a new list of the top 500 journal titles from the database:&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
+&lt;/code&gt;&lt;/pre&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/298&#34;&gt;#298&lt;/a&gt;)&lt;/li&gt;
+&lt;li&gt;This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/69&#34;&gt;#69&lt;/a&gt;)&lt;/li&gt;
+&lt;/ul&gt;</description>
    </item>
    
    <item>
--- a/public/post/index.xml
+++ b/public/post/index.xml
@@ -180,8 +180,7 @@ UnicodeEncodeError: &#39;ascii&#39; codec can&#39;t encode character u&#39;\xe4&
 &lt;li&gt;Now get the top 500 journal titles:&lt;/li&gt;
 &lt;/ul&gt;

-&lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by cou
-nt desc limit 500) to /tmp/journal-titles.csv with csv;
+&lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
 &lt;/code&gt;&lt;/pre&gt;

 &lt;ul&gt;
@@ -232,8 +231,7 @@ delete from collection2item where id = &#39;91082&#39;;
 &lt;li&gt;Test importing of the new CIAT records (actually there are 232, not 234):&lt;/li&gt;
 &lt;/ul&gt;

-&lt;pre&gt;&lt;code&gt;$ JAVA_OPTS=&amp;quot;-Xmx512m -Dfile.encoding=UTF-8&amp;quot; /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568
-/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;amp;&amp;gt; /tmp/ciat.log
+&lt;pre&gt;&lt;code&gt;$ JAVA_OPTS=&amp;quot;-Xmx512m -Dfile.encoding=UTF-8&amp;quot; /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;amp;&amp;gt; /tmp/ciat.log
 &lt;/code&gt;&lt;/pre&gt;

 &lt;ul&gt;
@@ -292,7 +290,19 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
 &lt;/ul&gt;

 &lt;pre&gt;&lt;code&gt;$ ./fix-metadata-values.py -i /tmp/fix-49-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p &#39;password&#39;
-&lt;/code&gt;&lt;/pre&gt;</description>
+&lt;/code&gt;&lt;/pre&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Create a new list of the top 500 journal titles from the database:&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
+&lt;/code&gt;&lt;/pre&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/298&#34;&gt;#298&lt;/a&gt;)&lt;/li&gt;
+&lt;li&gt;This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/69&#34;&gt;#69&lt;/a&gt;)&lt;/li&gt;
+&lt;/ul&gt;</description>
    </item>
    
    <item>
--- a/public/tags/notes/index.xml
+++ b/public/tags/notes/index.xml
@@ -179,8 +179,7 @@ UnicodeEncodeError: &#39;ascii&#39; codec can&#39;t encode character u&#39;\xe4&
 &lt;li&gt;Now get the top 500 journal titles:&lt;/li&gt;
 &lt;/ul&gt;

-&lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by cou
-nt desc limit 500) to /tmp/journal-titles.csv with csv;
+&lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
 &lt;/code&gt;&lt;/pre&gt;

 &lt;ul&gt;
@@ -231,8 +230,7 @@ delete from collection2item where id = &#39;91082&#39;;
 &lt;li&gt;Test importing of the new CIAT records (actually there are 232, not 234):&lt;/li&gt;
 &lt;/ul&gt;

-&lt;pre&gt;&lt;code&gt;$ JAVA_OPTS=&amp;quot;-Xmx512m -Dfile.encoding=UTF-8&amp;quot; /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568
-/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;amp;&amp;gt; /tmp/ciat.log
+&lt;pre&gt;&lt;code&gt;$ JAVA_OPTS=&amp;quot;-Xmx512m -Dfile.encoding=UTF-8&amp;quot; /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &amp;amp;&amp;gt; /tmp/ciat.log
 &lt;/code&gt;&lt;/pre&gt;

 &lt;ul&gt;
@@ -291,7 +289,19 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
 &lt;/ul&gt;

 &lt;pre&gt;&lt;code&gt;$ ./fix-metadata-values.py -i /tmp/fix-49-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p &#39;password&#39;
-&lt;/code&gt;&lt;/pre&gt;</description>
+&lt;/code&gt;&lt;/pre&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Create a new list of the top 500 journal titles from the database:&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;pre&gt;&lt;code&gt;dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
+&lt;/code&gt;&lt;/pre&gt;
+
+&lt;ul&gt;
+&lt;li&gt;Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/298&#34;&gt;#298&lt;/a&gt;)&lt;/li&gt;
+&lt;li&gt;This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (&lt;a href=&#34;https://github.com/ilri/DSpace/pull/69&#34;&gt;#69&lt;/a&gt;)&lt;/li&gt;
+&lt;/ul&gt;</description>
    </item>
    
    <item>