Update notes for 2017-01-24

2025-01-27 05:49:12 +01:00 · 2017-01-24 12:41:58 +02:00
parent dad9c406f6
commit 54c60de7d1
5 changed files with 70 additions and 22 deletions
--- a/content/post/2017-01.md
+++ b/content/post/2017-01.md
@@ -194,8 +194,7 @@ value + "__description:" + cells["dc.type"].value
 - Test importing of the new CIAT records (actually there are 232, not 234):

 ```
-$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568
-/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &> /tmp/ciat.log
+$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" /home/dspacetest.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/79042 --source /home/aorth/CIAT_234/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &> /tmp/ciat.log
 ```

 - Many of the PDFs are 20, 30, 40, 50+ MB, which makes a total of 4GB
@@ -246,3 +245,12 @@ $ for community in 10568/171 10568/27868 10568/231 10568/27869 10568/150 10568/2
 ```
 $ ./fix-metadata-values.py -i /tmp/fix-49-journal-titles.csv -f dc.source -t correct -m 55 -d dspace -u dspace -p 'password'
 ```
+
+- Create a new list of the top 500 journal titles from the database:
+
+```
+dspace-# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc limit 500) to /tmp/journal-titles.csv with csv;
+```
+
+- Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup; see pull request ([#298](https://github.com/ilri/DSpace/pull/298))
+- This would be the last issue remaining to close the meta issue about switching to controlled vocabularies ([#69](https://github.com/ilri/DSpace/pull/69))
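
The manual XML step in the notes above could also be scripted. The following is a minimal sketch, not the method used in this commit: it assumes the exported CSV has two columns (title, count) and that the vocabulary uses nested `node` elements inside `isComposedBy`, which is the layout I believe DSpace's bundled vocabularies follow. The function names, `vocab_id`, and `vocab_label` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def titles_to_vocabulary(csv_text, vocab_id="dc-source", vocab_label="Journal titles"):
    """Build a vocabulary tree from CSV rows of (title, count).

    The node/isComposedBy element names are an assumption about the
    target vocabulary format; vocab_id and vocab_label are placeholders.
    """
    root = ET.Element("node", {"id": vocab_id, "label": vocab_label})
    composed = ET.SubElement(root, "isComposedBy")

    # Keep only the first column (the title), skip blanks, drop duplicates.
    titles = {
        row[0].strip()
        for row in csv.reader(io.StringIO(csv_text))
        if row and row[0].strip()
    }

    # Sort case-insensitively, similar to sorting the column in OpenRefine.
    for title in sorted(titles, key=str.casefold):
        ET.SubElement(composed, "node", {"id": title, "label": title})
    return root


def to_xml(root):
    """Serialize the tree; ElementTree escapes characters such as '&'."""
    return ET.tostring(root, encoding="unicode")


# Two made-up rows in the same shape as the \copy output (title, count).
sample = 'Zebra Journal,12\n"Agriculture, Ecosystems & Environment",40\n'
xml_text = to_xml(titles_to_vocabulary(sample))
print(xml_text)
```

To run it against the real export, read `/tmp/journal-titles.csv` into a string and pass it to `titles_to_vocabulary`; the result would still need checking against the vocabulary schema of the DSpace version in use.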