diff --git a/content/2016-09.md b/content/2016-09.md
index 477f78773..958724719 100644
--- a/content/2016-09.md
+++ b/content/2016-09.md
@@ -276,11 +276,10 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
 ```
 
 - Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc... do we have any real users?
-- Generate a list of all Affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:
+- Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:
 
 ```
-dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc)
- to /tmp/affiliations.csv with csv;
+dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
 ```
 
 - Looking into the Catalina logs again around the time of the first crash, I see:
@@ -387,3 +386,15 @@ Exception in thread "Thread-54216" org.apache.solr.client.solrj.impl.HttpSolrSer
 ```
 
 - I've sent a message to Atmire about the Solr error to see if it's related to their batch update module
+
+## 2016-09-19
+
+- Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:
+
+```
+$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
+$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
+```
+
+- After that we need to take the top ~300 and make a controlled vocabulary for it
+- I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it ([#267](https://github.com/ilri/DSpace/pull/267))
diff --git a/public/2016-09/index.html b/public/2016-09/index.html
index b2a6cfeac..28d745f25 100644
--- a/public/2016-09/index.html
+++ b/public/2016-09/index.html
@@ -395,11 +395,10 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
-dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc)
- to /tmp/affiliations.csv with csv;
+dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
@@ -519,6 +518,21 @@ Exception in thread "Thread-54216" org.apache.solr.client.solrj.impl.H
- I’ve sent a message to Atmire about the Solr error to see if it’s related to their batch update module
+
+
+2016-09-19
+
+
+- Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:
+
+
+
+$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
+$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
+
+
+
+- After that we need to take the top ~300 and make a controlled vocabulary for it
+- I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (#267)
diff --git a/public/index.xml b/public/index.xml
index e184b1c6a..22b829f2b 100644
--- a/public/index.xml
+++ b/public/index.xml
@@ -333,11 +333,10 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
<ul>
<li>Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc… do we have any real users?</li>
-<li>Generate a list of all Affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
+<li>Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
</ul>
-<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc)
- to /tmp/affiliations.csv with csv;
+<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
</code></pre>
<ul>
@@ -458,6 +457,21 @@ Exception in thread "Thread-54216" org.apache.solr.client.solr
<ul>
<li>I’ve sent a message to Atmire about the Solr error to see if it’s related to their batch update module</li>
</ul>
+
+<h2 id="2016-09-19">2016-09-19</h2>
+
+<ul>
+<li>Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:</li>
+</ul>
+
+<pre><code>$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
+$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
+</code></pre>
+
+<ul>
+<li>After that we need to take the top ~300 and make a controlled vocabulary for it</li>
+<li>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>)</li>
+</ul>
diff --git a/public/page/1/index.html b/public/page/1/index.html
index 04196b592..7d9b1d241 100644
--- a/public/page/1/index.html
+++ b/public/page/1/index.html
@@ -1 +1 @@
-https://alanorth.github.io/cgspace-notes/
\ No newline at end of file
+
\ No newline at end of file
diff --git a/public/tags/notes/index.xml b/public/tags/notes/index.xml
index 5ae983e2b..c27798642 100644
--- a/public/tags/notes/index.xml
+++ b/public/tags/notes/index.xml
@@ -333,11 +333,10 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
<ul>
<li>Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc… do we have any real users?</li>
-<li>Generate a list of all Affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
+<li>Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
</ul>
-<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc)
- to /tmp/affiliations.csv with csv;
+<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
</code></pre>
<ul>
@@ -458,6 +457,21 @@ Exception in thread "Thread-54216" org.apache.solr.client.solr
<ul>
<li>I’ve sent a message to Atmire about the Solr error to see if it’s related to their batch update module</li>
</ul>
+
+<h2 id="2016-09-19">2016-09-19</h2>
+
+<ul>
+<li>Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:</li>
+</ul>
+
+<pre><code>$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
+$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
+</code></pre>
+
+<ul>
+<li>After that we need to take the top ~300 and make a controlled vocabulary for it</li>
+<li>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>)</li>
+</ul>