From 6fc8031da47299a5790e294302521ee02a3fdee4 Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Mon, 19 Sep 2016 17:52:47 +0300
Subject: [PATCH] Update notes for 2016-09-19

---
 content/2016-09.md          | 17 ++++++++++++++---
 public/2016-09/index.html   | 20 +++++++++++++++++---
 public/index.xml            | 20 +++++++++++++++++---
 public/page/1/index.html    |  2 +-
 public/tags/notes/index.xml | 20 +++++++++++++++++---
 5 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/content/2016-09.md b/content/2016-09.md
index 477f78773..958724719 100644
--- a/content/2016-09.md
+++ b/content/2016-09.md
@@ -276,11 +276,10 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
 ```
 
 - Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc... do we have any real users?
-- Generate a list of all Affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:
+- Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:
 
 ```
-dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc)
-    to /tmp/affiliations.csv with csv;
+dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
 ```
 
 - Looking into the Catalina logs again around the time of the first crash, I see:
@@ -387,3 +386,15 @@ Exception in thread "Thread-54216" org.apache.solr.client.solrj.impl.HttpSolrSer
 ```
 
 - I've sent a message to Atmire about the Solr error to see if it's related to their batch update module
+
+## 2016-09-19
+
+- Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:
+
+```
+$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
+$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
+```
+
+- After that we need to take the top ~300 and make a controlled vocabulary for it
+- I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it ([#267](https://github.com/ilri/DSpace/pull/267))
diff --git a/public/2016-09/index.html b/public/2016-09/index.html
index b2a6cfeac..28d745f25 100644
--- a/public/2016-09/index.html
+++ b/public/2016-09/index.html
@@ -395,11 +395,10 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
 
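 <ul>
 <li>Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc&hellip; do we have any real users?</li>
-<li>Generate a list of all Affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
+<li>Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
 </ul>
 
-<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc)
-    to /tmp/affiliations.csv with csv;
+<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
 </code></pre>
 
 <ul>
@@ -519,6 +518,21 @@ Exception in thread &quot;Thread-54216&quot; org.apache.solr.client.solrj.impl.H
 <ul>
 <li>I&rsquo;ve sent a message to Atmire about the Solr error to see if it&rsquo;s related to their batch update module</li>
 </ul>
+
+<h2 id="2016-09-19">2016-09-19</h2>
+
+<ul>
+<li>Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:</li>
+</ul>
+
+<pre><code>$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
+$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
+</code></pre>
+
+<ul>
+<li>After that we need to take the top ~300 and make a controlled vocabulary for it</li>
+<li>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>)</li>
+</ul>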
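The two commands in the 2016-09-19 notes rewrite and delete values of field 211 (`cg.contributor.affiliation`) using Peter's CSV files. The block below is a minimal sketch of that idea under stated assumptions, not the real `fix-metadata-values.py` or `delete-metadata-values.py`. It assumes the corrections CSV has a header row with one column named after the field (holding the current value) and a `correct` column (holding the replacement), which is what the `-f` and `-t` flags suggest. It also uses an in-memory `sqlite3` table as a stand-in for the PostgreSQL `metadatavalue` table. The helpers `apply_corrections` and `delete_values` are hypothetical names.

```python
"""Hypothetical sketch of the affiliation cleanup step.

sqlite3 stands in for PostgreSQL here; this is NOT the real
fix-metadata-values.py / delete-metadata-values.py.
"""
import csv
import io
import sqlite3


def apply_corrections(conn, csv_text, from_col, to_col, field_id):
    """Replace each value in column `from_col` with the value in `to_col`.

    Rows whose correction is empty or identical to the original are skipped.
    Returns the number of metadatavalue rows updated.
    """
    updated = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        old = row[from_col]
        new = (row.get(to_col) or "").strip()
        if not new or new == old:
            continue  # nothing to change for this value
        cur = conn.execute(
            "UPDATE metadatavalue SET text_value = ? "
            "WHERE metadata_field_id = ? AND text_value = ?",
            (new, field_id, old),
        )
        updated += cur.rowcount
    return updated


def delete_values(conn, csv_text, from_col, field_id):
    """Delete every value listed in column `from_col`; returns rows deleted."""
    deleted = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        cur = conn.execute(
            "DELETE FROM metadatavalue "
            "WHERE metadata_field_id = ? AND text_value = ?",
            (field_id, row[from_col]),
        )
        deleted += cur.rowcount
    return deleted


if __name__ == "__main__":
    # Toy table with made-up values; the real table has many more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadatavalue (metadata_field_id INTEGER, text_value TEXT)")
    conn.executemany(
        "INSERT INTO metadatavalue VALUES (?, ?)",
        [(211, "Intl Livestock Research Inst"), (211, "N/A"), (64, "N/A")],
    )
    field = "cg.contributor.affiliation"
    corrections = f"{field},correct\nIntl Livestock Research Inst,International Livestock Research Institute\n"
    deletions = f"{field}\nN/A\n"
    print("updated:", apply_corrections(conn, corrections, field, "correct", 211))
    print("deleted:", delete_values(conn, deletions, field, 211))
    print(conn.execute("SELECT * FROM metadatavalue ORDER BY metadata_field_id").fetchall())
```

Matching on both `metadata_field_id` and the exact `text_value` keeps each change scoped to field 211, so the same string stored in another field is left alone. The toy schema omits columns of the real table, such as the `resource_type_id` used in the `\copy` query.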
diff --git a/public/index.xml b/public/index.xml
index e184b1c6a..22b829f2b 100644
--- a/public/index.xml
+++ b/public/index.xml
@@ -333,11 +333,10 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
 
 <ul>
 <li>Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc&hellip; do we have any real users?</li>
-<li>Generate a list of all Affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
+<li>Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
 </ul>
 
-<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc)
-    to /tmp/affiliations.csv with csv;
+<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
 </code></pre>
 
 <ul>
@@ -458,6 +457,21 @@ Exception in thread &quot;Thread-54216&quot; org.apache.solr.client.solr
 <ul>
 <li>I&rsquo;ve sent a message to Atmire about the Solr error to see if it&rsquo;s related to their batch update module</li>
 </ul>
+
+<h2 id="2016-09-19">2016-09-19</h2>
+
+<ul>
+<li>Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:</li>
+</ul>
+
+<pre><code>$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
+$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
+</code></pre>
+
+<ul>
+<li>After that we need to take the top ~300 and make a controlled vocabulary for it</li>
+<li>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>)</li>
+</ul>
diff --git a/public/page/1/index.html b/public/page/1/index.html
index 04196b592..7d9b1d241 100644
--- a/public/page/1/index.html
+++ b/public/page/1/index.html
@@ -1 +1 @@
-https://alanorth.github.io/cgspace-notes/
\ No newline at end of file
+
\ No newline at end of file
diff --git a/public/tags/notes/index.xml b/public/tags/notes/index.xml
index 5ae983e2b..c27798642 100644
--- a/public/tags/notes/index.xml
+++ b/public/tags/notes/index.xml
@@ -333,11 +333,10 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
 
 <ul>
 <li>Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc&hellip; do we have any real users?</li>
-<li>Generate a list of all Affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
+<li>Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:</li>
 </ul>
 
-<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc)
-    to /tmp/affiliations.csv with csv;
+<pre><code>dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
 </code></pre>
 
 <ul>
@@ -458,6 +457,21 @@ Exception in thread &quot;Thread-54216&quot; org.apache.solr.client.solr
 <ul>
 <li>I&rsquo;ve sent a message to Atmire about the Solr error to see if it&rsquo;s related to their batch update module</li>
 </ul>
+
+<h2 id="2016-09-19">2016-09-19</h2>
+
+<ul>
+<li>Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:</li>
+</ul>
+
+<pre><code>$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
+$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
+</code></pre>
+
+<ul>
+<li>After that we need to take the top ~300 and make a controlled vocabulary for it</li>
+<li>I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it (<a href="https://github.com/ilri/DSpace/pull/267">#267</a>)</li>
+</ul>
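The final note describes dumping the affiliations, keeping the top 300, and sorting them alphabetically before building a controlled vocabulary (#267). As an alternative to doing the sort in OpenRefine, here is a hedged Python sketch of that selection. It assumes the input is the headerless `text_value,count` CSV written by the `\copy ... with csv` command. The XML layout is also an assumption, modelled loosely on the `<node>`/`<isComposedBy>` structure of DSpace's hierarchical controlled-vocabulary files rather than on whatever #267 actually contains. `top_values`, `to_vocabulary_xml`, and the root node id and label are hypothetical.

```python
"""Hypothetical sketch: pick the most frequent affiliations and emit a flat
vocabulary file. Not the code used for #267; the XML layout is an assumption."""
import csv
import io
from xml.sax.saxutils import quoteattr


def top_values(csv_text, n=300):
    """Return the `n` most frequent values, re-sorted alphabetically.

    Expects headerless `text_value,count` rows, like the output of
    `\\copy (...) to /tmp/affiliations.csv with csv`.
    """
    rows = [(value, int(count)) for value, count in csv.reader(io.StringIO(csv_text))]
    # Most frequent first; ties broken case-insensitively so the cut-off is stable.
    rows.sort(key=lambda r: (-r[1], r[0].casefold()))
    # Case-insensitive alphabetical order for the final list (an assumption).
    return sorted((value for value, _ in rows[:n]), key=str.casefold)


def to_vocabulary_xml(values, root_id="affiliations", root_label="Affiliations"):
    """Render values as child <node> elements under a single (hypothetical) root node."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f"<node id={quoteattr(root_id)} label={quoteattr(root_label)}>",
        "  <isComposedBy>",
    ]
    for value in values:
        attr = quoteattr(value)  # escapes &, <, > and adds the surrounding quotes
        lines.append(f"    <node id={attr} label={attr}/>")
    lines += ["  </isComposedBy>", "</node>"]
    return "\n".join(lines)
```

For example, `print(to_vocabulary_xml(top_values(open("/tmp/affiliations.csv", encoding="utf-8").read())))` would print a vocabulary for the 300 most frequent values in that dump. Ranking by count before the alphabetical sort is what makes the cut-off meaningful; sorting alphabetically first and then truncating would keep the first 300 names instead of the most common ones.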