diff --git a/content/2016-09.md b/content/2016-09.md
index 477f78773..958724719 100644
--- a/content/2016-09.md
+++ b/content/2016-09.md
@@ -276,11 +276,10 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
 ```
 
 - Looking at the top 20 IPs or so, most are Yahoo, MSN, Google, Baidu, TurnitIn (iParadigm), etc... do we have any real users?
-- Generate a list of all Affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:
+- Generate a list of all author affiliations for Peter Ballantyne to go through, make corrections, and create a lookup list from:
 
 ```
-dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc)
- to /tmp/affiliations.csv with csv;
+dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
 ```
 
 - Looking into the Catalina logs again around the time of the first crash, I see:
@@ -387,3 +386,15 @@ Exception in thread "Thread-54216" org.apache.solr.client.solrj.impl.HttpSolrSer
 ```
 
 - I've sent a message to Atmire about the Solr error to see if it's related to their batch update module
+
+## 2016-09-19
+
+- Work on cleanups for author affiliations after Peter sent me his list of corrections/deletions:
+
+```
+$ ./fix-metadata-values.py -i affiliations_pb-322-corrections.csv -f cg.contributor.affiliation -t correct -m 211 -d dspace -u dspace -p fuuu
+$ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2-deletions.csv -m 211 -u dspace -d dspace -p fuuu
+```
+
+- After that we need to take the top ~300 and make a controlled vocabulary for it
+- I dumped a list of the top 300 affiliations from the database, sorted it alphabetically in OpenRefine, and created a controlled vocabulary for it ([#267](https://github.com/ilri/DSpace/pull/267))
diff --git a/public/2016-09/index.html b/public/2016-09/index.html
index b2a6cfeac..28d745f25 100644
--- a/public/2016-09/index.html
+++ b/public/2016-09/index.html
@@ -395,11 +395,10 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
-dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc)
- to /tmp/affiliations.csv with csv;
+dspacetest=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=211 group by text_value order by count desc) to /tmp/affiliations.csv with csv;