+++
date = "2016-11-01T09:21:00+03:00"
author = "Alan Orth"
title = "November, 2016"
tags = ["Notes"]
+++

## 2016-11-01

- Add `dc.type` to the output options for Atmire's Listings and Reports module ([#286](https://github.com/ilri/DSpace/pull/286))

![Listings and Reports with output type](2016/11/listings-and-reports.png)

## 2016-11-02

- Migrate DSpace Test to DSpace 5.5 ([notes](https://gist.github.com/alanorth/61013895c6efe7095d7f81000953d1cf))
- Run all updates on DSpace Test and reboot the server
- Looks like the OAI bug from DSpace 5.1 that caused validation at Base Search to fail is now fixed and DSpace Test passes validation! ([#63](https://github.com/ilri/DSpace/issues/63))
- Indexing Discovery on DSpace Test took 332 minutes, which is like five times as long as it usually takes
- At the end it appeared to finish correctly but there were lots of errors right after it finished:

```
2016-11-02 15:09:48,578 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76454 to Index
2016-11-02 15:09:48,584 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Community: 10568/3202 to Index
2016-11-02 15:09:48,589 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76455 to Index
2016-11-02 15:09:48,590 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Community: 10568/51693 to Index
2016-11-02 15:09:48,590 INFO  org.dspace.discovery.IndexClient @ Done with indexing
2016-11-02 15:09:48,600 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76456 to Index
2016-11-02 15:09:48,613 INFO  org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/55536 to Index
2016-11-02 15:09:48,616 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Wrote Collection: 10568/76457 to Index
2016-11-02 15:09:48,634 ERROR com.atmire.dspace.discovery.AtmireSolrService @
java.lang.NullPointerException
    at org.dspace.discovery.SearchUtils.getDiscoveryConfiguration(SourceFile:57)
    at org.dspace.discovery.SolrServiceImpl.buildDocument(SolrServiceImpl.java:824)
    at com.atmire.dspace.discovery.AtmireSolrService.indexContent(AtmireSolrService.java:821)
    at com.atmire.dspace.discovery.AtmireSolrService.updateIndex(AtmireSolrService.java:898)
    at org.dspace.discovery.SolrServiceImpl.createIndex(SolrServiceImpl.java:370)
    at org.dspace.storage.rdbms.DatabaseUtils$ReindexerThread.run(DatabaseUtils.java:945)
```

- DSpace is still up, and a few minutes later I see the default DSpace indexer is still running
- Sure enough, looking back before the first one finished, I see output from both indexers interleaved in the log:

```
2016-11-02 15:09:28,545 INFO  org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/47242 to Index
2016-11-02 15:09:28,633 INFO  org.dspace.discovery.SolrServiceImpl @ Wrote Item: 10568/60785 to Index
2016-11-02 15:09:28,678 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (55695 of 55722): 43557
2016-11-02 15:09:28,688 INFO  com.atmire.dspace.discovery.AtmireSolrService @ Processing (55703 of 55722): 34476
```

- I will raise a ticket with Atmire to ask them

## 2016-11-06

- After re-deploying and re-indexing I didn't see the same issue, and the indexing completed in 85 minutes, which is about how long it is supposed to take

## 2016-11-07

- Horrible one-liner to get the Linode ID from certain Ansible host vars:

```
$ grep -A 3 contact_info * | grep -E "(Orth|Sisay|Peter|Daniel|Tsega)" | awk -F'-' '{print $1}' | grep linode | uniq | xargs grep linode_id
```

- I noticed some weird CRPs in the database, and they don't show up in Discovery for some reason, perhaps because of the `:`
- I'll export these and fix them in batch:

```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=230 group by text_value order by count desc) to /tmp/crp.csv with csv;
COPY 22
```

- Test running the replacements:

```
$ ./fix-metadata-values.py -i /tmp/CRPs.csv -f cg.contributor.crp -t correct -m 230 -d dspace -u dspace -p 'fuuu'
```

- Add `AMR` to ILRI subjects and remove one duplicate instance of IITA in author affiliations controlled vocabulary ([#288](https://github.com/ilri/DSpace/pull/288))

## 2016-11-08

- Atmire's Listings and Reports module seems to be broken on DSpace 5.5

![Listings and Reports broken in DSpace 5.5](2016/11/listings-and-reports-55.png)

- I've filed a ticket with Atmire
- Thinking about batch updates for ORCIDs and authors
- Playing with [SolrClient](https://github.com/moonlitesolutions/SolrClient) in Python to query Solr
- All records in the authority core are either `authority_type:orcid` or `authority_type:person`
- There is a `deleted` field and all items seem to be `false`, but it might be an important sanity check to remember
- The way to go is probably to have a CSV of author names and authority IDs, then to batch update them in PostgreSQL
- Dump of the top ~200 authors in CGSpace:

```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=3 group by text_value order by count desc limit 210) to /tmp/210-authors.csv with csv;
```

## 2016-11-09

- CGSpace crashed so I quickly ran system updates, applied one or two of the waiting changes from the `5_x-prod` branch, and rebooted the server
- The error was `Timeout waiting for idle object` but I haven't looked into the Tomcat logs to see what happened
- Also, I ran the corrections for CRPs from earlier this week

## 2016-11-10

- Helping Megan Zandstra and CIAT with some questions about the REST API
- Playing with `find-by-metadata-field`, this works:

```
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS"}'
```

- But the results are deceiving because metadata fields can have text languages and your query must match exactly!

```
dspace=# select distinct text_value, text_lang from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS';
 text_value | text_lang
------------+-----------
 SEEDS      |
 SEEDS      |
 SEEDS      | en_US
(3 rows)
```

- So basically, the text language here could be null, blank, or en_US
- To query metadata with these properties, you can do:

```
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS"}' | jq length
55
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS", "language":""}' | jq length
34
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS", "language":"en_US"}' | jq length
```

- The results (55+34=89) don't seem to match those from the database:

```
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang is null;
 count
-------
    15
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang='';
 count
-------
     4
dspace=# select count(text_value) from metadatavalue where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS' and text_lang='en_US';
 count
-------
    66
```

- So, querying from the API I get 55 + 34 = 89 results, but the database actually only has 85...
- And the `find-by-metadata-field` endpoint doesn't seem to have a way to get all items with the field, or a wildcard value
- I'll ask a question on the dspace-tech mailing list
- And speaking of `text_lang`, this is interesting:

```
dspacetest=# select distinct text_lang from metadatavalue where resource_type_id=2;
 text_lang
-----------
 ethnob
 en
 spa
 EN
 es
 frn
 en_
 en_US
 EN_US
 eng
 en_U
 fr
(14 rows)
```

- Generate a list of all these so I can maybe fix them in batch:

```
dspace=# \copy (select distinct text_lang, count(*) from metadatavalue where resource_type_id=2 group by text_lang order by count desc) to /tmp/text-langs.csv with csv;
COPY 14
```

- Perhaps we need to fix them all in batch, or experiment with fixing only certain metadatavalues:

```
dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS';
UPDATE 85
```

- The `fix-metadata-values.py` script I have is meant for specific metadata values, so if I want to update some `text_lang` values I should just do it directly in the database
- For example, on a limited set:

```
dspace=# update metadatavalue set text_lang=NULL where resource_type_id=2 and metadata_field_id=203 and text_value='LIVESTOCK' and text_lang='';
UPDATE 420
```

- And assuming I want to do it for all fields:

```
dspacetest=# update metadatavalue set text_lang=NULL where resource_type_id=2 and text_lang='';
UPDATE 183726
```

- After that I restarted Tomcat and PostgreSQL (because I'm superstitious about caches) and now I see the following in REST API queries:

```
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS"}' | jq length
71
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS", "language":""}' | jq length
0
$ curl -s -H "accept: application/json" -H "Content-Type: application/json" -X POST "http://localhost:8080/rest/items/find-by-metadata-field" -d '{"key": "cg.subject.ilri","value": "SEEDS", "language":"en_US"}' | jq length
```

- Not sure what's going on, but Discovery shows 83 values, and the database shows 85, so I'm going to reindex Discovery just in case

## 2016-11-14

- I applied Atmire's suggestions to fix Listings and Reports for DSpace 5.5 and now it works
- There were some issues with the `dspace/modules/jspui/pom.xml`, which is annoying because all I did was rebase our working 5.1 code on top of 5.5, meaning Atmire's installation procedure must have changed
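
- A rough Python sketch for re-running the `find-by-metadata-field` comparison from 2016-11-10 without retyping the curl commands. The function names (`build_payloads`, `count_items`, `summarize`) are made up for illustration, the URL is the same local one used in the curl calls, and it assumes the endpoint returns a JSON array of items, as the `jq length` calls suggest:

```python
import json
import urllib.request

# Same local REST endpoint as in the curl commands
REST_URL = "http://localhost:8080/rest/items/find-by-metadata-field"


def build_payloads(key, value, languages=(None, "", "en_US")):
    """Return one request body per language variant.

    None means the "language" key is left out entirely, as in the
    first curl call; "" and "en_US" mirror the other two calls.
    """
    payloads = []
    for language in languages:
        body = {"key": key, "value": value}
        if language is not None:
            body["language"] = language
        payloads.append(body)
    return payloads


def count_items(payload, url=REST_URL):
    """POST one payload and return the length of the JSON array that comes back."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return len(json.load(response))


def summarize(api_counts, db_counts):
    """Return (API total, database total, difference) for a quick comparison."""
    api_total = sum(api_counts)
    db_total = sum(db_counts)
    return api_total, db_total, api_total - db_total
```

- Something like `[count_items(p) for p in build_payloads("cg.subject.ilri", "SEEDS")]` would give the three API counts, and `summarize([55, 34], [15, 4, 66])` returns `(89, 85, 4)` for the numbers noted on 2016-11-10. Keep in mind that the API counts items while the SQL counts `metadatavalue` rows, so the totals may legitimately differ if an item carries the same value more than once.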