- Looks like the OAI bug from DSpace 5.1 that caused validation at Base Search to fail is now fixed and DSpace Test passes validation! ([#63](https://github.com/ilri/DSpace/issues/63))
- After re-deploying and re-indexing I didn't see the same issue, and the indexing completed in 85 minutes, which is about how long it is supposed to take
- I noticed some weird CRPs in the database, and they don't show up in Discovery for some reason, perhaps because of the `:` in their values
- I'll export these and fix them in batch:
```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=230 group by text_value order by count desc) to /tmp/crp.csv with csv;
```
- Add `AMR` to ILRI subjects and remove one duplicate instance of IITA in author affiliations controlled vocabulary ([#288](https://github.com/ilri/DSpace/pull/288))
- Thinking about batch updates for ORCIDs and authors
- Playing with [SolrClient](https://github.com/moonlitesolutions/SolrClient) in Python to query Solr
- All records in the authority core are either `authority_type:orcid` or `authority_type:person`
- There is a `deleted` field and all items seem to be `false`, but it might be an important sanity check to remember
- The way to go is probably to have a CSV of author names and authority IDs, then to batch update them in PostgreSQL
- Dump of the top ~200 authors in CGSpace:
```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=3 group by text_value order by count desc limit 210) to /tmp/210-authors.csv with csv;
dspace=# \copy (select distinct text_lang, count(*) from metadatavalue where resource_type_id=2 group by text_lang order by count desc) to /tmp/text-langs.csv with csv;
COPY 14
```
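- A minimal sketch of the CSV-driven batch update idea, assuming a CSV with `text_value` and `authority` columns (hypothetical column names) and psycopg2-style `%s` placeholders; the `confidence` value of 600 and the sample UUID are assumptions for illustration, and nothing here touches a real database:

```python
import csv
import io

AUTHOR_FIELD_ID = 3  # metadata_field_id used for authors in the dump above

# Parameterized statement; %s placeholders follow the psycopg2 convention
UPDATE_SQL = (
    "UPDATE metadatavalue SET authority = %s, confidence = %s "
    "WHERE resource_type_id = 2 AND metadata_field_id = %s AND text_value = %s"
)


def build_updates(csv_file, confidence=600):
    """Yield (sql, params) pairs, one per CSV row that has an authority ID."""
    for row in csv.DictReader(csv_file):
        name = row["text_value"].strip()
        authority_id = row["authority"].strip()
        if not name or not authority_id:
            continue  # skip authors that have not been matched to an authority yet
        yield UPDATE_SQL, (authority_id, confidence, AUTHOR_FIELD_ID, name)


# Hypothetical sample rows; the UUID is made up
sample = io.StringIO(
    "text_value,authority\n"
    '"Doe, Jane",0b4fcbc1-d930-4319-9b4d-ea1553cca70b\n'
    '"Roe, Richard",\n'
)
updates = list(build_updates(sample))
for sql, params in updates:
    print(sql, params)
```

- Each pair could then be passed to a cursor's `execute()` inside a single transaction, so the whole batch can be rolled back if the row counts look wrong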
- Perhaps we need to fix them all in batch, or experiment with fixing only certain metadatavalues:
```
dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS';
```
- The `fix-metadata.py` script I have is meant for specific metadata values, so if I want to update some `text_lang` values I should just do it directly in the database
- For example, on a limited set:
```
dspace=# update metadatavalue set text_lang=NULL where resource_type_id=2 and metadata_field_id=203 and text_value='LIVESTOCK' and text_lang='';
UPDATE 420
```
- And assuming I want to do it for all fields:
```
dspacetest=# update metadatavalue set text_lang=NULL where resource_type_id=2 and text_lang='';
UPDATE 183726
```
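- A rough sketch of the same clean-up with a dry run first, so I can see how many rows would change before committing. It uses an in-memory SQLite table as a stand-in for PostgreSQL, and the helper name `normalize_empty_text_lang` and the sample rows are made up for illustration:

```python
import sqlite3


def normalize_empty_text_lang(conn, dry_run=True):
    """Count item rows whose text_lang is '' and, unless dry_run, set them to NULL."""
    where = "resource_type_id = 2 AND text_lang = ''"
    (count,) = conn.execute(
        f"SELECT count(*) FROM metadatavalue WHERE {where}"
    ).fetchone()
    if not dry_run:
        conn.execute(f"UPDATE metadatavalue SET text_lang = NULL WHERE {where}")
        conn.commit()
    return count


# Stand-in table with only the columns the queries above rely on
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue ("
    "resource_type_id INTEGER, metadata_field_id INTEGER, "
    "text_value TEXT, text_lang TEXT)"
)
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?, ?, ?, ?)",
    [
        (2, 203, "LIVESTOCK", ""),
        (2, 203, "SEEDS", "en_US"),
        (2, 3, "Doe, Jane", ""),
        (2, 3, "Roe, Richard", None),
    ],
)

would_change = normalize_empty_text_lang(conn, dry_run=True)
changed = normalize_empty_text_lang(conn, dry_run=False)
remaining = normalize_empty_text_lang(conn, dry_run=True)
print(would_change, changed, remaining)
```

- The point is that `''` and `NULL` are different values to the database, so only the empty strings are touched and rows that already have a language such as `en_US` are left alone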
- After that I restarted Tomcat and PostgreSQL (because I'm superstitious about caches) and now I see the following in the REST API query:
- I applied Atmire's suggestions to fix Listings and Reports for DSpace 5.5 and now it works
- There were some issues with the `dspace/modules/jspui/pom.xml`, which is annoying because all I did was rebase our working 5.1 code on top of 5.5, meaning Atmire's installation procedure must have changed
- So there is apparently this Tomcat native way to limit web crawlers to one session: [Crawler Session Manager](https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve)
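- A quick way to check which user agents such a valve would treat as crawlers, sketched in Python. The pattern below is what I believe to be the documented default for the valve's `crawlerUserAgents` attribute, so substitute whatever ends up in `server.xml`; I am also assuming the valve requires the pattern to match the whole header, which `re.fullmatch` imitates. The Firefox string is just an example of a normal browser:

```python
import re

# Assumed default for crawlerUserAgents; replace with the configured value
CRAWLER_USER_AGENTS = r".*[bB]ot.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*"


def is_crawler(user_agent, pattern=CRAWLER_USER_AGENTS):
    """Return True if the whole User-Agent string matches the crawler pattern."""
    return re.fullmatch(pattern, user_agent) is not None


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0"
print(is_crawler(googlebot), is_crawler(firefox))
```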
- After adding the valve to `server.xml`, bots matching the pattern in the configuration will all use ONE session, just like normal users:
```
$ http --print h https://dspacetest.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'