- Looks like the OAI bug from DSpace 5.1 that caused validation at Base Search to fail is now fixed and DSpace Test passes validation! ([#63](https://github.com/ilri/DSpace/issues/63))
- After re-deploying and re-indexing I didn't see the same issue, and the indexing completed in 85 minutes, which is about how long it is supposed to take
- I noticed some weird CRPs in the database, and they don't show up in Discovery for some reason, perhaps the `:`
- I'll export these and fix them in batch:
```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=230 group by text_value order by count desc) to /tmp/crp.csv with csv;
```
- Add `AMR` to ILRI subjects and remove one duplicate instance of IITA in author affiliations controlled vocabulary ([#288](https://github.com/ilri/DSpace/pull/288))
- Thinking about batch updates for ORCIDs and authors
- Playing with [SolrClient](https://github.com/moonlitesolutions/SolrClient) in Python to query Solr
- All records in the authority core are either `authority_type:orcid` or `authority_type:person`
- There is a `deleted` field and all items seem to be `false`, but it might be an important sanity check to remember
- The way to go is probably to have a CSV of author names and authority IDs, then to batch update them in PostgreSQL
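A minimal sketch of that CSV-to-batch-update idea, written as a dry run that only builds parameterized statements and never touches a database. The CSV layout, the sample rows, and the authority IDs are all hypothetical, and it assumes `metadatavalue` has an `authority` column and that `metadata_field_id=3` is the author field (the field ID used for the author dump in these notes):

```python
import csv
import io

# Hypothetical CSV: one author name and one authority ID per row.
SAMPLE_CSV = '''text_value,authority
"Orth, Alan",hypothetical-authority-id-1
"Doe, Jane",hypothetical-authority-id-2
'''

# Assumption: metadatavalue has an `authority` column and field 3 is the author field.
# The %s placeholders are the style psycopg2 expects.
UPDATE_SQL = (
    "UPDATE metadatavalue SET authority=%s "
    "WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value=%s"
)


def build_updates(csv_file):
    """Return a list of (sql, params) tuples, one per CSV row."""
    updates = []
    for row in csv.DictReader(csv_file):
        name = row["text_value"].strip()
        authority = row["authority"].strip()
        if not name or not authority:
            continue  # skip incomplete rows rather than guessing
        updates.append((UPDATE_SQL, (authority, name)))
    return updates


updates = build_updates(io.StringIO(SAMPLE_CSV))
for sql, params in updates:
    print(sql, params)
```

Each tuple could then be passed to a database cursor's `execute()` inside a single transaction, so the whole batch can be rolled back if the row counts look wrong.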
- Dump of the top ~200 authors in CGSpace:
```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=3 group by text_value order by count desc limit 210) to /tmp/210-authors.csv with csv;
dspace=# \copy (select distinct text_lang, count(*) from metadatavalue where resource_type_id=2 group by text_lang order by count desc) to /tmp/text-langs.csv with csv;
COPY 14
```
- Perhaps we need to fix them all in batch, or experiment with fixing only certain metadatavalues:
```
dspace=# update metadatavalue set text_lang='en_US' where resource_type_id=2 and metadata_field_id=203 and text_value='SEEDS';
```
- The `fix-metadata.py` script I have is meant for specific metadata values, so if I want to update some `text_lang` values I should just do it directly in the database
- For example, on a limited set:
```
dspace=# update metadatavalue set text_lang=NULL where resource_type_id=2 and metadata_field_id=203 and text_value='LIVESTOCK' and text_lang='';
UPDATE 420
```
- And assuming I want to do it for all fields:
```
dspacetest=# update metadatavalue set text_lang=NULL where resource_type_id=2 and text_lang='';
UPDATE 183726
```
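A self-contained sketch of the same empty-string-to-NULL normalization, with a count per `text_lang` taken before and after as a sanity check. It runs against an in-memory SQLite table rather than PostgreSQL, the table is reduced to four columns, and the sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadatavalue ("
    "resource_type_id INTEGER, metadata_field_id INTEGER, "
    "text_value TEXT, text_lang TEXT)"
)
# Invented sample rows: a mix of empty, NULL and real language values.
conn.executemany(
    "INSERT INTO metadatavalue VALUES (?, ?, ?, ?)",
    [
        (2, 203, "LIVESTOCK", ""),
        (2, 203, "SEEDS", "en_US"),
        (2, 55, "Some Journal", ""),
        (2, 203, "LIVESTOCK", None),
        (3, 203, "LIVESTOCK", ""),  # not an item (resource_type_id != 2), must stay untouched
    ],
)


def count_by_lang(connection):
    """Count item metadata values per text_lang (NULL shows up as None)."""
    query = (
        "SELECT text_lang, count(*) FROM metadatavalue "
        "WHERE resource_type_id=2 GROUP BY text_lang"
    )
    return dict(connection.execute(query))


before = count_by_lang(conn)
cursor = conn.execute(
    "UPDATE metadatavalue SET text_lang=NULL "
    "WHERE resource_type_id=2 AND text_lang=''"
)
updated = cursor.rowcount
after = count_by_lang(conn)
print(before, updated, after)
```

Comparing the two counts confirms that only the empty strings moved into the NULL bucket and that other language values were left alone.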
- After that restarted Tomcat and PostgreSQL (because I'm superstitious about caches) and now I see the following in REST API query:
- I applied Atmire's suggestions to fix Listings and Reports for DSpace 5.5 and now it works
- There were some issues with the `dspace/modules/jspui/pom.xml`, which is annoying because all I did was rebase our working 5.1 code on top of 5.5, meaning Atmire's installation procedure must have changed
- So there is apparently this Tomcat native way to limit web crawlers to one session: [Crawler Session Manager](https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve)
- After adding that to `server.xml`, bots matching the pattern in the configuration will all use ONE session, just like normal users:
```
$ http --print h https://dspacetest.cgiar.org 'User-Agent:Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
```
- Generate a list of journal titles for Peter and Abenet to look through so we can make a controlled vocabulary out of them:
```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id=55 group by text_value order by count desc) to /tmp/journal-titles.csv with csv;
COPY 2515
```
- Send a message to users of the CGSpace REST API to notify them of upcoming upgrade so they can test their apps against DSpace Test
- Test updating old, non-HTTPS links to the CCAFS website in CGSpace metadata:
```
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, 'http://ccafs.cgiar.org','https://ccafs.cgiar.org') where resource_type_id=2 and text_value like '%http://ccafs.cgiar.org%';
UPDATE 164
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://ccafs.cgiar.org','https://ccafs.cgiar.org') where resource_type_id=2 and text_value like '%http://ccafs.cgiar.org%';
UPDATE 7
```
- Had to run it twice to get all of them (not sure about "global" regex in PostgreSQL)
- Run the updates on CGSpace as well
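The second pass is most likely explained by PostgreSQL's `regexp_replace` replacing only the first match in each value unless it is given the `g` flag as a fourth argument. A small Python analogue of the two behaviours, using `re.sub` with `count=1` to stand in for the flagless call (the sample string is made up):

```python
import re

OLD = r"http://ccafs\.cgiar\.org"
NEW = "https://ccafs.cgiar.org"


def replace_first(text):
    """Replace only the first match, like regexp_replace without the 'g' flag."""
    return re.sub(OLD, NEW, text, count=1)


def replace_all(text):
    """Replace every match, like regexp_replace with the 'g' flag."""
    return re.sub(OLD, NEW, text)


# Made-up metadata value containing two old-style links.
sample = "See http://ccafs.cgiar.org/a and http://ccafs.cgiar.org/b"

once = replace_first(sample)    # first pass: one link still uses http
twice = replace_first(once)     # second pass catches the remaining link
everything = replace_all(sample)

print(once)
print(twice == everything)
```

A value holding three or more old links would need yet another pass without the global flag, so checking for leftovers with the same `like` filter after each run is a reasonable safeguard.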
- Run through some collections and manually regenerate some PDF thumbnails for items from before 2016 on DSpace Test to compare with CGSpace
- I'm debating forcing the re-generation of ALL thumbnails, since some come from DSpace 3 and 4 when the thumbnailing wasn't as good
- The results were very good; I think that after we upgrade to 5.5 I will do it, perhaps one community / collection at a time:
```
$ [dspace]/bin/dspace filter-media -f -i 10568/67156 -p "ImageMagick PDF Thumbnail"
```
- In related news, I'm looking at thumbnails of thumbnails (the ones we uploaded manually before, and now DSpace's media filter has made thumbnails of THEM):
```
dspace=# select text_value from metadatavalue where text_value like '%.jpg.jpg';
```
- I'm not sure if there's anything we can do, actually, because we would have to remove those from the thumbnail bundles, and replace them with the regular JPGs from the content bundle, and then remove them from the assetstore...
- I had started planning the in-place PostgreSQL 9.3→9.5 upgrade but decided that I will have to `pg_dump` and `pg_restore` when I move to the new server soon anyways, so there's no need to upgrade the database right now
- Chat with Carlos about CGCore and the CGSpace metadata registry
- Dump CGSpace metadata field registry for Carlos: https://gist.github.com/alanorth/8cbd0bb2704d4bbec78025b4742f8e70
- Send some feedback to Carlos on CG Core so they can better understand how DSpace/CGSpace uses metadata
- Notes about PostgreSQL tuning from James: https://paste.fedoraproject.org/488776/14798952/
- Play with Creative Commons stuff in DSpace submission step
- It seems to work but it doesn't let you choose a version of CC (like 4.0), and we would need to customize the XMLUI item display so it doesn't display the gross CC badges
## 2016-11-24
- Bizuwork was testing DSpace Test on DSpace 5.5 and noticed that the Listings and Reports module seems to be case sensitive, whereas CGSpace's Listings and Reports isn't (i.e., a search for "orth, alan" vs "Orth, Alan" returns the same results on CGSpace, but different results on DSpace Test)