- Looking at issues with author authorities on CGSpace
- For some reason we still have the `index-lucene-update` cron job active on CGSpace, but I'm pretty sure we don't need it as of the last few versions of Atmire's Listings and Reports module
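- If that's the case, disabling it should just be a matter of commenting out the entry in the DSpace user's crontab, something like this (the path and schedule here are hypothetical):

```
# Lucene index updates are redundant now that Discovery handles indexing
#0 2 * * * /home/dspace/bin/dspace index-lucene-update
```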
- As I was looking at the CUA config I realized our Discovery config is all messed up and confusing
- I've opened an issue to track some of that work ([#186](https://github.com/ilri/DSpace/issues/186))
- I did some major cleanup work on Discovery and XMLUI stuff related to the `dc.type` indexes ([#187](https://github.com/ilri/DSpace/pull/187))
- We had been confusing `dc.type` (a Dublin Core value) with `dc.type.output` (a value we invented) for a few years and it had permeated all aspects of our data, indexes, item displays, etc.
- There is still some more work to be done to remove references to the old `outputtype` and `output` values
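- For context, Discovery search filters are defined as beans in `discovery.xml`, so the cleanup is mostly a matter of pointing them at `dc.type` instead of `dc.type.output`; a rough sketch of what such a bean looks like (the bean id is illustrative):

```xml
<bean id="searchFilterType"
      class="org.dspace.discovery.configuration.DiscoverySearchFilterFacet">
    <property name="indexFieldName" value="type"/>
    <property name="metadataFields">
        <list>
            <!-- was dc.type.output, the field we invented -->
            <value>dc.type</value>
        </list>
    </property>
</bean>
```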
- Fix some items that had invalid dates (I noticed them in the log during a re-indexing)
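- One way to find these in PostgreSQL is to look for `dc.date.issued` values that aren't valid ISO 8601 dates; a sketch assuming the stock DSpace 5 schema (run via `psql`):

```sql
-- Find dc.date.issued values on items that aren't YYYY, YYYY-MM, or YYYY-MM-DD
SELECT resource_id, text_value FROM metadatavalue
WHERE metadata_field_id IN (SELECT metadata_field_id FROM metadatafieldregistry
                            WHERE element = 'date' AND qualifier = 'issued')
  AND resource_type_id = 2
  AND text_value !~ '^[0-9]{4}(-[0-9]{2}(-[0-9]{2})?)?$';
```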
- Reset `search.index.*` to the defaults, as these settings are only used by the legacy Lucene search, which was deprecated in favor of Discovery in DSpace 5.x: [#188](https://github.com/ilri/DSpace/pull/188)
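- If I remember right, the stock `dspace.cfg` mappings look something like this (quoting from memory, so the exact defaults may differ):

```
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator.*
search.index.3 = title:dc.title.*
search.index.4 = keyword:dc.subject.*
search.index.5 = abstract:dc.description.abstract
```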
- I looked at a bunch of the crawl errors reported in Google Webmaster Tools and they were old URLs, weird things linked from non-existent items, etc, so I just marked them all as fixed
- We also have 1,300 "soft 404" errors for URLs like: https://cgspace.cgiar.org/handle/10568/440/browse?type=bioversity
- I've marked them as fixed as well since the ones I tested were working fine
- This raises another question, as many of these pages are linked from Discovery search results and might create a duplicate content problem...
- Results pages like this one return items that Google already knows about from the sitemap: https://cgspace.cgiar.org/discover?filtertype=author&filter_relational_operator=equals&filter=Orth%2C+A.
- There are some access denied errors on JSPUI links (of course! we forbid them!), but I'm not sure why Google is trying to index them...
- I will mark these errors as resolved because we have been returning HTTP 403 on those URLs on purpose for a long time!
- Google says the first time it saw this particular error was September 29, 2015... so maybe it accidentally saw it somehow...
- On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content
- Turns out this is a known problem with DSpace's `robots.txt`, and there has been a Jira ticket open since December, 2015: https://jira.duraspace.org/browse/DS-2962
- I am not sure if I want to apply the fix from that ticket yet
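- The gist of the fix is to stop crawlers from wandering into the dynamic pages while leaving the sitemap-listed item pages alone, roughly along these lines:

```
User-agent: *
# Dynamic Discovery pages generate near-infinite duplicate content;
# item pages are already advertised via the sitemap
Disallow: /discover
Disallow: /search-filter
```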
- For now I've just set a bunch of these dynamic pages to not appear in search results by using the URL Parameters tool in Webmaster Tools
![URL parameters cause millions of dynamic pages](../images/2016/03/url-parameters.png)
![Setting pages with the filter_0 param not to show in search results](../images/2016/03/url-parameters2.png)
- Move the AVCD collection to a new community and update the `move_collection.sh` script: https://gist.github.com/alanorth/392c4660e8b022d99dfa
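- I haven't reproduced the script here, but moving a collection in DSpace 5 essentially boils down to rewriting the parent mapping in PostgreSQL and then re-indexing; a simplified sketch with hypothetical IDs:

```
# Move collection 1234 from community 56 to community 78 (IDs are hypothetical)
psql dspace -c "UPDATE community2collection SET community_id=78 \
                WHERE collection_id=1234 AND community_id=56;"
# Re-index so the move shows up in Discovery
/home/dspace/bin/dspace index-discovery
```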
- It seems Feedburner can do HTTPS now, so we might be able to update our feeds and simplify the nginx configs
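- If so, the special-case feed handling could collapse into a single HTTPS redirect; a purely hypothetical sketch (the feed path and Feedburner name are made up):

```nginx
# Feedburner supports HTTPS now, so feeds no longer need to stay on plain HTTP
location = /feed {
    return 301 https://feeds.feedburner.com/example-feed;
}
```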
- Re-deploy CGSpace with the latest `5_x-prod` branch
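- The deployment process is roughly the following (paths, build flags, and service name from memory, so treat this as a sketch):

```console
$ cd ~/src/git/DSpace
$ git pull && git checkout 5_x-prod
$ mvn -U -Dmirage2.on=true clean package
$ cd dspace/target/dspace-installer && ant update
$ sudo service tomcat7 restart
```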
- Run updates on CGSpace and reboot server (new kernel, `4.5.0`)