- Looking at issues with author authorities on CGSpace
- For some reason we still have the `index-lucene-update` cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module
- As I was looking at the CUA config I realized our Discovery config is all messed up and confusing
- I've opened an issue to track some of that work ([#186](https://github.com/ilri/DSpace/issues/186))
- I did some major cleanup work on Discovery and XMLUI stuff related to the `dc.type` indexes ([#187](https://github.com/ilri/DSpace/pull/187))
- We had been confusing `dc.type` (a Dublin Core value) with `dc.type.output` (a value we invented) for a few years and it had permeated all aspects of our data, indexes, item displays, etc.
- There is still some more work to be done to remove references to old `outputtype` and `output`
- Fix some items that had invalid dates (I noticed them in the log during a re-indexing)
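A minimal sketch of how such dates could be screened before a re-index, assuming the metadata has been exported as (handle, date) pairs. The accepted formats, the function names, and the sample handles are illustrative assumptions, not what DSpace itself checks:

```python
from datetime import datetime

# Assumption: dates are expected as a year, year-month, full date, or UTC timestamp.
ACCEPTED_FORMATS = ("%Y", "%Y-%m", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%SZ")


def is_valid_date(value):
    """Return True if value parses with one of the accepted formats."""
    for fmt in ACCEPTED_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def find_invalid_dates(items):
    """Return the (handle, date) pairs whose date does not parse."""
    return [(handle, date) for handle, date in items if not is_valid_date(date)]


# Made-up sample data, for illustration only
sample_items = [
    ("10568/1", "2016-02-08"),
    ("10568/2", "2016-02-30"),  # February 30 does not exist
    ("10568/3", "2016"),
    ("10568/4", "2016-13"),     # month 13 does not exist
]
invalid_items = find_invalid_dates(sample_items)
```

With this sample data, `invalid_items` contains only the second and fourth pairs.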
- Reset `search.index.*` to the default, as it is only used by Lucene (deprecated by Discovery in DSpace 5.x): [#188](https://github.com/ilri/DSpace/pull/188)
- I looked at a bunch of them and they were old URLs, weird things linked from non-existent items, etc., so I just marked them all as fixed
- We also have 1,300 "soft 404" errors for URLs like: https://cgspace.cgiar.org/handle/10568/440/browse?type=bioversity
- I've marked them as fixed as well since the ones I tested were working fine
- This raises another question, as many of these pages are linked from Discovery search results and might create a duplicate content problem...
- Results pages like this give items that Google already knows from the sitemap: https://cgspace.cgiar.org/discover?filtertype=author&filter_relational_operator=equals&filter=Orth%2C+A.
- There are some access denied errors on JSPUI links (of course! we forbid them!), but I'm not sure why Google is trying to index them...
- I will mark these errors as resolved because they have been returning HTTP 403 on purpose for a long time!
- Google says the first time it saw this particular error was September 29, 2015... so maybe it accidentally saw it somehow...
- On a related note, we have 51,000 items indexed from the sitemap, but 500,000 items in the Google index, so we DEFINITELY have a problem with duplicate content
- Merge robots.txt patch and disallow indexing of browse pages as our sitemap is consumed correctly ([#198](https://github.com/ilri/DSpace/issues/198))
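A rough sketch of how browse URLs could be checked against wildcard `Disallow` rules of the kind Google understands (`*` matches any run of characters, a trailing `$` anchors the end of the URL). The patterns below are hypothetical examples rather than the contents of the actual patch, and `pattern_to_regex` and `is_disallowed` are illustrative helper names:

```python
import re

# Hypothetical rules for browse pages; the real robots.txt may differ.
DISALLOW_PATTERNS = ["/browse", "/handle/*/browse"]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where the pattern had "*"
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path, patterns=DISALLOW_PATTERNS):
    """Return True if the path (including any query string) matches a rule."""
    # re.match anchors at the start of the path, like a robots.txt prefix rule
    return any(pattern_to_regex(p).match(path) for p in patterns)


blocked = is_disallowed("/handle/10568/440/browse?type=bioversity")
allowed = not is_disallowed("/handle/10568/440")
```

Under these assumed rules the community browse page is blocked while the community page itself stays crawlable, so the items remain discoverable through the sitemap.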
- Seems we only want `AccessStep` because `UploadWithEmbargoStep` disables the ability to edit embargos at the item level
- This pull request enables the ability to set an item-level embargo during submission: https://github.com/ilri/DSpace/pull/203
- I figured out that the problem with Listings and Reports was because I disabled the `search.index.*` last week, and they are still used by JSPUI apparently
- This pull request re-enables them: https://github.com/ilri/DSpace/pull/202
- Re-deploy DSpace Test, run all system updates, and restart the server
- Looks like the Listings and Reports fix was NOT due to the search indexes (which are actually not used), but rather due to the filter configuration in the Listings and Reports config
- This pull request simply updates the config for the dc.type.output→dc.type change that was made last week: https://github.com/ilri/DSpace/pull/204
- Skype meeting with Peter and Addis team to discuss metadata changes for Dublin Core, CGcore, and CGSpace-specific fields
- We decided to proceed with some deletes first, then identify CGSpace-specific fields to clean/move to `cg.*`, and then worry about broader changes to DC
- Before we move or rename any fields we need to circulate a list of the fields we intend to change to CCAFS, CWPF, etc., who might be harvesting them
- After all of this we need to start implementing controlled vocabularies for fields, either with the JavaScript lookup or like the existing ILRI subjects