- Try to do some metadata field migrations using the Atmire batch UI (`dc.Species` → `cg.species`) but it took several hours and even missed a few records
## 2016-04-06
- A better way to move metadata on this scale is via SQL, for example `dc.type.output` → `dc.type` (their IDs in the metadatafieldregistry are 66 and 109, respectively):
```
dspacetest=# update metadatavalue set metadata_field_id=109 where metadata_field_id=66;
- Looking at the DOI issue [reported by Leroy from CIAT a few weeks ago](https://www.yammer.com/dspacedevelopers/#/Threads/show?threadId=678507860)
- It seems the `dx.doi.org` URLs are much more proper in our repository!
```
dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://dx.doi.org%';
count
-------
5638
(1 row)
dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and text_value like 'http://doi.org%';
count
-------
3
```
- I will manually edit the `dc.identifier.doi` in [10568/72509](https://cgspace.cgiar.org/handle/10568/72509?show=full) and tweet the link, then check back in a week to see if the donut gets updated
- The donut is already updated and shows the correct number now
- CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we'd do it tentatively on Monday the 18th.
- Looking at our `input-forms.xml` I see we have two sets of ILRI subjects, but one has a few extra subjects
- Remove one set of ILRI subjects and remove duplicate `VALUE CHAINS` from existing list ([#216](https://github.com/ilri/DSpace/pull/216))
- I decided to keep the set of subjects that had `FMD` and `RANGELANDS` added, as it appears to have been requested to have been added, and might be the newer list
- I found 226 blank metadatavalues:
```
dspacetest# select * from metadatavalue where resource_type_id=2 and text_value='';
```
- I think we should delete them and do a full re-index:
```
dspacetest=# delete from metadatavalue where resource_type_id=2 and text_value='';
- In other news, moving the `dc.type.output` to `dc.type` and re-indexing seems to have fixed the Listings and Reports issue from above
- Unfortunately this isn't a very good solution, because Listings and Reports config should allow us to filter on `dc.type.*` but the documentation isn't very clear and I couldn't reach Atmire today
- We want to do the `dc.type.output` move on CGSpace anyways, but we should wait as it might affect other external people!
SEVERE: Mapped exception to response: 500 (Internal Server Error)
javax.ws.rs.WebApplicationException
at org.dspace.rest.Resource.processFinally(Resource.java:163)
at org.dspace.rest.HandleResource.getObject(HandleResource.java:81)
at sun.reflect.GeneratedMethodAccessor198.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
...
```
- Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)
- After restarting Tomcat a few more of these errors were logged but the application was up
- Get handles for items that are using a given metadata field, ie `dc.Species.animal` (105):
```
# select handle from item, handle where handle.resource_id = item.item_id AND item.item_id in (select resource_id from metadatavalue where resource_type_id=2 and metadata_field_id=105);
handle
-------------
10568/10298
10568/16413
10568/16774
10568/34487
```
- Delete metadata values for `dc.GRP` and `dc.icsubject.icrafsubject`:
```
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=96;
# delete from metadatavalue where resource_type_id=2 and metadata_field_id=83;
```
- They are old ICRAF fields and we haven't used them since 2011 or so
- Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server
- Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server
- Field migration went well:
```
$ ./migrate-fields.sh
UPDATE metadatavalue SET metadata_field_id=109 WHERE metadata_field_id=66
UPDATE 40909
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51419
UPDATE metadatavalue SET metadata_field_id=208 WHERE metadata_field_id=82
UPDATE 5986
UPDATE metadatavalue SET metadata_field_id=210 WHERE metadata_field_id=88
UPDATE 2458
UPDATE metadatavalue SET metadata_field_id=215 WHERE metadata_field_id=106
UPDATE 3872
UPDATE metadatavalue SET metadata_field_id=217 WHERE metadata_field_id=108
UPDATE 46075
```
- Also, I migrated CGSpace to using the PGDG PostgreSQL repo as the infrastructure playbooks had been using it for a while and it seemed to be working well
- Basically, this gives us the ability to use the latest upstream stable 9.3.x release (currently 9.3.12)
- It turns out the `robots.txt` issue we thought we solved last month isn't solved because you can't use wildcards in URL patterns: https://jira.duraspace.org/browse/DS-2962
- Write some nginx rules to add `X-Robots-Tag` HTTP headers to the dynamic requests from `robots.txt` instead