This was more me being cautious when I was writing the original tool
than a warning about it being actually unsafe. Now that this web
frontend will be used by less-technical users I should tone down the
language.
Enable AGROVOC lookup on dcterms.subject as well as the "unsafe"
fixes. For the AGROVOC lookup I just think that it might not be
obvious to non-technical users that you have to check the box AND
enter a field name, despite the placeholder value. In any case, it
doesn't hurt to enable AGROVOC lookup by default because it won't
fail if the default dcterms.subject field is not present in the
user's CSV.
Google App Engine aggressively caches static files. They are currently
serving a 24-hour-old version of my CSS after multiple updates and
re-deploys. Ughhh. From their docs:
> After a file is transmitted with a given expiration time, there is
> generally no way to clear it out of web-proxy caches, even if the user
> clears their own browser cache. Re-deploying a new version of the app
> will not reset any caches. Therefore, if you ever plan to modify a
> static file, it should have a short (less than one hour) expiration
> time. In most cases, the default 10-minute expiration time is
> appropriate.
The only way to bust the cache for now is to change the CSS *directory*.
In the future I think we have to be sure to set the private cache
control header (Cache-Control: private), which lets browsers cache the
file, but not public CDNs.
See: https://cloud.google.com/appengine/docs/standard/python3/how-requests-are-handled
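A rough sketch of what setting that header could look like, assuming the
CSS is served by the Python app itself through WSGI rather than by an
App Engine static handler (files served by a static handler never reach
the app, so this would not apply to them). The class name, the /static/
prefix and the 600-second max-age are illustrative choices, not taken
from the project:

```python
# Sketch only: marks responses under a path prefix as privately cacheable.
# PrivateCacheMiddleware, the prefix and the max-age are hypothetical.


class PrivateCacheMiddleware:
    """WSGI middleware that sets Cache-Control: private under a prefix."""

    def __init__(self, app, prefix="/static/", max_age=600):
        self.app = app
        self.prefix = prefix
        self.header = ("Cache-Control", f"private, max-age={max_age}")

    def __call__(self, environ, start_response):
        # Leave every request outside the prefix untouched.
        if not environ.get("PATH_INFO", "").startswith(self.prefix):
            return self.app(environ, start_response)

        def patched_start_response(status, headers, exc_info=None):
            # Drop any existing Cache-Control header, then add ours.
            headers = [h for h in headers if h[0].lower() != "cache-control"]
            headers.append(self.header)
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)
```

With a WSGI framework this would wrap the application object, for
example `app.wsgi_app = PrivateCacheMiddleware(app.wsgi_app)` in Flask.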
As I expected, on Google App Engine we can't write the cache file
to the current working directory. I modified the csv-metadata-quality
CLI to check for the REQUESTS_CACHE_DIR environment variable so we
don't really have to do anything different other than setting the
variable.
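The environment variable check could look roughly like this. It is a
minimal sketch, not the actual csv-metadata-quality code: the function
name cache_path is hypothetical, and the base file name is inferred from
the agrovoc-response-cache.sqlite file mentioned in these notes.

```python
import os


def cache_path(filename="agrovoc-response-cache"):
    """Return where the requests cache should live.

    Uses REQUESTS_CACHE_DIR when it is set, otherwise falls back to a
    path relative to the current working directory.
    """
    cache_dir = os.environ.get("REQUESTS_CACHE_DIR")
    if cache_dir:
        return os.path.join(cache_dir, filename)
    return filename


# The result would then be handed to requests-cache, for example
# requests_cache.install_cache(cache_path()), whose SQLite backend
# adds the .sqlite extension itself.
```

On App Engine the variable would point at a writable location such as
/tmp, while local runs keep the old behaviour without any configuration.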
This works locally, but I don't think it will work on App Engine
because csv-metadata-quality uses requests-cache and creates the
agrovoc-response-cache.sqlite file in the current working directory.
Re-work upload and file processing so they are in the same Python
function. Now I will start exposing other command line options in
the form, like unsafe fixes, excluding fields, etc. Now I see that
it is easier to save the POSTed file and process it in the same
function so I don't have to pass around the other POSTed form
values as URL query parameters.
As a result of changing the flow above, I also had to change the
way I show the results page. Instead of processing the file and
returning the rendered results to the user directly, I process the
file, save the rendered results to /tmp, and redirect the user to
the results page.
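The save-then-redirect flow could be sketched as below. This is an
illustration under assumptions, not the app's real code: the function
names, the /result/<id> URL shape and the use of a random UUID as the
file name are all hypothetical. The upload handler would call
save_results() and redirect to the returned URL, and the results page
handler would call load_results().

```python
import os
import uuid


def save_results(rendered_html, results_dir="/tmp"):
    """Write rendered results to disk and return the URL to redirect to."""
    result_id = uuid.uuid4().hex
    path = os.path.join(results_dir, f"{result_id}.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(rendered_html)
    return f"/result/{result_id}"


def load_results(result_id, results_dir="/tmp"):
    """Read back previously saved results for the results page."""
    # Only accept IDs in the format we generate (32 lowercase hex
    # characters), so a crafted ID cannot escape results_dir.
    if len(result_id) != 32 or any(c not in "0123456789abcdef" for c in result_id):
        raise ValueError("invalid result id")
    path = os.path.join(results_dir, f"{result_id}.html")
    with open(path, encoding="utf-8") as f:
        return f.read()
```

Keeping the ID random and validating it on the way back in means the
results page never has to trust a user-supplied file name.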