From d3a5169489894ff0ba3f01669db39860b6128d3e Mon Sep 17 00:00:00 2001
From: Alan Orth
Date: Wed, 12 Apr 2017 16:01:30 +0300
Subject: [PATCH] Update notes for 2017-04-12

---
 content/post/2017-04.md   | 24 +++++++++++++++++++++++-
 public/2017-04/index.html | 31 +++++++++++++++++++++++++++----
 public/sitemap.xml        | 10 +++++-----
 3 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/content/post/2017-04.md b/content/post/2017-04.md
index 28cf81cf5..250791f5e 100644
--- a/content/post/2017-04.md
+++ b/content/post/2017-04.md
@@ -123,10 +123,13 @@ $ grep -c profile /tmp/filter-media-cmyk.txt
 ## 2017-04-11
 
 - Looking at the item from CIFOR it hasn't been updated yet, maybe they aren't running the cron job
+- I emailed Usman from CIFOR to ask if he's running the cron job
 
 ## 2017-04-12
 
-- CIFOR says they have cleaned their OAI cache and run the import again, but I still don't see any updates in their OAI
+- CIFOR says they have cleaned their OAI cache and that the cron job for OAI import is enabled
+- Now I see updated fields, like `dc.date.issued` but none from the CG or CIFOR namespaces
+- Also, DSpace Test hasn't re-harvested this item yet, so I will wait one more day before forcing a re-harvest
 - Looking at CIFOR's OAI using different metadata formats, like qualified Dublin Core and DSpace Intermediate Metadata:
   - QDC: https://data.cifor.org/dspace/oai/request?verb=ListRecords&resumptionToken=qdc///col_11463_6/900
   - DIM: https://data.cifor.org/dspace/oai/request?verb=ListRecords&resumptionToken=dim///col_11463_6/900
@@ -157,3 +160,22 @@ OAI 2.0 manager action ended. It took 829 seconds.
 - After reading some threads on the DSpace mailing list, I see that `clean-cache` is actually only for caching _responses_, ie to client requests in the OAI web application
   - These are stored in `[dspace]/var/oai/requests/`
 - The import command should theoretically catch situations like this where an item's metadata was updated, but in this case we changed the metadata schema and it doesn't seem to catch it (could be a bug!)
+- Attempting a full rebuild of OAI on CGSpace:
+
+```
+$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
+$ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspace oai import -c
+...
+58700 items imported so far...
+Total: 58789 items
+Purging cached OAI responses.
+OAI 2.0 manager action ended. It took 1032 seconds.
+
+real 17m20.156s
+user 4m35.293s
+sys 1m29.310s
+```
+
+- Now the data for 10568/6 is correct in OAI: https://cgspace.cgiar.org/oai/request?verb=GetRecord&metadataPrefix=dim&identifier=oai:cgspace.cgiar.org:10568/6
+- Perhaps I need to file a bug for this, or at least ask on the DSpace Test mailing list?
+- I wonder if we could use a crosswalk to convert to a format that CG Core wants, like ``
diff --git a/public/2017-04/index.html b/public/2017-04/index.html
index 8a908c2c2..3b40d686e 100644
--- a/public/2017-04/index.html
+++ b/public/2017-04/index.html
@@ -30,7 +30,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
-
+
@@ -79,9 +79,9 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
 "@type": "BlogPosting",
 "headline": "April, 2017",
 "url": "https://alanorth.github.io/cgspace-notes/2017-04/",
- "wordCount": "1063",
+ "wordCount": "1208",
 "datePublished": "2017-04-02T17:08:52+02:00",
- "dateModified": "2017-04-11T20:46:03+03:00",
+ "dateModified": "2017-04-12T14:39:42+03:00",
 "author": {
   "@type": "Person",
   "name": "Alan Orth"
@@ -288,12 +288,15 @@ ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0

2017-04-12