mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-16 11:57:03 +01:00
Update notes for 2017-05-08
parent 18346b3b90
commit 3a1e203aa6
@@ -56,6 +56,19 @@ $ ./fix-metadata-values.py -i ccafs-flagships-may7.csv -f cg.subject.ccafs -t co
 
 - Start working on CGIAR Library migration
 - We decided to use AIP export to preserve the hierarchies and handles of communities and collections
+- When ingesting some collections I was getting `java.lang.OutOfMemoryError: GC overhead limit exceeded`, which can be solved by disabling the GC timeout with `-XX:-UseGCOverheadLimit`
+- Other times I was getting an error about heap space, so I kept bumping the RAM allocation by 512MB each time (up to 4096m!) it crashed
+- This leads to tens of thousands of abandoned files in the assetstore, which need to be cleaned up using `dspace cleanup -v`, or else you'll run out of disk space
+- In the end I realized it's better to use submission mode (`-s`) to ingest the community object as a single AIP without its children, followed by each of the collections:
+
+```
+$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m -XX:-UseGCOverheadLimit"
+$ [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10568/87775 /home/aorth/10947-1/10947-1.zip
+$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
+$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
+```
+
+- Note that in submission mode DSpace ignores the handle specified in `mets.xml` in the zip file, so you need to turn that off with `-o ignoreHandle=false`
 - Give feedback to CIFOR about their data quality:
 - Suggestion: uppercase dc.subject, cg.coverage.region, and cg.coverage.subregion in your crosswalk so they match CGSpace and therefore can be faceted / reported on easier
 - Suggestion: use CGSpace's CRP names (cg.contributor.crp), see: dspace/config/input-forms.xml
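Note on the ingest loops added in this hunk: they expand `$collection` and `$item` unquoted and carry on after a failed ingest, which may be part of how abandoned assetstore files pile up. Below is a minimal sketch of a stricter wrapper, assuming bash. The names `ingest_packages`, `DSPACE_BIN`, and `EPERSON` are hypothetical and introduced only for illustration; the `packager` flags are the ones that appear in the notes.

```shell
#!/usr/bin/env bash
# Sketch only: ingest each AIP package in submission mode under one parent
# handle, stopping at the first failure instead of continuing blindly.
# DSPACE_BIN and EPERSON are assumed environment variables, not DSpace settings.

ingest_packages() {
    local parent="$1"
    shift
    local pkg
    for pkg in "$@"; do
        # An unmatched glob is passed through literally; skip it.
        [ -e "$pkg" ] || continue
        # Quote the path so names with spaces or glob characters survive.
        if ! "${DSPACE_BIN:-dspace}" packager -s -o ignoreHandle=false \
                -t AIP -e "$EPERSON" -p "$parent" "$pkg"; then
            echo "ingest failed for $pkg" >&2
            return 1
        fi
    done
}

# Example call (paths and handle taken from the notes, e-mail is a placeholder):
#   EPERSON=some@user.com DSPACE_BIN=[dspace]/bin/dspace \
#     ingest_packages 10947/1 /home/aorth/10947-1/COLLECTION@10947-*
```

Stopping on the first error makes it easier to see which package failed before retrying with a larger `-Xmx`, and to run `dspace cleanup -v` before the next attempt.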
@@ -13,7 +13,7 @@
 
 
 <meta property="article:published_time" content="2017-05-01T16:21:52+02:00"/>
-<meta property="article:modified_time" content="2017-05-08T17:51:55+03:00"/>
+<meta property="article:modified_time" content="2017-05-08T20:20:52+03:00"/>
 
 
 
@@ -45,9 +45,9 @@
 "@type": "BlogPosting",
 "headline": "May, 2017",
 "url": "https://alanorth.github.io/cgspace-notes/2017-05/",
-"wordCount": "464",
+"wordCount": "653",
 "datePublished": "2017-05-01T16:21:52+02:00",
-"dateModified": "2017-05-08T17:51:55+03:00",
+"dateModified": "2017-05-08T20:20:52+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -181,6 +181,20 @@
 <ul>
 <li>Start working on CGIAR Library migration</li>
 <li>We decided to use AIP export to preserve the hierarchies and handles of communities and collections</li>
+<li>When ingesting some collections I was getting <code>java.lang.OutOfMemoryError: GC overhead limit exceeded</code>, which can be solved by disabling the GC timeout with <code>-XX:-UseGCOverheadLimit</code></li>
+<li>Other times I was getting an error about heap space, so I kept bumping the RAM allocation by 512MB each time (up to 4096m!) it crashed</li>
+<li>This leads to tens of thousands of abandoned files in the assetstore, which need to be cleaned up using <code>dspace cleanup -v</code>, or else you’ll run out of disk space</li>
+<li>In the end I realized it’s better to use submission mode (<code>-s</code>) to ingest the community object as a single AIP without its children, followed by each of the collections:</li>
+</ul>
+
+<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx2048m -XX:-UseGCOverheadLimit"
+$ [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e some@user.com -p 10568/87775 /home/aorth/10947-1/10947-1.zip
+$ for collection in /home/aorth/10947-1/COLLECTION@10947-*; do [dspace]/bin/dspace packager -s -o ignoreHandle=false -t AIP -e aorth@mjanja.ch -p 10947/1 $collection; done
+$ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager -r -f -u -t AIP -e aorth@mjanja.ch $item; done
+</code></pre>
+
+<ul>
+<li>Note that in submission mode DSpace ignores the handle specified in <code>mets.xml</code> in the zip file, so you need to turn that off with <code>-o ignoreHandle=false</code></li>
 <li>Give feedback to CIFOR about their data quality:
 
 <ul>
@@ -4,7 +4,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2017-05/</loc>
-<lastmod>2017-05-08T17:51:55+03:00</lastmod>
+<lastmod>2017-05-08T20:20:52+03:00</lastmod>
 </url>
 
 <url>
@@ -99,7 +99,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2017-05-08T17:51:55+03:00</lastmod>
+<lastmod>2017-05-08T20:20:52+03:00</lastmod>
 <priority>0</priority>
 </url>
 
@@ -110,19 +110,19 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2017-05-08T17:51:55+03:00</lastmod>
+<lastmod>2017-05-08T20:20:52+03:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/post/</loc>
-<lastmod>2017-05-08T17:51:55+03:00</lastmod>
+<lastmod>2017-05-08T20:20:52+03:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2017-05-08T17:51:55+03:00</lastmod>
+<lastmod>2017-05-08T20:20:52+03:00</lastmod>
 <priority>0</priority>
 </url>
 