Add notes for 2019-08-14

This commit is contained in:
Alan Orth 2019-08-14 13:39:29 +03:00
parent 9f41690ed8
commit 3d73a51b1c
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
3 changed files with 74 additions and 8 deletions

@ -159,5 +159,36 @@ $ dspace user -a -m blah@blah.com -g Mohammad -s Salem -p 'domoamaaa'
- Create and merge a pull request ([#429](https://github.com/ilri/DSpace/pull/429)) to add eleven new CCAFS Phase II Project Tags to CGSpace
- Atmire responded to the [Solr cores issue](https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=685) last week, but they could not reproduce the issue
- I told them not to continue, and that we would keep an eye on it and keep troubleshooting it (if necessary) in the public eye on the dspace-tech and Solr mailing lists
- While testing an import of 1,429 Bioversity items (metadata only) on my local development machine, I got a Java memory error after about 1,000 items:
```
$ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com
...
java.lang.OutOfMemoryError: GC overhead limit exceeded
```
- I increased the heap size to 1536m and tried again:
```
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1536m"
$ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com
```
- This time it succeeded, and using VisualVM I noticed that the import process used a maximum of 620MB of RAM
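- As an aside, VisualVM needs a GUI; on a headless server the same heap and garbage collection statistics can be sampled with `jstat` from the JDK. A minimal sketch, assuming the import's java process can be found with this `pgrep` pattern:
```
# print heap and GC statistics for the running import every five seconds
# (assumes the DSpace launcher's java process matches this pattern)
$ jstat -gc $(pgrep -f metadata-import) 5000
```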
## 2019-08-14
- I imported the 1,429 Bioversity records into DSpace Test
- To make sure we didn't have memory issues I reduced Tomcat's JVM heap by 512m, increased the import process's heap to 512m, and split the input file into two parts of about 700 items each (see the splitting sketch after the import commands below)
- Then I had to create a few new temporary collections on DSpace Test that had been created on CGSpace after our last sync
- After that the import succeeded:
```
$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx512m'
$ dspace metadata-import -f /tmp/bioversity1.csv -e blah@blah.com
$ dspace metadata-import -f /tmp/bioversity2.csv -e blah@blah.com
```
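- For reference, one way to split the CSV in two while keeping the header row in both halves; a sketch that assumes no metadata values contain embedded newlines, with the row boundary chosen to give roughly 700 records per file:
```
# copy the header row into both halves
$ head -n1 /tmp/bioversity.csv | tee /tmp/bioversity1.csv > /tmp/bioversity2.csv
# records 1-715 into the first half, the remaining 714 into the second
$ sed -n '2,716p' /tmp/bioversity.csv >> /tmp/bioversity1.csv
$ sed -n '717,$p' /tmp/bioversity.csv >> /tmp/bioversity2.csv
```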
- The next step is to check these items for duplicates
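- A cheap first pass is to look for exact title duplicates with csvkit before doing anything more thorough; a sketch assuming the title column in these CSVs is `dc.title[en_US]` (this only catches duplicates within the upload itself, not against existing CGSpace items):
```
# stack the two halves and print any title that appears more than once
$ csvstack /tmp/bioversity1.csv /tmp/bioversity2.csv | csvcut -c 'dc.title[en_US]' | sort | uniq -d
```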
<!-- vim: set sw=2 ts=2: -->

@ -27,7 +27,7 @@ Run system updates on DSpace Test (linode19) and reboot it
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-08/" />
<meta property="article:published_time" content="2019-08-03T12:39:51+03:00" />
<meta property="article:modified_time" content="2019-08-13T15:33:29+03:00" />
<meta property="article:modified_time" content="2019-08-13T16:54:35+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="August, 2019"/>
@ -59,9 +59,9 @@ Run system updates on DSpace Test (linode19) and reboot it
"@type": "BlogPosting",
"headline": "August, 2019",
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-08\/",
"wordCount": "1230",
"wordCount": "1409",
"datePublished": "2019-08-03T12:39:51\x2b03:00",
"dateModified": "2019-08-13T15:33:29\x2b03:00",
"dateModified": "2019-08-13T16:54:35\x2b03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -337,6 +337,41 @@ $ ./generate-thumbnails.py -i /tmp/user-upload2.csv -w --url-field-name url -d |
<ul>
<li>I told them not to continue, and that we would keep an eye on it and keep troubleshooting it (if necessary) in the public eye on the dspace-tech and Solr mailing lists</li>
</ul></li>
<li><p>While testing an import of 1,429 Bioversity items (metadata only) on my local development machine, I got a Java memory error after about 1,000 items:</p>
<pre><code>$ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com
...
java.lang.OutOfMemoryError: GC overhead limit exceeded
</code></pre></li>
<li><p>I increased the heap size to 1536m and tried again:</p>
<pre><code>$ export JAVA_OPTS=&quot;-Dfile.encoding=UTF-8 -Xmx1536m&quot;
$ ~/dspace/bin/dspace metadata-import -f /tmp/bioversity.csv -e blah@blah.com
</code></pre></li>
<li><p>This time it succeeded, and using VisualVM I noticed that the import process used a maximum of 620MB of RAM</p></li>
</ul>
<h2 id="2019-08-14">2019-08-14</h2>
<ul>
<li><p>I imported the 1,429 Bioversity records into DSpace Test</p>
<ul>
<li>To make sure we didn&rsquo;t have memory issues I reduced Tomcat&rsquo;s JVM heap by 512m, increased the import process&rsquo;s heap to 512m, and split the input file into two parts of about 700 items each</li>
<li>Then I had to create a few new temporary collections on DSpace Test that had been created on CGSpace after our last sync</li>
<li><p>After that the import succeeded:</p>
<pre><code>$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx512m'
$ dspace metadata-import -f /tmp/bioversity1.csv -e blah@blah.com
$ dspace metadata-import -f /tmp/bioversity2.csv -e blah@blah.com
</code></pre></li>
</ul></li>
<li><p>The next step is to check these items for duplicates</p></li>
</ul>
<!-- vim: set sw=2 ts=2: -->

@ -4,30 +4,30 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2019-08/</loc>
<lastmod>2019-08-13T15:33:29+03:00</lastmod>
<lastmod>2019-08-13T16:54:35+03:00</lastmod>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2019-08-13T15:33:29+03:00</lastmod>
<lastmod>2019-08-13T16:54:35+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2019-08-13T15:33:29+03:00</lastmod>
<lastmod>2019-08-13T16:54:35+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2019-08-13T15:33:29+03:00</lastmod>
<lastmod>2019-08-13T16:54:35+03:00</lastmod>
<priority>0</priority>
</url>
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2019-08-13T15:33:29+03:00</lastmod>
<lastmod>2019-08-13T16:54:35+03:00</lastmod>
<priority>0</priority>
</url>