Mirror of https://github.com/alanorth/cgspace-notes.git (synced 2025-01-11 14:33:21 +01:00)
Update notes for 2019-02-05
parent f40f503304
commit c053b90504
@@ -160,4 +160,34 @@ COPY 321
 - At this rate I think I just need to stop paying attention to these alerts—DSpace gets thrashed when people use the APIs properly and there's nothing we can do to improve REST API performance!
 - Perhaps I just need to keep increasing the Linode alert threshold (currently 300%) for this host?
 
+## 2019-02-05
+
+- Peter sent me corrections and deletions for the CTA subjects and as usual, there were encoding errors with some accentsÁ in his file
+- In other news, it seems that the GREL syntax regarding booleans changed in OpenRefine recently, so I need to update some expressions like the one I use to detect encoding errors to use `toString()`:
+
+```
+or(
+isNotNull(value.match(/.*\uFFFD.*/)),
+isNotNull(value.match(/.*\u00A0.*/)),
+isNotNull(value.match(/.*\u200A.*/)),
+isNotNull(value.match(/.*\u2019.*/)),
+isNotNull(value.match(/.*\u00b4.*/)),
+isNotNull(value.match(/.*\u007e.*/))
+).toString()
+```
+
+- Testing the corrections for sixty-five items and sixteen deletions using my [fix-metadata-values.py](https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897) and [delete-metadata-values.py](https://gist.github.com/alanorth/bd7d58c947f686401a2b1fadc78736be) scripts:
+
+```
+$ ./fix-metadata-values.py -i 2019-02-04-Correct-65-CTA-Subjects.csv -f cg.subject.cta -t CORRECT -m 124 -db dspace -u dspace -p 'fuu' -d
+$ ./delete-metadata-values.py -i 2019-02-04-Delete-16-CTA-Subjects.csv -f cg.subject.cta -m 124 -db dspace -u dspace -p 'fuu' -d
+```
+
+- I applied them on DSpace Test and CGSpace and started a full Discovery re-index:
+
+```
+$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
+$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
+```
+
 <!-- vim: set sw=2 ts=2: -->
@@ -42,7 +42,7 @@ sys 0m1.979s
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-02/" />
 <meta property="article:published_time" content="2019-02-01T21:37:30+02:00"/>
-<meta property="article:modified_time" content="2019-02-04T20:09:20+02:00"/>
+<meta property="article:modified_time" content="2019-02-04T23:05:12+02:00"/>
 
 <meta name="twitter:card" content="summary"/>
 <meta name="twitter:title" content="February, 2019"/>
@@ -89,9 +89,9 @@ sys 0m1.979s
 "@type": "BlogPosting",
 "headline": "February, 2019",
 "url": "https://alanorth.github.io/cgspace-notes/2019-02/",
-"wordCount": "846",
+"wordCount": "990",
 "datePublished": "2019-02-01T21:37:30+02:00",
-"dateModified": "2019-02-04T20:09:20+02:00",
+"dateModified": "2019-02-04T23:05:12+02:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -338,6 +338,39 @@ COPY 321
 <li>Perhaps I just need to keep increasing the Linode alert threshold (currently 300%) for this host?</li>
 </ul>
 
+<h2 id="2019-02-05">2019-02-05</h2>
+
+<ul>
+<li>Peter sent me corrections and deletions for the CTA subjects and as usual, there were encoding errors with some accentsÁ in his file</li>
+<li>In other news, it seems that the GREL syntax regarding booleans changed in OpenRefine recently, so I need to update some expressions like the one I use to detect encoding errors to use <code>toString()</code>:</li>
+</ul>
+
+<pre><code>or(
+isNotNull(value.match(/.*\uFFFD.*/)),
+isNotNull(value.match(/.*\u00A0.*/)),
+isNotNull(value.match(/.*\u200A.*/)),
+isNotNull(value.match(/.*\u2019.*/)),
+isNotNull(value.match(/.*\u00b4.*/)),
+isNotNull(value.match(/.*\u007e.*/))
+).toString()
+</code></pre>
+
+<ul>
+<li>Testing the corrections for sixty-five items and sixteen deletions using my <a href="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a> and <a href="https://gist.github.com/alanorth/bd7d58c947f686401a2b1fadc78736be">delete-metadata-values.py</a> scripts:</li>
+</ul>
+
+<pre><code>$ ./fix-metadata-values.py -i 2019-02-04-Correct-65-CTA-Subjects.csv -f cg.subject.cta -t CORRECT -m 124 -db dspace -u dspace -p 'fuu' -d
+$ ./delete-metadata-values.py -i 2019-02-04-Delete-16-CTA-Subjects.csv -f cg.subject.cta -m 124 -db dspace -u dspace -p 'fuu' -d
+</code></pre>
+
+<ul>
+<li>I applied them on DSpace Test and CGSpace and started a full Discovery re-index:</li>
+</ul>
+
+<pre><code>$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
+$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
+</code></pre>
+
 <!-- vim: set sw=2 ts=2: -->
 
 
@@ -44,7 +44,7 @@ Disallow: /cgspace-notes/2015-12/
 Disallow: /cgspace-notes/2015-11/
 Disallow: /cgspace-notes/
 Disallow: /cgspace-notes/categories/
-Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/categories/notes/
+Disallow: /cgspace-notes/tags/notes/
 Disallow: /cgspace-notes/posts/
 Disallow: /cgspace-notes/tags/
@@ -4,7 +4,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/2019-02/</loc>
-<lastmod>2019-02-04T20:09:20+02:00</lastmod>
+<lastmod>2019-02-04T23:05:12+02:00</lastmod>
 </url>
 
 <url>
@@ -209,7 +209,7 @@
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/</loc>
-<lastmod>2019-02-04T20:09:20+02:00</lastmod>
+<lastmod>2019-02-04T23:05:12+02:00</lastmod>
 <priority>0</priority>
 </url>
 
@@ -218,27 +218,27 @@
 <priority>0</priority>
 </url>
 
-<url>
-<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
-<lastmod>2019-02-04T20:09:20+02:00</lastmod>
-<priority>0</priority>
-</url>
-
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
 <lastmod>2018-03-09T22:10:33+02:00</lastmod>
 <priority>0</priority>
 </url>
 
+<url>
+<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
+<lastmod>2019-02-04T23:05:12+02:00</lastmod>
+<priority>0</priority>
+</url>
+
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
-<lastmod>2019-02-04T20:09:20+02:00</lastmod>
+<lastmod>2019-02-04T23:05:12+02:00</lastmod>
 <priority>0</priority>
 </url>
 
 <url>
 <loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
-<lastmod>2019-02-04T20:09:20+02:00</lastmod>
+<lastmod>2019-02-04T23:05:12+02:00</lastmod>
 <priority>0</priority>
 </url>
 
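
The GREL expression added to the 2019-02-05 notes in this commit flags a value when it contains any of six code points (U+FFFD, U+00A0, U+200A, U+2019, U+00B4, U+007E). For checking a corrections CSV outside OpenRefine, a rough Python equivalent might look like the sketch below. The names `SUSPECT_CHARS` and `has_suspect_chars` and the sample values are hypothetical and not part of the commit, and the behavior may differ from GREL's `.*` pattern for values that contain line breaks.

```python
# Code points taken from the GREL expression in the notes.
SUSPECT_CHARS = frozenset([
    "\uFFFD",  # replacement character
    "\u00A0",  # no-break space
    "\u200A",  # hair space
    "\u2019",  # right single quotation mark
    "\u00B4",  # acute accent
    "\u007E",  # tilde
])


def has_suspect_chars(value):
    """Return True if value contains any of the suspect code points."""
    return any(ch in SUSPECT_CHARS for ch in value)


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    for sample in ["AGRICULTURE", "CAF\uFFFD", "FARMER\u2019S RIGHTS"]:
        print(repr(sample), has_suspect_chars(sample))
```

Returning a plain boolean keeps the check easy to reuse as a filter when reading rows with the standard `csv` module.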