mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-12-22 21:22:19 +01:00
Add notes for 2020-08-14
parent
eafe422984
commit
3252567208
@ -398,4 +398,56 @@ dspace=# SELECT count(text_value) FROM metadatavalue WHERE metadata_field_id = 2

- I purged 150,000 hits from 2020 and 2020 from these user agents and hosts

## 2020-08-14

- Last night I started the processing of the statistics-2016 core with the Atmire stats util and I see some errors like this:

```
Record uid: f6b288d7-d60d-4df9-b311-1696b88552a0 couldn't be processed
com.atmire.statistics.util.update.atomic.ProcessingException: something went wrong while processing record uid: f6b288d7-d60d-4df9-b311-1696b88552a0, an error occured in the com.atmire.statistics.util.update.atomic.processor.ContainerOwnerDBProcessor
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(SourceFile:304)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(SourceFile:176)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:161)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
Caused by: java.lang.NullPointerException
```

- I see it has `id: 980-unmigrated` and `type: 0`...
- The 2016 core has 629,983 unmigrated docs, mostly:
  - `type: 5`: 620311
  - `type: 0`: 7255
  - `type: 3`: 1333
- I purged the unmigrated docs and continued processing:

```
$ curl -s "http://localhost:8081/solr/statistics-2016/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:/.*unmigrated.*/</query></delete>'
$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics-2016
```
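
- A minimal sketch of how the per-type breakdown of the unmigrated docs above could be sanity checked; `other_types` is a hypothetical helper name of my own, and the facet query in the comment assumes the same Solr host and port as the commands above:

```shell
# Hypothetical helper (not part of DSpace or Solr): given a total and the
# per-type counts, print how many docs belong to types that were not listed.
other_types() {
  total=$1
  shift
  for count in "$@"; do
    total=$((total - count))
  done
  echo "$total"
}

# The three types listed above do not quite add up to the total, which is
# why the breakdown says "mostly":
other_types 629983 620311 7255 1333

# The counts themselves could come from a facet query like this one
# (assumption: same host and port as above; not run here):
#   curl -s -G "http://localhost:8081/solr/statistics-2016/select" \
#     --data-urlencode 'q=id:/.*unmigrated.*/' \
#     -d 'rows=0' -d 'facet=true' -d 'facet.field=type' -d 'wt=json'
```

- With the numbers above this leaves 1,084 unmigrated docs of some other type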

- Then I see there are 849,000 docs with `id: -1` and `type: 5` so I should purge those too probably:

```
$ curl -s "http://localhost:8081/solr/statistics-2017/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:\-1</query></delete>'
```
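
- Before a purge like the ones above it might be worth checking how many docs the query actually matches; a small sketch, where `solr_num_found` is a hypothetical helper of my own and the core name in the usage comment is only an example:

```shell
# Hypothetical helper: read a Solr JSON response on stdin and print the
# value of numFound (the number of docs that matched the query).
solr_num_found() {
  sed -n 's/.*"numFound": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (assumption: same host and port as above; not run here). With -G,
# curl appends the --data-urlencode value to the URL as an encoded query
# string, so the backslash in id:\-1 reaches Solr intact:
#   curl -s -G "http://localhost:8081/solr/statistics-2017/select" \
#     --data-urlencode 'q=id:\-1' -d 'rows=0' -d 'wt=json' | solr_num_found
```

- `rows=0` keeps the response small because only the count is needed, not the docs themselves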

- Altmetric asked for a dump of CGSpace's OAI "sets" so they can update their affiliation mappings
  - I did it in a kinda ghetto way:

```
$ http 'https://cgspace.cgiar.org/oai/request?verb=ListSets' > /tmp/0.xml
$ for num in {100..1300..100}; do http "https://cgspace.cgiar.org/oai/request?verb=ListSets&resumptionToken=////$num" > /tmp/$num.xml; sleep 2; done
$ for num in {0..1300..100}; do cat /tmp/$num.xml >> /tmp/cgspace-oai-sets.xml; done
```

- This produces one file that has all the sets, albeit with 14 pages of responses concatenated into one document, but that's how theirs was in the first place...
- Help Bizu with a restricted item for CIAT
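
- The OAI `ListSets` paging above can be sketched as a function that only prints the request URLs; `oai_listsets_urls` is a hypothetical name of my own, and guessing the `////$num` tokens relies on how this server happens to format them (OAI-PMH itself treats the resumption token as an opaque value returned in each response):

```shell
# Hypothetical helper: print the URL for the first ListSets page followed
# by one URL per guessed resumption token (offsets 100, 200, ... 1300).
oai_listsets_urls() {
  base='https://cgspace.cgiar.org/oai/request?verb=ListSets'
  echo "$base"
  num=100
  while [ "$num" -le 1300 ]; do
    echo "${base}&resumptionToken=////${num}"
    num=$((num + 100))
  done
}

# One request per line; counting the lines gives the number of pages
# (the first page plus thirteen guessed tokens).
oai_listsets_urls | wc -l
```

- Each URL could then be fetched with `http` or `curl` and a short `sleep` in between, as in the loop above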

<!-- vim: set sw=2 ts=2: -->
@ -19,7 +19,7 @@ It is class based so I can easily add support for other vocabularies, and the te
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-08/" />
<meta property="article:published_time" content="2020-08-02T15:35:54+03:00" />
<meta property="article:modified_time" content="2020-08-11T11:35:05+03:00" />
<meta property="article:modified_time" content="2020-08-13T17:56:39+03:00" />

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="August, 2020"/>
@ -43,9 +43,9 @@ It is class based so I can easily add support for other vocabularies, and the te
"@type": "BlogPosting",
"headline": "August, 2020",
"url": "https://alanorth.github.io/cgspace-notes/2020-08/",
"wordCount": "2554",
"wordCount": "2800",
"datePublished": "2020-08-02T15:35:54+03:00",
"dateModified": "2020-08-11T11:35:05+03:00",
"dateModified": "2020-08-13T17:56:39+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -566,6 +566,56 @@ $ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=tru
<ul>
<li>I purged 150,000 hits from 2020 and 2020 from these user agents and hosts</li>
</ul>
<h2 id="2020-08-14">2020-08-14</h2>
<ul>
<li>Last night I started the processing of the statistics-2016 core with the Atmire stats util and I see some errors like this:</li>
</ul>
<pre><code>Record uid: f6b288d7-d60d-4df9-b311-1696b88552a0 couldn't be processed
com.atmire.statistics.util.update.atomic.ProcessingException: something went wrong while processing record uid: f6b288d7-d60d-4df9-b311-1696b88552a0, an error occured in the com.atmire.statistics.util.update.atomic.processor.ContainerOwnerDBProcessor
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(SourceFile:304)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(SourceFile:176)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(SourceFile:161)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(SourceFile:128)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(SourceFile:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
Caused by: java.lang.NullPointerException
</code></pre><ul>
<li>I see it has <code>id: 980-unmigrated</code> and <code>type: 0</code>…</li>
<li>The 2016 core has 629,983 unmigrated docs, mostly:
<ul>
<li><code>type: 5</code>: 620311</li>
<li><code>type: 0</code>: 7255</li>
<li><code>type: 3</code>: 1333</li>
</ul>
</li>
<li>I purged the unmigrated docs and continued processing:</li>
</ul>
<pre><code>$ curl -s "http://localhost:8081/solr/statistics-2016/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:/.*unmigrated.*/</query></delete>'
$ export JAVA_OPTS='-Dfile.encoding=UTF-8 -Xmx2048m'
$ chrt -b 0 dspace dsrun com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI -t 12 -c statistics-2016
</code></pre><ul>
<li>Then I see there are 849,000 docs with <code>id: -1</code> and <code>type: 5</code> so I should purge those too probably:</li>
</ul>
<pre><code>$ curl -s "http://localhost:8081/solr/statistics-2017/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:\-1</query></delete>'
</code></pre><ul>
<li>Altmetric asked for a dump of CGSpace’s OAI “sets” so they can update their affiliation mappings
<ul>
<li>I did it in a kinda ghetto way:</li>
</ul>
</li>
</ul>
<pre><code>$ http 'https://cgspace.cgiar.org/oai/request?verb=ListSets' > /tmp/0.xml
$ for num in {100..1300..100}; do http "https://cgspace.cgiar.org/oai/request?verb=ListSets&resumptionToken=////$num" > /tmp/$num.xml; sleep 2; done
$ for num in {0..1300..100}; do cat /tmp/$num.xml >> /tmp/cgspace-oai-sets.xml; done
</code></pre><ul>
<li>This produces one file that has all the sets, albeit with 14 pages of responses concatenated into one document, but that’s how theirs was in the first place…</li>
<li>Help Bizu with a restricted item for CIAT</li>
</ul>
<!-- raw HTML omitted -->
@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2020-08-11T11:35:05+03:00" />
<meta property="og:updated_time" content="2020-08-13T17:56:39+03:00" />

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Categories"/>
@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2020-08-11T11:35:05+03:00" />
<meta property="og:updated_time" content="2020-08-13T17:56:39+03:00" />

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Notes"/>
@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2020-08-11T11:35:05+03:00" />
<meta property="og:updated_time" content="2020-08-13T17:56:39+03:00" />

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="CGSpace Notes"/>
@ -9,7 +9,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2020-08-11T11:35:05+03:00" />
<meta property="og:updated_time" content="2020-08-13T17:56:39+03:00" />

<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="Posts"/>
@ -4,27 +4,27 @@

<url>
<loc>https://alanorth.github.io/cgspace-notes/2020-08/</loc>
<lastmod>2020-08-11T11:35:05+03:00</lastmod>
<lastmod>2020-08-13T17:56:39+03:00</lastmod>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2020-08-11T11:35:05+03:00</lastmod>
<lastmod>2020-08-13T17:56:39+03:00</lastmod>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2020-08-11T11:35:05+03:00</lastmod>
<lastmod>2020-08-13T17:56:39+03:00</lastmod>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2020-08-11T11:35:05+03:00</lastmod>
<lastmod>2020-08-13T17:56:39+03:00</lastmod>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2020-08-11T11:35:05+03:00</lastmod>
<lastmod>2020-08-13T17:56:39+03:00</lastmod>
</url>

<url>