mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-12-23 21:44:30 +01:00
Update notes for 2020-08-20
This commit is contained in:
parent
d2c037d0de
commit
ebe0cea35b
@ -463,6 +463,7 @@ $ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H
|
||||
- Furthermore, it seems that each item is curated once for each collection it appears in, causing about 115,000 items to be processed, even though we only have about 87,000
|
||||
- I had been running the tasks on the entire repository with `-i 10568/0`, but I think I might need to try again with the special `all` option before writing to the dspace-tech mailing list for help
|
||||
- Actually I just tested the `all` option on DSpace 5.8 and it still does many of the items multiple times, once for each of their mappings
|
||||
- I sent a message to the dspace-tech mailing list
|
||||
- I finished the Atmire stats processing on all cores on DSpace Test:
|
||||
- statistics:
|
||||
- 2,040,385 docs: 2h 28m 49s
|
||||
@ -495,4 +496,32 @@ $ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=true" -H
|
||||
- On both my local test and DSpace Test I get no results when searching for "Orth, A." and "Orth, Alan" or even Delia Grace, but the Discovery index is up to date and I have eighteen items...
|
||||
- I sent a message to Atmire...
|
||||
|
||||
## 2020-08-20

- Natalia from CIAT was asking how she can download all the PDFs for the items in a search result
  - The search result is for the keyword "trade off" in the WLE community
  - I converted the Discovery search to an open-search query to extract the XML, but we can't get all the results on one page so I had to change the `rpp` to 100 and request a few times to get them all:

```
$ http 'https://cgspace.cgiar.org/open-search/discover?scope=10568%2F34494&query=trade+off&rpp=100&start=0' User-Agent:'curl' > /tmp/wle-trade-off-page1.xml
$ http 'https://cgspace.cgiar.org/open-search/discover?scope=10568%2F34494&query=trade+off&rpp=100&start=100' User-Agent:'curl' > /tmp/wle-trade-off-page2.xml
$ http 'https://cgspace.cgiar.org/open-search/discover?scope=10568%2F34494&query=trade+off&rpp=100&start=200' User-Agent:'curl' > /tmp/wle-trade-off-page3.xml
```
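
- The three requests above differ only in `start`, so the paging could be generated rather than typed out. A minimal Python sketch, not what was actually run: `opensearch_page_urls` and its `total` argument are hypothetical names, and `total` is an assumed upper bound on the number of hits that has to be supplied by hand:

```python
from urllib.parse import urlencode

BASE_URL = "https://cgspace.cgiar.org/open-search/discover"


def opensearch_page_urls(scope, query, total, rpp=100):
    """Build one open-search URL per page of `rpp` results.

    `total` is an assumed upper bound on the number of hits; this
    function only builds URLs and does not contact the server.
    """
    urls = []
    for start in range(0, total, rpp):
        params = {"scope": scope, "query": query, "rpp": rpp, "start": start}
        # urlencode() percent-encodes "/" and turns spaces into "+"
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in opensearch_page_urls("10568/34494", "trade off", total=300):
        print(url)
```

- With `total=300` this yields the same three URLs as the `http` commands above (`start=0`, `100`, `200`)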
|
||||
|
||||
- Ugh, and to extract the `<id>` from each `<entry>` we have to use an XPath query, with a [hack to ignore the default namespace by matching on each element's local name](http://blog.powered-up-games.com/wordpress/archives/70):

```
$ xmllint --xpath '//*[local-name()="entry"]/*[local-name()="id"]/text()' /tmp/wle-trade-off-page1.xml >> /tmp/ids.txt
$ xmllint --xpath '//*[local-name()="entry"]/*[local-name()="id"]/text()' /tmp/wle-trade-off-page2.xml >> /tmp/ids.txt
$ xmllint --xpath '//*[local-name()="entry"]/*[local-name()="id"]/text()' /tmp/wle-trade-off-page3.xml >> /tmp/ids.txt
$ sort -u /tmp/ids.txt > /tmp/ids-sorted.txt
$ grep -oE '[0-9]+/[0-9]+' /tmp/ids-sorted.txt > /tmp/handles.txt
```
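
- The same local-name trick can be sketched with Python's standard library. This is an illustration only: `local_name`, `extract_handles` and the sample feed are hypothetical, and the sample assumes each entry's `<id>` is a URL ending in the handle:

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample in the shape of an Atom feed with a default namespace
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://cgspace.cgiar.org/open-search/discover</id>
  <entry><id>https://hdl.handle.net/10568/11111</id></entry>
  <entry><id>https://hdl.handle.net/10568/22222</id></entry>
  <entry><id>https://hdl.handle.net/10568/11111</id></entry>
</feed>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_handles(xml_text):
    """Return unique handles from <entry>/<id>, in first-seen order."""
    root = ET.fromstring(xml_text)
    handles = []
    for element in root.iter():
        if local_name(element.tag) != "entry":
            continue
        for child in element:
            if local_name(child.tag) != "id" or not child.text:
                continue
            # Same pattern as the grep above: prefix/suffix of the handle
            match = re.search(r"[0-9]+/[0-9]+", child.text)
            if match and match.group(0) not in handles:
                handles.append(match.group(0))
    return handles


if __name__ == "__main__":
    print("\n".join(extract_handles(SAMPLE_FEED)))
```

- Comparing local names sidesteps the default namespace in the same way as `local-name()` in the XPath query, and the feed-level `<id>` is skipped because only children of `<entry>` are inspected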
|
||||
|
||||
- Now I have all the handles for the matching items and I can use the REST API to get each item's PDFs...
  - I wrote `get-wle-pdfs.py` to read the handles from a text file and get all PDFs: https://github.com/ilri/DSpace/blob/5_x-prod/get-wle-pdfs.py
- Add `Foreign, Commonwealth and Development Office, United Kingdom` to the controlled vocabulary for sponsors on CGSpace
  - This is the new name for DFID as of 2020-09-01
  - We will continue using DFID for older items
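
- For the REST API step mentioned above, a rough Python sketch of picking the PDFs out of an item's JSON. This is not the contents of `get-wle-pdfs.py`: the helper names are hypothetical, and the REST base path, the `?expand=bitstreams` parameter and the `bundleName`, `mimeType`, `name` and `retrieveLink` fields are assumptions about the DSpace 5 REST API that should be checked against a real response:

```python
# Assumed base path of the REST API on CGSpace
REST_BASE = "https://cgspace.cgiar.org/rest"


def item_url(handle):
    """URL to look an item up by handle with its bitstreams expanded."""
    return f"{REST_BASE}/handle/{handle}?expand=bitstreams"


def pdf_links(item):
    """Return (name, URL) pairs for PDFs in the item's ORIGINAL bundle.

    `item` is the decoded JSON for one item; the field names used here
    are assumptions, not taken from the notes.
    """
    links = []
    for bitstream in item.get("bitstreams") or []:
        if bitstream.get("bundleName") != "ORIGINAL":
            continue  # skip thumbnails, extracted text, licenses
        if bitstream.get("mimeType") != "application/pdf":
            continue
        # retrieveLink is assumed to be relative to the REST base
        links.append((bitstream["name"], REST_BASE + bitstream["retrieveLink"]))
    return links


if __name__ == "__main__":
    # Made-up item in the assumed response shape
    sample_item = {
        "bitstreams": [
            {"name": "report.pdf", "bundleName": "ORIGINAL",
             "mimeType": "application/pdf",
             "retrieveLink": "/bitstreams/1/retrieve"},
            {"name": "report.pdf.jpg", "bundleName": "THUMBNAIL",
             "mimeType": "image/jpeg",
             "retrieveLink": "/bitstreams/2/retrieve"},
        ]
    }
    print(item_url("10568/00000"))
    print(pdf_links(sample_item))
```

- Keeping the filtering in a pure function means it can be tried on a saved JSON response before any downloads are attempted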
|
||||
|
||||
<!-- vim: set sw=2 ts=2: -->
|
||||
|
@ -19,7 +19,7 @@ It is class based so I can easily add support for other vocabularies, and the te
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2020-08/" />
|
||||
<meta property="article:published_time" content="2020-08-02T15:35:54+03:00" />
|
||||
<meta property="article:modified_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="article:modified_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="August, 2020"/>
|
||||
@ -43,9 +43,9 @@ It is class based so I can easily add support for other vocabularies, and the te
|
||||
"@type": "BlogPosting",
|
||||
"headline": "August, 2020",
|
||||
"url": "https://alanorth.github.io/cgspace-notes/2020-08/",
|
||||
"wordCount": "3168",
|
||||
"wordCount": "3406",
|
||||
"datePublished": "2020-08-02T15:35:54+03:00",
|
||||
"dateModified": "2020-08-14T11:22:16+03:00",
|
||||
"dateModified": "2020-08-19T22:08:33+03:00",
|
||||
"author": {
|
||||
"@type": "Person",
|
||||
"name": "Alan Orth"
|
||||
@ -632,6 +632,7 @@ $ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=tru
|
||||
<li>I had been running the tasks on the entire repository with <code>-i 10568/0</code>, but I think I might need to try again with the special <code>all</code> option before writing to the dspace-tech mailing list for help
|
||||
<ul>
|
||||
<li>Actually I just tested the <code>all</code> option on DSpace 5.8 and it still does many of the items multiple times, once for each of their mappings</li>
|
||||
<li>I sent a message to the dspace-tech mailing list</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li>I finished the Atmire stats processing on all cores on DSpace Test:
|
||||
@ -702,6 +703,39 @@ $ curl -s "http://localhost:8081/solr/statistics-2010/update?softCommit=tru
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<h2 id="2020-08-20">2020-08-20</h2>
<ul>
<li>Natalia from CIAT was asking how she can download all the PDFs for the items in a search result
<ul>
<li>The search result is for the keyword “trade off” in the WLE community</li>
<li>I converted the Discovery search to an open-search query to extract the XML, but we can’t get all the results on one page so I had to change the <code>rpp</code> to 100 and request a few times to get them all:</li>
</ul>
</li>
</ul>
<pre><code>$ http 'https://cgspace.cgiar.org/open-search/discover?scope=10568%2F34494&amp;query=trade+off&amp;rpp=100&amp;start=0' User-Agent:'curl' &gt; /tmp/wle-trade-off-page1.xml
$ http 'https://cgspace.cgiar.org/open-search/discover?scope=10568%2F34494&amp;query=trade+off&amp;rpp=100&amp;start=100' User-Agent:'curl' &gt; /tmp/wle-trade-off-page2.xml
$ http 'https://cgspace.cgiar.org/open-search/discover?scope=10568%2F34494&amp;query=trade+off&amp;rpp=100&amp;start=200' User-Agent:'curl' &gt; /tmp/wle-trade-off-page3.xml
</code></pre><ul>
<li>Ugh, and to extract the <code>&lt;id&gt;</code> from each <code>&lt;entry&gt;</code> we have to use an XPath query, with a <a href="http://blog.powered-up-games.com/wordpress/archives/70">hack to ignore the default namespace by matching on each element’s local name</a>:</li>
</ul>
<pre><code>$ xmllint --xpath '//*[local-name()="entry"]/*[local-name()="id"]/text()' /tmp/wle-trade-off-page1.xml &gt;&gt; /tmp/ids.txt
$ xmllint --xpath '//*[local-name()="entry"]/*[local-name()="id"]/text()' /tmp/wle-trade-off-page2.xml &gt;&gt; /tmp/ids.txt
$ xmllint --xpath '//*[local-name()="entry"]/*[local-name()="id"]/text()' /tmp/wle-trade-off-page3.xml &gt;&gt; /tmp/ids.txt
$ sort -u /tmp/ids.txt &gt; /tmp/ids-sorted.txt
$ grep -oE '[0-9]+/[0-9]+' /tmp/ids-sorted.txt &gt; /tmp/handles.txt
</code></pre><ul>
<li>Now I have all the handles for the matching items and I can use the REST API to get each item’s PDFs…
<ul>
<li>I wrote <code>get-wle-pdfs.py</code> to read the handles from a text file and get all PDFs: <a href="https://github.com/ilri/DSpace/blob/5_x-prod/get-wle-pdfs.py">https://github.com/ilri/DSpace/blob/5_x-prod/get-wle-pdfs.py</a></li>
</ul>
</li>
<li>Add <code>Foreign, Commonwealth and Development Office, United Kingdom</code> to the controlled vocabulary for sponsors on CGSpace
<ul>
<li>This is the new name for DFID as of 2020-09-01</li>
<li>We will continue using DFID for older items</li>
</ul>
</li>
</ul>
|
||||
<!-- raw HTML omitted -->
|
||||
|
||||
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -9,7 +9,7 @@
|
||||
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
|
||||
<meta property="og:type" content="website" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
|
||||
<meta property="og:updated_time" content="2020-08-14T11:22:16+03:00" />
|
||||
<meta property="og:updated_time" content="2020-08-19T22:08:33+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
|
@ -4,27 +4,27 @@
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/2020-08/</loc>
|
||||
<lastmod>2020-08-14T11:22:16+03:00</lastmod>
|
||||
<lastmod>2020-08-19T22:08:33+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
|
||||
<lastmod>2020-08-14T11:22:16+03:00</lastmod>
|
||||
<lastmod>2020-08-19T22:08:33+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
||||
<lastmod>2020-08-14T11:22:16+03:00</lastmod>
|
||||
<lastmod>2020-08-19T22:08:33+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
|
||||
<lastmod>2020-08-14T11:22:16+03:00</lastmod>
|
||||
<lastmod>2020-08-19T22:08:33+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
||||
<lastmod>2020-08-14T11:22:16+03:00</lastmod>
|
||||
<lastmod>2020-08-19T22:08:33+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
|