mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-22 06:35:03 +01:00
Update notes for 2018-05-30
parent
0fafc7a626
commit
1eb62971a5
@ -365,3 +365,19 @@ $ sed 's/.*Item1.*/\n&/g' ~/cifor-duplicates.txt > ~/cifor-duplicates-cleaned.tx
```

- I told Vika to look through the list manually and indicate which ones are indeed duplicates that we should delete, and which ones to map to CIFOR's collection
- A few weeks ago Peter wanted a list of authors from the ILRI collections, so I need to find a way to get the handles of all those collections
- I can use the `/communities/{id}/collections` endpoint of the REST API but it only takes IDs (not handles) and doesn't seem to descend into sub communities
- Shit, so I need the IDs for the the top-level ILRI community and all its sub communities (and their sub communities)
- There has got to be a better way to do this than going to each community and getting their handles and IDs manually
- Oh shit, I literally already wrote a script to get all collections in a community hierarchy from the REST API: [rest-find-collections.py](https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50)
- The output isn't great, but all the handles and IDs are printed in debug mode:

```
$ ./rest-find-collections.py -u https://cgspace.cgiar.org/rest -d 10568/1 2> /tmp/ilri-collections.txt
```

- Then I format the list of handles and put it into this SQL query to export authors from items ONLY in those collections (too many to list here):

```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/67236','10568/67274',...))) group by text_value order by count desc) to /tmp/ilri-authors.csv with csv;
```
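The traversal described in the notes above (list a community's collections, then recurse into its sub communities because the collections endpoint does not descend) could look roughly like the Python below. This is a minimal sketch, not the linked rest-find-collections.py script: the `/communities/{id}/communities` endpoint, the `id` and `handle` JSON keys, and the helper names `fetch_json`, `find_collections`, and `sql_in_list` are assumptions made for illustration, and paging of large result sets is ignored.

```python
import json
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON body, asking for JSON explicitly."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def find_collections(rest_url, community_id, fetch=fetch_json):
    """Return (id, handle) pairs for every collection under a community."""
    base = f"{rest_url}/communities/{community_id}"
    # Collections that sit directly in this community.
    found = [(c["id"], c["handle"]) for c in fetch(f"{base}/collections")]
    # The collections endpoint does not descend, so visit each sub community.
    for sub in fetch(f"{base}/communities"):
        found.extend(find_collections(rest_url, sub["id"], fetch))
    return found


def sql_in_list(handles):
    """Format handles as a quoted, comma-separated list for a SQL IN (...) clause."""
    return ",".join("'" + h.replace("'", "''") + "'" for h in handles)
```

Calling `find_collections` with the REST base URL and the numeric ID of the top-level community would return the pairs, and `sql_in_list` turns the handles into the text that goes inside the `handle in (...)` part of the query. The `fetch` parameter exists only so the traversal can be exercised against canned responses without touching a live server.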
@ -27,7 +27,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
<meta property="article:published_time" content="2018-05-01T16:43:54+03:00"/>

<meta property="article:modified_time" content="2018-05-30T10:50:55-07:00"/>
<meta property="article:modified_time" content="2018-05-30T14:48:10-07:00"/>

@ -65,9 +65,9 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
"@type": "BlogPosting",
"headline": "May, 2018",
"url": "https://alanorth.github.io/cgspace-notes/2018-05/",
"wordCount": "3135",
"wordCount": "3361",
"datePublished": "2018-05-01T16:43:54+03:00",
"dateModified": "2018-05-30T10:50:55-07:00",
"dateModified": "2018-05-30T14:48:10-07:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -565,8 +565,24 @@ $ sed 's/.*Item1.*/\n&/g' ~/cifor-duplicates.txt > ~/cifor-duplicates-cle
<ul>
<li>I told Vika to look through the list manually and indicate which ones are indeed duplicates that we should delete, and which ones to map to CIFOR’s collection</li>
<li>A few weeks ago Peter wanted a list of authors from the ILRI collections, so I need to find a way to get the handles of all those collections</li>
<li>I can use the <code>/communities/{id}/collections</code> endpoint of the REST API but it only takes IDs (not handles) and doesn’t seem to descend into sub communities</li>
<li>Shit, so I need the IDs for the the top-level ILRI community and all its sub communities (and their sub communities)</li>
<li>There has got to be a better way to do this than going to each community and getting their handles and IDs manually</li>
<li>Oh shit, I literally already wrote a script to get all collections in a community hierarchy from the REST API: <a href="https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50">rest-find-collections.py</a></li>
<li>The output isn’t great, but all the handles and IDs are printed in debug mode:</li>
</ul>

<pre><code>$ ./rest-find-collections.py -u https://cgspace.cgiar.org/rest -d 10568/1 2> /tmp/ilri-collections.txt
</code></pre>

<ul>
<li>Then I format the list of handles and put it into this SQL query to export authors from items ONLY in those collections (too many to list here):</li>
</ul>

<pre><code>dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/67236','10568/67274',...))) group by text_value order by count desc) to /tmp/ilri-authors.csv with csv;
</code></pre>

@ -4,7 +4,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/2018-05/</loc>
<lastmod>2018-05-30T10:50:55-07:00</lastmod>
<lastmod>2018-05-30T14:48:10-07:00</lastmod>
</url>

<url>
@ -164,7 +164,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2018-05-30T10:50:55-07:00</lastmod>
<lastmod>2018-05-30T14:48:10-07:00</lastmod>
<priority>0</priority>
</url>

@ -175,7 +175,7 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
<lastmod>2018-05-30T10:50:55-07:00</lastmod>
<lastmod>2018-05-30T14:48:10-07:00</lastmod>
<priority>0</priority>
</url>

@ -187,13 +187,13 @@
<url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2018-05-30T10:50:55-07:00</lastmod>
<lastmod>2018-05-30T14:48:10-07:00</lastmod>
<priority>0</priority>
</url>

<url>
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
<lastmod>2018-05-30T10:50:55-07:00</lastmod>
<lastmod>2018-05-30T14:48:10-07:00</lastmod>
<priority>0</priority>
</url>
