Update notes for 2018-05-30

2018-05-30 17:44:58 -07:00
parent 0fafc7a626
commit 1eb62971a5
3 changed files with 40 additions and 8 deletions


```
$ sed 's/.*Item1.*/\n&/g' ~/cifor-duplicates.txt > ~/cifor-duplicates-cleaned.tx
```
- I told Vika to look through the list manually and indicate which ones are indeed duplicates that we should delete, and which ones to map to CIFOR's collection
- A few weeks ago Peter wanted a list of authors from the ILRI collections, so I need to find a way to get the handles of all those collections
- I can use the `/communities/{id}/collections` endpoint of the REST API, but it only takes IDs (not handles) and doesn't seem to descend into subcommunities
- Shit, so I need the IDs for the top-level ILRI community and all its subcommunities (and their subcommunities)
- There has got to be a better way to do this than going to each community and getting its handle and ID manually
- Oh shit, I literally already wrote a script to get all collections in a community hierarchy from the REST API: [rest-find-collections.py](https://gist.github.com/alanorth/ddd7f555f0e487fe0e9d3eb4ff26ce50)
- The output isn't great, but all the handles and IDs are printed in debug mode:
```
$ ./rest-find-collections.py -u https://cgspace.cgiar.org/rest -d 10568/1 2> /tmp/ilri-collections.txt
```
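For reference, the recursion such a script has to do can be sketched in a few lines of Python. This is not the code from rest-find-collections.py. It is a minimal sketch that assumes the DSpace 5 REST endpoints `/communities/{id}/collections` and `/communities/{id}/communities` return JSON lists whose objects carry `id` and `handle` keys. `Fetch`, `make_fetch`, and `find_collections` are hypothetical names. Passing the fetch function in makes it possible to test the traversal without hitting CGSpace.

```python
import json
import urllib.request
from typing import Callable, Dict, List, Set

# Hypothetical helper type: takes a REST path such as "/communities/1/collections"
# and returns the decoded JSON list (assumed to be a list of dicts).
Fetch = Callable[[str], List[Dict]]


def make_fetch(base_url: str) -> Fetch:
    """Build a fetch function against a DSpace REST base URL (assumed JSON output)."""
    def fetch(path: str) -> List[Dict]:
        # Ask for JSON explicitly; the REST API can also answer in XML
        req = urllib.request.Request(base_url + path, headers={"Accept": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def find_collections(fetch: Fetch, community_id: int) -> List[Dict]:
    """Return every collection below community_id, descending depth-first
    into all subcommunities."""
    found: List[Dict] = []
    seen: Set[int] = set()  # don't walk the same community twice

    def walk(cid: int) -> None:
        if cid in seen:
            return
        seen.add(cid)
        # Collections attached directly to this community
        found.extend(fetch(f"/communities/{cid}/collections"))
        # Then recurse into each subcommunity
        for sub in fetch(f"/communities/{cid}/communities"):
            walk(sub["id"])

    walk(community_id)
    return found
```

Something like `find_collections(make_fetch("https://cgspace.cgiar.org/rest"), 1)` would then walk the whole hierarchy. This assumes you already know the numeric ID of the top-level community. The `1` is a placeholder, not the real ID behind `10568/1`.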
- Then I format the list of handles and put it into this SQL query to export authors from items ONLY in those collections (the full list of handles is too long to include here):
```
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/67236','10568/67274',...))) group by text_value order by count desc) to /tmp/ilri-authors.csv with csv;
```
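Because `\copy` is a psql meta-command, the whole query has to stay on one line, which makes pasting a long list of handles by hand fiddly. A small Python sketch that builds that line from a list of collection handles could look like this. `build_copy_command` and the `HANDLE_RE` sanity check are hypothetical, and the query text is copied from above.

The sketch assumes the input contains only collection handles. The subquery compares `handle.resource_id` against `collection2item.collection_id`. A community handle from the debug output, such as `10568/1`, would therefore be treated as a collection ID and could pull in an unrelated collection that happens to share that numeric ID.

```python
import re
from typing import Iterable

# Hypothetical sanity check: handles look like <prefix>/<digits>, e.g. 10568/67236
HANDLE_RE = re.compile(r"^\d+/\d+$")

# The query from above, with the handle list replaced by a placeholder
QUERY = (
    "\\copy (select distinct text_value, count(*) from metadatavalue "
    "where metadata_field_id = (select metadata_field_id from metadatafieldregistry "
    "where element = 'contributor' and qualifier = 'author') "
    "AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item "
    "where collection_id IN (select resource_id from handle where handle in ({handles}))) "
    "group by text_value order by count desc) to /tmp/ilri-authors.csv with csv;"
)


def build_copy_command(handles: Iterable[str]) -> str:
    """Return the one-line psql \\copy command for the given collection handles."""
    # Strip whitespace, drop blanks, and de-duplicate while keeping order
    unique = list(dict.fromkeys(h.strip() for h in handles if h.strip()))
    for h in unique:
        if not HANDLE_RE.match(h):
            raise ValueError(f"not a handle: {h!r}")
    in_list = ",".join(f"'{h}'" for h in unique)
    return QUERY.format(handles=in_list)
```

Printing `build_copy_command(handles)` and pasting the result into `psql` should then produce the same CSV as the hand-built query.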