mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-16 11:57:03 +01:00
485 lines
24 KiB
Markdown
485 lines
24 KiB
Markdown
---
|
|
title: "September, 2020"
|
|
date: 2020-09-02T15:35:54+03:00
|
|
author: "Alan Orth"
|
|
categories: ["Notes"]
|
|
---
|
|
|
|
## 2020-09-02
|
|
|
|
- Replace Marissa van Epp for Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS
|
|
- The AReS Explorer hasn't updated its index since 2020-08-22 when I last forced it
|
|
- I restarted it again now and told Moayad that the automatic indexing isn't working
|
|
- Add `Alliance of Bioversity International and CIAT` to affiliations on CGSpace
|
|
- Abenet told me that the general search text on AReS doesn't get reset when you use the "Reset Filters" button
|
|
- I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/39
|
|
- I filed an issue on OpenRXV to make some minor edits to the admin UI: https://github.com/ilri/OpenRXV/issues/40
|
|
|
|
<!--more-->
|
|
|
|
- I ran the country code tagger on CGSpace:
|
|
|
|
```
|
|
$ time chrt -b 0 dspace curate -t countrycodetagger -i all -r - -l 500 -s object | tee /tmp/2020-09-02-countrycodetagger.log
|
|
...
|
|
real 2m10.516s
|
|
user 1m43.953s
|
|
sys 0m15.192s
|
|
$ grep -c added /tmp/2020-09-02-countrycodetagger.log
|
|
39
|
|
```
|
|
|
|
- I still need to create a cron job for this...
|
|
- Sisay and Abenet said they can't log in with LDAP on DSpace Test (DSpace 6)
|
|
- I tried and I can't either... but it is working on CGSpace
|
|
- The error on DSpace 6 is:
|
|
|
|
```
|
|
2020-09-02 12:03:10,666 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=A629116488DCC467E1EA2062A2E2EFD7:ip_addr=92.220.02.201:failed_login:no DN found for user aorth
|
|
```
|
|
|
|
- I tried to query LDAP directly using the application credentials with ldapsearch and it works:
|
|
|
|
```
|
|
$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b "dc=cgiarad,dc=org" -D "applicationaccount@cgiarad.org" -W "(sAMAccountName=me)"
|
|
```
|
|
|
|
- According to the [DSpace 6 docs](https://wiki.lyrasis.org/display/DSDOC6x/Authentication+Plugins#AuthenticationPlugins-LDAPAuthentication) we need to escape commas in our LDAP parameters due to the new configuration system
|
|
- I added the commas and restarted DSpace (though technically we shouldn't need to restart due to the new config system hot reloading configs)
|
|
- Run all system updates on DSpace Test (linode26) and reboot it
|
|
- After the restart LDAP login works...
|
|
|
|
## 2020-09-03
|
|
|
|
- Fix some erroneous "review status" fields that Abenet noticed on AReS
|
|
- I used my `fix-metadata-values.py` and `delete-metadata-values.py` scripts with the following input files:
|
|
|
|
```
|
|
$ cat 2020-09-03-fix-review-status.csv
|
|
dc.description.version,correct
|
|
Externally Peer Reviewed,Peer Review
|
|
Peer Reviewed,Peer Review
|
|
Peer review,Peer Review
|
|
Peer reviewed,Peer Review
|
|
Peer-Reviewed,Peer Review
|
|
Peer-reviewed,Peer Review
|
|
peer Review,Peer Review
|
|
$ cat 2020-09-03-delete-review-status.csv
|
|
dc.description.version
|
|
Report
|
|
Formally Published
|
|
Poster
|
|
Unrefereed reprint
|
|
$ ./delete-metadata-values.py -i 2020-09-03-delete-review-status.csv -db dspace -u dspace -p 'fuuu' -f dc.description.version -m 68
|
|
$ ./fix-metadata-values.py -i 2020-09-03-fix-review-status.csv -db dspace -u dspace -p 'fuuu' -f dc.description.version -t 'correct' -m 68
|
|
```
|
|
|
|
- Start reviewing 95 items for IITA (20201stbatch)
|
|
- I used my [csv-metadata-quality](https://github.com/ilri/csv-metadata-quality) tool to check and fix some low-hanging fruit first
|
|
- This fixed a few unnecessary Unicode, excessive whitespace, invalid multi-value separator, and duplicate metadata values
|
|
- Then I looked at the data in OpenRefine and noticed some things:
|
|
- All issue dates use year only, but some have months in the citation so they could be more specific
|
|
- I normalized all the DOIs to use "https://doi.org" format
|
|
- I fixed a few AGROVOC subjects with a simple GREL: `value.replace("GRAINS","GRAIN").replace("SOILS","SOIL").replace("CORN","MAIZE")`
|
|
- But there are a few more that are invalid that she will have to look at
|
|
- I uploaded the items to [DSpace Test](https://dspacetest.cgiar.org/handle/10568/108357) and it was apparently successful but I get these errors to the console:
|
|
|
|
```
|
|
Thu Sep 03 12:26:33 CEST 2020 | Query:containerItem:ea7a2648-180d-4fce-bdc5-c3aa2304fc58
|
|
Error while updating
|
|
java.lang.NullPointerException
|
|
at com.atmire.dspace.cua.CUASolrLoggerServiceImpl$5.visit(SourceFile:1131)
|
|
at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.visitEachStatisticShard(SourceFile:212)
|
|
at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1104)
|
|
at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1093)
|
|
at org.dspace.statistics.StatisticsLoggingConsumer.consume(SourceFile:104)
|
|
at org.dspace.event.BasicDispatcher.consume(BasicDispatcher.java:177)
|
|
at org.dspace.event.BasicDispatcher.dispatch(BasicDispatcher.java:123)
|
|
at org.dspace.core.Context.dispatchEvents(Context.java:455)
|
|
at org.dspace.core.Context.commit(Context.java:424)
|
|
at org.dspace.core.Context.complete(Context.java:380)
|
|
at org.dspace.app.bulkedit.MetadataImport.main(MetadataImport.java:1399)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
|
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
|
at java.lang.reflect.Method.invoke(Method.java:498)
|
|
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
|
|
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
|
|
```
|
|
|
|
- There are more in the DSpace log so I will raise it with Atmire immediately
|
|
|
|
## 2020-09-04
|
|
|
|
- I was checking the recent IITA data for duplicates when I noticed that one in CIFOR's Archive and saw that CIFOR has updated a bunch of their website URLs, for example:
|
|
- http://www.cifor.org/nc/online-library/browse/view-publication/publication/151.html → https://www.cifor.org/knowledge/publication/151
|
|
- https://www.cifor.org/library/4033 → https://www.cifor.org/knowledge/publication/4033
|
|
- https://www.cifor.org/pid/5087 → https://www.cifor.org/knowledge/publication/5087
|
|
- I will update our nearly 6,000 metadata values for CIFOR in the database accordingly:
|
|
|
|
```
|
|
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^(http://)?www\.cifor\.org/(nc/)?online-library/browse/view-publication/publication/([[:digit:]]+)\.html$', 'https://www.cifor.org/knowledge/publication/\3') WHERE metadata_field_id=219 AND text_value ~ 'www\.cifor\.org/(nc/)?online-library/browse/view-publication/publication/[[:digit:]]+';
|
|
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^https?://www\.cifor\.org/library/([[:digit:]]+)/?$', 'https://www.cifor.org/knowledge/publication/\1') WHERE metadata_field_id=219 AND text_value ~ 'https?://www\.cifor\.org/library/[[:digit:]]+/?';
|
|
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^https?://www\.cifor\.org/pid/([[:digit:]]+)/?$', 'https://www.cifor.org/knowledge/publication/\1') WHERE metadata_field_id=219 AND text_value ~ 'https?://www\.cifor\.org/pid/[[:digit:]]+';
|
|
```
|
|
|
|
- I did some cleanup on the author affiliations of the IITA data our 2019-04 list using reconcile-csv and OpenRefine:
|
|
- `$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id`
|
|
- I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: `if(cell.recon.matched, cell.recon.match.name, value)`
|
|
- I mapped one duplicated from the CIFOR Archives and re-uploaded the 94 IITA items to a new collection on [DSpace Test](https://dspacetest.cgiar.org/handle/10568/108453)
|
|
|
|
## 2020-09-08
|
|
|
|
- I noticed that the "share" link in AReS wasn't working properly because it excludes the "explorer" part of the URI
|
|
|
|
![AReS share link broken](/cgspace-notes/2020/09/ares-share-link.png)
|
|
|
|
- I filed an issue on GitHub: https://github.com/ilri/OpenRXV/issues/41
|
|
- I uploaded the 94 IITA items that I had been working on last week to CGSpace
|
|
- RTB emailed to ask why they are getting HTTP 503 errors during harvesting to the RTB WordPress website
|
|
- From the screenshot I can see they are requesting URLs like this:
|
|
|
|
```
|
|
https://cgspace.cgiar.org/bitstream/handle/10568/82745/Characteristics-Silage.JPG
|
|
```
|
|
|
|
- So they end up getting rate limited due to the XMLUI rate limits
|
|
- I told them to use the REST API bitstream retrieve links, because we don't have any rate limits there
|
|
|
|
## 2020-09-09
|
|
|
|
- Wire up the systemd service/timer for the CGSpace Country Code Tagger curation task in the [Ansible infrastructure scripts](https://github.com/ilri/rmg-ansible-public)
|
|
- ~~For now it won't work on DSpace 6 because the curation task invocation needs to be slightly different (minus the `-l` parameter) and for some reason the task isn't working on DSpace Test (version 6) right now~~
|
|
- I added DSpace 6 support to the playbook templates...
|
|
- Run system updates on DSpace Test (linode26), re-deploy the DSpace 6 test branch, and reboot the server
|
|
- After rebooting I deleted old copies of the cgspace-java-helpers JAR in the DSpace lib directory and then the curation worked
|
|
- To my great surprise the curation worked (and completed, albeit a few times slower) on my local DSpace 6 environment as well:
|
|
|
|
```
|
|
$ ~/dspace63/bin/dspace curate -t countrycodetagger -i all -s object
|
|
```
|
|
|
|
## 2020-09-10
|
|
|
|
- I checked the country code tagger on CGSpace and DSpace Test and it ran fine from the systemd timer last night... w00t
|
|
- I started looking at Peter's changes to the CGSpace regions that were proposed in 2020-07
|
|
- The changes will be:
|
|
|
|
```
|
|
$ cat 2020-09-10-fix-cgspace-regions.csv
|
|
cg.coverage.region,correct
|
|
EAST AFRICA,EASTERN AFRICA
|
|
WEST AFRICA,WESTERN AFRICA
|
|
SOUTHEAST ASIA,SOUTHEASTERN ASIA
|
|
SOUTH ASIA,SOUTHERN ASIA
|
|
AFRICA SOUTH OF SAHARA,SUB-SAHARAN AFRICA
|
|
NORTH AFRICA,NORTHERN AFRICA
|
|
WEST ASIA,WESTERN ASIA
|
|
SOUTHWEST ASIA,SOUTHWESTERN ASIA
|
|
$ ./fix-metadata-values.py -i 2020-09-10-fix-cgspace-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -t 'correct' -m 227 -d -n
|
|
Connected to database.
|
|
Would fix 12227 occurences of: EAST AFRICA
|
|
Would fix 7996 occurences of: WEST AFRICA
|
|
Would fix 3515 occurences of: SOUTHEAST ASIA
|
|
Would fix 3443 occurences of: SOUTH ASIA
|
|
Would fix 1134 occurences of: AFRICA SOUTH OF SAHARA
|
|
Would fix 357 occurences of: NORTH AFRICA
|
|
Would fix 81 occurences of: WEST ASIA
|
|
Would fix 3 occurences of: SOUTHWEST ASIA
|
|
```
|
|
|
|
- I think we need to wait for the web team, though, as they need to update their mappings
|
|
- Not to mention that we'll need to give WLE and CCAFS time to update their harvesters as well... hmmm
|
|
- Looking at the top user agents active on CGSpace in 2020-08 and I see:
|
|
- `Delphi 2009`: 235353 (this is GARDIAN harvester I guess, as the IP is in Greece)
|
|
- `Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)`: 57004 (IP is 18.196.100.94, and the requests seem to be for CTA's content)
|
|
- `RTB website BOT`: 12282
|
|
- `ILRI Livestock Website Publications importer BOT`: 9393
|
|
- Shit, I meant to add Delphi to the DSpace spider agents list last month but I guess I didn't commit the change
|
|
- HTTrack is in the agents list so I'm not sure why DSpace registers a hit from that request
|
|
- Also, I am surprised to see the RTB and ILRI bots here because they have "BOT" in the name and that should also be dropped
|
|
- I also see hits from `curl` and `Java/1.8.0_66` and `Apache-HttpClient` so WTF... those are supposed to be dropped by the default agents list
|
|
- Some IP `2607:f298:5:101d:f816:3eff:fed9:a484` made 9,000 requests with the `RI/1.0` user agent this year...
|
|
- That's on DreamHost...?
|
|
- I purged 448658 hits from these agents and added `Delphi` to our local agents overload for Solr as well as Tomcat's Crawler Session Manager Valve so that it forces them to re-use a single session
|
|
- I made a pull request on the COUNTER-Robots project for the Daum robot: https://github.com/atmire/COUNTER-Robots/pull/38
|
|
- This bot made 8,000 requests to CGSpace this year
|
|
- I purged about 20,000 total requests from this bot from our Solr stats for the last few years
|
|
|
|
## 2020-09-11
|
|
|
|
- Peter noticed that an export from AReS shows some items with zero views and others with zero views/downloads, but on CGSpace and in the statistics API there are views/downloads
|
|
- I need to ask Moayad...
|
|
|
|
## 2020-09-12
|
|
|
|
- Carlos Tejo from the LandPortal emailed to ask for advice about integrating their [LandVoc](https://landvoc.org/) vocabulary, which is a subset of AGROVOC, into DSpace
|
|
- I told him that they could use the DSpace authority control framework and sent an example of the VIAFAuthority from the DSpace-CRIS project: https://github.com/4Science/DSpace/blob/dspace-6_x_x-cris/dspace-api/src/main/java/org/dspace/content/authority/VIAFAuthority.java
|
|
- Redeploy the latest `5_x-prod` branch on CGSpace, re-run the latest Ansible DSpace playbook, run all system updates, and reboot the server (linode18)
|
|
- This will bring the latest bot lists for Solr and Tomcat
|
|
- I had to restart Tomcat 7 three times before all Solr statistics cores came up OK
|
|
- Leroy and Carol from CIAT/Bioversity were asking for information about posting to the CGSpace REST API from Sharepoint
|
|
- I told them that we don't allow this yet, but that we need to check in the future whether content can be posted to a workflow
|
|
|
|
## 2020-09-15
|
|
|
|
- Charlotte from Altmetric said they had issues parsing the XML file I sent them last month
|
|
- I told them that it was mimicking the same format that they had sent me (fourteen pages of XML responses concatenated together)!
|
|
- A few days ago IWMI asked us if we can add a new field on CGSpace for their library identifier
|
|
- The IDs look like this: H049940
|
|
- I suggested that we use `cg.identifier.iwmilibrary`
|
|
- I added it to the input forms and push it to the `5_x-prod` and 6.x branches and will re-deploy it in the next few days
|
|
- Abenet asked me to import sixty-nine (69) CIP Annual Reports to CGSpace
|
|
- I looked at the data in OpenRefine and it is very good quality
|
|
- I only added descriptions to the filename field so that SAFBuilder will add them to the bitstreams on import:
|
|
|
|
```
|
|
value + "__description:" + cells["dc.type"].value
|
|
```
|
|
|
|
- Then I created a SAF bundle with SAFBuilder:
|
|
|
|
```
|
|
$ ./safbuilder.sh -c ~/Downloads/cip-annual-reports/cip-reports.csv
|
|
```
|
|
|
|
- And imported them into my local test instance of CGSpace:
|
|
|
|
```
|
|
$ ~/dspace/bin/dspace import -a -e y.arrr@cgiar.org -m /tmp/2020-09-15-cip-annual-reports.map -s ~/Downloads/cip-annual-reports/SimpleArchiveFormat
|
|
```
|
|
|
|
- Then I uploaded them to CGSpace
|
|
|
|
## 2020-09-16
|
|
|
|
- Looking further into Carlos Tejos's question about integrating LandVoc (the AGROVOC subset) into DSpace
|
|
- I see that you can actually get LandVoc concepts directly from AGROVOC's SPARQL, for example with [this query](http://agrovoc.uniroma2.it/sparql#query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX+skos%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%3E%0A%0ASELECT+%3Fconcept%0AWHERE+%7B%0A++%3Fconcept+a+skos%3AConcept+%3B%0A+++++++++++skos%3AinScheme+%3Chttp%3A%2F%2Flandvoc.org%2Flandvoc%3E+.%0A%0A%7D+ORDER+BY+%3Fconcept&contentTypeConstruct=text%2Fturtle&contentTypeSelect=application%2Fsparql-results%2Bjson&endpoint=http%3A%2F%2Fagrovoc.uniroma2.it%2Fsparql&requestMethod=POST&tabTitle=Query&headers=%7B%7D&outputFormat=table)
|
|
|
|
![AGROVOC LandVoc SPARQL](/cgspace-notes/2020/09/agrovoc-landvoc-sparql.png)
|
|
|
|
- So maybe we can query AGROVOC directly using a similar method to [DSpace-CRIS's GettyAuthority](https://github.com/4Science/DSpace/blob/dspace-5_x_x-cris/dspace-api/src/main/java/org/dspace/content/authority/TGNAuthority.java)
|
|
- I wired up DSpace-CRIS's VIAFAuthority to see how authorities for auto suggested names get stored
|
|
- After submission you can see the item's VIAF identifier:
|
|
|
|
![VIAF authority](/cgspace-notes/2020/09/viaf-authority.png)
|
|
|
|
- And this identifier is the ID on VIAF, pretty cool!
|
|
|
|
![VIAF entry for Charles Darwin](/cgspace-notes/2020/09/viaf-darwin.png)
|
|
|
|
- I did a similar test with the Getty Thesaurus of Geographic Names (TGN) and it stores the concept URI in the authority:
|
|
|
|
![TGNAuthority](/cgspace-notes/2020/09/tgn-concept-uri.png)
|
|
|
|
- But the authority values are not exposed anywhere as metadata...
|
|
- I need to play with it a bit more I guess...
|
|
- The nice thing is that the Getty example from DSpace-CRIS uses SPARQL as well, and the TGN authority extends it
|
|
- We could use a similar model for AGROVOC/LandVoc very easily
|
|
|
|
## 2020-09-17
|
|
|
|
- Maria from Bioveristy asked about the ORCID identifier for one of her colleagues that seems to have been removed from our list
|
|
- I re-added it to our controlled vocabulary and added the identifier to fifty-one of his existing items on CGSpace using my script:
|
|
|
|
```
|
|
$ cat 2020-09-17-add-bioversity-orcids.csv
|
|
dc.contributor.author,cg.creator.id
|
|
"Etten, Jacob van","Jacob van Etten: 0000-0001-7554-2558"
|
|
"van Etten, Jacob","Jacob van Etten: 0000-0001-7554-2558"
|
|
$ ./add-orcid-identifiers-csv.py -i 2020-09-17-add-bioversity-orcids.csv -db dspace -u dspace -p 'dom@in34sniper'
|
|
```
|
|
|
|
- I sent a follow-up message to Atmire to look into the two remaining issues with the DSpace 6 upgrade
|
|
- First is the fact that we have zero results in our Listings and Reports, for any search
|
|
- Second is the error we get during CSV imports
|
|
- Help Natalia and Cathy from Bioversity-CIAT with their OpenSearch query on "trade offs" again
|
|
- They wanted to build a search query with multiple filters (type, crpsubject, status) and the general query "trade offs"
|
|
- I found a great [reference for DSpace's OpenSearch syntax](https://www.kiwi.fi/pages/viewpage.action?pageId=45782169) (albeit in Finnish, but the example URLs show the syntax clearly)
|
|
- We can use quotes and `AND` and `OR` and even group search parameters with parenthesis!
|
|
- So now I built a query for Natalia which uses these (showing without URL encoding so you can see the syntax):
|
|
|
|
```
|
|
https://cgspace.cgiar.org/open-search/discover?query=type:"Journal Article" AND status:"Open Access" AND crpsubject:"Water, Land and Ecosystems" AND "tradeoffs"&rpp=100
|
|
```
|
|
|
|
- I noticed that my `move-collections.sh` script didn't work on DSpace 6 because of the change from IDs to UUIDs, so I modified it to quote the collection `resource_id` parameters in the PostgreSQL query
|
|
|
|
## 2020-09-18
|
|
|
|
- Help Natalia with her WLE "tradeoffs" search query again...
|
|
|
|
## 2020-09-20
|
|
|
|
- Deploy latest 5_x-prod branch on CGSpace, run all system updates, and reboot the server
|
|
- To my great surprise, all the Solr statistics cores came up correctly after reboot
|
|
- Deploy latest 6_x-dev branch on DSpace Test, run all system updates and reboot the server
|
|
|
|
## 2020-09-22
|
|
|
|
- Abenet sent some feedback about AReS
|
|
- The item views and downloads are still incorrect
|
|
- I looked in the server's API logs and there are no errors, and the database has many more views/downloads:
|
|
|
|
```
|
|
dspacestatistics=# SELECT SUM(views) FROM items;
|
|
sum
|
|
----------
|
|
15714024
|
|
(1 row)
|
|
|
|
dspacestatistics=# SELECT SUM(downloads) FROM items;
|
|
sum
|
|
----------
|
|
13979911
|
|
(1 row)
|
|
```
|
|
- I deleted "Report" from twelve items that had it in their peer review field:
|
|
|
|
```
|
|
dspace=# BEGIN;
|
|
BEGIN
|
|
dspace=# DELETE FROM metadatavalue WHERE text_value='Report' AND resource_type_id=2 AND metadata_field_id=68;
|
|
DELETE 12
|
|
dspace=# COMMIT;
|
|
```
|
|
|
|
- I added all CG center- and CRP-specific subject fields and mapped them to `dc.subject` in AReS
|
|
- After forcing a re-harvesting now the review status is much cleaner and the missing subjects are available
|
|
- Last week Natalia from CIAT had asked me to download all the PDFs for a certain query:
|
|
- items with status "Open Access"
|
|
- items with type "Journal Article"
|
|
- items containing any of the following words: water land and ecosystems & trade offs
|
|
- The resulting OpenSearch query is: https://cgspace.cgiar.org/open-search/discover?query=type:"Journal Article" AND status:"Open Access" AND Water Land Ecosystems trade offs&rpp=1
|
|
- There were 241 results with a total of 208 PDFs, which I downloaded with my `get-wle-pdfs.py` script and shared to her via bashupload.com
|
|
|
|
## 2020-09-23
|
|
|
|
- Peter said he was having problems submitting items to CGSpace
|
|
- On a hunch I looked at the PostgreSQL locks in Munin and indeed the normal issue with locks is back (though I haven't seen it in a few months?)
|
|
|
|
![PostgreSQL connections day](/cgspace-notes/2020/09/postgres_connections_ALL-day.png)
|
|
|
|
- Instead of restarting Tomcat I restarted the PostgreSQL service and then Peter said he was able to submit the item...
|
|
- Experiment with doing direct queries for items in the [dspace-statistics-api](https://github.com/ilri/dspace-statistics-api)
|
|
- I tested querying a handful of item UUIDs with a date range and returning their hits faceted by `id`
|
|
- Assuming a list of item UUIDs was posted to the REST API we could prepare them for a Solr query by joining them into a string with "OR" and escaping the hyphens:
|
|
|
|
```
|
|
...
|
|
item_ids = ['0079470a-87a1-4373-beb1-b16e3f0c4d81', '007a9df1-0871-4612-8b28-5335982198cb']
|
|
item_ids_str = ' OR '.join(item_ids).replace('-', '\-')
|
|
...
|
|
solr_query_params = {
|
|
"q": f"id:({item_ids_str})",
|
|
"fq": "type:2 AND isBot:false AND statistics_type:view AND time:[2020-01-01T00:00:00Z TO 2020-09-02T00:00:00Z]",
|
|
"facet": "true",
|
|
"facet.field": "id",
|
|
"facet.mincount": 1,
|
|
"facet.limit": 1,
|
|
"facet.offset": 0,
|
|
"stats": "true",
|
|
"stats.field": "id",
|
|
"stats.calcdistinct": "true",
|
|
"shards": shards,
|
|
"rows": 0,
|
|
"wt": "json",
|
|
}
|
|
```
|
|
|
|
- The date range format for Solr is important, but it seems we only need to add `T00:00:00Z` to the normal ISO 8601 YYYY-MM-DD strings
|
|
|
|
## 2020-09-25
|
|
|
|
- I did some more work on the dspace-statistics-api and finalized the support for sending a POST to `/items`:
|
|
|
|
```
|
|
$ curl -s -d @request.json https://dspacetest.cgiar.org/rest/statistics/items | json_pp
|
|
{
|
|
"currentPage" : 0,
|
|
"limit" : 10,
|
|
"statistics" : [
|
|
{
|
|
"downloads" : 3329,
|
|
"id" : "b2c1bbfd-65b0-438c-9e49-d271c49b2696",
|
|
"views" : 1565
|
|
},
|
|
{
|
|
"downloads" : 3797,
|
|
"id" : "f44cf173-2344-4eb2-8f00-ee55df32c76f",
|
|
"views" : 48
|
|
},
|
|
{
|
|
"downloads" : 11064,
|
|
"id" : "8542f9da-9ce1-4614-abf4-f2e3fdb4b305",
|
|
"views" : 26
|
|
},
|
|
{
|
|
"downloads" : 6782,
|
|
"id" : "2324aa41-e9de-4a2b-bc36-16241464683e",
|
|
"views" : 19
|
|
},
|
|
{
|
|
"downloads" : 48,
|
|
"id" : "0fe573e7-042a-4240-a4d9-753b61233908",
|
|
"views" : 12
|
|
},
|
|
{
|
|
"downloads" : 0,
|
|
"id" : "000e61ca-695d-43e5-9ab8-1f3fd7a67a32",
|
|
"views" : 4
|
|
},
|
|
{
|
|
"downloads" : 0,
|
|
"id" : "000dc7cd-9485-424b-8ecf-78002613cc87",
|
|
"views" : 1
|
|
},
|
|
{
|
|
"downloads" : 0,
|
|
"id" : "000e1616-3901-4431-80b1-c6bc67312d8c",
|
|
"views" : 1
|
|
},
|
|
{
|
|
"downloads" : 0,
|
|
"id" : "000ea897-5557-49c7-9f54-9fa192c0f83b",
|
|
"views" : 1
|
|
},
|
|
{
|
|
"downloads" : 0,
|
|
"id" : "000ec427-97e5-4766-85a5-e8dd62199ab5",
|
|
"views" : 1
|
|
}
|
|
],
|
|
"totalPages" : 13
|
|
}
|
|
```
|
|
|
|
- I deployed it on DSpace Test and sent a note to Salem so he can test it
|
|
- I still need to add tests...
|
|
- After that I will probably tag it as version 1.3.0
|
|
|
|
## 2020-09-25
|
|
|
|
- Atmire responded with some notes about the issues we're having with CUA and L&R on DSpace Test
|
|
- They think they have found the reason the issues are happening...
|
|
|
|
## 2020-09-29
|
|
|
|
- Atmire sent a pull request yesterday with a potential fix for the Listings and Reports (L&R) issue
|
|
- I tried to build it on DSpace Test but I got an HTTP 401 Unauthorized for the artifact
|
|
- I sent them a message...
|
|
|
|
## 2020-09-30
|
|
|
|
- Experiment with re-creating IWMI's "Monthly Abstract" type report with an AReS template
|
|
- The template library for reports is: https://docxtemplater.com
|
|
- Conditions start with a pound and end with a slash: {#items} {/items}
|
|
- An inverted section begins with a caret (hat) and ends with a slash: {^citation} No citation{/citation}
|
|
- I found a bug: templates with a space in the file name don't download
|
|
- It would be nice if we could use [angular expressions](https://docxtemplater.readthedocs.io/en/latest/angular_parse.html) to make more complex templates
|
|
- Ability to iterate over authors (to change the separator)
|
|
- Ability to get item number in a loop (for a list)
|
|
- To do things like checking if a CRP is "WLE"
|
|
|
|
<!-- vim: set sw=2 ts=2: -->
|