---
title: "September, 2020"
date: 2020-09-02T15:35:54+03:00
author: "Alan Orth"
categories: ["Notes"]
---

## 2020-09-02

- Replace Marissa van Epp with Rhys Bucknall in the CCAFS groups on CGSpace because Marissa no longer works at CCAFS
- The AReS Explorer hasn't updated its index since 2020-08-22, when I last forced it
  - I restarted it again now and told Moayad that the automatic indexing isn't working
- Add `Alliance of Bioversity International and CIAT` to affiliations on CGSpace
- Abenet told me that the general search text on AReS doesn't get reset when you use the "Reset Filters" button
  - I filed a bug on OpenRXV: https://github.com/ilri/OpenRXV/issues/39
- I filed an issue on OpenRXV to make some minor edits to the admin UI: https://github.com/ilri/OpenRXV/issues/40

<!--more-->

- I ran the country code tagger on CGSpace:

```
$ time chrt -b 0 dspace curate -t countrycodetagger -i all -r - -l 500 -s object | tee /tmp/2020-09-02-countrycodetagger.log
...
real 2m10.516s
user 1m43.953s
sys 0m15.192s
$ grep -c added /tmp/2020-09-02-countrycodetagger.log
39
```

- I still need to create a cron job for this...
- Sisay and Abenet said they can't log in with LDAP on DSpace Test (DSpace 6)
  - I tried and I can't either... but it is working on CGSpace
  - The error on DSpace 6 is:

```
2020-09-02 12:03:10,666 INFO org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=A629116488DCC467E1EA2062A2E2EFD7:ip_addr=92.220.02.201:failed_login:no DN found for user aorth
```

- I tried to query LDAP directly using the application credentials with `ldapsearch` and it works:

```
$ ldapsearch -x -H ldaps://AZCGNEROOT2.CGIARAD.ORG:636/ -b "dc=cgiarad,dc=org" -D "applicationaccount@cgiarad.org" -W "(sAMAccountName=me)"
```

- According to the [DSpace 6 docs](https://wiki.lyrasis.org/display/DSDOC6x/Authentication+Plugins#AuthenticationPlugins-LDAPAuthentication) we need to escape commas in our LDAP parameters due to the new configuration system
  - I escaped the commas and restarted DSpace (though technically we shouldn't need to restart, because the new config system hot-reloads configs)
- Run all system updates on DSpace Test (linode26) and reboot it
  - After the restart, LDAP login works...

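The comma escaping mentioned in the LDAP notes above can be illustrated with a small Python sketch. It assumes, as the linked DSpace 6 docs describe, that an unescaped comma in a configuration value is read as a list separator and so must be written as `\,`. The helper name `escape_commas` is hypothetical and not part of DSpace, and the sample value is simply the search base from the `ldapsearch` command.

```python
import re


def escape_commas(value: str) -> str:
    """Backslash-escape every comma that is not already escaped."""
    # The negative lookbehind skips commas already preceded by a backslash,
    # so applying the function twice gives the same result as applying it once.
    return re.sub(r"(?<!\\),", r"\\,", value)


# Sample value only: the search base used with ldapsearch in these notes
search_context = "dc=cgiarad,dc=org"
print(escape_commas(search_context))
```

This only shows the string transformation; which LDAP settings actually contain commas depends on the local configuration.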
## 2020-09-03

- Fix some erroneous "review status" fields that Abenet noticed on AReS
  - I used my `fix-metadata-values.py` and `delete-metadata-values.py` scripts with the following input files:

```
$ cat 2020-09-03-fix-review-status.csv
dc.description.version,correct
Externally Peer Reviewed,Peer Review
Peer Reviewed,Peer Review
Peer review,Peer Review
Peer reviewed,Peer Review
Peer-Reviewed,Peer Review
Peer-reviewed,Peer Review
peer Review,Peer Review
$ cat 2020-09-03-delete-review-status.csv
dc.description.version
Report
Formally Published
Poster
Unrefereed reprint
$ ./delete-metadata-values.py -i 2020-09-03-delete-review-status.csv -db dspace -u dspace -p 'fuuu' -f dc.description.version -m 68
$ ./fix-metadata-values.py -i 2020-09-03-fix-review-status.csv -db dspace -u dspace -p 'fuuu' -f dc.description.version -t 'correct' -m 68
```

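As a rough illustration of the net effect of those two input files, here is a minimal Python sketch that applies the same deletions and corrections to an in-memory list. It is not how `fix-metadata-values.py` or `delete-metadata-values.py` work (they operate on the database), and it assumes exact, case-sensitive matching. The function name `clean_review_status` and the sample values are hypothetical.

```python
# Corrections copied from 2020-09-03-fix-review-status.csv
FIXES = {
    "Externally Peer Reviewed": "Peer Review",
    "Peer Reviewed": "Peer Review",
    "Peer review": "Peer Review",
    "Peer reviewed": "Peer Review",
    "Peer-Reviewed": "Peer Review",
    "Peer-reviewed": "Peer Review",
    "peer Review": "Peer Review",
}

# Values copied from 2020-09-03-delete-review-status.csv
DELETIONS = {"Report", "Formally Published", "Poster", "Unrefereed reprint"}


def clean_review_status(values):
    """Drop values marked for deletion and correct known variants."""
    cleaned = []
    for value in values:
        if value in DELETIONS:
            continue  # removed entirely, as the delete script would
        # Exact-match lookup (an assumption); unknown values pass through
        cleaned.append(FIXES.get(value, value))
    return cleaned


# Hypothetical sample, including one value that neither file mentions
sample = ["Peer-reviewed", "Poster", "Peer Review", "Internal Review"]
print(clean_review_status(sample))
```

Running the deletions before the corrections mirrors the order of the two commands above, although with these particular files the order makes no difference because no value appears in both.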
- Start reviewing 95 items for IITA (20201stbatch)
  - I used my [csv-metadata-quality](https://github.com/ilri/csv-metadata-quality) tool to check and fix some low-hanging fruit first
  - This fixed a few unnecessary Unicode characters, excessive whitespace, invalid multi-value separators, and duplicate metadata values
  - Then I looked at the data in OpenRefine and noticed some things:
    - All issue dates use year only, but some have months in the citation so they could be more specific
    - I normalized all the DOIs to use "https://doi.org" format
    - I fixed a few AGROVOC subjects with a simple GREL: `value.replace("GRAINS","GRAIN").replace("SOILS","SOIL").replace("CORN","MAIZE")`
    - But there are a few more that are invalid that she will have to look at
- I uploaded the items to [DSpace Test](https://dspacetest.cgiar.org/handle/10568/108357) and the import was apparently successful, but I got these errors on the console:

```
Thu Sep 03 12:26:33 CEST 2020 | Query:containerItem:ea7a2648-180d-4fce-bdc5-c3aa2304fc58
Error while updating
java.lang.NullPointerException
    at com.atmire.dspace.cua.CUASolrLoggerServiceImpl$5.visit(SourceFile:1131)
    at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.visitEachStatisticShard(SourceFile:212)
    at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1104)
    at com.atmire.dspace.cua.CUASolrLoggerServiceImpl.update(SourceFile:1093)
    at org.dspace.statistics.StatisticsLoggingConsumer.consume(SourceFile:104)
    at org.dspace.event.BasicDispatcher.consume(BasicDispatcher.java:177)
    at org.dspace.event.BasicDispatcher.dispatch(BasicDispatcher.java:123)
    at org.dspace.core.Context.dispatchEvents(Context.java:455)
    at org.dspace.core.Context.commit(Context.java:424)
    at org.dspace.core.Context.complete(Context.java:380)
    at org.dspace.app.bulkedit.MetadataImport.main(MetadataImport.java:1399)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
```

- There are more in the DSpace log so I will raise it with Atmire immediately

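The DOI and AGROVOC cleanups from the OpenRefine review above could also be sketched in Python. This is only an illustration under stated assumptions: the notes give the target DOI format but not the input variants, so the prefixes handled here (`http://dx.doi.org/`, `doi:`, bare DOIs) are guesses, and the `||` multi-value separator is assumed. The function names and the `10.1000/xyz` DOI are placeholders.

```python
import re

# Assumed input variants; the notes only state the target "https://doi.org" format
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)

# Replacements taken from the GREL expression in the notes
AGROVOC_FIXES = {"GRAINS": "GRAIN", "SOILS": "SOIL", "CORN": "MAIZE"}


def normalize_doi(value: str) -> str:
    """Rewrite a DOI in any of the assumed forms to https://doi.org/<doi>."""
    bare = DOI_PREFIX.sub("", value.strip())
    return f"https://doi.org/{bare}"


def fix_subjects(value: str, separator: str = "||") -> str:
    """Replace whole subject terms only; "||" is an assumed separator."""
    terms = [term.strip() for term in value.split(separator)]
    return separator.join(AGROVOC_FIXES.get(term, term) for term in terms)


print(normalize_doi("http://dx.doi.org/10.1000/xyz"))  # placeholder DOI
print(fix_subjects("GRAINS||SOILS||CORN"))
```

One difference from the GREL one-liner is that the chained `replace` calls act on substrings, so a term that merely contains `CORN` would also be rewritten, whereas the dictionary lookup here only touches exact terms.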
## 2020-09-04

- I was checking the recent IITA data for duplicates when I found one in CIFOR's Archive and saw that CIFOR has updated a bunch of their website URLs, for example:
  - http://www.cifor.org/nc/online-library/browse/view-publication/publication/151.html → https://www.cifor.org/knowledge/publication/151
  - https://www.cifor.org/library/4033 → https://www.cifor.org/knowledge/publication/4033
  - https://www.cifor.org/pid/5087 → https://www.cifor.org/knowledge/publication/5087
- I will update our nearly 6,000 metadata values for CIFOR in the database accordingly:

```
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^(http://)?www\.cifor\.org/(nc/)?online-library/browse/view-publication/publication/([[:digit:]]+)\.html$', 'https://www.cifor.org/knowledge/publication/\3') WHERE metadata_field_id=219 AND text_value ~ 'www\.cifor\.org/(nc/)?online-library/browse/view-publication/publication/[[:digit:]]+';
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^https?://www\.cifor\.org/library/([[:digit:]]+)/?$', 'https://www.cifor.org/knowledge/publication/\1') WHERE metadata_field_id=219 AND text_value ~ 'https?://www\.cifor\.org/library/[[:digit:]]+/?';
dspace=# UPDATE metadatavalue SET text_value = regexp_replace(text_value, '^https?://www\.cifor\.org/pid/([[:digit:]]+)/?$', 'https://www.cifor.org/knowledge/publication/\1') WHERE metadata_field_id=219 AND text_value ~ 'https?://www\.cifor\.org/pid/[[:digit:]]+';
```

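Before running rewrites like these against the database, the patterns can be dry-run on a few sample URLs. The following Python sketch is one way to do that, not part of the original workflow. The patterns are adapted from the SQL above: `[[:digit:]]` becomes `[0-9]` because Python's `re` module does not support POSIX character classes, and the optional groups are made non-capturing so the publication number is always group 1. The function name `rewrite_cifor_url` is hypothetical.

```python
import re

TARGET = r"https://www.cifor.org/knowledge/publication/\1"

# Adapted from the three UPDATE statements; the publication number is group 1
PATTERNS = [
    r"^(?:http://)?www\.cifor\.org/(?:nc/)?online-library/browse/view-publication/publication/([0-9]+)\.html$",
    r"^https?://www\.cifor\.org/library/([0-9]+)/?$",
    r"^https?://www\.cifor\.org/pid/([0-9]+)/?$",
]


def rewrite_cifor_url(url: str) -> str:
    """Return the new-style URL, or the input unchanged if no pattern matches."""
    for pattern in PATTERNS:
        new_url, count = re.subn(pattern, TARGET, url)
        if count:
            return new_url
    return url


# The three example URLs from the notes
for old in (
    "http://www.cifor.org/nc/online-library/browse/view-publication/publication/151.html",
    "https://www.cifor.org/library/4033",
    "https://www.cifor.org/pid/5087",
):
    print(old, "→", rewrite_cifor_url(old))
```

Values that match none of the patterns are returned untouched, which makes it easy to spot URL variants the patterns do not cover.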
- I did some cleanup on the author affiliations of the IITA data against our 2019-04 list using reconcile-csv and OpenRefine:
  - `$ lein run ~/src/git/DSpace/2019-04-08-affiliations.csv name id`
  - I always forget how to copy the reconciled values in OpenRefine, but you need to make a new column and populate it using this GREL: `if(cell.recon.matched, cell.recon.match.name, value)`
- I mapped one duplicate from the CIFOR Archive and re-uploaded the 94 IITA items to a new collection on [DSpace Test](https://dspacetest.cgiar.org/handle/10568/108453)

<!-- vim: set sw=2 ts=2: -->