- Maria asked me to update Charles Staver's ORCID iD in the submission template and on CGSpace, as his name was lower case before, and now he has corrected it
- [The first](https://hdl.handle.net/10568/103225) is still missing its DOI, so I added it and [tweeted its handle](https://twitter.com/mralanorth/status/1245632619661766657) (after a few hours there was a donut with score 222)
- [The second item](https://hdl.handle.net/10568/106899) now has a donut with score 2 since I [tweeted its handle](https://twitter.com/mralanorth/status/1243158045540134913) last week
- [The third item](https://hdl.handle.net/10568/107258) now has a donut with score 1 since I [tweeted it](https://twitter.com/mralanorth/status/1243158786392625153) last week
- On the same note, the [one item](https://hdl.handle.net/10568/106573) Abenet pointed out last week now has a donut with score of 104 after I [tweeted it](https://twitter.com/mralanorth/status/1243163710241345536) last week
- Altmetric responded about [one item](https://hdl.handle.net/10568/101286) that had no donut since at least 2019-12 and said they fixed some problems with their bot's user agent
- I decided to [tweet the item](https://twitter.com/mralanorth/status/1245703049445851140), as I can't remember if I ever did it before
- Update PostgreSQL JDBC driver to version 42.2.12
## 2020-04-07
- Yesterday Atmire sent me their [pull request for DSpace 6 modules](https://github.com/ilri/DSpace/pull/445)
- Peter pointed out that some items have his ORCID identifier (`cg.creator.id`) twice
- I think this is because my early `add-orcid-identifiers.py` script was adding identifiers to existing records without properly checking if there was already one present (at first it only checked if there was one with the exact `place` value)
- As a test I dropped all his ORCID identifiers and added them back with the `add-orcid-identifiers.py` script:
```
$ psql -h localhost -U postgres dspace -c "DELETE FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=240 AND text_value LIKE '%Ballantyne%';"
- I used this CSV with the script (all records with his name have the name standardized like this):
```
dc.contributor.author,cg.creator.id
"Ballantyne, Peter G.","Peter G. Ballantyne: 0000-0001-9346-2893"
```
- Then I tried another way, to identify all duplicate ORCID identifiers for a given resource ID and group them so I can see if count is greater than 1:
```
dspace=# \COPY (SELECT DISTINCT(resource_id, text_value) as distinct_orcid, COUNT(*) FROM metadatavalue WHERE resource_type_id = 2 AND metadata_field_id = 240 GROUP BY distinct_orcid ORDER BY count DESC) TO /tmp/2020-04-07-duplicate-orcids.csv WITH CSV HEADER;
COPY 15209
```
- Of those, about nine authors had duplicate ORCID identifiers over about thirty records, so I created a CSV with all their name variations and ORCID identifiers:
```
dc.contributor.author,cg.creator.id
"Ballantyne, Peter G.","Peter G. Ballantyne: 0000-0001-9346-2893"
- Then I deleted *all* their existing ORCID identifier records:
```
dspace=# DELETE FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=240 AND text_value SIMILAR TO '%(0000-0001-6543-0798|0000-0001-9346-2893|0000-0002-6950-4018|0000-0002-7583-3811|0000-0002-8044-583X|0000-0002-8599-7895|0000-0003-0934-1218|0000-0003-2765-7101)%';
DELETE 994
```
- And then I added them again using the `add-orcid-identifiers` records:
- I started testing the [pull request](https://github.com/ilri/DSpace/pull/445) sent by Atmire yesterday
- I notice that we now need `yarn` to build, and I need to bump the Node.js `engine` version in our Mirage 2 theme in order to get it to build on Node.js 10.x
- Font Awesome icons for GitHub etc weren't loading, and after a bit of troubleshooting I replaced version 4.5.0 with 5.13.0 and to my surprise they now include Mendeley and ORCID so we can get rid of the Academicons dependency
- Testing the Atmire DSpace 6.3 code with a clean CGSpace DSpace 5.8 database snapshot
- One Flyway migration failed so I had to manually remove it (and of coursecreate the pgcrypto extension):
```
dspace63=# DELETE FROM schema_version WHERE version IN ('5.8.2015.12.03.3');
dspace63=# CREATE EXTENSION pgcrypto;
```
- Then DSpace 6.3 started up OK and I was able to see some statistics in the Content and Usage Analysis (CUA) module, but not on community, collection, or item pages
- I also noticed at least one of these errors in the DSpace log:
- Run system updates on DSpace Test (linode26) and reboot it
- More work on the DSpace 6.3 stuff, improving the GDPR consent logic to use [haven](https://github.com/chiiya/haven) instead of cookieconsent
- It works better by injecting the Google Analytics script after the user clicks agree, and it also has a preferences section that gets automatically injected on the privacy page!
- I realized that `solr-upgrade-statistics-6x` only processes 100,000 records by default so I think we actually need to finish running it for all legacy Solr records before asking Atmire why CUA statlets and detailed statistics aren't working
- For now I am just doing 250,000 records at a time on my local environment:
- Despite running the migration for all of my local 1.5 million Solr records, I still see a few hundred thousand like `-1` and `0-unmigrated`
- I will purge them all and try to import only a subset...
- After importing again I see there are indeed tens of thousands of these documents with IDs "-1" and "0"
- They are all `type: 5`, which is "SITE" according to `Constants.java`:
```
/** DSpace site type */
public static final int SITE = 5;
```
- Even after deleting those documents and re-running `solr-upgrade-statistics-6x` I still get the UUID errors when using CUA and the statlets
- I have sent some feedback and questions to Atmire (including about the  issue with glypicons in the header trail)
- In other news, my local Artifactory container stopped working for some reason so I re-created it and it seems some things have changed upstream (port 8082 for web UI?):
dspace=# UPDATE metadatavalue SET text_value='Knight-Jones, Theodore J.D.' WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value='Knight-Jones, T.J.D.';
```
- I updated his existing records on CGSpace, changed the controlled lists, added his ORCID identifier to the controlled list, and tagged his thirty-nine items with the ORCID iD
- The new DSpace 6 stuff that Atmire sent modifies the Mirage 2's `pom.xml` to copy the each theme's resulting `node_modules` to each theme after building and installing with `ant update` because they moved some packages from bower to npm and now reference them in `page-structure.xsl`
- This is a good idea, because bower is no longer supported, and npm has gotten a lot better, but it causes an extra 200,000 files to get copied!
- Most scripts are concatenated into `theme.js` during build, so we don't need the `node_modules` after that, but there are three scripts in `page-structure.xsl` that are not included there
- The scripts are a very old version of modernizr which is not even available on npm, html5shiv, and respond.js
- For modernizr I can simply download a static copy and put it in `0_CGIAR/scripts` and concatenate it into `theme.js`
- For the others, I can revert to using them from bower's `vendor` directory, which is installed by the parent XMLUI Mirage 2 theme
- During this process I also realized that `mvn clean` doesn't actually clean everything, and `dspace/modules/xmlui-mirage2/target` is remaining from previous builds and contains a bunch of shit from previous builds (including all the themes which I was trying to build without!)
- This must be a DSpace bug, but I should theoretically check on vanilla DSpace and then file a bug...
- Atmire responded to some of the issues I raised earlier this week about the DSpace 6 pull request
- They said they don't think the glyphicon encoding issue is due to their changes, but I built a new clean version of the vanilla `6_x-dev` branch from before their pull request and it *does not* have the encoding issue in the Mirage 2 header trails
- Also, they said we need to use something called `AtomicStatisticsUpdateCLI` to do the Solr legacy integer ID to UUID conversion so I asked for more information about that workflow
- I researched a bit more about the  issue with glypicons and realized it's due to [a bug in libsass](https://github.com/twbs/bootstrap-sass/issues/919)
- I have Ruby sass version 3.4.25 installed in my local environment, but DSpace Mirage 2 is supposed to build with 3.3.14
- Downgrading the version fixes it, though I wonder: why did I not have this issue on the `6.x-dev` branch before Atmire's pull request, and also on `5_x-prod` where I've been building for a few months here...
- I deployed the latest `5_x-prod` branch on CGSpace (linode18), ran all updates, and rebooted the server
- This includes the "COVID19" ILRI subject, the new CCAFS Phase II project tags, and the changes to an ILRI author name
- After restarting the server I had to restart Tomcat three times before all Solr statistics cores loaded properly.
- After that I started a full Discovery reindexing to pick up some author changes I made last week:
- I spent some time working on the XMLUI themes in DSpace 6
- Atmire's pull request modifies `pom.xml` to *not* exclude `node_modules`, which means an extra ~260,000 files get copied to our installation folder because of all the themes
- I worked on the `Gruntfile.js` to copy Font Awesome and Bootstrap glyphicon fonts out of `node_modules` and into `fonts` at build time, but still `jquery-ui.min.css` was being referenced as a `url()` in CSS
- SASS can include imported CSS in your compiled CSS—instead of including an `@import url(..)` if you import it without the ".css", but our version of Ruby SASS doesn't support that
- I hacked `Gruntfile.js` to use `dart-sass` instead of Ruby `compass` (including installing compass's mixins via npm!) but then [dart-sass converts all the glyphicon ASCII escape codes to Unicode literals](https://github.com/sass/dart-sass/issues/345) and they show up garbled in Firefox
- I tried to use `node-sass` instead of dart-sass and it doesn't replace the ASCII escapes with literals, but then I get the the  issue with the glyphicon in the header trail again! Back to square one!
- So that was a waste of five hours...
- I might just leave this [tiny hack](https://github.com/twbs/bootstrap-sass/issues/919) in `0_CGIAR/styles/_style.scss` to override this and be done with it:
- On the third run it took one hour to process 150,000 records
## 2020-04-29
- I found out that running the Atmire CUA script with more memory and a larger number of records (-r 20000) makes it run faster
- Now the process finishes, but there are errors on some records:
```
Record uid: ee085cc0-0110-42c5-80b9-0fad4015ed9f couldn't be processed
com.atmire.statistics.util.update.atomic.ProcessingException: something went wrong while processing record uid: ee085cc0-0110-42c5-80b9-0fad4015ed9f, an error occured in the com.atmire.statistics.util.update.atomic.processor.ContainerOwnerDBProcessor
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(AtomicStatisticsUpdater.java:304)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.processRecords(AtomicStatisticsUpdater.java:176)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.performRun(AtomicStatisticsUpdater.java:161)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.update(AtomicStatisticsUpdater.java:128)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdateCLI.main(AtomicStatisticsUpdateCLI.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
Caused by: java.lang.NullPointerException
at com.atmire.dspace.cua.CUADSOServiceImpl.findByLegacyID(CUADSOServiceImpl.java:40)
at com.atmire.statistics.util.update.atomic.processor.AtomicUpdateProcessor.getDSpaceObject(AtomicUpdateProcessor.java:49)
at com.atmire.statistics.util.update.atomic.processor.ContainerOwnerDBProcessor.process(ContainerOwnerDBProcessor.java:45)
at com.atmire.statistics.util.update.atomic.processor.ContainerOwnerDBProcessor.visit(ContainerOwnerDBProcessor.java:38)
at com.atmire.statistics.util.update.atomic.record.UsageRecord.accept(UsageRecord.java:23)
at com.atmire.statistics.util.update.atomic.AtomicStatisticsUpdater.applyProcessors(AtomicStatisticsUpdater.java:301)
... 10 more
```
- I've sent a message to Atmire to ask for advice
- Also, now I can actually see the CUA statlets and usage statistics
- Unfortunately it seems they are using Font Awesome 4 in their CUA module and this means that some icons are broken because the names have changed, but also some of their code is using Unicode characters instead of classes in spans!
- I've reverted my sweet SVG work from yesterday and adjusted the Font Awesome 5 SCSS to add a few more icons that they are using
$ psql -c 'select * from pg_stat_activity' | grep 'dspaceWeb' | grep -c "idle in transaction"
67
```
- I don't see anything in the PostgreSQL or Tomcat logs suggesting anything is wrong... I think the solution to clear these idle connections is probably to just restart Tomcat
- Delphi is only used by IP addresses in Greece, so that's obviously the GARDIAN people harvesting us...
- I have no idea what OgScrper is, but it's not a user!
- Then there are 276,000 hits from `MEL-API` from Jordanian IPs in 2018, so that's obviously CodeObia guys...
- Other user agents:
- GigablastOpenSource/1
- Owlin Domain Resolver V1
- API scraper
- MetaURI
- I don't know why, but my `check-spider-hits.sh` script doesn't seem to be handling the user agents with spaces properly so I will delete those manually after
- First delete the ones without spaces, creating a temp file in `/tmp/agents` containing the patterns:
```
$ for year in {2010..2019}; do ./check-spider-hits.sh -f /tmp/agents -s statistics-$year -p; done
$ for year in {2010..2019}; do curl -s "http://localhost:8081/solr/statistics-$year/update?softCommit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>userAgent:"Delphi 2009"</query></delete>'; done