--- title: "February, 2020" date: 2020-02-02T11:56:30+02:00 author: "Alan Orth" categories: ["Notes"] --- ## 2020-02-02 - Continue working on porting CGSpace's DSpace 5 code to DSpace 6.3 that I started yesterday - Sign up for an account with MaxMind so I can get the GeoLite2-City.mmdb database - I still need to wire up the API credentials and cron job into the Ansible infrastructure playbooks - Fix some minor issues in the config and XMLUI themes, like removing Atmire stuff - The code finally builds and runs with a fresh install - Now we don't specify the build environment because site modification are in `local.cfg`, so we just build like this: ``` $ schedtool -D -e ionice -c2 -n7 nice -n19 mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false clean package ``` - And it seems that we need to enabled `pg_crypto` now (used for UUIDs): ``` $ psql -h localhost -U postgres dspace63 dspace63=# CREATE EXTENSION pgcrypto; CREATE EXTENSION pgcrypto; ``` - I tried importing a PostgreSQL snapshot from CGSpace and had errors due to missing Atmire database migrations - If I try to run `dspace database migrate` I get the IDs of the migrations that are missing - I delete them manually in psql: ``` dspace63=# DELETE FROM schema_version WHERE version IN ('5.0.2015.01.27', '5.6.2015.12.03.2', '5.6.2016.08.08', '5.0.2017.04.28', '5.0.2017.09.25', '5.8.2015.12.03.3'); ``` - Then I ran `dspace database migrate` and got an error: ``` $ ~/dspace63/bin/dspace database migrate Database URL: jdbc:postgresql://localhost:5432/dspace63?ApplicationName=dspaceCli Migrating database to latest version... (Check dspace logs for details) Migration exception: java.sql.SQLException: Flyway migration error occurred at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:673) at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:576) at org.dspace.storage.rdbms.DatabaseUtils.main(DatabaseUtils.java:221) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229) at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81) Caused by: org.flywaydb.core.internal.dbsupport.FlywaySqlScriptException: Migration V6.0_2015.03.07__DS-2701_Hibernate_migration.sql failed ----------------------------------------------------------------- SQL State : 2BP01 Error Code : 0 Message : ERROR: cannot drop table metadatavalue column resource_id because other objects depend on it Detail: view eperson_metadata depends on table metadatavalue column resource_id Hint: Use DROP ... CASCADE to drop the dependent objects too. 
- Fix some minor issues in the config and XMLUI themes, like removing Atmire stuff
- The code finally builds and runs with a fresh install
- Now we don't specify the build environment because site modifications are in `local.cfg`, so we just build like this:

```
$ schedtool -D -e ionice -c2 -n7 nice -n19 mvn -U -Dmirage2.on=true -Dmirage2.deps.included=false clean package
```

- And it seems that we need to enable the `pgcrypto` extension now (used for UUIDs):

```
$ psql -h localhost -U postgres dspace63
dspace63=# CREATE EXTENSION pgcrypto;
CREATE EXTENSION
```

- I tried importing a PostgreSQL snapshot from CGSpace and had errors due to missing Atmire database migrations
  - If I try to run `dspace database migrate` I get the IDs of the migrations that are missing
  - I delete them manually in psql:

```
dspace63=# DELETE FROM schema_version WHERE version IN ('5.0.2015.01.27', '5.6.2015.12.03.2', '5.6.2016.08.08', '5.0.2017.04.28', '5.0.2017.09.25', '5.8.2015.12.03.3');
```

- Then I ran `dspace database migrate` and got an error:

```
$ ~/dspace63/bin/dspace database migrate
Database URL: jdbc:postgresql://localhost:5432/dspace63?ApplicationName=dspaceCli
Migrating database to latest version... (Check dspace logs for details)
Migration exception:
java.sql.SQLException: Flyway migration error occurred
        at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:673)
        at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:576)
        at org.dspace.storage.rdbms.DatabaseUtils.main(DatabaseUtils.java:221)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
        at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)
Caused by: org.flywaydb.core.internal.dbsupport.FlywaySqlScriptException:
Migration V6.0_2015.03.07__DS-2701_Hibernate_migration.sql failed
-----------------------------------------------------------------
SQL State  : 2BP01
Error Code : 0
Message    : ERROR: cannot drop table metadatavalue column resource_id because other objects depend on it
  Detail: view eperson_metadata depends on table metadatavalue column resource_id
  Hint: Use DROP ... CASCADE to drop the dependent objects too.
Location   : org/dspace/storage/rdbms/sqlmigration/postgres/V6.0_2015.03.07__DS-2701_Hibernate_migration.sql (/home/aorth/src/git/DSpace-6.3/file:/home/aorth/dspace63/lib/dspace-api-6.3.jar!/org/dspace/storage/rdbms/sqlmigration/postgres/V6.0_2015.03.07__DS-2701_Hibernate_migration.sql)
Line       : 391
Statement  : ALTER TABLE metadatavalue DROP COLUMN IF EXISTS resource_id

        at org.flywaydb.core.internal.dbsupport.SqlScript.execute(SqlScript.java:117)
        at org.flywaydb.core.internal.resolver.sql.SqlMigrationExecutor.execute(SqlMigrationExecutor.java:71)
        at org.flywaydb.core.internal.command.DbMigrate.doMigrate(DbMigrate.java:352)
        at org.flywaydb.core.internal.command.DbMigrate.access$1100(DbMigrate.java:47)
        at org.flywaydb.core.internal.command.DbMigrate$4.doInTransaction(DbMigrate.java:308)
        at org.flywaydb.core.internal.util.jdbc.TransactionTemplate.execute(TransactionTemplate.java:72)
        at org.flywaydb.core.internal.command.DbMigrate.applyMigration(DbMigrate.java:305)
        at org.flywaydb.core.internal.command.DbMigrate.access$1000(DbMigrate.java:47)
        at org.flywaydb.core.internal.command.DbMigrate$2.doInTransaction(DbMigrate.java:230)
        at org.flywaydb.core.internal.command.DbMigrate$2.doInTransaction(DbMigrate.java:173)
        at org.flywaydb.core.internal.util.jdbc.TransactionTemplate.execute(TransactionTemplate.java:72)
        at org.flywaydb.core.internal.command.DbMigrate.migrate(DbMigrate.java:173)
        at org.flywaydb.core.Flyway$1.execute(Flyway.java:959)
        at org.flywaydb.core.Flyway$1.execute(Flyway.java:917)
        at org.flywaydb.core.Flyway.execute(Flyway.java:1373)
        at org.flywaydb.core.Flyway.migrate(Flyway.java:917)
        at org.dspace.storage.rdbms.DatabaseUtils.updateDatabase(DatabaseUtils.java:662)
        ... 8 more
Caused by: org.postgresql.util.PSQLException: ERROR: cannot drop table metadatavalue column resource_id because other objects depend on it
  Detail: view eperson_metadata depends on table metadatavalue column resource_id
  Hint: Use DROP ... CASCADE to drop the dependent objects too.
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2422)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2167)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:307)
        at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:293)
        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:270)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:266)
        at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
        at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
        at org.flywaydb.core.internal.dbsupport.JdbcTemplate.executeStatement(JdbcTemplate.java:238)
        at org.flywaydb.core.internal.dbsupport.SqlScript.execute(SqlScript.java:114)
        ... 24 more
```

- I think I might need to update the sequences first... nope
- Perhaps it's due to some missing bitstream IDs and I need to run `dspace cleanup` on CGSpace and take a new PostgreSQL dump... nope
- A thread on the dspace-tech mailing list regarding this migration mentioned that the reporter's database had some views created that were using the `resource_id` column
- Our database had the same issue, where the `eperson_metadata` view was created by something (an Atmire module?) but has no references in the vanilla DSpace code, so I dropped it and tried the migration again:

```
dspace63=# DROP VIEW eperson_metadata;
DROP VIEW
```
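- In case there are other leftover views depending on `resource_id`, a query like this against `information_schema` should reveal them (a sketch using standard PostgreSQL, not something I ran against our dump):

```
dspace63=# SELECT view_schema, view_name FROM information_schema.view_column_usage WHERE table_name='metadatavalue' AND column_name='resource_id';
```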
- After that the migration was successful and DSpace starts up successfully and begins indexing
  - xmlui, solr, jspui, rest, and oai are working (rest was redirecting to HTTPS, so I set the Tomcat connector to `secure="true"` and it fixed it on localhost, but caused other issues so I disabled it for now)
  - I started diffing our themes against the Mirage 2 reference theme to capture the latest changes

## 2020-02-03

- Update DSpace mimetype fallback images from the [KDE Breeze Icons](https://github.com/KDE/breeze-icons) project
  - Our icons are four years old (see [my bitstream icons demo](https://alanorth.github.io/dspace-bitstream-icons/))
- Issues remaining in the DSpace 6 port of our CGSpace 5.x code:
  - [x] Community and collection pages only show one recent submission (seems that there is only one item in Solr?)
  - [x] Community and collection pages have tons of "Browse" buttons that we need to remove
  - [x] Order of navigation elements in right side bar ("My Account" etc, compare to DSpace Test)
  - [ ] Home page trail says "CGSpace Home" instead of "CGSpace Home / Community List" (see DSpace Test)
- There are lots of errors in the DSpace log, which might explain some of the issues with recent submissions / Solr:

```
2020-02-03 10:27:14,485 ERROR org.dspace.browse.ItemCountDAOSolr @ caught exception:
org.dspace.discovery.SearchServiceException: Invalid UUID string: 1
2020-02-03 13:20:20,475 ERROR org.dspace.app.xmlui.aspect.discovery.AbstractRecentSubmissionTransformer @ Caught SearchServiceException while retrieving recent submission for: home page
org.dspace.discovery.SearchServiceException: Invalid UUID string: 111210
```

- If I look in Solr's search core I do actually see items with integers for their resource ID, which I think are all supposed to be UUIDs now...
- I dropped all the documents in the search core:

```
$ http --print b 'http://localhost:8080/solr/search/update?stream.body=<delete><query>*:*</query></delete>&commit=true'
```

- Still didn't work, so I'm going to try a clean database import and migration:

```
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspace63
$ psql -h localhost -U postgres -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspace63 -O --role=dspacetest -h localhost dspace_2020-01-27.backup
$ psql -h localhost -U postgres -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres dspace63
dspace63=# CREATE EXTENSION pgcrypto;
dspace63=# DELETE FROM schema_version WHERE version IN ('5.0.2015.01.27', '5.6.2015.12.03.2', '5.6.2016.08.08', '5.0.2017.04.28', '5.0.2017.09.25', '5.8.2015.12.03.3');
dspace63=# DROP VIEW eperson_metadata;
dspace63=# \q
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspace63
$ ~/dspace63/bin/dspace database migrate
```

- I notice that the indexing doesn't work correctly if I start it manually with `dspace index-discovery -b` (search.resourceid becomes an integer!)
- If I induce an indexing by touching `dspace/solr/search/conf/reindex.flag` the search.resourceid values are all UUIDs...
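- A quick way to spot-check what ended up in the search core is to ask Solr directly for a few `search.resourceid` values (a sketch; adjust the host and port for your environment):

```
$ curl 'http://localhost:8080/solr/search/select?q=*:*&fl=search.resourceid&rows=5&wt=json&indent=true'
```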
- Speaking of database stuff, there was a performance-related update for the [indexes that we used in DSpace 5](https://github.com/DSpace/DSpace/pull/1791/)
  - We might want to [apply it in DSpace 6](https://github.com/DSpace/DSpace/pull/1792), as it was never merged to 6.x, but it helped with the performance of `/submissions` in XMLUI for us in [2018-03]({{< relref path="2018-03.md" >}})

## 2020-02-04

- The indexing issue I was having yesterday seems to only present itself the first time a new installation is running DSpace 6
  - Once the indexing induced by touching `dspace/solr/search/conf/reindex.flag` has finished, subsequent manual invocations of `dspace index-discovery -b` work as expected
  - Nevertheless, I sent a message to the dspace-tech mailing list describing the issue to see if anyone has any comments
- I am seeing that there are a lot of important commits on the unreleased DSpace 6.4, so it might be better for us to target that version
  - I did a simple test and it's easy to rebase my current 6.3 branch on top of the upstream `dspace-6_x` branch:

```
$ git checkout -b 6_x-dev64 6_x-dev
$ git rebase -i upstream/dspace-6_x
```

- I finally understand why our themes show all the "Browse by" buttons on community and collection pages in DSpace 6.x
  - The code in `./dspace-xmlui/src/main/java/org/dspace/app/xmlui/aspect/browseArtifacts/CommunityBrowse.java` iterates over all the browse indexes and prints them when it is called
  - The XMLUI theme code in `dspace/modules/xmlui-mirage2/src/main/webapp/themes/0_CGIAR/xsl/preprocess/browse.xsl` calls the template because the id of the div matches "aspect.browseArtifacts.CommunityBrowse.list.community-browse"
  - I checked the DRI of a community page on my local 6.x and DSpace Test 5.x by appending `?XML` to the URL and I see the ID is missing on DSpace 5.x
  - The issue is the same with the ordering of the "My Account" link, but in Navigation.java
  - I tried modifying `preprocess/browse.xsl` but it always ends up printing some default list of browse by links...
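- The kind of override I was experimenting with is an empty template that matches the offending div by its id, roughly like this inside the existing stylesheet (a sketch, assuming the usual Mirage 2 `dri` namespace prefix):

```
<!-- hypothetical override in preprocess/browse.xsl: swallow the community "Browse by" list -->
<xsl:template match="dri:div[@id='aspect.browseArtifacts.CommunityBrowse.list.community-browse']">
    <!-- output nothing -->
</xsl:template>
```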
- I'm starting to wonder if Atmire's modules somehow override this, as I don't see how `CommunityBrowse.java` can behave like ours on DSpace 5.x unless they have overridden it (as the open source code is the same in 5.x and 6.x)
- At least the "account" link in the sidebar is overridden in our 5.x branch because Atmire copied a modified `Navigation.java` to the local xmlui modules folder... so that explains that (and it's easy to replicate in 6.x)

## 2020-02-05

- UptimeRobot told me that AReS Explorer crashed last night, so I logged into it, ran all updates, and rebooted it
- Testing Discovery indexing speed on my local DSpace 6.3:

```
$ time schedtool -D -e ~/dspace63/bin/dspace index-discovery -b
schedtool -D -e ~/dspace63/bin/dspace index-discovery -b  3771.78s user 93.63s system 41% cpu 2:34:19.53 total
schedtool -D -e ~/dspace63/bin/dspace index-discovery -b  3360.28s user 82.63s system 38% cpu 2:30:22.07 total
schedtool -D -e ~/dspace63/bin/dspace index-discovery -b  4678.72s user 138.87s system 42% cpu 3:08:35.72 total
schedtool -D -e ~/dspace63/bin/dspace index-discovery -b  3334.19s user 86.54s system 35% cpu 2:41:56.73 total
```

- DSpace 5.8 was taking about 1 hour (or less on this laptop), so this is 2-3 times longer!

```
$ time schedtool -D -e ~/dspace/bin/dspace index-discovery -b
schedtool -D -e ~/dspace/bin/dspace index-discovery -b  299.53s user 69.67s system 20% cpu 30:34.47 total
schedtool -D -e ~/dspace/bin/dspace index-discovery -b  270.31s user 69.88s system 19% cpu 29:01.38 total
```

- Checking out the DSpace 6.x REST API query client
  - There is a [tutorial](https://terrywbrady.github.io/restReportTutorial/intro) that explains how it works, and I see it is very powerful because you can export a CSV of results in order to fix and re-upload them with batch import!
  - Custom queries can be added in `dspace-rest/src/main/webapp/static/reports/restQueryReport.js`
- I noticed two new bots in the logs with the following user agents:
  - `Jersey/2.6 (HttpUrlConnection 1.8.0_152)`
  - `magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)`
- I filed an [issue to add Jersey to the COUNTER-Robots](https://github.com/atmire/COUNTER-Robots/issues/30) list
- Peter noticed that the statlets on community, collection, and item pages aren't working on CGSpace
  - I thought it might be related to the fact that the yearly sharding didn't complete successfully this year so the `statistics-2019` core is empty
  - I removed the `statistics-2019` core and had to restart Tomcat like six times before all cores would load properly (ugh!!!!)
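- Next time it might be cleaner to unload the empty core through Solr's CoreAdmin API instead of removing it by hand, something like this (untested sketch, assuming the same Solr on port 8081):

```
$ curl 'http://localhost:8081/solr/admin/cores?action=UNLOAD&core=statistics-2019'
```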
- After that the statlets were working properly...
- Run all system updates on DSpace Test (linode19) and restart it

## 2020-02-06

- I sent a mail to the dspace-tech mailing list asking about slow Discovery indexing speed in DSpace 6
- I destroyed my PostgreSQL 9.6 containers and re-created them using PostgreSQL 10 to see if there are any speedups with DSpace 6.x:

```
$ podman pull postgres:10-alpine
$ podman run --name dspacedb10 -v dspacedb_data:/var/lib/postgresql/data -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:10-alpine
$ createuser -h localhost -U postgres --pwprompt dspacetest
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspacetest
$ createdb -h localhost -U postgres -O dspacetest --encoding=UNICODE dspace63
$ psql -h localhost -U postgres -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2020-02-06.backup
$ pg_restore -h localhost -U postgres -d dspace63 -O --role=dspacetest -h localhost ~/Downloads/cgspace_2020-02-06.backup
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspace63
$ psql -h localhost -U postgres -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres dspace63
dspace63=# CREATE EXTENSION pgcrypto;
dspace63=# DELETE FROM schema_version WHERE version IN ('5.0.2015.01.27', '5.6.2015.12.03.2', '5.6.2016.08.08', '5.0.2017.04.28', '5.0.2017.09.25', '5.8.2015.12.03.3');
dspace63=# DROP VIEW eperson_metadata;
dspace63=# \q
```

- I purged ~33,000 hits from the "Jersey/2.6" bot in CGSpace's statistics using my `check-spider-hits.sh` script:

```
$ ./check-spider-hits.sh -d -p -f /tmp/jersey -s statistics -u http://localhost:8081/solr
$ for year in 2018 2017 2016 2015; do ./check-spider-hits.sh -d -p -f /tmp/jersey -s "statistics-${year}" -u http://localhost:8081/solr; done
```

- I noticed another user agent in the logs that we should add to the list:

```
ReactorNetty/0.9.2.RELEASE
```

- I made [an issue on the COUNTER-Robots repository](https://github.com/atmire/COUNTER-Robots/issues/31)
- I found a [nice tool for exporting and importing Solr records](https://github.com/freedev/solr-import-export-json) and it seems to work for exporting our 2019 stats from the large statistics core!

```
$ ./run.sh -s http://localhost:8081/solr/statistics -a export -o /tmp/statistics-2019-01.json -f 'dateYearMonth:2019-01' -k uid
$ ls -lh /tmp/statistics-2019-01.json
-rw-rw-r-- 1 aorth aorth 3.7G Feb 6 09:26 /tmp/statistics-2019-01.json
```

- Then I tested importing this by creating a new core in my development environment:

```
$ curl 'http://localhost:8080/solr/admin/cores?action=CREATE&name=statistics-2019&instanceDir=/home/aorth/dspace/solr/statistics&dataDir=/home/aorth/dspace/solr/statistics-2019/data'
$ ./run.sh -s http://localhost:8080/solr/statistics-2019 -a import -o ~/Downloads/statistics-2019-01.json -k uid
```

- This imports the records into the core, but DSpace can't see them, and when I restart Tomcat the core is not seen by Solr...
- I got the core to load by adding it to `dspace/solr/solr.xml` manually, ie:

```
...
...
```
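- The general shape of a manually added core entry in the legacy-style `solr.xml` is something like this (a sketch only, not the exact entry I used):

```
<cores adminPath="/admin/cores">
  <!-- existing search, oai, statistics cores... -->
  <core name="statistics-2019" instanceDir="statistics" dataDir="/home/aorth/dspace/solr/statistics-2019/data" />
</cores>
```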
- But I don't like having to do that... why doesn't it load automatically?
- I sent a mail to the dspace-tech mailing list to ask about it
- Just for fun I tried to load these stats into a Solr 7.7.2 instance using the DSpace 7 solr config:
  - First, create a Solr statistics core using the DSpace 7 config:

```
$ ./bin/solr create_core -c statistics -d ~/src/git/DSpace/dspace/solr/statistics/conf -p 8983
```

  - Then try to import the stats, skipping a shitload of fields that are apparently added to our Solr statistics by Atmire modules:

```
$ ./run.sh -s http://localhost:8983/solr/statistics -a import -o ~/Downloads/statistics-2019-01.json -k uid -S author_mtdt,author_mtdt_search,iso_mtdt_search,iso_mtdt,subject_mtdt,subject_mtdt_search,containerCollection,containerCommunity,containerItem,countryCode_ngram,countryCode_search,cua_version,dateYear,dateYearMonth,geoipcountrycode,ip_ngram,ip_search,isArchived,isInternal,isWithdrawn,containerBitstream,file_id,referrer_ngram,referrer_search,userAgent_ngram,userAgent_search,version_id,complete_query,complete_query_search,filterquery,ngram_query_search,ngram_simplequery_search,simple_query,simple_query_search,range,rangeDescription,rangeDescription_ngram,rangeDescription_search,range_ngram,range_search,actingGroupId,actorMemberGroupId,bitstreamCount,solr_update_time_stamp,bitstreamId
```

- OK that imported! I wonder if it works... maybe I'll try another day

## 2020-02-07

- I did some investigation into DSpace indexing performance using flame graphs
  - Excellent introduction: http://www.brendangregg.com/flamegraphs.html
  - Using flame graphs with java: https://netflixtechblog.com/java-in-flames-e763b3d32166
  - Fantastic wrapper scripts for doing perf on Java processes: https://github.com/jvm-profiling-tools/perf-map-agent

```
$ cd ~/src/git/perf-map-agent
$ cmake .
$ make
$ ./bin/create-links-in ~/.local/bin
$ export FLAMEGRAPH_DIR=/home/aorth/src/git/FlameGraph
$ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
$ export JAVA_OPTS="-XX:+PreserveFramePointer"
$ ~/dspace63/bin/dspace index-discovery -b &
# pid of tomcat java process
$ perf-java-flames 4478
# pid of java indexing process
$ perf-java-flames 11359
```

- All Java processes need to have `-XX:+PreserveFramePointer` if you want to trace their methods
- I did the same tests against DSpace 5.8 and 6.4-SNAPSHOT's CLI indexing process and Tomcat process
  - For what it's worth, it appears all the Hibernate stuff is in the CLI processes, so we don't need to trace the Tomcat process
- Here is the flame graph for DSpace 5.8's `dspace index-discovery -b` java process:

![DSpace 5.8 index-discovery flame graph](/cgspace-notes/2020/02/flamegraph-java-cli-dspace58.svg)

- Here is the flame graph for DSpace 6.4-SNAPSHOT's `dspace index-discovery -b` java process:

![DSpace 6.4-SNAPSHOT index-discovery flame graph](/cgspace-notes/2020/02/flamegraph-java-cli-dspace64-snapshot.svg)

- If the width of the stacks indicates time, then it's clear that Hibernate takes longer...
- Apparently there is a "flame diff" tool, I wonder if we can use that to compare! (see the sketch at the end of the 2020-02-09 section below)

## 2020-02-09

- This weekend I did a lot more testing of indexing performance with our DSpace 5.8 branch, vanilla DSpace 5.10, and vanilla DSpace 6.4-SNAPSHOT:

```
# CGSpace 5.8
schedtool -D -e ~/dspace/bin/dspace index-discovery -b  385.72s user 131.16s system 19% cpu 43:21.18 total
schedtool -D -e ~/dspace/bin/dspace index-discovery -b  382.95s user 127.31s system 20% cpu 42:10.07 total
schedtool -D -e ~/dspace/bin/dspace index-discovery -b  368.56s user 143.97s system 20% cpu 42:22.66 total
schedtool -D -e ~/dspace/bin/dspace index-discovery -b  360.09s user 104.03s system 19% cpu 39:24.41 total

# Vanilla DSpace 5.10
schedtool -D -e ~/dspace510/bin/dspace index-discovery -b  236.19s user 59.70s system 3% cpu 2:03:31.14 total
schedtool -D -e ~/dspace510/bin/dspace index-discovery -b  232.41s user 50.38s system 3% cpu 2:04:16.00 total

# Vanilla DSpace 6.4-SNAPSHOT
schedtool -D -e ~/dspace63/bin/dspace index-discovery -b  5112.96s user 127.80s system 40% cpu 3:36:53.98 total
schedtool -D -e ~/dspace63/bin/dspace index-discovery -b  5112.96s user 127.80s system 40% cpu 3:21:0.0 total
```

- I generated better flame graphs for the DSpace indexing process by using `perf-record-stack` and filtering out the java process:

```
$ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
$ export PERF_RECORD_SECONDS=60
$ export JAVA_OPTS="-XX:+PreserveFramePointer"
$ time schedtool -D -e ~/dspace/bin/dspace index-discovery -b &
# process id of java indexing process (not Tomcat)
$ perf-java-record-stack 169639
$ sudo perf script -i /tmp/perf-169639.data > out.dspace510-1
$ cat out.dspace510-1 | ../FlameGraph/stackcollapse-perf.pl | grep -E '^java' | ../FlameGraph/flamegraph.pl --color=java --hash > out.dspace510-1.svg
```

- All data recorded on my laptop with the same kernel, same boot, etc.
- CGSpace 5.8 (with Atmire patches):

![DSpace 5.8 (with Atmire modules) index-discovery flame graph](/cgspace-notes/2020/02/out.dspace58-2.svg)

- Vanilla DSpace 5.10:

![Vanilla DSpace 5.10 index-discovery flame graph](/cgspace-notes/2020/02/out.dspace510-3.svg)

- Vanilla DSpace 6.4-SNAPSHOT:

![Vanilla DSpace 6.4-SNAPSHOT index-discovery flame graph](/cgspace-notes/2020/02/out.dspace64-3.svg)

- I sent my feedback to the dspace-tech mailing list so someone can hopefully comment.
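- The "flame diff" idea from 2020-02-07 should be possible with the `difffolded.pl` script in the FlameGraph repository, as long as I keep the collapsed stack output from two runs; something like this (a sketch, and the `out.dspace64-3` filename is illustrative):

```
$ cat out.dspace510-1 | ../FlameGraph/stackcollapse-perf.pl | grep -E '^java' > dspace510.folded
$ cat out.dspace64-3 | ../FlameGraph/stackcollapse-perf.pl | grep -E '^java' > dspace64.folded
$ ../FlameGraph/difffolded.pl dspace510.folded dspace64.folded | ../FlameGraph/flamegraph.pl > flamegraph-diff.svg
```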
- Last week Peter asked Sisay to upload some items to CGSpace in the GENNOVATE collection (part of the Gender CRP)
  - He uploaded them here: https://cgspace.cgiar.org/handle/10568/105926
- On a whim I checked and found five duplicates there, which means Sisay didn't even check

## 2020-02-10

- Follow up with [Atmire about the DSpace 6.x upgrade](https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=706)
  - I raised the issue of targeting 6.4-SNAPSHOT as well as the Discovery indexing performance issues in 6.x

## 2020-02-11

- Maria from Bioversity asked me to add some ORCID iDs to our controlled vocabulary so I combined them with our existing ones and updated the names from the ORCID API:

```
$ cat ~/src/git/DSpace/dspace/config/controlled-vocabularies/cg-creator-id.xml /tmp/bioversity-orcid-ids.txt | grep -oE '[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}' | sort | uniq > /tmp/2020-02-11-combined-orcids.txt
$ ./resolve-orcids.py -i /tmp/2020-02-11-combined-orcids.txt -o /tmp/2020-02-11-combined-names.txt -d
# sort names, copy to cg-creator-id.xml, add XML formatting, and then format with tidy (preserving accents)
$ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-id.xml
```

- Then I noticed some author names had changed, so I captured the old and new names in a CSV file and fixed them using `fix-metadata-values.py`:

```
$ ./fix-metadata-values.py -i 2020-02-11-correct-orcid-ids.csv -db dspace -u dspace -p 'fuuu' -f cg.creator.id -t correct -m 240 -d
```

- On a hunch I decided to try to add these ORCID iDs to existing items that might not have them yet
  - I checked the database for likely matches to the author name and then created a CSV with the author names and ORCID iDs:

```
dc.contributor.author,cg.creator.id
"Staver, Charles",charles staver: 0000-0002-4532-6077
"Staver, C.",charles staver: 0000-0002-4532-6077
"Fungo, R.",Robert Fungo: 0000-0002-4264-6905
"Remans, R.",Roseline Remans: 0000-0003-3659-8529
"Remans, Roseline",Roseline Remans: 0000-0003-3659-8529
"Rietveld A.",Anne Rietveld: 0000-0002-9400-9473
"Rietveld, A.",Anne Rietveld: 0000-0002-9400-9473
"Rietveld, A.M.",Anne Rietveld: 0000-0002-9400-9473
"Rietveld, Anne M.",Anne Rietveld: 0000-0002-9400-9473
"Fongar, A.",Andrea Fongar: 0000-0003-2084-1571
"Müller, Anna",Anna Müller: 0000-0003-3120-8560
"Müller, A.",Anna Müller: 0000-0003-3120-8560
```

- Running the `add-orcid-identifiers-csv.py` script I added 144 ORCID iDs to items on CGSpace!

```
$ ./add-orcid-identifiers-csv.py -i /tmp/2020-02-11-add-orcid-ids.csv -db dspace -u dspace -p 'fuuu'
```

- Minor updates to all Python utility scripts in the CGSpace git repository
- Update the spider agent patterns in the CGSpace `5_x-prod` branch from the latest [COUNTER-Robots](https://github.com/atmire/COUNTER-Robots) project
  - I ran the `check-spider-hits.sh` script with the updated file and purged 6,000 hits from our Solr statistics core on CGSpace

## 2020-02-12

- Follow up with people about AReS funding for the next phase
- Peter asked about the "stats" and "summary" reports that he had requested in December
  - I opened a [new issue on AReS for the "summary" report](https://github.com/ilri/AReS/issues/13)
- Peter asked me to update John McIntire's name format on CGSpace so I ran the following PostgreSQL query:

```
dspace=# UPDATE metadatavalue SET text_value='McIntire, John M.' WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value='McIntire, John';
UPDATE 26
```
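- Next time I should preview the rows before running an UPDATE like that, with something along these lines (a sketch using the same table and field IDs as above):

```
dspace=# SELECT text_value, COUNT(*) FROM metadatavalue WHERE resource_type_id=2 AND metadata_field_id=3 AND text_value LIKE 'McIntire%' GROUP BY text_value;
```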
## 2020-02-17

- A few days ago Atmire responded to my question about DSpace 6.4-SNAPSHOT saying that they can only confirm that 6.3 works with their modules
  - I responded to say that we agree to target 6.3, but that I will cherry-pick important patches from the `dspace-6_x` branch at our own responsibility
- Send a message to dspace-devel asking them to tag DSpace 6.4
- Udana from IWMI asked about the OAI base URL for their community on CGSpace
  - I think it should be this: https://cgspace.cgiar.org/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=com_10568_16814

## 2020-02-19

- I noticed a thread on the mailing list about the Tomcat header size and Solr max boolean clauses error
  - The solution is to do as we have done and increase the headers / boolean clauses, or to simply [disable access rights awareness](https://wiki.lyrasis.org/display/DSPACE/TechnicalFaq#TechnicalFAQ-I'mgetting%22SolrException:BadRequest%22followedbyalongqueryora%22tooManyClauses%22Exception) in Discovery
  - I applied the fix to the `5_x-prod` branch and cherry-picked it to `6_x-dev` (the relevant settings are sketched below, after this list)
- Upgrade Tomcat from 7.0.99 to 7.0.100 in the [Ansible infrastructure playbooks](https://github.com/ilri/rmg-ansible-public)
- Upgrade PostgreSQL JDBC driver from 42.2.9 to 42.2.10 in the [Ansible infrastructure playbooks](https://github.com/ilri/rmg-ansible-public)
- Run Tomcat and PostgreSQL JDBC driver updates on DSpace Test (linode19)
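- For reference, the two knobs involved are roughly these; the values here are illustrative and ours are templated in the Ansible playbooks:

```
<!-- Solr: dspace/solr/search/conf/solrconfig.xml, inside the <query> section -->
<maxBooleanClauses>2048</maxBooleanClauses>

<!-- Tomcat: the HTTP connector in server.xml -->
<Connector port="8080" protocol="HTTP/1.1" maxHttpHeaderSize="16384" />
```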