<li>I created a GitHub issue to track this <ahref="https://github.com/ilri/DSpace/issues/389">#389</a>, because I’m super busy in Nairobi right now</li>
<li>I added Phil Thornton and Sonal Henson’s ORCID identifiers to the controlled vocabulary for <code>cg.creator.orcid</code> and then re-generated the names using my <ahref="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script:</li>
<li>But in super positive news, he says they are using my new <ahref="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> and it’s MUCH faster than using Atmire CUA’s internal “restlet” API</li>
<li>I don’t recognize the <code>138.201.49.199</code> IP, but it is in Germany (Hetzner) and appears to be paginating over some browse pages and downloading bitstreams:</li>
<li>It’s clearly a bot and it’s not re-using its Tomcat session, so I will add its IP to the nginx bad bot list</li>
<li>I looked in Solr’s statistics core and these hits were actually all counted as <code>isBot:false</code> (of course)… hmmm</li>
<li>I tagged all of Sonal and Phil’s items with their ORCID identifiers on CGSpace using my <ahref="https://gist.github.com/alanorth/a49d85cd9c5dea89cddbe809813a7050">add-orcid-identifiers.py</a> script:</li>
<li>Salem raised an issue that the dspace-statistics-api reports downloads for some items that have no bitstreams (like many limited access items)</li>
<li>Every item has at least a <code>LICENSE</code> bundle, and some have a <code>THUMBNAIL</code> bundle, but the indexing code is specifically checking for downloads from the <code>ORIGINAL</code> bundle
<li>I see there are other bundles we might need to pay attention to: <code>TEXT</code>, <code>@_LOGO-COLLECTION_@</code>, <code>@_LOGO-COMMUNITY_@</code>, etc…</li>
<li>On a hunch I dropped the statistics table and re-indexed and now those two items above have no downloads</li>
<li>AgriKnowledge says they’re going to add the <code>dc.identifier.uri</code> to their item view in November when they update their website software</li>
<li>The ImageMagick version is currently 8:6.8.9.9-7ubuntu5.13, and there is an <ahref="https://usn.ubuntu.com/3785-1/">Ubuntu Security Notice from 2018-10-04</a></li>
<li>Wow, someone on <ahref="https://twitter.com/rosscampbell/status/1048268966819319808">Twitter posted about this breaking his web application</a> (and it was retweeted by the ImageMagick acount!)</li>
<li>I commented out the line that disables PDF thumbnails in <code>/etc/ImageMagick-6/policy.xml</code>:</li>
<li>I emailed DuraSpace to update <ahref="https://duraspace.org/registry/entry/4188/?gvid=178">our entry in their DSpace registry</a> (the data was still on DSpace 3, JSPUI, etc)</li>
<pretabindex="0"><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-10-11-top-1500-subject.csv WITH CSV HEADER;
<li>Give WorldFish advice about Handles because they are talking to some company called KnowledgeArc who recommends they do not use Handles!</li>
<li>Last week I emailed Altmetric to ask if their software would notice mentions of our Handle in the format “handle:10568/80775” because I noticed that the <ahref="https://landportal.org/library/resources/handle1056880775/unlocking-farming-potential-bangladesh%E2%80%99-polders">Land Portal does this</a></li>
<li>Altmetric support responded to say no, but the reason is that Land Portal is doing even more strange stuff by not using <code><meta></code> tags in their page header, and using “dct:identifier” property instead of “dc:identifier”</li>
<li>I re-created my local DSpace databse container using <ahref="https://github.com/containers/libpod">podman</a> instead of Docker:</li>
<pretabindex="0"><code>dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 3 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 10000) to /tmp/2018-10-11-top-10000-authors.csv WITH CSV HEADER;
<li>CTA uploaded some infographics that are very tall and their thumbnails disrupt the item lists on the front page and in their communities and collections</li>
<li>I decided to constrain the max height of these to 200px using CSS (<ahref="https://github.com/ilri/DSpace/pull/392">#392</a>)</li>
<li>I will apply these on CGSpace when I do the other updates tomorrow, as well as double check the high scoring ones to see if they are correct in Sisay’s author controlled vocabulary</li>
<li>Merge the authors controlled vocabulary (<ahref="https://github.com/ilri/DSpace/pull/393">#393</a>), usage rights (<ahref="https://github.com/ilri/DSpace/pull/394">#394</a>), and the upstream DSpace 5.x cherry-picks (<ahref="https://github.com/ilri/DSpace/pull/395">#394</a>) into our <code>5_x-prod</code> branch</li>
<li>Switch to new CGIAR LDAP server on CGSpace, as it’s been running (at least for authentication) on DSpace Test for the last few weeks, and I think they old one will be deprecated soon (today?)</li>
<li>Apply Peter’s 746 author corrections on CGSpace and DSpace Test using my <ahref="https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897">fix-metadata-values.py</a> script:</li>
<li>Run all system updates on CGSpace (linode19) and reboot the server</li>
<li>After rebooting the server I noticed that Handles are not resolving, and the <code>dspace-handle-server</code> systemd service is not running (or rather, it exited with success)</li>
<li>Restarting the service with systemd works for a few seconds, then the java process quits</li>
<li>I suspect that the systemd service type needs to be <code>forking</code> rather than <code>simple</code>, because the service calls the default DSpace <code>start-handle-server</code> shell script, which uses <code>nohup</code> and <code>&</code>to background the java process</li>
<li>It would be nice if there was a cleaner way to start the service and then just log to the systemd journal rather than all this hiding and log redirecting</li>
<li>Email the Landportal.org people to ask if they would consider Dublin Core metadata tags in their page’s header, rather than the HTML properties they are using in their body</li>
<li>When I tried to generate them manually I noticed that the path to the CMYK profile had changed because Ubuntu upgraded Ghostscript from 9.18 to 9.25 last week… WTF?</li>
<li>Looks like I can use <code>/usr/share/ghostscript/current</code> instead of <code>/usr/share/ghostscript/9.25</code>…</li>
<li>I limited the tall thumbnails even further to 170px because Peter said CTA’s were still too tall at 200px (<ahref="https://github.com/ilri/DSpace/pull/396">#396</a>)</li>
<li>I don’t see anything in the Catalina logs or <code>dmesg</code>, and the Tomcat manager shows XMLUI, REST, OAI, etc all “Running: false”</li>
<li>Actually, now I remember that yesterday when I deployed the latest changes from git on DSpace Test I noticed a syntax error in one XML file when I was doing the discovery reindexing</li>
<pretabindex="0"><code>dspace=# \copy (SELECT (CASE when metadata_schema_id=1 THEN 'dc' WHEN metadata_schema_id=2 THEN 'cg' END) AS schema, element, qualifier, scope_note FROM metadatafieldregistry where metadata_schema_id IN (1,2)) TO /tmp/cgspace-schema.csv WITH CSV HEADER;
<li>Talking to the CodeObia guys about the REST API I started to wonder why it’s so slow and how I can quantify it in order to ask the dspace-tech mailing list for help profiling it</li>
<li>Interestingly, the speed doesn’t get better after you request the same thing multiple times–it’s consistently bad on both CGSpace and DSpace Test!</li>
<pretabindex="0"><code>$ time http --print h 'https://cgspace.cgiar.org/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0'
<pretabindex="0"><code>$ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,bitstreams,parentCommunityList&limit=100&offset=0'
<li>I decided to update most of the existing metadata values that we have in <code>dc.rights</code> on CGSpace to be machine readable in SPDX format (with Creative Commons version if it was included)</li>
<li>Most of the are from Bioversity, and I asked Maria for permission before updating them</li>
<pretabindex="0"><code>UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%CC BY %';
UPDATE metadatavalue SET text_value='CC-BY-NC-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-ND%' AND text_value LIKE '%by-nc-nd%';
UPDATE metadatavalue SET text_value='CC-BY-NC-SA-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%BY-NC-SA%' AND text_value LIKE '%by-nc-sa%';
UPDATE metadatavalue SET text_value='CC-BY-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value LIKE '%/by/%';
UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%/by/%' AND text_value NOT LIKE '%zero%';
UPDATE metadatavalue SET text_value='CC-BY-NC-2.5' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE
'%/by-nc%' AND text_value LIKE '%2.5%';
UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%/by-nc%' AND text_value LIKE '%4.0%';
UPDATE metadatavalue SET text_value='CC-BY-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value LIKE '%Attribution %' AND text_value NOT LIKE '%zero%';
UPDATE metadatavalue SET text_value='CC-BY-NC-SA-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value NOT LIKE '%zero%' AND text_value LIKE '%4.0%' AND text_value LIKE '%Attribution-NonCommercial-ShareAlike%';
UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%4.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution-NonCommercial %';
UPDATE metadatavalue SET text_value='CC-BY-NC-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution-NonCommercial %';
UPDATE metadatavalue SET text_value='CC-BY-3.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value LIKE '%3.0%' AND text_value NOT LIKE '%zero%' AND text_value LIKE '%Attribution %';
UPDATE metadatavalue SET text_value='CC-BY-ND-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78184;
UPDATE metadatavalue SET text_value='CC-BY' WHERE resource_type_id=2 AND metadata_field_id=53 AND text_value NOT LIKE '%zero%' AND text_value NOT LIKE '%CC0%' AND text_value LIKE '%Attribution %' AND text_value NOT LIKE '%CC-%';
UPDATE metadatavalue SET text_value='CC-BY-NC-4.0' WHERE resource_type_id=2 AND metadata_field_id=53 AND resource_id=78564;
<li>I updated the fields on CGSpace and then started a re-index of Discovery</li>
<li>We also need to re-think the <code>dc.rights</code> field in the submission form: we should probably use a popup controlled vocabulary and list the Creative Commons values with version numbers and allow the user to enter their own (like the ORCID identifier field)</li>
<li>Ask Jane if we can use some of the BDP money to host AReS explorer on a more powerful server</li>
<li>IWMI sent me a list of new ORCID identifiers for their staff so I combined them with our list, updated the names with my <ahref="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script, and regenerated the controlled vocabulary:</li>
<li>I also decided to add the ORCID identifiers that MEL had sent us a few months ago…</li>
<li>One problem I had with the <code>resolve-orcids.py</code> script is that one user seems to have disabled their profile data since we last updated:</li>
<li>So I need to handle that situation in the script for sure, but I’m not sure what to do organizationally or ethically, since that user disabled their name! Do we remove him from the list?</li>
<li>I made a pull request and merged the ORCID updates into the <code>5_x-prod</code> branch (<ahref="https://github.com/ilri/DSpace/pull/397">#397</a>)</li>
<li>Improve the logic of name checking in my <ahref="https://gist.github.com/alanorth/57a88379126d844563c1410bd7b8d12b">resolve-orcids.py</a> script</li>
<li>I granted MEL’s deposit user admin access to IITA, CIP, Bioversity, and RTB communities on DSpace Test so they can start testing real depositing</li>
<li>I upgraded PostgreSQL to 9.6 on DSpace Test using Ansible, then had to manually <ahref="https://wiki.postgresql.org/wiki/Using_pg_upgrade_on_Ubuntu/Debian">migrate from 9.5 to 9.6</a>:</li>
<li>I was going to try to run Solr in Docker because I learned I can run Docker on Travis-CI (for testing my dspace-statistics-api), but the oldest official Solr images are for 5.5, and DSpace’s Solr configuration is for 4.9</li>
<li>This means our existing Solr configuration doesn’t run in Solr 5.5:</li>
<li>Last month I added “crawl” to the Tomcat Crawler Session Manager Valve’s regular expression matching, and it seems to be working for MegaIndex’s user agent:</li>
<li>Post message to Yammer about usage rights (dc.rights)</li>
<li>Change <code>build.properties</code> to use HTTPS for Handles in our <ahref="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure playbooks</a></li>
<pretabindex="0"><code>dspace=# UPDATE metadatavalue SET text_value=replace(text_value, 'http://', 'https://') WHERE resource_type_id=2 AND text_value LIKE 'http://hdl.handle.net%';
<li>While I was doing that I found two items using CGSpace URLs instead of handles in their <code>dc.identifier.uri</code> so I corrected those</li>
<li>I also found several items that had invalid characters or multiple Handles in some related URL field like <code>cg.link.reference</code> so I corrected those too</li>
<li>Improve the usage rights on the submission form by adding a default selection with no value as well as a better hint to look for the CC license on the publisher page or in the PDF (<ahref="https://github.com/ilri/DSpace/pull/398">#398</a>)</li>
<li>I deployed the changes on CGSpace, ran all system updates, and rebooted the server</li>
<li>Also, I updated all Handles in the database to use HTTPS:</li>
<pretabindex="0"><code>dspace=# UPDATE metadatavalue SET text_value=replace(text_value, 'http://', 'https://') WHERE resource_type_id=2 AND text_value LIKE 'http://hdl.handle.net%';
<li>Improve the usage rights (dc.rights) on CGSpace again by adding the long names in the submission form, as well as adding versio 3.0 and Creative Commons Zero (CC0) public domain license (<ahref="https://github.com/ilri/DSpace/pull/399">#399</a>)</li>
<li>Add “usage rights” to the XMLUI item display (<ahref="https://github.com/ilri/DSpace/pull/400">#400</a>)</li>
<li>I emailed the MARLO guys to ask if they can send us a dump of rights data and Handles from their system so we can tag our older items on CGSpace</li>
<li>Re-work the <ahref="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-api</a> to use Python’s native json instead of ujson to make it easier to deploy in places where we don’t have—or don’t want to have—Python headers and a compiler (like containers)</li>
<li>Re-work the deployment of the API to use systemd’s <code>EnvironmentFile</code> to read the environment variables instead of <code>Environment</code> in the <ahref="https://github.com/ilri/rmg-ansible-public">RMG Ansible infrastructure scripts</a></li>
<li>Send Peter and Jane a list of technical ToRs for AReS open source work:</li>
<li>Basic version of AReS that works with metadata fields present in default DSpace 5.x/6.x (for example author, date issued, type, subjects)
<ul>
<li>Ability to harvest multiple repositories</li>
<li>Configurable list of extra fields to harvest, per repository</li>
<li>Configurable list of field and value mappings for consistent display/search with multiple repositories</li>
<li>Configurable list of graphs/blocks to display on homepage</li>
<li>Optional harvesting of DSpace view/download statistics if dspace-statistics-api is available on repository</li>
<li>Optional harvesting of Altmetric mentions</li>
<li>Configurable scheduling of harvesting (daily, weekly, etc)</li>
<li>High-quality README.md on GitHub with description, requirements, deployment instructions, and license (GPLv3 unless ICARDA has a problem with that)</li>
<li>Maria asked if we can add publisher (<code>dc.publisher</code>) to the advanced search filters, so I created a <ahref="https://github.com/ilri/DSpace/issues/401">GitHub issue</a> to track it</li>
<li>I forked the <ahref="https://github.com/alanorth/SolrClient/tree/kazoo-2.5.0">SolrClient library and updated its kazoo dependency to be version 2.5.0</a> so we stop getting errors about “async” being a reserved keyword in Python 3.7</li>
<li>Then I re-generated the <code>requirements.txt</code> in the <ahref="https://github.com/alanorth/dspace-statistics-api">dspace-statistics-library</a> and released version 0.5.2</li>
<li>Then I re-deployed the API on DSpace Test, ran all system updates on the server, and rebooted it</li>
<li>I tested my hack of depositing to one collection where the default item and bistream READ policies are restricted and then mapping the item to another collection, but the item retains its default policies so Anonymous cannot see them in the mapped collection either</li>
<li>I merged the changes for adding publisher (<code>dc.publisher</code>) to the advanced search to the <code>5_x-prod</code> branch (<ahref="https://github.com/ilri/DSpace/pull/402">#402</a>)</li>
<li>I merged the changes for adding versionless Creative Commons licenses to the submission form to the <code>5_x-prod</code> branch (<ahref="https://github.com/ilri/DSpace/pull/403">#403</a>)</li>
<li>I suggested that they look into submitting via the <ahref="https://wiki.lyrasis.org/display/DSDOC5x/SWORDv2+Server">SWORDv2</a> protocol because it respects the workflows</li>
<li>They said that they’re not too worried about the hierarchical CG core schema, that they would just flatten metadata like affiliations when depositing to a DSpace repository</li>
<li>I said that it might be time to engage the DSpace community to add support for more advanced schemas in DSpace 7+ (perhaps partnership with Atmire?)</li>
<li>More discussion and planning for AReS open sourcing and Amman meeting in 2019-10</li>
<li>I did some work to clean up and improve the dspace-statistics-api README.md and project structure and <ahref="https://github.com/ilri/dspace-statistics-api">moved it to the ILRI organization on GitHub</a></li>
<li>Now the API serves some basic documentation on the root route</li>
<li>I want to announce it to the dspace-tech mailing list soon</li>