<li>Peter notified me that some filters on AReS were broken again
<ul>
<li>It’s the same issue with the field names getting <code>.keyword</code> appended to the end that I already <a href="https://github.com/ilri/OpenRXV/issues/66">filed an issue on OpenRXV about last month</a></li>
<li>I fixed the broken filters (careful to not edit any others, lest they break too!)</li>
<li>Fix an issue with the start page number for the DSpace REST API and statistics API in OpenRXV
<ul>
<li>The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
<li>I adjusted it to default to 0 and added a note to the admin screen</li>
<li>I realized that this issue was actually causing the first page of 100 statistics to be missing…</li>
<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
</ul>
</li>
</ul>
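<p>The off-by-one is easy to see with a minimal Python sketch of zero-based paging. It assumes each request asks for <code>limit</code> results starting at <code>offset = page * limit</code>; the helper names are hypothetical and this is not OpenRXV’s actual code:</p>
<pre><code class="language-python" data-lang="python"># Hypothetical helpers illustrating zero-based paging (not OpenRXV's code).
def offset_for(page, limit=100):
    """Return the zero-based offset of the first result on a zero-based page."""
    return page * limit


def pages_to_offsets(start_page, num_pages, limit=100):
    """Offsets requested when harvesting num_pages pages beginning at start_page."""
    return [offset_for(page, limit) for page in range(start_page, start_page + num_pages)]


# Starting at page 0 requests offsets 0, 100, 200: nothing is skipped.
print(pages_to_offsets(0, 3))  # [0, 100, 200]
# Starting at page 1 requests offsets 100, 200, 300: results 0-99 are never fetched.
print(pages_to_offsets(1, 3))  # [100, 200, 300]
</code></pre>
<p>With a start page of 1 the very first request already begins at offset 100, which is consistent with the missing first page of 100 statistics.</p>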
<ul>
<li>Start a re-index on AReS
<ul>
<li>First delete the old Elasticsearch temp index:</li>
<li>Then, the next morning when it’s done, check the results of the harvesting, back up the current <code>openrxv-items</code> index, and clone the <code>openrxv-items-temp</code> index to <code>openrxv-items</code>:</li>
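<li>A minimal Python sketch of the swap as Elasticsearch REST requests, assuming Elasticsearch listens on <code>localhost:9200</code> and using a hypothetical backup index name, <code>openrxv-items-backup</code>. The <code>_clone</code> API needs a write-blocked source and a target that does not exist yet, hence the extra settings and delete requests:
<pre><code class="language-python" data-lang="python">import json
import urllib.request

# Assumptions: Elasticsearch listens on localhost:9200, and the backup index
# name "openrxv-items-backup" is hypothetical.
ES_URL = "http://localhost:9200"


def write_block(index, blocked):
    """Request that sets or clears index.blocks.write (the _clone API needs it set on the source)."""
    return ("PUT", f"/{index}/_settings", {"settings": {"index.blocks.write": blocked}})


def reindex_plan(live="openrxv-items", temp="openrxv-items-temp", backup="openrxv-items-backup"):
    """Return (method, path, body) tuples for swapping the harvested temp index into place."""
    return [
        # Back up the current live index by cloning it.
        write_block(live, True),
        ("POST", f"/{live}/_clone/{backup}", None),
        # A clone target must not exist yet, so remove the old live index.
        ("DELETE", f"/{live}", None),
        # Clone the freshly harvested temp index to the live name.
        write_block(temp, True),
        ("POST", f"/{temp}/_clone/{live}", None),
        # Clear the write block in case the new live index inherited it.
        write_block(live, False),
    ]


def send(method, path, body=None):
    """Send one request to Elasticsearch and return the decoded JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Before harvesting, the old temp index would be removed with:
    #   send("DELETE", "/openrxv-items-temp")
    # After harvesting, print the swap plan for review instead of sending it.
    for method, path, body in reindex_plan():
        print(method, path, json.dumps(body) if body else "")
</code></pre>
The plan is only printed by default, so each request can be reviewed before passing its tuple to <code>send()</code>.</li>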
<li>There is one item that appears twice in AReS: <a href="https://cgspace.cgiar.org/handle/10568/66839">10568/66839</a>
<ul>
<li>If I use the Handle filter I see it twice, whereas other items only appear once</li>
<li>I filed a bug on OpenRXV: <a href="https://github.com/ilri/OpenRXV/issues/67">https://github.com/ilri/OpenRXV/issues/67</a></li>
</ul>
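<p>A quick way to check whether any other Handles are duplicated is to count them. This Python sketch assumes the harvested items are available as dicts with a <code>handle</code> key; the field name and the second Handle are hypothetical:</p>
<pre><code class="language-python" data-lang="python">from collections import Counter


def duplicate_handles(items):
    """Return {handle: count} for every Handle that occurs more than once."""
    counts = Counter(item["handle"] for item in items)
    return {handle: count for handle, count in counts.items() if count != 1}


# Hypothetical harvested items; 10568/12345 is a placeholder Handle.
items = [
    {"handle": "10568/66839"},
    {"handle": "10568/12345"},
    {"handle": "10568/66839"},
]
print(duplicate_handles(items))  # {'10568/66839': 2}
</code></pre>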
</li>
<li>Help Peter troubleshoot an issue with Altmetric badges on AReS
<ul>
<li>He generated a report of our repository from Altmetric and noticed that many were missing scores despite having scores on CGSpace item pages</li>
<li>AReS harvests Altmetric scores in batch using the Handle prefix (10568), while CGSpace uses the DOI if one is present and falls back to the Handle</li>
<li>I think it’s because some items were never tweeted, so Altmetric never made the link between the DOI and the Handle</li>
<li>I tweeted five items, and within an hour or so the DOI API link registered the associated Handle, and within an hour or so the Handle API link was also live with the same score</li>
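<li>A minimal Python sketch of the CGSpace lookup order, assuming Altmetric’s v1 API URL patterns <code>/v1/doi/DOI</code> and <code>/v1/handle/HANDLE</code> (the example DOI is hypothetical). A batch harvest keyed only on the Handle prefix misses scores that Altmetric has so far attached only to the DOI:
<pre><code class="language-python" data-lang="python">ALTMETRIC_API = "https://api.altmetric.com/v1"  # assumed v1 base URL


def cgspace_lookup_urls(handle, doi=None):
    """URLs in the order CGSpace tries them: the DOI if present, then the Handle."""
    urls = []
    if doi:
        urls.append(f"{ALTMETRIC_API}/doi/{doi}")
    urls.append(f"{ALTMETRIC_API}/handle/{handle}")
    return urls


# 10.1234/example is a hypothetical DOI.
print(cgspace_lookup_urls("10568/66839", doi="10.1234/example"))
print(cgspace_lookup_urls("10568/66839"))
</code></pre>
</li>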
<li>A user sent me <a href="https://github.com/ilri/dspace-statistics-api/issues/12">feedback about the dspace-statistics-api</a>
<ul>
<li>He noticed that the indexer fails if there are unmigrated legacy records in Solr</li>
<li>I added a UUID filter to the queries in the indexer</li>
</ul>
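<p>One way to skip unmigrated records, sketched in Python: legacy records carry integer ids while migrated ones carry UUIDs, so the indexer can either filter client-side with <code>uuid.UUID()</code> or add a Solr filter query. The regex filter <code>id:/.{36}/</code> below is an assumption for illustration, not necessarily the query the dspace-statistics-api uses:</p>
<pre><code class="language-python" data-lang="python">import uuid

# Assumed Solr filter query: only ids exactly 36 characters long (the UUID length).
UUID_FILTER = "id:/.{36}/"


def is_uuid(value):
    """True if value parses as a UUID, as migrated DSpace 6 records do."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def solr_params(query):
    """Solr select parameters with the UUID filter added as an fq."""
    return {"q": query, "fq": UUID_FILTER, "wt": "json"}


docs = [
    {"id": "1e8fb96c-b994-4fe2-8f0c-0a98ab138be0"},  # migrated record (UUID)
    {"id": "12345"},  # unmigrated legacy record (hypothetical integer id)
]
print([doc["id"] for doc in docs if is_uuid(doc["id"])])
# ['1e8fb96c-b994-4fe2-8f0c-0a98ab138be0']
</code></pre>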
</li>
<li>I generated a CSV of titles and Handles for 2019 and 2020 items for Peter to tweet
<ul>
<li>We need to make sure that Altmetric has linked them all with their DOIs</li>
<li>I wrote a quick and dirty script called <a href="https://gist.github.com/alanorth/281b7624301049e8fa91742b9b8c51b9">doi-to-handle.py</a> to read the DOIs from a text file, query the database, and save the Handles and titles to a CSV</li>
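<li>The core of such a script can be sketched in Python as below. This is not the actual <code>doi-to-handle.py</code>: the database query is replaced by a hypothetical <code>lookup</code> dict mapping normalized DOIs to (title, Handle) pairs, and the example DOI, title, and Handle are placeholders:
<pre><code class="language-python" data-lang="python">import csv
import io

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw):
    """Lowercase a DOI and strip whitespace and a leading resolver prefix."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def dois_to_csv(lines, lookup, out):
    """Write handle,title rows for each DOI found in lookup; unknown DOIs are skipped."""
    writer = csv.writer(out)
    writer.writerow(["handle", "title"])
    for line in lines:
        doi = normalize_doi(line)
        if doi in lookup:
            title, handle = lookup[doi]
            writer.writerow([handle, title])


# Hypothetical stand-in for the database query: maps normalized DOI to (title, Handle).
lookup = {"10.1234/example": ("An example title", "10568/12345")}
out = io.StringIO()
dois_to_csv(["https://doi.org/10.1234/Example\n", "10.9999/missing\n"], lookup, out)
print(out.getvalue())
</code></pre>
Since DOIs are case-insensitive, lowercasing both sides avoids missed matches when the text file and the repository use different capitalization.</li>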
<li>He wanted me to give him CSV export permissions on CGSpace, but I told him that this requires super admin privileges, so I’m not comfortable with it</li>
<li>Import twenty CABI book chapters for Abenet</li>
<li>Udana and some editors from IWMI are still having problems editing metadata during the workflow step
<ul>
<li>It is the same issue Peter reported last month, where values he edits are not saved when the item gets archived</li>
<li>I added myself to the edit and approval steps of <a href="https://dspacetest.cgiar.org/handle/10568/81589">the collection</a> on DSpace Test and asked Udana to submit an item there for me to test</li>
</ul>
</li>
<li>Atmire got back to me about the duplicate data in Solr
<ul>
<li>They want to arrange a time for us to do the stats processing so they can monitor it</li>
<li>I proposed that I set everything up with a fresh Solr snapshot from CGSpace and then let them start the stats process</li>
</ul>
</li>
</ul>
<h2 id="2021-01-10">2021-01-10</h2>
<ul>
<li>Dominique from IWMI asked about API access to the IWMI collections
<ul>
<li>A partner of theirs called AMCOW is interested in harvesting their publications</li>
<li>I told her that they can use the REST API or OAI to get them from the <a href="https://cgspace.cgiar.org/handle/10568/36185">IWMI Journal Articles collection</a>:
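<p>A minimal Python sketch of both kinds of URL, assuming DSpace’s default OAI set naming (<code>col_</code> plus the Handle with the slash replaced by an underscore) and the DSpace 6 REST endpoints; the collection UUID passed to <code>rest_items_url()</code> is a placeholder that would come from the <code>/rest/handle/10568/36185</code> response:</p>
<pre><code class="language-python" data-lang="python">from urllib.parse import urlencode

BASE_URL = "https://cgspace.cgiar.org"
IWMI_JOURNAL_ARTICLES = "10568/36185"


def oai_list_records_url(handle, metadata_prefix="oai_dc"):
    """OAI-PMH ListRecords URL for one collection (assumes DSpace's col_ set naming)."""
    set_spec = "col_" + handle.replace("/", "_")
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_spec}
    return f"{BASE_URL}/oai/request?{urlencode(params)}"


def rest_handle_url(handle):
    """DSpace 6 REST URL that resolves a Handle to an object, including its UUID."""
    return f"{BASE_URL}/rest/handle/{handle}"


def rest_items_url(collection_uuid, limit=100, offset=0):
    """DSpace 6 REST URL for one page of a collection's items; offset is zero-based."""
    params = {"limit": limit, "offset": offset}
    return f"{BASE_URL}/rest/collections/{collection_uuid}/items?{urlencode(params)}"


print(oai_list_records_url(IWMI_JOURNAL_ARTICLES))
print(rest_handle_url(IWMI_JOURNAL_ARTICLES))
</code></pre>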
<li>Udana submitted an item to <a href="https://dspacetest.cgiar.org/handle/10568/81589">the collection</a> on DSpace Test that I discussed last week
<ul>
<li>I was able to take the task, add a new AGROVOC subject, approve the task, and commit it to archive</li>
<li>The final item had my new AGROVOC subject, so I don’t see the issue</li>
<li>Perhaps the issue only occurs when we replace an existing field? Or only on IWMI fields? I don’t know…</li>
<li>Also there is this warning that occurs in the DSpace log during editing (and many other operations):</li>
</ul>
</li>
</ul>
<pre><code class="language-console" data-lang="console">2021-01-10 10:03:27,692 WARN com.atmire.metadataquality.batchedit.BatchEditConsumer @ BatchEditConsumer should not have been given this kind of Subject in an event, skipping: org.dspace.event.Event(eventType=MODIFY, SubjectType=ITEM, SubjectID=1e8fb96c-b994-4fe2-8f0c-0a98ab138be0, ObjectType=(Unknown), ObjectID=null, TimeStamp=1610269383279, dispatcher=1544803905, detail=[null], transactionID="TX35636856957739531161091194485578658698")
</code></pre><ul>
<li>I filed <a href="https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=907">a bug on Atmire’s issue tracker</a></li>
<li>Peter asked me to move the CGIAR Gender Platform community to the top level of CGSpace, but I get an error when I use the community-filiator command:</li>
<li>The AReS indexing finished this morning and I moved the <code>openrxv-items-temp</code> index to <code>openrxv-items</code> (see above)
<ul>
<li>I sorted the explorer results by Altmetric attention score and I see a few new ones at the top, so I think the recent tweeting of Handles by Peter and myself worked</li>
</ul>
</li>
<li>I deployed the community-filiator fix on CGSpace and moved the Gender Platform community to the top level of CGSpace:</li>
<li>IWMI is really pressuring us to have a periodic CSV export of their community
<ul>
<li>I decided to write a systemd timer to use <code>dspace metadata-export</code> every week, and made an nginx alias to make it available <a href="https://cgspace.cgiar.org/iwmi.csv">publicly</a></li>
<li>It is part of the <a href="https://github.com/ilri/rmg-ansible-public">Ansible infrastructure scripts</a> that I use to provision the servers</li>
<li>I wrote to Atmire to tell them to try their CUA duplicates processor on DSpace Test whenever they get a chance this week
<ul>
<li>I verified that there were indeed duplicate metadata values in the <code>userAgent_ngram</code> and <code>userAgent_search</code> fields, even in the first few results I saw in Solr</li>
<li>For reference, the UID of the record I saw with duplicate metadata was: 50e52a06-ffb7-4597-8d92-1c608cc71c98</li>
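<li>A minimal Python sketch for spotting such repeats in a record that has already been fetched from Solr as JSON; the user agent string below is a placeholder:
<pre><code class="language-python" data-lang="python">def repeated_values(doc):
    """Map each multi-valued field of a Solr document to the values that appear more than once."""
    repeated = {}
    for field, values in doc.items():
        if not isinstance(values, list):
            continue
        seen, dupes = set(), []
        for value in values:
            if value in seen and value not in dupes:
                dupes.append(value)
            seen.add(value)
        if dupes:
            repeated[field] = dupes
    return repeated


# The uid is the record mentioned above; the user agent value is a placeholder.
doc = {
    "uid": "50e52a06-ffb7-4597-8d92-1c608cc71c98",
    "userAgent_ngram": ["Mozilla/5.0 (example)", "Mozilla/5.0 (example)"],
    "userAgent_search": ["Mozilla/5.0 (example)", "Mozilla/5.0 (example)"],
}
print(repeated_values(doc))
</code></pre>
</li>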
<li>I filed <a href="https://github.com/AgriculturalSemantics/cg-core/issues/30">an issue on cg-core</a> asking how to handle series names and numbers
<ul>
<li>Currently the values are in format “series name; series number” in the <code>dc.relation.ispartofseries</code> field, but Peter wants to be able to separate them</li>
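<li>Splitting the existing values is straightforward; a minimal Python sketch that splits at the last semicolon (the example values are hypothetical):
<pre><code class="language-python" data-lang="python">def split_series(value):
    """Split 'series name; series number' at the last semicolon into (name, number)."""
    if ";" not in value:
        return value.strip(), None
    name, _, number = value.rpartition(";")
    return name.strip(), number.strip() or None


# Hypothetical dc.relation.ispartofseries values.
print(split_series("Example Working Paper; 42"))  # ('Example Working Paper', '42')
print(split_series("Example Report Series"))  # ('Example Report Series', None)
</code></pre>
Splitting at the last semicolon keeps any semicolons inside a series name attached to the name.</li>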