<li>Export another list of ALL subjects on CGSpace, including AGROVOC and non-AGROVOC for Enrico:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>localhost/dspace63= &gt; \COPY (SELECT DISTINCT LOWER(text_value) AS subject, count(*) FROM metadatavalue WHERE dspace_object_id in (SELECT dspace_object_id FROM item) AND metadata_field_id IN (119, 120, 127, 122, 128, 125, 135, 203, 208, 210, 215, 123, 236, 242, 187) GROUP BY subject ORDER BY count DESC) to /tmp/2021-07-01-all-subjects.csv WITH CSV HEADER;</span></span></code></pre></div>
<li>IWMI notified me that AReS was down with an HTTP 502 error
<ul>
<li>Looking at UptimeRobot I see it has been down for 33 hours, but I never got a notification</li>
<li>I don’t see anything in the Elasticsearch container logs, or the systemd journal on the host, but I notice that the <code>angular_nginx</code> container isn’t running</li>
<li>I simply started it and AReS was running again:</li>
</ul>
</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2021-06/'>Read more →</a>
<li>I looked at the top user agents and IPs in the Solr statistics for last month and I see these user agents:
<ul>
<li>“RI/1.0”, 1337</li>
<li>“Microsoft Office Word 2014”, 941</li>
</ul>
</li>
<li>I will add the RI/1.0 pattern to our DSpace agents overload and purge them from Solr (we had previously seen this agent with 9,000 hits or so in 2020-09), but I think I will leave the Microsoft Word one… as that’s an actual user…</li>
</ul>
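<p>As a rough illustration of the decision above, the following Python sketch filters the two user agents from the list by a pattern list. The pattern <code>^RI/1\.0$</code> and the names <code>AGENT_PATTERNS</code>, <code>matches_any</code>, and <code>hits_to_purge</code> are assumptions made for this example; they are not taken from the actual DSpace agents file or purge tooling.</p>

```python
import re

# Hypothetical pattern list for illustration only; the real DSpace
# agents file is not reproduced here.
AGENT_PATTERNS = [r"^RI/1\.0$"]

# Hit counts copied from the Solr statistics listed above.
HITS = {"RI/1.0": 1337, "Microsoft Office Word 2014": 941}


def matches_any(agent, patterns):
    """Return True if the user agent matches at least one regex pattern."""
    return any(re.search(pattern, agent) for pattern in patterns)


def hits_to_purge(hits, patterns):
    """Select only the agents (and their hit counts) that match a pattern."""
    return {agent: count for agent, count in hits.items()
            if matches_any(agent, patterns)}


if __name__ == "__main__":
    print(hits_to_purge(HITS, AGENT_PATTERNS))
```

<p>Anchoring the pattern keeps the Microsoft Word agent out of the purge set, which matches the intent of leaving real users alone.</p>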
<a href='https://alanorth.github.io/cgspace-notes/2021-05/'>Read more →</a>
<p>Changes to CGSpace metadata fields to align more with DC, QDC, and DCTERMS as well as CG Core v2. Implemented on 2021-02-21.</p>
<p>With reference to <a href="https://agriculturalsemantics.github.io/cg-core/cgcore.html">CG Core v2 draft standard</a> by Marie-Angélique as well as <a href="http://www.dublincore.org/specifications/dublin-core/dcmi-terms/">DCMI DCTERMS</a>.</p>
<a href='https://alanorth.github.io/cgspace-notes/cgspace-cgcorev2-migration/'>Read more →</a>
<li>Abenet said that CIP found more duplicate records in their export from AReS
<ul>
<li>I re-opened <a href="https://github.com/ilri/OpenRXV/issues/67">the issue</a> on OpenRXV where we had previously noticed this</li>
<li>The shared link where the duplicates are is here: <a href="https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6">https://cgspace.cgiar.org/explorer/shared/heEOz3YBnXdK69bR2ra6</a></li>
</ul>
</li>
<li>I had a call with CodeObia to discuss the work on OpenRXV</li>
<li>Check the results of the AReS harvesting from last night:</li>
<li>Peter notified me that some filters on AReS were broken again
<ul>
<li>It’s the same issue, with the field names getting <code>.keyword</code> appended to the end, that I already <a href="https://github.com/ilri/OpenRXV/issues/66">filed an issue on OpenRXV about last month</a></li>
<li>I fixed the broken filters (careful to not edit any others, lest they break too!)</li>
</ul>
</li>
<li>Fix an issue with start page number for the DSpace REST API and statistics API in OpenRXV
<ul>
<li>The start page had been “1” in the UI, but in the backend they were doing some gymnastics to adjust to the zero-based offset/limit/page of the DSpace REST API and the statistics API</li>
<li>I adjusted it to default to 0 and added a note to the admin screen</li>
<li>I realized that this issue was actually causing the first page of 100 statistics to be missing…</li>
<li>For example, <a href="https://cgspace.cgiar.org/handle/10568/66839">this item</a> has 51 views on CGSpace, but 0 on AReS</li>
</ul>
</li>
</ul>
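<p>A minimal Python sketch of why a start page of 1 drops the first 100 records when the API is zero-based. This is not OpenRXV's code; <code>page_offsets</code> and <code>harvest</code> are hypothetical helpers, and the page size of 100 is taken from the note above.</p>

```python
def page_offsets(start_page, page_size, total):
    """Yield (page, offset) pairs for a zero-based paged API."""
    page = start_page
    while page * page_size < total:
        yield page, page * page_size
        page += 1


def harvest(records, start_page, page_size=100):
    """Collect records page by page, beginning at start_page."""
    harvested = []
    for _page, offset in page_offsets(start_page, page_size, len(records)):
        harvested.extend(records[offset:offset + page_size])
    return harvested


if __name__ == "__main__":
    records = list(range(250))  # stand-in for 250 statistics records
    from_zero = harvest(records, start_page=0)
    from_one = harvest(records, start_page=1)
    print(len(from_zero), len(from_one))
```

<p>Starting at page 1 skips offsets 0 through 99 entirely, so any item whose statistics fall on the first page would show zero views after harvesting.</p>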
<a href='https://alanorth.github.io/cgspace-notes/2021-01/'>Read more →</a>
<li>Atmire responded about the issue with duplicate data in our Solr statistics
<ul>
<li>They noticed that some records in the statistics-2015 core haven’t been migrated with the AtomicStatisticsUpdateCLI tool yet and assumed that I hadn’t migrated any of the records yet</li>
<li>That’s strange, as I checked all ten cores and 2015 is the only one with some unmigrated documents, according to the <code>cua_version</code> field</li>
<li>I started processing those (about 411,000 records):</li>
</ul>
</li>
</ul>
<a href='https://alanorth.github.io/cgspace-notes/2020-12/'>Read more →</a>