Add notes

Alan Orth 2023-12-29 12:08:57 +03:00
parent 293b500b26
commit 264cdcf1db
Signed by: alanorth
GPG Key ID: 0FB860CC9C45B1B9
38 changed files with 225 additions and 52 deletions

@ -185,4 +185,87 @@ dspace=*# COMMIT;
COMMIT
```
## 2023-12-25
- Looking into [Solr backups](https://solr.apache.org/guide/8_11/making-and-restoring-backups.html)
- Since we are not running in Solr Cloud mode we need to use the replication endpoint for Solr standalone
- This works:
```console
$ curl 'http://localhost:8983/solr/statistics/replication?command=backup'
{
"responseHeader":{
"status":0,
"QTime":26},
"status":"OK"}
```
- Then I saw the size of the snapshot reach the size of the index...
```console
# du -sh /var/solr/data/configsets/statistics/data/*
22G /var/solr/data/configsets/statistics/data/index
16G /var/solr/data/configsets/statistics/data/snapshot.20231225074111671
4.0K /var/solr/data/configsets/statistics/data/snapshot_metadata
# du -sh /var/solr/data/configsets/statistics/data/*
22G /var/solr/data/configsets/statistics/data/index
20G /var/solr/data/configsets/statistics/data/snapshot.20231225074111671
4.0K /var/solr/data/configsets/statistics/data/snapshot_metadata
# du -sh /var/solr/data/configsets/statistics/data/*
22G /var/solr/data/configsets/statistics/data/index
21G /var/solr/data/configsets/statistics/data/snapshot.20231225074111671
4.0K /var/solr/data/configsets/statistics/data/snapshot_metadata
# du -sh /var/solr/data/configsets/statistics/data/*
22G /var/solr/data/configsets/statistics/data/index
22G /var/solr/data/configsets/statistics/data/snapshot.20231225074111671
4.0K /var/solr/data/configsets/statistics/data/snapshot_metadata
```
- Then I deleted all documents in the core and restored from the snapshot backup:
```console
$ curl http://localhost:8983/solr/statistics/update -H "Content-type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
$ curl http://localhost:8983/solr/statistics/update -H "Content-type: text/xml" --data-binary '<commit />'
$ curl 'http://localhost:8983/solr/statistics/replication?command=restore&name=statistics'
```
- Interestingly the restore worked fine, but created a new index directory:
```console
# du -sh /var/solr/data/configsets/statistics/data/*
4.0K /var/solr/data/configsets/statistics/data/index.properties
22G /var/solr/data/configsets/statistics/data/restore.20231225154626463
4.0K /var/solr/data/configsets/statistics/data/snapshot_metadata
22G /var/solr/data/configsets/statistics/data/snapshot.statistics
```
- Not sure of the implications of that, but Solr uses the data just fine
- I can surely use this for atomic Solr backups
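- A minimal sketch of how the backup and restore calls above could be scripted, assuming the same standalone core and base URL; the `solr_replication_url` helper and the `SOLR_BASE` variable are hypothetical names of mine, and only the `backup`, `details`, `restore`, and `restorestatus` replication commands come from Solr itself:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Solr or DSpace): build a replication
# handler URL for a standalone Solr core.
# SOLR_BASE is an assumption matching the localhost URL used in the notes.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

solr_replication_url() {
    local core="$1" command="$2" name="${3:-}"
    local url="${SOLR_BASE}/${core}/replication?command=${command}"
    # The optional name makes Solr write to (or restore from) snapshot.<name>
    # instead of a timestamped snapshot directory.
    if [ -n "$name" ]; then
        url="${url}&name=${name}"
    fi
    printf '%s\n' "$url"
}

# Usage against a live Solr (left commented out because it needs a server):
#   curl -s "$(solr_replication_url statistics backup statistics)"
#   curl -s "$(solr_replication_url statistics details)"        # backup status
#   curl -s "$(solr_replication_url statistics restore statistics)"
#   curl -s "$(solr_replication_url statistics restorestatus)"  # restore status
```

- Polling `details` and `restorestatus` would probably be more reliable than watching `du` until the sizes match, but I have not tested that on this index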
## 2023-12-27
- Delete duplicate metadata as described in my DSpace issue from last year: https://github.com/DSpace/DSpace/issues/8253
- Do some other metadata cleanups on CGSpace
- I also looked up our DOIs on Crossref to get some missing abstracts and correct licenses and dates
- Some minor work on the CGSpace DSpace 7 theme to fix the navbar on mobile
- Some work on the IFPRI ISNAR archive
## 2023-12-28
- I started porting the [cgspace-java-helpers](https://github.com/ilri/cgspace-java-helpers) to DSpace 7
- Some work on the IFPRI ISNAR archive
- I ended up going through most of the PDFs to get better dates and abstracts
## 2023-12-29
- I created a new Hetzner server to replace the current DSpace 6 CGSpace next week when we migrate to DSpace 7
- Interesting, I haven't checked for content pointing to legacy domains in several years (!)
- `inurl:mahider.cgiar.org`: 0 results on Google!
- `inurl:mahider.ilri.org`: 2,100 results on Google
- `inurl:mahider.ilri.org inurl:https`: 2 results on Google (!)
- `inurl:dspace.ilri.org`: 1,390 results on Google
- `inurl:dspace.ilri.org inurl:https`: 0 results on Google (!)
- So it seems I can do away with the HTTPS virtual hosts finally
- Well my current certificates expired on 2021-02-13 and nobody noticed... so...
<!-- vim: set sw=2 ts=2: -->

@ -7,17 +7,17 @@
<meta property="og:title" content="July, 2023" />
<meta property="og:description" content="2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &ldquo;Copyrighted; all rights reserved&rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&rsquo;s usually copyrighted (could still be open access, but we can&rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&rsquo;t like the Impact Area icons as a component because they don&rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several 
hours and even over one day old so I killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &ldquo;acceptedVersion&rdquo;, which is presumably the author&rsquo;s version, as opposed to the &ldquo;publishedVersion&rdquo;, which means it&rsquo;s available as open access on the publisher&rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as describe in my DSpace issue from last year: https://github." />
<meta property="og:description" content="2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &ldquo;Copyrighted; all rights reserved&rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&rsquo;s usually copyrighted (could still be open access, but we can&rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&rsquo;t like the Impact Area icons as a component because they don&rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several 
hours and even over one day old so I killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &ldquo;acceptedVersion&rdquo;, which is presumably the author&rsquo;s version, as opposed to the &ldquo;publishedVersion&rdquo;, which means it&rsquo;s available as open access on the publisher&rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as described in my DSpace issue from last year: https://github." />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2023-07/" />
<meta property="article:published_time" content="2023-07-01T17:14:36+03:00" />
<meta property="article:modified_time" content="2023-08-02T23:04:11+03:00" />
<meta property="article:modified_time" content="2023-12-27T10:48:32+03:00" />
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="July, 2023"/>
<meta name="twitter:description" content="2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &ldquo;Copyrighted; all rights reserved&rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&rsquo;s usually copyrighted (could still be open access, but we can&rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&rsquo;t like the Impact Area icons as a component because they don&rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several 
hours and even over one day old so I killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &ldquo;acceptedVersion&rdquo;, which is presumably the author&rsquo;s version, as opposed to the &ldquo;publishedVersion&rdquo;, which means it&rsquo;s available as open access on the publisher&rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as describe in my DSpace issue from last year: https://github."/>
<meta name="twitter:description" content="2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &ldquo;Copyrighted; all rights reserved&rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&rsquo;s usually copyrighted (could still be open access, but we can&rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&rsquo;t like the Impact Area icons as a component because they don&rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several 
hours and even over one day old so I killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &ldquo;acceptedVersion&rdquo;, which is presumably the author&rsquo;s version, as opposed to the &ldquo;publishedVersion&rdquo;, which means it&rsquo;s available as open access on the publisher&rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as described in my DSpace issue from last year: https://github."/>
<meta name="generator" content="Hugo 0.121.1">
@ -30,7 +30,7 @@
"url": "https://alanorth.github.io/cgspace-notes/2023-07/",
"wordCount": "2255",
"datePublished": "2023-07-01T17:14:36+03:00",
"dateModified": "2023-08-02T23:04:11+03:00",
"dateModified": "2023-12-27T10:48:32+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -170,7 +170,7 @@
<li>I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses</li>
</ul>
</li>
<li>Delete duplicate metadata as describe in my DSpace issue from last year: <a href="https://github.com/DSpace/DSpace/issues/8253">https://github.com/DSpace/DSpace/issues/8253</a></li>
<li>Delete duplicate metadata as described in my DSpace issue from last year: <a href="https://github.com/DSpace/DSpace/issues/8253">https://github.com/DSpace/DSpace/issues/8253</a></li>
<li>Start working on some statistics on AGROVOC usage for my presentation next week
<ul>
<li>I used the following SQL query to dump values from all subject fields and lower case them:</li>

@ -11,7 +11,7 @@
<meta property="og:type" content="article" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2023-12/" />
<meta property="article:published_time" content="2023-12-01T08:48:36+03:00" />
<meta property="article:modified_time" content="2023-12-18T23:15:27+03:00" />
<meta property="article:modified_time" content="2023-12-21T10:08:59+03:00" />
@ -28,9 +28,9 @@
"@type": "BlogPosting",
"headline": "December, 2023",
"url": "https://alanorth.github.io/cgspace-notes/2023-12/",
"wordCount": "980",
"wordCount": "1323",
"datePublished": "2023-12-01T08:48:36+03:00",
"dateModified": "2023-12-18T23:15:27+03:00",
"dateModified": "2023-12-21T10:08:59+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
@ -296,7 +296,97 @@
</span></span><span style="display:flex;"><span>UPDATE 462
</span></span><span style="display:flex;"><span>dspace=*# COMMIT;
</span></span><span style="display:flex;"><span>COMMIT
</span></span></code></pre></div><!-- raw HTML omitted -->
</span></span></code></pre></div><h2 id="2023-12-25">2023-12-25</h2>
<ul>
<li>Looking into <a href="https://solr.apache.org/guide/8_11/making-and-restoring-backups.html">Solr backups</a>
<ul>
<li>Since we are not running in Solr Cloud mode we need to use the replication endpoint for Solr standalone</li>
<li>This works:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl <span style="color:#e6db74">&#39;http://localhost:8983/solr/statistics/replication?command=backup&#39;</span>
</span></span><span style="display:flex;"><span>{
</span></span><span style="display:flex;"><span> &#34;responseHeader&#34;:{
</span></span><span style="display:flex;"><span> &#34;status&#34;:0,
</span></span><span style="display:flex;"><span> &#34;QTime&#34;:26},
</span></span><span style="display:flex;"><span> &#34;status&#34;:&#34;OK&#34;}
</span></span></code></pre></div><ul>
<li>Then I saw the size of the snapshot reach the size of the index&hellip;</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># du -sh /var/solr/data/configsets/statistics/data/*
</span></span><span style="display:flex;"><span>22G /var/solr/data/configsets/statistics/data/index
</span></span><span style="display:flex;"><span>16G /var/solr/data/configsets/statistics/data/snapshot.20231225074111671
</span></span><span style="display:flex;"><span>4.0K /var/solr/data/configsets/statistics/data/snapshot_metadata
</span></span><span style="display:flex;"><span># du -sh /var/solr/data/configsets/statistics/data/*
</span></span><span style="display:flex;"><span>22G /var/solr/data/configsets/statistics/data/index
</span></span><span style="display:flex;"><span>20G /var/solr/data/configsets/statistics/data/snapshot.20231225074111671
</span></span><span style="display:flex;"><span>4.0K /var/solr/data/configsets/statistics/data/snapshot_metadata
</span></span><span style="display:flex;"><span># du -sh /var/solr/data/configsets/statistics/data/*
</span></span><span style="display:flex;"><span>22G /var/solr/data/configsets/statistics/data/index
</span></span><span style="display:flex;"><span>21G /var/solr/data/configsets/statistics/data/snapshot.20231225074111671
</span></span><span style="display:flex;"><span>4.0K /var/solr/data/configsets/statistics/data/snapshot_metadata
</span></span><span style="display:flex;"><span># du -sh /var/solr/data/configsets/statistics/data/*
</span></span><span style="display:flex;"><span>22G /var/solr/data/configsets/statistics/data/index
</span></span><span style="display:flex;"><span>22G /var/solr/data/configsets/statistics/data/snapshot.20231225074111671
</span></span><span style="display:flex;"><span>4.0K /var/solr/data/configsets/statistics/data/snapshot_metadata
</span></span></code></pre></div><ul>
<li>Then I deleted all documents in the core and restored from the snapshot backup:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>$ curl http://localhost:8983/solr/statistics/update -H <span style="color:#e6db74">&#34;Content-type: text/xml&#34;</span> --data-binary <span style="color:#e6db74">&#39;&lt;delete&gt;&lt;query&gt;*:*&lt;/query&gt;&lt;/delete&gt;&#39;</span>
</span></span><span style="display:flex;"><span>$ curl http://localhost:8983/solr/statistics/update -H <span style="color:#e6db74">&#34;Content-type: text/xml&#34;</span> --data-binary <span style="color:#e6db74">&#39;&lt;commit /&gt;&#39;</span>
</span></span><span style="display:flex;"><span>$ curl <span style="color:#e6db74">&#39;http://localhost:8983/solr/statistics/replication?command=restore&amp;name=statistics&#39;</span>
</span></span></code></pre></div><ul>
<li>Interestingly the restore worked fine, but created a new index directory:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># du -sh /var/solr/data/configsets/statistics/data/*
</span></span><span style="display:flex;"><span>4.0K /var/solr/data/configsets/statistics/data/index.properties
</span></span><span style="display:flex;"><span>22G /var/solr/data/configsets/statistics/data/restore.20231225154626463
</span></span><span style="display:flex;"><span>4.0K /var/solr/data/configsets/statistics/data/snapshot_metadata
</span></span><span style="display:flex;"><span>22G /var/solr/data/configsets/statistics/data/snapshot.statistics
</span></span></code></pre></div><ul>
<li>Not sure of the implications of that, but Solr uses the data just fine</li>
<li>I can surely use this for atomic Solr backups</li>
</ul>
<h2 id="2023-12-27">2023-12-27</h2>
<ul>
<li>Delete duplicate metadata as described in my DSpace issue from last year: <a href="https://github.com/DSpace/DSpace/issues/8253">https://github.com/DSpace/DSpace/issues/8253</a></li>
<li>Do some other metadata cleanups on CGSpace
<ul>
<li>I also looked up our DOIs on Crossref to get some missing abstracts and correct licenses and dates</li>
</ul>
</li>
<li>Some minor work on the CGSpace DSpace 7 theme to fix the navbar on mobile</li>
<li>Some work on the IFPRI ISNAR archive</li>
</ul>
<h2 id="2023-12-28">2023-12-28</h2>
<ul>
<li>I started porting the <a href="https://github.com/ilri/cgspace-java-helpers">cgspace-java-helpers</a> to DSpace 7</li>
<li>Some work on the IFPRI ISNAR archive
<ul>
<li>I ended up going through most of the PDFs to get better dates and abstracts</li>
</ul>
</li>
</ul>
<h2 id="2023-12-29">2023-12-29</h2>
<ul>
<li>I created a new Hetzner server to replace the current DSpace 6 CGSpace next week when we migrate to DSpace 7</li>
<li>Interesting, I haven&rsquo;t checked for content pointing to legacy domains in several years (!)
<ul>
<li><code>inurl:mahider.cgiar.org</code>: 0 results on Google!</li>
<li><code>inurl:mahider.ilri.org</code>: 2,100 results on Google</li>
<li><code>inurl:mahider.ilri.org inurl:https</code>: 2 results on Google (!)</li>
<li><code>inurl:dspace.ilri.org</code>: 1,390 results on Google</li>
<li><code>inurl:dspace.ilri.org inurl:https</code>: 0 results on Google (!)</li>
</ul>
</li>
<li>So it seems I can do away with the HTTPS virtual hosts finally
<ul>
<li>Well my current certificates expired on 2021-02-13 and nobody noticed&hellip; so&hellip;</li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />
@ -213,7 +213,7 @@
</p>
</header>
2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &ldquo;Copyrighted; all rights reserved&rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&rsquo;s usually copyrighted (could still be open access, but we can&rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&rsquo;t like the Impact Area icons as a component because they don&rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several hours and even over one day old so I 
killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &ldquo;acceptedVersion&rdquo;, which is presumably the author&rsquo;s version, as opposed to the &ldquo;publishedVersion&rdquo;, which means it&rsquo;s available as open access on the publisher&rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as describe in my DSpace issue from last year: https://github.
2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &ldquo;Copyrighted; all rights reserved&rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&rsquo;s usually copyrighted (could still be open access, but we can&rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&rsquo;t like the Impact Area icons as a component because they don&rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several hours and even over one day old so I 
killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &ldquo;acceptedVersion&rdquo;, which is presumably the author&rsquo;s version, as opposed to the &ldquo;publishedVersion&rdquo;, which means it&rsquo;s available as open access on the publisher&rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as described in my DSpace issue from last year: https://github.
<a href='https://alanorth.github.io/cgspace-notes/2023-07/'>Read more →</a>
</article>

@ -48,7 +48,7 @@
<link>https://alanorth.github.io/cgspace-notes/2023-07/</link>
<pubDate>Sat, 01 Jul 2023 17:14:36 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2023-07/</guid>
<description>2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &amp;ldquo;Copyrighted; all rights reserved&amp;rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&amp;rsquo;s usually copyrighted (could still be open access, but we can&amp;rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&amp;hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&amp;rsquo;t like the Impact Area icons as a component because they don&amp;rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several 
hours and even over one day old so I killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&amp;rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &amp;ldquo;acceptedVersion&amp;rdquo;, which is presumably the author&amp;rsquo;s version, as opposed to the &amp;ldquo;publishedVersion&amp;rdquo;, which means it&amp;rsquo;s available as open access on the publisher&amp;rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as describe in my DSpace issue from last year: https://github.</description>
<description>2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &amp;ldquo;Copyrighted; all rights reserved&amp;rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&amp;rsquo;s usually copyrighted (could still be open access, but we can&amp;rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&amp;hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&amp;rsquo;t like the Impact Area icons as a component because they don&amp;rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several 
hours and even over one day old so I killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&amp;rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &amp;ldquo;acceptedVersion&amp;rdquo;, which is presumably the author&amp;rsquo;s version, as opposed to the &amp;ldquo;publishedVersion&amp;rdquo;, which means it&amp;rsquo;s available as open access on the publisher&amp;rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as described in my DSpace issue from last year: https://github.</description>
</item>
<item>
<title>June, 2023</title>

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/categories/notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />
@ -228,7 +228,7 @@
</p>
</header>
2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &ldquo;Copyrighted; all rights reserved&rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&rsquo;s usually copyrighted (could still be open access, but we can&rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&rsquo;t like the Impact Area icons as a component because they don&rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several hours and even over one day old so I 
killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &ldquo;acceptedVersion&rdquo;, which is presumably the author&rsquo;s version, as opposed to the &ldquo;publishedVersion&rdquo;, which means it&rsquo;s available as open access on the publisher&rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as describe in my DSpace issue from last year: https://github.
2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &ldquo;Copyrighted; all rights reserved&rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&rsquo;s usually copyrighted (could still be open access, but we can&rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&rsquo;t like the Impact Area icons as a component because they don&rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several hours and even over one day old so I 
killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &ldquo;acceptedVersion&rdquo;, which is presumably the author&rsquo;s version, as opposed to the &ldquo;publishedVersion&rdquo;, which means it&rsquo;s available as open access on the publisher&rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as described in my DSpace issue from last year: https://github.
<a href='https://alanorth.github.io/cgspace-notes/2023-07/'>Read more →</a>
</article>

@ -48,7 +48,7 @@
<link>https://alanorth.github.io/cgspace-notes/2023-07/</link>
<pubDate>Sat, 01 Jul 2023 17:14:36 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2023-07/</guid>
<description>2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &amp;ldquo;Copyrighted; all rights reserved&amp;rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&amp;rsquo;s usually copyrighted (could still be open access, but we can&amp;rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&amp;hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&amp;rsquo;t like the Impact Area icons as a component because they don&amp;rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several 
hours and even over one day old so I killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&amp;rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &amp;ldquo;acceptedVersion&amp;rdquo;, which is presumably the author&amp;rsquo;s version, as opposed to the &amp;ldquo;publishedVersion&amp;rdquo;, which means it&amp;rsquo;s available as open access on the publisher&amp;rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as describe in my DSpace issue from last year: https://github.</description>
<description>2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &amp;ldquo;Copyrighted; all rights reserved&amp;rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&amp;rsquo;s usually copyrighted (could still be open access, but we can&amp;rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&amp;hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&amp;rsquo;t like the Impact Area icons as a component because they don&amp;rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several 
hours and even over one day old so I killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&amp;rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &amp;ldquo;acceptedVersion&amp;rdquo;, which is presumably the author&amp;rsquo;s version, as opposed to the &amp;ldquo;publishedVersion&amp;rdquo;, which means it&amp;rsquo;s available as open access on the publisher&amp;rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as described in my DSpace issue from last year: https://github.</description>
</item>
<item>
<title>June, 2023</title>

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />
@ -228,7 +228,7 @@
</p>
</header>
2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &ldquo;Copyrighted; all rights reserved&rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&rsquo;s usually copyrighted (could still be open access, but we can&rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&rsquo;t like the Impact Area icons as a component because they don&rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several hours and even over one day old so I 
killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &ldquo;acceptedVersion&rdquo;, which is presumably the author&rsquo;s version, as opposed to the &ldquo;publishedVersion&rdquo;, which means it&rsquo;s available as open access on the publisher&rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as describe in my DSpace issue from last year: https://github.
2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &ldquo;Copyrighted; all rights reserved&rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&rsquo;s usually copyrighted (could still be open access, but we can&rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&rsquo;t like the Impact Area icons as a component because they don&rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several hours and even over one day old so I 
killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &ldquo;acceptedVersion&rdquo;, which is presumably the author&rsquo;s version, as opposed to the &ldquo;publishedVersion&rdquo;, which means it&rsquo;s available as open access on the publisher&rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as described in my DSpace issue from last year: https://github.
<a href='https://alanorth.github.io/cgspace-notes/2023-07/'>Read more →</a>
</article>

@ -48,7 +48,7 @@
<link>https://alanorth.github.io/cgspace-notes/2023-07/</link>
<pubDate>Sat, 01 Jul 2023 17:14:36 +0300</pubDate>
<guid>https://alanorth.github.io/cgspace-notes/2023-07/</guid>
<description>2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &amp;ldquo;Copyrighted; all rights reserved&amp;rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&amp;rsquo;s usually copyrighted (could still be open access, but we can&amp;rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&amp;hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&amp;rsquo;t like the Impact Area icons as a component because they don&amp;rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several 
hours and even over one day old so I killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&amp;rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &amp;ldquo;acceptedVersion&amp;rdquo;, which is presumably the author&amp;rsquo;s version, as opposed to the &amp;ldquo;publishedVersion&amp;rdquo;, which means it&amp;rsquo;s available as open access on the publisher&amp;rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as describe in my DSpace issue from last year: https://github.</description>
<description>2023-07-01 Export CGSpace to check for missing Initiative collection mappings Start harvesting on AReS 2023-07-02 Minor edits to the crossref_doi_lookup.py script while running some checks from 22,000 CGSpace DOIs 2023-07-03 I analyzed the licenses declared by Crossref and found with high confidence that ~400 of ours were incorrect I took the more accurate ones from Crossref and updated the items on CGSpace I took a few hundred ISBNs as well for where we were missing them I also tagged ~4,700 items with missing licenses as &amp;ldquo;Copyrighted; all rights reserved&amp;rdquo; based on their Crossref license status being TDM, mostly from Elsevier, Wiley, and Springer Checking a dozen or so manually, I confirmed that if Crossref only has a TDM license then it&amp;rsquo;s usually copyrighted (could still be open access, but we can&amp;rsquo;t tell via Crossref) I would be curious to write a script to check the Unpaywall API for open access status&amp;hellip; In the past I found that their license status was not very accurate, but the open access status might be more reliable More minor work on the DSpace 7 item views I learned some new Angular template syntax I created a custom component to show Creative Commons licenses on the simple item page I also decided that I don&amp;rsquo;t like the Impact Area icons as a component because they don&amp;rsquo;t have any visual meaning 2023-07-04 Focus group meeting with CGSpace partners about DSpace 7 I added a themed file selection component to the CGSpace theme It displays the bistream description instead of the file name, just like we did in DSpace 6 XMLUI I added a custom component to show share icons 2023-07-05 I spent some time trying to update OpenRXV from Angular 9 to 10 to 11 to 12 to 13 Most things work but there are some minor bugs it seems Mishell from CIP emailed me to say she was having problems approving an item on CGSpace Looking at PostgreSQL I saw there were a dozen or so locks that were several 
hours and even over one day old so I killed those processes and told her to try again 2023-07-06 Types meeting I wrote a Python script to check Unpaywall for some information about DOIs 2023-07-7 Continue exploring Unpaywall data for some of our DOIs In the past I&amp;rsquo;ve found their licensing information to not be very reliable (preferring Crossref), but I think their open access status is more reliable, especially when the provider is listed as being the publisher Even so, sometimes the version can be &amp;ldquo;acceptedVersion&amp;rdquo;, which is presumably the author&amp;rsquo;s version, as opposed to the &amp;ldquo;publishedVersion&amp;rdquo;, which means it&amp;rsquo;s available as open access on the publisher&amp;rsquo;s website I did some quality assurance and found ~100 that were marked as Limited Access, but should have been Open Access, and fixed a handful of licenses Delete duplicate metadata as described in my DSpace issue from last year: https://github.</description>
</item>
<item>
<title>June, 2023</title>

@ -10,7 +10,7 @@
<meta property="og:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/posts/" />
<meta property="og:updated_time" content="2023-12-18T23:15:27+03:00" />
<meta property="og:updated_time" content="2023-12-27T10:48:32+03:00" />

@ -3,19 +3,19 @@
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://alanorth.github.io/cgspace-notes/categories/</loc>
<lastmod>2023-12-18T23:15:27+03:00</lastmod>
<lastmod>2023-12-27T10:48:32+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/</loc>
<lastmod>2023-12-18T23:15:27+03:00</lastmod>
<lastmod>2023-12-27T10:48:32+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2023-12/</loc>
<lastmod>2023-12-18T23:15:27+03:00</lastmod>
<lastmod>2023-12-21T10:08:59+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/categories/notes/</loc>
<lastmod>2023-12-18T23:15:27+03:00</lastmod>
<lastmod>2023-12-27T10:48:32+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
<lastmod>2023-12-18T23:15:27+03:00</lastmod>
<lastmod>2023-12-27T10:48:32+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2023-11/</loc>
<lastmod>2023-12-06T20:57:07+03:00</lastmod>
@ -30,7 +30,7 @@
<lastmod>2023-09-01T08:10:02+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2023-07/</loc>
<lastmod>2023-08-02T23:04:11+03:00</lastmod>
<lastmod>2023-12-27T10:48:32+03:00</lastmod>
</url><url>
<loc>https://alanorth.github.io/cgspace-notes/2023-06/</loc>
<lastmod>2023-07-01T17:17:31+03:00</lastmod>