mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2025-01-27 05:49:12 +01:00
Add notes
@@ -14,7 +14,7 @@ Work on CGSpace duplicate DOIs more
 <meta property="og:type" content="article" />
 <meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2024-04/" />
 <meta property="article:published_time" content="2024-04-04T10:23:00+03:00" />
-<meta property="article:modified_time" content="2024-04-09T16:50:56+03:00" />
+<meta property="article:modified_time" content="2024-04-12T20:40:52+03:00" />
@@ -34,9 +34,9 @@ Work on CGSpace duplicate DOIs more
 "@type": "BlogPosting",
 "headline": "April, 2024",
 "url": "https://alanorth.github.io/cgspace-notes/2024-04/",
-"wordCount": "77",
+"wordCount": "236",
 "datePublished": "2024-04-04T10:23:00+03:00",
-"dateModified": "2024-04-09T16:50:56+03:00",
+"dateModified": "2024-04-12T20:40:52+03:00",
 "author": {
 "@type": "Person",
 "name": "Alan Orth"
@@ -137,6 +137,37 @@ Work on CGSpace duplicate DOIs more
 <li>I need to merge the metadata for the remaining 212 that are already on CGSpace</li>
 </ul>
 </li>
+<li>Spend some time looking at duplicate DOIs again…</li>
+</ul>
+<h2 id="2024-04-13">2024-04-13</h2>
+<ul>
+<li>Spend some time looking at duplicate DOIs again…</li>
+</ul>
+<h2 id="2024-04-14">2024-04-14</h2>
+<ul>
+<li>Spend some time looking at duplicate DOIs again…</li>
+</ul>
+<h2 id="2024-04-15">2024-04-15</h2>
+<ul>
+<li>Spend some time looking at duplicate DOIs again…</li>
+<li>Delete ~260 duplicate metadata values using the elaborate SQL and sort method I documented here: <a href="https://github.com/DSpace/DSpace/issues/8253#issuecomment-1331756418">https://github.com/DSpace/DSpace/issues/8253#issuecomment-1331756418</a></li>
+<li>Tony noticed that the DSpace 7 REST API is very slow with the embeds so I profiled a bit:</li>
+</ul>
+<pre tabindex="0"><code>$ time curl -s -o /dev/null 'https://cgspace.cgiar.org/server/api/discover/search/objects?query=cg.identifier.project%3AIFPRI*&scope=8f1e9650-fe87-4e6e-889a-1cacfb747408&page=0&size=100&embed=thumbnail,bundles/bitstreams&sort=dcterms.issued,desc'
+curl -s -o /dev/null 0.01s user 0.01s system 0% cpu 47.515 total
+$ time curl -s -o /dev/null 'https://cgspace.cgiar.org/server/api/discover/search/objects?query=cg.identifier.project%3AIFPRI*&scope=8f1e9650-fe87-4e6e-889a-1cacfb747408&page=0&size=100&sort=dcterms.issued,desc'
+curl -s -o /dev/null 0.01s user 0.01s system 0% cpu 4.764 total
+</code></pre><ul>
+<li>Finalize processing the remaining 206 items from the IFPRI 2022 batch set that already existed on CGSpace
+<ul>
+<li>I merged metadata with the existing items</li>
+<li>There are still six remaining items that I identified as being duplicates (3x2) in the IFPRI set itself</li>
+</ul>
+</li>
+</ul>
+<h2 id="2024-04-16">2024-04-16</h2>
+<ul>
+<li>Spend some time looking at duplicate DOIs again…</li>
 </ul>
 <!-- raw HTML omitted -->