<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
< meta property = "og:title" content = "CGSpace Notes" / >
< meta property = "og:description" content = "Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." / >
< meta property = "og:type" content = "website" / >
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/" / >
< meta property = "og:updated_time" content = "2024-07-11T13:08:22+03:00" / >
< meta name = "twitter:card" content = "summary" / >
< meta name = "twitter:title" content = "CGSpace Notes" / >
< meta name = "twitter:description" content = "Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository." / >
< meta name = "generator" content = "Hugo 0.131.0" >
< script type = "application/ld+json" >
{
"@context": "http://schema.org",
"@type": "Blog",
"headline": "CGSpace Notes",
"url" : "https://alanorth.github.io/cgspace-notes/",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"dateModified": "2024-07-01T09:37:00+03:00",
"keywords": "notes, migration, notes",
"description":"Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."
}
< / script >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/" >
< title > CGSpace Notes< / title >
<!-- combined, minified CSS -->
< link href = "https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel = "stylesheet" integrity = "sha256-xrqAvFBmlVdkWr4F+GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin = "anonymous" >
<!-- minified Font Awesome for SVG icons -->
< script defer src = "https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity = "sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz+lcnA=" crossorigin = "anonymous" > < / script >
<!-- RSS 2.0 feed -->
< link rel = "alternate" type = "application/rss+xml" href = "https://alanorth.github.io/cgspace-notes/index.xml" title = "CGSpace Notes" / >
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link active" href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
< header class = "blog-header" >
< div class = "container" >
< h1 class = "blog-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< p class = "lead blog-description" dir = "auto" > Documenting day-to-day work on the < a href = "https://cgspace.cgiar.org" > CGSpace< / a > repository.< / p >
< / div >
< / header >
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2024-07/" > July, 2024< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2024-07-01T09:37:00+03:00" > Mon Jul 01, 2024< / time > by Alan Orth in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2024-07-01" > 2024-07-01< / h2 >
< ul >
< li > A bit of work to clean up duplicate DOIs on CGSpace
< ul >
< li > A handful of book chapters, working papers, and journal articles using the wrong DOI< / li >
< / ul >
< / li >
< li > I tried to delete all users who have been inactive for the past six years (since July 1, 2018):< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2024-07/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2024-06/" > June, 2024< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2024-06-03T14:14:00+03:00" > Mon Jun 03, 2024< / time > by Alan Orth in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2024-06-03" > 2024-06-03< / h2 >
< ul >
< li > Working on IFPRI datasets
< ul >
< li > I noticed the licenses were missing from Nilam’s original file, so I found a way to check < a href = "https://guides.dataverse.org/en/latest/api/native-api.html#export-metadata-of-a-dataset-in-various-formats" > Dataverse’s API for a persistent identifier< / a > < / li >
< li > We have both Handles and DOIs for these datasets, both from Harvard’s Dataverse< / li >
< / ul >
< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2024-06/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2024-05/" > May, 2024< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2024-05-01T10:39:00+03:00" > Wed May 01, 2024< / time > by Alan Orth in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2024-05-01" > 2024-05-01< / h2 >
< ul >
< li > I dumped all the CGSpace DOIs and resolved them with my < code > crossref_doi_lookup.py< / code > script
< ul >
< li > Then I did some work to add missing abstracts (about 900!), volumes, issues, licenses, publishers, types, etc.< / li >
< / ul >
< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2024-05/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2024-04/" > April, 2024< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2024-04-04T10:23:00+03:00" > Thu Apr 04, 2024< / time > by Alan Orth in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2024-04-04" > 2024-04-04< / h2 >
< ul >
< li > More work on CGSpace duplicate DOIs< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2024-04/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2024-03/" > March, 2024< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2024-03-01T09:55:00+03:00" > Fri Mar 01, 2024< / time > by Alan Orth in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2024-03-01" > 2024-03-01< / h2 >
< ul >
< li > Last week Bizu reported an issue with the “browse by issue date” drop-down
< ul >
< li > I verified it, and suspect it could be due to missing issue dates…< / li >
< li > It might be this issue: < a href = "https://github.com/DSpace/dspace-angular/issues/2808" > https://github.com/DSpace/dspace-angular/issues/2808< / a > < / li >
< / ul >
< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2024-03/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2024-02/" > February, 2024< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2024-02-05T11:10:00+03:00" > Mon Feb 05, 2024< / time > by Alan Orth in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2024-02-05" > 2024-02-05< / h2 >
< ul >
< li > Delete duplicate metadata as described in my DSpace issue from last year: < a href = "https://github.com/DSpace/DSpace/issues/8253" > https://github.com/DSpace/DSpace/issues/8253< / a > < / li >
< li > Lower case all the AGROVOC subjects on CGSpace< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2024-02/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2024-01/" > January, 2024< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2024-01-02T10:08:00+03:00" > Tue Jan 02, 2024< / time > by Alan Orth in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2024-01-02" > 2024-01-02< / h2 >
< ul >
< li > Work on preparation of new server for DSpace 7 migration
< ul >
< li > I’m not quite sure what we need to do for the Handle server< / li >
< li > For now I just ran the < code > dspace make-handle-config< / code > script and diffed it with the one from DSpace 6< / li >
< li > I sent the bundle to the Handle admins to make sure it’s OK before we do the migration< / li >
< / ul >
< / li >
< li > Continue testing and debugging the cgspace-java-helpers on DSpace 7< / li >
< li > Work on IFPRI ISNAR archive cleanup< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2024-01/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2023-12/" > December, 2023< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2023-12-01T08:48:36+03:00" > Fri Dec 01, 2023< / time > by Alan Orth in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2023-12-01" > 2023-12-01< / h2 >
< ul >
< li > There is still high load on CGSpace and I don’t know why
< ul >
< li > I don’t see a high number of sessions compared to previous days in the last few weeks< / li >
< / ul >
< / li >
< / ul >
< pre > < code > $ for file in dspace.log.2023-11-[23]*; do echo "$file"; grep -a -oE 'session_id=[A-Z0-9]{32}' "$file" | sort | uniq | wc -l; done
dspace.log.2023-11-20
22865
dspace.log.2023-11-21
20296
dspace.log.2023-11-22
19688
dspace.log.2023-11-23
17906
dspace.log.2023-11-24
18453
dspace.log.2023-11-25
17513
dspace.log.2023-11-26
19037
dspace.log.2023-11-27
21103
dspace.log.2023-11-28
23023
dspace.log.2023-11-29
23545
dspace.
< / code > < / pre >
< a href = 'https://alanorth.github.io/cgspace-notes/2023-12/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2023-11/" > November, 2023< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2023-11-02T12:59:36+03:00" > Thu Nov 02, 2023< / time > by Alan Orth in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2023-11-01" > 2023-11-01< / h2 >
< ul >
< li > Work a bit on the ETL pipeline for the CGIAR Climate Change Synthesis
< ul >
< li > I improved the filtering and wrote some Python using pandas to merge my sources more reliably< / li >
< / ul >
< / li >
< / ul >
< h2 id = "2023-11-02" > 2023-11-02< / h2 >
< ul >
< li > Export CGSpace to check missing Initiative collection mappings< / li >
< li > Start a harvest on AReS< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2023-11/' > Read more →< / a >
< / article >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2023-10/" > October, 2023< / a > < / h2 >
< p class = "blog-post-meta" > < time datetime = "2023-10-02T09:05:36+03:00" > Mon Oct 02, 2023< / time > by Alan Orth in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2023-10-02" > 2023-10-02< / h2 >
< ul >
< li > Export CGSpace to check DOIs against Crossref
< ul >
< li > I found that < a href = "https://www.crossref.org/documentation/retrieve-metadata/rest-api/rest-api-metadata-license-information/" > Crossref’s metadata is in the public domain under the CC0 license< / a > < / li >
< li > One interesting thing is the abstracts, which are copyrighted by the copyright owner, meaning Crossref cannot waive the copyright under the terms of the CC0 license, because it is not theirs to waive< / li >
< li > We can be on the safe side by using only abstracts for items that are licensed under Creative Commons< / li >
< / ul >
< / li >
< / ul >
< a href = 'https://alanorth.github.io/cgspace-notes/2023-10/' > Read more →< / a >
< / article >
< nav class = "blog-pagination" >
< a class = "btn btn-outline-primary disabled" href = "#" role = "button" aria-disabled = "true" > Previous page< / a >
< a class = "btn btn-outline-primary" href = "/cgspace-notes/page/2/" rel = "next" role = "button" > Next page< / a >
< / nav >
< / div > <!-- /.blog-main -->
< aside class = "col-sm-3 ml-auto blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/2024-07/" > July, 2024< / a > < / li >
< li > < a href = "/cgspace-notes/2024-06/" > June, 2024< / a > < / li >
< li > < a href = "/cgspace-notes/2024-05/" > May, 2024< / a > < / li >
< li > < a href = "/cgspace-notes/2024-04/" > April, 2024< / a > < / li >
< li > < a href = "/cgspace-notes/2024-03/" > March, 2024< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p dir = "auto" >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >