<!DOCTYPE html>
< html lang = "en" >
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1, shrink-to-fit=no" >
< meta property = "og:title" content = "March, 2024" / >
< meta property = "og:description" content = "2024-03-01
Last week Bizu reported an issue with the “browse by issue date” drop-down
I verified it, and suspect it could be due to missing issue dates…
It might be this issue: https://github.com/DSpace/dspace-angular/issues/2808
" />
< meta property = "og:type" content = "article" / >
< meta property = "og:url" content = "https://alanorth.github.io/cgspace-notes/2024-03/" / >
< meta property = "article:published_time" content = "2024-03-01T09:55:00+03:00" / >
< meta property = "article:modified_time" content = "2024-03-14T09:29:05+03:00" / >
< meta name = "twitter:card" content = "summary" / >
< meta name = "twitter:title" content = "March, 2024" / >
< meta name = "twitter:description" content = "2024-03-01
Last week Bizu reported an issue with the “browse by issue date” drop-down
I verified it, and suspect it could be due to missing issue dates…
It might be this issue: https://github.com/DSpace/dspace-angular/issues/2808
"/>
< meta name = "generator" content = "Hugo 0.123.8" >
< script type = "application/ld+json" >
{
"@context": "http://schema.org",
"@type": "BlogPosting",
"headline": "March, 2024",
"url": "https://alanorth.github.io/cgspace-notes/2024-03/",
"wordCount": "923",
"datePublished": "2024-03-01T09:55:00+03:00",
"dateModified": "2024-03-14T09:29:05+03:00",
"author": {
"@type": "Person",
"name": "Alan Orth"
},
"keywords": "Notes"
}
< / script >
< link rel = "canonical" href = "https://alanorth.github.io/cgspace-notes/2024-03/" >
< title > March, 2024 | CGSpace Notes< / title >
<!-- combined, minified CSS -->
< link href = "https://alanorth.github.io/cgspace-notes/css/style.c6ba80bc50669557645abe05f86b73cc5af84408ed20f1551a267bc19ece8228.css" rel = "stylesheet" integrity = "sha256-xrqAvFBmlVdkWr4F+GtzzFr4RAjtIPFVGiZ7wZ7Ogig=" crossorigin = "anonymous" >
<!-- minified Font Awesome for SVG icons -->
< script defer src = "https://alanorth.github.io/cgspace-notes/js/fontawesome.min.f5072c55a0721857184db93a50561d7dc13975b4de2e19db7f81eb5f3fa57270.js" integrity = "sha256-9QcsVaByGFcYTbk6UFYdfcE5dbTeLhnbf4HrXz+lcnA=" crossorigin = "anonymous" > < / script >
<!-- RSS 2.0 feed -->
< / head >
< body >
< div class = "blog-masthead" >
< div class = "container" >
< nav class = "nav blog-nav" >
< a class = "nav-link " href = "https://alanorth.github.io/cgspace-notes/" > Home< / a >
< / nav >
< / div >
< / div >
< header class = "blog-header" >
< div class = "container" >
< h1 class = "blog-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/" rel = "home" > CGSpace Notes< / a > < / h1 >
< p class = "lead blog-description" dir = "auto" > Documenting day-to-day work on the < a href = "https://cgspace.cgiar.org" > CGSpace< / a > repository.< / p >
< / div >
< / header >
< div class = "container" >
< div class = "row" >
< div class = "col-sm-8 blog-main" >
< article class = "blog-post" >
< header >
< h2 class = "blog-post-title" dir = "auto" > < a href = "https://alanorth.github.io/cgspace-notes/2024-03/" > March, 2024< / a > < / h2 >
< p class = "blog-post-meta" >
< time datetime = "2024-03-01T09:55:00+03:00" > Fri Mar 01, 2024< / time >
in
< span class = "fas fa-folder" aria-hidden = "true" > < / span > < a href = "/categories/notes/" rel = "category tag" > Notes< / a >
< / p >
< / header >
< h2 id = "2024-03-01" > 2024-03-01< / h2 >
< ul >
< li > Last week Bizu reported an issue with the “browse by issue date” drop-down
< ul >
< li > I verified it, and suspect it could be due to missing issue dates… < / li >
< li > It might be this issue: < a href = "https://github.com/DSpace/dspace-angular/issues/2808" > https://github.com/DSpace/dspace-angular/issues/2808< / a > < / li >
< / ul >
< / li >
< / ul >
< ul >
< li > I spent some time trying to reproduce the bug affecting < code > onebox< / code > fields that are configured to use external vocabularies and are not repeatable
< ul >
< li > I filed an issue: < a href = "https://github.com/DSpace/dspace-angular/issues/2846" > https://github.com/DSpace/dspace-angular/issues/2846< / a > < / li >
< / ul >
< / li >
< / ul >
< h2 id = "2024-03-03" > 2024-03-03< / h2 >
< ul >
< li > I did some cleanups on abstracts, licenses, and dates from Crossref< / li >
< li > I also did some minor cleanups to affiliations because I saw some incorrect and duplicate ones in our list< / li >
< / ul >
< h2 id = "2024-03-05" > 2024-03-05< / h2 >
< ul >
< li > I tried a new technique to get some affiliations from Crossref using OpenRefine
< ul >
< li > First I split them and clustered, resolving a few hundred clusters out of 1500 (!)< / li >
< li > Then I used a custom text facet with a few dozen CGIAR and other large affiliations to reduce the work< / li >
< li > Then I joined them with our affiliations, paying no attention to duplicates< / li >
< li > Then I deduped them using the Jython technique I learned in 2023-02< / li >
< / ul >
< / li >
< / ul >
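< p > The dedupe step at the end of that workflow can be sketched in plain Python. This is only an illustration and not necessarily the Jython expression from 2023-02: the function name < code > dedupe_values< / code > is hypothetical, and it assumes the joined affiliations sit in one cell separated by < code > ||< / code > .< / p >

```python
def dedupe_values(cell, separator="||"):
    """Drop repeated values from a multi-value cell, keeping first-seen order.

    Assumption: values are joined with "||"; adjust the separator if the
    project uses a different one.
    """
    seen = set()
    unique = []
    for part in cell.split(separator):
        name = part.strip()
        # Skip empty fragments and anything already emitted
        if name and name not in seen:
            seen.add(name)
            unique.append(name)
    return separator.join(unique)
```

< p > In OpenRefine the same loop body could run on < code > value< / code > in a Jython transform, with a < code > return< / code > in place of the function wrapper.< / p >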
< h2 id = "2024-03-06" > 2024-03-06< / h2 >
< ul >
< li > Peter sent me some more corrections for the authors that I had sent him in 2023-12< / li >
< / ul >
< h2 id = "2024-03-08" > 2024-03-08< / h2 >
< ul >
< li > IFPRI sent me their 2023 records from CONTENTdm so I started working on those
< ul >
< li > I found a way to match their ORCID identifiers in our list using Jython in OpenRefine:< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-python" data-lang = "python" > < span style = "display:flex;" > < span > < span style = "color:#f92672" > import< / span > re
< / span > < / span > < span style = "display:flex;" > < span >
< / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > with< / span > open(< span style = "color:#e6db74" > r< / span > < span style = "color:#e6db74" > "/tmp/cg-creator-identifier.txt"< / span > , < span style = "color:#e6db74" > 'r'< / span > ) < span style = "color:#66d9ef" > as< / span > f:
< / span > < / span > < span style = "display:flex;" > < span >     orcid_ids < span style = "color:#f92672" > =< / span > [orcid_id< span style = "color:#f92672" > .< / span > strip() < span style = "color:#66d9ef" > for< / span > orcid_id < span style = "color:#f92672" > in< / span > f]
< / span > < / span > < span style = "display:flex;" > < span >
< / span > < / span > < span style = "display:flex;" > < span > matched < span style = "color:#f92672" > =< / span > < span style = "color:#66d9ef" > False< / span >
< / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > for< / span > orcid_id < span style = "color:#f92672" > in< / span > orcid_ids:
< / span > < / span > < span style = "display:flex;" > < span >     < span style = "color:#66d9ef" > if< / span > re< span style = "color:#f92672" > .< / span > search(< span style = "color:#e6db74" > r< / span > < span style = "color:#e6db74" > '.+: < / span > < span style = "color:#e6db74" > {}< / span > < span style = "color:#e6db74" > '< / span > < span style = "color:#f92672" > .< / span > format(value), orcid_id):
< / span > < / span > < span style = "display:flex;" > < span >         matched < span style = "color:#f92672" > =< / span > < span style = "color:#66d9ef" > True< / span >
< / span > < / span > < span style = "display:flex;" > < span >         < span style = "color:#66d9ef" > break< / span >
< / span > < / span > < span style = "display:flex;" > < span >
< / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > if< / span > matched:
< / span > < / span > < span style = "display:flex;" > < span >     < span style = "color:#66d9ef" > return< / span > orcid_id
< / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > else< / span > :
< / span > < / span > < span style = "display:flex;" > < span >     < span style = "color:#66d9ef" > return< / span > value
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > I realized that < a href = "https://www.unicef.org/about-unicef/frequently-asked-questions#3" > UNICEF was renamed to its current name in 1953< / a > so I replaced all other variations in our vocabularies and metadata:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-sql" data-lang = "sql" > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > UPDATE< / span > metadatavalue < span style = "color:#66d9ef" > SET< / span > text_value< span style = "color:#f92672" > =< / span > < span style = "color:#e6db74" > 'United Nations Children''s Fund'< / span > < span style = "color:#66d9ef" > WHERE< / span > dspace_object_id < span style = "color:#66d9ef" > IN< / span > (< span style = "color:#66d9ef" > SELECT< / span > uuid < span style = "color:#66d9ef" > FROM< / span > item) < span style = "color:#66d9ef" > AND< / span > text_value < span style = "color:#66d9ef" > IN< / span > (< span style = "color:#e6db74" > 'United Nations International Children''s Emergency Fund'< / span > , < span style = "color:#e6db74" > 'United Nations International Children''s Emergency Fund'< / span > , < span style = "color:#e6db74" > 'UNICEF'< / span > );
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > Note the use of two single quotes to escape the one in the name< / li >
< / ul >
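< p > When the same kind of replacement is scripted rather than typed into < code > psql< / code > , bound parameters avoid the manual quote doubling. The sketch below is an assumption-laden illustration: it uses Python's built-in < code > sqlite3< / code > with an in-memory, single-column stand-in for < code > metadatavalue< / code > (CGSpace itself runs on PostgreSQL, and the real table has more columns), and the names < code > normalize_unicef< / code > and < code > demo< / code > are hypothetical.< / p >

```python
import sqlite3

CANONICAL = "United Nations Children's Fund"
VARIANTS = [
    "United Nations International Children's Emergency Fund",
    "UNICEF",
]


def normalize_unicef(conn):
    """Rewrite every known variant to the canonical name; return rows changed."""
    placeholders = ", ".join("?" for _ in VARIANTS)
    sql = (
        "UPDATE metadatavalue SET text_value = ? "
        "WHERE text_value IN ({})".format(placeholders)
    )
    # The driver handles the apostrophes, so no '' escaping is needed here
    cursor = conn.execute(sql, [CANONICAL] + VARIANTS)
    conn.commit()
    return cursor.rowcount


def demo():
    """Build a throwaway table (stand-in schema) and run the update on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadatavalue (text_value TEXT)")
    conn.executemany(
        "INSERT INTO metadatavalue (text_value) VALUES (?)",
        [(value,) for value in VARIANTS + ["CGIAR"]],
    )
    changed = normalize_unicef(conn)
    rows = [row[0] for row in conn.execute(
        "SELECT text_value FROM metadatavalue ORDER BY rowid")]
    conn.close()
    return changed, rows
```

< p > A PostgreSQL driver would use its own placeholder style, but the idea of passing the values separately from the SQL text is the same.< / p >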
< h2 id = "2024-03-11" > 2024-03-11< / h2 >
< ul >
< li > Experimenting with moving some of my Python scripts to the DSpace 7 REST API
< ul >
< li > I need a way to get UUIDs for Handles… < / li >
< li > Seems that I can use a Discovery query like: < a href = "https://dspace7test.ilri.org/server/api/discover/search/objects?dsoType=item&query=handle:10568/130864" > https://dspace7test.ilri.org/server/api/discover/search/objects?dsoType=item&query=handle:10568/130864< / a > < / li >
< li > Then just take the first result…?< / li >
< / ul >
< / li >
< li > I spent some time working on the script to get abstracts from CGSpace, and found a bug in my logic
< ul >
< li > I also noticed that one item had two abstracts, but the first one was blank!< / li >
< li > Looking deeper, I found 113 blank metadata values so I deleted those:< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-sql" data-lang = "sql" > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > BEGIN< / span > ;
< / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > DELETE< / span > < span style = "color:#66d9ef" > FROM< / span > metadatavalue < span style = "color:#66d9ef" > WHERE< / span > dspace_object_id < span style = "color:#66d9ef" > IN< / span > (< span style = "color:#66d9ef" > SELECT< / span > uuid < span style = "color:#66d9ef" > FROM< / span > item) < span style = "color:#66d9ef" > AND< / span > text_value< span style = "color:#f92672" > =< / span > < span style = "color:#e6db74" > ''< / span > ;
< / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > COMMIT< / span > ;
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > I also found a few dozen items with “N/A” for their citation, so I deleted those too:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-sql" data-lang = "sql" > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > BEGIN< / span > ;
< / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > DELETE< / span > < span style = "color:#66d9ef" > FROM< / span > metadatavalue < span style = "color:#66d9ef" > WHERE< / span > dspace_object_id < span style = "color:#66d9ef" > IN< / span > (< span style = "color:#66d9ef" > SELECT< / span > uuid < span style = "color:#66d9ef" > FROM< / span > item) < span style = "color:#66d9ef" > AND< / span > text_value< span style = "color:#f92672" > =< / span > < span style = "color:#e6db74" > 'N/A'< / span > < span style = "color:#66d9ef" > AND< / span > metadata_field_id< span style = "color:#f92672" > =< / span > < span style = "color:#ae81ff" > 146< / span > ;
< / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > COMMIT< / span > ;
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > I deployed the change to disable Angular SSR’s < code > inlineCriticalCss< / code > on production because we had heavy load on the frontend and I’ve been meaning to do this permanently for some time< / li >
< li > Maria asked me for a CSV with all the broken Bioversity permalinks so I exported them for her:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > $ csvcut -c < span style = "color:#e6db74" > 'id,dc.title[en_US],dc.identifier.uri[en_US],cg.link.permalink[en_US]'< / span > ~/Downloads/2024-03-05-cgspace.csv < span style = "color:#ae81ff" > \
< / span > < / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#ae81ff" > < / span > | csvgrep -c 'cg.link.permalink[en_US]' -r '^.+$' > /tmp/2024-03-11-Bioversity-Permalinks.csv
< / span > < / span > < / code > < / pre > < / div > < h2 id = "2024-03-12" > 2024-03-12< / h2 >
< ul >
< li > Ran the duplicate checker for the IFPRI 2023 batch upload< / li >
< / ul >
< h2 id = "2024-03-13" > 2024-03-13< / h2 >
< ul >
< li > I found about 428 duplicates in the IFPRI 2023 batch records
< ul >
< li > Alarmingly, I found about 18 that are duplicated on CGSpace as well!< / li >
< li > I looked closer and decided that 11 were duplicates, so I merged the metadata and withdrew the later ones< / li >
< / ul >
< / li >
< li > Alliance asked me to get them the Handles for items submitted by TIP that are not discoverable
< ul >
< li > I found it easiest to use the < code > ds6_item2itemhandle< / code > < a href = "https://wiki.lyrasis.org/display/DSPACE/Helper+SQL+functions+for+DSpace+6" > DSpace SQL helper function< / a > with a nested query on the provenance:< / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-sql" data-lang = "sql" > < span style = "display:flex;" > < span > < span style = "color:#66d9ef" > SELECT< / span > ds6_item2itemhandle(dspace_object_id) < span style = "color:#66d9ef" > AS< / span > handle < span style = "color:#66d9ef" > FROM< / span > metadatavalue < span style = "color:#66d9ef" > WHERE< / span > dspace_object_id < span style = "color:#66d9ef" > IN< / span > (< span style = "color:#66d9ef" > SELECT< / span > uuid < span style = "color:#66d9ef" > FROM< / span > item < span style = "color:#66d9ef" > WHERE< / span > < span style = "color:#66d9ef" > NOT< / span > discoverable) < span style = "color:#66d9ef" > AND< / span > metadata_field_id< span style = "color:#f92672" > =< / span > < span style = "color:#ae81ff" > 28< / span > < span style = "color:#66d9ef" > AND< / span > text_value < span style = "color:#66d9ef" > LIKE< / span > < span style = "color:#e6db74" > 'Submitted by Alliance TIP Submit%'< / span > ;
< / span > < / span > < / code > < / pre > < / div > < h2 id = "2024-03-14" > 2024-03-14< / h2 >
< ul >
< li > Looking into reports of rate limiting of Altmetric’s bot on CGSpace
< ul >
< li > I don’t see any HTTP 429 responses for their user agents in any of our logs… < / li >
< li > I tried myself on an item page and never hit a limit… < / li >
< / ul >
< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > $ < span style = "color:#66d9ef" > for< / span > num in < span style = "color:#f92672" > {< / span > 1..60< span style = "color:#f92672" > }< / span > ; < span style = "color:#66d9ef" > do< / span > echo -n < span style = "color:#e6db74" > "Request < / span > < span style = "color:#e6db74" > ${< / span > num< span style = "color:#e6db74" > }< / span > < span style = "color:#e6db74" > : "< / span > ; curl -s -o /dev/null -w < span style = "color:#e6db74" > "%{http_code}"< / span > https://dspace7test.ilri.org/items/c9b8999d-3001-42ba-a267-14f4bfa90b53 < span style = "color:#f92672" > &&< / span > echo; < span style = "color:#66d9ef" > done< / span >
< / span > < / span > < span style = "display:flex;" > < span > Request 1: 200
< / span > < / span > < span style = "display:flex;" > < span > Request 2: 200
< / span > < / span > < span style = "display:flex;" > < span > Request 3: 200
< / span > < / span > < span style = "display:flex;" > < span > Request 4: 200
< / span > < / span > < span style = "display:flex;" > < span > ...
< / span > < / span > < span style = "display:flex;" > < span > Request 60: 200
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > All responses were HTTP 200… < / li >
< li > In any case, I whitelisted their production IPs and told them to try again< / li >
< li > I imported 468 of IFPRI’s 2023 records that were confirmed not to be duplicates into CGSpace
< ul >
< li > I also spent some time merging metadata from 415 of the remaining 432 duplicates with the metadata for the existing items on CGSpace< / li >
< li > This was a bit of dirty work using csvkit, xsv, and OpenRefine< / li >
< / ul >
< / li >
< / ul >
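< p > The curl loop from this entry could also be written as a small Python probe that tallies status codes, which makes a stray HTTP 429 easier to spot than scanning sixty lines of output. This is a sketch under assumptions: < code > fetch_status< / code > and < code > probe< / code > are hypothetical names, and only the standard library's < code > urllib< / code > is used.< / p >

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url, timeout=10):
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code (e.g. 429) is what we want
        return error.code


def probe(url, attempts=60, fetch=fetch_status):
    """Request url `attempts` times and count how often each status appears."""
    counts = Counter()
    for _ in range(attempts):
        counts[fetch(url)] += 1
    return counts

# Example call (performs real requests, so it is left commented out):
# probe("https://dspace7test.ilri.org/items/c9b8999d-3001-42ba-a267-14f4bfa90b53")
```

< p > The < code > fetch< / code > parameter exists only so the counting logic can be exercised without touching the network.< / p >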
< h2 id = "2024-03-17" > 2024-03-17< / h2 >
< ul >
< li > There are 17 records from IFPRI’s 2023 batch remaining from the 432 that I identified as already being on CGSpace
< ul >
< li > These are different in that they are duplicates on CGSpace as well, so the csvjoin failed and the metadata got messed up in my migration< / li >
< li > I looked closer and whittled this down to 14 actual records, and spent some time working on them< / li >
< li > I isolated 12 of these items that existed on CGSpace and added publication ranks, project identifiers, and provenance links< / li >
< li > Now there only remain two confusing records about the Inkomati catchment< / li >
< / ul >
< / li >
< / ul >
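< p > One way to catch this kind of problem before running a join is to list the key values that occur on more than one row. A minimal sketch, assuming the join key is a single CSV column; the function name < code > duplicate_keys< / code > and the column name passed to it are hypothetical, since these notes do not say which column the csvjoin used.< / p >

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_text, key_column):
    """Return the sorted key values that appear on more than one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[key_column] for row in reader)
    return sorted(key for key, count in counts.items() if count > 1)
```

< p > Any key this returns would match several rows in a join, so those records are better pulled out and handled by hand first.< / p >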
< h2 id = "2024-03-18" > 2024-03-18< / h2 >
< ul >
< li > Checking to see how many IFPRI records we have migrated so far:< / li >
< / ul >
< div class = "highlight" > < pre tabindex = "0" style = "color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;" > < code class = "language-console" data-lang = "console" > < span style = "display:flex;" > < span > $ csvgrep -c < span style = "color:#e6db74" > 'dc.description.provenance[en_US]'< / span > -m < span style = "color:#e6db74" > 'Original URL from IFPRI CONTENTdm'< / span > cgspace.csv < span style = "color:#ae81ff" > \
< / span > < / span > < / span > < span style = "display:flex;" > < span > < span style = "color:#ae81ff" > < / span > | csvcut -c 'id,dc.title[en_US],dc.identifier.uri[en_US],dc.description.provenance[en_US],dcterms.type[en_US]' \
< / span > < / span > < span style = "display:flex;" > < span > | tee /tmp/ifpri-records.csv \
< / span > < / span > < span style = "display:flex;" > < span > | csvstat --count
< / span > < / span > < span style = "display:flex;" > < span > 898
< / span > < / span > < / code > < / pre > < / div > < ul >
< li > I finalized the remaining two on Inkomati catchment and now we are at 900!< / li >
< / ul >
<!-- raw HTML omitted -->
< / article >
< / div > <!-- /.blog-main -->
< aside class = "col-sm-3 ml-auto blog-sidebar" >
< section class = "sidebar-module" >
< h4 > Recent Posts< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "/cgspace-notes/2024-03/" > March, 2024< / a > < / li >
< li > < a href = "/cgspace-notes/2024-02/" > February, 2024< / a > < / li >
< li > < a href = "/cgspace-notes/2024-01/" > January, 2024< / a > < / li >
< li > < a href = "/cgspace-notes/2023-12/" > December, 2023< / a > < / li >
< li > < a href = "/cgspace-notes/2023-11/" > November, 2023< / a > < / li >
< / ol >
< / section >
< section class = "sidebar-module" >
< h4 > Links< / h4 >
< ol class = "list-unstyled" >
< li > < a href = "https://cgspace.cgiar.org" > CGSpace< / a > < / li >
< li > < a href = "https://dspacetest.cgiar.org" > DSpace Test< / a > < / li >
< li > < a href = "https://github.com/ilri/DSpace" > CGSpace @ GitHub< / a > < / li >
< / ol >
< / section >
< / aside >
< / div > <!-- /.row -->
< / div > <!-- /.container -->
< footer class = "blog-footer" >
< p dir = "auto" >
Blog template created by < a href = "https://twitter.com/mdo" > @mdo< / a > , ported to Hugo by < a href = 'https://twitter.com/mralanorth' > @mralanorth< / a > .
< / p >
< p >
< a href = "#" > Back to top< / a >
< / p >
< / footer >
< / body >
< / html >