<li>Spend some time testing my <code>post_bitstreams.py</code> script to update thumbnails for items on CGSpace
<ul>
<li>Interestingly I found an item with a JFIF thumbnail and another with a WebP thumbnail…</li>
</ul>
</li>
<li>Meeting with Valentina, Stefano, and Sara about MODS metadata in CGSpace
<ul>
<li>They have experience with improving the MODS interface in MELSpace’s OAI-PMH for use with AGRIS and were curious if we could do the same in CGSpace</li>
<li>From what I can see we need to upgrade the MODS schema from 3.1 to 3.7 and then just add a bunch of our fields to the crosswalk</li>
<li>The upgrade was mostly normal, but I had to unhold the openjdk package in order for <code>do-release-upgrade</code> to run:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span># apt-mark unhold openjdk-8-jdk-headless:amd64 openjdk-8-jre-headless:amd64
</span></span></code></pre></div><ul>
<li>In <a href="/cgspace-notes/2022-11/">2022-11</a> an upstream Java update broke the DSpace 6 Handle server so we will have to pin this again after the upgrade to Ubuntu 22.04</li>
<li>After the upgrade I made sure CGSpace was working, then proceeded to upgrade PostgreSQL from 12 to 14, like I did on <a href="/cgspace-notes/2023-03/">DSpace Test in 2023-03</a></li>
<li>Then I had to downgrade OpenJDK to fix the Handle server using the ones I had previously downloaded for Ubuntu 20.04 because they no longer exist on Launchpad:</li>
<li>Continue working on the MODS schema mapping</li>
<li>Export CGSpace to check and update <code>dcterms.extent</code> fields
<ul>
<li>I normalized about 1,500 to use either “p. 1-6” or “5 p.” format</li>
<li>Also, I used this GREL expression to extract missing pages from the citation field: <code>cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*(pp?\.\s?\d+[-–]\d+).*/)[0]</code></li>
<li>This was over 4,000 items with a format like “p. 1-6” and “pp. 1-6” in the citation</li>
<li>I used another GREL expression to extract another 5,000: <code>cells['dcterms.bibliographicCitation[en_US]'].value.match(/.*?(\d+\s+?[Pp]+\.).*/)[0]</code></li>
<li>This was for formats like “1 p.” (note that the leading <code>.*</code> had to be made lazy with <code>.*?</code>, otherwise it would swallow all but the last digit of the page count)</li>
</ul>
</li>
<li>I also did some work to capture a handful of missing DOIs and ISSNs, but it was only about 100 items and I will have to wait until the 10,000+ above finish importing</li>
<li>I see there are ~200 users in CGSpace that have registered with their CGIAR email address using a password as opposed to using Active Directory:</li>
<li>File <a href="https://github.com/DSpace/DSpace/issues/8900">an issue</a> on DSpace for the <code>Content-Disposition</code> bug causing images to get downloaded instead of opened inline</li>
</ul>
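<p>A minimal Python sketch of the two page-extraction patterns above, for anyone wanting to repeat this outside OpenRefine. The function name <code>extract_extent</code> is hypothetical, and it uses <code>re.search</code> instead of the <code>.*</code> wrappers, so it returns the first match in the citation and may not behave exactly like the GREL <code>match()</code> calls:</p>

```python
import re

# Python approximations of the two GREL patterns (the names here are hypothetical)
PAGE_RANGE = re.compile(r"pp?\.\s?\d+[-–]\d+")  # e.g. "p. 1-6" or "pp. 10–25"
PAGE_COUNT = re.compile(r"\d+\s+?[Pp]+\.")      # e.g. "15 p."


def extract_extent(citation):
    """Return the first page range or page count found in a citation, else None."""
    # Prefer an explicit page range, then fall back to a page count
    for pattern in (PAGE_RANGE, PAGE_COUNT):
        match = pattern.search(citation)
        if match:
            return match.group(0)
    return None
```

<p>Because <code>re.search</code> scans for the leftmost match, there is no leading <code>.*</code> to guard against here, which sidesteps the greedy-prefix problem noted above.</p>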
<h2 id="2023-06-12">2023-06-12</h2>
<ul>
<li>Export CGSpace to do some more work extracting volume and issue from citations for items where they are missing
<ul>
<li>I found and fixed over 7,000!</li>
<li>Then I found and extracted extents (pages) for another 7,000 items that were missing them</li>
<li>Then I replaced all occurrences of en dashes in page ranges with regular hyphens</li>
</ul>
</li>
</ul>
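<p>The en dash cleanup can be sketched in Python roughly as follows. This is an illustration under assumptions, not the method actually used: the function name is hypothetical, and it only touches en dashes that sit directly between two digits so that dashes elsewhere in a value are left alone:</p>

```python
import re

# Match an en dash only when it is directly between two digits (a page range)
EN_DASH_IN_RANGE = re.compile(r"(?<=\d)–(?=\d)")


def normalize_page_range(extent):
    """Replace en dashes in numeric ranges with regular hyphens."""
    return EN_DASH_IN_RANGE.sub("-", extent)
```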
<h2 id="2023-06-13">2023-06-13</h2>
<ul>
<li>Last night I finally figured out how to do basic overrides to the simple item view in Angular</li>
<li>Add a handful of new ORCID identifiers to my list and tag them on CGSpace</li>
<li>Extract a list of all the proposed actions for CG Core output types and create a <a href="https://github.com/AgriculturalSemantics/cg-core/issues/45">new issue for them on CG Core’s GitHub repository</a></li>
<li>Extract a list of all the proposed actions for CG Core output types for MARLO and create <a href="https://github.com/CCAFS/MARLO/issues/2479">a new issue for them on MARLO’s GitHub repository</a></li>
<li>Meeting with Indira, Ryan, and Abenet to discuss plans for the DSpace 7 focus group</li>
<li>Did some more work on the DSpace 7 Test to improve the submission forms and the look and feel</li>
<li>Extract a list of all the proposed actions for CG Core output types for MEL and create <a href="https://github.com/CodeObia/MEL/issues/11216">a new issue for them on MEL’s GitHub repository</a></li>
<li>Today I started getting an error on DSpace 7 Test
<ul>
<li>The page loads, and then when it is almost done it goes blank (white) with this error in the console:</li>
</ul>
</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>ERROR DOMException: CSSStyleSheet.cssRules getter: Not allowed to access cross-origin stylesheet
</span></span></code></pre></div><ul>
<li>I restarted Angular, but it didn’t fix it</li>
<li>The <code>yarn test:rest</code> script shows everything OK, and I haven’t changed anything recently…</li>
<li>I re-compiled the Angular UI using the default theme and it was the same…</li>
<li>I tried in Firefox Nightly and it works…
<ul>
<li>So it must be something related to the browser</li>
<li>I tried clearing all the session storage / cookies and refreshing and it worked</li>
</ul>
</li>
<li>I switched back to the CGSpace theme and it happened again
<ul>
<li>I had a hunch it might be due to the GDPR cookie plugin in my browser, so I disabled that and then refreshed and it worked… hmmm</li>
</ul>
</li>
<li>Upload thumbnails for about 42 IITA Journal Articles after resolving their DOIs and making sure they were not CC ND
<ul>
<li>I fixed a few bugs in <code>get_scihub_pdfs.py</code> in the process</li>
<li>Stefano got back to me about the MODS OAI-PMH schema test on DSpace Test
<ul>
<li>He said that it’s fine if we use iso8601 encoding for dates instead of w3cdtf and asked if we can create a custom end point for AGRIS that only includes types like Journal Articles similar to how Salem did it: <a href="https://melspace.loc.codeobia.com/oai/agris?verb=ListRecords&amp;metadataPrefix=mods">https://melspace.loc.codeobia.com/oai/agris?verb=ListRecords&amp;metadataPrefix=mods</a></li>
<li>I updated DSpace Test with the new date format and said I’d work on the custom AGRIS set</li>
</ul>
</li>
</ul>
<h2 id="2023-06-25">2023-06-25</h2>
<ul>
<li>Export CGSpace to check for missing Initiative collection mappings</li>
<li>I wanted to start a harvest on AReS, but the load on the server has been high for a few days and I’m not sure what is causing it
<ul>
<li>I decided to run all updates and reboot it since it’s Sunday anyway</li>
</ul>
</li>
</ul>
<h2 id="2023-06-26">2023-06-26</h2>
<ul>
<li>Since the new DSpace 7 will respect newlines in metadata fields I am curious to see how many of our abstracts have poor newlines
<ul>
<li>I exported CGSpace and used a custom text facet with this GREL expression in OpenRefine to count the number of newlines in each cell:</li>
<li>Also useful to check for general length of the text in the cell to make sure it’s a reasonably long string
<ul>
<li>I spent some time trying to find a pattern that I could use to identify “easy” targets, but there are so many exceptions that it will have to be done manually</li>
<li>I fixed a few dozen</li>
</ul>
</li>
<li>Do a bit of work on thumbnails on CGSpace</li>
<li>I’m trying to troubleshoot the Discovery error I get on DSpace 7:</li>
</ul>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"><code class="language-console" data-lang="console"><span style="display:flex;"><span>java.lang.NullPointerException: Cannot invoke "org.dspace.discovery.configuration.DiscoverySearchFilterFacet.getIndexFieldName()" because the return value of "org.dspace.content.authority.DSpaceControlledVocabularyIndex.getFacetConfig()" is null
</span></span></code></pre></div><ul>
<li>I reverted to the default <code>submission-forms.xml</code> and the <code>getFacetConfig()</code> error goes away…</li>
<li>Kill some long-held locks on CGSpace PostgreSQL, as some users are complaining of slowness in archiving</li>
<li>I did some testing of the LDAP login issue related to groupmaps
<ul>
<li>It does seem to be a regression from the <a href="https://github.com/DSpace/DSpace/pull/8814">LDAP auth patch</a> from last month, so I <a href="https://github.com/DSpace/DSpace/issues/8920">filed an issue</a></li>
</ul>
</li>
<li>I spent some time working on Angular and figured out how to add a custom component to show the UN SDG icons on DSpace 7</li>
</ul>
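<p>The newline check on abstracts described above can be sketched in Python as follows. The GREL expression used in OpenRefine is not reproduced in these notes, so this is not that expression, only an illustration of the same idea. The function names and the <code>min_length</code> threshold are hypothetical:</p>

```python
def newline_stats(abstract):
    """Return (newline count, text length) for one abstract value."""
    # Normalize Windows and old Mac line endings so each break counts once
    text = abstract.replace("\r\n", "\n").replace("\r", "\n")
    return text.count("\n"), len(text)


def flag_for_review(abstracts, min_length=200):
    """Return (index, newline count) for reasonably long abstracts with newlines."""
    flagged = []
    for index, abstract in enumerate(abstracts):
        count, length = newline_stats(abstract)
        # Skip short values, which are less likely to be real abstracts
        if count > 0 and length >= min_length:
            flagged.append((index, count))
    return flagged
```

<p>This only narrows down candidates; as noted above, deciding which newlines are actually wrong still has to be done manually.</p>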
<h2 id="2023-06-27">2023-06-27</h2>
<ul>
<li>I debugged the NullPointerException and somehow it disappeared
<ul>
<li>It seems to be related to the external controlled vocabularies in the submission form</li>
<li>I removed them all, then added them all back, and now the issue is solved… hmmmm</li>
<li>Oh no, now they are gone again, sigh…</li>
</ul>
</li>
</ul>
<h2 id="2023-06-28">2023-06-28</h2>
<ul>
<li>Spent a lot of time debugging the browse indexes
<ul>
<li>Looking at the <a href="https://api7.dspace.org/server/api/discover/browses">DSpace 7 demo API</a> I see the four default browse indexes from <code>dspace.cfg</code> and the one default <code>srsc</code> one that gets automatically enabled from the <code>&lt;vocabulary&gt;srsc&lt;/vocabulary&gt;</code> in the <code>submission-forms.xml</code></li>
<li>The same API call on my test DSpace 7 configuration results in the HTTP 500 I’ve been seeing for some time, and I am pretty sure it’s due to the automagic configuration of hierarchical browses based on the submission form</li>
<li>Yes, if I remove them all from my submission form then this works: http://localhost:8080/server/api/discover/browses</li>
<li>I went through each of our vocabularies and tested them one by one:
<ul>
<li>dcterms-subject: OK</li>
<li>dc-contributor-author: NO</li>
<li>cg-creator-identifier: NO</li>
<li>cg-contributor-affiliation: OK (and with <code>facetType: "affiliation"</code> in API response?!)</li>
<li>cg-contributor-donor: OK (<code>facetType: "sponsorship"</code>)</li>
<li>cg-journal: NO</li>
<li>cg-coverage-subregion: NO</li>
<li>cg-species-breed: NO</li>
</ul>
</li>
<li>Now I need to figure out what it is about those five that causes them to not work!</li>
<li>Ah, after debugging with someone on the DSpace Slack, I realized that DSpace expects these vocabularies to have corresponding indexes configured in <code>discovery.xml</code>, and they must be added as search filters AND sidebar facets.</li>