diff --git a/content/posts/2019-12.md b/content/posts/2019-12.md
index 1976a4bbd..c3412c6c9 100644
--- a/content/posts/2019-12.md
+++ b/content/posts/2019-12.md
@@ -117,4 +117,21 @@ COPY 48
- I restarted Tomcat three times before all cores came up successfully
- While I was restarting the Tomcat service I upgraded the PostgreSQL JDBC driver to version 42.2.9, which had been deployed on DSpace Test earlier this week
+## 2019-12-16
+
+- Visit CodeObia office to discuss next phase of OpenRXV/AReS development
+ - We discussed using CSV instead of Excel for tabular reports
+ - OpenRXV should only have "simple" reports with Dublin Core fields
+ - AReS should have this as well as a customized "extended" report that has CRPs, Subjects, Sponsors, etc from CGSpace
+ - We discussed using RTF instead of Word for graphical reports
+
+## 2019-12-17
+
+- Start filing GitHub issues for the reporting features on OpenRXV and AReS
+ - I created an issue for the "simple" tabular reports on OpenRXV GitHub ([#29](https://github.com/ilri/OpenRXV/issues/29))
+ - I created an issue for the "extended" tabular reports on AReS GitHub ([#8](https://github.com/ilri/AReS/issues/8))
+ - I created an issue for "simple" text reports on the OpenRXV GitHub ([#30](https://github.com/ilri/OpenRXV/issues/30))
+ - I created an issue for "extended" text reports on the AReS GitHub ([#9](https://github.com/ilri/AReS/issues/9))
+- I looked into creating RTF documents from HTML in Node.js; there is a library called [html-to-rtf](https://www.npmjs.com/package/html-to-rtf) that works well, but it doesn't support images
+
diff --git a/docs/2015-11/index.html b/docs/2015-11/index.html
index d9a62ab0f..d30f73f49 100644
--- a/docs/2015-11/index.html
+++ b/docs/2015-11/index.html
@@ -31,7 +31,7 @@ Last week I had increased the limit from 30 to 60, which seemed to help, but now
$ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspace
78
"/>
-
+
@@ -112,7 +112,7 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
- 2015-11-22
+ 2015-11-22
- CGSpace went down
- Looks like DSpace exhausted its PostgreSQL connection pool
@@ -123,7 +123,7 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
- For now I have increased the limit from 60 to 90, run updates, and rebooted the server
-2015-11-24
+2015-11-24
- CGSpace went down again
- Getting emails from uptimeRobot and uptimeButler that it's down, and Google Webmaster Tools is sending emails that there is an increase in crawl errors
@@ -134,7 +134,7 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
- For some reason the number of idle connections is very high since we upgraded to DSpace 5
-2015-11-25
+2015-11-25
- Troubleshoot the DSpace 5 OAI breakage caused by nginx routing config
- The OAI application requests stylesheets and javascript files with the path /oai/static/css, which gets matched here:
@@ -177,7 +177,7 @@ datid | datname | pid | usesysid | usename | application_name | client_addr
- Also redeploy DSpace Test with a clean sync of CGSpace and mirror these database settings there as well
- Also deploy the nginx fixes for the try_files location block as well as the expires block
-2015-11-26
+2015-11-26
- CGSpace behaving much better since changing db.maxidle yesterday, but still two up/down notices from monitoring this morning (better than 50!)
- CCAFS colleagues mentioned that the REST API is very slow, 24 seconds for one item
@@ -195,7 +195,7 @@ datid | datname | pid | usesysid | usename | application_name | client_addr
- At the time, the current DSpace pool size was 50…
- I reduced the pool back to the default of 30, and reduced the db.maxidle settings from 10 to 8
-2015-11-29
+2015-11-29
- Still more alerts that CGSpace has been up and down all day
- Current database settings for DSpace:
diff --git a/docs/2015-12/index.html b/docs/2015-12/index.html
index 396d3019e..1e6053882 100644
--- a/docs/2015-12/index.html
+++ b/docs/2015-12/index.html
@@ -33,7 +33,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
-rw-rw-r-- 1 tomcat7 tomcat7 387K Nov 18 23:59 dspace.log.2015-11-18.lzo
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
"/>
-
+
@@ -114,7 +114,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
- 2015-12-02
+ 2015-12-02
- Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less space:
@@ -176,7 +176,7 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
- I filed a ticket on Atmire's issue tracker
- I also filed a ticket on Atmire's issue tracker for the PostgreSQL stuff
-2015-12-03
+2015-12-03
- CGSpace very slow, and monitoring emailing me to say it's down, even though I can load the page (very slowly)
- Idle postgres connections look like this (with no change in DSpace db settings lately):
@@ -201,7 +201,7 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
0.806
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.854
-2015-12-05
+2015-12-05
- CGSpace has been up and down all day and REST API is completely unresponsive
- PostgreSQL idle connections are currently:
@@ -216,7 +216,7 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
-2015-12-07
+2015-12-07
- Atmire sent some fixes to DSpace's REST API code that was leaving contexts open (causing the slow performance and database issues)
- After deploying the fix to CGSpace the REST API is consistently faster:
@@ -231,7 +231,7 @@ $ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle
0.566
$ curl -o /dev/null -s -w %{time_total}\\n https://cgspace.cgiar.org/rest/handle/10568/32802?expand=all
0.497
-2015-12-08
+2015-12-08
- Switch CGSpace log compression cron jobs from using lzop to xz—the compression isn't as good, but it's much faster and causes less IO/CPU load
- Since we figured out (and fixed) the cause of the performance issue, I reverted Google Bot's crawl rate to the “Let Google optimize” setting
diff --git a/docs/2016-01/index.html b/docs/2016-01/index.html
index d30cc4295..c66591e09 100644
--- a/docs/2016-01/index.html
+++ b/docs/2016-01/index.html
@@ -25,7 +25,7 @@ Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_
I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
Update GitHub wiki for documentation of maintenance tasks.
"/>
-
+
@@ -106,22 +106,22 @@ Update GitHub wiki for documentation of maintenance tasks.
- 2016-01-13
+ 2016-01-13
- Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_collections.sh script I wrote last year.
- I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
- Update GitHub wiki for documentation of maintenance tasks.
-2016-01-14
+2016-01-14
- Update CCAFS project identifiers in input-forms.xml
- Run system updates and restart the server
-2016-01-18
+2016-01-18
- Change “Extension material” to “Extension Material” in input-forms.xml (a mistake that fell through the cracks when we fixed the others in DSpace 4 era)
-2016-01-19
+2016-01-19
- Work on tweaks and updates for the social sharing icons on item pages: add Delicious and Mendeley (from Academicons), make links open in new windows, and set the icon color to the theme's primary color (#157)
- Tweak date-based facets to show more values in drill-down ranges (#162)
@@ -129,7 +129,7 @@ Update GitHub wiki for documentation of maintenance tasks.
- Set up recipe on IFTTT to tweet new items from the CGSpace Atom feed to my twitter account
- Altmetric’s support for Handles is kinda weak, so they can't associate our items with DOIs until they are tweeted or blogged, etc. first.
-2016-01-21
+2016-01-21
- Still waiting for my IFTTT recipe to fire, two days later
- It looks like the Atom feed on CGSpace hasn't changed in two days, but there have definitely been new items
@@ -139,17 +139,17 @@ Update GitHub wiki for documentation of maintenance tasks.
- In any case, we should change this cache to be something more like 6 hours, as we publish new items several times per day.
- Work around a CSS issue with long URLs in the item view (#172)
-2016-01-25
+2016-01-25
- Re-deploy CGSpace and DSpace Test with latest 5_x-prod branch
- This included the social icon fixes/updates, date-based facet tweaks, reducing the feed cache age, and fixing a layout issue in XMLUI item view when an item had long URLs
-2016-01-26
+2016-01-26
- Run nginx updates on CGSpace and DSpace Test (1.8.1 and 1.9.10, respectively)
- Run updates on DSpace Test and reboot for new Linode kernel Linux 4.4.0-x86_64-linode63 (first update in months)
-2016-01-28
+2016-01-28
-2016-01-29
+2016-01-29
- Add five missing center-specific subjects to XMLUI item view (#174)
- This CCAFS item Before:
diff --git a/docs/2016-02/index.html b/docs/2016-02/index.html
index 753acb340..6cc9b1ddd 100644
--- a/docs/2016-02/index.html
+++ b/docs/2016-02/index.html
@@ -35,7 +35,7 @@ I noticed we have a very interesting list of countries on CGSpace:
Not only are there 49,000 countries, we have some blanks (25)…
Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
"/>
-
+
@@ -116,7 +116,7 @@ Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE&r
- 2016-02-05
+ 2016-02-05
- Looking at some DAGRIS data for Abenet Yabowork
- Lots of issues with spaces, newlines, etc causing the import to fail
@@ -127,7 +127,7 @@ Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE&r
- Not only are there 49,000 countries, we have some blanks (25)…
- Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
-2016-02-06
+2016-02-06
- Found a way to get items with null/empty metadata values from SQL
- First, find the metadata_field_id for the field you want from the metadatafieldregistry table:
@@ -154,7 +154,7 @@ DELETE 25
- Yep! The full re-index seems to work.
- Process the empty countries on CGSpace
-2016-02-07
+2016-02-07
- Working on cleaning up Abenet's DAGRIS data with OpenRefine
- I discovered two really nice functions in OpenRefine: value.trim() and value.escape("javascript"), which shows whitespace characters like \r\n!
@@ -195,14 +195,14 @@ $ /opt/brew/Cellar/tomcat/8.0.30/bin/catalina start
- After verifying that the site is working, start a full index:
$ ~/dspace/bin/dspace index-discovery -b
-2016-02-08
+2016-02-08
- Finish cleaning up and importing ~400 DAGRIS items into CGSpace
- Whip up some quick CSS to make the button in the submission workflow use the XMLUI theme's brand colors (#154)
-2016-02-09
+2016-02-09
- Re-sync DSpace Test with CGSpace
- Help Sisay with OpenRefine
@@ -239,7 +239,7 @@ Swap: 255 57 198
- So I'll bump up the Tomcat heap to 2048 (CGSpace production server is using 3GB)
-2016-02-11
+2016-02-11
- Massaging some CIAT data in OpenRefine
- There are 1200 records that have PDFs, and will need to be imported into CGSpace
@@ -256,7 +256,7 @@ Processing 64661.pdf
Processing 64195.pdf
> Downloading 64195.pdf
> Creating thumbnail for 64195.pdf
-2016-02-12
+2016-02-12
- Looking at CIAT's records again, there are some problems with a dozen or so files (out of 1200)
- A few items are using the same exact PDF
@@ -265,7 +265,7 @@ Processing 64195.pdf
- A few items have no item
- Also, I'm not sure: if we import these items, will we remove the dc.identifier.url field from the records?
-2016-02-12
+2016-02-12
- Looking at CIAT's records again, there are some files linking to PDFs on Slide Share, Embrapa, UEA UK, and Condesan, so I'm not sure if we can use those
- 265 items have dirty, URL-encoded filenames:
@@ -282,7 +282,7 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
- Merge pull requests for submission form theming (#178) and missing center subjects in XMLUI item views (#176)
- They will be deployed on CGSpace the next time I re-deploy
-2016-02-16
+2016-02-16
- Turns out OpenRefine has an unescape function!
@@ -296,14 +296,14 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
- To get filenames from dc.identifier.url, create a new column based on this transform: forEach(value.split('||'), v, v.split('/')[-1]).join('||')
- This also works for records that have multiple URLs (separated by “||”)
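Outside OpenRefine, the same transform can be sketched in Python (the URLs below are made up for illustration, not records from CGSpace):

```python
def filenames_from_urls(field: str) -> str:
    """Keep only the last path segment of each '||'-separated URL,
    mirroring the OpenRefine transform
    forEach(value.split('||'), v, v.split('/')[-1]).join('||')."""
    return "||".join(url.split("/")[-1] for url in field.split("||"))

# Made-up URLs for illustration
print(filenames_from_urls("https://example.org/docs/a.pdf||https://example.org/b.pdf"))
# a.pdf||b.pdf
```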
-2016-02-17
+2016-02-17
- Re-deploy CGSpace, run all system updates, and reboot
- More work on CIAT data, cleaning and doing a last metadata-only import into DSpace Test
- SAFBuilder has a bug preventing it from processing filenames containing more than one underscore
- Need to re-process the filename column to replace multiple underscores with one: value.replace(/_{2,}/, "_")
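The same collapse can be mirrored outside OpenRefine; a minimal Python sketch (the filename is a made-up example in the CIAT style, not a real record):

```python
import re

def collapse_underscores(filename: str) -> str:
    """Replace runs of two or more underscores with a single one,
    mirroring the OpenRefine transform value.replace(/_{2,}/, "_")."""
    return re.sub(r"_{2,}", "_", filename)

# Hypothetical CIAT-style filename for illustration
print(collapse_underscores("CIAT_COLOMBIA__000169___report.pdf"))
# CIAT_COLOMBIA_000169_report.pdf
```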
-2016-02-20
+2016-02-20
- Turns out the “bug” in SAFBuilder isn't a bug, it's a feature that allows you to encode extra information like the destination bundle in the filename
- Also, it seems DSpace's SAF import tool doesn't like importing filenames that have accents in them:
@@ -313,7 +313,7 @@ CIAT_COLOMBIA_000169_Técnicas_para_el_aislamiento_y_cultivo_de_protoplastos_de_
- Need to rename files to have no accents or umlauts, etc…
- Useful custom text facet for URLs ending with “.pdf”: value.endsWith(".pdf")
-2016-02-22
+2016-02-22
- To change Spanish accents to ASCII in OpenRefine:
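The OpenRefine expression itself is cut off in this hunk; as an equivalent sketch outside OpenRefine (not the original transform), accents can be folded to ASCII in Python by decomposing the string and dropping the combining marks:

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Fold accented characters to ASCII by decomposing (NFD) and
    dropping combining marks, e.g. é -> e, ñ -> n."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_accents("tést señora alimentación.pdf"))
# test senora alimentacion.pdf
```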
@@ -330,7 +330,7 @@ Bitstream: tést señora alimentación.pdf
- HFS+ stores filenames as a string, and filenames with accents get stored as character+accent whereas Linux's ext4 stores them as an array of bytes
- Running the SAFBuilder on Mac OS X works if you're going to import the resulting bundle on Mac OS X, but if your DSpace is running on Linux you need to run the SAFBuilder there where the filesystem's encoding matches
-2016-02-29
+2016-02-29
- Got notified by some CIFOR colleagues that the Google Scholar team had contacted them about CGSpace's incorrect ordering of authors in Google Scholar metadata
- Turns out there is a patch, and it was merged in DSpace 5.4: https://jira.duraspace.org/browse/DS-2679
diff --git a/docs/2016-03/index.html b/docs/2016-03/index.html
index 3889bce48..a7e70069d 100644
--- a/docs/2016-03/index.html
+++ b/docs/2016-03/index.html
@@ -25,7 +25,7 @@ Looking at issues with author authorities on CGSpace
For some reason we still have the index-lucene-update cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module
Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
"/>
-
+
@@ -106,13 +106,13 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
- 2016-03-02
+ 2016-03-02
- Looking at issues with author authorities on CGSpace
- For some reason we still have the index-lucene-update cron job active on CGSpace, but I'm pretty sure we don't need it as of the latest few versions of Atmire's Listings and Reports module
- Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
-2016-03-07
+2016-03-07
- Troubleshooting the issues with the slew of commits for Atmire modules in #182
- Their changes on 5_x-dev branch work, but it is messy as hell with merge commits and old branch base
@@ -121,12 +121,12 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
- Restart DSpace Test, as it seems to have crashed after Sisay tried to import some CSV or zip or something:
Exception in thread "Lucene Merge Thread #19" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
-2016-03-08
+2016-03-08
- Add a few new filters to Atmire's Listings and Reports module (#180)
- We had also wanted to add a few to the Content and Usage module but I have to ask the editors which ones they were
-2016-03-10
+2016-03-10
- Disable the lucene cron job on CGSpace as it shouldn't be needed anymore
- Discuss ORCiD and duplicate authors on Yammer
@@ -139,7 +139,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
- Update documentation for Atmire modules
-2016-03-11
+2016-03-11
- As I was looking at the CUA config I realized our Discovery config is all messed up and confusing
- I've opened an issue to track some of that work (#186)
@@ -147,7 +147,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
- We had been confusing dc.type (a Dublin Core value) with dc.type.output (a value we invented) for a few years and it had permeated all aspects of our data, indexes, item displays, etc.
- There is still some more work to be done to remove references to old outputtype and output
-2016-03-14
+2016-03-14
- Fix some items that had invalid dates (I noticed them in the log during a re-indexing)
- Reset search.index.* to the default, as it is only used by Lucene (deprecated by Discovery in DSpace 5.x): #188
@@ -155,11 +155,11 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
- Also four or so center-specific subject strings were missing for Discovery
-2016-03-15
+2016-03-15
- Create simple theme for new AVCD community just for a unique Google Tracking ID (#191)
-2016-03-16
+2016-03-16
- Still having problems deploying Atmire's CUA updates and fixes from January!
- More discussion on the GitHub issue here: https://github.com/ilri/DSpace/pull/182
@@ -185,7 +185,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
- It seems this dc.language field isn't really used, but we should delete these values
- Also, dc.language.iso has some weird values, like “En” and “English”
-2016-03-17
+2016-03-17
- It turns out hi is the ISO 639 language code for Hindi, but these should be in dc.language.iso instead of dc.language
- I fixed the eleven items with hi as well as some using the incorrect vn for Vietnamese
@@ -193,7 +193,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
- Re-sync CGSpace database to DSpace Test for Atmire to do some tests about the problematic CUA patches
- The patches work fine with a clean database, so the error was caused by some mismatch in CUA versions and the database during my testing
-2016-03-18
+2016-03-18
- Merge Atmire fixes into 5_x-prod
- Discuss thumbnails with Francesca from Bioversity
@@ -211,7 +211,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
- Also, it looks like adding -sharpen 0x1.0 really improves the quality of the image for only a few KB
-2016-03-21
+2016-03-21
- Fix 66 site errors in Google's webmaster tools
- I looked at a bunch of them and they were old URLs, weird things linked from non-existent items, etc, so I just marked them all as fixed
@@ -245,11 +245,11 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
- Run updates on CGSpace and reboot server (new kernel, 4.5.0)
- Deploy Let's Encrypt certificate for cgspace.cgiar.org, but still need to work it into the ansible playbooks
-2016-03-22
+2016-03-22
- Merge robots.txt patch and disallow indexing of browse pages as our sitemap is consumed correctly (#198)
-2016-03-23
+2016-03-23
@@ -258,18 +258,18 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
- I can reproduce the same error on DSpace Test and on my Mac
- Looks to be an issue with the Atmire modules, I've submitted a ticket to their tracker.
-2016-03-24
+2016-03-24
-2016-03-25
+2016-03-25
- Having problems with Listings and Reports, seems to be caused by a rogue reference to dc.type.output
- This is the error we get when we proceed to the second page of Listings and Reports: https://gist.github.com/alanorth/b2d7fb5b82f94898caaf
- Commenting out the line works, but I haven't figured out the proper syntax for referring to dc.type.*
-2016-03-28
+2016-03-28
- Look into enabling the embargo during item submission, see: https://wiki.duraspace.org/display/DSDOC5x/Embargo#Embargo-SubmissionProcess
- Seems we only want AccessStep because UploadWithEmbargoStep disables the ability to edit embargos at the item level
@@ -281,7 +281,7 @@ Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Ja
- This pull request simply updates the config for the dc.type.output → dc.type change that was made last week: https://github.com/ilri/DSpace/pull/204
- Deploy robots.txt fix, embargo for item submissions, and listings and reports fix on CGSpace
-2016-03-29
+2016-03-29
- Skype meeting with Peter and Addis team to discuss metadata changes for Dublin Core, CGcore, and CGSpace-specific fields
- We decided to proceed with some deletes first, then identify CGSpace-specific fields to clean/move to cg.*, and then worry about broader changes to DC
diff --git a/docs/2016-04/index.html b/docs/2016-04/index.html
index b3d5af323..c6724bf09 100644
--- a/docs/2016-04/index.html
+++ b/docs/2016-04/index.html
@@ -29,7 +29,7 @@ After running DSpace for over five years I've never needed to look in any ot
This will save us a few gigs of backup space we're paying for on S3
Also, I noticed the checker log has some errors we should pay attention to:
"/>
-
+
@@ -110,7 +110,7 @@ Also, I noticed the checker log has some errors we should pay attention to:
- 2016-04-04
+ 2016-04-04
- Looking at log file use on CGSpace and notice that we need to work on our cron setup a bit
- We are backing up all logs in the log folder, including useless stuff like solr, cocoon, handle-plugin, etc
@@ -146,7 +146,7 @@ java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290
- Looks like cron will read limits from /etc/security/limits.* so we can do something for the tomcat7 user there
- Submit pull request for Tomcat 7 limits in Ansible dspace role (#30)
-2016-04-05
+2016-04-05
- Reduce Amazon S3 storage used for logs from 46 GB to 6GB by deleting a bunch of logs we don't need!
@@ -159,7 +159,7 @@ java.io.FileNotFoundException: /home/cgspace.cgiar.org/assetstore/64/29/06/64290
- Also, adjust the cron jobs for backups so they only back up dspace.log and some stats files (.dat)
- Try to do some metadata field migrations using the Atmire batch UI (dc.Species → cg.species) but it took several hours and even missed a few records
-2016-04-06
+2016-04-06
- A better way to move metadata on this scale is via SQL, for example dc.type.output → dc.type (their IDs in the metadatafieldregistry are 66 and 109, respectively):
@@ -169,7 +169,7 @@ UPDATE 40852
- After that an index-discovery -bf is required
- Start working on metadata migrations, add 25 or so new metadata fields to CGSpace
-2016-04-07
+2016-04-07
- Write shell script to do the migration of fields: https://gist.github.com/alanorth/72a70aca856d76f24c127a6e67b3342b
- Testing with a few fields it seems to work well:
@@ -181,12 +181,12 @@ UPDATE metadatavalue SET metadata_field_id=202 WHERE metadata_field_id=72
UPDATE 21420
UPDATE metadatavalue SET metadata_field_id=203 WHERE metadata_field_id=76
UPDATE 51258
-2016-04-08
+2016-04-08
- Discuss metadata renaming with Abenet, we decided it's better to start with the center-specific subjects like ILRI, CIFOR, CCAFS, IWMI, and CPWF
- I've e-mailed CCAFS and CPWF people to ask them how much time it will take for them to update their systems to cope with this change
-2016-04-10
+2016-04-10
- Looking at the DOI issue reported by Leroy from CIAT a few weeks ago
- It seems the dx.doi.org URLs are much more proper in our repository!
@@ -204,12 +204,12 @@ dspacetest=# select count(*) from metadatavalue where metadata_field_id=74 and t
- I will manually edit the dc.identifier.doi in 10568/72509 and tweet the link, then check back in a week to see if the donut gets updated
-2016-04-11
+2016-04-11
- The donut is already updated and shows the correct number now
- CCAFS people say it will only take them an hour to update their code for the metadata renames, so I proposed we'd do it tentatively on Monday the 18th.
-2016-04-12
+2016-04-12
- Looking at quality of WLE data (cg.subject.iwmi) in SQL:
@@ -235,17 +235,17 @@ DELETE 226
- Unfortunately this isn't a very good solution, because Listings and Reports config should allow us to filter on dc.type.* but the documentation isn't very clear and I couldn't reach Atmire today
- We want to do the dc.type.output move on CGSpace anyways, but we should wait as it might affect other external people!
-2016-04-14
+2016-04-14
- Communicate with Macaroni Bros again about dc.type
- Help Sisay with some rsync and Linux stuff
- Notify CIAT people of metadata changes (I had forgotten them last week)
-2016-04-15
+2016-04-15
- DSpace Test had crashed, so I ran all system updates, rebooted, and re-deployed DSpace code
-2016-04-18
+2016-04-18
- Talk to CIAT people about their portal again
- Start looking more at the fields we want to delete
@@ -316,7 +316,7 @@ javax.ws.rs.WebApplicationException
- Everything else in the system looked normal (50GB disk space available, nothing weird in dmesg, etc)
- After restarting Tomcat a few more of these errors were logged but the application was up
-2016-04-19
+2016-04-19
- Get handles for items that are using a given metadata field, i.e. dc.Species.animal (105):
@@ -355,7 +355,7 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
- And then remove them from the metadata registry
-2016-04-20
+2016-04-20
- Re-deploy DSpace Test with the new subject and type fields, run all system updates, and reboot the server
- Migrate fields and re-deploy CGSpace with the new subject and type fields, run all system updates, and reboot the server
@@ -386,16 +386,16 @@ UPDATE 46075
- Looks like this issue was noted and fixed in DSpace 5.5 (we're on 5.1): https://jira.duraspace.org/browse/DS-2936
- I've sent a message to Atmire asking about compatibility with DSpace 5.5
-2016-04-21
+2016-04-21
- Fix a bunch of metadata consistency issues with IITA Journal Articles (Peer review, Formally published, messed up DOIs, etc)
- Atmire responded with DSpace 5.5 compatible versions for their modules, so I'll start testing those in a few weeks
-2016-04-22
+2016-04-22
-2016-04-26
+2016-04-26
- Test embargo during item upload
- Seems to be working but the help text is misleading as to the date format
@@ -409,7 +409,7 @@ UPDATE 46075
-2016-04-27
+2016-04-27
- I woke up to ten or fifteen “up” and “down” emails from the monitoring website
- Looks like the last one was “down” from about four hours ago
@@ -451,12 +451,12 @@ dspace.log.2016-04-27:7271
- Currently running on DSpace Test, we'll give it a few days before we adjust CGSpace
- CGSpace down, restarted tomcat and it's back up
-2016-04-28
+2016-04-28
- Problems with stability again. I've blocked access to /rest for now to see if the number of errors in the log files drops
- Later we could maybe start logging access to /rest and perhaps whitelist some IPs…
-2016-04-30
+2016-04-30
- Logs for today and yesterday have zero references to this REST error, so I'm going to open back up the REST API but log all requests
diff --git a/docs/2016-05/index.html b/docs/2016-05/index.html
index cda7ee263..00b9eafa2 100644
--- a/docs/2016-05/index.html
+++ b/docs/2016-05/index.html
@@ -31,7 +31,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
# awk '{print $1}' /var/log/nginx/rest.log | uniq | wc -l
3168
"/>
-
+
@@ -112,7 +112,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
- 2016-05-01
+ 2016-05-01
- Since yesterday there have been 10,000 REST errors and the site has been unstable again
- I have blocked access to the API now
@@ -129,13 +129,13 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
- For now I'll block just the Ethiopian IP
- The owner of that application has said that the NaN (not a number) is an error in his code and he'll fix it
-2016-05-03
+2016-05-03
- Update nginx to 1.10.x branch on CGSpace
- Fix a reference to dc.type.output in Discovery that I had missed when we migrated to dc.type last month (#223)
-2016-05-06
+2016-05-06
- DSpace Test is down, catalina.out has lots of messages about heap space from some time yesterday (!)
- It looks like Sisay was doing some batch imports
@@ -168,7 +168,7 @@ fi
-2016-05-10
+2016-05-10
- Start looking at more metadata migrations
- There are lots of fields in dcterms namespace that look interesting, like:
@@ -181,7 +181,7 @@ fi
- Looks like these were added in DSpace 4 to allow for future work to make DSpace more flexible
- CGSpace's dc registry has 96 items, and the default DSpace one has 73.
-2016-05-11
+2016-05-11
-2016-05-12
+2016-05-12
- Looks like the issue that Abenet was having a few days ago with “Connection Reset” in Firefox might be due to a Firefox 46 issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1268775
- I finally found a copy of the latest CG Core metadata guidelines and it looks like we can add a few more fields to our next migration:
@@ -233,7 +233,7 @@ fi
- Found ~200 messed up CIAT values in dc.publisher:
# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=39 and text_value similar to "% %";
-2016-05-13
+2016-05-13
- More theorizing about CGcore
- Add two new fields:
@@ -245,7 +245,7 @@ fi
- dc.place is our own field, so it's easy to move
- I've removed dc.title.jtitle from the list for now because there's no use moving it out of DC until we know where it will go (see discussion yesterday)
-2016-05-18
+2016-05-18
- Work on 707 CCAFS records
- They have thumbnails on Flickr and elsewhere
@@ -257,7 +257,7 @@ fi
- So for the hqdefault.jpg ones I just take the UUID (-2) and use it as the filename
- Before importing with SAFBuilder I tested adding “__bundle:THUMBNAIL” to the filename column and it works fine
-2016-05-19
+2016-05-19
- More quality control on filename field of CCAFS records to make processing in shell and SAFBuilder more reliable:
@@ -274,7 +274,7 @@ fi
# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=75 and (text_value like 'PN%' or text_value like 'PHASE%' or text_value = 'CBA' or text_value = 'IA');
-2016-05-20
+2016-05-20
- More work on CCAFS Video and Images records
- For SAFBuilder we need to modify filename column to have the thumbnail bundle:
@@ -290,14 +290,14 @@ fi
- A few miscellaneous fixes for XMLUI display niggles (spaces in item lists and link target
_black
): #224
- Work on configuration changes for Phase 2 metadata migrations
-2016-05-23
+2016-05-23
- Try to import the CCAFS Images and Videos to CGSpace but had some issues with LibreOffice and OpenRefine
- LibreOffice excludes empty cells when it exports and all the fields shift over to the left and cause URLs to go to Subjects, etc.
- Google Docs does this better, but somehow reorders the rows and when I paste the thumbnail/filename row in they don't match!
- I will have to try later
-2016-05-30
+2016-05-30
- Export CCAFS video and image records from DSpace Test using the migrate option (-m):
@@ -320,7 +320,7 @@ $ /home/cgspace.cgiar.org/bin/dspace metadata-import -e aorth@mjanja.ch -f ~/CTA
Discovery indexing took a few hours for some reason, and after that I started the index-authority script
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" /home/cgspace.cgiar.org/bin/dspace index-authority
-2016-05-31
+2016-05-31
- The index-authority script ran overnight and was finished in the morning
- Hopefully this was because we haven't been running it regularly and it will speed up next time
diff --git a/docs/2016-06/index.html b/docs/2016-06/index.html
index fa7738d97..66ab9f0a5 100644
--- a/docs/2016-06/index.html
+++ b/docs/2016-06/index.html
@@ -31,7 +31,7 @@ This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRec
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship
"/>
-
+
@@ -112,7 +112,7 @@ Working on second phase of metadata migration, looks like this will work for mov
- 2016-06-01
+ 2016-06-01
- Experimenting with IFPRI OAI (we want to harvest their publications)
- After reading the ContentDM documentation I found IFPRI's OAI endpoint: http://ebrary.ifpri.org/oai/oai.php
@@ -128,7 +128,7 @@ UPDATE 14
- Fix a few minor miscellaneous issues in
dspace.cfg
(#227)
-2016-06-02
+2016-06-02
- Testing the configuration and theme changes for the upcoming metadata migration and I found some issues with
cg.coverage.admin-unit
- Seems that the Browse configuration in
dspace.cfg
can't handle the ‘-’ in the field name:
@@ -141,7 +141,7 @@ UPDATE 14
- I found a thread on the mailing list talking about it and there is a bug report and a patch: https://jira.duraspace.org/browse/DS-2740
- The patch applies successfully on DSpace 5.1 so I will try it later
-2016-06-03
+2016-06-03
- Investigating the CCAFS authority issue, I exported the metadata for the Videos collection
- The top two authors are:
@@ -197,13 +197,13 @@ UPDATE 960
- That would only be for the “Browse by” function… so we'll have to see what effect that has later
-2016-06-04
+2016-06-04
- Re-sync DSpace Test with CGSpace and perform test of metadata migration again
- Run phase two of metadata migrations on CGSpace (see the migration notes)
- Run all system updates and reboot CGSpace server
-2016-06-07
+2016-06-07
- Figured out how to export a list of the unique values from a metadata field ordered by count:
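The query was elided; a sketch of the shape of such an export, following the `\copy` pattern used elsewhere in these notes (the field id 55 and output path are illustrative):

```
dspacetest=# \copy (select text_value, count(*) from metadatavalue where metadata_field_id=55 group by text_value order by count desc) to /tmp/values.csv with csv;
```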
@@ -230,7 +230,7 @@ UPDATE 960
Looks like OAI is kinda obtuse for this, and if we use ContentDM's API we'll be able to access their internal field names (rather than trying to figure out how they stuffed them into various, repeated Dublin Core fields)
-2016-06-08
+2016-06-08
-2016-06-09
+2016-06-09
- Atmire explained that the
atmire.orcid.id
field doesn't exist in the schema, as it actually comes from the authority cache during XMLUI run time
- This means we don't see it when harvesting via OAI or REST, for example
- They opened a feature ticket on the DSpace tracker to ask for support of this: https://jira.duraspace.org/browse/DS-3239
-2016-06-10
+2016-06-10
- Investigating authority confidences
- It looks like the values are documented in
Choices.java
@@ -269,16 +269,16 @@ UPDATE 960
- Merge item display tweaks from earlier this week (#231)
- Merge controlled vocabulary functionality for subregions (#238)
-2016-06-11
+2016-06-11
- Merge controlled vocabulary for sponsorship field (#239)
- Fix character encoding issues for animal breed lookup that I merged yesterday
-2016-06-17
+2016-06-17
- Linode has free RAM upgrades for their 13th birthday so I migrated DSpace Test (4→8GB of RAM)
-2016-06-18
+2016-06-18
-2016-06-20
+2016-06-20
- CGSpace's HTTPS certificate expired last night and I didn't notice, had to renew:
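The renewal commands were elided; a sketch only, as the actual client and paths in use at the time are assumptions (the standalone renewal method requires stopping the web server so port 80 is free):

```
$ sudo systemctl stop nginx
$ /opt/letsencrypt/letsencrypt-auto renew
$ sudo systemctl start nginx
```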
@@ -316,7 +316,7 @@ UPDATE 960
- I really need to fix that cron job…
-2016-06-24
+2016-06-24
- Run the replacements/deletes for
dc.description.sponsorship
(investors) on CGSpace:
@@ -332,7 +332,7 @@ $ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.spons
- Add new sponsors to controlled vocabulary (#244)
- Refine submission form labels and hints
-2016-06-28
+2016-06-28
- Testing the cleanup of
dc.contributor.corporate
with 13 deletions and 121 replacements
- There are still ~97 fields that weren't indicated to do anything
@@ -342,7 +342,7 @@ $ ./delete-metadata-values.py -i investors-delete-82.csv -f dc.description.spons
- Re-evaluate
dc.contributor.corporate
and it seems we will move it to dc.contributor.author
as this is more in line with how editors are actually using it
-2016-06-29
+2016-06-29
- Test run of
migrate-fields.sh
with the following re-mappings:
@@ -371,7 +371,7 @@ $ ./delete-metadata-values.py -f dc.contributor.corporate -i Corporate-Authors-D
- Run all system updates on the servers and reboot
- Start working on config changes for phase three of the metadata migrations
-2016-06-30
+2016-06-30
- Wow, there are 95 authors in the database who have ‘,’ at the end of their name:
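The query was elided; a sketch of how one might find those authors, using the `metadata_field_id=3` (author) convention from the surrounding notes (the regex form is an assumption):

```
dspace=# select distinct text_value from metadatavalue where metadata_field_id=3 and text_value ~ ',$';
```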
diff --git a/docs/2016-07/index.html b/docs/2016-07/index.html
index 7e75a233c..f81557605 100644
--- a/docs/2016-07/index.html
+++ b/docs/2016-07/index.html
@@ -41,7 +41,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
In this case the select query was showing 95 results before the update
"/>
-
+
@@ -122,7 +122,7 @@ In this case the select query was showing 95 results before the update
- 2016-07-01
+ 2016-07-01
- Add
dc.description.sponsorship
to Discovery sidebar facets and make investors clickable in item view (#232)
- I think this query should find and replace all authors that have “,” at the end of their names:
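The query was elided at the hunk boundary; a sketch of such a find-and-replace using `regexp_replace`, with the field id taken from the surrounding notes (the exact expression is an assumption):

```
dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ ',$';
```

Running the equivalent `select` first, as described below, is the sane way to confirm the match count before updating.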
@@ -136,15 +136,15 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
- In this case the select query was showing 95 results before the update
-2016-07-02
+2016-07-02
- Comment on DSpace Jira ticket about author lookup search text (DS-2329)
-2016-07-04
+2016-07-04
- Seems the database's author authority values mean nothing without the
authority
Solr core from the host where they were created!
-2016-07-05
+2016-07-05
- Amend
backup-solr.sh
script so it backs up the entire Solr folder
- We really only need
statistics
and authority
but meh
@@ -157,7 +157,7 @@ dspacetest=# select text_value from metadatavalue where metadata_field_id=3 and
- I tested the patch for DS-2740 that I had found last month and it seems to work
- I will merge it to
5_x-prod
-2016-07-06
+2016-07-06
- Delete 23 blank metadata values from CGSpace:
@@ -186,22 +186,22 @@ $ ./delete-metadata-values.py -f dc.contributor.affiliation -i Affiliations-Dele
- I then ran all server updates and rebooted the server
-2016-07-11
+2016-07-11
- Doing some author cleanups from Peter and Abenet:
$ ./fix-metadata-values.py -i /tmp/Authors-Fix-205-UTF8.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
$ ./delete-metadata-values.py -f dc.contributor.author -i /tmp/Authors-Delete-UTF8.csv -m 3 -u dspacetest -d dspacetest -p fuuu
-
2016-07-13
+2016-07-13
- Run the author cleanups on CGSpace and start a full Discovery re-index
-2016-07-14
+2016-07-14
- Test LDAP settings for new root LDAP
- Seems to work when binding as a top-level user
-2016-07-18
+2016-07-18
- Adjust identifiers in XMLUI item display to be more prominent
- Add species and breed to the XMLUI item display
@@ -226,12 +226,12 @@ org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error
proxy_pass http://127.0.0.1:8443;
deny 70.32.99.142;
}
-2016-07-21
+2016-07-21
-2016-07-22
+2016-07-22
- Help Paola from CCAFS with thumbnails for batch uploads
- She has been struggling to get the dimensions right, and manually enlarging smaller thumbnails, renaming PNGs to JPG, etc
@@ -268,7 +268,7 @@ index.authority.ignore-variants=true
- After re-indexing and clearing the XMLUI cache nothing has changed
-2016-07-25
+2016-07-25
- Trying a few more settings (plus reindex) for Discovery on DSpace Test:
@@ -292,7 +292,7 @@ discovery.index.authority.ignore-variants=true
- Re-sync DSpace Test with CGSpace
- I noticed that our backup scripts don't send Solr cores to S3 so I amended the script
-2016-07-31
+2016-07-31
- Work on removing Dryland Systems and Humidtropics subjects from Discovery sidebar and Browse by
- Also change “Subjects” to “AGROVOC keywords” in Discovery sidebar/search and Browse by (#257)
diff --git a/docs/2016-08/index.html b/docs/2016-08/index.html
index f81ab27bd..964b5e916 100644
--- a/docs/2016-08/index.html
+++ b/docs/2016-08/index.html
@@ -39,7 +39,7 @@ $ git checkout -b 55new 5_x-prod
$ git reset --hard ilri/5_x-prod
$ git rebase -i dspace-5.5
"/>
-
+
@@ -120,7 +120,7 @@ $ git rebase -i dspace-5.5
- 2016-08-01
+ 2016-08-01
- Add updated distribution license from Sisay (#259)
- Play with upgrading Mirage 2 dependencies in
bower.json
because most are several versions out of date
@@ -141,33 +141,33 @@ $ git rebase -i dspace-5.5
- Eventually I just turned on git rerere and solved the conflicts and completed the 403 commit rebase
- The 5.5 code now builds but doesn't run (white page in Tomcat)
-2016-08-02
+2016-08-02
- Ask Atmire for help with DSpace 5.5 issue
- Vanilla DSpace 5.5 deploys and runs fine
- Playing with DSpace in Ubuntu 16.04 and Tomcat 7
- Everything is still completely broken, even vanilla DSpace 5.5
-2016-08-04
+2016-08-04
- Ask on DSpace mailing list about duplicate authors, Discovery and author text values
- Atmire responded with some new DSpace 5.5 ready versions to try for their modules
-2016-08-05
+2016-08-05
- Fix item display incorrectly displaying Species when Breeds were present (#260)
- Experiment with fixing more authors, like Delia Grace:
dspacetest=# update metadatavalue set authority='0b4fcbc1-d930-4319-9b4d-ea1553cca70b', confidence=600 where metadata_field_id=3 and text_value='Grace, D.';
-
2016-08-06
+2016-08-06
- Finally figured out how to remove “View/Open” and “Bitstreams” from the item view
-2016-08-07
+2016-08-07
- Start working on Ubuntu 16.04 Ansible playbook for Tomcat 8, PostgreSQL 9.5, Oracle 8, etc
-2016-08-08
+2016-08-08
- Still troubleshooting Atmire modules on DSpace 5.5
- Vanilla DSpace 5.5 works on Tomcat 7…
@@ -190,13 +190,13 @@ $ ln -sv ~/dspace/webapps/oai /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/oai
$ ln -sv ~/dspace/webapps/jspui /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/jspui
$ ln -sv ~/dspace/webapps/rest /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/rest
$ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/solr
-2016-08-09
+2016-08-09
-2016-08-10
+2016-08-10
- Turns out DSpace 5.x isn't ready for Tomcat 8: https://jira.duraspace.org/browse/DS-3092
- So we'll need to use Tomcat 7 + Java 8 on Ubuntu 16.04
@@ -204,27 +204,27 @@ $ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/sol
- Merge pull request for fixing the type Discovery index to use
dc.type
(#262)
- Merge pull request for removing “Bitstream” text from item display, as it confuses users and isn't necessary (#263)
-2016-08-11
+2016-08-11
- Finally got DSpace (5.5) running on Ubuntu 16.04, Tomcat 7, Java 8, PostgreSQL 9.5 via the updated Ansible stuff
-2016-08-14
+2016-08-14
-2016-08-15
+2016-08-15
-2016-08-16
+2016-08-16
- Troubleshoot Paramiko connection issues with Ansible on ILRI servers: #37
- Turns out we need to add some MACs to our
sshd_config
: hmac-sha2-512,hmac-sha2-256
- Update DSpace Test's Java to version 8 to start testing this configuration (seeing as Solr recommends it)
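For reference, the MACs mentioned above go on a single line in `sshd_config` (a sketch; the servers' full cipher/MAC configuration may differ):

```
# /etc/ssh/sshd_config
MACs hmac-sha2-512,hmac-sha2-256
```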
-2016-08-17
+2016-08-17
- More work on Let's Encrypt stuff for Ansible roles
- Yesterday Atmire responded about DSpace 5.5 issues and asked me to try the
dspace database repair
command to fix Flyway issues
@@ -233,7 +233,7 @@ $ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/sol
- After removing the spring folder and running ant install again,
dspace database
works
- I see there are missing and pending Flyway migrations, but running
dspace database repair
and dspace database migrate
does nothing: https://gist.github.com/alanorth/41ed5abf2ff32d8ac9eedd1c3d015d70
-2016-08-18
+2016-08-18
- Fix “CONGO,DR” country name in
input-forms.xml
(#264)
- Also need to fix existing records using the incorrect form in the database:
@@ -242,7 +242,7 @@ $ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/sol
- I asked a question on the DSpace mailing list about updating “preferred” forms of author names from ORCID
-2016-08-21
+2016-08-21
- A few days ago someone on the DSpace mailing list suggested I try
dspace dsrun org.dspace.authority.UpdateAuthorities
to update preferred author names from ORCID
- If you set
auto-update-items=true
in dspace/config/modules/solrauthority.cfg
it is supposed to update records it finds automatically
@@ -250,7 +250,7 @@ $ ln -sv ~/dspace/webapps/solr /opt/brew/Cellar/tomcat/8.5.4/libexec/webapps/sol
- Still troubleshooting Atmire modules on DSpace 5.5
- I sent them some new verbose logs: https://gist.github.com/alanorth/700748995649688148ceba89d760253e
-2016-08-22
+2016-08-22
- Database migrations are fine on DSpace 5.1:
@@ -286,7 +286,7 @@ Database Driver: PostgreSQL Native Driver version PostgreSQL 9.1 JDBC4 (build 90
- So I'm not sure why they have problems when we move to DSpace 5.5 (even the 5.1 migrations themselves show as “Missing”)
-2016-08-23
+2016-08-23
- Help Paola from CCAFS with her thumbnails again
- Talk to Atmire about the DSpace 5.5 issue, and it seems to be caused by a bug in FlywayDB
@@ -311,13 +311,13 @@ context:/jndi:/localhost/themes/0_CGIAR/sitemap.xmap - 136:77
- I tried with a small version bump to CUA but it didn't work (version
5.5-4.1.1-0
)
- Also, I started looking into huge pages to prepare for PostgreSQL 9.5, but it seems Linode's kernels don't enable them
-2016-08-24
+2016-08-24
- Clean up and import 48 CCAFS records into DSpace Test
- SQL to get all journal titles from dc.source (55), since it's apparently used for internal DSpace filename handling, but we moved all our journal titles there a few months ago:
dspacetest=# select distinct text_value from metadatavalue where metadata_field_id=55 and text_value !~ '.*(\.pdf|\.png|\.PDF|\.Pdf|\.JPEG|\.jpg|\.JPG|\.jpeg|\.xls|\.rtf|\.docx?|\.potx|\.dotx|\.eqa|\.tiff|\.mp4|\.mp3|\.gif|\.zip|\.txt|\.pptx|\.indd|\.PNG|\.bmp|\.exe|org\.dspace\.app\.mediafilter).*';
-
2016-08-25
+2016-08-25
- Atmire suggested adding a missing bean to
dspace/config/spring/api/atmire-cua.xml
but it doesn't help:
@@ -347,7 +347,7 @@ $ JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m" /home/cgspace.cgiar.org/b
- Finally got DSpace 5.5 working with the Atmire modules after a few rounds of back and forth with Atmire devs
-2016-08-26
+2016-08-26
- CGSpace had issues tonight, not entirely crashing, but becoming unresponsive
- The dspace log had this:
@@ -356,7 +356,7 @@ $ JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx512m" /home/cgspace.cgiar.org/b
- Related to /rest no doubt
-2016-08-27
+2016-08-27
- Run corrections for Delia Grace and
CONGO, DR
, and deploy August changes to CGSpace
- Run all system updates and reboot the server
diff --git a/docs/2016-09/index.html b/docs/2016-09/index.html
index 68b0b363e..a6b6f798e 100644
--- a/docs/2016-09/index.html
+++ b/docs/2016-09/index.html
@@ -31,7 +31,7 @@ It looks like we might be able to use OUs now, instead of DCs:
$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
"/>
-
+
@@ -112,7 +112,7 @@ $ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=or
- 2016-09-01
+ 2016-09-01
- Discuss helping CCAFS with some batch tagging of ORCID IDs for their authors
- Discuss how the migration of CGIAR's Active Directory to a flat structure will break our LDAP groups in DSpace
@@ -203,7 +203,7 @@ dspacetest=# select distinct text_value, authority, confidence from metadatavalu
- After updating the Authority indexes (
bin/dspace index-authority
) everything looks good
- Run authority updates on CGSpace
-2016-09-05
+2016-09-05
- After one week of logging TLS connections on CGSpace:
@@ -222,7 +222,7 @@ TLSv1/EDH-RSA-DES-CBC3-SHA
- This gives you, for example:
Mainstreaming gender in agricultural R&D.pdf__description:Brief
-2016-09-06
+2016-09-06
-2016-09-20
+2016-09-20
- Run all system updates on DSpace Test and reboot the server
- Merge changes for sponsorship and affiliation controlled vocabularies (#267, #268)
@@ -461,7 +461,7 @@ $ ./delete-metadata-values.py -f cg.contributor.affiliation -i affiliations_pb-2
- I need to read the docs and ask on the mailing list to see if we can tweak that
- Generate a new list of sponsors from the database for Peter Ballantyne so we can clean them up and update the controlled vocabulary
-2016-09-21
+2016-09-21
- Turns out the Solr search logic switched from OR to AND in DSpace 6.0 and the change is easy to backport: https://jira.duraspace.org/browse/DS-2809
- We just need to set this in
dspace/solr/search/conf/schema.xml
:
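The setting itself was elided; presumably it is the Solr default operator from DS-2809, which would look something like this (the exact placement in the stock DSpace 5.x `schema.xml` is an assumption):

```
<!-- dspace/solr/search/conf/schema.xml -->
<solrQueryParser defaultOperator="AND"/>
```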
@@ -490,11 +490,11 @@ $ ./delete-metadata-values.py -i sponsors-delete-8.csv -f dc.description.sponsor
- I need to run these and the others from a few days ago on CGSpace the next time we run updates
- Also, I need to update the controlled vocab for sponsors based on these
-2016-09-22
+2016-09-22
- Update controlled vocabulary for sponsorship based on the latest corrected values from the database
-2016-09-25
+2016-09-25
- Merge accession date improvements for CUA module (#275)
- Merge addition of accession date to Discovery search filters (#276)
@@ -520,7 +520,7 @@ OCSP Response Data:
-2016-09-27
+2016-09-27
- Discuss fixing some ORCIDs for CCAFS author Sonja Vermeulen with Magdalena Haman
- This author has a few variations:
@@ -546,7 +546,7 @@ UPDATE 101
- We can also replace the RSS and mail icons in community text!
- Fix reference to
dc.type.*
in Atmire CUA module, as we now only index dc.type
for “Output type”
-2016-09-28
+2016-09-28
- Make a placeholder pull request for
discovery.xml
changes (#278), as I still need to test their effect on Atmire content analysis module
- Make a placeholder pull request for Font Awesome changes (#279), which replaces the GitHub image in the footer with an icon, and add style for RSS and @ icons that I will start replacing in community/collection HTML intros
@@ -565,7 +565,7 @@ dspacetest=# update metadatavalue set authority='09e4da69-33a3-45ca-b110-7d3f82d
$ ./fix-metadata-values.py -i ilrisubjects-fix-32.csv -f cg.subject.ilri -t correct -m 203 -d dspace -u dspace -p fuuuu
$ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -m 203 -d dspace -u dspace -p fuuu
-
2016-09-29
+2016-09-29
- Add
cg.identifier.ciatproject
to metadata registry in preparation for CIAT project tag
- Merge changes for CIAT project tag (#282)
@@ -573,7 +573,7 @@ $ ./delete-metadata-values.py -i ilrisubjects-delete-13.csv -f cg.subject.ilri -
- People on DSpace mailing list gave me a query to get authors from certain collections:
dspacetest=# select distinct text_value from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/5472', '10568/5473')));
-
2016-09-30
+2016-09-30
- Deny access to REST API's
find-by-metadata-field
endpoint to protect against an upstream security issue (DS-3250)
- There is a patch but it is only for 5.5 and doesn't apply cleanly to 5.1
diff --git a/docs/2016-10/index.html b/docs/2016-10/index.html
index 598b0a1e6..255b2d2c5 100644
--- a/docs/2016-10/index.html
+++ b/docs/2016-10/index.html
@@ -39,7 +39,7 @@ I exported a random item's metadata as CSV, deleted all columns except id an
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
"/>
-
+
@@ -120,7 +120,7 @@ I exported a random item's metadata as CSV, deleted all columns except id an
- 2016-10-03
+ 2016-10-03
- Testing adding ORCIDs to a CSV file for a single item to see if the author orders get messed up
- Need to test the following scenarios to see how author order is affected:
@@ -141,7 +141,7 @@ I exported a random item's metadata as CSV, deleted all columns except id an
- Looks like we'll just have to add the text to the About page (without a link) or add a separate page
-2016-10-04
+2016-10-04
- Start testing cleanups of authors that Peter sent last week
- Out of 40,000+ rows, Peter had indicated corrections for ~3,200 of them—too many to look through carefully, so I did some basic quality checking:
@@ -161,12 +161,12 @@ $ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -
- Generate list of unique authors in CCAFS collections:
dspacetest=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/32729', '10568/5472', '10568/5473', '10568/10288', '10568/70974', '10568/3547', '10568/3549', '10568/3531','10568/16890','10568/5470','10568/3546', '10568/36024', '10568/66581', '10568/21789', '10568/5469', '10568/5468', '10568/3548', '10568/71053', '10568/25167'))) group by text_value order by count desc) to /tmp/ccafs-authors.csv with csv;
-
2016-10-05
+2016-10-05
- Work on more infrastructure cleanups for Ansible DSpace role
- Clean up Let's Encrypt plumbing and submit pull request for rmg-ansible-public (#60)
-2016-10-06
+2016-10-06
- Nice! DSpace Test (linode02) is now having
java.lang.OutOfMemoryError: Java heap space
errors…
- Heap space is 2048m, and we have 5GB of RAM being used for OS cache (Solr!) so let's just bump the memory to 3072m
@@ -177,7 +177,7 @@ $ ./delete-metadata-values.py -i authors-delete-3.csv -f dc.contributor.author -
- Turns out the first PDF was exported from InDesign using CMYK and the second one was using sRGB
- Run all system updates on DSpace Test and reboot it
-2016-10-08
+2016-10-08
- Re-deploy CGSpace with latest changes from late September and early October
- Run fixes for ILRI subjects and delete blank metadata values:
@@ -193,13 +193,13 @@ DELETE 11
- Delete 2GB
cron-filter-media.log
file, as it is just a log from a cron job and it doesn't get rotated like normal log files (almost a year now maybe)
-2016-10-14
+2016-10-14
- Run all system updates on DSpace Test and reboot server
- Looking into some issues with Discovery filters in Atmire's content and usage analysis module after adjusting the filter class
- Looks like changing the filters from
configuration.DiscoverySearchFilterFacet
to configuration.DiscoverySearchFilter
breaks them in Atmire CUA module
-2016-10-17
+2016-10-17
- A bit more cleanup on the CCAFS authors, and run the corrections on DSpace Test:
@@ -207,7 +207,7 @@ DELETE 11
- One observation is that there are still some old versions of names in the author lookup because authors appear in other communities (as we only corrected authors from CCAFS for this round)
-2016-10-18
+2016-10-18
- Start working on DSpace 5.5 porting work again:
@@ -221,7 +221,7 @@ $ git rebase -i dspace-5.5
- Merge the
discovery.xml
cleanups (#278)
- Merge some minor edits to the distribution license (#285)
-2016-10-19
+2016-10-19
- When we move to DSpace 5.5 we should also cherry pick some patches from 5.6 branch:
@@ -231,12 +231,12 @@ $ git rebase -i dspace-5.5
-2016-10-20
+2016-10-20
- Run CCAFS author corrections on CGSpace
- Discovery reindexing took forever and kinda caused CGSpace to crash, so I ran all system updates and rebooted the server
-2016-10-25
+2016-10-25
- Move the LIVES community from the top level to the ILRI projects community
@@ -279,7 +279,7 @@ dspace=# update metadatavalue set text_value = regexp_replace(text_value, '<i
- And now that I start looking, I want to fix a bunch of links to popular sites that should be using HTTPS, like Twitter, Facebook, Google, Feed Burner, DOI, etc
- I should look to see if any of those domains is sending an HTTP 301 or setting HSTS headers to their HTTPS domains, then just replace them
-2016-10-27
+2016-10-27
- Run Font Awesome fixes on DSpace Test:
@@ -309,7 +309,7 @@ UPDATE 0
- Run the same replacements on CGSpace
-2016-10-30
+2016-10-30
- Fix some messed up authors on CGSpace:
diff --git a/docs/2016-11/index.html b/docs/2016-11/index.html
index c0b62277f..db8df5b9b 100644
--- a/docs/2016-11/index.html
+++ b/docs/2016-11/index.html
@@ -23,7 +23,7 @@ Add dc.type to the output options for Atmire's Listings and Reports module (
Add dc.type to the output options for Atmire's Listings and Reports module (#286)
"/>
-
+
@@ -104,12 +104,12 @@ Add dc.type to the output options for Atmire's Listings and Reports module (
- 2016-11-01
+ 2016-11-01
- Add
dc.type
to the output options for Atmire's Listings and Reports module (#286)
-2016-11-02
+2016-11-02
- Migrate DSpace Test to DSpace 5.5 (notes)
- Run all updates on DSpace Test and reboot the server
@@ -144,11 +144,11 @@ java.lang.NullPointerException
- I will raise a ticket with Atmire to ask them
-2016-11-06
+2016-11-06
- After re-deploying and re-indexing I didn't see the same issue, and the indexing completed in 85 minutes, which is about how long it is supposed to take
-2016-11-07
+2016-11-07
- Horrible one liner to get Linode ID from certain Ansible host vars:
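The one-liner was elided; a self-contained sketch of the idea (the `host_vars` path and the `linode_id` variable name are assumptions):

```shell
# Create a sample Ansible host_vars file, then grep the Linode ID out of it
mkdir -p /tmp/ansible-hostvars-demo
echo 'linode_id: 12345' > /tmp/ansible-hostvars-demo/linode01
grep -h 'linode_id' /tmp/ansible-hostvars-demo/* | awk '{print $2}'
# → 12345
```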
@@ -166,7 +166,7 @@ COPY 22
- Add
AMR
to ILRI subjects and remove one duplicate instance of IITA in author affiliations controlled vocabulary (#288)
-2016-11-08
+2016-11-08
- Atmire's Listings and Reports module seems to be broken on DSpace 5.5
@@ -181,13 +181,13 @@ COPY 22
- Dump of the top ~200 authors in CGSpace:
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id=3 group by text_value order by count desc limit 210) to /tmp/210-authors.csv with csv;
-
2016-11-09
+2016-11-09
- CGSpace crashed so I quickly ran system updates, applied one or two of the waiting changes from the
5_x-prod
branch, and rebooted the server
- The error was
Timeout waiting for idle object
but I haven't looked into the Tomcat logs to see what happened
- Also, I ran the corrections for CRPs from earlier this week
-2016-11-10
+2016-11-10
- Helping Megan Zandstra and CIAT with some questions about the REST API
- Playing with
find-by-metadata-field
, this works:
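The request was elided; a sketch of the shape of such a call against the DSpace 5.x REST API (the key/value pair and hostname are illustrative):

```
$ curl -s -H "Accept: application/json" -H "Content-Type: application/json" -X POST -d '{"key":"cg.subject.ilri","value":"SEEDS"}' https://dspacetest.cgiar.org/rest/items/find-by-metadata-field
```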
@@ -283,7 +283,7 @@ $ curl -s -H "accept: application/json" -H "Content-Type: applica
- Not sure what's going on, but Discovery shows 83 values, and database shows 85, so I'm going to reindex Discovery just in case
-2016-11-14
+2016-11-14
- I applied Atmire's suggestions to fix Listings and Reports for DSpace 5.5 and now it works
- There were some issues with the
dspace/modules/jspui/pom.xml
, which is annoying because all I did was rebase our working 5.1 code on top of 5.5, meaning Atmire's installation procedure must have changed
@@ -319,7 +319,7 @@ X-Cocoon-Version: 2.2.0
- The first one gets a session, and any after that — within 60 seconds — will be internally mapped to the same session by Tomcat
- This means that when Google or Baidu slam you with tens of concurrent connections they will all map to ONE internal session, which saves RAM!
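A sketch of the valve configuration in Tomcat's `server.xml` (the user-agent regex and interval shown are assumptions; the valve class name is standard Tomcat):

```
<!-- $CATALINA_HOME/conf/server.xml, inside <Engine> -->
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*Yandex.*|.*Baidu.*"
       sessionInactiveInterval="60"/>
```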
-2016-11-15
+2016-11-15
- The Tomcat JVM heap looks really good after applying the Crawler Session Manager fix on DSpace Test last night:
@@ -375,7 +375,7 @@ Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)" "
- We absolutely don't use those modules, so we shouldn't build them in the first place
-2016-11-17
+2016-11-17
- Generate a list of journal titles for Peter and Abenet to look through so we can make a controlled vocabulary out of them:
@@ -404,18 +404,18 @@ UPDATE 7
- I'm not sure if there's anything we can do, actually, because we would have to remove those from the thumbnail bundles, and replace them with the regular JPGs from the content bundle, and then remove them from the assetstore…
-2016-11-18
+2016-11-18
- Enable Tomcat Crawler Session Manager on CGSpace
-2016-11-21
+2016-11-21
- More work on Ansible playbooks for PostgreSQL 9.3→9.5 and Java 7→8 work
- CGSpace virtual managers meeting
- I need to look into making the item thumbnail clickable
- Macaroni Bros said they tested the DSpace Test (DSpace 5.5) REST API for CCAFS and WLE sites and it works as expected
-2016-11-23
+2016-11-23
- Upgrade Java from 7 to 8 on CGSpace
- I had started planning the in-place PostgreSQL 9.3→9.5 upgrade but decided that I will have to
pg_dump
and pg_restore
when I move to the new server soon anyways, so there's no need to upgrade the database right now
@@ -426,13 +426,13 @@ UPDATE 7
- Play with Creative Commons stuff in DSpace submission step
- It seems to work but it doesn't let you choose a version of CC (like 4.0), and we would need to customize the XMLUI item display so it doesn't display the gross CC badges
-2016-11-24
+2016-11-24
- Bizuwork was testing DSpace Test on DSpace 5.5 and noticed that the Listings and Reports module seems to be case sensitive, whereas CGSpace's Listings and Reports isn't (i.e., a search for “orth, alan” vs “Orth, Alan” returns the same results on CGSpace, but different results on DSpace Test)
- I have raised a ticket with Atmire
- Looks like this issue is actually the new Listings and Reports module honoring the Solr search queries more correctly
-2016-11-27
+2016-11-27
- Run system updates on DSpace Test and reboot the server
- Deploy DSpace 5.5 on CGSpace:
@@ -451,7 +451,7 @@ UPDATE 7
- Testing DSpace 5.5 on CGSpace, it seems CUA's export as XLS works for Usage statistics, but not Content statistics
- I will raise a bug with Atmire
-2016-11-28
+2016-11-28
- One user says he is still getting a blank page when he logs in (just the CGSpace header, but no community list)
- Looking at the Catalina logs I see there is some super long-running indexing process going on:
@@ -478,7 +478,7 @@ $ /home/dspacetest.cgiar.org/bin/dspace registry-loader -metadata /home/dspacete
- Wow, Bram from Atmire pointed out this solution for using multiple handles with one DSpace instance: https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace?focusedCommentId=78163296#comment-78163296
- We might be able to migrate the CGIAR Library now, as they had wanted to keep their handles
-2016-11-29
+2016-11-29
- Sisay tried deleting and re-creating Goshu's account but he still can't see any communities on the homepage after he logs in
- Around the time of his login I see this in the DSpace logs:
@@ -514,7 +514,7 @@ org.dspace.discovery.SearchServiceException: Error executing query
- A few users are reporting having issues with their workflows, they get the following message: “You are not allowed to perform this task”
- Might be the same as DS-2920 on the bug tracker
-2016-11-30
+2016-11-30
- The
maxHttpHeaderSize
fix worked on CGSpace (user is able to see the community list on the homepage)
- The “take task” cache fix worked on DSpace Test but it's not an official patch, so I'll have to report the bug to DSpace people and try to get advice
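For reference, the `maxHttpHeaderSize` fix is an attribute on the Tomcat HTTP Connector in `server.xml` (the value shown is an assumption; the default is 8KB, which can be too small once large cookies or headers are in play):

```
<!-- server.xml -->
<Connector port="8443" maxHttpHeaderSize="16384" ... />
```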
diff --git a/docs/2016-12/index.html b/docs/2016-12/index.html
index 1174e2fbf..a53f16fc1 100644
--- a/docs/2016-12/index.html
+++ b/docs/2016-12/index.html
@@ -43,7 +43,7 @@ I see thousands of them in the logs for the last few months, so it's not rel
I've raised a ticket with Atmire to ask
Another worrying error from dspace.log is:
"/>
-
+
@@ -124,7 +124,7 @@ Another worrying error from dspace.log is:
- 2016-12-02
+ 2016-12-02
- CGSpace was down for five hours in the morning while I was sleeping
- While looking in the logs for errors, I see tons of warnings about Atmire MQM:
@@ -242,7 +242,7 @@ org.apache.solr.client.solrj.SolrServerException: Server refused connection at:
- Also, the disk is nearly full because of log file issues, so I'm running some compression on DSpace logs
- Normally these stay uncompressed for a month just in case we need to look at them, so now I've just compressed anything older than 2 weeks so we can get some disk space back
-2016-12-04
+2016-12-04
- I got a weird report from the CGSpace checksum checker this morning
- It says 732 bitstreams have potential issues, for example:
@@ -293,7 +293,7 @@ GC_TUNE="-XX:-UseSuperWord \
- I need to try these because they are recommended by the Solr project itself
- Also, as always, I need to read Shawn Heisey's wiki page on Solr
-2016-12-05
+2016-12-05
- I did some basic benchmarking on a local DSpace before and after the JVM settings above, but there wasn't anything amazingly obvious
- I want to make the changes on DSpace Test and monitor the JVM heap graphs for a few days to see if they change the JVM GC patterns or anything (munin graphs)
@@ -307,7 +307,7 @@ GC_TUNE="-XX:-UseSuperWord \
- I haven't tested it yet, but I created a pull request: #289
-2016-12-06
+2016-12-06
- Some author authority corrections and name standardizations for Peter:
@@ -360,7 +360,7 @@ java.lang.NullPointerException
real 8m39.913s
user 1m54.190s
sys 0m22.647s
-2016-12-07
+2016-12-07
- For what it's worth, after running the same SQL updates on my local test server, `index-authority` runs and completes just fine
- I will have to test more
@@ -459,7 +459,7 @@ update metadatavalue set text_value = 'Hoek, Rein van der', authority='4d6cbce2-
update metadatavalue set authority='18349f29-61b1-44d7-ac60-89e55546e812', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thorne, P%';
update metadatavalue set authority='0d8369bb-57f7-4b2f-92aa-af820b183aca', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Thornton, P%';
update metadatavalue set text_value='Grace, Delia', authority='bfa61d7c-7583-4175-991c-2e7315000f0c', confidence=600 where resource_type_id=2 and metadata_field_id=3 and text_value like 'Grace, D%';
-2016-12-08
+2016-12-08
- Something weird happened and Peter Thorne's names all ended up as “Thorne”, I guess because the original authority had that as its name value:
@@ -506,7 +506,7 @@ UPDATE 362
- In other news, I think we should really be using more RAM for PostgreSQL's `shared_buffers`
- The PostgreSQL documentation recommends using 25% of the system's RAM on dedicated systems, but we should use a bit less since we also have a massive JVM heap and also benefit from some RAM being used by the OS cache
-2016-12-09
+2016-12-09
- More work on finishing rough draft of KM4Dev article
- Set PostgreSQL's `shared_buffers` on CGSpace to 10% of system RAM (1200MB)
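As a sanity check, that 10% figure can be computed from the machine itself; this is just a sketch assuming a Linux host with `/proc/meminfo`:

```shell
# Print a shared_buffers value equal to 10% of total RAM, in MB.
awk '/^MemTotal/ { printf "shared_buffers = %dMB\n", $2 * 0.10 / 1024 }' /proc/meminfo
```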
@@ -517,7 +517,7 @@ dspace=# update metadatavalue set authority='2df8136e-d8f4-4142-b58c-562337cab76
- The authority IDs were different now than when I was looking a few days ago so I had to adjust them here
-2016-12-11
+2016-12-11
- After enabling a sizable `shared_buffers` for CGSpace's PostgreSQL configuration the number of connections to the database dropped significantly
@@ -553,7 +553,7 @@ UPDATE 35
- Work on article for KM4Dev journal
-2016-12-13
+2016-12-13
- Checking in on CGSpace postgres stats again, looks like the `shared_buffers` change from a few days ago really made a big impact:
@@ -640,7 +640,7 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
- It happens on development and production, so I will have to ask Atmire
- Most likely an issue with installation/configuration
-2016-12-14
+2016-12-14
- Atmire sent a quick fix for the `last-update.txt` file not found error
- After applying pull request #291 on DSpace Test I no longer see the error in the logs after the `UpdateSolrStorageReports` task runs
@@ -648,7 +648,7 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
- Made a pull request with a template for the cron jobs (#75)
- Testing SMTP from the new CGSpace server and it's not working, I'll have to tell James
-2016-12-15
+2016-12-15
- Start planning for server migration this weekend, letting users know
- I am trying to figure out what the process is to update the server's IP in the Handle system, and emailing the hdladmin account bounces(!)
@@ -662,7 +662,7 @@ Caused by: java.lang.NoSuchMethodError: com.atmire.statistics.generator.DSpaceOb
-2016-12-18
+2016-12-18
- Add four new CRP subjects for 2017 and sort the input forms alphabetically (#294)
- Test the SMTP on the new server and it's working
@@ -737,13 +737,13 @@ $ exit
-2016-12-22
+2016-12-22
- Abenet wanted a CSV of the IITA community, but the web export doesn't include the `dc.date.accessioned` field
- I had to export it from the command line using the `-a` flag:
$ [dspace]/bin/dspace metadata-export -a -f /tmp/iita.csv -i 10568/68616
-2016-12-28
+2016-12-28
- We've been getting two alerts per day about CPU usage on the new server from Linode
- These are caused by the batch jobs for Solr etc that run in the early morning hours
diff --git a/docs/2017-01/index.html b/docs/2017-01/index.html
index f40b05fc4..25363d85a 100644
--- a/docs/2017-01/index.html
+++ b/docs/2017-01/index.html
@@ -25,7 +25,7 @@ I checked to see if the Solr sharding task that is supposed to run on January 1s
I tested on DSpace Test as well and it doesn't work there either
I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years
"/>
@@ -106,13 +106,13 @@ I asked on the dspace-tech mailing list because it seems to be broken, and actua
- 2017-01-02
+ 2017-01-02
- I checked to see if the Solr sharding task that is supposed to run on January 1st had run and saw there was an error
- I tested on DSpace Test as well and it doesn't work there either
- I asked on the dspace-tech mailing list because it seems to be broken, and actually now I'm not sure if we've ever had the sharding task run successfully over all these years
-2017-01-04
+2017-01-04
- I tried to shard my local dev instance and it fails the same way:
@@ -183,17 +183,17 @@ Caused by: java.net.SocketException: Broken pipe (Write failed)
- Very interesting… it creates the core and then fails somehow
-2017-01-08
+2017-01-08
- Put Sisay's `item-view.xsl` code to show mapped collections on CGSpace (#295)
-2017-01-09
+2017-01-09
- A user wrote to tell me that the new display of an item's mappings had a crazy bug for at least one item: https://cgspace.cgiar.org/handle/10568/78596
- She said she only mapped it once, but it appears to be mapped 184 times
-2017-01-10
+2017-01-10
- I tried to clean up the duplicate mappings by exporting the item's metadata to CSV, editing, and re-importing, but DSpace said “no changes were detected”
- I've asked on the dspace-tech mailing list to see if anyone can help
@@ -210,7 +210,7 @@ Caused by: java.net.SocketException: Broken pipe (Write failed)
- I will have to ask the DSpace people if this is a valid approach
- Finish looking at the Journal Title corrections of the top 500 Journal Titles so we can make a controlled vocabulary from it
-2017-01-11
+2017-01-11
- Maria found another item with duplicate mappings: https://cgspace.cgiar.org/handle/10568/78658
- Error in `fix-metadata-values.py` when it tries to print the value for Entwicklung & Ländlicher Raum:
@@ -238,11 +238,11 @@ UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15:
- I will have to go through these and fix some more before making the controlled vocabulary
- Added 30 more corrections or so, now there are 49 total and I'll have to get the top 500 after applying them
-2017-01-13
+2017-01-13
-2017-01-16
+2017-01-16
- Fix the two items Maria found with duplicate mappings with this script:
@@ -250,7 +250,7 @@ UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 15:
delete from collection2item where item_id = '80596' and id not in (90792, 90806, 90807);
/* 1 incorrect mapping: https://cgspace.cgiar.org/handle/10568/78658 */
delete from collection2item where id = '91082';
-2017-01-17
+2017-01-17
- Helping clean up some file names in the 232 CIAT records that Sisay worked on last week
- There are about 30 files with `%20` (space) and Spanish accents in the file name
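Renaming those can be scripted; a minimal sketch (the file name below is invented) that replaces %20 sequences with real spaces:

```shell
# Replace %20 sequences in file names with spaces (POSIX sh compatible).
mkdir -p /tmp/ciat-demo && cd /tmp/ciat-demo
touch 'Informe%20final.pdf'   # invented example file
for f in *%20*; do
    mv "$f" "$(printf '%s\n' "$f" | sed 's/%20/ /g')"
done
```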
@@ -276,18 +276,18 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
- Somewhere on the Internet suggested using a DPI of 144
-2017-01-19
+2017-01-19
- In testing a random sample of CIAT's PDFs for compressibility, it looks like all of these methods generally increase the file size so we will just import them as they are
- Import 232 CIAT records into CGSpace:
$ JAVA_OPTS="-Xmx512m -Dfile.encoding=UTF-8" /home/cgspace.cgiar.org/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/68704 --source /home/aorth/CIAT_232/SimpleArchiveFormat/ --mapfile=/tmp/ciat.map &> /tmp/ciat.log
-2017-01-22
+2017-01-22
- Looking at some records that Sisay is having problems importing into DSpace Test (seems to be because of copious whitespace return characters from Excel's CSV exporter)
- There were also some issues with an invalid dc.date.issued field, and I trimmed leading / trailing whitespace and cleaned up some URLs with unneeded parameters like ?show=full
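A rough sketch of that kind of cleanup with sed (the demo file is invented): strip the carriage returns Excel adds and any trailing whitespace:

```shell
# Remove Windows carriage returns and trailing whitespace from a CSV export.
printf 'Some title ,2016-05\r\n' > /tmp/excel-demo.csv   # invented demo data
sed -i -e 's/\r$//' -e 's/[[:space:]]*$//' /tmp/excel-demo.csv
```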
-2017-01-23
+2017-01-23
- I merged Atmire's pull request into the development branch so they can deploy it on DSpace Test
- Move some old ILRI Program communities to a new subcommunity for former programs (10568/79164):
@@ -298,7 +298,7 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
10568/42161 10568/171 10568/79341
10568/41914 10568/171 10568/79340
-2017-01-24
+2017-01-24
- Run all updates on DSpace Test and reboot the server
- Run fixes for Journal titles on CGSpace:
@@ -312,7 +312,7 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
- Then sort them in OpenRefine and create a controlled vocabulary by manually adding the XML markup, pull request (#298)
- This would be the last issue remaining to close the meta issue about switching to controlled vocabularies (#69)
-2017-01-25
+2017-01-25
- Atmire says the `com.atmire.statistics.util.UpdateSolrStorageReports` and `com.atmire.utils.ReportSender` are no longer necessary because they are using a Spring scheduler for these tasks now
- Pull request to remove them from the Ansible templates: https://github.com/ilri/rmg-ansible-public/pull/80
@@ -325,18 +325,18 @@ $ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -
- But now we have a new issue with the “Types” in Content statistics not being respected—we only get the defaults, despite having custom settings in `dspace/config/modules/atmire-cua.cfg`
-2017-01-27
+2017-01-27
- Magdalena pointed out that somehow the Anonymous group had been added to the Administrators group on CGSpace (!)
- Discuss plans to update CCAFS metadata and communities for their new flagships and phase II project identifiers
- The flagships are in `cg.subject.ccafs`, and we probably need to make a new field for the phase II project identifiers
-2017-01-28
+2017-01-28
- Merge controlled vocabulary for journal titles (`dc.source`) into CGSpace (#298)
- Merge new CIAT subject into CGSpace (#296)
-2017-01-29
+2017-01-29
- Run all system updates on DSpace Test, redeploy DSpace code, and reboot the server
- Run all system updates on CGSpace, redeploy DSpace code, and reboot the server
diff --git a/docs/2017-02/index.html b/docs/2017-02/index.html
index 2b45f2c63..345f7270b 100644
--- a/docs/2017-02/index.html
+++ b/docs/2017-02/index.html
@@ -47,7 +47,7 @@ DELETE 1
Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
Looks like we'll be using cg.identifier.ccafsprojectpii as the field name
"/>
@@ -128,7 +128,7 @@ Looks like we'll be using cg.identifier.ccafsprojectpii as the field name
- 2017-02-07
+ 2017-02-07
- An item was mapped twice erroneously again, so I had to remove one of the mappings manually:
@@ -145,7 +145,7 @@ DELETE 1
- Create issue on GitHub to track the addition of CCAFS Phase II project tags (#301)
- Looks like we'll be using `cg.identifier.ccafsprojectpii` as the field name
-2017-02-08
+2017-02-08
- We also need to rename some of the CCAFS Phase I flagships:
@@ -159,7 +159,7 @@ DELETE 1
- Start testing some nearly 500 author corrections that CCAFS sent me:
$ ./fix-metadata-values.py -i /tmp/CCAFS-Authors-Feb-7.csv -f dc.contributor.author -t 'correct name' -m 3 -d dspace -u dspace -p fuuu
-2017-02-09
+2017-02-09
- More work on CCAFS Phase II stuff
- Looks like simply adding a new metadata field to `dspace/config/registries/cgiar-types.xml` and restarting DSpace causes the field to get added to the registry
@@ -168,13 +168,13 @@ DELETE 1
- Testing some corrections on CCAFS Phase II flagships (`cg.subject.ccafs`):
$ ./fix-metadata-values.py -i ccafs-flagships-feb7.csv -f cg.subject.ccafs -t correct -m 210 -d dspace -u dspace -p fuuu
-2017-02-10
+2017-02-10
- CCAFS said they want to wait on the flagship updates (
cg.subject.ccafs
) on CGSpace, perhaps for a month or so
- Help Marianne Gadeberg (WLE) with some user permissions as it seems she had previously been using a personal email account, and is now on a CGIAR one
- I manually added her new account to ~25 authorizations that her old user was on
-2017-02-14
+2017-02-14
- Add `SCALING` to ILRI subjects (#304), as Sisay's attempts were all sloppy
- Cherry pick some patches from the DSpace 5.7 branch:
@@ -187,11 +187,11 @@ DELETE 1
- I still need to test these, especially as the last two which change some stuff with Solr maintenance
-2017-02-15
+2017-02-15
-2017-02-16
+2017-02-16
- Looking at memory info from munin on CGSpace:
@@ -262,7 +262,7 @@ dspace=# update metadatavalue set text_value = 'https://dx.doi.org/10.15446/agro
- Then we could add a cron job for them and run them from the command line like:
[dspace]/bin/dspace curate -t noop -i 10568/79891
-2017-02-20
+2017-02-20
- Run all system updates on DSpace Test and reboot the server
- Run CCAFS author corrections on DSpace Test and CGSpace and force a full discovery reindex
@@ -281,7 +281,7 @@ b'Entwicklung & L\xc3\xa4ndlicher Raum'
- So for now I will remove the encode call from the script (though it was never used on the versions on the Linux hosts), leading me to believe it really was a temporary problem, perhaps due to macOS or the Python build I was using.
-2017-02-21
+2017-02-21
- Testing regenerating PDF thumbnails, like I started in 2016-11
- It seems there is a bug in `filter-media` that causes it to process formats that aren't part of its configuration:
@@ -300,14 +300,14 @@ filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = A
- I've sent a message to the mailing list and might file a Jira issue
- Ask Atmire about the failed interpolation of the `dspace.internalUrl` variable in `atmire-cua.cfg`
-2017-02-22
+2017-02-22
- Atmire said I can add `dspace.internalUrl` to my build properties and the error will go away
- It should be the local URL for accessing Tomcat from the server's own perspective, ie: http://localhost:8080
-2017-02-26
+2017-02-26
-- Find all fields with “http://hdl.handle.net” values (most are in `dc.identifier.uri`, but some are in other URL-related fields like `cg.link.reference`, `cg.identifier.dataurl`, and `cg.identifier.url`):
+- Find all fields with “http://hdl.handle.net" values (most are in `dc.identifier.uri`, but some are in other URL-related fields like `cg.link.reference`, `cg.identifier.dataurl`, and `cg.identifier.url`):
dspace=# select distinct metadata_field_id from metadatavalue where resource_type_id=2 and text_value like 'http://hdl.handle.net%';
dspace=# update metadatavalue set text_value = regexp_replace(text_value, 'http://hdl.handle.net', 'https://hdl.handle.net') where resource_type_id=2 and metadata_field_id IN (25, 113, 179, 219, 220, 223) and text_value like 'http://hdl.handle.net%';
@@ -316,7 +316,7 @@ UPDATE 58633
This works but I'm thinking I'll wait on the replacement as there are perhaps some other places that rely on http://hdl.handle.net (grep the code, it's scary how many things are hard coded)
Send message to dspace-tech mailing list with concerns about this
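That grep is a one-liner; here it is run against a tiny fake source tree so the example is self-contained:

```shell
# List files that hard-code the old handle URL (tree invented for the demo).
mkdir -p /tmp/dspace-src
printf 'String url = "http://hdl.handle.net/10568/1";\n' > /tmp/dspace-src/Foo.java
grep -rl 'http://hdl.handle.net' /tmp/dspace-src
```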
-2017-02-27
+2017-02-27
- LDAP users cannot log in today, looks to be an issue with CGIAR's LDAP server:
@@ -379,7 +379,7 @@ Certificate chain
Redeploy CGSpace and DSpace Test on the latest `5_x-prod` branch with fixes for LDAP bind user
Run all system updates on CGSpace server and reboot
-2017-02-28
+2017-02-28
- After running the CIAT corrections and updating the Discovery and authority indexes, there is still no change in the number of items listed for CIAT in Discovery
- Ah, this is probably because some items have the International Center for Tropical Agriculture author twice, which I first noticed in 2016-12 but couldn't figure out how to fix
diff --git a/docs/2017-03/index.html b/docs/2017-03/index.html
index fa9136eea..7f67aa2b9 100644
--- a/docs/2017-03/index.html
+++ b/docs/2017-03/index.html
@@ -51,7 +51,7 @@ Interestingly, it seems DSpace 4.x's thumbnails were sRGB, but forcing regen
$ identify ~/Desktop/alc_contrastes_desafios.jpg
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
"/>
@@ -132,11 +132,11 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
- 2017-03-01
+ 2017-03-01
- Run the 279 CIAT author corrections on CGSpace
-2017-03-02
+2017-03-02
- Skype with Michael and Peter, discussing moving the CGIAR Library to CGSpace
- CGIAR people possibly open to moving content, redirecting library.cgiar.org to CGSpace and letting CGSpace resolve their handles
@@ -158,7 +158,7 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
- I filed an issue for the color space thing: DS-3517
-2017-03-03
+2017-03-03
- I created a patch for DS-3517 and made a pull request against upstream `dspace-5_x`: https://github.com/DSpace/DSpace/pull/1669
- Looks like `-colorspace sRGB` alone isn't enough, we need to use profiles:
@@ -176,13 +176,13 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
DirectClass CMYK
$ identify -format '%r\n' Africa\ group\ of\ negotiators.pdf\[0\]
DirectClass sRGB Alpha
-2017-03-04
+2017-03-04
- Spent more time looking at the ImageMagick CMYK issue
- The `default_cmyk.icc` and `default_rgb.icc` files are both part of the Ghostscript GPL distribution, but according to DSpace's `LICENSES_THIRD_PARTY` file, DSpace doesn't allow distribution of dependencies that are licensed solely under the GPL
- So this issue is kinda pointless now, as the ICC profiles are absolutely necessary to make a meaningful CMYK→sRGB conversion
-2017-03-05
+2017-03-05
- Look into helping developers from landportal.info with a query for items related to LAND on the REST API
- They want something like the items that are returned by the general “LAND” query in the search interface, but we cannot do that
@@ -223,11 +223,11 @@ DirectClass sRGB Alpha
- Submit pull request to set the author separator for XMLUI item lists to a semicolon instead of “,": https://github.com/ilri/DSpace/pull/306
- I want to show it briefly to Abenet and Peter to get feedback
-2017-03-06
+2017-03-06
- Someone on the mailing list said that `handle.plugin.checknameauthority` should be false if we're using multiple handle prefixes
-2017-03-07
+2017-03-07
- I set up a top-level community as a test for the CGIAR Library and imported one item with the 10947 handle prefix
- When testing the Handle resolver locally it shows the item to be on the local repository
@@ -243,18 +243,18 @@ DirectClass sRGB Alpha
- Another thing is that the import process creates new `dc.date.accessioned` and `dc.date.available` fields, so we end up with duplicates (is it important to preserve the originals for these?)
- Report DS-3520 issue to Atmire
-2017-03-08
+2017-03-08
- Merge the author separator changes to `5_x-prod`, as everyone has responded positively about it, and it's the default in Mirage2 after all!
- Cherry pick the `commons-collections` patch from DSpace's `dspace-5_x` branch to address DS-3520: https://jira.duraspace.org/browse/DS-3520
-2017-03-09
+2017-03-09
- Export list of sponsors so Peter can clean it up:
dspace=# \copy (select text_value, count(*) from metadatavalue where resource_type_id=2 and metadata_field_id IN (select metadata_field_id from metadatafieldregistry where element = 'description' and qualifier = 'sponsorship') group by text_value order by count desc) to /tmp/sponsorship.csv with csv;
COPY 285
-2017-03-12
+2017-03-12
- Test the sponsorship fixes and deletes from Peter:
@@ -271,7 +271,7 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
Created a basic theme for the Livestock CRP community
-2017-03-15
+2017-03-15
- Merge pull request for controlled vocabulary updates for sponsor: https://github.com/ilri/DSpace/pull/308
- Merge pull request for Livestock CRP theme: https://github.com/ilri/DSpace/issues/309
@@ -280,7 +280,7 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
- I also need to ask if either of these new fields need to be added to Discovery facets, search, and Atmire modules
- Run all system updates on DSpace Test and re-deploy CGSpace
-2017-03-16
+2017-03-16
- Merge pull request for PABRA subjects: https://github.com/ilri/DSpace/pull/310
- Abenet and Peter say we can add them to Discovery, Atmire modules, etc, but I might not have time to do it now
@@ -291,15 +291,15 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
- Deploy latest changes and investor fixes/deletions on CGSpace
- Run system updates on CGSpace and reboot server
-2017-03-20
+2017-03-20
-2017-03-24
+2017-03-24
- Still helping Sisay try to figure out how to create a theme for the RTB community
-2017-03-28
+2017-03-28
- CCAFS said they are ready for the flagship updates for Phase II to be run (`cg.subject.ccafs`), so I ran them on CGSpace:
@@ -313,7 +313,7 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
I sent a list to CCAFS people so they can tell me if some should be deleted or moved, etc
Test, squash, and merge Sisay's RTB theme into `5_x-prod`: https://github.com/ilri/DSpace/pull/316
-2017-03-29
+2017-03-29
- Dump a list of fields in the DC and CG schemas to compare with CG Core:
@@ -322,7 +322,7 @@ $ ./delete-metadata-values.py -i Investors-Delete-121.csv -f dc.description.spon
Ooh, a better one!
dspace=# select coalesce(case when metadata_schema_id=1 then 'dc.' else 'cg.' end) || concat_ws('.', element, qualifier) as field, scope_note from metadatafieldregistry where metadata_schema_id in (1, 2);
-2017-03-30
+2017-03-30
- Adjust the Linode CPU usage alerts for the CGSpace server from 150% to 200%, as generally the nightly Solr indexing causes a usage around 150–190%, so this should make the alerts less regular
- Adjust the threshold for DSpace Test from 90 to 100%
diff --git a/docs/2017-04/index.html b/docs/2017-04/index.html
index 4b493e18b..3d78c79bc 100644
--- a/docs/2017-04/index.html
+++ b/docs/2017-04/index.html
@@ -37,7 +37,7 @@ Testing the CMYK patch on a collection with 650 items:
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
"/>
@@ -118,7 +118,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
- 2017-04-02
+ 2017-04-02
- Merge one change to CCAFS flagships that I had forgotten to remove last month (“MANAGING CLIMATE RISK”): https://github.com/ilri/DSpace/pull/317
- Quick proof-of-concept hack to add `dc.rights` to the input form, including some inline instructions/hints:
@@ -129,7 +129,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
- Testing the CMYK patch on a collection with 650 items:
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
-2017-04-03
+2017-04-03
- Continue testing the CMYK patch on more communities:
@@ -150,7 +150,7 @@ $ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Th
Also, I'm noticing some weird outliers in `cg.coverage.region`, need to remember to go correct these later:
dspace=# select text_value from metadatavalue where resource_type_id=2 and metadata_field_id=227;
-2017-04-04
+2017-04-04
- The `filter-media` script has been running on more large communities and now there are many more CMYK PDFs that have been fixed:
@@ -177,13 +177,13 @@ ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0
In that case it might just be better to see how many the user submitted (both with and without bitstreams):
dspace=# select * from metadatavalue where resource_type_id=2 and metadata_field_id=28 and text_value ~ '^Submitted.*giampieri.*2016-.*';
-2017-04-05
+2017-04-05
- After doing a few more large communities it seems this is the final count of CMYK PDFs:
$ grep -c profile /tmp/filter-media-cmyk.txt
2505
-2017-04-06
+2017-04-06
- After reading the notes for DCAT April 2017 I am testing some new settings for PostgreSQL on DSpace Test:
@@ -198,7 +198,7 @@ ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0
- Sisay added their OAI as a source to a new collection, but using the Simple Dublin Core method, so many fields are unqualified and duplicated
- Looking at the documentation it seems that we probably want to be using DSpace Intermediate Metadata
-2017-04-10
+2017-04-10
- Adjust Linode CPU usage alerts on DSpace servers
@@ -216,12 +216,12 @@ ILAC_Brief21_PMCA.pdf: 113462 bytes, checksum: 249fef468f401c066a119f5db687add0
- I added `cg.subject.cifor` to the metadata registry and I'm waiting for the harvester to re-harvest to see if it picks up more data now
- Another possibility is that we could use a crosswalk… but I've never done it.
-2017-04-11
+2017-04-11
- Looking at the item from CIFOR it hasn't been updated yet, maybe they aren't running the cron job
- I emailed Usman from CIFOR to ask if he's running the cron job
-2017-04-12
+2017-04-12
- CIFOR says they have cleaned their OAI cache and that the cron job for OAI import is enabled
- Now I see updated fields, like `dc.date.issued` but none from the CG or CIFOR namespaces
@@ -281,7 +281,7 @@ sys 1m29.310s
- Perhaps I need to file a bug for this, or at least ask on the DSpace Test mailing list?
- I wonder if we could use a crosswalk to convert to a format that CG Core wants, like `<date Type="Available">`
-2017-04-13
+2017-04-13
- Checking the CIFOR item on DSpace Test, it still doesn't have the new metadata
- The collection status shows this message from the harvester:
@@ -297,7 +297,7 @@ sys 1m29.310s
- It seems like they have done a full metadata migration with `dc.date.issued` and `cg.coverage.country` etc
- Submit pull request to upstream DSpace for the PDF thumbnail bug (DS-3516): https://github.com/DSpace/DSpace/pull/1709
-2017-04-14
+2017-04-14
- DSpace committers reviewed my patch for DS-3516 and proposed a simpler idea involving incorrect use of `SelfRegisteredInputFormats`
- I tested the idea and it works, so I made a new patch: https://github.com/DSpace/DSpace/pull/1709
@@ -311,7 +311,7 @@ sys 1m29.310s
- Reboot DSpace Test server to get new Linode kernel
-2017-04-17
+2017-04-17
Error: ERROR: update or delete on table "bitstream" violates foreign key constraint "bundle_primary_bitstream_id_fkey" on table "bundle"
Detail: Key (bitstream_id)=(435) is still referenced from table "bundle".
-2017-04-18
+2017-04-18
- Helping Tsega test his new CGSpace REST API Rails app on DSpace Test
- Setup and run with:
@@ -340,7 +340,7 @@ $ rails -s
- This is interesting for creating runnable commands from `bundle`:
$ bundle binstubs puma --path ./sbin
-2017-04-19
+2017-04-19
-2017-04-20
+2017-04-20
- Atmire responded about the Workflow Statistics, saying that it had been disabled because many environments needed customization to be useful
- I re-enabled it with a hidden config key `workflow.stats.enabled = true` on DSpace Test and will evaluate adding it on CGSpace
@@ -403,14 +403,14 @@ $ wc -l /tmp/ciat
$ export JAVA_OPTS="-Dfile.encoding=UTF-8 -Xmx1024m"
$ time schedtool -D -e ionice -c2 -n7 nice -n19 [dspace]/bin/dspace filter-media -f -v -i 10568/71249 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
-2017-04-22
+2017-04-22
- Someone on the dspace-tech mailing list responded with a suggestion about the foreign key violation in the `cleanup` task
- The solution is to remove the ID (ie set to NULL) from the `primary_bitstream_id` column in the `bundle` table
- After doing that and running the `cleanup` task again I find more bitstreams that are affected and end up with a long list of IDs that need to be fixed:
dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (435, 1136, 1132, 1220, 1236, 3002, 3255, 5322);
-2017-04-24
+2017-04-24
- Two users mentioned some items they recently approved not showing up in the search / XMLUI
- I looked at the logs from yesterday and it seems the Discovery indexing has been crashing:
@@ -476,7 +476,7 @@ org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: this Index
- Now running the cleanup script on DSpace Test and already seeing 11GB freed from the assetstore—it's likely we haven't had a cleanup task complete successfully in years…
-2017-04-25
+2017-04-25
- Finally finished running the PDF thumbnail re-processing on CGSpace, the final count of CMYK PDFs is about 2751
- Preparing to run the cleanup task on CGSpace, I want to see how many files are in the assetstore:
@@ -544,7 +544,7 @@ Caused by: java.lang.ClassNotFoundException: org.dspace.statistics.content.DSpac
- So that is 30,000 files, and about 7GB
- Add logging to the cleanup cron task
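Something along these lines (the path, schedule, and log location are assumptions, not the actual crontab):

```
# Hypothetical crontab entry that captures the cleanup task's output
0 4 * * * [dspace]/bin/dspace cleanup -v >> /var/log/dspace/cleanup.log 2>&1
```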
-2017-04-26
+2017-04-26
- The size of the CGSpace database dump went from 111MB to 96MB, not sure about actual database size though
- Update RVM's Ruby from 2.3.0 to 2.4.0 on DSpace Test:
diff --git a/docs/2017-05/index.html b/docs/2017-05/index.html
index fc9021cb0..e1c038c67 100644
--- a/docs/2017-05/index.html
+++ b/docs/2017-05/index.html
@@ -15,7 +15,7 @@
@@ -96,7 +96,7 @@
- 2017-05-01
+ 2017-05-01
- ICARDA apparently started working on CG Core on their MEL repository
- They have done a few `cg.*` fields, but not very consistently, and they even copy some CGSpace items:
@@ -106,11 +106,11 @@
-2017-05-02
+2017-05-02
- Atmire got back about the Workflow Statistics issue, and apparently it's a bug in the CUA module so they will send us a pull request
-2017-05-04
+2017-05-04
- Sync DSpace Test with database and assetstore from CGSpace
- Re-deploy DSpace Test with Atmire's CUA patch for workflow statistics, run system updates, and restart the server
@@ -118,7 +118,7 @@
- Megan says some mapped items are still not appearing since last week, so I forced a full `index-discovery -b`
- Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSpace Test) tomorrow: https://cgspace.cgiar.org/handle/10568/80731
-2017-05-05
+2017-05-05
- Discovered that CGSpace has ~700 items that are missing the `cg.identifier.status` field
- Need to perhaps try using the “required metadata” curation task to find items missing these fields:
@@ -127,13 +127,13 @@
- It seems the curation task dies when it finds an item which has missing metadata
-2017-05-06
+2017-05-06
-2017-05-07
+2017-05-07
- Testing one replacement for CCAFS Flagships (`cg.subject.ccafs`), first changed in the submission forms, and then in the database:
@@ -142,7 +142,7 @@
- Also, CCAFS wants to re-order their flagships to prioritize the Phase II ones
- Waiting for feedback from CCAFS, then I can merge #320
-2017-05-08
+2017-05-08
-2017-05-09
+2017-05-09
- The CGIAR Library metadata has some blank metadata values, which leads to `|||` in the Discovery facets
- Clean these up in the database using:
@@ -188,7 +188,7 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
- I think those errors actually come from me running the `update-sequences.sql` script while Tomcat/DSpace are running
- Apparently you need to stop Tomcat!
-2017-05-10
+2017-05-10
- Atmire says they are willing to extend the ORCID implementation, and I've asked them to provide a quote
- I clarified that the scope of the implementation should be that ORCIDs are stored in the database and exposed via REST / API like other fields
@@ -208,13 +208,13 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
- After this I ran the `update-sequences.sql` script (with Tomcat shut down), and cleaned up the 200+ blank metadata records:
dspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
-2017-05-13
+2017-05-13
- After quite a bit of troubleshooting with importing cleaned up data as CSV, it seems that there are actually NUL characters in the
dc.description.abstract
field (at least) on the lines where CSV importing was failing
- I tried to find a way to remove the characters in vim or Open Refine, but decided it was quicker to just remove the column temporarily and import it
- The import was successful and detected 2022 changes, which should likely be the rest that were failing to import before
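Stripping the NUL characters on the command line would likely have worked too; a minimal sketch with `tr` (the file names and sample CSV are invented for illustration, not the actual data):

```shell
# Sample CSV with a NUL byte embedded in the abstract column (invented data)
printf 'id,abstract\n1,some\000text\n' > /tmp/abstract-with-nul.csv

# tr -d '\000' deletes every NUL byte and leaves everything else untouched
tr -d '\000' < /tmp/abstract-with-nul.csv > /tmp/abstract-clean.csv

# Count remaining NUL bytes by keeping only NULs (-dc) and measuring length
tr -dc '\000' < /tmp/abstract-clean.csv | wc -c   # prints 0
```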
-2017-05-15
+2017-05-15
- To delete the blank lines that cause issues during import we need to use a regex in vim
g/^$/d
- After that I started looking in the
dc.subject
field to try to pull countries and regions out, but there are too many values in there
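The vim command above has a direct sed equivalent for scripted cleanup; a small runnable sketch (the sample file is invented):

```shell
# Sample file with empty lines sprinkled in (invented data)
printf 'line one\n\nline two\n\n\n' > /tmp/with-blanks.txt

# /^$/d deletes every empty line, the same as :g/^$/d in vim
sed '/^$/d' /tmp/with-blanks.txt > /tmp/no-blanks.txt

cat /tmp/no-blanks.txt   # prints only the two non-empty lines
```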
@@ -241,12 +241,12 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
Fix cron jobs for log management on DSpace Test, as they weren't catching dspace.log.*
files correctly and we had over six months of them and they were taking up many gigs of disk space
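One typical way to catch the rotated `dspace.log.*` files from cron is a find-by-age expression; a runnable sketch against a throwaway directory (the paths and the 30-day cutoff are assumptions, not the actual cron job; `touch -d` is GNU coreutils):

```shell
# Throwaway directory standing in for [dspace]/log
logdir=/tmp/dspace-log-demo
rm -rf "$logdir" && mkdir -p "$logdir"
touch "$logdir/dspace.log" "$logdir/dspace.log.2017-05-01" "$logdir/dspace.log.2017-05-02"

# Age the rotated copies so -mtime +30 will match them (GNU touch -d)
touch -d '40 days ago' "$logdir/dspace.log.2017-05-01" "$logdir/dspace.log.2017-05-02"

# Delete rotated logs older than 30 days; the live dspace.log is untouched
find "$logdir" -name 'dspace.log.*' -mtime +30 -delete

ls "$logdir"   # prints only dspace.log
```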
-2017-05-16
+2017-05-16
- Discuss updates to WLE themes for their Phase II
- Make an issue to track the changes to
cg.subject.wle
: #322
-2017-05-17
+2017-05-17
- Looking into the error I get when trying to create a new collection on DSpace Test:
@@ -275,13 +275,13 @@ $ for item in /home/aorth/10947-1/ITEM@10947-*; do [dspace]/bin/dspace packager
- After that I can create collections just fine, though I'm not sure if it has other side effects
-2017-05-21
+2017-05-21
- Start creating a basic theme for the CGIAR System Organization's community on CGSpace
- Using colors from the CGIAR Branding guidelines (2014)
- Make a GitHub issue to track this work: #324
-2017-05-22
+2017-05-22
- Do some cleanups of community and collection names in CGIAR System Management Office community on DSpace Test, as well as move some items as Peter requested
- Peter wanted a list of authors in here, so I generated a list of collections using the “View Source” on each community and this hacky awk:
@@ -311,7 +311,7 @@ from metadatavalue
where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author')
AND resource_type_id = 2
AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10947/2', '10947/3', '10947/10', '10947/4', '10947/5', '10947/6', '10947/7', '10947/8', '10947/9', '10947/11', '10947/25', '10947/12', '10947/26', '10947/27', '10947/28', '10947/29', '10947/30', '10947/13', '10947/14', '10947/15', '10947/16', '10947/31', '10947/32', '10947/33', '10947/34', '10947/35', '10947/36', '10947/37', '10947/17', '10947/18', '10947/38', '10947/19', '10947/39', '10947/40', '10947/41', '10947/42', '10947/43', '10947/2512', '10947/44', '10947/20', '10947/21', '10947/45', '10947/46', '10947/47', '10947/48', '10947/49', '10947/22', '10947/23', '10947/24', '10947/50', '10947/51', '10947/2518', '10947/2776', '10947/2790', '10947/2521', '10947/2522', '10947/2782', '10947/2525', '10947/2836', '10947/2524', '10947/2878', '10947/2520', '10947/2523', '10947/2786', '10947/2631', '10947/2589', '10947/2519', '10947/2708', '10947/2526', '10947/2871', '10947/2527', '10947/4467', '10947/3457', '10947/2528', '10947/2529', '10947/2533', '10947/2530', '10947/2531', '10947/2532', '10947/2538', '10947/2534', '10947/2540', '10947/2900', '10947/2539', '10947/2784', '10947/2536', '10947/2805', '10947/2541', '10947/2535', '10947/2537', '10568/93761'))) group by text_value order by count desc) to /tmp/cgiar-librar-authors.csv with csv;
-2017-05-23
+2017-05-23
- Add Affiliation to filters on Listing and Reports module (#325)
- Start looking at WLE's Phase II metadata updates but it seems they are not tagging their items properly, as their website importer infers which theme to use based on the name of the CGSpace collection!
@@ -323,12 +323,12 @@ COPY 111
- Respond to Atmire message about ORCIDs, saying that right now we'd prefer to just have them available via REST API like any other metadata field, and that I'm available for a Skype
-2017-05-26
+2017-05-26
- Increase max file size in nginx so that CIP can upload some larger PDFs
- Agree to talk with Atmire after the June DSpace developers meeting where they will be discussing exposing ORCIDs via REST/OAI
-2017-05-28
+2017-05-28
- File an issue on GitHub to explore/track migration to proper country/region codes (ISO 2/3 and UN M.49): #326
- Ask Peter how the Landportal.info people should acknowledge us as the source of data on their website
@@ -354,7 +354,7 @@ UPDATE 187
- Run the corrections on CGSpace and then update discovery / authority
- I notice that there are a handful of
java.lang.OutOfMemoryError: Java heap space
errors in the Catalina logs on CGSpace, I should go look into that…
-2017-05-29
+2017-05-29
- Discuss WLE themes and subjects with Mia and Macaroni Bros
- We decided we need to create metadata fields for Phase I and II themes
diff --git a/docs/2017-06/index.html b/docs/2017-06/index.html
index 0e46533c7..8f9847692 100644
--- a/docs/2017-06/index.html
+++ b/docs/2017-06/index.html
@@ -96,7 +96,7 @@
- 2017-06-01
+ 2017-06-01
- After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes
- The
cg.identifier.wletheme
field will be used for both Phase I and Phase II Research Themes
@@ -106,7 +106,7 @@
- Create pull request to add Phase II research themes to the submission form: #328
- Add
cg.subject.system
to CGSpace metadata registry, for subject from the upcoming CGIAR Library migration
-2017-06-04
+2017-06-04
- After adding
cg.identifier.wletheme
to 1106 WLE items I can see the field on XMLUI but not in REST!
- Strangely it happens on DSpace Test AND on CGSpace!
@@ -115,7 +115,7 @@
- After rebooting the server (and therefore restarting Tomcat) the new metadata field is available
- I've sent a message to the dspace-tech mailing list to ask if this is a bug and whether I should file a Jira ticket
-2017-06-05
+2017-06-05
- Rename WLE's “Research Themes” sub-community to “WLE Phase I Research Themes” on DSpace Test so Macaroni Bros can continue their testing
- Macaroni Bros tested it and said it's fine, so I renamed it on CGSpace as well
@@ -151,7 +151,7 @@
- Total items in CIAT Book Chapters is 914, with the others being flagged for some reason, and we should send that back to CIAT
- Restart Tomcat on CGSpace so that the
cg.identifier.wletheme
field is available on REST API for Macaroni Bros
-2017-06-07
+2017-06-07
- Testing Atmire's patch for the CUA Workflow Statistics again
- Still doesn't seem to give results I'd expect, like there are no results for Maria Garruccio, or for the ILRI community!
@@ -186,12 +186,12 @@
-2017-06-18
+2017-06-18
- Redeploy CGSpace with latest changes from
5_x-prod
, run system updates, and reboot the server
- Continue working on ansible infrastructure changes for CGIAR Library
-2017-06-20
+2017-06-20
- Import Abenet and Peter's changes to the CGIAR Library CRP community
- Due to them using Windows and renaming some columns there were formatting, encoding, and duplicate metadata value issues
@@ -207,7 +207,7 @@
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books.map &> /tmp/ciat-books.log
$ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace import --add --eperson=aorth@mjanja.ch --collection=10568/35701 --source /home/aorth/CIAT-Books/SimpleArchiveFormat/ --mapfile=/tmp/ciat-books2.map &> /tmp/ciat-books2.log
-2017-06-25
+2017-06-25
- WLE has said that one of their Phase II research themes is being renamed from
Regenerating Degraded Landscapes
to Restoring Degraded Landscapes
- Pull request with the changes to
input-forms.xml
: #329
@@ -221,7 +221,7 @@ $ JAVA_OPTS="-Xmx1024m -Dfile.encoding=UTF-8" [dspace]/bin/dspace impo
- Marianne from WLE asked if they can have both Phase I and II research themes together in the item submission form
- Perhaps we can add them together in the same question for
cg.identifier.wletheme
-2017-06-30
+2017-06-30
- CGSpace went down briefly, I see lots of these errors in the dspace logs:
diff --git a/docs/2017-07/index.html b/docs/2017-07/index.html
index deb2a29ed..23ef08269 100644
--- a/docs/2017-07/index.html
+++ b/docs/2017-07/index.html
@@ -33,7 +33,7 @@ Merge changes for WLE Phase II theme rename (#329)
Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace
We can use PostgreSQL's extended output format (-x) plus sed to format the output into quasi XML:
"/>
@@ -114,11 +114,11 @@ We can use PostgreSQL's extended output format (-x) plus sed to format the o
- 2017-07-01
+ 2017-07-01
- Run system updates and reboot DSpace Test
-2017-07-04
+2017-07-04
- Merge changes for WLE Phase II theme rename (#329)
- Looking at extracting the metadata registries from ICARDA's MEL DSpace database so we can compare fields with CGSpace
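For reference, psql's expanded mode (`-x`) prints one `field | value` pair per line, and sed can wrap those into quasi-XML tags. A runnable sketch of just the sed half, fed a fake record (the field names are illustrative and this is not necessarily the exact sed expression used):

```shell
# Fake expanded-mode (-x) psql output: a record separator plus field | value pairs
printf -- '-[ RECORD 1 ]-+------------\nelement       | contributor\nqualifier     | author\n' > /tmp/psql-x.txt

# Wrap each pair in quasi-XML tags; the separator line never matches and is dropped
sed -n 's/^\([a-z_]*\) *| \(.*\)$/<\1>\2<\/\1>/p' /tmp/psql-x.txt
```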
@@ -138,7 +138,7 @@ We can use PostgreSQL's extended output format (-x) plus sed to format the o
- And fuck, then anyone consuming our data via REST / OAI will not notice that we have an author outside of
dc.contributor.authors
… ugh
- What if we modify the item submission form to use
type-bind
fields to show/hide certain fields depending on the type?
-2017-07-05
+2017-07-05
- Adjust WLE Research Theme to include both Phase I and II on the submission form according to editor feedback (#330)
- Generate list of fields in the current CGSpace
cg
scheme so we can record them properly in the metadata registry:
@@ -159,26 +159,26 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
- Seems to come from
dspace-api/src/main/java/org/dspace/statistics/SolrLogger.java
-2017-07-06
+2017-07-06
- Sisay tried to help by making a pull request for the RTB flagships but there are formatting errors, unrelated changes, and the flagship names are not in the style I requested
- Abenet talked to CIP and they said they are actually ok with using collection names rather than adding a new metadata field
-2017-07-13
+2017-07-13
- Remove
UKaid
from the controlled vocabulary for dc.description.sponsorship
, as Department for International Development, United Kingdom
is the correct form and it is already present (#334)
-2017-07-14
+2017-07-14
- Sisay sent me a patch to add “Photo Report” to
dc.type
so I've added it to the 5_x-prod
branch
-2017-07-17
+2017-07-17
- Linode shut down our seventeen (17) VMs due to nonpayment of the July 1st invoice
- It took me a few hours to find the ICT/Finance contacts to pay the bill and boot all the servers back up
- Since the server was down anyways, I decided to run all system updates and re-deploy CGSpace so that the latest changes to
input-forms.xml
and the sponsors controlled vocabulary would go live
-2017-07-20
+2017-07-20
- Skype chat with Addis team about the status of the CGIAR Library migration
- Need to add the CGIAR System Organization subjects to Discovery Facets (test first)
@@ -199,7 +199,7 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
-2017-07-24
+2017-07-24
- Move two top-level communities to be sub-communities of ILRI Projects
@@ -207,7 +207,7 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
- Discuss CGIAR Library data cleanup with Sisay and Abenet
-2017-07-27
+2017-07-27
- Help Sisay with some transforms to add descriptions to the
filename
column of some CIAT Presentations he's working on in OpenRefine
- Marianne emailed a few days ago to ask why “Integrating Ecosystem Solutions” was not in the list of WLE Phase I Research Themes on the input form
@@ -215,21 +215,21 @@ org.postgresql.util.PSQLException: FATAL: remaining connection slots are reserve
- Then Mia from WLE also emailed to ask where some WLE focal regions went, and I said I didn't understand what she was talking about, as all we did in our previous work was rename the old “Research Themes” subcommunity to “WLE Phase I Research Themes” and add a new subcommunity for “WLE Phase II Research Themes”.
- Discuss some modifications to the CCAFS project tags in CGSpace submission form and in the database
-2017-07-28
+2017-07-28
- Discuss updates to the Phase II CCAFS project tags with Andrea from Macaroni Bros
- I will do the renaming and untagging of items in CGSpace database, and he will update his webservice with the latest project tags and I will get the XML from here for our
input-forms.xml
: https://ccafs.cgiar.org/export/ccafsproject
-2017-07-29
+2017-07-29
- Move some WLE items into appropriate Phase I Research Themes communities and delete some empty collections in WLE Regions community
-2017-07-30
+2017-07-30
- Start working on CCAFS project tag cleanup
- More questions about inconsistencies and spelling mistakes in their tags, so I've sent some questions for followup
-2017-07-31
+2017-07-31
- Looks like the final list of metadata corrections for CCAFS project tags will be:
diff --git a/docs/2017-08/index.html b/docs/2017-08/index.html
index c8afec1d8..48a255c95 100644
--- a/docs/2017-08/index.html
+++ b/docs/2017-08/index.html
@@ -57,7 +57,7 @@ This was due to newline characters in the dc.description.abstract column, which
I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
"/>
@@ -138,7 +138,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
- 2017-08-01
+ 2017-08-01
- Linode sent an alert that CGSpace (linode18) was using 350% CPU for the past two hours
- I looked in the Activity pane of the Admin Control Panel and it seems that Google, Baidu, Yahoo, and Bing are all crawling with massive numbers of bots concurrently (~100 total, mostly Baidu and Google)
@@ -160,7 +160,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
- I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using
g/^$/d
- Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
-2017-08-02
+2017-08-02
- Magdalena from CCAFS asked if there was a way to get the top ten items published in 2016 (note: not the top items in 2016!)
- I think Atmire's Content and Usage Analysis module should be able to do this but I will have to look at the configuration and maybe email Atmire if I can't figure it out
@@ -168,7 +168,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
- Atmire responded about the missing workflow statistics issue a few weeks ago but I didn't see it for some reason
- They said they added a publication and saw the workflow stat for the user, so I should try again and let them know
-2017-08-05
+2017-08-05
- Usman from CIFOR emailed to ask about the status of our OAI tests for harvesting their DSpace repository
- I told him that the OAI appears to not be harvesting properly after the first sync, and that the control panel shows an “Internal error” for that collection:
@@ -178,18 +178,18 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
- I don't see anything related in our logs, so I asked him to check for our server's IP in their logs
- Also, in the mean time I stopped the harvesting process, reset the status, and restarted the process via the Admin control panel (note: I didn't reset the collection, just the harvester status!)
-2017-08-07
+2017-08-07
- Apply Abenet's corrections for the CGIAR Library's Consortium subcommunity (697 records)
- I had to fix a few small things, like moving the
dc.title
column away from the beginning of the row, deleting blank lines in the abstract in vim using :g/^$/d
, and adding the dc.subject[en_US]
column back, as she had deleted it and DSpace didn't detect the changes made there (we needed to blank the values instead)
-2017-08-08
+2017-08-08
- Apply Abenet's corrections for the CGIAR Library's historic archive subcommunity (2415 records)
- I had to add the
dc.subject[en_US]
column back with blank values so that DSpace could detect the changes
- I applied the changes in 500 item batches
-2017-08-09
+2017-08-09
- Run system updates on DSpace Test and reboot server
- Help ICARDA upgrade their MELSpace to DSpace 5.7 using the docker-dspace container
@@ -199,7 +199,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
-2017-08-10
+2017-08-10
- Apply last updates to the CGIAR Library's Fund community (812 items)
- Had to do some quality checks and column renames before importing, as either Sisay or Abenet renamed a few columns and the metadata importer wanted to remove/add new metadata for title, abstract, etc.
@@ -220,7 +220,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
- Follow up with Atmire on the ticket about ORCID metadata in DSpace
- Follow up with Lili and Andrea about the pending CCAFS metadata and flagship updates
-2017-08-11
+2017-08-11
- CGSpace had load issues and was throwing errors related to PostgreSQL
- I told Tsega to reduce the max connections from 70 to 40 because actually each web application gets that limit and so for xmlui, oai, jspui, rest, etc it could be 70 x 4 = 280 connections depending on the load, and the PostgreSQL config itself is only 100!
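Each DSpace web application (xmlui, oai, rest, etc.) maintains its own connection pool, which is why the limits multiply; counting connections per application from `pg_stat_activity` output confirms the math. A sketch of the grep pipeline against invented sample rows (the data is fake, the real input would come from `psql -c 'SELECT * from pg_stat_activity;'`):

```shell
# Invented pg_stat_activity rows: application name and state, one per connection
printf 'xmlui idle\nxmlui idle\noai idle\nrest active\n' > /tmp/pg-activity.txt

# Count idle connections for one webapp, mirroring the psql | grep pipeline
grep idle /tmp/pg-activity.txt | grep -c xmlui   # prints 2
```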
@@ -229,7 +229,7 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
- Also, I need to find out where the load is coming from (rest?) and possibly block bots from accessing dynamic pages like Browse and Discover instead of just sending an X-Robots-Tag HTTP header
- I noticed that Google has bitstreams from the
rest
interface in the search index. I need to ask on the dspace-tech mailing list to see what other people are doing about this, and maybe start issuing an X-Robots-Tag: none
there!
-2017-08-12
+2017-08-12
- I sent a message to the mailing list about the duplicate content issue with
/rest
and /bitstream
URLs
- Looking at the logs for the REST API on
/rest
, it looks like there is someone hammering doing testing or something on it…
@@ -249,12 +249,12 @@ Then I cleaned up the author authorities and HTML characters in OpenRefine and s
access_log /var/log/nginx/oai.log;
proxy_pass http://tomcat_http;
}
-2017-08-13
+2017-08-13
- Macaroni Bros say that CCAFS wants them to check once every hour for changes
- I told them to check every four or six hours
-2017-08-14
+2017-08-14
- Run author corrections on CGIAR Library community from Peter
@@ -300,7 +300,7 @@ $ grep -rsI SQLException dspace-xmlui | wc -l
- Apply 223 more author corrections from Peter on CGIAR Library
- Help Magdalena from CCAFS with some CUA statistics questions
-2017-08-15
+2017-08-15
- Increase the nginx upload limit on CGSpace (linode18) so Sisay can upload 23 CIAT reports
- Do some last minute cleanups and de-duplications of the CGIAR Library data, as I need to send it to Peter this week
@@ -308,7 +308,7 @@ $ grep -rsI SQLException dspace-xmlui | wc -l
- Also, a few dozen
dc.description.abstract
fields still had various HTML tags and entities in them
- Also, a bunch of
dc.subject
fields that were not AGROVOC had not been moved properly to cg.system.subject
-2017-08-16
+2017-08-16
- I wanted to merge the various field variations like
cg.subject.system
and cg.subject.system[en_US]
in OpenRefine but I realized it would be easier in PostgreSQL:
@@ -351,7 +351,7 @@ UPDATE 4899
I think we could use harvest.includerestricted.rss = false
but the items might need to be 100% restricted, not just the metadata
Adjust Ansible postgres role to use max_connections
from a template variable and deploy a new limit of 123 on CGSpace
-2017-08-17
+2017-08-17
- Run Peter's edits to the CGIAR System Organization community on DSpace Test
- Uptime Robot said CGSpace went down for 1 minute, not sure why
@@ -395,7 +395,7 @@ dspace.log.2017-08-17:584
- Peter responded and said that he doesn't want to limit items to be restricted just so we can change the RSS feeds
-2017-08-18
+2017-08-18
- Someone on the dspace-tech mailing list responded with some tips about using the authority framework to do external queries from the submission form
- He linked to some examples from DSpace-CRIS that use this functionality: VIAFAuthority
@@ -432,14 +432,14 @@ WHERE {
- I found this blog post about speeding up the Tomcat startup time: http://skybert.net/java/improve-tomcat-startup-time/
- The startup time went from ~80s to 40s!
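The main trick in that post is telling Tomcat to skip annotation scanning of JARs via `catalina.properties`; a sketch using a throwaway file (the blanket `*.jar` wildcard is an assumption, real configs usually list specific JARs; the property name shown is Tomcat 8's, while Tomcat 7 used `tomcat.util.scan.DefaultJarScanner.jarsToSkip`):

```shell
# Throwaway copy; on a real server this is $CATALINA_HOME/conf/catalina.properties
conf=/tmp/catalina.properties.demo

# Skip annotation scanning for all JARs to speed up startup (blunt wildcard)
echo 'tomcat.util.scan.StandardJarScanFilter.jarsToSkip=*.jar' > "$conf"

grep jarsToSkip "$conf"
```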
-2017-08-19
+2017-08-19
- More examples of SPARQL queries: https://github.com/rsinger/openlcsh/wiki/Sparql-Examples
- Specifically the explanation of the
FILTER
regex
- Might want to
SELECT DISTINCT
or increase the LIMIT
to get terms like “wheat” and “fish” to be visible
- Test queries online on the AGROVOC SPARQL portal: http://202.45.139.84:10035/catalogs/fao/repositories/agrovoc
-2017-08-20
+2017-08-20
- Since I cleared the XMLUI cache on 2017-08-17 there haven't been any more
ERROR net.sf.ehcache.store.DiskStore
errors
- Look at the CGIAR Library to see if I can find the items that have been submitted since May:
@@ -466,16 +466,16 @@ WHERE {
10947/4661
10947/4664
(5 rows)
-2017-08-23
+2017-08-23
- Start testing the nginx configs for the CGIAR Library migration as well as start making a checklist
-2017-08-28
+2017-08-28
- Bram had written to me two weeks ago to set up a chat about ORCID stuff, but the email apparently bounced and I only found out when he emailed me on another account
- I told him I can chat in a few weeks when I'm back
-2017-08-31
+2017-08-31
- I notice that in many WLE collections Marianne Gadeberg is in the edit or approval steps, but she is also in the groups for those steps.
- I think we need to have a process to go back and check / fix some of these scenarios—to remove her user from the step and instead add her to the group—because we have way too many authorizations and in late 2016 we had performance issues with Solr because of this
diff --git a/docs/2017-09/index.html b/docs/2017-09/index.html
index 63aba98fb..18349316a 100644
--- a/docs/2017-09/index.html
+++ b/docs/2017-09/index.html
@@ -29,7 +29,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group
"/>
@@ -110,15 +110,15 @@ Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is
- 2017-09-06
+ 2017-09-06
- Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two hours
-2017-09-07
+2017-09-07
- Ask Sisay to clean up the WLE approvers a bit, as Marianne's user account is both in the approvers step as well as the group
-2017-09-10
+2017-09-10
- Delete 58 blank metadata values from the CGSpace database:
@@ -155,12 +155,12 @@ dspace.log.2017-09-10:0
- I updated both CGSpace and DSpace Test to use these new settings (60 connections per web app and 183 for system PostgreSQL limit)
- I'm expecting to see 0 connection errors for the next few months
-2017-09-11
+2017-09-11
-2017-09-12
+2017-09-12
- I was testing the METS XSD caching during AIP ingest but it doesn't seem to help actually
- The import process takes the same amount of time with and without the caching
@@ -190,7 +190,7 @@ dspace.log.2017-09-10:0
-2017-09-13
+2017-09-13
- Last night Linode sent an alert about CGSpace (linode18) that it has exceeded the outbound traffic rate threshold of 10Mb/s for the last two hours
- I wonder what was going on, and looking into the nginx logs I think maybe it's OAI…
@@ -406,7 +406,7 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134
- It added another authority… surely this is not the desired behavior, or maybe we are not using this as intended?
-2017-09-14
+2017-09-14
- Communicate with Handle.net admins to try to get some guidance about the 10947 prefix
- Michael Marus is the contact for their prefix but he has left CGIAR, but as I actually have access to the CGIAR Library server I think I can just generate a new
sitebndl.zip
file from their server and send it to Handle.net
@@ -415,7 +415,7 @@ delete from metadatavalue where resource_type_id=2 and metadata_field_id in (134
- I didn't see any abnormally high usage in the REST or OAI logs, but looking at Munin I see the average JVM usage was at 4.9GB and the heap is only 5GB (5120M), so I think it's just normal growing pains
- Every few months I generally try to increase the JVM heap to be 512M higher than the average usage reported by Munin, so now I adjusted it to 5632M
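The adjustment itself is just the `-Xmx` flag, typically set in Tomcat's `setenv.sh`; a sketch (pairing `-Xms` with `-Xmx` is an assumption, though `-Dfile.encoding=UTF-8` does appear in the JAVA_OPTS used elsewhere in these notes):

```shell
# 512M above the ~4.9GB average heap usage reported by Munin
export JAVA_OPTS="-Xmx5632m -Xms5632m -Dfile.encoding=UTF-8"
echo "$JAVA_OPTS"
```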
-2017-09-15
+2017-09-15
- Apply CCAFS project tag corrections on CGSpace:
@@ -425,7 +425,7 @@ UPDATE 4
UPDATE 1
DELETE 1
DELETE 207
-2017-09-17
+2017-09-17
- Create pull request for CGSpace to be able to resolve multiple handles (#339)
- We still need to do the changes to
config.dct
and regenerate the sitebndl.zip
to send to the Handle.net admins
@@ -456,7 +456,7 @@ DELETE 207
- I decided to start the import process in the evening rather than waiting for the morning, and right as the first community was finished importing I started seeing
Timeout waiting for idle object
errors
- I had to cancel the import, clean up a bunch of database entries, increase the PostgreSQL
max_connections
as a precaution, restart PostgreSQL and Tomcat, and then finally completed the import
-2017-09-18
+2017-09-18
- I think we should force regeneration of all thumbnails in the CGIAR Library community, as their DSpace is version 1.7 and CGSpace is running DSpace 5.5 so they should look much better
- One item for comparison:
@@ -466,7 +466,7 @@ DELETE 207
- Moved the CGIAR Library Migration notes to a page — cgiar-library-migration — as there seems to be a bug with post slugs defined in frontmatter when you have a permalink scheme defined in
config.toml
(happens currently in Hugo 0.27.1 at least)
-2017-09-19
+2017-09-19
- Nightly Solr indexing is working again, and it appears to be pretty quick actually:
@@ -481,7 +481,7 @@ DELETE 207
- Marianne Gadeberg from WLE asked if I would add an account for Adam Hunt on CGSpace and give him permissions to approve all WLE publications
- I told him to register first, as he's a CGIAR user and needs an account to be created before I can add him to the groups
-2017-09-20
+2017-09-20
- Abenet and I noticed that hdl.handle.net is blocked by ETC at ILRI Addis so I asked Biruk Debebe to route it over the satellite
- Force thumbnail regeneration for the CGIAR System Organization's Historic Archive community (2000 items):
@@ -490,19 +490,19 @@ DELETE 207
- I'm still waiting (over 1 day later) to hear back from the CGIAR System Organization about updating the DNS for library.cgiar.org
-2017-09-21
+2017-09-21
- Switch to OpenJDK 8 from Oracle JDK on DSpace Test
- I want to test this for a while to see if we can start using it instead
- I need to look at the JVM graphs in Munin, test the Atmire modules, build the source, etc to get some impressions
-2017-09-22
+2017-09-22
-2017-09-24
+2017-09-24
- Start investigating other platforms for CGSpace due to linear instance pricing on Linode
- We need to figure out how much memory is used by applications, caches, etc, and how much disk space the asset store needs
@@ -538,7 +538,7 @@ DELETE 207
- I ended up having to kill the import and wait until he was done
- I exported a clean CSV and applied the changes from that one, which was a hundred or two less than I thought there should be (at least compared to the current state of DSpace Test, which is a few months old)
-2017-09-25
+2017-09-25
- Email Rosemary Kande from ICT to ask about the administrative / finance procedure for moving DSpace Test from EU to US region on Linode
- Communicate (finally) with Tania and Tunji from the CGIAR System Organization office to tell them to request CGNET make the DNS updates for library.cgiar.org
@@ -602,7 +602,7 @@ INFO org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Da
- So it's good to know that something gets printed when it fails because I didn't see any mention of JNDI before when I was testing!
-2017-09-26
+2017-09-26
- Adam Hunt from WLE finally registered so I added him to the editor and approver groups
- Then I noticed that Sisay never removed Marianne's user accounts from the approver steps in the workflow because she is already in the WLE groups, which are in those steps
@@ -613,7 +613,7 @@ INFO org.dspace.storage.rdbms.DatabaseManager @ Falling back to creating own Da
- Start discussing the Linode server update for DSpace Test with ICT
- Rosemary said I need to work with Robert Okal to destroy/create the server, and then let her and Lilian Masigah from finance know the updated Linode asset names for their records
-2017-09-28
+2017-09-28
- Tunji from the System Organization finally sent the DNS request for library.cgiar.org to CGNET
- Now the redirects work
diff --git a/docs/2017-10/index.html b/docs/2017-10/index.html
index 6a577b627..a8434d992 100644
--- a/docs/2017-10/index.html
+++ b/docs/2017-10/index.html
@@ -31,7 +31,7 @@ http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336
There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
"/>
@@ -112,7 +112,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
- 2017-10-01
+ 2017-10-01
@@ -121,7 +121,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
- There appears to be a pattern but I'll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
- Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
-2017-10-02
+2017-10-02
- Peter Ballantyne said he was having problems logging into CGSpace with “both” of his accounts (CGIAR LDAP and personal, apparently)
- I looked in the logs and saw some LDAP lookup failures due to timeout but also strangely a “no DN found” error:
@@ -138,7 +138,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
- For what it's worth, there are no errors on any other recent days, so it must have been some network issue on Linode or CGNET's LDAP server
- Linode emailed to say that linode578611 (DSpace Test) needs to migrate to a new host for a security update so I initiated the migration immediately rather than waiting for the scheduled time in two weeks
-2017-10-04
+2017-10-04
- Twice in the last twenty-four hours Linode has alerted about high CPU usage on CGSpace (linode2533629)
- Communicate with Sam from the CGIAR System Organization about some broken links coming from their CGIAR Library domain to CGSpace
@@ -152,7 +152,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
- Lots of inconsistencies and errors in subjects, dc.format.extent, regions, countries
- Merge the Discovery search changes for ISI Journal (#341)
-2017-10-05
+2017-10-05
- Twice in the past twenty-four hours Linode has warned that CGSpace's outbound traffic rate was exceeding the notification threshold
- I had a look at yesterday's OAI and REST logs in
/var/log/nginx
but didn't see anything unusual:
@@ -188,7 +188,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
- I used OpenRefine to isolate them and then fixed and re-imported them into CGSpace
- I manually checked a dozen of them and it appeared that the correct handle was always the second one, so I just deleted the first one
-2017-10-06
+2017-10-06
- I saw a nice tweak to thumbnail presentation on the Cardiff Metropolitan University DSpace: https://repository.cardiffmet.ac.uk/handle/10369/8780
- It adds a subtle border and box shadow, before and after:
@@ -203,7 +203,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
- This method is kinda a hack but at least we can put all the pieces into git to be reproducible
- I will tell Tunji to send me the verification file
-2017-10-10
+2017-10-10
- Deploy logic to allow verification of the library.cgiar.org domain in the Google Search Console (#343)
- After verifying both the HTTP and HTTPS domains and submitting a sitemap it will be interesting to see how the stats in the console as well as the search results change (currently 28,500 results):
@@ -226,7 +226,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
- Delete community 10568/174 (Sustainable livestock futures)
- Delete collections in 10568/27629 that have zero items (33 of them!)
-2017-10-11
+2017-10-11
- Peter added me as an owner on the CGSpace property on Google Search Console and I tried to submit a “Change of Address” request for the CGIAR Library but got an error:
@@ -235,25 +235,25 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
- We are sending top-level CGIAR Library traffic to their specific community hierarchy in CGSpace so this type of change of address won't work—we'll just need to wait for Google to slowly index everything and take note of the HTTP 301 redirects
- Also the Google Search Console doesn't work very well with Google Analytics being blocked, so I had to turn off my ad blocker to get the “Change of Address” tool to work!
-2017-10-12
+2017-10-12
- Finally finish (I think) working on the myriad nginx redirects for all the CGIAR Library browse stuff—it ended up getting pretty complicated!
- I still need to commit the DSpace changes (add browse index, XMLUI strings, Discovery index, etc), but I should be able to deploy that on CGSpace soon
-2017-10-14
+2017-10-14
- Run system updates on DSpace Test and reboot server
- Merge changes adding a search/browse index for CGIAR System subject to 5_x-prod (#344)
- I checked the top browse links in Google's search results for site:library.cgiar.org inurl:browse and they are all redirected appropriately by the nginx rewrites I worked on last week
-2017-10-22
+2017-10-22
- Run system updates on DSpace Test and reboot server
- Re-deploy CGSpace from latest 5_x-prod (adds ISI Journal to search filters and adds Discovery index for CGIAR Library systemsubject)
- Deploy nginx redirect fixes to catch CGIAR Library browse links (redirect to their community and translate subject→systemsubject)
- Run migration of CGSpace server (linode18) for Linode security alert, which took 42 minutes of downtime
-2017-10-26
+2017-10-26
- In the last 24 hours we've gotten a few alerts from Linode that there was high CPU and outgoing traffic on CGSpace
- Uptime Robot even noticed CGSpace go “down” for a few minutes
@@ -280,15 +280,15 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
- I told her about the possibility to use per-collection item templates, and asked if her items in question were all from a single collection
- We've never used it but it could be worth looking at
-2017-10-27
+2017-10-27
- Linode alerted about high CPU usage again (twice) on CGSpace in the last 24 hours, around 2AM and 2PM
-2017-10-28
+2017-10-28
- Linode alerted about high CPU usage again on CGSpace around 2AM this morning
-2017-10-29
+2017-10-29
- Linode alerted about high CPU usage again on CGSpace around 2AM and 4AM
- I'm still not sure why this started causing alerts so repeatedly the past week
@@ -310,7 +310,7 @@ Add Katherine Lutz to the groups for content submission and edit steps of the CG
- After browsing the CORE site it seems that the CGIAR Library is somehow a member of CORE, so they have probably only been harvesting CGSpace since we did the migration, as library.cgiar.org directs to us now
- For now I will just contact them to have them update their contact info in the bot's user agent, but eventually I think I'll tell them to swap out the CGIAR Library entry for CGSpace
-2017-10-30
+2017-10-30
- Like clock work, Linode alerted about high CPU usage on CGSpace again this morning (this time at 8:13 AM)
- Uptime Robot noticed that CGSpace went down around 10:15 AM, and I saw that there were 93 PostgreSQL connections:
@@ -385,7 +385,7 @@ session_id=6C30F10B4351A4ED83EC6ED50AFD6B6A
- I will check again tomorrow
-2017-10-31
+2017-10-31
- Very nice, Linode alerted that CGSpace had high CPU usage at 2AM again
- Ask on the dspace-tech mailing list if it's possible to use an existing item as a template for a new item
diff --git a/docs/2017-11/index.html b/docs/2017-11/index.html
index 67b3ae51b..015fd0b83 100644
--- a/docs/2017-11/index.html
+++ b/docs/2017-11/index.html
@@ -45,7 +45,7 @@ Generate list of authors on CGSpace for Peter to go through and correct:
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors.csv with csv;
COPY 54701
"/>
-
+
@@ -126,11 +126,11 @@ COPY 54701
- 2017-11-01
+ 2017-11-01
- The CORE developers responded to say they are looking into their bot not respecting our robots.txt
-2017-11-02
+2017-11-02
- Today there have been no hits by CORE and no alerts from Linode (coincidence?)
@@ -156,12 +156,12 @@ COPY 54701
- Also, some dates with completely invalid formats like “2010- 06” and “2011-3-28”
- I also collapsed some consecutive whitespace on a handful of fields
-2017-11-03
+2017-11-03
- Atmire got back to us to say that they estimate it will take two days of labor to implement the change to Listings and Reports
- I said I'd ask Abenet if she wants that feature
-2017-11-04
+2017-11-04
- I finished looking through Sisay's CIAT records for the “Alianzas de Aprendizaje” data
- I corrected about half of the authors to standardize them
@@ -198,7 +198,7 @@ COPY 54701
- For now I don't know what this user is!
-2017-11-05
+2017-11-05
- Peter asked if I could fix the appearance of “International Livestock Research Institute” in the author lookup during item submission
- It looks to be just an issue with the user interface expecting authors to have both a first and last name:
@@ -226,7 +226,7 @@ COPY 54701
- This guide shows how to enable JMX in Tomcat by modifying CATALINA_OPTS
- I was able to successfully connect to my local Tomcat with jconsole!
-2017-11-07
+2017-11-07
- CGSpace went down and up a few times this morning, first around 3AM, then around 7
- Tsega had to restart Tomcat 7 to fix it temporarily
@@ -464,7 +464,7 @@ $ grep -Io -E 'session_id=[A-Z0-9]{32}:ip_addr=104.196.152.243' dspace.log.2017-
# grep "Baiduspider/2.0" /var/log/nginx/access.log | awk '{print $1}' | sort -n | uniq | wc -l
164
-
2017-11-08
+2017-11-08
- Linode sent several alerts last night about CPU usage and outbound traffic rate at 6:13PM
- Linode sent another alert about CPU usage in the morning at 6:12AM
@@ -526,7 +526,7 @@ proxy_set_header User-Agent $ua;
- Run system updates on CGSpace and reboot the server
- Re-deploy latest 5_x-prod branch on CGSpace and DSpace Test (includes the clickable thumbnails, CCAFS phase II project tags, and updated news text)
-2017-11-09
+2017-11-09
- Awesome, it seems my bot mapping stuff in nginx actually reduced the number of Tomcat sessions used by the CIAT scraper today, total requests and unique sessions:
@@ -550,13 +550,13 @@ $ grep 104.196.152.243 dspace.log.2017-11-07 | grep -o -E 'session_id=[A-Z0-9]{3
This gets me thinking, I wonder if I can use something like nginx's rate limiter to automatically change the user agent of clients who make too many requests
Perhaps using a combination of geo and map, like illustrated here: https://www.nginx.com/blog/rate-limiting-nginx/
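The geo + map idea above could look something like this sketch, which throttles requests from known scraper networks only (the zone name, rate, and the example IP — the CIAT scraper seen in these logs — are illustrative assumptions, not a deployed config; an empty `limit_req_zone` key means the client is not rate limited at all):

```nginx
# Sketch: tag known bot networks with geo, then derive a rate-limit key
# from that tag so only tagged clients are throttled.
geo $bot_network {
    default         0;
    104.196.152.243 1;   # example scraper IP from the access logs
}

map $bot_network $limit_key {
    0  "";                  # empty key: request is not rate limited
    1  $binary_remote_addr; # bots are limited per source address
}

limit_req_zone $limit_key zone=bots:10m rate=1r/s;

server {
    location / {
        limit_req zone=bots burst=5;
        # ... normal proxying to Tomcat ...
    }
}
```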
-2017-11-11
+2017-11-11
- I was looking at the Google index and noticed there are 4,090 search results for dspace.ilri.org but only seven for mahider.ilri.org
- Search with something like: inurl:dspace.ilri.org inurl:https
- I want to get rid of those legacy domains eventually!
-2017-11-12
+2017-11-12
- Update the Ansible infrastructure templates to be a little more modular and flexible
- Looking at the top client IPs on CGSpace so far this morning, even though it's only been eight hours:
@@ -630,7 +630,7 @@ Server: nginx
- The first request works, second is denied with an HTTP 503!
- I need to remember to check the Munin graphs for PostgreSQL and JVM next week to see how this affects them
-2017-11-13
+2017-11-13
- At the end of the day I checked the logs and it really looks like the Baidu rate limiting is working, HTTP 200 vs 503:
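A quick way to tally those status codes per user agent is to pull field 9 (the status, in nginx's default “combined” format) out of the access log; the two log lines below are fabricated samples standing in for the real /var/log/nginx/access.log:

```shell
# Count HTTP status codes served to Baidu; sample lines replace the real log
printf '%s\n' \
  '180.76.15.1 - - [13/Nov/2017:01:00:00 +0000] "GET /handle/10568/1 HTTP/1.1" 200 1024 "-" "Baiduspider/2.0"' \
  '180.76.15.2 - - [13/Nov/2017:01:00:01 +0000] "GET /handle/10568/2 HTTP/1.1" 503 213 "-" "Baiduspider/2.0"' \
  | grep 'Baiduspider' | awk '{print $9}' | sort | uniq -c
```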
@@ -659,7 +659,7 @@ Server: nginx
After uploading and looking at the data in DSpace Test I saw more errors with CRPs, subjects (one item had four copies of all of its subjects, another had a “.” in it), affiliations, sponsors, etc.
Atmire responded to the ticket about ORCID stuff a few days ago, today I told them that I need to talk to Peter and the partners to see what we would like to do
-2017-11-14
+2017-11-14
- Deploy some nginx configuration updates to CGSpace
- They had been waiting on a branch for a few months and I think I just forgot about them
@@ -674,13 +674,13 @@ dspace6=# CREATE EXTENSION pgcrypto;
- I'm not sure if we can use separate profiles like we did before with mvn -Denv=blah to use blah.properties
- It seems we need to use “system properties” to override settings, ie: -Ddspace.dir=/Users/aorth/dspace6
-2017-11-15
+2017-11-15
- Send Adam Hunt an invite to the DSpace Developers network on Yammer
- He is the new head of communications at WLE, since Michael left
- Merge changes to item view's wording of link metadata (#348)
-2017-11-17
+2017-11-17
- Uptime Robot said that CGSpace went down today and I see lots of Timeout waiting for idle object errors in the DSpace logs
- I looked in PostgreSQL using SELECT * FROM pg_stat_activity; and saw that there were 73 active connections
@@ -724,7 +724,7 @@ dspace6=# CREATE EXTENSION pgcrypto;
- Switch DSpace Test to using the G1GC for JVM so I can see what the JVM graph looks like eventually, and start evaluating it for production
-2017-11-19
+2017-11-19
- Linode sent an alert that CGSpace was using a lot of CPU around 4–6 AM
- Looking in the nginx access logs I see the most active XMLUI users between 4 and 6 AM:
@@ -762,18 +762,18 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
- It's been a few days since I enabled the G1GC on DSpace Test and the JVM graph definitely changed:
-2017-11-20
+2017-11-20
- I found an article about JVM tuning that gives some pointers how to enable logging and tools to analyze logs for you
- Also notes on rotating GC logs
- I decided to switch DSpace Test back to the CMS garbage collector because it is designed for low pauses and high throughput (like G1GC!) and because we haven't even tried to monitor or tune it
-2017-11-21
+2017-11-21
- Magdalena was having problems logging in via LDAP and it seems to be a problem with the CGIAR LDAP server:
2017-11-21 11:11:09,621 WARN org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=2FEC0E5286C17B6694567FFD77C3171C:ip_addr=77.241.141.58:ldap_authentication:type=failed_auth javax.naming.CommunicationException\colon; simple bind failed\colon; svcgroot2.cgiarad.org\colon;3269 [Root exception is javax.net.ssl.SSLHandshakeException\colon; sun.security.validator.ValidatorException\colon; PKIX path validation failed\colon; java.security.cert.CertPathValidatorException\colon; validity check failed]
-
2017-11-22
+2017-11-22
- Linode sent an alert that the CPU usage on the CGSpace server was very high around 4 to 6 AM
- The logs don't show anything particularly abnormal between those hours:
@@ -794,7 +794,7 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
- In other news, it looks like the JVM garbage collection pattern is back to its standard jigsaw pattern after switching back to CMS a few days ago:
-2017-11-23
+2017-11-23
- Linode alerted again that CPU usage was high on CGSpace from 4:13 to 6:13 AM
- I see a lot of Googlebot (66.249.66.90) in the XMLUI access logs
@@ -838,7 +838,7 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
- Apparently setting random_page_cost to 1 is “common” advice for systems running PostgreSQL on SSD (the default is 4)
- So I deployed this on DSpace Test and will check the Munin PostgreSQL graphs in a few days to see if anything changes
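One way to apply that tweak without editing postgresql.conf by hand is ALTER SYSTEM (PostgreSQL 9.4+, which writes to postgresql.auto.conf); this is a sketch of the commands, not necessarily how it was deployed here:

```console
$ psql -U postgres -c "ALTER SYSTEM SET random_page_cost = 1;"
$ psql -U postgres -c "SELECT pg_reload_conf();"
```

random_page_cost is reloadable, so no restart is needed for new planner decisions to pick it up.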
-2017-11-24
+2017-11-24
- It's too early to tell for sure, but after I made the random_page_cost change on DSpace Test's PostgreSQL yesterday the number of connections dropped drastically:
@@ -857,7 +857,7 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
- I should probably tell CGIAR people to have CGNET stop that
-2017-11-26
+2017-11-26
- Linode alerted that CGSpace server was using too much CPU from 5:18 to 7:18 AM
- Yet another mystery because the load for all domains looks fine at that time:
@@ -873,7 +873,7 @@ $ grep -c com.atmire.utils.UpdateSolrStatsMetadata dspace.log.2017-11-19
298 157.55.39.206
379 66.249.66.70
1855 66.249.66.90
-2017-11-29
+2017-11-29
- Linode alerted that CGSpace was using 279% CPU from 6 to 8 AM this morning
- About an hour later Uptime Robot said that the server was down
@@ -911,7 +911,7 @@ $ cat dspace.log.2017-11-28 | grep -o -E 'session_id=[A-Z0-9]{32}' | sort -n | u
- I will bump DSpace's db.maxconnections from 60 to 90, and PostgreSQL's max_connections from 183 to 273 (which is using my loose formula of 90 * webapps + 3)
- I really need to figure out how to get DSpace to use a PostgreSQL connection pool
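The loose formula above works out like this, assuming three database-using webapps (the webapp count is an assumption for illustration):

```shell
# 90 connections per webapp * 3 webapps + 3 of headroom for psql sessions etc.
webapps=3
echo $((90 * webapps + 3))
# → 273
```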
-2017-11-30
+2017-11-30
- Linode alerted about high CPU usage on CGSpace again around 6 to 8 AM
- Then Uptime Robot said CGSpace was down a few minutes later, but it resolved itself I think (or Tsega restarted Tomcat, I don't know)
diff --git a/docs/2017-12/index.html b/docs/2017-12/index.html
index d9fa5fad9..a2cd209ab 100644
--- a/docs/2017-12/index.html
+++ b/docs/2017-12/index.html
@@ -27,7 +27,7 @@ The logs say “Timeout waiting for idle object”
PostgreSQL activity says there are 115 connections currently
The list of connections to XMLUI and REST API for today:
"/>
-
+
@@ -108,7 +108,7 @@ The list of connections to XMLUI and REST API for today:
- 2017-12-01
+ 2017-12-01
- Uptime Robot noticed that CGSpace went down
- The logs say “Timeout waiting for idle object”
@@ -166,11 +166,11 @@ The list of connections to XMLUI and REST API for today:
14 2a01:7e00::f03c:91ff:fe18:7396
46 2001:4b99:1:1:216:3eff:fe2c:dc6c
319 2001:4b99:1:1:216:3eff:fe76:205b
-2017-12-03
+2017-12-03
- Linode alerted that CGSpace's load was 327.5% from 6 to 8 AM again
-2017-12-04
+2017-12-04
- Linode alerted that CGSpace's load was 255.5% from 8 to 10 AM again
- I looked at the Munin stats on DSpace Test (linode02) again to see how the PostgreSQL tweaks from a few weeks ago were holding up:
@@ -184,13 +184,13 @@ The list of connections to XMLUI and REST API for today:
- For reference, here is the past month's connections:
-2017-12-05
+2017-12-05
-2017-12-06
+2017-12-06
- Linode alerted again that the CPU usage on CGSpace was high this morning from 6 to 8 AM
- Uptime Robot alerted that the server went down and up around 8:53 this morning
@@ -212,7 +212,7 @@ The list of connections to XMLUI and REST API for today:
- 50.116.102.77 is apparently in the US on websitewelcome.com
-2017-12-07
+2017-12-07
- Uptime Robot reported a few times today that CGSpace was down and then up
- At one point Tsega restarted Tomcat
@@ -254,17 +254,17 @@ Error: ERROR: update or delete on table "bitstream" violates foreign k
dspace=# update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (144666);
UPDATE 1
-
2017-12-13
+2017-12-13
- Linode alerted that CGSpace was using high CPU from 10:13 to 12:13 this morning
-2017-12-16
+2017-12-16
- Re-work the XMLUI base theme to allow child themes to override the header logo's image and link destination: #349
- This required a little bit of work to restructure the XSL templates
- Optimize PNG and SVG image assets in the CGIAR base theme using pngquant and svgo: #350
-2017-12-17
+2017-12-17
- Reboot DSpace Test to get new Linode Linux kernel
- Looking at CCAFS bulk import for Magdalena Haman (she originally sent them in November but some of the thumbnails were missing and dates were messed up so she resent them now)
@@ -358,7 +358,7 @@ Elapsed time: 2 secs (2559 msecs)
- I will apply it on our branch but I need to make a note to NOT cherry-pick it when I rebase on to the latest 5.x upstream later
- Pull request: #351
-2017-12-18
+2017-12-18
- Linode alerted this morning that there was high outbound traffic from 6 to 8 AM
- The XMLUI logs show that the CORE bot from last night (137.108.70.7) is very active still:
@@ -453,7 +453,7 @@ $ schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery
- The PostgreSQL issues are getting out of control, I need to figure out how to enable connection pools in Tomcat!
-2017-12-19
+2017-12-19
- Briefly had PostgreSQL connection issues on CGSpace for the millionth time
- I'm fucking sick of this!
@@ -651,7 +651,7 @@ javax.naming.NoInitialContextException: Need to specify class name in environmen
- If you monitor the pg_stat_activity while you run dspace database info you can see that it doesn't use the JNDI and creates ~9 extra PostgreSQL connections!
- And in the middle of all of this Linode sends an alert that CGSpace has high CPU usage from 2 to 4 PM
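For reference, a Tomcat JNDI pool for DSpace is declared as a Resource in server.xml; this is a hypothetical sketch only — the driver/URL are standard, but the credentials and pool sizes are made-up placeholders, not this server's configuration (only the name="jdbc/dspace" convention is taken from these notes):

```xml
<!-- Hypothetical sketch of a Tomcat JNDI DataSource for DSpace -->
<Resource name="jdbc/dspace" auth="Container" type="javax.sql.DataSource"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://localhost:5432/dspace"
          username="dspace" password="CHANGEME"
          maxActive="50" maxIdle="20" />
```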
-2017-12-20
+2017-12-20
- The database connection pooling is definitely better!
@@ -674,7 +674,7 @@ $ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -i 10568/89287
-2017-12-24
+2017-12-24
- Linode alerted that CGSpace was using high CPU this morning around 6 AM
- I'm playing with reading all of a month's nginx logs into goaccess:
@@ -690,13 +690,13 @@ $ schedtool -D -e ionice -c2 -n7 nice -n19 dspace filter-media -i 10568/89287
-2017-12-25
+2017-12-25
- The PostgreSQL connection pooling is much better when using the Tomcat JNDI pool
- Here are the Munin stats for the past week on CGSpace:
-2017-12-29
+2017-12-29
- Looking at some old notes for metadata to clean up, I found a few hundred corrections in cg.fulltextstatus and dc.language.iso:
@@ -721,7 +721,7 @@ DELETE 20
- I need to figure out why we have records with language “in” because that's not a language!
-2017-12-30
+2017-12-30
- Linode alerted that CGSpace was using 259% CPU from 4 to 6 AM
- Uptime Robot noticed that the server went down for 1 minute a few hours later, around 9AM
@@ -748,7 +748,7 @@ DELETE 20
- 216.244.66.245 seems to be moz.com's DotBot
-2017-12-31
+2017-12-31
- I finished working on the 42 records for CCAFS after Magdalena sent the remaining corrections
- After that I uploaded them to CGSpace:
diff --git a/docs/2018-01/index.html b/docs/2018-01/index.html
index a2ff296a2..cbbec0650 100644
--- a/docs/2018-01/index.html
+++ b/docs/2018-01/index.html
@@ -147,7 +147,7 @@ dspace.log.2018-01-02:34
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains
"/>
-
+
@@ -228,7 +228,7 @@ Danny wrote to ask for help renewing the wildcard ilri.org certificate and I adv
- 2018-01-02
+ 2018-01-02
- Uptime Robot noticed that CGSpace went down and up a few times last night, for a few minutes each time
- I didn't get any load alerts from Linode and the REST and XMLUI logs don't show anything out of the ordinary
@@ -295,7 +295,7 @@ dspace.log.2018-01-02:34
- Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let's Encrypt if it's just a handful of domains
-2018-01-03
+2018-01-03
- I woke up to more up and down of CGSpace, this time UptimeRobot noticed a few rounds of up and down of a few minutes each and Linode also notified of high CPU load from 12 to 2 PM
- Looks like I need to increase the database pool size again:
@@ -389,7 +389,7 @@ dspace.log.2018-01-03:1909
- I guess for now I just have to increase the database connection pool's max active
- It's currently 75 and normally I'd just bump it by 25 but let me be a bit daring and push it by 50 to 125, because I used to see at least 121 connections in pg_stat_activity before when we were using the shitty default pooling
-2018-01-04
+2018-01-04
- CGSpace went down and up a bunch of times last night and ILRI staff were complaining a lot
- The XMLUI logs show this activity:
@@ -423,7 +423,7 @@ dspace.log.2018-01-04:1559
- Once I get back to Amman I will have to try to create different database pools for different web applications, like recently discussed on the dspace-tech mailing list
- Create accounts on CGSpace for two CTA staff km4ard@cta.int and bheenick@cta.int
-2018-01-05
+2018-01-05
- Peter said that CGSpace was down last night and Tsega restarted Tomcat
- I don't see any alerts from Linode or UptimeRobot, and there are no PostgreSQL connection errors in the dspace logs for today:
@@ -453,7 +453,7 @@ sys 3m14.890s
-2018-01-06
+2018-01-06
- I'm still seeing Solr errors in the DSpace logs even after the full reindex yesterday:
@@ -461,14 +461,14 @@ sys 3m14.890s
- I posted a message to the dspace-tech mailing list to see if anyone can help
-2018-01-09
+2018-01-09
- Advise Sisay about blank lines in some IITA records
- Generate a list of author affiliations for Peter to clean up:
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv;
COPY 4515
-
2018-01-10
+2018-01-10
- I looked to see what happened to this year's Solr statistics sharding task that should have run on 2018-01-01 and of course it failed:
@@ -619,7 +619,7 @@ cache_alignment : 64
Citing concerns with metadata quality, I suggested adding him on DSpace Test first
I opened a ticket with Atmire to ask them about DSpace 5.8 compatibility: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560
-2018-01-11
+2018-01-11
- The PostgreSQL and firewall graphs from this week show clearly the load from the new bot from PerfectIP.net yesterday:
@@ -673,7 +673,7 @@ cache_alignment : 64
- With that it is super easy to see where PostgreSQL connections are coming from in pg_stat_activity
-2018-01-12
+2018-01-12
@@ -698,7 +698,7 @@ cache_alignment : 64
- That could be very interesting
-2018-01-13
+2018-01-13
- Still testing DSpace 6.2 on Tomcat 8.5.24
- Catalina errors at Tomcat 8.5 startup:
@@ -741,14 +741,14 @@ Caused by: java.lang.NullPointerException
- Shit, this might actually be a DSpace error: https://jira.duraspace.org/browse/DS-3434
- I'll comment on that issue
-2018-01-14
+2018-01-14
- Looking at the authors Peter had corrected
- Some had multiple and he's corrected them by adding || in the correction column, but I can't process those this way so I will just have to flag them and do those manually later
- Also, I can flag the values that have “DELETE”
- Then I need to facet the correction column on isBlank(value) and not flagged
-2018-01-15
+2018-01-15
- Help Udana from IWMI export a CSV from DSpace Test so he can start trying a batch upload
- I'm going to apply these ~130 corrections on CGSpace:
@@ -830,7 +830,7 @@ COPY 4552
real 0m25.756s
user 0m28.016s
sys 0m2.210s
-2018-01-16
+2018-01-16
- Meeting with CGSpace team, a few action items:
@@ -849,7 +849,7 @@ sys 0m2.210s
- I ended up creating a Jira issue for my db.jndi documentation fix: DS-3803
- The DSpace developers said they wanted each pull request to be associated with a Jira issue
-2018-01-17
+2018-01-17
- Abenet asked me to proof and upload 54 records for LIVES
- A few records were missing countries (even though they're all from Ethiopia)
@@ -990,7 +990,7 @@ $ docker run --network dspace-build --name artifactory -d -v artifactory5_data:/
- Overall the heap space usage in the munin graph seems ok, though I usually increase it by 512MB over the average a few times per year as usage grows
- But maybe I should increase it by more, like 1024MB, to give a bit more head room
-2018-01-18
+2018-01-18
- UptimeRobot said CGSpace was down for 1 minute last night
- I don't see any errors in the nginx or catalina logs, so I guess UptimeRobot just got impatient and closed the request, which caused nginx to send an HTTP 499
@@ -1013,7 +1013,7 @@ Jan 18 07:01:22 linode18 sudo[10812]: pam_unix(sudo:session): session opened for
- I had to cancel the Discovery indexing and I'll have to re-try it another time when the server isn't so busy (it had already taken two hours and wasn't even close to being done)
- For now I've increased the Tomcat JVM heap from 5632 to 6144m, to give ~1GB of free memory over the average usage to hopefully account for spikes caused by load or background jobs
-2018-01-19
+2018-01-19
- Linode alerted and said that the CPU load was 264.1% on CGSpace
- Start the Discovery indexing again:
@@ -1029,7 +1029,7 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 /home/cgspace.cgiar.org/bin/dspa
- I told Peter we should keep an eye out and try again next week
-2018-01-20
+2018-01-20
- Run the authority indexing script on CGSpace and of course it died:
@@ -1072,7 +1072,7 @@ $ docker exec dspace_db psql -U postgres dspace -c 'alter user dspace nocreateus
$ docker exec dspace_db vacuumdb -U postgres dspace
$ docker cp ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspace_db:/tmp
$ docker exec dspace_db psql -U dspace -f /tmp/update-sequences.sql dspace
-2018-01-22
+2018-01-22
- Look over Udana's CSV of 25 WLE records from last week
- I sent him some corrections:
@@ -1106,7 +1106,7 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
- I'd still like to get arbitrary mbeans like activeSessions etc, though
- I can't remember if I had to configure the jmx settings in /etc/munin/plugin-conf.d/munin-node or not—I think all I did was re-run the munin-node-configure script and of course enable JMX in Tomcat's JVM options
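Enabling remote JMX in Tomcat's JVM options typically means adding the com.sun.management.jmxremote properties to CATALINA_OPTS; a sketch with assumed values (the port is arbitrary, and an unauthenticated JMX port should never be exposed beyond localhost):

```shell
CATALINA_OPTS="$CATALINA_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9010 \
  -Dcom.sun.management.jmxremote.local.only=true \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```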
-2018-01-23
+2018-01-23
- Thinking about generating a jmeter test plan for DSpace, along the lines of Georgetown's dspace-performance-test
- I got a list of all the GET requests on CGSpace for January 21st (the last time Linode complained the load was high), excluding admin calls:
@@ -1141,7 +1141,7 @@ $ ./rest-find-collections.py 10568/1 | grep -i untitled
- I can definitely design a test plan on this!
-2018-01-24
+2018-01-24
- Looking at the REST requests, most of them are to expand all or metadata, but 5% are for retrieving bitstreams:
@@ -1205,7 +1205,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
- Then I generated reports for these runs like this:
$ jmeter -g 2018-01-24-linode5451120-baseline.jtl -o 2018-01-24-linode5451120-baseline
-
2018-01-25
+2018-01-25
- Run another round of tests on DSpace Test with jmeter after changing Tomcat's minSpareThreads to 20 (default is 10) and acceptorThreadCount to 2 (default is 1):
@@ -1222,7 +1222,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
- I haven't had time to look at the results yet
-2018-01-26
+2018-01-26
- Peter followed up about some of the points from the Skype meeting last week
- Regarding the ORCID field issue, I see ICARDA's MELSpace is using cg.creator.ID: 0000-0001-9156-7691
@@ -1246,7 +1246,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
- I submitted a test item with ORCiDs and dc.rights from a controlled vocabulary on DSpace Test: https://dspacetest.cgiar.org/handle/10568/97703
- I will send it to Peter to check and give feedback (ie, about the ORCiD field name as well as allowing users to add ORCiDs manually or not)
-2018-01-28
+2018-01-28
- Assist Udana from WLE again to proof his 25 records and upload them to DSpace Test
- I am playing with the startStopThreads="0" parameter in Tomcat <Engine> and <Host> configuration
@@ -1254,7 +1254,7 @@ $ ./jmeter -n -t ~/dspace-performance-test/DSpacePerfTest-dspacetest.cgiar.org.j
- On my local test machine the startup time went from 70 to 30 seconds
- See: https://tomcat.apache.org/tomcat-7.0-doc/config/host.html
-2018-01-29
+2018-01-29
- CGSpace went down this morning for a few minutes, according to UptimeRobot
- Looking at the DSpace logs I see this error happened just before UptimeRobot noticed it going down:
@@ -1353,7 +1353,7 @@ Catalina:type=DataSource,class=javax.sql.DataSource,name="jdbc/dspace"
-2018-01-31
+2018-01-31
- UptimeRobot says CGSpace went down at 7:57 AM, and indeed I see a lot of HTTP 499 codes in nginx logs
- PostgreSQL activity shows 222 database connections
diff --git a/docs/2018-02/index.html b/docs/2018-02/index.html
index 33852785c..61007f40d 100644
--- a/docs/2018-02/index.html
+++ b/docs/2018-02/index.html
@@ -27,7 +27,7 @@ We don't need to distinguish between internal and external works, so that ma
Yesterday I figured out how to monitor DSpace sessions using JMX
I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu's munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
"/>
-
+
@@ -108,7 +108,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu's munin-plug
- 2018-02-01
+ 2018-02-01
- Peter gave feedback on the
dc.rights
proof of concept that I had sent him last week
- We don't need to distinguish between internal and external works, so that makes it just a simple list
@@ -124,7 +124,7 @@ I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu's munin-plug
v_.value 223
v_jspui.value 1
v_oai.value 0
-2018-02-03
+2018-02-03
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/authors-2018-02-05.csv with csv;
COPY 55630
-
2018-02-06
+2018-02-06
- UptimeRobot says CGSpace is down this morning around 9:15
- I see 308 PostgreSQL connections in pg_stat_activity
@@ -213,7 +213,7 @@ Tue Feb 6 09:30:32 UTC 2018
- I'm not actually sure if the Solr web application uses the database though, so I'll have to check later and remove it if necessary
- I deployed the changes on DSpace Test only for now, so I will monitor and make them on CGSpace later this week
-2018-02-07
+2018-02-07
- Abenet wrote to ask a question about the ORCiD lookup not working for one CIAT user on CGSpace
- I tried on DSpace Test and indeed the lookup just doesn't work!
@@ -363,7 +363,7 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
- I cherry-picked all the commits for DS-3551 but it won't build on our current DSpace 5.5!
- I sent a message to the dspace-tech mailing list asking why DSpace thinks these connections are busy when PostgreSQL says they are idle
-2018-02-10
+2018-02-10
- I tried to disable ORCID lookups but keep the existing authorities
- This item has an ORCID for Ralf Kiese: http://localhost:8080/handle/10568/89897
@@ -378,7 +378,7 @@ $ grep 46.229.168 dspace.log.2018-02-07 | grep -o -E 'session_id=[A-Z0-9]{32}' |
- So I don't think we can disable the ORCID lookup function and keep the ORCID badges
-2018-02-11
+2018-02-11
- Magdalena from CCAFS emailed to ask why one of their items has such a weird thumbnail: 10568/90735
@@ -442,7 +442,7 @@ dspace=# commit;
- I don't know how to add ORCID IDs to existing items yet… some more querying of PostgreSQL for authority values perhaps?
- I added the script to the ILRI DSpace wiki on GitHub
-2018-02-12
+2018-02-12
- Follow up with Atmire on the DSpace 5.8 Compatibility ticket to ask again if they want me to send them a DSpace 5.8 branch to work on
- Abenet asked if there was a way to get the number of submissions she and Bizuwork did
@@ -464,7 +464,7 @@ dspace=# commit;
- I think I'd probably just attach the block storage volume and mount it on /home/dspace
- Ask Peter about
dc.rights
on DSpace Test again, if he likes it then we should move it to CGSpace soon
-2018-02-13
+2018-02-13
- Peter said he was getting a “socket closed” error on CGSpace
- I looked in the dspace.log.2018-02-13 and saw one recent one:
@@ -497,7 +497,7 @@ dspace.log.2018-02-13:4
Feb 13, 2018 2:05:42 PM org.apache.tomcat.jdbc.pool.ConnectionPool abandon
WARNING: Connection has been abandoned PooledConnection[org.postgresql.jdbc.PgConnection@22e107be]:java.lang.Exception
-
2018-02-14
+2018-02-14
- Skype with Peter and the Addis team to discuss what we need to do for the ORCIDs in the immediate future
- We said we'd start with a controlled vocabulary for
cg.creator.id
on the DSpace Test submission form, where we store the author name and the ORCID in some format like: Alan S. Orth (0000-0002-1735-7458)
@@ -552,7 +552,7 @@ UPDATE 1
- Then the cleanup process will continue for a while and hit another foreign key conflict, and eventually it will complete after you manually resolve them all
-2018-02-15
+2018-02-15
- Altmetric seems to be indexing DSpace Test for some reason:
@@ -596,7 +596,7 @@ UPDATE 1
1512 207.46.13.59
1554 207.46.13.157
2018 104.196.152.243
-2018-02-17
+2018-02-17
- Peter pointed out that we had an incorrect sponsor in the controlled vocabulary:
U.S. Agency for International Development
→ United States Agency for International Development
- I made a pull request to fix it ([#354](https://github.com/ilri/DSpace/pull/354))
@@ -604,7 +604,7 @@ UPDATE 1
dspace=# update metadatavalue set text_value='United States Agency for International Development' where resource_type_id=2 and metadata_field_id=29 and text_value like '%U.S. Agency for International Development%';
UPDATE 2
-
2018-02-18
+2018-02-18
- ICARDA's Mohamed Salem pointed out that it would be easiest to format the
cg.creator.id
field like “Alan Orth: 0000-0002-1735-7458” because no name will have a “:” so it's easier to split on
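Since no name contains a colon, splitting that format is trivial; a minimal sketch of the idea (the awk one-liner is illustrative, not taken from the actual implementation):

```shell
# Split the proposed "Name: ORCID" format on the first ": " separator
# (sketch; assumes no author name contains a colon, as noted above)
echo "Alan S. Orth: 0000-0002-1735-7458" | awk -F': ' '{print $1 " -> " $2}'
# → Alan S. Orth -> 0000-0002-1735-7458
```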
- I finally figured out a few ways to extract ORCID iDs from metadata using XSLT and display them in the XMLUI:
@@ -665,7 +665,7 @@ org.springframework.web.util.NestedServletException: Handler processing failed;
- I have no idea what caused this crash
- In other news, I adjusted the ORCID badge size on the XMLUI item display and sent it back to Peter for feedback
-2018-02-19
+2018-02-19
- Combined list of CGIAR author ORCID iDs is up to 1,500:
@@ -708,7 +708,7 @@ TypeError: 'NoneType' object is not subscriptable
- According to ORCID that identifier's entire name block is null!
-2018-02-20
+2018-02-20
- Send Abenet an email about getting a purchase requisition for a new DSpace Test server on Linode
- Discuss some of the issues with null values and poor-quality names in some ORCID identifiers with Abenet and I think we'll now only use ORCID iDs that have been sent to us by partners, not those extracted via keyword searches on orcid.org
@@ -756,7 +756,7 @@ TypeError: 'NoneType' object is not subscriptable
- Remove CPWF project number and Humidtropics subject from submission form (#3)
- I accidentally merged it into my own repository, oops
-2018-02-22
+2018-02-22
- CGSpace was apparently down today around 13:00 server time and I didn't get any emails on my phone, but saw them later on the computer
- It looks like Sisay restarted Tomcat because I was offline
@@ -803,11 +803,11 @@ TypeError: 'NoneType' object is not subscriptable
- It seems to re-use its user agent but makes tons of useless requests and I wonder if I should add “.spider.” to the Tomcat Crawler Session Manager valve?
-2018-02-23
+2018-02-23
- Atmire got back to us with a quote about their DSpace 5.8 upgrade
-2018-02-25
+2018-02-25
- A few days ago Abenet sent me the list of ORCID iDs from CCAFS
- We currently have 988 unique identifiers:
@@ -872,7 +872,7 @@ Alan S. Orth: 0000-0002-1735-7458
Ibrahim Mohammed: 0000-0001-5199-5528
Nor Azwadi: 0000-0001-9634-1958
./resolve-orcids.py -i orcid-test-values.txt -o /tmp/orcid-names 0.23s user 0.05s system 8% cpu 3.046 total
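The lookup that resolve-orcids.py performs can be approximated with the ORCID public API (a sketch; the v3.0 endpoint and the JSON field names are assumptions about the current public API, not taken from the script itself):

```shell
# Resolve one ORCID iD to a display name via the public API (sketch;
# requires curl and jq, and assumes the v3.0 /person endpoint)
curl -s -H 'Accept: application/json' \
  'https://pub.orcid.org/v3.0/0000-0002-1735-7458/person' \
  | jq -r '"\(.name."given-names".value) \(.name."family-name".value)"'
```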
-2018-02-26
+2018-02-26
- Peter is having problems with “Socket closed” on his submissions page again
- He says his personal account loads much faster than his CGIAR account, which could be because the CGIAR account has potentially thousands of submissions over the last few years
@@ -880,7 +880,7 @@ Nor Azwadi: 0000-0001-9634-1958
- I think I should increase the
removeAbandonedTimeout
from 90 to something like 180 and continue observing
- I also reduced the timeout for the API pool back to 60 because those interfaces are only used by bots
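For reference, these knobs live on the JDBC `Resource` element used by the Tomcat JDBC pool (an abbreviated sketch; only the abandoned-connection attributes are shown, and the other attributes of the real resource definition are elided):

```xml
<!-- Sketch of the relevant tomcat-jdbc pool attributes; the Resource
     name and the omitted attributes ("...") are placeholders -->
<Resource name="jdbc/dspace" ...
          removeAbandoned="true"
          removeAbandonedTimeout="180" />
```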
-2018-02-27
+2018-02-27
- Peter is still having problems with “Socket closed” on his submissions page
- I have disabled
removeAbandoned
for now because that's the only thing I changed in the last few weeks since he started having issues
@@ -923,7 +923,7 @@ COPY 263
- It successfully mapped 2600 ORCID identifiers to items in my tests
- I will run it on DSpace Test
-2018-02-28
+2018-02-28
- CGSpace crashed today, the first HTTP 499 in nginx's access.log was around 09:12
- There's nothing interesting going on in nginx's logs around that time:
diff --git a/docs/2018-03/index.html b/docs/2018-03/index.html
index 96e19b5d0..66db16861 100644
--- a/docs/2018-03/index.html
+++ b/docs/2018-03/index.html
@@ -21,7 +21,7 @@ Export a CSV of the IITA community metadata for Martin Mueller
Export a CSV of the IITA community metadata for Martin Mueller
"/>
-
+
@@ -102,11 +102,11 @@ Export a CSV of the IITA community metadata for Martin Mueller
- 2018-03-02
+ 2018-03-02
- Export a CSV of the IITA community metadata for Martin Mueller
-2018-03-06
+2018-03-06
- Add three new CCAFS project tags to
input-forms.xml
(#357)
- Andrea from Macaroni Bros had sent me an email that CCAFS needs them
@@ -138,14 +138,14 @@ UPDATE 1
- Apply the proposed PostgreSQL indexes from DS-3636 (pull request #1791) on CGSpace (linode18)
-2018-03-07
+2018-03-07
- Add CIAT author Mauricio Efren Sotelo Cabrera to controlled vocabulary for ORCID identifiers (#360)
- Help Sisay proof 200 IITA records on DSpace Test
- Finally import Udana's 24 items to IWMI Journal Articles on CGSpace
- Skype with James Stapleton to discuss CGSpace, ILRI website, CKM staff issues, etc
-2018-03-08
+2018-03-08
- Looking at a CSV dump of the CIAT community I see there are tons of stupid text languages people add for their metadata
- This makes the CSV have tons of columns, for example
dc.title
, dc.title[]
, dc.title[en]
, dc.title[eng]
, dc.title[en_US]
and so on!
@@ -218,12 +218,12 @@ UPDATE 2309
- I added ORCID identifiers for 187 items by CIAT's Hernan Ceballos, because that is what Elizabeth was trying to do manually!
- Also, I decided to add ORCID identifiers for all records from Peter, Abenet, and Sisay as well
-2018-03-09
+2018-03-09
- Give James Stapleton input on Sisay's KRAs
- Create a pull request to disable ORCID authority integration for
dc.contributor.author
in the submission forms and XMLUI display (#363)
-2018-03-11
+2018-03-11
- Peter also wrote to say he is having issues with the Atmire Listings and Reports module
- When I logged in to try it I get a blank white page after continuing and I see this in dspace.log.2018-03-11:
@@ -242,11 +242,11 @@ org.apache.jasper.JasperException: java.lang.NullPointerException
- Looks like I needed to remove the Humidtropics subject from Listings and Reports because it was looking for the terms and couldn't find them
- I made a quick fix and it's working now (#364)
-2018-03-12
+2018-03-12
- Increase upload size on CGSpace's nginx config to 85MB so Sisay can upload some data
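That change is a single nginx directive (a sketch; where it sits in the config — `http`, `server`, or `location` block — is an assumption):

```nginx
# Allow request bodies up to 85 MB so large bitstream uploads succeed
client_max_body_size 85M;
```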
-2018-03-13
+2018-03-13
- I created a new Linode server for DSpace Test (linode6623840) so I could try the block storage stuff, but when I went to add a 300GB volume it said that block storage capacity was exceeded in that datacenter (Newark, NJ)
- I deleted the Linode and created another one (linode6624164) in the Fremont, CA region
@@ -258,14 +258,14 @@ org.apache.jasper.JasperException: java.lang.NullPointerException
- CCAFS publication page: https://ccafs.cgiar.org/publications/can-scenario-planning-catalyse-transformational-change-evaluating-climate-change-policy
- Peter tweeted the Handle link and now Altmetric shows the donut for both the DOI and the Handle
-2018-03-14
+2018-03-14
- Help Abenet with a troublesome Listings and Report question for CIAT author Steve Beebe
- Continue migrating DSpace Test to the new server (linode6624164)
- I emailed ILRI service desk to update the DNS records for dspacetest.cgiar.org
- Abenet was having problems saving Listings and Reports configurations or layouts but I tested it and it works
-2018-03-15
+2018-03-15
- Help Abenet troubleshoot the Listings and Reports issue again
- It looks like it's an issue with the layouts, if you create a new layout that only has one type (
dc.identifier.citation
):
@@ -281,7 +281,7 @@ org.apache.jasper.JasperException: java.lang.NullPointerException
- I submitted a ticket to Atmire: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=589
- Small fix to the example citation text in Listings and Reports (#365)
-2018-03-16
+2018-03-16
- ICT made the DNS updates for dspacetest.cgiar.org late last night
- I have removed the old server (linode02 aka linode578611) in favor of linode19 aka linode6624164
@@ -300,7 +300,7 @@ COPY 21
- Create a pull request to update the input forms for the new CRP subject style (#366)
-2018-03-19
+2018-03-19
- Tezira has been having problems accessing CGSpace from the ILRI Nairobi campus since last week
- She is getting an HTTPS error apparently
@@ -355,7 +355,7 @@ Exception in thread "http-bio-127.0.0.1-8081-exec-280" java.lang.OutOf
- The title is “Untitled” and there is some metadata but indeed the citation is missing
- I don't know what would cause that
-2018-03-20
+2018-03-20
- DSpace Test has been down for a few hours with SQL and memory errors starting this morning:
@@ -401,7 +401,7 @@ java.lang.IllegalArgumentException: No choices plugin was configured for field
- I have to figure that one out…
-2018-03-21
+2018-03-21
- Looks like the indexing gets confused that there is still data in the
authority
column
- Unfortunately this causes those items to simply not be indexed, which users noticed because item counts were cut in half and old items showed up in RSS!
@@ -466,17 +466,17 @@ sys 2m45.135s
- I need to be able to add many common characters though so that it is useful to copy and paste into a new project to find issues
-2018-03-22
+2018-03-22
-2018-03-24
+2018-03-24
- More work on the Ubuntu 18.04 readiness stuff for the Ansible playbooks
- The playbook now uses the system's Ruby and Node.js so I don't have to manually install RVM and NVM after
-2018-03-25
+2018-03-25
- Looking at Peter's author corrections and trying to work out a way to find errors in OpenRefine easily
- I can find all names that have acceptable characters using a GREL expression like:
@@ -520,16 +520,16 @@ $ ./delete-metadata-values.py -i /tmp/Delete-8-Authors-2018-03-21.csv -f dc.cont
- CGSpace took 76m28.292s
- DSpace Test took 194m56.048s
-2018-03-26
+2018-03-26
- Atmire got back to me about the Listings and Reports issue and said it's caused by items that have missing
dc.identifier.citation
fields
- They will send a fix
-2018-03-27
+2018-03-27
- Atmire got back with an updated quote about the DSpace 5.8 compatibility so I've forwarded it to Peter
-2018-03-28
+2018-03-28
- DSpace Test crashed due to heap space so I've increased it from 4096m to 5120m
- The error in Tomcat's
catalina.out
was:
diff --git a/docs/2018-04/index.html b/docs/2018-04/index.html
index 8ca6241a7..3955ea72a 100644
--- a/docs/2018-04/index.html
+++ b/docs/2018-04/index.html
@@ -23,7 +23,7 @@ Catalina logs at least show some memory errors yesterday:
I tried to test something on DSpace Test but noticed that it's down since god knows when
Catalina logs at least show some memory errors yesterday:
"/>
-
+
@@ -104,7 +104,7 @@ Catalina logs at least show some memory errors yesterday:
- 2018-04-01
+ 2018-04-01
- I tried to test something on DSpace Test but noticed that it's down since god knows when
- Catalina logs at least show some memory errors yesterday:
@@ -121,7 +121,7 @@ Exception in thread "ContainerBackgroundProcessor[StandardEngine[Catalina]]
- I posted a message on Yammer to ask if people are using the Duplicate Check step from the Metadata Quality Module
- Help Lili Szilagyi with a question about statistics on some CCAFS items
-2018-04-04
+2018-04-04
- Peter noticed that there were still some old CRP names on CGSpace, because I hadn't forced the Discovery index to be updated after I fixed the others last week
- For completeness I re-ran the CRP corrections on CGSpace:
@@ -168,7 +168,7 @@ $ git rebase -i dspace-5.8
- I need to send this branch to Atmire and also arrange payment (see ticket #560 in their tracker)
- Fix Sisay's SSH access to the new DSpace Test server (linode19)
-2018-04-05
+2018-04-05
- Fix Sisay's sudo access on the new DSpace Test server (linode19)
- The reindexing process on DSpace Test took forever yesterday:
@@ -192,7 +192,7 @@ sys 2m52.585s
- Proof some records on DSpace Test for Udana from IWMI
- He has done better with the small syntax and consistency issues but then there are larger concerns with not linking to DOIs, copying titles incorrectly, etc
-2018-04-10
+2018-04-10
- I got a notice that CGSpace CPU usage was very high this morning
- Looking at the nginx logs, here are the top users today so far:
@@ -344,7 +344,7 @@ UPDATE 1
- I told Udana to fix the citation and abstract of the one item, and to correct the
dc.language.iso
for the five Spanish items in his Book Chapters collection
- Then we can import the records to CGSpace
-2018-04-11
+2018-04-11
- DSpace Test (linode19) crashed again some time since yesterday:
@@ -353,16 +353,16 @@ UPDATE 1
- I ran all system updates and rebooted the server
-2018-04-12
+2018-04-12
-2018-04-13
+2018-04-13
- Add
PII-LAM_CSAGender
to CCAFS Phase II project tags in input-forms.xml
-2018-04-15
+2018-04-15
- While testing an XMLUI patch for DS-3883 I noticed that there is still some remaining Authority / Solr configuration left that we need to remove:
@@ -385,11 +385,11 @@ Total time: 4 minutes 12 seconds
- The Linode block storage is much slower than the instance storage
- I ran all system updates and rebooted DSpace Test (linode19)
-2018-04-16
+2018-04-16
- Communicate with Bioversity about their project to migrate their e-Library (Typo3) and Sci-lit databases to CGSpace
-2018-04-18
+2018-04-18
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/89347', '10568/88229', '10568/53086', '10568/53085', '10568/69069', '10568/53087', '10568/53088', '10568/53089', '10568/53090', '10568/53091', '10568/53092', '10568/70150', '10568/53093', '10568/64874', '10568/53094'))) group by text_value order by count desc) to /tmp/cip-authors.csv with csv;
-
2018-04-19
+2018-04-19
- Run updates on DSpace Test (linode19) and reboot the server
- Also try deploying updated GeoLite database during ant update while re-deploying code:
@@ -442,7 +442,7 @@ sys 2m2.687s
- This time is with about 70,000 items in the repository
-2018-04-20
+2018-04-20
- Gabriela from CIP emailed to say that CGSpace was returning a white page, but I haven't seen any emails from UptimeRobot
- I confirm that it's just giving a white page around 4:16
@@ -515,7 +515,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Time
-2018-04-24
+2018-04-24
- Testing my Ansible playbooks with a clean and updated installation of Ubuntu 18.04 and I fixed some issues that I hadn't run into a few weeks ago
- There seems to be a new issue with Java dependencies, though
@@ -529,7 +529,7 @@ org.apache.tomcat.jdbc.pool.PoolExhaustedException: [localhost-startStop-2] Time
- Also, I started porting PostgreSQL 9.6 into the Ansible infrastructure scripts
- This should be a drop-in replacement, I believe, though I will definitely test it more locally as well as on DSpace Test once we move to DSpace 5.8 and Ubuntu 18.04 in the coming months
-2018-04-25
+2018-04-25
- Still testing the Ansible infrastructure playbooks for Ubuntu 18.04, Tomcat 8.5, and PostgreSQL 9.6
- One other new thing I notice is that PostgreSQL 9.6 no longer uses
createuser
and nocreateuser
, as those have actually meant superuser
and nosuperuser
and have been deprecated for ten years
@@ -556,12 +556,12 @@ $ pg_restore -O -U dspacetest -d dspacetest -W -h localhost /tmp/dspace_2018-04-
- There's a Debian bug about this from a few weeks ago
- Apparently Tomcat was compiled with Java 9, so it doesn't work with Java 8
-2018-04-29
+2018-04-29
- DSpace Test crashed again, looks like memory issues again
- JVM heap size was last increased to 6144m but the system only has 8GB total so there's not much we can do here other than get a bigger Linode instance or remove the massive Solr Statistics data
-2018-04-30
+2018-04-30
- DSpace Test crashed again
- I will email the CGSpace team to ask them whether or not we want to commit to having a public test server that accurately mirrors CGSpace (ie, to upgrade to the next largest Linode)
diff --git a/docs/2018-05/index.html b/docs/2018-05/index.html
index 4b088446b..fa6d1d27d 100644
--- a/docs/2018-05/index.html
+++ b/docs/2018-05/index.html
@@ -35,7 +35,7 @@ http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E
Then I reduced the JVM heap size from 6144 back to 5120m
Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use
"/>
-
+
@@ -116,7 +116,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
- 2018-05-01
+ 2018-05-01
- I cleared the Solr statistics core on DSpace Test by issuing two commands directly to the Solr admin interface:
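Those two commands were most likely a delete-all followed by a commit against the statistics core (a sketch; port 3000 matches the tunneled Solr admin URL quoted at the top of this month's notes, and the URL-encoded bodies decode to `<delete><query>*:*</query></delete>` and `<commit/>`):

```shell
# Wipe the statistics core, then commit (sketch; assumes an SSH tunnel
# exposing the Solr admin interface on localhost:3000)
curl 'http://localhost:3000/solr/statistics/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E'
curl 'http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E'
```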
@@ -127,7 +127,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
- Then I reduced the JVM heap size from 6144 back to 5120m
- Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use
-2018-05-02
+2018-05-02
- Advise Fabio Fidanza about integrating CGSpace content in the new CGIAR corporate website
- I think they can mostly rely on using the
cg.contributor.crp
field
@@ -161,7 +161,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
-2018-05-03
+2018-05-03
- It turns out that the IITA records that I was helping Sisay with in March were imported in 2018-04 without a final check by Abenet or me
- There are lots of errors on language, CRP, and even some encoding errors on abstract fields
@@ -172,7 +172,7 @@ Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked
- Abenet sent a list of 46 ORCID identifiers for ILRI authors so I need to get their names using my resolve-orcids.py script and merge them into our controlled vocabulary
- On the messed up IITA records from 2018-04 I see sixty DOIs in incorrect format (cg.identifier.doi)
-2018-05-06
+2018-05-06
- Fixing the IITA records from Sisay, sixty DOIs have completely invalid format like
http:dx.doi.org10.1016j.cropro.2008.07.003
- I corrected all the DOIs and then checked them for validity with a quick bash loop:
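The loop itself is elided in this excerpt; one way such a check might look (a sketch, with `dois.txt` as a hypothetical file of corrected DOI URLs, one per line):

```shell
# Print the HTTP status code for each DOI URL (sketch; a 30x here means
# the DOI resolves, while 404 means the corrected form is still wrong)
while read -r doi; do
    code=$(curl -o /dev/null -s -w '%{http_code}' "$doi")
    echo "$code: $doi"
done < dois.txt
```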
@@ -218,7 +218,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
- I made a pull request (#373) for this that I'll merge some time next week (I'm expecting Atmire to get back to us about DSpace 5.8 soon)
- After testing quickly I just decided to merge it, and I noticed that I don't even need to restart Tomcat for the changes to get loaded
-2018-05-07
+2018-05-07
- I spent a bit of time playing with conciliator and Solr, trying to figure out how to reconcile columns in OpenRefine with data in our existing Solr cores (like CRP subjects)
- The documentation regarding the Solr stuff is limited, and I cannot figure out what all the fields in
conciliator.properties
are supposed to be
@@ -226,7 +226,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
- That, combined with splitting our multi-value fields on “||” in OpenRefine is amaaaaazing, because after reconciliation you can just join them again
- Oh wow, you can also facet on the individual values once you've split them! That's going to be amazing for proofing CRPs, subjects, etc.
-2018-05-09
+2018-05-09
- Udana asked about the Book Chapters we had been proofing on DSpace Test in 2018-04
- I told him that there were still some TODO items for him on that data, for example to update the
dc.language.iso
field for the Spanish items
@@ -271,7 +271,7 @@ Livestock and Fish
- I tried to reconcile against a CSV of our countries but reconcile-csv crashes
-2018-05-13
+2018-05-13
- It turns out there was a space in my “country” header that was causing reconcile-csv to crash
- After removing that it works fine!
@@ -291,12 +291,12 @@ Livestock and Fish
-2018-05-14
+2018-05-14
- Send a message to the OpenRefine mailing list about the bug with reconciling multi-value cells
- Help Silvia Alonso get a list of all her publications since 2013 from Listings and Reports
-2018-05-15
+2018-05-15
- Turns out I was doing the OpenRefine reconciliation wrong: I needed to copy the matched values to a new column!
- Also, I learned how to do something cool with Jython expressions in OpenRefine
@@ -358,7 +358,7 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
- I copied over the DSpace
search_text
field type from the DSpace Solr config (had to remove some properties so Solr would start) but it doesn't seem to be any better at matching than the text_en
type
- I think I need to focus on trying to return scores with conciliator
-2018-05-16
+2018-05-16
- Discuss GDPR with James Stapleton
@@ -381,7 +381,7 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
- According to the analytics.js protocol parameter documentation this means that IPs are being anonymized
- After finding and fixing some duplicates in IITA's
IITA_April_27
test collection on DSpace Test (10568/92703) I told Sisay that he can move them to IITA's Journal Articles collection on CGSpace
-2018-05-17
+2018-05-17
- Testing reconciliation of countries against Solr via conciliator, I notice that
CÔTE D'IVOIRE
doesn't match COTE D'IVOIRE
, whereas with reconcile-csv it does
- Also, when reconciling regions against Solr via conciliator
EASTERN AFRICA
doesn't match EAST AFRICA
, whereas with reconcile-csv it does
@@ -401,23 +401,23 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
- This cookie could be set by a user clicking a link in a privacy policy, for example
- The additional Javascript could be easily added to our existing
googleAnalytics
template in each XMLUI theme
-2018-05-18
+2018-05-18
-2018-05-20
+2018-05-20
- Run all system updates on DSpace Test (linode19), re-deploy DSpace with latest
5_x-dev
branch (including GDPR IP anonymization), and reboot the server
- Run all system updates on CGSpace (linode18), re-deploy DSpace with latest
5_x-dev
branch (including GDPR IP anonymization), and reboot the server
-2018-05-21
+2018-05-21
- Geoffrey from IITA got back with more questions about depositing items programmatically into the CGSpace workflow
- I pointed out that SWORD might be an option, as DSpace supports the SWORDv2 protocol (although we have never tested it)
- Work on implementing cookie consent popup for all XMLUI themes (SASS theme with primary / secondary branding from Bootstrap)
-2018-05-22
+2018-05-22
- Skype with James Stapleton about last minute GDPR wording
- After spending yesterday working on integration and theming of the cookieconsent popup, today I cannot get the damn “Agree” button to dismiss the popup!
@@ -427,7 +427,7 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
- This is a waste of TWO full days of work
- Marissa Van Epp asked if I could add
PII-FP1_PACCA2
to the CCAFS phase II project tags on CGSpace so I created a ticket to track it (#376)
-2018-05-23
+2018-05-23
- I'm investigating how many non-CGIAR users we have registered on CGSpace:
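The actual query is not shown in this excerpt; a hypothetical version of the idea, assuming the DSpace 5 `eperson` table and filtering on the email domain:

```shell
# Count registered accounts whose email is not a CGIAR address
# (sketch; the column names assume the DSpace 5 schema)
psql dspace -c "SELECT COUNT(*) FROM eperson WHERE email NOT LIKE '%cgiar.org';"
```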
@@ -439,14 +439,14 @@ $ ./bin/post -c countries ~/src/git/DSpace/2018-05-10-countries.csv
- I made a pull request for the GDPR compliance popup (#377) and merged it to the
5_x-prod
branch
- I will deploy it to CGSpace tonight
-2018-05-28
+2018-05-28
- Daniel Haile-Michael sent a message that CGSpace was down (I am currently in Oregon so the time difference is ~10 hours)
- I looked in the logs but didn't see anything that would be the cause of the crash
- Atmire finalized the DSpace 5.8 testing and sent a pull request: https://github.com/ilri/DSpace/pull/378
- They have asked if I can test this and get back to them by June 11th
-2018-05-30
+2018-05-30
- Talk to Samantha from Bioversity about something related to Google Analytics, I'm still not sure what they want
- DSpace Test crashed last night, seems to be related to system memory (not JVM heap)
@@ -479,7 +479,7 @@ $ sed 's/.*Item1.*/\n&/g' ~/cifor-duplicates.txt > ~/cifor-duplicates-cle
- Then I format the list of handles and put it into this SQL query to export authors from items ONLY in those collections (too many to list here):
dspace=# \copy (select distinct text_value, count(*) from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 AND resource_id IN (select item_id from collection2item where collection_id IN (select resource_id from handle where handle in ('10568/67236','10568/67274',...))) group by text_value order by count desc) to /tmp/ilri-authors.csv with csv;
-
2018-05-31
+2018-05-31
- Clarify CGSpace's usage of Google Analytics and personally identifiable information during user registration for Bioversity team who had been asking about GDPR compliance
- Testing running PostgreSQL in a Docker container on localhost because when I'm on Arch Linux there isn't an easily installable package for particular PostgreSQL versions
diff --git a/docs/2018-06/index.html b/docs/2018-06/index.html
index 808311cc1..759b0831d 100644
--- a/docs/2018-06/index.html
+++ b/docs/2018-06/index.html
@@ -55,7 +55,7 @@ real 74m42.646s
user 8m5.056s
sys 2m7.289s
"/>
-
+
@@ -136,7 +136,7 @@ sys 2m7.289s
- 2018-06-04
+ 2018-06-04
- Test the DSpace 5.8 module upgrades from Atmire (#378)
@@ -156,13 +156,13 @@ sys 2m7.289s
real 74m42.646s
user 8m5.056s
sys 2m7.289s
-2018-06-06
+2018-06-06
- It turns out that I needed to add a server block for
atmire.com-snapshots
to my Maven settings, so now the Atmire code builds
- Now Maven and Ant run properly, but I'm getting SQL migration errors in
dspace.log
after starting Tomcat
- I've updated my ticket on Atmire's bug tracker: https://tracker.atmire.com/tickets-cgiar-ilri/view-ticket?id=560
-2018-06-07
+2018-06-07
- Proofing 200 IITA records on DSpace Test for Sisay: IITA_Junel_06 (10568/95391)
@@ -201,13 +201,13 @@ update schema_version set version = '5.8.2015.12.03.3' where version = '5.5.2015
- I will apply them on CGSpace tomorrow I think…
-2018-06-09
+2018-06-09
- It's pretty annoying, but the JVM monitoring for Munin was never set up when I migrated DSpace Test to its new server a few months ago
- I ran the tomcat and munin-node tags in Ansible again and now the stuff is all wired up and recording stats properly
- I applied the CIP author corrections on CGSpace and DSpace Test and re-ran the Discovery indexing
-2018-06-10
+2018-06-10
- I spent some time removing the Atmire Metadata Quality Module (MQM) from the proposed DSpace 5.8 changes
- After removing all code mentioning MQM, mqm, metadata-quality, batchedit, duplicatechecker, etc, I think I got most of it removed, but there is a Spring error during Tomcat startup:
@@ -237,7 +237,7 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
- I will have to tell IITA people to redo these entirely I think…
-2018-06-11
+2018-06-11
- Sisay sent a new version of the last IITA records that he created from the original CSV from IITA
- The 200 records are in the IITA_Junel_11 (10568/95870) collection
@@ -265,7 +265,7 @@ Failed to startup the DSpace Service Manager: failure starting up spring service
- I always use the built-in trim and collapse transformations anyways, but this seems to work to find the offending cells:
isNotNull(value.match(/.*?\s{2,}.*?/))
- I wonder if I should start checking for “smart” quotes like ’ (hex 2019)
-2018-06-12
+2018-06-12
-2018-06-13
+2018-06-13
- Elizabeth from CIAT contacted me to ask if I could add ORCID identifiers to all of Robin Buruchara's items
- I used my add-orcid-identifiers-csv.py script:
@@ -365,14 +365,14 @@ Error: ERROR: update or delete on table "bitstream" violates foreign k
$ psql dspace -c 'update bundle set primary_bitstream_id=NULL where primary_bitstream_id in (152402);'
UPDATE 1
-
2018-06-14
+2018-06-14
-2018-06-24
+2018-06-24
- I was restoring a PostgreSQL dump on my test machine and found a way to restore the CGSpace dump as the
postgres
user, but have the owner of the schema be the dspacetest
user:
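A sketch of that pg_restore invocation (the dump filename is hypothetical; `--no-owner` plus `--role` is the standard way to get this effect):

```shell
# Restore as the postgres superuser while making dspacetest own the
# restored objects: -O (--no-owner) skips the ownership commands recorded
# in the dump, and --role switches to dspacetest before creating anything
pg_restore -U postgres -d dspacetest -O --role=dspacetest -h localhost /tmp/cgspace.backup
```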
@@ -427,7 +427,7 @@ Done.
"Jarvis, A.",Andy Jarvis: 0000-0001-6543-0798
"Jarvis, Andy",Andy Jarvis: 0000-0001-6543-0798
"Jarvis, Andrew",Andy Jarvis: 0000-0001-6543-0798
-2018-06-26
+2018-06-26
- Atmire got back to me to say that we can remove the
itemCollectionPlugin
and HasBitstreamsSSIPlugin
beans from DSpace's discovery.xml
file, as they are used by the Metadata Quality Module (MQM) that we are not using anymore
- I removed both those beans and did some simple tests to check item submission, media-filter of PDFs, REST API, but got an error “No matches for the query” when listing records in OAI
@@ -438,7 +438,7 @@ Done.
- It's actually only a warning and it also appears in the logs on DSpace Test (which is currently running DSpace 5.5), so I need to keep troubleshooting
- Ah, I think I just need to run
dspace oai import
-2018-06-27
+2018-06-27
- Vika from CIFOR sent back his annotations on the duplicates for the “CIFOR_May_9” archive import that I sent him last week
- I'll have to figure out how to separate those we're keeping, deleting, and mapping into CIFOR's archive collection
@@ -471,7 +471,7 @@ $ sed '/^id/d' 10568-*.csv | csvcut -c 1,2 > map-to-cifor-archive.csv
- After deleting the 62 duplicates, mapping the 50 items from elsewhere in CGSpace, and uploading 2,398 unique items, there are a total of 2,448 items added in this batch
- I'll let Abenet take one last look and then move them to CGSpace
-2018-06-28
+2018-06-28
- DSpace Test appears to have crashed last night
- There is nothing in the Tomcat or DSpace logs, but I see the following in
dmesg -T
:
diff --git a/docs/2018-07/index.html b/docs/2018-07/index.html
index b858e9f77..c6c776242 100644
--- a/docs/2018-07/index.html
+++ b/docs/2018-07/index.html
@@ -33,7 +33,7 @@ During the mvn package stage on the 5.8 branch I kept getting issues with java r
There is insufficient memory for the Java Runtime Environment to continue.
"/>
-
+
@@ -114,7 +114,7 @@ There is insufficient memory for the Java Runtime Environment to continue.
- 2018-07-01
+ 2018-07-01
- I want to upgrade DSpace Test to DSpace 5.8 so I took a backup of its current database just in case:
@@ -147,12 +147,12 @@ $ dspace database migrate ignored
- After that I started Tomcat 7 and DSpace seems to be working, now I need to tell our colleagues to try stuff and report issues they have
-2018-07-02
+2018-07-02
-2018-07-03
+2018-07-03
- Finally finish with the CIFOR Archive records (a total of 2448):
@@ -213,7 +213,7 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
- Gotta check that out later…
-2018-07-04
+2018-07-04
- I verified that the autowire error indeed only occurs on Tomcat 8.5, but the application works fine on Tomcat 7
- I have raised this in the DSpace 5.8 compatibility ticket on Atmire's tracker
@@ -221,12 +221,12 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
- Also, Udana wants me to add “Enhancing Sustainability Across Agricultural Systems” to the WLE Phase II research themes so I created a ticket to track that (#382)
- I need to try to finish this DSpace 5.8 business first because I have too many branches with cherry-picks going on right now!
-2018-07-06
+2018-07-06
- CCAFS want me to add “PII-FP2_MSCCCAFS” to their Phase II project tags on CGSpace (#383)
- I'll do it in a batch with all the other metadata updates next week
-2018-07-08
+2018-07-08
- I was tempted to do the Linode instance upgrade on CGSpace (linode18), but after looking closely at the system backups I noticed that Solr isn't being backed up to S3
- I apparently noticed this—and fixed it!—in 2016-07, but it doesn't look like the backup has been updated since then!
@@ -246,7 +246,7 @@ $ ./resolve-orcids.py -i /tmp/2018-07-08-orcids.txt -o /tmp/2018-07-08-names.txt
- But after comparing to the existing list of names I didn't see much change, so I just ignored it
-2018-07-09
+2018-07-09
- Uptime Robot said that CGSpace was down for two minutes early this morning but I don't see anything in Tomcat logs or dmesg
- Uptime Robot said that CGSpace was down for two minutes again later in the day, and this time I saw a memory error in Tomcat's
catalina.out
:
@@ -295,7 +295,7 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
- Interestingly, the first time that I see
35.227.26.162
was on 2018-06-08
- I've added
35.227.26.162
to the bot tagging logic in the nginx vhost
-2018-07-10
+2018-07-10
- Add “United Kingdom government” to sponsors (#381)
- Add “Enhancing Sustainability Across Agricultural Systems” to WLE Phase II Research Themes (#382)
@@ -325,7 +325,7 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
- He said there was a bug that caused his app to request a bunch of invalid URLs
- I'll have to keep an eye on this and see how their platform evolves
-2018-07-11
+2018-07-11
- Skype meeting with Peter and Addis CGSpace team
@@ -336,7 +336,7 @@ org.apache.solr.client.solrj.SolrServerException: IOException occured when talki
-2018-07-12
+2018-07-12
- Uptime Robot said that CGSpace went down a few times last night, around 10:45 PM and 12:30 AM
- Here are the top ten IPs from last night and this morning:
@@ -396,13 +396,13 @@ $ csvcut -c 1 < /tmp/affiliations.csv > /tmp/affiliations-1.csv
- We also need to discuss standardizing our countries and comparing our ORCID iDs
-2018-07-13
+2018-07-13
- Generate a list of affiliations for Peter and Abenet to go over so we can batch correct them before we deploy the new data visualization dashboard:
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'affiliation') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/affiliations.csv with csv header;
COPY 4518
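The `\copy` above groups distinct affiliation values and counts them. The same tally can be reproduced offline on an exported CSV with only the standard library — a sketch, using made-up sample values rather than the real `affiliations.csv`:

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the exported affiliations.csv
sample_csv = """text_value
International Livestock Research Institute
International Livestock Research Institute
Bioversity International
"""

reader = csv.DictReader(io.StringIO(sample_csv))
counts = Counter(row["text_value"] for row in reader)

# Sort by count descending, like the ORDER BY count DESC in the SQL
for value, count in counts.most_common():
    print(f"{value},{count}")
```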
-2018-07-15
+2018-07-15
- Run all system updates on CGSpace, add latest metadata changes from last week, and start the Linode instance upgrade
- After the upgrade I see we have more disk space available in the instance's dashboard, so I shut the instance down and resized it from 392GB to 650GB
@@ -447,7 +447,7 @@ $ ./resolve-orcids.py -i /tmp/2018-07-15-orcid-ids.txt -o /tmp/2018-07-15-resolv
- I will check with the CGSpace team to see if they want me to add these to CGSpace
- Help Udana from WLE understand some Altmetrics concepts
-2018-07-18
+2018-07-18
- ICARDA sent me another refined list of ORCID iDs so I sorted and formatted them into our controlled vocabulary again
- Participate in call with IWMI and WLE to discuss Altmetric, CGSpace, and social media
@@ -486,7 +486,7 @@ Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
-2018-07-19
+2018-07-19
- I tested a submission via SAF bundle to DSpace 5.8 and it worked fine
- In addition to testing DSpace 5.8, I specifically wanted to see if the issue with specifying collections in metadata instead of on the command line would work (DS-3583)
@@ -497,7 +497,7 @@ X-XSS-Protection: 1; mode=block
- I told her that they need to start using more accurate dates for their issue dates
- In the example item I looked at the DOI has a publish date of 2018-03-16, so they should really try to capture that
-2018-07-22
+2018-07-22
- I told the IWMI people that they can use
sort_by=3
in their OpenSearch query to sort the results by dc.date.accessioned
instead of dc.date.issued
- They say that it is a burden for them to capture the issue dates, so I cautioned them that this is in their own benefit for future posterity and that everyone else on CGSpace manages to capture the issue dates!
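For reference, an OpenSearch query with `sort_by=3` is just a query-string parameter. A minimal sketch of building such a URL — the endpoint path and the sample query term are assumptions, and the meaning of `sort_by=3` (dc.date.accessioned) is specific to our configuration as noted above:

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real OpenSearch path depends on the DSpace configuration
base = "https://cgspace.cgiar.org/open-search/discover"

# sort_by=3 maps to dc.date.accessioned in our configuration (per the note above)
params = {"query": "water", "sort_by": 3, "order": "desc"}
url = f"{base}?{urlencode(params)}"
print(url)
```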
@@ -510,7 +510,7 @@ X-XSS-Protection: 1; mode=block
- I finally informed Atmire that we're ready to proceed with deploying this to CGSpace and that they should advise whether we should wait about the SNAPSHOT versions in
pom.xml
- There is no word on the issue I reported with Tomcat 8.5.32 yet, though…
-2018-07-23
+2018-07-23
- Still discussing dates with IWMI
- I looked in the database to see the breakdown of date formats used in
dc.date.issued
, ie YYYY, YYYY-MM, or YYYY-MM-DD:
@@ -532,11 +532,11 @@ dspace=# select count(text_value) from metadatavalue where resource_type_id=2 an
- So it looks like YYYY is the most numerous, followed by YYYY-MM-DD, then YYYY-MM
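The YYYY / YYYY-MM / YYYY-MM-DD breakdown above can be reproduced with a few regexes. A sketch over made-up sample values (the real counts come from the `dc.date.issued` queries in the database):

```python
import re
from collections import Counter

# Made-up sample of dc.date.issued values
dates = ["2018", "2018-07", "2018-07-23", "1995", "2001-12-01"]

patterns = {
    "YYYY": re.compile(r"^\d{4}$"),
    "YYYY-MM": re.compile(r"^\d{4}-\d{2}$"),
    "YYYY-MM-DD": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

counts = Counter()
for d in dates:
    for label, pattern in patterns.items():
        if pattern.match(d):
            counts[label] += 1
            break

print(counts)
```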
-2018-07-26
+2018-07-26
- Run system updates on DSpace Test (linode19) and reboot the server
-2018-07-27
+2018-07-27
- Follow up with Atmire again about the SNAPSHOT versions in our
pom.xml
because I want to finalize the DSpace 5.8 upgrade soon and I haven't heard from them in a month (ticket 560)
diff --git a/docs/2018-08/index.html b/docs/2018-08/index.html
index eebf2bc73..00b5e79ed 100644
--- a/docs/2018-08/index.html
+++ b/docs/2018-08/index.html
@@ -43,7 +43,7 @@ Anyways, perhaps I should increase the JVM heap from 5120m to 6144m like we did
The server only has 8GB of RAM so we'll eventually need to upgrade to a larger one because we'll start starving the OS, PostgreSQL, and command line batch processes
I ran all system updates on DSpace Test and rebooted it
"/>
-
+
@@ -124,7 +124,7 @@ I ran all system updates on DSpace Test and rebooted it
- 2018-08-01
+ 2018-08-01
- DSpace Test had crashed at some point yesterday morning and I see the following in
dmesg
:
@@ -149,7 +149,7 @@ I ran all system updates on DSpace Test and rebooted it
-2018-08-02
+2018-08-02
- DSpace Test crashed again, and the only error I see is this in
dmesg
:
@@ -165,7 +165,7 @@ I ran all system updates on DSpace Test and rebooted it
- I just tried to enable the stats again on DSpace Test now that we're on DSpace 5.8 with updated Atmire modules, but every user I search for shows “No data available”
- As a test I submitted a new item and I was able to see it in the workflow statistics “data” tab, but not in the graph
-2018-08-15
+2018-08-15
- Run through Peter's list of author affiliations from earlier this month
- I did some quick sanity checks and small cleanups in Open Refine, checking for spaces, weird accents, and encoding errors
@@ -173,7 +173,7 @@ I ran all system updates on DSpace Test and rebooted it
$ ./fix-metadata-values.py -i 2018-08-15-Correct-1083-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -t correct -m 211
$ ./delete-metadata-values.py -i 2018-08-15-Remove-11-Affiliations.csv -db dspace -u dspace -p 'fuuu' -f cg.contributor.affiliation -m 211
-2018-08-16
+2018-08-16
- Generate a list of the top 1,500 authors on CGSpace for Sisay so he can create the controlled vocabulary:
@@ -194,7 +194,7 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest ~/Downloads/cgspace_2018-08-16.backup
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
-2018-08-19
+2018-08-19
- Keep working on the CIAT ORCID identifiers from Elizabeth
- In the spreadsheet she sent me there are some names with other versions in the database, so when it is obviously the same one (ie “Schultze-Kraft, Rainer” and “Schultze-Kraft, R.”) I will just tag them with ORCID identifiers too
@@ -296,7 +296,7 @@ sys 2m20.248s
- So I'm thinking we should add “crawl” to the Tomcat Crawler Session Manager valve, as we already have “bot” that catches Googlebot, Bingbot, etc.
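The valve is configured in Tomcat's `server.xml`; adding “crawl” means agents like “crawler” share one session the same way “bot” agents do. A config sketch — the exact regex here is an assumption, not our deployed configuration:

```xml
<!-- Inside the <Engine> or <Host> element of Tomcat's server.xml.
     The regex is a sketch: "crawl" added alongside the existing "bot" match. -->
<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
       crawlerUserAgents=".*[bB]ot.*|.*[cC]rawl.*|.*Yahoo! Slurp.*|.*Feedfetcher-Google.*" />
```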
-2018-08-20
+2018-08-20
- Help Sisay with some UTF-8 encoding issues in a file Peter sent him
- Finish up reconciling Atmire's pull request for DSpace 5.8 changes with the latest status of our
5_x-prod
branch
@@ -313,7 +313,7 @@ sys 2m20.248s
- Instead, I will archive the current
5_x-prod
DSpace 5.5 branch as 5_x-prod-dspace-5.5
and then hard reset 5_x-prod
based on 5_x-dspace-5.8
- Unfortunately this will mess up the references in pull requests and issues on GitHub
-2018-08-21
+2018-08-21
- Something must have happened, as the
mvn package
always takes about two hours now, stopping for a very long time near the end at this step:
@@ -335,7 +335,7 @@ sys 2m20.248s
- I need to test to see if this has any side effects when deployed…
- In other news, I see there was a pull request in DSpace 5.9 that fixes the issue with not being able to have blank lines in CSVs when importing via command line or webui (DS-3245)
-2018-08-23
+2018-08-23
- Skype meeting with CKM people to meet new web dev guy Tariku
- They say they want to start working on the ContentDM harvester middleware again
@@ -345,7 +345,7 @@ sys 2m20.248s
- I imported the CTA items on CGSpace for Sisay:
$ dspace import -a -e s.webshet@cgiar.org -s /home/swebshet/ictupdates_uploads_August_21 -m /tmp/2018-08-23-cta-ictupdates.map
-2018-08-26
+2018-08-26
- Doing the DSpace 5.8 upgrade on CGSpace (linode18)
- I already finished the Maven build, now I'll take a backup of the PostgreSQL database and do a database cleanup just in case:
@@ -401,14 +401,14 @@ $ dspace database migrate ignored
- I just checked to see if the Listings and Reports issue with using the CGSpace citation field was fixed as planned alongside the DSpace 5.8 upgrades (#589)
- I was able to create a new layout containing only the citation field, so I closed the ticket
-2018-08-29
+2018-08-29
- Discuss COPO with Martin Mueller
- He and the consortium's idea is to use this for metadata annotation (submission?) to all repositories
- It is somehow related to adding events as items in the repository, and then linking related papers, presentations, etc to the event item using
dc.relation
, etc.
- Discuss Linode server charges with Abenet, apparently we want to start charging these to Big Data
-2018-08-30
+2018-08-30
- I fixed the graphical glitch in the cookieconsent popup (the dismiss bug is still there) by pinning the last known good version (3.0.6) in
bower.json
of each XMLUI theme
- I guess cookieconsent got updated without me realizing it and the previous expression
^3.0.6
made bower install version 3.1.0
diff --git a/docs/2018-09/index.html b/docs/2018-09/index.html
index 4cd5c388b..922eb9882 100644
--- a/docs/2018-09/index.html
+++ b/docs/2018-09/index.html
@@ -27,7 +27,7 @@ I'll update the DSpace role in our Ansible infrastructure playbooks and run
Also, I'll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system's RAM, and we never re-ran them after migrating to larger Linodes last month
I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I'm getting those autowire errors in Tomcat 8.5.30 again:
"/>
-
+
@@ -108,7 +108,7 @@ I'm testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I
- 2018-09-02
+ 2018-09-02
- New PostgreSQL JDBC driver version 42.2.5
- I'll update the DSpace role in our Ansible infrastructure playbooks and run the updated playbooks on CGSpace and DSpace Test
@@ -139,7 +139,7 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
- And the
5_x-prod
DSpace 5.8 branch does work in Tomcat 8.5.x on my Arch Linux laptop…
- I'm not sure where the issue is then!
-2018-09-03
+2018-09-03
- Abenet says she's getting three emails about periodic statistics reports every day since the DSpace 5.8 upgrade last week
- They are from the CUA module
@@ -148,7 +148,7 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
- She will try to click the “Unsubscribe” link in the first two to see if it works, otherwise we should contact Atmire
- The only one she remembers subscribing to is the top downloads one
-2018-09-04
+2018-09-04
- I'm looking over the latest round of IITA records from Sisay: Mercy1806_August_29
@@ -171,7 +171,7 @@ Caused by: java.lang.RuntimeException: Failed to startup the DSpace Service Mana
- Abenet says she hasn't received any more subscription emails from the CUA module since she unsubscribed yesterday, so I think we don't need to create an issue on Atmire's bug tracker anymore
-2018-09-10
+2018-09-10
- Playing with strest to test the DSpace REST API programmatically
- For example, given this
test.yaml
:
@@ -287,7 +287,7 @@ X-XSS-Protection: 1; mode=block
- I will have to keep an eye on it and perhaps add it to the list of “bad bots” that get rate limited
-2018-09-12
+2018-09-12
- Merge AReS explorer changes to nginx config and deploy on CGSpace so CodeObia can start testing more
- Re-create my local Docker container for PostgreSQL data, but using a volume for the database data:
@@ -301,7 +301,7 @@ $ sudo docker run --name dspacedb -v dspacetest_data:/var/lib/postgresql/data -e
- I told Sisay to run the XML file through tidy
- More testing of the access and usage rights changes
-2018-09-13
+2018-09-13
- Peter was communicating with Altmetric about the OAI mapping issue for item 10568/82810 again
- Altmetric said it was somehow related to the OAI
dateStamp
not getting updated when the mappings changed, but I said that back in 2018-07 when this happened it was because the OAI was actually just not reflecting all the item's mappings
@@ -348,12 +348,12 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
- Must have been something like an old DSpace 5.5 file in the spring folder… weird
- But yay, this means we can update DSpace Test to Ubuntu 18.04, Tomcat 8, PostgreSQL 9.6, etc…
-2018-09-14
+2018-09-14
- Sisay uploaded the IITA records to CGSpace, but forgot to remove the old Handles
- I explicitly told him not to forget to remove them yesterday!
-2018-09-16
+2018-09-16
- Add the DSpace build.properties as a template into my Ansible infrastructure scripts for configuring DSpace machines
- One stupid thing there is that I add all the variables in a private vars file, which is apparently higher precedence than host vars, meaning that I can't override them (like SMTP server) on a per-host basis
@@ -361,7 +361,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
- I suggested that we leave access rights (
cg.identifier.access
) as it is now, with “Open Access” or “Limited Access”, and then simply re-brand that as “Access rights” in the UIs and relevant drop downs
- Then we continue as planned to add
dc.rights
as “Usage rights”
-2018-09-17
+2018-09-17
- Skype meeting with CGSpace team in Addis
- Change
cg.identifier.status
“Access rights” options to:
@@ -418,7 +418,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
- That one returns 766, which is exactly 1655 minus 889…
- Also, Solr's
fq
is similar to the regular q
query parameter, but it is considered for the Solr query cache so it should be faster for multiple queries
-2018-09-18
+2018-09-18
- I managed to create a simple proof of concept REST API to expose item view and download statistics: cgspace-statistics-api
- It uses the Python-based Falcon web framework and talks to Solr directly using the SolrClient library (which seems to have issues in Python 3.7 currently)
@@ -439,12 +439,12 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
- The rest of the Falcon tooling will be more difficult…
-2018-09-19
+2018-09-19
-2018-09-20
+2018-09-20
- Contact Atmire to ask how we can buy more credits for future development
- I researched the Solr
filterCache
size and I found out that the formula for calculating the potential memory use of each entry in the cache is:
@@ -460,7 +460,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
- Article discussing testing methodology for different
filterCache
sizes
- Discuss Handle links on Twitter with IWMI
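The sizing formula referenced above isn't visible in this hunk. A commonly cited rule of thumb — an assumption here, not necessarily the formula the note refers to — is that each `filterCache` entry stores a bitset over all documents in the core, roughly `maxDoc / 8` bytes per entry:

```python
# Rough filterCache sizing: one bit per document per cached filter.
# This is the common rule of thumb, assumed here, not quoted from the article above.
def filter_cache_bytes(max_doc: int, entries: int) -> int:
    """Approximate memory for `entries` cached filters over `max_doc` documents."""
    return entries * (max_doc // 8)

# e.g. a hypothetical core with 100 million docs and a 512-entry cache
print(filter_cache_bytes(100_000_000, 512) / 1024**3, "GiB")
```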
-2018-09-21
+2018-09-21
- I see that there was a nice optimization to the ImageMagick PDF CMYK detection in the upstream
dspace-5_x
branch: DS-3664
- The fix will go into DSpace 5.10, and we are currently on DSpace 5.8 but I think I'll cherry-pick that fix into our
5_x-prod
branch:
@@ -475,14 +475,14 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=50.116.102.77' dspace.log.2018-09-
-2018-09-23
+2018-09-23
- I did more work on my cgspace-statistics-api, fixing some item view counts and adding indexing via SQLite (I'm trying to avoid having to set up yet another database, user, password, etc) during deployment
- I created a new branch called
5_x-upstream-cherry-picks
to test and track those cherry-picks from the upstream 5.x branch
- Also, I need to test the new LDAP server, so I will deploy that on DSpace Test today
- Rename my cgspace-statistics-api to dspace-statistics-api on GitHub
-2018-09-24
+2018-09-24
- Trying to figure out how to get item views and downloads from SQLite in a join
- It appears SQLite doesn't support
FULL OUTER JOIN
so some people on StackOverflow have emulated it with LEFT JOIN
and UNION
:
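A runnable sketch of that LEFT JOIN + UNION emulation (older SQLite lacks `FULL OUTER JOIN`; it was only added in 3.39). The table and column names mirror the views/downloads idea but are assumptions, not the actual indexer schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical per-item views and downloads tables
cur.execute("CREATE TABLE itemviews (id INT PRIMARY KEY, views INT)")
cur.execute("CREATE TABLE itemdownloads (id INT PRIMARY KEY, downloads INT)")
cur.executemany("INSERT INTO itemviews VALUES (?, ?)", [(1, 10), (2, 5)])
cur.executemany("INSERT INTO itemdownloads VALUES (?, ?)", [(2, 3), (3, 7)])

# Emulate FULL OUTER JOIN with two LEFT JOINs and a UNION (which de-duplicates)
rows = cur.execute("""
    SELECT v.id, v.views, d.downloads
      FROM itemviews v LEFT JOIN itemdownloads d ON v.id = d.id
    UNION
    SELECT d.id, v.views, d.downloads
      FROM itemdownloads d LEFT JOIN itemviews v ON d.id = v.id
    ORDER BY 1
""").fetchall()
print(rows)  # item 1 has no downloads, item 3 has no views
```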
@@ -539,7 +539,7 @@ $ createuser -h localhost -U postgres --pwprompt dspacestatistics
$ psql -h localhost -U postgres dspacestatistics
dspacestatistics=> CREATE TABLE IF NOT EXISTS items
dspacestatistics-> (id INT PRIMARY KEY, views INT DEFAULT 0, downloads INT DEFAULT 0)
-2018-09-25
+2018-09-25
- I deployed the DSpace statistics API on CGSpace, but when I ran the indexer it wanted to index 180,000 pages of item views
- I'm not even sure how that's possible, as we only have 74,000 items!
@@ -586,7 +586,7 @@ Indexing item downloads (page 260 of 260)
- And now it's fast as hell due to the muuuuch smaller Solr statistics core
-2018-09-26
+2018-09-26
- Linode emailed to say that CGSpace (linode18) was using 30Mb/sec of outward bandwidth for two hours around midnight
- I don't see anything unusual in the nginx logs, so perhaps it was the cron job that syncs the Solr database to Amazon S3?
@@ -616,7 +616,7 @@ sys 2m18.485s
- I updated the dspace-statistics-api to use psycopg2's
execute_values()
to insert batches of 100 values into PostgreSQL instead of doing every insert individually
- On CGSpace this reduces the total run time of
indexer.py
from 432 seconds to 400 seconds (most of the time is actually spent in getting the data from Solr though)
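`execute_values()` needs a live PostgreSQL to demonstrate, so here is a stdlib-only sketch of the same batching idea using `sqlite3.executemany()` in chunks of 100 — the table name and columns are assumptions:

```python
import sqlite3

def insert_in_batches(cur, rows, batch_size=100):
    """Insert rows in batches instead of one statement per row,
    mirroring what psycopg2's execute_values() does for PostgreSQL."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cur.executemany(
            "INSERT INTO items (id, views, downloads) VALUES (?, ?, ?)", batch
        )

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items (id INT PRIMARY KEY, views INT, downloads INT)")
rows = [(i, i * 2, i * 3) for i in range(250)]
insert_in_batches(cur, rows)
print(cur.execute("SELECT COUNT(*) FROM items").fetchone()[0])
```

Fewer round trips per row is where the time savings comes from, which is why most of the remaining runtime is the Solr queries rather than the inserts.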
-2018-09-27
+2018-09-27
- Linode emailed to say that CGSpace's (linode19) CPU load was high for a few hours last night
- Looking in the nginx logs around that time I see some new IPs that look like they are harvesting things:
@@ -645,7 +645,7 @@ $ grep -c -E 'session_id=[A-Z0-9]{32}:ip_addr=68.6.87.12' dspace.log.2018-09-26
- I will add their IPs to the list of bad bots in nginx so we can add a “bot” user agent to them and let Tomcat's Crawler Session Manager Valve handle them
- I asked Atmire to prepare an invoice for 125 credits
-2018-09-29
+2018-09-29
- I merged some changes to author affiliations from Sisay as well as some corrections to organizational names using smart quotes like
Université d’Abomey Calavi
(#388)
- Peter sent me a list of 43 author names to fix, but it had some encoding errors like
Belalcázar, John
like usual (I will tell him to stop trying to export as UTF-8 because it never seems to work)
@@ -662,7 +662,7 @@ $ ./fix-metadata-values.py -i 2018-09-29-fix-authors.csv -db dspace -u dspace -p
- It seems to be Moayad trying to do the AReS explorer indexing
- He was sending too many (5 or 10) concurrent requests to the server, but still… why is this shit so slow?!
-2018-09-30
+2018-09-30
- Valerio keeps sending items on CGSpace that have weird or incorrect languages, authors, etc
- I think I should just batch export and update all languages…
diff --git a/docs/2018-10/index.html b/docs/2018-10/index.html
index 32bc6e010..f81561e84 100644
--- a/docs/2018-10/index.html
+++ b/docs/2018-10/index.html
@@ -23,7 +23,7 @@ I created a GitHub issue to track this #389, because I'm super busy in Nairo
Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
I created a GitHub issue to track this #389, because I'm super busy in Nairobi right now
"/>
-
+
@@ -104,12 +104,12 @@ I created a GitHub issue to track this #389, because I'm super busy in Nairo
- 2018-10-01
+ 2018-10-01
- Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
- I created a GitHub issue to track this #389, because I'm super busy in Nairobi right now
-2018-10-03
+2018-10-03
- I see Moayad was busy collecting item views and downloads from CGSpace yesterday:
@@ -193,7 +193,7 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
"Thornton, Philip K.",Philip Thornton: 0000-0002-1854-0182
"Thornton, Phillip",Philip Thornton: 0000-0002-1854-0182
"Thornton, Phillip K.",Philip Thornton: 0000-0002-1854-0182
-2018-10-04
+2018-10-04
- Salem raised an issue that the dspace-statistics-api reports downloads for some items that have no bitstreams (like many limited access items)
- Every item has at least a
LICENSE
bundle, and some have a THUMBNAIL
bundle, but the indexing code is specifically checking for downloads from the ORIGINAL
bundle
@@ -213,24 +213,24 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
- I found a logic error in the dspace-statistics-api
indexer.py
script that was causing item views to be inserted into downloads
- I tagged version 0.4.2 of the tool and redeployed it on CGSpace
-2018-10-05
+2018-10-05
- Meet with Peter, Abenet, and Sisay to discuss CGSpace meeting in Nairobi and Sisay's work plan
- We agreed that he would do monthly updates of the controlled vocabularies and generate a new one for the top 1,000 AGROVOC terms
- Add a link to AReS explorer to the CGSpace homepage introduction text
-2018-10-06
+2018-10-06
- Follow up with AgriKnowledge about including Handle links (
dc.identifier.uri
) on their item pages
- In July, 2018 they had said their programmers would include the field in the next update of their website software
- CIMMYT's DSpace repository is now running DSpace 5.x!
- It's running OAI, but not REST, so I need to talk to Richard about that!
-2018-10-08
+2018-10-08
- AgriKnowledge says they're going to add the
dc.identifier.uri
to their item view in November when they update their website software
-2018-10-10
+2018-10-10
- Peter noticed that some recently added PDFs don't have thumbnails
- When I tried to force them to be generated I got an error that I've never seen before:
@@ -249,7 +249,7 @@ org.im4java.core.InfoException: org.im4java.core.CommandException: org.im4java.c
- This works, but I'm not sure what ImageMagick's long-term plan is if they are going to disable ALL image formats…
- I suppose I need to enable a workaround for this in Ansible?
-2018-10-11
+2018-10-11
- I emailed DuraSpace to update our entry in their DSpace registry (the data was still on DSpace 3, JSPUI, etc)
- Generate a list of the top 1500 values for
dc.subject
so Sisay can start making a controlled vocabulary for it:
@@ -288,7 +288,7 @@ COPY 10000
- CTA uploaded some infographics that are very tall and their thumbnails disrupt the item lists on the front page and in their communities and collections
- I decided to constrain the max height of these to 200px using CSS (#392)
-2018-10-13
+2018-10-13
- Run all system updates on DSpace Test (linode19) and reboot it
- Look through Peter's list of 746 author corrections in OpenRefine
@@ -308,7 +308,7 @@ COPY 10000
- I will apply these on CGSpace when I do the other updates tomorrow, as well as double check the high scoring ones to see if they are correct in Sisay's author controlled vocabulary
-2018-10-14
+2018-10-14
- Merge the authors controlled vocabulary (#393), usage rights (#394), and the upstream DSpace 5.x cherry-picks (#394) into our
5_x-prod
branch
- Switch to new CGIAR LDAP server on CGSpace, as it's been running (at least for authentication) on DSpace Test for the last few weeks, and I think the old one will be deprecated soon (today?)
@@ -330,7 +330,7 @@ COPY 10000
- I limited the tall thumbnails even further to 170px because Peter said CTA's were still too tall at 200px (#396)
-2018-10-15
+2018-10-15
- Tomcat on DSpace Test (linode19) has somehow stopped running all the DSpace applications
- I don't see anything in the Catalina logs or
dmesg
, and the Tomcat manager shows XMLUI, REST, OAI, etc all “Running: false”
@@ -353,7 +353,7 @@ $ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest superuser;'
$ pg_restore -h localhost -U postgres -d dspacetest -O --role=dspacetest -h localhost ~/Downloads/cgspace_2018-10-11.backup
$ psql -h localhost -U postgres -f ~/src/git/DSpace/dspace/etc/postgres/update-sequences.sql dspacetest
$ psql -h localhost -U postgres dspacetest -c 'alter user dspacetest nosuperuser;'
-2018-10-16
+2018-10-16
- Generate a list of the schema on CGSpace so CodeObia can compare with MELSpace:
@@ -401,7 +401,7 @@ $ time http --print h 'https://dspacetest.cgiar.org/rest/items?expand=metadata,b
- I sent a mail to dspace-tech to ask how to profile this…
-2018-10-17
+2018-10-17
- I decided to update most of the existing metadata values that we have in
dc.rights
on CGSpace to be machine readable in SPDX format (with Creative Commons version if it was included)
- Most of them are from Bioversity, and I asked Maria for permission before updating them
@@ -444,7 +444,7 @@ Given Names Deactivated Family Name Deactivated: 0000-0001-7930-5752
- I made a pull request and merged the ORCID updates into the
5_x-prod
branch (#397)
- Improve the logic of name checking in my resolve-orcids.py script
-2018-10-18
+2018-10-18
- I granted MEL's deposit user admin access to IITA, CIP, Bioversity, and RTB communities on DSpace Test so they can start testing real depositing
- After they do some tests and we check the values Enrico will send a formal email to Peter et al to ask that they start depositing officially
@@ -455,7 +455,7 @@ $ /usr/lib/postgresql/9.6/bin/pg_upgrade -b /usr/lib/postgresql/9.5/bin -B /usr/
$ exit
# systemctl start postgresql
# dpkg -r postgresql-9.5 postgresql-client-9.5 postgresql-contrib-9.5
-2018-10-19
+2018-10-19
- Help Francesca from Bioversity generate a report about items they uploaded in 2015 through 2018
- Linode emailed me to say that CGSpace (linode18) had high CPU usage for a few hours this afternoon
@@ -475,7 +475,7 @@ $ exit
- 5.9.6.51 is MegaIndex, which I've seen before…
-2018-10-20
+2018-10-20
- I was going to try to run Solr in Docker because I learned I can run Docker on Travis-CI (for testing my dspace-statistics-api), but the oldest official Solr images are for 5.5, and DSpace's Solr configuration is for 4.9
- This means our existing Solr configuration doesn't run in Solr 5.5:
@@ -522,11 +522,11 @@ ERROR: Error CREATEing SolrCore 'statistics': Unable to create core [statistics]
- So I'm not sure why this bot uses so many sessions — is it because it requests very slowly?
-2018-10-21
+2018-10-21
- Discuss AfricaRice joining CGSpace
-2018-10-22
+2018-10-22
- Post message to Yammer about usage rights (dc.rights)
- Change
build.properties
to use HTTPS for Handles in our Ansible infrastructure playbooks
@@ -546,7 +546,7 @@ UPDATE 76608
- Skype with Peter about ToRs for the AReS open source work and future plans to develop tools around the DSpace ecosystem
- Help CGSpace users with some issues related to usage rights
-2018-10-23
+2018-10-23
- Improve the usage rights (dc.rights) on CGSpace again by adding the long names in the submission form, as well as adding version 3.0 and Creative Commons Zero (CC0) public domain license (#399)
- Add “usage rights” to the XMLUI item display (#400)
@@ -571,14 +571,14 @@ $ curl -X GET -H "Content-Type: application/json" -H "Accept: app
- Improve the documentation of my dspace-statistics-api
- Email Modi and Jayashree from ICRISAT to ask if they want to join CGSpace as partners
-2018-10-24
+2018-10-24
- I deployed the new Creative Commons choices to the usage rights on the CGSpace submission form
- Also, I deployed the changes to show usage rights on the item view
- Re-work the dspace-statistics-api to use Python's native json instead of ujson to make it easier to deploy in places where we don't have — or don't want to have — Python headers and a compiler (like containers)
- Re-work the deployment of the API to use systemd's
EnvironmentFile
to read the environment variables instead of Environment
in the RMG Ansible infrastructure scripts
-2018-10-25
+2018-10-25
- Send Peter and Jane a list of technical ToRs for AReS open source work:
- Basic version of AReS that works with metadata fields present in default DSpace 5.x/6.x (for example author, date issued, type, subjects)
@@ -595,7 +595,7 @@ $ curl -X GET -H "Content-Type: application/json" -H "Accept: app
- Maria asked if we can add publisher (
dc.publisher
) to the advanced search filters, so I created a GitHub issue to track it
-2018-10-28
+2018-10-28
- I forked the SolrClient library and updated its kazoo dependency to be version 2.5.0 so we stop getting errors about “async” being a reserved keyword in Python 3.7
- Then I re-generated the
requirements.txt
in the dspace-statistics-api and released version 0.5.2
@@ -606,12 +606,12 @@ $ curl -X GET -H "Content-Type: application/json" -H "Accept: app
- I merged the changes for adding versionless Creative Commons licenses to the submission form to the
5_x-prod
branch (#403)
- I will deploy them later this week
-2018-10-29
+2018-10-29
- I deployed the publisher and Creative Commons changes to CGSpace, ran all system updates, and rebooted the server
- I sent the email to Jane Poole and ILRI ICT and Finance to start the admin process of getting a new Linode server for AReS
-2018-10-30
+2018-10-30
- Meet with the COPO guys to walk them through the CGSpace submission workflow and discuss CG core, REST API, etc
@@ -621,7 +621,7 @@ $ curl -X GET -H "Content-Type: application/json" -H "Accept: app
-2018-10-31
+2018-10-31
- More discussion and planning for AReS open sourcing and Amman meeting in 2019-10
- I did some work to clean up and improve the dspace-statistics-api README.md and project structure and moved it to the ILRI organization on GitHub
diff --git a/docs/2018-11/index.html b/docs/2018-11/index.html
index c9f2df476..d5b06f57c 100644
--- a/docs/2018-11/index.html
+++ b/docs/2018-11/index.html
@@ -33,7 +33,7 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
Today these are the top 10 IPs:
"/>
-
+
@@ -114,12 +114,12 @@ Today these are the top 10 IPs:
- 2018-11-01
+ 2018-11-01
- Finalize AReS Phase I and Phase II ToRs
- Send a note about my dspace-statistics-api to the dspace-tech mailing list
-2018-11-03
+2018-11-03
- Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
- Today these are the top 10 IPs:
@@ -218,7 +218,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.89.18' dspace.log.2018-11-03
- I will add them to the list of bot IPs in nginx for now and think about enforcing rate limits in XMLUI later
- Also, this is the third (?) time a mysterious IP on Hetzner has done this… who is this?
-2018-11-04
+2018-11-04
- Forward Peter's information about CGSpace financials to Modi from ICRISAT
- Linode emailed about the CPU load and outgoing bandwidth on CGSpace (linode18) again
@@ -313,7 +313,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
- I added the “most-popular” pages to the list that return
X-Robots-Tag: none
to try to inform bots not to index or follow those pages
- Also, I implemented an nginx rate limit of twelve requests per minute on all dynamic pages… I figure a human user might legitimately request one every five seconds
-2018-11-05
+2018-11-05
- I wrote a small Python script add-dc-rights.py to add usage rights (
dc.rights
) to CGSpace items based on the CSV Hector gave me from MARLO:
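The actual command is elided by the hunk boundary above; a minimal sketch of the CSV-reading half of such a script, where the `handle` and `rights` column names are hypothetical stand-ins for whatever the MARLO export actually uses:

```python
import csv
import io

def read_rights_mapping(csv_file):
    """Parse a CSV of item Handles and usage rights statements into
    (handle, rights) tuples. The 'handle' and 'rights' column names
    are hypothetical, not the real MARLO export's headers."""
    reader = csv.DictReader(csv_file)
    return [(row["handle"].strip(), row["rights"].strip()) for row in reader]

# Hypothetical sample rows standing in for the MARLO CSV
sample = io.StringIO(
    "handle,rights\n"
    "10568/12345,CC-BY-4.0\n"
    "10568/67890,Other\n"
)
rights = read_rights_mapping(sample)
```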
@@ -336,7 +336,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
- 29,000 requests from Facebook and none of the requests are to the dynamic pages I rate limited yesterday!
- At least the Tomcat Crawler Session Manager Valve is working now…
-2018-11-06
+2018-11-06
- I updated all the DSpace helper Python scripts to validate against PEP 8 using Flake8
- While I was updating the rest-find-collections.py script I noticed it was using
expand=all
to get the collection and community IDs
@@ -346,12 +346,12 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
- Average time with all expands was 14.3 seconds, and 12.8 seconds with
collections,subCommunities
, so 1.5 seconds difference!
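The speedup comes from asking the REST API to expand only the elements you need instead of `expand=all`; a sketch of building such a request URL against DSpace 5's `/rest/handle` endpoint (the base URL here is just an example):

```python
def find_collections_url(base, handle, expands=("collections", "subCommunities")):
    """Build a DSpace 5/6 REST API URL that expands only the needed
    elements rather than expand=all."""
    return f"{base}/rest/handle/{handle}?expand={','.join(expands)}"

url = find_collections_url("https://dspacetest.cgiar.org", "10568/1")
```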
-2018-11-07
+2018-11-07
- Update my dspace-statistics-api to use a database management class with Python contexts so that connections and cursors are automatically opened and closed
- Tag version 0.7.0 of the dspace-statistics-api
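A minimal sketch of the context-manager idea, using sqlite3 as a stand-in for the real PostgreSQL connection so the pattern is runnable anywhere; the actual dspace-statistics-api class wraps psycopg2 and may differ:

```python
import contextlib
import sqlite3

class DatabaseManager:
    """Open a connection on entry and close it on exit, so callers
    cannot leak connections. sqlite3 stands in for psycopg2 here."""
    def __init__(self, dsn):
        self.dsn = dsn
    def __enter__(self):
        self.conn = sqlite3.connect(self.dsn)
        return self.conn
    def __exit__(self, exc_type, exc, tb):
        self.conn.close()

with DatabaseManager(":memory:") as conn:
    # contextlib.closing gives cursors the same automatic cleanup
    with contextlib.closing(conn.cursor()) as cursor:
        cursor.execute("SELECT 1")
        row = cursor.fetchone()
```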
-2018-11-08
+2018-11-08
- I deployed version 0.7.0 of the dspace-statistics-api on DSpace Test (linode19) so I can test it for a few days (and check the Munin stats to see the change in database connections) before deploying on CGSpace

- I also enabled systemd's persistent journal by setting
Storage=persistent
in journald.conf
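The setting goes in the `[Journal]` section of `/etc/systemd/journald.conf` (and needs a restart of `systemd-journald` to take effect):

```ini
# /etc/systemd/journald.conf — keep the journal across reboots
[Journal]
Storage=persistent
```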
@@ -362,12 +362,12 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
-2018-11-11
+2018-11-11
- I added tests to the dspace-statistics-api!
- It runs with Python 3.5, 3.6, and 3.7 using pytest, including automatically on Travis CI!
-2018-11-13
+2018-11-13
- Help troubleshoot an issue with Judy Kimani submitting to the ILRI project reports, papers and documents collection on CGSpace
- For some reason there is an existing group for the “Accept/Reject” workflow step, but it's empty
@@ -377,21 +377,21 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=2a03:2880:11ff' dspace.log.2018-11
- As for the collection mappings I think I need to export the CSV from DSpace Test, add mappings for each type (ie Books go to IITA books collection, etc), then re-import to DSpace Test, then export from DSpace command line in “migrate” mode…
- From there I should be able to script the removal of the old DSpace Test collection so they just go to the correct IITA collections on import into CGSpace
-2018-11-14
+2018-11-14
- Finally import the 277 IITA (ALIZZY1802) records to CGSpace
- I had to export them from DSpace Test and import them into a temporary collection on CGSpace first, then export the collection as CSV to map them to new owning collections (IITA books, IITA posters, etc) with OpenRefine because DSpace's
dspace export
command doesn't include the collections for the items!
- Delete all old IITA collections on DSpace Test and run
dspace cleanup
to get rid of all the bitstreams
-2018-11-15
+2018-11-15
-2018-11-18
+2018-11-18
- Request invoice from Wild Jordan for their meeting venue in January
-2018-11-19
+2018-11-19
- Testing corrections and deletions for AGROVOC (
dc.subject
) that Sisay and Peter were working on earlier this month:
@@ -405,7 +405,7 @@ $ ./delete-metadata-values.py -i 2018-11-19-delete-agrovoc.csv -f dc.subject -m
- Generate a new list of the top 1500 AGROVOC subjects on CGSpace to send to Peter and Sisay:
dspace=# \COPY (SELECT DISTINCT text_value, count(*) FROM metadatavalue WHERE metadata_field_id = 57 AND resource_type_id = 2 GROUP BY text_value ORDER BY count DESC LIMIT 1500) to /tmp/2018-11-19-top-1500-subject.csv WITH CSV HEADER;
-2018-11-20
+2018-11-20
- The Discovery re-indexing on CGSpace never finished yesterday… the command died after six minutes
- The
dspace.log.2018-11-19
shows this at the time:
@@ -432,7 +432,7 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
- these items will go to the Restoring Degraded Landscapes collection
- a few items missing DOIs, but they are easily available on the publication page
-- clean up DOIs to use “https://doi.org” format
+- clean up DOIs to use “https://doi.org" format
- clean up some cg.identifier.url to remove unnecessary query strings
- remove columns with no metadata (river basin, place, target audience, isbn, uri, publisher, ispartofseries, subject)
- fix column with invalid spaces in metadata field name (cg. subject. wle)
@@ -446,16 +446,16 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
- these items will go to the Variability, Risks and Competing Uses collection
- trim and collapse whitespace in all fields (lots in WLE subject!)
- clean up some cg.identifier.url fields that had unnecessary anchors in their links
-- clean up DOIs to use “https://doi.org” format
+- clean up DOIs to use “https://doi.org" format
- fix column with invalid spaces in metadata field name (cg. subject. wle)
- remove columns with no metadata (place, target audience, isbn, uri, publisher, ispartofseries, subject)
- remove some weird Unicode characters (0xfffd) from abstracts, citations, and titles using Open Refine:
value.replace('�','')
-- I notice a few items using DOIs pointing at ICARDA's DSpace like: https://doi.org/20.500.11766/8178, which then points at the “real” DOI on the publisher's site… these should be using the real DOI instead of ICARDA's “fake” Handle DOI
+- I notice a few items using DOIs pointing at ICARDA's DSpace like: https://doi.org/20.500.11766/8178, which then points at the “real” DOI on the publisher's site… these should be using the real DOI instead of ICARDA's “fake” Handle DOI
- Some items missing DOIs, but they clearly have them if you look at the publisher's site
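Outside OpenRefine the same cleanup is a one-liner; a Python equivalent of the GREL `value.replace('�','')`:

```python
def strip_replacement_chars(value):
    """Remove U+FFFD REPLACEMENT CHARACTER, which appears when text was
    decoded with the wrong encoding somewhere upstream."""
    return value.replace("\ufffd", "")

cleaned = strip_replacement_chars("Climate\ufffd change")
```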
-2018-11-22
+2018-11-22
- Tezira is having problems submitting to the ILRI brochures collection for some reason
@@ -466,7 +466,7 @@ java.lang.IllegalStateException: DSpace kernel cannot be null
-2018-11-26
+2018-11-26
- This WLE item is issued on 2018-10 and accessioned on 2018-10-22 but does not show up in the WLE R4D Learning Series collection on CGSpace for some reason, and therefore does not show up on the WLE publication website
- I tried to remove that collection from Discovery and do a simple re-index:
@@ -484,7 +484,7 @@ $ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery
- More work on the AReS terms of reference for CodeObia
- Erica from AgriKnowledge emailed me to say that they have implemented the changes in their item page UI so that they include the permanent identifier on items harvested from CGSpace, for example: https://www.agriknowledge.org/concern/generics/wd375w33s
-2018-11-27
+2018-11-27
- Linode alerted me that the outbound traffic rate on CGSpace (linode19) was very high
- The top users this morning are:
@@ -519,7 +519,7 @@ $ dspace dsrun org.dspace.eperson.Groomer -a -b 11/27/2016 -d
- Help Marianne troubleshoot some issue with items in their WLE collections and the WLE publications website
-2018-11-28
+2018-11-28
- Change the usage rights text a bit based on Maria Garruccio's feedback on “all rights reserved” (#404)
- Run all system updates on DSpace Test (linode19) and reboot the server
diff --git a/docs/2018-12/index.html b/docs/2018-12/index.html
index 1516b5d25..ac0cf71c8 100644
--- a/docs/2018-12/index.html
+++ b/docs/2018-12/index.html
@@ -33,7 +33,7 @@ Then I ran all system updates and restarted the server
I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
"/>
-
+
@@ -114,13 +114,13 @@ I noticed that there is another issue with PDF thumbnails on CGSpace, and I see
- 2018-12-01
+ 2018-12-01
- Switch CGSpace (linode18) to use OpenJDK instead of Oracle JDK
- I manually installed OpenJDK, then removed Oracle JDK, then re-ran the Ansible playbook to update all configuration files, etc
- Then I ran all system updates and restarted the server
-2018-12-02
+2018-12-02
@@ -182,7 +182,7 @@ DEBUG: FC_WEIGHT didn't match
isNotNull(value.match(/.*\u00b4.*/)),
isNotNull(value.match(/.*\u007e.*/))
)
-2018-12-03
+2018-12-03
- I looked at the DSpace Ghostscript issue more and it seems to only affect certain PDFs…
- I can successfully generate a thumbnail for another recent item (10568/98394), but not for 10568/98930
@@ -308,7 +308,7 @@ $ gm convert -resize x600 -flatten -quality 85 cover.png cover.jpg
- This has got to be part Ubuntu Tomcat packaging, and part DSpace 5.x Tomcat 8.5 readiness…?
-2018-12-04
+2018-12-04
- Last night Linode sent a message that the load on CGSpace (linode18) was too high, here's a list of the top users at the time and throughout the day:
@@ -368,11 +368,11 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=78.46.79.71' dspace.log.2018-12-03
- In other news, it's good to see my re-work of the database connectivity in the dspace-statistics-api actually caused a reduction of persistent database connections (from 1 to 0, but still!):
-2018-12-05
+2018-12-05
- Discuss RSS issues with IWMI and WLE people
-2018-12-06
+2018-12-06
- Linode sent a message that the CPU usage of CGSpace (linode18) is too high last night
- I looked in the logs and there's nothing particular going on:
@@ -404,7 +404,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
- It seems they are hitting the XMLUI's OpenSearch a bit, but mostly on the REST API so no issues here yet
Drupal
is already in the Tomcat Crawler Session Manager Valve's regex so that's good!
-2018-12-10
+2018-12-10
- I ran into Mia Signs in Addis and we discussed Altmetric as well as RSS feeds again
@@ -417,7 +417,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
-2018-12-11
+2018-12-11
-2018-12-13
+2018-12-13
- Oh this is very interesting: WorldFish's repository is live now
- It's running DSpace 5.9-SNAPSHOT running on KnowledgeArc and the OAI and REST interfaces are active at least
- Also, I notice they ended up registering a Handle (they had been considering taking KnowledgeArc's advice to not use Handles!)
- Did some coordination work on the hotel bookings for the January AReS workshop in Amman
-2018-12-17
+2018-12-17
- Linode alerted me twice today that the load on CGSpace (linode18) was very high
- Looking at the nginx logs I see a few new IPs in the top 10:
@@ -457,15 +457,15 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}:ip_addr=54.70.40.11' dspace.log.2018-12-05
- I see that I added this bot to the Tomcat Crawler Session Manager valve in 2017-12 so its XMLUI sessions are getting re-used
2a01:4f8:173:1e85::2
is some new bot called BLEXBot/1.0
which should be matching the existing “bot” pattern in the Tomcat Crawler Session Manager regex
-2018-12-18
+2018-12-18
- Open a ticket with Atmire to ask them to prepare the Metadata Quality Module for our DSpace 5.8 code
-2018-12-19
+2018-12-19
- Update Atmire Listings and Reports to add the journal title (
dc.source
) to bibliography and update example bibliography values (#405)
-2018-12-20
+2018-12-20
- Testing compression of PostgreSQL backups with xz and gzip:
@@ -531,7 +531,7 @@ UPDATE 1
- After all that I started a full Discovery reindex to get the index name changes and rights updates
-2018-12-29
+2018-12-29
- CGSpace went down today for a few minutes while I was at dinner and I quickly restarted Tomcat
- The top IP addresses as of this evening are:
diff --git a/docs/2019-01/index.html b/docs/2019-01/index.html
index 76a195435..34496b03f 100644
--- a/docs/2019-01/index.html
+++ b/docs/2019-01/index.html
@@ -47,7 +47,7 @@ I don't see anything interesting in the web server logs around that time tho
357 207.46.13.1
903 54.70.40.11
"/>
-
+
@@ -128,7 +128,7 @@ I don't see anything interesting in the web server logs around that time tho
- 2019-01-02
+ 2019-01-02
- Linode alerted that CGSpace (linode18) had a higher outbound traffic rate than normal early this morning
- I don't see anything interesting in the web server logs around that time though:
@@ -173,7 +173,7 @@ Moving: 18497180 into core statistics-2018
- This could be why the outbound traffic rate was high, due to the S3 backup that ran at 3:30AM…
- Run all system updates on DSpace Test (linode19) and reboot the server
-2019-01-03
+2019-01-03
- Update local Docker image for DSpace PostgreSQL, re-using the existing data volume:
@@ -271,7 +271,7 @@ org.apache.jasper.JasperException: /home.jsp (line: [214], column: [1]) /discove
- I sent a message to the dspace-tech mailing list to ask
-2019-01-04
+2019-01-04
- Linode sent a message last night that CGSpace (linode18) had high CPU usage, but I don't see anything around that time in the web server logs:
@@ -403,7 +403,7 @@ In [14]: for row in result.fetchone():
- The SPARQL query comes from my notes in 2017-08
-2019-01-06
+2019-01-06
- I built a clean DSpace 5.8 installation from the upstream
dspace-5.8
tag and the issue with the XMLUI/JSPUI login is still there with Tomcat 8.5.37
@@ -413,7 +413,7 @@ In [14]: for row in result.fetchone():
-2019-01-07
+2019-01-07
- I built a clean DSpace 6.3 installation from the upstream
dspace-6.3
tag and the issue with the XMLUI/JSPUI login is still there with Tomcat 8.5.37
@@ -423,7 +423,7 @@ In [14]: for row in result.fetchone():
-2019-01-08
+2019-01-08
- Tim Donohue responded to my thread about the cookies on the dspace-tech mailing list
@@ -433,7 +433,7 @@ In [14]: for row in result.fetchone():
-2019-01-11
+2019-01-11
- Tezira wrote to say she has stopped receiving the
DSpace Submission Approved and Archived
emails from CGSpace as of January 2nd
@@ -442,11 +442,11 @@ In [14]: for row in result.fetchone():
-2019-01-14
+2019-01-14
- Day one of CGSpace AReS meeting in Amman
-2019-01-15
+2019-01-15
-2019-01-24
+2019-01-24
- I noticed Ubuntu's Ghostscript 9.26 works on some troublesome PDFs where Arch's Ghostscript 9.26 doesn't, so the fix for the first/last page crash is not the patch I found yesterday
- Ubuntu's Ghostscript uses another patch from Ghostscript git (upstream bug report)
@@ -1078,7 +1078,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
- I sent a message titled “DC, QDC, and DCTERMS: reviewing our metadata practices” to the dspace-tech mailing list to ask about some of this
-2019-01-25
+2019-01-25
- A little bit more work on getting Tomcat to run from a tarball on our Ansible infrastructure playbooks
@@ -1090,7 +1090,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
-2019-01-27
+2019-01-27
- Linode sent an email that the server was using a lot of CPU this morning, and these were the top IPs in the web server logs at the time:
@@ -1113,7 +1113,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
-2019-01-28
+2019-01-28
-2019-01-30
+2019-01-30
- Got another alert from Linode about CGSpace (linode18) this morning, here are the top IPs before, during, and after the alert:
@@ -1204,7 +1204,7 @@ identify: CorruptImageProfile `xmp' @ warning/profile.c/SetImageProfileInternal/
- I might need to adjust the threshold again, because the load average this morning was 296% and the activity looks pretty normal (as always recently)
-2019-01-31
+2019-01-31
- Linode sent alerts about CGSpace (linode18) last night and this morning, here are the top IPs before, during, and after those times:
diff --git a/docs/2019-02/index.html b/docs/2019-02/index.html
index 82a3c83ef..d3babd297 100644
--- a/docs/2019-02/index.html
+++ b/docs/2019-02/index.html
@@ -69,7 +69,7 @@ real 0m19.873s
user 0m22.203s
sys 0m1.979s
"/>
-
+
@@ -150,7 +150,7 @@ sys 0m1.979s
- 2019-02-01
+ 2019-02-01
- Linode has alerted a few times since last night that the CPU usage on CGSpace (linode18) was high despite me increasing the alert threshold last week from 250% to 275%—I might need to increase it again!
- The top IPs before, during, and after this latest alert tonight were:
@@ -186,7 +186,7 @@ sys 0m1.979s
-2019-02-02
+2019-02-02
- Another alert from Linode about CGSpace (linode18) this morning, here are the top IPs in the web server logs before, during, and after that time:
@@ -206,7 +206,7 @@ sys 0m1.979s
I will increase the Linode alert threshold from 275 to 300% because this is becoming too much!
I tested the Atmire Metadata Quality Module (MQM)’s duplicate checker on some WLE items that I helped Udana with a few months ago on DSpace Test (linode19) and indeed it found many duplicates!
-2019-02-03
+2019-02-03
- This is seriously getting annoying, Linode sent another alert this morning that CGSpace (linode18) load was 377%!
- Here are the top IPs before, during, and after that time:
@@ -268,7 +268,7 @@ sys 0m1.979s
-2019-02-04
+2019-02-04
- Generate a list of CTA subjects from CGSpace for Peter:
@@ -294,7 +294,7 @@ COPY 321
At this rate I think I just need to stop paying attention to these alerts—DSpace gets thrashed when people use the APIs properly and there's nothing we can do to improve REST API performance!
Perhaps I just need to keep increasing the Linode alert threshold (currently 300%) for this host?
-2019-02-05
+2019-02-05
- Peter sent me corrections and deletions for the CTA subjects and as usual, there were encoding errors with some accents in his file
- In other news, it seems that the GREL syntax regarding booleans changed in OpenRefine recently, so I need to update some expressions like the one I use to detect encoding errors to use
toString()
:
@@ -328,7 +328,7 @@ MARKETING ET COMMERCE,MARKETING||COMMERCE
NATURAL RESOURCES AND ENVIRONMENT,NATURAL RESOURCES MANAGEMENT||ENVIRONMENT
PÊCHES ET AQUACULTURE,PÊCHES||AQUACULTURE
PESCAS E AQUACULTURE,PISCICULTURA||AQUACULTURE
-2019-02-06
+2019-02-06
- I dumped the CTA community so I can try to fix the subjects with multiple subjects that Peter indicated in his corrections:
@@ -406,7 +406,7 @@ PESCAS E AQUACULTURE,PISCICULTURA||AQUACULTURE
4661 205.186.128.185
4661 70.32.83.92
5102 45.5.186.2
-2019-02-07
+2019-02-07
- Linode sent an alert last night that the load on CGSpace (linode18) was over 300%
- Here are the top IPs in the web server and API logs before, during, and after that time, respectively:
@@ -491,7 +491,7 @@ Please see the DSpace documentation for assistance.
- I can't connect to TCP port 25 on that server so I sent a mail to CGNET support to ask what's up
- CGNET said these servers were discontinued in 2018-01 and that I should use Office 365
-2019-02-08
+2019-02-08
- I re-configured CGSpace to use the email/password for cgspace-support, but I get this error when I try the
test-email
script:
@@ -500,7 +500,7 @@ Please see the DSpace documentation for assistance.
- I tried to log into Outlook 365 with the credentials but I think the ones I have must be wrong, so I will ask ICT to reset the password
-2019-02-09
+2019-02-09
- Linode sent alerts about CPU load yesterday morning, yesterday night, and this morning! All over 300% CPU load!
- This is just for this morning:
@@ -535,7 +535,7 @@ Please see the DSpace documentation for assistance.
- 151.80.203.180 is on OVH so I sent a message to their abuse email…
-2019-02-10
+2019-02-10
- Linode sent another alert about CGSpace (linode18) CPU load this morning, here are the top IPs in the web server XMLUI and API logs before, during, and after that time:
@@ -624,12 +624,12 @@ Please see the DSpace documentation for assistance.
# mkdir -p /home/aorth/.local/lib/containers/volumes/artifactory5_data
# chown 1030 /home/aorth/.local/lib/containers/volumes/artifactory5_data
# docker run --name artifactory --network dspace-build -d -v /home/aorth/.local/lib/containers/volumes/artifactory5_data:/var/opt/jfrog/artifactory -p 8081:8081 docker.bintray.io/jfrog/artifactory-oss
-2019-02-11
+2019-02-11
- Bosede from IITA said we can use “SOCIAL SCIENCE & AGRIBUSINESS” in their new IITA theme field to be consistent with other places they are using it
- Run all system updates on DSpace Test (linode19) and reboot it
-2019-02-12
+2019-02-12
- I notice that DSpace 6 has included a new JAR-based PDF thumbnailer based on PDFBox, I wonder how good its thumbnails are and how it handles CMYK PDFs
- On a similar note, I wonder if we could use the performance-focused libvips and the third-party jlibvips Java library in DSpace
@@ -658,7 +658,7 @@ dspacestatistics=# SELECT * FROM items WHERE downloads > 0 ORDER BY downloads
- I will read the PDFBox thumbnailer documentation to see if I can change the size and quality
-2019-02-13
+2019-02-13
- ILRI ICT reset the password for the CGSpace mail account, but I still can't get it to send mail from DSpace's
test-email
utility
- I even added extra mail properties to
dspace.cfg
as suggested by someone on the dspace-tech mailing list:
@@ -735,7 +735,7 @@ $ podman run --name dspacedb -v /home/aorth/.local/lib/containers/volumes/dspace
- I increased the nginx upload limit, but she said she was having problems and couldn't really tell me why
- I logged in as her and completed the submission with no problems…
-2019-02-15
+2019-02-15
- Tomcat was killed around 3AM by the kernel's OOM killer according to
dmesg
:
@@ -805,7 +805,7 @@ $ podman start artifactory
-2019-02-17
+2019-02-17
- I ran DSpace's cleanup task on CGSpace (linode18) and there were errors:
@@ -821,7 +821,7 @@ UPDATE 1
- I merged the Atmire Metadata Quality Module (MQM) changes to the
5_x-prod
branch and deployed it on CGSpace (#407)
- Then I ran all system updates on CGSpace server and rebooted it
-2019-02-18
+2019-02-18
- Jesus fucking Christ, Linode sent an alert that CGSpace (linode18) was using 421% CPU for a few hours this afternoon (server time):
- There seems to have been a lot of activity in XMLUI:
@@ -942,7 +942,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
- I merged the changes to the
5_x-prod
branch and they will go live the next time we re-deploy CGSpace (#412)
-2019-02-19
+2019-02-19
- Linode sent another alert about CPU usage on CGSpace (linode18) averaging 417% this morning
- Unfortunately, I don't see any strange activity in the web server API or XMLUI logs at that time in particular
@@ -1028,7 +1028,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/cg-creator-i
- I wrote a quick and dirty Python script called
resolve-addresses.py
to resolve IP addresses to their owning organization's name, ASN, and country using the IPAPI.co API
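The lookup itself is one GET per address; a sketch of the response handling, with the `org`, `asn`, and `country_name` keys taken from ipapi.co's JSON response format (the sample values are made up):

```python
def summarize_ipapi(data):
    """Reduce an ipapi.co JSON response to (organization, ASN, country)."""
    return (data.get("org"), data.get("asn"), data.get("country_name"))

# In the real script this dict would come from something like:
#   requests.get(f"https://ipapi.co/{ip}/json/").json()
sample = {"org": "OVH SAS", "asn": "AS16276", "country_name": "France"}
summary = summarize_ipapi(sample)
```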
-2019-02-20
+2019-02-20
- Ben Hack was asking about getting authors publications programmatically from CGSpace for the new ILRI website
- I told him that they should probably try to use the REST API's
find-by-metadata-field
endpoint
@@ -1049,7 +1049,7 @@ $ curl -s -H "accept: application/json" -H "Content-Type: applica
- See this issue on the VIVO tracker for more information about this endpoint
- The old-school AGROVOC SOAP WSDL works with the Zeep Python library, but in my tests the results are way too broad despite trying to use “exact match” searching
-2019-02-21
+2019-02-21
- I wrote a script agrovoc-lookup.py to resolve subject terms against the public AGROVOC REST API
- It allows specifying the language the term should be queried in as well as output files to save the matched and unmatched terms to
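The public AGROVOC endpoint is a Skosmos instance, so matching comes down to inspecting its search response; a sketch of the match check, with the `results`/`prefLabel` shape taken from the Skosmos REST API's search response (sample data made up):

```python
def term_matches(response, term):
    """True if a Skosmos /rest/v1/search response contains an exact,
    case-insensitive prefLabel match for the queried term."""
    return any(
        result.get("prefLabel", "").lower() == term.lower()
        for result in response.get("results", [])
    )

matched = term_matches({"results": [{"prefLabel": "maize"}]}, "MAIZE")
unmatched = term_matches({"results": []}, "CORN")
```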
@@ -1088,7 +1088,7 @@ COPY 33
-2019-02-22
+2019-02-22
-2019-02-24
+2019-02-24
- I decided to try to validate the AGROVOC subjects in IITA's recent batch upload by dumping all their terms, checking them in en/es/fr with
agrovoc-lookup.py
, then reconciling against the final list using reconcile-csv with OpenRefine
- I'm not sure how to deal with terms like “CORN” that are alternative labels (
altLabel
) in AGROVOC where the preferred label (prefLabel
) would be “MAIZE”
@@ -1163,7 +1163,7 @@ return "unmatched"
-2019-02-25
+2019-02-25
- There seems to be something going on with Solr on CGSpace (linode18) because statistics on communities and collections are blank for January and February this year
- I see some errors started recently in Solr (yesterday):
@@ -1257,7 +1257,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
- I still have not figured out what the real cause for the Solr cores to not load was, though
-2019-02-26
+2019-02-26
- I sent a mail to the dspace-tech mailing list about the “solr_update_time_stamp” error
- A CCAFS user sent a message saying they got this error when submitting to CGSpace:
@@ -1268,7 +1268,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
- I looked at the
WORKFLOW_STEP_1
(Accept/Reject) and the group is of course empty
- As we've seen several times recently, we are not using this step so it should simply be deleted
-2019-02-27
+2019-02-27
- Discuss batch uploads with Sisay
- He's trying to upload some CTA records, but it's not possible to do collection mapping when using the web UI
@@ -1291,7 +1291,7 @@ Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed
-2019-02-28
+2019-02-28
- I helped Sisay upload the nineteen CTA records from last week via the command line because they required mappings (which is not possible to do via the batch upload web interface)
diff --git a/docs/2019-03/index.html b/docs/2019-03/index.html
index 1146207fe..41d4df967 100644
--- a/docs/2019-03/index.html
+++ b/docs/2019-03/index.html
@@ -43,7 +43,7 @@ Most worryingly, there are encoding errors in the abstracts for eleven items, fo
I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs
"/>
-
+
@@ -124,7 +124,7 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
- 2019-03-01
+ 2019-03-01
- I checked IITA's 259 Feb 14 records from last month for duplicates using Atmire's Duplicate Checker on a fresh snapshot of CGSpace on my local machine and everything looks good
- I am now only waiting to hear from her about where the items should go, though I assume Journal Articles go to IITA Journal Articles collection, etc…
@@ -139,7 +139,7 @@ I think I will need to ask Udana to re-copy and paste the abstracts with more ca
- I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs
-2019-03-03
+2019-03-03
- Trying to finally upload IITA's 259 Feb 14 items to CGSpace so I exported them from DSpace Test:
@@ -166,7 +166,7 @@ $ dspace export -i 10568/108684 -t COLLECTION -m -n 0 -d 2019-03-03-IITA-Feb14
- Deploy Tomcat 7.0.93 on CGSpace (linode18) after having tested it on DSpace Test (linode19) for a week
-2019-03-06
+2019-03-06
- Abenet was having problems with a CIP user account, I think that the user could not register
- I suspect it's related to the email issue that ICT hasn't responded about since last week
@@ -184,7 +184,7 @@ Error sending email:
- I will send a follow-up to ICT to ask them to reset the password
-2019-03-07
+2019-03-07
- ICT reset the email password and I confirmed that it is working now
- Generate a controlled vocabulary of 1187 AGROVOC subjects from the top 1500 that I checked last month, dumping the terms themselves using
csvcut
and then applying XML controlled vocabulary format in vim and then checking with tidy for good measure:
@@ -200,7 +200,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/dc-subject.x
-2019-03-08
+2019-03-08
- There's an issue with CGSpace right now where all items are giving a blank page in the XMLUI
@@ -223,7 +223,7 @@ $ tidy -xml -utf8 -iq -m -w 0 dspace/config/controlled-vocabularies/dc-subject.x
-2019-03-09
+2019-03-09
- I shared a post on Yammer informing our editors to try the AGROVOC controlled list
- The SPDX legal committee had a meeting and discussed the addition of CC-BY-ND-3.0-IGO and other IGO licenses to their list, but it seems unlikely (spdx/license-list-XML/issues/767)
@@ -241,7 +241,7 @@ UPDATE 44
- I ran the corrections on CGSpace and DSpace Test
-2019-03-10
+2019-03-10
- Working on tagging IITA's items with their new research theme (
cg.identifier.iitatheme
) based on their existing IITA subjects (see notes from 2019-02)
- I exported the entire IITA community from CGSpace and then used
csvcut
to extract only the needed fields:
@@ -261,15 +261,15 @@ UPDATE 44
- In total this would add research themes to 1,755 items
- I want to double check one last time with Bosede that they would like to do this, because I also see that this will tag a few hundred items from the 1970s and 1980s
-2019-03-11
+2019-03-11
- Bosede said that she would like the IITA research theme tagging only for items since 2015, which would be 256 items
-2019-03-12
+2019-03-12
- I imported the changes to 256 of IITA's records on CGSpace
-2019-03-14
+2019-03-14
- CGSpace had the same issue with blank items like earlier this month and I restarted Tomcat to fix it
- Create a pull request to change Swaziland to Eswatini and Macedonia to North Macedonia (#414)
@@ -301,7 +301,7 @@ done
- Run all system updates and reboot linode20
- Follow up with Felix from Earlham to see if he's done testing DSpace Test with COPO so I can re-sync the server from CGSpace
-2019-03-15
+2019-03-15
- CGSpace (linode18) has the blank page error again
- I'm not sure if it's related, but I see the following error in DSpace's log:
@@ -402,7 +402,7 @@ java.util.EmptyStackException
- For now I will just restart Tomcat…
-2019-03-17
+2019-03-17
-2019-03-22
+2019-03-22
- Share the initial list of invalid AGROVOC terms on Yammer to ask the editors for help in correcting them
- Advise Phanuel Ayuka from IITA about using controlled vocabularies in DSpace
-2019-03-23
+2019-03-23
- CGSpace (linode18) is having the blank page issue again and it seems to have started last night around 21:00:
@@ -811,7 +811,7 @@ org.postgresql.util.PSQLException: This statement has been closed.
-2019-03-24
+2019-03-24
- I did some more tests with the TomcatJdbcConnectionTest thing and while monitoring the number of active connections in jconsole and after adjusting the limits quite low I eventually saw some connections get abandoned
- I forgot that to connect to a remote JMX session with jconsole you need to use a dynamic SSH SOCKS proxy (as I originally discovered in 2017-11:
@@ -831,7 +831,7 @@ org.postgresql.util.PSQLException: This statement has been closed.
-2019-03-25
+2019-03-25
- Finish looking over the 175 invalid AGROVOC terms
@@ -918,7 +918,7 @@ $ grep -o -E 'session_id=[A-Z0-9]{32}' dspace.log.2019-03-22 | sort -u | wc -l
- According to Uptime Robot the server was up and down a few more times over the next hour so I restarted Tomcat again
-2019-03-26
+2019-03-26
- UptimeRobot says CGSpace went down again and I see the load is again at 14.0!
- Here are the top IPs in nginx logs in the last hour:
@@ -1032,7 +1032,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-03-26-AGROVOC-79-deletions.csv -db ds
$ grep -I -c 45.5.184.72 dspace.log.2019-03-26
0
-2019-03-28
+2019-03-28
- Run the corrections and deletions to AGROVOC (dc.subject) on DSpace Test and CGSpace, and then start a full re-index of Discovery
- What the hell is going on with this CTA publication?
@@ -1074,7 +1074,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-03-26-AGROVOC-79-deletions.csv -db ds
- In other other news I see that DSpace has no statistics for years before 2019 currently, yet when I connect to Solr I see all the cores up
-2019-03-29
+2019-03-29
- Sent Linode more information from top and iostat about the resource usage on linode18
@@ -1088,7 +1088,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-03-26-AGROVOC-79-deletions.csv -db ds
-2019-03-31
+2019-03-31
- After a few days of the CGSpace VM (linode18) being migrated to a new host the CPU steal is gone and the site is much more responsive
diff --git a/docs/2019-04/index.html b/docs/2019-04/index.html
index 184fb87f5..f039ce782 100644
--- a/docs/2019-04/index.html
+++ b/docs/2019-04/index.html
@@ -61,7 +61,7 @@ $ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u ds
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
"/>
-
+
@@ -142,7 +142,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
- 2019-04-01
+ 2019-04-01
- Meeting with AgroKnow to discuss CGSpace, ILRI data, AReS, GARDIAN, etc
@@ -165,7 +165,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
$ ./fix-metadata-values.py -i /tmp/2019-02-21-fix-4-regions.csv -db dspace -u dspace -p 'fuuu' -f cg.coverage.region -m 231 -t action -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspace -u dspace -p 'fuuu' -m 228 -f cg.coverage.country -d
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
-2019-04-02
+2019-04-02
- CTA says the Amazon IPs are AWS gateways for real user traffic
- I was trying to add Felix Shaw's account back to the Administrators group on DSpace Test, but I couldn't find his name in the user search of the groups page
@@ -175,7 +175,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace
-2019-04-03
+2019-04-03
-2019-04-14
+2019-04-14
- Change DSpace Test (linode19) to use the Java GC tuning from the Solr 4.14.4 startup script:
@@ -763,7 +763,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-04-11-delete-6-subjects.csv -db dspac
- I need to remember to check the Munin JVM graphs in a few days
- It might be placebo, but the site does feel snappier…
-2019-04-15
+2019-04-15
- Rework the dspace-statistics-api to use the vanilla Python requests library instead of Solr client
@@ -806,11 +806,11 @@ return item_id
real 82m45.324s
user 7m33.446s
sys 2m13.463s
-2019-04-16
+2019-04-16
- Export IITA's community from CGSpace because they want to experiment with importing it into their internal DSpace for some testing or something
-2019-04-17
+2019-04-17
- Reading an interesting blog post about Solr caching
- Did some tests of the dspace-statistics-api on my local DSpace instance with 28 million documents in a sharded statistics core (statistics and statistics-2018) and monitored the memory usage of Tomcat in VisualVM
@@ -956,7 +956,7 @@ sys 2m13.463s
- Lots of CPU steal going on still on CGSpace (linode18):
-2019-04-18
+2019-04-18
- I've been trying to copy the statistics-2018 Solr core from CGSpace to DSpace Test since yesterday, but the network speed is like 20KiB/sec
@@ -984,7 +984,7 @@ sys 2m13.463s
-2019-04-20
+2019-04-20
- Linode agreed to move CGSpace (linode18) to a new machine shortly after I filed my ticket about CPU steal two days ago and now the load is much more sane:
@@ -1020,7 +1020,7 @@ TCP window size: 85.0 KByte (default)
-2019-04-21
+2019-04-21
- Deploy Solr 4.10.4 on CGSpace (linode18)
- Deploy Tomcat 7.0.94 on CGSpace
@@ -1031,7 +1031,7 @@ TCP window size: 85.0 KByte (default)
-2019-04-22
+2019-04-22
- Abenet pointed out an item that doesn't have an Altmetric score on CGSpace, but has a score of 343 in the CGSpace Altmetric dashboard
@@ -1055,7 +1055,7 @@ dspace.log.2019-04-20:1515
-2019-04-23
+2019-04-23
@@ -1068,7 +1068,7 @@ dspace.log.2019-04-20:1515
-2019-04-24
+2019-04-24
- Linode migrated CGSpace (linode18) to a new host, but I am still getting poor performance when copying data to DSpace Test (linode19)
@@ -1159,7 +1159,7 @@ dspace=# SELECT COUNT(text_value) FROM metadatavalue WHERE resource_type_id=2 AN
- I sent a message to the dspace-tech mailing list to ask for help
-2019-04-25
+2019-04-25
- Peter pointed out that we need to remove Delicious and Google+ from our social sharing links
@@ -1200,13 +1200,13 @@ $ curl -f -H "rest-dspace-token: b43d41a6-5ac1-455d-b49a-616b8debc25b"
- Communicate with Carlos Tejo from the Land Portal about the /items/find-by-metadata-value endpoint
- Run all system updates on DSpace Test (linode19) and reboot it
-2019-04-26
+2019-04-26
- Export a list of authors for Peter to look through:
dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue where metadata_field_id = (select metadata_field_id from metadatafieldregistry where element = 'contributor' and qualifier = 'author') AND resource_type_id = 2 group by text_value order by count desc) to /tmp/2019-04-26-all-authors.csv with csv header;
COPY 65752
-2019-04-28
+2019-04-28
-2019-05-06
+2019-05-06
- Peter pointed out that Solr stats are only showing 2019 stats
@@ -351,7 +351,7 @@ $ cat dspace.log.2019-05-01 | grep -E '2019-05-01 (02|03|04|05|06):' | grep -o -
-2019-05-07
+2019-05-07
- The total number of unique IPs on CGSpace yesterday was almost 14,000, which is several thousand higher than previous day totals:
@@ -391,7 +391,7 @@ $ cat dspace.log.2019-05-01 | grep -E 'session_id=[A-Z0-9]{32}' | sort | uniq |
- Add requests cache to resolve-addresses.py script
-2019-05-08
+2019-05-08
- A user said that CGSpace emails have stopped sending again
@@ -425,7 +425,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
-2019-05-10
+2019-05-10
- I finally had time to analyze the 7,000 IPs from the major traffic spike on 2019-05-06 after several runs of my resolve-addresses.py script (ipapi.co has a limit of 1,000 requests per day)
- Resolving the unique IP addresses to organization and AS names reveals some pretty big abusers:
@@ -461,7 +461,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
-2019-05-12
+2019-05-12
- I see that the Unpaywall bot is responsible for a few thousand XMLUI sessions every day (IP addresses come from nginx access.log):
@@ -474,7 +474,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
- Also, there is 10 to 20% CPU steal on that VM, so I will ask Linode to move it to another host
- Commit changes to the resolve-addresses.py script to add proper CSV output support
-2019-05-14
+2019-05-14
- Skype with Peter and AgroKnow about the CTA storytelling modification they want to do on the CTA ICT Update collection on CGSpace
@@ -483,7 +483,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
-2019-05-15
+2019-05-15
- Tezira says she's having issues with email reports for approved submissions, but I received an email about collection subscriptions this morning, and I tested with dspace test-email and it's also working…
- Send a list of DSpace build tips to Panagis from AgroKnow
@@ -493,7 +493,7 @@ UPDATE metadatavalue SET text_lang='es_ES' WHERE resource_type_id=2 AND metadata
-2019-05-16
+2019-05-16
- Export a list of all investors (dc.description.sponsorship) for Peter to look through and correct:
@@ -506,7 +506,7 @@ COPY 995
-2019-05-17
+2019-05-17
- Peter sent me a bunch of fixes for investors from yesterday
- I did a quick check in OpenRefine (trim and collapse whitespace, clean smart quotes, etc) and then applied them on CGSpace:
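Those OpenRefine transforms (straighten smart quotes, collapse and trim whitespace) have rough shell equivalents; this is only a sketch with made-up input, not the actual fix file:

```shell
# Straighten smart quotes, then collapse runs of whitespace and trim,
# roughly what the OpenRefine transforms do (sample input is made up)
printf '  “Fancy”  value\twith ’odd’ spacing \n' \
  | sed -e 's/“/"/g' -e 's/”/"/g' -e "s/’/'/g" \
  | sed -E 's/[[:space:]]+/ /g; s/^ //; s/ $//'
```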
@@ -532,17 +532,17 @@ $ ./delete-metadata-values.py -i /tmp/2019-05-17-delete-14-Investors.csv -db dsp
-2019-05-19
+2019-05-19
- Add “ISI journal” to item view sidebar at the request of Maria Garruccio
- Update fix-metadata-values.py and delete-metadata-values.py scripts to add some basic checking of CSV fields and colorize shell output using Colorama
-2019-05-24
+2019-05-24
- Update AReS README.md on GitHub repository to add a proper introduction, credits, requirements, installation instructions, and legal information
- Update CIP subjects in input forms on CGSpace (#424)
-2019-05-25
+2019-05-25
-2019-06-04
+2019-06-04
- The MARLO team responded and said they will give us access to the CLARISA API
- Marie-Angélique proposed to integrate dcterms.isPartOf, dcterms.abstract, and dcterms.bibliographicCitation into the CG Core v2 schema
@@ -153,11 +153,11 @@ Skype with Marie-Angélique and Abenet about CG Core v2
- Add Arabic language to input-forms.xml (#427), as Bioversity is adding some Arabic items and noticed it was missing
-2019-06-05
+2019-06-05
- Send mail to CGSpace and MELSpace people to let them know about the proposed metadata field migrations after the discussion with Marie-Angélique
-2019-06-07
+2019-06-07
- Thierry noticed that the CUA statistics were missing previous years again, and I see that the Solr admin UI has the following message:
@@ -165,7 +165,7 @@ Skype with Marie-Angélique and Abenet about CG Core v2
- I had to restart Tomcat a few times for all the stats cores to get loaded with no issue
-2019-06-10
+2019-06-10
- Rename the AReS repository on GitHub to OpenRXV: https://github.com/ilri/OpenRXV
- Create a new AReS repository: https://github.com/ilri/AReS
@@ -174,7 +174,7 @@ Skype with Marie-Angélique and Abenet about CG Core v2
- Trim leading, trailing, and consecutive whitespace on all columns, but I didn't notice very many issues
- Validate affiliations against latest list of top 1500 terms using reconcile-csv, correcting and standardizing about twenty-seven
- Validate countries against latest list of countries using reconcile-csv, correcting three
-- Convert all DOIs to “https://dx.doi.org” format
+- Convert all DOIs to “https://dx.doi.org" format
- Normalize all cg.identifier.url Google book fields to “books.google.com”
- Correct some inconsistencies in IITA subjects
- Correct two incorrect “Peer Review” in dc.description.version
@@ -209,11 +209,11 @@ $ wc -l iita-agrovoc*
- Then make a new list to use with reconcile-csv by adding line numbers with csvcut and changing the line number header to id:
$ csvcut -c name -l 2019-06-10-subjects-matched.txt | sed 's/line_number/id/' > 2019-06-10-subjects-matched.csv
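The same line-numbering step can be done without csvkit; a plain-awk sketch of the idea, using made-up subject values:

```shell
# Add a line-number column named "id", mimicking the csvcut -l + sed
# pipeline above; the subject terms here are sample data
printf '%s\n' 'name' 'ALLUVIAL SOILS' 'ARID ZONES' > /tmp/subjects-matched.txt
awk 'NR==1 {print "id," $0; next} {print NR-1 "," $0}' /tmp/subjects-matched.txt
# id,name
# 1,ALLUVIAL SOILS
# 2,ARID ZONES
```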
-2019-06-20
+2019-06-20
- Share some feedback about AReS v2 with the colleagues and encourage them to do the same
-2019-06-23
+2019-06-23
-2019-06-28
+2019-06-28
- Start looking at the fifty-seven AfricaRice records sent by Ibnou earlier this month
@@ -275,7 +275,7 @@ UPDATE 2
-2019-06-30
+2019-06-30
- Upload fifty-seven AfricaRice records to DSpace Test
diff --git a/docs/2019-07/index.html b/docs/2019-07/index.html
index bf804d9b8..81ed2a2ab 100644
--- a/docs/2019-07/index.html
+++ b/docs/2019-07/index.html
@@ -35,7 +35,7 @@ CGSpace
Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community
"/>
-
+
@@ -116,7 +116,7 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
- 2019-07-01
+ 2019-07-01
- Create an “AfricaRice books and book chapters” collection on CGSpace for AfricaRice
- Last month Sisay asked why the following “most popular” statistics link for a range of months in 2018 works for the CIAT community on DSpace Test, but not on CGSpace:
@@ -205,7 +205,7 @@ Abenet had another similar issue a few days ago when trying to find the stats fo
-Dcom.sun.management.jmxremote.port=1337
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-2019-07-02
+2019-07-02
- Help upload twenty-seven posters from the 2019-05 Sharefair to CGSpace
@@ -229,11 +229,11 @@ $ dspace import -a -e me@cgiar.org -m 2019-07-02-Sharefair.map -s /tmp/Sharefair
-2019-07-03
+2019-07-03
- Atmire responded about the Solr issue and said they would be willing to help
-2019-07-04
+2019-07-04