CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

July, 2016

2016-07-01

  • Add dc.description.sponsorship to Discovery sidebar facets and make investors clickable in item view (#232)
  • I think this query should find and replace all authors that have “,” at the end of their names:

    dspacetest=# update metadatavalue set text_value = regexp_replace(text_value, '(^.+?),$', '\1') where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
    UPDATE 95
    dspacetest=# select text_value from  metadatavalue where metadata_field_id=3 and resource_type_id=2 and text_value ~ '^.+?,$';
    text_value
    ------------
    (0 rows)
    
  • In this case the select query was showing 95 results before the update

2016-07-02

  • Comment on DSpace Jira ticket about author lookup search text (DS-2329)

2016-07-04

  • Seems the database’s author authority values mean nothing without the authority Solr core from the host where they were created!

2016-07-05

  • Amend backup-solr.sh script so it backs up the entire Solr folder
  • We really only need statistics and authority but meh
  • Fix metadata for species on DSpace Test:

    $ ./fix-metadata-values.py -i /tmp/Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 94 -d dspacetest -u dspacetest -p 'fuuu'
    
  • Will run later on CGSpace

  • A user is still having problems with Sherpa/Romeo causing crashes during the submission process when the journal is “ungraded”

  • I tested the patch for DS-2740 that I had found last month and it seems to work

  • I will merge it to 5_x-prod

2016-07-06

  • Delete 23 blank metadata values from CGSpace:

    cgspace=# delete from metadatavalue where resource_type_id=2 and text_value='';
    DELETE 23
    
  • Complete phase three of metadata migration, for the following fields:

    • dc.title.jtitle → dc.source
    • dc.crsubject.crpsubject → cg.contributor.crp
    • dc.contributor.affiliation → cg.contributor.affiliation
    • dc.Species → cg.species
    • dc.srplace.subregion → cg.coverage.subregion
    • dc.contributor.corporate → dc.contributor.author
    • dc.identifier.url → cg.identifier.url
    • dc.identifier.doi → cg.identifier.doi
    • dc.identifier.googleurl → cg.identifier.googleurl
    • dc.identifier.dataurl → cg.identifier.dataurl
  • Also, run fixes and deletes for species and author affiliations (over 1000 corrections!)

    $ ./fix-metadata-values.py -i Species-Peter-Fix.csv -f dc.Species -t CORRECT -m 212 -d dspace -u dspace -p 'fuuu'
    $ ./fix-metadata-values.py -i Affiliations-Fix-1045-Peter-Abenet.csv -f dc.contributor.affiliation -t Correct -m 211 -d dspace -u dspace -p 'fuuu'
    $ ./delete-metadata-values.py -f dc.contributor.affiliation -i Affiliations-Delete-Peter-Abenet.csv -m 211 -u dspace -d dspace -p 'fuuu'
    
  • I then ran all server updates and rebooted the server

2016-07-11

  • Doing some author cleanups from Peter and Abenet:

    $ ./fix-metadata-values.py -i /tmp/Authors-Fix-205-UTF8.csv -f dc.contributor.author -t correct -m 3 -d dspacetest -u dspacetest -p fuuu
    $ ./delete-metadata-values.py -f dc.contributor.author -i /tmp/Authors-Delete-UTF8.csv -m 3 -u dspacetest -d dspacetest -p fuuu
    

2016-07-13

  • Run the author cleanups on CGSpace and start a full Discovery re-index

2016-07-14

  • Test LDAP settings for new root LDAP
  • Seems to work when binding as a top-level user

2016-07-18

  • Adjust identifiers in XMLUI item display to be more prominent
  • Add species and breed to the XMLUI item display
  • CGSpace crashed late at night and the DSpace logs were showing:

    2016-07-18 20:26:30,941 ERROR org.dspace.storage.rdbms.DatabaseManager @ SQL connection Error - 
    org.apache.commons.dbcp.SQLNestedException: Cannot get a connection, pool error Timeout waiting for idle object
    ...
    
  • I suspect it’s someone hitting REST too much:

    # awk '{print $1}' /var/log/nginx/rest.log  | sort -n | uniq -c | sort -h | tail -n 3
    710 66.249.78.38
    1781 181.118.144.29
    24904 70.32.99.142
    
  • I just blocked access to /rest for that last IP for now:

     # log rest requests
     location /rest {
         access_log /var/log/nginx/rest.log;
         proxy_pass http://127.0.0.1:8443;
         deny 70.32.99.142;
     }
    

2016-07-21

2016-07-22

  • Help Paola from CCAFS with thumbnails for batch uploads
  • She has been struggling to get the dimensions right, and manually enlarging smaller thumbnails, renaming PNGs to JPG, etc
  • Altmetric reports having an issue with some of our authors being doubled…
  • This is related to authority and confidence!
  • We might need to use index.authority.ignore-prefered=true to tell the Discovery index to prefer the variation that exists in the metadatavalue rather than what it finds in the authority cache.
  • Trying these on DSpace Test after a discussion by Daniel Scharon on the dspace-tech mailing list:

    index.authority.ignore-prefered.dc.contributor.author=true
    index.authority.ignore-variants.dc.contributor.author=false
    
  • After reindexing I don’t see any change in Discovery’s display of authors, and still have entries like:

    Grace, D. (464)
    Grace, D. (62)
    
  • I asked for clarification of the following options on the DSpace mailing list:

    index.authority.ignore
    index.authority.ignore-prefered
    index.authority.ignore-variants
    
  • In the mean time, I will try these on DSpace Test (plus a reindex):

    index.authority.ignore=true
    index.authority.ignore-prefered=true
    index.authority.ignore-variants=true
    
  • Enabled usage of X-Forwarded-For in DSpace admin control panel (#255

  • It was misconfigured and disabled, but already working for some reason sigh

  • … no luck. Trying with just:

    index.authority.ignore=true
    
  • After re-indexing and clearing the XMLUI cache nothing has changed

2016-07-25

  • Trying a few more settings (plus reindex) for Discovery on DSpace Test:

    index.authority.ignore-prefered.dc.contributor.author=true
    index.authority.ignore-variants=true
    
  • Run all OS updates and reboot DSpace Test server

  • No changes to Discovery after reindexing… hmm.

  • Integrate and massively clean up About page (#256)

About page

  • The DSpace source code mentions the configuration key discovery.index.authority.ignore-prefered.* (with prefix of discovery, despite the docs saying otherwise), so I’m trying the following on DSpace Test:

    discovery.index.authority.ignore-prefered.dc.contributor.author=true
    discovery.index.authority.ignore-variants=true
    
  • Still no change!

  • Deploy species, breed, and identifier changes to CGSpace, as well as About page

  • Run Linode RAM upgrade (8→12GB)

  • Re-sync DSpace Test with CGSpace

  • I noticed that our backup scripts don’t send Solr cores to S3 so I amended the script

2016-07-31

  • Work on removing Dryland Systems and Humidtropics subjects from Discovery sidebar and Browse by
  • Also change “Subjects” to “AGROVOC keywords” in Discovery sidebar/search and Browse by (#257)