CGSpace Notes

Documenting day-to-day work on the CGSpace repository.

October, 2017

2017-10-01

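  • Peter pointed out that some items in the ILRI Archive have two dc.identifier.uri values (handles), for example:
http://hdl.handle.net/10568/78495||http://hdl.handle.net/10568/79336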
  • There appears to be a pattern but I’ll have to look a bit closer and try to clean them up automatically, either in SQL or in OpenRefine
  • Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections

2017-10-02

  • Peter Ballantyne said he was having problems logging into CGSpace with “both” of his accounts (CGIAR LDAP and personal, apparently)
  • I looked in the logs and saw some LDAP lookup failures due to timeout but also strangely a “no DN found” error:
2017-10-01 20:24:57,928 WARN  org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:ldap_attribute_lookup:type=failed_search javax.naming.CommunicationException: svcgroot2.cgiarad.org:3269 [Root exception is java.net.ConnectException: Connection timed out (Connection timed out)]
2017-10-01 20:22:37,982 INFO  org.dspace.authenticate.LDAPAuthentication @ anonymous:session_id=CA0AA5FEAEA8805645489404CDCE9594:ip_addr=41.204.190.40:failed_login:no DN found for user pballantyne
  • I thought maybe his account had expired (seeing as it was the first of the month) but he says he was finally able to log in today
  • The logs for yesterday show fourteen errors related to LDAP auth failures:
$ grep -c "ldap_authentication:type=failed_auth" dspace.log.2017-10-01
14
  • For what it’s worth, there are no errors on any other recent days, so it must have been some network issue on Linode or CGNET’s LDAP server
  • Linode emailed to say that linode578611 (DSpace Test) needs to migrate to a new host for a security update so I initiated the migration immediately rather than waiting for the scheduled time in two weeks
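To back up the claim that the LDAP failures were limited to one day, the per-day check can be repeated across all daily logs at once. The following is a minimal Python sketch of that idea, not something that was run at the time. It assumes the logs sit in one directory and are named dspace.log.YYYY-MM-DD as in the grep above; the function names are made up for illustration.

```python
from pathlib import Path

# Same marker the grep above searches for.
MARKER = "ldap_authentication:type=failed_auth"


def count_failures(lines):
    """Count lines containing the LDAP failure marker (like grep -c)."""
    return sum(1 for line in lines if MARKER in line)


def failures_by_day(log_dir):
    """Map each dspace.log.* file name in log_dir to its failure count."""
    counts = {}
    for path in sorted(Path(log_dir).glob("dspace.log.*")):
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
        with path.open(errors="replace") as handle:
            counts[path.name] = count_failures(handle)
    return counts
```

Calling failures_by_day() on the DSpace log directory and printing the result gives one count per day, so a single bad day stands out immediately.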

2017-10-04

  • Twice in the last twenty-four hours Linode has alerted about high CPU usage on CGSpace (linode2533629)
  • Communicate with Sam from the CGIAR System Organization about some broken links coming from their CGIAR Library domain to CGSpace
  • The first is a link to a browse page that should be handled better in nginx:
http://library.cgiar.org/browse?value=Intellectual%20Assets%20Reports&type=subject → https://cgspace.cgiar.org/browse?value=Intellectual%20Assets%20Reports&type=subject
  • We’ll need to check for browse links and handle them properly, including swapping the subject parameter for systemsubject, as we have moved their poorly curated subjects from dc.subject to cg.subject.system (systemsubject doesn’t exist in Discovery yet, so we’ll need to add it)
  • The second link was a direct link to a bitstream which has broken due to the sequence being updated, so I told him he should link to the handle of the item instead
  • Help Sisay proof sixty-two IITA records on DSpace Test
  • Lots of inconsistencies and errors in subjects, dc.format.extent, regions, countries
  • Merge the Discovery search changes for ISI Journal (#341)
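The browse link mapping described above can be written down as a small function to make the intended behaviour explicit. This is only a Python sketch of the rewrite rule, not the nginx configuration itself. It assumes the new Discovery index will be reachable as type=systemsubject, which does not exist yet; the function name is hypothetical.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def rewrite_browse_url(url):
    """Map a library.cgiar.org browse URL to its CGSpace equivalent.

    Returns None for URLs that are not browse links on the old domain.
    """
    parts = urlsplit(url)
    if parts.hostname != "library.cgiar.org" or parts.path != "/browse":
        return None
    # Swap type=subject for type=systemsubject, keep everything else.
    params = [
        (key, "systemsubject" if key == "type" and value == "subject" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # quote_via=quote encodes spaces as %20, matching the original links.
    query = urlencode(params, quote_via=quote)
    return urlunsplit(("https", "cgspace.cgiar.org", parts.path, query, ""))
```

Having the rule as a function makes it easy to check a list of known inbound links before committing to a redirect pattern in nginx.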

2017-10-05

  • Twice in the past twenty-four hours Linode has warned that CGSpace’s outbound traffic rate was exceeding the notification threshold
  • I had a look at yesterday’s OAI and REST logs in /var/log/nginx but didn’t see anything unusual:
# awk '{print $1}' /var/log/nginx/rest.log.1 | sort -n | uniq -c | sort -h | tail -n 10
    141 157.55.39.240
    145 40.77.167.85
    162 66.249.66.92
    181 66.249.66.95
    211 66.249.66.91
    312 66.249.66.94
    384 66.249.66.90
   1495 50.116.102.77
   3904 70.32.83.92
   9904 45.5.184.196
# awk '{print $1}' /var/log/nginx/oai.log.1 | sort -n | uniq -c | sort -h | tail -n 10
      5 66.249.66.71
      6 66.249.66.67
      6 68.180.229.31
      8 41.84.227.85
      8 66.249.66.92
     17 66.249.66.65
     24 66.249.66.91
     38 66.249.66.95
     69 66.249.66.90
    148 66.249.66.94
  • Working on the nginx redirects for CGIAR Library
  • We should start using 301 redirects and also allow for /sitemap to work on the library.cgiar.org domain so the CGIAR System Organization people can update their Google Search Console and allow Google to find their content in a structured way
  • Remove eleven occurrences of ACP in IITA’s cg.coverage.region using the Atmire batch edit module from Discovery
  • Need to investigate how we can verify the library.cgiar.org domain using the HTML or DNS methods
  • Run corrections on 143 ILRI Archive items that had two dc.identifier.uri values (Handle) that Peter had pointed out earlier this week
  • I used OpenRefine to isolate them and then fixed and re-imported them into CGSpace
  • I manually checked a dozen of them and it appeared that the correct handle was always the second one, so I just deleted the first one
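The handle cleanup was done in OpenRefine, but the same rule (keep the second of two handles) could be applied to a metadata CSV export in a script. Below is a rough Python sketch under two assumptions: that multiple values are joined with "||" as in the example from earlier in the month, and that the second handle really is the correct one for every affected item, which was only spot-checked on about a dozen. All function names are illustrative.

```python
SEPARATOR = "||"  # multi-value separator, as in the example handles above


def split_values(cell):
    """Split a multi-valued cell into its non-empty values."""
    return [value.strip() for value in cell.split(SEPARATOR) if value.strip()]


def keep_second_handle(cell):
    """If the cell holds exactly two values, return the second; else leave it."""
    values = split_values(cell)
    if len(values) == 2:
        return values[1]
    return cell


def fix_rows(rows, column="dc.identifier.uri"):
    """Return corrected copies of only the rows whose handle cell changed."""
    fixed = []
    for row in rows:
        original = row.get(column, "")
        corrected = keep_second_handle(original)
        if corrected != original:
            new_row = dict(row)
            new_row[column] = corrected
            fixed.append(new_row)
    return fixed
```

Returning only the changed rows keeps the re-import small and makes it easy to compare the number of corrected items against the expected 143.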