diff --git a/content/posts/2018-09.md b/content/posts/2018-09.md
index f234df733..d0fa1b74c 100644
--- a/content/posts/2018-09.md
+++ b/content/posts/2018-09.md
@@ -528,5 +528,24 @@ Indexing item downloads (page 260 of 260)
 - Linode emailed to say that CGSpace (linode18) was using 30Mb/sec of outward bandwidth for two hours around midnight
 - I don't see anything unusual in the nginx logs, so perhaps it was the cron job that syncs the Solr database to Amazon S3?
 - It could be that the bot purge yesterday changed the core significantly, so there was a lot to sync?
+- I don't see any drop in JVM heap size in CGSpace's munin stats since I did the Solr cleanup, but this looks pretty good:
+
+![Tomcat max processing time week](/cgspace-notes/2018/09/tomcat_maxtime-week.png)
+
+- I will have to keep an eye on that over the next few weeks to see if things stay as they are
+- I did a batch replacement of the access rights with my [fix-metadata-values.py](https://gist.github.com/alanorth/df92cbfb54d762ba21b28f7cd83b6897) script on DSpace Test:
+
+```
+$ ./fix-metadata-values.py -i /tmp/fix-access-status.csv -db dspace -u dspace -p 'fuuu' -f cg.identifier.status -t correct -m 206
+```
+
+- This changes "Open Access" to "Unrestricted Access" and "Limited Access" to "Restricted Access"
+- After that I did a full Discovery reindex:
+
+```
+$ time schedtool -D -e ionice -c2 -n7 nice -n19 dspace index-discovery -b
+```
+
+- I told Peter it's better to do the access rights before the usage rights because the git branches are conflicting with each other, and it's a pain in the ass to keep changing the values as we discuss, rebase, merge, and fix conflicts
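The nightly Solr-to-S3 sync mentioned in the second bullet is not part of this commit, so the real job is not shown anywhere here. As a rough sketch of what such a cron entry could look like, assuming s3cmd is used and with a made-up schedule, local path, and bucket name:

```
# Hypothetical crontab entry: mirror the Solr statistics core to S3 shortly after midnight
15 0 * * * s3cmd sync --delete-removed /home/cgspace.cgiar.org/solr/statistics/ s3://cgspace-backups/solr/statistics/
```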
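The /tmp/fix-access-status.csv input file is also not included in the commit. Going by the -f and -t arguments and the value mappings described in the post, it presumably pairs each current cg.identifier.status value with its replacement, along these lines (a sketch, not the actual file):

```
cg.identifier.status,correct
Open Access,Unrestricted Access
Limited Access,Restricted Access
```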
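The -m 206 in the fix-metadata-values.py command presumably refers to the metadata_field_id of cg.identifier.status, so one way to confirm the batch replacement afterwards is to list the distinct values remaining in that field. On a DSpace 5 schema, where resource_type_id=2 denotes an item, a check along these lines should work (a sketch, not a command from the original notes):

```
$ psql -U dspace dspace -c "SELECT DISTINCT text_value FROM metadatavalue WHERE metadata_field_id = 206 AND resource_type_id = 2;"
```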
diff --git a/docs/2018-09/index.html b/docs/2018-09/index.html
index acc1cf316..ad02a884e 100644
--- a/docs/2018-09/index.html
+++ b/docs/2018-09/index.html
diff --git a/docs/2018/09/tomcat_maxtime-week.png b/docs/2018/09/tomcat_maxtime-week.png
new file mode 100644
index 000000000..e0dc7a47c
Binary files /dev/null and b/docs/2018/09/tomcat_maxtime-week.png differ
diff --git a/docs/categories/notes/page/2/index.html b/docs/categories/notes/page/2/index.html
deleted file mode 100644
index 53027141a..000000000
--- a/docs/categories/notes/page/2/index.html
+++ /dev/null
diff --git a/docs/categories/notes/page/3/index.html b/docs/categories/notes/page/3/index.html
deleted file mode 100644
index daeba25ba..000000000
--- a/docs/categories/notes/page/3/index.html
+++ /dev/null
diff --git a/docs/categories/notes/page/4/index.html b/docs/categories/notes/page/4/index.html
deleted file mode 100644
index bcc720377..000000000
--- a/docs/categories/notes/page/4/index.html
+++ /dev/null
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index daeb5f96c..c99914ba4 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -4,7 +4,7 @@
   https://alanorth.github.io/cgspace-notes/2018-09/
-  2018-09-26T03:29:56+03:00
+  2018-09-26T09:54:45+03:00
@@ -184,7 +184,7 @@
   https://alanorth.github.io/cgspace-notes/
-  2018-09-26T03:29:56+03:00
+  2018-09-26T09:54:45+03:00
@@ -195,7 +195,7 @@
   https://alanorth.github.io/cgspace-notes/tags/notes/
-  2018-09-26T03:29:56+03:00
+  2018-09-26T09:54:45+03:00
@@ -207,13 +207,13 @@
   https://alanorth.github.io/cgspace-notes/posts/
-  2018-09-26T03:29:56+03:00
+  2018-09-26T09:54:45+03:00
   https://alanorth.github.io/cgspace-notes/tags/
-  2018-09-26T03:29:56+03:00
+  2018-09-26T09:54:45+03:00
diff --git a/static/2018/09/tomcat_maxtime-week.png b/static/2018/09/tomcat_maxtime-week.png
new file mode 100644
index 000000000..e0dc7a47c
Binary files /dev/null and b/static/2018/09/tomcat_maxtime-week.png differ
    - - - - - - - - - diff --git a/docs/sitemap.xml b/docs/sitemap.xml index daeb5f96c..c99914ba4 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -4,7 +4,7 @@ https://alanorth.github.io/cgspace-notes/2018-09/ - 2018-09-26T03:29:56+03:00 + 2018-09-26T09:54:45+03:00 @@ -184,7 +184,7 @@ https://alanorth.github.io/cgspace-notes/ - 2018-09-26T03:29:56+03:00 + 2018-09-26T09:54:45+03:00 0 @@ -195,7 +195,7 @@ https://alanorth.github.io/cgspace-notes/tags/notes/ - 2018-09-26T03:29:56+03:00 + 2018-09-26T09:54:45+03:00 0 @@ -207,13 +207,13 @@ https://alanorth.github.io/cgspace-notes/posts/ - 2018-09-26T03:29:56+03:00 + 2018-09-26T09:54:45+03:00 0 https://alanorth.github.io/cgspace-notes/tags/ - 2018-09-26T03:29:56+03:00 + 2018-09-26T09:54:45+03:00 0 diff --git a/static/2018/09/tomcat_maxtime-week.png b/static/2018/09/tomcat_maxtime-week.png new file mode 100644 index 000000000..e0dc7a47c Binary files /dev/null and b/static/2018/09/tomcat_maxtime-week.png differ