mirror of
https://github.com/alanorth/cgspace-notes.git
synced 2024-11-25 16:08:19 +01:00
Update notes for 2019-09-21
parent
77bc2f3b6b
commit
ddf3b1346b
@ -291,4 +291,18 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
- Continue with institutional author normalization
- Ask which collection to map items with type Brochure, Journal Item, and Thesis?

## 2019-09-21

- Re-upload the [IITA Sept 6 (20196th.xls) records to DSpace Test](https://dspacetest.cgiar.org/handle/10568/105116) after I did the re-sync yesterday
  - Then I looked at the records again and sent some feedback about three duplicates to Bosede
  - Also I noticed that many journal articles have the journal and page information in the citation, but are missing `dc.source` and `dc.format.extent` fields
- Play with language identification using the langdetect, fasttext, polyglot, and langid libraries
  - polyglot requires too many system dependencies to compile
  - langdetect didn't seem as accurate as the others
  - fasttext is likely the best, but [prints a blank line to the console when loading a model](https://github.com/facebookresearch/fastText/issues/909)
  - langid seems to be the best considering the above experiences
- I added very experimental language detection to the [csv-metadata-quality](https://github.com/ilri/csv-metadata-quality) module
  - It works by checking the predicted language of the `dc.title` field against the item's `dc.language.iso` field
  - I tested it on the Bioversity migration data set and actually managed to correct about eight incorrect language fields in their records!
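
A minimal sketch of the title-versus-`dc.language.iso` check described above. This is not the actual csv-metadata-quality code: the function names are hypothetical, and it assumes `dc.language.iso` holds two-letter ISO 639-1 codes (which is what `langid.classify()` returns). The detector is passed in as a parameter so that langid is only imported when it is really used:

```python
def langid_detect(text):
    """Return the ISO 639-1 code that langid predicts for the given text."""
    import langid  # third-party dependency: pip install langid

    code, _score = langid.classify(text)
    return code


def find_language_mismatches(rows, detect=langid_detect):
    """Yield (title, declared, predicted) for rows where the languages disagree.

    `rows` is any iterable of dicts keyed by column name, for example a
    csv.DictReader over a DSpace metadata export.
    """
    for row in rows:
        title = (row.get("dc.title") or "").strip()
        declared = (row.get("dc.language.iso") or "").strip().lower()
        # Rows missing either value cannot be compared, so skip them
        if not title or not declared:
            continue
        predicted = detect(title)
        if predicted != declared:
            yield title, declared, predicted
```

With langid installed, passing a `csv.DictReader` over an exported CSV to `find_language_mismatches()` would list candidate rows for manual review. Short titles are easy to misclassify, so the predictions are probably better treated as hints than as automatic corrections.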

<!-- vim: set sw=2 ts=2: -->
@ -37,7 +37,7 @@ $ psql -c 'SELECT * from pg_stat_activity;' | grep idle | grep -c cgspac
|
||||
78
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -37,7 +37,7 @@ Replace lzop with xz in log compression cron jobs on DSpace Test—it uses less
|
||||
-rw-rw-r-- 1 tomcat7 tomcat7 169K Nov 18 23:59 dspace.log.2015-11-18.xz
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -27,7 +27,7 @@ Move ILRI collection 10568/12503 from 10568/27869 to 10568/27629 using the move_
|
||||
I realized it is only necessary to clear the Cocoon cache after moving collections—rather than reindexing—as no metadata has changed, and therefore no search or browse indexes need to be updated.
|
||||
Update GitHub wiki for documentation of maintenance tasks.
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -41,7 +41,7 @@ I noticed we have a very interesting list of countries on CGSpace:
|
||||
Not only are there 49,000 countries, we have some blanks (25)…
|
||||
Also, lots of things like “COTE D`LVOIRE” and “COTE D IVOIRE”
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -27,7 +27,7 @@ Looking at issues with author authorities on CGSpace
|
||||
For some reason we still have the index-lucene-update cron job active on CGSpace, but I’m pretty sure we don’t need it as of the latest few versions of Atmire’s Listings and Reports module
|
||||
Reinstall my local (Mac OS X) DSpace stack with Tomcat 7, PostgreSQL 9.3, and Java JDK 1.7 to match environment on CGSpace server
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -31,7 +31,7 @@ After running DSpace for over five years I’ve never needed to look in any
|
||||
This will save us a few gigs of backup space we’re paying for on S3
|
||||
Also, I noticed the checker log has some errors we should pay attention to:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -37,7 +37,7 @@ There are 3,000 IPs accessing the REST API in a 24-hour period!
|
||||
3168
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -33,7 +33,7 @@ This is their publications set: http://ebrary.ifpri.org/oai/oai.php?verb=ListRec
|
||||
You can see the others by using the OAI ListSets verb: http://ebrary.ifpri.org/oai/oai.php?verb=ListSets
|
||||
Working on second phase of metadata migration, looks like this will work for moving CPWF-specific data in dc.identifier.fund to cg.identifier.cpwfproject and then the rest to dc.description.sponsorship
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -47,7 +47,7 @@ text_value
|
||||
|
||||
In this case the select query was showing 95 results before the update
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -45,7 +45,7 @@ $ git reset --hard ilri/5_x-prod
|
||||
$ git rebase -i dspace-5.5
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -37,7 +37,7 @@ It looks like we might be able to use OUs now, instead of DCs:
|
||||
$ ldapsearch -x -H ldaps://svcgroot2.cgiarad.org:3269/ -b "dc=cgiarad,dc=org" -D "admigration1@cgiarad.org" -W "(sAMAccountName=admigration1)"
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -45,7 +45,7 @@ I exported a random item’s metadata as CSV, deleted all columns except id
|
||||
0000-0002-6115-0956||0000-0002-3812-8793||0000-0001-7462-405X
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -27,7 +27,7 @@ Add dc.type to the output options for Atmire’s Listings and Reports module
|
||||
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -53,7 +53,7 @@ I’ve raised a ticket with Atmire to ask
|
||||
|
||||
Another worrying error from dspace.log is:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -27,7 +27,7 @@ I checked to see if the Solr sharding task that is supposed to run on January 1s
|
||||
I tested on DSpace Test as well and it doesn’t work there either
|
||||
I asked on the dspace-tech mailing list because it seems to be broken, and actually now I’m not sure if we’ve ever had the sharding task run successfully over all these years
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -53,7 +53,7 @@ Create issue on GitHub to track the addition of CCAFS Phase II project tags (#30
|
||||
|
||||
Looks like we’ll be using cg.identifier.ccafsprojectpii as the field name
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -61,7 +61,7 @@ $ identify ~/Desktop/alc_contrastes_desafios.jpg
|
||||
/Users/aorth/Desktop/alc_contrastes_desafios.jpg JPEG 464x600 464x600+0+0 8-bit CMYK 168KB 0.000u 0:00.000
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -47,7 +47,7 @@ Testing the CMYK patch on a collection with 650 items:
|
||||
$ [dspace]/bin/dspace filter-media -f -i 10568/16498 -p "ImageMagick PDF Thumbnail" -v >& /tmp/filter-media-cmyk.txt
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="May, 2017"/>
|
||||
<meta name="twitter:description" content="2017-05-01 ICARDA apparently started working on CG Core on their MEL repository They have done a few cg.* fields, but not very consistent and even copy some of CGSpace items: https://mel.cgiar.org/xmlui/handle/20.500.11766/6911?show=full https://cgspace.cgiar.org/handle/10568/73683 2017-05-02 Atmire got back about the Workflow Statistics issue, and apparently it’s a bug in the CUA module so they will send us a pull request 2017-05-04 Sync DSpace Test with database and assetstore from CGSpace Re-deploy DSpace Test with Atmire’s CUA patch for workflow statistics, run system updates, and restart the server Now I can see the workflow statistics and am able to select users, but everything returns 0 items Megan says there are still some mapped items are not appearing since last week, so I forced a full index-discovery -b Need to remember to check if the collection has more items (currently 39 on CGSpace, but 118 on the freshly reindexed DSPace Test) tomorrow: https://cgspace."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="June, 2017"/>
|
||||
<meta name="twitter:description" content="2017-06-01 After discussion with WLE and CGSpace content people, we decided to just add one metadata field for the WLE Research Themes The cg.identifier.wletheme field will be used for both Phase I and Phase II Research Themes Then we’ll create a new sub-community for Phase II and create collections for the research themes there The current “Research Themes” community will be renamed to “WLE Phase I Research Themes” Tagged all items in the current Phase I collections with their appropriate themes Create pull request to add Phase II research themes to the submission form: #328 Add cg."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -39,7 +39,7 @@ Merge changes for WLE Phase II theme rename (#329)
|
||||
Looking at extracting the metadata registries from ICARDA’s MEL DSpace database so we can compare fields with CGSpace
|
||||
We can use PostgreSQL’s extended output format (-x) plus sed to format the output into quasi XML:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -59,7 +59,7 @@ This was due to newline characters in the dc.description.abstract column, which
|
||||
I exported a new CSV from the collection on DSpace Test and then manually removed the characters in vim using g/^$/d
|
||||
Then I cleaned up the author authorities and HTML characters in OpenRefine and sent the file back to Abenet
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -35,7 +35,7 @@ Linode sent an alert that CGSpace (linode18) was using 261% CPU for the past two
|
||||
|
||||
Ask Sisay to clean up the WLE approvers a bit, as Marianne’s user account is both in the approvers step as well as the group
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -37,7 +37,7 @@ There appears to be a pattern but I’ll have to look a bit closer and try t
|
||||
|
||||
Add Katherine Lutz to the groups for content submission and edit steps of the CGIAR System collections
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -55,7 +55,7 @@ dspace=# \copy (select distinct text_value, count(*) as count from metadatavalue
|
||||
COPY 54701
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -29,7 +29,7 @@ The logs say “Timeout waiting for idle object”
|
||||
PostgreSQL activity says there are 115 connections currently
|
||||
The list of connections to XMLUI and REST API for today:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -163,7 +163,7 @@ dspace.log.2018-01-02:34
|
||||
|
||||
Danny wrote to ask for help renewing the wildcard ilri.org certificate and I advised that we should probably use Let’s Encrypt if it’s just a handful of domains
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -29,7 +29,7 @@ We don’t need to distinguish between internal and external works, so that
|
||||
Yesterday I figured out how to monitor DSpace sessions using JMX
|
||||
I copied the logic in the jmx_tomcat_dbpools provided by Ubuntu’s munin-plugins-java package and used the stuff I discovered about JMX in 2018-01
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -23,7 +23,7 @@ Export a CSV of the IITA community metadata for Martin Mueller
|
||||
|
||||
Export a CSV of the IITA community metadata for Martin Mueller
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -25,7 +25,7 @@ Catalina logs at least show some memory errors yesterday:
|
||||
I tried to test something on DSpace Test but noticed that it’s down since god knows when
|
||||
Catalina logs at least show some memory errors yesterday:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -37,7 +37,7 @@ http://localhost:3000/solr/statistics/update?stream.body=%3Ccommit/%3E
|
||||
Then I reduced the JVM heap size from 6144 back to 5120m
|
||||
Also, I switched it to use OpenJDK instead of Oracle Java, as well as re-worked the Ansible infrastructure scripts to support hosts choosing which distribution they want to use
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -65,7 +65,7 @@ user 8m5.056s
|
||||
sys 2m7.289s
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -39,7 +39,7 @@ During the mvn package stage on the 5.8 branch I kept getting issues with java r
|
||||
There is insufficient memory for the Java Runtime Environment to continue.
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -57,7 +57,7 @@ The server only has 8GB of RAM so we’ll eventually need to upgrade to a la
|
||||
|
||||
I ran all system updates on DSpace Test and rebooted it
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -29,7 +29,7 @@ I’ll update the DSpace role in our Ansible infrastructure playbooks and ru
|
||||
Also, I’ll re-run the postgresql tasks because the custom PostgreSQL variables are dynamic according to the system’s RAM, and we never re-ran them after migrating to larger Linodes last month
|
||||
I’m testing the new DSpace 5.8 branch in my Ubuntu 18.04 environment and I’m getting those autowire errors in Tomcat 8.5.30 again:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -25,7 +25,7 @@ I created a GitHub issue to track this #389, because I’m super busy in Nai
|
||||
Phil Thornton got an ORCID identifier so we need to add it to the list on CGSpace and tag his existing items
|
||||
I created a GitHub issue to track this #389, because I’m super busy in Nairobi right now
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -39,7 +39,7 @@ Send a note about my dspace-statistics-api to the dspace-tech mailing list
|
||||
Linode has been sending mails a few times a day recently that CGSpace (linode18) has had high CPU usage
|
||||
Today these are the top 10 IPs:
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -39,7 +39,7 @@ Then I ran all system updates and restarted the server
|
||||
|
||||
I noticed that there is another issue with PDF thumbnails on CGSpace, and I see there was another Ghostscript vulnerability last week
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -53,7 +53,7 @@ I don’t see anything interesting in the web server logs around that time t
|
||||
903 54.70.40.11
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -81,7 +81,7 @@ user 0m22.203s
|
||||
sys 0m1.979s
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -45,7 +45,7 @@ Most worryingly, there are encoding errors in the abstracts for eleven items, fo
|
||||
|
||||
I think I will need to ask Udana to re-copy and paste the abstracts with more care using Google Docs
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -71,7 +71,7 @@ $ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-2-countries.csv -db dspa
|
||||
$ ./delete-metadata-values.py -i /tmp/2019-02-21-delete-1-region.csv -db dspace -u dspace -p 'fuuu' -m 231 -f cg.coverage.region -d
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -51,7 +51,7 @@ DELETE 1
|
||||
|
||||
But after this I tried to delete the item from the XMLUI and it is still present…
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -37,7 +37,7 @@ Run system updates on CGSpace (linode18) and reboot it
|
||||
|
||||
Skype with Marie-Angélique and Abenet about CG Core v2
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -37,7 +37,7 @@ CGSpace
|
||||
|
||||
Abenet had another similar issue a few days ago when trying to find the stats for 2018 in the RTB community
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -49,7 +49,7 @@ After rebooting, all statistics cores were loaded… wow, that’s luck
|
||||
|
||||
Run system updates on DSpace Test (linode19) and reboot it
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -40,7 +40,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
<meta property="og:type" content="article" />
|
||||
<meta property="og:url" content="https://alanorth.github.io/cgspace-notes/2019-09/" />
|
||||
<meta property="article:published_time" content="2019-09-01T10:17:51+03:00" />
|
||||
<meta property="article:modified_time" content="2019-09-20T13:25:59+03:00" />
|
||||
<meta property="article:modified_time" content="2019-09-21T02:25:19+03:00" />
|
||||
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="September, 2019"/>
|
||||
@ -75,7 +75,7 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
9124 45.5.186.2
|
||||
|
||||
"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
@ -85,9 +85,9 @@ Here are the top ten IPs in the nginx XMLUI and REST/OAI logs this morning:
|
||||
"@type": "BlogPosting",
|
||||
"headline": "September, 2019",
|
||||
"url": "https:\/\/alanorth.github.io\/cgspace-notes\/2019-09\/",
|
||||
"wordCount": "2166",
|
||||
"wordCount": "2325",
|
||||
"datePublished": "2019-09-01T10:17:51\x2b03:00",
|
||||
"dateModified": "2019-09-20T13:25:59\x2b03:00",
|
||||
"dateModified": "2019-09-21T02:25:19\x2b03:00",
|
||||
"author": {
|
||||
"@type": "Person",
|
||||
"name": "Alan Orth"
|
||||
@ -510,6 +510,31 @@ $ dspace import -a me@cgiar.org -m 2019-09-20-bioversity2.map -s /home/aorth/Bio
</ul></li>
</ul>

<h2 id="2019-09-21">2019-09-21</h2>

<ul>
<li>Re-upload the <a href="https://dspacetest.cgiar.org/handle/10568/105116">IITA Sept 6 (20196th.xls) records to DSpace Test</a> after I did the re-sync yesterday

<ul>
<li>Then I looked at the records again and sent some feedback about three duplicates to Bosede</li>
<li>Also I noticed that many journal articles have the journal and page information in the citation, but are missing <code>dc.source</code> and <code>dc.format.extent</code> fields</li>
</ul></li>
<li>Play with language identification using the langdetect, fasttext, polyglot, and langid libraries

<ul>
<li>polyglot requires too many system dependencies to compile</li>
<li>langdetect didn’t seem as accurate as the others</li>
<li>fasttext is likely the best, but <a href="https://github.com/facebookresearch/fastText/issues/909">prints a blank line to the console when loading a model</a></li>
<li>langid seems to be the best considering the above experiences</li>
</ul></li>
<li>I added very experimental language detection to the <a href="https://github.com/ilri/csv-metadata-quality">csv-metadata-quality</a> module

<ul>
<li>It works by checking the predicted language of the <code>dc.title</code> field against the item’s <code>dc.language.iso</code> field</li>
<li>I tested it on the Bioversity migration data set and actually managed to correct about eight incorrect language fields in their records!</li>
</ul></li>
</ul>

<!-- vim: set sw=2 ts=2: -->

@ -14,7 +14,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="404 Page not found"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Categories"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGIAR Library Migration"/>
|
||||
<meta name="twitter:description" content="Notes on the migration of the CGIAR Library to CGSpace"/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="CGSpace Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Posts"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -4,27 +4,27 @@
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/</loc>
|
||||
<lastmod>2019-09-20T13:25:59+03:00</lastmod>
|
||||
<lastmod>2019-09-21T02:25:19+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/tags/notes/</loc>
|
||||
<lastmod>2019-09-20T13:25:59+03:00</lastmod>
|
||||
<lastmod>2019-09-21T02:25:19+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/posts/</loc>
|
||||
<lastmod>2019-09-20T13:25:59+03:00</lastmod>
|
||||
<lastmod>2019-09-21T02:25:19+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/2019-09/</loc>
|
||||
<lastmod>2019-09-20T13:25:59+03:00</lastmod>
|
||||
<lastmod>2019-09-21T02:25:19+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
<loc>https://alanorth.github.io/cgspace-notes/tags/</loc>
|
||||
<lastmod>2019-09-20T13:25:59+03:00</lastmod>
|
||||
<lastmod>2019-09-21T02:25:19+03:00</lastmod>
|
||||
</url>
|
||||
|
||||
<url>
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Tags"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Notes"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Tags"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Tags"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Tags"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|
@ -15,7 +15,7 @@
|
||||
<meta name="twitter:card" content="summary"/>
|
||||
<meta name="twitter:title" content="Tags"/>
|
||||
<meta name="twitter:description" content="Documenting day-to-day work on the [CGSpace](https://cgspace.cgiar.org) repository."/>
|
||||
<meta name="generator" content="Hugo 0.58.2" />
|
||||
<meta name="generator" content="Hugo 0.58.3" />
|
||||
|
||||
|
||||
|
||||
|