About me


I am a senior data scientist at RealityMine Ltd, a media and market research data company, where I use a range of statistical and big data tools to extract value and insight from our ~1TB/day of media and consumer data. Day to day, I work mainly in R, Python and Scala. My current interests at work are data fusion, the classification and taxonomy of web traffic data, predictive analytics, and the design and analysis of online surveys.

I was previously a research fellow in the Centre for Biostatistics and the Centre for Primary Care at the University of Manchester. There I worked with large-scale electronic medical record databases, containing millions of electronically captured doctor-patient consultations across hundreds of GP practices, to address important questions in public health. I also worked to improve access to such data.

In 2012 I completed a PhD in evolutionary biology, studying plastic and genetic responses to environmental change. My previous work has covered a range of topics, including evolutionary genetics, artificial life, machine learning, computational biology and bioinformatics.

I started coding as a kid, building text adventures and fractals in AmigaBASIC, but I have been programming seriously since 2008. I am an expert in R and Python and fluent in Clojure, Scala and SQL. I also speak conversational Common Lisp, Scheme, Processing, C and Stata. I have skills and experience in acquiring, processing, modelling and visualising large and complicated datasets. I am interested in open source software, open data, health informatics, and functional, statistical and data programming.

I am a dad of two young girls, an avid reader, calisthenics enthusiast, and connoisseur of lo-fi indie pop. I eat a 100% plant-based diet and occasionally play the flamenco guitar.

I have also set up a number of side projects:

  • rOpenHealth – A collaborative project to build R tools facilitating access to quantitative healthcare and public health data (twitter)
  • EMR_research papers – A real-time Twitter feed of research articles on electronic medical records
  • ClinicalCodes – An online clinical codes repository to improve the validity and reproducibility of medical database research (twitter)

About this site


This site covers my various interests in and around data science: open source, open access and open data in academia; bio- and health informatics; and distilling complex datasets from a range of sources into meaningful information.

This blog is in its fourth incarnation. I first hosted it on Blogger before moving it to GitHub Pages. After the move, I first built the static site in Clojure using the Misaki static site generator (which I edited slightly to meet my needs). I then switched to Samatha, my own static site engine written entirely in R. I have now moved to Octopress. I am very tempted to change again and keep the whole site in a single org-mode file. Once I stop messing around with platforms, I may get around to posting more frequently.
