Prediction modelling in the real world: Improving predictions of chronic diseases using longitudinal observational datasets

Summary of research

Healthcare is moving into an era of ‘big data’. The volume of information available to health researchers and clinicians is already large, and it is set to increase rapidly as electronic medical records from primary and secondary care, patient-reported outcomes, eHealth and genetic/laboratory results are incorporated into NHS services aspiring to be ‘Learning Health Systems’. Such systems provide diagnoses, therapies and recommendations to patients informed not only by the latest clinical trials, but also by the real-world experiences over time of large numbers of patients with the same conditions.

Clinical decisions are optimally based on the ‘gold standard’ of randomised controlled trials (RCTs). However, these can be prohibitively expensive, unethical or impractical, particularly for low prevalence conditions and minority or high-risk population subgroups. Recent research has demonstrated that often fewer than half of patients in the population would be eligible to take part in an RCT. As a result, a large proportion of the population is effectively isolated from evidence-based medicine. This problem is compounded because the very patients not covered by RCTs (minority groups, high risk patients and patients with multi-morbidities) are the ones with the greatest healthcare needs. Evidence from real-world observational data, complementary to RCT evidence, allows investigation to be scaled up, supports more systematic analyses based on heterogeneous populations and makes it possible to track patients’ medical history over a period of years.

Despite the transformational potential promised by the availability and integration of large and diverse real-world medical data, the sheer volume, variety and complexity of such data present challenges that can only be met with a multidisciplinary approach. This approach must combine engineering, statistical and clinical expertise within institutions that are dedicated to driving innovation in healthcare knowledge discovery and to harnessing real-world evidence from diverse sources. Inherently large and complex real-world observational datasets present many challenges to predictive modelling: observations are not collected at regular times, data are often missing and some groups of patients (e.g. older and high risk patients) are often under-represented in the data. Finally, most commonly used data management and statistical methods and software packages are ill-equipped to deal with large, complex and heterogeneous data.

This study will improve on current research methods to make better use of longitudinal observational data. Specifically, it will focus on two applications of this work:

Developing new risk models to predict risks of chronic diseases.

Current models for predicting risk for chronic diseases (such as cardiovascular disease and diabetes) that are used by clinicians and patients alike are adequate for prediction at the population level (e.g. X% of patients in a given group are likely to develop the condition in the next Y years). However, these models often do not perform well at the individual level (i.e. what is the risk for a particular patient of developing a condition within a given time-frame?). Patients at high risk often receive the least accurate predictions, yet they are the group most in need of accurate risk predictions. I will identify what is causing current risk models to perform poorly at these important extremes and develop new models that are tuned to individual level prediction. These models will make use of the full medical record, incorporate variation in clinical indicators (such as blood pressure and blood sugar levels) over time, give greater weight to high risk groups and explore the relationships between risk factors. They will be made available as an online tool and presented as intuitively as possible with a specified range of confidence.
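One way to see the gap between population level and subgroup level performance is to compare predicted risk with observed event rates within bands of predicted risk. The following is a minimal sketch under stated assumptions, not the method used in this project: the function name calibration_by_band, the single cut point of 0.3 and all of the data are hypothetical and chosen purely for illustration.

```python
def calibration_by_band(predicted, observed, cut_points):
    """Compare mean predicted risk with the observed event rate in each risk band.

    predicted  : predicted risks between 0 and 1, one per patient
    observed   : 1 if the patient developed the condition, otherwise 0
    cut_points : ascending thresholds that split patients into risk bands
    """
    bands = {}
    for p, y in zip(predicted, observed):
        # Band index = number of cut points the predicted risk meets or exceeds
        idx = sum(p >= c for c in cut_points)
        bands.setdefault(idx, []).append((p, y))

    rows = []
    for idx in sorted(bands):
        pairs = bands[idx]
        n = len(pairs)
        rows.append({
            "band": idx,
            "n": n,
            "mean_predicted": sum(p for p, _ in pairs) / n,
            "observed_rate": sum(y for _, y in pairs) / n,
        })
    return rows


# Made-up data: 10 lower-risk patients (1 event) and 5 higher-risk patients (3 events)
predicted = [0.2] * 10 + [0.4] * 5
observed = [1] + [0] * 9 + [1, 1, 1, 0, 0]

overall_predicted = sum(predicted) / len(predicted)
overall_observed = sum(observed) / len(observed)
rows = calibration_by_band(predicted, observed, cut_points=[0.3])

print(f"overall: predicted {overall_predicted:.3f}, observed {overall_observed:.3f}")
for row in rows:
    print(row)
```

In this invented example the overall predicted and observed rates agree, while the higher-risk band is under-predicted (a mean predicted risk of 0.4 against an observed rate of 0.6). A real evaluation would need far larger samples and would have to account for censoring and follow-up time.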

Investigating the effect of within-patient variability in clinical indicators.

Patients with wide variation over time in clinical indicators such as blood pressure and blood sugar may be at higher risk than patients with consistently high levels. Also, instability in such measures could indicate important patient characteristics, for example that patients are not taking their medication regularly. The effectiveness of some drugs has been attributed to their ability to stabilise these measures, as well as simply to reduce them to a ‘safe’ level. Routinely collected healthcare data is potentially a powerful, yet underused, resource for studying such variation. I will investigate how best to use this ‘within-patient’ or ‘visit-to-visit’ variation to better predict future complications and comorbidities from chronic diseases.
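As a simple illustration of what ‘visit-to-visit’ variation can mean in practice, the sketch below summarises each patient's readings by their mean, standard deviation and coefficient of variation. These are common summary measures rather than the specific measures this project will adopt; the function name visit_to_visit_variability and the blood pressure readings are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def visit_to_visit_variability(readings):
    """Summarise within-patient variation in a clinical indicator.

    readings : iterable of (patient_id, value) pairs, in any order
    Returns a dict keyed by patient_id for patients with at least two readings.
    """
    by_patient = defaultdict(list)
    for patient_id, value in readings:
        by_patient[patient_id].append(value)

    summary = {}
    for patient_id, values in by_patient.items():
        if len(values) < 2:
            continue  # the standard deviation is undefined for a single reading
        m = mean(values)
        sd = stdev(values)  # sample standard deviation
        summary[patient_id] = {"n": len(values), "mean": m, "sd": sd, "cv": sd / m}
    return summary


# Made-up systolic blood pressure readings (mmHg)
readings = [
    ("A", 150), ("A", 152), ("A", 148),  # consistently high
    ("B", 120), ("B", 180), ("B", 150),  # same average, wide variation
    ("C", 135),                          # only one reading, so excluded
]
result = visit_to_visit_variability(readings)
for patient_id, stats in sorted(result.items()):
    print(patient_id, stats)
```

Patients A and B have the same average reading, so a model that uses only the mean or the latest value cannot tell them apart, whereas the standard deviation separates them clearly. Summaries like these ignore the irregular timing of visits, which a fuller analysis would need to handle.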

This project will improve outcomes for patients by improving on current risk tools for clinical practice, and it will pave the way for incorporating the health record into these tools. It will result in publications that inform the use of big healthcare data to improve services and will be a stepping stone to further work in this important area.

Patient and Public Involvement

Patient and Public Involvement (PPI) is an important aspect of all modern healthcare research. In this project I am particularly concerned with how the public understands and perceives the risk of developing long-term conditions, and of developing the medical problems associated with them.

Questions to be addressed with PPI in this project:

  • How can risk scores and predictive models be best presented to patients and doctors?
  • How can risk be communicated most effectively without over-simplifying the results?
  • How can public awareness and understanding of risk be improved?
  • What tools can be used to improve public understanding of risk of long-term conditions?

If you are interested in how your medical data can be better used to improve YOUR healthcare, we are looking for patients to help us shape this project. See this PPI Newsletter for further details.