Cancer remains one of the great challenges of our time. It is the second leading cause of death globally and kills about 9 million people every year. The number of new cases is expected to rise by about 70% over the next two decades. A highlight from the NCRI conference held in Glasgow last autumn was a session which asked the question: are data-driven approaches the way forward in tackling these challenges?
The task for the session’s three presenters – Andrew Morris, Richard Martin and Eva Morris – wasn’t insignificant: every single cancer patient can generate nearly one terabyte of biomedical data. Data come from a vast array of sources, including patient history, hospital and primary care records, and diagnostic imaging. Recently, the gathering of detailed genetic information on both patients and their tumours has become far more commonplace, as has the use of techniques such as machine learning, which have the capacity to mine the growing mountains of data at our disposal. The hope is that somewhere within these huge datasets lie clues to better diagnosis and treatment of cancer.
The session explored the cancer data landscape from three fascinating perspectives. Richard Martin, from Bristol University, tackled the issue of genomic information – almost every day, it seems, scientists discover an association between single nucleotide polymorphisms (SNPs, or ‘snips’) and various cancer traits. Genome-wide association studies (GWAS) are slowly uncovering these associations, but we face a problem – how do we make sense of all these data, and how can we tease out the individual contributions of genes, the environment and lifestyle?
Richard reported on the ‘MR-Base’ database, which contains 45.6 billion SNP-trait associations from more than 5,200 GWAS, covering more than 4 million individuals. It supports ‘Mendelian randomisation’ studies, which can examine important associations between genes and cancer traits, and explore genetic influences on treatment efficacy and side effects – all within weeks rather than the years it takes for traditional epidemiological studies. It’s an important step forward in genetic data utilisation, with the potential to develop personalised cancer risk and treatment models with far greater efficiency.
Eva Morris, of Leeds University, demonstrated what can be achieved when datasets relevant to a single cancer (in this case bowel cancer) are brought together and interrogated. ‘Big data’ come in many shapes and forms; patients generate data when they visit GPs with symptoms, undertake screening tests, have investigations and undergo treatment. Vital in big data is linkage – for example, linking bowel screening data with cancer registries and other databases can provide the intelligence needed to optimise many aspects of this national screening programme. Other examples include analyses of colonoscopy effectiveness and outcomes, and monitoring of treatment practices such as radiotherapy and adjuvant chemotherapy. Recognition of the importance of this kind of cancer intelligence is growing; we are awash with routinely collected cancer data, and Eva illustrated how the potential of these data can be unleashed to improve patient outcomes.
So, how should the UK respond to the challenge of data-driven innovation? There’s recognition that much of the data we collect is under-utilised – its potential to improve patient outcomes can only be realised through wide-scale collective effort. But health data are collected by multiple agencies which don’t necessarily link up. The data vary in quality, they aren’t standardised, and there are complex issues of data ownership and patient confidentiality. To help us through this conundrum, Andrew Morris described the establishment of Health Data Research UK (HDRUK), which he leads. Andrew reminded us of the prize in cancer control: a new era in which diagnosis is pre-emptive, driven by risk profiling, and in which treatments are personalised, maximising efficacy and minimising side effects. He described the ‘new social contract’ needed to underpin these changes, one in which data are shared efficiently and safely, respecting patient autonomy.
For data to be useful clinically we need scale – that is, data from multiple sources and from huge numbers of patients need to be combined, particularly for uncommon conditions (such as rare cancers). Accordingly, HDRUK will create a “thriving, UK-wide network of inter-disciplinary research expertise that will disrupt traditional science and transcend disciplines, by enabling new scientific discovery from large multi-dimensional datasets”. It sounds ambitious, but HDRUK is up and running. Critically, cancer needs to feature highly in UK-wide efforts such as these in the years ahead, or we risk falling behind the data-led improvements being made in other common chronic illnesses.
All in all, a fascinating session. It seems the challenge lies less in accumulating patient data than in managing and analysing it all effectively. We also face challenges over data standardisation and over how data are collected, stored and studied. Data quality is, of course, influenced by human factors, and ‘big data’ approaches depend on the quality and completeness of the data. Nevertheless, data-driven technologies have the potential to deliver tailor-made prevention strategies and treatments to patients – the challenge is determining how we can work together to reap these rewards.
David Weller is Professor of General Practice at the Centre for Population Health Sciences, Edinburgh University.