Advancing disease mapping techniques

Healthcare company GSK is using our computing expertise to identify connections between different genes and diseases, with a view to developing more effective treatments.


Understanding the role our genes play in the progression of a disease can provide the key to developing a treatment or cure, and to identifying new therapeutic uses for existing drugs. As a leader in pharmaceutical research, global healthcare company GSK has developed specialist software that creates “networks” to visualise the relationships between diseases, biological pathways and genes. The software analyses millions of biomedical research publications to identify correlations and how frequently they occur.
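To give a sense of how such a network might be built from literature, here is a minimal Python sketch. It is not GSK's software: the function name `cooccurrence_network`, the naive substring matching and the three toy abstracts are all assumptions made purely for illustration. It counts how often pairs of terms are mentioned in the same text, which is one simple way to turn publications into weighted links between genes, pathways and diseases.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents, terms):
    """Count how often each pair of terms appears in the same document.

    Hypothetical helper for illustration only: matching is a naive,
    case-insensitive substring test, not real biomedical text mining.
    """
    edges = Counter()
    for text in documents:
        lowered = text.lower()
        # Terms mentioned in this document, sorted so each pair has one key.
        present = sorted(t for t in terms if t.lower() in lowered)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Toy abstracts invented for this example (assumption, not real data).
abstracts = [
    "TNF signalling is elevated in rheumatoid arthritis.",
    "Rheumatoid arthritis patients show altered TNF and IL6 expression.",
    "IL6 is involved in the JAK-STAT pathway.",
]
terms = ["TNF", "IL6", "rheumatoid arthritis", "JAK-STAT pathway"]

network = cooccurrence_network(abstracts, terms)

# List the links, strongest first; the count acts as the edge weight.
for (a, b), count in network.most_common():
    print(f"{a} <-> {b}: mentioned together in {count} text(s)")
```

Each pair with a non-zero count becomes a link in the network, and the count gives its weight. At the scale of millions of publications, the result is far too large to inspect by eye.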

An ordinary computer monitor can show, and the human brain can comprehend, only a limited part of the network, restricting the view to small portions of data at a time. This means that vital connections can be missed which might have become apparent if researchers were able to see the bigger picture.


GSK took their project to the Hartree Centre to test new data clustering and visualisation techniques. The team were able to focus the analysis by targeting a specific, currently incurable disease, and also explored several known drug targets with potential application to other diseases. The power of the Hartree Centre’s data analytics and visualisation facilities also allowed GSK to see the network in its entirety, something that had previously been impossible.


The ability to analyse the network as a whole using objective, data-driven methods, rather than in parts, has enabled GSK scientists to extract valuable insights and identify subtleties in the connections between the genes and biological pathways. This increased understanding is invaluable to a company focussed on healthcare innovation, and could eventually aid the development of new treatments that target these genes and pathways. Newly discovered correlations will also strengthen the case for any drug development work or improved treatments based on the results.

A follow-on project will expand on this approach, include new data types and begin to apply it to other diseases, with the aim of demonstrating that data mining techniques applied to large open datasets can be of real use in the development of new treatments for complex chronic conditions.

“We were impressed with how our input data was thoroughly analysed, how appropriate visualisation methods were evaluated and used, and with the communication throughout the project.”

Peter Woollard, GlaxoSmithKline
