Advanced data analysis is helping scientists to find and validate gene signatures linked to diabetes, says Carl-Johan Ivarsson, President of Qlucore, so that treatments can be matched to individual patients more closely.
Diabetes is a common life-long health condition. According to Diabetes UK, there are nearly 3 million people diagnosed with diabetes in the UK, and an estimated 850,000 people who have the condition but don’t know it. On a global scale, an estimated 285 million people are currently living with diabetes, and this number is expected to grow to 438 million by 2030, according to the World Diabetes Foundation.
In simple terms, diabetes is a condition where the amount of glucose in the blood is too high because the body can’t use it properly. This condition can be caused by one of two scenarios: either the pancreas isn’t producing enough (or indeed any) insulin, which is the hormone that allows glucose to enter the body’s cells so that it can be used as fuel, or else the insulin that is being produced just isn’t working properly, which means that glucose simply builds up in the blood and therefore can’t be used for energy.
Understanding the pathogenesis of diabetes can be difficult, mainly because of the long autoimmune process occurring before clinical onset, and also because it is not always feasible to study the pancreas directly. That’s why blood samples are often used, since blood cells will often register subtle changes in gene expression in association with disease. As a result, many researchers choose to study the gene expression profiles of circulating blood cells using microarrays in order to improve the care and treatment of diabetes, prevent it from developing in those at risk and, ultimately, to find a cure.
Research in this area remains challenging, however, as diabetes is more like a web of interconnected complications than a single disease. Rather than looking for a single treatment or cure, many researchers are now hoping to develop a variety of treatments to meet the needs of individual patients.
High hopes for tailor-made treatment
The idea of ‘personalised medicine’ is receiving a lot of attention at the moment, since a diverse group of patients is likely to require diverse approaches to treatment, even if they have the same condition. A vital part of this approach, however, is the ability to identify the specific genetic signatures that are linked to different types of diabetes, so that the most effective combination of therapies can be used, depending on an individual patient’s own circumstances. Some patients, for example, may have a type of diabetes that is linked to problems with the liver, whilst others may have a form of the disease that involves muscle tissue, fat cells, or even the brain.
In some cases, insulin resistance can even be caused by alterations in one particular gene that contains the information for a protein that “turns on” the gene for the insulin receptor. In this scenario, it is essential to determine which specific mechanisms in this gene are defective – and thus causing the body to be resistant to insulin – so that researchers can begin to explore which specific therapies would be most effective for this type of diabetes.
To begin this type of research, scientists often perform microarray analysis on blood samples to find a specific gene expression signature that can be linked with the development of different forms of diabetes. However, this typically involves analysing samples from hundreds, if not thousands, of patients and control individuals, leaving scientists to look for recognisable patterns within huge arrays of genes, proteins and/or RNA molecules.
This type of analysis produces an enormous amount of data, which makes it increasingly difficult to identify which genes are relevant, and to what degree. To make matters even more challenging, research groups working in this area typically consist of a collection of highly trained specialists, each of whom has a unique technical skill. As a result, each individual on the research team, whether a pathologist, molecular biologist or biostatistician, is often so specialised that none of them fully understands exactly what his or her colleagues are doing.
Software supports easier analysis
Despite these challenges, it is absolutely essential for scientists who are studying diabetes to capture, explore, and analyse this vast amount of data effectively, since this information is vital if they are to apply their findings to real-world conditions. Fortunately, the latest software in this area is now helping to accelerate and facilitate the understanding of both the context and relationships of the information contained within large data sets by displaying them graphically, in real time.
The simplicity of this interaction now makes it possible for researchers to work with powerful statistical analysis in entirely new ways. Not only that, but faster analysis also means that scientists often have more time to test more creative theories, which in turn can lead to better research results.
When analysing this type of research data, scientists often rely on Principal Component Analysis (PCA), a method that projects high-dimensional data down to a smaller number of dimensions while retaining as much of the variation in the data as possible. At this stage, specialist software can then be used to plot the lower-dimensional data produced via PCA onto a two-dimensional computer screen, so that full-colour 3D images can be rotated and examined with the naked eye more easily. These same applications can then be used to manipulate the different PCA plots, interactively and in real time, complete with all annotations and other integrated links, as well as a number of powerful statistical measures such as false discovery rates (FDR) and p-values.
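For readers who want to see the projection step in concrete terms, the following is a minimal sketch of PCA using NumPy's singular value decomposition. It is not the implementation used by any particular software package; the function name `pca_project`, the synthetic data and the group sizes are all assumptions made purely for illustration.

```python
import numpy as np

def pca_project(expression, n_components=3):
    """Project a samples x genes matrix onto its top principal components."""
    # Centre each gene so the components describe variation around the mean.
    centred = expression - expression.mean(axis=0)
    # SVD of the centred matrix; the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    # Sample coordinates (scores) along the first n_components axes.
    scores = u[:, :n_components] * s[:n_components]
    # Fraction of the total variance captured by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Synthetic stand-in for real expression data (an assumption, not real results):
# 40 samples x 500 genes, with 30 genes shifted upwards in the second group.
rng = np.random.default_rng(0)
labels = np.array([0] * 20 + [1] * 20)
data = rng.normal(size=(40, 500))
data[labels == 1, :30] += 3.0

scores, explained = pca_project(data, n_components=3)
```

Each row of `scores` gives one sample's coordinates in the three-dimensional space that a 3D viewer would display, and `explained` indicates how much of the overall variation each axis accounts for.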
Can data visualisation help to reveal valid genetic signatures?
The ability to visualise research data in 3D represents a very powerful tool for scientists who are looking for valid genetic signatures that can be linked to diabetes, since the human brain is very good at detecting structures and patterns. As such, this approach offers a way to transform raw data into a comprehensible graphical format, so that scientists can make decisions based on information that they can understand more easily.
It is therefore now possible to investigate the output of large clinical trials very quickly, testing different hypotheses and exploring alternative scenarios within seconds. As a result, scientists looking for different genetic signatures can drastically shorten their analysis time when attempting to identify relevant structures in their data.
With the latest data analysis software, it’s now possible to compare the gene expression profiles of different blood samples by studying data sets that have been generated by microarrays or Next Generation Sequencing (NGS)-based RNA sequencing (RNA-seq) techniques, and to generate a list of genes that classifies data based on a selection of statistical tests, including F-tests, t-tests or regression. It is then possible to investigate any structures in the data by using variance filtering combined with PCA and hierarchical clustering.
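As a rough illustration of how a gene list might be generated, the sketch below combines variance filtering, a per-gene two-sample t-test and a Benjamini-Hochberg false discovery rate correction. It assumes NumPy and SciPy are available; the helper names, the 50 per cent filter, the 0.05 threshold and the synthetic data are illustrative assumptions rather than settings taken from any specific tool.

```python
import numpy as np
from scipy import stats

def variance_filter(expression, keep_fraction=0.5):
    """Return the column indices of the most variable genes."""
    variances = expression.var(axis=0, ddof=1)
    n_keep = max(1, int(round(keep_fraction * expression.shape[1])))
    return np.sort(np.argsort(variances)[::-1][:n_keep])

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (often called q-values)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q

# Synthetic stand-in data (an assumption): 40 samples x 200 genes,
# with the first 10 genes shifted in the second group.
rng = np.random.default_rng(1)
labels = np.array([0] * 20 + [1] * 20)
data = rng.normal(size=(40, 200))
data[labels == 1, :10] += 2.5

# Step 1: keep the more variable half of the genes.
kept = variance_filter(data, keep_fraction=0.5)
filtered = data[:, kept]

# Step 2: Welch's t-test for every remaining gene at once.
t_stat, p_val = stats.ttest_ind(
    filtered[labels == 0], filtered[labels == 1], axis=0, equal_var=False
)

# Step 3: control the false discovery rate and keep genes with q < 0.05.
q_val = benjamini_hochberg(p_val)
signature = kept[q_val < 0.05]
```

The resulting `signature` array holds the original column indices of the genes that pass both the variance filter and the FDR threshold.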
At this stage, it is very easy for researchers to remove any unwanted dependencies and batch effects with just a single click, and to work with variable PCA plots to find any correlation and/or networks amongst the selected genes. With this approach, it’s now possible to visualise and rapidly explore two years’ worth of microarray data in just a few hours.
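One simple way to think about batch-effect removal is per-batch mean-centring, sketched below with NumPy. This is only an illustration of the general idea, not a description of what any commercial package does behind its interface; the function name `remove_batch_means`, the batch labels and the synthetic data are assumptions.

```python
import numpy as np

def remove_batch_means(expression, batch):
    """Subtract each batch's per-gene mean, then restore the overall per-gene mean."""
    expression = np.asarray(expression, dtype=float)
    batch = np.asarray(batch)
    corrected = expression.copy()
    overall = expression.mean(axis=0)
    for b in np.unique(batch):
        rows = batch == b
        # Remove the average offset that this batch adds to every gene.
        corrected[rows] -= expression[rows].mean(axis=0)
    return corrected + overall

# Synthetic stand-in data (an assumption): 20 samples x 50 genes,
# where batch "B" carries a constant technical offset.
rng = np.random.default_rng(3)
batch = np.array(["A"] * 10 + ["B"] * 10)
data = rng.normal(size=(20, 50))
data[batch == "B"] += 1.5

corrected = remove_batch_means(data, batch)
```

This approach is only safe when the batches are balanced with respect to disease status. If all diabetic samples were run in one batch, centring would remove the biological signal along with the technical one.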
Software can make the analysis of genetic data much easier
Researchers who are studying diabetes are often interested in finding genes that are differentially expressed between samples of differing diabetes status, and also hope to isolate any variables that can be used to discriminate the non-diabetic samples from the diabetic samples.
Data visualisation can provide great insights here. For example, a heatmap can be used to show whether, among the remaining genes, some are most highly expressed in the non-diabetic blood samples, whilst others are most highly expressed in the diabetic samples. These differences can help to determine whether the generated signature is associated with diabetes status in the chosen data set, and whether it discriminates most of the non-diabetic samples from the diabetic samples.
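The data preparation behind such a heatmap can be sketched as follows, assuming NumPy and SciPy. Genes are standardised, samples and genes are ordered by hierarchical clustering, and the two top-level sample clusters are compared with the known status. The synthetic signature, the group sizes and the choice of average linkage are illustrative assumptions only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, leaves_list

def zscore_genes(expression):
    """Standardise each gene (column) to zero mean and unit variance."""
    centred = expression - expression.mean(axis=0)
    sd = expression.std(axis=0, ddof=1)
    return centred / np.where(sd > 0, sd, 1.0)

# Synthetic stand-in signature (an assumption): 30 samples x 20 genes.
# status 0 = non-diabetic, status 1 = diabetic.
rng = np.random.default_rng(2)
status = np.array([0] * 15 + [1] * 15)
signature_data = rng.normal(size=(30, 20))
signature_data[status == 1, :10] += 3.0   # genes higher in diabetic samples
signature_data[status == 0, 10:] += 3.0   # genes higher in non-diabetic samples

z = zscore_genes(signature_data)

# Cluster samples (rows) and genes (columns) separately.
sample_tree = linkage(z, method="average", metric="euclidean")
gene_tree = linkage(z.T, method="average", metric="euclidean")

# Reorder rows and columns so that similar profiles sit next to each other;
# this is the matrix a heatmap would colour.
heatmap_matrix = z[leaves_list(sample_tree)][:, leaves_list(gene_tree)]

# Cut the sample tree into two clusters and compare with diabetes status.
clusters = fcluster(sample_tree, t=2, criterion="maxclust")
agreement = max(
    np.mean((clusters == 1) == (status == 0)),
    np.mean((clusters == 1) == (status == 1)),
)
```

Here `agreement` is the fraction of samples whose cluster matches their diabetes status, which gives a simple numerical check on what the heatmap shows visually. The reordered `heatmap_matrix` can be passed to any plotting library for display.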
Already, scientists are making great progress in identifying molecular risk factors that indicate susceptibility to Type 2 diabetes, and are thus helping to provide an early warning sign that may lead to new approaches to treatment. For the first time, early results have shown that this novel research approach is capable of providing a clear-cut, disease-predisposing DNA methylation signature. Discoveries like these will continue to play a vital role in diabetes research, and will no doubt help to pave the way for more targeted treatments for this condition.