The fight against heart disease goes online: MRI data are linked to a publication that includes a virtual machine (a downloadable clone of the software and data used), promoting reproducibility and data availability so that MRI analyses can be tested and improved
Published in the Open Access and Open Data journal GigaScience, researchers from Universidad Politécnica de Madrid in Spain and the National Institutes of Health in the USA provide a fantastic example of open data sharing to help build better tools for analysing cardiac MRI: a wealth of patient imaging data [1]. Even better, to enable reproducible comparisons between new tools, the researchers and the journal have taken the unusual step of publishing the data packaged together with the tools, scripts and software required to run the experiments. This package is available to download from GigaScience's GigaDB database [2] as a "virtual hard disk" that allows researchers to run the experiments themselves and to add their own annotations to the data set.
The most common cause of heart attacks is coronary heart disease, and early diagnosis is key to starting treatment that can prevent such events. One useful tool in the fight against this leading killer is magnetic resonance imaging (MRI), which allows direct examination of blood flow to the heart muscle (myocardium). However, for this perfusion analysis technique to be most effective, the patient's breathing motion must be compensated for, which requires complex image processing methods. There is therefore a need to improve these tools and algorithms, and the key to doing so is the availability of large, publicly available MRI datasets for testing, optimizing and developing new methods.
As one potential user of these resources, Professor Alistair Young, Technical Director of the Auckland Magnetic Resonance Research Group, commented: "Very large amounts of medical imaging data are now becoming available through registries and large population studies. Well validated, automated methods are required to derive maximum benefit from such resources. The paper by Wollny and Kellman exemplifies how data and algorithm sharing can advance the field by providing a platform by which existing methods can be tested and new methods validated against existing benchmarks. Such benchmarking datasets are essential to advance the field through objective metrics and standards."
Having everything wrapped up in a virtual machine also made the peer-review and publication process simpler, because the settings, packages and file locations were already set up in a working configuration. Dr Robert Davidson, Data Scientist at GigaScience and one of the people who carried out this testing, stated: "Actually testing the code during review is sadly almost a novel concept and one that needs to roll out as a standard. But even more: if it's easy for the reviewers, it's easy for the community to use too."
Beyond its importance for improving diagnosis of the world's leading cause of death [3], this work comes at a time when the continuing rise in retractions of published scientific articles makes direct means of improving reproducibility essential: both so that current findings, on which future studies are built, can be trusted, and so that the public does not lose confidence in the research it funds. Publishing a virtual machine as an interactive, executable publication gives the scientific community an example and a test case of a potential new type of scholarly output.
1. Wollny G, Kellman P (2014): Free breathing myocardial perfusion data sets for performance analysis of motion compensation algorithms. GigaScience 3:23. doi:10.1186/2047-217X-3-23
2. Wollny G, Kellman P (2014): Supporting material for "Free breathingly acquired myocardial perfusion data sets for performance analysis of motion compensation algorithms". GigaScience Database. doi:10.5524/100106
3. World Health Organization (2012)