In December 2013, the U.S. Food and Drug Administration approved the first high-throughput DNA sequencer (commonly known as a “gene sequencer”), an instrument that allows laboratories to quickly and efficiently sequence a person’s DNA for genetic testing, medical diagnoses and, perhaps one day, customized drug therapies. Helping get the new device approved was another first: the initial use of a reference set of standard genotypes, or “coded blueprints” of a person’s genetic traits. The standard genotypes were created by the National Institute of Standards and Technology (NIST) and collaborators within the NIST-hosted Genome in a Bottle consortium.
“Two years ago, NIST hosted Genome in a Bottle – a group that includes stakeholders from industry, academia and the federal government – to develop reference materials that could measure the performance of equipment, reagents and mathematical algorithms used for clinical human genome sequencing,” says NIST biomedical engineer Justin Zook. “Our goal is to provide well-characterized, whole genome standards that will tell a laboratory how well its sequencing process is working, sort of a ‘meter stick of the genome.’”
Modern DNA sequencers take a genetic sample in the form of long strings of DNA and randomly chop the DNA into small pieces that can be individually analyzed to determine their sequence of letters from the genetic alphabet. Then, bioinformaticians apply complex mathematical algorithms to identify from which part of the genome the pieces originated. These pieces can then be compared to a defined “reference sequence” to identify where mutations have occurred in specific genes.
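The chop, locate and compare steps described above can be illustrated with a toy sketch. It is not how production aligners or variant callers work: the sequences are invented, the reads are cut at a fixed step rather than randomly, and the function names (chop, align, call_variants) are hypothetical. Each read is placed at the reference position where it has the fewest mismatching letters, and any remaining mismatch is reported as a candidate mutation.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def chop(sample, read_len, step):
    """Cut the sample into overlapping reads.

    A fixed step is used to keep the example deterministic;
    real sequencers fragment the DNA randomly.
    """
    return [sample[i:i + read_len]
            for i in range(0, len(sample) - read_len + 1, step)]


def align(read, reference):
    """Return the reference offset where the read has the fewest mismatches."""
    offsets = range(len(reference) - len(read) + 1)
    return min(offsets,
               key=lambda i: hamming(read, reference[i:i + len(read)]))


def call_variants(reads, reference):
    """Map each mismatching reference position to (reference base, observed base)."""
    variants = {}
    for read in reads:
        start = align(read, reference)
        for k, base in enumerate(read):
            if base != reference[start + k]:
                variants[start + k] = (reference[start + k], base)
    return variants


# Invented 24-letter "reference sequence" and a sample that differs at index 10.
reference = "ACGTTGCAAGCTTAGGCTAACGTA"
sample = "ACGTTGCAAGTTTAGGCTAACGTA"

reads = chop(sample, read_len=8, step=4)
variants = call_variants(reads, reference)
print(variants)
```

Two of the overlapping reads cover the changed letter, and both report the same difference at the same reference position, which is the kind of agreement a real pipeline looks for before calling a mutation.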
Several DNA sequencing technologies and computer algorithms exist for this complex analysis, and for any given sample they are known to produce similar but not identical results. Built-in biases as well as what are essentially “blind spots” for certain possible sequences contribute to uncertainties or errors in the sequence analysis. “These biases can lead to hundreds of thousands of differences between sequencing technologies and algorithms for the same human genome,” Zook says.
In a recent paper in Nature Biotechnology,* Zook and his colleagues describe the methods used to make the Genome in a Bottle consortium’s pilot set of genotype reference materials. The source DNA, known as NA12878, was taken from a single person. The reference set is essentially the first complete human genome to have been extensively sequenced and re-sequenced by multiple techniques, with the results weighted and analyzed to eliminate as much variation and error as possible.
“We minimized bias in our reference materials toward any specific DNA sequencing method by comparing and integrating data from 14 sequencing experiments generated by five different sequencing platforms,” Zook says.
The findings in the Nature Biotechnology paper are publicly available from the Genome in a Bottle website, http://www.genomeinabottle.org. In addition, the Genome Comparison and Analytic Testing (GCAT) website enables real-time benchmarking of any DNA sequencing method using the paper’s results. The research was conducted by a team of scientists at NIST; Harvard University; the Virginia Bioinformatics Institute at Virginia Tech; and an Austin, Texas, genetics company, Arpeggi Inc. (now part of Gene by Gene Ltd.).
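As a rough illustration of what benchmarking against a reference set of genotypes involves, the sketch below compares a laboratory's variant calls with a set of benchmark calls and reports precision and recall. It is not the GCAT implementation. The benchmark function, the (chromosome, position, reference base, alternate base) tuples and all of the data are invented for the example, and a real comparison would also have to handle differing variant representations and the regions where the benchmark is considered reliable.

```python
def benchmark(calls, truth):
    """Compare a set of variant calls with a set of benchmark ("truth") calls."""
    true_pos = len(calls & truth)    # called and present in the benchmark
    false_pos = len(calls - truth)   # called but absent from the benchmark
    false_neg = len(truth - calls)   # in the benchmark but missed by the caller
    called = true_pos + false_pos
    expected = true_pos + false_neg
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": true_pos / called if called else 0.0,
        "recall": true_pos / expected if expected else 0.0,
    }


# Invented benchmark genotype calls: (chromosome, position, ref base, alt base).
truth = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2500, "C", "T"),
    ("chr2", 310, "G", "A"),
    ("chr2", 9000, "T", "C"),
}

# Invented output of a hypothetical sequencing pipeline under test.
calls = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2500, "C", "T"),
    ("chr2", 310, "G", "A"),
    ("chr3", 42, "A", "T"),
}

result = benchmark(calls, truth)
print(result)
```

Here the hypothetical pipeline finds three of the four benchmark variants and reports one that is not in the benchmark, so both precision and recall come out at 0.75.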
Once the NA12878 pilot has been fully characterized, samples of the DNA will be issued as a NIST Reference Material. The Genome in a Bottle consortium also plans to develop well-characterized whole genome reference materials from two genetically diverse groups: Asians and Ashkenazi Jews. Both reference sets will include sequenced genes from father-mother-child “trios” to take advantage of the genetic links between family members.
* J.M. Zook, B. Chapman, J. Wang, D. Mittelman, O. Hofmann, W. Hide and M. Salit. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nature Biotechnology. Published online Feb. 16, 2014. doi:10.1038/nbt.2835