Bio-IT, September 28, 2010
The term next-generation sequencing (NGS) has been around for so long it has become almost meaningless. We use “NGS” to describe platforms that are so well established they are almost institutions, and future (3rd-, 4th-, or whatever) generations promising to do for terrestrial triage what Mr Spock’s Tricorder did for intergalactic health care. But as the costs of consumables keep falling, turning the data-generation aspect of NGS increasingly into a commodity, the all-important problems of data analysis, storage, and medical interpretation loom ever larger.
“There is a growing gap between the generation of massively parallel sequencing output and the ability to process and analyze the resulting data,” says Canadian cancer researcher John McPherson, feeling the pain of NGS neophytes left to negotiate “a bewildering maze of base calling, alignment, assembly, and analysis tools with often incomplete documentation and no idea how to compare and validate their outputs. Bridging this gap is essential, or the coveted $1,000 genome will come with a $20,000 analysis price tag.”
“The cost of DNA sequencing might not matter in a few years,” says the Broad Institute’s Chad Nusbaum. “People are saying they’ll be able to sequence the human genome for $100 or less. That’s lovely, but it still could cost you $2,500 to store the data, so the cost of storage ultimately becomes the limiting factor, not the cost of sequencing. We can quibble about the dollars and cents, but you can’t argue about the trends at all.”
But these issues look relatively trivial compared to the challenge of mining a personal genome sequence for medically actionable benefit. Stanford’s chair of bioengineering, Russ Altman, points out that not only is the cost of sequencing “essentially free,” but the computational cost of dealing with the data is also trivial. “I mean, we might need a big computer, but big computers exist, they can be amortized, and it’s not a big deal. But the interpretation of the data will be keeping us busy for the next 50 years.”
Or as Bruce Korf, the president of the American College of Medical Genetics, puts it: “We are close to having a $1,000 genome sequence, but this may be accompanied by a $1,000,000 interpretation.”
The “$1,000 genome” is, in the view of Infinity Pharmaceuticals’ Keith Robison, an “arbimagical goal”—an arbitrary target that has nevertheless obtained a magical notoriety through repetition. The catchphrase was first coined in 2001, although by whom isn’t entirely clear. The University of Wisconsin’s David Schwartz insists he proposed the term during a National Human Genome Research Institute (NHGRI) retreat in 2001. During a breakout session, he said that NHGRI needed a new technology to complete a human genome sequence in a day. Asked to price that, Schwartz paused: “I thought for a moment and responded, ‘$1,000.’” However, NHGRI officials say they had already coined the term.
The $1,000 genome caught on a year later, when Craig Venter and Gerry Rubin hosted a major symposium in Boston (see, “Wanted: The $1000 Genome,” Bio•IT World, Nov 2002). Venter invited George Church and five other hopefuls to present new sequencing technologies, none more riveting than U.S. Genomics founder Eugene Chan, who described an ingenious technology to unfurl DNA molecules that would soon sequence a human genome in an hour. (The company abandoned its sequencing program a year later.)
Another of those hopefuls was 454 Life Sciences, which in 2007 made Jim Watson’s the first personal genome sequenced using NGS, at a cost of about $1 million. Since then, the cost of sequencing has plummeted to less than $10,000 in 2010. Much of that decline has been fueled by the competition between Illumina and Applied Biosystems (ABI). When Illumina said its HiSeq 2000 could sequence a human genome for $10,000, ABI countered with a $6,000 genome dropping to $3,000 at 99.99% accuracy.
Earlier this year, Complete Genomics reported its first full human genomes in Science. One of those belonged to George Church, whose genome was sequenced for about $1,500. CEO Cliff Reid told us earlier this year that Complete Genomics now routinely sequenced human genomes at 30x coverage for less than $1,000 in reagent costs.
The ever-quotable Clive Brown, formerly a central figure at Solexa and now VP development and informatics for Oxford Nanopore, a 3rd-generation sequencing company, says: “I like to think of the Gen 2 systems as giant fighting dinosaurs, ‘[gigabases] per run—grr—arggh’ etc., a volcano of data spewing behind them in a Jurassic landscape—Sequanosaurus Rex. Meanwhile, in the undergrowth, the Gen 3 ‘mammals’ are quietly getting on with evolving and adapting to the imminent climate change... smaller, faster, more agile, and more intelligent.”
Nearly all the 2nd-generation platforms have placed bets on 3rd-gen technologies. Illumina has partnered with Oxford Nanopore; Life Technologies has countered by acquiring Ion Torrent Systems; and Roche is teaming up with IBM. PacBio has talked about a “15-minute” genome by 2014, Halcyon Molecular promises a “$100 genome,” while a Harvard start-up called GnuBio has placed a bet on a mere $30 genome.
David Dooling of The Genome Center at Washington University points out that the widely debated cost of the Human Genome Project included everything—the instruments, personnel, overhead, consumables, and IT. But the $1,000 genome—or in 2010 numbers, the $10,000 genome—refers only to flow cells and reagents. Clearly, the true cost of a genome sequence is much higher (see, “The Grand Illusion”). In fact, Dooling estimates the true cost of a “$10,000 genome” as closer to $30,000 once instrument depreciation and sample prep, personnel and IT, informatics and validation, and management and overheads are taken into account.
“If you are just costing reagents, most of the vendors could claim a $1,000 genome right now,” says Brown. “A more interesting question is: ‘$1,000 genome—so what?’ It’s an odd goal because the closer you get to it the less relevant it becomes.”
This special issue of Bio•IT World contains a series of stories and essays that provide some useful perspectives on the march to the $1,000 genome, which some regard as a medical imperative and others a grand illusion.
We get an up-close look at sequencing operations at the Broad Institute, which has been the U.S. flagship genome center for a decade (see page 30). We also meet the leaders of BGI Americas, which aims to provide sequencing capacity and analysis for labs big and small, while managing editor Allison Proffitt gleefully visits BGI’s prized new sequencing center under construction in Hong Kong (page 42).
We look at the genesis of Solexa, the British company that provided the raw technology for Illumina, the best-selling NGS platform to date (page 52). We meet Kevin Ulmer, a man who has spent more than three decades trying to develop the killer app for the $1,000 genome (page 64). And we meet NABsys, a 3rd-generation technology company taking aim at the myriad clinical applications of NGS (page 61).
Given that the costs of data analysis and storage will increasingly dominate the NGS equation, Alissa Poh reviews some of the latest software solutions on offer (page 58), while Allison Proffitt appraises some of the latest data storage technologies (page 38).
Finally, we meet some of the organizations—from bioinformaticians and medical geneticists to pathologists and software engineers—who are developing new ideas and resources for clinical genomic interpretation (page 48). And we profile Hugh Rienhoff, physician and founder of My Daughter’s DNA.org, and follow his inspirational quest to solve his daughter’s mystery condition (page 34).
Also in this report are invited commentaries from genomics experts at two big pharma companies—Amgen’s Sasha Kamb and Novartis’ Keith Johnson and colleagues—discussing the potential applications of NGS in pharma and the hurdles to its adoption. We also have our regular columns, including BioTeam’s Michele Clamp and our colleague Eric Glazer on social media and a preview of an exciting online community called NGS Leaders.
We hope you enjoy this special report on the road to the $1,000 genome as much as we have enjoyed reporting and preparing it.
—Kevin Davies, Mark Gabrenya and Allison Proffitt