Advancing genome sequencing at the speed of light
New technologies, lower costs improve health protection—from both inside and out
It began on December 12, 2019, when several residents of Wuhan City, in the Chinese province of Hubei, presented to medical facilities with a mysterious form of pneumonia. By early January 2020, New Scientist was reporting that 59 Chinese citizens had contracted the illness.
“It seems that a new virus or bacteria might be the cause of the disease,” the journal quoted Duke University researcher Shenglan Tang as saying. “That is worrying somehow.”
Local authorities were quick to link the outbreak to a seafood market, which was ordered closed on January 1, but as of mid-January the Wuhan Municipal Health Committee still had no evidence the illness could be passed among humans.
The beast among us
We now know that the most catastrophic virus in more than a century was on the loose. But exactly what it was, how it could spread, and the best measures to combat its transmission became known as quickly as they did only because of advanced genome sequencing.
On January 24, the New England Journal of Medicine published the genome of what is now called SARS-CoV-2, the fastest that the complete genome of a novel infectious agent had been analyzed. Within two weeks, the Global Initiative on Sharing All Influenza Data and GenBank, a database of publicly available nucleotide sequences for more than 300,000 organisms, had shared more than 80 SARS-CoV-2 genomes.
Analysis of the data indicated that the virus behind what has become known as COVID-19 was closely related to a SARS-like coronavirus found in bats. Further, molecular modelling based on the genome sequencing revealed significant differences between the two coronaviruses, indicating that the new bug was likely to adapt quickly for life in human hosts. What’s more, the diversity in a surface protein between the two coronaviruses led researchers to conclude that known vaccines would be ineffective. But the sequencing of the genome also pointed to the type of testing that would be effective in identifying those who had been infected.
As Michael Oberholzer, PhD, and Phil Febbo, MD, wrote in a February 13 article published on the website of Illumina, a leading developer of tools and integrated systems for analyzing genetic variation and function: “These tests are essential to both patient management and incidence tracking, so performance must be highly accurate.”
The beast was among us, but at least we knew with unprecedented speed and precision just what was facing us.
As Oberholzer and Febbo wrote: “Understanding the genome of SARS-CoV-2 early, provided unprecedented insight into the dynamics of viral spread and impacted response strategies.”
While the eventual death toll of COVID-19 may not be known for months, getting an early jump on the virus will likely keep the human outcome far below that of the 1918 H1N1 outbreak—colloquially known as the Spanish flu—which many experts point to as being the only modern-day precedent for the size of this pandemic. As a reference point, the Spanish flu took the lives of millions of people (estimates range from 17 to over 50 million)—at a time when the world’s population was less than a third of what it is today.
A 21st-century phenomenon
This head start on combating an even larger global tragedy wouldn’t have been possible even 20 years ago. At the turn of the century, researchers were just completing work on the Human Genome Project (HGP), while advances in supercomputers were still approaching the kind of globally distributed computational power that can be brought to bear on a problem like COVID-19 today.
While we have now normalized the process of genome sequencing and analysis, to the point where DNA analysis is casually offered to us by consumer-oriented companies like Ancestry on a regular basis, it’s illuminating to reflect on the challenge that faced the HGP. We have come a long way in the 17 years since its completion, both in how much faster we can now sequence something as critical as the virus behind COVID-19 and in how far the cost has fallen.
It begins with the sheer scale of any sequencing exercise. The human genome consists of all the DNA contained in a cell’s nucleus. That DNA is composed of four “bases” (labelled G, A, T and C), and the biological information it encodes is determined by the order of those bases. The size of an organism’s genome is generally considered to be the total number of bases in one representative copy of its nuclear DNA. In the case of humans, that corresponds to the sum of the sizes of one copy of each chromosome pair. Overall, the human genome consists of about three billion bases. By comparison, the genomes of coronaviruses, which are made of RNA rather than DNA, consist of about 30,000 bases.
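To make that scale concrete, here is a minimal Python sketch that treats a genome as a string of base letters, so that its size is simply the length of the string. The toy sequence and the function names are illustrative assumptions, and the two genome sizes are the approximate figures quoted above rather than exact measurements.

```python
from collections import Counter

# Approximate genome sizes quoted in the text (round figures, not measurements).
HUMAN_GENOME_BASES = 3_000_000_000
CORONAVIRUS_GENOME_BASES = 30_000


def genome_size(sequence: str) -> int:
    """Return the number of bases in one copy of a sequence."""
    return len(sequence)


def base_counts(sequence: str) -> Counter:
    """Count how often each base letter appears in a sequence."""
    return Counter(sequence)


if __name__ == "__main__":
    toy_sequence = "GATTACAGATC"  # hypothetical 11-base example
    print(genome_size(toy_sequence))
    print(dict(base_counts(toy_sequence)))
    # How many coronavirus-sized genomes would fit into one human genome.
    print(HUMAN_GENOME_BASES // CORONAVIRUS_GENOME_BASES)
```

On these round figures, the human genome is roughly 100,000 times the size of a coronavirus genome.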
According to a white paper published by the National Human Genome Research Institute, the HGP involved first mapping and then sequencing the human genome.
“The former was required at the time because there was otherwise no ‘framework’ for organizing the actual sequencing or the resulting sequence data. The maps of the human genome served as ‘scaffolds’ on which to connect individual segments of assembled DNA sequence. These genome-mapping efforts were quite expensive, but were essential at the time for generating an accurate genome sequence. It is difficult to estimate the costs associated with the ‘human genome mapping phase’ of the HGP, but it was certainly in the many tens of millions of dollars (and probably hundreds of millions of dollars).”
Once the genome was mapped, sequencing took an additional 15 months and cost about $300 million. Next, the HGP refined the draft sequence, and the final product was released in 2003 at an additional cost of $150 million.
With the technique of sequencing DNA bases established, bringing the time and cost of the process down to a manageable level depended on the burgeoning development of supercomputers. As Justin Eure, global communications manager for Lenovo, the Beijing-based computer company, wrote: “The speed of genome sequencing has risen in stride with the rapid acceleration of computational power. A process that initially spanned more than a decade and cost billions for a single genome can now be executed in a matter of hours on clusters of supercomputers running fully optimized hardware architecture.”
DNA analysis: The commercialization race
With faster processing came lower prices. By 2010, the cost to sequence a single human genome had hit about $10,000, and researchers had targeted one-tenth of that cost as the ideal price point to spur mass consumption. That target was hit within another decade, although the price breakthrough was obscured by the growing competition for lower-priced, broad-market DNA analysis and “matching,” as delivered by a range of companies headed by Ancestry and 23andMe. Sampling just a portion of a customer’s DNA, drawn from saliva, these companies have sold millions of test kits worldwide, based on the promise of revealing ethnic background and previously unknown family connections. At this broad consumer level, testing now costs around $100, based on a model that emphasizes up-selling customers to subscriptions that promise ongoing updates about their family trees.
At the higher end of the market, competition is also fierce, highlighted by Illumina’s public feud with its Chinese competitor BGI, the company responsible for creating the COVID-19 test procedure. A participant in the HGP, BGI made a splash in early 2020 by announcing it was ready to launch a complete human genome test for under $100. To reach that significant milestone, BGI is using robotic processing systems along with technology it acquired through the 2012 sale of the US-based company Complete Genomics.
Writing in the MIT Technology Review, Antonio Regalado stated: “Modern ultra-fast sequencing operates by ‘synthesis.’ That is, a person’s genome is broken into billions of short stretches of DNA and captured on the surface of a chip. Then, fresh DNA chemical letters are added, pairing off along those fragments. The process gradually builds up, or synthesizes, matching strands of genetic information.
“By detecting the order that letters are added, the machines determine the sequence of those billions of fragments. The data is then puzzled back together on a computer to produce a full map of a person’s genome.”
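The two steps Regalado describes, reading fragments by pairing fresh letters against them and then puzzling the reads back together, can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the 20-letter sequence, the function names and the greedy overlap merge are all hypothetical stand-ins, and no commercial sequencer is claimed to work this way. Real pipelines handle billions of error-prone reads with far more sophisticated assembly or alignment methods.

```python
# Base-pairing rules: each letter on the template pairs with its complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def make_fragments(genome, length, step):
    """Break a sequence into overlapping fragments of a fixed length."""
    return [genome[i:i + length] for i in range(0, len(genome) - length + 1, step)]


def read_by_synthesis(template):
    """Simulate building the matching strand one letter at a time.

    The "machine" only records which letter was added at each step; the
    template sequence is then inferred by complementing those letters.
    """
    detected = []
    for base in template:
        detected.append(COMPLEMENT[base])  # fresh letter pairing with this base
    return "".join(COMPLEMENT[letter] for letter in detected)


def overlap(left, right, min_overlap):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(reads, min_overlap=3):
    """Greedily merge the pair of reads with the largest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left that overlaps enough to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    toy_genome = "ATGGCGTACGTTAGCCTGAA"  # hypothetical sequence, not real data
    reads = [read_by_synthesis(f) for f in make_fragments(toy_genome, 8, 4)]
    print(reads)
    print(assemble(reads))
```

The greedy merge works here only because the toy sequence has no long repeats. Repeated stretches are one reason real genomes are much harder to puzzle back together.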
Genome sequencing: Stirring the secret sauce
The specific processes used by genome sequencing companies, as well as those employed by DNA analysis firms like Ancestry, are closely guarded secrets. That said, most use optical means to analyze the DNA. The current bulk sequencing equipment for the optical method requires ultra-high-speed cameras connected to powerful computing systems to process all that data. Teledyne DALSA has more than 40 years of experience developing the specialized camera technology required to capture these photo-emission signals from DNA.
The substrates that hold the DNA are actually “BioMEMS” microfluidic chips, built in a micro-electro-mechanical systems (MEMS) foundry where semiconductors are made. Their microfabrication process involves deep etching to form the fluid channels or wells that contain DNA fragments. BioMEMS require very high micro-machining accuracy, with critical dimensions or feature sizes controlled in the range of single micrometers and below.
The reliance on optical detection will likely change. The technology road map for NGS (Next Generation Sequencing) includes a different approach, where the DNA is sensed electronically instead of through an optical signal. Teledyne DALSA, for example, has developed an integrated solution where the electrical CMOS readout chip can be designed by its imaging division and the DNA BioMEMS reading cell can be manufactured at the semiconductor division. Bringing these two disparate technologies together into a unified portfolio makes it easier to accelerate new technology development, or to help new companies break into the genomic industry.
Recently published research papers point to breakthroughs in synchronizing optical and electrical detection of biomolecules, while others point to advances in imaging techniques that will allow researchers to examine regions of the human genome—such as 22q11 on Chromosome 22—that could not be mapped using technology that existed during the HGP’s lifespan.
All of this “next-generation sequencing” work is in search of what is broadly called either “personalized” or “precision” medicine.
“If genome sequencing gets even cheaper,” Regalado wrote, “it could propel the use of new early-detection blood tests for cancer and measurements of people’s microbiomes (their gut bacteria), applications which also rely on high-throughput DNA sequencing.”
Facing market saturation for their family tree-driven tests, companies like Ancestry and 23andMe have pivoted toward allowing customers to obtain additional insights into their health. New players like Sano are taking it a step further, linking people with specific conditions or genetic predispositions so they can compare their information.
Sano’s CEO Patrick Short told The Guardian: “This is one of the great promises of genetic medicine. Because our DNA does not change throughout lives, our DNA—along with information about our environment and behavior—should be a powerful tool for moving towards proactive rather than reactive medicine.”
In the wake of the COVID-19 pandemic, it’s likely the world will be ready for some serious proactivity, feeding more growth in the DNA sequencing market that Market Research Future already predicts will achieve a compound annual growth rate of more than 17 percent over the coming years.