Bioinformatics called critical to next-generation DNA sequencing

Massive amounts of data threaten "bottlenecks" without adequate infrastructure, report warns

Laboratories that want to run next-generation sequencing (NGS) of DNA will need extraordinarily robust bioinformatics infrastructure to fully leverage research and diagnostic opportunities, the Association for Molecular Pathology reports in the November issue of the Journal of Molecular Diagnostics.

"The power of NGS to generate hundreds of millions to multigigabase levels of sequence in a single instrument run, while having opened a diversity of research and diagnostic avenues, is concomitantly stretching our ability to process data," according to the report, "Opportunities and Challenges Associated with Clinical Diagnostic Genome Sequencing."

"This unprecedented amount of sequencing information poses bottlenecks that vary, depending on application, at the level of data extraction, analysis, and interpretation," the report continues. "These challenges have become part and parcel of the biomedical research community where investigators have increasingly needed to incorporate bioinformatics and biostatistics into their armamentarium."

The infrastructure needs to include both computational hardware and high-level expertise, the report contends. "Indeed, the balance of time and effort required for NGS-based research or diagnostics is substantially shifted toward data analysis, as opposed to the technical component required to generate the data," the authors note.

The tools required include data management, storage, analysis and archiving for large data sets, according to the report. The most "computationally intensive" step of the basic sequencing process is base calling, the conversion of image data into sequence reads.
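As a rough illustration of what base calling involves, the sketch below picks the strongest of four signal channels at each sequencing cycle and reports a simple confidence ratio. It is a toy model, not the method described in the report or used by any instrument vendor: the function name `call_bases`, the four-channel intensity layout and the confidence measure are all assumptions made for this example.

```python
# Toy base caller (illustrative assumption, not a vendor algorithm).
# Each cycle is a list of four channel intensities in the order A, C, G, T.
BASES = "ACGT"

def call_bases(intensities):
    """Return (read, confidences) from per-cycle four-channel intensities."""
    read = []
    confidences = []
    for cycle in intensities:
        total = sum(cycle)
        if total <= 0:
            # No usable signal in this cycle: emit an ambiguous base.
            read.append("N")
            confidences.append(0.0)
            continue
        best = max(range(4), key=lambda i: cycle[i])
        read.append(BASES[best])
        # Fraction of the total signal carried by the winning channel.
        confidences.append(cycle[best] / total)
    return "".join(read), confidences

if __name__ == "__main__":
    cycles = [[90, 5, 3, 2], [10, 10, 70, 10], [0, 0, 0, 0]]
    print(call_bases(cycles))
```

Real base callers also correct for effects such as signal bleed between channels and cycles, which is part of why the step is computationally demanding.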

"There is a continuing need to reduce error rates, especially as platforms are pushed to generate longer reads," the report's authors note, adding that "mapping many reads to the reference genome requires highly efficient and accurate algorithms."
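To give a sense of why read mapping calls for efficient algorithms, here is a minimal seed-and-verify sketch: it indexes every k-length substring of a reference, looks up a read's first k bases, and then counts mismatches at each candidate position. The names `build_index` and `map_read`, the seed length and the mismatch threshold are hypothetical choices for this example; production aligners use far more compact indexes and handle insertions, deletions and errors inside the seed.

```python
from collections import defaultdict

def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k-mer, then verify by counting mismatches."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    index = build_index(reference, 4)
    print(map_read("ACGTTA", reference, index, 4))
    print(map_read("ACGTAG", reference, index, 4))
```

Even in this toy form, the work grows with the number of reads times the number of candidate positions per seed, which hints at the scale of the problem when hundreds of millions of reads are mapped against a full genome.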

The "NGS data deluge" requires programming expertise and specialized servers to handle and store massive amounts of data, with the typical experiment producing several terabytes of raw data, according to the report. User-friendly informatics tools to analyze the data are critical, the report notes, adding that cloud computing could reduce the need to purchase storage servers that may be prohibitively expensive.

The demand for DNA sequencing could increase even further if the medical community adopts a recent recommendation by the Presidential Commission for the Study of Bioethical Issues. In its report, "Privacy and Progress in Whole Genome Sequencing," the commission recommended including DNA sequencing data in standardized electronic health records to advance medical research and clinical care.
