Rob Patro, an assistant professor in the Department of Computer Science in Stony Brook’s College of Engineering and Applied Sciences, leads a group of computational biology researchers who developed a new software tool, Salmon, a lightweight method that provides fast and bias-aware quantification from RNA-sequencing reads. The research was published in the March 6 edition of Nature Methods.
The team includes researchers from the Department of Computer Science at Stony Brook University, as well as from the University of North Carolina–Chapel Hill, the Harvard School of Public Health, the Carnegie Mellon School of Computer Science, and private industry.
“This research represents a perfect storm for computer science,” said Computer Science Chair Arie Kaufman. “We have a group of knowledge-driven collaborators from across the United States, funded by multiple sources, and striving for advancing genomic research by developing an innovative tool. I congratulate them on this discovery.”
In genomics, transcript abundance estimates are used to classify diseases and their subtypes, to understand how gene expression changes correlate with phenotype, and to track the progression of cancer. The accuracy of abundance estimates derived from RNA-seq data is especially important for two reasons: a wide range of biases affect the RNA-seq fragmentation and sequencing processes, and expression data are used in studying disease and, eventually, in medical diagnosis and personalized treatment.
Created by researchers Rob Patro, Geet Duggal, Michael Love, Rafael A. Irizarry and Carl Kingsford, Salmon brings together in one tool many algorithmic and methodological advances that will power gene expression studies, both small- and large-scale.
According to Patro, the hallmarks of the method are its speed, accuracy and robustness. Salmon runs at a similar speed to existing fast algorithms for quantifying gene expression, yet it incorporates a rich and expressive model of the underlying experiment, including many technical biases, and uses a new statistical inference procedure to estimate gene expression quickly and accurately.
“The methodological underpinnings of Salmon provide a framework upon which we can continue to build accurate models and efficient inference algorithms,” said Patro. “We are working on understanding and modeling an even larger array of potential technical biases that arise in RNA-seq-based gene expression studies. We are also particularly interested in how quantification algorithms can be made more accurate and robust in single-cell RNA-sequencing (scRNA-seq) experiments, which present unique algorithmic and statistical challenges.”
Salmon was developed with funding from the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative, the National Science Foundation and the National Institutes of Health.