Thanks to a $380,000 grant they received from the National Science Foundation last month, two university computer scientists will spend the next two years developing software that can analyze genetic information in parallel on a large computer network.

Analyzing the vast amount of genetic information in a single genome (the complete set of DNA, including all of the genes and chromosomes, that makes up a living being) usually takes a long time and requires an expensive set of computers on hand. University computer science professors Mihai Pop and Steven Salzberg will try to use cloud computing, a model in which researchers rent computing power on a distributed network of computers instead of investing in large amounts of their own hardware, to speed up the computational work of DNA sequencing. If they succeed, scientists will have an easier path to learning the genetic makeup of more of the Earth’s creatures.

“Probably the most exciting benefit this technology could provide is mapping out the genetic content of pretty much every organism known to man,” Pop said. “Right now we don’t know the genetic sequences of many organisms. With faster and cheaper sequencing, we should be able to analyze anything.”

DNA is made up of nucleotide bases whose order acts as a code guiding biological functions. DNA sequencing determines that order, and scientists can use the resulting sequences to compare the genetic makeup of different organisms.

The scientists noted that more efficient DNA sequencing would likely lead to significant biological breakthroughs because researchers would have more species’ genomes available for reference.

“With all these genomes available digitally, biologists will use DNA information to test their hypotheses,” Pop said. “The big questions are: How are genes controlled in an organism, how are they turned on and off, and what network connects them? Targeted biological questions are beginning to and will continue to be answered by DNA sequencing.”

Pop said better DNA sequencing technology could help scientists learn more about the “good” bacteria that are essential for human health. He added that constructing reference genomes is essential for the future of all DNA sequencing.

But because current technology cannot sequence an entire genome in one piece, DNA strands must be broken into fragments that are sequenced separately, and the resulting sequences must then be pieced back together.
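To make the piece-it-back-together step concrete, here is a toy greedy assembler in Python. It is a minimal sketch under simplifying assumptions (error-free fragments, no repeated regions, no fragment contained inside another) and is not the software Pop and Salzberg are building; greedy overlap merging is simply one textbook approach, and the function names and sample sequence are made up for illustration.

```python
# Toy greedy assembler: a hypothetical illustration, not the researchers' software.
# Assumes error-free reads, no repeated regions, and no read contained inside another.


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that matches a prefix of b, or 0 if shorter than min_len."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # next place b's first bases appear in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):  # the rest of a runs into the start of b
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(reads)
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the separate pieces ("contigs")
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    # Three overlapping fragments of the made-up sequence GATTACAGCTTCGA, in shuffled order.
    print(greedy_assemble(["AGCTTCGA", "GATTACA", "TACAGCT"]))  # ['GATTACAGCTTCGA']
```

Because this sketch compares every fragment with every other fragment before each merge, its cost grows quickly as the number of fragments rises, which hints at why reassembly without a guide is so computationally demanding.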

The bottleneck is the computing power needed to reassemble those sequences. Less computing power would be needed if a reference genome were already available to guide the reconstruction. Pop used the metaphor of a jigsaw puzzle to illustrate the difficulty of analyzing DNA without a reference genome.

“If you have a reference genome, it’s like having the box with the picture on the front to guide your assembly,” Pop said in a university press release. “With no reference, it’s like having no picture and no idea what the finished product will look like; with lots of sky and ocean pieces that fit very loosely together.”

If the researchers can run DNA sequence analysis efficiently on a computing cloud, sequencing will become quicker and less expensive, which will lead to a greater number of known genomes.
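Pop’s “picture on the box” can be sketched the same way. In this hypothetical Python example, the reference is indexed by short substrings (k-mers), each fragment is placed with a quick lookup of its first few bases, and the fragments then overwrite the reference wherever they differ. It assumes the first k bases of every fragment match the reference exactly and allows at most one other difference; real alignment tools use far more sophisticated indexes, and all names and sequences here are invented for illustration.

```python
# Toy reference-guided assembly: a hypothetical illustration, not the researchers' software.
# Assumes the first k bases of every read match the reference exactly (used as a "seed")
# and that a read differs from the reference in at most max_mismatches other positions.
from collections import defaultdict
from typing import Optional


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the reference to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def place_read(read: str, reference: str, index: dict[str, list[int]],
               k: int, max_mismatches: int = 1) -> Optional[int]:
    """Return where the read fits on the reference, or None if it cannot be placed."""
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) == len(read):
            mismatches = sum(1 for x, y in zip(read, window) if x != y)
            if mismatches <= max_mismatches:
                return pos
    return None


def reference_guided_assembly(reads: list[str], reference: str, k: int = 4) -> str:
    """Lay reads onto the reference; where a read differs, trust the read."""
    index = build_kmer_index(reference, k)
    assembled = list(reference)  # start from the "picture on the box"
    for read in reads:
        pos = place_read(read, reference, index, k)
        if pos is not None:  # unplaced reads are simply skipped in this sketch
            assembled[pos:pos + len(read)] = read
    return "".join(assembled)


if __name__ == "__main__":
    reference = "GATTACAGCTTCGA"                # made-up reference genome
    reads = ["GATTACA", "TACAGCT", "AGCTTGGA"]  # last read carries one difference (C -> G)
    print(reference_guided_assembly(reads, reference))  # GATTACAGCTTGGA
```

In this sketch each fragment is positioned by a lookup rather than by comparison against every other fragment, which illustrates why having a reference reduces the computing needed.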

Although the $380,000 grant will help them get started, it doesn’t guarantee success. Pop and Salzberg will spend the next two years determining whether cloud computing is practical for DNA sequencing. Pop said another problem the researchers may encounter is how to send DNA sequences over the Internet efficiently.

“The biggest open question for us right now is whether the Internet can keep up with the computations,” Pop said. “It’s possible that, even if our analysis would take only an hour on the computer cloud, it could take an entire day for us to send all of the data.”
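Pop’s concern comes down to simple arithmetic: upload time is the amount of data divided by the speed of the connection. The short Python sketch below uses made-up figures (100 gigabytes of sequence data, a 10-megabit-per-second connection, a one-hour cloud run) that are assumptions for illustration, not numbers from the project. With those figures, the upload alone takes most of a day.

```python
# Back-of-envelope check of Pop's concern: can the network keep up with the cloud?
# All numbers below are hypothetical, chosen only for illustration.


def transfer_hours(data_gigabytes: float, link_megabits_per_second: float) -> float:
    """Hours to upload data over a link at full speed, ignoring protocol overhead."""
    bits = data_gigabytes * 1e9 * 8  # decimal gigabytes -> bits
    seconds = bits / (link_megabits_per_second * 1e6)
    return seconds / 3600


def cloud_is_faster(data_gigabytes: float, link_megabits_per_second: float,
                    cloud_compute_hours: float, local_compute_hours: float) -> bool:
    """True if uploading plus computing in the cloud beats computing locally."""
    total = transfer_hours(data_gigabytes, link_megabits_per_second) + cloud_compute_hours
    return total < local_compute_hours


if __name__ == "__main__":
    upload = transfer_hours(100, 10)  # 100 GB of reads over a 10 Mbit/s connection
    print(f"Upload: {upload:.1f} h, cloud analysis: 1.0 h")  # Upload: 22.2 h, cloud analysis: 1.0 h
    print(cloud_is_faster(100, 10, 1.0, 48.0))  # True
    print(cloud_is_faster(100, 10, 1.0, 12.0))  # False
```

A faster connection or less data changes the outcome quickly; the sketch only shows how transfer time can outweigh the computation itself.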

If their programs successfully use cloud computing to enhance DNA sequencing, Pop and Salzberg will make their programs available to researchers for free.

jnashdbk@gmail.com