Strategic Communications and Marketing News Bureau

Genomics to surpass the biggest data producers, experts warn

CHAMPAIGN, Ill. — Each cell in the body contains a whole genome, yet the data packed into a few DNA molecules could fill a hard drive. As more people have their DNA sequenced, that data will require massive computational and storage capabilities beyond anything previously anticipated, says a new assessment from computational biologists and computer scientists at the University of Illinois and Cold Spring Harbor Laboratory.

The team of experts compared the data needs of genomics with those of three of the biggest players in big data: astronomy, Twitter and YouTube. They projected growth in each area through the year 2025 and found that genomics is poised to be a leader in data acquisition, storage, distribution and analysis.

The team’s assessment is published in the journal PLOS Biology.

“As genome-sequencing technologies improve and costs drop, we are expecting an explosion of genome sequencing that will cause a huge flood of data,” said Gene Robinson, a professor of entomology and the director of the Carl R. Woese Institute for Genomic Biology at the U. of I. “The only way to handle this data deluge will be to improve the computing infrastructure for genomics.

“Astronomy, Twitter and YouTube represent three diverse domains that generate and use a huge amount of data, albeit with huge differences in computing needs. The diversity of these three forms of big data provides an excellent framework for comparative analyses with genomics,” he said.

Like YouTube and Twitter, genomics data are highly distributed, coming from many different sources. However, both Twitter and YouTube have standard formats for their entries, while genomic data can assume many different formats, making sharing and storing more complex.

The authors estimate that genomic sequencing so far, covering different organisms and a number of humans, has produced data on the petabyte scale (a petabyte is a million gigabytes). However, over the last decade, the volume of genomic sequencing data doubled about every seven months, and it will grow at an even faster rate as personal genome sequencing becomes more widespread. The researchers estimate that by 2025, genomics data will explode to the exabyte scale, or billions of gigabytes. That would surpass even YouTube, the current title holder among the domains studied for most data stored.
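To give a feel for what a seven-month doubling time implies, here is a rough back-of-the-envelope sketch. It is not the authors' projection model: the function name, the starting point of exactly one petabyte and the assumption that the doubling time stays constant are all illustrative choices made here.

```python
import math

GB_PER_PB = 10**6  # a petabyte is a million gigabytes
GB_PER_EB = 10**9  # an exabyte is a billion gigabytes


def months_to_grow(start_gb, target_gb, doubling_months=7.0):
    """Months needed to grow from start_gb to target_gb.

    Assumes (for illustration only) a constant doubling time.
    """
    doublings = math.log2(target_gb / start_gb)
    return doublings * doubling_months


# Hypothetical starting point: exactly one petabyte of stored data.
months = months_to_grow(GB_PER_PB, GB_PER_EB)
print(f"{months:.0f} months, about {months / 12:.1f} years")
```

Under these assumptions, a thousandfold increase takes about 10 doublings, or roughly 70 months, a little under six years. A faster doubling rate, as the researchers expect, would shorten that span.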

Yet the sequences are only one element of genomic data.

“The DNA sequence in itself is not particularly useful for realizing all the great possibilities that genomics technology promises,” said co-author Saurabh Sinha, a professor of computer science at Illinois. “The sequence data have to be analyzed through sophisticated and often computationally intensive algorithms, which find patterns in the data and make connections between those data and various other types of biological information, before they can lead to biologically or clinically important insights. All of this makes the goal much more challenging than just sequencing DNA and storing that information.”

The need for complex analysis is similar to that in astronomy, but with an important difference, the authors say. Astronomy generates vast amounts of data but incorporates several processing technologies at the time of data collection, requiring less time and computational power later on. The researchers suggest that integrating similar processing methods could cut down on the storage needs for genomic data as well. But there’s a catch: The whole genome may offer insights not yet anticipated, and new understandings may emerge as more people are sequenced.

“In the future, we may have to take the hard decision of storing only the processed form and not the original, and that, too, in heavily compressed forms, to drastically reduce the storage needs,” Sinha said. 

The authors urge new technology development to handle the expected explosive growth in genomics data beyond what is predicted for social media and astronomy.

“Genomics will soon pose some of the most severe computational challenges that we have ever experienced,” Robinson said. “If genomics is to realize the promise of having a transformative positive impact on medicine, agriculture, energy production and our understanding of life itself, there must be dramatic innovations in computing. Now is the time to start.”

Editor’s note: To reach Gene Robinson, call 217-265-0309; email generobi@illinois.edu.
To reach Saurabh Sinha, call 217-333-3233; email sinhas@illinois.edu.

The paper, “Big Data: Astronomical or Genomical?” is available online.

