Big data leads life sciences into an uncertain future

WASHINGTON – The failure of a 2008 program called Google Flu Trends was an early lesson in the limitations of big data in health and life sciences research: the ability of computers to analyze huge sets of information is limited by how researchers use it.

The Google program tracked when and where people searched for flu-related information on Google’s search engine. The researchers thought they could use the data they collected to predict where flu would spread weeks ahead of the Centers for Disease Control and Prevention. But when the swine flu outbreak hit shortly after the program started, Flu Trends failed to forecast the outbreak and continued to yield inaccurate data until Google killed it in 2014.

“There is the idea that big data can show us stuff in the world that we don’t already know, that it can give us control over contingent situations that we don’t already have,” said Gaymon Bennett, a religion and science professor at Arizona State University.

The Google Flu Trends flop shows that big data’s role in the life sciences (essentially anything that deals with living organisms) is grounded more in theory than reality, because most of what can happen, for good or bad, has not yet occurred.

And this potential may blind big data proponents to its inherent risks.

“There’s so much hope being invested that we’re creating futures without really thinking about the fact that the futures we’re creating aren’t just a new device here or a new website there,” Bennett said. “These potentially could fundamentally restructure how we live in the world.”

A report published in 2014 by the American Association for the Advancement of Science, FBI and United Nations crime institute said “no adequate legal or technical solutions exist to prevent adversaries from using data” to make biological weapons.

Using big data to predict what that weapon would be, however, has the potential for deadly miscalculation.

“The temptation of doing something ahead of time is that you can maybe anticipate and address the risk,” said Kavita Berger, a scientist at the consulting company Gryphon Scientific who organized the report. “But the risk of doing something ahead of time is that you inaccurately assess the risk or you don’t allow for the beneficial application to be developed and applied. And so it’s a really tough situation.”

Information needed to create a biological weapon could come from health data, such as people’s genomes, that was originally collected for good reasons, said Professor Margaret Kosal of the Georgia Institute of Technology, who worked on the report.

The risk of losing health information to hackers with bad intentions has scientists treading carefully, Kosal said, because the data they collect – medical records, analysis of past pandemics – could have disastrous global consequences if put in the wrong hands.

A more immediate threat is when hackers flood data systems with false information, surrounding accurate data with purposely misleading figures.

“If you were to flood the system with incorrect messages that surround correct, accurate observations, you would end up delaying a response to the nascent outbreak because people just don’t see it, because there’s so much noise,” said Tanya Berger-Wolf, a computer science professor at the University of Illinois at Chicago.

When the Obama administration rolled out its Healthcare.gov website in 2013, hackers tried to overload the site with unnecessary requests, using a distributed denial-of-service attack. The hackers hoped to shut down the website by flooding it with too much traffic to handle, but their attempt proved unsuccessful.

Hackers can also use spoofing, in which they pose as other people or parties to gain access to data that they can then manipulate. Kosal said cyber criminals can spoof public health monitoring systems to create false alarms about insignificant diseases or to understate the strength of a powerful contagion.

While the motivation for creating bioweapons is clear, researchers still question what prompts hackers to manipulate data, said Charles Schmitt, director of informatics at the North Carolina-based Renaissance Computing Institute. But some experts have ideas.

“At the far end it would be to create mass panic,” Kosal said. “At the near end it would be to cause whatever agency is monitoring it to waste time and energy.”

Hackers target health data more than any other sector in the United States, according to a 10-year analysis of data breach records published last year by security software provider Trend Micro, which also reported a steadily increasing number of data breaches across all sectors from 2009 to 2013.

The growth of data breaches magnifies the risks of using big data analytics in life sciences, Kosal said.

“It’s not the government computers that they hack, it’s insurance companies, major medical entities,” Kosal said. “It’s the vulnerabilities of the private sector. There is no government computer that this data is sitting on. Most of the big data in the life sciences is sitting on private computers or in clouds or servers. So it’s largely a problem of securing private infrastructure.”

The challenge of keeping such large quantities of healthcare data safe may turn some researchers away from using big data because they believe the dangers are too great. “You have to balance among priorities what is possible versus what is probable,” Kosal said. “We need to balance the legitimate role of science and discovery and the benefits so that we don’t inadvertently limit ourselves because we are overly concerned about something from a technology.”

Non-healthcare data also influences big data’s potential applications in life sciences. But unlike sensitive medical records, this information is often public. For instance, Google Flu Trends should have considered factors like environmental and demographic data, and how people move from place to place, Berger-Wolf said, because health care data alone is not enough to accurately predict outbreak trends.

“The data are open and highly manipulable,” Berger-Wolf said. “So if we know that (the Centers for Disease Control and Prevention) is going to start making decisions based on Google searches, then (hackers) can get a whole bunch of automated bots to bias that search.”

Bennett, the professor from Arizona State, said those who own and control big data also control our informational lives and may one day own our biological lives, too.

“It’s one thing for Facebook to vacuum up all my information and try to sell me something,” Bennett said. “It’s another thing when … the biological data of my susceptibilities to infectious disease begins to be a parcel of the ecology that big data sucks up and processes and owns. How is it that core parts of myself are being alienated because somebody owns and controls it?”

But our biological information already is controlled by others in some cases, Berger-Wolf said.

When some hospitals take samples of patients’ DNA, for example, they force them to sign a release that allows the hospital to use the DNA in subsequent studies without further consent, she said.

“You are giving a sample that contains your entire information of who you are,” she said.

The Genetic Information Nondiscrimination Act of 2008 prohibits employers and health insurance companies from discriminating on the basis of genetic information. But the law excludes life insurance, one of the insurance types for which people with major health concerns could have the most to lose.

Data that could affect the cost of life insurance is everywhere, Berger-Wolf said. The technology people use to record their health, like iPhone apps or Fitbit fitness trackers, contains a wealth of information, like how much they walk and sleep. And some patients avoid getting tested for disease out of concern insurance companies will see the results.

“If that’s not influencing your biological life then I don’t know what is,” Berger-Wolf said.

The question, she said, is not whether life scientists and researchers should invest in or avoid using big data. They will continue to accumulate huge swaths of medical records, DNA sequences and Google searches for flu symptoms because that’s part of their job.

“We are doing it one way or another,” she said. “It’s part of our pursuit in science and in health and in every aspect of our lives.”

Instead, life scientists’ focus should turn to interpreting and managing the risks associated with big data.

“We are not going to stop using electronic records or encouraging people to live healthy lifestyles by creating apps,” Berger-Wolf said. “Nobody is going to prevent people from creating health apps. Are we going to shut down Fitbit? Absolutely not. But we need to start understanding what risks there are so we are not scrambling five years from now.”
