cnsphoto via REUTERS
More than 30 authors from China’s CDC appear on a genomics paper published in the Lancet.
VIRAL DATA

Coronavirus is a proving ground for scientific transparency

By Tim McDonnell

Climate reporter

As the number of people who have contracted coronavirus increases, several groups at universities in the US and Europe have rolled out predictions over the last few days about where and how the disease, which epidemiologists have now dubbed nCoV19, will spread next. 

Some predictions focus on ranking the countries and Chinese provinces most at risk of seeing new cases (an analysis led by a Northeastern University group puts the US fifth). Others have tried to estimate the final tally: an estimate from the UK’s Lancaster Medical School says the caseload in Wuhan could exceed 190,000 people within two weeks.

Mapping out the future of an epidemic is invaluable for officials scrambling to implement travel restrictions and allocate health care resources. But it requires a mountain of data. You need to know airline traffic patterns and keep up with shifting travel bans. You need to stay on top of virologists’ rapidly evolving understanding of disease transmission. And you need to know where new cases are cropping up, no small feat for an outbreak of this size.

With all that in hand, you can design a computer model to calculate the probability that the virus will spread in a given time along any number of possible routes. 
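
To give a rough sense of the arithmetic behind such a ranking, here is a minimal sketch in Python. It is not the Northeastern group's model: the function name, the destinations, the traveler counts, and the assumed infection rate are all hypothetical, and real models also account for travel bans, incubation time, and onward transmission. The sketch only shows how travel volume and an assumed share of infected travelers could be turned into a probability that at least one case arrives on a given route.

```python
def rank_import_risk(daily_travelers, prevalence):
    """Rank destinations by the chance of receiving at least one infected traveler.

    daily_travelers: dict mapping a destination name to the number of people
        arriving per day from the outbreak area (hypothetical inputs).
    prevalence: assumed fraction of those travelers who are infected.
    """
    risk = {}
    for destination, travelers in daily_travelers.items():
        # Treat each traveler as independent: the chance that none of them
        # is infected is (1 - prevalence) ** travelers.
        risk[destination] = 1.0 - (1.0 - prevalence) ** travelers
    # Highest risk first.
    return sorted(risk.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up traveler counts, for illustration only.
    example = {"Destination A": 1000, "Destination B": 100, "Destination C": 10}
    for destination, probability in rank_import_risk(example, prevalence=0.001):
        print(f"{destination}: {probability:.1%}")
```

The independence assumption keeps the example short; it is the kind of simplification a real model would replace with detailed transmission and mobility data.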

Alessandro Vespignani, who leads the Modeling of Biological and Socio-Technical Systems Lab that produced the ranking analysis, said his team was already well-equipped with air travel data from their work on previous epidemics. But they needed more data about existing caseloads to plug in. So he turned to a public online repository of caseload data uploaded to GitHub by researchers from a range of universities. That repository, along with other academic and government sources, is helping him complete the puzzle.

In past epidemics, some public health researchers have been guilty of hoarding top-shelf data sets for meticulous study—and hopefully a splashy journal article—until long after the public health crisis has passed.

But global health scientists who are working to track the coronavirus say that the outbreak has sparked an unprecedented level of openness and collaboration between normally competitive research outfits. The result, they say, is that actionable computer models to predict where the disease might spread next—and the genetic data needed to develop treatments—are coming online faster and at a higher quality than in any previous epidemic. 

“The communication between modeling teams has been the best that I’ve ever seen in my life,” Vespignani said. 

The GitHub repository includes some data sources derived from artificial intelligence, like Harvard Medical School’s HealthMap, which uses AI to sift through news articles and other digital media for signs of new cases. Johns Hopkins University has also opened up the data behind its case-tracking map, which aggregates data from a range of official Chinese, American, and WHO sources.

The current crop of predictive models, produced within a week or so after the disease was first confirmed to spread between people, could not have been built so quickly without those open data sets, Vespignani said.

“In previous outbreaks you could wait weeks or months to see a paper come out that had that one piece of information you needed in your work,” he said. “That was really hindering the process. This time is different.”

David Pigott, a scientist at the University of Washington’s Institute for Health Metrics and Evaluation who has contributed to the GitHub repository, says global health researchers are now reaping the benefits of data-sharing networks that have evolved over the last few epidemics, especially Ebola and Zika.

“With each event, there’s a snowballing of like-minded individuals being increasingly aware of each other,” he said. “So when the coronavirus was detected, we knew who to reach out to.”

Meanwhile, a number of papers on the disease’s viral characteristics and genetic makeup have sped through peer review at The Lancet in the last few days. And at least one private AI-based outbreak monitoring company is also joining the fray: Metabiota, a San Francisco-based service for government agencies and insurance companies, plans to make the province-level data behind its nCoV19 tracking map freely available early next week, CEO Nita Madhav told Quartz.

That will be the first time the company has ever shared its data, Madhav said, a decision that she said was influenced by WHO’s emergency declaration. “With such an important outbreak going on, we’ve been getting a lot of requests for data,” she said. “We’re seeing a trend toward open data and we want to contribute to that.”

Dirk Brockmann, a biologist at Humboldt University in Berlin who is maintaining his own projection of nCoV19 global transmission risk, attributes this newfound openness to the growing influx into public health of computer scientists, who bring with them a more liberal ethos toward transparency than has traditionally been found among epidemiologists.

“There’s a cultural shift happening in the next generation of scientists which is very promising,” he said. “It’s really changing the field.” 

That’s especially important in the context of nCoV19, because of China’s track record on transparency during epidemics. When the country was hit by SARS in 2003, it took months for officials to admit the true scale of the caseload, and to share genomic data collected from patients with outside scientists.

So when nCoV19 started to take off in early January, there was good reason to be concerned about what China would be willing to share. The outcome has been mixed: The World Health Organization and some scientists have praised the country’s relative openness with epidemiological and genetic data. But on Jan. 28, two top US health officials pressed the country for more data on the disease’s human-to-human transmission potential, and on efforts to develop a vaccine. 

The call for more data was also reinforced Thursday by WHO in its official designation of the outbreak as a global health emergency. So far, Chinese scientists, if not their bureaucrats, seem one step ahead: The Lancet genomics paper published that same day listed more than 30 authors from the Chinese Center for Disease Control and Prevention.
