'Synthetic data' will fuel AI development. Here's what it is and why it matters

Gretel co-founder and CTO John Myers, on the latest episode of Quartz AI Factor, explains how synthetic data will be like synthetic oil for AI

The mathematician Clive Humby first said in 2006 that “data is the new oil”: valuable, but unusable if it’s not refined. Much like oil, data also has a synthetic counterpart that will power the future of artificial intelligence, according to Gretel co-founder and CTO John Myers.

“I think synthetic data is going to be basically the underlying fuel that powers the development of AI systems and specifically the data that goes into it,” Myers said in the latest episode of Quartz AI Factor, a video series set at the Nasdaq MarketSite (NDAQ).

Gretel provides synthetic data to enterprises to improve their AI and machine learning models. Data is the foundation on which AI systems are developed and models are trained, making AI smarter and more effective with each use.

“I look at synthetic data a lot like synthetic oil,” he said. “Everyone that drives a car today is using synthetic oil. They just know that it’s kind of like real oil, but it’s manufactured and it has guaranteed qualities on it that makes sure that the engine can run smoothly.”

Synthetic data works in much the same way. There are two ways to create synthetic data. The first is to take existing data and make it safe to use — reducing volume, making it portable, and eliminating privacy risks, Myers explained. The other is to make it “from scratch,” he said, generating data that doesn’t exist to solve problems and build new products.
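To make the two approaches concrete, here is a minimal toy sketch in Python. It is an illustration only, not Gretel's method: the record fields, the function names, and the sample values are all hypothetical. Sampling each column independently, as this sketch does, carries no formal privacy guarantee and loses relationships between columns.

```python
import random
import statistics


def synthesize_from_existing(records, n, seed=0):
    """Approach 1 (toy): learn simple per-column statistics from real
    records, then sample new rows. The "name" identifier is never copied."""
    rng = random.Random(seed)
    ages = [r["age"] for r in records]
    mu, sigma = statistics.mean(ages), statistics.stdev(ages)
    # Choosing from the observed list roughly preserves city frequencies.
    cities = [r["city"] for r in records]
    return [
        {"age": max(0, round(rng.gauss(mu, sigma))), "city": rng.choice(cities)}
        for _ in range(n)
    ]


def synthesize_from_scratch(n, seed=0):
    """Approach 2 (toy): generate rows from hand-written rules,
    using no real data at all."""
    rng = random.Random(seed)
    return [
        {"age": rng.randint(18, 90),
         "city": rng.choice(["Austin", "Boston", "Denver"])}
        for _ in range(n)
    ]


# Hypothetical "real" records, invented for this example.
real = [
    {"name": "Ana", "age": 34, "city": "Austin"},
    {"name": "Ben", "age": 51, "city": "Boston"},
    {"name": "Cleo", "age": 27, "city": "Austin"},
    {"name": "Dev", "age": 45, "city": "Denver"},
]

derived = synthesize_from_existing(real, n=5)
invented = synthesize_from_scratch(n=5)
```

In this sketch, `derived` mirrors the rough shape of the real records without repeating any name, while `invented` depends only on the rules written into the function.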

Image: Khanchit Khirisutchalual (Getty Images)

The consulting firm Gartner has estimated that 60% of data used for AI and analytics would be synthetically generated by 2024. The synthetic data generation market is forecast to grow from $381.3 million in 2022 to $2.1 billion in 2028, according to BCC Research.

Privacy is one of the biggest selling points for synthetic data, particularly in highly regulated sectors like financial services and healthcare, which handle a lot of sensitive personal information. But synthetic data can also help fill gaps where real-world data is lacking and supplement organically produced data that’s outdated or poor quality.

Still, Myers doesn’t see synthetic data replacing raw data. Instead, it will serve as a complement to real data and records.

“I think what you’re gonna find is that there’s gonna be a pretty big boundary that says, when we want to build applications or put this data to work, let’s take a synthetic version of that data and use that,” Myers said. “And then that’s going to be where your applications are built at the enterprise layer, while that raw data is used to refine it down into that synthetic data.”

Watch the latest episode of Quartz AI Factor above.