The Pentagon's chief digital and artificial intelligence officer, Craig Martell, is alarmed by the potential for generative artificial intelligence systems like ChatGPT to deceive and sow disinformation. His talk on the technology at the DefCon hacker convention in August was a huge hit. But he's anything but sour on reliable AI.
Not a soldier but a data scientist, Martell headed machine learning at companies including LinkedIn, Dropbox and Lyft before taking the job last year.
Marshalling the U.S. military’s data and determining what AI is trustworthy enough to take into battle is a big challenge in an increasingly unstable world where multiple countries are racing to develop lethal autonomous weapons.
The interview has been edited for length and clarity.
---
A: Our job is to scale decision advantage from the boardroom to the battlefield. I don’t see it as our job to tackle a few particular missions but rather to develop the tools, processes, infrastructure and policies that allow the department as a whole to scale.
A: We are finally getting at network-centric warfare -- how to get the right data to the right place at the right time. There is a hierarchy of needs: quality data at the bottom, analytics and metrics in the middle, AI at the top. For this to work, most important is high-quality data.
A: All AI is, really, is counting the past to predict the future. I don’t actually think the modern wave of AI is any different.
A: I find that metaphor somewhat flawed. When we had a nuclear arms race it was with a monolithic technology. AI is not that. Nor is it a Pandora’s box. It’s a set of technologies we apply on a case-by-case basis, verifying empirically whether it’s effective or not.
A: Our team is not involved with Ukraine other than to help build a database for how allies provide assistance. It’s called Skyblue. We’re just helping make sure that stays organized.
A: In the military we train with a technology until we develop a justified confidence. We understand the limits of a system, know when it works and when it might not. How does this map to autonomous systems? Take my car. I trust the adaptive cruise control on it. The technology that is supposed to keep it from changing lanes, on the other hand, is terrible. So I don’t have justified confidence in that system and don’t use it. Extrapolate that to the military.
A: Computer vision has made amazing strides in the past 10 years. Whether it’s useful in a particular situation is an empirical question. We need to determine the precision we are willing to accept for the use case and build against that criteria – and test. So we can’t generalize. I would really like us to stop talking about the technology as a monolith and talk instead about the capabilities we want.
A: The commercial large-language models are definitely not constrained to tell the truth, so I am skeptical. That said, through Task Force Lima (launched in August) we are studying more than 160 use cases. We want to decide what is low risk and safe. I’m not setting official policy here, but let’s hypothesize. Low-risk could be something like generating first drafts in writing or computer code. In such cases, humans are going to edit, or in the case of software, compile. It could also potentially work for information retrieval — where facts can be validated to ensure they are correct.
A: That's a huge can of worms. We have just created a digital talent management office and are thinking hard about how to fill a whole new set of job roles. For example, do we really need to be hiring people who are looking to stay at the Department of Defense for 20-30 years? Probably not. But what if we can get them for three or four? What if we paid for their college and they pay us back with three or four years and then go off with that experience and get hired by Silicon Valley? We're thinking creatively like this. Could we, for example, be part of a diversity pipeline? Recruit at HBCUs (historically Black colleges and universities)?