It’s no secret that Google’s flagship AI chatbot Gemini has had some problems. Its production of historically inaccurate images forced Google parent Alphabet to temporarily suspend the chatbot’s generation of images of people earlier this year.
But Google is trying to turn the page on its early AI mishaps. Keynote speakers at the tech giant’s annual Google Cloud Next conference in Las Vegas on Tuesday showed off new features of Gemini 1.5 Pro, the latest version of its chatbot, which is now publicly available. Spectators watched while demonstrators muttered to themselves and typed prompts into the revamped AI chatbot to highlight its new tools, perhaps the most important of which is its ability to “ground” queries. “Grounding” means responses on Gemini 1.5 Pro are linked to “verifiable sources of information,” the company said Tuesday.
The announcements about Gemini 1.5 Pro included a range of updates to the chatbot as part of Google’s push to sell its AI products to corporate customers. Gemini now includes further capabilities for something called “long context understanding,” which means it can process far more information at once. It also has multimodal capabilities: the ability to process not just text but also audio, video, and other formats to generate responses.
“With these two advances, enterprises can do things today that just weren’t possible with AI before,” Google CEO Sundar Pichai said during the presentation.
Businesses have already been piloting the product. Goldman Sachs, Mercedes-Benz, and Uber are among the early Gemini 1.5 Pro customers, Google said. Goldman Sachs CEO David Solomon appeared by video at Google Cloud Next right after Pichai. Mercedes-Benz CEO Ola Källenius also spoke about the German carmaker’s partnership with Google and its use of Google’s AI products.
Google said that Gemini 1.5 Pro allows customers to “process vast amounts of information in a single stream” — including 1 hour of video, 11 hours of audio, or over 700,000 words.
“For example,” the company added, “a gaming company could provide a video analysis of a player’s performance, along with tips to improve. Or an insurance company could combine video, images and text inputs to create an incident report, making the claims process easier.”
Google made other AI announcements, too. A full list can be found on the Google Cloud Next 2024 conference website.
Google Vids
Google is launching an AI-powered video creation app, Google Vids. The app was demoed on Tuesday by Aparna Pappu, VP of Google Workspace.
Based on a prompt in Google Docs, “Gemini suggests a narrative outline for the story that I could easily customize and edit,” Pappu said.
Text-to-live image generation
The latest version of Google’s AI image generator, Imagen 2.0, which is powered by Gemini, can create live images from text prompts. The feature is still in “preview” mode, but keynote speakers in Las Vegas showed it off.
“Marketing and creative teams can generate animated images from a text prompt, including product images, ads, GIFs, and storyboards,” Pappu said. Another demonstrator noted that the tool creates live images that would otherwise take “days or weeks of scouting and shooting.”
Pappu also announced that images generated by Google’s Imagen can be watermarked using Google DeepMind’s SynthID.