Google I/O shows Gemini still needs time to bake


The prevailing tone at Google I/O 2024's launch keynote seemed to be "can we delay?" Google's promised AI improvements certainly took center stage, but with a few exceptions, most are still in the works.

That's not surprising; this is a developer conference, after all. But it looks like consumers will have to wait a little longer before they get the "Her" moment they were promised. Here's what you can expect as Google's new features begin rolling out.

Artificial Intelligence in Google Search

Perhaps the most impactful addition for most people will be the expansion of Gemini integration in Google Search. While Google already offers a "Search Generative Experience" feature in Search Labs that quickly writes a paragraph or two in response to a query, everyone will soon be getting an expanded version called "AI Overviews."

In search, AI Overviews can optionally generate multiple sections of information in response to a query, complete with subheadings. They will also provide more context than their predecessor and can take more detailed prompts.

For example, if you live in a sunny area with good weather and search for "restaurants near me," the overview might give you some basic suggestions, but also provide a separate subheading listing restaurants with good patio seating.

Within the more traditional search results pages, you'll be able to use "AI-organized search results," which sidesteps traditional SEO and intelligently recommends web pages to you based on highly specific prompts.

For example, you can ask Google to "Create a gluten-free three-day meal plan that includes lots of vegetables and at least two desserts" and the search page will create several subheadings, each with links to the appropriate recipes.

Google is also bringing artificial intelligence to the way you search, with a focus on multimodality, meaning you can use it with content other than text. Specifically, a "Video Ask" feature is in the works that will let you request identification or repair help simply by pointing your phone's camera at an object, then get answers through generative search.

Google has not directly addressed the criticism that AI search results essentially lift content from web sources without users ever having to click through to the original. That said, presenters repeatedly stressed that these features surface useful links that you can check out for yourself, perhaps covering their bases in the face of that criticism.

AI Overviews are already rolling out to Google users in the U.S., and AI-organized search results and video queries will roll out "in the coming weeks."

Use AI to search your photos

Another more specific feature in the works is Ask Photos, which leverages multimodality to help you sort through the hundreds of gigabytes of images you have on your phone.

Let's say your daughter took swimming lessons last year and you've lost track of the first photos of her in the water. Ask Photos lets you simply ask, "When did my daughter learn to swim?" Your phone automatically knows who you mean by "your daughter" and displays an image from her first swimming lesson.

Of course, this is similar to searching for photos of cats in your photo library by simply typing "cat," but the idea is that it can handle more detailed questions, drawing on Gemini and the data already stored on your phone.

Other details are scarce, with Ask Photos set to debut "in the coming months."

Project Astra: An AI agent in your pocket

This is where we get into more pie-in-the-sky stuff. Project Astra is the most C-3PO-like AI we've seen yet. The idea is that you'll be able to load the Gemini app on your phone, open the camera, point it around, and ask questions and get help based on what the phone sees.

Point at a speaker, for example, and Astra will be able to tell you what parts are in the hardware and how they're used. Point at a picture of a cat with questionable vitality, and Astra will answer your riddle with "Schrödinger's Cat." Ask it where your glasses are, and it can tell you, provided it saw them earlier in your shot.

This may be a classic dream in artificial intelligence, and it is very similar to OpenAI's recently announced GPT-4o, so it makes sense that it's not ready yet. Astra will launch "later this year," but oddly enough, it should work with AR glasses as well as phones. Maybe we'll learn about a new Google wearable device soon.

Make a custom podcast hosted by a bot

It's unclear when this feature will be ready, as it appears to be more of an example of Google's improved AI models than a headline announcement, but one of the more impressive (and possibly disquieting) demos Google showed off during I/O involved creating a custom podcast hosted by an AI voice.

Let's say your son studies physics in school, but he's more of an audio learner than a text learner. Google says Gemini will soon let you dump written PDFs into its NotebookLM app and ask Gemini to make an audio program discussing them. The app will generate podcast-like content with an AI voice speaking naturally about the topics in the PDFs.

Your son will then be able to interrupt the presenter at any time and ask for clarification.

Hallucinations are obviously a major issue here, and the naturalistic language can be a bit "cringe-y", for lack of a better word. But there's no doubt it's an impressive display...if only we knew when we'd be able to recreate it.

Paid features

There are a few other tools in development that appear to be built specifically for the typical consumer, but for now, they will be limited to Google's paid Workspace (and in some cases, Google One AI Premium) plans.

The most promising of these is Gmail integration, which takes a three-pronged approach. The first is a summary, which reads through a Gmail thread and breaks down the key points for you. That's not too novel, and neither is the second aspect, which allows AI to suggest contextual responses for you based on information from your other emails.

But Gemini's Q&A does seem to be transformative. Imagine you are looking to have some roofing work done, and you have emailed three different construction companies to get quotes. Now, you want to make a spreadsheet of each company, their quotes, and their availability. Instead of sifting through every email yourself, you can ask the Gemini box at the bottom of Gmail to make the spreadsheet for you. It will search your Gmail inbox and generate a spreadsheet in minutes, saving you time and perhaps helping you find lost emails.

This kind of contextual spreadsheet building will be coming to apps outside of Gmail as well, but Google was also proud to show off its new "Virtual Gemini Powered Teammate." The upcoming Workspace feature is still in its early stages and is a bit like a cross between the typical Gemini chat box and Astra. The idea is that organizations will be able to add AI agents to their Slack equivalents that will be on call 24/7 to answer questions and create documents.

Gmail's Gemini-powered summary feature will roll out to Workspace Labs users this month, and the other Gmail features will roll out to Labs in July.

Gems

Earlier this year, OpenAI replaced ChatGPT plug-ins with "GPTs," which allow users to create customized versions of the ChatGPT chatbot that handle specific problems. Gems are Google's answer to this, and work relatively similarly. You'll be able to create many Gems, each with its own page in the Gemini interface, and each following a specific set of instructions. In Google's demo, suggested Gems included examples such as "Yoga Bestie," which provides workout suggestions.

Gems are another feature that won't arrive for a few months, so for now, you'll have to stick with GPTs.

Agents

Fresh off lukewarm reactions to the Humane AI Pin and Rabbit R1, AI enthusiasts were hoping Google I/O would showcase Gemini's answer to the promise behind these devices: the ability to go beyond simply organizing information and actually interact with websites. What we got was a light teaser with no confirmed release date.

In a speech from Google CEO Sundar Pichai, we saw that the company intends to build artificial intelligence agents that can "think multiple steps ahead." For example, Pichai talked about the possibility that in the future a Google AI agent could help you return your shoes. It can go from "Search your inbox for a receipt" all the way to "Fill out a return form" and "Schedule a pickup," all under your supervision.

All of this comes with a huge caveat, as it wasn't a demo, just an example of what Google wants to do. "Imagine if Gemini could" did a lot of the heavy lifting in this part of the event.

New Google AI models

In addition to highlighting specific features, Google also touted the release of new AI models as well as updates to its existing ones. From generative models like Imagen 3 to Gemini's larger, more contextually intelligent builds, these parts of the presentation were geared more toward developers than end users, but there are still some interesting points to pull out.

The key highlights are Veo and Music AI Sandbox, which generate AI video and sound respectively. There aren't many details yet on how they'll work, but Google brought out big stars like Donald Glover and Wyclef Jean, who offered promising quotes like "everyone will be a director" and "we digging through the infinite crates."

Currently, the best demonstrations we have for these generative models are examples posted on celebrity YouTube channels.

Google also kept talking about Gemini 1.5 Pro and 1.5 Flash during the presentation, new versions of its LLM aimed mainly at developers, which support larger token counts and allow for more context. These may not matter to you, but do pay attention to Gemini Advanced.

Gemini Advanced is already available as Google's paid Gemini plan. It lets you ask more questions, gives non-developers some access to Gemini 1.5 Pro, integrates with various apps such as Docs (including some, but not all, of the Workspace features announced today), and lets you upload files such as PDFs.

Some of the features Google promises sound like they'll require you to subscribe to Gemini Advanced, specifically those that require you to upload documents so that the chatbot can answer questions about them or improvise on your own content. We're not sure yet what's free and what's not, but it's another caveat to keep in mind as Google makes its "watch us" promises at this I/O.

That's a summary of Google's overall announcements for Gemini. That said, the company also announced new AI features in Android, including a new Circle to Search feature and fraud detection using Gemini. (That's not the Android 15 news, though: that's coming tomorrow.)