Google's Project Astra is an AI assistant that reacts to what it sees


Google made many exciting AI announcements at I/O 2024, but one of the most talked about was Project Astra. Essentially, Project Astra is what Google calls an "advanced visual and speech-responsive agent." This means that in the future Google's AI will be able to get context from your surroundings, and you can ask questions and get answers in real time. It's almost like an enhanced version of Google Lens.

Project Astra is developed by Google DeepMind, whose mission is to build artificial intelligence that responsibly benefits humanity; this project is one of the ways it pursues that goal. Google said that Project Astra is built on Gemini 1.5 Pro, which has improved in areas such as translation, coding, and reasoning. As part of the project, Google said it developed prototype AI agents that can process information faster by continuously encoding video frames and combining video and voice input into a timeline of events. The company also used its speech models to improve how the agents sound, giving them a wider range of intonations.
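To make the "timeline of events" idea more concrete, here is a minimal Python sketch of merging two timestamped input streams into one chronological list. It is only an illustration under stated assumptions: Google has not published Astra's implementation, and every name here (`Event`, `build_timeline`, `recall`) is hypothetical, with plain strings standing in for encoded video frames and speech transcripts.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    """One timestamped observation (hypothetical structure, not Google's)."""
    timestamp: float                      # seconds since the session started
    kind: str = field(compare=False)      # "video" or "speech"
    payload: str = field(compare=False)   # stand-in for an encoded frame or transcript


def build_timeline(video_events, speech_events):
    """Merge two timestamp-sorted streams into one chronological timeline."""
    # heapq.merge assumes each input is already sorted by timestamp.
    return list(heapq.merge(video_events, speech_events))


def recall(timeline, start, end):
    """Return the events whose timestamps fall within [start, end]."""
    return [e for e in timeline if start <= e.timestamp <= end]


video = [
    Event(0.0, "video", "frame: desk with monitor"),
    Event(1.0, "video", "frame: speaker on the table"),
]
speech = [
    Event(0.5, "speech", "Tell me when you see something that makes a sound"),
]

timeline = build_timeline(video, speech)
for event in timeline:
    print(f"{event.timestamp:4.1f}s  {event.kind:6}  {event.payload}")
```

Keeping both inputs in a single ordered list means a question asked at one moment can be matched against what the camera saw just before or after it, which is the kind of lookup `recall` illustrates.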

Google released a two-part demo video showing how Project Astra works. The first half of the video shows Project Astra running on a Google Pixel phone; the second half shows the new AI running on a prototype pair of smart glasses.

In the demo video, the user opens the camera viewfinder on a Pixel phone and moves the device around the room while asking the next-generation Gemini AI assistant, "Tell me when you see something that makes a sound." The AI answers by pointing out the speaker on the table. Other examples in the video include asking what a section of code on a computer screen does, asking what city neighborhood the user is currently in, and coming up with a band name for a dog and his toy tiger.

While it will be a long time before we see Project Astra's next generation of artificial intelligence enter our daily lives, it's still cool to see what the future holds.