Google’s new AI agent remembers everything: here’s how it works

Most AI agents have a memory problem. Ask a question, get an answer, close the tab and it’s done. Google’s newly open-sourced Always-On Memory Agent, built on Gemini Flash-Lite, is designed to fix that. It runs continuously in the background, ingesting information, forming connections, and building a persistent knowledge base that grows over time.


The headline capability isn’t just storage; it’s consolidation. The agent is built around three specialist sub-agents that divide the cognitive labour: one ingests new information, one consolidates existing memories, and one answers queries by synthesising everything it knows.

Ingestion is the entry point. Drop any file – a PDF, an MP3, a video, a photo, a text file – into a watch folder, and the IngestAgent picks it up automatically within seconds. It uses Gemini’s multimodal capabilities to extract structured information: a summary, key entities, topics, and an importance score. Twenty-seven file types are supported out of the box.
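In rough outline, the ingestion flow amounts to a polling loop over the watch folder. The sketch below is an illustration, not the project’s actual code: the function names and record fields are assumptions, and the multimodal Gemini call is replaced by a stub.

```python
from pathlib import Path


def extract_structured_info(path: Path) -> dict:
    # Stand-in for the Gemini Flash-Lite multimodal call; the real
    # IngestAgent derives these fields from the file's actual contents.
    return {
        "source": path.name,
        "summary": f"Summary of {path.name}",  # placeholder text
        "entities": [],      # key people, places, and things
        "topics": [],        # high-level subject tags
        "importance": 0.5,   # importance score in [0.0, 1.0]
    }


def poll_watch_folder(watch_dir: Path, seen: set) -> list:
    """One polling pass: ingest any file not seen on earlier passes."""
    records = []
    for path in sorted(watch_dir.iterdir()):
        if path.is_file() and path.name not in seen:
            seen.add(path.name)
            records.append(extract_structured_info(path))
    return records
```

A scheduler would run `poll_watch_folder` every few seconds, which is consistent with the “picks it up automatically within seconds” behaviour described above.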

Consolidation is where it gets interesting. Every 30 minutes, the ConsolidateAgent wakes up and does something no vector database can do: it reads across all stored memories, finds non-obvious connections between them, and generates new cross-cutting insights. Google explicitly frames this as mimicking how the human brain processes information during sleep – replaying, linking, and compressing. The result isn’t just retrieval; it’s synthesis.
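A consolidation pass can be sketched as: read every stored memory, ask the model for cross-links, and write the result back as a new row. The table names and schema below are assumptions for illustration, and the LLM call is stubbed out.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Minimal assumed schema: raw memories plus consolidated insights."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories (id INTEGER PRIMARY KEY, summary TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS insights (id INTEGER PRIMARY KEY, text TEXT)"
    )


def find_connections(summaries: list) -> str:
    # Stand-in for the LLM call: the real ConsolidateAgent feeds every
    # stored memory to Gemini and asks for non-obvious cross-links.
    return f"Cross-cutting insight drawn from {len(summaries)} memories"


def consolidate(conn: sqlite3.Connection) -> None:
    """One consolidation pass: read all memories, store a new insight."""
    rows = conn.execute("SELECT summary FROM memories").fetchall()
    if rows:
        insight = find_connections([r[0] for r in rows])
        conn.execute("INSERT INTO insights (text) VALUES (?)", (insight,))
        conn.commit()
```

The 30-minute cadence would come from whatever scheduler invokes `consolidate`; the pass itself is just a full read followed by a write.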


Querying then draws on both raw memories and consolidated insights to produce cited, reasoned answers rather than simple keyword matches.
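One plausible way to make answers citable, shown here as a hedged sketch rather than the project’s actual prompt: number every memory and insight as a source, so the model can reference them as [1], [2], and so on.

```python
def build_query_prompt(question: str, memories: list, insights: list) -> str:
    """Number every memory and insight so the answer can cite [n]."""
    sources = memories + insights
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    return (
        "Answer the question using only the numbered sources below, "
        "citing them like [1].\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}"
    )
```

Because consolidated insights sit alongside raw memories in the source list, the answer can cite a synthesised connection just as easily as an original document.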

What makes the architecture notable is what it deliberately leaves out. There’s no vector database, no embedding model, no similarity search infrastructure. Storage is plain SQLite. The LLM does all the reading, thinking, and writing, which keeps the stack minimal but also means it won’t scale indefinitely. Reading every memory on every query is fine for personal or single-agent use; it becomes expensive at enterprise volume.
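That trade-off is easy to see in code. The schema below is an assumed illustration of what “plain SQLite” could look like here; the point is that retrieval is a full table scan handed to the model, not a similarity lookup.

```python
import sqlite3

# Assumed minimal schema: no embedding columns, no index for
# similarity search, just rows of text the LLM will read in full.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id INTEGER PRIMARY KEY,
    source TEXT,      -- original file name
    summary TEXT,     -- model-written summary
    importance REAL   -- 0.0-1.0 score from ingestion
);
CREATE TABLE IF NOT EXISTS insights (
    id INTEGER PRIMARY KEY,
    text TEXT         -- consolidated cross-memory insight
);
"""


def load_everything(conn: sqlite3.Connection) -> list:
    # No similarity search: every query simply reads every row, so the
    # token cost of a query grows linearly with the size of the store.
    mems = [r[0] for r in conn.execute("SELECT summary FROM memories")]
    ins = [r[0] for r in conn.execute("SELECT text FROM insights")]
    return mems + ins
```

With a few hundred personal memories this is trivially cheap; with millions of enterprise records, the linear growth in context tokens is exactly the scaling ceiling described above.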

The choice of Gemini Flash-Lite is telling. This agent runs 24/7, so cost-per-token matters enormously. Flash-Lite is cheap and fast enough to make continuous background processing practical without burning through an API budget. 

The project is fully open-source under MIT licence, built on Google’s Agent Development Kit, and runnable locally with a single API key. For developers building agents that need continuity across sessions, it’s one of the cleanest reference implementations available right now.


Vyom Ramani

A journalist with a soft spot for tech, games, and things that go beep. While waiting for a delayed metro or rebooting his brain, you’ll find him solving Rubik’s Cubes, bingeing F1, or hunting for the next great snack.
