Google has introduced a new artificial intelligence model called Gemma 4 12B. The tech giant describes it as a “unified transformer” designed to bring agentic multimodal intelligence directly to laptops. The new model sits between the smaller Gemma E4B and the larger 26B Mixture of Experts (MoE) model, offering a balance of performance and efficiency.
Google also revealed that the Gemma family of models has crossed 150 million downloads. The company said developers have already used Gemma models for a wide range of projects, from wearable robotic arms for physical assistance to enterprise-grade security solutions.
One of the biggest highlights of Gemma 4 12B is that it can run locally on devices with just 16GB of RAM or VRAM. According to Google, the model delivers advanced reasoning abilities while maintaining a relatively small memory footprint. Google also says Gemma 4 12B is its first mid-sized model with native audio input support.
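To put the 16GB figure in context, the short sketch below estimates how much memory the weights alone occupy at different numeric precisions. It assumes a parameter count of exactly 12 billion and ignores the KV cache, activations and runtime overhead. The article does not say which precision Google has in mind, so the numbers are illustrative arithmetic only.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: exactly 12 billion parameters; real checkpoints differ slightly,
# and this ignores KV cache, activations and framework overhead.
PARAMS = 12e9
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

On this arithmetic, a 16-bit copy of the weights (about 24 GB) would not fit in 16GB on its own, while 8-bit (about 12 GB) or 4-bit (about 6 GB) weights would, leaving room for the rest of the runtime.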
Unlike many multimodal AI models that depend on separate encoders for visual and audio information, Gemma 4 12B handles these inputs directly through its language model backbone. Google describes this as a more streamlined approach that helps reduce memory usage and improve response speed.
For image processing, Google replaced the traditional vision encoder with a lightweight embedding module. “This allows the LLM backbone to take over visual processing,” Google explained. Similarly, instead of using a dedicated audio encoder, Gemma 4 12B projects raw audio signals directly into the same embedding space used for text tokens.
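The general idea of skipping separate encoders can be illustrated with a toy sketch: each modality is mapped by a single lightweight projection into vectors of the same width as text-token embeddings, and the results are concatenated into one sequence for the language-model backbone. Everything here is hypothetical (the sizes, the frame and patch lengths, the function names and the random weights); it is not Google's implementation, only a minimal picture of "projecting into the same space as text tokens".

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 64   # hypothetical hidden size of the language-model backbone
VOCAB = 1000   # hypothetical text vocabulary size
FRAME = 160    # hypothetical audio frame length in samples (10 ms at 16 kHz)
PATCH = 16     # hypothetical image patch side length in pixels

# Hypothetical learned parameters, randomly initialised here purely for illustration.
text_table = rng.normal(size=(VOCAB, D_MODEL))              # token-embedding lookup table
audio_proj = rng.normal(size=(FRAME, D_MODEL))              # raw audio frame -> model space
patch_proj = rng.normal(size=(PATCH * PATCH * 3, D_MODEL))  # lightweight image embedder

def embed_text(token_ids):
    # Look up one D_MODEL-wide row per token id.
    return text_table[np.asarray(token_ids)]

def embed_audio(waveform):
    # Drop trailing samples that do not fill a frame, then project each frame linearly.
    n = len(waveform) // FRAME
    frames = np.asarray(waveform[: n * FRAME]).reshape(n, FRAME)
    return frames @ audio_proj

def embed_image(image):
    # image: (H, W, 3) with H and W divisible by PATCH; split into flattened patches.
    h, w, c = image.shape
    patches = (image.reshape(h // PATCH, PATCH, w // PATCH, PATCH, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, PATCH * PATCH * c))
    return patches @ patch_proj

def build_sequence(token_ids, waveform, image):
    # Every modality becomes rows of width D_MODEL, so the backbone sees a single sequence.
    return np.concatenate(
        [embed_image(image), embed_audio(waveform), embed_text(token_ids)], axis=0)

seq = build_sequence([1, 2, 3], rng.normal(size=16000), rng.normal(size=(32, 32, 3)))
print(seq.shape)  # (107, 64): 4 image patches + 100 audio frames + 3 text tokens
```

The design point the sketch tries to capture is that no modality gets its own deep encoder stack; the only modality-specific parts are thin projections, and all further processing happens in the shared backbone.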
Google also highlighted that Gemma 4 12B comes equipped with Multi-Token Prediction (MTP) drafters to reduce latency. According to the company, the model delivers benchmark performance close to its larger 26B counterpart. This could make advanced multimodal AI and agent-based workflows more accessible to developers and users who want to run AI locally on everyday hardware.
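Google does not describe here how its MTP drafters work internally. Drafters are generally used in speculative decoding: a cheap predictor proposes several tokens ahead and the main model verifies them, keeping the longest correct run. The sketch below shows that generic greedy draft-and-verify loop with toy stand-in models (`target` and `drafter` are hypothetical functions, not Gemma components).

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # greedy next-token function: context -> token

def speculative_step(target: Model, drafter: Model, context: List[int], k: int) -> List[int]:
    """One draft-and-verify round with greedy decoding; returns the tokens accepted this round."""
    # 1. The cheap drafter proposes k tokens autoregressively.
    draft: List[int] = []
    for _ in range(k):
        draft.append(drafter(context + draft))
    # 2. The target model checks each drafted position. A real system scores all k
    #    positions in one batched forward pass; here it is a loop for clarity.
    accepted: List[int] = []
    for tok in draft:
        expected = target(context + accepted)
        if expected != tok:
            accepted.append(expected)  # first mismatch: keep the target's token and stop
            return accepted
        accepted.append(tok)
    # 3. All k drafts matched, so the target's next prediction is also kept.
    accepted.append(target(context + accepted))
    return accepted

def generate(target: Model, drafter: Model, prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out += speculative_step(target, drafter, out, k)
    return out[: len(prompt) + max_new]

# Toy models: the target counts modulo 10; the drafter agrees except after a 5.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10

def drafter(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10

print(generate(target, drafter, [0], 12))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
```

With greedy verification the output is identical to what the target model would produce on its own; the latency gain comes from accepting several tokens per verification pass whenever the drafter guesses correctly.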