Meta unveils SAM 3, its most advanced AI model for visual understanding yet

Updated on 20-Nov-2025
HIGHLIGHTS

SAM 3 introduces text and exemplar prompts to segment nearly any visual concept across images and videos.

Meta is rolling out the Segment Anything Playground, letting users try SAM 3’s editing tools without technical skills.

A new SAM 3D model suite adds object reconstruction and human pose estimation for AR, robotics, and spatial computing.

Meta has introduced the Segment Anything Model 3 (SAM 3), the next generation of its visual understanding tools. The new model brings significant improvements in object detection, segmentation, and tracking across images and videos, and lets users identify and segment almost any visual concept with text-based or exemplar prompts.

Meta has also introduced the Segment Anything Playground. This new interface allows the general public to experiment with SAM 3 and test its media editing capabilities without the need for technical knowledge. The company will also release model weights, a detailed research paper, and a new evaluation benchmark called SA-Co (Segment Anything with Concepts) to assist developers working on open-vocabulary segmentation.

Meta is also releasing SAM 3D, a collection of models capable of object and scene reconstruction, as well as human pose and shape estimation, with applications in augmented reality, robotics, and other spatial computing fields.

SAM 3: All upgrades

SAM 3 introduces “promptable concept segmentation,” which enables the model to segment anything users describe using short noun phrases or image examples. Meta claims SAM 3 outperforms previous systems on its new SA-Co benchmark for both images and video.

The model accepts a variety of prompts, such as masks, bounding boxes, points, text, and image examples, giving users multiple options for specifying what they want to detect or track.
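Meta hasn't published API details here, but the prompting workflow is easy to picture. The sketch below is purely illustrative: the `sam3` package, `load` function, and `segment` method are hypothetical stand-ins for whatever interface ships with the released weights.

```python
# Illustrative only: the "sam3" package and its functions are hypothetical
# stand-ins, not Meta's published API.
from PIL import Image

import sam3  # hypothetical wrapper around the released SAM 3 weights

model = sam3.load("sam3_checkpoint.pt")
image = Image.open("living_room.jpg")

# Text prompt: a short noun phrase naming the concept to segment.
masks = model.segment(image, text="striped armchair")

# Exemplar prompt: a box around one instance; the model then masks
# every other instance of the same concept in the image.
masks = model.segment(image, exemplar_box=(120, 80, 340, 400))

for m in masks:
    print(m.score, m.bbox)  # one mask per detected instance
```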

The new model was trained using a large-scale data pipeline that combined human annotators, SAM 3 itself, and supporting AI systems such as a Llama-based captioner. Meta says this pipeline processes and labels visual data far more efficiently than fully manual annotation, cutting annotation time and yielding a dataset that covers more than 4 million unique visual concepts.
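As a rough mental model of that human-in-the-loop pipeline (every name below is an illustrative stand-in; Meta's actual tooling is not public in this form), the labeling loop might look like this:

```python
# Hypothetical sketch of the labeling pipeline described above; all
# functions and objects here are illustrative stand-ins, not Meta's code.
def label_images(images, segmenter, captioner, annotators, threshold=0.9):
    """Draft labels with AI, route only low-confidence masks to humans."""
    labeled = []
    for img in images:
        # A Llama-based captioner proposes candidate noun phrases.
        phrases = captioner.propose_concepts(img)
        # The current SAM 3 checkpoint drafts masks for each phrase.
        masks = [m for p in phrases for m in segmenter.segment(img, text=p)]
        # Confident masks pass through; ambiguous ones go to annotators.
        easy = [m for m in masks if m.score >= threshold]
        hard = [m for m in masks if m.score < threshold]
        labeled.append((img, easy + annotators.review(img, hard)))
    return labeled
```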

Real-world applications

Meta already uses SAM 3 and SAM 3D to power features like View in Room on Facebook Marketplace, which lets shoppers preview how furniture would look in their own homes. The technology will also power upcoming visual editing tools in the Meta AI app, on the meta.ai website, and in the Edits app.

Ashish Singh
