Meta has introduced the Segment Anything Model 3 (SAM 3), the next generation of its visual understanding models. The new model brings significant improvements in object detection, segmentation, and tracking across images and videos, and it accepts text-based and exemplar prompts to identify and segment almost any visual concept.
Meta has also introduced the Segment Anything Playground. This new interface allows the general public to experiment with SAM 3 and test its media editing capabilities without the need for technical knowledge. The company will also release model weights, a detailed research paper, and a new evaluation benchmark called SA-Co (Segment Anything with Concepts) to assist developers working on open-vocabulary segmentation.
Meta is also releasing SAM 3D, a collection of models capable of object and scene reconstruction, as well as human pose and shape estimation, with applications in augmented reality, robotics, and other spatial computing fields.
SAM 3 introduces “promptable concept segmentation,” which enables the model to segment anything users describe using short noun phrases or image examples. Meta claims SAM 3 outperforms previous systems on its new SA-Co benchmark for both images and video.
The model accepts a variety of prompts, such as masks, bounding boxes, points, text, and image examples, giving users multiple options for specifying what they want to detect or track.
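To make the prompt variety described above concrete, the sketch below shows one way a multi-prompt interface could be represented in Python. The prompt dataclasses, the `segment` function, and the placeholder outputs are illustrative assumptions for this article, not Meta's actual SAM 3 API.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical prompt types mirroring the kinds of inputs described above.
# These names are illustrative, not the real SAM 3 interface.

@dataclass
class TextPrompt:
    phrase: str                       # short noun phrase, e.g. "yellow school bus"

@dataclass
class BoxPrompt:
    box: Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels

@dataclass
class PointPrompt:
    points: List[Tuple[int, int]]     # clicked (x, y) coordinates
    labels: List[int]                 # 1 = foreground, 0 = background

@dataclass
class ExemplarPrompt:
    image_path: str                   # example image of the target concept

Prompt = Union[TextPrompt, BoxPrompt, PointPrompt, ExemplarPrompt]

def segment(image_path: str, prompts: List[Prompt]) -> List[dict]:
    """Dispatch mixed prompt types through a single entry point.

    A real integration would load the released SAM 3 weights and run the
    model here; this stub only shows how text, box, point, and exemplar
    prompts could share one interface.
    """
    results = []
    for prompt in prompts:
        results.append({"prompt": prompt, "mask": None})  # placeholder output
    return results

# Usage: combine a text prompt with a refining box for the same image.
masks = segment(
    "street.jpg",
    [TextPrompt("yellow school bus"), BoxPrompt((40, 60, 480, 360))],
)
print(len(masks), "mask placeholders produced")
```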
The new model was trained using a large-scale data pipeline that combined human annotators, SAM 3 itself, and supporting AI systems such as a Llama-based captioner. The pipeline processes and labels visual data far more efficiently than purely manual annotation, reducing annotation time and enabling a dataset that covers over 4 million visual concepts.
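The data pipeline can be thought of as a loop in which models propose candidate labels and human annotators review only the uncertain cases. The sketch below is a simplified, hypothetical version of such a loop; the `propose_masks`, `ai_verifier`, and `human_review` functions and the confidence threshold are assumptions, not Meta's implementation.

```python
import random
from typing import List

def propose_masks(image_id: str) -> List[dict]:
    """Stand-in for the model + captioner stage that proposes
    candidate concept labels and masks for an image."""
    return [{"image": image_id, "concept": "example concept",
             "confidence": random.random()}]

def ai_verifier(candidate: dict) -> bool:
    """Stand-in for an automated checker that accepts
    high-confidence proposals without human review."""
    return candidate["confidence"] > 0.9

def human_review(candidate: dict) -> bool:
    """Stand-in for a human annotator decision; here it simply
    accepts everything so the loop runs end to end."""
    return True

def annotate(image_ids: List[str]) -> List[dict]:
    accepted = []
    for image_id in image_ids:
        for candidate in propose_masks(image_id):
            # Route easy cases through the AI verifier and send
            # only the uncertain ones to human annotators.
            if ai_verifier(candidate) or human_review(candidate):
                accepted.append(candidate)
    return accepted

dataset = annotate([f"img_{i:04d}" for i in range(5)])
print(f"accepted {len(dataset)} annotations")
```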
Meta already uses SAM 3 and SAM 3D to power features such as View in Room on Facebook Marketplace, which lets customers preview furniture in their own homes. The technology will also be integrated into upcoming visual editing tools in the Meta AI app, on meta.ai on the web, and in the Edits app.