xAI introduces Grok-1.5V AI model with image processing capabilities: Check details

HIGHLIGHTS

xAI has introduced its new AI model, Grok-1.5 Vision, or Grok-1.5V.

Grok-1.5V is xAI's first multimodal model.

Grok-1.5V will be available soon to xAI’s early testers and existing Grok users.

Elon Musk’s xAI has introduced Grok-1.5 Vision, or Grok-1.5V, the company’s first multimodal model. In addition to its text capabilities, Grok can now understand a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs. Grok-1.5V has not been released yet; xAI says it will be available soon to its early testers and existing Grok users.

Let’s delve into the capabilities of the Grok-1.5V AI model.

Grok-1.5V: Capabilities

Grok-1.5V is competitive with existing frontier multimodal models in a number of domains, ranging from multi-disciplinary reasoning to understanding documents, science diagrams, charts, screenshots, and photographs. 

According to xAI, it outperforms its peers on the company’s new RealWorldQA benchmark, which evaluates real-world spatial understanding.

RealWorldQA

xAI has introduced a new benchmark called RealWorldQA. This benchmark is designed to evaluate basic real-world spatial understanding capabilities of multimodal models. While many of the examples in the current benchmark are relatively easy for humans, they often pose a challenge for frontier models.

The initial release of RealWorldQA consists of over 700 images, each paired with a question and an easily verifiable answer. The dataset includes anonymised images taken from vehicles, in addition to other real-world images.

xAI stated that advancing both its multimodal understanding and its generation capabilities is an important step towards building beneficial AGI that can understand the universe. In the coming months, xAI anticipates making significant improvements in both capabilities across various modalities such as images, audio, and video.

With this release, Elon Musk appears intent on keeping his chatbot competitive with rivals such as OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude.

Ayushi Jain

Tech news writer by day, BGMI player by night. Combining my passion for tech and gaming to bring you the latest in both worlds.

Digit.in