Microsoft’s medical AI system is four times more accurate than human doctors: Here’s how

Updated on 01-Jul-2025

HIGHLIGHTS

Microsoft’s MAI-DxO outperforms doctors in diagnosis, revolutionizing AI-powered healthcare with unmatched precision.

This AI system simulates a team of expert doctors to tackle rare and complex medical cases.

Microsoft’s multi-agent medical AI uses GPT, Claude, Gemini, and more to deliver accurate, fast diagnoses.

Microsoft’s new AI in health

On July 1, Doctors’ Day, a day when India and the world celebrate the selfless service of healthcare professionals, Microsoft unveiled a breakthrough that could redefine the future of diagnostics. Its new system, the Microsoft AI Diagnostic Orchestrator (MAI‑DxO), achieved 85.5% diagnostic accuracy in complex medical cases, more than four times the accuracy of experienced human doctors working under controlled test conditions.

While it’s not intended to replace clinicians, MAI‑DxO could become a powerful second opinion in hospitals and clinics, especially where access to specialist expertise is limited. Microsoft AI CEO Mustafa Suleyman calls it “a big step toward medical superintelligence.”

How It Works: A Virtual Panel of AI Doctors

At its core, MAI‑DxO simulates a collaborative team of digital physicians, each responsible for a different stage of the diagnostic process. Given a case, such as a patient with a high fever, shortness of breath, and fatigue, the system doesn’t just guess the diagnosis. Instead, it follows a step-by-step process similar to how human doctors reason through complex symptoms:

Symptom Analysis: Starts by parsing the initial complaints and patient history.
Dynamic Investigation: Orders relevant tests, asks follow-up questions, and refines its hypothesis.
Test Optimisation: Selects diagnostic procedures based on both accuracy and cost, avoiding redundant or expensive choices.
Multi-Model Orchestration: Unlike typical AI systems that rely on a single model, MAI‑DxO combines insights from multiple large language models including GPT-4, Claude, Gemini, Llama, Grok, and DeepSeek to simulate a multi-specialist panel.
Self-Verification: Before reaching a conclusion, the system verifies its logic, weighing medical costs and confirming consistency in reasoning.

This entire process is designed to mimic the rigor of a seasoned team of physicians, but compressed into seconds.
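
The article does not say how the panel’s opinions are merged internally. As a rough illustration only, the sketch below combines several models with plain majority voting and an agreement threshold. That is a stand-in technique, not Microsoft’s method, and the names `panel_diagnosis`, `min_agreement`, and the toy models are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a call to one language model: it takes the
# case findings and returns a single candidate diagnosis as a string.
Model = Callable[[List[str]], str]


def panel_diagnosis(
    findings: List[str],
    models: Dict[str, Model],
    min_agreement: float = 0.5,
) -> Tuple[Optional[str], float]:
    """Ask every model for a diagnosis and return the majority answer.

    Returns (diagnosis, agreement). The diagnosis is None when the share
    of models backing the top answer is below min_agreement, a crude
    stand-in for the self-verification step described above.
    """
    if not models:
        raise ValueError("at least one model is required")
    votes = Counter(model(findings) for model in models.values())
    diagnosis, count = votes.most_common(1)[0]
    agreement = count / len(models)
    if agreement < min_agreement:
        return None, agreement
    return diagnosis, agreement


# Toy models with fixed answers (assumption: real ones would be API calls).
toy_models: Dict[str, Model] = {
    "model_a": lambda findings: "pneumonia",
    "model_b": lambda findings: "pneumonia",
    "model_c": lambda findings: "influenza",
}

diagnosis, agreement = panel_diagnosis(
    ["high fever", "shortness of breath", "fatigue"], toy_models
)
print(diagnosis, round(agreement, 2))
```

Returning `None` when agreement is low gives a caller an obvious hook for ordering another test or asking a follow-up question instead of committing to an answer.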

Microsoft tested MAI‑DxO using 304 of the most complex diagnostic puzzles from the New England Journal of Medicine, a gold standard in medical literature. These cases are far more nuanced than typical textbook problems. In a direct comparison, MAI‑DxO achieved 85.5% diagnostic accuracy, while 21 experienced human doctors from the U.S. and U.K. averaged just 20%. The system also reduced diagnostic costs by an average of 20%, thanks to its cost-aware decision-making.
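
To put the headline figure in perspective, the short calculation below works out the ratio between the two reported accuracy rates. The per-case counts are a scale illustration only: they assume both rates apply to all 304 cases, which the article does not state.

```python
# Figures as reported in the article.
ai_accuracy_pct = 85.5      # MAI-DxO
doctor_accuracy_pct = 20.0  # average of the 21 physicians
total_cases = 304

# How many times more accurate the system was than the physicians.
ratio = ai_accuracy_pct / doctor_accuracy_pct

# Assumption: both rates are applied to all 304 cases, purely for scale.
ai_correct = ai_accuracy_pct / 100 * total_cases
doctor_correct = doctor_accuracy_pct / 100 * total_cases

print(f"accuracy ratio: {ratio:.3f}x")
print(f"approximate correct diagnoses: AI {ai_correct:.0f}, doctors {doctor_correct:.0f}")
```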

The benchmark, called SD-Bench (Sequential Diagnosis Benchmark), replicates real-world workflows better than traditional multiple-choice tests like the USMLE. It challenges both humans and machines to proceed step by step, with limited information and real-time decision-making.
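
A sequential setup like this can be pictured as a loop in which each step reveals one new piece of information and the diagnostician decides whether to commit. The sketch below is a toy version under heavy assumptions: the case data, `toy_decide`, and `run_sequential_case` are invented for illustration and do not reflect the benchmark’s actual interface.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Invented toy case: the result hidden behind each test (not real data).
HIDDEN_RESULTS: Dict[str, str] = {
    "blood culture": "no growth",
    "chest x-ray": "lobar consolidation",
}


def toy_decide(findings: List[str]) -> Optional[str]:
    """Hypothetical decision rule: commit only once a decisive finding appears."""
    if any("consolidation" in finding for finding in findings):
        return "pneumonia"
    return None


def run_sequential_case(
    initial_findings: List[str],
    planned_tests: List[str],
    decide: Callable[[List[str]], Optional[str]],
    max_steps: int = 3,
) -> Tuple[Optional[str], int]:
    """Reveal one test result per step until a diagnosis is made.

    Returns (diagnosis, steps_used). The diagnosis is None if no decision
    was reached within max_steps. A real agent would pick each test based
    on what it has learned; here the order is fixed for brevity.
    """
    findings = list(initial_findings)
    steps_used = 0
    for test in planned_tests[:max_steps]:
        steps_used += 1
        result = HIDDEN_RESULTS.get(test, "result unavailable")
        findings.append(f"{test}: {result}")
        diagnosis = decide(findings)
        if diagnosis is not None:
            return diagnosis, steps_used
    return None, steps_used


outcome = run_sequential_case(
    ["high fever", "shortness of breath", "fatigue"],
    ["blood culture", "chest x-ray"],
    toy_decide,
)
print(outcome)
```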

Broader Impact: Where This Could Go

MAI‑DxO is model-agnostic, meaning it can work with any LLM that meets medical reasoning standards. It uses a five-agent framework, with each AI persona simulating a distinct function, like differential diagnosis, test selection, and final review. It’s also designed to operate within configurable budget constraints, which could be game-changing in low-resource environments like rural clinics or public hospitals.
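
One simple way to picture a configurable budget is a greedy selection that favors tests offering the most diagnostic value per dollar. The sketch below is an assumption-laden illustration, not the system’s actual logic: `DiagnosticTest`, `select_tests`, and every price and usefulness score are invented.

```python
from typing import List, NamedTuple


class DiagnosticTest(NamedTuple):
    name: str
    cost: float        # invented price in dollars, must be positive
    info_value: float  # invented usefulness score between 0 and 1


def select_tests(candidates: List[DiagnosticTest], budget: float) -> List[DiagnosticTest]:
    """Greedily pick tests by value per dollar without exceeding the budget."""
    ranked = sorted(candidates, key=lambda t: t.info_value / t.cost, reverse=True)
    chosen: List[DiagnosticTest] = []
    spent = 0.0
    for test in ranked:
        if spent + test.cost <= budget:
            chosen.append(test)
            spent += test.cost
    return chosen


# Hypothetical candidate tests; the numbers are made up for the example.
candidates = [
    DiagnosticTest("CT scan", cost=800.0, info_value=0.8),
    DiagnosticTest("chest x-ray", cost=100.0, info_value=0.6),
    DiagnosticTest("complete blood count", cost=30.0, info_value=0.3),
]

selected = select_tests(candidates, budget=200.0)
print([test.name for test in selected])
```

A greedy ratio rule is easy to reason about but is not guaranteed to be optimal; a clinic with very tight limits might prefer an exact knapsack-style search over a short list of tests.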

While the system is still in the research phase, its potential applications are massive. Microsoft envisions MAI‑DxO assisting clinicians in hospitals, enhancing consumer health tools like Bing and Copilot, and integrating into documentation platforms like Dragon Copilot or radiology pipelines via RAD‑DINO. With healthcare waste in the U.S. alone estimated at over $1 trillion annually, MAI‑DxO could help reduce unnecessary testing and curb spiraling costs.

The Fine Print: What Needs Work

Despite its promise, MAI‑DxO hasn’t yet been tested in real-world clinical settings, where time pressure, incomplete records, and patient variability add layers of complexity. Critics also note that the doctors in the benchmark weren’t allowed access to Google or clinical tools, potentially widening the gap.

Moreover, questions around data bias, patient equity, and regulatory approval remain unanswered. Microsoft says clinical trials for broader ailments, not just rare or complex cases, are next, followed by safety testing and approvals.

On a day meant to celebrate doctors, Microsoft’s announcement is not a threat, but a tribute, highlighting how AI might one day become a tireless partner to clinicians everywhere. If MAI‑DxO holds up in the real world, it could revolutionise diagnosis, democratise specialist-level care, and support the very doctors it aims to emulate.

Vyom Ramani
