OpenAI has introduced GPT-5.3-Codex-Spark, a new AI model built specifically for real-time coding. The new model is currently rolling out as a research preview and is a smaller version of GPT-5.3-Codex. It can generate more than 1,000 tokens per second when running on low-latency hardware, allowing developers to see changes to their code almost immediately. The goal is to make AI-assisted programming feel like a live collaboration rather than a delayed response.
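To put that throughput figure in perspective, a quick back-of-envelope calculation (illustrative only; the function and constants here are assumptions, not part of any OpenAI API) shows why generation at this speed feels near-instant:

```python
TOKENS_PER_SECOND = 1_000  # throughput OpenAI claims for Codex-Spark on low-latency hardware


def stream_time_seconds(num_tokens: int, tps: int = TOKENS_PER_SECOND) -> float:
    """Time to stream num_tokens at a given tokens-per-second rate."""
    return num_tokens / tps


# A small, targeted edit of ~200 tokens streams in about 0.2 s,
# while a ~3,000-token file rewrite takes roughly 3 s.
print(stream_time_seconds(200))    # 0.2
print(stream_time_seconds(3000))   # 3.0
```

At these speeds, the bottleneck shifts from waiting on the model to the developer reading and reacting, which is what makes the "live collaboration" framing plausible.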
This release is also the first milestone in OpenAI’s partnership with Cerebras, which was announced in January. The Codex-Spark model runs on Cerebras’ Wafer Scale Engine 3, a purpose-built AI accelerator designed to handle extremely fast inference workloads.
Developers can collaborate with the model in real time, interrupting or redirecting it as it works, and iterate with near-instant responses. By default, the model makes small, targeted code edits and does not run tests unless instructed. OpenAI says it performs strongly on software engineering benchmarks while completing tasks significantly faster than its larger counterpart.
Codex-Spark is currently text-only, with a 128k context window, and is said to be the first in a family of ultra-fast models. ‘During the research preview, Codex-Spark will have its own rate limits and usage will not count towards standard rate limits. However, when demand is high, you may see limited access or temporary queuing as we balance reliability across users,’ OpenAI explains.
The model is rolling out to ChatGPT Pro users as a research preview through the Codex app, CLI, and VS Code extension.
OpenAI says Codex-Spark is the first step toward a future where AI coding tools combine fast, interactive assistance with longer-running autonomous problem-solving, allowing developers to switch seamlessly between quick edits and deeper tasks. ‘As we learn more with the developer community about where fast models shine for coding, we’ll introduce even more capabilities, including larger models, longer context lengths, and multimodal input,’ the AI firm added.