The era of the AI “Copilot”, a helpful assistant that writes a few lines of code while you watch, might already be ending. In a new experiment published this week, the team behind the Cursor code editor revealed they have successfully scaled autonomous agents to handle massive, long-running projects, effectively creating an “AI software factory.”
The headline result? A swarm of AI agents wrote over 1 million lines of code to build a functional web browser from scratch in roughly a week. They also migrated the entire Cursor codebase from Solid.js to React and built a Windows 7 emulator. Here is the breakdown of how they moved from simple autocomplete to industrial-scale autonomous coding.
Until now, AI agents have been “sprinters.” They are excellent at fixing a bug or writing a script (tasks that take minutes), but they fail at “marathons” (tasks that take weeks).
Cursor’s experiment showed that you can’t just ask one super-smart model to “build a browser.” You need a corporate structure.
To solve the scaling problem, Cursor didn’t use a better model; they used a better org chart. They moved away from a flat structure (where every agent tries to do everything) to a strict hierarchy.
The Planners (the managers) are agents, powered by high-reasoning models like GPT-5.2, that act as project leads. They read the codebase, understand the high-level architecture, and break the project down into thousands of tiny, actionable tasks. They do not write code, which keeps their context window clean and focused purely on logic and strategy.
The Workers (the engineers) are stateless agents. They pick up a single task from the Planner’s queue (e.g., “Implement the tab closing logic in tabs.ts”), execute it, run the tests, and submit the work. Because they only care about one file at a time, they don’t get confused by the complexity of the wider project.
The Judges (QA) are a final layer of agents that review the output. If the tests pass and the code matches the spec, the work is merged. If not, it’s sent back into the loop; the sketch below ties the three roles together.
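Cursor hasn’t published its orchestration code, but the division of labor maps onto a simple queue-driven pipeline. Here is a minimal TypeScript sketch of the idea; every name in it (Task, planProject, runWorker, judge) is a hypothetical illustration, not Cursor’s actual implementation:

```typescript
// Minimal sketch of the Planner -> Worker -> Judge hierarchy.
// All types and function names here are hypothetical, not Cursor's code.

interface Task {
  id: number;
  file: string;        // the one file this task is scoped to
  description: string; // e.g. "Implement the tab closing logic in tabs.ts"
  attempts: number;
}

interface WorkResult {
  task: Task;
  patch: string;       // the code change the worker produced
  testsPassed: boolean;
}

// Planner: reads the goal, emits small file-scoped tasks, never writes code.
function planProject(goal: string): Task[] {
  console.log(`planning: ${goal}`); // a real planner would call a high-reasoning model here
  return [
    { id: 1, file: "tabs.ts", description: "Implement the tab closing logic", attempts: 0 },
    { id: 2, file: "history.ts", description: "Implement the back/forward stack", attempts: 0 },
  ];
}

// Worker: stateless; sees exactly one task, produces a patch, runs the tests.
async function runWorker(task: Task): Promise<WorkResult> {
  const patch = `// patch for ${task.file}: ${task.description}`;
  const testsPassed = Math.random() > 0.3; // stand-in for a real test run
  return { task, patch, testsPassed };
}

// Judge: merges passing work; failed work goes back into the queue,
// and stubborn failures get escalated rather than retried forever.
function judge(result: WorkResult, queue: Task[]): void {
  if (result.testsPassed) {
    console.log(`merged: ${result.task.file}`);
  } else if (result.task.attempts < 3) {
    queue.push(result.task); // retry with a fresh, stateless worker
  } else {
    console.log(`escalated to Planner: ${result.task.file}`);
  }
}

async function main(): Promise<void> {
  const queue = planProject("build a web browser");
  while (queue.length > 0) {
    const task = queue.shift()!;
    task.attempts++;
    judge(await runWorker(task), queue);
  }
}

main();
```

Keeping the Workers stateless is the key design choice: a failed task can simply be handed to a fresh worker, because no agent carries project-wide context between tasks.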
The experiment also highlighted a crucial divide in model capability. While earlier models like Claude Opus 4.5 were capable coders, they often struggled with the long-term planning required to manage a weeks-long project.
Cursor found that GPT-5.2 was essential for the “Planner” role. It demonstrated the ability to maintain a consistent “train of thought” over days, creating tasks that actually made sense for the workers, without hallucinating non-existent files or introducing circular dependencies.
The most surprising finding from the experiment was that quantity has a quality all its own. By throwing hundreds of parallel agents at the problem, Cursor generated code at a speed human teams couldn’t match. The 1 million lines weren’t perfect initially, but the sheer volume of iteration (write, test, fix, repeat) refined them rapidly.
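The report doesn’t include the scheduler itself, but the scale argument is easy to make concrete: hundreds of workers draining the task queue in parallel, each one looping write, test, fix until its task passes or gets escalated. A hypothetical TypeScript sketch, with the batch size and retry cap invented for illustration:

```typescript
// Hypothetical sketch of draining the task queue with many parallel workers,
// each looping write -> test -> fix until its task passes.

async function attemptTask(taskId: number): Promise<boolean> {
  // Stand-in for: generate a patch, run the test suite, return pass/fail.
  return Math.random() > 0.3;
}

async function writeTestFixLoop(taskId: number, maxRetries = 5): Promise<void> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    if (await attemptTask(taskId)) {
      console.log(`task ${taskId} passed on attempt ${attempt}`);
      return;
    }
    // On failure, a real agent would feed the test output back into the
    // model and regenerate the patch before retrying.
  }
  console.log(`task ${taskId} escalated after ${maxRetries} tries`);
}

async function drainQueue(taskIds: number[], concurrency = 100): Promise<void> {
  // Process the queue in batches of `concurrency` parallel workers.
  for (let i = 0; i < taskIds.length; i += concurrency) {
    const batch = taskIds.slice(i, i + concurrency);
    await Promise.all(batch.map((id) => writeTestFixLoop(id)));
  }
}

drainQueue(Array.from({ length: 1000 }, (_, i) => i));
```

Batching with Promise.all is the simplest way to cap concurrency; a production orchestrator would use a proper worker pool, but the economics are the same: throughput comes from width, not from any single agent being fast.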
Migrating a massive codebase, such as Cursor’s move from Solid.js to React, is usually a nightmare for developers. The agents treated it as a rote task, methodically translating file after file without getting bored or fatigued.
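A framework migration at this scale is essentially a loop over the file tree. The sketch below shows the shape of that loop; translateToReact and the directory path are stand-ins invented for illustration, not a real API:

```typescript
import { readdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Placeholder for an agent call that rewrites one Solid.js component as React.
async function translateToReact(source: string): Promise<string> {
  return source; // a real implementation would call a model here
}

// Walk the directory and rewrite each component file in place.
async function migrate(dir: string): Promise<void> {
  for (const name of readdirSync(dir)) {
    if (!name.endsWith(".tsx")) continue;
    const path = join(dir, name);
    const migrated = await translateToReact(readFileSync(path, "utf8"));
    writeFileSync(path, migrated);
    console.log(`migrated: ${name}`);
  }
}

migrate("./src/components");
```

A real pipeline would also run the test suite after every file and requeue failures, exactly like the Judge loop earlier; that per-file checkpoint is what lets a mechanical translation proceed across thousands of files without drift.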
This experiment suggests a future where software engineering looks more like management. Instead of typing syntax, senior engineers might soon spend their days reviewing the “Architecture Plans” generated by a Planner agent, approving the strategy, and then letting a swarm of Worker agents implement the 5,000 necessary changes overnight. The code factory is here, and it runs on GPT-5.2.