gentic.news — AI News Intelligence Platform
Excel Agent Showdown: ChatGPT Builds Working Strategy Game with 'Smart' Enemy, Claude Creates Board, Copilot Fails

When prompted to create a working strategy game in Excel with graphics, ChatGPT built a functional game with formulas and a 'smart' enemy AI, Claude created a board but acted as game master, and Microsoft Copilot failed to produce a game.

Mar 16, 2026 · 2 min read · 154 views · AI-Generated

What Happened

AI researcher Ethan Mollick conducted an informal test of three major AI coding assistants—Claude (Anthropic), ChatGPT (OpenAI), and Microsoft Copilot—by giving them the same prompt: "make me a working strategy game in excel, it should have some form of graphics."

The results revealed significant differences in how each AI agent approached the task:

  • ChatGPT successfully built a working strategy game with formulas and implemented a "smart" enemy AI opponent
  • Claude created a game board but didn't build a complete game, instead positioning itself as a game master that would respond to player moves
  • Microsoft Copilot created only a board with no functional game mechanics

Context

This test highlights the varying capabilities of current AI coding assistants when faced with complex, multi-step creative tasks that require both programming logic and visual design elements within a constrained environment like Microsoft Excel.

Excel represents a particularly challenging platform for game development due to its spreadsheet-based architecture, requiring creative use of formulas, conditional formatting, and potentially VBA (Visual Basic for Applications) to create interactive experiences.

The fact that ChatGPT implemented a "smart" enemy suggests it went beyond basic game mechanics to include opponent AI logic, which would require more sophisticated programming than simply creating a static game board.
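As an illustration only (Mollick's actual game and its formulas were not published), the kind of "smart" enemy behavior described here can be as simple as a greedy decision rule: each turn, the enemy evaluates its legal moves and picks the one that closes the distance to the player. A rule like this is straightforward to encode in worksheet formulas or a VBA macro; the hypothetical sketch below expresses it in Python for clarity.

```python
def enemy_move(enemy, player, grid_size=8):
    """Greedy opponent AI sketch: step the enemy one cell toward the player.

    `enemy` and `player` are (x, y) positions on a grid_size x grid_size board.
    """
    ex, ey = enemy
    px, py = player
    # Candidate moves: one step up, down, left, or right.
    candidates = [(ex + dx, ey + dy) for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
    # Discard moves that leave the board.
    legal = [(x, y) for x, y in candidates if 0 <= x < grid_size and 0 <= y < grid_size]
    # Pick the move with the smallest Manhattan distance to the player.
    return min(legal, key=lambda c: abs(c[0] - px) + abs(c[1] - py))

if __name__ == "__main__":
    print(enemy_move((0, 0), (3, 3)))  # enemy steps toward the player
```

Even a rule this small requires the model to manage game state (positions, board bounds) and implement a decision procedure, which is what separates a playable game from a static board.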

This informal comparison follows similar benchmarking efforts by researchers and developers testing AI capabilities across different domains, though this particular test appears to be more qualitative than quantitative, focusing on functional outcomes rather than standardized metrics.

Source: gentic.news

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala AYADI.

AI Analysis

This informal test reveals several important technical distinctions between current AI coding assistants.

ChatGPT's ability to create a functional game with enemy AI suggests stronger capabilities in multi-step reasoning and implementation of game logic within constrained environments. The "smart" enemy implementation implies the model understood not just how to create game mechanics, but how to program opponent behavior, a more complex task requiring state management and decision algorithms.

Claude's approach of creating a board and positioning itself as game master represents a different architectural choice, perhaps prioritizing interactive guidance over complete automation. This could reflect differences in training data or reinforcement learning preferences: Claude may be optimized for collaborative coding rather than end-to-end solution generation.

Microsoft Copilot's failure to produce a working game is notable given its deep integration with Microsoft's ecosystem. This suggests either limitations in its current capabilities for complex creative tasks, or differences in how it interprets and executes multi-step instructions compared with standalone models.

The test highlights that despite similar marketing positioning, these AI assistants have meaningful differences in their practical capabilities for non-standard programming tasks.
