gentic.news — AI News Intelligence Platform
Excel Agent Showdown: ChatGPT Builds Working Strategy Game with 'Smart' Enemy, Claude Creates Board, Copilot Fails

When prompted to create a working strategy game in Excel with graphics, ChatGPT built a functional game with formulas and a 'smart' enemy AI, Claude created a board but acted as game master, and Microsoft Copilot failed to produce a game.

Mar 16, 2026 · 2 min read · 154 views · AI-Generated

What Happened

AI researcher Ethan Mollick conducted an informal test of three major AI coding assistants—Claude (Anthropic), ChatGPT (OpenAI), and Microsoft Copilot—by giving them the same prompt: "make me a working strategy game in excel, it should have some form of graphics."

The results revealed significant differences in how each AI agent approached the task:

  • ChatGPT successfully built a working strategy game with formulas and implemented a "smart" enemy AI opponent
  • Claude created a game board but didn't build a complete game, instead positioning itself as a game master that would respond to player moves
  • Microsoft Copilot created only a board with no functional game mechanics

Context

This test highlights the varying capabilities of current AI coding assistants when faced with complex, multi-step creative tasks that require both programming logic and visual design elements within a constrained environment like Microsoft Excel.

Excel represents a particularly challenging platform for game development due to its spreadsheet-based architecture, requiring creative use of formulas, conditional formatting, and potentially VBA (Visual Basic for Applications) to create interactive experiences.

The fact that ChatGPT implemented a "smart" enemy suggests it went beyond basic game mechanics to include opponent AI logic, which would require more sophisticated programming than simply creating a static game board.
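As an illustration only (Mollick's actual game and its formulas were not published), the kind of "smart" enemy behavior described here can be as simple as a greedy decision rule: each turn, the enemy evaluates its legal moves and picks the one that closes the distance to the player. A rule like this is straightforward to encode in worksheet formulas or a VBA macro; the hypothetical sketch below expresses it in Python for clarity.

```python
def enemy_move(enemy, player, grid_size=8):
    """Greedy opponent AI sketch: step the enemy one cell toward the player.

    `enemy` and `player` are (x, y) positions on a grid_size x grid_size board.
    """
    ex, ey = enemy
    px, py = player
    # Candidate moves: one step up, down, left, or right.
    candidates = [(ex + dx, ey + dy) for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
    # Discard moves that leave the board.
    legal = [(x, y) for x, y in candidates if 0 <= x < grid_size and 0 <= y < grid_size]
    # Pick the move with the smallest Manhattan distance to the player.
    return min(legal, key=lambda c: abs(c[0] - px) + abs(c[1] - py))

if __name__ == "__main__":
    print(enemy_move((0, 0), (3, 3)))  # enemy steps toward the player
```

Even a rule this small requires the model to manage game state (positions, board bounds) and implement a decision procedure, which is what separates a playable game from a static board.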

This informal comparison follows similar benchmarking efforts by researchers and developers testing AI capabilities across different domains, though this particular test appears to be more qualitative than quantitative, focusing on functional outcomes rather than standardized metrics.

Source: gentic.news

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala AYADI.

AI Analysis

This informal test reveals several important technical distinctions between current AI coding assistants.

ChatGPT's ability to create a functional game with enemy AI suggests stronger capabilities in multi-step reasoning and implementation of game logic within constrained environments. The "smart" enemy implementation implies the model understood not just how to create game mechanics, but how to program opponent behavior, a more complex task requiring state management and decision algorithms.

Claude's approach of creating a board and positioning itself as game master represents a different architectural choice, perhaps prioritizing interactive guidance over complete automation. This could reflect differences in training data or reinforcement learning preferences: Claude may be optimized for collaborative coding rather than end-to-end solution generation.

Microsoft Copilot's failure to produce a working game is notable given its deep integration with Microsoft's ecosystem. This suggests either limitations in its current capabilities for complex creative tasks, or differences in how it interprets and executes multi-step instructions compared with standalone models.

The test highlights that despite similar marketing positioning, these AI assistants have meaningful differences in their practical capabilities for non-standard programming tasks.
