A direct, user-side comparison of leading AI chatbots reveals a significant divergence in how they expose their internal reasoning processes. According to an analysis shared by researcher and professor Ethan Mollick, OpenAI's ChatGPT currently provides the most comprehensive and accessible view of its "thinking traces"—the step-by-step reasoning an AI model uses to arrive at an answer.
This feature is not a minor UI detail but a fundamental tool for developers, researchers, and power users who need to debug AI outputs, verify logic, or understand where a model's reasoning might have gone astray. The ability to audit an AI's chain-of-thought is critical for building trust and enabling reliable use in technical workflows.
What the Analysis Found
The comparison, based on hands-on use, breaks down the transparency features of three major platforms:
- ChatGPT (OpenAI): Offers a two-tiered system for viewing reasoning. A short, summarized list of key steps appears in the main chat window. For users who need deeper insight, a detailed sidebar provides a full, step-by-step audit of the model's calculations, code execution, and logical deductions.
- Claude (Anthropic): Performs "almost as well" in terms of overall reasoning capability but presents its process in a more summarized format. The analysis notes it is "harder to see calculations and code" specifically, making detailed technical auditing more challenging.
- Gemini (Google): Described as having a "big weak spot" in this area, lacking a comparable, user-accessible system for tracing the model's internal reasoning steps.
Why Thinking Traces Matter
For technical users, an AI's final answer is often less important than the process it used to get there. This is especially true for:
- Code Generation & Debugging: Seeing the step-by-step logic and code execution is essential for verifying correctness and identifying bugs in AI-generated programs.
- Mathematical & Logical Reasoning: Following the calculation trail allows users to spot errors in arithmetic or flawed logical leaps.
- Prompt Engineering & Optimization: Understanding how a model interprets and decomposes a complex prompt helps users refine their instructions for better results.
- Trust & Safety: Transparency into reasoning is a cornerstone of AI safety and alignment efforts, allowing for better scrutiny of model behavior.
The absence of this feature, as noted with Gemini, forces users to treat the AI as a black box, accepting or rejecting outputs without the ability to diagnose why a particular answer was generated.
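The value of an auditable trace can be shown with a toy example: if each intermediate step is recorded as data, a checker can pinpoint exactly where a calculation went wrong. This is a minimal illustration only; the trace format below is invented and does not model any vendor's actual output.

```python
# Toy illustration: auditing an arithmetic "thinking trace".
# Each step records an expression and the value the model claimed for it.
# (The trace format here is hypothetical; real chatbot traces are
# free-form text rendered in the UI.)

def audit_trace(steps):
    """Return the index of the first incorrect step, or None if all check out."""
    for i, (expression, claimed) in enumerate(steps):
        actual = eval(expression)  # acceptable for trusted, illustrative input
        if actual != claimed:
            return i
    return None

# A model "shows its work" for 17 * 24: decompose, multiply, add.
trace = [
    ("17 * 20", 340),
    ("17 * 4", 68),
    ("340 + 68", 418),  # wrong: should be 408 -- the audit catches it
]

print(audit_trace(trace))  # -> 2: the final addition is where the logic failed
```

Without the trace, a user only sees the wrong final answer (418) and has no way to tell whether the decomposition, the multiplication, or the addition was at fault.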
The Competitive Landscape for AI Transparency
This user observation highlights a growing point of differentiation beyond mere benchmark performance. While raw capability on tasks like MMLU or GPQA is crucial, the operational usability of models in real-world developer workflows is becoming equally important.
OpenAI's implementation suggests a focus on the practitioner experience, providing tools that bridge the gap between a conversational AI and a reliable reasoning engine. Anthropic's approach with Claude, while highly capable, appears to prioritize cleaner, more summarized outputs over granular process visibility. Google's position, as reported, indicates this aspect of the user experience may not yet be a primary focus for the Gemini interface.
gentic.news Analysis
This observation aligns with a broader, industry-wide push towards explainable AI (XAI) and transparency. For years, the dominant narrative in frontier model development has been scaling parameters and training compute. However, as these models are deployed into critical technical and business workflows, the demand for interpretability tools has surged. This isn't just about trust; it's about utility. A developer can't effectively integrate an AI coding assistant if they can't see its intermediate steps to fix a bug.
This development follows OpenAI's consistent pattern of iterating on developer-centric features. From the launch of the GPT Store and custom instructions to the recent memory feature, OpenAI has shown a focus on enhancing the practical usability of its models within sustained workflows. The thinking trace feature fits this pattern perfectly—it's a tool for power users. Conversely, this highlights a potential gap for Google. While Gemini Advanced has proven highly competitive on many benchmarks, as we covered in our analysis of its performance against ChatGPT, usability features like this can become significant differentiators that aren't captured in standard evaluations.
Looking forward, the visibility of reasoning traces will likely become a standard expectation for professional-tier AI tools. We anticipate similar features will be prioritized in upcoming updates from competitors, as the market shifts from competing solely on answer quality to competing on the entire developer experience and toolchain.
Frequently Asked Questions
What are "thinking traces" in AI?
Thinking traces, also referred to as chain-of-thought reasoning, are the step-by-step logical processes an AI model uses internally to solve a problem or answer a query. Instead of just presenting a final answer, some models can expose this internal monologue, showing how they broke down the problem, performed calculations, wrote and executed code, and reached their conclusion.
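The two-tiered presentation described earlier (a short step summary plus a detailed audit log) can be sketched in a few lines. All names and structures below are hypothetical, chosen purely to illustrate the idea; no real chatbot API is being modeled.

```python
# Hypothetical sketch: a solver that returns its answer together with
# a condensed step summary (chat view) and a full trace (sidebar view).

from dataclasses import dataclass, field

@dataclass
class TracedAnswer:
    answer: str
    summary: list = field(default_factory=list)  # short step list
    detail: list = field(default_factory=list)   # full step-by-step audit

def solve_percent(part, whole):
    """What percent of `whole` is `part`? Records its work as it goes."""
    result = TracedAnswer(answer="")
    result.summary.append("Divide part by whole, then scale to 100")
    ratio = part / whole
    result.detail.append(f"ratio = {part} / {whole} = {ratio}")
    pct = ratio * 100
    result.detail.append(f"percent = ratio * 100 = {pct}")
    result.answer = f"{pct:.1f}%"
    return result

r = solve_percent(45, 180)
print(r.answer)   # -> 25.0%
print(r.summary)  # the condensed view shown inline
print(r.detail)   # the detailed audit shown on demand
```

The design point is that the trace is produced alongside the answer rather than reconstructed afterwards, which is what makes it trustworthy as an audit record.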
Why is seeing an AI's thinking process important?
For technical users and developers, it's critical for debugging and validation. If an AI generates incorrect code or a flawed mathematical solution, seeing the trace allows you to pinpoint exactly where the logic failed. It transforms the AI from a black-box oracle into a debuggable system, increasing trust and enabling more reliable integration into software development, data analysis, and research workflows.
Can I access these thinking traces in the free version of ChatGPT?
The availability of advanced features like detailed thinking traces typically aligns with the model tier. The analysis referenced likely pertains to GPT-4o or GPT-4, which are available to paying subscribers via ChatGPT Plus. The free tier, which has historically relied on older models such as GPT-3.5, generally does not include these advanced reasoning transparency features.
Is this feature related to "OpenAI o1"?
The reasoning transparency described is a user interface feature for observing model outputs. It is conceptually related to but distinct from the OpenAI o1 model architecture, which was designed with enhanced reasoning capabilities at its core. The o1 models are built to perform more deliberate, chain-of-thought reasoning internally. The UI feature discussed here is the window that allows users to see that reasoning, whether it's coming from a standard GPT-4 model or a specialized reasoning model like o1.