GPT-5

30 articles about GPT-5 in AI news

AI's Time Horizon Expands: Claude and GPT Push Multi-Hour Task Capabilities

New analysis reveals Claude Opus 4.6 and GPT 5.3 Codex can handle complex tasks requiring hours of human effort. The METR benchmark shows AI systems approaching 3-4 hour time horizons at 50% success rates, signaling major progress in sustained reasoning.

Feb 16, 2026 · 72% relevant

GPT-5.6 Sol on Cerebras Hits 750 Tokens/s

GPT-5.6 Sol on Cerebras is claimed to reach 750 tokens/s, but no official data or model release exists. The unverified claim needs vendor confirmation.

Jul 18, 2026 · 93% relevant

GPT-Red: OpenAI's LLM Super-Hacker Finds 84% of Attacks, Humans 13%

OpenAI's GPT-Red LLM finds 84% of attacks vs 13% for humans, hardening GPT-5.6 Sol. Automated red-teaming shifts safety paradigm.

Jul 15, 2026 · 91% relevant

ChatGPT returns to WhatsApp in EU after Meta forced to open platform

OpenAI re-enabled ChatGPT on WhatsApp in the EEA after EU forced Meta to open its platform. Users reach GPT-5.5 via 1-800-CHATGPT.

Jul 14, 2026 · 89% relevant

OpenAI GPT-5.6 Sol, Terra, Luna Launch on Bedrock at Same Price

OpenAI's GPT-5.6 Sol, Terra, and Luna launch on Amazon Bedrock at matching first-party pricing. Sol scores 80 on Coding Agent Index.

Jul 13, 2026 · 100% relevant

Boko Haram AI units use ChatGPT, Claude, Gemini for attack planning

Boko Haram uses ChatGPT, Claude, Gemini, and three other chatbots for attack planning. Cambridge study found safety filters failed.

Jul 11, 2026 · 100% relevant

OpenAI GPT-5.6 Sol matches Fable 5 at 1/3 cost, adds multi-agent API

OpenAI's GPT-5.6 Sol nearly matches Claude Fable 5 on aggregate benchmarks at one-third the cost, with new multi-agent and tool-calling APIs.

Jul 10, 2026 · 95% relevant

OpenAI GPT-5.6 Launches Thursday After US Gov't Lifts Ban

OpenAI's GPT-5.6 Sol launches Thursday after US gov't lifts ban. It beats Claude Mythos 5 on benchmarks at half the cost.

Jul 8, 2026 · 100% relevant

GPT-4 Held Top Spot 52 Weeks; Today's Models Last 7

GPT-4 dominated the ECI for a year. Today's top models last 7 weeks median, with 17 leadership changes since Feb 2024.

Jul 6, 2026 · 84% relevant

GPT-4 Held ECI Lead for 18 Months, Epoch AI Data Shows

GPT-4 led the ECI for 18 months, the longest reign. GPT-4o and Claude 3.5 Sonnet broke the streak in September 2024.

Jul 2, 2026 · 93% relevant

MirrorCode Rebuilds Programs from Behavior Alone, Beats GPT-4o by 37%

Epoch AI's MirrorCode reconstructs programs from I/O behavior alone, scoring 67.3% on SWE-bench—37% above GPT-4o—without source code or traces.

Jun 28, 2026 · 100% relevant

PlanBench-XL: GPT-5.4 Scores 11.36% on Hard Tool-Use Tasks

PlanBench-XL shows GPT-5.4 drops from 51.90% to 11.36% accuracy on long-horizon tool-use tasks with 1,665 tools, revealing a fundamental planning weakness.

Jun 28, 2026 · 90% relevant
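
The PlanBench-XL drop quoted above can be read either in percentage points or as a relative decline. The short sketch below computes both from the two accuracies given in the summary; the function name `accuracy_drop` is purely illustrative and is not taken from the benchmark.

```python
def accuracy_drop(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    points = before - after
    relative = points / before * 100
    return points, relative


# Accuracies as reported in the summary: 51.90% falling to 11.36%.
points, relative = accuracy_drop(51.90, 11.36)
print(f"{points:.2f} points, {relative:.1f}% relative decline")
```

On these figures the fall is about 40.5 percentage points, or roughly a 78% relative decline.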

GPT-5.6 Sol, Terra, Luna: Benchmark Performance Depends on Which Test You Use

OpenAI released GPT-5.6 as three tiers—Sol, Terra, Luna—on June 27, 2026. Sol tops Terminal-Bench 2.1 but trails competitors on other benchmarks. The release shifts focus to tiered pricing and efficiency, but access remains restricted.

Jun 28, 2026 · 76% relevant

NanoEuler: GPT-2-Scale 116M Model Built in Pure C/CUDA From Scratch

NanoEuler is a 116M-parameter GPT-2-scale model built in pure C/CUDA from scratch. It provides a complete educational training pipeline for understanding LLMs at the lowest level.

Jun 28, 2026 · 75% relevant

OpenAI Launches GPT-5.6 Sol Under US Government Restrictions

OpenAI's GPT-5.6 Sol beats Claude Mythos 5 in agentic coding (88.8% vs 88%) but US government restricts access to select partners, a policy OpenAI calls unsustainable.

Jun 26, 2026 · 100% relevant

White House Orders OpenAI to Gate GPT-5.6 Release per Customer

White House orders OpenAI to gate GPT-5.6 release per customer, mirroring Anthropic's voluntary suspension of Claude Mythos under regulatory pressure.

Jun 25, 2026 · 100% relevant

Gemini 3.5 Flash Scores 78.4 on OSWorld, Matching GPT-5.5

Google integrated Computer Use into Gemini 3.5 Flash, scoring 78.4 on OSWorld — matching GPT-5.5 and undercutting on cost.

Jun 25, 2026 · 100% relevant

OpenAI GPT-5.5-Cyber Beats Anthropic Mythos on Security Benchmarks

OpenAI's GPT-5.5-Cyber beats Anthropic's Mythos on security benchmarks. Updated Codex plugin auto-patches after scanning 30M commits.

Jun 23, 2026 · 100% relevant

Cursor Trains GPT-Size Model with 10-20x Compute

Cursor trained a GPT-size model from scratch with 10-20x more compute, announced at Compile. The move shifts from fine-tuning to pretraining for code generation.

Jun 21, 2026 · 91% relevant

OpenAI Says GPT-5.5 Instant Beats Doctors on Health Accuracy — But It Designed the Test

OpenAI's GPT-5.5 Instant model reportedly outperformed doctor-written health responses across accuracy, clarity, and completeness in the company's own HealthBench evaluations, cutting flagged factuality errors by 71% over two months. The catch: OpenAI built the benchmark, organized the physician pan

Jun 18, 2026 · 98% relevant

Shopify opens AI sales channels to all merchants as ChatGPT drives 20% of Walmart referral traffic

Shopify launched its Spring '26 Edition on June 17, expanding agentic storefronts to millions of merchants and opening its catalog infrastructure to brands on any platform. The release arrives as AI referral traffic to U.S. retailers jumped 393% in Q1 2026, with ChatGPT now accounting for 20% of Wal

Jun 18, 2026 · 97% relevant

OpenAI DeploymentSim predicts GPT-5 errors 92% of the time pre-launch

OpenAI's Deployment Simulation predicted GPT-5 errors with 92% accuracy using 1.3M real conversations, outperforming standard safety tests.

Jun 17, 2026 · 90% relevant

ChatGPT Market Share Dips Below 50% for First Time, Sensor Tower Reports

ChatGPT's market share fell to 46.4% in May 2026 as Gemini and Claude gained ground. Users switch based on values and integration, and AI app spending is on pace to hit $4.2B in H1 2026.

Jun 16, 2026 · 95% relevant

MA-ProofBench: GPT-5.5 Hits 16% on Math Analysis, Most Models Near 0%

MA-ProofBench, a new theorem-proving benchmark for mathematical analysis, shows GPT-5.5 achieving 16% on undergraduate problems and 5% on PhD-level, with most models near 0% on the harder set.

Jun 15, 2026 · 82% relevant

Google Gemini-SQL2 Hits 80.04% on BIRD, Beating GPT-5.5 by 7 Points

Google's Gemini-SQL2 hits 80.04% on BIRD, beating GPT-5.5 by 7 points and Claude Opus 4.6 by 9 points, with no public release or paper yet.

Jun 13, 2026 · 95% relevant

Chinese Lab's Free MoE Model Matches GPT-5.5 on Agentic Coding

A Chinese lab released an Apache-2.0 open-weights MoE model matching GPT-5.5 on agentic coding. This free model challenges proprietary AI's lead with sparse MoE architecture.

Jun 12, 2026 · 100% relevant

Visa ChatGPT Integration Enables AI Agent Retail Purchasing

Visa integrated with ChatGPT to let AI agents autonomously purchase retail goods. This enables conversational commerce where users delegate shopping to AI, with Visa handling secure payments.

Jun 11, 2026 · 96% relevant

OpenAI's ChatGPT 'Dreaming' Memory Retains Preferences Across Sessions

OpenAI launched a dreaming memory system for ChatGPT that retains user preferences across conversations by compressing and replaying session data, enabling persistent personalization.

Jun 5, 2026 · 100% relevant

Nemotron 3 Ultra matches GPT-5.5 on physics test at 10X lower cost

Nemotron 3 Ultra matched GPT-5.5 on a physics test at 10X lower cost ($0.051 vs $0.57), highlighting MoE efficiency.

Jun 5, 2026 · 85% relevant
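
The "10X lower cost" figure in the Nemotron 3 Ultra item can be checked against the two dollar amounts quoted in its summary. The sketch below is only that arithmetic; the summary does not state the unit behind the costs (per task, per run, or otherwise), so the ratio is the only thing computed, and the helper name `cost_ratio` is hypothetical.

```python
def cost_ratio(baseline_cost: float, cheaper_cost: float) -> float:
    """How many times more expensive the baseline is than the cheaper option."""
    return baseline_cost / cheaper_cost


# Dollar figures as quoted in the summary: $0.57 (GPT-5.5) vs $0.051 (Nemotron 3 Ultra).
ratio = cost_ratio(0.57, 0.051)
print(f"{ratio:.1f}x")
```

The quoted costs work out to a ratio of roughly 11x, consistent with the rounded "10X" in the headline.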

OpenAI Merges Codex into ChatGPT, Ending Standalone API

OpenAI merges Codex into ChatGPT, discontinuing the standalone API. Developers must now use the chat interface for code generation.

Jun 2, 2026 · 87% relevant
