gpt 4

30 articles about gpt 4 in AI news

OpenAI Cuts GPT-5.6 Luna Price 80% to $0.20/M Tokens

OpenAI cut GPT-5.6 Luna prices 80% to $0.20/M input tokens, citing Sol-optimized kernels that cut serving costs 20%. Luna now undercuts Gemini Flash-Lite and Claude Haiku.

Jul 30, 202692% relevant

ChatGPT Nears 1B Weekly Active Users, Missed Internal Deadline

ChatGPT is nearing 1B weekly active users, 7 months after OpenAI's internal deadline. Still among fastest apps to reach milestone.

Jul 29, 202685% relevant

CAS ZhiJing Beats GPT-5.5 on Social Cognition with FLARE Training

CAS ICT releases ZhiJing social intelligence system, Zing model beats GPT-5.5 on social cognition via FLARE training.

Jul 27, 202688% relevant

Alibaba's RecGPT-V3 Boosts GMV 3.97%, Cuts Serving Cost 52.4% on Taobao

Alibaba's RecGPT-V3, a stateful hybrid-modal recommender with continual memory, boosts GMV by 3.97% and cuts serving costs by 52.4% on Taobao.

Jul 26, 202685% relevant

GPT-5.6 Sol Leads DeepSWE at 72.7%, Beating Opus 5's 68.8%

GPT-5.6 Sol scores 72.7% on DeepSWE, beating Opus 5's 68.8%. The undocumented benchmark tests autonomous SWE agents.

Jul 25, 2026100% relevant

Gemini 3.6 Flash Hits 83% on Computer Use, Beats GPT-5.6 and Grok

Gemini 3.6 Flash scored 83.0% on OSWorld-Verified, beating GPT-5.6 and Grok at $7.50 per million tokens, breaking the cheap-tier capability trade-off.

Jul 22, 202687% relevant

GPT-5.6 Sol on Cerebras Hits 750 Token/s

GPT-5.6 Sol on Cerebras claimed at 750 token/s, but no official data or model release exists. Unverified claim needs vendor confirmation.

Jul 18, 202697% relevant

GPT-Red: OpenAI's LLM Super-Hacker Finds 84% of Attacks, Humans 13%

OpenAI's GPT-Red LLM finds 84% of attacks vs 13% for humans, hardening GPT-5.6 Sol. Automated red-teaming shifts safety paradigm.

Jul 15, 202691% relevant

ChatGPT returns to WhatsApp in EU after Meta forced to open platform

OpenAI re-enabled ChatGPT on WhatsApp in the EEA after EU forced Meta to open its platform. Users reach GPT-5.5 via 1-800-CHATGPT.

Jul 14, 202693% relevant

OpenAI GPT-5.6 Sol, Terra, Luna Launch on Bedrock at Same Price

OpenAI's GPT-5.6 Sol, Terra, and Luna launch on Amazon Bedrock at matching first-party pricing. Sol scores 80 on Coding Agent Index.

Jul 13, 2026100% relevant

Boko Haram AI units use ChatGPT, Claude, Gemini for attack planning

Boko Haram uses ChatGPT, Claude, Gemini, and three other chatbots for attack planning. Cambridge study found safety filters failed.

Jul 11, 2026100% relevant

OpenAI GPT-5.6 Sol matches Fable 5 at 1/3 cost, adds multi-agent API

OpenAI's GPT-5.6 Sol nearly matches Claude Fable 5 on aggregate benchmarks at one-third the cost, with new multi-agent and tool-calling APIs.

Jul 10, 202695% relevant

OpenAI GPT-5.6 Launches Thursday After US Gov't Lifts Ban

OpenAI's GPT-5.6 Sol launches Thursday after US gov't lifts ban. It beats Claude Mythos 5 on benchmarks at half the cost.

Jul 8, 2026100% relevant

GPT-4 Held Top Spot 52 Weeks; Today's Models Last 7

GPT-4 dominated the ECI for a year. Today's top models last 7 weeks median, with 17 leadership changes since Feb 2024.

Jul 6, 202684% relevant

GPT-4 Held ECI Lead for 18 Months, Epoch AI Data Shows

GPT-4 led the ECI for 18 months, the longest reign. GPT-4o and Claude 3.5 Sonnet broke the streak in September 2024.

Jul 2, 202693% relevant

MirrorCode Rebuilds Programs from Behavior Alone, Beats GPT-4o by 37%

Epoch AI's MirrorCode reconstructs programs from I/O behavior alone, scoring 67.3% on SWE-bench—37% above GPT-4o—without source code or traces.

Jun 28, 2026100% relevant

PlanBench-XL: GPT-5.4 Scores 11.36% on Hard Tool-Use Tasks

PlanBench-XL shows GPT-5.4 drops from 51.90% to 11.36% accuracy on long-horizon tool-use tasks with 1,665 tools, revealing a fundamental planning weakness.

Jun 28, 202690% relevant

GPT-5.6 Sol, Terra, Luna: Benchmark Performance Depends on Which Test You Use

OpenAI released GPT-5.6 as three tiers—Sol, Terra, Luna—on June 27, 2026. Sol tops Terminal-Bench 2.1 but trails competitors on other benchmarks. The release shifts focus to tiered pricing and efficiency, but access remains restricted.

Jun 28, 202676% relevant

NanoEuler: GPT-2-Scale 116M Model Built in Pure C/CUDA From Scratch

NanoEuler is a 116M-parameter GPT-2-scale model built in pure C/CUDA from scratch. It provides a complete educational training pipeline for understanding LLMs at the lowest level.

Jun 28, 202675% relevant

OpenAI Launches GPT-5.6 Sol Under US Government Restrictions

OpenAI's GPT-5.6 Sol beats Claude Mythos 5 in agentic coding (88.8% vs 88%) but US government restricts access to select partners, a policy OpenAI calls unsustainable.

Jun 26, 2026100% relevant

White House Orders OpenAI to Gate GPT-5.6 Release per Customer

White House orders OpenAI to gate GPT-5.6 release per customer, mirroring Anthropic's voluntary suspension of Claude Mythos under regulatory pressure.

Jun 25, 2026100% relevant

Gemini 3.5 Flash Scores 78.4 on OSWorld, Matching GPT-5.5

Google integrated Computer Use into Gemini 3.5 Flash, scoring 78.4 on OSWorld — matching GPT-5.5 and undercutting on cost.

Jun 25, 2026100% relevant

OpenAI GPT-5.5-Cyber Beats Anthropic Mythos on Security Benchmarks

OpenAI's GPT-5.5-Cyber beats Anthropic's Mythos on security benchmarks. Updated Codex plugin auto-patches after scanning 30M commits.

Jun 23, 2026100% relevant

Cursor Trains GPT-Size Model with 10-20x Compute

Cursor trained a GPT-size model from scratch with 10-20x more compute, announced at Compile. The move shifts from fine-tuning to pretraining for code generation.

Jun 21, 202691% relevant

OpenAI Says GPT-5.5 Instant Beats Doctors on Health Accuracy — But It Designed the Test

OpenAI's GPT-5.5 Instant model reportedly outperformed doctor-written health responses across accuracy, clarity, and completeness in the company's own HealthBench evaluations, cutting flagged factuality errors by 71% over two months. The catch: OpenAI built the benchmark, organized the physician pan

Jun 18, 202698% relevant

Shopify opens AI sales channels to all merchants as ChatGPT drives 20% of Walmart referral traffic

Shopify launched its Spring '26 Edition on June 17, expanding agentic storefronts to millions of merchants and opening its catalog infrastructure to brands on any platform. The release arrives as AI referral traffic to U.S. retailers jumped 393% in Q1 2026, with ChatGPT now accounting for 20% of Wal

Jun 18, 202697% relevant

OpenAI DeploymentSim predicts GPT-5 errors 92% of the time pre-launch

OpenAI's Deployment Simulation predicted GPT-5 errors with 92% accuracy using 1.3M real conversations, outperforming standard safety tests.

Jun 17, 202690% relevant

ChatGPT Market Share Dips Below 50% for First Time, Sensor Tower Reports

ChatGPT's market share fell to 46.4% in May 2026 as Gemini and Claude gained ground. Users switch based on values and integration, and AI app spending is on pace to hit $4.2B in H1 2026.

Jun 16, 202695% relevant

MA-ProofBench: GPT-5.5 Hits 16% on Math Analysis, Most Models Near 0%

MA-ProofBench, a new theorem-proving benchmark for mathematical analysis, shows GPT-5.5 achieving 16% on undergraduate problems and 5% on PhD-level, with most models near 0% on the harder set.

Jun 15, 202682% relevant

Google Gemini-SQL2 Hits 80.04% on BIRD, Beating GPT-5.5 by 7 Points

Google's Gemini-SQL2 hits 80.04% on BIRD, beating GPT-5.5 by 7 points and Claude Opus 4.6 by 9 points, with no public release or paper yet.

Jun 13, 202695% relevant

Explore More

AI Agents Large Language Models Claude Code OpenAI RAG MCP Fine-tuning Benchmarks Open Source AI AI Safety