Screen-level OS control · #13 of 15 in category

OpenAI Computer Use Preview (CUA)

OpenAI · Launched Jan 2025

OpenAI's specialized Operator/CUA model, which scored 31.3% on OSWorld-Verified (max 50 steps). It has since been folded into ChatGPT Agent.

Pricing: ChatGPT Pro
Benchmarks scored: 1
Peak score: 31.3
Article mentions: 0
Open source: No

Benchmark performance

OSWorld-Verified

Real desktop workflows across browser, files, office apps. 369 tasks (361 without Google Drive). Human expert baseline 72.4%. Current SOTA: Holo3-35B-A3B (H Company) at 80.4%.

31.3

Gap to SOTA: -49.1 pp (held by Holo3-35B-A3B) · Human: 72.4%
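The gap figure follows directly from the scores quoted above. A minimal sketch of the arithmetic, assuming the gap is simply the agent's score minus the reference score in percentage points; the function name `gap_pp` and the constants are illustrative and not part of any benchmark tooling:

```python
def gap_pp(score: float, reference: float) -> float:
    """Signed gap in percentage points (negative means below the reference)."""
    return round(score - reference, 1)


# Figures taken from the OSWorld-Verified summary above.
CUA_SCORE = 31.3       # OpenAI Computer Use Preview, max 50 steps
SOTA_SCORE = 80.4      # Holo3-35B-A3B
HUMAN_BASELINE = 72.4  # human expert baseline

print(f"Gap to SOTA:  {gap_pp(CUA_SCORE, SOTA_SCORE):+.1f} pp")
print(f"Gap to human: {gap_pp(CUA_SCORE, HUMAN_BASELINE):+.1f} pp")
```

Rounding to one decimal keeps floating-point noise out of the reported value.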

Other screen-level OS control agents

The 15 agents in this category, ranked by peak benchmark score. Only the top 10 are listed here.

Agent	Maker	Launch	Peak	Pricing
Claude Sonnet 4.6	Anthropic	2026-02	1470.0	API: 3/15 per M tokens
Kimi K2.5 (OSS)	Moonshot AI	2026-01	1410.0	API pay-as-you-go
Claude Computer Use	Anthropic	2024-10	92.1	Claude API: input $5/M, output $25/M
Kimi K2.6 (OSS)	Moonshot AI	2026-04	89.6	API: 0.60/2.75 per M tokens
Holo3-35B-A3B	H Company	2026-04	80.4	H Company enterprise
Claude Sonnet 4.5	Anthropic	2025-09	62.9	Legacy Anthropic API
Seed-1.8	ByteDance Seed	2025-12	61.9	Doubao ecosystem
EvoCUA-20260105	Meituan LongCat	2026-01	56.7	Research
GUI-Owl-1.5 32B (OSS)	Alibaba Tongyi Lab	2026-03	55.4	Free (OSS)
DeepMiner-Mano-72B	Mininglamp Technology	2025-10	53.9	Research

Quick facts

Type: Screen-level OS control
Maker: OpenAI
Launch: 2025-01-23
Open source: No
Pricing: ChatGPT Pro
Benchmarks scored: 1
Article mentions: 0
Rank in category: #13 of 15