Screen-level OS control · Open Source · #9 of 14 in category

GUI-Owl-1.5

Alibaba · Launched Feb 2026

Alibaba's Mobile-Agent-v3.5 GUI model; OSWorld-Verified 52.9% at 8B scale, AndroidWorld 71.6%, ScreenSpot-Pro 80.3% — SOTA among open GUI models.

Visit GUI-Owl-1.5 → · Open weights

Benchmarks scored: 3
Peak score: 80.3
Article mentions: 0
Open source: Yes

Benchmark performance

ScreenSpot-Pro

High-resolution GUI grounding in professional software (CAD, IDEs, creative suites); far harder than the consumer ScreenSpot benchmark.

80.3

★ SOTA

Benchmark docs →

AndroidWorld

Google's 116 hand-built tasks across 20 real Android apps on a live emulator. Mobile is harder than desktop for most agents.

71.6

Gap to SOTA: -15.5pp (held by Surfer 2) · Human: 80% · Benchmark docs →

OSWorld-Verified

369 real tasks on a live Ubuntu desktop VM: file I/O, spreadsheets, creative apps, settings. The July 2025 Verified rebuild moved to AWS (50x parallel) and fixed 300+ task bugs. The flagship computer-use benchmark.

52.9

Gap to SOTA: -30.5pp (held by Claude Opus 4.8) · Human: 72.4% · Benchmark docs →
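
The "Gap to SOTA" figures are differences in percentage points, so the leader's score can be recovered from each card. A minimal sketch, assuming the gap is defined as this model's score minus the leader's score (the page does not state the definition); the names `CARDS` and `implied_sota` are illustrative only.

```python
# Scores (percent) and gaps (percentage points) copied from the benchmark cards.
CARDS = {
    "AndroidWorld": {"score": 71.6, "gap_pp": -15.5},
    "OSWorld-Verified": {"score": 52.9, "gap_pp": -30.5},
}


def implied_sota(score: float, gap_pp: float) -> float:
    """Assumption: gap = score - SOTA, hence SOTA = score - gap."""
    return round(score - gap_pp, 1)


for name, card in CARDS.items():
    sota = implied_sota(card["score"], card["gap_pp"])
    print(f"{name}: implied SOTA {sota}")
```

Under that assumption the implied leading scores are 87.1 on AndroidWorld and 83.4 on OSWorld-Verified. ScreenSpot-Pro has no gap because GUI-Owl-1.5 is marked as SOTA there.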

Other screen-level OS control agents

The 14 agents in this category, ranked by peak benchmark score.

Agent	Maker	Launch	Peak	Pricing
Claude Sonnet 4.6	Anthropic	2026-02	1470.0	$3 / $15 per M tokens
Claude Opus 4.8	Anthropic	2026-05	88.6	$5 / $25 per M tokens
UI-TARS-2 (OSS)	ByteDance	2025-09	88.2	Open weights
Claude Opus 4.7	Anthropic	2026-03	87.6	$5 / $25 per M tokens
Claude Mythos Preview	Anthropic	2026-04	86.9	Research preview
Gemini 3.1 Pro	Google DeepMind	2026-02	85.9	Google API
GPT-5.5	OpenAI	2026-05	82.6	OpenAI API
Holo3-35B-A3B	H Company	2026-04	82.6	H Company
Holo3-122B-A10B	H Company	2026-04	78.8	H Company
GPT-5.4	OpenAI	2026-03	75.0	OpenAI API

Quick facts

Type: Screen-level OS control
Maker: Alibaba
Launch: 2026-02-01
Open source: Yes
Pricing: Open weights
Benchmarks scored: 3
Article mentions: 0
Rank in category: #9 of 14