gentic.news — AI News Intelligence Platform
Screen-level OS control · Open Source · #13 of 14 in category

Qwen3-VL-235B

Alibaba · Launched Jan 2026

Alibaba's large vision-language mixture-of-experts (MoE) model; among the strongest open-source models on OSWorld-Verified (66.7% with an agent scaffold).

Benchmarks scored: 1
Peak score: 66.7
Article mentions: 0
Open source: Yes

Benchmark performance

OSWorld-Verified

369 real tasks on a live Ubuntu desktop VM, covering file I/O, spreadsheets, creative apps, and system settings. The July 2025 Verified rebuild moved evaluation to AWS (50x parallelism) and fixed more than 300 task bugs. It is the flagship computer-use benchmark.

Score: 66.7%
Gap to SOTA: -16.7 pp (SOTA held by Claude Opus 4.8)
Human baseline: 72.4%
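The gap figures above are plain percentage-point differences. As a minimal sketch, assuming "gap" means this model's score minus the reference score, the snippet below backs out the SOTA score implied by the reported -16.7 pp gap and computes the gap to the human baseline. The helper name `gap_pp` is hypothetical, and the implied SOTA value is derived arithmetically, not stated on this page.

```python
# Hypothetical helper: percentage-point gap between a score and a reference.
def gap_pp(score: float, reference: float) -> float:
    """Return score - reference in percentage points, rounded to one decimal."""
    return round(score - reference, 1)


qwen_score = 66.7    # Qwen3-VL-235B on OSWorld-Verified (from this page)
human = 72.4         # human baseline (from this page)
reported_gap = -16.7 # reported gap to SOTA (from this page)

# Assumption: gap = score - SOTA, so SOTA = score - gap.
implied_sota = round(qwen_score - reported_gap, 1)

print(f"Implied SOTA: {implied_sota}")
print(f"Gap to human: {gap_pp(qwen_score, human):+.1f} pp")
print(f"Gap to SOTA (check): {gap_pp(qwen_score, implied_sota):+.1f} pp")
```

Rounding to one decimal keeps floating-point noise (for example, 66.7 - 72.4 evaluating to -5.700000000000003) out of the displayed figures.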

Other screen-level OS control agents

Agents in this category (14 in total; the top 10 are shown), ranked by peak benchmark score.

| Agent | Maker | Launch | Peak | Pricing |
|---|---|---|---|---|
| Claude Sonnet 4.6 | Anthropic | 2026-02 | 1470.0 | $3 / $15 per M tokens |
| Claude Opus 4.8 | Anthropic | 2026-05 | 88.6 | $5 / $25 per M tokens |
| UI-TARS-2 (OSS) | ByteDance | 2025-09 | 88.2 | Open weights |
| Claude Opus 4.7 | Anthropic | 2026-03 | 87.6 | $5 / $25 per M tokens |
| Claude Mythos Preview | Anthropic | 2026-04 | 86.9 | Research preview |
| Gemini 3.1 Pro | Google DeepMind | 2026-02 | 85.9 | Google API |
| GPT-5.5 | OpenAI | 2026-05 | 82.6 | OpenAI API |
| Holo3-35B-A3B | H Company | 2026-04 | 82.6 | H Company |
| GUI-Owl-1.5 (OSS) | Alibaba | 2026-02 | 80.3 | Open weights |
| Holo3-122B-A10B | H Company | 2026-04 | 78.8 | H Company |

Quick facts

Type: Screen-level OS control
Maker: Alibaba
Launch: 2026-01-01
Open source: Yes
Pricing: Open weights
Benchmarks scored: 1
Article mentions: 0
Rank in category: #13 of 14