A line graph titled 'Selective Attackers Cut Agent Safety by 28pp' shows a steep downward trend from 100% to 72%…

AI Research, Breakthrough

100

Selective Attackers Cut Agent Safety by 28pp, Paper Finds

Strategic attack timing cuts agentic AI safety by up to 28 percentage points, showing that current evaluations overestimate safety.

arxiv.org/1d ago/3 min read/Widely Reported

research paper, ai safety, agentic ai

A macOS Activity Monitor window showing CPU, memory, and disk usage graphs, with multiple running processes listed below

AI Research

95

MacArena: 421-Task macOS Benchmark Reveals 26% CUA Ranking Inversion

MacArena, a benchmark of 421 macOS tasks, reveals a 26% performance gap for top models on native tasks, suggesting that current computer-use agents (CUAs) overfit to Linux distributions.

arxiv.org/1d ago/3 min read/Widely Reported

computer-use agents, apple, reinforcement learning

Bar chart comparing weekly token growth of Chinese and US LLMs on OpenRouter, showing Chinese models surging past US…

AI Research

95

Chinese LLMs Surge on OpenRouter as U.S. AI Traffic Shifts

Chinese LLMs now drive most weekly token growth on OpenRouter, with American startups routing more traffic to them, per @rohanpaul_ai. The shift reflects utility over brand loyalty.

x.com/1d ago/3 min read/Multi-Source

open source, ai infrastructure, startups

Three memory chip stacks labeled HBM4 on a circuit board near an Nvidia Vera Rubin GPU, with logos for SK Hynix…

Products & Launches

92

Nvidia Qualifies HBM4 for Vera Rubin, SK Hynix Gets 60-70% Share

Nvidia qualified HBM4 from all three DRAM suppliers for Vera Rubin, with SK Hynix taking 60-70% share, clearing a key bottleneck for 2026 production.

x.com/1d ago/3 min read

hardware, memory, nvidia

AI Research

92

WorldBench: Top MLLM Scores 64% on Visually Diverse Benchmark

WorldBench, a new multimodal benchmark, tests 15 MLLMs on visually diverse images. The top model scores 64.0%, exposing fundamental gaps in visual understanding.

arxiv.org/1d ago/3 min read/Widely Reported

computer vision, benchmark, multimodal

A line chart comparing small, medium, and large AI models shows the large model retaining rare skills longer during…

AI Research

88

Larger models learn rare skills by forgetting them less, new paper shows

A new paper from Stanford, MIT, Harvard, and Anthropic shows that larger models learn rare skills because they forget them less during training. The finding was tested on OLMo models from 4M to 4B parameters.

x.com/1d ago/3 min read

anthropic, scaling laws, machine learning

AI agent interface displaying Ebola sequence retrieval results with low counts, indicating missed data and altered…

AI Research

87

Anthropic: AI agents fail biology retrieval, miss 261 Ebola sequences

Anthropic research shows Claude Sonnet 4 returning 5–106 Ebola sequences instead of 266, shifting the outbreak origin from 2014 to 1922. A repeatable retrieval tool fixes the variance.

x.com/23h ago/3 min read

agent reliability, retrieval-augmented generation, biology

Bar chart comparing AI models on the Artificial Analysis Intelligence Index, with MiniMax-M3 scoring 55 and leading…

AI Research

85

MiniMax-M3 Scores 55 on AI Index, Open-Source Lead Looms

MiniMax-M3 scored 55 on the Artificial Analysis Intelligence Index, positioning it to become the leading open-source model once its weights are released.

x.com/1d ago/3 min read

open source, benchmarks, ai models

iPhone 17 Pro centered on a desk, displaying a multimodal AI interface, highlighting the sparse, advanced AFM Core…

AI Research

79

Apple AFM Core Advanced: Sparse, Multimodal, iPhone 17 Pro Only

Apple AFM Core Advanced is sparse, multimodal, and exclusive to iPhone 17 Pro, M3+ Macs, and M4+ iPads, while AFM Core is a dense model for other devices.

x.com/1d ago/3 min read

hardware lock-in, apple, multimodal

Products & Launches

78

Apple Blames EU DMA for Blocking Siri AI on iOS in Europe

Apple blames the EU's DMA for blocking Siri AI on iPhone and iPad in Europe, citing privacy risks from the required access for rival AI assistants. There is no timeline for launch.

x.com/1d ago/3 min read

privacy, apple, ai

iPhone camera screen displays a coffee mug with visual intelligence overlay showing object identification and action…

Products & Launches

75

Apple’s New Siri in Camera Adds Visual Intelligence to iPhone

Apple previewed Siri in camera with visual intelligence, per a tweet. The feature competes with Google Lens and ChatGPT vision, but details remain scarce.

x.com/21h ago/3 min read

computer vision, apple, ai

Business executives in suits stand in a modern glass-walled office, reviewing data on large screens showing charts…

Products & Launches

75

Banks Signal AI Will Cut Junior Analyst Roles by Two-Thirds

Four major banks cut junior analyst classes by two-thirds, citing AI, but some cuts may mask prior overhiring.

x.com/1d ago/3 min read

finance, banking, labor

A smartphone screen shows the Apple Passwords app interface with a data breach alert and an auto-change password button

Products & Launches

75

Apple Passwords App Gains AI Agent for Breach Auto-Change

Apple Intelligence will auto-change breached passwords on OS 27. The agent runs in the Passwords app, eliminating manual credential rotation.

x.com/1d ago/3 min read

agentic ai, security, apple
