red team

30 articles about red team in AI news

Decepticon Open-Sources Autonomous AI Red Team for Full Kill Chain

Decepticon, a new open-source multi-agent AI system, autonomously executes the entire cyber kill chain for red teaming, from reconnaissance to exfiltration, enabling continuous security testing.

Apr 27, 2026 · 82% relevant

Mythos AI Red Team Reports: A 6-9 Month Warning Window for CISOs

AI researcher Ethan Mollick highlights a critical gap: few large organizations treat AI red team reports from groups like Mythos as urgent threats, despite a historical 6-9 month diffusion window to malicious actors.

Apr 8, 2026 · 89% relevant

Swarm Plugin Enforces Consistent 9/10 Outputs from Claude Code Teams

The Swarm plugin for Claude Code creates a structured team of agents that review and score work before it reaches you, solving the problem of inconsistent output quality.

Apr 17, 2026 · 100% relevant

Garry Tan's gstack: Install This 56k-Star 'Virtual Team' for Claude Code

YC CEO Garry Tan open-sourced gstack, a pack of slash commands that turns Claude Code into a structured team of specialists, claiming it helps ship 10k-20k lines of code daily.

Mar 30, 2026 · 99% relevant

Google Quantum AI Team Reduces Bitcoin-Cracking Qubit Estimate to ~500k, Enabling 9-Minute Key Derivation

Google researchers have compiled Shor's algorithm to solve Bitcoin's 256-bit elliptic curve problem with ~1.2k logical qubits, translating to <500k physical qubits—a 20x reduction from 2023 estimates. This makes 'on-spend' attacks against unconfirmed transactions theoretically plausible with fast-clock quantum hardware.

Apr 1, 2026 · 95% relevant

RIFT-Bench Tests 45 Agentic Systems With Dynamic Red-Teaming

RIFT-Bench evaluates 45 agentic AI systems via a graph-driven two-phase pipeline, enabling unified security comparison across heterogeneous architectures.

Jun 24, 2026 · 85% relevant

GPT-Red: OpenAI's LLM Super-Hacker Finds 84% of Attacks, Humans 13%

OpenAI's GPT-Red LLM finds 84% of attacks versus 13% for humans, hardening GPT-5.6 Sol. Automated red-teaming shifts the safety paradigm.

Jul 15, 2026 · 91% relevant

OpenAI Launches ChatGPT Workspace Agents for Team Automation

OpenAI has introduced workspace agents within ChatGPT, powered by Codex, designed to automate complex, multi-step workflows for teams across shared environments like Slack. These agents can gather context, execute tasks, request approvals, and run continuously in the cloud.

Apr 22, 2026 · 97% relevant

Microsoft Fires Candy Crush AI Team After Years of Level-Design Tool Development

A developer claims Microsoft fired the AI team at King, the Candy Crush developer, after they spent years building tools to automate level design. This highlights the tension between long-term AI R&D and corporate cost-cutting.

Apr 20, 2026 · 85% relevant

Emergent AI Launches Work Stress Copilot, Integrates with Slack & Teams

Emergent AI has launched a new 'Work Stress Copilot' agent that integrates with Slack and Microsoft Teams to autonomously manage calendar scheduling, email triage, and meeting prep. The tool aims to directly reduce cognitive load by automating repetitive administrative work.

Apr 15, 2026 · 87% relevant

Replace Claude Code's Context-Stuffing with git-semantic for Team-Wide Semantic Search

A new tool, git-semantic, lets teams build and share a semantic search index of their codebase via Git, eliminating redundant API calls and enabling faster, more accurate Claude Code queries.

Apr 6, 2026 · 96% relevant

OpenAI Unbundles Codex API, Launches Metered Pilot with Usage-Based Pricing

OpenAI has unbundled its Codex code-generation model from ChatGPT Business, making it available as a standalone, usage-metered product. This allows teams to pilot Codex without purchasing full ChatGPT seats and ties costs directly to coding output.

Apr 2, 2026 · 87% relevant

AI-Powered 'Vibe-Coded' Companies Emerge as AI Collapses Traditional Staffing Models

Entrepreneur Matthew Gallagher used AI to automate core business functions—coding, marketing, support—allowing his company to scale without building a large managerial team. This demonstrates AI's current strength: drastically reducing coordination costs to enable solo or small teams to execute like corporations.

Apr 2, 2026 · 85% relevant

DeepMind Secretly Assembled ~20-Person Team to Train AI for High-Frequency Trading, Aiming at Renaissance

Demis Hassabis formed a covert ~20-researcher team within DeepMind to develop AI-powered high-frequency trading algorithms, reportedly targeting rival Renaissance Technologies. Google leadership disapproved, leading to the project's quiet termination.

Apr 1, 2026 · 95% relevant

Requestly Launches Git-Synced API Client to Replace Scattered Postman Setups

Requestly has launched an AI-powered API client that automatically syncs team collections through Git, eliminating stale docs and configuration drift. The tool directly targets the collaboration pain points of Postman and Insomnia users.

Mar 28, 2026 · 85% relevant

LeCun's Team Publishes LeWorldModel: A 15M-Parameter World Model That Mathematically Prevents Training Collapse

Yann LeCun's team has open-sourced LeWorldModel, a 15M-parameter world model that uses a novel SIGReg regularizer to make representation collapse mathematically impossible. It trains on a single GPU in hours and enables efficient physical prediction for robotics and autonomous systems.

Mar 27, 2026 · 95% relevant

OpenAI Shifts Sora Team to World-Model Research, Reportedly Cancels Video Model for Compute

A report claims OpenAI has redirected its Sora team to focus on world-model research for robotics and canceled the video model to free compute for a new, powerful LLM codenamed 'Spud.'

Mar 24, 2026 · 95% relevant

AI Product Teams: How Luxury Brands Can 10x Development Velocity with Autonomous Agents

A developer built a full deal intelligence platform in one week using two AI agents as team members. This structured approach—43 sprints, 6,800-line strategy—demonstrates how luxury brands can accelerate digital innovation with AI-powered product development.

Mar 7, 2026 · 65% relevant

The End of 'Who Has the Latest Version?': How AI-Powered Real-Time Collaboration is Transforming Development

AI-driven real-time shared workspaces are eliminating traditional development bottlenecks like version conflicts and sync errors. These platforms enable entire teams to work from a single, live state, fundamentally changing how developers collaborate.

Feb 26, 2026 · 85% relevant

Inside Anthropic: How Claude Code Engineers Ship with 2-Engineer Teams and

Anthropic ships software with 2-engineer teams, AI-driven code review, and Claude Code. A 500K+ line Bun rewrite to Rust took 11 days and $165K in tokens—proof that small teams + AI beat large teams.

Jul 28, 2026 · 90% relevant

Michaels Launches 'Ask Mike' AI-Powered Shopping Assistant Built on Google Cloud

Michaels launched 'Ask Mike,' an AI shopping assistant on Google Cloud using Gemini models. The tool helps customers find products and get project ideas, potentially reducing search friction in craft retail.

Jul 21, 2026 · 100% relevant

MITRE-Led Team Monolithically Integrates Piezo-Optomechanical Photonics

A MITRE-led team demonstrated the first monolithic CMOS platform for piezo-optomechanical photonics, achieving wafer-scale integration with 2.3x lower loss and 40% better bandwidth.

Jul 12, 2026 · 78% relevant

Reduce Compliance Violations 60% by Running Claude Code with OPA/Kyverno

Running Claude Code through OPA/Kyverno policies reduces compliance violations by 60%. This cloud-native approach cuts remediation from 3 days to 2 hours.

Jul 9, 2026 · 68% relevant

Agentic Commerce Needs Clean, Structured Data to Deliver ROI at Scale

Retail Dive reports that agentic commerce, with 4,700% YoY traffic growth, demands clean, structured data. Melissa's data quality assessment helps retailers identify weak spots for AI readiness.

Jul 6, 2026 · 82% relevant

This 4-Skill + 2-MCP 'Dev Team' Stack for Claude Code Beats 132-Agent

Install 4 skills (using-superpowers, writing-plans, subagent-driven-development, requesting-code-review) and 2 MCP servers to turn Claude Code into a parallel dev team without the noise of 132 agents.

Jul 5, 2026 · 69% relevant

BayesBench: LLMs Match Bayesian Posteriors But Fail Downstream Prediction

BayesBench tests 7 LLMs on multi-turn Bayesian reasoning. Scaling improves latent inference but not prediction, exposing a critical gap for agentic deployment.

Jul 1, 2026 · 89% relevant

OpenAI DeploymentSim predicts GPT-5 errors 92% of the time pre-launch

OpenAI's Deployment Simulation predicted GPT-5 errors with 92% accuracy using 1.3M real conversations, outperforming standard safety tests.

Jun 17, 2026 · 90% relevant

Clinical LLM Rejection Predictor Hits AUROC 0.719 in 4.5-Month Study

A clinical LLM rejection predictor achieves an AUROC of 0.719 in a 4.5-month study, using deployment-specific context to forecast user rejection before response generation.

Jun 12, 2026 · 72% relevant

og-local: The Local Privacy Proxy That Redacts Secrets Before They Reach

og-local is a local proxy that redacts PII/secrets from Claude Code API calls using an ONNX model. Install via curl, then run `ogl claude`. No cloud round-trip, no data leaks.

Jun 11, 2026 · 70% relevant

Claude Code's June 15 Agentic Credit Split: How to Avoid Hitting the $20 Wall

Claude Code's June 15 agentic credit split moves `claude -p` and CI workflows to a separate $20/month bucket on Pro. Upgrade to Max 5x or switch to direct API for production pipelines.

Jun 10, 2026 · 100% relevant
