artifacts
30 articles about artifacts in AI news
Claude Code Artifacts: How to Generate Shareable PR Walkthroughs and
Claude Code Artifacts (Team/Enterprise beta) turn session work into shareable, live web pages. Run `/login`, then prompt for dashboards, PR walkthroughs, or incident pages — zero infrastructure needed.
LeCun's Team Uncovers Hidden Transformer Flaws: How Architectural Artifacts Sabotage AI Efficiency
NYU researchers led by Yann LeCun reveal that Transformer language models contain systematic artifacts—massive activations and attention sinks—that degrade efficiency. These phenomena, stemming from architectural choices rather than fundamental properties, directly impact quantization, pruning, and memory management.
Visual-SDPO: Self-Distillation Fixes Code-Generated Visual Defects by +10 Points
Visual-SDPO uses visual-feedback self-distillation to improve code-generated visual artifacts by >10 points on ChartMimic, Design2Code, and AeSlides, with no added inference cost.
Claude's Cowork Adds Live Dashboards Connected to Apps & Files
Anthropic expanded its Claude Cowork collaborative workspace with live artifacts. Users can now create dashboards and trackers that pull live data from connected apps and files.
Omar Sar Uses Opus 4.7 Agent to Turn Podcasts into Self-Improving Wikis
AI researcher Omar Sar automated podcast consumption using an Opus 4.7 agent that extracts insights, generates analysis, and builds interactive HTML/JS artifacts. The system creates a self-improving knowledge wiki for agentic research workflows.
Omar Sarayra Builds LLM Artifact Generator for AI Knowledge Discovery
Omar Sarayra created a system that transforms dense LLM knowledge bases into consumable visual artifacts, like a pulse on HN AI discussions. He argues this format could become a new medium for staying current.
Windsurf AI Adds 'Clone GitHub Repo' Feature for Code Context
Windsurf AI now lets users clone a GitHub repository directly within the tool. This allows its AI to answer questions about the repo and use its code snippets to assist in building new artifacts.
OmniSch Benchmark Exposes Major Gaps in LMMs for PCB Schematic Understanding
Researchers introduced OmniSch, a benchmark with 1,854 real PCB schematics, to evaluate LMMs on converting diagrams to netlist graphs. Results show current models have unreliable grounding, brittle parsing, and inconsistent connectivity reasoning for engineering artifacts.
Reproducibility Crisis in Graph-Based Recommender Systems Research: SIGIR 2022 Papers Under Scrutiny
A new study analyzing 10 graph-based recommender system papers from SIGIR 2022 finds widespread reproducibility issues, including data leakage, inconsistent artifacts, and questionable baseline comparisons. This calls into question the validity of reported state-of-the-art improvements.
WorldBench: Top MLLM Scores 64% on Visually Diverse Benchmark
WorldBench, a new multimodal benchmark, tests 15 MLLMs on visually diverse images. Top model scores 64.0%, exposing fundamental gaps in visual understanding.
MIT Paper Formalizes Self-Revising AI Scientists That Can Change Their Own Language
MIT paper 2606.01444 formalizes self-revising AI scientists that can change their conceptual schema. Novelty is defined by what could not be expressed in the previous framework.
LTX Studio Turns AI Video Clips Into Editable Scenes
LTX Studio + LTX-2.3 lets users edit AI video scenes, not just generate clips. This shifts AI video from demo to production tool.
Karpathy: Neural nets will become the host, CPUs the co-processor
Karpathy predicts neural networks will become the host OS, with CPUs as co-processors, rendering most classical app interfaces obsolete.
The Five-Step Loop: Spec-First Coding Agents Cut Drift by 10x
The five-step loop makes every coding agent step a persistent artifact. Skipping the spec causes compounding drift that's invisible until verification passes for the wrong feature.
Simple Graph Heuristic Beats Generative Recommenders on 10 of 14 Benchmarks
A no-training graph heuristic beats generative recommenders on 10 of 14 benchmarks, exposing shortcut-solvable datasets. Relative NDCG@10 gains hit 44% on Amazon CDs.
Detecting AI Images: Metadata Exposes Generators, No GPU Needed
AI image detection via metadata analysis exposes generators like Google's Gemini and Meta's Llama without GPU clusters, highlighting a simple but effective method.
Skills as Untrusted Code: A Security Precedent for Agent Runtimes
Paper argues agent skills are untrusted code until verified; runtimes must enforce verification gates to prevent supply-chain attacks, echoing decades of software security lessons.
Recursive Multi-Agent Systems Top Hugging Papers; Eywa Bridges LLMs and Scientific Models
Recursive Multi-Agent Systems leads Hugging Papers with 242 upvotes. Eywa and OneManCompany signal a move from chat-based to structural agent collaboration.
GPT-Image-2 Adds Self-Review Loop for Iterative Image Correction
A new capability in GPT-Image-2 allows the model to review and iteratively correct its own image generations, aiming for higher accuracy before final output.
AI-Powered PS4 Emulator 'Spine' Runs Bloodborne Locally on PC
A developer has released Spine, a PS4 emulator that uses AI techniques to run Bloodborne fully on PC. This is a major step forward in console emulation, a milestone previously considered years away.
Claude Code Security Alert: Patch Now, Stop Using Authentication Helpers
A critical security leak reveals three command injection vulnerabilities in Claude Code. Users must update and stop using authentication helpers to prevent credential theft and supply chain attacks.
OpenAI Engineer Processed 210B Tokens, Sparking AI Efficiency Debate
An OpenAI engineer processed 210 billion tokens in one week, equivalent to 33 Wikipedia-sized datasets. This extreme usage spotlights a growing trend where high AI consumption by engineers leads to a 10x cost increase and a high volume of discarded code.
Google Gemini's UI Harness Lags Behind Claude, GPT, Analyst Says
AI researcher Ethan Mollick notes the Gemini Pro 3.1 model is technically capable but hampered by a minimal user interface and tool harness, widening its gap with competitors Claude and ChatGPT.
GPT-5.5 Generates Complex SVG in Single Prompt, User Reports
A developer shared that OpenAI's GPT-5.5 produced a sophisticated SVG image from a single prompt. This suggests improvements in the model's ability to generate precise, structured visual code.
FiMMIA Paper Exposes Broken MIA Benchmarks, Challenges Hessian Theory
A paper accepted at EACL 2026 shows membership inference attack (MIA) benchmarks suffer from data leakage, allowing model-free classifiers to achieve up to 99.9% AUC. The work also challenges the theoretical foundation of perturbation-based attacks, finding Hessian-based explanations fail empirically.
Claude Code Reverse-Engineered: 98.4% of Codebase is Operational Harness
A reverse-engineering analysis of Claude Code reveals only 1.6% of its codebase is AI decision logic, with the rest being operational infrastructure. This challenges current agent design paradigms by prioritizing a robust deterministic harness over complex model routing.
Ethan Mollick on AI's Impact: 'Everything Is Someone's Life Work' No Longer True
AI researcher Ethan Mollick notes the foundational assumption that 'everything around me is somebody's life work' is being invalidated by generative AI, signaling a profound shift in how we value human output.
Charm AI Appears to Be a Rebranded Grok 4.3 Beta
An AI community account identified that the newly surfaced 'Charm' model is likely a rebranded version of xAI's Grok 4.3 Beta. This suggests a potential test or leak of an unreleased model.
MiniMax Launches MaxHermes, Cloud-Hosted Agent with NousResearch
MiniMax has launched MaxHermes, a cloud-hosted version of the Hermes agent framework, in partnership with NousResearch. This provides a managed service for users of MiniMax's M2.7 model, aiming to simplify agent deployment.
Anthropic's Opus 4.7 Model Spotted on Google Vertex AI
A new, unannounced Claude model, Opus 4.7, has been listed on Google's Vertex AI platform. This suggests an imminent public release and highlights the ongoing strategic integration between Anthropic and Google Cloud.