artifacts

30 articles about artifacts in AI news

Claude Code Artifacts: How to Generate Shareable PR Walkthroughs and

Claude Code Artifacts (Team/Enterprise beta) turn session work into shareable, live web pages. Run `/login`, then prompt for dashboards, PR walkthroughs, or incident pages — zero infrastructure needed.

Jun 18, 2026 · 92% relevant

LeCun's Team Uncovers Hidden Transformer Flaws: How Architectural Artifacts Sabotage AI Efficiency

NYU researchers led by Yann LeCun reveal that Transformer language models contain systematic artifacts—massive activations and attention sinks—that degrade efficiency. These phenomena, stemming from architectural choices rather than fundamental properties, directly impact quantization, pruning, and memory management.

Mar 7, 2026 · 95% relevant

Visual-SDPO: Self-Distillation Fixes Code-Generated Visual Defects by +10 Points

Visual-SDPO uses visual-feedback self-distillation to improve code-generated visual artifacts by >10 points on ChartMimic, Design2Code, and AeSlides, with no added inference cost.

Jun 10, 2026 · 68% relevant

Claude's Cowork Adds Live Dashboards Connected to Apps & Files

Anthropic expanded its Claude Cowork collaborative workspace with live artifacts. Users can now create dashboards and trackers that pull live data from connected apps and files.

Apr 21, 2026 · 89% relevant

Omar Sar Uses Opus 4.7 Agent to Turn Podcasts into Self-Improving Wikis

AI researcher Omar Sar automated podcast consumption using an Opus 4.7 agent that extracts insights, generates analysis, and builds interactive HTML/JS artifacts. The system creates a self-improving knowledge wiki for agentic research workflows.

Apr 19, 2026 · 87% relevant

Omar Sarayra Builds LLM Artifact Generator for AI Knowledge Discovery

Omar Sarayra created a system that transforms dense LLM knowledge bases into consumable visual artifacts, such as a pulse view of Hacker News AI discussions. He argues this format could become a new medium for staying current.

Apr 19, 2026 · 87% relevant

Windsurf AI Adds 'Clone GitHub Repo' Feature for Code Context

Windsurf AI now lets users clone a GitHub repository directly within the tool. This allows its AI to answer questions about the repo and use its code snippets to assist in building new artifacts.

Apr 16, 2026 · 81% relevant

OmniSch Benchmark Exposes Major Gaps in LMMs for PCB Schematic Understanding

Researchers introduced OmniSch, a benchmark with 1,854 real PCB schematics, to evaluate LMMs on converting diagrams to netlist graphs. Results show current models have unreliable grounding, brittle parsing, and inconsistent connectivity reasoning for engineering artifacts.

Apr 2, 2026 · 76% relevant

Reproducibility Crisis in Graph-Based Recommender Systems Research: SIGIR 2022 Papers Under Scrutiny

A new study analyzing 10 graph-based recommender system papers from SIGIR 2022 finds widespread reproducibility issues, including data leakage, inconsistent artifacts, and questionable baseline comparisons. This calls into question the validity of reported state-of-the-art improvements.

Mar 30, 2026 · 84% relevant

WorldBench: Top MLLM Scores 64% on Visually Diverse Benchmark

WorldBench, a new multimodal benchmark, tests 15 MLLMs on visually diverse images. The top model scores 64.0%, exposing fundamental gaps in visual understanding.

Jun 8, 2026 · 92% relevant

MIT Paper Formalizes Self-Revising AI Scientists That Can Change Their Own Language

MIT paper 2606.01444 formalizes self-revising AI scientists that can change their conceptual schema. Novelty is defined by what could not be expressed in the previous framework.

Jun 6, 2026 · 87% relevant

LTX Studio Turns AI Video Clips Into Editable Scenes

LTX Studio, paired with LTX-2.3, lets users edit AI video scenes rather than just generate clips. This shifts AI video from demo to production tool.

Jun 5, 2026 · 75% relevant

Karpathy: Neural nets will become the host, CPUs the co-processor

Karpathy predicts neural networks will become the host OS, with CPUs as co-processors, rendering most classical app interfaces obsolete.

May 23, 2026 · 85% relevant

The Five-Step Loop: Spec-First Coding Agents Cut Drift by 10x

The five-step loop makes every coding agent step a persistent artifact. Skipping the spec causes compounding drift that's invisible until verification passes for the wrong feature.

May 17, 2026 · 92% relevant

Simple Graph Heuristic Beats Generative Recommenders on 10 of 14 Benchmarks

A no-training graph heuristic beats generative recommenders on 10 of 14 benchmarks, exposing shortcut-solvable datasets. Relative NDCG@10 gains hit 44% on Amazon CDs.

May 11, 2026 · 100% relevant

Detecting AI Images: Metadata Exposes Generators, No GPU Needed

AI image detection via metadata analysis exposes generators like Google's Gemini and Meta's Llama without GPU clusters, highlighting a simple but effective method.

May 10, 2026 · 75% relevant

Skills as Untrusted Code: A Security Precedent for Agent Runtimes

Paper argues agent skills are untrusted code until verified; runtimes must enforce verification gates to prevent supply-chain attacks, echoing decades of software security lessons.

May 5, 2026 · 100% relevant

Recursive Multi-Agent Systems Top Hugging Papers; Eywa Bridges LLMs and Scientific Models

Recursive Multi-Agent Systems leads Hugging Papers with 242 upvotes. Eywa and OneManCompany signal a move from chat-based to structural agent collaboration.

May 3, 2026 · 89% relevant

GPT-Image-2 Adds Self-Review Loop for Iterative Image Correction

A new capability in GPT-Image-2 allows the model to review and iteratively correct its own image generations, aiming for higher accuracy before final output.

Apr 21, 2026 · 85% relevant

AI-Powered PS4 Emulator 'Spine' Runs Bloodborne Locally on PC

A developer has released Spine, a PS4 emulator that uses AI techniques to run Bloodborne fully on PC. This represents a major step forward in console emulation, previously considered years away.

Apr 20, 2026 · 87% relevant

Claude Code Security Alert: Patch Now, Stop Using Authentication Helpers

A critical security leak reveals three command injection vulnerabilities in Claude Code. Users must update and stop using authentication helpers to prevent credential theft and supply chain attacks.

Apr 20, 2026 · 100% relevant

OpenAI Engineer Processed 210B Tokens, Sparking AI Efficiency Debate

An OpenAI engineer processed 210 billion tokens in one week, equivalent to 33 Wikipedia-sized datasets. This extreme usage spotlights a growing trend where high AI consumption by engineers leads to a 10x cost increase and a high volume of discarded code.

Apr 20, 2026 · 85% relevant

Google Gemini's UI Harness Lags Behind Claude, GPT, Analyst Says

AI researcher Ethan Mollick notes the Gemini Pro 3.1 model is technically capable but hampered by a minimal user interface and tool harness, widening its gap with competitors Claude and ChatGPT.

Apr 19, 2026 · 79% relevant

GPT-5.5 Generates Complex SVG in Single Prompt, User Reports

A developer shared that OpenAI's GPT-5.5 produced a sophisticated SVG image from a single prompt. This suggests improvements in the model's ability to generate precise, structured visual code.

Apr 19, 2026 · 85% relevant

FiMMIA Paper Exposes Broken MIA Benchmarks, Challenges Hessian Theory

A paper accepted at EACL 2026 shows membership inference attack (MIA) benchmarks suffer from data leakage, allowing model-free classifiers to achieve up to 99.9% AUC. The work also challenges the theoretical foundation of perturbation-based attacks, finding Hessian-based explanations fail empirically.

Apr 18, 2026 · 84% relevant

Claude Code Reverse-Engineered: 98.4% of Codebase is Operational Harness

A reverse-engineering analysis of Claude Code reveals only 1.6% of its codebase is AI decision logic, with the rest being operational infrastructure. This challenges current agent design paradigms by prioritizing a robust deterministic harness over complex model routing.

Apr 18, 2026 · 100% relevant

Ethan Mollick on AI's Impact: 'Everything Is Someone's Life Work' No Longer True

AI researcher Ethan Mollick notes the foundational assumption that 'everything around me is somebody's life work' is being invalidated by generative AI, signaling a profound shift in how we value human output.

Apr 18, 2026 · 85% relevant

Charm AI Appears to Be a Rebranded Grok 4.3 Beta

An AI community account identified that the newly surfaced 'Charm' model is likely a rebranded version of xAI's Grok 4.3 Beta. This suggests a potential test or leak of an unreleased model.

Apr 17, 2026 · 85% relevant

MiniMax Launches MaxHermes, Cloud-Hosted Agent with NousResearch

MiniMax has launched MaxHermes, a cloud-hosted version of the Hermes agent framework, in partnership with NousResearch. This provides a managed service for users of MiniMax's M2.7 model, aiming to simplify agent deployment.

Apr 16, 2026 · 85% relevant

Anthropic's Opus 4.7 Model Spotted on Google Vertex AI

A new, unannounced Claude model, Opus 4.7, has been listed on Google's Vertex AI platform. This suggests an imminent public release and highlights the ongoing strategic integration between Anthropic and Google Cloud.

Apr 16, 2026 · 97% relevant
