excel

30 articles about excel in AI news

Alt-X Launches as AI-Powered, Traceable Financial Model Builder for Excel

Alt-X launches as an AI tool that automatically builds traceable financial models in Excel from documents like OMs and 10-Ks. It promises linked numbers, user control, and no hallucinations.

Mar 19, 2026 · 85% relevant

Excel Agent Showdown: ChatGPT Builds Working Strategy Game with 'Smart' Enemy, Claude Creates Board, Copilot Fails

When prompted to create a working strategy game in Excel with graphics, ChatGPT built a functional game with formulas and a 'smart' enemy AI, Claude created a board but acted as game master, and Microsoft Copilot failed to produce a game.

Mar 16, 2026 · 85% relevant

The Fine-Grained Vision Gap: Why VLMs Excel at Conversation But Fail at Classification

New research reveals vision-language models struggle with fine-grained visual classification despite excelling at complex reasoning tasks. The study identifies architectural and training factors creating this disconnect, with implications for AI development.

Feb 23, 2026 · 70% relevant

Microsoft's Phi-4-Vision: A Compact AI Model That Excels at Math, Science, and Understanding Interfaces

Microsoft has released Phi-4-reasoning-vision-15B, a 15-billion parameter open-weight multimodal model designed for tasks requiring both visual perception and selective reasoning. The compact model excels at scientific, mathematical, and GUI understanding while balancing compute efficiency.

Mar 6, 2026 · 85% relevant

GPT-5.5 Demo Shows AI Generating Functional Excel-Like Spreadsheet

A user demonstrated GPT-5.5 creating a web-based spreadsheet with formatting and grid behavior. This showcases incremental progress in AI's ability to generate complex, interactive frontend code from natural language.

Apr 20, 2026 · 85% relevant

Princeton Study: GPT-4 Outperforms Search for Book Recommendations

Princeton researchers found that 2,012 participants preferred book recommendations from a GPT-4-powered chatbot over those from a traditional search engine, suggesting LLMs may excel at certain subjective tasks.

Apr 13, 2026 · 85% relevant

InsForge Open-Source Framework Gives AI Agents Backend Database & Auth

Developer Akshay Pachaar launched InsForge, an open-source framework that exposes backend primitives through a semantic layer AI agents can understand. This aims to solve a core weakness where agents excel at frontend code but fail at backend logic.

Apr 11, 2026 · 85% relevant

Microsoft's 'Markdownify' Converts PDFs, Audio, Video to Clean LLM Markdown

Microsoft launched 'Markdownify', a Python tool that converts PDFs, Word docs, Excel, PowerPoint, audio, and YouTube URLs into clean Markdown. This addresses a major pain point in AI pipelines where raw file parsing breaks context and structure.

Apr 8, 2026 · 85% relevant

New Research Reveals the Complementary Strengths of Generative and ID-Based Recommendation Models

A new study systematically tests the hypothesis that generative recommendation (GR) models generalize better. It finds GR excels at generalization tasks, while ID-based models are better at memorization, and proposes a hybrid approach for improved performance.

Mar 23, 2026 · 70% relevant

The Intelligence Gap: Why LLMs Can't Match a Child's Learning

Yann LeCun reveals that while large language models process staggering amounts of text data, they lack the grounded physical understanding that even young children develop naturally. This fundamental limitation explains why AI struggles with real-world common sense despite excelling at pattern recognition.

Mar 5, 2026 · 85% relevant

The Fragility of China's Open-Source AI: New Research Reveals Capability Gaps

New empirical evidence reveals Chinese open-weight AI models show significant fragility compared to frontier closed models, excelling in narrow domains but struggling with general tasks and out-of-distribution challenges.

Mar 2, 2026 · 85% relevant

Claude Code's Edge: Why Sonnet 4.5 Beats GPT-4o for Multi-File Projects

Claude Code's underlying model excels at understanding existing codebases and maintaining instruction fidelity in long sessions, making it the better choice for complex, multi-file development tasks.

Apr 16, 2026 · 100% relevant

The Coordination Crisis: Why LLMs Fail at Simultaneous Decision-Making

New research reveals a critical flaw in multi-agent LLM systems: while they excel in sequential tasks, they fail catastrophically when decisions must be made simultaneously, with deadlock rates exceeding 95%. This coordination failure persists even with communication enabled, challenging assumptions about emergent cooperation.

Feb 17, 2026 · 75% relevant

Gemini Can Now Create Docs, Sheets, Slides Directly in Chat

Gemini now lets users create Docs, Sheets, Slides, and PDFs directly in chat, eliminating the need to copy-paste content between AI and productivity tools.

Apr 29, 2026 · 85% relevant

CPU Demand Flipping the AI Narrative as Datacenter Growth Shifts

A new analysis from SemiAnalysis indicates CPU demand is rising in AI datacenters, reversing a narrative of GPU-only dominance. This shift signals changing workload patterns and infrastructure priorities.

Apr 28, 2026 · 100% relevant

Pretrained Audio Models Underperform in Music Recommendation, New Research Shows

A new study evaluates nine pretrained audio models for music recommendation, finding significant performance disparity between traditional MIR tasks and both hot and cold-start recommendation scenarios.

Apr 28, 2026 · 80% relevant

Talkie: Vintage LLM Trained on 260B Pre-1931 English Tokens

Talkie is a new 'vintage language model' trained on 260 billion tokens of historical English text from before 1931, developed by a team including Alec Radford, co-author of the original GPT paper. It offers a unique linguistic artifact for NLP research.

Apr 28, 2026 · 85% relevant

ASPIRE: New Framework Makes Spectral Graph Filters Learnable for

Researchers propose ASPIRE, a bi-level optimization framework that makes spectral graph filters fully learnable for collaborative filtering, solving the 'low-frequency explosion' problem and matching task-specific designs.

Apr 27, 2026 · 90% relevant

Use Claude Code to Automate Systematic Literature Reviews

Claude Code can automate systematic literature reviews: scrape papers, extract key themes, and generate structured summaries — all from the terminal.

Apr 26, 2026 · 100% relevant

Pony.ai Unveils NVIDIA-Powered Domain Controller for L4 Autonomy

Pony.ai introduced a new autonomous driving domain controller built with NVIDIA, targeting large-scale L4 deployment. The controller integrates NVIDIA's DRIVE platform to handle sensor fusion and planning.

Apr 26, 2026 · 92% relevant

GPT-5.4 Fails Client-Ready Test: 0% Pass Rate in Banking Benchmark

A new benchmark, BankerToolBench, tested GPT-5.4, Claude Opus 4.6, and others on junior investment banker tasks. None of the outputs were deemed client-ready, with GPT-5.4 leading but still failing nearly half the criteria.

Apr 26, 2026 · 98% relevant

How a Nursing Student Used Claude Haiku to Build a 660K-Page Drug Database Solo

Learn how Claude Haiku enabled a solo developer to classify thousands of medical conditions and build a production-grade pharmaceutical database.

Apr 25, 2026 · 75% relevant

GPT-5.5 Tops Benchmarks, Costs 2x API Price, Still Hallucinates

OpenAI launched GPT-5.5, an agentic model that tops Terminal-Bench 2.0 at 82.7% and surpasses Claude Opus 4.7 and Gemini 3.1 Pro on coding and math. However, independent testing shows higher hallucination rates and effective API costs 20% above GPT-5.4 despite doubled token prices.

Apr 25, 2026 · 100% relevant

Why Production AI Needs More Than Benchmark Scores

The article argues that high benchmark scores are insufficient for production AI success, highlighting the need for robust MLOps practices, monitoring, and real-world testing—critical for retail applications.

Apr 24, 2026 · 74% relevant

Meta Deploys Millions of Amazon Graviton CPUs for AI Agents

Meta will deploy tens of millions of AWS Graviton5 CPU cores for AI agent workloads, signaling that agentic inference favors CPUs over GPUs. The deal deepens Meta's $200B+ infrastructure push amid layoffs and cloud rivalry.

Apr 24, 2026 · 84% relevant

Cua Driver Open-Sourced: macOS Agent Control for Any App

Cua released Cua Driver as open-source, allowing agents like Claude Code and Codex to drive any macOS app through visual understanding and direct UI interaction.

Apr 23, 2026 · 85% relevant

FalkorDB: Graph Database for Multi-Hop AI Queries in Milliseconds

FalkorDB, an open-source graph database, stores connections as a sparse matrix to accelerate multi-hop queries by 100x. Combined with built-in vector search, it enables GraphRAG systems that answer complex relational questions without pre-built articles.

Apr 23, 2026 · 77% relevant

ThermoQA Benchmark Reveals LLM Reasoning Gaps: Claude Opus Leads at 94.1%

Researchers released ThermoQA, a 293-question benchmark testing thermodynamic reasoning. Claude Opus 4.6 scored 94.1% overall, but models showed significant degradation on complex cycle analysis versus simple property lookups.

Apr 23, 2026 · 78% relevant

OpenCLAW-P2P v6.0 Cuts Paper Lookup Latency to <50ms

OpenCLAW-P2P v6.0 introduces a multi-layer persistence architecture and live reference verification, reducing paper retrieval latency from >3s to <50ms and operating with 14 autonomous agents that scored 50+ papers.

Apr 23, 2026 · 77% relevant

OpenAI Launches ChatGPT Workspace Agents for Team Automation

OpenAI has introduced workspace agents within ChatGPT, powered by Codex, designed to automate complex, multi-step workflows for teams across shared environments like Slack. These agents can gather context, execute tasks, request approvals, and run continuously in the cloud.

Apr 22, 2026 · 97% relevant
