excel
30 articles about excel in AI news
Alt-X Launches as AI-Powered, Traceable Financial Model Builder for Excel
Alt-X launches as an AI tool that automatically builds traceable financial models in Excel from documents like OMs and 10-Ks. It promises linked numbers, user control, and no hallucinations.
Excel Agent Showdown: ChatGPT Builds Working Strategy Game with 'Smart' Enemy, Claude Creates Board, Copilot Fails
When prompted to create a working strategy game in Excel with graphics, ChatGPT built a functional game with formulas and a 'smart' enemy AI, Claude created a board but acted as game master, and Microsoft Copilot failed to produce a game.
The Fine-Grained Vision Gap: Why VLMs Excel at Conversation But Fail at Classification
New research reveals vision-language models struggle with fine-grained visual classification despite excelling at complex reasoning tasks. The study identifies architectural and training factors creating this disconnect, with implications for AI development.
Microsoft's Phi-4-Vision: A Compact AI Model That Excels at Math, Science, and Understanding Interfaces
Microsoft has released Phi-4-reasoning-vision-15B, a 15-billion parameter open-weight multimodal model designed for tasks requiring both visual perception and selective reasoning. The compact model excels at scientific, mathematical, and GUI understanding while balancing compute efficiency.
GPT-5.5 Demo Shows AI Generating Functional Excel-Like Spreadsheet
A user demonstrated GPT-5.5 creating a web-based spreadsheet with formatting and grid behavior. This showcases incremental progress in AI's ability to generate complex, interactive frontend code from natural language.
Princeton Study: GPT-4 Outperforms Search for Book Recommendations
Princeton researchers found that 2,012 participants preferred book recommendations from a GPT-4-powered chatbot over those from a traditional search engine, suggesting LLMs may excel at certain subjective tasks.
InsForge Open-Source Framework Gives AI Agents Backend Database & Auth
Developer Akshay Pachaar launched InsForge, an open-source framework that exposes backend primitives through a semantic layer AI agents can understand. This aims to solve a core weakness where agents excel at frontend code but fail at backend logic.
Microsoft's 'Markdownify' Converts PDFs, Audio, Video to Clean LLM Markdown
Microsoft launched 'Markdownify', a Python tool that converts PDFs, Word docs, Excel, PowerPoint, audio, and YouTube URLs into clean Markdown. This addresses a major pain point in AI pipelines where raw file parsing breaks context and structure.
New Research Reveals the Complementary Strengths of Generative and ID-Based Recommendation Models
A new study systematically tests the hypothesis that generative recommendation (GR) models generalize better. It finds GR excels at generalization tasks, while ID-based models are better at memorization, and proposes a hybrid approach for improved performance.
The Intelligence Gap: Why LLMs Can't Match a Child's Learning
Yann LeCun reveals that while large language models process staggering amounts of text data, they lack the grounded physical understanding that even young children develop naturally. This fundamental limitation explains why AI struggles with real-world common sense despite excelling at pattern recognition.
The Fragility of China's Open-Source AI: New Research Reveals Capability Gaps
New empirical evidence reveals Chinese open-weight AI models show significant fragility compared to frontier closed models, excelling in narrow domains but struggling with general tasks and out-of-distribution challenges.
Claude Code's Edge: Why Sonnet 4.5 Beats GPT-4o for Multi-File Projects
Claude Code's underlying model excels at understanding existing codebases and maintaining instruction fidelity in long sessions, making it the better choice for complex, multi-file development tasks.
The Coordination Crisis: Why LLMs Fail at Simultaneous Decision-Making
New research reveals a critical flaw in multi-agent LLM systems: while they excel in sequential tasks, they fail catastrophically when decisions must be made simultaneously, with deadlock rates exceeding 95%. This coordination failure persists even with communication enabled, challenging assumptions about emergent cooperation.
Gemini Can Now Create Docs, Sheets, Slides Directly in Chat
Gemini now lets users create Docs, Sheets, Slides, and PDFs directly in chat, eliminating the need to copy-paste content between AI and productivity tools.
CPU Demand Flipping the AI Narrative as Datacenter Growth Shifts
A new analysis from SemiAnalysis indicates CPU demand is rising in AI datacenters, reversing a narrative of GPU-only dominance. This shift signals changing workload patterns and infrastructure priorities.
Pretrained Audio Models Underperform in Music Recommendation, New Research Shows
A new study evaluates nine pretrained audio models for music recommendation, finding significant performance disparity between traditional MIR tasks and both hot and cold-start recommendation scenarios.
Talkie: Vintage LLM Trained on 260B Pre-1931 English Tokens
Talkie is a new 'vintage language model' trained on 260 billion tokens of historical English text from before 1931, developed by a team including Alec Radford, co-author of the original GPT paper. It offers a unique linguistic artifact for NLP research.
ASPIRE: New Framework Makes Spectral Graph Filters Learnable for
Researchers propose ASPIRE, a bi-level optimization framework that makes spectral graph filters fully learnable for collaborative filtering, solving the 'low-frequency explosion' problem and matching task-specific designs.
Use Claude Code to Automate Systematic Literature Reviews
Claude Code can automate systematic literature reviews: scrape papers, extract key themes, and generate structured summaries — all from the terminal.
Pony.ai Unveils NVIDIA-Powered Domain Controller for L4 Autonomy
Pony.ai introduced a new autonomous driving domain controller built with NVIDIA, targeting large-scale L4 deployment. The controller integrates NVIDIA's DRIVE platform to handle sensor fusion and planning.
GPT-5.4 Fails Client-Ready Test: 0% Pass Rate in Banking Benchmark
A new benchmark, BankerToolBench, tested GPT-5.4, Claude Opus 4.6, and others on junior investment banker tasks. None of the outputs were deemed client-ready, with GPT-5.4 leading but still failing nearly half the criteria.
How a Nursing Student Used Claude Haiku to Build a 660K-Page Drug Database Solo
Learn how Claude Haiku enabled a solo developer to classify thousands of medical conditions and build a production-grade pharmaceutical database.
GPT-5.5 Tops Benchmarks, Costs 2x API Price, Still Hallucinates
OpenAI launched GPT-5.5, an agentic model that tops Terminal-Bench 2.0 at 82.7% and surpasses Claude Opus 4.7 and Gemini 3.1 Pro on coding and math. However, independent testing shows higher hallucination rates and effective API costs 20% above GPT-5.4 despite doubled token prices.
Why Production AI Needs More Than Benchmark Scores
The article argues that high benchmark scores are insufficient for production AI success, highlighting the need for robust MLOps practices, monitoring, and real-world testing—critical for retail applications.
Meta Deploys Millions of Amazon Graviton CPUs for AI Agents
Meta will deploy tens of millions of AWS Graviton5 CPU cores for AI agent workloads, signaling that agentic inference favors CPUs over GPUs. The deal deepens Meta's $200B+ infrastructure push amid layoffs and cloud rivalry.
Cua Driver Open-Sourced: macOS Agent Control for Any App
Cua released Cua Driver as open-source, allowing agents like Claude Code and Codex to drive any macOS app through visual understanding and direct UI interaction.
FalkorDB: Graph Database for Multi-Hop AI Queries in Milliseconds
FalkorDB, an open-source graph database, stores connections as a sparse matrix to accelerate multi-hop queries by 100x. Combined with built-in vector search, it enables GraphRAG systems that answer complex relational questions without pre-built articles.
ThermoQA Benchmark Reveals LLM Reasoning Gaps: Claude Opus Leads at 94.1%
Researchers released ThermoQA, a 293-question benchmark testing thermodynamic reasoning. Claude Opus 4.6 scored 94.1% overall, but models showed significant degradation on complex cycle analysis versus simple property lookups.
OpenCLAW-P2P v6.0 Cuts Paper Lookup Latency to <50ms
OpenCLAW-P2P v6.0 introduces a multi-layer persistence architecture and live reference verification, reducing paper retrieval latency from >3s to <50ms and operating with 14 autonomous agents that scored 50+ papers.
OpenAI Launches ChatGPT Workspace Agents for Team Automation
OpenAI has introduced workspace agents within ChatGPT, powered by Codex, designed to automate complex, multi-step workflows for teams across shared environments like Slack. These agents can gather context, execute tasks, request approvals, and run continuously in the cloud.