News

CRMArena-Pro is designed to test how well large language models (LLMs) can function as agents in real-world business settings, especially for CRM tasks like sales, customer service, and pricing. The ...
Anthropic has shared the design for its new research agent, which uses a multi-agent approach: a main agent analyzes questions, creates strategies, and assigns specialized sub-agents to work on ...
OpenAI has rolled out a major update to ChatGPT's integrated search, introducing smarter answers, better handling of long conversations, and a new image search feature. According to OpenAI, the ...
ByteDance unveiled Seedance 1.0, a new AI model for video generation. According to the company's tests, Seedance 1.0 outperforms established systems such as Google's Veo, Kuaishou's Kling, and ...
Palisade Research put AI systems to the test in two large-scale Capture The Flag (CTF) tournaments involving thousands of participants. In these CTF challenges, teams race to uncover hidden "flags" by ...
A year after rolling out Apple Intelligence, Apple is heading into its annual developer conference with little to show for its AI ambitions, according to insider Mark Gurman. Instead of real progress, ...
With the Darwin-Gödel Machine (DGM), Sakana AI introduces an AI system that can iteratively improve itself through self-modification and open-ended exploration. Early results look promising, but the ...
A team at Stanford has shown that large language models can automatically generate highly efficient GPU kernels, sometimes outperforming the standard functions found in the popular machine learning ...
LLMs designed for reasoning, like Claude 3.7 and DeepSeek-R1, are supposed to excel at complex problem-solving by simulating thought processes. But a new study by Apple researchers suggests that these ...
OpenAI has lowered the price of its o3 language model by 80 percent, CEO Sam Altman said. The new cost is $2 per million input tokens and $8 per million output tokens. The move follows Google's Gemini ...
Fundamental disagreements over AI's future: LeCun's remarks highlight a much deeper debate about the direction of AI research. Companies like Anthropic and OpenAI are racing to commercialize ever more ...
Anthropic has added three premium features to Claude Pro that were previously available only to Max, Team, and Enterprise users. Claude Code handles coding tasks from the command line. A new integration tool connects Claude to ...