Matrix for Performance of Reinforcement Learning Model

The End of Tabula Rasa: How Pre-Trained World Models are Redefining Reinforcement Learning

For a long time, the core idea in reinforcement learning (RL) was that AI agents should learn every new task from scratch, like a blank slate. This "tabula rasa" approach led to amazing achievements, ...

Inside Ring-1T: Ant engineers solve reinforcement learning bottlenecks at trillion scale

Ant Group, an affiliate of Alibaba, has released Ring-1T, which it says is the first open-source trillion-parameter model.

Self-improving language models are becoming reality with MIT's updated SEAL technique

Researchers at the Massachusetts Institute of Technology (MIT) are gaining renewed attention for developing and open sourcing ...

NextBigFuture

Progress to Continual Learning AI

LLM papers according to arXiv trends. This is driven by foundation model scale and multimodal extensions. However, ...

Geeky Gadgets

Reinforcement Learning for LLMs in 2025

Imagine trying to teach a child how to solve a tricky math problem. You might start by showing them examples, guiding them step by step, and encouraging them to think critically about their approach.

Semiconductor Engineering

DeepSeek: Improving Language Model Reasoning Capabilities Using Pure Reinforcement Learning

“We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT ...

TechCrunch

Quantum Machines and Nvidia use machine learning to get closer to an error-corrected quantum computer

About a year and a half ago, quantum control startup Quantum Machines and Nvidia announced a deep partnership that would bring together Nvidia’s DGX Quantum computing platform and Quantum Machines’ ...

MIT Technology Review

Why we should thank pigeons for our AI breakthroughs

The bird has never gotten much credit for being intelligent. But the reinforcement learning powering the world’s most advanced AI systems is far more pigeon than human. In 1943, while the world’s ...

Wired

Databricks Has a Trick That Lets AI Models Improve Themselves

Using several recent innovations, the company Databricks will let customers boost the IQ of their AI models even if they don’t have squeaky clean data. Databricks, a company that helps big businesses ...

TechCrunch

Microsoft’s most capable new Phi 4 AI model rivals the performance of far larger systems

Microsoft on Wednesday launched several new “open” AI models, the most capable of which is competitive with OpenAI’s o3-mini on at least one benchmark. As it says on the tin, all of the new ...
