Researchers propose low-latency topologies and processing-in-network as memory and interconnect bottlenecks threaten ...
A new technical paper titled “System-performance and cost modeling of Large Language Model training and inference” was published by researchers at imec. “Large language models (LLMs), based on ...
SUNNYVALE, Calif.--(BUSINESS WIRE)--Today, Cerebras Systems, the pioneer in high performance AI compute, announced Cerebras Inference, the fastest AI inference solution in the world. Delivering 1,800 ...
A research article by Horace He and the Thinking Machines Lab (founded by ex-OpenAI CTO Mira Murati) addresses a long-standing issue in large language models (LLMs). Even with greedy decoding by setting ...
A new technical paper titled “Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference” was published by researchers at University of Cambridge, Imperial College London ...
SAN JOSE, Calif., Sept. 16, 2025 /PRNewswire/ -- At the AI Infra Summit 2025, Cornami, a leader in scalable computing ...
Kubernetes has become the leading platform for deploying cloud-native applications and microservices, backed by an extensive community and comprehensive feature set for managing distributed systems.
Very small language models (SLMs) can ...
Inference is rapidly emerging as the next major frontier in artificial intelligence (AI). Historically, AI development and deployment have focused overwhelmingly on training, with approximately ...
Trained on the industry’s largest, highest-quality Arabic-first dataset, Jais 2 sets new standards for accuracy, fluency, and cultural intelligence Cerebras Systems, in partnership with G42’s ...