Researchers propose low-latency topologies and processing-in-network as memory and interconnect bottlenecks threaten the economic viability of inference ...
MOUNTAIN VIEW, Calif.--(BUSINESS WIRE)--Enfabrica Corporation, an industry leader in high-performance networking silicon for artificial intelligence (AI) and accelerated computing, today announced the ...
Rearranging the computations and hardware used to serve large language ...
The history of computing teaches us that software always and necessarily lags hardware, and unfortunately that lag can stretch for many years when it comes to wringing the best performance out of iron ...
Vivek Yadav, an engineering manager from ...
A new technical paper titled “Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference” was published by researchers at University of Cambridge, Imperial College London ...
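The scale of the memory wall the paper targets is easy to see with back-of-envelope arithmetic: the KV cache grows linearly with context length, and at agentic context lengths it alone can exceed a single GPU's HBM. A minimal sketch, assuming an illustrative Llama-70B-like shape with grouped-query attention (the figures below are not taken from the paper):

```python
# Back-of-envelope KV cache sizing, illustrating the long-context memory wall.
# Model shape is illustrative (80 layers, 8 KV heads, head_dim 128, fp16).
def kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                   seq_len=128_000, batch=1, bytes_per_elem=2):
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

print(f"{kv_cache_bytes() / 2**30:.1f} GiB")  # ~39.1 GiB for one 128k-token sequence
```

At that size, a handful of concurrent long-context users saturates an 80 GB accelerator before the model weights are even counted, which is why the optimization pathways in such papers focus on where the KV cache lives rather than only on compute.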
The AI chip giant says the open-source software library, TensorRT-LLM, will double the H100’s performance for running inference on leading large language models when it comes out next month. Nvidia ...
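For a concrete picture of what serving through the library looks like, later TensorRT-LLM releases expose a high-level Python LLM API; the sketch below assumes that API and an illustrative model name, and is not drawn from Nvidia's announcement:

```python
# Minimal text generation via TensorRT-LLM's high-level LLM API (a sketch;
# the model identifier is illustrative, and the engine is built on first load).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

for output in llm.generate(["What is in-flight batching?"], params):
    print(output.outputs[0].text)
```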
A new technical paper titled “Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System” was published by researchers at Rensselaer Polytechnic Institute and IBM. “Large ...
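The core idea named in the title, placing KV cache blocks across fast and slow memory tiers, can be sketched as an LRU-style promotion/eviction policy; the tier names and the policy below are illustrative assumptions, not the paper's algorithm:

```python
# A minimal sketch of KV cache placement across a heterogeneous memory system:
# hot (recently accessed) token blocks stay in fast HBM, cold blocks spill to
# slower host DRAM and are promoted back on access.
from collections import OrderedDict

class TieredKVCache:
    def __init__(self, hbm_capacity_blocks):
        self.hbm = OrderedDict()   # block_id -> block, kept in LRU order
        self.dram = {}             # spill tier for evicted blocks
        self.capacity = hbm_capacity_blocks

    def add(self, block_id, block):
        self._insert_hbm(block_id, block)

    def access(self, block_id):
        if block_id in self.hbm:
            self.hbm.move_to_end(block_id)        # refresh recency
            return self.hbm[block_id]
        block = self.dram.pop(block_id)           # promote cold block from DRAM
        self._insert_hbm(block_id, block)
        return block

    def _insert_hbm(self, block_id, block):
        while len(self.hbm) >= self.capacity:     # evict coldest block to DRAM
            victim, tensor = self.hbm.popitem(last=False)
            self.dram[victim] = tensor
        self.hbm[block_id] = block
```

A dynamic placement scheme in this spirit trades a DRAM-to-HBM transfer on cold accesses for a much larger effective cache, which is the lever such work uses to serve longer contexts per GPU.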
Elastic Networked-Memory Solution Delivers Multi-800GB/s Read-Write Throughput Over Ethernet and Up To 50% Lower Cost Per Token Per User in AI Inference Workloads MOUNTAIN VIEW, Calif., July 29, 2025- ...