On-Demand Videos

video

AI/ML Infra Meetup | Balancing Cost, Performance, and Scale - Running GPU/CPU Workloads in the Cloud

Watch now

video

AI/ML Infra Meetup | A Faster and More Cost Efficient LLM Inference Stack

Watch now

video

AI/ML Infra Meetup | Three Developments in AI Infra

Watch now

video

Accelerate AI: Alluxio 101

In the rapidly evolving landscape of AI and machine learning, Platform and Data Infrastructure teams face critical challenges in building and managing large-scale AI platforms. Performance bottlenecks, limited platform scalability, and GPU scarcity make it difficult to support large-scale model training and serving.

In this talk, we show how Alluxio helps Platform and Data Infrastructure teams deliver faster, more scalable platforms to ML Engineering teams developing and training AI models. Alluxio’s highly distributed cache accelerates AI workloads by eliminating data loading bottlenecks and maximizing GPU utilization. Customers report up to 4x faster training performance with high-speed access to petabytes of data spread across billions of files, regardless of persistent storage type or proximity to GPU clusters. Alluxio’s architecture lowers data infrastructure costs, increases GPU utilization, and enables workload portability for navigating GPU scarcity.

Watch now

video

AI/ML Infra Meetup | The power of Ray in the era of LLM and multi-modality AI

Watch now

video

AI/ML Infra Meetup | Exploring Distributed Caching for Faster GPU Training with NVMe GDS and RDMA

Watch now

video

AI/ML Infra Meetup | Big Data and AI

Watch now

video

AI/ML Infra Meetup | TorchTitan, One-stop PyTorch native solution for production ready LLM pre-training

Watch now

video

Model Training Across Regions and Clouds – Challenges, Solutions and Live Demo

AI training workloads running on compute engines like PyTorch, TensorFlow, and Ray require consistent, high-throughput access to training data to maintain high GPU utilization. However, with the decoupling of compute and storage and with today’s hybrid and multi-cloud landscape, AI Platform and Data Infrastructure teams are struggling to cost-effectively deliver the high-performance data access needed for AI workloads at scale.

Join Tom Luckenbach, Alluxio Solutions Engineering Manager, to learn how Alluxio enables high-speed, cost-effective data access for AI training workloads in hybrid and multi-cloud architectures, while eliminating the need to manage data copies across regions and clouds.

What Tom will share:

AI data access challenges in cross-region, cross-cloud architectures.
The architecture and integration of Alluxio with frameworks like PyTorch, TensorFlow, and Ray using POSIX, REST, or Python APIs across AWS, GCP and Azure.
A live demo of an AI training workload accessing cross-cloud datasets leveraging Alluxio's distributed cache, unified namespace, and policy-driven data management.
MLPerf and FIO benchmark results and cost-savings analysis.

Watch now

video

AI/ML Infra Meetup | Scaling Vector Databases for E-Commerce Visual Search: Architectural Strategies for Millions of Products

Watch now

video

AI/ML Infra Meetup | Scaling Experimentation Platform in Digital Marketplaces: Architecture, Implementation & Lessons Learned

Watch now

video

Optimize, Don’t Overspend: Data Caching Strategy for AI Workloads

As machine learning and deep learning models grow in complexity, AI platform engineers and ML engineers face significant challenges with slow data loading and low GPU utilization, which often lead to costly investments in high-performance computing (HPC) storage. However, this approach can result in overspending without addressing the core issues of data bottlenecks and infrastructure complexity.

A better approach is to add a data caching layer, such as Alluxio, between compute and storage as a cost-effective alternative to HPC storage. In this webinar, Jingwen will explore how Alluxio's caching solutions optimize AI workloads for performance, user experience, and cost-effectiveness.

What you will learn:

The I/O bottlenecks that slow down data loading in model training
How Alluxio's data caching strategy optimizes I/O performance for training and GPU utilization, and significantly reduces cloud API costs
The architecture and key capabilities of Alluxio
How to use Rapid Alluxio Deployer to install Alluxio and run benchmarks in AWS in just 30 minutes

Watch now
