Interactive analytics with Presto and Alluxio
Presto with Alluxio is a truly separated compute and storage stack, enabling interactive big data analytics on any file or object store.
Alluxio provides a distributed caching layer that sits between Presto and its data sources to improve I/O performance. By caching data closer to the Presto workers, Alluxio reduces data access latency and relieves pressure on the underlying storage system.
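To make the caching idea concrete, here is a minimal sketch of a read-through cache with least-recently-used eviction. It is a toy illustration of the general technique, not Alluxio's implementation or API: the class `ReadThroughCache`, the function `fetch_from_remote`, and the `part-N` keys are all hypothetical names chosen for this example.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy read-through LRU cache in front of a slow backing store (hypothetical)."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch            # callable: key -> bytes, reads remote storage
        self._capacity = capacity      # maximum number of cached entries
        self._entries = OrderedDict()  # insertion order doubles as recency order
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        data = self._fetch(key)             # only misses reach remote storage
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return data


remote_reads = []  # records every request that reaches the backing store


def fetch_from_remote(key):
    """Stand-in for a slow read from a remote file or object store."""
    remote_reads.append(key)
    return f"contents of {key}".encode()


cache = ReadThroughCache(fetch_from_remote, capacity=2)
for key in ["part-0", "part-1", "part-0", "part-0", "part-1"]:
    cache.read(key)

print(f"hits={cache.hits} misses={cache.misses} remote_reads={len(remote_reads)}")
```

In this sketch, five reads produce only two requests to the backing store; the repeated reads are served from the cache. That is the effect described above, shown at toy scale.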
Caching solutions in Presto
Caching helps in scenarios such as:
- Slow planning time
- Slow Hive Metastore
- Large tables with hundreds of partitions
- Overloaded HDFS NameNode
- Overloaded object storage such as S3
- Slow or unstable external storage
- Cross-region, multi-cloud, hybrid-cloud
- Data sharing with other compute engines
Why Presto + Alluxio
Alluxio provides a multi-tiered caching layer that is co-located with Presto to reduce I/O latency, enabling consistently high performance with jobs that run up to 10x faster.
Alluxio makes the important data local to Presto, so there are no copies to manage when reading from remote storage systems like S3, and egress and API request charges are lower.
Alluxio connects to a variety of storage systems and clouds so Presto can query data stored anywhere, accelerating queries when reading remote data across datacenters, regions, and clouds.