Alluxio Kubernetes Operator Tutorial: Simplifying Deploying and Managing Alluxio Clusters

August 15, 2023

Hope Wang

This blog provides a tutorial on using the Kubernetes operator to simplify deploying and managing Alluxio clusters on Kubernetes.

Introduction

The Alluxio Kubernetes operator makes it easier to deploy and manage Alluxio and its datasets on Kubernetes. With the operator, Alluxio clusters can be deployed and managed like any other native Kubernetes application.

The operator handles common tasks such as provisioning pods, configuring services, mounting storage volumes, and loading datasets. This automation reduces the effort required to run Alluxio on Kubernetes and cuts operational costs.

The on-demand data loading enabled by the operator, via kubectl commands, allows users to load data into Alluxio only when needed. This reduces instance costs (such as EC2 costs) by avoiding storing unused data in Alluxio.

This blog provides a tutorial on deploying Alluxio on Kubernetes with the operator. You will learn how to do the following, step by step:

Install and deploy the Alluxio Kubernetes operator
Deploy and maintain a dataset
Deploy Alluxio with the Kubernetes operator
Load data into Alluxio
Uninstall and clean up Alluxio and the dataset

Prerequisites

A Kubernetes cluster running version 1.19 or later, with feature gate enabled.
Access from the cluster to the Alluxio Docker image alluxio/alluxio, or a downloaded image tarball of Alluxio.
Ensure the cluster’s Kubernetes Network Policy allows for connectivity between applications (Alluxio clients) and the Alluxio Pods on the defined ports.
Helm 3, version 3.6.0 or later, installed on the control plane of the Kubernetes cluster.
You will need the following RBAC permissions in the Kubernetes cluster for the operator to work:
- Permission to create CRD (Custom Resource Definition);
- Permission to create ServiceAccount, ClusterRole, and ClusterRoleBinding for the operator pods;
- Permission to create a namespace that the operator will be in.
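
The version and permission requirements above can be checked before installing anything. The following is a minimal sketch and is not part of the operator project: the function names `version_ge` and `check_prereqs` are made up for illustration, and `sort -V` assumes GNU coreutils or a compatible `sort`.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort), as provided by GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: reports the state of the prerequisites listed above.
# It only reads from the cluster; nothing is created or modified.
check_prereqs() {
  # Helm must be at least 3.6.0; strip the leading "v" from e.g. "v3.12.0".
  helm_version="$(helm version --template '{{.Version}}')"
  helm_version="${helm_version#v}"
  if version_ge "$helm_version" "3.6.0"; then
    echo "helm $helm_version: ok"
  else
    echo "helm $helm_version: too old, need 3.6.0 or later"
  fi

  # The server version (1.19 or later) is printed for a manual check.
  kubectl version

  # RBAC permissions needed to install the operator.
  for resource in customresourcedefinitions namespaces serviceaccounts \
                  clusterroles clusterrolebindings; do
    printf 'create %s: ' "$resource"
    kubectl auth can-i create "$resource" || true
  done
}
```

After defining both functions in a shell that has cluster access, call `check_prereqs` and read the output. Each permission line should end in `yes`.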

Deploy Alluxio Kubernetes Operator

You will use a Helm chart to deploy the Alluxio Kubernetes operator. Follow the steps below:

Download Alluxio Kubernetes Operator

Download the Alluxio Kubernetes Operator from https://github.com/Alluxio/k8s-operator and enter the root directory of the project.

Install Operator

Install the operator by running:

$ helm install operator ./deploy/charts/alluxio-operator

The operator automatically creates the namespace `alluxio-operator` and installs all of its components there.

Run Operator

Run the following command to make sure all pods of the operator are running as expected:

$ kubectl get pods -n alluxio-operator
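
Checking the pod list by eye works for a one-off install. For scripted setups, a sketch like the one below can perform the same check automatically. The helper name `all_pods_running` is hypothetical, and it assumes the default `kubectl get pods` column layout, in which the third column is STATUS.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin.
# Succeeds only if at least one pod is listed and every pod has a STATUS
# (third column in the default layout) equal to "Running".
all_pods_running() {
  awk '
    { total++; if ($3 != "Running") not_running++ }
    END { exit (total > 0 && not_running == 0) ? 0 : 1 }
  '
}
```

It can be used as follows:

$ kubectl get pods -n alluxio-operator --no-headers | all_pods_running && echo "operator pods are running"

A `Running` status does not guarantee that a pod is ready to serve requests, so treat this as a quick sanity check rather than a full health check.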

Deploy Dataset

Create Dataset Configuration

Create a dataset configuration file dataset.yaml. Its `apiVersion` must be `k8s-operator.alluxio.com/v1alpha1` and its `kind` must be `Dataset`. Here is an example:

apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Dataset
metadata:
  name: my-dataset
spec:
  dataset:
    path: <path of your dataset>
    credentials:
      - <property 1 for accessing your dataset>
      - <property 2 for accessing your dataset>
      - ...

Deploy Dataset

Deploy your dataset by running:

$ kubectl create -f dataset.yaml

Check Status of Dataset

Check the status of the dataset by running:

$ kubectl get dataset <dataset-name>

Deploy Alluxio

Prepare Resource Configuration File

Prepare a resource configuration file alluxio-config.yaml. Its `apiVersion` must be `k8s-operator.alluxio.com/v1alpha1` and its `kind` must be `AlluxioCluster`. Here is an example:

apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: AlluxioCluster
metadata:
  name: my-alluxio-cluster
spec:
  dataset: my-dataset # dataset name is required
  worker:
    count: 4
  pagestore:
    type: hostPath
    quota: 512Gi
    hostPath: /mnt/alluxio
  fuse:
    enabled: true

All configurable properties in the spec section can be found in deploy/charts/alluxio/values.yaml.
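
When the same cluster definition is reused across environments, the manifest can be generated instead of edited by hand. The sketch below uses only the fields from the example above. The function name `render_alluxio_cluster` and its argument order are assumptions made for illustration and are not part of the operator.

```shell
# Hypothetical helper: prints an AlluxioCluster manifest to stdout.
# Arguments: cluster name, dataset name, worker count, page store quota.
render_alluxio_cluster() {
  cat <<EOF
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: AlluxioCluster
metadata:
  name: $1
spec:
  dataset: $2
  worker:
    count: $3
  pagestore:
    type: hostPath
    quota: $4
    hostPath: /mnt/alluxio
  fuse:
    enabled: true
EOF
}
```

For example, a smaller two-worker variant of the configuration could be written with:

$ render_alluxio_cluster my-alluxio-cluster my-dataset 2 256Gi > alluxio-config.yaml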

Deploy Alluxio Cluster

Deploy the Alluxio cluster by running:

$ kubectl create -f alluxio-config.yaml

Check Status of Alluxio Cluster

Check the status of the Alluxio cluster by running:

$ kubectl get alluxiocluster <alluxio-cluster-name>

Load the Data into Alluxio

To load your data into the Alluxio cluster so that your application can read it faster, create a resource file load.yaml. Here is an example:

apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Load
metadata:
  name: my-load
spec:
  dataset: my-dataset
  path: /

Then run the following command to start the load:

$ kubectl create -f load.yaml

To check the status of the load:

$ kubectl get load
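
A load takes time to finish, so its status usually has to be checked more than once. The sketch below is a generic polling helper with a timeout. `wait_for` is a hypothetical name, and the condition command is left to the reader because the status values reported by `kubectl get load` are not listed in this tutorial.

```shell
# Hypothetical helper: retries a command until it succeeds or time runs out.
# Usage: wait_for <timeout_seconds> <interval_seconds> <command> [args...]
wait_for() {
  timeout="$1"
  interval="$2"
  shift 2
  elapsed=0
  # Run the command; on failure, give up once the timeout is reached,
  # otherwise sleep for one interval and try again.
  while ! "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

For example, `wait_for 600 10 load_is_done` would retry every 10 seconds for up to 10 minutes, where `load_is_done` is a function you define to inspect the output of `kubectl get load`.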

Uninstall

Run the following commands to uninstall the dataset and the Alluxio cluster:

$ kubectl delete dataset <dataset-name>
$ kubectl delete alluxiocluster <alluxio-cluster-name>
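
For repeated test deployments, the teardown can be wrapped in a small script. The sketch below keeps the order used above and adds a dry-run mode that prints the commands without running them. The names `run`, `cleanup`, and `DRY_RUN` are hypothetical. The final `helm uninstall operator` step is an assumption that this tutorial does not cover: it removes the Helm release named `operator` that was created during installation.

```shell
# Hypothetical helper: runs a command, or only prints it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: removes the dataset and the cluster, then the operator.
# Arguments: dataset name, Alluxio cluster name.
cleanup() {
  run kubectl delete dataset "$1"
  run kubectl delete alluxiocluster "$2"
  # Assumption: the operator was installed as the Helm release "operator".
  run helm uninstall operator
}
```

Running `DRY_RUN=1 cleanup my-dataset my-alluxio-cluster` first shows exactly which commands would be executed. Drop `DRY_RUN=1` to perform the teardown.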

Summary

Through this tutorial, you have learned how to leverage the operator to simplify deploying and managing Alluxio on Kubernetes.

To learn more about Alluxio, join the 11k+ members of the Alluxio community Slack channel to ask questions and provide feedback.
