This blog provides a tutorial on using the Kubernetes operator to simplify deploying and managing Alluxio clusters on Kubernetes.
Introduction
The Alluxio Kubernetes operator makes it easier to deploy and manage Alluxio and its datasets on Kubernetes. With the operator, Alluxio clusters can be deployed and managed like any other native Kubernetes application.
The operator handles common tasks such as provisioning pods, configuring services, mounting storage volumes, and loading datasets. This automation reduces the effort required to run Alluxio on Kubernetes and cuts operational costs.
The operator also enables on-demand data loading through kubectl commands, so users load data into Alluxio only when it is needed. This reduces instance costs (such as EC2 costs) by avoiding storing unused data in Alluxio.
This blog provides a tutorial on deploying Alluxio on Kubernetes with the operator. You will learn how to do the following, step by step:
- Install and deploy the Alluxio Kubernetes operator
- Deploy and maintain a dataset
- Deploy Alluxio with the Kubernetes operator
- Load data into Alluxio
- Uninstall and clean up Alluxio and the dataset
Prerequisites
- A Kubernetes cluster running version 1.19 or later, with feature gates enabled.
- Cluster access to the Alluxio Docker image alluxio/alluxio, or a downloaded image tarball of Alluxio.
- Ensure the cluster’s Kubernetes Network Policy allows for connectivity between applications (Alluxio clients) and the Alluxio Pods on the defined ports.
- Helm 3, version 3.6.0 or later, installed on the control plane of the Kubernetes cluster.
- The RBAC permissions in the Kubernetes cluster that the operator needs in order to work:
  - Permission to create CRDs (Custom Resource Definitions);
  - Permission to create a ServiceAccount, ClusterRole, and ClusterRoleBinding for the operator pods;
  - Permission to create the namespace that the operator will run in.
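The version and permission prerequisites above can be checked from a terminal before installing anything. The following is a minimal sketch, assuming `kubectl` and `helm` are on the PATH and that GNU `sort -V` is available. The function names `version_at_least` and `check_operator_rbac` are hypothetical helpers introduced here, not part of the operator.

```shell
# Hypothetical helper: succeeds when version $2 is at least version $1.
# A leading "v" (as printed by helm and kubectl) is ignored.
version_at_least() {
  min=${1#v}
  actual=${2#v}
  # sort -V orders version strings; the minimum must come first (or be equal).
  [ "$(printf '%s\n%s\n' "$min" "$actual" | sort -V | head -n 1)" = "$min" ]
}

# Hypothetical helper: asks the API server whether the current user may
# create each resource type listed in the prerequisites.
check_operator_rbac() {
  for resource in customresourcedefinitions serviceaccounts clusterroles clusterrolebindings namespaces; do
    printf '%s: ' "$resource"
    kubectl auth can-i create "$resource"
  done
}

# Example usage (not executed here):
#   version_at_least 3.6.0 "$(helm version --template '{{.Version}}')" && echo "helm OK"
#   check_operator_rbac
```

Each line printed by `check_operator_rbac` ends with the `yes` or `no` answer that `kubectl auth can-i` returns for that resource type.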
Deploy Alluxio Kubernetes Operator
You will use the Helm Chart for deploying the Alluxio Kubernetes operator. Follow the steps below:
Download Alluxio Kubernetes Operator
Download the Alluxio Kubernetes Operator from https://github.com/Alluxio/k8s-operator and enter the root directory of the project.
Install Operator
Install the operator by running:
$ helm install operator ./deploy/charts/alluxio-operator
The operator automatically creates the namespace `alluxio-operator` and installs all of its components there.
Run Operator
Run the following command to make sure all pods of the operator are running as expected:
$ kubectl get pods -n alluxio-operator
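Instead of re-running `kubectl get pods` by hand, the check can block until the pods report ready. This is a minimal sketch using the standard `kubectl wait` command. The function name `wait_for_operator_pods` and the default timeout of 300 seconds are assumptions chosen for illustration.

```shell
# Hypothetical helper: blocks until every pod in the alluxio-operator
# namespace has the Ready condition, or until the timeout expires.
# $1 is an optional timeout (assumed default: 300s).
wait_for_operator_pods() {
  timeout=${1:-300s}
  kubectl wait --for=condition=Ready pod --all -n alluxio-operator --timeout="$timeout"
}

# Example usage (not executed here):
#   wait_for_operator_pods 120s && echo "operator is ready"
```

`kubectl wait` exits with a non-zero status when the timeout is reached, so the helper can gate the next step of a script.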
Deploy Dataset
Create Dataset Configuration
Create a dataset configuration dataset.yaml. Its `apiVersion` must be `k8s-operator.alluxio.com/v1alpha1` and its `kind` must be `Dataset`. Here is an example:
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Dataset
metadata:
  name: my-dataset
spec:
  dataset:
    path: <path of your dataset>
    credentials:
      - <property 1 for accessing your dataset>
      - <property 2 for accessing your dataset>
      - ...
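When the same layout is needed for several datasets, the file can be generated rather than edited by hand. The sketch below prints a Dataset manifest with the fields from the example above. The function name `render_dataset_manifest` is hypothetical, and the credential properties are passed through verbatim because their exact format depends on your storage system.

```shell
# Hypothetical helper: prints a Dataset manifest to stdout.
# $1 = dataset name, $2 = dataset path, remaining args = credential properties.
render_dataset_manifest() {
  name=$1
  path=$2
  shift 2
  cat <<EOF
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Dataset
metadata:
  name: $name
spec:
  dataset:
    path: $path
    credentials:
EOF
  # Each remaining argument becomes one list item under credentials.
  for property in "$@"; do
    printf '      - %s\n' "$property"
  done
}

# Example usage (not executed here; the placeholders must be replaced):
#   render_dataset_manifest my-dataset '<path of your dataset>' \
#     '<property 1 for accessing your dataset>' > dataset.yaml
```

Review the generated file before deploying it, since the helper does no validation of the path or the properties.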
Deploy Dataset
Deploy your dataset by running:
$ kubectl create -f dataset.yaml
Check Status of Dataset
Check the status of the dataset by running:
$ kubectl get dataset <dataset-name>
Deploy Alluxio
Prepare Resource Configuration File
Prepare a resource configuration file alluxio-config.yaml. Its `apiVersion` must be `k8s-operator.alluxio.com/v1alpha1` and its `kind` must be `AlluxioCluster`. Here is an example:
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: AlluxioCluster
metadata:
  name: my-alluxio-cluster
spec:
  dataset: my-dataset # dataset name is required
  worker:
    count: 4
  pagestore:
    type: hostPath
    quota: 512Gi
    hostPath: /mnt/alluxio
  fuse:
    enabled: true
All configurable properties in the spec section can be found in deploy/charts/alluxio/values.yaml.
Deploy Alluxio Cluster
Deploy the Alluxio cluster by running:
$ kubectl create -f alluxio-config.yaml
Check Status of Alluxio Cluster
Check the status of the Alluxio cluster by running:
$ kubectl get alluxiocluster <alluxio-cluster-name>
Load the Data into Alluxio
To load your data into the Alluxio cluster so that your application can read it faster, create a resource file load.yaml. Here is an example:
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Load
metadata:
  name: my-load
spec:
  dataset: my-dataset
  path: /
Then run the following command to start the load:
$ kubectl create -f load.yaml
To check the status of the load:
$ kubectl get load
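For repeated on-demand loads, the manifest and the `kubectl create` call can be combined into one step. The sketch below builds the same Load resource as load.yaml and pipes it to `kubectl create -f -`, which reads the manifest from standard input. The function name `start_load` is hypothetical, and passing a `path` other than `/` is an assumption that this tutorial does not confirm.

```shell
# Hypothetical helper: creates a Load resource without writing a file.
# $1 = load name, $2 = dataset name, $3 = path to load (assumed default: /).
start_load() {
  load_name=$1
  dataset=$2
  load_path=${3:-/}
  cat <<EOF | kubectl create -f -
apiVersion: k8s-operator.alluxio.com/v1alpha1
kind: Load
metadata:
  name: $load_name
spec:
  dataset: $dataset
  path: $load_path
EOF
}

# Example usage (not executed here):
#   start_load my-load my-dataset /
#   kubectl get load
```

Each load needs its own name, because `kubectl create` fails when a resource with the same name already exists.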
Uninstall
Run the following commands to uninstall the dataset and the Alluxio cluster:
$ kubectl delete dataset <dataset-name>
$ kubectl delete alluxiocluster <alluxio-cluster-name>
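The two deletions can be wrapped in a small helper that is safe to re-run. This is a sketch only: the function name `cleanup_alluxio` is hypothetical, and `--ignore-not-found` is a standard `kubectl delete` flag that avoids an error when the named resource is already gone.

```shell
# Hypothetical helper: deletes the dataset and the Alluxio cluster,
# in the same order as the commands above.
# $1 = dataset name, $2 = Alluxio cluster name.
cleanup_alluxio() {
  dataset=$1
  cluster=$2
  kubectl delete dataset "$dataset" --ignore-not-found
  kubectl delete alluxiocluster "$cluster" --ignore-not-found
}

# Example usage (not executed here):
#   cleanup_alluxio my-dataset my-alluxio-cluster
```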
Summary
Through this tutorial, you have learned how to leverage the operator to simplify deploying and managing Alluxio on Kubernetes.
To learn more about Alluxio, join the 11k+ members of the Alluxio community Slack channel to ask questions and provide feedback.