Leveraging AWS EBS for Kubernetes Persistent Volumes

August 27, 2019 by Matthew Shiroma

A how-to guide for connecting your k8s persistent volumes with AWS, covering common questions and issues that arise from this use case.

When it comes to managing workloads in a cluster, Kubernetes is often the tool of choice, thanks to its open-source nature and ever-expanding user base. As a container orchestrator, it solves the problem of micromanaging the numerous ephemeral containers that host the various parts of an application, grouping them together via Pods. Each of these containers has its own independent storage and life cycle. This departure from running traditional virtual machines (VMs) presents new challenges, and one of them is file storage: when a container dies, anything written to its filesystem dies with it. To address this, Kubernetes has the concept of volumes, which gives pods storage that outlives individual containers. In this blog, we will take a quick look at leveraging AWS EBS for Kubernetes Persistent Volumes.

Kubernetes Persistent Volumes

As of this writing, there are two categories of volumes in Kubernetes: ordinary volumes and persistent volumes. A persistent volume exists independently of the pod it is attached to, making it completely independent of the pod's life cycle. Persistent volumes are also more flexible than ordinary volumes, supporting user-specified sizes and performance characteristics, and they come in a multitude of types to fit different needs. One such type, and the focus of this blog, is awsElasticBlockStore.
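To make the contrast concrete, here is a sketch of an ordinary, non-persistent volume: an emptyDir declared inline in a Pod spec. The names below are hypothetical, and the volume's contents are erased whenever the Pod is deleted, which is exactly the limitation persistent volumes remove.

```yaml
# An ordinary volume: it lives and dies with the Pod that declares it.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo        # hypothetical name for illustration
spec:
  containers:
    - name: app
      image: redis
      volumeMounts:
        - name: scratch
          mountPath: /tmp/scratch
  volumes:
    - name: scratch
      emptyDir: {}          # deleted along with the Pod
```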

Why go to the cloud?

Great question! It may seem a bold move to suddenly trust a third party to store your cluster’s data, especially if that data is confidential. However, the decision has a lot of merit despite the initial hesitation. By utilizing another service, the cluster’s infrastructure is greatly simplified; as we will see shortly, connecting a cloud provider’s volume to your cluster is fairly straightforward. It also cuts the cost of maintaining an in-house server that would otherwise host the same solution. What’s more, a cloud provider delivers reliability, security, and high availability in the background, so all the end user needs to worry about is using the service in their applications. This separation of concerns will prove worth its weight in gold in the long run.

Now that we have addressed the Why?, let’s take a quick dive into the How?.

Pre-Requirements

To properly utilize a cloud provider’s storage for persistent volumes, one must have the following:

  1. A working Kubernetes cluster hosted on AWS. This can be done either on EC2 instances (which is what this blog post assumes) or using the AWS EKS service. The cluster also needs the flag --cloud-provider=aws enabled on the kubelet, the api-server, and the controller-manager at cluster creation time. One way to incorporate this flag is to run kubeadm init --config config.yaml when creating a new cluster. An example config.yaml looks like this:

config.yaml

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
apiServer:
  extraArgs:
    cloud-provider: aws
controllerManager:
  extraArgs:
    cloud-provider: aws
    address: 0.0.0.0
networking:
  podSubnet: <the-value-you-put-for-the-pod-address-cidr-flag>
scheduler:
  extraArgs:
    address: 0.0.0.0
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: aws

As a best practice, it is recommended to host your cluster in the same region (and, for EBS attachment, the same availability zone) that your volumes will reside in. Otherwise, you will run into issues with data transfer and upload speeds.

  2. The instances in the cluster must have their hostname set to match their private DNS entry. The quickest way to do this is to run the following commands on each EC2 instance:
sudo sed -i "s/$(hostname)/$(curl http://169.254.169.254/latest/meta-data/hostname)/g" /etc/hosts
sudo sed -i "s/$(hostname)/$(curl http://169.254.169.254/latest/meta-data/hostname)/g" /etc/hostname

sudo reboot
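The sed invocations above rewrite every occurrence of the current hostname with the one reported by the EC2 metadata service. The substitution logic can be seen in isolation on a scratch file instead of /etc/hosts; the hostnames below are made up for illustration.

```shell
# Demonstrate the hostname-rewrite logic on a scratch file
# (stand-ins for $(hostname) and the metadata hostname).
old="ip-10-0-0-1"
new="ip-10-0-0-1.us-west-1.compute.internal"

tmp=$(mktemp)
printf '127.0.0.1 localhost\n10.0.0.1 %s\n' "$old" > "$tmp"

# Same substitution the prerequisite runs against /etc/hosts and /etc/hostname.
sed -i "s/$old/$new/g" "$tmp"

cat "$tmp"
rm -f "$tmp"
```

After the rewrite, the second line of the file carries the fully qualified private DNS name instead of the short hostname.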

Walkthrough

  1. Create the AWS Elastic Block Store (EBS) volume in the same availability zone as your cluster’s nodes. If you have the AWS CLI installed and configured, this command will create one for you:
aws ec2 create-volume --availability-zone=eu-west-1a --size=10 --volume-type=gp2
  2. Attach the new volume to the master node in your cluster. If you have the AWS CLI installed and configured, this command will do it for you:
aws ec2 attach-volume --device /dev/xvdf --instance-id <MASTER NODE ID> --volume-id <YOUR VOLUME ID>
  3. On the master node, check that the device is attached to your instance by running lsblk. If the last step worked, you should see your volume at the bottom of the list. In this case, the volume made earlier appears as nvme1n1.
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0         7:0    0 17.9M  1 loop /snap/amazon-ssm-agent/1068
loop1         7:1    0 89.3M  1 loop /snap/core/6673
nvme0n1     259:0    0   25G  0 disk
└─nvme0n1p1 259:1    0   25G  0 part /
nvme1n1     259:2    0   10G  0 disk
  4. Using the device name, create a filesystem on the volume. This only needs to be done once per volume.
sudo mkfs -t xfs /dev/<NAME OF VOLUME FROM PREV STEP>
  5. Create a Persistent Volume that associates the EBS volume you made with the cluster. An example of such a volume looks like this:

pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: aws-pv
  labels:
    type: aws-pv
spec:
  capacity:
    storage: 3Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: <YOUR EBS VOLUME ID HERE>
    fsType: xfs
  6. Create a Persistent Volume Claim that will bind to the Persistent Volume we just made. An example claim looks like this:

pvc.yaml

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: aws-pvc
  labels:
    type: aws-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  selector:
    matchLabels:
      type: <THE type LABEL ON THE PV YOU MADE EARLIER>
  7. Create a Pod that mounts the Persistent Volume Claim we just made. An example pod looks like this:

redis-cloud.yaml

apiVersion: v1
kind: Pod
metadata:
  name: redis-cloud
spec:
  volumes:
    - name: cloud-storage
      persistentVolumeClaim:
        claimName: <NAME OF CLAIM YOU MADE EARLIER>
  containers:
    - name: redis
      image: redis
      volumeMounts:
        - name: cloud-storage
          mountPath: /cloud/data
  8. Run the following kubectl commands on your cluster:
kubectl create -f pv.yaml
kubectl create -f pvc.yaml

To verify that your volume and claim are associated, run kubectl get pvc and look for the name of your PVC that you made.

NAME      STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
aws-pvc   Bound    aws-pv   3Gi        RWO                           3s

If its status says Bound, everything is working!

  9. With your PVC bound to the PV, now run: kubectl create -f redis-cloud.yaml

  10. Once the pod is up, verify that the volume mounted properly by running kubectl describe pod redis-cloud. If the Events section looks like the following, the volume mounted successfully!

Events:
  Type    Reason                  Age   From                                                  Message
  ----    ------                  ----  ----                                                  -------
  Normal  Scheduled               17s   default-scheduler                                     Successfully assigned default/redis-cloud-2 to ip-172-31-23-218.us-west-1.compute.internal
  Normal  SuccessfulAttachVolume  15s   attachdetach-controller                               AttachVolume.Attach succeeded for volume "aws-pv"
  Normal  Pulling                 7s    kubelet, ip-172-31-23-218.us-west-1.compute.internal  Pulling image "redis"
  Normal  Pulled                  2s    kubelet, ip-172-31-23-218.us-west-1.compute.internal  Successfully pulled image "redis"
  Normal  Created                 2s    kubelet, ip-172-31-23-218.us-west-1.compute.internal  Created container redis
  Normal  Started                 2s    kubelet, ip-172-31-23-218.us-west-1.compute.internal  Started container redis
  11. Exec into the pod, using kubectl exec -it nameOfPod -- /bin/bash, and verify that the volume is at the mount point we specified (in this case, /cloud/data).

  12. You’re done! Feel free to add files to that directory. Even if the pod is deleted, when it is spun back up, whether from the exact same yaml we provided or as a brand-new pod, those files will still be there.

NOTE: As of this blog post, the EBS integration with Kubernetes PVs only works on one node at a time: two nodes cannot mount the same EBS volume at once. Thus, when creating deployments that use PVs backed by EBS, be sure to schedule the pods onto the node that has the volume attached.
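Given this one-node constraint, one way to keep a pod co-located with the attached volume is a nodeSelector on the node’s hostname label. A sketch, assuming the hostname below is replaced with your node’s actual private DNS name:

```yaml
# Pin the pod to the node that has the EBS volume attached.
apiVersion: v1
kind: Pod
metadata:
  name: redis-cloud-pinned   # hypothetical name for illustration
spec:
  nodeSelector:
    kubernetes.io/hostname: ip-172-31-23-218.us-west-1.compute.internal
  volumes:
    - name: cloud-storage
      persistentVolumeClaim:
        claimName: aws-pvc
  containers:
    - name: redis
      image: redis
      volumeMounts:
        - name: cloud-storage
          mountPath: /cloud/data
```

If the pod lands on any other node, the attach will fail, so pinning (or a matching node affinity rule) is what keeps the deployment healthy.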

But what about Dynamic Storage Provisioning?

Another good question! One downside of the method above is that an operator must create the storage resource on the cloud provider and then link it to a Persistent Volume before a developer can create a Persistent Volume Claim against it. However, there is a way to provision storage resources on the fly: Storage Classes. A Storage Class provisions the needed storage resource in the cloud using its specified provisioner. For this to work, the cluster’s nodes must be granted the IAM permissions required to create the underlying resources.

These Storage Class objects are declared like the following (note that this is the format for a Storage Class backed by EBS; for details on other cloud providers, refer to this link):

sc.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-storage-class
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: '10'
  fsType: xfs

By deploying sc.yaml into the cluster, an operator no longer needs to hand-provision volumes: when a claim references the class, the provisioner creates the backing EBS volume automatically. An operator who still wants to pre-declare a Persistent Volume belonging to the class only needs one additional parameter:

pvWithSC.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: aws-pv-sc
  labels:
    type: sc
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore: # a statically declared PV still needs a volume source
    volumeID: <YOUR EBS VOLUME ID HERE>
    fsType: xfs
  storageClassName: ebs-storage-class # NEW PARAMETER

Then, a developer who needs a Persistent Volume simply creates and deploys a Persistent Volume Claim like the following for their own use:

pvcWithSC.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: aws-pvc-sc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  storageClassName: ebs-storage-class # NEW PARAMETER
  selector:
    matchLabels:
      type: sc

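A Pod consumes a dynamically provisioned claim exactly the same way as a static one. A minimal sketch reusing the claim name from pvcWithSC.yaml (the pod name here is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: redis-cloud-sc       # hypothetical name for illustration
spec:
  volumes:
    - name: cloud-storage
      persistentVolumeClaim:
        claimName: aws-pvc-sc   # the claim defined in pvcWithSC.yaml
  containers:
    - name: redis
      image: redis
      volumeMounts:
        - name: cloud-storage
          mountPath: /cloud/data
```

From the developer’s point of view, nothing about the Pod spec changes; only the claim’s storageClassName determines where the storage comes from.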
Wrap Up

Volumes make applications running in Kubernetes pods far more reliable. Operators no longer need to worry about data being deleted or lost along with a pod. By leveraging a cloud provider like AWS for your Kubernetes persistent volumes, the cluster will continue to be reliable in both performance and operation.
