DataMasque Installation on Amazon Elastic Kubernetes Service (EKS) Managed Nodes
DataMasque supports deployment to Elastic Kubernetes Service (EKS) clusters, with Elastic File System (EFS) as the persistent volume storage attached to the cluster. DataMasque images are pushed from Docker to your account's private Elastic Container Registry (ECR).
Supported Versions and Instance Types
DataMasque supports EKS versions 1.27, 1.28 and 1.29.
DataMasque supports EKS nodes of EC2 instance type c5.2xlarge or larger. The minimum number of nodes is one.
DataMasque on EKS does not support masking files in Mounted Share connections. File masking on AWS S3 or Azure Blob Storage is supported.
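As a quick pre-flight check, the supported versions listed above can be encoded in a small shell function. This is only an illustrative sketch: the function name is_supported_eks_version is hypothetical and is not part of DataMasque or eksctl.

```shell
# Hypothetical helper: succeeds only for the EKS versions listed as supported above.
is_supported_eks_version() {
  case "$1" in
    1.27|1.28|1.29) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage
if is_supported_eks_version "1.28"; then
  echo "EKS 1.28 is supported"
fi
```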
Installation
At a high level, the installation process is:
- A. Create an EKS cluster (this step can be skipped to use an existing cluster).
- B. Configure an EFS instance and access point(s).
- C. Load DataMasque Docker images into a local Docker installation, then push them to ECR.
- D. Generate the YAML Kubernetes manifest describing the DataMasque services.
- E. Deploy the configuration to the EKS cluster.
Steps A and B are executed using eksctl and aws commands on the command line.
Example configuration files and commands are provided in this guide.
Steps C and D use setup scripts provided in the DataMasque Docker package to push images and generate the Kubernetes manifest.
Step E is to deploy the configuration using kubectl.
Prerequisites
Before performing the installation, the following tools must be installed on the machine where the deployment instructions are being followed:
- AWS CLI (aws). Must be configured with authentication tokens. If authentication is saved in a specific profile, be sure to specify the --profile argument to both the aws and eksctl commands.
- eksctl
- kubectl
- Docker. Use the Docker installation instructions as a guide for installing Docker packages only. It is not necessary to install DataMasque into Docker.
- jq CLI tool
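Before starting, it can help to confirm that every tool is on the PATH. The following is a minimal sketch, assuming the tools are installed under their usual command names (aws, eksctl, kubectl, docker, jq); the missing_tools function is a hypothetical helper, not something shipped with DataMasque.

```shell
# Hypothetical helper: prints the name of each listed tool that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example usage: report anything missing before starting the installation.
missing=$(missing_tools aws eksctl kubectl docker jq)
if [ -n "$missing" ]; then
  # $missing is deliberately unquoted so the names print on one line.
  echo "Missing tools:" $missing
fi
```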
Cluster Configuration
This section creates the EKS cluster and EFS volume and configures security. Its steps correspond to steps A and B in the overview. Because some of these commands assign environment variables that are used in later commands, run them all in the same terminal session.
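Because later commands silently misbehave when a variable is empty, a small guard can be run before any step that depends on earlier variables. This is a sketch only; require_vars is a hypothetical helper name, and the variable names in the usage comment are the ones assigned later in this section.

```shell
# Hypothetical helper: fails and names each variable that is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "Variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage, before a step that relies on earlier variables:
# require_vars CLUSTER_NAME AWS_REGION_ID AWS_ACCOUNT_ID OIDC_ID || echo "re-run the earlier steps"
```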
- Create an EKS cluster. Skip to Step 2 if you already have an existing EKS cluster. Use this example EKS config to create the cluster, replacing the commented values with those for your environment. Save it as eks-config.yaml.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: datamasque-cluster # replace with your desired cluster name
  region: us-east-1 # replace with your region
  version: "1.27" # replace with 1.28 or 1.29 if necessary
vpc:
  id: "vpc-00000000000000000" # replace with VPC ID
  subnets:
    private:
      us-east-1a: # replace with your region and AZ
        id: "subnet-00000000000000000" # replace with subnet ID 1
      us-east-1b: # replace with your region and AZ
        id: "subnet-00000000000000000" # replace with subnet ID 2
managedNodeGroups:
  - name: ng-1-managed
    privateNetworking: true
    instanceType: c5.2xlarge # specify the instance size required
    securityGroups:
      attachIDs: ["sg-00000000000000000"] # replace with the security group ID(s) to attach to the nodes
    minSize: 1
    maxSize: 2
    desiredCapacity: 1
    volumeType: gp3
    labels: { role: stateful }
    subnets:
      - subnet-00000000000000000 # the subnet for the nodes
Create the cluster using eksctl:
$ eksctl create cluster --config-file=eks-config.yaml
This command may take a few minutes to complete.
- Set up the EFS driver role for the file system
First, retrieve your AWS account ID, and the OIDC ID of the cluster that was created.
$ export CLUSTER_NAME=datamasque-cluster # cluster name from step 1
$ AWS_REGION_ID=us-east-1 # replace with your AWS region
$ AWS_ACCOUNT_ID=$(aws sts get-caller-identity | jq -r '.Account')
$ OIDC_ID=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5)
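The cut command above works because the OIDC issuer is a URL whose fifth slash-separated field is the ID. The sketch below shows the same extraction on a made-up issuer URL; the function name and the example ID are illustrative only.

```shell
# Hypothetical helper: extracts the OIDC ID from an issuer URL of the form
# https://oidc.eks.<region>.amazonaws.com/id/<OIDC_ID>
oidc_id_from_issuer() {
  # Fields split on "/": "https:", "", "<host>", "id", "<OIDC_ID>"
  printf '%s\n' "$1" | cut -d '/' -f 5
}

# Example with a made-up issuer URL
oidc_id_from_issuer "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
```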
Next, create a trust policy JSON file granting access to the cluster based on its OIDC ID.
This uses the AWS_ACCOUNT_ID, AWS_REGION_ID and OIDC_ID variables set by the previous commands.
$ cat << EOF > efs_trust.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::$AWS_ACCOUNT_ID:oidc-provider/oidc.eks.$AWS_REGION_ID.amazonaws.com/id/$OIDC_ID"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringLike": {
"oidc.eks.$AWS_REGION_ID.amazonaws.com/id/$OIDC_ID:sub": "system:serviceaccount:kube-system:efs-csi-*",
"oidc.eks.$AWS_REGION_ID.amazonaws.com/id/$OIDC_ID:aud": "sts.amazonaws.com"
}
}
}
]
}
EOF
Then, create an IAM role for EFS, using the trust policy defined in efs_trust.json.
$ aws iam create-role \
--role-name EKS_EFS_CSI_DriverRole \
--assume-role-policy-document file://"efs_trust.json"
Note that you may specify a different role name, provided you use the same one throughout this guide.
Finally, attach the AmazonEFSCSIDriverPolicy to the role just created.
$ aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonEFSCSIDriverPolicy \
--role-name EKS_EFS_CSI_DriverRole
Be sure the --role-name option matches the role name used in the previous step.
More information about the EFS driver role can be found at the Amazon EFS CSI driver guide.
- Set up the EBS driver role for the underlying EBS volumes. Note that the configuration and steps look similar to those for EFS but are different.
Start by creating the ebs_trust.json file.
This assumes that the AWS_ACCOUNT_ID, AWS_REGION_ID and OIDC_ID variables are still set.
$ cat << EOF > ebs_trust.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::$AWS_ACCOUNT_ID:oidc-provider/oidc.eks.$AWS_REGION_ID.amazonaws.com/id/$OIDC_ID"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.$AWS_REGION_ID.amazonaws.com/id/$OIDC_ID:aud": "sts.amazonaws.com",
"oidc.eks.$AWS_REGION_ID.amazonaws.com/id/$OIDC_ID:sub": "system:serviceaccount:kube-system:ebs-csi-controller-sa"
}
}
}
]
}
EOF
Then, create an IAM role for EBS, using the trust policy defined in ebs_trust.json.
$ aws iam create-role \
--role-name EKS_EBS_CSI_DriverRole \
--assume-role-policy-document file://"ebs_trust.json"
As with the EFS Driver Role, you may specify a different role name provided you use it throughout this guide.
Finally, attach the AmazonEBSCSIDriverPolicy to the role just created.
$ aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--role-name EKS_EBS_CSI_DriverRole
Again, if you specified a different role name in the previous step, be sure to specify it in the --role-name argument.
More information about the EBS driver role can be found at the Amazon EBS CSI driver IAM role guide.
- Verify the IAM OIDC provider exists for the cluster, and create one if it does not. First, check if the IAM OIDC provider with the cluster's ID is already in the account. This command assumes the OIDC_ID variable is still set.
$ aws iam list-open-id-connect-providers | grep $OIDC_ID | cut -d "/" -f4
If the OIDC ID is output, the provider exists in the account and the next command can be skipped.
Otherwise, to create the IAM OIDC provider, use this command (which assumes the CLUSTER_NAME variable is still set):
$ eksctl utils associate-iam-oidc-provider --cluster $CLUSTER_NAME --approve
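The check-then-create logic of this step can be combined into one conditional. The following is a sketch under the assumption that the provider listing contains the OIDC ID as plain text; oidc_provider_listed is a hypothetical helper name, and the commands that need AWS credentials are left commented out.

```shell
# Hypothetical helper: reads a provider listing on stdin and succeeds
# if it mentions the given OIDC ID. An empty ID is rejected, because
# grep would otherwise match every line.
oidc_provider_listed() {
  [ -n "$1" ] || return 2
  grep -q -- "$1"
}

# Example usage (requires AWS credentials, so it is left commented out):
# if aws iam list-open-id-connect-providers | oidc_provider_listed "$OIDC_ID"; then
#   echo "OIDC provider already exists"
# else
#   eksctl utils associate-iam-oidc-provider --cluster "$CLUSTER_NAME" --approve
# fi
```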
- The Amazon EFS CSI driver must be deployed to the cluster. First determine the version of the EFS driver to use. This command lists all versions of the EFS CSI driver available for a specific Kubernetes version. Replace 1.27 with the version of Kubernetes specified in the cluster definition in Step 1 (1.27, 1.28 or 1.29).
$ eksctl utils describe-addon-versions --kubernetes-version 1.27 --name aws-efs-csi-driver | grep \"AddonVersion\"
This will produce output containing the available EFS CSI versions, like this:
"AddonVersion": "v1.6.0-eksbuild.1",
"AddonVersion": "v1.5.9-eksbuild.1",
"AddonVersion": "v1.5.8-eksbuild.1",
After choosing a version (usually the newest one is preferable), use it in the create addon command. In this case, the version is v1.6.0-eksbuild.1.
$ eksctl create addon --cluster $CLUSTER_NAME --name aws-efs-csi-driver --version v1.6.0-eksbuild.1 \
--service-account-role-arn arn:aws:iam::$AWS_ACCOUNT_ID:role/EKS_EFS_CSI_DriverRole --force
Note the use of EKS_EFS_CSI_DriverRole, which should match the name of the role created in Step 2.
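Picking the newest version from the describe-addon-versions output can also be scripted. This sketch assumes the output lines have the "AddonVersion": "..." shape shown earlier in this step and that sort supports -V (version sort, as in GNU coreutils); latest_addon_version is a hypothetical helper name.

```shell
# Hypothetical helper: reads "AddonVersion" lines on stdin and prints the newest version.
latest_addon_version() {
  sed -n 's/.*"AddonVersion": *"\([^"]*\)".*/\1/p' | sort -V | tail -n 1
}

# Example with sample lines in arbitrary order; prints v1.6.0-eksbuild.1
printf '%s\n' \
  '"AddonVersion": "v1.5.9-eksbuild.1",' \
  '"AddonVersion": "v1.6.0-eksbuild.1",' \
  '"AddonVersion": "v1.5.8-eksbuild.1",' | latest_addon_version
```

The same function could be placed at the end of the eksctl utils describe-addon-versions pipeline in place of grep.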
This command can be used to check that the EFS driver was installed successfully:
$ eksctl get addon --cluster $CLUSTER_NAME | grep efs
The output should contain the role ARN and the version that was installed, for example:
aws-efs-csi-driver v1.6.0-eksbuild.1 ACTIVE 0 arn:aws:iam::<account_id>:role/EKS_EFS_CSI_DriverRole
- Deploy the Amazon EBS CSI addon to the cluster. Use the name of the role (EKS_EBS_CSI_DriverRole) that was created in Step 3.
$ aws eks create-addon --cluster-name $CLUSTER_NAME --addon-name aws-ebs-csi-driver \
--service-account-role-arn arn:aws:iam::$AWS_ACCOUNT_ID:role/EKS_EBS_CSI_DriverRole
This command can be used to check that the EBS driver was installed successfully:
$ eksctl get addon --cluster $CLUSTER_NAME | grep ebs
The output should contain the role ARN and the version that was installed, for example:
aws-ebs-csi-driver v1.22.0-eksbuild.1 ACTIVE 0 arn:aws:iam::<account_id>:role/EKS_EBS_CSI_DriverRole
- Now that the roles and EKS addons are installed, the EFS can be created. The EFS is used to persist data across pod restarts as well as share data between pods.
First, retrieve the ID of the VPC in which the EKS cluster was created, so that the EFS can be created in the same VPC.
$ VPC_ID=$(aws eks describe-cluster \
--name $CLUSTER_NAME \
--query "cluster.resourcesVpcConfig.vpcId" \
--output text)
The VPC ID is now in the VPC_ID variable and can be used in subsequent commands.
Next, the CIDR range of the VPC is required, which is fetched with this command and stored in the CIDR_RANGE variable:
$ CIDR_RANGE=$(aws ec2 describe-vpcs \
--vpc-ids $VPC_ID \
--query "Vpcs[].CidrBlock" \
--output text \
--region $AWS_REGION_ID)
A new security group is created that allows NFS ingress to the EFS (port 2049).
First, create the security group, saving its ID into the SECURITY_GROUP_ID variable:
$ SECURITY_GROUP_ID=$(aws ec2 create-security-group \
--group-name DmEfsSecurityGroup \
--description "EFS security group for DataMasque" \
--vpc-id $VPC_ID \
--query 'GroupId' \
--output text)
Then ingress on port 2049 is granted to the CIDR range.
$ aws ec2 authorize-security-group-ingress \
--group-id $SECURITY_GROUP_ID \
--protocol tcp \
--port 2049 \
--cidr $CIDR_RANGE
Next, the file system can be created.
The file system ID is stored in the FILE_SYSTEM_ID variable and is required later when configuring the deployment manifest.
$ FILE_SYSTEM_ID=$(aws efs create-file-system \
--region $AWS_REGION_ID --encrypted \
--performance-mode generalPurpose \
--tags Key=Name,Value=datamasque-eks-efs-file-system \
--query 'FileSystemId' \
--output text)
Display the file system ID using echo:
$ echo $FILE_SYSTEM_ID
fs-11223344556677889
Retain this value for later.
- Mount targets for the EFS need to be created. These are IP addresses assigned to the EFS instance in the subnets where it should be made available.
To get a list of available subnets, use the describe-subnets command (this is not necessary if you already know the subnets you want to add the EFS to).
$ aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=$VPC_ID" \
--query 'Subnets[*].{SubnetId: SubnetId,AvailabilityZone: AvailabilityZone,CidrBlock: CidrBlock}' \
--output table
This command assumes that the VPC_ID variable from Step 7 is still in scope.
After you know the IDs of the subnets in which the EFS should be available, execute create-mount-target for each subnet.
$ aws efs create-mount-target \
--file-system-id $FILE_SYSTEM_ID \
--subnet-id <subnet_id> \
--security-groups $SECURITY_GROUP_ID
The variables FILE_SYSTEM_ID and SECURITY_GROUP_ID are expected to still be in scope from Step 7.
This command must be run once for each subnet that the EFS should be added to (which may be only some of the subnets listed by the previous command).
For example, for subnets subnet-11111111111111111 and subnet-22222222222222222, execute:
$ aws efs create-mount-target \
--file-system-id $FILE_SYSTEM_ID \
--subnet-id subnet-11111111111111111 \
--security-groups $SECURITY_GROUP_ID
$ aws efs create-mount-target \
--file-system-id $FILE_SYSTEM_ID \
--subnet-id subnet-22222222222222222 \
--security-groups $SECURITY_GROUP_ID
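When there are several subnets, the repeated command can be wrapped in a loop. The sketch below is one possible shape rather than part of the DataMasque tooling: create_mount_targets and the AWS_CMD override are hypothetical names, and setting AWS_CMD to echo gives a dry run that prints the commands instead of calling AWS.

```shell
# Hypothetical helper: runs create-mount-target once per subnet.
# Arguments: <file_system_id> <security_group_id> <subnet_id>...
# AWS_CMD is a hypothetical override; it defaults to the real aws CLI.
create_mount_targets() {
  fs_id="$1"
  sg_id="$2"
  shift 2
  for subnet_id in "$@"; do
    "${AWS_CMD:-aws}" efs create-mount-target \
      --file-system-id "$fs_id" \
      --subnet-id "$subnet_id" \
      --security-groups "$sg_id"
  done
}

# Example usage (dry run: prints the commands instead of executing them)
AWS_CMD=echo
create_mount_targets fs-11223344556677889 sg-00000000000000000 \
  subnet-11111111111111111 subnet-22222222222222222
```

Unset AWS_CMD and pass $FILE_SYSTEM_ID and $SECURITY_GROUP_ID to run the real commands.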
- Create an access point for the EFS. The user ID and group ID are both set to 1000 to match the user inside the DataMasque containers.
$ FILE_SYSTEM_AP_ID=$(aws efs create-access-point \
--file-system-id $FILE_SYSTEM_ID \
--posix-user Uid=1000,Gid=1000 \
--root-directory Path='/datamasque,CreationInfo={OwnerUid=1000,OwnerGid=1000,Permissions=777}' \
--output text \
--query 'AccessPointId')
Display the file system access point ID using echo:
$ echo $FILE_SYSTEM_AP_ID
fsap-00112233445566778
Retain this value for use when executing the eks-prepare-manifest.sh script (later in the deployment configuration).
- DataMasque runs inside the datamasque EKS namespace. Add the namespace using kubectl:
$ kubectl create namespace datamasque
- Assign an IAM role to the EKS cluster using a service account. First, an IAM policy for the role must exist. You will need this IAM policy's ARN.
Create the service account with the policy attached using the eksctl create iamserviceaccount command, as follows:
$ eksctl create iamserviceaccount \
--name datamasque-sa \
--namespace datamasque \
--role-name eks-datamasque-sa-role \
--approve \
--cluster <eks_cluster_name> \
--attach-policy-arn <policy_arn>
You will need to specify the eks_cluster_name (the name of the EKS cluster) and the policy_arn, which is the ARN of the IAM policy to attach to the service account's role.
The EKS and EFS configuration is now complete, and DataMasque can be deployed to the cluster.
DataMasque Deployment
In this section, DataMasque images are uploaded to ECR, then the DataMasque manifests are generated and deployed to EKS. This corresponds to steps C to E in the high level overview.
Ensure Docker is installed on the machine executing these instructions.
You must have your AWS account's ECR host name, which is normally in the format <AWS account id>.dkr.ecr.<AWS region>.amazonaws.com, for example: 123456789012.dkr.ecr.us-east-1.amazonaws.com.
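The host name can be assembled from the account ID and region. This is a minimal sketch of the format described above; ecr_host is a hypothetical helper name.

```shell
# Hypothetical helper: builds the ECR host name from an account ID and a region.
ecr_host() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Example; prints 123456789012.dkr.ecr.us-east-1.amazonaws.com
ecr_host 123456789012 us-east-1
```

If the variables from the Cluster Configuration section are still set, ecr_host "$AWS_ACCOUNT_ID" "$AWS_REGION_ID" gives the host for your account.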
You will also need:
- The ARN of the IAM role for EFS (created in Step 2 above).
- The EFS file system ID (from Step 7 above). It begins with fs-.
- The EFS access point ID (from Step 9 above). It begins with fsap-.
- Extract the DataMasque Docker package and cd into the installation directory.
$ tar -xvzf datamasque-docker-v<version>.pkg
$ cd datamasque/<version>/
- Before pushing to ECR, docker must be authenticated to your account's ECR host name. The AWS private registry authentication guide has instructions for setting up authentication. In general the command is in this format:
$ docker login -u AWS -p $(aws ecr get-login-password) <ecr_host>
For example:
$ docker login -u AWS -p $(aws ecr get-login-password) 123456789012.dkr.ecr.us-east-1.amazonaws.com
Note that if you normally require sudo to execute docker, then prepend it to the above command, e.g.:
$ sudo docker login -u AWS -p $(aws ecr get-login-password) <ecr_host>
After authenticating, the images can be loaded and pushed to ECR.
- DataMasque images must be loaded into Docker on the local machine before they can be pushed to ECR. The eks-image-push.sh script performs both of these steps. It must be called with your ECR host as the first and only argument. For example:
$ ./eks-image-push.sh 123456789012.dkr.ecr.us-east-1.amazonaws.com
This script will load the images into the local Docker and then push them to the specified ECR, tagged with the current DataMasque version and build number. It may take a few minutes depending on the internet connection speed.
- DataMasque will generate the Kubernetes manifest with the parameters used when setting up EKS and EFS. The parameters required are:
- The ECR host (the same as in Step 3).
- The ARN of the role for the aws-efs-csi-driver storage controller (usually this is the role created in Step 2 of Cluster Configuration).
- The ARN of the role for the aws-efs-csi-driver account (usually this is the role created in Step 2 of Cluster Configuration). Both ARNs supplied may be the same.
- The volume handle ID, which is made up of the file system ID and the access point ID, joined by two colons. For example fs-11223344556677889::fsap-00112233445566778.
These are passed in as positional arguments to the script eks-prepare-manifest.sh.
$ ./eks-prepare-manifest.sh -i 123456789012.dkr.ecr.us-east-1.amazonaws.com \
arn:aws:iam::123456789012:role/EKS_EFS_CSI_DriverRole \
arn:aws:iam::123456789012:role/EKS_EFS_CSI_DriverRole \
fs-11223344556677889::fsap-00112233445566778
If you are in the same shell as you used to create the cluster, and the variables are still in scope, you can use them in the command instead of the literal values:
$ ./eks-prepare-manifest.sh -i 123456789012.dkr.ecr.us-east-1.amazonaws.com \
arn:aws:iam::123456789012:role/EKS_EFS_CSI_DriverRole \
arn:aws:iam::123456789012:role/EKS_EFS_CSI_DriverRole \
"${FILE_SYSTEM_ID}::${FILE_SYSTEM_AP_ID}"
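The role ARN and volume handle arguments follow fixed patterns, so they can be built and sanity-checked before calling the script. The sketch below assumes only the formats described in this guide (IDs beginning with fs- and fsap-, joined by two colons); role_arn and volume_handle are hypothetical helper names.

```shell
# Hypothetical helper: builds an IAM role ARN from an account ID and a role name.
role_arn() {
  printf 'arn:aws:iam::%s:role/%s\n' "$1" "$2"
}

# Hypothetical helper: joins a file system ID and an access point ID with two colons,
# after checking that each has the expected prefix.
volume_handle() {
  case "$1" in
    fs-*) ;;
    *) echo "file system ID should begin with fs-" >&2; return 1 ;;
  esac
  case "$2" in
    fsap-*) ;;
    *) echo "access point ID should begin with fsap-" >&2; return 1 ;;
  esac
  printf '%s::%s\n' "$1" "$2"
}

# Example with the placeholder values used in this guide
role_arn 123456789012 EKS_EFS_CSI_DriverRole
volume_handle fs-11223344556677889 fsap-00112233445566778
```

Swapping the two IDs by mistake makes volume_handle fail with a message instead of producing a malformed handle.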
Executing this script will also generate a random password to be used for DataMasque's internal database. This should be retained as it is required when updating to a newer version of DataMasque on EKS.
After this script executes, the Kubernetes manifest files agent-queue-manifest.yaml, datamasque-manifest.yaml and storage-settings.yaml will be generated into the eks-manifest directory.
If there were any mistakes made during the generation process, the script can be re-run with correct arguments provided. The script only generates the manifest files – it does not apply them.
- The generated manifests can now be deployed.
Note: It may take up to five minutes before the DNS entries for a new EFS mount target are available. Please wait five minutes between running the aws efs create-mount-target command(s) (Step 8 in the previous section) and the kubectl apply command (below).
Use kubectl to deploy the generated manifests:
$ kubectl apply -f eks-manifest
After deployment, check if the pods are ready:
$ kubectl get pods --all-namespaces
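To spot pods that are not yet ready without reading the whole listing, the output can be filtered on its STATUS column. This sketch assumes the default column layout of kubectl get pods --all-namespaces --no-headers (NAMESPACE, NAME, READY, STATUS, ...); pods_not_running is a hypothetical helper, and masque-agent-0 in the sample data is an invented pod name.

```shell
# Hypothetical helper: reads pod listing lines on stdin and prints
# "namespace/name status" for pods that are neither Running nor Completed.
pods_not_running() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Example with sample listing lines; prints: datamasque/masque-agent-0 Pending
printf '%s\n' \
  'datamasque admin-db-0 1/1 Running 0 2m' \
  'datamasque masque-agent-0 0/1 Pending 0 2m' | pods_not_running

# Against a live cluster:
# kubectl get pods --all-namespaces --no-headers | pods_not_running
```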
If the pods take more than five minutes to enter Running status, these commands can help to troubleshoot.
- To find more information about why a pod is stuck in Pending status, use the describe pod command. For example, to see information about the pod admin-db-0:
$ kubectl describe pod admin-db-0
- To see EKS events, use kubectl get events. Use grep to filter for events for a particular pod. For example, to see events just for admin-db-0:
$ kubectl get events | grep admin-db-0
- If any pods are unable to find persistent volume claims (PVCs) then you will see errors regarding PVCs when describing the pod or in the event list. Use the get pvc command to check if all PVCs have been bound.
$ kubectl get pvc --all-namespaces
This should show the EBS and EFS volumes created during cluster setup (and may also contain any other volumes already attached to the EKS cluster).
For example:
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
default ebs-claim Bound pvc-00000000-1111-2222-3333-444444444444 20Gi RWO ebs-sc 11s
default efs-claim Bound efs-pv 40Gi RWX efs-sc 11s
The volume ID for ebs-claim will differ.
Both volumes should have the status Bound. If they do not, check the IAM roles and permissions assigned to the addons in Steps 5 and 6 of the cluster setup instructions.
Also check that the correct IAM roles were used when generating the manifests.
- For more information about why a PVC is not Bound, use the describe pvc command. For example, to describe the ebs-claim PVC:
$ kubectl describe pvc ebs-claim
This will give more detailed information about any errors with the PVC.
Once all DataMasque pods are available, you can continue with installation.
- The IP address of the cluster can be found by checking the IP address(es) of the EC2 node instance for the EKS cluster. Visit this IP address in a browser (e.g. https://<Node IP>) to finish DataMasque initial setup.
Increasing the number of masque-agent pods
DataMasque can perform masking runs faster by executing tasks using multiple workers, or by running multiple tasks in parallel (see the performance optimisation documentation). When using EKS, worker tasks can be balanced across multiple nodes by starting multiple agent pods.
By default, the generated manifest eks-manifest/datamasque-manifest.yaml only runs one masque-agent instance.
If your EKS cluster has enough resources, the number of replicas can be increased. Each additional masque-agent pod requires 1200m of CPU and 5Gi of memory.
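The total resources needed by the agent pods scale linearly with the replica count. The following sketch simply multiplies the per-pod figures stated above; agent_resources is a hypothetical helper name, and the result covers the masque-agent pods only, not the other DataMasque pods.

```shell
# Hypothetical helper: prints the total CPU and memory needed by N masque-agent pods,
# using the per-pod figures of 1200m CPU and 5Gi memory.
agent_resources() {
  replicas="$1"
  echo "$((replicas * 1200))m CPU, $((replicas * 5))Gi memory"
}

# Example; prints: 3600m CPU, 15Gi memory
agent_resources 3
```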
To increase the number of replicas, locate the section in the generated manifest file eks-manifest/datamasque-manifest.yaml that specifies the replicas for masque-agent; it looks like this and has a comment to guide you:
replicas: 1 # Increase the number of replicas to be able to run more workers or parallel tasks at once
Increase the number of replicas, for example, to 2:
replicas: 2 # Increase the number of replicas to be able to run more workers or parallel tasks at once
Then re-apply the manifests:
$ kubectl apply -f eks-manifest
This will add more masque-agent workers without affecting the other pods.
You can also change the number of replicas back to 1 to scale down the pods; however, you should not change the number of replicas while a masking run is in progress.
Upgrading DataMasque on EKS
To upgrade to a newer version of DataMasque, a new EKS cluster does not need to be created.
The upgrade is performed by pushing new versions of the images to ECR, generating a new manifest YAML (containing the new DataMasque version tags), and then using kubectl apply to deploy.
Provided the same EFS volume is used, all data will be retained on upgrade.
- Extract the new DataMasque package version.
$ tar -xvzf datamasque-docker-v<version>.pkg
$ cd datamasque/<version>/
- Push the new DataMasque images to ECR, using the eks-image-push.sh script. Provide your ECR address as the argument.
$ ./eks-image-push.sh 123456789012.dkr.ecr.us-east-1.amazonaws.com
Be sure that Docker has been authenticated to push to ECR before running this command.
- Generate the updated manifest files using the eks-prepare-manifest.sh script. The -u flag must be specified to indicate that this is an upgrade; the script will then prompt for the internal admin database password (rather than generating a new one).
For example:
$ ./eks-prepare-manifest.sh -u 123456789012.dkr.ecr.us-east-1.amazonaws.com \
arn:aws:iam::123456789012:role/EKS_EFS_CSI_DriverRole \
arn:aws:iam::123456789012:role/EKS_EFS_CSI_DriverRole \
fs-11223344556677889::fsap-00112233445566778
You will be prompted to enter the current internal admin database password for the cluster. This password must match the password generated during the initial installation. The password cannot be validated until the configuration is applied, so be sure it is entered correctly to avoid downtime during deployment.
If you enter an incorrect password, other incorrect parameters, or forget the -u flag, it is safe to re-execute eks-prepare-manifest.sh. The script only generates configuration; it does not apply it.
- Deploy the updated manifest files with kubectl:
$ kubectl apply -f eks-manifest
Wait for the pods to be ready:
$ kubectl get pods --all-namespaces
After the pods are ready, connect to DataMasque with the same IP address or hostname as used previously.