DataMasque Portal

EKS Troubleshooting

  1. Pods are terminated during a masking run, leaving stuck runs
  2. Multi-Attach error for volume
  3. DataMasque fails to open with a 'Disallowed host' error

1. Pods are terminated during a masking run

Problem

Pods are terminated during a masking run, leaving some runs stuck.

Solution

Generally, if pods are terminated, they will restart automatically (after being rescheduled by EKS). In some circumstances, if a run is in progress when the masque-agent pod restarts, the run may become stuck in a Running or Cancelling state.

To fix stuck runs:

  • If the run appears to still be Running, Cancel it using the DataMasque web UI. It should move to a Cancelling state, and then to Cancelled within a couple of minutes.

  • If the run stays in the Cancelling state for more than five minutes, restart the masque-0 pod by deleting it using kubectl:

    kubectl delete pod masque-0

    The pod will automatically be rescheduled by EKS and will clear any Cancelling runs when it starts.
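
To confirm the pod has been rescheduled, you can watch its status until it is Running again (a minimal check; the pod name matches the one deleted above):

    # Watch the pod until it is rescheduled and reaches Running (Ctrl+C to stop watching)
    kubectl get pod masque-0 --watch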

2. Multi-Attach error for volume

Problem

EC2 nodes are terminated. When new nodes are created, pods do not start.

Solution

If a pod fails to start (it is stuck in the ContainerCreating status), use kubectl to describe the pod in question. For example, to describe the admin-db-0 pod:

$ kubectl describe pods admin-db-0

You should see a reason for the pod not starting.
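
If you are not sure which pod is affected, listing all pods shows each pod's status; any pod stuck in ContainerCreating will stand out (a general check, not specific to DataMasque):

    # List all pods with their status and the node each is scheduled on
    kubectl get pods -o wide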

If the output of kubectl describe shows an error similar to this:

Multi-Attach error for volume "pvc-<uuid>"
Volume is already exclusively attached to one node and can't be attached to another node.

Then the EBS volume is still attached to the terminated node. This usually resolves itself within ten minutes, once AWS detaches the volume from the terminated instance, and the pod will then start automatically.
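
To check whether the stale attachment is still present, one option (assuming the EBS CSI driver manages the volumes, which is typical on EKS) is to list the cluster's VolumeAttachment objects:

    # The attachment for the old node should disappear once the volume is detached
    kubectl get volumeattachments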

3. DataMasque fails to open with a 'Disallowed host' error

Problem

EC2 nodes are terminated. When new nodes are created, they have a new IP address, which causes a Disallowed host error when accessing the DataMasque web UI.

Solution

Run these commands on a machine that has kubectl installed and is configured to use the EKS cluster that needs updating. Information on configuring kubectl to use a specific EKS cluster can be found in the AWS documentation on creating or updating a kubeconfig file.
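
For example, a typical command to point kubectl at an EKS cluster looks like the following (the region and cluster name are placeholders for your environment):

    # Merge the cluster's connection details into the local kubeconfig
    aws eks update-kubeconfig --region <region> --name <cluster-name>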

  1. Run the following command to reset the allowed hosts:

    kubectl exec -it masque-0 -- bash -c 'python3 reset_allowed_hosts.py'

  2. Visit the target EKS IP address to log in to DataMasque, then navigate to the Settings page and update the allowed hosts to the current IP address.
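
If you are unsure which address to use, listing the cluster's services shows how the DataMasque web UI is exposed (the service name and type depend on your deployment, so treat this as a general check):

    # Show services with their cluster and external addresses
    kubectl get svc -o wide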