
In-Flight Masking Basic Setup and Use

This page illustrates a basic setup and use of in-flight masking. It is a tutorial you can follow to set up an in-flight mask with an example ruleset and learn the main in-flight masking concepts.

  • First it fetches a JWT to authenticate.
  • Next, it creates a simple ruleset plan that uses a from_file mask to select random first names from a file.
  • Some data is then POSTed to the ruleset plan, and the masked data is returned.
  • The page concludes by introducing advanced features like hashing and run secret control.

Detailed explanations of specific concepts are linked throughout this guide.

Getting Started

curl and jq

The examples on this page use the command line tool curl to make HTTP requests. It's also recommended to install the jq tool for formatting JSON output, making it easier to read. The output from the curl command can be piped through jq to pretty-print the JSON.

For example, without jq:

$ curl -X POST "https://<your-dm-instance>/api/auth/jwt/login/" \
-H "Content-Type: application/json" \
-d '{"username": "<your username>", "password": "<your password>"}'

{"refresh_token": "eyJhbGc…JlRU","access_token": "eyJhbGci…_0z1","token_type": "Bearer"}

And with the output of curl piped to jq:

$ curl -X POST "https://<your-dm-instance>/api/auth/jwt/login/" \
-H "Content-Type: application/json" \
-d '{"username": "<your username>", "password": "<your password>"}' | jq

{
  "refresh_token": "eyJhbGc…JlRU",
  "access_token": "eyJhbGci…_0z1",
  "token_type": "Bearer"
}

Each value in the returned JSON is indented and on its own line, making it easier to read.

Self-signed Certificates

If your DataMasque instance uses the default self-signed SSL certificate, or another self-signed certificate, add the -k flag to curl to disable certificate verification.

For example, notice -k added as the last argument:

$ curl -X POST "https://<your-dm-instance>/api/auth/jwt/login/" \
-H "Content-Type: application/json" \
-d '{"username": "<your username>", "password": "<your password>"}' \
-k
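Rather than adding or removing -k by hand, you could wrap curl in a small function that adds the flag only when asked. This is a minimal sketch: dm_curl and DM_INSECURE are hypothetical names (not part of curl or DataMasque), and certificate verification stays on unless DM_INSECURE=1.

```shell
# Hypothetical wrapper around curl: passes -k (skip certificate verification)
# only when DM_INSECURE=1, so verification stays on by default.
dm_curl() {
  if [ "${DM_INSECURE:-0}" = "1" ]; then
    set -- -k "$@"   # prepend -k to the original arguments
  fi
  curl "$@"
}

# Usage with a self-signed instance:
# DM_INSECURE=1 dm_curl -X POST "https://<your-dm-instance>/api/auth/jwt/login/" \
#   -H "Content-Type: application/json" \
#   -d '{"username": "<your username>", "password": "<your password>"}'
```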

Authentication

First, authenticate with a username and password to fetch a JWT.

$ curl -X POST "https://<your-dm-instance>/api/auth/jwt/login/" \
-H "Content-Type: application/json" \
-d '{"username": "<your username>", "password": "<your password>"}'

It will output:

{
  "refresh_token": "eyJhbGc…JlRU",
  "access_token": "eyJhbGci…_0z1",
  "token_type": "Bearer"
}

Note: JWTs in this document are truncated for brevity.

Save this access_token for use in subsequent examples.

Subsequent requests include the JWT in the Authorization header, which is set with curl's -H option. The token is preceded by the word Bearer. For example:

-H "Authorization: Bearer eyJhbGci…_0z1"

Using environment variables

Instead of including the literal access token in each request in these examples, we will store it in the JWT variable in the shell.

For example, run:

$ JWT="eyJhbGci…_0z1"

Where eyJhbGci…_0z1 is the full access_token.

Shortcut using jq

If you have installed jq, then authenticating and setting the access_token to a variable can be achieved by running a single command:

$ JWT=$(curl -X POST "https://<your-dm-instance>/api/auth/jwt/login/" \
-H "Content-Type: application/json" \
-d '{"username": "<your username>", "password": "<your password>"}' | jq -r '.access_token')
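The one-liner above stores whatever jq prints, so a failed login leaves JWT set to null (jq's output for a missing key) or empty, and the problem only surfaces in a later request. A minimal sketch of an early check follows; check_jwt is a hypothetical helper name. Adding curl's -s flag to the one-liner also hides the progress meter curl prints when its output is piped.

```shell
# Hypothetical helper: fail early if the login did not yield a usable token.
# jq -r prints the literal string "null" for a missing key, and prints nothing
# on stdout if it cannot parse the response, so both cases are rejected.
check_jwt() {
  if [ -z "$1" ] || [ "$1" = "null" ]; then
    echo "Login failed: no access_token in the response" >&2
    return 1
  fi
}

# Usage, straight after the one-liner above:
# check_jwt "$JWT" || exit 1
```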

Creating a ruleset plan

A ruleset plan is an API endpoint created from a YAML ruleset that masks data according to specified rules. It also includes some configuration to control how the ruleset executes.

Upload Formats

Ruleset plans are created by sending a POST request to https://<your-dm-instance>/ifm/ruleset-plans/.

There are two ways of uploading a YAML ruleset to this URL.

  1. By including it as a file attachment in a multipart form. The other configuration options are included as a JSON-encoded string in the configuration form field.
  2. By encoding the YAML content as a JSON string and embedding it in the configuration body. The entire body is then sent as JSON to the URL.

Method one is easier to use with curl, so the next example uses it.

Method two should be used when creating masking endpoints programmatically, where the YAML string can be easily JSON encoded. The request structure is documented on the In-Flight API Reference page.
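As a rough illustration of method two, jq can read the YAML file and embed it as a JSON string. This sketch assumes the JSON body takes the plan name as name and the YAML text as ruleset_yaml (the field name returned in the ruleset plan responses on this page); both field names are assumptions, so confirm them on the In-Flight API Reference page. build_plan_body is a hypothetical helper.

```shell
# Sketch of upload method two. ASSUMPTION: the JSON body takes the plan name
# as "name" and the YAML text as "ruleset_yaml"; confirm against the
# In-Flight API Reference. jq's --rawfile (jq 1.6+) reads the whole file as
# one string, and jq escapes its quotes and newlines when writing JSON.
build_plan_body() {
  # $1: ruleset plan name, $2: path to the YAML ruleset file
  jq -c -n --arg name "$1" --rawfile yaml "$2" \
    '{name: $name, ruleset_yaml: $yaml}'
}

# Usage: --data-binary @- sends the body from stdin unchanged.
# build_plan_body "first-name-mask" first_name_mask.yaml |
#   curl -X POST "https://<your-dm-instance>/ifm/ruleset-plans/" \
#     -H "Authorization: Bearer $JWT" \
#     -H "Content-Type: application/json" \
#     --data-binary @-
```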

Creating the ruleset plan

This example uses a simple ruleset that masks data using random first names:

version: "1.0"
rules:
- masks:
  - type: "from_file"
    seed_file: "DataMasque_firstNames_mixed.csv"

Note: The version indicates the version of the ruleset schema that the ruleset conforms to. It is not the version of the ruleset, and should not be confused with the serial of the ruleset plan (introduced later in this document).

To create the ruleset plan, save this YAML as first_name_mask.yaml and run this curl command:

$ curl -X POST "https://<your-dm-instance>/ifm/ruleset-plans/" \
-H "Authorization: Bearer $JWT" \
-H "Content-Type: multipart/form-data" \
-F 'configuration={"name": "first-name-mask"}' \
-F "ruleset=@first_name_mask.yaml"

The file first_name_mask.yaml is uploaded in the ruleset form field. The @ symbol before the file name tells curl to upload the contents of the file rather than the file name itself.

The configuration is supplied as a JSON object, which contains the name of the ruleset plan to create. It's sent in the configuration form field.

Note: when quoting arguments to curl, use double quotes around any argument containing the $JWT variable; inside single quotes the shell does not expand the variable, so the token would not be sent. Use single quotes around JSON, since it contains double quotes.

When creating a ruleset plan, DataMasque appends a 6-character code to the provided name so that each plan name is unique.

An example response is shown below:

{
  "name": "first-name-mask-5zykqn",
  "created_time": "2024-12-09T21:20:26+0000",
  "modified_time": "2024-12-09T21:20:26+0000",
  "serial": 0,
  "ruleset_yaml": "version: \"1.0\"\nrules:\n- masks:\n  - type: \"from_file…",
  "options": {
    "enabled": true,
    "default_encoding": "json",
    "default_charset": "utf-8",
    "default_log_level": "WARNING"
  },
  "logs": [],
  "url": "https://<your-dm-instance>/ifm/ruleset-plans/first-name-mask-5zykqn/"
}

The response includes the ruleset plan's full url, which you'll use for all future operations (masking data, modifications, or deletion). While the response contains other fields (documented in the In-Flight API Reference), for these examples we'll just need the url.
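To avoid copying the url by hand, jq can pull it out of the response. In this sketch, plan_url_from_response and the PLAN_URL variable are hypothetical names used for illustration.

```shell
# Hypothetical helper: read a ruleset plan response on stdin and print its
# "url" field, failing if the field is missing (for example, on an error).
plan_url_from_response() {
  local url
  url=$(jq -r '.url // empty')
  if [ -z "$url" ]; then
    echo "No url in the response" >&2
    return 1
  fi
  printf '%s\n' "$url"
}

# Usage: create the plan and keep its url in PLAN_URL.
# PLAN_URL=$(curl -X POST "https://<your-dm-instance>/ifm/ruleset-plans/" \
#   -H "Authorization: Bearer $JWT" \
#   -F 'configuration={"name": "first-name-mask"}' \
#   -F "ruleset=@first_name_mask.yaml" | plan_url_from_response)
```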

Masking Data

Once you have created a ruleset plan, you can mask data by making POST requests to its /mask/ URL.

For this example, we'll use the mask URL of the ruleset plan created earlier: https://<your-dm-instance>/ifm/ruleset-plans/first-name-mask-5zykqn/mask/. Use the url returned for your own ruleset plan, with mask/ appended to it.

The data to be masked must be sent as a JSON object with a data field containing an array of values. A single value must still be wrapped in an array.

For example:

{"data": ["Brian"]}

Send the data using curl:

$ curl -X POST "https://<your-dm-instance>/ifm/ruleset-plans/first-name-mask-5zykqn/mask/" \
-H "Authorization: Bearer $JWT" \
-H "Content-Type: application/json" \
-d '{"data": ["Brian"]}'
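Building the body by hand gets error-prone once values contain quotes. One option, sketched below under the assumption that jq 1.6 or later is installed, is to let jq build the data array from shell arguments; mask_body and PLAN_URL (assumed to hold your plan's url, ending in /) are hypothetical names.

```shell
# Hypothetical helper: build {"data": [...]} from its arguments. A single
# value is still wrapped in an array, and jq escapes any quotes for you.
# Uses jq's --args and $ARGS.positional (jq 1.6+).
mask_body() {
  jq -c -n '{data: $ARGS.positional}' --args "$@"
}

# Usage:
# mask_body "Brian" | curl -X POST "${PLAN_URL}mask/" \
#   -H "Authorization: Bearer $JWT" \
#   -H "Content-Type: application/json" \
#   --data-binary @-
```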

The response will include the masked values and request metadata:

{
  "request_id": "721fca71-c211-4faa-80f2-2d5e356ef13a",
  "encoding": "json",
  "charset": "utf-8",
  "logs": [],
  "ruleset_plan": {
    "name": "first-name-mask-5zykqn",
    "serial": 0
  },
  "data": [
    "Tay"
  ]
}

The response includes:

  • Masked values in the data array (maintaining one-to-one mapping with input).
  • A unique request_id (automatically generated if not provided).
  • Request logs with timestamps and messages.
  • Information about the ruleset_plan that performed the masking.
    • The name of the ruleset plan.
    • The serial number of the ruleset plan. This shows whether masking was performed with an older version of the ruleset than you expected. That can happen when in-flight masking runs on a distributed system (such as Kubernetes) and a masking request arrives before every DataMasque instance has been updated with the new ruleset.
  • Metadata about the encoding (always json) and charset (always utf-8).
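When scripting against the mask/ URL, you may want to extract the masked values and check the serial in one step. The following is a sketch using jq; masked_values is a hypothetical helper, and the expected serial is whatever your last create or update returned.

```shell
# Hypothetical helper: read a mask/ response on stdin, warn on stderr if it
# was produced by a different ruleset plan serial than expected, then print
# each masked value on its own line.
masked_values() {
  # $1: the serial you expect (for example, from your last create or update)
  local response serial
  response=$(cat)
  serial=$(printf '%s' "$response" | jq -r '.ruleset_plan.serial')
  if [ "$serial" != "$1" ]; then
    echo "warning: masked with serial $serial, expected $1" >&2
  fi
  printf '%s' "$response" | jq -r '.data[]'
}
```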

Consistent Masking with Run Secrets and Hashing

Overview of Consistency

DataMasque can generate masked values consistently across different masking operations. This means you can mask the same data in different contexts (in-flight, database, or file masking) and get identical results. To generate consistent output values, use the same run secret and hash value(s) across operations. If consistency across multiple DataMasque instances is required, the instance secret should also be disabled.

At a high level, given the same instance secret, run secret, and hash value, the output value generated will be the same. Depending on the mask type, the output value may also depend on the input value.

Typically, an identifier (such as a user ID) serves as the hash value, ensuring consistent masking across systems.

Consistency With In-Flight Using hash_sources

In-flight rulesets can specify hash_sources. These are a list of places from which to retrieve the hash value. Hash values can be:

  • A value inside POSTed JSON data, by specifying a json_path.
  • A value inside POSTed XML data (XML inside JSON strings), by specifying an xpath.
  • A value from a hash_values array in the request, by specifying from_request.
  • An entire pre-masked data element from data, by specifying self.

More information about hash_sources, including the YAML syntax, can be found in the Rulesets documentation.

The examples below use from_request as the hash source.

Controlling Run Secrets and Instance Secrets with In-Flight Masking

When DataMasque's in-flight server starts, or when a ruleset plan is updated, the ruleset plan receives a random run secret value. This means that value generation with hashing will be consistent only as long as the ruleset plan is unchanged and DataMasque continues running.

Furthermore, this random run secret is unknown, so it cannot be used when performing database or file masking.

If a masking request does not include a run_secret, the random run secret is used. In that case, masked values cannot be made consistent with other masking types.

To use a specific run secret with in-flight masking instead, specify it as the run_secret value in the masking request.

For consistency across multiple DataMasque instances, the instance secret should be disabled, too. This is done by specifying "disable_instance_secret": true in the masking request.

Note: disable_instance_secret cannot be set without also specifying a run_secret.
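To add these fields to an existing request body without hand-editing JSON, a jq-based helper along these lines could be used. with_run_secret and RUN_SECRET are hypothetical names; the helper simply refuses to proceed without a run secret, reflecting the note above.

```shell
# Hypothetical helper: add run_secret (and, if asked, disable_instance_secret)
# to a masking request body read on stdin. It refuses to run without a run
# secret, since disable_instance_secret cannot be set on its own.
with_run_secret() {
  # $1: run secret, $2: "true" to also disable the instance secret
  if [ -z "$1" ]; then
    echo "A run_secret is required" >&2
    return 1
  fi
  if [ "${2:-false}" = "true" ]; then
    jq -c --arg rs "$1" '. + {run_secret: $rs, disable_instance_secret: true}'
  else
    jq -c --arg rs "$1" '. + {run_secret: $rs}'
  fi
}

# Usage:
# echo '{"data": ["Darcy"], "hash_values": [1283]}' |
#   with_run_secret "$RUN_SECRET" true
```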

Now that run secrets have been introduced, let's look at some examples of using hash_sources.

Hashing Examples

The following examples show how to use hash values for consistent masking. We'll start by configuring a ruleset with hash_sources, then explore how hash values affect masked output. Finally, we'll see how run secrets and instance secrets provide additional control over the masking process.

First, the ruleset must be updated to add hash_sources.

version: "1.0"
hash_sources:
  - source: from_request
rules:
- masks:
  - type: "from_file"
    seed_file: "DataMasque_firstNames_mixed.csv"

Save this ruleset as first_name_mask_with_hash_sources.yaml.

Then update the example ruleset plan with this ruleset by sending a PATCH request to its url:

$ curl -X PATCH "https://<your-dm-instance>/ifm/ruleset-plans/first-name-mask-5zykqn/" \
-H "Authorization: Bearer $JWT" \
-H "Content-Type: multipart/form-data" \
-F "ruleset=@first_name_mask_with_hash_sources.yaml"

The response will look like this:

{
  "name": "first-name-mask-5zykqn",
  "created_time": "2024-12-09T21:20:26+0000",
  "modified_time": "2024-12-09T21:33:13+0000",
  "serial": 1,
  "ruleset_yaml": "version: \"1.0\"\nhash_sources:…",
  "options": {
    "enabled": true,
    "default_encoding": "json",
    "default_charset": "utf-8",
    "default_log_level": "WARNING"
  },
  "logs": []
}

Note that the serial number has now been incremented to 1.
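The serial can also be checked from a script. The sketch below assumes PLAN_URL holds the plan's url and JWT a valid token; update_ruleset is a hypothetical helper. It omits the explicit Content-Type header because curl's -F option already sends multipart/form-data.

```shell
# Hypothetical helper: PATCH a new ruleset file onto an existing plan and
# print the serial from the response, so you can confirm it was incremented.
# Assumes PLAN_URL holds the plan's url and JWT holds a valid access token.
update_ruleset() {
  # $1: path to the YAML ruleset file
  curl -sS -X PATCH "$PLAN_URL" \
    -H "Authorization: Bearer $JWT" \
    -F "ruleset=@$1" | jq -r '.serial'
}

# Usage (prints the new serial):
# update_ruleset first_name_mask_with_hash_sources.yaml
```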

Now hash_values can be sent as part of the request. These examples do not send a run_secret, so the unknown random run secret is used. Masking therefore stays consistent only until the ruleset plan is updated or DataMasque is restarted.

In this example, we send an array of hash_values with the same length as the data array.

Note: The curl commands are omitted from these examples; only the request and response bodies are shown. Each request is POSTed to the same mask URL used earlier.

{
  "data": ["Darcy", "Molly", "Evelyn"],
  "hash_values": [1283, 1416, 1283]
}

The first and third hash values are the same (1283), so the first and third masked values are also the same:

{
  "data": ["Salma", "Emmie", "Salma"]
  # … other fields removed for brevity
}
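Since the curl commands are omitted in these examples, here is one way you might send such a body. post_mask is a hypothetical helper, and PLAN_URL is assumed to hold your plan's url (ending in /); the body is read from stdin with --data-binary @- so that curl sends it unchanged.

```shell
# Hypothetical helper: POST a JSON masking request body read from stdin to the
# plan's mask/ URL. Assumes PLAN_URL holds the plan's url (ending in "/") and
# JWT holds a valid access token.
post_mask() {
  curl -sS -X POST "${PLAN_URL}mask/" \
    -H "Authorization: Bearer $JWT" \
    -H "Content-Type: application/json" \
    --data-binary @-
}

# Usage with the request body above:
# echo '{"data": ["Darcy", "Molly", "Evelyn"], "hash_values": [1283, 1416, 1283]}' |
#   post_mask | jq '.data'
```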

hash_values may also be a single value instead of an array. Only do this when data contains a single value, because the same hash value is then applied to every item in data and, with this ruleset, every masked value will be the same.

For example, if sending:

{
  "data": ["Darcy", "Molly", "Kye"],
  "hash_values": 1283
}

The response data would be:

{
  "data": ["Salma", "Salma", "Salma"]
  # … other fields removed for brevity
}

Each masked value matches the output previously produced for hash value 1283, because that hash value is now applied to every item in data.

Note: hash_values must be either a scalar value or an array of the same length as data. Providing an array whose length differs from data causes an error.
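To catch a length mismatch before sending a request, you could run a quick local check with jq. valid_hash_values is a hypothetical helper; it only checks the rule stated in the note above and is no substitute for the server's own validation.

```shell
# Hypothetical pre-flight check: succeed only if hash_values is absent, a
# scalar, or an array with the same length as data. jq -e exits with status 1
# when the final result is false or null.
valid_hash_values() {
  jq -e '(.hash_values | type) != "array"
         or (.hash_values | length) == (.data | length)' >/dev/null
}

# Usage: check a body before sending it.
# valid_hash_values < request.json || echo "hash_values length mismatch" >&2
```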

Run Secret Examples

Now we'll introduce run_secret.

In this example, we specify a run secret. Notice that the output values change, even though the hash_values are the same as before:

{
  "data": ["Darcy", "Molly", "Evelyn"],
  "hash_values": [1283, 1416, 1283],
  "run_secret": "L7HKnasUhjNC6KsiD7bpo"
}

The output has different values:

{
  "data": ["Tina", "Evie", "Tina"]
  # … other fields removed for brevity
}

Similarly, if we disable the instance secret for this request, the output will have different values again:

{
  "data": ["Darcy", "Molly", "Evelyn"],
  "hash_values": [1283, 1416, 1283],
  "run_secret": "L7HKnasUhjNC6KsiD7bpo",
  "disable_instance_secret": true
}

Response:

{
  "data": ["Emmie", "Lana", "Emmie"]
  # … other fields removed for brevity
}

Before using run secrets or disabling the instance secret, please refer to Performance Considerations for Run Secrets.

This concludes the basic setup and introduction. Please refer to the full guides for more information.