In-Flight Masking Basic Setup and Use
This page illustrates a basic setup and use of in-flight masking. It is intended as a tutorial that can be followed to set up an in-flight mask with an example ruleset, and learn the in-flight masking concepts.
- First it fetches a JWT to authenticate.
- Next, it creates a simple ruleset plan that uses a from_file mask to select random first names from a file.
- Some data is then POSTed to the ruleset plan, and the masked data received.
- The page concludes by introducing advanced features like hashing and run secret control.
Detailed explanations of specific concepts are linked throughout this guide.
Getting Started
curl and jq
The examples on this page use the command line tool curl to make HTTP requests.
It's also recommended to install the jq tool for formatting JSON output, making it easier to read.
The output from the curl command can be piped through jq to pretty-print the JSON.
For example, without jq:
$ curl -X POST "https://<your-dm-instance>/api/auth/jwt/login/" \
-H "Content-Type: application/json" \
-d '{"username": "<your username>", "password": "<your password>"}'
{"refresh_token": "eyJhbGc…JlRU","access_token": "eyJhbGci…_0z1","token_type": "Bearer"}
And with the output of curl piped to jq:
$ curl -X POST "https://<your-dm-instance>/api/auth/jwt/login/" \
-H "Content-Type: application/json" \
-d '{"username": "<your username>", "password": "<your password>"}' | jq
{
"refresh_token": "eyJhbGc…JlRU",
"access_token": "eyJhbGci…_0z1",
"token_type": "Bearer"
}
Each value in the returned JSON is indented and on its own line, making it easier to read.
Self-signed Certificates
If your DataMasque instance is configured with the default or other self-signed SSL certificates, then use the -k flag with curl to disable certificate verification.
For example, notice -k added as the last argument:
$ curl -X POST "https://<your-dm-instance>/api/auth/jwt/login/" \
-H "Content-Type: application/json" \
-d '{"username": "<your username>", "password": "<your password>"}' \
-k
Authentication
First, authenticate with a username and password to fetch a JWT.
$ curl -X POST "https://<your-dm-instance>/api/auth/jwt/login/" \
-H "Content-Type: application/json" \
-d '{"username": "<your username>", "password": "<your password>"}'
It will output:
{
"refresh_token": "eyJhbGc…JlRU",
"access_token": "eyJhbGci…_0z1",
"token_type": "Bearer"
}
Note: JWT tokens in this document are truncated for brevity.
Save this access_token for use in subsequent examples.
The JWT will be included in the Authorization header.
This is specified with a -H option, with the token preceded by the word Bearer. For example:
-H "Authorization: Bearer eyJhbGci…_0z1"
Using environment variables
Instead of including the literal access token in each request in these examples, we will store it in the JWT variable in the shell.
This can be done by executing, for example:
$ JWT="eyJhbGci…_0z1"
Where eyJhbGci…_0z1 is the full access_token.
Shortcut using jq
If you have installed jq, then authenticating and setting the access_token to a variable can be achieved by running a single command:
$ JWT=$(curl -X POST "https://<your-dm-instance>/api/auth/jwt/login/" \
-H "Content-Type: application/json" \
-d '{"username": "<your username>", "password": "<your password>"}' | jq -r '.access_token')
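For scripted use, the same login step could be sketched in Python using only the standard library. The helper names below (build_login_request, extract_access_token, bearer_header) are illustrative and not part of DataMasque; the endpoint path and JSON fields are taken from the curl examples above. The request is only constructed here, so sending it (for example with urllib.request.urlopen) is left to the caller.

```python
import json
import urllib.request


def build_login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the JWT login request shown in the curl example."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/auth/jwt/login/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_access_token(response_body: str) -> str:
    """Return the access_token field from the login response JSON."""
    return json.loads(response_body)["access_token"]


def bearer_header(access_token: str) -> dict:
    """Authorization header in the form used by the later requests."""
    return {"Authorization": "Bearer " + access_token}
```

extract_access_token plays the same role as jq -r '.access_token' in the shell shortcut.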
Creating a ruleset plan
A ruleset plan is an API endpoint created from a YAML ruleset that masks data according to specified rules. It also includes some configuration to control how the ruleset executes.
Upload Formats
Ruleset plans can be created by sending a POST request to https://<your-dm-instance>/ifm/ruleset-plans/.
There are two ways of uploading a YAML ruleset to this URL.
- By including it as a file attachment as part of a multipart form. The other configuration options are included as a JSON encoded string in the configuration form field.
- By encoding the YAML content as JSON and embedding it in the configuration body. The entire body is then sent as JSON to the URL.
Method one is easier to use with curl, and that is what will be used in the next example.
Method two should be used when creating masking endpoints programmatically, where the YAML string can be easily JSON encoded. The request structure is documented on the In-Flight API Reference page.
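As a rough illustration of method two, the sketch below JSON-encodes a YAML ruleset into a request body with Python's standard library. The field name ruleset_yaml is an assumption borrowed from the create response shown later on this page, and build_create_body is a hypothetical helper; the actual request structure should be taken from the In-Flight API Reference.

```python
import json

# Example ruleset; the indentation here is ordinary YAML nesting.
RULESET_YAML = """version: "1.0"
rules:
  - masks:
      - type: "from_file"
        seed_file: "DataMasque_firstNames_mixed.csv"
"""


def build_create_body(name: str, ruleset_yaml: str) -> str:
    """JSON body for creating a ruleset plan with the YAML embedded as a string.

    ASSUMPTION: the YAML is carried in a "ruleset_yaml" field. Verify this
    against the In-Flight API Reference before relying on it.
    """
    return json.dumps({"name": name, "ruleset_yaml": ruleset_yaml})
```

json.dumps escapes the newlines and double quotes in the YAML, which is the error-prone part when the body is assembled by hand.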
Creating the ruleset plan
This example uses a simple ruleset that masks data using random first names:
version: "1.0"
rules:
- masks:
- type: "from_file"
seed_file: "DataMasque_firstNames_mixed.csv"
Note: The version indicates the version of the ruleset schema that the ruleset conforms to. It is not the version of the ruleset, and should not be confused with the serial of the ruleset plan (introduced later in this document).
To create the ruleset plan, save this YAML as first_name_mask.yaml and run this curl command:
$ curl -X POST https://<your-dm-instance>/ifm/ruleset-plans/ \
-H "Authorization: Bearer $JWT" \
-H "Content-Type: multipart/form-data" \
-F 'configuration={"name": "first-name-mask"}' \
-F "ruleset=@first_name_mask.yaml"
The file first_name_mask.yaml is uploaded in the ruleset form value.
The @ symbol preceding the file name tells curl to upload the contents of the file.
The configuration is supplied as a JSON object, which contains the name of the ruleset plan to create.
It's sent in the configuration form field.
Note: when quoting arguments to curl, make sure to use double quotes when inserting the $JWT variable, otherwise the token won't be interpolated. Single quotes should be used around JSON as it contains double quotes inside it.
When creating a ruleset plan, DataMasque appends a unique 6-character code to the provided name to ensure uniqueness.
An example response is shown below:
{
"name": "first-name-mask-5zykqn",
"created_time": "2024-12-09T21:20:26+0000",
"modified_time": "2024-12-09T21:20:26+0000",
"serial": 0,
"ruleset_yaml": "version: \"1.0\"\nrules:\n- masks:\n - type: \"from_file…",
"options": {
"enabled": true,
"default_encoding": "json",
"default_charset": "utf-8",
"default_log_level": "WARNING"
},
"logs": [],
"url": "https://<your-dm-instance>/ifm/ruleset-plans/first-name-mask-5zykqn/"
}
The response includes the ruleset plan's full url, which you'll use for all future operations (masking data, modifications, or deletion).
While the response contains other fields (documented in the In-Flight API Reference), for these examples we'll just need the url.
Masking Data
Once you have created a ruleset plan, you can mask data by making POST requests to its /mask/ URL.
For this example, we'll use the ruleset plan created earlier at https://<your-dm-instance>/ifm/ruleset-plans/first-name-mask-5zykqn/mask/.
You should use the url from your created ruleset plan, with mask/ appended to it.
The data to be masked must be sent as a JSON object with a data field containing an array of values.
If masking a single value, it should still be contained in an array.
For example:
{"data": ["Brian"]}
Send the data using curl:
$ curl -X POST "https://<your-dm-instance>/ifm/ruleset-plans/first-name-mask-5zykqn/mask/" \
-H "Authorization: Bearer $JWT" \
-H "Content-Type: application/json" \
-d '{"data": ["Brian"]}'
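The same masking request could be assembled programmatically. The following is a minimal Python sketch, assuming the plan url and JWT obtained earlier; build_mask_request is a hypothetical helper name, and the request is built but not sent. It appends mask/ to the plan url and wraps a single value in an array, as described above.

```python
import json
import urllib.request


def build_mask_request(plan_url: str, jwt: str, values) -> urllib.request.Request:
    """Build (but do not send) a POST to the ruleset plan's mask/ URL."""
    # A single value must still be sent inside an array.
    if not isinstance(values, (list, tuple)):
        values = [values]
    body = json.dumps({"data": list(values)}).encode("utf-8")
    return urllib.request.Request(
        plan_url.rstrip("/") + "/mask/",
        data=body,
        headers={
            "Authorization": "Bearer " + jwt,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```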
The response will include the masked values and request metadata:
{
"request_id": "721fca71-c211-4faa-80f2-2d5e356ef13a",
"encoding": "json",
"charset": "utf-8",
"logs": [],
"ruleset_plan": {
"name": "first-name-mask-5zykqn",
"serial": 0
},
"data": [
"Tay"
]
}
The response includes:
- Masked values in the data array (maintaining one-to-one mapping with input).
- A unique request_id (automatically generated if not provided).
- Request logs with timestamps and messages.
- Information about the ruleset_plan that performed the masking.
  - The name of the ruleset plan.
  - The serial number of the ruleset plan. This lets you know if masking has been performed with an older version of a ruleset than you expected, which could happen if running in-flight masking on a distributed system (like Kubernetes) and you send a masking request before all instances of DataMasque have updated with the new ruleset.
- Metadata about the encoding (always json) and charset (always utf-8).
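A client might use these fields as sketched below in Python: pairing each input with its masked value via the one-to-one mapping, and optionally checking the serial against the one it expects. The function name and the choice of exceptions are illustrative only.

```python
import json


def pair_with_inputs(inputs, response_body: str, expected_serial=None):
    """Pair each input with its masked value; optionally verify the plan serial."""
    response = json.loads(response_body)
    masked = response["data"]
    if len(masked) != len(inputs):
        raise ValueError("expected one masked value per input")
    serial = response["ruleset_plan"]["serial"]
    if expected_serial is not None and serial != expected_serial:
        # An older (or newer) ruleset than expected performed the masking.
        raise RuntimeError(f"masked with serial {serial}, expected {expected_serial}")
    return list(zip(inputs, masked))
```

With the sample response above and the input ["Brian"], this returns [("Brian", "Tay")].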
Consistent Masking with Run Secrets and Hashing
Overview of Consistency
DataMasque allows consistent masked value generation across different masking operations. This means you can mask the same data in different contexts (in-flight, database, or file masking) and get identical results. To generate consistent output values, use the same run secret and hash value(s) across operations. If consistency across multiple DataMasque instances is required, then the instance secret should be disabled too.
At a high level, given the same instance secret, run secret, and hash value, the output value generated will be the same. Depending on the mask type, the output value may also depend on the input value.
Typically, an identifier (such as a user ID) serves as the hash value, ensuring consistent masking across systems.
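To make the idea concrete, here is a toy Python illustration of deterministic value selection. It is not DataMasque's algorithm: the HMAC construction, the NAMES list, and the pick_name function are all assumptions made for this sketch. It only demonstrates the property described above, namely that the same secrets and hash value always select the same output.

```python
import hashlib
import hmac

# Placeholder replacement values for the illustration.
NAMES = ["Salma", "Emmie", "Tina", "Evie", "Lana"]


def pick_name(instance_secret: str, run_secret: str, hash_value) -> str:
    """Deterministically pick a name for the given secrets and hash value.

    Illustration only: this is NOT how DataMasque derives masked values.
    """
    key = (instance_secret + "\x00" + run_secret).encode("utf-8")
    digest = hmac.new(key, str(hash_value).encode("utf-8"), hashlib.sha256).digest()
    return NAMES[int.from_bytes(digest[:8], "big") % len(NAMES)]
```

Calling pick_name twice with identical arguments returns the same name; changing either secret or the hash value changes the digest, and so may change the selection.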
Consistency With In-Flight Using hash_sources
In-flight rulesets can specify hash_sources.
These are a list of places from which to retrieve the hash value.
Hash values can be:
- A value inside POSTed JSON data, by specifying a json_path.
- A value inside POSTed XML data (XML inside JSON strings), by specifying an xpath.
- A value from a hash_values array in the request, by specifying from_request.
- An entire pre-masked data element from data, by specifying self.
More information about hash_sources, including the YAML syntax, can be found in the Rulesets documentation.
The example below illustrates using from_request as a hash source.
Controlling Run Secrets and Instance Secrets with In-Flight Masking
When DataMasque's in-flight server starts, or when a ruleset plan is updated, the ruleset plan receives a random run secret value. This means that value generation with hashing will be consistent only as long as the ruleset plan is unchanged and DataMasque continues running.
Furthermore, the run secret in this case is unknown, and therefore cannot be used when performing database or file masking.
When sending a masking request, if no run_secret is present in the request, then that random run secret will be used.
As such, controlling consistency across different masking types is impossible while relying on the random run secret.
Instead, to use a specific run secret when masking with in-flight, specify it as the run_secret value in a masking request.
For consistency across multiple DataMasque instances, the instance secret should be disabled, too.
This is done by specifying "disable_instance_secret": true in the masking request.
Note: disable_instance_secret cannot be set without also specifying a run_secret.
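A client-side sketch of these request options in Python is shown below. build_secret_body is a hypothetical helper; the field names come from this page, and the check simply mirrors the rule in the note above so that the mistake is caught before a request is sent. The server remains the authority on what is accepted.

```python
import json


def build_secret_body(data, hash_values=None, run_secret=None,
                      disable_instance_secret=False) -> str:
    """JSON body for a masking request that optionally pins the run secret."""
    # Mirrors the documented rule: disabling the instance secret needs a run secret.
    if disable_instance_secret and run_secret is None:
        raise ValueError("disable_instance_secret requires a run_secret")
    body = {"data": list(data)}
    if hash_values is not None:
        body["hash_values"] = hash_values
    if run_secret is not None:
        body["run_secret"] = run_secret
    if disable_instance_secret:
        body["disable_instance_secret"] = True
    return json.dumps(body)
```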
Now that run secrets have been introduced, let's look at some examples of using hash_sources.
Hashing Examples
The following examples show how to use hash values for consistent masking.
We'll start by configuring a ruleset with hash_sources, then explore how hash values affect masked output.
Finally, we'll see how run secrets and instance secrets provide additional control over the masking process.
First, the ruleset must be updated to add hash_sources.
version: "1.0"
hash_sources:
- source: from_request
rules:
- masks:
- type: "from_file"
seed_file: "DataMasque_firstNames_mixed.csv"
Save this ruleset as first_name_mask_with_hash_sources.yaml.
The example ruleset plan can then be updated with the ruleset:
$ curl -X PATCH https://<your-dm-instance>/ifm/ruleset-plans/first-name-mask-5zykqn/ \
-H "Authorization: Bearer $JWT" \
-H "Content-Type: multipart/form-data" \
-F "ruleset=@first_name_mask_with_hash_sources.yaml"
The response will look like this:
{
"name": "first-name-mask-5zykqn",
"created_time": "2024-12-09T21:20:26+0000",
"modified_time": "2024-12-09T21:33:13+0000",
"serial": 1,
"ruleset_yaml":"version: \"1.0\"\nhash_sources:…",
"options": {
"enabled":true,
"default_encoding": "json",
"default_charset": "utf-8",
"default_log_level": "WARNING"
},
"logs": []
}
Note that the serial number has been incremented.
Now hash_values can be sent as part of the request.
In this case, we will not send a run_secret, so the unknown random run secret will be used.
This still allows for consistency until the endpoint is updated or DataMasque is restarted.
In this example, we send an array of hash_values of equal length to the data being sent.
Note: The curl commands are omitted for these examples; only the bodies of the request and response are shown. Each will be POSTed to the same masking URL that was used earlier.
{
"data": ["Darcy", "Molly", "Evelyn"],
"hash_values": [1283, 1416, 1283]
}
In this example, two of the hash values are the same, so the corresponding masked outputs are also the same:
{
"data": ["Salma", "Emmie", "Salma"]
# … other fields removed for brevity
}
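One way to sanity-check this behaviour from a client is sketched below in Python: the illustrative helper is_consistent confirms that equal hash values always map to equal masked outputs within a response.

```python
def is_consistent(hash_values, masked) -> bool:
    """True if equal hash values always map to equal masked outputs."""
    seen = {}
    for hash_value, output in zip(hash_values, masked):
        # Remember the first output seen for each hash value and compare later ones.
        if seen.setdefault(hash_value, output) != output:
            return False
    return True
```

For the request and response above, is_consistent([1283, 1416, 1283], ["Salma", "Emmie", "Salma"]) is True.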
hash_values may also be a single value instead of an array, but this should only be used if sending a single value for data, as it means all masked values will be the same.
For example, if sending:
{
"data": ["Darcy", "Molly", "Kye"],
"hash_values": 1283
}
The response data would be:
{
"data": ["Salma", "Salma", "Salma"]
# … other fields removed for brevity
}
Each value matches the output produced where the hash value was 1283 before, as that hash value is now applied to each item in data.
Note: hash_values must either be an array of the same length as data, or a scalar value. If an array of a different length than data is provided, an error will occur.
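The shape rule for hash_values can be checked on the client before a request is sent. The Python sketch below uses a hypothetical helper, expand_hash_values, which returns one hash value per data item: an array is accepted only if its length matches, and a scalar is repeated for every item. It mirrors the rule stated in the note and does not replace the server's own validation.

```python
def expand_hash_values(data, hash_values):
    """Return one hash value per data item, following the array-or-scalar rule."""
    if isinstance(hash_values, (list, tuple)):
        if len(hash_values) != len(data):
            raise ValueError(
                f"hash_values has {len(hash_values)} items but data has {len(data)}"
            )
        return list(hash_values)
    # A scalar applies to every item, so all items share one hash value.
    return [hash_values] * len(data)
```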
Run Secret Examples
Now we'll introduce run_secret.
In this example, we will specify a run secret and notice how the output values change even with the same hash_values we saw earlier:
{
"data": ["Darcy", "Molly", "Evelyn"],
"hash_values": [1283, 1416, 1283],
"run_secret": "L7HKnasUhjNC6KsiD7bpo"
}
The output has different values:
{
"data": ["Tina", "Evie", "Tina"]
# … other fields removed for brevity
}
Similarly, if we disable the instance secret for this request, the output will have different values again:
{
"data": ["Darcy", "Molly", "Evelyn"],
"hash_values": [1283, 1416, 1283],
"run_secret": "L7HKnasUhjNC6KsiD7bpo",
"disable_instance_secret": true
}
Response:
{
"data": ["Emmie", "Lana", "Emmie"]
# … other fields removed for brevity
}
Before using run secrets or disabling the instance secret, please refer to Performance Considerations for Run Secrets.
This is the end of the basic setup and introduction. Please refer to the full guides for more information.