In-Flight Masking Rulesets
This document explains how to configure DataMasque's in-flight masking using rulesets. Rulesets define what data should be masked and how the masking should be performed. You'll learn how to structure rulesets, control masking behavior through various settings, and see examples of masking different types of data. The guide starts with simple examples and builds up to more complex scenarios like JSON document masking and hash-based consistent masking.
What are rulesets?
Rulesets are YAML definitions that specify how input values should be masked. Each ruleset contains one or more rules that define masking operations to be applied.
This document introduces ruleset concepts gradually, building on previous examples. It assumes you have read the basic setup and use guide, and have seen how data is masked with a ruleset.
version
Required
A schema version is required in your ruleset.
It must be a quoted string, not a number.
This is not valid:
version: 1.0
This is valid:
version: "1.0"
Currently, the in-flight ruleset schema is version 1.0.
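A client-side pre-check can catch an unquoted version before a ruleset is submitted. The sketch below assumes the ruleset has already been parsed into a Python dict by a YAML parser of your choice; check_version is a hypothetical helper for illustration and is not part of DataMasque.

```python
def check_version(ruleset: dict) -> None:
    """Raise ValueError unless `version` is present and is a string."""
    if "version" not in ruleset:
        raise ValueError("ruleset is missing the required 'version' key")
    version = ruleset["version"]
    if not isinstance(version, str):
        # An unquoted 1.0 in YAML is parsed as a float, not a string.
        raise ValueError(
            f"version must be a quoted string, got {type(version).__name__}"
        )


check_version({"version": "1.0", "rules": []})  # passes silently
```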
rules
Required
rules is required in a ruleset.
It is an array of mask definitions.
A full list of masks can be found in the masking functions documentation.
Unsupported or Partially Supported Masks
In-flight masking supports all DataMasque masks except for:
- from_column
- from_unique
- secure_shuffle
- from_blob
In addition, not all functions of these masks are supported:
- from_file: The table_filter_column and seed_filter_column parameters are not supported, as these only apply to filtering against database columns.
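The restrictions above can be checked before a ruleset is sent. The following is a minimal sketch, assuming the ruleset has been parsed into nested Python dicts and lists; find_unsupported is a hypothetical helper, and the two sets simply restate the lists above.

```python
UNSUPPORTED_MASKS = {"from_column", "from_unique", "secure_shuffle", "from_blob"}
UNSUPPORTED_FROM_FILE_PARAMS = {"table_filter_column", "seed_filter_column"}


def find_unsupported(node, found=None):
    """Recursively collect unsupported mask types and from_file parameters."""
    if found is None:
        found = []
    if isinstance(node, dict):
        mask_type = node.get("type")
        if isinstance(mask_type, str) and mask_type in UNSUPPORTED_MASKS:
            found.append(mask_type)
        if mask_type == "from_file":
            found.extend(sorted(node.keys() & UNSUPPORTED_FROM_FILE_PARAMS))
        # Nested masks (for example inside json transforms) are checked too.
        for value in node.values():
            find_unsupported(value, found)
    elif isinstance(node, list):
        for item in node:
            find_unsupported(item, found)
    return found
```

An empty list means none of the listed restrictions were hit; it does not guarantee that the ruleset is otherwise valid.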
Simple Rule
This ruleset was shown in the basic setup guide. It generates a random first name.
version: "1.0"
rules:
  - masks:
      - type: "from_file"
        seed_file: "DataMasque_firstNames_mixed.csv"
Example input and output
Note: Examples in this document omit any fields that do not affect the behaviour - for example, logs and metadata in responses. Refer to the API documentation on Masking Request Detail Object and Masking Response Detail Object for the full object schema.
Request:
{
"data": ["Darcy", "Molly", "Evelyn"]
}
Response:
{
"data": ["Salma", "Emmie", "Salma"]
}
Chaining Multiple Masks
Where multiple masks are defined in a ruleset, they are applied sequentially - the output of one mask becomes the input to the next mask.
This next example performs an uppercase transform on a randomly selected first name.
version: "1.0"
rules:
  - masks:
      - type: "from_file"
        seed_file: "DataMasque_firstNames_mixed.csv"
      - type: transform_case
        transform: uppercase
Example input and output
Request:
{
"data": ["Darcy", "Molly", "Evelyn"]
}
Response:
{
"data": ["TINA", "LANA", "LEONA"]
}
Note: Since hashing has not been used, the output values are randomly generated on each request.
Mask Chaining Limitations
While masks can be chained together, some combinations may not work as expected:
Masks that generate values (like from_file) ignore their input values entirely.
Chaining two from_file masks will only use the output from the last one:
# The first from_file is effectively ignored
rules:
  - masks:
      - type: "from_file"
        seed_file: "DataMasque_firstNames_mixed.csv" # Output ignored
      - type: "from_file"
        seed_file: "DataMasque_lastNames.csv" # Only this output is used
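The chaining behaviour described above can be modelled in a few lines. This is a toy sketch, not DataMasque's implementation: the from_file and transform_case_upper functions and the short name lists are hypothetical stand-ins for the real masks and seed files.

```python
import random
from functools import reduce


def from_file(values):
    """Build a generating mask: it picks a value and ignores its input."""
    def mask(_input):
        return random.choice(values)
    return mask


def transform_case_upper(value):
    """A transforming mask: its output depends on its input."""
    return value.upper()


def apply_chain(value, masks):
    """Feed the output of each mask into the next one, in order."""
    return reduce(lambda current, mask: mask(current), masks, value)


first_names = ["Tina", "Lana", "Leona"]  # stand-in for a first-name seed file
last_names = ["Smith", "Jones"]          # stand-in for a last-name seed file

# A generator followed by a transform: the transform sees the generated name.
print(apply_chain("Darcy", [from_file(first_names), transform_case_upper]))

# Two generators: the second discards whatever the first produced.
print(apply_chain("Darcy", [from_file(first_names), from_file(last_names)]))
```

In the second call the result is always a value from last_names, which mirrors why the first from_file in the ruleset above has no effect.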
At this point, we've covered:
- Basic ruleset structure with versions and rules.
- Simple value generation with from_file.
- Sequential mask chaining and its limitations.
Next we will look at masking values within JSON documents, allowing you to selectively mask fields while preserving document structure.
Advanced JSON Rules
As well as masking single values, in-flight masking can mask multiple values at once if they are contained in JSON documents.
This is achieved by using a json mask, which applies sub-rules to specified paths in a JSON document.
Before investigating the ruleset YAML, we'll look at the documents to be masked, and explain what masks will be applied.
Consider a JSON document containing user information, where we want to mask personal details while preserving structure:
{
"id": 1,
"first_name": "Alice",
"last_name": "Apples",
"email": "alice.apples@gmail.com",
"identifier": "AAA111"
}
In this example, first_name and last_name will be masked with from_file masks.
These rules are sub-masks of a json mask.
This is the ruleset:
version: "1.0"
rules:
  - masks:
      - type: json
        transforms:
          - path: [first_name]
            hash_sources:
              - json_path: [.., id]
            masks:
              - type: from_file
                seed_file: DataMasque_firstNames_mixed.csv
                seed_column: firstname-mixed
          - path: [last_name]
            hash_sources:
              - json_path: [.., id]
            masks:
              - type: from_file
                seed_file: DataMasque_lastNames.csv
                seed_column: lastnames
Note that each transform includes hash_sources configured to use the document's id field. This ensures consistent masking across systems for the same user ID.
These transform-level hash_sources override any hash_sources set at the top level of the ruleset.
We will expand on this ruleset later to mask other elements.
Example input and output
In this example, multiple data values will be sent at once.
Request:
{
"data": [
{
"id": 1,
"first_name": "Alice",
"last_name": "Apples",
"email": "alice.apples@gmail.com",
"identifier": "AAA111"
},
{
"id": 2,
"first_name": "Bob",
"last_name": "Boris",
"email": "bob.boris@gmail.com",
"identifier": "BXZ888"
}
]
}
Response:
{
"data": [
{
"id": 1,
"first_name": "Verena",
"last_name": "Grazina",
"email": "alice.apples@gmail.com",
"identifier": "AAA111"
},
{
"id": 2,
"first_name": "Joanny",
"last_name": "Gildenpfennig",
"email": "bob.boris@gmail.com",
"identifier": "BXZ888"
}
]
}
Notice that the JSON mask is applied to each element in the input data array separately; the entire data array is not passed to the mask as a single document.
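The per-element behaviour can be pictured as a loop over data that rewrites only the targeted fields. The sketch below is a simplified model that assumes transforms address top-level keys only; mask_documents and the placeholder mask are hypothetical and do not reflect how DataMasque resolves paths internally.

```python
import copy


def mask_documents(data, transforms):
    """Apply each key -> mask transform to every document in `data`."""
    masked = []
    for document in data:
        result = copy.deepcopy(document)  # leave the caller's input untouched
        for key, mask in transforms.items():
            if key in result:
                result[key] = mask(result[key])
        masked.append(result)
    return masked


data = [
    {"id": 1, "first_name": "Alice", "email": "alice.apples@gmail.com"},
    {"id": 2, "first_name": "Bob", "email": "bob.boris@gmail.com"},
]

# Hypothetical stand-in mask: replaces any name with a fixed placeholder.
masked = mask_documents(data, {"first_name": lambda _name: "MASKED"})
```

Fields without a transform, such as id and email here, pass through unchanged, which is how document structure is preserved.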
Full JSON Example
The final JSON example extends the previous ruleset to mask every field except id. It uses:
- Email masking that combines three random values using a concat mask:
  - A random first name
  - A random last name
  - A random email suffix (e.g. @example.com)
- Identifier masking using from_unique_imitate
- The same first name and last name masking as before
All fields use the document's id as a hash source, ensuring consistent masking across requests for the same user.
version: "1.0"
rules:
  - masks:
      - type: json
        transforms:
          - path: [email]
            hash_sources:
              - json_path: [.., id]
            masks:
              - type: chain
                masks:
                  - type: concat
                    masks:
                      - type: from_file
                        seed_file: DataMasque_firstNames_mixed.csv
                        seed_column: firstname-mixed
                      - type: from_file
                        seed_file: DataMasque_lastNames.csv
                        seed_column: lastnames
                      - type: from_file
                        seed_file: DataMasque_fake_email_suffixes.csv
                        seed_column: email-suff
          - path: [identifier]
            hash_sources:
              - json_path: [.., id]
            masks:
              - type: from_unique_imitate
          - path: [first_name]
            hash_sources:
              - json_path: [.., id]
            masks:
              - type: from_file
                seed_file: DataMasque_firstNames_mixed.csv
                seed_column: firstname-mixed
          - path: [last_name]
            hash_sources:
              - json_path: [.., id]
            masks:
              - type: from_file
                seed_file: DataMasque_lastNames.csv
                seed_column: lastnames
Request:
{
"data": [
{
"id": 1,
"first_name": "Alice",
"last_name": "Apples",
"email": "alice.apples@gmail.com",
"identifier": "AAA111"
},
{
"id": 2,
"first_name": "Bob",
"last_name": "Boris",
"email": "bob.boris@gmail.com",
"identifier": "BXZ888"
}
]
}
Response:
{
"data": [
{
"id": 1,
"first_name": "Verena",
"last_name": "Grazina",
"email": "VerenaGrazina@baptiste.org",
"identifier": "DJZ228"
},
{
"id": 2,
"first_name": "Joanny",
"last_name": "Gildenpfennig",
"email": "JoannyGildenpfennig@matrix.net",
"identifier": "CDC439"
}
]
}
hash_sources
Optional
Hash sources determine what values are used to generate consistent masking results. They can be specified at two levels:
- At the ruleset level (covered in this section)
- Within individual transforms (as seen in the JSON examples)
Examples of both have been seen in the basic setup guide.
This section looks at the various methods of specifying hash sources at the ruleset level.
hash_sources are specified as a list at the top level of the ruleset.
When multiple hash sources are listed, the hash values are fetched and concatenated in order to build a final hash seed.
This means that if the order of hash_sources changes, the hash seed will change.
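To see why order matters, consider a seed built by joining the fetched values and hashing the result. This is only an illustration: the use of SHA-256, the separator character, and the build_seed name are all assumptions, since this document does not describe how DataMasque derives its seed internally.

```python
import hashlib


def build_seed(hash_values):
    """Concatenate hash values in order and digest them into a seed."""
    # A separator keeps ["ab", "c"] distinct from ["a", "bc"].
    joined = "\x1f".join(str(value) for value in hash_values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


seed_a = build_seed([1, "NZ"])   # e.g. id first, then country
seed_b = build_seed(["NZ", 1])   # same values, opposite order
print(seed_a == seed_b)
```

The same values in a different order yield a different seed, so reordering hash_sources in an existing ruleset changes the masked output for every record.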
Each hash source is an object that must contain exactly one of these primary properties:
Path-based sources:
- json_path: A JSON path query (specified as a list) used to fetch a hash value from an element in data.
- xpath: An XPath string used to fetch a hash value from an element in data.
Request-based sources:
- source: Either from_request or self.
  - from_request: Fetch the hash value from a hash_values array included in the request.
  - self: Use the entire data element being masked as the hash value.
Note: from_request and self may only be specified at the top level of the ruleset, and not as part of a json or xml mask.
The following extra properties control how the hash value is transformed after it is fetched. All are optional; if none are specified, the hash value is used unchanged.
- case_transform (optional, enum): Convert the hash value to lower or upper case. One of:
  - lower: Convert the value to lower case.
  - upper: Convert the value to upper case.
- trim_whitespace (optional, boolean): If true, trim whitespace from the start and end of the hash value. Defaults to false (no trim is performed).
- coerce_whole_numbers_to_int (optional, boolean): If true, whole number float or decimal values will be transformed to integers. For example, 1.0 would be converted to 1. This is useful if IDs are not stored as integers, but are whole numbers. Even if this value is true, non-whole numbers remain as floats. For example, 1.5 stays as 1.5. Defaults to false (conversion is not performed).
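The effect of these three properties can be sketched as a small normalisation function. This is a model for reasoning about the options, not DataMasque code: normalise_hash_value is a hypothetical name, and the order in which the transforms are applied is an assumption.

```python
from decimal import Decimal


def normalise_hash_value(value, case_transform=None, trim_whitespace=False,
                         coerce_whole_numbers_to_int=False):
    """Apply the optional hash value transforms to a fetched value."""
    if coerce_whole_numbers_to_int:
        if isinstance(value, float) and value.is_integer():
            value = int(value)            # 1.0 -> 1, while 1.5 is left alone
        elif (isinstance(value, Decimal) and value.is_finite()
              and value == value.to_integral_value()):
            value = int(value)
    if isinstance(value, str):
        if trim_whitespace:
            value = value.strip()
        if case_transform == "lower":
            value = value.lower()
        elif case_transform == "upper":
            value = value.upper()
    return value


print(normalise_hash_value(1.0, coerce_whole_numbers_to_int=True))
print(normalise_hash_value("  Abc ", case_transform="lower", trim_whitespace=True))
```

Normalising in this way is what lets "ABC", " abc " and "abc" (or 1 and 1.0) hash to the same seed when the source systems store an identifier inconsistently.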
Before explaining the hash source types in more detail, it is important to know when to specify hash_sources at the ruleset level and when to specify them at the mask level.
Why are hash_sources inside json or xml masks?
When masking values inside JSON or XML documents, sometimes the hash source can only be located relative to the node being masked.
For example, consider this JSON data where each element in data is itself an array of objects:
{
"data": [
[
{
"id": 1,
"first_name": "Alice"
},
{
"id": 2,
"first_name": "Bob"
}
]
]
}
When masking first_name, we need to use the corresponding id from the same object as the hash value.
This relationship can only be expressed using relative paths within the json mask.
The same principle applies to XML data. Consider this example:
{
"data": [
"<data><person id=\"1\"><first_name>Alice</first_name></person><person id=\"2\"><first_name>Bob</first_name></person></data>"
]
}
Here, each first_name should be hashed using its parent node's id attribute.
Again, this relationship requires relative path handling.
In contrast, specifying hash_sources at a ruleset level means these relative relationships cannot be expressed, as ruleset-level hash sources can only reference fixed paths.
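The idea of a relative hash source can be illustrated with a small resolver. The sketch assumes that ".." means "step up to the parent of the node being masked", which is how the [.., id] hash sources in the earlier rulesets read; resolve_relative is a hypothetical helper and DataMasque's full JSON path semantics are not covered here.

```python
def resolve_relative(document, node_path, relative_path):
    """Resolve `relative_path` starting from the node at `node_path`."""
    target = list(node_path)
    for step in relative_path:
        if step == "..":
            target.pop()          # move up to the parent node
        else:
            target.append(step)   # move down into a child
    value = document
    for key in target:
        value = value[key]
    return value


# One element of data from the nested-array example above.
document = [
    {"id": 1, "first_name": "Alice"},
    {"id": 2, "first_name": "Bob"},
]

# Hash source for Bob's first_name: up to his object, then down to its id.
print(resolve_relative(document, [1, "first_name"], ["..", "id"]))  # 2
```

A fixed path cannot express this, because it would have to name one particular array index rather than "the id next to whichever first_name is being masked".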
We'll now look at the different hash source types in more detail, with examples.
json_path hash source type
The json_path hash source type should be used when masking simple JSON documents that contain just a single value to mask and a single value to use for hashing.
As discussed in the previous section, masking multiple values in a single JSON document may not give consistent results with ruleset-level hashing.
This ruleset masks the first_name in a JSON document by hashing on the id included in the document.
version: "1.0"
rules:
  - masks:
      - type: json
        transforms:
          - path: [first_name]
            masks:
              - type: from_file
                seed_file: DataMasque_firstNames_mixed.csv
                seed_column: firstname-mixed
hash_sources:
  - json_path: [id]
Example request:
{
"data": [
{
"id": 1,
"first_name": "Alice"
},
{
"id": 2,
"first_name": "Bob"
},
{
"id": 1,
"first_name": "Charles"
}
]
}
Example response:
{
"data": [
{
"id": 1,
"first_name": "Verena"
},
{
"id": 2,
"first_name": "Joanny"
},
{
"id": 1,
"first_name": "Verena"
}
]
}
Notice that since elements 1 and 3 both have the same id, their replacement first_name values are the same.
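The repeatable output follows from using the hash value to seed the selection. The sketch below shows the general technique only: the short FIRST_NAMES list stands in for a seed file, and the SHA-256 seeding in pick_name is an assumption rather than DataMasque's actual algorithm, so the names it picks will not match the response above.

```python
import hashlib
import random

FIRST_NAMES = ["Verena", "Joanny", "Salma", "Emmie"]  # stand-in for a seed file


def pick_name(hash_value):
    """Choose a name deterministically from the given hash value."""
    digest = hashlib.sha256(str(hash_value).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return rng.choice(FIRST_NAMES)


documents = [
    {"id": 1, "first_name": "Alice"},
    {"id": 2, "first_name": "Bob"},
    {"id": 1, "first_name": "Charles"},
]
masked = [{**doc, "first_name": pick_name(doc["id"])} for doc in documents]
print(masked[0]["first_name"] == masked[2]["first_name"])  # True: same id
```

Different ids can still land on the same name, because the pool of replacement values is finite.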
xpath hash source type
The xpath hash source type should be used when masking simple XML documents that contain just a single value to mask and a single value to use for hashing.
As discussed in the previous section, masking multiple values in a single XML document may not give consistent results with ruleset-level hashing.
This ruleset masks the first_name element in an XML document by hashing on the id attribute included in the document.
version: "1.0"
rules:
  - masks:
      - type: xml
        transforms:
          - path: '/user/first_name'
            node_transforms:
              - type: text
                masks:
                  - type: from_file
                    seed_file: DataMasque_firstNames_mixed.csv
                    seed_column: firstname-mixed
hash_sources:
  - xpath: '/user/@id'
Example request:
{
"data": [
"<user id=\"1\"><first_name>Alice</first_name></user>",
"<user id=\"2\"><first_name>Bob</first_name></user>",
"<user id=\"1\"><first_name>Charles</first_name></user>"
]
}
Example response:
{
"data": [
"<user id=\"1\"><first_name>Verena</first_name></user>",
"<user id=\"2\"><first_name>Joanny</first_name></user>",
"<user id=\"1\"><first_name>Verena</first_name></user>"
]
}
Notice that since elements 1 and 3 both have the same id, their replacement first_name values are the same.
from_request hash source type
This example of using the from_request hash source type is repeated from the basic setup guide.
Using source: from_request allows hash values to be specified in the request, separate from the data itself.
It has two modes of operation:
- hash_values is an array containing the same number of elements as data. The hash values map one-to-one to the data values, in the same order.
- hash_values is a single value (string, number, or object). The same hash value is used for each data element.
The mode is selected automatically based on the type of hash_values in the request; it is not specified in the ruleset.
This ruleset selects a random first name using a from_file mask.
It hashes on values in the request.
version: "1.0"
hash_sources:
  - source: from_request
rules:
  - masks:
      - type: "from_file"
        seed_file: "DataMasque_firstNames_mixed.csv"
The first request example uses one hash value per data value:
{
"data": ["Alice", "Bob", "Charles"],
"hash_values": [1, 2, 1]
}
Example response:
{
"data": ["Verena", "Joanny", "Verena"]
}
Again, repeated hash values lead to repeated output values.
This second request example uses just a single hash value, so all the outputs are the same:
{
"data": ["Alice", "Bob", "Charles"],
"hash_values": 2
}
Example response:
{
"data": ["Joanny", "Joanny", "Joanny"]
}
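The two from_request modes amount to pairing each data element with a hash value. The sketch below models that pairing; resolve_hash_values is a hypothetical helper, and raising an error on a length mismatch is an assumption, since this document does not say how the API responds in that case.

```python
def resolve_hash_values(data, hash_values):
    """Return one hash value per data element."""
    if isinstance(hash_values, list):
        # Array mode: values map one-to-one onto data, in the same order.
        if len(hash_values) != len(data):
            raise ValueError("hash_values array must have the same length as data")
        return list(hash_values)
    # Single-value mode: the same hash value is used for every element.
    return [hash_values] * len(data)


data = ["Alice", "Bob", "Charles"]
print(resolve_hash_values(data, [1, 2, 1]))  # [1, 2, 1]
print(resolve_hash_values(data, 2))          # [2, 2, 2]
```

With the single value 2, every element shares one hash value, which is why all three outputs in the second example are identical.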
self hash source type
The final type of hash source is self, which uses the entire data element being masked as the hash source.
The self hash source is particularly useful when:
- You want deterministic masking without managing external hash values.
- You need consistency when the same data appears in different systems or databases.
- You're masking values where the input itself can serve as a stable identifier.
For example, when masking test data across multiple environments, using self ensures that each value maps consistently to its masked equivalent without needing to maintain separate hash values or identifiers.
Note: If you need to maintain unique one-to-one relationships between input and masked values, the from_unique_imitate mask type should be used instead of self hashing.
This example ruleset masks first names using self hashing, meaning the same input name will always produce the same masked output.
version: "1.0"
hash_sources:
  - source: self
rules:
  - masks:
      - type: "from_file"
        seed_file: "DataMasque_firstNames_mixed.csv"
The following example demonstrates how self hashing maintains consistency when values are repeated:
{
"data": ["Alice", "Bob", "Alice"]
}
Example response:
{
"data": ["Bonnie", "Nicholas", "Bonnie"]
}
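When testing a ruleset that uses self hashing, it can be handy to confirm that repeated inputs really did receive the same output. The following is a minimal client-side check using the request and response above; is_consistent is a hypothetical helper and not part of any DataMasque client.

```python
def is_consistent(inputs, outputs):
    """Return True if every repeated input maps to the same output."""
    seen = {}
    for original, masked in zip(inputs, outputs):
        # setdefault stores the first masked value seen for each input.
        if seen.setdefault(original, masked) != masked:
            return False
    return True


request_data = ["Alice", "Bob", "Alice"]
response_data = ["Bonnie", "Nicholas", "Bonnie"]
print(is_consistent(request_data, response_data))  # True
```

The check is one-directional: it does not require different inputs to produce different outputs, which matches the note above that self hashing does not guarantee uniqueness.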