Conditional Masking
You may wish to only apply masks to rows or values that meet some conditions. DataMasque has three different methods for conditionally applying masks to meet different use cases:
- Where - restricting database fetches
mask_table
only
- If - conditional rules
mask_table
andmask_tabular_file
only
- Skip - not masking specific values
mask_table
andmask_tabular_file
only
- Conditional masking for files
Use case | Mechanism |
---|---|
I want to restrict which rows are fetched for masking from the database table. | Where |
I want to apply certain masking rules to only a subset of rows or values. | If |
I want to skip applying masks to certain column values. | Skip |
Warning: Use of the conditional masking features
where
,skip
, orif/else
, may mean your masking rules are not applied to some database rows or values. It is recommended to verify the resulting output satisfies your masking requirements.
Where - restricting database fetches
To restrict which rows are fetched for masking from a database table,
you can specify a where
clause for a masked_table
:
version: "1.0"
tasks:
- type: mask_table
table: users
key: id
where: >-
"users"."role" <> 'administrator'
rules:
...
The where
clause can refer to any columns in the masked table or joined
tables. All columns must be referenced using their table-qualified name (e.g.
Users.FirstName
). Ensure to use appropriate quoting as required. For example,
if the identifier uses a reserved word, starts with an illegal character, or is
a case-sensitive identifier.
Important!
- Any rows excluded by the
where
clause will not be masked.- The SQL you provide for the
where
clause will not be validated before execution, please take care when constructing your SQL.- The SQL you provide for the
where
clause should not end in a semicolon, as this will cause a masking error.- Any string in the where clause variables must be quoted in single quotation marks.
- Joined tables cannot be referenced in the
where
clause currently.
Note for Amazon Redshift:
- Use of
where
clause for Amazon Redshift is not yet supported in DataMasque. This is in our roadmap and will be included in future releases.
If - conditional rules
You can choose to apply certain masking rules to only a subset of rows
within a table, while still allowing other masks to be applied to those
rows. This can be achieved through the use of if
-conditions in rules
lists.
Amazon DynamoDB note: Masking Amazon DynamoDB uses a
mask_table
task, so standardmask_table
condition parameters apply.
Example
In the following example, the last_name
of all users will be replaced with
'Smith'
, but the user's gender
will determine the mask applied to their
first_name
:
version: "1.0"
tasks:
- type: mask_table
table: users
key: id
rules:
- column: last_name
masks:
- type: from_fixed
value: 'Smith'
- if:
- column: gender
equals: 'female'
rules:
- column: first_name
masks:
- type: from_fixed
value: 'Alice'
else_rules:
- if:
- column: gender
equals: 'male'
rules:
- column: first_name
masks:
- type: from_fixed
value: 'Bob'
else_rules:
- column: first_name
masks:
- type: from_fixed
value: 'Chris'
This example applies the same conditions in a mask_tabular_file
task.
version: "1.0"
name: basic_file_mask
tasks:
- type: mask_tabular_file
include:
- glob: "*.csv"
rules:
- if:
- column: gender
equals: 'female'
rules:
- column: first_name
masks:
- type: from_fixed
value: 'Alice'
else_rules:
- if:
- column: gender
equals: 'male'
rules:
- column: first_name
masks:
- type: from_fixed
value: 'Bob'
else_rules:
- column: first_name
masks:
- type: from_fixed
value: 'Chris'
Condition Parameters
if
(required): A list of conditions (see below) that must all evaluate astrue
for the nested list of rules to be applied to a row.rules
(required): A nested list of masking rules/nested-if
-conditions (or dictionary mapping labels to rules) that will only be applied to rows that meet the conditions defined underif
.else_rules
(optional): A nested list of masking rules/nested-if
-conditions (or dictionary mapping labels to rules) that will only be applied to rows that do NOT meet the conditions defined underif
.
A condition under if
can contain the following attributes.
There are 3 different parameters that can be used to select which data values the condition is applied to:
column
(required): The database column to check this condition against. The column name can be prefixed with a table name to reference a column in another table, and that table name can be prefixed with a schema name to reference a table in another schema.json_path
(required withinmask_file
tasks): For JSON records, ajson_path
must be provided to extract the value to which the condition will be applied. Orxpath
(required withinmask_file
tasks): For XML records, anxpath
must be provided to extract the value to which the condition will be applied. The following conditions can be applied to the selected data values:equals
(optional): If specified, the condition will only evaluate astrue
if the value exactly equals the specified value. Data types are also checked (i.e.100
is not equal to"100"
). Data extracted from XML is treated as strings and the comparison value in the ruleset must be quoted.matches
(optional): If specified, the condition will only evaluate astrue
if the string of the value matches the specified regular expression. For more details on how to use regular expressions, see Common regular expression patterns.less_than
(optional): If specified, the condition will only evaluate astrue
if the value is a number or date/datetime and is less than the given value. If the specified value is numeric, the value from the data will be treated as a number.less_than_or_equal
(optional): If specified, the condition will only evaluate astrue
if the value is a number or date/datetime and is less than or equal to the given value. If the specified value is numeric, the value from the data will be treated as a number.greater_than
(optional): If specified, the condition will only evaluate astrue
if the value is a number or date/datetime and is greater than the given value. If the specified value is numeric, the value from the data will be treated as a number.greater_than_or_equal
(optional): If specified, the condition will only evaluate astrue
if the value is a number or date/datetime and is greater than or equal to the given value. If the specified value is numeric, the value from the data will be treated as a number.age_greater_than
(optional): If specified, the condition will only evaluate astrue
if the value is a date/datetime and the difference in years is greater than this value. If the value is not a date/datetime usedate_format
to try to parse it as a date/datetime. Ifdate_format
is not provided the date format%Y-%m-%d
will be used as a fallback. Can be combined withage_less_than
to enforce an age within a range.age_less_than
(optional): If specified, the condition will only evaluate astrue
if the value is a date/datetime and the difference in years is less than this value. If the value is not a date/datetime usedate_format
to try to parse it as a date/datetime. Ifdate_format
is not provided the date format%Y-%m-%d
will be used as a fallback. Can be combined withage_greater_than
to enforce an age within a range.disable_auto_date_parsing
(optional): By default, strings in rulesets that appear to be dates are attempted to be parsed into dates/datetimes. This can cause comparisons to unexpectedly fail due to type mismatches. Setdisable_auto_date_parsing
totrue
to disable this automatic parsing. This setting is overridden when specifying thedate_format
argument.
When performing conditions on values containing dates/datetimes the following parameter can be specified:
date_format
(optional): The format of the date/datetime stored. If specified, the condition will attempt to load the column value or the value at thejson_path
/xpath
as a datetime with the specified format. If the value cannot be parsed with thedate_format
an error will be raised. See Date Parsing in DataMasque for more information.
Notes:
- If the comparison is any of
less_than
,less_than_or_equal
,greater_than
, orgreater_than_or_equal
:
now
orcurrent_date_time
can be specified as the value to use the current datetime at the time of masking.- Use a similar format to the date/datetime stored e.g. if the datetime contains hours:mins:seconds+time_zone, specify those in the value to compare against ("2012-12-12 10:30:00+00:00").
- If the date/datetime contains timezone information it will be compared to the specified date with UTC timezone. This also applies when using
now
orcurrent_date_time
.
An additional option can be specified for conditions within mask_file
tasks:
on_missing
(optional): Determines how to handle records where no value exists at a condition'sjson_path
orxpath
, or where the record is not valid JSON/XML and a condition specifies ajson_path
/xpath
:error
(default): Raise an error in the run log.skip
: Skip this rule for the current record/file.apply_if_rules
: Apply the rules defined inrules
that are normally applied when the condition is True.apply_else_rules
: Apply the rules defined inelse_rules
that are normally applied when the condition is False. Only permitted whenelse_rules
are provided.
Notes:
- When comparing values in an XML document,
equals
conditions comparing to non-string values will always be false since they are stored as a string in the XML document. Please make sureequals
conditions match exactly to what is contained in the document.- When a database column specified in an
if
does not exist, the masking run will always raise an error.on_missing
behaviour only applies when a no values exists at a specifiedjson_path
orxpath
or where the column/record contains invalid JSON/XML when ajson_path
/xpath
is specified.
Conditions can also be grouped with the logical operators or
, not
,
and and
:
version: "1.0"
tasks:
- type: mask_table
table: users
key: user_id
rules:
- if:
- and:
- not:
- column: username
matches: 'customer_\w+'
- or:
- column: admin
equals: true
- column: role
equals: "admin"
rules:
- column: username
masks:
- type: from_fixed
value: "Bob"
Note: When using an
if
conditional in rulesets, final row counts will reflect the number of rows processed rather than the number of rows masked. This is due to the rows being filtered on the application side and so all rows fetched will be processed and added to the row count. Alternativelywhere
conditionals can be used in the ruleset which will provide an accurate row count of masked rows.
Skip - not masking specific values
A common use-case is to not apply masks to certain values, e.g. to leave
NULL
values or empty strings unchanged. You can choose to not mask
certain values in a column by specifying a number of values to skip
:
version: "1.0"
tasks:
- type: mask_table
table: users
key: user_id
rules:
- column: username
skip:
- null
- ""
- matches: 'admin_\w+'
masks:
- type: from_fixed
value: "Bob"
Any column values that are exactly equal to any of the
string/numeric/null values in the skip
list will not be masked (data
types are also checked, i.e. 100
is not equal to "100"
).
Additionally, string column values matching a regular expression can be
skipped by specifying the skip value as matches: "my_regex"
.
For more details on how to use regular expressions,
see Common regular expression patterns.
Conditional masking for files
Conditional masking for object files (as part of a mask_table
) differs to database masking. For
databases, conditions are evaluated per row, and affect the values (columns) in only that row.
For object files, the condition is applied per file and affects values for that file.
Multi-record object files (NDJSON or Avro) are more like databases in that the conditions are applied per record in the file and affect values for that record.
Masking tabular files (with mask_tabular_file
) applies rules per row in the file, again,
similar to masking a database table.
For the full list of parameters please refer to Condition Parameters
Notes:
where
is not supported for conditional masking for files aswhere
is used to restrict what is returned in database queries, and therefore it does not apply to files.The implementation of
skip
for files is different from databases. It is used in conjunction withinclude
to choose which files to mask or not. Please refer tomask_file
tasks andmask_tabular_file
tasks.For tabular file masking, refer to the general If section.
Here are the use cases for each type of condition:
Use case | Mechanism |
---|---|
I want to apply certain masking rules to the values contained in the files. | If |
I want to skip certain files based on the file name. | Please refer to the skip option for mask_file tasks |
Below are some example rulesets for conditional file masking.
If - conditional rules (files)
Example
In the following example, the last_name
of all users will be replaced with
'Smith'
, but the user's gender
will determine the mask applied to their first_name
:
version: "1.0"
name: json_with_conditional
tasks:
- type: mask_file
rules:
- masks:
- type: json
transforms:
- path: ["last_name"]
on_missing: error
masks:
- type: from_fixed
value: "Stevens"
- if:
- json_path: ["gender"]
equals: "female"
on_missing: apply_if_rules
rules:
- masks:
- type: json
transforms:
- path: ["first_name"]
on_missing: error
masks:
- type: from_fixed
value: "Alice"
else_rules:
- if:
- json_path: ["gender"]
equals: "male"
on_missing: skip
rules:
- masks:
- type: json
transforms:
- path: ["first_name"]
on_missing: error
masks:
- type: from_fixed
value: "Bob"
else_rules:
- masks:
- type: json
transforms:
- path: ["first_name"]
on_missing: error
masks:
- type: from_fixed
value: "Jamie"
Similarly, with XML files
version: "1.0"
name: xml_with_conditional
tasks:
- type: mask_file
rules:
- masks:
- type: xml
transforms:
- path: "//Root/Employee/LastName"
on_missing: error
node_transforms:
- type: text
masks:
- type: from_fixed
value: Stevens
- if:
- xpath: "//Root/Employee/Gender/text()"
equals: "female"
on_missing: apply_if_rules
rules:
- masks:
- type: xml
transforms:
- path: "//Root/Employee/FirstName"
on_missing: error
node_transforms:
- type: text
masks:
- type: from_fixed
value: Alice
else_rules:
- if:
- xpath: "//Root/Employee/Gender/text()"
equals: "male"
on_missing: skip
rules:
- masks:
- type: xml
transforms:
- path: "//Root/Employee/FirstName"
on_missing: error
node_transforms:
- type: text
masks:
- type: from_fixed
value: Bob
else_rules:
- masks:
- type: xml
transforms:
- path: "//Root/Employee/FirstName"
on_missing: error
node_transforms:
- type: text
masks:
- type: from_fixed
value: Jamie
An example for tabular files can be found in the If section.