Generic Masks
Generic Masks are used for simple generation or transformation of most data types.
- Fixed value (
from_fixed
)
Outputs a fixed value - From column (
from_column
)
Outputs from another column (or table) - From file (
from_file
)
Generates values sourced from a CSV file - From Blob (
from_blob
)
Outputs a value that is the entire contents of a file - From format string (
from_format_string
)
Generates random values according to a format string - From choices (
from_choices
)
Generates a random value picked from a (optionally weighted) set of values.
Fixed value (from_fixed
)
A simple mask that replaces all column values with the same fixed value.
Parameters
value
(required): The value to replace all column values with. Can be any data type, but should match that of the column being masked. This value can be enclosed in quotation marks, which will convert the value a string, or entered without quotation marks.
Example
This example will replace all values in the name
column of the employees table with 'Alex'.
version: '1.0'
tasks:
- type: mask_table
table: employees
key: ID
rules:
- column: name
masks:
- type: from_fixed
value: 'Alex'
Show result
Before | After |
|
|
---|
DataMasque also supports blank values for the from_fixed
mask type. To specify a blank entry, you can
enter the value as """
, ''
, or leave the field blank if there are no spaces.
version: '1.0'
tasks:
- type: mask_table
table: customers
key: id
rules:
column: name
masks:
- type: from_fixed
value: ''
From column (from_column
)
A mask to replace a column's value by copying the value from another database column (from the same table, or a joined table).
Parameters
source_column
(required): The name of the column to copy values from. The source column can be prefixed with a table name to reference a column in another table, and that table name can be prefixed with a schema name to reference a table in another schema.
Note:
from_column
is not supported formask_file
tasks, but is supported formask_tabular_file
tasks.
Example
This example will replace the values of the name
column in the employees
table with values
from the first_name
column of the users
table, joined on id
.
version: '1.0'
tasks:
- type: mask_table
table: employees
key: id
joins:
- target_table: users
target_key: id
source_table: employees
source_key: user_id
rules:
- column: name
masks:
- type: from_column
source_column: users.first_name
Show result
Given the users
table:
id | first_name |
---|---|
24 | Kyle |
25 | Reid |
26 | Helen |
27 | Callum |
28 | Raymond |
29 | Vivien |
The following transformation will take place on the employees
table.
Before | After |
|
|
---|
From file (from_file
)
A mask to replace column values by selecting from a column (seed_column
) in a CSV formatted seed file (seed_file
).
The seed file must contain one or more columns with named column headers. See the Files documentation for
more information on uploading seed files.
If there are any blank values in the file, they will not be considered when randomly selecting a replacement value.
If you wish to have blank values randomly selected from your seed file, ensure they are enclosed with quotation marks. For example, ""
.
Potential replacement values from the seed file rows may be pre-filtered using the table_filter_column
and
seed_filter_column
parameters. This allows you to subset the available values for each row by matching a column value
from the table to a column value in the seed file. If filtering does not match any rows in the seed file,
fallback_masks
can be provided to generate the replacement value instead.
After filtering has been applied (optional), replacement values from the seed file are selected randomly by default.
To enable deterministic selection of replacement values, you may specify one or more hash_columns
for the masking rule.
See Deterministic masking for more details.
Parameters
seed_file
(required): The name of a user-provided CSV seed file to select values from (see Files guide).seed_column
(optional): The name of the column in the seed file that will provide replacement values. This property only needs to be specified for CSV files with multiple columns.table_filter_column
andseed_filter_column
(optional): If both attributes are provided, the replacement value will only be selected from CSV rows where the value for theseed_filter_column
in the CSV row is equal to the value for thetable_filter_column
column in the database row being masked. These options are only supported for use with database masking, or for tabular file masking. These will not be applicable for other file masking circumstances.fallback_masks
(optional): If there are no rows in the CSV that match thetable_filter_column
/seed_filter_column
condition, then this nested list of masks will be applied to the column value instead. If nofallback_masks
are provided and no match is found, then the column value will be replaced with aNULL
value.null_string
(optional): If set, any values in the seed file column that match this value will be treated asNULL
(instead of a string) when written to the database.value_on_missing
(optional): If the mask were to generate aNULL
value (due to filters not matching any rows or fromNULL
values in the seed file) then use this value (if set) instead.
Example
This example masks the street_name
column with a random street name from a seed file
(DataMasque_address_example.csv) while ensuring that the randomly selected street
is from the same city as the original. A fallback value of “Main Street” is used in the event that the database
row contains a city that does not exist in the CSV.
version: '1.0'
tasks:
- type: mask_table
table: employees
key: id
rules:
- column: street_name
masks:
- type: from_file
seed_file: DataMasque_address_example.csv
seed_column: street_name
table_filter_column: city
seed_filter_column: city
fallback_masks:
- type: from_fixed
value: 'Main Street'
Show result
Given the following database table and using the DataMasque_address_example.csv seed file:
street_number | street_name | city |
---|---|---|
44 | Queen Street | Auckland |
57 | Oxford Street | London |
12 | Houston Street | New York |
5 | Fake Street | Fake City |
This mask could produce the following masked output:
street_number | street_same | city |
---|---|---|
44 | High Street | Auckland |
57 | Brick Lane | London |
12 | Maiden Lane | New York |
5 | Main Street | Fake City |
From blob (from_blob
)
A mask to replace column values or overwrite the entire file content with the entire contents of a file.
Important! The performance of
from_blob
is comparatively slower than other masking functions due to its need to write a significantly larger amount of data into the database. Consider runningfrom_blob
as its ownparallel
task.
from_blob
is not available for use inmask_tabular_file
tasks and Amazon Redshift databases.
The source file can be any file type. DataMasque includes some example binary files, as detailed in Built-in blob files). See the Files documentation for more information on uploading your own files.
Parameters
file_name
(required): The name of a user-provided file to retrieve blob data from (see Files guide).
Example
This database masking example masks the avatar
column which stores blob data and replaces all values of records that
have role
equal to admin
with the file admin_user.png
in the ruleset. Note that this example assumes the file
admin_user.png
has been uploaded to DataMasque.
version: "1.0"
name: from_blob
tasks:
- type: mask_table
table: users
key: id
rules:
- if:
- column: role
equals: "admin"
rules:
- column: avatar
masks:
- type: from_blob
file_name: "admin_user.png"
This file masking example masks .jpg
files by overwriting the content of those files with blob data from
replacement.jpg
, (assuming this file has been uploaded through the DataMasque web UI). All matched masked files will
have the same content after masking.
version: "1.0"
name: from_file
tasks:
- type: mask_file
include:
- glob: *.jpg
rules:
- masks:
- type: from_blob
file_name: "replacement.jpg"
From format string (from_format_string
)
A mask to generate values of a defined format. This is useful when all values of a column have the same format. DataMasque will generate values according to a format specified according to the format string syntax.
Parameters
value_format
(required): The format of the data to be generated, which must meet the requirements of the format string syntax.
version: '1.0'
tasks:
- type: mask_table
table: drivers
key: id
rules:
- column: licence_plate
masks:
- type: from_format_string
value_format: "{[A-Z],3}{[0-9],3}"
Show result
Before | After |
|
|
---|
Note
- In addition to what is possible with the formats for a
mask_unique_key
task there is more you can do with format strings with themask_table
task. For example, you can easily select one value from a small number of alternatives using the syntax for alternatives:{(EN|FR)}
- If you need to guarantee that generated values are unique within a column, consider using the
from_unique
mask type.
From choices (from_choices
)
A mask to replace column values by selecting from a list of choices or a dictionary of choices with weights specified.
Parameters
choices
(required): Can either be a list ofstrings
, or a list ofdictionaries
. If a list of dictionaries, each dictionary has a single key, the "choice" and value is the weight.Weights must be positive, and can be a mix of
float
andint
. Weights do not need to sum to a particular value, they are relative.If there are no weights assigned to any of the choices then all choices will have equal weights.
Examples
This example masks the department
column with a randomly selected department from the list of choices.
version: "1.0"
tasks:
- type: mask_table
table: employees
key: id
rules:
- column: department
masks:
- type: from_choices
choices:
- "accounting"
- "sales"
- "research and development"
Show result
Before | After |
|
|
---|
This example masks the department
column with a randomly selected department from the list of choices based on the weights provided.
version: "1.0"
tasks:
- type: mask_table
table: employees
key: id
rules:
- column: department
masks:
- type: from_choices
choices:
- "accounting": 10
- "sales": 15
- "research and development": 25
Show result
Before | After |
|
|
---|