DataMasque glossary of terms

Connection

A connection defines the parameters and credentials that allow DataMasque to connect to and mask a target database.

Deterministic random

By default, the mask types provided by DataMasque for generating random data (e.g. from_random_text) produce a completely random masked value for each row the mask is applied to. This is the most secure option for masked data generation, as the masked values are generated independently of the original unmasked values.

However, there are many cases in data masking that require repeated generation of the same masked value for a given input value (i.e. the data generation must be deterministic). DataMasque achieves this by using hash-based algorithms that securely generate 'deterministic random' values. These values are uniformly distributed, but are deterministic with regard to their input. Random mask types are made deterministic by specifying the hash_columns parameter on the corresponding masking rule.
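The idea of a 'deterministic random' value can be illustrated with a minimal Python sketch. This is not DataMasque's implementation: the source does not describe the hashing algorithm, so a SHA-256 digest used to seed a pseudo-random generator stands in for it here. The function names, the secret parameter and the lowercase-letter alphabet are all hypothetical. The hash_value argument plays the role of the values read from the columns named in hash_columns.

```python
import hashlib
import random
import string


def deterministic_random_text(hash_value: str, length: int = 8,
                              secret: str = "demo-secret") -> str:
    """Return pseudo-random text that depends only on hash_value and secret.

    Illustrative stand-in only; not DataMasque's actual algorithm.
    """
    # Derive a seed by hashing the (hypothetical) secret with the input value.
    digest = hashlib.sha256(f"{secret}:{hash_value}".encode("utf-8")).digest()
    # Use a private generator so the global random state is left untouched.
    rng = random.Random(int.from_bytes(digest, "big"))
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def random_text(length: int = 8) -> str:
    """Return text generated independently of any input value."""
    rng = random.SystemRandom()
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


if __name__ == "__main__":
    # The same input always yields the same masked value.
    print(deterministic_random_text("alice"), deterministic_random_text("alice"))
    # Without a hash input, each call is independent of the original data.
    print(random_text(), random_text())
```

In this sketch, two rows that share the same hash input receive the same masked value, while the non-deterministic variant has no relationship to the original data at all.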

Mask

Masks are the algorithms provided by DataMasque for generating and manipulating database column values. Some mask types operate by modifying their input value (e.g. take_substring), while others act as a source of values (e.g. from_fixed).

When masks are combined in sequence they act as a pipeline, passing the output from one mask into the input of the next. The first mask in the sequence receives the original column value as input.
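The pipeline behaviour described above can be sketched in a few lines of Python. This is a conceptual model only, not DataMasque code: the functions borrow the names from_fixed and take_substring from the mask types mentioned earlier, but their parameters (value, start, end) and the apply_masks helper are assumptions made for illustration.

```python
from functools import reduce
from typing import Callable, Sequence

# A mask is modelled here as a function from an input value to an output value.
Mask = Callable[[str], str]


def from_fixed(value: str) -> Mask:
    """Source-style mask: ignores its input and emits a fixed value."""
    return lambda _input: value


def take_substring(start: int, end: int) -> Mask:
    """Transforming mask: returns a slice of its input."""
    return lambda text: text[start:end]


def apply_masks(masks: Sequence[Mask], original: str) -> str:
    """Pass the original value to the first mask, then chain outputs to inputs."""
    return reduce(lambda value, mask: mask(value), masks, original)


if __name__ == "__main__":
    # The first mask receives the original column value.
    print(apply_masks([take_substring(0, 3)], "Wellington"))
    # A source mask replaces the value; the next mask then trims that output.
    print(apply_masks([from_fixed("REDACTED"), take_substring(0, 3)], "Wellington"))
```

Because each mask sees only the output of the one before it, the order of the masks matters: placing a source-style mask after a transforming one discards the transformed value.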

Masking run

A masking run is the application of a masking ruleset against a target database connection. Masking runs can be configured and triggered using the DataMasque web interface or API.

Rule

Every mask_table task requires a list of rules. A rule describes the sequence of one or more mask algorithms that will be applied to a single database column. In most cases, rules map one-to-one to database columns.

Ruleset

A ruleset is the configuration that defines the tasks that will be executed by DataMasque during a masking run. Rulesets are created and edited using the ruleset editor, and are written in the YAML configuration language.

Task

Tasks are the basic building blocks of a ruleset. Each task represents some action that DataMasque will take during a masking run. Different task types are available for common database masking needs:

To create a temporary table from an SQL query, use the temporary_table task type.
To truncate a table, use the truncate_table task type.
To run an SQL script, use the run_sql task type.
To mask a table using DataMasque mask algorithms, use the mask_table task type.
Use the special parallel and serial task types to group subtasks for parallel or sequential execution.

For more information on tasks, see the Ruleset YAML Specification.

YAML

YAML is the configuration language used to create rulesets. For a brief introduction to the YAML syntax, see: https://learnxinyminutes.com/docs/yaml/.