DataMasque glossary of terms
Connection
A connection defines the parameters and credentials that allow DataMasque to connect to and mask a target database.
Deterministic random
By default, the mask types provided by DataMasque for generating random data (e.g. from_random_text) will produce a completely random
masked value for each row the mask is applied to. This is the most secure option for masked data generation, as the
masked values are generated independently of the original unmasked values.
However, there are many cases in data masking that require repeated generation of the same masked value for a given
input value (i.e. the data generation must be deterministic). DataMasque achieves this by using hash-based algorithms
that securely generate 'deterministic random' values. These values are uniformly distributed, but are deterministic with
respect to their input. Random mask types are made deterministic by specifying the hash_columns parameter on the
corresponding masking rule.
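The idea of a hash-based 'deterministic random' value can be illustrated with a minimal Python sketch. This is not DataMasque's implementation: the function name deterministic_random_text, the use of HMAC-SHA256 and the secret key are all assumptions made for illustration. It only shows how hashing an input value can seed a generator so that the same input always yields the same random-looking output.

```python
import hashlib
import hmac
import random
import string


def deterministic_random_text(hash_input: str, secret: bytes, length: int = 8) -> str:
    """Return random-looking lowercase text that depends only on hash_input and secret.

    Hypothetical helper for illustration; not part of DataMasque.
    """
    # A keyed hash of the input gives a seed that cannot be predicted
    # without knowing the secret.
    digest = hmac.new(secret, hash_input.encode("utf-8"), hashlib.sha256).digest()
    # Seeding a private generator makes every draw repeatable for this input.
    rng = random.Random(int.from_bytes(digest, "big"))
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


secret = b"example-secret"  # assumed key, for illustration only
first = deterministic_random_text("customer-42", secret)
second = deterministic_random_text("customer-42", secret)
# first and second are identical because they share the same hash input.
```

In this sketch the hash input plays the role that the values of the hash_columns play for a masking rule: rows with the same input receive the same masked value.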
Mask
Masks are the algorithms provided by DataMasque for generating and manipulating database column values. Some
mask types operate by modifying their input value (e.g. take_substring), while others act as a source of values
(e.g. from_fixed).
When masks are combined in sequence they act as a pipeline, passing the output from one mask into the input of the next. The first mask in the sequence receives the original column value as input.
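The pipeline behaviour described above can be sketched in a few lines of Python. The functions below borrow the names take_substring and from_fixed from the mask types mentioned in this entry, but their signatures and behaviour are simplified assumptions, not DataMasque's actual masks.

```python
from functools import reduce
from typing import Callable, List

# A mask is modelled here as a function from one column value to another.
Mask = Callable[[str], str]


def take_substring(start: int, end: int) -> Mask:
    """Transforming mask (hypothetical signature): keeps a slice of its input."""
    return lambda value: value[start:end]


def from_fixed(fixed: str) -> Mask:
    """Source mask (hypothetical signature): ignores its input, returns a fixed value."""
    return lambda _value: fixed


def apply_masks(masks: List[Mask], original: str) -> str:
    """Pass the original column value through each mask in order."""
    return reduce(lambda value, mask: mask(value), masks, original)


# The first mask receives the original value; each later mask receives
# the output of the mask before it.
sliced_then_upper = apply_masks([take_substring(0, 3), str.upper], "alice@example.com")
fixed_then_sliced = apply_masks([from_fixed("REDACTED"), take_substring(0, 3)], "anything")
```

Here sliced_then_upper depends on the original value, while fixed_then_sliced does not, because a source mask at the start of the sequence discards its input.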
Masking run
A masking run is the application of a masking ruleset against a target database connection. Masking runs can be configured and triggered using the DataMasque web interface or API.
Rule
Every mask_table task requires a list of rules. A rule describes the sequence of one or
more mask algorithms that will be applied to a single database column. In most cases you
can consider there to be a one-to-one mapping of rules to database columns.
Ruleset
A ruleset is the configuration that defines the tasks that will be executed by DataMasque during a masking run. Rulesets are created and edited using the ruleset editor, and are written in the YAML configuration language.
Task
Tasks are the basic building blocks of a ruleset. Each task represents some action that DataMasque will take during a masking run. Different task types are available for common database masking needs:
- To create a temporary table from an SQL query, use the temporary_table task type.
- To truncate a table, use the truncate_table task type.
- To run an SQL script, use the run_sql task type.
- To mask a table using DataMasque mask algorithms, use the mask_table task type.
- Use the special parallel and serial task types to group subtasks for parallel or serial execution.
For more information on tasks, see the Ruleset YAML Specification.
YAML
YAML is the configuration language used to create rulesets. For a brief introduction to YAML syntax, see https://learnxinyminutes.com/docs/yaml/.