DataMasque Portal

Generic Masks

Generic Masks are used for simple generation or transformation of most data types.

Fixed value (from_fixed)

A simple mask that replaces all column values with the same fixed value.

Parameters
  • value (required): The value to replace all column values with. Can be any data type, but should match that of the column being masked. This value can be enclosed in quotation marks, which will convert the value a string, or entered without quotation marks.
Example

This example will replace all values in the name column of the employees table with 'Alex'.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: ID
    rules:
      - column: name
        masks:
          - type: from_fixed
            value: 'Alex'

Show result

Before After
name
Bill
Chris
Anastasia
Judith
Gordon
Joel
name
Alex
Alex
Alex
Alex
Alex
Alex

DataMasque also supports blank values for the from_fixed mask type. To specify a blank entry, you can enter the value as """, '', or leave the field blank if there are no spaces.

version: '1.0'
tasks:
  - type: mask_table
    table: customers
    key: id
    rules:
      column: name
      masks:
      - type: from_fixed
        value: ''

From column (from_column)

A mask to replace a column's value by copying the value from another database column (from the same table, or a joined table).

Parameters
  • source_column (required): The name of the column to copy values from. The source column can be prefixed with a table name to reference a column in another table, and that table name can be prefixed with a schema name to reference a table in another schema.

Note: from_column is not supported for mask_file tasks, but is supported for mask_tabular_file tasks.

Example

This example will replace the values of the name column in the employees table with values from the first_name column of the users table, joined on id.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    joins:
      - target_table: users
        target_key: id
        source_table: employees
        source_key: user_id
    rules:
      - column: name
        masks:
          - type: from_column
            source_column: users.first_name

Show result

Given the users table:

id first_name
24 Kyle
25 Reid
26 Helen
27 Callum
28 Raymond
29 Vivien

The following transformation will take place on the employees table.

Before After
user_id name
26Bill
25Chris
29Anastasia
24Judith
27Gordon
28Joel
user_id name
26Helen
25Reid
29Vivien
24Kyle
27Callum
28Raymond


From file (from_file)

A mask to replace column values by selecting from a column (seed_column) in a CSV formatted seed file (seed_file). The seed file must contain one or more columns with named column headers. See the Files documentation for more information on uploading seed files.

If there are any blank values in the file, they will not be considered when randomly selecting a replacement value. If you wish to have blank values randomly selected from your seed file, ensure they are enclosed with quotation marks. For example, "".

Potential replacement values from the seed file rows may be pre-filtered using the table_filter_column and seed_filter_column parameters. This allows you to subset the available values for each row by matching a column value from the table to a column value in the seed file. If filtering does not match any rows in the seed file, fallback_masks can be provided to generate the replacement value instead.

After filtering has been applied (optional), replacement values from the seed file are selected randomly by default. To enable deterministic selection of replacement values, you may specify one or more hash_columns for the masking rule. See Deterministic masking for more details.

Parameters
  • seed_file (required): The name of a user-provided CSV seed file to select values from (see Files guide).
  • seed_column (optional): The name of the column in the seed file that will provide replacement values. This property only needs to be specified for CSV files with multiple columns.
  • table_filter_column and seed_filter_column (optional): If both attributes are provided, the replacement value will only be selected from CSV rows where the value for the seed_filter_column in the CSV row is equal to the value for the table_filter_column column in the database row being masked. These options are only supported for use with database masking, or for tabular file masking. These will not be applicable for other file masking circumstances.
  • fallback_masks (optional): If there are no rows in the CSV that match the table_filter_column/seed_filter_column condition, then this nested list of masks will be applied to the column value instead. If no fallback_masks are provided and no match is found, then the column value will be replaced with a NULL value.
  • null_string (optional): If set, any values in the seed file column that match this value will be treated as NULL (instead of a string) when written to the database.
  • value_on_missing (optional): If the mask were to generate a NULL value (due to filters not matching any rows or from NULL values in the seed file) then use this value (if set) instead.
Example

This example masks the street_name column with a random street name from a seed file (DataMasque_address_example.csv) while ensuring that the randomly selected street is from the same city as the original. A fallback value of “Main Street” is used in the event that the database row contains a city that does not exist in the CSV.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: street_name
        masks:
          - type: from_file
            seed_file: DataMasque_address_example.csv
            seed_column: street_name
            table_filter_column: city
            seed_filter_column: city
            fallback_masks:
              - type: from_fixed
                value: 'Main Street'

Show result

Given the following database table and using the DataMasque_address_example.csv seed file:

street_number street_name city
44 Queen Street Auckland
57 Oxford Street London
12 Houston Street New York
5 Fake Street Fake City

This mask could produce the following masked output:

street_number street_same city
44 High Street Auckland
57 Brick Lane London
12 Maiden Lane New York
5 Main Street Fake City


From blob (from_blob)

A mask to replace column values or overwrite the entire file content with the entire contents of a file.

Important! The performance of from_blob is comparatively slower than other masking functions due to its need to write a significantly larger amount of data into the database. Consider running from_blob as its own parallel task.

from_blob is not available for use in mask_tabular_file tasks and Amazon Redshift databases.

The source file can be any file type. DataMasque includes some example binary files, as detailed in Built-in blob files). See the Files documentation for more information on uploading your own files.

Parameters
  • file_name (required): The name of a user-provided file to retrieve blob data from (see Files guide).
Example

This database masking example masks the avatar column which stores blob data and replaces all values of records that have role equal to admin with the file admin_user.png in the ruleset. Note that this example assumes the file admin_user.png has been uploaded to DataMasque.

version: "1.0"
name: from_blob
tasks:
  - type: mask_table
    table: users
    key: id
    rules:
      - if:
        - column: role
          equals: "admin"
        rules:
          - column: avatar
            masks:
              - type: from_blob
                file_name: "admin_user.png"

This file masking example masks .jpg files by overwriting the content of those files with blob data from replacement.jpg, (assuming this file has been uploaded through the DataMasque web UI). All matched masked files will have the same content after masking.

version: "1.0"
name: from_file
tasks:
  - type: mask_file
    include:
      - glob: *.jpg
    rules:
      - masks:
        - type: from_blob
          file_name: "replacement.jpg"

From format string (from_format_string)

A mask to generate values of a defined format. This is useful when all values of a column have the same format. DataMasque will generate values according to a format specified according to the format string syntax.

Parameters
  • value_format (required): The format of the data to be generated, which must meet the requirements of the format string syntax.
version: '1.0'
tasks:
  - type: mask_table
    table: drivers
    key: id
    rules:
      - column: licence_plate
        masks:
          - type: from_format_string
            value_format: "{[A-Z],3}{[0-9],3}"

Show result

Before After
licence_plate
ABC123
DEF456
GHI789
JKL246
MNP813
RTU579
licence_plate
VTC799
GEQ249
HWQ462
INY664
AKZ548
SCL654

Note

  • In addition to what is possible with the formats for a mask_unique_key task there is more you can do with format strings with the mask_table task. For example, you can easily select one value from a small number of alternatives using the syntax for alternatives: {(EN|FR)}
  • If you need to guarantee that generated values are unique within a column, consider using the from_unique mask type.

From choices (from_choices)

A mask to replace column values by selecting from a list of choices or a dictionary of choices with weights specified.

Parameters
  • choices (required): Can either be a list of strings, or a list of dictionaries. If a list of dictionaries, each dictionary has a single key, the "choice" and value is the weight.

    Weights must be positive, and can be a mix of float and int. Weights do not need to sum to a particular value, they are relative.

    If there are no weights assigned to any of the choices then all choices will have equal weights.

Examples

This example masks the department column with a randomly selected department from the list of choices.

version: "1.0"
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: department
        masks:
          - type: from_choices
            choices:
              - "accounting"
              - "sales"
              - "research and development"

Show result

Before After
department
employee
employee
employee
employee
employee
employee
employee
employee
employee
employee
department
research and development
accounting
sales
research and development
accounting
sales
research and development
sales
accounting
research and development

This example masks the department column with a randomly selected department from the list of choices based on the weights provided.

version: "1.0"
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: department
        masks:
          - type: from_choices
            choices:
              - "accounting": 10
              - "sales": 15
              - "research and development": 25

Show result

Before After
department
sales
sales
sales
sales
sales
sales
sales
sales
sales
sales
department
research and development
accounting
sales
research and development
research and development
sales
research and development
sales
accounting
research and development