String masks

String masks are masks with useful behaviour for string types.
For simpler masks, or masks that work on other types, see all mask functions.

Imitate (imitate): Replaces characters with other characters from the same set (letters for letters, numbers for numbers).
Random text (from_random_text): Generates random strings of letters.
Transform case (transform_case): Transforms the case of characters in a string.
Substring (take_substring): Extracts a substring from a column's value.
Replace substring (replace_substring): Applies masks to a specific portion of string values.
Replace regular expression (replace_regex): Applies masks to parts of string values that match a given regular expression.

Imitate (`imitate`)

Replaces each character in a string with a random character from the same set. The character sets are:

Uppercase letters (A-Z).
Lowercase letters (a-z).
Digits (0-9).

Characters not in these sets (such as punctuation and symbols) are not replaced.

This mask is designed to be easy to drop in place to mask values that must have a specific format, but whose value is not important. For example, it could be used to mask values such as:

Phone numbers (e.g. +1 (555) 867-5309 to +2 (938) 123-8372)
License plates (e.g. BZF123 to LMA191)
Bank accounts (e.g. 10-9282-9478563-00 to 23-1840-6492817-01)
Passport numbers (e.g. FD194845 to CZ858584)

imitate is a good, simple and safe default for many data types. However, it is not intended to generate perfect replacements for columns that must have special rules. For example, if a value must always start with the letter C, followed by 6 random numbers and letters, then imitate is not suitable as the C might be replaced with another letter.

The uppercase, lowercase and digits arguments can be used to disable the replacement of each of these character sets. No errors are raised if a character set is enabled but the string contains no characters from it; for example, it is safe to try to replace letters in a phone number field.

Parameters

force_change (optional): Since characters are chosen randomly, it is possible that a character might be randomly replaced with the same one (for example, A is chosen as a replacement for A). Set force_change to true to make sure the replacement character differs. Defaults to false. Note that this makes the output slightly less random as the number of possible replacements is reduced by one.
uppercase (optional): A boolean to enable or disable the replacement of uppercase characters. Defaults to true (uppercase characters will be replaced).
lowercase (optional): A boolean to enable or disable the replacement of lowercase characters. Defaults to true (lowercase characters will be replaced).
digits (optional): A boolean to enable or disable the replacement of digits. Defaults to true (digits will be replaced).

Example

This example applies imitate masks to the phone, license_plate and validation_code columns.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: phone
        masks:
          - type: imitate
      - column: license_plate
        masks:
          - type: imitate
      - column: validation_code
        masks:
          - type: imitate

Before

phone	license_plate	validation_code
(09) 8198822	BA981	aFec9-LIZN7
+64 (21) 0917762	GL1748	77HG8-bbA9
1-800 GET-MASQUE	CDF345	Lm85-gC5D

After

phone	license_plate	validation_code
(29) 01691548	BV912	bZwh0-NCZY9
+91 (45) 54173964	XP9165	01MV0-kqC7
2-975 JDV-PLASHE	LCU788	Ys04-wL9V
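
The uppercase, lowercase and digits parameters can narrow what imitate touches. As a minimal sketch (reusing the employees table and phone column from the example above; the choice of settings is illustrative only), the following ruleset would replace only digits and leave any letters untouched, with force_change ensuring that each replaced digit differs from the original:

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: phone
        masks:
          - type: imitate
            uppercase: false    # leave A-Z unchanged
            lowercase: false    # leave a-z unchanged
            force_change: true  # each replaced digit must differ from the original
```

With these settings, a value such as 1-800 GET-MASQUE should keep GET-MASQUE intact while its digits change.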

Random text (`from_random_text`)

This mask replaces the column's value with randomly generated letters (a-z), in upper, lower, or mixed case.

Parameters

max (required): The maximum length of the generated string. Must be between 1 and 100.
min (optional): The minimum length of the generated string. If omitted, the generated string's length will always be equal to max.
case (optional): The case of the generated text. Must be one of: upper, lower. If omitted, mixed-case text is generated.

Example

This example replaces the values in the name column with a random string of lowercase characters between 5 and 10 characters long.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: from_random_text
            min: 5
            max: 10
            case: lower

Before

name
Bill
Chris
Anastasia
Judith
Gordon
Joel

After

name
fjggrw
bjoquazqit
pljfrey
sdnbomx
wpoieut
yptrf
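
When min is omitted, the parameter descriptions above imply a fixed-length result. A minimal sketch, assuming the same employees table and name column (the length of 8 is an arbitrary illustrative choice), that would generate exactly 8 uppercase letters for every value:

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: from_random_text
            max: 8        # no min given, so every value has length 8
            case: upper   # uppercase letters only
```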

Transform case (`transform_case`)

A mask that transforms the case (capitalisation) of a string.

Parameters

transform (required): The transformation to apply. Must be one of: uppercase, lowercase, capitalize_words (capitalizes first letter of each word), capitalize_string (capitalizes first letter only).

Example

This example will convert all values in the name column into uppercase.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: transform_case
            transform: uppercase

Before

name
Bill
Chris
Anastasia
Judith
Gordon
Joel

After

name
BILL
CHRIS
ANASTASIA
JUDITH
GORDON
JOEL
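
The other transform values are used in the same way. As an illustrative sketch (same employees table and name column as above), this rule would capitalize the first letter of each word instead of uppercasing the whole value:

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: transform_case
            transform: capitalize_words  # capitalizes first letter of each word
```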

Substring (`take_substring`)

A mask to select a substring from a column value. You may wish to use this to select or remove a subset of characters from the beginning, end, or middle of a string.

Parameters

start_index (optional): The index of the first character to include in the selected substring, with 0 being the index of the first character in the string. Defaults to 0.
end_index (optional): The index of the character immediately AFTER the selected substring (i.e. the end_index is exclusive). If omitted, the selection will continue until the end of the string.

Positive and negative indices can be used, i.e. the first character in a string is at index 0, the second character is at index 1, the last character is at index -1, and the second-to-last character is at index -2.

Example

This example returns only the first 3 characters of each value in the name column, that is, the characters at indices 0, 1, and 2. Because end_index is exclusive, the character at index 3 and everything after it are omitted.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: take_substring
            start_index: 0
            end_index: 3

Before

name
Bill
Chris
Anastasia
Judith
Gordon
Joel

After

name
Bil
Chr
Ana
Jud
Gor
Joe
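
Negative indices make it possible to select from the end of a string. A minimal sketch, assuming the same employees table and name column, that would keep only the last 4 characters of each value by setting a negative start_index and omitting end_index:

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: take_substring
            start_index: -4  # fourth-to-last character; end_index omitted, so runs to the end
```

Under the indexing rules described above, Anastasia should become asia, since indices -4 to -1 cover its final four characters.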

Replace substring (`replace_substring`)

A mask for transforming a selected substring of a string value. The transformation is defined by a nested sequence of masks. Matched substrings are transformed in-place, leaving the unmatched sections intact. For more complex use cases, replace_regex may be helpful.

Parameters

masks (required): A list of masks (or dictionary mapping keys to masks) that define the transformation to apply to the selected substring. The selected substring is provided as the input to the first mask.
start_index (optional): The index of the first character to include in the selected substring, with 0 being the index of the first character in the string. Defaults to 0.
end_index (optional): The index of the character immediately AFTER the selected substring (i.e. the end_index is exclusive). If omitted, the selection will continue until the end of the string.
preserve_length (optional): If set to true, then the output of the masks will be truncated or repeated until it has the same length as the original substring. This ensures the length of the entire string is unchanged. Defaults to false.

Positive and negative indices can be used, i.e. the first character in a string is at index 0, the second character is at index 1, the last character is at index -1, and the second-to-last character is at index -2.

Example

This example replaces the last 3 characters of each value in the name column with ###. The start_index value of -3 selects the third-to-last character as the beginning of the substring. Because end_index is not specified, the substring runs to the end of the string. The characters at indices -3, -2 and -1 are therefore replaced with the fixed value '###', leaving the rest of the string unchanged.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: replace_substring
            start_index: -3
            masks:
              - type: from_fixed
                value: '###'

Before

name
Bill
Chris
Anastasia
Judith
Gordon
Joel

After

name
B###
Ch###
Anasta###
Jud###
Gor###
J###
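
Two further sketches show the remaining options, both based only on the parameter descriptions above. The first rule uses preserve_length so that a single # is repeated to fill everything after the first two characters. The second nests imitate to keep a fixed leading character while randomising the rest; the member_code column is hypothetical and used purely for illustration.

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      # Keep the first two characters and fill the rest with '#'.
      - column: name
        masks:
          - type: replace_substring
            start_index: 2
            preserve_length: true  # '#' is repeated to match the substring length
            masks:
              - type: from_fixed
                value: '#'
      # Hypothetical column: keep the first character, imitate the remainder.
      - column: member_code
        masks:
          - type: replace_substring
            start_index: 1
            masks:
              - type: imitate
```

With the first rule, Chris should become Ch###. The second rule is one way to handle values that must always start with a particular letter, a case where imitate on its own is not suitable.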

Replace regular expression (`replace_regex`)

A mask for transforming sections of a string that match a certain regular expression. The transformation that is applied to each matched substring is defined by a nested sequence of masks. The matched substrings are transformed in-place, leaving the unmatched sections intact.

Parameters

masks (required): A list of masks (or dictionary mapping keys to masks) defining the transformation to apply to each substring that matches the pattern specified in regex. The entire sequence of masks will be applied to each substring that is matched, with the matched value being provided as the input to the first mask.
regex (required): The regular expression that will be used to search for substrings to mask. For more details on how to use regular expressions, see Common regular expression patterns.
preserve_length (optional): If set to true, then each output of the masks will be truncated or repeated until it has the same length as the original matched substring. This ensures the length of the entire string is unchanged. Defaults to false.

Example

This example replaces every numeric character in the driversLicence column with #. It is best practice to wrap the regular expression in quotes so that special characters are not misinterpreted as YAML syntax:

version: '1.0'
tasks:
  - type: mask_table
    table: '"DriversLicence"'
    key: id
    rules:
      - column: driversLicence
        masks:
          - type: replace_regex
            regex: '[0-9]'
            masks:
              - type: from_fixed
                value: '#'

Before

driversLicence
AB123456
CD987654
EF135790
GH246802
IJ112358

After

driversLicence
AB######
CD######
EF######
GH######
IJ######
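
Nested masks are not limited to from_fixed. As a sketch under the same assumptions (the DriversLicence table and driversLicence column from the example above), the following ruleset would swap each run of digits for other random digits by nesting imitate, rather than overwriting the digits with #:

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: '"DriversLicence"'
    key: id
    rules:
      - column: driversLicence
        masks:
          - type: replace_regex
            regex: '[0-9]+'   # quoted so YAML does not interpret the brackets
            masks:
              - type: imitate # each matched run of digits is passed to imitate
```

The letters at the start of each value fall outside the match and so should be left unchanged.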

Substitute (`substitute`) (deprecated)

The substitute mask was renamed to imitate in DataMasque 2.12. Unless noted in the changelog, substitute masks will continue to function with the same behaviour and options as the imitate mask. Refer to the imitate mask documentation for usage details and configuration. Backwards compatibility may be removed in a future DataMasque version, so substitute should be replaced with imitate in rulesets, when possible.
