String masks
String masks are masks with useful behaviour for string types.
For simpler masks or masks that work on other types, see all mask functions.
- Imitate (imitate): Replaces characters with other characters from the same set (letters for letters, numbers for numbers)
- Random Text (from_random_text): Generates random strings of letters
- Transform Case (transform_case): Transforms the case of characters in a string
- Take Substring (take_substring): Extracts a substring from a column's value
- Replace Substring (replace_substring): Applies masks to a specific portion of string values
- Replace Regular Expression (replace_regex): Applies masks to parts of string values that match a given regular expression
Imitate (imitate)
Replace each character in a string with another random character from the same set. The character sets are:
- Uppercase letters (A-Z).
- Lowercase letters (a-z).
- Digits (0-9).
Characters not in these sets (such as punctuation and symbols) are not replaced.
This mask is designed to be easy to drop in place to mask values that must have a specific format, but whose value is not important. For example, it could be used to mask:
- Phone numbers (e.g. +1 (555) 867-5309 to +2 (938) 123-8372)
- License plates (e.g. BZF123 to LMA191)
- Bank accounts (e.g. 10-9282-9478563-00 to 23-1840-6492817-01)
- Passport numbers (e.g. FD194845 to CZ858584)
and so on.
imitate is a good, simple and safe default for many data types. However, it is not intended to generate perfect replacements for columns that must have special rules. For example, if a value must always start with the letter C, followed by 6 random numbers and letters, then imitate is not suitable as the C might be replaced with another letter.
The uppercase, lowercase and digits arguments can be used to disable the replacement of each of these character sets. No errors are raised if a character set is enabled but those characters are not in the string; for example, it is safe to try to replace letters in a phone number field.
Parameters
- force_change (optional): Since characters are chosen randomly, it is possible that a character might be randomly replaced with the same one (for example, A is chosen as a replacement for A). Set force_change to true to make sure the replacement character differs. Defaults to false. Note that this makes the output slightly less random as the number of possible replacements is reduced by one.
- uppercase (optional): A boolean to enable or disable the replacement of uppercase characters. Defaults to true (uppercase characters will be replaced).
- lowercase (optional): A boolean to enable or disable the replacement of lowercase characters. Defaults to true (lowercase characters will be replaced).
- digits (optional): A boolean to enable or disable the replacement of digits. Defaults to true (digits will be replaced).
Example
This example will apply imitate masks to the phone, license_plate and validation_code columns.
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: phone
        masks:
          - type: imitate
      - column: license_plate
        masks:
          - type: imitate
      - column: validation_code
        masks:
          - type: imitate
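The character-set switches and force_change described under Parameters can be combined in one mask. The ruleset below is a sketch only: it reuses the employees table and validation_code column from the example above, and assumes (for illustration) that the letters in each code should be kept while every digit must change.

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: validation_code
        masks:
          - type: imitate
            uppercase: false     # leave A-Z untouched
            lowercase: false     # leave a-z untouched
            digits: true         # replace 0-9 (already the default)
            force_change: true   # never replace a digit with itself
```

With these settings, a value such as AB-1234 would be expected to keep AB- unchanged, with each of the four digits replaced by a digit different from the original in that position.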
Random text (from_random_text)
This mask replaces the column's value with randomly generated a-z characters.
Parameters
- max (required): The generated character string will be this length at maximum. The maximum length must be between 1 and 100.
- min (optional): The generated character string will be this length at minimum. If no value is supplied here, the generated string's length will always be equal to the max value.
- case (optional): The case (upper or lower) of the text generated. Mixed case will be generated if this field is left blank. Must be one of: upper, lower.
Example
This example replaces the values in the name column with a random string of lowercase characters between 5 and 10 characters in length.
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: from_random_text
            min: 5
            max: 10
            case: lower
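As a further sketch of the defaults described under Parameters, the ruleset below supplies only max. Based on the parameter descriptions, every generated value should then be exactly 8 characters long and in mixed case. The table and column are the same illustrative ones used above.

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: from_random_text
            # min omitted: the length is always equal to max
            # case omitted: mixed-case text is generated
            max: 8
```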
Transform case (transform_case)
A mask to perform a transformation to the case/capitalisation of a string.
Parameters
- transform (required): The transformation to apply. Must be one of: uppercase, lowercase, capitalize_words (capitalizes first letter of each word), capitalize_string (capitalizes first letter only).
Example
This example will convert all values in the name column into uppercase.
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: transform_case
            transform: uppercase
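The other transform values are used in the same way. As a sketch, the ruleset below switches the same illustrative column to capitalize_words, which according to the parameter description capitalizes the first letter of each word.

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: transform_case
            transform: capitalize_words   # first letter of each word
```

Under that description, an all-lowercase value such as john smith would be expected to become John Smith, whereas capitalize_string would capitalize only the leading j.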
Substring (take_substring)
A mask to select a substring from a column value. You may wish to use this to select or remove a subset of characters from the beginning, end, or middle of a string.
Parameters
- start_index (optional): The index of the first character to include in the selected substring, with 0 being the index of the first character in the string. Defaults to 0.
- end_index (optional): The index of the character immediately AFTER the selected substring (i.e. the end_index is exclusive). If omitted, the selection will continue until the end of the string.
Positive and negative indices can be used, i.e. the first character in a string is at index 0, the second character is at index 1, the last character is at index -1, and the second-to-last character is at index -2.
Example
This example will return only the first 3 characters of each value in the name column. The final result will contain the characters at positions 0, 1, and 2. This is because the end_index is exclusive; the characters starting from the end_index value of 3 onwards are omitted from the final result.
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: take_substring
            start_index: 0
            end_index: 3
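Negative indices follow the same rules. The sketch below is illustrative only and reuses the phone and license_plate columns from the imitate example: the first rule keeps the last 4 characters of a value, and the second drops the first and last characters.

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: phone
        masks:
          - type: take_substring
            start_index: -4   # fourth-to-last character
                              # end_index omitted: continue to the end
      - column: license_plate
        masks:
          - type: take_substring
            start_index: 1    # skip the character at index 0
            end_index: -1     # exclusive, so the last character is dropped
```

Following the indexing rules above, a license plate of BZF123 would be expected to become ZF12.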
Replace substring (replace_substring)
A mask for transforming a selected substring of a string value. The transformation is defined by a nested sequence of masks. Matched substrings are transformed in-place, leaving the unmatched sections intact. For more complex use cases, replace_regex may be helpful.
Parameters
- masks (required): A list of masks (or dictionary mapping keys to masks) that define the transformation to apply to the selected substring. The selected substring is provided as the input to the first mask.
- start_index (optional): The index of the first character to include in the selected substring, with 0 being the index of the first character in the string. Defaults to 0.
- end_index (optional): The index of the character immediately AFTER the selected substring (i.e. the end_index is exclusive). If omitted, the selection will continue until the end of the string.
- preserve_length (optional): If set to true, then the output of the masks will be truncated or repeated until it has the same length as the original substring. This ensures the length of the entire string is unchanged. Defaults to false.
Positive and negative indices can be used, i.e. the first character in a string is at index 0, the second character is at index 1, the last character is at index -1, and the second-to-last character is at index -2.
Example
This example will replace the last 3 characters of each value in the name column with ###. The start_index value of -3 indicates that the third-to-last character is the beginning of the substring. Because the end_index is not specified, all characters from the third-to-last character until the end of the string are masked. The final result will take the characters at index positions -3, -2 and -1, and replace them with ###, leaving the rest of the string unchanged.
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: name
        masks:
          - type: replace_substring
            start_index: -3
            masks:
              - type: from_fixed
                value: '###'
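Nesting imitate inside replace_substring is one possible way to handle the case noted in the imitate section, where a value must keep a fixed leading letter such as C. The ruleset below is a sketch under that assumption; customer_code is a hypothetical column name used only for illustration.

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: customer_code   # hypothetical column, values like C4F9K2Q
        masks:
          - type: replace_substring
            start_index: 1      # leave the character at index 0 (the C) alone
            # end_index omitted: the selection runs to the end of the string
            masks:
              - type: imitate   # letters for letters, digits for digits
```

Because imitate replaces characters one for one, the selected substring keeps its length here without setting preserve_length.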
Replace regular expression (replace_regex)
A mask for transforming sections of a string that match a certain regular expression. The transformation that is applied to each matched substring is defined by a nested sequence of masks. The matched substrings are transformed in-place, leaving the unmatched sections intact.
Parameters
- masks (required): A list of masks (or dictionary mapping keys to masks) defining the transformation to apply to each substring that matches the pattern specified in regex. The entire sequence of masks will be applied to each substring that is matched, with the matched value being provided as the input to the first mask.
- regex (required): The regular expression that will be used to search for substrings to mask. For more details on how to use regular expressions, see Common regular expression patterns.
- preserve_length (optional): If set to true, then each output of the masks will be truncated or repeated until it has the same length as the original matched substring. This ensures the length of the entire string is unchanged. Defaults to false.
Example
This example replaces all numeric characters in the driversLicence column with #.
Note that it is best practice to wrap the regular expression in quotes to avoid special characters being misinterpreted as YAML syntax:
version: '1.0'
tasks:
  - type: mask_table
    table: '"DriversLicence"'
    key: id
    rules:
      - column: driversLicence
        masks:
          - type: replace_regex
            regex: '[0-9]'
            masks:
              - type: from_fixed
                value: '#'
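The preserve_length parameter matters once the pattern matches more than one character at a time. As a sketch, the variant below matches whole runs of digits with '[0-9]+' and relies on preserve_length to repeat the single # until it is as long as each matched run, as the parameter description states. The table and column are the ones from the example above.

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: '"DriversLicence"'
    key: id
    rules:
      - column: driversLicence
        masks:
          - type: replace_regex
            regex: '[0-9]+'          # one match per run of digits
            preserve_length: true    # repeat or truncate output to the match length
            masks:
              - type: from_fixed
                value: '#'
```

Without preserve_length, each run of digits would be replaced by a single #, so the overall length of the value could change.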
Substitute (substitute) (deprecated)
The substitute mask was renamed to imitate in DataMasque 2.12. Unless noted in the changelog, substitute masks will continue to function with the same behaviour and options as the imitate mask. Refer to the imitate mask documentation for usage details and configuration. Backwards compatibility may be removed in a future DataMasque version, so substitute should be replaced with imitate in rulesets, when possible.
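Since the options are described as the same, migrating a rule should only require changing the mask type. The sketch below shows a rule after the rename, with the previous type kept as a comment; the table and column are illustrative.

```yaml
version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: phone
        masks:
          # previously: - type: substitute
          - type: imitate
            force_change: true   # existing options carry over unchanged
```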