Unique Masks
Unique masks are masks that will ensure every value is replaced with a unique value for that column. For other masks see all mask functions.
 From Unique (
from_unique
)
Generates random but unique strings or numbers, from a format string (databases only)  From Unique Imitate (
from_unique_imitate
)
Transforms strings or numbers to be random, retaining format and uniqueness
From unique (from_unique
)
A mask that generates string values that are guaranteed to be unique within the target column.
Note: To automatically cascade primary and unique key values to
foreign keys or mask composite keys, consider using
mask_unique_key
instead.
Parameters
format
(required): The format that will be used to generate values. See format string syntax for details on how to construct a format string.
version: '1.0'
tasks:
 type: mask_table
table: drivers
key: id
rules:
 column: licence_plate
masks:
 type: from_unique
format: "{[AZ],3}{[09],3}"
Show result
Before  After  



Note
 All values produced by
from_unique
will be strings. To convert values to other data types, you canchain
yourfrom_unique
mask with atypecast
mask.  Unlike
from_format_string
,from_unique
ensures generated values are unique. This is achieved using the same underlying unique value generation procedure asmask_unique_key
.  The specified
format
must allow for a sufficient number of unique values to cover the full number of rows in the target table or file (rounded up to the nearest multiple of thebatch_size
formask_table
tasks), otherwise an error will be reported when executing the run.  Using the
from_unique
mask type differs from themask_unique_key
task type in the following ways:from_unique
cannot perform any cascading of values to related columns, such as foreign keys.from_unique
cannot be used to guarantee joint uniqueness across the columns in a composite key, unless guaranteeing uniqueness within a single column within the composite key is sufficient for the target use case.from_unique
can be used to update unique columns at the same time as other columns in amask_table
task, without the need for a separate database update performed by amask_unique_key
task.from_unique
can be used to generate unique values inmask_tabular_file
tasks or inmask_table
tasks for databases that do not supportmask_unique_key
(such as Amazon DynamoDB).
 Using
from_unique
in amask_table
task whereworker_count
> 1 is only supported for Oracle and Amazon DynamoDB connections. from_unique
cannot be used in the rules of amask_file
task. Like other mask functions,
from_unique
cannot be used to update thekey
columns of amask_table
task (except for databases that allowmask_table
to update key columns, such as Amazon DynamoDB).  All
from_unique
masks within a given task will be produced from the same sequence of unique values, such that twofrom_unique
masks with identicalformat
will produce the same value for the same row. This can be useful to ensure values produced by identicalfrom_unique
masks in differentif
branches are jointly unique.
From Unique Imitate (from_unique_imitate
)
Note: When used in a
mask_table
task, this mask type will be skipped in a dry run as it requires writing to the database.
The from_unique_imitate
mask is very similar to the imitate mask:
 alphabetical characters
az
andAZ
are replaced by other alphabetical characters of the same case  digits are replaced by other digits
 all other characters (whitespace, symbols such as
%
, and so on) are left asis.
However, there are the following differences:
 Most notably,
from_unique_imitate
can also be used on primary keys and columns that have aUNIQUE
constraint. Indeed, it is designed specifically for data that must take unique values per row.  The masked values are guaranteed to be consistent for the same given input, and different for different inputs.
(The specific values generated will vary across runs unless you control the seeding.)
 For example, if a column has a
UNIQUE
constraint and the existing data satisfies that constraint, the resulting masked values will all be unique.  If a column contains the same value in every row, the content of the column after masking will also have the same value in every row.
 For example, if a column has a
The from_unique_imitate
mask will always create a different result to its input,
as if it has an implicit force_change: true
.
The mask may occasionally preserve individual letters and digits in the same position,
but it is always guaranteed that at least one (and very likely almost all) of the alphanumeric characters will be different.
For example, ABC123
may be masked to BYC457
.
This mask can only be used on columns of integer
or string (char
/ varchar
) type.
null
values will always be left as null
.
Important!
from_unique_imitate
does not supportIDENTITY
columns in Oracle or Microsoft SQL Server.
You can apply a from_unique_imitate
mask to a primary key column
or a column that is used as a foreign key in another table.
References will be updated automatically.
Composite primary keys are supported.
Parameters
skip_letters
(optional): A boolean to enable or disable the skipping alphabetical characters from being masked. Defaults tofalse
(alphabetical characters will be replaced).skip_digits
(optional): A boolean to enable or disable the skipping of digits from being masked. Defaults tofalse
(digits will be replaced).checksum
(optional): A string to specify an algorithm to use to generate unique valid replacements. Options:brazilian_cpf
,credit_card
,luhn
oricp
.on_invalid
(optional): A string to specify the action to take if the value fails checksum validation. One of:mask
(default): If the input length and format is valid (ie, the input value is only invalid because of an invalid checksum), the input value will be masked with a new uniquely assigned invalid checksum. Otherwise, the input value will be masked without consideration of a checksum.skip
: Skip to the next value, the value remains unchanged.error
: Raise an error and stop masking.
retain_prefix_length
(optional): The number of characters of the input value to retain. See Retaining Prefixes below. By default, no prefix is retained (i.e. the entire input value is masked).include_leading_zeros
(optional): Whether to mask leading zeros in input values and allow leading zeros in masked values. See the table below for details and examples of how this option works. One of:true
: Leading zeros will be masked.false
: Leading zeros will be preserved.warn
(default): Likefalse
, but also issues a warning in the run log if a string value with all zero digit(s) is encountered, reminding the user that the digits (or perhaps the entire value) will not be masked.
min_digits
(optional): Only applicable wheninclude_leading_zeros
istrue
. Pads integertype values to the given number of digits before masking. Valid values formin_digits
are between 1 and 18.on_too_long
(optional): Only applicable wheninclude_leading_zeros
istrue
andmin_digits
is specified. Determines what action to take when an integer value with more thanmin_digits
digits is to be masked. (The leading
sign in a negative number does not count as a digit.) Valid values are:error
(default): Raise an error and stop masking.mask_suffix
: The lastmin_digits
digits of the value will be masked; earlier digits will be left asis.mask_all
: The entire value will be masked.
disable_warning_on_skipped_characters
(optional): A boolean which, when set totrue
, will prevent warnings about characters being skipped being logged during masking. Defaults tofalse
.
The
include_leading_zeros
option has no effect for integer data unlessmin_digits
is also specified. By default, integer values are always masked to values with the same number of digits as the input.Because it can create duplicate values, masking negative integers with
min_digits
is not supported. If this option is set and a negative integer value is encountered, DataMasque raises an error and stops masking.
Details and examples about the include_leading_zeros
, min_digits
, and on_too_long
options
Without min_digits
, the include_leading_zeros
option only has an effect when the input is a string.
The following table details the behaviour for string values.
include_leading_zeros = false or warn (default) 
include_leading_zeros = true 


Leading zeros in input  All zeros appearing before any other digit in the value are preserved. If the value only contains zero digits, no digits are masked. 
All digits, including leading zeros, are masked. 
Leading zeros in output  The masked value cannot have a zero as the first digit character in the string. The first nonzero digit cannot be masked to a zero digit. 
Any masked value of the same format is possible, including those with leading zero digits. The first nonzero digit can be masked to a zero digit. 
The min_digits
and on_too_long
options are only applicable when include_leading_zeros
is true
,
and allow include_leading_zeros
to apply to integer values as well.
The idea behind min_digits
is to pad integer values to a certain length,
so all values with at most that many digits are treated the same.
 Without
min_digits
, values 09 mask to 09, 1099 to 1099, and so on.  With
min_digits: 6
, as an example, all 16 digit numbers (i.e. 0999,999) mask to any other value 0999,999.
String inputs
The following table shows how the various options work on string inputs. All values are of string datatype.
Input  include_leading_zeros 
min_digits 
on_too_long 
Example output  Explanation 

"ABC001230044" 
false 
    "ZQD002573498" 
The leading zero characters in the value are left unchanged when include_leading_zeros is false (or warn ). 
"ABC001230044" 
true 
    "ZQD132560349" 
By setting include_leading_zeros to true , the first two digits in the value can now be masked. 
"DFR081121123" 
false 
    "PLX084320091" 
The rule about digits not being masked to zeros when include_leading_zeros is false only applies to the digits before the first nonzero digit (here the 8 ). Hence, the subsequent 1123 can be masked to a value starting with one or more zero digits. 
"DFR081121123" 
true 
    "PLX003155440" 
With include_leading_zeros set to true , any digit can be masked to any digit. Here the 0 happened to be masked to 0 and the 8 was masked to another 0 . 
"AA000000" 
warn 
    "NF000000" ,warning issued 
All the digits in the value are zeros, so DataMasque issues a warning as none of the digits will be masked. You can suppress the warning by setting include_leading_zeros to false . 
"AA000000" 
true 
    "NF493281" 
By setting include_leading_zeros to true , the value can now be masked. 
"1234" 
true 
5 
Any  "0781" 
The input is a string, but the options min_digits and on_too_long only affect integer values. 
Integer inputs
The following table shows how the various options work on integer inputs. All values are of integer datatype.
Input  include_leading_zeros 
min_digits 
on_too_long 
Example output  Explanation 

1234 
true 
    4873 
Integer inputs cannot have leading zeros by definition, so include_leading_zeros has no effect for integer values unless min_digits is also specified. 
1234 
true 
6 
  192766 
The value is padded to 6 digits and then masked. 
4321 
true 
6 
  5 ( 000005 ) 
Like the previous example, the value is padded to 6 digits and then masked. The output might have leading zeros, which are stripped off when the output is converted to integer datatype. As a result, the final masked value can have between 1 and 6 digits inclusive. 
1234567 
true 
4 
Not specified or error 
Error  The input value has 7 digits, but min_digits is set to 4 . The default behaviour of on_too_long is to raise an error. 
1234567 
true 
4 
mask_suffix 
1230991 
The input has more digits than min_digits . With the mask_suffix option, DataMasque only masks the last min_digits (here 4) digits. The other digits 123 are left asis. 
1234567 
true 
4 
mask_all 
5098437 
The input has more digits than min_digits . When on_too_long is set to mask_all , all digits are masked. The output always has the same length as the input when using mask_all , otherwise duplicate values can occur. 
0 
false 
    0 
If include_leading_zeros is not true , the integer value zero is always masked to zero. 
0 
true 
    8 
With include_leading_zeros set to true , the zero value is treated the same as any other onedigit number. 
0 
true 
2 
Any  19 
Same as the previous example  zero is treated like any other value, so padded to a twodigit number (since min_digits is 2 ) and then masked. 
1234567 
false or warn 
    4132891 
When include_leading_zeros is not true , integer inputs always mask to numbers with the same number of digits. The sign of the value is preserved. 
1234 
true 
6 
  Error  Masking negative numbers with min_digits is not supported. 
Obtaining consistent masking between string and integer inputs
One of the main uses of min_digits
is to enable numeric input values to mask to the same output,
regardless of whether the input is of string or integer datatype.
To do this, set include_leading_zeros
to true
, and min_digits
to the number of digits in the largest input value.
The string values must be stored as zeropadded values with this many digits, for example "000123"
when using min_digits: 6
.
Note: Integer inputs must all be nonnegative (0 or higher). Masking negative integers with
min_digits
is not supported.Note: For this masking case, omit
on_too_long
, or set it to the default value oferror
. That way, the masking run will fail if it encounters a value with more digits than expected, avoiding any cases wherefrom_unique_imitate
would produce inconsistent masking between string and integer values. Use ofmask_suffix
ormask_all
will not produce consistent results.
Input  include_leading_zeros 
min_digits 
Example output  Explanation 

"000123" 
true 
  "481657" 
Since include_leading_zeros is true , all digits of the input are masked. 
123 
true 
6 
481657 
The integer equivalent of the above input string is masked to the integer equivalent of the above output string. 
"004511" 
true 
  "000008" 
Since include_leading_zeros is true , it is possible for digits to be masked to zeros. 
4511 
true 
6 
8 ( 000008 ) 
Again, the integer equivalent of the input is masked to the integer equivalent of the output. Any output value between 0 and 999,999 is possible. 
123 
true 
Not specified  892 
Without min_digits , a threedigit integer is always masked to another threedigit integer. 
You can also leave include_leading_zeros
as the default value of warn
,
which still produces consistent results between zeropadded strings and integers,
but the output is less secure as the output value always has the same number of significant digits as the input value.
You might use this option if you do not know the number of digits in the largest (in magnitude) input value,
or if the input data includes negative numbers.
Note: For a given input, the output when using
include_leading_zeros: false
(orwarn
) may be markedly different from the output when usinginclude_leading_zeros: true
, even when there are no leading zeros.
Input  include_leading_zeros 
Example output  Explanation 

"000123" 
false or warn 
"000892" 
The input value has three leading zeros. Without include_leading_zeros , the output value retains exactly three leading zeros. 
123 
false or warn 
892 
The integer equivalent of the above input string is masked to the integer equivalent of the above output string. 
"004511" 
false or warn 
"003669" 
There are two leading zeros. The minus sign is not masked. 
4511 
false or warn 
3669 
Again, the integer equivalent of the input is masked to the integer equivalent of the output. 
Example ruleset
Consider a table like the following:
id Primary key 
value_as_string CHAR(6) 
value_as_integer INTEGER 

1 
"000123" 
123 
2 
"381731" 
381731 
etc. 
To mask this such that the consistency between the two value
columns is preserved, you can use a ruleset like the following.
version: "1.0"
tasks:
 type: mask_table
table: '"my_table"'
key: '"id"'
rules:
 column: '"value_as_string"'
masks:
 type: from_unique_imitate
include_leading_zeros: true
 column: '"value_as_integer"'
masks:
 type: from_unique_imitate
include_leading_zeros: true
min_digits: 6
Invalid Parameter Combinations
 Setting both
skip_digits
andskip_letters
totrue
is prohibited as no masking would take place. Using the
checksum
option requires masking digits, so if any checksum is specified then you cannot setskip_digits
totrue
. Further, theicp
checksum is an alphanumeric checksum, so you cannot setskip_digits
norskip_letters
totrue
when using this checksum. Because it would have no effect,
include_leading_zeros
cannot be specified ifskip_digits
is set totrue
. The
min_digits
option is not compatible withchecksum
, nor withretain_prefix_length
.
Values Requiring Checksums
When the output values must satisfy a checksum, specify the name of the checksum as the checksum
parameter.
Unique values will be generated that satisfy that checksum algorithm.
The available options for checksum
are:
For each checksum, the input value must contain a certain number of digits 09
and no letters AZ
or az
.
Other nonletter characters that are used for formatting are retained in the output.
The replacement value will conform to the checksum algorithm,
even if the input did not, provided it is of the correct length.
To handle masking of values that may not match the checksum, or that may contain letters,
specify the on_invalid
parameter.
Please refer to Using on_invalid
for a detailed explanation of the behavior of each on_invalid
option.
brazilian_cpf
Use this checksum
type to generate values that satisfy the Brazilian CPF (Cadastro de Pessoas Físicas) number
checksum.
For valid CPFs to be generated, the input value must contain 11 digits (and may contain spaces or punctuation).
For handling of invalid input values (for example, incorrect length, bad checksum or the presence of letters),
please refer to Using on_invalid
.
The table below shows example input and output data, based on the default parameters.
Input Example  Description  Output Example  Output Description 

298.056.37220  Valid, formatted CPF  886.972.87065  Valid CPF with formatting retained 
2980,5637,220  Valid CPF, with other formatting  8869,7287,065  Valid CPF with formatting retained 
29805637220  Valid CPF, digits only  88697287065  Valid CPF, digits only 
298.056.37229  11digit, formatted number, that is not a CPF  886.972.87065  Valid CPF with formatting retained 
29805637229  11digit number that is not a CPF  88697287065  Valid CPF, digits only 
298056372  9digit number  –  No output, error is raised and masking stops due to invalid length 
298A056B372C20  String with letters  –  No output, error is raised and masking stops due to invalid characters 
credit_card
and luhn
The credit_card
and luhn
checksums both generate values that satisfy the Luhn checksum algorithm.
The difference is how they each validate the length of the number:
luhn
may be applied to any number containing two or more digits.credit_card
is only valid for numbers of length 12 to 19, inclusive.
For values of length 1219 characters, the behaviour of both checksums is identical.
If you are masking only credit cards, then credit_card
should be preferred,
as it will also validate the length of existing values.
luhn
should be used when generating values of other lengths,
for example, mobile phone IMEI numbers.
For handling of invalid input values (for example, incorrect length, bad checksum or the presence or letters),
please refer to Using on_invalid
.
Input Example  Description  Output Example  Output Description 

4111 1111 1111 1111  Valid, formatted card number  2260 5651 2623 0906  Number that satisfies the Luhn checksum, with formatting retained 
2980,5637/2204  Number with other formatting  8869,7287/0655  Number that satisfies the Luhn checksum, with formatting retained 
4111111111111111  Valid card number, digits only  2260565126230906  Number that satisfies the Luhn checksum, digits only 
1234 1234 5678 5678  Formatted card number that does not satisfy the Luhn algorithm  2260 5651 2623 0906  Number that satisfies the Luhn checksum, with formatting retained 
298A056B372C20  String with letters  –  No output, error is raised and masking stops due to invalid characters 
icp
Use this checksum
type to generate values that satisfy the New Zealand Installation Control Point (ICP) checksum.
For valid ICPs to be generated, the input value must contain 15 digits and letters (and may contain spaces or
punctuation). For handling of invalid input values (for example, incorrect length, or bad checksum),
please refer to Using on_invalid
.
The table below shows example input and output data, based on the default parameters.
Input Example  Description  Output Example  Output Description 

1234567890XYD51  Valid ICP  7972434682KR014  Valid ICP 
1234567890XYD51  Valid ICP, with other formatting  7972434682KR014  Valid ICP with formatting retained 
0123456789XYD51  15alpha numeric, that is not an ICP    No output, error is raised and masking stops due to invalid checksum 
29805637229  11digit number that is not an ICP    No output, error is raised and masking stops due to invalid checksum 
Example
This example will apply from_unique_imitate
masks to the vehicle_id
, license_plate
and validation_code
columns.
version: '1.0'
tasks:
 type: mask_table
table: employees
key: id
rules:
 column: vehicle_id
masks:
 type: from_unique_imitate
 column: license_plate
masks:
 type: from_unique_imitate
 column: validation_code
masks:
 type: from_unique_imitate
Show result
Before  After  



Using on_invalid
The on_invalid
parameter can be used to control how invalid values are handled by from_unique_imitate
.
It can only be used in combination with the checksum
parameter,
since without specifying a checksum
there is no way of considering a value "invalid".
A value is considered invalid if:
 It is too short for the checksum, or,
 The checkdigit(s) are not valid for the checksum, or,
 It contains letters – this can be controlled with the
skip_letters
parameter. See Determining and Handling Invalid Values for more detail.
Note that null
is a special case and is not considered invalid.
null
input is masked to null
output, regardless of settings.
The following table illustrates the behaviour of from_unique_imitate
based on different on_invalid
parameters.
on_invalid 
Behaviour when encountering invalid value 

error (default if not specified) 
The masking task stops with an error. 
skip 
The invalid value is retained. 
mask 
The value is masked, and warnings are logged to the run log. See Determining and Handling Invalid Values for more detail. 
Determining and Handling Invalid Values
There two are main ways that a value can be invalid:
 It contains letters.
 It has the wrong length or checksum.
Note that special characters, punctuation, and spaces, do not affect the validity of values. For example,
from_unique_imitate
considers the values123 456
,123456
and123456
the same.
Determining the validity based on the presence of letters can be controlled with the skip_letters
parameter.
When set to true
, a value will not be invalid if it contains letters.
However, if the digits themselves do not satisfy the given checksum
,
then the value would be considered invalid.
The following table shows the validity of some example values for the brazilian_cpf
algorithm,
with and without the use of skip_letters: true
.
The rules are applicable to any checksum.
Input Value  skip_letters 
Valid  Reason 

12175488403 Valid CPF number 
true or false 
Yes  
121.754.88403 Valid CPF number with formatting 
true or false 
Yes  Punctuation is ignored. 
AB121.754.88403 Valid CPF number containing letters 
false 
No  The value contains letters. 
AB121.754.88403 Valid CPF number containing letters 
true 
Yes  The letters are ignored due to the use of skip_letters: true . 
121.754.88400 Invalid CPF, bad check digits. 
true or false 
No  Invalid due to bad check digits. 
AB121.754.88400 Invalid CPF, contains letters and bad check digits. 
true or false 
No  Invalid due to bad check digits, regardless of skipping letters or not. 
121.754.884 Invalid CPF, bad length. 
true or false 
No  Invalid due to length. 
AB121.754.884 Invalid CPF, bad length. 
true or false 
No  Invalid due to bad length, regardless of skipping letters or not. 
Once a value is determined to be invalid:
 If using
on_invalid: error
then the masking task will stop with an error.  If using
on_invalid: skip
then the value will be returned unmasked.  If using
on_invalid: mask
then masking will continue, and is described in more detail below.
For invalid values, the behaviour of the mask will change based on the reason for it being invalid.
If masking a value that is invalid only because of a bad check checksum, the output will mask uniquely to a new value with a uniquely masked, invalid checksum. If the value is invalid for another other reason, the string will be masked uniquely.
For example, the brazilian_cpf
algorithm requires 11digit values.
An 11digit value that is not a valid CPF number will be masked to a invalid CPF number
(an 11digit number that does not have a valid CPF number checksum).
However, a number that is not 11digits will not be masked to a valid CPF number, as the number of digits is not correct for that algorithm, but the numbers will be masked.
The following table shows the minimum and maximum value length to which the checksum applies.
checksum 
Minimum length (inclusive)  Maximum length (inclusive) 

brazilian_cpf 
11  11 
credit_card 
12  19 
luhn 
2  10,000 
icp 
15  15 
If the value is invalid because it contains letters:
 The letters will be masked if
skip_digits
isfalse
.  The letters remain unchanged if
skip_digits
istrue
.
The following table gives examples of masked outputs for invalid inputs,
for different skip_letters
options.
on_invalid
is set to mask
, otherwise no masking would occur.
The example uses the brazilian_cpf
algorithm but the rules are applicable to any checksum.
Input Value  skip_letters 
Output Value  Explanation 

AB121.754.88403 Valid CPF with letters 
false 
DF149.758.05529 
Valid CPF in output, with letters masked. 
AB121.754.88403 Valid CPF with letters 
true 
AB149.758.05529 
Valid CPF in output, letters not masked. 
121.754.88499 Invalid CPF 
true or false 
149.758.05530 
Invalid CPF in output. 
121.754.88498 Invalid CPF 
true or false 
149.758.05561 
Invalid CPF in output. 
121.754.88499AB Invalid CPF with letters 
false 
149.758.05530DF 
Invalid CPF in output, with letters masked. 
121.754.88499AB Invalid CPF with letters 
true 
149.758.05530AB 
Invalid CPF in output, letters not masked. 
121.754.884 Too short for CPF 
true or false 
246.016.536 
Not a valid CPF, as input too short. 
121.754.884.692 Too long for CPF 
true or false 
246.016.536.420 
Not a valid CPF, as input too long. 
AB121.754.884 Too short for CPF, with letters 
false 
DF246.016.536 
Not a valid CPF, as input too short, with letters masked. 
AB121.754.884.692 Too long for CPF, with letters 
false 
DF246.016.536.420 
Not a valid CPF, as input too long, with letters masked. 
AB121.754.884 Too short for CPF, with letters 
true 
AB246.016.536 
Not a valid CPF, as input too short, letters not masked. 
AB121.754.884.692 Too long for CPF, with letters 
true 
AB246.016.536.420 
Not a valid CPF, as input too long, letters not masked. 
Warning: When using
on_invalid: mask
with thebrazilian_cpf
checksum, 10 of the possible 1,000,000,000 digit combinations are not maskable with this algorithm.Where the data to mask has 11 digits, and the first nine digits are all the same digit (eg,
111.111.111xx
regardless of the checksum), the number itself is considered to be invalid.Where this data is required to be masked, it is recommended that the
from_unique_imitate
rule is wrapped by anif
rule that detects this data, and directs the engine to apply an alternative algorithm.When the masking process is configured to uniquely mask a CPF number that is invalid due to repeated digits, the run log will note a warning, and these entries will not be masked. This does not affect other checksum masking as this situation is specific to Brazilian CPF unique masking.
Example
This example will apply from_unique_imitate
masks to the cpf_number
and apply a redaction to known invalid numbers
by leveraging an if
conditional masking rule.
version: '1.0'
tasks:
 type: mask_table
table: employees
key: id
rules:
 if:
 column: cpfnumber
matches: '(?:.*?(\d)\1\1){3}.*'
rules:
 column: cpfnumber
masks:
 type: from_fixed
value: 'redacted'
else_rules:
 column: cpfnumber
masks:
 type: from_unique_imitate
checksum: brazilian_cpf
on_invalid: mask
Retaining Prefixes
When masking values with from_unique_imitate
, the retain_prefix_length
option be used to specify the number of
prefix characters of the input to retain in the output.
The length takes into account only characters that would be masked based on the parameters of the mask.
Since from_unique_imitate
doesn't mask punctuation or spaces, then these are not counted towards the prefix length.
Similarly, if using skip_letters
/skip_digits
, then letters or digits (respectively) won't be counted in the prefix.
If the retain_prefix_length
is equal to or longer than the values to be masked,
then an error will be raised during masking.
The following table shows the retained prefix for example values, based on different parameters.
Input Value  retain_prefix_length 
skip_letters 
skip_digits 
Retained Prefix  Example Output 

A1B2C3D4E5 
4  false 
false 
A1B2 
A1B2F6G7H8 
A1 B2 C3 D4 E5 
4  false 
false 
A1 B2 
A1 B2 F6 G7 H8 
A1 B2 C3 D4 E5 
4  true 
false 
A1 B2 C3 D4 
A1 B2 C3 D4 E8 
A1 B2 C3 D4 E5 
4  false 
true 
A1 B2 C3 D 
A1 B2 C3 D4 J5 
A1B 
4  true /false 
true /false 
  No output, error is raised as the prefix is >= the length of the value 
AAA111 
3  false 
false 
AAA 
AAA456 
AAA111 
3  true 
false 
  No output, error is raised as the prefix is >= than number of digits 
AAA111 
3  false 
true 
  No output, error is raised as the prefix is >= than number of letters 
The retain_prefix_length
parameter can be combined with the checksum
parameter.
The checksum will be generated after combining the prefix with the masked values.
For example, using the luhn
checksum, with retain_prefix_length
of 5
.
The value to be masked is 211287932175
(which is valid for the Luhn checksum).
The order of masking is:
 Extract 5 characters as a prefix:
21128
 Apply unique masking to all digits but the checksum digit
793217
, giving (for example)123456
 This value is combined with the prefix, giving
21128123456
 The checksum digit is calculated from this prefix and masked value, which is
0
 The checksum digit is appended to the masked value, giving final output value of
211281234560
, which is also valid for the Luhn algorithm.
When retaining the prefix and using a checksum, the number of digits must be less than the number of digits in the original value, minus the length of the check digits. For example, to mask a Brazilian CPF, a maximum of 8 digits can be retained, since the value is made up of 11 digits with the final 2 being checkdigits.
Checksum validity is checked when retaining the prefix, and validation is on the original value (i.e. including the prefix). To control how to handle invalid values, refer to Determining and Handling Invalid Values.
The following example ruleset shows masking a column using the credit_card
checksum,
retaining the first 4 digits.
version: "1.0"
tasks:
 type: mask_table
table: customers
key: customer_id
rules:
 column: credit_card_number
masks:
 type: from_unique_imitate
retain_prefix_length: 4
skip_letters: true
checksum: credit_card
Show result
Before  After  



RedShift Limitations
Due to the current method in which we mask with RedShift, please note the current limitations:
PRIMARY KEY
constrained columns will be transformed into columns which areUNIQUE
constrained instead ofPRIMARY KEY
constrained.NOT NULL
constrained columns will lose theirNOT NULL
constraint. However, this is not just limited tofrom_unique_imitate
masking.