Date/Time Masks
These masks are used for working with date or datetime values when masking.
- Random date/time (
from_random_datetime
)
Generates random dates and times - Random date (
from_random_date
)
Generates random dates (without time components) - Retain Age (
retain_age
)
Transforms values into random date times that preserve the age (in years). - Retain Date Component (
retain_date_component
)
Transforms specific parts of a date to be random - Retain Year (
retain_year
)
Transforms a date time randomly whilst keeping the same year
Note: For guidance on date parsing parameters when working with dates and times, please consult Date Parsing in DataMasque.
Note: Consistency in timezone handling between date/time values in the masking data and ruleset is crucial. Mismatches, where one includes timezones and the other does not, can lead to unexpected masking results.
Random date/time (from_random_datetime
)
This mask replaces the column value with a random datetime. Generated datetime values have a resolution of 1 second (the millisecond component will be 0).
Parameters
max
(required): The generated date/time is guaranteed to be less than this date/time, formatted according to ISO 8601, or you can specifynow
/current_date_time
, which will be the date and time that the run executes.min
(required): This is the minimum value for the generated date/time, formatted according to ISO 8601, or you can specifynow
/current_date_time
, which will be the date and time that the run executes.
Note A
from_random_datetime
mask can be used in conjunction with a Typecast mask in order to make sure the format of the generated date/datetime value satisfies any format requirements.Note: If the column's data type does not support timezones, the timezone will be dropped from the masked value.
Note: When masking Redshift, DataMasque does not support datetimes with timezones as the
min
ormax
value, therefore timezone offsets must be excluded from the ruleset, or they will cause the masking run to fail.For example, specify
min: '2019-12-01T00:00'
instead ofmin: '2019-12-01T00:00+00:00'
.
Example
This example replaces the order_time
column with a random date time between
December 1, 2019 00:00:00 UTC and December 31, 2019 05:30:00 UTC.
version: '1.0'
tasks:
- type: mask_table
table: orders
key: id
rules:
- column: order_time
masks:
- type: from_random_datetime
min: '2019-12-01T00:00+00:00'
max: '2019-12-31T05:30+00:00'
Show result
Before | After |
|
|
---|
Random date (from_random_date
)
This mask replaces the column value with a random date. Generated date values have a resolution of 1 day and do not include a time component.
Parameters
max
(required): The generated date/time is guaranteed to be less than this date/time, formatted according to ISO 8601 or you can specifynow
/current_date_time
, which will be the date and time that the run executes.min
(required): This is the minimum value for the generated date/time, formatted according to ISO 8601 or you can specifynow
/current_date_time
, which will be the date and time that the run executes.
Note A
from_random_date
mask can be used in conjunction with a Typecast mask in order to make sure the format of the generated date value satisfies any format requirements.Note: When masking Redshift, DataMasque does not support datetimes with timezones as the
min
ormax
value, therefore timezone offsets must be excluded from the ruleset, or they will cause the masking run to fail.For example, specify
min: '2019-12-01T00:00'
instead ofmin: '2019-12-01T00:00+00:00'
.
Example
This example replaces the date_of_birth
column with a random date between January 1, 1950, and December 31, 2000.
version: '1.0'
tasks:
- type: mask_table
table: employees
key: id
rules:
- column: date_of_birth
masks:
- type: from_random_date
min: '1950-01-01'
max: '2000-12-31'
Show result
Before | After |
|
|
---|
Note that this mask uses the ISO-8601 standard for
date
values. This allows for ISO weeks to be used for formatting dates, ISO week formats are specified with a 'W' in front of the week number and optionally followed by the day of the week. Additionally, day of the year can be specified as a three-digit number. See Date formats below.
Date formats
Format | Example | Equivalent |
---|---|---|
yyyy | 2022 | January 1, 2022 |
yyyy-mm | 2022-03 | March 1, 2022 |
yyyy-mm-dd | 2022-03-05 | March 5, 2022 |
yyyy-Www | 2022-W20 | Week 20 of 2022 (May 16, 2022) |
yyyy-Www-d | 2022-W20-4 | Day 4 of Week 20 of 2022 (May 19, 2022) |
yyyy-ddd | 2022-050 | Day 50 of 2022 (February 19, 2022) |
Retain age (retain_age
)
This mask is designed to be used on a column containing dates. It will mask by generating random dates in such a way that age (in years) is retained, calculated based on the date the mask is executed. This can be used to ensure that columns such as date of birth,transaction date, registration date or other creation dates, are masked without conflicting with other rules that may rely on the age of the masked date.
Parameters
date_format
(optional): If the column value is read as a string type, then it will be converted to a date using this format. The new randomly generated date will also be written back to the database in the same format. Format strings must be valid Python date/time format strings. Defaults to"%Y-%m-%d"
.day_of_year
(optional): For consistent masking, specify aday_of_year
in<month>-<day>
format to calculate the ages on that day of the current year. See Consistency of retained ages below. If this option is omitted, then ages will be calculated based on the current date (in UTC time).allow_future
(optional): When set totrue
, this allows for the masking of future dates while preserving the age derived from the date. Dates in the future will only be masked with dates that are also in the future. If set tofalse
and a future date is encountered, an error will be raised. Defaults totrue
.force_change
(optional): Ensures the newly generated date is different to the input date. On the chance that the same date is generated.
Consistency of retained ages
The age calculation operates on the principle of retaining the whole number of years of age, based on the date of masking. This means that the generated date may vary if masking is performed on different days.
Note: Birth dates are used as examples in this explanation, but the principle of retaining age can be applied to any type of date where age retention is desired.
Consider the example of a person born on the 1st of June 2000. They turned 20 on the 1st of June 2020 and remained 20 until the 31st of May 2021 (inclusive).
Working backward, someone aged 20 on the 1st of June 2020 could have a birthday from the 2nd of June 1999 until the 1st of June 2000 (inclusive). Regardless of their actual birthday, anyone born within this range is considered to be 20 years old on that day.
However, the range of birthdays that result in the same age for a given date shifts depending on the current date. For example, if the calculation is rerun on the 2nd of June 2020, someone aged 20 could now be born from the 3rd of June 1999 to the 2nd of June 2000. Again, anyone born within this adjusted range would be 20 on the 2nd of June 2020.
To prevent these changes and maintain consistency in age calculation,
you can specify the day_of_year
parameter for the mask.
This parameter allows you to calculate the age for a specific day of the year.
The day_of_year
parameter should be specified in the format "<month>-<day>"
,
such as "8-1"
for the 1st of August. The year will automatically be set to the current year.
The following table demonstrates the impact of using the day_of_year
option when calculating date ranges for age
determination:
Original Date | day_of_year Parameter |
Date of Masking | Output Date Range (Inclusive) |
---|---|---|---|
2000-06-01 | Unset | 2020-06-01 | 1999-06-02 to 2000-06-01 |
2000-06-01 | Unset | 2020-06-02 | 1999-06-03 to 2000-06-03 |
2000-06-01 | 8-1 |
2020-06-01 | 1999-08-02 to 2000-08-01 |
2000-06-01 | 8-1 |
2020-06-02 | 1999-08-02 to 2000-08-01 |
The output date range represents the possible birthdays that result in the same age as of the masking date.
When using the day_of_year
parameter,
the output date range remains consistent across multiple days of masking,
ensuring uniformity in age calculation.
However, the birth range shifts to maintain accuracy for the specified day_of_year
.
The choice between using or not using day_of_year
should be based on the requirement for precise age calculation,
or consistent masking behavior across different days.
Note: As DataMasque operates in UTC time, the calculation date change occurs at midnight in the UTC time zone. Therefore, you may see different results on the same day in your timezone, because UTC has changed to a new day.
Example
This example replaces the date_of_birth
column values while retaining the employee's age.
The dates are based on the masking run being executed on August 1st, 2022.
The employees' ages are 50, 50, 49 and 22, respectively,
and are maintained even with the new dates of birth.
version: '1.0'
tasks:
- type: mask_table
table: employees
key: id
rules:
- column: date_of_birth
masks:
- type: retain_age
Show result
Before | After |
|
|
---|
Note that the mask considers birthdays on the 29th of February to be celebrated on the 28th of February on non-leap-years.
If masking
datetime
values (rather than justdate
s) then the time portion of the masked value will be00:00:00
(midnight).
Retain date component (retain_date_component
)
This mask will generate a date where up to two of the year, month, or day components value will be retained.
Parameters
date_format
(optional): If the column value is read as a string type, then it will be converted to a date using this format. The new randomly generated date will also be written back to the database in the same format. Format strings must be valid Python date/time format strings. Defaults to"%Y-%m-%d"
.year
(this ORmonth
ORday
required): This is abool
which specifies whether the year component of the date should be retained. Defaults tofalse
.month
(this ORyear
ORday
required): This is abool
which specifies whether the month component of the date should be retained. Defaults tofalse
.day
(this ORyear
ORmonth
required): This is abool
which specifies whether the day component of the date should be retained. Defaults tofalse
.minimum_year
(optional): This is the minimum year that can be generated. Defaults to 100 years ago.maximum_year
(optional): This is the maximum year that can be generated. Defaults to current year.force_change
(optional): Ensures the newly generated date is different to the input date. On the chance that the same date is generated.
Example
In this example the month of each date is retained, while the day and year will be randomly generated. The year will be a value between 1980 and 2005 (inclusive).
version: '1.0'
tasks:
- type: mask_table
table: customers
key: id
rules:
- column: date_of_birth
masks:
- type: retain_date_component
month: True
minimum_year: 1980
maximum_year: 2005
Show result
Before | After |
|
|
---|
Note: In order for
retain_date_component
to consistently produce the same values between runs in different years, a fixedmaximum_year
must be specified.
Retain year (retain_year
)
This mask replaces the column value with a random date. The generated date will have the year component retained while randomising month and day.
Parameters
date_format
(optional): If the column value is read as a string type, then it will be converted to a date using this format. The new randomly generated date will also be written back to the database in the same format. Format strings must be valid Python date/time format strings. Defaults to"%Y-%m-%d"
.force_change
(optional): Ensures the newly generated date is different to the input date. On the chance that the same date is generated.
Example
This example replaces the date_of_birth
column values with new dates while retaining their year component.
version: '1.0'
tasks:
- type: mask_table
table: employees
key: id
rules:
- column: date_of_birth
masks:
- type: retain_year
force_change: True
Show result
Before | After |
|
|
---|