DataMasque Portal

Date/Time Masks

These masks are used for working with date or datetime values when masking.

Note: For guidance on date parsing parameters when working with dates and times, please consult Date Parsing in DataMasque.

Note: Consistency in timezone handling between date/time values in the masking data and ruleset is crucial. Mismatches, where one includes timezones and the other does not, can lead to unexpected masking results.

Random date/time (from_random_datetime)

This mask replaces the column value with a random datetime. Generated datetime values have a resolution of 1 second (the millisecond component will be 0).

Parameters

  • max (required): The generated date/time is guaranteed to be less than this date/time, formatted according to ISO 8601, or you can specify now/current_date_time, which will be the date and time that the run executes.
  • min (required): This is the minimum value for the generated date/time, formatted according to ISO 8601, or you can specify now/current_date_time, which will be the date and time that the run executes.

Note A from_random_datetime mask can be used in conjunction with a Typecast mask in order to make sure the format of the generated date/datetime value satisfies any format requirements.

Note: If the column's data type does not support timezones, the timezone will be dropped from the masked value.

Note: When masking Redshift, DataMasque does not support datetimes with timezones as the min or max value, therefore timezone offsets must be excluded from the ruleset, or they will cause the masking run to fail.

For example, specify min: '2019-12-01T00:00' instead of min: '2019-12-01T00:00+00:00'.

Example

This example replaces the order_time column with a random date time between December 1, 2019 00:00:00 UTC and December 31, 2019 05:30:00 UTC.

version: '1.0'
tasks:
  - type: mask_table
    table: orders
    key: id
    rules:
      - column: order_time
        masks:
          - type: from_random_datetime
            min: '2019-12-01T00:00+00:00'
            max: '2019-12-31T05:30+00:00'

Show result

Before After
order_time
2019-12-07 19:50:12+00:00
2019-12-10 22:33:00+00:00
1966-12-22 07:09:01+00:00
2019-12-27 09:23:23+00:00
2019-12-29 12:06:50+00:00
2019-12-30 08:40:34+00:00
order_time
2019-12-23 20:02:50+00:00
2019-12-09 11:32:00+00:00
2019-12-12 21:31:41+00:00
2019-12-06 12:07:18+00:00
2019-12-02 05:28:45+00:00
2019-12-20 08:40:34+00:00


Random date (from_random_date)

This mask replaces the column value with a random date. Generated date values have a resolution of 1 day and do not include a time component.

Parameters

  • max (required): The generated date/time is guaranteed to be less than this date/time, formatted according to ISO 8601 or you can specify now/current_date_time, which will be the date and time that the run executes.
  • min (required): This is the minimum value for the generated date/time, formatted according to ISO 8601 or you can specify now/current_date_time, which will be the date and time that the run executes.

Note A from_random_date mask can be used in conjunction with a Typecast mask in order to make sure the format of the generated date value satisfies any format requirements.

Note: When masking Redshift, DataMasque does not support datetimes with timezones as the min or max value, therefore timezone offsets must be excluded from the ruleset, or they will cause the masking run to fail.

For example, specify min: '2019-12-01T00:00' instead of min: '2019-12-01T00:00+00:00'.

Example

This example replaces the date_of_birth column with a random date between January 1, 1950, and December 31, 2000.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: date_of_birth
        masks:
          - type: from_random_date
            min: '1950-01-01'
            max: '2000-12-31'

Show result

Before After
date_of_birth
1959-03-07
1965-11-10
1966-06-22
1979-02-27
1990-05-31
1999-07-31
date_of_birth
1997-08-25
1970-10-09
1995-01-18
1973-04-18
1961-02-26
1992-04-30

Note that this mask uses the ISO-8601 standard for date values. This allows for ISO weeks to be used for formatting dates, ISO week formats are specified with a 'W' in front of the week number and optionally followed by the day of the week. Additionally, day of the year can be specified as a three-digit number. See Date formats below.

Date formats

Format Example Equivalent
yyyy2022January 1, 2022
yyyy-mm2022-03March 1, 2022
yyyy-mm-dd2022-03-05March 5, 2022
yyyy-Www2022-W20Week 20 of 2022
(May 16, 2022)
yyyy-Www-d2022-W20-4Day 4 of Week 20 of 2022
(May 19, 2022)
yyyy-ddd2022-050Day 50 of 2022
(February 19, 2022)

Retain age (retain_age)

This mask is designed to be used on a column containing dates. It will mask by generating random dates in such a way that age (in years) is retained, calculated based on the date the mask is executed. This can be used to ensure that columns such as date of birth,transaction date, registration date or other creation dates, are masked without conflicting with other rules that may rely on the age of the masked date.

Parameters

  • date_format (optional): If the column value is read as a string type, then it will be converted to a date using this format. The new randomly generated date will also be written back to the database in the same format. Format strings must be valid Python date/time format strings. Defaults to "%Y-%m-%d".
  • day_of_year (optional): For consistent masking, specify a day_of_year in <month>-<day> format to calculate the ages on that day of the current year. See Consistency of retained ages below. If this option is omitted, then ages will be calculated based on the current date (in UTC time).
  • allow_future (optional): When set to true, this allows for the masking of future dates while preserving the age derived from the date. Dates in the future will only be masked with dates that are also in the future. If set to false and a future date is encountered, an error will be raised. Defaults to true.
  • force_change (optional): Ensures the newly generated date is different to the input date. On the chance that the same date is generated.

Consistency of retained ages

The age calculation operates on the principle of retaining the whole number of years of age, based on the date of masking. This means that the generated date may vary if masking is performed on different days.

Note: Birth dates are used as examples in this explanation, but the principle of retaining age can be applied to any type of date where age retention is desired.

Consider the example of a person born on the 1st of June 2000. They turned 20 on the 1st of June 2020 and remained 20 until the 31st of May 2021 (inclusive).

Working backward, someone aged 20 on the 1st of June 2020 could have a birthday from the 2nd of June 1999 until the 1st of June 2000 (inclusive). Regardless of their actual birthday, anyone born within this range is considered to be 20 years old on that day.

However, the range of birthdays that result in the same age for a given date shifts depending on the current date. For example, if the calculation is rerun on the 2nd of June 2020, someone aged 20 could now be born from the 3rd of June 1999 to the 2nd of June 2000. Again, anyone born within this adjusted range would be 20 on the 2nd of June 2020.

To prevent these changes and maintain consistency in age calculation, you can specify the day_of_year parameter for the mask. This parameter allows you to calculate the age for a specific day of the year.

The day_of_year parameter should be specified in the format "<month>-<day>", such as "8-1" for the 1st of August. The year will automatically be set to the current year.

The following table demonstrates the impact of using the day_of_year option when calculating date ranges for age determination:

Original Date day_of_year Parameter Date of Masking Output Date Range (Inclusive)
2000-06-01 Unset 2020-06-01 1999-06-02 to 2000-06-01
2000-06-01 Unset 2020-06-02 1999-06-03 to 2000-06-03
2000-06-01 8-1 2020-06-01 1999-08-02 to 2000-08-01
2000-06-01 8-1 2020-06-02 1999-08-02 to 2000-08-01

The output date range represents the possible birthdays that result in the same age as of the masking date.

When using the day_of_year parameter, the output date range remains consistent across multiple days of masking, ensuring uniformity in age calculation. However, the birth range shifts to maintain accuracy for the specified day_of_year.

The choice between using or not using day_of_year should be based on the requirement for precise age calculation, or consistent masking behavior across different days.

Note: As DataMasque operates in UTC time, the calculation date change occurs at midnight in the UTC time zone. Therefore, you may see different results on the same day in your timezone, because UTC has changed to a new day.

Example

This example replaces the date_of_birth column values while retaining the employee's age. The dates are based on the masking run being executed on August 1st, 2022. The employees' ages are 50, 50, 49 and 22, respectively, and are maintained even with the new dates of birth.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: date_of_birth
        masks:
          - type: retain_age

Show result

Before After
date_of_birth
1972-07-31
1972-08-01
1972-08-02
2000-02-29
date_of_birth
1972-05-30
1971-08-10
1973-04-21
1999-10-20

Note that the mask considers birthdays on the 29th of February to be celebrated on the 28th of February on non-leap-years.

If masking datetime values (rather than just dates) then the time portion of the masked value will be 00:00:00 (midnight).


Retain date component (retain_date_component)

This mask will generate a date where up to two of the year, month, or day components value will be retained.

Parameters

  • date_format (optional): If the column value is read as a string type, then it will be converted to a date using this format. The new randomly generated date will also be written back to the database in the same format. Format strings must be valid Python date/time format strings. Defaults to "%Y-%m-%d".
  • year (this OR month OR day required): This is a bool which specifies whether the year component of the date should be retained. Defaults to false.
  • month (this OR year OR day required): This is a bool which specifies whether the month component of the date should be retained. Defaults to false.
  • day (this OR year OR month required): This is a bool which specifies whether the day component of the date should be retained. Defaults to false.
  • minimum_year (optional): This is the minimum year that can be generated. Defaults to 100 years ago.
  • maximum_year(optional): This is the maximum year that can be generated. Defaults to current year.
  • force_change (optional): Ensures the newly generated date is different to the input date. On the chance that the same date is generated.

Example

In this example the month of each date is retained, while the day and year will be randomly generated. The year will be a value between 1980 and 2005 (inclusive).

version: '1.0'
tasks:
  - type: mask_table
    table: customers
    key: id
    rules:
      - column: date_of_birth
        masks:
          - type: retain_date_component
            month: True
            minimum_year: 1980
            maximum_year: 2005

Show result

Before After
date_of_birth
1972-07-31
2000-08-01
2010-10-02
1996-02-29
date_of_birth
1998-07-21
2004-08-15
1990-10-27
2000-02-12

Note: In order for retain_date_component to consistently produce the same values between runs in different years, a fixed maximum_year must be specified.


Retain year (retain_year)

This mask replaces the column value with a random date. The generated date will have the year component retained while randomising month and day.

Parameters

  • date_format (optional): If the column value is read as a string type, then it will be converted to a date using this format. The new randomly generated date will also be written back to the database in the same format. Format strings must be valid Python date/time format strings. Defaults to "%Y-%m-%d".
  • force_change (optional): Ensures the newly generated date is different to the input date. On the chance that the same date is generated.

Example

This example replaces the date_of_birth column values with new dates while retaining their year component.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: date_of_birth
        masks:
          - type: retain_year
            force_change: True

Show result

Before After
date_of_birth
1972-07-31
1984-08-01
1996-08-02
2000-02-29
date_of_birth
1972-01-30
1984-11-10
1996-06-28
2000-02-13