DataMasque Portal

Transformation Masks

Transformation masks transform the underlying data type they work on. These masks are typically used alongside other masks for more complex masking operations.


Typecast (typecast)

A mask to convert a value from one data type to another. This is particularly useful when a value is stored as one data type in a database, but needs to be transformed to a different data type to apply certain mask functions within DataMasque.

If the typecast cannot be performed (e.g. because the value of the column is incompatible with the target data type), an error is raised.

Parameters

  • typecast_as (required): The data type you wish to convert the column value to. Must be one of: date, datetime, string, integer, float, decimal, or boolean.
  • date_format (optional): When converting a string to a datetime, this format string can be provided to specify the expected format of the string. Conversely, when converting a datetime to a string, this format string can be provided to determine the format of the resulting string. Format strings must be valid Python date/time format strings. See also Date Parsing in DataMasque for more information.

Notes

  • With typecast_as: integer, the typecast mask can only convert floats, or strings representing floats, if the floating-point value has a zero decimal part. For example, both 4.0 (float) and "4.000" (string) are converted to the integer 4. If the value has a non-zero decimal part, or is a special float value like nan (not a number) or inf (infinity), then an error is raised.
  • With typecast_as: integer, typecast_as: float or typecast_as: decimal, the Boolean values true and false are converted to 1 and 0 respectively.
  • The decimal type is designed for use with Parquet columns of type decimal128, but is also compatible with DECIMAL, NUMBER (Oracle) and NUMERIC (SQL Server) columns. If you experience failures masking data with the decimal typecast, try using float instead.

Example

This example will replace every start_date in the employees table with a fixed date value of 2010-01-01. The fixed date value is specified as a string and typecast as a datetime before being written to the database.

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: start_date
        masks:
          - type: from_fixed
            value: '2010-01-01'
          - type: typecast
            typecast_as: datetime
            date_format: '%Y-%m-%d'

Manipulating date values as strings

If you wish to manipulate non-string values using their string representations, you can use the typecast mask. Using the pattern demonstrated below, you may convert the value to a string, apply some manipulations, and then convert back to the original data type. For example, to truncate the day and month from a date while retaining the year, you may do the following:

version: '1.0'
tasks:
  - type: mask_table
    table: employees
    key: id
    rules:
      - column: date_of_birth
        masks:
          # Convert to string
          - type: typecast
            typecast_as: string
            date_format: '%Y-%m-%d'
          # Take the 'YYYY-' component and append a fixed value of
          # '01-01' to create the value 'YYYY-01-01'
          - type: concat
            masks:
              - type: replace_substring
                start_index: 5
                masks:
                - type: from_fixed
                  value: '01-01'
          # Convert back to original datatype
          - type: typecast
            typecast_as: datetime
            date_format: '%Y-%m-%d'

Do nothing (do_nothing)

A simple mask which does not mask the data instead, sets the original value. This is useful when there are certain items or elements of columns that are not intended to be masked.

For example, you can use it to retain certain known XML attributes and mask all the rest. For more information please refer to XML Documentation.