DataMasque Portal

Masking Joined Tables

When masking a table, you can specify a list of joins (or dictionary mapping keys to joins) that will join the rows of a target table to rows from one or more additional tables, providing you with the additional joined values to use in your masking rules.

Parameters

  • target_table (required): The name of the new table you wish to join into the masking data. The target table can be prefixed with a schema name to reference a table in another schema.
  • target_key (required): The key on target_table to use when performing the join. This can be specified as a single column name or a list of column names.
  • source_table (required): The name of the table you wish join the target_table with. This could be the table being masked, or another table earlier in the list of joins (allowing you to perform multistep joins). The source table can be prefixed with a schema name to reference a table in another schema.
  • source_key (required): The key on source_table to use when performing the join. This can be specified as a single column name or a list of column names.

Example

In this example, we have two tables: Customers and Membership. We would like the mask the first_name and last_name columns of the Customers table, but only if the user's membership status is Active.

The Customers contains data on customers, including their name and membership ID. The Membership table contains the status of the membership of each Customer: either Active or Inactive. The membership_ip column of the Users has a foreign key relation with the id column of the Membership table.


Customers Table

customer_id first_name last_name membership_id
1 Anastasia Rose 10001
2 Bill Jones 10002
3 Chris Yang 10003
4 Judith Taylor 10004
5 Gordon Smith 10005


Membership Table

id membership_status
10000 Active
10001 Active
10002 Inactive
10003 Active
10004 Inactive

Customers Membership diagram

In order to access the membership_status column of the Membership table, we need to define a join in our ruleset from the Customers table to the Membership table.

version: "1.0"
tasks:
  - type: mask_table
    table: Customers
    key: customer_id
    joins:
      - target_table: Membership
        target_key: id
        source_table: Customers
        source_key: membership_id
    rules:
      - if:
        - column: '"Membership".membership_status'
          equals: Active
        rules:
          - column: first_name
            masks:
              - type: from_file
                seed_file: DataMasque_firstNames_mixed.csv
                seed_column: firstname-mixed
          - column: last_name
            masks:
              - type: from_file
                seed_file: DataMasque_lastNames.csv
                seed_column: lastnames

After performing the join, this will allow us to reference the membership_status column of the Membership table in our ruleset. In this example, we can reference the column with Membership.membership_status. Using this column, we can use Conditional Masking to only mask the rows of Customers where the status of the membership is 'Active'.

Note: To reference a column in a joined table, the table name of joined table must be added as a prefix to the column name**

The example below utilises the from_file mask type detailed here to select a random first name from the DataMasque_firstNames-mixed.csv and a random last name from the DataMasque_lastNames.csv files that can be found on our Supplementary Files user guide. It will first check if the membership_status for the customer is 'Active', and if so, masks the two name columns: otherwise, these columns are left unmasked.

version: "1.0"
tasks:
  - type: mask_table
    table: Customers
    key: customer_id
    joins:
      - target_table: Membership
        target_key: id
        source_table: Customers
        source_key: membership_id
    rules:
      - if:
        - column: '"Membership".membership_status'
          equals: Active
        rules:
          - column: first_name
            masks:
              - type: from_file
                seed_file: DataMasque_firstNames_mixed.csv
                seed_column: firstname-mixed
          - column: last_name
            masks:
              - type: from_file
                seed_file: DataMasque_lastNames.csv
                seed_column: lastnames

This example will produce the following results in the Customers table. The customers with customer_id 3 and 5 are not masked, as the status of their membership is 'Inactive' in the joined Membership table.

customer_id first_name last_name membership_id
1 Tia Pallin 10001
2 Nikau Koller 10002
3 Chris Yang 10003
4 Anika Thom 10004
5 Gordon Smith 10005

Note:

  • For Microsoft SQL Server (Linked Server), joins are not currently supported.

  • For Microsoft SQL Server, when using temporary table, the name of the temporary table must be wrapped in quotation marks, as the # symbol in the YAML editor denotes the beginning of a comment (e.g. target_table: '##my_temporary_table' or '##my_temporary_table.column').

  • To reference a temporary table column (e.g. the table_filter_column parameter of the from_file mask type or as a part of hash_columns) you must prefix the column name with its table name (e.g. table.column).

  • Any column name specified without a table prefix is assumed to belong to the table being masked (as specified by the table parameter for the task). You cannot specify tables that belong to other schemas.