Masking Joined Tables
When masking a table, you can specify a list of joins (or dictionary mapping keys to joins) that will join the rows of a target table to rows from one or more additional tables, providing you with the additional joined values to use in your masking rules.
Parameters
target_table
(required): The name of the new table you wish to join into the masking data. The target table can be prefixed with a schema name to reference a table in another schema.target_key
(required): The key ontarget_table
to use when performing the join. This can be specified as a single column name or a list of column names.source_table
(required): The name of the table you wish join thetarget_table
with. This could be the table being masked, or another table earlier in the list ofjoins
(allowing you to perform multistep joins). The source table can be prefixed with a schema name to reference a table in another schema.source_key
(required): The key onsource_table
to use when performing the join. This can be specified as a single column name or a list of column names.
Example
In this example, we have two tables: Customers
and Membership
. We would like the mask the first_name
and
last_name
columns of the Customers
table, but only if the user's membership status is Active.
The Customers
contains data on customers, including their name and membership ID. The Membership
table
contains the status of the membership of each Customer: either Active or Inactive. The membership_ip
column
of the Users
has a foreign key relation with the id
column of the Membership
table.
Customers
Table
customer_id | first_name | last_name | membership_id |
---|---|---|---|
1 | Anastasia | Rose | 10001 |
2 | Bill | Jones | 10002 |
3 | Chris | Yang | 10003 |
4 | Judith | Taylor | 10004 |
5 | Gordon | Smith | 10005 |
Membership
Table
id | membership_status |
---|---|
10000 | Active |
10001 | Active |
10002 | Inactive |
10003 | Active |
10004 | Inactive |
In order to access the membership_status
column of the Membership
table, we need to define a join in our
ruleset from the Customers
table to the Membership
table.
version: "1.0"
tasks:
- type: mask_table
table: Customers
key: customer_id
joins:
- target_table: Membership
target_key: id
source_table: Customers
source_key: membership_id
rules:
- if:
- column: '"Membership".membership_status'
equals: Active
rules:
- column: first_name
masks:
- type: from_file
seed_file: DataMasque_firstNames_mixed.csv
seed_column: firstname-mixed
- column: last_name
masks:
- type: from_file
seed_file: DataMasque_lastNames.csv
seed_column: lastnames
After performing the join, this will allow us to reference the membership_status
column of the Membership
table in our ruleset. In this example, we can reference the column with Membership.membership_status
. Using
this column, we can use Conditional Masking to only mask the rows of Customers
where the
status of the membership is 'Active'.
Note: To reference a column in a joined table, the table name of joined table must be added as a prefix to the column name**
The example below utilises the from_file
mask type detailed here to select a random
first name from the DataMasque_firstNames-mixed.csv
and a random last name from the
DataMasque_lastNames.csv
files that can be found on our Supplementary Files user guide. It will
first check if the membership_status
for the customer is 'Active', and if so, masks the two name columns: otherwise,
these columns are left unmasked.
version: "1.0"
tasks:
- type: mask_table
table: Customers
key: customer_id
joins:
- target_table: Membership
target_key: id
source_table: Customers
source_key: membership_id
rules:
- if:
- column: '"Membership".membership_status'
equals: Active
rules:
- column: first_name
masks:
- type: from_file
seed_file: DataMasque_firstNames_mixed.csv
seed_column: firstname-mixed
- column: last_name
masks:
- type: from_file
seed_file: DataMasque_lastNames.csv
seed_column: lastnames
This example will produce the following results in the Customers
table. The customers with customer_id
3 and 5
are not masked, as the status of their membership is 'Inactive' in the joined Membership
table.
customer_id | first_name | last_name | membership_id |
---|---|---|---|
1 | Tia | Pallin | 10001 |
2 | Nikau | Koller | 10002 |
3 | Chris | Yang | 10003 |
4 | Anika | Thom | 10004 |
5 | Gordon | Smith | 10005 |
Note:
For Microsoft SQL Server (Linked Server),
joins
are not currently supported.For Microsoft SQL Server, when using temporary table, the name of the temporary table must be wrapped in quotation marks, as the
#
symbol in the YAML editor denotes the beginning of a comment (e.g.target_table: '##my_temporary_table'
or'##my_temporary_table.column'
).To reference a temporary table column (e.g. the
table_filter_column
parameter of thefrom_file
mask type or as a part ofhash_columns
) you must prefix the column name with its table name (e.g.table.column
).Any column name specified without a table prefix is assumed to belong to the table being masked (as specified by the
table
parameter for the task). You cannot specify tables that belong to other schemas.