Masking Tutorial

1. Introduction
2. Creating a connection
3. Uploading a seed file
4. Creating a ruleset
5. Starting a masking run
6. Next steps

1. Introduction

This tutorial will guide you through the process of:

Configuring a Connection to a target database
Building a simple Ruleset to mask user details
Executing a masking Run against the target database

Prerequisites

To complete this tutorial, you need a target database server (either Oracle or Microsoft SQL Server) with an empty schema. Download and run one of the following scripts to initialise the schema with a users table and some sample user data.

This tutorial uses a seed file tutorial_names.csv as the data source for replacement name values. Download this file now for use later on.

Masking strategy

After running the database init script from the previous section, you will have a users table with four columns. The table below describes the strategy used to mask each of these columns:

Column	Strategy
`user_id`	N/A (non-sensitive data)
`date_of_birth`	Replace with a randomly generated date.
`first_name`	Replace with a random first name chosen from seed file.
`last_name`	Replace with a random last name chosen from seed file.

2. Creating a connection

After logging in to DataMasque, you will be taken to the Dashboard:

Click the button on the Connections panel. You will be taken to the 'Add Connection' form:

Create-connection

Complete the form with the parameters and credentials for the database that you prepared in the Prerequisites section. A meaningful connection name will help you identify the target database at a glance. In this example, the connection is named datamasque_tutorial. If you are unsure what value to use for any of the parameters, refer to the Connections reference.

Click the TEST CONNECTION button to validate that DataMasque can connect to the target database. Once you have confirmed that your connection works, click SAVE AND EXIT to complete the connection setup. Your new connection will now be available in the Connections list on the Dashboard.

Dashboard-with-connection

3. Uploading a seed file

Seed files provide datasets of replacement values for DataMasque to use when masking. Seed files must be CSV formatted and include a header row. In this tutorial, the file tutorial_names.csv will be used to provide DataMasque with a dataset of replacement names for the users table. The file contains two columns: first_name and last_name.
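
Before uploading, it can be useful to confirm locally that a seed file has the header row and columns described above. The following is a minimal sketch, not part of DataMasque: the function name and the sample rows are hypothetical, and only the column names first_name and last_name come from this tutorial.

```python
import csv
import io

# Hypothetical sample in the layout described for tutorial_names.csv.
# The names are illustrative and are not the real contents of the file.
SAMPLE_CSV = "first_name,last_name\nAlice,Nguyen\nBob,Tanaka\n"


def check_seed_columns(csv_text, required_columns):
    """Return the required column names that are missing from the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []  # fieldnames is None for empty input
    return [name for name in required_columns if name not in header]


missing = check_seed_columns(SAMPLE_CSV, ["first_name", "last_name"])
print("missing columns:", missing)
```

To check the downloaded file instead of the inline sample, read its text with open("tutorial_names.csv", newline="") and pass the contents to the same function.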

Open the sidebar navigation menu by clicking the menu icon at the top left of the screen and navigate to the Files page. A list of all available files will be displayed:

Empty Files list

Click the button to open the file upload dialog. Now click Browse and locate the tutorial_names.csv file that you have downloaded previously. You may also provide a short description for your file. Click SUBMIT to complete the file upload:

Browse Files

The tutorial_names.csv file will appear in the files list:

Files List

Return to the Dashboard using the sidebar navigation menu.

4. Creating a ruleset

A ruleset is the configuration that defines the tasks and masking logic that will be applied by DataMasque to a target database during a masking run. Rulesets are created and edited using the ruleset editor, and are written in DataMasque's YAML-based ruleset configuration language. A complete reference is available in the Ruleset Specification user guide.

To create a new ruleset, click the button on the Rulesets panel of the Dashboard. You will be taken to the Ruleset Generator. The Ruleset Generator can automatically generate a ruleset from your database's schema, and is the recommended way to get started with DataMasque.

To create a new empty ruleset, without using the generator, click Skip to YAML Editor, which will take you to the Ruleset Editor (shown below).

New-ruleset

General setup

Replace the ruleset name with a descriptive name for the ruleset. This name will be used to identify the ruleset from the Dashboard. In this example, the name user_table_mask is used.

The target database has a single table that requires masking, so the ruleset will contain a single task of type mask_table.

Update the value of table to match the users table created by the database init script in the Prerequisites section.

The mask_table task type also requires the name of a key column that uniquely identifies each row in the database table. Multiple column names may be provided in an array to form a composite key. In the users table, each row is uniquely identified by the user_id column:

version: "1.0"
tasks:
  - type: mask_table
    table: users
    key: user_id
    rules: 
      - column: REPLACE_ME
        masks:
          - type: from_fixed
            value: REPLACE_ME
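
The requirement that the key uniquely identifies each row can be illustrated with a small check. This is only a sketch of the idea: the duplicate_keys helper and the sample rows are hypothetical, and rows are assumed to have already been fetched into dictionaries by your own database client.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key values that occur in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical rows; only the column names come from the tutorial.
rows = [
    {"user_id": 1, "first_name": "Ann"},
    {"user_id": 2, "first_name": "Ann"},
    {"user_id": 3, "first_name": "Raj"},
]

print(duplicate_keys(rows, ["user_id"]))     # suitable key: no duplicates
print(duplicate_keys(rows, ["first_name"]))  # unsuitable key: "Ann" repeats
```

Passing several column names checks a composite key in the same way, since each row's key is the tuple of those column values.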

Masking rules

date_of_birth

The first masking rule of this ruleset will be applied to the date_of_birth column. Replace the placeholder value for column with date_of_birth.

The desired strategy for masking the date_of_birth column is to replace all values with a new randomly generated date. This can be achieved with the from_random_date mask. Replace the placeholder from_fixed mask type with from_random_date. The from_random_date mask requires min and max parameters, which take the place of the placeholder value parameter. In this example, each user's date_of_birth will be a randomly chosen date between 1st January 1950 and 31st December 2000.

version: "1.0"
tasks:
  - type: mask_table
    table: users
    key: user_id
    rules: 
      - column: date_of_birth
        masks:
          - type: from_random_date
            min: '1950-01-01'
            max: '2000-12-31'
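
Conceptually, this rule replaces each date with one drawn at random from the configured range. The sketch below illustrates that idea only; it is not DataMasque's implementation, the function name is hypothetical, and treating both endpoints as inclusive is an assumption.

```python
import random
from datetime import date, timedelta


def random_date_between(min_date, max_date, rng=random):
    """Pick a date uniformly at random from min_date to max_date, inclusive."""
    span_days = (max_date - min_date).days
    return min_date + timedelta(days=rng.randint(0, span_days))


# The same bounds as the min and max values in the ruleset above.
low, high = date(1950, 1, 1), date(2000, 12, 31)
sample = random_date_between(low, high)
print(sample.isoformat())
```

Passing a seeded random.Random instance as rng makes the output repeatable, which is convenient when experimenting.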

first_name

The desired strategy for masking the first_name column is to replace all values with a new name chosen randomly from the tutorial_names.csv file uploaded previously.

To achieve this, we will add a second masking rule to the ruleset targeting the first_name column. The from_file mask type is used to randomly choose replacement values from a seed file. The seed_file parameter specifies the name of the seed file to use as the data source, and the seed_column parameter specifies the name of the column within that seed file (as determined by the CSV header row) from which values will be sourced:

version: "1.0"
tasks:
  - type: mask_table
    table: users
    key: user_id
    rules: 
      - column: date_of_birth
        masks:
          - type: from_random_date
            min: '1950-01-01'
            max: '2000-12-31'
      - column: first_name
        masks:
          - type: from_file
            seed_file: tutorial_names.csv
            seed_column: first_name
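
The effect of a rule like this can be pictured as: read one column of the seed file, then replace each value in the target column with a random pick from it. The following sketch shows that idea under stated assumptions. It is not how DataMasque is implemented, and the helper names, seed rows, and user rows are all hypothetical.

```python
import csv
import io
import random

# Hypothetical seed data in the same two-column layout as tutorial_names.csv.
SEED_CSV = "first_name,last_name\nAlice,Nguyen\nBob,Tanaka\nCarla,Okafor\n"


def load_seed_column(csv_text, seed_column):
    """Return every value of one named column, using the CSV header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[seed_column] for row in reader]


def mask_column(rows, column, replacements, rng=random):
    """Return copies of rows with `column` replaced by a random seed value."""
    return [{**row, column: rng.choice(replacements)} for row in rows]


users = [
    {"user_id": 1, "first_name": "Ann"},
    {"user_id": 2, "first_name": "Raj"},
]
first_names = load_seed_column(SEED_CSV, "first_name")
masked = mask_column(users, "first_name", first_names)
print(masked)
```

Note that user_id passes through unchanged, which matches the strategy table: only the sensitive columns are replaced.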

last_name

The strategy for masking the last_name column is nearly identical to that for the first_name column. Add a third masking rule to the ruleset that uses the last_name column from the same tutorial_names.csv seed file as the data source for randomly chosen last names:

version: "1.0"
tasks:
  - type: mask_table
    table: users
    key: user_id
    rules: 
      - column: date_of_birth
        masks:
          - type: from_random_date
            min: '1950-01-01'
            max: '2000-12-31'
      - column: first_name
        masks:
          - type: from_file
            seed_file: tutorial_names.csv
            seed_column: first_name
      - column: last_name
        masks:
          - type: from_file
            seed_file: tutorial_names.csv
            seed_column: last_name

You have just finished building your first ruleset. Click SAVE AND EXIT to save the ruleset and return to the Dashboard:

Dashboard with connection and ruleset

5. Starting a masking run

We will now apply data masking to the target database. From the Dashboard, select the Connection and Ruleset created previously. Run options may be left at their default values. Click the PREVIEW RUN button, which will take you to a confirmation screen for the masking run:

Selecting Run Parameters

After verifying that the run configuration is correct, click the START RUN button to start your masking run:

Preview Run

You will be taken to the Run Logs page, where you can monitor the progress of the masking run from a stream of log messages from the masking worker. The run status will update to 'finished' on completion of masking.

Congratulations! You have successfully masked your first database with DataMasque. Try querying the users table to verify that the values have been masked as you expected.
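
One way to verify the result is to compare rows fetched before and after the run against the masking strategy. The sketch below assumes you have already loaded both sets of rows into dictionaries with your own database client. The check_masked_rows helper and all sample values are hypothetical; only the column names and date range come from this tutorial.

```python
from datetime import date


def check_masked_rows(original, masked, seed_first, seed_last,
                      min_dob=date(1950, 1, 1), max_dob=date(2000, 12, 31)):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    original_ids = {row["user_id"] for row in original}
    for row in masked:
        uid = row["user_id"]
        if uid not in original_ids:
            # user_id is not masked, so every id should already exist.
            problems.append(f"user_id {uid}: not present before masking")
            continue
        if not (min_dob <= row["date_of_birth"] <= max_dob):
            problems.append(f"user_id {uid}: date_of_birth outside range")
        if row["first_name"] not in seed_first:
            problems.append(f"user_id {uid}: first_name not from seed file")
        if row["last_name"] not in seed_last:
            problems.append(f"user_id {uid}: last_name not from seed file")
    return problems


# Hypothetical before and after rows for a single user.
original = [{"user_id": 1, "first_name": "Ann", "last_name": "Lee",
             "date_of_birth": date(1988, 3, 14)}]
masked = [{"user_id": 1, "first_name": "Alice", "last_name": "Tanaka",
           "date_of_birth": date(1961, 7, 2)}]

print(check_masked_rows(original, masked, {"Alice", "Bob"}, {"Nguyen", "Tanaka"}))
```

The check does not require masked values to differ from the originals, because a random replacement can occasionally coincide with the original value.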

6. Next steps

Familiarise yourself with the Ruleset Specification guide to learn how to implement more complex data masking strategies.
Review the DataMasque Best Practices guide for some tips on getting the most from DataMasque.