- 1. Introduction
- 2. Creating a connection
- 3. Uploading a seed file
- 4. Creating a ruleset
- 5. Starting a masking run
- 6. Next steps
1. Introduction
This tutorial will guide you through the process of:
- Configuring a Connection to a target database
- Building a simple Ruleset to mask user details
- Executing a masking Run against the target database
Prerequisites
To complete this tutorial, you need a target database server (either Oracle or Microsoft SQL Server) with an empty schema. Download and run one of the following scripts to initialise the schema with a users table and some sample user data.
This tutorial uses a seed file tutorial_names.csv as the data source for replacement name values. Download this file now for use later on.
After running the database init script from the previous section, you will have a users table with 4 columns. The table below describes the strategy that will be used to mask each of these columns:
|Column|Masking strategy|
|---|---|
|user_id|N/A (non-sensitive data)|
|date_of_birth|Replace with a randomly generated date.|
|first_name|Replace with a random first name chosen from the seed file.|
|last_name|Replace with a random last name chosen from the seed file.|
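To picture the shape of the table before masking, the sketch below builds a comparable users table in an in-memory SQLite database. SQLite, the column types, the column order, and the two sample rows are all assumptions made for illustration; the tutorial's init scripts target Oracle or Microsoft SQL Server and may define the table differently.

```python
import sqlite3


def create_sample_users(conn: sqlite3.Connection) -> None:
    """Create a stand-in users table with the four columns named in this tutorial."""
    # Assumed column types; the real init scripts may use different ones.
    conn.execute(
        """
        CREATE TABLE users (
            user_id       INTEGER PRIMARY KEY,
            first_name    TEXT,
            last_name     TEXT,
            date_of_birth TEXT
        )
        """
    )
    # Made-up sample rows, not the data shipped with the tutorial.
    conn.executemany(
        "INSERT INTO users (user_id, first_name, last_name, date_of_birth) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "Alice", "Example", "1980-04-12"),
            (2, "Bob", "Sample", "1975-11-30"),
        ],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_sample_users(conn)

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(columns)
```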
2. Creating a connection
After logging in to DataMasque, you will be taken to the Dashboard:
Click the button on the Connections panel. You will be taken to the 'Add Connection' form:
Complete the form with the parameters and credentials to connect to the database that you have prepared
following the Prerequisites section. Choosing a meaningful connection name based on your own
requirements will help you to easily identify the target database at a glance. In this example, the connection is named
datamasque_tutorial. If you are unsure about what value to use for any of the parameters, refer to the
Click the TEST CONNECTION button to validate that DataMasque can connect to the target database. Once you have confirmed that your connection works, click SAVE AND EXIT to complete the connection setup. Your new connection will now be available in the Connections list on the Dashboard.
3. Uploading a seed file
Seed files provide datasets of replacement values for DataMasque to use when masking. Seed files must be CSV formatted and include a header row. In this tutorial, the file tutorial_names.csv will be used to provide DataMasque with a dataset of replacement names for the users table. The file contains two columns: first_name and last_name.
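Before uploading, it can be worth confirming locally that a seed file has the header row the ruleset will rely on. The following is a minimal sketch using Python's standard csv module; the helper name check_seed_header and the sample rows are hypothetical, and the expected column names first_name and last_name are taken from the seed_column values used later in this tutorial.

```python
import csv
import io

# Column names the tutorial ruleset reads from the seed file.
REQUIRED_COLUMNS = {"first_name", "last_name"}


def check_seed_header(csv_text: str) -> list[str]:
    """Return the header row, raising ValueError if a required column is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = list(reader.fieldnames or [])
    missing = REQUIRED_COLUMNS - set(header)
    if missing:
        raise ValueError(f"seed file is missing columns: {sorted(missing)}")
    return header


# Made-up sample content standing in for tutorial_names.csv.
sample = "first_name,last_name\nAlice,Example\nBob,Sample\n"
print(check_seed_header(sample))
```

To check a real file, read its text first, for example with `open("tutorial_names.csv", newline="").read()`, and pass the result to the helper.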
Open the sidebar navigation menu by clicking the menu icon at the top left of the screen and navigate to the Files page. A list of all available files will be displayed:
Click the button to open the file upload dialog. Now click Browse and locate the tutorial_names.csv file that you have downloaded previously. You may also provide a short description for your file. Click SUBMIT to complete the file upload:
The tutorial_names.csv file will appear in the files list:
Return to the Dashboard using the sidebar navigation menu.
4. Creating a ruleset
A ruleset is the configuration that defines the tasks and masking logic that will be applied by DataMasque to a target database during a masking run. Rulesets are created and edited using the ruleset editor, and are written in DataMasque's YAML-based ruleset configuration language. A complete reference is available in the Ruleset Specification user guide.
To create a new ruleset, click the button on the Rulesets panel of the Dashboard. You will be taken to the Ruleset Generator. The Ruleset Generator can automatically generate a ruleset from your database's schema, and is the recommended way to get started with DataMasque.
To create a new empty ruleset, without using the generator, click Skip to YAML Editor, which will take you to the Ruleset Editor (shown below).
Replace the ruleset name with a descriptive name for the ruleset. This name will be used to identify the ruleset
from the Dashboard. In this example, the name
user_table_mask is used.
The target database has a single table that requires masking, so the ruleset will contain a single task of type mask_table. Update the value of table to match the users table created by the database init script in the Prerequisites section.
The mask_table task type also requires the name of a key column which uniquely identifies each row in the database table. Multiple column names may be provided in an array to form a composite key. On the users table, each row is uniquely identified by the user_id column.
    version: "1.0"
    tasks:
      - type: mask_table
        table: users
        key: user_id
        rules:
          - column: REPLACE_ME
            masks:
              - type: from_fixed
                value: REPLACE_ME
The first masking rule of this ruleset will be applied to the date_of_birth column. Replace the placeholder value for column with date_of_birth.
The desired strategy for masking the date_of_birth column is to replace all values with a new randomly generated date. This can be achieved with the from_random_date mask. Replace the placeholder from_fixed mask type with from_random_date. The min and max parameters must be provided to the from_random_date mask. In this example, each user's date_of_birth will be a randomly chosen date between 1st January 1950 and 31st December 2000.
    version: "1.0"
    tasks:
      - type: mask_table
        table: users
        key: user_id
        rules:
          - column: date_of_birth
            masks:
              - type: from_random_date
                min: '1950-01-01'
                max: '2000-12-31'
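As a conceptual illustration of what a random date between min and max means, the sketch below picks a date uniformly from a closed range. It is not DataMasque's implementation: the function name random_date is hypothetical, and treating both bounds as inclusive is an assumption.

```python
import random
from datetime import date, timedelta


def random_date(min_date: date, max_date: date, rng: random.Random) -> date:
    """Return a date chosen uniformly from [min_date, max_date] (bounds assumed inclusive)."""
    span_days = (max_date - min_date).days
    # randint includes both endpoints, so min_date and max_date are both reachable.
    return min_date + timedelta(days=rng.randint(0, span_days))


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
dob = random_date(date(1950, 1, 1), date(2000, 12, 31), rng)
print(dob.isoformat())
```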
The desired strategy for masking the first_name column is to replace all values with a new name chosen randomly from the tutorial_names.csv file uploaded earlier. To achieve this, we will add a second masking rule to the ruleset targeting the first_name column.
The from_file mask type is used to randomly choose replacement values from a seed file. The seed_file parameter specifies the name of the seed file to use as the data source, and the seed_column parameter specifies the name of the column within that seed file (as determined by the CSV header row) from which values will be sourced:
    version: "1.0"
    tasks:
      - type: mask_table
        table: users
        key: user_id
        rules:
          - column: date_of_birth
            masks:
              - type: from_random_date
                min: '1950-01-01'
                max: '2000-12-31'
          - column: first_name
            masks:
              - type: from_file
                seed_file: tutorial_names.csv
                seed_column: first_name
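The idea of drawing a replacement value from one column of a seed file can be sketched in a few lines of Python. This only illustrates the concept described above and is not how DataMasque implements the mask; the helper name pick_from_seed and the sample rows are hypothetical.

```python
import csv
import io
import random


def pick_from_seed(csv_text: str, seed_column: str, rng: random.Random) -> str:
    """Pick a random value from the named column of a CSV that has a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("seed file contains no data rows")
    # DictReader keys each row by the header names, so seed_column selects the column.
    return rng.choice(rows)[seed_column]


# Made-up sample content standing in for tutorial_names.csv.
seed_text = "first_name,last_name\nAlice,Example\nBob,Sample\n"
rng = random.Random(0)  # fixed seed only so the sketch is repeatable
print(pick_from_seed(seed_text, "first_name", rng))
```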
The strategy for masking the last_name column is nearly identical to that for the first_name column. Add a third masking rule to the ruleset that uses the last_name column from the same tutorial_names.csv seed file as a data source for randomly chosen last names:
    version: "1.0"
    tasks:
      - type: mask_table
        table: users
        key: user_id
        rules:
          - column: date_of_birth
            masks:
              - type: from_random_date
                min: '1950-01-01'
                max: '2000-12-31'
          - column: first_name
            masks:
              - type: from_file
                seed_file: tutorial_names.csv
                seed_column: first_name
          - column: last_name
            masks:
              - type: from_file
                seed_file: tutorial_names.csv
                seed_column: last_name
You have just finished building your first ruleset. Click SAVE AND EXIT to save the ruleset and return to the Dashboard:
5. Starting a masking run
We will now apply data masking to the target database. From the Dashboard, select the Connection and Ruleset created previously. Run options may be left at their default values. Click the PREVIEW RUN button, which will take you to a confirmation screen for the masking run:
After verifying that the run configuration is correct, click the START RUN button to start your masking run:
You will be taken to the Run Logs page, where you can monitor the progress of the masking run from a stream of log messages from the masking worker. The run status will update to 'finished' on completion of masking.
Congratulations! You have successfully masked your first database with DataMasque. Try querying the users table to verify that the values have been masked as you expected.
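One way to make that check concrete is to fetch the rows with your usual database client and test each one against the ruleset's expectations. The sketch below assumes rows arrive as dictionaries with date_of_birth as an ISO-format string; the function name row_looks_masked and the sample values are hypothetical, and the date bounds mirror the min and max used in the ruleset above.

```python
from datetime import date


def row_looks_masked(
    row: dict,
    first_names: set,
    last_names: set,
    min_dob: date = date(1950, 1, 1),
    max_dob: date = date(2000, 12, 31),
) -> bool:
    """Check that a row's values are consistent with the tutorial ruleset."""
    dob = date.fromisoformat(row["date_of_birth"])  # assumes 'YYYY-MM-DD' strings
    return (
        min_dob <= dob <= max_dob
        and row["first_name"] in first_names
        and row["last_name"] in last_names
    )


# Made-up values standing in for the seed file contents and a queried row.
seed_first = {"Alice", "Bob"}
seed_last = {"Example", "Sample"}
masked_row = {
    "user_id": 1,
    "first_name": "Bob",
    "last_name": "Example",
    "date_of_birth": "1962-07-19",
}
print(row_looks_masked(masked_row, seed_first, seed_last))
```

A row passing this check is consistent with the ruleset but is not proof that masking ran, since an original value could happen to satisfy the same conditions.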