Masking runs
- Overview
- Create database masking run
- Create file masking run
- Preview and confirm run
- Run logs
- Simultaneous runs
Overview
A masking run is the application of a masking Ruleset to a database Connection. Runs are created from the Database Masking Dashboard, and can also be triggered via the API (see Best Practices).
Create database masking run
A new database masking run can be configured with the following steps:
- Navigate to the Database Masking dashboard.
- Select a connection from the list of available connections.
- Select a ruleset from the list of available rulesets.
- Set additional run options (detailed below).
- Click PREVIEW RUN or PREVIEW DRY RUN*.
- After configuring the run, you will be taken to the Preview and confirm run screen.
*Dry Run allows you to test your rulesets without modifying the database. When a dry run is executed, DataMasque performs every operation as usual except:
- the final UPDATE operation of mask_table tasks, which would otherwise write the masked value to the database
- the value generation and subsequent UPDATE operation for mask_unique_key tasks
- the truncate_table operation
- the run_sql operation
Database run options
Run options are displayed on the Run Options section of the dashboard page. The following options are available:
Option | Default | Description |
---|---|---|
Batch Size¹ | 50,000 | The maximum number of rows that will be fetched, masked, and updated in a single operation by DataMasque. Larger batch sizes will reduce database operation overhead, but using a batch size value that is too large may result in DataMasque memory exhaustion. This value does not affect the total number of rows masked. The maximum allowed batch size is 50,000. Note: The Batch Size parameter is not applicable for Amazon Redshift and Amazon DynamoDB masking runs. Only for database masking runs. |
Max rows³,⁴ | unset | The maximum number of rows that will be masked by each mask_table task³. May be used for speeding up test iterations when developing rulesets. Warning: In the case that a table contains more rows than the value specified here, the remaining rows will contain unmasked data. Note: The Max rows parameter is not applicable for Amazon Redshift masking runs. Only for database masking runs. |
Run secret | unset | The run secret is used in the random generation of masked values. A run secret must consist of at least 20 characters. Providing a consistent run secret will ensure that repeated runs on the same DataMasque instance will produce the same results. Note: If a random_seed value is provided in addition to the run_secret value, the random_seed will take precedence and the run_secret will be ignored. |
Continue on failure | false | If there is a task failure, and this option is false, DataMasque will skip all remaining non-started tasks. If this option is true, DataMasque will continue performing other tasks even if there is a task failure. It can be useful to set this option to true when testing/debugging your masking ruleset to identify as many failures as possible in each run. |
Notify me when this run completes | false | Email the current user when the job completes. The DataMasque instance must have SMTP configured. |
Disable instance secret | false | If this option is set to true, DataMasque will exclude its instance-specific secret and generate masked values based solely on the run secret. You may wish to disable the instance secret in order to achieve consistent masking across DataMasque instances. However, by disabling the instance secret, any DataMasque instance using the same run_secret could replicate your data masking. |
Override connection's Server Side Encryption settings | false | Only valid for Amazon DynamoDB connections. If checked, table SSE key settings may be overridden on a per-table basis. Click the gear icon to view the settings. See the Specifying SSE Options documentation for DynamoDB for more information about these settings. |
Max file size² | See description | The max file size is only applicable to Amazon Redshift and Amazon DynamoDB databases. For Redshift it sets the MAXFILESIZE of the UNLOAD command. For DynamoDB it determines the maximum pre-compression size of batch files split from the original files exported by DynamoDB to S3. In both cases, this determines the maximum file size in MB of records that will be masked at once by a single DataMasque worker. The max file size can be set to any integer value between 5 MB and 1000 MB, and defaults to 10 MB for Redshift and 100 MB for DynamoDB (Redshift files are stored in the compact Parquet format, which expands to a greater extent when loaded into DataMasque's memory). Only for database masking runs. |
Enable Diagnostic Logging | false | With diagnostic logging enabled, extra information (columns, constraints, foreign keys and indexes) is captured to assist with diagnosing problems with masking runs. Because of the extra information collected, masking runs may take longer to execute with this option enabled. Memory information of the main process will be output to the run log in the following format: Memory: T:31.19GB / F:0.26GB / A:10.60GB where T, F, and A stand for Total, Free and Available memory respectively. Memory of workers will also be captured in the following format: Memory for PID(58): 355.17MB. |
Notes:
1. Batch Size applies to database types other than Amazon Redshift and Amazon DynamoDB.
2. Max file size only applies to Amazon Redshift and Amazon DynamoDB databases.
3. Max rows does not apply to mask_unique_key tasks.
4. Use of the Max rows run option for Amazon Redshift is not yet supported in DataMasque. This is on our roadmap and will be added in a future release.
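The numeric limits stated in the table can be checked before a run is submitted. The following is a minimal sketch only: the RunOptions class, its field names, and the validate function are hypothetical and are not part of DataMasque or its API. The lower bound of 1 for batch size and max rows is also an assumption; only the upper bound of 50,000, the 20-character run secret minimum, and the 5 MB to 1000 MB file size range come from the table above.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_BATCH_SIZE = 50_000        # documented maximum (and default) batch size
MIN_RUN_SECRET_LENGTH = 20     # documented minimum run secret length
MIN_FILE_SIZE_MB, MAX_FILE_SIZE_MB = 5, 1000  # documented max file size bounds


@dataclass
class RunOptions:
    """Hypothetical container; field names mirror the table, not a real API."""
    batch_size: int = MAX_BATCH_SIZE
    max_rows: Optional[int] = None          # unset by default
    run_secret: Optional[str] = None        # unset by default
    max_file_size_mb: Optional[int] = None  # Redshift / DynamoDB only


def validate(options: RunOptions) -> List[str]:
    """Return a list of problems; an empty list means the options look valid."""
    problems: List[str] = []
    # Lower bound of 1 is an assumption; only the upper bound is documented.
    if not 1 <= options.batch_size <= MAX_BATCH_SIZE:
        problems.append(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    # Requiring a positive value is an assumption.
    if options.max_rows is not None and options.max_rows < 1:
        problems.append("max_rows must be a positive integer")
    if (options.run_secret is not None
            and len(options.run_secret) < MIN_RUN_SECRET_LENGTH):
        problems.append(
            f"run_secret must be at least {MIN_RUN_SECRET_LENGTH} characters")
    if (options.max_file_size_mb is not None
            and not MIN_FILE_SIZE_MB <= options.max_file_size_mb <= MAX_FILE_SIZE_MB):
        problems.append(
            f"max_file_size_mb must be between {MIN_FILE_SIZE_MB} and {MAX_FILE_SIZE_MB}")
    return problems
```

Calling validate(RunOptions()) returns an empty list because the defaults satisfy every documented limit. A run secret shorter than 20 characters or a batch size above 50,000 each produce one problem message.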
Create file masking run
A new file masking run can be configured with the following steps:
- Navigate to the File Masking dashboard.
- Select a source connection from the list of available connections (files will be read from here).
- Select a ruleset from the list of available rulesets.
- Select a destination connection from the list of available connections (files will be written to here).
- Set additional run options (detailed below).
- Click PREVIEW RUN or PREVIEW DRY RUN*.
- After configuring the run, you will be taken to the Preview and confirm run screen.
*Dry Run allows you to test your rulesets without writing any masked files. When a dry run is executed, DataMasque downloads files from the source, applies include/skip rules and masks them, but does not upload the masked data to the destination. This allows you to see which files would get masked, and whether there are any errors in your ruleset.
File run options
Run options are displayed on the Run Options section of the dashboard page. The following options are available:
Option | Default | Description |
---|---|---|
Run secret | unset | The run secret is used in the random generation of masked values. Providing a consistent run secret will ensure that repeated runs on the same DataMasque instance will produce the same results. Note: If a random_seed value is provided in addition to the run_secret value, the random_seed will take precedence and the run_secret will be ignored. |
Continue on failure | false | If there is a task failure, and this option is false, DataMasque will skip all remaining non-started tasks. If this option is true, DataMasque will continue performing other tasks even if there is a task failure. It can be useful to set this option to true when testing/debugging your masking ruleset to identify as many failures as possible in each run. |
Notify me when this run completes | false | Email the current user when the job completes. The DataMasque instance must have SMTP configured. |
Disable instance secret | false | If this option is set to true, DataMasque will exclude its instance-specific secret and generate masked values based solely on the run secret. You may wish to disable the instance secret in order to achieve consistent masking across DataMasque instances. However, by disabling the instance secret, any DataMasque instance using the same run_secret could replicate your data masking. |
Enable Diagnostic Logging | false | With diagnostic logging enabled, extra memory information is captured to assist with diagnosing problems with masking runs. Because of the extra information collected, masking runs may take longer to execute with this option enabled. Memory information of the main process will be output to the run log in the following format: Memory: T:31.19GB / F:0.26GB / A:10.60GB where T, F, and A stand for Total, Free and Available memory respectively. Memory of workers will also be captured in the following format: Memory for PID(58): 355.17MB. |
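The memory lines written by diagnostic logging can be pulled out of a downloaded run log for analysis. The sketch below assumes the lines look exactly like the two samples quoted in the table, with GB units for the main process and MB for workers. The function name and the returned dictionary keys are illustrative only, and real log lines may carry other units or surrounding text that this does not handle.

```python
import re
from typing import Dict, Optional

# Patterns derived from the two sample lines in the documentation.
MAIN_RE = re.compile(r"Memory: T:([\d.]+)GB / F:([\d.]+)GB / A:([\d.]+)GB")
WORKER_RE = re.compile(r"Memory for PID\((\d+)\): ([\d.]+)MB")


def parse_memory_line(line: str) -> Optional[Dict[str, float]]:
    """Parse one diagnostic memory line; return None if it does not match."""
    match = MAIN_RE.search(line)
    if match:
        total, free, available = (float(g) for g in match.groups())
        return {"total_gb": total, "free_gb": free, "available_gb": available}
    match = WORKER_RE.search(line)
    if match:
        return {"pid": int(match.group(1)), "used_mb": float(match.group(2))}
    return None
```

Lines that match neither pattern return None, so the function can be applied to every line of a log and the non-memory lines filtered out.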
Preview and confirm run
This screen shows the preview of the configured run. Check the run parameters here before proceeding with execution via the START RUN button. After the run has been started, you will be redirected to the Run logs page where you can monitor the run output and progress.
You can view a curl command for starting an equivalently configured run using the DataMasque API by clicking the VIEW RUN COMMAND button. For more information, see the Best Practices guide and API Reference.
Run logs
The Run Logs screen displays a log of all historic runs, their statuses, and their individual log outputs. To access the Run Logs screen, choose the Run logs item from the main menu.
Run details
When a run is selected in the Run Logs panel, its details and log history are displayed in the Masking Run panel. While a run is still being executed, its log output will be streamed for continuous feedback on the run progress.
The run options used by this run can be found on the first log line.
Database masking job summary
On completion of a database masking run, the run log will display the following information:
- Masking run status: The final status of the run on completion. This will indicate whether the run completed successfully or failed.
- Started at: The time at which the masking run was started.
- Finished at: The time at which the run completed, failed or was cancelled.
- Total time: The total time taken for the run.
- Total tables masked: The total number of tables that were successfully masked.
- Total columns masked: The total number of columns that were successfully masked.
- Total rows masked: The total number of rows that were successfully masked.
An example of a successful database masking run log can be seen below:
When a run fails, if continue_on_failure was not enabled, the total tables, columns and rows masked will be displayed as 0. While some rows or columns may have been masked, it cannot be determined which particular rows or columns were masked on a failed run, so all tables should be considered unmasked.
If continue_on_failure was enabled for the run, the tables, columns and rows will instead reflect the total numbers of tables, columns and rows masked in successful mask_table and mask_unique_key tasks.
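The reporting rule described above can be expressed as a small function. This is a sketch of the rule as documented, not DataMasque's implementation: the TaskResult class and summary_totals function are hypothetical names, and summing per-task counts is an assumption about how the totals are combined.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class TaskResult:
    """Hypothetical per-task outcome for a mask_table or mask_unique_key task."""
    succeeded: bool
    tables: int
    columns: int
    rows: int


def summary_totals(results: Iterable[TaskResult],
                   continue_on_failure: bool) -> Tuple[int, int, int]:
    """Return (tables, columns, rows) as they would appear in the job summary."""
    results = list(results)
    run_failed = any(not r.succeeded for r in results)
    # A failed run without continue_on_failure reports zero for everything.
    if run_failed and not continue_on_failure:
        return (0, 0, 0)
    # Otherwise only successful tasks contribute to the totals.
    ok = [r for r in results if r.succeeded]
    return (sum(r.tables for r in ok),
            sum(r.columns for r in ok),
            sum(r.rows for r in ok))
```

With one failed task among three, the totals are (0, 0, 0) when continue_on_failure is false, and the sum over the two successful tasks when it is true.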
File masking job summary
On completion of a file masking run, the run log will display the following information:
- Masking run status: The final status of the run on completion. This will indicate whether the run completed successfully or failed.
- Started at: The time at which the masking run was started.
- Finished at: The time at which the run completed, failed or was cancelled.
- Total time: The total time taken for the run.
- Total files masked: The total number of files that were successfully masked. On failed runs, this will reflect the number of files that were masked by successful file masking tasks.
An example of a successful file masking run log can be seen below:
An example of a failed file masking run log can be seen below:
Connection and ruleset snapshots
Snapshots of the connection and ruleset are kept for every masking run, maintaining a historical record of the exact configuration that was used for each run. Connection and ruleset snapshots can be viewed for the selected masking run by clicking on the connection or ruleset name displayed on the run detail panel.
A modal window will open to display the snapshots, as captured at the time of masking run creation. The snapshot status indicates whether the current connection or ruleset configuration has been changed since this snapshot was taken. Clicking the edit link will allow you to edit the current connection or ruleset corresponding to the displayed snapshot.
There are three different statuses that may be displayed for a snapshot:
- current: The details have not been changed since the run was performed.
- modified: The details have been changed since the run was performed. The details shown no longer reflect the current state of the connection or ruleset.
- deleted: The connection or ruleset no longer exists. If this is the case, there will be no option to edit.
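The three statuses follow from comparing a snapshot with the current configuration. The sketch below illustrates that comparison under the assumption that both can be represented as plain mappings. The snapshot_status function is hypothetical and does not reflect how DataMasque stores or compares snapshots internally.

```python
from typing import Any, Mapping, Optional


def snapshot_status(snapshot: Mapping[str, Any],
                    current: Optional[Mapping[str, Any]]) -> str:
    """Classify a snapshot against the current connection or ruleset.

    `current` is None when the connection or ruleset no longer exists.
    """
    if current is None:
        return "deleted"
    # Any difference in details since the run means the snapshot is stale.
    return "current" if dict(snapshot) == dict(current) else "modified"
```

For example, a snapshot whose host field differs from the current connection's host would be classified as modified.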
Downloading a run log
While hovering over a run in the Run Logs panel, a download button will be shown. Clicking on this button will start a download containing the logs for this masking run.
Downloading a sensitive data discovery report
When a run_data_discovery task is included in the masking ruleset, the resulting report for each run can be downloaded by clicking either the shield icon on the run row in the Run Logs list, or the Discovery Report chip on the Masking run detail panel. The report will be downloaded in CSV format and may be opened in a text editor or spreadsheet viewer such as Microsoft Excel. See Sensitive Data Discovery for more details.
Cancelling a run
If you wish to cancel a run, you may do so with the following steps:
- Select the run you wish to cancel from the list in the 'Run Logs' panel.
- Click the CANCEL RUN button at the bottom of the screen.
- After clicking YES on the confirmation dialog, the run status will be updated to cancelling and DataMasque will proceed to stop the run's in-progress masking tasks.
- Once all the run's tasks have been stopped, the run status will be updated to cancelled.
Note: During task cancellation, DataMasque will send an explicit request to the database to cancel any running queries. In most cases (if the query is in an interruptible phase), the database will catch this and stop the query immediately. If the query is in a non-interruptible phase, the database will still complete that phase before the query is terminated.
Simultaneous runs
To the same database
Simultaneous runs to the same database are not currently supported in DataMasque, as simultaneous masking can result in data being incorrectly masked.
When there is a masking run in the status of queued, running or cancelling, subsequent masking runs to the same database connection cannot be scheduled.
If you wish to mask multiple tables in the same database simultaneously, it is recommended to use parallel tasks in a single ruleset.
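A client that triggers runs automatically may want to apply the same rule before submitting a new run. The following sketch assumes existing runs are available as (connection identifier, status) pairs. The can_schedule function and the data shape are hypothetical; only the three blocking statuses come from the text above.

```python
from typing import Iterable, Tuple

# Statuses that block a new run to the same database connection.
BLOCKING_STATUSES = {"queued", "running", "cancelling"}


def can_schedule(connection_id: str,
                 existing_runs: Iterable[Tuple[str, str]]) -> bool:
    """Return True if no blocking run exists for the given connection."""
    return not any(
        conn == connection_id and status in BLOCKING_STATUSES
        for conn, status in existing_runs
    )
```

Runs against other connections, and runs that have already finished or been cancelled, do not prevent scheduling in this sketch.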
To different databases
Simultaneous runs to different databases are supported in DataMasque.
For file connections
Simultaneous runs to the same source or destination file connection are not supported.