Changelog
This document contains all notable changes included in each release of DataMasque.
The DataMasque versioning scheme follows semantic versioning convention MAJOR.MINOR.PATCH:
- MAJOR version is incremented when incompatible API/schema changes are made
- MINOR version is incremented when functionality is added in a backwards compatible manner
- PATCH version is incremented when backwards compatible bug fixes are made
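As a rough illustration of the scheme above, the following sketch computes the next version number for each kind of change. The function name and the "major"/"minor"/"patch" labels are illustrative only. It also follows the Semantic Versioning specification in resetting the lower components to zero when a higher one is incremented.

```python
def bump(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a kind of change.

    change is one of:
      "major" - incompatible API/schema change
      "minor" - functionality added in a backwards compatible manner
      "patch" - backwards compatible bug fix
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"  # lower components reset to zero
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("2.22.0", "patch"))  # 2.22.1
print(bump("2.22.2", "minor"))  # 2.23.0
```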
[2.23.0] - 2024-07-11
Added
Support for retrieving secrets from AWS Secrets Manager as a Run Secret.
The worker count for a File Data Discovery run can now be specified in the UI when starting a Discovery run. The number of workers defaults to 1.
mask_unique_key tasks that mask foreign keys will now cascade across schemas.
The Continue On Failure option has been added to File Masking. If this option is enabled and an error occurs when masking a file, DataMasque will log the error and continue to the next file. The run will have a failed status.
Agent logs now contain the run ID and worker process ID, for easier filtering.
Changed
Support for generating rulesets for large schemas using the Ruleset Generator UI has been improved.
During File or Database Data Discovery, if an error occurs when performing discovery on a table or file, the error is logged and discovery continues to the next table or file. Previously the discovery would fail.
The 1,000 column display limit in the Ruleset Generator has been removed. Rulesets generated through the UI may contain only 1,000 columns at a time. If more columns need rules generated, the large ruleset generation API should be used.
The Run Date column in the run logs screen has been renamed to Run Start Time.
CSV uploads of ignored or custom keywords may now be laid out as either a single column or a single row, rather than only a single row.
Docker Compose V2 is now required for Podman installations. Docker Compose V2 is recommended for Docker installations as well, but not required.
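The keyword CSV change listed above can be pictured with the following sketch, which accepts keywords laid out as either one row or one column. The function name `parse_keywords_csv` is hypothetical and this is not DataMasque's actual parser; it only shows the two layouts that are now accepted.

```python
import csv
import io


def parse_keywords_csv(text: str) -> list[str]:
    """Hypothetical parser: accept keywords in a single row or a single column."""
    # csv.reader yields [] for blank lines, so those are dropped here;
    # whitespace around each cell is stripped.
    rows = [[cell.strip() for cell in row] for row in csv.reader(io.StringIO(text)) if row]
    if len(rows) == 1:
        keywords = rows[0]                    # single row: name,email,phone
    elif all(len(row) == 1 for row in rows):
        keywords = [row[0] for row in rows]   # single column: one keyword per line
    else:
        raise ValueError("keywords must be in a single row or a single column")
    return [kw for kw in keywords if kw]
```

Both `"name,email,phone"` and `"name\nemail\nphone"` produce the same keyword list, while a multi-row, multi-column grid is rejected.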
Fixed
Fixed HTTPS SSL certificate uploads being rejected by the DataMasque UI.
Fixed issue where only the first connection clone would succeed, with subsequent clones failing for any connection.
Removed the Ruleset Editor and Ruleset Generator links for Mask Runner users, who do not have permission to access these pages.
File data discovery runs started using the API now show their results in the UI (previously only runs started in the UI would show results in the UI).
Fixed error handling in Connection form to display missing region validation for DynamoDB connections when a region is required.
Fixed a case where the downloadable Discovery Report for a Data Discovery run would not reflect the most recent discovery until the page was refreshed.
retain_date_component masks no longer fail when retaining days from the 29th to the 31st and a month with fewer days is randomly selected.
The disable_warning_on_use_of_conditionals setting now correctly disables all related conditional warnings.
Fixed masking of Parquet columns that are all null.
Parquet integer types are retained after masking (for example, INT96 remains INT96, where previously it would be converted to INT64).
The Ruleset Generator no longer generates imitate masks for Parquet columns whose sensitive data type cannot be determined. Instead, appropriate date, time or numeric masks are generated.
Progress logging has been improved to prevent crashes in cases where the admin-server container may restart.
Fixed searching of File Data Discovery results, and switching back and forth between Show Sensitive and Show All when there are no results.
Improved documentation on required Oracle and PostgreSQL privileges.
Fixed the Ruleset Generator API documentation, which incorrectly referred to csv_file. The correct csv_or_zip_file is now documented.
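The retain_date_component fix in this release concerns keeping a day such as the 31st while a month is chosen at random. The sketch below shows one way to avoid the resulting invalid date by clamping the day to the last valid day of the chosen month. It is a hypothetical illustration (the function name is invented), not DataMasque's implementation.

```python
import calendar
import random
from datetime import date


def retain_day_random_month(value: date, rng: random.Random) -> date:
    """Keep the year and day of month, randomise the month (hypothetical sketch).

    Days 29-31 do not exist in every month, so the day is clamped to the
    last valid day of the chosen month instead of raising ValueError.
    """
    month = rng.randint(1, 12)
    last_day = calendar.monthrange(value.year, month)[1]
    return date(value.year, month, min(value.day, last_day))
```

For example, 31 January 2023 becomes 28 February 2023 if February is chosen, rather than causing an error.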
[2.22.2] - 2024-10-25
Fixed
Addressed an issue where from_blob was not applied in the following database masking scenarios:
- When using skip/skip_defaults with null, non-null values were also skipped.
- When using an if condition to exclude null values, non-null values were also excluded.
NaN and NaT values in Parquet files are now handled without error.
Parquet decimal128 types can now be masked.
File Data Discovery can now be run on a connection where a discovery run was previously cancelled.
[2.22.0] - 2024-09-23
Added
Schema discovery has been improved, utilising less memory, and includes an option to export discovery results to CSV for offline column selection.
The Ruleset Generator can generate rulesets from an uploaded CSV file, including the option to split the output into multiple rulesets each of a nominal size.
Support for RHEL9.
Support for Amazon Linux.
DynamoDB connections now support providing a role ARN to assume, to allow for cross-account access for masking.
hash_columns now supports a coerce_whole_numbers_to_int option, for consistent hash generation between integers and whole numbers stored as floats.
Shortcuts have been added to the Save Connection button, to quickly test and save a connection, or save and go to the Ruleset Generator.
Run IDs and durations are now shown on the Run Logs screen.
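To see why the coerce_whole_numbers_to_int option above matters, consider a hash computed over a value's string form: 5 and 5.0 produce different strings and therefore different hashes. The sketch below assumes string-based hashing with SHA-256 purely for illustration; `hash_value` is a hypothetical helper, and DataMasque's internal hashing may differ.

```python
import hashlib


def hash_value(value, coerce_whole_numbers_to_int: bool = False) -> str:
    """Hypothetical helper showing the effect of coercing whole-number floats."""
    if coerce_whole_numbers_to_int and isinstance(value, float) and value.is_integer():
        value = int(value)  # 5.0 -> 5, so it hashes like the integer 5
    # str(5) is "5" but str(5.0) is "5.0", so without coercion the hashes differ.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
```

With coercion enabled, an integer column in one table and a float column holding the same whole numbers in another would hash consistently; non-whole floats such as 5.5 are left unchanged.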
Changed
DataMasque now supports the Docker Compose plugin (i.e. docker compose) as a replacement for the standalone docker-compose tool. The standalone docker-compose tool will continue to work for backward compatibility. Note that Podman and Amazon Linux still require the standalone docker-compose to be installed, as outlined in the DataMasque installation instructions.
The Finished With Warnings status is now displayed in orange on the Run Logs page.
All warnings have been reviewed and refined. Many messages that had WARNING level have been downgraded to INFO, reducing the number of runs that have the Finished With Warnings status. Warnings that can be disabled show a hint on how to disable them. More information on warnings can be found in the Warnings documentation.
Fixed
Filtered indexes in IBM Db2 LUW are now detected correctly, so DataMasque no longer attempts to create an index when one already exists.
In some instances the numeric_bucket mask would cause an infinite loop. This could occur if force_change was set to true, and scale_to was chosen such that the scaled output value matched the input value.
numeric_bucket buckets must now be a multiple of scale_to (if set), which fixes an issue where replacement values were outside their bucket.
from_unique_imitate now supports Amazon Redshift.
Automatic index creation on Oracle 11g no longer fails (for example, when DataMasque requires an index and must create one for the duration of a task).
from_unique_imitate now supports Oracle 11g.
In-Data Discovery now supports Oracle 11g.
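The following sketch illustrates why numeric_bucket buckets must be a multiple of scale_to, under an explicit assumption: that scaling rounds a replacement value down to a multiple of scale_to. Both functions are hypothetical and do not reflect DataMasque's actual numeric_bucket code. If a bucket's lower boundary is not a multiple of scale_to, rounding down can push a value below the bucket.

```python
import random
from typing import Optional


def validate_buckets(boundaries: list[int], scale_to: Optional[int]) -> None:
    """Hypothetical check: every bucket boundary must be a multiple of scale_to."""
    if scale_to is None:
        return
    bad = [b for b in boundaries if b % scale_to != 0]
    if bad:
        raise ValueError(f"bucket boundaries {bad} are not multiples of scale_to={scale_to}")


def masked_value(low: int, high: int, scale_to: int, rng: random.Random) -> int:
    """Pick a replacement in [low, high), then round it down to a multiple of scale_to.

    Assumes round-down scaling. When low is a multiple of scale_to, the
    scaled value can never fall below low, so it stays inside its bucket.
    """
    raw = rng.randrange(low, high)
    return raw - raw % scale_to
```

With a bucket of [20, 40) and scale_to of 10, every result is 20 or 30. With a bucket of [25, 40), a raw value of 27 would be scaled to 20, outside the bucket.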
[2.21.0] - 2024-08-27
Added
A new status, finished_with_warnings, has been added to masking runs. This is applied to any runs that complete successfully but generate warnings in the run log. Any automation scripts that check for finished as a specific status should also check for finished_with_warnings and handle it appropriately.
The action_on_batch_size_exceeded option has been added to mask_table tasks, allowing DataMasque to more gracefully handle cases where the database returns more rows than expected for a given batch size (for example, if the table contains non-unique keys or has corrupt data).
A skip_defaults rule to exclude null and empty string values from being masked is now automatically included in generated database rulesets.
The use_calculated_bounds option has been added to mask_table tasks. This allows DataMasque to generate the batch bounds internally for integer key columns, which can improve performance on large tables.
Added Brazilian seed files for from_file masks and ruleset generation. Select the Brazil locality to use these.
File masking runs now include the connection UUID(s) and name(s) in the run log.
The Connection menu on the Ruleset Generators is now searchable.
Added support for setting the connection encoding for MariaDB and MySQL using the API.
The ALLOWED_HOSTS setting can now be configured to use CIDR ranges as well as individual IP addresses.
Added In-Data Discovery of US states.
Added In-Data Discovery of phone numbers.
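Because of the new finished_with_warnings status described above, automation that previously compared a run's status only against finished may treat successful runs as failures. A minimal sketch of a status check, assuming the status string has already been retrieved from the run; only the finished and finished_with_warnings values come from this changelog, and the function names are hypothetical.

```python
# Statuses treated as a completed masking run by a hypothetical automation script.
SUCCESS_STATUSES = {"finished", "finished_with_warnings"}


def run_succeeded(status: str) -> bool:
    """Return True if a masking run completed, with or without warnings."""
    return status in SUCCESS_STATUSES


def describe(status: str) -> str:
    """Human-readable summary; warnings still deserve a look at the run log."""
    if status == "finished_with_warnings":
        return "completed; review the run log for warnings"
    if status == "finished":
        return "completed"
    return f"not successful: {status}"
```

Keeping the accepted statuses in one set makes it easy to add further success states later without touching every comparison.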
Changed
from_unique_imitate can now be configured to mask leading zeros.
Improved handling of exceptions that cause the cancellation of tasks.
IBM Db2 LUW run_sql tasks that have multiple commands now issue a warning that only the first command will be executed.
Fixed
For IBM Db2 LUW, from_unique_imitate no longer attempts to reset sequences on triggers that do not have sequences attached.
The In-Data Discovery sample-size UI now handles an empty field more gracefully.
If a data discovery run fails, better errors are now shown in the UI.
Run logs for discovery jobs can now be downloaded.
Autosaved rulesets are now preserved when clicking Save or Save and Exit on the Ruleset Editor screen, after the session has timed out.
[2.20.0] - 2024-07-15
Added
Added support for the IBM Db2 LUW database.
Added a Locality setting and the ability to upload custom seed files for the chosen locality. The Ruleset Generator uses these locality-specific seed files to generate some masking tasks.
Added support for discovery of list, struct and map Parquet column types (including nested columns of those types).
Added support for assuming another role when connecting to an S3 bucket, allowing for cross-account use of an S3 bucket. This can be configured in the connection settings for any file masking connections that use S3.
The retain_age mask now supports the force_change option.
Added several new built-in seed files for:
- US companies, addresses and states.
- AU companies.
- Street Names and Types (e.g. Road, Street, Avenue, etc.).
Added a UI option to change the ruleset file's extension (.yml or .yaml) when pushing it to Git. When pulling from Git, DataMasque checks for both extensions.
A user can now set their own Git directory path, which overrides the instance default path.
Database connections can now be set to read-only in the UI (previously this was only available via the API).
Added In-Data Discovery rules for gender, street address, and US state names.
Added an In-Data Discovery option to force use of In-Data Discovery on columns that already have metadata matches.
Added an In-Data Discovery option to flag data matching user-specified regex pattern(s) as non-sensitive.
Added a navigation UI to the Settings page.
Added documentation for configuring SAML Single Sign-On to DataMasque using the Okta platform.
Changed
DataMasque now logs file masking run history to a file named .datamasque_run_history.ndjson. This file serves as an indicator that masking has taken place, and can be used for run validation. The validation API supports validating file masking runs using the values from the run history file.
The run validation API now uses a random run_hash rather than a deterministic one, and the parameter ruleset_hash has been renamed to ruleset_content_sha256. Values for these fields can be found in the run history table or file.
Improved run startup performance by not re-validating a ruleset that was marked as valid when saved in the editor.
Improved the performance of the Run Logs page. Long run logs will be truncated for display and can be downloaded to see the full content.
Tasks that require temporary modifications to the database (secure_shuffle, from_unique_imitate, from_blob) are no longer executed in dry runs.
Tasks that get stuck in the Cancelling state for more than 5 minutes will now be forcibly cancelled.
Improved the built-in In-Data Discovery rule for MAC address to minimize false matches.
Improved In-Data Discovery to include credit card issuer validation in addition to Luhn checksum.
Where a column or field has both a metadata match and an In-Data Discovery match, the UI now displays both matches rather than just the latter.
Clicking the Add Ruleset button on the File Masking page now opens the File Ruleset Generator.
The My Account page no longer displays usage information. To view usage information, download the usage report.
Minor wording change to the EULA: "Target System" now means a physical or virtual machine, rather than a physical or virtual node.
Keys other than ROWID can now be used as the key column for Oracle databases. Note that DataMasque still recommends use of ROWID as the key wherever possible, and the Ruleset Generator will select ROWID as the key column.
Fixed
Temporary indexes created by DataMasque are now correctly cleaned up at the end of a masking run. Additionally, any dangling temporary indexes from previous masking runs are now cleaned up before creating a new one.
Temporary tables and indexes for from_unique_imitate masks are now correctly cleaned up when created on a schema other than the connection's default schema.
run_data_discovery tasks no longer return results from views on MySQL and MariaDB.
Masking runs that are cancelled before they can start no longer get stuck in the Cancelling state.
The Ruleset Generator now generates the correct masks for NUMBER, NUMERIC, and Oracle LONG columns.
The File Ruleset Generator now generates include patterns that accurately match the files to be masked by each task.
The File Ruleset Generator now discovers files in subdirectories of the connection's base directory.
The File Ruleset Generator page now correctly displays errors for malformed custom In-Data Discovery regex patterns.
DECIMAL columns now correctly compare against the value in the seed file when the column is used as a table_filter_column in a from_file mask.
UI web responses now use the correct browser-side caching options. This should prevent stale data from being displayed in the UI.
The UI will no longer erroneously display a warning that DataMasque is taking a long time to start.
Masking now works correctly when the destination S3 bucket is empty.
Fixed upload and delete of connection filesets (Oracle wallets or MySQL/MariaDB SSL ZIP files). An error is now shown when trying to upload two connection filesets with the same name and type.
DataMasque displays a more descriptive error when the user tries to mask files in Amazon Glacier storage.
from_unique_imitate no longer produces duplicate warnings for dangerous parameters.
Improved API documentation.
In the documentation, clarified the operation of the glob and regex options for include and skip in file masking rulesets.
In the documentation, updated the list of supported key column types for all databases.
[2.19.1] - 2024-04-26
Changed
- Masks generated via the File Ruleset Generator or JSON Mask Generator now generate more specific masks based on the data type.
Fixed
- Database rulesets generated by the Ruleset Generator no longer fail to save.
[2.19.0] - 2024-04-12
Added
Introduced data discovery for files, including both metadata discovery and In-Data Discovery. This supports CSV, Parquet, JSON and NDJSON.
Added ruleset generator for file masking, based on discovery results, with an intuitive UI.
Implemented read-only database connections for data discovery (configurable with API only).
Enabled support for CSV/delimited files without header rows using the column_names parameter.
Expanded the functionality of from_unique_imitate with the new on_invalid: mask option, to mask invalid values so that they remain invalid.
Added a checksum: icp option to from_unique_imitate, to generate values that satisfy the New Zealand Installation Control Point (ICP) checksum.
Introduced a CSV Upload feature for global keywords in settings.
Implemented social security number detection within In-Data Discovery.
Added the version and ruleset hash to the run log for auditing. Added support for querying the run hash in the API.
Added the from_json_path mask type for JSON masks, to replace parts of a JSON document by sourcing values from other components in the same document.
Added support for Ping SSO as a SAML identity provider for single sign-on.
Changed
Enhanced performance by eliminating redundant ruleset validation during saving.
Improved performance of the replace_regex mask.
Importing exported rulesets that were archived now un-archives them.
Real address seed files have had a Street Type column added.
Fixed
Resolved an issue where foreign keys were unnecessarily dropped for from_unique_imitate.
Improved metadata detection of name columns by ignoring some keywords (such as file and schema).
Addressed an issue where the loading message "DataMasque taking a long time to load" was erroneously displayed under certain conditions.
Ruleset generator now correctly generates numeric values that will not overflow the target column.
[2.18.0] - 2024-03-01
Added
secure_shuffle mask to shuffle values in a column. Currently, only Oracle is supported.
Implemented storage of information about completed database masking runs in a DATAMASQUE_RUN_HISTORY table. This is created on the database being masked, serving as an indicator that the database has undergone masking. Supported databases are MariaDB, Microsoft SQL Server, MySQL, Oracle and PostgreSQL.
from_unique_imitate now supports a retain_prefix_length option, enabling the preservation of a specified number of characters while masking.
Expanded the functionality of from_unique_imitate by introducing the on_invalid parameter, providing flexibility in handling values that do not match a specified checksum.
from_unique_imitate adds the credit_card checksum; this is the equivalent of the luhn checksum, but also validates that the incoming value is 12-19 digits long.
Ruleset import and export to a Zip archive, for migration or backup of DataMasque instances. This is currently only available via API HTTP requests.
Additional address seed files for the USA, Australia and New Zealand, containing real addresses, have been added. These are for use with the from_file mask.
Support for retrieving the username or password for a database connection from AWS Secrets Manager.
Support for EKS Kubernetes v1.27, v1.28 and v1.29.
Changed
Updated the luhn checksum on from_unique_imitate to no longer require values to be 12-19 digits long; instead, it now verifies that the length is greater than 1 digit. To retain length validation, switch to the credit_card checksum.
Performance improvements when using format strings in from_unique and from_format_string masks, and mask_unique_key tasks.
Memory reduction and performance improvements when multiple from_file masks use the same seed file.
Performance improvements to the replace_regex mask.
Performance improvements in ruleset saving.
Fixed
Resolved an issue with from_unique_imitate on MySQL, where constraints/indexes for columns not being masked were being dropped unnecessarily.
Fixed automatic min/max selection for from_random_number masks generated by the ruleset generator, based on column size.
Addressed an error occurring when pushing and pulling empty rulesets to/from a Git repository.
[2.17.1] - 2024-02-09
Added
- Added the luhn checksum option to the from_unique_imitate mask.
Changed
- When using a checksum with from_unique_imitate, if the value to mask does not match the checksum, a warning is shown in the run log.
Fixed
Hashes are now correctly calculated for json masks that use transforms with a relative json_path as a hash source (i.e. the value to hash on differs for each node).
Masking now works correctly when using from_unique_imitate in a task whose table is a fully-qualified table name (i.e. it includes the schema).
The Total Tables Masked in the run summary for a parallel task now shows the correct number of tables.
[2.17.0] - 2024-01-26
Added
Added support for MariaDB. Supported versions are 10.11 and 11.2.
Added a social_security_number mask for American Social Security Numbers.
Added support for masking columns with clustered indexes on SQL Server, when using from_unique_imitate.
Added a checksum option to the from_unique_imitate mask, allowing generation of unique values with correct checksum digits. The brazilian_cpf checksum algorithm is the first to be supported.
Added application log downloading from the Logs page of the UI.
Added the ability to clone a database or file masking connection.
The Git dialog for ruleset management now shows a list of ruleset files present in the remote repository, allowing them to be selected directly rather than needing to enter the correct filename.
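The brazilian_cpf checksum mentioned above refers to the two mod-11 check digits at the end of a Brazilian CPF number. The sketch below implements the publicly documented CPF check-digit scheme for illustration; the function names are hypothetical and this is not DataMasque's implementation.

```python
def cpf_check_digits(base: str) -> str:
    """Compute the two CPF check digits for a 9-digit base (mod-11 scheme)."""
    digits = [int(c) for c in base]
    for _ in range(2):
        # Weights run from len(digits) + 1 down to 2: 10..2, then 11..2.
        weights = range(len(digits) + 1, 1, -1)
        remainder = sum(d * w for d, w in zip(digits, weights)) % 11
        digits.append(0 if remainder < 2 else 11 - remainder)
    return "".join(str(d) for d in digits[-2:])


def cpf_valid(cpf: str) -> bool:
    """Check an 11-digit CPF; punctuation such as '.' and '-' is ignored."""
    digits = "".join(c for c in cpf if c in "0123456789")
    return len(digits) == 11 and cpf_check_digits(digits[:9]) == digits[9:]
```

For example, the base 111444777 yields the check digits 35, so 111.444.777-35 is a valid CPF.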
Changed
Improved the installer script to no longer set up a docker alias when using podman, and to request confirmation of the choice of platform when both docker and podman are installed.
Log entries in masque_admin_server.log now include a timestamp.
Application log timestamps now show the timezone in which they were logged.
Improved user feedback when DataMasque cannot communicate with a configured Git repository.
Improved the performance of the connections API.
Reworked the documentation, fixing some errors and adding API usage examples. In particular, the Ruleset YAML Specification section is now divided amongst several pages to make it easier to follow.
Speed improvements to buffer size calculation on MySQL.
Ruleset names may now contain any character except for /.
Fixed
Fixed a bug where using from_unique_imitate in combination with another mask would sometimes cause the second mask to fail with an error.
DataMasque now reconnects to MySQL database instances if the previous connection is timed out by the database server.
Fixed crash when attempting to mask badly-formatted NDJSON or Avro files.
Fixed a bug where re-running schema discovery while columns were selected in the results table could result in the set of selected columns no longer being valid for ruleset generation.
Fixed a bug where schema discovery on MySQL would discover views, rather than just tables.
Fixed an issue where masking tasks on an SQL Server instance would fail if the database is configured with a case-sensitive collation mode.
Fixed a hang/crash when DataMasque receives an incorrectly-formed API request to create or update a connection.
The run duration timer no longer keeps incrementing after a run has been cancelled.
A from_file task in ruleset YAML is now correctly flagged as invalid if it specifies seed_filter_column but not table_filter_column, or vice versa.
When attempting to access DataMasque shortly after starting the application, the user now sees a loading screen rather than an Unexpected Error page.
When DataMasque receives an HTTP request to a URL that does not end with a slash (/), non-GET requests now fail immediately, rather than first being redirected as GET requests.
Attempts to upload an invalid connection fileset, Oracle wallet, or license are now rejected correctly.
Retrieving the size of empty MySQL databases no longer causes a run to fail.
[2.16.1] - 2023-12-19
Added
Added an option to choose between displaying times in local or UTC time when viewing or downloading run logs.
Added an option to enable diagnostic logging for masking runs, including memory usage and DDL. These diagnostics appear in the run logs.
Added a quoting option for character-delimited file masking.
Changed
Renamed log files to have more descriptive names: django.log, celery.log and uwsgi.log become masque_admin_server.log, masque_agent.log and masque_requests.log respectively.
The free trial license now allows the user to trial file masking, with a limit of 5 files per run.
Updated In-Data Discovery match rules to remove any upper limit on the size of the column being searched.
Adjusted In-Data Discovery match rules for credit card numbers to reduce false matches.
Fixed
Improved from_unique_imitate support for detecting IDENTITY columns, masking on non-key columns and using existing keys when available.
Fixed an issue where the advance_hash option for JSON and XML masks would not work correctly if the transform rules included hash_sources.
The ruleset generator now suggests mask_unique_key tasks for columns that are part of unique indexes on Oracle.
Fixed an issue where one user could update another user's Git key.
Fixed backwards compatibility of the credit_card mask with null values when using deprecated parameters.
Improved handling of large rulesets.
[2.16.0] - 2023-11-24
Added
The Ruleset Generator now supports In-Data Discovery to examine the content of database tables and create masking rulesets from their content (rather than just column names).
from_unique_imitate mask to generate guaranteed unique values with the same format as the input values. This adds support for consistent unique key masking across different databases and files, and key masking for Amazon Redshift.
Support for Git repositories. Rulesets may be pushed to and pulled from remote repositories.
Support for from_blob added to the remaining databases (DynamoDB, MySQL, PostgreSQL and SQL Server).
Automatic ruleset saving on browser idle or session timeout, allowing reloading upon return.
Documentation about how to configure SELinux for mounted file shares with Podman.
Changed
Major performance improvements to the credit_card mask, along with extra options for number generation, prefix retention and PAN formatting.
The Ruleset Generator now defaults to showing sensitive columns only (the full column list can still be toggled on).
Automatic key selection from the Ruleset Generator has been improved. If a table has no primary key, then the Ruleset Generator falls back to unique keys, unique indexes, ID columns and finally to the first three columns in combination.
Changed the behavior of MySQL run_sql tasks to reject the use of DELIMITER statements, as these are not required when using run_sql.
The retain_age mask now supports dates in the future.
Invalid regular expressions in conditional matches now show a more useful error.
SSO users can no longer reset their password or be disabled. These should be managed from the directory service (e.g. Active Directory).
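The key selection order described in this release (primary key, then unique keys, unique indexes, ID columns and finally the first three columns in combination) can be sketched as a simple fallback chain. The `TableInfo` structure and the rule that an "ID column" is one named id are assumptions made for illustration; the Ruleset Generator's real metadata and matching rules are not shown here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TableInfo:
    """Hypothetical table metadata; field names are illustrative only."""
    columns: list[str]
    primary_key: Optional[list[str]] = None
    unique_keys: list[list[str]] = field(default_factory=list)
    unique_indexes: list[list[str]] = field(default_factory=list)


def choose_key(table: TableInfo) -> list[str]:
    """Fallback chain: primary key, unique key, unique index, ID column, first three columns."""
    if table.primary_key:
        return table.primary_key
    if table.unique_keys:
        return table.unique_keys[0]
    if table.unique_indexes:
        return table.unique_indexes[0]
    # Assumption: an "ID column" is a column named "id" (case-insensitive).
    id_columns = [c for c in table.columns if c.lower() == "id"]
    if id_columns:
        return id_columns[:1]
    # Last resort: the first three columns in combination.
    return table.columns[:3]
```

Each step only runs when every earlier, stronger source of uniqueness is missing, so a table with a primary key never falls through to the weaker guesses.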
Fixed
Long-running Schema Discovery tasks no longer block the UI.
Tables may be created in run_sql tasks and masked in subsequent tasks.
The from_unique mask may now be used in parallel tasks.
The Ruleset Generator now shows PostgreSQL constraints with duplicate names only once.
When using hash_sources or hash_columns with an xpath, runs will no longer unexpectedly fail if the XPath doesn't match any elements. Instead, hashing is performed on null and a warning is shown in the run log.
Podman now loads required environment variables more reliably.
Loading Run Logs now uses less memory and is faster.
Fixed sending of automated notification emails.
API documentation has improved examples and better explanations of the token types.
[2.15.0] - 2023-10-06
Added
Support for Amazon EKS Managed Nodes.
Support for masking files on mounted file shares using NFS or SMB protocols or on the DataMasque host VM, with a new file connection type Mounted Share.
brazilian_cpf mask to generate valid Brazilian CPF numbers.
Advanced searching in the Ruleset Generator. Include schema, table or column in the search, with * to specify wildcards.
Support for Schema Discovery on non-default schemas (or non-default databases in MySQL).
Testing a database connection provides extra information on failures, for example, incorrect hostname or ports.
Changed
The Sensitive Data report now includes non-sensitive columns.
The batch size limit has been removed.
Fixed
EC2 instances may now be switched between EC2 IMDS v1 and v2.
SSO with Azure Active Directory is now able to fetch the email address from the emailAddress or email attributes.
The django_manage.sh command, used to execute password resets or other Django commands, has now been included.
Fixed UI bugs related to switching the selected user after setting a new password, or creating a new user after setting a user's password.
The ruleset editor now resizes more gracefully with different screen sizes.
[2.14.0] - 2023-09-07
Added
from_blob
mask to replace binary data inBLOB
columns or files. Support is included for Oracle databases or entire files usingmask_file
tasks.
Changed
- The sidebar menu can now be toggled by clicking anywhere on it.
[2.13.0] - 2023-08-18
Added
DataMasque now supports deployments using Red Hat Podman.
xml and json masks now support hash_sources to fetch hash data from inside the current XML or JSON document.
Support for Oracle Native Network Encryption (NNE).
Ruleset generator can filter to show sensitive columns only.
Run log filtering and searching.
The value_on_missing option has been added to from_file to specify a value to be inserted instead of null when a value can't be found in a CSV seed file.
Changed
The list of run logs is server side paginated for faster loading.
The ruleset generator process is now asynchronous.
Multiple foreign keys in the ruleset generator table are displayed on separate lines for better readability.
Login sessions are limited to a maximum of 12 hours, with a one-hour inactivity timeout.
All configurable run options are recorded in the run logs for auditing.
The Add User panel does not hide after a user is added, so adding multiple users is easier.
Updated password policy to incorporate NIST standards.
Columns or attributes masked by mask_file/mask_tabular_file tasks are now listed in the run preview and run log.
The side menu can be clicked anywhere to expand/collapse.
Fixed
Schema discovery does not fail if the schema has no tables.
Using Select All in the ruleset generator now accumulates selections.
Runs can no longer be started with ruleset of the wrong type (e.g. files for DB connections, and vice versa).
The full ruleset and connection names are now displayed on the run log screen.
Failed conditional comparisons now log the datatypes correctly.
Sorting users by role in the UI now works correctly.
Updating a user's username no longer removes their role.
Invalid data in batch size and max rows now shows validation errors.
An error is now shown if the run secret is too short.
Invalid license messages are now shown in the run log.
MySQL errors in run_sql tasks are captured and shown in the run log.
[2.12.0] - 2023-06-09
Added
Added the advance_hash option to json and xml masks for consistent generation of lists of values when using hash_sources or hash_columns.
A mixed-gender first name seed file (DataMasque_firstNames_mixed_gendered.csv) with a gender column for filtering.
hash_sources and hash_columns now support a trim parameter to trim surrounding whitespace on hash value(s).
Admin server exceptions are now logged to a file for easier troubleshooting.
Amazon DynamoDB supports conditionals on columns that don't exist, when on_missing is set to skip.
The region of an Amazon DynamoDB table may now be set in the connection dialogue, instead of having to be set in the ruleset.
Added support for IMDSv2 on AWS EC2.
Added a --non-interactive flag to the DataMasque installer, for unattended updates.
Changed
Performance improvements to Ruleset Generator.
Performance improvements to JSON rule generator.
Generated rulesets now contain comments about how key columns are masked, and whether extra columns have been included due to being part of a composite key.
Logs for Azure Blob Storage connections are now less verbose, reducing the number of unnecessary log entries.
The run log display now supports colours for different log levels.
Improved logging of errors if the agent is unable to communicate with the admin server, for easier troubleshooting.
Warnings about rows being skipped due to if or skip are now only shown once per table (per run).
imitate is now the default fallback mask in the Ruleset Generator if no matches are found (previously it was random_text).
Amazon DynamoDB permissions are checked at the start of the masking run, rather than at the end, for quicker time to failure.
The substitute mask has been renamed to imitate. substitute is still retained for backwards compatibility, but is deprecated and may be removed in a future version of DataMasque.
Fixed
Amazon DynamoDB is now masked in chunks to prevent high memory usage. The file size for masking is configurable per run.
DataMasque now attempts to cast values being set on Amazon DynamoDB key columns to the correct type.
Fixed masking of Amazon DynamoDB tables with on-demand or provisioned capacities, as well as local/global secondary indexes with either on-demand or provisioned capacities.
XML declaration is retained even if it does not contain an encoding.
XML conditions support XML documents with or without declarations.
Deselect all columns on the Ruleset Generator now works correctly.
Visibility of Add File Connection button for the Mask Builder role.
A useful error is now shown when trying to compare timezone-aware and timezone-naive times in a condition.
Public accessibility checking of S3 buckets is now more thorough, and shows more useful error messages about disabling public access.
[2.11.2] - 2023-05-15
Changed
Performance improvements to file masking on AWS S3.
Performance improvements to masking of NDJSON and Apache Avro files (on all file connection types).
Performance improvements to schema discovery with Ruleset Generator.
Fixed
Masking against databases without a database ID no longer fails. This mostly affects MSSQL but the fix applies to all database types.
Fixed support for MySQL databases where the connection's hostname combined with the database name is more than 100 characters (e.g. some AWS RDS configurations).
Improved detection of client IP addresses when DataMasque is used with an HTTP(S) load balancer.
[2.11.1] - 2023-04-03
Fixed
- Select All checkbox in Ruleset Generator no longer selects unselectable foreign keys that should only be masked by cascade.
[2.11.0] - 2023-03-29
Added
Support for masking Amazon DynamoDB.
Conditional masking for files, including if and else rules. Conditions can be applied to XML or JSON documents.
Support for masking NDJSON files.
Support for masking Apache Avro files.
Conditions in database masking may use predicates from XML or JSON data in columns.
Support for shortcuts in date-based conditional masking, e.g. less_than: now or age_greater_than: 50.
Added the from_unique mask to generate unique values for use in columns or databases without unique constraints.
A previous version of DataMasque can now overwrite a newer version on install, by providing the --force flag to the DataMasque installer (the install.sh script).
Improvements to date of birth column detection in the Ruleset Generator.
The delimiter option can be specified for tabular file masking to set the delimiter of character-delimited files. For example, delimiter: "\t" for tab-delimited files. When omitted, this defaults to "," (CSV).
Support for masking UUID columns in PostgreSQL.
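The effect of the delimiter setting can be illustrated with Python's standard csv module, which exposes an equivalent delimiter parameter. This is only an analogy for how character-delimited parsing works, not DataMasque's implementation; read_delimited is a hypothetical helper.

```python
import csv
import io
from typing import List

def read_delimited(text: str, delimiter: str = ",") -> List[List[str]]:
    """Parse character-delimited text; "," (CSV) is the default, mirroring the option above."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Tab-delimited input needs delimiter="\t"; with the default "," each line stays a single field.
rows = read_delimited("id\tname\n1\tAlice\n", delimiter="\t")
print(rows)  # [['id', 'name'], ['1', 'Alice']]
```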
Added special UTF8 versions of some default seed files for use with columns or files that support UTF8-encoded input. These files have UTF8 in their names. Any default seed file without UTF8 in its name contains only ASCII characters.
Support for XML namespaces. This includes the xml mask, hash_sources and conditional masking.
File masking skip or include rules can now be applied to the entire path name or just the file name.
The from_random_date and from_random_datetime masks can now specify current_date_time as the min or max value. This allows the ruleset to stay up to date with the current execution time. now can also be used as a shorter synonym of current_date_time.
Support for retaining parameters when re-running a previous run.
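A minimal sketch of how a "now" bound could be resolved, assuming it is fixed once at the start of the run and that all other bounds are ISO 8601 strings or datetime objects. resolve_bound and random_datetime_between are hypothetical helpers for illustration; the changelog does not describe DataMasque's actual implementation.

```python
import random
from datetime import datetime, timedelta
from typing import Union

# Both spellings accepted for min/max according to the entry above.
NOW_ALIASES = {"now", "current_date_time"}

def resolve_bound(bound: Union[str, datetime], run_started: datetime) -> datetime:
    """Turn a min/max setting into a concrete datetime (hypothetical helper)."""
    if isinstance(bound, datetime):
        return bound
    if bound in NOW_ALIASES:
        return run_started  # tracks the execution time, so the ruleset never goes stale
    return datetime.fromisoformat(bound)

def random_datetime_between(min_bound: Union[str, datetime], max_bound: Union[str, datetime],
                            run_started: datetime, rng: random.Random) -> datetime:
    """Uniformly pick a datetime (to the second) between the resolved bounds, inclusive."""
    lo = resolve_bound(min_bound, run_started)
    hi = resolve_bound(max_bound, run_started)
    span = int((hi - lo).total_seconds())
    return lo + timedelta(seconds=rng.randint(0, span))
```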
Batch size, ruleset UUID and ruleset name are each displayed on their own line in the run log, for easier auditing.
Task configuration (ruleset) is now added to the runlog for all tasks.
Ruleset Generator now supports a wider range of unique key generators.
Added the update_foreign_keys option to unique key masking of additional cascades.
Changed
The default value of on_missing for json and xml masks is now error. In prior versions of DataMasque, if this option was not specified, it defaulted to skip, so attempting to mask an XML or JSON node that did not exist would simply skip to the next mask. From 2.11 onwards, the task will fail instead. The previous behaviour can be restored by explicitly setting on_missing: skip in the relevant places in the ruleset.
The same table and/or columns may be masked more than once in a single ruleset. This now produces a warning instead of an error. The exceptions to this are:
- Amazon DynamoDB: a table may only be masked once per ruleset.
- The same table cannot be masked multiple times in a parallel block.
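The difference between the old and new on_missing defaults can be sketched in Python. mask_json_path and MissingNodeError are hypothetical names, and the path is assumed to be a simple list of object keys; DataMasque's real path syntax and error reporting are not described here.

```python
from typing import Any, Callable, Dict, List

class MissingNodeError(KeyError):
    """Raised when the node to mask is absent and on_missing is "error"."""

def mask_json_path(doc: Dict[str, Any], path: List[str],
                   mask: Callable[[Any], Any],
                   on_missing: str = "error") -> Dict[str, Any]:
    """Apply `mask` to the value at `path`, mutating and returning `doc`."""
    parent: Any = doc
    for key in path[:-1]:
        # Walk down the object keys; any missing level leaves parent as None.
        parent = parent.get(key) if isinstance(parent, dict) else None
    if not isinstance(parent, dict) or path[-1] not in parent:
        if on_missing == "skip":
            return doc  # behaviour before 2.11 when on_missing was not set
        raise MissingNodeError("/".join(path))  # behaviour from 2.11 onwards
    parent[path[-1]] = mask(parent[path[-1]])
    return doc

print(mask_json_path({"user": {}}, ["user", "email"], str.upper, on_missing="skip"))  # {'user': {}}
```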
The same file cannot be masked multiple times in a single ruleset. Using skip or include rules can prevent file double-ups. The masking run will not start if multiple masks apply to the same file.
The xml, retain_age, retain_date_component and retain_year masks have been updated to retain a null if null was retrieved from an SQL column. Previously they would fail or use a fallback mask.
Tabular file masking now defaults to character-delimited files (e.g. CSV) for files that don't match other known prefixes.
The file extension for fixed-width files can now be specified with or without a leading . (e.g. fixed_width_extension: .txt or fixed_width_extension: txt).
The imitate mask was renamed to substitute. imitate is still available for backwards compatibility but may be removed in a future version of DataMasque.
The json mask now returns the same type it received. For example, JSON encoded as a string will be returned as JSON encoded as a string, and already decoded JSON data will be returned as a raw object.
The minimum year in retain_date_component defaults to 100 years less than the maximum year (if not specified). Previously it was 100 years before today.
Foreign keys can no longer be selected in the ruleset generator. Instead, the primary key they reference should be masked and cascaded.
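The "same type in, same type out" behaviour of the json mask can be sketched as follows. mask_json_value and redact_email are hypothetical; the sketch only assumes that string input holds JSON text and that anything else is already-decoded data.

```python
import json
from typing import Any, Callable

def mask_json_value(value: Any, mask: Callable[[Any], Any]) -> Any:
    """Apply `mask` and return the result in the representation it arrived in."""
    if isinstance(value, str):
        # JSON text in: decode, mask, then re-encode as JSON text.
        return json.dumps(mask(json.loads(value)))
    # Already-decoded data in: return the masked object as-is.
    return mask(value)

def redact_email(doc: dict) -> dict:
    """Hypothetical mask that replaces an email field."""
    return {**doc, "email": "redacted@example.com"}

print(mask_json_value('{"email": "a@b.c"}', redact_email))  # {"email": "redacted@example.com"}
print(mask_json_value({"email": "a@b.c"}, redact_email))    # {'email': 'redacted@example.com'}
```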
Multiple target column formats of mask_unique_key are allowed to be variable-width.
Fixed
Unique key masking cascades now work for an arbitrary number of levels, and will cascade to foreign keys that reference supersets of the set of masked target key columns.
Added the update_foreign_keys option to unique key masking of additional cascades.
Better support for conditionals between different types, e.g. floats, ints and decimals.
Better support for conditional dates parsed from strings.
Improved support for ruleset generator when working with composite keys.
Support for file masking with a blank root directory.
Run log counts of total items masked are now correct for parallel or failed tasks.
Glob matching on file names now searches all directories for matching files.
Corrected file-masking IAM example roles in documentation.
Long ruleset names no longer overflow their container on the Run Preview screen.
The force_consistency option of the json mask no longer fails when applied to null or scalar values.
Special schema discovery runs can no longer be rerun from the run log list.
Only connections that support schema discovery are listed in the Ruleset Generator connections menu.
Ruleset Generator now supports MSSQL VARCHAR columns with -1 or null length.
Multiple hash_sources are now allowed in a file masking ruleset.
Files with unsupported encodings now show a clearer error message.
The skip option for skipping values in tabular file masking.
mask_unique_key now performs null replacements even if all rows contain null in a key column.
For MySQL, constraints added during masking are now successfully removed.
The run log now reports the actual batch size used for mask_unique_key (e.g. if the batch size is made smaller than that specified because it was bigger than the number of rows in the target table).
[2.10.1] - 2023-01-24
Fixed
mask_file task with from_file mask.
The xml mask no longer requires a fallback_mask to be specified.
[2.10.0] - 2022-12-12
Changed
Performance improvements to MSSQL Linked Server table masking.
Performance improvements to Ruleset Generator.
Fixed
The mask_type parameter in the Connection Test API is optional for backwards compatibility, and defaults to database.
Additional informational errors in file masking runs.
Link to File Ruleset Creator inside File Ruleset List.
[2.9.0] - 2022-11-14
Added
Object file masking support:
- JSON file masking.
- XML file masking.
- Full-file redaction of other file types.
Tabular file masking support:
- CSV files.
- Parquet files.
- Fixed-width column files.
File masking in AWS S3 and Azure Blob Storage.
JSON mask generator.
retain_date_component: mask a date but retain the year, month or day.
retain_year: mask a date's month and day, retaining the year.
from_choices: select random values from a list of choices, with optional weighting. A good alternative to from_file if there are only a small number of choices or if weighting is required.
typecast support for typecasting to date.
Support for XML documents/columns with an encoding declaration.
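Weighted selection of the kind from_choices offers can be sketched with Python's random.choices. The function name pick_choice and the way weights are passed are assumptions for illustration, not the mask's actual parameters.

```python
import random
from typing import Optional, Sequence

def pick_choice(choices: Sequence[str], weights: Optional[Sequence[float]],
                rng: random.Random) -> str:
    """Pick one value; weights=None gives every choice equal probability."""
    return rng.choices(choices, weights=weights, k=1)[0]

rng = random.Random(42)
sample = [pick_choice(["Mr", "Ms", "Dr"], [45, 45, 10], rng) for _ in range(1000)]
# With these weights, "Dr" should make up roughly 10% of the sample.
```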
Support for mask_unique_key on MySQL autoincrement columns.
Added Rerun button to perform a run again with the same connection(s) and ruleset.
Added Edit button for ruleset snapshot in run log.
Support for composite unique keys in Ruleset Generator.
Fixed
Credit card masking with an unknown prefix when using retain_prefix no longer errors; instead, it retains just the first digit of the credit card number.
Errors within sub-masks of chain or concat are displayed in the run log.
Default batch size is now 50,000 rows.
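Card-number masks commonly generate numbers that pass the Luhn checksum so that downstream validation still accepts them. Whether DataMasque's credit_card mask does exactly this is an assumption here; the sketch shows a hypothetical generator (fake_card) that keeps a retained prefix, fills the rest with random digits and appends a Luhn check digit.

```python
import random

def luhn_check_digit(partial: str) -> str:
    """Return the digit that makes `partial + digit` pass the Luhn check."""
    total = 0
    # Walk right to left; the digit next to the future check digit is doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def luhn_valid(number: str) -> bool:
    """Standard Luhn validation of a complete number."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def fake_card(prefix: str, length: int, rng: random.Random) -> str:
    """Hypothetical generator: retained prefix + random digits + Luhn check digit."""
    body = prefix + "".join(str(rng.randint(0, 9)) for _ in range(length - len(prefix) - 1))
    return body + luhn_check_digit(body)

print(luhn_check_digit("7992739871"))  # 3
```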
JSON masking of null database values no longer converts them to JSON nulls (the "null" string).
MSSQL connections not being terminated if cancelled during a run_sql task.
Non-generic error message if trying to create a run with connection(s) or rulesets that do not exist.
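The null-handling fix above is easy to reproduce: Python's json.dumps(None) yields the text "null", so a database NULL passed through naive encoding comes back as a four-character string. A minimal guard, using the hypothetical helper encode_masked_json, keeps NULL as NULL; it is an illustration, not DataMasque's code.

```python
import json
from typing import Any, Optional

def encode_masked_json(value: Any) -> Optional[str]:
    """Encode masked JSON for a text column, preserving database NULL (hypothetical helper)."""
    if value is None:
        return None  # leave SQL NULL untouched instead of writing the string "null"
    return json.dumps(value)

print(repr(json.dumps(None)))          # 'null'  (the unwanted result)
print(repr(encode_masked_json(None)))  # None
```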
Fixed cancelling MySQL runs.
mask_table runs will not start if attempting to mask a column that's also used as a key.
Default seed files do not contain non-ASCII characters, so width counts are compatible with any columns regardless of encoding.
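Why ASCII-only seed values help with width limits: in UTF-8 a non-ASCII character can occupy more than one byte, so a value may fit a column's character count but not its byte count. Whether a given column's limit is measured in bytes or characters depends on the database and column definition; fits_in_bytes is a hypothetical helper.

```python
def fits_in_bytes(value: str, max_bytes: int, encoding: str = "utf-8") -> bool:
    """True if `value` fits a column whose width is measured in encoded bytes."""
    return len(value.encode(encoding)) <= max_bytes

# "Jos\u00e9" is "José": 4 characters, but 5 bytes in UTF-8.
print(len("Jos\u00e9"), len("Jos\u00e9".encode("utf-8")))  # 4 5
print(fits_in_bytes("Jose", 4), fits_in_bytes("Jos\u00e9", 4))  # True False
```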
If no license is present, then a non-generic error is shown when executing schema discovery.
Better handling of invalid SQL in run_sql tasks.
Changed
XPath-based hashing extracts the first value for single-item arrays instead of hashing the whole array.
Column width is taken into account when generating from_random_text rules with Ruleset Generator.
Ruleset Generator uses , as the glue for addresses rather than ,.
Autoscaling of Celery workers is disabled; instead, the pool size is 16 available processes.
Ruleset Editor size increase.
[2.8.1] - 2022-10-21
Changed
- Performance improvements to MySQL table masking.
[2.8.0] - 2022-09-23
Added
MySQL database support.
MSSQL Linked Server support.
Masking of XML columns, with new xml mask.
Simple and automatic character group replacement with new imitate mask.
Generate credit card numbers in many formats with new credit_card mask.
Mask numeric values into the same "bucket" with new numeric_bucket mask.
Mask dates while retaining age with new retain_age mask.
Elements of JSON or XML documents can be used as hash values.
Option to enforce consistency across multiple JSON elements when using the json mask.
Generated rulesets will now automatically add substring masks if masked data would exceed the length of the column.
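The retain_age entry above can be read as: choose a new date of birth that yields the same age in whole years on a reference date. A minimal sketch under that assumption, choosing uniformly within the valid window and passing the reference date in explicitly; the helper names are hypothetical, and DataMasque's exact algorithm is not described in this changelog.

```python
import random
from datetime import date, timedelta

def age_on(dob: date, today: date) -> int:
    """Completed years of age on `today`."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)

def retain_age_sketch(dob: date, today: date, rng: random.Random) -> date:
    """Random birth date giving the same age as `dob` on `today` (hypothetical helper)."""
    age = age_on(dob, today)
    latest = years_before(today, age)                            # turns `age` today
    earliest = years_before(today, age + 1) + timedelta(days=1)  # turns `age + 1` tomorrow
    span = (latest - earliest).days
    return earliest + timedelta(days=rng.randint(0, span))

print(age_on(date(1990, 5, 17), date(2024, 7, 11)))  # 34
```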
Ruleset YAML can be uploaded through the web UI.
Primary key/unique key masking now automatically cascades through multiple tables.
Added warning that row counts might not be accurate if using skip or if.
Fixed
Improved column detection in ruleset generator for ages and names.
Fixed erroneous datatype mismatch error sometimes generated when using skip rules.
Reduced memory usage in run log API fetch.
Empty strings (or other specified values) in CSVs can now be configured to be treated as null.
Included seed files have had duplicate values removed.
Fixed random generation when using hash columns with replace_regex sub-masks.
MSSQL database constraints are no longer updated in a dry run.
Improved consistency between sensitive data discovery web display and CSV export.
Fixed expected row count check when using where.
Rulesets with errors no longer create additional rulesets each time they are saved.
Changed
"Select All" in the ruleset generator now selects only visible rows.
Generated database IDs now have the dm- prefix to distinguish them from non-generated IDs.
Data read from CSVs is now always treated as strings.
Hashing now differentiates correctly between null and the string "None".
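The null-versus-"None" fix reflects a general hashing pitfall: converting every value to text before hashing makes a null and the string "None" produce identical digests. A minimal sketch of the problem and one common remedy (prefixing a type tag before hashing); both functions are hypothetical and do not describe DataMasque's actual hashing scheme.

```python
import hashlib
from typing import Optional

def naive_digest(value: Optional[str]) -> str:
    # str(None) == "None", so null and the string "None" hash identically.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def tagged_digest(value: Optional[str]) -> str:
    # A one-byte type tag keeps null distinct from every possible string.
    payload = b"\x00" if value is None else b"\x01" + value.encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

print(naive_digest(None) == naive_digest("None"))    # True  (collision)
print(tagged_digest(None) == tagged_digest("None"))  # False
```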
[2.7.2] - 2022-08-11
Fixed
Values in CSV seed files are now treated as strings to preserve formatting and leading zeros in numeric columns.
CSV quoting added to default CSV seed files.
Added sorting of MSSQL key column names for consistency when masking tables with composite keys.
More robust MSSQL database ID retrieval.
[2.7.1] - 2022-08-04
Fixed
- Saving Rulesets would sometimes cause an error.
[2.7.0] - 2022-08-02
Added
Better ruleset generation, more keywords are detected and used to generate rulesets.
Ruleset generation for Amazon Redshift databases.
Role based permissions, with Mask Builder and Mask Runner roles.
Masking of JSON columns, with new json mask.
Masking of values using format strings with the from_format_string mask.
Support for on-premise Active Directory with SAML Single Sign-On.
Script to update ALLOWED_HOSTS on DataMasque web admin.
PostgreSQL search_path now included in run logs.
Detecting and warning of tables in multiple schemas on PostgreSQL.
Fixed
Generated ruleset editor form prevents losing changes when navigating away.
Prevent deletion of built-in CSV seed files.
Connecting to MSSQL 2012 in some circumstances.
Changed
Buffer size (number of rows to fetch and mask at once) has been renamed to batch size.
Batch size may be specified on a per-table basis.
[2.6.1] - 2022-06-21
Fixed
UI display issues on Dashboard.
Validation errors not being cleared after Ruleset errors fixed in Ruleset Editor.
Restore warning upon leaving Ruleset Editor page when there are unsaved changes.
List of PII categories in Ruleset Generator now displays correctly.
Changed
Add Ruleset button in Ruleset List now links directly to Ruleset Generator.
Additional cascades are no longer included in Ruleset generation.
[2.6.0] - 2022-06-09
Added
Support for cross-schema masking.
The ruleset generation functionality to automate generating YAML rulesets.
Support to subscribe to email notifications for masking runs.
Support for format strings in mask_unique_key.
Support for UUID mask pattern that generates unique values in the Universally Unique Identifier (UUID) format.
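One way to produce unique values in UUID format is Python's uuid.uuid4 combined with an explicit uniqueness check. This sketch assumes random (version 4) UUIDs; the changelog does not say which UUID version or generation scheme the mask pattern uses, and unique_uuid_strings is a hypothetical helper.

```python
import uuid
from typing import List

def unique_uuid_strings(count: int) -> List[str]:
    """Generate `count` distinct random (version 4) UUID strings (hypothetical helper)."""
    seen = set()
    values = []
    while len(values) < count:
        candidate = str(uuid.uuid4())
        if candidate not in seen:  # collisions are vanishingly unlikely, but uniqueness is the point
            seen.add(candidate)
            values.append(candidate)
    return values
```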
Masking summary information to run logs.
Validation for the use of duplicate tables/columns in rulesets.
Support for the use of the wildcard character * when specifying keywords.
Support for the use of space, underscore _ or dash - when specifying keywords.
New address seed files.
Fixed
Cancelling tasks incorrectly labelled them as failed masking runs.
Setting ‘Continue on failure’ to true failed to allow masking runs to continue executing when there is a task failure.
Changed
For Microsoft SQL Server, page lock is disabled on target tables during masking runs.
Improved performance and decreased memory consumption of from_file masking.
Renamed custom keywords to global data classification keywords.
Renamed ignored keywords to global ignored keywords.
[2.5.0] - 2022-03-11
Added
Support for Amazon Redshift.
Support for specifying a schema name for PostgreSQL connections.
Changed
Expose additional API endpoints. See API Reference for more details.
Validate that the Run secret has a minimum of 20 characters.
Improved handling of case-insensitive table and column names for PostgreSQL databases.
Prevent double masking on the same column(s) that are part of multiple foreign key constraints.
Removed
- Uniqueness validation of key column(s) in mask_table task.
[2.4.0] - 2021-11-05
Added
- YAML Templating tool.
Deprecated
The name attribute of the ruleset YAML is deprecated in favour of the ruleset name in the ruleset's properties.
The random_seed attribute of the ruleset YAML is deprecated in favour of the Run secret option.
[2.3.0] - 2021-09-30
Added
Support for PostgreSQL 11, 12 and 13.
Support for Microsoft SQL Server named instance.
Deployment support for AWS Marketplace.
Definitions to provide reusable task, mask and rule YAML blocks in ruleset.
Changed
- Support multiple SQL statements in the run_sql masking task.
[2.2.0] - 2021-08-05
Added
Support for PostgreSQL 9.6 and 10.
Sensitive data discovery with built-in, custom and ignored keywords.
Changed
- Improved parallelism implementation to further optimise data masking performance.
[2.1.0] - 2021-06-02
Added
- SAML Single Sign-On (SSO) integration for Azure Active Directory.
Removed
- Support for deployment on Cohesity v6.3.1 App Marketplace.
[2.0.0] - 2021-05-17
Added
- The mask_unique_key task type to support masking of primary keys and unique keys.
Changed
- Improved handling of case-insensitive table and column names for SQL Server databases.
Removed
- The deprecated table_name attribute of the mask_table and truncate_table tasks. This attribute has been replaced by table.
- The deprecated max_workers attribute of the mask_table task. This attribute has been replaced by workers.
[1.3.0] - 2021-04-30
Added
- Support for Microsoft SQL Server 2019.
Fixed
- An issue that caused an incorrect masking run status to be reported.
Changed
- Documentation improvements for the workers and key attributes of the mask_table task.
- Licence quota consumption can now be reduced if database size reductions are sustained.
- Database size calculation for licence quota consumed by Microsoft SQL Server databases now excludes offline files.
- Simultaneous masking runs on the same connection are disallowed.
Deprecated
- The table_name attribute of the mask_table and truncate_table tasks is deprecated in favour of table.
- The max_workers attribute of the mask_table task is deprecated in favour of workers.
[1.2.2] - 2021-03-12
Added
- Uniqueness validation of key column(s) in mask_table task for Microsoft SQL Server connections.
- Automated quarterly usage summary email.
Fixed
- An issue in detecting errors in parallel tasks.
- An issue when masking a key with value NULL for Microsoft SQL Server connections.
Changed
- Documentation improvements on database privileges requirements and installation guides.
Removed
- The upper limit on max_workers for mask_table tasks has been removed.
Deprecated
- The use of any value other than ROWID for the key attribute in mask_table tasks is deprecated for Oracle connections.
- The where attribute for mask_table tasks is deprecated.
[1.2.1] - 2020-12-11
Added
- Support for joined table columns as hash_columns for deterministic masking.
[1.2.0] - 2020-11-16
Added
- Additional support for Microsoft SQL Server 2012 and 2014.
- Licence quota breaches and expiry notification.
- Enhancements to the Ruleset YAML editor, with ruleset YAML schema validation, documentation hover display, and auto-complete.
- System audit logs to web interface.
- Deterministic / hash based masking.
- Support for multiple Oracle wallets in database connections.
- Sample input and output for each supplied mask in the user guide.
- Masks to generate random decimal numbers, booleans, and dates.
- A Continue on failure option in the web interface to perform masking runs that will continue on task failures.
- Deployment support for Cohesity v6.5.1 App Marketplace.
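Deterministic (hash-based) masking means the same input and secret always produce the same masked output. A minimal sketch, assuming an HMAC-SHA256 of the hash value keyed with a secret is used to seed a pseudo-random generator; this is one common construction, not necessarily the one DataMasque uses, and deterministic_choice is a hypothetical helper.

```python
import hashlib
import hmac
import random
from typing import Sequence

def deterministic_choice(secret: str, hash_value: str, choices: Sequence[str]) -> str:
    """Map (secret, hash_value) to a stable pick from `choices`."""
    digest = hmac.new(secret.encode("utf-8"), hash_value.encode("utf-8"), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))  # seed from the first 64 bits
    return rng.choice(choices)

names = ["Ana", "Ben", "Caro", "Dev"]
first = deterministic_choice("an-example-secret-value", "customer-42", names)
again = deterministic_choice("an-example-secret-value", "customer-42", names)
print(first == again)  # True
```

Changing the secret reseeds every mapping at once, while keeping it fixed makes masked values consistent across tables and runs.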
Fixed
- An issue that caused the from_random_text mask to ignore the ruleset random_seed parameter.
- Timezone truncation when masking TIMESTAMP WITH TIME ZONE columns in Oracle databases.
- An issue that displayed a misleading error message in the ruleset editor.
Changed
- Migrate to a cumulative usage licensing quota.
- run_sql now runs queries with 'auto commit' enabled.
- Now supports both SSL version 1.0 and 1.2 (previously 1.0 only) for Oracle Wallet.
- Improved API performance.
[1.1.0] - 2020-06-23
Added
- Multi-user support.
[1.0.0] - 2020-06-04
DataMasque is a best-of-breed data masking solution that empowers organisations to take control of their data security and makes protecting privacy, identity and rights as secure and straightforward as possible.
DataMasque champions commitment to data privacy and is fundamentally built and designed to promote masking irreversibility.