DataMasque Portal

API Endpoints

Authentication

The DataMasque API uses token authentication. Tokens are 40-character strings containing the characters 0-9 and a-f. Include the token in the Authorization HTTP header of each request, prefixed with the word Token and a space.

For example:

GET /api/runs/123/
Authorization: Token abcdef1234567890abcdef1234567890abcdef12
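The following is a minimal Python sketch of building this header. It assumes only the token format described above (40 characters of 0-9 and a-f). The helper name auth_header is hypothetical and is not part of any DataMasque client library.

```python
import re

# Documented token format: 40 characters drawn from 0-9 and a-f.
_TOKEN_RE = re.compile(r"[0-9a-f]{40}")


def auth_header(token: str) -> dict[str, str]:
    """Return the Authorization header for a DataMasque API request (hypothetical helper)."""
    if not _TOKEN_RE.fullmatch(token):
        raise ValueError("expected a 40-character token of 0-9 and a-f")
    # The word "Token" and a single space precede the token itself.
    return {"Authorization": f"Token {token}"}


headers = auth_header("abcdef1234567890abcdef1234567890abcdef12")
```

The same header works for both API Tokens and User Tokens. Only the set of endpoints each token can reach differs.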

There are two types of authentication tokens:

  1. A non-expiring API Token which has access to only some endpoints. You can get this token from the My Account page.
  2. A User Token that is valid for only 12 hours, but has access to all endpoints. User tokens are granted by posting your username and password to the /api/auth/token/login/ endpoint.

The documentation for each endpoint on this page includes the type of token that is required to access it. If an endpoint does not require the use of the Authorization header then its authorization is noted as Anonymous.

The purpose and use case of each token type is explained below.

API Token

The API Token is a long-lived credential retrieved from the My Account page. It remains valid indefinitely unless revoked (also on the My Account page). This token is valid only for use with specific API endpoints.

It is designed for automated scripts whose contents may not be stored securely. For that reason, its access is limited mainly to controlling masking runs and checking their status.

User Token

The User Token is exclusively issued after a successful login, either through the user interface or by making a request to /api/auth/token/login/.

This token offers enhanced security because it expires after 12 hours and can only be obtained through a successful login. When you use DataMasque through the UI, the token is granted as a cookie that expires after 1 hour of inactivity.

It can be used against all API endpoints, and grants access based on the user account's permissions.

Both token types serve distinct purposes within the DataMasque API, offering a balance between security and usability.

POST /api/auth/token/login/

Authorization: Anonymous.

Log in with a username and password to obtain a User Token.

POST /api/auth/token/login/ Parameters

Field Type Required Location Description
username string Yes Request Body The username of the user you are logging in as.
password string Yes Request Body The password for the user.

POST /api/auth/token/login/ Responses

Status Code Description
200 A JSON serialised User object, including the short-lived User Token.

POST /api/auth/token/login/ Postman example

  1. Open Postman.
  2. Create a new request.
  3. Set the method to POST and the URL to https://<your-datamasque-host>/api/auth/token/login/.
  4. Under Headers, add Content-Type as a key and set the value as application/json.
  5. Select the Body tab then the raw button.
  6. Include your DataMasque login details in this format in the text editor shown:
{
  "username": "<your-username>",
  "password": "<your-password>"
}
  7. Press the blue Send button to the right of the URL bar.

POST /api/auth/token/login/ curl example

curl -X POST "https://<your-datamasque-host>/api/auth/token/login/" \
     -H "Content-Type: application/json" \
     -d '{"username": "<your-username>", "password": "<your-password>"}'
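The same login request can be sketched in Python using the standard library's urllib. The helper build_login_request is hypothetical and only builds the request. The sending step is left as a comment, and the sketch makes no assumption about the name of the field that carries the token in the returned User object.

```python
import json
import urllib.request


def build_login_request(host: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/auth/token/login/ (hypothetical helper)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{host}/api/auth/token/login/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_login_request("datamasque.example.com", "<your-username>", "<your-password>")
# To send it and decode the JSON serialised User object:
# with urllib.request.urlopen(req) as resp:
#     user = json.load(resp)
```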

User Object

User objects have the following fields:

Field Type Description
id integer The id of the User.
username string The username for the User. Used when logging in.
email string The email of the User.
date_joined date The date the User was created.
api_token string The API token for the User.
has_temporary_password boolean Whether the user has a temporary password. If true, the user has not finalised their account creation.
is_active boolean Whether or not the user account is active. If false, the account is disabled.
is_staff boolean Whether or not the user is a staff account.
is_superuser boolean Whether or not the account is a superuser and has admin privileges.
is_sso_user boolean Whether or not the account is an SSO enabled account.
is_subscribed_to_sdd_updates boolean Whether or not the user has subscribed to sensitive data discovery updates.
user_roles array[string] List of roles assigned to the user. The full list of roles can be found in User Roles.
user_permissions array[string] List of permissions assigned to the user.

User Roles

User objects may be assigned one or none of the below roles, as part of their user_roles array.

Role Description
mask_runner A user with this role is responsible solely for executing masking operations.
mask_builder In addition to the capabilities of the mask_runner role, this role includes the ability to create and manage rulesets.

GET /api/users/

Authorization: Admin User token only.

Returns a list of user accounts.

GET /api/users/ Parameters

No parameters.

GET /api/users/ Responses

Status Code Description
200 Returns a JSON serialised list of User objects.

GET /api/users/ curl example

curl "https://<your-datamasque-host>/api/users/" \
     -H "Authorization: Token <your-api-token>"

GET /api/users/{id}/

Authorization: Admin User token (to query any user's details) or the queried user's token.

Retrieve information about a specific user.

GET /api/users/{id}/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the user.

GET /api/users/{id}/ Responses

Status Code Description
200 A JSON serialized User object for the specified user.
403 Forbidden: If the token does not have the required permissions.
404 Not Found: If the user with the specified id does not exist.

GET /api/users/{id}/ curl example

curl "https://<your-datamasque-host>/api/users/{id}/" \
     -H "Authorization: Token <your-api-token>"

GET /api/users/me/

Authorization: User token only.

Returns the details of the currently logged-in user.

GET /api/users/me/ Responses

Status Code Description
200 Returns a JSON serialised User object for the user that is currently logged in.

GET /api/users/me/ curl example

curl "https://<your-datamasque-host>/api/users/me/" \
     -H "Authorization: Token <your-api-token>"

POST /api/users/me/profile/ curl example

curl -X POST "https://<your-datamasque-host>/api/users/me/profile/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{"git_directory_path": "path/to/root"}'

POST /api/users/

Authorization: Admin User token only.

Create a new user account.

POST /api/users/ Parameters

Field Type Required Location Description
username string Yes Request Body The username of the user being created.
password string Yes Request Body The password for the new user account.
re_password string Yes Request Body The password for the new user again, to confirm the password entered above.
email string Yes Request Body The email address of the new user.
role array[string] No Request Body The role(s) assigned to the user. If provided, the user will be added to the specified group(s). Defaults to no role, which has the same permissions as mask_runner.

POST /api/users/ Responses

Status Code Description
201 A JSON serialized User object for the created user.
400 Bad Request: If the request data is invalid or user creation is disabled.
403 Forbidden: If the token does not have the required permissions.

POST /api/users/ curl example

curl -X POST "https://<your-datamasque-host>/api/users/" \
     -H "Authorization: Token <your-admin-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "username": "<your-new-username>",
           "password": "<your-new-password>",
           "re_password": "<your-new-password>",
           "email": "<your-new-email>",
           "role": ["<your-user-role>"]
         }'

PATCH /api/users/{id}/

Authorization: Admin User token (to update any user's details) or the updating user's token.

Partially update information for a specified user.

PATCH /api/users/{id}/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the user to update.
username string No Request Body The new username of the user. Only an Admin User can update this.
email string No Request Body The new email address of the user. An Admin User or the user themselves can update this.
user_roles array[string] No Request Body The role(s) assigned to the user. If provided, the user will be added to the specified group(s). Only an Admin User can update this.

PATCH /api/users/{id}/ Responses

Status Code Description
200 A JSON serialized User object for the updated user.
400 Bad Request: If the request data is invalid.
403 Forbidden: If the token does not have the required permissions.
404 Not Found: If the user with the specified id does not exist.

PATCH /api/users/{id}/ curl example

curl -X PATCH "https://<your-datamasque-host>/api/users/{id}/" \
     -H "Authorization: Token <your-admin-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "username": "<your-new-username>",
           "email": "<your-new-email>",
           "user_roles": ["<user-role>"]
         }'
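Because PATCH is a partial update, a client can send only the fields that should change. The Python sketch below uses a hypothetical helper, build_user_patch. It drops any field left as None before serialising the body. Only an Admin User can change username and user_roles, so the token must match the fields being sent.

```python
import json
import urllib.request
from typing import Optional


def build_user_patch(
    host: str,
    token: str,
    user_id: int,
    *,
    username: Optional[str] = None,
    email: Optional[str] = None,
    user_roles: Optional[list[str]] = None,
) -> urllib.request.Request:
    """Build a PATCH request that only carries the fields being changed (hypothetical helper)."""
    fields = {"username": username, "email": email, "user_roles": user_roles}
    # Fields left as None are omitted from the body entirely.
    body = {key: value for key, value in fields.items() if value is not None}
    return urllib.request.Request(
        url=f"https://{host}/api/users/{user_id}/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Token {token}", "Content-Type": "application/json"},
        method="PATCH",
    )


req = build_user_patch("datamasque.example.com", "<your-admin-user-token>", 7,
                       email="new.address@example.com")
```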

PUT /api/users/{id}/

Authorization: Admin User token (to update any user's details) or the updating user's token.

Update information for a specified user.

PUT /api/users/{id}/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the user to update.
username string No Request Body The new username of the user. Only an Admin User can update this.
email string No Request Body The new email address of the user. An Admin User or the user themselves can update this.
user_roles array[string] No Request Body The role(s) assigned to the user. If provided, the user will be added to the specified group(s). Only an Admin User can update this.

PUT /api/users/{id}/ Responses

Status Code Description
200 A JSON serialized User object for the updated user.
400 Bad Request: If the request data is invalid.
403 Forbidden: If the token does not have the required permissions.
404 Not Found: If the user with the specified id does not exist.

PUT /api/users/{id}/ curl example

curl -X PUT "https://<your-datamasque-host>/api/users/{id}/" \
     -H "Authorization: Token <your-admin-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "username": "<your-new-username>",
           "email": "<your-new-email>",
           "user_roles": ["<user-role>"]
         }'

POST /api/users/{id}/reset-password/

Authorization: Admin User token only.

Reset the password for a specified user.

POST /api/users/{id}/reset-password/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the user whose password is being reset.

POST /api/users/{id}/reset-password/ Responses

Status Code Description
200 Returns a JSON object with the new temporary password.
403 Forbidden: If the token does not have the required permissions.
404 Not Found: If the user with the specified id does not exist.

POST /api/users/{id}/reset-password/ curl example

curl -X POST "https://<your-datamasque-host>/api/users/{id}/reset-password/" \
     -H "Authorization: Token <your-admin-api-token>" \
     -H "Content-Type: application/json"

Profile Object

A Profile object stores settings for a particular user. There is a one-to-one relationship between a user and their Profile. A Profile object may only be updated by the user that it belongs to (i.e. a user can only update their own Profile; admins cannot update the Profiles of other users).

Profile objects have the following fields:

Field Type Description
git_directory_path string The Git directory path for this user when pushing/pulling rulesets to/from a Git repository.

Extra Field Notes

git_directory_path

This overrides the global Git directory for the DataMasque instance, for this user only. The value can be set even if Git integration is disabled, in which case it has no effect.

GET /api/users/me/profile/

Authorization: User token only.

Returns the Profile Object for the currently logged-in user.

GET /api/users/me/profile/ Parameters

No parameters.

GET /api/users/me/profile/ Responses

Status Code Description
200 Returns a JSON serialised Profile object, with fields as described above.

GET /api/users/me/profile/ curl example

curl "https://<your-datamasque-host>/api/users/me/profile/" \
     -H "Authorization: Token <your-api-token>"

POST /api/users/me/profile/

Authorization: User token only.

Updates the Profile object for the current user. Partial updates are supported: only fields that are contained in the request will be updated (i.e. if a field is not present in the request then its stored value remains unchanged).

POST /api/users/me/profile/ Responses

Status Code Description
204 The Profile update was successful.

Run Object

Run objects have the following fields:

Field Type Description
id integer The id of the Run. Use this in API URLs that need a run id.
name string The name of the Run.
status string Indicates the Run status. The potential values are: queued, running, finished, finished_with_warnings, failed, cancelling, and cancelled. A status of finished or finished_with_warnings indicates the Run completed successfully; failed indicates an error. finished_with_warnings means there were warnings during the run; refer to the run log to view them.
mask_type string The masking type of the Run, valid options are "database" or "file".
connection string Deprecated, replaced by source_connection.
connection_name string Deprecated, replaced by source_connection_name.
source_connection string A UUID identifying the source connection used for this Run. For database connections, the source_connection also acts as the destination.
source_connection_name string The name of the source connection of the Run. For database connections, the source_connection also acts as the destination.
destination_connection Optional[string] A UUID identifying the destination connection used for this Run. Only present for file connections, as the source_connection also acts as the destination for database connections.
destination_connection_name Optional[string] The name of the destination connection of the Run. Only present for file connections, as the source_connection also acts as the destination for database connections.
ruleset string A UUID identifying the ruleset used for this Run.
ruleset_name string Ruleset name of the Run.
start_time string Start time of the Run, in ISO 8601 format.
end_time string End time of the Run, in ISO 8601 format.
options object An Option object of configuration for the Run.

GET /api/runs/

Authorization: User token or API token.

Get a list of DataMasque Runs.

GET /api/runs/ Parameters

Field Type Required Location Description
mask_type string No Query Parameter The mask type of the Run. The potential values are: database, file.
connection_ruleset_name string No Query Parameter Filters Runs by the name of their source connection, destination connection, or ruleset.
status string No Query Parameter The status of the Run. The potential values are: queued, running, finished, finished_with_warnings, failed, cancelling, and cancelled.

GET /api/runs/ Responses

Status Code Description
200 A JSON serialised list of Run objects.

GET /api/runs/ curl example

curl "https://<your-datamasque-host>/api/runs/" \
     -H "Authorization: Token <your-api-token>"

POST /api/runs/

Authorization: User token or API token.

Start a new masking run.

POST /api/runs/ Parameters

Field Type Required Location Description
name string Yes Request Body The name of the Run.
connection string No Request Body Deprecated, replaced by source_connection.
source_connection string Yes Request Body A UUID identifying the source connection to be used for this Run. For database connections, the source_connection also acts as the destination.
destination_connection string File connections only Request Body A UUID identifying the destination connection to be used for this Run. Required only for runs on file connections; database runs use the source_connection as the destination.
ruleset string Yes Request Body A UUID identifying the ruleset to be used for this Run.
options object Yes Request Body An Option object of configuration for this Run.

POST /api/runs/ Responses

Status Code Description
200 A JSON serialised Run object.

POST /api/runs/ curl example

Include destination_connection only for runs on file connections. See Option Object for the fields accepted in options.

curl -X POST "https://<your-datamasque-host>/api/runs/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "name": "<run-name>",
           "source_connection": "<source-connection-uuid>",
           "destination_connection": "<destination-connection-uuid>",
           "ruleset": "<ruleset-uuid>",
           "options": {
             "dry_run": false,
             "continue_on_failure": false
           }
         }'
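JSON does not allow comments, so a script usually decides in code whether to include destination_connection. The following is a sketch under the assumption that the caller knows whether the run is a file masking run. The helper build_run_payload and its file_masking flag are hypothetical. The helper requires a destination for file runs and leaves the key out for database runs.

```python
import json
from typing import Any, Optional


def build_run_payload(
    name: str,
    source_connection: str,
    ruleset: str,
    options: dict[str, Any],
    destination_connection: Optional[str] = None,
    *,
    file_masking: bool = False,
) -> dict[str, Any]:
    """Assemble the request body for POST /api/runs/ (hypothetical helper)."""
    payload: dict[str, Any] = {
        "name": name,
        "source_connection": source_connection,
        "ruleset": ruleset,
        "options": options,
    }
    if file_masking:
        # File masking runs write to a separate destination connection.
        if destination_connection is None:
            raise ValueError("file masking runs require destination_connection")
        payload["destination_connection"] = destination_connection
    # For database runs the key is left out: the source connection is also the destination.
    return payload


body = json.dumps(build_run_payload("nightly-mask", "<source-connection-uuid>",
                                    "<ruleset-uuid>", {"dry_run": True}))
```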

GET /api/runs/{id}/

Authorization: User token or API token.

Retrieve information about a masking run.

GET /api/runs/{id}/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

GET /api/runs/{id}/ Responses

Status Code Description
200 A JSON serialised Run object.

GET /api/runs/{id}/ curl example

curl "https://<your-datamasque-host>/api/runs/{id}/" \
     -H "Authorization: Token <your-api-token>"
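Scripts often poll this endpoint until a run ends. Below is a minimal Python polling loop. It assumes that queued, running and cancelling are the only non-final statuses. It takes a caller-supplied fetch_run function (hypothetical) that wraps GET /api/runs/{id}/ and returns the decoded Run object, so the loop itself makes no network calls.

```python
import time
from typing import Any, Callable

# Assumption: every status other than queued, running and cancelling is final.
FINAL_STATUSES = {"finished", "finished_with_warnings", "failed", "cancelled"}
SUCCESS_STATUSES = {"finished", "finished_with_warnings"}


def wait_for_run(
    fetch_run: Callable[[int], dict[str, Any]],
    run_id: int,
    interval: float = 10.0,
    timeout: float = 3600.0,
) -> dict[str, Any]:
    """Poll fetch_run(run_id) until the Run reaches a final status."""
    deadline = time.monotonic() + timeout
    while True:
        run = fetch_run(run_id)
        if run["status"] in FINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run['status']!r} after {timeout}s")
        time.sleep(interval)
```

Once the loop returns, check run["status"] against SUCCESS_STATUSES. For finished_with_warnings, read the run log to see the warnings.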

POST /api/runs/{id}/cancel/

Authorization: User token or API token.

Cancel a masking run.

POST /api/runs/{id}/cancel/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

POST /api/runs/{id}/cancel/ Responses

Status Code Description
201 Operation succeeded.

POST /api/runs/{id}/cancel/ curl example

curl -X POST "https://<your-datamasque-host>/api/runs/{id}/cancel/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json"

GET /api/runs/validate/

Authorization: User token or API token.

Validate that a masking run actually occurred, using the values recorded in its run log and in the DATAMASQUE_RUN_HISTORY table.

GET /api/runs/validate/ Parameters

Field Type Required Location Description
run_hash string Yes Query Parameter The hash of the run, which can be retrieved from the run_hash column in the DATAMASQUE_RUN_HISTORY table.
run_completion_time string Yes Query Parameter The finish time of the run, which can be retrieved from the run log or from the completion_time column in the DATAMASQUE_RUN_HISTORY table. It must be in the datetime format %Y-%m-%d %H:%M:%S.
ruleset_content_sha256 string Yes Query Parameter The hash of the ruleset, which can be retrieved from the run log or from the ruleset_content_sha256 column in the DATAMASQUE_RUN_HISTORY table.

GET /api/runs/validate/ curl example

Given the run log contains:

SHA256 hash of ruleset: 7ee08ef63db7fed2baf577f16d74427c2250ba05f6858b0a27b70e05ccbff6eb

Finished At: 2024-05-22 22:11:35 UTC

and the DATAMASQUE_RUN_HISTORY table has:

run_hash: 8d34cc930ce7eae40a633e95aef3aee5d2108511eb20ac35805f2e0834115bb9

the run can be validated as follows. The -G option makes curl send the --data-urlencode values as URL-encoded query parameters of a GET request, so the space in run_completion_time is encoded correctly:

curl -G "https://<your-datamasque-host>/api/runs/validate/" \
     --data-urlencode "run_hash=8d34cc930ce7eae40a633e95aef3aee5d2108511eb20ac35805f2e0834115bb9" \
     --data-urlencode "run_completion_time=2024-05-22 22:11:35" \
     --data-urlencode "ruleset_content_sha256=7ee08ef63db7fed2baf577f16d74427c2250ba05f6858b0a27b70e05ccbff6eb" \
     -H "Authorization: Token <your-api-token>"
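The query string for GET /api/runs/validate/ can also be built in code. This Python sketch assumes the Finished At value is copied from the run log with its trailing UTC suffix, as in the run log example for that endpoint. The hypothetical helper build_validate_url reformats the time to %Y-%m-%d %H:%M:%S and uses urlencode to escape the space and colons.

```python
from datetime import datetime
from urllib.parse import urlencode


def build_validate_url(host: str, run_hash: str, finished_at: str, ruleset_sha256: str) -> str:
    """Build the GET /api/runs/validate/ URL from values found in the run log (hypothetical helper)."""
    # "2024-05-22 22:11:35 UTC" -> "2024-05-22 22:11:35"; strptime also rejects malformed times.
    completed = datetime.strptime(finished_at.removesuffix(" UTC"), "%Y-%m-%d %H:%M:%S")
    query = urlencode({
        "run_hash": run_hash,
        "run_completion_time": completed.strftime("%Y-%m-%d %H:%M:%S"),
        "ruleset_content_sha256": ruleset_sha256,
    })
    return f"https://{host}/api/runs/validate/?{query}"


url = build_validate_url(
    "datamasque.example.com",
    "8d34cc930ce7eae40a633e95aef3aee5d2108511eb20ac35805f2e0834115bb9",
    "2024-05-22 22:11:35 UTC",
    "7ee08ef63db7fed2baf577f16d74427c2250ba05f6858b0a27b70e05ccbff6eb",
)
```

urlencode applies standard form encoding, so the space becomes + and each colon becomes %3A.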

GET /api/runs/{id}/sdd-report/

Authorization: User token only.

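Download the SDD (sensitive data discovery) Report for a Run. The report is returned in binary serialised form and can be saved as a CSV file.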

GET /api/runs/{id}/sdd-report/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

GET /api/runs/{id}/sdd-report/ Responses

Status Code Description
200 The server will return the SDD Report in the response body which can be downloaded as a CSV file.
404 If there is no SDD Report for a run, the server will return 404 status code.

GET /api/runs/{id}/sdd-report/ curl example

curl "https://<your-datamasque-host>/api/runs/{id}/sdd-report/" \
     -H "Authorization: Token <your-api-token>"

GET /api/runs/{id}/run-report/

Authorization: User token only.

Download the Run Report for a Run. The report is returned in binary serialised form and can be saved as a CSV file.

GET /api/runs/{id}/run-report/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

GET /api/runs/{id}/run-report/ Responses

Status Code Description
200 The server will return the Run Report in the response body which can be downloaded as a CSV file.
404 If there is no Run Report for a run, the server will return 404 status code.

GET /api/runs/{id}/run-report/ curl example

curl "https://<your-datamasque-host>/api/runs/{id}/run-report/" \
     -H "Authorization: Token <your-api-token>"

DELETE /api/runs/{id}/db-discovery-results/

Deletes the database discovery results for a run. Use this only when the results are no longer needed, for instance because you have completed another discovery run on the same database more recently.

Warning! Deletion of results is irreversible.

Note: This endpoint can only delete discovery results that were created by DataMasque v2.22 or later. Discovery results created by versions prior to v2.22 cannot be deleted.

DELETE /api/runs/{id}/db-discovery-results/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

DELETE /api/runs/{id}/db-discovery-results/ Responses

Status Code Description
204 Deletion was successful.
404 Not Found: There are no database discovery results for this run, or a run with the specified ID does not exist.

DELETE /api/runs/{id}/db-discovery-results/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/runs/{id}/db-discovery-results/" \
     -H "Authorization: Token <your-api-token>"

GET /api/runs/{id}/db-discovery-results/report/

Downloads database schema discovery results as a CSV.

GET /api/runs/{id}/db-discovery-results/report/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

GET /api/runs/{id}/db-discovery-results/report/ Responses

Status Code Description
200 The server will return the discovery results in the response body which can be downloaded as a CSV file.
404 Not Found: There are no database discovery results for this run, or a run with the specified ID does not exist.

GET /api/runs/{id}/db-discovery-results/report/ curl example

curl -o report.csv "https://<your-datamasque-host>/api/runs/{id}/db-discovery-results/report/" \
     -H "Authorization: Token <your-api-token>"

Option Object

Option objects have the following fields:

Field Type Description
batch_size integer The number of rows to fetch from the database in each batch for masking. This is ignored for file masking.
dry_run boolean Indicates a dry run, in which no data in the database is actually changed. Set to true for a dry run, or false to run normally. Default value is false. More information on dry runs is available in the Masking runs documentation.
max_rows integer The maximum number of rows that will be masked by each mask_table task [1]. Defaults to no limit. This is ignored for file masking.
continue_on_failure boolean If there is a task failure and this option is false, DataMasque will skip all remaining unstarted tasks. If this option is true, DataMasque will continue performing other tasks even if a task fails. Default value is false.
run_secret string The run secret is used in the random generation of masked values. If left unspecified, a random secret will be automatically generated and returned in the API response [2]. Masking runs performed on the same DataMasque instance with the same run secret will produce the same masked values for identical unmasked database inputs. You should only specify a run secret if you require consistent masking across runs; otherwise, it is more secure to allow a new run secret to be automatically generated for each run. Run secrets must be at least 20 characters long.
disable_instance_secret boolean If this option is set to true, DataMasque will exclude its instance-specific secret and generate masked values based solely on the run secret. You may wish to disable the instance secret in order to achieve consistent masking across DataMasque instances. However, with the instance secret disabled, any DataMasque instance using the same run_secret could replicate your data masking.
diagnostic_logging boolean If set to true, the run log will include information to help diagnose errors. This includes information about the tables, columns and keys being masked, memory usage information and more verbose output. Defaults to false.
buffer_size integer Deprecated; will be removed in release 3.0.0. Replaced by batch_size.

[1] max_rows does not apply to mask_unique_key tasks.

[2] The run_secret contained in the API response can be provided in subsequent API calls to start runs, facilitating consistent masking across those runs.
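When building options in a script, it can help to check the documented constraints before submitting a run. The following is a minimal sketch. The helper build_options is hypothetical and covers only a subset of the fields above. It rejects run secrets shorter than 20 characters and omits optional fields that were not set.

```python
from typing import Any, Optional


def build_options(
    *,
    dry_run: bool = False,
    continue_on_failure: bool = False,
    diagnostic_logging: bool = False,
    batch_size: Optional[int] = None,
    run_secret: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an Option object (hypothetical helper covering a subset of fields)."""
    options: dict[str, Any] = {
        "dry_run": dry_run,
        "continue_on_failure": continue_on_failure,
        "diagnostic_logging": diagnostic_logging,
    }
    if batch_size is not None:
        options["batch_size"] = batch_size
    if run_secret is not None:
        # Documented minimum length. Leave run_secret unset to get a new random secret per run.
        if len(run_secret) < 20:
            raise ValueError("run_secret must be at least 20 characters long")
        options["run_secret"] = run_secret
    return options
```

To reuse the secret returned by an earlier run (footnote 2), pass it as run_secret.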

Additionally, the following options apply to schema discovery runs (i.e. runs that include at least one run_schema_discovery task):

Field Type Description
custom_keywords array[string] List of keywords that flag a column as containing sensitive data when the column's name matches one or more of them. Default value is an empty list.
ignored_keywords array[string] List of keywords that exclude a column from the schema discovery results when the column's name matches one or more of them. Default value is an empty list.
disable_global_custom_keywords boolean If set to true, then the user-defined global set of custom keywords will not be used to flag columns as sensitive. Default value is false.
disable_global_ignored_keywords boolean If set to true, then the user-defined global set of ignored keywords will not be used to exclude columns from the schema discovery results. Default value is false.
disable_built_in_keywords boolean If set to true, then DataMasque's built-in list of keywords will not be used to flag columns as sensitive. Default value is false.
schemas array[string] List of schema (database for MySQL/MariaDB) names against which to perform schema discovery. Default value is an empty list, meaning schema discovery will run against the schema configured on the database connection, or the database user's default schema.

Runlog Object

Runlog objects have the following fields:

Field Type Description
run integer ID of the Run this Runlog was generated for.
timestamp string Timestamp of this Runlog's generation, in ISO 8601 format.
message string The log message passed from the masking worker.
log_level integer Numeric representation of the log level, values are 20 for INFO, 30 for WARNING, and 40 for ERROR.
status string Indicates the Run status. The potential values are: queued, running, finished, finished_with_warnings, failed, cancelling, and cancelled. A status of finished or finished_with_warnings indicates the Run completed successfully; failed indicates an error. finished_with_warnings means there were warnings during the run; refer to the run log to view them.
is_dry_run boolean Indicates whether the Run is a dry run.

GET /api/runs/{id}/log/

Authorization: User token or API token.

List all logs for a specified Run in a JSON response.

GET /api/runs/{id}/log/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.
limit integer No Query Parameter The maximum number of RunLog entries to return.
offset integer No Query Parameter The starting position of the query in relation to the complete set of RunLogs for this Run.
ordering string No Query Parameter Controls the order of the results. Available fields to order by are id and timestamp. Reverse the order by prefixing the field name with -. Multiple orderings may be specified, separated by commas.

GET /api/runs/{id}/log/ Responses

Status Code Description
200 A JSON serialised list of Runlog objects. By default, all logs for the run are returned.

GET /api/runs/{id}/log/ curl examples

Fetch the complete run log:

curl "https://<your-datamasque-host>/api/runs/{id}/log/" \
     -H "Authorization: Token <your-api-token>"

Fetch the first 25 logs:

curl "https://<your-datamasque-host>/api/runs/{id}/log/?limit=25&offset=0" \
     -H "Authorization: Token <your-api-token>"

Fetch the next 50 logs (entries 51 to 100):

curl "https://<your-datamasque-host>/api/runs/{id}/log/?limit=50&offset=50" \
     -H "Authorization: Token <your-api-token>"

Order by timestamp and id descending (newest first):

curl "https://<your-datamasque-host>/api/runs/{id}/log/?ordering=-timestamp,-id" \
     -H "Authorization: Token <your-api-token>"
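To read a long log in chunks, a client can step offset forward by limit until a short page comes back. In this sketch, fetch_page is a hypothetical caller-supplied function. It calls GET /api/runs/{id}/log/?limit=<n>&offset=<m> and returns the Runlog objects as a list. This page does not say whether the paginated response wraps the list in an envelope, so any unwrapping is left to fetch_page. The log_level names come from the Runlog object table.

```python
from typing import Any, Callable, Iterator

# log_level values documented for Runlog objects.
LOG_LEVEL_NAMES = {20: "INFO", 30: "WARNING", 40: "ERROR"}


def iter_run_logs(
    fetch_page: Callable[[int, int], list[dict[str, Any]]],
    page_size: int = 100,
) -> Iterator[dict[str, Any]]:
    """Yield Runlog objects, requesting page_size entries at a time."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)  # called as (limit, offset)
        yield from page
        if len(page) < page_size:  # a short or empty page means the end was reached
            return
        offset += page_size


def format_log(entry: dict[str, Any]) -> str:
    """Render a Runlog entry as "[LEVEL] message"."""
    level = LOG_LEVEL_NAMES.get(entry["log_level"], str(entry["log_level"]))
    return f"[{level}] {entry['message']}"
```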

GET /api/runs/{id}/log/download/

Authorization: User token only.

Download all logs for a specified Run as a plain text file.

GET /api/runs/{id}/log/download/ Parameters

Field Type Required Location Description
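id integer Yes URL Path The id of the Run.
timezone string Yes Query Parameter Timezone offset to use for the Run logs in the format +HH:MM or -HH:MM. Example: +07:00, -05:00. In the URL, encode the + sign as %2B, because a raw + in a query string is decoded as a space.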

GET /api/runs/{id}/log/download/ Responses

Status Code Description
200 The server will return the Run Log content in the response body which can be downloaded as a log file.

GET /api/runs/{id}/log/download/ curl example

curl "https://<your-datamasque-host>/api/runs/{id}/log/download/?timezone=%2B07:00" \
     -H "Authorization: Token <your-api-token>"
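A raw + in a query string is commonly decoded as a space, so the sign of a positive offset should be sent percent-encoded as %2B. The hypothetical Python helper below checks the +HH:MM / -HH:MM format and uses urlencode to handle the escaping.

```python
import re
from urllib.parse import urlencode

_OFFSET_RE = re.compile(r"[+-]\d{2}:\d{2}")


def log_download_url(host: str, run_id: int, timezone: str) -> str:
    """Build the GET /api/runs/{id}/log/download/ URL with a safely encoded offset (hypothetical helper)."""
    if not _OFFSET_RE.fullmatch(timezone):
        raise ValueError("timezone must be in the form +HH:MM or -HH:MM")
    # urlencode turns "+" into %2B (and ":" into %3A), so the sign survives decoding.
    query = urlencode({"timezone": timezone})
    return f"https://{host}/api/runs/{run_id}/log/download/?{query}"
```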

Connection Object

Database Connection objects have the following fields:

Field Type Description
version string The connection version. This should be set to "1.0".
id integer The id of the Connection. Use this in API URLs that need a connection id.
name string The name of the Connection.
user string The name of the user in the database connection.
db_type string The type of database the connection is connecting to.
database string The database the connection is connecting to.
host string The hostname of the database connection.
port integer The database port being connected through.
dbpassword string The password for the user connecting to the database.
schema string The schema of the database to connect to.
options object An Option object of configuration for the Run
service_name string The service name for the connection. Only used for Oracle. (Optional)
connection_fileset string The connection fileset attached to this connection. Currently only used for MySQL and MariaDB. (Optional)
mask_type string The type of masking the connection can perform, only database or file are valid. (Optional) Should be set to database for database Connections.
last_discovery_run_date string The created_time of the last run on this connection including a run_schema_discovery task, or null if no such run has been performed.
last_discovery_run_id string The ID of the last run on this connection including a run_schema_discovery task, or null if no such run has been performed.
is_read_only boolean Whether or not the connection to the database is read-only.
data_encoding string Only for Oracle, Postgres, MySQL, and MariaDB connections. An encoding to be used when retrieving data containing different character sets from the database. Should match the encoding of the data stored, not the character set of the database. The list of supported encodings can be found on the Database Connections page.
iam_role_arn string Only for Amazon DynamoDB connections. The ARN of the IAM role that DataMasque should assume.

File Connection objects have the following fields:

Field Type Description
version string The connection version. This should be set to 1.0.
id integer The id of the Connection. Use this in API URLs that need a connection id.
name string The name of the Connection.
type string The type of file system the connection is connecting to. Valid options are "s3_connection", "azure_blob_connection" or "mounted_share_connection".
base_directory string The root file path where files intended to be masked are stored.
bucket string The name of the S3 bucket containing the base_directory. Only for S3 Connections.
container string The name of the Azure Blob Storage container containing the base_directory. Only for Azure Blob Connections.
connection_string string The connection string configured with the authorization information to access data in your Azure Storage account. Only for Azure Blob Connections.
mask_type string The type of masking the connection can perform; valid values are database or file. (Optional) Should be set to file for file Connections.
is_file_mask_source boolean Whether the connection is a source Connection for file masking. (Optional) Defaults to false if not provided.
is_file_mask_destination boolean Whether the connection is a destination Connection for file masking. (Optional) Defaults to false if not provided.

GET /api/connections/

Authorization: User token only.

Get a list of all DataMasque connections.

Optionally, you can add an {id} to the end of the request to only return the details of the connection with that specific id.

GET /api/connections/ Parameters

Optionally, follow the URL with the id of a specific connection to return information on that connection only.

GET /api/connections/ Responses

Status Code Description
200 A JSON serialised list of Connection objects, or a single Connection object if an id is specified.

Quickstart example using curl

curl "https://<your-datamasque-host>/api/connections/" \
     -H "Authorization: Token <your-api-token>"
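The list form and the single-connection form differ only in the trailing {id}, so both can be wrapped in one small function. This is a sketch assuming two environment variables, DATAMASQUE_HOST and DATAMASQUE_TOKEN (holding a User token); get_connections is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list all connections, or fetch a single one when an id is given.
# DATAMASQUE_HOST / DATAMASQUE_TOKEN are assumed environment variables;
# get_connections is a hypothetical helper name.
get_connections() {
  if [ -n "${1:-}" ]; then
    conn_path="api/connections/$1/"   # one connection, by id
  else
    conn_path="api/connections/"      # every connection
  fi
  curl "https://${DATAMASQUE_HOST}/${conn_path}" \
       -H "Authorization: Token ${DATAMASQUE_TOKEN}"
}
```

For example, get_connections returns every connection, while get_connections 42 returns only the connection with id 42.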

POST /api/connections/

Authorization: User token only.

Create a new connection object.

POST /api/connections/ Parameters

Database Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to 1.0.
name string Yes Request Body The name of the Connection.
user string Yes Request Body The name of the user in the database connection.
db_type string Yes Request Body The type of database the connection is connecting to.
database string Yes Request Body The database the connection is connecting to.
host string Yes Request Body The hostname of the database connection.
port integer Yes Request Body The database port being connected through.
dbpassword string Yes Request Body The password for the user connecting to the database.
schema string Yes Request Body The schema of the database to connect to.
service_name string No Request Body The service name for the connection. Only applies to Oracle.
connection_fileset string No Request Body The connection fileset attached to this connection. Only applies to MySQL and MariaDB.
mask_type string No, defaults to database if not provided. Request Body The type of masking the connection can perform; valid values are database or file.
is_read_only boolean No, defaults to false if not provided. Request Body Whether or not the connection to the database is read-only.
data_encoding string No, defaults to None if not provided. Request Body Only for Oracle, Postgres, MySQL, and MariaDB connections. An encoding to be used when retrieving data containing different character sets from the database. It should match the encoding of the stored data, not the character set of the database. The list of supported encodings can be found on the Database Connections page.
iam_role_arn string No, role assumption will only take place if provided. Request Body Only for Amazon DynamoDB connections. The ARN of the IAM role for DataMasque to assume.

File Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to 1.0.
name string Yes Request Body The name of the Connection.
type string Yes Request Body The type of file system the connection is connecting to. Valid options are "s3_connection", "azure_blob_connection" or "mounted_share_connection".
base_directory string Yes Request Body The root file path where files intended to be masked are stored.
bucket string Required only for S3 Connections. Request Body The name of the S3 bucket containing the base_directory.
container string Required only for Azure Blob Connections. Request Body The name of the Azure Blob Storage container containing the base_directory.
connection_string string Required only for Azure Blob Connections. Request Body The connection string configured with the authorization information to access data in your Azure Storage account.
mask_type string No, defaults to database if not provided. Request Body The type of masking the connection can perform; valid values are database or file. Should be set to file for file Connections.
is_file_mask_source boolean No, defaults to false if not provided. Request Body Whether the connection is a source Connection for file masking.
is_file_mask_destination boolean No, defaults to false if not provided. Request Body Whether the connection is a destination Connection for file masking.
iam_role_arn string No, role assumption will only take place if provided. Request Body Only for S3 connections. The ARN of the IAM role for DataMasque to assume.

POST /api/connections/ Responses

Status Code Description
201 A JSON serialised Connection object.

POST /api/connections/ curl example

curl -X POST "https://<your-datamasque-host>/api/connections/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "version": "1.0",
           "name": "<connection_name>",
           "user": "<database_user>",
           "db_type": "<database_type>",
           "database": "<database_name>",
           "host": "<database_host>",
           "port": <database_port>,
           "dbpassword": "<database_password>",
           "schema": "<database_schema>",
           "service_name": "<oracle_service_name>",
           "connection_fileset": "<connection_fileset>",
           "mask_type": "database"
         }'
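The example above creates a database connection. A file connection uses the File Connections fields instead, and because mask_type defaults to database when omitted, the body should set it to file explicitly. The sketch below creates an S3 source connection and adds iam_role_arn only when one is supplied. DATAMASQUE_HOST, DATAMASQUE_TOKEN and create_s3_file_connection are assumed, hypothetical names, and the arguments are inserted into the JSON without escaping.

```shell
#!/bin/sh
# Sketch: create an S3 file connection that acts as a file-masking source.
# DATAMASQUE_HOST / DATAMASQUE_TOKEN are assumed environment variables;
# create_s3_file_connection is a hypothetical helper name.
# Arguments are inserted into the JSON verbatim, so they must not contain
# double quotes or backslashes.
create_s3_file_connection() {
  conn_name=$1
  bucket=$2
  base_dir=$3
  role_arn=${4:-}   # optional IAM role ARN for DataMasque to assume
  body=$(printf '{"version": "1.0", "name": "%s", "type": "s3_connection", "mask_type": "file", "bucket": "%s", "base_directory": "%s", "is_file_mask_source": true, "is_file_mask_destination": false' \
    "$conn_name" "$bucket" "$base_dir")
  if [ -n "$role_arn" ]; then
    body="${body}, \"iam_role_arn\": \"${role_arn}\""
  fi
  body="${body}}"   # close the JSON object
  curl -X POST "https://${DATAMASQUE_HOST}/api/connections/" \
       -H "Authorization: Token ${DATAMASQUE_TOKEN}" \
       -H "Content-Type: application/json" \
       -d "$body"
}
```

For example, create_s3_file_connection s3-source my-bucket masking/input creates a source connection for files under masking/input in the bucket my-bucket.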

PUT /api/connections/{id}/

Authorization: User token only.

Update the values of the connection with the specified id.

PUT /api/connections/{id}/ Parameters

Database Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to 1.0.
name string Yes Request Body The name of the Connection.
user string Yes Request Body The name of the user in the database connection.
db_type string Yes Request Body The type of database the connection is connecting to.
database string Yes Request Body The database the connection is connecting to.
host string Yes Request Body The hostname of the database connection.
port integer Yes Request Body The database port being connected through.
dbpassword string Yes Request Body The password for the user connecting to the database.
schema string Yes Request Body The schema of the database to connect to.
service_name string No Request Body The service name for the connection. Only applies to Oracle.
connection_fileset string No Request Body The connection fileset attached to this connection. Only applies to MySQL and MariaDB.
mask_type string No, defaults to database if not provided. Request Body The type of masking the connection can perform; valid values are database or file.
is_read_only boolean No, defaults to false if not provided. Request Body Whether or not the connection to the database is read-only.
iam_role_arn string No, role assumption will only take place if provided. Request Body Only for Amazon DynamoDB connections. The ARN of the IAM role for DataMasque to assume.

File Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to 1.0.
name string Yes Request Body The name of the Connection.
type string Yes Request Body The type of file system the connection is connecting to. Valid options are "s3_connection", "azure_blob_connection" or "mounted_share_connection".
base_directory string Yes Request Body The root file path where files intended to be masked are stored.
bucket string Required only for S3 Connections. Request Body The name of the S3 bucket containing the base_directory.
container string Required only for Azure Blob Connections. Request Body The name of the Azure Blob Storage container containing the base_directory.
connection_string string Required only for Azure Blob Connections. Request Body The connection string configured with the authorization information to access data in your Azure Storage account.
mask_type string No, defaults to database if not provided. Request Body The type of masking the connection can perform; valid values are database or file. Should be set to file for file Connections.
is_file_mask_source boolean No, defaults to false if not provided. Request Body Whether the connection is a source Connection for file masking.
is_file_mask_destination boolean No, defaults to false if not provided. Request Body Whether the connection is a destination Connection for file masking.
iam_role_arn string No, role assumption will only take place if provided. Request Body Only for S3 connections. The ARN of the IAM role for DataMasque to assume.

PUT /api/connections/{id}/ Responses

Status Code Description
200 A JSON serialised Connection object with the new updated values.

PUT /api/connections/{id}/ curl example

curl -X PUT "https://<your-datamasque-host>/api/connections/{connection_id}/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "version": "1.0",
           "name": "<connection_name>",
           "user": "<database_user>",
           "db_type": "<database_type>",
           "database": "<database_name>",
           "host": "<database_host>",
           "port": <database_port>,
           "dbpassword": "<database_password>",
           "schema": "<database_schema>",
           "service_name": "<oracle_service_name>",
           "connection_fileset": "<connection_fileset>",
           "mask_type": "database"
         }'

DELETE /api/connections/{id}/

Authorization: User token only.

Delete the connection with the specified id.

DELETE /api/connections/{id}/ Parameters

No parameters.

DELETE /api/connections/{id}/ Responses

Status Code Description
204 Operation succeeded

DELETE /api/connections/{id}/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/connections/{id}/" \
     -H "Authorization: Token <your-api-token>"

POST /api/connections/test/

Authorization: User token only.

Test a connection to validate that it is able to successfully connect to the target database.

POST /api/connections/test/ Parameters

Database Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to 1.0.
name string Yes Request Body The name of the Connection.
user string Yes Request Body The name of the user in the database connection.
db_type string Yes Request Body The type of database the connection is connecting to.
database string Yes Request Body The database the connection is connecting to.
host string Yes Request Body The hostname of the database connection.
port integer Yes Request Body The database port being connected through.
dbpassword string Yes Request Body The password for the user connecting to the database.
schema string Yes Request Body The schema of the database to connect to.
service_name string No Request Body The service name for the connection. Only applies to Oracle.
connection_fileset string No Request Body The connection fileset attached to this connection. Only applies to MySQL and MariaDB.
is_read_only boolean No, defaults to false if not provided. Request Body Whether or not the connection to the database is read-only.
iam_role_arn string No, role assumption will only take place if provided. Request Body Only for Amazon DynamoDB connections. The ARN of the IAM role for DataMasque to assume.

File Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to 1.0.
name string Yes Request Body The name of the Connection.
type string Yes Request Body The type of file system the connection is connecting to. Valid options are "s3_connection", "azure_blob_connection" or "mounted_share_connection".
base_directory string Yes Request Body The root file path where files intended to be masked are stored.
bucket string Required only for S3 Connections. Request Body The name of the S3 bucket containing the base_directory.
container string Required only for Azure Blob Connections. Request Body The name of the Azure Blob Storage container containing the base_directory.
connection_string string Required only for Azure Blob Connections. Request Body The connection string configured with the authorization information to access data in your Azure Storage account.
mask_type string No, defaults to database if not provided. Request Body The type of masking the connection can perform; valid values are database or file. Should be set to file for file Connections.
is_file_mask_source boolean No, defaults to false if not provided. Request Body Whether the connection is a source Connection for file masking.
is_file_mask_destination boolean No, defaults to false if not provided. Request Body Whether the connection is a destination Connection for file masking.
iam_role_arn string No, role assumption will only take place if provided. Request Body Only for S3 connections. The ARN of the IAM role for DataMasque to assume.

POST /api/connections/test/ Responses

Status Code Description
200 Operation succeeded

POST /api/connections/test/ curl example

curl -X POST "https://<your-datamasque-host>/api/connections/test/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "name": "<your-connection-name>",
           "user": "<your-connection-user>",
           "db_type": "oracle",
           "database": "<your-database>",
           "host": "<your-host>",
           "port": 1521,
           "dbpassword": "<your-password>",
           "schema": "<optional-schema>",
           "service_name": "<optional-service-name>",
           "version": "1.0"
         }'

Connection Fileset Object

Connection Fileset objects have the following fields:

Field Type Description
id integer The id of the Connection Fileset. Use this in API URLs that need a connection_fileset id.
name string The name of the Connection Fileset.
database_type string The type of database the Connection Fileset is associated with (currently only mysql is supported; this will work with both MySQL and MariaDB connections).
zip_archive string The location of the Zip archive.

GET /api/connection-filesets/

Authorization: User token only.

Returns a list of Connection Filesets. These may be used to encrypt connections to MySQL and MariaDB databases.

GET /api/connection-filesets/ Parameters

No parameters.

GET /api/connection-filesets/ Responses

Status Code Description
200 A list of JSON serialised Connection Filesets.

GET /api/connection-filesets/ curl example

curl "https://<your-datamasque-host>/api/connection-filesets/" \
     -H "Authorization: Token <your-api-token>"

POST /api/connection-filesets/

Authorization: User token only.

Create a new Connection Fileset.

POST /api/connection-filesets/ Parameters

Field Type Required Location Description
name string Yes Form Field The name of the Connection Fileset.
database_type string Yes Form Field The type of database the Connection Fileset is associated with (currently only mysql is supported; this will work with both MySQL and MariaDB connections).
zip_archive file Yes Form Field The Zip archive file.

POST /api/connection-filesets/ Responses

Status Code Description
201 A JSON serialised object of the Connection Fileset that was created.

POST /api/connection-filesets/ curl example

curl -X POST "https://<your-datamasque-host>/api/connection-filesets/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "name=<fileset_name>" \
     -F "database_type=<database_type>" \
     -F "zip_archive=@</path/to/your/file.zip>"

PUT /api/connection-filesets/{id}/

Authorization: User token only.

Update a Connection Fileset.

PUT /api/connection-filesets/{id}/ Parameters

Field Type Required Location Description
name string Yes Form Field The name of the Connection Fileset.
database_type string Yes Form Field The type of database the Connection Fileset is associated with (currently only mysql is supported; this will work with both MySQL and MariaDB connections).
zip_archive file Yes Form Field The Zip archive file.

PUT /api/connection-filesets/{id}/ Responses

Status Code Description
200 A JSON serialised object of the updated Connection Fileset.

PUT /api/connection-filesets/{id}/ curl example

curl -X PUT "https://<your-datamasque-host>/api/connection-filesets/{id}/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "name=<fileset_name>" \
     -F "database_type=<database_type>" \
     -F "zip_archive=@</path/to/your/file.zip>"

DELETE /api/connection-filesets/{id}/

Authorization: User token only.

Deletes the Connection Fileset with the specified id. A Connection Fileset that is associated with an existing connection cannot be deleted.

DELETE /api/connection-filesets/{id}/ Parameters

No parameters.

DELETE /api/connection-filesets/{id}/ Responses

Status Code Description
204 Operation succeeded.

DELETE /api/connection-filesets/{id}/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/connection-filesets/{id}/" \
     -H "Authorization: Token <your-api-token>"


Ruleset Object

Ruleset objects have the following fields:

Field Type Description
id integer The id of the Ruleset. Use this in API URLs that need a ruleset id.
name string The name of the Ruleset.
config_yaml string The contents of the Ruleset, including all of the masking rules.
is_valid boolean Whether or not the Ruleset is valid, and can be used for masking runs.
mask_type string The masking type of the Ruleset. This can be "database" or "file".

GET /api/rulesets/

Authorization: User token only.

Returns a list of all rulesets.

GET /api/rulesets/ Parameters

No parameters.

GET /api/rulesets/ Responses

Status Code Description
200 A JSON serialised list of Ruleset objects.

GET /api/rulesets/ curl example

curl "https://<your-datamasque-host>/api/rulesets/" \
     -H "Authorization: Token <your-api-token>"

GET /api/rulesets/{id}/

GET /api/rulesets/{id}/ Parameters

No parameters.

GET /api/rulesets/{id}/ Responses

Status Code Description
200 A JSON serialised Ruleset object.

GET /api/rulesets/{id}/ curl example

curl "https://<your-datamasque-host>/api/rulesets/{id}/" \
     -H "Authorization: Token <your-api-token>"

POST /api/rulesets/

Authorization: User token only.

Creates a new ruleset.

POST /api/rulesets/ Parameters

Field Type Required Location Description
name string Yes Request Body The name of the Ruleset.
config_yaml string Yes Request Body The YAML contents of the Ruleset.
mask_type string No Request Body The masking type of the Ruleset. Valid options are "database" or "file".

POST /api/rulesets/ Responses

Status Code Description
201 A JSON serialised Ruleset object.

POST /api/rulesets/ curl example

curl -X POST "https://<your-datamasque-host>/api/rulesets/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "name": "<your-new-name>",
           "config_yaml": "version: \"1.0\"\ntasks:\n  - type: run_data_discovery"
         }'
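In practice a ruleset usually lives in a YAML file, and its contents must be escaped before they can be sent as the config_yaml JSON string. The sketch below escapes backslashes and double quotes with sed and joins lines with \n using awk. It does not handle tabs or other control characters. DATAMASQUE_HOST, DATAMASQUE_TOKEN, json_escape and create_ruleset_from_file are assumed, hypothetical names.

```shell
#!/bin/sh
# Sketch: create a ruleset from a local YAML file.
# DATAMASQUE_HOST / DATAMASQUE_TOKEN are assumed environment variables;
# json_escape and create_ruleset_from_file are hypothetical helper names.

# Escape stdin for use inside a JSON string: backslashes and double quotes
# are escaped, and line breaks become \n. Tabs and other control characters
# are NOT handled (YAML indentation uses spaces, not tabs).
json_escape() {
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

create_ruleset_from_file() {
  rs_name=$(printf '%s' "$1" | json_escape)
  rs_yaml=$(json_escape < "$2")
  rs_type=${3:-database}   # "database" or "file"
  body=$(printf '{"name": "%s", "mask_type": "%s", "config_yaml": "%s"}' \
    "$rs_name" "$rs_type" "$rs_yaml")
  curl -X POST "https://${DATAMASQUE_HOST}/api/rulesets/" \
       -H "Authorization: Token ${DATAMASQUE_TOKEN}" \
       -H "Content-Type: application/json" \
       -d "$body"
}
```

For example, create_ruleset_from_file my-ruleset ruleset.yaml uploads ruleset.yaml as a database ruleset; pass file as a third argument for a file-masking ruleset.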

PUT /api/rulesets/{id}/

Authorization: User token only.

Update an existing ruleset.

PUT /api/rulesets/{id}/ Parameters

Field Type Required Location Description
name string Yes Request Body The name of the Ruleset.
config_yaml string Yes Request Body The YAML contents of the Ruleset.
mask_type string No Request Body The masking type of the Ruleset. Valid options are "database" or "file".

PUT /api/rulesets/{id}/ Responses

Status Code Description
200 A JSON serialised Ruleset object with the updated values.

PUT /api/rulesets/{id}/ curl example

curl -X PUT "https://<your-datamasque-host>/api/rulesets/{id}/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "name": "<your-new-name>",
           "config_yaml": "version: \"1.0\"\ntasks:\n  - type: run_data_discovery"
         }'

DELETE /api/rulesets/{id}/

Authorization: User token only.

Deletes the ruleset with the specified id.

DELETE /api/rulesets/{id}/ Parameters

No parameters.

DELETE /api/rulesets/{id}/ Responses

Status Code Description
200 Operation succeeded

DELETE /api/rulesets/{id}/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/rulesets/{id}/" \
     -H "Authorization: Token <your-api-token>"

Seed Object

Field Type Description
id integer The id of the Seed.
name string The name of the Seed.
seed_file string The location of the Seed.
created_date datetime The date that the Seed was uploaded.
filename string The file name of the uploaded Seed.

GET /api/seeds/

Authorization: User token only.

Get a list of all DataMasque seed files.

Optionally, you can add an {id} to the end of the request to only return the details of the seed with that specific id.

GET /api/seeds/ Parameters

No parameters.

GET /api/seeds/ Responses

Status Code Description
200 A JSON serialised list of Seed objects.

GET /api/seeds/ curl example

curl "https://<your-datamasque-host>/api/seeds/" \
     -H "Authorization: Token <your-api-token>"

POST /api/seeds/

Authorization: User token only.

Create a new seed from a CSV file.

POST /api/seeds/ Parameters

Field Type Required Description
name string No The name of the csv file.
description string No A description of the seed file to be displayed on the files menu.
seed_file file No The seed file.

POST /api/seeds/ Responses

Status Code Description
201 A JSON serialised Seed object.

POST /api/seeds/ curl example

curl -X POST "https://<your-datamasque-host>/api/seeds/" \
     -H "Authorization: Token <your-api-token>" \
     -F "name=<seed_name>" \
     -F "seed_file=@</path/to/your/seed_file.csv>"

Audit Log Object

Field Type Description
id integer The id of the audit log.
timestamp datetime The timestamp of when the audit log was created.
username string The username which created the audit log.
category string The category for the audit log, one of the following: auth, run, ruleset, or connection
action string The action taken: logged_in or logged_out for auth actions; started or cancelled for masking run actions; created, modified or deleted for connection or ruleset actions.
description string A short description of what happened during the action.

Audit Log CSV

A CSV representation of the Audit Log Object.

The CSV file contains the following headers:

Field Type Description
timestamp datetime The timestamp of when the audit log was created.
username string The username which created the audit log.
category string The category for the audit log, one of the following: auth, run, ruleset, or connection
action string The action taken: logged_in or logged_out for auth actions; started or cancelled for masking run actions; created, modified or deleted for connection or ruleset actions.
description string A short description of what happened during the action.

GET /api/audit-logs/

Authorization: User token only.

Retrieve all Audit Logs.

GET /api/audit-logs/ Parameters

No parameters.

GET /api/audit-logs/ Response

Status Code Description
200 A JSON serialised list of Audit Log objects.

GET /api/audit-logs/ curl example

curl "https://<your-datamasque-host>/api/audit-logs/" \
     -H "Authorization: Token <your-api-token>"

GET /api/audit-logs/download/

Authorization: User token only.

Download all Audit Logs as a CSV file.

GET /api/audit-logs/download/ Parameters

No parameters.

GET /api/audit-logs/download/ Response

Status Code Description
200 The audit logs, returned in CSV format in the response body so they can be saved as a CSV file.

GET /api/audit-logs/download/ curl example

curl -o <your-downloads-path>/<your-download-name>.csv "https://<your-datamasque-host>/api/audit-logs/download/" \
     -H "Authorization: Token <your-api-token>"
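For scheduled exports, it helps to date-stamp the file and to stop curl from saving an HTTP error page as if it were the CSV. The sketch below uses curl's --fail option for that. DATAMASQUE_HOST, DATAMASQUE_TOKEN and download_audit_logs are assumed, hypothetical names, and the file naming scheme is only an illustration.

```shell
#!/bin/sh
# Sketch: save the audit log CSV under a date-stamped file name.
# DATAMASQUE_HOST / DATAMASQUE_TOKEN are assumed environment variables;
# download_audit_logs is a hypothetical helper name.
download_audit_logs() {
  out_dir=${1:-.}
  out_file="${out_dir}/datamasque-audit-$(date -u +%Y%m%d).csv"
  # --fail: exit non-zero on an HTTP error status instead of saving the error body.
  if curl --fail -sS -o "$out_file" \
       -H "Authorization: Token ${DATAMASQUE_TOKEN}" \
       "https://${DATAMASQUE_HOST}/api/audit-logs/download/"; then
    echo "$out_file"   # print the path of the saved CSV
  else
    return 1
  fi
}
```

For example, download_audit_logs /var/backups saves the CSV into /var/backups and prints its path, returning a non-zero status if the request fails.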

Schema Discovery

POST /api/schema-discovery/

Authorization: User token only.

Executes schema discovery against a database connection.

POST /api/schema-discovery/ Parameters

Field Type Required Description
connection string Yes The id of the Connection.
custom_keywords array[string] Yes List of keywords that, where a column name matches one or more of the keywords, indicates the column contains sensitive data.
disable_built_in_keywords boolean Yes If set to true, then DataMasque's built-in list of keywords will not be used to flag columns as sensitive.
disable_global_custom_keywords boolean Yes If set to true, then the user-defined global set of custom keywords will not be used to flag columns as sensitive.
disable_global_ignored_keywords boolean Yes If set to true, then the user-defined global set of ignored keywords will not be used to exclude columns from the discovery results.
ignored_keywords array[string] Yes List of keywords that, where a column name matches one or more of the keywords, indicates the column should be excluded from the schema discovery results.
in_data_discovery object No In-data discovery options. An object containing enabled, row_sample_size, custom_rules, non_sensitive_rules and force options. Defaults to {enabled: false}.
schemas array[string] Yes List of schema names (or database for MySQL/MariaDB) against which to perform schema discovery. Send an empty list to run against the schema configured on the database connection, or the database user's default schema if one is not specified for the connection.

POST /api/schema-discovery/ Responses

Status Code Description
201 A JSON serialised Run object.

POST /api/schema-discovery/ Example

curl -X POST "https://<your-datamasque-host>/api/schema-discovery/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "connection": "<your-connection-id>",
           "custom_keywords": [],
           "ignored_keywords": [],
           "disable_global_custom_keywords": false,
           "disable_global_ignored_keywords": false,
           "disable_built_in_keywords": false,
           "in_data_discovery": {
             "enabled": true,
             "row_sample_size": 500,
             "custom_rules": [
               {
                 "name": "temp_staff",
                 "pattern": "temp.*"
               }
             ],
             "non_sensitive_rules": [
               {"pattern": "retired.*"}
              ]
           }
         }'
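When in-data discovery is not needed, the request body can omit in_data_discovery, which then defaults to {enabled: false}. The sketch below starts metadata-only discovery and builds the schemas array from its arguments. Passing no schema names sends an empty list, which targets the schema configured on the connection. DATAMASQUE_HOST, DATAMASQUE_TOKEN and start_schema_discovery are assumed, hypothetical names.

```shell
#!/bin/sh
# Sketch: start metadata-only schema discovery on chosen schemas.
# DATAMASQUE_HOST / DATAMASQUE_TOKEN are assumed environment variables;
# start_schema_discovery is a hypothetical helper name. Schema names are
# inserted into the JSON verbatim, so they must not contain quotes or backslashes.
start_schema_discovery() {
  conn_id=$1
  shift
  # Build a JSON array from the remaining arguments; no arguments gives [].
  schemas=""
  for s in "$@"; do
    if [ -n "$schemas" ]; then
      schemas="${schemas}, "
    fi
    schemas="${schemas}\"${s}\""
  done
  body=$(printf '{"connection": "%s", "schemas": [%s], "custom_keywords": [], "ignored_keywords": [], "disable_built_in_keywords": false, "disable_global_custom_keywords": false, "disable_global_ignored_keywords": false}' \
    "$conn_id" "$schemas")
  curl -X POST "https://${DATAMASQUE_HOST}/api/schema-discovery/" \
       -H "Authorization: Token ${DATAMASQUE_TOKEN}" \
       -H "Content-Type: application/json" \
       -d "$body"
}
```

For example, start_schema_discovery 7 public sales runs discovery on the public and sales schemas of connection 7, while start_schema_discovery 7 uses the connection's own schema.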

GET /api/schema-discovery/{connection_id}/

Authorization: User token or API token.

Retrieve schema discovery results.

GET /api/schema-discovery/{connection_id}/ Parameters

No parameters.

GET /api/schema-discovery/{connection_id}/ Response

Status Code Description
200 A JSON serialised object containing a Schema Discovery object and a Run object.

The response object has the following fields:

Field Type Description
data object A Schema Discovery.
last_sdd_run object A JSON serialised Run object.

Schema Discovery Object

Schema Discovery objects have the following fields:

Field Type Description
options object List of ignored_keywords and customised_keywords.
schemas list[object] List of schema objects each with name and list of tables. tables contain name and a list of columns.
sd_version string Schema discovery version e.g. "1.1.1".

Schema Discovery Column Object

Column objects have the following fields:

Field Type Description
name string The column name
data_type string The data type for this field, e.g. varchar, integer, numeric, timestamp without time zone.
categories list[string] A list of classifications for the flagged sensitive data: PII, PHI, PCI and/or Custom.
max_length number The column length
description string The reason the column was flagged as sensitive.
foreign_keys list[object] A list of foreign key objects containing name and referenced_column.
is_unique_key boolean Whether the column is a unique key.
numeric_scale number If the data_type is numeric this refers to the maximum number of decimal places.
ruleset_match boolean The type of information detected by sensitive data discovery, used internally by the ruleset generator to suggest a suitable masking rule.
in_data_result list[object] A list of In Data matches.
is_primary_key boolean Whether the column is a primary key.
numeric_precision number If the data_type is numeric this refers to the maximum number of digits present.
constraint_columns list[string] A list of column names participating in the constraint.
pk_constraint_name string The name of the primary key constraint.
uk_constraint_name string The name of the unique key constraint.
unique_index_names list[string] A list of index names for this column.
allow_in_data_override boolean Whether a Sensitive Data match can be overridden by an In Data match.
referencing_foreign_keys list[string] A list of foreign keys referencing this column.

GET /api/schema-discovery/v2/{run_id}/

Authorization: User token or API token.

Retrieve schema discovery results with server-side pagination, sorting, filtering and searching.

GET /api/schema-discovery/v2/{run_id}/ Parameters

Field Type Required Location Description
limit number No Query Parameter The maximum number of results to return. Defaults to 50 if not set.
offset number No Query Parameter The index of the first item to be returned within the whole set of results. Defaults to 0 if not set.
ordering string No Query Parameter Controls the sort order of results. Specify one or more columns separated by commas. To specify descending sort order, prefix the field name with '-'. Defaults to ?ordering=schema,table,column.
search string No Query Parameter Performs a case-insensitive partial match on the schema, table or column name.
categories string No Query Parameter Filters the categories (Data Classifications) using an exact match. Valid values are PII, PHI or PCI.
data_type string No Query Parameter Filters the data type name (excluding the length or numeric precision/scale), e.g. ?data_type=varchar.
description string No Query Parameter Searches the description using a case-insensitive partial match.
flagged_by string No Query Parameter Filters the Flagged By field using an exact match. Valid values are In-Data Discovery or Metadata Discovery.
is_sensitive boolean No Query Parameter Filters the results for sensitive matches. Set to true to return only sensitive results, or false for only non-sensitive.
constraint string No Query Parameter Filters for results with either Primary or Unique constraints. Valid values are primary or unique (case-insensitive).

GET /api/schema-discovery/v2/{run_id}/ Response

Status Code Description
200 A JSON serialised object containing pagination meta-data and a list of Schema Discovery Result objects.

The response object has the following fields:

Field Type Description
count number Total number of unpaginated results.
next string Pagination link to the next page of results.
previous string Pagination link to the previous page of results.
results list[object] A list of Schema Discovery Result objects.
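The filters and paging controls above are ordinary query parameters, so curl's -G and --data-urlencode options can assemble them without hand-encoding. The sketch below requests one page of sensitive PII results for a run. DATAMASQUE_HOST, DATAMASQUE_TOKEN and get_sensitive_pii_page are assumed, hypothetical names, and the chosen filters are only an illustration.

```shell
#!/bin/sh
# Sketch: request one page of sensitive PII results from the v2 endpoint.
# DATAMASQUE_HOST / DATAMASQUE_TOKEN are assumed environment variables;
# get_sensitive_pii_page is a hypothetical helper name.
get_sensitive_pii_page() {
  run_id=$1
  offset=${2:-0}   # index of the first result to return
  limit=${3:-50}   # page size (the endpoint's default is 50)
  # -G turns each --data-urlencode pair into an encoded query parameter.
  curl -G "https://${DATAMASQUE_HOST}/api/schema-discovery/v2/${run_id}/" \
       -H "Authorization: Token ${DATAMASQUE_TOKEN}" \
       --data-urlencode "is_sensitive=true" \
       --data-urlencode "categories=PII" \
       --data-urlencode "ordering=schema,table,column" \
       --data-urlencode "limit=${limit}" \
       --data-urlencode "offset=${offset}"
}
```

To walk every result, call the function repeatedly, increasing offset by limit each time until offset reaches the count returned in the response. Alternatively, follow the next link until no further page is returned.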

Schema Discovery Result object

Schema Discovery Result objects have the following fields:

Field Type Description
id number A unique id for the result.
column string The column name.
table string The table name.
schema string The schema name.
data object A v2 Schema Discovery Column Object.

v2 Schema Discovery Column Object

v2 Schema Discovery Column objects have the following fields:

Field Type Description
data_type string The data type for this field, e.g. varchar, integer, numeric, timestamp without time zone, with the max_length or numeric_precision and numeric_scale appended.
max_length number The column length.
foreign_keys list[object] A list of foreign key objects, each containing a name and a referenced_column (a string of the form schema.table.column).
discovery_matches list[object] A list of Discovery Match objects sorted by priority.
numeric_precision number The numeric precision of the column, the meaning of which depends on the database and data type.
numeric_scale number The numeric scale of the column, the meaning of which depends on the database and data type. Default is null.
constraint_columns list[string] A list of column names participating in the constraint.
pk_constraint_name string The name of the primary key constraint. Default is null.
uk_constraint_name string The name of the unique key constraint. Default is null.
unique_index_names list[string] A list of unique index names for this column.
referencing_foreign_keys list[object] A list of foreign keys referencing this column. Each object contains a name and a referencing_column (a string of the form schema.table.column).
categories list[string] A list of classifications for the flagged sensitive data: PII, PHI, PCI and/or Custom.
description string The reason the column was flagged as sensitive (blank for non-sensitive columns).
flagged_by string Indicates whether the column was flagged by In-Data Discovery or Metadata Discovery (or blank for non-sensitive columns).
constraint string Indicates whether the column is either a Primary or Unique key.

Discovery Match Object

Discovery Match objects have the following fields:

Field Type Description
label string A name for the rule that flagged the match. Can also be custom, custom_non_sensitive or ignore for user-defined match rules.
categories list[string] A list of classifications for the flagged sensitive data: PII, PHI, PCI and/or Custom.
flagged_by string Indicates whether the column was flagged by In-Data Discovery or Metadata Discovery.
description string The reason the column was flagged as sensitive.

Generating Rulesets

POST /api/generate-ruleset/

Authorization: User token only.

Returns a ruleset string for selected columns of a connection.

Prerequisite: Make sure a schema discovery report exists for the connection specified in the POST data.
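The selected_columns object in the request body maps schema names to table names to lists of column names. As a minimal sketch, it could be built in Python from the Schema Discovery Result objects of a completed run. The selection rule used here (select any column whose data.categories list is non-empty) is an assumption, so substitute your own criteria; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def build_selected_columns(results: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, List[str]]]:
    """Group flagged columns into the schema -> table -> [columns] shape."""
    selected: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for result in results:
        # Assumption: a non-empty categories list means the column was flagged.
        if result["data"].get("categories"):
            selected[result["schema"]][result["table"]].append(result["column"])
    # Convert the nested defaultdicts to plain dicts before serialising.
    return {schema: dict(tables) for schema, tables in selected.items()}
```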

POST /api/generate-ruleset/[v1/|v2/] curl example

curl -X POST "https://<your-datamasque-host>/api/generate-ruleset/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "connection": "<your-connection-id>",
           "selected_columns": {
             "schema_name": {
               "table_name": [
                 "column_name_1",
                 "column_name_2"
               ]
             }
           }
         }'

POST /api/generate-ruleset/[v1/] Response

The default response for a version 1 request is a JSON-encoded string containing the ruleset YAML. The trailing /v1/ is optional for version 1 requests.

POST /api/generate-ruleset/v2/ Response

The version 2 response is plain text containing the ruleset YAML.
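Since the two versions return the YAML differently, a client that may call either version can normalise the response body. A small Python sketch, assuming body holds the raw response text; the function name is hypothetical.

```python
import json


def ruleset_yaml_from_response(body: str, version: int = 1) -> str:
    """Return the ruleset YAML from a generate-ruleset response body."""
    if version == 1:
        # v1: the body is a JSON-encoded string, so decode it once.
        return json.loads(body)
    # v2: the body is already plain-text YAML.
    return body
```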

POST /api/generate-file-ruleset/

Authorization: User token only.

Returns a ruleset string for selected data of a file connection.

The selected data is a list of file groups, each of which contains:

  • A list of files which are the full paths relative to the base directory of the connection.
  • A list of locators, which are either JSON locators or strings containing a single header column name. JSON locators must be formatted as lists even if they consist of a single element.

Each file group will generate at least one task in the ruleset (either mask_file or mask_tabular_file).

Generally, only one task will be generated per file group, but in cases where files have different extensions, delimiters or encodings, multiple tasks will be generated to cater for these settings.

File groups should contain only files of the same type; that is, don't mix object files, multi-record files and tabular files in one file group. If file types are mixed, DataMasque will attempt to split the generated ruleset into multiple tasks, but the results may be unexpected.
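A hypothetical Python helper illustrating these grouping rules: it keeps files with different extensions in separate groups, and wraps bare string locators for .json and .ndjson files in single-element lists, since JSON locators must be lists. Grouping files only when their locator lists are identical is an assumption of this sketch, not a DataMasque requirement.

```python
import json
import posixpath
from typing import Dict, List, Sequence, Tuple, Union

Locator = Union[str, List[Union[str, int]]]
JSON_EXTENSIONS = {".json", ".ndjson"}


def build_selected_data(files: Sequence[Tuple[str, Sequence[Locator]]]) -> List[dict]:
    """Build selected_data file groups from (path, locators) pairs."""
    groups: Dict[Tuple[str, str], dict] = {}
    for path, locators in files:
        ext = posixpath.splitext(path)[1].lower()
        if ext in JSON_EXTENSIONS:
            # JSON locators must be lists, even with a single element.
            normalised = [[loc] if isinstance(loc, str) else list(loc) for loc in locators]
        else:
            normalised = list(locators)
        # Files share a group only if extension and locators are identical.
        key = (ext, json.dumps(normalised))
        group = groups.setdefault(key, {"files": [], "locators": normalised})
        group["files"].append(path)
    return list(groups.values())
```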

Prerequisite: Make sure you have the file-discovery report for the connection specified in the POST data so that a discovery run has been completed on the connection and the files can be selected from the report.

POST /api/generate-file-ruleset/ curl example

curl -X POST "https://<your-datamasque-host>/api/generate-file-ruleset/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "connection": "<your-connection-id>",
           "selected_data": [
             {
               "files": ["file1.json", "file2.json"],
               "locators": [["age"], ["users", "*", "name"]]
             },
             {
               "files": ["file1.csv", "file2.csv"],
               "locators": ["gender", "address"]
             },
             [repeated for different file groups…]
           ]
         }'

POST /api/generate-file-ruleset/ Response

The response is plain text containing the ruleset yaml.

Generate Ruleset Result Object

Generate Ruleset Result objects are returned by DataMasque for the async-generate-ruleset family of APIs. They have the following fields:

Field Type Description
connection string The ID of the connection for which a ruleset is being generated.
generated_ruleset string The ruleset that has been generated. Not applicable if ruleset generation was started using the from-csv API.
status string The status of the ruleset generation task. One of queued, running, finished, failed, or cancelled.
status_message string A status message describing the progress of the ruleset generation task.
error_message string The error message when generating the ruleset has failed.
last_updated string The timestamp of the last update to this Generate Ruleset Result, in ISO 8601 format.

To retrieve the generated ruleset, query the following endpoint:

GET /api/async-generate-ruleset/{connection_id}/

Authorization: User token only.

Returns the progress of ruleset generation for the connection, including the generated ruleset once generation has finished.

GET /api/async-generate-ruleset/{connection_id}/ Parameters

Field Type Required Location Description
connection_id string Yes URL Path The id of the Connection.

GET /api/async-generate-ruleset/{connection_id}/ Responses

Status Code Description
200 A JSON serialised Generate Ruleset Result Object.
404 Not Found: No connection with the specified ID exists.

GET /api/async-generate-ruleset/{connection_id}/ curl example

curl "https://<your-datamasque-host>/api/async-generate-ruleset/{connection_id}/" \
     -H "Authorization: Token <your-api-token>"
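Because generation runs asynchronously, clients usually poll this endpoint until status reaches finished, failed or cancelled. The Python sketch below takes a zero-argument fetch_result callable (hypothetical; for example, a wrapper that performs this GET and decodes the JSON) so the loop can be tested offline. The interval and timeout defaults are arbitrary choices, not DataMasque recommendations.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which the Generate Ruleset Result will not change.
TERMINAL_STATUSES = {"finished", "failed", "cancelled"}


def wait_for_ruleset(
    fetch_result: Callable[[], Dict[str, Any]],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the Generate Ruleset Result reaches a terminal status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = fetch_result()
        if result["status"] in TERMINAL_STATUSES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"ruleset generation still {result['status']!r}")
        sleep(interval_seconds)
```

When the returned status is failed, error_message explains why; when it is finished, generated_ruleset holds the YAML (except for the from-csv flow).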

POST /api/async-generate-ruleset/{connection_id}/

Authorization: User token only.

Starts generating a ruleset for selected columns of a database connection, or for selected data of a file connection.

POST /api/async-generate-ruleset/{connection_id}/ Parameters

Field Type Required Location Description
connection_id string Yes URL Path The id of the Connection.

POST /api/async-generate-ruleset/{connection_id}/ Responses

Status Code Description
201 A JSON serialised Generate Ruleset Result Object.
404 Not Found: No connection with the specified ID exists.

POST /api/async-generate-ruleset/{connection_id}/ curl example

For generating rulesets on database connections:

curl -X POST "https://<your-datamasque-host>/api/async-generate-ruleset/{connection_id}/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "selected_columns": {
             "schema_name": {
               "table_name": [
                 "column_name_1",
                 "column_name_2"
               ]
             }
           }
         }'

For generating rulesets for file connections:

curl -X POST "https://<your-datamasque-host>/api/async-generate-ruleset/{connection_id}/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "selected_data": [
             {
               "files": ["file1.json", "file2.json"],
               "locators": [["age"], ["users", "*", "name"]]
             },
             {
               "files": ["file1.csv", "file2.csv"],
               "locators": ["gender", "address"]
             },
             [repeated for different file groups…]
           ]
         }'

DELETE /api/async-generate-ruleset/{connection_id}/

Authorization: User token only.

Cancels ruleset generation currently in progress for a connection. If the ruleset generation has already finished, deletes any generated ruleset.

Warning! Deletion of the generated ruleset is irreversible.

DELETE /api/async-generate-ruleset/{connection_id}/ Parameters

Field Type Required Location Description
connection_id string Yes URL Path The id of the Connection.

DELETE /api/async-generate-ruleset/{connection_id}/ Responses

Status Code Description
200 Ruleset generation cancelled before any results were processed.
204 Ruleset generation had finished. The generated ruleset has been deleted.
404 Not Found: No connection with the specified ID exists.

DELETE /api/async-generate-ruleset/{connection_id}/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/async-generate-ruleset/{connection_id}/" \
     -H "Authorization: Token <your-api-token>"

POST /api/async-generate-ruleset/{connection_id}/from-csv/

Authorization: User token only.

Starts generating a ruleset for selected columns of a database connection. The columns are specified by modifying the CSV report retrieved from the /api/runs/{run_id}/db-discovery-results/report/ endpoint. Each row of the CSV report describes one discovered database column; to include that column in ruleset generation, set its Selected value to 1, true, y or yes (case-insensitive).
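As a sketch of preparing the upload, the Python function below rewrites the report's Selected column using a caller-supplied predicate. Only the Selected column name comes from this page; the Schema, Table and Column headers in the test data are hypothetical placeholders for whatever headers the actual report contains. Unselected rows are written as false, which this sketch assumes is treated as not selected.

```python
import csv
import io
from typing import Callable, Dict


def mark_selected(report_csv: str, should_select: Callable[[Dict[str, str]], bool]) -> str:
    """Return a copy of the CSV report with the Selected column filled in."""
    reader = csv.DictReader(io.StringIO(report_csv))
    fieldnames = list(reader.fieldnames or [])
    if "Selected" not in fieldnames:
        raise ValueError("report has no Selected column")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # true is one of the accepted values (1, true, y, yes).
        row["Selected"] = "true" if should_select(row) else "false"
        writer.writerow(row)
    return out.getvalue()
```

For example, mark_selected(report_text, lambda row: row["Table"] == "users") would select every column of a table named users, assuming the report has a Table header.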

POST /api/async-generate-ruleset/{connection_id}/from-csv/ Parameters

Field Type Required Location Description
connection_id string Yes URL Path The id of the Connection.
csv_or_zip_file file Yes Request Body The byte content of the CSV, or the ZIP file containing one or more CSVs.
target_size_bytes int No Request Body Generate rulesets of approximately this size in bytes. Defaults to 512,000 (500 KiB).
force_run boolean No Request Body If set to true, cancel any existing ruleset generation and restart it. Defaults to false.

POST /api/async-generate-ruleset/{connection_id}/from-csv/ Responses

Status Code Description
201 A JSON serialised Generate Ruleset Result Object.
404 Not Found: No connection with the specified ID exists.

POST /api/async-generate-ruleset/{connection_id}/from-csv/ curl example

curl -X POST "https://<your-datamasque-host>/api/async-generate-ruleset/{connection_id}/from-csv/" \
     -H "Authorization: Token <your-api-token>" \
     -F "csv_or_zip_file=@selected_report.csv" \
     -F "target_size_bytes=250000"

GET /api/async-generate-ruleset/{connection_id}/download-rulesets/

Authorization: User token only.

Once ruleset generation started via POST /api/async-generate-ruleset/{connection_id}/from-csv/ has completed, query this endpoint to download the generated rulesets as a ZIP file.

GET /api/async-generate-ruleset/{connection_id}/download-rulesets/ Parameters

Field Type Required Location Description
connection_id string Yes URL Path The id of the Connection.

GET /api/async-generate-ruleset/{connection_id}/download-rulesets/ Responses

Status Code Description
200 Returns a streamed ZIP file containing the generated rulesets.
400 Bad Request: The ruleset generation is still in progress, or has failed.

If an error response is received, query the GET /api/async-generate-ruleset/{connection_id}/ endpoint to check the status of ruleset generation.

GET /api/async-generate-ruleset/{connection_id}/download-rulesets/ curl example

curl -o rulesets.zip "https://<your-datamasque-host>/api/async-generate-ruleset/{connection_id}/download-rulesets/" \
     -H "Authorization: Token <your-api-token>"
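After downloading, the archive can be unpacked with Python's standard zipfile module. This sketch assumes the rulesets are UTF-8 text files; the function name is hypothetical, and the file names inside the archive are whatever DataMasque generates.

```python
import io
import zipfile
from typing import Dict


def read_rulesets(zip_bytes: bytes) -> Dict[str, str]:
    """Map each file name in the downloaded ZIP to its decoded text."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: archive.read(info.filename).decode("utf-8")
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        }
```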

File Data Discovery

POST /api/run-file-data-discovery/

Authorization: User token only.

Executes data discovery against files on a file connection. The file connection must already be configured. Use the UUID of the file connection in the request, which can be found:

  • at the top of the page when you view the connection in the DataMasque UI, or
  • in the URL when you view the connection in the DataMasque UI, or
  • in the id field of the Connection Object.

Discovery keywords

By default, DataMasque's extensive list of built-in keywords is used to identify which fields and attributes in the files are considered sensitive. DataMasque matches the name of the field or attribute against each keyword using a case-insensitive, partial match. For example, a field named credit_CARD_NUMBER will match the Credit card keyword.

You can use various options to refine the set of discovery keywords.

  • Setting disable_built_in_keywords to true means that the built-in keyword list linked above will not be used. In this case, the discovery process will use only the keywords given in custom_keywords and any configured global custom keywords.
  • The custom_keywords option allows you to specify a list of additional keywords to match on. Any fields or attributes whose name includes one or more of those keywords will be flagged as sensitive.
  • A match between a field or attribute's name and a value in the ignored_keywords list will cause a field or attribute to be completely excluded from the results, even if its name suggests that the field may contain sensitive data.
  • Global keywords, as configured through the Settings page of the DataMasque UI, are also considered unless disable_global_custom_keywords and/or disable_global_ignored_keywords (as appropriate) are set to true.

Warning! Ignored keywords have priority. If a field or attribute name matches a built-in, global, or custom keyword and also matches an entry in ignored_keywords or a global ignored keyword, the field or attribute will not be included in the discovery results.
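The matching and priority rules above can be sketched in Python as follows. This is an illustration, not DataMasque's implementation: in particular, the normalisation step that ignores underscores, hyphens and spaces is an assumption made so that credit_CARD_NUMBER matches the Credit card keyword as described above.

```python
import re
from typing import Iterable


def _normalise(text: str) -> str:
    # Assumption: case and non-alphanumeric separators are ignored.
    return re.sub(r"[^a-z0-9]", "", text.lower())


def keyword_match(name: str, keywords: Iterable[str]) -> bool:
    """Case-insensitive partial match of any keyword against a field name."""
    target = _normalise(name)
    # Empty keywords are skipped so they cannot match every name.
    return any(k and k in target for k in map(_normalise, keywords))


def classify_field(name: str, sensitive: Iterable[str], ignored: Iterable[str]) -> str:
    """Check ignored keywords first, because they take priority."""
    if keyword_match(name, ignored):
        return "excluded"
    if keyword_match(name, sensitive):
        return "sensitive"
    return "not flagged"
```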

Specifying files to discover

Supported filetypes for discovery are:

  • JSON (.json)
  • NDJSON (.ndjson)
  • Parquet (.parquet)
  • CSV (.csv)

Note: A file's type is determined solely by its extension, not by its content.

Use the include, skip and recurse options to control which files are included in the discovery process. These have the same syntax and meaning as in a from_file task definition. If none of these options are included, DataMasque will run discovery against all files (of the supported filetypes) in the base directory specified on the connection, but will not recurse into subdirectories.

See also Choosing files to mask with include/skip for an exact specification of how include and skip rules behave, along with some common examples.

Warning! If a file matches both an include and a skip rule, that file will not be included in data discovery.

Note: Take care to correctly escape backslashes in include or skip regexes. For example, if you want to match a literal dot (.) in a filename, the regex needs to escape the dot with a backslash and this backslash must itself be escaped as part of JSON encoding rules, since the request body is in JSON format. So you might use the JSON object {"regex": "file\\.[0-9]+\\.csv"}, representing the regex file\.[0-9]+\.csv which will match file.53.csv but not filex53.csv.
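To avoid escaping by hand, let a JSON library encode the regex. In this Python sketch the regex is written as a raw string and json.dumps doubles each backslash for the request body. The matching checks use Python's re module, whose behaviour is assumed to agree with DataMasque's for this simple pattern.

```python
import json
import re

# Raw string: these backslashes belong to the regex itself.
pattern = r"file\.[0-9]+\.csv"

# json.dumps escapes each backslash, as the JSON request body requires.
body = json.dumps({"skip": [{"regex": pattern}]})
print(body)  # {"skip": [{"regex": "file\\.[0-9]+\\.csv"}]}

# Decoding the body gives back the original regex.
decoded = json.loads(body)["skip"][0]["regex"]
print(re.fullmatch(decoded, "file.53.csv") is not None)  # True
print(re.fullmatch(decoded, "filex53.csv") is not None)  # False
```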

Encoding of CSV files

The encoding option controls how DataMasque interprets CSV files. The default encoding is utf-8. Refer to Python Standard Encodings for a list of supported encodings.

Supported Parquet column types

The list of Parquet column data types supported by file data discovery is the same as the list of supported data types for Parquet masking. See the list of supported data types here.

For complex columns (those of struct, map and list type), also called nested columns, all fields of scalar data type within the columns are discovered separately. In the file discovery reports, the locators for the individual scalar fields are given as JSON paths with the column name as the first element.

Note: This differs from the syntax used for masking these fields where the column name must be specified separately from the path to the field within the column.

For example, with a column named staff of type map<string, struct<name: string, employee_id: int64, salary_history: list<float>>> (a map where the keys are strings and the values are a structure type with keys name, employee_id, and salary_history, the latter being a list), the discovered fields will all have one of the following path formats:

  • staff/<key value>/name
  • staff/<key value>/employee_id
  • staff/<key value>/salary_history/*

where <key value> is a key in the top-level map. Notice that all list indices are replaced with the wildcard * and treated as a single field.

Custom and ignored keywords match on the name of the individual field (such as name in the above example), not the name of the column. For list fields, they match on the last string element of the path (ignoring list indices), for example salary_history.

In-data discovery options

The in_data_discovery parameter on the API request body allows you to control whether and how the discovery process uses in-data discovery to refine sensitive data matches. It is an object parameter with the following fields.

  • You must specify the enabled parameter (true or false).
  • Optional parameters are a row_sample_size (positive integer), force (a boolean), a list of zero or more custom_rules, and a list of zero or more non_sensitive_rules.
  • Each entry in custom_rules is an object with parameters name and pattern, where name is any user-defined name and pattern is a regex.
  • Each entry in non_sensitive_rules is an object with a pattern parameter, again a regex.
  • row_sample_size defaults to 1000.
  • force defaults to false.
  • custom_rules and non_sensitive_rules are empty by default.
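A Python sketch that assembles the in_data_discovery object with the defaults listed above and checks the documented constraints before the request is sent. Compiling patterns with Python's re module is only an early sanity check; it assumes DataMasque's regex dialect is close enough to Python's for the patterns you use. The function name is hypothetical.

```python
import re
from typing import Any, Dict, List, Optional


def build_in_data_discovery(
    enabled: bool,
    row_sample_size: int = 1000,
    force: bool = False,
    custom_rules: Optional[List[Dict[str, str]]] = None,
    non_sensitive_rules: Optional[List[Dict[str, str]]] = None,
) -> Dict[str, Any]:
    """Build the in_data_discovery object using the documented defaults."""
    if isinstance(row_sample_size, bool) or not isinstance(row_sample_size, int) or row_sample_size < 1:
        raise ValueError("row_sample_size must be a positive integer")
    custom_rules = custom_rules or []
    non_sensitive_rules = non_sensitive_rules or []
    for rule in custom_rules:
        if not rule.get("name"):
            raise ValueError("each custom rule needs a name")
    for rule in custom_rules + non_sensitive_rules:
        # Raises re.error early if a pattern is malformed (Python syntax).
        re.compile(rule["pattern"])
    return {
        "enabled": enabled,
        "row_sample_size": row_sample_size,
        "force": force,
        "custom_rules": custom_rules,
        "non_sensitive_rules": non_sensitive_rules,
    }
```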

When enabled, in-data discovery applies the built-in rules, alongside any specified custom_rules and non_sensitive_rules, matching against the data within tabular file columns, or scalar values within JSON documents or complex Parquet columns.

Warning! Non-sensitive rules have priority. If a field or attribute name matches a keyword, built-in IDD rule or custom IDD rule, and also matches a non-sensitive rule, the field or attribute will be marked in the discovery results as Custom Non-Sensitive.

The row_sample_size controls how many samples the in-data discovery process examines when trying to identify the type of data. Configure it according to your needs, bearing in mind that in-data discovery samples only the first <row_sample_size> rows or values encountered when processing the file (with the default sample size, the first 1000 rows of a CSV file, for example). Very large sample sizes can slow down data discovery and consume a lot of RAM (see also this table of memory limits for in-data discovery).

  • If your files are small and/or consistent in that they have the same kind of data present in most or all rows, then a sample size of 100-500 rows is sufficient.
  • If you have large files with sparse data (many nulls) and/or differing data formats within a column or JSON path, use a larger sample size.

When enabled, force will run IDD on a column even if schema discovery has already flagged the column as containing sensitive data.

POST /api/run-file-data-discovery/ Parameters

Field Type Required Description
connection string Yes The id of the Connection.
in_data_discovery object No In-data discovery options: an object containing the enabled, row_sample_size, custom_rules, non_sensitive_rules and force options. Defaults to {enabled: false}.
custom_keywords array[string] No List of keywords; a field or attribute whose name matches one or more of them is flagged as containing sensitive data. Default value is an empty list.
ignored_keywords array[string] No List of keywords; a field or attribute whose name matches one or more of them is excluded from the discovery results. Default value is an empty list.
disable_global_custom_keywords boolean No If set to true, then the user-defined global set of custom keywords will not be used to flag fields or attributes as sensitive. Default value is false.
disable_global_ignored_keywords boolean No If set to true, then the user-defined global set of ignored keywords will not be used to exclude fields or attributes from the discovery results. Default value is false.
disable_built_in_keywords boolean No If set to true, then DataMasque's built-in list of keywords will not be used to flag fields or attributes as sensitive. Default value is false.
include array[object] No Files to discover, specified as glob or regex. Default value is an empty list, meaning everything will be included.
skip array[object] No Files to exclude, specified as glob or regex. Default value is an empty list, meaning no files are skipped.
recurse boolean No Whether to recurse into subdirectories of the base directory, or of items matched by include. Default value is false.
encoding string No File byte encoding. Only applies to CSV files. Default value is utf-8.
workers integer No Number of workers. Refer to the File Ruleset Generator page for more information. Allowed range is 1-32. Defaults to 1.

POST /api/run-file-data-discovery/ Responses

Data discovery runs asynchronously as a special type of masking run. This API endpoint returns a Run object which contains an id field. Use the GET /api/runs/{id}/ endpoint with this run ID to query the status of the data discovery process. To retrieve the file discovery results when the run is complete, use the GET /api/runs/{id}/file-discovery-results/ endpoint with this run ID.

Status Code Description
201 A JSON serialised Run object.

POST /api/run-file-data-discovery/ curl example

curl -X POST "https://<your-datamasque-host>/api/run-file-data-discovery/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "connection": "<your-connection-id>",
           "in_data_discovery": {
             "enabled": true,
             "row_sample_size": 500,
             "custom_rules": [
               {
                 "name": "temp_staff",
                 "pattern": "temp.*"
               }
             ],
             "non_sensitive_rules": [
               {"pattern": "retired.*"}
             ],
             "force": false
           },
           "custom_keywords": ["id1", "id2"],
           "ignored_keywords": ["ignore1"],
           "include": [
             {"glob": "*.ndjson"},
             {"glob": "*.json"}
           ],
           "skip": [
             {"regex": "backup/staff[0-9]+\\.json"}
           ],
           "recurse": true,
           "workers": 4
         }'

GET /api/runs/{id}/file-discovery-results/

Authorization: User token or API token.

Retrieve file discovery results.

GET /api/runs/{id}/file-discovery-results/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

GET /api/runs/{id}/file-discovery-results/ Responses

Status Code Description
200 A JSON serialised list of File Discovery Result objects.

GET /api/runs/{id}/file-discovery-results/ curl example

curl "https://<your-datamasque-host>/api/runs/{id}/file-discovery-results/" \
     -H "Authorization: Token <your-api-token>"

GET /api/runs/{id}/file-discovery-results/ Example response

This shows a group of results for a single discovered file, with a Metadata Discovery match on PassengerId, an In-Data Discovery match on Name, and no matches on Ticket.

[
  {
    "id": 1,
    "connection": {
      "id": "f795b7f1-d654-41c8-bb7c-db741d81dc19",
      "name": "example_file_source"
    },
    "file_type": "csv",
    "files": [
      {
        "path": "example.csv",
        "delimiter": ",",
        "encoding": "utf-8",
        "file_type": "csv"
      }
    ],
    "results": [
      {
        "locator": "PassengerId",
        "matches": [
          {
            "label": "identifiers",
            "categories": ["PII", "PHI"],
            "flagged_by": "Metadata Discovery",
            "description": "Identification"
          }
        ],
        "data_types": ["int"]
      },
      {
        "locator": "Name",
        "matches": [
          {
            "label": "name",
            "categories": ["PII", "PCI", "PHI"],
            "flagged_by": "In-Data Discovery",
            "description": "Full Names"
          }
        ],
        "data_types": ["str"]
      },
      {
        "locator": "Ticket",
        "matches": [],
        "data_types": ["str"]
      }
    ]
  }
]
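Each File Discovery Result already groups files of one type, so its flagged locators can seed a file group for POST /api/generate-file-ruleset/. The Python sketch below keeps every locator with at least one match; treating any match as a reason to select the locator is an assumption (you may prefer to skip Custom Non-Sensitive matches, for example). The function name is hypothetical.

```python
from typing import Any, Dict, List


def flagged_file_groups(discovery_results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn File Discovery Result objects into selected_data file groups."""
    groups = []
    for group in discovery_results:
        # Assumption: any match at all marks the locator for masking.
        locators = [r["locator"] for r in group["results"] if r["matches"]]
        if locators:
            groups.append({
                "files": [f["path"] for f in group["files"]],
                "locators": locators,
            })
    return groups
```

Applied to the example response above, this would produce a single file group for example.csv with the locators PassengerId and Name.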

File Discovery Result Object

File Discovery Result objects have the following fields:

Field Type Description
id integer The id of the File Discovery Result.
connection object The UUID and name identifying the connection used for this File Discovery Result.
file_type string The file type (csv, parquet, json, or ndjson). File Discovery Results are grouped by file type.
files array[object] A list of File objects.
results array[object] A list of Result objects.

File Object

File objects have the following fields:

Field Type Description
path string The discovered file's path, relative to the base directory of the connection.
file_type string The file type (csv, parquet, json, or ndjson).
delimiter Optional[string] For delimited text files, the field separator, e.g. "," for CSV files.
encoding Optional[string] The file encoding, for example "utf-8".

Result Object

Result objects have the following fields:

Field Type Description
locator array[string or int] or string Either a JSON locator or a column name.
matches array[object] A list of Match objects.
data_types array[string] The list of data types found for this field: int, long, str, date, time, year, timestamp, boolean, float, or decimal.

Match Object

Match objects have the following fields:

Field Type Description
categories array[string] A list of classifications for the flagged sensitive data: PII, PHI, PCI and/or Custom.
flagged_by string How the field was flagged as sensitive: Metadata Discovery (the standard keyword matching process) or In-Data Discovery.
description string The name of the rule which caused the column to be flagged for sensitive data.
label string Machine-readable representation of description.

Oracle Wallets

GET /api/oracle-wallets/

Authorization: User token only.

Returns a list of Oracle wallets. These are used to connect to encrypted Oracle connections.

GET /api/oracle-wallets/ Parameters

No parameters.

GET /api/oracle-wallets/ Responses

Status Code Description
200 A JSON serialised list of Oracle wallets.

GET /api/oracle-wallets/ curl example

curl "https://<your-datamasque-host>/api/oracle-wallets/" \
     -H "Authorization: Token <your-api-token>"

POST /api/oracle-wallets/

Authorization: User token only.

Create a new Oracle wallet.

POST /api/oracle-wallets/ Parameters

Field Type Required Location Description
name string Yes Form Field The name of the Oracle Wallet.
zip_archive file Yes Form Field The Zip archive file.

POST /api/oracle-wallets/ Responses

Status Code Description
201 A JSON serialised Oracle wallet object of the wallet created.

POST /api/oracle-wallets/ curl example

curl -X POST "https://<your-datamasque-host>/api/oracle-wallets/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "name=<wallet_name>" \
     -F "zip_archive=@</path/to/your/file.zip>"

DELETE /api/oracle-wallets/{id}/

Authorization: User token only.

Delete the Oracle wallet with the specified id.

DELETE /api/oracle-wallets/{id}/ Parameters

No parameters.

DELETE /api/oracle-wallets/{id}/ Responses

Status Code Description
204 Operation succeeded.

DELETE /api/oracle-wallets/{id}/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/oracle-wallets/{id}/" \
     -H "Authorization: Token <your-api-token>"

Git Setting Object

Git settings are global for the DataMasque instance and can only be updated by an admin user. Git settings are updated on the Settings page in the DataMasque UI.

Git Setting objects have the following fields:

Field Type Description
git_repository_url string The URL of where the Git repository is hosted.
git_branch string The name of the Git branch from which DataMasque will push or pull.
git_directory_path string The directory that DataMasque will push and pull rulesets to, relative to the root of the repository. Note that DataMasque does not support pushing/pulling rulesets in subdirectories of this directory.

GET /api/git-setting/

Authorization: User token only.

Retrieve a Git Setting Object with information about the DataMasque instance's Git settings.

GET /api/git-setting/ Parameters

No parameters.

GET /api/git-setting/ Responses

Status Code Description
200 A JSON serialized Git Setting Object for the DataMasque instance.

GET /api/git-setting/ curl example

curl "https://<your-datamasque-host>/api/git-setting/" \
     -H "Authorization: Token <your-api-token>"

GET /api/git-setting/user/

Authorization: User token only.

Retrieve a Git Setting Object with information about the DataMasque instance's Git settings. If the current user has specified their own git_directory_path, the response contains that value; otherwise, git_directory_path is the instance's global value.

GET /api/git-setting/user/ Parameters

No parameters.

GET /api/git-setting/user/ Responses

Status Code Description
200 A JSON serialized Git Setting Object for the DataMasque instance.

GET /api/git-setting/user/ curl example

curl "https://<your-datamasque-host>/api/git-setting/user/" \
     -H "Authorization: Token <your-api-token>"

SSH Key Object

SSH Key objects have the following fields:

Field Type Description
name string The specified filename of the SSH Key file.
date_uploaded string The ISO 8601 datetime string of when the user uploaded the SSH key.

GET /api/git-ssh-key/

Authorization: User token only.

Retrieve an SSH Key Object describing the current user's uploaded SSH key.

GET /api/git-ssh-key/ Parameters

No parameters.

GET /api/git-ssh-key/ Responses

Status Code Description
200 A JSON serialized SSH Key Object describing the most recent SSH key uploaded by the requesting user.

GET /api/git-ssh-key/ curl example

curl "https://<your-datamasque-host>/api/git-ssh-key/" \
     -H "Authorization: Token <your-api-token>"

PUT /api/git-ssh-key/

Authorization: User token only.

Upload an SSH Key to be used to access a Git remote repository.

Warning: A user may have only one SSH key at a time, so the existing key will be deleted and replaced with the uploaded key for the user making the request.

PUT /api/git-ssh-key/ Parameters

Field Type Required Location Description
key_file file Yes Form Field The SSH Key file.
name string Yes Form Field The name of the file.

PUT /api/git-ssh-key/ Responses

Status Code Description
200 A JSON serialized SSH Key Object, which is the most recent SSH Key Upload for the user making the request.

PUT /api/git-ssh-key/ curl example

curl -X PUT "https://<your-datamasque-host>/api/git-ssh-key/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "key_file=@</path/to/your/file>" \
     -F "name=<your-ssh-key-filename>"

DELETE /api/git-ssh-key/

Authorization: User token only.

Delete the current user's uploaded SSH key.

DELETE /api/git-ssh-key/ Parameters

No parameters.

DELETE /api/git-ssh-key/ Responses

Status Code Description
204 The SSH key associated with the requesting user has been deleted.

DELETE /api/git-ssh-key/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/git-ssh-key/" \
     -H "Authorization: Token <your-api-token>"

GET /api/ruleset-git/

Authorization: User token only.

Retrieve the content of a specific ruleset as it was at a given commit ID. The current user's Git SSH key is used for authentication.

How File Paths Are Built

Internally, DataMasque builds the file name by appending the specified extension to ruleset_name, then appends that file name to git_directory_path (from the DataMasque Git Settings) to build the full file path. For example, for a ruleset_name of My Ruleset, an extension of .yml and a git_directory_path of masking/rulesets, the file masking/rulesets/My Ruleset.yml is retrieved, with its contents as they were at the specified commit ID.
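The path-building rule can be illustrated with a short bash function. This is a minimal sketch under stated assumptions, not DataMasque's implementation: the function name ruleset_git_path is hypothetical, and the handling of a trailing slash or an empty git_directory_path is assumed rather than documented.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the repository path DataMasque would use for a
# ruleset, following "How File Paths Are Built". Not DataMasque's own code.
ruleset_git_path() {
  local dir=$1 name=$2 ext=${3:-.yml}   # extension defaults to .yml
  case $ext in
    .yml|.yaml) ;;                      # the only accepted extensions
    *) echo "extension must be .yml or .yaml" >&2; return 1 ;;
  esac
  dir=${dir%/}                          # assumption: ignore a trailing slash
  if [ -n "$dir" ]; then
    printf '%s/%s%s\n' "$dir" "$name" "$ext"
  else
    printf '%s%s\n' "$name" "$ext"      # assumption: empty path = repo root
  fi
}

# Example: prints "masking/rulesets/My Ruleset.yml"
ruleset_git_path masking/rulesets "My Ruleset" .yml
```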

GET /api/ruleset-git/ Parameters

Field Type Required Location Description
commit_id string Yes Query Parameter The Git commit ID for the ruleset.
ruleset_name string Yes Query Parameter The name of the ruleset. Used to build the path as per How File Paths Are Built above.
extension string No Query Parameter The file extension appended to ruleset_name. Must be .yml or .yaml. Defaults to .yml if omitted.

GET /api/ruleset-git/ Responses

Status Code Description
200 A JSON object with a single key, config_yaml, containing the ruleset content.

GET /api/ruleset-git/ curl example

curl "https://<your-datamasque-host>/api/ruleset-git/?commit_id=<your-full-commit-id>&ruleset_name=<your-ruleset-name>&extension=.yaml" \
     -H "Authorization: Token <your-api-token>"
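Because ruleset names such as My Ruleset can contain spaces, the ruleset_name value needs to be percent-encoded when it is placed in the query string. The following bash sketch does this byte by byte; the urlencode function is hypothetical and assumes od is available. Alternatively, curl can encode the parameters itself when -G is combined with one --data-urlencode option per parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a value for use in a query string.
# Bytes in the RFC 3986 unreserved set (A-Z a-z 0-9 - . _ ~) are kept;
# every other byte, including spaces and UTF-8 bytes, becomes %XX.
urlencode() {
  local hex ch out=''
  # od prints each input byte as two lowercase hex digits
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case $hex in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        printf -v ch "\\x$hex"          # unreserved: keep the character
        out+=$ch ;;
      *)
        out+="%${hex^^}" ;;             # anything else: %XX, upper-case hex
    esac
  done
  printf '%s\n' "$out"
}

# Example: prints "My%20Ruleset"
urlencode "My Ruleset"

# Usage (not run here):
# curl "https://<your-datamasque-host>/api/ruleset-git/?commit_id=<your-full-commit-id>&ruleset_name=$(urlencode "My Ruleset")&extension=.yml" \
#      -H "Authorization: Token <your-api-token>"
```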

POST /api/ruleset-git/

Authorization: User token only.

Commit changes to a specific ruleset, then push the commit to the remote repository.

POST /api/ruleset-git/ Parameters

Field Type Required Location Description
commit_message string Yes Request Body The Git commit message for the ruleset changes.
ruleset_name string Yes Request Body The name of the ruleset. Used to build the path as per How File Paths Are Built above.
extension string No Request Body The file extension appended to ruleset_name. Must be .yml or .yaml. Defaults to .yml if omitted.
ruleset_content string Yes Request Body The YAML contents of the ruleset.

POST /api/ruleset-git/ Responses

Status Code Description
200 Operation succeeded.

POST /api/ruleset-git/ curl example

curl -X POST "https://<your-datamasque-host>/api/ruleset-git/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "commit_message": "Update ruleset",
           "ruleset_name": "<your-ruleset-filename>",
           "extension": ".yml",
           "ruleset_content": "version: \"1.0\"\ntasks:\n  - type: run_data_discovery"
         }'
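The ruleset_content field is a JSON string, so the YAML must be escaped before it goes into the request body, as the example above does by hand. A minimal bash sketch that builds the body from a YAML file is shown below. The function names json_escape and ruleset_git_body are hypothetical, and the escaping handles only backslashes, double quotes, newlines, carriage returns and tabs, which is assumed to be sufficient for typical ruleset YAML.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the JSON body for POST /api/ruleset-git/.

# Escape a string for use inside a JSON string literal. Only backslash,
# double quote, newline, carriage return and tab are handled.
json_escape() {
  local s=$1 bs='\' dq='"'
  s=${s//"$bs"/"$bs$bs"}      # \ -> \\ (must run first)
  s=${s//"$dq"/"$bs$dq"}      # " -> \"
  s=${s//$'\n'/"${bs}n"}      # newline -> \n
  s=${s//$'\r'/"${bs}r"}      # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}      # tab -> \t
  printf '%s' "$s"
}

# Arguments: commit message, ruleset name, extension, path to YAML file.
ruleset_git_body() {
  local content
  content=$(<"$4")            # command substitution drops trailing newlines
  printf '{"commit_message": "%s", "ruleset_name": "%s", "extension": "%s", "ruleset_content": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" \
    "$(json_escape "$3")" "$(json_escape "$content")"
}

# Usage (not run here):
# curl -X POST "https://<your-datamasque-host>/api/ruleset-git/" \
#      -H "Authorization: Token <your-api-token>" \
#      -H "Content-Type: application/json" \
#      -d "$(ruleset_git_body "Update ruleset" "My Ruleset" .yml ./my_ruleset.yml)"
```

Where jq is available, its --arg and --rawfile options are a more robust way to build the same body, since jq handles all JSON escaping.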

GET /api/ruleset-git/files/

Authorization: User token only.

This endpoint lists the contents of git_directory_path in the remote repository configured for the DataMasque instance. It treats any file ending in .yml as a ruleset file and fetches the list of commits for each one. It does not descend into subdirectories of git_directory_path.

GET /api/ruleset-git/files/ Parameters

No parameters.

GET /api/ruleset-git/files/ Responses

Example Response

The response is a JSON object in which each key is the name of a file with a .yml extension in git_directory_path. Each value is an array of objects, each containing a commit ID, commit date and commit message.

{
  "Ruleset1.yml": [
    {"commit": "f061s…46756", "date": "2024-01-10 12:31:45", "message": "Added Column"},
    {"commit": "64c18…1a279", "date": "2024-01-09 10:19:13", "message": "Removed Column"}
  ],
  "Another Ruleset.yml": [
    {"commit": "377f5…b32f4", "date": "2023-12-25 12:31:45", "message": "Update rule"}
  ]
}
Response Codes
Status Code Description
200 A JSON object mapping each ruleset file name to its Git commit history.

GET /api/ruleset-git/files/ curl example

curl "https://<your-datamasque-host>/api/ruleset-git/files/" \
     -H "Authorization: Token <your-api-token>"

Exporting DataMasque Configuration

To keep a backup of the data stored in DataMasque, you can export it to a Zip file. This is done by making a GET request to /api/export/v1/. Optionally, you can also specify the export_type query parameter to select which data to include in the export. The parameter may be specified multiple times to specify different types of data to include in the same Zip file.

The Zip file has the following structure. Note that some files or directories may be absent if the selected export_type excluded them.

Path Type Description
manifest.json File A JSON file containing metadata about the export and other files in the Zip.
rulesets/database/ Directory A directory containing database masking rulesets in YAML format.
rulesets/file/ Directory A directory containing file masking rulesets in YAML format.

Export Types

The following export types may be used to control the data included in the export archive:

Export Type Description
all Include all data described in this table. This is the default if no export_type is selected.
rulesets Include only rulesets.

Currently, only rulesets can be exported, so specifying rulesets as the export_type is equivalent to omitting the export_type parameter.

manifest.json format

The manifest.json file contains the following information:

  • metadata: Metadata about the export archive.
    • version: The version format of the export file.
    • exported_at: The UTC date and time the export was created, in ISO format.
  • data: Information about the files included in the export archive.
    • rulesets: A list of metadata about the exported rulesets. Each object in the list contains the id, name and type (database or file) of an exported ruleset.

Ruleset Export Naming

When rulesets are exported to a Zip archive, they are stored in either the rulesets/database/ directory (for database rulesets) or the rulesets/file/ directory (for file rulesets).

The name of the file is built by appending .yml to the ruleset name. For example:

  • The database masking ruleset named Ruleset 01 would be exported to rulesets/database/Ruleset 01.yml.
  • The file masking ruleset named Ruleset F would be exported to rulesets/file/Ruleset F.yml.
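To build an archive by hand (as described under Zip Exports Created Without manifest.json), the naming rule above can be applied with a short bash function. The function name stage_ruleset and the staging-directory approach are hypothetical; the sketch assumes ruleset names contain no / characters, because each name becomes a file name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a ruleset YAML file into a staging directory
# using the export naming rule rulesets/<database|file>/<ruleset name>.yml.
# Assumes the ruleset name contains no "/" characters.
stage_ruleset() {
  local root=$1 type=$2 name=$3 src=$4
  case $type in
    database|file) ;;                   # the two ruleset types
    *) echo "ruleset type must be database or file" >&2; return 1 ;;
  esac
  mkdir -p "$root/rulesets/$type" &&
    cp "$src" "$root/rulesets/$type/$name.yml"
}

# Usage (not run here): stage two rulesets, then zip the rulesets/ directory.
# stage_ruleset ./staging database "Ruleset 01" ./ruleset01.yml
# stage_ruleset ./staging file "Ruleset F" ./rulesetF.yml
# (cd ./staging && zip -r ../my_export.zip rulesets)
```

The zip command in the comments assumes the zip tool is installed.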

Note: Rulesets that have been deleted from DataMasque are not visible in the ruleset list in the DataMasque dashboard, but are still retained in the DataMasque database because runs reference them. These "archived" rulesets are not included in the Zip export.

GET /api/export/v1/

Authorization: User token only.

Export DataMasque data to a Zip archive in the Version 1 format. The filename of the archive will be based on the export type selected, and contain the current UTC date and time. For example: datamasque_export_rulesets_20240211-091507.zip.

GET /api/export/v1/ Parameters

Field Type Required Location Description
export_type string No Query Parameter The type of data to export (see Export Types for a full list). Defaults to all.

Multiple export types may be specified by using multiple export_type query parameters. For example, /api/export/v1/?export_type=type_a&export_type=type_b.
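When several export types are needed, the query string simply repeats export_type. The bash sketch below builds such a URL; the export_url function is hypothetical, and it checks types against the two values in the Export Types table, so it would need updating if more types are added.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an /api/export/v1/ URL with one export_type
# query parameter per requested type. With no types, the server default
# (all) applies.
export_url() {
  local host=$1 query='' sep='?' type
  shift
  for type in "$@"; do
    case $type in
      all|rulesets) ;;                  # types listed under Export Types
      *) echo "unknown export_type: $type" >&2; return 1 ;;
    esac
    query+="${sep}export_type=${type}"
    sep='&'
  done
  printf 'https://%s/api/export/v1/%s\n' "$host" "$query"
}

# Usage (not run here):
# curl "$(export_url "<your-datamasque-host>" rulesets)" \
#      -H "Authorization: Token <your-api-token>" \
#      -J -O
```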

GET /api/export/v1/ curl example

When using curl, specify the -O flag to write the response to a file, and the -J flag to name that file using the filename supplied by the server in the Content-Disposition header (as per the example above).

curl "https://<your-datamasque-host>/api/export/v1/" \
     -H "Authorization: Token <your-api-token>" \
     -J -O

A Zip file with a name such as datamasque_export_all_20240211-091507.zip is saved to the current directory.

Importing DataMasque Configuration

A DataMasque export Zip can be imported into a DataMasque instance using the /api/import/v1/ API endpoint.

For the best import experience, use a Zip that was exported from DataMasque and contains a manifest.json file. A Zip with the correct folder structure can also be created manually without a manifest.json; DataMasque will import it, but automatic conflict resolution of duplicate rulesets is less effective. The behaviour with and without manifest.json is explained below.

Zip Exports From DataMasque With manifest.json

Zip exports created by DataMasque include the UUID of each exported item, which is used to determine which items already exist.

When importing rulesets:

  • If a ruleset with a given ID exists during import:
    • If the ruleset is archived, it is restored, and its name and content are updated from the imported ruleset.
    • If the ruleset is not archived, no action is taken. The content in the DataMasque instance is unchanged.
  • If a ruleset is found with a matching name, and the contents are identical, then no action is taken. The content in the DataMasque instance is unchanged.
  • If a ruleset is found with a matching name, but the contents are different, then a new ruleset is created by appending Copy to the name. For example, if Ruleset A exists, then the content will be uploaded to a ruleset Ruleset A Copy. An incrementing number will be added until an unused name is found, for example, Copy 1, Copy 2, etc.
  • If no ruleset with the given ID or name exists, then it is created.
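The rules above can be summarised as a bash sketch that maps what already exists in DataMasque to the status value reported in the import response (see POST /api/import/v1/ Responses). It illustrates the documented rules and is not DataMasque's implementation: the function names are hypothetical, and the exact Copy numbering sequence (Copy, then Copy 1, Copy 2, and so on) is assumed from the example. For an archive without manifest.json, the same logic applies with the ID check always answering no.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the documented import rules. Each argument is
# "yes" or "no"; the printed value is the status an import would report.
import_status() {
  local id_exists=$1 archived=$2 name_exists=$3 same_content=$4
  if [ "$id_exists" = yes ]; then
    # A matching ID wins: restore if archived, otherwise leave untouched.
    if [ "$archived" = yes ]; then echo RESTORED; else echo NOT_CREATED; fi
  elif [ "$name_exists" = yes ]; then
    # Same name: skip identical content, otherwise import under a new name.
    if [ "$same_content" = yes ]; then echo NOT_CREATED; else echo CREATED_DUPLICATE; fi
  else
    echo CREATED
  fi
}

# Return 0 if the first argument equals any of the remaining arguments.
name_in_list() {
  local needle=$1 item
  shift
  for item in "$@"; do
    [ "$item" = "$needle" ] && return 0
  done
  return 1
}

# Choose a free name for a CREATED_DUPLICATE ruleset. Assumed sequence:
# "<name> Copy", then "<name> Copy 1", "<name> Copy 2", ...
copy_name() {
  local base=$1 candidate n=1
  shift                                 # remaining arguments: existing names
  candidate="$base Copy"
  while name_in_list "$candidate" "$@"; do
    candidate="$base Copy $n"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}

# Example: prints "Ruleset A Copy 1"
copy_name "Ruleset A" "Ruleset A" "Ruleset A Copy"
```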

Because of these rules, imports of the same Zip archive may be repeated multiple times without duplicating content.

Zip Exports Created Without manifest.json

A Zip export archive may be created manually, provided its file structure matches the structure outlined in Exporting DataMasque Configuration. Without a manifest.json, ruleset IDs are not known, so matching is based on the ruleset name, using the following rules:

  • If a ruleset is found with a matching name, and the contents are identical, then no action is taken. The content in the DataMasque instance is unchanged.
  • If a ruleset is found with a matching name, but the contents are different, then a new ruleset is created by appending Copy to the name. For example, if Ruleset A exists, then the content will be uploaded to a ruleset Ruleset A Copy. An incrementing number will be added until an unused name is found, for example, Copy 1, Copy 2, etc.
  • If no ruleset with the given name exists, then it is created.

Because ruleset IDs are not known, re-running an import without a manifest.json may create duplicate rulesets with identical content.

POST /api/import/v1/

Authorization: User token only.

Import a DataMasque export Zip file. The response will contain a list of actions taken for each included object.

POST /api/import/v1/ Parameters

Field Type Required Location Description
zip_archive file Yes Form Field The exported Zip archive file.

POST /api/import/v1/ Responses

The response of an import request contains information about the resources that were imported, grouped by resource type. An example response is shown below.

{
  "data": {
    "rulesets": {
      "metadata": {"processed":  6, "created":  2, "restored": 1, "error":  1},
      "data": [
        {
          "exported_name": "Ruleset A", 
          "exported_id": "9d641e97-adf7-4f22-9089-afc3711bf222",
          "imported_name": "Ruleset A", 
          "imported_id": "9d641e97-adf7-4f22-9089-afc3711bf222",
          "ruleset_type": "database",
          "status": "NOT_CREATED", 
          "message": "A ruleset with ID \"9d641e97-adf7-4f22-9089-afc3711bf222\"  already exists, and was not changed."
        },
        {
          "exported_name": "Ruleset B", 
          "exported_id": null,
          "imported_name": "Ruleset B Copy", 
          "imported_id": "04ea20f0-ad4c-498e-881f-b0bc79d83ba7",
          "ruleset_type": "file",
          "status": "CREATED_DUPLICATE", 
          "message": "A ruleset named \"Ruleset B\" already exists, so ruleset \"Ruleset B Copy\" was created."
        },
        {
          "exported_name": "Ruleset C", 
          "exported_id": null,
          "imported_name": "Ruleset C", 
          "imported_id": "7d731d55-68c9-400e-a790-e052afe789cc",
          "ruleset_type": "database", 
          "status": "NOT_CREATED", 
          "message": "A ruleset named \"Ruleset C\" exists with identical content."
        },
        {
          "exported_name": "Ruleset D", 
          "exported_id": null,
          "imported_name": "Ruleset D", 
          "imported_id": "99eeffd3-3f65-4ed7-8ad1-a31a539b7b2c",
          "ruleset_type": "file",
          "status": "CREATED", 
          "message": "Ruleset named \"Ruleset D\" did not exist, and was created."
        },
        {
          "exported_name": "Ruleset E", 
          "exported_id": "c0f5b5bb-a2ce-4cea-9248-1b8ef6539a0e",
          "imported_name": "Ruleset E", 
          "imported_id": "c0f5b5bb-a2ce-4cea-9248-1b8ef6539a0e",
          "ruleset_type": "database",
          "status": "RESTORED", 
          "message": "An archived ruleset with ID \"c0f5b5bb-a2ce-4cea-9248-1b8ef6539a0e\" has been restored and overwritten with the new name and content."
        },
        {
          "exported_name": "Ruleset F", 
          "exported_id": "abc123",
          "imported_name": null, 
          "imported_id": null,
          "ruleset_type": "database",
          "status": "ERROR", 
          "message": "Import of ruleset with ID \"abc123\" due to error: invalid ID." 
        }
      ]
    }
  }
}

The metadata for each item type shows the number of items of that type that were processed, and how many were created, restored, or failed with an error.

Each data object contains information about the import of that item. The fields are:

  • exported_name: The name of the ruleset in the export Zip archive.
  • exported_id: The ID of the ruleset from the export Zip archive. Only available if a manifest.json file is present; otherwise this is null.
  • imported_name: The name that the ruleset was imported to. Usually this will match exported_name. This will only be null on error. If the ruleset was not imported due to it already existing, this will still match exported_name.
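  • imported_id: The ID of the ruleset in DataMasque after import. If exported_id is present, imported_id is expected to match it (even if the data was not changed). If exported_id is null, imported_id is either the ID generated for a newly created ruleset or the ID of the existing ruleset that matched (as for Ruleset C above). imported_id is null on error.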
  • ruleset_type: One of database or file.
  • status: The status of the import of this ruleset. One of:
    • NOT_CREATED: The ruleset was not created, because a ruleset with the same ID already exists, or a ruleset with the same name has identical content.
    • CREATED_DUPLICATE: A ruleset with that name existed, so it was imported with a new name (in imported_name).
    • CREATED: No ruleset with that ID or name existed, so it was created.
    • RESTORED: An archived ruleset has been restored and overwritten with the new name and content from an imported ruleset.
    • ERROR: There was an error creating the ruleset. Check message for details.
  • message: A human-readable message describing the action taken or error that occurred. Messages may change between DataMasque versions, so they should not be relied on to determine the outcome of an import. Instead, refer to the status field.

The status code of the response, as shown in the table below, indicates at a glance whether any resources were created.

Status Code Description
200 The import was successful, indicating either no changes (e.g. the uploaded rulesets already existed) or the successful restoration of some rulesets.
201 The import was successful, and one or more rulesets were created.
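Because message text may change between versions, scripts should branch on the HTTP status code and the status fields. A minimal bash sketch follows; the helper names are hypothetical, and they use grep and sed instead of a JSON parser, so they assume the response keeps the "status": "VALUE" layout shown in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a saved POST /api/import/v1/ response body.
# They read the status field (not message) with grep/sed instead of a JSON
# parser, so they assume the "status": "VALUE" layout of the example above.

# Print one line per status value: "<STATUS> <count>".
import_status_counts() {
  grep -oE '"status": *"[A-Z_]+"' "$1" |
    sed -E 's/.*"([A-Z_]+)"$/\1/' |
    LC_ALL=C sort | uniq -c |
    awk '{ print $2, $1 }'
}

# Succeed (exit status 0) if any ruleset reported ERROR.
import_has_errors() {
  grep -qE '"status": *"ERROR"' "$1"
}

# Usage (not run here): keep the body and the HTTP status code separately.
# code=$(curl -s -o import_response.json -w '%{http_code}' \
#        -X POST "https://<your-datamasque-host>/api/import/v1/" \
#        -H "Authorization: Token <your-api-token>" \
#        -F "zip_archive=@</path/to/your/export.zip>")
# [ "$code" = 201 ] && echo "One or more rulesets were created."
# import_status_counts import_response.json
# import_has_errors import_response.json && echo "Some rulesets failed to import."
```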

POST /api/import/v1/ curl example

curl -X POST "https://<your-datamasque-host>/api/import/v1/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "zip_archive=@</path/to/your/datamasque_export_all_20240211-091507.zip>"

Other API Requests

POST /api/users/admin-install/

Authorization: Anonymous, only available when no user has been created.

Verify the DataMasque installation, and set up an admin account.

POST /api/users/admin-install/ Parameters

Field Type Required Location Description
email string Yes Request Body The email address for the admin account.
username string Yes Request Body The username for the admin account.
password string Yes Request Body The password for the admin account.
re_password string Yes Request Body The same password again, to confirm it.
allowed_hosts array[string] Yes Request Body A list of hostnames, IP addresses or CIDR networks that will be allowed to access DataMasque upon installation.
aws_ec2_instance_id string AWS Marketplace installations only Request Body The ID of the AWS EC2 instance running DataMasque.
contract_license_type string AWS Contract Product installations only Request Body For contract products, the type of license to check out. Must be either business or enterprise.

POST /api/users/admin-install/ Responses

Status Code Description
201 A JSON serialised User object, with an extra warnings* item.

* Any non-critical warnings that were generated during installation are included in the warnings item of the response. This is an array of strings.

POST /api/users/admin-install/ curl example

curl -X POST "https://<your-datamasque-host>/api/users/admin-install/" \
     -H "Content-Type: application/json" \
     -d '{
           "email": "<your-admin-email>",
           "username": "<your-username>",
           "password": "<your-admin-password>",
           "re_password": "<your-admin-password>",
           "allowed_hosts": ["masque.local"],
           "aws_ec2_instance_id": "<your-instance-id>"
         }'

Installation Info Object

A JSON object showing the state of the current installation with the following data:

Field Type Description
is_aws_marketplace boolean Whether the current installation has been installed from the AWS marketplace.
installed boolean Whether DataMasque has been successfully installed.
is_smtp_configured boolean Whether SMTP has been configured on the DataMasque instance.
is_saml_sso_configured boolean Whether SAML SSO has been configured on the DataMasque instance.

GET /api/app/check/

Authorization: User token or API token.

Check whether DataMasque has been successfully installed.

GET /api/app/check/ Parameters

No parameters.

GET /api/app/check/ Responses

Status Code Description
200 A JSON serialised Installation Info Object.

GET /api/app/check/ curl example

curl "https://<your-datamasque-host>/api/app/check/" \
     -H "Authorization: Token <your-api-token>"

POST /api/license-upload/

Authorization: User token only.

Upload a licence file to DataMasque.

POST /api/license-upload/ Parameters

Field Type Required Location Description
license_file file Yes Form Field The licence file to upload.

POST /api/license-upload/ Responses

Status Code Description
200 Operation succeeded.

POST /api/license-upload/ curl example

curl -X POST "https://<your-datamasque-host>/api/license-upload/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "license_file=@</path/to/your/license_file.lic>"

GET /api/license/contract-type/

Authorization: User token only.

For Cloud Contract Offer licenses, retrieve the license type that has been configured.

GET /api/license/contract-type/ Parameters

No parameters.

GET /api/license/contract-type/ Responses

Status Code Description
200 License type retrieved.
400 The licensing method is not of Cloud Contract type, so the license type is not applicable.
404 The license type has not yet been specified.

An example response is shown below.

{
  "contract_license_type": "business"
}

contract_license_type is one of:

  • business
  • enterprise

GET /api/license/contract-type/ curl example

curl "https://<your-datamasque-host>/api/license/contract-type/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json"

PUT /api/license/contract-type/

Authorization: Admin User token only.

For Cloud Contract Offer licenses, set the type of license to check out.

PUT /api/license/contract-type/ Parameters

Field Type Required Location Description
contract_license_type string Yes Request Body The type of license to check out. Must be one of business or enterprise.

PUT /api/license/contract-type/ Responses

Status Code Description
201 License type updated.
400 The licensing method is not of Cloud Contract type, so setting the license type is not supported, or the specified license type is invalid.

PUT /api/license/contract-type/ curl example

curl -X PUT "https://<your-datamasque-host>/api/license/contract-type/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{"contract_license_type": "business"}'

Health Check Object

A JSON object containing health statistics for the DataMasque instance, with the following fields:

Field Type Description
worker_running boolean true if the masking agent worker processes are healthy, false if there are no available workers.
license_expired boolean true if the licence is expired, false if the licence is not expired.
license_renewal_in_days integer Remaining days until licence expiry.
license_limit_breach object An object describing any licence breaches that have occurred. Each property on the object is the type of breach that has occurred. Each property value is an object containing breach_type, message, and created_date properties.

GET /api/health-check/

Authorization: User token or API token.

Get the basic health-check status of DataMasque.

GET /api/health-check/ Parameters

No parameters.

GET /api/health-check/ Responses

Status Code Description
200 A JSON serialised Health Check Object.
500 A server error has occurred, for example because the license file is invalid. The error is described in the response.

GET /api/health-check/ curl example

curl "https://<your-datamasque-host>/api/health-check/" \
     -H "Authorization: Token <your-api-token>"
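For monitoring, the Health Check Object fields can be turned into an exit status. The bash sketch below reads a saved response; the check_health function, its 0/1/2 exit-status convention and the 30-day default warning threshold are assumptions chosen for illustration, and it uses grep rather than a JSON parser. It treats a body without "worker_running": true (such as a 500 error body) as critical.

```shell
#!/usr/bin/env bash
# Hypothetical monitoring helper for a saved /api/health-check/ response.
# Uses grep instead of a JSON parser, so it assumes "key": value formatting.
# Exit status: 0 OK, 1 warning, 2 critical. The 30-day default is arbitrary.
check_health() {
  local body=$1 warn_days=${2:-30} days
  if ! grep -qE '"worker_running": *true' "$body"; then
    echo "CRITICAL: workers unavailable or unexpected response"; return 2
  fi
  if grep -qE '"license_expired": *true' "$body"; then
    echo "CRITICAL: license expired"; return 2
  fi
  # Each license_limit_breach entry carries a breach_type property.
  if grep -q '"breach_type"' "$body"; then
    echo "WARNING: license limit breach reported"; return 1
  fi
  days=$(grep -oE '"license_renewal_in_days": *-?[0-9]+' "$body" | grep -oE -- '-?[0-9]+$' || true)
  if [ -n "$days" ] && [ "$days" -lt "$warn_days" ]; then
    echo "WARNING: license expires in $days days"; return 1
  fi
  echo "OK"
}

# Usage (not run here):
# curl -s -o health.json "https://<your-datamasque-host>/api/health-check/" \
#      -H "Authorization: Token <your-api-token>"
# check_health health.json 14
```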