DataMasque Portal

API Reference

Documentation Overview

This API Reference page explains how to use the DataMasque API. If you're new to working with REST APIs, please read the Quickstart Examples section, which provides an overview of using Postman and curl. Otherwise, check the index listed above to quickly access documentation as well as curl examples for each endpoint.

Quickstart Examples

The examples on this page use < and > as delimiters for example values, but these characters should not be used in the requests.

Preface

These examples are made for beginners new to using REST APIs. We'll guide you step by step on how to use Postman and curl to interact with the DataMasque API. Remember, the type of request (GET, POST, PUT, etc.) depends on what is specified in our documentation for each endpoint.

An Overview of Making Requests With Postman

Please note that this documentation is based on Postman version 10.22. Future or previous versions of Postman may have slightly different options or behaviors.

Aside from the Postman example listed in the Authentication section, this is the only Postman example. However, a curl example will be listed for each endpoint.

  1. Open Postman: Launch the Postman application on your device.

  2. Create a New Request:

    • At the top of your window, select the File menu tab, then select New.
    • A modal will appear asking what type of request is needed. Select HTTP.
  3. Set the Request Method:

    • Above the URL field, you'll see a dropdown menu. This is where you specify the type of request (GET, POST, PUT, etc.) as indicated in our documentation for the endpoint you are using.
  4. Enter the URL:

    • In the URL field, type https://<your-datamasque-host>/<your-desired-endpoint>/.
    • Replace <your-datamasque-host> and <your-desired-endpoint> with the appropriate values provided in our documentation.
  5. Set Headers (For POST / PUT Requests):

    • If the request type is POST or PUT, you'll need to set some additional headers.
    • Go to the Headers tab just below the URL field.
    • Add two keys: Content-Type and Accept. For both, set the value to application/json, unless specified as a different value in that endpoint's documentation.
  6. Add Request Body (For POST/PUT Requests):
    Sometimes a request body is needed to send request parameters to the DataMasque API. One example of this is the API Token retrieval request. When making PUT or POST requests, check the endpoint's documentation listed on this page to see if a request body is needed.

    • Switch to the Body tab.
    • Select raw and choose JSON from the dropdown menu that appears.
    • Enter the JSON-formatted data as specified in our documentation.
  7. Authorization (If Required):

    • Some requests might require an authorization token.
    • In the Headers tab, add a new key Authorization.
    • Set its value to Token <your-api-token>, replacing <your-api-token> with your actual token.
  8. Send the Request:

    • Click the Send button to dispatch your request.
  9. Check the Response

    • In the Response section, which takes up half of the My Workspace window in Postman, look at the returned response body under the Body tab.
    • If the response body is empty, check the Status header to the right of the response section.
    • There should be a numerical code, e.g. 200, followed by some text. If the text is not OK and the status code is 400 or higher, check the JSON object returned in the Body tab for error details.
    • If the response body contains no error details, check whether the status code is listed in the Common Failure Responses section of this documentation. The details provided there should give more insight into what may have caused the error.
    • If the status code is not listed there, refer to the endpoint's section of this documentation for any endpoint-specific error codes and their details.

An Overview of Making Requests With curl

Please note that this documentation is based on curl version 8.50.0. Future or previous versions of curl may have slightly different options or behaviors.

  1. Open Your Command Line Interface:

    • Access your terminal on macOS or Linux, or your command prompt or PowerShell on Windows.
  2. Construct Your Request:

    • Start by typing curl. This is the basic command to use the curl tool.
    • Specify the request method by adding the -X flag followed by the method (POST, PUT, etc.) as indicated in our documentation for the endpoint you are using. (If the request method is GET, you can omit -X GET, as GET is the default.)
  3. Enter the URL:

    • After -X <METHOD>, type the full URL of the API endpoint you wish to call.
  4. Set Headers:

    • For sending headers with your request, use the -H flag followed by the header in quotes.
    • For most requests, especially POST or PUT, set the Content-Type and Accept headers to application/json, unless specified as a different value in that endpoint's documentation.
  5. Add Request Body (For POST/PUT Requests):

    • When the request type is POST or PUT, and you need to send data, use the -d option followed by the data in JSON format.
    • Ensure you've checked the endpoint's documentation to verify if a request body is needed.
  6. Include Authorization Token (If Required):

    • If an endpoint requires an authorization token, add it to the headers using the -H flag.
    • For example, -H "Authorization: Token <your-api-token>", replacing <your-api-token> with the actual token you received from the token retrieval endpoint.
  7. Send the Request:

    • After constructing the full curl command with all the necessary flags and data, press Enter to send the request.
  8. Check the Response:

    • The response will be printed directly in the terminal. Review the response body and headers to ensure your request was successful.

Quickstart generic example with curl

curl -X <GET | POST | PUT | DELETE> "https://<your-datamasque-host>/<your-desired-endpoint>/" \
     -H "Authorization: Token <your-api-token>" \
     -H "<your-additional-headers>"

Common Responses

Every HTTP response includes a status code and may include details about whether the request succeeded.

Most successful responses have a status code in the range 200 to 299. The expected success status codes and their descriptions are listed for each endpoint in this documentation.

When a request is not successful, the response usually carries one of the common failure status codes listed in the following section.

Some endpoints return status codes other than those listed there; these are documented in the relevant endpoint sections.

Common Failure Responses

Status Code Description
400 Bad Request: Check the JSON body and response for more information.
401 Unauthorized: Token required.
403 Forbidden: User is not allowed.
404 Not Found.
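
For scripted use, it can help to read the status code separately from the response body. The following is a minimal sketch and not part of DataMasque: describe_status and get_status are hypothetical helper names, and the hints simply restate the table above. It relies on curl's -s, -o and -w '%{http_code}' options.

```shell
# Hypothetical helper: map a status code to a hint based on the table above.
describe_status() {
  case "$1" in
    2[0-9][0-9]) echo "Success" ;;
    400) echo "Bad Request: check the JSON body and response" ;;
    401) echo "Unauthorized: token required" ;;
    403) echo "Forbidden: user is not allowed" ;;
    404) echo "Not Found" ;;
    *)   echo "See the endpoint's documentation for status $1" ;;
  esac
}

# Hypothetical helper: send a GET request and print only the status code.
# $1 = full URL, $2 = token
# -s silences progress output, -o /dev/null discards the body,
# -w '%{http_code}' prints the status code once the transfer ends.
get_status() {
  curl -s -o /dev/null -w '%{http_code}' \
       -H "Authorization: Token $2" "$1"
}

# Usage (replace the placeholders with real values):
#   code=$(get_status "https://<your-datamasque-host>/api/runs/" "<your-api-token>")
#   describe_status "$code"
```

Keeping the lookup separate from the request means the same hints can be reused for any endpoint.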

Authentication

The DataMasque API uses token authentication. Tokens are 40-character strings containing 0-9 and a-f. Tokens should be included in the Authorization HTTP header for each request, with the word Token prepended.

For example:

GET /api/runs/123/
Authorization: Token abcdef1234567890abcdef1234567890abcdef12
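
A script can check that a token has the expected shape before sending it. The sketch below is an illustration only: is_valid_token and auth_header are hypothetical helper names, and the check covers just the format described above (40 characters, each 0-9 or a-f), not whether the server will accept the token.

```shell
# Return 0 if the argument is exactly 40 characters, each one 0-9 or a-f.
is_valid_token() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Print the Authorization header line for a token, with "Token" prepended.
auth_header() {
  printf 'Authorization: Token %s' "$1"
}

# Usage (replace the placeholders with real values):
#   is_valid_token "<your-api-token>" || echo "Token format looks wrong" >&2
#   curl "https://<your-datamasque-host>/api/runs/" -H "$(auth_header "<your-api-token>")"
```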

There are two types of authentication tokens:

  1. A non-expiring API Token which has access to only some endpoints. You can get this token from the My Account page.
  2. A User Token that is valid for only 12 hours, but has access to all endpoints. User tokens are granted by posting your username and password to the /api/auth/token/login/ endpoint.

The documentation for each endpoint on this page includes the type of token that is required to access it. If an endpoint does not require the use of the Authorization header then its authorization is noted as Anonymous.

The purpose and use case of each token type is explained below.

API Token

The API Token is a long-lived credential retrieved from the My Account page. It remains valid indefinitely, unless revoked (also on the My Account page). This token is valid only for use with specific API endpoints.

It is designed for use in automated scripts whose content may not be stored securely, so its access is mainly limited to controlling masking runs and checking their status.

User Token

The User Token is exclusively issued after a successful login, either through the user interface or by making a request to /api/auth/token/login/.

This token offers enhanced security due to its limited lifetime, expiring after 12 hours, and is only accessible after a successful login. When accessing DataMasque through the UI, the token is granted as a cookie which will expire after 1 hour of inactivity.

It can be used against all API endpoints, and grants access based on the user account's permissions.

Both token types serve distinct purposes within the DataMasque API, offering a balance between security and usability.

POST /api/auth/token/login/

Authorization: Anonymous.

Login with a username and password to obtain a user_token.

POST /api/auth/token/login/ Parameters

Field Type Required Location Description
username string Yes Request Body The username of the user you are logging in as.
password string Yes Request Body The password for the user.

POST /api/auth/token/login/ Responses

Status Code Description
200 A JSON serialised user object, including a short-lived API key.

POST /api/auth/token/login/ Postman example

  1. Open Postman.
  2. Create a new request.
  3. Set the method to POST and the URL to https://<your-datamasque-host>/api/auth/token/login/.
  4. Under Headers, add Content-Type as a key and set the value as application/json.
  5. Select the Body tab then the raw button.
  6. Include your DataMasque login details in this format in the text editor shown:
{
  "username": "<your-username>",
  "password": "<your-password>"
}
  7. Press the blue Send button to the right of the URL bar.

POST /api/auth/token/login/ curl example

curl -X POST "https://<your-datamasque-host>/api/auth/token/login/" \
     -H "Content-Type: application/json" \
     -d '{"username": "<your-username>", "password": "<your-password>"}'

User Object

User objects have the following fields:

Field Type Description
id integer The id of the User.
username string The username for the User. Used when logging in.
email string The email of the User.
date_joined date The date the User was created.
api_token string The API token for the User.
has_temporary_password boolean Whether user has a temporary password or not. If true, the user has not finalised their account creation.
is_active boolean Whether or not the user account is active. If false, the account is disabled.
is_staff boolean Whether or not the user is a staff account.
is_superuser boolean Whether or not the account is a superuser and has admin privileges.
is_sso_user boolean Whether or not the account is an SSO enabled account.
is_subscribed_to_sdd_updates boolean Whether or not the user has subscribed to sensitive data discovery updates.

GET /api/users/

Authorization: User token only.

Returns a list of user accounts.

GET /api/users/ Parameters

No parameters.

GET /api/users/ Responses

Status Code Description
200 Returns a JSON serialised list of User objects.

GET /api/users/ curl example

curl "https://<your-datamasque-host>/api/users/" \
     -H "Authorization: Token <your-api-token>"

GET /api/users/me/

Authorization: User token only.

Returns the details of the currently logged-in user.

GET /api/users/me/ Responses

Status Code Description
200 Returns a JSON serialised User object for the user that is currently logged in.

GET /api/users/me/ curl example

curl "https://<your-datamasque-host>/api/users/me/" \
     -H "Authorization: Token <your-api-token>"

Run Object

Run objects have the following fields:

Field Type Description
id integer The id of the Run. Use this in API URLs that need a run id.
name string The name of the Run.
status string The status of the Run. The potential values are: queued, running, finished, failed, cancelling, and cancelled. A status of finished indicates a run completed successfully; failed indicates an error.
mask_type string The masking type of the Run, valid options are "database" or "file".
connection string Deprecated, replaced by source_connection.
connection_name string Deprecated, replaced by source_connection_name.
source_connection string A UUID identifying the source connection used for this Run. For database connections, the source_connection also acts as the destination.
source_connection_name string The name of the source connection of the Run. For database connections, the source_connection also acts as the destination.
destination_connection Optional[string] A UUID identifying the destination connection used for this Run. Only present for file connections, as the source_connection also acts as the destination for database connections.
destination_connection_name Optional[string] The name of the destination connection of the Run. Only present for file connections, as the source_connection also acts as the destination for database connections.
ruleset string A UUID identifying the ruleset used for this Run.
ruleset_name string Ruleset name of the Run.
start_time string Start time of the Run, in ISO format.
end_time string End time of the Run, in ISO format.
options object An Option object of configuration for the Run.
run_hash string A hash of the id of the Run and the content of the ruleset used for masking.

GET /api/runs/

Authorization: User token or API token.

Get a list of DataMasque Runs.

GET /api/runs/ Parameters

Field Type Required Location Description
mask_type string No Query Parameter The mask type of the Run. The potential values are: database, file.
connection_ruleset_name string No Query Parameter The name of the source or destination connection name or the ruleset name of the Run.
run_status string No Query Parameter The status of the Run. The potential values are: queued, running, finished, failed, cancelling, and cancelled.
run_hash string No Query Parameter The hash of the Run.

GET /api/runs/ Responses

Status Code Description
200 A JSON serialised list of Run objects.

GET /api/runs/ curl example

curl "https://<your-datamasque-host>/api/runs/" \
     -H "Authorization: Token <your-api-token>"
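
The query parameters above can be passed with curl's -G and --data-urlencode options, which append URL-encoded pairs to the request URL. This is a sketch under assumptions: list_runs and is_run_status are hypothetical helper names, and the host and token arguments stand in for the placeholders used elsewhere on this page.

```shell
# Return 0 if the argument is one of the documented run_status values.
is_run_status() {
  case "$1" in
    queued|running|finished|failed|cancelling|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# List runs filtered by mask type and run status.
# $1 = host, $2 = token, $3 = mask_type (database or file), $4 = run_status
list_runs() {
  is_run_status "$4" || { echo "Unknown run_status: $4" >&2; return 1; }
  # -G sends the --data-urlencode pairs as a query string on a GET request
  # instead of as a request body.
  curl -G "https://$1/api/runs/" \
       -H "Authorization: Token $2" \
       --data-urlencode "mask_type=$3" \
       --data-urlencode "run_status=$4"
}

# Usage (replace the placeholders with real values):
#   list_runs "<your-datamasque-host>" "<your-api-token>" database finished
```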

POST /api/runs/

Authorization: User token or API token.

Start a new masking run.

POST /api/runs/ Parameters

Field Type Required Location Description
name string Yes Request Body The name of the Run.
connection string No Request Body Deprecated, replaced by source_connection.
source_connection string Yes Request Body A UUID identifying the source connection to be used for this Run. For database connections, the source_connection also acts as the destination.
destination_connection string Required only for runs on file connections. Request Body A UUID identifying the connection to be used for this Run.
ruleset string Yes Request Body A UUID identifying the ruleset to be used for this Run.
options object Yes Request Body An Option object of configuration for this Run.

POST /api/runs/ Responses

Status Code Description
200 A JSON serialised Run object.

POST /api/runs/ curl example

# Include "destination_connection" only for runs on file connections.
# Replace <option-object> with a JSON Option object (see Option Object).
curl -X POST "https://<your-datamasque-host>/api/runs/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "name": "<run-name>",
           "source_connection": "<source-connection-uuid>",
           "destination_connection": "<destination-connection-uuid>",
           "ruleset": "<ruleset-uuid>",
           "options": <option-object>
         }'

GET /api/runs/{id}/

Authorization: User token or API token.

Retrieve information about a masking run.

GET /api/runs/{id}/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

GET /api/runs/{id}/ Responses

Status Code Description
200 A JSON serialised Run object.

GET /api/runs/{id}/ curl example

curl "https://<your-datamasque-host>/api/runs/{id}/" \
     -H "Authorization: Token <your-api-token>"
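
Because a run moves through several statuses, a script will usually poll this endpoint until the run stops. The sketch below is one possible approach, not an official client. It assumes that finished, failed and cancelled are the statuses a run never leaves, and it uses a rough sed expression in place of a real JSON parser such as jq. The names is_terminal_status, extract_status and wait_for_run are hypothetical.

```shell
# Assumption: these three statuses are final; queued, running and
# cancelling are treated as still in progress.
is_terminal_status() {
  case "$1" in
    finished|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Pull the value of the "status" field out of a Run object.
# A rough stand-in for a JSON parser; prefer jq where it is available.
extract_status() {
  printf '%s' "$1" | tr -d '\n' |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll a run until it reaches a final status.
# $1 = host, $2 = token, $3 = run id, $4 = seconds between polls (default 10)
# Returns 0 if the run finished, 1 if it failed or was cancelled,
# 2 if the request failed or no status could be read.
wait_for_run() {
  while :; do
    body=$(curl -s "https://$1/api/runs/$3/" -H "Authorization: Token $2") || return 2
    status=$(extract_status "$body")
    [ -n "$status" ] || return 2
    echo "Run $3 status: $status"
    if is_terminal_status "$status"; then
      [ "$status" = "finished" ] && return 0
      return 1
    fi
    sleep "${4:-10}"
  done
}

# Usage (replace the placeholders with real values):
#   wait_for_run "<your-datamasque-host>" "<your-api-token>" 123 30
```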

POST /api/runs/{id}/cancel/

Authorization: User token or API token.

Cancel a masking run.

POST /api/runs/{id}/cancel/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

POST /api/runs/{id}/cancel/ Responses

Status Code Description
201 Operation succeeded

POST /api/runs/{id}/cancel/ curl example

curl -X POST "https://<your-datamasque-host>/api/runs/{id}/cancel/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json"

GET /api/runs/{id}/sdd-report/

Authorization: User token only.

Download the SDD Report for a masking run.

GET /api/runs/{id}/sdd-report/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

GET /api/runs/{id}/sdd-report/ Responses

Status Code Description
200 The server will return the SDD Report in the response body which can be downloaded as a CSV file.
404 If there is no SDD Report for a run, the server will return 404 status code.

GET /api/runs/{id}/sdd-report/ curl example

curl "https://<your-datamasque-host>/api/runs/{id}/sdd-report/" \
     -H "Authorization: Token <your-api-token>"

GET /api/runs/{id}/run-report/

Authorization: User token only.

Download the Run Report for a masking run.

GET /api/runs/{id}/run-report/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

GET /api/runs/{id}/run-report/ Responses

Status Code Description
200 The server will return the Run Report in the response body which can be downloaded as a CSV file.
404 If there is no Run Report for a run, the server will return 404 status code.

GET /api/runs/{id}/run-report/ curl example

curl "https://<your-datamasque-host>/api/runs/{id}/run-report/" \
     -H "Authorization: Token <your-api-token>"
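
Both report endpoints return a file on success and 404 when no report exists, so a script may want to save the body only when the status is 200. The following is a hedged sketch: report_filename and download_report are hypothetical helpers, and the output file name is an arbitrary choice. It uses curl's -o and -w '%{http_code}' options.

```shell
# Hypothetical naming scheme for the saved file.
# $1 = run id, $2 = report kind (sdd-report or run-report)
report_filename() {
  printf 'run-%s-%s.csv' "$1" "$2"
}

# Download a report and keep the file only if the server answered 200.
# $1 = host, $2 = token, $3 = run id, $4 = report kind
download_report() {
  out=$(report_filename "$3" "$4")
  code=$(curl -s -o "$out" -w '%{http_code}' \
              -H "Authorization: Token $2" \
              "https://$1/api/runs/$3/$4/")
  if [ "$code" = "200" ]; then
    echo "Saved $out"
  else
    rm -f "$out"
    echo "No report downloaded (status $code)" >&2
    return 1
  fi
}

# Usage (replace the placeholders with real values):
#   download_report "<your-datamasque-host>" "<your-api-token>" 123 run-report
```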

Option Object

Option objects have the following fields:

Field Type Description
batch_size integer An argument to specify the number of rows to fetch in each batch retrieved from the database for masking. This is ignored for file masking.
dry_run boolean Indicates a dry run where no data in the database is actually changed. Values should either be true to indicate a dry run, or false to run normally. Default value is false. More information on dry runs is available in the Masking runs documentation.
max_rows integer A parameter to specify the maximum number of rows that will be masked by each mask_table task [1]. Defaults to no limit. This is ignored for file masking.
continue_on_failure boolean If there is a task failure, and this option is false, DataMasque will skip all remaining unstarted tasks. If this option is true, DataMasque will continue performing other tasks even if there is a task failure. Default value is false.
run_secret string The run secret is used in the random generation of masked values. If left unspecified, a random secret will be automatically generated and returned in the API response [2]. Masking runs performed on the same DataMasque instance with the same run secret will produce the same masked values for identical unmasked database inputs. You should only specify a run secret if you require consistent masking across runs, otherwise it is more secure to allow a new run secret to be automatically generated for each run. Run secrets must be at least 20 characters long.
disable_instance_secret boolean If this option is set to true, DataMasque will exclude its instance-specific secret and generate masked values based solely on the run secret. You may wish to disable the instance secret in order to achieve consistent masking across DataMasque instances. However, by disabling the instance secret, any DataMasque instance using the same run_secret could replicate your data masking.
diagnostic_logging boolean If set to true, the run log will include information to help diagnose errors. This includes information about the tables, columns and keys being masked, memory usage information and more verbose output. Defaults to false.
buffer_size (deprecated; will be removed in release 3.0.0) integer Replaced by batch_size.

[1] max_rows does not apply to mask_unique_key tasks.

[2] The run_secret contained in the API response can be provided in subsequent API calls to start runs, facilitating consistent masking across those runs.
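
As an illustration of the fields above, the sketch below checks the 20-character minimum for run_secret and prints an Option object for a dry run. The helper names is_valid_run_secret and dry_run_options are hypothetical. The sketch assumes the secret contains no characters that need JSON escaping, and it is not confirmed here whether the API accepts an Option object that sets only these fields.

```shell
# Run secrets must be at least 20 characters long.
is_valid_run_secret() {
  [ "${#1}" -ge 20 ]
}

# Print an Option object for a dry run using a caller-supplied run secret.
# $1 = run secret (assumed to need no JSON escaping)
dry_run_options() {
  is_valid_run_secret "$1" || { echo "run_secret is too short" >&2; return 1; }
  printf '{"dry_run": true, "continue_on_failure": false, "run_secret": "%s"}\n' "$1"
}

# Usage: the output can stand in for the "options" value of a POST /api/runs/ body.
#   dry_run_options "<your-run-secret>"
```

Reusing the same run secret across runs trades some security for consistent masked values, as described in the run_secret row above.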

Additionally, the following options apply to schema discovery runs (i.e. runs that include at least one run_schema_discovery task):

Field Type Description
custom_keywords list[string] List of keywords used to flag sensitive columns: a column whose name matches one or more of the keywords is treated as containing sensitive data. Default value is an empty list.
ignored_keywords list[string] List of keywords used to exclude columns: a column whose name matches one or more of the keywords is excluded from the schema discovery results. Default value is an empty list.
disable_global_custom_keywords boolean If set to true, then the user-defined global set of custom keywords will not be used to flag columns as sensitive. Default value is false.
disable_global_ignored_keywords boolean If set to true, then the user-defined global set of ignored keywords will not be used to exclude columns from the schema discovery results. Default value is false.
disable_built_in_keywords boolean If set to true, then DataMasque's built-in list of keywords will not be used to flag columns as sensitive. Default value is false.
schemas list[string] List of schema (database for MySQL/MariaDB) names against which to perform schema discovery. Default value is an empty list, meaning schema discovery will run against the schema configured on the database connection, or the database user's default schema.

Runlog Object

Runlog objects have the following fields:

Field Type Description
run integer ID of the Run this Runlog was generated for.
worker_id string ID of the masking worker that generated this Runlog.
timestamp string Timestamp of this Runlog's generation, in ISO format.
message string The log message passed from the masking worker.
args string Arguments passed to the Run task.
run_status string Indicates the Run status. The potential values are: queued, running, finished, failed, cancelling, and cancelled. A status of finished indicates the Run completed successfully; failed indicates an error.
is_dry_run boolean Indicates whether the Run is a dry run.

GET /api/runs/{id}/log/

Authorization: User token or API token.

List all logs for a specified Run.

GET /api/runs/{id}/log/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

GET /api/runs/{id}/log/ Responses

Status Code Description
200 A JSON serialised list of Runlog objects.

GET /api/runs/{id}/log/ curl example

curl "https://<your-datamasque-host>/api/runs/{id}/log/" \
     -H "Authorization: Token <your-api-token>"

Connection Object

Database Connection objects have the following fields:

Field Type Description
version string The connection version. This should be set to "1.0".
id integer The id of the Connection. Use this in API URLs that need a connection id.
name string The name of the Connection.
user string The name of the user in the database connection.
db_type string The type of database the connection is connecting to.
database string The database the connection is connecting to.
host string The hostname of the database connection.
port integer The database port being connected through.
dbpassword string The password for the user connecting to the database.
schema string The schema of the database to connect to.
options object An Option object of configuration for the Run
service_name string The service name for the connection. Only used for Oracle. (Optional)
connection_fileset string The connection fileset attached to this connection. Currently only used for MySQL and MariaDB. (Optional)
mask_type string The type of masking the connection can perform, only database or file are valid. (Optional) Should be set to database for database Connections.
last_discovery_run_date string The created_time of the last run on this connection including a run_schema_discovery task, or null if no such run has been performed.
last_discovery_run_id string The ID of the last run on this connection including a run_schema_discovery task, or null if no such run has been performed.
is_read_only boolean Whether or not the connection to the database is read-only.

File Connection objects have the following fields:

Field Type Description
version string The connection version. This should be set to "1.0".
id integer The id of the Connection. Use this in API URLs that need a connection id.
name string The name of the Connection.
type string The type of file system the connection is connecting to. Valid options are "s3_connection", "azure_blob_connection" or "mounted_share_connection".
base_directory string The root file path where files intended to be masked are stored.
bucket string The name of the S3 bucket containing the base_directory. Only for S3 Connections.
container string The name of the Azure Blob Storage container containing the base_directory. Only for Azure Blob Connections.
connection_string string The connection string configured with the authorization information to access data in your Azure Storage account. Only for Azure Blob Connections.
mask_type string The type of masking the connection can perform, only database or file are valid. (Optional) Should be set to file for file Connections.
is_file_mask_source boolean Whether the connection is a source Connection for file masking. (Optional) Defaults to false if not provided.
is_file_mask_destination boolean Whether the connection is a destination Connection for file masking. (Optional) Defaults to false if not provided.

GET /api/connections/

Authorization: User token only.

Get a list of all DataMasque connections.

Optionally, you can add an {id} to the end of the request to only return the details of the connection with that specific id.

GET /api/connections/ Parameters

No parameters are required. Optionally, append the id of a connection to the URL to return only that connection.

GET /api/connections/ Responses

Status Code Description
200 A JSON serialised list of Connection objects, or a single Connection object if an id is included in the URL.

GET /api/connections/ curl example

curl "https://<your-datamasque-host>/api/connections/" \
     -H "Authorization: Token <your-api-token>"

POST /api/connections/

Authorization: User token only.

Create a new connection object.

POST /api/connections/ Parameters

Database Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to 1.0.
name string Yes Request Body The name of the Connection.
user string Yes Request Body The name of the user in the database connection.
db_type string Yes Request Body The type of database the connection is connecting to.
database string Yes Request Body The database the connection is connecting to.
host string Yes Request Body The hostname of the database connection.
port integer Yes Request Body The database port being connected through.
dbpassword string Yes Request Body The password for the user connecting to the database.
schema string Yes Request Body The schema of the database to connect to.
service_name string No Request Body The service name for the connection. Only applies to Oracle.
connection_fileset string No Request Body The connection fileset attached to this connection. Only applies to MySQL and MariaDB.
mask_type string No, defaults to database if not provided. Request Body The type of masking the connection can perform, only database or file are valid.
is_read_only boolean No, defaults to false if not provided. Request Body Whether or not the connection to the database is read-only.

File Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to "1.0".
name string Yes Request Body The name of the Connection.
type string Yes Request Body The type of file system the connection is connecting to. Valid options are "s3_connection", "azure_blob_connection" or "mounted_share_connection".
base_directory string Yes Request Body The root file path where files intended to be masked are stored.
bucket string Required only for S3 Connections. Request Body The name of the S3 bucket containing the base_directory.
container string Required only for Azure Blob Connections. Request Body The name of the Azure Blob Storage container containing the base_directory.
connection_string string Required only for Azure Blob Connections. Request Body The connection string configured with the authorization information to access data in your Azure Storage account.
mask_type string No, defaults to database if not provided. Request Body The type of masking the connection can perform, only database or file are valid.
is_file_mask_source boolean No, defaults to false if not provided. Request Body Whether the connection is a source Connection for file masking.
is_file_mask_destination boolean No, defaults to false if not provided. Request Body Whether the connection is a destination Connection for file masking.

POST /api/connections/ Responses

Status Code Description
201 A JSON serialised Connection object.

POST /api/connections/ curl example

curl -X POST "https://<your-datamasque-host>/api/connections/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "version": "1.0",
           "name": "<connection_name>",
           "user": "<database_user>",
           "db_type": "<database_type>",
           "database": "<database_name>",
           "host": "<database_host>",
           "port": <database_port>,
           "dbpassword": "<database_password>",
           "schema": "<database_schema>",
           "service_name": "<oracle_service_name>",
           "connection_fileset": "<connection_fileset>",
           "mask_type": "database"
         }'
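
The example above creates a database connection. For a file connection, the request body would follow the File Connections table instead. The sketch below builds such a body for an S3 connection. It is based only on the fields listed in that table: s3_connection_body is a hypothetical helper, the arguments are assumed to need no JSON escaping, and any further settings your environment needs are not covered.

```shell
# Print a request body for an S3 file connection used as a masking source.
# $1 = connection name, $2 = S3 bucket, $3 = base directory
s3_connection_body() {
  cat <<EOF
{
  "version": "1.0",
  "name": "$1",
  "type": "s3_connection",
  "base_directory": "$3",
  "bucket": "$2",
  "mask_type": "file",
  "is_file_mask_source": true
}
EOF
}

# Usage (replace the placeholders with real values):
#   curl -X POST "https://<your-datamasque-host>/api/connections/" \
#        -H "Authorization: Token <your-api-token>" \
#        -H "Content-Type: application/json" \
#        -d "$(s3_connection_body "<connection_name>" "<bucket_name>" "<base_directory>")"
```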

PUT /api/connections/{id}/

Authorization: User token only.

Update the connection with the specified id.

PUT /api/connections/{id}/ Parameters

Database Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to 1.0.
name string Yes Request Body The name of the Connection.
user string Yes Request Body The name of the user in the database connection.
db_type string Yes Request Body The type of database the connection is connecting to.
database string Yes Request Body The database the connection is connecting to.
host string Yes Request Body The hostname of the database connection.
port integer Yes Request Body The database port being connected through.
dbpassword string Yes Request Body The password for the user connecting to the database.
schema string Yes Request Body The schema of the database to connect to.
service_name string No Request Body The service name for the connection. Only applies to Oracle.
connection_fileset string No Request Body The connection fileset attached to this connection. Only applies to MySQL and MariaDB.
mask_type string No, defaults to database if not provided. Request Body The type of masking the connection can perform, only database or file are valid.
is_read_only boolean No, defaults to false if not provided. Request Body Whether or not the connection to the database is read-only.

File Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to "1.0".
name string Yes Request Body The name of the Connection.
type string Yes Request Body The type of file system the connection is connecting to. Valid options are "s3_connection", "azure_blob_connection" or "mounted_share_connection".
base_directory string Yes Request Body The root file path where files intended to be masked are stored.
bucket string Required only for S3 Connections. Request Body The name of the S3 bucket containing the base_directory.
container string Required only for Azure Blob Connections. Request Body The name of the Azure Blob Storage container containing the base_directory.
connection_string string Required only for Azure Blob Connections. Request Body The connection string configured with the authorization information to access data in your Azure Storage account.
mask_type string No, defaults to database if not provided. Request Body The type of masking the connection can perform. Valid options are database or file.
is_file_mask_source boolean No, defaults to false if not provided. Request Body Whether the connection is a source Connection for file masking.
is_file_mask_destination boolean No, defaults to false if not provided. Request Body Whether the connection is a destination Connection for file masking.

PUT /api/connections/{id}/ Responses

Status Code Description
200 A JSON serialised Connection object with the updated values.

PUT /api/connections/{id}/ curl example

curl -X PUT "https://<your-datamasque-host>/api/connections/{id}/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "version": "1.0",
           "name": "<connection_name>",
           "user": "<database_user>",
           "db_type": "<database_type>",
           "database": "<database_name>",
           "host": "<database_host>",
           "port": <database_port>,
           "password": "<database_password>",
           "schema": "<database_schema>",
           "service_name": "<oracle_service_name>",
           "connection_fileset": "<connection_fileset>",
           "mask_type": "database"
         }'
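The curl example above updates a database connection. For a file connection, the request body follows the File Connections table instead. The following Python sketch assembles such a body and checks the type-specific required fields listed in that table. The helper name file_connection_body and all values are illustrative assumptions, not part of the DataMasque API.

```python
import json

# Extra fields each file connection type requires, per the File Connections table.
REQUIRED_BY_TYPE = {
    "s3_connection": ["bucket"],
    "azure_blob_connection": ["container", "connection_string"],
    "mounted_share_connection": [],
}


def file_connection_body(name, conn_type, base_directory, **extra):
    """Build the JSON body for a file connection, checking type-specific fields."""
    if conn_type not in REQUIRED_BY_TYPE:
        raise ValueError(f"unknown connection type: {conn_type}")
    missing = [f for f in REQUIRED_BY_TYPE[conn_type] if f not in extra]
    if missing:
        raise ValueError(f"{conn_type} requires: {', '.join(missing)}")
    return {
        "version": "1.0",
        "name": name,
        "type": conn_type,
        "base_directory": base_directory,
        "mask_type": "file",
        **extra,
    }


# Hypothetical values throughout.
body = file_connection_body(
    "example_file_source",
    "s3_connection",
    "incoming/",
    bucket="example-bucket",
    is_file_mask_source=True,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the request body of the PUT request in place of the database fields.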

DELETE /api/connections/{id}/

Authorization: User token only.

Delete the connection with the specified id.

DELETE /api/connections/{id}/ Parameters

No parameters.

DELETE /api/connections/{id}/ Responses

Status Code Description
204 Operation succeeded

DELETE /api/connections/{id}/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/connections/{id}/" \
     -H "Authorization: Token <your-api-token>"

POST /api/connections/test/

Authorization: User token only.

Test a connection to validate that it can connect successfully to the target database.

POST /api/connections/test/ Parameters

Database Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to 1.0.
name string Yes Request Body The name of the Connection.
user string Yes Request Body The name of the user in the database connection.
db_type string Yes Request Body The type of database the connection is connecting to.
database string Yes Request Body The database the connection is connecting to.
host string Yes Request Body The hostname of the database connection.
port integer Yes Request Body The database port being connected through.
dbpassword string Yes Request Body The password for the user connecting to the database.
schema string Yes Request Body The schema of the database to connect to.
service_name string No Request Body The service name for the connection. Only applies to Oracle.
connection_fileset string No Request Body The connection fileset attached to this connection. Only applies to MySQL and MariaDB.
is_read_only boolean No, defaults to false if not provided. Request Body Whether or not the connection to the database is read-only.

File Connections

Field Type Required Location Description
version string Yes Request Body The connection version. This should be set to 1.0.
name string Yes Request Body The name of the Connection.
type string Yes Request Body The type of file system the connection is connecting to. Valid options are "s3_connection", "azure_blob_connection" or "mounted_share_connection".
base_directory string Yes Request Body The root file path where files intended to be masked are stored.
bucket string Required only for S3 Connections. Request Body The name of the S3 bucket containing the base_directory.
container string Required only for Azure Blob Connections. Request Body The name of the Azure Blob Storage container containing the base_directory.
connection_string string Required only for Azure Blob Connections. Request Body The connection string configured with the authorization information to access data in your Azure Storage account.
mask_type string No, defaults to database if not provided. Request Body The type of masking the connection can perform. Valid options are database or file.
is_file_mask_source boolean No, defaults to false if not provided. Request Body Whether the connection is a source Connection for file masking.
is_file_mask_destination boolean No, defaults to false if not provided. Request Body Whether the connection is a destination Connection for file masking.

POST /api/connections/test/ Responses

Status Code Description
200 Operation succeeded

POST /api/connections/test/ curl example

curl -X POST "https://<your-datamasque-host>/api/connections/test/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "name": "<your-connection-name>",
           "user": "<your-connection-user>",
           "db_type": "oracle",
           "database": "<your-database>",
           "host": "<your-host>",
           "port": 1521,
           "dbpassword": "<your-password>",
           "schema": "<optional-schema>",
           "service_name": "<optional-service-name>",
           "connection_fileset": "<optional-connection-fileset>",
           "version": "1.0"
         }'

Connection Fileset Object

Connection Fileset objects have the following fields:

Field Type Description
id integer The id of the Connection Fileset. Use this in API URLs that need a connection_fileset id.
name string The name of the Connection Fileset.
database_type string The type of database the Connection Fileset is associated with (currently only mysql is supported; this will work with both MySQL and MariaDB connections).
zip_archive string The location of the Zip archive.

GET /api/connection-filesets/

Authorization: User token only.

Returns a list of Connection Filesets. These may be used to encrypt connections to MySQL and MariaDB databases.

GET /api/connection-filesets/ Parameters

No parameters.

GET /api/connection-filesets/ Responses

Status Code Description
200 A JSON serialised list of Connection Fileset objects.

GET /api/connection-filesets/ curl example

curl "https://<your-datamasque-host>/api/connection-filesets/" \
     -H "Authorization: Token <your-api-token>"

POST /api/connection-filesets/

Authorization: User token only.

Create a new Connection Fileset.

POST /api/connection-filesets/ Parameters

Field Type Required Location Description
name string Yes Form Field The name of the Connection Fileset.
database_type string Yes Form Field The type of database the Connection Fileset is associated with (currently only mysql is supported; this will work with both MySQL and MariaDB connections).
zip_archive file Yes Form Field The Zip archive file.

POST /api/connection-filesets/ Responses

Status Code Description
201 A JSON serialised object of the Connection Fileset that was created.

POST /api/connection-filesets/ curl example

curl -X POST "https://<your-datamasque-host>/api/connection-filesets/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "name=<fileset_name>" \
     -F "database_type=<database_type>" \
     -F "zip_archive=@</path/to/your/file.zip>"

PUT /api/connection-filesets/{id}/

Authorization: User token only.

Update a Connection Fileset.

PUT /api/connection-filesets/{id}/ Parameters

Field Type Required Location Description
name string Yes Form Field The name of the Connection Fileset.
database_type string Yes Form Field The type of database the Connection Fileset is associated with (currently only mysql is supported; this will work with both MySQL and MariaDB connections).
zip_archive file Yes Form Field The Zip archive file.

PUT /api/connection-filesets/{id}/ Responses

Status Code Description
200 A JSON serialised object of the Connection Fileset that was updated.

PUT /api/connection-filesets/{id}/ curl example

curl -X PUT "https://<your-datamasque-host>/api/connection-filesets/{id}/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "name=<fileset_name>" \
     -F "database_type=<database_type>" \
     -F "zip_archive=@</path/to/your/file.zip>"

DELETE /api/connection-filesets/{id}/

Authorization: User token only.

Deletes the Connection Fileset with the specified id. You may not delete a Connection Fileset that is associated with an existing connection.

DELETE /api/connection-filesets/{id}/ Parameters

No parameters.

DELETE /api/connection-filesets/{id}/ Responses

Status Code Description
204 Operation succeeded.

DELETE /api/connection-filesets/{id}/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/connection-filesets/{id}/" \
     -H "Authorization: Token <your-api-token>"

Ruleset Object

Ruleset objects have the following fields:

Field Type Description
id integer The id of the Ruleset. Use this in API URLs that need a ruleset id.
name string The name of the Ruleset.
config_yaml string The contents of the Ruleset, including all the masking rules.
is_valid boolean Whether or not the Ruleset is valid, and can be used for masking runs.
mask_type string The masking type of the Ruleset. This can be "database" or "file".

GET /api/rulesets/

Authorization: User token only.

Returns a list of all rulesets.

GET /api/rulesets/ Parameters

No parameters.

GET /api/rulesets/ Responses

Status Code Description
200 A JSON serialised list of Ruleset objects.

GET /api/rulesets/ curl example

curl "https://<your-datamasque-host>/api/rulesets/" \
     -H "Authorization: Token <your-api-token>"

GET /api/rulesets/{id}/

Returns the Ruleset with the specified id.

GET /api/rulesets/{id}/ Parameters

No parameters.

GET /api/rulesets/{id}/ Responses

Status Code Description
200 A JSON serialised Ruleset object.

GET /api/rulesets/{id}/ curl example

curl "https://<your-datamasque-host>/api/rulesets/{id}/" \
     -H "Authorization: Token <your-api-token>"

POST /api/rulesets/

Authorization: User token only.

Creates a new ruleset.

POST /api/rulesets/ Parameters

Field Type Required Location Description
name string Yes Request Body The name of the Ruleset.
config_yaml string Yes Request Body The YAML contents of the Ruleset.
mask_type string No Request Body The masking type of the Ruleset. Valid options are "database" or "file".

POST /api/rulesets/ Responses

Status Code Description
201 A JSON serialised Ruleset object.

POST /api/rulesets/ curl example

curl -X POST "https://<your-datamasque-host>/api/rulesets/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "name": "<your-new-name>",
           "config_yaml": "version: \"1.0\"\ntasks:\n  - type: run_data_discovery"
         }'
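The config_yaml value in the example above is a YAML document embedded in a JSON string, so its quotes and newlines must be escaped. As a minimal sketch, Python's json module can produce that escaping; the ruleset name used here is a placeholder.

```python
import json

# The same ruleset YAML that the curl example embeds as an escaped string.
ruleset_yaml = 'version: "1.0"\ntasks:\n  - type: run_data_discovery'

# json.dumps escapes the quotes and newlines so the YAML survives as a JSON string.
body = json.dumps({"name": "example_ruleset", "config_yaml": ruleset_yaml})
print(body)
```

Decoding the body with json.loads returns the original YAML text unchanged, which is a quick way to check a hand-escaped payload.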

PUT /api/rulesets/{id}/

Authorization: User token only.

Update an existing ruleset.

PUT /api/rulesets/{id}/ Parameters

Field Type Required Location Description
name string Yes Request Body The name of the Ruleset.
config_yaml string Yes Request Body The YAML contents of the Ruleset.
mask_type string No Request Body The masking type of the Ruleset. Valid options are "database" or "file".

PUT /api/rulesets/{id}/ Responses

Status Code Description
200 A JSON serialised Ruleset object with the updated values.

PUT /api/rulesets/{id}/ curl example

curl -X PUT "https://<your-datamasque-host>/api/rulesets/{id}/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "name": "<your-new-name>",
           "config_yaml": "version: \"1.0\"\ntasks:\n  - type: run_data_discovery"
         }'

DELETE /api/rulesets/{id}/

Authorization: User token only.

Deletes the ruleset with the specified id.

DELETE /api/rulesets/{id}/ Parameters

No parameters.

DELETE /api/rulesets/{id}/ Responses

Status Code Description
200 Operation succeeded

DELETE /api/rulesets/{id}/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/rulesets/{id}/" \
     -H "Authorization: Token <your-api-token>"

Seed Object

Field Type Description
id integer The id of the Seed.
name string The name of the Seed.
seed_file string The location of the Seed.
created date datetime The date that the Seed was uploaded.
filename string The file name of the uploaded Seed.

GET /api/seeds/

Authorization: User token only.

Get a list of all DataMasque seed files.

Optionally, append an {id} to the request URL to return only the details of the seed with that id.

GET /api/seeds/ Parameters

No parameters.

GET /api/seeds/ Responses

Status Code Description
200 A JSON serialised list of Seed objects.

GET /api/seeds/ curl example

curl "https://<your-datamasque-host>/api/seeds/" \
     -H "Authorization: Token <your-api-token>"

POST /api/seeds/

Authorization: User token only.

Create a new seed from a CSV file.

POST /api/seeds/ Parameters

Field Type Required Description
name string No The name of the CSV file.
description string No A description of the seed file to be displayed on the files menu.
seed_file file No The seed file.

POST /api/seeds/ Responses

Status Code Description
201 A JSON serialised Seed object.

POST /api/seeds/ curl example

curl -X POST "https://<your-datamasque-host>/api/seeds/" \
     -H "Authorization: Token <your-api-token>" \
     -F "name=<seed_name>" \
     -F "seed_file=@</path/to/your/seed_file.csv>"

Audit Log Object

Field Type Description
id integer The id of the audit log.
timestamp datetime The timestamp of when the audit log was created.
username string The username which created the audit log.
category string The category for the audit log, one of the following: auth, run, ruleset, or connection.
action string The action taken. One of the following: logged_in or logged_out for auth actions; started or cancelled for masking run actions; created, modified, or deleted for connection or ruleset actions.
description string A short description of what happened during the action.

Audit Log CSV

A CSV representation of the Audit Log Object.

The CSV file contains the following headers:

Field Type Description
timestamp datetime The timestamp of when the audit log was created.
username string The username which created the audit log.
category string The category for the audit log, one of the following: auth, run, ruleset, or connection.
action string The action taken. One of the following: logged_in or logged_out for auth actions; started or cancelled for masking run actions; created, modified, or deleted for connection or ruleset actions.
description string A short description of what happened during the action.
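Because the CSV uses the headers listed above, it can be read with any CSV parser. The following Python sketch counts audit log entries per category. The sample rows, including the timestamp format, are invented for illustration and do not come from a real export.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the documented column layout; real exports will differ.
sample_csv = (
    "timestamp,username,category,action,description\n"
    "2024-01-01T00:00:00Z,admin,auth,logged_in,User logged in\n"
    "2024-01-01T00:05:00Z,admin,ruleset,created,Created ruleset example\n"
    "2024-01-01T00:06:00Z,admin,run,started,Started masking run\n"
)

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Count audit log entries per category.
per_category = Counter(row["category"] for row in rows)
print(dict(per_category))
```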

GET /api/audit-logs/

Authorization: User token only.

Retrieve all Audit Logs.

GET /api/audit-logs/ Parameters

No parameters.

GET /api/audit-logs/ Response

Status Code Description
200 A JSON serialised list of Audit Log objects.

GET /api/audit-logs/ curl example

curl "https://<your-datamasque-host>/api/audit-logs/" \
     -H "Authorization: Token <your-api-token>"

GET /api/audit-logs/download/

Authorization: User token only.

Download all Audit Logs as a CSV file.

GET /api/audit-logs/download/ Parameters

No parameters.

GET /api/audit-logs/download/ Response

Status Code Description
200 The audit logs are returned in the response body in CSV format and can be saved as a CSV file.

GET /api/audit-logs/download/ curl example

curl -o "<your-downloads-path>/<your-download-name>.csv" -X GET "https://<your-datamasque-host>/api/audit-logs/download/" \
     -H "Authorization: Token <your-api-token>"

POST /api/generate-ruleset/

Authorization: User token only.

Returns a ruleset string for selected columns of a connection.

Prerequisite: Make sure you have the schema-discovery report for the connection specified in the POST data.

Generate Ruleset Result Object

Generate Ruleset Result objects have the following fields:

Field Type Description
id integer The id of the Generate Ruleset Result.
connection_id string The id of the Connection the ruleset was generated for.
generated_ruleset string The ruleset that has been generated.
status string The status of the ruleset generation.
error_message string The error message, if ruleset generation failed.

POST /api/generate-ruleset/[v1/|v2/] curl example

curl -X POST "https://<your-datamasque-host>/api/generate-ruleset/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "connection": "<your-connection-id>",
           "selected_columns": {
             "schema_name": {
               "table_name": [
                 "column_name_1",
                 "column_name_2"
               ]
             }
           }
         }'

POST /api/generate-ruleset/[v1/] Response

The default response for a version 1 request is a JSON-encoded string containing the ruleset YAML. The trailing /v1/ is optional for version 1.

POST /api/generate-ruleset/v2/ Response

The version 2 response is plain text containing the ruleset YAML.
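The difference between the two response versions matters when reading the body on the client side. As a rough sketch, a version 1 body needs JSON decoding while a version 2 body can be used as-is. The function name extract_ruleset_yaml and the sample ruleset content are assumptions made for this illustration.

```python
import json


def extract_ruleset_yaml(response_text, version=1):
    """Return the ruleset YAML from a generate-ruleset response body."""
    if version == 1:
        # Version 1: the body is a JSON-encoded string, so decode it.
        return json.loads(response_text)
    # Version 2: the body is already plain-text YAML.
    return response_text


yaml_text = 'version: "1.0"\ntasks: []'  # hypothetical ruleset content
v1_body = json.dumps(yaml_text)           # how a version 1 body would look
v2_body = yaml_text                       # how a version 2 body would look

print(extract_ruleset_yaml(v1_body, version=1) == extract_ruleset_yaml(v2_body, version=2))
```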

POST /api/generate-file-ruleset/

Authorization: User token only.

Returns a ruleset string for selected data of a file connection.

The selected data is a list of file groups, each of which contains:

  • A list of files which are the full paths relative to the base directory of the connection.
  • A list of locators, which are either JSON locators or strings containing a single header column name. JSON locators must be formatted as lists even if they consist of a single element.

Each file group will generate at least one task in the ruleset (either mask_file or mask_tabular_file).

Generally, only one task will be generated per file group, but in cases where files have different extensions, delimiters or encodings, multiple tasks will be generated to cater for these settings.

File groups should only contain files of the same type; that is, don't mix object files, multi-record files, and tabular files in the same file group. If multiple file types are mixed, the generated ruleset will attempt to split them into multiple tasks, but the results may be unexpected.

Prerequisite: A file data discovery run must have completed on the connection specified in the POST data, so that a file-discovery report exists from which the files can be selected.

POST /api/generate-file-ruleset/ curl example

curl -X POST "https://<your-datamasque-host>/api/generate-file-ruleset/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "connection": "<your-connection-id>",
           "selected_data": [
             {
               "files": ["file1.json", "file2.json"],
               "locators": [["age"], ["users", "*", "name"]]
             },
             {
               "files": ["file1.csv", "file2.csv"],
               "locators": ["gender", "address"]
             },
             [repeated for different file groups…]
           ]
         }'
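The selected_data structure can also be assembled programmatically. The sketch below builds the same two file groups in Python and checks that every locator is either a list (a JSON locator) or a string (a column name). The helper file_group is a hypothetical name introduced for this example.

```python
import json


def file_group(files, locators):
    """Build one file group; each locator must be a list (JSON) or a string (column)."""
    for locator in locators:
        if not isinstance(locator, (list, str)):
            raise TypeError(f"locator must be a list or a string, got {locator!r}")
    return {"files": list(files), "locators": list(locators)}


body = {
    "connection": "<your-connection-id>",
    "selected_data": [
        # JSON files: each locator is a list, even a single-element one.
        file_group(["file1.json", "file2.json"], [["age"], ["users", "*", "name"]]),
        # Tabular files: each locator is a header column name.
        file_group(["file1.csv", "file2.csv"], ["gender", "address"]),
    ],
}
print(json.dumps(body, indent=2))
```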

POST /api/generate-file-ruleset/ Response

The response is plain text containing the ruleset YAML.

GET /api/async-generate-ruleset/{connection_id}/

Authorization: User token only.

Returns the status and result of ruleset generation for the specified connection.

GET /api/async-generate-ruleset/{connection_id}/ Parameters

Field Type Required Location Description
connection_id string Yes URL Path The id of the Connection.

GET /api/async-generate-ruleset/{connection_id}/ Responses

Status Code Description
200 A JSON serialised Generate Ruleset Result Object.

GET /api/async-generate-ruleset/{connection_id}/ curl example

curl "https://<your-datamasque-host>/api/async-generate-ruleset/{connection_id}/" \
     -H "Authorization: Token <your-api-token>"

POST /api/async-generate-ruleset/{connection_id}/

Authorization: User token only.

Start generating a ruleset for selected columns of a database connection or for selected data of a file connection.

POST /api/async-generate-ruleset/{connection_id}/ Parameters

Field Type Required Location Description
connection_id string Yes URL Path The id of the Connection.

For generating rulesets on database connections:

POST /api/async-generate-ruleset/{connection_id}/ curl example

curl -X POST "https://<your-datamasque-host>/api/async-generate-ruleset/{connection_id}/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "selected_columns": {
             "schema_name": {
               "table_name": [
                 "column_name_1",
                 "column_name_2"
               ]
             }
           }
         }'

For generating rulesets for file connections:

POST /api/async-generate-ruleset/{connection_id}/ curl example

curl -X POST "https://<your-datamasque-host>/api/async-generate-ruleset/{connection_id}/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "selected_data": [
             {
               "files": ["file1.json", "file2.json"],
               "locators": [["age"], ["users", "*", "name"]]
             },
             {
               "files": ["file1.csv", "file2.csv"],
               "locators": ["gender", "address"]
             },
             [repeated for different file groups…]
           ]
         }'
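Since generation is started with POST and its result is read with GET on the same URL, a client will typically poll until the Generate Ruleset Result reports completion. The following Python sketch shows one way to structure such a loop. The possible values of the status field are not listed in this section, so the completion check is passed in by the caller, and the status strings in the demonstration are invented.

```python
import time


def poll(fetch_result, is_finished, interval_seconds=2.0, max_attempts=30):
    """Call fetch_result() until is_finished(result) is true or attempts run out."""
    for _ in range(max_attempts):
        result = fetch_result()
        if is_finished(result):
            return result
        time.sleep(interval_seconds)
    raise TimeoutError("ruleset generation did not finish in time")


# Stand-in for GET /api/async-generate-ruleset/{connection_id}/; the status
# strings here are invented for the demonstration, not documented values.
fake_responses = iter([
    {"status": "pending", "generated_ruleset": None},
    {"status": "done", "generated_ruleset": 'version: "1.0"'},
])

result = poll(
    fetch_result=lambda: next(fake_responses),
    is_finished=lambda r: r["status"] != "pending",
    interval_seconds=0,
)
print(result["generated_ruleset"])
```

In real use, fetch_result would issue the GET request and decode the JSON body, and is_finished would test for whichever status values your DataMasque version reports.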

POST /api/run-file-data-discovery/

Authorization: User token only.

Executes data discovery against files on a file connection. The file connection must already be configured. Use the UUID of the file connection in the request, which can be found:

  • at the top of the page when you view the connection in the DataMasque UI, or
  • in the URL when you view the connection in the DataMasque UI, or
  • in the id field of the Connection Object.

Discovery keywords

By default, DataMasque's extensive list of built-in keywords is used to identify which fields and attributes in the files are considered sensitive. DataMasque matches the name of the field or attribute against each keyword using a case-insensitive, partial match. For example, a field named credit_CARD_NUMBER will match the Credit card keyword.

You can use various options to refine the set of discovery keywords.

  • Setting disable_built_in_keywords to true means that the built-in keyword list linked above will not be used. In this case, the discovery process will use only the keywords given in custom_keywords and any configured global custom keywords.
  • The custom_keywords option allows you to specify a list of additional keywords to match on. Any fields or attributes whose name includes one or more of those keywords will be flagged as sensitive.
  • A match between a field or attribute's name and a value in the ignored_keywords list will cause a field or attribute to be completely excluded from the results, even if its name suggests that the field may contain sensitive data.
  • Global keywords, as configured through the Settings page of the DataMasque UI, are also considered unless disable_global_custom_keywords and/or disable_global_ignored_keywords (as appropriate) are set to true.

Warning! "Ignore" keywords have priority. If a field or attribute name matches both a built-in, global, or custom keyword and an entry in ignored_keywords or a global ignored keyword, the field or attribute will not be included in the discovery results.
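The priority rule can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not DataMasque's implementation: in particular, it assumes that separators such as underscores and spaces are disregarded when matching, which is one way the credit_CARD_NUMBER example could match the Credit card keyword.

```python
import re


def _normalise(text):
    # Assumption: lower-case and drop non-alphanumeric characters so that
    # "credit_CARD_NUMBER" can partially match "Credit card".
    return re.sub(r"[^a-z0-9]", "", text.lower())


def is_flagged(field_name, keywords, ignored_keywords=()):
    """Return True if the field matches a keyword and no ignored keyword."""
    name = _normalise(field_name)
    if any(_normalise(k) in name for k in ignored_keywords):
        return False  # "ignore" keywords take priority
    return any(_normalise(k) in name for k in keywords)


print(is_flagged("credit_CARD_NUMBER", ["Credit card"]))              # True
print(is_flagged("credit_CARD_NUMBER", ["Credit card"], ["number"]))  # False
```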

Specifying files to discover

Supported filetypes for discovery are:

  • JSON (.json)
  • NDJSON (.ndjson)
  • Parquet (.parquet)
  • CSV (.csv)

Note: Files' types are determined solely by the file extension, not by their content.

Use the include, skip and recurse options to control which files are included in the discovery process. These have the same syntax and meaning as in a from_file task definition. If none of these options are included, DataMasque will run discovery against all files (of the supported filetypes) in the base directory specified on the connection, but will not recurse into subdirectories.

See also Choosing files to mask with include/skip for an exact specification of the behaviour of, and some common examples of, include and skip rules.

Warning! If a file matches both an include and a skip rule, that file will not be included in data discovery.

Note: Take care to correctly escape backslashes in include or skip regexes. For example, if you want to match a literal dot (.) in a filename, the regex needs to escape the dot with a backslash and this backslash must itself be escaped as part of JSON encoding rules, since the request body is in JSON format. So you might use the JSON object {"regex": "file\\.[0-9]+\\.csv"}, representing the regex file\.[0-9]+\.csv which will match file.53.csv but not filex53.csv.
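The double escaping can be checked with a few lines of Python. This sketch encodes the regex from the note as JSON and then tests it against the two filenames. Python's re module stands in for the regex engine here, which is an assumption; DataMasque's matching may differ in details such as anchoring.

```python
import json
import re

pattern = r"file\.[0-9]+\.csv"  # the regex as the engine should see it

# Encoding to JSON doubles each backslash, giving the form used in the request body.
encoded = json.dumps({"regex": pattern})
print(encoded)  # {"regex": "file\\.[0-9]+\\.csv"}

# Decoding restores the original single-backslash regex.
decoded = json.loads(encoded)["regex"]
print(bool(re.search(decoded, "file.53.csv")))  # True
print(bool(re.search(decoded, "filex53.csv")))  # False
```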

Encoding of CSV files

The encoding option controls how DataMasque interprets CSV files. The default encoding is utf-8. Refer to Python Standard Encodings for a list of supported encodings.

Supported Parquet column types

The list of Parquet column data types supported by file discovery is the same as the list of supported data types for Parquet masking. See the list of supported data types here.

POST /api/run-file-data-discovery/ Parameters

Field Type Required Description
connection string Yes The id of the Connection.
custom_keywords list[string] No List of additional keywords. A field or attribute whose name matches one or more of them is flagged as containing sensitive data. Default value is an empty list.
ignored_keywords list[string] No List of keywords to ignore. A field or attribute whose name matches one or more of them is excluded from the discovery results. Default value is an empty list.
disable_global_custom_keywords boolean No If set to true, then the user-defined global set of custom keywords will not be used to flag fields or attributes as sensitive. Default value is false.
disable_global_ignored_keywords boolean No If set to true, then the user-defined global set of ignored keywords will not be used to exclude fields or attributes from the discovery results. Default value is false.
disable_built_in_keywords boolean No If set to true, then DataMasque's built-in list of keywords will not be used to flag fields or attributes as sensitive. Default value is false.
include list[object] No Files to discover, specified as glob or regex. Default value is an empty list, meaning everything will be included.
skip list[object] No Files to exclude, specified as glob or regex. Default value is an empty list, meaning nothing will be skipped.
recurse boolean No Whether to recurse into subdirectories of the base directory, or of items matched by include. Default value is false.
encoding string No File byte encoding. Only applies to CSV files. Default value is utf-8.

POST /api/run-file-data-discovery/ Responses

Data discovery runs asynchronously as a special type of masking run. This API endpoint returns a Run object which contains an id field. Use the GET /api/runs/{id}/ endpoint with this run ID to query the status of the data discovery process. To retrieve the file discovery results when the run is complete, use the GET /api/runs/{id}/file-discovery-results/ endpoint with this run ID.

Status Code Description
201 A JSON serialised Run object.

POST /api/run-file-data-discovery/ curl example

curl -X POST "https://<your-datamasque-host>/api/run-file-data-discovery/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "connection": "<your-connection-id>",
           "custom_keywords": ["id1", "id2"],
           "ignored_keywords": ["ignore1"],
           "include": [
             {"glob": "*.ndjson"},
             {"glob": "*.json"}
           ],
           "skip": [
             {"regex": "backup/staff[0-9]+\\.json"}
           ],
           "recurse": true
         }'
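For scripted use, the same request can be prepared in Python with the standard library. The sketch below builds the request object without sending it, so the URL, method and body can be inspected first. The function name build_discovery_request, the host, and the token are placeholders introduced for this example.

```python
import json
import urllib.request


def build_discovery_request(host, token, connection_id, **options):
    """Prepare (without sending) a POST to /api/run-file-data-discovery/."""
    body = {"connection": connection_id, **options}
    return urllib.request.Request(
        url=f"https://{host}/api/run-file-data-discovery/",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )


request = build_discovery_request(
    "datamasque.example.com",  # hypothetical host
    "example-token",           # hypothetical token
    "f795b7f1-d654-41c8-bb7c-db741d81dc19",
    include=[{"glob": "*.ndjson"}, {"glob": "*.json"}],
    skip=[{"regex": r"backup/staff[0-9]+\.json"}],
    recurse=True,
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen() would send it. Serialising the body with json.dumps takes care of backslash escaping in the regex.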

GET /api/runs/{id}/file-discovery-results/

Authorization: User token or API token.

Retrieve file discovery results.

GET /api/runs/{id}/file-discovery-results/ Parameters

Field Type Required Location Description
id integer Yes URL Path The id of the Run.

GET /api/runs/{id}/file-discovery-results/ Responses

Status Code Description
200 A JSON serialised list of File Discovery Result objects.

GET /api/runs/{id}/file-discovery-results/ curl example

curl "https://<your-datamasque-host>/api/runs/{id}/file-discovery-results/" \
     -H "Authorization: Token <your-api-token>"

GET /api/runs/{id}/file-discovery-results/ Example response

This shows a group of results where one file was discovered with a Metadata match on PassengerId, an In-Data match on Name, and no matches on Ticket.

[
  {
    "id": 1,
    "connection": {
      "id": "f795b7f1-d654-41c8-bb7c-db741d81dc19",
      "name": "example_file_source"
    },
    "file_type": "csv",
    "files": [
      {
        "path": "example.csv",
        "delimiter": ",",
        "encoding": "utf-8",
        "file_type": "csv"
      }
    ],
    "results": [
      {
        "locator": "PassengerId",
        "matches": [
          {
            "label": "identifiers",
            "categories": ["PII", "PHI"],
            "flagged_by": "Metadata Discovery",
            "description": "Identification"
          }
        ],
        "data_types": ["int"]
      },
      {
        "locator": "Name",
        "matches": [
          {
            "label": "name",
            "categories": ["PII", "PCI", "PHI"],
            "flagged_by": "In-Data Discovery",
            "description": "Full Names"
          }
        ],
        "data_types": ["str"]
      },
      {
        "locator": "Ticket",
        "matches": [],
        "data_types": ["str"]
      }
    ]
  }
]

File Discovery Result Object

File Discovery Result objects have the following fields:

Field Type Description
id integer The id of the File Discovery Result.
connection object The UUID and name identifying the connection used for this File Discovery Result.
file_type string The file type (csv, parquet, json, or ndjson). File Discovery Results are grouped by file type.
files list['object'] A list of File objects.
results list['object'] A list of Result objects.

File Object

File objects have the following fields:

Field Type Description
path string The discovered file's path, relative to the base directory of the connection.
file_type string The file type (csv, parquet, json, or ndjson).
delimiter Optional['string'] For delimited text files, the field separator, for example "," for CSV.
encoding Optional['string'] The file encoding, for example "utf-8".

Result Object

Result objects have the following fields:

Field Type Description
locator list['string' or 'int'] or string Either a JSON locator or a column name.
matches list['object'] A list of Match objects.
data_types list['string'] The list of data types found for this field: int, long, str, date, time, year, timestamp, boolean, float, or decimal.

Match Object

Match objects have the following fields:

Field Type Description
categories list['string'] A list of classifications for the flagged sensitive data: PII, PHI, PCI and/or Custom.
flagged_by string How the field or attribute was flagged as sensitive: Metadata Discovery (the standard keyword matching process) or In-Data Discovery.
description string The name of the rule which caused the column to be flagged for sensitive data.
label string Machine-readable representation of description.
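Taken together, these objects let a client work out which locators were flagged in which files. The Python sketch below does this for a trimmed copy of the example response shown earlier. It assumes that the results in a group apply to every file listed in that group. The function name flagged_locators is made up for this example.

```python
# Trimmed copy of the example response shown earlier in this section.
discovery_results = [
    {
        "file_type": "csv",
        "files": [{"path": "example.csv"}],
        "results": [
            {"locator": "PassengerId", "matches": [{"flagged_by": "Metadata Discovery"}]},
            {"locator": "Name", "matches": [{"flagged_by": "In-Data Discovery"}]},
            {"locator": "Ticket", "matches": []},
        ],
    }
]


def flagged_locators(results):
    """Map each file path to the locators that have at least one match."""
    summary = {}
    for group in results:
        flagged = [r["locator"] for r in group["results"] if r["matches"]]
        for file in group["files"]:
            summary[file["path"]] = flagged
    return summary


print(flagged_locators(discovery_results))
# {'example.csv': ['PassengerId', 'Name']}
```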

GET /api/oracle-wallets/

Authorization: User token only.

Returns a list of Oracle wallets. These are used to make encrypted connections to Oracle databases.

GET /api/oracle-wallets/ Parameters

No parameters.

GET /api/oracle-wallets/ Responses

Status Code Description
200 A JSON serialised list of Oracle wallets.

GET /api/oracle-wallets/ curl example

curl "https://<your-datamasque-host>/api/oracle-wallets/" \
     -H "Authorization: Token <your-api-token>"

POST /api/oracle-wallets/

Authorization: User token only.

Create a new Oracle wallet.

POST /api/oracle-wallets/ Parameters

Field Type Required Location Description
name string Yes Form Field The name of the Oracle Wallet.
zip_archive file Yes Form Field The Zip archive file.

POST /api/oracle-wallets/ Responses

Status Code Description
201 A JSON serialised Oracle wallet object of the wallet created.

POST /api/oracle-wallets/ curl example

curl -X POST "https://<your-datamasque-host>/api/oracle-wallets/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "name=<wallet_name>" \
     -F "zip_archive=@</path/to/your/file.zip>"

DELETE /api/oracle-wallets/{id}/

Authorization: User token only.

Delete the Oracle wallet with the specified id.

DELETE /api/oracle-wallets/{id}/ Parameters

No parameters.

DELETE /api/oracle-wallets/{id}/ Responses

Status Code Description
204 Operation succeeded.

DELETE /api/oracle-wallets/{id}/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/oracle-wallets/{id}/" \
     -H "Authorization: Token <your-api-token>"

Git Setting Object

Git settings are global for the DataMasque instance and can only be updated by an admin user. Git settings are updated on the Settings page in the DataMasque UI.

Git Setting objects have the following fields:

Field Type Description
git_repository_url string The URL of where the Git repository is hosted.
git_branch string The name of the Git branch from which DataMasque will push or pull.
git_directory_path string The directory that DataMasque will push and pull rulesets to, relative to the root of the repository. Note that DataMasque does not support pushing/pulling rulesets in subdirectories of this directory.

GET /api/git-setting/

Authorization: User token only.

Retrieve a Git Setting Object with information about the DataMasque instance's Git settings.

GET /api/git-setting/ Parameters

No parameters.

GET /api/git-setting/ Responses

Status Code Description
200 A JSON serialized Git Setting Object for the DataMasque instance.

GET /api/git-setting/ curl example

curl "https://<your-datamasque-host>/api/git-setting/" \
     -H "Authorization: Token <your-api-token>"

SSH Key Object

SSH Key objects have the following fields:

Field Type Description
name string The specified filename of the SSH Key file.
date_uploaded string The ISO 8601 datetime string of when the user uploaded the SSH key.

GET /api/git-ssh-key/

Authorization: User token only.

Retrieve an SSH Key Object for information about the current user's uploaded SSH Key.

GET /api/git-ssh-key/ Parameters

No parameters.

GET /api/git-ssh-key/ Responses

Status Code Description
200 A JSON serialized SSH Key Object which is the most recent SSH Key Upload for the user which made the request.

GET /api/git-ssh-key/ curl example

curl "https://<your-datamasque-host>/api/git-ssh-key/" \
     -H "Authorization: Token <your-api-token>"

PUT /api/git-ssh-key/

Authorization: User token only.

Upload an SSH Key to be used to access a Git remote repository.

Warning: A user may have only one SSH key at a time, so the existing key will be deleted and replaced with the uploaded key for the user making the request.

PUT /api/git-ssh-key/ Parameters

Field Type Required Location Description
key_file file Yes Form Field The SSH Key file.
name string Yes Form Field The name of the file.

PUT /api/git-ssh-key/ Responses

Status Code Description
200 A JSON serialized SSH Key Object, which is the most recent SSH Key Upload for the user making the request.

PUT /api/git-ssh-key/ curl example

curl -X PUT "https://<your-datamasque-host>/api/git-ssh-key/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "key_file=@</path/to/your/file>" \
     -F "name=<your-ssh-key-filename>"

DELETE /api/git-ssh-key/

Authorization: User token only.

Delete the current user's uploaded SSH key.

DELETE /api/git-ssh-key/ Parameters

No parameters.

DELETE /api/git-ssh-key/ Responses

Status Code Description
204 The SSH key associated with the requesting user has been deleted.

DELETE /api/git-ssh-key/ curl example

curl -X DELETE "https://<your-datamasque-host>/api/git-ssh-key/" \
     -H "Authorization: Token <your-api-token>"

GET /api/ruleset-git/

Authorization: User token only.

Pull the content of a specific ruleset given its commit ID. The current user's Git SSH key is used for authentication.

How File Paths Are Built

Internally, DataMasque generates the name of the file by appending .yml to ruleset_name. The file name is then appended to git_directory_path (from the DataMasque Git Settings) to build the full file path. For example, for a ruleset_name of My Ruleset and git_directory_path of masking/rulesets, the file masking/rulesets/My Ruleset.yml will be retrieved. The content returned is the file's content as of the specified commit ID.
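
The path rule above can be sketched in a few lines. This is only an illustration of the naming convention described here, not DataMasque's internal code; the function name `ruleset_git_path` is hypothetical.

```python
import posixpath


def ruleset_git_path(git_directory_path: str, ruleset_name: str) -> str:
    """Append .yml to the ruleset name, then join it to the Git directory path."""
    file_name = ruleset_name + ".yml"
    # Repository paths use forward slashes regardless of the local OS.
    return posixpath.join(git_directory_path, file_name)


print(ruleset_git_path("masking/rulesets", "My Ruleset"))
# prints: masking/rulesets/My Ruleset.yml
```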

GET /api/ruleset-git/ Parameters

Field Type Required Location Description
commit_id string Yes Query Parameter The Git commit ID for the ruleset.
ruleset_name string Yes Query Parameter The name of the ruleset. Used to build the path as per How File Paths Are Built above.

GET /api/ruleset-git/ Responses

Status Code Description
200 A JSON object with a single key, config_yaml, that contains the ruleset content.

GET /api/ruleset-git/ curl example

curl "https://<your-datamasque-host>/api/ruleset-git/?commit_id=<your-full-commit-id>&ruleset_name=<your-ruleset-name>" \
     -H "Authorization: Token <your-api-token>"
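
Ruleset names may contain spaces or other characters that are not valid in a URL, so the query string should be percent-encoded. The sketch below builds the request URL with Python's standard library. The function name and the host `datamasque.example.com` are hypothetical placeholders.

```python
from urllib.parse import quote, urlencode


def ruleset_git_url(host: str, commit_id: str, ruleset_name: str) -> str:
    """Build the GET /api/ruleset-git/ URL with percent-encoded query parameters."""
    query = urlencode(
        {"commit_id": commit_id, "ruleset_name": ruleset_name},
        quote_via=quote,  # encode spaces as %20 rather than +
    )
    return f"https://{host}/api/ruleset-git/?{query}"


print(ruleset_git_url("datamasque.example.com", "abc123", "My Ruleset"))
```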

POST /api/ruleset-git/

Authorization: User token only.

Commit then push changes upstream for a specific ruleset.

POST /api/ruleset-git/ Parameters

Field Type Required Location Description
commit_message string Yes Request Body The Git commit message for the ruleset changes.
ruleset_name string Yes Request Body The name of the ruleset. Used to build the path as per How File Paths Are Built above.
ruleset_content string Yes Request Body The YAML contents of the ruleset.

POST /api/ruleset-git/ Responses

Status Code Description
200 Operation succeeded.

POST /api/ruleset-git/ curl example

curl -X POST "https://<your-datamasque-host>/api/ruleset-git/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: application/json" \
     -d '{
           "commit_message": "Update ruleset",
           "ruleset_name": "<your-ruleset-name>",
           "ruleset_content": "version: \"1.0\"\ntasks:\n  - type: run_data_discovery"
         }'
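
Escaping YAML by hand inside a JSON string is error-prone, because every newline and double quote must be escaped. One way to avoid this is to let a JSON library serialize the request body. The sketch below does so with Python's `json` module; the function name `ruleset_push_body` is hypothetical, and the field names are those in the parameter table above.

```python
import json


def ruleset_push_body(ruleset_name: str, ruleset_content: str, commit_message: str) -> str:
    """Serialize the POST /api/ruleset-git/ request body as a JSON string."""
    return json.dumps(
        {
            "commit_message": commit_message,
            "ruleset_name": ruleset_name,
            "ruleset_content": ruleset_content,
        }
    )


# The same ruleset text as the curl example, written as ordinary multi-line YAML.
yaml_text = 'version: "1.0"\ntasks:\n  - type: run_data_discovery'
body = ruleset_push_body("My Ruleset", yaml_text, "Update ruleset")
print(body)
```

The resulting string can be sent as the request body with a Content-Type of application/json.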

GET /api/ruleset-git/files/

Authorization: User token only.

This endpoint lists the git_directory_path in the remote repository configured for the DataMasque instance. It considers any files ending in .yml to be ruleset files, and will fetch the list of commits for each of them. It does not enter into subdirectories of git_directory_path.

GET /api/ruleset-git/files/ Parameters

No parameters.

GET /api/ruleset-git/files/ Responses

Example Response

The response is a JSON object in which each key is the name of a file with a .yml extension in the git_directory_path. Each file entry is an array of objects, each with a commit ID, commit date and commit message.

{
  "Ruleset1.yml": [
    {"commit": "f061s…46756", "date": "2024-01-10 12:31:45", "message": "Added Column"},
    {"commit": "64c18…1a279", "date": "2024-01-09 10:19:13", "message": "Removed Column"}
  ],
  "Another Ruleset.yml": [
    {"commit": "377f5…b32f4", "date": "2023-12-25 12:31:45", "message": "Update rule"}
  ]
}

Response Codes

Status Code Description
200 A JSON serialized list of ruleset names and their associated Git commit history.

GET /api/ruleset-git/files/ curl example

curl "https://<your-datamasque-host>/api/ruleset-git/files/" \
     -H "Authorization: Token <your-api-token>"
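
A common use of this response is to find the most recent commit for each ruleset, which can then be passed to GET /api/ruleset-git/. The sketch below assumes the date format shown in the example response (YYYY-MM-DD HH:MM:SS) and does not rely on the order of the commit list. The function name and the shortened commit IDs are hypothetical.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed from the example response


def latest_commits(files_response: dict[str, list[dict[str, str]]]) -> dict[str, str]:
    """Map each ruleset name (file name without .yml) to its newest commit ID."""
    latest = {}
    for file_name, commits in files_response.items():
        if not commits:
            continue
        newest = max(commits, key=lambda c: datetime.strptime(c["date"], DATE_FORMAT))
        latest[file_name.removesuffix(".yml")] = newest["commit"]
    return latest


# Hypothetical response with placeholder commit IDs.
response = {
    "Ruleset1.yml": [
        {"commit": "aaa111", "date": "2024-01-10 12:31:45", "message": "Added Column"},
        {"commit": "bbb222", "date": "2024-01-09 10:19:13", "message": "Removed Column"},
    ],
    "Another Ruleset.yml": [
        {"commit": "ccc333", "date": "2023-12-25 12:31:45", "message": "Update rule"},
    ],
}

print(latest_commits(response))
```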

Exporting DataMasque Configuration

To keep a backup of the data stored in DataMasque, you can export it to a Zip file. This is done by making a GET request to /api/export/v1/. Optionally, you can also specify the export_type query parameter to select which data to include in the export. The parameter may be specified multiple times to specify different types of data to include in the same Zip file.

The Zip file has the following structure. Some files or directories may be absent if the selected export_type excluded them from the export.

Path Type Description
manifest.json File A JSON file containing metadata about the export and other files in the Zip.
rulesets/database/ Directory A directory containing database masking rulesets in YAML format.
rulesets/file/ Directory A directory containing file masking rulesets in YAML format.

Export Types

The following export types may be used to control the data included in the export archive.

Currently, only the export of rulesets is supported, so there is no difference between specifying rulesets as the export_type and omitting the export_type parameter completely.

Export Type Description
all Include all data described in this table. This is the default if no export_type is selected.
rulesets Include only rulesets.

manifest.json format

The manifest.json file contains the following information:

  • metadata: Metadata about the export archive.
    • version: The version format of the export file.
    • exported_at: The UTC date and time the export was created, in ISO format.
  • data: Information about the files included in the export archive.
    • rulesets: A list of metadata about the exported rulesets. Each object in the list contains the id, name and type (database or file) of one exported ruleset.
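
To show how the manifest might be inspected, the sketch below reads manifest.json from an export archive with Python's `zipfile` module. The function names are hypothetical, and the sample archive is built in memory purely for demonstration: its version value and ruleset ID are placeholders, not values DataMasque is known to produce.

```python
import io
import json
import zipfile
from typing import Any, Optional


def read_manifest(zip_bytes: bytes) -> Optional[dict[str, Any]]:
    """Return the parsed manifest.json, or None if the archive has no manifest."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        if "manifest.json" not in archive.namelist():
            return None
        return json.loads(archive.read("manifest.json"))


def build_sample_export(include_manifest: bool = True) -> bytes:
    """Build a small in-memory archive shaped like an export (placeholder values)."""
    manifest = {
        "metadata": {"version": "1", "exported_at": "2024-02-11T09:15:07+00:00"},
        "data": {
            "rulesets": [{"id": "placeholder-id", "name": "Ruleset 01", "type": "database"}]
        },
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        if include_manifest:
            archive.writestr("manifest.json", json.dumps(manifest))
        archive.writestr("rulesets/database/Ruleset 01.yml", 'version: "1.0"\n')
    return buffer.getvalue()


manifest = read_manifest(build_sample_export())
for ruleset in manifest["data"]["rulesets"]:
    print(ruleset["type"], ruleset["name"])
```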

Ruleset Export Naming

When rulesets are exported to a Zip archive, they are stored in either the rulesets/database/ directory (for database rulesets) or the rulesets/file/ directory (for file rulesets).

The name of the file is built by appending .yml to the ruleset name. For example:

  • The database masking ruleset named Ruleset 01 would be exported to rulesets/database/Ruleset 01.yml.
  • The file masking ruleset named Ruleset F would be exported to rulesets/file/Ruleset F.yml.

Note: Rulesets that have been deleted from DataMasque are not visible in the ruleset list in the DataMasque dashboard, but are still retained in the DataMasque database because runs reference them. These "archived" rulesets are not included in the Zip export.

GET /api/export/v1/

Authorization: User token only.

Export DataMasque data to a Zip archive in the Version 1 format. The filename of the archive will be based on the export type selected, and contain the current UTC date and time. For example: datamasque_export_rulesets_20240211-091507.zip.

GET /api/export/v1/ Parameters

Field Type Required Location Description
export_type string No Query Parameter The type of data to export (see Export Types for a full list). Defaults to all.

Multiple export types may be specified by using multiple export_type query parameters. For example, /api/export/v1/?export_type=type_a&export_type=type_b.
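
Repeating a query parameter is easy to get wrong when building URLs by hand. The sketch below shows one way to produce the repeated export_type parameters with Python's standard library. The function name `export_url` and the host are hypothetical, and type_a and type_b are the same illustrative values used above.

```python
from typing import Sequence
from urllib.parse import urlencode


def export_url(host: str, export_types: Sequence[str] = ()) -> str:
    """Build the export URL, repeating export_type once per requested type."""
    base = f"https://{host}/api/export/v1/"
    if not export_types:
        return base  # omitting export_type is documented to default to all
    query = urlencode([("export_type", export_type) for export_type in export_types])
    return f"{base}?{query}"


print(export_url("datamasque.example.com", ["type_a", "type_b"]))
```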

GET /api/export/v1/ curl example

When using curl, specify the -O flag to output the response to disk, and the -J flag to allow the response to specify the name (as per the example above).

curl "https://<your-datamasque-host>/api/export/v1/" \
     -H "Authorization: Token <your-api-token>" \
     -J -O

A Zip file named like datamasque_export_all_20240211-091507.zip will be saved to the current directory.

Importing DataMasque Configuration

A DataMasque export Zip can be imported to a DataMasque install using the /api/import/v1/ API endpoint.

For the best import experience, use a Zip that has been exported from DataMasque and contains a manifest.json file. However, a Zip with the correct folder structure may also be created manually, even if it is missing manifest.json. DataMasque will import the information, but automatic conflict resolution of duplicate rulesets will not work as well. The difference between including and excluding manifest.json is explained below.

Zip Exports From DataMasque With manifest.json

Since Zip exports created by DataMasque include the UUID of each exported item, this can be used to determine which items already exist.

When importing rulesets:

  • If a ruleset with a given ID exists during import:
    • If the ruleset is archived, then it will be restored and its name and content are updated with the imported ruleset.
    • If the ruleset is not archived, then no action is taken with that ruleset. The content in the DataMasque instance is unchanged.
  • If a ruleset is found with a matching name, and the contents are identical, then no action is taken. The content in the DataMasque instance is unchanged.
  • If a ruleset is found with a matching name, but the contents are different, then a new ruleset is created by appending Copy to the name. For example, if Ruleset A exists, then the content will be uploaded to a ruleset Ruleset A Copy. An incrementing number will be added until an unused name is found, for example, Copy 1, Copy 2, etc.
  • If no ruleset with the given ID or name exists, then it is created.

Because of these rules, imports of the same Zip archive may be repeated multiple times without duplicating content.

Zip Exports Created Without manifest.json

A Zip export archive may be created manually, provided the file structure is correct. That is, it matches the structure outlined in Exporting DataMasque Configuration. Without a manifest.json, the ID of rulesets is not known, so matching is done based on the name, using the following rules:

  • If a ruleset is found with a matching name, and the contents are identical, then no action is taken. The content in the DataMasque instance is unchanged.
  • If a ruleset is found with a matching name, but the contents are different, then a new ruleset is created by appending Copy to the name. For example, if Ruleset A exists, then the content will be uploaded to a ruleset Ruleset A Copy. An incrementing number will be added until an unused name is found, for example, Copy 1, Copy 2, etc.
  • If no ruleset with the given name exists, then it is created.

Because the IDs of the imported rulesets are not known, re-running an import without a manifest.json may result in duplicated rulesets with identical content.
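
The matching rules for imports with and without manifest.json can be summarised as a small decision function. The sketch below is a simplified model of the documented rules, not DataMasque's implementation. The names `Ruleset`, `next_copy_name` and `decide_import` are hypothetical, and two details are assumptions: that the incrementing number is appended after Copy (Ruleset A Copy 1), and that name matching considers all existing rulesets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ruleset:
    id: str
    name: str
    content: str
    archived: bool = False


def next_copy_name(name: str, taken: set[str]) -> str:
    """Return '<name> Copy', then '<name> Copy 1', '<name> Copy 2', ... until unused."""
    candidate = f"{name} Copy"
    number = 1
    while candidate in taken:
        candidate = f"{name} Copy {number}"
        number += 1
    return candidate


def decide_import(
    existing: list[Ruleset], name: str, content: str, exported_id: Optional[str] = None
) -> tuple[str, str]:
    """Return (status, imported_name) for one incoming ruleset."""
    # Rule 1: with a manifest.json, an existing ID takes precedence.
    by_id = {ruleset.id: ruleset for ruleset in existing}
    if exported_id is not None and exported_id in by_id:
        return ("RESTORED" if by_id[exported_id].archived else "NOT_CREATED", name)
    # Rules 2 and 3: fall back to matching by name.
    by_name = {ruleset.name: ruleset for ruleset in existing}
    if name in by_name:
        if by_name[name].content == content:
            return ("NOT_CREATED", name)
        return ("CREATED_DUPLICATE", next_copy_name(name, set(by_name)))
    # Rule 4: nothing matched, so the ruleset is created.
    return ("CREATED", name)


existing = [
    Ruleset("id-1", "Ruleset A", "v1"),
    Ruleset("id-2", "Ruleset E", "old", archived=True),
]
print(decide_import(existing, "Ruleset A", "v2"))
```

Under this model, repeating a manifest-less import of a changed Ruleset A first creates Ruleset A Copy and then Ruleset A Copy 1 with the same content, which is the duplication described above.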

POST /api/import/v1/

Authorization: User token only.

Import a DataMasque export Zip file. The response will contain a list of actions taken for each included object.

POST /api/import/v1/ Parameters

Field Type Required Location Description
zip_archive file Yes Form Field The exported Zip archive file.

POST /api/import/v1/ Responses

The response of an import request contains information about the resources that were imported, grouped by resource type. An example response is shown below.

{
  "data": {
    "rulesets": {
      "metadata": {"processed":  6, "created":  2, "restored": 1, "error":  1},
      "data": [
        {
          "exported_name": "Ruleset A", 
          "exported_id": "9d641e97-adf7-4f22-9089-afc3711bf222",
          "imported_name": "Ruleset A", 
          "imported_id": "9d641e97-adf7-4f22-9089-afc3711bf222",
          "ruleset_type": "database",
          "status": "NOT_CREATED", 
          "message": "A ruleset with ID \"9d641e97-adf7-4f22-9089-afc3711bf222\"  already exists, and was not changed."
        },
        {
          "exported_name": "Ruleset B", 
          "exported_id": null,
          "imported_name": "Ruleset B Copy", 
          "imported_id": "04ea20f0-ad4c-498e-881f-b0bc79d83ba7",
          "ruleset_type": "file",
          "status": "CREATED_DUPLICATE", 
          "message": "A ruleset named \"Ruleset B\" already exists, so ruleset \"Ruleset B Copy\" was created."
        },
        {
          "exported_name": "Ruleset C", 
          "exported_id": null,
          "imported_name": "Ruleset C", 
          "imported_id": "7d731d55-68c9-400e-a790-e052afe789cc",
          "ruleset_type": "database", 
          "status": "NOT_CREATED", 
          "message": "A ruleset named \"Ruleset C\" exists with identical content."
        },
        {
          "exported_name": "Ruleset D", 
          "exported_id": null,
          "imported_name": "Ruleset D", 
          "imported_id": "99eeffd3-3f65-4ed7-8ad1-a31a539b7b2c",
          "ruleset_type": "file",
          "status": "CREATED", 
          "message": "Ruleset named \"Ruleset D\" did not exist, and was created."
        },
        {
          "exported_name": "Ruleset E", 
          "exported_id": "c0f5b5bb-a2ce-4cea-9248-1b8ef6539a0e",
          "imported_name": "Ruleset E", 
          "imported_id": "c0f5b5bb-a2ce-4cea-9248-1b8ef6539a0e",
          "ruleset_type": "database",
          "status": "RESTORED", 
          "message": "An archived ruleset with ID \"c0f5b5bb-a2ce-4cea-9248-1b8ef6539a0e\" has been restored and overwritten with the new name and content."
        },
        {
          "exported_name": "Ruleset F", 
          "exported_id": "abc123",
          "imported_name": null, 
          "imported_id": null,
          "ruleset_type": "database",
          "status": "ERROR", 
          "message": "Import of ruleset with ID \"abc123\" due to error: invalid ID." 
        }
      ]
    }
  }
}

The metadata for each item type shows the number of items of that type processed, and how many of each one were created, restored or had an error.

Each data object contains information about the import of that item. The fields are:

  • exported_name: The name of the ruleset in the export Zip archive.
  • exported_id: The ID of the ruleset from the export Zip archive. Only available if a manifest.json file is present, otherwise this will be null.
  • imported_name: The name that the ruleset was imported to. Usually this will match exported_name. This will only be null on error. If the ruleset was not imported due to it already existing, this will still match exported_name.
  • imported_id: The ID that the ruleset was imported to. This will be generated if exported_id was null, otherwise it will be expected to match exported_id (even if the data was not changed). imported_id will be null on error.
  • ruleset_type: One of database or file.
  • status: The status of the import of this ruleset. One of:
    • NOT_CREATED: Ruleset was not created due to the ID existing or content being identical.
    • CREATED_DUPLICATE: A ruleset with that name existed, so it was imported with a new name (in imported_name).
    • CREATED: A ruleset with that ID or name did not exist, so was created.
    • RESTORED: An archived ruleset has been restored and overwritten with the new name and content from an imported ruleset.
    • ERROR: There was an error creating the ruleset. Check message for details.
  • message: A human-readable message describing the action taken or error that occurred. Messages may change between DataMasque versions, so they should not be relied on to determine the outcome of an import. Instead, refer to the status field.

The status code of the response, as shown in the table below, indicates whether any resources were created.

Status Code Description
200 The import was successful, indicating either no changes (e.g. the uploaded rulesets already existed) or the successful restoration of some rulesets.
201 The import was successful, and one or more rulesets were created.
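
Since messages may change between versions, a client would typically branch on the status field. The sketch below counts statuses and collects the names of rulesets that failed to import, using the response structure shown above. The function name `summarise_rulesets` is hypothetical and the sample is a trimmed version of the example response.

```python
from collections import Counter
from typing import Any


def summarise_rulesets(import_response: dict[str, Any]) -> tuple[dict[str, int], list[str]]:
    """Return (count per status, exported names of rulesets with status ERROR)."""
    items = import_response["data"]["rulesets"]["data"]
    counts = Counter(item["status"] for item in items)
    errors = [item["exported_name"] for item in items if item["status"] == "ERROR"]
    return dict(counts), errors


# Trimmed sample based on the example response; only the fields used here are kept.
sample_response = {
    "data": {
        "rulesets": {
            "metadata": {"processed": 3, "created": 1, "restored": 0, "error": 1},
            "data": [
                {"exported_name": "Ruleset C", "status": "NOT_CREATED"},
                {"exported_name": "Ruleset D", "status": "CREATED"},
                {"exported_name": "Ruleset F", "status": "ERROR"},
            ],
        }
    }
}

counts, errors = summarise_rulesets(sample_response)
print(counts, errors)
```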

POST /api/import/v1/ curl example

curl -X POST "https://<your-datamasque-host>/api/import/v1/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "zip_archive=@</path/to/your/datamasque_export_all_20240211-091507.zip>"

POST /api/users/admin-install/

Authorization: Anonymous, only when no user has been created.

Verify the DataMasque installation, and set up an admin account.

POST /api/users/admin-install/ Parameters

Field Type Required Location Description
email string Yes Request Body The email address for the admin account being created.
username string Yes Request Body The username for the admin account being created.
password string Yes Request Body The password for the user.
re_password string Yes Request Body The password for the user again, to confirm the password entered above.
allowed_hosts list['string'] Yes Request Body A list of hostnames that will be allowed to access DataMasque upon installation.
aws_ec2_instance_id string Required only for Marketplace installations. Request Body The ID of the AWS EC2 instance.

POST /api/users/admin-install/ Responses

Status Code Description
201 A JSON serialised User object.

POST /api/users/admin-install/ curl example

curl -X POST "https://<your-datamasque-host>/api/users/admin-install/" \
     -H "Content-Type: application/json" \
     -d '{
           "email": "<your-admin-email>",
           "username": "<your-username>",
           "password": "<your-admin-password>",
           "re_password": "<your-admin-password>",
           "allowed_hosts": ["masque.local"],
           "aws_ec2_instance_id": "<your-instance-id>"
         }'
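
The request body can also be assembled programmatically. The sketch below builds the JSON payload, checks that the two password fields match before sending, and includes aws_ec2_instance_id only when one is supplied. The function name is hypothetical, the client-side password check is a convenience rather than documented server behaviour, and the sample values are placeholders.

```python
import json
from typing import Optional


def admin_install_payload(
    email: str,
    username: str,
    password: str,
    re_password: str,
    allowed_hosts: list[str],
    aws_ec2_instance_id: Optional[str] = None,
) -> str:
    """Serialize the POST /api/users/admin-install/ request body."""
    if password != re_password:
        raise ValueError("password and re_password must match")
    payload = {
        "email": email,
        "username": username,
        "password": password,
        "re_password": re_password,
        "allowed_hosts": allowed_hosts,
    }
    # Only Marketplace installations need the EC2 instance ID.
    if aws_ec2_instance_id is not None:
        payload["aws_ec2_instance_id"] = aws_ec2_instance_id
    return json.dumps(payload)


body = admin_install_payload(
    "admin@example.com", "admin", "example-password", "example-password", ["masque.local"]
)
print(body)
```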

Installation Info Object

A JSON object showing the state of the current installation with the following data:

Field Type Description
is_aws_marketplace boolean Whether the current installation has been installed from the AWS marketplace.
installed boolean If the current installation has been successfully installed.
is_smtp_configured boolean If SMTP has been configured on the DataMasque instance.
is_saml_sso_configured boolean If SAML SSO has been enabled on the DataMasque instance.

GET /api/app/check/

Authorization: User token or API token.

Checks to verify if DataMasque has successfully been installed.

GET /api/app/check/ Parameters

No parameters.

GET /api/app/check/ Responses

Status Code Description
200 A JSON serialised Installation Info Object.

GET /api/app/check/ curl example

curl "https://<your-datamasque-host>/api/app/check/" \
     -H "Authorization: Token <your-api-token>"

POST /api/license-upload/

Authorization: User token only.

Uploads a licence file to DataMasque.

POST /api/license-upload/ Parameters

Field Type Required Location Description
license_file file Yes Form Field The licence file.

POST /api/license-upload/ Responses

Status Code Description
200 Operation succeeded.

POST /api/license-upload/ curl example

curl -X POST "https://<your-datamasque-host>/api/license-upload/" \
     -H "Authorization: Token <your-api-token>" \
     -H "Content-Type: multipart/form-data" \
     -F "license_file=@</path/to/your/license_file.lic>"

Health Check Object

Various health statistics about the DataMasque instance:

Field Type Description
worker_running boolean true if the masking agent worker processes are healthy, false if there are no available workers.
license_expired boolean true if the licence is expired, false if the licence is not expired.
license_renewal_in_days integer Remaining days until licence expiry.
license_limit_breach object An object describing any licence breaches that have occurred. Each property on the object is the type of breach that has occurred. Each property value is an object containing breach_type, message, and created_date properties.
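
A monitoring script might turn this object into a list of human-readable problems. The sketch below is one possible interpretation: the function name, the 30-day warning threshold and the `row_limit` breach type in the sample are all hypothetical, while the field names come from the table above.

```python
from typing import Any


def health_problems(health: dict[str, Any], renewal_warning_days: int = 30) -> list[str]:
    """Return a list of problems found in a Health Check Object (empty if healthy)."""
    problems = []
    if not health.get("worker_running", False):
        problems.append("no masking workers available")
    days = health.get("license_renewal_in_days")
    if health.get("license_expired", False):
        problems.append("licence expired")
    elif days is not None and days <= renewal_warning_days:
        problems.append(f"licence expires in {days} days")
    # Each property of license_limit_breach is keyed by the type of breach.
    for breach_type, breach in (health.get("license_limit_breach") or {}).items():
        problems.append(f"licence breach ({breach_type}): {breach['message']}")
    return problems


# Hypothetical sample; "row_limit" is a made-up breach type for illustration.
sample_health = {
    "worker_running": True,
    "license_expired": False,
    "license_renewal_in_days": 10,
    "license_limit_breach": {
        "row_limit": {
            "breach_type": "row_limit",
            "message": "Row limit exceeded",
            "created_date": "2024-01-01T00:00:00Z",
        }
    },
}

print(health_problems(sample_health))
```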

GET /api/health-check/

Authorization: User token or API token.

Get the basic health-check status of DataMasque.

GET /api/health-check/ Parameters

No parameters.

GET /api/health-check/ Responses

Status Code Description
200 A JSON serialised Health Check Object.
500 A server error has occurred, for example because an invalid licence file exists. The known error will be returned.

GET /api/health-check/ curl example

curl "https://<your-datamasque-host>/api/health-check/" \
     -H "Authorization: Token <your-api-token>"