In-Flight Masking Performance Guide

This document describes the factors that contribute to the memory and CPU use of DataMasque's in-flight server, and includes a sizing guide for calculating how much memory and CPU you need.

CPU and Worker Requirements

The number of requests that can be serviced per second depends on:

  • The number of items being masked in the request.
  • The complexity of the ruleset(s).
  • The number of in-flight workers.
    • The number of in-flight workers that can be configured is in turn dependent on the number of CPUs and amount of memory available.

In general, more in-flight workers means more requests can be handled at once, provided your DataMasque instance has enough CPU and memory capacity.

As a starting point, set the worker count to one third of the expected number of simultaneous clients. In turn, the number of CPUs available to the in-flight server should be half the number of workers.

This table shows some examples of the number of workers and (virtual) CPUs required for different client loads (with fractional parts rounded up).

| Expected Simultaneous Clients | Estimated Workers Required | CPUs Required |
|---|---|---|
| 1 | 1 client / 3 ≈ 1 worker | 1 worker / 2 ≈ 1 CPU |
| 10 | 10 clients / 3 ≈ 4 workers | 4 workers / 2 = 2 CPUs |
| 50 | 50 clients / 3 ≈ 17 workers | 17 workers / 2 ≈ 9 CPUs |

The worker count can be set using the IN_FLIGHT_WORKERS setting.
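The rule of thumb above can be expressed as a small calculation. The following is a minimal sketch, not part of DataMasque; the function names `estimate_workers` and `estimate_cpus` are hypothetical and only encode the "one third of clients, half of workers, rounded up" starting point described in this section.

```python
import math


def estimate_workers(simultaneous_clients: int) -> int:
    """Starting-point worker count: one third of the clients, rounded up."""
    if simultaneous_clients < 1:
        raise ValueError("simultaneous_clients must be at least 1")
    return math.ceil(simultaneous_clients / 3)


def estimate_cpus(workers: int) -> int:
    """Starting-point (v)CPU count: half of the workers, rounded up."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return math.ceil(workers / 2)


# Reproduce the client loads used in the table above.
for clients in (1, 10, 50):
    workers = estimate_workers(clients)
    print(f"{clients} clients -> {workers} workers, {estimate_cpus(workers)} CPUs")
```

The resulting worker count is the value you would then consider for the IN_FLIGHT_WORKERS setting, adjusting it after observing real load.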

Memory Requirements

Memory requirements increase with the number of workers and the number and size of ruleset plans loaded. The effect of each is explained in the following sections.

Base Memory Requirements

The calculations below consider base/idle memory usage. This base memory requirement is for DataMasque and the operating system, and typically ranges from 2.5GB to 3GB. This includes memory needed for DataMasque's core processes and standard system operations. The exact amount may vary depending on your specific environment and configuration.

Memory Change Based On Workers and Ruleset Plans

The total memory usage of DataMasque's in-flight server is determined by:

  • The number of in-flight workers.
  • The number and complexity of ruleset plans that are loaded.

Memory usage will increase based on the size of the requests being serviced, but in practice the memory used by requests is negligible compared to the base memory. The effect of request size on memory usage is detailed in the Memory Change Based On Active Requests section.

The amount of memory used per ruleset plan varies based on its ruleset's complexity. A ruleset plan with a simple ruleset with a single mask may only require approximately 1MB of extra memory. However, more complex rulesets may require much more. A ruleset with 15 rules to mask a complex JSON document may take 30MB of memory. An estimate of up to 2MB per mask should be used to calculate the memory usage of a ruleset plan. Each ruleset plan must be loaded per worker, so the memory usage should be multiplied by the number of workers.

This table shows an example of additional memory usage based on worker count (on top of the baseline memory usage) for both a simple and complex ruleset.

| Ruleset Plan Type | Ruleset Plan Memory Usage (approx.) | Number of Workers | Total Additional Memory |
|---|---|---|---|
| Simple Single-Mask | 1MB | 4 | 1MB * 4 Workers = 4MB |
| Simple Single-Mask | 1MB | 8 | 1MB * 8 Workers = 8MB |
| Complex Multi-Mask | 30MB | 4 | 30MB * 4 Workers = 120MB |
| Complex Multi-Mask | 30MB | 8 | 30MB * 8 Workers = 240MB |

The next table shows the effect of loading multiple different ruleset plans while keeping the number of workers constant at 4.

| Ruleset Plan Type | Ruleset Plan Memory Usage (approx.) | Number of Ruleset Plans | Total Additional Memory |
|---|---|---|---|
| Simple Single-Mask | 1MB | 1 | 1MB * 4 Workers * 1 Plan = 4MB |
| Simple Single-Mask | 1MB | 10 | 1MB * 4 Workers * 10 Plans = 40MB |
| Complex Multi-Mask | 30MB | 1 | 30MB * 4 Workers * 1 Plan = 120MB |
| Complex Multi-Mask | 30MB | 10 | 30MB * 4 Workers * 10 Plans = 1200MB |

In addition, each worker requires approximately 160MB of memory.
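The arithmetic in the two tables, plus the per-worker figure, can be combined into one calculation. This is a rough sketch under the approximations given in this section; the names `WORKER_BASE_MB`, `ruleset_plan_memory_mb` and `additional_memory_mb` are hypothetical, and the per-plan sizes are whatever you estimate or measure for your own rulesets.

```python
from typing import Sequence

# Approximate memory per worker, taken from this guide.
WORKER_BASE_MB = 160


def ruleset_plan_memory_mb(workers: int, plan_sizes_mb: Sequence[float]) -> float:
    """Every loaded ruleset plan is held once per worker."""
    return sum(plan_sizes_mb) * workers


def additional_memory_mb(workers: int, plan_sizes_mb: Sequence[float]) -> float:
    """Memory on top of the DataMasque/OS baseline: workers plus their plans."""
    return workers * WORKER_BASE_MB + ruleset_plan_memory_mb(workers, plan_sizes_mb)


# Ten complex plans of roughly 30MB each, loaded into 4 workers.
plans = [30] * 10
print(ruleset_plan_memory_mb(4, plans))  # plan memory only
print(additional_memory_mb(4, plans))    # plan memory plus worker memory
```

Passing a list of sizes rather than a single size allows a mix of simple and complex ruleset plans to be estimated together.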

Memory Change Based On Active Requests

When a request is being processed, DataMasque's memory usage temporarily increases to handle the data. Due to internal processing and data transformation, the peak memory usage is typically several times larger than the raw request size. As a general guideline, you might see around 5x the original request size in memory usage while processing. For example, masking a 100KB JSON request might use approximately 500KB of memory during processing. This memory is released after the request completes.

Request size is not included in the sizing guide since it rarely impacts overall memory usage significantly. However, if you expect to process very large requests or many simultaneous requests, you should factor in their size when planning your memory requirements.
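If you do need to account for large or numerous requests, the 5x guideline can be turned into a worst-case estimate. The sketch below is an assumption-laden illustration: the function name `peak_request_memory_kb` is hypothetical, the factor of 5 is the general guideline from this section rather than a guaranteed bound, and it assumes every concurrent request reaches its peak at the same moment.

```python
def peak_request_memory_kb(
    request_size_kb: float,
    concurrent_requests: int = 1,
    overhead_factor: float = 5.0,
) -> float:
    """Worst-case transient memory while requests are being processed.

    Assumes all concurrent requests peak simultaneously and that each uses
    about `overhead_factor` times its raw size (guideline value, not a limit).
    """
    if request_size_kb < 0 or concurrent_requests < 0:
        raise ValueError("sizes and counts must not be negative")
    return request_size_kb * overhead_factor * concurrent_requests


# A single 100KB JSON request, as in the example above.
print(peak_request_memory_kb(100))

# Four 50MB requests in flight at once (sizes expressed in KB).
print(peak_request_memory_kb(50 * 1024, concurrent_requests=4) / 1024, "MB")
```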

Calculating Resource Requirements (Sizing Guide)

This section shows examples of calculating resource requirements for a DataMasque instance, when using in-flight masking.

The approach is:

  1. Determine the number of simultaneous clients that need to be handled.
  2. Use this to determine worker count.
  3. Determine the base worker memory amount.
  4. Approximate the ruleset plan memory usage per worker (number of ruleset plans * masks per plan * 2MB per mask).
  5. Multiply the per-worker ruleset plan memory by the number of workers to determine the total memory of ruleset plans.
  6. Add on the base memory usage for DataMasque + operating system (2.5GB to 3GB).
  7. Divide worker count by 2 to determine (v)CPU count.
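The steps above can be sketched as a single function. This is an illustrative calculator, not a DataMasque tool: `SizingEstimate` and `size_in_flight` are hypothetical names, the defaults (2MB per mask, 160MB per worker, 3GB base) are the approximations from this guide, and 1GB is treated as 1024MB.

```python
import math
from dataclasses import dataclass


@dataclass
class SizingEstimate:
    workers: int
    cpus: int
    worker_memory_mb: int
    ruleset_plan_memory_mb: int
    base_memory_mb: int

    @property
    def total_memory_mb(self) -> int:
        return self.worker_memory_mb + self.ruleset_plan_memory_mb + self.base_memory_mb


def size_in_flight(
    clients: int,
    ruleset_plans: int,
    masks_per_plan: int,
    base_memory_mb: int = 3 * 1024,  # step 6: DataMasque + OS (assumed 3GB)
    mb_per_mask: int = 2,            # step 4: upper estimate per mask
    mb_per_worker: int = 160,        # step 3: approximate per-worker memory
) -> SizingEstimate:
    workers = math.ceil(clients / 3)                                  # steps 1-2
    worker_memory = workers * mb_per_worker                           # step 3
    plan_memory_per_worker = ruleset_plans * masks_per_plan * mb_per_mask  # step 4
    plan_memory = plan_memory_per_worker * workers                    # step 5
    cpus = math.ceil(workers / 2)                                     # step 7
    return SizingEstimate(workers, cpus, worker_memory, plan_memory, base_memory_mb)


# Hypothetical load: 30 clients, 4 ruleset plans with 8 masks each.
estimate = size_in_flight(clients=30, ruleset_plans=4, masks_per_plan=8)
print(estimate, f"total ~ {estimate.total_memory_mb / 1024:.1f}GB")
```

Keeping the intermediate figures in the result makes it easier to see which term dominates, which is usually the base memory for small deployments and the ruleset plan memory for large ones.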

Scenario 1: Low Use

In this scenario there are 2 clients using DataMasque simultaneously, with 2 simple ruleset plans loaded with 5 masks each.

  1. Simultaneous clients: 2.
  2. Worker count: 2 / 3 ≈ 1 worker.
  3. Base worker memory amount: 1 worker * 160MB/worker = 160MB.
  4. Ruleset plan memory usage: 2 ruleset plans * 5 masks * 2MB/mask = 20MB per worker.
  5. Total ruleset plan memory use: 20MB/worker * 1 worker = 20MB.
  6. Base memory for DataMasque and OS: 3GB.
  7. (v)CPU Count: 1 worker / 2 ≈ 1 (v)CPU.

Therefore, memory usage is:

160MB + 20MB + 3GB ≈ 3.2GB.

1 (virtual) CPU should be reserved for in-flight masking so that it does not affect other processes on the instance.

Scenario 2: High Use

In this scenario there are 10 clients using DataMasque simultaneously, with 10 complex ruleset plans loaded with 15 masks each.

  1. Simultaneous clients: 10.
  2. Worker count: 10 / 3 ≈ 4 workers.
  3. Base worker memory amount: 4 workers * 160MB/worker = 640MB.
  4. Ruleset plan memory usage: 10 ruleset plans * 15 masks * 2MB/mask = 300MB per worker.
  5. Total ruleset plan memory use: 300MB/worker * 4 workers = 1200MB.
  6. Base memory for DataMasque and OS: 3GB.
  7. (v)CPU Count: 4 workers / 2 = 2 (v)CPUs.

Therefore, memory usage is:

640MB + 1200MB + 3GB ≈ 5GB.

2 (virtual) CPUs should be reserved for in-flight masking so that it does not affect other processes on the instance.

In both cases, extra memory and CPUs should be left available if static masking (database or file masking) is to be performed while in-flight masking is in progress.

Extra Performance by Batching Values

When sending values to mask, there is some per-request overhead. For example, time is spent opening HTTP connections and waiting for the round trip when sending and receiving data. Therefore, masking can be faster if multiple items are included in each masking request. This will reduce the number of requests processed per second, but since each request contains more items, the overall number of items masked per second will increase.
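The trade-off can be illustrated with a simple throughput model. This is only a sketch: it assumes a single client sending requests back to back, a fixed overhead per request, and a masking time that grows linearly with the number of items. The function names and the timing values are hypothetical and are not measured DataMasque figures.

```python
def request_time_s(batch_size: int, overhead_s: float, per_item_s: float) -> float:
    """Time for one request: fixed overhead plus linear per-item masking time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return overhead_s + batch_size * per_item_s


def requests_per_second(batch_size: int, overhead_s: float, per_item_s: float) -> float:
    return 1 / request_time_s(batch_size, overhead_s, per_item_s)


def items_per_second(batch_size: int, overhead_s: float, per_item_s: float) -> float:
    return batch_size / request_time_s(batch_size, overhead_s, per_item_s)


# Hypothetical timings, for illustration only.
OVERHEAD_S = 0.05   # connection setup and round trip per request
PER_ITEM_S = 0.001  # masking time per item

for batch_size in (1, 10, 100):
    rps = requests_per_second(batch_size, OVERHEAD_S, PER_ITEM_S)
    ips = items_per_second(batch_size, OVERHEAD_S, PER_ITEM_S)
    print(f"batch={batch_size:>3}: {rps:6.1f} requests/s, {ips:7.1f} items/s")
```

Under this model, requests per second fall as the batch grows while items per second rise, which matches the behaviour described above. Real gains depend on your network and ruleset.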

When increasing the request size, refer to the Memory Change Based On Active Requests section to make sure the memory of the DataMasque instance is not exhausted.

The maximum request body size is 250MB. The maximum allowed duration for a masking request is 60 seconds, which includes upload, masking, and download times. Using larger requests may cause the timeout to be exceeded.

Performance Considerations for Run Secrets

DataMasque maintains cached instances of masks for better performance. These instances are rebuilt when:

  • A new run secret is used.
  • The disable_instance_secret setting changes.
  • The ruleset plan is updated.
  • DataMasque starts.

Default Behavior (Unspecified Run Secret)

When no run_secret is specified in the masking request, DataMasque does the following for each ruleset plan:

  1. Generates a random run secret at startup or when the ruleset plan is created or updated.
  2. Builds mask instances once.
  3. Reuses these instances for all requests to that ruleset plan.

Custom Run Secrets

When using custom run secrets, performance depends on how frequently they change. In these examples, A and B represent two different run secrets:

Example 1 - Poor Performance:
Requests: ABABAB
Result: Masks rebuild on every request

Example 2 - Better Performance:
Requests: AAABBB
Result: Masks rebuild only twice (first A, first B)

The same principle applies to disable_instance_secret:

  • Alternating values (true, false, true, false) forces rebuilds on every request.
  • Grouping values (true, true, true, false, false) minimizes rebuilds.

Note: The time to rebuild masks depends on the number of masks in your ruleset. For optimal performance, group requests with the same run secret and instance secret settings.

Multiple Ruleset Plans

Run secrets are managed independently for each ruleset plan. Using run secret A on ruleset plan X, and run secret B on ruleset plan Y, will not trigger rebuilds, as each ruleset plan maintains its own mask instances.
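The rebuild behaviour described in this section can be modelled with a short simulation. This is a simplified sketch of the behaviour as documented here, not DataMasque's implementation: it assumes each ruleset plan keeps one set of cached mask instances for its most recently used combination of run secret and disable_instance_secret, it counts the first build for a plan as a rebuild, and it ignores ruleset plan updates and restarts. The function name `count_rebuilds` is hypothetical.

```python
from typing import Dict, Iterable, Tuple

# One request: (ruleset plan name, run secret, disable_instance_secret)
Request = Tuple[str, str, bool]


def count_rebuilds(requests: Iterable[Request]) -> int:
    """Count how often mask instances would be (re)built for a request sequence."""
    last_settings: Dict[str, Tuple[str, bool]] = {}
    rebuilds = 0
    for plan, run_secret, disable_instance_secret in requests:
        settings = (run_secret, disable_instance_secret)
        # Rebuild when the plan has no cached instances yet or its settings changed.
        if plan not in last_settings or last_settings[plan] != settings:
            rebuilds += 1
            last_settings[plan] = settings
    return rebuilds


alternating = [("X", secret, False) for secret in "ABABAB"]
grouped = [("X", secret, False) for secret in "AAABBB"]
two_plans = [("X", "A", False), ("Y", "B", False)] * 3

print(count_rebuilds(alternating))  # a build on every request
print(count_rebuilds(grouped))      # one build for A, one for B
print(count_rebuilds(two_plans))    # only the initial build of each plan
```

In this model the two-plan sequence builds each plan's masks once and never again, because the cache is tracked per ruleset plan rather than globally.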