Data Masking

The data masking utility can encrypt, decrypt, or irreversibly erase sensitive information to protect data confidentiality.

stateDiagram-v2
    direction LR
    LambdaFn: Your Lambda function
    DataMasking: DataMasking
    Operation: Possible operations
    Input: Sensitive value
    Erase: <strong>Erase</strong>
    Encrypt: <strong>Encrypt</strong>
    Decrypt: <strong>Decrypt</strong>
    Provider: AWS Encryption SDK provider
    Result: Data transformed <i>(erased, encrypted, or decrypted)</i>

    LambdaFn --> DataMasking
    DataMasking --> Operation

    state Operation {
        [*] --> Input
        Input --> Erase: Irreversible
        Input --> Encrypt
        Input --> Decrypt
        Encrypt --> Provider
        Decrypt --> Provider
    }

    Operation --> Result

Key features

  • Encrypt, decrypt, or irreversibly erase data with ease
  • Erase sensitive information in one or more fields within nested data
  • Seamless integration with AWS Encryption SDK for industry and AWS security best practices

Terminology

Erasing replaces sensitive information irreversibly with a non-sensitive placeholder (*****). This operation replaces data in-memory, making it a one-way action.

Encrypting transforms plaintext into ciphertext using an encryption algorithm and a cryptographic key. It lets you encrypt any sensitive data so that only authorized personnel can decrypt it. Learn more about encryption here.

Decrypting transforms ciphertext back into plaintext using a decryption algorithm and the correct decryption key.

Encryption context is non-secret key=value data used for authentication, such as tenant_id:<id>. It adds extra security by confirming that encrypted data is tied to a given context.

Encrypted message is a portable data structure that includes encrypted data along with copies of the encrypted data key. It includes everything the Encryption SDK needs to validate authenticity and integrity, and to decrypt with the right master key.

Envelope encryption uses two different keys to encrypt data safely: a master key and a data key. The data key encrypts the plaintext, and the master key encrypts the data key. It simplifies key management (you own the master key), isolates a compromise to a single data key, and scales better with large data volumes.

graph LR
    M(Master key) --> |Encrypts| D(Data key)
    D(Data key) --> |Encrypts| S(Sensitive data)
Envelope encryption visualized.
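
To make the two-key relationship concrete, here is a minimal sketch of envelope encryption. It is not how the AWS Encryption SDK is implemented: `toy_cipher`, `envelope_encrypt`, and `envelope_decrypt` are hypothetical names, and the XOR keystream cipher is a stand-in chosen only to keep the example dependency-free. It is not secure and must not be used for real data.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Symmetric, and NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> dict:
    data_key = secrets.token_bytes(32)  # fresh data key for this message
    return {
        # The data key encrypts the plaintext
        "ciphertext": toy_cipher(data_key, plaintext),
        # The master key encrypts the data key, which travels with the message
        "encrypted_data_key": toy_cipher(master_key, data_key),
    }


def envelope_decrypt(master_key: bytes, message: dict) -> bytes:
    # Recover the data key first, then use it to recover the plaintext
    data_key = toy_cipher(master_key, message["encrypted_data_key"])
    return toy_cipher(data_key, message["ciphertext"])


master_key = secrets.token_bytes(32)
message = envelope_encrypt(master_key, b"1111 2222 3333 4444")
recovered = envelope_decrypt(master_key, message)
```

Only the master key has to be protected centrally; each message carries its own encrypted data key.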

Getting started

Tip

All examples shared in this documentation are available within the project repository.

Install

Our Lambda layer does not include the aws-encryption-sdk. Please install it as a dependency in your project to use this utility.

Add aws-lambda-powertools[datamasking] as a dependency in your preferred tool: e.g., requirements.txt, pyproject.toml. This will install the AWS Encryption SDK.

AWS Encryption SDK contains non-Python dependencies. This means you should use AWS SAM CLI or official build container images when building your application for AWS Lambda. Local development should work as expected.

Required resources

By default, we use AWS Key Management Service (KMS) for encryption and decryption operations.

Before you start, you will need a KMS symmetric key to encrypt and decrypt your data. Your Lambda function's IAM role needs permission to use it: kms:GenerateDataKey and kms:Decrypt, plus kms:Encrypt when using multiple keys.

NOTE. We recommend allocating at least 1024 MB of memory, because cryptographic operations are CPU intensive, and using separate Lambda functions for encrypt and decrypt. For more information, you can see the full reports of our load tests and traces.

AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: >
  Powertools for AWS Lambda (Python) data masking example

Globals: # https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/sam-specification-template-anatomy-globals.html
  Function:
    Timeout: 5
    Runtime: python3.11
    Tracing: Active
    Environment:
      Variables:
        POWERTOOLS_SERVICE_NAME: PowertoolsHelloWorld
        POWERTOOLS_LOG_LEVEL: INFO
        KMS_KEY_ARN: !GetAtt DataMaskingMasterKey.Arn

# In production, we recommend you split up the encrypt and decrypt for fine-grained security.
# For example, one function can act as the encryption proxy via HTTP requests, data pipeline, etc.,
# while only authorized personnel can call decrypt via a separate function.
Resources:
  DataMaskingEncryptFunctionExample:
    Type: AWS::Serverless::Function
    Properties:
      Handler: data_masking_function_example.lambda_handler
      CodeUri: ../src
      Description: Data Masking encryption function
      # Cryptographic operations demand more CPU. CPU is proportionally allocated based on memory size.
      # We recommend allocating a minimum of 1024MB of memory.
      MemorySize: 1024

  # DataMaskingDecryptFunctionExample:
  #   Type: AWS::Serverless::Function
  #   Properties:
  #     Handler: data_masking_function_decrypt.lambda_handler
  #     CodeUri: ../src
  #     Description: Data Masking decryption function
  #     MemorySize: 1024

  # KMS Key
  DataMaskingMasterKey:
    Type: "AWS::KMS::Key"
    Properties:
      Description: KMS Key for encryption and decryption using Powertools for AWS Lambda Data masking feature
      # KMS keys support both IAM Resource Policies and Key Policies
      # For more details: https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html
      KeyPolicy:
        Version: "2012-10-17"
        Id: data-masking-enc-dec
        Statement:
          # For security reasons, ensure your KMS Key has at least one administrator.
          # In this example, the root account is granted administrator permissions.
          # However, we recommend configuring specific IAM Roles for enhanced security in production.
          - Effect: Allow
            Principal:
              AWS: !Sub "arn:aws:iam::${AWS::AccountId}:root" # (1)!
            Action: "kms:*"
            Resource: "*"
          # We must grant Lambda's IAM Role access to the KMS Key
          - Effect: Allow
            Principal:
              AWS: !GetAtt DataMaskingEncryptFunctionExampleRole.Arn # (2)!
            Action:
              - kms:Decrypt # to decrypt encrypted data key
              - kms:GenerateDataKey # to create a unique and random data key for encryption
              # Encrypt permission is required only when using multiple keys
              - kms:Encrypt  # (3)!
            Resource: "*"
  1. Key policy examples using IAM Roles
  2. SAM generated CloudFormation Resources
  3. Required only when using multiple keys

Erasing data

Erasing removes the original data and replaces it with *****. You cannot recover erased data, and the result is always a str. The exception is iterable input (list, tuple, set): in that case, the method returns a new object of the same type as the input, with each element replaced by the string *****.

from __future__ import annotations

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger()
data_masker = DataMasking()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data: dict = event.get("body", {})

    logger.info("Erasing fields email, address.street, and company_address")

    erased = data_masker.erase(data, fields=["email", "address.street", "company_address"])  # (1)!

    return erased
  1. See working with nested data to learn more about the fields parameter.

    If we omit the fields parameter, the entire dictionary will be erased with *****.

{
    "body": 
    {
        "id": 1,
        "name": "John Doe",
        "age": 30,
        "email": "johndoe@example.com",
        "address": {
            "street": "123 Main St", 
            "city": "Anytown", 
            "state": "CA", 
            "zip": "12345"
        },
        "company_address": {
            "street": "456 ACME Ave", 
            "city": "Anytown", 
            "state": "CA", 
            "zip": "12345"
        }
    }
}

{
    "id": 1,
    "name": "John Doe",
    "age": 30,
    "email": "*****",
    "address": {
        "street": "*****",
        "city": "Anytown",
        "state": "CA",
        "zip": "12345"
    },
    "company_address": "*****"
}
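
The type rule described for erasing (scalars become str, while list, tuple, and set keep their type) can be sketched in plain Python. This is only an illustration of the documented behavior, not the utility's implementation; `erase_value` and `MASK` are hypothetical names.

```python
MASK = "*****"


def erase_value(value):
    """Return the masked form of a single value, following the documented type rule."""
    if isinstance(value, (list, tuple, set)):
        # Same container type as the input, every element replaced by the mask
        return type(value)(MASK for _ in value)
    # Everything else (str, int, dict, ...) collapses to the mask string
    return MASK


erased_number = erase_value(30)
erased_list = erase_value(["a", "b"])
erased_tuple = erase_value(("a", "b"))
```

Note that a set collapses to a single element, since every masked element is the same string.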

Encrypting data

About static typing and encryption

Encrypting data may lead to a different data type, as it always transforms into a string (<ciphertext>).

To encrypt, you will need an encryption provider. Here, we will use AWSEncryptionSDKProvider.

Under the hood, we delegate a number of operations to the AWS Encryption SDK: authentication, creating a portable encrypted message, and the actual data encryption.

from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import (
    AWSEncryptionSDKProvider,
)
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])  # (1)!
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data: dict = event.get("body", {})

    logger.info("Encrypting the whole object")

    encrypted = data_masker.encrypt(data)

    return {"body": encrypted}
  1. You can use more than one KMS Key for higher availability but increased latency.

    Encryption SDK will ensure the data key is encrypted with both keys.

{
    "body": 
    {
        "id": 1,
        "name": "John Doe",
        "age": 30,
        "email": "johndoe@example.com",
        "address": {
            "street": "123 Main St", 
            "city": "Anytown", 
            "state": "CA", 
            "zip": "12345"
        },
        "company_address": {
            "street": "456 ACME Ave", 
            "city": "Anytown", 
            "state": "CA", 
            "zip": "12345"
        }
    }
}

{
    "body": "AgV4uF5K2YMtNhYrtviTwKNrUHhqQr73l/jNfukkh+qLOC8AXwABABVhd3MtY3J5cHRvLXB1YmxpYy1rZXkAREEvcjEyaFZHY1R5cjJuTDNKbTJ3UFA3R3ZjaytIdi9hekZqbXVUb25Ya3J5SzFBOUlJZDZxZXpSR1NTVnZDUUxoZz09AAEAB2F3cy1rbXMAS2Fybjphd3M6a21zOnVzLWVhc3QtMToyMDA5ODQxMTIzODY6a2V5LzZkODJiMzRlLTM2NjAtNDRlMi04YWJiLTdmMzA1OGJlYTIxMgC4AQIBAHjxYXAO7wQGd+7qxoyvXAajwqboF5FL/9lgYUNJTB8VtAHBP2hwVgw+zypp7GoMNTPAAAAAfjB8BgkqhkiG9w0BBwagbzBtAgEAMGgGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMx/B25MTgWwpL7CmuAgEQgDtan3orAOKFUfyNm3v6rFcglb+BVVVDV71fj4aRljhpg1ixsYFaKsoej8NcwRktIiWE+mw9XmTEVb6xFQIAABAA9DeLzlRaRQgTcXMJG0iBu/YTyyDKiROD+bU1Y09X9RBz5LA1nWIENJKq2seAhNSB/////wAAAAEAAAAAAAAAAAAAAAEAAAEBExLJ9wI4n7t+wyPEEP4kjYFBdkmNuLLsVC2Yt8mv9Y1iH2G+/g9SaIcdK57pkoW0ECpBxZVOxCuhmK2s74AJCUdem9McjS1waUKyzYTi9vv2ySNBsABIDwT990rE7jZJ3tEZAqcWZg/eWlxvnksFR/akBWZKsKzFz6lF57+cTgdISCEJRV0E7fcUeCuaMaQGK1Qw2OCmIeHEG5j5iztBkZG2IB2CVND/AbxmDUFHwgjsrJPTzaDYSufcGMoZW1A9X1sLVfqNVKvnOFP5tNY7kPF5eAI9FhGBw8SjTqODXz4k6zuqzy9no8HtXowP265U8NZ5VbVTd/zuVEbZyK5KBqzP1sExW4RhnlpXMoOs9WSuAGcwZQIxANTeEwb9V7CacV2Urt/oCqysUzhoV2AcT2ZjryFqY79Tsg+FRpIx7cBizL4ieRzbhQIwcRasNncO5OZOcmVr0MqHv+gCVznndMgjXJmWwUa7h6skJKmhhMPlN0CsugxtVWnD"
}

Decrypting data

About static typing and decryption

Decrypting data may lead to a different data type, as encrypted data is always a string (<ciphertext>).

To decrypt, you will need an encryption provider. Here, we will use AWSEncryptionSDKProvider.

Under the hood, we delegate a number of operations to the AWS Encryption SDK to verify authenticity and integrity, and to perform the actual ciphertext decryption.

NOTE. Decryption only works with a KMS key ARN.

from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")  # (1)!

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])  # (2)!
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data: dict = event.get("body", {})

    logger.info("Decrypting whole object")

    decrypted = data_masker.decrypt(data)

    return decrypted
  1. Note that KMS key alias or key ID won't work.
  2. You can use more than one KMS Key for higher availability but increased latency.

    Encryption SDK will call Decrypt API with all master keys when trying to decrypt the data key.

{
    "body": "AgV4uF5K2YMtNhYrtviTwKNrUHhqQr73l/jNfukkh+qLOC8AXwABABVhd3MtY3J5cHRvLXB1YmxpYy1rZXkAREEvcjEyaFZHY1R5cjJuTDNKbTJ3UFA3R3ZjaytIdi9hekZqbXVUb25Ya3J5SzFBOUlJZDZxZXpSR1NTVnZDUUxoZz09AAEAB2F3cy1rbXMAS2Fybjphd3M6a21zOnVzLWVhc3QtMToyMDA5ODQxMTIzODY6a2V5LzZkODJiMzRlLTM2NjAtNDRlMi04YWJiLTdmMzA1OGJlYTIxMgC4AQIBAHjxYXAO7wQGd+7qxoyvXAajwqboF5FL/9lgYUNJTB8VtAHBP2hwVgw+zypp7GoMNTPAAAAAfjB8BgkqhkiG9w0BBwagbzBtAgEAMGgGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMx/B25MTgWwpL7CmuAgEQgDtan3orAOKFUfyNm3v6rFcglb+BVVVDV71fj4aRljhpg1ixsYFaKsoej8NcwRktIiWE+mw9XmTEVb6xFQIAABAA9DeLzlRaRQgTcXMJG0iBu/YTyyDKiROD+bU1Y09X9RBz5LA1nWIENJKq2seAhNSB/////wAAAAEAAAAAAAAAAAAAAAEAAAEBExLJ9wI4n7t+wyPEEP4kjYFBdkmNuLLsVC2Yt8mv9Y1iH2G+/g9SaIcdK57pkoW0ECpBxZVOxCuhmK2s74AJCUdem9McjS1waUKyzYTi9vv2ySNBsABIDwT990rE7jZJ3tEZAqcWZg/eWlxvnksFR/akBWZKsKzFz6lF57+cTgdISCEJRV0E7fcUeCuaMaQGK1Qw2OCmIeHEG5j5iztBkZG2IB2CVND/AbxmDUFHwgjsrJPTzaDYSufcGMoZW1A9X1sLVfqNVKvnOFP5tNY7kPF5eAI9FhGBw8SjTqODXz4k6zuqzy9no8HtXowP265U8NZ5VbVTd/zuVEbZyK5KBqzP1sExW4RhnlpXMoOs9WSuAGcwZQIxANTeEwb9V7CacV2Urt/oCqysUzhoV2AcT2ZjryFqY79Tsg+FRpIx7cBizL4ieRzbhQIwcRasNncO5OZOcmVr0MqHv+gCVznndMgjXJmWwUa7h6skJKmhhMPlN0CsugxtVWnD"
}

{
    "id": 1,
    "name": "John Doe",
    "age": 30,
    "email": "johndoe@example.com",
    "address": {
        "street": "123 Main St",
        "city": "Anytown",
        "state": "CA",
        "zip": "12345"
    },
    "company_address": {
        "street": "456 ACME Ave",
        "city": "Anytown",
        "state": "CA",
        "zip": "12345"
    }
}

Encryption context for integrity and authenticity

For a stronger security posture, you can add metadata to each encryption operation and verify it during decryption. This is known as additional authenticated data (AAD). This non-sensitive data helps protect the authenticity and integrity of your encrypted data, and can even help prevent a confused deputy situation.

Important considerations you should know
  1. Exact match verification on decrypt. Be careful using random data like timestamps as encryption context if you can't provide them on decrypt.
  2. Only string values are supported. We will raise DataMaskingUnsupportedTypeError for non-string values.
  3. Use non-sensitive data only. When using KMS, encryption context is available as plaintext in AWS CloudTrail, unless you intentionally disabled KMS events.

from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> str:
    data = event.get("body", {})

    logger.info("Encrypting whole object")

    encrypted: str = data_masker.encrypt(
        data,
        data_classification="confidential",  # (1)!
        data_type="customer-data",
        tenant_id="a06bf973-0734-4b53-9072-39d7ac5b2cba",
    )

    return encrypted
  1. They must match on decrypt(); otherwise the operation will fail with DataMaskingContextMismatchError.

from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data = event.get("body", {})

    logger.info("Decrypting whole object")

    decrypted: dict = data_masker.decrypt(
        data,
        data_classification="confidential",  # (1)!
        data_type="customer-data",
        tenant_id="a06bf973-0734-4b53-9072-39d7ac5b2cba",
    )

    return decrypted
  1. They must match otherwise the operation will fail with DataMaskingContextMismatchError.
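
Because the context must match exactly and hold only string values, it can help to build it in one place from stable values and reuse it on both sides. The following is a sketch under that assumption; `build_encryption_context` is a hypothetical helper and not part of the utility.

```python
from __future__ import annotations


def build_encryption_context(tenant_id: str, data_type: str) -> dict[str, str]:
    """Hypothetical helper: derive the context from stable, non-sensitive values only."""
    context = {"tenant_id": tenant_id, "data_type": data_type}
    for key, value in context.items():
        # Fail early instead of at encryption time: only str values are supported
        if not isinstance(value, str):
            raise TypeError(f"encryption context value for {key!r} must be a str")
    return context


# Built independently on the encrypt and decrypt side from the same stable inputs.
# No timestamps or random values, so both sides produce an identical context.
encrypt_side = build_encryption_context("a06bf973-0734-4b53-9072-39d7ac5b2cba", "customer-data")
decrypt_side = build_encryption_context("a06bf973-0734-4b53-9072-39d7ac5b2cba", "customer-data")
contexts_match = encrypt_side == decrypt_side
```

Mirroring the keyword arguments used in the examples above, the resulting dictionary could then be unpacked into the call, for instance `data_masker.encrypt(data, **encrypt_side)`.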

Choosing parts of your data

Current limitations
  1. The fields parameter is not yet supported in encrypt and decrypt operations.
  2. We support JSON data types only - see data serialization for more details.

You can use the fields parameter with the dot notation . to choose one or more parts of your data to erase. This is useful when you want to keep data structure intact except the confidential fields.

When fields is present, erase behaves differently:

| Operation | Behavior | Example | Result |
| --- | --- | --- | --- |
| erase | Replace data while keeping collections type intact. | {"cards": ["a", "b"]} | {"cards": ["*****", "*****"]} |
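
As a rough mental model of dot notation, each segment of the path walks one level deeper into the dictionary, and only the final key is replaced. The sketch below illustrates this in plain Python; it is an assumption-laden illustration, not the utility's implementation, and `erase_dotted` is a hypothetical name.

```python
from __future__ import annotations

import copy

MASK = "*****"


def erase_dotted(data: dict, path: str) -> dict:
    """Return a copy of data with the value at a dotted path masked."""
    result = copy.deepcopy(data)  # leave the caller's data untouched
    *parents, leaf = path.split(".")
    node = result
    for key in parents:
        node = node[key]  # walk down one level per path segment
    value = node[leaf]
    # Lists keep their type, every other value becomes the mask string
    node[leaf] = [MASK for _ in value] if isinstance(value, list) else MASK
    return result


cards = erase_dotted({"cards": ["a", "b"]}, "cards")
nested = erase_dotted({"address": {"street": "123 Main St", "city": "Anytown"}}, "address.street")
```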

Here are common scenarios to best visualize how to use fields.

You want to erase data in the card_number field.

Expression: data_masker.erase(data, fields=["card_number"])

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444"
}

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "*****"
}

You want to erase data in the postcode field.

Expression: data_masker.erase(data, fields=["address.postcode"])

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": {
        "postcode": 12345
    }
}

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": {
        "postcode": "*****"
    }
}

You want to erase data in both postcode and street fields.

Expression: data_masker.erase(data, fields=["address.postcode", "address.street"])

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": {
        "postcode": 12345,
        "street": "123 Any Street"
    }
}

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": {
        "postcode": "*****",
        "street": "*****"
    }
}

You want to erase data under address field.

Expression: data_masker.erase(data, fields=["address"])

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": [
        {
            "postcode": 12345,
            "street": "123 Any Street",
            "country": "United States",
            "timezone": "America/La_Paz"
        },
        {
            "postcode": 67890,
            "street": "100 Main Street",
            "country": "United States",
            "timezone": "America/Mazatlan"
        }
    ]
}

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": [
        "*****",
        "*****"
    ]
}

You want to erase data under name field.

Expression: data_masker.erase(data, fields=["category..name"])

{
    "category": {
        "subcategory": {
            "brand" : {
                "product": {
                    "name": "Car"
                }
            }
        }
    }
}

{
    "category": {
        "subcategory": {
            "brand" : {
                "product": {
                    "name": "*****"
                }
            }
        }
    }
}
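
The `..` in `category..name` matches `name` at any depth below `category`. A plain-Python sketch of that idea, assuming a simple recursive walk (the helper name `erase_key_everywhere` is hypothetical and this is not how the utility resolves paths):

```python
MASK = "*****"


def erase_key_everywhere(node, key):
    """Return a copy of node where every occurrence of key, at any depth, is masked."""
    if isinstance(node, dict):
        return {k: (MASK if k == key else erase_key_everywhere(v, key)) for k, v in node.items()}
    if isinstance(node, list):
        return [erase_key_everywhere(item, key) for item in node]
    return node  # scalars are returned unchanged


data = {"category": {"subcategory": {"brand": {"product": {"name": "Car"}}}}}
# Only search below "category", as the expression does
erased = {**data, "category": erase_key_everywhere(data["category"], "name")}
```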

You want to erase data under the street field at any index of the address list.

Expression: data_masker.erase(data, fields=["address[*].street"])

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": [
        {
            "postcode": 12345,
            "street": "123 Any Drive"
        },
        {
            "postcode": 67890,
            "street": "100 Main Street"
        }
    ]
}

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": [
        {
            "postcode": 12345,
            "street": "*****"
        },
        {
            "postcode": 67890,
            "street": "*****"
        }
    ]
}

You want to erase data by indexing into a list (here, the last element).

Expression: data_masker.erase(data, fields=["address[-1].street"])

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": [
        {
            "postcode": 12345,
            "street": "123 Any Street"
        },
        {
            "postcode": 67890,
            "street": "100 Main Street"
        },
        {
            "postcode": 78495,
            "street": "111 Any Drive"
        }
    ]
}

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": [
        {
            "postcode": 12345,
            "street": "123 Any Street"
        },
        {
            "postcode": 67890,
            "street": "100 Main Street"
        },
        {
            "postcode": 78495,
            "street": "*****"
        }
    ]
}

You want to erase data by filtering with a conditional expression.

Expression: data_masker.erase(data, fields=["$.address[?(@.postcode > 12000)]"])

$: Represents the root of the JSON structure.

.address: Selects the "address" property within the JSON structure.

(@.postcode > 12000): Specifies the condition that elements should meet. It selects elements where the value of the postcode property is greater than 12000.

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": [
        {
            "postcode": 12345,
            "street": "123 Any Drive"
        },
        {
            "postcode": 67890,
            "street": "111 Main Street"
        },
        {
            "postcode": 11111,
            "street": "100 Any Street"
        }
    ]
}

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "1111 2222 3333 4444",
    "address": [
        {
            "postcode": 12345,
            "street": "*****"
        },
        {
            "postcode": 67890,
            "street": "*****"
        },
        {
            "postcode": 11111,
            "street": "100 Any Street"
        }
    ]
}
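
To see which elements the filter `?(@.postcode > 12000)` selects, it can help to read it as an ordinary list comprehension, where `@` stands for the current list element. This is only an equivalent reading in plain Python, not a call into a JSONPath library.

```python
address = [
    {"postcode": 12345, "street": "123 Any Drive"},
    {"postcode": 67890, "street": "111 Main Street"},
    {"postcode": 11111, "street": "100 Any Street"},
]

# "@" in the JSONPath filter plays the role of "entry" here
selected = [entry for entry in address if entry["postcode"] > 12000]
selected_postcodes = [entry["postcode"] for entry in selected]
```

The third element (postcode 11111) does not satisfy the condition, so it is left out of the selection.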

For comprehensive guidance on JSONPath syntax, please refer to the official jsonpath-ng documentation.

JSON

We also support data in JSON string format as input. We automatically deserialize it, then handle each field operation as expected.

Note that the return value is the deserialized JSON with your chosen fields updated.

Expression: data_masker.erase(data, fields=["card_number", "address.postcode"])

'{"name": "Carlos", "operation": "non sensitive", "card_number": "1111 2222 3333 4444", "address": {"postcode": 12345}}'

{
    "name": "Carlos",
    "operation": "non sensitive",
    "card_number": "*****",
    "address": {
        "postcode": "*****"
    }
}

Advanced

Data serialization

Current limitations
  1. Python classes, Dataclasses, and Pydantic models are not supported yet.

Before we traverse the data structure, we perform two important operations on input data:

  1. If JSON string, deserialize using default or provided deserializer.
  2. If dictionary, normalize into JSON to prevent traversing unsupported data types.

When decrypting, we revert the operation to restore the original data structure.
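
The two pre-processing steps can be pictured as follows. This is a sketch that assumes normalization is a serialize-then-deserialize round trip; the actual internals may differ, and `normalize` is a hypothetical name. Its two optional parameters stand in for a custom serializer and deserializer.

```python
import json


def normalize(data, json_serializer=json.dumps, json_deserializer=json.loads):
    """Bring input into plain JSON-compatible Python types before traversal."""
    if isinstance(data, str):
        # Step 1: a JSON string is deserialized
        return json_deserializer(data)
    # Step 2: a dictionary is round-tripped so only JSON data types remain
    return json_deserializer(json_serializer(data))


from_string = normalize('{"card_number": "1111 2222 3333 4444"}')
from_dict = normalize({"cards": ("a", "b")})  # the tuple becomes a list
```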

For compatibility or performance, you can optionally pass your own JSON serializer and deserializer to replace json.dumps and json.loads respectively:

advanced_custom_serializer.py
from __future__ import annotations

import os

import ujson

from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import (
    AWSEncryptionSDKProvider,
)
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(
    keys=[KMS_KEY_ARN],
    json_serializer=ujson.dumps,
    json_deserializer=ujson.loads,
)
data_masker = DataMasking(provider=encryption_provider)


def lambda_handler(event: dict, context: LambdaContext) -> str:
    data: dict = event.get("body", {})

    return data_masker.encrypt(data)

Using multiple keys

When instantiating AWSEncryptionSDKProvider, you can use multiple KMS keys from more than one AWS account for higher availability.

using_multiple_keys.py
from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import (
    AWSEncryptionSDKProvider,
)
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN_1 = os.getenv("KMS_KEY_ARN_1", "")
KMS_KEY_ARN_2 = os.getenv("KMS_KEY_ARN_2", "")

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN_1, KMS_KEY_ARN_2])
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data: dict = event.get("body", {})

    logger.info("Encrypting the whole object")

    encrypted = data_masker.encrypt(data)

    return {"body": encrypted}

Providers

AWS Encryption SDK

You can modify the following values when initializing the AWSEncryptionSDKProvider to best accommodate your security and performance thresholds.

| Parameter | Default | Description |
| --- | --- | --- |
| local_cache_capacity | 100 | The maximum number of entries that can be retained in the local cryptographic materials cache |
| max_cache_age_seconds | 300 | The maximum time (in seconds) that a cache entry may be kept in the cache |
| max_messages_encrypted | 4294967296 | The maximum number of messages that may be encrypted under a cache entry |
| max_bytes_encrypted | 9223372036854775807 | The maximum number of bytes that may be encrypted under a cache entry |
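
One way to read the three per-entry limits is that a cached data key is reused only while it stays within all of them. The sketch below models that reading with hypothetical names (`CacheEntry`, `is_reusable`). The exact boundary conditions belong to the AWS Encryption SDK and may differ, and `local_cache_capacity` (how many entries fit in the cache) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    age_seconds: float
    messages_encrypted: int
    bytes_encrypted: int


def is_reusable(
    entry: CacheEntry,
    max_cache_age_seconds: float = 300,
    max_messages_encrypted: int = 4294967296,
    max_bytes_encrypted: int = 9223372036854775807,
) -> bool:
    """A cached data key is reused only while every limit still holds."""
    return (
        entry.age_seconds < max_cache_age_seconds
        and entry.messages_encrypted < max_messages_encrypted
        and entry.bytes_encrypted < max_bytes_encrypted
    )


fresh = is_reusable(CacheEntry(age_seconds=10, messages_encrypted=5, bytes_encrypted=1024))
expired = is_reusable(CacheEntry(age_seconds=301, messages_encrypted=5, bytes_encrypted=1024))
```

Lower limits mean data keys are rotated more often, at the price of more KMS calls.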

If required, you can customize the default values when initializing the AWSEncryptionSDKProvider class.

aws_encryption_provider_example.py
from __future__ import annotations

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import (
    AWSEncryptionSDKProvider,
)
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(
    keys=[KMS_KEY_ARN],
    local_cache_capacity=200,
    max_cache_age_seconds=400,
    max_messages_encrypted=200,
    max_bytes_encrypted=2000,
)

data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data: dict = event.get("body", {})

    logger.info("Encrypting the whole object")

    encrypted = data_masker.encrypt(data)

    return {"body": encrypted}

Passing additional SDK arguments

As an escape hatch mechanism, you can pass additional arguments to the AWSEncryptionSDKProvider via the provider_options parameter.

For example, the AWS Encryption SDK defaults to the AES_256_GCM_HKDF_SHA512_COMMIT_KEY_ECDSA_P384 algorithm suite when encrypting your data. If you want, you can choose a different encryption algorithm.

changing_default_algorithm.py
from __future__ import annotations

import os

from aws_encryption_sdk.identifiers import Algorithm

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider
from aws_lambda_powertools.utilities.typing import LambdaContext

KMS_KEY_ARN = os.getenv("KMS_KEY_ARN", "")

encryption_provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])
data_masker = DataMasking(provider=encryption_provider)

logger = Logger()


@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> str:
    data: dict = event.get("body", {})

    logger.info("Encrypting whole object with a different algorithm")

    provider_options = {"algorithm": Algorithm.AES_256_GCM_HKDF_SHA512_COMMIT_KEY}

    encrypted = data_masker.encrypt(
        data,
        provider_options=provider_options,
    )

    return encrypted

Data masking request flow

The following sequence diagrams explain how DataMasking behaves under different scenarios.

Erase operation

Erasing operations occur in-memory and we cannot recover the original value.

sequenceDiagram
    autonumber
    participant Client
    participant Lambda
    participant DataMasking as Data Masking (in memory)
    Client->>Lambda: Invoke (event)
    Lambda->>DataMasking: erase(data)
    DataMasking->>DataMasking: replaces data with *****
    Note over Lambda,DataMasking: No encryption providers involved.
    DataMasking->>Lambda: data masked
    Lambda-->>Client: Return response
Simple masking operation

Encrypt operation with Encryption SDK (KMS)

We call KMS to generate a unique data key that can be reused for multiple encrypt operations in-memory. This improves performance, reduces cost, and helps prevent throttling.

To make this operation simpler to visualize, we keep caching details in a separate sequence diagram. Caching is enabled by default.

sequenceDiagram
    autonumber
    participant Client
    participant Lambda
    participant DataMasking as Data Masking
    participant EncryptionProvider as Encryption Provider
    Client->>Lambda: Invoke (event)
    Lambda->>DataMasking: Init Encryption Provider with master key
    Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KMS_KEY])
    Lambda->>DataMasking: encrypt(data)
    DataMasking->>EncryptionProvider: Create unique data key
    Note over DataMasking,EncryptionProvider: KMS GenerateDataKey API
    DataMasking->>DataMasking: Cache new unique data key
    DataMasking->>DataMasking: DATA_KEY.encrypt(data)
    DataMasking->>DataMasking: MASTER_KEY.encrypt(DATA_KEY)
    DataMasking->>DataMasking: Create encrypted message
    Note over DataMasking: Encrypted message includes encrypted data, data key encrypted, algorithm, and more.
    DataMasking->>Lambda: Ciphertext from encrypted message
    Lambda-->>Client: Return response
Encrypting operation using envelope encryption.

Encrypt operation with multiple KMS Keys

When encrypting data with multiple KMS keys, the aws_encryption_sdk makes additional API calls to encrypt the data key with each of the specified keys.

sequenceDiagram
    autonumber
    participant Client
    participant Lambda
    participant DataMasking as Data Masking
    participant EncryptionProvider as Encryption Provider
    Client->>Lambda: Invoke (event)
    Lambda->>DataMasking: Init Encryption Provider with master key
    Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KEY_1, KEY_2])
    Lambda->>DataMasking: encrypt(data)
    DataMasking->>EncryptionProvider: Create unique data key
    Note over DataMasking,EncryptionProvider: KMS GenerateDataKey API - KEY_1
    DataMasking->>DataMasking: Cache new unique data key
    DataMasking->>DataMasking: DATA_KEY.encrypt(data)
    DataMasking->>DataMasking: KEY_1.encrypt(DATA_KEY)
    loop For every additional KMS Key
        DataMasking->>EncryptionProvider: Encrypt DATA_KEY
        Note over DataMasking,EncryptionProvider: KMS Encrypt API - KEY_2
    end
    DataMasking->>DataMasking: Create encrypted message
    Note over DataMasking: Encrypted message includes encrypted data, all data keys encrypted, algorithm, and more.
    DataMasking->>Lambda: Ciphertext from encrypted message
    Lambda-->>Client: Return response
Encrypting operation using envelope encryption.

Decrypt operation with Encryption SDK (KMS)

We call KMS to decrypt the encrypted data key available in the encrypted message. If successful, we run authentication (context) and integrity checks (algorithm, data key length, etc.) to validate the message.

Lastly, we decrypt the original encrypted data, throw away the decrypted data key for security reasons, and return the original plaintext data.

sequenceDiagram
    autonumber
    participant Client
    participant Lambda
    participant DataMasking as Data Masking
    participant EncryptionProvider as Encryption Provider
    Client->>Lambda: Invoke (event)
    Lambda->>DataMasking: Init Encryption Provider with master key
    Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KMS_KEY])
    Lambda->>DataMasking: decrypt(data)
    DataMasking->>EncryptionProvider: Decrypt encrypted data key
    Note over DataMasking,EncryptionProvider: KMS Decrypt API
    DataMasking->>DataMasking: Authentication and integrity checks
    DataMasking->>DataMasking: DATA_KEY.decrypt(data)
    DataMasking->>DataMasking: Discards decrypted data key
    DataMasking->>Lambda: Plaintext
    Lambda-->>Client: Return response
Decrypting operation using envelope encryption.
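
The encrypt and decrypt flows above can be sketched as a small toy model of envelope encryption. Everything in it is an assumption made for illustration: `toy_cipher` is an insecure XOR keystream standing in for the real algorithm, the master keys are local random bytes standing in for KMS keys, and `EncryptedMessage` is a simplified stand-in for the real message format. It also skips the authentication and integrity checks described above.

```python
from __future__ import annotations

import hashlib
import secrets
from dataclasses import dataclass


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR `data` with a SHA-256 keystream. NOT secure; illustration only.

    Applying it twice with the same key returns the original bytes.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class EncryptedMessage:
    ciphertext: bytes
    encrypted_data_keys: dict[str, bytes]  # master key id -> wrapped data key


def encrypt(plaintext: bytes, master_keys: dict[str, bytes]) -> EncryptedMessage:
    data_key = secrets.token_bytes(32)  # stands in for KMS GenerateDataKey
    ciphertext = toy_cipher(data_key, plaintext)  # DATA_KEY.encrypt(data)
    # One wrapped copy of the data key per master key (KMS Encrypt for extra keys)
    wrapped = {key_id: toy_cipher(mk, data_key) for key_id, mk in master_keys.items()}
    return EncryptedMessage(ciphertext, wrapped)


def decrypt(message: EncryptedMessage, key_id: str, master_key: bytes) -> bytes:
    # Stands in for KMS Decrypt: unwrap the data key with one master key
    data_key = toy_cipher(master_key, message.encrypted_data_keys[key_id])
    return toy_cipher(data_key, message.ciphertext)  # DATA_KEY.decrypt(data)


master_keys = {"KEY_1": secrets.token_bytes(32), "KEY_2": secrets.token_bytes(32)}
message = encrypt(b'{"card": "4111"}', master_keys)
plaintext = decrypt(message, "KEY_2", master_keys["KEY_2"])
```

Because the message carries one wrapped copy of the data key per master key, any single key in the list is enough to decrypt, which is the point of encrypting with multiple KMS keys.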

Caching encrypt operations with Encryption SDK

Without caching, every encrypt() operation would generate a new data key. This significantly increases latency and cost for ephemeral and short-running environments like Lambda.

With caching, we balance ephemeral Lambda environment performance characteristics with adjustable thresholds to meet your security needs.

Data key recycling

We request a new data key when a cached data key exceeds any of the following security thresholds:

  1. Max age in seconds
  2. Max number of encrypted messages
  3. Max bytes encrypted across all operations

sequenceDiagram
    autonumber
    participant Client
    participant Lambda
    participant DataMasking as Data Masking
    participant EncryptionProvider as Encryption Provider
    Client->>Lambda: Invoke (event)
    Lambda->>DataMasking: Init Encryption Provider with master key
    Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KMS_KEY])
    Lambda->>DataMasking: encrypt(data)
    DataMasking->>EncryptionProvider: Create unique data key
    Note over DataMasking,EncryptionProvider: KMS GenerateDataKey API
    DataMasking->>DataMasking: Cache new unique data key
    DataMasking->>DataMasking: DATA_KEY.encrypt(data)
    DataMasking->>DataMasking: MASTER_KEY.encrypt(DATA_KEY)
    DataMasking->>DataMasking: Create encrypted message
    Note over DataMasking: Encrypted message includes encrypted data, data key encrypted, algorithm, and more.
    DataMasking->>Lambda: Ciphertext from encrypted message
    Lambda->>DataMasking: encrypt(another_data)
    DataMasking->>DataMasking: Searches for data key in cache
    alt Is Data key in cache?
        DataMasking->>DataMasking: Reuses data key
    else Is Data key evicted from cache?
        DataMasking->>EncryptionProvider: Create unique data key
        DataMasking->>DataMasking: MASTER_KEY.encrypt(DATA_KEY)
    end
    DataMasking->>DataMasking: DATA_KEY.encrypt(data)
    DataMasking->>DataMasking: Create encrypted message
    DataMasking->>Lambda: Ciphertext from encrypted message
    Lambda-->>Client: Return response
Caching data keys during encrypt operation.
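
The recycling rules above can be sketched as a small cache that hands out a data key and replaces it once any of the three thresholds would be exceeded. This is a simplified model, not the Encryption SDK's cache: the `DataKeyCache` class, its parameter names, and the threshold values are assumptions for this example, and `generated` simply counts how often a real implementation would have called KMS.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass, field


@dataclass
class CachedDataKey:
    key: bytes
    created_at: float = field(default_factory=time.monotonic)
    messages: int = 0
    bytes_encrypted: int = 0


class DataKeyCache:
    """Hypothetical single-entry cache that enforces the three thresholds."""

    def __init__(self, max_age_seconds: float, max_messages: int, max_bytes: int):
        self.max_age_seconds = max_age_seconds
        self.max_messages = max_messages
        self.max_bytes = max_bytes
        self.generated = 0  # number of simulated GenerateDataKey calls
        self._entry = None

    def _new_key(self) -> CachedDataKey:
        self.generated += 1
        return CachedDataKey(key=secrets.token_bytes(32))

    def _usable(self, entry: CachedDataKey, size: int) -> bool:
        # The key is reusable only if this message keeps it within all thresholds
        return (
            time.monotonic() - entry.created_at <= self.max_age_seconds
            and entry.messages + 1 <= self.max_messages
            and entry.bytes_encrypted + size <= self.max_bytes
        )

    def key_for(self, plaintext_size: int) -> bytes:
        if self._entry is None or not self._usable(self._entry, plaintext_size):
            self._entry = self._new_key()  # cache miss or eviction
        self._entry.messages += 1
        self._entry.bytes_encrypted += plaintext_size
        return self._entry.key


cache = DataKeyCache(max_age_seconds=300, max_messages=2, max_bytes=1024)
first = cache.key_for(100)
second = cache.key_for(100)  # reuses the cached key
third = cache.key_for(100)   # message limit reached, so a new key is generated
```

Three encrypt operations cost two simulated KMS calls here instead of three. Looser thresholds reduce calls further, at the price of protecting more data with the same key.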

Testing your code

Testing erase operation

Testing your code with a simple erase operation

from dataclasses import dataclass

import pytest
import test_lambda_mask


@pytest.fixture
def lambda_context():
    @dataclass
    class LambdaContext:
        function_name: str = "test"
        memory_limit_in_mb: int = 128
        invoked_function_arn: str = "arn:aws:lambda:eu-west-1:111111111:function:test"
        aws_request_id: str = "52fdfc07-2182-154f-163f-5f0f9a621d72"

        def get_remaining_time_in_millis(self) -> int:
            return 5

    return LambdaContext()


def test_erase_lambda(lambda_context):
    # GIVEN: A sample event for testing
    event = {"testkey": "testvalue"}

    # WHEN: Invoking the lambda_handler function with the sample event and Lambda context
    result = test_lambda_mask.lambda_handler(event, lambda_context)

    # THEN: Assert that the result matches the expected output
    assert result == {"testkey": "*****"}

The handler under test, imported above as test_lambda_mask:

from __future__ import annotations

from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.typing import LambdaContext

data_masker = DataMasking()


def lambda_handler(event: dict, context: LambdaContext) -> dict:
    data = event

    erased = data_masker.erase(data, fields=["testkey"])

    return erased