Data Masking
The data masking utility can encrypt, decrypt, or irreversibly erase sensitive information to protect data confidentiality.
stateDiagram-v2
direction LR
LambdaFn: Your Lambda function
DataMasking: DataMasking
Operation: Possible operations
Input: Sensitive value
Erase: <strong>Erase</strong>
Encrypt: <strong>Encrypt</strong>
Decrypt: <strong>Decrypt</strong>
Provider: AWS Encryption SDK provider
Result: Data transformed <i>(erased, encrypted, or decrypted)</i>
LambdaFn --> DataMasking
DataMasking --> Operation
state Operation {
[*] --> Input
Input --> Erase: Irreversible
Input --> Encrypt
Input --> Decrypt
Encrypt --> Provider
Decrypt --> Provider
}
Operation --> Result
Key features¶
- Encrypt, decrypt, or irreversibly erase data with ease
- Erase sensitive information in one or more fields within nested data
- Seamless integration with AWS Encryption SDK for industry and AWS security best practices
Terminology¶
Erasing replaces sensitive information irreversibly with a non-sensitive placeholder (*****). This operation replaces data in-memory, making it a one-way action.
Encrypting transforms plaintext into ciphertext using an encryption algorithm and a cryptographic key. It allows you to encrypt any sensitive data, so that only authorized personnel can decrypt it. Learn more about encryption here.
Decrypting transforms ciphertext back into plaintext using a decryption algorithm and the correct decryption key.
Encryption context is non-secret key=value data used for authentication, like tenant_id:<id>. This adds extra security and confirms the relationship between the encrypted data and its context.
Encrypted message is a portable data structure that includes encrypted data along with copies of the encrypted data key. It includes everything the Encryption SDK needs to validate authenticity and integrity, and to decrypt with the right master key.
Envelope encryption uses two different keys to encrypt data safely: a master key and a data key. The data key encrypts the plaintext, and the master key encrypts the data key. It simplifies key management (you own the master key), limits the impact of a compromise to a single data key, and scales better with large data volumes.
graph LR
M(Master key) --> |Encrypts| D(Data key)
D(Data key) --> |Encrypts| S(Sensitive data)
Envelope encryption visualized.
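To make the key hierarchy concrete, here is a minimal, standard-library-only sketch of envelope encryption. It is not how the AWS Encryption SDK is implemented: `toy_cipher` is a hypothetical XOR keystream that stands in for a real cipher such as AES-GCM and must not be used to protect real data. The function names and the message layout are assumptions made for this illustration.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream.

    Toy stand-in for a real cipher. NOT secure; it only makes the key
    hierarchy visible. Applying it twice with the same key returns the
    original bytes.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> dict:
    data_key = secrets.token_bytes(32)  # fresh data key for this message
    return {
        # The data key encrypts the plaintext...
        "ciphertext": toy_cipher(data_key, plaintext),
        # ...and the master key encrypts the data key.
        "encrypted_data_key": toy_cipher(master_key, data_key),
    }


def envelope_decrypt(master_key: bytes, message: dict) -> bytes:
    # Recover the data key with the master key, then the plaintext with the data key.
    data_key = toy_cipher(master_key, message["encrypted_data_key"])
    return toy_cipher(data_key, message["ciphertext"])
```

The returned dictionary plays the role of the encrypted message: it carries the ciphertext together with the encrypted copy of the data key, so only the master key holder can recover the plaintext.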
Getting started¶
Tip
All examples shared in this documentation are available within the project repository.
Install¶
Add aws-lambda-powertools[datamasking] as a dependency in your preferred tool: e.g., requirements.txt, pyproject.toml. This will install the AWS Encryption SDK.
AWS Encryption SDK contains non-Python dependencies. This means you should use AWS SAM CLI or official build container images when building your application for AWS Lambda. Local development should work as expected.
Required resources¶
By default, we use AWS Key Management Service (KMS) for encryption and decryption operations.
Before you start, you will need a KMS symmetric key to encrypt and decrypt your data. Your Lambda function will need read and write access to it.
NOTE. We recommend setting a minimum of 1024MB of memory (CPU intensive), and separate Lambda functions for encrypt and decrypt. For more information, you can see the full reports of our load tests and traces.
- Key policy examples using IAM Roles
- SAM generated CloudFormation Resources
- Required only when using multiple keys
Erasing data¶
Erasing will remove the original data and replace it with *****. This means you cannot recover erased data, and the data type will change to str. The exception is data of an Iterable type (list, tuple, set): in that case the method returns a new object of the same type as the input, with each element replaced by the string *****.
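The type rules above can be sketched in a few lines of plain Python. This is an illustration of the described behavior only, not the library's implementation; `erase_value` and `MASK` are hypothetical names.

```python
MASK = "*****"


def erase_value(value):
    """Mimic the documented type rules for erasing a whole value."""
    # list, tuple, and set keep their type; each element becomes the mask.
    if isinstance(value, (list, tuple, set)):
        return type(value)(MASK for _ in value)
    # Everything else (int, str, dict, ...) collapses to a single str mask.
    return MASK
```

Note that a set built from identical masks collapses to a single element, because sets hold unique values.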
- See working with nested data to learn more about the fields parameter. If we omit the fields parameter, the entire dictionary will be erased with *****.
Encrypting data¶
About static typing and encryption
Encrypting data may lead to a different data type, as it always transforms into a string (<ciphertext>).
To encrypt, you will need an encryption provider. Here, we will use AWSEncryptionSDKProvider.
Under the hood, we delegate a number of operations to the AWS Encryption SDK: authentication, creating a portable encrypted message, and the actual data encryption.
- You can use more than one KMS Key for higher availability but increased latency. Encryption SDK will ensure the data key is encrypted with both keys.
Decrypting data¶
About static typing and decryption
Decrypting data may lead to a different data type, as encrypted data is always a string (<ciphertext>).
To decrypt, you will need an encryption provider. Here, we will use AWSEncryptionSDKProvider.
Under the hood, we delegate a number of operations to the AWS Encryption SDK: verifying authentication and integrity, and the actual ciphertext decryption.
NOTE. Decryption only works with KMS Key ARN.
- Note that KMS key alias or key ID won't work.
- You can use more than one KMS Key for higher availability but increased latency. Encryption SDK will call the Decrypt API with all master keys when trying to decrypt the data key.
Encryption context for integrity and authenticity¶
For a stronger security posture, you can add metadata to each encryption operation and verify it during decryption. This is known as additional authenticated data (AAD). It is non-sensitive data that helps protect the authenticity and integrity of your encrypted data, and can even help prevent a confused deputy situation.
Important considerations you should know
- Exact match verification on decrypt. Be careful using random data like timestamps as encryption context if you can't provide them on decrypt.
- Only string values are supported. We will raise DataMaskingUnsupportedTypeError for non-string values.
- Use non-sensitive data only. When using KMS, encryption context is available as plaintext in AWS CloudTrail, unless you intentionally disabled KMS events.
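As a rough illustration of why the context must match exactly, the sketch below binds a context to a ciphertext with an HMAC tag. This is a simplified stand-in for additional authenticated data, not the mechanism the AWS Encryption SDK or KMS uses; `seal`, `verify`, and the plain `TypeError` (in place of DataMaskingUnsupportedTypeError) are assumptions for this example.

```python
import hashlib
import hmac
import json


def _canonical(context: dict) -> bytes:
    """Serialize the context deterministically; only string values are allowed."""
    for key, value in context.items():
        if not isinstance(value, str):
            raise TypeError(f"context value for {key!r} must be a string")
    return json.dumps(context, sort_keys=True, separators=(",", ":")).encode()


def seal(key: bytes, ciphertext: bytes, context: dict) -> bytes:
    """Compute a tag that covers both the ciphertext and its context."""
    return hmac.new(key, _canonical(context) + b"|" + ciphertext, hashlib.sha256).digest()


def verify(key: bytes, ciphertext: bytes, context: dict, tag: bytes) -> bool:
    """Succeed only when the exact same context is supplied again."""
    return hmac.compare_digest(seal(key, ciphertext, context), tag)
```

Because the tag covers the context, a value that cannot be reproduced at decrypt time (a timestamp, for instance) makes verification fail even though the ciphertext itself is untouched.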
- They must match on decrypt() otherwise the operation will fail with DataMaskingContextMismatchError.
- They must match otherwise the operation will fail with DataMaskingContextMismatchError.
Choosing parts of your data¶
Current limitations
- The fields parameter is not yet supported in encrypt and decrypt operations.
- We support JSON data types only - see data serialization for more details.
You can use the fields parameter with the dot notation . to choose one or more parts of your data to erase. This is useful when you want to keep data structure intact except the confidential fields.
When fields is present, erase behaves differently:
Operation | Behavior | Example | Result |
---|---|---|---|
erase | Replace data while keeping collections type intact. | {"cards": ["a", "b"]} | {"cards": ["*****", "*****"]} |
Here are common scenarios to best visualize how to use fields.
You want to erase data in the card_number field.
Expression: data_masker.erase(data, fields=["card_number"])
You want to erase data in the postcode field.
Expression: data_masker.erase(data, fields=["address.postcode"])
You want to erase data in both postcode and street fields.
Expression: data_masker.erase(data, fields=["address.postcode", "address.street"])
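To show what the dot notation means in practice, here is a small plain-Python sketch that walks each dotted path and masks the value it points at. It only illustrates the idea for nested dictionaries; `erase_fields` is a hypothetical helper, not the library's implementation, and it works on a copy purely so the before and after can be compared.

```python
import copy

MASK = "*****"


def erase_fields(data: dict, fields: list) -> dict:
    """Mask the values addressed by dotted paths such as "address.postcode"."""
    result = copy.deepcopy(data)  # keep the caller's data untouched in this sketch
    for path in fields:
        *parents, leaf = path.split(".")
        node = result
        for key in parents:  # descend to the parent of the target field
            node = node[key]
        node[leaf] = MASK  # replace the target, whatever its type was
    return result
```

Fields that are not listed keep their original values, so the overall structure of the payload stays intact.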
You want to erase data under the address field.
Expression: data_masker.erase(data, fields=["address"])
You want to erase data under the name field.
Expression: data_masker.erase(data, fields=["category..name"])
You want to erase data under the street field located at any index of the address list.
Expression: data_masker.erase(data, fields=["address[*].street"])
You want to erase data by slicing a list.
Expression: data_masker.erase(data, fields=["address[-1].street"])
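For readers less familiar with JSONPath, the wildcard and index expressions above correspond roughly to the following plain-Python operations. The sample payload is made up for this illustration, and the code does not call the library.

```python
import copy

MASK = "*****"

data = {
    "name": "Lessa",
    "address": [
        {"postcode": 12345, "street": "123 Any Street"},
        {"postcode": 67890, "street": "100 Main Street"},
    ],
}

# address[*].street -> the street field of every element in the list
erased_all = copy.deepcopy(data)
for entry in erased_all["address"]:
    entry["street"] = MASK

# address[-1].street -> the street field of the last element only
erased_last = copy.deepcopy(data)
erased_last["address"][-1]["street"] = MASK
```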
You want to erase data by finding a field with a conditional expression.
Expression: data_masker.erase(data, fields=["$.address[?(@.postcode > 12000)]"])
$: Represents the root of the JSON structure.
.address: Selects the "address" property within the JSON structure.
(@.postcode > 12000): Specifies the condition that elements should meet. It selects elements where the value of the postcode property is greater than 12000.
For comprehensive guidance on JSONPath syntax, refer to the official jsonpath-ng documentation.
JSON¶
We also support data in JSON string format as input. We automatically deserialize it, then handle each field operation as expected.
Note that the return value will be the deserialized JSON with your desired fields updated.
Expression: data_masker.erase(data, fields=["card_number", "address.postcode"])
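The flow for JSON string input can be sketched as "deserialize, then erase by path". The helper below is a hypothetical stand-in written with the standard library only; it shows why the result is a dictionary rather than a string.

```python
import json

MASK = "*****"


def erase_json(payload: str, fields: list) -> dict:
    """Deserialize a JSON string, then mask the dotted paths in the result."""
    data = json.loads(payload)  # the input string becomes a dict here
    for path in fields:
        *parents, leaf = path.split(".")
        node = data
        for key in parents:
            node = node[key]
        node[leaf] = MASK
    return data  # deserialized data, not a JSON string
```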
Advanced¶
Data serialization¶
Current limitations
- Python classes, Dataclasses, and Pydantic models are not supported yet.
Before we traverse the data structure, we perform two important operations on input data:
- If JSON string, deserialize using default or provided deserializer.
- If dictionary, normalize into JSON to prevent traversing unsupported data types.
When decrypting, we revert the operation to restore the original data structure.
For compatibility or performance, you can optionally pass your own JSON serializer and deserializer to replace json.dumps and json.loads respectively:
Example: advanced_custom_serializer.py
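Separately from the example file named above, the sketch below shows what a replacement serializer and deserializer pair could look like: any callables with the same shape as json.dumps and json.loads. The names `compact_serializer` and `deserializer`, and the choice to render Decimal values as strings, are assumptions for this illustration; how the pair is handed to the provider is not shown here.

```python
import functools
import json
from decimal import Decimal


def _default(obj):
    """Fallback for types json.dumps cannot handle natively."""
    if isinstance(obj, Decimal):
        return str(obj)  # assumption: keep Decimal precision by storing it as a string
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Same call shape as json.dumps, but with compact output and Decimal support.
compact_serializer = functools.partial(json.dumps, separators=(",", ":"), default=_default)

# Same call shape as json.loads; swap in a faster library here if needed.
deserializer = json.loads
```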
Using multiple keys¶
You can use multiple KMS keys from more than one AWS account for higher availability when instantiating AWSEncryptionSDKProvider.
Example: using_multiple_keys.py
Providers¶
AWS Encryption SDK¶
You can modify the following values when initializing the AWSEncryptionSDKProvider to best accommodate your security and performance thresholds.
Parameter | Default | Description |
---|---|---|
local_cache_capacity | 100 | The maximum number of entries that can be retained in the local cryptographic materials cache |
max_cache_age_seconds | 300 | The maximum time (in seconds) that a cache entry may be kept in the cache |
max_messages_encrypted | 4294967296 | The maximum number of messages that may be encrypted under a cache entry |
max_bytes_encrypted | 9223372036854775807 | The maximum number of bytes that may be encrypted under a cache entry |
If required, you can customize the default values when initializing the AWSEncryptionSDKProvider class.
Example: aws_encryption_provider_example.py
Passing additional SDK arguments¶
As an escape hatch mechanism, you can pass additional arguments to the AWSEncryptionSDKProvider via the provider_options parameter.
For example, the AWS Encryption SDK defaults to using the AES_256_GCM_HKDF_SHA512_COMMIT_KEY_ECDSA_P384 algorithm for encrypting your Data Key. You can choose a different encryption algorithm if needed.
Example: changing_default_algorithm.py
Data masking request flow¶
The following sequence diagrams explain how DataMasking behaves under different scenarios.
Erase operation¶
Erasing operations occur in-memory and we cannot recover the original value.
sequenceDiagram
autonumber
participant Client
participant Lambda
participant DataMasking as Data Masking (in memory)
Client->>Lambda: Invoke (event)
Lambda->>DataMasking: erase(data)
DataMasking->>DataMasking: replaces data with *****
Note over Lambda,DataMasking: No encryption providers involved.
DataMasking->>Lambda: data masked
Lambda-->>Client: Return response
Simple masking operation
Encrypt operation with Encryption SDK (KMS)¶
We call KMS to generate a unique data key that can be used for multiple encrypt operations in-memory. This improves performance, reduces cost, and prevents throttling.
To make this operation simpler to visualize, we keep caching details in a separate sequence diagram. Caching is enabled by default.
sequenceDiagram
autonumber
participant Client
participant Lambda
participant DataMasking as Data Masking
participant EncryptionProvider as Encryption Provider
Client->>Lambda: Invoke (event)
Lambda->>DataMasking: Init Encryption Provider with master key
Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KMS_KEY])
Lambda->>DataMasking: encrypt(data)
DataMasking->>EncryptionProvider: Create unique data key
Note over DataMasking,EncryptionProvider: KMS GenerateDataKey API
DataMasking->>DataMasking: Cache new unique data key
DataMasking->>DataMasking: DATA_KEY.encrypt(data)
DataMasking->>DataMasking: MASTER_KEY.encrypt(DATA_KEY)
DataMasking->>DataMasking: Create encrypted message
Note over DataMasking: Encrypted message includes encrypted data, data key encrypted, algorithm, and more.
DataMasking->>Lambda: Ciphertext from encrypted message
Lambda-->>Client: Return response
Encrypting operation using envelope encryption.
Encrypt operation with multiple KMS Keys¶
When encrypting data with multiple KMS keys, the aws_encryption_sdk makes additional API calls to encrypt the data key with each of the specified keys.
sequenceDiagram
autonumber
participant Client
participant Lambda
participant DataMasking as Data Masking
participant EncryptionProvider as Encryption Provider
Client->>Lambda: Invoke (event)
Lambda->>DataMasking: Init Encryption Provider with master key
Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KEY_1, KEY_2])
Lambda->>DataMasking: encrypt(data)
DataMasking->>EncryptionProvider: Create unique data key
Note over DataMasking,EncryptionProvider: KMS GenerateDataKey API - KEY_1
DataMasking->>DataMasking: Cache new unique data key
DataMasking->>DataMasking: DATA_KEY.encrypt(data)
DataMasking->>DataMasking: KEY_1.encrypt(DATA_KEY)
loop For every additional KMS Key
DataMasking->>EncryptionProvider: Encrypt DATA_KEY
Note over DataMasking,EncryptionProvider: KMS Encrypt API - KEY_2
end
DataMasking->>DataMasking: Create encrypted message
Note over DataMasking: Encrypted message includes encrypted data, all data keys encrypted, algorithm, and more.
DataMasking->>Lambda: Ciphertext from encrypted message
Lambda-->>Client: Return response
Encrypting operation using envelope encryption.
Decrypt operation with Encryption SDK (KMS)¶
We call KMS to decrypt the encrypted data key available in the encrypted message. If successful, we run authentication (context) and integrity checks (algorithm, data key length, etc.) before proceeding.
Lastly, we decrypt the original encrypted data, throw away the decrypted data key for security reasons, and return the original plaintext data.
sequenceDiagram
autonumber
participant Client
participant Lambda
participant DataMasking as Data Masking
participant EncryptionProvider as Encryption Provider
Client->>Lambda: Invoke (event)
Lambda->>DataMasking: Init Encryption Provider with master key
Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KMS_KEY])
Lambda->>DataMasking: decrypt(data)
DataMasking->>EncryptionProvider: Decrypt encrypted data key
Note over DataMasking,EncryptionProvider: KMS Decrypt API
DataMasking->>DataMasking: Authentication and integrity checks
DataMasking->>DataMasking: DATA_KEY.decrypt(data)
DataMasking->>DataMasking: MASTER_KEY.encrypt(DATA_KEY)
DataMasking->>DataMasking: Discards decrypted data key
DataMasking->>Lambda: Plaintext
Lambda-->>Client: Return response
Decrypting operation using envelope encryption.
Caching encrypt operations with Encryption SDK¶
Without caching, every encrypt() operation would generate a new data key. This significantly increases latency and cost in ephemeral, short-running environments like Lambda.
With caching, we balance the performance characteristics of the ephemeral Lambda environment with adjustable thresholds to meet your security needs.
Data key recycling
We request a new data key when a cached data key exceeds any of the following security thresholds:
- Max age in seconds
- Max number of encrypted messages
- Max bytes encrypted across all operations
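The recycling rule can be sketched as a small bookkeeping class: reuse the cached data key only while all three thresholds hold, otherwise request a new one. This is an illustration of the rule as described here, not the Encryption SDK's cache implementation; the class and method names are hypothetical, and the defaults mirror the provider defaults listed in this document.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CachedDataKey:
    """Usage counters for one cached data key (illustrative only)."""

    max_age_seconds: float = 300
    max_messages_encrypted: int = 4294967296
    max_bytes_encrypted: int = 9223372036854775807
    created_at: float = field(default_factory=time.monotonic)
    messages: int = 0
    bytes_encrypted: int = 0

    def can_encrypt(self, size, now=None):
        """True if one more message of `size` bytes stays within every threshold."""
        now = time.monotonic() if now is None else now
        return (
            now - self.created_at <= self.max_age_seconds
            and self.messages + 1 <= self.max_messages_encrypted
            and self.bytes_encrypted + size <= self.max_bytes_encrypted
        )

    def record(self, size):
        """Account for a message that was encrypted with this key."""
        self.messages += 1
        self.bytes_encrypted += size
```

A caller would check `can_encrypt()` before each operation and request a fresh data key as soon as it returns False.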
sequenceDiagram
autonumber
participant Client
participant Lambda
participant DataMasking as Data Masking
participant EncryptionProvider as Encryption Provider
Client->>Lambda: Invoke (event)
Lambda->>DataMasking: Init Encryption Provider with master key
Note over Lambda,DataMasking: AWSEncryptionSDKProvider([KMS_KEY])
Lambda->>DataMasking: encrypt(data)
DataMasking->>EncryptionProvider: Create unique data key
Note over DataMasking,EncryptionProvider: KMS GenerateDataKey API
DataMasking->>DataMasking: Cache new unique data key
DataMasking->>DataMasking: DATA_KEY.encrypt(data)
DataMasking->>DataMasking: MASTER_KEY.encrypt(DATA_KEY)
DataMasking->>DataMasking: Create encrypted message
Note over DataMasking: Encrypted message includes encrypted data, data key encrypted, algorithm, and more.
DataMasking->>Lambda: Ciphertext from encrypted message
Lambda->>DataMasking: encrypt(another_data)
DataMasking->>DataMasking: Searches for data key in cache
alt Is Data key in cache?
DataMasking->>DataMasking: Reuses data key
else Is Data key evicted from cache?
DataMasking->>EncryptionProvider: Create unique data key
DataMasking->>DataMasking: MASTER_KEY.encrypt(DATA_KEY)
end
DataMasking->>DataMasking: DATA_KEY.encrypt(data)
DataMasking->>DataMasking: Create encrypted message
DataMasking->>Lambda: Ciphertext from encrypted message
Lambda-->>Client: Return response
Caching data keys during encrypt operation.
Testing your code¶
Testing erase operation¶
Testing your code with a simple erase operation