Batch Processing
Warning
This utility is currently released as a beta developer preview and is intended strictly for feedback and testing purposes, not for production workloads. This version and all future versions tagged with the -beta suffix should be treated as unstable. Until the General Availability release we might introduce significant breaking changes and improvements in response to customer feedback.
The batch processing utility handles partial failures when processing batches from Amazon SQS, Amazon Kinesis Data Streams, and Amazon DynamoDB Streams.
stateDiagram-v2
direction LR
BatchSource: Amazon SQS <br/><br/> Amazon Kinesis Data Streams <br/><br/> Amazon DynamoDB Streams <br/><br/>
LambdaInit: Lambda invocation
BatchProcessor: Batch Processor
RecordHandler: Record Handler function
YourLogic: Your logic to process each batch item
LambdaResponse: Lambda response
BatchSource --> LambdaInit
LambdaInit --> BatchProcessor
BatchProcessor --> RecordHandler
state BatchProcessor {
[*] --> RecordHandler: Your function
RecordHandler --> YourLogic
}
RecordHandler --> BatchProcessor: Collect results
BatchProcessor --> LambdaResponse: Report items that failed processing
Key features
- Reports batch item failures to reduce the number of retries for a record upon errors
- Simple interface to process each batch record
- Build your own batch processor by extending primitives
Background
When using SQS, Kinesis Data Streams, or DynamoDB Streams as a Lambda event source, your Lambda functions are triggered with a batch of messages.
If your function fails to process any message from the batch, the entire batch returns to your queue or stream. This same batch is then retried until one of the following conditions is met first: a) your Lambda function returns a successful response, b) the record reaches its maximum retry attempts, or c) the records expire.
journey
section Conditions
Successful response: 5: Success
Maximum retries: 3: Failure
Records expired: 1: Failure
This behavior changes when you enable the Report Batch Item Failures feature in your Lambda function event source configuration:
- SQS queues. Only messages reported as failures will return to the queue for a retry, while successful ones will be deleted.
- Kinesis data streams and DynamoDB streams. A single reported failure will use its sequence number as the stream checkpoint. Multiple reported failures will use the lowest sequence number as the checkpoint.
Warning: This utility lowers the chance of processing records more than once; it does not guarantee it
We recommend implementing processing logic in an idempotent manner wherever possible.
You can find more details on how Lambda works with either SQS, Kinesis, or DynamoDB in the AWS Documentation.
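To make the contract concrete: when Report Batch Item Failures is enabled, Lambda expects the function response to list the identifiers of failed records. The `itemIdentifier` field name is part of the Lambda event source mapping contract; the helper below is an illustrative sketch, not part of this utility:

```typescript
// Shape of the partial-failure response Lambda expects when
// ReportBatchItemFailures is enabled on the event source mapping.
type BatchItemFailure = { itemIdentifier: string };
type PartialBatchResponse = { batchItemFailures: BatchItemFailure[] };

// Given the IDs of records that failed, build the response.
// An empty array tells Lambda the whole batch succeeded.
const buildResponse = (failedIds: string[]): PartialBatchResponse => ({
  batchItemFailures: failedIds.map((id) => ({ itemIdentifier: id })),
});
```

For SQS the identifier is the message ID; for Kinesis and DynamoDB Streams it is the record's sequence number.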
Getting started
For this feature to work, you need to (1) configure your Lambda function event source to use ReportBatchItemFailures, and (2) return a specific response to report which records failed to be processed.
You use your preferred deployment framework to set the correct configuration, while this utility handles the correct response to be returned.
Required resources
The remaining sections of the documentation will rely on these samples. For completeness, this demonstrates IAM permissions and a Dead Letter Queue where batch records will be sent after two retries have been attempted.
You do not need any additional IAM permissions to use this utility, except for what each event source requires.
template.yaml
Processing messages from SQS
Processing batches from SQS works in three stages:
- Instantiate `BatchProcessor` and choose `EventType.SQS` for the event type
- Define your function to handle each batch record, and use the `SQSRecord` type annotation for autocompletion
- Use `processPartialResponse` to kick off processing
Info
This code example optionally uses Logger for completeness.
- Step 1. Creates a partial failure batch processor for SQS queues. See partial failure mechanics for details
- Step 2. Defines a function to receive one record at a time from the batch
- Step 3. Kicks off processing
The second record failed to be processed; therefore, the processor added its message ID to the response.
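The mechanics behind the three stages above can be sketched in a few lines of plain TypeScript. The types and function here are illustrative stand-ins for the utility's internals, not its API:

```typescript
// Sketch of partial-failure mechanics for SQS standard queues:
// every record is passed to the handler, and failures are collected
// by messageId so only they return to the queue.
type SqsRecordLike = { messageId: string; body: string };

const processSqsBatch = (
  records: SqsRecordLike[],
  recordHandler: (record: SqsRecordLike) => void
): { batchItemFailures: { itemIdentifier: string }[] } => {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of records) {
    try {
      recordHandler(record);
    } catch {
      // Keep going: only the failed record is reported back to Lambda.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
};
```

Because only the failed message IDs are returned, successful messages are deleted from the queue while the reported ones are retried.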
FIFO queues
When using SQS FIFO queues, we will stop processing messages after the first failure, and return all failed and unprocessed messages in `batchItemFailures`.
This helps preserve the ordering of messages in your queue.
- Step 1. Creates a partial failure batch processor for SQS FIFO queues. See partial failure mechanics for details
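The stop-on-first-failure behavior can be sketched like this (illustrative types and names, not the utility's API):

```typescript
// Sketch of FIFO semantics: stop at the first failure and report that
// record plus every unprocessed one, preserving message ordering.
type FifoRecordLike = { messageId: string; body: string };

const processFifoBatch = (
  records: FifoRecordLike[],
  recordHandler: (record: FifoRecordLike) => void
): { batchItemFailures: { itemIdentifier: string }[] } => {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (let i = 0; i < records.length; i++) {
    try {
      recordHandler(records[i]);
    } catch {
      // First failure: short-circuit and mark the remainder as failed too,
      // so they return to the queue in their original order.
      for (const remaining of records.slice(i)) {
        batchItemFailures.push({ itemIdentifier: remaining.messageId });
      }
      break;
    }
  }
  return { batchItemFailures };
};
```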
Processing messages from Kinesis
Processing batches from Kinesis works in three stages:
- Instantiate `BatchProcessor` and choose `EventType.KinesisDataStreams` for the event type
- Define your function to handle each batch record, and use the `KinesisStreamRecord` type annotation for autocompletion
- Use `processPartialResponse` to kick off processing
Info
This code example optionally uses Logger for completeness.
- Step 1. Creates a partial failure batch processor for Kinesis Data Streams. See partial failure mechanics for details
The second record failed to be processed; therefore, the processor added its sequence number to the response.
Processing messages from DynamoDB
Processing batches from DynamoDB Streams works in three stages:
- Instantiate `BatchProcessor` and choose `EventType.DynamoDBStreams` for the event type
- Define your function to handle each batch record, and use the `DynamoDBRecord` type annotation for autocompletion
- Use `processPartialResponse` to kick off processing
Info
This code example optionally uses Logger for completeness.
- Step 1. Creates a partial failure batch processor for DynamoDB Streams. See partial failure mechanics for details
The second record failed to be processed; therefore, the processor added its sequence number to the response.
Error handling
By default, we catch any exception raised by your record handler function. This allows us to (1) continue processing the batch, (2) collect each batch item that failed processing, and (3) return the appropriate response correctly without failing your Lambda function execution.
- Any exception works here. See the extending BatchProcessor section if you want to override this behavior.
- Exceptions raised in `recordHandler` will propagate to `processPartialResponse`. We catch them and include each failed batch item identifier in the response object (see the Sample response tab).
Partial failure mechanics
All records in the batch will be passed to this handler for processing, even if exceptions are thrown. Here's the behavior after completing the batch:
- All records successfully processed. We will return an empty list of item failures `{ batchItemFailures: [] }`
- Partial success with some exceptions. We will return a list of all item IDs/sequence numbers that failed processing
- All records failed to be processed. We will throw a `BatchProcessingError` exception with a list of all the exceptions raised during processing
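The three outcomes above can be sketched as follows; `BatchProcessingError` mirrors the exception named above, while the other names are illustrative:

```typescript
// Sketch of the three possible outcomes after a batch completes.
class BatchProcessingError extends Error {}

type Failure = { itemIdentifier: string };

const respond = (
  total: number,
  failures: Failure[],
  errors: Error[]
): { batchItemFailures: Failure[] } => {
  if (total > 0 && failures.length === total) {
    // Every record failed: surface the collected errors instead of a response.
    throw new BatchProcessingError(
      `All records failed processing (${errors.length} errors)`
    );
  }
  // Full success yields an empty list; partial success lists the failed IDs.
  return { batchItemFailures: failures };
};
```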
The following sequence diagrams explain how each Batch processor behaves under different scenarios.
SQS Standard
Read more about the Batch Failure Reporting feature in AWS Lambda.
Sequence diagram to explain how `BatchProcessor` works with SQS Standard queues.
sequenceDiagram
autonumber
participant SQS queue
participant Lambda service
participant Lambda function
Lambda service->>SQS queue: Poll
Lambda service->>Lambda function: Invoke (batch event)
Lambda function->>Lambda service: Report some failed messages
activate SQS queue
Lambda service->>SQS queue: Delete successful messages
SQS queue-->>SQS queue: Failed messages return
Note over SQS queue,Lambda service: Process repeat
deactivate SQS queue
SQS mechanism with Batch Item Failures
SQS FIFO
Read more about the Batch Failure Reporting feature in AWS Lambda.
Sequence diagram to explain how `SqsFifoPartialProcessor` works with SQS FIFO queues.
sequenceDiagram
autonumber
participant SQS queue
participant Lambda service
participant Lambda function
Lambda service->>SQS queue: Poll
Lambda service->>Lambda function: Invoke (batch event)
activate Lambda function
Lambda function-->Lambda function: Process 2 out of 10 batch items
Lambda function--xLambda function: Fail on 3rd batch item
Lambda function->>Lambda service: Report 3rd batch item and unprocessed messages as failure
deactivate Lambda function
activate SQS queue
Lambda service->>SQS queue: Delete successful messages (1-2)
SQS queue-->>SQS queue: Failed messages return (3-10)
deactivate SQS queue
SQS FIFO mechanism with Batch Item Failures
Kinesis and DynamoDB Streams
Read more about the Batch Failure Reporting feature.
Sequence diagram to explain how `BatchProcessor` works with both Kinesis Data Streams and DynamoDB Streams.
For brevity, we will use `Streams` to refer to either service. For theory on stream checkpoints, see this blog post.
sequenceDiagram
autonumber
participant Streams
participant Lambda service
participant Lambda function
Lambda service->>Streams: Poll latest records
Lambda service->>Lambda function: Invoke (batch event)
activate Lambda function
Lambda function-->Lambda function: Process 2 out of 10 batch items
Lambda function--xLambda function: Fail on 3rd batch item
Lambda function-->Lambda function: Continue processing batch items (4-10)
Lambda function->>Lambda service: Report batch item as failure (3)
deactivate Lambda function
activate Streams
Lambda service->>Streams: Checkpoints to sequence number from 3rd batch item
Lambda service->>Streams: Poll records starting from updated checkpoint
deactivate Streams
Kinesis and DynamoDB streams mechanism with single batch item failure
The behavior changes slightly when there are multiple item failures: the stream checkpoint is updated to the lowest sequence number reported.
Note that a batch item's sequence number can differ from the batch item number in the illustration.
sequenceDiagram
autonumber
participant Streams
participant Lambda service
participant Lambda function
Lambda service->>Streams: Poll latest records
Lambda service->>Lambda function: Invoke (batch event)
activate Lambda function
Lambda function-->Lambda function: Process 2 out of 10 batch items
Lambda function--xLambda function: Fail on 3-5 batch items
Lambda function-->Lambda function: Continue processing batch items (6-10)
Lambda function->>Lambda service: Report batch items as failure (3-5)
deactivate Lambda function
activate Streams
Lambda service->>Streams: Checkpoints to lowest sequence number
Lambda service->>Streams: Poll records starting from updated checkpoint
deactivate Streams
Kinesis and DynamoDB streams mechanism with multiple batch item failures
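The lowest-sequence-number rule can be sketched in a few lines (illustrative helper, not part of this utility). Sequence numbers are compared via BigInt because they are large numeric strings that don't sort correctly lexicographically:

```typescript
// Given the reported failures, find the sequence number the stream
// checkpoint is moved to: the lowest one reported.
type StreamFailure = { itemIdentifier: string };

const checkpointFrom = (failures: StreamFailure[]): string | undefined =>
  failures.length === 0
    ? undefined
    : failures
        .map((f) => f.itemIdentifier)
        .reduce((lowest, id) => (BigInt(id) < BigInt(lowest) ? id : lowest));
```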
Processing messages asynchronously
You can use the `AsyncBatchProcessor` class and the `asyncProcessPartialResponse` function to process messages concurrently.
When is this useful?
Your use case might be able to process multiple records at the same time without conflicting with one another.
For example, imagine you need to process multiple loyalty points and incrementally save them in a database. While you wait for the database to confirm your records are saved, you could start processing another record concurrently.
The reason this is not the default behavior is that not all use cases can handle concurrency safely (e.g., loyalty points must be updated in order).
High-concurrency with AsyncBatchProcessor
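A sketch of the concurrent approach using `Promise.allSettled`, with illustrative types rather than this utility's API:

```typescript
// Sketch of concurrent processing: all records start at once, and
// rejected promises become the reported failures.
type AsyncRecordLike = { messageId: string; body: string };

const processConcurrently = async (
  records: AsyncRecordLike[],
  recordHandler: (record: AsyncRecordLike) => Promise<void>
): Promise<{ batchItemFailures: { itemIdentifier: string }[] }> => {
  // Promise.allSettled never rejects, so one bad record can't sink the batch.
  const results = await Promise.allSettled(records.map((r) => recordHandler(r)));
  return {
    batchItemFailures: records
      .filter((_, i) => results[i].status === 'rejected')
      .map((r) => ({ itemIdentifier: r.messageId })),
  };
};
```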
Advanced
Accessing processed messages
Use the `BatchProcessor` directly in your function to access a list of all returned values from your `recordHandler` function.
- When successful. We will include a tuple with `success`, the result of `recordHandler`, and the batch record
- When failed. We will include a tuple with `fail`, the exception as a string, and the batch record
- The processor requires the records array. This is typically handled by `processPartialResponse`.
- You need to register the `batch`, the `recordHandler` function, and optionally the `context` to access the Lambda context.
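The success/failure tuples can be sketched like this (illustrative names, not the utility's internals):

```typescript
// Sketch of collecting processed messages as tuples: either
// ['success', result, record] or ['fail', errorAsString, record].
type MsgRecord = { messageId: string };
type ProcessedItem =
  | ['success', unknown, MsgRecord]
  | ['fail', string, MsgRecord];

const processAndCollect = (
  records: MsgRecord[],
  recordHandler: (record: MsgRecord) => unknown
): ProcessedItem[] =>
  records.map((record): ProcessedItem => {
    try {
      return ['success', recordHandler(record), record];
    } catch (err) {
      // The exception is stringified so the tuple stays serializable.
      return ['fail', String(err), record];
    }
  });
```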
Accessing Lambda Context
Within your `recordHandler` function, you might need access to the Lambda context to determine how much time you have left before your function times out.
We can automatically inject the Lambda context into your `recordHandler` as an optional second argument if you register it when using `BatchProcessor` or the `processPartialResponse` function.
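A sketch of how the optional second argument works. The context type is illustrative, though `getRemainingTimeInMillis` is the real Lambda context method:

```typescript
// Sketch of context injection: when a context is registered, the handler
// receives it as an optional second argument.
type CtxRecord = { messageId: string };
type LambdaContextLike = { getRemainingTimeInMillis: () => number };

type CtxHandler = (record: CtxRecord, context?: LambdaContextLike) => void;

const runWithContext = (
  records: CtxRecord[],
  recordHandler: CtxHandler,
  context?: LambdaContextLike
): void => {
  for (const record of records) {
    // Pass the context through only when it was registered.
    recordHandler(record, context);
  }
};
```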
Extending BatchProcessor
You might want to bring custom logic to the existing `BatchProcessor` to slightly override how we handle successes and failures.
For these scenarios, you can subclass `BatchProcessor` and quickly override the `successHandler` and `failureHandler` methods:
- `successHandler()` – keeps track of successful batch records
- `failureHandler()` – keeps track of failed batch records
Example
Let's suppose you'd like to add a metric named `BatchRecordFailures` for each batch record that failed processing.
Extending failure handling mechanism in BatchProcessor
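The subclassing pattern can be sketched as follows; the base class here is an illustrative stand-in for `BatchProcessor`, and the metric emission is reduced to a counter:

```typescript
// Sketch of overriding failureHandler in a subclass to act on each
// failed record (e.g. to emit a BatchRecordFailures metric).
type AnyRecord = { messageId: string };
type FailTuple = ['fail', string, AnyRecord];

class SketchProcessor {
  public failures: FailTuple[] = [];
  public failureHandler(record: AnyRecord, error: Error): FailTuple {
    const entry: FailTuple = ['fail', error.message, record];
    this.failures.push(entry);
    return entry;
  }
}

class MetricsProcessor extends SketchProcessor {
  public failureCount = 0;
  public failureHandler(record: AnyRecord, error: Error): FailTuple {
    // Count (or emit a metric for) the failed record, then defer to the base.
    this.failureCount += 1;
    return super.failureHandler(record, error);
  }
}
```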
Create your own partial processor
You can create your own partial batch processor from scratch by inheriting the `BasePartialProcessor` class, and implementing the `prepare()`, `clean()`, `processRecord()` and `asyncProcessRecord()` abstract methods.
classDiagram
direction LR
class BasePartialProcessor {
<<interface>>
+prepare()
+clean()
+processRecord(record: BaseRecord)
+asyncProcessRecord(record: BaseRecord)
}
class YourCustomProcessor {
+prepare()
+clean()
+processRecord(record: BaseRecord)
+asyncProcessRecord(record: BaseRecord)
}
BasePartialProcessor <|-- YourCustomProcessor : extends
Visual representation to bring your own processor
- `processRecord()` – handles all processing logic for each individual message of a batch, including calling the `recordHandler` (`this.handler`)
- `prepare()` – called once as part of the processor initialization
- `clean()` – teardown logic called once after `processRecord` completes
- `asyncProcessRecord()` – use this method if you need to implement asynchronous logic; otherwise define it in your class with empty logic
You can then use this class as a context manager, or pass it to `processPartialResponse` to process the records in your Lambda handler function.
Creating a custom batch processor
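The class shape described above can be sketched like this; the method names mirror the docs, while the implementations (and the base class itself) are illustrative. The async variant is omitted for brevity:

```typescript
// Sketch of a base partial processor with lifecycle hooks that a
// custom processor fills in.
type BaseRecordLike = { messageId: string; body: string };
type Handler = (record: BaseRecordLike) => unknown;

abstract class BasePartialProcessorSketch {
  protected records: BaseRecordLike[] = [];
  public handler?: Handler;

  public abstract prepare(): void; // one-time setup before the batch
  public abstract clean(): void; // teardown after the batch
  public abstract processRecord(record: BaseRecordLike): unknown;

  // Drive the lifecycle: prepare, process every record, then clean up.
  public process(): unknown[] {
    this.prepare();
    const results = this.records.map((r) => this.processRecord(r));
    this.clean();
    return results;
  }

  public register(records: BaseRecordLike[], handler: Handler): this {
    this.records = records;
    this.handler = handler;
    return this;
  }
}

class UppercaseProcessor extends BasePartialProcessorSketch {
  public prepare(): void {}
  public clean(): void {}
  public processRecord(record: BaseRecordLike): unknown {
    // Delegate to the registered recordHandler (this.handler).
    return this.handler ? this.handler(record) : undefined;
  }
}
```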
Testing your code
As there are no external calls, you can unit test your code with `BatchProcessor` quite easily.
Example:
Given an SQS batch where the first batch record succeeds and the second fails processing, we should have a single item reported in the function response.
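A self-contained sketch of that test; a tiny in-file processor stands in for `BatchProcessor` so the example runs without any dependencies:

```typescript
// A minimal in-file processor with the same partial-failure contract,
// so the test needs no external calls or packages.
type TestRecord = { messageId: string; body: string };

const processBatch = (
  records: TestRecord[],
  recordHandler: (record: TestRecord) => void
): { batchItemFailures: { itemIdentifier: string }[] } => {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of records) {
    try {
      recordHandler(record);
    } catch {
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
};

// Given: the first record succeeds and the second fails processing
const event: TestRecord[] = [
  { messageId: 'id-1', body: 'success' },
  { messageId: 'id-2', body: 'fail' },
];
const response = processBatch(event, (r) => {
  if (r.body === 'fail') throw new Error('processing failed');
});

// Then: exactly one item is reported in the function response
if (response.batchItemFailures.length !== 1) throw new Error('expected 1 failure');
if (response.batchItemFailures[0].itemIdentifier !== 'id-2') throw new Error('wrong id');
```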
events/sqs_event.json