Kafka Consumer
The Kafka utility transparently handles message deserialization, provides an intuitive developer experience, and integrates seamlessly with the rest of the Powertools for AWS Lambda ecosystem.
flowchart LR
KafkaTopic["Kafka Topic"] --> MSK["Amazon MSK"]
KafkaTopic --> MSKServerless["Amazon MSK Serverless"]
KafkaTopic --> SelfHosted["Self-hosted Kafka"]
MSK --> EventSourceMapping["Event Source Mapping"]
MSKServerless --> EventSourceMapping
SelfHosted --> EventSourceMapping
EventSourceMapping --> Lambda["Lambda Function"]
Lambda --> KafkaUtility["Kafka Utility"]
KafkaUtility --> Deserialization["Deserialization"]
Deserialization --> YourLogic["Your Business Logic"]
Key features
- Automatic deserialization of Kafka messages (JSON, Avro, and Protocol Buffers)
- Simplified event record handling with familiar Kafka `ConsumerRecords` interface
- Support for key and value deserialization
- Support for ESM with and without Schema Registry integration
- Proper error handling for deserialization issues
Terminology
Event Source Mapping (ESM): A Lambda feature that reads from streaming sources (like Kafka) and invokes your Lambda function. It manages polling, batching, and error handling automatically, eliminating the need for consumer management code.
Record Key and Value: A Kafka message has two important parts, an optional key that determines the partition and a value containing the actual message data. Both are base64-encoded in Lambda events and can be independently deserialized.
Deserialization: The process of converting binary data (base64-encoded in Lambda events) into usable Java objects according to a specific format like JSON, Avro, or Protocol Buffers. Powertools handles this conversion automatically.
DeserializationType enum: Tells Powertools how to interpret message data by specifying the format type (JSON, Avro, or Protocol Buffers).
Schema Registry: A centralized service that stores and validates schemas, ensuring producers and consumers maintain compatibility when message formats evolve over time.
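To make the key and value encoding concrete, the following sketch decodes a record field by hand using only the JDK. It is not the Powertools API: the class name `RawRecordDecoder` and the sample payload are hypothetical, and it assumes the producer wrote UTF-8 text. The utility performs the equivalent base64 step for you before any format-specific deserialization.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Powertools: shows what "base64-encoded in Lambda events" means.
final class RawRecordDecoder {

    // Decodes a base64 field from a Lambda Kafka event into raw bytes.
    // Kafka keys are optional, so a null input yields null.
    static byte[] decode(String base64Field) {
        return base64Field == null ? null : Base64.getDecoder().decode(base64Field);
    }

    // Interprets the decoded bytes as UTF-8 text (assumes the producer wrote text or JSON).
    static String decodeAsText(String base64Field) {
        byte[] raw = decode(base64Field);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample fields are encoded here so the example does not depend on a real event.
        String key = Base64.getEncoder().encodeToString("user-1".getBytes(StandardCharsets.UTF_8));
        String value = Base64.getEncoder().encodeToString("{\"name\":\"Ada\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeAsText(key) + " -> " + decodeAsText(value));
    }
}
```

Binary formats such as Avro or Protocol Buffers need a further step after `decode`, which is where the configured deserialization type comes in.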
Moving from traditional Kafka consumers
Lambda processes Kafka messages as discrete events rather than continuous streams, requiring a different approach to consumer development that Powertools for AWS helps standardize.
Aspect | Traditional Kafka Consumers | Lambda Kafka Consumer |
---|---|---|
Model | Pull-based (you poll for messages) | Push-based (Lambda invoked with messages) |
Scaling | Manual scaling configuration | Automatic scaling to partition count |
State | Long-running application with state | Stateless, ephemeral executions |
Offsets | Manual offset management | Automatic offset commitment |
Schema Validation | Client-side schema validation | Optional Schema Registry integration with Event Source Mapping |
Error Handling | Per-message retry control | Batch-level retry policies |
Getting started
Installation
Add the Powertools for AWS Lambda Kafka dependency to your project. Make sure to also add the `kafka-clients` library as a dependency. The utility supports `kafka-clients >= 3.0.0`.
Required resources
To use the Kafka utility, you need an AWS Lambda function configured with a Kafka event source. This can be Amazon MSK, MSK Serverless, or a self-hosted Kafka cluster.
Using ESM with Schema Registry
The Event Source Mapping configuration determines which mode is used. With `JSON` mode, Lambda converts all messages to JSON before invoking your function. With `SOURCE` mode, Lambda preserves the original format, requiring your function to handle the appropriate deserialization.
Powertools for AWS supports both Schema Registry integration modes in your Event Source Mapping configuration.
Processing Kafka events
The Kafka utility transforms raw Lambda Kafka events into an intuitive format for processing. To handle messages effectively, you'll need to configure the `@Deserialization` annotation that matches your data format. Based on the deserializer you choose, incoming records are directly transformed into your business objects, which can be auto-generated classes from Avro / Protobuf or simple POJOs.
Using Avro is recommended
We recommend Avro for production Kafka implementations due to its schema evolution capabilities, compact binary format, and integration with Schema Registry. This offers better type safety and forward/backward compatibility compared to JSON.
Full examples on GitHub
A full example including how to generate Avro and Protobuf Java classes can be found on GitHub at https://github.com/aws-powertools/powertools-lambda-java/tree/main/examples/powertools-examples-kafka.
Deserializing keys and values
The `@Deserialization` annotation deserializes both keys and values based on your type configuration. This flexibility allows you to work with different data formats in the same message.
Handling primitive types
When working with primitive data types (strings, integers, etc.) rather than structured objects, you can use any deserialization type such as `KAFKA_JSON`. Simply place the primitive type like `Integer` or `String` in the `ConsumerRecords` generic type parameters, and the library will automatically handle primitive type deserialization.
Common pattern: Keys with primitive values
Using primitive types (strings, integers) as Kafka message keys is a common pattern for partitioning and identifying messages. Powertools automatically handles these primitive keys without requiring special configuration, making it easy to implement this popular design pattern.
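As an illustration of what handling a primitive key involves, this JDK-only sketch converts a base64 key field to `String`, `Integer`, or `Long`. It assumes the producer serialized the key as UTF-8 text (for example `"42"`); `PrimitiveKeySketch` is a hypothetical name, and this is not a description of the library's internal implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Powertools.
final class PrimitiveKeySketch {

    // Converts a base64 field to the requested wrapper type.
    // Assumption: the producer wrote the value as UTF-8 text.
    static <T> T decodeAs(String base64Field, Class<T> type) {
        if (base64Field == null) {
            return null; // Kafka keys are optional
        }
        String text = new String(Base64.getDecoder().decode(base64Field), StandardCharsets.UTF_8);
        if (type == String.class) {
            return type.cast(text);
        }
        if (type == Integer.class) {
            return type.cast(Integer.valueOf(text));
        }
        if (type == Long.class) {
            return type.cast(Long.valueOf(text));
        }
        throw new IllegalArgumentException("Unsupported type in this sketch: " + type.getName());
    }

    public static void main(String[] args) {
        String encoded = Base64.getEncoder().encodeToString("42".getBytes(StandardCharsets.UTF_8));
        Integer key = decodeAs(encoded, Integer.class);
        System.out.println("Decoded key: " + key);
    }
}
```

With the utility in place none of this code is needed; declaring the primitive type in the `ConsumerRecords` generic parameters is enough.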
Message format support and comparison
The Kafka utility supports multiple serialization formats to match your existing Kafka implementation. Choose the format that best suits your needs based on performance, schema evolution requirements, and ecosystem compatibility.
Selecting the right format
For new applications, consider Avro or Protocol Buffers over JSON. Both provide schema validation, evolution support, and significantly better performance with smaller message sizes. Avro is particularly well-suited for Kafka due to its built-in schema evolution capabilities.
Format | DeserializationType | Description | Required Dependencies |
---|---|---|---|
JSON | `KAFKA_JSON` | Human-readable text format | Jackson |
Avro | `KAFKA_AVRO` | Compact binary format with schema | Apache Avro |
Protocol Buffers | `KAFKA_PROTOBUF` | Efficient binary format | Protocol Buffers |
Lambda Default | `LAMBDA_DEFAULT` | Uses Lambda's built-in deserialization (equivalent to removing the `@Deserialization` annotation) | None |
Feature | JSON | Avro | Protocol Buffers |
---|---|---|---|
Schema Definition | Optional | Required schema file | Required .proto file |
Schema Evolution | None | Strong support | Strong support |
Size Efficiency | Low | High | Highest |
Processing Speed | Slower | Fast | Fastest |
Human Readability | High | Low | Low |
Implementation Complexity | Low | Medium | Medium |
Additional Dependencies | None | Apache Avro | Protocol Buffers |
Choose the serialization format that best fits your needs:
- JSON: Best for simplicity and when schema flexibility is important
- Avro: Best for systems with evolving schemas and when compatibility is critical
- Protocol Buffers: Best for performance-critical systems with structured data
- Lambda Default: Best for simple string-based messages or when using Lambda's built-in deserialization
Advanced
Accessing record metadata
Each Kafka record contains important metadata that you can access alongside the deserialized message content. This metadata helps with message processing, troubleshooting, and implementing advanced patterns like exactly-once processing.
Available metadata properties
Property | Description | Example Use Case |
---|---|---|
`topic()` | Topic name the record was published to | Routing logic in multi-topic consumers |
`partition()` | Kafka partition number | Tracking message distribution |
`offset()` | Position in the partition | De-duplication, exactly-once processing |
`timestamp()` | Unix timestamp when record was created | Event timing analysis |
`timestampType()` | Timestamp type (CREATE_TIME or LOG_APPEND_TIME) | Data lineage verification |
`headers()` | Key-value pairs attached to the message | Cross-cutting concerns like correlation IDs |
`key()` | Deserialized message key | Customer ID or entity identifier |
`value()` | Deserialized message content | The actual business data |
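The de-duplication use case for `offset()` can be sketched in plain Java. The example assumes a record is uniquely identified by its topic, partition, and offset. `RecordDedupSketch` and its method names are hypothetical, and the in-memory set is only a stand-in for a durable store, since Lambda execution environments are ephemeral.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Powertools or kafka-clients.
final class RecordDedupSketch {

    private final Set<String> seen = new HashSet<>();

    // Builds an identifier from the values exposed by topic(), partition(), and offset().
    // ':' is used as the separator because it is not a legal character in Kafka topic names.
    static String recordId(String topic, int partition, long offset) {
        return topic + ":" + partition + ":" + offset;
    }

    // Returns true the first time a record is seen and false on redelivery.
    boolean markIfNew(String topic, int partition, long offset) {
        return seen.add(recordId(topic, partition, offset));
    }

    public static void main(String[] args) {
        RecordDedupSketch dedup = new RecordDedupSketch();
        System.out.println(dedup.markIfNew("orders", 0, 15L)); // first delivery
        System.out.println(dedup.markIfNew("orders", 0, 15L)); // redelivery of the same record
    }
}
```

In a handler, the three arguments would come from `record.topic()`, `record.partition()`, and `record.offset()`.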
Error handling
Handle errors gracefully when processing Kafka messages to ensure your application maintains resilience and provides clear diagnostic information. The Kafka utility integrates with standard Java exception handling patterns.
Treating deserialization errors
See the Deserialization failures section under Troubleshooting. A deserialization failure fails the whole batch, and your handler is not executed.
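For errors raised by your own business logic (as opposed to deserialization), one common pattern is to isolate failures per record so that a single bad record does not hide the others. The sketch below uses only the JDK and is independent of the Powertools API; `BatchErrorHandlingSketch` and `processAll` are hypothetical names, and whether to rethrow after the loop is a design decision left to you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Powertools.
final class BatchErrorHandlingSketch {

    // Runs the business logic for every value and returns the values whose processing threw.
    static <V> List<V> processAll(Iterable<V> values, Consumer<? super V> businessLogic) {
        List<V> failed = new ArrayList<>();
        for (V value : values) {
            try {
                businessLogic.accept(value);
            } catch (RuntimeException e) {
                // Log enough context to diagnose the problem, then continue with the next record.
                System.err.println("Failed to process " + value + ": " + e.getMessage());
                failed.add(value);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> failed = processAll(List.of("ok-1", "bad", "ok-2"), value -> {
            if (value.startsWith("bad")) {
                throw new IllegalStateException("validation failed");
            }
        });
        // Throwing here instead would fail the invocation, so the batch is retried
        // according to the Event Source Mapping configuration.
        System.out.println("Failed records: " + failed);
    }
}
```

Inside a real handler, the loop would iterate over the `ConsumerRecords` passed to `handleRequest`.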
Integrating with Idempotency
When processing Kafka messages in Lambda, failed batches can result in message reprocessing. The idempotency utility prevents duplicate processing by tracking which messages have already been handled, ensuring each message is processed exactly once.
The Idempotency utility automatically stores the result of each successful operation, returning the cached result if the same message is processed again, which prevents potentially harmful duplicate operations like double-charging customers or double-counting metrics.
Ensuring exactly-once processing
The `@Idempotent` annotation uses the JSON representation of the Payment object to ensure that the same object is processed only once. Even if a batch fails and Lambda retries the messages, each unique payment will be processed exactly once.
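The underlying idea of storing a result per payload and returning it on a repeat can be sketched with the JDK alone. This is a conceptual illustration, not the Idempotency utility's API: `IdempotencySketch` is a hypothetical class, the key function is supplied by the caller, and the in-memory map stands in for the persistent store a real deployment needs so that results survive across invocations.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration of the idempotency concept, not the Powertools implementation.
final class IdempotencySketch<P, R> {

    private final Map<String, R> results = new ConcurrentHashMap<>();
    private final Function<P, String> keyOf;
    private final Function<P, R> operation;

    IdempotencySketch(Function<P, String> keyOf, Function<P, R> operation) {
        this.keyOf = keyOf;
        this.operation = operation;
    }

    // Runs the operation only for payloads whose key has not been seen;
    // otherwise returns the stored result without running it again.
    R process(P payload) {
        return results.computeIfAbsent(keyOf.apply(payload), key -> operation.apply(payload));
    }

    public static void main(String[] args) {
        IdempotencySketch<String, String> payments =
                new IdempotencySketch<>(payment -> payment, payment -> "charged " + payment);
        System.out.println(payments.process("payment-1"));
        System.out.println(payments.process("payment-1")); // served from the stored result
    }
}
```

The second call returns the stored result, so the side effect (charging the customer) happens once per unique payment.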
Best practices
Batch size configuration
The number of Kafka records processed per Lambda invocation is controlled by your Event Source Mapping configuration. Properly sized batches optimize cost and performance.
Different workloads benefit from different batch configurations:
- High-volume, simple processing: Use larger batches (100-500 records) with a short timeout
- Complex processing with database operations: Use smaller batches (10-50 records)
- Mixed message sizes: Set an appropriate batching window (1-5 seconds) to handle variability
Cross-language compatibility
When using binary serialization formats across multiple programming languages, ensure consistent schema handling to prevent deserialization failures.
Common cross-language challenges to address:
- Field naming conventions: camelCase in Java vs snake_case in Python
- Date/time representation differences
- Numeric precision handling: especially decimals
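The three challenges above can be illustrated with small JDK-only helpers. These are hypothetical conventions rather than anything the utility enforces: the sketch assumes field names arrive in snake_case, timestamps are exchanged as epoch milliseconds, and decimal amounts are exchanged as strings.

```java
import java.math.BigDecimal;
import java.time.Instant;

// Hypothetical helpers illustrating cross-language conventions; not part of Powertools.
final class CrossLanguageSketch {

    // Converts a snake_case field name (common in Python) to camelCase (common in Java).
    static String snakeToCamel(String name) {
        StringBuilder out = new StringBuilder(name.length());
        boolean upperNext = false;
        for (char c : name.toCharArray()) {
            if (c == '_') {
                upperNext = true;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Epoch milliseconds avoid language-specific date string formats.
    static Instant fromEpochMillis(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    // Parsing decimals from strings avoids binary floating-point rounding.
    static BigDecimal parseAmount(String amount) {
        return new BigDecimal(amount);
    }

    public static void main(String[] args) {
        System.out.println(snakeToCamel("created_at_ms"));
        System.out.println(fromEpochMillis(0L));
        System.out.println(parseAmount("0.10").add(parseAmount("0.20")));
    }
}
```

Whatever conventions you choose, agreeing on them in the shared schema keeps producers and consumers in different languages consistent.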
Troubleshooting
Deserialization failures
The Java Kafka utility registers a custom Lambda serializer that performs eager deserialization of all records in the batch before your handler method is invoked.
This means that if any record in the batch fails deserialization, a `RuntimeException` will be thrown with a concrete error message explaining why deserialization failed, and your handler method will never be called.
Key implications:
- Batch-level failure: If one record fails deserialization, the entire batch fails
- Early failure detection: Deserialization errors are caught before your business logic runs
- Clear error messages: The `RuntimeException` provides specific details about what went wrong
- No partial processing: You cannot process some records while skipping failed ones within the same batch
Handler method not invoked on deserialization failure
When deserialization fails, your `handleRequest` method will not be invoked at all. The `RuntimeException` is thrown before your handler code runs, preventing any processing of the batch.
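The batch-level behavior described above can be simulated with a few lines of plain Java. This is a model of the semantics only, not the utility's code: `EagerDeserializationSketch` is a hypothetical class, raw records are represented as strings, and the deserializer is any function that may throw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical model of eager, all-or-nothing deserialization; not the Powertools implementation.
final class EagerDeserializationSketch {

    // Deserializes every raw record first; the handler runs only if all of them succeed.
    static <T> void invoke(List<String> rawRecords,
                           Function<String, T> deserializer,
                           Consumer<List<T>> handler) {
        List<T> deserialized = new ArrayList<>();
        for (String raw : rawRecords) {
            try {
                deserialized.add(deserializer.apply(raw));
            } catch (RuntimeException e) {
                // One bad record aborts the whole batch before the handler is called.
                throw new RuntimeException("Failed to deserialize record: " + raw, e);
            }
        }
        handler.accept(deserialized);
    }

    public static void main(String[] args) {
        try {
            EagerDeserializationSketch.<Integer>invoke(
                    List.of("1", "oops", "3"),
                    Integer::valueOf,
                    batch -> System.out.println("Handler received " + batch));
        } catch (RuntimeException e) {
            System.out.println("Batch failed, handler not called: " + e.getMessage());
        }
    }
}
```

Because the failure happens outside the handler, a `try`/`catch` inside `handleRequest` never sees it.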
Handling deserialization failures:
Since deserialization happens before your handler is called, you cannot catch these exceptions within your handler method. Instead, configure your Event Source Mapping with appropriate error handling:
- Dead Letter Queue (DLQ): Configure a DLQ to capture failed batches for later analysis
- Maximum Retry Attempts: Set appropriate retry limits to avoid infinite retries
- Batch Size: Use smaller batch sizes to minimize the impact of individual record failures
Schema compatibility issues
Schema compatibility issues often manifest as successful connections but failed deserialization. Common causes include:
- Schema evolution without backward compatibility: New producer schema is incompatible with consumer schema
- Field type mismatches: For example, a field changed from String to Integer across systems
- Missing required fields: Fields required by the consumer schema but absent in the message
- Default value discrepancies: Different handling of default values between languages
When using Schema Registry, verify schema compatibility rules are properly configured for your topics and that all applications use the same registry.
Memory and timeout optimization
Lambda functions processing Kafka messages may encounter resource constraints, particularly with large batches or complex processing logic.
For memory errors:
- Increase Lambda memory allocation, which also provides more CPU resources
- Process fewer records per batch by adjusting the `BatchSize` parameter in your event source mapping
- Consider optimizing your message format to reduce memory footprint
For timeout issues:
- Extend your Lambda function timeout setting to accommodate processing time
- Implement chunked or asynchronous processing patterns for time-consuming operations
- Monitor and optimize database operations, external API calls, or other I/O operations in your handler
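One way to implement the chunked processing mentioned above is to split the deserialized batch into smaller groups, for example so that each group is written to a database in a single call. The helper below is a JDK-only sketch; `ChunkingSketch` and `chunk` are hypothetical names, and the right chunk size depends on your downstream system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Powertools.
final class ChunkingSketch {

    // Splits a batch into consecutive sub-lists of at most chunkSize elements.
    static <T> List<List<T>> chunk(List<T> batch, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < batch.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, batch.size());
            // Copy the view so each chunk is independent of the original list.
            chunks.add(new ArrayList<>(batch.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (List<Integer> group : chunk(List.of(1, 2, 3, 4, 5), 2)) {
            System.out.println("Processing group " + group);
        }
    }
}
```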
Monitoring memory usage
Use CloudWatch metrics to track your function's memory utilization. If it consistently exceeds 80% of allocated memory, consider increasing the memory allocation or optimizing your code.
Kafka workflow
Using ESM with Schema Registry validation (SOURCE)
sequenceDiagram
participant Kafka
participant ESM as Event Source Mapping
participant SchemaRegistry as Schema Registry
participant Lambda
participant KafkaUtility
participant YourCode
Kafka->>+ESM: Send batch of records
ESM->>+SchemaRegistry: Validate schema
SchemaRegistry-->>-ESM: Confirm schema is valid
ESM->>+Lambda: Invoke with validated records (still encoded)
Lambda->>+KafkaUtility: Pass Kafka event
KafkaUtility->>KafkaUtility: Parse event structure
loop For each record
KafkaUtility->>KafkaUtility: Decode base64 data
KafkaUtility->>KafkaUtility: Deserialize based on DeserializationType
end
KafkaUtility->>+YourCode: Provide ConsumerRecords
YourCode->>YourCode: Process records
YourCode-->>-KafkaUtility: Return result
KafkaUtility-->>-Lambda: Pass result back
Lambda-->>-ESM: Return response
ESM-->>-Kafka: Acknowledge processed batch
Using ESM with Schema Registry deserialization (JSON)
sequenceDiagram
participant Kafka
participant ESM as Event Source Mapping
participant SchemaRegistry as Schema Registry
participant Lambda
participant KafkaUtility
participant YourCode
Kafka->>+ESM: Send batch of records
ESM->>+SchemaRegistry: Validate and deserialize
SchemaRegistry->>SchemaRegistry: Deserialize records
SchemaRegistry-->>-ESM: Return deserialized data
ESM->>+Lambda: Invoke with pre-deserialized JSON records
Lambda->>+KafkaUtility: Pass Kafka event
KafkaUtility->>KafkaUtility: Parse event structure
loop For each record
KafkaUtility->>KafkaUtility: Decode base64 data
KafkaUtility->>KafkaUtility: Record is already deserialized
KafkaUtility->>KafkaUtility: Map to POJO (if specified)
end
KafkaUtility->>+YourCode: Provide ConsumerRecords
YourCode->>YourCode: Process records
YourCode-->>-KafkaUtility: Return result
KafkaUtility-->>-Lambda: Pass result back
Lambda-->>-ESM: Return response
ESM-->>-Kafka: Acknowledge processed batch
Using ESM without Schema Registry integration
sequenceDiagram
participant Kafka
participant Lambda
participant KafkaUtility
participant YourCode
Kafka->>+Lambda: Invoke with batch of records (direct integration)
Lambda->>+KafkaUtility: Pass raw Kafka event
KafkaUtility->>KafkaUtility: Parse event structure
loop For each record
KafkaUtility->>KafkaUtility: Decode base64 data
KafkaUtility->>KafkaUtility: Deserialize based on DeserializationType
end
KafkaUtility->>+YourCode: Provide ConsumerRecords
YourCode->>YourCode: Process records
YourCode-->>-KafkaUtility: Return result
KafkaUtility-->>-Lambda: Pass result back
Lambda-->>-Kafka: Acknowledge processed batch
Testing your code
Testing Kafka consumer functions is straightforward with JUnit. You can construct Kafka `ConsumerRecords` in the default way provided by the kafka-clients library without needing a real Kafka cluster.
Extra Resources
Lambda Custom Serializers Compatibility
This Kafka utility uses Lambda custom serializers to provide automatic deserialization of Kafka messages.
Important compatibility considerations:
- Existing custom serializers: This utility will not be compatible if you already use your own custom Lambda serializer in your project
- Non-Kafka handlers: Installing this library will not affect default Lambda serialization behavior for non-Kafka related handlers
- Kafka-specific: The custom serialization only applies to handlers annotated with `@Deserialization`
- Lambda default fallback: Using `@Deserialization(type = DeserializationType.LAMBDA_DEFAULT)` will proxy to Lambda's default serialization behavior
Need help with compatibility?
If you are blocked from adopting this utility due to existing custom serializers or other compatibility concerns, please contact us with your specific use-cases. We'd like to understand your requirements and explore potential solutions.
For more information about Lambda custom serialization, see the official AWS documentation.