Maintainers
Overview¶
Please treat this content as a living document.
This is document explains who the maintainers are, their responsibilities, and how they should be doing it. If you're interested in contributing, see CONTRIBUTING.
Current Maintainers¶
Maintainer | GitHub ID | Affiliation |
---|---|---|
Heitor Lessa | heitorlessa | Amazon |
Simon Thulbourn | sthulb | Amazon |
Ruben Fonseca | rubenfonseca | Amazon |
Leandro Damascena | leandrodamascena | Amazon |
Emeritus¶
Previous active maintainers who contributed to this project.
Maintainer | GitHub ID | Affiliation |
---|---|---|
Tom McCarthy | cakepietoast | MongoDB |
Nicolas Moutschen | nmoutschen | Apollo |
Alexander Melnyk | am29d | Amazon |
Michal Ploski | mploski | Amazon |
Labels¶
These are the most common labels used by maintainers to triage issues, pull requests (PR), and for project management:
Label | Usage | Notes |
---|---|---|
triage | New issues that require maintainers review | Issue template |
bug | Unexpected, reproducible and unintended software behavior | PR/Release automation; Doc snippets are excluded; |
not-a-bug | New and existing bug reports incorrectly submitted as bug | Analytics |
documentation | Documentation improvements | PR/Release automation; Doc additions, fixes, etc.; |
feature-request | New or enhancements to existing features | Issue template |
typing | New or enhancements to static typing | Issue template |
RFC | Technical design documents related to a feature request | Issue template |
bug-upstream | Bug caused by upstream dependency | |
help wanted | Tasks you want help from anyone to move forward | Bandwidth, complex topics, etc. |
need-customer-feedback | Tasks that need more feedback before proceeding | 80/20% rule, uncertain, etc. |
need-more-information | Missing information before making any calls | |
need-documentation | PR is missing or has incomplete documentation | |
need-issue | PR is missing a related issue for tracking change | PR automation |
need-rfc | Feature request requires a RFC to improve discussion | |
pending-release | Merged changes that will be available soon | Release automation auto-closes/notifies it |
revisit-in-3-months | Blocked issues/PRs that need to be revisited | Often related to need-customer-feedback , prioritization, etc. |
breaking-change | Changes that will cause customer impact and need careful triage | |
do-not-merge | PRs that are blocked for varying reasons | Timeline is uncertain |
size/XS | PRs between 0-9 LOC | PR automation |
size/S | PRs between 10-29 LOC | PR automation |
size/M | PRs between 30-99 LOC | PR automation |
size/L | PRs between 100-499 LOC | PR automation |
size/XL | PRs between 500-999 LOC, often PRs that grown with feedback | PR automation |
size/XXL | PRs with 1K+ LOC, largely documentation related | PR automation |
tests | PRs that add or change tests | PR automation |
<utility> |
PRs related to a Powertools for AWS Lambda (Python) utility, e.g. parameters , tracer |
PR automation |
feature | New features or minor changes | PR/Release automation |
dependencies | Changes that touch dependencies, e.g. Dependabot, etc. | PR/ automation |
github-actions | Changes in GitHub workflows | PR automation |
github-templates | Changes in GitHub issue/PR templates | PR automation |
internal | Changes in governance and chores (linting setup, baseline, etc.) | PR automation |
tech-debt | Changes in tech debt | |
customer-reference | Authorization to use company name in our documentation | Public Relations |
community-content | Suggested content to feature in our documentation | Public Relations |
Maintainer Responsibilities¶
Maintainers are active and visible members of the community, and have maintain-level permissions on a repository. Use those privileges to serve the community and evolve code as follows.
Be aware of recurring ambiguous situations and document them to help your fellow maintainers.
Uphold Code of Conduct¶
Model the behavior set forward by the Code of Conduct and raise any violations to other maintainers and admins. There could be unusual circumstances where inappropriate behavior does not immediately fall within the Code of Conduct.
These might be nuanced and should be handled with extra care - when in doubt, do not engage and reach out to other maintainers and admins.
Prioritize Security¶
Security is your number one priority. Maintainer's Github keys must be password protected securely and any reported security vulnerabilities are addressed before features or bugs.
Note that this repository is monitored and supported 24/7 by Amazon Security, see Reporting a Vulnerability for details.
Review Pull Requests¶
Review pull requests regularly, comment, suggest, reject, merge and close. Accept only high quality pull-requests. Provide code reviews and guidance on incoming pull requests.
PRs are labeled based on file changes and semantic title. Pay attention to whether labels reflect the current state of the PR and correct accordingly.
Use and enforce semantic versioning pull request titles, as these will be used for CHANGELOG and Release notes - make sure they communicate their intent at the human level.
TODO: This is an area we want to automate using the new GitHub GraphQL API.
For issues linked to a PR, make sure pending release
label is applied to them when merging. Upon release, these issues will be notified which release version contains their change.
See Common scenarios section for additional guidance.
Triage New Issues¶
Manage labels, review issues regularly, and create new labels as needed by the project. Remove triage
label when you're able to confirm the validity of a request, a bug can be reproduced, etc. Give priority to the original author for implementation, unless it is a sensitive task that is best handled by maintainers.
TODO: This is an area we want to automate using the new GitHub GraphQL API.
Make sure issues are assigned to our board of activities and have the right status.
Use our labels to signal good first issues to new community members, and to set expectation that this might need additional feedback from the author, other customers, experienced community members and/or maintainers.
Be aware of casual contributors and recurring contributors. Provide the experience and attention you wish you had if you were starting in open source.
See Common scenarios section for additional guidance.
Triage Bug Reports¶
Be familiar with our definition of bug. If it's not a bug, you can close it or adjust its title and labels - always communicate the reason accordingly.
For bugs caused by upstream dependencies, replace bug
with bug-upstream
label. Ask the author whether they'd like to raise the issue upstream or if they prefer us to do so.
Assess the impact and make the call on whether we need an emergency release. Contact other maintainers when in doubt.
See Common scenarios section for additional guidance.
Triage RFCs¶
RFC is a collaborative process to help us get to the most optimal solution given the context. Their purpose is to ensure everyone understands what this context is, their trade-offs, and alternative solutions that were part of the research before implementation begins.
Make sure you ask these questions in mind when reviewing:
- Does it use our RFC template?
- Does the match our Tenets?
- Does the proposal address the use case? If so, is the recommended usage explicit?
- Does it focus on the mechanics to solve the use case over fine-grained implementation details?
- Can anyone familiar with the code base implement it?
- If approved, are they interested in contributing? Do they need any guidance?
- Does this significantly increase the overall project maintenance? Do we have the skills to maintain it?
- If we can't take this use case, are there alternative projects we could recommend? Or does it call for a new project altogether?
When necessary, be upfront that the time to review, approve, and implement a RFC can vary - see Contribution is stuck. Some RFCs may be further updated after implementation, as certain areas become clearer.
Some examples using our initial and new RFC templates: #92, #94, #95, #991, #1226
Releasing a new version¶
Firstly, make sure the commit history in the develop
branch (1) it's up to date, (2) commit messages are semantic, and (3) commit messages have their respective area, for example feat(logger): <change>
, chore(ci): ...
).
Looks good, what's next?
Kickoff the Release
workflow with the intended version - this might take around 25m-30m to complete.
Once complete, you can start drafting the release notes to let customers know what changed and what's in it for them (a.k.a why they should care). We have guidelines in the release notes section so you know what good looks like.
NOTE: Documentation might take a few minutes to reflect the latest version due to caching and CDN invalidations.
Release process visualized¶
Every release makes hundreds of checks, security scans, canaries and deployments - all of these are automated.
This is a close visual representation of the main steps (GitHub Actions UI should be the source of truth), along with the approximate time it takes for each key step to complete.
gantt
title Release process
dateFormat HH:mm
axisFormat %H:%M
Release commit : milestone, m1, 10:00,2m
section Seal
Bump release version : active, 8s
Prevent source tampering : active, 43s
section QA
Quality checks : active, 2.2m
section Build
Checksum : active, 8s
Build release artifact : active, 39s
Seal : active, 8s
section Provenance
Attest build : active, 8s
Sign attestation : active, attestation, 10:06, 8s
section Release
Checksum : active, 8s
PyPi temp credentials : active, 8s
Publish PyPi : active, pypi, 10:07, 29s
PyPi release : milestone, m2, 10:07,1s
section Git release
Checksum : active, after pypi, 8s
Git Tag : active, 8s
Bump package version : active, 8s
Create PR : active, 8s
Upload attestation : active, 8s
section Layer release
Build (x86+ARM) : active, layer_build, 10:08, 6m
Deploy Beta : active, layer_beta, after layer_build, 6.3m
Deploy Prod : active, layer_prod, after layer_beta, 6.3m
Layer release : milestone, m3, 10:26,1s
section SAR release
Deploy Beta : active, sar_beta, after layer_beta, 2.2m
Deploy Prod : active, sar_prod, after sar_beta, 2.2m
SAR release : milestone, m4, 10:25,1s
section Docs
Create PR (Layer ARN) : active, after layer_prod, 8s
Release versioned docs : active, 2.2m
Documentation release : milestone, m4, 10:28,1m
section Post-release
Close pending issues : active, 8s
Release complete : milestone, m6, 10:31,2m
Drafting release notes¶
Visit the Releases page and choose the edit pencil button.
Make sure the tag
field reflects the new version you're releasing, the target branch field is set to develop
, and release title
matches your tag e.g., v1.26.0
.
You'll notice we group all changes based on their labels like feature
, bug
, documentation
, etc.
I spotted a typo or incorrect grouping - how do I fix it?
Edit the respective PR title and update their labels. Then run the Release Drafter workflow to update the Draft release.
NOTE: This won't change the CHANGELOG as the merge commit is immutable. Don't worry about it. We'd only rewrite git history only if this can lead to confusion and we'd pair with another maintainer.
All looking good, what's next?
The best part comes now. Replace the placeholder [Human readable summary of changes]
with what you'd like to communicate to customers what this release is all about. Rule of thumb: always put yourself in the customers shoes.
These are some questions to keep in mind when drafting your first or future release notes:
- Can customers understand at a high level what changed in this release?
- Is there a link to the documentation where they can read more about each main change?
- Are there any graphics or code snippets that can enhance readability?
- Are we calling out any key contributor(s) to this release?
- All contributors are automatically credited, use this as an exceptional case to feature them
Once you're happy, hit Publish release
πππ.
This will kick off the Publishing workflow and within a few minutes you should see the latest version in PyPi, and all issues labeled as pending-release
will be closed and notified.
Run end to end tests¶
E2E tests are run on every push to develop
or manually via run-e2e-tests workflow.
To run locally, you need AWS CDK CLI and an account bootstrapped (cdk bootstrap
). With a default AWS CLI profile configured, or AWS_PROFILE
environment variable set, run make e2e tests
.
Releasing a documentation hotfix¶
You can rebuild the latest documentation without a full release via this GitHub Actions Workflow. Choose Run workflow
, keep develop
as the branch, and input the latest Powertools for AWS Lambda (Python) version available.
This workflow will update both user guide and API documentation.
Maintain Overall Health of the Repo¶
TODO: Coordinate renaming
develop
tomain
Keep the develop
branch at production quality at all times. Backport features as needed. Cut release branches and tags to enable future patches.
Manage Roadmap¶
See Roadmap section
Ensure the repo highlights features that should be elevated to the project roadmap. Be clear about the featureβs status, priority, target version, and whether or not it should be elevated to the roadmap.
Add Continuous Integration Checks¶
Add integration checks that validate pull requests and pushes to ease the burden on Pull Request reviewers. Continuously revisit areas of improvement to reduce operational burden in all parties involved.
Negative Impact on the Project¶
Actions that negatively impact the project will be handled by the admins, in coordination with other maintainers, in balance with the urgency of the issue. Examples would be Code of Conduct violations, deliberate harmful or malicious actions, spam, monopolization, and security risks.
Becoming a maintainer¶
In 2023, we will revisit this. We need to improve our understanding of how other projects are doing, their mechanisms to promote key contributors, and how they interact daily.
We suspect this process might look similar to the OpenSearch project.
Common scenarios¶
These are recurring ambiguous situations that new and existing maintainers may encounter. They serve as guidance. It is up to each maintainer to follow, adjust, or handle in a different manner as long as our conduct is consistent
Contribution is stuck¶
A contribution can get stuck often due to lack of bandwidth and language barrier. For bandwidth issues, check whether the author needs help. Make sure you get their permission before pushing code into their existing PR - do not create a new PR unless strictly necessary.
For language barrier and others, offer a 1:1 chat to get them unblocked. Often times, English might not be their primary language, and writing in public might put them off, or come across not the way they intended to be.
In other cases, you may have constrained capacity. Use help wanted
label when you want to signal other maintainers and external contributors that you could use a hand to move it forward.
Insufficient feedback or information¶
When in doubt, use need-more-information
or need-customer-feedback
labels to signal more context and feedback are necessary before proceeding. You can also use revisit-in-3-months
label when you expect it might take a while to gather enough information before you can decide.
Crediting contributions¶
We credit all contributions as part of each release note as an automated process. If you find contributors are missing from the release note you're producing, please add them manually.
Is that a bug?¶
A bug produces incorrect or unexpected results at runtime that differ from its intended behavior. Bugs must be reproducible. They directly affect customers experience at runtime despite following its recommended usage.
Documentation snippets, use of internal components, or unadvertised functionalities are not considered bugs.
Mentoring contributions¶
Always favor mentoring issue authors to contribute, unless they're not interested or the implementation is sensitive (e.g., complexity, time to release, etc.).
Make use of help wanted
and good first issue
to signal additional contributions the community can help.
Long running issues or PRs¶
Try offering a 1:1 call in the attempt to get to a mutual understanding and clarify areas that maintainers could help.
In the rare cases where both parties don't have the bandwidth or expertise to continue, it's best to use the revisit-in-3-months
label. By then, see if it's possible to break the PR or issue in smaller chunks, and eventually close if there is no progress.
E2E framework¶
Structure¶
Our E2E framework relies on Pytest fixtures to coordinate infrastructure and test parallelization - see Test Parallelization and CDK CLI Parallelization.
tests/e2e structure
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
|
Where:
<feature>/infrastructure.py
. Uses CDK to define the infrastructure a given feature needs.<feature>/handlers/
. Lambda function handlers to build, deploy, and exposed as stack output in PascalCase (e.g.,BasicHandler
).utils/
. Test utilities to build data and fetch AWS data to ease assertionconftest.py
. Deploys and deletes a given feature infrastructure. Hierarchy matters:- Top-level (
e2e/conftest
). Builds Lambda Layer only once and blocks I/O across all CPU workers. - Feature-level (
e2e/<feature>/conftest
). Deploys stacks in parallel and make them independent of each other.
- Top-level (
Mechanics¶
Under BaseInfrastructure
, we hide the complexity of deployment and delete coordination under deploy
, delete
, and create_lambda_functions
methods.
This allows us to benefit from test and deployment parallelization, use IDE step-through debugging for a single test, run one, subset, or all tests and only deploy their related infrastructure, without any custom configuration.
Class diagram to understand abstraction built when defining a new stack (
LoggerStack
)
classDiagram
class InfrastructureProvider {
<<interface>>
+deploy() Dict
+delete()
+create_resources()
+create_lambda_functions() Dict~Functions~
}
class BaseInfrastructure {
+deploy() Dict
+delete()
+create_lambda_functions() Dict~Functions~
+add_cfn_output()
}
class TracerStack {
+create_resources()
}
class LoggerStack {
+create_resources()
}
class MetricsStack {
+create_resources()
}
class EventHandlerStack {
+create_resources()
}
InfrastructureProvider <|-- BaseInfrastructure : implement
BaseInfrastructure <|-- TracerStack : inherit
BaseInfrastructure <|-- LoggerStack : inherit
BaseInfrastructure <|-- MetricsStack : inherit
BaseInfrastructure <|-- EventHandlerStack : inherit
Authoring a new feature E2E test¶
Imagine you're going to create E2E for Event Handler feature for the first time. Keep the following mental model when reading:
graph LR
A["1. Define infrastructure"]-->B["2. Deploy/Delete infrastructure"]-->C["3.Access Stack outputs" ]
1. Define infrastructure¶
We use CDK as our Infrastructure as Code tool of choice. Before you start using CDK, you'd take the following steps:
- Create
tests/e2e/event_handler/infrastructure.py
file - Create a new class
EventHandlerStack
and inherit fromBaseInfrastructure
- Override
create_resources
method and define your infrastructure using CDK - (Optional) Create a Lambda function under
handlers/alb_handler.py
Excerpt
tests/e2e/event_handler/infrastructure.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
Excerpt
tests/e2e/event_handler/handlers/alb_handler.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
2. Deploy/Delete infrastructure when tests run¶
We need to create a Pytest fixture for our new feature under tests/e2e/event_handler/conftest.py
.
This will instruct Pytest to deploy our infrastructure when our tests start, and delete it when they complete whether tests are successful or not. Note that this file will not need any modification in the future.
Excerpt
conftest.py
for Event Handler
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
|
3. Access stack outputs for E2E tests¶
Within our tests, we should now have access to the infrastructure
fixture we defined earlier in tests/e2e/event_handler/conftest.py
.
We can access any Stack Output using pytest dependency injection.
Excerpt
tests/e2e/event_handler/test_header_serializer.py
1 2 3 4 5 6 7 8 9 10 11 |
|
Internals¶
Test runner parallelization¶
Besides speed, we parallelize our end-to-end tests to ease asserting async side-effects may take a while per test too, e.g., traces to become available.
The following diagram demonstrates the process we take every time you use make e2e
locally or at CI:
graph TD
A[make e2e test] -->Spawn{"Split and group tests <br>by feature and CPU"}
Spawn -->|Worker0| Worker0_Start["Load tests"]
Spawn -->|Worker1| Worker1_Start["Load tests"]
Spawn -->|WorkerN| WorkerN_Start["Load tests"]
Worker0_Start -->|Wait| LambdaLayer["Lambda Layer build"]
Worker1_Start -->|Wait| LambdaLayer["Lambda Layer build"]
WorkerN_Start -->|Wait| LambdaLayer["Lambda Layer build"]
LambdaLayer -->|Worker0| Worker0_Deploy["Launch feature stack"]
LambdaLayer -->|Worker1| Worker1_Deploy["Launch feature stack"]
LambdaLayer -->|WorkerN| WorkerN_Deploy["Launch feature stack"]
Worker0_Deploy -->|Worker0| Worker0_Tests["Run tests"]
Worker1_Deploy -->|Worker1| Worker1_Tests["Run tests"]
WorkerN_Deploy -->|WorkerN| WorkerN_Tests["Run tests"]
Worker0_Tests --> ResultCollection
Worker1_Tests --> ResultCollection
WorkerN_Tests --> ResultCollection
ResultCollection{"Wait for workers<br/>Collect test results"}
ResultCollection --> TestEnd["Report results"]
ResultCollection --> DeployEnd["Delete Stacks"]
CDK CLI parallelization¶
For CDK CLI to work with independent CDK Apps, we specify an output directory when synthesizing our stack and deploy from said output directory.
flowchart TD
subgraph "Deploying distinct CDK Apps"
EventHandlerInfra["Event Handler CDK App"] --> EventHandlerSynth
TracerInfra["Tracer CDK App"] --> TracerSynth
EventHandlerSynth["cdk synth --out cdk.out/event_handler"] --> EventHandlerDeploy["cdk deploy --app cdk.out/event_handler"]
TracerSynth["cdk synth --out cdk.out/tracer"] --> TracerDeploy["cdk deploy --app cdk.out/tracer"]
end
We create the typical CDK app.py
at runtime when tests run, since we know which feature and Python version we're dealing with (locally or at CI).
Excerpt
cdk_app_V39.py
for Event Handler created at deploy phase
1 2 3 4 |
|
When we run E2E tests for a single feature or all of them, our cdk.out
looks like this:
1 2 3 4 5 6 7 |
|
classDiagram
class CdkOutDirectory {
feature_name/
layer_build/
layer_build.diff
}
class EventHandler {
manifest.json
stack_outputs.json
cdk_app_V39.py
asset.uuid/
...
}
class StackOutputsJson {
BasicHandlerArn: str
ALBDnsName: str
...
}
CdkOutDirectory <|-- EventHandler : feature_name/
StackOutputsJson <|-- EventHandler
Where:
<feature>
. Contains CDK Assets, CDKmanifest.json
, ourcdk_app_<PyVersion>.py
andstack_outputs.json
layer_build
. Contains our Lambda Layer source code built once, used by all stacks independentlylayer_build.diff
. Contains a hash on whether our source code has changed to speed up further deployments and E2E tests
Together, all of this allows us to use Pytest like we would for any project, use CDK CLI and its context methods (from_lookup
), and use step-through debugging for a single E2E test without any extra configuration.
NOTE: VSCode doesn't support debugging processes spawning sub-processes (like CDK CLI does w/ shell and CDK App). Maybe this works. PyCharm works just fine.