How developers can self-remediate AWS Security Hub findings (DEV216)
Summary of Video Transcription
Background and Objectives
The speaker's company, Kamashi, is building a system that allows developers to respond to security findings autonomously.
Kamashi provides multiple products, each with its own development, staging, and production environments.
The company has enabled AWS Security Hub in each environment, and the findings are consolidated and managed in an audit account.
The security engineering team's mission is to enable the engineering team, rather than directly securing the product, by sharing security principles, practices, and knowledge.
The team prioritizes speed and agility while maintaining a minimum level of security, and their goal is for engineers to recognize and address security issues themselves.
The "Lever" System
The "Lever" system is designed to achieve the team's objectives.
The findings detected in each account are aggregated in the audit account and then sent to the security engineering account, where they are processed by a Lambda function.
Lever allows setting a minimum severity for each account, so that only findings at or above that level (for example, critical or high) are sent to the on-call person.
The system provides explanation tickets that focus on why an action must be taken, along with information on how to take action, to create an environment where developers can address issues on their own.
Notifications are sent to individuals, with a backup system in place to ensure risks don't fall through the cracks.
The on-call engineer is responsible until the ticket is closed, but may work with the team to address the issue.
The security engineering team collaborates with developers to investigate and understand certain security findings that may require specialized knowledge.
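The per-account severity filtering described above can be sketched as a small Lambda handler. This is a minimal illustration and not the speaker's implementation: it assumes the function receives the standard EventBridge "Security Hub Findings - Imported" event, and the account IDs, the `MIN_SEVERITY_BY_ACCOUNT` mapping, and the shape of the returned dictionaries are all hypothetical.

```python
from typing import Any

# ASFF severity labels, ordered from lowest to highest.
SEVERITY_ORDER = ["INFORMATIONAL", "LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Hypothetical per-account thresholds; the account IDs are placeholders.
MIN_SEVERITY_BY_ACCOUNT = {
    "111111111111": "CRITICAL",
    "222222222222": "HIGH",
}
# Assumption: accounts without an explicit entry only notify on CRITICAL.
DEFAULT_MIN_SEVERITY = "CRITICAL"


def meets_threshold(finding: dict[str, Any]) -> bool:
    """Return True if the finding is at or above its account's minimum severity."""
    account = finding.get("AwsAccountId", "")
    label = finding.get("Severity", {}).get("Label", "INFORMATIONAL")
    if label not in SEVERITY_ORDER:
        return False
    minimum = MIN_SEVERITY_BY_ACCOUNT.get(account, DEFAULT_MIN_SEVERITY)
    return SEVERITY_ORDER.index(label) >= SEVERITY_ORDER.index(minimum)


def handler(event: dict[str, Any], context: Any = None) -> list[dict[str, str]]:
    """Select the findings that should be routed to the on-call person."""
    findings = event.get("detail", {}).get("findings", [])
    return [
        {
            "id": f.get("Id", ""),
            "account": f.get("AwsAccountId", ""),
            "severity": f["Severity"]["Label"],
            "title": f.get("Title", ""),
        }
        for f in findings
        if meets_threshold(f)
    ]
```

In a real deployment the returned list would be handed to whatever creates the ticket and pages the on-call engineer; that step is omitted here because the summary does not say which services are used.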
Risk Management
The team uses the severity listed in the security findings to determine the risk, rather than estimating likelihood and severity, as the engineers do not yet have enough knowledge to do so.
The team initially handles only critical findings to avoid overwhelming the developers with notifications, and gradually extends coverage to lower severity levels as the team gains more experience.
The team also uses automation to properly manage the resources created by AWS Control Tower, to prevent drift.
Operational Design and Outcomes
The team spent more time on the operational design than on the implementation.
In the initial setup, the team excluded findings that did not need to be handled, and the on-call person would update the ticket in the ticketing system, which would trigger a change in the on-call system.
If the on-call person decides not to respond, the CTO makes the final decision, and the engineering team must respond if the CTO says so.
The team introduced the "Lever" system gradually, starting with a one-month ramp-up period, during which the engineering team tagged resources that they wanted to ignore.
The team uses the AWS Resource Groups Tagging API to detect newly added tags and understand why they were added.
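A minimal sketch of how such tag detection could look with the Resource Groups Tagging API's `GetResources` operation. The tag key `security-ignore`, the convention of storing the reason in the tag value, and both function names are assumptions made for illustration; the summary does not state which tag the team uses.

```python
from typing import Any


def find_tagged_resources(client: Any, tag_key: str = "security-ignore") -> dict[str, str]:
    """Map each resource ARN carrying `tag_key` to that tag's value.

    `client` is expected to be a Resource Groups Tagging API client,
    e.g. boto3.client("resourcegroupstaggingapi").
    """
    paginator = client.get_paginator("get_resources")
    tagged: dict[str, str] = {}
    # Filtering by key only matches resources with that tag, whatever its value.
    for page in paginator.paginate(TagFilters=[{"Key": tag_key}]):
        for mapping in page.get("ResourceTagMappingList", []):
            tags = {t["Key"]: t["Value"] for t in mapping.get("Tags", [])}
            tagged[mapping["ResourceARN"]] = tags.get(tag_key, "")
    return tagged


def newly_tagged(current: dict[str, str], previously_seen: set[str]) -> dict[str, str]:
    """Return only the resources that were not tagged on the previous run."""
    return {arn: reason for arn, reason in current.items() if arn not in previously_seen}
```

Running `find_tagged_resources` on a schedule and comparing against the ARNs stored from the previous run would surface new ignore tags, so the security team can follow up on the stated reason. Where the previous run's ARNs are stored is left open here.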
After deploying the "Lever" system, the team was able to reduce the number of critical findings from 17 to 0 in the first week of operation.
Conclusion
The team continues to improve their systems and operations to create a more secure and robust environment.