Observing Serverless Applications: Challenges and Solutions
Serverless Applications: Understanding the Layers
- Serverless applications use computing primitives that allow developers to build applications without worrying about the underlying infrastructure.
- There are three layers of computing primitives:
- Managed Instances: Users manage everything, including the virtual machines.
- Containers: Users manage the container image, but not the networking or other infrastructure.
- True Serverless: Users only provide the code, and the runtime handles the rest (a minimal sketch follows this list).
- Similar principles apply to databases, data streaming, and queues, where users can simply request the desired capabilities without managing the underlying infrastructure.
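As a rough illustration of the "true serverless" layer, here is a minimal AWS Lambda handler in TypeScript: the developer supplies only this function, and the platform handles provisioning, scaling, and networking. The event shape follows the standard API Gateway proxy integration; the handler body itself is illustrative.

```ts
// Minimal "true serverless" unit: only the code is supplied; the runtime
// handles provisioning, scaling, and networking.
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  // Business logic only; no servers, containers, or networks to manage.
  const name = event.queryStringParameters?.name ?? "world";
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `Hello, ${name}!` }),
  };
};
```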
The Challenges of Observing Serverless Applications
- Traditional monitoring and observability tools do not fit well with serverless architectures, which often involve a myriad of functions, databases, and event buses interacting with each other.
- The "Three Pillars of Observability" (metrics, logs, and traces) have fundamental issues when applied to serverless applications:
- Logs: Logs are scattered across many log groups, making it hard to spot emerging behavior or to correlate activity across functions and databases.
- Metrics: Aggregated metrics can tell you that something is wrong, but not what caused it.
- Distributed Tracing: Traces on their own are too granular, which makes them hard to use for observing application behavior in production.
The Principles of Great Observability
- High Cardinality: Data with high cardinality, such as unique request IDs, can provide valuable insights into the behavior of your application.
- High Dimensionality: Capturing many attributes in a single "wide event" per request (high dimensionality) lets you answer almost any question about the emerging behavior of your application, as sketched below.
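A minimal sketch of the wide-event idea: emit one structured event per request that carries both high-cardinality identifiers (request ID, user ID) and many dimensions (route, region, durations, outcome). The field names here are illustrative, not a fixed schema.

```ts
// One wide, structured event per request: high-cardinality IDs plus many
// dimensions, so arbitrary questions can be answered later by slicing fields.
interface WideEvent {
  timestamp: string;
  requestId: string;      // high cardinality: unique per request
  userId: string;         // high cardinality: unique per user
  route: string;
  region: string;
  coldStart: boolean;
  dbQueryCount: number;
  dbDurationMs: number;
  totalDurationMs: number;
  statusCode: number;
  errorMessage?: string;
}

function emitWideEvent(event: WideEvent): void {
  // One JSON line per request keeps events easy to query, join, and aggregate.
  console.log(JSON.stringify(event));
}

// Example usage inside a request handler (values are illustrative):
emitWideEvent({
  timestamp: new Date().toISOString(),
  requestId: "req_8f3a2c",
  userId: "user_1042",
  route: "/checkout",
  region: "eu-west-1",
  coldStart: false,
  dbQueryCount: 3,
  dbDurationMs: 42,
  totalDurationMs: 118,
  statusCode: 200,
});
```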
Leveraging Generative AI for Observability
- Generative AI models can help synthesize and present the information from traces and events in a more human-readable format, making it easier to identify and resolve issues.
- The goal is to have the AI continuously analyze the entire set of requests, traces, and events, and provide insights and suggested fixes automatically.
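A hypothetical sketch of that idea, assuming wide events like the ones above and an LLM accessed through OpenAI's chat completions API: recent events are handed to the model, which is asked to summarize anomalies and suggest fixes. The model name and prompt wording are placeholders, not a prescribed setup.

```ts
// Hypothetical sketch: ask an LLM to summarize recent wide events and suggest
// likely causes. Model name and prompt wording are placeholders.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function summarizeEvents(events: object[]): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini", // assumption: any capable chat model would do
    messages: [
      {
        role: "system",
        content:
          "You are an observability assistant. Given recent request events, " +
          "identify anomalies, likely root causes, and suggested fixes.",
      },
      { role: "user", content: JSON.stringify(events) },
    ],
  });
  return response.choices[0].message.content ?? "";
}
```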
Cloudflare's Approach to Serverless Observability
- Cloudflare is building a composable, internet-native platform that provides compute, storage, and observability capabilities directly on the Cloudflare network.
- The platform includes primitives like Cloudflare Workers, R2 (S3-compatible object storage), Queues, and Durable Objects (stateful serverless); a sketch of composing these primitives follows this list.
- Cloudflare is integrating advanced observability features, such as high-cardinality and high-dimensionality events, directly into the platform, making it easier for developers to build observable serverless applications.
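As a rough sketch of composing these primitives, here is a Worker that stores a request body in R2, hands follow-up work to a Queue, and emits one wide event per request. The binding names (MY_BUCKET, MY_QUEUE) are illustrative and would be configured in wrangler.toml.

```ts
// Illustrative Worker composing Cloudflare primitives; binding names are
// placeholders configured in wrangler.toml.
export interface Env {
  MY_BUCKET: R2Bucket; // R2: S3-compatible object storage
  MY_QUEUE: Queue;     // Queues: asynchronous message delivery
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const start = Date.now();
    const requestId = crypto.randomUUID();

    // Store the request body in R2 and defer follow-up work to a queue.
    await env.MY_BUCKET.put(`uploads/${requestId}`, request.body);
    await env.MY_QUEUE.send({ requestId, receivedAt: start });

    // One wide event per request, picked up by the platform's log tooling.
    console.log(
      JSON.stringify({
        requestId,
        route: new URL(request.url).pathname,
        durationMs: Date.now() - start,
        statusCode: 202,
      })
    );
    return new Response("accepted", { status: 202 });
  },
};
```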
Depot's Observability Challenges and Solutions
- Depot, a build acceleration platform, uses a mix of serverless and traditional compute (EC2) components to build its architecture.
- Depot faces challenges in observing performance and bottlenecks across its serverless components, which span multiple providers (AWS and Cloudflare).
- Depot leverages observability tools and OpenTelemetry to monitor latency, cold starts, and other key performance indicators, allowing them to quickly identify and address issues in their serverless architecture.
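A minimal sketch of that kind of instrumentation, using the OpenTelemetry JavaScript API: a span wraps a build step and records attributes for cold starts and the provider, so latency and bottlenecks can be compared across AWS and Cloudflare. The span and attribute names are illustrative, not Depot's actual schema.

```ts
// Illustrative OpenTelemetry instrumentation; span and attribute names are
// placeholders, not Depot's actual schema.
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("build-service");

async function doBuildWork(): Promise<void> {
  // Placeholder for the actual build step.
}

async function runBuildStep(coldStart: boolean): Promise<void> {
  await tracer.startActiveSpan("build.step", async (span) => {
    span.setAttribute("faas.coldstart", coldStart);
    span.setAttribute("cloud.provider", "aws"); // or "cloudflare"
    try {
      await doBuildWork();
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw err;
    } finally {
      span.end();
    }
  });
}
```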
In summary, the key to building observable serverless applications is to adopt the principles of high cardinality and high dimensionality, and leverage advanced tools and techniques, including generative AI, to gain deep insights into the behavior of your application.