AWS re:Invent 2025-Finding the Noisy Neighbor: Patterns for Per‑Customer Performance at Scale-MAM354

Finding the Noisy Neighbor: Patterns for Per‑Customer Performance at Scale

Challenges of High Cardinality Infrastructure

Zendesk, an AWS partner, runs shared infrastructure that serves hundreds of thousands of customers, which creates high-cardinality challenges:

Monitoring and troubleshooting issues that affect only a small subset of customers on shared infrastructure
Balancing observability, cost, and data sensitivity
Managing the "teeter-totter" between cost and observability, where gaining visibility raises spend and cutting spend reduces visibility

Defining the "Noisy Neighbor" Problem

A "noisy neighbor" is one customer whose activity degrades performance for other customers on the same shared infrastructure

Troubleshooting a customer incident with limited visibility into the underlying infrastructure

Improving Observability with Tagging and Tracing

Tagging infrastructure components (e.g., by tenant and microservice) to gain more visibility into a customer's issue

Using APM (Application Performance Monitoring) tools like Datadog to trace the entire call flow and identify potential bottlenecks
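As a rough illustration of why tags matter, the sketch below records a metric with tenant and service tags and then groups it by tenant. It is a minimal in-memory stand-in, not the Datadog client or the speakers' implementation; `TaggedMetrics`, the metric name `db.query_ms`, and the tenant names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaggedMetrics:
    """Hypothetical in-memory stand-in for a metrics client that accepts tags."""

    # Maps (metric name, sorted tag tuple) -> list of recorded values.
    points: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, name: str, value: float, **tags: str) -> None:
        """Store one measurement together with its tags."""
        key = (name, tuple(sorted(tags.items())))
        self.points[key].append(value)

    def total_by(self, name: str, tag: str) -> dict:
        """Sum a metric's values grouped by a single tag, e.g. per tenant."""
        totals = defaultdict(float)
        for (metric, tags), values in self.points.items():
            if metric == name:
                totals[dict(tags).get(tag, "untagged")] += sum(values)
        return dict(totals)


metrics = TaggedMetrics()
metrics.record("db.query_ms", 12.0, tenant="acme", service="tickets")
metrics.record("db.query_ms", 950.0, tenant="globex", service="tickets")
metrics.record("db.query_ms", 15.0, tenant="acme", service="search")

# Without the tenant tag, these three points would be indistinguishable.
per_tenant = metrics.total_by("db.query_ms", "tenant")
```

Grouping by the `tenant` tag is what turns an aggregate latency number into a per-customer view; the same grouping by `service` would point at the microservice instead.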

The "REST" Approach to Observability

Recognize: Identify the key metrics, logs, and infrastructure monitoring needed to detect issues

Examine: Analyze the data to determine the root cause, such as hot partitions, backlogs, or latency spikes

Shape: Implement controls and limits to protect customers, such as rate limiting or adjusting resource allocations

Test: Regularly review the effectiveness of the observability setup and make iterative improvements
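The "Shape" step mentions rate limiting as one possible control. One common way to do that is a token bucket per tenant, sketched below. This is an assumption about how such a limit could look, not the mechanism described in the session; `TokenBucket`, `TenantLimiter`, and the rate and capacity values are hypothetical. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float              # tokens refilled per second
    capacity: float          # maximum burst size
    tokens: float = 0.0      # tokens currently available
    updated_at: float = 0.0  # timestamp of the last refill

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if possible."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantLimiter:
    """Keeps one bucket per tenant so a busy tenant only drains its own budget."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str, now: float) -> bool:
        bucket = self.buckets.get(tenant_id)
        if bucket is None:
            # New tenants start with a full bucket.
            bucket = TokenBucket(self.rate, self.capacity,
                                 tokens=self.capacity, updated_at=now)
            self.buckets[tenant_id] = bucket
        return bucket.allow(now)


limiter = TenantLimiter(rate=1.0, capacity=2.0)
noisy = [limiter.allow("noisy", now=0.0) for _ in range(3)]  # third call is rejected
quiet = limiter.allow("quiet", now=0.0)                      # unaffected by "noisy"
```

Because each tenant has its own bucket, the rejected request from the hypothetical `noisy` tenant does not reduce what `quiet` is allowed to do.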

Optimizing Observability Costs

Reducing log ingestion and indexing by only capturing necessary data (e.g., errors, specific customer traces)

Leveraging less expensive observability tiers, such as APM and metrics, to gain visibility without the high cost of full log ingestion

Aligning observability costs with the value provided to the business and customers
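Capturing only errors and specific customer traces can be approximated at the application side with a log filter. The sketch below uses Python's standard `logging` module and assumes each record carries a `tenant_id` attribute supplied via `extra`; the `TenantAwareFilter` and `ListHandler` names, the logger name, and the tenant IDs are hypothetical, and a real pipeline might apply the same rule at the agent or ingestion layer instead.

```python
import logging


class TenantAwareFilter(logging.Filter):
    """Keep errors for every tenant, plus all records for watched tenants."""

    def __init__(self, watched_tenants):
        super().__init__()
        self.watched_tenants = set(watched_tenants)

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.ERROR:
            return True
        # `tenant_id` is an assumed attribute set through `extra=...`.
        return getattr(record, "tenant_id", None) in self.watched_tenants


class ListHandler(logging.Handler):
    """Collects messages in memory; stands in for a log shipper."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


logger = logging.getLogger("ingest_sketch")
logger.setLevel(logging.DEBUG)
logger.propagate = False

handler = ListHandler()
handler.addFilter(TenantAwareFilter({"tenant-42"}))
logger.addHandler(handler)

logger.info("cache hit", extra={"tenant_id": "tenant-7"})    # dropped
logger.info("slow query", extra={"tenant_id": "tenant-42"})  # kept: watched tenant
logger.error("timeout", extra={"tenant_id": "tenant-7"})     # kept: error level
```

The watched-tenant set can be widened during an incident and narrowed afterwards, so detailed logs are paid for only while they are needed.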

Lessons Learned and Key Takeaways

Start with a tagging strategy before building dashboards

Invest early in per-tenant KPIs and heat maps to identify performance issues

Treat cost as a first-class dimension in observability decisions

Ensure that proper metadata and context are captured to enable proactive issue resolution

Foster a company culture that is comfortable discussing and optimizing observability costs
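A per-tenant KPI can be as simple as a latency percentile computed for each tenant, with outliers flagged against the typical tenant. The sketch below is an illustrative assumption rather than the speakers' method: it uses a nearest-rank 95th percentile, flags tenants whose value exceeds a hypothetical factor of three times the median tenant, and runs on made-up sample data. The same per-tenant values could feed a heat map.

```python
import math
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def per_tenant_p95(samples):
    """Group (tenant_id, latency_ms) samples and compute p95 per tenant."""
    by_tenant = defaultdict(list)
    for tenant_id, latency_ms in samples:
        by_tenant[tenant_id].append(latency_ms)
    return {tenant: p95(latencies) for tenant, latencies in by_tenant.items()}


def outliers(kpis, factor=3.0):
    """Tenants whose p95 exceeds `factor` times the median tenant's p95."""
    ordered = sorted(kpis.values())
    median = ordered[len(ordered) // 2]
    return sorted(t for t, value in kpis.items() if value > factor * median)


# Made-up latency samples in milliseconds.
samples = [
    ("acme", 10), ("acme", 12), ("acme", 11), ("acme", 13),
    ("globex", 9), ("globex", 10), ("globex", 500), ("globex", 520),
    ("initech", 14), ("initech", 15), ("initech", 13), ("initech", 16),
]

kpis = per_tenant_p95(samples)
suspects = outliers(kpis)
```

Comparing each tenant against the median tenant, rather than against a fixed threshold, keeps the check meaningful as overall load changes.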

Technical Details and Business Impact

Zendesk uses AWS services such as Aurora, ElastiCache, and EC2 to power its high-cardinality infrastructure

Improved response times and customer satisfaction by gaining better visibility into the infrastructure

Optimized cloud resource utilization and observability costs through iterative improvements

Real-World Examples and Results

Reduced log ingestion and indexing costs by only capturing necessary data (e.g., errors, specific customer traces)

Leveraged APM and metrics to gain visibility without the high cost of full log ingestion

Aligned observability costs with the value provided to the business and customers, avoiding budget conflicts
