Inside Tripadvisor’s real-time personalization with ScyllaDB and AWS (DAT204)

Key Takeaways from the Video Transcription

Challenges with Data-Intensive Applications

  • Data-intensive applications require sustaining high throughput and predictable low latencies, often in the single-digit millisecond range.
  • As workloads grow, costs become a concern, forcing organizations to trade off cost against user experience.

Introduction to ScyllaDB

  • ScyllaDB is a highly available, distributed database optimized for workloads requiring high throughput and predictable low latencies, with a focus on infrastructure cost savings.
  • ScyllaDB can deliver 5 times higher throughput and 20 times lower latencies compared to other databases, leading to up to 75% infrastructure cost savings.
  • ScyllaDB is fully compatible with Apache Cassandra and Amazon DynamoDB APIs and is available as a managed cloud solution or a self-managed enterprise deployment.

ScyllaDB Use Cases

  • Companies like Disney+, Hulu, Discord, Epic Games, EA Games, and Starbucks rely on ScyllaDB for their data-intensive applications.

Determining if ScyllaDB is a Fit

  • The chart shown in the talk plots a cluster of users whose workloads range from 100 to 500,000 operations per second and whose P99 response-time targets range from single-digit milliseconds to under 20 milliseconds. Tripadvisor's use case falls within this cluster.
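
The fit criterion described above can be expressed as a simple range check. The following is a minimal sketch, not a tool from the talk: the `Workload` type and the function name are hypothetical, and the bounds are only the figures quoted from the chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    ops_per_second: int      # sustained operations per second
    p99_target_ms: float     # required P99 response time, in milliseconds


# Bounds quoted from the chart in the talk (assumed to be inclusive on throughput).
MIN_OPS = 100
MAX_OPS = 500_000
MAX_P99_MS = 20.0


def in_typical_cluster(workload: Workload) -> bool:
    """Return True if the workload falls inside the range the chart describes."""
    throughput_ok = MIN_OPS <= workload.ops_per_second <= MAX_OPS
    latency_ok = 0 < workload.p99_target_ms < MAX_P99_MS
    return throughput_ok and latency_ok


# 340,000 ops/s is the peak figure from the talk; the 10 ms P99 target is a
# made-up value used only for illustration.
print(in_typical_cluster(Workload(ops_per_second=340_000, p99_target_ms=10.0)))
```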

Tripadvisor's Use of ScyllaDB

System Architecture

  • Tripadvisor's platform runs on hundreds of independently scalable microservices in containers, both on-premises and on Amazon EKS.
  • The machine learning model serving platform abstracts over 100 ML models, enabling A/B testing to find the best models.
  • The custom feature store provides both static and user features, with static features stored in Redis and user features served in real-time through ScyllaDB.
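
The split between static and user features can be pictured as a thin routing layer in front of two backends. This is a rough sketch under stated assumptions, not Tripadvisor's implementation: the `FeatureStore` class, its method names, and the feature keys are hypothetical, and plain dictionaries stand in for Redis and ScyllaDB so the example runs without either service.

```python
from typing import Dict, Iterable, Mapping


class FeatureStore:
    """Routes feature lookups to a static backend or a per-user backend."""

    def __init__(
        self,
        static_backend: Mapping[str, float],
        user_backend: Mapping[str, Mapping[str, float]],
    ) -> None:
        self._static = static_backend  # stand-in for Redis (static features)
        self._user = user_backend      # stand-in for ScyllaDB (user features)

    def get_features(
        self,
        user_id: str,
        static_keys: Iterable[str],
        user_keys: Iterable[str],
    ) -> Dict[str, float]:
        """Merge the requested static and user features into one mapping."""
        features: Dict[str, float] = {}
        for key in static_keys:
            if key in self._static:
                features[key] = self._static[key]
        # Unknown users simply contribute no user features.
        user_row = self._user.get(user_id, {})
        for key in user_keys:
            if key in user_row:
                features[key] = user_row[key]
        return features


store = FeatureStore(
    static_backend={"hotel_rating": 4.5},
    user_backend={"u1": {"recent_views": 3.0}},
)
print(store.get_features("u1", ["hotel_rating"], ["recent_views"]))
```

In a real deployment the two lookups would be network calls, typically issued concurrently so the slower backend sets the overall latency.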

Technologies and Processes

  • The feature store serves up to 5 million static features per second and 500,000 user features per second.
  • User events and data are ingested through Amazon Kinesis, organized by microservices, and stored in ScyllaDB and the offline data warehouse.
  • ScyllaDB is used for the real-time user activity data, while the offline data warehouse is used for reporting and training ML models.
  • The microservices have strict latency requirements, with 95% of calls required to complete within 12 milliseconds.
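
A latency requirement of this kind is a percentile check over observed call durations. The sketch below shows one way to evaluate it, assuming latencies are collected in milliseconds and using the nearest-rank percentile definition; the function names are hypothetical, and the 12 ms budget at the 95th percentile is the figure quoted above.

```python
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of the samples, for an integer pct in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, which avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_budget(
    latencies_ms: Sequence[float], pct: int = 95, budget_ms: float = 12.0
) -> bool:
    """Return True if the pct-th percentile latency is within the budget."""
    return percentile(latencies_ms, pct) <= budget_ms


# Synthetic data for illustration: 95 fast calls and 5 slow ones.
observed = [5.0] * 95 + [30.0] * 5
print(meets_latency_budget(observed))
```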

ScyllaDB Performance

  • At peak, ScyllaDB is handling 340,000 operations per second, with CPU utilization at only 21%.
  • ScyllaDB's performance, with microsecond-level reads and millisecond-level writes, is a key reason for its adoption at Tripadvisor.
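
The peak figures above suggest substantial headroom. As a back-of-the-envelope illustration only, the sketch below extrapolates linearly from 340,000 operations per second at 21% CPU. Linear scaling is an assumption, not a claim from the talk: real clusters usually hit other limits (disk, network, tail latency) well before CPU reaches 100%, so the result is an optimistic upper bound. The function name is hypothetical.

```python
def naive_linear_ceiling(ops_per_second: float, cpu_utilization: float) -> float:
    """Extrapolate throughput at 100% CPU, assuming perfectly linear scaling."""
    if not 0 < cpu_utilization <= 1:
        raise ValueError("cpu_utilization must be in the range (0, 1]")
    return ops_per_second / cpu_utilization


peak_ops = 340_000   # peak operations per second quoted in the talk
peak_cpu = 0.21      # CPU utilization at that peak

ceiling = naive_linear_ceiling(peak_ops, peak_cpu)
print(f"Naive linear ceiling: {ceiling:,.0f} ops/s")
print(f"Headroom factor: {ceiling / peak_ops:.1f}x")
```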

Migration to ScyllaDB

  • Tripadvisor initially ran Cassandra on-premises but faced challenges with operational overhead and high tail latencies.
  • They migrated to ScyllaDB Cloud, then to the bring-your-own-account model, which allowed for improved application performance and better data privacy.
