AWS re:Invent 2025 - Ripple: Building an intelligent, multi-agent system for 24/7 operations-IND3301

Ripple: Building an Intelligent, Multi-Agent System for 24/7 Operations

Overview

  • Ripple, a fintech company, has built an AI-powered platform on AWS services to run its operations.
  • The goal is to leverage AI and automation to improve the monitoring and troubleshooting of the XRPL (XRP Ledger) blockchain network.

Challenges

  • Reliance on C++ experts to make sense of the large volumes of debug logs from the decentralized XRPL network.
  • Difficulty in quickly identifying and resolving issues due to the complexity of the peer-to-peer network.
  • Need for a more scalable and efficient way to process and analyze the massive amounts of operational data.

Solution Architecture

The solution consists of several key components:

Log Processing Pipeline

  • Raw logs from XRPL nodes (validators, hubs, client handlers) are ingested into Amazon S3 using GitHub workflows and AWS Systems Manager.
  • An S3 event trigger invokes a Lambda function to parse the log data, extract relevant chunks, and store them in Amazon CloudWatch.
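
The parse-and-chunk step above can be sketched roughly as follows. This is a minimal sketch, not Ripple's implementation: the log line layout matched by `LINE_RE`, the function names, and the chunk size are all assumptions made for illustration. The write to CloudWatch is left as an injected `sink` callable so the sketch does not assume any particular API call.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple
from urllib.parse import unquote_plus

# Assumed (hypothetical) line layout: "<date> <time> UTC <Partition>:<SEV> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) UTC (?P<partition>\w+):(?P<severity>[A-Z]{3}) (?P<message>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the structured fields of one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def s3_objects(event: dict) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def chunked(items: Iterable[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `size` items."""
    batch: List[Dict[str, str]] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def process_lines(
    lines: Iterable[str],
    sink: Callable[[List[Dict[str, str]]], None],
    chunk_size: int = 500,
) -> int:
    """Parse lines, skip the ones that do not match, and hand chunks to `sink`."""
    parsed = (rec for rec in map(parse_line, lines) if rec is not None)
    sent = 0
    for batch in chunked(parsed, chunk_size):
        sink(batch)  # stand-in for the write to CloudWatch
        sent += len(batch)
    return sent
```

In a Lambda handler, `s3_objects(event)` would identify which objects to read, and `process_lines` would run over each object's lines with a `sink` that forwards the chunks to the log store.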

Code Analysis Pipeline

  • The rippled and Standards repositories are automatically synced using Amazon EventBridge.
  • The Git repository processor pulls the latest changes, versions the code and documentation, and stores them in Amazon S3.
  • A knowledge base ingestion job then loads this data into an Amazon Neptune graph database.

Graph-Based Code Analysis

  • The code and documentation data stored in Neptune is used to build a lexical graph, capturing relationships between entities like functions, classes, and modules.
  • This graph-based approach enables efficient retrieval and analysis of the codebase, drawing on Amazon Bedrock Knowledge Bases and its re-ranking capabilities.
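
To make the idea of a graph over code entities concrete, here is a small in-memory sketch. It is only an illustration of the data structure: `CodeGraph`, the relation name `calls`, and the entity names in the usage example are hypothetical, and nothing here uses the Neptune or Bedrock APIs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class CodeGraph:
    """Tiny in-memory stand-in for a graph of code entities (not the Neptune API)."""

    def __init__(self) -> None:
        self.kinds: Dict[str, str] = {}
        self.edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add_entity(self, name: str, kind: str) -> None:
        """Register an entity such as a function, class, or module."""
        self.kinds[name] = kind

    def relate(self, source: str, relation: str, target: str) -> None:
        """Add a directed, labeled edge from source to target."""
        self.edges[source].add((relation, target))

    def neighbors(self, name: str, relation: str) -> List[str]:
        """Direct targets of `name` over one relation type, sorted by name."""
        return sorted(t for r, t in self.edges.get(name, ()) if r == relation)

    def reachable(self, start: str, relation: str) -> List[str]:
        """All entities reachable from `start` by following one relation type."""
        seen: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in self.neighbors(node, relation):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return sorted(seen)


# Usage with hypothetical entity names:
graph = CodeGraph()
graph.add_entity("start_round", "function")
graph.add_entity("build_proposal", "function")
graph.add_entity("send_to_peers", "function")
graph.relate("start_round", "calls", "build_proposal")
graph.relate("build_proposal", "calls", "send_to_peers")
print(graph.reachable("start_round", "calls"))
```

A traversal like `reachable` is the kind of question a graph answers cheaply and a flat text index does not: which code is transitively involved when a given function runs.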

Multi-Agent Platform

  • The core of the solution is a multi-agent platform built using the Strands Agents SDK, an open-source framework for building and coordinating AI agents.
  • The platform consists of four AI agents:
    1. Orchestrator Agent: Receives user queries, classifies the intent, and coordinates the execution of specialist agents.
    2. Code Analysis Agent: Leverages the graph-based knowledge base to provide insights about the XRPL codebase.
    3. Log Analysis Agent: Performs operational analytics on the CloudWatch log data, using a CloudWatch query generator agent to construct accurate queries.
    4. CloudWatch Query Generator Agent: Generates optimized CloudWatch queries based on predefined patterns and instructions.
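
The division of labor among the four agents can be sketched in plain Python. This is a minimal sketch under stated assumptions: it does not use the Strands SDK, the keyword-based `classify_intent` is only a stand-in for LLM-based intent classification, and the entry in `QUERY_PATTERNS` is a hypothetical pattern written in CloudWatch Logs Insights syntax, since the talk's predefined patterns are not given here.

```python
from typing import Callable, Dict

# Hypothetical predefined pattern, assuming CloudWatch Logs Insights syntax.
QUERY_PATTERNS: Dict[str, str] = {
    "count_by_message": (
        "fields @timestamp, @message"
        " | filter @message like /{term}/"
        " | stats count(*) as hits by bin({bin})"
    ),
}


def generate_query(pattern: str, **params: str) -> str:
    """Query generator: fill a predefined pattern with parameters."""
    return QUERY_PATTERNS[pattern].format(**params)


def code_agent(question: str) -> str:
    """Code analysis specialist (stub): would consult the code knowledge base."""
    return f"[code-analysis] would search the code knowledge base for: {question}"


def log_agent(question: str) -> str:
    """Log analysis specialist (stub): delegates query construction."""
    # A real agent would derive the term and bin size from the question.
    query = generate_query("count_by_message", term="proposal", bin="5m")
    return f"[log-analysis] would run: {query}"


def classify_intent(question: str) -> str:
    """Keyword stand-in for the orchestrator's LLM-based intent classification."""
    text = question.lower()
    if any(word in text for word in ("log", "how many", "count", "received")):
        return "logs"
    return "code"


SPECIALISTS: Dict[str, Callable[[str], str]] = {"logs": log_agent, "code": code_agent}


def orchestrate(question: str) -> str:
    """Orchestrator: classify the question, then hand it to one specialist."""
    return SPECIALISTS[classify_intent(question)](question)


print(orchestrate("How many proposals were received in the last hour?"))
print(orchestrate("What does the consensus function do?"))
```

Keeping query construction in its own component means the log analysis agent never writes query syntax freehand, which is one plausible reading of why the talk gives the query generator its own agent.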

Prompt Engineering and Model Integration

  • The AI agents use carefully crafted system prompts to define their roles, responsibilities, and behavioral guardrails.
  • The agents are powered by large language models, with the flexibility to use different models for different tasks (e.g., a lighter model for the Orchestrator, a more capable model for the specialist agents).
  • The agents use the Model Context Protocol (MCP) to interact with external systems like Amazon CloudWatch.
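
One way to keep each agent's role, guardrails, and model choice in a single place is a small declarative spec per agent. The sketch below is an assumption about how such a configuration could look, not the configuration shown in the talk: `AgentSpec`, the tier labels `light` and `capable`, and the guardrail wording are all illustrative, and the tier labels are placeholders rather than real model identifiers.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AgentSpec:
    name: str
    model_tier: str  # placeholder label ("light" or "capable"), not a real model ID
    role: str
    guardrails: Tuple[str, ...] = ()

    def system_prompt(self) -> str:
        """Render the role and guardrails into one system prompt string."""
        lines = [f"You are the {self.name}.", f"Role: {self.role}"]
        if self.guardrails:
            lines.append("Guardrails:")
            lines.extend(f"- {rule}" for rule in self.guardrails)
        return "\n".join(lines)


# Illustrative specs; the wording is hypothetical.
AGENTS: Dict[str, AgentSpec] = {
    "orchestrator": AgentSpec(
        name="Orchestrator Agent",
        model_tier="light",
        role="Classify the user's intent and delegate to a specialist agent.",
        guardrails=("Do not answer technical questions directly.",),
    ),
    "log_analysis": AgentSpec(
        name="Log Analysis Agent",
        model_tier="capable",
        role="Answer operational questions from log data.",
        guardrails=("Only report figures returned by a query.",),
    ),
}

print(AGENTS["orchestrator"].system_prompt())
```

Separating the tier from the prompt makes it straightforward to route the orchestrator to a lighter model while the specialists use a more capable one.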

Demonstration and Key Capabilities

  • The solution is demonstrated through a chat-based interface, where users can ask questions about the XRPL network and receive detailed, correlated responses.
  • Examples include:
    • Identifying the number of proposals received by a UNL (Unique Node List) validator from other peers.
    • Correlating log messages to understand the sequence of events during a consensus round.
  • The solution provides enhanced data troubleshooting and processing capabilities, reducing the time and effort required to investigate operational issues.
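
The first example question, counting proposals received from each peer, reduces to a grouped count once log lines are structured. The sketch below assumes records that have already been parsed into dictionaries; the field names `event` and `peer` and the value `proposal_received` are hypothetical, not the actual XRPL log schema.

```python
from collections import Counter
from typing import Dict, Iterable


def proposals_by_peer(records: Iterable[Dict[str, str]]) -> Counter:
    """Count proposal events per sending peer, ignoring all other events."""
    counts: Counter = Counter()
    for record in records:
        if record.get("event") == "proposal_received":
            counts[record["peer"]] += 1
    return counts


# Hypothetical parsed records:
sample = [
    {"event": "proposal_received", "peer": "peer-a"},
    {"event": "proposal_received", "peer": "peer-b"},
    {"event": "validation_received", "peer": "peer-a"},
    {"event": "proposal_received", "peer": "peer-a"},
]
print(proposals_by_peer(sample).most_common())
```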

Lessons Learned and Future Opportunities

  • Importance of context engineering and prompt design for effective AI-powered solutions.
  • Benefit of removing the bi-directional dependency between platform engineers and C++ experts.
  • Potential to expand the solution's capabilities, such as:
    • Leveraging the agent memory and identity management features of Amazon Bedrock AgentCore.
    • Implementing transaction-level monitoring and analysis to detect and mitigate network-level issues like spam and scams.

Technical Details

  • AWS services used: Amazon S3, Amazon CloudWatch, Amazon Neptune, Amazon EventBridge, Amazon API Gateway, Amazon Cognito, AWS Lambda, AWS Secrets Manager, AWS Systems Manager Parameter Store.
  • AI frameworks and tools: Strands Agents SDK, Amazon Bedrock AgentCore, Claude language models.
  • Key metrics: 900+ XRPL nodes, 100 million ledgers processed, 0.4 cents per transaction fee.

Business Impact

  • Improved operational efficiency and reduced time to investigate and resolve issues in the XRPL network.
  • Empowered platform engineers to quickly analyze logs and codebase without relying on C++ experts.
  • Potential to expand the solution's capabilities to address additional operational and security challenges in the XRPL ecosystem.

Conclusion

Ripple's AI-powered operations platform demonstrates the power of leveraging AI and automation to tackle complex, decentralized infrastructure challenges. By building a multi-agent system on AWS, Ripple has been able to improve the monitoring and troubleshooting of their XRPL network, leading to enhanced operational efficiency and reduced reliance on specialized expertise.
