Comparison of LLM Prompt Caching: Cloudflare AI Gateway, Portkey, and Amazon Bedrock
Vishwasa Navada K
5 min read Jun 2, 2025

As more applications rely on large language models (LLMs), prompt caching has become essential for keeping these systems efficient and cost-effective. Caching avoids sending repeated requests to the LLM, which saves money and decreases latency, and the gateways that provide it typically add observability and manageability on top.

In this article, we compare three major prompt caching tools: Cloudflare AI Gateway, Portkey, and Amazon Bedrock prompt caching. Each brings its own strengths, whether performance, observability, control, or security. Let’s explore each.

Cloudflare AI Gateway

Cloudflare AI Gateway acts as a proxy to AI providers such as OpenAI, enabling features like cost and token analytics, observability, logging, and caching.

Key Features

  • Caches LLM responses based on prompts and serves them from Cloudflare Edge Cache.
  • Supports custom cost parameters to track external and LLM-specific costs.
  • Rate limiting.
  • Additional authorization header support.
  • Guardrails to flag or block unsafe or inappropriate content.
  • Evaluation tools.
  • Integrates seamlessly with other Cloudflare services like Workers AI and AutoRAG.

Configuration

Point your LLM provider’s SDK (or the Vercel AI SDK) at the gateway’s baseURL to route requests through AI Gateway.

OpenAI SDK

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://gateway.ai.cloudflare.com/v1/account-id/gateway/openai",
});
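
Cache behavior can also be tuned per request. A minimal sketch, assuming Cloudflare’s cf-aig-cache-ttl and cf-aig-skip-cache gateway headers and an illustrative model name (verify header names against the current AI Gateway docs):

const completion = await openai.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "What is prompt caching?" }],
  },
  {
    headers: {
      "cf-aig-cache-ttl": "3600", // cache this response for one hour
      // "cf-aig-skip-cache": "true", // set to bypass the cache for this request
    },
  }
);

console.log(completion.choices[0].message.content);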
Vercel AI SDK

import { createOpenAI } from "@ai-sdk/openai";

const openai = createOpenAI({
  baseURL: "https://gateway.ai.cloudflare.com/v1/account-id/gateway/openai",
});
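
Requests then flow through the gateway transparently. A usage sketch, assuming the ai package’s generateText helper and an illustrative model name; repeated identical prompts can be served from Cloudflare’s edge cache:

import { generateText } from "ai";

const { text } = await generateText({
  model: openai("gpt-4o-mini"), // provider instance from above, routed via the gateway
  prompt: "Explain prompt caching in one sentence.",
});

console.log(text);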

Model Support (as of April 2025)

Supported model providers include: Amazon Bedrock, Anthropic, Azure OpenAI, Cartesia, Cerebras, Cohere, DeepSeek, ElevenLabs, Google AI Studio, Google Vertex AI, Grok, Groq, HuggingFace, Mistral AI, OpenAI, OpenRouter, Perplexity, Universal Endpoint, Replicate, Workers AI.

Pricing

AI Gateway is free to use, with a 25 MB maximum request size and cache TTLs of up to one month. A usage-based paid plan adds persistent logging and advanced analytics.

Portkey

Portkey is a platform focused on simplifying AI observability and control. Unlike Cloudflare AI Gateway, Portkey doesn’t require storing actual LLM API keys in your environment. Instead, it uses virtual keys issued by the platform.

Key Features

  • Simple and semantic caching (semantic caching uses similarity detection).
  • Automatic retries.
  • Fallback and conditional routing (e.g., fastest, cheapest, smartest).
  • Guardrails via third-party tool integration.
  • Budget limits.
  • Integrations with external platforms (e.g., vector databases, agents, and libraries).

Configuration

Portkey can be used via its own SDK, or through the OpenAI or Vercel AI SDKs pointed at the Portkey gateway.

Portkey SDK

import { Portkey } from 'portkey-ai';

const portkey = new Portkey({
  apiKey: "portkeyAPIKey",
  virtualKey: "open-ai-virtual-<id>"
});

const chatCompletion = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'What is Portkey' }],
  model: "gpt-4o",
  maxTokens: 64
});

console.log(chatCompletion.choices);
OpenAI SDK with Portkey Gateway

import OpenAI from 'openai';
import { createHeaders, PORTKEY_GATEWAY_URL } from 'portkey-ai';

const openai = new OpenAI({
  apiKey: 'xxx', // ignored when routing through Portkey
  baseURL: PORTKEY_GATEWAY_URL,
  defaultHeaders: createHeaders({
    virtualKey: "open-ai-virtual-<id>",
    apiKey: "portkeyAPIKey"
  })
});
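
Caching and retries are enabled through a gateway config attached to the client. A hedged sketch, assuming Portkey’s documented config schema (cache mode and max_age, retry attempts); verify field names against the current docs:

import { Portkey } from 'portkey-ai';

const portkey = new Portkey({
  apiKey: "portkeyAPIKey",
  virtualKey: "open-ai-virtual-<id>",
  config: {
    cache: { mode: "semantic", max_age: 3600 }, // "simple" for exact-match caching
    retry: { attempts: 3 }                      // automatic retries on transient failures
  }
});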

Model Support (as of April 2025)

Supported model providers include: OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Cohere, Google Gemini, Vertex AI, Perplexity AI, AI21, Anyscale, Byollm, Dashscope, Deepbricks, DeepInfra, DeepSeek, Fireworks AI, Github Models, Github, Google Palm, Groq, Jina AI, Lambda, Lemonfox AI, Lingyi (01.ai), LocalAI, Mistral AI, MonsterAPI, Moonshot, nCompass, Nebius, Nomic AI, Ollama, Openrouter, Predibase, Recraft AI, Recraft, Siliconflow, Snowflake Cortex, Snowflake, Stability AI, Together AI, Triton, Upstage, Voyage-AI, Workers AI, ZhipuAI.

Pricing

Portkey’s free tier includes simple caching and logging (10k logs, 90-day max cache age). The subscription plan starts at $49/month and unlocks premium features like semantic caching and access control.

Amazon Bedrock Prompt Caching

Amazon Bedrock provides managed prompt caching to reduce latency and costs, especially for high-throughput applications like document Q&A chatbots or coding assistants.

Key Features

  • Caches prompt prefixes at checkpoints you mark, once the prefix exceeds the model’s minimum token threshold.
  • Reduces latency by up to 85% and costs by up to 90%.
  • Integrates with Bedrock Agents.
  • Cache TTL is 5 minutes from last access (each read resets it).

Configuration

For supported models, caching is opt-in per request: you insert a cache checkpoint after the reusable part of the prompt, for example with the Converse API:


import { BedrockRuntimeClient, ConverseCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-west-2" });

// Everything before the cachePoint marker becomes a reusable cached prefix.
const command = new ConverseCommand({
  modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0",
  messages: [
    {
      role: "user",
      content: [
        { text: "<long, reusable context such as a document or instructions>" },
        { cachePoint: { type: "default" } },
        { text: "What does the document say about pricing?" },
      ],
    },
  ],
});

const response = await client.send(command);
console.log(response.output);

Model Support (as of April 2025)

Prompt caching is supported for:

  • Anthropic Claude 3.5 Sonnet V2
  • Claude 3.5 Haiku
  • Amazon Nova models (Micro, Lite, and Pro)

Available in AWS regions: US West (Oregon) and US East (N. Virginia).

Pricing

Cache reads are billed at a discounted rate compared to standard input tokens. Claude models also charge a premium for cache writes, while Amazon Nova models do not. For prompts with large repeated prefixes, caching significantly lowers costs.
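
To see the effect, here is illustrative arithmetic only; the rates are hypothetical placeholders, not published Bedrock prices. Assume cache reads bill at 10% of the normal input-token rate and 4,000 tokens of a 4,500-token prompt are cached:

const inputRatePer1k = 0.003;                    // $/1K input tokens (hypothetical)
const cacheReadRatePer1k = inputRatePer1k * 0.1; // cache reads at 10% (hypothetical)

const uncachedCost = (4500 / 1000) * inputRatePer1k;
const cachedCost = (4000 / 1000) * cacheReadRatePer1k + (500 / 1000) * inputRatePer1k;

console.log({ uncachedCost, cachedCost }); // { uncachedCost: 0.0135, cachedCost: 0.0027 }

Under these assumed rates, the cached request costs 80% less on input tokens, before any cache-write charges.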

TL;DR

| Feature | Cloudflare AI Gateway | Portkey | Amazon Bedrock Caching |
| --- | --- | --- | --- |
| Caching type | Edge-based prompt-response caching | Simple and semantic caching | Managed prompt prefix caching |
| Caching control | TTL settings via Cloudflare caching rules | TTL and semantic match tuning | Automatic, 5 min TTL (extends on read) |
| Retry logic | No | Yes (automatic retries) | No |
| Routing/fallbacks | Yes | Yes (conditional routing based on cost/speed/accuracy) | No |
| Observability | Basic analytics and logging | Advanced analytics, budget limits, third-party integrations | Limited to AWS metrics |
| Guardrails | Basic content flagging | Third-party guardrail integration | Native AWS security compliance |
| Multi-provider support | Some major providers | Extensive list of providers | Bedrock models only |
| Pricing model | Free tier + paid plans | Free tier + paid plans ($49+/month) | Pay per cache read/write usage |

What Should You Choose?

  • Go with Cloudflare AI Gateway if you want fast, edge-based caching and already use Cloudflare.
  • Choose Portkey if you need fine-grained control, are iterating on many prompts, or route across multiple LLM providers.
  • Pick Amazon Bedrock if you’re on AWS and want a simple, secure solution with minimal setup.