Comparison of LLM Prompt Caching: Cloudflare AI Gateway, Portkey, and Amazon Bedrock
As more applications rely on large language models (LLMs), prompt caching has become essential for making these systems efficient and cost-effective. By serving repeated prompts from a cache instead of re-querying the model, caching saves money and reduces latency, and the gateways that provide it typically add observability and management features on top.
In this article, we compare three major prompt caching tools: Cloudflare AI Gateway, Portkey, and Amazon Bedrock prompt caching. Each brings its own strengths, whether performance, observability, control, or security. Let's explore each.
Cloudflare AI Gateway
Cloudflare AI Gateway acts as a proxy to AI providers such as OpenAI, enabling features like cost and token analytics, observability, logging, and caching.
Key Features
- Caches LLM responses based on prompts and serves them from Cloudflare's edge cache (per-request control is shown in the sketch after this list).
- Supports custom cost parameters to track external and LLM-specific costs.
- Rate limiting.
- Additional authorization header support.
- Guardrails to flag or block unsafe or inappropriate content.
- Evaluation tools.
- Integrates seamlessly with other Cloudflare services like Workers AI and AutoRAG.
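Several of these behaviors are controlled per request with cf-aig-* headers. A minimal sketch (header names follow Cloudflare's AI Gateway docs; the account ID and gateway name in the URL are placeholders):

// Cache this response for one hour; other headers such as cf-aig-skip-cache
// or cf-aig-custom-cost work the same way.
const response = await fetch(
  "https://gateway.ai.cloudflare.com/v1/account-id/gateway/openai/chat/completions",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "cf-aig-cache-ttl": "3600",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "What is prompt caching?" }],
    }),
  },
);
console.log(await response.json());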
Configuration
To route requests through the AI Gateway, point your LLM provider's SDK or the Vercel AI SDK at the gateway's baseURL.
OpenAI SDK
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  // Replace "account-id" and "gateway" with your Cloudflare account ID and gateway name.
  baseURL: "https://gateway.ai.cloudflare.com/v1/account-id/gateway/openai",
});
Vercel AI SDK
import { createOpenAI } from "@ai-sdk/openai";

const openai = createOpenAI({
  // The API key defaults to the OPENAI_API_KEY environment variable.
  baseURL: "https://gateway.ai.cloudflare.com/v1/account-id/gateway/openai",
});
Model Support (as of April 2025)
Supported model providers include: Amazon Bedrock, Anthropic, Azure OpenAI, Cartesia, Cerebras, Cohere, DeepSeek, ElevenLabs, Google AI Studio, Google Vertex AI, Grok, Groq, HuggingFace, Mistral AI, OpenAI, OpenRouter, Perplexity, Universal Endpoint, Replicate, Workers AI.
Pricing
AI Gateway's core features are free to use, with limits such as a 25 MB maximum request size and a maximum cache TTL of one month. A usage-based paid plan adds persistent logging and advanced analytics.
Portkey
Portkey is a platform focused on simplifying AI observability and control. Unlike Cloudflare AI Gateway, Portkey doesn’t require storing actual LLM API keys in your environment. Instead, it uses virtual keys issued by the platform.
Key Features
- Simple and semantic caching (semantic caching matches prompts by similarity rather than exact text); a config sketch follows this list.
- Automatic retries.
- Fallback and conditional routing (e.g., fastest, cheapest, smartest).
- Guardrails via third-party tool integration.
- Budget limits.
- Integrations with external platforms (e.g., vector databases, agents, and libraries).
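Caching, retries, and fallbacks are all expressed in a gateway config attached to the client or to individual requests. A rough sketch based on Portkey's config schema (values and virtual keys here are placeholders; check the current docs for the exact options):

import { Portkey } from 'portkey-ai';

const portkey = new Portkey({
  apiKey: "portkeyAPIKey", // your Portkey API key
  config: {
    cache: { mode: "semantic", max_age: 3600 }, // or "simple" for exact-match caching
    retry: { attempts: 3 },                     // automatic retries on failure
    strategy: { mode: "fallback" },             // try targets in order
    targets: [
      { virtual_key: "open-ai-virtual-<id>" },   // primary provider
      { virtual_key: "anthropic-virtual-<id>" }, // hypothetical fallback provider
    ],
  },
});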
Configuration
Portkey can be used via its own SDK, or through the OpenAI or Vercel AI SDKs pointed at the Portkey gateway.
Portkey SDK
import { Portkey } from 'portkey-ai';

const portkey = new Portkey({
  apiKey: "portkeyAPIKey",           // your Portkey API key
  virtualKey: "open-ai-virtual-<id>" // virtual key for the underlying provider
});

const chatCompletion = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'What is Portkey?' }],
  model: "gpt-4o",
  maxTokens: 64
});

console.log(chatCompletion.choices);
OpenAI SDK with Portkey Gateway
import OpenAI from 'openai';
import { createHeaders, PORTKEY_GATEWAY_URL } from 'portkey-ai';

const openai = new OpenAI({
  apiKey: 'xxx', // ignored; authentication happens via the Portkey headers below
  baseURL: PORTKEY_GATEWAY_URL,
  defaultHeaders: createHeaders({
    virtualKey: "open-ai-virtual-<id>",
    apiKey: "portkeyAPIKey"
  })
});
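From there the client is used like any other OpenAI client, and Portkey handles routing, caching, and logging:

const chatCompletion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: 'user', content: 'What is Portkey?' }]
});

console.log(chatCompletion.choices[0].message.content);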
Model Support (as of April 2025)
Supported model providers include: OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Cohere, Google Gemini, Vertex AI, Perplexity AI, AI21, Anyscale, Byollm, Dashscope, Deepbricks, DeepInfra, DeepSeek, Fireworks AI, Github Models, Github, Google Palm, Groq, Jina AI, Lambda, Lemonfox AI, Lingyi (01.ai), LocalAI, Mistral AI, MonsterAPI, Moonshot, nCompass, Nebius, Nomic AI, Ollama, Openrouter, Predibase, Recraft AI, Recraft, Siliconflow, Snowflake Cortex, Snowflake, Stability AI, Together AI, Triton, Upstage, Voyage-AI, Workers AI, ZhipuAI.
Pricing
Portkey’s free tier includes simple caching and logging (10k logs, 90-day max cache age). The subscription plan starts at $49/month and unlocks premium features like semantic caching and access control.
Amazon Bedrock Prompt Caching
Amazon Bedrock provides managed prompt caching to reduce latency and costs, especially for high-throughput applications like document Q&A chatbots or coding assistants.
Key Features
- Automatically caches prompt prefixes once a token threshold is reached.
- Reduces latency by up to 85% and costs by up to 90%.
- Integrates with Bedrock Agents.
- Cache TTL is 5 minutes from last access (resets with each read).
Configuration
Prompt caching is opt-in per request: for supported models you mark a cache checkpoint in the prompt, and Bedrock caches everything before it. A sketch using the Converse API's cachePoint block (the model ID and region are examples; adjust them for your account):

import { BedrockRuntimeClient, ConverseCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-west-2" });

const command = new ConverseCommand({
  modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0", // a caching-capable model
  messages: [
    {
      role: "user",
      content: [
        { text: "..." },                     // the long, reusable part of your prompt
        { cachePoint: { type: "default" } }, // everything before this block is cached
      ],
    },
  ],
});

const response = await client.send(command);
console.log(response.output);
Model Support (as of April 2025)
Prompt caching is supported for:
- Anthropic Claude 3.5 Sonnet V2
- Claude 3.5 Haiku
- Amazon Nova models (Nova Micro, Nova Lite, and Nova Pro)
Available in AWS regions: US West (Oregon) and US East (N. Virginia).
Pricing
Cache reads are billed at a lower rate than standard input tokens. Claude models charge for cache writes, but Amazon Nova models do not. Caching significantly lowers costs for repeated prompts.
TL;DR
| Feature | Cloudflare AI Gateway | Portkey | Amazon Bedrock Caching |
| --- | --- | --- | --- |
| Caching Type | Edge-based prompt-response caching | Simple and semantic caching | Managed prompt prefix caching |
| Caching Control | TTL settings via Cloudflare caching rules | TTL and semantic match tuning | Automatic, 5-minute TTL (extends on read) |
| Retry Logic | No | Yes (automatic retries) | No |
| Routing/Fallbacks | Yes | Yes (conditional routing based on cost/speed/accuracy) | No |
| Observability | Basic analytics and logging | Advanced analytics, budget limits, third-party integrations | Limited to AWS metrics |
| Guardrails | Basic content flagging | Third-party guardrail integration | Native AWS security compliance |
| Multi-Provider Support | Some major providers | Extensive list of providers | Bedrock-hosted models only |
| Pricing Model | Free tier + paid plans | Free tier + paid plans ($49+/month) | Pay per cache read/write usage |
What Should You Choose?
- Go with Cloudflare AI Gateway if you want fast, edge-based caching and already use Cloudflare.
- Choose Portkey if you need detailed control, are testing lots of prompts, or use different LLMs.
- Pick Amazon Bedrock if you’re on AWS and want a simple, secure solution with minimal setup.