Comparison of LLM Prompt Caching: Cloudflare AI Gateway, Portkey, and Amazon Bedrock
As more applications rely on large language models (LLMs), prompt caching has become essential for making these systems efficient and cost-effective. By serving repeated prompts from a cache instead of re-querying the model, caching saves money and reduces latency, and the gateways that provide it typically add observability and management features on top.
In this article, we compare three major prompt caching tools: Cloudflare AI Gateway, Portkey, and Amazon Bedrock prompt caching. Each brings its own strengths, whether performance, observability, control, or security. Let's explore each.
Cloudflare AI Gateway
Cloudflare AI Gateway acts as a proxy to AI providers such as OpenAI, enabling features like cost and token analytics, observability, logging, and caching.
Key Features
- Caches LLM responses based on prompts and serves them from Cloudflare's edge cache (per-request control is shown in the sketch after this list).
- Supports custom cost parameters to track external and LLM-specific costs.
- Rate limiting.
- Additional authorization header support.
- Guardrails to flag or block unsafe or inappropriate content.
- Evaluation tools.
- Integrates seamlessly with other Cloudflare services like Workers AI and AutoRAG.
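Several of these behaviors are controlled per request with cf-aig-* headers. A minimal sketch (header names follow Cloudflare's AI Gateway docs; the account ID and gateway name in the URL are placeholders):

// Cache this response for one hour; other headers such as cf-aig-skip-cache
// or cf-aig-custom-cost work the same way.
const response = await fetch(
  "https://gateway.ai.cloudflare.com/v1/account-id/gateway/openai/chat/completions",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "cf-aig-cache-ttl": "3600",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: "What is prompt caching?" }],
    }),
  },
);
console.log(await response.json());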
Configuration
To route requests through the AI Gateway, point your LLM provider's SDK or the Vercel AI SDK at the gateway's baseURL.
OpenAI SDK
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  // Replace "account-id" and "gateway" with your Cloudflare account ID and gateway name.
  baseURL: "https://gateway.ai.cloudflare.com/v1/account-id/gateway/openai",
});
Vercel AI SDK
import { createOpenAI } from "@ai-sdk/openai";

const openai = createOpenAI({
  // The API key defaults to the OPENAI_API_KEY environment variable.
  baseURL: "https://gateway.ai.cloudflare.com/v1/account-id/gateway/openai",
});
Model Support (as of April 2025)
Supported model providers include: Amazon Bedrock, Anthropic, Azure OpenAI, Cartesia, Cerebras, Cohere, DeepSeek, ElevenLabs, Google AI Studio, Google Vertex AI, Grok, Groq, HuggingFace, Mistral AI, OpenAI, OpenRouter, Perplexity, Universal Endpoint, Replicate, Workers AI.
Pricing
AI Gateway's core features are free to use, with limits such as a 25 MB maximum request size and a maximum cache TTL of one month. A usage-based paid plan adds persistent logging and advanced analytics.
Portkey
Portkey is a platform focused on simplifying AI observability and control. Unlike Cloudflare AI Gateway, Portkey doesn’t require storing actual LLM API keys in your environment. Instead, it uses virtual keys issued by the platform.
Key Features
- Simple and semantic caching (semantic caching matches prompts by similarity rather than exact text); a config sketch follows this list.
- Automatic retries.
- Fallback and conditional routing (e.g., fastest, cheapest, smartest).
- Guardrails via third-party tool integration.
- Budget limits.
- Integrations with external platforms (e.g., vector databases, agents, and libraries).
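Caching, retries, and fallbacks are all expressed in a gateway config attached to the client or to individual requests. A rough sketch based on Portkey's config schema (values and virtual keys here are placeholders; check the current docs for the exact options):

import { Portkey } from 'portkey-ai';

const portkey = new Portkey({
  apiKey: "portkeyAPIKey", // your Portkey API key
  config: {
    cache: { mode: "semantic", max_age: 3600 }, // or "simple" for exact-match caching
    retry: { attempts: 3 },                     // automatic retries on failure
    strategy: { mode: "fallback" },             // try targets in order
    targets: [
      { virtual_key: "open-ai-virtual-<id>" },   // primary provider
      { virtual_key: "anthropic-virtual-<id>" }, // hypothetical fallback provider
    ],
  },
});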
Configuration
Portkey can be used via its own SDK, or through the OpenAI or Vercel AI SDKs pointed at the Portkey gateway.
Portkey SDK
import { Portkey } from 'portkey-ai';

const portkey = new Portkey({
  apiKey: "portkeyAPIKey",           // your Portkey API key
  virtualKey: "open-ai-virtual-<id>" // virtual key for the underlying provider
});

const chatCompletion = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'What is Portkey?' }],
  model: "gpt-4o",
  maxTokens: 64
});

console.log(chatCompletion.choices);
OpenAI SDK with Portkey Gateway
import OpenAI from 'openai';
import { createHeaders, PORTKEY_GATEWAY_URL } from 'portkey-ai';

const openai = new OpenAI({
  apiKey: 'xxx', // ignored; authentication happens via the Portkey headers below
  baseURL: PORTKEY_GATEWAY_URL,
  defaultHeaders: createHeaders({
    virtualKey: "open-ai-virtual-<id>",
    apiKey: "portkeyAPIKey"
  })
});
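From there the client is used like any other OpenAI client, and Portkey handles routing, caching, and logging:

const chatCompletion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: 'user', content: 'What is Portkey?' }]
});

console.log(chatCompletion.choices[0].message.content);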
Model Support (as of April 2025)
Supported model providers include: OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Cohere, Google Gemini, Vertex AI, Perplexity AI, AI21, Anyscale, Byollm, Dashscope, Deepbricks, DeepInfra, DeepSeek, Fireworks AI, Github Models, Github, Google Palm, Groq, Jina AI, Lambda, Lemonfox AI, Lingyi (01.ai), LocalAI, Mistral AI, MonsterAPI, Moonshot, nCompass, Nebius, Nomic AI, Ollama, Openrouter, Predibase, Recraft AI, Recraft, Siliconflow, Snowflake Cortex, Snowflake, Stability AI, Together AI, Triton, Upstage, Voyage-AI, Workers AI, ZhipuAI.
Pricing
Portkey’s free tier includes simple caching and logging (10k logs, 90-day max cache age). The subscription plan starts at $49/month and unlocks premium features like semantic caching and access control.
Amazon Bedrock Prompt Caching
Amazon Bedrock provides managed prompt caching to reduce latency and costs, especially for high-throughput applications like document Q&A chatbots or coding assistants.
Key Features
- Automatically caches prompt prefixes once a token threshold is reached.
- Reduces latency by up to 85% and costs by up to 90%.
- Integrates with Bedrock Agents.
- Cache TTL is 5 minutes from last access (resets with each read).
Configuration
Prompt caching is opt-in per request: for supported models you mark a cache checkpoint in the prompt, and Bedrock caches everything before it. A sketch using the Converse API's cachePoint block (the model ID and region are examples; adjust them for your account):

import { BedrockRuntimeClient, ConverseCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-west-2" });

const command = new ConverseCommand({
  modelId: "anthropic.claude-3-5-sonnet-20241022-v2:0", // a caching-capable model
  messages: [
    {
      role: "user",
      content: [
        { text: "..." },                     // the long, reusable part of your prompt
        { cachePoint: { type: "default" } }, // everything before this block is cached
      ],
    },
  ],
});

const response = await client.send(command);
console.log(response.output);
Model Support (as of April 2025)
Prompt caching is supported for:
- Anthropic Claude 3.5 Sonnet V2
- Claude 3.5 Haiku
- Amazon Nova models (Nova Micro, Nova Lite, and Nova Pro)
Available in AWS regions: US West (Oregon) and US East (N. Virginia).
Pricing
Cache reads are billed at a lower rate than standard input tokens. Claude models charge for cache writes, but Amazon Nova models do not. Caching significantly lowers costs for repeated prompts.
TL;DR
| Feature | Cloudflare AI Gateway | Portkey | Amazon Bedrock Caching |
| --- | --- | --- | --- |
| Caching Type | Edge-based prompt-response caching | Simple and semantic caching | Managed prompt prefix caching |
| Caching Control | TTL settings via Cloudflare caching rules | TTL and semantic match tuning | Automatic, 5-minute TTL (extends on read) |
| Retry Logic | No | Yes (automatic retries) | No |
| Routing/Fallbacks | Yes | Yes (conditional routing based on cost/speed/accuracy) | No |
| Observability | Basic analytics and logging | Advanced analytics, budget limits, third-party integrations | Limited to AWS metrics |
| Guardrails | Basic content flagging | Third-party guardrail integration | Native AWS security compliance |
| Multi-Provider Support | Some major providers | Extensive list of providers | Bedrock-hosted models only |
| Pricing Model | Free tier + paid plans | Free tier + paid plans ($49+/month) | Pay per cache read/write usage |
What Should You Choose?
- Go with Cloudflare AI Gateway if you want fast, edge-based caching and already use Cloudflare.
- Choose Portkey if you need detailed control, are testing lots of prompts, or use different LLMs.
- Pick Amazon Bedrock if you’re on AWS and want a simple, secure solution with minimal setup.