
Implementing semantic caching with ElastiCache for Valkey

The following walkthrough shows how to implement a read-through semantic cache using ElastiCache for Valkey with Amazon Bedrock.

Step 1: Create an ElastiCache for Valkey cluster

Use the AWS CLI to create an ElastiCache for Valkey cluster running engine version 8.2 or later:


aws elasticache create-replication-group \
  --replication-group-id "valkey-semantic-cache" \
  --replication-group-description "Semantic cache for LLM responses" \
  --cache-node-type cache.r7g.large \
  --engine valkey \
  --engine-version 8.2 \
  --cluster-mode enabled \
  --num-node-groups 1 \
  --replicas-per-node-group 1
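
Creating the replication group takes several minutes, and Step 2 needs its endpoint. The following is a minimal sketch, assuming boto3 is installed and configured with credentials for the same Region, that waits for the group to become available and then reads the endpoint to connect to. The helper names `wait_for_replication_group` and `endpoint_from_description` are hypothetical; the waiter and `describe_replication_groups` call are standard boto3 ElastiCache client features.

```python
from typing import Any, Dict, Tuple


def endpoint_from_description(response: Dict[str, Any]) -> Tuple[str, int]:
    """Return (host, port) from a DescribeReplicationGroups response.

    Cluster-mode-enabled groups expose a ConfigurationEndpoint; otherwise
    fall back to the primary endpoint of the first node group.
    """
    group = response["ReplicationGroups"][0]
    endpoint = group.get("ConfigurationEndpoint")
    if endpoint is None:
        endpoint = group["NodeGroups"][0]["PrimaryEndpoint"]
    return endpoint["Address"], int(endpoint["Port"])


def wait_for_replication_group(group_id: str, region: str = "us-east-1") -> Tuple[str, int]:
    """Block until the replication group is available, then return its endpoint."""
    import boto3  # imported lazily so the parsing helper above works without boto3

    client = boto3.client("elasticache", region_name=region)
    # The built-in waiter polls DescribeReplicationGroups until the group is available.
    client.get_waiter("replication_group_available").wait(ReplicationGroupId=group_id)
    response = client.describe_replication_groups(ReplicationGroupId=group_id)
    return endpoint_from_description(response)
```

For example, `host, port = wait_for_replication_group("valkey-semantic-cache")` returns the values to pass as `host` and `port` to the client in Step 2.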

Step 2: Connect to the cluster and set up embeddings

From application code running on an Amazon EC2 instance, connect to the ElastiCache cluster and set up the embedding model:


from valkey.cluster import ValkeyCluster
from langchain_aws import BedrockEmbeddings

# Connect to ElastiCache for Valkey
valkey_client = ValkeyCluster(
    host="mycluster.xxxxxx.clustercfg.use1.cache.amazonaws.com",  # Your cluster endpoint
    port=6379,
    decode_responses=False
)

# Set up Amazon Bedrock Titan embeddings
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0",
    region_name="us-east-1"
)

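Replace the host value with the configuration endpoint of your ElastiCache cluster. For instructions on finding your cluster endpoint, see Accessing your ElastiCache cluster.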
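
ValkeyStore embeds and serializes vectors for you. If you issue raw commands such as FT.SEARCH or HSET yourself, the vector must be sent as a binary blob. The following is a minimal sketch, assuming the index declares TYPE FLOAT32 and that the engine expects little-endian IEEE 754 floats (the layout RediSearch-compatible clients use); `to_float32_bytes` and `from_float32_bytes` are hypothetical helpers.

```python
import struct
from typing import List, Sequence


def to_float32_bytes(vector: Sequence[float]) -> bytes:
    """Pack floats into little-endian FLOAT32 bytes for a VECTOR field or query param."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(blob: bytes) -> List[float]:
    """Unpack little-endian FLOAT32 bytes back into Python floats."""
    count = len(blob) // 4  # 4 bytes per FLOAT32 component
    return list(struct.unpack(f"<{count}f", blob))
```

For example, `to_float32_bytes(embeddings.embed_query("How do I reset my password?"))` turns a 1024-dimension Titan V2 vector into a 4,096-byte value.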

Step 3: Create a vector index for the semantic cache

Configure a ValkeyStore that automatically embeds each query and indexes it in an HNSW index with the COSINE distance metric for vector search:


from langgraph_checkpoint_aws import ValkeyStore
from hashlib import md5

store = ValkeyStore(
    client=valkey_client,
    index={
        "collection_name": "semantic_cache",
        "embed": embeddings,
        "fields": ["query"],           # Fields to vectorize
        "index_type": "HNSW",          # Vector search algorithm
        "distance_metric": "COSINE",   # Similarity metric
        "dims": 1024                   # Titan V2 produces 1024-d vectors
    }
)
store.setup()

def cache_key_for_query(query: str) -> str:
    """Generate a deterministic cache key for a query."""
    return md5(query.encode("utf-8")).hexdigest()
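
The COSINE metric compares the angle between two embeddings, not their length. The following sketch computes cosine similarity and the corresponding cosine distance (1 minus similarity), which is the convention RediSearch-style vector indexes use for DISTANCE_METRIC COSINE. It assumes that the score ValkeyStore returns is a similarity where higher is better, as the descending sort in Step 4 implies; verify this against your library version. Under that assumption, a min_similarity of 0.8 corresponds to a cosine distance of at most 0.2.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance used by a COSINE vector index: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)
```

For example, two vectors 45 degrees apart have a similarity of about 0.707, so a threshold of 0.8 treats them as a cache miss.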

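Note

ElastiCache for Valkey uses indexes to provide fast and accurate vector search. The FT.CREATE command creates the underlying index. For more information, see Vector search for ElastiCache.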
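
If you manage the index yourself instead of through ValkeyStore, you can send FT.CREATE with valkey-py's generic execute_command. The following sketch assumes that cached entries are hashes whose keys share the prefix cache: (matching the HSET example in the command table) and indexes only the vector field. `build_ft_create_args` is a hypothetical helper, and the ON HASH PREFIX clause follows common FT.CREATE syntax that you should confirm against the ElastiCache vector search reference.

```python
from typing import List


def build_ft_create_args(
    index_name: str = "semantic_cache",
    prefix: str = "cache:",
    vector_field: str = "embedding",
    dims: int = 1024,
    distance_metric: str = "COSINE",
) -> List[str]:
    """Build FT.CREATE arguments for an HNSW vector index over hash keys."""
    # HNSW is followed by the count of attribute tokens after it:
    # TYPE FLOAT32 DIM <dims> DISTANCE_METRIC <metric> -> 6 tokens.
    vector_attrs = ["TYPE", "FLOAT32", "DIM", str(dims), "DISTANCE_METRIC", distance_metric]
    return [
        "FT.CREATE", index_name,
        "ON", "HASH",
        "PREFIX", "1", prefix,
        "SCHEMA",
        vector_field, "VECTOR", "HNSW", str(len(vector_attrs)), *vector_attrs,
    ]
```

Run it once at deployment time, for example with `valkey_client.execute_command(*build_ft_create_args())`. Skip this step if store.setup() may already have created an index with the same name.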

Step 4: Implement cache search and update functions

Create functions that search the cache for semantically similar queries and store new query-response pairs:


def search_cache(user_message: str, k: int = 3, min_similarity: float = 0.8):
    """Look up a semantically similar cached response from ElastiCache."""
    hits = store.search(
        namespace="semantic-cache",
        query=user_message,
        limit=k
    )
    if not hits:
        return None

    # Sort by similarity score (highest first)
    hits = sorted(hits, key=lambda h: h["score"], reverse=True)
    top_hit = hits[0]
    score = top_hit["score"]

    if score < min_similarity:
        return None  # Below similarity threshold

    return top_hit["value"]["answer"]  # Return cached answer


def store_cache(user_message: str, result_message: str):
    """Store a new query-response pair in the semantic cache."""
    key = cache_key_for_query(user_message)
    store.put(
        namespace="semantic-cache",
        key=key,
        value={
            "query": user_message,
            "answer": result_message
        }
    )
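
store_cache keeps entries until they are removed. If you write entries with raw commands instead, you can attach a TTL so that stale answers age out, as the HSET and EXPIRE rows in the command table show. The following sketch assumes the cache: key prefix, the 82,800-second (23-hour) TTL from that table, and an embedding already packed as FLOAT32 bytes; `build_cache_entry` is a hypothetical helper.

```python
from hashlib import md5
from typing import Dict, Tuple

DEFAULT_TTL_SECONDS = 82800  # 23 hours, the EXPIRE value used in the command table


def build_cache_entry(
    query: str, answer: str, embedding: bytes, ttl: int = DEFAULT_TTL_SECONDS
) -> Tuple[str, Dict[str, object], int]:
    """Return (key, hash fields, ttl) for one cached query-response pair."""
    # Same deterministic MD5 key scheme as cache_key_for_query, plus a prefix the index can match.
    key = "cache:" + md5(query.encode("utf-8")).hexdigest()
    fields: Dict[str, object] = {"query": query, "answer": answer, "embedding": embedding}
    return key, fields, ttl
```

Write the entry with `valkey_client.hset(key, mapping=fields)` followed by `valkey_client.expire(key, ttl)`. Both commands target the same key, so in a cluster they go to the same shard.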

Step 5: Implement the read-through cache pattern

Integrate the cache into your application's request handling:


import time

def handle_query(user_message: str) -> dict:
    """Handle a user query with read-through semantic cache."""
    start = time.time()

    # Step 1: Search the semantic cache
    cached_response = search_cache(user_message, min_similarity=0.8)

    if cached_response:
        # Cache hit - return cached response
        elapsed = (time.time() - start) * 1000
        return {
            "response": cached_response,
            "source": "cache",
            "latency_ms": round(elapsed, 1),
        }

    # Step 2: Cache miss - invoke LLM
    llm_response = invoke_llm(user_message)  # Your LLM invocation function

    # Step 3: Store the response in cache for future reuse
    store_cache(user_message, llm_response)

    elapsed = (time.time() - start) * 1000
    return {
        "response": llm_response,
        "source": "llm",
        "latency_ms": round(elapsed, 1),
    }
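
To exercise the read-through logic without a cluster or Amazon Bedrock, you can swap in an in-memory stand-in. This is a hypothetical test double, not part of ElastiCache or LangGraph. It uses a toy bag-of-words embedding in place of Titan and a linear scan in place of HNSW, while keeping the same hit-or-miss decision as handle_query.

```python
import math
import time
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercased word counts."""
    return Counter(text.lower().split())


def toy_cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemorySemanticCache:
    """Linear-scan semantic cache intended only for local testing."""

    def __init__(self, min_similarity: float = 0.8) -> None:
        self.min_similarity = min_similarity
        self.entries: List[Tuple[Counter, str]] = []

    def search(self, query: str) -> Optional[str]:
        vec = toy_embed(query)
        best = max(self.entries, key=lambda e: toy_cosine(vec, e[0]), default=None)
        if best is None or toy_cosine(vec, best[0]) < self.min_similarity:
            return None  # empty cache or below the similarity threshold
        return best[1]

    def store(self, query: str, answer: str) -> None:
        self.entries.append((toy_embed(query), answer))


def handle_query_local(cache: InMemorySemanticCache, message: str,
                       invoke_llm: Callable[[str], str]) -> Dict[str, object]:
    """Read-through flow like handle_query, with the cache and LLM injected."""
    start = time.perf_counter()
    cached = cache.search(message)
    if cached is not None:
        source, response = "cache", cached
    else:
        source, response = "llm", invoke_llm(message)
        cache.store(message, response)  # populate the cache on a miss
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"response": response, "source": source, "latency_ms": round(elapsed_ms, 1)}
```

With this toy embedding, "how can I reset my password" shares 5 of 6 words with "how do I reset my password", giving a similarity of 5/6 (about 0.83), so it is served from the cache at the default threshold of 0.8.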

Underlying Valkey commands

The following table shows the Valkey commands used to implement semantic caching:

Operation	Valkey command	Typical latency
Create index	`FT.CREATE semantic_cache SCHEMA query TEXT answer TEXT embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1024 DISTANCE_METRIC COSINE`	One-time setup
Cache lookup	`FT.SEARCH semantic_cache "*=>[KNN 3 @embedding $query_vec]" PARAMS 2 query_vec [bytes] DIALECT 2`	Microseconds
Store response	`HSET cache:{hash} query "..." answer "..." embedding [bytes]`	Microseconds
Set TTL	`EXPIRE cache:{hash} 82800`	Microseconds
LLM inference (cache miss)	External API call to Amazon Bedrock	500–6,000 ms
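
Because the LLM call dominates the latencies in the table, overall latency depends mostly on the cache hit rate. The following sketch uses a simple model (an assumption, not a measured result): every request pays the cache lookup, and a miss additionally pays the LLM call plus the write-back. Expected latency is then search + (1 - hit_rate) × (llm + store).

```python
def expected_latency_ms(hit_rate: float, search_ms: float, llm_ms: float, store_ms: float) -> float:
    """Mean request latency for a read-through semantic cache.

    Every request performs the vector search; only misses call the LLM
    and write the new entry back to the cache.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return search_ms + (1.0 - hit_rate) * (llm_ms + store_ms)
```

With hypothetical round numbers of 1 ms per cache operation and 2,000 ms per LLM call, a 60% hit rate gives 1 + 0.4 × 2,001 = 801.4 ms on average, compared with 2,000 ms when every request goes to the LLM.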
