# Implementing a semantic cache with ElastiCache for Valkey

The following walkthrough shows how to implement a read-through semantic cache using ElastiCache for Valkey with Amazon Bedrock.

## Step 1: Create an ElastiCache for Valkey cluster

Use the AWS CLI to create an ElastiCache for Valkey cluster running version 8.2 or later:

```
aws elasticache create-replication-group \
  --replication-group-id "valkey-semantic-cache" \
  --replication-group-description "Semantic cache for Valkey" \
  --cache-node-type cache.r7g.large \
  --engine valkey \
  --engine-version 8.2 \
  --num-node-groups 1 \
  --replicas-per-node-group 1
```

The `--replication-group-description` option is required by the command; its value is free-form text, so replace it with your own.

## Step 2: Connect to the cluster and configure embeddings

From application code running on an Amazon EC2 instance, connect to the ElastiCache cluster and set up the embedding model:

```
from valkey.cluster import ValkeyCluster
from langchain_aws import BedrockEmbeddings

# Connect to ElastiCache for Valkey
valkey_client = ValkeyCluster(
    host="mycluster.xxxxxx.clustercfg.use1.cache.amazonaws.com",  # Your cluster endpoint
    port=6379,
    decode_responses=False
)

# Set up Amazon Bedrock Titan embeddings
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0",
    region_name="us-east-1"
)
```

Replace the host value with the configuration endpoint of your ElastiCache cluster. For instructions on finding the cluster endpoint, see [Accessing your ElastiCache cluster](accessing-elasticache.md).

## Step 3: Create the vector index for the semantic cache

Configure a ValkeyStore that automatically embeds queries and uses an HNSW index with the COSINE distance metric for vector search:

```
from langgraph_checkpoint_aws import ValkeyStore
from hashlib import md5

store = ValkeyStore(
    client=valkey_client,
    index={
        "collection_name": "semantic_cache",
        "embed": embeddings,
        "fields": ["query"],          # Fields to vectorize
        "index_type": "HNSW",         # Vector search algorithm
        "distance_metric": "COSINE",  # Similarity metric
        "dims": 1024                  # Titan V2 produces 1024-d vectors
    }
)
store.setup()

def cache_key_for_query(query: str) -> str:
    """Generate a deterministic cache key for a query."""
    return md5(query.encode("utf-8")).hexdigest()
```

**Note**

ElastiCache for Valkey uses an index to provide fast and accurate vector search. The `FT.CREATE` command creates the underlying index. For more information, see [Vector search for ElastiCache](search.md).

## Step 4: Implement the cache search and update functions

Create one function that searches the cache for a semantically similar query and another that stores a new query-response pair:

```
def search_cache(user_message: str, k: int = 3, min_similarity: float = 0.8):
    """Look up a semantically similar cached response from ElastiCache."""
    hits = store.search(
        namespace="semantic-cache",
        query=user_message,
        limit=k
    )
    if not hits:
        return None

    # Sort by similarity score (highest first)
    hits = sorted(hits, key=lambda h: h["score"], reverse=True)
    top_hit = hits[0]
    score = top_hit["score"]

    if score < min_similarity:
        return None  # Below similarity threshold

    return top_hit["value"]["answer"]  # Return cached answer

def store_cache(user_message: str, result_message: str):
    """Store a new query-response pair in the semantic cache."""
    key = cache_key_for_query(user_message)
    store.put(
        namespace="semantic-cache",
        key=key,
        value={
            "query": user_message,
            "answer": result_message
        }
    )
```

## Step 5: Implement the read-through cache pattern

Integrate the cache into your application's request handling:

```
import time

def handle_query(user_message: str) -> dict:
    """Handle a user query with read-through semantic cache."""
    start = time.time()

    # Step 1: Search the semantic cache
    cached_response = search_cache(user_message, min_similarity=0.8)

    if cached_response is not None:
        # Cache hit - return cached response
        elapsed = (time.time() - start) * 1000
        return {
            "response": cached_response,
            "source": "cache",
            "latency_ms": round(elapsed, 1),
        }

    # Step 2: Cache miss - invoke LLM
    llm_response = invoke_llm(user_message)  # Your LLM invocation function

    # Step 3: Store the response in cache for future reuse
    store_cache(user_message, llm_response)

    elapsed = (time.time() - start) * 1000
    return {
        "response": llm_response,
        "source": "llm",
        "latency_ms": round(elapsed, 1),
    }
```

## Underlying Valkey commands

The following table shows the Valkey commands used to implement the semantic cache:

| Operation | Valkey command | Typical latency |
| --- | --- | --- |
| Create index | `FT.CREATE semantic_cache SCHEMA query TEXT answer TEXT embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1024 DISTANCE_METRIC COSINE` | One-time setup |
| Cache lookup | `FT.SEARCH semantic_cache "*=>[KNN 3 @embedding $query_vec]" PARAMS 2 query_vec [bytes] DIALECT 2` | Microseconds |
| Store response | `HSET cache:{hash} query "..." answer "..." embedding [bytes]` | Microseconds |
| Set TTL | `EXPIRE cache:{hash} 82800` | Microseconds |
| LLM inference (cache miss) | External API call to Amazon Bedrock | 500–6000 ms |
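
The `[bytes]` and `cache:{hash}` placeholders in the commands above, and the `min_similarity` threshold used in Step 4, can be illustrated with a minimal standard-library sketch. It assumes the index stores `FLOAT32` vectors as raw little-endian bytes and that the key is the `cache:` prefix followed by the MD5 hex digest from Step 3. The helper names `to_float32_bytes`, `cosine_similarity`, and `cache_entry_key` are hypothetical and are not part of any library; when you use `ValkeyStore`, the store is expected to handle these details for you.

```python
import hashlib
import math
import struct


def to_float32_bytes(vector):
    """Pack a sequence of floats as little-endian FLOAT32 bytes (4 bytes each).

    Assumption: this is the binary layout behind the `[bytes]` placeholder.
    """
    return struct.pack(f"<{len(vector)}f", *vector)


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def cache_entry_key(query, prefix="cache:"):
    """Build a key shaped like `cache:{hash}` (hypothetical naming scheme)."""
    return prefix + hashlib.md5(query.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    blob = to_float32_bytes([0.0] * 1024)
    print(len(blob))  # A 1024-dimension vector occupies 4096 bytes
    print(cache_entry_key("What is a semantic cache?"))
```

A pair of embeddings counts as a cache hit only when its similarity is at least `min_similarity`. Raising the threshold lowers the chance of returning an answer to a different question, at the cost of more cache misses.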