# Multi-turn conversation caching

For applications with multi-turn conversations, the same user message can mean different things depending on context. For example, "Tell me more" in a conversation about Valkey means something different from "Tell me more" in a conversation about Python.

## The challenge

Single-prompt caching works well for stateless queries. In multi-turn conversations, you must cache the full conversation context, not just the last message:

```
# "Tell me more" means nothing without context
# Conversation A: "What is Valkey?" -> "Tell me more" (about Valkey)
# Conversation B: "What is Python?" -> "Tell me more" (about Python)
```

## Strategy: Context-aware cache keys

Embed a summary of the full conversation context instead of only the last user message. That way, similar follow-up questions in similar conversation flows can reuse cached answers.

```
def build_context_string(messages: list) -> str:
    """Build a cacheable context string from conversation messages."""
    # Use last 3 turns (6 messages: user + assistant pairs)
    recent = messages[-6:]
    parts = []
    for msg in recent:
        role = msg["role"]
        content = msg["content"][:200]  # Truncate long messages
        parts.append(f"{role}: {content}")
    return " | ".join(parts)
```

## Per-user cache isolation with TAG filters

Use TAG fields to isolate cached conversations by user, session, or another dimension. This prevents one user's cached conversation from being returned to another user:

```
# Create index with TAG field for per-user isolation
valkey_client.execute_command(
    "FT.CREATE", "conv_cache_idx",
    "SCHEMA",
    "context_summary", "TEXT",
    "response", "TEXT",
    "user_id", "TAG",
    "turn_count", "NUMERIC",
    "embedding", "VECTOR", "HNSW", "6",
    "TYPE", "FLOAT32",
    "DIM", "1024",
    "DISTANCE_METRIC", "COSINE",
)
```

Search with hybrid filtering (TAG + KNN):

```
def lookup_conversation_cache(messages: list, user_id: str, threshold: float = 0.12):
    """Search cache for similar conversation contexts, scoped to a user.

    Note: FT.SEARCH with COSINE distance returns a distance score where
    0 = identical and 2 = opposite. A lower score means higher similarity.
    The threshold here is a maximum distance: only return results closer
    than this value.
    """
    context = build_context_string(messages)
    # get_embedding is defined elsewhere; it is assumed to return the vector
    # as FLOAT32 bytes whose length matches the index DIM (1024)
    query_vec = get_embedding(context)

    # Hybrid search: filter by user_id TAG + KNN on context embedding
    results = valkey_client.execute_command(
        "FT.SEARCH", "conv_cache_idx",
        f"@user_id:{{{user_id}}}=>[KNN 1 @embedding $query_vec]",
        "PARAMS", "2", "query_vec", query_vec,
        "DIALECT", "2",
    )

    if results[0] > 0:
        fields = results[2]
        # Replies are bytes unless the client decodes responses, so normalize to str
        decode = lambda v: v.decode() if isinstance(v, bytes) else v
        field_dict = {decode(fields[j]): fields[j + 1] for j in range(0, len(fields), 2)}
        distance = float(field_dict.get("__embedding_score", "999"))
        if distance < threshold:  # Lower distance = more similar
            return {"hit": True, "response": decode(field_dict.get("response", "")), "distance": distance}
    return {"hit": False}
```

**Note**

The `@user_id:{user_123}` TAG filter ensures that conversations cached for user A are not leaked to user B. The hybrid query (TAG + KNN) runs as a single atomic operation: it pre-filters by user, then finds the nearest conversation context.

## Cache isolation strategies

| Strategy | TAG filter | Best for |
| --- | --- | --- |
| Per-user | `@user_id:{user_123}` | Personalized assistants |
| Per-session | `@session_id:{sess_abc}` | Short-lived chats |
| Global (shared) | No filter (`*`) | FAQ bots, common queries |
| Per-model | `@model:{gpt-4}` | Multi-model deployments |
| Per-product | `@product_id:{prod_456}` | E-commerce assistants |