面临的挑战策略：上下文感知缓存密钥 Per-user 使用 TAG 过滤器进行缓存隔离缓存隔离策略

Multi-turn 对话缓存

对于具有多回合对话的应用程序，根据上下文，相同的用户消息可能具有不同的含义。例如，在关于 Valkey 的对话中，“告诉我更多” 的含义与关于 Python 的对话中的 “告诉我更多” 不同。

面临的挑战

Single-prompt 缓存非常适合无状态查询。在多回合对话中，您必须缓存完整的对话上下文，而不仅仅是最后一条消息：


# "Tell me more" means nothing without context
# Conversation A: "What is Valkey?" -> "Tell me more"  (about Valkey)
# Conversation B: "What is Python?" -> "Tell me more"  (about Python)

策略：上下文感知缓存密钥

嵌入完整对话上下文的摘要，而不是只嵌入最后一条用户消息。这样，相似对话流程中的类似后续问题就可以重复使用缓存的答案。


def build_context_string(messages: list) -> str:
    """Build a cacheable context string from conversation messages."""
    # Use last 3 turns (6 messages: user + assistant pairs)
    recent = messages[-6:]
    parts = []
    for msg in recent:
        role = msg["role"]
        content = msg["content"][:200]  # Truncate long messages
        parts.append(f"{role}: {content}")
    return " | ".join(parts)

Per-user 使用 TAG 过滤器进行缓存隔离

使用 TAG 字段按用户、会话或其他维度隔离缓存的对话。这样可以防止将一个用户的缓存对话返回给另一个用户：


# Create index with TAG field for per-user isolation
valkey_client.execute_command(
    "FT.CREATE", "conv_cache_idx",
    "SCHEMA",
    "context_summary", "TEXT",
    "response", "TEXT",
    "user_id", "TAG",
    "turn_count", "NUMERIC",
    "embedding", "VECTOR", "HNSW", "6",
    "TYPE", "FLOAT32",
    "DIM", "1024",
    "DISTANCE_METRIC", "COSINE",
)

使用混合过滤进行搜索（TAG + KNN）：


def lookup_conversation_cache(messages: list, user_id: str, threshold: float = 0.12):
    """Search cache for similar conversation contexts, scoped to a user.

    Note: FT.SEARCH with COSINE distance returns a distance score where
    0 = identical and 2 = opposite. A lower score means higher similarity.
    The threshold here is a maximum distance: only return results closer
    than this value.
    """
    context = build_context_string(messages)
    query_vec = get_embedding(context)

    # Hybrid search: filter by user_id TAG + KNN on context embedding
    results = valkey_client.execute_command(
        "FT.SEARCH", "conv_cache_idx",
        f"@user_id:{{{user_id}}}=>[KNN 1 @embedding $query_vec]",
        "PARAMS", "2", "query_vec", query_vec,
        "DIALECT", "2",
    )

    if results[0] > 0:
        fields = results[2]
        field_dict = {fields[j]: fields[j+1] for j in range(0, len(fields), 2)}
        distance = float(field_dict.get("__embedding_score", "999"))
        if distance < threshold:  # Lower distance = more similar
            return {"hit": True, "response": field_dict.get("response", ""), "distance": distance}

    return {"hit": False}

注意

@user_id:{user_123}TAG 过滤器可确保用户 A 缓存的对话不会泄露给用户 B。混合查询 (TAG + KNN) 作为单个原子操作运行，即按用户进行预过滤，然后找到最近的对话上下文。

缓存隔离策略

Strategy	标签过滤器	适用于
Per-user	`@user_id:{user_123}`	个性化助手
Per-session	`@session_id:{sess_abc}`	Short-lived 聊天
全球（共享）	没有过滤器 (`*`)	FAQ 机器人、常见查询
Per-model	`@model:{gpt-4}`	Multi-model 部署
Per-product	`@product_id:{prod_456}`	E-commerce 助手

Javascript 在您的浏览器中被禁用或不可用。

要使用 Amazon Web Services 文档，必须启用 Javascript。请参阅浏览器的帮助页面以了解相关说明。

文档惯例

影响和基准

最佳实践