
Implementing a semantic cache with ElastiCache for Valkey

The following walkthrough shows how to implement a read-through semantic cache using ElastiCache for Valkey with Amazon Bedrock.

Step 1: Create an ElastiCache for Valkey cluster

Create an ElastiCache for Valkey cluster with version 8.2 or later using the AWS CLI:


aws elasticache create-replication-group \
  --replication-group-id "valkey-semantic-cache" \
  --replication-group-description "Semantic cache for LLM responses" \
  --cache-node-type cache.r7g.large \
  --engine valkey \
  --engine-version 8.2 \
  --num-node-groups 1 \
  --replicas-per-node-group 1

Step 2: Connect to the cluster and configure embeddings

From your application code running on your Amazon EC2 instance, connect to the ElastiCache cluster and configure the embedding model:


from valkey.cluster import ValkeyCluster
from langchain_aws import BedrockEmbeddings

# Connect to ElastiCache for Valkey
valkey_client = ValkeyCluster(
    host="mycluster.xxxxxx.clustercfg.use1.cache.amazonaws.com",  # Your cluster endpoint
    port=6379,
    decode_responses=False
)

# Set up Amazon Bedrock Titan embeddings
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0",
    region_name="us-east-1"
)

Replace the host value with the configuration endpoint of your ElastiCache cluster. For instructions on finding the cluster endpoint, see Accessing your ElastiCache cluster.
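
Rather than hardcoding the endpoint, the connection settings can be read from the environment. The following is a minimal sketch, not part of the original walkthrough: the helper `load_valkey_settings` and the variable names `VALKEY_HOST` and `VALKEY_PORT` are hypothetical, and the defaults are placeholders.

```python
import os


def load_valkey_settings(env=os.environ) -> dict:
    """Read connection settings from environment variables.

    VALKEY_HOST and VALKEY_PORT are hypothetical names chosen for this sketch.
    """
    return {
        "host": env.get("VALKEY_HOST", "localhost"),
        "port": int(env.get("VALKEY_PORT", "6379")),
        "region_name": env.get("AWS_REGION", "us-east-1"),
    }


# Passing a plain dict keeps the example independent of the real environment.
settings = load_valkey_settings(
    {"VALKEY_HOST": "example-endpoint", "VALKEY_PORT": "6380"}
)
print(settings)
```

The resulting `host` and `port` values could then be passed to `ValkeyCluster`, and `region_name` to `BedrockEmbeddings`.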

Step 3: Create the vector index for the semantic cache

Configure a ValkeyStore that automatically embeds queries, using an HNSW index with COSINE distance for vector search:


from langgraph_checkpoint_aws import ValkeyStore
from hashlib import md5

store = ValkeyStore(
    client=valkey_client,
    index={
        "collection_name": "semantic_cache",
        "embed": embeddings,
        "fields": ["query"],           # Fields to vectorize
        "index_type": "HNSW",          # Vector search algorithm
        "distance_metric": "COSINE",   # Similarity metric
        "dims": 1024                   # Titan V2 produces 1024-d vectors
    }
)
store.setup()

def cache_key_for_query(query: str):
    """Generate a deterministic cache key for a query."""
    return md5(query.encode("utf-8")).hexdigest()
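
Because `cache_key_for_query` hashes the raw string, two queries that differ only in case or spacing produce different keys and are stored as separate entries. The sketch below shows one possible variant that normalizes the text first. The name `normalized_cache_key` and the normalization rules are assumptions for illustration, not part of the walkthrough.

```python
from hashlib import md5


def normalized_cache_key(query: str) -> str:
    """Hypothetical variant of cache_key_for_query.

    Lowercases the query and collapses runs of whitespace before hashing,
    so trivially different spellings map to the same key.
    """
    normalized = " ".join(query.casefold().split())
    return md5(normalized.encode("utf-8")).hexdigest()


key_a = normalized_cache_key("What is  Valkey?")
key_b = normalized_cache_key("  what is valkey? ")
print(key_a == key_b)
```

With this variant, a repeated question overwrites its earlier entry instead of adding a near-duplicate. Semantic matching of genuinely different phrasings is still left to the vector search.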

Note

ElastiCache for Valkey uses an index to provide fast and accurate vector search. The FT.CREATE command creates the underlying index. For more information, see Vector search for ElastiCache.
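
To make the COSINE metric concrete, the following sketch computes cosine similarity in plain Python. It is illustrative only: the index performs this comparison internally, and how the store turns the distance into the reported score is not specified here. Cosine distance is commonly defined as 1 minus the cosine similarity.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

The metric depends only on direction, not magnitude, which is why the first pair counts as a perfect match even though the vectors differ in length.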

Step 4: Implement the cache search and update functions

Create functions to search the cache for semantically similar queries and to store new query-response pairs:


def search_cache(user_message: str, k: int = 3, min_similarity: float = 0.8):
    """Look up a semantically similar cached response from ElastiCache."""
    hits = store.search(
        namespace="semantic-cache",
        query=user_message,
        limit=k
    )
    if not hits:
        return None

    # Sort by similarity score (highest first)
    hits = sorted(hits, key=lambda h: h["score"], reverse=True)
    top_hit = hits[0]
    score = top_hit["score"]

    if score < min_similarity:
        return None  # Below similarity threshold

    return top_hit["value"]["answer"]  # Return cached answer


def store_cache(user_message: str, result_message: str):
    """Store a new query-response pair in the semantic cache."""
    key = cache_key_for_query(user_message)
    store.put(
        namespace="semantic-cache",
        key=key,
        value={
            "query": user_message,
            "answer": result_message
        }
    )
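
The threshold logic in `search_cache` can be exercised without a cluster. The sketch below assumes hits shaped like the dictionaries used above (`score` and `value` keys). `InMemoryStore` and `best_cached_answer` are hypothetical stand-ins written for this example, not part of any library.

```python
from typing import Optional


class InMemoryStore:
    """Hypothetical stand-in that returns pre-scored hits."""

    def __init__(self, hits):
        self._hits = hits

    def search(self, namespace, query, limit):
        # A real store would embed the query and run a KNN search.
        return self._hits[:limit]


def best_cached_answer(store, user_message: str, k: int = 3,
                       min_similarity: float = 0.8) -> Optional[str]:
    """Return the best answer at or above the threshold, else None."""
    hits = store.search(namespace="semantic-cache", query=user_message, limit=k)
    if not hits:
        return None
    top_hit = max(hits, key=lambda h: h["score"])
    if top_hit["score"] < min_similarity:
        return None
    return top_hit["value"]["answer"]


demo_store = InMemoryStore([
    {"score": 0.72, "value": {"query": "q1", "answer": "A"}},
    {"score": 0.91, "value": {"query": "q2", "answer": "B"}},
])
print(best_cached_answer(demo_store, "some question"))
print(best_cached_answer(demo_store, "some question", min_similarity=0.95))
```

Raising `min_similarity` trades hit rate for precision: a higher threshold returns fewer cached answers but reduces the chance of answering a different question than the one asked.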

Step 5: Implement the read-through cache pattern

Integrate the cache into your application's request handling:


import time

def handle_query(user_message: str) -> dict:
    """Handle a user query with read-through semantic cache."""
    start = time.time()

    # Step 1: Search the semantic cache
    cached_response = search_cache(user_message, min_similarity=0.8)

    if cached_response:
        # Cache hit - return cached response
        elapsed = (time.time() - start) * 1000
        return {
            "response": cached_response,
            "source": "cache",
            "latency_ms": round(elapsed, 1),
        }

    # Step 2: Cache miss - invoke LLM
    llm_response = invoke_llm(user_message)  # Your LLM invocation function

    # Step 3: Store the response in cache for future reuse
    store_cache(user_message, llm_response)

    elapsed = (time.time() - start) * 1000
    return {
        "response": llm_response,
        "source": "llm",
        "latency_ms": round(elapsed, 1),
    }
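
The same read-through flow can be tried locally by injecting the lookup, generation, and store steps as callables. This is a sketch under stated assumptions: `read_through`, `fake_llm`, and the exact-match dictionary standing in for the semantic cache are all hypothetical and exist only to show the hit and miss paths.

```python
import time


def read_through(user_message, lookup, generate, save) -> dict:
    """Read-through flow: try the cache, fall back to the generator, then store."""
    start = time.perf_counter()
    cached = lookup(user_message)
    if cached is not None:
        response, source = cached, "cache"
    else:
        response = generate(user_message)
        save(user_message, response)
        source = "llm"
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"response": response, "source": source,
            "latency_ms": round(elapsed_ms, 1)}


fake_cache = {}   # Exact-match dict standing in for the semantic cache
llm_calls = []    # Records how often the stand-in model is invoked


def fake_llm(message: str) -> str:
    llm_calls.append(message)
    return f"answer to: {message}"


first = read_through("What is Valkey?", fake_cache.get, fake_llm,
                     fake_cache.__setitem__)
second = read_through("What is Valkey?", fake_cache.get, fake_llm,
                      fake_cache.__setitem__)
print(first["source"], second["source"], len(llm_calls))
```

The first call misses and invokes the stand-in model. The second is served from the cache, so the model is called only once.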

Underlying Valkey commands

The following table shows the Valkey commands used to implement the semantic cache:

Operation	Valkey command	Typical latency
Create index	`FT.CREATE semantic_cache SCHEMA query TEXT answer TEXT embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1024 DISTANCE_METRIC COSINE`	One-time setup
Search cache	`FT.SEARCH semantic_cache "*=>[KNN 3 @embedding $query_vec]" PARAMS 2 query_vec [bytes] DIALECT 2`	Microseconds
Store response	`HSET cache:{hash} query "..." answer "..." embedding [bytes]`	Microseconds
Set TTL	`EXPIRE cache:{hash} 82800`	Microseconds
LLM inference (cache miss)	External API call to Amazon Bedrock	500-6000 ms
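
The `[bytes]` placeholders in the commands above stand for the embedding serialized as FLOAT32 values. The following sketch shows one way to produce and read back such a blob with the standard library. The helper names are hypothetical, and little-endian byte order is an assumption.

```python
import struct

DIM = 1024            # Matches DIM 1024 in the FT.CREATE command
TTL_SECONDS = 82800   # Value used with EXPIRE in the table


def to_float32_bytes(vector) -> bytes:
    """Pack a sequence of floats as little-endian 32-bit floats."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_bytes(blob: bytes) -> list:
    """Unpack a little-endian FLOAT32 blob back into a list of floats."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


vector = [0.5] * DIM
blob = to_float32_bytes(vector)
print(len(blob))            # 4 bytes per dimension
print(TTL_SECONDS / 3600)   # TTL expressed in hours
```

Each 1024-dimension embedding occupies 4096 bytes, and the TTL of 82800 seconds corresponds to 23 hours.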
