Guidance for Creating Low-Cost Semantic Search on AWS

Overview

This Guidance demonstrates how you can implement cost-effective retrieval-augmented generation (RAG) solutions for your AI needs. It provides practical tools and methodologies for building small-scale RAG implementations that remain effective without the high costs typically associated with dedicated vector databases. By making these advanced AI techniques more accessible, this approach enables your small business to personalize generative AI applications and adopt AI capabilities within budget constraints.

Benefits

Deploy AI-powered search while saving on costs

Implement retrieval-augmented generation capabilities using Amazon DynamoDB as a cost-effective vector store, eliminating the need for an expensive dedicated vector database while maintaining performance for small to medium workloads.

Enhance decision-making with contextual AI

Empower your applications with intelligent document understanding and semantic search capabilities. Improve user experiences by providing relevant, context-aware responses based on your organization's specific knowledge base.

Accelerate AI adoption for small businesses

Quickly implement advanced AI techniques using pre-built workflows and managed services. Focus on creating value from your data while AWS handles the underlying infrastructure and AI model management.

How it works

Document ingestion and vectorization flow

This architecture diagram shows how to create a low-cost vector store using Amazon DynamoDB, outlining the key components and their interactions in the document ingestion and vectorization flow.

Step 1
A user accesses the document upload portal through Amazon CloudFront, while AWS WAF protects against malicious traffic and common web exploits.
Step 2
Amazon Simple Storage Service (Amazon S3) serves the web portal's static files (HTML, CSS, and JavaScript), while Amazon Cognito handles user authentication and access management.
Step 3
Amazon API Gateway securely receives document uploads from the user and routes them to Amazon S3 for encrypted storage.
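One way to implement this step, assuming the upload endpoint is backed by a Lambda function behind an API Gateway proxy integration (the exact wiring in this Guidance may differ), is to write the request body to Amazon S3 with server-side encryption; the bucket name and key scheme below are hypothetical:

    import base64
    import json
    import uuid

    import boto3

    s3 = boto3.client("s3")
    UPLOAD_BUCKET = "example-document-uploads"  # hypothetical bucket name

    def handler(event, context):
        """Store a document received through API Gateway in S3, encrypted at rest."""
        # API Gateway base64-encodes binary payloads when configured for binary media types.
        if event.get("isBase64Encoded"):
            body = base64.b64decode(event["body"])
        else:
            body = event["body"].encode("utf-8")
        key = f"uploads/{uuid.uuid4()}"  # hypothetical key scheme
        s3.put_object(
            Bucket=UPLOAD_BUCKET,
            Key=key,
            Body=body,
            ServerSideEncryption="AES256",  # S3-managed server-side encryption
        )
        return {"statusCode": 200, "body": json.dumps({"key": key})}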
Step 4
When a new document is uploaded to Amazon S3, the upload event automatically invokes an AWS Step Functions workflow to begin the data ingestion process.
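A minimal sketch of this trigger, assuming an S3 event notification invokes a small Lambda function that starts the state machine (the Guidance could equally wire this through Amazon EventBridge; the state machine ARN below is a placeholder):

    import json

    import boto3

    sfn = boto3.client("stepfunctions")
    STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:doc-ingestion"  # placeholder

    def handler(event, context):
        """Start the ingestion workflow for each object in an S3 event notification."""
        for record in event["Records"]:
            sfn.start_execution(
                stateMachineArn=STATE_MACHINE_ARN,
                input=json.dumps({
                    "bucket": record["s3"]["bucket"]["name"],
                    "key": record["s3"]["object"]["key"],
                }),
            )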
Step 5
Depending on the document type, a user has two options for document processing: Amazon Textract for text extraction and parsing, or Amazon Bedrock for advanced document understanding. Once processed, the documents are stored in Amazon S3. An AWS Lambda function then performs two key operations (a combined sketch of both follows Step 5b):
Step 5a
Text Chunking: This operation segments the extracted text into smaller chunks; the chunk size is user-configurable, with a sensible default.
Step 5b
Embedding Generation: This operation creates vector embeddings for each chunk by using Amazon Titan Embeddings through the Amazon Bedrock API.
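A combined, minimal sketch of Steps 5a and 5b, assuming simple fixed-size chunking and the Titan Embeddings G1 - Text model; the model ID, chunk size, and overlap are illustrative defaults, not values taken from this Guidance:

    import json

    import boto3

    bedrock = boto3.client("bedrock-runtime")

    def chunk_text(text, chunk_size=1000, overlap=100):
        """Step 5a: split extracted text into fixed-size, slightly overlapping chunks."""
        chunks, start = [], 0
        while start < len(text):
            chunks.append(text[start:start + chunk_size])
            start += chunk_size - overlap  # overlap preserves context across boundaries
        return chunks

    def embed(chunk):
        """Step 5b: create a vector embedding with Amazon Titan Embeddings."""
        response = bedrock.invoke_model(
            modelId="amazon.titan-embed-text-v1",  # assumed model ID
            body=json.dumps({"inputText": chunk}),
        )
        return json.loads(response["body"].read())["embedding"]

    document_text = "Text extracted in Step 5 by Amazon Textract or Amazon Bedrock."
    vectors = [embed(chunk) for chunk in chunk_text(document_text)]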
Step 6
The system stores document vectors in Amazon DynamoDB tables with separate tables for different chunk sizes to optimize retrieval performance.
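A sketch of this storage step. DynamoDB has no native vector type (and boto3 requires Decimal rather than float for numbers), so the embedding is serialized to a JSON string here; the per-chunk-size table naming scheme and attribute names are hypothetical:

    import json

    import boto3

    dynamodb = boto3.resource("dynamodb")

    def store_vector(user_id, chunk_id, chunk, vector, chunk_size=1000):
        """Write one text chunk and its embedding to the table for this chunk size."""
        table = dynamodb.Table(f"doc-vectors-{chunk_size}")  # hypothetical naming scheme
        table.put_item(Item={
            "user_id": user_id,               # partition key: scopes vectors to one Cognito user
            "chunk_id": chunk_id,             # sort key
            "text": chunk,
            "embedding": json.dumps(vector),  # floats serialized as a JSON string
        })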

Inference flow

This architecture diagram outlines the key components and their interactions in the inference flow, from the user's prompt to the generated response.

Step 1
A user accesses the management portal through CloudFront. The portal enables them to do the following:
Step 1a
Adjust document processing settings (such as chunk sizes and processing models).
Step 1b
Evaluate and test the chat interface's performance.
Step 2
API Gateway routes the user's prompt to a Lambda function for processing.
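In outline, the receiving Lambda handler for a standard API Gateway proxy integration might look like the following sketch; the event shape and the run_inference helper are assumptions, not code from this Guidance:

    import json

    def run_inference(prompt):
        """Hypothetical helper implementing Steps 3 and 4 (retrieval plus generation)."""
        raise NotImplementedError

    def handler(event, context):
        """Receive the user's prompt from the API Gateway proxy event and return the answer."""
        prompt = json.loads(event["body"])["prompt"]
        return {"statusCode": 200, "body": json.dumps({"answer": run_inference(prompt)})}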
Step 3
The inference Lambda function processes the user's prompt by converting it to a vector embedding using an Amazon Titan Embeddings model, and it stores the conversation history in DynamoDB for context retention. Simultaneously, the function performs the following:
Step 3a
It retrieves all vectors associated with the Amazon Cognito user from DynamoDB.
Step 3b
It performs a context similarity search using cosine similarity.
Step 3c
It uses the most relevant text chunks as context for the generative AI model.
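A minimal sketch of Steps 3a through 3c, reusing the hypothetical storage layout from the ingestion flow above and ranking chunks by plain cosine similarity:

    import json
    import math

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")

    def cosine_similarity(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    def top_chunks(user_id, query_vector, chunk_size=1000, k=4):
        """Steps 3a-3c: fetch the user's vectors, rank by similarity, keep the best k."""
        table = dynamodb.Table(f"doc-vectors-{chunk_size}")  # hypothetical naming scheme
        items = table.query(  # pagination omitted for brevity
            KeyConditionExpression=Key("user_id").eq(user_id)
        )["Items"]
        scored = [
            (cosine_similarity(query_vector, json.loads(item["embedding"])), item["text"])
            for item in items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]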
Step 4
Amazon Bedrock uses Anthropic's Claude 3 Haiku to process the user's prompt alongside the contextually relevant text chunks retrieved in Step 3. It then generates and delivers the response to the user through the chat interface.
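A minimal sketch of this generation step using the Anthropic Messages format that Amazon Bedrock exposes for Claude models; the model ID shown is the public Claude 3 Haiku identifier, though the Guidance may pin a different version:

    import json

    import boto3

    bedrock = boto3.client("bedrock-runtime")

    def generate_answer(prompt, context_chunks):
        """Step 4: answer the prompt, grounding the model in the retrieved chunks."""
        context = "\n\n".join(context_chunks)
        response = bedrock.invoke_model(
            modelId="anthropic.claude-3-haiku-20240307-v1:0",
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 512,
                "messages": [{
                    "role": "user",
                    "content": f"Use the following context to answer.\n\n"
                               f"{context}\n\nQuestion: {prompt}",
                }],
            }),
        )
        return json.loads(response["body"].read())["content"][0]["text"]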

Deploy with confidence

Everything you need to launch this Guidance in your account is right here.

Let's make it happen

Ready to deploy? Review the sample code on GitHub for detailed deployment instructions, then deploy it as-is or customize it to fit your needs.