Using Business Context with the SageMaker Data Agent
The SageMaker Data Agent integrates with the business context and metadata in SageMaker Catalog, so data practitioners can discover datasets and generate more accurate SQL and Python code using business terminology instead of cryptic technical table names. The Data Agent draws on the business context that your organization has curated in SageMaker Catalog, including metadata synced from Collibra, Atlan, and Alation. When your domain has a configured SageMaker Catalog with published assets, the Data Agent uses glossary terms, custom metadata forms, summaries, and README content to find the correct tables for your queries.
Prerequisites
To use the business context integration, you need the following:
- An Amazon SageMaker Unified Studio domain (IAM or IAM Identity Center)
- A SageMaker Catalog configured with assets and data products. For more information, see Catalog in IDC-based domains.
- Assets and data products enriched with business metadata (glossary terms, metadata forms, summaries, or READMEs)
- For third-party catalog users: metadata sync configured between Collibra, Atlan, or Alation and SageMaker Catalog. For more information, see Third-party business data catalog integrations.
How it works
When you ask the Data Agent a question in a notebook or Query Editor:
- The agent searches for accessible tables using technical metadata from AWS Glue Data Catalog and Redshift.
- The agent queries the business context to find assets matching your business terms.
- The agent merges technical and business metadata to identify the correct tables.
- The agent generates SQL or PySpark code using the correct catalog, database, table, and column references.
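The matching and merging steps above can be pictured with a minimal Python sketch. This is an illustration only, not the Data Agent's implementation: the `TechnicalTable` and `BusinessAsset` shapes, the substring matching, and every table name are hypothetical, and metadata forms are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalTable:
    """Technical metadata for an accessible table (hypothetical shape)."""
    catalog: str
    database: str
    table: str
    columns: list[str]


@dataclass
class BusinessAsset:
    """Business metadata attached to a catalog asset (hypothetical shape)."""
    table: str  # technical name of the table the asset describes
    glossary_terms: list[str] = field(default_factory=list)
    summary: str = ""
    readme: str = ""


def asset_matches(asset: BusinessAsset, terms: list[str]) -> bool:
    """True if any business term occurs in the asset's glossary terms, summary, or README."""
    haystack = " ".join([*asset.glossary_terms, asset.summary, asset.readme]).lower()
    return any(term.lower() in haystack for term in terms)


def resolve_tables(technical: list[TechnicalTable],
                   business: list[BusinessAsset],
                   terms: list[str]) -> list[TechnicalTable]:
    """Merge both views: keep accessible tables whose business asset matches the terms."""
    by_name = {t.table: t for t in technical}
    return [by_name[a.table] for a in business
            if asset_matches(a, terms) and a.table in by_name]


def qualified_name(t: TechnicalTable) -> str:
    """Build the catalog.database.table reference used in generated code."""
    return f"{t.catalog}.{t.database}.{t.table}"


# Illustrative data only; these names do not come from a real catalog.
technical = [
    TechnicalTable("awsdatacatalog", "sales_db", "cust_chrn_v2", ["cust_id", "chrn_flg"]),
    TechnicalTable("awsdatacatalog", "sales_db", "inv_shoes", ["sku", "qty"]),
]
business = [
    BusinessAsset("cust_chrn_v2", ["Customer Churn"], summary="Monthly churn flags per customer"),
    BusinessAsset("inv_shoes", ["Inventory"], summary="Shoe stock levels"),
]

resolved = resolve_tables(technical, business, ["customer churn"])
names = [qualified_name(t) for t in resolved]
print(names)
```

In this sketch, the business term "customer churn" resolves to the cryptically named `cust_chrn_v2` table because the glossary term on its asset matches, which is the kind of lookup that technical metadata alone would miss.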
Types of questions you can ask
You can ask the Data Agent questions using the business terminology defined in your catalog. The agent matches your terms against glossary terms, metadata form values, summaries, and README content to find the correct tables.
Discover datasets
Ask the Data Agent to find data using business terms instead of technical table names:
- "What data do I have on customer churn?"
- "Find tables in the Retail industry"
- "What tables contain student data?"
- "Show me Finance domain datasets"
- "Find tables with a column called reg_cd"
- "What data do I have on shoes inventory in current project"
Generate code from business questions
Ask analytical questions and the agent generates SQL or PySpark code using the correct tables and columns:
- "Calculate customer retention rate"
- "Show me EBITDA trends across quarters"
- "Can you generate some table insights on avocado sales"
- "Help me understand what the revenue_metrics table contains"
- "Write a query to calculate retention rate. Use the business definitions from our catalog to pick the correct tables and columns."
Access-aware results
The agent checks whether you have access to matching assets before generating code. If a required table is not accessible, the agent informs you that you need to get access from your administrator.
Understanding search results
The agent searches two scopes:
- Published assets – Assets published across your domain that your project has subscribed to.
- Local assets – Unpublished assets within your current project.
When multiple assets match your query, the agent favors:
- Assets subscribed by your current project.
- Among subscribed assets, those with higher subscription counts (indicating broader usage).
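One way to read this preference order is as a two-level sort. The following Python sketch is an assumption about how such a ranking could be expressed, not the agent's actual logic; the `MatchedAsset` type, its fields, and the asset names are hypothetical. Because the text above only describes subscription counts as a tiebreaker among subscribed assets, the sketch leaves all other assets in their original relative order.

```python
from dataclasses import dataclass


@dataclass
class MatchedAsset:
    """A candidate asset returned for a query (hypothetical shape)."""
    name: str
    subscribed_by_current_project: bool
    subscription_count: int


def rank_assets(assets: list[MatchedAsset]) -> list[MatchedAsset]:
    """Order candidates: subscribed assets first, then by higher subscription count."""
    def sort_key(asset: MatchedAsset) -> tuple[int, int]:
        if asset.subscribed_by_current_project:
            # Negative count so that larger counts sort earlier.
            return (0, -asset.subscription_count)
        # Non-subscribed assets share one key; sorted() is stable,
        # so they keep their original relative order.
        return (1, 0)

    return sorted(assets, key=sort_key)


# Illustrative candidates only.
candidates = [
    MatchedAsset("retention_legacy", False, 40),
    MatchedAsset("retention_daily", True, 3),
    MatchedAsset("retention_monthly", True, 12),
]

ranked_names = [a.name for a in rank_assets(candidates)]
print(ranked_names)
```

Here `retention_legacy` ranks last despite its high subscription count, because the current project is not subscribed to it.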
Handling inaccessible assets
If the agent identifies a relevant asset that your project does not have access to, the agent informs you that you need to get access from your administrator. The agent does not generate code against inaccessible tables.
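The behavior described above amounts to splitting the matched tables by access before any code is generated. The sketch below shows that idea under stated assumptions: the function name, the message wording, and the table names are all hypothetical, and the access check is reduced to set membership.

```python
def split_by_access(matched_tables: list[str],
                    accessible_tables: set[str]) -> tuple[list[str], list[str]]:
    """Return (tables usable for code generation, notices for inaccessible tables)."""
    usable = [t for t in matched_tables if t in accessible_tables]
    notices = [
        f"You do not have access to {t}. Request access from your administrator."
        for t in matched_tables
        if t not in accessible_tables
    ]
    return usable, notices


# Illustrative names only.
usable, notices = split_by_access(
    ["sales_db.revenue_metrics", "finance_db.ebitda_quarterly"],
    {"sales_db.revenue_metrics"},
)
print(usable)
print(notices)
```

Only the tables in `usable` would be passed on to code generation; each entry in `notices` corresponds to a table the user would need to request access to first.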
In-cell code generation
When you use in-cell code generation instead of the chat panel, the agent uses only assets that your project can access: subscribed assets and local assets.
Supported tools
Data Agent integration with business context is available in the following tools:
- SageMaker Notebooks (Data Notebook)
- Query Editor