Evaluation with Inspect AI - Amazon Nova

Evaluation with Inspect AI

You can evaluate your customized Amazon Nova models using Inspect AI, an open-source evaluation framework. Inspect AI supports standardized benchmarks from the AI research community, enabling you to measure model performance across knowledge, reasoning, coding, and safety tasks.

Choose the evaluation approach that best fits your workflow:

  • Inspect AI SDK – Run evaluations interactively from a notebook or local environment against your SageMaker inference endpoint. Best for development, iteration, and quick testing.

  • Inspect AI container – Run evaluations at scale as SageMaker Training Jobs. Best for production evaluation pipelines, chaining multiple benchmarks, and automated workflows.

Recommended workflow: Start with the Inspect AI SDK to build and test your custom evaluation benchmarks using the AI assistant onboarding prompt, then run evaluations against your preferred inference solution. Once your benchmarks are validated, you can switch to job-based evaluation with the Inspect AI container without changing any code: move your benchmark files and recipe file to S3 and launch the job.
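
To make "custom evaluation benchmark" concrete, the following is a minimal sketch of what a small benchmark could look like using the open-source Inspect AI Python package (`Task`, `Sample`, `generate`, `match`). The two questions and the names `RAW_SAMPLES` and `build_task` are hypothetical and chosen for illustration only. The sketch does not show how to connect to a SageMaker inference endpoint or what a recipe file contains, because those details are not covered in this overview.

```python
# Hypothetical two-item benchmark; the questions are illustrative only.
RAW_SAMPLES = [
    {"input": "What is 2 + 2? Answer with the number only.", "target": "4"},
    {"input": "What is the capital of France? Answer with one word.", "target": "Paris"},
]


def build_task():
    """Build an Inspect AI Task from RAW_SAMPLES.

    inspect_ai is imported lazily so the sample data above can be
    reviewed or unit-tested without the package installed.
    """
    from inspect_ai import Task
    from inspect_ai.dataset import Sample
    from inspect_ai.scorer import match
    from inspect_ai.solver import generate

    return Task(
        # One Sample per benchmark item: the prompt and the expected answer.
        dataset=[Sample(input=s["input"], target=s["target"]) for s in RAW_SAMPLES],
        # Send each input to the model under evaluation without extra prompting.
        solver=generate(),
        # Score by checking the model output against the target string.
        scorer=match(),
    )
```

With the package installed, a task built this way can be passed to Inspect AI's `eval()` function together with a model identifier. Which identifier to use for your endpoint depends on your inference setup and is left unspecified here.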