Prepare genomic, clinical, mutation, expression, and imaging data for large-scale analysis and querying in a data lake.

Step 1
Create an AWS CodeBuild project containing the setup.sh script. This script creates the remaining AWS CloudFormation stacks and code repositories, and pushes the source code to those repositories.
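The ordering that setup.sh drives across the steps on this page can be modeled as a small dependency graph. This is a minimal sketch, not the actual script: the node names are labels taken from this page, and the edges assume each step waits for the previous one, following the step numbering. The standard library's `graphlib.TopologicalSorter` (Python 3.9+) yields one valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical model of the deployment order described on this page.
# Node names are labels from the text, not identifiers used by setup.sh.
# Each key maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "zone stack": set(),
    "push pipe repo": {"zone stack"},
    "pipe stack": {"push pipe repo"},
    "push code repo": {"pipe stack"},
    # The code pipeline deploys the three codebase stacks.
    "genomics stack": {"push code repo"},
    "imaging stack": {"push code repo"},
    "omics stack": {"push code repo"},
    # The QuickSight stack is launched from a link the imaging stack creates.
    "quicksight stack": {"imaging stack"},
}


def deployment_order(deps):
    """Return one valid order in which the tasks can run."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for task in deployment_order(DEPENDENCIES):
        print(task)
```

Modeling the steps as a graph rather than a flat list makes it explicit that the three codebase stacks only depend on the code repository push, not on each other.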
Step 2
The landing zone (zone) stack creates the AWS CodeCommit pipe repository. After the landing zone (zone) stack completes its setup, the setup.sh script pushes source code to the CodeCommit pipe repository.
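The "push only after the stack completes" behavior can be sketched as a polling loop over the stack status. This is an illustrative sketch under assumptions: `get_status` and `push` are hypothetical callbacks (in practice `get_status` might wrap a CloudFormation `describe_stacks` call, and `push` a `git push`), and the poll interval and limits are arbitrary. From a shell script, the AWS CLI command `aws cloudformation wait stack-create-complete --stack-name <name>` provides the same wait.

```python
import time
from typing import Callable

# CloudFormation stack statuses treated as terminal for a create operation.
SUCCESS = {"CREATE_COMPLETE"}
FAILURE = {"CREATE_FAILED", "ROLLBACK_COMPLETE", "ROLLBACK_FAILED"}


def wait_for_stack(get_status: Callable[[], str],
                   poll_seconds: float = 30.0,
                   max_polls: int = 120,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until the stack reaches a terminal create state.

    get_status is a hypothetical callback that returns the current
    stack status string.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in SUCCESS:
            return status
        if status in FAILURE:
            raise RuntimeError(f"stack creation ended in {status}")
        sleep(poll_seconds)  # still in progress, e.g. CREATE_IN_PROGRESS
    raise TimeoutError("stack did not finish creating in time")


def push_after_stack(get_status: Callable[[], str],
                     push: Callable[[], None], **kwargs) -> None:
    """Push source code only once the stack that owns the repository exists."""
    wait_for_stack(get_status, **kwargs)
    push()
```

Injecting `sleep` keeps the loop testable without real delays.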
Step 3
The deployment pipeline (pipe) stack creates the CodeCommit code repository, an Amazon CloudWatch event, and the AWS CodePipeline code pipeline. After the deployment pipeline (pipe) stack completes its setup, the setup.sh script pushes source code to the CodeCommit code repository.
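A CloudWatch event that starts a pipeline on each push typically matches CodeCommit "Repository State Change" events for one repository and branch. The sketch below builds such an event pattern and checks a sample event against it. The repository ARN and the branch name `main` are hypothetical, and `matches` is a simplified stand-in for the service's matcher that only handles exact-value lists and nested objects. The serialized pattern is the kind of string passed as `EventPattern` to the EventBridge `PutRule` API.

```python
import json


def codecommit_push_pattern(repo_arn: str, branch: str = "main") -> dict:
    """Event pattern for branch pushes to one CodeCommit repository.

    The branch name "main" is an assumption; use the pipeline's source branch.
    """
    return {
        "source": ["aws.codecommit"],
        "detail-type": ["CodeCommit Repository State Change"],
        "resources": [repo_arn],
        "detail": {
            "event": ["referenceCreated", "referenceUpdated"],
            "referenceType": ["branch"],
            "referenceName": [branch],
        },
    }


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(actual, list):
            # Array fields such as "resources" match if any element is listed.
            if not any(a in expected for a in actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    arn = "arn:aws:codecommit:us-east-1:111122223333:code"  # hypothetical
    print(json.dumps(codecommit_push_pattern(arn), indent=2))
```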
Step 4
The CodePipeline (code) pipeline deploys the codebase (genomics, imaging, and omics) CloudFormation stacks. After the CodePipeline pipelines complete their setup, the resources deployed in your account include Amazon Simple Storage Service (Amazon S3) buckets for object access logs, build artifacts, and data in your data lake; CodeCommit repositories for source code; a CodeBuild project for building code artifacts; a CodePipeline pipeline that automates builds and resource deployment; example AWS Glue jobs, crawlers, and a data catalog; and an Amazon SageMaker Jupyter notebook instance.
An Amazon Omics Reference Store, Variant Store, and Annotation Store are provisioned to store publicly available variant and annotation data and make it available for query and analysis. A sample variant call file (VCF), a subset of the 1000 Genomes VCF, and the ClinVar annotation VCF are ingested for analysis. Using AWS Lake Formation, a data lake administrator can grant access to the data in the Omics Variant and Annotation Stores from Amazon Athena and SageMaker.
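Granting access through Lake Formation and then querying from Athena can be sketched as two request payloads. This is a minimal sketch assuming the Omics stores are exposed as Lake Formation tables: the database name `omicsdb`, table name `variantstore`, principal role ARN, and S3 output location are all hypothetical. The dictionaries follow the keyword arguments of boto3's `lakeformation.grant_permissions` and `athena.start_query_execution`; the calls themselves are left as comments so the sketch runs without AWS credentials.

```python
def lakeformation_select_grant(principal_arn: str, database: str, table: str) -> dict:
    """Keyword arguments for boto3 lakeformation.grant_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT", "DESCRIBE"],
    }


def athena_preview_query(database: str, table: str, output_s3: str,
                         limit: int = 10) -> dict:
    """Keyword arguments for boto3 athena.start_query_execution."""
    if not output_s3.startswith("s3://"):
        raise ValueError("output location must be an s3:// URI")
    for name in (database, table):
        if '"' in name:
            raise ValueError("identifier must not contain double quotes")
    # Athena quotes identifiers with double quotes in SELECT statements.
    sql = f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }


# Hypothetical usage (requires boto3 and AWS credentials):
# import boto3
# grant = lakeformation_select_grant(
#     "arn:aws:iam::111122223333:role/NotebookRole", "omicsdb", "variantstore")
# boto3.client("lakeformation").grant_permissions(**grant)
# query = athena_preview_query("omicsdb", "variantstore", "s3://example-bucket/athena/")
# boto3.client("athena").start_query_execution(**query)
```

Separating payload construction from the API calls keeps the grant and query easy to review before they run against a real account.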
Step 5
The imaging stack creates a hyperlink to an AWS CloudFormation quick start, which you can launch to deploy the Amazon QuickSight stack. The QuickSight stack creates the AWS Identity and Access Management (IAM) and QuickSight resources needed to explore the multi-omics dataset interactively.