This architecture diagram shows how to build a connector that transfers data from cloud-based object storage services to AWS.
Download the architecture diagram

Step 1
Install and configure the AWS DataSync agent on a virtual machine in the public cloud that hosts the source object storage.
Step 2
The DataSync agent and the DataSync service handle discovery and scheduling of the data transfer, for both the initial sync and continuous, ongoing syncs.
Step 3
Configure DataSync to store the replicated data in a Landing Zone S3 bucket.
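The Landing Zone destination can be registered with DataSync through boto3. This is a minimal sketch; the bucket ARN, role ARN, and `/landing-zone` subdirectory below are placeholders, not names from this Guidance.

```python
def landing_zone_location(bucket_arn: str, role_arn: str,
                          subdirectory: str = "/landing-zone") -> dict:
    """Build the S3 destination configuration for DataSync's create_location_s3."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": subdirectory,
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }


def create_landing_zone(bucket_arn: str, role_arn: str) -> str:
    """Register the Landing Zone bucket as a DataSync destination location."""
    import boto3  # imported lazily so the pure helper above runs without the SDK
    datasync = boto3.client("datasync")
    response = datasync.create_location_s3(**landing_zone_location(bucket_arn, role_arn))
    return response["LocationArn"]
```

The returned location ARN is then paired with the agent's source location in a DataSync task that performs the actual transfer.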
Step 4
Create a rule in EventBridge to schedule a Step Functions standard workflow for data processing at the required frequency.
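A scheduled rule targeting a Step Functions state machine might look like the following sketch; the rule name, cron expression, and ARNs are illustrative assumptions.

```python
def schedule_spec(rule_name: str, schedule: str,
                  state_machine_arn: str, role_arn: str) -> dict:
    """Describe the EventBridge rule and its Step Functions target in one structure."""
    return {
        "rule": {
            "Name": rule_name,
            "ScheduleExpression": schedule,
            "State": "ENABLED",
        },
        "target": {
            "Id": f"{rule_name}-target",
            "Arn": state_machine_arn,
            "RoleArn": role_arn,
        },
    }


def apply_schedule(spec: dict) -> None:
    """Create the EventBridge rule and attach the workflow as its target."""
    import boto3  # lazy import keeps the helper above runnable without the SDK
    events = boto3.client("events")
    events.put_rule(**spec["rule"])
    events.put_targets(Rule=spec["rule"]["Name"], Targets=[spec["target"]])
```

The `RoleArn` on the target must allow `states:StartExecution` on the state machine, per the least-privilege guidance in Step 11.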
Step 5
In the workflow, use a Lambda function to perform any necessary file- or object-level decryption, and invoke an AWS Glue job to normalize the data.
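A Lambda handler for this step could be shaped as below. The event fields, the `decrypted/` prefix, and the `normalize-objects` Glue job name are assumptions for the sketch; the actual decryption logic depends on how the source encrypts objects, so it is left as a placeholder.

```python
def decrypted_key(source_key: str) -> str:
    """Target key for the decrypted copy of an object (prefix name is illustrative)."""
    return f"decrypted/{source_key}"


def lambda_handler(event, context):
    """Decrypt one object and hand it to a Glue job for normalization."""
    import boto3  # lazy import keeps decrypted_key testable without the SDK
    s3 = boto3.client("s3")
    glue = boto3.client("glue")
    bucket, key = event["bucket"], event["key"]
    ciphertext = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    s3.put_object(Bucket=bucket, Key=decrypted_key(key), Body=decrypt_object(ciphertext))
    glue.start_job_run(
        JobName="normalize-objects",
        Arguments={"--input_path": f"s3://{bucket}/{decrypted_key(key)}"},
    )


def decrypt_object(ciphertext: bytes) -> bytes:
    """Placeholder: real logic depends on the source's encryption scheme
    (for example, envelope encryption with a key held in Secrets Manager)."""
    raise NotImplementedError
```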
Step 6
Use AWS Glue jobs and workflows to process the decrypted data files, and write the output to a separate S3 bucket.
Step 7
Write the objects in read-optimized Apache Parquet format. Apply attribute-level transformations, such as SHA-256 hashing, to secure sensitive data. Apply a custom partitioning scheme as needed to optimize reads.
Step 8
Create an AWS Glue crawler, and add it to the workflow to catalog the read-optimized data in the AWS Glue Data Catalog.
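Creating and starting the crawler can be sketched as follows; the crawler name, role ARN, database name, and S3 path are placeholders.

```python
def crawler_spec(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Arguments for glue.create_crawler targeting the read-optimized bucket."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(spec: dict) -> None:
    import boto3  # lazy import keeps crawler_spec testable without the SDK
    glue = boto3.client("glue")
    glue.create_crawler(**spec)
    glue.start_crawler(Name=spec["Name"])
```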
Step 9
Use another Lambda function for post-processing activities, such as moving the source data files to an "archive" prefix as part of cleanup and to save on storage costs.
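The archive move is a copy-then-delete in S3. In this sketch the `archive/` prefix and the `STANDARD_IA` storage class are illustrative choices for the cost-saving goal, not prescribed by the Guidance.

```python
def archive_key(source_key: str, prefix: str = "archive/") -> str:
    """Destination key for a processed source object under the archive prefix."""
    return prefix + source_key


def archive_object(bucket: str, key: str) -> None:
    """Copy a processed source object under the archive prefix, then delete
    the original. STANDARD_IA is an example cost-saving storage class."""
    import boto3  # lazy import keeps archive_key testable without the SDK
    s3 = boto3.client("s3")
    s3.copy_object(
        Bucket=bucket,
        Key=archive_key(key),
        CopySource={"Bucket": bucket, "Key": key},
        StorageClass="STANDARD_IA",
    )
    s3.delete_object(Bucket=bucket, Key=key)
```

An S3 lifecycle rule on the archive prefix could transition or expire these objects later without further Lambda involvement.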
Step 10
Use Amazon SNS to publish a workflow-complete event and notify operators and users by email. Use the HTTP/HTTPS subscription option or additional topics to integrate with other observability tools.
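Publishing the completion event can be sketched like this; the topic ARN, workflow name, and message fields are illustrative.

```python
import json


def completion_message(workflow: str, status: str, objects_processed: int) -> dict:
    """Subject and JSON body for the workflow-complete notification."""
    return {
        "Subject": f"{workflow} finished: {status}",
        "Message": json.dumps({
            "workflow": workflow,
            "status": status,
            "objects_processed": objects_processed,
        }),
    }


def notify(topic_arn: str, message: dict) -> None:
    import boto3  # lazy import keeps completion_message testable without the SDK
    boto3.client("sns").publish(TopicArn=topic_arn, **message)
```

Email subscribers see the subject line; HTTP/HTTPS subscribers receive the JSON body, which downstream observability tools can parse.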
Step 11
Use the following AWS services for security and access: IAM enables least-privilege access to specific resources and operations. AWS KMS provides key management and encryption for data at rest. Secrets Manager stores the hashing keys used to protect PII data. CloudWatch monitors logs and metrics across all services used in this Guidance.
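As one concrete example of the least-privilege principle above, the sketch below builds an IAM policy that grants only read access to a single bucket; the bucket name and action list are illustrative assumptions.

```python
import json


def read_only_bucket_policy(bucket: str) -> str:
    """Example least-privilege IAM policy JSON: read-only access to one bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",       # bucket-level, for ListBucket
                    f"arn:aws:s3:::{bucket}/*",     # object-level, for GetObject
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)
```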