This architecture diagram displays how data from the edge location is processed and ingested to a data lake.
Download the architecture diagram

Step 1
Use AWS IoT Core to maintain device shadows (digital twins) for all connected IoT devices, enabling secure cloud connectivity, device management, over-the-air (OTA) updates, and robust security for the device fleet.
Step 2
Use AWS IoT SiteWise to unlock real-time data from industrial equipment, delivering an organized view of live and historical health and safety insights.
Step 3
Use Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose to stream health and safety data to capture, process, and store data streams at any scale.
Step 4
Amazon Kinesis Video Streams securely ingests video feeds from edge devices using the Kinesis Video Stream Edge Agent, enabling video analytics, machine learning, and processing for health and safety monitoring.
Step 5
Amazon Textract, a machine learning service, is utilized to automatically extract text, handwriting, layout elements, and data from scanned on-site documents, such as health and safety observation reports.
Step 6
A health and safety data lake is built in Amazon Simple Storage Service (Amazon S3) to store raw data. It also stores curated, processed datasets, enabling efficient storage and access for analysis and reporting.
Step 7
AWS Glue and AWS Lake Formation are utilized for data discovery, governance, and management within the Amazon S3 data lake. AWS Glue enables data transformation and enrichment tasks, ensuring the data is properly prepared for analysis and reporting purposes.
Step 8
Curated and processed datasets are created and stored in Amazon S3, enabling centralized access to the refined health and safety data for downstream analysis and reporting tasks.
Step 9
Amazon S3 Glacier Deep Archive provides secure, cost-effective long-term archival storage for large volumes of infrequently accessed raw health and safety data.
Step 10
Amazon SageMaker Ground Truth enables building accurate training datasets by guiding you through structured workflows for labeling images, audio, text, and other data types.
Step 11
Amazon SageMaker is utilized to build, train, and deploy risk prediction and inference models based on IoT data and document repositories. These models can optionally be deployed at the edge on IoT Greengrass Core devices for localized inference and decision-making.
Step 12
Amazon Rekognition can detect the presence and usage of PPE in images and video feeds from the operational environment. This enables monitoring and enforcing proper PPE practices.
Step 13
LLMs on Amazon Bedrock summarize and query health and safety content, enabling natural language information retrieval.
Step 14
Amazon OpenSearch Service and Amazon Kendra serve as the backend for Retrieval Augmented Generation (RAG), enabling efficient retrieval and integration of relevant information from health and safety data repositories. This enhances the responses and outputs generated by the LLMs.