

 Amazon Redshift will no longer support the creation of new Python UDFs starting with Patch 198. Existing Python UDFs will continue to function until June 30, 2026. For more information, see the [blog post](https://aws.amazon.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/). 

# Materialized views in Amazon Redshift
<a name="materialized-view-overview"></a>

This section describes how to create and use materialized views in Amazon Redshift. A materialized view is a database object that stores the results of a query, which can be used to improve performance and efficiency.

In a data warehouse environment, applications often must perform complex queries on large tables. An example is SELECT statements that perform multi-table joins and aggregations on tables that contain billions of rows. Processing these queries can be expensive, in terms of system resources and the time it takes to compute the results.

Materialized views in Amazon Redshift provide a way to address these issues. A *materialized view* contains a precomputed result set, based on an SQL query over one or more base tables. You can issue SELECT statements to query a materialized view, in the same way that you can query other tables or views in the database. Amazon Redshift returns the precomputed results from the materialized view, without having to access the base tables at all. From the user standpoint, the query results are returned much faster compared to when retrieving the same data from the base tables.

Materialized views are especially useful for speeding up queries that are predictable and repeated. Instead of performing resource-intensive queries against large tables (such as aggregates or multiple joins), applications can query a materialized view and retrieve a precomputed result set. For example, consider the scenario where a set of queries is used to populate dashboards, such as Amazon Quick. This use case is ideal for a materialized view, because the queries are predictable and run repeatedly. 

You can define a materialized view in terms of other materialized views. Use *materialized views on materialized views* to expand their capability. In this approach, an existing materialized view plays the same role as a base table for the query that retrieves data. 

This approach is especially useful for reusing precomputed joins for different aggregate or GROUP BY options. For example, take a materialized view that joins customer information (containing millions of rows) with item order detail information (containing billions of rows). This is an expensive query to compute on demand repeatedly. You can use different GROUP BY options for the materialized views created on top of this materialized view and join with other tables. Doing this saves compute time otherwise used to run the expensive underlying join every time. The [STV_MV_DEPS](r_STV_MV_DEPS.md) table shows the dependencies of a materialized view on other materialized views. 
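As an illustrative sketch of this pattern, the following creates a second materialized view on top of a first. The table, view, and column names here (`customer`, `order_detail`, and so on) are hypothetical, not from the examples in this guide:

```
-- The expensive join is computed once, in a base materialized view.
CREATE MATERIALIZED VIEW cust_order_mv AS
SELECT c.cust_id, c.region, o.item_id, o.quantity
FROM customer c
JOIN order_detail o ON c.cust_id = o.cust_id;

-- A second materialized view reuses the precomputed join
-- with its own GROUP BY option.
CREATE MATERIALIZED VIEW cust_order_by_region_mv AS
SELECT region, SUM(quantity) AS total_quantity
FROM cust_order_mv
GROUP BY region;
```

Additional materialized views with different GROUP BY columns can be defined on `cust_order_mv` in the same way, each avoiding a rerun of the underlying join.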

When you create a materialized view, Amazon Redshift runs the user-specified SQL statement to gather the data from the base table or tables and stores the result set. The following illustration provides an overview of the materialized view `tickets_mv` that an SQL query defines by using two base tables, `events` and `sales`.

![\[A materialized view defined using data from two base tables.\]](http://docs.aws.amazon.com/redshift/latest/dg/images/materialized-view.png)
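For reference, a definition of `tickets_mv` consistent with the example query and EXPLAIN output shown later in this section might look like the following sketch, which aggregates ticket sales from the TICKIT sample tables:

```
CREATE MATERIALIZED VIEW tickets_mv AS
SELECT catgroup, SUM(qtysold) AS sold
FROM category c, event e, sales s
WHERE c.catid = e.catid
  AND e.eventid = s.eventid
GROUP BY catgroup;
```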


You can then use these materialized views in queries to speed them up. In addition, Amazon Redshift can automatically rewrite these queries to use materialized views, even when the query doesn't explicitly reference a materialized view. Automatic rewrite of queries is especially powerful in enhancing performance when you can't change your queries to use materialized views.

To update the data in the materialized view, you can use the REFRESH MATERIALIZED VIEW statement at any time to manually refresh materialized views. Amazon Redshift identifies changes that have taken place in the base table or tables, and then applies those changes to the materialized view. Because automatic rewriting of queries requires materialized views to be up to date, as a materialized view owner, make sure to refresh materialized views whenever a base table changes. 
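For example, to manually refresh the `tickets_mv` materialized view shown in the preceding illustration:

```
REFRESH MATERIALIZED VIEW tickets_mv;
```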

Amazon Redshift provides a few ways to keep materialized views up to date for automatic rewriting. You can configure materialized views with the automatic refresh option to refresh materialized views when base tables of materialized views are updated. This autorefresh operation runs at a time when cluster resources are available to minimize disruptions to other workloads. Because the scheduling of autorefresh is workload-dependent, you might want more control over when Amazon Redshift refreshes your materialized views. In that case, you can schedule a materialized view refresh job by using the Amazon Redshift scheduler API and console integration. For more information about query scheduling, see [Scheduling a query on the Amazon Redshift console](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-schedule-query.html). 

Doing this is especially useful when there is a service level agreement (SLA) requirement for up-to-date data from a materialized view. You can also manually refresh any materialized views that you can autorefresh. For information on how to create materialized views, see [CREATE MATERIALIZED VIEW](materialized-view-create-sql-command.md).

You can issue SELECT statements to query a materialized view. For information on how to query materialized views, see [Materialized view queries](materialized-view-query.md). The result set eventually becomes stale when data is inserted, updated, and deleted in the base tables. You can refresh the materialized view at any time to update it with the latest changes from the base tables. For information on how to refresh materialized views, see [REFRESH MATERIALIZED VIEW](materialized-view-refresh-sql-command.md).

For details about SQL commands used to create and manage materialized views, see the following command topics:
+ [CREATE MATERIALIZED VIEW](materialized-view-create-sql-command.md)
+ [ALTER MATERIALIZED VIEW](r_ALTER_MATERIALIZED_VIEW.md)
+ [REFRESH MATERIALIZED VIEW](materialized-view-refresh-sql-command.md)
+ [DROP MATERIALIZED VIEW](materialized-view-drop-sql-command.md)

For information about system tables and views to monitor materialized views, see the following topics: 
+ [STV_MV_INFO](r_STV_MV_INFO.md)
+ [STL_MV_STATE](r_STL_MV_STATE.md)
+ [SVL_MV_REFRESH_STATUS](r_SVL_MV_REFRESH_STATUS.md)
+ [STV_MV_DEPS](r_STV_MV_DEPS.md)

**Topics**
+ [Materialized view queries](materialized-view-query.md)
+ [Automatic query rewriting to use materialized views](materialized-view-auto-rewrite.md)
+ [Materialized views on external data lake tables in Amazon Redshift Spectrum](materialized-view-external-table.md)
+ [Refreshing a materialized view](materialized-view-refresh.md)
+ [Automated materialized views](materialized-view-auto-mv.md)
+ [Using a user-defined function (UDF) in a materialized view](materialized-view-UDFs.md)
+ [Streaming ingestion to a materialized view](materialized-view-streaming-ingestion.md)

# Materialized view queries
<a name="materialized-view-query"></a>

You can use a materialized view in any SQL query by referencing the materialized view name as the data source, like a table or standard view.

When a query accesses a materialized view, it sees only the data that is stored in the materialized view as of its most recent refresh. Thus, the query might not see all the latest changes from corresponding base tables of the materialized view.

If other users want to query the materialized view, the owner of the materialized view grants the SELECT permission to those users. The other users don't need to have the SELECT permission on the underlying base tables. The owner of the materialized view can also revoke the SELECT permission from other users to prevent them from querying the materialized view. Note that the other users still need the USAGE permission on the schemas that contain the base tables of the materialized view.
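For example, the owner can grant and later revoke access to a materialized view without granting permissions on the underlying base tables. The user name and schema here are hypothetical:

```
-- Allow report_user to query only the materialized view.
GRANT SELECT ON tickets_mv TO report_user;

-- report_user still needs USAGE on the schemas that contain the base tables.
GRANT USAGE ON SCHEMA public TO report_user;

-- Later, prevent report_user from querying the materialized view.
REVOKE SELECT ON tickets_mv FROM report_user;
```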

If the owner of the materialized view no longer has the local SELECT permission on the underlying base tables:
+ The owner can no longer query the materialized view. 
+ Other users who have the SELECT permission on the materialized view can no longer query the materialized view.

This is limited to local permissions. Changes in permissions managed by Lake Formation are not verified on querying the materialized view. This means that if an underlying base table is managed by Lake Formation and select permissions on the table are revoked in Lake Formation, you can still query the materialized view. 

The following example queries the `tickets_mv` materialized view. For more information on the SQL command used to create a materialized view, see [CREATE MATERIALIZED VIEW](materialized-view-create-sql-command.md).

```
SELECT sold
FROM tickets_mv
WHERE catgroup = 'Concerts';
```

Because the query results are precomputed, there's no need to access the underlying tables (`category`, `event`, and `sales`). Amazon Redshift can return the results directly from `tickets_mv`.

# Automatic query rewriting to use materialized views
<a name="materialized-view-auto-rewrite"></a>

You can use automatic query rewriting of materialized views in Amazon Redshift to have Amazon Redshift rewrite queries to use materialized views. Doing this accelerates query workloads even for queries that don't explicitly reference a materialized view. When Amazon Redshift rewrites queries, it only uses materialized views that are up to date.

## Usage notes
<a name="mv_auto-rewrite_usage"></a>

To check if automatic rewriting of queries is used for a query, you can inspect the query plan or STL_EXPLAIN. The following shows a SELECT statement and the EXPLAIN output of the original query plan.

```
SELECT catgroup, SUM(qtysold) AS sold
FROM category c, event e, sales s
WHERE c.catid = e.catid AND e.eventid = s.eventid
GROUP BY 1;

EXPLAIN 
 XN HashAggregate  (cost=920021.24..920021.24 rows=1 width=35)
   ->  XN Hash Join DS_BCAST_INNER  (cost=440004.53..920021.22 rows=4 width=35)
         Hash Cond: ("outer".eventid = "inner".eventid)
         ->  XN Seq Scan on sales s  (cost=0.00..7.40 rows=740 width=6)
         ->  XN Hash  (cost=440004.52..440004.52 rows=1 width=37)
               ->  XN Hash Join DS_BCAST_INNER  (cost=0.01..440004.52 rows=1 width=37)
                     Hash Cond: ("outer".catid = "inner".catid)
                     ->  XN Seq Scan on event e  (cost=0.00..2.00 rows=200 width=6)
                     ->  XN Hash  (cost=0.01..0.01 rows=1 width=35)
                           ->  XN Seq Scan on category c  (cost=0.00..0.01 rows=1 width=35)
```

The following shows the EXPLAIN output after a successful automatic rewriting. This output includes a scan on the materialized view in the query plan that replaces parts of the original query plan. 

```
* EXPLAIN 
     XN HashAggregate  (cost=11.85..12.35 rows=200 width=41)
       ->  XN Seq Scan on mv_tbl__tickets_mv__0 derived_table1  (cost=0.00..7.90 rows=790 width=41)
```

Only up-to-date (fresh) materialized views are considered for automatic rewriting of queries, irrespective of the refresh strategy, such as auto, scheduled, or manual. Hence, the original query returns up-to-date results. When a materialized view is explicitly referenced in queries, Amazon Redshift accesses currently stored data in the materialized view. This data might not reflect the latest changes from the base tables of the materialized view.

You can use automatic query rewriting of materialized views that are created on cluster version 1.0.20949 or later.

You can stop automatic query rewriting at the session level by setting mv_enable_aqmv_for_session to FALSE.
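For example, the following statement turns off automatic query rewriting for the current session:

```
SET mv_enable_aqmv_for_session TO FALSE;
```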

## How automatic query rewriting of materialized views works
<a name="mv_auto-rewrite_behavior"></a>

Based on internal optimization, Amazon Redshift may decide to invoke automatic query rewriting of materialized views transparently, to provide optimal query execution with the lowest possible query times.

For example, suppose User A creates a materialized view M1 on table T1 with the query `SELECT * FROM T1`. User A has full SELECT privileges on T1. User A grants access to M1 to all users, including User B. However, when User B tries to query T1 directly, they are denied access. This is because User B does not have SELECT privileges on T1.

Sometime later, User B again tries to query T1, and this time the query returns results. This is because automatic query rewriting rewrote User B's query on table T1 (`SELECT <cols> FROM T1`) to a query on materialized view M1 (`SELECT <cols> FROM M1`). 

## Limitations
<a name="mv_auto-rewrite_limitations"></a>

Following are limitations for using automatic query rewriting of materialized views:
+ Automatic query rewriting works with materialized views that don't reference or include any of the following:
  + Subqueries
  + Left, right, or full outer joins
  + Set operations 
  + Any aggregate functions, except SUM, COUNT, MIN, MAX, and AVG. (These are the only aggregate functions that work with automatic query rewriting.)
  + Any aggregate functions with DISTINCT
  + Any window functions
  + SELECT DISTINCT or HAVING clauses
  + External or shared tables
  + Other materialized views
+ Automatic query rewriting rewrites SELECT queries that refer to user-defined Amazon Redshift tables. Amazon Redshift doesn't rewrite the following queries:
  + CREATE TABLE AS statements
  + SELECT INTO statements
  + Queries on catalogs or system tables
  + Queries with outer joins or a SELECT DISTINCT clause
+ If a query isn't automatically rewritten, check whether you have the SELECT permission on the specified materialized view and the [mv_enable_aqmv_for_session](r_mv_enable_aqmv_for_session.md) option is set to TRUE. 

  You can also check if your materialized views are eligible for automatic rewriting of queries by inspecting STV_MV_INFO. For more information, see [STV_MV_INFO](r_STV_MV_INFO.md).
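For example, a query like the following (assuming the `name`, `state`, and `autorewrite` columns described in the STV_MV_INFO topic) lists each materialized view and whether it's eligible for automatic rewriting:

```
SELECT name, state, autorewrite
FROM stv_mv_info;
```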

# Materialized views on external data lake tables in Amazon Redshift Spectrum
<a name="materialized-view-external-table"></a>

Materialized views can provide incremental maintenance on external data lake tables. With incremental maintenance, Amazon Redshift updates the data in the materialized view with only the changes to the data in the base tables since the last refresh. Incremental maintenance is more cost effective than fully recomputing the materialized view after every data change on the base table.

When you use materialized views on at least one external table, materialized view creation is incremental on:
+ Standard data lake tables, partitioned and unpartitioned, with data files in any supported format (Parquet, Avro, CSV, etc.).
+ Apache Iceberg tables, partitioned and unpartitioned, with copy-on-write and merge-on-read.
+ Amazon Redshift Spectrum tables joined with any Amazon Redshift table in the same database.

Materialized view refresh is incremental on:
+ Standard data lake tables after S3 DELETE or PUT overwrite (deletion of data files), if the materialized view doesn't perform aggregation.
+ Apache Iceberg tables after INSERT, DELETE, UPDATE, or table compaction.

For more information about Amazon Redshift Spectrum, see [Amazon Redshift Spectrum](c-using-spectrum.md).

## Limitations
<a name="materialized-view-external-table-limitations"></a>

General limitations on materialized views still apply for materialized views on data lake tables. For more information, see [Refreshing a materialized view](materialized-view-refresh.md). In addition, consider the following limitations when you use materialized views on external data lake tables.
+ Materialized view creation is non-incremental on:
  + Hudi or Delta Lake tables.
  + Spectrum nested data access.
  + References to VARBYTE columns.
+ Materialized view refresh falls back to full recomputation on:
  + Apache Iceberg tables when a required snapshot is expired, if the materialized view performs aggregation.
  + Standard data lake tables after deletion or update of data files on Amazon S3, if the materialized view performs aggregation.
  + Standard data lake tables refreshed more than once within a transaction block.
  + Standard data lake tables governed by a manifest. For more information about manifests, see [Using a manifest to specify data files](https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html#copy-command-examples-manifest).
  + Any case where Amazon Redshift expects a full recompute to be more performant, in particular for materialized views that contain joins when more than one base table has been updated since the last refresh.
+ On Apache Iceberg tables, materialized view refresh can handle only up to 4 million positions deleted in a single data file. Once this limit is reached, the Apache Iceberg base table must be compacted to continue refreshing the materialized view.
+ On Apache Iceberg tables, concurrency scaling is not supported for materialized view creation and refresh.
+ Autonomics features are not supported. These include [automated materialized views](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-auto-mv.html) and [automatic query rewrite](https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-auto-rewrite.html). 
+ When an incremental materialized view is refreshed, IAM permissions apply only to the accessed portions of the Amazon Redshift base tables. 
+ Changes in permissions managed by Lake Formation are not verified on querying a materialized view. This means that if a materialized view is defined on a data lake table and select privileges are removed from the table with Lake Formation, then you can still query the materialized view.

# Refreshing a materialized view
<a name="materialized-view-refresh"></a>

This topic describes how to refresh the data in a materialized view from the underlying tables.

When you create a materialized view, its contents reflect the state of the underlying database relations (tables or other materialized views) at that time. The data in the materialized view remains unchanged, even when applications change the data in the underlying tables. To update the data in the materialized view, you can use the `REFRESH MATERIALIZED VIEW` statement at any time to manually refresh materialized views. When you use this statement, Amazon Redshift identifies changes that have taken place in the base relations and applies those changes to the materialized view.

Amazon Redshift has two strategies for refreshing a materialized view: 
+ In many cases, Amazon Redshift can perform an incremental refresh. In an *incremental refresh*, Amazon Redshift quickly identifies the changes to the data in the base relations since the last refresh and updates the data in the materialized view. Incremental refresh is supported on the following SQL constructs used in the query when defining the materialized view:
  + Constructs that contain the clauses SELECT, FROM, [INNER] JOIN, WHERE, GROUP BY, or HAVING.
  + Constructs that contain aggregations, such as SUM, MIN, MAX, AVG, and COUNT.
  + Most built-in SQL functions, specifically those that are immutable, meaning that they always produce the same output for the same input arguments. 

  Incremental refresh is also supported for a materialized view that's based on a datashare table or materialized view.
+ If an incremental refresh isn't possible, then Amazon Redshift performs a full refresh. A *full refresh* reruns the underlying SQL statement, replacing all of the data in the materialized view.
+ Amazon Redshift automatically chooses the refresh method for a materialized view depending on the SELECT query used to define the materialized view. 
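As an illustrative sketch (view names are hypothetical, tables from the TICKIT sample), the first definition below uses only constructs that support incremental refresh, while the second uses an outer join and therefore requires a full refresh:

```
-- Inner join plus a supported aggregate: eligible for incremental refresh.
CREATE MATERIALIZED VIEW daily_sales_mv AS
SELECT s.saledate, SUM(s.qtysold) AS total_sold
FROM sales s
JOIN event e ON s.eventid = e.eventid
GROUP BY s.saledate;

-- LEFT OUTER JOIN: Amazon Redshift refreshes this view with a full refresh.
CREATE MATERIALIZED VIEW all_events_mv AS
SELECT e.eventname, s.qtysold
FROM event e
LEFT JOIN sales s ON e.eventid = s.eventid;
```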

## Nested materialized views
<a name="materialized-view-refresh-nested"></a>

A materialized view can be defined on top of other materialized views. To refresh such a materialized view, you must explicitly use the `CASCADE` keyword at the top-most materialized view refresh. For example, assume the following nested materialized view structure:

```
CREATE TABLE t(a INT);
CREATE MATERIALIZED VIEW u AS SELECT * FROM t;
CREATE MATERIALIZED VIEW v AS SELECT * FROM u;
CREATE MATERIALIZED VIEW w AS SELECT * FROM v;

-- w -> v -> u -> t

INSERT INTO t VALUES (1);
```

To bring w fully up to date you have two choices:
+ (Recommended) Refresh w by using the `REFRESH MATERIALIZED VIEW w CASCADE` command. This command refreshes all materialized views in the dependency chain in a single transaction. 
+ Refresh u, v, and w as separate commands, in dependency order (first u, then v, then w). 

If the `CASCADE` keyword is not explicitly used, the materialized view will be refreshed in `RESTRICT` mode, refreshing only the current materialized view. The following examples show an informational message when you run `REFRESH MATERIALIZED VIEW` on a materialized view that depends on an out-of-date materialized view.

```
REFRESH MATERIALIZED VIEW w;
INFO:  Materialized view w is already up to date.  However, it depends on another materialized view that is not up to date.

REFRESH MATERIALIZED VIEW w CASCADE;
INFO:  Materialized view w was incrementally updated successfully.
```

```
REFRESH MATERIALIZED VIEW v;
INFO: Materialized view v is already up to date. However, it depends on another materialized view that is not up to date.

REFRESH MATERIALIZED VIEW v CASCADE;
INFO: Materialized view v was incrementally updated successfully.
```

In the second example above, with the cascade refresh option (`REFRESH MATERIALIZED VIEW v CASCADE`), materialized view u is refreshed first, materialized view v is refreshed next, and materialized view w is not refreshed.

The following example shows how you can create a full refresh plan for a materialized view programmatically. To refresh the materialized view v, first refresh materialized view u. To refresh materialized view w, first refresh materialized view u and then materialized view v.

```
WITH RECURSIVE recursive_deps (mv_tgt, lvl, mv_dep) AS
( SELECT trim(name) as mv_tgt, 0 as lvl, trim(ref_name) as mv_dep
  FROM stv_mv_deps
  UNION ALL
  SELECT R.mv_tgt, R.lvl+1 as lvl, trim(S.ref_name) as mv_dep
  FROM stv_mv_deps S, recursive_deps R
  WHERE R.mv_dep = S.name
)

SELECT mv_tgt, mv_dep from recursive_deps
ORDER BY mv_tgt, lvl DESC;

 mv_tgt | mv_dep
--------+--------
 v      | u
 w      | u
 w      | v
(3 rows)
```

## Limitations
<a name="materialized-view-refresh-limitations"></a>

Amazon Redshift doesn't support cascading refresh for materialized views based on sources other than:
+ Local tables
+ Local materialized views
+ Streaming materialized views

Amazon Redshift doesn't support incremental refresh for materialized views that are defined with a query using the following SQL elements:
+ OUTER JOIN (RIGHT, LEFT, or FULL).
+ The set operations UNION, INTERSECT, EXCEPT, and MINUS.
+ The aggregate functions MEDIAN, PERCENTILE_CONT, LISTAGG, STDDEV_SAMP, STDDEV_POP, APPROXIMATE COUNT, APPROXIMATE PERCENTILE, and bitwise aggregate functions.
**Note**  
The COUNT, SUM, and AVG aggregate functions are supported.
+ DISTINCT aggregate functions, such as DISTINCT COUNT, DISTINCT SUM, and so on.
+ Window functions.
+ A query that uses temporary tables for query optimization, such as optimizing common subexpressions.
+ Subqueries.
+ External tables referencing the following formats in the query that defines the materialized view. 
  +  Delta Lake 
  +  Hudi 

  Incremental refresh is supported for materialized views defined using formats other than those listed above. For more information, see [Materialized views on external data lake tables in Amazon Redshift Spectrum](materialized-view-external-table.md). 

## Autorefreshing a materialized view
<a name="materialized-view-auto-refresh"></a>

Amazon Redshift can automatically refresh materialized views with up-to-date data from their base tables when materialized views are created with, or altered to have, the autorefresh option. Amazon Redshift autorefreshes materialized views as soon as possible after base table changes.

To complete refresh of the most important materialized views with minimal impact to active workloads in your cluster, Amazon Redshift considers multiple factors. These factors include current system load, the resources needed for refresh, available cluster resources, and how often the materialized views are used. 

Amazon Redshift prioritizes your workloads over autorefresh and might stop autorefresh to preserve the performance of user workload. This approach might delay refresh of some materialized views. In some cases, you might need more deterministic refresh behavior for your materialized views. If so, consider using manual refresh as described in [REFRESH MATERIALIZED VIEW](materialized-view-refresh-sql-command.md) or scheduled refresh using the Amazon Redshift scheduler API operations or the console.

You can enable autorefresh for a materialized view by using the AUTO REFRESH clause of CREATE MATERIALIZED VIEW. For more information about creating materialized views, see [CREATE MATERIALIZED VIEW](materialized-view-create-sql-command.md). You can turn on autorefresh for an existing materialized view by using [ALTER MATERIALIZED VIEW](r_ALTER_MATERIALIZED_VIEW.md).
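For example, the following sketch enables autorefresh at creation time and on an existing materialized view. The view name `sales_summary_mv` is illustrative:

```
-- Enable autorefresh when creating the materialized view.
CREATE MATERIALIZED VIEW sales_summary_mv
AUTO REFRESH YES
AS SELECT eventid, SUM(qtysold) AS sold
   FROM sales
   GROUP BY eventid;

-- Turn on autorefresh for an existing materialized view.
ALTER MATERIALIZED VIEW sales_summary_mv AUTO REFRESH YES;
```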

Consider the following when you refresh materialized views:
+ You can still refresh a materialized view explicitly by using the REFRESH MATERIALIZED VIEW command, even if you haven't enabled autorefresh for the materialized view.
+ Autorefresh is supported on materialized views defined on datashare tables or Iceberg tables, but not on a combination of the two.
+ For refresh status, you can check SVL_MV_REFRESH_STATUS, which records both user-initiated and autorefresh queries. 
+ To run REFRESH on recompute-only materialized views, make sure that you have the CREATE permission on schemas. For more information, see [GRANT](r_GRANT.md).
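As a sketch (column names as described in the SVL_MV_REFRESH_STATUS topic), you can inspect recent refresh activity like this:

```
SELECT db_name, schema_name, mv_name, status
FROM svl_mv_refresh_status
ORDER BY starttime DESC
LIMIT 10;
```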

Starting February 27, 2026, Auto REFRESH queries for Amazon Redshift materialized views run as user queries rather than as background autonomic processes. As a result, Auto REFRESH queries now run with the same priority as other user queries. 

This change improves the freshness of materialized views with Auto REFRESH enabled, helping them stay more up to date with the latest changes to their base tables than under the previous behavior. 

**Note**  
This Auto REFRESH behavior change is enabled only for Amazon Redshift provisioned clusters on the CURRENT track of patch release P198 and later. It is currently disabled on Amazon Redshift Serverless. 

# Automated materialized views
<a name="materialized-view-auto-mv"></a>

This topic describes how Amazon Redshift uses automated materialized views to improve performance. Amazon Redshift creates materialized views automatically based on database activity and performance. Amazon Redshift uses automated materialized views by default.

Materialized views are a powerful tool for improving query performance in Amazon Redshift. They do this by storing a precomputed result set. Similar queries don't have to re-run the same logic each time, because they can retrieve records from the existing result set. Developers and analysts create materialized views after analyzing their workloads to determine which queries would benefit, and whether the maintenance cost of each materialized view is worthwhile. As workloads grow or change, these materialized views must be reviewed to ensure they continue to provide tangible performance benefits.

The Automated Materialized Views (AutoMV) feature in Redshift enhances query performance by automatically creating and managing materialized views based on workload monitoring and machine learning algorithms. The following includes key features of AutoMV:
+ *Continuous monitoring* – Redshift continuously monitors the workload using machine learning techniques to identify opportunities for performance improvements through the creation of materialized views.
+ *Automatic creation and deletion* – When the system detects that a materialized view would be beneficial, it automatically creates and maintains it. Conversely, if a previously created AutoMV is no longer providing performance benefits, the system will automatically drop it.
+ *Operation during low activity* – The AutoMV feature operates only during periods of low user activity or workload on the cluster. This ensures that AutoMV operations do not interfere with or impact customer workloads.
+ *CPU usage spikes* – During times of no workload activity, the creation or refresh of materialized views by AutoMV may lead to spikes in CPU usage. This is a normal behavior as the system utilizes available resources to create and refresh materialized views.
+ *User workload priority* – If you initiate a workload while an AutoMV operation is in progress, the AutoMV task will stop to release resources for the user workload. This ensures that your workloads take priority over the AutoMV operations.

While the AutoMV feature may lead to CPU usage spikes during periods of no user activity, it operates transparently and without impacting your workloads. The system manages materialized views to improve query performance and simultaneously prioritizes user workloads over AutoMV operations.

AutoMV behavior and capabilities are the same as those of user-created materialized views. They are refreshed automatically and incrementally, using the same criteria and restrictions. Just like materialized views created by users, [Automatic query rewriting to use materialized views](materialized-view-auto-rewrite.md) identifies queries that can benefit from system-created AutoMVs. It automatically rewrites those queries to use the AutoMVs, improving query performance. Developers don't need to revise queries to take advantage of AutoMV.

**Note**  
Automated materialized views are refreshed intermittently. Queries rewritten to use AutoMV always return the latest results. When Redshift detects that data isn't up to date, queries aren't rewritten to read from automated materialized views. Instead, queries select the latest data from base tables.

Any workload with queries that are used repeatedly can benefit from AutoMV. Common use cases include:
+ *Dashboards* - Dashboards are widely used to provide quick views of key business indicators (KPIs), events, trends, and other metrics. They often have a common layout with charts and tables, but show different views for filtering, or for dimension-selection operations, like drill down. Dashboards often have a common set of queries used repeatedly with different parameters. Dashboard queries can benefit greatly from automated materialized views.
+  *Reports* - Reporting queries may be scheduled at various frequencies, based on business requirements and the type of report. Additionally, they can be automated or on-demand. A common characteristic of reporting queries is that they can be long running and resource-intensive. With AutoMV, these queries don't need to be recomputed each time they run, which reduces runtime for each query and resource utilization in Redshift. 

To turn off automated materialized views, you set the `auto_mv` parameter to `false` in your cluster's parameter group. For more information, see [Amazon Redshift parameter groups](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-parameter-groups.html) in the Amazon Redshift Cluster Management Guide.
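Because `auto_mv` is a cluster parameter rather than a session setting, the change is applied through the parameter group. As a sketch (the parameter group name is hypothetical), the AWS CLI equivalent might look like:

```
aws redshift modify-cluster-parameter-group \
    --parameter-group-name my-parameter-group \
    --parameters ParameterName=auto_mv,ParameterValue=false
```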

## SQL scope and considerations for automated materialized views
<a name="materialized-view-auto-mv-important"></a>
+ An automated materialized view can be initiated and created by a query or subquery, provided it contains a `GROUP BY` clause or one of the following aggregate functions: SUM, COUNT, MIN, MAX, or AVG. However, it can't contain any of the following:
  + Left, right, or full outer joins
  + Aggregate functions other than SUM, COUNT, MIN, MAX, and AVG. (These particular functions work with automatic query rewriting.)
  + Any aggregate function that includes DISTINCT
  + Any window functions
  + SELECT DISTINCT or HAVING clauses
  + Other materialized views

  It isn't guaranteed that a query that meets the criteria will initiate the creation of an automated materialized view. The system chooses which candidates to materialize based on each view's expected benefit to the workload and the resource cost of maintaining it, including the cost of refreshes. Each resulting materialized view is usable by automatic query rewriting.
+ Even though AutoMV might be initiated by a subquery or individual legs of set operators, the resulting materialized view won't contain subqueries or set operators.
+ To determine if AutoMV was used for queries, view the EXPLAIN plan and look for `%_auto_mv_%` in the output. For more information, see [EXPLAIN](https://docs.aws.amazon.com/redshift/latest/dg/r_EXPLAIN.html).
+ Automated materialized views aren't supported on external tables, such as datashares and federated tables.
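As the EXPLAIN tip above notes, confirming AutoMV usage amounts to scanning the plan text for the `_auto_mv_` marker. The following is a minimal sketch in Python; the helper name and the sample plan fragments are hypothetical, but the substring checked is the one the documentation describes:

```python
def plan_uses_automv(explain_output: str) -> bool:
    """Return True if any line of an EXPLAIN plan references an AutoMV."""
    return any("_auto_mv_" in line for line in explain_output.splitlines())

# Hypothetical plan fragments for illustration only
automv_plan = "XN Seq Scan on mv_tbl__auto_mv_2500__0 derived_table1"
base_plan = "XN Seq Scan on base_table"
print(plan_uses_automv(automv_plan), plan_uses_automv(base_plan))  # True False
```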

## Automated materialized views limitations
<a name="materialized-view-auto-mv-limitations"></a>

Following are limitations for working with automated materialized views:
+ *Maximum number of AutoMVs* - The limit of automated materialized views is 200 per database in the cluster.
+ *Storage space and capacity* - An important characteristic of AutoMV is that maintenance runs in spare background cycles so that user workloads aren't impacted. If the cluster is busy or running low on storage space, AutoMV ceases its activity. Specifically, at 80% of total cluster capacity, no new automated materialized views are created. At 90% of total capacity, existing ones may be dropped so that user workloads can continue without performance degradation. For more information about determining cluster capacity, see [STV_NODE_STORAGE_CAPACITY](r_STV_NODE_STORAGE_CAPACITY.md).
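The capacity thresholds above can be summarized as a small decision sketch. This is illustrative only; the function name is hypothetical, and the 80% and 90% values are the documented thresholds:

```python
def automv_state(used_capacity_pct: float) -> str:
    """Illustrate the documented AutoMV capacity thresholds."""
    if used_capacity_pct >= 90:
        return "existing AutoMVs may be dropped"
    if used_capacity_pct >= 80:
        return "no new AutoMVs are created"
    return "new AutoMVs may be created"

print(automv_state(50))   # new AutoMVs may be created
print(automv_state(85))   # no new AutoMVs are created
print(automv_state(95))   # existing AutoMVs may be dropped
```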

## Billing for automated materialized views
<a name="materialized-view-auto-mv-billing"></a>

 Amazon Redshift's automatic optimization capability creates and refreshes automated materialized views. There is no charge for compute resources for this process. Storage of automated materialized views is charged at the regular rate for storage. For more information, see [Amazon Redshift pricing](https://aws.amazon.com/redshift/pricing/).

## Additional resources
<a name="materialized-view-auto-mv-resources"></a>

 The following blog post provides further explanation regarding automated materialized views. It details how they’re created, maintained, and dropped. It also explains the underlying algorithms that drive these decisions: [Optimize your Amazon Redshift query performance with automated materialized views](https://aws.amazon.com/blogs/big-data/optimize-your-amazon-redshift-query-performance-with-automated-materialized-views/).

 This video begins with an explanation of materialized views and shows how they improve performance and conserve resources. It then provides an in-depth explanation of automated materialized views with a process-flow animation and a live demonstration. 

[![AWS Videos](http://img.youtube.com/vi/-85GSBQOBTA/0.jpg)](http://www.youtube.com/watch?v=-85GSBQOBTA)


# Using a user-defined function (UDF) in a materialized view
<a name="materialized-view-UDFs"></a>

You can use a scalar UDF in an Amazon Redshift materialized view. You can define these functions in either Python or SQL and reference them in the materialized view definition.

## Referencing a UDF in a materialized view
<a name="materialized-view-UDFs-examples"></a>

The following procedure shows how to use UDFs that perform simple arithmetic comparisons in a materialized-view definition.

1. Create a table to use in the materialized-view definition.

   ```
   CREATE TABLE base_table (a int, b int);
   ```

1. Create a scalar user-defined function in Python that returns a Boolean value indicating whether an integer is larger than a comparison integer.

   ```
   CREATE OR REPLACE FUNCTION udf_python_bool(x1 int, x2 int) RETURNS bool IMMUTABLE
   AS $$
     return x1 > x2
   $$ LANGUAGE plpythonu;
   ```

   Optionally, create a functionally similar UDF with SQL, which you can use to compare results with the first. 

   ```
   CREATE OR REPLACE FUNCTION udf_sql_bool(int, int) RETURNS bool IMMUTABLE
   AS $$
     select $1 > $2;
   $$ LANGUAGE SQL;
   ```

1. Create a materialized view that selects from the table you created and references the UDF. 

   ```
   CREATE MATERIALIZED VIEW mv_python_udf AS SELECT udf_python_bool(a, b) AS a FROM base_table;
   ```

   Optionally, you can create a materialized view that references the SQL UDF.

   ```
   CREATE MATERIALIZED VIEW mv_sql_udf AS SELECT udf_sql_bool(a, b) AS a FROM base_table;
   ```

1. Add data to the table and refresh the materialized view.

   ```
   INSERT INTO base_table VALUES (1,2), (1,3), (4,2);
   ```

   ```
   REFRESH MATERIALIZED VIEW mv_python_udf;
   ```

   Optionally, you can refresh the materialized view that references the SQL UDF.

   ```
   REFRESH MATERIALIZED VIEW mv_sql_udf;
   ```

1. Query data from your materialized view.

   ```
   SELECT * FROM mv_python_udf ORDER BY a;
   ```

   The results of the query are the following:

   ```
   a
   -----
   false
   false
   true
   ```

   This returns `true` for the last set of values because the value for column `a` (4) is greater than the value for column `b` (2).

1. Optionally, you can query the materialized view that references the SQL UDF. The results for the SQL function match the results from the Python version.

   ```
   SELECT * FROM mv_sql_udf ORDER BY a;
   ```

   The results of the query are the following:

   ```
   a
   -----
   false
   false
   true
   ```

   As before, this returns `true` only for the last pair of values.

1. Use DROP statements with CASCADE to drop each user-defined function along with the materialized view that references it. 

   ```
   DROP FUNCTION udf_python_bool(int, int) CASCADE;
   ```

   ```
   DROP FUNCTION udf_sql_bool(int, int) CASCADE;
   ```
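As a quick sanity check outside Redshift, you can replay the walkthrough's comparison logic in plain Python. This sketch mirrors the body of `udf_python_bool` against the rows inserted into `base_table` (the helper name is hypothetical):

```python
def udf_bool(x1: int, x2: int) -> bool:
    # Same comparison as the body of udf_python_bool
    return x1 > x2

rows = [(1, 2), (1, 3), (4, 2)]                    # values inserted into base_table
results = sorted(udf_bool(a, b) for a, b in rows)  # mirrors ORDER BY a
print(results)  # [False, False, True]
```

The output matches the query results shown in the walkthrough: only the (4, 2) pair yields `true`.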

# Streaming ingestion to a materialized view
<a name="materialized-view-streaming-ingestion"></a>

This topic describes how to use materialized views for fast access to streaming data.

 Streaming ingestion provides low-latency, high-speed data ingestion from [Amazon Kinesis Data Streams](https://aws.amazon.com//kinesis/data-streams/) or [Amazon Managed Streaming for Apache Kafka](https://aws.amazon.com//msk/) to an Amazon Redshift provisioned or Amazon Redshift Serverless database. The data lands in a Redshift materialized view that's configured for the purpose. This results in fast access to external data. Streaming ingestion lowers data-access time and reduces storage cost. You can configure it for your Amazon Redshift cluster or for your Amazon Redshift Serverless workgroup, using a small collection of SQL commands. After it's set up, each materialized-view refresh can ingest hundreds of megabytes of data per second. 

## How data flows from a streaming service to Redshift
<a name="materialized-view-streaming-ingestion-data-flow"></a>

 It helps to understand how streaming ingestion works and the database objects used in the process. Data flows directly from a data-stream provider to an Amazon Redshift provisioned cluster or to an Amazon Redshift Serverless workgroup. There isn't a temporary landing area, such as an Amazon S3 bucket. The provisioned cluster or workgroup is the stream consumer. In the Redshift database, the data read from the stream lands in a materialized view. The data is processed as it arrives. For instance, JSON values can be consumed and mapped to a materialized view's data columns, using SQL. When the materialized view is refreshed, Redshift consumes data from allocated Kinesis data shards or Kafka partitions until the view is brought up to date with the stream. 

 Use cases for Amazon Redshift streaming ingestion involve data that's generated continually and must be processed within a short period, or *latency*, from its origination. This is commonly called *near real-time* analytics. Sources can include IoT devices, system-telemetry devices, and click-stream data from a busy website or application.

## Data parsing best practices for improving performance
<a name="materialized-view-streaming-recommendations"></a>

When you configure streaming ingestion, you have options for how to parse the incoming data, such as applying business logic or formatting as data arrives. We recommend the following best practices to avoid errors or data loss. They derive from internal testing and from helping customers troubleshoot configuration and parsing issues.
+ **Extracting values from streamed data** – If you use the [JSON_EXTRACT_PATH_TEXT](https://docs.aws.amazon.com/redshift/latest/dg/JSON_EXTRACT_PATH_TEXT.html) function in your materialized view definition to parse or *shred* streamed JSON, it can significantly impact performance and latency. To explain, for each column extracted using JSON_EXTRACT_PATH_TEXT, the incoming JSON is re-parsed. After this, data-type conversion, filtering, and business-logic calculations occur. This means, for example, that if you extract 10 columns from JSON data, each JSON record is parsed 10 times, which includes additional logic. This results in higher ingestion latency. An alternative approach we recommend is to use the [JSON_PARSE function](https://docs.aws.amazon.com/redshift/latest/dg/JSON_PARSE.html) to convert JSON records to Redshift's SUPER data type. After the streamed data lands in the materialized view, use PartiQL to extract individual strings from the SUPER representation of the JSON data. For more information, see [Querying semi-structured data](https://docs.aws.amazon.com/redshift/latest/dg/query-super.html).

   Additionally, note that JSON_EXTRACT_PATH_TEXT has a 64KB data-size maximum. Thus, if any JSON record is larger than 64KB, processing it with JSON_EXTRACT_PATH_TEXT results in an error. 
+  **Mapping an Amazon Kinesis Data Streams stream or Amazon MSK topic to multiple materialized views** – We don't recommend creating multiple materialized views to ingest data from a single stream or topic. This is because each materialized view creates a consumer for each shard in the Kinesis Data Streams stream or partition in the Kafka topic. This can result in throttling or exceeding the throughput of the stream or topic. It also can result in higher cost, because you ingest the same data multiple times. When you configure streaming ingestion, we recommend you create one materialized view for each stream or topic. 

  If your use case requires that you ingest data from one KDS stream or MSK topic into multiple materialized views, consult the [AWS Big Data blog](https://aws.amazon.com/blogs/big-data/), specifically [Best practices to implement near-real-time analytics using Amazon Redshift Streaming Ingestion with Amazon MSK](https://aws.amazon.com/blogs/big-data/best-practices-to-implement-near-real-time-analytics-using-amazon-redshift-streaming-ingestion-with-amazon-msk/), before you do so.
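The parse-once recommendation above is easy to illustrate outside Redshift. In this rough Python analogue (not Redshift SQL), calling `json.loads` once per extracted field stands in for repeated `JSON_EXTRACT_PATH_TEXT` calls, while parsing once and then navigating the result stands in for `JSON_PARSE` into SUPER; the sample record is hypothetical:

```python
import json

record = '{"user": "u1", "page": "/home", "ms": 42}'
fields = ("user", "page", "ms")

# Analogue of JSON_EXTRACT_PATH_TEXT per column: the record is re-parsed per field.
per_column = [json.loads(record)[f] for f in fields]  # three parses

# Analogue of JSON_PARSE once, then navigating the parsed result.
doc = json.loads(record)                              # one parse
parsed_once = [doc[f] for f in fields]

print(per_column == parsed_once)  # True — same values, one parse instead of three
```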

## Streaming ingestion behavior and data types
<a name="materialized-view-streaming-ingestion-limitations"></a>

The following table describes technical behavior details and size limits for various data types. We recommend being familiar with these prior to configuring a materialized view for streaming ingestion.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html)

# Getting started with streaming ingestion from Amazon Kinesis Data Streams
<a name="materialized-view-streaming-ingestion-getting-started"></a>

This topic describes how to consume streaming data from Kinesis Data Streams using a materialized view.

 Setting up Amazon Redshift streaming ingestion involves creating an external schema that maps to the streaming data source and creating a materialized view that references the external schema. Amazon Redshift streaming ingestion supports Kinesis Data Streams as a source. As such, you must have a Kinesis Data Streams source available before configuring streaming ingestion. If you don't have a source, follow the instructions in the Kinesis documentation at [Getting Started with Amazon Kinesis Data Streams](https://docs.aws.amazon.com/streams/latest/dev/getting-started.html) or create one on the console using the instructions at [Creating a Stream via the AWS Management Console](https://docs.aws.amazon.com/streams/latest/dev/how-do-i-create-a-stream.html).

 Amazon Redshift streaming ingestion uses a materialized view, which is updated directly from the stream when `REFRESH` is run. The materialized view maps to the stream data source. You can perform filtering and aggregations on the stream data as part of the materialized-view definition. Your streaming ingestion materialized view (the *base* materialized view) can reference only one stream, but you can create additional materialized views that join with the base materialized view and with other materialized views or tables. 

**Note**  
*Streaming ingestion and Amazon Redshift Serverless* - The configuration steps in this topic apply both to provisioned Amazon Redshift clusters and to Amazon Redshift Serverless. For more information, see [Streaming ingestion behavior and data types](materialized-view-streaming-ingestion.md#materialized-view-streaming-ingestion-limitations).

Assuming you have a Kinesis Data Streams stream available, the first step is to define a schema in Amazon Redshift with `CREATE EXTERNAL SCHEMA` and to reference a Kinesis Data Streams resource. Following that, to access data in the stream, define the `STREAM` in a materialized view. You can store stream records in the semi-structured `SUPER` format, or define a schema that results in data converted to Redshift data types. When you query the materialized view, the returned records are a point-in-time view of the stream. 

1. Create an IAM role with a trust policy that allows your Amazon Redshift cluster or Amazon Redshift Serverless workgroup to assume the role. For information about how to configure the trust policy for the IAM role, see [Authorizing Amazon Redshift to access other AWS services on your behalf](https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html). After it is created, the role should have the following IAM policy, which provides permission for communication with the Amazon Kinesis data stream. 

   **IAM policy for an unencrypted stream from Kinesis Data Streams**

------
#### [ JSON ]


   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Sid": "ReadStream",
               "Effect": "Allow",
               "Action": [
                   "kinesis:DescribeStreamSummary",
                   "kinesis:GetShardIterator",
                   "kinesis:GetRecords",
                   "kinesis:ListShards",
                   "kinesis:DescribeStream"
               ],
               "Resource": "arn:aws:kinesis:*:111122223333:stream/*"
           },
           {
               "Sid": "ListStream",
               "Effect": "Allow",
               "Action": "kinesis:ListStreams",
               "Resource": "*"
           }
       ]
   }
   ```

------

   **IAM policy for an encrypted stream from Kinesis Data Streams**

------
#### [ JSON ]


   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Sid": "ReadStream",
               "Effect": "Allow",
               "Action": [
                   "kinesis:DescribeStreamSummary",
                   "kinesis:GetShardIterator",
                   "kinesis:GetRecords",
                   "kinesis:ListShards",
                   "kinesis:DescribeStream"
               ],
               "Resource": "arn:aws:kinesis:*:111122223333:stream/*"
           },
           {
               "Sid": "DecryptStream",
               "Effect": "Allow",
               "Action": [
                   "kms:Decrypt"
               ],
               "Resource": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
           },
           {
               "Sid": "ListStream",
               "Effect": "Allow",
               "Action": "kinesis:ListStreams",
               "Resource": "*"
           }
       ]
   }
   ```

------

1. Check your VPC and verify that your Amazon Redshift cluster or Amazon Redshift Serverless workgroup has a route to the Kinesis Data Streams endpoints over the internet using a NAT gateway or internet gateway. If you want traffic between Redshift and Kinesis Data Streams to remain within the AWS network, consider using a Kinesis interface VPC endpoint. For more information, see [Using Amazon Kinesis Data Streams with Interface VPC Endpoints](https://docs.aws.amazon.com/streams/latest/dev/vpc.html).

1. In Amazon Redshift, create an external schema to map the data from Kinesis to a schema.

   ```
   CREATE EXTERNAL SCHEMA kds
   FROM KINESIS
   IAM_ROLE { default | 'iam-role-arn' };
   ```

    Streaming ingestion for Kinesis Data Streams doesn't require an authentication type. It uses the IAM role defined in the `CREATE EXTERNAL SCHEMA` statement for making Kinesis Data Streams requests. 

    Optional: Use the REGION keyword to specify the region where the Amazon Kinesis Data Streams or Amazon MSK stream resides. 

   ```
   CREATE EXTERNAL SCHEMA kds
   FROM KINESIS
   REGION 'us-west-2'
   IAM_ROLE { default | 'iam-role-arn' };
   ```

   In this sample, the REGION clause specifies the location of the source stream, and the IAM_ROLE value is a placeholder.

1. Create a materialized view to consume the stream data. With a statement like the following, a record that can't be parsed causes an error. Use this form if you don't want error records to be skipped.

   ```
   CREATE MATERIALIZED VIEW my_view AUTO REFRESH YES AS
   SELECT *
   FROM kds.my_stream_name;
   ```

   Kinesis stream names are case sensitive and can contain both uppercase and lowercase letters. To ingest from streams with uppercase names, you can set the configuration `enable_case_sensitive_identifier` to `true` at the database level. For more information, see [Names and identifiers](https://docs.aws.amazon.com/redshift/latest/dg/r_names.html) and [enable_case_sensitive_identifier](https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html).

   To turn on auto refresh, use `AUTO REFRESH YES`. The default behavior is manual refresh. Note that when you use CAN_JSON_PARSE, records that can't be parsed might be skipped.

   Metadata columns include the following:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion-getting-started.html)

   Note that if your materialized view definition includes business logic, business-logic errors can cause streaming ingestion to be blocked in some cases. This might require you to drop and re-create the materialized view. To avoid this, we recommend that you keep your logic as simple as possible and perform most of your business-logic checks on the data after it's ingested.

1. Refresh the view, which invokes Redshift to read from the stream and load data into the materialized view.

   ```
   REFRESH MATERIALIZED VIEW my_view;
   ```

1. Query data in the materialized view.

   ```
   SELECT * FROM my_view;
   ```

# Getting started with streaming ingestion from Apache Kafka sources
<a name="materialized-view-streaming-ingestion-getting-started-MSK"></a>

This topic describes how to consume streaming data from Amazon MSK, Apache Kafka or Confluent Cloud using a materialized view.

 The purpose of Amazon Redshift streaming ingestion is to simplify the process for directly ingesting stream data from a streaming service into Amazon Redshift or Amazon Redshift Serverless. This works with Amazon MSK Provisioned and Amazon MSK Serverless, with open-source Apache Kafka, and with Confluent Cloud. Amazon Redshift streaming ingestion removes the need to stage an Apache Kafka topic in Amazon S3 before ingesting the stream data into Redshift.

 On a technical level, streaming ingestion provides low-latency, high-speed ingestion of stream or topic data into an Amazon Redshift materialized view. After setup, you can ingest large data volumes by refreshing the materialized view. 

You must have an Apache Kafka source available before configuring Amazon Redshift streaming ingestion. If you do not have a source, create one using the following instructions:
+ **Amazon MSK** — [Getting Started Using Amazon MSK](https://docs.aws.amazon.com/msk/latest/developerguide/getting-started.html)
+ **Apache Kafka** — [Apache Kafka Quickstart](https://kafka.apache.org/quickstart)
+ **Confluent Cloud** — [Quick Start for Confluent Cloud](https://docs.confluent.io/cloud/current/get-started/index.html)

## Setting up streaming ingestion from Kafka
<a name="materialized-view-streaming-ingestion-getting-started-MSK-setup"></a>

Use the following procedures to set up streaming ingestion to Amazon Redshift from Amazon MSK or from Apache Kafka sources that aren't AWS-managed (Apache Kafka and Confluent Cloud).

**Topics**
+ [Set up authentication](#materialized-view-streaming-ingestion-getting-started-MSK-setup-auth)
+ [Setting up your VPC](#materialized-view-streaming-ingestion-getting-started-MSK-Setup-VPC)
+ [Create a Materialized View](#materialized-view-streaming-ingestion-getting-started-MSK-setup-materialized-view)

### Set up authentication
<a name="materialized-view-streaming-ingestion-getting-started-MSK-setup-auth"></a>

This section describes setting up authentication to allow your Amazon Redshift application to access an Amazon MSK source. 

After you create your application's role, attach one of the following policies to allow access to your Amazon MSK, Apache Kafka, or Confluent Cloud cluster. For mTLS authentication, you can store the certificates that Amazon Redshift uses in ACM or Secrets Manager, so you must choose the policy that matches where the certificate is stored. 

Note that self-signed certificates are not supported for authentication or data in transit when you use direct streaming ingestion into Amazon Redshift with any of the supported Apache Kafka streaming sources. These include Amazon MSK, Apache Kafka, and Confluent Cloud. Consider using certificates generated with AWS Certificate Manager or any other publicly trusted certificate authority.

Amazon Redshift IAM authentication with MSK is only supported on Kafka version 2.7.1 or above.

**AUTHENTICATION IAM (Amazon MSK only):**

------
#### [ JSON ]


```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "MSKIAMpolicy",
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:ReadData",
                "kafka-cluster:DescribeTopic",
                "kafka-cluster:Connect"
            ],
            "Resource": [
                "arn:aws:kafka:*:111122223333:cluster/MyTestCluster/*",
                "arn:aws:kafka:*:111122223333:topic/MyTestCluster/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kafka-cluster:AlterGroup",
                "kafka-cluster:DescribeGroup"
            ],
            "Resource": [
                "arn:aws:kafka:*:111122223333:group/MyTestCluster/*"
            ]
        }
    ]
}
```

------

**AUTHENTICATION MTLS: using a certificate stored in AWS Certificate Manager**

------
#### [ JSON ]


```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "MSKmTLSACMpolicy",
            "Effect": "Allow",
            "Action": [
                "acm:ExportCertificate" 
            ],
            "Resource": [
                "arn:aws:acm:us-east-1:444455556666:certificate/certificate_ID"
            ]
        }
    ]
}
```

------

**AUTHENTICATION MTLS: using a certificate stored in AWS Secrets Manager**

------
#### [ JSON ]


```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "MSKmTLSSecretsManagerpolicy",
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue" 
            ],
            "Resource": [
                "arn:aws:secretsmanager:us-east-1:444455556666:secret:secret_ID"
            ]
        }
    ]
}
```

------

------
#### [ Amazon MSK ]

If you use AUTHENTICATION NONE to connect to an Amazon MSK source, no IAM role is required. However, if you use AUTHENTICATION IAM or MTLS to authenticate with your Amazon MSK cluster, your Amazon Redshift cluster or Amazon Redshift Serverless namespace must have an attached IAM role with appropriate permissions. Create an IAM role with a trust policy that allows your Amazon Redshift cluster or Amazon Redshift Serverless namespace to assume the role. After you create the role, add one of the following permissions to support IAM or MTLS. For mTLS authentication, the certificates that Amazon Redshift uses can be stored in AWS Certificate Manager or AWS Secrets Manager, so you must choose the policy that matches where the certificate is stored. Attach the role to your Amazon Redshift provisioned cluster or Redshift Serverless namespace. For information about how to configure the trust policy for the IAM role, see [Authorizing Amazon Redshift to access other AWS services on your behalf](https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html). 

The following table shows the complementary configuration options to set for streaming ingestion from Amazon MSK, and the port to open for each combination:


| Amazon Redshift configuration | Amazon MSK configuration | Port to open between Redshift and Amazon MSK | 
| --- | --- | --- | 
|  AUTHENTICATION NONE  |  TLS transport disabled  | 9092 | 
|  AUTHENTICATION NONE  |  TLS transport enabled  | 9094 | 
|  AUTHENTICATION IAM  |  IAM  | 9098/9198 | 
|  AUTHENTICATION MTLS  |  TLS transport enabled  | 9094 | 

Amazon Redshift authentication is set in the CREATE EXTERNAL SCHEMA statement.
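For illustration, the table above can be encoded as a lookup from the Redshift and Amazon MSK settings to the port to open. This is a sketch only; the dictionary name and key strings are hypothetical, while the port numbers come from the table:

```python
# (Redshift AUTHENTICATION setting, Amazon MSK configuration) -> ports to open
MSK_PORTS = {
    ("NONE", "TLS transport disabled"): [9092],
    ("NONE", "TLS transport enabled"): [9094],
    ("IAM", "IAM"): [9098, 9198],
    ("MTLS", "TLS transport enabled"): [9094],
}

print(MSK_PORTS[("IAM", "IAM")])  # [9098, 9198]
```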

**Note**  
In a case where the Amazon MSK cluster has mutual Transport Layer Security (mTLS) authentication enabled, configuring Amazon Redshift to use AUTHENTICATION NONE directs it to use port 9094 for unauthenticated access. However, this fails because that port is in use for mTLS authentication. For this reason, we recommend that you switch to AUTHENTICATION MTLS when you use mTLS.

------
#### [ Apache Kafka or Confluent Cloud ]

For Apache Kafka and Confluent Cloud, Amazon Redshift supports the following connection protocols:
+ You can use mTLS or plaintext with TLS transport for authentication when connecting to Apache Kafka.
+ You can only use mTLS for authentication when connecting to Confluent Cloud.

Amazon Redshift supports the following encryption protocols for connecting to Apache Kafka or Confluent Cloud:

**Supported authentication methods for Apache Kafka and Confluent Cloud**


| Amazon Redshift | Kafka Security Protocol | Apache Kafka support | Confluent Cloud support | 
| --- | --- | --- | --- | 
| AUTHENTICATION NONE | PLAINTEXT | No | No | 
| AUTHENTICATION NONE | SSL | Yes | No | 
| AUTHENTICATION IAM | SASL_SSL | No | No | 
| AUTHENTICATION MTLS | SSL | Yes (with certificate) | Yes (with certificate) | 

Note that Amazon Redshift does not support SASL/SCRAM or SASL/PLAINTEXT.

------

### Setting up your VPC
<a name="materialized-view-streaming-ingestion-getting-started-MSK-Setup-VPC"></a>

After you create your authentication resources, check your VPC and verify that your Amazon Redshift cluster or Amazon Redshift Serverless workgroup has a route to get to your Apache Kafka source. 

**Note**  
For Amazon MSK, the inbound security group rules for your Amazon MSK cluster should allow your Amazon Redshift cluster's or your Redshift Serverless workgroup's security group. The ports you specify depend on the authentication methods configured on your Amazon MSK cluster. For more information, see [Port information](https://docs.aws.amazon.com/msk/latest/developerguide/port-info.html) and [Access from within AWS but outside the VPC](https://docs.aws.amazon.com/msk/latest/developerguide/aws-access.html).

Next, enable enhanced VPC routing on your Amazon Redshift cluster or Amazon Redshift Serverless workgroup. For more information, see [Enabling enhanced VPC routing](https://docs.aws.amazon.com/redshift/latest/mgmt/enhanced-vpc-enabling-cluster.html).

### Create a Materialized View
<a name="materialized-view-streaming-ingestion-getting-started-MSK-setup-materialized-view"></a>

In this section, you set up the materialized view that Amazon Redshift uses to access your Apache Kafka streaming data.

Assuming you have an Apache Kafka cluster available, the first step is to define a schema in Redshift with `CREATE EXTERNAL SCHEMA` and to reference the cluster as the data source. Following that, to access data in the topic, define the `STREAM` in a materialized view. You can store records from your topic using the default Amazon Redshift VARBYTE data type, or define a schema that converts the data to the semi-structured `SUPER` format. When you query the materialized view, the returned records are a point-in-time view of the topic.

1. In Amazon Redshift, create an external schema to map to the Apache Kafka cluster. The syntax is the following:

   ```
   CREATE EXTERNAL SCHEMA MySchema
   FROM KAFKA
   [ IAM_ROLE [ default | 'iam-role-arn' ] ]
   AUTHENTICATION [ none | iam | mtls ]
   {AUTHENTICATION_ARN 'acm-certificate-arn' |  SECRET_ARN 'asm-secret-arn'};
   ```

   In the `FROM` clause, `KAFKA` denotes that the schema maps data from an Apache Kafka source. 

    `AUTHENTICATION` denotes the authentication type for streaming ingestion. There are three types available: 
   + **none** – Specifies that no authentication is required. This corresponds to unauthenticated access on Amazon MSK and to SSL authentication on Apache Kafka. This authentication method isn't supported for Confluent Cloud.
   + **iam** – Specifies IAM authentication. You can only use IAM authentication with Amazon MSK. When you choose this, make sure that the IAM role has permissions for IAM authentication. For more information about setting up the required IAM policies, see [Setting up streaming ingestion from Kafka](#materialized-view-streaming-ingestion-getting-started-MSK-setup).
   + **mtls** – Specifies that mutual transport layer security provides secure communication by facilitating authentication between a client and server. In this case, the client is Redshift and the server is Apache Kafka. For more information about configuring streaming ingestion with mTLS, see [Authentication with mTLS for Redshift streaming ingestion from Apache Kafka sources](materialized-view-streaming-ingestion-mtls.md).

   Note that Amazon MSK authentication with a username and password isn't supported for streaming ingestion. 

   The `AUTHENTICATION_ARN` parameter specifies the ARN of the ACM mutual transport layer security (mTLS) certificate you use to establish an encrypted connection.

   The `SECRET_ARN` parameter specifies the ARN of the AWS Secrets Manager secret containing the certificate to be used by Amazon Redshift for mTLS.

   The following examples show how to set the broker URI for the Amazon MSK cluster when you create the external schema:

   **Using IAM authentication:**

   ```
   CREATE EXTERNAL SCHEMA my_schema
   FROM KAFKA
   IAM_ROLE 'arn:aws:iam::012345678901:role/my_role'
   AUTHENTICATION IAM
   URI 'b-1.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9098,b-2.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9098'
   ```

   **Using no authentication:**

   ```
   CREATE EXTERNAL SCHEMA my_schema
   FROM KAFKA 
   AUTHENTICATION none
   URI 'b-1.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9092,b-2.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9092'
   ```

   **Using mTLS:**

   ```
   CREATE EXTERNAL SCHEMA my_schema
   FROM KAFKA
   IAM_ROLE 'arn:aws:iam::012345678901:role/my_role'
   AUTHENTICATION MTLS
   URI 'b-1.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9094,b- 2.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9094'
   {AUTHENTICATION_ARN 'acm-certificate-arn' |  SECRET_ARN 'asm-secret-arn'}
   ```

   For more information on creating an external schema, see [CREATE EXTERNAL SCHEMA](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_SCHEMA.html).

1. Create a materialized view to consume the data from the topic. Use a SQL command such as the following sample.

   ```
   CREATE MATERIALIZED VIEW MyView AUTO REFRESH YES AS
   SELECT *
   FROM MySchema."mytopic";
   ```

   Kafka topic names are case sensitive and can contain both uppercase and lowercase letters. To ingest from topics with uppercase names, you can set the configuration `enable_case_sensitive_identifier` to `true` at the session or database level. For more information, see [Names and identifiers](https://docs.aws.amazon.com/redshift/latest/dg/r_names.html) and [enable\_case\_sensitive\_identifier](https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html).

   To turn on auto refresh, use `AUTO REFRESH YES`. The default behavior is manual refresh. 
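
   For example, to ingest from a hypothetical topic named `MyUppercaseTopic`, you might run the following. The topic and view names are illustrative:

   ```
   SET enable_case_sensitive_identifier TO true;

   CREATE MATERIALIZED VIEW MyUppercaseView AUTO REFRESH YES AS
   SELECT *
   FROM MySchema."MyUppercaseTopic";
   ```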

1. The materialized view also exposes metadata columns for each record. For the list of metadata columns, see the [AWS documentation website](http://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion-getting-started-MSK.html).

   Note that business logic errors in your materialized view definition can, in some cases, cause ingestion failures for streaming ingestion. This might require you to drop and re-create the materialized view. To avoid this, we recommend that you keep your business logic simple and run additional logic on the data after you ingest it.

1. Refresh the view, which prompts Amazon Redshift to read from the topic and load data into the materialized view.

   ```
   REFRESH MATERIALIZED VIEW MyView;
   ```

1. Query data in the materialized view.

   ```
   SELECT * FROM MyView;
   ```

   The materialized view is updated directly from the topic when `REFRESH` is run. The materialized view maps to the Kafka topic data source, and you can perform filtering and aggregations on the data as part of the materialized view definition. Your streaming ingestion materialized view (base materialized view) can reference only one Kafka topic, but you can create additional materialized views that join with the base materialized view and with other materialized views or tables.
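
   For example, after creating the base materialized view, you might define a second materialized view that aggregates over it. The view name and the use of the `kafka_partition` metadata column here are illustrative assumptions:

   ```
   CREATE MATERIALIZED VIEW MyViewSummary AS
   SELECT kafka_partition, COUNT(*) AS record_count
   FROM MyView
   GROUP BY kafka_partition;
   ```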

For more information about limitations for streaming ingestion, see [Streaming ingestion behavior and data types](materialized-view-streaming-ingestion.md#materialized-view-streaming-ingestion-limitations).

# Authentication with mTLS for Redshift streaming ingestion from Apache Kafka sources
<a name="materialized-view-streaming-ingestion-mtls"></a>

Mutual transport layer security (mTLS) provides the means for a server to authenticate the client it's sending information to, and for the client to authenticate the server. mTLS provides trusted authentication for use cases across several industries, including financial services, retail, government, and healthcare. In the case of streaming ingestion to Redshift, authentication occurs between a server, which can be Amazon MSK, Apache Kafka, or Confluent Cloud, and an Amazon Redshift provisioned cluster or an Amazon Redshift Serverless workgroup.

This topic provides procedures and SQL-command examples that show ways to create an external schema that uses mTLS to authenticate between the Redshift client and any Apache Kafka server. The steps in this topic complement the full set of steps for setting up streaming ingestion from Apache Kafka sources. For more information, see [Getting started with streaming ingestion from Apache Kafka sources](materialized-view-streaming-ingestion-getting-started-MSK.md).

## Prerequisites for using mTLS for streaming ingestion
<a name="materialized-view-streaming-ingestion-mtls-prerequisites"></a>

This section provides prerequisite steps for using mTLS for streaming ingestion with either AWS Certificate Manager or AWS Secrets Manager.

As a preliminary step, you must have or create a private certificate authority (PCA), which you can use to issue certificates that, among other functions, enable communication over secure channels. AWS Private Certificate Authority (Private CA) is an available service that performs this function. For more information, see [Creating a private CA](https://docs.aws.amazon.com/privateca/latest/userguide/create-CA.html) in the *AWS Private Certificate Authority User Guide*. After creating the private CA, export the root CA certificate and save it to a file with a .pem extension. 

To create a cluster that uses the CA certificate, do the following:

------
#### [ Amazon MSK ]

1. Create an Amazon MSK cluster that supports mTLS client authentication. For more information about configuring an Amazon MSK cluster, see [ Mutual TLS client authentication for Amazon MSK](https://docs.aws.amazon.com/msk/latest/developerguide/msk-authentication.html#msk-authentication-cluster) in the *Amazon Managed Streaming for Apache Kafka Developer Guide*.

1. Edit the security settings for the Amazon MSK cluster, turning on TLS client authentication using AWS Certificate Manager (ACM) and selecting the AWS Private CA (PCA) you created previously. For more information, see [Updating security settings of a cluster](https://docs.aws.amazon.com/msk/latest/developerguide/msk-update-security.html) in the *Amazon Managed Streaming for Apache Kafka Developer Guide*.

------
#### [ Confluent Cloud ]

1. Create a dedicated Confluent Cloud cluster, preferably in the same AWS Region as your Amazon Redshift cluster. For information about creating a Confluent Cloud cluster, see [ Create a Kafka cluster in Confluent Cloud](https://docs.confluent.io/cloud/current/get-started/index.html#step-1-create-a-ak-cluster-in-ccloud).

1. Upload the exported AWS Private CA root CA certificate .pem file you created earlier. For more information, see [ Manage certificate authority for mTLS authentication for Confluent Cloud](https://docs.confluent.io/cloud/current/security/authenticate/workload-identities/identity-providers/mtls/certificate-authority.html). Confluent Cloud uses this certificate to verify the Amazon Redshift client certificate. 

------

## Using mTLS for streaming ingestion with AWS Certificate Manager
<a name="materialized-view-streaming-ingestion-mtls-acm"></a>

The following procedure shows how to configure mTLS for Redshift streaming ingestion by using AWS Certificate Manager (ACM) for certificate storage and management:

1. Request a private certificate through ACM. When you do this, select the PCA you created in the Prerequisites section as the certificate authority. ACM stores the signed certificate and attached private key for secure communication. For more information about managing certificates with ACM, see [Issuing and managing certificates](https://docs.aws.amazon.com/acm/latest/userguide/gs.html) in the *AWS Certificate Manager User Guide*.

1. For the IAM role that you use to manage your Redshift cluster or Amazon Redshift Serverless workgroup, attach the permission to export the certificate, which is **acm:ExportCertificate**. For more information about setting up the necessary IAM resources for streaming ingestion, see [Setting up streaming ingestion from Kafka](materialized-view-streaming-ingestion-getting-started-MSK.md#materialized-view-streaming-ingestion-getting-started-MSK-setup). Specify the same IAM role in the next step to create the external schema.
**Note**  
Requests to AWS Certificate Manager require an Internet gateway (IGW) or a NAT gateway (NGW) in your VPC. If your VPC doesn't have either an IGW or an NGW, do the following:  
+ Use Secrets Manager instead of ACM to store your certificates.
+ Attach a Secrets Manager VPC endpoint to your VPC.

For information about using Secrets Manager with mTLS for streaming ingestion, see [Using mTLS for streaming ingestion with AWS Secrets Manager](#materialized-view-streaming-ingestion-mtls-secrets-manager) following.

1. Get the bootstrap broker URI for the Amazon MSK, Apache Kafka, or Confluent Cloud cluster. For information about getting the bootstrap broker URI for Amazon MSK, see [Getting the bootstrap brokers for an Amazon MSK cluster](https://docs.aws.amazon.com/msk/latest/developerguide/msk-get-bootstrap-brokers.html) in the *Amazon Managed Streaming for Apache Kafka Developer Guide*.

1. Run a SQL command such as the following example to create an external schema that maps the cluster to a Redshift external schema, using `mtls`.

------
#### [ Amazon MSK ]

   ```
   CREATE EXTERNAL SCHEMA my_schema
   FROM KAFKA
   IAM_ROLE 'arn:aws:iam::012345678901:role/my_role'
   AUTHENTICATION mtls
   URI 'b-1.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9094,b-2.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9094'
   AUTHENTICATION_ARN 'arn:aws:acm:Region:444455556666:certificate/certificate_ID';
   ```

------
#### [ Apache Kafka or Confluent Cloud ]

   ```
   CREATE EXTERNAL SCHEMA my_schema
   FROM KAFKA
   IAM_ROLE 'arn:aws:iam::012345678901:role/my_role'
   AUTHENTICATION mtls
   URI 'lkc-2v531.domz6wj0p.us-west-1.aws.confluent.cloud:9092'
   AUTHENTICATION_ARN 'arn:aws:acm:Region:444455556666:certificate/certificate_ID';
   ```

------

   Important parameters:
   + IAM_ROLE – The IAM role associated with the cluster, for streaming ingestion.
   + URI – The bootstrap broker URI for the cluster. Note that for Amazon MSK, port 9094 is specified for communicating with brokers for TLS encryption.
   + AUTHENTICATION_ARN – The ARN of the ACM certificate. The ARN is available in the ACM console when you choose the issued certificate.

After performing these configuration steps, you can create a Redshift materialized view that references the schema defined in the sample and then use REFRESH MATERIALIZED VIEW to stream data. For more information, see [Getting started with streaming ingestion from Apache Kafka sources](materialized-view-streaming-ingestion-getting-started-MSK.md).

## Using mTLS for streaming ingestion with AWS Secrets Manager
<a name="materialized-view-streaming-ingestion-mtls-secrets-manager"></a>

You can configure mTLS for Redshift streaming ingestion by using AWS Secrets Manager for certificate management if you don't want to reference the certificate in AWS Certificate Manager. The following steps describe how to configure mTLS using Secrets Manager.

1. Create a certificate signing request and private key with your tool of choice. Then you can use the signing request to generate a signed certificate, using the same AWS Private CA (PCA) that you used to generate the certificate for the cluster. For more information about issuing a certificate, see [IssueCertificate](https://docs.aws.amazon.com/privateca/latest/APIReference/API_IssueCertificate.html) in the *AWS Private Certificate Authority API Reference*.

1. Extract the certificate using AWS Private Certificate Authority. For more information, see [Retrieve a private certificate](https://docs.aws.amazon.com/privateca/latest/userguide/PcaGetCert.html) in the *AWS Private Certificate Authority User Guide*.

1. Store the certificate and private key generated in the previous step in AWS Secrets Manager. Choose `Other type of secret` and use the plaintext format. The key-value pairs should be in the format `{"certificate":"<cert value>","privateKey":"<pkey value>"}`, as in the following example. For more information about creating and managing secrets in AWS Secrets Manager, see [Create and manage secrets with AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/managing-secrets.html) in the *AWS Secrets Manager User Guide*.

   ```
   {"certificate":"-----BEGIN CERTIFICATE-----
   klhdslkfjahksgdfkgioeuyihbflahabhbdslv6akybeoiwv1hoaiusdhbahsbdi 
   H4hAX8/eE96qCcjkpfT84EdvHzp6fC+/WwM0oXlwUEWlvfMCXNaG5D8SqRq3qA==
   -----END CERTIFICATE-----
   -----BEGIN CERTIFICATE-----
   klhdslkfjahksgdfkgioeuyihbflahabhbdslv6akybeoiwv1hoaiusdhbahsbdi 
   wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
   -----END CERTIFICATE-----",
   "privateKey":"-----BEGIN PRIVATE KEY-----
   klhdslkfjahksgdfkgioeuyihbflahabhbdslv6akybeoiwv1hoaiusdhbahsbdi
   7OD4m1dBEs3Fj++hDMH9rYRp99RqtCOf0EWOUe139KOilOsW+cyhAoc9Ci2+Jo/k
   17u2N1iGILMQEZuCRtnJOkFYkw==
   -----END PRIVATE KEY-----"}
   ```

1. Attach the permission policy to retrieve the secret to the IAM role that you use to manage your Amazon Redshift cluster or Amazon Redshift Serverless workgroup. This permission is `secretsmanager:GetSecretValue`. For more information, see [Set up authentication](materialized-view-streaming-ingestion-getting-started-MSK.md#materialized-view-streaming-ingestion-getting-started-MSK-setup-auth). For more information about managing IAM policies, see [Edit IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-edit.html). Specify the same IAM role in the next step to create the external schema.

1. In Redshift, run the SQL command to create the external schema. You use the AUTHENTICATION type `mtls`. You also specify the URI of the cluster and the secret ARN in AWS Secrets Manager.

   ```
   CREATE EXTERNAL SCHEMA my_schema
   FROM KAFKA
   IAM_ROLE 'arn:aws:iam::012345678901:role/my_role'
   AUTHENTICATION mtls
   URI 'b-1.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9094,b-2.myTestCluster.123z8u.c2.kafka.us-west-1.amazonaws.com:9094'
   SECRET_ARN 'arn:aws:secretsmanager:us-east-1:012345678910:secret:myMTLSSecret';
   ```

Important parameters:
+ IAM_ROLE – The IAM role associated with the cluster, for streaming ingestion.
+ URI – The bootstrap broker URI for the cluster. Note that for Amazon MSK, port 9094 is specified for communicating with brokers for TLS encryption.
+ SECRET_ARN – The ARN of the secret from Secrets Manager, containing the certificate to use for mTLS.
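
The secret value is a single JSON string that holds both the certificate chain and the private key, in the format shown in step 3. The following Python sketch shows one way to assemble that string from exported PEM files; the file paths and function name are hypothetical:

```python
import json

# Hypothetical paths to the PEM files exported from AWS Private CA.
CERT_PATH = "client_certificate.pem"
KEY_PATH = "client_private_key.pem"

def build_mtls_secret_string(cert_path, key_path):
    """Assemble the {"certificate": ..., "privateKey": ...} JSON string
    that Amazon Redshift expects to find in the Secrets Manager secret."""
    with open(cert_path) as cert_file:
        certificate = cert_file.read().strip()
    with open(key_path) as key_file:
        private_key = key_file.read().strip()
    return json.dumps({"certificate": certificate, "privateKey": private_key})
```

You can then store the resulting string as the secret value, for example with the `--secret-string` option of the AWS CLI `aws secretsmanager create-secret` command.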

## Enabling mTLS authentication for an existing external schema
<a name="materialized-view-streaming-ingestion-mtls-alter"></a>

If you have an existing external schema that you use for streaming ingestion and you want to implement mutual TLS for authentication, you can run a command such as the following, which specifies mTLS authentication and the ARN of the certificate in ACM.

```
ALTER EXTERNAL SCHEMA schema_name 
AUTHENTICATION mtls
AUTHENTICATION_ARN 'arn:aws:acm:Region:444455556666:certificate/certificate_ID';
```

Or you can specify mTLS authentication, with reference to the secret ARN in AWS Secrets Manager.

```
ALTER EXTERNAL SCHEMA schema_name 
AUTHENTICATION mtls
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:012345678910:secret:myMTLSSecret';
```

For information about the ALTER EXTERNAL SCHEMA command, see [ALTER EXTERNAL SCHEMA](r_ALTER_EXTERNAL_SCHEMA.md).

# Ingesting streaming data using Kinesis
<a name="materialized-view-streaming-ingestion-example-station-data"></a>

This procedure demonstrates how to ingest data from a Kinesis stream named *ev\_station\_data*, which contains consumption data from different EV charging stations, in JSON format. The schema is well defined. The example shows how to store the data as raw JSON and also how to convert the JSON data to Amazon Redshift data types as it's ingested.

**Producer setup**

1. Using Amazon Kinesis Data Streams, follow the steps to create a stream named `ev_station_data`. Choose **On-demand** for the **Capacity mode**. For more information, see [Creating a Stream via the AWS Management Console](https://docs.aws.amazon.com/streams/latest/dev/how-do-i-create-a-stream.html).

1. The [Amazon Kinesis Data Generator](https://awslabs.github.io/amazon-kinesis-data-generator/web/producer.html?) can help you generate test data for use with your stream. Follow the steps detailed in the tool to get started, and use the following data template for generating your data:

   ```
   {
      "_id": "{{random.uuid}}",
      "clusterID": "{{random.number(
          {   "min":1,
              "max":50
          }
      )}}",
      "connectionTime": "{{date.now("YYYY-MM-DD HH:mm:ss")}}",
      "kWhDelivered": "{{commerce.price}}",
      "stationID": "{{random.number(
          {   "min":1,
              "max":467
          }
      )}}",
      "spaceID": "{{random.word}}-{{random.number(
          {   "min":1,
              "max":20
          }
      )}}",
      "timezone": "America/Los_Angeles",
      "userID": "{{random.number(
          {   "min":1000,
              "max":500000
          }
      )}}"
   }
   ```

    Each JSON object in the stream data has the following properties: 

   ```
   {
       "_id": "12084f2f-fc41-41fb-a218-8cc1ac6146eb",
       "clusterID": "49",
       "connectionTime": "2022-01-31 13:17:15",
       "kWhDelivered": "74.00",
       "stationID": "421",
       "spaceID": "technologies-2",
       "timezone": "America/Los_Angeles",
       "userID": "482329"
   }
   ```
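
   If you prefer to script a producer instead of using the Amazon Kinesis Data Generator, the following Python sketch builds a test record with the same shape as the template above. The helper name is hypothetical, and `spaceID` uses a fixed prefix instead of a random word; sending the record to the stream (for example, with the boto3 `put_record` API) is omitted:

```python
import json
import random
import uuid
from datetime import datetime

def make_ev_station_record():
    """Build one test record matching the ev_station_data template.
    Value ranges mirror the Kinesis Data Generator template above."""
    return {
        "_id": str(uuid.uuid4()),
        "clusterID": str(random.randint(1, 50)),
        "connectionTime": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "kWhDelivered": f"{random.uniform(1, 100):.2f}",
        "stationID": str(random.randint(1, 467)),
        # The template uses a random word here; a fixed prefix is an
        # illustrative simplification.
        "spaceID": f"space-{random.randint(1, 20)}",
        "timezone": "America/Los_Angeles",
        "userID": str(random.randint(1000, 500000)),
    }

# Serialize the record as it would be written to the stream.
record_json = json.dumps(make_ev_station_record())
```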

**Amazon Redshift setup**

These steps show you how to configure the materialized view to ingest data.

1. Create an external schema to map the data from Kinesis to a Redshift object.

   ```
   CREATE EXTERNAL SCHEMA evdata FROM KINESIS
   IAM_ROLE 'arn:aws:iam::0123456789:role/redshift-streaming-role';
   ```

   For information about how to configure the IAM role, see [Getting started with streaming ingestion from Amazon Kinesis Data Streams](materialized-view-streaming-ingestion-getting-started.md).

1. Create a materialized view to consume the stream data. The following example shows how to define a materialized view to ingest the JSON formatted data from a Kinesis stream.

   First, store stream records in semi-structured SUPER format. In this example, the JSON source is stored in Redshift without converting to Redshift types.

   ```
   CREATE MATERIALIZED VIEW ev_station_data AS
       SELECT approximate_arrival_timestamp,
       partition_key,
       shard_id,
       sequence_number,
       case when can_json_parse(kinesis_data) then json_parse(kinesis_data) else null end as payload,
       case when not can_json_parse(kinesis_data) then kinesis_data else null end as failed_payload
       FROM evdata."ev_station_data" ;
   ```

**Query the stream**

1. Enable case-sensitive SUPER attributes using the following command. Amazon Redshift is case insensitive by default, so to access case-sensitive SUPER attributes, you need to enable this functionality.

   ```
   SET enable_case_sensitive_super_attribute to TRUE;
   ```

1. Refresh the materialized view with the following command in order to pull data from the stream.

   ```
   REFRESH MATERIALIZED VIEW ev_station_data;
   ```

1. Query the refreshed materialized view to get usage statistics.

   ```
   SELECT e.payload.connectionTime::date as connectiontime
   ,SUM(e.payload.kWhDelivered::decimal(10,2)) AS Energy_Consumed
   ,count(distinct e.payload.userID) AS #Users
   from ev_station_data as e
   group by connectiontime
   order by 1 desc;
   ```

1. View results.

   ```
   connectiontime        energy_consumed    #users
   2022-02-08                 4139          10
   2022-02-09                 5571          10
   2022-02-10                 8697          20
   2022-02-11                 4408          10
   2022-02-12                 4257          10
   2022-02-23                 6861          10
   ```