obstkel.com logo

Redshift materialized views: The good, the bad and the ugly

Redshift materialized views simplify complex queries across multiple tables with large amounts of data. The result is significant performance improvement!

 

What are materialized views?

To derive information from data, we need to analyze it. We do this by writing SQL against database tables. Sometimes this might require joining multiple tables, aggregating data and using complex SQL functions. 

If this task needs to be repeated, you save the SQL script and execute it or may even create a SQL view. A view by the way, is nothing more than a stored SQL query you execute as frequently as needed.

However, a view does not generate output data until it is executed. In other words, if a complex sql query takes forever to run, a view based on the same SQL will do the same. This is where materialized views come in handy.

When a materialized view is created, the underlying SQL query gets executed right away and the output data stored. So, when you call the materialized view, all its doing is extracting data from the stored results.

Think of a materialized view as the best of a table (data storage) and a view (stored sql query).

A Redshift materialized views save us the most expensive resource of all – time.

Best features of Redshift materialized views

Materialized views in Redshift have some noteworthy features. Let’s take a look at a few.

  • More than just tables: Do you have files in AWS S3 you would like to reference? Or maybe you already have a materialized view and need a new one with some additional data?
    In redshift you can create a materialized view to refer data in external tables (AWS S3) and even define one in terms of an existing view.

  • Materialized view on materialized view: Redshift lets you create materialized views based on materialized views you already created. This is similar to reading data from a table and helps avoid duplicating expensive table joins and aggregations.

  • Adding columns: There are more DDL (Data Definition Language) limitations on creating materialized views. However, one bright spot, you can add columns to the internal tables with zero impact to existing materialized views.

  • Automatic query rewriting: For me this is an exciting feature! Redshift automatically rewrites your sql query to use a materialized view (if one exists) even if you do not explicitly use it, thereby improving performance.

  • Incremental refresh: With certain limitations, Redshift lets you perform an incremental refresh (vs a full refresh) on a materialized view. This helps save time.

Redshift materialized view limitations

Redshift materialized views are not without limitations. Let’s take a look at the common ones.

  • Stale data: The data in a materialized view is a point in time snapshot. Any changes to the underlying data will not be reflected unless the materialized view is refreshed.


  • Redshift Create materialized view limitations: You cannot use or refer to the below objects or clauses when creating a materialized view
    • Auto refresh when using mutable functions or reading data from external tables.
    • Late binding or circular reference to tables.
    • Leader node-only functions such as CURRENT_SCHEMA, CURRENT_SCHEMAS, HAS_DATABASE_PRIVILEGE, HAS_SCHEMA_PRIVILEGE, HAS_TABLE_PRIVILEGE.
    • ORDER BY, LIMIT and OFFSET clauses.
    • System Tables.
    • User defined functions.
    • Views.

  • There is no CREATE or REPLACE materialized view Redshift statement. You have to drop the materialized view using DROP MATERIALIZED VIEW ddl first. Then re-create the Redshift materialized view using a CREATE MATERIALIZED VIEW statement.

  • Automatic query rewriting limitations: Query rewriting will not work if your materialized view has the below conditions/functions.
    • Aggregate functions other than SUM, COUNT, MIN, and MAX.
    • CREATE TABLE AS statements.
    • DISTINCT clause.
    • External tables.
    • HAVING clause.
    • LEFT, RIGHT and FULL outer joins.
    • Materialized views referencing other materialized views.
    • References to system tables and catalogs.
    • SELECT INTO statements.
    • Set operations (UNION, INTERSECT, and EXCEPT).
    • Subqueries.
    • Window functions.


  • Auto refresh limitations: If you recall, auto refresh has 2 modes: incremental and full. The only limitation on a full materialized view refresh is – no external tables allowed.

    Incremental refresh on the other hand has more than a few. I have them listed below.

    • Aggregate functions AVG, MEDIAN, PERCENTILE_CONT, LISTAGG, STDDEV_SAMP, STDDEV_POP, APPROXIMATE COUNT, APPROXIMATE PERCENTILE, and bitwise aggregate functions are not allowed.
    • DISTINCT clause.
    • External tables.
    • LEFT, RIGHT and FULL outer joins.
    • Mutable functions – date-time functions, RANDOM and non-STABLE user-defined functions
    • Set operations (UNION, INTERSECT, EXCEPT and MINUS).
    • Temporary tables used for query optimization.
    • Subqueries not part of the FROM clause.
    • Window functions.

6 Best practices for Redshift materialized views

Now that we have a feel for the limitations on materialized views, let’s look at 6 best practices when using them.

  1. Ensure you have SELECT privileges to the underlying tables, schema and permissions to CREATE, ALTER, REFRESH and DROP.

  2. Do not perform the below actions on a materialized view. If you really need to, then drop and recreate it.

    • Renaming a materialized view.

    • Change the data type of a column.

    • Change the schema name to which your tables belong.

    • Alter the underlying SQL statement.

  3. Make sure to refresh all dependent materialized views individually prior to refreshing your main view.  Query system table STV_MV_DEPS for information on materialized view dependencies.

  4. Use STL_EXPLAIN to determine if automatic query rewriting is being used for your query.

  5. Use SVL_MV_REFRESH_STATUS to check the materialized view refresh status as below.

       
    Select * from SVL_MV_REFRESH_STATUS;


    This is an extremely helpful view, so get familiar with it. At a minimum check for the 5 listed details in the SVL_MV_REFRESH_STATUS view.

    1. Who performed the last refresh?

    2. When was the refresh kicked off?

    3. The type of refresh performed (Manual vs Auto).

    4. Status of the refresh (Successful vs Partial vs Failed vs Aborted).

    5. What changes were made during the refresh (Schema vs Table vs Column).


  6. Prefix or suffix the materialized view name with “mv_” or “_mv” based on your accepted naming convention.

Redshift Create materialized view basics

The Redshift CREATE MATERIALZIED VIEW statement creates the view based on a SELECT AS statement. This is very similar to a standard CTAS statement.

A major benefit of this Select statement, you can combine fields from as many Redshift tables or external tables using the SQL JOIN clause.

Let’s look at how to create one. Instead of the traditional approach, I have two examples listed. The first with defaults and the second with parameters set.

It’s a lot simpler to understand this way.

In this first example we create a materialized view based on a single Redshift table. The default values for backup, distribution style and auto refresh are shown below. Note, you do not have to explicitly state the defaults. They are implied.


Example1
: Redshift create materialized view using DEFAULTS. CREATE MATERIALIZED VIEW mv_new_address AS SELECT * from addresses where address_updated ='Y'; BACKUP: YES DISTRIBUTION STYLE: EVEN AUTO REFRESH: NO


In this second example we create the same materialized view but specify the parameter values based on our needs.

The values used in this example are meant to clarify the syntax and usage of these parameters. Be sure to determine your optimal parameter values based on your application needs. 

 


Example2
: Redshift create materialized view with user defined parameter values. CREATE MATERIALIZED VIEW mv_new_address
BACKUP NO
DISTSTYLE KEY
DISTKEY (zipcode)
SORTKEY AUTO
AUTO REFRESH YES
AS SELECT * from addresses where address_updated ='Y';

Trending Questions on Redshift materialized views

1. Are materialized views automatically refreshed in Redshift?

When you create a materialized view, you must set the AUTO REFRESH parameter to YES. If this feature is not set, your view will not be refreshed automatically.

In case you forgot or chose not to initially, use an ALTER command to turn on auto refresh at any time.


2. What is incremental refresh vs full refresh in Redshift?

Both terms apply to refreshing the underlying data used in a materialized view.

In an incremental refresh, the changes to data since the last refresh is determined and applied to the materialized view. On the other hand, in a full refresh the SELECT clause in the view is executed and the entire data set is replaced. 

Conclusion

In summary, Redshift materialized views do save development and execution time. However, it’s important to know how and when to use them. Make sure you really understand the below key areas –

  • Auto refresh vs manual refresh.

  • Automatic query re writing and its limitations.

  • Materialized view on materialized view dependencies.

  • Don’t over think it. You may not be able to remember all the minor details. It’s okay. Practice makes perfect!


Need to Create tables in Redshift?

We have a post on Creating Redshift tables with examples, 10 ways. Most developers find it helpful.

Table of Contents

Recent Posts

Interested in our services ?

email us at : info@obstkel.com

Copyright 2022 © OBSTKEL LLC. All rights Reserved