obstkel.com logo

Redshift materialized views: The good, the bad and the ugly

Redshift materialized views simplify querying large amounts of data across multiple tables. In addition, they also provide significant performance improvement.

How does a materialized view achieve this, and how is it different from a database table? Let’s get our arms around the basics first.

To derive information from data, we need to analyze it. We do this by writing SQL against database tables. Sometimes this might require joining multiple tables, aggregating data and using complex SQL functions. 

If this task needs to be repeated, you save the sql script and execute it or even create a SQL view. A view by the way, is nothing more than a stored SQL query you execute as frequently as needed.

However, a view does not generate output data until it is executed. In other words, if a complex sql takes forever to run, a view based on the same SQL will do the same. This is where a Redshift materialized view comes in handy.

Redshift materialized views save us the most expensive resource of all – time.

When a materialized view is created, the underlying SQL query gets executed right away and the output data stored. So, when you call the materialized view, all its doing is extracting data from the stored results.

Think of a materialized view as the best of a table (data storage) and a view (stored sql query).

Best features of Redshift materialized views

Materialized views in Redshift have some noteworthy features. Let’s take a look at a few.

  • More than just tables: Do you have files in AWS S3 you would like to reference? Or maybe you already have a materialized view and need a new one with some additional data?
    In redshift you can create a materialized view to refer data in external tables (AWS S3) and even define one in terms of an existing view.

  • Materialized view on materialized view: Redshift lets you create materialized views based on materialized views you already created. This is similar to reading data from a table and helps avoid duplicating expensive table joins and aggregations.

  • Adding columns: There are more DDL (Data Definition Language) limitations on creating materialized views. However, one bright spot, you can add columns to the internal tables with zero impact to existing materialized views.

  • Automatic query rewriting: For me this is an exciting feature! Redshift automatically rewrites your sql query to use a materialized view (if one exists) even if you do not explicitly use it, thereby improving performance.

  • Incremental refresh: With certain limitations, Redshift lets you perform an incremental refresh (vs a full refresh) on a materialized view. This helps save time.

Redshift materialized view limitations

Redshift materialized views are not without limitations. Let’s take a look at the common ones.

  • Stale data: The data in a materialized view is a point in time snapshot. Any changes to the underlying data will not be reflected unless the materialized view is refreshed.

  • Create table limitations: You cannot use or refer to the below objects or clauses when creating a materialized view
    • Auto refresh when using mutable functions or reading data from external tables.
    • Late binding or circular reference to tables.
    • Leader node-only functions such as CURRENT_SCHEMA, CURRENT_SCHEMAS, HAS_DATABASE_PRIVILEGE, HAS_SCHEMA_PRIVILEGE, HAS_TABLE_PRIVILEGE.
    • ORDER BY, LIMIT and OFFSET clauses.
    • System Tables.
    • User defined functions.
    • Views.

  • Automatic query rewriting limitations: Query rewriting will not work if your materialized view has the below conditions/functions.
    • Aggregate functions other than SUM, COUNT, MIN, and MAX.
    • CREATE TABLE AS statements.
    • DISTINCT clause.
    • External tables.
    • HAVING clause.
    • LEFT, RIGHT and FULL outer joins.
    • Materialized views referencing other materialized views.
    • References to system tables and catalogs.
    • SELECT INTO statements.
    • Set operations (UNION, INTERSECT, and EXCEPT).
    • Subqueries.
    • Window functions.

  • Auto refresh limitations: If you recall, auto refresh has two modes: incremental and full. The only limitation on a full materialized view refresh is – no external tables allowed.

    Incremental refresh on the other hand has more than a few. I have them listed below.

    • Aggregate functions AVG, MEDIAN, PERCENTILE_CONT, LISTAGG, STDDEV_SAMP, STDDEV_POP, APPROXIMATE COUNT, APPROXIMATE PERCENTILE, and bitwise aggregate functions are not allowed.
    • DISTINCT clause.
    • External tables.
    • LEFT, RIGHT and FULL outer joins.
    • Mutable functions – date-time functions, RANDOM and non-STABLE user-defined functions
    • Set operations (UNION, INTERSECT, EXCEPT and MINUS).
    • Temporary tables used for query optimization.
    • Subqueries not part of the FROM clause.
    • Window functions.

6 Best practices for Redshift materialized views

Now that we have a feel for the limitations on materialized views, let’s look at 6 best practices when using them.

  1. Ensure you have SELECT privileges to the underlying tables, schema and permissions to CREATE, ALTER, REFRESH and DROP.

  2. Do not perform the below actions on a materialized view. If you really need to, then drop and recreate it.
    • Renaming a materialized view.
    • Change the data type of a column.
    • Change the schema name to which your tables belong.
    • Alter the underlying SQL statement

  3. Make sure to refresh all dependent materialized views individually prior to refreshing your main view.  Query system table STV_MV_DEPS for information on materialized view dependencies.

  4. Use STL_EXPLAIN to determine if automatic query rewriting is being used for your query.

  5. Use SVL_MV_REFRESH_STATUS to check the materialized view refresh status.

  6. Prefix or suffix the materialized view name with “mv_” or “_mv” based on your accepted naming convention.

Trending Questions on Redshift materialized views

1. Are materialized views automatically refreshed in Redshift?

When you create a materialized view, you must set the AUTO REFRESH parameter to YES. If this feature is not set, your view will not be refreshed automatically.

In case you forgot or chose not to initially, use an ALTER command to turn on auto refresh at any time.


2. How do you create a materialized view?

You create a materialized view using the SQL command CREATE MATERIALIZE VIEW.


3. What is incremental refresh vs full refresh in Redshift?

Both terms apply to refreshing the underlying data used in a materialized view. In an incremental refresh, the changes to data since the last refresh is determined and applied to the materialized view.
On the other hand, in a full refresh the SELECT clause in the view is executed and the entire data set is replaced. 

Conclusion

In summary, Redshift materialized views do save development and execution time. However, it’s important to know how and when to use them. Make sure you really understand the below key areas –

  • Auto refresh vs manual refresh.
  • Automatic query re writing and its limitations.
  • Materialized view on materialized view dependencies.

Table of Contents

Recent Posts

Interested in our services ?

email us at : info@obstkel.com

Copyright 2022 © OBSTKEL LLC. All rights Reserved