Redshift materialized views simplify querying large amounts of data across multiple tables. In addition, they also provide significant performance improvement.
How does a materialized view achieve this, and how is it different from a database table? Let’s get our arms around the basics first.
To derive information from data, we need to analyze it. We do this by writing SQL against database tables. Sometimes this might require joining multiple tables, aggregating data and using complex SQL functions.
If this task needs to be repeated, you save the sql script and execute it or even create a SQL view. A view by the way, is nothing more than a stored SQL query you execute as frequently as needed.
However, a view does not generate output data until it is executed. In other words, if a complex sql takes forever to run, a view based on the same SQL will do the same. This is where a Redshift materialized view comes in handy.
Redshift materialized views save us the most expensive resource of all – time.
When a materialized view is created, the underlying SQL query gets executed right away and the output data stored. So, when you call the materialized view, all its doing is extracting data from the stored results.
Think of a materialized view as the best of a table (data storage) and a view (stored sql query).
Materialized views in Redshift have some noteworthy features. Let’s take a look at a few.
Redshift materialized views are not without limitations. Let’s take a look at the common ones.
Incremental refresh on the other hand has more than a few. I have them listed below.
Now that we have a feel for the limitations on materialized views, let’s look at 6 best practices when using them.
When you create a materialized view, you must set the AUTO REFRESH parameter to YES. If this feature is not set, your view will not be refreshed automatically.
In case you forgot or chose not to initially, use an ALTER command to turn on auto refresh at any time.
You create a materialized view using the SQL command CREATE MATERIALIZE VIEW.
Both terms apply to refreshing the underlying data used in a materialized view. In an incremental refresh, the changes to data since the last refresh is determined and applied to the materialized view.
On the other hand, in a full refresh the SELECT clause in the view is executed and the entire data set is replaced.
In summary, Redshift materialized views do save development and execution time. However, it’s important to know how and when to use them. Make sure you really understand the below key areas –