Coalesce means to combine!
Redshift Coalesce is a conditional expression which returns the first non-null value from multiple input values. The beauty of coalesce is, there is no limit to the number of values you can input. It’s a complex sounding word, but one of the most helpful expressions in Redshift.
The syntax for the coalesce function is as below.
coalesce (expression1, expression2, expression3.......)
If you pay close attention, the coalesce function is pretty much an abbreviated version of an if/else statement. The above statement would break down as
if expression1 is null then set to expression2 else
if expression2 is null then set to expression3........
So, when should we use the Coalesce expression?
As a matter of personal preference, if you don’t like the sound of “coalesce” or think it’s too much to remember, don’t worry. You can use the NVL expression. COALESCE and NVL do the exact same thing.
For now, let’s look at 3 nuances of the coalesce function – data types, null values and expression priority. Mastering these will have you using coalesce much more effectively.
The coalesce expression is data type independent. This means you can use this expression on String, Date and Numerical values as long as all the inputs are of the same type. The last part needs some clarification.
For example, you cannot have one input field(expression1) of type date, and another one (expression2) of type number. All source data fields, in the coalesce function should have the same data type.
To keep things simple, I will be referring to CHAR/VARCHAR data types, combinedly as string(s).
Tip #1: The target field data type is the driver – When using the Redshift Coalesce function, make sure all the source data types are the same as the target data type. If they are not, then use a function to convert them to the target data type.
Now let’s look at a few examples. Take a look at the table below and see if come to the same conclusion.
The Redshift Database Developer Guide defines a null as “If a column in a row is missing, unknown, or not applicable, it is a null value or is said to contain null”.
A simplified version, a null value in a database denotes the absence of data. DATE and NUMERIC data types are the cleanest when it comes to determining null/not null cases. You either have a valid value or you have a null value. The complexity arises when the data is stored in a VARCHAR/CHAR datatype field.
A space or blank in a CHAR/VARCHAR field is not the same as null value. Space might not be the value you are looking for, but that does not mean it is a null value.
Likewise, a zero in a numeric field is not the same as a null value.
An incorrect date like 30 February 2022 might be okay in a CHAR/VARCHAR field but will be a null value in a DATE data type field.
In some instances, we may even have junk/garbage characters in a string field.
How, then do we make sure a source field is actually a pure null?
A clever move would be to write a SELECT with a TRIM function and an IS NULL in sql as shown below.
Tip#2: Determine if the source field contains null value(s) using IS NULL in SQL.
SELECT * FROM orders where TRIM(delivery_status) IS NULL;
Alternately, if you suspect the data has junk/garbage characters, you can add the REGEXP_REPLACE function to the above IS NULL SQL statement as below.
The function as written, replaces all occurrences of numbers ‘0-9’, upper case letters ‘A-Z’ and lowercase letters ‘a-z’ with a null value. If a junk/garbage character exists in the field, the SELECT statement returns a 1.
Tip#3: Use REGEXP_REPLACE and NOT NULL to determine if a field contains junk/garbage characters.
SELECT 1 FROM orders where
REGEXP_REPLACE(delivery_status, '[[A-Z,a-z,0-9]]*','') IS NOT NULL;
Moving on to the last and final topic on Coalesce – expressions.
The source fields you choose to be expression1, expression2, expression3 and so on makes a significant difference in the value returned by the Redshift Coalesce function. A technique to ensure you do not end up with incorrect values is to prioritize your expression selection.
How do you do that?
Just like you send in the best players first in football, make sure the field with the best data quality is used first. The image below depicts the order in which it is best to select fields in a Coalesce expression.
Let’s look at a few of Redshift Coalesce examples using date, string and numeric data. If you would like to see more variations on coalesce examples, sent us an email at email@example.com
In this example #2, let’s look at using coalesce on 3 input date columns. Remember in this context column and expression means the same.
SELECT coalesce ( order_delivered, to_date(expected_delivery,'YYYYMMDD'), current_date) FROM orders;
In this example #3, let’s look at using coalesce on 3 CHAR/VARCHAR columns. The goal is to determine the email address using a combination of existing data and deriving it if necessary.
SELECT coalesce ( email_address,trim(email_address2), last_name + firstname.lastname@example.org') FROM customers;
In this 4th example, let’s look at using coalesce on 3 numeric columns. The goal is to determine the price an item was sold at.
SELECT coalesce ( sale_price, cast(recommended_sale_price as decimal), current_price) FROM items;
In case you are wondering why the columns were listed in that order, go back and review the section on ‘Prioritizing expressions in Coalesce’.
For this specific example, we are trying to determine the price at which an item was sold.
The sale_price has the highest priority in this context and becomes the first expression. If the item was not listed for sale up to that point, sale_price will be null. Our next best bet is to use the recommended_sale_price, which is stored as a string. You can use either the TO_NUMBER function or the CAST function.
In this case I choose the CAST function to convert the recommended sale price to a decimal. If both the sale_price and recommended_sale_price is null, then we set the item price to the current_price.
Redshift Coalesce is a powerful and straightforward conditional expression. Just make sure you pay attention to the below listed details.