obstkel.com logo

What is Amazon Redshift explained in 10 minutes or less

main image for what is amazon redshift

In simple words, Amazon Redshift or AWS Redshift is a Cloud based Data Warehouse service by Amazon Web Services (AWS).

There are two terminologies to pay attention to here – Cloud and Data Warehouse.

what is amazon redshift

Cloud, short for Cloud Computing, refers to computing resources provided by a third party. These computing resources range from processing power, storage, applications to more complex SaaS, PaaS and IaaS. 

A Data Warehouse is a repository to store large amounts of historical data meant for generating reports and performing analytics. If the data is in a structured or semi-structured format, then you can store it in Redshift.

Traditionally, most data warehouses are hosted on premise and managed by a team of System Administrators and Database Administrators. However, Amazon Redshift as a fully managed cloud service handles all aspects of scaling, capacity provisioning, cluster backup, patching and upgrading. That makes a huge difference!

The benefits of Cloud Computing are immense; however, for the sake of simplicity let’s just say it saves you a lot of money and heartburn.  

What type of database is Amazon Redshift?

Amazon Redshift is a Relational Database Management System (RDBMS) built upon PostgreSQL.

PostgreSQL, if you are not familiar, is a highly robust open-sourced Object-Relational database. It is popular with large companies like Apple, Instagram, Reddit, Skype and Twitch. That’s just naming a few.

However, just because the Redshift database is built upon PostgreSQL does not mean they are the same. Amazon Redshift db is highly optimized for Business Intelligence (BI) and Online Analytical Processing (OLAP).

Some of the optimizations include:

  • Data storage: Redshift Database uses Columnar storage for database tables Instead of storing an entire row of data from a database table in a block, in Columnar storage, the entire column gets stored in the block.

    For instance, consider a database table on customer address, with 200 rows and 10 columns. Let’s assume the 5th column stores the ZIPCODE. In Columnar storage, the entire data for the ZIPCODE column gets stored in a single column. This provides better performance on SQL execution and storage.

    Why do you think SQL executions against these tables are faster?

    Because SQL queries for analytics are normally limited to certain columns and never the entire row. With the data for the entire column stored in a single block, we have fewer blocks to read/write.

  • Data Compression: Compressing data saves storage space. Redshift by default compresses the columns in the table using RAW, AZ64 or LZO encoding. The encoding type is chosen based on the data type of the columns.


  • Query engine: The query execution engine leverages Redshift specific Massive Parallel Processing (MPP), Results caching and Compiled code distribution feature in addition to the columnar storage to increase execution speed, reduce execution time and improve system performance. 

What is the Amazon Redshift difference?

  • The Redshift architecture difference: At its core, Amazon Redshift is made of clusters. A cluster in turn is made up of one or more nodes. These nodes can be categorized into leader nodes and compute nodes.

    The leader node does the job of coordination and communication(engine), while the compute node does the heavy lifting (database).
amazon redshift architecture
  • Redshift support for unstructured data. You already know Amazon Redshift can handle semi-structured data in addition to the standard structured data, which is great! If you have a vast amount of unstructured data and want to generate analytics from it, Redshift has a solution for you.

    Say hello to Amazon Redshift Spectrum!

    Redshift Spectrum is a feature of Amazon Redshift which lets you query unstructured data stored in Amazon S3. You do not even have to load the data into the Redshift database. Matter of fact, you can even use Redshift Spectrum to query your structured and semi-structured data straight from Amazon S3.


  • Redshift tables are not the same. Tablespaces, table partitioning, and inheritance are not supported in Redshift. This might sound strange, but it really helps to improve performance.


  • Some table constraints are informational. Primary keys, Foreign keys and Unique constraints are not enforced in Redshift. This means if you input bad data into your application, it gets stored in the database.


  • Data in a Redshift table is stored in sorted order. When you load data into a Redshift table, the data is stored in sorted order. The sort order is determined by the sort keys specified when you create a table.

    If this makes you scratch your head, dont worry! The sorted data compliments the Redshift columnar storage to give us highly efficient querying capabilities.


Related: Learn how to create tables in Redshift using examples

What is Amazon Redshift pricing model?

Pricing with any AWS Service is based on a Pay-as-you-go model. Similar to your water or electricity bill, you only pay for services used for the duration of the usage, without the need to sign any long-term contracts.

AWS offers a lot of flexibility when it comes to Amazon Redshift pricing. The best approach to maximize these benefits is to think in terms of environments: Sandbox/Prototyping, Development, Testing, Staging and Production. 

infographic on amazon redshift price
  • Sandbox/ Prototyping environment

    If you are playing around with the idea of Redshift, want to understand its features & functionality or build a quick prototype, consider the AWS Free Tier trial version of AWS Redshift.

    With this option you get upto 750 hours of free usage per month, for two months.

     

  • Development/ Test/Staging environment(s)

    These environments do not require to be up and operational 24/7. Your best option is to use an On demand instance (Pay-as-you-go) pricing. With this option, you can pay by the hour and shut down instances when not in use, or when you do not need them any more, so you don’t get billed.

    If On-Demand instance is what you opt for, then you need to think of Amazon Redshift pricing in terms of Compute, Storage and Data Transfer as shown below. 

ComputeStorageData Transfer
Dense Compute (DC2)
Dense Storage (DS2)
RA3 with Redshift Managed Storage
Redshift Managed
Additional Backup
Redshift Spectrum
  • Production environment(s)

     You want these environments to be up and operational with very little downtime. So, Reserved Instances are the best for these environments.

    AWS lets you choose instances for a 1–3-year term, and oftentimes, they can end up being cheaper than the Pay-as-you-go option.


An important point to remember, with AWS Reserved Instances, you are charged for the instances, for the term you signed up for, regardless of if you use them or not. The best part, the price includes two additional copies of your data, and AWS takes care of availability, backup, durability, monitoring, security and maintenance.

For additional details on Amazon Redshift pricing for reserved nodes, click here.


By now you should have a high-level understanding on how to approach Amazon Redshift pricing. Since cost can change, I recommended using the AWS Pricing Calculator for Amazon Redshift to get the most up-to-date details on pricing. 

Recent Posts

Redshift helpful links

Amazon Redshift Documentation

This is the latest version of Redshift Documentation

Get started with Amazon Redshift Spectrum

Learn how to create external tables, schema and query data using Spectrum

Table of Contents

Interested in our services ?

email us at : info@obstkel.com

Copyright 2022 © OBSTKEL LLC. All rights Reserved