What is Amazon Redshift, you ask?

In simple words, Amazon Redshift, or AWS Redshift, is a cloud-based Data Warehouse service from Amazon Web Services (AWS).

There are two terms to pay attention to here: Cloud and Data Warehouse.

Cloud, short for Cloud Computing, refers to computing resources provided by a third party over the internet. These resources range from processing power, storage and applications to complete service models such as SaaS, PaaS and IaaS.

A Data Warehouse is a repository to store large amounts of historical data meant for generating reports and performing analytics. If the data is in a structured or semi-structured format, then you can store it in Redshift.

Traditionally, most data warehouses were hosted on premises. Amazon Redshift, as a fully managed cloud service, handles scaling, capacity provisioning, cluster backup, patching and upgrading for you. That makes a huge difference!


The benefits of Cloud Computing are immense; however, for the sake of simplicity let’s just say it saves you a lot of money and heartburn.  

Redshift Database: What's the difference?

The Redshift Database is built upon PostgreSQL. PostgreSQL is a highly robust open-source Object-Relational database, popular with large companies like Apple, Instagram, Reddit, Skype and Twitch, to name just a few.

 

This does not mean that Redshift Database and PostgreSQL are the same.

 

Redshift is a tailored version of PostgreSQL for OLAP (Online Analytical Processing). PostgreSQL on the other hand is a general-purpose Object-Relational database meant for OLTP (Online Transaction Processing).

Let’s look at a few additional Redshift Database features that make it different.


  • Amazon Redshift is a relational database management system (RDBMS)

    Yes, you can use Redshift as a transactional database. However, that is not what it is meant for. Quite often, Amazon Redshift is mistaken for a NoSQL database. It is not one.

  • Redshift Database uses Columnar storage for database tables

    Instead of storing an entire row of data from a database table in a block, Columnar storage stores the values of a single column together in a block.

    For instance, consider a database table of customer addresses with 200 rows and 10 columns. Let’s assume the 5th column stores the ZIPCODE. In Columnar storage, all the data for the ZIPCODE column gets stored together in the same block(s). This improves both SQL execution performance and storage efficiency.

    Why do you think SQL executions against these tables are faster?

    Because SQL queries for analytics are normally limited to certain columns and rarely touch the entire row. With the data for a column stored together in a block, there are fewer blocks to read.

  • Not just an optimized relational database.

    At its core, Amazon Redshift is made up of clusters. A cluster in turn is made up of one or more nodes, which fall into two categories: leader nodes and compute nodes.
    The leader node handles coordination and communication (the engine), while the compute nodes do the heavy lifting (the database).

  • Redshift Database can query data it has not loaded as well

    You already know Amazon Redshift can handle semi-structured data in addition to the standard structured data, which is great! If you have a vast amount of data sitting in files in Amazon S3 and want to generate analytics from it, Redshift has a solution for you.

    Say hello to Amazon Redshift Spectrum!

    Redshift Spectrum is a feature of Amazon Redshift that lets you query structured and semi-structured data stored in files in Amazon S3. You do not even have to load the data into the Redshift database.

  • Plays nice with third-party tools

    Since Amazon Redshift is based on a relational database, it plays really well with other Databases, Data Integration, Reporting, Business Intelligence (BI), Analytics and SQL Client tools.
    Connection to the Redshift database can be established using ODBC or JDBC drivers.
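
To make the columnar storage point above concrete, here is a minimal Python sketch that counts how many blocks a query on a single column (say, ZIPCODE) has to read under each layout. The 200-row, 10-column table comes from the example above. The block capacity of 100 values and the function names are assumptions made up for illustration, and this is not how Redshift sizes its blocks internally.

```python
import math

ROWS, COLS = 200, 10      # table size from the customer address example
VALUES_PER_BLOCK = 100    # hypothetical block capacity, counted in values


def blocks_read_row_store(rows, cols, values_per_block):
    # Rows are stored whole, so a scan reads every block of the table,
    # no matter how few columns the query actually asks for.
    return math.ceil(rows * cols / values_per_block)


def blocks_read_column_store(rows, wanted_cols, values_per_block):
    # Each column lives in its own blocks, so only the blocks of the
    # requested columns are read.
    return wanted_cols * math.ceil(rows / values_per_block)


# A query that only needs the ZIPCODE column (1 of the 10 columns).
row_blocks = blocks_read_row_store(ROWS, COLS, VALUES_PER_BLOCK)
col_blocks = blocks_read_column_store(ROWS, 1, VALUES_PER_BLOCK)
print(f"row store: {row_blocks} blocks, column store: {col_blocks} blocks")
```

With these made-up numbers, the row layout reads all 20 blocks of the table, while the columnar layout reads only the 2 blocks that hold ZIPCODE values.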


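The split between leader and compute nodes described above can be pictured with a toy Python sketch. It is only an illustration of the idea: the leader hands each compute node a slice of the data, each compute node produces a partial result, and the leader merges the partials. The function names, the thread pool standing in for nodes, and the sample data are all assumptions, not Redshift internals.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node(rows):
    # A compute node works only on its own slice and returns a partial
    # result: the sum and the number of rows it saw.
    return sum(rows), len(rows)


def leader_node(slices):
    # The leader coordinates: one slice per compute node, run in parallel,
    # then the partial results are merged into the final answer.
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(compute_node, slices))
    total = sum(part_sum for part_sum, _ in partials)
    count = sum(part_count for _, part_count in partials)
    return total / count


sales = list(range(1, 13))                  # sample data: 1 through 12
slices = [sales[i::3] for i in range(3)]    # spread rows over 3 "nodes"
average = leader_node(slices)
print(average)
```

Here the average of 1 through 12 comes out as 6.5, the same answer a single machine would give, but each "node" only had to look at a third of the rows.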

Amazon Redshift pricing - reasonable and practical

Pricing with any AWS Service is based on a Pay-as-you-go model. Similar to your water or electricity bill, you only pay for services used for the duration of the usage, without the need to sign any long-term contracts.

AWS offers a lot of flexibility when it comes to Amazon Redshift pricing. The best way to take advantage of it is to think in terms of environments: Sandbox/Prototyping, Development, Testing, Staging and Production.

  • Sandbox/ Prototyping environment

    If you are playing around with the idea of Redshift, want to understand its features & functionality or build a quick prototype, consider the AWS Free Tier trial version of AWS Redshift.

    With this option you get up to 750 hours of free usage per month, for two months.

 

  • Development/Test/Staging environment(s)

    These environments do not need to be up and running 24/7. Your best option is On-Demand (Pay-as-you-go) pricing. With this option, you pay by the hour and can shut down instances when they are not in use, or when you no longer need them, so you don’t get billed.

    If On-Demand instance is what you opt for, then you need to think of Amazon Redshift pricing in terms of Compute, Storage and Data Transfer as shown below. 

    Compute: Dense Compute (DC2), Dense Storage (DS2), RA3 with Redshift Managed Storage
    Storage: Redshift Managed, Additional Backup
    Data Transfer: Redshift Spectrum

  • Production environment(s)

    You want these environments to be up and operational with very little downtime, so Reserved Instances are the best fit.

    AWS lets you reserve instances for a 1-year or 3-year term, and they often end up cheaper than the Pay-as-you-go option.



An important point to remember: with AWS Reserved Instances, you are charged for the instances for the full term you signed up for, whether or not you use them. The best part is that the price includes two additional copies of your data, and AWS takes care of availability, backup, durability, monitoring, security and maintenance.
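
To see why the choice between On-Demand and Reserved depends on how many hours an environment actually runs, here is a small Python sketch of the arithmetic. The hourly rates are placeholders chosen for easy math, not actual AWS prices, and the function names are made up for this example. Use the AWS Pricing Calculator for real figures.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def on_demand_cost(hourly_rate, hours_used):
    # On-Demand: you pay only for the hours the node is running.
    return hourly_rate * hours_used


def reserved_cost(effective_hourly_rate, term_years=1):
    # Reserved: you pay for every hour of the term, used or not.
    return effective_hourly_rate * HOURS_PER_YEAR * term_years


def break_even_hours(on_demand_rate, reserved_rate, term_years=1):
    # Hours of usage at which On-Demand costs as much as the reservation.
    return reserved_cost(reserved_rate, term_years) / on_demand_rate


ON_DEMAND_RATE = 0.25   # placeholder, dollars per node-hour
RESERVED_RATE = 0.125   # placeholder, effective dollars per node-hour

dev_hours = 8 * 5 * 52  # a dev cluster running 8 hours a day, 5 days a week
print("dev, on-demand:  ", on_demand_cost(ON_DEMAND_RATE, dev_hours))
print("prod, on-demand: ", on_demand_cost(ON_DEMAND_RATE, HOURS_PER_YEAR))
print("1-year reserved: ", reserved_cost(RESERVED_RATE))
print("break-even hours:", break_even_hours(ON_DEMAND_RATE, RESERVED_RATE))
```

With these placeholder rates, a part-time development cluster is cheaper On-Demand, while anything running more than half the year is cheaper reserved. The break-even point shifts with the real rates, so treat the numbers as illustrative only.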

For additional details on Redshift pricing for reserved nodes, see the Amazon Redshift pricing page on the AWS website.


By now you should have a high-level understanding of how to approach Amazon Redshift pricing. Since costs can change, I recommend using the AWS Pricing Calculator for Amazon Redshift to get the most up-to-date pricing details.

Redshift helpful links

Amazon Redshift Documentation

This is the latest version of Redshift Documentation

Get started with Amazon Redshift Spectrum

Learn how to create external tables, schema and query data using Spectrum

Copyright 2022 © OBSTKEL LLC. All rights Reserved