
In simple words, Amazon Redshift (or AWS Redshift) is a cloud-based Data Warehouse service from Amazon Web Services (AWS).
There are two terms to pay attention to here: Cloud and Data Warehouse.
Cloud, short for Cloud Computing, refers to computing resources provided by a third party. These computing resources can range from processing power, storage, applications to more complex SaaS, PaaS and IaaS.
A Data Warehouse is a repository to store large amounts of historical data meant for generating reports and performing analytics. If the data is in a structured or semi-structured format, then you can store it in Redshift.
Traditionally, most data warehouses have been hosted on premises. Amazon Redshift, as a fully managed cloud service, handles scaling, capacity provisioning, cluster backup, patching and upgrading for you. That makes a huge difference!
The benefits of Cloud Computing are immense; however, for the sake of simplicity let’s just say it saves you a lot of money and heartburn.
The Redshift Database is built upon PostgreSQL. PostgreSQL is a highly robust open-source Object-Relational database. It is popular with large companies like Apple, Instagram, Reddit, Skype and Twitch, to name just a few.
This does not mean that Redshift Database and PostgreSQL are the same.
Redshift is a tailored version of PostgreSQL for OLAP (Online Analytical Processing). PostgreSQL on the other hand is a general-purpose Object-Relational database meant for OLTP (Online Transaction Processing).
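To make the OLTP versus OLAP distinction concrete, here is a minimal sketch of the two query shapes. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs anywhere; it is not Redshift, and the `orders` table and its values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; this is NOT Redshift.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 120.0), (2, "west", 80.0), (3, "east", 50.0), (4, "west", 20.0)],
)

# OLTP-style query: fetch one complete row by its key.
oltp_row = conn.execute(
    "SELECT order_id, region, amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

# OLAP-style query: scan many rows, touch only two columns, aggregate.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(olap_rows)
```

The first query is typical of an application reading or updating a single record. The second is the kind of reporting query a data warehouse such as Redshift is tuned for.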
Let’s look at a few additional Redshift Database features that make it different.
Yes, you can use Redshift as a transactional database. However, that is not what it is meant for. Quite often, Amazon Redshift is mistaken for a NoSQL database. Redshift is not a NoSQL database.
Why do you think SQL queries against column-oriented tables are faster?
Because SQL queries for analytics normally touch only a few columns and rarely the entire row. With the data for a column stored together in the same blocks, there are fewer blocks to read or write.
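The block-count argument can be sketched with a little arithmetic. This is a simplified model, not how Redshift actually sizes its blocks: the function names and the `values_per_block` figure are assumptions chosen only to show the ratio.

```python
import math


def blocks_read_row_store(num_rows, num_columns, values_per_block):
    """Row layout: every block mixes all columns, so a scan reads every block."""
    total_values = num_rows * num_columns
    return math.ceil(total_values / values_per_block)


def blocks_read_column_store(num_rows, columns_queried, values_per_block):
    """Column layout: each column has its own blocks; read only the queried ones."""
    blocks_per_column = math.ceil(num_rows / values_per_block)
    return blocks_per_column * columns_queried


# Hypothetical table: 1,000,000 rows, 20 columns, 10,000 values per block.
row_blocks = blocks_read_row_store(1_000_000, 20, 10_000)
# The analytic query only needs 2 of the 20 columns.
col_blocks = blocks_read_column_store(1_000_000, 2, 10_000)

print(row_blocks, col_blocks)
```

Under these assumed numbers the column layout reads a tenth of the blocks, simply because 18 of the 20 columns are never touched.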
At its core, Amazon Redshift is made of clusters. A cluster in turn is made up of one or more nodes. These nodes can be categorized into leader nodes and compute nodes.
The leader node handles coordination and communication (the engine), while the compute nodes do the heavy lifting (the database).
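The division of labor can be pictured with a toy model. This is only a sketch of the idea, not Redshift's actual execution engine: the function names are hypothetical, and each inner list stands in for the slice of data held by one compute node.

```python
def compute_node_partial_sum(rows):
    """A compute node aggregates only the slice of data it stores."""
    return sum(rows), len(rows)


def leader_node_average(slices):
    """The leader hands work to each compute node and merges the partial results."""
    partials = [compute_node_partial_sum(node_rows) for node_rows in slices]
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


# Three hypothetical compute nodes, each holding part of one column.
slices = [[10, 20], [30], [40, 50, 60]]
average = leader_node_average(slices)
print(average)
```

The point of the split is that each node only scans its own data, and the leader combines small partial results instead of moving every row to one place.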
Say hello to Amazon Redshift Spectrum!
Redshift Spectrum is a feature of Amazon Redshift that lets you query structured and semi-structured data in files stored in Amazon S3. You do not even have to load the data into the Redshift database; the queries run against the files straight from Amazon S3.
Pricing with any AWS Service is based on a Pay-as-you-go model. Similar to your water or electricity bill, you only pay for services used for the duration of the usage, without the need to sign any long-term contracts.
AWS offers a lot of flexibility when it comes to Amazon Redshift pricing. The best way to make the most of it is to think in terms of environments: Sandbox/Prototyping, Development, Testing, Staging and Production.
If you are playing around with the idea of Redshift, want to understand its features & functionality or build a quick prototype, consider the AWS Free Tier trial version of AWS Redshift.
With this option you get up to 750 hours of free usage per month, for two months.
These environments do not need to be up and operational 24/7. Your best option is On-Demand instance (Pay-as-you-go) pricing. With this option, you pay by the hour and can shut down instances when they are not in use, or when you no longer need them, so you are not billed for idle time.
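To see why shutting instances down matters, here is a back-of-the-envelope sketch of the compute portion of an On-Demand bill. The hourly rate and node count are made-up placeholders, not real AWS prices, and storage or data transfer charges are left out.

```python
def on_demand_monthly_cost(hourly_rate, node_count, hours_per_day, days_per_month):
    """Compute-only cost: rate per node-hour times the node-hours actually run."""
    return hourly_rate * node_count * hours_per_day * days_per_month


# Hypothetical rate in USD per node-hour; check current AWS pricing for real figures.
HYPOTHETICAL_RATE = 0.25

# A 2-node development cluster left running around the clock for 30 days.
always_on = on_demand_monthly_cost(HYPOTHETICAL_RATE, 2, 24, 30)

# The same cluster running 8 hours a day on 22 working days.
office_hours = on_demand_monthly_cost(HYPOTHETICAL_RATE, 2, 8, 22)

print(always_on, office_hours)
```

With these assumed numbers, running the cluster only during working hours costs roughly a quarter of leaving it on all month.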
If On-Demand instance is what you opt for, then you need to think of Amazon Redshift pricing in terms of Compute, Storage and Data Transfer as shown below.
Compute | Storage | Data Transfer |
---|---|---|
Dense Compute (DC2), Dense Storage (DS2), RA3 with Redshift Managed Storage | Redshift Managed, Additional Backup | Redshift Spectrum |
You want these environments to be up and operational with very little downtime. So, Reserved Instances are the best for these environments.
AWS lets you reserve instances for a 1-year or 3-year term, and they often end up being cheaper than the Pay-as-you-go option.
An important point to remember: with AWS Reserved Instances, you are charged for the instances for the full term you signed up for, regardless of whether you use them or not. The best part is that the price includes two additional copies of your data, and AWS takes care of availability, backup, durability, monitoring, security and maintenance.
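A quick way to decide between the two options is to work out the break-even point. The sketch below assumes a one-year term and uses invented rates (not actual AWS prices); the "effective" reserved rate is the total reserved cost spread over every hour of the year.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def on_demand_yearly_cost(hourly_rate, hours_used):
    """On-Demand: pay only for the hours the node actually runs."""
    return hourly_rate * hours_used


def reserved_yearly_cost(effective_hourly_rate):
    """Reserved: pay for every hour of the term, whether used or not."""
    return effective_hourly_rate * HOURS_PER_YEAR


def break_even_hours(on_demand_rate, reserved_effective_rate):
    """Hours of use per year above which the reserved option is cheaper."""
    return reserved_yearly_cost(reserved_effective_rate) / on_demand_rate


# Hypothetical rates in USD per node-hour.
break_even = break_even_hours(on_demand_rate=0.25, reserved_effective_rate=0.125)
utilization = break_even / HOURS_PER_YEAR

print(break_even, utilization)
```

With these assumed rates, a node that runs more than half of the year is cheaper when reserved, which is why always-on environments tend to favor Reserved Instances.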
For additional details on pricing for reserved nodes, see the Amazon Redshift pricing page.
By now you should have a high-level understanding of how to approach Amazon Redshift pricing. Since costs can change, I recommend using the AWS Pricing Calculator for Amazon Redshift to get the most up-to-date details on pricing.