Data Warehousing Essentials: Introduction to Amazon Redshift

Pronay Ghosh
Published in Accredian
5 min read · Apr 6, 2022

by Pronay Ghosh and Hiren Rupchandani

  • In the previous article, we covered a high-level overview of the system design and features of Amazon Lex.
  • In this series, we take a high-level look at Amazon Redshift.
  • We will learn how to load data from Amazon S3.
  • After that, we will learn how to query nested data with Amazon Redshift Spectrum.
  • Then, we will learn how to configure manual workload management (WLM) queues.

What is Amazon Redshift?

  • Amazon Redshift is a fully managed, cloud-based data warehousing service.
  • It handles datasets ranging in size from a few hundred gigabytes to a petabyte or more.
  • The first step in building a data warehouse is to launch a set of compute resources known as nodes.
  • These nodes are organized into a cluster. Once the cluster is provisioned, you can load your data and run queries against it.

Features of Amazon Redshift

Amazon Redshift offers a wide range of features, including:

  • VPC support: The VPC support feature allows users to run Redshift within a VPC and control cluster access using the virtual networking environment.
  • Encryption: Data stored in Redshift can be encrypted at rest. Encryption is configured as a cluster setting.
  • SSL: SSL encryption secures the connections between clients and Redshift.
  • Scalable: The number of nodes in your Redshift data warehouse can be scaled up or down as needed with a few clicks. This also lets you expand storage capacity without sacrificing performance.
  • Cost-effective: Amazon Redshift is a less expensive alternative to traditional data warehousing. There are no upfront costs and no long-term commitments, and pricing is on-demand.

Amazon Redshift Setup:

Step 1: Setting up the AWS Redshift Cluster

  • As the first step, log in to your AWS account and create a Redshift cluster using the steps below.
  • Sign in to the AWS Management Console and open the Amazon Redshift console.
  • Using the Region option in the upper right corner of the screen, choose the region where the cluster will be created.
  • To start creating the cluster, click the Launch Cluster button.
  • The Cluster Information page appears.
  • Fill in the relevant information and click the Continue button to go to the review page.
  • A confirmation page appears.
  • Click the Close button to finish. The cluster now appears in the Clusters list.
  • Select the cluster from the list to review its Cluster Status information.
  • The status of the cluster is displayed on the page.
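
The console steps above can also be scripted. The following is a minimal sketch, assuming the boto3 library is installed and AWS credentials are already configured. The helper names (build_cluster_request, launch_and_wait), the node type dc2.large, the database name dev, and all identifiers in the usage note are illustrative assumptions, not values from this walkthrough.

```python
from typing import Any, Dict


def build_cluster_request(
    cluster_id: str,
    master_username: str,
    master_password: str,
    node_type: str = "dc2.large",  # assumed node type, pick one that fits your workload
    number_of_nodes: int = 1,
    db_name: str = "dev",  # assumed database name
    port: int = 5439,  # Amazon Redshift's default port
) -> Dict[str, Any]:
    """Assemble the keyword arguments for the Redshift create_cluster call."""
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    request: Dict[str, Any] = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "MasterUsername": master_username,
        "MasterUserPassword": master_password,
        "DBName": db_name,
        "Port": port,
    }
    if number_of_nodes == 1:
        request["ClusterType"] = "single-node"
    else:
        # NumberOfNodes is only sent for multi-node clusters.
        request["ClusterType"] = "multi-node"
        request["NumberOfNodes"] = number_of_nodes
    return request


def launch_and_wait(request: Dict[str, Any], region: str) -> str:
    """Create the cluster, wait until it is available, and return its status."""
    import boto3  # imported lazily so build_cluster_request works without boto3

    client = boto3.client("redshift", region_name=region)
    client.create_cluster(**request)
    # Block until the cluster reports that it is available.
    client.get_waiter("cluster_available").wait(
        ClusterIdentifier=request["ClusterIdentifier"]
    )
    described = client.describe_clusters(
        ClusterIdentifier=request["ClusterIdentifier"]
    )
    return described["Clusters"][0]["ClusterStatus"]
```

A call such as launch_and_wait(build_cluster_request("examplecluster", "awsuser", "<your password>"), "us-west-2") would then mirror the Launch Cluster flow. Note that it creates a billable cluster.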

Step 2: Configuring the Security Groups

  • On the navigation pane of the Amazon Redshift console, click Clusters.
  • Choose the cluster you want and make sure that the Configuration tab is selected.
  • Select the Security tab.
  • Once the Security group page has loaded, click the Inbound tab.
  • Select the Edit option.
  • Fill in the fields as described below, then click the Save button.
  • Select Custom TCP Rule from the drop-down menu (TCP stands for Transmission Control Protocol).
  • In the Port Range field, type the same port number that was used to launch the cluster. Amazon Redshift’s default port is 5439.
  • Select Custom IP, then enter 0.0.0.0/0 in the Source field. This range allows connections from any IP address, so narrow it to your own address range for anything beyond a test setup.
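
The inbound rule described above can be expressed in code as well. This is a rough sketch, assuming boto3 and configured AWS credentials. The helper names (build_inbound_rule, authorize_inbound) and the security group ID in the usage note are hypothetical. The rule builder validates the CIDR range with the standard ipaddress module before anything is sent to AWS.

```python
import ipaddress
from typing import Any, Dict

REDSHIFT_DEFAULT_PORT = 5439


def build_inbound_rule(cidr: str, port: int = REDSHIFT_DEFAULT_PORT) -> Dict[str, Any]:
    """Build one TCP inbound permission for a single port and IPv4 range."""
    # IPv4Network raises ValueError for malformed input; strict=False
    # normalizes a host address such as 203.0.113.7/24 to its network.
    network = ipaddress.IPv4Network(cidr, strict=False)
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": str(network)}],
    }


def authorize_inbound(group_id: str, rule: Dict[str, Any], region: str) -> None:
    """Attach the rule to an existing security group."""
    import boto3  # imported lazily so build_inbound_rule works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=[rule])
```

For example, authorize_inbound("sg-0123456789abcdef0", build_inbound_rule("0.0.0.0/0"), "us-west-2") would reproduce the console rule. Passing a narrower range such as your office network keeps the cluster closed to other addresses.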

Step 3: Connecting to the Redshift Cluster

  • A SQL client tool is used to connect to the cluster.
  • Redshift works with SQL client tools that use PostgreSQL-compatible JDBC or ODBC drivers.
  • Download the JDBC driver from this URL.
  • For 64-bit computers, download the ODBC driver from this link.
  • To obtain the connection string, follow the steps below.
  • In the navigation pane of the Amazon Redshift console, choose Clusters.
  • Select the cluster you want to work with and go to the Configuration page.
  • The JDBC URL appears under Cluster Database Properties. Make a note of the URL.
  • To connect the cluster to SQL Workbench, follow the steps below.
  • Open SQL Workbench.
  • Select File, then click Connect window.
  • Select Create a new connection profile and fill in the needed information, such as a name for the profile.
  • Click Manage Drivers. The Manage Drivers dialog box appears.
  • Click the Create a new entry button and fill in the relevant information.
  • Leave the Classname and Sample URL fields empty.
  • Click the OK button.
  • Select a driver from the drop-down menu.
  • Paste the JDBC URL you copied into the URL field.
  • Fill in the Username and Password fields with the appropriate information.
  • Check the Autocommit option, then click Save profile list.
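
To make the shape of the JDBC URL from the console concrete, here is a small sketch that builds and parses it. It assumes the standard jdbc:redshift://host:port/database form. The helper names and the endpoint host shown in the usage note are made up for illustration, so substitute the values from your own cluster.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

JDBC_PREFIX = "jdbc:redshift://"
REDSHIFT_DEFAULT_PORT = 5439


class RedshiftEndpoint(NamedTuple):
    host: str
    port: int
    database: str


def build_jdbc_url(endpoint: RedshiftEndpoint) -> str:
    """Format an endpoint as jdbc:redshift://host:port/database."""
    return f"{JDBC_PREFIX}{endpoint.host}:{endpoint.port}/{endpoint.database}"


def parse_jdbc_url(url: str) -> RedshiftEndpoint:
    """Split a Redshift JDBC URL back into host, port, and database."""
    if not url.startswith(JDBC_PREFIX):
        raise ValueError(f"not a Redshift JDBC URL: {url!r}")
    # Drop the leading "jdbc:" so urlsplit sees "redshift://host:port/database".
    parts = urlsplit(url[len("jdbc:"):])
    database = parts.path.strip("/")
    if not parts.hostname or not database:
        raise ValueError(f"URL is missing a host or database name: {url!r}")
    # Fall back to the default port when the URL does not specify one.
    return RedshiftEndpoint(parts.hostname, parts.port or REDSHIFT_DEFAULT_PORT, database)
```

For instance, parse_jdbc_url("jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev") returns the host, port, and database as separate fields. The same three values are what a PostgreSQL-compatible driver would need in place of the full JDBC string.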

Conclusion:

  • In this article, we covered a high-level overview of the system design and features of Amazon Redshift.
  • In the next article, we will learn how to load data from Amazon S3.

Final Thoughts and Closing Comments

There are some vital points many people fail to understand while pursuing their Data Science or AI journey. If you are one of them and are looking for a way to fill those gaps, check out the certification programs provided by INSAID on their website. If you liked this story, I recommend the Global Certificate in Data Science & AI, which covers the foundations, machine learning algorithms, and deep neural networks (basic to advanced).

Follow us for upcoming articles related to Data Science, Machine Learning, and Artificial Intelligence.

Also, do give us a Clap👏 if you find this article useful. Your encouragement inspires us and helps us create more cool stuff like this.

Pronay Ghosh
Accredian

Data Scientist at Aidetic | Former Data Science researcher at The International School of AI and Data Science