Get the report
MoreComplete visibility for DevSecOps
Reduce downtime and move from reactive to proactive monitoring.
October 9, 2019
In the first post of our three-part Amazon Redshift series, we covered what Redshift is and how it works. For the second installment, we’ll discuss how Amazon Redshift queries are analyzed and monitored. Before we go deep into gauging query performance on Redshift, let’s take a quick refresher on what Amazon Redshift is and what it does.
Amazon Redshift is a fully-managed petabyte-scale cloud-based data warehouse service designed for large scale data set storage and analysis. Part of the larger Amazon Web Services (AWS) platform, Redshift is built to handle massive amounts of data for processing, analysis, and migration.
Read more: What is Amazon Redshift?
When you’re set up and running on Amazon Redshift, it’s crucial to monitor the performance of your clusters and databases regularly. You can use tools and techniques to optimize clusters, but without performance benchmarks, you can’t accurately track if your efforts are working. You can begin benchmarking by understanding the performance data and metrics Redshift provides through the AWS console.
Monitoring query performance is essential in ensuring that clusters are performing as expected.
Redshift users can use the console to monitor database activity and query performance. Since the data is aggregated in the console, users can correlate physical metrics with specific events within databases simply.
If you are interested in AWS monitoring the physical performance of your clusters, including CPU Utilization and Network Throughput, these metrics and more can be monitored through Amazon CloudWatch. You can learn more about CloudWatch here.
On the Redshift console, performance metrics are organized by cluster.
For performance metrics on the cluster-level, simply go to the Redshift console and choose the cluster you want to monitor from the dropdown menu. The Cluster details page will give you a quick view of the status of your cluster, including DB Health and availability. For query-level metrics, choose the Queries tab.
On the Queries tab, you will see the query summary, a list of all running queries in a cluster, including recently run queries. This list can be sorted by ID, query run time, and status. Though limited, the information and metrics provided in the Queries tab are beneficial in analyzing specific queries on your cluster. Information here include:
Remember that query performance metrics are very limited on Amazon CloudWatch, so to correlate an irregularity, it’s very useful to be able to view events on the query level to see more details on what really happened during an event.
Because of the massive amounts of data in Redshift, it can take a long time to execute complex queries to retrieve information from your clusters. To get the most out of Redshift, your queries must be processed as fast as possible. To know how your cluster is performing, there are tools that you can use to analyze queries and pinpoint the cause of performance issues.
Typical Redshift users process large amounts of data so it’s crucial to regularly monitor how much space is left on your Redshift cluster. This allows you to incrementally upgrade and avoid being surprised by a full cluster.
There are three ways you can monitor disk space in Redshift.
First, via CloudWatch. Second, through the Performance tab on the AWS Console. And third, by querying Redshift directly.
Setting up alerts on CloudWatch can save you time by automating the monitoring of your Redshift storage. You can set up an alert for when the disk reaches a certain percentage of usage. Learn how to set up alerts here.
From the cluster details page, choose the Performance tab and scroll down to the graph that shows how much storage is currently in use.
Use the STV_PARTITIONS query to monitor your cluster’s disk utilization. This is only visible to superusers, so make sure to have the correct permissions set up beforehand.
The monitoring tools provided by Amazon are very useful in optimizing Redshift. However, these tools are limited and could be inflexible, which can result in you not getting the information you need.
Using third-party monitoring tools like Sumo Logic, which seamlessly integrates with Redshift, gives you access to a richer set of metrics that are updated in real-time. In the next installment of this series, we will go deeper into Sumo Logic and how you can use it to optimize your Redshift clusters further.
Reduce downtime and move from reactive to proactive monitoring.
Build, run, and secure modern applications and cloud infrastructures.
Start free trial