
June 22, 2021 By Rishi Divate and Angad Singh

Using pre-built Monitors to proactively monitor your application infrastructure

Platform architects, SREs, developers and DevOps staff for mission-critical modern apps know that being notified in real time, when or before critical conditions occur, can make a massive difference in end-user digital experiences and in meeting a 99.99% availability objective.

Last year, we released Sumo Logic Monitors, which allow you to set robust and configurable alerting policies so that you are notified about critical changes or issues affecting your production applications. Monitors introduced many improvements on top of our existing alerting capabilities, including the ability to send notifications via email or through a number of outbound connections supported by Sumo Logic, including ServiceNow, Jira, PagerDuty, Opsgenie, Slack, and Microsoft Teams.

We are excited to announce the availability of 150 out-of-the-box Sumo Logic Monitors that notify you of impending or immediate problems in your infrastructure and application components, such as databases, web servers and messaging systems.

These pre-configured monitors were built around well-known best practices for monitoring the technologies they support. They leverage both logs and metrics data to give you complete coverage of operational issues that arise with the technology in question. Now, developers can focus on running the critical application code that your business depends on, instead of managing tools and infrastructure.

We support out-of-the-box monitors for key infrastructure and application components such as AWS, Kubernetes, the Apache web server, Apache Kafka, NGINX, NGINX Ingress, NGINX Plus, MongoDB, MySQL, PostgreSQL, Redis, SQL Server, and more.

Apache Alerts Screenshot

The following sections detail a few examples of our monitors in action.


Infrastructure

The infrastructure layer sits at the base of your application stack. Misconfigurations or faulty application code can cause infrastructure resources to fail or to perform below expected capacity, so it is critical to alert on these conditions when or before they happen to avoid degrading the customer experience.

Here are some examples of our out-of-the-box infrastructure alerts.

  • Amazon EC2 - High CPU/Memory/Disk Utilization: Fires when the average CPU, memory or disk utilization of an AWS EC2 instance within a 5-minute interval is equal to or more than 85% (configurable).

  • High Percentage of Failed AWS Lambda Requests: Fires when more than 5% (configurable) of all Lambda requests within a 5-minute interval have failed. This indicates your serverless applications are not performing as expected and may be causing a degraded customer experience.

  • Multiple Terminated Kubernetes Pods/Containers: Fires when we determine that more than 5 (configurable) pods or containers have been terminated because they either ran out of memory, errored out, or cannot run.

  • Kube Node Not Ready: Fires when a Kubernetes node is not ready.
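
To make the threshold semantics of the EC2 utilization alert concrete, here is a minimal Python sketch of an "average within an interval is equal to or more than 85%" check. It is an illustration only, not how Sumo Logic Monitors are implemented; the function name `should_fire` and the sample values are hypothetical.

```python
from statistics import mean


def should_fire(samples, threshold=85.0):
    """Return True when the average of the samples collected in one
    evaluation interval is equal to or more than the threshold.

    `samples` is a list of utilization percentages (hypothetical data);
    `threshold` mirrors the configurable 85% default described above.
    """
    if not samples:
        # No data in the interval: this sketch chooses not to fire.
        return False
    return mean(samples) >= threshold


# Hypothetical per-minute CPU utilization for one 5-minute interval.
window = [82.0, 88.5, 91.0, 86.0, 84.5]
print(should_fire(window))
```

Averaging over the interval means a single short spike above 85% does not trigger the alert on its own; the whole interval has to run hot.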


Databases

Databases are critical components of web app infrastructure, in traditional three-tier architectures and modern distributed apps alike. Because applications use databases to both store and read data, alerting on database performance, backup, replication and availability conditions is essential.

Here are some examples of our out-of-the-box database alerts.

  • Large number of slow queries: Fires when there are 5 or more slow queries (configurable) executing on a MySQL database cluster within a 5-minute interval.

  • Too Many Connections: Fires when connections to a PostgreSQL database cluster reach 90% of the allowed maximum.

  • Sharding Errors: Fires when there are errors in MongoDB’s sharding operations.
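
The slow-query alert is a count-in-window condition. The sketch below shows one way such a condition could be evaluated in Python, assuming each slow query is represented by a timestamp. The function name `slow_query_alert` and the timestamps are hypothetical, and this is not Sumo Logic's implementation.

```python
from datetime import datetime, timedelta


def slow_query_alert(timestamps, now, window=timedelta(minutes=5), min_count=5):
    """Return True when at least `min_count` slow queries fall inside the
    window that ends at `now`.

    `timestamps` holds one datetime per slow query (hypothetical data);
    the defaults mirror the "5 or more within 5 minutes" rule above.
    """
    cutoff = now - window
    recent = [t for t in timestamps if cutoff < t <= now]
    return len(recent) >= min_count


now = datetime(2021, 6, 22, 12, 0, 0)
# Six hypothetical slow queries; the last one is older than 5 minutes.
seen = [now - timedelta(seconds=s) for s in (10, 60, 120, 200, 290, 400)]
print(slow_query_alert(seen, now))
```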

Web Servers

For many organizations, web servers and the applications they serve are the primary way they interface with their community, customers, partners, or prospects. Therefore, it's critical to ensure end users are having a good experience by monitoring web server responses and operations. It's equally important to detect when your websites are being targeted by attackers, so you can make sure malicious requests are blocked going forward.

Here are some examples of our out-of-the-box web server alerts that are applicable to Apache, NGINX, NGINX Ingress or NGINX Plus web servers:

  • High Error Rates: These alerts fire when more than 5% (configurable) of all HTTP requests have either 5xx or 4xx response error codes.

  • Critical Error Messages: Fires when error messages appear in web server error log files.

  • Access from Highly Malicious Sources: Fires when a web server is accessed from known highly malicious IP addresses. This is done by correlating inbound client IP addresses from web server logs with Sumo Logic’s Threat Intelligence database powered by CrowdStrike.
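
The High Error Rates condition is a ratio check: the share of requests with a 4xx or 5xx status must exceed 5%. Here is a minimal Python sketch under the assumption that the status codes for one evaluation interval are already available as a list. The function names and sample data are hypothetical and do not represent Sumo Logic's query logic.

```python
def error_rate(status_codes):
    """Fraction of requests whose HTTP status is a 4xx or 5xx code."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 400 <= code <= 599)
    return errors / len(status_codes)


def high_error_rate(status_codes, threshold=0.05):
    """Return True when more than `threshold` of all requests failed.

    The default mirrors the configurable 5% described above.
    """
    return error_rate(status_codes) > threshold


# Hypothetical interval: 94 successful requests, 4 server errors, 2 not-founds.
codes = [200] * 94 + [500] * 4 + [404] * 2
print(error_rate(codes), high_error_rate(codes))
```

Note the strict comparison: exactly 5% of requests failing does not fire in this sketch, matching the "more than 5%" wording.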

Streaming Platforms

Streaming platforms allow producer software applications to send data via messages that consumer software applications can consume. Development teams use these platforms for a variety of use cases, including monitoring user activity, sending notifications, and concurrently processing streams of incoming data such as financial transactions.

Given the size, complexity and importance of streaming platforms, you need to understand the conditions that affect the operation of these clusters and their downstream impact on critical applications and services.

  • High Broker Disk Utilization: Fires when a disk on an Apache Kafka broker node is more than 85% (configurable) full.

  • Under Replicated Partitions: This alert fires when there are under-replicated partitions on a given Apache Kafka broker node.

  • Consumer Lag: This alert fires when an Apache Kafka consumer is lagging by 30 minutes or more.
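
As an illustration of a time-based consumer lag check, the Python sketch below compares the timestamp of the newest produced message with the timestamp of the newest message the consumer has processed. How the actual monitor measures lag is not described here, so this definition of lag, the function name `consumer_is_lagging` and the sample times are all assumptions.

```python
from datetime import datetime, timedelta


def consumer_is_lagging(last_produced, last_consumed, max_lag=timedelta(minutes=30)):
    """Return True when the consumer trails the producer by `max_lag` or more.

    Lag is assumed to be the gap between the newest produced message and
    the newest consumed message; the default mirrors the 30-minute rule above.
    """
    return (last_produced - last_consumed) >= max_lag


# Hypothetical timestamps for one topic partition.
produced = datetime(2021, 6, 22, 12, 0, 0)
consumed = datetime(2021, 6, 22, 11, 25, 0)
print(consumer_is_lagging(produced, consumed))
```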

Additional Resources

For more great DevOps and security focused reads, check out the Sumo Logic blog.

Download the Sumo Logic Continuous Intelligence Report that quantitatively defines the state of the modern application stack and the shift in technology used by enterprises adopting Cloud and DevSecOps during the COVID-19 global pandemic.
