These, and other AWS services, generate machine data in the form of log files and time-series metrics that can be analyzed in real time to improve visibility and mitigate security risk. Amazon CloudWatch (a monitoring service for AWS Cloud resources and the applications on them) aggregates these logs for high-level monitoring and alerting in AWS workloads. AWS Partner Network (APN) Advanced Technology Partner and AWS Security Competency Partner Sumo Logic applies advanced analytics and machine learning to logs and time-series metrics, allowing organizations to gain real-time, full-stack visibility into cloud and hybrid environments.
Sumo Logic does not require instrumentation and easily captures machine data from AWS. It pulls log files from a variety of AWS services, including AWS CloudTrail and Amazon VPC Flow Logs, and centralized metrics from Amazon CloudWatch to provide continuous intelligence. This continuous intelligence can help companies accelerate the building, running, and securing of modern applications, and enables them to achieve greater visibility into their workloads compared to an on-premises environment. Sumo Logic also supports cross-functional collaboration by correlating data from multiple sources and showing it in the context of time-series metrics, thereby providing a common source of truth for monitoring and troubleshooting.
The Importance of Machine Data Analytics
Machine data is data generated automatically by the activity of a computer, application, or device. This machine-generated data often comes in the form of logs and can contain immensely valuable insights about the application or infrastructure and its health. The biggest problem with harnessing machine data is the sheer volume of data being generated. Raw machine data contains billions, if not trillions, of log and metric data points and is increasing in quantity at an exponential rate. The volume and velocity of this data growth can be difficult for single-tenant analytics solutions to handle. Additionally, machine data can come in a variety of formats and can be structured, unstructured, or semi-structured:
- Structured data refers to data that resides in a fixed field within a file, such as a field in a relational database or a time-series metric such as CPU utilization. Structured data can be easily stored, retrieved and analyzed.
- Unstructured data refers to data that cannot be easily classified, such as streaming data, videos, images, blogs, and wikis.
- Semi-structured data is a cross between the two. It lacks the strict data model of structured data but has tags or other markers that help you identify certain elements. Log files are a good example of semi-structured data.
With this in mind, it is important to use a data analytics platform optimized to handle all types of machine-generated data, including custom metrics.
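To make the idea of semi-structured data concrete, the following minimal Python sketch turns one log line into a structured record by using the markers embedded in it. The log format, the field names (host, level, status, msg), and the parse_log_line helper are hypothetical and chosen only for illustration; they do not represent the output of any particular AWS service or the way Sumo Logic parses data.

```python
import re

# Hypothetical log line: a leading timestamp followed by key=value markers.
LOG_LINE = '2017-03-01T12:00:05Z host=web-01 level=ERROR status=503 msg="upstream timed out"'

# A key=value pair whose value is either a quoted string or a run of non-space characters.
PAIR_RE = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_log_line(line):
    """Turn one semi-structured log line into a dict of named fields."""
    # The first whitespace-delimited token is treated as the timestamp.
    timestamp, _, rest = line.partition(" ")
    record = {"timestamp": timestamp}
    for key, raw, quoted in PAIR_RE.findall(rest):
        # Use the unquoted contents when the value was wrapped in double quotes.
        record[key] = quoted if raw.startswith('"') else raw
    return record


if __name__ == "__main__":
    print(parse_log_line(LOG_LINE))
```

Once the markers have been extracted, a field such as status can be filtered and aggregated like any structured value, while the free-text msg field remains available for search.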