
Blog

DevSecOps and log analysis: improving application security

Blog

AWS Lambda in Java 8: examples and instructions

Blog

How Australia's Privacy Legislation Amendment impacts cybersecurity

Blog

11 unique insights into SLOs and reliability management

Blog

Defragging database security in a fragmented cloud world

Blog

2022 Sumo Logic blog highlights, curated just for you

Blog

Learn about the meaning and value of cloud-native from experts at Atchison Technology, Qumu, Microsoft, and Techstrong Group

Blog

How female leaders find their path to career success

Blog

No-code vs. low-code and near-no-code security automation

Blog

Kubernetes DevSecOps vulnerabilities and best practices

Blog

How to improve your microservices architecture security

Blog

What is database security?

Blog

Too many tools? Best practices for planning and implementing a successful IT tool consolidation strategy

Blog

Fusing career paths with interests and passion

Blog

New AWS services? No problem! How Sumo Logic is evolving to meet your AWS observability needs

Blog

Prepare your IT systems for Black Friday with best practices and strategies from Ulta Beauty

Blog

Detection notes: In-memory Office application token theft

Blog

How to design a microservices architecture with Docker containers

Blog

How to take DevSecOps to the next level: A conversation with SecOps and DevOps leaders from NielsenIQ, ARA Security and Techstrong Group

Blog

How to build your DevOps team with Agile culture

Blog

Why Sumo Logic is betting its future on OpenTelemetry

Blog

Communicating the value of Sumo Logic in EMEA

Blog

Get AWS Lambda data at your fingertips faster with the new Telemetry API

Blog

How to decide on self-hosted vs managed Apache Airflow

Blog

10 things you should know about using AWS S3

Blog

Insights from Dolby and AWS CISOs on the challenges and opportunities in orchestrating the defense of modern applications

Blog

If and how to return to the office: Data-driven decision making

Blog

Building the case for Sumo Logic in the German market

Blog

How to track AWS costs with the AWS Cost Explorer app for Sumo Logic

Blog

2022 Gartner Magic Quadrant for SIEM: Sumo Logic positioned as a Visionary for the second year in a row

Blog

Open source documentation will improve collaboration

Blog

Datadog alternatives for cloud security and application monitoring

Blog

Understand the dependency between applications and infrastructure with Sumo Logic

Blog

Beat the challenges of supply chain vulnerability

Blog

Digital experiences — Our second nature

Blog

Find threats: Cloud credential theft on Windows endpoints

Blog

How Sumo Logic helps you comply with the CERT-In Directions 2022

Blog

FedRAMP: The journey to cloud secure operations

Blog

New capabilities: Sumo Logic expands Real User Monitoring (RUM)

Blog

How to drive better decision-making with reliability management

Blog

How to get maximum value from Service Level Objectives (SLOs)

Blog

Improve your application monitoring by reducing overhead of managing and updating alert rules

Blog

SOAR Market Guide 2022: What does the Gartner research say?

Blog

Simplify infrastructure and reduce costs with VPC Flow Logs ingest via Amazon Kinesis Data Firehose into Sumo Logic

Blog

How Sumo Logic is enhancing team pride, connection, insight and elevation

Blog

Five reasons to attend Illuminate 2022

Blog

Eight best practices for a successful cloud migration strategy

Blog

Why this employee believes diverse backgrounds make for better team collaboration

Blog

DevOps automation: Best practices and benefits

Blog

Discover the business impact of digital customer experience from E-Commerce and DevOps leaders

Blog

How a happy Sumo Logic customer became an employee

Blog

The Sumo Logic East Coast Tour Stops in Boston for AWS re:Inforce

Blog

SRE Pulse survey: Get the latest insights on the evolving role and employee impact

Blog

Monitorama 2022: the good, the bad and the beautiful (Part 2)

Blog

Get better visibility into DevOps performance in one place with Atlassian integrations

Blog

Use new Cloud SIEM Entity Groups to make threat response more efficient

Blog

Monitorama 2022: the good, the bad and the beautiful (Part 1)

Blog

How to gain Kubernetes visibility in just a few clicks

Blog

Learn how application monitoring helps lay the foundation for operational success

Blog

How one employee found his voice in a global organization

Blog

SIEM vs SOAR: Evaluating security tools for the modern SOC

Blog

Deconstructing AIOps: Is it even real?

Blog

How to increase allyship within the LGBTQIA+ community

Blog

Follina - CVE-2022-30190

Blog

Sumo Logic named a challenger in 2022 Gartner Magic Quadrant for APM and Observability

Blog

Why end-to-end visibility is critical to secure your apps in a serverless world

Blog

Sumo Logic expands Cloud SIEM security coverage for Microsoft Azure

Blog

AAPI month helps to understand and dispel Asian stereotypes at work

Blog

Join the Sumo Logic Security Team at RSA Conference 2022

Blog

Best practices to collect, customize and centralize Node.js logs

Blog

Former Navy serviceman now trains customers to be successful with Sumo Logic

Blog

Is your penetration testing weak? Catch hackers at your backdoor with Sumo Logic

Blog

How Sumo SREs manage and monitor SLOs as Code with OpenSLO

Blog

How an HR leader aligns business and people strategy to make a difference

Blog

Are we sure that SOAR is at a crossroads?

Blog

How SAP built a Dojo Community of Practice to support a cultural shift to DevOps

Blog

Unlocking self-service monitoring with the Sensu Integration Catalog

Blog

Sumo Logic celebrates Earth Day 2022 with Planeteer-led Earth Week

Blog

Weaponizing paranoia: developing a threat detection strategy

Blog

Why you need both SIEM and SOAR to improve SOC efficiencies and increase effectiveness

Blog

What it means to be ‘in it’ with our customers every single day

Blog

How to get started with OpenTelemetry auto-instrumentation for Java

Blog

Mind your Single Sign-On (SSO) logs

Blog

Okta evolving situation: Am I impacted?

Blog

Take the very first State of SRE Survey from DevOps Institute

Blog

Sumo Logic all-in with AWS

Blog

How to monitor RabbitMQ logs and metrics with Sumo Logic

Blog

Five women leaders share advice to empower the next generation of women in STEM

Blog

Want to improve collaboration and reduce incident response time? Try Cloud SOAR War Room

Blog

How to monitor ActiveMQ logs and metrics

Blog

Ship software faster by removing bottlenecks and keep work flowing

Blog

Overwhelmed: why SOAR solutions are a game changer

Blog

Minimize downtime, and improve performance for Verizon 5G Edge applications with Sumo Logic

Blog

How to monitor Amazon Kinesis

Blog

SRE: How the role is evolving

Blog

Cloud-native SOAR and SIEM solutions pave the road to the modern SOC

Blog

Adopt user analytics to accelerate security investigations

Blog

Make the most of your observability data with the Data Volume app

Blog

Monitoring AWS Spot instances using Sumo Logic

Blog

Monitoring your AWS environment for vulnerabilities and threat detection

Blog

Accelerating software delivery through observability at two very different organizations

Blog

Database monitoring with Sumo Logic and OpenTelemetry-powered distributed tracing

Blog

How teams are breaking down data silos to improve software delivery

Blog

Host and process metrics - monitoring beyond apps

Blog

Log4Shell CVE-2021-44228

Blog

Accelerate security operations today and tomorrow with automation and AI

Blog

User experience is a focus of Sumo Logic Observability innovations

Blog

Announcing new Sumo Logic AWS security Quick Start integrations

Blog

How Cloud SOAR helps teams boost security during cloud migration

Blog

How to streamline Windows monitoring for better security

Blog

Sumo Logic extends monitoring for AWS Fargate powered by AWS Graviton2 processors

Blog

How using Cloud SIEM dashboards and metrics for daily standups improves SOC efficiency

Blog

Extend your DevOps analysis to CircleCI and GitLab data

Blog

Why and how to monitor AWS EKS

Blog

How Sumo Logic monitors unit economics to improve cloud cost-efficiency

Blog

An open letter to Sumo Logic enthusiasts

Blog

The role of APM and distributed tracing in observability

Blog

Three Cloud SIEM innovations that improve team collaboration, tailor SOC workflows, and encourage customization

Blog

Top six Amazon S3 metrics to monitor

Blog

OpenTelemetry: the future of Sumo Logic Observability

Blog

Sumo Logic recognized as a leader in the GigaOm Radar Report for Security Orchestration, Automation, and Response (SOAR)

Blog

Illuminate 2021 - embracing open standards for big picture observability

Blog

How Cloud SOAR mitigates the cybersecurity skill gap problem in modern SOCs

Blog

Analyzing human layer risks with Tessian

Blog

Supply chain security, compliance, and privacy for cloud-native ecosystems

Blog

Sumo Logic extends monitoring for AWS Lambda functions powered by AWS Graviton2 processors

Blog

Introducing Sensu Plus

Blog

Troubleshooting outages at 3 AM with Alert Response

Blog

XDR, what is it? Does everyone agree? What is real impact vs. hype?

Blog

Extending observability to app infrastructure

Blog

Learn how to modernize security operations at Illuminate 21

Blog

Building a cloud-native SOC: fantasy or reality?

Blog

5 reasons to attend Illuminate

Blog

Securing critical infrastructure

Blog

5 reasons why security automation won't replace skilled security professionals

Blog

Supervised active intelligence - the next level of security automation

Blog

How to increase & justify your cybersecurity budget

Blog

Uncovering the power of Cloud SOAR’s Open Integration Framework

Blog

Integrating MITRE ATT&CK with Cloud SOAR to optimize SecOps and incident response

Blog

How to improve MTTD and MTTR with SOAR

Blog

SOAR doesn't replace humans - It makes them more efficient

Blog

All you need to know about HAProxy log format

Blog

Announcing New York State Department of Financial Services Attestation

Blog

How to implement cybersecurity automation in SecOps with SOAR (7 simple steps)

Blog

How the cloud-native journey is changing the CISO’s role

Blog

Monitoring Cassandra vs Redis vs MongoDB

Blog

How to use Cloud SOAR's search query bar to optimize workflow processes

Blog

Sumo Logic brings full coverage to modern IT and SecOps workflows with ServiceNow

Blog

Ready, set, SOAR! The road to next-gen SOC with SOAR security

Blog

5 trends shaping the cybersecurity landscape in 2021

Blog

Cost of cyber attacks vs. cost of cybersecurity in 2021

Blog

Why proactive threat hunting will be a necessity in 2021

Blog

Ransomware attacks 2.0: How to protect your data with SOAR

Blog

The state of SOAR: What to expect in 2021

Blog

Monitoring HAProxy logs and metrics with Sumo Logic

Blog

Modernizing SOC and security

Blog

Sumo Logic Red Hat Marketplace Operator

Blog

Disrupt your SOC or be disrupted

Blog

How SMART are your security program KPIs?

Blog

Flexible Incident Response playbooks for any situation

Blog

Global Confidence: Using crowdsourcing and machine learning to scale your SOC resources

Blog

Our vision for Cloud SOAR and the future

Blog

Sumo Logic completes full stack observability with Real User Monitoring capabilities

Blog

Announcing new Cloud Security Monitoring & Analytics apps to surface the most relevant security insights from AWS GuardDuty, WAF, and Security Hub data

Blog

Deep dive into Security Orchestration, Automation and Response (SOAR)

Blog

How to monitor NGINX deployments with Sumo Logic

Blog

How to use Kubernetes to deploy Postgres

Blog

Modern security ops with Zscaler and Sumo Logic

Blog

How to troubleshoot Apache Cassandra performance using metrics and logs in debugging

Blog

Building a modern SOC

Blog

Hunting for threats in multi-cloud and hybrid cloud environments

Blog

How to monitor Redis logs and metrics

Blog

Legacy vs. modern cloud SOAR-powered SOC

Blog

Queryless vs. query-less. Faster insights and better observer experience with span analytics

Blog

How to monitor Cassandra database clusters

Blog

Analyzing Office 365 GCC data with Sumo Logic

Blog

Optimize value of Cloudtrail logs with infrequent tier

Blog

Monitoring Apache Kafka clusters with Sumo Logic

Blog

Accelerate hybrid threat protection using Sumo Logic Cloud SIEM powered by AWS

Blog

Sumo Logic named a Visionary in the 2021 Gartner Magic Quadrant for SIEM for the first time

Blog

5 important DevOps monitoring metrics

Blog

The role of threat hunting in modern security

Blog

Using pre-built Monitors to proactively monitor your application infrastructure

Blog

Threat hunting with Cloud SIEM

Blog

Introducing new cloud security monitoring & analytics apps

Blog

CMMC compliance made easy with Sumo Logic

Blog

How to monitor application logs

Blog

Ensure cloud security with these key metrics

Blog

Introducing Sensu

Blog

Introducing Sumo Logic Cloud SIEM powered by AWS

Blog

Sumo Logic + DFLabs: Cloud SIEM combined with SOAR automates threat detection and incident response

Blog

Looking to disrupt your legacy SOC? Attend The Modern SOC Summit to find out how!

Blog

Distributed tracing vs. application monitoring

Blog

What is threat intelligence?

Blog

Detecting users crawling the MITRE ATT&CK stages in your AWS environment

Blog

Cloud SIEM accelerates modernizing security operations across Asia Pacific

Blog

Using Telegraf to collect infrastructure performance metrics

Blog

Accelerate incident resolution by benchmarks-enriched on-call contexts

Blog

Tail your logs with Tailing Sidecar Operator

Blog

Extend AWS observability beyond CloudWatch

Blog

Explore NGINX usage, performance, and transactions to increase customer experience

Blog

Sumo Logic joins AWS to accelerate Amazon CloudWatch Metrics collection

Blog

Why Prometheus isn’t enough to monitor complex environments

Blog

Microservices vs. serverless architecture

Blog

Sumo Logic extends support for OpenTelemetry to AWS Lambda

Blog

Sumo Logic extends its APM to browser

Blog

Sumo Logic to accelerate modernization of security operations with proposed acquisition of DFLabs

Blog

Efficiently monitor the state of Redis database clusters

Blog

Sumo Logic continues to expand public sector footprint

Blog

Forrester TEI study: Sumo Logic’s Cloud SIEM delivers 166 percent ROI over 3 years and a payback of less than 3 months

Blog

Service map & dashboards provide insight into health and dependencies of microservice architecture

Blog

Observability vs. monitoring: what's the difference?

Blog

Analyze your tracing data any way you want with Sumo search query language

Blog

Analyze JMX to better assess the health of your Java applications

Blog

How the COVID-19 pandemic has changed IT & security

Blog

Daemons in Cloud SOAR: proactively enhancing SecOps

Blog

How to dynamically auto-steer your traffic to multi-CDN or multiple data-centers

Blog

Sumo Logic achieves FedRAMP-Moderate authorization

Blog

Automating the potential workflows with Sumo Logic APIs

Blog

Case Study: Genesys’ journey to the cloud and DevOps excellence

Blog

Building autocomplete with ANTLR and CodeMirror

Blog

Code42 launches a new app in the Sumo Logic open source partner ecosystem

Blog

Best practices to monitor Cloudflare performance

Blog

Dark theme is here

Blog

SEGA Europe and Sumo Logic: integrating security across clouds

Blog

How to monitor Amazon DynamoDB performance

Blog

Embracing open source data collection

Blog

Improve your security posture by focusing on velocity, visibility, and vectors

Blog

Automate your SIEM with Sumo Logic in 7 clicks

Blog

How Clorox leverages Cloud SIEM across security operations, threat hunting, and IT Ops

Blog

How to monitor Amazon Aurora RDS logs and metrics

Blog

Everywhere in One Place: OpenTelemetry and Observability in Sumo Logic

Blog

Recommendations for monitoring SolarWinds supply chain attack with Sumo Logic Cloud SIEM

Blog

Automatic correlation of FireEye red team tool countermeasure detections

Blog

Application Performance Management for Microservices with Sumo Logic

Blog

How to Monitor Amazon Redshift

Blog

Building your modern cloud SIEM

Blog

Pondering Dogs and Observability

Blog

Monitoring Microsoft SQL Best Practices

Blog

Onboard your tracing data to Sumo Logic even faster with AWS Distro for OpenTelemetry (now in preview)

Blog

Monitor AWS Lambda functions created from container images

Blog

Sumo Logic partners with AWS to monitor Amazon EKS Distro

Blog

6 Signals that you need SOAR [Infographic]

Blog

How Sumo Logic’s Cloud SIEM Uses MITRE ATT&CK to Develop Content

Blog

Insights from the 5th annual Continuous Intelligence Report

Blog

Full VPC traffic visibility with AWS Network Firewall and Sumo Logic

Blog

How to Monitor Akamai Logs

Blog

Data security a major concern in healthcare: How to prevent data breaches with SOAR

Blog

The Dramatic Intersection of AI, Data and Modern Life

Blog

Creepy or Unjust: The State of Data in the U.S.

Blog

How to Monitor MongoDB Logs

Blog

Automated Tech Perpetuates the Digital Poorhouse

Blog

IoT cybersecurity in healthcare: How can it be improved with SOAR?

Blog

SOC vs. CSIRT - understanding the difference

Blog

SOAR guide #3: How to maximize your SOAR investment

Blog

How to analyze IIS logs for better monitoring

Blog

Introducing the Sumo Logic Observability suite with distributed tracing - a cornerstone of cloud-native APM

Blog

Security automation vs. security orchestration - what's the difference?

Blog

Illuminate 2020 Keynote: The What, Where and Why of Issues that Affect Reliable Customer Experiences

Blog

National Cyber Security Awareness month 2020 - The importance of SOAR

Blog

Modern App Reliability with Sumo Logic Observability

Blog

Building better software faster - the key to successful digital transformation

Blog

A New Framework for Modern Security

Blog

Building Better Apps: The Open Source Partner Ecosystem

Blog

PostgreSQL vs MySQL

Blog

Gartner’s 2020 SOAR Market Guide in a nutshell

Blog

Kubernetes Dashboard

Blog

NGINX Log Analyzer

Blog

SOAR guide #2: Taking security operations to the next level

Blog

Leveraging logs to better secure cloud-native applications

Blog

How security automation and orchestration helps you work smarter and improve Incident Response

Blog

Logging and Monitoring Kubernetes

Blog

Get Started with Kubernetes

Blog

Using Data to Find the Mysterious Centrist Voter

Blog

How Goibibo uses Sumo Logic to get log analytics at cloud scale

Blog

SOAR guide: The fundamentals of Security Orchestration, Automation and Response

Blog

Configuring the OpenTelemetry Collector

Blog

4 ways to distinguish a top SOAR platform

Blog

Can We Rely on Data to Predict the Outcome of the 2020 Election?

Blog

Integrating lessons learned into Incident Response

Blog

Kubernetes vs. Docker: What Does it Really Mean?

Blog

6 key steps to building a modern SOC

Blog

9 key components of incident and forensics management

Blog

How Cloud SOAR helps higher education institutions prevent cyber attacks

Blog

Why measuring SOC-cess matters - Using metrics to enhance your security program

Blog

Emerging issues in cybersecurity for higher education institutions

Blog

Top 5 Reasons to Attend Illuminate Virtual 2020

Blog

5 common Security Orchestration, Automation and Response (SOAR) use cases

Blog

SOAR to the sky: Discover the power of next-gen progressive automation

Blog

Simplifying log management with logging as a service

Blog

Detecting Windows Persistence

Blog

5 reasons why SOAR is a must-have technology for every high-functioning MSSP

Blog

AWS Observability: Designed specifically for AWS environments

Blog

Observability: The Intelligence Economy has arrived

Blog

How to Use the New Sumo Logic Terraform Provider for Hosted Collectors

Blog

Sumo Logic Achieves FedRAMP-Moderate “In Process”

Blog

Five critical components of SOAR technology

Blog

Distributed tracing analysis backend that fits your needs

Blog

Deploying AWS Microservices

Blog

Gartner SOAR Magic Quadrant: The best of Cloud SOAR is yet to come

Blog

Sumo Logic and ZeroFOX Join Forces to Improve Visibility and Protect your Public Attack Surface

Blog

3 core pillars of a SOAR Solution

Blog

Announcing new Sumo Logic dashboards

Blog

Rethinking Modern SOC Workflow

Blog

Gartner analysis: Why SOAR is the technology for the future

Blog

Reduce AWS bills with aws-nuke

Blog

What Data Types to Prioritize in Your SIEM

Blog

Cloud SIEM: Getting More Out of Your Threat Intelligence - 3 Use Cases for IOCs

Blog

5 Ways SOAR improves collaboration within a SOC team

Blog

Building a Security Practice Powered by Cloud SIEM

Blog

The automation hype is real for SOC teams: unpacking the Dimensional Research “2020 State of SecOps and Automation” report

Blog

Distributed Tracing & Logging - Better Together

Blog

Defense in depth: DoublePulsar

Blog

How SOAR improves the performance of a SOC team

Blog

Improving Application Quality through Log Analysis

Blog

Domain Hijacking Impersonation Campaigns

Blog

Continuous Intelligence for Atlassian tools and the DevSecOps Lifecycle (Part 2)

Blog

The Path of an Outlaw, a Shellbot Campaign

Blog

The power of new-age playbooks in Incident Response

Blog

Gaining Visibility Into Edge Computing with Kubernetes & Better Monitoring

Blog

Why cloud-native SIEM is vital to closing the security skills gap

Blog

How SOAR improves Standard Operating Procedures (SOP)

Blog

Standard Operating Procedures as big piece of the cyber Incident Response puzzle

Blog

The value of a stolen account. A look at credential stuffing attacks.

Blog

The Difference Between IaaS, PaaS, and SaaS

Blog

Artificial intelligence and machine learning in cybersecurity

Blog

SOAR takes over where detection starts: Understanding the role of SOAR in Standard Operating Procedures

Blog

A Million Dollar Knob: S3 Object Lifecycle Optimization

Blog

The difference between playbooks and runbooks in Incident Response

Blog

The 7 Essential Metrics for Amazon EC2 Monitoring

Blog

Continuous Intelligence for Atlassian tools and the DevSecOps Lifecycle (Part 1)

Blog

Monitoring MySQL Performance Metrics

Blog

SOAR trends in 2020: What does the future look like for SOAR?

Blog

MySQL Log File Location

Blog

Independent Survey Reveals: Continuous Intelligence Demand Grows as Organizations Shift to Real-time Business

Blog

Service Mesh Comparison: Istio vs. Linkerd

Blog

Profiling "VIP Accounts" Part 2

Blog

The importance of evidence preservation in incident response

Blog

7 Key DevOps Principles

Blog

How to Build a DevOps Pipeline

Blog

Utilizing Cloud SOAR to manage IT and OT and strengthen the cybersecurity posture

Blog

NoSQL-based stacks exposed to the Internet

Blog

Spam In the Browser

Blog

Adopting Distributed Tracing: Finding the Right Path

Blog

Profiling “VIP Accounts” Part 1

Blog

Best Practices for Logging in AWS Lambda

Blog

How SOAR improves EDR in SOC processes

Blog

Sumo Logic and NIST team up to secure energy sector IoT

Blog

AWS Lambda Monitoring - what to keep an eye on with serverless

Blog

Remote Admin Tools (RATs): The Swiss Army Knives of Cybercrime

Blog

The cost of cybersecurity solutions vs. the cost of cyber attacks

Blog

How SOAR helps PSPs effectively comply with PSD2 regulations

Blog

The New Opportunity

Blog

How to scale Prometheus monitoring

Blog

Limitless analytics for all your data, at a price that fits your budget

Blog

“Fiel-ding Good” - Three great ways to enrich AWS logs in Sumo Logic

Blog

Triage fraudulent transactions with Cloud SOAR

Blog

5 questions to ask before investing in a SOAR solution

Blog

Sumo Logic Recognized as Data Analytics Solution of the Year Showcasing the Power of Continuous Intelligence

Blog

How SOAR helps protect remote workers from cyber threats

Blog

Best Practices for Data Tagging, Data Classification & Data Enrichment

Blog

COVID-19 crisis management guide for business leaders

Blog

PowerShell and ‘Fileless Attacks’

Blog

Monitoring with Prometheus vs Grafana: Understanding the Difference

Blog

Ensure a secure and reliable Zoom video conferencing service

Blog

How to Monitor Amazon ECS

Blog

Addressing the lack of qualified cybersecurity professionals - What can we do about it?

Blog

Top 5 security challenges with Zoom video conferencing

Blog

COVID-19 Guide for Security Professionals

Blog

4 core functions of a Security Orchestration, Automation and Response (SOAR) solution

Blog

Where will SOAR go in the next 5 years?

Blog

Love In The Time Of Coronavirus

Blog

Sumo Logic Announces Continuous Intelligence for Atlassian Tools

Blog

How SOAR can foster efficient SecOps in modern SOCs

Blog

Alcide kAudit Integrates with Sumo Logic

Blog

Work from home better with secure and reliable enterprise service

Blog

FedRAMP Joint Authorization Board (JAB) Prioritizes Sumo Logic for P-ATO

Blog

How to manage cyber fraud with SOAR

Blog

Best Practices for CSOs to Navigate Today’s Uncertain World

Blog

The top 5 challenges faced by Security Operations Centers

Blog

How does Sumo Logic’s Cloud SOAR compare to other SOAR solutions?

Blog

Amazon VPC Traffic Mirroring

Blog

Automation in cybersecurity: Benefit or a threat?

Blog

What is Amazon ECS?

Blog

CASB vs Cloud SIEM for SaaS Security

Blog

SOAR for Success: How to properly measure KPIs for security operations

Blog

In A Fast Changing World, Peer Benchmarks Are A GPS

Blog

What is SOAR? A comprehensive guide on how SOAR emerged in the cybersecurity world

Blog

A Healthy Outlook on Security From RSA Conference 2020

Blog

5 key benefits of a SOAR solution for MSSPs

Blog

Securing IaaS, PaaS, and SaaS in 2020 with a Cloud SIEM

Blog

A New Integration between Sumo Logic and ARIA Cybersecurity Solutions

Blog

Pre-RSA Twitter Poll: 3 Interesting Observations on SOC, SIEM and Cloud

Blog

How to implement Incident Response automation the right way

Blog

SIEM Yara Rules

Blog

How to Secure Office365 with Cloud SIEM

Blog

How We Understand Monitoring

Blog

Securing your SaaS apps in 2020: 3 pillars you can’t neglect

Blog

How to Monitor EKS Logs

Blog

The total business impact of Sumo Logic Cloud SIEM

Blog

How Data Analytics Support the CDM Program

Blog

Tracking Systems Metrics with collectd

Blog

Understanding the Apache Access Log: View, Locate and Analyze

Blog

AWS offers 175 services now. Should you be adopting many of them now?

Blog

Can You Tell Debug Data and BI Data Apart?

Blog

What is Amazon Elastic Kubernetes Service (EKS)?

Blog

Top 5 Cybersecurity Predictions for 2020

Blog

The Ultimate Guide to Windows Event Logging

Blog

How to View Logs in Kubectl

Blog

All The Logs For All The Intelligence

Blog

Vagrant vs. Docker: Which Is Better for Software Development?

Blog

NGINX vs Apache

Blog

Sumo Logic and Amazon Web Services Continue to Help Businesses Thrive in the Cloud Era

Blog

The New Sumo Logic AWS Security Quick Start

Blog

New Sumo Logic Apps with support for AWS Hierarchies

Blog

Announcing Sumo Logic Archive Intelligence Service now in Beta

Blog

Monitor Cloud Run for Anthos with Sumo Logic

Blog

How to Monitor Redshift Logs with Sumo Logic

Blog

AWS S3 Monitoring with Sumo Logic

Blog

Top 10 SIEM Best Practices

Blog

Multi-Cloud is Finally Here!

Blog

Data Privacy Is Our Birthright - national cybersecurity month

Blog

Context is Everything - How SPS Commerce uses context to embrace complexity

Blog

What is AWS S3

Blog

How Informatica Confidently Migrates to Kubernetes with Sumo Logic

Blog

How Doximity solved their high operational overhead of their Elastic stack with Sumo Logic

Blog

5 business reasons why every CIO should consider Kubernetes

Blog

How to Monitor Amazon Redshift

Blog

5 Tips for Preventing Ransomware Attacks

Blog

We Live in an Intelligence Economy - Illuminate 2019 recap

Blog

Cloud Scale Correlation and Investigation with Cloud SIEM

Blog

Service Levels––I Want To Buy A Vowel

Blog

Serverless Computing for Dummies: AWS vs. Azure vs. GCP

Blog

How to Secure Kubernetes Using Cloud SIEM?

Blog

Serverless Computing Security Tips

Blog

10 Modern SIEM Use Cases

Blog

Challenges of Monitoring and Troubleshooting in Kubernetes Environments

Blog

More Innovations from Sumo Logic that Harness the Power of Continuous Intelligence for Modern Enterprises

Blog

Monitoring Slack workspaces with the Sumo Logic app for Slack

Blog

A 360 degree view of the performance, health and security of MongoDB Atlas

Blog

Monitor your Google Anthos clusters with the Sumo Logic Istio app 

Blog

Sumo Logic’s World Class Partner and Channel Ecosystem Experiences Triple Digit Growth

Blog

6 Observations from the 2019 CI Report: State of Modern Applications and DevSecOps In The Cloud

Blog

What is PCI DSS compliance?

Blog

Objectives-Driven Observability

Blog

Peering Inside the Container: How to Work with Docker Logs

Blog

Security Strategies for Mitigating IoT Botnet Threats

Blog

How to Read, Search, and Analyze AWS CloudTrail Logs

Blog

Serverless vs. Containers: What’s the Same, What’s Different?

Blog

How to Monitor Syslog Data with Sumo Logic

Blog

Know Your Logs: IIS vs. Apache vs. NGINX Logs

Blog

Multi-Cloud Security Myths

Blog

What is Amazon Redshift?

Blog

See You in September at Illuminate!

Blog

Sumo Logic adds Netskope to its Security and Compliance Arsenal

Blog

How to SIEMplify through Cloud SIEM

Blog

Illuminate 2019 Stellar Speaker Line-up Will Help Attendees See Business and the World Differently Through Data Analytics

Blog

How to Monitor Fastly CDN Logs with Sumo Logic

Blog

How to Monitor NGINX Logs with Sumo Logic

Blog

To SIEM or not to SIEM?

Blog

Cloud Security: What It Is and Why It’s Different

Blog

How to Monitor Fastly Performance

Blog

Gartner is fully in the cloud. Are you?

Blog

How to monitor NGINX logs

Blog

Why you need to secure your AWS infrastructure and workloads?

Blog

What is AWS CloudTrail?

Blog

6 steps to secure your workflows in AWS

Blog

Machine Data is Business Intelligence for Digital Companies

Blog

Launching the AWS security threats benchmark

Blog

3 key takeaways on Cloud SIEM from Gartner Security & Risk Management Conference 2019

Blog

Sumo Logic provides real-time visibility, investigation and response of G Suite Alerts

Blog

What is NGINX?

Blog

What is Fastly CDN?

Blog

Industry Analysts Recognizing Cloud Analytics Brings Wave of Disruption to the SIEM Market

Blog

Now FedRAMP Ready, Sumo Logic Empowers Public Organizations

Blog

The Super Bowl of the Cloud

Blog

The Cloud SIEM market is validated by Microsoft, Google, and AWS

Blog

Clearing the Air: What Is Cloud Native?

Blog

Key Metrics to Baseline Cloud Migration

Blog

Typing a useReducer React hook in TypeScript

Blog

What is IoT Security?

Blog

Recycling is for Cardboard, not Analytics Tools

Blog

How to Monitor Apache Web Server Performance

Blog

IIS Logs Location

Blog

Software visibility is the key to innovation

Blog

What is Apache? In-Depth Overview of Apache Web Server

Blog

The Why Behind Modern Architectures

Blog

Control Your Data Flow with Ingest Budgets

Blog

From SRE to QE - Full Visibility for the Modern Application in both Production and Development

Blog

Sumo Logic Cert Jams Come to Japan

Blog

Best Practices with AWS GuardDuty for Security and Compliance

Blog

People-driven Documentation

Blog

Improve Alert Visibility and Monitoring with Sumo Logic and Opsgenie

Blog

What is AWS GuardDuty?

Blog

Platforms All The Way Up & Down

Blog

What is Serverless Architecture?

Blog

How Sumo Logic Maps DevOps Topologies

Blog

Endpoint Security Analytics with Sumo Logic and Carbon Black

Blog

RSAC 19 Partner Cam: Sumo Logic & PagerDuty Deliver Seamless SecOps

Blog

AWS 101: An Overview of Amazon Web Services

Blog

Building Cross-platform Mobile Apps

The way we all experience and interact with apps, devices, and data is changing dramatically. End users demand that apps are responsive, stable, and offer the same user experience no matter which platform they are using. To build these well, many developers consider creating cross-platform apps. Although building a separate native app per platform is a preferred approach for mass market consumer apps, there are still a lot of situations where it makes more sense to go cross-platform. In this post I’ll look at the most popular strategies a developer faces when it comes to building a mobile app, and some tools that help you build well.

Mobile Web Apps

This is probably the easiest way onto a mobile device. Mobile web apps are hosted on a remote server and built using identical technologies to desktop web apps: HTML5, JavaScript and CSS. The primary difference is that they are accessed via the mobile device’s built-in web browser, which may require you to apply responsive web design principles to ensure that the user experience is not degraded by the limited screen size on mobile, and that can be costly to build and maintain. The cost of applying responsive design principles to a web site may be a significant fraction of developing a mobile app.

Native Mobile Apps

Native apps are mainly developed using the device’s out-of-the-box SDK. This is a huge advantage, as you have full access to the device’s API, features, and inter-app integration. However, it also means you need to learn Java to build apps for Android, Objective-C for iOS, and C# for Windows phones. Whether you are a single developer or working in a company across multiple teams and skill sets, learning to code in multiple languages is costly and time-consuming. And most of the time, not all features will be available on every platform.

Cross-Platform Mobile Apps

Cross-platform apps have something of a reputation for not being competitive with native apps, but we continue to see more and more world-class apps using this strategy. Developers only have to maintain a single code base for all platforms. They can reuse the same components across different platforms, and most importantly, developers can still access the native API via native modules. Below are some tools that support building cross-platform apps.

PhoneGap

Owned by Adobe, PhoneGap is a free resource and handy for translating HTML5, CSS and JavaScript code. Once the app is ready, the community will help in reviewing it, and it is supported on all major platforms, including BlackBerry.

Xamarin.Forms

With a free starter option, Xamarin.Forms is a great tool for C# and Ruby developers to build cross-platform apps, with the option of accessing each native platform’s API. A wide store of components helps you achieve the goal faster. Xamarin has created a robust cross-platform mobile development platform that’s been adopted by big names like Microsoft, Foursquare, IBM, and Dow Jones.

Unity 3D

This tool is mainly focused on building game apps, and it is very useful when graphics are the most important detail. This cross-platform mobile development tool goes beyond simple translation. After developing your code in UnityScript or C#, you can export your games to 17 different platforms, including iOS, Android, Windows, Web, PlayStation, Xbox, Wii and Linux.

When it comes to building an app, whether cross-platform or not, views and thoughts always differ. My preference is cross-platform for one main reason: it is less time-consuming. That is critical because I can then focus on adding new features to the app, or on building another one.

About the Author

Mohamed Hasni is a Software Engineer focusing on end-to-end web and mobile development and delivery. He has deep experience in building line-of-business applications for large-scale enterprise deployments.

Blog

Sumo Logic Expands into Japan to Support Growing Cloud Adoption

In October of last year, I joined Sumo Logic to lead sales and go-to-market functions with the goal of successfully launching our newly established Japan region in Tokyo. The launch was warmly received by our customers, partners, prospects and peers in the Japanese market, and everyone walked away from the event optimistic about the future and hungry for more! It certainly was an exciting time not only for the company, but for me personally, and as I reflect over the past few months here, I wanted to share a little bit about why the company’s launch in Japan came at a very strategic and opportune time, as well as why Sumo Logic is a great market fit.

Market Opportunity

In terms of overall IT spend and market size, Japan remains the second largest market in the world behind the U.S. in enterprise technology. A large part of that is because of service spending versus traditional hardware and software. For years, Japan had been a half step or more behind in cloud technology innovation and adoption, but that has since changed. Now, Japan is experiencing a tsunami of cloud adoption, with major influence from Amazon Web Services (AWS), which has aggressively invested in building data centers in Japan over the past several years. The fact that AWS began heavily investing in the Japanese technology market was a strong indication to us at Sumo Logic that, as we continue to expand our global footprint, the time was finally right to capitalize on this market opportunity.

Sumo Logic Opportunity

However, market opportunity aside, the nature of our SaaS machine data analytics platform and the services we provide across operations, security and the business was a perfect fit for the needs of innovating Japanese enterprises. I’ve been here in Tokyo for over 30 years, so I feel (with confidence) that it was our moment to shine in Japan. From a sales perspective, we’re very successful with a land-and-expand approach where we start with only a small subset of the business, and then gradually we grow to other areas (operations, security and business) as we continue to deliver great customer experiences that demonstrate long-term value and impact. That level of trust building and attentiveness we provide to our global customer base is very typical of how Japanese enterprises like to conduct business. In other words, the core business model and approach of Sumo Logic are immediately applicable to the Japanese market. Anyone with experience in global IT will understand the simple but powerful meaning of this: Sumo Logic's native affinity with Japan is an enormous anomaly. And Japan can be a very unforgiving market. It’s not a place where you want to come with half-baked products or a smoke-and-mirrors approach. Solid products, solutions and hard work are, on the other hand, highly respected.

Vertical Focus

As I’ve mentioned above, the Japan market is mostly enterprise, which is a sweet spot for Sumo Logic, and there’s also a heavy influence of automotive and internet of things (IoT) companies here. In fact, four of the world’s largest automotive companies are headquartered in Japan, and their emerging autonomous driving needs, in particular, align directly with the real-time monitoring, troubleshooting and security analytics capabilities that are crucial for modern innovations around connected cars and IoT, which both generate massive amounts of data. Customers like Samsung SmartThings, Sharp and Panasonic leverage our platform for DevSecOps teams that want to visualize, build, run and secure that data. The connected car today has become less about the engine and more about the driver experience, which is 100 percent internet-enabled.

Japan is also one of the two major cryptocurrency exchange centers in the world, which is why financial services, especially fintech, bitcoin and cryptocurrency companies, is another focus vertical for Sumo Logic Japan. Our DevSecOps approach and cloud-native multi-tenant platform provide massive mission-critical operations and security analytics capabilities for crypto companies. Most of these financial services companies are struggling to stay on top of increasingly stringent regulatory and data requirements, and one of the biggest use cases for these industries is compliance monitoring. Japan is regulatory purgatory, and so our customers look to us to help automate parts of their compliance checks and security audits.

Strong Partner Ecosystem

Having a strong partner ecosystem was another very important piece of our overall GTM strategy in Japan. We were very fortunate to have forged an early partnership with AWS Japan that led to an introduction to one of their premium consulting partners, Classmethod, the first regional partnership with AWS APN. The partnership is already starting to help Japanese customers maximize their investment in the Sumo Logic platform by providing local guided deployment, support and storage in AWS. In addition, Sumo Logic provides the backbone for Classmethod’s AWS infrastructure to provide the continuous intelligence needed to serve and expand their portfolio of customers. Going forward, we’ll continue to grow our partner ecosystem with the addition of service providers, telecoms, MSPs and MSSPs for security to meet our customers’ evolving needs.

Trusted Advisor

At the end of the day, our mission is to help our customers continue to innovate and provide support in areas where they most need it — economically visualizing data across their modern application stacks and cloud infrastructures. We’re in a position to help all kinds of Japanese customers across varying industries modernize their architectures. Japanese customers know that they need to move to the cloud and continue to adopt modern technologies. We’re in the business of empowering our customers to focus on their core competencies while they leave the data component to us. By centralizing all of this disparate data into one platform, they can better understand their business operations, be more strategic and focus on growing their business. We’ve gone beyond selling a service to becoming both “data steward” and trusted data advisor for our customers. Japanese business is famous for its organic partnering model — think of supply chain management, and so on. Sumo Logic’s core strategy of pioneering machine data stewardship is a perfect extension of this to meet the rapidly evolving needs of the digital economy in Japan. Now that we have a local presence with a ground office and support team, we can deliver a better and more comprehensive experience to new and existing customers, like Gree and OGIS-RI, and look forward to continued growth and success in this important global market.

Additional Resources

Read the press release for more on Sumo Logic's expansion into Japan
Download the white paper on cloud migration
Download the ‘State of Modern Apps & DevSecOps in the Cloud’ report

AWS

January 31, 2019

Blog

Recapping the Top 3 Talks on Futuristic Machine Learning at Scale By the Bay 2018

As discussed in our previous post, we recently had the opportunity to present some interesting challenges and proposed directions for data science and machine learning (ML) at the 2018 Scale By the Bay conference. While the excellent talks and panels at the conference were too numerous to cover here, I wanted to briefly summarize three talks in particular that I found to represent some really interesting (to me) directions for ML on the Java virtual machine (JVM).

Talk 1: High-performance Functional Bayesian Inference in Scala
By Avi Bryant (Stripe) | Full Video Available Here

Probabilistic programming lies at the intersection of machine learning and programming languages, where the user directly defines a probabilistic model of their data. This formal representation has the advantage of neatly separating conceptual model specification from the mechanics of inference and estimation, with the intention that this separation will make modeling more accessible to subject matter experts while allowing researchers and engineers to focus on optimizing the underlying infrastructure. Rainier is an open-source library in Scala that allows the user to define their model and do inference in terms of monadic APIs over distributions and random variables. Some key design decisions are that Rainier is “pure JVM” (ie, no FFI) for ease of deployment, and that the library targets single-machine (ie, not distributed) use cases but achieves high performance via the nifty technical trick of inlining training data directly into dynamically generated JVM bytecode using ASM.

Talk 2: Structured Deep Learning with Probabilistic Neural Programs
By Jayant Krishnamurthy (Semantic Machines) | Full Video Available Here

Machine learning examples and tutorials often focus on relatively simple output spaces: Is an email spam or not? Binary outputs: Yes/No, 1/0, +/-, … What is the expected sale price of a home? Numerical outputs: $1M, $2M, $5M, … (this is the Bay Area, after all!) However, what happens when we want our model to output a more richly structured object? Say that we want to convert a natural language description of an arithmetic formula into a formal binary tree representation that can then be evaluated; for example, “three times four minus one” would map to the binary expression tree “(- (* 3 4) 1)”. The associated combinatorial explosion in the size of the output space makes “brute-force” enumeration and scoring infeasible. The key idea of this approach is to define the model outputs in terms of a probabilistic program (which allows us to concisely define structured outputs), but with the probability distributions of the random variables in that program being parameterized in terms of neural networks (which are very expressive and can be efficiently trained). This talk consisted mostly of live-coding, using an open-source Scala implementation which implements a monadic API for a function from neural network weights to a probability distribution over outputs.

Talk 3: Towards Typesafe Deep Learning in Scala
By Tongfei Chen (Johns Hopkins University) | Full Video Available Here

For a variety of reasons, the most popular deep learning libraries such as TensorFlow & PyTorch are primarily oriented around the Python programming language. Code using these libraries consists primarily of various transformation or processing steps applied to n-dimensional arrays (ndarrays). It can be easy to accidentally introduce bugs by confusing which of the n axes you intended to aggregate over, mis-matching the dimensionalities of two ndarrays you are combining, and so on. These errors will occur at run time, and can be painful to debug. This talk proposes a collection of techniques for catching these issues at compile time via type safety in Scala, and walks through an example implementation in an open-source library. The mechanics of the approach are largely based on typelevel programming constructs and ideas from the shapeless library, although you don’t need to be a shapeless wizard yourself to simply use the library, and the corresponding paper demonstrates how some famously opaque compiler error messages can be made more meaningful for end-users of the library.

Conclusion

Aside from being great, well-delivered talks, several factors made these presentations particularly interesting to me. First, all three had associated open-source Scala libraries. There is of course no substitute for actual code when it comes to exploring the implementation details and trying out the approach on your own test data sets. Second, these talks shared a common theme of using the type system and API design to supply a higher-level mechanism for specifying modeling choices and program behaviors. This can both make end-user code easier to understand as well as unlock opportunities for having the underlying machinery automatically do work on your behalf in terms of error-checking and optimization. Finally, all three talks illustrated some interesting connections between statistical machine learning and functional programming patterns, which I found interesting as a longer-term direction for trying to build practical machine learning systems.

Additional Resources

Learn how to analyze Killer Queen game data with machine learning and data science with Sumo Logic Notebooks
Interested in working with the Sumo Logic engineering team? We’re hiring! Check out our open positions here
Sign up for a free trial of Sumo Logic
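To make the monadic-API theme of Talks 1 and 2 more concrete, here is a toy, sampling-based distribution monad in Scala. It is deliberately not the Rainier or probabilistic-neural-program API (see the linked videos for those); the Dist type and its normal constructor are illustrative assumptions only.

```scala
import scala.util.Random

// A toy, sampling-based distribution monad. This is NOT the Rainier API or the
// probabilistic neural program library from the talks -- just a minimal sketch
// of the monadic style they describe, where a model is composed from random
// variables with map/flatMap.
final case class Dist[A](sample: Random => A) {
  def map[B](f: A => B): Dist[B]           = Dist(rng => f(sample(rng)))
  def flatMap[B](f: A => Dist[B]): Dist[B] = Dist(rng => f(sample(rng)).sample(rng))
}

object Dist {
  def normal(mu: Double, sd: Double): Dist[Double] =
    Dist(rng => mu + sd * rng.nextGaussian())
}

object MonadicModelSketch extends App {
  // A tiny generative model: a prior over a slope plus observation noise,
  // combined into a derived quantity using a for-comprehension.
  val model: Dist[Double] = for {
    slope <- Dist.normal(0.0, 1.0) // prior over a parameter
    noise <- Dist.normal(0.0, 0.1) // observation noise
  } yield slope * 4.0 + noise      // toy "prediction" for an input x = 4

  val rng = new Random(42)
  println(Seq.fill(5)(model.sample(rng)).mkString(", "))
}
```

Even in this toy form, the for-comprehension shows the appeal of the style: the model reads as a declarative composition of random variables, while the mechanics of sampling (or, in the real libraries, inference) stay behind the abstraction.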

Blog

Sumo Logic Experts Reveal Their Top Enterprise Tech and Security Predictions for 2019

Blog

SnapSecChat: Sumo Logic's CSO Explains the Next-Gen SOC Imperative

Blog

How to Analyze Game Data from Killer Queen Using Machine Learning with Sumo Logic Notebooks

Blog

The Insider’s Guide to Sumo Cert Jams

What are Sumo Cert Jams?

Sumo Logic Cert Jams are one- and two-day training events held in major cities all over the world to help you ramp up your product knowledge, improve your skills and walk away with a certification confirming your product mastery. We started doing Cert Jams about a year ago to help educate our users around the world on what Sumo can really do and give you a chance to network and share use cases with other Sumo Logic users. Not to mention, you get a t-shirt. So far, we’ve had over 4,700 certifications from 2,700+ unique users across 650+ organizations worldwide. And we only launched the Sumo Cert Jam program in April! If you’re still undecided, check out this short video where our very own Mario Sanchez, Director of the Sumo Logic Learn team, shares why you should get the credit and recognition you deserve!

Currently there are four certifications for Sumo Logic:

Pro User
Power User
Power Admin
Security User

These are offered in a choose-your-own-adventure format. While everyone starts out with the Pro User certification to learn the fundamentals, you can take any of the remaining exams depending on your interest in DevOps (Power User), Security, or Admin. Once you complete Sumo Pro User, you can choose your own path to certification success. For a more detailed breakdown of the different certification levels, check out our web page, or our Top Reasons to Get Sumo Certified blog.

What’s the Value?

Customers often ask me in one-on-one situations what the value of certification is, and I tell them that we have seen significant gains in user understanding, operator usage and search performance once we get users certified.

Our first Cert Jam in Delhi, India, with members from the Bed, Bath and Beyond team showing their certification swag!

First, there’s the ability to rise above “Mere Mortals” (those who haven’t been certified) and write better and more complex queries. From parsing to correlation, there’s a significant increase among certified users taking Pro (Level 1), Power User (Level 2), Admin (Level 3) and Security. Certified users are taking advantage of more Sumo Logic features, not only getting more value out of their investment, but also creating more efficient, performant queries. And from a more general perspective, once you know how to write better queries and dashboards, you can create the kind of custom content that you want. When it comes to monitoring and alerting, certified users are more likely to create dashboards and alerts to stay on top of what’s important to their organizations, further benefiting from Sumo Logic as a part of their daily workload. Here we can see that certified users show an increase in the creation of searches, dashboards and alerts, as well as key optimization features such as Field Extraction Rules (FERs), scheduled views and partitions.

Join Us

If you’re looking to host a Cert Jam at your company, and have classroom space for 50, reach out to our team. We are happy to work with you and see if we can host one in your area. If you’re looking for ways to get certified, or know someone who would benefit, check out our list of upcoming Cert Jams we’re offering. Don’t have Sumo Logic, but want to get started? Sign up for Sumo Logic for free!

Our Cert Jam hosted by Tealium in May. Everyone was so enthusiastic to be certified.

Blog

Understanding the Impact of the Kubernetes Security Flaw and Why DevSecOps is the Answer

Blog

Careful Data Science with Scala

This post gives a brief overview of some ideas we presented at the recent Scale By the Bay conference in San Francisco; for more details, you can see a video of the talk or take a look at the slides.

The Problems of Sensitive Data and Leakage

Data science and machine learning have gotten a lot of attention recently, and the ecosystem around these topics is moving fast. One significant trend has been the rise of data science notebooks (including our own here at Sumo Logic): interactive computing environments that allow individuals to rapidly explore, analyze, and prototype against datasets. However, this ease and speed can compound existing risks. Governments, companies, and the general public are increasingly alert to the potential issues around sensitive or personal data (see, for example, GDPR). Data scientists and engineers need to continuously balance the benefits of data-driven features and products against these concerns. Ideally, we’d like technological assistance that makes it easier for engineers to do the right thing and avoid unintended data processing or revelation.

Furthermore, there is also a subtle technical problem known in the data mining community as “leakage”. Kaufman et al won the best paper award at KDD 2011 for Leakage in Data Mining: Formulation, Detection, and Avoidance, which describes how it is possible to (completely by accident) allow your machine learning model to “cheat” because of unintended information leaks in the training data contaminating the results. This can lead to machine learning systems which work well on sample datasets but whose performance is significantly degraded in the real world. It can be a major problem, especially in systems that pull data from disparate sources to make important predictions. Oscar Boykin of Stripe presented an approach to this problem at Scale By the Bay 2017 using functional-reactive feature generation from time-based event streams.

Information Flow Control (IFC) for Data Science

My talk at Scale By the Bay 2018 discussed how we might use Scala to encode notions of data sensitivity, privacy, or contamination, thereby helping engineers and scientists avoid these problems. The idea is based on programming languages (PL) research by Russo et al, where sensitive data (“x” below) is put in a container data type (the “box” below) which is associated with some security level. Other code can apply transformations or analyses to the data in-place (known as the Functor “map” operation in functional programming), but only specially trusted code with an equal or greater security level can “unbox” the data. To encode the levels, Russo et al propose using the Lattice model of secure information flow developed by Dorothy E. Denning. In this model, the security levels form a partially ordered set with the guarantee that any given pair of levels will have a unique greatest lower bound and least upper bound. This allows for a clear and principled mechanism for determining the appropriate level when combining two pieces of information. In the Russo paper and our Scale By the Bay presentation, we use two levels for simplicity: High for sensitive data, and Low for non-sensitive data.

To map this research to our problem domain, recall that we want data scientists and engineers to be able to quickly experiment and iterate when working with data. However, when data may be from sensitive sources or be contaminated with prediction target information, we want only certain, specially-audited or reviewed code to be able to directly access or export the results. For example, we may want to lift this restriction only after data has been suitably anonymized or aggregated, perhaps according to some quantitative standard like differential privacy. Another use case might be that we are constructing data pipelines or workflows and we want the code itself to track the provenance and sensitivity of different pieces of data to prevent unintended or inappropriate usage. Note that, unlike much of the research in this area, we are not aiming to prevent truly malicious actors (internal or external) from accessing sensitive data – we simply want to provide automatic support in order to assist engineers in handling data appropriately.

Implementation and Beyond

Depending on how exactly we want to adapt the ideas from Russo et al, there are a few different ways to implement our secure data wrapper layer in Scala. Here we demonstrate one approach using typeclass instances and implicit scoping (similar to the paper), as well as two versions where we modify the formulation slightly to allow changing the security level as a monadic effect (ie, with flatMap) having last-write-wins (LWW) semantics, and create a new Neutral security level that always “defers” to the other security levels High and Low.

Implicit scoping

Most similar to the original Russo paper, we can create special “security level” object instances, and require one of them to be in implicit scope when de-classifying data. (Thanks to Sergei Winitzki of Workday who suggested this at the conference!) A rough sketch of this flavor appears below.

Value encoding

For LWW flatMap, we can encode the levels as values. In this case, the security level is dynamically determined at runtime by the type of the associated level argument, and the de-classify method reveal() returns an Option[T], which is None if the level is High. This implementation uses Scala’s pattern-matching functionality.

Type encoding

For LWW flatMap, we can encode the levels as types. In this case, the compiler itself will statically determine whether reveal() calls are valid (ie, against the Low security level type), and simply fail to compile code which accesses sensitive data illegally. This implementation relies on some tricks derived from Stefan Zeiger’s excellent Type-Level Computations in Scala presentation.

Data science and machine learning workflows can be complex, and in particular there are often potential problems lurking in the data handling aspects. Existing research in security and PL can be a rich source of tools and ideas to help navigate these challenges, and my goal for the talk was to give people some examples and starting points in this direction. Finally, it must be emphasized that a single software library can in no way replace a thorough organization-wide commitment to responsible data handling. By encoding notions of data sensitivity in software, we can automate some best practices and safeguards, but it will necessarily only be a part of a complete solution.

Watch the Full Presentation at Scale by the Bay
Learn More
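To make the boxed-data idea above more concrete, here is a minimal Scala sketch of the implicit-scoping variant, written under my own assumptions rather than reproducing the talk's library: the type and method names (Sensitive, Level, High, Low, reveal()) are illustrative only.

```scala
// A minimal sketch of the "sensitive data in a box" idea, using the
// implicit-scoping flavor: reveal() only compiles where a High clearance
// has been brought into implicit scope. Sensitive, Level, High, Low and
// reveal() are hypothetical names for illustration, not the talk's code.
object SensitiveSketch extends App {

  sealed trait Level
  final class Low  extends Level // included to mirror the two-level lattice
  final class High extends Level // the clearance required to declassify

  // The "box": the payload is private, so the only way out is reveal().
  final class Sensitive[A] private (private val value: A) {
    def map[B](f: A => B): Sensitive[B]       = new Sensitive(f(value))
    def reveal()(implicit clearance: High): A = value
  }
  object Sensitive {
    def apply[A](a: A): Sensitive[A] = new Sensitive(a)
  }

  val email: Sensitive[String] = Sensitive("alice@example.com")

  // Untrusted code can still compute with the data without seeing it...
  val domain: Sensitive[String] = email.map(_.dropWhile(_ != '@').drop(1))

  // ...but only audited code that explicitly brings a High clearance into
  // scope may declassify the result.
  def auditedExport(): String = {
    implicit val clearance: High = new High
    domain.reveal()
  }

  println(auditedExport()) // prints "example.com"
  // domain.reveal()       // would not compile here: no implicit High in scope
}
```

The point of the design is that exploratory code can keep transforming the boxed value with map, while declassification is confined to code that deliberately, and auditably, brings a High clearance into implicit scope.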

Blog

Why European Users Are Leveraging Machine Data for Security and Customer Experience

To gain a better understanding of the adoption and usage of machine data in Europe, Sumo Logic commissioned 451 Research to survey 250 executives across the UK, Sweden, the Netherlands and Germany, and to compare this data with a previous survey of U.S. respondents that were asked the same questions. The research set out to answer a number of questions, including: Is machine data in fact an important source of fuel in the analytics economy? Do businesses recognize the role machine data can play in driving business intelligence? Are businesses that recognize the power of machine data leaders in their fields? The report, "Using Machine Data Analytics to Gain Advantage in the Analytics Economy, the European Edition," released at DockerCon Europe in Barcelona this week, reveals that companies in the U.S. are currently more likely to use and understand the value of machine data analytics than their European counterparts, but that Europeans lead the U.S. in using machine data for security use cases. Europeans Trail US in Recognizing Value of Machine Data Analytics Let's dig deeper into the stats showing that U.S. respondents were more likely to use and understand the value of machine data analytics. For instance, 36 percent of U.S. respondents have more than 100 users interacting with machine data at least once a week, while in Europe, only 21 percent of respondents have that many users. Likewise, 64 percent of U.S. respondents said that machine data is extremely important to their company's ability to meet its goals, with 54 percent of European respondents saying the same. When asked if machine data tools are deployed on-premises, only 48 percent of European respondents responded affirmatively, compared to 74 percent of U.S. respondents. The gap might be explained by the idea that U.S. businesses are more likely to have a software-centric mindset. According to the data, 64 percent of U.S. respondents said most of their company had software-centric mindsets, while only 40 percent of European respondents said the same. Software-centric businesses are more likely to recognize that machine data can deliver critical insights, from both an operational and business perspective, as they are more likely to integrate their business intelligence and machine data analytics tools. Software-centric companies are also more likely to say that a wide variety of users, including the head of IT, head of security, line-of-business users, product managers and C-level executives, recognize the business value of machine data. Europeans Lead US in Using Machine Data for Security At 63 percent, European companies are ahead of the U.S. in recognising the benefit of machine data analytics in security use cases. Given strict data privacy regulations in Europe, including the new European Union (EU) General Data Protection Regulation (GDPR), it only seems natural that security is a significant driver for machine data tools in the region. Business Insight Recognized by Europeans as Valuable Beyond security, other top use cases cited for machine data in Europe are monitoring (55 percent), troubleshooting (48 percent) and business insight (48 percent). This means Europeans are clearly recognizing the value of machine data analytics beyond the typical security, monitoring and troubleshooting use cases: they're using it as a strategic tool to move the business forward.
When IT operations teams have better insight into business performance, they are better equipped to prioritize incident response and improve their ability to support business goals. A Wide Array of European Employees in Different Roles Use Machine Data Analytics The data further show that, in addition to IT operations teams, a wide array of employees in other roles commonly use machine data analytics. Security analysts, product managers and data analysts (some of whom may serve lines of business or senior executives) all appeared at the top of the list of the roles using machine data analytics tools. The finding emphasizes that companies recognize the many ways that machine data can drive intelligence across the business. Customer Experience and Product Development Seen as Most Beneficial to Europeans Although security emerged as an important priority for users of machine data, improved customer experience and more efficient product development emerged as the top benefits of machine data analytics tools. Businesses are discovering that the machine analytics tools they use to improve their security posture can also drive value in other areas, including better end-user experiences, more efficient and smarter product development, optimized cloud and infrastructure spending, and improved sales and marketing performance. Barriers Preventing Wider Usage of Machine Data The report also provided insight into the barriers preventing wider usage of machine data analytics. The number one capability that users said was lacking in their existing tools was real-time access to data (37 percent), followed by fast, ad hoc querying (34 percent). Another notable barrier to broader usage is the lack of capabilities to effectively manage different machine data analytics tools. European respondents also stated that the adoption of modern technologies does make it harder to get the data they need for speedy decision-making (47 percent). Whilst moving to microservices and container-based architectures like Docker makes it easier to deploy at scale, it seems it is hard to effectively monitor activities over time without the right approach to logs and metrics in place. In Conclusion European companies are adopting modern tools and technologies at a slower rate than their U.S. counterparts, and fewer companies currently have a 'software-led' mindset in place. Software-centric businesses are doing more than their less advanced counterparts to make the most of the intelligence available to them in machine data analytics tools. However, a desire for more continuous insights derived from machine data is there: the data show that once European organisations start using machine data analytics to gain visibility into their security operations, they start to see the value for other use cases across operations, development and the business. The combination of customer experience and compliance with security represents strong value for European users of machine data analytics tools. Users want their machine data tools to drive even more insight into the customer experience, which is increasingly important to many businesses, and at the same time help ensure compliance. Additional Resources Download the full 451 Research report for more insights Check out the Sumo Logic DockerCon Europe press release Download the Paf customer case study Read the European GDPR competitive edge blog Sign up for a free trial of Sumo Logic

Blog

Announcing Extended AWS App Support at re:Invent for Security and Operations

Blog

Complete Visibility of Amazon Aurora Databases with Sumo Logic

Sumo Logic provides digital businesses a powerful and complete view of modern applications and cloud infrastructures such as AWS. Today, we're pleased to announce complete visibility into the performance, health and user activity of the leading Amazon Aurora database via two new applications – the Sumo Logic MySQL ULM application and the Sumo Logic PostgreSQL ULM application. Amazon Aurora is a MySQL and PostgreSQL-compatible relational database available on the AWS RDS platform. Amazon Aurora is up to five times faster than standard MySQL databases and three times faster than standard PostgreSQL databases. By providing complete visibility across your Amazon Aurora databases with these two applications, Sumo Logic provides the following benefits via advanced visualizations: Optimize your databases by understanding query performance, bottlenecks and system utilization Detect and troubleshoot problems by identifying new errors, failed connections, database activity, warnings and system events Monitor user activity by detecting unusual logins, failed events and geo-locations In the following sections of this blog post, we discuss how these applications provide value to customers. Amazon Aurora Logs and Metrics Sources Amazon provides a rich set of log and metrics sources for monitoring and managing Aurora databases. The Sumo Logic Aurora MySQL ULM app works on the following three log types: AWS CloudTrail event logs AWS CloudWatch metrics AWS CloudWatch logs For Aurora MySQL databases, error logs are enabled by default to be pushed to CloudWatch. Aurora MySQL also supports pushing slow query logs, audit logs, and general logs to CloudWatch; however, you need to explicitly enable this. The Sumo Logic Aurora PostgreSQL ULM app works on the following log types: AWS CloudTrail event logs AWS CloudWatch metrics For more details on setting up logs, please check the documentation for the Amazon Aurora PostgreSQL app and the Amazon Aurora MySQL app. Installing the Apps for Amazon Aurora Analyzing each of the above logs in isolation to debug a problem, or to understand how your database environments are performing, can be a daunting and time-consuming task. With the two new Sumo applications, you can instantly get complete visibility into all aspects of running your Aurora databases. Once you have configured your log sources, the Sumo Logic apps can be installed. Navigate to the Apps Catalog in your Sumo Logic instance and add the "Aurora MySQL ULM" or "Aurora PostgreSQL ULM" apps to your library after providing references to the sources configured in the previous step. Optimizing Database Performance As part of running today's digital businesses, customer experience is a key outcome, and towards that end, closely monitoring the health of your databases is critical. The following dashboards provide an instant view of how your Amazon Aurora MySQL and PostgreSQL databases are performing across various important metrics. Using the queries from these dashboards, you can build scheduled searches and real-time alerts to quickly detect common performance problems (an illustrative example appears at the end of this post). The Aurora MySQL ULM Logs – Slow Query Dashboard allows you to view log details on slow queries, including the number of slow queries, trends, execution times, time comparisons, command types, users, and IP addresses. The Aurora MySQL ULM Metric – Resource Utilization Monitoring dashboard allows you to view analysis of resource utilization, including usage, latency, active and blocked transactions, and login failures.
The Aurora PostgreSQL ULM Metric – Latency, Throughput, and IOPS Monitoring Dashboard allows you to view granular details of database latency, throughput, IOPS and disk queue depth. It is important to monitor the performance of database queries, and latency and throughput are the key performance metrics. Detect and Troubleshoot Errors To provide the best service to your customers, you need to take care of issues quickly and minimize impacts to your users. Database errors can be hard to detect and sometimes surface only after users report application errors. The following set of dashboards helps quickly surface unusual or new activity across your AWS Aurora databases. The Aurora MySQL ULM Logs – Error Logs Analysis Dashboard allows you to view details for error logs, including failed authentications, error outliers, top and recent warnings, log levels, and aborted connections. Monitor User Activity With cloud environments, it's becoming even more critical to investigate user behavior patterns and make sure your database is being accessed by the right staff. The following set of dashboards tracks all user and database activity and can help prioritize and identify patterns of unusual behavior for security and compliance monitoring. The Aurora MySQL ULM Logs – Audit Log Analysis Dashboard allows you to view an analysis of events, including accessed resources, destination and source addresses, timestamps, and user login information. These logs are specifically enabled to audit activities that are of interest from an audit and compliance perspective. The Aurora MySQL Logs – Audit Log SQL Statements Dashboard allows you to view details for SQL statement events, including top SQL commands and statements, trends, user management, and activity for various types of SQL statements. You can drill deeper into the various SQL statements and commands executed by clicking on the "Top SQL Commands" panel in the dashboard. This opens the Aurora MySQL ULM – Logs – Audit Log SQL Statements dashboard, which helps with identifying trends, specific executions, user management activities performed and dropped objects. The Aurora PostgreSQL ULM CloudTrail Event – Overview Dashboard allows you to view details for event logs, including geographical locations, trends, successful and failed events, user activity, and error codes. In case you need to drill down for details, the CloudTrail Event – Details dashboard helps you monitor the most recent changes made to resources in your Aurora database ecosystem, including creation, modification, deletion and reboot of Aurora clusters and/or instances. Get Started Now! The Sumo Logic apps for Amazon Aurora help optimize, troubleshoot and secure your AWS Aurora database environments. To get started, check out the Sumo Logic MySQL ULM application and the Sumo Logic PostgreSQL ULM application. If you don't yet have a Sumo Logic account, you can sign up for a free trial today. For more great DevOps-focused reads, check out the Sumo Logic blog.
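As an illustration of the kind of scheduled search or alert mentioned above, here are two hedged examples. They are sketches only: the source category, parse expression and threshold are assumptions about a typical collection setup, not the queries shipped with the apps. A Sumo Logic metrics query to watch CPU across Aurora instances (assuming CloudWatch metrics are collected with their standard DBInstanceIdentifier dimension) might look like:

metric=CPUUtilization DBInstanceIdentifier=* | avg by DBInstanceIdentifier

And a log query counting slow queries per host, assuming the standard MySQL slow query log format and a source category such as aws/rds/aurora/mysql/slowquery:

_sourceCategory=aws/rds/aurora/mysql/slowquery
| parse "# Query_time: * " as query_time
| where num(query_time) > 5
| count by _sourceHost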

November 27, 2018

Blog

The Latest Trends for Modern Apps Built on AWS

Blog

Comparing a Multi-Tenant SaaS Solution vs. Single Tenant

Blog

An Organized Workflow for Prototyping

In the world of agile there's a demand to solve grey areas throughout the design process at lightning speed. Prototypes help the scrum team test ideas and refine them. Without prototypes, we can't test ideas until the feature or product has been built, which can be a recipe for disaster. It's like running a marathon without training. During a two-week sprint, designers often need to quickly turn around prototypes in order to test. It can be hectic to juggle meetings, design and prototyping without a little planning. The guiding principles below, inspired by my time working with one of our lead product designers at Sumo Logic, Rebecca Sorensen, will help you build prototypes more effectively for usability testing under a time crunch. Define the Scope From the start, it's essential that we understand who the audience is and what the goal of the prototype is, so we can determine other parts of the prototyping process like content, fidelity level and tools for the job. We can easily find out the intent by asking the stakeholder what he or she wants to learn. By defining the scope from the beginning we are able to prioritize our time more effectively throughout the prototyping process and tailor the content for the audience. For testing, our audience is usually internal users or customers. The scrum team wants to know if the customer can complete a task successfully with the proposed design. Or they may also want to validate a flow to determine design direction. If we're testing internally, we have more flexibility showing a low or mid fidelity prototype. However, when testing with customers, sometimes we have to consider more polished prototypes with real data. Set Expectations There was a time when designers made last-minute changes to the prototype (sometimes while the prototype was being tested, because a stakeholder added last-minute feedback) that impacted the outcome and did not provide enough time for the researcher to understand the changes. Before jumping into details, we create milestones to set delivery expectations. This helps the scrum team understand when to give feedback on the prototype and when the research team will receive the final prototype for testing. This timeline is an estimate and it might vary depending on the level of fidelity. We constantly experiment until we find our sweet spot. The best way to get started is to start from a desired end state, like debriefing the researcher on the final prototype, and work backward. The draft of the prototype doesn't have to be completely finished and polished. It just needs to have at least some structure so we can get it in front of the team for feedback. Sometimes, we don't have to add all the feedback. Instead, we sift through the feedback and choose what makes sense given the time constraints. Tailor the Content for your Audience Content is critical to the success of a prototype. The level of detail we need in the prototype depends on the phase of our design process. Discovery In the exploration phase we are figuring out what we are building and why, so at this point the content tends to be more abstract. We're trying to understand the problem space and our users, so we shouldn't be laser-focused on details; only structure and navigation matter. Abstraction allows us to have a more open conversation with users that's not solution focused. Sometimes we choose metaphors that allow us to be on the same playing field as our users to deconstruct their world more easily.
We present this in the form of manipulatives: small cut-outs of UI or empty UI elements the customer can draw on during a quick participatory design session. Cutting and preparing manipulatives is also a fun team activity. Delivery When we move into the delivery phase of design, where our focus is on how we are building the product, content needs to reflect the customer's world. We partner closely with our Product Manager to structure a script. Context in the form of relevant copy, charts, data and labels helps support the script and the various paths the user can take when interacting with the prototype. Small details like the data ink and data points, along with the correct labels, help us make the prototype more realistic so the user doesn't feel he's stepping into an unfamiliar environment. Even though a prototype is still an experiment, using real data gives us a preview of design challenges like truncation or readability. We are lucky to have real data from our product. CSVJSON helps us convert files into JSON format so we can use the data with chart plugins and CRAFT. Collaborate to Refine Prototyping is fun and playful, so much so that it can be easy to forget that there are also other people who are part of the process. Prototyping is also a social way to develop ideas with non-designers, so when choosing which tool to present our prototype in, we need to keep in mind collaboration outside the design team, not just the final output. We use InVision to quickly convey flows along with annotations, but it has a downside during this collaborative process. Annotations can leave room for interpretation since every stakeholder has his own vocabulary. Recently, a couple of our Engineers in Poland started using UXPin. At first it was used to sell their ideas, but for usability testing it has also become a common area where we can work off each other's prototypes. They like the ability to duplicate prototypes and reshuffle screens so the prototypes can be updated quickly without having to write another long document of explanations. By iterating together we are able to create a common visual representation and move fast. UXPin came to the rescue when collaborating with cross-regional teams. It's an intuitive tool for non-designers that allows them to duplicate the prototype and make their own playground too. Tools will continue to change, so it's important to have an open mindset and be flexible about learning and making judgments about when to switch tools to deliver the prototype to research on time. Architect Smartly Although we are on a time crunch when prototyping for research, we can find time to experiment by adjusting the way we build our prototype. Make a playground Our lead product designer Rohan Singh invented the hamster playground to go wild with design explorations. The hamster playground is an experimental space which comes in handy when we need to quickly whip something up without messing up the rest of the design. It started as a separate page in our Sketch files and is now also present in our prototyping workspace. When we design something in high fidelity, we become attached to the idea right away. This can cripple experimentation. We need that sacred space detached from the main prototype that allows us to experiment with animations or dynamic elements. The hamster playground can also be a portable whiteboard or pen and paper. Embrace libraries Libraries accelerate the prototyping process exponentially!
For the tool you’re commonly using to prototype invest some time (hackathons or end of quarter) to create a pattern library of the most common interactions(this is not a static UI Kit). If the prototype we’re building has some of those common elements, we will save them into the library so other team members can reuse them on another project. Building an interactive library is time consuming but it pays off because it allows the team to easily drag, drop and combine elements like legos. Consolidate the flow We try to remove non essential items from the prototype and replace them with screenshots or turn them into loops so we can focus only on the area that matters for testing. Consolidation also forces us to not overwhelm the prototype with many artboards otherwise we risk having clunky interactions during testing. The other advantage of consolidating is that you can easily map out interactions by triggers, states and animations/transitions. Prepare Researchers for Success Our job is not done until research, our partners, understand what we built. As a best practice, set up some time with the researcher to review the prototype. Present the limitations, discrepancies in different browsers and devices and any other instructions that are critical for the success of the testing session. A short guide that outlines the different paths with screenshots of what the successful interactions look like can aid researchers a lot when they are writing the testing script. Ready, Set…Prototype! Just like marathoners, who intuitively know when to move fast, adjust and change direction, great prototypers work from principles to guide their process. Throughout the design process the scrum team constantly needs answers to many questions. By becoming an effective prototyper, not the master of x tool, you can help the team find the answers right away. The principles outlined above will guide your process so you are more aware of how you spend your time and to know when you’re prototyping too much, too little or the wrong thing. Organization doesn’t kill experimentation; it makes more time for playfulness and solving the big grey areas. This post originally appeared on Medium. Check it out here. Additional Resources Check out this great article to learn how our customers influence the Sumo Logic product and how UX research is key to improving overall experiences Curious to know how I ended up at Sumo Logic doing product design/user experience? I share my journey in this employee spotlight article. Love video games and data? Then you’ll love this article from one of our DevOps engineers on how we created our own game (Sumo Smash bros) to demonstrate the power of machine data

Blog

Understanding Transaction Behavior with Slick + MySQL InnoDB

MySQL has always been among the top few database management systems used worldwide, according to DB-Engines, one of the leading ranking websites. And thanks to the large open source community behind MySQL, it also solves a wide variety of use cases. In this blog post, we are going to focus on how to achieve transactional behavior with MySQL and Slick. We will also discuss how these transactions resulted in one of our production outages. But before going any further into the details, let's first define what a database transaction is. In the context of relational databases, a sequence of operations that satisfies some common properties is known as a transaction. This common set of properties, which determines the behavior of these transactions, is referred to as the atomic, consistent, isolated and durable (ACID) properties. These properties are intended to guarantee the validity of the underlying data in case of power failure, errors, or other real-world problems. The ACID model describes the basic supporting principles one should think about before designing database transactions. All of these principles are important for any mission-critical application. One of the most popular storage engines for MySQL is InnoDB, and Slick is the modern database query and access library for Scala. Slick exposes the data stored in these databases as Scala collections, so that it is seamlessly available to your application. Database transactions come with their own overhead, especially when long-running queries are wrapped in a transaction. Let's understand the transaction behavior we get with Slick. Slick offers ways to execute transactions on MySQL, for example:

val a = (for {
  ns <- coffees.filter(_.name.startsWith("ESPRESSO")).map(_.name).result
  _ <- DBIO.seq(ns.map(n => coffees.filter(_.name === n).delete): _*)
} yield ()).transactionally

These transactions are executed with the help of the auto-commit feature provided by the InnoDB engine. We will go into this auto-commit feature later in this article, but first, let me tell you about a minor outage on our production services at Sumo Logic that happened due to a lack of understanding of this transaction behavior. Whenever a user fires a query, the query follows this course of action before getting started: Query metadata (i.e., user and customerID) is first sent to Service A. Service A asks a common Amazon MySQL RDS database for the number of concurrent sessions for this user running across all instances of Service A. If the number is greater than some threshold, we throttle the request and send a 429 to the user. Otherwise, we just add the metadata of the session to the table stored in RDS. All of these actions are executed within the scope of a single Slick transaction. Recently we started receiving lots of lock wait timeouts on this Service A. On debugging further, we saw that from the time we started getting lots of lock wait timeouts, there was also an increase in the average CPU usage across the Service A cluster.
Looking into some of these particular issues of lock wait timeouts, we noticed that whenever we had an instance in the cluster going through full GC cycles, that resulted in a higher number of lock wait timeouts across the cluster. But interestingly enough, these lock wait timeouts were spread across the cluster and not isolated to the single instance that suffered from full GC cycles. Based on that, we knew that full GC cycles on one of the nodes were somewhat responsible for causing those lock wait timeouts across the cluster. As already mentioned above, we used the transaction feature provided by Slick to execute all of the actions as a single command. So the next logical step was to dig deeper into understanding the question: "How does Slick implement these transactions?" We found out that Slick uses the InnoDB auto-commit feature to execute transactions. With auto-commit disabled, the transaction is kept open until it is committed from the client side, which essentially means that the connection executing the current transaction holds all the locks until the transaction is committed. Auto-Commit Documentation from the InnoDB Manual In InnoDB, all user activity occurs inside a transaction. If auto-commit mode is enabled, each SQL statement forms a single transaction on its own. By default, MySQL starts the session for each new connection with auto-commit enabled, so MySQL does a commit after each SQL statement if that statement did not return an error. If a statement returns an error, the commit or rollback behavior depends on the error. See Section 14.21.4, “InnoDB Error Handling”. A session that has auto-commit enabled can perform a multiple-statement transaction by starting it with an explicit START TRANSACTION or BEGIN statement and ending it with a COMMIT or ROLLBACK statement. See Section 13.3.1, “START TRANSACTION, COMMIT, and ROLLBACK Syntax”. If auto-commit mode is disabled within a session with SET auto-commit = 0, the session always has a transaction open. A COMMIT or ROLLBACK statement ends the current transaction and a new one starts. Pay attention to the last sentence above. This means that if auto-commit is disabled, then the transaction is open, which means all the locks are still held by this transaction. All the locks, in this case, will be released only when we explicitly COMMIT the transaction. So in our case, our inability to execute the remaining commands within the transaction due to a long GC pause meant that we were still holding onto the locks on the table, which in turn meant that other JVMs executing transactions touching the same table (which was, in fact, the case) would also suffer from high latencies. But we needed to be sure that was the case in our production environments. So we went ahead with reproducing the production issue on a local testbed, making sure that locks were still held by the transaction on the node undergoing high GC cycles. Steps to Reproduce the High DB Latencies on One JVM Due to GC Pauses on Another JVM Step One We needed some way to know when the queries in the transactions were actually getting executed by the MySQL server. mysql> SET global general_log = 1; mysql> SET global log_output = 'table'; mysql> SELECT * from mysql.general_log; MySQL general logs show the recent queries that were executed by the server. Step Two We needed two different transactions to execute at the same time in different JVMs to understand this lock wait timeout.
Transaction One:

val query = (for {
  ns <- userSessions.filter(_.email.startsWith(name)).length.result
  _ <- {
    println(ns)
    if (ns > n) DBIOAction.seq(userSessions += userSession)
    else DBIOAction.successful(())
  }
} yield ()).transactionally
db.run(query)

Transaction Two:

db.run(userSessions.filter(_.id === id).delete)

Step Three Now we needed to simulate the long GC pauses, or pauses in one of the JVMs, to mimic the production environment. While mimicking those long pauses, we monitored the mysql.general_log to find out when each command reached the MySQL server for execution. The below chart depicts the order of SQL statements getting executed on both JVMs:

JVM 1 (adding the session of the user) | JVM 2 (deleting the session of the user if present)
SET auto-commit = 0 (as in false) | (idle)
SELECT count(*) FROM USERS where User_id = "temp" (LOCKS ACQUIRED) | SET auto-commit = 0
INSERT INTO USERS user_session | DELETE FROM USERS where sessionID = "121" (started)
INTRODUCED HIGH LATENCY ON THE CLIENT SIDE FOR 40 SECONDS | DELETE OPERATION IS BLOCKED, WAITING ON THE LOCK
COMMIT | DELETE FROM USERS where sessionID = "121" (completed)
(idle) | COMMIT

In the below image, you can see the SQL statements getting executed on both JVMs. This image shows a lock wait time of around 40 seconds on JVM 2 for the "Delete SQL" command. We can clearly see from the logs how pauses in one JVM cause high latencies across the different JVMs querying the MySQL server. Handling Such Scenarios with MySQL We often need to handle this kind of scenario, where we need to execute MySQL transactions across JVMs. So how can we achieve low MySQL latencies for transactions even in cases of pauses in one of the JVMs? Here are some solutions (a hedged Scala sketch of a couple of them appears at the end of this post): Using Stored Procedures With stored procedures, we could easily extract this throttling logic into a function call and store it as a function on the MySQL server. Stored procedures can be easily called by clients with appropriate arguments, and they execute all at once on the server side without being affected by client-side pauses. Along with the use of transactions in the procedures, we can ensure that they are executed atomically and that results are hence consistent for the entire duration of the transaction. Delimit Multiple Queries With this, we can create transactions on the client side and execute them atomically on the server side without being afraid of the pauses. Note: You will need to enable allowMultiQueries=true, because this flag allows batching multiple queries together into a single query, and hence you will be able to run a transaction as a single query. Better Indexes on the Table With better indices, we can ensure that while executing SELECT statements with a WHERE condition we touch minimal rows, and hence take minimal row locks. Suppose we don't have any index on the table; in that case, for any select statement, we need to take a shared row lock on all the rows of the table, which means that during the execution phase of this transaction all deletes or updates would be blocked. So it's generally advised to have the WHERE condition in a SELECT use an index. Lower Isolation Levels for Executing Transactions With the READ UNCOMMITTED isolation level, we can always read rows which have not yet been committed. Additional Resources Want more articles like this? Check out the Sumo Logic blog for more technical content!
Read this blog to learn how to triage test failures in a continuous delivery lifecycle Check out this article for some clear-cut strategies on how to manage long-running API queries using RxJS Visit the Sumo Logic App for MySQL page to learn about cloud-native monitoring for MySQL
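To make a couple of the mitigation options above concrete (the multi-statement/stored-procedure route and the lower isolation level), here is a minimal, hedged Scala sketch against Slick 3.x and the MySQL JDBC driver. The table, procedure name, credentials and URL are illustrative assumptions, not the actual production code.

import slick.jdbc.MySQLProfile.api._
import slick.jdbc.TransactionIsolation

object ThrottlingSketch extends App {
  // Illustrative table mirroring the userSessions query used in the examples above.
  class UserSessions(tag: Tag) extends Table[(String, String)](tag, "USERS") {
    def email = column[String]("email")
    def sessionId = column[String]("sessionID")
    def * = (email, sessionId)
  }
  val userSessions = TableQuery[UserSessions]

  // allowMultiQueries=true lets a delimited, multi-statement transaction be sent
  // to the server in a single round trip, so client-side pauses cannot hold locks open.
  val db = Database.forURL(
    url = "jdbc:mysql://localhost:3306/sessions?allowMultiQueries=true",
    user = "app",
    password = "secret",
    driver = "com.mysql.cj.jdbc.Driver"
  )

  // Option 1: push the throttling logic into a stored procedure (assumed to already
  // exist on the server) so the check-and-insert executes entirely server-side.
  def addSessionViaProcedure(email: String, limit: Int) =
    db.run(sql"CALL add_session_if_below_limit($email, $limit)".as[Int])

  // Option 2: run the existing Slick action at a lower isolation level, as discussed
  // above, so the session count can read uncommitted rows instead of waiting on locks.
  def countSessions(prefix: String) =
    db.run(
      userSessions.filter(_.email.startsWith(prefix)).length.result
        .transactionally
        .withTransactionIsolation(TransactionIsolation.ReadUncommitted)
    )
}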

Blog

Exploring Nordcloud’s Promise to Deliver 100 Percent Alert-Based Security Operations to Customers

Blog

Strategies for Managing Long-running API Calls with RxJS

Blog

Near Real-Time Log Collection From Amazon S3 Storage

Blog

SnapSecChat: Sumo Logic CSO Recaps HackerOne's Conference, Security@

Blog

Illuminate 2018 Product Update Q&A with Sumo Logic CEO Ramin Sayar

Blog

How to Triage Test Failures in a Continuous Delivery Lifecycle

Blog

Gain Visibility into Your Puppet Deployments with the New Sumo Logic Puppet App

Puppet is a software configuration management and deployment tool that is available both as an open source tool and as commercial software. It's most commonly used on Linux and Windows to pull the strings on multiple application servers at once. It includes its own declarative language to describe system configurations. In today's cloud environments that consist of hundreds of distributed machines, Puppet can help reduce development time and resources by automatically applying these configurations. Just like with any other DevOps tool, there can be errors and configuration issues. However, with the new Sumo Logic Puppet integration and application, customers can now leverage the Sumo Logic platform to help monitor Puppet performance, configurations and errors. Puppet Architecture and Logging Puppet can apply required configurations across new and existing servers or nodes. You can configure systems with Puppet either in a client-server architecture or in a stand-alone architecture. The client-server architecture is the most commonly used architecture for Puppet implementations. Puppet agents apply the required changes and send reports to the Puppet master describing the run and the details of the client resources. These reports can help answer questions like "how often are the resources modified," "how many events were successful in the past day" and "what was the status of the most recent run?" In addition to reports, Puppet also generates an extensive set of log files. From a reporting and monitoring perspective, the two log files of interest are the Puppet server logs and the HTTP request logs. Puppet server messages and errors are logged to the file /var/log/puppetlabs/puppetserver/puppetserver.log, which can be used to monitor the health of the server; this logging can be configured using the /etc/puppetlabs/puppetserver/logback.xml file. The /var/log/puppetlabs/puppetserver/puppetserver-access.log file contains HTTP traffic being routed via your Puppet deployment. This logging can be handled using the configuration file /etc/puppetlabs/puppetserver/request-logging.xml. Puppet agent requests to the master are logged into this file. Sumo Logic Puppet App The Sumo Logic Puppet app is designed to effectively manage and monitor Puppet metrics, events and errors across your deployments. With Sumo Logic dashboards you will be able to easily identify: Unique nodes Puppet node runs activity Service times Catalog application times Puppet execution times Resource transitions (failures, out-of-sync, modifications, etc.) Error rates and causes Installation In order to get started, the app requires three data sources: Puppet server logs Puppet access logs Puppet reports The Puppet server logs and Puppet access logs are present in the directory /var/log/puppetlabs/puppetserver/. Configure separate local file sources for both of these log files. Puppet reports are generated as YAML files. These need to be converted into JSON before being ingested into Sumo Logic; to ingest Puppet reports, you must configure a script source. Once the log sources are configured, the Sumo Logic app can be installed. Simply navigate to the Apps Catalog in your Sumo Logic instance and add the Puppet app to the library after providing the sources configured in the previous step. For more details on app configuration, please see the instructions on Sumo Logic's DocHub. Sumo Logic Puppet App Visualizations In any given Puppet deployment, there can be a large number of nodes. Some of the nodes may be faulty, while others may be very active.
The Puppet server manages the nodes and may be suffering from issues itself. The Sumo Logic Puppet app consists of predefined dashboards and search queries which help you monitor the Puppet infrastructure. The Puppet Overview dashboard shown below gives you an overview of activity across nodes and servers. If a Puppet node is failing, you can quickly find out when the node made requests, what version it is running, and how much time the server is taking to prepare the catalog for that node. Puppet Overview Dashboard Let's take a closer look at the Error Rate panel. The Error Rate panel displays the error rates per hour. This helps identify when error rates spiked, and by clicking on the panel, you can identify the root cause at either the node level or the server level via the Puppet Error Analysis dashboard. In addition, this dashboard highlights the most erroneous nodes along with the most recent errors and warnings. With this information, it is easier to drill down into the root cause of the issues. The Top Erroneous Nodes panel helps in identifying the most unstable nodes. Drill down to view the search query by clicking on the "Show in Search" icon highlighted in the above screenshot. The node name and the errors can be easily identified, and corrective actions can be performed by reviewing the messages in the search result as shown in the screenshot below: With the help of the information on the Puppet – Node Puppet Run Analysis dashboard, node health can be easily determined across different deployments such as production and pre-production. The "Slowest Nodes by Catalog Application Time" panel helps you determine the slowest nodes, which can potentially be indicative of problems and issues within those nodes. From there, you can reference the Puppet Error Analysis dashboard to determine the root cause. The "Resource Status" panel helps you quickly determine the status of various resources; further details can be obtained by drilling down to the query behind it. By reviewing the panels on this dashboard, the highest failing or most out-of-sync resources can be easily determined, which may be indicative of problems on the respective nodes. To compare average catalog application times, take a look at the "Average Catalog Application Time" and "Slowest Nodes by Catalog Application Time" panels. The resources panels show resources that failed, were modified, are out-of-sync or were skipped. Drilling down to the queries behind the panels will help in determining the exact list of resources with the selected status. Note: All the panels in the Puppet Node Puppet Run Analysis dashboard and some panels of the Puppet Overview dashboard can be filtered based on the environment, such as production, pre-production, etc. as shown below: Get Started Now! The Sumo Logic app for Puppet monitors your entire Puppet infrastructure, potentially spanning hundreds of nodes, and helps determine the right corrective and preventative actions. To get started, check out the Sumo Logic Puppet app help doc. If you don't yet have a Sumo Logic account, you can sign up for a free trial today. For more great DevOps-focused reads, check out the Sumo Logic blog.

Blog

Pokemon Co. International and Sumo Logic's Joint Journey to Build a Modern Day SOC

The world is changing. The way we do business, the way we communicate, and the way we secure the enterprise are all vastly different today than they were 20 years ago. This natural evolution of technology innovation is powered by the cloud, which has not only freed teams from on-premises security infrastructure, but has also provided them with the resources and agility needed to automate mundane tasks. The reality is that we have to automate in the enterprise if we are to remain relevant in an increasingly competitive digital world. Automation and security are a natural pairing, and when we think about the broader cybersecurity skills talent gap, we really should be thinking about how we can replace simple tasks through automation to make way for teams and security practitioners to be more innovative, focused and strategic. A Dynamic Duo That's why Sumo Logic and our partner, The Pokemon Co. International, are all in on bringing together the tech and security innovations of today and using those tools and techniques to completely redefine how we do security operations, starting with creating a new model for how a security operations center (SOC) should be structured and how it should function. So how exactly are we teaming up to build a modern day SOC, and what does it look like in terms of techniques, talent and tooling? We'll get into that, and more, in this blog post. Three Pillars of the Modern Day SOC Adopt Military InfoSec Techniques The first pillar is all about mindset and adopting a new level of rigor and way of thinking for security. Both the Sumo Logic and Pokemon security teams are built on the backbone of a military technique called the OODA loop, which was originally coined by U.S. Air Force fighter pilot and Pentagon consultant of the late twentieth century, John Boyd. Boyd created the OODA loop to implement a change in military doctrine that focused on an air-to-air combat model. OODA stands for observe, orient, decide and act, and Boyd's thinking was that if you followed this model and ensured that your OODA loop was faster than your adversary's, then you'd win the conflict. Applying that to today's modern security operations, all of the decisions made by your security leadership, whether around the people, process or tools you're using, should be aimed at reducing your OODA loop to a point where, when a situation happens, or when you're preparing for a situation, you can easily follow the protocol to observe the behavior, orient yourself, make effective and efficient decisions, and then act upon those decisions. Sound familiar? This approach is almost identical to most current incident response and security protocols, because we live in an environment where every six, 12 or 24 months we're seeing tactics and techniques change. That's why the SOC of the future is going to be dependent on a security team's ability to break down barriers and abandon older schools of thought for faster decision-making models like the OODA loop. This model is also applicable across an organization to encourage teams to be more efficient and collaborative cross-departmentally, and to move faster and with greater confidence in order to achieve mutually beneficial business goals. Build and Maintain an Agile Team But it's not enough to have the right processes in place. You also need the right people that are collectively and transparently working towards the same shared goal.
Historically, security has been full of naysayers, but it's time to shift our mindset to one of transparency and enablement, where security teams are plugged into other departments and are able to move forward with their programs as quickly and as securely as they can without creating bottlenecks. This dotted-line approach is how Pokemon operates, and it's allowed the security team to share information horizontally, which empowers development, operations, finance and other cross-functional teams to also move forward in true DevSecOps spirit. One of the main reasons why this new and modern Sumo Logic security team structure has been successful is that it's enabled each function (data protection/privacy, SOC, DevSecOps and federal) to work in unison not only with each other, but also cross-departmentally. In addition to knowing how to structure your security team, you also need to know what to look for when recruiting new talent. Here are three tips from Pokemon's Director of Information Security and Data Protection Officer, John Visneski: Go Against the Grain. Unfortunately, there are no purple security unicorns out there. Instead of trying to find the "ideal" security professional, go against the grain. Find people with the attitude and aptitude to succeed, regardless of direct security experience. The threat environment is changing rapidly, and burnout can happen fast, which is why it's more important to have someone on your team with those two qualities. Why? No one can know everything about security, and sometimes you have to adapt and throw old rules and mindsets out the window. Prioritize an Operational Mindset. QAs and test engineers are good at automation and finding gaps in seams, which is very applicable to security. One of Pokemon's best security engineers didn't know a thing about security before joining, but he had a valuable skill set. Find talent pools that know how the sausage is made. The best and brightest security professionals didn't even start out in security, but their value-add is that they are problem solvers first and security pros second. Think Transparency. The goal is to get your security team to a point where they're sharing information at a rapid enough pace and integrating themselves with the rest of the business. This allows core functions to help solve each other's problems and share use cases, and it can only be successful if you create a culture that is open and transparent. The bottom line: Don't be afraid to think outside of the box when it comes to recruiting talent. It's more important to build a team based on want, desire and rigor, which is why bringing in folks with military experience has been vital to both Sumo Logic's and Pokemon's security strategies. Security skills can be learned. What delivers real value to a company are people that have a desire to be there, a thirst for knowledge and the capability to execute on the job. Build a Modern Day Security Stack Now that you have your process and your people, you need the third pillar: tool sets. This is the Sumo Logic reference architecture that empowers us to be more secure and agile. You'll notice that all of these providers are either born in the cloud or are open source. The Sumo Logic platform is at the core of this stack, but it's these partnerships and tools that enable us to deliver our cloud-native machine data analytics as a service, and provide SIEM capabilities that easily prioritize and correlate sophisticated security threats in the most flexible way possible for our customers.
We want to grow and transform with our customers' modern application stacks and cloud architectures as they digitally transform. Pokemon has a very similar approach to its security stack: the driving force behind Pokemon's modern toolset is the move away from the old-school customer mentality of presenting a budget and asking for services. The customer-vendor relationship needs to mirror a two-way partnership with mutually invested interests and clear benefits on both sides. Three vendors (AWS, CrowdStrike and Sumo Logic) comprise the core base of the Pokemon security platform, and the remainder of the stack is modular in nature. This plug-and-play model is key as the security and threat environments continue to evolve, because it allows for flexibility in swapping new vendors and tools in and out as they come along. As long as the foundation of the platform is strong, the rest of the stack can evolve to match the current needs of the threat landscape. Our Ideal Model May Not Be Yours We've given you a peek inside the security kimono, but it's important to remember that every organization is different, and what works for Pokemon or Sumo Logic may not work for every particular team dynamic. While you can use our respective approaches as a guide to implement your own modern day security operations, the biggest takeaway here is that you should find a framework that is appropriate for your organization's goals and that will help you build success and agility within your security team and across the business. The threat landscape is only going to grow more complex, technologies more advanced and attackers more sophisticated. If you truly want to stay ahead of those trends, then you've got to be progressive in how you think about your security stack, teams and operations. Because regardless of whether you're running an on-premises, hybrid or cloud environment, the industry and the business are going to leave you no choice but to adopt a modern application stack whether you want to or not. Additional Resources Learn about Sumo Logic's security analytics capabilities in this short video. Hear how Sumo Logic has teamed up with HackerOne to take a DevSecOps approach to bug bounties in this SnapSecChat video. Learn how Pokemon leveraged Sumo Logic to manage its data privacy and GDPR compliance program and improve its security posture.

Blog

The 3 Phases Pitney Bowes Used to Migrate to AWS

Blog

Exploring the Future of MDR and Cloud SIEM with Sumo Logic, eSentire and EMA

At Sumo Logic’s annual user conference, Illuminate, we announced a strategic partnership with eSentire, the largest pure-play managed detection and response (MDR) provider, that will leverage security analytics from the Sumo Logic platform to deliver full spectrum visibility across the organization, eliminating common blind spots that are easily exploited by attackers. Today’s digital organizations operate on a wide range of modern applications, cloud infrastructures and methodologies such as DevSecOps, that accumulate and release massive amounts of data. If that data is managed incorrectly, it could allow malicious threats to slip through the cracks and negatively impact the business. This partnership combines the innovative MDR and cloud-based SIEM technologies from eSentire and Sumo Logic, respectively, that provide customers with improved analytics and actionable intelligence to rapidly detect and investigate machine data to identify potential threats to cloud or hybrid environments and strengthen overall security posture. Watch the video to learn more about this joint effort as well as the broader security, MDR, and cloud SIEM market outlook from Jabari Norton, VP global partner sales & alliances at Sumo Logic, Sean Blenkhorn, field CTO and VP sales engineering & advisory services at eSentire, and Dave Monahan, managing research director at analyst firm, EMA. For more details on the specifics of this partnership, read the joint press release.

Blog

Accelerate Security and PCI Compliance Visibility with New Sumo Logic Apps for Palo Alto Networks

Blog

Artificial Intelligence vs. Machine Learning vs. Deep Learning: What's the Difference?

Blog

Illuminate 2018 Video Q&A with Sumo Logic CEO Ramin Sayar

Blog

Intrinsic vs Meta Tags: What’s the Difference and Why Does it Matter?

Tag-based metrics are typically used by IT operations and DevOps teams to make it easier to design and scale their systems. Tags help you to make sense of metrics by allowing you to filter on things like host, cluster, services, etc. However, knowing which tags to use, and when, can be confusing. For instance, have you ever wondered about the difference between intrinsic tags (or dimensions) and meta tags with respect to custom application metrics? If so, you're not alone. It is pretty common to get the two confused, but don't worry, because this blog post will help explain the difference. Before We Get Started First, let's start with some background. Metrics in Carbon 2.0 take on the general format intrinsic_tags  meta_tags value timestamp. Note that there are two spaces between intrinsic_tags and meta_tags. If a tag is listed before the double space, then it is an intrinsic tag. If a tag is listed after the double space, then it is a meta tag. Meta_tags are also optional. If no meta_tags are provided, there must be two spaces between intrinsic_tags and value. Some examples of Carbon 2.0 metrics are shown at the end of this post. In those examples, each metric has a different set of dimensions, so they will be separate time series. Understanding Intrinsic Tags Intrinsic tags may also be referred to as dimensions and are metric identifiers. If you have two data points sent with the same set of dimension values, then they will be values in the same metric time series. Understanding Meta Tags On the other hand, meta tags are not used as metric identifiers. This means that if two data points have the same intrinsic tags or dimensions, but different meta tags, they will still be values in the same metric time series. Meta tags are meant to be used in addition to intrinsic tags so that you can more conveniently select the metrics. Let's Look at an Example To make that more clear, let's use another example. Let's say that you have 100 servers in your cluster that are reporting host metrics like "metric=cpu_idle." This would be an intrinsic tag. You may also want to track the version of your code running on that cluster. Now if you put the code version in an intrinsic tag, you'll get a completely new set of metrics every time you upgrade to a new code version. Unless you want to maintain the metrics "history" of the old code version, you probably don't want this behavior. However, if you put the version in a meta tag instead, then you will be able to change the version without creating a new set of metrics for your cluster. To take the example even further, let's say you have upgraded half of your cluster to a new version and want to compare the CPU idle of the old and new code versions. You could do this in Sumo Logic using the query "metric = cpu_idle | avg by version." Knowing the Difference To summarize, if you want two values of a given tag to be separate metrics at the same time, then the tag should be an intrinsic tag and not a meta tag. Hopefully this clears up some of the confusion regarding intrinsic versus meta tags. By tagging your metrics appropriately you will make them easier to search and ensure that you are tracking all the metrics you expect. If you already have a Sumo Logic account, then you are ready to start ingesting custom metrics. If you are new to Sumo Logic, start by signing up for a free account here. Additional Resources Learn how to accelerate data analytics with Sumo Logic's Logs to Metrics solution in this blog Want to know how to transform Graphite data into metadata-rich metrics?
Check out our Metrics Rules solution Read the case study to learn how Paf leveraged the Sumo Logic platform to derive critical insights that enabled them to analyze log and metric data, perform root-cause analysis, and monitor apps and infrastructure
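Referring back to the Carbon 2.0 discussion above, here are two illustrative metric lines (tag names, values and timestamps are made up for the example; note the two spaces separating the intrinsic tags from the meta tags):

metric=cpu_idle host=web-01 cluster=prod  version=3.2.1 98.4 1543363200
metric=request_count host=web-01 cluster=prod  version=3.2.1 1270 1543363200

Here metric, host and cluster are intrinsic tags (dimensions) that identify two separate time series, while version is a meta tag: rolling out version 3.2.2 would simply update that tag rather than create a new set of time series, which is exactly the behavior described in the example above.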

Blog

Why is Oracle and Microsoft SQL Adoption Low for Developers on AWS?

Blog

Why Decluttering Complex Data in Legends is Hard

Blog

5 Best Practices for Using Sumo Logic Notebooks for Data Science

This year, at Sumo Logic’s second annual user conference, Illuminate 2018, we presented Sumo Logic Notebooks as a way to do data science in Sumo Logic. Sumo Logic Notebooks are an experimental feature that integrates Sumo Logic, notebooks and common machine learning frameworks. They are a bold attempt to go beyond what the current Sumo Logic product has to offer and enable a data science workflow leveraging our core platform.

Why Notebooks?

In the data science world, notebooks have emerged as an important tool to do data science. Notebooks are active documents that are created by individuals or groups to write and run code, display results, and share outcomes and insights. Like every other story, a data science notebook follows a structure that is typical for its genre. We usually have four parts. We (a) start with defining a data set, (b) continue to clean and prepare the data, (c) perform some modeling using the data, and (d) interpret the results. In essence, a notebook should record an explanation of why experiments were initiated, how they were performed, and then display the results.

Anatomy of a Notebook

A notebook segments a computation in individual steps called paragraphs. A paragraph contains an input and an output section. Each paragraph executes separately and modifies the global state of the notebook. State can be defined as the ensemble of all relevant variables, memories, and registers. Paragraphs do not necessarily contain computations; they can also contain text or visualizations to illustrate the workings of the code. The input section contains the instruction to the notebook execution engine (sometimes called kernel or interpreter). The output section displays a trace of the paragraph’s execution and/or an intermediate result. In addition, the notebook software exposes some controls for managing and versioning notebook content as well as operational aspects such as starting and stopping executions.

Human Speed vs Machine Speed

The power of the notebook is rooted in its ability to segment and then slow down computation. Common executions of computer programs are done at machine speed. Machine speed suggests that when a program is submitted to the processor for execution, it will run from start to end as fast as possible and only block for IO or user input. Consequently, the state of the program changes so fast that it is neither observable, nor modifiable by humans. Programmers would typically attach debuggers physically or virtually to stop programs during execution at so-called breakpoints and read out and analyze their state. Thus, they would slow down execution to human speed. Notebooks make interrogating the state more explicit. Certain paragraphs are dedicated to making progress in the computation, i.e., advancing the state, whereas other paragraphs simply serve to read out and display the state. Moreover, it is possible to rewind state during execution by overwriting certain variables. It is also simple to kill the current execution, thereby deleting the state and starting anew.

Notebooks as an Enabler for Productivity

Notebooks increase productivity because they allow for incremental improvement. It is cheap to modify code and rerun only the relevant paragraph. So when developing a notebook, the user builds up state and then iterates on that state until progress is made. Running a stand-alone program, on the contrary, will incur more setup time and might be prone to side effects. A notebook will most likely keep all its state in working memory, whereas every new execution of a stand-alone program will need to build up the state every time it is run. This takes more time and the required IO operations might fail. Working off a program state in memory and iterating on it has proved to be very efficient. This is particularly true for data scientists, as their programs usually deal with a large amount of data that has to be loaded in and out of memory as well as computations that can be time-consuming. From an organizational point of view, notebooks are a valuable tool for knowledge management. As they are designed to be self-contained, sharable units of knowledge, they lend themselves to knowledge transfer, auditing and validation, and collaboration.

Notebooks at Sumo Logic

At Sumo Logic, we expose notebooks as an experimental feature to empower users to build custom models and analytics pipelines on top of log and metrics data sets. The notebooks provide the framework to structure a thought process. This thought process can be aimed at delivering a special kind of insight or outcome. It could be drilling down on a search. Or an analysis specific to a vertical or an organization. We provide notebooks to enable users to go beyond what Sumo Logic operators have to offer, and to train and test custom machine learning (ML) algorithms on their data. Inside notebooks we deliver data using data frames as a core data structure. Data frames make it easy to integrate logs and metrics with third-party data. Moreover, we integrate with other leading data wrangling, model management and visualization tools/services to provide a blend of the best technologies to create value with data.

Technology Stack

Sumo Logic Notebooks are an integration of several software packages that makes it easy to define data sets using the Sumo query language and use the resulting data set as a data frame in common machine learning frameworks. Notebooks are delivered as a Docker container and can therefore be installed on laptops or cloud instances without much effort. The most common machine learning libraries such as Apache Spark, pandas, and TensorFlow are pre-installed, but others are easy to add through Python’s pip installer, or using apt-get and other package management software from the command line. Changes can be made persistent by committing the Docker image. The key to Sumo Logic Notebooks is the integration of the Sumo Logic API data adapter with Apache Spark. After a query has been submitted, the adapter will load the data and ingest it into Spark. From there we can switch over to a Python/pandas environment or continue with Spark. The notebook software provides the interface to specify data science workflows.

Best Practices for Writing Notebooks

#1 One notebook, one focus
A notebook contains a complete record of procedures, data, and thoughts to pass on to other people. For that purpose, it needs to be focused. Although it is tempting to put everything in one place, this might be confusing for users. Better to write two or more notebooks than to overload a single notebook.

#2 State is explicit
A common source of confusion is that program state gets passed on between paragraphs through hidden variables. The set of variables that represents the interface between two subsequent paragraphs should be made explicit. Referencing variables from paragraphs other than the previous one should be avoided.

#3 Push code into modules
A notebook integrates code; it is not a tool for code development. That would be an Integrated Development Environment (IDE). Therefore, a notebook should only contain glue code and maybe one core algorithm. All other code should be developed in an IDE, unit tested, version controlled, and then imported via libraries into the notebook. Modularity and all other good software engineering practices are still valid in notebooks. As in practice number one, too much code clutters the notebook and distracts from the original purpose or analysis goal.

#4 Use descriptive variable names and tidy up your code
Notebooks are meant to be shared and read by others. Others might not have an easy time following our thought process if we did not come up with good, self-explanatory names. Tidying up the code goes a long way, too. Notebooks impose an even higher quality standard than traditional code.

#5 Label diagrams
A picture is worth a thousand words. A diagram, however, will need some words to label axes, describe lines and dots, and convey other important information such as sample size. A reader can have a hard time grasping the proportions or importance of a diagram without that information. Also keep in mind that diagrams are easily copy-pasted from the notebook into other documents or chats, where they lose the context of the notebook in which they were developed.

Bottom Line

The segmentation of a thought process is what fuels the power of the notebook. Facilitating incremental improvements when iterating on a problem boosts productivity. Sumo Logic enables the adoption of notebooks to foster the use of data science with logs and metrics data.

Additional Resources

Visit our Sumo Logic Notebooks documentation page to get started.
Check out Sumo Logic Notebooks on DockerHub or Read the Docs.
Read our latest press release announcing new platform innovations, including our new Data Science Insights innovation.
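To make best practices #2 and #3 a bit more concrete, here is a minimal, hypothetical sketch of two notebook paragraphs written in Python with pandas; the file name and column names are made up for illustration, and this is not the Sumo Logic notebook data adapter itself:

# Paragraph 1: build the state and make it explicit.
# The only variable handed to later paragraphs is cpu_df.
import pandas as pd

cpu_df = pd.read_csv("cpu_idle_export.csv")  # hypothetical export of query results
cpu_df["timestamp"] = pd.to_datetime(cpu_df["timestamp"], unit="s")

# Paragraph 2: glue code only; it reads cpu_df and displays a summary.
# Heavier transformations would live in an imported, unit-tested module.
avg_by_version = cpu_df.groupby("version")["value"].mean().rename("avg_cpu_idle")
print(avg_by_version)

The point is not the computation itself, but that the interface between the two paragraphs is a single, clearly named DataFrame, which keeps the notebook readable when it is shared.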

Blog

How to Monitor Azure Services with Sumo Logic

Blog

Illuminate Day Two Keynote Top Four Takeaways

Day two of Illuminate, Sumo Logic’s annual user conference, started with a security bang, hearing from our founders, investors, customers and a special guest (keep reading to see who)! If you were unable to attend the keynote in person, or watch via the Facebook Livestream, we’ve recapped the highlights below for you. If you are curious about the day one keynote, check out that recap blog post, as well.

#1: Dial Tones are Dead, But Reliability Is Forever

Two of our founders, Christian Beedgen and Bruno Kurtic, took the stage Thursday morning to kick off the second day keynote talk, and they did not disappoint.

Sumo Logic founders Bruno Kurtic (left) and Christian Beedgen (right) kicking off the day two Illuminate keynote

Although the presentation was full of cat memes, penguins and friendly banter, they delivered an earnest message: reliability, availability and performance are important to our customers, and are important to us at Sumo Logic. But hiccups happen, it’s inevitable, and that’s why Sumo Logic is committed to constantly monitoring for any hiccups so that we can troubleshoot instantly when they happen. The bottom line: our aspiration at Sumo Logic is to be the dial tone for those times when you absolutely need Sumo Logic to work. And we do that through total transparency. Our entire team has spent time on building a reliable service, built on transparency and constant improvement. It really is that simple.

#2: The Platform is the Key to Democratizing Machine Data (and, Penguins)

We also announced a number of new platform enhancements, solutions and innovations at Illuminate, all with the goal of improving our customers’ experiences. All of that goodness can be found in a number of places (linked at the end of this article), but what was most exciting to hear from Bruno and Christian on stage was what Sumo Logic is doing to address major macro trends. The first being proliferation of users and access. What we’ve seen from our customers is that the Sumo Logic platform is brought into a specific group, like the security team, or the development team, and then it spreads like wildfire, until the entire company (or all of the penguins) wants access to the rich data insights. That’s why we’ve taken an API-first approach to everything we do. To keep your workloads running around the globe, we now have 20 availability zones across five regions and we will continue to expand to meet customer needs. The second being cloud scale economics because Moore’s Law is, in fact, real. Data ingest trends are going up, and for years our customers have relied on Sumo Logic to manage mission-critical data in order to keep their modern applications running and secured. Not all data is created equal, and different data sets have different requirements. Sometimes, it can be a challenge to store data outside of the Sumo Logic platform, which is why our customers will now have brand new capabilities for basic and cold storage within Sumo Logic. (Christian can confirm that the basic storage is still secure — by packs of wolves). The third trend is around the unification of modern apps and machine data. While the industry is buzzing about observability, one size does not fit all. To address this challenge, the Sumo Logic team asked, what can we do to deliver on the vision of unification? The answer is in the data.
For the first time ever, we will deliver the State of Modern Applications report live, where customers can push their data to dynamic dashboards, and all of this information will be accessible in new easy-to-read charts that are API-first, templatized and most importantly, unified. Stay tuned for more on the launch of this new site!

#3: The State of Security from Greylock, AB InBev and Pokemon

One of my favorite highlights of the second day keynote was the security panel, moderated by our very own CSO, George Gerchow, with guests from one of our top investors, Greylock Partners, and two of our customers, Anheuser-Busch InBev (AB InBev) and Pokemon.

From left to right: George Gerchow, CSO, Sumo Logic; Sara Guo, partner, Greylock; Khelan Bhatt, global director, security architecture, AB InBev; John Visneski, director infosecurity & DPO, Pokemon

Sara Guo, general partner at Greylock, spoke about three constantly changing trends, or waves, she’s tracking in security, and what she looks for when her firm is considering an investment: the environment, the business and the attackers. We all know the IT environment is changing drastically, and as it moves away from on-premises protection, it’s not a simple lift and shift process, we have to actually do security differently. Keeping abreast of attacker innovation is also important for enterprises, especially as cybersecurity resources continue to be scarce. We have to be able to scale our products, automate, know where our data lives and come together as a defensive community. When you think of Anheuser-Busch, you most likely think of beer, not digital transformation or cybersecurity. But there’s actually a deep connection, said Khelan Bhatt, global director, security architecture, AB InBev. As the largest beer distributor in the world, Anheuser Busch has 500 different breweries (brands) in all corners of the world, and each one has its own industrial IoT components that are sending data back to massive enterprise data lakes. The bigger these lakes get, the bigger targets they become to attackers. Sumo Logic has played a big part in helping the AB InBev security team digitally transform their operations, and building secure enterprise data lakes to maintain their strong connection to the consumer while keeping that data secure. John Visneski, director of information security and data protection officer (DPO) for the Pokémon Company International, had an interesting take on how he and his team approach security. Be a problem solver first, and a security pro second. Although John brought on Sumo Logic to help him fulfill security and General Data Protection Regulation (GDPR) requirements, our platform has become a key business intelligence tool at Pokemon. With over 300 million active users, Pokemon collects sensitive personally identifiable information (PII) from children, including names, addresses and some geolocation data. Sumo Logic has been key for helping John and his team deliver on the company’s core values: providing child and customer safety, trust (and uninterrupted fun)!

#4: Being a Leader Means Being You, First and Foremost

When our very special guest, former CIA Director George Tenet, took the stage, I did not expect to walk away with some inspiring leadership advice. In a fireside chat with our CEO, Ramin Sayar, George talked about how technology has changed the threat landscape, and how nation-state actors are leveraging the pervasiveness of data to get inside our networks and businesses.
Data is a powerful tool that can be used for good or bad. At Sumo Logic, we’re in it for the good. George also talked about what it means to be a leader and how to remain steadfast, even in times of uncertainty. Leaders have to lead within the context of who they are as human beings. If they try to adopt a persona of someone else, it destroys their credibility. The key to leadership is self-awareness of who you are, and understanding your limitations so that you can hire smart, talented people to fill those gaps. Leaders don’t create followers, they create other leaders. And that’s a wrap for Sumo Logic’s second annual user conference. Thanks to everyone who attended and supported the event. If we didn’t see you at Illuminate over the last two days, we hope you can join us next year!

Additional Resources

For data-driven industry insights, check out Sumo Logic’s third annual ‘State of Modern Applications and DevSecOps in the Cloud’ report.
You can read about our latest platform innovations in our press release, or check out the cloud SIEM solution and Global Intelligence Service blogs.
Check out our recent blog for a recap of the day one Illuminate keynote.

Blog

Illuminate Day One Keynote Top Five Takeaways

Today kicked off day one of Sumo Logic’s second annual user conference, Illuminate, and there was no better way to start the day than with a keynote presentation from our CEO, Ramin Sayar, and some of our most respected and valued customers, Samsung SmartThings and Major League Baseball (MLB). The event was completely sold out and the buzz and excitement could be felt as customers, industry experts, thought leaders, peers, partners and employees made their way to the main stage. If you were unable to catch the talk in person or tune in for the Facebook livestream, then read on for the top five highlights from the day one keynote.

#1: Together, We’ve Cracked the Code to Machine Data

At Sumo Logic, we’re experts in all things data. But, to make sure we weren’t biased, we partnered with 451 Research earlier this year to better understand how the industry is using machine data to improve overall customer experiences in today’s digital world. We found that 60 percent of enterprises are using machine data analytics for business and customer insights, and to help support digital initiatives, usage and app performance. These unique findings have validated what we’ve been seeing within our own customer base over the past eight years — together, we can democratize machine data to make it easily accessible, understandable and beneficial to all teams within an organization. That’s why, as Ramin shared during the keynote, we’ve committed to hosting more meet-ups and global training and certification sessions, and providing more documentation, videos, Slack channels and other resources for our growing user base — all with the goal of ‘illuminating’ machine data for the masses, and to help customers win in today’s analytics economy.

#2: Ask, and You Shall Receive Continued Platform Enhancements

Day one was also a big day for some pretty significant platform enhancements and new solutions centered on three core areas: development, security and operations. The Sumo Logic dev and engineering teams have been hard at work, and have over 50 significant releases to show for it, all focused on meeting our customers’ evolving needs. Some of the newer releases on the Ops analytics side include Search Templates and Logs to Metrics. Search Templates empower non-technical users like customer support and product management to leverage Sumo Logic’s powerful analytics without learning the query language. Logs to Metrics allows users to extract business KPIs from logs and cost-effectively convert them to high performance metrics for long-term trending and analysis. We’ve been hard at work on the security side of things as well, and are happy to announce the new cloud SIEM solution that’s going to take security analytics one step further. Our customers have been shouting from the rooftop for years that their traditional on-premises SIEM tool and rules-based correlation have let them down, and so they’ve been stuck straddling the line between old and new. With this entirely new, first-of-its-kind cloud SIEM solution, customers have a single, unified platform in the cloud to help them meet their modern security needs. And we’re not done yet, there’s more to come.

#3: Samsung SmartThings is Changing the World of Connected IoT

Scott Vlaminck, co-founder and VP of engineering at Samsung SmartThings, shared his company’s vision for SmartThings to become the definitive platform for all IoT devices, in order to deliver the best possible smart home experience for their customers.
And, as Scott said on stage, Sumo Logic helps make that possible by providing continuous intelligence of all operational, security and business data flowing across the SmartThings IoT platform, which receives about 200,000 requests per second! Scott talked about the company’s pervasive usage of the Sumo Logic platform, in which 95 percent of employees use Sumo Logic to report on KPIs, customer service, product insights, security metrics and app usage trends, and partner health metrics to drive deeper customer satisfaction. Having a fully integrated tool available to teams outside of traditional IT and DevOps is what continuous intelligence means for SmartThings.

#4: Security is Everyone’s Responsibility at MLB

When Neil Boland, the chief information security officer (CISO) for Major League Baseball, took the stage, he shared how he and his security team are completely redefining what enterprise security means for a digital-first sports organization that has to manage, maintain and secure over 30 different leagues (which translates to 30 unique brands and 30 different attack vectors). Neil’s mission for 2018 is to blow up the traditional SIEM and MSSP models and reinvent them for his company’s 100 percent cloud-based initiatives. Neil’s biggest takeaway is that everyone at MLB is on the cybersecurity team, even non-technical groups like the help desk, and this shared responsibility helps strengthen overall security posture and continue to deliver uninterrupted sports entertainment to their fans. And Sumo Logic has been a force multiplier that helps Neil and his team achieve that collective goal.

#5: Community, Community, Community

Bringing the talk full circle, Ramin ended the keynote with a word about community, and how we are not only in it for our customers, but we’re in it with them, and we want to share data trends, usages, and best practices of the Sumo Logic platform with our ecosystem to provide benchmarking capabilities. That’s why today at Illuminate, we launched a new innovation — Global Intelligence Service — that is focused on three key areas: Industry Insights, Community Insights and Data Science Insights. These insights will help customers extend machine learning and insights to new teams and use cases across the enterprise, and these are only possible with Sumo Logic’s cloud-native, multi-tenant architecture. For data-driven industry insights, check out Sumo Logic’s third annual ‘State of Modern Applications and DevSecOps in the Cloud’ report. You can read about our latest platform innovations in our press release, or check out the cloud SIEM solution and Global Intelligence Service blogs.

Want the Day Two Recap?

If you couldn’t join us live for day two of Illuminate, or were unable to catch the Facebook livestream, check out our second day keynote recap blog for the top highlights.

Blog

Announcing the Sumo Logic Global Intelligence Service at Illuminate 2018

In today’s hyper-connected world, a company’s differentiation is completely dependent upon delivering a better customer experience, at scale, and at a lower cost than the competition. This is no easy feat, and involves a combination of many things, particularly adopting new technologies and architectures, as well as making better use of data and analytics. Sumo Logic is committed to helping our customers excel in this challenging environment by making it easier to adopt the latest application architectures while also making the most of their precious data.

The Power of the Platform

As a multi-tenant cloud-native platform, Sumo Logic has a unique opportunity to provide context and data to our customers that is not available anywhere else. Why is this? First of all, when an enterprise wants to explore new architectures and evaluate options, it is very difficult to find broad overviews of industry trends based on real-time data rather than surveys or guesswork. Second, it is difficult to find reliable information about how exactly companies are using technologies at the implementation level, all the way down to the configurations and performance characteristics. Finally, once implemented, companies struggle to make the best use of the massive amount of machine data exhaust from their applications, particularly for non-traditional audiences like data scientists. It is with this backdrop in mind that Sumo Logic is announcing the Global Intelligence Service today during the keynote presentation at our second annual user conference, Illuminate, in Burlingame, Calif. This unprecedented initiative of data democratization is composed of three primary areas of innovation.

Industry Insights — What Trends Should I be Watching?

Sumo Logic is continuing to build on the success of its recently released third annual ‘State of Modern Applications and DevSecOps in the Cloud’ report to provide more real-time and actionable insights about industry trends. In order to stay on top of a constantly changing technology landscape, this report is expanding to include more frequent updates and instant-access options to help customers develop the right modern application or cloud migration strategy for their business, operational and security needs.

Chart depicting clusters of AWS Services used frequently together

Community Insights — What Are Companies like us, and Teams like ours, Doing?

Sumo Logic is applying the power of machine learning to derive actionable insights for getting the most out of your technology investments. We have found that many engineering teams lack the right resources and education needed to make the best technology choices early on in their prototyping phases. And then, when the system is in production, it is often too late to make changes. That’s why Sumo Logic has an opportunity to save our customers pain and frustration by giving them benchmarking and comparison information when they most need it. We all like to think that our use cases are each a beautiful, unique snowflake. The reality is that, while each of us is unique, our uses of technology fall into some predictable clusters.
So, looking over a customer base of thousands, Sumo Logic can infer patterns and best practices about how similar organizations are using technologies. Using those patterns, we will build recommendations and content for our customers that can be used to compare performance against a baseline of usage across their peers.

Chart depicting how the performance behavior across customers tends to cluster

Data Science Insights — Data Scientists Need Love, Too

Data scientists are under more pressure than ever to deliver stunning results, while also getting pushback from society about the quality of their models and the biases that may or may not be there. At the end of the day, while data scientists have control over their models, they may have less control over the data. If the data is incomplete or biased in any way, that can directly influence the results. To alleviate this issue, Sumo Logic is providing an open source integration with the industry standard Jupyter and Apache Zeppelin notebooks in order to make it easier for data scientists to leverage the treasure trove of knowledge currently buried in their application machine data.

Empower the People who Power Modern Business

You may still be wondering, why does all of this matter? At the end of the day, it is all about making our customers successful by making their people successful. A business is only as effective as the people who do the work, and it is our mission at Sumo Logic to empower those users to excel in their roles, which in turn contributes to overall company growth and performance. And we also want to set users outside of the traditional IT, DevOps, and security teams up for success as well by making machine data analytics more accessible for them. So, don’t forget that you heard it here first: Democratizing machine data is all about empowering the people with love (and with unique machine data analytics and insights)!

Additional Resources

Download the 2018 ‘State of Modern Applications and DevSecOps in the Cloud’ report and/or read the press release for more detailed insights.
Read the Sumo Logic platform enhancement release to learn more about our latest platform enhancements and innovations.
Sign up for Sumo Logic for free.
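The Jupyter and Apache Zeppelin integration mentioned above is the supported route into notebooks, but to make the general flow concrete, here is a rough, hypothetical sketch of pulling the results of a Sumo Logic query into a pandas DataFrame inside a notebook via the public Search Job API; the API hostname, credentials and query string are placeholders, minimal error handling is omitted, and the open source integration ships its own data adapter, so treat this purely as an illustration:

import time
import requests
import pandas as pd

API = "https://api.sumologic.com/api/v1"   # regional deployments use different hostnames
session = requests.Session()               # the Search Job API expects cookies to persist across calls
session.auth = ("ACCESS_ID", "ACCESS_KEY") # placeholder credentials

# 1. Create a search job for an aggregate query over a fixed time range (query is hypothetical).
job = session.post(f"{API}/search/jobs", json={
    "query": "_sourceCategory=prod/app | count by status_code",
    "from": "2018-09-01T00:00:00",
    "to": "2018-09-02T00:00:00",
    "timeZone": "UTC",
}).json()

# 2. Poll the job until all results have been gathered.
while session.get(f"{API}/search/jobs/{job['id']}").json()["state"] != "DONE GATHERING RESULTS":
    time.sleep(5)

# 3. Fetch the aggregate records and load them into a pandas DataFrame for further analysis.
records = session.get(
    f"{API}/search/jobs/{job['id']}/records",
    params={"offset": 0, "limit": 100},
).json()["records"]

df = pd.DataFrame([r["map"] for r in records])
print(df.head())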

Blog

Introducing Sumo Logic’s New Cloud SIEM Solution for Modern IT

Blog

Sumo Logic's Third Annual State of Modern Apps and DevSecOps in the Cloud Report is Here!

Blog

Why Cloud-Native is the Way to Go for Managing Modern Application Data

Are your on-premises analytics and security solutions failing you in today’s digital world? Don’t have the visibility you need across your full application stack? Unable to effectively monitor, troubleshoot and secure your microservices and multi-cloud architectures? If this sounds like your organization, then be sure to watch this short video explaining why a cloud-native, scalable and elastic machine data analytics platform approach is the right answer for building, running and securing your modern applications and cloud infrastructures. To learn more about how Sumo Logic is uniquely positioned to offer development, security and operations (DevSecOps) teams the right tools for their cloud environments, watch our Miles Ahead in the Cloud and DevOps Redemption videos, visit our website or sign up for Sumo Logic for free here.

Video Transcription

You’ve decided to run your business in the cloud. You chose this to leverage all the benefits the cloud enables – speed to rapidly scale your business; elasticity to handle the buying cycles of your customers; and the ability to offload data center management headaches to someone else so you can focus your time, energy and innovation on building a great customer experience. So, when you need insights into your app to monitor, troubleshoot or learn more about your customers, why would you choose a solution that doesn’t work the same way? Why would you manage your app with a tool that locks you into a peak support contract, one that’s not designed to handle the unpredictability of your data? Sumo Logic is a cloud-native, multi-tenant service that lets you monitor, troubleshoot, and secure your application with the same standards of scalability, elasticity and security you hold yourself to. Sumo Logic is built on a modern app stack for modern app stacks. Its scalable … elastic … resilient cloud architecture has the agility to move as fast as your app moves, quickly scaling up for data volume. Its advanced analytics based on machine learning are designed to cope with change. So, when that data volume spikes, Sumo Logic is there with the capacity and the answers you need. Sumo Logic is built with security as a 1st principle. That means security is baked in at the code level, and that the platform has the credentials and attestations you need to manage compliance for your industry. Sumo Logic’s security analytics and integrated threat intelligence also help you detect threats and breaches faster, with no additional costs. Sumo Logic delivers all this value in a single platform solution. No more swivel chair analytics to slow you down or impede your decision-making. You have one place to see and correlate the continuum of operations, security and customer experience analytics – this is what we call continuous intelligence for modern apps. So, don’t try to support your cloud app with a tool that was designed for the old, on-premise world, or a pretend cloud-tool. Leverage the intelligence solution that fully replicates what you’re doing with your own cloud-business — Sumo Logic, the industry leading, cloud-native, machine data analytics platform delivered to you as a service. Sumo Logic. Continuous Intelligence for Modern Applications.

Blog

Top Reasons Why You Should Get Sumo Logic Certified, Now!

Blog

How Our Customers Influence the Sumo Logic Product

Sumo Logic is no different than most companies — we are in the service of our customers and we seek to build a product that they love. As we continue to refine the Sumo Logic platform, we’re also refining our feedback loops. One of those feedback loops is internal dogfooding and learning how our own internal teams, such as engineering, sales engineering and customer success, experience the newest feature. However, we know that that approach can be biased. Our second feedback loop is directly from our customers, whose thoughts are then aggregated, distilled and incorporated into the product. The UX research team focuses on partnering with external customers as well as internal Sumo Logic teams that regularly use our platform, to hear their feedback and ensure that the product development team takes these insights into account as they build new capabilities.

Our Product Development Process

Sumo Logic is a late-stage startup, which means that we’re in the age of scaling our processes to suit larger teams and to support new functions. The processes mentioned are in various stages of maturity, and we haven’t implemented all of these to a textbook level of perfection (yet!). Currently, there are two facets to the product development process. The first is the discovery side, for capabilities that are entirely new, while the second is focused on delivery and improving capabilities that currently exist in the product. The two sides run concurrently, as opposed to sequentially, with the discovery side influencing the delivery side. The teams supporting both sides are cross-functional in nature, consisting of engineers, product managers and product designers.

Adapted from Jeff Patton & Associates

Now that we’ve established the two aspects to product development, we’ll discuss how customer feedback fits into this. Customer feedback is critical to all product decisions at Sumo Logic. We seek out the opinions of our customers when the product development team has questions that need answers before they can proceed. Customer feedback usually manifests in two different categories: broad and granular.

Broad Customer Questions

The more high level questions typically come from the discovery side. For example, we may get a question like this: “should we build a metrics product?” For the teams focused on discovery, UX research starts with a clear hypothesis and is more open-ended and high level. It may consist of whiteboarding with our customers or observing their use cases in their workspaces. The insights from this research might spawn a new scrum team to build a capability, or the insights could indicate we should focus efforts elsewhere.

Granular Customer Questions

By contrast, UX research for delivery teams is much more focused. The team likely has designs or prototypes to illustrate the feature that they’re building, and their questions tend to focus on discoverability and usability. For instance, they may be wondering if customers can find which filters apply to which dashboard panels. The outcomes from this research give the team the necessary data to make decisions and proceed with design and development. Occasionally, the findings from the discovery side will influence what’s going on on the delivery side.

The UX Research Process at Sumo Logic

The diagram below describes the milestones during our current UX research process, for both discovery and delivery teams. As a customer, the most interesting pieces of this are the Research Execution and the Report Presentation, as these include your involvement as well as how your input impacts the product.

UX Research Execution

Research execution takes a variety of forms, from on-site observation to surveys to design research with a prototype. As a customer, you’re invited to all types of research, and we are always interested in your thoughts. Our ideal participants are willing to share how they are using the Sumo Logic platform for their unique operational, security and business needs, and to voice candid opinions. Our participants are also all over the emotional spectrum, from delighted to irritated, and we welcome all types. The immediate product development team takes part in the research execution. For example, if we’re meeting with customers via video conference, we’ll invite engineers, product management and designers to observe research sessions. There’s a certain realness for the product development team when they see and hear a customer reacting to their work, and we’ve found that it increases empathy for our customers. This is very typical for our qualitative UX research sessions, and what you can expect as a participant. In the above clip, Dan Reichert, a Sumo Logic sales engineer, discusses his vision for a Data Allocation feature to manage ingest.

Research Presentation

After the UX research team has executed the research, we’ll collect all data, video, photos and notes. We’ll produce a report with the key and detailed insights from the research, and we’ll present the report to the immediate product development team. These report readouts tend to be conversational, with a lengthy discussion of the results, anecdotes and recommendations from the UX researcher. I’ve found that the teams are very interested in hearing specifics of how our customers are using the product, and how their efforts will influence that. After the report readout, the product development team will meet to discuss how they’ll implement the feedback from the study. The UX researcher will also circulate the report to the larger product development team for awareness. The insights are often useful for other product development teams, and occasionally fill in knowledge gaps for them.

How Can I Voice My Thoughts and Get Involved in UX Research at Sumo Logic?

We’d love to hear how you’re using Sumo Logic, and your feedback for improvement. We have a recruiting website to collect the basics, as well as your specific interests within the product. Our UX research team looks forward to meeting you!

Blog

Understanding Sumo Logic Query Language Design Patterns

Blog

A Look Inside Being a Web UI Engineering Intern at Sumo Logic

Hello there! My name is Sam and this summer I’ve been an intern at Sumo Logic. In this post I’ll share my experience working on the web UI engineering team and what I learned from it. A year ago I started my Master of Computer Science degree at Vanderbilt University and since the program is only two years long, there’s only one internship slot before graduation. So I needed to find a good one. Like other students, I wanted the internship to prepare me for my future career by teaching me about work beyond just programming skills while also adding a reputable line to my resume. So after months of researching, applying, preparing and interviewing, I officially joined the Sumo Logic team in May.

The Onboarding Experience

The first day was primarily meeting a lot of new people, filling out paperwork, setting up my laptop and learning which snacks are best at the office (roasted almonds take the win). The first couple of weeks were a heads-down learning period. I was learning about the Sumo Logic machine data analytics platform — everything from why it is used and how it works to what it is built on. We also had meetings with team members who explained the technologies involved in the Sumo Logic application. In general, though, the onboarding process was fairly flexible and open ended, with a ton of opportunities to ask questions and learn. Specifically, I enjoyed watching React courses as a part of my onboarding. In school I pay to learn this, but here I am the one being paid 🙂

Culture and Work Environment

The culture and work environment are super nice and relaxed. The developers are given a lot of freedom in how and what they are working on, and the internship program is very adaptable. I was able to shape my role throughout the internship to focus on tasks and projects that were interesting to me. Of course, the team was very helpful in providing direction and answering my questions, but it was mostly up to me to decide what I would like to do. The phrase that I remember best was from my manager. In my second week at Sumo Logic he said: “You don’t have to contribute anything — the most important thing is for you to learn.” The thing that surprised me the most at Sumo Logic is how nice everyone is. This is probably the highest “niceness per person” ratio I’ve ever experienced in my life. Almost every single person I’ve met here is super friendly, humble, open minded and smart. These aspects of the culture helped me greatly.

Summer Outside Sumo Logic

One of the important factors in choosing a company for me was its location. I am from Moscow, Russia, and am currently living in Nashville while I attend Vanderbilt, but I knew that this summer I definitely wanted to find an internship in the heart of the tech industry — Silicon Valley. Lucky for me, Sumo Logic is conveniently located right in the middle of it in Redwood City. I also enjoyed going to San Francisco on weekends to explore the city, skateboarding to Stanford from my home and visiting my friend at Apple’s Worldwide Developers Conference (WWDC) in San Jose. I liked the SF Bay Area so much that I don’t want to work anywhere else in the foreseeable future!

Actual Projects: What Did I Work On?

The main project that I work on is a UI component library. As the company quickly grows, we strive to make the UI components more consistent — both visually and in written standards — and the code more maintainable. We also want to simplify the communication about the UI between the Dev and Design teams. I was very excited about the future impact and benefit of this project for the company, and had asked to join the team in this effort. A cool thing about this library is that it is a collection of fresh and independent React components that will then be used by developers in the creation of all parts of the Sumo Logic app. It is a pleasure to learn the best practices while working with cutting edge libraries like React. If that sounds interesting to you, check out this blog from one of my Sumo Logic colleagues on how to evaluate and implement react table alternatives into your project.

Things I Learned That I Didn’t Know Before

How professional development processes are structured
How companies work, grow and evolve
How large projects are organized and maintained
How to communicate and work on a team
What a web-scale application looks like from the inside
And, finally, how to develop high quality React components

Final Reflection

Overall, I feel like spending three months at Sumo Logic was one of the most valuable and educational experiences I’ve ever had. I received a huge return on investment of time and moved much closer to my future goals of gaining relevant software development knowledge and skills to set me up for a successful career post-graduation.

Additional Resources

Want to stay in touch with Sumo Logic? Follow & connect with us on Twitter, LinkedIn and Facebook for updates. If you want to learn more about our machine data analytics platform visit our “how it works” page!

Blog

Black Hat 2018 Buzzwords: What Was Hot in Security This Year?

It’s been a busy security year, with countless twists and turns, mergers, acquisitions and IPOs, and most of that happening in the lead up to one of the biggest security conferences of the year — Black Hat U.S.A. Each year, thousands of hackers, security practitioners, analysts, architects, executives/managers and engineers from varying industries and from all over the country (and world) descend on the desert lands of the Mandalay Bay Resort & Casino in Las Vegas for more than a week of trainings, educational sessions, networking and the good kind of hacking (especially if you stayed behind for DefCon26). Every Black Hat has its own flavor, and this year was no different. So what were some of the “buzzwords” floating around the show floor, sessions and networking areas? The Sumo Logic security team pulled together a list of the hottest, newest, and some old, but good terms that we overheard and observed during our time at Black Hat last week. Read on for more, including a recap of this year’s show trends.

And the Buzzword is…

APT — Short for advanced persistent threat
Metasploit — Provides information about security vulnerabilities and is used in pen testing
Pen Testing (or Pentesting) — Short for penetration testing. Used to discover security vulnerabilities
OSINT — Short for open source intelligence technologies
XSS — Short for cross site scripting, which is a type of attack commonly launched against web sites to bypass access controls
White Hat — Security slang for an “ethical” hacker
Black Hat — A hacker who violates computer security for little reason beyond maliciousness or personal gain
Red Team — Tests the security program (Blue Team) effectiveness by using techniques that hackers would use
Blue Team — The defenders against Red Team efforts and real attackers
Purple Team — Responsible for ensuring the maximum effectiveness of both the Red and Blue Teams
Fuzzing or Fuzz Testing — Automated software that feeds invalid, unexpected or random data as inputs to a computer program that is typically looking for structured content, i.e. first name, last name, etc.
Blockchain — Widely used by cryptocurrencies to distribute expanding lists of records (blocks), such as transaction data, which are virtually “chained” together by cryptography. Because of their distributed and encrypted nature the blocks are resistant to modification of the data.
SOC — Short for security operations center
NOC — Short for network operations center

Black Hat 2018 Themes

There were also some pretty clear themes that bubbled to the top of this year’s show. Let’s dig into them.

The Bigger, the Better….Maybe

Walking the winding labyrinth that is the Mandalay Bay, you might have overheard conference attendees complaining that this year, Black Hat was bigger than in years past, and to accommodate for this, the show was more spread out. The business expo hall was divided between two rooms: a bigger “main” show floor (Shoreline), and a second, smaller overflow room (Oceanside), which featured companies new to the security game, startups or those not ready to spend big bucks on flashy booths. While it may have been a bit confusing or a nuisance for some to switch between halls, the fact that the conference is outgrowing its own space is a good sign that security is an important topic and more organizations are taking a vested interest in it.

Cloud is the Name, Security is the Game

One of the many themes at this year’s show was definitely all things cloud.
Scanning the booths, you would have noticed terms around security in the cloud, how to secure the cloud, and similar messaging. Cloud has been around for a while, but seems to be having a moment in security, especially as new, agile cloud-native security players challenge some of the legacy on-premises vendors and security solutions that don’t scale well in a modern cloud, container or serverless environment. In fact, according to recent Sumo Logic research, 93 percent of responding enterprises face challenges with security tools in the cloud, and 49 percent state that existing legacy tools aren’t effective in the cloud.

Roses are Red, Violets are Blue, FUD is Gone, Let’s Converge

One of the biggest criticisms of security vendors (sometimes by other security vendors) is all of the language around fear, uncertainty and doubt (FUD). This year, it seems that many vendors have ditched the fearmongering and opted for collaboration instead. Walking the expo halls, there was a lot of language around “togetherness,” “collaboration” and the general positive sentiment that bringing people together to fight malicious actors is more helpful than going at it alone in siloed work streams. Everything was more blue this year. Usually, you see the typical FUD coloring: reds, oranges, yellows and blacks, and while there was still some of that, the conference felt brighter and more uplifting this year with purples, all shades of blues, bright greens, and surprisingly… pinks! There was also a ton of signage around converging development, security and operations teams (DevSecOps or SecOps) and messaging, again, that fosters an “in this together” mentality that creates visibility across functions and departments for deeper collaboration. Many vendors, including Sumo Logic, have been focusing on security education, offering and promoting their security training, certification and educational courses to make sure security is a well-understood priority for stakeholders across all lines of the business. Our recent survey findings also validate the appetite for converging workflows, with 54 percent of respondents citing a greater need for cross-team collaboration (DevSecOps) to effectively investigate, prioritize and correlate threats for faster remediation. Three cheers for that!

Sugar and Socks and Everything FREE

Let’s talk swag. Now this trend is not entirely specific to Black Hat, but it seems each year, the booth swag gets sweeter (literally) with vendors offering doughnut walls, chocolates, popcorn and all sorts of tasty treats to reel people into conversation (and get those badge scans). There’s no shortage of socks either! Our friends at HackerOne were giving out some serious booth swag, and you better believe we weren’t headed home without grabbing some! Side note: Read the latest HackerOne blog or watch the latest SnapSecChat video to learn how our Sumo Logic security team has taken a DevSecOps approach to bug bounties that creates transparency and collaboration between hackers, developers, and external auditors to improve security posture. Sumo swag giveaways were in full swing at our booth, as well. We even raffled off a Vento drone for one lucky Black Hat winner to take home!

Parting Thoughts

As we part ways with 100 degree temps and step back into our neglected cubicles or offices this week, it’s always good to remember the why. Why do we go to Black Hat, DefCon, BSides, and even RSA?
It’s more than socializing and partying; it’s to connect with our community, to learn from each other and to make the world a more secure and better place for ourselves, and for our customers. And with that, we’ll see you next year!

Additional Resources

For the latest Sumo Logic cloud security analytics platform updates, features and capabilities, read the latest press release.
Want to learn more about Sumo Logic security analytics and threat investigation capabilities? Visit our security solutions page.
Interested in attending our user conference next month, Illuminate? Visit the webpage, or check out our latest “Top Five Reasons to Attend” blog for more information.
Download and read our 2018 Global Security Trends in the Cloud report or the infographic for more insights on how the security and threat landscape is evolving in today’s modern IT environment of cloud, applications, containers and serverless computing.

Blog

Top Five Reasons to Attend Illuminate18

Last year Sumo Logic launched its first user conference, Illuminate. We hosted more than 300 fellow Sumo Logic users who spent two days getting certified, interacting with peers to share best practices and lots of mingling with Sumo’s technical experts (all while having fun). The result? Super engaged users with a new toolbox to take back to their teams to make the most of their Sumo Logic platform investment, and get the real-time operational and security insights needed to better manage and secure their modern applications and cloud infrastructures. Watch last year’s highlight reel below: This piece of feedback from one attendee sums up the true value of Illuminate: “In 48 hours I already have a roadmap of how to maximize the use of Sumo Logic at my company and got a green light from my boss to move forward.” — Sumo Logic Customer / Illuminate Attendee

Power to the People

This year’s theme for Illuminate is “Empowering the People Who Power Modern Business” and is expected to attract more than 500 attendees who will participate in a unique interactive experience including over 40 sessions, Ask the Expert bar, partner showcase and Birds of a Feather roundtables. Not enough to convince you to attend? Here are five more reasons:

Get Certified – Back by popular demand, our multi-level certification program provides users with the knowledge, skills and competencies to harness the power of machine data analytics and maximize investments in the Sumo Logic platform. Bonus: we have a brand new Sumo Security certification available at Illuminate this year designed to teach users how to increase the velocity and accuracy of threat detection and strengthen overall security posture.

Hear What Your Peers are Doing – Get inspired and learn directly from your peers like Major League Baseball, Genesys, USA TODAY NETWORK, Wag, Lending Tree, Samsung SmartThings, Informatica and more about how they implemented Sumo Logic and are using it to increase productivity, revenue, employee satisfaction, deliver the best customer experiences and more. You can read more about the keynote speaker line up in our latest press release.

Technical Sessions…Lots of Them – This year we’ve broadened our breakout sessions into multiple tracks including Monitoring and Troubleshooting, Security Analytics, Customer Experience and Dev Talk covering tips, tricks and best practices for using Sumo Logic around topics including Kubernetes, DevSecOps, Metrics, Advanced Analytics, Privacy-by-Design and more.

Ask the Experts – Get direct access to expert advice from Sumo Logic’s product and technical teams. Many of these folks will be presenting sessions throughout the event, but we’re also hosting an Ask the Expert bar where you can get all of your questions answered, see demos, get ideas for dashboards and queries, and see the latest Sumo Logic innovations.

Explore the Modern App Ecosystem – Sumo Logic has a rich ecosystem of partners and we have a powerful set of joint integrations across the modern application stack to enhance the overall manageability and security for you. Stop by the Partner Pavilion to see how Sumo Logic works with AWS, Carbon Black, CrowdStrike, JFrog, LightStep, MongoDB, Okta, OneLogin, PagerDuty, Relus and more.

By now you’re totally ready for the Illuminate experience, right? Check out the full conference agenda here.
These two days will give you all of the tools you need (training, best practices, new ideas, peer-to-peer networking, access to Sumo’s technical experts and partners) so you can hit the ground running and maximize the value of the Sumo Logic platform for your organization. Register today, we look forward to seeing you there!

Blog

Get Miles Ahead of Security & Compliance Challenges in the Cloud with Sumo Logic

Blog

SnapSecChat: A DevSecOps Approach to Bug Bounties with Sumo Logic & HackerOne

Regardless of industry or size, all organizations need a solid security and vulnerability management plan. One of the best ways to harden your security posture is through penetration testing and inviting hackers to hit your environment to look for weak spots or holes in security. However, for today’s highly regulated, modern SaaS companies, the traditional check-box compliance approach to pen testing is failing them because it’s slowing them down from innovating and scaling. That’s why Sumo Logic’s Chief Security Officer and his team have partnered with HackerOne to implement a modern bug bounty program that takes a DevSecOps approach. They’ve done this by building a collaborative community for developers, third-party auditors and hackers to interact and share information in an online portal that creates a transparent bug bounty program that uses compliance to strengthen security. Pushing the boundaries and breaking things collectively makes us stronger, and it also gives our auditors a peek inside the kimono and more confidence in our overall security posture. It also moves the rigid audit process into the DevSecOps workflow for faster and more effective results. To learn more about Sumo Logic’s modern bug bounty program, the benefits and overall positive impact it’s had on not just the security team, but all lines of the business, including external stakeholders like customers, partners and prospects, watch the latest SnapSecChat video series with Sumo Logic CSO, George Gerchow. And if you want to hear about the results of Sumo Logic’s four bounty challenge sprints, head on over to the HackerOne blog for more. If you enjoyed this video, then be sure to stay tuned for another one coming to a website near you soon! And don’t forget to follow George on Twitter at @GeorgeGerchow, and use the hashtag #SnapSecChat to join the security conversation! Stop by Sumo Logic’s booth (2009) at Black Hat this week, Aug 8-9, 2018, at The Mandalay Bay in Las Vegas to chat with our experts and to learn more about our cloud security analytics and threat investigation capabilities. Happy hacking!

Blog

Building Replicated Stateful Systems using Kafka as a Commit Log

Blog

Employee Spotlight: A Dreamer with a Passion for Product Design & Mentoring

In this Sumo Logic Employee Spotlight we interview Rocio Lopez. A lover of numbers, Rocio graduated from Columbia University with a degree in economics, but certain circumstances forced her to forego a career in investment banking and instead begin freelancing until she found a new career that suited her talents and passions: product design. Intrigued? You should be! Read Rocio’s story below. She was a delight to interview! When Creativity Calls Q: So tell me, Rocio, what’s your story? Rocio Lopez (RL): I am a product designer at Sumo Logic and focus mostly on interaction design and prototyping new ideas that meet our customers’ needs. Q: Very cool! But, that’s not what you went to school for, was it? RL: No. I studied economics at Columbia. I wanted to be an investment banker. Ever since I was a little girl, I’ve been a nerd about numbers and I love math. Part of it was because I remember when the Peso was devalued and my mom could no longer afford to buy milk. I became obsessed with numbers and this inspired my college decision. But the culture and career path at Columbia was clear — you either went into consulting or investment banking. I spent a summer shadowing at Citigroup (this was during the height of the financial crisis), and although my passion was there, I had to turn down a career in finance because I was here undocumented. Q: That’s tough. So what did you do instead? RL: When I graduated in 2011, I started doing the things I knew how to do well like using Adobe Photoshop and InDesign to do marketing for a real estate company or even doing telemarketing. I eventually landed a gig designing a database for a company called Keller Williams. They hired an engineer to code the database, but there was no designer around to think through the customer experience so I jumped in. Q: So that’s the job that got you interested in product design? RL: Yes. And then I spent a few years at Cisco in the marketing organization where they needed help revamping their training platforms. I started doing product design without even knowing what it was until a lead engineer called it out. I continued doing small design projects, started freelancing and exploring on my own until I connected with my current manager, Daniel Castro. He was hiring for a senior role, and while I was not that senior, the culture of the team drew me in. Q: Can you expand on that? RL: Sure. The design team at Sumo Logic is very unique. I’ve spent about seven years total in the industry and what I’ve been most impressed by is the design culture here, and the level of trust and level-headedness the team has. I’ve never come across this before. You would think that because we’re designing an enterprise product that everyone would be very serious and buckled up, but it’s the opposite. The Life of a Dreamer Q: Let’s switch gears here. I heard you on NPR one morning, before I even started working at Sumo Logic. Tell me about being a dreamer. RL: People come to the U.S. undocumented because they don’t know of other ways to come legally or the available paths for a visa aren’t a match for them because they may not have the right skills. And those people bring their families. I fell into that category. I was born in Mexico but my parents came over to the U.S. seeking a better life after the Tequila crisis. I grew up in Silicon Valley and went to school like any other American kid. 
When Barack Obama was in office, he created an executive order known as the Deferred Action for Childhood Arrivals (DACA) program, since Congress had failed to pass legislative action since 2001. To qualify for the program, applicants had to have arrived in the U.S. before age 16, lived here continuously since June 15, 2007, and pass a rigorous background check by homeland security every two years. I fell into this category and was able to register in this program. Because most of these immigrants were brought here as young children, we’ve sort of been nicknamed “dreamers” after the 2001 DREAM Act (short for Development, Relief and Education for Alien Minors Act). Q: And under DACA you’ve been able to apply for a work permit? RL: That’s right. I have a work permit, I pay income taxes, and I was able to attend college just like a U.S. citizen, although I am still considered undocumented and that comes with certain limitations. For instance, my employer cannot sponsor me and I cannot travel outside the United States. The hope was that Congress would create a path to citizenship for Dreamers, but now that future is a bit uncertain after they failed to meet the deadline to pass a bill in March. For now I have to wait until the Supreme Court rules on the constitutionality of DACA to figure out my future plans. Q: I can only imagine how difficult this is to live with. What’s helped you through it? RL: At first I was a big advocate, but now I try to block it out and live in the present moment. And the opportunity to join the Sumo Logic design team came at the right time in my life. I can’t believe what I do every day is considered work. The team has a unique way of nurturing talent and it’s something I wish more companies would do. Our team leaders make sure we have fun in addition to getting our work done. We usually do team challenges, dress-up days, etc. that really bring us all together to make us feel comfortable, encourage continued growth, and inspire us to feel comfortable speaking up with new ideas. I feel like the work I am doing has value and is meaningful, and we are at the positive end of the “data conversation.” I read the news and see the conversations taking place with companies like Facebook and Airbnb that are collecting our personal data. It’s scary to think about. And it feels good to be on the other side of the conversation, on the good side of data, and that’s what gets me excited and motivated. Sumo Logic is collecting data and encrypting it, and because we’re not on the consumer-facing side, we can control the lens of how people see that data. We can control not only the way our customers collect data but also how they parse and visualize it. I feel we’re at the cusp of a big industry topic that’s going to break in the next few years. Q: I take it you’re not on social media? RL: No. I am completely off Facebook and other social media platforms. When I joined Sumo Logic, I became more cautious of who I was giving my personal data to. Advice for Breaking into Design & Tech? Q: Good for you! So what advice do you have for people thinking of switching careers? RL: From 2011 to now I’ve gone through big career changes. There are a lot of people out there who need to understand how the market is shifting, that some industries, like manufacturing, are not coming back, and that requires an adaptive mindset. 
The money and opportunity are where technology and data are, and if people can’t transition to these new careers in some capacity, they’re going to be left out of the economy and will continue to have problems adjusting. It’s a harsh reality, but we have to be able to make these transitions because 15 or 20 years from now, the world will look very different. I’ve been very active in mentoring people who want to break into technology but aren’t sure how. Q: What’s some of the specific advice related to a career path in UX/design that you give your mentees? RL: Sometimes you have to break away from traditions like school or a master’s program and prioritize job experience. Design and engineering are about showing you’ve done something, showing a portfolio. If you can change your mindset to this, you will be able to make the transition more smoothly. I also want to reiterate that as people are looking for jobs or next careers, it’s important to find a place that is fun and exciting. A place where you feel comfortable and can be yourself and also continue to grow and learn. Find meaning, find value, and find the good weird that makes you successful AND happy. Stay in Touch Stay in touch with Sumo Logic & connect with us on Twitter, LinkedIn and Facebook for updates. Want to work here? We’re hiring! Check out our careers page to join the team. If you want to learn more about our machine data analytics platform, visit our “how it works” page!

August 1, 2018

Blog

Postmortems Considered Beautiful

Outages and postmortems are a fact of life for any software engineer responsible for managing a complex system. And it can be safely said that those two words – “outage” and “postmortem,” do not carry any positive connotations in the remotest sense of the word. In fact, they are generally dreaded by most engineers. While that sentiment is understandable given the direct impact of such incidents on customers and the accompanying disruption, our individual perspective matters a lot here as well. If we are able to look beyond the damage caused by such incidents, we might just realize that outages and postmortems shouldn’t be “dreaded,” but instead, wholeheartedly embraced. One has to only try, and the negative vibes associated with these incidents may quickly give way to an appreciation of the complexity in modern big data systems. The Accidental Harmony of Layered Failures As cliche as it may sound, “beauty” indeed lies in the eyes of the beholder. And one of the most beautiful things about an outage/postmortem is the spectacular way in which modern big data applications often blow up. When they fail, there are often dozens of things that fail simultaneously, all of which collude, resulting in an outage. This accidental harmony among failures and the dissonance among the guards and defenses put in place by engineers, is a constant feature of such incidents and is always something to marvel at. It’s almost as if the resonance frequencies of various failure conditions match, thereby amplifying the overall impact. What’s even more surprising is the way in which failures at multiple layers can collude. For example, it might so happen that an outage-inducing bug is missed by unit tests due to missing test cases, or even worse, a bug in the tests! Integration tests in staging environments may have again failed to catch the bug, either due to a missing test case or disparity in the workload/configuration of staging/production environments. There could also be misses in monitoring/alerting, resulting in increased MTTIs. Similarly, there may be avoidable process gaps in the outage handling procedure itself. For example, some on-calls may have too high of an escalation timeout for pages or may have failed to update their phone numbers in the pager service when traveling abroad (yup, that happens too!). Sometimes, the tests are perfect, and they even catch the error in staging environments, but due to a lack of communication among teams, the buggy version accidentally gets upgraded to production. Outages are Like Deterministic Chaos In some sense, these outages can also be compared to “deterministic chaos” caused by an otherwise harmless trigger that manages to pierce through multiple levels of defenses. To top it off, there are always people involved at some level in managing such systems, so the possibility of a mundane human error is never too far away. All in all, every single outage can be considered as a potential case study of cascading failures and their layered harmony. An Intellectual Journey Another very deeply satisfying aspect of an outage/postmortem is the intellectual journey from “how did that happen?” to “that happened exactly because X, Y, Z.” Even at the system level, it’s necessary to disentangle the various interactions and hidden dependencies, discover unstated assumptions and dig through multiple layers of “why’s” to make sense of it all. 
When properly done, root cause analysis for outages of even moderately complex systems demands a certain level of tenacity and perseverance, and the fruits of such labor are worthwhile in and of themselves. There is a certain joy in putting the pieces of a puzzle together, and outages/postmortems present us with exactly that opportunity. Besides the above intangibles, outages and their subsequent postmortems have other very tangible benefits. They not only help develop operational knowledge, but also provide a focused path (within the scope of the outage) to learn about the nitty-gritty details of the system. At the managerial level too, they can act as road signs for course correction and help get the priorities right. Of course, none of the above is an excuse to have more outages and postmortems! We should always strive to build reliable, fault-tolerant systems to minimize such incidents, but when they do happen, we should take them in stride, and try to appreciate the complexity of the software systems all around us. Love thy outages. Love thy postmortems. Stay in Touch Want to stay in touch with Sumo Logic? Follow & connect with us on Twitter, LinkedIn and Facebook for updates. Visit our website to learn more about our machine data analytics platform and be sure to check back on the blog for more posts like this one if you enjoyed what you read!

Blog

11 New Google Cloud Platform (GCP) Apps for Continued Multi-Cloud Support

Blog

Sumo Smash Bros: What Creating a Video Game Taught Us About the Power of Data

As a longtime DevOps engineer with a passion for gaming and creating things, I truly believe that in order to present data correctly, you must first understand the utility of a tool without getting hung up on the output (data). To understand why this matters, I’ll use Sumo Logic’s machine data analytics platform as an example. With a better understanding of how our platform works, you’ll be able to turn seemingly disparate data into valuable security, operational or business insights that directly serve your organization’s specific needs and goals. The Beginning of Something Great Last year, I was sitting with some colleagues at lunch and suggested that it would be super cool to have a video game at our trade show booth. We all agreed it was a great idea, and what started as a personal at-home project turned into a journey to extract game data and present it in a compelling and instructive manner. The following is how this simple idea unfolded over time and what we learned as a result. Super Smash Bros Meets Sumo Logic The overall idea was solid. However, after looking at emulators and doing hours of research (outside of office hours), I concluded that extracting data from an old-school arcade game is a lot harder than it sounds, even when working with an emulator. My only path forward would be to use a cheat engine to read memory addresses, and all the work would be done in assembly, the kind of low-level, ’80s-era programming for which documentation was nearly impossible to find. I again found myself at an impasse. Another colleague of mine, a board game aficionado, suggested I find an open source game online that I could add code to myself in order to extract data. Before I started my search, I set some parameters. What I was looking for was a game with the following characteristics: it should be multiplayer; it would ideally produce different types of data; and it would manifest multiple win conditions, both game and social. Enter Super Smash Bros (SSB), which met all of the above criteria. If you are not familiar with this game, it’s originally owned/produced by Nintendo, and the appeal is that you and up to three other players battle each other in “King of the Hill” until there is an “official” game winner. It helps to damage your opponent first before throwing them off the hill. The game win condition is simple: whoever has the most lives when the game ends, wins. And the game ends when either time runs out or only one player has lives left. However, this leaves holes for friends to argue who actually won. If you’ve ever played this game (which is one of strategy), there is a second kind of condition — a social win condition. You can “officially” win by the game rules, but there’s context attached to “how” you won — a social win. Creating Sumo Smash Bros I found an open source clone of Super Smash Bros written in JavaScript which runs entirely in a web browser. It was perfect. JavaScript is a simple language, and with the help of a friend to get started, we made it so we could group log messages that would go to the console, where a developer could access them, and then send them directly into the Sumo Logic platform. PRO TIP: If you want game controllers for an online video game like Super Smash Bros, use Xbox controllers, not Nintendo! 
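To give a concrete sense of what “sending game events into Sumo Logic” can look like, here is a minimal, hypothetical sketch. The game itself is JavaScript; the snippet below uses Python purely to illustrate the shape of the events, the HTTP source URL is a placeholder for the unique endpoint a hosted collector gives you, and the field names are made up for this example.

import json
import time
import urllib.request

# Placeholder URL: a real Sumo Logic hosted HTTP source provides its own
# unique receiver endpoint when you create it in your account.
SUMO_HTTP_SOURCE_URL = "https://collectors.sumologic.com/receiver/v1/http/XXXXXXXX"

def ship_game_event(event_type, **fields):
    """Send one structured game event to the HTTP source as a JSON log line."""
    payload = json.dumps({"timestamp": time.time(), "event": event_type, **fields})
    request = urllib.request.Request(
        SUMO_HTTP_SOURCE_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request, timeout=5)

# Hypothetical example: player 2 lands a punch on player 3 for 14% damage.
ship_game_event("hit", attacker="player2", target="player3", damage=14)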
We would record certain actions in the code, such as: when a player’s animation changed; what move a player performed; who hit whom, when and for how much; and what each player’s lives were. For example, an animation change would be whenever a player was punched by an opponent. Now, by the game’s standards, very limited data determines who is the “official” winner of the game based on the predetermined rules, but with this stream of data now flowing into Sumo Logic, we could also identify the contextual “social win” and determine if and how it differed from the game-rules winner. Here’s an example of a “social” win condition: Imagine there’s a group of four playing the game, and one of the players (player 1) hangs back avoiding brawls until two of the three opponents are out of lives. Player 1 jumps into action, gets a lucky punch on the other remaining player who has thus far dominated (and who is really damaged) and throws him from the ring to take the “official” game win. Testing the Theory When we actually played, the data showed exactly what I had predicted. First, some quick background on my opponents: Jason E. (AKA Jiggles) — He admits to having spent a good portion of his youth playing SSB, and he may have actually played in tournaments. Michael H. (AKA Killer) — He’s my partner in crime. We’re constantly cooking up crazy ideas to try out, both in and outside of work. He also had plenty of experience with the game. Mikhail M. (AKA MM$$DOLLAB) — He has always been a big talker. He too played a lot, and talked a big talk. Originally I had intended for us to pseudo-choreograph the game to get the data to come out “how I wanted” in order to show that while the game awarded a “winner” title to one player, the “actual winner” would be awarded by the friends to the player who “did the most damage to others” or some other parameter. It only took about three nanoseconds before the plan was out the window and we were fighting for the top. Our colleague Jason got the clear technical game win. We had recorded the game and had the additional streams of data, and when the dust had settled, a very different story emerged. For instance, Jason came in third place for our social win parameter of “damage dealt.” Watching the recording, it’s clear that Jason’s strategy was to avoid fighting until the end. When brawls happened, he was actively jumping around but rarely engaged with the other players. He instead waited for singled-out attacks. Smart, right? Maybe. We did give him the “game win.” However, based on the “damage dealt” social win rule, the order was Michael, myself, Jason and then Mikhail. Watch what happened for yourself: What’s the Bigger Picture? While this was a fun experiment, there’s also an important takeaway. At Sumo Logic, we ingest more than 100 terabytes of data each day — that’s the equivalent of data from about 200 Libraries of Congress per second. That data comes from all over — it’s a mix of log, event, metric and security data, coming not just from within an organization’s applications and infrastructure, but also from third-party vendors. When you have more information, you can see trends and patterns, make inferences, and make technical and business decisions — you gain an entirely new level of understanding beyond the 1s and 0s staring back at you on a computer screen. People also appreciate the data for different reasons. For example, engineers only care that the website they served you is the exact page you clicked on. They don’t care if you searched for hats or dog food or sunscreen. 
But marketers care, a lot. Marketers care about your buying decisions and patterns, and they use that to inform strong, effective digital marketing campaigns to serve you relevant content. At Sumo Logic, we don’t want our customers or prospects to get hung up on the data; we want them to look past that and first understand what our tool does, and how it can help them get the specific data they need to solve a unique problem or use case. “In the words of Sherlock Holmes, it’s a capital mistake to theorize before one has data.” — Kenneth Barry, Sumo Logic The types of data you are ingesting and analyzing only matter if you first understand your end goal and have the proper tools in place — a means to an end. From there, you can extract and make sense of the data in ways that matter to your business, and each use case varies from one customer to another. Data powers our modern businesses, and at Sumo Logic, we empower those who use this data. And we make sure to have fun along the way! Bonus: Behind the Scenes Video Q&A with Kenneth Additional Resources Visit our website to learn more about the power of machine data analytics and to download Sumo Logic for free to try it out for yourself. Read our 2018 State of Modern Applications in the Cloud report. Register to attend Illuminate, our annual user conference taking place Sept. 12-13, 2018 in Burlingame, Calif.

Blog

A Primer on Building a Monitoring Strategy for Amazon RDS

In a previous blog post, we talked about Amazon Relational Database Service (RDS). RDS is one of the most popular cloud-based database services today and is extensively used by Amazon Web Services (AWS) customers for its ease of use, cost-effectiveness and simple administration. Although as a managed service RDS doesn’t require database administrators (DBAs) to do many of the day-to-day tasks, it still needs to be monitored for performance and availability. That’s because Amazon doesn’t auto-tune any database performance — this is a shared responsibility of the customer. That’s why there should be a monitoring strategy and processes in place for DBAs and operations teams to keep an eye on their RDS fleet. In this blog post, we will talk about an overall best-practice approach for doing this. Why Database Monitoring Keeping a database monitoring regimen in place, no matter how simple, can help address potential issues proactively before they become incidents and cost additional time and money. Most AWS infrastructure teams typically have decent monitoring in place for different types of resources like EC2, ELB, Auto Scaling Groups, Logs, etc. Database monitoring often comes at a later stage or is ignored altogether. With RDS, it’s also easy to overlook due to the low-administration nature of the service. The DBA or the infrastructure managers should therefore invest some time in formulating and implementing a database monitoring policy. Please note that designing an overall monitoring strategy is an involved process and is not just about defining database counters to monitor. It also includes areas like defining Service Level Agreements, classifying incident types (Critical, Serious, Moderate, Low, etc.), creating a RACI (Responsible, Accountable, Consulted, Informed) matrix and defining escalation paths. A detailed discussion of all these topics is beyond the scope of this article, so we will concentrate on the technical part only. What to Monitor Database monitoring, or RDS monitoring in this case, is not about monitoring only database performance. A monitoring strategy should include the following broad categories and their components:
- Availability: Is the RDS instance or cluster endpoint accessible from client tools? Is any instance stopping, starting, failing over or being deleted? Is there a failover of multi-AZ instances?
- Recoverability: Is the RDS instance being backed up, both automatically and manually? Are individual databases being backed up successfully?
- Health and Performance: What’s the CPU, memory and disk space currently in use? What’s the query latency? What’s the disk read/write latency? What’s the disk queue length? How many database connections are active? Are there any blocking and waiting tasks? Are there any errors or warnings reported in database log files? Are these related to application queries, or to non-optimal configuration values? Are any of the scheduled jobs failing?
- Manageability: Are there any changes in the RDS instances’ tags, security groups, instance properties, or parameter and option groups? Who made those changes and when?
- Security: Which users are connecting to the database instance? What queries are they running?
- Cost: How much is each RDS instance costing every month?
While many of these things can be monitored directly in AWS, Sumo Logic can greatly help with understanding all of the logs and metrics that RDS produces. In this article, we will talk about what AWS offers for monitoring RDS. 
As we go along, we will point out where we think Sumo Logic can make the work easier. Monitoring Amazon CloudWatch You can start monitoring RDS using metrics from Amazon CloudWatch. Amazon RDS, like any other AWS service, exposes a number of metrics which are available through CloudWatch. There are three ways to access these metrics: from the AWS Console, using the AWS CLI, or using REST APIs. The image below shows some of these metrics from the RDS console: Amazon CloudWatch shows two types of RDS metrics: built-in metrics and enhanced monitoring metrics. Built-in Metrics These metrics are available from any RDS instance. They are collected from the hypervisor of the host running the RDS virtual machine. Some of the metrics may not be available for all database engines, but the important ones are common. It is recommended that the following RDS metrics be monitored from CloudWatch:
- CPUUtilization: the CPU load (%) in the RDS instance. A consistently high value means one or more processes are waiting for CPU time while one or more processes are blocking it.
- DiskQueueDepth: the number of input and output requests waiting for the disk resource. A consistently high value means disk resource contention – perhaps due to locking, long-running update queries, etc.
- DatabaseConnections: the number of database connections against the RDS instance. A sudden spike should be investigated immediately. It may not mean a DDoS attack, but possibly an issue with the application generating multiple connections per request.
- FreeableMemory: the amount of RAM available in the RDS instance, expressed in bytes. A very low value means the instance is under memory pressure.
- FreeStorageSpace: the amount of disk storage available in bytes. A small value means disk space is running out.
- ReadIOPS: the average number of disk read operations per second. Should be monitored for sudden spikes, which can mean runaway queries.
- WriteIOPS: the average number of disk write operations per second. Should be monitored for sudden spikes, which can mean a very large data modification.
- ReadLatency: the average time in milliseconds to perform a read operation from the disk. A higher value may mean a slow disk operation, probably caused by locking.
- WriteLatency: the average time in milliseconds to perform a write operation to disk. A higher value may mean disk contention.
- ReplicaLag: how far in time the read replica of a MySQL, MariaDB or PostgreSQL instance is lagging behind its master. A high lag value means read operations from the replica are not serving current data.
The Amazon RDS Aurora engine also exposes some extra counters which are really useful for troubleshooting. At the time of writing, Aurora supports MySQL and PostgreSQL only. We recommend monitoring these counters:
- DDLLatency: the average time in milliseconds to complete Data Definition Language (DDL) commands like CREATE, DROP, ALTER, etc. A high value means the database is having performance issues running DDL commands; this can be due to exclusive locks on objects.
- SelectLatency: the average time in milliseconds to complete SELECT queries. A high value may mean disk contention, poorly written queries, missing indexes, etc.
- InsertLatency: the average time in milliseconds to complete INSERT commands. A high value may mean locking or a poorly written INSERT command.
- DeleteLatency: the average time in milliseconds to complete DELETE commands. A high value may mean locking or a poorly written DELETE command. 
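The article mentions the AWS CLI and REST APIs as access paths for these metrics. As a rough illustration (ours, not part of the original post), here is a minimal boto3 sketch that pulls one of the built-in metrics listed above; the region and DB instance identifier are placeholders.

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Average CPUUtilization for one RDS instance over the last hour.
# "my-rds-instance" is a placeholder DB instance identifier.
now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-rds-instance"}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,            # one datapoint every 5 minutes
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 2), "%")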
- UpdateLatency: the average time in milliseconds to complete UPDATE commands. A high value may mean locking or a poorly written UPDATE command.
- Deadlocks: the average number of deadlocks happening per second in the database. More than 0 should be a concern – it means the application queries are running in such a way that they block each other frequently.
- BufferCacheHitRatio: the percentage of queries that can be served by data already stored in memory. It should be a high value, near 100, meaning queries don’t have to access disk to fetch data.
- Queries: the average number of queries executed per second. This should have a steady, average value; any sudden spike or dip should be investigated.
You can use the AWS documentation for a complete list of built-in RDS metrics. Enhanced Monitoring Metrics RDS also exposes “enhanced monitoring metrics.” These are collected by an agent running on the RDS instance’s operating system. Enhanced monitoring can be enabled when an instance is first created, or it can be enabled later. It is recommended to enable it because it offers a better view of the database engine. Like built-in metrics, enhanced metrics are available from the RDS console. Unlike built-in metrics, though, enhanced metrics are not readily accessible from the CloudWatch Metrics console. When enhanced monitoring is enabled, CloudWatch creates a log group called RDSOSMetrics in CloudWatch Logs. Under this log group, there will be a log stream for each RDS instance with enhanced monitoring. Each log stream will contain a series of JSON documents as records, and each JSON document will show a series of metrics collected at regular intervals (by default every minute). Here is a sample excerpt from one such JSON document:
{ "engine": "Aurora", "instanceID": "prodataskills-mariadb", "instanceResourceID": "db-W4JYUYWNNIV7T2NDKTV6WJSIXU", "timestamp": "2018-06-23T11:50:27Z", "version": 1, "uptime": "2 days, 1:31:19", "numVCPUs": 2, "cpuUtilization": { "guest": 0, "irq": 0.01, "system": 1.72, "wait": 0.27, "idle": 95.88, "user": 1.91, "total": 4.11, "steal": 0.2, "nice": 0 }, ... }
It’s possible to create custom CloudWatch metrics from these logs and view those metrics from the CloudWatch console, but this requires some extra work. However, both built-in and enhanced metrics can be streamed to Sumo Logic, from where you can build your own charts and alarms. Regardless of platform, it is recommended to monitor the enhanced metrics for a more complete view of the RDS database engine. The following counters should be monitored for Amazon Aurora, MySQL, MariaDB, PostgreSQL or Oracle:
- cpuUtilization / user: % of CPU used by user processes.
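Since the enhanced metrics live in CloudWatch Logs rather than CloudWatch Metrics, pulling them programmatically looks a little different. Below is a hedged sketch (ours, not from the original article) that reads the most recent RDSOSMetrics sample, assuming, as the excerpt above suggests, that each log stream is named after the instance’s resource ID; the stream name here is just the example value from the excerpt.

import json

import boto3

logs = boto3.client("logs", region_name="us-east-1")

# Each monitored instance gets its own stream in the RDSOSMetrics log group.
# Placeholder stream name, taken from the sample excerpt above.
stream_name = "db-W4JYUYWNNIV7T2NDKTV6WJSIXU"

response = logs.get_log_events(
    logGroupName="RDSOSMetrics",
    logStreamName=stream_name,
    limit=1,               # just the most recent one-minute sample
    startFromHead=False,
)

for event in response["events"]:
    sample = json.loads(event["message"])
    print(sample["instanceID"], sample["cpuUtilization"]["user"], "% user CPU")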

AWS

July 17, 2018

Blog

What is Blockchain, Anyway? And What Are the Biggest Use Cases?

Everyone’s talking about blockchain these days. In fact, there is so much hype about blockchains — and there are so many grand ideas related to them — that it’s hard not to wonder whether everyone who is excited about blockchains understands what a blockchain actually is. If, amidst all this blockchain hype, you’re asking yourself “what is blockchain, anyway?” then this article is for you. It defines what blockchain is and explains what it can and can’t do. Blockchain Is a Database Architecture In the most basic sense, blockchain is a particular database architecture. In other words, like any other type of database architecture (relational databases, NoSQL and the like), a blockchain is a way to structure and store digital information. (The caveat to note here is that some blockchains now make it possible to distribute compute resources in addition to data. For more on that, see below.) What Makes Blockchain Special? If blockchain is just another type of database, why are people so excited about it? The reason is that a blockchain has special features that other types of database architectures lack. They include: Maximum data distribution. On a blockchain, data is distributed across hundreds of thousands of nodes. While other types of databases are sometimes deployed using clusters of multiple servers, this is not a strict requirement. A blockchain by definition involves a widely distributed network of nodes for hosting data. Decentralization. Each of the nodes on a blockchain is controlled by a separate party. As a result, the blockchain database as a whole is decentralized. No single person or group controls it, and no single group or person can modify it. Instead, changes to the data require network consensus. Immutability. In most cases, the protocols that define how you can read and write data to a blockchain make it impossible to erase or modify data once it has been written. As a result, data stored on a blockchain is immutable. You can add data, but you can’t change what already exists. (We should note that while data immutability is a feature of the major blockchains that have been created to date, it’s not strictly the case that blockchain data is always immutable.) Beyond Data As blockchains have evolved over the past few years, some blockchain architectures have grown to include more than a way to distribute data across a decentralized network. They also make it possible to share compute resources. The Ethereum blockchain does this, for example, although Bitcoin—the first and best-known blockchain—was designed only for recording data, not sharing compute resources. If your blockchain provides access to compute resources as well as data, it becomes possible to execute code directly on the blockchain. In that case, the blockchain starts to look more like a decentralized computer than just a decentralized database. Blockchains and Smart Contracts Another buzzword that comes up frequently when discussing what defines a blockchain is a smart contract. A smart contract is code that causes a specific action to happen automatically when a certain condition is met. The code is executed on the blockchain, and the results are recorded there. This may not sound very innovative, but there are some key benefits and use cases. Any application could incorporate code that makes a certain outcome conditional upon a certain circumstance. If-this-then-that code stanzas are not really a big deal. 
What makes a smart contract different from a typical software conditional statement, however, is that because the smart contract is executed on a decentralized network of computers, no one can modify its outcomes. This feature differentiates smart contracts from conditional statements in traditional applications, where the application is controlled by a single, central authority, which has the power to modify it. Smart contracts are useful for governing things like payment transactions. If you want to ensure that a seller does not receive payment for an item until the buyer receives the item, you could write a smart contract to make that happen automatically, without relying on third-party oversight. Limitations of Blockchains By enabling complete data decentralization and smart contracts, blockchains make it possible to do a lot of interesting things that you could not do with traditional infrastructure. However, it’s important to note that blockchains are not magic. Most blockchains currently have several notable limitations. Transactions are not instantaneous. Bitcoin transactions take surprisingly long to complete, for example. Access control is complicated. On most blockchains, all data is publicly accessible. There are ways to limit access control, but they are complex. In general, a blockchain is not a good solution if you require sophisticated access control for your data. Security. While blockchain is considered a secure place for transactions and storing/sending sensitive data and information, there have been a few blockchain-related security breaches. Moving your data to a blockchain does provide an inherent layer of protection because of the decentralization and encryption features, however, like most things, it does not guarantee that it won’t be hacked or exploited. Additional Resources Watch the latest SnapSecChat videos to hear what our CSO, George Gerchow, has to say about data privacy and the demand for security as a service. Read a blog on new Sumo Logic research that reveals why a new approach to security in the cloud is required for today’s modern businesses. Learn what three security dragons organizations must slay to achieve threat discovery and investigation in the cloud.

Blog

Comparing Europe’s Public Cloud Growth to the Global Tech Landscape

Blog

React Tables: How to Evaluate Options and Integrate a Table into Your Project

Blog

Thoughts from Gartner’s 2018 Security & Risk Management Summit

Blog

Deadline to Update PCI SSL & TLS Looms, Are You Ready?

Quick History Lesson Early internet data communications were enabled through the use of a protocol called HyperText Transmission Protocol (HTTP) to transfer data between nodes on the internet. HTTP essentially establishes the “request-response” rules to be used between a “client” (i.e. web browser) and “server” (computer hosting a website) throughout the session. While the use of HTTP grew along with internet adoption, its lack of security protocols left internet communications vulnerable to attacks from malicious actors. In the mid-nineties, Secure Sockets Layer (SSL) was developed to close this gap. SSL is known as a “cryptographic protocol” standard established to enable the privacy and integrity of the bidirectional data being transported via HTTP. You may be familiar with HTTPS or HyperText Transmission Protocol over SSL (a.k.a. HTTP Secure). Transport Layer Security (TLS) version 1.0 (v1.0) was developed in 1999 as an enhancement to the then-current SSL v3.0 protocol standard. TLS standards matured over time with TLS v1.1 [2006] and TLS v1.2 [2008]. Early Security Flaws Found in HTTPS While both SSL and TLS protocols remained effective for some time, in October of 2014, Google’s security team discovered a vulnerability in SSL version 3.0. Skilled hackers were able to use a technique called Padding Oracle On Downgraded Legacy Encryption, widely referred to as the “POODLE” exploit, to bypass the SSL security and decrypt sensitive (HTTPS) information including secret session cookies. By doing this, hackers could then hijack user accounts. In December 2014, the early versions of TLS were also found to be vulnerable to a new variant of the POODLE attack, which enabled hackers to downgrade the protocol version to one that was more vulnerable. Poodle Attacks Spur Changes to PCI Standards So what do POODLE attacks have to do with Payment Card Industry Data Security Standards (PCI DSS) and compliance? PCI DSS Requirement 4.1 mandates the use of “strong cryptography and security protocols to safeguard sensitive cardholder data during transmission,” and these SSL vulnerabilities (and similar variants) also meant sensitive data associated with payment card transactions was open to these risks. And in April of 2015 the PCI Security Standards Council (SSC) issued a revised set of industry standards — PCI DSS v3.1, which stated “SSL has been removed as an example of strong cryptography in the PCI DSS, and can no longer be used as a security control after June 30, 2016.” This deadline applied to both organizations and service providers, requiring them to remedy this situation in their environments by migrating from SSL to TLS v1.1 or higher. The SSC also included an information supplement, “Migrating from SSL and Early TLS,” as a guide. However, due to early industry feedback and pushback, in December of 2015 the PCI SSC issued a bulletin extending the deadline to June 30, 2018 for both service providers and end users to migrate to higher, later versions of TLS standards. And in April of 2016 the PCI SSC issued PCI v3.2 to formalize the deadline extension and added an “Appendix 2” to outline the requirements for conforming with these standards. Sumo Logic Is Ready, Are You? The Sumo Logic platform was built with a security-by-design approach and we take security and compliance very seriously. 
As a company, we continue to lead the market in securing our own environment and providing the tools to help enable our customers to do the same. Sumo Logic complied with the PCI DSS 3.2 service provider level one standards in accordance with the original deadline (June 30, 2016), and received validation from a third-party expert, Coalfire. If your organization is still using these legacy protocols, it is important to take steps immediately and migrate to the newest versions to ensure compliance by the approaching June 30, 2018 deadline. If you are unsure whether these vulnerable protocols are still in use in your PCI environment, don’t wait until it’s too late to take action. If you don’t have the resources to perform your own audit, the PCI Standards Council has provided a list of “Qualified Security Assessors” that can help you in those efforts. What About Sumo Logic Customers? If you are a current Sumo Logic customer, in addition to ensuring we comply with PCI DSS standards in our own environment, we continually make every effort to inform you if one or more of your collectors are eligible for an upgrade. If you have any collectors that might still be present in your PCI DSS environment that do not meet the new PCI DSS standards, you would have been notified through the collectors page in our UI (see image below). It’s worthwhile to note that TLS v1.1 is still considered PCI compliant; however, at Sumo Logic we are leapfrogging the PCI requirements and, moving forward, we will only be supporting TLS v1.2. If needed, you can follow these instructions to upgrade (or downgrade) as required. Sumo Logic Support for PCI DSS Compliance Sumo Logic provides a ton of information, tools and pre-built dashboards to our customers to help with managing PCI DSS compliance standards in many cloud and non-cloud environments. A collection of these resources can be found on our PCI Resources page. If you are a cloud user and are required to manage PCI DSS elements in that type of environment, note that in April 2018 the PCI SSC Cloud Special Interest Group issued an updated version 3.0 of their previous version 2.0, which was last released in February 2013. Look for another related blog to provide a deeper dive on this subject. The PCI SSC Cloud Computing Guidelines version 3.0 include the following changes: updated guidance on roles and responsibilities, scoping cloud environments, and PCI DSS compliance challenges; expanded guidance on incident response and forensic investigation; new guidance on vulnerability management, as well as additional technical security considerations on topics such as Software Defined Networks (SDN), containers, fog computing and internet of things (IoT); standardized terminology throughout the document; and updated references to PCI SSC and external resources. Additional Resources For more information on the compliance standards Sumo Logic supports, visit our self-service portal. You’ll need a Sumo Logic account to access the portal. Visit our DocHub page for specifics on how Sumo Logic helps support our customers’ PCI compliance needs. Sign up for Sumo Logic for free to learn more.
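If you want a quick, rough check of whether an endpoint in your own environment still negotiates a legacy protocol version, here is a hedged Python sketch (ours, not part of the original guidance) that attempts a handshake capped at TLS 1.0. The hostname is a placeholder, and note that very recent OpenSSL builds may refuse TLS 1.0 entirely, which shows up here as a failed handshake.

import socket
import ssl

def accepts_tls10(host, port=443):
    """Try a handshake capped at TLS 1.0; True means the server accepted it."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1
    context.maximum_version = ssl.TLSVersion.TLSv1
    try:
        with socket.create_connection((host, port), timeout=5) as raw_sock:
            with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
                return tls_sock.version() == "TLSv1"
    except (ssl.SSLError, OSError):
        # Handshake refused: either the server no longer offers TLS 1.0, or the
        # local OpenSSL build disallows it (a possible false negative).
        return False

# Placeholder hostname; point this at an endpoint you own and are allowed to test.
print("accepts TLS 1.0:", accepts_tls10("example.com"))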

Blog

DevOps Redemption: Don't Let Outdated Data Analytics Tools Slow You Down

Blog

SnapSecChat: The Demand for Security as a Service

Blog

Log Management and Analytics for the AWS ELB Classic Service

Quick Refresher Earlier this year, we showed you how to monitor Amazon Web Services Elastic Load Balancer (AWS ELB) with CloudWatch. This piece is a follow-up to that, and will focus on Classic Load Balancers. Classic Load Balancers provide basic load balancing across multiple Amazon EC2 instances and operate at both the request level and connection level. Classic Load Balancers are intended for applications that were built within the EC2-Classic network. AWS provides the ability to monitor your ELB configuration with detailed logs of all the requests made to your load balancers. There is a wealth of data in the logs generated by ELB, and it is extremely simple to set up. How to Get Started: Setting up AWS ELB Logs Logging is not enabled in AWS ELB by default. It is important to set up logging when you start using the service so you don’t miss any important details! Step 1: Create an S3 Bucket and Enable ELB Logging Note: If you have more than one AWS account (such as ops, dev, and so on) or multiple regions that generate Elastic Load Balancing data, you’ll probably need to configure each of these separately. Here are the key steps you need to follow: create an S3 bucket to store the logs (Note: Want to learn more about S3? Look no further (link)); allow AWS ELB access to the S3 bucket; enable AWS ELB logging in the AWS Console; and verify that it is working. Step 2: Allow Access to External Log Management Tools To add AWS ELB logs to your log management strategy, you need to give access to your log management tool! The easiest way to do that is by creating a special user and policy. Create a user in AWS Identity and Access Management (IAM) with Programmatic Access. For more information about this, refer to the appropriate section of the AWS User Guide. Note: Make sure to store the Access Key ID and Secret Access Key credentials in a secure location. You will need to provide these later to give access to your tools! Create a Custom Policy for the new IAM user. We recommend you use the following JSON policy:
{"Version": "2012-10-17", "Statement": [{"Action": ["s3:GetObject", "s3:GetObjectVersion", "s3:ListBucketVersions", "s3:ListBucket"], "Effect": "Allow", "Resource": ["arn:aws:s3:::your_bucketname/*", "arn:aws:s3:::your_bucketname"]}]}
Note: All of the Action parameters shown above are required. Replace the "your_bucketname" placeholders in the Resource section of the JSON policy with your actual S3 bucket name. Refer to the Access Policies section of the AWS User Guide for more info. What do the Logs look like? ELB logs are stored as .log files in the S3 buckets you specify when you enable logging. The file names of the access logs use the following format:
bucket[/prefix]/AWSLogs/aws-account-id/elasticloadbalancing/region/yyyy/mm/dd/aws-account-id_elasticloadbalancing_region_load-balancer-name_end-time_ip-address_random-string.log
- bucket: the name of the S3 bucket.
- prefix: the prefix (logical hierarchy) in the bucket. If you don’t specify a prefix, the logs are placed at the root level of the bucket.
- aws-account-id: the AWS account ID of the owner.
- region: the region for your load balancer and S3 bucket.
- yyyy/mm/dd: the date that the log was delivered.
- load-balancer-name: the name of the load balancer.
- end-time: the date and time that the logging interval ended. For example, an end time of 20140215T2340Z contains entries for requests made between 23:35 and 23:40 if the publishing interval is 5 minutes.
- ip-address: the IP address of the load balancer node that handled the request. 
For an internal load balancer, this is a private IP address.
- random-string: a system-generated random string.
The following is an example log file name:
s3://my-loadbalancer-logs/my-app/AWSLogs/123456789012/elasticloadbalancing/us-west-2/2014/02/15/123456789012_elasticloadbalancing_us-west-2_my-loadbalancer_20140215T2340Z_172.160.001.192_20sg8hgm.log
Syntax Each log entry contains the details of a single request made to the load balancer. All fields in the log entry are delimited by spaces. Each entry in the log file has the following format:
timestamp elb client:port backend:port request_processing_time backend_processing_time response_processing_time elb_status_code backend_status_code received_bytes sent_bytes "request" "user_agent" ssl_cipher ssl_protocol
The following list explains the different fields in the log file. Note: ELB can process HTTP requests and TCP requests, and the differences are noted below:
- timestamp: the time when the load balancer received the request from the client, in ISO 8601 format.
- elb: the name of the load balancer.
- client:port: the IP address and port of the requesting client.
- backend:port: the IP address and port of the registered instance that processed this request.
- request_processing_time: [HTTP listener] the total time elapsed, in seconds, from the time the load balancer received the request until the time it sent it to a registered instance.
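To make the field list above concrete, here is a small parsing sketch (ours, not from the original post). The sample entry is illustrative only and is modeled on the documented format for an HTTP listener, where the two SSL fields are recorded as "-".

import shlex

FIELDS = [
    "timestamp", "elb", "client_port", "backend_port",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]

def parse_elb_log_line(line):
    """Split one classic ELB access-log entry into a dict of named fields."""
    # shlex.split respects the double quotes around "request" and "user_agent".
    return dict(zip(FIELDS, shlex.split(line)))

# Illustrative entry, not taken from a real load balancer.
sample = ('2014-02-15T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
          '10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 '
          '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.38.0" - -')
print(parse_elb_log_line(sample)["elb_status_code"])  # -> '200'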

AWS

June 19, 2018

Blog

Transform Graphite Data into Metadata-Rich Metrics using Sumo Logic’s Metrics Rules

Graphite Metrics are one of the most common metrics formats in application monitoring today. Originally designed in 2006 by Chris Davis at Orbitz and open-sourced in 2008, Graphite itself is a monitoring tool now used by many organizations both large and small. It accepts metrics from a wide variety of sources, including popular daemons like collectd and statsd, provided that the metrics are sent in Graphite's simple plaintext format of "metric.path value timestamp", where the metric path is a unique identifier specified in dot-delimited form. Implicit in this format is also some logical hierarchy specific to each environment: for example, environment, region, host and application encoded as successive path components. While this hierarchical format has been widely accepted in the industry for years, it creates challenges for usability and ultimately lengthens the time to troubleshoot application issues. Users need to carefully plan and define these hierarchies ahead of time in order to maintain consistency across systems, scale monitoring effectively in the future and reduce confusion for the end user leveraging these metrics. Fortunately, the industry is evolving towards tag-based metrics to make it easier to design and scale these systems, and Sumo Logic is excited to announce the launch of Metrics Rules to take advantage of this new model immediately. Using Metrics Rules to Bring Graphite Metrics into the New World Sumo Logic built its metrics platform to support metadata-rich metrics, but we also acknowledged that the broader industry and many of our customers have invested heavily in their Graphite architecture and naming schemas over time. Sumo Logic’s Metrics Rules solution now allows users to easily transform these Graphite metrics into the next-generation, tag-based metric format, which provides three key benefits: Faster Time to Value: No need to re-instrument application metrics to take advantage of this metadata-rich, multi-dimensional format. Send Graphite-formatted metrics to Sumo immediately and enrich them with tag-based metadata later. Easy Configuration: An intuitive user interface (UI) allows you to validate and edit your transformation rules in real time, while competitive solutions require carefully defined config files that are difficult to set up and prone to errors. Improved Usability: With rich metadata, use simple key-value pairs to discover, visualize, filter and alert on metrics without knowing the original Graphite-based hierarchy. Using the example above, we can use Metrics Rules to enrich the dot-delimited Graphite names with key-value tags, which will make it easier for us to monitor metrics by our system’s logical groupings in the future. Intuitive Metrics Rules UI for Easy Validation and Edits As Graphite monitoring systems grow, so do the complexities in maintaining these dot-delimited hierarchies across the organization. Some teams may have defined Graphite naming schemes with five different path components (e.g., app.env.host.assembly.metric), while others may have more components or a different hierarchical definition altogether. To make it easier to create tags out of these metrics, the Metrics Rules configuration interface allows you to see a preview of your rules and make sure that you’ve properly captured the different components. Simply specify a match expression (i.e., which metrics the rule will apply to), define variables for each of the extracted fields and then validate that each tag field is extracting the appropriate values. 
After saving the rule, Sumo Logic will go back in time and tag your metrics with this new metadata so you can take advantage of these rules for prior data points. Improved Discoverability, Filtering and Alerting with Key-Value Tags Once these metrics contain the key-value tags that we’ve applied via Metrics, you can take advantage of several usability features to make finding, visualizing and alerting on your metrics even easier. For example, Sumo Logic’s autocomplete feature makes it easier to find and group metrics based on these key-value tags: Additionally, when using our unified dashboards for logs and metrics, these new tags can be leveraged as filters for modifying visualizations. Selecting a value in one of these filters will append a key-value pair to your query and filter down to the data you’re interested in: Finally, configuring alerts becomes significantly easier when scoping and grouping your metrics with key-value pairs. In the example below, we selected metric=vcpu.user from one of our namespaces, and we’re averaging this across each node in Namespace=csteam. This means that alerts will trigger across each node, and our email and/or webhook notifications will tell us which particular node has breached the threshold: The Bigger Picture Users can now convert legacy Graphite-formatted performance metrics into the metadata-rich metrics with Sumo Logic, both in real-time and after ingestion. This allows customers to increase the usability and accessibility for their analytics users by allowing them to leverage business relevant tags, instead of relying only on obscure, technical tags. Now with the capability to extract business context (metadata) from IT-focused metrics, organizations can use this data to gain actionable insight to inform strategic business decisions. In a broader context, this is significant because as we’ve been seeing from our customers, the hard lines between IT and business are becoming blurred, and there’s a strong emphasis on using data to improve the overall end-user experience. As more organizations continue to leverage machine data analytics to improve their security, IT and business operations, the ability to map machine data insights to actionable, contextual business analytics for IT and non-core-IT users is critical. Learn More Head over to Sumo Logic DocHub for more details on how to configure Metrics Rules on your account. Additionally, see how these rules can even be used for non-Graphite metrics by parsing out values from existing key-value pairs such as _sourceCategory and _sourceHost. Are you at DockerCon 2018 at Moscone Center in San Francisco this week? We’ll be there! Stop by our booth S5 to chat with our experts, get a demo and to learn more! Additional Resources Read the press release on our latest product enhancements unveiled at DockerCon Download the report by 451 Research & Sumo Logic to learn how machine data analytics helps organizations gain an advantage in the analytics economy Check out the Logs-to-Metrics blog Sign up for Sumo Logic for free
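Metrics Rules themselves are configured in the Sumo Logic UI, but the underlying idea (named captures applied to a dot-delimited path) can be sketched in a few lines of Python. The env.region.host.app naming scheme and the sample metric name below are hypothetical examples, not Sumo Logic defaults.

import re

# Hypothetical naming scheme: env.region.host.app.metric. Real Metrics Rules
# are defined in the Sumo Logic UI; this is just the same idea in Python.
RULE = re.compile(
    r"^(?P<env>[^.]+)\.(?P<region>[^.]+)\.(?P<host>[^.]+)\."
    r"(?P<app>[^.]+)\.(?P<metric>.+)$"
)

def to_tags(graphite_path):
    """Turn a dot-delimited Graphite name into key-value tags."""
    match = RULE.match(graphite_path)
    if match is None:
        # No rule matched: keep the raw name so the metric is never dropped.
        return {"metric": graphite_path}
    return match.groupdict()

print(to_tags("prod.us-east-1.web-042.checkout.cpu.user"))
# {'env': 'prod', 'region': 'us-east-1', 'host': 'web-042',
#  'app': 'checkout', 'metric': 'cpu.user'}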

June 12, 2018

Blog

Accelerate Data Analytics with Sumo Logic’s Logs-to-Metrics Solution

June 12, 2018

Blog

The Sumo Logic Advantage for the Analytics Economy

Blog

Kubernetes Monitoring: What to Monitor (Crash Course)

Blog

Employee Spotlight: Exploring the Parallels Between Finance and DevSecOps Engineering

In this Sumo Logic Employee Spotlight we interview Michael Halabi. Mike graduated from UC Santa Cruz with a bachelor’s degree in business management economics, spent some time as an auditor with PwC, joined Sumo Logic as the accounting manager in the finance department, and recently transitioned to a new role at the company as a DevSecOps engineer. [Pause here for head scratch] I know what you’re thinking, and yes that is quite the career shift, but if you stay with us, there’s a moral to this story, as well as a few lessons learned. Work Smarter, Not Harder Q: Why did you initially decide business management economics was the right educational path? Mike Halabi (MH): I fell into the “uncertain college kid” category. While I was interested in engineering, I was also an entrepreneur at heart and knew that someday, if I were to start my own business, I would need a foundational business background as well as a variety of other life experiences outside of textbook knowledge. Q: How do you approach your work? MH: Everything in life, no matter how scary it may appear up front, can be broken into a series of simpler and smaller tasks. If you learn how to think about problem solving in a certain way, you can make anything work, no matter how far beyond your skill set and core competency it may originally seem. This is especially true in the technology industry, where success often depends on doing it not just better, but also faster, than the competition. Breaking down complex problems into bite-size chunks allows you to tackle each piece of the problem quickly and effectively and move on to the next. Q: What’s the best way for a business to achieve that — doing it better and faster? MH: Automation. This is applicable across the board. The finance industry is full of opportunities to automate processes. Half of what a traditional finance team spends its time doing is copy/pasting the same information into the same email templates or copy/pasting a formula in Excel and manually tweaking each line. In other words, a bunch of tedious, outdated practices that could be easily automated thanks to modern programs and technologies. One instance I recall is someone spending a full day calculating a small subset of a massive spreadsheet line by line: eight hours to do one-tenth of the massive workbook. With a proper understanding of the problem and how to leverage the tools available, I wrote a formula to copy/paste in 30 minutes that completed the entire workbook and is still in use today. Scalable, simple, efficient — this formula removes manual error and works every time. And this was a quarterly project, so that many weeks’ worth of highly paid time is saved every quarter. Low-hanging fruit like this is everywhere. Q: So how did you capture the attention of Sumo Logic’s technical team? MH: Word got out about my closet-coding (really, I annoyed everyone to death until they let me help with something fun) and soon, various people in various teams were sending side projects on troubleshooting and automation my way. I continued on like this for a while, finance accounting manager by day, coder by night, until I was approached by our CSO and asked if I’d like to transition onto his team as a DevSecOps engineer. Connect the Dots Q: Let’s back up. How did you initially get into coding? MH: I took an early liking to video game development and while I didn’t have a formal engineering or coding background, using the above methodology, I taught myself how to make simple games using C++/SDL. 
Then, once I started helping out with various projects at Sumo Logic, I discovered Python, C# and Go. By spending time experimenting with each language I found different use cases for them, and in trying to apply what I'd learned I was pushed into the altogether different world of infrastructure. Making solutions was easy enough; getting them into the hands of less technically inclined folks became a new challenge. In order to deploy many of my cross-functional projects at Sumo Logic, I had to learn about Docker, Lambda, EC2, Dynamo, ELBs, SSL, HTTP, various exploits/security related to web-based tech, etc. I devoted more of my time to learning about the backend and underlying technologies of the modern internet because making a service scalable and robust requires a holistic skill set beyond simply writing the code. Q: Are there any interesting parallels between finance and engineering? MH: As an auditor at PwC, I worked frequently with many companies, from the very early startup stage to large public companies, and the problems almost all of these companies face are the same: how do we get more done without hiring more people or working longer hours, and without sacrificing work quality? In finance, a lot of companies handle that problem simply by hiring more people. Q: Can you expand on that? MH: You need to look beyond the company financials. The ratio of increased revenue to increased work can't (or at least should never) be 1:1. For a company to scale, each individual employee has to understand how his or her role will scale in the future to keep pace with corporate growth and needs. You scale an organization by using technology, not by mindlessly throwing bodies at the work. That's what I learned from finance. You don't need a team of 10 people to collect money and write the same email to clients multiple times a day, when you can automate and have a team of two handle it. Manual processes are slow and result in human error. In engineering I think this concept is well understood, but in finance, in my experience, many companies behave as if they're still in the 1500s with nothing more than an abacus to help them do their job. Find a Passion Project Q: What would be your advice to those considering a major career shift? MH: Our interests and passions will shift over time, and there's nothing wrong with that. If you decide one day to do a complete 180-degree career change, go for it. If you don't genuinely enjoy what you do, you'll never truly advance. I loved designing video games and automating financial processes, which led to my career shift into engineering. Did I put in long hours? Yes. Did I enjoy it? Yes. Passion may be an overused cliché, but if you aren't invested in your work, you'll go through the motions without reaping any of the benefits, including the satisfaction of producing meaningful work that influences others in a positive way. Q: What's your biggest learning from this whole experience? MH: The biggest takeaway for me as a coder was that theoretical knowledge doesn't always apply in the real world, because you can't know how to make something until you make it. Coding is an iterative process of creating, breaking and improving. So never be afraid to fail occasionally, learn from it and move on. And don't put yourself in a box or give up on your dreams simply because you don't have a formal education or piece of paper to prove your worth. The technology industry is hungry for engineering talent, and sometimes it can be found in unusual places.
In fact, finding employees with robust skill sets and backgrounds will only positively impact your team. Our collective experiences make us holistically stronger.

Blog

Gain Full Visibility into Microservices Architectures Using Kubernetes with Sumo Logic and Amazon EKS

Blog

Sumo Logic Partners with IP Intelligence Leader Neustar to Meet Growing Customer Needs at Scale

Customers are visiting your website, employees are logging into your systems and countless machines are talking to each other in an effort to deliver the perfect user experience. We’d like to believe that all of these individuals and machines are operating with the best of intentions, but how can we be so sure? One possible answer lies in the connecting device’s IP address and its respective physical location. IP geolocation is the process of determining the location of a device based on its unique IP address. It not only requires knowledge about the physical location of the computer where the IP address is assigned, but also how the device is connecting (e.g., via anonymous proxy, mobile, cable, etc.). This challenge becomes further complicated in an increasingly digital world with proliferating devices and millions of connections being established across the globe daily. That’s why we’re excited to announce that we’ve partnered with Neustar, a leading IP intelligence provider, to deliver one of the most comprehensive and precise geolocation databases in the industry. As a Sumo Logic customer, you can now leverage Neustar’s 20+ years of experience gathering and delivering IP intelligence insights, all at no additional charge. Precision Database + Weekly Updates = Higher Confidence Analytics In the pre-cellphone era (remember that?), everyone had a landline which meant area codes were fairly accurate identifiers of an end-user location. I knew that 516 meant someone was calling from Long Island, New York, while 415 was likely coming from the San Francisco Bay Area. But the invention of the cellphone complicated this matter. I might be receiving a call from someone with a 516 number, but because the caller was using a “mobile” device, he or she could be located anywhere in the U.S. IP addresses are like very complicated cellphone numbers — they can be registered in one place, used in another and then re-assigned to someone else without much notice. Keeping track of this is an enormous task. And over time, malicious actors realized that they could take advantage of this to not only mask their true location, but create false security alerts to distract security teams from identifying and prioritizing legitimate high-risk threats. That’s why partnering with a leader like Neustar, that uses a global data collection network and a team of network geography network analysts, to update their IP GeoPoint database on a daily basis, is key. This accuracy allows security teams to have full visibility into their distributed, global IT environment and when there’s an attempt to compromise a user’s credentials within an application, they can quickly flag any anomalous activity and investigate suspicious logins immediately. Proactive Geo Monitoring and Alerting in Sumo Logic With Neustar’s IP GeoPoint database, you can rest assured that your geolocation results are more trustworthy and reliable than ever before. Using Sumo Logic, you can continue to take advantage of the proactive alerting and dashboarding capabilities to make sense of IP intelligence across your security and operational teams. For example, you’ll have a high confidence in your ability to: Detect Suspicious Logins: alert on login attempts occurring outside of trusted regions. Maintain Regulatory Compliance: see where data is being sent to and downloaded from to keep information geographically isolated. 
Analyze End-User Behavior: determine where your users are connecting from to better understand product adoption and inform advertising campaigns. With real-time alerts, for example, you can receive an email or Slack notification if a login occurs outside of your regional offices — configure a real-time alert to get notified when a machine or user appears outside of a specific region. You can also use real-time dashboards to monitor the launch of a new feature, track customer behavior or gain visibility into AWS Console Logins from CloudTrail: using Sumo Logic's Applications, you can install out-of-the-box dashboards for instant geographic visibility into AWS Console Logins, for example. The Bigger Picture Born in AWS, Sumo Logic has always held a cloud-first, security-by-design approach, and our vision is to create a leading cloud security analytics platform to help our customers overcome the challenges of managing their security posture in the cloud. There is a major gap in the available on-premises security tools for customers who not only need to manage security in the cloud, but also meet rigorous regulatory compliance standards, especially the European Union's General Data Protection Regulation (GDPR) that went into effect last week on May 25, 2018. Geolocation is key for those needs, which is why we're thrilled to be rolling this out to our customers as part of a bigger strategy to provide visibility and security across the full application stack. Learn More Head over to Sumo Logic DocHub for more details on how to leverage the new database, then schedule some searches and create dashboards to take advantage of the enhanced IP geolocation. Check out our latest press announcement to learn about the additional features added to our cloud security analytics solution, including intelligent investigation workflows, privacy and GDPR dashboards, and enhanced threat intelligence.
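To make the "logins outside trusted regions" idea concrete, here is a minimal Python sketch of the check such an alert performs. The geo_lookup() helper and the trusted-country list are hypothetical placeholders; in Sumo Logic the lookup and alerting happen inside the search query and scheduled-alert layer rather than in client-side code.

    # Sketch: flag logins that originate outside a set of trusted countries.
    # geo_lookup() is a hypothetical stand-in for an IP intelligence source;
    # in Sumo Logic the equivalent lookup happens inside the query itself.
    TRUSTED_COUNTRIES = {"US", "GB", "DE"}

    def geo_lookup(ip_address: str) -> str:
        """Return an ISO country code for an IP (placeholder implementation)."""
        raise NotImplementedError("wire this to your geolocation database")

    def suspicious_logins(login_events):
        """Yield events whose source IP resolves outside the trusted regions."""
        for event in login_events:          # each event: {"user": ..., "src_ip": ...}
            country = geo_lookup(event["src_ip"])
            if country not in TRUSTED_COUNTRIES:
                yield {**event, "country": country}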

Blog

Comparing RDS, DynamoDB & Other Popular Database Services – Migrating to AWS Part 2

Blog

Join the Data Revolution - Listen to our New Masters of Data Podcast

In today's world, we are surrounded by data. It's flying through the air all around us over radio signals, wireless internet, mobile networks and more. We're so immersed in data every day that we rarely stop to think about what the data means and why it is there. I, for one, rarely have a quiet moment where I am not directly, or indirectly, absorbing a flow of information, regardless of whether I want to or not. So, I wonder, are we going to master this all-encompassing data, or be mastered by it?

Data is the new currency of our world

Data has become the lifeblood of the modern business, and those who succeed in today's competitive landscape are those who leverage data the best. Amazing applications exist that take the raw data flowing out of the exhaust pipe of the modern economy (and our lives) and enable companies to develop products we couldn't have even conceived of a few years ago. Artificial intelligence-driven innovations are anticipating our next move. Social media products are connecting us and targeting us. Self-driving cars are swimming in data to navigate a world full of faulty humans. But how often do we stop to talk about the good, the bad and the ugly of data? As Henry Wadsworth Longfellow put it, "A single conversation across the table with a wise man is better than ten years' mere study of books." The discussions now about data privacy and the nonstop stream of hackers stealing our personal information are elevating data from a sub-theme of our culture into a primary topic, even for non-experts. The value of data is also rising into the awareness of boardrooms, and this is a good thing. The only way to keep ourselves honest about how we use data is to talk about how we use data — the wonderful and the despicable, the innovative and the regressive.

A new podcast to explore the data revolution

As a long-time podcast listener and fan of the spoken word, I am excited to announce a new podcast about this data revolution — Masters of Data. In each episode we interview innovators, big thinkers and provocateurs to learn their views about data. We also want to meet the people behind the data, who can humanize it by helping us understand its cultural context and its impact on human experience. That way we turn this from stories about widgets and gimmicks into stories about humans and the value of data, as well as the dangers of misusing it. Our first podcast is now live and features Bill Burns, the chief trust officer at Informatica. Bill and I take a journey through time and security and discuss the evolving tech landscape over the last 20 years. We talk about how he got started in a computer lab, cut his teeth at Netscape, helped change the world at Netflix, and is on his next journey at Informatica.

How to listen and subscribe

To listen, visit https://mastersofdata.sumologic.com or subscribe via the iTunes or Google Play app stores. Once you've subscribed on your favorite platform, be sure to check back for new episodes and leave us a review to let us know what you think! We will be releasing several discussions over the next few months, and we look forward to your feedback! Until then, listen and enjoy. And don't forget to join the conversation on Twitter with #mastersofdata.

May 15, 2018

Blog

Sumo Logic For Support and Customer Success Teams

*Authored by Kevin Keech, Director of Support at Sumo Logic, and Graham Watts, Senior Solutions Engineer at Sumo Logic

Many Sumo Logic customers ask, "How can I use Sumo Logic for support and customer success teams?" If you need a better customer experience to stay ahead of the competition, Sumo Logic can help. In this post, I will describe why and how support and customer success teams use Sumo Logic, and summarize the key features to use in support and customer success use cases.

Why Use Sumo Logic For Support and Customer Success Teams?

Improved Customer Experience
- Catch deviations and performance degradation before your customers report them
- Using dashboards and scheduled alerts, your CS and Support teams can be notified of any service-impacting issues and can then reach out and provide solutions before your customers may ever know they have a problem
- This helps your customers avoid experiencing any frustrations with your service, which in the past may have led them to look into competitive offerings
- Improve your Net Promoter Score (NPS) and Service Level Agreements (SLAs)
- Alert team members to reach out to a frustrated customer before they go to a competitor's website or log out

Efficiency and Cost Savings – Process More Tickets, Faster
- Sumo Logic customers report an increase of 2-3x or more in the number of support tickets each team member can handle
- Direct access to your data eliminates the need for your Support team to request access and wait for engineering resources to grant it
- This leads to a higher level of customer satisfaction, and allows you to reallocate engineering time to innovate and enhance your product offerings
- Your support reps can perform real-time analysis of issues as they are occurring, locate the root of a problem, and get your customers solutions more quickly
- Customers report that using LogReduce cuts troubleshooting time down from hours or days to minutes
- As your teams and products grow, team members can process more tickets instead of you needing to hire more staff

Security
- Eliminate the need to directly log into servers to look at logs – you can Live Tail your logs right in Sumo Logic or via a CLI
- Use Role Based Access Control to allow teams to view only the data they need

How to Use Sumo Logic For Support and Customer Success Teams

Key features that enable your Support team, Customer Success team, or another technical team while troubleshooting are:

Search Templates
- See here for a video tutorial of Search Templates
- Form-based search experience – no need for employees to learn a query language
- Users type in human-friendly, easy-to-remember values like "Company Name" and Sumo will look up and inject complex IDs, like "Customer ID" or some other UUID, into the query (see the sketch after this post for the general idea)

LogReduce
- Reduce tens or hundreds of thousands of log messages into a few patterns with the click of a button
- This reduces the time it takes to identify the root cause of an issue from hours or days to minutes
- For example, a bad certificate and its related tracebacks can be surfaced with LogReduce

Dashboards
- Dashboard filters – auto-populating dashboard filters for easy troubleshooting
- TimeCompare – is now "normal" compared to historical trends? For example, production errors or exceptions today, overlaid with the last 7 days of production errors or exceptions
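As a rough illustration of the Search Templates idea referenced above, here is a short Python sketch: a human-friendly form value is resolved to an internal customer UUID and substituted into a saved query. The mapping, the UUID and the query text are hypothetical placeholders, not Sumo Logic's actual implementation.

    # Sketch of the search-template idea: resolve a human-friendly value to an
    # internal ID and inject it into a saved query. The mapping and query text
    # here are hypothetical placeholders, not Sumo Logic internals.
    CUSTOMER_IDS = {
        "Acme Corp": "9f1c2e34-7b8d-4a55-9c10-3d2f6e7a8b90",
    }

    QUERY_TEMPLATE = (
        '_sourceCategory=prod/app "customer_id={customer_id}" '
        '| count by status_code'
    )

    def build_search(company_name: str) -> str:
        """Turn a form input like 'Acme Corp' into the full query string."""
        customer_id = CUSTOMER_IDS[company_name]   # the lookup hides the UUID from the user
        return QUERY_TEMPLATE.format(customer_id=customer_id)

    print(build_search("Acme Corp"))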

Blog

Sumo Logic Announces Search Templates to Improve the Customer Experience with Better, Faster Application Insights

May 14, 2018

Blog

Call for Speakers for Illuminate 2018 Now Open!

Today at Sumo Logic, we’re excited to announce that registration as well as call for speaker papers are open for our annual user conference Illuminate 2018! For those that did not attend last year or are unfamiliar with Illuminate, it’s a two-day event where Sumo Logic users and ecosystem partners come together to share best practices for shining the light on continuous intelligence for modern applications. Illuminate 2018 takes place Sept. 12-13, 2018 at the Hyatt Regency San Francisco Airport Hotel in Burlingame, Calif., and registration details can be found on the conference website under the frequently asked questions (FAQ) section. Why Attend? The better question is why not? Last year, the conference brought together more than 400 customers, partners, practitioners and leaders across operations, development and security for hands-on training and certifications through the Sumo Logic Certification Program, as well as technical sessions and real-world case studies to help attendees get the most out of the Sumo Logic platform. You can register for Illuminate 2018 here: The 2017 keynote was chock-full of interesting stories from some of our most trusted and valued customers, partners and luminaries, including: Ramin Sayar, president and CEO, Sumo Logic David Hahn, CISO, Hearst Chris Stone, chief products officer, Acquia Special guest Reid Hoffman, co-founder of LinkedIn; partner of Greylock Partners You can watch the full 2017 keynote here, and check out the highlights reel below! Interested in Speaking at This Year’s Conference? If you’ve got an interesting or unique customer use-case or story that highlights real-world strategies and technical insights about how machine data analytics have improved operations, security and business needs for you and your end-users, then we’d love to hear from you! These presentations must provide a holistic overview into how today’s pioneers are pushing the boundaries of what’s possible with machine data analytics and developing powerful use cases within their industries or organizations. Still on the fence? Keep in mind that if your session is accepted, you’ll receive one complimentary registration to Illuminate as well as branding and promotional opportunities for your session. More on Topics, Requirements & Deadline To make it easier on our customers, we’ve compiled a list of desired topics to help guide you in the submission process. However, this is not an exhaustive list, so if you have another interesting technical story to tell that is relevant to Sumo Logic users and our ecosystem, we’d love to hear it. The Journey to the Cloud (Cloud Migration) Operations and Performance Management of Applications and Infrastructure Operations and Performance Management for Containers and Microservices Best Practices for using Serverless Architectures at scale Cloud Security and Compliance Best practices for implementing DevSecOps Sumo Logic to Enable Improved Customer Support/Success Unique Use Cases (I Didn’t Know You Could Use Sumo to…) Best Practices to Adopting and Leveraging Sumo Logic Effectively within your Organization Regardless of topic, all submissions MUST include: Title Brief Abstract (200-400 words) 3 Key Audience Takeaways or Learnings Speaker Bio and Links to any Previous Presentations The deadline to submit session abstracts is June 22, 2018. Please email submissions to [email protected]. Examples of Previous Customer Presentations Last year, we heard great customer stories from Samsung SmartThings, Hootsuite, Canary, Xero, and more. 
Here are a few customer highlight reels to give you examples of the types of stories we'd like to feature during Illuminate 2018. Final Thoughts Even if you aren't thinking of submitting a speaking proposal, we'd love to see you at Illuminate to participate in two full days of learning, certification and training, content sharing, networking with peers and Sumo Logic experts, and most importantly, fun! You can register here, and as a reminder, the call for papers closes June 22, 2018! Together, we can bring light to dark, and democratize machine data analytics for everyone. If you have any additional questions, don't hesitate to reach out to [email protected]. Hope to see you there!

May 10, 2018

Blog

SnapSecChat: Sumo Logic’s Chief Security Officer Ruminates on GDPR D-Day

Blog

Comparing AWS S3 and Glacier Data Storage Services - Migrating to AWS Part 1

Blog

How to Build a Scalable, Secure IoT Platform on GCP in 10 Days

Blog

Introducing New Content Sharing Capabilities for Sumo Logic Customers

As organizations scale and grow, teams begin to emerge with areas of specialization and ownership. Dependencies develop, with individuals and teams acting as service providers to other functional areas. We're finding information technology and DevOps teams rely on Sumo Logic for not just their own monitoring of application health and infrastructure, but also for sharing this data out to other functional areas such as customer support and business analysts, to empower them to make stronger data-driven decisions. Our top customers such as Xero, Acquia and Delta each have hundreds if not thousands of users spread across multiple teams who have unique data needs that align with varying business priorities and wide ranges of skill sets. Scaling our platform to support the data distribution and sharing needs of such a broad and diverse user base is a key driver of Sumo Logic's platform strategy. To this end, we are excited to announce the general availability of our new Content Sharing capabilities. Underlying this updated ability to share commonly used assets such as searches and dashboards is a secure, fine-grained and flexible role-based access control (RBAC) model. Check out our intro to content sharing video for more details. Collaboration with Control The updated model allows data visualization assets such as searches and dashboards to be shared out individually (or grouped in folders) to not just other users, but also to members of a particular role. Users can be invited to simply view the asset or to participate in actively editing or managing it. This allows individual users to collaborate on a search or dashboard before sharing it out to the rest of the team. When new information is discovered or a meaningful visualization is created, it can be easily added to a dashboard by anyone with edit access, and the change is made immediately available to other users. While ease of use is critical in ensuring a smooth collaboration experience, it should not come at the price of data security. It is always transparent who has what access to an asset, and this access can be revoked at any time by users with the right privileges. Users can control the data that is visible to the viewers of a dashboard, and new security measures such as a "run-as" setting have been introduced to prevent viewing or exploitation of data access settings put in place by administrators. Business Continuity Supporting business continuity is a key conversation we have with our customers as their businesses grow and their data use cases evolve and deepen. This new set of features reflects Sumo Logic's belief that the data and its visualizations belong to the organization and should be easily managed as such. We've replaced our single-user ownership model with a model where multiple users can manage an asset (i.e., view, edit and, if required, delete it). This ensures that a team member could, for example, step in and update an email-alert-generating scheduled search when the original author is unavailable. Even if the asset was not shared with other team members, administrators now have the ability (via a newly introduced "Administrative Mode") to manage any asset in the library, regardless of whether it was actively shared with them or not. The user deletion workflow has been simplified and updated to support business continuity.
When a user is deleted, any of their searches and dashboards that are located in a shared folder continue to exist where they are, thus providing a seamless experience for team members depending on them for their business workflows. Administrators can assign and transfer the management of any assets located in the user’s personal folder to another team member with the right context. (Previously, administrators would have had to take ownership and responsibility.) Teams as Internal Service Providers Development teams that own a specific app or run an area of infrastructure, are increasingly seen as internal service providers, required to meet SLAs and demonstrate application health and uptime. They need to make this information easy to discover and access by field and service teams who in turn, rely on this data to support end customers. Our newly introduced Admin Recommended section allows administrators to set up organizational folder structures that facilitate this easy interaction between producers and consumers. Relevant folders are pinned to the very top of our Library view, enabling easy discoverability to infrequent users. The contents of these folders can be managed and updated by individual development teams, without the administrator needing to intervene. This allows for sharing of best practices, commonly used searches and dashboards as well as enables users to create team-specific best practice guides. As data-driven decision-making becomes the norm across all job functions and industries, the Sumo Logic platform is designed to scale in performance, usability and security to meet these needs. Features like fine-grained RBAC and administrative control, coupled with intuitive design make information accessible to those who need it the most and enable the creation of workflows that democratize data and drive business growth. Upcoming Webinars To learn more about how these features can be applied at your organization, please sign up for one of our upcoming webinars on May 1st (2:00 p.m. PST) or May 3rd (9:00 a.m. PST)
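For readers who like to see the sharing model above in code, here is a minimal Python sketch of view/edit/manage grants made to users or roles. The class, level names and subject format are illustrative assumptions, not Sumo Logic's actual data model or API.

    # Sketch of an RBAC-style sharing model like the one described above.
    # Access levels and class names are illustrative, not Sumo Logic's API.
    from dataclasses import dataclass, field

    LEVELS = {"view": 1, "edit": 2, "manage": 3}   # manage implies edit implies view

    @dataclass
    class Asset:
        name: str
        grants: dict = field(default_factory=dict)  # subject ("user:jo", "role:support") -> level

        def share(self, subject: str, level: str) -> None:
            self.grants[subject] = level

        def can(self, subjects: set, level: str) -> bool:
            """True if any of the caller's identities meets the requested level."""
            needed = LEVELS[level]
            return any(LEVELS[self.grants[s]] >= needed
                       for s in subjects if s in self.grants)

    dashboard = Asset("Prod Errors")
    dashboard.share("role:support", "view")
    dashboard.share("user:jo", "manage")
    print(dashboard.can({"user:sam", "role:support"}, "view"))   # True
    print(dashboard.can({"user:sam", "role:support"}, "edit"))   # False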

April 26, 2018

Blog

SnapSecChat: RSA 2018 Musings from Sumo Logic's Chief Security Officer

Blog

Comparing Kubernetes Services on AWS vs. Azure vs. GCP

Blog

IRS 2018 Tax Day Website Outage: Calculating The Real Monetary Impact

Every year, the proverbial cyber gods anticipate a major data breach during tax day, and this year, they weren't completely wrong. While we didn't have a cyber meltdown, we did experience a major technology failure. Many U.S. taxpayers may already know what I am talking about here but for those outside of the country, let me fill you in. The U.S. Internal Revenue Service (IRS) website crashed on April 17 — the infamous tax filing day. And much like death, or rush hour traffic, no one can escape doing his or her taxes. And thanks to the good graces of the IRS, the deadline was extended a day to April 18, so that taxes could be filed by everyone as required by U.S. laws. Now you might say "ah shucks, it's just a day of downtime. How bad can it be?" In short, bad, and I'm going to crunch some numbers to show you just how bad because here at Sumo Logic, we care a lot about customer experience and this outage was definitely low on the customer experience scale. The below will help you calculate the financial impact of the total downtime and how the delays affected the U.S. government — and more importantly, the money the government collects from all of us. To calculate the cost of the one day downtime of the IRS app, we did some digging in terms of tax filing statistics. And here's what we found: On average over 20 million taxpayers submit their taxes in the last week, according to FiveThirtyEight. Half of these folks — 50 percent, ~10M (million) people — submitted on the last day, and therefore were affected by the site outage. How did we arrive at this number? Well, we made a conservative assumption that the folks who waited until the last week were the classic procrastinators (you know who you are!). And these procrastinators generally wait until the last day (maybe even the last hour) to file their taxes, which was also backed up by the IRS and FiveThirtyEight data points. We also make a practical assumption that most last-minute tax filers are "payees," meaning you are waiting for the last day because you are paying money to the government. After all, if you are getting money back, there's more incentive to file early and cash in that check! You with me so far? Now, in order to determine the amount of money the IRS collects on the last day, we need to know what the average filer pays to the IRS. This was a tricky question to answer. The IRS assumes that most folks get a refund of $2900, but does not specify the average filer payment amount. Since there exists no standard amount, we modeled a few payments ($3K, $10K, $30K) to calculate the amount that got delayed because of the last minute outage.

Number of filers who pay | Avg. payee amount | IRS revenue ($) on last filing day
10M | $3K | $30B
10M | $10K | $100B
10M | $30K | $300B

So the government delayed getting some big money, but they do eventually get it all (didn't we say that taxes and death are inevitable?). It's important to note here that a lot of taxes are collected via W-2 forms, so filing does not mean the payee will actually pay, and in some instances, there will be refunds granted. So with that in mind, let's now calculate the cost of the one day delay. To do this, we use the 1.6 percent* treasury yield on the money and we calculate the actual cost of downtime.

Number of filers who pay | Avg. payee amount | IRS revenue ($) on last filing day | Lost revenue for one day delay
10M | $3K | $30B | $1.3M
10M | $10K | $100B | $4.3M
10M | $30K | $300B | $13.1M

What you see is that even one day's cost of downtime for the U.S.
government is measured in millions of dollars. And for a government that is trillions of dollars in debt (and at least in theory focused on cost/debt reduction), these numbers add up quickly. So here’s a quick PSA for the U.S. government and for the IRS in particular: Your software application matters. And the customer experience matters. Update your systems, implement the proper controls, and get your act together so that we can rest assured our money is in good hands. If you continue to run your infrastructure on legacy systems, you’ll never be able to scale, stay secure or deliver the ultimate customer experience. With the pace of technological innovation, cloud is the future and you need visibility across the full application stack. But let us not forget the second moral of this story, said best by Abraham Lincoln: “You cannot escape the responsibility of tomorrow by evading it today.” And if you want to see customer experience done right, contact Sumo Logic for more questions on how to build, secure and run your modern applications in the cloud!
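The arithmetic behind the second table above is simply: one-day cost ≈ delayed amount × annual treasury yield ÷ 365. A quick Python check using the post's assumed 1.6 percent yield (small differences from the table come down to rounding):

    # Reproduce the post's one-day delay cost estimates:
    # cost = delayed amount * annual treasury yield / 365 days
    ANNUAL_YIELD = 0.016   # the 1.6 percent yield assumed in the post

    for filers, avg_payment in [(10_000_000, 3_000), (10_000_000, 10_000), (10_000_000, 30_000)]:
        delayed = filers * avg_payment                  # e.g. 10M x $3K = $30B
        one_day_cost = delayed * ANNUAL_YIELD / 365
        print(f"${delayed/1e9:.0f}B delayed -> ~${one_day_cost/1e6:.1f}M lost for one day")

    # Prints roughly $1.3M, $4.4M and $13.2M; the post's table rounds these
    # figures slightly differently ($1.3M, $4.3M, $13.1M).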

Blog

Challenges to Traditional Cloud Computing: Security, Data, Resiliency

Blog

RSA CSO Corner: Okta & Sumo Logic Talk MFA, Minimizing Risk in the Cloud

Blog

RSA CSO Corner: Neustar & Sumo Logic Talk GDPR, IP Intelligence

Blog

Survey Data Reveals New Security Approach Needed for Modern IT

New World of Modern Apps and Cloud Create Complex Security Challenges As the transition to the cloud and modern applications accelerates, the traditional security operations center (SOC) functions of threat correlation and investigation are under enormous pressure to adapt. These functions have always struggled with alert overload, poor signal to noise ratio in detection, complex and lengthy workflows, and acute labor churn; however, cloud and modern applications add new challenges to integrate previously siloed data and process while coping with much larger threat surface areas. To overcome these challenges, security must continuously collaborate with the rest of IT to acquire and understand essential context. In addition, cloud and application-level insight must be integrated with traditional infrastructure monitoring, and investigation workflows must accelerate at many times the current speed in order to keep pace with the exploding threat landscape. In the past 2 months we’ve formally surveyed hundreds of companies about their challenges with security for modernizing IT environments in the 2018 Global Security Trends in the Cloud report, conducted by Dimensional Research in March 2018 and sponsored by Sumo Logic. The survey included a total of 316 qualified independent sources of IT security professionals across the U.S. and Europe, the Middle East and Africa (EMEA). In addition, we’ve interviewed a broad cross-section of both current and potential future Sumo Logic customers. According to the survey results, a strong majority of respondents called out the need for a fundamentally new approach for threat assessment and investigation in the cloud, and even the laggard voices conceded these are “if not when” transitions that will redraw boundaries in traditional security tools and process. In the Customer Trenches: Why Security and IT Must Collaborate Eighty-seven percent of surveyed security pros observed that as they transition to the cloud, there is a corresponding increase in the need for security and IT operations to work together during threat detection and investigation. Customer interviews gave color to this strong majority with many use cases cited. For instance, one SaaS company security team needed end customer billing history to determine the time budget and priority for conclusion/case queuing. Another online business process firm needed close collaboration with the cloud ops teams to identify if slow application access was a security problem or not. A third company needed IT help for deeper behavioral insight from identity and access management (IAM) systems. In all of these examples the heavy dose of cloud and modern applications made it nearly impossible for the already overburdened security team to resolve the issues independently and in a timely manner. They required real-time assistance in getting data and interpreting it from a variety of teams outside the SOC. These examples are just a few of the complex workflows which can no longer be solved by siloed tools and processes that are holding organizations back from fully securing their modern IT environments. These challenges surface in the survey data as well, with 50 percent of respondents specifically looking for new tools to improve cross-team workflows for threat resolution. 
This group — as you would expect — had plenty of overlap with the over 50 percent of respondents who observed that on-premises security tools and traditional security information and event management (SIEM) solutions can't effectively assimilate cloud data and threats. Unified Visibility is Key: Integrating Cloud and Application Insight Eighty-two percent of those surveyed observed that as their cloud adoption increases there is a corresponding increase in the need to investigate threats at both the application and infrastructure layers. A clear pattern in this area was best summarized by one SOC manager, who said: "I feel like 90 percent of my exposure is at the application layer but my current defense provides only 10 percent of the insight I need at that layer." Attackers are moving up the stack as infrastructure defenses solidify for cloud environments, and the attack surface is expanding rapidly with modular software (e.g. microservices) and more externally facing customer services. In the survey, 63 percent of security pros reported broader technical expertise is required when trying to understand threats in the cloud. An industry veteran who spent the past three years consulting on incorporating cloud into SOCs noted a "three strikes you're out" pattern for SOC teams in which they could not get cloud application data, could not understand the context in the data when they did get it, and even if they understood it could not figure out how to apply the data to their existing correlation workflows. One CISO likened the process to "blind men feeling an elephant," a metaphor with a long history describing situations in which partial understanding leads to wild divergence of opinion. Customer interviews provided several examples of this dynamic. One incident response veteran described painstaking work connecting the dots from vulnerabilities identified in DevOps code scans to correlation rules to detect cross-site scripting, a workflow invisible to traditional infrastructure-focused SOCs. Another enterprise with customer-facing SaaS offerings described a very complex manual mapping from each application microservice to possible IOCs, a process the traditional tools could only complete in disjointed fragments. Many reported the need to assess user activity involving applications in ways standard behavior analytics tools could not. More broadly, these cloud and application blind spots create obvious holes in the security defense layer, such as missing context, lost trails, unidentified lateral movement and unsolvable cases (e.g. cross-site scripting) to name a few. The diversity of log/API formats and other challenges make moving up the stack a non-trivial integration, but these obstacles must be overcome for the defense to adapt to modern IT. New Approach Needed to Break Down Existing Silos With all of these challenges in the specific areas of threat correlation and investigation, it's no surprise that more generally an aggregate of 93 percent of survey respondents think current security tools are ineffective for the cloud. Two-thirds of those surveyed are looking to consolidate around tools able to plug the holes. A full third say some traditional categories such as the SIEM need to be completely rethought for the cloud. At Sumo Logic we've lived the imperative to bridge across the traditional silos of IT vs. security, application vs. infrastructure, and cloud vs.
on-premises to deliver an integrated cloud analytics platform. We’re applying that hard won insight into new data sources, ecosystems and application architectures to deliver a cloud security analytics solution that meets the demands of modern IT. Stop by the Sumo Logic booth (4516 in North Hall) this week at RSA for a demo of our new cloud security analytics platform features, including privacy and GDPR-focused dashboards, intelligent investigation workflow and enhanced threat intelligence. To read the full survey, check out the report landing page, or download the infographic for a high-level overview of the key findings.

Blog

Sumo Logic's Dave Frampton Live on theCube at RSA

Blog

Log Analysis on the Microsoft Cloud

The Microsoft Cloud, also known as Microsoft Azure, is a comprehensive collection of cloud services available for developers and IT professionals to deploy and manage applications in data centers around the globe. Managing applications and resources can be challenging, especially when the ecosystem involves many different types of resources, and perhaps multiple instances of each. Being able to view logs from those resources and perform log analysis is critical to effective management of your environment hosted in the Microsoft Cloud. In this article, we’re going to investigate what logging services are available within the Microsoft Cloud environment, and then what tools are available to assist you in analyzing those logs. What Types of Logs are Available? The Microsoft Cloud Infrastructure supports different logs depending on the types of resources you are deploying. Let’s look at the logs that are gathered within the ecosystem and then investigate each in more depth. Activity Logs Diagnostic Logs Application logs are also gathered within the Microsoft Cloud. However, these are limited to compute resources and are dependent on the technology used within the resource, and application and services which are deployed with that technology. Activity Logs All resources report their activity within the Microsoft Cloud ecosystem in the form of Activity Logs. These logs are generated as a result of some different categories of events. Administrative – Creation, deletion and updating of the resource. Alerts – Conditions which may be cause for concern, such as elevated processing or memory usage. Autoscaling – When the number of resources is adjusted due to autoscale settings. Service Health – Related to the health of the environment in which the resource is hosted. These logs contain information related to events occurring external to the resource. Diagnostic Logs Complementary to the activity logs are the diagnostic logs. Diagnostic logs provide a detailed view into the operations of the resource itself. Some examples of actions which would be included in these logs are: Accessing a secret vault for a key Security group rule invocation Diagnostic logs are invaluable in troubleshooting problems within the resource and gaining additional insight into the interactions with external resources from within the resource being monitored. This information is also valuable in determining the overall function and performance of the resource. Providing this data to an analysis tool can offer important insights which we’ll discuss more in the next section. Moving Beyond a Single Resource Log viewing tools and included complex search filters are available from within the Microsoft Cloud console. However, these are only useful if you are interested in learning more about the current state of a specific instance. And while there are times when this level of log analysis is valuable and appropriate, sometimes it can’t accomplish the task. If you find yourself managing a vast ecosystem consisting of multiple applications and supporting resources, you will need something more powerful. Log data from the Microsoft Cloud is available for access through a Command Line Interface (CLI), REST API and PowerShell Cmdlet. The real power in the logs lies in being able to analyze them to determine trends, identify anomalies and automate monitoring so that engineers can focus on developing additional functionality, improving performance and increasing efficiencies. 
There are some companies which have developed tools for aggregating and analyzing logs from the Microsoft Cloud, including Sumo Logic. You can learn more about the value which Sumo Logic can provide from your log data by visiting their Microsoft Azure Management page. I’d like to touch on some of the benefits here in conclusion. Centralized aggregation of all your log data, both from the Microsoft Cloud and from other environments, makes it easier to gain a holistic view of your resources. In addition to making this easier for employees to find the information they need quickly, it also enhances your ability to ensure adherence to best practices and maintain compliance with industry and regulatory standards. Use of the Sumo Logic platform also allows you to leverage their tested and proven algorithms for anomaly detection, and allows you to segregate your data by source, user-driven events, and many other categories to gain better insight into which customers are using your services, and how they are using them.
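Because the post above notes that activity log data is also available programmatically (CLI, REST API and PowerShell), here is a minimal Python sketch that summarizes an exported batch of Activity Log events by category and level. The file name and field names are assumptions based on the common activity-log event shape, so treat them as illustrative and adjust to match your actual export:

    # Sketch: summarize an exported batch of Azure Activity Log events by category.
    # Assumes the events were exported to a JSON file (e.g. via the CLI or REST API);
    # the file name and field names are illustrative, not a fixed schema.
    import json
    from collections import Counter

    with open("activity-log-export.json") as f:
        events = json.load(f)

    by_category = Counter(e.get("category", {}).get("value", "Unknown") for e in events)
    by_level = Counter(e.get("level", "Unknown") for e in events)

    print("Events by category:", dict(by_category))
    print("Events by level:", dict(by_level))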

Blog

RSA CSO Corner: Twistlock & Sumo Logic Talk GDPR, Container Security

Blog

RSA CSO Corner: CloudPassage & Sumo Logic Talk DevSecOps, Cloud Security

Blog

RSA Video: GDPR Flash Q&A with Sumo Logic Execs

Blog

The History of Monitoring Tools

Blog

How Log Analysis Has Evolved

Blog

Achieving AWS DevOps Competency Status and What it Means for Customers

Blog

Configuring Your ELB Health Check For Better Health Monitoring

Blog

ALB vs ELB: Choosing Between an ELB and an ALB on AWS

Blog

Optimizing Cloud Visibility and Security with Amazon GuardDuty and Sumo Logic

Blog

Microservices for Startups Explained

Blog

Using AWS Config Rules to Manage Resource Tag Compliance

Blog

Graphite Monitoring for Windows Performance Metrics

For several years now, the tool of choice for collecting performance metrics in a Linux environment has been Graphite. While it is true that other monitoring tools, such as Grafana, have gained traction in the last several years, Graphite remains the go-to monitoring tool for countless organizations. But what about those organizations that run Windows, or a mixture of Windows and Linux? Because Graphite was designed for Linux, it is easy to assume that you will need a native Win32 tool for monitoring Windows systems. After all, the Windows operating system contains a built-in performance monitor, and there are countless supplementary performance monitoring tools available, such as Microsoft System Center. While using Graphite to monitor Linux systems and a different tool to monitor Windows is certainly an option, it probably isn't the best option. After all, using two separate monitoring tools increases cost, as well as making life a little more difficult for the administrative staff. Fortunately, there is a way to use Graphite to monitor Windows systems. Bringing Graphite to Windows As you have probably already figured out, Graphite does not natively support the monitoring of Windows systems. However, you can use a tool from GitHub to bridge the gap between Windows and Graphite. In order to understand how Graphite monitoring for Windows works, you need to know a little bit about the Graphite architecture. Graphite uses a listener to listen for inbound monitoring data, which is then written to a database called Whisper. Graphite is designed to work with two different types of metrics—host metrics and application metrics. Host metrics (or server metrics) are compiled through a component called Collectd. Application metrics, on the other hand, are compiled through something called StatsD. In the Linux world, the use of Collectd and StatsD means that there is a very clear separation between host and application metrics. In the case of Windows, however, Graphite monitoring is achieved through a tool called PerfTap. PerfTap does not as cleanly differentiate between host and application monitoring. Instead, the tool is designed to be compatible with StatsD listeners. Although StatsD is normally used for application monitoring, PerfTap can be used to monitor Windows operating system-level performance data, even in the absence of Collectd. An easy way of thinking about this is that StatsD is basically treating Windows as an application. As is the case with the native Windows Performance Monitor, PerfTap is based around the use of counters. These counters are grouped into five categories:

System Counters – Used for monitoring hardware components such as memory and CPU
Dot Net Counters – Performance counters related to the .NET framework
ASP Net Counters – Counters that can be used to track requests, sessions, worker processes, and errors for ASP.NET
SQL Server Counters – Most of these counters are directly related to various aspects of Microsoft SQL Server, but there is a degree of overlap with the System Counters, as they relate to SQL Server.
Web Service Counters – These counters are related to the native web services (IIS), and allow the monitoring of ISAPI extension requests, current connections, total method requests, and more.

PerfTap allows monitoring to be enabled through the use of a relatively simple XML file. This XML file performs four main tasks. First, it sets the sampling interval.
The second task performed by the XML file is to provide the location of the counter definition file. The XML file’s third task is to list the actual counters that need to be monitored. And finally, the XML file provides connectivity information to the Graphite server by listing the hostname, port number, prefix key, and format. You can find an example of the XML file here. Graphite and Sumo Logic Although Graphite can be a handy tool for analyzing performance metrics, Graphite unfortunately has trouble maintaining its efficiency as the organization’s operations scale. One possible solution to this problem is to bring your Graphite metrics into the Sumo Logic service. Sumo Logic provides a free video of a webinar in which they demonstrate their platform’s ability to natively ingest, index, and analyze Graphite data. You can find the video at: Bring your Graphite-compatible Metrics into Sumo Logic. Conclusion Although Graphite does not natively support the monitoring of Windows systems, you can use a third-party utility to send Windows monitoring data to a Graphite server. Of course, Graphite is known to have difficulty with monitoring larger environments, so adding Windows monitoring data to your existing Graphite deployment could complicate the management of monitoring data. One way of overcoming these scalability challenges is to bring your Graphite monitoring data into Sumo Logic for analysis.
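To give a feel for the listener architecture described above, here is a small Python sketch that ships a single metric to a Graphite listener using Graphite's well-known plaintext protocol ("metric.path value timestamp" over TCP, port 2003 by default). The host, metric path and sample value are placeholder assumptions; in practice PerfTap does this sampling and sending continuously for the counters listed in its XML configuration.

    # Sketch: send one metric to a Graphite listener via the plaintext protocol.
    # Host, metric path and value are placeholders for illustration only.
    import socket
    import time

    GRAPHITE_HOST, GRAPHITE_PORT = "graphite.example.com", 2003

    def send_metric(path: str, value: float) -> None:
        """Write a single 'path value timestamp' line to the Graphite listener."""
        line = f"{path} {value} {int(time.time())}\n"
        with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT)) as sock:
            sock.sendall(line.encode("ascii"))

    # e.g. a sampled "% Processor Time" reading from a Windows host
    send_metric("windows.web01.processor.pct_processor_time", 12.5)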

Blog

Sumo Logic Gives Customers 171 Percent ROI: Forrester TEI Study

Blog

A DPO's Guide to the GDPR Galaxy: Dark Reading

Blog

Tuning Your ELB for Optimal Performance

Blog

Common AWS Security Threats and How to Mitigate Them

AWS security best practices are crucial in an age when AWS dominates the cloud computing market. Although moving workloads to the cloud can make them easier to deploy and manage, you'll shoot yourself in the foot if you don't secure cloud workloads well. Toward that end, this article outlines common AWS configuration mistakes that could lead to security vulnerabilities, then discusses strategies for addressing them. IAM Access The biggest threat that any AWS customer will face is user access control, which in AWS-speak is known as Identity and Access Management (IAM). When you sign up for a brand-new AWS account, you are taken through steps that will enable you to grant privileged access to people in your company. When the wrong access control is given to a person who really doesn't require it, things can go terribly downhill. This is what happened with GitLab, when their production database was partially deleted by mistake! Mitigation Fortunately, IAM access threats can be controlled without too much effort. One of the best ways to go about improving IAM security is to make sure you are educated about how AWS IAM works and how you can take advantage of it. When creating new identities and access policies for your company, grant the minimal set of privileges that everyone needs. Make sure you get the policies approved by your peers and let them reason out why one would need a particular level of access to your AWS account. And when absolutely needed, provide temporary access to get the job done. Granting access to someone does not just stop with the IAM access control module. You can take advantage of the VPC methods that allow administrators to create isolated networks that connect to only some of your instances. This way, you can have separate staging, testing and production instances. Loose Security Group Policies Administrators sometimes create loose security group policies that expose loopholes to attackers. They do this because group policies are simpler than setting granular permissions on a per-user basis. Unfortunately, anyone with basic knowledge of AWS security policies can easily take advantage of permissive group policy settings to exploit AWS resources. They leave your AWS-hosted workloads at risk of being exploited by bots (which account for about a third of the visitors to websites, according to web security company Imperva). These bots are unmanned scripts that run on the Internet looking for basic security flaws, and misconfigured security groups on AWS servers that leave unwanted ports open are something they look for. Mitigation The easiest way to mitigate this issue is to have all the ports closed at the beginning of your account setup. One method of doing this is to make sure you allow only your IP address to connect to your servers. You can do this while setting up the security groups for your instances, allowing traffic only from your specific IP address rather than leaving it open to 0.0.0.0/0. Above all, making sure you name your security groups when working in teams is always a good practice. Names that are confusing for teams to understand are also a risk. It's also a good idea to create individual security groups for your instances. This allows you to handle all your instances separately during a threat. Separate security groups allow you to open or close ports for each machine, without having to depend on other machines' policies. Amazon's documentation on Security Groups can help you get tighter on your security measures.
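To make the security-group advice above actionable, here is a small Python sketch using boto3 that lists inbound rules open to the entire internet (0.0.0.0/0). The region, and the judgment about which open ports actually matter, are assumptions you would adapt to your own environment:

    # Sketch: flag security group rules that are open to the entire internet.
    # Assumes boto3 is installed and AWS credentials are configured.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    for group in ec2.describe_security_groups()["SecurityGroups"]:
        for rule in group.get("IpPermissions", []):
            for ip_range in rule.get("IpRanges", []):
                if ip_range.get("CidrIp") == "0.0.0.0/0":
                    print(f"{group['GroupId']} ({group['GroupName']}): "
                          f"port {rule.get('FromPort', 'all')} open to the world")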
Protecting Your S3 Data One of the biggest data leaks at Verizon happened not because of a bunch of hackers trying to break the company's systems, but because of a simple misconfiguration in an AWS S3 storage bucket whose policy allowed anyone to read information from the bucket. This misconfiguration affected anywhere between six million and 14 million Verizon customers. This is a disaster for any business. Accidental S3 data exposure is not the only risk. A report released by Detectify identifies a vulnerability in AWS servers that allows hackers to identify the names of S3 buckets. Using this information, an attacker can start talking to Amazon's API and, done correctly, can then read, write and update an S3 bucket without the bucket owner ever noticing. Mitigation According to Amazon, this is not actually an S3 bug. It's simply a side effect of misconfiguring S3 access policies. This means that as long as you educate yourself about S3 configuration, and avoid careless exposure of S3 data to the public, you can avoid the S3 security risks described above. Conclusion Given AWS's considerable market share, there is a good chance that you will deploy workloads on AWS in the future, if you do not already. The configuration mistakes described above that can lead to AWS security issues are easy to make. Fortunately, they're also easy to avoid, as long as you educate yourself. None of these security vulnerabilities involve sophisticated attacks; they center on basic AWS configuration risks, which can be avoided by following best practices for ensuring that AWS data and access controls are secured.
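In the same spirit, a short boto3 sketch can surface buckets whose ACLs grant access to everyone. This only covers ACL grants — bucket policies also need review — so treat it as a starting point rather than a complete audit:

    # Sketch: look for buckets whose ACL grants access to the AllUsers or
    # AuthenticatedUsers groups (i.e. public or semi-public access).
    # Assumes boto3 is installed and AWS credentials are configured.
    import boto3

    s3 = boto3.client("s3")
    PUBLIC_GROUPS = {
        "http://acs.amazonaws.com/groups/global/AllUsers",
        "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
    }

    for bucket in s3.list_buckets()["Buckets"]:
        acl = s3.get_bucket_acl(Bucket=bucket["Name"])
        for grant in acl["Grants"]:
            if grant["Grantee"].get("URI") in PUBLIC_GROUPS:
                print(f"{bucket['Name']}: {grant['Permission']} granted to {grant['Grantee']['URI']}")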

Blog

Don't Fly Blind - Use Machine Data Analytics to Provide the Best Customer Experience

Blog

DevSecOps 2.0

Blog

4 Reasons Why I Chose Azure: A Developer's Perspective

Before Azure and AWS, Microsoft development teams would need a server on their local network to manage change control and commits. Once a week, a chosen developer would compile and deploy the application to a production server. Now developers have the option of creating cloud applications in Visual Studio and connecting their projects directly to an Azure cloud server. As a Windows developer, I chose Azure to develop my own software projects and manage them in the cloud because of the easy integration between my development desktop and Azure cloud instances. Azure has similar services to other cloud providers, but it has some advantages for Microsoft developers who need to deploy applications for the enterprise. The biggest advantage for an enterprise is that it reduces the amount of on-site resources needed to support a developer team. For instance, the typical Microsoft-based enterprise has a team of developers, a staging server, a development server, QA resources, Jira, and some kind of ticketing system (just to name a few). With Azure, the team can set up these resources without the real estate or the hardware on-site. It also has integrated IaaS, so you can create a seamless bond between your internal network and the cloud infrastructure. In other words, your users will never know if they are working on a local server or running applications on Azure. The Dashboard For anyone who has managed Windows servers, the dashboard (Azure calls it your portal) is intuitive. The difficult part of starting an Azure account is understanding all of the options you see in the dashboard. For developers used to local environments where you need to provision multiple resources for one application, it's important to understand that everything you need to build an application is at your fingertips. You no longer need to go out and purchase different services such as SSL and database software and install them. You just click the service that you want, click "Create" in the Azure portal, and Microsoft builds it into your service agreement. If you pay as you go, then you only pay for the resources that you use. You can build APIs and services, start virtual servers, and even host WordPress sites from your portal. In my own portal (with resource names redacted for security reasons), I have a web application, an SSL certificate, two databases (one MySQL and one MSSQL), email services and a vault (for the SSL cert) in Azure. I have an individual plan and don't use many resources from their servers, so I pay about $100 a month for a small WordPress site. When I had two web applications running, I paid about $150 a month. Integration with Visual Studio The main reason I use Azure is its easy integration with my Visual Studio projects. For Windows projects, integration between Visual Studio and Azure is as simple as creating a web application and copying the connection information into your projects. Once you've created the connection in Visual Studio, you can use Azure for change control and promote code directly from your development computer. For a simple web application, Microsoft offers a web app and a web app with SQL. This should be self-explanatory. If you just want to build a web application and don't need a database, then you choose the first option. The second one will set up a web app and a SQL Server. You aren't limited to a Windows environment either. When you create a web app, Azure asks if you want to use Windows or Linux.
When you create your application, it sits on a subdomain named <your_app_name>.azurewebsites.net. This will be important when you set up your TLD, and it is one downside of using Azure over traditional hosting. When you set up your TLD, you set up a CNAME and configure a custom domain in your Azure settings. When search engines crawl your site, sometimes they index this subdomain instead of only the TLD. This makes it difficult to work with applications such as WordPress. When you install WordPress, you install it on the subdomain, so WordPress has the subdomain in all of its settings. This causes errors when you promote the site to your TLD, and you must do a global database search and replace to remove the subdomain from your settings. I found this was one con to using Azure for my websites. After you create the web app, you're shown basic information to get started should you just want to upload files to the new location using FTP. The "Get publish profile" option provides you with a downloadable text file that you use to connect Visual Studio. You can also connect it directly to your own GitHub repository. As a matter of fact, Microsoft will generate pre-defined settings for several repositories. After you download one of these files, you import the settings directly into your profile and you're automatically connected. When I work in Visual Studio with an Azure connection, I check out files, edit them and check them back in as I code. It feels exactly like the old-school Team Foundation Server environment, except I don't have to buy the expensive equipment and licenses to host an internal Microsoft Windows server.

Easy VM Creation for Testing Third-Party Software

Occasionally, I get a customer that has an application that they want me to test. It's usually in beta or it's their recent stable release, but it's something that I wouldn't normally use on my machine. It's dangerous for me to install third-party software on my work machine. Should my customers' software crash my computer, I lose important data. Not only could I lose data, but I don't know what type of software I'm installing on my important developer machine connected to my home network. I don't want anything to have the ability to search my home network and storage. With an Azure VM, I not only keep the software off of my local machine, but it's shielded from my local network too. I could install a VM on my local desktop, but I like to keep this machine clean of anything other than what I need to develop and code. I use Azure VMs to install the software, and then I can destroy the VM when I'm done with it. What's great about Azure is that it has several pre-installed operating systems and frameworks to choose from. Other platforms that can be installed on an Azure VM include Citrix XenDesktop, Kali Linux, Cisco CSR, SQL Server cluster, Kiteworks, SUSE Linux, Jira, Magento and WordPress. When I'm working with client software, I use the default Windows Server installation, which is 2016. Microsoft adds the latest operating systems to its list as they are released. This is an advantage Azure has over traditional VPS services, where you must ask the host to upgrade your operating system and change your service. With Azure, you spin up a new VM with any operating system you choose. I then use the VM to install the software, work with it until I've finished the project, and then destroy the VM when I'm done.
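Spinning up and tearing down one of these throwaway VMs can also be done from the command line. This is a minimal sketch with the Azure CLI; the resource group, VM name, and password are placeholders, and the Win2016Datacenter image alias is assumed to be the Windows Server 2016 image mentioned above.

az group create --name sandbox-rg --location westus2
az vm create --resource-group sandbox-rg --name client-test-vm --image Win2016Datacenter --admin-username azureuser --admin-password '<a-strong-password>'
# ...install and test the third-party software on the VM...
az group delete --name sandbox-rg --yes --no-wait   # destroys the VM and everything created with it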
Because I only use the VM for a short time and don't connect it to any resource-intensive applications, it only costs me a few extra dollars a month. But I keep third-party apps "sandboxed" from my own network without using local resources.

Working with WordPress on Azure

Occasionally, organizations have a separate WordPress site for content even with a team of Windows developers, because it's easier for marketing to publish content with WordPress. Instead of using a separate shared host, Azure has the option to create an app with WordPress pre-installed. Most people think running WordPress on a Windows server is senseless, but you get some protection against standard attacks that look for .htaccess or Apache vulnerabilities, and you can keep your web services all in one place. It only costs a few dollars to host the WordPress site, but you pay for traffic that hits the service, so it might be much more if you have a high-traffic blog. Azure has its Insights services as well, so you can monitor WordPress, unlike traditional shared or VPS hosting where you need a third-party application. You also get all the available infrastructure, such as load balancing and SSL, should you need it with the WordPress site. These aren't available with simple shared or VPS hosting. While running WordPress on Windows seems counterintuitive, running it in Azure is beneficial for the enterprise that needs detailed statistics on site usage and protection from script kiddies that attack any public WordPress site with poor Linux management.

Is Azure Better than AWS?

Most people will tell you that AWS is a better choice, but I preferred Azure's portal. I found spinning up services more intuitive, but what sold me was the integration with Visual Studio. If you're in an enterprise, one advantage of Azure is that you can integrate Active Directory services so that your network expands into Azure's cloud IaaS. This is much better than building a separate portal that you must control with individual security settings, and it eliminates the possibility of accidentally exposing data through incorrect security settings. If you decide to try Azure, the first 30 days are free. AWS gives you 12 months, so users have longer to figure out settings. I found Azure beneficial and kept paying for a low-traffic, low-resource account until I could figure out whether I wanted to use it permanently. I've had an account for two years now and don't plan to ever switch.

Blog

Top 5 Metrics to Monitor in IIS Logs

Blog

Auto Subscribing CloudWatch Log Groups to AWS Lambda Function

Blog

How Much Data Comes From The IoT?

Blog

Resolving Issues with Your Application Using IIS Logs

Blog

Biggest AWS Security Breaches of 2017

Blog

Three Dragons to Slay in Threat Discovery and Investigation for the Cloud

Threat correlation and prioritization (what do I pay attention to in an avalanche of highlighted threats?) and threat investigation (how do I decide what happened and what to do, quickly?) are extremely challenging core functions of security defense, resulting, in many cases, in less than 10% of high-priority threats being fully investigated. The accelerating migration to the cloud and modern application deployment are making these already difficult workflows untenable in traditional models, leading to questions such as: how to gather and correlate all of the new sources of data at cloud scale? How to understand and triangulate new dynamic data from many layers in the stack? How to react with the pace demanded by new models of DevSecOps deployment? And how to collaborate to connect the dots across evolving boundaries and silos? Last week, a veteran of many cloud migration security projects I know described many SOCs as "groping in the dark" with these challenges and looking for a new approach despite all of the vendor claims mapped to their pains. The usual crowd of incremental enhancements (e.g. bringing cloud data into the traditional SIEM, automating manual workflows, layering more tools for specialized analytics, leveraging the wisdom of crowds, etc.) leaves three dragons roaming the countryside which need to be slain for security to keep pace with the unstoppable, accelerating migration to the cloud.

Dragon #1 – Siloed Security and IT Ops Investigation Workflows

A basic dilemma in security for the cloud is that often the knowledge needed to pursue an investigation to conclusion is split between two groups. Security analysts understand the process of investigation and the broad context, but often only IT ops understands the essential specific context – application behavior and customer content, for example – needed to interpret and hypothesize at many steps in a security investigation. A frequent comment goes something like, "The SOC understands the infrastructure, but they don't know how to interpret app logs or new data sources like container orchestration." This gap in understanding makes real-time collaboration essential to prevent exploding backlogs, partial investigations, and a bias toward more solvable on-prem alerts. Aside from needing to understand unfamiliar, new, and rapidly changing data sources in a single security investigation, cloud deployments generate more frequent "dual ticket" cases in which it is unknown whether a security issue or an IT issue is the root cause (for example: my customer is complaining they can't access our app – network congestion? Cloud provider outage? Server CPU overload? DDoS attack? Malware? Customer issue?). It isn't just that two separate investigations take more time and resources to complete and integrate; often, in cloud cases, neither side can reach a conclusion without the other. Working from common data isn't enough – analytics and workflow need to be common as well to enable the seamless collaboration required. In addition, modern cloud deployments often employ DevSecOps models in which the pace of application update, rollout, and change is measured in days or hours as opposed to months or quarters. One implication for security threat investigation is that the processing of the threat resolution backlog must align, so that current resources can be applied to current environments without being mired in "old" cases or chasing continuous flux in the data.
This is challenge enough, but having to manage this triage across two separate backlogs in both IT and security, with the usual integration taxes, makes operating on the scale of hours and days extremely challenging. While separate silos for IT ops and security investigations were feasible and logical in on-prem classic IT, modern cloud deployments and application architectures demand a seamless back-and-forth workflow where, at each step, the skills and perspective of both IT and security are needed to properly interpret the results of queries, evidence uncovered, or unfamiliar data. Asking both sides to completely subsume the knowledge of the other is unrealistic in the short term – a much better solution is to converge their workflows so they can collaborate in real time.

Dragon #2 – Traditional Security Bias on Infrastructure vs. Application Insight

Traditional SIEMs have long been exhorted to look up the stack to the application layer, and in several instances new product areas have sprung up when they have not. In the cloud world, this application layer "nice to have" becomes a "must have." Cloud providers have taken on some of the infrastructure defense previously done by individual companies, creating harder targets that cause attackers to seek softer ones. At the same time, much of the traditional infrastructure defense from the on-prem world has not yet been replicated in the cloud, so often application layer assessment is the only investigation method available. In addition to the defensive need to incorporate the application layer, there is clearly additional insight at that layer which is unknown at the infrastructure layer (e.g. customer context, behavioral analytics, etc.). This is particularly true when it is unclear whether a security or an IT problem exists. Many point systems specialize in extracting actionable insight from this layer, but holistic correlation and investigation of threats is more difficult, in part because of wide variations in APIs, log formats, and nomenclature. Looking forward, modern application deployment in the cloud also increases the surface area for investigation and threat assessment. For example, chained microservices create many possible transitions in variables important to investigators. For all of these reasons, adding insight from the application layer is necessary and good for cloud deployments, but integrating this insight quickly with infrastructure insight is better. Many investigation workflows jump back and forth across these layers several times in a single step, so fully integrated workflows will be essential to leverage the assimilation of new insight.

Dragon #3 – Investigation Times Measured in 10s of Minutes and Hours

In cloud and modern application deployment, the sheer volume of incoming data will make yesterday's data avalanche seem like a pleasant snow dusting. Also, dynamic and transient data, entities, and nomenclature make workflows that were straightforward (although still slow and annoying) in the old world (e.g. tracking changing IP addresses for a user or machine) extremely challenging in the cloud. Finally, collaboration will require new models of distributed knowledge transfer, since investigation workflows will be shared across both security and IT ops. [Read More: Threat Intelligence] Many SOCs are at the breaking point in traditional environments, with growing backlogs of investigations and reactive triage.
Achieving investigation times in minutes to keep pace in the cloud, despite these additional challenges, will require breakthrough innovation in getting rapid insight from huge, dynamic data sets and in scaling learning models across both humans and machines. Slaying these dragons will not be easy or quick – new solutions and thinking will collide with comfort zones, entrenched interests, perceived roles of people and process, and more than a few "sacred cows." Despite these headwinds, I'm optimistic looking ahead, based on two core beliefs: 1) the massive economic and technological leverage of the cloud has already led to many other transition dragons of comparable ferocity being attacked with zeal (e.g. DevSecOps, data privacy, regional regulation, etc.), and 2) unlike in many other transitions, a broad cross section of the individuals involved on the front lines of these messy transitions have far more to gain in the leap forward of their own skills, learning, and opportunity than they have to lose. Aside from that, the increasingly public scorecard of the attackers vs. the defenders will help keep us honest about progress along the way.

Blog

Docker Logging Example

Docker is hard. Don't get me wrong. It's not the technology itself that is difficult... it's the learning curve. Committing to a Docker-based infrastructure means committing to a new way of thinking, which can be a harsh adjustment from the traditional thinking behind bare metal and virtualized servers. Because of Docker's role-based container methodology, simple things like log management can seem like a bear to integrate. Thankfully, as with most things in tech, once you wrap your head around the basics, finding the solution is simply a matter of perspective and experience.

Collecting Logs

When it comes to aggregating Docker logs in Sumo Logic, the process starts much like any other: add a Collector. To do this, open up the Sumo Logic Collection dashboard and open the Setup Wizard. Because we will be aggregating logs from a running Docker container, rather than uploading pre-collected logs, select the Set Up Streaming Data option in the Setup Wizard when prompted. Next up, it is time to select the data type. While Docker images can be based on just about any operating system, the most common base image (and the one used for this demonstration) is Linux-based. After selecting the Linux data type, it's time for us to get into the meat of things. At this point, the Setup Wizard will present us with a script that can be used to install a Collector on a Linux system.

The Dockerfile

While copying and pasting the above script is generally all that is required for a traditional Linux server, some steps are required to translate it into a Docker-friendly environment. To accomplish this, let's take a look at the following Dockerfile:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y wget nginx
CMD /etc/init.d/nginx start && tail -f /var/log/nginx/access.log

That Dockerfile creates a new image from the Ubuntu base image, installs NGINX, and then prints the NGINX access log to stdout (which allows our Docker container to be long-running). In order to add log aggregation to this image, we need to convert the provided Linux Collector script into Docker-ese. By replacing the sudo and && directives with RUN calls, you'll end up with something like this:

RUN wget "https://collectors.us2.sumologic.com/rest/download/linux/64" -O SumoCollector.sh
RUN chmod +x SumoCollector.sh
RUN ./SumoCollector.sh -q -Vsumo.token_and_url=b2FkZlpQSjhhcm9FMzdiaVhBTHJUQ1ZLaWhTcXVIYjhodHRwczovL2NvbGxlY3RvcnMudXMyLnN1bW9sb2dpYy5jb20=

Additionally, while this installs the Sumo Logic Linux Collector, it does not start the Collector daemon. The reason for this goes back to Docker's "one process per container" methodology, which keeps containers as lightweight and targeted as possible. While this is the "proper" method in larger production environments, in most cases starting the Collector daemon alongside the container's intended process is enough to get the job done in a straightforward way.
To do this, all we have to do is prefix the /etc/init.d/nginx start command with an /etc/init.d/collector start && directive. When put together, our Dockerfile should look like this:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y wget nginx
RUN wget "https://collectors.us2.sumologic.com/rest/download/linux/64" -O SumoCollector.sh
RUN chmod +x SumoCollector.sh
RUN ./SumoCollector.sh -q -Vsumo.token_and_url=b2FkZlpQSjhhcm9FMzdiaVhBTHJUQ1ZLaWhTcXVIYjhodHRwczovL2NvbGxlY3RvcnMudXMyLnN1bW9sb2dpYy5jb20=
CMD /etc/init.d/collector start && /etc/init.d/nginx start && tail -f /var/log/nginx/access.log

Build It

If you've been following along in real time up to now, you may have noticed that the Set Up Collection page hasn't yet allowed you to continue on to the next page. The reason for this is that Sumo Logic is waiting for the Collector to be installed. Triggering the "installed" status is as simple as running a standard docker build command:

docker build -t sumologic_demo .

Run It

Next, we need to run our container. This is a crucial step because the Setup Wizard process will fail unless the Collector is running.

docker run -p 8080:80 sumologic_demo

Configure the Source

With our container running, we can now configure the logging source. In most cases, the logs for the running process are piped to stdout, so unless you take special steps to pipe container logs directly to the syslog, you can generally select any log source here. /var/log/syslog is a safe choice.

Targeted Collection

Now that we have our Linux Collector set up, let's actually send some data up to Sumo Logic with it. In our current example, we've set up a basic NGINX container, so the easiest choice here is to set up an NGINX Collector using the same Setup Wizard as above. When presented with the choice to set up the Collection, choose the existing Collector we just set up in the step above.

Viewing Metrics

Once the Collectors are all set up, all it takes from here is to wait for the data to start trickling in. To view your metrics, head to your Sumo Logic dashboard and click on the Collector you've created. This will open up a real-time graph that will display data as it comes in, allowing you to compare and reduce the data as needed in order to identify trends within your running container.

Next Steps

While this is a relatively simplistic example, it demonstrates the potential for creating incredibly complex workflows for aggregating logs across Docker containers. As I mentioned above, the inline collector method is great for aggregating logs from fairly basic Docker containers, but it isn't the only (or best) method available. Another, more stable option (out of the scope of this article) would be using a dedicated Sumo Logic Collector container that is available across multiple containers within a cluster. That said, this tutorial hopefully provides the tools necessary to get started with log aggregation and monitoring across existing container infrastructure.
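Once the container is up, it can be worth sanity-checking that both processes are actually running before waiting on the Setup Wizard. This is a minimal sketch, assuming the init scripts baked into the image above support a status action and that the image and port mapping from this post were used:

# Find the container started from the image built above
docker ps --filter ancestor=sumologic_demo
# Verify the Collector and NGINX are both running inside it (assumes the init scripts support "status")
docker exec <container_id> /etc/init.d/collector status
docker exec <container_id> /etc/init.d/nginx status
# Hit NGINX once so an access-log line shows up for the Collector to ship
curl -s http://localhost:8080/ > /dev/null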

Blog

AWS Config vs. CloudTrail

Blog

What You Need to Know About Meltdown and Spectre

Last week, a security vulnerability was announced involving the exploitation of common features in the microprocessor chips that power computers, tablets, smartphones and data centers. The vulnerabilities, known as "Meltdown" and "Spectre," are getting a lot of attention in the media, and no doubt people are concerned about their impact on business, customers, partners and more. Here's what you really need to know about these vulnerabilities.

What are Meltdown and Spectre?

The Meltdown vulnerability, CVE-2017-5754, can potentially allow hackers to bypass the hardware barrier between applications and kernel or host memory. A malicious application could therefore access the memory of other software, as well as the operating system. Any system running on an Intel processor manufactured since 1995 (except Intel Itanium and Intel Atom before 2013) is affected. The Spectre vulnerability has two variants: CVE-2017-5753 and CVE-2017-5715. These vulnerabilities break isolation between separate applications. An attacker could potentially gain access to data that an application would usually keep safe and inaccessible in memory. Spectre affects all computing devices with modern processors manufactured by Intel or AMD, or designed by ARM. These vulnerabilities could potentially be exploited to steal sensitive data from your computer, such as passwords, financial details, and other information stored in applications. Here is a great primer explaining these security flaws.

What can be compromised?

The core system, known as the kernel, stores all types of sensitive information in memory. This means banking records, credit cards, financial data, communications, logins, passwords and secret information could all be at risk due to Meltdown. Spectre can be used to trick normal applications into giving up sensitive data, which potentially means anything processed by an application can be stolen, including passwords and other data.

Was the Sumo Logic platform affected?

Yes. Practically every computing device is affected by Spectre, including laptops, desktops, tablets, smartphones and even cloud computing systems. A few low-power devices, such as certain Internet of Things gadgets, are unaffected.

How is Sumo Logic handling the vulnerabilities?

As of January 4th, 2018, AWS confirmed that all Sumo Logic systems were patched, rebooted and protected from the recent Meltdown/Spectre vulnerabilities. We worked very closely with our AWS TAM team and verified the updates. Sumo Logic started the OS patching process with the latest Ubuntu release from Canonical on January 9th. The risk level now that AWS has patched is low, but we will continue to be diligent in following up and completing the remediation process. We take these vulnerabilities very seriously and are dedicated to ensuring that the Sumo Logic platform is thoroughly patched and continuously monitored for any malicious activity. If you have questions, please reach out to [email protected].
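For Linux hosts you patch yourself, newer kernels expose the mitigation status directly in sysfs. This is a minimal check, and it assumes a kernel recent enough to ship the vulnerabilities interface; older kernels simply won't have these files:

grep . /sys/devices/system/cpu/vulnerabilities/* 2>/dev/null
# Output will look something like:
#   /sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
#   /sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
# Any line reporting "Vulnerable" means a kernel or microcode update is still needed.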

Blog

Kubernetes Development Trends

Blog

Logs and Metrics: What are they, and how do they help me?

Blog

2018 Predictions: ICO Frenzy, Advance of Multi-Cloud, GDPR and More

This is an exciting time to be in enterprise software. With the rise of serverless, the power of hybrid computing and the endless uses of artificial intelligence (AI), 2017 will likely go down as the most disruptive year ever. But what does 2018 have in store? Sumo Logic executives weighed in for our yearly prediction series. Read on to see what they predict will influence the coming year in technology the most. Also keep these in mind and check back mid-year to see how many come true!

Demand for multi-cloud, multi-platform will drive the need for multi-choice

Over the past few years, there has been much debate within enterprise IT about moving critical infrastructure to the cloud – specifically, around which cloud model is the most cost-effective, secure and scalable. One thing is for certain – the cloud is the present (and future) of enterprise IT, and legacy companies that continue to predominantly or solely house their infrastructure on-premises to support existing or new modern applications will become increasingly irrelevant a few years from now as their competitors prevail. Moreover, this problem is further exacerbated as cloud users demand choice, which is going to drive massive growth in multi-cloud, multi-platform adoption in 2018. As a result, enterprises will need a unified cloud-native analytics platform that can run across any vendor, whether it's Amazon, Microsoft or Google, including what's traditionally running on-premises. This agnostic model will serve as the backbone for the new world I refer to as the analytics economy, defined by positive disruption at every layer of the stack. –Ramin Sayar, CEO, Sumo Logic

ICO creates another Wild West

Next year we will begin to see the results of the growth in the initial coin offering (ICO) frenzy from this year. Investors have poured more than $300B into more than 200 ICOs this year, but it's still a very unregulated market with a mix of players attempting to gain early entry. While legitimate companies are seeking and benefiting from crypto-token funding, there is still a lot of dubious activity from questionable characters in the space trying to make a quick buck. If crypto-token equity begins to take hold and legitimizes its worth to investors who pursued ICOs this year, then in 2018 the startup market will become the wild freaking west.

AI will not transform the enterprise in the near future

Previous predictions and claims about the direct impact of AI on enterprises have been overblown. There is excessive hype around how AI will lead us to new discoveries and medical breakthroughs. However, those expecting AI to be the ultimate truth conveyer are mistaken. It will be very hard to design a model that can determine unbiased truth, because human bias – whether explicit or implicit – will be coded into these data analytics systems and reinforce existing beliefs and prejudices. With that said, there are certain applications where systems can make better decisions in a shorter amount of time than humans, such as in the case of autonomous vehicles. In 2018 we will begin to see real use cases of the power of AI appear in our everyday lives; it just isn't ready to be the shining star for the enterprise quite yet. When you look at the maturity of the enterprise, only half of the Global 2000 offer fully digital products. So, despite all of the buzz around digital transformation, there's a lot of catch-up to be done before many of these companies can even consider looking at advanced developments such as AI.
— Christian Beedgen, CTO, Sumo Logic

GDPR regulations will turn massive tech companies into walking targets

It won't take long after the May 25 GDPR deadline before the gloves come off and the European Union cracks down with audits of big tech companies. We're talking about Uber, Google, Apple and so forth. This will be the EU's effort to reinforce the severity of meeting GDPR regulations and to show that no business – not even the household names – will be immune to complying with GDPR standards. After the EU cracks down on the big tech companies, financial institutions and travel companies will be next, as these are among the most globalized industries, where data flows freely across geographical borders. And regardless of the EU's efforts, the reality is that many companies won't meet the May deadline, whether due to lack of resources, laziness or apathy. You better believe that those businesses that don't get on board – and get caught – will be crushed, as business will come to a grinding halt.

Government will continue to fall flat with security

If I were a hacker, I would target the path of least resistance, and right now – and into 2018 – that path collides squarely with government agencies. What's scary is that government organizations hold some of our most critical data, such as social security numbers, health records and financial information. It's shocking how the government generally lags in terms of security and technology innovation. Over the past few years the government has been a prime target for bad actors. Take a look at the Office of Personnel Management breach in 2015, and more recently the hacks into the Department of Homeland Security and FBI in 2016. Next year will be no different. Even with all of the panels, hearings and legislation, such as the Modernizing IT Act and the executive order reaffirming the government's commitment to updating and implementing stronger cybersecurity programs, the government is already 10-15 years behind, and I don't see this improving over the next year.

Millennials will be our security saving grace

Millennials will inspire a societal shift in the way we view security and privacy. If you follow the data, it'll make sense. For instance, Facebook is now most popular among adults age 65 and older. It's less appealing to younger generations who've moved on to newer, more secure ways to express themselves, such as disappearing video chats with Snapchat. As social media evolves, privacy, user control/access and multi-factor authentication have become a natural part of protecting online identity, for both users and developers alike. My personal resolution for 2018 is to step up my mentorship of this younger generation. If we can encourage them to channel this "Security First" way of thinking in a professional capacity, we can continue to build a resilient and robust cybersecurity workforce that makes us all more secure. –George Gerchow, VP of Security and Compliance, Sumo Logic

Now that you have read Sumo's predictions, tell us yours for the coming year. Tweet them to us at @SumoLogic.

December 15, 2017

Blog

Finding and Debugging Memory Leaks with Sumo

Memory leaks happen when programs allocate more memory than they return. Memory is, alongside compute, one of the critical assets of any computer system. If a machine runs out of memory, it cannot provide its service. In the worst case, the entire machine might crash and tear down all running programs. The bugs responsible for that misbehavior are often hard to find. Sumo's collector enables monitoring memory consumption out of the box. Using some additional tooling, it is possible to collect fine-grained logs and metrics that accelerate finding and efficiently debugging memory leaks. Ready to get started? See all the ways the Sumo Logic platform helps monitor and troubleshoot—from a seamless ingestion of data, to cross-platform versatility, and more. You can even get started for free.

Memory Management and Memory Leaks

Memory management is done on multiple levels: the operating system (OS) keeps track of memory allocated by its programs in kernel space, and in user space, virtual machines like the JVM might implement their own memory management component. At its core, memory management follows a producer-consumer pattern. The OS or VM gives away (produces) chunks of memory whenever programs request (consume) memory. Since memory is a finite resource in any computer system, programs have to release the allocated memory, which is then returned to the pool of available memory managed by the producer. In some applications the programmer is responsible for releasing memory; in others, like the JVM, a thread called the garbage collector collects all objects that are no longer used. A healthy system would run through this give-and-take in a perfect circle. In a bad system, the program fails to return unused memory. This happens, for example, if the programmer forgets to call the function free, or if some objects keep being referenced from a global scope after usage. In that case, new operations will allocate more memory on top of the already allocated, but unused, memory. This misbehavior is called a memory leak. Depending on the size of the objects, this can be as little as a few bytes, kilobytes, or even megabytes if the objects, for example, contain images. Depending on how frequently the erroneous allocation is called, the free space can fill up in as little as a few microseconds, or it could take months to exhaust the memory in a server. This long time-to-failure can make memory leaks very tricky to debug, because it is hard to track an application running over a long period. Moreover, if the leak is just a few bytes, this marginal amount gets lost in the noise of common allocation and release operations, and the usual observation period might be too short to recognize a trend. This article describes a particularly interesting instance of a memory leak. The example uses the Akka actor framework, but for simplicity, you can think of an actor as an object. The specific operation in this example is downloading a file: an actor is instantiated when the user invokes a specific operation (download a file); the actor accumulates memory over its lifetime (it keeps adding to the temporary file in memory); and after the operation completes (the file has been saved to disk), the actor is not released. The root cause of the memory leak is that the actor can handle only one request and is useless after saving the content of the file. There are no references to the actor in the application code, but there still is a parent-child relationship defined in the actor system that defines a global scope.
From After-the-Fact Analysis to Online Memory Supervision

Usually, when a program runs out of memory it terminates with an "Out of Memory" error or exception. In the case of the JVM, it will create a heap dump on termination. A heap dump is an image of the program's memory at the instant of termination, saved to disk. This heap dump file can then be analyzed using tools such as MemoryAnalyzer, YourKit, or VisualVM for the JVM. These tools are very helpful for identifying which objects are consuming what memory. They operate, however, on a snapshot of the memory and cannot keep track of the evolution of memory consumption. Verifying that a patch works is out of the scope of these tools. With a little scripting, we can remediate this and use Sumo to build an "Online Memory Supervisor" that stores and processes this information for us. In addition to keeping track of the memory consumption history of our application, it saves us from juggling heap dump files that can potentially become very large. Here's how we do it:

1. Mechanism to interrogate the JVM for current objects and their size

The JVM provides an API for creating actual memory dumps during runtime, or for just retrieving a histogram of all current objects and their approximate size in memory. We want to do the latter, as this is much more lightweight. The jmap tool in the Java SDK makes this interface accessible from the command line:

jmap -histo PID

Getting the PID of the JVM is as easy as grepping for it in the process table. Note that if the JVM runs as a server using an unprivileged user, we need to run the command as this user via su. A bash one-liner to dump the object histogram could look like:

sudo su stream -c "jmap -histo \$(ps ax | grep '[0-9]* java' | awk '{print \$1}') > /tmp/${HOSTID}_jmap-histo-$(date +%s).txt"

2. Turn the result into metrics for Sumo, or just drop it as logs

As a result of the previous operation, we now have a file containing a table with object names, counts, and retained memory. In order to use it in Sumo, we'll need to submit it for ingestion. Here we have two options: (a) send the raw file as logs, or (b) convert the counts to metrics. Each object's measurement is part of a time series tracking the evolution of the object's memory consumption. Sumo Metrics ingests various time-series input formats; we'll use Graphite because it's simple. To effect the conversion of a jmap histogram to Graphite, we use bash scripting. The script cuts off the beginning and end of the file and then parses the histogram to produce two measurements:

<class name, object count, timestamp>
<class name, retained size, timestamp>

Sending these measurements to Sumo can be done through Sumo's collector, using collectd with the Sumo plugin, or by sending directly to the HTTP endpoint. For simplicity, we've used the Graphite format and target the Sumo collector. To be able to differentiate both measurements as well as different hosts, we prepend this information to the classpath: <count|size>.<host>.classpath. For example, a jmap histogram might contain data in tabular form like:

69: 18 1584 akka.actor.ActorCell
98: 15 720 akka.actor.RepointableActorRef
103: 21 672 akka.actor.ChildActorPath
104: 21 672 akka.actor.Props

Our script turns that into Graphite format and adds some more hierarchy to the package name. In the next section, we will leverage this hierarchy to perform queries on object counts and sizes.
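The post does not show the conversion script itself, so here is a minimal sketch of what it could look like, assuming the standard jmap -histo column layout (rank, instance count, bytes, class name) and the count./size. prefix scheme described above. The script name and host name are placeholders, and its output matches the Graphite lines shown next.

#!/bin/bash
# histo2graphite.sh (hypothetical helper) - sketch of the jmap-histogram-to-Graphite conversion
# Usage: ./histo2graphite.sh /tmp/<host>_jmap-histo-<timestamp>.txt
HOSTID=memleak1   # placeholder host name matching the hierarchy above
awk -v host="$HOSTID" -v ts="$(date +%s)" '
  /^ *[0-9]+:/ {                        # rows look like "  69:  18  1584  akka.actor.ActorCell"
    print "count." host "." $4, $2, ts  # object-count series
    print "size."  host "." $4, $3, ts  # retained-size series
  }' "$1"

Piping this output to the collector's Graphite source (for example with nc, as shown further below) completes the ingestion step.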
count.memleak1.akka.actor.ActorCell 18 123
count.memleak1.akka.actor.RepointableActorRef 15 123
count.memleak1.akka.actor.ChildActorPath 21 123
count.memleak1.akka.actor.Props 21 123

In our case, we'll just forward these lines to the Sumo collector. Previously, we've defined a Graphite source for Metrics. Then, it's as easy as cat histogram-in-graphite | nc -q0 localhost 2003.

3. Automate processing via Ansible and StackStorm

So far, we are capable of creating a fine-grained measurement of an application's memory consumption using a couple of shell commands and scripts. Using the DevOps automation tools Ansible and StackStorm, we can turn this manual workflow into an Online Memory Supervision System. Ansible helps us automate taking the measurement on multiple hosts. For each individual host, it connects via ssh, runs the jmap command and the conversion script, and submits the measurement to Sumo. StackStorm manages this workflow for us. At a given interval, it kicks off Ansible and logs the process. In case something goes wrong, it defines remediation steps. Of course, there are alternatives to the myriad of available tools. Ansible competes with SaltStack, Chef, and Puppet. StackStorm is event-driven automation with all the bells and whistles; for this example, we could have used a shell script with sleep or a simple cron job.

Using Sumo to Troubleshoot Memory Leaks

Now it's time to use Sumo to analyze our memory. In the previous steps, we have submitted and ingested our application's fine-grained memory consumption data. After this preparation, we can leverage Sumo to query the data and build dashboards. Using queries, we can perform in-depth analysis. This is useful as part of a post-mortem analysis to track down a memory leak, or during development to check if a memory allocation/deallocation scheme actually works. During runtime, dashboards can monitor critical components of the application. Let's check this out on a live example. We'll use a setup of three JVMs simulating an application and a StackStorm instance. Each runs in its own Docker container, simulating a distributed system. To make our lives easier, we orchestrate this demo setup using Vagrant:

Figure 1: Memory leak demo setup and control flow

A Memory Measurement node orchestrates the acquisition process. We've developed a short Ansible script that connects to several application nodes and retrieves a histogram dump from the JVMs running the faulty program from [1]. It converts the dumps to Graphite metrics and sends them via the collector to Sumo. StackStorm periodically triggers the Ansible workflow. Finally, we use the UI to find and debug memory leaks.

Analyze memory consumption

First, we want to get an overview of what's going on in the memory. We start by looking at the total memory consumption of a single host. A simple sum over all object sizes yields the application's memory consumption over time. The steeply increasing curve abruptly comes to an end at a total of about 800 MB. This is the total memory that we dispatched to the JVM (java -Xmx800m -jar memleak-assembly-0.1.jar).

Figure 2: Total memory consumption of host memleak3

Drilling down on top memory consumers often hints at the classes responsible for a memory leak. For that query, we parse out all objects and sum their counts and sizes. Then we display only the top 10 counts. In the size query, we filter out objects above a certain size; these objects are the root objects of the application and do not contain much information.
Figure 3: Top memory consumers on a single node

Figure 4: Top memory consumers by size

We find out that a red-black tree dominates the objects. Looking at the Scala manual tells us that HashMaps make extensive use of this data structure: "Scala provides implementations of immutable sets and maps that use a red-black tree internally. Access them under the names TreeSet and TreeMap." We know that the ActorSystem uses HashMaps to store and maintain actors. Parsing and aggregating queries help to monitor entire subsystems of a distributed application. We use that to find out that the ActorSystem accumulates memory not only on a single host but over a set of hosts. This leads us to believe that this increase might not be an individual error, but a systemic issue.

Figure 5: Use query parsing and aggregation operations to display the ActorSystem's memory consumption

A more detailed view of the Child actor reveals the trend of how it accumulates memory. The trick in this query is that in the search part we filter on the akka.actor.* packages in the search expression, and then use the aggregation part to parse out the individual hosts and sum the size values of their objects. Since all three JVMs started at the same time, their memory usage increases at a similar rate in this picture. We can also split this query into three separate queries, as below, looking at how the Child actors on all three hosts are evolving.

Figure 6: The bad Child actor accumulating memory

Finally, we verify that the patch worked. The latest chart shows that allocation and deallocation are now in balance on all three hosts.

Figure 7: Memory leak removed, all good now

Memory Analysis for Modern Apps

Traditional memory analyzers were born in the era of standalone desktop applications. Therefore, they work on snapshots and heap dumps and cannot track the dynamics of memory allocation and deallocation patterns. Moreover, they are restricted to working on single images, and it is not easy to adapt them to a distributed system. Modern apps have different requirements. Digital businesses provide service 24/7, scale out in the cloud, and compete on feature velocity. To achieve feature velocity, detecting memory issues online is more useful than after the fact. Bugs such as memory leaks need rapid detection, with bug fixes inserted frequently and without stopping services. Pulling heap dumps and starting memory analyzers just won't work in many cases. Sumo takes memory analysis to the next level. Leveraging Sumo's Metrics product, we can track memory consumption for classes and objects within an application. We look at aggregations of their counts and sizes to pinpoint the fault. Memory leaks are often hard to find and need superior visibility into an application's memory stack to become debuggable. Sumo achieves this not only for a single instance of an application but scales memory analysis across the cloud. Additionally, Sumo's Unified Logs and Monitoring (ULM) enables correlating logs and metrics and facilitates understanding the root cause of a memory leak.

Bottom Line

In this post, we showed how to turn Sumo into a fine-grained, online memory supervision system using modern DevOps tools. The fun doesn't stop here. The presented framework can easily be extended to include metrics for threads and other resources of an application. As a result of this integration, developers and operators gain high visibility into the execution of their application.
References
[1] Always stop unused Akka actors – Blog Post
[2] Acquire object histograms from multiple hosts – Ansible Script
[3] Sumo's Modern Apps report – BI Report

Blog

Monitor AWS Lambda Functions with Sumo Logic

Blog

Optimizing Cloud Security: Amazon GuardDuty and Sumo Logic

Security concerns and skill shortages continue to impede cloud adoption

Migration to the cloud is still being hampered by the security concerns this new frontier poses and by the same cybersecurity skills gaps already present in many, if not most, organizations today. This was highlighted in a 2017 survey by Forbes in which 49% of respondents stated that they were delaying cloud deployment due to a cybersecurity skills gap. And even with adequate staffing, organizations that have adopted some facet of the cloud express concerns about their ability to monitor and manage these new environments.

Sumo Logic and Amazon GuardDuty to the rescue

Sumo Logic was founded over seven years ago, by security industry professionals, as a secure, cloud-native, machine data analytics platform that converts machine data into real-time continuous intelligence, providing organizations with the full-stack visibility, analytics and insights they need to build, run and secure their modern applications and cloud infrastructures. The Sumo Logic platform provides security analytics and visibility across the entire AWS environment, with context derived from details such as user access, platform configurations and changes, and with the ability to generate audit trails to demonstrate compliance with industry standards. Sumo Logic also correlates analytics with CrowdStrike threat intelligence to identify risks and threats in the AWS environment, such as communications with malicious IPs, URLs, or domains. At AWS' annual re:Invent 2017 conference in Las Vegas this week, Amazon announced the availability of Amazon GuardDuty, which provides AWS users with a continuous security monitoring and threat detection service. Due to Sumo Logic's strong and long-standing relationship with AWS, Sumo Logic was given early access to the beta version of GuardDuty, which allowed the team to develop, announce and release the complementary Sumo Logic Amazon GuardDuty App in parallel with Amazon.

GuardDuty works by gathering log data from three distinct areas of the AWS cloud environment: AWS Virtual Private Cloud (VPC) flow logs, AWS CloudTrail event logs, and AWS Route 53 DNS query logs. Along with this log data, AWS provides additional sources of context (including threat intel associated with the AWS environment) to identify potential threats in users' environments. These potential threats are called "findings" by GuardDuty. Each finding provides users with details about the threat identified so that they can take any necessary action. Finding details include the following information:

Last seen – the time at which the activity took place that prompted the finding.
Count – the number of times the finding was generated.
Severity – the severity level (High, Medium, or Low). High: take immediate remediation steps. Medium: investigate the implicated resource at your earliest convenience. Low: suspicious or malicious activity was blocked; no immediate action needed.
Finding Type – details, including the:
Threat Purpose (more details are available in the GuardDuty User Guide): Backdoor, Behavior, Cryptocurrency, Pentest, Recon, Stealth, Trojan, UnauthorizedAccess.
Resource Type Affected: with the initial release of GuardDuty, "only EC2 instances and IAM users (and their credentials) can be identified in findings as affected resources."
Threat Family Name: the overall threat or potential malicious activity detected.
Threat Family Variant: the specific variant of the Threat Family detected.
Artifact: a specific resource owned by a tool used in the attack.
Region – the region in which the finding was generated.
Account ID – the ID of the AWS account in which the activity took place.
Resource ID – the ID of the AWS resource against which the activity took place.
Target – the area of your AWS infrastructure where GuardDuty detected potentially malicious or anomalous activity.
Action – the activity that GuardDuty perceived to be potentially malicious or anomalous.
Actor – the user that engaged in the potentially malicious or unexpected activity.

The Sumo Logic Amazon GuardDuty App Value-Add

Pre-built Sumo Logic GuardDuty dashboards: Sumo Logic provides a single pane of glass to reduce the complexity of managing multiple environments, with pre-configured, user-friendly and customizable dashboards that take GuardDuty's linear data format and layer on rich graphical reporting and depictions of trends over time.

Click to Fix: The Sumo Logic Amazon GuardDuty App allows users to rapidly and visually identify findings, ranked by their severity levels (high, medium, and low), and to simply click on any of them to be automatically routed to their AWS environment to take any necessary remediation actions.

Value-added Context: The Sumo Logic Amazon GuardDuty App adds additional sources of analytics for deeper and wider visibility into the AWS environment and context across the organization, including full-stack visibility into application and infrastructure logs, Application/Elastic Load Balancer (ALB/ELB) performance, and supplemental threat intel provided by CrowdStrike at no additional fee.

The new Amazon GuardDuty offering, along with capabilities from Sumo Logic's tightly integrated GuardDuty App, provides organizations with the tools they need to more simply and effectively manage and monitor their AWS cloud environments, and with the visibility needed for more rapid detection and remediation of real and potential threats to mission-critical resources in those environments.

Get the Sumo Logic Amazon GuardDuty App. Sign up for Sumo Logic instantly and for free. Watch the Sumo Logic product overview video.
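For readers who want to pull the same raw findings that feed these dashboards from the command line, the AWS CLI exposes GuardDuty directly. This is a minimal sketch; the detector and finding IDs are placeholders returned by the earlier calls.

# List GuardDuty detectors in the current region, then pull findings from one of them
aws guardduty list-detectors
aws guardduty list-findings --detector-id <detector-id>
aws guardduty get-findings --detector-id <detector-id> --finding-ids <finding-id>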

Blog

Monitoring k8s-powered Apps with Sumo Logic

Blog

The Countdown to AWS re:Invent 2017

I travel to a lot of conferences over the course of the year, but the grand poohbah of them all (and one of my personal favorites) is AWS re:Invent. This has quickly become the must-attend tech event, and with more than 40,000 attendees expected in Las Vegas, this year will no doubt be one for the books. The Sumo Logic team will be there in full force, showcasing our real-time machine data analytics platform and how we help businesses get the continuous intelligence needed to build, run and secure modern applications to accelerate digital business transformation. Here's a rundown of some of our key activities:

Sumo Logic Breakout Presentation: Making the Shift to Practical DevSecOps

Agility is the cornerstone of the DevOps movement, and security best practices and compliance are now the responsibility of everyone in the development lifecycle. Our VP of Security and Compliance George Gerchow will be presenting on Tuesday, Nov. 28 at 2:30 pm at the Aria Hotel. Swing by to learn best practices for making the shift to DevSecOps leveraging the CIS AWS Foundations Benchmark.

Visit us at Booth #1804

Stop by our booth to learn more about the power of real-time machine data analytics and how to centralize your data and turn analytics into business, operational, and security insights for full-stack visibility of your AWS workloads. See live demos, talk to our technical experts and pick up limited edition swag!

Join us at the Modern App Ecosystem Jam

On Wednesday, Nov. 29 we will be hosting a party with our awesome partner ecosystem celebrating today's new world of building, running and securing modern applications. No presentations, no pitches, just an evening networking with peers. Take a break from AWS re:Invent and join us for yummy Cuban appetizers, specialty mojitos and drinks, cigar rollers, entertainment, swag and more! Space is limited – secure your spot today!

Closed-Loop Security Analytics with CloudPassage

Throughout the week Sumo Logic will be co-presenting with CloudPassage to highlight our joint integration, which gives users a comprehensive, real-time view of security and compliance postures while rapidly detecting and containing attacks. Stop by the CloudPassage booth #913 to learn more.

Follow the Conversations on Social

We will be live tweeting and posting Facebook Live videos and photos throughout the week. Twitter: @SumoLogic LinkedIn: www.linkedin.com/SumoLogic Facebook: https://www.facebook.com/Sumo.Logic/ For a full list of events and news, check out our re:Invent events page. We look forward to seeing you in Las Vegas next week!

Blog

Christian's Musings from Web Summit 2017

I was able to attend my third Web Summit last week. This is the second time for me in Lisbon, as I was lucky enough to be invited to talk on the Binate.io stage again after last year. If you are interested, check out my musings on instinct, intuition, experience and data analytics. Web Summit has grown tremendously since I first attended the Dublin incarnation in 2013. This year, the event was sold out at 60,000 attendees (!). The Portuguese came out in force, but it was very clear that this event, while of course drawing most attendees from across Europe, is ultimately an international affair as well. With so many people attending, Web Summit can be rather overwhelming. There is a bit of everything, and an incredible crowd of curious people. Lisbon is a fantastically beautiful city, mostly off the beaten path when it comes to tech conferences, so the local folks really came out to take in the spectacle. So, what is Web Summit? Originally started in Dublin in 2009, it has over the years become a massive endeavor highlighting every conceivable aspect of technology. There are four massive conference halls, each with multiple stages for speakers and podium discussions.

Christian Beedgen on the binate.io stage - Web Summit, Lisbon 2017

Then there is the main arena holding 20,000 people; this is where the most high-profile keynote speakers hit the stage. Web Summit has always brought government officials and politicians to the show as well, in an effort to promote technology. I was actually standing next to Nigel Farage at the speaker cloak room waiting for my coat. There was another guy there who was already berating this unfortunate character, so thankfully I didn't have to do it myself. I managed to catch a couple of the keynotes in the aforementioned large arena. Three of them left an impression. Firstly, it was great to see Max Tegmark speak. I am reading his current book, Life 3.0, right now, and it is always a bit of a trip when the author suddenly appears on a stage and you realize you have to throw away your mental image of that voice in your head that has been speaking to you from the pages of the book and adopt reality. In this case, however, this was not a negative, as Max came across as both deeply knowledgeable and quite relaxed. He looked a bit like he plays in the Ramones with his black leather jacket and black jeans; this I didn't see coming. In any case, I highly recommend checking out what he has to say. In light of the current, almost bombastically overblown hype around AI, he takes a very pragmatic view, based on many years of his own research. If you can imagine a future of "beneficial AI," check out his book, Life 3.0, for why and how we have a chance to get there. I was also impressed by Margrethe Vestager. She is a Danish politician and currently the European Commissioner for Competition. She captured the audience by simply speaking off a couple of cue cards, no PowerPoint slides at all. Being a politician, she cast a very official appearance, of course - but she wore some sick sneakers with a conservative dress, which I thought was just awesome. Gotta love the Danish! Her talk centered around the reasoning behind the anti-trust investigation she brought against Google (which eventually led to a $2.7 billion fine!).
The details are too complicated to be reasonably summarized here, but the case essentially centered around the fact that while nobody in the EU has issues with Google's near-monopoly on search, in the eyes of the competition watchdogs, Google using this position to favor its own products in search results creates intolerable fairness issues for other companies. It is very interesting to see how these views are developing outside of the US. The third and last memorable session had animated AI robots dialoguing with their inventor about Einstein, Artificial General Intelligence, distributed AI models, and the blockchain. It was by and large only missing Taylor Swift. SingularityNET is a new effort to create an open, free and decentralized marketplace for AI technology, enabled by smart contracts. I frankly don't have the slightest clue how that would work, but presenter Ben Goertzel was animatedly excited about the project. The case for needing an AI marketplace for narrow AIs to compose more general intelligences was laid out in a strenuous "discussion" with "lifelike" robots from Hanson Robotics. It is lost on me why everybody thinks they need to co-opt Einstein; first Salesforce calls their machine learning features Einstein, and now these robotics guys have an Einstein robot on stage. I guess the path to the future requires still more detours to the past. I guess Einstein can't fight back on this anymore, and at least they are picking an exceptional individual... Now that I am back in the US for only a day, the techno-optimism that's pervasive at Web Summit already feels like a distant memory.

November 14, 2017

Blog

Monitor DynamoDB with Sumo Logic

AWS

November 9, 2017

Blog

The Path to DevSecOps in 6 Steps

Blog

AWS Security Best Practices: Log Management

Blog

Apache Log Analysis with Sumo Logic

Blog

AWS ELB vs. NGINX Load Balancer

Blog

Apache Logs vs. NGINX Logs

Blog

Introducing the State of Modern Applications in the Cloud Report 2017

Blog

Packer and Sumo Logic - Build Monitoring Into Your Images

Whether you're new to automating your image builds with Packer, new to Sumo Logic, or just new to integrating Packer and Sumo Logic, this post guides you through creating an image with Sumo Logic baked in. We'll use AWS as our cloud provider, and show how to create custom machine images in one command that allow you to centralize metrics and logs from applications, OSs, and other workloads on your machines. Overview When baking a Sumo Logic collector into any machine image, you'll need to follow three main steps. First, create your sources.json file, and add it to the machine. This file specifies what logs and metrics you'd like to collect. It's usually stored at /etc/sources.json, although you can store it anywhere and point to it. Next, download, rename, and make the collector file executable. Collector downloads for various operating systems and Sumo Logic deployments can be found here. An example command might look like: sudo wget 'https://collectors.us2.sumologic.com/rest/download/linux/64' -O SumoCollector.sh && sudo chmod +x SumoCollector.sh Finally, run the install script and skip registration. The most important part here is to use the -VskipRegistration=true flag so that the collector doesn't register to the temporary machine you are using to build the image. Other important flags include: -q > Run the script in quiet mode; -Vephemeral=true > This tells Sumo Logic to auto-remove old collectors that are no longer alive, usually applicable for autoscaling use cases where VMs are ephemeral; -Vsources=/etc/sources.json > Point to the local path of your sources.json file; -Vsumo.accessid=<id> -Vsumo.accesskey=<key> > This is your Sumo Logic access key pair. See all installation options here. An example command might look like: sudo ./SumoCollector.sh -q -VskipRegistration=true -Vephemeral=true -Vsources=/etc/sources.json -Vsumo.accessid=<id> -Vsumo.accesskey=<key> Packer and Sumo Logic - Provisioners Packer Provisioners allow you to communicate with third-party software to automate whatever tasks you need to build your image. Some examples of what you'd use provisioners for are installing packages, patching the kernel, creating users, and downloading application code. In this example, we'll use the Packer Shell Provisioner, which provisions your machine image via shell scripts. The basic steps that Packer will execute are: start up an EC2 instance in your AWS account; download your sources.json file locally, which describes the logs and metrics you'd like to collect; download the Sumo Logic collector agent; run the collector setup script to configure the collector, while skipping registration (this creates a user.properties config file locally); create the AMI and shut down the EC2 instance; and print out the Amazon Machine Image ID (AMI ID) for your image with Sumo baked in. Instructions: Packer and Sumo Logic Build Before You Begin To ensure Packer can access your AWS account resources, make sure you have an AWS authentication method that allows Packer to control AWS resources: Option 1, a user key pair, or Option 2, setting up the AWS CLI or SDKs in your local environment. I have chosen option 2 here, so my Packer build command will not need AWS access key pair information. After setting up your local AWS authentication method, create a Sumo Logic free trial here if you don't already have an account. Then, generate a Sumo Logic key pair inside your Sumo Logic account. Copy this key down, as the secret key will only be shown once.
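Since the post never shows the contents of sources.json itself, here is a minimal sketch of what such a file could look like, written out via a shell heredoc as you might do in a provisioning script. The two LocalFile sources, their names, paths and categories are placeholders, and the field set is a trimmed-down assumption based on the Sumo Logic JSON source configuration format; check the Sumo Logic documentation for the full schema.

# Sketch: write a minimal /etc/sources.json for the collector to pick up.
# Paths, names and categories below are placeholders; adapt them to your hosts.
sudo tee /etc/sources.json > /dev/null <<'EOF'
{
  "api.version": "v1",
  "sources": [
    { "sourceType": "LocalFile", "name": "linux_messages", "pathExpression": "/var/log/messages", "category": "prod/linux/messages" },
    { "sourceType": "LocalFile", "name": "linux_secure", "pathExpression": "/var/log/secure", "category": "prod/linux/secure" }
  ]
}
EOF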
Step 1 - Get Your Files After downloading Packer, download the packer_sumo_template.json and packer_variables.json files, and place all three in the same directory. Step 2 - Customize Variables and Test Your Template Use the command ./packer validate packer_sumo_template.json to validate your Packer template. This template automatically finds the latest Amazon Linux image in whatever region you use, based on the source_ami_filter in the builders object: "source_ami_filter": { "filters": { "virtualization-type": "hvm", "name": "amzn-ami-hvm-????.??.?.x86_64-gp2", "root-device-type": "ebs" }, "owners": ["amazon"], "most_recent": true } Customize the Region in the packer_variables.json file to the AWS Region you want to build your image in. You can also change the Sumo collector download URL if you are in a different deployment. The sources.json file URL can be updated to point to your own sources.json file, or you can update the template to use the Packer File Provisioner to upload your sources.json file and any other files. Step 3 - Build Your Image Use the command ./packer build -var-file=packer_variables.json -var 'sumo_access_id=<sumo_id>' -var 'sumo_access_key=<sumo_key>' packer_sumo_template.json to build your image. You should see the build start and finish like this: Image Build Start, Image Build Finish. Done! Now that you've integrated Packer and Sumo Logic, you can navigate to the AMI section of the EC2 AWS console and find the image for use in Autoscaling Launch Configurations, or just launch the image manually. Now What? View Streaming Logs and Metrics! Install the Sumo Logic applications for Linux and Host Metrics to get pre-built monitoring for your EC2 instance. What Else Can Sumo Logic Do? Sumo Logic collects AWS CloudWatch metrics, CloudTrail audit data, and much more. Sumo Logic also offers integrated Threat Intelligence powered by CrowdStrike, so that you can identify threats in your cloud infrastructure in real time. See below for more documentation: AWS CloudTrail, AWS CloudWatch Metrics, Integrated Threat Intelligence. What's Next? In part 3 of this series (it will be linked here when published), I'll cover how to deploy an Autoscaling Group behind a load balancer in AWS. We will integrate the Sumo Logic collector into each EC2 instance in the fleet, and also log the load balancer access logs to an S3 bucket, then scan that bucket with a Sumo Logic S3 source. If you have any questions or comments, please reach out via my LinkedIn profile, or via our Sumo Logic public Slack channel: slack.sumologic.com (@grahamwatts-sumologic). Thanks for reading!

AWS

September 29, 2017

Blog

Transitioning to Cloud: The Three Biggest Decision Hurdles

Written by Rod Trent Just a couple years ago, the cloud seemed far enough away that most organizations believed they could take their time migrating from on-premises systems to remote services and infrastructure. But as the cloud wars heated up, the capabilities of the cloud advanced so quickly that cloud adoption now seems like a foregone conclusion. While Amazon continues to be a leader on paper, Microsoft is making some serious inroads with its “intelligent cloud,” Azure. And, in just a short period of time, we’ve gotten to the point where there’s almost nothing keeping organizations from initiating full migrations to the cloud. Amazon, Microsoft, Google and others will continue to clash in the clouds, leapfrogging each other in function, feature, and cost savings. And, that’s good for everyone. Companies can eliminate much of the cost of the hardware and network infrastructure that has underpinned internal technology services for the last 20 years. A move to the cloud means delivering a freedom of work, allowing the business to expand without those historical, significant investments that dried up the annual budget just a few months into the fiscal year. It’s also a freedom of work for employees, making them happier, more confident, and more productive. For those that have now concluded that it’s time to migrate, and for those that still want to stick a toe in the water to test, taking that first step isn’t as tough as it once was, and there’s a “cool factor” about doing so – if for nothing but the cost savings alone. To get started, here are the three biggest hurdles to overcome. Determining How Much or How Little For most organizations shifting from on-premises to the cloud, it’s not an all-or-nothing scenario. Not everything can or should be run in the cloud – at least not yet. It takes a serious effort and proper due diligence to determine how much can be migrated to operate in the cloud. In a lot of cases, companies need to take an approach where they wean themselves slowly off reliance on on-premises operations and upgrade old systems before moving them. Many companies that go “all in” realize quickly that a slow and steady pace is more acceptable and that a hybrid cloud environment produces the most gains and positions the company for future success. Which leads to the next point… Locating an Experienced Partner Companies should not approach a migration to the cloud as something that is their sole responsibility. The organizations that have the most success are the ones that invest in partnerships. Experienced partners can help minimize headache and cost by identifying which of the company’s software and solutions are qualified for migration and by leading the organization’s applications and processes into the cloud. A partner like Sumo Logic can help with application and service migrations. Their experience and solutions are designed to eliminate hassle and ensure success. We are happy to have Sumo Logic as a sponsor of IT/Dev Connections this year. Which leads to the next point… Hire or Educate? There are new skills required for operating in the cloud, and not every organization has IT staff or developers that are well-versed in this modern environment. Many companies, through the process of determining the level of cloud adoption and identifying the right partnerships, will be able to determine the skills required both for the migration and for the continuing management and support. Once that has been identified, companies can take an inventory of the skills of current IT and developer staff. In some cases, hiring may be an option.
However, in most cases, and because the current staff is already acclimated to the business, it makes the most sense to ensure that the current staff is educated for the new cloud economy. There are several resources available, but one in particular, IT/Dev Connections 2017, happens in just a few short weeks. IT/Dev Connections 2017 is a uniquely intimate conference with a heavy focus on the cloud. With many of today’s top instructors on board delivering over 200 sessions, one week of IT/Dev Connections 2017 provides enough deep-dive instruction to train the entire staff on the cloud and ensure the company’s migration is successful and enduring. IT/Dev Connections 2017 runs October 23-26, 2017 in San Francisco. Visit the site to register, to identify speakers you know, and to view the session catalog. Rod Trent is the Engagement, Education, and Conference Director for Penton. He has more than 25 years of IT experience, has written many books and thousands of articles, has owned and sold businesses, has run and managed editorial teams, and speaks at various conferences and user groups. He’s an avid gadget fan, a die-hard old television show and cartoon buff, a health and exercise freak, and a comic book aficionado. @rodtrent

Azure

September 28, 2017

Blog

Docker Logs vs. VM Logs: The Essential Differences

Blog

AWS Security vs. Azure Security

Blog

Docker Monitoring: A Complete Guide

Docker Monitoring: How It Works When it comes to monitoring and logging in Docker, the recommended pathway for developers has been for the container to write to its standard output, and let Docker collect the output. Then you configure Docker to either store it in files, or send it to syslog. Another option is to write to a directory, so the plain log file is the typical /var/log thing, and then you share that directory with another container. In practice, when you start the first container, you declare that /var/log will be a “volume,” essentially a special directory that can then be shared with another container. Then you can run tail -f in a separate container to inspect those logs (a minimal sketch of this pattern appears at the end of this section). Running tail by itself isn’t extremely exciting, but it becomes much more meaningful if you want to run a log collector that takes those logs and ships them somewhere. The reason is that you shouldn’t have to synchronize between application and logging containers (for example, where the logging system needs Java or Node.js because it ships logs that way). The application and logging containers should not have to agree on specific dependencies, and risk breaking each other’s code. Docker Logging: The 12-Factor App However, this isn’t the only way to log in Docker. Remember the 12-Factor app, a methodology for building SaaS applications, which recommends limiting yourself to one process per container as a best practice, with each process running unbuffered and sending data to stdout. There have been numerous options for container logging from the pre-Docker 1.6 days forward, and some are better than others. You could: log directly from an application; install a file collector in the container; install a file collector as a container; install a syslog collector as a container; use host syslog for local syslog; use a syslog container for local syslog; log to stdout and use a file collector; log to stdout and use Logspout; collect from the Docker file system (not recommended); or inject a collector via docker exec. Docker Logging Drivers in Docker Engine Docker 1.6 added 3 new log drivers: docker logs, syslog, and log-driver null. The driver interface was meant to support the smallest subset available for logging drivers to implement their functionality. Stdout and stderr would still be the source of logging for containers, but Docker takes the raw streams from the containers to create discrete messages delimited by writes that are then sent to the logging drivers. Version 1.7 added the ability to pass in parameters to drivers, and in Docker 1.9 tags were made available to other drivers. Importantly, Docker 1.10 allows syslog to run encrypted, thus allowing companies like Sumo Logic to send data securely to the cloud. There have also been recent proposals for a Google Cloud Logging driver, and for a TCP, UDP, and Unix domain socket driver. “As part of the Docker engine, you need to go through the engine commit protocol. This is good, because there’s a lot of review stability. But it is also suboptimal because it is not really modular, and it adds more and more dependencies on third party libraries.” In fact, others have suggested the drivers be external plugins, similar to how volumes and networks work. Plugins would allow developers to write custom drivers for their specific infrastructure, and it would enable third-party developers to build drivers without having to get them merged upstream and wait for the next Docker release.
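To make the volume-sharing pattern described above concrete, here is a minimal sketch using only standard Docker commands; the image name, container name and log file path are placeholders rather than anything prescribed by Docker or Sumo Logic.

# Sketch: share an application container's /var/log with a separate log-reader
# container via --volumes-from (image and file names are placeholders).
docker run -d --name app -v /var/log my-app-image
# A second container can now follow the same log files without sharing any
# other dependencies with the application container:
docker run --rm --volumes-from app alpine tail -f /var/log/app.log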
A Comprehensive Approach for Docker Monitoring and Logging To get real value from machine-generated data, you need to look at “comprehensive monitoring.” There are five requirements to enable comprehensive monitoring. 5 Requirements of Comprehensive Monitoring Events Let's start with events. The Docker API makes it trivial to subscribe to the event stream. Events contain lots of interesting information. The full list is well described in the Docker API doc, but let’s just say you can track containers coming and going, as well as observe containers getting killed, and other interesting stuff, such as out-of-memory situations. Docker has consistently added new events with every version, so this is a gift that will keep on giving in the future. Think of Docker events as nothing but logs. And they are very nicely structured—it's all just JSON. If, for example, you load this into my log aggregation solution, you can now track which container is running where. I can also track trends - for example, which images are run in the first place, and how often they are being run. Or why suddenly 10x more containers are being started in this period vs. before, and so on. This probably doesn't matter much for personal development, but once you have fleets, this is a super juicy source of insight. Lifecycle tracking for all your containers will matter a lot. Configurations Docker events, among other things, allow us to see containers come and go. What if we also wanted to track the configurations of those containers? Maybe we want to track drift of run parameters, such as volume settings, or capabilities and limits. The container image is immutable, but what about the invocation? Having detailed records of container starting configurations is, in my mind, another piece of the puzzle on the way to total visibility. Orchestration solutions will provide those settings, sure, but who is telling those solutions what to do? From experience, we know that deployment configurations are inevitably going to drift, and we have found the root cause of otherwise inscrutable problems there more than once. Docker allows us to use the inspect API to get the container configuration. Again, in my mental model, that's just a log. Send it to your aggregator. Alert on deviations, and use the data after the fact for troubleshooting. Docker provides this info in a clean and convenient format. Logs Well, obviously, it would be great to have logs, right? It turns out there are many different ways to deal with logs in Docker, and new options are being enabled by the new log driver API. Not everybody is quite there yet in 12-factor land, but then again there are workarounds for when you need fat containers and still need to collect logs from files inside of containers. More and more people are following the best practice of writing logs to standard out and standard error, and it is pretty straightforward to grab those logs from the logs API and forward them from there. The Logspout approach, for example, is really neat. It uses the event API to watch which containers get started, then turns around and attaches to the log endpoint, and then pumps the logs somewhere. Easy and complete, and you have all the logs in one place for troubleshooting, analytics, and alerting. Stats Since the release of Docker 1.5, container-level statistics are exposed via a new API. Now you can alert on the "throttled_data" information, for example - how about that?
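Pulling these raw feeds off a Docker host is straightforward with the stock CLI; the sketch below uses only standard docker subcommands, and the container name is a placeholder.

# Sketch: the three raw signal types discussed above, via the standard Docker CLI.
# Stream lifecycle events (start, die, oom, ...) as structured JSON:
docker events --format '{{json .}}'
# Dump the full configuration of a running container (placeholder name):
docker inspect my-container
# Take a one-shot snapshot of per-container resource statistics:
docker stats --no-stream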
Again (and at this point, this is getting repetitive, perhaps), this data should be sucked into a centralized system. Ideally, this is the same system that already has the events, the configurations, and the logs! Logs can be correlated with the metrics and events. There are many pieces to the puzzle, but all of this data can be extracted from Docker pretty easily today already. Docker Daemon Logs and Hosts In all the excitement around APIs for monitoring data, let's not forget that we also need to have host-level visibility. A comprehensive solution should therefore also work hard to get the Docker daemon logs, and provide a way to get any other system-level logs that factor into the way Docker is being put to use on the hosts of the fleet. Add host-level statistics to this, and now performance issues can be understood in a holistic fashion - on a container basis, but also related to how the host is doing. Maybe there's some intricate interplay between containers based on placement that pops up on one host but not the other? Without quick access to the actual data, you will scratch your head all day. User Experience What's the desirable user experience for a comprehensive monitoring solution for Docker? Thanks to the API-based approach that allows us to get to all the data either locally or remotely, it should be easy to encapsulate all the monitoring data acquisition and forwarding into a container that can either run remotely, if the Docker daemons support remote access, or as a system container on every host. Depending on how the emerging orchestration solutions approach this, it might not even be too crazy to assume that the collection container could simply attach to a master daemon. It seems Docker Swarm might make this possible. Super simple: just add the URL to the collector config and go. Sumo Logic API and Docker Logging In its default configuration, our containerized Collector agent will use the Docker API to collect the logs and statistics (metrics) from all containers, as well as the events that are emitted from the Docker Engine. Unless configured otherwise, the Collector will monitor all containers that are currently active, as well as any containers that are started and stopped subsequently. Within seconds, the latest version of the Collector container will be downloaded, and all of the signals coming from your Docker environment will be pumped up to Sumo Logic’s platform. Using the API has its advantages. It allows us to get all three telemetry types (logs, metrics, and events), we can query for additional metadata during container startup, we don’t have to account for different log file locations, and the integration is the same regardless of whether you log to files or to journald. The Benefits of Docker Agent-Based Collection The other advantage of this approach is the availability of a data collection agent that provides additional data processing capabilities and ensures reliable data delivery. Data processing capabilities include multiline processing, as well as data filtering and masking before data leaves the host. This last capability is important when considering compliance requirements such as PCI or HIPAA. Also important from a compliance standpoint is reliability. All distributed logging systems must be able to accommodate networking issues or impedance mismatches, such as latency or endpoint throttling. These are all well-covered issues when using the Sumo Logic Collector agent.
Docker Multiline Logging Lack of multiline logging support has always plagued Docker logging. The default Docker logging drivers, and the existing third-party logging drivers, have not supported multiline log messages, and for the most part, they still do not. One of Sumo Logic’s strengths has always been its ability to rejoin multiline log messages back into a single log message. This is an especially important issue to consider when monitoring JVM-based apps and working with stack traces. Sumo Logic automatically infers common boundary patterns, and supports custom message boundary expressions. We ensure that our Docker Log Source and our Docker Logging Plugin maintain these same multiline processing capabilities. The ability to maintain multiline support is one of the reasons why we recommend using our custom Docker API based integration over simply reading the log files from the host. Generally speaking, reading container logs from the file system is a fine approach. However, when the logs are wrapped in JSON and ornamented with additional metadata, it makes the multiline processing far more difficult. Other logging drivers are starting to consider this issue, no doubt based on market feedback. However, their capabilities are far less mature than Sumo’s. Instant Gratification of Docker Logging The installation of the containerized agent couldn’t be simpler. And with a simple query, you can see the data from all of the containers on your host, with all of the fields extracted and ready to explore. From there, it is easy to install our Docker App to monitor your complete Docker environment as you scale this out to all of your hosts. Going Beyond Docker Basics When you deploy the Sumo Logic Collector container across a fleet of hosts, monitoring hundreds or thousands of containers, you will want to be a bit more sophisticated than just running with the default container settings. However, that is beyond the scope of this discussion. When you deploy our Collector agent as a container, all of the Collector agent’s features are available, and all parameters can be configured. To read about how to dive into the advanced configuration options, check out the container’s readme on Docker Hub and read more details in our documentation. Sometimes You Gotta Go Agentless There are times when you require an agentless solution – or you may just prefer one. If you have another way to collect Docker container metrics, and you just need container logs, then a Docker Logging Plugin (earlier versions referred to these as Logging Drivers) may be the perfect solution. Note: the agentless approach is an ideal solution for AWS ECS users that rely on CloudWatch for their container metrics and events. How Sumo Logic's Docker Logging Plugin Works Our Docker Logging Plugin is written in Go, and runs within the Docker Engine. It is configured on a per-container basis, and sends data directly to Sumo Logic’s HTTP endpoint, using a pre-configured “HTTP Source.” You can access our plugin on the new Docker Store, but the best place to read about how to use it is on its GitHub repo. Following the theme set out earlier, it is very easy to use in its default configuration, with a host of advanced options available. Follow these simple steps: Register the plugin with the Docker Engine: $ docker plugin install --grant-all-permissions store/sumologic/docker-logging-driver:<ver> (make sure you go to the Docker Store and get the latest version number.
As of this publishing, the latest version is 1.0.1, and Docker Store does not support the ‘latest’ parameter. So, here is the corresponding command line for this version: $ docker plugin install --grant-all-permissions store/sumologic/docker-logging-driver:1.0.1). Specify the driver when you run a container: $ docker run --log-driver=sumologic --log-opt sumo-url=<sumo_HTTP_url> Docker Logging Plugin Capabilities This plugin provides some very important capabilities. Buffering and batching: you can configure the size of each HTTP POST. Compression: configurable gzip compression levels to minimize data transfer costs. Proxy support: critical for highly secure enterprise deployments. TLS required: this is a Sumo Logic requirement; all data transfer must meet PCI compliance requirements. Multiline support: multiline stitching is processed within the Sumo Logic cloud platform rather than in the logging plugin. This keeps the plugin fast and efficient. However, we made specific design considerations to ensure that we preserved multiline support while providing rich metadata support. Configurable metadata per container: the Docker Logging Plugin framework supports a flexible templating system that is used by our plugin to construct dynamic Source Category metadata that varies per container. The template syntax gives you access to environment vars, Docker labels, and the ability to pass in custom values when starting containers. Our Docker Logging Plugin is the first of our integrations to support this capability. A similar capability will be supported by our Docker Log and Stats Sources with our next Collector release. Integrating With Other Docker Source Agents If, for some reason, these two methods do not satisfy your needs, then one of our many other collection methods (aka “Sources”) will most likely do the trick. Sumo Logic also integrates with various other open source agents and cloud platform infrastructures, and relies on some of them for certain scenarios. Details on all of the above integrations are available in our docs. If you have been using Docker for a while, and have implemented a solution from the early days, such as syslog or Logspout, we encourage you to review the approaches defined here, and migrate your solution accordingly.

Blog

Does IoT Stand for ‘Internet of Threats’?

Blog

SIEM is Not the Same for Cloud Security

Blog

Cloud maturity, security, and the importance of constant learning - Podcast

Blog

Gratitude

Sumo Logic: what thoughts come to your mind when you hear these two words? Some images that cross your mind might be of a genius sumo wrestler solving math problems, or you might think it’s the title of a sumo wrestler’s autobiography, but really, other than being a cool name, it can be a mystery. Well, at least that’s what came to my mind when I was told that I would be interning in the marketing department of Sumo Logic. Hello to the beautiful people reading this, my name is Danny Nguyen, and when I first started interning at Sumo Logic, I was nothing more than your average 17-year-old high schooler walking into a company thinking that he was in way over his head. Coming into Sumo Logic, I had the exact same panic, nervousness, and fear that anyone would have at their very first job – worried about making a good impression, worried that the work would be too much, and worried that the people I would be working with wouldn’t laugh at my jokes. Before starting my internship, I had these preconceived thoughts of the stereotypical intern. I figured that I would just be someone to give busy work to and someone that would be bossed around. I thought that I wouldn’t be taken seriously and would be looked down upon because I was still so young. However, as soon as my very first day at Sumo Logic came to an end, I knew that those worries were nothing but a figment of my imagination, because being an intern at Sumo Logic became, and still is, one of the best experiences of my life. Sumo Logic completely redefined what I thought being an intern and working at a company meant. I was constantly learning something new and meeting new people every single week. I was treated with the same respect as if I were a full-time employee and coworker. I was able to gain real-life experience of working in the marketing department with hands-on learning. However, the greatest thing about Sumo Logic, the one I will always remember, is all the people who make up its foundation and give it the amazing personality that it has. They made sure that I was getting a great experience where I made as many connections and learned as many things as I could. I was encouraged and inspired every single day to learn and to keep becoming a better version of myself than I was the previous day. But most importantly, people genuinely wanted to get to know me as a person and to become my friends. So when you ask me what Sumo Logic means to me today, I could type up a two-page essay expressing all the different words and adjectives describing the gratitude, love, and appreciation that I have for Sumo Logic and the people there – and it still would not be enough. – Danny From Maurina, Danny’s Manager: High school intern – I’ll let that sit with you a moment. When I agreed to take in a high school intern (gulp) here at Sumo Logic, I was worried, to say the least, but our experience with Danny blew any expectations I had out of the water. A coworker here at Sumo introduced me to Genesys Works, whose mission is to transform the lives of disadvantaged high school students. I met with the Bay Area coordinator, and realized this was an amazing chance to make a difference in a young person’s life. I signed on…then I was terrified. Mentor and her pupil. When Danny’s first day rolled around, I was unsure what to expect. From the minute we started talking, however, all my fears were put to rest.
Our first conversation was a whirlwind of questions – “What’s your favorite type of food?” “What’s your degree in?” “Do you have any regrets in life?”…wait, what? From there, I knew that Danny wasn’t your average 17-year-old intern. For the nine months I managed Danny, I watched him grow from a quiet 17-year-old into a vibrant, confident young man and professional who could turn a scribble into a perfect Salesforce report (hard to do if you know anything about Salesforce reporting), a Post-it drawing into a display ad, and a skeptical manager into a true believer in the power of drive and passion, regardless of age. Thank you, Danny! <3

September 7, 2017

Blog

Detecting Insider Threats with Okta and Sumo Logic

Security intelligence for SaaS and AWS workloads is different than for your traditional on-prem environment. Based on Okta’s latest Business@Work report, organizations are using between 16 and 22 SaaS applications in their environment. In the report, Office 365 comes out as the top business application, followed by Box and G Suite. These business-critical SaaS applications hold sensitive and valuable company information such as financial data, employee records, and customer data. While everyone understands that SaaS applications provide immediate time-to-value and are being adopted faster than ever before, what many fail to consider is that these SaaS applications also create a new attack surface that represents substantial risk for the company, due to the lack of visibility that security operations teams would typically have with traditional, on-prem applications. If employee credentials are compromised, it creates huge exposure for the company, because the attacker is able to access all the applications just like an insider would. In this case, timely detection and containment of an insider threat become extremely important. Sumo Logic’s security threat intelligence allows security operations to address the many challenges related to SaaS and cloud workload security. There are many challenges for incident management and security operations teams when organizations are using SaaS applications. How do you make sure that users across SaaS applications can be uniquely identified? How can you track anomalies in user behavior? The first step for the attacker after exploiting a vulnerability is to steal an employee’s identity and move laterally in the organization. In that process, the attacker’s behavior will be considerably different than the normal user’s behavior. Second, it is critical that the entire incident response and management process is automated for detection and containment of such attacks, to minimize potential damage or data leakage. Most organizations moving to the cloud have legacy solutions such as Active Directory and on-prem SIEM solutions. While traditional SIEM products can integrate with Okta, they cannot integrate effectively with other SaaS applications to provide complete visibility into user activities. Considering there are no collectors to install to get logs from SaaS applications, traditional SIEM vendors will not be able to provide the required insight into modern SaaS applications and AWS workloads. In order to solve these specific problems, Okta and Sumo Logic have partnered to provide better visibility and faster detection of insider threats. Okta ensures that every user is uniquely identified across multiple SaaS applications. Sumo Logic can ingest those authentication logs from Okta and correlate them with user activities across multiple SaaS applications such as Salesforce, Box, and Office 365. Sumo Logic has machine learning operators such as the multi-dimensional Outlier, LogReduce, and LogCompare to quickly surface anomalies in user activities by correlating identity from Okta with the user activities in Salesforce and Office 365. Once abnormal activities have been identified, Sumo Logic can take multiple actions, such as sending a Slack message, creating ServiceNow tickets, disabling the user in Okta, or triggering actions within a customer’s automation platform.
The use case: Okta + Sumo Logic = accurate incident response for cloud workloads and SaaS applications. How many times have you fat-fingered your password and gotten an authentication failure? Don’t answer that. Authentication failure is a part of life. You cannot launch an investigation every time there is an authentication failure. That would result in too many false positives and an overload of wasted effort for your security operations team. Okta and Sumo Logic allow you to detect multiple authentication failures followed by a successful authentication. That is good enough to launch an investigation at this point, but we all know it could also be a user error. Caps Lock is on, the keyboard is misbehaving, or we might have just changed the password and forgotten! To ensure that security operations get more intelligent and actionable insights into such events, Sumo Logic can provide additional context by correlating such authentication failure logs from Okta with user activity across multiple SaaS applications. For example, say I changed my password and now I am getting authentication failures within Okta. Once I realize the mistake and correct it, I get a successful authentication. I log into the Box application to work on a few documents and sign off. Sumo Logic will take this Okta event and correlate it with the Box activities. If an attacker had logged in instead of me, there would be anomalies in behavior. An attacker might download all documents or make ownership changes to the documents. While this is happening, Sumo Logic will be able to spot these anomalies in near real time and take a variety of automated actions, from creating a ServiceNow ticket to disabling the user in Okta. You can start ingesting your Okta logs and correlating them with user activity logs across multiple SaaS applications now. Sign up for your free Sumo Logic trial that never expires! Co-author Matt Egan is a Partner Solutions Technical Architect in Business Development at Okta. In this role, he works closely with ISV partners, like Sumo Logic, to develop integrations and joint solutions that increase customer value. Prior to joining Okta, Matt held roles ranging from software development to information security over an 18-year career in technology.

Blog

The Top 5 Reasons to Attend Illuminate

Blog

GDPR Compliance: 3 Steps to Get Started

The General Data Protection Regulation (GDPR) is one of the hottest topics in IT security around the globe. The European Union (EU) regulation gives people more say over what companies can do with their data, while making data protection rules more or less identical throughout the EU. Although this regulation originated in the EU, its impact is global; any organization that does business using EU citizens’ data must be compliant. With the May 2018 deadline looming, IT security professionals worldwide are scrambling to ensure they’re ready (and to avoid the strict fines for non-compliance and security breaches). In the video below, Sumo Logic VP of Security and Compliance George Gerchow offers three ways to get you GDPR-ready in no time. 1. Establish a Privacy Program Establishing a privacy program allows you to set a baseline for privacy standards. Once you have a privacy program in place, when new regulations like GDPR are released, all you have to do is fill in the gaps between where you are and where you need to be. 2. Designate a Data Protection Officer This is a critical part of complying with GDPR—and a great way to build sound data security principles into your organization. Under the GDPR requirements, the Data Protection Officer: must report directly to the highest level of management; can be a staff member or an external service provider; must be appointed on the basis of professional qualities, particularly expert knowledge of data protection law and practices; must be provided with appropriate resources to carry out their tasks and maintain their expert knowledge; and must not carry out any other tasks that could result in a conflict of interest. 3. Take Inventory of Customer Data and Protections Before GDPR compliance becomes mandatory, take a thorough inventory of where your customer data is housed and how it is protected. Make sure you understand the journey of customer data from start to finish. Keep in mind that the data is only as secure as the systems you use to manage it. As you dissect the flow of data, take note of critical systems that the data depends upon. Make sure the data is secured at every step using proper methodologies like encryption. Bonus Tip: Arrange Third-Party GDPR Validation Between now and May 2018, you will start to see contracts coming through that ask if you are GDPR-compliant. When the deadline rolls around, there will be two groups of organizations out there: companies that have verification of GDPR compliance to share with prospective clients, and companies that say they are GDPR-compliant and want clients to take their word for it. Being in the first group gives your company a head start. Conduct a thorough self-assessment (and document the results) or use a third-party auditor to provide proof of your GDPR compliance. Learn More About GDPR Compliance Ready to get started with GDPR? George Gerchow, the Sumo Logic VP of Security and Compliance, shares more tips for cutting through the vendor FUD surrounding GDPR.

Blog

Understanding and Analyzing IIS Logs

Blog

Apache Error Log Files

Blog

Machine Learning and Log Analysis

Blog

Terraform and Sumo Logic - Build Monitoring into your Cloud Infrastructure

Are you using Terraform and looking for a way to easily monitor your cloud infrastructure? Whether you're new to Terraform, or you control all of your cloud infrastructure through Terraform, this post provides a few examples of how to integrate Sumo Logic's monitoring platform into Terraform-scripted cloud infrastructure. *This article discusses how to integrate the Sumo Logic collector agent with your EC2 resources. To manage a hosted Sumo Logic collection (S3 sources, HTTPS sources, etc.), check out the Sumo Logic Terraform Provider here or read the blog. Collect Logs and Metrics from your Terraform Infrastructure Sumo Logic's ability to unify your logs and metrics can be built into your Terraform code in a few different ways. This post will show how to use a simple user data file to bootstrap an EC2 instance with the Sumo Logic collector agent. After the instance starts up, monitor local log files and overlay these events with system metrics using Sumo Logic's Host Metrics functionality. AWS CloudWatch metrics and Graphite-formatted metrics can be collected and analyzed as well. Sumo Logic integrates with Terraform to enable version control of your cloud infrastructure and monitoring the same way you version and improve your software. AWS EC2 Instance with Sumo Logic Built-In Before we begin, if you are new to Terraform, I recommend Terraform: Up and Running. This guide originated as a blog, and was expanded into a helpful book by Yevgeniy Brikman. What We'll Make In this first example, we'll apply the Terraform code in my GitHub repo to launch a Linux AMI in a configurable AWS Region, with a configurable Sumo Logic deployment. The resources will be created in your default VPC and will include: one t2.micro EC2 instance, one AWS Security Group, and a Sumo Logic collector agent with a sources.json file. The Approach - User Data vs. Terraform Provisioner vs. Packer In this example, we'll be using a user data template file to bootstrap our EC2 instance. Terraform also offers Provisioners, which run scripts at the time of creation or destruction of an instance. HashiCorp offers Packer to build machine images, but I have selected to use user data in this example for a few reasons: user data is viewable in the AWS console, and for simplicity - my next post will cover an example that uses Packer rather than user data, although user data can be included in an autoscaling group's launch configuration. For more details, see the Stack Overflow discussion here. If you want to build Sumo Logic collectors into your images with Packer, see my blog with instructions here. The sources.json file will be copied to the instance upon startup, along with the Sumo Logic collector. The sources.json file instructs Sumo Logic to collect various types of logs and metrics from the EC2 instance: Linux OS logs (audit logs, messages logs, secure logs), Host Metrics (CPU, memory, TCP, network, disk), cron logs, and any application log you need. A Note on Security This example relies on wget to bootstrap the instance with the Sumo Logic collector and sources.json file, so ports 80 and 443 are open to the world. In my next post, we'll use Packer to build the image, so these ports can be closed. We'll do this by deleting them in the Security Group resource of our main.tf file. A sketch of the kind of user data bootstrap script this example relies on follows below.
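This is a minimal sketch of what such a user_data bootstrap script might contain. It mirrors the collector install flags discussed in the earlier Packer post; the sources.json URL is a placeholder, and ${sumo_access_id}/${sumo_access_key} stand in for values interpolated by Terraform's template mechanism rather than anything defined in the actual repo.

#!/bin/bash
# Sketch of a user_data bootstrap script; URLs and variables are placeholders.
# Fetch the sources.json that tells the collector what to gather:
wget -O /etc/sources.json https://example.com/path/to/sources.json
# Download the collector installer for a us2 deployment and make it executable:
wget -O /tmp/SumoCollector.sh https://collectors.us2.sumologic.com/rest/download/linux/64
chmod +x /tmp/SumoCollector.sh
# Install quietly, mark the collector ephemeral so dead collectors get cleaned
# up automatically, and point it at the sources.json written above:
/tmp/SumoCollector.sh -q -Vephemeral=true -Vsources=/etc/sources.json \
  -Vsumo.accessid=${sumo_access_id} -Vsumo.accesskey=${sumo_access_key}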
Tutorial - Apply Terraform and Monitor Logs and Metrics Instantly Prerequisites First, you'll need a few things: Terraform - see the Terraform docs here for setup instructions; a Sumo Logic account - get a free one here; access to an AWS account with AmazonEC2FullAccess permissions - if you don't have access, you can sign up for the free tier here; and an AWS authentication method to allow Terraform to control AWS resources (Option 1: a user key pair, or Option 2: set up the AWS CLI or SDKs in your local environment). Instructions 1. First, copy this repo (Example 1. Collector on Linux EC2) somewhere locally. You'll need all three files: main.tf, vars.tf, and user_data.sh. main.tf will use user_data.sh to bootstrap your EC2 instance, and it will also use vars.tf to perform lookups based on a Linux AMI map, a Sumo Logic collector endpoint map, and some other variables. 2. Then, test out Terraform by opening your shell and running: /path/to/terraform plan. You can safely enter any string, like 'test', for the var.Sumo_Logic_Access_ID and var.Sumo_Logic_Access_Key inputs while you are testing with the plan command. After Terraform runs the plan command, you should see "Plan: 2 to add, 0 to change, 0 to destroy." if your environment is configured correctly. 3. Next, run Terraform and create your EC2 instance, using the terraform apply command. There are some configurable variables built in. For example, the default AWS Region that this EC2 instance will be launched into is us-east-1, but you can pass in another region like this: path/to/terraform/terraform apply -var region=us-west-2. If your Sumo Logic deployment is in another region, like DUB or SYD, you can run the command like this: path/to/terraform/terraform apply -var Sumo_Logic_Region=SYD. 4. Then, Terraform will interactively ask you for your Sumo Logic Access Key pair, because there is no default value specified in the vars.tf file. Get your Sumo Logic Access Keys from your Sumo Logic account and enter them when Terraform prompts you. First, navigate to the Sumo Logic web application, click your name in the left nav and open the Preferences page. Next, click the blue + icon near My Access Keys to create a key pair. See the official Sumo Logic documentation here for more info. You will see this success message after Terraform creates your EC2 instance and Security Group: "Apply complete! Resources: 2 added, 0 changed, 0 destroyed." 5. Now you're done! After about 3-4 minutes, check under Manage Data > Collection in the Sumo Logic UI. You should see your new collector running and scanning the sources we specified in the sources.json (Linux OS logs, cron log, and Host Metrics). Cleanup Make sure to delete your resources using the terraform destroy command. You can enter any string when you are prompted for the Sumo Logic key pair information. The -Vephemeral=true flag in our Sumo Logic user data configuration command instructs Sumo Logic to automatically clean out old collectors that are no longer alive. /path/to/terraform destroy Now What? View Streaming Logs and Metrics! Install the Sumo Logic applications for Linux and Host Metrics to get pre-built monitoring for your EC2 instance. What Else Can Sumo Logic Do? Sumo Logic collects AWS CloudWatch metrics, CloudTrail audit data, and much more. Sumo Logic also offers integrated Threat Intelligence powered by CrowdStrike, so that you can identify threats in your cloud infrastructure in real time.
See below for more documentation: AWS CloudTrail, AWS CloudWatch Metrics, Integrated Threat Intelligence. What's Next? In part 2 of this post, I'll cover how to deploy an Autoscaling Group behind a load balancer in AWS. We will integrate the Sumo Logic collector into each EC2 instance in the fleet, and also log the load balancer access logs to an S3 bucket, then scan that bucket with a Sumo Logic S3 source. Thanks for reading! Graham Watts is an AWS Certified Solutions Architect and Sales Engineer at Sumo Logic

Blog

Monitoring and Troubleshooting Using AWS CloudWatch Logs

AWS

July 27, 2017

Blog

Log Aggregation vs. APM: No, They’re Not the Same Thing

Are you a bit unsure about the difference between log aggregation and Application Performance Monitoring (APM)? If so, you’re hardly alone. These are closely related types of operations, and it can be easy to conflate them—or assume that if you are doing one of them, there’s no reason to do the other. In this post, we’ll take a look at log aggregation vs. APM, the relationship between these two data accumulation/analysis domains, and why it is important to address both of them with a suite of domain-appropriate tools, rather than a single tool. Defining APM First, let’s look at Application Performance Monitoring, or APM. Note that APM can stand for both Application Performance Monitoring and Application Performance Management, and in most of the important ways, these terms really refer to the same thing—monitoring and managing the performance of software under real-world conditions, with emphasis on the user experience and the functional purpose of the software. Since we’ll be talking mostly about the monitoring side of cloud APM, we’ll treat the acronym as interchangeable with Application Performance Monitoring, with the implicit understanding that it includes the performance management functions associated with APM. What does APM monitor, and what does it manage? Most of the elements of APM fall into two key areas: user experience and resource-related performance. While these two areas interact (resource use, for example, can have a strong effect on user experience), there are significant differences in the ways in which they are monitored (and to a lesser degree, managed). APM: User Experience The most basic way to monitor application performance in terms of user experience is to monitor response time. How long does it take after a user clicks on an application input element for the program to display a response? And more to the point, how long does it take before the program produces a complete response (i.e., a full database record displayed in the correct format, rather than a partial record or a spinning cursor)? Load is Important Response time, however, is highly dependent on load—the conditions under which the application operates, and in particular, the volume of user requests and other transactions, as well as the demand placed on resources used by the application. To be accurate and complete, user experience APM should include in-depth monitoring and reporting of response time and related metrics under expected load, under peak load (including unreasonably high peaks, since unreasonable conditions and events are rather alarmingly common on the Internet), and under continuous high load (an important but all too often neglected element of performance monitoring and stress testing). Much of the peak-level and continuous high-level load monitoring, of course, will need to be done under test conditions, since it requires application of the appropriate load, but it can also be incorporated into real-time monitoring by means of reasonably sophisticated analytics: report performance (and load) when load peaks above a specified level, or when it remains above a specified level for a given minimum period of time. APM: Resource Use Resource-based performance monitoring is the other key element of APM. How is the application using resources such as CPU, memory, storage, and I/O? When analyzing these metrics, the important numbers to look at are generally the percentage of the resource used and the percentage still available.
This actually falls within the realm of metrics monitoring more than APM, and requires tools dedicated to metrics monitoring. If the percentage used for any resource (such as compute, storage or memory) approaches the total available, that can (and generally should) be taken as an indication of a potential performance bottleneck. It may then become necessary to allocate a greater share of the resource in question (either on an ongoing basis, or under specified conditions) in order to avoid such bottlenecks. Remember: bottlenecks don’t just slow down the affected processes. They may also bring all actions dependent on those processes to a halt. Once Again, Load Resource use, like response time, should be monitored and analyzed not only under normal expected load, but also under peak and continuous high loads. Continuous high loads in particular are useful for identifying potential bottlenecks which might not otherwise be detected. Log Aggregation It should be obvious from the description of APM that it can make good use of logs, since the various logs associated with the deployment of a typical Internet-based application provide a considerable amount of performance-related data. Much of the monitoring that goes into APM, however, is not necessarily log-based, and many of the key functions which logs perform are distinct from those required by APM. Logs as Historical Records Logs form an ongoing record of the actions and state of the application, its components, and its environment; in many ways, they serve as a historical record for an application. As we indicated, much of this data is at least somewhat relevant to performance (load level records, for example), but much of it is focused on areas not closely connected with performance. Logs, for example, are indispensable when it comes to analyzing and tracing many security problems, including attempted break-ins. Log analysis can detect suspicious patterns of user activity, as well as unusual actions on the part of system or application resources. Logs are a key element in maintaining compliance records for applications operating in a regulated environment. They can also be important in identifying details of specific transactions and other events when they require verification, or are in dispute. Logs can be very important in tracing the history and development of functional problems, both at the application and infrastructure level—as well as in analyzing changes in the volume or nature of user activity over time. APM tools can also provide historical visibility into your environment, but they do it in a different way and at a different level. They trace performance issues to specific lines of code. This is a different kind of visibility, and it is not a substitute for the insight you gain from using log aggregation with historical data in order to research or analyze issues after they have occurred. The Need for Log Aggregation The two greatest problems associated with logs are the volume of data generated by logging, and the often very large number of different logs generated by the application and its associated resources and infrastructure components. Log aggregation is the process of automatically gathering logs from disparate sources and storing them in a central location. It is generally used in combination with other log management tools, as well as log-based analytics. It should be clear at this point that APM and log aggregation are not only different—it also does not make sense for a single tool to handle both tasks.
It is, in fact, asking far too much of any one tool to take care of all of the key tasks required by either domain. Each of them requires a full suite of tools, including monitoring, analytics, a flexible dashboard system, and a full-featured API. A suite of tools that can fully serve both domains, such as that offered by Sumo Logic, can, on the other hand, provide you with full-stack visibility and search capability across your network, infrastructure and application logs.
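As a trivial illustration of the response-time side of APM described above, a single endpoint can be sampled with nothing more than curl; the URL is a placeholder, and this is obviously no substitute for a real APM or metrics tool, but it shows the kind of number being tracked.

# Sketch: sample end-to-end response time for one request (placeholder URL).
# time_total covers everything from DNS lookup to the last byte of the body.
curl -s -o /dev/null -w 'HTTP %{http_code} in %{time_total}s\n' https://example.com/api/health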

July 27, 2017

Blog

How to prevent Cloud Storage Data Leakage

Blog

Jenkins, Continuous Integration and You: How to Develop a CI Pipeline with Jenkins

Continuous Integration, or CI for short, is a development practice wherein developers can make changes to project code and have those changes automatically trigger a process which builds the project, runs any test suites and deploys the project into an environment. The process enables teams to rapidly test and develop ideas and bring innovation to market faster. This approach allows teams to detect issues much earlier in the process than with traditional software development approaches. With its roots in Oracle’s Hudson server, Jenkins is an open source integration server written in Java. The server can be extended through the use of plugins and is highly configurable. Automated tasks are defined on the server as jobs, and can be executed manually on the server itself, or triggered by external events, such as merging a new branch of code into a repository. Jobs can also be chained together to form a pipeline, taking a project all the way from code to deployment, and even monitoring of the deployed solution in some cases. In this article, we’re going to look at how to set up a simple build job on a Jenkins server and look at some of the features available natively on the server to monitor and troubleshoot the build process. This article is intended as a primer on Jenkins for those who have not used it before, or have never leveraged it to build a complete CI pipeline. Before We Get Started This article assumes that you already have a Jenkins server installed on your local machine or on a server to which you have access. If you have not yet accomplished this, the Jenkins community and documentation can be an excellent source of information and resources to assist you. Jenkins is published under the MIT License and is available for download from their GitHub repository, or from the Jenkins website. Within the Jenkins documentation, you’ll find a Guided Tour, which will walk you through setting up a pipeline on your Jenkins box. One of the advantages of taking this tour is that it will show you how to create a configuration file for your pipeline, which you can store in your code repository, side-by-side with your project. The one downside of the examples presented is that they are very generic. For a different perspective on Jenkins jobs, let’s look at creating a build pipeline manually through the Jenkins console. Creating A Build Job For this example, we’ll be using a project on GitHub that creates a Lambda function to be deployed on AWS. The project is Gradle-based and will be built with Java 8. The principles we’re using could be applied to other code repositories and build and deployment situations. Log in to your Jenkins server, and select New Item from the navigation menu. Jenkins New Item Workflow. Choose a name for your project, select Freestyle project and then scroll down and click OK. I’ll be naming mine Build Example Lambda. When the new project screen appears, complete the following steps. Not all of these steps are necessary, but they’ll make maintaining your project easier. Enter a Description for your project, describing what this pipeline will be doing. Check Discard old builds, and select the Log Rotation strategy with the Max # of builds to keep set to 10. These are the settings I use, but you may select different numbers. Having this option in place prevents old builds from taking too much space on your server. We’ll add a parameter for the branch to build, and default it to master.
This will allow you to build and deploy from a different branch if the need arises.Select This project is parameterized.Click on Add Parameter and select String Parameter.Name: BRANCHDefault Value: masterDescription: The branch from which to pull. Defaults to master.Scroll down to Source Code Management. Select Git.Enter the Repository URL. In my case, I entered https://github.com/echovue/Lambda_SQSMessageCreator.gitYou may also add credentials if your Git repository is secure, but setting that up is beyond the scope of this article.For the Branch Specifier, we’ll use the parameter we set up previously. Parameters are added by enclosing the parameter name in curly braces and prefixing it with a dollar sign. Update this field to read */${BRANCH}Git Configuration Using Parameterized BranchFor now, we’ll leave Build Triggers alone.Under Build Environment, select Delete workspace before build starts, to ensure that we are starting each build with a clean environment.Under Build, select Add build step, and select Invoke Gradle script.When I want to build and deploy my project locally, I’ll enter ./gradlew build fatJar on the command line. To accomplish this as part of the Jenkins job, I’ll complete the following steps.Select Use Gradle WrapperCheck From Root Build Script DirFor Tasks, enter build fatJarFinally, I want to save the Fat Jar which is created in the /build/libs folder of my project, as this is what I’ll be uploading to AWS in the next step.Under Post-build Actions, Select Add post-build action and choose Archive the artifacts.In files to archive, enter build/libs/AWSMessageCreator-all-*Finally, click on Save.Your job will now have been created. To run your job, simply click on the link to Build with Parameters. If the job completes successfully, you’ll have a jar file which can then be deployed to AWS Lambda. If the job fails, you can click on the job number, and then click on Console Output to troubleshoot your job.Next StepsIf your Jenkins server is hosting on a network that is accessible from the network which hosts the code repository you’re using, you may be able to set up a webhook to trigger the build job when changes are merged into the master branch.The next logical step is to automate the deployment of the new build to your environment if it builds successfully. Install the AWS Lambda Plugin and the Copy Artifact Plugin on your Jenkins server, and use it to create a job to deploy your Lambda to AWS, which copies the jar file we archived as part of the job we built above.When the deployment job has been successfully created, open the build job, and click on the Configure option. Add a second Post-build action to Build other projects. Enter the name of the deployment project, and select Trigger only if build is stable.At this point, the successful execution of the build job will automatically start the deployment job.Congrats! You’ve now constructed a complete CI pipeline with Jenkins.
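Once a job like this exists, it can also be triggered programmatically. As a rough sketch that is not part of the original walkthrough, the following Python snippet queues the parameterized job through Jenkins' buildWithParameters endpoint; the server URL, job name and API token shown are placeholders you would replace with your own.

import requests
from urllib.parse import quote

# Placeholder values -- substitute your own Jenkins URL, job name and API token.
JENKINS_URL = "https://jenkins.example.com"
JOB_NAME = "Build Example Lambda"
AUTH = ("build-bot", "jenkins-api-token")

def trigger_build(branch="master"):
    # Queue the parameterized freestyle job for the given branch.
    url = f"{JENKINS_URL}/job/{quote(JOB_NAME)}/buildWithParameters"
    response = requests.post(url, params={"BRANCH": branch}, auth=AUTH)
    response.raise_for_status()
    # Jenkins reports the queued item's location in the Location header.
    print("Build queued:", response.headers.get("Location"))

if __name__ == "__main__":
    trigger_build("master")

Depending on your server's security settings, you may also need to obtain a CSRF crumb before posting.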

Blog

Use Sumo Logic to Collect Raspberry Pi Logs

June 18, 2017

Blog

Integrating Machine Data Analytics in New Relic Insights via Sumo Logic Webhooks

When Sumo Logic and New Relic announced a partnership at AWS re:Invent 2016, we immediately started hearing the excitement from our joint customers. The ability to combine the strengths of two leading SaaS services that offer fast time-to-value for monitoring and troubleshooting modern applications would offer a powerful and complete view of digital businesses, from the client down to the infrastructure. Today, we’re pleased to announce another advancement in our partnership: integrated machine data analytics with application and infrastructure performance data in New Relic Insights via a custom New Relic webhook built directly into Sumo Logic. Custom New Relic webhook in Sumo Logic Unlocking Insights from Sumo Logic Scheduled searches in Sumo Logic allow you to monitor and alert on key events occurring in your application and infrastructure. The flexibility of the query language allows you to pull just the information you need while fine tuning the thresholds to trigger only when necessary. Combined with your New Relic APM and New Relic Infrastructure data in New Relic Insights, you’ll now be able to visualize information such as: Events: Service upgrades, exceptions, server restarts, for example Alerts: More than 10 errors seen in 5 minutes, for example, or failed login attempts exceeding 5 in 15 minutes KPIs: Count of errors by host, for example, or top 10 IPs by number of requests Integrating these insights into New Relic provides an integrated context for faster root cause analysis and reduced Mean Time to Resolution (MTTR), all within a single pane of glass. In just three simple steps, you’ll be able to leverage Sumo Logic webhooks to send data to New Relic. Step 1: Configure the New Relic webhook connection In New Relic Insights, you will first need to register an API key that will be used by the Sumo Logic webhook. These keys allow you to securely send custom events into New Relic from different data sources. Type in a short description to keep a record of how this API key will be used, then copy the Endpoint and Key for setup in Sumo Logic. Generate an API Key from New Relic Insights to be used in Sumo Logic In Sumo Logic, create a New Relic webhook connection and insert the Endpoint and Key into the URL and Insert Key fields. The payload field gives you the flexibility to customize the event for viewing in New Relic. In addition to the actual results, you can optionally specify metadata to provide additional context. For example, the name of the Sumo Logic search, a URL to that particular search, a description, and more. This payload can also be customized later when you schedule the search. Variables from your Sumo Logic search can be included in your payload for additional context in New Relic. Step 2: Schedule a search to send custom events After saving your New Relic webhook, you have the option to specify this as the destination for any scheduled search in Sumo Logic. The example below shows a query to look for “Invalid user” in our Linux logs every 15 minutes. To store and visualize this information in New Relic, we simply schedule a search, select the New Relic webhook that we configured in Step 1, and customize the payload with any additional information we want to include. This payload will send each result row from Sumo Logic as an individual event in New Relic. 
The Sumo Logic query language allows you to transfer meaningful insights from your logs to New Relic Step 3: Visualize events in New Relic Insights Once the scheduled search has been saved and triggered, we can see the data populating in New Relic Insights and use the New Relic Query Language (NRQL) to create the visualizations we need. NRQL’s flexibility lets you tailor the data to your use case, and the visualization options make it seamless to place alongside your own New Relic data. In fact, you might not even notice the difference between the data sources—can you tell which data below is coming from New Relic, and which is coming from Sumo Logic? A unified view: “Source IP’s from Failed Attempts” streams in from Sumo Logic, while “Errors by Class” comes from New Relic The ability to visualize application and infrastructure performance issues alongside insights from your logs reduces the need to pivot between tools, which can speed root cause analysis. If you’ve spotted an issue that requires a deeper analysis of your logs, you can jump right into a linked Sumo Logic dashboard or search to leverage machine learning and advanced analytics capabilities. Learn more Head over to Sumo Logic DocHub for more details on how to configure the New Relic webhook, then schedule some searches to send custom events to New Relic Insights. We’re excited to continue advancing this partnership, and we look forward to sharing more with you in the future. Stay tuned!
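Since the webhook is configured with an Insights endpoint and Insert Key, what it delivers is essentially a custom event per result row. As an illustrative sketch only (the account ID, insert key and event fields below are made up), this is roughly what such an event looks like when posted to the Insights insert endpoint from Python:

import requests

# Hypothetical values -- use your own account ID and the Insert Key registered in Insights.
ACCOUNT_ID = "1234567"
INSERT_KEY = "your-insights-insert-key"
ENDPOINT = f"https://insights-collector.newrelic.com/v1/accounts/{ACCOUNT_ID}/events"

# One event per result row; eventType determines the table you query with NRQL.
events = [{
    "eventType": "SumoLogicAlert",
    "searchName": "Failed SSH Logins",
    "sourceIp": "203.0.113.42",
    "count": 17,
}]

response = requests.post(ENDPOINT, json=events, headers={"X-Insert-Key": INSERT_KEY})
response.raise_for_status()
print(response.status_code, response.text)

With an event type like the hypothetical SumoLogicAlert above, a NRQL query such as SELECT sum(count) FROM SumoLogicAlert FACET sourceIp could then chart the results alongside your other New Relic data.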

June 8, 2017

Blog

Disrupting the Economics of Machine Data Analytics

The power of modern applications is their ability to leverage the coming together of mobile, social, information and cloud to drive new and disruptive experiences: to enable companies to be more agile, to accelerate the pace at which they roll out new code, and to adopt DevSecOps methodologies where the traditional siloed walls between teams are disappearing. But these modern applications are highly complex, with new development and testing processes, new architectures, new tools (e.g., containers, microservices and configuration management tools), SLA requirements, security-in-the-cloud concerns, and an explosion of data sources coming from these new architectures as well as IoT.

In this journey to the cloud with our 1,500+ customers, we have learned a few things about their challenges. All of this complexity and volume of data is creating unprecedented challenges in enabling ubiquitous user access to all this machine data to drive continuous intelligence across operational and security use cases. In this new world of modern applications and cloud infrastructures, customers recognize that not all data is created equal: it differs in importance, life expectancy, the access performance needed, and the types of analytics that need to be run against it. Think IT operations data (high value, short life span, frequent and high-performance access needs) vs. regulatory compliance data (long-term storage, periodic searches, especially at audit time, where slower performance may be acceptable). Data ingest in certain verticals such as retail and travel fluctuates widely, and provisioning at maximum capacity (with idle capacity the majority of the year) is unacceptable in this day and age.

So if we step back for a moment and look at the industry as a whole, what is hindering a company's ability to unleash its full data potential? The root of the problem comes from two primary areas:
1. The more data we have, the higher the cost.
2. The pricing models of current solutions are based on volume of data ingested and are not optimized for the varying use cases we are seeing; it is a "one size fits all" approach.

Unfortunately, organizations are often forced to make a trade-off because of the high cost of current pricing models, something we refer to as the data tax: the cost of moving data into your data analytics solution. They have to decide: "What data do I send to my data analytics service?" as well as "Which users do I enable with access?" As organizations build out new digital initiatives or migrate workloads to the cloud, making these kinds of tradeoffs will not lead to ultimate success. What is needed is a model that will deliver continuous intelligence across operational and security use cases. One that leverages ALL kinds of data, without compromise. We believe there is a better option: one which leverages our cloud-native machine data analytics platform, shifting from a volume-based approach (fixed, rigid, static) to a value-based pricing model (flexible and dynamic), aligned with the dynamic nature of the modern apps that our customers are building. One that moves us to a place where the democratization of machine data is realized!

Introducing Sumo Logic Cloud Flex

As this launch was being conceived, there were four primary goals we set out to accomplish:
- Alignment: alignment between how we price our service and the value customers receive from it.
- Flexibility: maximum flexibility in the data usage and consumption controls that best align to the various use cases.
- Universal Access: universal access to machine data analytics for all users, not just a select few.
- Full Transparency: real-time dashboards on how our service is being used, the kinds of searches people are running, and the performance of the system.

And there were four problem areas we were trying to address:
- Data Segmentation: different use cases require different retention durations.
- Data Discrimination: not all data sets require the same performance and analytics capabilities; it is not economical to store and analyze low-value data sets, or to store data sets for long periods of time, especially as it relates to regulatory compliance mandates.
- Data Ubiquity: it is not economical for all users to access machine data analytics.
- Data Dynamics: support seasonal business cycles and align revenue with opex.

So with this Cloud Flex launch, Sumo Logic introduces the following product capabilities to address these four pain points:
- Variable Data Retention
- Analytics Profile
- Unlimited Users
- Seasonal Pricing

If increasing usage flexibility in your data analytics platform is of interest, please reach out to us. If you would like more information on Cloud Flex and the democratization of machine data analytics, please read our press release.

June 6, 2017

Blog

Universal Access

“In God we trust, all others must bring data.” – W. Edwards Deming

Over the years, we’ve all internalized the concept that data-driven decision making is key. We’ve watched as digital businesses like AirBnB, Uber, and Amazon far outpace their competitors in market share and profitability. They do this by tapping into continuous intelligence: they use the data generated as users interact with their applications to provide customized experiences, allowing them to learn and adapt quickly to where their users want to be.

I had always imagined decision-making at these companies to be kind of like the stock photo: a well-heeled executive looking at an interesting chart and experiencing that moment of brilliant insight that leads to a game-changing business decision. The reality, though, is that it was never as simple as that. It was hard work. It was not one key decision made by one key executive. It was several hundred small everyday decisions, made across all levels of the organization by all kinds of individuals, that slowly but surely inched the company across the finish line of success and sustain them today in their continued growth. The better equipped employees were with the relevant data, the better they could execute in their roles.

At most companies, the decision to be data-driven is simple; actually getting to that state, not so much. Conversations might go something like this:
“We should be more data-driven!”
“Yeah!”
“What data do we need?”
“Depends, what are we trying to solve?”
“Where’s the data?”
“How do we get to it?”
“What do we do once we have it?”
“How can we share all this goodness?”

At Sumo Logic, we’ve already cracked the hard fundamental problems of getting to the data and being able to ask meaningful questions of that data. We support a vast and scalable set of collection tools, and at Sumo’s core is a very powerful machine data analytics platform that allows our customers to query their logs and metrics data to quickly derive insights and address impactful operational and security issues.

We’re working our way up the problem chain. Now that we can easily get to the data and analyze it, how do we use it as an organization? How can our machine data tell our infrastructure engineers about the health of the service and the support team about performance against SLAs, help PMs understand user adoption, and find a way to summarize all of this into a format that can be presented to executives and key stakeholders?

To solve this, we recently introduced the concept of public dashboards: data-rich dashboards that can be shared outside of Sumo and across the organization. This helped expose data to users who relied on it to make decisions, but who were far removed from the actual applications and infrastructure that generated it.

Now, we’re tackling a deeper issue: how do users and teams collaborate on data analysis within Sumo? How do they learn from each other about what kinds of metrics other teams collect and what the best practices are, and how do they learn and grow exponentially as an organization as they become empowered with this data? We plan to solve this later this year by allowing users to share their dashboards, log searches and metrics queries with other users and roles in their organization. Teams can collaborate with each other and control with granularity how different users and roles in the organization can edit or view a dashboard or a search. Administrators can efficiently organize and make accessible the content that’s most relevant for a particular group of people.

We’ve embraced the concept of Universal Access to mean accessibility to Sumo and, more importantly, the data in Sumo, for all users regardless of their skill or experience level with Sumo. We’ve redesigned Sumo to be contextual and intuitive with the introduction of simpler navigation and workflows. Current users will appreciate the new types of content that can be opened in tabs, such as dashboards, log searches, metrics queries and live tail, and the fact that these tabs are persistent across login sessions. New users will have a smoother onboarding experience with a personalized homepage.

To check out the new UI (beta) and learn more about how Sumo Logic can help your organization be more data-driven, sign up today!

June 6, 2017

Blog

Journey to the Cloud, with Pivotal and Sumo Logic

There is no denying it – the digital business transformation movement is real, and the time for this transformation is now. When, according to survey from Bain & Company, 48 of 50 Fortune Global companies have publicly announced plans to adopt public cloud, it is clear that there are no industries immune from this disruption. We are seeing traditional industries such as insurance, banking, and healthcare carving out labs and incubators that bring innovative solutions to market, and establish processes and platforms to help the rest of the organization with their evolution. For large enterprises it is critical that they manage the challenges of moving to public cloud, while satisfying the needs of a diverse set of internal customers. They need to support a variety of development languages, multiple deployment tool chains, and a mix of data centers and multiple public cloud vendors. Because these are long term strategies that involve considerable investment, they are concerned about long-term vendor lock-in, and are being proactive about developing strategies to mitigate those risks. These organizations are looking toward cloud-neutral commercial vendors to help them migrate to the cloud, and have consistency in how they deploy and manage their applications across heterogeneous environments. These enterprises are increasingly turning to Pivotal Cloud Foundry® to help them abstract their app deployments from the deployment specifics of individual cloud platforms, and maintain their ability to move apps and workloads across cloud providers when the time comes. Effective DevOps Analytics for the Modern Application The migration of enterprise workloads to the cloud, and the rise of public cloud competition, is driving the demand for Sumo Logic as a cloud-native platform for monitoring and securing modern applications. Pivotal Cloud Foundry enables users to abstract the underlying plumbing necessary to deploy, manage and scale containerized cloud native applications. This benefits developers by greatly increasing their productivity and ability to launch applications quickly. Such an environment also exposes a broader set of operational and security constructs that are useful to track, log and analyze. However it can also be more complicated to diagnose performance issues with decoupled architectures and composable micro-services. Full stack observability and the ability to trace all the apps and services together are critical to successful cloud deployments. Observability of decoupled architectures with composable services requires the ability to trace all layers of the stack With Pivotal Cloud Foundry and tools from Sumo Logic, an organization can have an observable, enterprise-class platform for application delivery, operations, and support across multiple public cloud providers and on-premises data centers. Beyond platform operations, Cloud Foundry customers want to enable their app teams to be self sufficient, and promote an agile culture of DevOps. Often, with legacy monitoring and analytics tools, the operations team will have access to the data, but they can’t scale to support the application teams. Or, the apps team may restrict access to their sensitive data, and therefore not support the needs of the security and compliance team. Sumo Logic believes in democratized analytics. This means that this massive flow of highly valuable data, from across the stack and cloud providers, should be available to everyone that can benefit from it. 
This requires the right level of scale, security, ubiquity of access, and economics that only Sumo Logic can provide.

Sumo Logic & Pivotal Cloud Foundry Partnership

Through our collaboration with Pivotal®, Sumo Logic has developed an app for Pivotal Cloud Foundry, as well as an easy-to-deploy integration with Pivotal Cloud Foundry Loggregator. A customer-ready beta of the "Sumo Logic Nozzle for PCF" is available now as an Operations Manager Tile for Pivotal Cloud Foundry, available for download from Pivotal Network. Sumo Logic Tile Installed in the PCF Ops Manager. If you are already using or evaluating Pivotal Cloud Foundry, you can get started with operational and security analytics in a matter of minutes. With this integration, all of the log and metrics data collected by Cloud Foundry Loggregator will be streamed securely to the Sumo Logic platform. For deployments with security and compliance requirements, Sumo Logic's cloud-based service is SOC 2, HIPAA, and PCI compliant. The Sumo Logic integration for Pivotal Cloud Foundry will be available in the App Library soon. If you would like early access, please contact your account team. The Sumo Logic App for Pivotal Cloud Foundry highlights key Pivotal data and KPIs, operationalizes Pivotal Cloud Foundry's monitoring best practices for you, and provides a platform to build upon to address your unique monitoring and diagnostic requirements.

Blog

The Democratization of Machine Data Analytics

Earlier today we announced a revolutionary set of new platform services and capabilities. As such, I wanted to provide more context around this announcement and our strategy. While this new announcement is very exciting, we have always been pushing the boundaries to continuously innovate in order to remove the complexity and cost associated with getting the most value out of data. Whether it's build-it-yourself open source toolkits or legacy on-premise commercial software packages, the "data tax" associated with these legacy licensing models, not to mention their technology limitations, has prevented universal access for all types of data sources and, more importantly, users. This strategy and the innovations we announced address the digital transformation taking place industry-wide, led by the megatrends of cloud computing adoption, DevSecOps and the growth of machine data. For example, IDC recently forecasted public cloud spending to reach $203.4 billion by 2020, while Bain & Company's figure is nearly twice that at $390 billion. Whatever number you believe, the bottom line is that public cloud adoption is officially on a tear. According to Bain, 48 of the 50 Fortune Global companies have publicly announced cloud adoption plans to support a variety of needs. In the world of DevSecOps, our own Modern App Report released last November substantiated the rise of a new modern application stack, replete with new technologies such as Docker, Kubernetes, NoSQL, S3, Lambda, and CloudFront that are seriously challenging the status quo of traditional on-premise standards from Microsoft, HP, Oracle, and Akamai. However, the most significant, and arguably the most difficult, digital transformation trend for businesses to get their arms around is the growth of machine data. According to Barclays' Big Data Handbook, machine data will account for 40 percent of all data created by 2020, reaching approximately 16 zettabytes. (To put that number in perspective, 16 zettabytes is equivalent to streaming the entire Netflix catalogue 30 million times!) Since machine data is the digital blueprint of a digital business, its rich source of actionable insights either remains locked away or is difficult to extract at best, because of expensive, outdated, disparate tooling that limits visibility, impedes collaboration and slows down the continuous processes required to build, run, secure and manage modern applications. Seven years ago, Sumo Logic made a big bet: disrupt traditional Big Data models, with their lagging intelligence indicators, by pursuing a different course: a real-time, continuous intelligence platform strategy better equipped to support the continuous innovation models of transformational companies. Now this need is becoming more critical than ever as the laggards make their shift and cloud computing goes mainstream, which will not only drive those market data numbers even higher, but also put the squeeze on the talent necessary to execute the shift. That's why Sumo Logic's vision, to "Democratize Machine Data", now comes to the forefront. To truly enable every company to have the power of real-time machine data analytics, we believe the current licensing, access and delivery models surrounding machine data analytics are also ripe for disruption.
Our announcement today provides essential new innovations – ones that are only achievable because of our market-leading, multi-tenant, cloud-native platform – that remove economic, access and visibility barriers holding companies back from reaching their full data-insight potential. They are: Sumo Cloud Flex: a disruptive data analytics economic model that enables maximum flexibility to align data consumption and use with different use cases, and provide universal access by removing user-based licensing. While this was purpose-built and optimized for the massive untapped terabyte volume data sets, it’s also applicable to the highly variable data sets. Unified Machine Data Analytics: New, native PaaS and IaaS-level integrations to our cloud-native, machine data analytics platform to support data ingest from a variety of cloud platforms, apps and infrastructures. These additions will enable complete visibility and holistic management across the entire modern application and infrastructure stack. Universal Access: New experience capabilities such as a contextual and intuitive user interface to improve user productivity and public dashboards, and improved content sharing for faster collaboration with role-based access controls (RBAC). With this innovation, machine data insights are easier to access for non-technical, support services and business users. Over time, we predict ease-of-use initiatives like this will be one of the drivers to help close the current data scientist/security analyst talent gap. With our new innovations announced today, plus more coming later in the year, Sumo Logic is positioned to become the modern application management platform for digital transformation, delivered to our customers as a low TCO, scalable, secure service. That’s because machine data analytics will be the layer that provides complete visibility to manage the growing complexity of cloud-based, modern applications, which is sorely needed today and in the future. As the leading, cloud-native machine data analytics service on AWS, we service more than 1500 customers, from born-in-the-cloud companies like Salesforce, Twilio and AirBnB to traditional enterprises, such as Marriott, Alaska Airlines and Anheuser-Busch. Our platform system on average analyzes 100+ petabytes of data, executes more than 20 million searches, and queries 300+ trillion records every day. While these numbers seem massive, the numbers keep growing and yet we are only at the beginning of this massive opportunity. Other machine data analytics options such as cobbling a solution together with old technologies, or trying to build it on your own fall short because they don’t address the fundamental problem – machine data will just keep growing. To address this, the data layer must be re-architected – similar to the compute layer – to utilize the power of true distributed computing to address a problem that is never over – the volume, velocity and variety of machine data growth – and to do so in a way that meets the speed, agility and intelligence demands of digital business. You can’t put old, enterprise application architectures on top of the cloud and expect to be prepared. Sumo Logic’s ability to flexibly manage and maximize data and compute resources – the power of multi-tenant, distributed system architecture – across 1500+ customers means our customers have the ability to maximize their data insight potential to realize the holy grail of being real-time, data-driven businesses. 
We invite you to experience the power of Sumo Logic for free. As always, I look forward to your feedback and comments.

June 6, 2017

Blog

Graphite vs. Sumo Logic: Building vs. Buying value

No no no NOOOOO NOOOOOOOO… One could hear Mike almost yelling while staring at his computer screen. Suddenly Mike lost the SSH connection to one of the core billing servers. He was in the middle of a manual backup before he could upgrade the system with the latest monkey patch. Mike had gained a lot of visibility, and a promotion, after his last initiative of migrating the ELK Stack to the SaaS-based machine data analytics platform Sumo Logic. He improved MTTI/MTTR by 90% and uptime of the log analytics service by 70% in less than a month's time. With the promotion, he was put in charge of the newly formed site reliability engineering (SRE) team, with four people reporting to him. It was a big deal. This was his first major project after the promotion, and he wanted to ensure that everything went well. But just now, something had happened to the billing server, and Mike had a bad feeling about it.

He waited a few minutes to see if the billing server would start responding again. It had happened before that the SSH client temporarily lost the connection to the server; the root cause of that connection loss was the firewall in the corporate headquarters, which had to be upgraded to fix the issue. This time, Mike was convinced it wasn't the firewall, but that something else had happened to the billing server, and now there was a way to confirm his hunch. To see what happened to the billing server, he runs a query in Sumo Logic:

_sourceHost=billingserver AND "shut*"

He quickly realizes that the server was rebooted. He broadens the search to a +/-5 minute range around the log message above and identifies that the disk was full. He added more disk to the existing server to ensure that the billing server would not restart again for lack of hard drive space. However, Mike had no visibility into host metrics such as CPU, hard disk, and memory usage. He needed a solution to gather host and custom metrics. He couldn't believe the application had been managed without these metrics; he knew very well that metrics must be captured to get visibility into system health. So he reprioritized his metrics project over making the ELK stack and the entire infrastructure PCI compliant.

Stage 1: Installing Graphite

After a quick search, he identifies Graphite as one of his options. He had a bad taste in his mouth from ELK, which had cost him an arm and a leg for just a search feature. This time, though, he thought it would be different. A metric datapoint was only 12 bytes in size! How hard could it be to store 12 bytes of data for 200 machines? He chose Graphite as his open-source host metrics system. He downloads and installs the latest Graphite on an AWS t2.medium; at $0.016 USD per hour, Mike can get 4 GB of RAM with 2 vCPUs. For less than $300 USD, Mike is ready to test his new metrics system. Graphite has three main components: Carbon, Whisper and Graphite-Web. Carbon listens on a TCP port and expects time series metrics. Whisper is a flat-file database, while Graphite-Web is a Django application that can query the carbon-cache and Whisper. He installs all of this on one single server. The logical architecture looks something like Figure 1 below.

Figure 1: Simple Graphite Logical Architecture on a Single Server

Summary: At the end of stage 1, Mike had a working solution with a couple of servers on AWS.

Stage 2: New metrics stopped updating – the first issue with Graphite

On a busy day, new metrics suddenly stopped showing up in the UI. This was the first time, after a few months of operation, that Graphite was facing issues.
After careful analysis, it was clear that metrics were still getting written to the whisper files. Mike thought for a second and remembered that whisper pre-allocates disk space for its files based on the retention configuration (in Graphite's storage-schemas.conf). To make it concrete, 31.1 MB is pre-allocated by whisper for one metric collected every second from one host and retained for 30 days: total metric storage = 1 host × 1 metric/sec × 60 sec × 60 min × 24 hrs × 30 days retention × 12 bytes per datapoint ≈ 31.1 MB. He realized that he might have run out of disk space, and sure enough, that was the case. He doubled the disk space and restarted the Graphite server, and new data points started showing up. Mike was happy that he was able to resolve the issue before it got escalated. However, his mind started creating "what if" scenarios. What if the application he is monitoring goes down at exactly the same time Graphite gives up? He parks that scenario in the back of his head and goes back to working on other priorities.

Summary: At the end of stage 2, Mike had already incurred additional storage cost and ended up buying an EBS Provisioned IOPS volume. SSD would have been better, but this was the best he could do with the allocated budget.

Stage 3: And Again New Metrics Stopped Updating

On a Saturday night at 10 PM there was a marketing promotion. It suddenly went viral and a lot of users logged into the application. Engineering had auto-scaling enabled on the front end, and Mike had ensured that new images would automatically enable StatsD. Suddenly the metrics data points per minute (DPM) grew significantly, way above the average. Mike had no idea about this series of events. The only information in the ticket he received was "New metrics are not showing up, AGAIN!" He quickly found out the following: whisper updates were climbing toward the MAX_UPDATES_PER_SECOND limit, which caps how many updates are written per second, and MAX_CREATES_PER_MINUTE was at its max. Mike quickly realized the underlying problem: an I/O bottleneck was causing the server to crash because the Graphite server was running out of memory. Here is how he connected the dots. Auto-scaling kicks in and suddenly 800 servers start sending metrics to Graphite. This is four times the average number of hosts running at any given time, which quadruples the metrics ingested as well. The Graphite settings MAX_UPDATES_PER_SECOND and MAX_CREATES_PER_MINUTE reduce the load on disk I/O, but they have an upstream impact: carbon-cache starts using more and more memory. Since MAX_CACHE_SIZE was set to infinite, carbon-cache kept holding in memory the metrics waiting to be written to whisper on disk. When the carbon-cache process ran out of memory it crashed and, sure enough, metrics stopped getting updated. So Mike added an EBS volume with provisioned IOPS and upgraded the server to an M3 medium instead of a t2.

Summary: At the end of stage 3, Mike had already performed two migrations. First, after changing the hard drive, he had to transfer the Graphite data. Second, after changing the machine, he had to reinstall and repopulate the data. Not to mention that this time he had to reconfigure all the clients to send metrics to the new server.

Figure 2: Single Graphite M3 Medium Server after Stage 3

Stage 4: Graphite gets resiliency, but at what cost?

From his earlier ELK experience, Mike had learned one thing: he cannot have any single point of failure in his data ingest pipeline, and at the same time he has to solve for the carbon-relay crash.
Before anything else happens, he has to resolve the single point of failure in the above architecture and allocate more memory to carbon-relay. He decided to replicate a similar Graphite deployment in a different availability zone. This time he turns on replication in the configuration file and creates the architecture below, which ensures replication and adds more memory to the carbon-relay process so that it can hold metrics in memory while whisper is busy writing them to disk.

Summary: At the end of stage 4, Mike has resiliency through replication and more memory for the carbon-relay process. This change has doubled the Graphite cost from the last stage.

Figure 3: Two Graphite M3 Medium Servers with replication after Stage 4

Stage 5: And another one bites the dust… yet another carbon-relay issue

Mike was standing in line for a hot breakfast. At this deli, one has to pay first and then get their breakfast. He saw a huge line at the cashier. The cashier seemed to be a new guy; he was slow and the line was getting longer and longer. It was morning and everyone wanted to get back quickly. Suddenly Mike's brain started drawing an analogy: carbon-relay is the cashier, the person serving the breakfast is carbon-cache, and the chef is whisper. The chef takes the longest because he has to cook the breakfast. Suddenly he realizes the flaw in his earlier design. There is a line port (TCP 2003) and a pickle port (TCP 2004) on carbon-relay, and every host is configured to throw metrics at those ports. The moment carbon-relay gets saturated, there is no way to scale it up without adding new servers, network reconfiguration and host configuration changes. To avoid that kind of disruption, he quickly comes up with a new design he calls the relay sandwich. He separates HAProxy out onto its own dedicated server. Carbon-relay also gets its own server so that it can scale horizontally without changing the configuration at the host level.

Summary: Each Graphite instance now has four servers, for a total of eight servers across the two Graphite instances. At this point, the system is resilient, with headroom to scale carbon-relay.

Figure 4: Adding more servers with HAProxy and carbon-relay

Stage 6: Where is my UI?

As you may have noticed, this is just the backend architecture. Mike was the only person running the show, but if he wants more users to have access to this system, he must scale the front end as well. He ends up installing Graphite-Web, and the final architecture becomes what is shown in Figure 5.

Summary: Graphite evolved from a single server to a 10-machine Graphite cluster managing metrics for only a fraction of their infrastructure.

Figure 5: Adding more servers with HAProxy and carbon-relay

Conclusion: It was deja vu for Mike. He had seen this movie before with ELK. Twenty servers into Graphite, he was just getting started. He quickly realizes that if he enables custom metrics, he has to double the size of his Graphite cluster. Currently, the issue is that Graphite's metrics tell him what is wrong with the system, while the Sumo Logic platform, with correlated logs and metrics, indicates not only what is wrong with the system but also why. Mike turns on Sumo Logic metrics on the same collectors already collecting logs and gets correlated logs and metrics on the Sumo Logic platform. Best part: he is not on the hook to manage the management system.
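The capacity math from Stage 2 generalizes easily, and it is worth running before committing to a self-hosted Graphite cluster. Here is a minimal sketch in Python, assuming 12 bytes per datapoint as in the article and a hypothetical 50 metrics per host:

# Rough whisper capacity math from Stage 2, as a quick sanity check.
BYTES_PER_POINT = 12          # approximate size of one timestamped value
SECONDS_PER_DAY = 60 * 60 * 24

def whisper_bytes(points_per_second, retention_days):
    # Approximate space whisper pre-allocates for a single metric.
    return int(points_per_second * SECONDS_PER_DAY * retention_days * BYTES_PER_POINT)

per_metric = whisper_bytes(points_per_second=1, retention_days=30)
print(f"One metric at 1s resolution for 30 days: {per_metric / 1e6:.1f} MB")   # ~31.1 MB

# Scale it out: 200 hosts x a hypothetical 50 metrics each, before any replication.
hosts, metrics_per_host = 200, 50
total = hosts * metrics_per_host * per_metric
print(f"200 hosts x 50 metrics: {total / 1e9:.1f} GB")   # ~311 GB per Graphite instance

Double that for the replicated second instance from Stage 4, and the "12 bytes of data" quickly becomes a few hundred gigabytes of pre-allocated, I/O-hungry storage.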

May 31, 2017

Blog

6 Metrics You Should Monitor During the Application Build Cycle

Monitoring application metrics and other telemetry from production environments is important for keeping your app stable and healthy. That you know. But app telemetry shouldn’t start and end with production. Monitoring telemetry during builds is also important for application quality. It helps you detect problems earlier on, before they reach production. It also allows you to achieve continuous, comprehensive visibility into your app. Below, we’ll take a look at why monitoring app telemetry during builds is important, then discuss the specific types of data you should collect at build time. App Telemetry During Builds By monitoring application telemetry during the build stage of your continuous delivery pipeline, you can achieve the following: Early detection of problems. Telemetry statistics collected during builds can help you to identify issues with your delivery chain early on. For example, if the number of compiler warnings is increasing, it could signal a problem with your coding process. You want to address that before your code gets into production. Environment-specific visibility. Since you usually perform builds for specific types of deployment environments, app telemetry from the builds can help you to gain insight into the way your app will perform within each type of environment. Here again, data from the builds helps you find potential problems before your code gets to production. Code-specific statistics. App telemetry data from a production environment is very different from build telemetry. That’s because the nature of the app being studied is different. Production telemetry focuses on metrics like bandwidth and active connections. Build telemetry gives you more visibility into your app itself—how many internal functions you have, how quickly your code can be compiled, and so on. Continuous visibility. Because app telemetry from builds gives you visibility that other types of telemetry can’t provide, it’s an essential ingredient for achieving continuous visibility into your delivery chain. Combined with monitoring metrics from other stages of delivery, build telemetry allows you to understand your app in a comprehensive way, rather than only monitoring it in production. Metrics to Collect If you’ve read this far, you know the why of build telemetry. Now let’s talk about the how. Specifically, let’s take a look at which types of metrics to focus on when monitoring app telemetry during the build stage of your continuous delivery pipeline. Number of environments you’re building for. This might seem so basic that it’s not worth monitoring. But in a complex continuous delivery workflow, it’s possible that the types of environments you target will change frequently. Tracking the total number of environments can help you understand the complexity of your build process. It can also help you measure your efforts to stay agile by maintaining the ability to add or subtract target environments quickly. Total lines of source code. This metric gives you a sense of how quickly your application is growing—and by extension, how many resources it will consume, and how long build times should take. The correlation between lines of source code and these factors is rough, of course. But it’s still a useful metric to track. Build times. Monitoring how long builds take, and how build times vary between different target environments is another way to get a sense of how quickly your app is growing. It’s also important for keeping your continuous delivery pipeline flowing smoothly. 
Code builds are often the most time-consuming process in a continuous delivery chain. If build times start increasing substantially, you should address them in order to avoid delays that could break your ability to deliver continuously. Compiler warnings and errors. Compiler issues are often an early sign of software architecture or coding issues. Even if you are able to work through the errors and warnings that your compiler throws, monitoring their frequency gives you an early warning sign of problems with your app. Build failure rate. This metric serves as another proxy for potential architecture or coding problems. Code load time. Measuring changes in the time it takes to check out code from the repository where you store it helps you prevent obstacles that could hamper continuous delivery. Monitoring telemetry during the build stage of your pipeline by focusing on the metrics outlined above helps you not only build more reliably, but also gain insights that make it easier to keep your overall continuous delivery chain operating smoothly. Most importantly, they help keep your app stable and efficient by assisting you in detecting problems early and maximizing your understanding of your application.
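As a rough sketch of how these numbers can be pulled together, the snippet below aggregates a handful of hypothetical build records (the fields and values are illustrative, not from any particular CI server) into the metrics discussed above:

from statistics import mean

# Hypothetical build records, e.g. exported from your CI server's API.
builds = [
    {"env": "staging", "duration_s": 412, "warnings": 3, "success": True},
    {"env": "staging", "duration_s": 455, "warnings": 5, "success": False},
    {"env": "prod",    "duration_s": 430, "warnings": 2, "success": True},
]

failure_rate = sum(1 for b in builds if not b["success"]) / len(builds)
avg_duration = mean(b["duration_s"] for b in builds)
avg_warnings = mean(b["warnings"] for b in builds)
environments = {b["env"] for b in builds}

print(f"Environments built for: {len(environments)}")
print(f"Average build time: {avg_duration:.0f}s")
print(f"Average compiler warnings: {avg_warnings:.1f}")
print(f"Build failure rate: {failure_rate:.0%}")

Trending these values over time, rather than looking at any single build, is what surfaces the early warning signs described above.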

Blog

7 Ways the Sumo Logic Redesign Will Change Your Life

We’re excited to announce our biggest user experience overhaul as Sumo Logic enters its 8th year. Here’s a quick list of some amazing things in the new UI.

1. An integrated workspace. Everything now opens in tabs. This makes workflows like drilling down into dashboards, switching between log searches, or jumping between Metrics and Log Search much smoother. The workspace remembers your state, so when you log back into Sumo Logic, it fires up your tabs from the previous session.
2. Quick access to your content. Did you know about the Sumo Logic library? It’s front and center now so you can quickly access your saved content and content shared with you. If you find yourself running the same searches and opening the same dashboards over and over again, you can now quickly launch them from the Recent tab.
3. Sumo Home. Do you feel like a deer in headlights when you log in and see the blinking cursor on the search page? Not anymore! Sumo Home gives you a starting point full of useful content. Future improvements will let you personalize this page for your unique workflows.
4. A modern, cleaner, more consistent interface. A fresh set of icons, an updated content library, and tabbed, browser-like behavior are some of the many visual upgrades we made to get Sumo Logic ready for 2017.
5. A beautiful App Catalog. The App Catalog was redesigned from the ground up and now gives you a visual way of browsing through prebuilt content to help you get started faster and get more out of your logs.
6. Distraction-free mode. Sometimes you need all the space you can get while troubleshooting. You can collapse the left navigation and pop out dashboard tabs to give more real estate to your data.
7. The back button works. Hitting your browser’s back button will no longer log you out of Sumo Logic, thanks to smart UI routing. We solved one of the biggest user pet peeves.

Check out the redesign! If you have any feedback, please feel free to reach out to us at [email protected] or leave us comments directly on the Home Page.

May 24, 2017

Blog

AWS Config: Monitoring Resource Configurations for Compliance

Blog

Mesosphere DC/OS Logging with Sumo Logic

Mesosphere DC/OS (Data Center Operating System) lets you manage your data center as if it were a single powerful computer. It separates the infrastructure logic from the application logic, and makes it easy to manage distributed systems. Considering DC/OS is meant for highly scalable, distributed systems, logging and monitoring play a key role in day-to-day operations with DC/OS. In this post, we'll take a look at the goals of logging with DC/OS, and how you can set up DC/OS logging with Sumo Logic.

Why Mesosphere DC/OS Clusters Need Logging

When you work with Mesosphere DC/OS, you typically have hundreds, if not thousands, of nodes that are grouped in clusters. Tasks are executed on these nodes by Mesos "agent" instances, which are controlled by "master" instances. By grouping the nodes in clusters, DC/OS ensures high availability so that if any node or cluster fails, its workload is automatically routed to the other clusters. DC/OS uses two scheduling tools: Chronos for scheduled tasks like ETL jobs, and Marathon for long-running tasks like running a web server. Additionally, it includes app services like Docker, Cassandra, and Spark. DC/OS supports hybrid infrastructure, allowing you to manage bare metal servers, VMs on-premises, or cloud instances, all from a single pane of glass. Together, all of these components make for a complex system that needs close monitoring.

There are two key purposes for collecting and analyzing DC/OS logs. The first is debugging. As new tasks are executed, DC/OS makes decisions in real time on how to schedule these tasks. While this is automated, it needs supervision. Failover needs logging so you can detect abnormal behavior early on. Also, as you troubleshoot operational issues on a day-to-day basis, you need to monitor resource usage at a granular level, and that requires a robust logging tool. Second, for certain apps in enterprises, compliance is a key reason to store historic logs over a long period of time. You may need to comply with HIPAA or PCI DSS standards.

Viewing raw logs in DC/OS

DC/OS services and tasks write stdout and stderr files in their sandboxes by default. You can access logs via the DC/OS CLI or the console. You can also SSH into a node and run the following command to view its logs:

$ journalctl -u "dcos-*" -b

While this is fine if you're running just a couple of nodes, once you scale to tens or hundreds of nodes, you need a more robust logging tool. That's where a log analysis tool like Sumo Logic comes in.

Sharing DC/OS logs with Sumo Logic

DC/OS shares log data via an HTTP endpoint which acts as a source. The first step to share Mesosphere DC/OS logs with Sumo Logic is to configure an HTTP source in Sumo Logic. You can do this from the Sumo Logic console by following these steps. You can edit settings like the timestamp, and allow multi-line messages like stack traces. Your data is uploaded to a unique source URL. Once uploaded, the data is sent to a Sumo Logic collector. This collector is hosted and managed by Sumo Logic, which makes setup easy and reduces maintenance later. The collector compresses the log data, encrypts it, and sends it to the Sumo Logic cloud in real time. During this setup process, you can optionally create processing rules to filter data sent to Sumo Logic. Here are some actions you can take on the logs being shared:
- Exclude messages
- Include messages
- Hash messages
- Mask messages
- Forward messages
These processing rules apply only to data sent to Sumo Logic, not to the raw logs in DC/OS.

It may take a few minutes for data to start showing in the Sumo Logic dashboard, and once it does, you're off to the races with state-of-the-art predictive analytics for your log data. You gain deep visibility into DC/OS cluster health. You can set up alerts based on the log data and get notified when failed nodes reach a certain number, when a high-priority task is running too slowly, or if there is any suspicious user behavior. Whether it's an overview or a deep dive to resolve issues, Sumo Logic provides advanced data analysis that builds on the default metrics of DC/OS. It also has options to archive historic log data for years so you can comply with security standards like HIPAA or PCI DSS.

DC/OS is changing the way we view data centers. It transforms the data center from hardware-centric to software-defined. A comprehensive package, it encourages hybrid infrastructure, prevents vendor lock-in, and provides support for container orchestration. DC/OS is built for modern, web-scale apps. However, it comes with a new set of challenges for infrastructure and application monitoring. This is where you need a tool like Sumo Logic, so that you not only view raw log data, but are also able to analyze it and derive insights before incidents happen.

About the Author

Twain began his career at Google, where, among other things, he was involved in technical support for the AdWords team. Today, as a technology journalist, he helps IT magazines and startups change the way teams build and ship applications.

Mesosphere DC/OS Logging with Sumo Logic is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.
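To make the HTTP source step concrete, here is a minimal sketch of shipping a batch of log lines to a hosted collector with Python. The source URL and the log lines are placeholders, since Sumo Logic generates a unique URL for each HTTP source you create:

import requests

# Placeholder URL -- use the unique HTTP source URL Sumo Logic generates for you.
SUMO_HTTP_SOURCE = "https://collectors.sumologic.com/receiver/v1/http/UNIQUE_TOKEN"

def ship_lines(lines):
    # Send a batch of DC/OS log lines to the hosted HTTP source; one message per line.
    body = "\n".join(lines)
    response = requests.post(SUMO_HTTP_SOURCE, data=body.encode("utf-8"))
    response.raise_for_status()

# Illustrative log lines only.
ship_lines([
    "2017-05-20T10:01:12Z dcos-mesos-master: Elected as the leading master",
    "2017-05-20T10:01:13Z dcos-marathon: Deploying app /web-frontend",
])

In practice you would wire this into whatever forwards your node logs (for example, a small agent tailing journald output), rather than calling it by hand.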

Blog

Improving Your Performance via Method Objects

When Sumo Logic receives metrics data, we put those metrics datapoints into a Kafka queue for processing. To help us distribute the load, that Kafka queue is broken up into multiple Kafka Topic Partitions; we therefore have to decide which partition is appropriate for a given metrics datapoint. Our logic for doing that has evolved over the last year in a way that spread the decision logic out over a few different classes; I thought it was time to put it all in one place. My initial version had an interface like this: def partitionFor(metricDefinition: MetricDefinition): TopicPartition As I started filling out the implementation, though, I began to feel a little bit uncomfortable. The first twinge was when calculating which branch to go down in one of the methods: normally, when writing code, I try to focus on clarity, but when you’re working at the volumes of data that Sumo Logic has to process, you have to keep efficiency in mind when writing code that is evaluated on every single data point. And I couldn’t convince myself that one particular calculation was quite fast enough for me to want to perform it on every data point, given that the inputs for that calculation didn’t actually depend on the specific data point. So I switched over to a batch interface, pulling that potentially expensive branch calculation out to the batch level: class KafkaPartitionSelector { def partitionForBatch(metricDefinitions: Seq[MetricDefinition]): Seq[TopicPartition] = { val perMetric = calculateWhetherToPartitionPerMetric() metricDefinitions.map { metric => partitionFor(metric, perMetric) } } private def partitionFor(metricDefinition: MetricDefinition, perMetric: Boolean): TopicPartition = { if (perMetric) { ... } else { ... } } } That reduced the calculation in question from once per data point to once per batch, getting me past that first problem. But then I ran into a second such calculation that I needed, and a little after that I saw a call that could potentially translate into a network call; I didn’t want to do either of those on every data point, either! (The results of the network call are cached most of the time, but still.) I thought about adding them as arguments to partitionFor() and to methods that partitionFor() calls, but passing around three separate arguments would make the code pretty messy. To solve this, I reached a little further into my bag of tricks: this calls for a Method Object. Method Object is a design pattern that you can use when you have a method that calls a bunch of other methods and needs to pass the same values over and over down the method chain: instead of passing the values as arguments, you create a separate object whose member variables are the values that are needed in lots of places and whose methods are the original methods you want. That way, you can break your implementation up into methods with small, clean signatures, because the values that are needed everywhere are accessed transparently as member variables. In this specific instance, the object I extracted had a slightly different flavor, so I’ll call it a “Batch Method Object”: if you’re performing a calculation over a batch, if every evaluation needs the same data, and if evaluating that data is expensive, then create an object whose member variables are the data that’s shared by all batches. 
With that, the implementation became: class KafkaPartitionSelector { def partitionForBatch(metricDefinitions: Seq[MetricDefinition]): Seq[TopicPartition] = { val batchPartitionSelector = new BatchPartitionSelector metricDefinitions.map(batchPartitionSelector.partitionFor) } private class BatchPartitionSelector { private val perMetric = calculateWhetherToPartitionPerMetric() private val nextExpensiveCalculation = ... ... def partitionFor(metricDefinition: MetricDefinition): TopicPartition = { if (perMetric) { ... } else { ... } } ... } } One question that came up while doing this transformation was whether every single member variable in BatchPartitioner was going to be needed in every batch, no matter what the feature flag settings were. (Which was a potential concern, because they would all be initialized at BatchPartitioner creation time, every time this code processes a batch.) I looked at the paths and checked that most were used no matter the feature flag settings, but there was one that only mattered in some of the paths. This gave me a tradeoff: should I wastefully evaluate all of them anyways, or should I mark that last one as lazy? I decided to go the route of evaluating all of them, because lazy variables are a little conceptually messy and they introduce locking behind the scenes which has its own efficiency cost: those downsides seemed to me to outweigh the costs of doing the evaluation in question once per batch. If the potentially-unneeded evaluation had been more expensive (e.g. if it had involved a network call), however, then I would have made them lazy instead. The moral is: keep Method Object (and this Batch Method Object variant) in mind: it’s pretty rare that you need it, but in the right circumstances, it really can make your code a lot cleaner. Or, alternatively: don’t keep it in mind. Because you can actually deduce Method Object from more basic, more fundamental OO principles. Let’s do a thought experiment where I’ve gone down the route of performing shared calculations once at the batch level and then passing them down through various methods in the implementation: what would that look like? The code would have a bunch of methods that share the same three or four parameters (and there would, of course, be additional parameters specific to the individual methods). But whenever you see the same few pieces of data referenced or passed around together, that’s a smell that suggests that you want to introduce an object that has those pieces of data as member variables. If we follow that route, we’d apply Introduce Parameter Object to create a new class that you pass around, called something like BatchParameters. That helps, because instead of passing the same three arguments everywhere, we’re only passing one argument everywhere. (Incidentally, if you’re looking for rules of thumb: in really well factored code, methods generally only take at most two arguments. It’s not a universal rule, but if you find yourself writing methods with lots of arguments, ask yourself what you could do to shrink the argument lists.) But then that raises another smell: we’re passing the same argument everywhere! And when you have a bunch of methods called in close proximity that all take exactly the same object as one of their parameters (not just an object of the same type, but literally the same object), frequently that’s a sign that the methods in question should actually be methods on the object that’s a parameter. 
(Another way to think of this: you should still be passing around that same object as a parameter, but the parameter should be called this and should be hidden from you by the compiler!) And if you do that (I guess Move Method is the relevant term here?), moving the methods in question to BatchParameters, then BatchParameters becomes exactly the BatchPartitionSelector class from my example. So yeah, Method Object is great. But more fundamental principles like “group data used together into an object” and “turn repeated function calls with a shared parameter into methods on that shared parameter” are even better. And what’s even better than that is to remember Kent Beck’s four rules of simple design: those latter two principles are both themselves instances of Beck’s “No Duplication” rule. You just have to train your eyes to see duplication in its many forms.
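To make that thought experiment concrete, here is a minimal sketch of the two steps — Introduce Parameter Object, then Move Method — in Scala. The stand-in types, the names BatchParameters and partitionCount, and the partitioning logic (hashing the metric name modulo a partition count) are all placeholders for illustration; they are not the production code or algorithm.

// Stand-in types so the sketch is self-contained; the real ones are richer.
case class MetricDefinition(name: String)
case class TopicPartition(topic: String, partition: Int)

// Step 1 (Introduce Parameter Object): gather the values that every
// per-datapoint call needs into a single object that gets passed around.
case class BatchParameters(perMetric: Boolean, partitionCount: Int)

object ParameterPassingVersion {
  def partitionForBatch(metrics: Seq[MetricDefinition]): Seq[TopicPartition] = {
    val params = BatchParameters(perMetric = true, partitionCount = 16) // placeholder values
    metrics.map(metric => partitionFor(metric, params))
  }

  // Every helper ends up taking the same BatchParameters argument...
  private def partitionFor(metric: MetricDefinition, params: BatchParameters): TopicPartition =
    if (params.perMetric)
      TopicPartition("metrics", math.abs(metric.name.hashCode) % params.partitionCount)
    else
      TopicPartition("metrics", 0)
}

// Step 2 (Move Method): move those helpers onto BatchParameters itself, and it
// turns into the BatchPartitionSelector shape described in the post.
class BatchPartitionSketch(perMetric: Boolean, partitionCount: Int) {
  def partitionFor(metric: MetricDefinition): TopicPartition =
    if (perMetric)
      TopicPartition("metrics", math.abs(metric.name.hashCode) % partitionCount)
    else
      TopicPartition("metrics", 0)
}

Mapping partitionFor over a batch via a single new BatchPartitionSketch(...) instance evaluates the shared values exactly once per batch, which is the whole point of the pattern.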

May 9, 2017

Blog

Building Java Microservices with the DropWizard Framework

Blog

An Introduction to the AWS Application Load Balancer

Blog

The Importance of Logs

Blog

The DockerCon Scoop - Containers, Kubernetes and more!

Ahhh DockerCon, the annual convention for khaki pant enthusiasts. Oh, wait, not that Docker. Last week DockerCon kicked off with 5,500 developers, IT Ops engineers and enterprise professionals from across the globe. With the announcement of new features like LinuxKit and the Moby project, Docker is doubling down on creating tools that enable mass innovation while simplifying and accelerating the speed of the delivery cycle. Docker is starting to turn a corner, becoming a mature platform for creating mission-critical, enterprise-class applications. Throughout all of this, monitoring and visibility into your infrastructure continue to be critical to success. Current Trends In the world of containers, there are three trends we are seeing here at Sumo Logic. First is the rapid migration to containers. Containers provide great portability of code and easier deployments. Second is the need for visibility. While migrating to containers has simplified the deployment process, it is definitely a double-edged sword. The ability to monitor your containers' health, access the container logs and monitor the cluster on which your containers run is critical to maintaining the health of your application. The last trend is the desire to consolidate tools. You may have numerous tools helping you monitor your applications. Having multiple tools introduces "swivel chair" syndrome, where you have to switch back and forth between different tools to help diagnose issues as they are happening. You may start with a tool showing you some metrics on CPU and memory, indicating something is going wrong. Metrics only give you part of the visibility you need; you need to turn to your logs to figure out why it is happening. Monitoring Your Containers and Environment Sumo Logic's Unified Logs and Metrics are here to help give you full visibility into your applications. To effectively monitor your applications, you need the whole picture. Metrics give you insights into what is happening, and logs give you insights into why. The union of these two allows you to perform root cause analysis on production issues to quickly address the problem. Sumo Logic can quickly give you visibility into your Docker containers leveraging our Docker Logs and Docker Stats sources. Our Docker application allows you to gain immediate visibility into the performance of your containers across all of your Docker hosts. Collecting Logs and Metrics From Kubernetes At DockerCon, we saw increased use of Kubernetes, and we received many questions on how to collect data from Kubernetes clusters. We have created a demo environment that is fully monitored by Sumo Logic. This demo environment is a modern application leveraging a microservices architecture, running in containers on Kubernetes. So how do we collect that data? We created a FluentD plugin to gather the logs from the nodes in the cluster and enrich them with metadata available in Kubernetes. This metadata can be pulled into Sumo Logic, giving you an increased ability to search and mine your data. We run the FluentD plugin as a DaemonSet, which ensures we collect the logs from every node in our cluster. For metrics, we are leveraging Heapster's ability to output to a Graphite sink and using a Graphite Source on our collector to get the metrics into Sumo Logic.

Since Heapster can monitor metrics at the cluster, container and node level, we just need to run it and the collector as a deployment to get access to all the metrics that Heapster has to offer. What's Next What if you are not running in Kubernetes? In a previous post, we discussed multiple ways to collect logs from containers. Given the fast-paced growth in the container community, it is time to revisit that topic, and we will publish a follow-up post that dives deeper into it.

Blog

Add Logging to Your Apps with the New Sumo Logic Javascript Logging SDK

April 26, 2017

Blog

Best Practices for Creating Custom Logs - Part I

Overview When logging information about your operating system, services, network, or anything else, there are usually predefined log structures put in place by the vendor. Sometimes, though, there are no predefined logs created by the software, or you have custom application logs from your own software. Without properly planning the log syntax you'll be using, things can get messy and your data may lose its ability to tell the full story. These best practices for creating custom logs can be applied to most logging solutions. The 5 W's There are five critical components of a good log structure*:

When did it happen (timestamp)
What happened (e.g., error codes, impact level, etc.)
Where did it happen (e.g., hostnames, gateways, etc.)
Who was involved (e.g., usernames)
Where he, she, or it came from (e.g., source IP)

Additionally, your custom logs should have a standard syntax that is easy to parse, with distinct delimiters, key-value pairs, or a combination of both. An example of a good custom log is as follows:

2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flight/ - credit.payments.io - Success - 2 - 241.98

This log message shows when it was performed, what was performed, where it happened in your system, who performed it, and where that user came from. It's also structured cleanly, with a space-dash-space as the delimiter between fields. Optionally, you can also use key-value pairs to assist with parsing:

timestamp: 2017-04-10 09:50:32 -0700 - username: dan12345 - source_ip: 10.0.24.123 - method: GET - resource: /checkout/flight/ - gateway: credit.payments.io - audit: Success - flights_purchased: 2 - value: 241.98

Once you have settled on your log syntax and what will go into the logs, be sure to document it somewhere. You can document it by adding a comment at the top of each log file. Without documentation, you may forget, or someone else may not know, what something like "2" or "241.98" represents (for this example, it means 2 flights in the checkout at a value of $241.98). You can document your log syntax as such:

Timestamp - username - user_ip - method - resource - gateway - audit - flights_purchased - value

In the second part of this three-part series, we'll go into deeper detail around timestamps and log content. In the final part, we'll go even deeper into log syntax and documentation. *Source: Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding Logging and Log Management. Chuvakin, A., Phillips, C., & Schmidt, K. J. (2013).
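As a concrete illustration of emitting the example line above from application code, here is a minimal sketch in Scala using only the standard library. The CheckoutEvent type and its field names are hypothetical, chosen to match the example; they are not part of any Sumo Logic API.

import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter

// Hypothetical event type mirroring the documented syntax:
// Timestamp - username - user_ip - method - resource - gateway - audit - flights_purchased - value
case class CheckoutEvent(
  timestamp: ZonedDateTime,
  username: String,
  sourceIp: String,
  method: String,
  resource: String,
  gateway: String,
  audit: String,
  flightsPurchased: Int,
  value: BigDecimal
) {
  // Space-dash-space delimited, matching the documented field order.
  def toLogLine: String = {
    val ts = timestamp.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss Z"))
    Seq(ts, username, sourceIp, method, resource, gateway, audit, flightsPurchased, value).mkString(" - ")
  }
}

object CheckoutEventExample extends App {
  val event = CheckoutEvent(ZonedDateTime.now, "dan12345", "10.0.24.123",
    "GET", "/checkout/flight/", "credit.payments.io", "Success", 2, BigDecimal("241.98"))
  println(event.toLogLine)
}

Keeping the formatting in one place like this makes it much harder for individual log statements to drift away from the documented syntax.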

April 20, 2017

Blog

Best Practices for Creating Custom Logs - Part II

Diving Deeper Now that you have an overview of custom logs and what is involved in creating a good logging practice from Part I of the series, it's time to look further into what you should log in your system and why. This will be broken up into two parts: the first covers timestamps and content, and the second covers syntax and documentation. Timestamp The first and most critical component of just about any log syntax is your timestamp - the "when". A timestamp is important, as it tells you exactly when an event took place in the system and was logged. Without this component, you'll be relying on your log analysis solution to stamp it based upon when it came in. Adding a timestamp at the exact point when an entry is logged makes sure you are consistently and accurately placing the entry at the right point in time. RFC 3339 defines the standard time and date format for the internet. Your timestamp should include year, month, day, hour, minute, second, and time zone. Optionally, you may also want to include sub-second precision, depending on how precise your logs need to be for analysis. For Sumo Logic, you can read about the different timestamp formats that are supported here - Timestamps, Time Zones, Time Ranges, and Date Formats. Log Content To capture what happened, you can include data such as the severity of the event (e.g., low, medium, high; or 1 through 5), success or failure, status codes, resource URIs, or anything else that will help you or your organization know exactly what happened in an event. You should be able to take a single log message or entry out of a log file and know most or all of the critical information without depending on the log's file name, storage location, or automatic metadata tagging from your tool. Your logs should tell a story. If they're complex, they should also be documented, as discussed later on. Bad Logs For a bad example, you may have a log entry such as:

2017-04-10 09:50:32 -0700 Success

While you know that on April 10, 2017 at 9:50 a.m. (UTC-7) an event happened and it was a success, you don't really know anything else. If you know your system inside and out, you may know exactly what was successful; however, if you handed these logs over to a peer to do some analysis, they may be completely clueless! Good Logs Once you add some more details, the picture starts coming together:

2017-04-10 09:50:32 -0700 GET /checkout/flights/ Success

From these changes you know that on April 10th, a GET method was successfully performed on the resource /checkout/flights/. Finally, you may need to know who was involved and from where. While the previous log example can technically provide a decent amount of information, especially if you have a tiny environment, it's always good to provide as much detail as possible, since you don't know what you may need to know in the future. For example, usernames and user IPs are good to log:

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ Success

Telling the Story Now you have even more details about what happened. A username or IP may individually be enough, but sometimes (especially for security) you'll want to learn as much as you can about the user, since user accounts can be hacked and/or accessed from other IPs. You have just about enough at this point to really tell a story. To make sure you know whatever you can about the event, you also want to know where things were logged.

Again, while your logging tool may automatically do this for you, there are many factors that may affect the integrity of that metadata, and it's best to have your raw messages tell as much of the story as possible. To complete this, let's add the gateway that logged the entry:

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success

Now you know that this was performed on a gateway named credit.payments.io. If you have multiple gateways or containers, you may come to a point of needing to identify which one to fix. Omitting this data from your log may result in a headache trying to track down exactly where something occurred. This was just one example covering the basics of a log. You can add as much detail to this entry as you need, so you know whatever you can for any insight you need now or in the future. For example, you may want to know other information about this event. How many flights were purchased?

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success 2

Where 2 is the number of flights. What was the total value of the flights purchased?

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success 2 241.98

Where 2 is the number of flights, and they totaled $241.98. Now that you know what to put into your custom logs, you should also consider deciding on a standard syntax throughout your logs. This will be covered in the last part of this series on best practices for creating custom logs.
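Before moving on to syntax: since the timestamp is the piece teams most often get wrong, here is a minimal sketch (Scala, using only java.time from the standard library) of producing an RFC 3339 timestamp alongside the simpler format used in the examples above. The object and variable names are illustrative only.

import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter

object TimestampExamples extends App {
  val now = ZonedDateTime.now()

  // RFC 3339 / ISO 8601 style, e.g. 2017-04-10T09:50:32.123-07:00
  val rfc3339 = now.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME)

  // The simpler "yyyy-MM-dd HH:mm:ss Z" style used in this post's examples,
  // e.g. 2017-04-10 09:50:32 -0700
  val simple = now.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss Z"))

  println(rfc3339)
  println(simple)
}

Either format works; what matters is that the timestamp is generated at the moment the entry is logged, includes the time zone offset, and is used consistently across every log your application emits.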

April 20, 2017

Blog

Best Practices for Creating Custom Logs - Part III

Diving Even Deeper In Part I there was a general overview of custom logs, and Part II discussed timestamps and log content. At this point, you have a log that contains a lot of important data to help you analyze it and gather useful information about your systems. In this final part of the series, you'll learn how to organize the data in your logs and how to make sure you properly document it. Log Syntax You may have the most descriptive and helpful data in your logs, but it can be very difficult to analyze them if you don't have a defined and structured syntax. There are generally two ways to go about structuring your logs. Key-Value When it comes to log analysis and parsing, key-value pairs may be the simplest approach and allow for the most readable format. The previous example may not be the most human-readable format, and it may be a little more difficult to find anchors to parse against. You can change the message to be easier for humans to read and easier to parse in a tool like Sumo Logic:

timestamp: 2017-04-10 09:50:32 -0700, username: dan12345, source_ip: 10.0.24.123, method: GET, resource: /checkout/flights/, gateway: credit.payments.io, audit: Success, flights_purchased: 2, value: 241.98

You can take it a step further and structure your logs in a JSON format:

{
  "timestamp": "2017-04-10 09:50:32 -0700",
  "username": "dan12345",
  "source_ip": "10.0.24.123",
  "method": "GET",
  "resource": "/checkout/flights/",
  "gateway": "credit.payments.io",
  "audit": "Success",
  "flights_purchased": 2,
  "value": 241.98
}

In Sumo Logic, you have various ways to parse through this type of structure, including the basic Parse operator on predictable patterns or even Parse JSON. While it is ideal to use some sort of key-value pairing, it is not always the most efficient, as you're potentially doubling the size of each entry that gets sent and ingested. If you have low log volume, this isn't an issue; however, if you are generating logs at a high rate, it can become very costly to have log entries of that size. This brings us to the other format: delimited logs. Delimited Delimited logs are essentially the type of log you built in the previous examples. This means there is a set structure to your log format, and different content is broken up by some sort of delimiter.

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success 2 241.98

Because of how this example is structured, spaces are the delimiters. To an extent, this is perfectly reasonable. The problem it creates when parsing is figuring out where fields start and end, as you can see with the timestamp, even though it may be the most efficient and smallest format you can get for this log. If you need to stick with this format, you'll probably be stuck using regular expressions to parse your logs. This isn't a problem for some, but for others, regular expressions can understandably be a challenge. To reduce the need for regular expressions, you'll want to use a unique delimiter. A space can sometimes work, but it may require extra parsing for the timestamp. You may want to use a delimiter such as a dash, semicolon, comma, or another character (or character pattern) that you can guarantee will never appear in the data of your fields.

2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Success - 2 - 241.98

A syntax like this allows you to parse out the entire message with a space-dash-space ( - ) as your field delimiter.

The space-dash-space pattern makes sure that the dashes inside the timestamp are not counted as delimiters. Finally, to make sure you don't have an entry that can be improperly parsed, always include some sort of filler in place of any fields that may not have data. For example:

2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Failure - x - x

From this example, you know that the event was a failure. Because it failed, it didn't have flight totals or values. To avoid needing additional parsers for those missing fields, you can simply replace them with something like an 'x'. Note that if you're running aggregates or math against a field that is typically a number, you may need to add some additional logic to your search queries. Documentation You may have the greatest log structure possible, but without proper documentation it's possible to forget why something was part of your logging structure, or what certain fields represented. You should always document what your log syntax represents. Referring back to the previous log example:

2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Success - 2 - 241.98

You can document your log syntax as such:

Timestamp - username - user_ip - method - resource - gateway - audit - flights_purchased - value

This log syntax can be placed once at the very start of the log file for future reference if necessary. Conclusion At Sumo Logic, we regularly work with people who are new to logging and have many questions about how to get the most out of their logs. While you can start ingesting your logs and getting insights almost immediately, the information provided by the tool is only as good as the data it receives. Though most vendors do a good job of sticking to standard log structures with great data to drive these insights, it's up to you to standardize a custom-created log. In this series, I set out to help you create logs that have relevant data so you know as much as you can about your custom applications. As long as you stick to the "5 W's", structure your logs in a standard syntax, and document it, you'll be on the right track to getting the most out of Sumo Logic. Be sure to sign up for a free trial of Sumo Logic to see what you can do with your logs!
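To see why the unique delimiter pays off at parse time, here is a minimal sketch of parsing the space-dash-space format in plain Scala, with no regular expressions. The CheckoutRecord type and helper names are hypothetical, chosen to mirror the documented syntax above.

// Hypothetical parsed-record type mirroring the documented syntax:
// Timestamp - username - user_ip - method - resource - gateway - audit - flights_purchased - value
case class CheckoutRecord(
  timestamp: String, username: String, userIp: String, method: String,
  resource: String, gateway: String, audit: String,
  flightsPurchased: Option[Int], value: Option[BigDecimal]
)

object DelimitedParser {
  // "x" is the filler used for fields with no data, so treat it as None.
  private def intOrNone(s: String): Option[Int] = if (s == "x") None else Some(s.toInt)
  private def decimalOrNone(s: String): Option[BigDecimal] = if (s == "x") None else Some(BigDecimal(s))

  def parse(line: String): CheckoutRecord = {
    // Splitting on " - " leaves the dashes inside the timestamp intact.
    val Array(ts, user, ip, method, resource, gateway, audit, flights, value) = line.split(" - ")
    CheckoutRecord(ts, user, ip, method, resource, gateway, audit, intOrNone(flights), decimalOrNone(value))
  }
}

object DelimitedParserExample extends App {
  println(DelimitedParser.parse(
    "2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Success - 2 - 241.98"))
  println(DelimitedParser.parse(
    "2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Failure - x - x"))
}

Because the filler 'x' is handled explicitly, both the success and the failure lines parse with the same code path, which is exactly what a consistent syntax buys you.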

April 20, 2017

Blog

What does it take to implement & maintain a DevSecOps approach in the Cloud

Operational and Security Tips, Tricks and Best Practices In Gartner's Top 10 Strategic Technology Trends for 2016: Adaptive Security Architecture, they argued that "Security must be more tightly integrated into the DevOps process to deliver a DevSecOps process that builds in security from the earliest stages of application design." We ultimately need to move to this model if we are going to be successful and continue to reduce the dwell time of cyber criminals who are intent on compromising our applications and data. But how do we get there? Easier said than done. To answer this question, I sat down with our CISO and IANS Faculty Member George Gerchow about what it means to implement and maintain a DevSecOps approach in the cloud – and what operational and security best practices organizations should follow to ensure success in their move to the cloud. Below is a transcript of the conversation.

DevSecOps seems like a buzzword that everyone is using these days. What does DevSecOps really mean? George: It is really about baking security in from day one. When you're starting to put new workloads in the cloud or have these greenfield opportunities identified, start changing your habits and your behavior to incorporate security from the very beginning. In the past we used to have a hard-shell, soft-center type approach to security, and in the cloud there is no hard shell, and we don't run as many internal applications anymore. Now we're releasing these things out into the wild, into a hostile environment, so you have to be secure from day one. Your developers and engineers – you have to have people who think security first when they're developing code. That is the most important takeaway.

What does it really mean when you say baking security in… or the term shifting left, which I am starting to hear out there? George: It is about moving security earlier into the conversation, earlier into the software development lifecycle. You need to get developers to do security training. I'm talking about code review, short sprints, understanding what libraries are safe to use, and setting up feature flags that will check code in one piece at a time. The notion of a full release is a thing of the past – individual components are released continually. There also needs to be a QA mindset of testing the code and microservices to break them, and then fixing accordingly through your agile DevSecOps methodologies.

Sumo Logic is a cloud-native service that has been running in AWS for over seven years now – why did you decide to build your service in the cloud? Can you describe a bit about that journey: what was it like, what obstacles did you face, how did you overcome them? And lastly, what did you learn along the way? George: Our company founders came from HP ArcSight and knew full well the pain of managing the execution environment – the hardware and software provisioning, the large teams needed, the protracted time to roll out new services. The cloud enabled us to be agile, flexible, highly elastic, and to do this all securely at scale – at a level that was just not possible if we had chosen an on-prem model. The simplicity and automation capabilities of AWS were hugely attractive. You start setting up load balancers, leveraging tools like Chef to manage machine patching – it gets easier – and then you can start automating things from the very beginning. So I think it's that idea of starting very simple, leveraging the native services that cloud service providers give you, and then looking for the gaps.
The challenge initially was that this is a whole new world out and then the bigger challenge became getting people to buy off on the fact that cloud is more secure. People just weren’t there yet. What does Sumo Logic’s footprint look like in AWS? George: 100PB+ of data that is analyzed daily, 10K EC2 instances on any given day, 10M keys under management! We have over 1,300 customers and our service is growing by leaps and bounds. At this stage, it is all about logos – you wanna bring people in and you can’t afford to have bad customer service because this is a subscription-based model. When you think about the scale that we have, it’s also the scale that we have to protect our data. Now the challenge of quadrupling that number every year is extremely difficult so you have a long term view when it comes to scalability of security. 10,000+ instances it’s a very elastic type environment and auditors really struggle with this. One of the things that i’m the most proud of…if you look at hundreds of petabytes processed and analyzed daily, thats insane…thats the value of being in the cloud. 10 million of keys under management…thats huge…really?? George: It’s a very unique way that we do encryption. It makes our customers very comfortable with the dual control models…that they have some ownership over the keys and then we have capability to vault the keys for them. We do rotate the keys every 24 hours. The customers end up with 730 unique key rings on their keys at the end of the year. It’s a very slick, manageable program. We do put it into a vault and that vault is encrypted with key encryption key (KEK). So What Tools and Technologies are you using in AWS? George: Elastic load balancers are at the heart of what we do…we set up those load balancers to make sure that nothing that’s threatening gets through…so that’s our first layer of defense and then use security groups and we use firewalls to be able to route traffic to the right places to make sure only users can access. We use file integrity monitoring and we happen to use host sec for that across every host and we manage and that gives us extreme visibility. We also leverage IDS and snort and those signatures across those boxes to detect any kind of signature based attacks. Everything we do in the cloud is agentless or on the host. When you’re baking security in you have it on ALL of your systems, spun up automatically via scripts. We also have a great partnership with Crowdstrike, where threat intelligence is baked into our platform to identify malicious indicators of compromise and match that automatically to our customers logs data – very powerful So how are you leveraging Sumo to secure your own service? Can you share some of the tips, tricks and best practices you have gleaned over the years? George: Leveraging apps like CloudTrail, now we are able to see when a event takes place who is the person behind the event, and start looking for the impact of the event. I’m constantly looking for authorization type events (looking at Sumo Dashboards). When it comes to compliance I have to gather evidence of who is in the security groups. Sumo is definitely in the center of everything that we do. We have some applications built also for PCI and some other things as well to VPC flow logs but it gives us extreme visibility. We have dashboards that we have built internally to manage the logs and data sources. It is extremely valuable once you start correlating patterns of behavior and unique forms of attack patterns across the environment. 
You need to be able to identify how the change you just made impacts the network traffic and latency in your environment, pulling in things like AWS Inspector: how did that change impact my compliance and security posture? You want to have the visibility, but then measure the level of impact when someone does make a change, and even more proactively, I want to have visibility when something new is added to the environment or when something is deleted from the environment. Natively in AWS, it is hard to track these things.

How does the Sumo Logic technology stack you talked about earlier help you with compliance? George: Being able to do evidence gathering and prove that you're protecting data is difficult. We're protecting cardholder data, healthcare data and a host of other PII from the customers we serve across dozens of industries. We pursue our own security attestations like PCI, CSA STAR, ISO 27001, SOC 2 Type 2, and more. We do not live vicariously through the security attestations of AWS like too many organizations do. Also, encryption across the board. All of these controls and attestations give people a level of confidence that we are doing the right things to protect their data, and there's actual evidence gathering going on. Specifically with respect to PCI, we leverage the Sumo Logic PCI apps for evidence gathering – nonstop – across CloudTrail, Windows and Linux servers. We built out those apps for internal use, but released them to the public at RSA.

There are a lot of threat actors out there, from cyber criminals to corporate spies, hacktivists and nation states. How do you see the threat landscape changing with respect to the cloud? Is the risk greater given the massive scale of the attack surface? If someone hacked into an account, could they cause more damage by pointing their attack at Amazon, from within the service, possibly affecting millions of customers? George: It all starts with password hygiene. People sacrifice security for convenience. It's a great time for us to start leveraging single sign-on and multi-factor authentication and all these different things that need to be involved, but at a minimum end users should use heavily encrypted passwords… they should not bring their personal application passwords into the business world… If you start using basic password hygiene from day one, you're going to follow the best habits in the business world. The people who should be the most responsible are not… I look at admins and developers in this way… all of a sudden you have a developer putting their full-blown credentials into a Slack channel.

So when you look out toward the future, with respect to the DevSecOps movement, the phenomenal growth of cloud providers like AWS and Azure, machine learning and artificial intelligence, the rise of security as code… what are your thoughts, where do you see things going, and how should companies respond? George: First off, for the organizations that aren't moving out to the cloud, at one point or another, you're going to find yourself irrelevant or out of business. Secondly, you're going to find that the cloud is very secure. You can do a lot using cloud-based security if you bake security in from day one and work with your developers… if you work with your team… you can be very secure. The future will hold a lot of cloud-based attacks. User behavior analytics… I can no longer go through this world of security with hard-coded rules and certain things that I'm constantly looking for, with all these false positives.
I have to be able to leverage machine learning algorithms to consume and crunch through that data. The world is getting more cloudy more workloads moving into the cloud, teams will be coming together…security will be getting more backed in into the process. How would you summarize everything? George: “You’re developing things, you wanna make sure you have the right hygiene and security built into it and you have visibility into that and that allows you to scale as things get more complex where things actually become more complex is when you start adding more humans into it and you have less trust but if you have that scalability and visibility from day one and a simplistic approach, it’s going to do a lot of good for you. Visibility allows you to make quick decisions and it allows you to automate the right things and ultimately you need to have visibility because it allows you to have the evidence that you need to be compliant to help people feel comfortable that you’re protecting your data in the right way. George Gerchow can be reached at https://www.linkedin.com/in/georgegerchow or @georgegerchow

Blog

Top Patterns for Building a Successful Microservices Architecture

Why do you need patterns for building a successful microservices architecture? Shouldn’t the same basic principles apply, whether you’re designing software for a monolithic or microservices architecture? Those principles do largely hold true at the highest and most abstract levels of design (i.e., the systems level), and at the lowest and most concrete levels (such as classes and functions). But most code design is really concerned with the broad range between those two extremes, and it is there that the very nature of microservices architecture requires not only new patterns for design, but also new patterns for reimagining existing monolithic applications. The truth is that there is nothing in monolithic architecture that inherently imposes either structure or discipline in design. Almost all programming languages currently in use are designed to enforce structure and discipline at the level of coding, of course, but at higher levels, good design still requires conscious adherence to methodologies that enforce a set of architectural best practices. Microservices architecture, on the other hand, does impose by its very nature a very definite kind of structural discipline at the level of individual resources. Just as it makes no sense to cut a basic microservice into arbitrary chunks, and separate them, it makes equally little sense to bundle an individual service with another related or unrelated service in an arbitrary package, when the level of packaging that you’re working with is typically one package per container. Microservices Architecture Requires New Patterns In other words, you really do need new patterns in order to successfully design microservices architecture. The need for patterns starts at the top. If you are refactoring a monolithic program into a microservices-based application, the first pattern that you need to consider is the one that you will use for decomposition. What pattern will you use as a guide in breaking the program down into microservices? What are the basic decomposition patterns? At the higher levels of decomposition, it makes sense to consider such functional criteria as broad areas of task-based responsibility (subdomains), or large-scale business/revenue-generating responsibilities (business capabilities). In practice, there is considerable overlap between these two general functional patterns, since a business’ internal large-scale organization of tasks is likely to closely match the organization of business responsibilities. In either case, decomposition at this level should follow the actual corporate-level breakdown of basic business activities, such as inventory, delivery, sales, order processing, etc. In the subsequent stages of decomposition, you can define groups of microservices, and ultimately individual microservices. This calls for a different and much more fine-grained pattern of decomposition—one which is based largely on interactions within the application, with individual users, or both. Decomposition Patterns for Microservices Architecture There are several ways to decompose applications at this level, depending in part on the nature of the application, as well as the pattern for deployment. You can combine decomposition patterns, and in many if not most cases, this will be the most practical and natural approach. 
Among the key microservice-level decomposition patterns are: Decomposition by Use Case In many respects, this pattern is the logical continuation of a large-scale decomposition pattern, since business capabilities and subdomains are both fundamentally use case-based. In this pattern, you first identify use cases: sequences of actions which a user would typically follow in order to perform a task. Note that a user (or actor) does not need to be a person; it can, in fact, be another part of the same application. A use case could be something as obvious and common as filling out an online form or retrieving and displaying a database record. It could also include tasks such as processing and saving streaming data from a real-time input device, or polling multiple devices to synchronize data. If it seems fairly natural to model a process as a unified set of interactions between actors with an identifiable purpose, it is probably a good candidate for the use case decomposition pattern. Decomposition by Resources In this pattern, you define microservices based on the resources (storage, peripherals, databases, etc.) that they access or control. This allows you to create a set of microservices which function as channels for access to individual resources (following the basic pattern of OS-based peripheral/resource drivers), so that resource-access code does not need to be duplicated in other parts of the application. Isolating resource interfaces in specific microservices has the added advantage of allowing you to accommodate changes to a resource by updating only the microservice that accesses it directly. Decomposition by Responsibilities/Functions This pattern is likely to be most useful in the case of internal operations which perform a clearly defined set of functions that are likely to be shared by more than one part of the application. Such responsibility domains might include shopping cart checkout, inventory access, or credit authorization. Other microservices could be defined in terms of relatively simple functions (as is the case with many built-in OS-based microservices) rather than more complex domains. Microservices Architecture Deployment Patterns Beyond decomposition, there are other patterns of considerable importance in building a microservices-based architecture. Among the key patterns are those for deployment. There are three underlying patterns for microservices deployment, along with a few variations: Single Host/Multiple Services In this pattern, you deploy multiple instances of a service on a single host. This reduces deployment overhead, and allows greater efficiency through the use of shared resources. It has, however, greater potential for conflict, and security problems, since services interacting with different clients may be insufficiently isolated from each other. Single Service per Host, Virtual Machine, or Container This pattern deploys each service in its own environment. Typically, this environment will be a virtual machine (VM) or container, although there are times when the host may be defined at a less abstract level. This kind of deployment provides a high degree of flexibility, with little potential for conflict over system resources. Services are either entirely isolated from those used by other clients (as is the case with single-service-per-VM deployment), or can be effectively isolated while sharing some lower-level system resources (i.e., containers with appropriate security features). 
Deployment overhead may be greater than in the single host/multiple services model, but in practice, this may not represent significant cost in time or resources. Serverless/Abstracted Platform In this pattern, the service runs directly on pre-configured infrastructure made available as a service (which may be priced on a per-request basis); deployment may consist of little more than uploading the code, with a small number of configuration settings on your part. The deployment system places the code in a container or VM, which it manages. All you need to make use of the microservice is its address. Among the most common serverless environments are AWS Lambda, Azure Functions, and Google Cloud Functions. Serverless deployment requires very little overhead. It does, however, impose significant limitations, since the uploaded code must be able to meet the (often strict) requirements of the underlying infrastructure. This means that you may have a limited selection of programming languages and interfaces to outside resources. Serverless deployment also typically rules out stateful services. Applying Other Patterns to Microservices Architecture There are a variety of other patterns which apply to one degree or another to microservices deployment. These include patterns for communicating with external applications and services, for managing data, for logging, for testing, and for security. In many cases, these patterns are similar for both monolithic and microservices architecture, although some patterns are more likely to be applicable to microservices than others. Fully automated parallel testing in a virtualized environment, for example, is typically the most appropriate pattern for testing VM/container-based microservices. As is so often the case in software development (as well as more traditional forms of engineering), the key to building a successful microservices architecture lies in finding the patterns that are most suitable to your application, understanding how they work, and adapting them to the particular circumstances of your deployment. Use of the appropriate patterns can provide you with a clear and accurate roadmap to successful microservices architecture refactoring and deployment. About the Author Michael Churchman is involved in the analysis of software development processes and related engineering management issues. Top Patterns for Building a Successful Microservices Architecture is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Getting Started with Graphite Monitoring

Graphite is a complete monitoring tool for both simple and complex environments. It can be used to monitor various networked systems—including sites, apps, services and servers in local and cloud environments. This range of possibilities serves companies of diverse segments and sizes. In this post, I'll explain how Graphite monitoring can help you get greater visibility into your application and infrastructure. A Quick Graphite Monitoring FAQ Graphite can support a company that has a specialized infrastructure, with a dedicated team and complex environments. It can also work well for small companies that have smaller teams and less equipment. When it comes to choosing a monitoring tool, an admin usually asks several questions, such as: How long does it take to deploy? Does this tool address our issues? Will my team be able to work with and understand the essence of the tool? What reports are available? An administrator needs to ask these questions so that the choice of tool is as accurate as possible. Below are the answers to these questions for Graphite. How long does Graphite take to set up? Setting up and deploying Graphite is simple thanks to an installation script called Synthesize, and extensive technical documentation on Graphite makes it possible to gather information on the tool quickly. Essentially, Synthesize is a script that automates the installation and configuration of the range of components that make up Graphite. Does this tool work in my environment? Graphite can support almost any type of environment you need to run it in. It works in the cloud. It runs on hybrid infrastructure. It works with on-premises servers. And it can run efficiently no matter what the size of your environment. Will my team be able to work with and understand the essence of the tool? If your team is able to read and interpret the documentation, they will understand the essence of the tool. As previously mentioned, Graphite has thorough documentation with step-by-step instructions, and scripts are available to change according to your needs. What reports are available? Graphite reports are inclusive, well-crafted and easy to manipulate. This is useful because reports are often used by people who are not part of the technology team, and they need to be able to understand the information quickly. The reports will be used most often to justify requests for purchases of new equipment, as well as hardware upgrades, and for performance measurement. How Graphite Monitoring Works Now that we've discussed what Graphite is and where you can use it, let's take a look at how it works. Graphite is composed of the following items: Carbon - a service that is installed on the client computer and listens for the TCP/UDP packets sent over the network. Whisper - the database that stores the data collected from your machines. Graphite Webapp - a Django web app for graph generation. To provide a sense of Graphite in action, I'll next provide an overview of how it works on different Linux distributions. I'll discuss my experience installing and configuring Graphite on both Ubuntu and CentOS, so that I cover both sides (Ubuntu/Debian and CentOS/Red Hat) of the Linux universe. Installing Graphite on Ubuntu Using Synthesize I installed Graphite on an Ubuntu server hosted in the cloud.
Everything went smoothly, but here are a couple of special tweaks I had to perform: After logging into the cloud environment and installing Ubuntu 14.04 to test the Synthesize script, it was necessary to release a port on the cloud platform's firewall so that the data could be sent to the dashboard (which should be done in any infrastructure, be it cloud or local). I had to be careful to release only the application port and not the full range. After that, I used a data collection dashboard recommended by Graphite, called Grafana. However, you can also stream Graphite data directly into Sumo Logic. Manual Installation of Graphite on CentOS Now let's try a manual installation of Graphite, without using Synthesize. In a manual installation, we have to be careful about the dependencies that the operating system requires to run smoothly. To make the job a bit more complex for this example, I decided to use CentOS, and I followed the steps below. The first requirement for every operating system is to upgrade (if you haven't already) so that it does not cause dependency problems later:

sudo yum -y update

Install the dependencies:

sudo yum -y install httpd gcc gcc-c++ git pycairo mod_wsgi epel-release python-pip python-devel blas-devel lapack-devel libffi-devel

Access the local folder to download the sources:

cd /usr/local/src

Clone the source of Carbon:

sudo git clone https://github.com/graphite-project/carbon.git

Access the Carbon folder:

cd /usr/local/src/carbon/

Install Carbon:

sudo python setup.py install

Clone the source of Graphite Web:

sudo git clone https://github.com/graphite-project/graphite-web.git

Access the Graphite Web folder:

cd /usr/local/src/graphite-web/

Install Graphite Web:

sudo pip install -r /usr/local/src/graphite-web/requirements.txt
sudo python setup.py install

Copy the Carbon configuration file:

sudo cp /opt/graphite/conf/carbon.conf.example /opt/graphite/conf/carbon.conf

Copy the storage schemas configuration file:

sudo cp /opt/graphite/conf/storage-schemas.conf.example /opt/graphite/conf/storage-schemas.conf

Copy the storage aggregation configuration file:

sudo cp /opt/graphite/conf/storage-aggregation.conf.example /opt/graphite/conf/storage-aggregation.conf

Copy the relay rules configuration file:

sudo cp /opt/graphite/conf/relay-rules.conf.example /opt/graphite/conf/relay-rules.conf

Copy the local settings file:

sudo cp /opt/graphite/webapp/graphite/local_settings.py.example /opt/graphite/webapp/graphite/local_settings.py

Copy the Graphite WSGI file:

sudo cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi

Copy the virtual hosts file:

sudo cp /opt/graphite/examples/example-graphite-vhost.conf /etc/httpd/conf.d/graphite.conf

Copy the init files to /etc/init.d:

sudo cp /usr/local/src/carbon/distro/redhat/init.d/carbon-* /etc/init.d/

Give permission to execute the init files:

sudo chmod +x /etc/init.d/carbon-*

Start the Carbon cache:

sudo systemctl start carbon-cache

Enable httpd:

sudo systemctl enable httpd

Start httpd:

sudo systemctl start httpd

With this configuration, we can access the Graphite web interface at https://localhost:8080 and monitor the local server running CentOS. Graphite and Sumo Logic Now that you have Graphite running, you can stream Graphite-formatted metrics directly into Sumo Logic. All you need to do is set up an installed collector and connect it to a metrics source. This webinar walks you through the steps. Sources are the environments that Sumo Logic Collectors connect to in order to collect data from your site.
Each Source is configured to collect files in a specific way, depending on the type of Collector you’re using. The Setup Wizard in Sumo Logic walks you through the process. You’ll find Linux, Mac OS and Windows instructions for installing a new Graphite collector here. Part of this process defines how your data will be tagged for _sourceCategory, the protocol and port you’ll use to stream data. Next, simply configure a Graphite source for the collector to connect to. Here are the steps for configuring your Graphite source. That’s it. Now you can use Sumo Logic’s advanced analytics to search and visualize data streaming from your application and infrastructure. Enjoy! About the Author Brena Monteiro is a software engineer with experience in the analysis and development of systems. She is a free software enthusiast and an apprentice of new technologies. Getting Started with Graphite Monitoring is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.
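One last tip: if you want to sanity-check the pipeline before wiring up a real application, Graphite's plaintext protocol is simple enough to exercise by hand — each line is "metric.path value timestamp" sent to the Carbon (or Graphite source) host and port. Here is a minimal sketch in Scala; the host, port and metric name are placeholders for your own setup, not values from this tutorial.

import java.io.PrintWriter
import java.net.Socket

// Sends a single datapoint using the Graphite plaintext protocol:
// "<metric.path> <value> <unix-timestamp>\n"
object GraphiteTestMetric extends App {
  val host = "localhost"            // placeholder: your Carbon host or Graphite source
  val port = 2003                   // placeholder: 2003 is Carbon's default plaintext port
  val metric = "test.webapp.cpu.load"
  val value = 0.42
  val timestamp = System.currentTimeMillis() / 1000

  val socket = new Socket(host, port)
  val out = new PrintWriter(socket.getOutputStream, true)
  try {
    out.println(s"$metric $value $timestamp")
  } finally {
    out.close()
    socket.close()
  }
}

If the datapoint shows up in your dashboard (or in Sumo Logic's metrics search), the collection path is working and you can point your real application at the same endpoint.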

Blog

CloudFormation and Sumo Logic - Build Monitoring into your Stack

Curious about Infrastructure as Code (IaC)? Whether you're new to AWS CloudFormation, or you control all of your cloud infrastructure through CloudFormation templates, this post demonstrates how to integrate Sumo Logic's monitoring platform into an AWS CloudFormation stack. Collect Logs and Metrics from your Stack Sumo Logic's ability to Unify your Logs and Metrics can be built into your CloudFormation Templates. Collect operating system logs, web server logs, application logs, and other logs from an EC2 instance. Additionally, Host Metrics, AWS CloudWatch Metrics, and Graphite formatted metrics can be collected and analyzed.With CloudFormation and Sumo Logic, you can achieve version control of your AWS infrastructure and your monitoring platform the same way you version and improve your software. CloudFormation Wordpress Stack with Sumo Logic Built-In Building off of the resources Adrian Cantrill provided in his Advanced CloudFormation course via A Cloud Guru, we will launch a test Wordpress stack with the following components: Linux EC2 instance - you choose the size! RDS instance - again, with a configurable size S3 bucket The Linux EC2 instance is bootstrapped with the following to create a LAMP stack: Apache MySQL PHP MySQL-PHP Libraries We also install Wordpress, and the latest version of the Sumo Logic Linux collector agent. Using the cfn-init script in our template, we rely on the file key of AWS::CloudFormation::Init metadata to install a sources.json file on the instance. This file instructs Sumo Logic to collect various types of logs and metrics from the EC2 instance: Linux OS Logs (Audit logs, Messages logs, Secure logs) Host Metrics (CPU, Memory, TCP, Network, Disk) Apache Access logs cfn-init logs Tutorial - Launch a CloudFormation Stack and Monitor Logs and Metrics Instantly First, you'll need a few things: A Sumo Logic account - Get a free one Here Access to an AWS account - If you don't have access you can sign up for the free tier here A local EC2 Key Pair - if you don't have one you can create one like this After you have access to your Sumo Logic account and an AWS account, navigate to an unused Region if you have one. This will give you a more isolated sandbox to test in so that we can more clearly see what our CloudFormation template creates. Make sure you have an EC2 key pair in that Region, you'll need to add this to the template.*Leveraging pseudo parameters, the template is portable, meaning it can be launched in any Region. First, log into AWS and navigate to CloudFormation. Choose 'Create New Stack' Then, download the example CloudFormation template from GitHub here Next, on line 87, in the EC2 Resources section, make sure to edit the value of the "KeyName" field to whatever your EC2 key is named for your current Region*Make sure the Region you choose to launch the stack in has an EC2 Key Pair, and that you update line 87 with your key's name. If you forget to do this your stack will fail to launch! Select 'Choose File' and upload the template you just downloaded and edited, then click Next Title your stack Log into Sumo Logic. and in the top-right click on your email username, then preferences, then '+' to create a Sumo Logic Access key pair Enter the Sumo Logic key pair into the stack details page. You can also select an EC2 and RDS instance size, and enter a test string that we can navigate to later when checking that we can communicate with the instance. 
Click 'Next', name/tag your stack if you'd like, then click 'Next' again, then select 'Create' to launch your stack! Now What? View Streaming Logs and Metrics! You've now launched your stack. In about 10-15 minutes, we can visit our Wordpress server to verify everything is working. We can also search our Apache logs and see any visitors (probably just us) that are interacting with the instance. Follow these steps to explore your new stack, and your Sumo Logic analytics: View the CloudFormation Events log. You should see four CREATE_COMPLETE statuses. Check your Sumo Logic account to see the collector and sources that have been automatically provisioned for you. What's Next? Sumo Logic collects AWS CloudWatch metrics, S3 Audit logs, and much more. Below is more information on the integrations for AWS RDS Metrics and also S3 Audit Logs: Amazon RDS Metrics Amazon S3 Audit Explore your logs! Try visiting your web server by navigating to your EC2 instance's public IP address. This template uses the default security group of your Region's VPC, so you'll need to temporarily allow inbound HTTP traffic from either your IP or anywhere (your IP is recommended). To do this, navigate to the EC2 console and select the Linux machine launched via the CloudFormation template. Then, scroll down to the Security Group and click 'default'. Edit the inbound rules to allow HTTP traffic in, either from your IP or anywhere. After you've allowed inbound HTTP traffic, navigate in your browser to <your-public-ip>/wordpress (something like 54.149.214.198/wordpress) and you'll see your new Wordpress front end. You can also test the string we entered during setup by navigating to <your-public-ip>/index2.html. Search your Sumo Logic account with _sourceCategory=test/apache and view your visits to your new Wordpress web server in the logs. Finally, check out the metrics on your instance by installing the Host Metrics App. Cleanup Make sure to delete your stack, and to remove the inbound HTTP rules on your default Security Group.

Blog

ELK Stack vs. Sumo Logic: Building or Buying Value?

Blog

The Great Big Wall and Security Analytics

Not long ago I was visiting the CISO of a large agriculture biotechnology company in the Midwest – we'll call him Ron – and he said to me, "Mark, these cyber terrorists are everywhere, trying to hack into our systems from Russia and China, trying to steal our intellectual property. We have the biggest and the brightest people and the most advanced systems working on it, but they are still getting through. We are really challenged in our ability to identify and resolve these cyber threats in a timely manner. Can you help us?" The business issues that CISOs and their security teams face are significant. Customers are now making different decisions based on the trust they have in the companies they do business with. So implementing the right levels of controls and increasing team efficiency to rapidly identify and resolve security incidents becomes of paramount importance. But despite this big wall that Ron has built, and the SIEM technology they are currently using, threats are still permeating the infrastructure, trying to compromise their applications and data. With over 35 security technologies in play, trying to get holistic visibility was a challenge, and with a small team, managing their SIEM was onerous. Additionally, the hardware and refresh cycles over the years, as their business has grown, have been challenged by flat budget allocations. "Do more with less" was frequently what they heard back from the CIO. Like any company that wants to be relevant in this modern age, they are moving workloads to the cloud, adopting DevOps methodologies to increase the speed of application delivery, and creating new and disruptive experiences for their customers to maintain their competitive edge. But as workloads were moved to the cloud – they chose AWS – the way things were done in the past was no longer going to work. The approach to security needed to change. And it was questionable whether the SIEM solution they were using was even going to run in the cloud and support native AWS services, at scale. SIEMs are technologies that were architected over 15 years ago, and they were really designed to solve a different kind of problem – traditional on-prem, perimeter-based, Mode 1 type security applications, going after known security threats. But as organizations start to move to the cloud, accelerating the pace at which they roll out new code and adopting DevOps methodologies, they need something different. Something that aligns to the Mode 2 digital initiatives of modern applications. Something that is cloud-native, provides elasticity on demand, and delivers rapid time to value, not constrained by fixed rule sets going after known threats but instead leveraging machine learning algorithms to uncover anomalies, deviations and unknown threats in the environment. And lastly, something that integrates threat intelligence out of the box to increase the velocity and accuracy of threat detection – so you can get a handle on threats coming at your environment, trying to compromise your applications and data. Is that great big wall working for you? Likely not. To learn more about Sumo Logic's Security Analytics capabilities, please check out our press release, blog or landing page. Mark Bloom can be reached at https://www.linkedin.com/in/markbloom or @bloom_mark

Blog

Ever wondered how many AWS workloads run on Docker?

Blog

Provide Real-Time Insights To Users Without A Sumo Logic Account

You just finished building some beautiful, real-time Sumo Logic dashboards to monitor your infrastructure and application performance and now you want to show them off to your colleagues. But your boss doesn't have a Sumo Logic account and your ops team wants this information on TVs around the office. Sound like a familiar situation? We've got you covered. You can now share your live dashboards in view-only mode with no login required, all while maintaining the security and transparency that your organization requires. We'll even kick things off with a live dashboard of our own. Share Information with Colleagues and Customers This new feature enables you to share a dashboard so that anyone with the URL can view your dashboard without logging in. It reduces the friction for sharing information even further so that the right people have the right information when they need it. For example: Colleagues: Share operational and business KPIs with colleagues or executives who do not have a Sumo Logic account. Internal TVs: Display real-time information about your infrastructure and application on monitors throughout your building. Customers: Provide SLA performance or other statistics to your customers. Granular Permissions for Administrators Sharing your sensitive information with users who don't log in is a serious matter. With great power comes great responsibility, and no matter how much you trust your colleagues that use Sumo Logic, you may not want this power being wielded by all of your team members. If you are an administrator, you can decide which users have this permission and educate them on best practices for sharing information within and outside of your organization. By default, this capability is turned off and can only be enabled by administrators on the account. Protect Dashboard URLs with an IP / CIDR Whitelist For those who want even more protection over who can view these dashboards without logging in, you can restrict viewers to only those accessing it from specific IP addresses or CIDRs. This works great when you are placing live dashboards on TVs throughout your building and you want to make sure that this information stays in your building. Similarly, you might want to help your internal ops team troubleshoot a problem quickly without logging in. Send them the URL via email or Slack, for example, and rest assured that the information will remain in the right hands. If you decide to remove an IP address from your whitelist, any users connecting from that IP will no longer be able to view that dashboard. Complete Visibility through Audit Logs As an extra layer of transparency, you can keep track of which dashboards are shared outside of your organization and see which IPs are viewing them through your audit logs. With this information, you can: Configure real-time alerts to get notified anytime a user shares a dashboard Generate daily or weekly reports with a list of users and their shared dashboards Create dashboards of your shared dashboards – see where your dashboards are being viewed from so you can follow up on any suspicious activity. Receive alerts when someone shares a dashboard outside of your organization Use audit logs to see where your dashboards are being viewed from Learn More So go ahead – earn those bonus points with your boss and show off your dashboards today! Check out this webinar for a refresher on creating dashboards, then head over to Sumo Logic DocHub for more information on sharing these with users without an account.
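To make the IP/CIDR whitelist idea concrete, here is a small, hypothetical Python sketch (not Sumo Logic's implementation) that checks a viewer's address against a list of allowed CIDRs; the ranges used are example documentation addresses.

import ipaddress

# Example whitelist: an office /24 plus a single home IP (placeholder ranges).
ALLOWED_CIDRS = [ipaddress.ip_network(c) for c in ("203.0.113.0/24", "198.51.100.42/32")]

def is_allowed(viewer_ip: str) -> bool:
    # Return True if the viewer's IP falls inside any whitelisted CIDR.
    ip = ipaddress.ip_address(viewer_ip)
    return any(ip in network for network in ALLOWED_CIDRS)

print(is_allowed("203.0.113.17"))  # True  - inside the office /24
print(is_allowed("192.0.2.99"))    # False - not whitelisted, so the dashboard stays hidden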

March 10, 2017

Blog

Sumo Logic launches Multi-Factor Authentication for its Platform

The biggest risk to any organization is the end user and their password hygiene. It is an age-old problem, as users want to keep things easy by using the same dictionary-based password for all applications! This problem will continue to exist until we change end user behavior via policy enforcement and apply other layers of protection such as Single Sign-On and Multi-Factor Authentication (MFA). Because of this, MFA is becoming more of a must than a nice-to-have as companies start to adopt a healthier security posture and program. In fact, MFA has become a full-blown requirement to achieve critical compliance certifications that would provide your company with a better security reputation and demonstrate evidence of data protection. As a Cloud Security Engineer, I would love for MFA to be adopted across the board, which is part of the reason we are writing this blog: to provide our insights into the importance of implementing MFA across an enterprise. As some of you may have recently heard, Sumo Logic is now PCI DSS 3.2 compliant, which we could not have achieved without the diligence of our DevSecOps team putting in some cycles to deliver Multi-Factor Authentication to the Sumo Logic base via the platform for another layer of password defense. When logging into the Sumo platform, you can now enable 2-step verification for your entire organization within the security policies section of Sumo, as seen below. When Multi-Factor Authentication is enabled globally for the org, you will be prompted with the following screen to configure your MFA. Every login from here on out will prompt the following screen after configuration is complete. What does multi-factor authentication provide to the end user? A low-friction way to keep their credentials from being compromised and make it extremely difficult for attackers to take advantage of weak end user passwords. With the emergence of cloud computing, password-based security just won't cut it anymore. Applying this extra layer of defense to credentials drastically drops the chance of your account ever being compromised. At Sumo Logic, we are glad to extend this extra layer of defense to our customers as they access our multi-tenant SaaS offering.

Blog

AWS CodePipeline vs. Jenkins CI Server

Blog

OneLogin Integrates with Sumo Logic for Enhanced Visibility and Threat Detection

OneLogin and Sumo Logic are thrilled to announce our new partnership and technology integration (app coming May 2017) between the two companies. We're alike in many ways: we're both cloud-first, our customers include both cloud natives and cloud migrators, and we are laser-focused on helping customers implement the best security with the least amount of effort. Today's integration is a big step forward in making effortless security a reality. What does this integration do? OneLogin's identity and access management solution allows for the easy enforcement of login policies across all their laptops, both Macs and Windows, SaaS applications, and SAML-enabled desktop applications. This new partnership takes things a step further by making it possible to stream application authentication and access events into Sumo Logic. This includes over 200 application-related events, including: Who's logged into which laptops — including stolen laptops Who's accessed which applications — e.g., a salesperson accessing a finance app Who's unsuccessfully logged in — indicating a potential attack in progress Who's recently changed their password — another potential indicator of an attack Which users have lost their multi-factor authentication device — indicating a potential security weakness Which users have been suspended — to confirm that a compromised account is inactive User provision and de-provision activity – to track that users are removed from systems after leaving the company And finally, which applications are the most popular and which might be underutilized, indicating potential areas of budget waste These capabilities are critical for SecOps teams that need to centralize and correlate machine data across all applications. This, in turn, facilitates early detection of targeted attacks and data breaches, extends audit trails to device and application access, and provides a wider range of user activity monitoring. Because OneLogin has over 4,000 applications in its app catalog, and automatically discovers new applications and adds them to the catalog, we can help you extend visibility across a wide range of unsanctioned Shadow IT apps. The integration uses streaming, not polling. This means that events flow from OneLogin into Sumo as soon as they are generated, not after a polling interval. This lets you respond more quickly to attacks in progress. How does the integration work? Since both OneLogin and Sumo Logic are cloud-based, integrating the two is a simple one-screen setup. Once integration is complete, you can use Sumo Logic to query OneLogin events, as well as view the following charts: Visitors heatmap by metro area. Suppose you don't have any known users in Alaska — that anomaly is quite clear here, and you can investigate further. Logins by country. Suppose you don't have any known users in China; 80 potentially malicious logins are evident here. Failed logins over time. If this number spikes, it could indicate a hacking attempt. Top users by events. If one user has many events, it could indicate a compromised account that should be deactivated in OneLogin. Events by app. If an app is utilized more than expected, it could indicate anomalous activity, such as large amounts of data downloads by an employee preparing to leave the company. All this visibility helps customers better understand how security threats could have started within their company. 
This is especially helpful when it comes to phishing attacks, which, according to a recent report by Gartner, are "the most common targeted method of cyberattacks, and even typical, consumer-level phishing attacks can have a significant impact on security." Summing up: Better Threat Detection and Response Sumo Logic's vice president of business development, Randy Streu, sums it up well: "Combining OneLogin's critical access and user behavior data with Sumo Logic's advanced real-time security analytics solution provides unparalleled visibility and control for both Sumo Logic and OneLogin customers." This deep and wide visibility into laptop and application access helps SecOps teams uncover weak points within their security infrastructures so that they know exactly how to best secure data across users, applications, and devices. Get started for free Even better, OneLogin and Sumo Logic are each offering free versions of their respective products to each other's customers to help you get started. The OneLogin for Sumo Logic Plan includes free single sign-on and directory integration, providing customers with secure access to Sumo Logic through SAML SSO and multi-factor authentication while eliminating the need for passwords. Deep visibility. Incredibly simple integration. Free editions. We're very pleased to offer all this to our customers. Click here to learn more. *The Sumo Logic App for OneLogin, with out-of-the-box visualizations and dashboarding, will be available in May 2017* This blog was written by John Offenhartz, who is the Lead Product Owner of all of OneLogin's integration and development programs. John's previous experiences cover over twenty years in Cloud-based Development and Product Management with such companies as Microsoft, Netscape, Oracle and SAP. John can be reached at https://www.linkedin.com/in/johnoffenhartz

February 17, 2017

Blog

Analyze Azure Network Watcher Flow Logs with Sumo Logic

Blog

New DevOps Site Chronicles the Changing Face of DevOps

Blog

Sumo Logic Delivers Industry's First Multi-Tenant SaaS Security Analytics Solution with Integrated Threat Intelligence

Integrated Threat Intelligence Providing Visibility into Events that Matter to You! You've already invested a great deal in your security infrastructure to prevent, detect, and respond to cybersecurity attacks. Yet you may feel as if you're still constantly putting out fires and are still uncertain about your current cybersecurity posture. You're looking for ways to be more proactive, more effective, and more strategic about your defenses, without having to "rip and replace" all your existing defense infrastructure. You need the right cyber security intelligence, delivered at the right time, in the right way to help you stop breaches. That is exactly what Sumo Logic's integrated threat intelligence app delivers. Powered by CrowdStrike, Sumo's threat intelligence offering addresses a number of requests we were hearing from customers: Help me increase the velocity & accuracy of threat detection. Enable me to correlate Sumo Logic log data with threat intelligence data to identify and visualize malicious IP addresses, domain names, email addresses, URLs and MD5 hashes. Alert me when there is some penetration or event that maps to a known indicator of compromise (IOC) and tell me where else these IOCs exist in my infrastructure. And above all, make this simple, low friction, and integrated into your platform. And listen we did. Threat intelligence is offered as part of Sumo's Enterprise and Professional Editions, at no extra cost to the customer. Threat Intel Dashboard Supercharge your Threat Defenses: Consume threat intelligence directly into your enterprise systems in real time to increase velocity & accuracy of threat detection. Be Informed, Not Overwhelmed: Real-time visualizations of IOCs in your environment, with searchable queries via an intuitive web interface. Achieve Proactive Security: Know which adversaries may be targeting your assets and organization, thanks to strategic, operational and technical reporting and alerts. We chose to partner with CrowdStrike because they are a leader in cloud-delivered next-generation endpoint protection and adversary analysis. CrowdStrike's Falcon Intelligence offers security professionals an in-depth and historical understanding of adversaries, their campaigns, and their motivations. CrowdStrike Falcon Intelligence reports provide real-time adversary analysis for effective defense and cybersecurity operations. To learn more about Sumo Logic's Integrated Threat Intelligence Solution, please go to http://www.sumologic.com/application/integrated-threat-intelligence.

AWS

February 6, 2017

Blog

Using Sumo Logic and Trend Micro Deep Security SNS for Event Management

As a principal architect at Trend Micro, focused on AWS, I get all the 'challenging' customer projects. Recently a neat use case has popped up with multiple customers and I found it interesting enough to share (hopefully you readers will agree). The original question came as a result of queries about Deep Security's SIEM output via syslog and how best to do an integration with Sumo Logic. Sumo has a ton of great guidance for getting a local collector installed and syslog piped through, but I was really hoping for something: a little less heavy at install time; a little more encrypted leaving the Deep Security Manager (DSM); and a LOT more centralized. I'd skimmed an article recently about Sumo's hosted HTTP collector which made me wonder – could I leverage Deep Security's SNS event forwarding along with Sumo's hosted collector configuration to get events from Deep Security -> SNS -> Sumo? With Deep Security SNS events sending well-formatted JSON, could I get natural language query in Sumo Logic search without defining fields or parsing text? This would be a pretty short post if the answers were no… so let's see how it's done. Step 1: Create an AWS IAM user. This user will be allowed to submit to the SNS topic (but have no other rights or role assigned in AWS). NOTE: Grab the access and secret keys during creation as you'll need to provide them to Deep Security (DSM) later. You'll also need the ARN of the user to give to the SNS topic. (I'm going to guess everyone who got past the first paragraph without falling into an acronym coma has seen the IAM console so I'll omit the usual screenshots.) Step 2: Create the Sumo Logic hosted HTTP collector. Go to Manage -> Collection, then "Add Collector". Choose a Hosted Collector and pick some descriptive labels. NOTE: Make note of the Category for later. Pick some useful labels again, and make note of the Source Category for the Collector (or DataSource if you choose to override the collector value). We'll need that in a little while. Tip: When configuring the DataSource, most defaults are fine except for one: Enable Multiline Processing in the default configuration will split each key:value from the SNS subscription into its own message. We'll want to keep those together for parsing later, so have the DataSource use a boundary expression to detect message beginning and end, using this string (without the quotes) for the expression: (\{)(\}) Then grab the URL provided by the Sumo console for this collector, which we'll plug into the SNS subscription shortly. Step 3: Create the SNS topic. Give it a name and grab the Topic ARN. Personally, I like to put some sanity around who can submit to the topic. Hit "Other Topic Actions" then "Edit topic policy", and enter the ARN we captured for the new user above as the only AWS user allowed to publish messages to the topic. Step 4: Create the subscription for the HTTP collector. Select type HTTPS for the protocol, and enter the endpoint shown by the Sumo Console. (Steps 3 and 4 are also sketched as a script at the end of this post.) Step 5: Go to the search page in the Sumo Console and check for events from our new _sourceCategory, and click the URL in the "SubscribeURL" field to confirm the subscription. Step 6: Configure the Deep Security Manager to send events to the topic. Now that we've got Sumo configured to accept messages from our SNS topic, the last step will be to configure the Deep Security Manager to send events to the topic. Log in to your Deep Security console and head to Administration -> System Settings -> Event Forwarding. 
Check the box for "Publish Events to Amazon Simple Notification Service", enter the Access and Secret key for the user we created with permission to submit to the topic, then paste in the topic ARN and save. You'll find quickly that we have a whole ton of data from SNS in each message that we really don't need associated with our Deep Security events. So let's put together a base query that will get us the Deep Security event fields directly accessible from our search box: _sourceCategory=Deep_Security_Events | parse "*" as jsonobject | json field=jsonobject "Message" as DSM_Log | json auto field=DSM_Log Much better. Thanks to Sumo Logic's auto JSON parsing, we'll now have access to directly filter any field included in a Deep Security event. Let your event management begin! Ping us if you have any feedback or questions on this blog… And let us know what kind of dashboards your ops & secops teams are using this for! A big thanks to Saif Chaudhry, Principal Architect at Trend Micro, who wrote this blog.
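For readers who would rather script the SNS side than click through the console, here is a minimal boto3 sketch of Steps 3 and 4 under stated assumptions: the region is an example, and the publisher user ARN and the Sumo hosted HTTP source URL are placeholders you replace with the values from Steps 1 and 2.

import json
import boto3

sns = boto3.client("sns", region_name="us-west-2")  # example region

PUBLISHER_USER_ARN = "arn:aws:iam::123456789012:user/deep-security-events"    # placeholder (Step 1 user)
SUMO_HTTP_ENDPOINT = "https://collectors.sumologic.com/receiver/v1/http/XXXX"  # placeholder collector URL

# Step 3: create the topic and allow only the Deep Security user to publish to it.
topic_arn = sns.create_topic(Name="deep-security-events")["TopicArn"]
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": PUBLISHER_USER_ARN},
        "Action": "SNS:Publish",
        "Resource": topic_arn,
    }],
}
sns.set_topic_attributes(TopicArn=topic_arn, AttributeName="Policy",
                         AttributeValue=json.dumps(policy))

# Step 4: subscribe the Sumo hosted collector; Sumo then receives the
# SubscriptionConfirmation message whose SubscribeURL you click in Step 5.
sns.subscribe(TopicArn=topic_arn, Protocol="https", Endpoint=SUMO_HTTP_ENDPOINT)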

February 6, 2017

Blog

ECS Container Monitoring with CloudWatch and Sumo Logic

Blog

How to Analyze NGINX Logs with Sumo Logic

Blog

Chief Architect Stefan Zier on Tips for Optimizing AWS S3

Blog

Overview of AWS Lambda Monitoring

AWS Lambda usage is exploding, and at last week's re:Invent conference, it was one of the key priorities for AWS. Lambda simplifies infrastructure by removing host management, and giving you the advantage of paying for just the compute and storage you use. This means it's important to monitor your usage carefully to ensure you are managing your spend well. This post outlines the various options and approaches to AWS Lambda monitoring so you can make an informed decision about how you monitor your Lambda applications. AWS Lambda Monitoring Basics AWS provides vital stats for Lambda in a couple of ways. You can view metrics from your Lambda console, the CloudWatch console, via the Lambda or CloudWatch command line, or even through the CloudWatch API. The Lambda console gives you an overview of just the vital stats of your Lambda app. For a more detailed view, you can click through to the CloudWatch console. Here are the key terms you'll notice in your CloudWatch console: Invocations: Invocations are the number of times a Lambda function is triggered in response to an event or API call. Errors: The number of failed invocations. Duration times: The time it takes to execute a Lambda function. This is measured in milliseconds. Throttles: The number of invocation attempts that were not executed because they exceed concurrency limits. Dead letter errors: Failed asynchronous invocations are sent to a dead letter queue (DLQ) for further troubleshooting. When an event cannot be written to the DLQ, it is counted as a dead letter error. Within CloudWatch you can set alarms to notify you of issues, or identify unused resources; you can also pull the same metrics programmatically through the CloudWatch API (a short example follows at the end of this post). You can also view all logs generated by your code in CloudWatch Logs. Archiving these logs will incur additional cost, but you can decide how far back you'd like to store log data. Troubleshooting Common Errors in Lambda With AWS Lambda, you don't have underlying infrastructure to monitor. This means that most of the errors can be resolved by troubleshooting your code. Here are the common errors that occur with Lambda: IAM roles & permissions: For your code to access other AWS services, you need to configure IAM roles correctly. If not, this could result in a permission denied error. Timeout exceeded: Some functions can take longer to execute than others, and will need a longer timeout setting. Memory exceeded: Some jobs like database operations require less memory. However, jobs that involve large files, like images, for example, will need more memory. You will need to adjust the MemorySize setting in your function configuration if you see this error. Advanced Lambda Monitoring As you start out using Lambda for part of your app, you can get by with AWS' default monitoring options. Once you start expanding Lambda usage in your applications, and even run entire apps in Lambda, you'll need the power of a more robust monitoring tool. The Sumo Logic App for AWS Lambda is great for monitoring your Lambda functions and gaining deeper visibility into performance and usage. Here are the benefits: Track compute & memory usage: The Sumo Logic app tracks compute performance of individual Lambda functions and lets you drill down to the details. This is important because you configure resources in your code, and you need visibility at the individual function level to ensure all processes have adequate resources. Cost correlation: The Sumo Logic app translates granular performance data to actual billed costs. It lets you track usage of memory, and even excess memory in functions, and prevents overspending. 
It enables you to split expense across multiple teams if you need to. Based on past behavior, the app can even predict future performance and spend. Integrated reporting: If you already use Sumo Logic to monitor and manage the rest of your infrastructure stack, you can bring it all in one place, and have realistic benchmarks for your Lambda usage. Real-time stats: The data in your Sumo Logic dashboards streams in real-time from AWS, so you never have to worry about seeing outdated information. Advanced visualizations: Sumo Logic lets you slice and dice your data with advanced analytics tools. The dashboards contain various chart types, including advanced types like the box plot chart. This kind of analysis is simply not possible within AWS. As you scale your usage of Lambda, you need deep monitoring that can correlate compute usage with costs. You're used to having control and visibility over your server-driven apps using traditional monitoring tools. With Lambda, the approach to monitoring has changed, but your goals are the same—to gain visibility and efficiency into your app and the abstracted Lambda infrastructure. You need the right tools to enable this kind of monitoring. About the Author Twain began his career at Google, where, among other things, he was involved in technical support for the AdWords team. Today, as a technology journalist, he helps IT magazines and startups change the way teams build and ship applications. Overview of AWS Lambda Monitoring is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.
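As a companion to the CloudWatch metrics described above, here is a minimal boto3 sketch that pulls the last hour of Invocations, Errors, Throttles and Duration for one function; the region and the function name "my-function" are placeholders.

from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # placeholder region
FUNCTION_NAME = "my-function"                                     # placeholder function name
now = datetime.utcnow()

for metric in ("Invocations", "Errors", "Throttles", "Duration"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric,
        Dimensions=[{"Name": "FunctionName", "Value": FUNCTION_NAME}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,  # 5-minute buckets
        Statistics=["Average" if metric == "Duration" else "Sum"],
    )
    # Datapoints come back unordered, so sort by timestamp before printing.
    points = sorted(stats["Datapoints"], key=lambda p: p["Timestamp"])
    print(metric, [p.get("Sum", p.get("Average")) for p in points])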

Blog

Overview of MongoDB Performance Monitoring

Monitoring components of a system, especially for performance, is the best (and really the only) way to ensure a consistent user experience and that all service levels are being achieved. System monitoring is often left to operations to figure out after the system is built and "production ready." This article highlights the main areas of MongoDB to make sure you are monitoring. What Is Performance Monitoring? Performance monitoring combines all the various other types of monitoring of an application, or component, to give an overall view of system performance. Good performance monitoring also provides the ability to compare current performance levels to what has been set in service level agreements and trends. The utopia of performance monitoring is to be predictive and proactive. Find changes in performance, tell the appropriate group about it, and resolve it before it ever impacts a client. Critical Metrics to Monitor in MongoDB If this was a web or mobile application, then you'd probably be thinking response time, and you would be correct. Except in MongoDB, the only response it tracks easily is how long it is taking to write out to its peers in its replication agreements. This is a valid metric since every user transaction gets written to at least one peer. Read times aren't tracked as their own metric inside the platform. The other critical data point to monitor for performance is disk I/O. MongoDB, like any database platform, relies heavily on the speed at which it can read and write to the disk. If disk I/O gets to 100%, that means everything is now waiting on disk access and the entire system will slow to a crawl. Extremely Useful MongoDB Metrics Knowing the usage pattern in your database is critical to trending performance. So tracking how many reads (selects) and writes (inserts, updates, deletes) are being performed at a regular interval will definitely help so you can tune (and size) the system accordingly. In conjunction with tracking the actual reads and writes, MongoDB also exposes the number of clients actively doing reads and the number doing writes. Combining these two metrics will allow better cache tuning and can even help decide when adding more replicas would make sense. The last extremely useful metric for performance monitoring in MongoDB, in my opinion, is capturing slow queries. This is very useful for developers to find missing indexes, and tracking the number will reveal ways that clients are using the system that weren't originally envisioned. I've seen unexpected client behavior in the past. For instance, once a user base figured out they could do wildcard searches, they stopped typing an exact 12-digit number and just searched on the first digit followed by a %. An index brought performance back in line, but it was not expected. (Remember that people are like water. They will find the fastest way to do things and follow that route for good or bad.) Additional MongoDB Metrics to Monitor Most of the metrics in this section are solid and valuable, but are too coarse-grained to be of the same value as the metrics in the above sections. The following are good overall metrics of system health, and will definitely help in scaling the database system up and down to meet the needs of your applications. MongoDB has internal read and write queues that can be watched, and will only really be used when MongoDB can't keep up with the number of requests that are incoming. If these are used often, then you will probably need to look into adding capacity to your MongoDB deployment. 
Another great metric to trend is the number of client connections and available connections. Then, of course, there are always metrics at the machine level that are important to watch on all of your nodes. These include memory, CPU, and network performance. (A short example of sampling several of these counters appears at the end of this post.) Monitoring Tools to Use More information on the tools that ship with MongoDB for monitoring is available from MongoDB. And there are operational support platforms, like Sumo Logic, which provide a much more visual and user-friendly way to review these metrics. Sumo Logic also has the Sumo Logic App for MongoDB that includes pre-built queries and dashboards allowing you to track overall system health, queries, logins and connections, errors and warnings, replication, and sharding. You can learn more about the app from the MongoDB documentation. If you don't already have a Sumo Logic account, sign up for the Free Trial and take the app for a spin. About the Author Vince Power is a Solution Architect who has a focus on cloud adoption and technology implementations using open source-based technologies. He has extensive experience with core computing and networking (IaaS), identity and access management (IAM), application platforms (PaaS), and continuous delivery. Overview of MongoDB Performance Monitoring is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.
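As a complement to the metrics discussed above, here is a minimal pymongo sketch that samples the read/write counters and connection numbers from serverStatus; the connection string is a placeholder, and the counters are cumulative since server start, so you would sample them at intervals to build a trend.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")  # placeholder connection string
status = client.admin.command("serverStatus")

opcounters = status["opcounters"]    # cumulative operation counts since the server started
connections = status["connections"]

print("reads (queries):", opcounters["query"])
print("writes (insert/update/delete):",
      opcounters["insert"] + opcounters["update"] + opcounters["delete"])
print("current connections:", connections["current"],
      "/ available:", connections["available"])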

Blog

AWS Well Architected Framework - Security Pillar

Blog

CISO Manifesto: 10 Rules for Vendors

This CISO blog post was contributed by Gary Hayslip, Deputy Director, Chief Information Security Officer (CISO) for the City of San Diego, Calif., and Co-Author of the book CISO Desk Reference Guide: A Practical Guide for CISOs As businesses today focus on the new opportunities cybersecurity programs provide them, CISOs like myself have to learn job roles they were not responsible for five years ago. These challenging roles and their required skill sets, I believe, demonstrate that the position of CISO is maturing. This role not only requires a strong technology background, good management skills, and the ability to mentor and lead teams; it now requires soft skills such as business acumen, risk management, innovative thinking, creating human networks, and building cross-organizational relationships. To be effective in this role, I believe the CISO must be able to define their "Vision" of cybersecurity to their organization. They must be able to explain the business value of that "Vision" and secure leadership support to execute and engage the business in implementing this "Vision." So how does this relate to the subject of my manifesto? I am glad you asked. The reason I provided some background is because for us CISOs, a large portion of our time is spent working with third-party vendors to fix issues. We need these vendors to help us build our security programs, to implement innovative solutions for new services, or to just help us manage risk across sprawling network infrastructures. The truth of the matter is, organizations are looking to their CISO to help solve the hard technology and risk problems they face; this requires CISOs to look at technologies, workflows, new processes, and collaborative projects with peers to reduce risk and protect their enterprise assets. Of course, this isn't easy, to say the least. One of the hardest issues I believe CISOs face, time and again, is that when they speak with their technology provider, the vendor truly doesn't understand how the CISO does their job. The vendor doesn't understand how the CISO views technology or really what the CISO is looking for in a solution. To provide some insight, I decided I would list ten rules that I hope technology providers will take to heart and just possibly make it better for all of us in the cyber security community. Now with these rules in mind, let's get started. I will first start with several issues that really turn me off when I speak with a technology provider. I will end with some recommendations to help vendors understand what CISOs are thinking when they look at their technology. So here we go, let's have some fun. Top Ten Rules for Technology Providers "Don't pitch your competition" – I hate it when a vendor knows I have looked at some of their competitors, and then they spend their time telling me how bad the competition is and how much better they are. Honestly, I don't care; I contacted you to see how your technology works and if it fits the issue I am trying to resolve. If you spend all of your time talking down about another vendor, that tells me you are more concerned about your competitor than my requirements. Maybe I called the wrong company for a demonstration. "Don't tell me you solve 100% of ANY problem" – For vendors that like to make grand statements, don't tell me that you do 100% of anything. The old adage applies: "100% of everything is 0% of anything." In today's threat environment, the only thing I believe is 100% is that eventually I will have a breach. The rest is all B.S. 
so don’t waste my time saying you do 100% coverage, or 100% remediation, or 100% capturing of malware traffic. I don’t know of a single CISO that believes that anyone does 100% of anything so don’t waste your time trying to sell that to me.

Blog

AWS Best Practices - How to Achieve a Well Architected Framework

Blog

A Toddler’s Guide to Data Analytics

Any parent of a two-year old appreciates the power of speaking a common language. There is nothing more frustrating to my two-year old son than his inability to communicate what he wants. Learning to say things like "milk" and "applesauce" has transformed the breakfast experience. On the other hand, learning to say "trash" means that any object in reach is in danger of being tossed in the trash bin for the glory of saying "twash" over and over again. I see this in my world as well. Millions of dollars are spent in the software world translating from one "language" to another. In fact, a whole industry has popped up around shared Application Program Interfaces (APIs) to standardize how systems communicate. Despite this trend, there seems to be more emphasis on the "communication" rather than the "data" itself. Data Analytics products in particular seem happy to shove all types of data into one mold, since the output is the same. That is where we at Sumo Logic decided to take the road less travelled. We believe in the idea that data needs to be treated "natively" – in other words, we don't want to shove a square data peg in a round systems hole. Just like speaking a language natively changes the experience of travel – speaking the native language of data transforms the analytics experience. When we decided to build our Unified Logs and Metrics (ULM) product, we also decided that it was essential that we become bi-lingual – speaking the native language of both raw logs and time-series metrics. And here is why it matters – according to my Toddler. Answer my Question – Quickly Toddlers are well acquainted with the frustration of not being understood. Everything takes too long, and they need it now. And you know what? I get it. I have had the occasion of speaking to people that don't speak my language before, and it is hard. I once spent 15 minutes in a pharmacy in Geneva trying to order anti-bacterial cream. There was a lot of waving of hands and short, obvious words (Arm. Cut. Ow!). We would have faced the same obstacles if we used a log system to store metrics. Every query needs to be translated, from one language to another, and it takes forever. At the end of the day, you can try to optimize a log system – built to search for needles in haystacks – to perform the equivalent of speed reading, but eventually the laws of physics intervene. You can only make it so fast. It takes too long – and like my toddler, you will just stop asking the question. What's the use in that? Cleaning up is Hard I am always amazed at how my two-year old can turn a nice stack of puzzles or a bucket of toys into a room-sized disaster zone – it is the same components, but vastly different results. Storage optimization is essential in the world of operational data. There is a natural assumption underneath a true log-analytics system. We assume on some level that each log is a special snowflake. There is, of course, a lot of repetition, but the key is to be flexible and optimize for finding key terms very quickly. Metrics, on the other hand, are repetitive by design. Every record of a measurement is the same – except for the measurement itself. Once you know you are collecting something – say system CPU performance on some server – you don't need to capture that reference every time. You can optimize heavily for storing and retrieving long lists of numbers. Storing time series metrics as logs, or events, is extremely wasteful. 
You can incur anywhere from 3x to 10x more storage costs – and that is without the same performance. To achieve the same performance as most metrics systems can reach, you are looking at 10-20x in storage costs. This, of course, is the reason why no log-analytics companies are really used for performance metrics at scale – the immense costs involved just don't justify the benefit of tool reduction. I Want to Play with my Cars Anywhere One of the funniest things my son does is how he plays with his toy cars. He has race tracks, roads, and other appropriate surfaces. He rarely uses them. He prefers to race his cars on tables, up walls, and on daddy's leg. The flexibility of having wheels is essential. He has other "cars" that don't roll – he doesn't really play with them. It is the same core truth with data analytics. Once you have high performance with cost-effective storage – use cases just present themselves. Now you can perform complex analytics without fear of slowing down the system to a crawl. You can compare performance over months and years, rather than minutes and hours – because storage is so much cheaper. Innovative use cases will always fill up the new space created by platform enhancements – just as restricted platforms will always restrict the use cases as well. Choose Wisely So, it's 2 AM. Your application is down. Your DevOps/Ops/Engineering team is trying to solve the problem. They can either be frustrated that they can't get their questions answered, or they can breeze through their operational data to get the answers they need. I know what a two-year old would tell you to do. Time to put your old approach to data analytics in the twash.

January 10, 2017

Blog

Making the Most of AWS Lambda Logs

How can you get the most out of monitoring your AWS Lambda functions? In this post, we'll take a look at the monitoring and logging data that Lambda makes available, and the value that it can bring to your AWS operations. You may be thinking, "Why should I even monitor AWS Lambda? Doesn't AWS take care of all of the system and housekeeping stuff with Lambda? I thought that all the user had to do was write some code and run it!" A Look at AWS Lambda If that is what you're thinking, then for the most part, you're right. AWS Lambda is designed to be a simple plug-and-play experience from the user's point of view. Its function is simply to run user-supplied code on request in a standardized environment. You write the code, specifying some basic configuration parameters, and upload the code, the configuration information, and any necessary dependencies to AWS Lambda. This uploaded package is called a Lambda function. To run the function, you invoke it from an application running somewhere in the AWS ecosystem (EC2, S3, or most other AWS services). When Lambda receives the invoke request, it runs your function in a container; the container pops into existence, does its job, and pops back out of existence. Lambda manages the containers—you don't need to (and can't) do anything with them. So there it is—Lambda. It's simple, it's neat, it's clean, and it does have some metrics which can be monitored, and which are worth monitoring. Which Lambda Metrics to Monitor? So, which Lambda metrics are important, and why would you monitor them? There are two kinds of monitoring information which AWS Lambda provides: metrics displayed in the AWS CloudWatch console, and logging data, which is handled by both CloudWatch and the CloudTrail monitoring service. Both types of data are valuable to the user—the nature of that value and the best way to make use of it depend largely on the type of data. Monitoring Lambda CloudWatch Console Metrics Because AWS Lambda is strictly a standardized platform for running user-created code, the metrics that it displays in the CloudWatch console are largely concerned with the state of that code. These metrics include the number of invocation requests that a function receives, the number of failures resulting from errors in the function, the number of failures in user-configured error handling, the function's duration, or running time, and the number of invocations that were throttled as a result of the user's concurrency limits. These are useful metrics, and they can tell you a considerable amount about how well the code is working, how well the invocations work, and how the code operates within its environment. They are, however, largely useful in terms of functionality, debugging, and day-to-day (or millisecond-to-millisecond) operations. Monitoring and Analyzing AWS Lambda Logs With AWS Lambda, logging data is actually a much richer source of information in many ways. This is because logging provides a cumulative record of actions over time, including all API calls made in connection with AWS Lambda. Since Lambda functions exist for the most part to provide support for applications and websites running on other AWS services, Lambda log data is the main source of data about how a function is doing its job. "Logs," you say, like Indiana Jones surrounded by hissing cobras. "Why does it always have to be logs? Digging through logs isn't just un-fun, boring, and time-consuming. More often than not, it's counter-productive, or just plain impractical!" And once again, you're right. 
There isn't much point in attempting to manually analyze AWS Lambda logs. In fact, you have three basic choices: ignore the logs, write your own script for extracting and analyzing log data, or let a monitoring and analytics service do the work for you. For the majority of AWS Lambda users, the third option is by far the most practical and the most useful. Sumo Logic's Log Analytics Dashboards for Lambda To get a clearer picture of what can be done with AWS Lambda metrics and logging data, let's take a look at how the Sumo Logic App for AWS Lambda extracts useful information from the raw data, and how it organizes that data and presents it to the user. On the AWS side, you can use a Lambda function to collect CloudWatch logs and route them to Sumo Logic (a minimal sketch of such a forwarder appears at the end of this post). Sumo integrates accumulated log and metric information to present a comprehensive picture of your AWS Lambda function's behavior, condition, and use over time, using three standard dashboards: The Lambda Overview Dashboard The Overview dashboard provides a graphic representation of each function's duration, maximum memory usage, compute usage, and errors. This allows you to quickly see how individual functions perform in comparison with each other. The Overview dashboard also breaks duration, memory, and compute usage down over time, making it possible to correlate Lambda function activity with other AWS-based operations, and it compares the actual values for all three metrics with their predicted values over time. This last set of values (actual vs. predicted) can help you pinpoint performance bottlenecks and allocate system resources more efficiently. The Lambda Duration and Memory Dashboard Sumo Logic's AWS Lambda Duration and Memory dashboard displays duration and maximum memory use for all functions over a 24-hour period in the form of both outlier and trend charts. The Billed Duration by Hour trend chart compares actual billed duration with predicted duration on an hourly basis. In a similar manner, the Unused Memory trend chart shows used, unused, and predicted unused memory size, along with available memory. These charts, along with the Max Memory Used box plot chart, can be very useful in determining when and how to balance function invocations and avoid excessive memory over- or underuse. The Lambda Usage Dashboard The Usage dashboard breaks down requests, duration, and memory usage by function, along with requests by version alias. It includes actual request counts broken down by function and version alias. The Usage dashboard also includes detailed information on each function, including individual request ID, duration, billing, memory, and time information for each request. The breakdown into individual requests makes it easy to identify and examine specific instances of a function's invocation, in order to analyze what is happening with that function on a case-by-case level. It is integrated, dashboard-based analytics such as those presented by the Sumo Logic App for AWS Lambda that make it not only possible but easy to extract useful data from Lambda, and truly make the most of AWS Lambda monitoring. About the Author Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the '90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. 
During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues.
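The post above mentions using a Lambda function to collect CloudWatch Logs and route them to Sumo Logic. Here is a minimal sketch of such a forwarder, assuming the function is attached as a CloudWatch Logs subscription target and that SUMO_ENDPOINT (an environment variable name chosen here for illustration) holds the URL of a Sumo Logic hosted HTTP source; the URL shown is a placeholder.

import base64
import gzip
import json
import os
import urllib.request

SUMO_ENDPOINT = os.environ.get(
    "SUMO_ENDPOINT",
    "https://collectors.sumologic.com/receiver/v1/http/XXXX")  # placeholder URL


def handler(event, context):
    # CloudWatch Logs subscriptions deliver a base64-encoded, gzip-compressed payload.
    payload = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    log_data = json.loads(payload)

    # Forward each log event's message as one line in the POST body.
    body = "\n".join(e["message"] for e in log_data["logEvents"]).encode("utf-8")
    request = urllib.request.Request(SUMO_ENDPOINT, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return {"status": response.status,
                "forwarded": len(log_data["logEvents"])}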

Blog

Triggering AWS Lambda Functions from Sumo Logic Alerts

Blog

Leveraging Machine Data Analytics for DevOps

Early on, we measured data volume in gigabytes. Then we moved on to terabytes. Now, it's petabytes. But the scale of data is not the only thing that has changed. We now deal with different types of data as well. In particular, the introduction of large volumes of machine data has created new opportunities for machine data analytics. Leveraging machine data, especially logs and metrics, is a key part of advancing the DevOps workflow. Advanced analytics based on machine data allows DevOps engineers to make sense of petabytes of data by using statistical, indexing, filtering and machine learning techniques. In this post, I explain how to use Sumo Logic's cloud-native platform to analyze large volumes of machine data to drive actionable insights. Using Sumo Logic Unified Logs and Metrics for Machine Data Let's start by discussing how Sumo Logic allows users to visualize machine logs and metrics. Sumo makes this information available through a single, unified interface—the Sumo Logic Application Status Dashboard. The dashboard shows the DevOps engineer a real-time visualization of the status quo regarding logs and metrics. The Sumo Logic Application Status Dashboard The image above shows the available metrics in this example: latency, customer logins, CPU usage, app log errors, and memory usage and errors. Additional metrics can be visualized in the dashboard as well, depending on which type of data is available. Examples of supported logs include error logs, binary logs, general and slow query logs, and DDL logs. In addition, since the dashboard is connected to those logs, it allows you to drill down to find more details about an issue. The Sumo Logic Dashboard and DevOps Using the available logs and metrics, a DevOps engineer can perform a quick root cause analysis on a production issue so the problem can be addressed quickly. That's essential in DevOps because quick resolution of problems assures that pipelines can keep flowing continuously. This video demonstrates the Sumo Logic Application Dashboard in action: Notice in particular the use of filtering—one of the analytics techniques Sumo Logic uses to help a DevOps engineer tackle an issue. Other analytics methods include statistical, indexing and machine learning techniques. Machine Data, Predictive Analytics and Sumo Logic Sumo Logic lets you do more with machine data than simply find out what happened. You can also use it as a predictive analytics platform to identify trends and understand what is likely to happen next with your infrastructure or DevOps development pipeline. Predictive analytics based on machine data are valuable because the vast volume of data coming daily into an organization means that a large amount of that data turns into noise and ultimately masks the messages that are most important. With predictive analytics, DevOps teams can make the most of all data, even if they can't react to it all in real time. Consider, for example, the case of a CPU usage spike or a memory drop. Predictive analytics techniques could help you to predict when such events will occur again so that you can prepare for them. Similarly, predictive analytics delivered via tools like Sumo Logic can help you to find patterns in a vast amount of data without having to program your own code. Sumo can identify the trends and help you make sense of them through a convenient interface. 
That’s a big help to DevOps professionals because it means that, using Sumo Logic, they can make sense of a large volume of information without having to be experts in statistics or data analytics programming. Instead, they can focus on what they know best—whether it is coding, testing or system administration—and rely on Sumo Logic to be the data analytics expert on their team. LogReduce: Clean Up Your Machine Data A final feature worth mentioning is LogReduce. This is a feature in Sumo Logic that, like unified logs and metrics, helps DevOps engineers to reduce the noise in their machine data. The following video shows an example of LogReduce: As you can see, a lot of calculations and analysis are done under the hood. All the DevOps engineer had to do was push the LogReduce button. This saves the DevOps engineer from having to worry about machine learning techniques, freeing him or her to focus on the problem to be solved. In my opinion, every DevOps engineer using the LogReduce button should have at least a basic understanding of machine data. Otherwise, results could be misinterpreted. Still, LogReduce is a great feature for transforming a baseline knowledge of machine data analytics into expert-level results. About the Author Cordny Nederkoorn is a software test engineer with over 10 years of experience in finance, e-commerce and web development. He is also the founder of TestingSaaS, an international community researching cloud applications with a focus on forensics, software testing and security.

Blog

How to Deploy and Manage a Container on Azure Container Service

Blog

Building a World-Class Cloud Partner Ecosystem

Wow, what a week last week at AWS re:Invent. I spent the weekend reflecting (and recovering) on the progress we have made in the market and how our great partners have impacted Sumo Logic's success. Sumo Logic was born in the cloud and designed to help customers build, run and secure mission critical cloud applications. If you haven't already, I encourage you to read our "State of the Modern Applications" report. As we developed this report, we found it incredibly enlightening, as have many others. The report highlights how wildly different the technology choices are for building these modern applications in the cloud versus traditional technologies used for on-premises application development. This rise and increased adoption of cloud-native applications has heavily influenced Sumo Logic's partner ecosystem. We are very proud and honored to be partnered with the leading technologies used to build today's modern applications. We share a common vision with our partners to help customers accelerate their adoption of cloud more quickly and more safely. So, together with MongoDB, Fastly, Pivotal, NGINX, CrowdStrike, Okta and Evident.IO, we decided to THROW A PARTY to celebrate the success we shared in 2016 and to kick off 2017 with a bang! We had well over 300 customers, partners, and people wanting to learn more at the event. It was a fantastic time, and I want to thank our partners for sponsoring and everyone who attended. As I look to 2017, I couldn't be more excited to be at Sumo Logic, and about the tremendous opportunity that lies ahead for us and our incredible partners. We are just getting started together on this journey, and I am supremely optimistic about our future together. Wishing you all a very happy holiday season and New Year!

AWS

December 8, 2016

Blog

Sumo Logic + New Relic = Comprehensive Application and Infrastructure Operations

At AWS re:Invent 2016 last week, New Relic and Sumo Logic announced a partnership that brings together two leaders in application operations. The integrated solution combines machine data analytics with application and infrastructure performance data to enable joint customers to use the New Relic Digital Intelligence Platform for visualizing the impact of software performance on customer experience and business outcomes. Why is this important and what does this mean for the industry? New Relic is the leader in Application Performance Management and provides detailed performance metrics across an enterprise's digital business – customer experience, application performance, and infrastructure. Thousands of enterprises use the New Relic solution to proactively monitor and manage application performance alerts ("latency of application is spiking, application is unavailable, etc."). However, to get to the root cause of issues, IT teams also need complete visibility into the logs associated with the application, transactions and the supporting infrastructure. And that is where the integrated Sumo Logic – New Relic solution comes in. Sumo Logic turns machine data – structured and unstructured – into operational insights, helping organizations more effectively build, run and secure modern applications and infrastructure. The Sumo Logic service enables customers to ingest and analyze logs with patented machine learning algorithms. And when integrated with New Relic's full-stack visibility, it provides joint customers with: Complete picture of what's happening with users, applications, and instances Proactive identification of application and infrastructure performance issues Faster troubleshooting and root cause analysis leveraging APM and log data However, it's not just the comprehensive and integrated APM and log analytics views that distinguish this partnership. What makes this partnership unique is the "operating model" of the two solutions. Like Sumo Logic, New Relic also focuses on modern applications and infrastructures that are developed using agile/DevOps methodologies, use microservices-style distributed architectures and typically run in highly elastic and scalable cloud environments. And like Sumo Logic, New Relic is also a SaaS service, providing fast time-to-value and low total-cost-of-ownership for its customers (incidentally, both Sumo Logic and New Relic are in the newly announced AWS SaaS subscription marketplace). At Sumo Logic, we are thrilled to partner with New Relic. We believe that application operations is undergoing fundamental transformations and companies like Sumo Logic and New Relic are bringing a new vision to this market. Stay tuned to see more integration details emerge from this partnership.

December 7, 2016

Blog

Customer Blog: OpenX

Blog

Ground Zero for a $30 Trillion Disruption

Yes, that is a "T" as in trillion. In the last 30 days, I had the pleasure of attending two events that reinforced in my mind that Sumo Logic is at the center of the largest market disruption I will most likely experience in my lifetime. First, I traveled with our Sumo Logic CTO, Christian Beedgen, to Lisbon, Portugal, for his presentation at Europe's largest technology conference, Web Summit 2016. After watching at least 100 thought leaders representing 20+ different industries from around the world, the comment from Saul Klein of LocalGlobe and Index Ventures about the expected reallocation of $30 trillion in market value from existing Fortune 2000 companies to new disruptors and yet-to-be-born companies beautifully summed up my key takeaway from the show. However, this massive transfer of wealth is not just dependent on each country represented (more than 115!) finding and funding the next Google. Klein passionately appealed to global policy makers and influencers to support the 120 million "zebras" on Facebook interested in entrepreneurship in order to have more control over their lives. Self-employed entrepreneurs already represent five percent of the global population and growing. So, imagine the synchronicity I felt when I found myself a few weeks later at what I consider to be the mecca of the zebra wave: AWS re:Invent, Las Vegas. The energy, activity and dialogue of 32,000+ attendees (or 'builders' in the parlance of AWS CTO Werner Vogels), across a myriad of market categories sharing and demonstrating their cloud-based products, solutions and services was not only exciting, but also gave me a sense of hope. Hope, in that, as Gary Vee said at Web Summit, now with the ubiquity of mobile and the accessibility of cloud, "if you have the imagination and willingness, no one can stop you. No one." This hope is not just in transformative technologies but also in the very way people come together to create. It's collaboration that goes beyond the sake of inclusivity in and of itself. The market transition to a sizeable, global entrepreneurial workforce will create an ocean of experts who have the opportunity to come together in a much more harmonious way to solve similar problems, providing solutions to market that benefit people and profits. For businesses adopting this philosophy, silos will dissipate in favor of collaborative structures, processes and decision-making. We see this playing out already in digital businesses that have adopted DevOps as their innovation strategy. DevOps practices scuttle linear, waterfall software development approaches in favor of agile and continuous delivery practices resembling "loops." Silos fade away, and a cross-expertise (as opposed to "cross-function") of development, testing, deployment, operations, security and line-of-business professionals come together to create, monitor, troubleshoot, optimize and then create again — a continuous loop that drives continuous innovation. And these loops won't be cookie-cutter — they will be unique to the ideas, culture and opportunities of each organization, thereby creating new sources of business competitive advantage, and new sources of hope for unimagined opportunities. We see examples of this today in organizations such as Google, Facebook, Airbnb, Uber, Spotify, Snapchat, Amazon and Netflix to name a few. So, every company faces a fundamental question — to be part of the policy of continuous innovation (hope), or to maintain a policy of entrenchment to preserve the status quo. 
In my mind, no company will survive a policy of entrenchment or isolation. It might keep going for a while, but the longer it waits, the more it delays the inevitable. That’s why Sumo Logic is passionate about helping companies make this transition. Our machine data analytics platform provides the visibility and confidence companies need to transition to cloud and modern applications. We think of it as “continuous intelligence” to feed the loop of continuous-innovation decision-making. And the value is applicable across the organization. For example, as a 20-year marketing and communication veteran, I’m excited to finally be in the position of being a consumer of a B2B technology that’s relevant to my expertise. Our cloud-native, multi-tenant platform service enables us to leverage customer metadata to better understand usage patterns of our own product, so we can better support and market to our customers, in addition to optimizing our services.

And that re-allocated $30T I mentioned earlier? Well, wrap your head around this: one expected outcome is that 75% of the S&P 500 will be replaced in the next 10 years, at the current S&P churn rate. (Source: Innosight, Richard N. Foster, Standard & Poor’s.) So, there’s a sense of urgency in the air, which may explain the astonishing growth rates of Web Summit (from a couple of hundred attendees in Dublin in 2010 to more than 50,000 in six years) and AWS re:Invent (5,500 attendees in 2012 to more than 32,000 this year). And that gives me hope too, because it means people are getting it. So, when I look back ten years from now at Web Summit and AWS re:Invent 2016, I’m proud to know that I was there at ground zero for a $30T market disruption. And my forecast for the next ten years? It will be a wild ride for sure, but those who go ‘all-in’ on cloud computing, focus on new ideas and deliver value continuously at speed are likely to come out on top. And at a time when much in the world seems unknown, the future certainly seems bright to me.

Blog

Designing a Data Analytics Strategy for Modern Apps

Yesterday at AWS re:Invent 2016, Sumo Logic Co-Founder and CTO Christian Beedgen presented his vision for machine data analytics in a world where modern apps are disrupting virtually every vertical market in business. Every business is a software business, as Marc Andreessen wrote more than five years ago. Today, driven by customer demand, the need to differentiate and the push for agility, digital transformation initiatives are disrupting every industry. “We are still at the very beginning of this wave of digital transformation,” Christian said. “By 2020 half of all businesses will have figured out digitally enhanced products and services.” The result is that modern apps are being architected differently than they were just three years ago. Cloud applications are being built on microservices by DevOps teams that automate to deliver new functionality faster. “It used to be that you could take the architecture and put it on a piece of paper with a couple of boxes and a couple of arrows. Our application architecture was really clean.” But with this speed and agility comes complexity, and the need for visibility has become paramount. “Today our applications look like spaghetti. Building microservices, wiring them up, integrating them so they can work with something else, foundational services, SQL databases, NoSQL databases…” You need to be able to see what’s going on, because you can’t fix what you cannot see. Modern apps require Continuous Intelligence to provide insights, continuously and in real time, across the entire application lifecycle.

Designing Your Data Analytics Strategy

Ben Newton, Sumo Logic’s Principal Product Manager for the Metrics team, took the stage to look at the various types of data and what you can do with them. Designing a data analytics strategy begins by understanding the data types produced as machine data, then focusing on the activities that data supports. The primary activities are monitoring, where you detect and notify (or alert), and troubleshooting, where you identify, diagnose, restore and resolve. “What we often find is that users can use that same data to do what we call App Intelligence – the same logs and metrics that allow you to figure out something is broken also tell you what your users are doing. If you know what users are doing, you can make life better for them, because that’s what really matters.” So who really cares about this data? When it comes to monitoring, where the focus is on user-visible functionality, it’s your DevOps and traditional IT Ops teams. Engineering and development are also responsible for monitoring their code. In troubleshooting apps, where the focus is on end-to-end visibility, customer success and technical support teams also become stakeholders. For app intelligence, where the focus is on user activity and visibility, everyone is a stakeholder, including sales, marketing and product management. “Once you have all of this data, all of these people are going to come knocking on your door,” said Ben. Once you understand the data types you have, where they sit within your stack and the use cases, you can begin to use data to solve real problems. In defining what to monitor and measure, Ben highlighted: monitor what’s important to your business and your users; measure and monitor user-visible metrics; and build fewer, higher-impact, real-time monitors. “Once you get to the troubleshooting side, it gets back to: you can’t fix what you can’t measure.” Ben also said: you can’t improve what you can’t measure.
You need both activity metrics and detailed logs. Up-to-date data drives better data-driven decisions, and you need data from all parts of your stack. So what types of data will you be looking at? Ben broke it down into the following categories:

Infrastructure – Rollups vs. detailed. What resolution makes sense? Is real-time necessary?
Platform – Rollups vs. detailed. Coverage of all components, detailed logs for investigations, and architecture captured in the metadata.
Custom – How is your service measured? What frustrates users? How does the business measure itself?

“Everything you have produces data. It’s important to ensure you have all of the components covered.” Once you have all of your data, it’s important to think about the metadata. Systems are complex, and the way you make sense of them is through your metadata. You use metadata to describe or tag your data. “For the customer, this is the code you wrote yourself. You are the only people that can figure out how to monitor that. So one of the things you have to think about is the metadata.”

Cloud Cruiser – A Case Study

Cloud Cruiser’s Lead DevOps Engineer, Ben Abrams, took the stage to show how the company collects data and to provide some tips on tagging it with metadata. Cloud Cruiser is a SaaS app that enables you to easily collect, meter, and understand your cloud spend in AWS, Azure, and GCP. Cloud Cruiser’s customers are large enterprises and mid-market players globally distributed across all verticals, and they manage hundreds of millions in cloud spend. Cloud Cruiser had been using an Elastic (Elasticsearch, Logstash, and Kibana) stack for their log management solution. They discovered that managing their own logging solution was costly and burdensome. Ben cited the following drivers for the switch: the operational burden was a distraction from the core business; improved security; and the ability to scale, plus cost. Cloud Cruiser runs on AWS (300-500 instances) and utilizes microservices written in Java using the Dropwizard framework. Their front-end web app runs on Tomcat and uses AngularJS. Figure 1 shows the breadth of the technology stack. In evaluating a replacement solution, Ben said, “We were spending too much time on our ELK stack.” Sumo Logic’s Unified Logs and Metrics (ULM) was also a distinguishing factor. The inclusion of metrics meant that they didn’t have to employ yet another tool that would likewise have to be managed. “Logs are what you look at when something goes wrong. But metrics are really cool.” Ben summarized the value and benefits they achieved this way:

Logs – Reduced operational burden. Reduced cost. Increased confidence in log integrity. Able to reduce the number of people needing VPN. Alerting based on searches did not need ops handholding.
Metrics – Increased visibility into system and application health. Used in an ongoing effort alongside application and infrastructure changes that allowed them to substantially reduce their monthly AWS bill.

Ben then moved into a hands-on session, showing how they automate the configuration and installation of Sumo Logic collectors, and how they tag their data using source categories. Cloud Cruiser currently collects data from the following sources: Chef (automation of config and collector install), application Graphite metrics from Dropwizard, and other Graphite metrics forwarded by Sensu to Sumo Logic. “When I search for something I want to know what environment is it, what type of log is it, and which server role did it come from.” One of their decisions was to differentiate log data from metrics data, as shown below.
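The schema figure itself isn’t reproduced here, so as a purely illustrative sketch (not Cloud Cruiser’s actual configuration), here is how an environment/type/role scheme like the one described could be written into an installed collector’s local source file. The field names follow Sumo Logic’s JSON source format for local file sources; the log path, environment and role values are invented, and in the talk those values came from Chef attributes.

import json

# Hypothetical values; in Cloud Cruiser's setup these would come from Chef attributes.
env = "prod"      # environment
role = "api"      # Chef role of the node

sources = {
    "sources": [
        {
            "sourceType": "LocalFile",
            "name": f"{role}-app-logs",
            "pathExpression": "/var/log/app/*.log",   # assumed application log path
            "category": f"{env}/logs/{role}",         # e.g. prod/logs/api
        }
    ]
}

# An installed collector can read source definitions from a local JSON file;
# point the collector's sumo.conf at wherever you write this. Metrics sources
# would follow the same naming idea under a parallel prefix such as prod/metrics/api.
with open("sources.json", "w") as fh:
    json.dump(sources, fh, indent=2)

Searches and metrics queries can then slice on the source category prefix, which is what makes the environment/type/role convention useful in practice.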
Using this schema allows them to search logs and metrics by environment, type of log data and corresponding Chef role. Ben walked through the Chef cookbook they used for deploying with Chef and shared how they automate the configuration and installation of Sumo Logic collectors. For those interested, I’ll follow up on this in the DevOps Blog. A key point from Ben, though, was “Don’t log secrets.” The access ID and key should be defined elsewhere, out of scope and stored in an encrypted data bag. Ben also walked through the searches they used to construct the following dashboard. Through this one dashboard, Cloud Cruiser can utilize both metrics and log data to get an overview of the health of their production deployment.

Key Takeaways

Designing your data analytics strategy is highly dependent on your architecture. It’s no longer just about troubleshooting issues in production environments; ultimately, it’s about understanding the experience you provide to your users. The variety of data streaming in real time from the application, operating environment and network layers produces an ever-increasing volume of data every day. Log analytics provides the forensic data you need, and time-series-based metrics give you insight into the real-time changes taking place under the hood. To understand both the health of your deployment and the behavior and experience of your customers, you need to gather machine data from all of its sources, then apply both logs and metrics to give teams from engineering to marketing the insights they need. Download the slides and view the entire presentation below:

Blog

Evident.io: Visualize, Analyze and Report on Security Data From AWS

Evident.io and Sumo Logic team up to provide seamless, integrated visibility into compliance monitoring and risk attribution. Analyzing and visualizing all your security data in one place can be a tricky undertaking. For any SOC, DevSecOps or DevOps team in a heterogeneous environment, the number of tools in place to gain visibility and monitor compliance can be daunting. The good news is that Evident.io and Sumo Logic have teamed up to bring you a simple-to-implement yet effective integration that allows you to perform additional analytics and visualization of your Evident Security Platform data in the Sumo Logic analytics platform. Evident.io ESP is an agentless, cloud-native platform focused on comprehensive, continuous security assessment of the control plane for AWS cloud infrastructure services. ESP can monitor all AWS services available through the API, ensuring their configurations are in line with AWS best practices for security as well as your organization’s specific compliance requirements. Sumo Logic is a leading SaaS-native machine data analytics service for log management and time-series metrics. Sumo Logic allows you to aggregate, perform statistical analytics, report on trends, visualize and alert on all your operational, performance and security-related event log data in one place, from just about any data source.

Why integrate with Sumo Logic?

Both of these platforms are architected for the cloud from the ground up and have a solid DevOps pedigree. This integration allows you to aggregate all the data generated by your AWS cloud infrastructure in the same place as your application-level security and performance event data, which allows you to perform attribution on a number of levels. The Evident.io alert data is rich with configuration state data about your security posture with regard to AWS best practices for security and the CIS Benchmarks for AWS. As customers adopt CI/CD concepts, being able to quickly visualize, alert and remediate, in near real time, any vulnerabilities introduced by misconfiguration is critical. Evident.io and Sumo Logic combined can help you do this better and faster. And, best yet, it is super easy to get started with Evident.io and Sumo Logic in a matter of minutes.

The Sumo Logic App for Evident.io ESP

The Sumo Logic App for Evident.io ESP enables a user to easily and quickly report on key metrics from their AWS cloud infrastructure, such as: trend analysis of alerts over time (to track improving or deteriorating posture); time to resolve alerts (for SLAs, by tracking the start and end of an alert in one report); a summary of unresolved alerts and risks; and the number of risks found by security signatures over time. Below are some screenshots from the Sumo Logic App for Evident.io ESP. Figure 1 is an overview of the types and severity of risks, alert status, and how long before a risk is resolved and marked as ended on the Evident.io side. This can be an important metric when managing to SLAs. Fig. 1 Figure 2 provides a detailed view of the risks identified by Evident.io ESP within the configured time range for each of the dashboard panels. The panels present views into which Evident.io ESP signatures triggered the risks, along with a breakdown of risks identified by AWS region, risks by AWS account, the number of total identified risks and the number of newly identified risks. Fig. 2 The chart in Fig. 3 below is an interesting one that shows identified risks clearly trending down over 14 days.
This indicates that the teams are remediating the issues identified in the Evident.io ESP alerts, and you can clearly see an improvement in the security posture of this very large AWS environment, which has thousands of instances. Note: there are almost no high-severity risks in this environment. Fig. 3

Is my data secure?

These two platforms do an awesome job of securing your data both in flight and at rest, using TLS 1.2 encryption for in-flight data and customer-specific 256-bit AES encryption keys for data at rest. You can be confident that this data is securely transported from the Evident Security Platform to Sumo Logic and stored in a secure fashion.

How can I gain access?

This integration relies on the use of AWS SNS (Simple Notification Service) and a Sumo Logic native HTTPS collector. If you are both an Evident.io and a Sumo Logic customer, you can enable and start to benefit from the integration using the directions here: http://help.sumologic.com/Special:Search?qid=&fpid=230&fpth=&path=&search=evident.io or http://docs.evident.io/#sumo. Note that you will need access to both Evident.io and Sumo Logic instances. Security and compliance monitoring are no longer a bottleneck in your agile environment. You can start visualizing the data from Evident Security Platform (ESP) in Sumo Logic in a matter of minutes. This blog post was written by Hermann Hesse, Senior Solutions Architect at Evident.io. He can be reached at https://www.linkedin.com/in/hermann-hesse-a040281
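For readers wiring this up themselves, the SNS side of the integration can be roughly sketched as follows. This is a hedged illustration rather than the official procedure (follow the documentation links above for that); it assumes you have already created a hosted HTTP source in Sumo Logic and have its unique URL, and the topic name below is hypothetical.

import boto3

sns = boto3.client("sns", region_name="us-east-1")

# Hypothetical topic that Evident.io ESP is configured to publish alerts to.
topic = sns.create_topic(Name="esp-alerts")

# Subscribe the Sumo Logic hosted HTTP(S) source to the topic. The endpoint URL
# is the one Sumo Logic generated for your HTTP source; the token is a placeholder.
sns.subscribe(
    TopicArn=topic["TopicArn"],
    Protocol="https",
    Endpoint="https://collectors.sumologic.com/receiver/v1/http/YOUR_SOURCE_TOKEN",
)

SNS will send a subscription-confirmation message to that endpoint; the docs linked above describe how the confirmation is handled on the Sumo Logic side.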

AWS

November 30, 2016

Blog

CDN with AWS CloudFront - Tutorial

Blog

AWS – The Biggest Supercomputer in the World

AWS is one of the greatest disruptive forces in the entire enterprise technology market. Who would have thought, when it launched in 2006, that it would kick off perhaps the most transformative shift in the history of the $300B data center industry? Over 25,000 people (roughly 0.0003% of the world’s population) are descending on Vegas this week to learn more about AWS, the biggest supercomputer in the world. As we get ready to eat, drink, network and learn, I wanted to share responses to questions I often get from prospects, reporters and folks I meet at conferences around the country.

What advice would you pass on to anyone deciding to use AWS for public cloud storage?

Understand the IaaS provider’s shared security model. In Amazon’s case, AWS is responsible for the infrastructure; the customer is responsible for the security of everything that runs on that infrastructure – the applications, the workloads and the data. Make sure any additional services you use on top of that infrastructure pursue their own security certifications and attestations to protect data at rest and in motion. This will allay fears and give people comfort in sending data through a SaaS-based service. We find that organizations make different decisions based on the trust level they have with their partners, and we at Sumo Logic take this very seriously, investing millions to achieve and maintain these competitive differentiators on an ongoing basis. Too many people try to live vicariously through the certifications AWS has and pass that on as adequate. Also, understand the benefits you are hoping to achieve before you start (e.g., better pricing and reduced cost; easier budget approvals (CAPEX vs. OPEX); increased business agility; increased flexibility and choice of programming models, OS, DB and architectures that make sense for the business; increased security; increased workload scalability and elasticity).

How can we maximize AWS’s value?

Crawl, walk, run – it is a learning curve that will take time to master. Adopt increasing levels of services as your teams get up to speed and understand how to leverage APIs and automate everything through code. Compute as code is now a reality. Understand the pain points you are trying to address – this will dictate your approach (e.g., pricing/cost/budget, internal politics, control of data locality, sovereignty, security, compliance). And turn on logging within AWS: activate Amazon CloudWatch to log all your systems, applications and services, and activate AWS CloudTrail to log all API actions (a minimal sketch of enabling CloudTrail appears at the end of this post). This will provide visibility into all user actions on AWS. The lack of visibility into cloud operations and controls stands as the largest security issue we see.

What cautions might there be about paying more than you should, or not really getting full value out of this type of storage?

Understand that not all data is created equal in terms of importance, frequency of access, life expectancy, retention requirements, and search performance. Compare operational data (high importance, high frequency of access, short life expectancy, high search performance requirements) to audit data (medium importance, lower frequency of access, longer life expectancy and data retention requirements, low performance requirements). Align your storage choices to the value and urgency of the data that you are logging (S3, S3 Infrequent Access, Glacier, EBS, etc.).
Look for solutions and tools that are cloud native, so you can avoid unnecessary data egress costs. Ten years ago, no one was virtualizing mission-critical workloads because of security and compliance concerns, but we ended up there anyway. The same thing is happening with cloud. In this new world, speed and time to market are everything. Organizations are looking to be more flexible and more agile and to capitalize on business opportunities, and the way you approach security is different. To support the rapid pace of delivery of these digital initiatives – weekly, even daily – these companies are leveraging modern, advanced IT infrastructures like AWS and Sumo Logic. In this new world, we at Sumo Logic have a tremendous opportunity to help operations and security professionals get the visibility they need as those workloads move out to the cloud. We help them become cloud enablers who drive the business forward, rather than naysayers. Visibility is everything! Come stop by our booth – #604 – and say hi!
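As a footnote to the “turn on logging” advice above, here is a minimal boto3 sketch of enabling CloudTrail. The trail and bucket names are placeholders, and the bucket needs a policy that allows CloudTrail to write to it.

import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

# Create a trail that records API activity into an S3 bucket you own.
cloudtrail.create_trail(
    Name="org-audit-trail",                    # placeholder trail name
    S3BucketName="my-cloudtrail-logs-bucket",  # placeholder bucket; policy must allow CloudTrail writes
    IsMultiRegionTrail=True,
    IncludeGlobalServiceEvents=True,
)

# A trail records nothing until logging is started.
cloudtrail.start_logging(Name="org-audit-trail")

From there, the trail’s JSON events land in S3, where a log analytics service can pick them up.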

Blog

Advanced Security Analytics for AWS

Every company – if it is going to remain relevant – is going through some form of digital transformation today, and software is at the heart of this transformation. According to a report by the Center for Digital Business Transformation, digital disruption will displace approximately 40% of incumbent companies within the next five years. Don’t believe it? According to Forrester Research, between 1973 and 1983, 35% of the top 20 F1000 companies were new. Jump forward 20 years, and this number increases to 70%. According to predictions from IDC’s recent FutureScape for Digital Transformation, two-thirds of Global 2000 companies will have digital transformation at the center of their corporate strategy by next year, and by 2020, 50% of the Global 2000 will see the majority of their business depend on their ability to create digitally enhanced products, services, and experiences. So what does this all mean? Keeping pace with the evolving digital marketplace requires not only increased innovation, but also updated systems, tools, and teams. Accenture and Forrester Research reported in their Digital Transformation in the Age of the Customer study that only 26% of organizations considered themselves fully operationally ready to execute against their digital strategies. In order to deliver on the promise of digital transformation, organizations must also modernize their infrastructure to support the increased speed, scale, and change that comes with it.

We see three characteristics that define these modern applications and digital initiatives. First, they follow a DevOps or DevSecOps culture, where the traditionally siloed walls between the Dev, Ops and Security teams are becoming blurred or going away completely; this enables speed, flexibility and agility. Second, they generally run on modern infrastructure platforms like AWS (see the AWS Modern Apps Report), leveraging APIs and compute as code (see AWS – The Biggest Supercomputer in the World). Third, the way you approach security needs to change. You need deep visibility and native integrations across the AWS services you use, you need to understand your risks and security vulnerabilities, and you need to connect the dots between the services used: what users are doing, where they are coming from, what they are changing, how those changes relate to one another, and how this impacts network flows and security risks. It is also important to be able to match information contained in your AWS log data – IP addresses, ports, user IDs, etc. – from services like CloudTrail and VPC Flow Logs with known Indicators of Compromise (IOCs) that are out in the wild, sourced from premium threat intelligence providers like CrowdStrike (a toy matching sketch appears at the end of this post). Pulling global threat intelligence into Sumo Logic’s next-gen cloud security analytics for AWS accomplishes the following: it increases the velocity and accuracy of threat detection; it adds context to log data and helps to identify and visualize malicious IP addresses, domain names, ports, email addresses, URLs, and more; and it improves your security and operational posture by accelerating the time to identify and resolve security threats (IOCs). Come stop by our booth – #604 – for a demo and say hi!
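To make the IOC-matching idea above concrete, here is a toy Python sketch (not Sumo Logic’s implementation) that checks the source and destination addresses in VPC Flow Log records against a small set of known-bad IPs. The field positions follow the default VPC Flow Log format, and the IOC addresses are invented documentation-range values.

# Toy correlation of VPC Flow Log records against a threat-intel IOC list.
known_bad_ips = {"203.0.113.7", "198.51.100.23"}   # invented IOCs

def flagged(flow_log_line: str):
    """Return (srcaddr, dstaddr) if either side matches an IOC, else None."""
    fields = flow_log_line.split()
    # Default VPC Flow Log format: version, account-id, interface-id,
    # srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, ...
    src, dst = fields[3], fields[4]
    if src in known_bad_ips or dst in known_bad_ips:
        return src, dst
    return None

sample = "2 123456789012 eni-abc123de 203.0.113.7 172.31.16.139 443 49152 6 20 4249 1418530010 1418530070 ACCEPT OK"
print(flagged(sample))   # -> ('203.0.113.7', '172.31.16.139')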

AWS

November 29, 2016

Blog

Getting Started with AWS EC2 Container Service (ECS)

Blog

Starting Fresh in AWS

Many folks we speak to ask the question: “How do I get started in AWS?” The answer used to be simple. There was a single service for compute, one for storage, and a few other services in early trials. Fast forward 10+ years, and AWS now offers over 50 services. Taking your first steps can be daunting. What follows is my recommended approach if you’re starting fresh in the AWS Cloud and don’t have a lot of legacy applications and deployments weighing you down. If that isn’t you, check out the companion post to this one.

Do Less To Do More

Everything in AWS operates under a Shared Responsibility Model. The model simply states that for each of the areas required for day-to-day operations (physical, infrastructure, virtualization, operating system, application, and data), someone is responsible. That someone is either you (the user) or AWS (the service provider). (In the shared responsibility diagram, the light grey areas are the responsibility of AWS and the black areas are the user’s.) The workload shifts towards the service provider as you move away from infrastructure services (like Amazon EC2) towards abstract services (like AWS Lambda). As a user, you want AWS to do more of the work. This directs your service choice as you start to build in the AWS Cloud. You want to pick more and more of the services that fall under the SaaS or abstract — which is a more accurate term when compared to SaaS — category.

Computation

If you need to run your own code as part of your application, you should be making your choice based on doing less work. This means starting with AWS Lambda, a service that runs your functions directly without worrying about the underlying frameworks or operating system (a minimal handler sketch appears at the end of this post). If Lambda doesn’t meet your needs, try using a Docker container running on the Amazon EC2 Container Service (ECS). The advantage of this service is that it configures the underlying EC2 instance (the OS, Docker host, scheduling, etc.) and lets you worry only about the application container. If ECS can’t meet your needs, see if you’re a fit for AWS Elastic Beanstalk. This is a service that takes care of provisioning, capacity management, and application health for you (a.k.a. you do less work). All of this runs on top of Amazon EC2; so do Lambda and ECS, for that matter. If all else fails, it’s time to deploy your own instances directly in EC2. The reason you should avoid this as much as possible is the simple fact that you’re responsible for the management of the operating system, any applications you install, and — as always — your data. This means you need to keep on top of patching your systems, hardening them, and configuring them to suit your needs. The best approach here is to automate as much of this operational work as possible (see our theme of “do less” repeating?). AWS offers a number of services and features to help in this area as well (start with EC2 AMIs, AWS CodeDeploy, AWS CodePipeline, and AWS OpsWorks).

Data Storage

When it comes to storing your data, the same principle applies: do less. Try to store your data in services like Amazon DynamoDB, because the entire underlying infrastructure is abstracted away for you. You get to focus purely on your data. If you just need to store simple file objects, Amazon S3 is the place to be. In concert with Amazon Glacier (long-term storage), you get the simplest version of storage possible. Just add an object (key) to a bucket and you’re all set. Under the covers, AWS manages all of the moving parts in order to get you 11 9’s of durability.
This means that about 0.000000001% of objects stored in the service may experience data corruption. That’s a level of quality that you simply cannot get on your own. If you need more control or custom configurations, other services like the Amazon Elastic File System or EBS volumes in EC2 are available. Each of these technologies comes with more operational overhead. That’s the price you pay for customization.

Too Many Services

Due to the sheer number of services that AWS provides, it’s hard to get a handle on where to start. Now that you know your guiding principle, it might be worth looking at the AWS Application Architecture Center. This section of the AWS site contains a number of simple reference architectures that provide solutions to common problems. Designs for web application hosting, batch processing, media sharing, and others are all available. These designs give you an idea of how these design patterns are applied in AWS and the services you’ll need to become familiar with. It’s a simple way to find out which services you should start learning first. Pick a design that meets your needs and start learning the services that the design is composed of.

Keep Learning

AWS does a great job of providing a lot of information to help get you up to speed. Their “Getting Started with AWS” page has a few sample projects that you can try under the free tier. Once you start to get your footing, the whitepaper library is a great way to dive deeper on certain topics. In addition, all of the talks from previous Summits (one- to two-day free events) and AWS re:Invent (the major user conference) are available for viewing on the AWS YouTube channel. There are days and days of content for you to watch. Try to start with the most recent material, as a lot of the functionality has changed over the years. But basic, 101-type talks are usually still accurate.

Dive In

There is so much to learn about AWS that it can be paralyzing. The best advice I can give is to simply dive in. Find a simple problem that you need to solve, do some research, and try it out. There is no better way to learn than doing. Which leads me to my last point: the community around AWS is fantastic. AWS hosts a set of very active forums where you can post a question and usually get an answer very quickly. On top of that, the usual social outlets (Twitter, blogs, etc.) are a great way to engage with others in the community and to find answers to your pressing questions. While this post has provided a glimpse of where to start, be sure to read the official “Getting Started” resources provided by AWS. There’s also a great community of training providers (plus the official AWS training) to help get you up and running. Good luck and happy building! This blog post was contributed by Mark Nunnikhoven, Vice President, Cloud Research at Trend Micro. Mark can be reached at https://ca.linkedin.com/in/marknca.
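A quick postscript on the “do less” compute recommendation above: the smallest unit of deployment in AWS Lambda is just a handler function. A minimal Python sketch follows; the “name” field in the event payload is hypothetical.

# Minimal AWS Lambda handler (Python runtime). AWS owns the OS, patching and
# scaling; you own only this function and its data.
def lambda_handler(event, context):
    name = event.get("name", "builder")   # 'name' is a made-up event field
    return {"statusCode": 200, "body": f"Hello, {name}!"}

Compare that with the list of responsibilities you take on when running your own EC2 instances, and the “do less” principle is easy to see.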

AWS

November 21, 2016

Blog

Getting Started Under Legacy Constraints in AWS

Getting started in AWS used to be simple. There was a single service for compute, one for storage, and a few other services in early trials. Fast forward 10+ years, and AWS now offers over 50 services. Taking your first steps can be daunting. What follows is my recommended approach if you already have a moderate or large set of existing applications and deployments that you have to deal with and want to migrate to the AWS Cloud. If you’re starting fresh in the AWS Cloud, check out the companion post to this one.

Do Less To Do More

Everything in AWS operates under a Shared Responsibility Model. The model simply states that for each of the areas required for day-to-day operations (physical, infrastructure, virtualization, operating system, application, and data), someone is responsible. That someone is either you (the user) or AWS (the service provider). (In the shared responsibility diagram, the light grey areas are the responsibility of AWS and the black areas are the user’s.) The workload shifts towards the service provider as you move away from infrastructure services (like Amazon EC2) towards abstract services (like AWS Lambda). As a user, you want AWS to do more of the work. This should direct your service choice as you start to build in the AWS Cloud. Ideally, you want to pick more and more of the services that fall under the SaaS or abstract — which is a more accurate term when compared to SaaS — category. But given your existing constraints, that probably isn’t possible. So you need to start where you can see some immediate value, keeping in mind that future projects should aim to be “cloud native.”

Start Up The Forklift

The simplest way to get started in AWS under legacy constraints is to forklift an existing application from your data centre into the AWS Cloud. For most applications, this means you’re going to configure a VPC and deploy a few EC2 instances and an RDS instance (ideally as a Multi-AZ deployment). To make sure you can expand this deployment, leverage a tool like AWS OpsWorks to automate the deployment of the application onto your EC2 instances. This will make it a lot easier to repeat your deployments and to manage your Amazon Machine Images (AMIs). Migrating your data is extremely simple now as well. You’re going to want to use the AWS Database Migration Service to move the data and the database configuration into RDS.

Second Stage

Now that your application is up and running in the AWS Cloud, it’s time to start taking advantage of some of the key features of AWS. Start exploring the Amazon CloudWatch service to monitor the health of your application. You can set alarms to warn of network bandwidth constraints, high CPU usage, and storage space on your instances starting to get a little cramped (see the sketch below). With monitoring in place, you can now adjust the application’s configuration to support auto scaling and to sit behind a load balancer (either the classic ELB or the new ALB). This is going to provide some much-needed resiliency to your application. It’s automated, so you’re going to start to realize some of the benefits of AWS and reduce the operational burden on your teams at the same time. These few simple steps have started your team down a sustainable path of building in AWS. Even though these features and services are just the tip of the iceberg, they’ve allowed you to accomplish some very real goals. Namely, having a production application working well in the AWS Cloud! On top of that, auto scaling and CloudWatch are great tools to help show teams the value you get by leveraging AWS services.
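Here is the alarm sketch referenced in the “Second Stage” section above, using boto3. The instance ID, SNS topic and threshold are placeholders for illustration, not recommendations.

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Alarm when average CPU on a forklifted instance stays above 80% for two
# consecutive five-minute periods. All identifiers below are placeholders.
cloudwatch.put_metric_alarm(
    AlarmName="forklifted-app-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)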
Keep Going

With a win under your belt, it’s a lot easier to convince teams to build natively in AWS. Applications that are built from the ground up to take advantage of abstract services in AWS — like Amazon Redshift, Amazon SQS, Amazon SNS, AWS Lambda, and others — will let you do more for your users with less effort on your part. Teams with existing constraints usually have a lot of preconceived notions about how to build and deliver IT services. To truly get the most out of AWS, you have to adopt a new approach to building services. Use small wins and a lot of patience to help convince hesitant team members that this is the best way to move forward.

Too Many Services

Due to the sheer number of services that AWS provides, it’s hard to get a handle on where to start. Now that you know your guiding principle, it might be worth looking at the AWS Application Architecture Center. This section of the AWS site contains a number of simple reference architectures that provide solutions to common problems. Designs for web application hosting, batch processing, media sharing, and others are all available. These designs give you an idea of how these design patterns are applied in AWS and the services you’ll need to become familiar with. It’s a simple way to find out which services you should start learning first. Pick a design that meets your needs and start learning the services that the design is composed of.

Keep Learning

AWS does a great job of providing a lot of information to help get you up to speed. Their “Getting Started with AWS” page has a few sample projects that you can try under the free tier. Once you start to get your footing, the whitepaper library is a great way to dive deeper on certain topics. In addition, all of the talks from previous Summits (one- to two-day free events) and AWS re:Invent (the major user conference) are available for viewing on the AWS YouTube channel. There are days and days of content for you to watch. Try to start with the most recent material, as a lot of the functionality has changed over the years. But basic, 101-type talks are usually still accurate.

Dive In

There is so much to learn about AWS that it can be paralyzing. The best advice I can give is to simply dive in. Find a simple problem that you need to solve, do some research, and try it out. There is no better way to learn than doing. Which leads me to my last point: the community around AWS is fantastic. AWS hosts a set of very active forums where you can post a question and usually get an answer very quickly. On top of that, the usual social outlets (Twitter, blogs, etc.) are a great way to engage with others in the community and to find answers to your pressing questions. While this post has provided a glimpse of where to start, be sure to read the official “Getting Started” resources provided by AWS. There’s also a great community of training providers (plus the official AWS training) to help get you up and running. Good luck and happy building! This blog post was contributed by Mark Nunnikhoven, Vice President, Cloud Research at Trend Micro. Mark can be reached at https://ca.linkedin.com/in/marknca.

AWS

November 21, 2016

Blog

5 Patterns for Better Microservices Architecture

Microservices have become mainstream in building modern architectures. But how do you actually develop an effective microservices architecture? This post explains how to build an optimal microservices environment by adhering to the following five principles:

Cultivate a Solid Foundation
Begin With the API
Ensure Separation of Concerns
Production Approval Through Testing
Automate Deployment and Everything Else

Principle 1: Great Microservices Architecture is Based on a Solid Foundation

No matter how great the architecture, if it isn’t based on a solid foundation, it won’t stand the test of time. Conway’s law states that “…organizations that design systems … are constrained to produce designs which are copies of the communication structures of these organizations…” Before you can architect and develop a successful microservice environment, it is important that your organization and corporate culture can nurture and sustain a microservice environment. We’ll come back to this at the end, once we’ve looked at how we want to design our microservices.

Principle 2: The API is King

Unless you’re running a single-person development shop, you’ll want to have an agreed-upon contract for each service. Actually, even with a single developer, having a specific set of determined inputs and outputs for each microservice will save you a lot of headaches in the long run. Before the first line of code is typed, determine a strategy for developing and managing API documents. Once your strategy is in place, focus your efforts on developing and agreeing on an API for each microservice you want to develop. With an approved API for each microservice, you are ready to start development, and begin reaping the benefits of your upfront investment.

Principle 3: Separation of Concerns

Each microservice needs to have and own a single function or purpose. You’ve probably heard of separation of concerns, and microservices are prime examples for the application of that principle. Additionally, if your microservice is data-based, ensure that it owns that data, and exists as the sole access point for that data. As additional requirements come to light, it can be very tempting to add an additional endpoint to your service that kind of does the same thing (but only kind of). Avoid this at all costs. Keep your microservices focused and pure, and you’ll avoid running into the nightmare of trying to remember which service handled that one obscure piece of functionality.

Principle 4: Test-Driven Approval

Back in the old days, when you were supporting a large monolithic application, you’d schedule your release weeks or months in advance, including an approval meeting which may or may not have included a thumbs up/thumbs down vote, or fists of five to convey approval or confidence in the new release. With microservice architecture, that changes. You’re going to have a significant number of much smaller applications, and if you follow the same release process, you’ll be spending a whole lot more time in meetings. Therefore, if you’re implementing test-driven development (TDD), writing comprehensive contract and integration tests as you develop each application, you’ll finish your service up with a full test suite which you can automate as part of your build pipeline. Use these tests as the basis for your production deployment approval process, rather than relying on the approval meetings of yore.
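One way to make Principles 2 and 4 concrete is a small contract test that runs in the build pipeline and gates deployment. The sketch below uses pytest and requests against a hypothetical orders service; the endpoint, port and response fields are invented, and your own contract would obviously differ.

# A tiny contract test, runnable with pytest, for a hypothetical orders service.
import requests

BASE_URL = "http://localhost:8080"   # assumed local or staging endpoint

def test_get_order_matches_contract():
    resp = requests.get(f"{BASE_URL}/orders/42")
    assert resp.status_code == 200
    body = resp.json()
    # The agreed-upon contract for this example: these fields must be present.
    assert set(body) >= {"id", "status", "total"}
    assert isinstance(body["id"], int)

A suite of tests like this, run automatically on every merge, is what replaces the approval meeting.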
Principle 5: Automate, Automate, Automate

As developers and engineers, we’re all about writing code which can automate and simplify the lives of others. Yet, too often, we find ourselves trapped in a world of manual deployments, manual testing, manual approval processes and change management. Automating these processes when it comes to microservices is less a convenience and more of a necessity, especially as your code base and repertoire of microservices expands and matures. Automate your build pipelines so that they trigger as soon as code is merged into the master branch. Automate your tests, static code analysis, security scans and any other process which you run your code through, and then, on condition of all checks completing successfully, automate the deployment of the updated microservice into your environment. Automate it all! Once your microservice is live, ensure that you have configured a means by which the service can be automatically configured as well. An automated configuration process has the added benefit of making it easier to both troubleshoot and remember where it was that you set that “obscure” property for that one microservice.

Conclusion: Back to the Foundation

At the start of this article, I mentioned Conway’s law, and said I’d come back to talking about what kind of organization you need in order to facilitate successful microservice development. This is what your organization should look like: each team should have a specific aspect of the business on which to focus, and be assigned development of functionality within that area. Determine the interfaces between teams right up front. The key is to encourage active and engaging communication and collaboration between the teams; doing so will help to avoid broken dependencies and help to ensure that everyone remains focused. Empower teams with the ability to make their own decisions and own their services. Common sense may dictate some basic guidelines, but empowering teams is a lot like automating processes: it’ll both fuel innovation and make your life easier. Now, with all that said, you probably don’t have the luxury of building a brand new development department from the ground up. More likely, you’re going to be trying to implement changes with a group of folks who are used to doing things in a different way. Just as with microservices, or any software project for that matter, focus on incremental improvements. Determine an aspect of the culture or the organization that you’d like to change, determine how to implement that change and how to measure success, and test it out. When you achieve success, pick another incremental change to work on, and when you fail, try a different approach or a different aspect of the business to improve.

About the Author

Mike Mackrory is a global citizen who has settled down in the Pacific Northwest – for now. By day he works as a Senior Engineer on a Quality Engineering team and by night he writes, consults on several web-based projects and runs a marginally successful eBay sticker business. 5 Patterns for Better Microservices Architecture is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Sumo Logic Brings Machine Data Analytics to AWS Marketplace

Sumo Logic’s founders recognized early on that, in order to remain competitive in an increasingly disruptive world, companies would be moving to the cloud to build what are now called modern apps. Hence, Sumo Logic purposefully architected the Sumo Logic machine data analytics platform from the ground up on Amazon Web Services. Along the way, Sumo Logic has acquired vast knowledge and expertise not only in log management overlaid with metrics, but in the inner workings of the services offered by Amazon. Today, more than six years later, we are pleased to announce that Sumo Logic is one of a handful of initial AWS partners participating in the launch of SaaS Subscription products on the Amazon Web Services (AWS) Marketplace, and the immediate availability of the Sumo Logic platform in AWS Marketplace. Now, customers already using AWS can leverage Sumo Logic’s expertise in machine data analytics to visualize and monitor workloads in real time, identify issues and expedite root cause analysis to improve operational and security insights across AWS infrastructure and services.

How it Works

AWS Marketplace is an online store that allows AWS customers to procure software and services directly in the marketplace and immediately start using those services. Billing runs through your AWS account, allowing your organization to consolidate billing for all AWS services, SaaS subscriptions and software purchased through the Marketplace. To get started with Sumo Logic in the AWS Marketplace, go to the Sumo Logic page. You should see a screen similar to the following.

Pricing

As you can see, pricing is clearly marked next to the product description. Pricing is based on several factors, starting with which edition of Sumo Logic you’re using – Professional or Enterprise. Professional edition supports up to 20 users and 30 days of data retention, among other features. Enterprise edition includes support for more than 20 users and multi-year data retention as part of its services. See the Sumo Logic Pricing page for more information.

Reserved Log Ingest

Once you’ve decided on an edition, you’re ready to select the plan that’s best for you based on your anticipated ingest volume. Reserved Log Ingest Volume is the amount of logs you have contracted to send each day to the Sumo Logic service. The reserved price is how much you pay per GB of logs ingested each day. During signup, you can select a reserved capacity in GB per day (see below). There are no minimum days, and you can cancel at any time.

On-Demand Log Ingest

Bursting is allowed; at the end of the billing cycle, you pay the on-demand rate for any usage beyond the total reserved capacity for the period. Your first 30 days of service usage are FREE.

Signing up

When you click Continue, you’ll be taken to a Sumo Logic signup form similar to Figure 2. Enter your email address, then click Plan to select your Reserved Log Ingest volume. Plans are available in increments of 1, 3, 5, 10, 20, 30, 40 and 50 GB per day. Once you’ve selected your plan, click the signup button to be taken through the signup process. Recall that billing is managed through AWS, so no credit card is required.

What You Get

If you’re not already familiar with Sumo Logic, the platform unifies logs, metrics and events, transforming a variety of data types into real-time continuous intelligence across the entire application lifecycle, enabling organizations to build, run and secure their modern applications.
Highlights of Sumo Logic include:

Unified logs and metrics
Machine learning capabilities like LogReduce and LogCompare to quickly identify root cause
Elasticity and bursting support without over-provisioning
Data encryption at rest, PCI DSS 3.1 with log immutability, and HIPAA compliance at no additional cost
Zero log infrastructure management overhead

Go to sumologic.com for more information.

Sumo Logic Apps for AWS

As mentioned, Sumo Logic has tremendous expertise in AWS, and experience building and operating massively multi-tenant, highly distributed cloud systems. Sumo Logic passes that expertise along to its customers in the form of apps for AWS services. Sumo Logic apps for AWS contain preconfigured searches and dashboards for the most common use cases, and are designed to accelerate your time to value with Sumo Logic. Using these dashboards and searches, you can quickly get an overview of your entire AWS application at the app, system and network levels. You can quickly identify operational issues, drill down using search, and apply tools like LogReduce and LogCompare to quickly get at the root cause of the problem. You also gain operational, security and business insight into services that support your app, like S3, CloudTrail, VPC Flow and Lambda. Apps that are generally available include:

Amazon S3 Audit App
Amazon VPC Flow Logs App
Amazon CloudFront App
AWS CloudTrail App
AWS Config App
AWS Elastic Load Balancing App
AWS Lambda App

In addition, the following apps are in preview for Sumo Logic customers: Amazon CloudWatch – ELB Metrics, and Amazon RDS Metrics.

Getting Started and Next Steps

Sumo Logic is committed to educating its customers using the deep knowledge and expertise it has gained in working with AWS. If you’re new to Amazon Web Services, we’ve created AWS Hub, a portal dedicated to learning AWS fundamentals. The portal includes 101s to get you started with EC2, S3, ELB, VPC Flow and AWS Lambda. In addition, you’ll find deep-dive articles and blog posts walking you through many of the AWS service offerings. Finally, if you’re planning to attend AWS re:Invent at the end of November, stop by and get your questions answered, or take a quick tour of Sumo Logic and everything its machine learning and data analytics have to offer.

November 16, 2016

Blog

A Survival Guide for AWS re:Invent 2016

AWS re:Invent is one of the biggest events Sumo Logic takes part in each year. If you’re not familiar with it, it’s one of the hottest cloud shows, with over 25,000 attendees, more than 400 sessions, hands-on labs, boot camps and much more! We’ve been lucky to be a part of it for the last few years, and it just seems to get bigger and better. In just a few short weeks we’ll be headed back to Vegas, and we have a few things up our sleeves! Here are some tips, tricks and all the fun activities that you can expect.

As Scar from the Lion King would say… Be Prepared!

Whether you’re a veteran of re:Invent or a newbie, you can never be too prepared to survive the week. Here are some useful tips we’ve picked up along the way: Make sure to plan your schedule and session times in advance; this year AWS has introduced “Reserved Seating” for sessions so you can secure your spot (more details here). If you’re looking for a new t-shirt wardrobe, make sure to visit the expo hall. Vendors basically throw them at you. But honestly, some really cool stuff goes down in there! Check out the re:Play party on Thursday night; it’s always a different setup, and they announce the musical guest that day. Wonder who it’ll be? Hydrate, hydrate, hydrate and have fun!

Expo Hall Madness

The expo hall is a lively and happening place with over 400 vendors showcasing the best cloud services. Last year we had half a ton of sumo wrestlers that made quite a splash there, and you can be sure that we’ll bring the same excitement to re:Invent this year. Visit our booth, #604, to see how machine data analytics can give you the continuous intelligence you need to build, run and secure your modern applications. We will have a bunch of activities, plenty of swag, live demos and face time with the whole Sumo crew!

Learn from the Best

If you’re looking for practical advice on how developers and operators can leverage data analytics to gain critical information about their modern application, then you should check out our session on Wednesday at 11:00 am. Our own CTO, Christian Beedgen, and Ben Newton, Principal Product Manager, along with Cloud Cruiser, will take you in depth on Effective Application Data Analytics for Modern Applications. Make sure to reserve your seat!

Get Social with Sumo!

Whether or not you’re attending re:Invent, you can participate and follow all of the conversations from the event! We will be live blogging from relevant keynotes and sessions, posting videos and photos, and hosting a fun #SumoAfterDark photo contest (think roaming gnome). Follow us on Twitter or LinkedIn for all the latest. Check out our AWS re:Invent page for more details.

AWS

November 15, 2016

Blog

Announcing Sumo Logic's State of Modern Apps in AWS Report

Today we published Sumo Logic’s first “The State of the Modern App in AWS” report. We wanted to explore how modern applications differ from traditional applications: how are modern app workloads run, what application components are used, and which types of application services are leveraged? This report is based on anonymized data from about 1,000 customers running modern application workloads in AWS, and it examines the composition of those applications. Sumo Logic provides our customers with operational and security visibility into their applications. As such, many millions of events per second of live data are sent to our service from our customers’ full application stacks: infrastructure, application components, custom app code, as well as a variety of application services leveraged to manage the application. The fingerprint of this data gives us an unprecedented ability to answer questions about these applications. When we founded Sumo Logic, we decided to build a multi-tenant analytics service so that we could leverage cross-customer visibility to provide more value to each customer. Our architecture and the technology behind it allow us to:

Analyze anonymized statistics about technology usage to help our existing and future customers learn about the latest technology trends emerging from enterprises leading the way to digital transformation (e.g., this report)
Prioritize roadmap items in order to help our customers gain visibility into the most commonly used technologies (e.g., our integration with Docker)
Improve our machine learning algorithms by exposing them to larger data sets of the same data types across multiple customers
Derive meaning from events occurring across the same sources at multiple customers to improve operational and security outcomes

I am excited to have this report, as it is a very tangible response to a question I frequently get: what is the value of having a multi-tenant architecture? It is easy to respond quickly that we benefit from the same operating model as other all-in cloud services, such as Airbnb, Netflix, Hudl, etc. But this report also highlights another, and perhaps more important and unique, value of multi-tenancy: it gives us the ability to analyze trillions of anonymized events across our customer community and convert them into value for each individual customer as they build, run, and secure their modern application. With Sumo Logic, you are not alone. We are excited to continue deriving insights that will help our customers create better outcomes for their business.

November 15, 2016

Blog

Machine Data for the Masses

Blog

New Features to Optimize Your Scheduled Searches

Last week Sumo Logic rolled out several new enhancements to Scheduled Searches that make it easier to continuously monitor your stack and receive notifications on critical events. Scheduled searches are standard searches that have been saved and are executed on a schedule you set. As you create your scheduled search, you can configure several different alert types, including email, Script Action, ServiceNow Connection, Webhook, Save to Index, and Real-time Alerts. Once configured, the scheduled search runs continuously in the background and triggers the alert type you’ve selected. The new enhancements give you more control over the scheduling of searches and allow you to customize how and when the search results are presented. This post introduces you to these new features and shows how you can make best use of them. If you’d like a personal tour through the new scheduled-search features, register for our webinar, “Optimize Your Scheduled Searches is Here,” to be held on November 17.

Specify the Scheduled Timezone

When you schedule a search within Sumo Logic, the search is by default scheduled to run in the timezone set by the user’s preferences. Sumo Logic now allows you to specify a timezone other than your preferred timezone. When used in combination with the other scheduling options, your search will run at the time and in the timezone set within the search. Global teams that use a standard timezone like UTC will find the timezone option especially helpful, allowing all members to see the same results and avoid confusion. This option also gives the user creating the scheduled search the flexibility to make the email results relevant to recipients who may be located anywhere in the world.

Weekly Scheduling Option

Answering the call for a popular feature request, Sumo Logic has extended the scheduled-search frequency options to allow for weekly scheduling and reporting. When you select the weekly option, you can select a specific day and time of the week for your search to run.

Custom CRON Scheduling Option

For cases where the default scheduling options just don’t fit your needed schedule, Sumo Logic now allows you to input a custom CRON expression to fully customize the scheduled day and time of your search. With the custom CRON option, you can schedule your search to run monthly on a set day and hour, on certain days of the week, at specific hours within a day, or any other combination you can think of. The custom CRON scheduling uses the Quartz CRON format. For more information on formatting your CRON expression, see Using a CRON Expression in Sumo Logic. Note: in order to make sure your scheduled searches run on time, Sumo Logic does not support scheduling searches with CRON at an interval of less than 15 minutes. If you need a schedule that is more frequent than every 15 minutes, consider using the standard scheduling options of the search.

Set a Custom Email Subject

Another useful feature is that you can now specify the email subject for your Scheduled Search emails using a set of predefined placeholders along with your own text. This makes it easier to recognize, organize, and filter these emails within your inbox. Don’t want to update your existing email filters or Scheduled Searches? Don’t worry. We supply a default subject that matches the previous email format. Your existing searches will continue to use that subject until the owner of the search decides to change it.
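Circling back to the custom CRON option: Quartz expressions use seven fields (seconds, minutes, hours, day-of-month, month, day-of-week, and an optional year), and one of day-of-month or day-of-week must be “?”. A few illustrative expressions follow, collected in a small Python snippet so the annotations stay attached; verify them against the Quartz format documentation referenced above before relying on them.

# Illustrative Quartz CRON expressions (sec min hour day-of-month month day-of-week).
# Double-check against the Quartz documentation referenced above.
example_schedules = {
    "every Monday at 06:00":             "0 0 6 ? * MON",
    "midnight on the 1st of each month": "0 0 0 1 * ?",
    "08:00 and 20:00 every day":         "0 0 8,20 ? * *",
}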
Select How You Want to Receive Results Display only the information you want to see within the Scheduled Search emails by selecting which sections to include. Show or hide the results histogram image, the search query string, or the summarized results set. Additionally, you can include up to the first 1000 results from your query attached as a .CSV file. Note that there's a 5MB limit. Learn More About Scheduled Searches The new enhancements to Scheduled Searches allow you to create highly customized alert notifications, tailor your notification systems, and work seamlessly with global teams. If you'd like to learn more about using Scheduled Searches, join us for our upcoming webinar, or visit Sumo Logic DocHub for complete product documentation.
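To make the custom CRON option described above more concrete, here are two Quartz-format expressions. These are illustrative sketches only, not taken from the original post; adjust them to your own schedule and to the timezone you set on the search. Quartz fields run seconds, minutes, hours, day-of-month, month, day-of-week:

0 0 6 1 * ?     (runs the search at 06:00 on the first day of every month)
0 30 22 ? * FRI (runs the search at 22:30 every Friday)

Both expressions stay well above the 15-minute minimum interval mentioned in the note above.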

November 9, 2016

Blog

Getting Started with AWS Config Rules

Keeping track of all of your settings and configuration changes on AWS can be difficult if you try to do it by hand. Fortunately, AWS offers a service to automate the process called AWS Config. This article explains what AWS Config is, and shows you how to get started using AWS Config rules. What is AWS Config? The official description of AWS Config reads as follows: AWS Config provides a detailed view of the configuration of AWS resources in your AWS account. This includes how the resources are related to one another and how they were configured in the past so that you can see how the configurations and relationships change over time. Reading a description like that will really start the wheels turning in any SysOps professional's head. In the datacenter, many attempts were made, sometimes successfully, to manage inventory and changes. With an AWS account, there is a defined boundary in which to monitor. There is tagging on almost every resource type that can be provisioned, and everything that can be deployed fits neatly into a resource type definition. AWS Config ends up being an ideal tool to provide infrastructure visibility within an AWS account. Getting Started To begin setting up AWS Config, a tutorial is provided in the AWS Management Console. This tutorial walks the user through configuration items and establishes all components needed to begin using AWS Config. The setup allows the user to select resources in the region and configure global resources. An item worth noting is that AWS Config is set up on a region-by-region basis. The setup will also ask for an S3 bucket in which to store the AWS Config record. At the same time, an Amazon SNS topic can be set to send notifications and an IAM role can be applied to grant permission to interact with Amazon SNS and Amazon S3. The setup provides a number of suggested AWS Config template rules to select, including:
- Root-account-mfa-enabled – Ensures the root AWS account has multi-factor authentication enabled
- Cloudtrail-enabled – Checks if AWS CloudTrail (auditing of API calls) is enabled
- Eip-attached – Looks for unused Elastic IP addresses
- Encrypted-volumes – Checks to ensure that any EBS volumes that are in use are encrypted
All four, at the present time, are very useful in their own way. It is recommended that all be added to the AWS account before confirming changes. Creating Rules Once confirmed, AWS Config will provide two primary navigation items along the left side of the screen—Rules and Resources. The Rules screen is used to configure—as one might expect—rules to monitor configuration changes to resources in the AWS account. AWS provides a healthy number of template rules, covering a number of primary use cases for AWS Config. Three common template rules that should be enabled immediately include:
- Require-tags – Requires specific tags on a resource. This is especially useful in preventing orphaned EC2 instances with unknown purposes.
- Restrict-SSH – Ensures SSH has not been opened up in a Security Group that should not permit inbound SSH requests.
- Restricted-common-ports – Another rule to ensure common ports are not open to instances that should not be accessible.
AWS Config can also leverage custom rules that can be configured in the same menu. Custom rules can be triggered either by a configuration change on the AWS account or on a periodic basis. A resource type or key-value pair can be selected as a target of the rule. For even more complex rule logic, an AWS Lambda function can be invoked as well.
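The same rule management is available from the AWS CLI if you prefer to script it. The commands below are a sketch rather than part of the original walkthrough; the rule name and the JSON file holding the rule definition are illustrative placeholders:

aws configservice put-config-rule --config-rule file://required-tags-rule.json
aws configservice describe-compliance-by-config-rule --config-rule-names required-tags

The first call creates or updates a rule from a local JSON definition; the second reports whether the resources evaluated by that rule are currently compliant.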
Reviewing Configuration AWS Config provides two primary ways of consuming the configuration state of the AWS account—the Resources tab in the AWS Config console, and SNS notification. For periodic review, the Resources tab contains a couple of selectors that allow for searching either for records associated with resources or for tags. This is also the screen where announcements are made about new resource types being supported in AWS Config. The other method is to configure Amazon Simple Notification Service (SNS) to send an email or text message when a rule becomes non-compliant. One downside to either method is that neither takes steps to remediate the issue; that falls to the administrator who receives the notification. Sumo Logic App for AWS Config AWS Config provides a great way to monitor how an AWS account is configured. For any account with a lot of individuals generating a large number of changes, AWS Config is an invaluable tool to provide continuous visibility into the infrastructure. The Sumo Logic App for AWS Config adds to this by presenting notifications containing snapshots of resource configurations and information about the modifications made to a resource. The app also provides predefined Live and Interactive Dashboards and filters that give you a greater level of visibility into your environment for real-time analysis of overall usage. If you don't already have a Sumo Logic account, sign up for a Free Trial and take the app for a test drive. About the Author Over the last 10 years, Sara Jeanes has held numerous program management, engineering, and operations roles. She has led multiple team transformations and knows first-hand the pain of traditional waterfall development. She is a vocal advocate for DevOps, microservices, and the cloud as tools to create better products and services. Sara is currently a Contributor at Fixate.io and can be found on Twitter @sarajeanes. Getting Started with AWS Config Rules is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Logging Node.js Lambda Functions with CloudWatch

When AWS Lambda was introduced in 2014, the first language to be supported was Node.js. When a Node.js function is deployed as an AWS Lambda, it allows for the execution of the function in a highly scalable, event-driven environment. Because the user has no control over the hardware or the systems which execute the function, however, logging Node.js Lambda functions is of utmost importance when it comes to monitoring, as well as diagnosing and troubleshooting problems within the function itself. This post will consider the options for logging within a Node.js function. It will then briefly outline the process of uploading a Node.js function into a Lambda function, configuring the function to send the logs to CloudWatch, and viewing those logs following the execution of the Lambda. Logging with Node.js Within a Node.js function, logging is accomplished by executing the appropriate log function on the console object. Possible log statements include console.log(), console.error(), console.warn(), and console.info(). In each case, the message to be logged is included as a String argument. If, for example, the event passed to the function included a parameter called 'phoneNumber', and we validated the parameter to ensure that it contained a valid phone number, we could log the validation failure as follows: console.error("Invalid phone number passed in: " + event.phoneNumber); Sometimes the best way to learn something is to play with it. Let's create an example Lambda so we can see logging in action. First, we'll need to set up the environment to execute the function. Configuring it all in AWS You'll need access to an AWS environment in order to complete the following. All of this should be available if you are using the free tier. See Amazon for more information on how to set up a free account. Once you have an AWS environment you can access, and you have logged in, navigate to the Identity and Access Management (IAM) home page. You'll need to set up a role for the Lambda function to execute in the AWS environment. On the IAM home page, you should see a section similar to the one shown below which lists IAM Resources (the number next to each will be 0 if this is a new account). Click on the Roles label, and then click on the Create New Role button on the next screen. You'll need to select a name for the role. In my case, I chose ExecuteLambda as the role name. Remember the name you choose, because you'll need it when creating the function. Click on the Next Step button. You'll now be shown a list of AWS Service Roles. Click the Select button next to AWS Lambda. In the Filter: Policy Type box, type AWSLambdaExecute. This will limit the list to the policy we need for this role. Check the checkbox next to the AWSLambdaExecute policy as shown below, and then click on Next Step. You'll be taken to a review screen, where you can review the new policy to ensure it has what you want, and you can then click on the Create Role button to create the role. Create Your Lambda Function Navigate to the Lambda home page at https://console.aws.amazon.com/lambda/home, and click on Create Lambda Function. AWS provides a number of sample configurations you can use. We're going to be creating a very simple function, so scroll down to the bottom of the page and click on Skip. AWS also provides the option to configure a trigger for your Lambda function. Since this is just a demo, we're going to skip that as well. Click on the Next button on the bottom left corner of the page. Now we get to configure our function.
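(As an aside, and not part of the original console walkthrough: the same function can also be created from the AWS CLI once you have saved the code below as index.js and zipped it up. This is a sketch; the function name is an illustrative placeholder, <account-id> stands for your own account ID, and ExecuteLambda is the role created above.

zip function.zip index.js
aws lambda create-function --function-name phone-validator --runtime nodejs4.3 --handler index.handler --role arn:aws:iam::<account-id>:role/ExecuteLambda --zip-file fileb://function.zip

The console-based configuration continues below.)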
Under Configure function, choose a Name for your function and select the Node.js 4.3 Runtime. Under Lambda function code, choose Edit code inline from the Code entry type dropdown, and then enter the following in the Code Window.

exports.handler = function(event, context) {
  console.log("Phone Number = " + event.phoneNumber);
  if (event.phoneNumber.match(/\d/g).length != 10) {
    console.error("Invalid phone number passed in: " + event.phoneNumber);
  } else {
    console.info("Valid Phone Number.");
  }
  context.done(null, "Phone number validation complete");
}

Under Lambda function handler and role, set the Handler to index.handler, the Role to Choose an Existing role, and for Existing role, select the name of the role you created previously. Click on Next, review your entries and then click on Create function. Test the Function and View the Logs Click on the Test button, and enter the following test data. Click Save and Test and you should see the test results shown on the bottom of the screen. By default, Lambdas also send their logs to CloudWatch. If you click on the Monitoring tab, you can click on the View logs in CloudWatch link to view the logs in CloudWatch. If you return to the Lambda code page, you can click on Actions, select Configure test event and then enter different data to see how the result changes. Since the validator only checks to ensure that the phone number contains 10 numbers, you can simply try it with 555-1234 as the phone number and watch it fail. Please note that this example is provided solely to illustrate logging within a Node.js function, and should not be used as an actual validation function for phone numbers. About the Author Mike Mackrory is a Global citizen who has settled down in the Pacific Northwest – for now. By day he works as a Senior Engineer on a Quality Engineering team and by night he writes, consults on several web-based projects and runs a marginally successful eBay sticker business. When he's not tapping on the keys, he can be found hiking, fishing and exploring both the urban and rural landscape with his kids. Always happy to help out another developer, he has a definite preference for helping those who bring gifts of gourmet donuts, craft beer and/or Single-malt Scotch. Logging Node.js Lambda Functions with CloudWatch is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Troubleshooting Apps and Infrastructure Using Puppet Logs

If you're working with Puppet, there's a good chance you've run into problems when configuring a diverse infrastructure. It could be a problem with authorization and authentication, or perhaps with MCollective, Orchestration or Live Management. Puppet logs can provide a great deal of insight into the status of your apps and infrastructure across the data center. Knowing where to look is just the first step. Knowing what to look for is another matter. Here's a cheat sheet that will help you identify the logs that are most useful, and show you what to look for. I'll also explain how to connect Puppet to a log aggregation and analytics tool like Sumo Logic. Where are Puppet Logs Stored? The Puppet Enterprise platform produces a variety of log files across its software architecture. This Puppet documentation describes the file path of the following types of log files:
- Master logs: Master application logs containing information such as fatal errors and reports, warnings, and compilation errors.
- Agent logs: Information on client configuration retrieved from the Master.
- ActiveMQ logs: Information on the ActiveMQ actions on specific nodes.
- MCollective service logs: Information on the MCollective actions on specific nodes.
- Console logs: Information around console errors, fatal errors and crash reports.
- Installer logs: Information around Puppet installations, such as errors that occurred during installation, the last installation run and other relevant information.
- Database logs: Information around database modifications, errors, etc.
- Orchestration logs: Information around orchestration changes.
The root of Puppet log storage is different depending on whether Puppet is running in a Unix-like system or in a Windows environment. For *nix-based installs, the root folder for Puppet is /etc/puppetlabs/puppet. For Windows-based installs, the root folder for Puppet is C:\ProgramData\PuppetLabs\puppet\etc for all versions of Windows Server from 2008 onwards. Modifying Puppet Log Configuration The main setting that needs to be configured correctly to get the best from Puppet logs is the log_level attribute within the main Puppet configuration file. The log_level parameter can have the following values, with "notice" being the default value: debug, info, notice, warning, err, alert, emerg, crit. The Puppet Server can also be configured to process logs. This is done using the JVM Logback library. An XML file is created, usually named logback.xml, which the Puppet Server loads at run time. If a different filename is used, it will need to be specified in the global.conf file. The XML file allows you to override the default root logging level of 'info'. Possible levels include trace, debug, info, warn and error. For example, if you wanted to produce full debug data for Puppet, you would add the following parameter to the XML file: <root level="debug"> The Most Useful Puppet Logs Puppet produces a number of very useful log files, from basic platform logs to full application orchestration reports. The most commonly used Puppet logs include: Platform Master Logs These give generalized feedback on issues such as compilation errors, deprecation warnings, and crash/fatal termination. They can be found at the following locations:
- /var/log/puppetlabs/puppetserver/puppetserver.log
- /var/log/puppetlabs/puppetserver/puppetserver-daemon.log
Application Orchestration Logs Application orchestration is probably the single most attractive aspect of the Puppet platform.
It enables the complete end-to-end integration of the DevOps cycle into a production software application. As a result, these logs are likely to be the most critical logs of all. They include:
- /var/log/pe-mcollective/mcollective.log – This log file contains all of the log entries that affect the actual MCollective platform. This is a good first place to check if something has gone wrong with application orchestration.
- /var/lib/peadmin/.mcollective.d/client.log – A log file to be found on the client connecting to the MCollective server, the twin to the log file above, and the second place to begin troubleshooting.
- /var/log/pe-activemq/activemq.log – A log file that contains entries for ActiveMQ.
- /var/log/pe-mcollective/mcollective-audit.log – A top-level view of all MCollective requests. This could be a good place to look if you are unsure of exactly where the problem occurred so that you can highlight the specific audit event that triggered the problem.
Puppet Console Logs Also valuable are the Puppet console logs, which include the following:
- /var/log/pe-console-services/console-services.log – The main console log that contains entries for top-level events and requests from all services that access the console.
- /var/log/pe-console-services/pe-console-services-daemon.log – Low-level console event logging that occurs before the standard logback system is loaded. This is a useful log to check if the problem involves the higher-level logback system itself.
- /var/log/pe-httpd/puppet-dashboard/access.log – A log of all HTTPS access requests made to the Puppet console.
Advanced Logging Using Further Tools The inbuilt logging functions of Puppet mostly revolve around solving issues with the Puppet platform itself. However, Puppet offers some additional technologies to help visualize status data. One of these is the Puppet Services Status Check. This is both a top-level dashboard and a queryable API that provides top-level real-time status information on the entire Puppet platform. Puppet can also be configured to support Graphite. Once this has been done, a mass of useful metrics can be analyzed using either the demo Graphite dashboard provided, or using a custom dashboard. The ready-made Grafana dashboard makes a good starting point for measuring application performance that will affect end users, as it includes the following metrics by default:
- Active requests – a graphical measure of the current application load.
- Request durations – a graph of average latency/response times for application requests.
- Function calls – a graphical representation of different functions called from the application catalogue. Potentially very useful for tweaking application performance.
- Function execution time – graphical data showing how fast specific application processes are executed.
Using Puppet with Sumo Logic To get the very most out of Puppet log data, you can analyze the data using an external log aggregation and analytics platform, such as Sumo Logic. To work with Puppet logs on Sumo Logic, you simply use the Puppet module for installing the Sumo Logic collector. You'll then be able to visualize and monitor all of your Puppet log data from the Sumo Logic interface, alongside logs from any other applications that you connect to Sumo Logic. You can find open source collectors for Docker, Chef, Jenkins, FluentD and many other tools at Sumo Logic Developers on Github.
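As a footnote to the log configuration discussion above, the logback override mentioned earlier is just a standard Logback XML file. The sketch below is illustrative rather than taken from the Puppet documentation (the appender, file path, and pattern are assumptions you would adapt to your own install):

<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>/var/log/puppetlabs/puppetserver/puppetserver.log</file>
    <encoder>
      <pattern>%d %-5p [%c] %m%n</pattern>
    </encoder>
  </appender>
  <root level="debug">
    <appender-ref ref="FILE"/>
  </root>
</configuration>

Dropping the root level back to info or warn once you have finished debugging keeps the log volume manageable.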
About the Author Ali Raza is a DevOps consultant who analyzes IT solutions, practices, trends and challenges for large enterprises and promising new startup firms. Troubleshooting Apps and Infrastructure Using Puppet Logs is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Sumo Logic Launches New Open Source Site on Github Pages

This week we deployed our new Github pages site, Sumo Logic Developers, to showcase some of the cool open-source projects that are being built in and around Sumo Logic. The goal is to share the knowledge and many projects that are being built by our customers, partners and others in the wild. Aside from the official Sumo Logic repositories, there's also a lot of tribal knowledge: Our Sales Engineers and Customer Success teams work daily with our customers to solve real-world problems. Often those solutions include search queries, field extraction rules, useful regular expressions, and configuration settings that are often captured in Github and other code registries. Sumo Logic Developers aggregates this knowledge in one place under Sumo Experts. Likewise, we're seeing more organizations creating plugins and otherwise integrating Sumo Logic into their products using our REST APIs. So under Integrations you'll find a Jenkins Publisher plugin, along with tools for working with Twilio, Chef, DataDog, FluentD and many others. Sumo Logic Developers also provides everything our customers need to collect and ingest data, including APIs and documentation, educational articles and walkthroughs from our Dev Blog, and links to all of our supported apps and integrations. The project was an enjoyable break from daily tasks, and for those considering building a Github Pages site, I'd like to share my experience in building Sumo Logic Developers. If you'd like to contribute or add your project or repo to one of the lists on Sumo Logic Developers, simply reply to this thread on our community site, Sumo Dojo. Alternatively, you can ping me on Twitter @codejournalist. Deploying Sumo Logic Developers on Github Pages Sumo Logic Developers is built on Github pages using Jekyll as a static site generator. Github pages are served from a special repository associated with your User or Organization account on Github, so all the code is visible unless you make it private. Github pages supports Jekyll natively, so it's easy to run builds on your local machine and review changes on localhost before deploying to Github. Using Bootstrap I used the Bootstrap framework to make the site responsive and mobile friendly. In particular, Bootstrap CSS uses a grid system based on rows and up to 12 columns. The grid scales as the device or viewport size increases and decreases. It includes predefined classes for easy layout options, so I can specify a new row like this: <div class="row"> Within the row, I can designate the entire width of the screen like this: <div class="col-xs-12 col-sm-12 col-md-12 col-lg-12 hidden-xs"> This creates a full-screen area using all 12 columns. Note that Bootstrap CSS lets you specify different sizes for desktops, laptops, tablets and mobile phones. This last example also hides the content on extra-small mobile screens. The following creates an area of 4 units on large and medium screens (most desktops and laptops), 6 units on small screens (most tablets), and 8 units on extra-small screens (most phones): <div class="col-lg-4 col-md-4 col-sm-6 col-xs-8"> Using Jekyll and Git Jekyll is a static site generator that includes the Liquid template engine to create data-driven sites. So using Liquid's template language I was able to populate open-source projects into the site using data read from a YAML file. This feature allows our internal teams to quickly update the site without having to muck around in the code.
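For readers unfamiliar with this pattern, a Liquid loop along the following lines is what turns the YAML data into rendered HTML. This is a sketch rather than code from the actual repository (the data file name and markup are assumptions; the real site may differ), assuming the entries live in _data/projects.yml:

{% for project in site.data.projects %}
  <div class="repo">
    <a href="{{ project.url }}">{{ project.title }}</a>
    <p>{{ project.repo_description }}</p>
  </div>
{% endfor %}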
A typical entry looks like:

- title: "Sumo Report Generator"
  repo_description: "Tool that allows a user to execute multiple searches, and compile the data into a single report."
  url: "https://github.com/SumoLogic/sumo-report-generator"

We use a Github workflow, so team members can simply create a new branch off of the master, edit the title, repo_description and url, then make a pull request to merge the change. You can find all of the source code at https://github.com/SumoLogic/sumologic.github.io Contributing to the Github Pages Site At its heart, Sumo Logic Developers is really a wiki – it's informative with an amazing amount of rich content, it's based on the work of our community of developers and DevOps practitioners, and it operates on a principle of collaborative trust. It's brimming with tribal knowledge – the lessons learned from those who've travelled before you. In fact, the site currently represents:
- 30 official Sumo Logic repositories
- ~20 individually maintained repos from our customers, as well as our SE, Customer Success and Engineering teams
- 100+ Gists containing scripts, code snippets, useful regular expressions, etc.
- 3rd-party integrations with Jenkins, FluentD, Chef, Puppet and others
- 100+ blogs
- Links to our community, APIs and documentation
Our main task now is finding new projects and encouraging members like you to contribute. The goal is to empower the tribe – all of those who use Sumo Logic to build, run and secure their modern applications. If you'd like to contribute, post your project on Sumo Dojo, or ping us @sumologic. Michael is the Head of Developer Programs at Sumo Logic. You can follow him on Twitter @codejournalist. Read more of Michael's posts on Sumo Logic's DevOps Blog.

Blog

Logging S3 API Calls with CloudTrail

Blog

How We Survived the Dyn DNS Outage

What Happened? On Friday October 21st, Dyn, a major DNS provider, started having trouble due to a DDoS attack. Many companies including PagerDuty, Reddit, Twitter, and others suffered significant downtime. Sumo Logic had a short blip of failures, but stayed up, allowing our customers to continue to seamlessly use our service for monitoring and troubleshooting within their organizations. How did Sumo Logic bear the outage? Several months ago, we suffered a DNS outage and had a postmortem that focused on being more resilient to such incidents. We decided to create a primary-secondary setup for DNS. After reading quite a bit about how this should work in theory, we implemented a solution with two providers: Neustar and Dyn. This setup saved us during today's outage. I hope you can learn from our setup and make your service more resilient as well. How is a primary-secondary DNS setup supposed to work? You maintain the DNS zone on the primary only. Any update to that zone gets automatically replicated to the secondary via two methods: a push notification from the primary and a periodic pull from the secondary. The two providers stay in sync and you do not have to worry about maintenance of the zone. Your registrar is configured with nameservers from both providers. Order does NOT matter. DNS Resolvers do not know which nameservers are primary and which are secondary. They just choose between all the configured nameservers. Most DNS Resolvers choose which name server to use based on latency of the prior responses. The rest of the DNS Resolvers choose at random. If you have 4 nameservers with 1 from one provider and 3 from another, the more simplistic DNS Resolvers will split traffic 1/4 to 3/4, whereas the ones that track latency will still hit the faster provider more often. When there is a problem contacting a nameserver, DNS Resolvers will pick another nameserver from the list until one works. How to set up a primary-secondary DNS? Sign up with two different companies that provide high-speed DNS services and offer a primary/secondary setup. My recommendations are NS1, Dyn, Neustar (UltraDNS) and Akamai. Currently Amazon's Route53 does not provide transfer ability and therefore cannot support a primary/secondary setup. (You would have to change records in both providers and keep them in sync.) Slower providers will not take on as much traffic as fast ones, so you have to be aware of how fast the providers are for your customers. Configure one to be primary. This is the provider you use when you make changes to your DNS. Follow the primary provider's and secondary provider's instructions to set up the secondary provider. This usually involves whitelisting the secondary's IPs at the primary, adding notifications at the primary, and telling the secondary what IPs to use to get the transfer from the primary. Ensure that the secondary is syncing your zones with the primary. (Check on their console and try doing a dig @nameserver domain for the secondary's nameservers.) Configure your registrar with both the primary's and secondary's name servers. We found out that the order does not matter at all. Our nameserver setup at the registrar:
ns1.p29.dynect.net
ns2.p29.dynect.net
udns1.ultradns.net
udns2.ultradns.net
What happened during the outage? We got paged at 8:53 AM for a DNS problem hitting service.sumologic.com. This was from our internal as well as external monitors. The oncalls ran a "dig" against all four of our nameservers and discovered that Dyn was down hard.
We knew that we had a primary/secondary DNS setup, but neither provider had experienced any outages since we set it up. We also knew that it would take DNS Resolvers some time to decide to use Neustar nameservers as opposed to Dyn ones. Our alarms went off, so we posted a status page telling our customers that we were experiencing an incident with our DNS and asking them to let us know if they saw a problem. Less than an hour later, our alarms stopped going off (although Dyn was still down). No Sumo Logic customers reached out to Support to let us know that they had issues. Here is a graph of the traffic decrease for one of the Sumo Logic domains during the Dyn outage: Here is a graph of Neustar (UltraDNS) pulling in more traffic during the outage: In conclusion: this setup worked for Sumo Logic. We do not have control over DNS providers, but we can prevent their problems from affecting our customers. You can easily do the same.
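If you want to replicate the spot check our oncalls ran, a couple of dig queries against each provider's nameservers (a sketch using the nameservers and domain listed above) show at a glance which provider is answering:

dig @ns1.p29.dynect.net service.sumologic.com +short
dig @udns1.ultradns.net service.sumologic.com +short

A timeout from one provider while the other still returns records is exactly the failure mode described in this post.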

October 21, 2016

Blog

Best Practices for Analyzing Elastic Load Balancer Logs

Blog

Sumo Logic Launches Ultimate Log Bible Project

Blog

Working With Field Extraction Rules in Sumo Logic

Field extraction rules compress queries into short phrases, filter out unwanted fields and drastically speed up query times. Fifty at a time can be stored and used in what Sumo Logic calls a "parser library." These rules are a must once you move from simple collection to correlation and dashboarding. Because the rules are applied as messages are ingested, queries no longer need to parse out the same fields at search time, which can drastically speed up query times. Correlations and dashboards require many queries to load simultaneously, so the speed impact can be significant. Setting Up Field Extraction Rules The Sumo Logic team has written some templates to help you get started with common logs like IIS and Apache. While you will need to edit them, they take a lot of the pain out of writing regex parsers from scratch (phew). And if you write your own reusable parsers, save them as a template so you can help yourself to them later. To get started, find a frequently used query snippet. The best candidates are queries that (1) are used frequently and (2) take a while to load. These might pull from dense sources (like IIS) or just crawl back over long periods of time. You can also look at de facto high-usage queries saved in dashboards, alerts and pinned searches. Once you have the query, first take a look at what the source pulls without any filters. This is important both to ensure that you collect what's needed, and that you don't include anything that will throw off the rules. Since rules are "all or nothing," only include persistent fields. In the example below, I am pulling from a Safend collector. Here's the output from the collector for a USB event: 2014-10-09T15:12:33.912408-04:00 safend.host.com [Safend Data Protection] File Logging Alert details: User: [email protected], Computer: computer.host.com, Operating System: Windows 7, Client GMT: 10/9/2014 7:12:33 PM, Client Local Time: 10/9/2014 3:12:33 PM, Server Time: 10/9/2014 7:12:33 PM, Group: , Policy: Safend for Cuomer Default Policy, Device Description: Disk drive, Device Info: SanDisk Cruzer Pattern USB Device, Port: USB, Device Type: Removable Storage Devices, Vendor: 0781, Model: 550A, Distinct ID: 3485320307908660, Details: , File Name: F:\SOME_FILE_NAME, File Type: PDF, File Size: 35607, Created: 10/9/2014 7:12:33 PM, Modified: 10/9/2014 7:12:34 PM, Action: Write There are certainly reasons to collect all of this (and note that the rule won't limit collection on the source collector), but I only want to analyze a few parameters. To get it just right, filter it in the Field Extraction panel: Below is the simple Parse Expression I used. Note that more parsing tools are supported that can grep nearly anything that a regular query can. But in this case, I just used parse and nodrop. Nodrop tells the query to pass results along even if the query returns nothing from that field. In this case, it acts like an OR function that concatenates the first three parse functions along with the last one. So if 'parse regex "Action…"' returns nothing, nodrop commands the query to "not drop", return a blank, and in this case, continue to the next function. Remember that Field Extraction Rules are "all or nothing" with respect to fields. If you add a field that doesn't exist, nodrop will not help since it only works within existing fields. Use Field Extraction Rules to Speed Up Dashboard Load Time The above example would be a good underlying rule for a larger profiling dashboard. It returns file information only—Action on the File, File ID, File Size, and Type.
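The screenshot of the parse expression is not reproduced here, but against the sample Safend message above it would look something like the following sketch (the field names are my own, not necessarily the author's exact rule):

parse "File Name: *," as file_name nodrop
| parse "File Type: *," as file_type nodrop
| parse "File Size: *," as file_size nodrop
| parse "Action: *" as action nodrop

Each clause anchors on a literal label in the message and captures the value up to the next comma (or, for Action, the rest of the line), and nodrop keeps messages flowing through even when one of the labels is missing.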
Another extraction rule might return only User and User Activities, while yet another might include only host server actions. These rules can then be surfaced as dashboard panes, combined into profiles and easily edited. They load only the fields extracted, significantly improving load time, and the modularity of the rules provides a built-in library that makes editing and sharing useful snippets much simpler. Working With Field Extraction Rules in Sumo Logic is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production. About the Author Alex Entrekin served on the executive staff of Cloudshare where he was primarily responsible for advanced analytics and monitoring systems. His work extending Splunk into actionable user profiling was featured at VMworld: “How a Cloud Computing Provider Reached the Holy Grail of Visibility.” Alex is currently an attorney, researcher and writer based in Santa Barbara, CA. He holds a J.D. from the UCLA School of Law.

Blog

Monitoring and Analyzing Puppet Logs With Sumo Logic

The top Puppet question on ServerFault is How can the little guys effectively learn and use Puppet? Learning Puppet requires learning a DSL that's thorny enough that the final step in many migrations is to buy Puppet training classes for the team. While there is no getting around learning the Puppet DSL, the "little guys" can be more effective if they avoid extending Puppet beyond the realm of configuration management (CM). It can be tempting to extend Puppet to become a monitoring hub, a CI spoke, or many other things. After all, if it's not in Puppet, it won't be in your environment, so why not build on that powerful connectedness? The cons of Puppet for log analysis and monitoring Here's one anecdote from scriptcrafty explaining some of the problems with extending beyond CM:
- Centralized logic where none is required,
- Weird DSLs and templating languages with convoluted error messages,
- Deployment and configuration logic disembodied from the applications that required them and written by people who have no idea what the application requires,
- Weird configuration dependencies that are completely untestable in a development environment,
- Broken secrets/token management and the heroic workarounds,
- Divergent and separate pipelines for development and production environments even though the whole point of these tools is to make things re-usable,
- and so on and so forth.
Any environment complex enough to need Puppet is already too complex to be analyzed with bash and PuppetDB queries. These tools work well for spot investigation and break/fix, but do not extend easily into monitoring and analysis. I'll use "borrow-time" as an example. To paraphrase the Puppet analytics team, "borrow-time" is the amount of time that the JRuby instances handling Puppet tasks spend on each request. If this number gets high, then there may be something unusually expensive going on. For instance, when the "borrow-timeout-count" metric is > 0, some build request has gone unfilled. It's tempting to think that the problem is solved by setting a "borrow-timeout-count" trigger in PuppetDB for >0. After all, just about any scripting language will do, and then analysis can be done in the PuppetDB logs. Puppet even has some guides for this in Puppet Server – What's Going on in There? Monitoring a tool with only its own suggested metrics is not just a convenience sample, but one that is also blind to the problem at hand—uptime and consistency across an inconsistent and complex environment. Knowing that some request has gone unhandled is a good starting point.
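To see these JRuby pool metrics for yourself, the Puppet Server status API can be queried directly. The command below is a sketch rather than something from the original post; the service name and query parameter follow Puppet Enterprise conventions and may vary by version:

curl -k "https://$(puppet config print server):8140/status/v1/services/pe-jruby-metrics?level=debug"

The JSON it returns includes the experimental metrics block excerpted in the next section.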
A closer look at Puppet logs and metrics Setting a single trigger like that is a tempting approach, but it runs a risk, so let's look at everything else that Puppet shows when pulling metrics. Here is what one "borrow-time" metrics pull brings up in the Puppet server, under pe-jruby-metrics->status->experimental->metrics:

"metrics": {
  "average-borrow-time": 75,
  "average-free-jrubies": 1.86,
  "average-lock-held-time": 0,
  "average-lock-wait-time": 0,
  "average-requested-jrubies": 1.8959058782351241,
  "average-wait-time": 77,
  "borrow-count": 10302,
  "borrow-retry-count": 0,
  "borrow-timeout-count": 0,
  "borrowed-instances": [
    {
      "duration-millis": 2888,
      "reason": {
        "request": {
          "request-method": "post",
          "route-id": "puppet-v3-catalog-/*/",
          "uri": "/puppet/v3/catalog/foo.puppetlabs.net"
        }
      },
    },
  ...],
  "num-free-jrubies": 0,
  "num-jrubies": 4,
  "num-pool-locks": 0,
  "requested-count": 10305,
  "requested-instances": [
    {
      "duration-millis": 134,
      "reason": {
        "request": {
          "request-method": "get",
          "route-id": "puppet-v3-file_metadata-/*/",
          "uri": "/puppet/v3/file_metadata/modules/catalog_zero16/catalog_zero16_impl83.txt"
        }
      },
    },
  ...],
  "return-count": 10298
}

If you are lucky, you'll have an intuitive feeling about the issue before asking whether the retry count is too high, or if it was only a problem in a certain geo. If the problem is severe, you won't have time to check the common errors (here and here); you'll want context. How Sumo Logic brings context to Puppet logs Adding context—such as timeseries, geo, tool, and user—is the primary reason to use Sumo for Puppet monitoring and analysis. Here is an overly simplified example Sumo Logic query where JRuby borrowing is compared with Apache 2xx/4xx/5xx status codes:

_sourceName=*jruby-metrics* AND _sourceCategory=*apache*
| parse using public/apache/access
| if(status_code matches "2*", 1, 0) as successes
| if(status_code matches "5*", 1, 0) as server_errors
| if(status_code matches "4*", 1, 0) as client_errors
| if(num-free-jrubies matches "0", 1, 0) as borrowrequired
| timeslice by 1d
| sum(successes) as successes, sum(client_errors) as client_errors, sum(server_errors) as server_errors, sum(borrowrequired) as borrowed_jrubies by _timeslice

Centralizing monitoring across the environment means not only querying and joining siloed data, but also allowing for smarter analysis. By appending an "outlier" query to something like the above, you can set baselines and spot trends in your environment instead of guessing and then querying:

| timeslice 15d
| max(borrowed_jrubies) as borrowed_jrubies by _timeslice
| outlier borrowed_jrubies

source: help.sumologic.com/Search/Search_Query_Language/Search_Operators/outlier About the Author Alex Entrekin served on the executive staff of Cloudshare where he was primarily responsible for advanced analytics and monitoring systems. His work extending Splunk into actionable user profiling was featured at VMworld: "How a Cloud Computing Provider Reached the Holy Grail of Visibility." Alex is currently an attorney, researcher and writer based in Santa Barbara, CA. He holds a J.D. from the UCLA School of Law. Monitoring and Analyzing Puppet Logs With Sumo Logic is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Application Containers vs. System Containers: Understanding the Difference

When people talk about containers, they usually mean application containers. Docker is automatically associated with application containers and is widely used to package applications and services. But there is another type of container: system containers. Let us look at the differences between application containers and system containers and see how each type of container is used. At a high level, the comparison looks like this:
- Images: application containers are application/service centric, with a growing tool ecosystem; system containers are machine-centric, with a more limited tool ecosystem.
- Infrastructure: application containers bring security concerns and networking challenges and are hampered by base OS limitations; system containers are datacenter-centric, isolated and secure, with optimized networking.
The Low-Down on Application Containers Application containers are used to package applications without launching a virtual machine for each app or each service within an app. They are especially beneficial when making the move to a microservices architecture, as they allow you to create a separate container for each application component and provide greater control, security and process restriction. Ultimately, what you get from application containers is easier distribution. The risks of inconsistency, unreliability and compatibility issues are reduced significantly if an application is placed and shipped inside a container. Docker is currently the most widely adopted container service provider with a focus on application containers. However, there are other container technologies like CoreOS's Rocket. Rocket promises better security, portability and flexibility of image sharing. Docker already enjoys the advantage of mass adoption, and Rocket might just be too late to the container party. Even with its differences, Docker is still the unofficial standard for application containers today. System Containers: How They're Used System containers play a similar role to virtual machines, as they share the kernel of the host operating system and provide user space isolation. However, system containers do not use hypervisors. (Any container that runs an OS is a system container.) They also allow you to install different libraries, languages, and databases. Services running in each container use resources that are assigned to just that container. System containers let you run multiple processes at the same time, all under the same OS and not a separate guest OS. This lowers the performance impact, and provides the benefits of VMs, like running multiple processes, along with the new benefits of containers like better portability and quick startup times. Useful System Container Tools Joyent's Triton is a Container as a Service that implements its proprietary OS called SmartOS. It not only focuses on packing apps into containers but also provides the benefits of added security, networking and storage, while keeping things lightweight, with very little performance impact. The key differentiator is that Triton delivers bare-metal performance. With Samsung's recent acquisition of Joyent, it remains to be seen how Triton progresses. Giant Swarm is a hosted cloud platform that offers a Docker-based virtualization system that is configured for microservices. It helps businesses manage their development stack, spend less time on operations setup, and more time on active development. LXD is a fairly new system container manager that was released in 2016 by Canonical, the creators of Ubuntu. It combines the speed and efficiency of containers with the famed security of virtual machines.
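To get a feel for the system container workflow, the stock LXD client commands look roughly like this (a sketch; the image alias and container name are illustrative):

lxc launch ubuntu:16.04 syscontainer
lxc exec syscontainer -- apt-get update
lxc config set syscontainer security.nesting true

The first command starts a system container from an Ubuntu image, the second runs a command inside it much as you would on a VM, and the last setting allows nested containers (such as Docker) to run inside it.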
Since Docker and LXD share the same kernel, it is easy to run Docker containers inside LXD containers. Ultimately, understanding the differences and values of each type of container is important. Using both to provide solutions for different scenarios can't be ruled out, either, as different teams have different uses. The development of containers, just like any other technology, is quickly advancing and changing based on newer demands and the changing needs of users. Monitoring Your Containers Whatever the type of container, monitoring and log analysis is always needed. Even with all of the advantages that containers offer as compared to virtual machines, things will go wrong. That is why it is important to have a reliable log-analysis solution like Sumo Logic. One of the biggest challenges of Docker adoption is scaling and monitoring containerized apps. Sumo Logic addresses this issue with its container-native monitoring solution. The Docker Log Analysis app from Sumo Logic can visualize your entire Docker ecosystem, from development to deployment. It uses advanced machine learning algorithms to detect outliers and anomalies when troubleshooting issues in distributed container-based applications. Sumo Logic's focus on containers means it can provide more comprehensive and vital log analysis than traditional Linux-based monitoring tools. About the Author Twain began his career at Google, where, among other things, he was involved in technical support for the AdWords team. His work involved reviewing stack traces, resolving issues affecting both customers and the Support team, and handling escalations. Later, he built branded social media applications and automation scripts to help startups better manage their marketing operations. Today, as a technology journalist, he helps IT magazines and startups change the way teams build and ship applications. Application Containers vs. System Containers: Understanding the Difference is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Managing Container Data Using Docker Data Volumes

Docker data volumes are designed to solve one of the deep paradoxes of containers, which is this: For the very same reasons that containers make apps highly portable — and, by extension, create more nimble data centers — they also make it hard to store data persistently. That's because, by design, containerized apps are ephemeral. Once you shut down a container, everything inside it disappears. That makes your data center more flexible and secure, since it lets you spin up apps rapidly based on clean images. But it also means that data stored inside your containers disappears by default. How do you resolve this paradox? There are actually several ways. You could jerry-rig a system for loading data into a container each time it is spun up (via SSH, for example), then exporting it somehow, but that's messy. You could also turn to traditional distributed storage systems, like NFS, which you can access directly over the network. But that won't work well if you have a complicated (software-defined) networking situation (and you probably do in a large data center). You'd think someone would have solved the Docker container storage challenge in a more elegant way by now — and someone has! Docker data volumes provide a much cleaner, straightforward way to provide persistent data storage for containers. That's what I'll cover here. Keep reading for instructions on setting up and deploying Docker data volumes (followed by brief notes on storing data persistently directly on the host). Creating a Docker Data Volume To use a data volume in Docker, you first need to create a container to host the volume. This is pretty basic. Just use a command like: docker create -v /some/directory --name mydatacontainer debian This command tells Docker to create a new container named mydatacontainer based on the Debian Docker image. (You could use any of Docker's other OS images here, too.) Meanwhile, the -v flag in the command above sets up a data volume mounted at /some/directory inside the container. To repeat: That means the data is stored in a volume mounted at /some/directory inside the container called mydatacontainer — not at /some/directory on your host system. The beauty of this, of course, is that we can now write data to /some/directory inside this container, and it will persist there until the volume itself is removed. Using a Data Volume in Docker So that's all good and well. But how do you actually get apps to use the new data volume you created? Pretty easily. The next and final step is just to start another container, using the --volumes-from flag to tell Docker that this new container should store data in the data volume we created in the first container. Our command would look something like this: docker run --volumes-from mydatacontainer debian Now, any data changes made inside this new Debian container will be saved inside the volume from mydatacontainer at the directory /some/directory. And they'll stay there if you stop the new container — which means this is a persistent data storage solution. (Of course, if you remove mydatacontainer together with its volume, then you'll also lose the data inside.) You can have as many data volumes as you want, by the way. Just specify multiple ones when you run the container that will access the volumes. Data Storage on Host instead of a container? You may be thinking, "What if I want to store my data directly on the host instead of inside another container?" There's good news. You can do that, too. We won't use data storage volumes for this, though.
Instead, we’ll run a command like: docker run -v /host/dir:/container/dir -i image This starts a new container based on the image image and maps the directory /host/dir on the host system to the directory /container/dir inside the container. That means that any data that is written by the container to /container/dir will also appear inside /host/dir on the host, and vice versa. There you have it. You can now have your container data and eat it, too. Or something like that. About the Author Hemant Jain is the founder and owner of Rapidera Technologies, a full service software development shop. He and his team focus a lot on modern software delivery techniques and tools. Prior to Rapidera he managed large scale enterprise development projects at Autodesk and Deloitte. Managing Container Data Using Docker Data Volumes is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Getting the Most Out of SaltStack Logs

SaltStack, also known simply as Salt, is a handy configuration management platform. Written in Python, it's open source and allows ITOps teams to define "Infrastructure as Code" in order to provision and orchestrate servers. But SaltStack's usefulness is not limited to configuration management. The platform also generates logs, and like all logs, that data can be a useful source of insight in all manner of ways. This article provides an overview of SaltStack logging, as well as a primer on how to analyze SaltStack logs with Sumo Logic. Where does SaltStack store logs? The first thing to understand is where SaltStack logs live. The answer to that question depends on where you choose to place them. You can set the log location by editing your SaltStack configuration file on the salt-master. By default, this file should be located at /etc/salt/master on most Unix-like systems. The variable you'll want to edit is log_file. If you want to store logs locally on the salt-master, you can simply set this to any location on the local file system, such as /var/log/salt/salt_master. Storing Salt logs with rsyslogd If you want to centralize logging across a cluster, however, you will benefit by using rsyslogd, a system logging tool for Unix-like systems. With rsyslogd, you can configure SaltStack to store logs either remotely or on the local file system. For remote logging, set the log_file parameter in the salt-master configuration file according to the format <file|udp|tcp>://<host|socketpath>:<port-if-required>/<log-facility>. For example, to connect to a server named mylogserver (whose name should be resolvable on your local network DNS, of course) via UDP on port 2099, you'd use a line like this one: log_file: udp://mylogserver:2099 Colorizing and bracketing your Salt logs Another useful configuration option that SaltStack supports is custom colorization of console logs. This can make it easier to read the logs by separating high-priority events from less important ones. To set colorization, you change the log_fmt_console parameter in the Salt configuration file. The colorization options available are:
'%(colorlevel)s' # log level name colorized by level
'%(colorname)s' # colorized module name
'%(colorprocess)s' # colorized process number
'%(colormsg)s' # log message colorized by level
Log files can't be colorized. That would not be as useful, since the program you use to read the log file may not support color output, but they can be padded and bracketed to distinguish different event levels. The parameter you'll set here is log_fmt_logfile and the options supported include:
'%(bracketlevel)s' # equivalent to [%(levelname)-8s]
'%(bracketname)s' # equivalent to [%(name)-17s]
'%(bracketprocess)s' # equivalent to [%(process)5s]
How to Analyze SaltStack logs with Sumo Logic So far, we've covered some handy things to know about configuring SaltStack logs. You're likely also interested in how you can analyze the data in those logs. Here, Sumo Logic, which offers easy integration with SaltStack, is an excellent solution. Sumo Logic has an official SaltStack formula, which is available from GitHub. To install it, you can use GitFS to make the formula available to your system, but the simpler approach (for my money, at least) is simply to clone the formula repository in order to save it locally. That way, changes to the formula won't break your configuration. (The downside, of course, is that you also won't automatically get updates to the formula, but you can always update your local clone of the repository if you want them.)
To set up the Sumo Logic formula, run these commands:

mkdir -p /srv/formulas   # or wherever you want to save the formula
cd /srv/formulas
git clone https://github.com/saltstack-formulas/sumo-logic-formula.git

Then simply edit your configuration by adding the new directory to the file_roots parameter, like so:

file_roots:
  base:
    - /srv/salt
    - /srv/formulas/sumo-logic-formula

Restart your salt-master and you're all set. You'll now be able to analyze your SaltStack logs from Sumo Logic, along with any other logs you work with through the platform. Getting the Most Out of SaltStack Logs is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production. About the Author Chris Tozzi has worked as a journalist and Linux systems administrator. He has particular interests in open source, agile infrastructure and networking. He is Senior Editor of content and a DevOps Analyst at Fixate IO.

Blog

Setting Up a Docker Environment Using Docker Compose

Blog

Integrated Container Security Monitoring with Twistlock

Blog

5 Log Monitoring Moves to Wow Your Business Partner

Looking for some logging moves that will impress your business partner? In this post, we'll show you a few. But first, a note of caution: If you're going to wow your business partner, make a visiting venture capitalist's jaw drop, or knock the socks off of a few stockholders, you could always accomplish that with something that has a lot of flash, and not much more than that, or you could show them something that has real and lasting substance, and will make a difference in your company's bottom line. We've all seen business presentations filled with flashy fireworks, and we've all seen how quickly those fireworks fade away. Around here, though, we believe in delivering value—the kind that stays with your organization, and gives it a solid foundation for growth. So, while the logging moves that we're going to show you do look good, the important thing to keep in mind is that they provide genuine, substantial value—and discerning business partners and investors (the kind that you want to have in your corner) will recognize this value quickly. Why Is Log Monitoring Useful? What value should logs provide? Is it enough just to accumulate information so that IT staff can pick through it as required? That's what most logs do, varying mostly in the amount of information and the level of detail. And most logs, taken as raw data, are very difficult to read and interpret; the most noticeable result of working with raw log data, in fact, is the demand that it puts on IT staff time. 5 Log Monitoring Steps to Success Most of the value in logs is delivered by means of systems for organizing, managing, filtering, analyzing, and presenting log data. And needless to say, the best, most impressive, most valuable logging moves are those which are made possible by first-rate log management. They include: Quick, on-the-spot, easy-to-understand analytics. Pulling up instant, high-quality analytics may be the most impressive move that you can make when it comes to logging, and it is definitely one of the most valuable features that you should look for in any log management system. Raw log data is a gold mine, but you need to know how to extract and refine the gold. A high-quality analytics system will extract the data that's valuable to you, based on your needs and interests, and present it in ways that make sense. It will also allow you to quickly recognize and understand the information that you're looking for. Monitoring real-time data. While analysis of cumulative log data is extremely useful, there are also plenty of situations where you need to see what is going on right at the moment. Many of the processes that you most need to monitor (including customer interaction, system load, resource use, and hostile intrusion/attack) are rapid and transient, and there is no substitute for a real-time view into such events. Real-time monitoring should be accompanied by the capacity for real-time analytics. You need to be able to both see and understand events as they happen. Fully integrated logging and analytics. There may be processes in software development and operations which have a natural tendency to produce integrated output, but logging isn't one of them. Each service or application can produce its own log, in its own format, based on its own standards, without reference to the content or format of the logs created by any other process.
One of the most important and basic functions that any log management system can perform is log integration, bringing together not just standard log files, but also event-driven and real-time data. Want to really impress partners and investors? Bring up log data that comes from every part of your operation, and that is fully integrated into useful, easily-understood output.

Drill-down to key data. Statistics and aggregate data are important; they give you an overall picture of how the system is operating, along with general, system-level warnings of potential trouble. But the ability to drill down to more specific levels of data—geographic regions, servers, individual accounts, specific services and processes—is what allows you to make use of much of that system-wide data. It's one thing to see that your servers are experiencing an unusually high level of activity, and quite another to drill down and see an unusual spike in transactions centered around a group of servers in a region known for high levels of online credit card fraud. Needless to say, integrated logging and scalability are essential when it comes to drill-down capability.

Logging throughout the application lifecycle. Logging integration includes integration across time, as well as across platforms. This means combining development, testing, and deployment logs with metrics and other performance-related data to provide a clear, unified, in-depth picture of the application's entire lifecycle. This in turn makes it possible to look at development, operational, and performance-related issues in context, and see relationships which might not be visible without such cross-system, full lifecycle integration.

Use Log Monitoring to Go for the Gold

So there you have it—five genuine, knock-'em-dead logging moves. They'll look very impressive in a business presentation, and they'll tell serious, knowledgeable investors that you understand and care about substance, and not just flash. More to the point, these are logging capabilities and strategies which will provide you with valuable (and often crucial) information about the development, deployment, and ongoing operation of your software. Logs do not need to be junk piles of unsorted, raw data. Bring first-rate management and analytics to your logs now, and turn those junk piles into gold.

5 Log Monitoring Moves to Wow Your Business Partner is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

About the Author

Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the '90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues.

Blog

Using HTTP Request Builders to Create Repeatable API Workflows

As an API Engineer, you've probably spent hours carefully considering how your API will be consumed by client software, what data you are making available at which points within particular workflows, and strategies for handling errors that bubble up when a client insists on feeding garbage to your API. You've written tests for the serializers and expected API behaviors, and you even thought to mock those external integrations so you can dive right into the build. As you settle in for a productive afternoon of development, you notice a glaring legacy element in your otherwise modern development setup:

Latest and greatest version of your IDE: Check.
Updated compiler and toolchain: Installed.
Continuous Integration: Ready and waiting to put your code through its paces.
That random text file containing a bunch of clumsily ordered cURL commands.

…one of these things is not like the others.

It turns out we've evolved…and so have our API tools

Once upon a time, that little text file was state-of-the-art in API development. You could easily copy-paste commands into a terminal and watch your server code spring into action; however, deviating from previously built requests required careful editing. Invariably, a typo would creep into a crucial header declaration, or revisions to required parameters were inconsistently applied, or perhaps a change in HTTP method resulted in a subtly different API behavior that went unnoticed release over release. HTTP Request Builders were developed to take the sting out of developing and testing HTTP endpoints by reducing the overhead of building and maintaining test harnesses, letting you get better code written with higher quality. Two of the leaders in the commercial space are Postman and Paw, and they provide a number of key features that will resonate with those who either create or consume APIs:

Create HTTP requests in a visual editor: See the impact of your selected headers and request bodies on the request before you send it off to your server. Want to try an experiment? Toggle parameters on or off with ease, or simply duplicate an existing request and try two different approaches!
Organize requests for your own workflow…or collaborate with others: Create folders, reorder, and reorganize requests to make it painless to walk through sequential API calls.
Test across multiple environments: Effortlessly switch between server environments or other variable data without having to rewrite every one of your requests.
Inject dynamic data: Run your APIs as you would expect them to run in production, taking data from a previous API as the input to another API.

From here, let's explore the main features of HTTP Request Builders via Paw and show how those features can help make your development and test cycles more efficient. Although Paw is featured in this post, many of these capabilities exist in other HTTP Builder packages such as Postman.

How to Streamline your HTTP Request Pipeline

Command-line interfaces are great for piping together functionality in one-off tests or when building out scripts for machines to follow, but they quickly become unwieldy when you need to make sweeping changes to the structure or format of an API call. This is where visual editors shine, giving the human user an easily digestible view of the structure of the HTTP request, including its headers, query string and body, so that you can review and edit requests in a format that puts the human first. Paw's editor is broken up into three areas.
Working from left to right, these areas are:

Request List: Each distinct request in your Paw document gets a new row in this panel and represents the collection of request data and response history associated with that specific request.
HTTP Request Builder: This is the primary editor for constructing HTTP requests. Tabs within this panel allow you to quickly switch between editing headers, URL parameters, and request bodies. At the bottom of the panel is the code generator, allowing you to quickly spawn code for a variety of languages including Objective-C, Swift, Java, and even cURL!
HTTP Exchange: This panel reflects the most recent request and associated response objects returned by the remote server. This panel also offers navigation controls for viewing historical requests and responses.

Figure 1. Paw document containing three sample HTTP requests and the default panel arrangement.

As you work through building up the requests that you use in your API workflows, you can easily duplicate, edit, and execute a request all in a matter of a few seconds. This allows you to easily experiment with alternate request formats or payloads while also retaining each of your previous versions. You might even score some brownie points with your QA team by providing a document with templated requests they can use to kick-start their testing of your new API!

Organize Request Lists for Yourself and Others

The Request List panel also doubles as the Paw document's organization structure. As you add new requests, they will appear at the bottom of the list; however, you can customize the order by dragging and dropping requests, or create folders to group related requests together. The order and names attached to each request help humans understand what the request does, but in no way impact the actual requests made of the remote resource. Use these organization tools to make it easy for you to run through a series of tests or to show others exactly how to replicate a problem. If the custom sort options don't quite cover your needs, or if your document starts to become too large, Sort and Filter bars appear at the bottom of the Request List to help you focus only on the requests you are actively working with. Group by URL or use the text filter to find only those requests that contain the URL you are working with.

Figure 2. Request List panel showing saved requests, folder organization, and filtering options.

Dealing with Environments and Variables

Of course, many times you want to be able to test out behaviors across different environments — perhaps your local development instance, or the development instance updated by the Continuous Integration service. Or perhaps you may even want to compare functionality to what is presently available in production. It would be quite annoying to have to edit each of your requests and change the URL from one host to another. Instead, let Paw manage that with a quick switch in the UI.

Figure 3. Paw's Environment Switcher changes variables with just a couple of clicks.

The Manage Environments view allows you to create different "Domains" for related kinds of variables, and add "Environments" as necessary to handle permutations of these values:

Figure 4. Paw's Environment Editor shows all Domains and gives easy access to each Environment.

This allows you flexibility in adjusting the structure of a payload with a few quick clicks instead of having to handcraft an entirely new request.
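Both conveniences are easiest to appreciate next to the manual alternative. Here is a rough sketch of what switching hosts and re-authenticating looks like when handcrafted in the shell; the endpoints and field names are hypothetical, and the jq utility is assumed for JSON parsing:

# hypothetical endpoints and response fields; requires jq
BASE_URL="https://staging.example.com"    # edited by hand for every environment change
TOKEN=$(curl -s -X POST "$BASE_URL/v1/sign-in" \
  -H "Content-Type: application/json" \
  -d '{"username":"demo","password":"secret"}' | jq -r '.sessionToken')
curl -s -H "Authorization: Bearer $TOKEN" "$BASE_URL/v1/posts/top?limit=100"

Every host or credential change means editing these commands by hand; Paw's environment switcher and the response-parsing variables described next remove exactly that busywork.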
The Code Generator pane at the bottom of the Request Builder pane updates to show you exactly how your payload changes:

Figure 5. Paw document showing the rebuilt request based on the Server Domain's Environment.

One of the most common setups is to have a Server Domain with Environments for the different deployed versions of code. From there, you could build out a variable for the Base URL, or split it into multiple variables so that the protocol could be changed independent of the host address — perhaps in order to quickly test whether HTTP to HTTPS redirection still works after making changes to a load balancer or routing configuration. Paw's variables can even peer into other requests and responses and automatically rewrite successive APIs. Many APIs require some form of authentication to read or write privileged material. Perhaps the mechanism is something simple like a cookie or authentication header, or something more complex like an OAuth handshake. Either way, there is a bit of data in the response of one API that should be included in the request to a subsequent API. Paw variables can parse data from prior requests and prior responses, dynamically updating subsequent requests:

Figure 6. Paw document revealing the Response Parsed Body variable extracting data from one request and injecting it into another.

In the case shown above, we've set a "Response parsed body" variable as a query string parameter to a successive API, specifically grabbing the UserId key for the post at index 0 in the Top 100 Posts request. Any indexable path in the response of a previous request is available in the editor. You may need to extract a session token from the sign-in API and apply it to subsequent authenticated-only requests. Setting this variable gives you the flexibility to change server environments or users, execute a sign-in API call, then proceed to hit protected endpoints in just a few moments rather than having to make sweeping edits to your requests.

Request Builders: Fast Feedback, Quick Test Cycles

HTTP Request Builders give both API developers and API consumers a human-centric way of interacting with what is primarily a machine-to-machine interface. By making it easy to build and edit HTTP requests, providing mechanisms to organize, sort, and filter requests, and allowing for fast or automatic substitution of request data, they make working with any API much easier to digest. The next time someone hands you a bunch of cURL commands, take a few of those minutes you've saved from use of these tools, and help a developer join us here in the future!

Using HTTP Request Builders to Create Repeatable API Workflows is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

About the Author

Bryan Musial (@BKMu) is a Full-Stack Senior Software Engineer with the San Francisco-based startup, Tally (www.meettally.com), working to make managing your credit cards automatic, simple, and secure. Previously, Bryan worked for the Blackboard Mobile team where he built and shipped iOS and Android applications used by millions of students and teachers every day.

Blog

Integrate Azure Functions with Sumo Logic Schedule Search

Blog

Customers Share their AWS Logging with Sumo Logic Use Cases

In June, Sumo Dojo (our online community) launched a contest to learn more about how our customers are using Amazon Web Services like EC2, S3, ELB, and AWS Lambda. The Sumo Logic service is built on AWS and we have deep integration into Amazon Web Services. And as an AWS Technology Partner we've collaborated closely with AWS to build apps like the Sumo Logic App for Lambda.

So we wanted to see how our customers are using Sumo Logic to do things like collecting logs from CloudWatch to gain visibility into their AWS applications. We thought you'd be interested in hearing how others are using AWS and Sumo Logic, too. So in this post I'll share their stories along with announcing the contest winner.

The contest narrowed down to two finalists. SmartThings, a Samsung company, operates in the home automation industry and provides access to a wide range of connected devices to create smarter homes that enhance comfort, convenience, security and energy management for the consumer. WHOmentors, Inc., our second finalist, is a publicly supported scientific, educational and charitable corporation, and fiscal sponsor of Teen Hackathon. The organization is, according to their site, "primarily engaged in interdisciplinary applied research to gain knowledge or understanding to determine the means by which a specific, recognized need may be met."

At stake was a DJI Phantom 3 Drone. All entrants were awarded a $10 Amazon gift card.

AWS Logging Contest Rules

The Drone winner was selected based on the following criteria:

You had to be a user of Sumo Logic and AWS.
To enter the contest, a comment had to be placed on this thread in Sumo Dojo.
The post could not be anonymous – you were required to log in to post and enter.
Submissions closed August 15th.

As noted in the Sumo Dojo posting, the winner would be selected based on our own editorial judgment and community reactions to the post (in the form of comments or "likes") to select the one that's most interesting, useful and detailed.

SmartThings

SmartThings has been working on a feature to enable over-the-air (OTA) firmware updates of Zigbee devices on users' home networks. For the uninitiated, Zigbee is an IEEE specification for a suite of high-level communication protocols used to create personal area networks with small, low-power digital radios. See the Zigbee Alliance for more information. According to one of the firmware engineers at SmartThings, there are a lot of edge cases and potential points of failure for an OTA update, including:

The cloud platform
An end user's hub
The device itself
Power failures
RF interference on the mesh network

Disaster in this scenario would be a user's device ending up in a broken state. As Vlad Shtibin related:

"Our platform is deployed across multiple geographical regions, which are hosted on AWS. Within each region we support multiple shards; furthermore, within each shard we run multiple application clusters. The bulk of the services involved in the firmware update are JVM-based application servers that run on AWS EC2 instances. Our goal for monitoring was to be able to identify as many of these failure points as possible and implement a recovery strategy. Identifying these points is where Sumo Logic comes into the picture. We use a key-value logger with a specific key/value for each of these failure points as well as a correlation ID for each point of the flow.
Using Sumo Logic, we are able to aggregate all of these logs by passing the correlation ID when we make calls between the systems. We then created a search query (eventually a dashboard) to view the flow of the firmware updates as they went from our cloud down to the device and back up to the cloud to acknowledge that the firmware was updated. This query parses the log messages to retrieve the correlation ID, hub, device, status, firmware versions, etc. These values are then fed into a Sumo Logic transaction, enabling us to easily view the state of a firmware update for any user in the system at a micro level and the overall health of all OTA updates at the macro level.

Depending on which part of the infrastructure the OTA update failed in, engineers are then able to dig deeper into the specific EC2 instance that had a problem. Because our application servers produce logs at the WARN and ERROR level, we can see if the update failed because of a timeout from the AWS ElastiCache service, or from a problem with a query on AWS RDS. Having quick access to logs across the cluster enables us to identify issues across our platform regardless of which AWS service we are using."

As Vlad noted, this feature is still being tested and hasn't been rolled out fully in PROD yet. "The big takeaway is that we are much more confident in our ability to identify updates, triage them when they fail and ensure that the feature is working correctly because of Sumo Logic."

WHOmentors.com

WHOmentors.com, Inc. is a nonprofit scientific research organization and the 501(c)(3) fiscal sponsor of Teen Hackathon. To facilitate their training in languages like Java, Python, and Node.js, each participant begins with the Alexa Skills Kit, a collection of self-service application program interfaces (APIs), tools, documentation and code samples that make it fast and easy for teens to add capabilities for use with Alexa-enabled products such as the Echo, Tap, or Dot.

According to WHOmentors.com CEO Rauhmel Fox, "The easiest way to build the cloud-based service for a custom Alexa skill is by using AWS Lambda, an AWS offering that runs inline or uploaded code only when it's needed and scales automatically, so there is no need to provision or continuously run servers. With AWS Lambda, WHOmentors.com pays only for what it uses. The corporate account is charged based on the number of requests for created functions and the time the code executes. While the AWS Lambda free tier includes one million free requests per month and 400,000 gigabyte (GB)-seconds of compute time per month, it becomes a concern when the students create complex applications that tie Lambda to other expensive services or their Lambda programs grow too large. Ordinarily, someone would be assigned to use Amazon CloudWatch to monitor and troubleshoot the serverless system architecture and multiple applications using existing AWS system, application, and custom log files.
Unfortunately, there isn't a central dashboard to monitor all created Lambda functions. With the integration of a single Sumo Logic collector, WHOmentors.com can automatically route all Amazon CloudWatch logs to the Sumo Logic service for advanced analytics and real-time visualization using the Sumo Logic Lambda functions on GitHub."

Using the Sumo Logic Lambda Functions

"Instead of a 'pull data' model, the Sumo Logic Lambda function grabs files and sends them to the Sumo Logic web application immediately. Their online log analysis tool offers reporting, dashboards, and alerting as well as the ability to run specific advanced queries as needed. The real-time log analysis provided by the Sumo Logic Lambda function helps me quickly catch and troubleshoot performance issues, such as the request rate of concurrent executions for both stream-based and non-stream-based event sources, rather than having to wait hours to identify whether there was an issue.

I am most concerned about AWS Lambda limits (i.e., code storage) that are fixed and cannot be changed at this time. By default, AWS Lambda limits the total concurrent executions across all functions within a given region to 100. Why? The default limit is a safety limit that protects the corporate account from costs due to potential runaway or recursive functions during initial development and testing. As a result, I can quickly determine the performance of any Lambda function and clean up the corporate account by removing Lambda functions that are no longer used, or figure out how to reduce the code size of the Lambda functions that should not be removed, such as apps in production."

The biggest relief for Rauhmel is that he is able to encourage the trainees to focus on coding their applications instead of pressuring them to worry about the logs associated with the Lambda functions they create.

And the Winner of the AWS Logging Contest Is…

Just as at the end of an epic World Series battle between two MLB teams, you sometimes wish both could be declared the winner. Alas, there can only be one. We looked closely at the use cases, which were very different from one another. Weighing factors like the breadth of usage across the Sumo Logic and AWS platforms added to the drama. While SmartThings uses Sumo Logic broadly to troubleshoot and prevent failure points, WHOmentors.com's use case is specific to AWS Lambda. But we couldn't ignore the cause of helping teens learn to write code in popular programming languages, and build skills that may one day lead them to a job.

Congratulations to WHOmentors.com. Your Drone is on its way!

Blog

Using Logs to Speed Your DevOps Workflow

Blog

Data Analytics and Microsoft Azure

Today plenty of businesses still have real concerns about migrating applications to the cloud. Fears about network security, availability, and potential downtime swirl through the heads of chief decision makers, sometimes paralyzing organizations into standing pat on existing tech–even though it's aging by the minute. Enter Microsoft Azure, the industry leader's solution for going to a partially or totally cloud-based architecture. Below is a detailed look at what Azure is, the power of partnering with Microsoft for a cloud or hybrid cloud solution, and the best way to get full and actionable visibility into your aggregated logs and infrastructure metrics so your organization can react quickly to opportunities.

What is Microsoft Azure?

Microsoft has leveraged its constantly expanding worldwide network of data centers to create Azure, a cloud platform for building, deploying, and managing services and applications, anywhere. Azure lets you add cloud capabilities to your existing network through its platform as a service (PaaS) model, or entrust Microsoft with all of your computing and network needs with Infrastructure as a Service (IaaS). Either option provides secure, reliable access to your cloud-hosted data, built on Microsoft's proven architecture. Azure provides an ever-expanding array of products and services designed to meet all your needs through one convenient, easy-to-manage platform. Below are just some of the capabilities Microsoft offers through Azure, and tips for determining if the Microsoft cloud is the right choice for your organization.

What can Microsoft Azure Do?

Microsoft maintains a growing directory of Azure services, with more being added all the time. All the elements necessary to build a virtual network and deliver services or applications to a global audience are available, including:

Virtual machines. Create Windows or Linux virtual machines (VMs) in just minutes from a wide selection of marketplace templates or from your own custom machine images. These cloud-based VMs will host your apps and services as if they resided in your own data center.

SQL databases. Azure offers managed SQL relational databases, from one to an unlimited number, as a service. This saves you overhead and expenses on hardware, software, and the need for in-house expertise.

Azure Active Directory Domain Services. Built on the same proven technology as Windows Active Directory, this service for Azure lets you remotely manage group policy, authentication, and everything else. This makes moving an existing security structure partially or totally to the cloud as easy as a few clicks.

Application services. With Azure it's easier than ever to create and globally deploy applications that are compatible with all popular web and mobile platforms. Reliable, scalable cloud access lets you respond quickly to your business's ebb and flow, saving time and money. With the introduction of Azure Web Apps to the Azure Marketplace, it's easier than ever to manage production, testing and deployment of web applications that scale as quickly as your business. Prebuilt APIs for popular cloud services like Office 365, Salesforce and more greatly accelerate development.

Visual Studio Team Services. An add-on service available under Azure, Visual Studio Team Services offers a complete application lifecycle management (ALM) solution in the Microsoft cloud. Developers can share and track code changes, perform load testing, and deliver applications to production while collaborating in Azure from all over the world.
Visual Studio Team Services simplifies development and delivery for large companies or new ones building a service portfolio.

Storage. Count on Microsoft's global infrastructure to provide safe, highly accessible data storage. With massive scalability and an intelligent pricing structure that lets you store infrequently accessed data at a huge savings, building a safe and cost-effective storage plan is simple in Microsoft Azure.

Microsoft continues to expand its offerings in the Azure environment, making it easy to make a la carte choices for the best applications and services for your needs.

Why are people trusting their workloads to Microsoft Azure?

It's been said that the on-premise data center has no future. Like mainframes and dial-up modems before them, self-hosted data centers are becoming obsolete, being replaced by increasingly available and affordable cloud solutions. Several important players have emerged in the cloud service sphere, including Amazon Web Services (AWS), perennial computing giant IBM, and Apple's ubiquitous iCloud, which holds the picture memories and song preferences of hundreds of millions of smartphone users, among other data. With so many options, why are companies like 3M, BMW, and GE moving workloads to Microsoft Azure? Just some of the reasons:

Flexibility. With Microsoft Azure you can spin up new services and geometrically scale your data storage capabilities on the fly. Compare this to a static data center, which would require new hardware and OS purchasing, provisioning, and deployment before additional power could be brought to bear against your IT challenges. This modern flexibility makes Azure a tempting solution for organizations of any size.

Cost. Azure solutions don't just make it faster and easier to add and scale infrastructure, they make it cheaper. Physical servers and infrastructure devices like routers, load balancers and more quickly add up to thousands or even hundreds of thousands of dollars. Then there's the IT expertise required to run this equipment, which amounts to major payroll overhead. By leveraging Microsoft's massive infrastructure and expertise, Azure can trim your annual IT budget by head-turning percentages.

Applications. With a la carte service offerings like Visual Studio Team Services, Visual Studio Application Insights, and Azure's scalable, on-demand storage for both frequently accessed and 'cold' data, Microsoft makes developing and testing mission-critical apps a snap. Move an application from test to production mode on the fly across a globally distributed network. Microsoft also offers substantial licensing discounts for customers migrating their existing apps to Azure, which represents even more opportunity for savings.

Disaster recovery. Sometimes the unthinkable becomes the very immediate reality. Another advantage of Microsoft Azure lies in its high-speed and geographically decentralized infrastructure, which creates limitless options for disaster recovery plans. Ensure that your critical applications and data can run from redundant sites during recovery periods that last minutes or hours instead of days. Lost time is lost business, and with Azure you can guarantee continuous service delivery even when disaster strikes.

The combination of Microsoft's vast infrastructure, constant application and services development, and powerful presence in the global IT marketplace has made Microsoft Azure solutions the choice of two-thirds of the world's Fortune 500 companies.
But the infinite scalability of Azure can make it just as right for your small personal business.

Logging capabilities within Microsoft Azure

The secret gold mine of any infrastructure and service solution is ongoing operational and security visibility, and ultimately this comes down to extracting critical log and infrastructure metrics from the application and underlying stack. The lack of this visibility is like flying a plane blind—no one does it. Azure comes with integrated health monitoring and alert capabilities so you can know in an instant if performance issues or outages are impacting your business. Set smart alert levels for events from:

Azure diagnostic infrastructure logs. Get current insights into how your cloud network is performing and take action to resolve slowdowns, bottlenecks, or service failures.

Windows IIS logs. View activity on your virtual web servers and respond to traffic patterns or log-in anomalies with the data Azure gathers on IIS 7.

Crash dumps. Even virtual machines can 'blue screen,' and other virtual equipment crashes can seriously disrupt your operations. With Microsoft Azure you can record crash dump data and troubleshoot to avoid repeat problems.

Custom error logs. Set Azure alerts to inform you about defined error events. This is especially helpful when hosting private applications that generate internal intelligence about operations, so you can add these errors to the health checklist Azure maintains about your network.

Microsoft Azure gives you the basic tools you need for error logging and monitoring, diagnostics, and troubleshooting to ensure continuous service delivery in your Azure cloud environment.

Gain Full Visibility into Azure with Unified Logs and Metrics

Even with Azure's native logging and analytics tools, the vast amount of data flowing to make your network and applications operate can be overwhelming. The volume, variety and velocity of cloud data should not be underestimated. With the help of Sumo Logic, a trusted Microsoft partner, management of that data is simple. The Sumo Logic platform unifies logs and metrics from the structured, semi-structured, and unstructured data across your entire Microsoft environment. Machine learning algorithms process vast amounts of log and metrics data, looking for anomalies and deviations from normal patterns of activity, alerting you when appropriate. With LogReduce, LogCompare and Outlier Detection, extract continuous intelligence from your application stack and proactively respond to operational and security issues. The Sumo Logic apps for Microsoft Azure Audit, Microsoft Azure Web Apps, Microsoft Windows Server Active Directory, Microsoft Internet Information Services (IIS), and the popular Windows Performance app make it easy to ingest machine data in real time and render it into clear, interactive visualizations for a complete picture of your applications and data.

Before long the on-premise data center—along with its expensive hardware and hordes of local technicians on the payroll—may be lost to technology's graveyard. But smart, researched investment into cloud capabilities like those provided in Microsoft Azure will make facing tomorrow's bold technology challenges and possibilities relatively painless.

Azure

September 19, 2016

Blog

Optimizing AWS Lambda Cost and Performance Through Monitoring

In this post, I'll be discussing the use of monitoring as a tool to optimize the cost and performance of AWS Lambda. I've worked on a number of teams, and almost without exception, the need to put monitoring in place has featured prominently in early plans. Tickets are usually created and discussed, and then placed in the backlog, where they seem to enter a cycle of being important—but never quite enough to be placed ahead of the next phase of work. In reality, especially in a continuous development environment, monitoring should be a top priority, and with the tools available in AWS and from organizations like Sumo Logic, setting up basic monitoring shouldn't take more than a couple of hours, or a day at most.

What exactly is AWS Lambda?

AWS Lambda from Amazon Web Services (AWS) allows an organization to introduce functionality into the AWS ecosystem without the need to provision and maintain servers. A Lambda function can be uploaded and configured, and then executed as frequently as needed without further intervention. Unlike in a typical server environment, you don't have control over, or need insight into, data elements like CPU usage or available memory, and you don't have to worry about scaling your functionality to meet increased demand. Once a Lambda has been deployed, the cost is based on the number of requests and a few key elements we'll discuss shortly.

How to set up AWS Lambda monitoring, and what to track

Before we get into what data elements should be tracked, the vital step is deciding to put an effective monitoring system in place. And with that decision made, AWS helps you right from the start. Monitoring is handled automatically by AWS Lambda, which means less time configuring and more time analyzing results. Logs are automatically sent to Amazon CloudWatch, where a user can view basic metrics, or harness the power of an external reporting system and gain key insights into how Lambda is performing. The Sumo Logic App for AWS Lambda uses the Lambda logs via CloudWatch and visualizes operational and performance trends about all the Lambda functions in your account, providing insight into executions such as memory and duration usage, broken down by function versions or aliases.

The pricing model for functionality deployed using AWS Lambda is calculated by the number of requests received, and the time and resources needed to process the request. Therefore, the key metrics that need to be considered are:

Number of requests and associated error counts.
Resources required for execution.
Time required for execution, or latency.

Request and error counts in Lambda

The cost per request is the simplest of the three factors. In the typical business environment, the goal is to drive traffic to the business' offerings; thus, increasing the number of requests is key. Monitoring of these metrics should compare the number of actual visitors with the number of requests being made to the function. Depending on the function and how often a user is expected to interact with it, you can quickly determine what an acceptable ratio might be—1:1 or 1:3. Variances from this should be investigated to determine what the cause might be.

Lambda resource usage and processing time

When Lambda is first configured, you can specify the expected memory usage of the function. The actual runtime usage may differ, and is reported on a per-request basis. Amazon then factors the cost of the request based on how much memory is used, and for how long.
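Those billing inputs are easy to inspect for yourself. As a rough sketch, the AWS CLI can pull the same CloudWatch metrics that back the Sumo Logic App for AWS Lambda; the function name and time window below are placeholders:

aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Duration \
  --dimensions Name=FunctionName,Value=my-function \
  --statistics Average Maximum \
  --start-time 2016-09-01T00:00:00Z --end-time 2016-09-02T00:00:00Z \
  --period 3600

Swapping Duration for Invocations or Errors returns the request and error counts discussed above.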
Billed duration is measured in 100ms increments. If, for example, you find that your function typically completes in between 290ms and 310ms, there would be definite cost savings if the function could be optimized to perform consistently in less than 300ms. These optimizations, however, would need to be analyzed to determine whether they increase resource usage for that same time, and whether that increase in performance is worth an increase in cost. For more information on how Amazon calculates costs relative to resource usage and execution time, you can visit the AWS Lambda pricing page.

AWS Lambda: The big picture

Finally, when implementing and considering metrics, it is important to consider the big picture. One of my very first attempts with a Lambda function yielded exceptional numbers with respect to performance and utilization of resources. The process was blazingly fast, and barely used any resources at all. It wasn't until I looked at the request and error metrics that I realized that over 90% of my requests were being rejected immediately. While monitoring can't make your business decisions for you, having a solid monitoring system in place will give an objective view of how your functions are performing, and data to support the decisions you need to make for your business.

Optimizing AWS Lambda Cost and Performance Through Monitoring is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

About the Author

Mike Mackrory is a global citizen who has settled down in the Pacific Northwest – for now. By day he works as a Senior Engineer on a Quality Engineering team and by night he writes, consults on several web-based projects and runs a marginally successful eBay sticker business. When he's not tapping on the keys, he can be found hiking, fishing and exploring both the urban and the rural landscape with his kids. Always happy to help out another developer, he has a definite preference for helping those who bring gifts of gourmet donuts, craft beer and/or single-malt Scotch.

Blog

How to Configure a Docker Cluster Using Swarm

Blog

Integrating Apps with the Sumo Logic Search API

The Sumo Logic Web app provides a search interface that lets you parse logs. This provides a great resource for a lot of use cases — especially because you can take advantage of a rich search syntax, including wildcards and various operators (documented here), directly from the Web app. But we realize that some people need to be able to harness Sumo Logic search data from within external apps, too. That's why Sumo Logic also provides a robust RESTful API that you can use to integrate other apps with Sumo Logic search. To provide a sense of how you can use the Sumo Logic Search Job API in the real world, this post offers a quick primer on the API, along with a couple of examples of the API in action. For more detailed information, refer to the Search Job API documentation.

Sumo Logic Search Integration: The Basics

Before getting started, there are a few essentials you should know about the Sumo Logic Search Job API.

First, the API uses the HTTP GET method. That makes it pretty straightforward to build the API into Web apps you may have (or any other type of app that uses the HTTP protocol). It also means you can run queries directly from the CLI using any tool that supports HTTP GET requests, like curl or wget. Sound easy? It is!

Second, queries should be directed to https://api.sumologic.com/api/v1/logs/search. You simply append your GET requests and send them on to the server. (You also need to make sure that your HTTP request contains the parameters for connecting to your Sumo Logic account; for example, with curl, you would specify these using the -u flag, for instance, curl -u [email protected]:VeryTopSecret123 your-search-query.)

Third, the server delivers query responses in JSON format. That approach is used because it keeps the search result data formatting consistent, allowing you to manipulate the results easily if needed.

Fourth, know that the Search Job API can return up to one million records per search query. API requests are limited to four API requests per second and 240 requests per minute across all API calls from a customer. If the rate is exceeded, a rate limit exceeded (429) error is returned.

Sumo Logic Search API Example Queries

As promised, here are some real-world examples. For starters, let's say you want to identify incidents where a database connection failure occurred. To do this, specify "database connection error" as your query, using a command like this:

curl -u [email protected]:VeryTopSecret123 "https://api.sumologic.com/api/v1/logs/search?q=database connection error"

(That's all one line, by the way.) You can take things further, too, by adding date and time parameters to the search. For example, if you wanted to find database connection errors that happened between about 1 p.m. and 3 p.m. on April 4, 2012, you would add some extra data to your query, making it look like this:

curl -u [email protected]:VeryTopSecret123 "https://api.sumologic.com/api/v1/logs/search?q=database connection error&from=2012-04-04T13:01:02&to=2012-04-04T15:01:02"

Another real-world situation where the search API can come in handy is finding login failures. You could locate those in the logs with a query like this:

curl -u [email protected]:VeryTopSecret123 "https://api.sumologic.com/api/v1/logs/search?q=failed login"

Again, you could restrict your search here to a certain time and date range, too, if you wanted.

Another Way to Integrate with Sumo Logic Search: Webhooks

Most users will probably find the Sumo Logic search API the most extensible method of integrating their apps with log data.
But there is another way to go about this, too, which is worth mentioning before we wrap up. That’s Webhook alerts, a feature that was added to Sumo Logic last fall. Webhooks make it easy to feed Sumo Logic search data to external apps, like Slack, PagerDuty, VictorOps and Datadog. I won’t explain how to use Webhooks in this post, because that topic is already covered on our blog. Integrating Apps with the Sumo Logic Search API is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production. About the Author Dan Stevens is the founder of StickyWeb (stickyweb.biz), a custom Web Technology development company. Previously, he was the Senior Product Manager for Java Technologies at Sun Microsystems and for broadcast video technologies at Sony Electronics, Accom and Ampex.

Blog

AWS Kinesis Streams - Getting Started

Blog

5 Bintray Security Best Practices

Bintray, JFrog's software hosting and distribution platform, offers lots of exciting features, like CI integration and REST APIs. If you're like me, you enjoy thinking about those features much more than you enjoy thinking about software security. Packaging and distributing software is fun; worrying about the details of Bintray security configurations and access control for your software tends to be tedious (unless security is your thing, of course). Like any other tool, however, Bintray is only effective in a production environment when it is run securely. That means that, alongside all of the other fun things you can do with Bintray, you should plan and run your deployment in a way that mitigates the risk of unauthorized access, the exposure of private data, and so on. Below, I explain the basics of Bintray security, and outline strategies for making your Bintray deployment more secure.

Bintray Security Basics

Bintray is a cloud service hosted by JFrog's data center provider. JFrog promises that the service is designed for security, and hardened against attack. (The company is not very specific about how it mitigates security vulnerabilities for Bintray hosting, but I wouldn't be either, since one does not want to give potential attackers information about the configuration.) JFrog also says that it restricts employee access to Bintray servers and uses SSH over VPN when employees do access the servers, which adds additional security. The hosted nature of Bintray means that none of the security considerations associated with on-premises software apply. That makes life considerably easier from the get-go if you're using Bintray and are worried about security. Still, there's more that you can do to ensure that your Bintray deployment is as robust as possible against potential intrusions. In particular, consider adopting the following policies.

Set up an API key for Bintray

Bintray requires users to create a username and password when they first set up an account. You'll need those when getting started with Bintray. Once your account is created, however, you can help mitigate the risk of unauthorized access by creating an API key. This allows you to authenticate over the Bintray API without using your username or password. That means that even if a network sniffer is listening to your traffic, your account won't be compromised.

Use OAuth for Bintray Authentication

Bintray also supports authentication using the OAuth protocol. That means you can log in using credentials from a GitHub, Twitter or Google+ account. Chances are that you pay closer attention to one of these accounts (and get notices from the providers about unauthorized access) than you do to your Bintray account. So, to maximize security and reduce the risk of unauthorized access, make sure your Bintray account itself has login credentials that cannot be brute-forced, then log in to Bintray via OAuth using an account from a third-party service that you monitor closely.

Sign Packages with GPG

Bintray supports optional GPG signing of packages. To do this, you first have to configure a key pair in your Bintray profile. For details, check out the Bintray documentation. GPG signing is another obvious way to help keep your Bintray deployment more secure. It also keeps the users of your software distributions happier, since they will know that your packages are GPG-signed, and therefore, are less likely to contain malicious content.
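As a rough sketch of the API key and GPG tips above, the commands below authenticate a Bintray API call with an API key and create a signing key pair. The username, API key, email address, and package file are placeholders, and the exact REST endpoint should be verified against the Bintray API documentation:

# authenticate with your API key instead of your account password (placeholder values)
curl -uyourname:YOUR_BINTRAY_API_KEY https://api.bintray.com/repos/yourname

# generate a GPG key pair and export the public key to add to your Bintray profile
gpg --gen-key
gpg --armor --export you@example.com > bintray-public-key.asc

# optionally sign a release artifact locally; produces my-package-1.0.0.tgz.asc
gpg --armor --detach-sign my-package-1.0.0.tgz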
Take Advantage of Bintray's Access Control

The professional version of Bintray offers granular control over who can download packages. (Unfortunately, this feature is only available in that edition.) You can configure access on a per-user or per-organization basis. While added security shouldn't be the main reason you use granular access control (the feature is primarily designed to help you fine-tune your software distribution), it doesn't hurt to take advantage of it in order to reduce the risk that certain software becomes available to a user to whom you don't want to give access.

5 Bintray Security Best Practices is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

About the Author

Chris Tozzi has worked as a journalist and Linux systems administrator. He has particular interests in open source, agile infrastructure and networking. He is Senior Editor of content and a DevOps Analyst at Fixate IO.

September 9, 2016

Blog

Tutorial: How to Run Artifactory as a Container

Blog

How Hudl and Cloud Cruiser use Sumo Logic Unified Logs and Metrics

Blog

Benchmarking Microservices for Fun and Profit

Why should I benchmark microservices?

The ultimate goal of benchmarking is to better understand the software, and test out the effects of various optimization techniques for microservices. In this blog, we describe our approach to benchmarking microservices here at Sumo Logic.

Create a spreadsheet for tracking your benchmarking

We found that a convenient way to document a series of benchmarks is in a Google Spreadsheet. It allows collaboration and provides the necessary features to analyze and sum up your results. Structure your spreadsheet as follows:

Title page
Goals
Methodology
List of planned and completed experiments (evolving as you learn more)
Insights
Additional pages
Detailed benchmark results for various experiments

Be clear about your benchmark goals

Before you engage in benchmarking, clearly state (and document) your goal. Examples of goals are:

"I am trying to understand how input X affects metric Y"
"I am running experiments A, B and C to increase/decrease metric X"

Pick one key metric (Key Performance Indicator – KPI)

State clearly which one metric you are concerned about and how the metric affects users of the system. If you choose to capture additional metrics for your test runs, ensure that the key metric stands out.

Think like a scientist

You're going to be performing a series of experiments to better understand which inputs affect your key metric, and how. Consider and document the variables you devise, and create a standard control set to compare against. Design your series of experiments in a fashion that leads to understanding in the least amount of time and effort.

Define, document and validate your benchmarking methodology

Define a methodology for running your benchmarks. It is critical your benchmarks be:

Fairly fast (several minutes, ideally)
Reproducible in the exact same manner, even months later
Documented well enough so another person can repeat them and get identical results

Document your methodology in detail. Also document how to re-create your environment. Include all details another person needs to know:

Versions used
Feature flags and other configuration
Instance types and any other environmental details

Use load generation tools, and understand their limitations

In most cases, to accomplish repeatable, rapid-fire experiments, you need a synthetic load generation tool. Find out whether one already exists. If not, you may need to write one. Understand that load generation tools are at best an approximation of what is going on in production. The better the approximation, the more relevant the results you're going to obtain. If you find yourself drawing insights from benchmarks that do not translate into production, revisit your load generation tool.

Validate your benchmarking methodology

Repeat a baseline benchmark at least 10 times and calculate the standard deviation over the results. You can use the following spreadsheet formula: =STDEV(<range>)/AVERAGE(<range>) Format this number as a percentage, and you'll see how big the relative variance in your result set is. Ideally, you want this value to be < 10%. If your benchmarks have larger variance, revisit your methodology. You may need to tweak factors like:

Increase the duration of the tests.
Eliminate variance from the environments.
Ensure all benchmarks start in the same state (i.e. cold caches, freshly launched JVMs, etc).
Consider the effects of Hotspot/JITs.
Simplify/stub components and dependencies on other microservices that add variance but aren't key to your benchmark.
Don't be shy to make hacky code changes and push binaries you'd never ship to production.

Important: Determine the number of results you need to get the standard deviation below a good threshold. Run each of your actual benchmarks at least that many times. Otherwise, your results may be too random.

Execute the benchmark series

Now that you have developed a sound methodology, it's time to gather data. Tips:

Only vary one input/knob/configuration setting at a time.
For every run of the benchmark, capture start and end time. This will help you correlate it to logs and metrics later.
If you're unsure whether the input will actually affect your metric, try extreme values to confirm it's worth running a series.
Script the execution of the benchmarks and collection of metrics.
Interleave your benchmarks to make sure what you're observing isn't a slow change in your test environment. Instead of running AAAABBBBCCCC, run ABCABCABCABC (a scripted sketch of this appears at the end of this post).

Create enough load to be able to measure a difference

There are two different strategies for generating load.

Strategy 1: Redline it! In most cases, you want to ensure you're creating enough load to saturate your component. If you do not manage to accomplish that, how would you see that you increased its throughput? If your component falls apart at redline (i.e. OOMs, throughput drops, or otherwise spirals out of control), understand why, and fix the problem.

Strategy 2: Measure machine resources. In cases where you cannot redline the component, or you have reason to believe it behaves substantially differently in less-than-100%-load situations, you may need to resort to OS metrics such as CPU utilization and IOPS to determine whether you've made a change. Make sure your load is large enough for changes to be visible. If your load causes 3% CPU utilization, a 50% improvement in performance will be lost in the noise. Try different amounts of load and find a sweet spot, where your OS metric measurement is sensitive enough.

Add new benchmarking experiments as needed

As you execute your benchmarks and develop a better understanding of the system, you are likely to discover new factors that may impact your key metric. Add new experiments to your list and prioritize them over the previous ones if needed.

Hack the code

In some instances, the code may not have configuration or control knobs for the inputs you want to vary. Find the fastest way to change the input, even if it means hacking the code, commenting out sections or otherwise manipulating the code in ways that wouldn't be "kosher" for merges into master. Remember: The goal here is to get answers as quickly as possible, not to write production-quality code—that comes later, once we have our answers.

Analyze the data and document your insights

Once you've completed a series of benchmarks, take a step back and think about what the data is telling you about the system you're benchmarking. Document your insights and how the data backs them up. It may be helpful to:

Calculate the average for each series of benchmarks you ran, and use that to calculate the difference (in percent) between series — i.e. "when I doubled the number of threads, QPS increased by 23% on average."
Graph your results — is the relationship between your input and the performance metric linear? Logarithmic? Bell curve?

Present your insights

When presenting your insights to management and/or other engineering teams, apply the Pyramid Principle. Engineers often make the mistake of explaining methodology, results and concluding with the insights.
It is preferable to reverse the order and start with the insight. Then, if needed/requested, explain methodology and how the data supports your insight. Omit nitty-gritty details of any experiments that didn’t lead to interesting insights. Avoid jargon, and if you cannot, explain it. Don’t assume your audience knows the jargon. Make sure your graphs have meaningful, human-readable units. Make sure your graphs can be read when projected onto a screen or TV.
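To make the execution tips above concrete (scripting the runs, capturing start and end times, and interleaving ABCABCABC rather than AAAABBBBCCCC), here is a minimal shell sketch. The run_benchmark.sh wrapper is hypothetical; substitute your own load generation tool and keep the timestamps for correlating runs with logs and metrics:

# interleave three configurations, four rounds each, recording start/end times per run
for round in 1 2 3 4; do
  for cfg in A B C; do
    start=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    ./run_benchmark.sh "$cfg" >> "results_${cfg}.log"   # hypothetical benchmark runner
    end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    echo "${cfg},${round},${start},${end}" >> runs.csv
  done
done

Feeding the per-run numbers back into the spreadsheet keeps the standard-deviation check honest, since every configuration is exposed to the same slow drift in the test environment.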

September 2, 2016

Blog

Solaris Containers: What You Need to Know

Blog

A Beginner’s Guide to GitHub Events

Do you like GitHub, but don't like having to log in to check on the status of your project or code? GitHub events are your solution. GitHub events provide a handy way to receive automated status updates from your GitHub repos concerning everything from code commits to new users joining a project. And because they are accessible via a Web API as GET requests, it's easy to integrate them into the notification system of your choosing. Keep reading for a primer on GitHub events and how to get the most out of them.

What GitHub events are, and what they are not

Again, GitHub events provide an easy way to keep track of your GitHub repository without monitoring its status manually. They're basically a notification system that offers a high level of customizability. You should keep in mind, however, that GitHub events are designed only as a way to receive notifications. They don't allow you to interact with your GitHub repo. You can't trigger events; you can only receive notifications when specific events occur. That means that events are not a way for you to automate the maintenance of your repository or project. You'll need other tools for that. But if you just want to monitor changes, they're a simple solution.

How to use GitHub events

GitHub event usage is pretty straightforward. You simply send GET requests to https://api.github.com. You specify the type of information you want by completing the URL path accordingly. For example, if you want information about the public events performed by a given GitHub user, you would send a GET request to this URL: https://api.github.com/users/:username/events (If you are authenticated, this request will generate information about private events that you have performed.) Here's a real-world example, in which we send a GET request using curl to find information about public events performed by Linus Torvalds (the original author of Git), whose username is torvalds:

curl -i -H "Accept: application/json" -H "Content-Type: application/json" -X GET https://api.github.com/users/torvalds/events

Another handy request lets you list events for a particular organization. The URL to use here looks like: https://api.github.com/users/:username/events/orgs/:org The full list of events, with their associated URLs, is available from the GitHub documentation.

Use GitHub Webhooks for automated events reporting

So far, we've covered how to request information about an event using a specific HTTP request. But you can take things further by using GitHub Webhooks to automate reporting about events of a certain type. Webhooks allow you to "subscribe" to particular events and receive an HTTP POST (whose body is, in GitHub parlance, a "payload") at a URL of your choosing whenever that event occurs. You can create a Webhook in the GitHub Web interface that allows you to specify the URL to which GitHub should send your payload when an event is triggered. Alternatively, you can create Webhooks via the GitHub API using POST requests. However you set them up, Webhooks allow you to monitor your repositories (or any public repositories) and receive alerts in an automated fashion. Like most good things in life, Webhooks are subject to certain limitations, which are worth noting. Specifically, you can only configure up to twenty Webhooks per event for each GitHub organization or repository.
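Creating a Webhook through the API is a single POST. The sketch below registers a hypothetical push-and-issues hook; :owner/:repo, the payload URL, and the secret are placeholders, and the exact payload fields should be checked against GitHub's Webhooks documentation:

curl -u "username" -X POST https://api.github.com/repos/:owner/:repo/hooks \
  -d '{"name": "web", "active": true, "events": ["push", "issues"], "config": {"url": "https://example.com/payload", "content_type": "json", "secret": "REPLACE_ME"}}'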
While you can monitor public events without authentication, you’ll need to authenticate in order to keep track of private ones. Authentication via the GitHub API is detailed here, but it basically boils down to having three options. The simplest is to do HTTP authentication using a command like: curl -u "username" https://api.github.com If you want to be more sophisticated, you can also authenticate using OAuth2 via either key/secrets or tokens. For example, authenticating with a token would look something like: curl https://api.github.com/?access_token=OAUTH-TOKEN If you’re monitoring private events, you’ll want to authenticate with one of these methods before sending requests about the events. Further reading If you want to dive deeper into the details of GitHub events, the following resources are useful: Overview of event types. Event payloads according to event type. Setting up Webhooks. GitHub API authentication. A Beginner’s Guide to GitHub Events is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.
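To tie the pieces above together, here is a minimal sketch in Node.js that polls a user's public events using the same endpoint as the curl examples. The username, the token handling and the way the results are printed are illustrative assumptions; only the /users/:username/events endpoint and the token-based Authorization header come from the GitHub documentation.

```javascript
// Minimal sketch: fetch a user's public GitHub events and print their types.
// Uses only the built-in https module; no third-party packages required.
const https = require('https');

const username = 'torvalds';            // placeholder user
const token = process.env.GITHUB_TOKEN; // optional; only needed to see private events

const headers = {
  'User-Agent': 'events-demo',               // the GitHub API requires a User-Agent
  'Accept': 'application/vnd.github.v3+json'
};
if (token) {
  headers['Authorization'] = 'token ' + token;
}

https.get({ hostname: 'api.github.com', path: '/users/' + username + '/events', headers: headers }, function (res) {
  let body = '';
  res.on('data', function (chunk) { body += chunk; });
  res.on('end', function () {
    // Each event object carries a "type" such as PushEvent or IssuesEvent.
    JSON.parse(body).forEach(function (evt) {
      console.log(evt.type, evt.repo && evt.repo.name);
    });
  });
}).on('error', console.error);
```

From here it is a short step to feed the event types into the notification channel of your choice, or to let a Webhook push the same payloads to you instead of polling.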

Blog

Building Software Release Cycle Health Dashboards in Sumo Logic

Gauging the health and productivity of a software release cycle is notoriously difficult. Atomic age metrics like “man months” and LOCs may be discredited, but they are too often a reflexive response to DevOps problems. Instead of understanding the cycle itself, management may hire a “DevOps expert” or homebrew one by taking someone off their project and focusing them on “automation.” Or they might pile on more man months and LOCs in the form of well-intentioned end-to-end tests. What could go wrong? Below, I’ve compiled some metrics and tips for building a release cycle health dashboard using Sumo Logic. Measuring Your Software Release Cycle Speed Jez Humble points to some evidence that delivering faster not only shortens feedback but also makes people happier, even on deployment days. Regardless, shorter feedback cycles do tend to bring more user involvement into the release, resulting in more useful features and fewer bugs. Even if you are not pushing solely for faster releases, you will still need to allocate resources between functions and services, and measuring deployment speed will help. Useful measures include: Change lead time: Time between ticket accepted and ticket closed. Change frequency: Time between deployments. Recovery time: Time between a severe incident and resolution. To get this data into Sumo Logic, ingest your SCM and incident management tools. While these are not typical log streams, their tags and timestamps are necessary for tracking the pipeline. You can return deployment data from your release management tools. Tracking Teams and Services with the GitHub App To avoid averaging out insights, separately tag services and teams in each of the measures above. For example, if a user logic group works on identities and billing, track the billing and identity services separately. For GitHub users, there is an easy solution: the Sumo Logic App for GitHub, which is currently available in preview. It generates pre-built dashboards in common monitoring areas like security, commit/pipeline and issues. More importantly, each panel provides queries that can be repurposed for separately tagged, team-specific panels. Reusing these queries allows you to build clear pipeline visualizations very quickly. For example, let’s build a “UI” team change frequency panel. First, create a lookup table designating UserTeams.
Pin it to saved queries as it can be used across the dashboard to break out teams: "id","user","email","team", "1","Joe","[email protected]","UI" "2","John","[email protected]","UI" "3","Susan","[email protected]","UI" "4","John","[email protected]","backspace" "5","John","[email protected]","backspace" Next, copy the “Pull Requests by Repository” query from the panel: _sourceCategory=github_logs and ( "opened" or "closed" or "reopened" ) | json "action", "issue.id", "issue.number", "issue.title" , "issue.state", "issue.created_at", "issue.updated_at", "issue.closed_at", "issue.body", "issue.user.login", "issue.url", "repository.name", "repository.open_issues_count" as action, issue_ID, issue_num, issue_title, state, createdAt, updatedAt, closedAt, body, user, url, repo_name, repoOpenIssueCnt | count by action,repo_name | where action != "assigned" | transpose row repo_name column action Then, pipe in the team identifier with a lookup command: _sourceCategory=github_logs and ( "opened" or "closed" or "reopened" ) | json "action", "issue.id", "issue.number", "issue.title" , "issue.state", "issue.created_at", "issue.updated_at", "issue.closed_at", "issue.body", "issue.user.login", "issue.url", "repository.name", "repository.open_issues_count" as action, issue_ID, issue_num, issue_title, state, createdAt, updatedAt, closedAt, body, user, url, repo_name, repoOpenIssueCnt | lookup team from https://toplevelurlwithlookups.com/UserTeams.csv on user=user | count by action,repo_name, team | where action != "assigned" | transpose row repo_name team column action This resulting query tracks commits — open, closed or reopened — by team. The visualization can be controlled on the panel editor, and the lookup can be easily piped to other queries to break the pipeline by teams. Don’t Forget User Experience It may seem out of scope to measure user experience alongside a deployment schedule and recovery time, but it’s a release cycle health dashboard, and nothing is a better measure of a release cycle’s health than user satisfaction. There are two standards worth including: Apdex and Net Promoter Score. Apdex: measures application performance on a 0-1 satisfaction scale calculated by… If you want to build an Apdex solely in Sumo Logic, you could read through this blog post and use the new Metrics feature in Sumo Logic. This is a set of numeric metrics tools for performance analysis. It will allow you to set, then tune satisfaction and tolerating levels without resorting to a third party tool. Net Promoter Score: How likely is it that you would recommend our service to a friend or colleague? This one-question survey correlates with user satisfaction, is simple to embed anywhere in an application or marketing channel, and can easily be forwarded to a Sumo Logic dashboard through a webhook. When visualizing these UX metrics, do not use the single numerical callout. Take advantage of Sumo Logic’s time-series capabilities by tracking a line chart with standard deviation. Over time, this will give you an expected range of satisfaction and visual cues of spikes in dissatisfaction that sit on the same timeline as your release cycle. Controlling the Release Cycle Logging Deluge A release cycle has a few dimensions that involve multiple sources, which allow you to query endlessly. For example, speed requires ticketing, CI and deployment logs. Crawling all the logs in these sources can quickly add up to TBs of data. 
That’s great fun for ad hoc queries, but streams like comment text are not necessary for a process health dashboard, and their verbosity can result in slow dashboard load times and costly index overruns. To avoid this, block this and other unnecessary data by partitioning sources in Sumo Logic’s index tailoring menus. You can also speed up the dashboard by scheduling your underlying query runs for once a day. A health dashboard doesn’t send alerts, so it doesn’t need to be running in real-time. More Resources: How Do you Measure Team Success? On the Care and Feeding of Feedback Cycles Martin Fowler’s Test Pyramid Just Say No to More End to End Tests Quantifying Devops Capability: It’s Important to Keep CALMS 9 Metrics DevOps Teams Track Building Software Release Cycle Health Dashboards in Sumo Logic is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production. About the Author Alex Entrekin served on the executive staff of Cloudshare where he was primarily responsible for advanced analytics and monitoring systems. His work extending Splunk into actionable user profiling was featured at VMworld: “How a Cloud Computing Provider Reached the Holy Grail of Visibility.” Alex is currently an attorney, researcher and writer based in Santa Barbara, CA. He holds a J.D. from the UCLA School of Law.

Blog

Who Broke My Test? A Git Bisect Tutorial

Git bisect is a great way to help narrow down who (and what) broke something in your test—and it is incredibly easy to learn. This git bisect tutorial will guide you through the basic process. So, what exactly is git bisect? Git bisect is a binary search to help find the commit that introduced a bug. Not only is it a great tool for developers, but it’s also very useful for testers (like myself). I left for the weekend with my Jenkins dashboard looking nice and green. It was a three-day weekend—time to relax! I came in Tuesday, fired up my machine, logged into Jenkins…RED. I had to do a little bit of detective work. Luckily for me, a developer friend told me about git bisect (I’m relatively new to this world), and helped me quickly track down which commit broke my tests. Getting started with git bisect First, I had to narrow down the timeline. (Side note—this isn’t really a tool if you’re looking over the last few months, but if it’s within recent history—days—it’s handy.) Looking at my build history in Jenkins, I noted the date/times I had a passing build (around 11 AM), and when it started showing up red (around 5 PM). I went into SourceTree and found a commit from around 11 AM that I thought would be good. A simple double-click of that commit and I was in. I ran some tests against that commit, and all passed, confirming I had a good build. It was time to start my bisect session! git bisect start git bisect good Narrowing down the suspects Now that I’d established my GOOD build, I had to figure out where I thought the bad build occurred. Back to SourceTree! I found a commit from around 5 PM (where I noticed the first failure), so I thought I’d check that one out. I ran some more tests. Sure enough, they failed. I marked that as my bad build. git bisect bad I had a bunch of commits between the good and bad (in my case, 15), and needed to find which one between our 11 AM and 5 PM runs broke our tests. Now, without bisect, I might have had to pull down each commit, run my tests, and see which started failing between good and bad. That’s very time-consuming. But git bisect prevents you from having to do that. When I ran git bisect bad, I got a message in the following format: Bisecting: <number> revisions left to test after this (roughly <number> steps) [<commit number>] <Commit Description> This helped identify the middle commit between what I identified as good and bad, cutting my options in half. It told me how many revisions were between the identified commit and my bad commit (previously identified), how many more steps I should need to find the culprit, and which commit I next needed to look at. Then, I needed to test the commit that bisect came up with. So—I grabbed it, ran my tests—and they all passed. git bisect good This narrowed down my results even further and gave me a similar message. I continued to grab the commit, run tests, and mark it as good until I found my culprit—and it took me only three steps (versus running through about 15 commits)! When my tests failed, I had to identify the bad commit. git bisect bad <commit number> is the first bad commit <commit number> Author: <name> Date: <date and time of commit> Aha! I knew the specific commit that broke my test, who did it, and when. I had everything I needed to go back to the engineer (with proof!) and start getting my tests green again.
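If your failing check can be run from the command line, git bisect can even do the stepping for you with git bisect run, which marks each candidate commit good or bad based on the exit code of a script. Below is a hedged sketch of such a script in Node.js; the check-build.js name and the npm test command it shells out to are placeholders for whatever runs your failing suite.

```javascript
// check-build.js - exit 0 if the tests pass (good commit), 1 if they fail (bad commit).
// Typical usage:
//   git bisect start
//   git bisect bad                      # current, failing commit
//   git bisect good <known-good-commit>
//   git bisect run node check-build.js
const { execSync } = require('child_process');

try {
  // Replace 'npm test' with the command that exercises the failing test.
  execSync('npm test', { stdio: 'inherit' });
  process.exit(0); // tests passed: git bisect treats this commit as good
} catch (err) {
  process.exit(1); // tests failed: git bisect treats this commit as bad
}
```

With that in place, git bisect run repeats the grab-build-test loop described above automatically and stops at the first bad commit.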
Getting to the Root Cause When you are in the process of stabilizing tests, it can be fairly time-consuming to determine if a failure is a result of a test, or the result of an actual bug. Using git bisect can help reduce that time and really pinpoint what exactly went wrong. In this case, we were able to quickly go to the engineer, alert the engineer that a specific commit broke the tests, and work together to understand why and how to fix it. Of course, in my perfect world, it wouldn’t only be my team that monitors and cares about the results. But until I live in Tester’s Utopia, I’ll use git bisect. Who Broke My Test? A Git Bisect Tutorial is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production. About the Author Ashley Hunsberger is a Quality Architect at Blackboard, Inc. and co-founder of Quality Element. She’s passionate about making an impact in education and loves coaching team members in product and client-focused quality practices. Most recently, she has focused on test strategy implementation and training, development process efficiencies, and preaching Test Driven Development to anyone that will listen. In her downtime, she loves to travel, read, quilt, hike, and spend time with her family.

Blog

Docker 1.12: What You Need to Know

Blog

A Brief Tutorial to Understanding, Starting, and Using AWS Lambda

As cloud computing continues to become more and more accessible to businesses (and individuals), it can be a full-time job keeping up with all the tools released by the likes of Amazon, Microsoft, and others. One such tool is AWS Lambda, something that stands to revolutionize the accessibility (and affordability) of distributed computing. What Are AWS Lambda Functions? The short answer is that Lambdas are discrete blocks of code that you can execute without having to manage the overhead of the hardware and software involved in hosting that code. Your code can currently be written in either Python or Node.js, and can be uploaded with project-specific dependencies to facilitate a seemingly endless set of possibilities. Getting Started with AWS Lambda Sometimes the best way to understand a system is to simply start using it, so let’s dispense with the introductions and get right into some code. Since I use JavaScript much more frequently, I’ll be doing my examples here in Node.js. The equivalent scripts in Python will rarely be more complicated, and Amazon provides plenty of examples in both languages to give you the chance to learn. Once you get through the AWS Console and are creating a new Lambda function, you are first presented with a selection of templates. One of the templates available is a hello-world template in Node.js. Selecting it will bring you to a configuration screen with several options, as well as the template’s starter code (reconstructed below). There are three important pieces of this code. First, the line of code that creates the function: exports.handler = function(event, context) { This code (and the associated closing bracket) is the heart of any Lambda. The Lambda system relies on having this exported handler to know where it will pass execution when the Lambda has been triggered. Second in line is the event argument for the handler, which receives any input provided by the trigger for the lambda. This example does a good job of showing how the following JSON would be accessible to the code: { "key3": "value3", "key2": "value2", "key1": "value1" } The final piece that’s important is the context success/failure call: context.succeed(event.key1); That call lets the Lambda host environment know whether or not the lambda was able to complete its work. You can also include (as shown) additional information with your success/failure indication, which is important if you plan on debugging what you’ve created. Everything else in the code (in this case the console.log() calls) is superfluous as far as the Lambda environment is concerned. How to Use AWS Lambda Functions With the basics out of the way, let’s get into the nitty-gritty of showing how the system can be useful to you. Let’s consider the following situation: You are tasked with sending a notification email to a manager whenever someone visits a specific page on a website. For pre-existing reasons, you aren’t able to send email directly from the web server, so you have to create a system that can be remotely invoked to deliver emails. In the past, maybe you would create a service or cron job that constantly polled a database table for outgoing emails. Maybe you’d even create your own message queue handler to do the work on demand (much better), but you’re still stuck building and maintaining extra infrastructure. Enter Lambdas! Use Case: Our Email Handler Starting from the hello-world example provided by Amazon, I’m going to remove everything except the exported handler.
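For reference, here is roughly what the hello-world template discussed above looks like before we strip it down. This is a reconstruction pieced together from the description in this post (the three keys, the console.log() calls and the context.succeed() call), not a verbatim copy of Amazon's template, so treat the exact wording as an assumption.

```javascript
// Approximate reconstruction of the Lambda "hello-world" Node.js template.
// The test event is expected to carry key1, key2 and key3, as in the sample JSON above.
console.log('Loading function');

exports.handler = function(event, context) {
    // Log each incoming value; these are the "superfluous" console.log() calls noted above.
    console.log('value1 =', event.key1);
    console.log('value2 =', event.key2);
    console.log('value3 =', event.key3);

    // Tell the Lambda environment the invocation succeeded and echo back key1.
    context.succeed(event.key1);
    // context.fail('Something went wrong');  // the failure counterpart
};
```

The email handler below starts from this file and keeps only the bare exports.handler function.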
I’ll then include the AWS SDK (thankfully built into the Lambda environment, so I can do this without uploading my code in a ZIP file), building a parameter object for use with my AWS SES (Simple Email Service) account, and attempt sending the email through SES before deciding if the operation was successful. Here is the completed code (and the explanation of the lines below): var AWS = require('aws-sdk'); exports.handler = function(event, context) { var params = { Destination: { ToAddresses: ['[email protected]'] }, Message: { Body: { Text: { Data: 'Someone has pinged the website!', Charset: 'utf8' } }, Subject: { Data: 'A Website Ping', Charset: 'utf8' } }, Source: '[email protected]', SourceArn: 'arn-here' }; new AWS.SES().sendEmail(params, function(err, data) { if (err) { context.fail('Failed to send email (' + err + ')'); } else { context.succeed('The website has been pinged'); } }); }; The first line accomplishes loading the AWS SDK for use later on in the script. We use our typical handler export, and inside the function build our SES parameters and attempt sending through the SDK. The parameter object for SES includes the bare minimum of options for an SES email, including the Destination, Message, Source, and SourceArn definitions. The SourceArn is part of AWS’ Identity and Access Management (IAM) system and indicates which email address the lambda is attempting to use for delivery. Lambda Tutorial Extra Credit Once this much has been done, we technically have a functioning lambda (hit ‘Test’ on the configuration page to be sure). However, getting it to work within our system would require at least creating an API endpoint to enable remote triggering of the lambda via a REST call. Additionally, there are ways to add triggers for other services within AWS, so choosing the best trigger approach will be based on your technology requirements. Other Common Lambda Uses? While utilizing the system for distributed (and thus asynchronous) email notifications/delivery is useful, it’s hardly the end of where Lambdas are useful. Some other ideas for great applications of the feature: On-demand distributed file conversion (change video file encodings, get metadata from uploads, etc) Index updates for services such as ElasticSearch Heavy computations for data analysis And these are just a few of the potential use cases where Lambdas will shine. The Brass Tacks of AWS Lambda Hopefully this has been a quick but educational introduction into the world of AWS Lambdas. We’ve shown how you can perform non-time-sensitive operations outside of the confines of your application with a small amount of effort. Lambdas can accomplish much more—so good luck, and have fun embracing distributed computing. Editor’s Note: A Brief Tutorial to Understanding, Starting, and Using AWS Lambda is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to visit the Sumo Logic Developersfor free tools, API’s and example code that will enable you to monitor and troubleshoot applications from code to production. About the Author Andrew Male (@AndyM84) is a senior engineer at an enterprise development company in Boston, MA. Andrew has been programming from a young age and is entirely self-taught; he has spent time in many corners of the programming world including game/VR work, agency work, and teaching development to students and adults alike. He spends most of his time working on architecture design and pursuing his favorite hobby—physics.

Blog

Sending JMX Metrics to Sumo Logic Unified Logs and Metrics

This is a brief excerpt of how Mayvenn started sending JMX metrics to the Sumo Logic Unified Logs and Metrics solution. For the full blog post, please visit Mayvenn’s engineering blog. We’ve been using Sumo Logic for logs and were excited to have one tool and dashboard to visualize logs and metrics! In order to wire this up, we decided to use jmxtrans to regularly pipe the JMX metrics we query, in Graphite-formatted output, to the new Sumo Logic collectors. These collectors can essentially be thought of as a hosted version of Graphite. Step 1: Upgrade/Install Sumo Logic Collectors There are plenty of guides out there on this one, but just in case you have existing collectors, they do need to be updated to support the new Graphite source. Step 2: Add a Graphite Source for the Collector This step can either be done in the Sumo Logic dashboard or through a local file for the collector that configures the sources. Either way, you will need to decide what port to run the collector on and whether to use TCP or UDP. For our purposes, the standard port of 2003 is sufficient, and we don’t have an extremely high volume of metrics with network/CPU concerns to justify UDP. For configuring this source in the dashboard, the Sumo Logic guide to adding a Graphite source does a pretty thorough walkthrough. To summarize, though, the steps are pretty simple: go to the collector management page, select the relevant collector, click add source, choose the Graphite source and configure it with the port and TCP/UDP choices. This method is certainly a fast way to just try out Sumo Logic metrics.
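As a quick sanity check that the new Graphite source is accepting data, you can hand-craft one metric in the Graphite plaintext format ("metric.path value timestamp") and send it to the collector yourself before wiring up jmxtrans. The sketch below does that in Node.js; the collector hostname and the metric name are placeholders, and it assumes the TCP source on port 2003 described above.

```javascript
// Send a single test metric to a Sumo Logic Graphite (TCP) source.
// Graphite plaintext protocol: "<metric.path> <value> <unix timestamp>\n"
const net = require('net');

const collectorHost = 'collector.example.internal'; // placeholder: your installed collector
const collectorPort = 2003;                         // the standard Graphite port used above

const metricLine = 'jvm.memory.heap.used 123456789 ' + Math.floor(Date.now() / 1000) + '\n';

const socket = net.createConnection(collectorPort, collectorHost, function () {
  socket.write(metricLine); // the same line format the jmxtrans Graphite writer produces
  socket.end();
});
socket.on('error', function (err) {
  console.error('Could not reach the collector:', err.message);
});
```

If the metric shows up in a metrics query a few moments later, the source is wired correctly and jmxtrans output will flow in the same way.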

Blog

Sumo Logic App for Host Monitoring

Today, Sumo Logic is announcing a host monitoring app that provides comprehensive and native metrics visibility into a server and its resources. Now, customers using Sumo Logic can get detailed host metrics, analyze these metrics and visualize them (overlaid with other host metrics and side by side with logs) to optimize application and infrastructure performance. Sumo Logic App for Host Monitoring The Sumo Logic App for Host Metrics helps you understand and correlate key metrics for monitoring your hosts (either Windows or Linux). The key metrics you can monitor out of the box with Sumo Logic are: CPU: User Time, System Time, Idle Time, Avg Load Time. Disk: Disk Used, Bytes Available, Reads, Writes, Read Bytes, Write Bytes. Memory: Total, Percentage Used, Total Free, Buffered and Cached. TCP Connections: Inbound, Outbound, Listen, Established, Close Wait, Time Wait. Network: In Packets, In Bytes, Out Packets, Out Bytes. As an example of how to use the Host Metrics app, let’s drill down into the CPU dashboard. Host Metrics – CPU The CPU dashboard is structured to provide information at two levels of granularity: CPU metrics for an individual host, and CPU metrics for the entire deployment. This side-by-side view helps you compare and evaluate the performance of a single host in the context of the entire deployment. Panels for CPU user time and CPU system time help you understand where the CPU is spending most of its cycles. In most typical situations, user time is considerably higher, as that’s what goes towards processing your applications. System time is usually a much lower metric. An unusually high system time could indicate the kernel is over-busy executing system calls, such as I/O. On the other hand, an unusually high CPU user time could be an indication of inefficiency in the application code. To learn more about your host metrics, check out the app, available in the Preview tab of the Sumo Logic library. Learn more Help and Documentation Customer Support Sumo Logic Community

August 22, 2016

Blog

Using Node.js npm with Artifactory via the API and CLI

Blog

How devRant uses Sumo Logic for Log Monitoring and A/B Testing

Here at devRant, we use Sumo Logic for a number of things including log monitoring (triggering alerts based on error log trends), database query analysis, and user behavior analysis + A/B testing analysis. In this article I’m going to focus on a recent A/B test analysis Sumo Logic was extremely helpful with. About the devRant Community In March 2016, my friend and I founded devRant (iOS, Android) – a fun community for devs to vent, share, and bond over how they really feel about code, tech, and life as a programmer. devRant is an app with an audience that is demanding and technical by nature since our users are developers. With that in mind, it is one of our goals to launch high-quality features and provide experiences that our community will consistently enjoy. The format of devRant is pretty simple – members of the community post rants which get displayed in a feed-like format. Right now we have three different methods users can choose from to sort the feed: recent – most recent first top – highest rated rants first algo – an algorithm sort with a number of components (recency, score, and new components which I’ll cover later) Where Sumo Logic Comes in As I noted, we use Sumo Logic for error monitoring (triggering alerts based on error log trends), database query analysis (seeing which of our DB queries take the longest, tied back to user cohort info), and most importantly for us right now, user behavior analysis + A/B testing analysis. In the introduction I mentioned the sorting methods we offer in the devRant mobile apps. Since devRant launched, the default sorting method we’ve offered is our algo sort. This is what all new users see, and it is one of our most popular sorting options. Originally, this sorting method was simply a decaying time algorithm combined with score – so over about 12 hours, the rants shown would change, with the highest rated for that time period showing up on top. While this algo served us well for a while, it became clear that we could do better; we also started getting a lot of requests from users. For example, one of the most frequent items of feedback we got was that instead of seeing much of the same content in the feed for a few hours, users wanted the app to hide the content they had already viewed. Additionally, we had an unproven hypothesis that since developers are a picky bunch (well, we are!), it would be good to tailor the algo sort to the taste of each specific user and mix in some slightly older rants that we think they would enjoy. We decided to base this personalized algo on criteria like users they’ve enjoyed content from before, tags they’ve liked, and foremost, rants that users with similar tastes have enjoyed and that the user hasn’t already seen. As a small but quickly growing startup, it was important to us to make sure we didn’t cannibalize our most important user experience with a hunch, so we decided we would A/B test the new algorithm and see if the new one actually outperforms the old one. Using Sumo Logic to Effectively Analyze Our Test We decided to deploy our new algorithm to 50% of our mobile user-base so we could get a meaningful amount of data fairly quickly. We then decided the main metric we would look at was number of upvotes because a user generally upvotes content only if they are enjoying it. The first thing we looked at was the number of +1’s created by users based on what algo feed version they had.
The log format for a vote looks like this: event_type=’Vote’, user_id=’id of user’, vote_type=’1 for upvote, -1 for downvote’ insert_id=’a unique id for the vote’, platform=’iOS, Android or Web’, algov=’1 if the user is assigned the old algo, 2 if the new algo’, post_type=’rant or comment’ So an example log message from an upvote on a rant from an iOS user with the new algo would look like this: event_type=’Vote’, user_id=’123’, vote_type=’1’, insert_id=’123456’, platform=’iOS’, algov=’2’, post_type=’rant’ For the initial query, I wanted to see a split of votes on rants for each algo version on iOS or Android (since the new algo is currently only available on mobile), so I wrote this simple query (it just uses some string matching, parses out the algo version, and then gets a count of votes for each version): "event_type='Vote'" "vote_type='1'" "post_type='rant'" ("platform='iOS'" OR "platform='Android'") | parse "algov='*'" as algov | count by algov I ran this over a short period of time and quickly got some insight into our A/B test. The results were promising: algolv _count 2 2,921 1 1,922 This meant that based on this query and data, the new algo was resulting in about 50% more +1’s on rants compared to the old algo. However, as I thought about this more, I saw an issue with the data. Like I touched on in the beginning of the article, we offer three different sorting methods with algo sort being just one of those. In the log message for a vote, I realized we weren’t logging what sort method the user was actually using. So even if the user had algov=2, they could have easily been using the recent sort instead of the algo sort, making their vote irrelevant for this test. Without the actual sort method they had used to find the rant they voted on in the log, I was in a bind to somehow make the data work. Luckily, Sumo Logic’s robust query language came to the rescue! While there’s no sort property on the vote message, we have another event we log called “Feed Load” that logs each time the feed gets loaded for a user, and includes what sort method they are using. A feed load log message is very close in format to a vote message and looks like this: event_type=’Feed Load’, user_id=’123’, sort=’algo’, insert_id=’1234’, platform=Android, algov=’2’ The important property here is sort. Using the powerful Sumo Logic join functionality, I was able to combine vote events with these feed load events in order to ensure the user who placed the vote was using algo sort. The query looked like this: ("event_type='Vote'" OR "event_type='Feed Load'") ("platform='iOS'" OR "platform='Android'") | join (parse "vote_type='*'" as vote_type, "user_id='*'" as user_id, "insert_id='*'" as insert_id, "algov='*'" as algo_ver, "post_type='*'" as post_type) as vote_query, (parse "event_type='Feed *'" as query, "sort='*'" as sort_type, "user_id='*'" as user_id) as feed_query on vote_query.user_id=feed_query.user_id timewindow 500s | WHERE vote_query_vote_type="1" AND vote_query_post_type="rant" AND feed_query_sort_type="algo" | count_distinct(vote_query_insert_id) group by vote_query_algo_ver Here’s a brief explanation of this join query: it gets all of our vote and feed load events that occurred on iOS or Android over the given time period. Then for both the vote and feed load event type, it parses out relevant variables to be used later in the query. The votes and feed loads are joined on the user id. Meanwhile, Timewindow 500s makes it so only events that occurred within about 8 minutes of each other are included. 
In our use-case this is important because someone can change their sort method at any time, so it’s important that the feed algo event occurred close to the time of the vote. We then do some more filtering (only upvotes, only votes on rants, and only votes that were done near the time of an algo sort [meaning they most likely originated from content found in that sort]). Lastly, we use count_distinct on the unique vote id to make sure each vote is only counted once (since there can be multiple feed loads around the time of a vote), and group the results by the algo version so we get a nice split of number of votes for each test group. After running the query over a decently substantial dataset, I got the following results: vote_query_algo_ver _count_distinct 2 4,513 1 2,176 Whoa!! Needless to say, this was pretty exciting. It seems to indicate that the new sort algo produced more than double the amount of upvotes compared to the old one, over the exact same time period. With this data, we are now comfortably able to say that the new algo is a very big improvement and creates a much better user experience in our app. How Sumo Logic Will Continue to Help Us Push devRant Forward Both my co-founder and I are very data-centric, so having a flexible solution like Sumo Logic on our side is great because it allows us to easily base very important decisions on the data and user behaviors we want to analyze. I hope in this article I’ve provided an example of how valuable A/B testing can be and why it’s important to have a tool that lets you query your data in a number of different ways. Though our new algorithm is performing very well, it is very possible that one of our tests in the future won’t have the same success. It’s better to use data to realize a failure quickly than to look at overall metrics in a few months and wonder why they plummeted. And for us, an unnoticed mistake might result in us becoming the subject of our own app. As we grow, we look forward to continuing to develop new features and utilizing quantifiable data to measure their success. We believe this approach gives us an opportunity, as a startup that strives for innovation, to take some risks but learn quickly whether a few feature should stay or go. About the Author David Fox is the co-founder and engineering lead of devRant. He has over 10 years of experience developing high-performance backend systems and working with a large variety of databases alongside massive datasets. You can follow him on Twitter @dfoxinator or on devRant (@dfox). How devRant uses Sumo Logic for Log Monitoring and A/B Testing is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out the Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

August 18, 2016

Blog

Announcing Sumo Logic’s Unified Logs and Metrics Solution

Are you suffering from swivel chair hell? Everyone knows that logs and metrics represent two sides of application and infrastructure “machine data”: Metrics can provide your app and infrastructure KPI’s like CPU, memory usage, latencies, SQL execution times etc. Logs provide you context into application and infrastructure execution KPI’s – errors, warnings, relevant events like configurations, etc. So it’s no wonder that integrating the two streams can provide incredible insights and help IT ensure the performance and health of their application and infrastructure. However, up until now, managing log and metrics has required disparate (or poorly integrated) solutions. The lack of common data, visualization and context creates many challenges, including: Inefficient swivel chair management High MTTI/MTTR because of poor troubleshooting context High TCO with multiple tools Limited DevOps collaboration etc. If this is your problem, your life changes today ! Say hello to Sumo Logic’s Unified Logs and Metrics solution. Sumo Logic’s Unified Logs and Metrics Solution Today, Sumo Logic, the leading cloud-native, machine data analytics service, announced the general availability of Unified Logs and Metrics, the industry’s first machine data analytics platform to natively ingest, index and analyze metrics and log data together in real-time. Powered by patented machine learning technology, Unified Logs and Metrics transforms structured and unstructured data into real-time continuous intelligence for today’s modern applications and business insights. Key capabilities of the solution include: Out of the box support for AWS, infrastructure and applications metrics: Sumo Logic offers support for Host Metrics such as CPU, memory, disk usage and AWS Cloudwatch metrics (from services like AWS Cloud Trail, Elastic Load Balancing, Amazon Kinesis, AWS Config, VPC Flow Logs and Amazon Simple Storage Service (S3)). Easily extend support for custom apps and infrastructure: The Sumo Logic platform supports the Graphite protocol support, which enables customers to easily extend collection and analysis to other app and infrastructure as well as custom metrics Powerful Real-time Analytics for Troubleshooting – through machine learning, Sumo Logic enables advanced analytics of logs data and metrics data for contextual troubleshooting and quicker root cause analysis of issues. View logs and metrics in a unified dashboard : Sumo logic enables users to view, filter and report logs and metrics in one dashboard to reduce “swivel chair” management and lower mean time to identify (MTTI) and resolve (MTTR) issues. Benefits for customers Sumo Logic customers have seen several benefits with the solution: Improved customer experience by proactive management of application health Faster performance troubleshooting with contextual analysis and dashboard of logs and metrics Lowered total cost of ownership by eliminating multiple tools for logs and metrics analytics Improved DevOps collaboration by providing a single source of truth that can be securely accessed and analyzed by IT OPs, DevOps and Dev teams. As one early customer for ULM described it: “We have many (5+) disparate tools to monitor our cloud-based modern apps and infrastructures. It takes us hours to identify and troubleshoot issues currently and ULM will cut that to minutes” Want to learn more? 
If you are interested in learning how real-world customers are using this solution, attend our upcoming webinar, featuring Sumo Logic customers who will discuss how they are leveraging the Sumo Logic ULM platform to improve application health and management. Register at the following link: https://info.sumologic.com/Sumo-Logic-Unified-Logs-Metrics-Webinar.html

August 16, 2016

Blog

The Easy 3 Step Process to Get Started with Sumo Logic Metrics Solution

Already collecting logs with Sumo Logic? The good news is that you can use the exact same Collector to start collecting your host metrics. Read on to learn how. Metrics are numeric samples of data collected over time. They can measure infrastructure, such as operating system performance or disk activity; application performance; or custom business and operational data that is coded into an organization’s applications. Metrics are an effective tool for monitoring, troubleshooting, and identifying the root causes of problems. They can help your organization: Gain end-to-end visibility into application performance. Track key performance indicators (KPIs) over time. Determine if an outage has occurred and restore service. Determine why an event occurred and how it might be prevented in the future. Sumo Logic Unified Logs and Metrics is the industry’s first machine data analytics platform to natively ingest, index and analyze metrics and log data together in real time. In these three short videos, we walk you through the process of ingesting and analyzing metrics with the Sumo Logic solution: Upgrading your existing Collector: See how easy it is to use existing Sumo Logic collectors to collect and ingest host metrics. Learning how to query your data: Analyze the metrics data to gather insights. Visualizing your data on dashboards: View and report on your metrics, and see them with logs – all in the same dashboard. Ready to get started? http://help.sumologic.com/Metrics/Metrics_Overview/Get_Started_with_Metrics

August 16, 2016

Blog

Global Load Balancing Using AWS Route 53

Blog

Improving your Security Posture with Trend Micro Deep Security Integration

Enterprises are running their workloads across complex, hybrid infrastructures, and need solutions that provide full-stack, 360-degree visibility to support rapid time to identify and resolve security threats. Trend Micro Deep Security offers seamless integration with Sumo Logic’s data analytics service to enable rich analysis, visualizations and reporting of critical security and system data. This enables an actionable, single view across all elements in an environment. I. SOLUTION COMPONENTS FOR INTEGRATION DEEP SECURITY MANAGER (DSM) This is the management component of the system and is responsible for sending rules and security settings to the Deep Security Agents. The DSM is controlled using the web-based management console. Using the console, the administrator can define security policies, manage deployed agents, query status of various managed instances, etc. The integration with Sumo Logic is done using this interface and no additional component or software is required. DEEP SECURITY AGENT (DSA) This component provides for all protection functionality. The nature of protection depends on the rules and security settings that each DSA receives from the Deep Security Manager. Additionally, the DSA sends a regular heartbeat to the DSM, and pushes event logs and other data points about the instance being protected to the DSM. SUMO LOGIC INSTALLED COLLECTORS AND SOURCES Sumo Logic Installed Collectors receive data from one or more Sources. Collectors collect raw log data, compress it, encrypt it, and send it to the Sumo Logic, in real time via HTTPS. The Deep Security Solution Components forward security events to Installed Collectors with a syslog source. SUMO LOGIC DATA ANALYTICS SERVICE AND WEB UI The Sumo Logic Web UI is browser-based and provides visibility and analysis of log data and security events sent by the Deep Security Platform to the Sumo Logic service and also provides administration tools for checking system status, managing your deployment, controlling user access and managing Collectors. SUMO LOGIC APP FOR TREND MICRO DEEP SECURITY The Sumo Logic App for Trend Micro Deep Security delivers out-of-the-box Dashboards, saved searches, and field extraction for for each security module in the Deep Security solution, including Anti-malware, Web Reputation, Intrusion Prevention, Host-based Firewall and File Integrity Monitoring. II. HOW THE DEEP SECURITY INTEGRATED SOLUTION WORKS Overview Trend Micro Deep Security Software and Deep Security as a Service integrates with Sumo Logic through the Installed Collector and Syslog Source. This Syslog Source operates like a syslog server listening on the designated port to receive syslog messages from Trend Micro Deep Security Solution. The Installed Collectors can be deployed in your environment either on a local machine, a dedicated server or in the cloud. The Deep Security platform sends system and security event logs to this server, which forwards them securely to the Sumo Logic Data Analytics Service. Figure 1 provides a high-level overview of the integration process. III. INSTALL DATA COLLECTOR Install Options The first thing to consider when you set up the integration is how to collect data from your Deep Security deployment and forward it to Sumo Logic. There are three basic methods available, local host data collection, centralized syslog data collection and hosted collector. Deep Security uses an installed centralized collector with syslog source. 
In this method, an installed Collector with Syslog Sources can be used to collect all relevant data in a centralized location before forwarding it on to Sumo Logic’s cloud-based service. Installed Collector with Syslog Sources The installation process involves the deployment of a Sumo Logic collector in your environment and then adding a Syslog Source to it. A Sumo Logic Installed Collector can be installed on any standard server and used to collect local files, remote files or to aggregate logs from network services via syslog. You can choose to install a small number of collectors to minimize maintenance or you can choose to install many Collectors on many machines to leverage existing configuration management and automation tools like Puppet or Chef. At the minimum you will need one “Installed Collector” setup for Deep Security. The number of syslog sources you need depends on the types of event logs that you are sending to Sumo logic. You will need one syslog source for each type of event. There are two types of events in Deep Security: “System Events” and “Security Events”. In the example shown below, we have configured Sumo Logic Installed Collector with two Syslog Sources using UDP protocol. In this example setup, the first syslog source is listening on UDP port 514 for System Event Log forwarding. The second syslog source below is listening on UDP port 1514 for Security modules event log forwarding. IV. INTEGRATE WITH SUMO LOGIC System Event Log Forwarding The integration of Trend Micro Deep Security for system events forwarding to Sumo Logic is done via system setting (Administration System Settings SIEM) configuration as shown below: Security Event Log Forwarding The integration of Trend Micro Deep Security for security event forwarding to Sumo Logic is done via Policy configuration and requires a Syslog Source with UDP protocol and connection information to be added to the policy. Deep Security allows Policy inheritance where child policies inherit their settings from their parent Policies. This way you can create a policy tree that begins with a top/base parent policy configured with settings and rules that will apply to all computers. When you have a single collector installed in your environment to collect logs from Deep Security it is recommended to set the integration details at the Top (root/base) policy as shown below: Additionally, you can configure individual collectors for each security protection module or have all Deep Security modules to send logs to one collector depending on your requirements. Integration Options for Security Events Logs There are two integration options available to configure Deep Security Solution to forward security events to Sumo Logic, Relay via Deep Security Manager and Direct Forward. Relay via Deep Security Manager This option sends the syslog messages from the Deep Security Manager after events are collected on heartbeats as shown below: Direct Forward from Deep Security Agents This option sends the security events/messages in real time directly from the Agents as shown below: Comparison Between the Two Integration Options When you are deciding what integration option to choose from to send security events to Sumo Logic Installed Collectors among these two integration choices, consider your deep security deployment (as a Service, AWS and Azure Marketplace AMI/VM or software), your network topology/design, your available bandwidth, and deep security policy design. 
The table below provides comparison between these two choices for easier decision process: V. ANALYZE EVENTS LOGS Once the install and integration steps are done, you are almost set to analyze Deep Security event data in Sumo Logic. Log into the Sumo Logic console, jump down to the preview tab section, and select “install” under Trend Micro – Deep Security. Once you define the _sourceCategory, you are set to run searches, identify anomalies and correlate events across your protected workloads. You can also leverage out-of-the-box, powerful dashboards to unify, enrich and visualize security related information across your entire physical, virtual and cloud infrastructure. Sumo Logic Dashboard The Sumo Logic dashboards are a powerful visualization tool to help accelerate the time to identify anomalies and indicators of compromise (IOC). The saved searches powering these dashboards can also be leverage for forensic investigations and to reduce the time it takes for root cause analysis and remediation. The uses for Dashboards are nearly endless. Perhaps your IT security group wants to keep an eye on who is installing virtual machines. You can edit, create and save the queries you run as a panel in a Dashboard, and watch for spikes over time in a line graph. Multiple graphical options/formats are supported. Dashboards bring additional assurance, knowing that unusual activity will be displayed real time in an easy-to-digest graphical format. The data that matters the most to you is even easier to track. How to Learn More on Security For additional learning on Trend Micro Deep Security, please visit their site. To watch a video from Infor’s CISO, Jim Hoover, on how to securely scale teams, manage AWS workloads and address budget challenges, please watch here. *A special thanks to Saif Chaudhry, Principle Architect at Trend Micro and Dwayne Hoover , Sr. Sales Engineering Manager at Sumo Logic for making this integration and App a reality!
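One practical tip for the setup described above: before pointing Deep Security at your collector, it can save time to confirm that the syslog source is actually listening. The sketch below pushes a single test message over UDP to the security-events source from the example configuration (port 1514); the collector hostname and the message text are placeholders for illustration, not a Deep Security payload.

```javascript
// Send one RFC 3164-style test message to the collector's UDP syslog source.
const dgram = require('dgram');

const collectorHost = 'collector.example.internal'; // placeholder: your installed collector
const collectorPort = 1514;                         // security-events source from the example above

const message = Buffer.from('<134>Aug 10 12:00:00 dsm-test syslog source check');
const client = dgram.createSocket('udp4');

client.send(message, collectorPort, collectorHost, function (err) {
  if (err) {
    console.error('Send failed:', err.message);
  } else {
    console.log('Test message sent; search for it in Sumo Logic under the syslog source category.');
  }
  client.close();
});
```

If the test line turns up in a search against that source category, the Deep Security Manager or Agents can be pointed at the same host and port with confidence.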

August 10, 2016

Blog

Visualize and Analyze Your Auth0 Users with Sumo Logic - A Tutorial

Gain better understanding of your users by visualizing and analyzing your Auth0 event logs with the Sumo Logic extension. Auth0 is a cloud-based, extensible identity provider for applications. The Sumo Logic extension for Auth0 makes it easy to analyze and visualize your Auth0 event logs and provides insight into security and operational issues. In this tutorial, we are going to install the Sumo Logic extension and explain how the dashboards we’ve created can help you quickly get a snapshot of how users are interacting with your application. To get started, you will need an Auth0 and a Sumo Logic account. Both services offer generous free tiers to get you started. Sign up for Auth0 here, and for Sumo Logic you can create an account here. You can follow the step by step tutorial below or watch our video tutorial to learn how and why combining Auth0 and Sumo Logic will be beneficial to your app. Watch the Auth0 and Sumo Logic integration video Benefits of Sumo Logic for Auth0 Before going through the process of setting up the extension, you may be asking yourself why would I even want to do this? What are the benefits? Using Auth0 as your identity provider allows you to capture a lot of data when users attempt to authenticate with your application. A lot of this data is stored in log files and easily forgotten about. Having this data visualized allows you to stay on top of what is happening in your applications. Sumo Logic makes it easy to see the latest failed logins, find and alert on error messages, create charts to visualize trends, or even do complex statistical analysis on your data. Here are some of the log types that can be collected: Logins, both successes and failures Token exchanges, both successes and failures Login failure reasons Connection errors User signup events Password changes Rate limiting events Configuring Sumo Logic to Receive Auth0 Logs To install the Sumo Logic extension, login to your Sumo Logic account and open up the Setup Wizard from the Manage top-level menu. On the next screen, you will want to select the Setup Streaming Data option. For the data type, we will select Your Custom App. Finally, select HTTP Source as the method for collecting the data logs. The last section will have you name the source category as well as select a time zone in the event one is not provided. With the configuration complete, the next screen will display the HTTP endpoint to be used for transmitting our logs. Copy the HTTP Source URL and click the Continue button to complete the setup wizard. Next, we’ll install the Sumo Logic extension from our Auth0 management dashboard. Installing the Sumo Logic Extension within Auth0 Installing the Sumo Logic extension is a fairly straightforward process. We only need the HTTP Source URL which we got when we ran through the Setup Wizard. Let’s look at the process for installing the Sumo Logic extension. Log into your Auth0 management dashboard and navigate to the Extensions tab. Scroll to find the extension title Auth0 Logs to Sumo Logic and select it. A modal dialog will open with a variety of configuration options. We can leave all the default options enabled, we’ll just need to update the SUMOLOGIC URL with the HTTP Source URL we copied earlier. Paste it here and hit save. By default, this job will run every five minutes. After five minutes have gone by, let’s check our extension and make sure that it ran properly. To do this, we can simply click into our Auth0 Logs to Sumo Logic extension and we will see the Cron job listed. 
Here, we can see when the job is scheduled to run again, the result of the last time it ran and other information. We can additionally click on the job name to see an in-depth history. Now that we have our Sumo Logic extension successfully installed and sending data, let’s go ahead and setup our dashboards in Sumo Logic so we can start making sense of the data. Installing the Auth0 Dashboards in Sumo Logic To install the Auth0 Dashboards in Sumo Logic, head over to your Sumo Logic dashboard. From here, select Library from the top level menu. Next, select the last tab titled Preview and you will see the Auth0 application at the very top. Note that at present time the Auth0 app is in Preview state, in the future it may be located in the Apps section. With the Auth0 app selected, click the Install button to configure and setup the app. Here, all you will need to select is the source category which will be the name you gave to the HTTP Source when we configured it earlier. You don’t have to remember the name as you will select the source from a dropdown list. We can leave all the other settings to their default values and just click the Install button to finish installing the app. To make sure the app is successfully installed, click on Library from your top level menu and select the tab titled Personal. You should see a new folder titled Auth0 and if you select it, you’ll see the two dashboards and all the predefined queries you can run. In the next section, we’ll take a look at the two dashboards Auth0 has created for us. Learning the Auth0 Dashboards We have created two different dashboards to better help you visualize and analyze the log data. The Overview dashboard allows you to visualize general login data while the Connections and Clients dashboard focuses primarily on showing you how and from where your users are logging in. Let’s look at deeper look into each of the dashboards. 1. Overview Dashboard The Overview dashboard provides a visual summary of login activity for your application. This dashboard is useful to quickly get a pulse on popular users, login success and fail rates, MFA usage, and the like. Login Event by Location. Performs a geo lookup operation and displays user logins based on IP address on a map of the world for the last 24 hours. Logins per Hour. Displays a line chart on a timeline showing the number of failed and successful logins per hour, over the last seven days. Top 10 Users by Successful Login. Shows a table chart with the top ten users with the most successful logins, including user name and count for the last 24 hours. Top 10 Users by Failed Login. Provides a table chart with the top ten users with the most failed logins, including user name and count for the last 24 hours. Top 10 Source IPs by Failed Login. Displays a table chart with a list of ten source IP addresses causing the most failed logins, including IP and count, for the last 24 hours. Top 10 User Agents. Displays the top ten most popular user agents in a pie chart from all connections for the last seven days. Top 10 Operating Systems. Shows the top ten most popular operating systems based on user agent in a pie chart for the last seven days. Guardian MFA Activity. Displays a line chart on a timeline showing the number of each Guardian MFA event per hour for the last seven days. 2. Connections and Clients Dashboard The Connections and Clients dashboard visualizes the logs that deal with how users are logging into your applications. 
This dashboard contains information such as countries, clients, and the number of times users log in to specific clients. Logins by Client and Country. Displays a stacked bar chart showing the number of successful logins for the last 24 hours, grouped by both client and country name. This visualizes the relative popularity of each client overall, as well as in a given country. Logins by Client per Day. Shows a stacked bar chart on a timeline showing the number of successful logins for the last seven days, grouped by client per day. This shows the popularity of each client over the past week, and the relative popularity among clients. Connection Types per Hour. Provides a line chart on a timeline of the connection types used for the past seven days. Client Version Usage. Displays a line chart on a timeline of the Auth0 library version being used by all clients for the past seven days. This is useful to detect outdated clients, as well as to track upgrades. Top 10 Clients. Shows a table chart that lists the ten most popular clients, including client name and count for the past 24 hours. Top 10 Recent Errors. Provides a table chart with a list of the ten most frequent errors, including details on client name, connection, description and count for the last 24 hours. This is useful for discovering and troubleshooting operational issues. How to Learn More For additional learning on Auth0, please visit their site. For a video on how to configure the Sumo Logic App for Auth0, please watch here
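If the dashboards stay empty after the extension's first Cron run, a quick way to isolate the problem is to post a test line directly to the HTTP Source URL you copied from the Setup Wizard; if the test line is searchable but Auth0 data is not, the issue is on the extension side. A rough sketch follows, with the endpoint URL and the test payload as placeholders.

```javascript
// POST one test log line to a Sumo Logic HTTP Source to verify ingestion.
const https = require('https');

// Placeholder: paste the unique HTTP Source URL from the Setup Wizard here.
const sourceUrl = 'https://collectors.sumologic.com/receiver/v1/http/UNIQUE_TOKEN';

const body = JSON.stringify({ test: 'auth0-extension-check', at: new Date().toISOString() });

const req = https.request(sourceUrl, { method: 'POST' }, function (res) {
  console.log('Sumo Logic responded with HTTP', res.statusCode); // 200 means the line was accepted
});
req.on('error', function (err) { console.error(err); });
req.write(body);
req.end();
```

Search on the source category you assigned in the wizard; once the test line appears, any remaining gap points at the Auth0 extension configuration rather than the Sumo Logic side.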

August 9, 2016

Blog

DevSecOps in the AWS Cloud

Security teams need to change their approach in order to be successful in the AWS Cloud. DevSecOps in the AWS Cloud is key. Sure, the controls you're using are similar, but their application is very different in a cloud environment. The same goes for how teams interact as they embrace cloud technologies and techniques. The concept of DevOps is quickly becoming DevSecOps, which is leading to strong security practices built directly into the fabric of cloud workloads. When embraced, this shift can result in a lot of positive change. Teams Level Up With security built into the fabric of a deployment, the integration of technologies will have a direct impact on your teams. Siloed teams are ineffective. The transition to the cloud (or to a cloud mindset) is a great opportunity to break those silos down. There's a hidden benefit that comes with the shift in team structure as well. Working hand-in-hand with other teams instead of in a "gatekeeper" role means that your security team is now spending more time helping the next business initiative instead of racing to put out fires all the time. Security is always better when it's not "bolted on," and embracing this approach typically means that the overall noise of false positives and lack of context is greatly reduced. The result is a security team that's no longer combing through log files 24/7 or doing other security drudge work. The shift to a DevSecOps culture lets your teams focus on the tasks they are best at. Resiliency The changes continue to pay off as your security team can now start to focus more on information security's ignored little brother: availability. Information security has three primary goals: confidentiality, integrity, and availability. The easy way to relate these goals is that security works to ensure that only the people you want (confidentiality) get the correct data (integrity) when they need it (availability). And while we spend a lot of time worrying and talking about confidentiality and integrity, we often ignore availability, typically letting other teams address this requirement. Now, with the functionality available in the AWS Cloud, we can actually use aspects of availability to increase our security. Leveraging features like Amazon SNS, AWS Lambda, and Auto Scaling, we can build automated response scenarios. This "continuous response" is one of the first steps to creating self-healing workloads. When you start to automate the security layer in an environment where everything is accessible via an API, some very exciting possibilities open up. This cloud security blog was written by Mark Nunnikhoven, Vice-President of Cloud Research at Trend Micro. Mark can be reached on LinkedIn at https://ca.linkedin.com/in/marknca or on Twitter @marknca. Learn More For additional learning on AWS, please visit these video resources: 1. AWS re:Invent 2015 | (DVO207) Defending Your Workloads Against the Next Zero-Day Attack https://www.youtube.com/watch?v=-HW_F1-fjUU A discussion of how you can increase the security and availability of your deployment in the AWS Cloud. 2. AWS re:Invent 2015 | (DVO206) How to Securely Scale Teams, Workloads, and Budgets https://www.youtube.com/watch?v=Xa5nYcCh5MU A discussion of lessons from a CISO, featuring Jim Hoover, CISO of Infor, along with Matt Yanchyshyn from AWS and Adam Boyle from Trend Micro.
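Returning to the automated response idea above: here is a rough, hypothetical sketch of what one of those building blocks could look like, written in Python with boto3. It is an illustration rather than a pattern prescribed by this post; the SNS message format, the instance_id field and the quarantine security group ID are all assumptions.

# Hypothetical sketch: an SNS-triggered AWS Lambda function that quarantines
# an EC2 instance named in a security alert. The alert's message format and
# the quarantine security group ID are assumptions for illustration.
import json
import boto3

QUARANTINE_SG = "sg-0123456789abcdef0"   # placeholder: a deny-all security group

ec2 = boto3.client("ec2")

def handler(event, context):
    # SNS delivers the alert payload as a JSON string in the Sns.Message field.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    instance_id = message["instance_id"]   # assumed field name in the alert

    # Swap the instance's security groups for the quarantine group, cutting it
    # off from the rest of the VPC while the team investigates.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[QUARANTINE_SG])
    return {"quarantined": instance_id}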

AWS

August 3, 2016

Blog

Using Bintray with Continuous Integration

Developing your company's pipeline for continuous integration is a daunting and ever-evolving effort. As software becomes more complex, a need to distribute versioned internal libraries will often arise. Tools like NuGet and Maven provide excellent platforms for distribution, but running your own server can be an unnecessary hassle. Enter Bintray. What is Bintray? Bintray bills itself as "…package hosting and download center infrastructure for automated software distribution." Perhaps a simpler way to think of it is as a way to manage and distribute your versioned products. The system provides integration with repository systems like NuGet, Maven, and Docker, making it easy to find ways of utilizing the service within your build and delivery processes. Paid subscriptions offer you additional features such as private repositories and historical statistics, while an open-source license still offers CDN-based distribution and public repositories that support formats like NuGet, NPM, YUM, and more. An Example: NuGet Library I spend a lot of time working in .NET and Visual Studio. The software I help build is extraordinarily complex and is comprised of over 100 different pieces. Not all of those pieces are proprietary, and some are very stable. We've found that in these situations, it is often helpful to repackage those components as libraries within NuGet. Here, we'll pretend we have a library called FileHelper that is both open-source and stable, used throughout our code. We use TeamCity at work, so once the project has been separated from our main solution, we'll have a build project configured for the library. One of the build steps would be a PowerShell script that does various housekeeping before packing the project into a properly formed NuGet package (see this link for instructions). Once the package is built through NuGet commands, I'll again use the NuGet executable to push my package to Bintray. The following notes should help you get through: Make sure you're using a version of NuGet that is registered with your system path on Windows. Register your Bintray repository with NuGet on your machine (before trying to run this in a build step in TeamCity) using the steps described by Bintray. After you pack your nupkg file, use the described nuget push command to actually get your version pushed to Bintray's repository. When you've finished all of this, you should see your new version (in this case, mine was the first version) listed on your package's version section. Other Bintray Features Bintray has plenty of useful features if you subscribe. For starters, there is the ability to restrict access to packages using access tokens, which appear to work as though they were download keys. You can also white-label your downloads with your own domain, and you can even determine what countries are able to access your downloads. Like FTP, but Better Taking care of binary distribution is an important step in the DevOps process, allowing a codebase to be much more flexible in how it handles dependencies and deployments. In the end, Bintray is a solution that can help alleviate several painful components of your development pipeline. Editor's Note: Using Bintray with Continuous Integration is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com.
Also, be sure to check out the Sumo Logic Developers Open Source page for free tools, APIs and example code that will enable you to monitor and troubleshoot applications from code to production. About the Author Andrew Male (@AndyM84) is a senior engineer at an enterprise development company in Boston, MA. Andrew has been programming from a young age and is entirely self-taught; he has spent time in many corners of the programming world, including game/VR work, agency work, and teaching development to students and adults alike. He spends most of his time working on architecture design and pursuing his favorite hobby—physics.
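As a follow-on to the walkthrough above: if you would rather script the upload step outside of NuGet's own tooling, Bintray also exposes a REST API for pushing content. The sketch below is a rough illustration in Python; the subject, repository and package names, the API key, and even the exact endpoint shape are assumptions based on Bintray's documented content-upload API, so verify against the current documentation before relying on it.

# Rough sketch: upload a package file to a Bintray repository over its REST API.
# The subject/repo/package names, API key and endpoint layout are assumptions.
import requests

BINTRAY_USER = "my-user"          # placeholder
BINTRAY_API_KEY = "xxxxxxxx"      # placeholder: your Bintray API key
SUBJECT, REPO, PACKAGE, VERSION = "my-org", "nuget-repo", "FileHelper", "1.0.0"
FILE_PATH = "FileHelper.1.0.0.nupkg"

url = "https://api.bintray.com/content/{0}/{1}/{2}/{3}/{4}?publish=1".format(
    SUBJECT, REPO, PACKAGE, VERSION, FILE_PATH)

with open(FILE_PATH, "rb") as package_file:
    response = requests.put(url, data=package_file,
                            auth=(BINTRAY_USER, BINTRAY_API_KEY))

response.raise_for_status()
print("Upload response:", response.text)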

July 27, 2016

Blog

CIS AWS Foundations Benchmark Monitoring with Sumo Logic

Blog

Dockerizing Apps that Scale Using Docker Compose

Blog

Dockerizing Microservices for Cloud Apps at Scale

Last week I introduced Sumo Logic Developers' Thought Leadership Series, where JFrog's Co-founder and Chief Architect, Fred Simon, came together with Sumo Logic's Chief Architect, Stefan Zier, to talk about optimizing continuous integration and delivery using advanced analytics. In Part 2 of this series, Fred and Stefan dive into Docker and Dockerizing microservices. Specifically, I asked Stefan about initiatives within Sumo Logic to Dockerize parts of its service. What I didn't realize was the scale at which these Dockerized microservices must be delivered. Sumo Logic is in the middle of Dockerizing its architecture and is doing it incrementally. As Stefan says, "We've got a 747 in mid-air and we have to be cautious as to what we do to it mid-flight." The goal in Dockerizing Sumo Logic is to gain more speed out of the deployment cycle. Stefan explains, "There's a project right now to do a broader stroke containerization of all of our microservices. We've done a lot of benchmarking of Artifactory to see what happens if a thousand machines pull images from Artifactory at once. That is the type of scale that we operate at. Some of our microservices have a thousand-plus instances of the service running, and when we do an upgrade we need to pull a thousand-plus in a reasonable amount of time – especially when we're going to do continuous deployment. You can't say 'well, we'll roll the deployment for the next three hours and then we're ready to run the code.' That's not quick enough anymore. It has to be minutes at most to get the code out there." The Sumo Logic engineering team has learned a lot in going through this process. In terms of adoption and learning curve, Stefan suggests: Developer education – Docker is a new and foreign thing and the benefits are not immediately obvious to people. Communication – Talking through why it's important, why it's going to help and how to use it. Workshops – Sumo Logic does hands-on workshops in-house to get its developers comfortable with using Docker. Culture – Build a culture around Docker. Plan for change – The tool chain is still evolving. You have to anticipate the evolution of the tools and plan for it. As a lesson learned, Stefan explains, "We've had some fun adventures on Ubuntu – in production we run automatic upgrades for all our patches, so you get security upgrades automatically. It turns out when you get an upgrade to the Docker daemon, it kills all the running containers. We had one or two instances – fortunately not in production – where all containers across the fleet went away. Eventually we traced it back to the Docker daemon, and now we're explicitly holding back Docker daemon upgrades and making them explicit upgrades so that we are in control of the timing. We can do it machine by machine instead of the whole fleet at once." JFrog on Dockerizing Microservices Fred likewise shared JFrog's experiences, pointing out that JFrog's customers asked early on for Docker support, so JFrog has been in it from the early days of Docker. Artifactory has supported Docker images for more than two years. To Stefan's point, Fred says, "We had to evolve with Docker. So we Dockerized our pure SaaS [product] Bintray, which is a distribution hub for all the packages around the world. It's highly distributed across all the continents, CDN enabled, [utilizes a] MongoDB cluster, CouchDB, and all of this problematic distributed software. Today Bintray is fully Dockerized.
We use Kubernetes for orchestration." One of the win-wins for JFrog developers is that the components the developer is "not" working on are delivered via Docker, the exact same containers that will run in production, on their own local workstation. "We use Vagrant to run Docker inside a VM with all the images so the developer can connect to microservices exactly the same way. So the developer has the immediate benefit that he doesn't have to configure and install components developed by the other team." Fred also mentioned that Xray, which was just released, is fully Dockerized. Xray analyzes any kind of package within Artifactory, including Docker images, Debian, RPM, zip, jar and war files, and analyzes what it contains. "That's one of the things with Docker images, it's getting hard to know what's inside it. Xray is based on 12 microservices and we needed a way to put their software in the hands of our customers, because Artifactory is both SaaS and on-prem, we do both. So JFrog does fully Docker and Docker Compose delivery. So developers can get the first image and all images from Bintray." "The big question to the community at large," Fred says, "is how do you deliver microservices software to your end customer? There is still some work to be done here." More Docker Adventures – TL;DR Adventures is a way of saying, we went on this journey, not everything went as planned, and here's what we learned from our experience. If you've read this far, I've provided a good summary of the first 10 minutes, so you can jump there to learn more. Each of the topics is marked by a slide so you can quickly jump to a topic of interest. Those include: Promoting containers. Why it's important to promote your containers at each stage in the delivery cycle rather than retag and rebuild. Docker Shortcuts. How Sumo Logic is implementing Docker incrementally and taking a hybrid approach versus doing pure Docker. Adventures Dockerizing Cassandra. Evolving Conventions for Docker Distribution. New Shifts in Microservices What are the new shifts in microservices? In the final segment of this series, Fred and Stefan dive into microservices and how they put pressure on your developers to create clean APIs. Stay tuned for more adventures building, running and deploying microservices in the cloud.

Blog

Automated Infrastructure Problem Discovery using Sumo Logic and Chef

The Chef-Sumo integration, which can be found on the Sumo Logic Developers open source page, is a way to inch towards something like automated "infrastructure problem discovery." By "problem", I mean services not working as intended. That could be a recurring incident, a persistent performance problem, or a security threat. Gathering useful statistics on the interaction between your code and your "Infrastructure-as-Code" will make it easier to discover these problems without intense rote manual querying. For example, a basic problem that you might discover is that something is in the logs where nothing should be. This could be a one-off security problem, or a problem across the entire service. Monitoring and alerting from kernel to user will tell you quickly. The Chef-Sumo combination is a quick and scalable way to build this solution. Chef's verbose Infrastructure-as-Code is powerful, allowing for service description and discovery, e.g. automated AWS discovery and deployment. Sumo pares down Chef's verbose output into dashboardable, queryable SaaS, and correlates it with other service logs, simultaneously widening coverage and narrowing focus. To Chef, Sumo is yet another agent to provision; rollout is no more complicated than anything else in a cookbook. To Sumo, Chef is yet another log stream; once provisioned, the Chef server is parsed into sources and registered in Sumo. Types of Problems This focus is critical. Since storage is cheap and logging services want lock-in, the instinct in DevOps is to hoard information. Too often, teams suffer from a cargo cult mentality where the data's "bigness" is all that matters. In practice, this usually means collecting TBs of data that are unorganized, poorly described and not directed towards problem-solving. It's much easier to find needles in haystacks with magnets. With infrastructure logs, that means finding literal anomalies, like an unknown user with privileged access. Or sometimes it means finding pattern mismatches or deviations from known benchmarks, like a flood of pings from a proxy. Problem Solving on Rails Sumo has two out-of-the-box query tools that can make the problem-solving process simpler: Outlier and Anomaly. These are part of Sumo's "log reduce" family. Outlier tracks the moving average and standard deviation of a value, allowing for alerts and reports when the difference between the value and the mean exceeds some multiple of the standard deviation. Here's an example query for a simple AWS alert: | source=AWS_inFooTown | parse "* * *: * * * * * * * * \"* *://*:*/* HTTP/" as server, port, backend | timeslice by 1m | avg(server) as OKserver, avg(port) as OKport, avg(backend) as OKbackend by _timeslice | (OKserver+OKport+OKbackend) as total_time_OK | fields _timeslice, total_time_OK | outlier total_time_OK In other search tools, this would require indexing and forwarding the sources, setting up stdev searches in separate summary indexes, and collecting them on a manually written average. Not only does that take a lot of time and effort, it requires knowing where to look. While you will still need to parse each service into your own simple language, not having to learn where to deploy this on every new cookbook is a huge time-saver. Anomaly is also a huge time-saver, and comes with some pre-built templates for RED/YELLOW/GREEN problems. It detects literal anomalies based on some machine learning logic. Check here to learn more about the logic's internals.
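For readers who want a feel for what an outlier-style check is doing under the hood, here is a small, hypothetical Python sketch of the same idea: flag any point that sits more than a chosen multiple of the trailing standard deviation away from the trailing moving average. It is not Sumo Logic's implementation, just the underlying statistics, and the sample data and window size are made up.

# Illustration only: the moving-average / standard-deviation idea behind an
# outlier check. Not Sumo Logic's implementation -- just the statistics, in pandas.
import pandas as pd

def flag_outliers(values, window=5, threshold=3.0):
    series = pd.Series(values)
    # Baseline is the *previous* window, so the point being tested does not
    # distort its own baseline.
    trailing_mean = series.rolling(window).mean().shift(1)
    trailing_std = series.rolling(window).std().shift(1)
    return (series - trailing_mean).abs() > threshold * trailing_std

response_times = [102, 98, 101, 99, 103, 100, 97, 102, 99, 101, 350, 100]
print(flag_outliers(response_times).tolist())   # only the 350 spike is flagged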
How to Look before You Leap While it's all hyperloops and SaaS in theory, no configuration management and monitoring rollout is all that simple, especially when the question is "what should monitor what" and the rollout is of a Chef-provisioning-Sumo-monitoring-Chef process. For example, sometimes the "wrong" source is monitored when Chef is provisioning applications that each consume multiple sources. The simplest way to avoid the confusion at the source is to avoid arrays completely when defining Sumo sources. Stick with hashes for all sources, and Chef will merge based on the hash-defined rules. Read CodeRanger's excellent explanation of this fix here. This is a pretty tedious solution, however, and the good folks at Chef and Sumo have come up with something that's a lot more elegant: custom resources in Chef, with directives in the JSON configuration. This avoids source-by-source editing, and is in line with Sumo's JSON standards. To get started with this approach, take a look at the custom resources debate on GitHub, and read the source for Kennon Kwok's cookbook for Sumo collectors. Editor's Note: Automated Infrastructure Problem Discovery using Sumo Logic and Chef is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out the Sumo Logic Developers Open Source page for free tools, APIs and example code that will enable you to monitor and troubleshoot applications from code to production. About the Author Alex Entrekin served on the executive staff of Cloudshare where he was primarily responsible for advanced analytics and monitoring systems. His work extending Splunk into actionable user profiling was featured at VMworld: "How a Cloud Computing Provider Reached the Holy Grail of Visibility." Alex is currently an attorney, researcher and writer based in Santa Barbara, CA. He holds a J.D. from the UCLA School of Law. Resources: LearnChef on Custom Resources CodeRanger on Solving Attribute Merging Problems in Chef with Hashes Gist of Custom Chef Resources in Sumo by Ken Kwok More Ken Kwok on Custom Resources in Chef & Sumo Using Multiple JSON files to Configure Sumo Sources

Blog

CI/CD, Docker and Microservices - by JFrog and Sumo Logic’s Top Developers

Blog

Sumo Dojo Winners - Using Docker, FluentD and Sumo Logic for Deployment Automation

Recently, Sumo Dojo ran a contest in the community to see who is analyzing Docker logs with Sumo Logic, and how. The contest ran through the month of June and was presented at DockerCon. Last week, the Sumo Dojo selected the winner, Brandon Milsom, from the Australia-based company Fugro Roames. Roames uses remote sensing laser (LIDAR) technology to create interactive 3D asset models for powerline networks for energy companies in Australia and the United Kingdom. As Brandon writes: "We use Docker and Sumo Logic as a part of our deployment automation. We use Ansible scripts to automatically deploy our developers' applications onto Amazon EC2 instances inside Docker containers as part of our cloud infrastructure. These applications are automatically configured to send tagged logs to Sumo Logic using Fluentd, which our developers use to identify their running instances for debugging and troubleshooting. Not only are the application logs sent directly to Sumo Logic, but the Docker container logs are also configured using Docker's built-in Fluentd logging driver. This forwards logs to another Docker container on the same host running a Fluentd server, which then seamlessly ships logs over to Sumo Logic. The result is that developers can easily access their application logs and the logs of the container OS their app is running in, just by opening a browser tab. Part of our development has also been trialling drones for asset inspection, and we also have a few drone fanatics in our office. Winning a drone would also be beneficial as it would give us something to shoot at with our Nerf guns, improving office morale." Brandon's coworker, Adrian Howchin, also wrote in, saying: "I think one of the best things that we've gained from this setup is that it allows us to keep users from connecting (SSH) in to our instances. Given our CD setup, we don't want users connecting in to hosts where their applications are deployed (it's bad practice). However, we had no answer to the question of how they get their application/OS logs. Thanks to Sumo Logic (and the Docker logging driver!), we're able to get these logs out to a centralized location, and keep the users out of the instances." Congratulations to Brandon and the team at Fugro Roames. Now you have something cool to shoot at.

July 12, 2016

Blog

Key Takeaways from MongoDB World 2016

Blog

Managing Containers with Docker Shipyard

Blog

Sumo Logic Now Monitors and Optimizes MongoDB Deployments

Blog

New York State of Mind: Architecting the Cloud

Blog

Role-Based Access and Containers as a Service

Blog

Top 5 Questions From DockerCon 2016

Developers, IT Ops engineers and enterprise professionals converged on DockerCon 2016 with a vengeance, and Sumo Logic was on hand to show them how the Sumo Logic platform gives them visibility into their Docker ecosystems. We also released a new eBook for practitioners, Docker – From Code to Container, that provides best practices for building, testing and deploying containers and includes hands-on exercises with Docker Compose. The Sumo Logic Community also announced a chance to win a DJI Phantom 3 Drone. The contest ends June 30, so there's still time. With announcements of new features like Containers as a Service and tools like Docker Universal Control Plane (UCP), Docker is taking the deployment of microservices via containers to a whole new level. UCP offers automated container scanning and the ability to run signed binaries. With a primarily DevOps crowd that had a heavy bent toward the developer, there was a lot of interest in Docker logging, monitoring and analytics, and we received a lot of questions about the internals of the Sumo Logic approach to collecting logs. In fact, the #1 question I got was how we implemented the container, so I thought I'd answer that and other questions here. How Does Sumo Logic Operate in a Docker Ecosystem? Sumo Logic uses a container to collect and ship data from Docker. The image itself contains a collector and a script source. You can grab the image from DockerHub by just running a Docker pull. docker pull sumologic/appcollector:latest Before you run the container, you'll need to create an access key in Sumo Logic (see documentation for details). Then run the container using the Access ID and Access Key that you created previously. docker run -d -v /var/run/docker.sock:/var/run/docker.sock --name="sumologic-docker" sumologic/appcollector:latest The container creates a collector in your Sumo Logic account, and establishes two sources: Docker Logs and Docker Stats. That's it. Once the image is installed and configured locally, you simply select the App for Docker from the Library in Sumo Logic, bring up one of the dashboards and watch data begin to populate. If you'd like to try it out yourself and don't already have an account, sign up for Sumo Logic Free. Can You Monitor More than One Docker Host? Another question I got was whether you could monitor more than one host. Apparently not all monitoring tools let you do this. The answer is, you can. As you can see in this Overview Dashboard, there are two Docker Hosts in this example. The Sumo Logic collector image typically runs on the same host as the Docker host. You can collect data from multiple hosts by installing an image on each host. Note, however, that you can only run one instance at a time. A better approach is to run the Sumo Logic Collector on one host, and have containers on all other hosts log to it by setting the syslog address accordingly when running the container. Our CTO, Christian Beedgen, explains more in this post on Logging Drivers. What kind of data do you capture and what analytics do you provide? To get real value from machine-generated data, Sumo Logic takes a comprehensive approach to monitoring Docker. There are five requirements to enable comprehensive monitoring: events, configurations, logs, stats, and host and daemon logs. For events, you can send each event as a JSON message, which means you can use JSON as a way of logging each event.
The Sumo Logic collector enumerates all running containers, then starts listening to the event stream, collecting each running container and each start event. See my post on Comprehensive Monitoring in Docker for more detail. We call the inspect API to get configurations and send that in JSON. For logs, we call the logs API to open a stream and send each log. Now you have a record of all the configurations together with your logs, making it easy to search for them when troubleshooting. For statistics, we call the stats API to open a stream for each running container and each start event, and send each received JSON message as a log. For host and daemon logs, you can include a collector into host images or run a collector as a container. Do you have any Free Stuff? No conference would be complete without a new backpack stuffed with hoodies, T-shirts and maybe a Red Hat (thanks, guys!). But I also believe in adding value by educating developers and ops. So, I've put together an eBook, Docker – From Code to Container, that I hope you'll find interesting. Docker – From Code to Container explains how containerization enables Continuous Integration and Continuous Delivery processes, shows how you can take Docker to production with confidence, walks you through the process of building applications with Docker Compose, and presents a comprehensive model for monitoring both your application stack and your Docker ecosystem. Ultimately, you will learn how containers enable DevOps teams to build, run and secure their Dockerized applications. In this Webinar you will learn: How Docker enables continuous integration and delivery Best practices for delivering Docker containers to production How to build Applications with Docker Compose Best practices for securing Docker containers How to gauge the health of your Docker ecosystem using analytics A comprehensive approach to monitoring and logging What's Next I'm glad you asked. We're featuring a Live Webinar with Jason Bloomberg, president of Intelyx, and Kalyan Ramanathan, VP of Marketing for Sumo Logic, to dive deeper into the use cases for Docker monitoring. The webinar is July 20 at 10:00 am PDT. Be there or be square!
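To illustrate the collection model described above (enumerate the running containers, read a stats sample for each, and follow the event stream), here is a rough sketch using the Docker SDK for Python. It is not the Sumo Logic collector's actual implementation, just the same idea in a few lines; it assumes the docker Python package is installed and a local Docker daemon is reachable.

# Sketch of the collection model: list containers, read one stats sample each,
# then follow the daemon's event stream. Not the Sumo Logic collector itself --
# just an illustration using the Docker SDK for Python.
import json
import docker   # assumes the `docker` package (Docker SDK for Python) is installed

client = docker.from_env()

# One stats snapshot per running container (a real collector would keep streaming).
for container in client.containers.list():
    stats = container.stats(stream=False)
    print(container.name, json.dumps(stats.get("memory_stats", {}))[:120])

# Follow container events (start, stop, die, ...) as they happen.
for event in client.events(decode=True):
    if event.get("Type") == "container":
        name = event.get("Actor", {}).get("Attributes", {}).get("name")
        print(event.get("Action"), name)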

June 27, 2016

Blog

Confessions of a Sumo Dashboard Addict

When you think of Ulta Beauty, you most likely think of makeovers, not technology. But in fact, we've also been doing makeovers of our technology to bring together all the technical groups that touch our guests (application development, operations, e-commerce, plus our off-shore teams) under one organization to drive the best possible guest experience. For those of you who aren't familiar with Ulta Beauty, we're the fastest-growing beauty retailer, with both a brick-and-mortar and an online presence. Back in 2014, we experienced some challenges with online guest order fulfillment during the holiday season (our busiest time of the year). Our business partners simply lacked the visibility into inventory levels during peak season. We identified the problem in advance, but due to time constraints weren't able to resolve it, so we took a novel approach, using machine data analytics for better visibility. We knew we needed a solution to streamline operations and proactively identify product trends. We selected Sumo Logic to help us get a real-time view of our order throughput so that we could manually bring down order levels if throughput went too high. In my role as VP of Guest-facing Services, I rely on deep technical knowledge and business sense to make sure our applications are running smoothly. Sumo Logic was easy enough for us to manage on our own. It's flexible and simple but powerful, which enables me to ensure my business stakeholders are empowered with the information they need to be successful. Fast forward to holiday season 2015. We not only improved our backend systems but also expanded and operationalized Sumo Logic with our DevOps, App Dev and Business Partner teams. We created multiple dashboards and reports to identify hot-selling items, what's trending, inventory levels and more. This was huge for our business partners, who could then consciously make inventory business decisions on their own. The biggest impact of rolling out Sumo Logic has been the ability to impact the guest experience in a positive way and effectively manage the channel. I confess that I became a bit of a Sumo Logic dashboard addict. During the holiday season, if I was out and about, I would check the mobile view so frequently that I blew my cellular data plan. What's next for Ulta and Sumo Logic? We're expanding our use of Sumo and validating new business use cases for point-of-sale, network infrastructure and warehouse management systems. With Sumo's assistance, we're creating an enterprise application performance management roadmap that incorporates Sumo Logic's machine data analytics to ensure the maximum reliability and stability of our business-critical systems. Now that's a beautiful makeover!

Blog

Docker Security - 6 Ways to Secure Your Docker Containers

Blog

LXC and LXD: Explaining Linux Containers

Blog

How to Build Applications with Docker Compose

Blog

JFrog Artifactory Users Gain Real-time Continuous Intelligence with New Partnership

May 26, 2016

Blog

Delivering Analytics Behind the Analytics at Dodge Data & Analytics

If you're not a builder or architect, you may not be familiar with Dodge Data & Analytics. We help building product manufacturers, general contractors and subcontractors, architects and engineers to size markets, prioritize prospects, strengthen market positions and optimize sales strategies. Simply put, we build the analytics engine for the builder community. In our industry, it's important that we deliver a consistent level of operational availability in order to best serve our clients. We didn't have a solution in place for machine data analytics and needed a way to make better use of our logs and time-series metrics data to quickly surface and remediate known and unknown issues. Sumo Logic was a great choice for helping us monitor and manage our system's operational metrics based on ease of deployment, maintenance and support, and powerful search queries. Sumo Logic's machine data analytics platform allows our teams to accurately correlate meaningful information to provide root cause analysis and operational behavior patterns. Literally, with one click of a button, we have access to our data, giving us better real-time insights into our own infrastructure, so we can in turn serve up better insights for our customers. Sumo Logic calls this continuous intelligence, and it's our approach as well. Tighter organizational collaboration is also important to us. Sumo Logic is helping to bring together various teams within Dodge, including our IT operations team, DevOps, DBAs and incident managers. It provides a single version of the truth for monitoring and troubleshooting so that we work better together and solve problems faster. And isn't that what it's all about? About the bloggers: Doug Sheley is Director of Infrastructure Systems. Jay Yeras is a system administrator at Dodge Data & Analytics, unofficially known as "Doer of Things".

Blog

SIEM vs. Security Analytics Checklist

Blog

SIEM: Crash and Burn or Evolution? You Decide.

SIEM History: Is SIEM Dead? Oftentimes when I am presenting at conferences around the country, people will ask me, "Is SIEM dead?" Such a great question! Has the technology reached its end of life? Has SIEM really crashed and burned? I think the answer to that question is no. SIEM is not dead; it has just evolved. History of SIEM SIEMs unfortunately have struggled to keep pace with the security needs of modern enterprises, especially as the volume, variety and velocity of data has grown. SIEMs have also struggled to keep pace with the sophistication of modern-day threats. Malware 15 years ago was static and predictable. But today's threats are stealthy and polymorphic. Furthermore, the reality is that few enterprises have the resources to dedicate to the upkeep of SIEM, and the use of SIEM technology to address threat management has become less effective and waned. Gartner analyst Oliver Rochford famously wrote, "Implementing SIEMs continues to be fraught with difficulties, with failed and stalled deployments common." (1) In Greek mythology, a phoenix (Greek: φοῖνιξ phoinix; Latin: phoenix, phœnix, fenix) is a long-lived bird that is cyclically regenerated or reborn. Associated with the sun, a phoenix obtains new life by arising from the ashes of its predecessor. Phoenix Rising from the SIEM Ashes The SIEM ashes are omnipresent, and security analytics is emerging as the primary system for detection and response. Deconstructing SIEM Although we use the term SIEM to describe this market, SIEM is really made up of two distinct areas. Security Information Management (SIM) deals with the storage, analysis and reporting of log data. SIM ingests data from host systems, applications, and network and security devices. Security Event Management (SEM), on the other hand, processes event data from security devices, network devices, systems and applications in real time. This deals with the monitoring, correlating and notification of security events that are generated across the IT infrastructure and application stack. Folks generally do not distinguish between these two areas anymore and just use "SIEM" to describe the market category. However, it's important to take note of what you are trying to accomplish and which problems you are trying to solve with these solutions. Why Do We Care About SIEM? One could easily dismiss these solutions outright, but the security market is huge – $21.4B in 2014 according to our friends at Gartner. And the SIEM piece alone reached $1.6B last year. According to 451 Research, the security market has around 1,500-1,800 vendors broken down into a number of main categories across IAM, EPP, SIEM, SMG, SWG, DLP, Encryption, Cloud Security, etc. And within each of these main categories, there are numerous sub-categories. Security Landscape And despite the billions of dollars invested, current security and SIEM solutions are struggling to keep the bad guys out. Whether cyber criminals, corporate spies, or others, these bad actors are getting through. The Executive Chairman and former CEO of Cisco Systems famously said, "There are two types of companies, those who have been hacked and those who have no clue." Consider for a moment that the median number of days before a breach is detected exceeds 6½ months, and that the percentage of victims notified by external third parties is almost 70% (3). People indeed have no clue! Something different is clearly needed. This is the first in a series of blogs on SIEM and Security Analytics.
Stay tuned next week for our second blog titled “SIEM and Security Analytics: Head to Head.” Additional SIEM Resources Find out how Sumo Logic helps deliver advanced security analytics without the pain of SIEM Sign up for a free trial of Sumo Logic. It’s quick and easy. Within just a few clicks you can configure streaming data, and start gaining security insights into your data in seconds. Mark Bloom runs Product Marketing for Compliance & Security at Sumo Logic. You can reach him on LinkedIn or on Twitter @bloom_mark Sources (1) Gartner: Overcoming Common Causes for SIEM Deployment Failures by Oliver Rochford 21Aug2014 (2) Forrester: Evolution of SIEM graph, taken from Security Analytics is the Cornerstone of Modern Detection and Response, December 2015 (3) Mandiant mTrends Reports

Blog

Distributed Processing with Azure Functions

As distributed computing becomes more and more available, it's good to evaluate the different options accessible to you and your organization. At Build 2016, Microsoft announced the upcoming availability of Azure Functions, an event-driven, on-demand compute system similar to AWS Lambda Functions. Azure Functions are still a new feature in Azure, so there are bound to be growing pains while they mature. With that in mind, let's dig deeper! Building a Twitter Aggregator with Azure Functions I've always lived by the mantra that learning is best done by doing. Imagine that we are building a service that generates profile information for users by aggregating their content from different websites/services. We can create a Twitter aggregator as an Azure Function. This Function (which would be part of a group of Functions for aggregating content on-demand) will accept two arguments from a web request: ID and Handle. The ID will represent a mythical unique identifier created by our system elsewhere. The Handle will be the supplied Twitter handle/screen-name of the user who has requested a profile be generated. Once these arguments are supplied, the Function should connect to Twitter and pull down the five most recent tweets for use in the profile. Creating a new Function is incredibly simple. Head over to portal.azure.com/, and search for "Function App" in the marketplace to add a new resource. It is listed underneath "Web + Mobile." Alternatively, you can go to functions.azure.com/ and get a more streamlined creation process. Either way, you are presented with the ability to name your Function and choose things like its physical location and subscription. After that, you are taken straight to an editor where you can begin configuring and coding the Function. I'll skip through the boring parts and get to the code because there's a bit of effort involved due to working with Twitter. Here's a link to the completed Function: I won't go through every line of code, but there are important elements to discuss. Line #1 is #r "Newtonsoft.Json", which is a way of referencing assemblies. You can provide new assemblies by taking advantage of the Continuous Integration options and tying your Function to a source control repository with a bin directory that contains pre-compiled DLLs. For more information, see the documentation here. You'll notice that I'm not creating any new classes within my Function. This is not a requirement, as this is simply a C# "script." You are free to create classes, but as of the time of this writing, you still cannot create namespaces for organizing your classes. (This can still be achieved by building libraries that your script uses, as referenced in #1). I am making use of the HttpRequestMessage.CreateResponse() method for my Function's responses, but as I'll discuss in a moment, it was the "quick and dirty" approach to building a new Function. There are other options that may fit your use cases better. Feel free to explore. I opted to not use any pre-built Twitter libraries for simplicity. I don't generally recommend this, but it also isn't terribly complicated for app-only usage, so make a decision based on your use case. Possibilities Everywhere Although I chose to build my Function in C#, Microsoft provides numerous language options, including C#, F#, PowerShell, Batch, Bash, JavaScript, PHP, and Python. There's also a wide range of options for input and output. I set up my Function as a generic webhook, meaning it provides me with an API endpoint to call for triggering the Function.
If that's not your cup of tea, though, maybe one of these satisfies: Storage Blobs (trigger whenever a file is updated/added to an Azure Storage Container), Event Hubs (trigger whenever an event is received from an Azure Event Hub), GitHub WebHook (trigger whenever a GitHub webhook request is received), HttpTrigger (trigger based off of HTTP requests), QueueTrigger (trigger whenever a message is added to an Azure Storage Queue), ServiceBusQueueTrigger (trigger whenever a message is added to a ServiceBus instance), or TimerTrigger (trigger on a schedule, similar to scheduled tasks or cron jobs). There is also a full set of options included to direct request output. For my Function, I simply assumed that returning the JSON as a web response was reasonable, but there are likely much better/scalable ways of directing the output into our profiles. See the full documentation for more information. In Closing While Sumo Logic doesn't yet provide an integration for Azure Functions, you can get visibility into other Microsoft technologies using tools like the Sumo Logic Application for Office 365 (currently in Preview), which offers pre-built dashboards, searches and reports that visually highlight key activity across your O365 environment, including OneDrive, Exchange Online, SharePoint, and Azure Active Directory. Check out other apps like the App for Windows Directory at sumologic.com. Hopefully this has been a simple introduction to the concept of Azure Functions. The sheer number of things that can be accomplished with them is daunting, but they are fantastic for the world of distributed computing. The documentation and community for Functions are already growing, so don't be afraid to take them for a spin. Resources Complete Source for Example About the Author Andrew Male (@AndyM84) is a senior engineer at an enterprise development company in Boston, MA. Andrew has been programming from a young age and is entirely self-taught. He has spent time in many corners of the programming world, including game/VR work, agency work, and teaching development to students and adults alike. He spends most of his time working on architecture design and pursuing his favorite hobby—physics.
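For readers who do not work in C#, the same app-only Twitter flow the Function relies on can be sketched in a few lines of Python. This is illustration only, not the completed Function linked above: it assumes Twitter's v1.1 REST API (an app-only bearer token plus the user_timeline endpoint), which has since been deprecated, and the consumer key and secret shown are placeholders.

# Illustration of the app-only Twitter flow used by the aggregator Function.
# Assumes Twitter's (now deprecated) v1.1 API; keys below are placeholders.
import requests

CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"

# Exchange the consumer credentials for an app-only bearer token.
token_resp = requests.post(
    "https://api.twitter.com/oauth2/token",
    auth=(CONSUMER_KEY, CONSUMER_SECRET),
    data={"grant_type": "client_credentials"},
)
bearer = token_resp.json()["access_token"]

# Pull the five most recent tweets for the requested handle.
timeline = requests.get(
    "https://api.twitter.com/1.1/statuses/user_timeline.json",
    params={"screen_name": "sumologic", "count": 5},
    headers={"Authorization": "Bearer " + bearer},
).json()

for tweet in timeline:
    print(tweet["created_at"], tweet["text"][:80])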

Blog

Container Orchestration with Mesos Marathon

Blog

Sumo Logic CTO Christian Beedgen Talks Unified Logs and Metrics

Blog

Sumo Logic Combines Logs, Metric Data in Analytics Service

If you are looking for application logging solutions, the "direct-to-cloud" approach is usually your best option. This approach, however, is not necessarily a panacea for all of your logging and monitoring needs. Depending on your business requirements around logging and monitoring, this initial approach may be all you ever need. Or, you may find out later that your needs have changed, and you need to modify your approach to increase reliability, or to capture full stack logs. Sumo Logic is committed to making it easy to capture all of your application data. If you are running on AWS, your AWS-specific logs from CloudTrail, S3, Elastic Load Balancing and many more will most likely end up in Amazon's CloudWatch Logs. The solution Amazon provides for distributing these logs for analysis is Amazon Kinesis. With help from the Sumo community, Kinesis is also getting the Sumo treatment. Developers already using Kinesis to stream their logs can now also stream to Sumo through the Kinesis connector. If you are not yet using Kinesis and are instead using something like S3 to store your AWS logs, consider making the switch. Storing your logs in a file system like S3 increases the time it takes for other applications to collect, use, and analyze those logs. And let's face it, as long as it remains un-analyzed, that treasure of data is basically a useless waste of storage space. Instead, Kinesis treats your logs as a continuous data stream, which allows near real-time analysis instead of forcing you to wait to see your data. Logs from the Kinesis stream are transformed into JSON by the new Sumo connector, and all fields can be conveniently extracted by Sumo's JSON auto-parser. The community project can be found here on GitHub. To get the most Sumo functionality out of your technology stack, visit our GitHub page to see all open-source projects: ready and in-progress. If you would like to see more Sumo support for your technologies, come join the community and help us out!
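If you are weighing the switch from S3 drops to a Kinesis stream, the producer side is small. Below is a rough Python sketch (boto3) of writing an application log event into a stream; the stream name and record fields are placeholders, and the Sumo Kinesis connector on the consuming side is configured separately, as described in the GitHub project.

# Rough sketch: writing application log events into a Kinesis stream with boto3.
# The stream name and record fields are placeholders for illustration.
import json
import boto3

kinesis = boto3.client("kinesis")

def ship_log(event):
    kinesis.put_record(
        StreamName="app-logs",                 # placeholder stream name
        Data=(json.dumps(event) + "\n").encode("utf-8"),
        PartitionKey=event.get("host", "unknown"),
    )

ship_log({"host": "web-01", "level": "ERROR", "message": "payment timeout"})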

April 20, 2016


Blog

Correlating Logs and Metrics

This week, CEO Ramin Sayar offered insights into Sumo Logic's Unified Logs and Metrics announcement, noting that Sumo Logic is now the first and foremost cloud-native, machine data analytics SaaS to handle log data and time-series metrics together. Beginning this week, Sumo Logic is providing "early access" to customers that are using either Amazon CloudWatch or Graphite to gather metrics. That's good news for practitioners from developers to DevOps and release managers, because, as Ben Newton explains in his blog post, you'll now be able to view both logs and metrics data together and in context. For example, when troubleshooting an application issue, developers can start with log data to narrow a problem to a specific instance, then overlay metrics to build screens that show both logs and metrics (like CPU utilization over time) in the context of the problem. What Are You Measuring? Sumo Logic already provides log analytics at three levels: system (or machine), network, and application. Unified Logs & Metrics also extends the reporting of time-series data to these three levels. So using Sumo Logic, you'll now be able to focus on application performance metrics, infrastructure metrics, custom metrics and log events. Custom Application Metrics Of the three, application metrics can be the most challenging because as your application changes, so do the metrics you need to see. Often you don't know what you will be measuring until you encounter the problem. APM tools provide byte-code instrumentation where they load code into the JVM. That can be helpful, but results are restricted to what the APM tool is designed or configured to report on. Moreover, instrumenting code using APM tools can be expensive. So developers, who know their code better than any tool, often resort to creating their own custom metrics to get the information needed to track and troubleshoot specific application behavior. That was the motivation behind an open-source tool called StatsD. StatsD allows you to create new metrics in Graphite just by sending it data for that metric. That means there's no management overhead for engineers to start tracking something new: simply give StatsD a data point you want to track and Graphite will create the metric. Graphite itself has become a foundational monitoring tool, and because many of our customers already use it, Sumo Logic felt it important to support it. Graphite, which is written in Python and open-sourced under the Apache 2.0 license, collects, stores and displays time-series data in real time. Graphite is fairly complex, but the short story is that it's good at graphing a lot of different things, like dozens of performance metrics from thousands of servers. So typically you write an application that collects numeric time-series data and sends it to Graphite's processing backend (Carbon), which stores the data in a Graphite database. The Carbon process listens for incoming data but does not send any response back to the client. Client applications typically publish metrics using plaintext, but can also use the pickle protocol or the Advanced Message Queueing Protocol (AMQP). The data can then be visualized through a web interface like Grafana. But as previously mentioned, your custom application can simply send data points to a StatsD server. Under the hood, StatsD is a simple Node.js daemon that listens for messages on a UDP port, then parses the messages, extracts the metrics data, and periodically (every 10 seconds) flushes the data to Graphite.
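Because StatsD's wire protocol is just plaintext over UDP, emitting a custom metric takes only a few lines. Here is a minimal sketch in Python that sends a counter and a timer to a local StatsD daemon using the standard metric_name:value|type format; the metric names are made up for illustration.

# Minimal sketch: emit custom metrics to a local StatsD daemon over UDP.
# StatsD's plaintext format is "<name>:<value>|<type>" (c = counter, ms = timer).
# Metric names here are made up for illustration.
import socket

STATSD_ADDR = ("127.0.0.1", 8125)   # StatsD's default UDP port
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def incr(name, value=1):
    sock.sendto("{0}:{1}|c".format(name, value).encode(), STATSD_ADDR)

def timing(name, millis):
    sock.sendto("{0}:{1}|ms".format(name, millis).encode(), STATSD_ADDR)

incr("checkout.orders")           # count an order
timing("homepage.render_ms", 42)  # record a render time in milliseconds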
Sumo Logic's Unified Logs and Metrics Getting metrics into Sumo Logic is super easy. With StatsD and Graphite, you have two options. You can point your StatsD server to a Sumo Logic hosted collector, or you can install a native collector within the application environment. CloudWatch CloudWatch is Amazon's service for monitoring applications running on AWS and system resources. CloudWatch tracks metrics (data expressed over a period of time) and monitors log files for EC2 instances and other AWS resources like EBS volumes, ELB, DynamoDB tables, and so on. For EC2 instances, you can collect metrics on things like CPU utilization, then apply dimensions to filter by instance ID, instance type, or image ID. Pricing for AWS CloudWatch is based on data points. A data point (DP) covers five minutes of activity (specifically, the previous five minutes), while a detailed data point (DDP) covers one minute. Unified Logs and Metrics dashboards allow you to view metrics by category, and are grouped first by namespace, and then by the various dimension combinations within each namespace. One very cool feature is you can search for meta tags across EC2 instances. Sumo Logic makes the call once to retrieve meta tags and caches them. That means you no longer have to make an API call to retrieve each meta tag, which can result in cost savings since AWS charges per API call. Use Cases Monitoring – Now you'll be able to focus on tracking KPI behavior over time with dashboards and alerts. Monitoring allows you to track SLA adherence, watch for anomalies, respond quickly to emerging issues, and compare to past behavior. Troubleshooting – This is about determining whether there is an outage and then restoring service. With Unified Logs and Metrics you can identify what is failing, identify when it changed, quickly iterate on ideas, and "swarm" issues. Root-cause Analysis – This focuses on determining why something happened and how to prevent it. Dashboards overlaid with log data and metrics allow you to perform historical analysis, correlate behavior, uncover long-term fixes, and improve monitoring. Correlating Logs and Metrics When you start troubleshooting, you really want to start correlating multiple types of metrics and multiple sources of log data. Ultimately, you'll be able to start with outliers and begin overlaying metrics and log data to quickly build views that help you identify issues. Now you'll be able to overlay logs and metrics from two different systems and do it in real time. If you want to see what Unified Logs and Metrics can do, Product Manager Ben Newton walks you through the steps of building on logs and overlaying metrics in this short introduction.

Blog

Unified Logs & Metrics Opens a Rich Universe of Opportunities for Our Customers

Blog

Together Forever - Logs and Metrics

Together forever and never to part Together forever we two And don't you know I would move heaven and earth To be together forever with you – Rick Astley There are lots of things in this world that, when put together, prove to be more than the sum of their parts. For example, take a favorite dessert of nearby Napa Valley - vanilla ice cream, olive oil and sea salt. The first time you hear it, you might think -- "You are going to pour oil on my ice cream, and then top it off with some salt? Yuck! Thanks for ruining dinner." Take the simple but delicious peanut butter and jelly. Many Europeans think it's disgusting (this is the same culture that finds snails or Marmite delectable, so consider the source). Yet it is now so American that I have heard on good authority it's being considered as a requirement for U.S. citizenship (along with eating apple pie and listening to Taylor Swift, of course). So, before you start thinking this is a food blog, let's get to the point. In the world of IT we have something similar - logs and metrics. Simply put, logs are usually records of something that happened at some point in time (e.g. application errors, warnings, application events, etc.) and metrics are typically measurements of something over time (e.g. the response time of the home page, server CPU, server memory, etc.). In the past, those types of data were often treated very differently. We at Sumo Logic believe that's a problem, and have worked to solve it. In this blog, I want to dig a little deeper into why logs and metrics go together like peanut butter and jelly. Building the Tesla for Logs & Metrics Once a software company builds a product and starts selling it, it's very hard to change the skeleton of the thing - the architecture. Switching analogies for a second - consider for a moment that software is similar to cars. When automotive designers are faced with a new problem or scenario, the easiest approach is to tinker with the existing design. This works if you're adding new tires to an SUV for off-roading. It does not work if you are designing a fully electric car. Tesla completely reworked everything - the transmission, the engine - even the sales model. In the case of systems dealing with machine data, the tinkering approach has definitely not been successful. For example, some log analytics companies, having built engines to make sense of messy, unstructured logs, have also tweaked their mechanics to address highly structured time series data (e.g. metrics). The end result is a lot of hoop jumping to get their engines to perform the way users expect. On the other hand, companies that have chosen to build metrics analysis engines have tried their hand at logs - or at least they say they have. However, in reality, the "logs" passing through the metrics engines are so massaged, and the messiness so mercilessly wrung out of them, that you end up with something more akin to metrics data. So, this approach may improve engine performance, but breaks down when it comes to accuracy -- all for the sake of expediency. We dealt with this same conundrum at Sumo Logic. We decided to take a big risk and actually evolve our engine design, resulting in a next generation architecture. We created a purpose-built, high-performance engine for time-series data, alongside, and integrated with, our high-performance log analytics engine. We didn't forget all of the lessons of the past in building a highly scalable, multi-tenant application -- we just didn't take shortcuts.
The end result is a platform that treats both types of data natively -- giving each, as I see it, the respect it deserves. It's all About the Analytics The secret of a great peanut butter and jelly sandwich is not just bringing unique ingredients together, but creating it with the entire sandwich experience in mind. The same applied to us when we unified logs and metrics -- what's the point of it all? We aren't a data warehouse in the sky. People come to Sumo Logic to make sense of their data -- to do analytics. And the core of analytics is the questions asked of the data. By its nature, some types of data answer certain types of questions better than others. In the case of logs, users often find themselves looking for the proverbial needle in the haystack, or, to be more accurate, hundreds of needles sprinkled across a field of haystacks. So, log analytics has to excel at "x-raying" those haystacks to get to the offending needles very quickly and, even better, at detecting patterns in the needles - basically troubleshooting and root cause analysis. Counting the needles, haystacks, or even the hay straw itself is of secondary concern. With metrics, the goal is often to look for behavior or trends over time. Users rely on time-series metrics to understand how the things they measure change over time, and whether that change indicates something deeper beneath the surface. Fitness devices (like the Apple Watch) are a perfect example here. I can track my heart rate, running speed, distance, etc., over time. This helps me determine whether getting out of bed to run outside was remotely worth it. If my stats improve over time, then yes. If not, then I'm sleeping in! At Sumo Logic, we knew we couldn't afford to ignore the different ways that people use their data. And, simply put, we couldn't treat metrics like a half-brother to logs. So, we focused on tools that help uncover system and application behavior, not just draw nice-looking graphs. Users can overlay related data to find anomalies and patterns in the data. And we strove to make the system as real-time as possible, shaving off microseconds wherever we could so the data is as relevant and timely as possible. Listen to the Users A platform that optimizes for the data and the analytics, but ignores the user, will end up being stiff and hard to use. In building this new functionality at Sumo Logic, we've strived to understand our users. That meant investing a lot of time and energy in talking to our users and listening to their feedback -- and then being willing to change. At the end of the day, that is why we embarked on this journey in the first place. Our customers intuitively understand that having multiple tools for logs and metrics slows them down. They have transitioned, or are transitioning, to a world of small, collaborative teams, with cross-functional skill sets, who own segments of their applications. The implication is that the majority of our users aren't log experts, server experts, or network specialists, but software developers and DevOps teams that support modern applications. They know what they need to measure and analyze because they write the code and support it in the wild. What they are not interested in is learning the deep intricacies of machine data or advanced analytics. So, after listening to our customers, we embarked on a journey to empower them to explore their data without learning complex search languages or getting a Ph.D. in machine learning.
We want our users to be able to lose themselves in the analysis of their data without context-switching from one platform to another, and without diverting time away from their tasks. When it is 2 a.m. and the application is down, learning the theory of statistical analysis is definitely not fun. To wrap up, I'll leave you with this: we are committed to providing the best possible experience for our users -- and that has meant questioning many of our own assumptions about how machine data analytics works. While it might have been easy to recognize that peanut butter and jelly belong together, it takes dedicated hard work and perseverance to get it exactly right. We didn't do it to prove a point. We did it because our customers, in one way or another, asked us to. They have taken immense risks on their own to ride the cutting edge of innovation in their fields, and they deserve a machine data analytics tool that is right there in the thick of it with them. "But we are strong, each in our purpose, and we are all more strong together." ― Van Helsing in Bram Stoker's Dracula
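To make the logs-versus-metrics distinction above concrete, here is a minimal, illustrative Python sketch. The field names and values are invented for illustration only and are not a Sumo Logic schema: a log event records that something happened at a point in time, a metric samples a measurement over time, and the interesting questions usually come from putting the two side by side.

from datetime import datetime, timezone

# A log event: a record of something that happened at a point in time.
log_event = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "level": "ERROR",
    "service": "checkout",
    "message": "Payment gateway timeout after 3 retries",
}

# A metric: a measurement of something, sampled repeatedly over time.
metric_series = {
    "name": "homepage.response_time_ms",
    "points": [  # (timestamp, value) pairs
        ("2016-04-12T10:00:00Z", 220),
        ("2016-04-12T10:01:00Z", 235),
        ("2016-04-12T10:02:00Z", 1180),  # a spike worth correlating with the error logs
    ],
}

# Unifying the two means asking questions across both shapes of data, e.g.
# "show me the error logs from the minute the response time spiked."
spike_minute = max(metric_series["points"], key=lambda point: point[1])[0]
print("Investigate logs around", spike_minute, "for events like:", log_event["message"])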

April 12, 2016

Blog

Using Sumo on Sumo for Better Customer Insight

April 7, 2016

Blog

Why are you treating your data like corn? Silos are for corn, not data

"Toto, I've a feeling we're not in Kansas any more." ― Dorothy, The Wizard of Oz. We, the software vendors of the world, love to talk about ridding the IT world of silos. So what, you may ask, is a silo anyway? According to the source of all knowledge (Wikipedia, of course), silo comes from the Greek word σιρός (siros), which translates to "pit for holding grain." For most of us, Wikipedia's definition brings to mind the large, metal containers that dot the golden landscapes of states like Idaho or Nebraska. But the term "silos" has also become a great visual analogy for many of the problems with IT and software development, so it's worth delving beneath the surface to see what we can learn. Why do we need silos? My brother lived in Kansas for a few years and we've talked about the ways in which local farmers use grain elevators and silos. It's amazing to think that farmers from hundreds of miles around bring their grain to these giant grain elevators so the fruit of their labor can be deposited into giant concrete silos. Why does this make sense for farmers? Because their corn is a commodity, i.e., all of the corn in that silo is basically the same and is sold in bulk, just like oil, copper ore, beans, etc. For a long time, corporations have done, and continue to do, something very similar with their data. They create large-scale processes to dump related types of data into purpose-built databases. These "data silos" often mirror the organizational structure of the IT organization - network monitoring data goes here, systems monitoring data there, and so on. Do Data Silos work? Silos serve an important, rather obvious function for the commodities industry: they enable the storage of commodities. No one would put rice in oil, or even rice with corn. Silos optimize for cost and efficiency, since the value of the commodity is in its volume, or bulk - not in an individual kernel of corn. From a corporate perspective, data silos have the opposite effect of siloed corn - they increase costs, decrease efficiency and limit innovation. In the modern corporation, IT culture is rapidly evolving toward a shared ownership model, removing the organizational inertia around sharing data and ideas. When data is siloed, it actually creates unnecessary friction - hindering collaboration and problem solving across functional boundaries. Conversely, mixing related data can dramatically increase its value by uncovering important patterns and correlating issues across different components. One only needs to look back at the recent past of data warehousing to see a mostly failed attempt to uncover this untapped value. Who benefits from silos anyway? In the world of commodities, silos empower industrial-scale commodities trading and efficiencies. For the agriculture industry in particular, siloed commodities have led to lower prices for consumers, though, one might argue, with some reduction in quality. Regardless, silo-based storage has mostly been a net positive. What about data silos? In the world of modern applications, data silos are definitely not a help to application developers and operators. They rarely help business decision makers. Most importantly, they don't help end users or customers. So, who do they help? One clear answer - software vendors. You depend on their products to derive value from the data they house for you. These software products and services either empower you to gain the fullest possible value from your data or hinder and hobble your efforts.
In the past, software vendors could sidestep this thorny issue because their interests were aligned with the interests of the traditional organizational structure. No more. The architecture of the modern application (microservices, containers, etc.) demands the free flow of data to connect the dots and form a complete picture. The cross-functional, empowered teams of the Agile/DevOps revolution don't defer to functional boundaries when solving problems. So, does it make sense to use machine data analytics tools that operate in this same siloed way, holding your team back from delivering value to your customers? Better Together The last 10 years have brought amazing changes to the power of software to deliver business value. The cause of this shift is the symbiotic relationship between powerful cultural changes in how companies organize themselves and get work done, and the tectonic shifts in how business value is delivered over the Internet to customers at massive scale and near real-time speed. Unfortunately, many corporations are still stuck in the world of data silos, losing out on the rich value their data has to offer by treating it like corn. So, it's time to see how valuable your organization's data truly is. Don't let anyone, or any product, stand in your way.

April 5, 2016

Blog

A Machine Data Analytics Solution Built for DevOps

DevOps has gone mainstream. The DevOps philosophy of increasing cross-functional collaboration to quickly deploy high-quality software releases that address end-customer demands makes good business sense. Case in point: a 2015 survey by the analyst firm EMA found that 80 percent of companies now report the formation of cross-functional teams supporting application development and delivery. More and more of these cross-functional teams (i.e., development, operations, customer success/support, SecOps, and line-of-business) are relying on log and/or machine data analytics solutions to speed their processes. However, it's important to realize that not all machine data analytics solutions are alike, especially if they are marketed as cloud solutions. At Sumo Logic, we believe cloud-native solutions – that is, SaaS that's built for the cloud, not merely hosted in the cloud – provide superior value for DevOps teams looking to leverage machine data analytics to build, run and secure their applications. Below are five key reasons why. Reason 1: New Functionality Without the Wait. With a cloud-native machine data analytics SaaS, you have the simplicity of being up and running in minutes, as opposed to the weeks or months required for traditional, packaged enterprise software. And cloud-native solutions are designed for velocity; new features can be added at a much faster rate, enabling faster time-to-value and better quality for the same price. New updates are rolled out to all users at the same time, so no one is left behind on an outdated version, and there is no need to wait 12 or 18 months for upgrades to provide access to new features (or spend time and money deploying them). The result is a machine data analytics platform that keeps pace with the shorter development cycles and agile methodologies you employ to drive your own software release velocity. Reason 2: The Elasticity You Need Without Sticker Shock. In a rapid DevOps world of continuous technology and underlying infrastructure changes, problems can surface when you least expect it. Just consider the moment when your production applications or infrastructure have a serious problem, and it is "all hands on deck" to investigate and resolve. Suddenly, you have your entire team running simultaneous queries with debug mode activated, bringing in more data than usual. From a DevOps perspective, this scenario is where the elasticity of a multi-tenant platform outshines a single-tenant solution. Since only a small percentage of customers have "incidents" at the exact same time, excess capacity is always available for you and your cross-functional team to use in your hour of crisis. This elasticity also comes in handy to address expected or unexpected seasonal demands on your application environment. Lastly, with a cloud-based metering payment model, this elasticity does not come at an extra price. On the contrary, you pay for your "average" capacity, even though there may be times when you utilize more to handle bursts that require 5, 10 or 100 times more capacity. Reason 3: Scalability Without Performance Hits. For DevOps engineers, machine-generated log data is growing exponentially in their IT environments as their infrastructure and application stacks grow in complexity. In fact, by 2020, IT departments will have 50 times more data to index, search, store, analyze, alert and report on. To address this demand, cloud-native solutions have dedicated tiers for each log management function (ingest, index, search, etc.)
that scale independently, to keep up with your data's volume and velocity of change. And there's no hassle with managing search nodes or heads in order to ensure search performance. Since not all cloud-based solutions are alike, make sure you understand how much data you can store, query, analyze or report on before being locked out when considering a potential machine data analytics vendor. Remember, when you have a problem is exactly when you'll need to scale. Reason 4: Reliability, Reliability, Reliability. The power of aggregated computing resources in a multi-tenant, native cloud architecture provides built-in availability for peak capacity when you need it, without incurring payment penalties for bursting. This architectural approach results in always-on service continuity and availability of your data. Additionally, cloud SaaS ensures enterprise-class support for you and your team: solving customer problems quickly enables SaaS vendors to deliver value across the entire customer base, which is very motivating for the vendor. Reason 5: Never Overbuy or Overprovision. Rapid development cycles will also cause your machine data analytics needs to fluctuate over time (e.g. more usage fluctuations, more cross-functional projects that bring more data), thereby complicating planning. Native cloud solutions that include metered billing alleviate the need to guess at current and future needs -- simply pay as you go for what you need. Additionally, since you will never overbuy or overprovision, your ROI improves over the long term. In summary, DevOps philosophies and agile methodologies are driving the new pace of business innovation. So it only seems logical that your machine data analytics tools keep the same pace; anything less will just slow you down. Arm yourself with a machine data analytics solution built for the cloud to truly align with your DevOps needs. To learn more, I encourage you to read Ramin Sayar's recent blog, Built for the Cloud or Hosted in the Cloud – So What's the Difference? You can also obtain a checklist of questions to review when considering cloud vendors from a new IDC white paper, Why Choose Multi-Tenant Cloud-Native Services for Machine Data Analytics.
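As a back-of-the-envelope illustration of the metered, pay-for-what-you-use model described above, here is a hedged Python sketch. The daily volumes and the per-GB rate are invented for illustration and are not Sumo Logic pricing; the point is simply that provisioning for the peak costs far more than paying for actual usage when bursts are rare.

# Hypothetical daily ingest volumes (GB) for one month: mostly steady, with a few
# incident-driven burst days where debug logging multiplies the volume.
daily_ingest_gb = [20] * 27 + [100, 200, 60]

average_gb = sum(daily_ingest_gb) / len(daily_ingest_gb)
peak_gb = max(daily_ingest_gb)
assumed_rate = 0.10  # assumed price per GB ingested, for illustration only

# Metered model: pay for what you actually used.
metered_cost = sum(daily_ingest_gb) * assumed_rate

# Provisioned model: capacity must be sized for the peak, every day.
provisioned_cost = peak_gb * len(daily_ingest_gb) * assumed_rate

print(f"average/day: {average_gb:.1f} GB, peak/day: {peak_gb} GB")
print(f"metered cost:     ${metered_cost:,.2f}")
print(f"provisioned cost: ${provisioned_cost:,.2f}")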

Blog

Kinesis Streams vs Firehose

Blog

Korean Short Rib Taco or the Blue Plate Special - Three Reasons to Order Data Analytics as a Service with Your Microservices Architecture

One of the current market shifts getting a lot of attention these days is the transition from traditional monolithic application architectures to microservices, the impact of which can be felt from technological innovation to business and team culture. We at Sumo Logic believe in the importance of this trend because the same forces driving companies to microservices are also driving them to cloud-based services for supporting capabilities like machine data analytics. Below are three reasons why the benefits of consuming machine data analytics as a service greatly outweigh the old monolithic application delivery model. Focus on the Product, not the Skillsets In recent years, the gourmet food truck has become one of my favorite food trends. The expectation of lines of food trucks all peddling some highly refined comfort food has been a welcome motivation on many family excursions. My wife can get Korean-style fried chicken, my daughter a quesadilla, and I can indulge in a completely ridiculous barbecue sandwich with an egg on top. In my mind, microservices are like food trucks, while monolithic applications remind me of those cafeterias my grandma dragged us to as kids: bin after bin of unremarkable, over-salted food made by general-purpose cooks counting down the hours to clock out. Sound familiar? Companies today face the same dilemma as the food industry - optimize for predictable mediocrity, or allow for creativity with the higher risk of failure. Customers have made that decision for them - the old way of doing things is not working. What this means is that the modern, digital company has had to move to an organizational philosophy focused on business capabilities, not siloed, generalized capabilities like database architects and web operations teams. This allows those companies to rapidly develop and experiment with new functionality without toppling the entire edifice and, most importantly, to make a better-quality product in the process. Essentially, create specialization around your product - "Korean-style fried chicken" - not platform components - "warmer of canned green beans". Shared ownership with shared standards "Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure." -- Melvin Conway, 1967 What I have found with food trucks is that their product tends to be highly unique and, at least for the successful ones, highly attuned to the tastes of their users. On the other extreme, the cafeteria is attuned to nobody, and responsive to nobody. In today's world of opinionated consumers, restaurants have to balance these extremes in order to be successful - the quality of the food truck balanced against the efficient operations of the tightly managed cafeteria. Companies, large and small, are facing this same conundrum - moving from hierarchical decision making that produces middling applications to more dynamic shared-ownership models that can produce unique, engaging experiences. For example, traditional, monolithic applications rely on shared components like databases and web services. With microservices, small, cross-functional, multi-skilled teams are expected, and empowered, to maintain stable components with well-known integration points and APIs, built on common infrastructure components. The quality of the output and the agility of the team are valued above office politics. Consuming your data analytics as a cloud-native service delivers similar benefits to microservices.
The service accepts data from your application in well-understood ways, with published methods for processing and analyzing that data. And just like with microservices, this allows your team to focus on their application services, rather than having to worry about supporting data services like log or metrics analytics. Focus on Agility Remaining relevant to today's discerning customers makes speed essential to business. My grandma's preferred cafeteria menu remained unchanged in the 20-odd years I can remember going there. Food trucks, on the other hand, can change their menu at the drop of a hat in response to fickle foodies everywhere. The modern application architecture emphasizes agility and flexibility to respond to similar expectations in the Internet world. Users change loyalties as often as hipsters take food selfies, and businesses can no longer accept nail-biting code deployments every few months. Just like food trucks enable cooks to experiment at the micro level within a standard kitchen platform, microservices have allowed engineering teams to simplify the act of making changes by pushing complexity down into the smallest box possible. Now, instead of standardizing on lumbering platform components, these companies have standardized on APIs and infrastructure automation. The appeal of data analytics as a service is driven by the same imperatives. Engineers can bake machine data spigots into their microservice components and then depend on the data analytics service to provide the appropriate capabilities, as published. I have good memories of eating bland mashed potatoes with my grandma, just like I have fond memories of supporting monolithic applications in my early career. But you won't catch me sneaking into a cafeteria or a chain restaurant for a plateful of highly salted vegetables these days. I've moved on. The world has moved on. You can say the same about the modern application. It's time to focus on making the best Korean taco or falafel sandwich your business can possibly make, and let us take care of machine data analytics for you. Now, go get some lunch.
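To make the "machine data spigot" idea concrete, here is a minimal, hedged Python sketch of a microservice emitting a structured log event to a hosted analytics endpoint over HTTP. The collector URL, payload shape and field names are placeholders and assumptions, not any specific vendor's API; check your provider's documentation for the real format.

import json
import time
import urllib.request

# Placeholder endpoint -- substitute the HTTP source URL your analytics service gives you.
COLLECTOR_URL = "https://collectors.example.com/receiver/v1/http/YOUR_SOURCE_TOKEN"

def emit_event(service, level, message, **fields):
    """Send one structured log event to the hosted analytics endpoint."""
    event = {"ts": time.time(), "service": service, "level": level, "message": message, **fields}
    request = urllib.request.Request(
        COLLECTOR_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(request, timeout=5)  # fire-and-forget for brevity; real code would retry

# Inside a microservice handler:
emit_event("taco-orders", "INFO", "order accepted", order_id="A-1042", latency_ms=87)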

Blog

Built for the Cloud or Hosted in the Cloud – So What’s the Difference?

Have you heard the term "cloud washing"? It is defined as the purposeful, and sometimes deceptive, attempt by a vendor to rebrand an old product or service by slapping the buzzword "cloud" on it. Unfortunately, this practice is quite prevalent in the industry, which confuses and frustrates customers, and it is picking up coverage in the media. Case in point: last month, Salesforce CEO Marc Benioff called out the deceptive practice in an interview with Mad Money host Jim Cramer. The machine data analytics space is no exception, in that some vendors cloud wash their products. Therefore, before making any cloud-based purchase, we encourage you to understand the difference between services that are built for the cloud — truly cloud-native — versus products that are simply hosted in the cloud (e.g., a managed service). Based on our ~6 years of experience in developing cloud-native platforms and applications, and based on what our more than one thousand unique paying customers are telling us, we've decided to cut through the confusion by walking through the key features and explanations below. Equal Value for All Customers, Regardless of How Much You Pay. The benefits of machine data analytics running on a multi-tenant system — that is, a system designed to support all customers equally, regardless of contract or company size — are remarkable. First, a problem incurred by one customer leads to a quick resolution that benefits everyone who uses the system. Cloud-native providers are especially motivated to address issues as soon as possible to increase the platform's value for all users, preventing problems from recurring for other customers. In essence, you get enterprise-class support even if you are a small or medium-sized business (SMB). Second, a multi-tenant system can handle load fluctuations seamlessly by dynamically scaling resources for any single customer, because resources are shared across all customers. In fact, the sharing of these resources is an inherent advantage for a couple of reasons. A native cloud-based SaaS vendor can more easily predict load needs from aggregate searches, anomaly detection, and alerts across all system users, because the responsibility of managing the platform belongs to the provider, not the customer. Also, excess capacity is always available because not all customers are simultaneously at full capacity, and only a small percentage will have incidents at the exact same time. When you really need it, 10, 20, or even 100 times capacity can be available. Agility. Technology change is the new norm as organizations develop, test, deploy, and update applications at faster and faster rates. Similarly, cloud-built machine data analytics solutions are also designed with speed and rapid deployment in mind. This means that for one constant price, you get quicker access to new capabilities and realize faster time-to-value. In addition, with self-service and rapid provisioning of users, data, and applications through automated onboarding wizards, customers can easily expand usage to other users, groups and organizations. And customers can choose which features they want or don't want to use, and provide instant feedback to help improve the service at no additional cost. Elasticity and Bursting. The data volume and rate of change in your environment, combined with the demand for troubleshooting and analysis, can cause traffic spikes and usage fluctuations that could bring your machine data analytics to a halt.
Anticipating this issue, cloud-native vendors create platform architectures with dedicated, separate tiers for each log management function (e.g. ingest, index, search, alert) that scale independently of each other. This multi-tiered architecture distributes data processing and ensures reliable system performance, because you can execute multiple tasks regardless of data bursts. Some hosted managed service vendors may claim 10x bursting, but it comes with a price. With a license-based payment model, customers will experience lockout if their data exceeds the license's predesignated number of bursts. If your business is experiencing greater activity than expected, being locked out from your data is the last thing you need, and it negatively impacts the business. Moreover, if a hosted managed vendor relies on one server or one component to do double duty – like indexing and serving data – the system can grind to a halt if you attempt to run a search while ingesting log data. This limitation of hosted managed services can seriously hamper the service for customers, not to mention their revenue goals. Even worse, some vendors intentionally deceive customers with lower prices and underprovisioned infrastructure and "search heads." Once performance problems and user complaints occur, they require additional expense and time to resolve. Don't be cloud-washed. Ask how often you will be allowed to go over your license limit. Security. A good litmus test when evaluating cloud vendors is how seriously they take security. Is security an afterthought, bolted on, or is security enabled from the get-go, as services are spun up? Do vendors live vicariously through the security attestations of their infrastructure providers, like AWS and Azure, or have they committed the resources – both in time and money – to pursue their own certifications? In Amazon's shared security model, AWS is responsible for the infrastructure, while organizations are responsible for the security of everything else that runs on top of that infrastructure - the applications, the workloads and the data. These additional certifications are required! At Sumo Logic, we protect customer data with security measures such as AES 256-bit encryption for data at rest and TLS encryption for data in motion. We also hold "must-have" attestations such as PCI DSS 3.1 Service Provider Level 1, ISO 27001, SOC 2 Type 2, CSA Star, HIPAA, FIPS-140, and the EU-US Privacy Shield. Organizations are making different decisions based on the trust level they have with their service provider, and we take this very seriously, investing millions to achieve, and maintain on an ongoing basis, these competitive differentiators. Too many people try to pass off as "good enough" the certifications their IaaS provider has achieved. Don't be fooled into comfort by these surrogate attestations. We are starting to see regulations – like PCI DSS – mandate that organizations include a written agreement that their service providers are responsible for the security of cardholder data the service providers possess or otherwise store, process or transmit on behalf of the customer. Choosing a vendor, if not done wisely, could put your compliance and sensitive data at risk. Make sure that the cloud solutions provider you choose takes security and compliance as seriously as, and possibly more seriously than, you do. Pay as You Go. Your volume of machine data and your needs will change and evolve over time.
Unlike true cloud-native, elastic SaaS, hosted machine data analytics solutions set log limits and, as mentioned previously, will lock searching when you exceed your usage. The only way to restore service is to call support, which is time-consuming and stops your organization from getting back to business as usual. For example, what if you deploy a new web application that crashes, and you turn on debug mode, which increases your log volume? How will you quickly troubleshoot if your machine data analytics service is locked and you have to call support and waste time on the phone? Native cloud solutions' metering and billing model eliminates the need to overbuy or overprovision capacity -- simply pay as you go for what you need. No service disruptions. Your machine data will always be stored; your machine data searching functionality will always work. It's simple, scalable and always available. In Summary. It is obvious that significant differences exist between "built for the cloud" and "hosted in the cloud." Simply put, multi-tenant, cloud-native services are superior to single-tenant, cloud-hosted services. We hope this blog was informative and helps accelerate your understanding of the subject. To learn more, and to receive a checklist of questions to use when considering cloud vendors, we invite you to review the IDC whitepaper, Why Choose Multi-Tenant Cloud-Native Services for Machine Data Analytics.

March 29, 2016

Blog

Sumo Logic Users: Join And Help Name Your New Online Community

Blog

New Sumo Logic Doc Hub Now Available!

Blog

It’s so hard to say goodbye - An ode to whoopie cushions and the modern application

"I thought we'd get to see forever / But forever's gone away / It's so hard to say goodbye to yesterday." ― Boyz II Men. "Everything changes and nothing stands still." ― Heraclitus of Ephesus. Change comes at us fast and furious. It seems like only yesterday that my four-year-old daughter thought peek-a-boo was the height of comedy, and now she is begging me to use her whoopie cushion on colleagues at work. Changes in technology are only slightly less of a blur. The machinery powering the Internet has evolved far beyond what most of us thought possible even a few years ago. As I reflect back on this evolution, below are the changes in DevOps methodology and toolsets that rise to the top for me. Change 1: Today's App ain't your Daddy's App Back when I started out in the world of IT in the early 2000s, the 3-tier app dominated - web, app, and database. Over that same decade, that pedestrian view of the world shattered, albeit slowly, with the introduction and adoption of virtualization, the "Cloud", distributed data centers, and more. Then, as the behemoths of the Internet age really got going, we saw microservice architectures drive us to new heights in the form of distributed processing, platform as a service, and virtualization driven to its logical conclusion in containerization - just to name a few. And here's the core idea that has hit me like a ton of bricks - these new, massively scalable apps don't lend themselves to nicely laid out architecture diagrams. What used to be a nice hierarchical tree is now spaghetti. The old approaches to architecture just can't handle this kind of complexity and constant change. Change 2: The speed of Business is the speed of the Internet user Accusing the modern software team of doing "waterfall" is about the same as asking your mother-in-law if she's put on a few pounds. Bad, bad idea. Why? The digital corporations of 2016 depend on software quality and innovation for their survival. Their users have no patience for spinning icons and slowly loading pages. The mistakes these companies make will immediately be taken advantage of by their competitors. The grassroots answer to this need for agility has been a focus on small-team empowerment and collaboration, while also distributing the responsibility for quality and uptime from operations teams alone to cross-functional engineering teams. Consequently, these teams are now demanding technologies that support their collaborative approach, while also being flexible enough to adapt to their particular approach to systems management. Change 3: The Modern Application demands a better toolset We talk a lot these days about how the big 4 (HP, BMC, CA, IBM) are imploding. But that isn't the end of it. The IT management children born of the failures of the big 4 are now themselves quickly becoming outmoded. We are seeing once-innovative companies trying to convince the world that slapping "On Demand" or "Cloud" labels on on-premise software means they are up to snuff. Savvy customers aren't buying it. And on the other hand, we have monitoring and troubleshooting tools that weren't built for the microservices, code-to-container world. The deeper implication here is that not only do tools need to evolve to support this new microservices architecture world, but the answer to "what do I need to measure" is rapidly changing. The modern application development team doesn't need to be spoon-fed the "proper" metrics, when they know how to measure their app's performance - they wrote it.
The twin revolutions of DevOps and Agile have pulled the wary software developer out of the safety of waterfall into the controlled chaos of continuous integration and deployment. She is on the hook for the quality of her software, and she can write her own metrics - thank you very much. "It's the end of the world as we know it, and I feel fine" ― R.E.M. So, what's the takeaway here? To start off, I am recruiting my accomplices at work for the "great whoopie cushion" caper. I am going to embrace my daughter's humor where it is, and stop making "daddy" excuses. So, don't resist the change - embrace it. And when you do, remember that you aren't alone. You can ask more from the tools and services you use to support your application. You and your team deserve better.

Blog

Garbage collection in Java - G1GC is NOT the savior of the world

Blog

Three reasons to deploy security analytics software in the enterprise

This security analytics blog was written by expert and author Dan Sullivan (@dsapptech), who outlines three use case scenarios for security analytics tools and explains how they can benefit the enterprise. If there were any doubts about the sophistication of today's cyber threats, the 2014 attacks on Sony Corporation put them to rest. On November 22, 2014, attackers hacked the Sony network and left some employees with compromised computers displaying skulls on their screens, along with threats to expose information stolen from the company. Sony, by all accounts, was the subject of an advanced persistent threat attack using exploits that would have compromised the majority of security access controls. The scope of the attack forced some employees to work with pen, paper and fax machines, while others dealt with the repercussions of the release of embarrassing emails. The coverage around the Sony breach may rightly leave many organizations wondering whether their networks are sufficiently protected and — of particular interest here — whether security analytics software and tools could help them avoid the fate of Sony. The short answer is yes. Just about any business or organization with a substantial number of devices — including desktops, mobile devices, servers and routers — can benefit from security analytics software. So before deploying a security analytics tool, it helps to understand how such a product will fit within an organization's other security controls and the gaps it will help fill in typical IT security use cases. Compliance Compliance is becoming a key driver of security requirements for more businesses. In addition to government and industry regulations, businesses are implementing their own security policies and procedures. To ensure these regulations, policies and procedures are implemented as intended, it is imperative to verify compliance. This is not a trivial endeavor. Consider for a moment how many different security controls may be needed to implement a network security policy that is compliant with various regulations and security standards. For instance, anti-malware systems might scan network traffic while endpoint anti-malware operates on individual devices. Then there are firewalls, which are deployed with various configurations depending on the type of traffic allowed on the sub-network or server hosting the firewall. Identity management systems, Active Directory and LDAP servers, meanwhile, log significant events, such as login failures and changes in authorizations. In addition to these core security controls, an enterprise may have to collect application-specific information from other logs. For example, if a salesperson downloads an unusually large volume of data from the customer relationship management (CRM) system, the organization would want to know. It is important to collect as much useful data as possible to supply the security analytics tool with the raw data it needs to detect events and alert administrators. When companies have a small number of servers and a relatively simple network infrastructure, it may be possible to manually review logs. However, as the number of servers and the complexity of the network grow, it becomes more important to automate log processing. System administrators routinely write shell scripts to process files and filter data.
In theory, they should be able to write scripts in awk, Perl, Ruby or some other scripting language to collect logs, extract data and generate summaries and alerts. But how much time should system administrators invest in these tasks? If they write a basic script that works for a specific log, it may not easily generalize to other uses. If they want a more generalized script, it will likely take longer to write and thoroughly test. This presents significant opportunity costs for system administrators, who could better spend their time on issues more closely linked to business operations. This is not to imply that the functionality provided by these scripts is not important — it is very important, especially when it comes to the kind of data required for compliance. The question is how to most efficiently and reliably collect log data, integrate multiple data sets and derive information that can help admins make decisions about how to proceed in the face of potentially adverse events. Security analytics tools are designed to collect a wide variety of data types, but there is much more to security analytics than copying log files. Data from different applications and servers has to be integrated so organizations can view a unified timeline of events across devices, for example. In addition, these solutions include reporting tools that are designed to help admins focus on the most important data without being overwhelmed by less useful detail. So, in a nutshell, the economic incentive of security analytics vendors is to provide solutions that generalize and relieve customers of the burden of initial development and continued maintenance. Security event detection and remediation The term "connecting the dots" is often used in security and intelligence discussions as a metaphor for linking related — but not obviously connected — pieces of information. Security expert Bruce Schneier wrote a succinct post on why this is a poor metaphor: in real life, the "dots" and their relation to each other are apparent only in hindsight; security analytics tools do not have mystical powers that allow them to discern forthcoming attacks or to "connect the dots" auto-magically. A better metaphor is "finding needles in a haystack," where the needles are significant security events and the haystacks are logs, network packets and other data about the state of a network. Security analytics tools, at a minimum, should be able to alert organizations to significant events. These are defined by rules, such as a trigger that alerts the organization to failed login attempts against administrator accounts, or when an FTP job is run on the database server outside of normal export schedules. Single, isolated events often do not tell the whole story. Attacks can entail multiple steps, from sending phishing lures to downloading malware and probing the network. Data on these events could show up in multiple logs over an extended period of time. Consequently, finding correlated events can be very challenging, but it is something security analytics software can help with. It is important to emphasize that security analytics researchers have not perfected methods for detecting correlated events, however. Organizations will almost certainly get false positives and miss some true positives. These tools can help reduce the time and effort required to collect, filter and analyze event data, though. Given the speed at which attacks can occur, any tool that reduces detection and remediation time should be welcomed.
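As a minimal illustration of the kind of single-purpose script discussed above -- written here in Python rather than awk or Perl -- the sketch below scans a syslog-style auth log for repeated failed logins to privileged accounts. The log path, message format, account list and threshold are all assumptions for illustration; in practice this is exactly the kind of rule you would hand to a security analytics tool rather than maintain as a one-off script.

import re
from collections import Counter

# Assumed syslog-style input, e.g.:
# "Mar 29 02:14:01 web01 sshd[4223]: Failed password for root from 203.0.113.9 port 52144 ssh2"
FAILED_LOGIN = re.compile(r"Failed password for (?P<user>\S+) from (?P<ip>\S+)")
PRIVILEGED = {"root", "admin", "administrator"}  # assumed list of sensitive accounts
THRESHOLD = 5                                    # alert after this many failures per (user, ip)

def scan(path):
    """Count failed logins to privileged accounts, grouped by user and source IP."""
    counts = Counter()
    with open(path) as log:
        for line in log:
            match = FAILED_LOGIN.search(line)
            if match and match.group("user") in PRIVILEGED:
                counts[(match.group("user"), match.group("ip"))] += 1
    return [(user, ip, n) for (user, ip), n in counts.items() if n >= THRESHOLD]

if __name__ == "__main__":
    for user, ip, n in scan("/var/log/auth.log"):  # path is an assumption
        print(f"ALERT: {n} failed logins for privileged account '{user}' from {ip}")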
Forensics In some ways, computer forensics — the discipline of collecting evidence in the aftermath of a crime or other event — is the art of exploiting hindsight. Even in cases where attacks are successful and data is stolen or systems compromised, an enterprise may be able to learn how to block future attacks through forensics. For example, forensic analysis may reveal vulnerabilities in an organization's network or desktop security controls that it did not know existed. Security analytics tools are useful for forensic analysis because they collect data from multiple sources and can provide a history of events from before an attack through the post-attack period. For example, an enterprise may be able to determine how an attacker initially penetrated its systems. Was it a drive-by download from a compromised website? Did an executive fall for a spear phishing lure and open a malicious email attachment? Did the attacker use an injection attack against one of its Web applications? If an organization is the victim of a cybercrime, security analytics tools can help mitigate the risk of falling victim to the same types of exploits in the future. The need for incident response planning In addition to the use cases outlined above, it is important to emphasize the need for incident response planning. Security analytics may help enterprises identify a breach, but it cannot tell them how to respond — that is the role of an incident response plan. Any organization contemplating a security analytics application should consider how it would use the information the platform provides. Its security practice should include an incident response plan, which is a description of how to assess the scope of a breach and what to do in response to an attack. A response plan typically includes information on how to: make a preliminary assessment of the breach; communicate details of the breach to appropriate executives, application owners, data owners, etc.; isolate compromised devices to limit damage; collect forensic data for evidence and post-response analysis; perform recovery operations, such as restoring applications and data from backups; and document the incident. Security analytics tools help detect breaches and collect data, but it is important to have a response plan in place prior to detecting incidents. Enterprises do not want to make up their response plan as they are responding to an incident. There is too much potential for error, miscommunication and loss of evidence to risk an ad hoc response to a security breach. Deploying security analytics software For organizations that decide to proceed with a security analytics deployment, there are several recommended steps to follow, including: identifying operations that will benefit from security analytics (e.g. compliance activities); understanding the specific tasks within these operations, such as Web filtering and traffic inspection; determining how the security analytics tool will be deployed given their network architectures; and identifying systems that will provide raw data to the security analytics tool. These topics will be discussed in further detail in the next article in this series. Other resources: What is Security Analytics: http://searchsecurity.techtarget.com/essentialguide/Security-analytics-The-key-to-reliable-security-data-effective-action and https://www.sumologic.com/blog...

Blog

How Companies Can Minimize Their Cloud Security Risk

This cloud security blog was written by Robert Plant, Vice-Chairman, Department of Business Technology at the University of Miami (@drrobertplant). As enterprises move their applications and data to the cloud, executives are increasingly faced with balancing the benefits of productivity gains against significant concerns around compliance and security. A principal area of concern relates to the unsanctioned use of cloud services and applications by employees. Data from Rajiv Gupta, CEO of Skyhigh Networks, indicates that the average company now uses 1,154 distinct cloud services, and the number is growing at over 20% per year. Many organizations are simply unaware of unsanctioned cloud usage, while others acknowledge that the use of such "shadow IT" - technology deployed without oversight by the core enterprise technology group - is inevitable, a side effect of today's decentralized business structures and the need for agile solutions to be deployed quickly. Most concerning for chief security officers is that this growth is led by employees seeking productivity gains through unsanctioned cloud-based solutions with a wide range of security levels. Currently it is estimated by Skyhigh Networks that 15.8% of files in the cloud contain sensitive data, and that 28.1% of users have uploaded sensitive data, 9.2% of which is then shared. Employees may, for example, upload a file while overseas to a local cloud file storage provider without checking the terms and conditions of that vendor, who may in fact claim ownership rights to any content. Additionally, the cloud storage provider may not encrypt the data either during its transmission or while it is stored in their cloud, thus increasing the risk. Other situations include employees who take a piece of code from a cloud-based open source site and incorporate it into their own program without fully checking the validity of the adopted code. Or someone may adopt a design feature from a site that has the potential to infringe another firm's intellectual property. Or employees may simply discuss technical problems on a cloud-based site for like-minded individuals. While this may seem a great way to increase productivity and find a solution quickly, valuable intellectual property could be lost, or insights on new products could inadvertently be revealed to rivals stalking these sites. So should employers simply ban unsanctioned cloud use outright? Well, cloud "lockdown" is practically infeasible. Technical solutions such as blocking certain sites or requiring authentication, certificates and platform-specific vendors will only go so far, as employees have access to personal machines and devices that can't be monitored and secured. Instead, employers should implement a strategy under which employees can bring new tool and resource ideas from the cloud to the enterprise, which can yield great benefits. But this has to be done within an adoption framework where the tool, product or service is properly vetted from technical and legal perspectives. For example, is the cloud service being used robust? Does it employ sufficient redundancy so that high-value data placed there is always guaranteed to be available? From a legal perspective, it is necessary to examine the cloud service to ensure it is within the compliance parameters required by regulators for the industry. Risk can be mitigated in a number of ways, including deploying monitoring tools that scan cloud access, software downloads and storage. These tools can identify individuals, IP addresses and abnormal trends.
They can rank risk by site and usage against profiles for cloud vendors. Technical monitoring alone is, however, not sufficient; it needs to be used in combination with education, evaluation, compliance audits, transparency, accountability and openness of discussion — all positive steps that chief security officers can take to manage cloud adoption and risk.
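As a toy illustration of the kind of monitoring described above, here is a hedged Python sketch that aggregates upload volume per user and cloud service from access records and flags unusual spikes against a simple baseline. The record format, field names and threshold are assumptions for illustration; real monitoring tools profile far more signals than this.

from collections import defaultdict
from statistics import median

# Assumed records from a cloud-access monitoring feed (format is illustrative only).
events = [
    {"user": "alice", "service": "files.example-cloud.com", "uploaded_mb": 2},
    {"user": "alice", "service": "files.example-cloud.com", "uploaded_mb": 3},
    {"user": "bob",   "service": "paste.example.net",       "uploaded_mb": 1},
    {"user": "bob",   "service": "files.example-cloud.com", "uploaded_mb": 850},  # unusual bulk upload
]

totals = defaultdict(float)  # (user, service) -> total MB uploaded
for event in events:
    totals[(event["user"], event["service"])] += event["uploaded_mb"]

baseline = median(totals.values())
for (user, service), mb in sorted(totals.items(), key=lambda item: -item[1]):
    if mb > 10 * baseline:  # crude "abnormal trend" rule, for illustration only
        print(f"review: {user} uploaded {mb:.0f} MB to {service} (baseline ~{baseline:.0f} MB)")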

Blog

Containerization: Enabling DevOps Teams

Blog

Sumo Logic Add-on for Heroku Now GA

Back in October, Sumo Logic opened up the beta program for the Sumo Logic Add-on for Heroku. Today we are pleased to make the Sumo Logic Add-on generally available in the Heroku Elements marketplace, and to add support for Private Spaces. The Sumo Logic Add-on for Heroku helps PaaS developers build, run and secure their applications. Using Sumo Logic's pre-built dashboards, predictive analytics and features like Live Tail, Heroku developers can monitor their applications in real time and troubleshoot them from code to production. Existing Heroku developers can test drive Sumo Logic with a free trial, and upgrade the add-on to a paid plan from the Heroku marketplace. The add-on is easy to set up. From Heroku Elements, simply select Sumo Logic as an add-on for your application. You can then launch the Sumo Logic service directly from your Heroku Dashboard to gain real-time access to event logs in order to monitor new deployments, troubleshoot applications, and uncover performance issues. Build, Run, Secure. DevOps teams can monitor and troubleshoot their applications from code to production. Heroku's Logplex consolidates application, system and network logs into a single stream and retains 1,500 lines of log data. Sumo Logic effortlessly collects terabytes of data from Logplex and your Heroku application. Data can be pre-parsed and partitioned on ingest to get separate views of your application and network streams. Our lightweight collectors replace traditional, complex setups and effortlessly collect, compress, cache, and encrypt your data for secure transfer. Using Sumo Logic, developers can monitor application event streams in real time, tail logs in production using Live Tail, and utilize other tools to understand performance, detect critical issues, correlate events, analyze trends, and detect anomalies. DevOps teams can also use application analytics to understand how users use their app, to analyze business KPIs in real time, and to optimize their applications to deliver the most value to customers. Secure by design, Sumo Logic maintains an array of critical certifications and attestations, including compliance with the U.S. – E.U. Safe Harbor framework, SOC 2 Type II attestation, attestation of HIPAA compliance, PCI DSS 3.0, and FIPS 140 compliance. Ensure that your application complies with regulations like PCI or HIPAA, and that it handles sensitive data securely. Monitor access and other user behavior and detect malicious activity. Beyond Log Collection and Centralization. Sumo Logic offers a broad set of features including monitoring, search and predictive analytics. Search and Analyze. Run searches and correlate events in real time using a simple, search-engine-like syntax with operators such as PARSE, WHERE, IF, SUMMARIZE, TIMESLICE, GROUP BY, and SORT (an illustrative example appears at the end of this post). LogReduce™ technology reduces hundreds of thousands of log events into groups of patterns. By filtering out the noise in your data, LogReduce can help reduce the Mean Time to Identification of issues by 50% or more. Transaction Analytics automates analysis of transactional context to decrease the time associated with compiling and applying intelligence across transactions flowing through your multi-tiered Heroku application. Detect and Predict. When rules are not enough, Anomaly Detection technology powered by machine-learning algorithms detects deviations to uncover the unknowns in your data.
Outlier Detection, also powered by a unique algorithm, analyzes thousands of data streams with a single query, determines baselines, and identifies outliers in real time. Purpose-built visualization highlights abnormal behaviors, giving Operations and Security teams visibility into the critical KPIs for troubleshooting and remediation. Predictive Analytics extends and complements Anomaly and Outlier Detection by predicting future KPI violations and abnormal behaviors through a linear projection model. The ability to observe violations that may occur in the future helps teams address issues before they impact the business. Monitor and Visualize. Custom Dashboards and brilliant visualization help you easily monitor your data in real time. The Dashboards unify all data streams so you can keep an eye on the events that matter. Charting capabilities such as bar, pie, line, map, and combo charts help you keep an eye on the most important KPIs for your Heroku application. Alert and Notify. Custom alerts proactively notify you when specific events and outliers are identified across your data streams. Proactive notifications are generated when your data deviates from calculated baselines or exceeds thresholds, to help you address potential issues promptly. Integrated Solutions for DevOps Teams. Sumo Logic provides out-of-the-box solutions to help you build, run and secure your Heroku applications. Docker: provides a native collection source for your entire Docker infrastructure, with real-time monitoring of Docker stats, events and container logs; troubleshoot issues and set alerts on abnormal container or application behavior. Artifactory: provides insight into your JFrog Artifactory binary repository; the app provides preconfigured Dashboards that include an Overview of your system, Traffic, Requests and Access, Download Activity, Cache Activity, and Non-Cached Deployment Activity. GitHub: coming soon, our GitHub beta will allow DevOps teams to gather metrics to facilitate code reviews, monitor team productivity, and secure intellectual property. Next Steps. Existing Heroku customers can launch the Sumo Logic service directly from the Heroku dashboard. Select a pricing plan that works for you, or take a test drive for 30 days with Sumo Free. I've written a short getting started guide for Ruby developers. If you'd like to contribute a tutorial in another language, feel free to share it on our developer community site.
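To give a flavor of the search syntax mentioned above, here is a hedged Python sketch. The query string uses the operator families listed (parse, where, timeslice, count/group by, sort), but the source category and fields are invented for illustration, and the Search Job API endpoint, payload shape and auth style shown are assumptions that should be checked against the current Sumo Logic API documentation before use.

import base64
import json
import urllib.request

# Illustrative query over Heroku router logs; source category and fields are made up.
QUERY = (
    '_sourceCategory=heroku/router '
    '| parse "status=* " as status '
    '| where status = "503" '
    '| timeslice 5m '
    '| count by _timeslice '
    '| sort by _timeslice'
)

def submit_search(access_id, access_key):
    """Submit the query as a search job; endpoint and payload shape are assumptions."""
    payload = {"query": QUERY, "from": "2016-03-29T00:00:00", "to": "2016-03-29T01:00:00", "timeZone": "UTC"}
    token = base64.b64encode(f"{access_id}:{access_key}".encode()).decode()
    request = urllib.request.Request(
        "https://api.sumologic.com/api/v1/search/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        print("search job created:", response.status)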

Blog

Contributing to Open-source Projects

We love open source, and it is taking over the world. In fact, Sumo Logic has contributed code to a number of repositories, many of which are listed on the Open-source at Sumo Logic page. To have a strong and successful following, an open source project needs contributors, purpose, value, users, and great leadership. Contributors are the lifeline of open source projects, and there are loads of open source projects that unfortunately haven't been touched in years. A great way to keep an open source project alive is to get involved and make contributions to an existing project you're currently using. Why should you contribute? Why do firefighters volunteer? The reasons are actually similar: to give back to the community, to be part of a team, to fulfill a family tradition, or out of passion for the craft. Where can you start to contribute? Selenium is one open source project to pay attention to. Personally, as a consumer of Selenium, I find it feels good to give back to the project that I use every day, and to be part of the team behind one of the hottest open source projects out there today. By contributing, you can gain exposure and experience. You don't actually have to code to be a contributor. Let's review the different types of roles you can take on as a contributor. Open-source Project Roles There are many skills and roles involved in an open source project. You only need to understand a project's roles to find your place in it. Project Lead or Committee Provides technical leadership for the project as a whole. The responsibilities include driving the project's direction, working as a maintainer of code, enforcing standards and quality, and more. This role is the only one with write access to the repository, and it reviews all pull requests from existing or new contributors. Developer Contributes code by fixing bugs and implementing feature requests, building or redesigning the project's website, or contributing logos and/or icons approved by the project lead or committee. Verifier Activities may vary. You may be reproducing bugs, identifying important test coverage, preventing bugs, introducing new testing solutions, or reviewing documentation. (Yes, open source projects need testing.) Support Offers multiple ways to contribute to an open source project. Some of the responsibilities include answering questions on discussion boards or mailing lists, writing and reviewing documentation, developing Get Started tutorial material, corresponding with the community, or creating screencasts on features or tutorials on how to use the project. Getting Involved and Becoming an Open-source Contributor As technologists, we use our free time to learn new technologies. We can also use our time to contribute to an open source project that we love. Doing so is a great opportunity to work with talented people in the same field. Plus, contributing to open source projects allows you to network far outside your immediate circle of friends and co-workers. In addition, contributing allows you to brand yourself, develop new skills, and gain valuable experience working on a virtual project. If you're contributing to non-code tasks, ask yourself how your contribution will help others. Let's review some contribution activities: Code As a developer, your first step towards contributing will probably be fixing a bug that has been approved by the triage committee. The effort doesn't have to start with a bug fix, but these types of contributions help the consumers of the project.
You can implement a feature request, or help build or redesign a formal website (for instance, Selenium). All code to be committed to the project needs to pass the existing unit and functional tests before a pull request is sent. Moderator: join a discussion board or mailing list to help answer questions about the project and its features, review documentation, and more. Tutorials: as a new user, you may find the project would benefit from detailed boot camp tutorials (for test automation, say): initial environment setup, getting to know and understand all the available features, writing tests, executing tests, or exploring how to integrate cloud solutions and continuous integration servers. Documentation: you will find that documentation comes in two flavors, outdated and new. The most common problem with any software project is outdated documentation, especially around configuring your environment. Before committing updates or new documentation, ensure that you understand the project's style guide. Bug Triage: you can help manage the issues reported against the project; you only need to be familiar with the open source project. As a triager, you will have conversations with the bug reporter and then either close the bug or move it into a bucket. A GitHub project may use labels to organize bugs, features, and documentation, or to organize work for upcoming releases. Testing: start by reviewing the existing test coverage, then determine whether any critical tests are missing from the project that would help prevent bugs before a pull request is merged. The coder is responsible for writing tests for new code introduced into the project.

Open-source Projects for Test Automation. For a test automation technologist, some projects looking for all types of contributors include Selenium, Appium, Protractor, and NightwatchJS.

Conclusion. One beneficial element of contributing to an open source project is that you decide what type of tasks to work on and how often you want to contribute. It's exactly why people contribute to open source projects: it keeps things fun. The one thing that prevents most people from contributing is fear, and getting past it is the hardest step. It is an amazing feeling to find out that something you made has been accepted by the public. I always feel rewarded when I see that one of my published blog posts has been tweeted. If you're interested in giving back and you've never contributed to open source before, I strongly recommend that you give it a go. Find an open source role and set a goal for 2016. You will gain great experience and exposure, and meet talented technologists along your new journey in open source. About the Author: Greg Sypolt (@gregsypolt) is a senior engineer at Gannett and co-founder of Quality Element. He is a passionate automation engineer seeking to optimize software development quality, while coaching team members on how to write great automation scripts and helping the testing community become better testers. Greg has spent most of his career working on software quality — concentrating on web browsers, APIs, and mobile. For the past five years, he has focused on the creation and deployment of automated test strategies, frameworks, tools and platforms.

Blog

Local Optimization - The Divide Between DevOps and IT

Blog

Are Users the Achilles' Heel of Security?

Presaging the death of an industry or a path to user activity monitoring (UAM) enlightenment? John Chambers, former CEO of Cisco, once said that there are two types of companies: those that have been hacked and those that don't yet know they have been hacked. Consider, for a moment, the following statistics: There were 783 major breaches in 2014 (1). This represents a 30% increase from 2013 (2). Median number of days before detection: 205 (3). Average number of systems accessed: 40. Valid credentials used: 100%. Percentage of victims notified by external entities: 69%. Large enterprises are finally coming to the conclusion that security vendors and their solutions are failing them. Despite the unbelievable growth in enterprise security spend, organizations are not any safer. And security attestations like PCI and HIPAA, while helping with compliance, do not equate to a stronger security posture. Don't believe it? Take a look at the recent announcement from Netflix, where they indicated they are dumping their anti-virus solution. And because Netflix is a well-known innovator in the tech space, and the first major web firm to openly dump its anti-virus software, others are likely to follow. Even the federal government is jumping into this security cesspool. In a recent U.S. appellate court decision, the Federal Trade Commission (FTC) was granted authority to regulate corporate cybersecurity. This was done because the market has failed and it was deemed necessary for the government to intervene through public policy (i.e. regulation or legislation). Research has indicated that security solutions are rarely successful in detecting newer, more advanced forms of malware, and scans of corporate environments reveal that most enterprises are already infected. “Enterprises are recognizing that adding more layers to their security infrastructure is not necessarily increasing their security posture,” said George Gerchow, Product Management Director, Security and Compliance at Sumo Logic. “Instead of just bolting on more and more layers, companies are looking for better ways to tackle the problem.” While security has gotten better over the years, so too have the bad actors, whether cybercriminals, hacktivists or nation states. Malware-as-a-service has made this way too easy and pervasive. You know the bad guys are going to find ways to penetrate any barrier you put up, regardless of whether you are running physical, virtual or cloud (PVC) infrastructures. So is it all hopeless, or is there a path to enlightenment by looking at this problem through a different lens? According to a new report from CloudLock, cybercriminals continue to focus their efforts on what is widely considered to be the weakest link in the security chain: the user. According to CloudLock CEO Gil Zimmerman, “Cyber attacks today target your users—not your infrastructure. As technology leaders wake up to this new reality, security programs are being reengineered to focus where true risk lies: with the user. The best defense is to know what typical user behavior looks like – and more importantly, what it doesn’t.” User Risks. And the ROI of this approach is huge, because the report – which analyzed user behavior across 10M users, 1B files and 91K cloud applications – found that 75% of the security risk could be attributed to just 1% of the users, and almost 60% of app installations are performed by highly privileged users.
Given these facts, and that cybercriminals always leverage these highly coveted, privileged user accounts during a data breach, understanding user behavior is critical to improving one's security posture. “As more and more organizations deploy modern-day productivity tools like Microsoft Office 365, Google Apps and Salesforce.com, not understanding what users are doing injects unnecessary and oftentimes unacceptable business risk,” said Mark Bloom, Product Marketing Director, Security & Compliance at Sumo Logic. By leveraging activity-monitoring APIs across these applications, it becomes possible to monitor a number of activities that help in reducing overall risk. These include: visibility into user actions and behaviors; understanding who is logging into the service and from where; investigating changes made by administrators; tracking failed and valid login attempts; identifying anomalous activity that might suggest compromised credentials or malicious insider activity; and tokens, meaning information about third-party websites and applications that have been granted access to your systems. This new, emerging field of User Activity Monitoring (UAM), applied to cloud productivity and collaboration applications, can really help to eliminate guesswork, using big data and machine learning algorithms to assess the risk of user activity in near-real time. UAM (sometimes used interchangeably with user behavior analytics, or UBA) employs modeling to establish what normal behavior looks like and can automatically identify anomalies, patterns and deviations that might require additional scrutiny. This helps security and compliance teams automatically identify areas of user risk, respond quickly and take action. Sumo Logic's applications for Office 365, Salesforce, Google Apps and Box bring a new level of visibility and transparency to activities within these SaaS-based services. And once the data is ingested into Sumo Logic, customers are able to combine their activity logs with logs from other cloud solutions and on-prem infrastructure to create a single monitoring solution for operations, security and compliance across the entire enterprise. Enable cloud productivity without compromise! Sources: (1) Identity Theft Resource Center (ITRC) Report (2) http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks (3) Mandiant M-Trends Report (2012-2015)
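As a rough illustration of the kind of user-activity checks described above, here is a small Python sketch that flags accounts with excessive failed logins or successful logins from previously unseen countries. The event schema, field names and thresholds are hypothetical; the real Office 365, Google Apps and Salesforce audit APIs each have their own formats.

```python
from collections import defaultdict

def flag_risky_users(events, max_failures=5):
    """Toy user-activity check: flag accounts with many failed logins or
    logins from countries not previously seen for that user. Field names
    ("user", "action", "country") are hypothetical placeholders."""
    failures = defaultdict(int)
    known_countries = defaultdict(set)
    alerts = []
    for e in events:
        user, action, country = e["user"], e["action"], e["country"]
        if action == "login_failed":
            failures[user] += 1
            if failures[user] > max_failures:
                alerts.append((user, "possible credential stuffing"))
        elif action == "login_success":
            if known_countries[user] and country not in known_countries[user]:
                alerts.append((user, f"login from new country: {country}"))
            known_countries[user].add(country)
    return alerts

events = [
    {"user": "alice", "action": "login_success", "country": "US"},
    {"user": "alice", "action": "login_success", "country": "MD"},
] + [{"user": "bob", "action": "login_failed", "country": "US"}] * 7
print(flag_risky_users(events))
```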

Blog

Sumo Logic’s Christian Beedgen Speaks on Docker Logging and Monitoring

Support for Docker logging has evolved over the past two years, and the improvements made from Docker 1.6 to today have greatly simplified both the process and the options for logging. However, DevOps teams are still challenged with monitoring, tracking and troubleshooting issues in a context where each container emits its own logging data. Machine data can come from numerous sources, and containers may not agree on a common method. Once log data has been acquired, assembling meaningful real-time metrics such as the condition of your host environment, the number of running containers, CPU usage, memory consumption and network performance can be arduous. And if a logging method fails, even temporarily, that data is lost. Sumo Logic's co-founder and CTO, Christian Beedgen, presented his vision for comprehensive container monitoring and logging to the 250+ developers who attended the Docker team's first Meetup at Docker HQ in San Francisco this past Tuesday.

Docker Logging. When it comes to logging in Docker, the recommended pathway for developers has been for the container to write to its standard output and let Docker collect the output. You then configure Docker to either store it in files or send it to syslog. Another option is to write to a directory, so the plain log file is the typical /var/log thing, and then share that directory with another container. In practice, when you start the first container, you indicate that /var/log will be a “volume,” essentially a special directory, that can then be shared with another container. Then you can run tail -f in a separate container to inspect those logs. Running tail by itself isn't extremely exciting, but it becomes much more meaningful if you want to run a log collector that takes those logs and ships them somewhere. The reason is that you shouldn't have to synchronize between application and logging containers (for example, where the logging system needs Java or Node.js because it ships logs that way). The application and logging containers should not have to agree on specific dependencies, and risk breaking each other's code. But as Christian showed, this isn't the only way to log in Docker. Christian began the presentation by reminding developers of the 12-Factor App, a methodology for building SaaS applications, which recommends limiting yourself to one process per container as a best practice, with each process running unbuffered and sending data to stdout. He then introduced the numerous options for container logging from the pre-Docker 1.6 days forward, and quickly enumerated them, noting that some were better than others. You could: log directly from an application; install a file collector in the container; install a file collector as a container; install a syslog collector as a container; use host syslog for local syslog; use a syslog container for local syslog; log to stdout and use a file collector; log to stdout and use Logspout; collect from the Docker file systems (not recommended); or inject a collector via docker exec.

Logging Drivers in Docker Engine. Christian also talked about logging drivers, which he believes have been a very large step forward in the last 12 months. He stepped through the incremental logging enhancements made to Docker from 1.6 to today. Docker 1.6 added three new log drivers: docker logs, syslog, and log-driver null. The driver interface was meant to support the smallest subset available for logging drivers to implement their functionality.
Stdout and stderr would still be the source of logging for containers, but Docker takes the raw streams from the containers to create discrete messages delimited by writes, which are then sent to the logging drivers. Version 1.7 added the ability to pass in parameters to drivers, and in Docker 1.9 tags were made available to other drivers. Importantly, Docker 1.10 allows syslog to run encrypted, thus allowing companies like Sumo Logic to send data securely to the cloud. He noted recent proposals for a Google Cloud Logging driver, and for a TCP, UDP, Unix domain socket driver. “As part of the Docker engine, you need to go through the engine commit protocol. This is good, because there's a lot of review stability. But it is also suboptimal because it is not really modular, and it adds more and more dependencies on third party libraries.” So he poses the question of whether this should be decoupled. In fact, others have suggested the drivers be external plugins, similar to how volumes and networks work. Plugins would allow developers to write custom drivers for their specific infrastructure, and would enable third-party developers to build drivers without having to get them merged upstream and wait for the next Docker release.

A Comprehensive Approach for Monitoring and Logging. As Christian stated, “you can't live on logs alone.” To get real value from machine-generated data, you need to look at what he calls “comprehensive monitoring.” There are five requirements to enable comprehensive monitoring: events, configurations, logs, stats, and host and daemon logs. For events, you can send each event as a JSON message, which means you can use JSON as a way of logging each event. You enumerate all running containers, then start listening to the event stream. Then you start collecting each running container and each start event. For configurations, you call the inspect API and send that as JSON as well. “Now you have a record,” he said. “Now we have all the configurations in the logs, and we can quickly search for them when we troubleshoot.” For logs, you simply call the logs API to open a stream and send each log as, well, a log. Similarly for statistics, you call the stats API to open a stream for each running container and each start event, and send each received JSON message as a log. “Now we have monitoring,” says Christian. “For host and daemon logs, you can include a collector in host images or run a collector as a container. This is what Sumo Logic is already doing, thanks to the API.”

Summary. Perhaps it is a testament to the popularity of Docker, but even the Docker team seemed surprised by the huge turnout for this first meetup at HQ. As a proud sponsor of the meetup, we at Sumo Logic look forward to new features in Docker 1.10 aimed at enhancing container security, including temporary file systems, seccomp profiles, user namespaces, and content-addressable images. If you're interested in learning more about Docker logging and monitoring, you can download Christian's Docker presentation on Slideshare.
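For readers who want to try the enumerate-containers, listen-to-events, call-the-stats-API flow described above, here is a hedged Python sketch using the docker-py SDK (`pip install docker`). It assumes a local Docker daemon and that the SDK's `containers.list()`, `events()`, `stats()` and `logs()` calls behave as documented for your Docker version; it simply prints JSON to stdout rather than shipping anything to a collector.

```python
import json
import docker  # pip install docker

# Connect to the local Docker daemon (assumes docker-py and a running daemon).
client = docker.from_env()

# Configurations: enumerate running containers and log their inspect data as JSON.
for container in client.containers.list():
    print(json.dumps({"event": "config", "name": container.name,
                      "config": container.attrs["Config"]}))

    # Stats: one snapshot per container (stream=True would follow continuously).
    stats = container.stats(stream=False)
    print(json.dumps({"event": "stats", "name": container.name,
                      "cpu": stats["cpu_stats"]["cpu_usage"]["total_usage"]}))

    # Logs: pull the last few log lines for the container.
    print(container.logs(tail=5).decode(errors="replace"))

# Events: follow the daemon's event stream (start/stop/die/...) as JSON messages.
for event in client.events(decode=True):
    print(json.dumps({"event": "docker_event", "payload": event}))
```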

Blog

Technical Debt and its Impact on DevOps

Blog

5 Ways to Gain Insights Through Machine Data

The pursuit of data-driven decision making has put tracking, logging, and monitoring at the forefront of the minds of product, sales, and marketing teams. Engineers are generally familiar with gathering and tracking data to maintain and optimize infrastructure and application performance, but having discovered the power of data, the other divisions now clamor for the latest tools and instrumentation. Quite often the expense of implementation is undersold as merely placing a tag on your site or adding a library, without taking into account the additional expense of tracking the unique aspects of your application. To complicate matters, there is usually quite a bit of confusion about the types of data being captured by the tools a company already has in place. While various products can span multiple areas of tracking, and vendors continue to deliver new methods for visualizing data, the commonly used tools for capturing data can be classified into five categories: infrastructure monitoring, application monitoring, request logging, event tracking, and user recording.

1. Infrastructure Monitoring. What it is: Information about your individual servers and how they are handling the load of your application and requests. The data includes CPU and memory usage, load, disk I/O, and memory I/O, aggregated for the system or displayed individually by process. Where it comes from: Data is either captured by an agent installed on each server, or a service connects to machines via SNMP; the data is then sent to a centralized location for visualization. Why you want it: Understanding how your infrastructure performs allows the engineering team to proactively address issues, set alerts, assess how to scale the environment, and optimize performance. Level of Effort: Low-Medium

2. Application Monitoring. What it is: Information about the performance of your application. The data includes transaction response times, throughput, and error rates. A transaction is the work your application did in response to a request from a user, and the response time comprises any network latency, application processing time, and read/write access to a database or cache. Where it comes from: An agent is installed on each application server, or an SDK is used in native applications. Agentless monitoring solutions exist for applications that support client requests. Why you want it: Application monitoring enables the engineering team to diagnose performance issues and track errors within an application. In addition to response time, detailed visibility is given into slow code execution via stack traces, and slow database queries are tracked and logged for investigation. Throughput visualization provides insight into how the application performs under various loads, and client-side monitoring can demonstrate geographical variances for web applications. Level of Effort: Low-Medium

3. Request Logs. What it is: Data contained in requests made to the server is logged in files. Referrer, remote address, user agent, header data, status codes, response data, etc. can be captured. Where it comes from: An application module, library, or SDK handles recording the requests and formatting the data being logged. Why you want it: Almost any data within the stack can be logged, from server, application, and client performance to user activities. Log data can be filtered, aggregated, and then analyzed via visualization tools to assess most aspects of your product and the underlying infrastructure.
Level of Effort: Low (implementing a logging library) to High (implementing filtering and visualization)

4. Event Tracking. What it is: As a user interacts with a website or application, their actions can trigger events. For example, an application can be set up so that when a user clicks submit on a registration form, they trigger the "registration" event, which is then captured by the tracking service. Data includes any events that have been defined for the application. Where it comes from: Generally a JavaScript tag is installed on a web app and an SDK is used for native applications. Why you want it: Event data is valuable to the product and marketing teams for understanding how users navigate through an application, as well as any areas of friction in the UX (user experience). Generally the event data is visualized within UX flows as a funnel, with the wide top of the funnel representing the area of the application where most users start their interaction and the narrow part at the bottom of the funnel being the desired user action. Level of Effort: Low (pageviews) to High (full funnel tracking)

5. User Recording. What it is: Session recordings and/or heat maps of cursor placement, clicks, and scroll behaviors. Where it comes from: Generally a JavaScript tag is installed on a web app and an SDK is used for native applications. Why you want it: These tools can give insight into how users are interacting with the pages of your application between clicks, to aid in identifying UX friction. Heat maps indicate areas of your application with the highest frequency of activity, while recordings can be invaluable for identifying how a set of users experiences the application. Level of Effort: Low

Amidst the data revolution, a multitude of analytics, monitoring and visualization solutions have come onto the market. Vendors generally provide core features that are similar, while having unique features that cater to certain use cases and differentiate them from competitors. It is quite common to see product, marketing, and sales teams chase the latest features of various platforms and end up with a multitude of portals to visualize data. Service providers generally store your data in their own infrastructure, and multiple analytics tools mean your data will be spread across disjointed data stores. Consolidation and a centralized view are paramount to maintaining a holistic view your entire team can work from. With the ability to gather data across the various layers of your product, it is critical to understand what each product can deliver and to have a well-thought-out analytics strategy that keeps each division in the organization, and the view of your application and infrastructure, connected. About the Author: Tom Overton has leveraged full-stack technical experience to run engineering teams for companies including Technicolor, VMware, and VentureBeat.
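As a small example of the request-logging category above, the following Python sketch emits one JSON-formatted access-log line per request, which a log analytics tool can then filter and aggregate. The field names and sample values are illustrative only.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("access")

def log_request(remote_addr, method, path, status, user_agent, referrer=None):
    """Emit one JSON access-log line per request. Field names are illustrative;
    use whatever schema your log analytics pipeline expects."""
    log.info(json.dumps({
        "ts": time.time(),
        "remote_addr": remote_addr,
        "method": method,
        "path": path,
        "status": status,
        "user_agent": user_agent,
        "referrer": referrer,
    }))

# Example call for a registration request.
log_request("203.0.113.7", "POST", "/register", 201,
            "Mozilla/5.0", referrer="https://example.com/pricing")
```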

Blog

Introducing Sumo Logic Live Tail

In my last post I wrote about how DevOps' emphasis on frequent release cycles leads to the need for more troubleshooting in production, and how developers are frequently being drawn into that process. Troubleshooting applications in production isn't always easy: for developers, the first course of action is to drop down to a terminal, ssh into the environment (assuming you have access) and begin tailing log files to determine the current state. When the problem isn't immediately obvious, they might tail -f the logs to a file, then grep for specific patterns. But there's no easy way to search log tails in real time. Until now. Developers and team members now have a new tool, called Sumo Logic Live Tail, that lets you tail log files into a window, filter for specific conditions and use other features to troubleshoot in real time. Specifically, Live Tail lets you: pause the log stream, scroll up to previous messages, then jump to the latest log line and resume the stream; create keywords that will be used to highlight occurrences within the log stream; filter log files on the fly in real time; tail multiple log files simultaneously by multi-tailing; and launch Sumo Logic Search in the context of Sumo Logic Live Tail (and vice versa). Live Tail is immediately available from within the Sumo Logic environment, and coming soon is a command line interface (CLI) that will allow developers to launch Live Tail directly from the command line.

What Can I Do With Live Tail? Troubleshoot production logs in real time. You can now troubleshoot without having to log into business-critical applications. Users can also harness the power of Sumo Logic by launching Search in the context of Live Tail and vice versa. There is simply no need to switch between different tools to get the data you need. Save time requesting and exporting log files. As I mentioned, troubleshooting applications in production with tail -f isn't always easy. First, you need to gain access to production log files, and for systems managing sensitive data, admins may be reluctant to grant that access. Live Tail allows you to view your most recent logs in real time, analyze them in context, copy and share them via secure email when there's an outage, and set up searches based on Live Tail results using Sumo Logic. Consolidate tools to reduce costs. In the past, you may have toggled between two tools: one for tailing your logs and another for advanced analytics and pattern recognition to help with troubleshooting, proactive problem identification and user analysis. With Sumo Logic Live Tail, you can now troubleshoot from the Sumo Logic browser interface or from the Sumo Logic command line interface without investing in a separate solution for live tail, thereby reducing the cost of owning licenses for multiple tools.

Getting Started. There are a couple of ways to initiate a Live Tail session from the Sumo Logic web app: go directly to Live Tail by hovering over the Search menu and clicking on the Live Tail menu item, or, from an existing search, click the Live Tail link (just below the search interface). In both instances, you'll need to enter the name of the _sourceCategory, _sourceHost, _sourceName, _source, or _collector of the log you want to tail, along with any filters. Click Run to initiate the search query. That will bring up a session similar to Figure 1. Figure 1. A Live Tail session. To find specific information, such as errors and exceptions, you can filter by keyword.
Just add your keywords to the Live Tail query and click Run or press Enter. The search will be rerun with the new filter, and those keywords will be highlighted on incoming messages, making it easy to spot matching conditions. The screen clears, and new results automatically scroll. Figure 2. Using keyword highlighting to quickly locate items in the log stream. To highlight keywords that appear in your running Live Tail, click the A button. A dialog will open; enter the term you'd like to highlight. You may enter multi-term keywords separated by spaces, and hit Enter to add additional keywords. The different keywords are then highlighted using different colors, so that they are easy to find on the screen. You can highlight up to eight keywords at a time.

Multi-tailing. A single log file doesn't always give you a full view. Using the multi-tail feature, you can tail multiple logs simultaneously. For example, after a database reboot, you can check whether it was successful by validating that the application is querying the database. But if there's an error on one server, you'll need to check the other servers to see if they may be affected. You can start a second Live Tail session from the Live Tail page, or from the Search page; the browser opens in split-screen mode and streams 300-400 messages per minute. You can also open, or "pop out," a running Live Tail session into a new browser window. This way, you can move the new window to another screen, or watch it separately from the browser window where Sumo Logic is running. Figure 3. Multi-tailing in split-screen mode.

Launch In Context. One of the highlights of Sumo Logic Live Tail is the ability to launch in context, which allows you to seamlessly alternate between Sumo Logic Search and Live Tail in the browser. For example, when you are on the Search page and need to start tailing a log file to view the most recent log lines coming in (raw log lines), you click a button to launch the Live Tail page from Search, and the source name is carried forward automatically. If you are looking to perform more advanced operations like parsing, using operators, or increasing the time range to the previous day, simply click "Open in Search." This action launches a new Search tab which automatically includes the parameters you entered on the Live Tail page, so there is no need to re-enter them. For more information about using Live Tail, check out the documentation in Sumo Logic Help.
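For contrast, here is a rough Python sketch of the manual workflow Live Tail replaces: follow a log file and flag lines that match a set of keywords, much like `tail -f` piped through `grep`. The file path and keywords are just examples.

```python
import time

def tail_f(path, keywords):
    """Follow a log file and flag lines containing any keyword: a crude
    stand-in for the manual tail -f | grep workflow."""
    with open(path, "r") as f:
        f.seek(0, 2)  # jump to the end of the file, like tail -f
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)  # wait for new lines to be written
                continue
            prefix = ">>> " if any(k in line for k in keywords) else ""
            print(prefix + line, end="")

if __name__ == "__main__":
    # Hypothetical path and filter terms.
    tail_f("/var/log/app/app.log", ["ERROR", "Exception", "timeout"])
```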

January 21, 2016

Blog

Open Source Projects at Sumo Logic

Someone recently asked me, rather smugly I might add, "who's ever made money from open source?" At the time I naively answered with the first person who came to mind, which was Rod Johnson, the creator of Java's Spring Framework. My mind quickly began retrieving other examples, but in the process I began to wonder about the motivation behind the question. The inference, of course, was that open source is free. Such a sentiment speaks not only to monetization but to the premise of open source, which raises a good many questions. As Karim R. Lakhani and Robert G. Wolf wrote, "Many are puzzled by what appears to be irrational and altruistic behavior… giving code away, revealing proprietary information, and helping strangers solve their technical problems." While many thought that better jobs, career advancement, and so on are the main drivers, Lakhani and Wolf discovered that how creative a person feels when working on the project (what they call "enjoyment-based intrinsic motivation") is the strongest and most pervasive driver. They also found that user need, the intellectual stimulation derived from writing code, and improving programming skills are top motivators for project participation.

Open Source Projects at Sumo Logic. Here at Sumo Logic, we have some very talented developers on the engineering team, and they are passionate about both the Sumo Logic application and giving back. To showcase some of the open-source projects our developers are working on, as well as other commits from our community, we've created a gallery on our developer site where you can quickly browse projects and dive into the repos, code, and gists we've committed. Here's a sampling of what you'll find: Sumoshell. Parsing out fields on the command line can be cumbersome, aggregating is basically impossible, and there is no good way to view the results. Written by Russell Cohen, Sumoshell is a collection of CLI utilities written in Go that you can use to improve the analysis of log files. Grep can't tell that some log lines span multiple individual lines; in Sumoshell, each individual command acts as a phase in a pipeline to get the answer you want. Sumoshell brings a lot of the functionality of Sumo Logic to the command line. Sumobot. As our Chief Architect, Stefan Zier, explains in this blog post, all changes to production environments at Sumo Logic follow a well-documented change management process. In the past, we manually tied together JIRA and Slack to get from a proposal to an approved change in the most expedient manner. So we built a plugin for our sumobot Slack bot. Check out both the post and the plugin. Sumo Logic Python SDK. Written by Yoway Buorn, the SDK provides a Python interface to the Sumo Logic REST API. The idea is to make it easier to hit the API from Python code. Feel free to add your scripts and programs to the scripts folder. Sumo Logic Java Client. Sumo Logic provides a cloud-based log management solution that can process and analyze log files at petabyte scale. This library provides a Java client to execute searches on the data collected by the Sumo Logic service.

Growing Number of Projects. Machine data and analytics is about more than just server logging and aggregation; there are some interesting problems yet to be solved. Currently, you'll find numerous appenders for .NET and Log4j, search utilities for Ruby and Java, Chef cookbooks, and more. We could use additional examples calling our REST APIs from different languages. As we build our developer community, we'd like to invite you to contribute.
Check out the open-source projects landing page and browse through the projects. Feel free to fork a project and share, or add examples to folders where indicated.
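If you want to script searches yourself rather than use the SDKs above, the sketch below calls what I understand to be Sumo Logic's Search Job REST API with plain `requests`: create a job, poll until it finishes, then fetch a page of messages. Treat the endpoint paths, payload fields and state strings as assumptions to verify against the current API documentation (or just use the Python SDK, which wraps this flow); the credentials and query are placeholders.

```python
import time
import requests  # pip install requests

API = "https://api.sumologic.com/api/v1"

# A session keeps any cookie the Search Job API sets for job affinity.
session = requests.Session()
session.auth = ("YOUR_ACCESS_ID", "YOUR_ACCESS_KEY")  # placeholder credentials

def run_search(query, time_from, time_to):
    # 1. Create the search job.
    job = session.post(f"{API}/search/jobs", json={
        "query": query, "from": time_from, "to": time_to, "timeZone": "UTC",
    })
    job.raise_for_status()
    job_url = f"{API}/search/jobs/{job.json()['id']}"

    # 2. Poll until the job has finished gathering results.
    while True:
        state = session.get(job_url).json()["state"]
        if state == "DONE GATHERING RESULTS":
            break
        if state.startswith("CANCEL"):
            raise RuntimeError(f"search job ended in state {state}")
        time.sleep(2)

    # 3. Fetch the first page of raw messages (aggregates use a records endpoint).
    msgs = session.get(f"{job_url}/messages",
                       params={"offset": 0, "limit": 10}).json()
    return msgs.get("messages", [])

for m in run_search("error", "2016-01-01T00:00:00", "2016-01-02T00:00:00"):
    print(m)
```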

Blog

DevOps Visibility - Monitor, Track, Troubleshoot

As organizations embrace the DevOps approach to application development, they face new challenges that can't be met with legacy monitoring tools. Teams need DevOps visibility. While continuous integration, automated testing and continuous delivery have greatly improved the quality of software, clean code doesn't mean software always behaves as expected. A faulty algorithm or a failure to account for unforeseen conditions can cause software to behave unpredictably. Within the continuous delivery (CD) pipeline, troubleshooting can be difficult, and in cases like debugging in a production environment it may not even be possible. DevOps teams are challenged with monitoring, tracking and troubleshooting issues in a context where applications, systems, networks, and tools across the toolchain all emit their own logging data. In fact, we are generating an ever-increasing variety, velocity, and volume of data.

Challenges of Frequent Release Cycles. The mantra of DevOps is to "release faster and automate more." But these goals can also become pain points: frequent releases introduce new complexity, and automation obscures that complexity. In fact, DevOps teams cite deployment complexity as their #1 challenge. The current challenges for DevOps teams are: difficulty collaborating across silos; difficulty syncing multiple development work-streams; frequent performance or availability issues; no predictive analytics to project future KPI violations; and no proactive push notifications to alert on service outages. DevOps drives cross-organizational team collaboration. However, organizations in the midst of a DevOps adoption are finding that they have difficulty collaborating across silos. Frequent release cycles also add pressure when it comes to syncing multiple development work-streams. These forces are driving the need for more integration between existing legacy tools, and the need for new tools that cross-organizational teams can use collaboratively. Because of its emphasis on automated testing, DevOps has also created a need for toolsets that enable troubleshooting and root-cause analysis. Why? Because, as I've said, clean code doesn't mean software always behaves as expected. That's why one of the greatest pain points for many of these teams is additions and modifications to packaged applications, which are often deployed to multi-tenant cloud environments.

Troubleshooting from the Command Line. DevOps teams are discovering that performance and availability problems have increased with more frequent releases. That means Ops is spending more time troubleshooting, and development is being drawn into production troubleshooting. In response, developers will typically ssh into a server or cloud environment, drop down to the command line, and tail -f the log file. When the problem isn't readily seen, they begin grepping the logs using regular expressions and hunting for patterns and clues to the problem. But grep doesn't scale. Simply put, log data is everywhere. Application, system and network logs are stored in different locations on each server, and may be distributed across locations in the cloud or on other servers. Sifting through terabytes of data can take days. The difficulty is that there's no consistency, no centralization and no visibility. No consistency: Ops is spending more time troubleshooting, development is drawn into production troubleshooting, service levels have degraded with more frequent releases, and performance and availability problems have increased. No centralization: there are many locations of various logs on each server.
Logs are distributed across locations in the cloud or on various servers, and SSH + grep doesn't scale. No DevOps visibility: high-value data is buried in petabytes, meaningful views are difficult to assemble, there is no real-time visibility, and the sheer volume of log data is immense.

DevOps Visibility Across the Toolchain. Sumo Logic provides a single solution that is tool-agnostic and provides visibility throughout the continuous integration/continuous delivery pipeline, as well as across the entire DevOps toolchain. Sumo Logic delivers a comprehensive strategy for monitoring, tracking and troubleshooting applications at every stage of the build, test, deliver, and deploy release cycle. Full-stack DevOps visibility: gather event streams from applications at every stage, from sandbox development to final deployment and beyond, and combine them with system and infrastructure data to get a complete view of your application and infrastructure stack in real time. No integration hassles: Sumo Logic can be integrated with a host of DevOps tools across the entire continuous delivery pipeline, not just server data. Increased availability and performance: because you can monitor deployments in real time, issues can be identified before they impact the application and customers, and precise, proactive analytics quickly uncover hidden root causes across all layers of the application and infrastructure stack. Streamlined continuous delivery: troubleshoot issues and set alerts on abnormal container or application behavior; visualize key metrics and KPIs, including image usage, container actions and faults, as well as CPU/memory/network statistics; easily create custom and aggregate KPIs and metrics using Sumo Logic's powerful query language; and apply advanced analytics powered by LogReduce, Anomaly Detection, Transaction Analytics, and Outlier Detection.

Versatility. The one reaction I hear from customers is surprise: an organization will typically apply Sumo Logic to a specific use case such as security compliance, then discover the breadth of the product and apply it to use cases they had never thought of. "Many benefits and features of Sumo Logic came to us as a surprise. The Sumo Logic Service continues to uncover different critical issues and deliver new insight throughout the development/support lifecycles of each new version we release." -- Nathan Smith, Technical Director, Outsmart Games. Sumo Logic enables DevOps teams to get deep, real-time visibility into their entire toolchain and production environment to help create better software faster. You can check out Sumo Logic right now with a free trial. It's easy to set up and lets you explore the wealth of features, including LogReduce, our pattern-matching algorithm that quickly detects anomalies, errors and trending patterns in your data.

Blog

Sumo Logic is ISO 27001 and CSA Star Certified

Recently Sumo Logic secured ISO 27001 certification and CSA Star certification, further demonstrating not only our commitment to security and compliance, but also providing customers with the highest level of compliance certifications to secure data in the cloud. ISO/IEC 27001:2013 is the international standard for information security management, which specifies 14 security control clauses and 114 security controls designed to protect the confidentiality, integrity and availability of information. It is important to note that ISO 27001 requires active involvement of the executive team in security and compliance activities and puts emphasis on demonstrating continuous improvement. CSA Star is a rigorous assessment of cloud-specific security controls and processes. The certification leverages the requirements of the ISO/IEC 27001 management system standard together with the CSA Cloud Controls Matrix, which is specific to cloud security controls and mapped to leading standards, best practices and regulations. While some cloud providers complete a self-assessment, Sumo Logic engaged BrightLine CPAs to conduct an independent audit. How did we do it? I'm often asked what it takes to obtain ISO 27001 certification and how much time and effort is required. The answer is that it depends on your existing security posture. The ISO certification process itself is very involved and requires completion of the following tasks: Obtaining buy-in from the executive team: this goes beyond obtaining budget for the audit; ISO 27001 requires that the executive team be actively involved in the security management process and enforce security controls in their respective teams. Completing a gap assessment: identifying security controls that are already in place and the ones that either have to be implemented or improved. Implementing the ISO controls based on the results of the gap assessment. Educating and training employees: the ISO 27001 program requires that all employees understand their involvement in individual controls and their contribution to continuous improvement. Completing documentation: ISO 27001 certification requires extensive documentation addressing all relevant milestones and individual controls; this forms the criteria the company is measured against to meet the ISO standard. Completing an internal audit, which has to be performed by an independent auditor. Passing Phase I and Phase II audits: these certification audits are performed by an independent assessor who, upon successful completion of the audits (without any nonconformities), issues a certificate stating that the business meets the ISO 27001 controls and requirements. CERTIFICATION! These certifications are a huge milestone for any company, but the fact that we have architected the Sumo Logic platform with security in mind made it a bit easier. Our industry-leading platform includes a rigorous security model with an end-to-end process, which includes best-of-breed technologies and stringent operational processes, enabling us to provide our customers with the ability to operate and innovate with confidence and security in the cloud.

Blog

Using Analytics to Support the Canary Release

When you roll out a new deployment, how do you roll? With a big bang? A blue/green deployment? Or do you prefer a Canary Release? There's a lot to be said for the Canary Release strategy of testing new software releases on a limited subset of users. It reduces the risk of an embarrassing and potentially costly public failure of your application to a practical minimum. It allows you to test your new deployment in a real-world environment and under a real-world load. It allows a rapid (and generally painless) rollback. And if there's a failure of genuinely catastrophic proportions, only a small subset of your users will even notice the problem. But when you use a Canary Release, are you getting everything you can out of the process? A full-featured suite of analytics and monitoring tools is — or should be — an indispensable part of any Canary Release strategy.

The Canary Release Pattern. In a Canary Release, you initially release the new version of your software on a limited number of servers and make it available to a small subset of your users. You monitor it for bugs and performance problems, and after you've taken care of those, you release it to all of your users. The strategy is named after the practice of taking canaries into coal mines to test the quality of the air; if the canary stopped singing (or died), it meant that the air was going bad. In this case, the "canary" is your initial subset of users; their exposure to your new release allows you to detect and fix the bugs, so your general body of users won't have to deal with them. Ideally, in a strategy such as this, you want to get as much useful information as possible out of your initial sample, so that you can detect not only the obvious errors and performance issues, but also problems which may not be so obvious, or which may be relatively slow to develop. This is where good analytic tools can make a difference.

Using Analytics to Support a Canary Release. In fact, the Canary Release strategy needs at least some analytics in order to work at all. Without any analytics, you would have to rely on extremely coarse-grained sources of information, such as end-user bug reports and obvious crashes at the server end, which are very likely to miss the problems that you actually need to find. Such problems, however, generally will show up in error logs and performance logs. Error statistics will tell you whether the number, type, and concentration (in time or space) of errors are out of the expected range. Even if they can't identify the specific problem, such statistics can suggest the general direction in which the problem lies. And since error logs also contain records of individual errors, you can, at least in theory, pinpoint any errors which are likely to be the result of newly introduced bugs, or of failed attempts to eliminate known bugs. The problem with identifying individual errors in the log is that any given error is likely to be a very small needle in a very large haystack. Analytics tools which incorporate intelligent searches and features such as pattern analysis and detection of unusual events allow you to identify likely signs of a significant error in seconds. Without such tools, the equivalent search might take hours, whether it uses brute force or carefully crafted regex terms. Even being forced by necessity to do a line-by-line visual scan of an error log, however, is better than having no error log at all.
Logs that monitor such things as performance, load, and load distribution can also be useful in the Canary Release strategy. Bugs which don't produce clearly identifiable errors may show up in the form of performance degradation or excessive traffic. Design problems may also leave identifiable traces in performance logs; poor design can cause traffic jams, or lead to excessive demands on databases and other resources. You can enhance the value of your analytics, and of the Canary Release itself, if you put together an in-depth demographic profile of the user subset assigned to the release. The criteria which you use in choosing the subset, of course, depend on your needs and priorities, as well as the nature of the release. The subset may consist of in-house users, of a random selection from the general user base, or of users carefully chosen to represent either the general user base or specific types of user. In any of these cases, however, it should be possible to assemble a profile of the users in the subset. If you know how the users in the subset make use of your software (which features they access most frequently, how often they use the major features, at what times of day, how this use is reflected in server loads, etc.), and if you understand how these patterns of use compare to those of your general user base, the process of extrapolating from Canary Release analytics should be fairly straightforward, as long as you are using analytic tools which are capable of distilling out the information that you need. So yes, Canary Release can be one of the most rewarding deployment strategies — when you take full advantage of what it has to offer by making intelligent use of first-rate analytic tools. Then the canary will really sing! About the Author: Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the '90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues.
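One way to turn canary error statistics into an automated go/no-go signal is a simple two-proportion comparison between the canary group and the baseline, as in the Python sketch below. The threshold and request counts are illustrative; a real rollout gate would also weigh latency, saturation and business metrics.

```python
import math

def canary_regressed(base_errors, base_total, canary_errors, canary_total, z=2.58):
    """Two-proportion z-test sketch: is the canary's error rate significantly
    higher than the baseline's? Returns True if the canary looks worse."""
    p_base = base_errors / base_total
    p_canary = canary_errors / canary_total
    pooled = (base_errors + canary_errors) / (base_total + canary_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_total + 1 / canary_total))
    if se == 0:
        return False  # no errors anywhere, nothing to flag
    return (p_canary - p_base) / se > z  # one-sided check: canary worse

# Baseline: 120 errors in 100k requests; canary: 31 errors in 5k requests.
print(canary_regressed(120, 100_000, 31, 5_000))  # -> True, hold the rollout
```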

Blog

Snowflake Configurations and DevOps Automation

“Of course our pipeline is fully automated! Well, we have to do some manual configuration adjustments on a few of our bare metal servers after we run the install scripts, but you know what I mean…” We do know what you mean, but that is not full automation. Call it what it really is: partial automation in a snowflake environment. A snowflake configuration is ad hoc and “unique” to the environment at large. But in DevOps, you need to drop unique configurations and focus on full automation.

What's Wrong With Snowflake Configurations? In DevOps, a snowflake is a server that requires special configuration beyond that covered by automated deployment scripts. You do the automated deployment, and then you tweak the snowflake system by hand. For a long time (through the '90s, at least), snowflake configurations were the rule. Servers were bare metal, and any differences in hardware configuration or peripherals (as well as most differences in installed software) meant that changes had to be handled on a system-by-system basis. Nobody even called them snowflakes; they were just normal servers. But what's normal in one era can become an anachronism or an out-and-out roadblock in another era, and nowhere is this more true than in the world of software development. A fully automated, script-driven DevOps pipeline works best when the elements that make up the pipeline are uniform. A scripted deployment to a thousand identical servers may take less time and run more smoothly than deployment to half a dozen servers that require manual adjustments after the script has been run. For more on DevOps pipelines, see "How to Build a Continuous Delivery Pipeline."

No Virtual Snowflakes. A virtual snowflake might be at home on a contemporary Christmas tree, but there's no place for virtual snowflakes in DevOps. Cloud-based virtual environments are by their nature software-configurable; as long as the cloud insulates them from any interaction with the underlying hardware, there is no physical reason for a set of virtual servers running in the same cloud environment to be anything other than identical. Any differences should be based strictly on functional requirements: if there is no functional reason for any of the virtual servers to be different, they should be identical. Why is it important to maintain such a high degree of uniformity? In DevOps, all virtual machines (whether they're full VMs or Docker containers) are containers in much the same way as steel shipping containers. When you ship something overseas, you're only concerned with the container's functional qualities. Uniform shipping containers are functionally useful because they have known characteristics and can be filled and stacked efficiently. This is equally true of even the most full-featured virtual machine when it is deployed as a DevOps container. This is all intrinsic to core DevOps philosophy: the container exists solely to deliver the software or services, and should be optimized for that purpose. When delivery to multiple virtual servers is automated and script-driven, optimization requires as much uniformity as possible in server configurations. For more on containers, see "Kubernetes vs. Docker: What Does It Really Mean?"

What About Non-Virtual Snowflakes? If you only deal in virtual servers, it isn't hard to impose the kind of standardization described above. But real life isn't always that simple; you may find yourself working in an environment where some or all of the servers are bare metal.
How do you handle a physical server with snowflake characteristics? Do you throw in the towel and adjust it manually after each deployment, or are there ways to prevent a snowflake server from behaving like a snowflake? As it turns out, there are ways to de-snowflake a physical server — ways that are fully in keeping with core DevOps philosophy. First, however, consider this question: What makes a snowflake server a snowflake? Is it the mere fact that it requires special settings, or is it the need to make those adjustments outside of the automated deployment process (or in a way that interrupts the flow of that process)? A thoroughgoing DevOps purist might opt for the first definition, but in practical terms, the second definition is more than adequate. A snowflake is a snowflake because it must be treated as a snowflake. If it doesn't require any special treatment, it's not a snowflake. One way to eliminate the need for special treatment during deployment (as suggested by Daniel Lindner) is to install a virtual machine on the server and deploy software on the virtual machine. The actual deployment would ignore the underlying hardware and interact only with the virtual system. The virtual machine would fully insulate the deployment from any of the server's snowflake characteristics. What if it isn't practical or desirable to add an extra virtual layer? It may still be possible to handle all of the server's snowflake adjustments locally by means of scripts (or automated recipes, as Martin Fowler put it in his original Snowflake Server post) running on the target server itself. These local scripts would need to be able to recognize elements in the deployment which might require adjustments to snowflake configurations, then translate those requirements into local settings and apply them, as in the sketch below. If the elements that require local adjustments are available as part of the deployment data, the local scripts might intercept that data as the main deployment script runs. But if those elements are not obvious (if, for example, they are part of the compiled application code), it may be necessary to include a table of values which may require local adjustments as part of the deployment script (if not full de-snowflaking, at least a 99.99% de-snowflaking strategy). So, what is the bottom line on snowflake servers? In an ideal DevOps environment, they wouldn't exist. In the less-than-ideal world where real-life DevOps takes place, they can't always be eliminated, but you can still neutralize most or all of their snowflake characteristics to the point where they do not interfere with the pipeline. For more on virtual machines, see "Docker Logs vs Virtual Machine Logs."
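Here is a hedged Python sketch of the "local recipe" idea discussed above: a small script shipped alongside the deployment that looks up the current host in a table of known snowflake overrides and applies them, so the main deployment script can stay uniform. The file path, override keys and the sysctl example are hypothetical, and applying them would require appropriate privileges.

```python
import json
import pathlib
import socket
import subprocess

# Hypothetical table of per-host overrides shipped with the deployment.
OVERRIDES_FILE = pathlib.Path("/opt/deploy/snowflake-overrides.json")

def apply_local_overrides():
    """Look up this host's snowflake overrides and apply them locally."""
    host = socket.gethostname()
    overrides = json.loads(OVERRIDES_FILE.read_text()).get(host, {})
    if not overrides:
        print(f"{host}: no snowflake overrides, nothing to do")
        return
    # Example override group: kernel parameters applied via sysctl.
    for key, value in overrides.get("sysctl", {}).items():
        subprocess.run(["sysctl", "-w", f"{key}={value}"], check=True)
    # Example override group: host-specific config files written in place.
    for path, content in overrides.get("config_files", {}).items():
        pathlib.Path(path).write_text(content)
    print(f"{host}: applied {len(overrides)} override group(s)")

if __name__ == "__main__":
    apply_local_overrides()
```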

Blog

Introducing the Sumo Logic App for AWS Config

Introducing the Sumo Logic App for AWS Config: Real-Time Cloud Visibility. The best part about an AWS infrastructure is its dynamic and flexible nature: the ability to add, delete and modify resources at any time, allowing you to rapidly meet the needs of the business. However, operating and monitoring that dynamic AWS environment on a daily basis is a different story. The dynamic nature we all appreciate also presents many operating challenges. Organizations need an easy way to track changes for auditing and compliance, security investigations, and tracking system failures. Operations teams that support the AWS environment need to know what was changed in an environment, when it was changed, who made the change, and what resources were impacted by the change. Without detailed visibility, operations teams are flying blind: they don't have the information they need to manage their AWS infrastructure and be held accountable. To help you operate, manage and monitor your AWS environment and to maximize your investment, we are pleased to announce the availability of the Sumo Logic App for AWS Config. The new app enables operations and security teams to monitor an AWS infrastructure and track what is being modified and its relationship with other objects. Dashboard View: Sumo Logic App for AWS Config. The Sumo Logic App for AWS Config enables organizations to monitor resources, generate audit and compliance reports, view resource relationships, troubleshoot configurations, and discover resource modification trends. The Sumo Logic App for AWS Config is available today from the App Library. If you haven't tried Sumo Logic yourself yet, sign up for our free trial and see how you can get immediate operational visibility into your AWS infrastructure. It's free and you can get up and running in just a few minutes. To learn more about Sumo Logic's continuous intelligence for AWS, please go to www.sumologic.com/aws. I'd also love to hear about how you are using the app or supporting your AWS environment, so please feel free to send feedback directly to [email protected]. Mark, Product Marketing, Compliance & Security, Sumo Logic
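For teams that want to poke at the same change data from code, here is a hedged Python sketch using boto3's AWS Config client to check that the configuration recorder is running and to pull recent configuration history for EC2 instances. Method and field names follow the boto3 documentation as I understand it; verify them against your boto3 version, and note this queries AWS Config directly rather than the Sumo Logic app.

```python
import boto3  # pip install boto3; assumes AWS credentials are configured

config = boto3.client("config")

# Confirm the configuration recorder is actually recording changes.
status = config.describe_configuration_recorder_status()
for recorder in status["ConfigurationRecordersStatus"]:
    print(recorder["name"], "recording:", recorder["recording"])

# Walk discovered EC2 instances and print their recent configuration items,
# which carry the "what changed and when" detail the dashboards visualize.
resources = config.list_discovered_resources(resourceType="AWS::EC2::Instance")
for res in resources["resourceIdentifiers"]:
    history = config.get_resource_config_history(
        resourceType="AWS::EC2::Instance",
        resourceId=res["resourceId"],
        limit=3,
    )
    for item in history["configurationItems"]:
        print(res["resourceId"],
              item["configurationItemCaptureTime"],
              item["configurationItemStatus"])
```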

Blog

What Are Isomorphic Applications?

Having been a backend developer for my entire career, I find isomorphic applications a very new concept. When I first heard about them, it was a little difficult for me to understand. Why would you give that much control to the front end? My brain started listing off reasons why that is a terrible idea: security, debugging, and complexity are just a few of the problems I saw. But once I got past my initial flurry of closed-mindedness, I started to look into the solutions to those problems. While I am not necessarily a true believer, I can definitely see the power of isomorphic apps. So, what exactly is an isomorphic application? Well, in a nutshell, an isomorphic application (typically written in JavaScript) tries to mix the best parts of the front end ("the client") and the backend ("the server") by allowing the same code to run on both sides. In this configuration, the first request made by the web browser is processed by the server, while all remaining requests are processed by the client. The biggest advantage to this is that both the client and server are capable of performing the same operations, making the first request fast and all subsequent requests faster. A secondary (at least from my perspective) benefit is the SEO advantage that isomorphic applications provide. Because the first request is passed directly to the server, raw HTML is rendered the first time, rather than making an AJAX call and processing the response via JavaScript. This allows search engine crawlers that don't have JavaScript support to properly read the data on the page, as opposed to receiving a blank page with no text. While isomorphic applications speed things up and prevent duplication of functionality between the client and the server, there are still some risks associated with using them. A big potential problem with isomorphic apps is security. Because the client and server share so much code, you have to be especially careful not to expose API keys or application passwords in the client. Another issue, as I mentioned above, is that debugging can be significantly more difficult. This is because, instead of debugging JavaScript in the browser and PHP on the server (as an example), you are now debugging the same set of code but in potentially two places (depending on when and where the issue occurred). If you are using an isomorphic JavaScript library, such as Facebook's React.js, tools like the React Developer Tools Chrome extension can be invaluable for debugging issues on the client side. The biggest concern I have with isomorphic apps isn't necessarily a universal one: complexity. While the concept is incredibly clever, the learning curve feels like it could be pretty big. As with most frameworks and libraries, it does take practice, but because this is such a new way to structure web apps, there is a high potential for making mistakes and doing things the "wrong" way. Ultimately, I think isomorphic JavaScript has a ton of potential to make some great web applications. While it may not be perfect for every project, I think that the benefits definitely outweigh the risks. About the Author: Zachary Flower (@zachflower) is a freelance web developer, writer, and polymath. He has an eye for simplicity and usability, and strives to build products with both the end user and business goals in mind.
From building projects for the NSA to features for Buffer, Zach has always taken a strong stand against needlessly reinventing the wheel, often advocating for using well established third-party and open source services and solutions to improve the efficiency and reliability of a development project.
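To make the server-rendered first request described in the post above concrete, here is a minimal sketch, assuming a Node project with Express, React, and ReactDOM installed; the component, port, and client bundle path are my own illustrative assumptions rather than anything from the original post.

```typescript
// Minimal isomorphic-style server sketch: the same React component the browser
// would render client-side is rendered to HTML for the very first request.
import express from "express";
import React from "react";
import { renderToString } from "react-dom/server";

// A component that could be shared between the server and the client bundle.
function Greeting(props: { name: string }) {
  return React.createElement("h1", null, `Hello, ${props.name}!`);
}

const app = express();

app.get("/", (_req: express.Request, res: express.Response) => {
  // First request: the server returns real HTML, so users and crawlers see
  // content immediately instead of an empty shell plus an AJAX round trip.
  const html = renderToString(React.createElement(Greeting, { name: "visitor" }));
  res.send(
    `<!doctype html><html><body><div id="root">${html}</div>` +
      `<script src="/client-bundle.js"></script></body></html>`
  );
});

app.listen(3000, () => console.log("listening on http://localhost:3000"));
```

After this first response, a client bundle built from the same component code would take over rendering in the browser, which is the code-sharing benefit the post describes.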

Blog

Carsales Drives Confidently into the Cloud with Sumo Logic

I always love talking to customers and hearing how they’re using Sumo Logic to help solve challenges within their organizations, particularly those that are in the middle of their journey to moving workloads and data to the cloud. Without fail, I’m always surprised to learn how hard the day-to-day was for IT teams, and how, by taking a cloud-native approach to managing log data analytics, they’re able to open up a whole new world of insights and intelligence that really impacts their business. I recently spoke with one of our newest customers in the Asia Pacific region, carsales. One of Australia’s leading automotive classifieds websites (think of the equivalent of CarFax, TraderOnline or Craigslist here in the U.S.), carsales services both consumers and more than 6,000 dealers across the country. As you can imagine, the company experiences a huge amount of website traffic and processes more than 450 million searches and over 12.5 billion image downloads. As a growing enterprise, carsales had long been looking to transition from a legacy data center to the cloud. Interestingly, this journey became a priority when their executive team asked about their disaster recovery plan. “We originally started moving our infrastructure to the cloud because our site traffic varies greatly throughout the day – no one day is the same. The cloud is perfect for allowing us to adjust our footprint as necessary. It also made it easy for us to develop a solid disaster recovery plan without having to pay and manage separate data centers,” said Michael Ridgway, director of engineering for Ryvuss at carsales.com. The carsales team quickly discovered that retrieving logs manually from machines wasn’t practical, so they started looking for a log management solution. One of their non-negotiable requirements for this solution was to avoid managing any additional infrastructure or software. Since moving to Sumo Logic, the carsales team is now in the driver’s seat: they have gained operational visibility across their entire infrastructure stack and obtained new insights into application health and performance. “With Sumo Logic we’ve just scratched the surface. Our entire development team now has real-time access to our log applications and can see trending metrics over time. As a result, we can now put the power in the hands of the people who can actually fix the problem. Our average response times have decreased from hours to minutes and we can detect and resolve issues before they have the potential to impact our customers.” For more information on how carsales is getting value from Sumo Logic, check out the full case study.

Blog

Does Docker Deployment Make Bare Metal Relevant?

Is bare metal infrastructure relevant in a DevOps world? The cloud has reduced hardware to little more than a substrate for the pool of resources that is the cloud itself. Those resources are the important part; the hardware is little more than a formality. Or at least that’s been the standard cloud-vs-metal story, until recently. Times change, and everything that was old does eventually become new again — usually because of a combination of unmet needs, improved technology, and a fresh approach. And the bare-metal comeback is no different. Unmet Needs The cloud is a pool not just of generic, but also shared resources (processor speed, memory, storage space, bandwidth). Even if you pay a premium for a greater share of these things, you are still competing with the other premium-paying customers. And the hard truth is that cloud providers can’t guarantee a consistent, high level of performance. Cloud performance depends on the demand placed on it by other users — demand which you can’t control. If you need reliable performance, there is a good chance that you will not find it in the cloud. This is particularly true if you’re dealing with large databases; Big Data tends to be resource-hungry, and it is likely to do better on a platform with dedicated resources down to the bare-metal level, rather than in a cloud, where it may have to contend with dueling Hadoops. The cloud can present sticky compliance issues, as well. If you’re dealing with formal data-security standards, such as those set by the Securities and Exchange Commission or by overseas agencies, verification may be difficult in a cloud environment. Bare metal provides an environment with more clearly-defined, hardware-based boundaries and points of entry. Improved Technology Even if Moore’s Law has been slowing down to sniff the flowers lately, there have been significant improvements in hardware capabilities, such as increased storage capacity, and the availability of higher-capacity solid state drives, resulting in a major boost in key performance parameters. And technology isn’t just hardware — it’s also software and system architecture. Open-source initiatives for standardizing and improving the hardware interface layers, along with the highly scalable, low-overhead CoreOS, make lean, efficient bare metal provisioning and deployment a reality. And that means that it’s definitely time to look closely at what bare metal is now capable of doing, and what it can now do better than the cloud. A Fresh Approach As technology improves, it makes sense to take a new look at existing problems, and see what could be done now that hadn’t been possible (or easy) before. That’s where Docker and container technology come in. One of the major drawbacks of bare metal in comparison to cloud systems has always been the relative inflexibility of available resources. You can expand such things as memory, storage, and the number of processors, but the hard limit will always be what is physically available to the system; if you want to go beyond that point, you will need to manually install new resources. If you’re deploying a large number of virtual machines, resource inflexibility can be a serious problem. VMs have relatively high overhead; they require hypervisors, and they need enough memory and storage to contain both a complete virtual machine and a full operating system. All of this requires processor time as well. 
In the cloud, with its large pool of resources, it isn’t difficult to quickly shift resources to meet rapidly changing demands as virtual machines are created and deleted. In a bare-metal system with hardware-dependent resources, this kind of resource allocation can quickly run up against the hard limits of the system. Docker-based deployment, however, can radically reduce the demands placed on the host system. Containers are built to be lean; they use the kernel of the host OS, and they include only those applications and utilities which must be available locally. If a virtual machine is a bulky box that contains the application being deployed, plus plenty of packing material, a container is a thin wrapper around the application. And Docker itself is designed to manage a large number of containers efficiently, with little overhead. On bare metal, the combination of Docker, a lean, dedicated host system such as CoreOS, and an open-source hardware management layer makes it possible to host a much higher number of containers than virtual machines. In many cases, this means that bare metal’s relative lack of flexibility with regard to resources is no longer a factor; if the number of containers that can be deployed using available resources is much greater than the anticipated short-to-medium-term demand, and if the hardware resources themselves are easily expandable, then the cloud really doesn’t offer much advantage in terms of resource flexibility. In effect, Docker moves bare metal from the “can’t use” category to “can use” when it comes to the kind of massive deployments of VMs and containers which are a standard part of the cloud environment. This is an important point — very often, it is this change from “can’t use” to “can use” that sets off revolutions in the way that technology is applied (most of the history of personal computers, for example, could be described in terms of “can’t use”/”can use” shifts), and that change is generally one in perception and understanding as much as it is a change in technology. In the case of Docker and bare metal, the shift to “can use” allows system managers and architects to take a close look at the positive advantages of bare metal in comparison to the cloud. Hardware-based solutions, for example, are often the preferred option in situations where access to dedicated resources is important. If consistent speed and reliable performance are important, bare metal may be the best choice. And the biggest surprises may come when designers start asking themselves, “What can we do with Docker on bare metal that we couldn’t do with anything before?” So, does Docker make bare metal relevant? Yes, it does, and more than that, it makes bare metal into a new game, with new and potentially very interesting rules. About the Author @mazorstorn Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry, working on the prototype for the ground-breaking laser-disc game Dragon’s Lair. He spent much of the 90s in the high-pressure bundled software industry, where near-continuous release cycles and automated deployment were already de facto standards; during that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues. Source: “IBM 700 logic module” by Autopilot – Own work.
Licensed under CC BY-SA 3.0 via Commons – https://commons.wikimedia.org/wiki/File:IBM_700_logic_module.jpg#/media/File:IBM_700_logic_module.jpg

Blog

2016 Trends Impacting the Future of Machine Data Analytics

Blog

Design Like a Sumo: Harnessing the Power of Data

Sumo Logic recently hosted an Enterprise UX Meetup that I got to open with an introductory presentation on “Designing by Numbers”. Many designers think of numbers only in conjunction with quantitative methods, but hybrid options allow designers to apply numbers to qualitative methods as well. The following blog is my attempt to create a simple four-step process to extract meaning from a hybrid testing method. Results are everything When I started thinking about the presentation that turned into this blog, I thought about the similarities between the world of Sumo wrestling and the world of design. Results are ingrained in the Sumo wrestling lifestyle. Everything that these amazing athletes do from the moment that they wake up to when they go to sleep at night is driven by results. Similarly, in design, results are everything as well. At Sumo Logic, we analyze close to 50 petabytes of data and scan 175 trillion records a day! The scope of our service is really amazing. But we have to make sense of all that data. The results of how our users use our product tell us whether our design is effective or not. At the end of the day, that’s what matters. We can always say our designs are great and we love how cool they are, but like Sumo wrestlers, results drive everything we do. Data is to design as strength is to Sumo. A Sumo wrestler trains every day to gain strength. Similarly in design, the more data you can gather, the more strength you have to inform your designs. It seems obvious, but you’d be surprised by how many companies design without gathering data first. They assume they know it all, but in reality, designing without data can be pretty dangerous. For example, at Sumo Logic, we know that our product has to help our customers manage critical applications like DevOps, IT, and Security. So in that case, if our designs don’t focus on our customers’ goals, it could literally be dangerous to their—and our—bottom lines. So how do we think like a Sumo when it comes to design? 1. Know your opponent In this case, the opponent is the design problem. First, break down the problem that you’re trying to solve. This is a big challenge when you’re just starting to gather data. Often, designers may make the mistake of wanting to tackle all the problems at once, when in reality the first thing you must do is decide, what is the main problem we’re trying to solve? Next, remember that problems are attached to people. When you’re studying the problem, remember to study the person who is having the problem. Trying to solve a problem without concentrating on the person is like saying a Sumo wrestler is fighting “moves” instead of another Sumo wrestler. Finally, it’s a good rule not to measure “likes” or “hates”. These are words that make people feel like they have to react to or defend against them. And they’re not really measurable. Concentrate on the reasons for the “likes” or “hates”. Ask users, what is it exactly about that feature that you really like or hate? UX designers, Product Managers, and anyone else in the UX field need to get past vague answers and not let them sway them one way or another out of a knee-jerk reaction. Focus on the reasons your users give you, and make sure you know what you’re really measuring as you collect data to define your problem. 2. Identify high risk areas A Sumo wrestler has to know his opponent’s weaknesses in order to exploit them, but even more importantly, he needs to know his own weaknesses in order to defend against them.
High risk areas are the strikes that can knock you out. But once you identify them, you can defend against them. For our Metrics project, the UX team developed many creative deliverables, including the Customer Journey diagram below. Even if it looks a little bit like a comic strip, with it we were able to identify the high risk points in our customers’ day-to-day work, and efficiently communicate them to internal stakeholders, executives, and customers. Illustration by Nitesh Jain Often, designers confuse “high risk” with “high friction”. For example, if you have a web form that’s difficult for your customer to get through, but they can get through and then continue, that would be considered high friction, not high risk. High risk means that if your customer can’t get through that form, they’re out. Game over. So the high risk points are the main areas you want to identify, tackle, and make sure are working well before you move on in your design process. 3. Hypothesize Once you know who is having the problem and understand the high risk areas, you can prepare to strike—with a hypothesis. In the Sumo wrestling world, when the opponents take their positions and prepare to strike, it’s an elaborate setup that takes a long time before the moment of attack. This is similar to creating your hypothesis in design. But you’ve already gathered data, talked to users, and done your research. Your hypothesis is clear. Now, it’s time to attack—with creativity and testing. 4. Be creative and test Just as a Sumo wrestler has learned many different moves, there are many ways to attack a problem. In design, these require creativity and user testing, as illustrated in the following diagram. Qualitative. On one side of the spectrum you have a qualitative attack, which is usually direct contact with a person, and the results are measured in words. This method requires smaller sets of user testers. Quantitative. On the other side of the spectrum you have a quantitative attack, which is usually indirect, and the results are measured in numbers. This method requires larger sets of user testers. What People Say. On the bottom of the spectrum, you can base your attack on what people say in user testing. But often what they say they would do in a situation and what they actually do are very different. What People Do. At the top of the spectrum, you can base your attack on what people actually do in user testing. Which is the best method? It depends on the problem, the people, and the high-risk factors. You need to be creative and try different methods to determine the best way to tackle your design problem. Relying on one method is like a Sumo wrestler relying on only one move. Here’s another example. Let’s say we’re using a qualitative method to measure pizza diets. In this test, you have four different diet categories: Carnivores, Only New York pizza, Vegan, and Gluten free. From this result, you could say that the number of “Carnivores” and “Only NY” responses is just about equal. With this method, though, the words you use to describe the results are vague. For example, you could say that not many people are interested in “Gluten Free” pizza diets. To make your results more specific, you can take your qualitative results and superimpose numbers on them to get a quantitative result. Convert your results to ones and zeroes, do the math, and get actual number results.
Then you can say “Two out of six people are interested in Vegan pizza diets.” (A short tallying sketch follows at the end of this post.) Once you have quantitative results, you can go back and find out the reasons behind those answers, so they’re not just numbers. For example, one test participant says he or she is both a “Carnivore” and a “Vegan.” You might want to follow up with that person and see why they answered that way. Overall, combining these two methods is much more powerful than using only one method, like a Sumo wrestler using a combination of moves to throw his opponent off balance. Together, they can lead to a new jumping-off point to ask more detailed questions and get even better results. How do I know if I got it right? From experience, I can tell you that you’ll know when you have it right. You’ll also know when you have it wrong. There’s a lot of vagueness in between, but not on the ends of the spectrum. Ultimately, your customers will tell you when you have it right with how they respond to your product. We’ve had great responses from our Sumo Logic customers, and with your continued help, we’re making our product better all the time.
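As a small illustration of the “superimpose numbers” step referenced above, here is a sketch that tallies which diet categories each participant selected; the participants and their answers are made up for the example.

```typescript
// Turn qualitative multi-select answers into simple counts per category.
type Category = "Carnivore" | "Only New York pizza" | "Vegan" | "Gluten free";

const responses: Category[][] = [
  ["Carnivore"],
  ["Only New York pizza"],
  ["Carnivore", "Vegan"], // the participant worth a follow-up conversation
  ["Vegan"],
  ["Only New York pizza"],
  ["Carnivore"],
];

const counts = new Map<Category, number>();
for (const answer of responses) {
  for (const category of answer) {
    counts.set(category, (counts.get(category) ?? 0) + 1);
  }
}

for (const [category, count] of counts) {
  console.log(`${count} out of ${responses.length} selected "${category}"`);
}
```

The counts alone don't explain why someone selects both "Carnivore" and "Vegan"; that is where the qualitative follow-up described in the post comes back in.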

December 17, 2015

Blog

Log Analytics for Mobile Development

Blog

Shopping for an MSP That Gets Cloud? Three Essential Questions to Guide Your Search

As organizations look to move more workloads into the cloud, many have turned to Managed Service Providers (MSPs) to help with their journey. While organizations have the opportunity to choose from a vast pool of MSPs, not all MSPs are created equal when it comes to the cloud. Case in point: This past week, Sumo Logic had the privilege of hosting a panel of three “next generation MSPs” – Day1 Solutions, Logicworks, and Smartronix. The panelists, along with Kelly Hartman, who heads up the MSP Program at AWS, met at the AWS Loft in SoHo to discuss why these organizations are on the cutting edge of the cloud market transition and what that means for customers. From the discussion, I took away three key questions that I recommend IT leaders consider when determining which MSP is right for their journey to the cloud. 1. Is your MSP resting on its laurels? A key characteristic of “next generation” organizations – whether an MSP, technology vendor, customer, etc. – is their passion for constant innovation, including their laser-like focus on the latest shifts and trends that influence innovation. Thus, next generation MSPs are built on speed and agility, and they quickly adapt to relevant trends – making investments in the right tools and technologies to meet the demands of their customers. All our panelists agreed that while these changes are necessary, they can be quite painful – from restaffing and training, to new tools and service offerings – but a true next generation MSP is able to see the value in these crucial investments. This shift will bring about both an expertise in industry trends and the specific “tools of the trade.” Next generation MSPs will not only be well versed in new cloud-native tools, but will also recognize where their old tools simply won’t do the job. Jason McKay, Senior VP and CTO at Logicworks, advised that many customers aren’t fully aware of the breadth of capabilities that cloud-native tools can bring to the equation, and MSPs need to be able to educate customers on the new capabilities. This means going beyond just the new bells and whistles and delving deep into new ways of using data or monitoring applications. 2. Is your MSP a code company or a people company? Our panelists also highlighted the operating model shift of next generation MSPs. In the past it was a common selling point for MSPs to showcase their fully-staffed NOC. Now, according to Brian Clark, VP of the Managed Service Division at Day1 Solutions, the degree to which code runs infrastructure operations is the new measure of IT confidence. It’s less about “old-world” system administrators, and more about “new-world,” agile infrastructure advisors, who can aid customers on the cloud/IT strategies relevant for their businesses while code takes over the rudimentary work of keeping the infrastructure up, running and viable for end customer engagement. For example, MSPs that invest in cloud-native log management solutions will provide their customers with real-time, full-stack visibility of their entire infrastructure for managing, monitoring and troubleshooting their applications – a capability that simply didn’t exist in the past. This capability, known as continuous intelligence, enables customers to reallocate valuable IT headcount to focus on higher-value work, such as developing and delivering new business applications that can contribute to company profitability either through increasing revenue opportunities or internal process efficiencies. 3.
Is your MSP willing to tell you your baby is ugly? Early in the discussion, Paul Beda, Principal Cloud Architect and Strategist at Smartronix, chimed in to say that a next generation MSP needs to be willing to tell customers their baby is ugly. What does this mean? When entering into a relationship with an MSP, you are seeking expertise, and this means finding an MSP who is willing to take charge, who will point out the flaws in your current systems, tell you what can be altered, and what needs to be re-architected entirely (aka, the ugly baby). Next generation MSPs do more than manage infrastructures; they have the expertise and experience to optimize, automate and innovate your IT environments to ensure they’re tightly aligned to business goals and objectives. Thus, you’ll maximize your MSP investment if you leverage your MSP as a strategic advisor in addition to a service provider. The panel all agreed that the challenge here is that many customers come into the relationship saying how they want something done, as opposed to what they really want. Even if your next generation MSP is telling you your baby is ugly, their expertise is lost if you aren’t willing to listen and let them do their jobs. I believe these key questions will greatly aid your search for the right MSP partner. In addition, specific business and IT guidelines and requirements are essential, which is a point that Kelly Hartman emphasized, and AWS has recognized through its development of the Managed Service Provider Validation Checklist. AWS announced that the 3.0 version of this list will be available January 1. Sumo Logic has been working closely with the AWS Managed Service Program to help enable next-generation MSPs by providing the continuous intelligence they need to deliver on the infrastructure needs of their customers. Learn more about our cooperation with this program in this blog post.

AWS

December 16, 2015

Blog

Root Cause Analysis Best Practices

December 15, 2015

Blog

VPC Flow Logging in a DevOps Environment

Blog

The Importance of Continuous Intelligence on the Future of Full-stack System Management

Here at Sumo Logic we’ve been talking a lot about the shift to Continuous Intelligence, and how software-centric companies and traditional organizations alike are being held back by traditional IT models. A newly commissioned white paper by the Enterprise Strategy Group digs into the future of full-stack system management in the era of digital business. The author, Stephen Hendrick, Principal Analyst for Application Development and Deployment, examines the opportunity and challenge IT faces as an active participant in creating new, digital business models. “The opportunity centers on IT’s ability to create new business models and better address customer needs, while the challenge lies in its role as a disruptive force to established enterprises that underestimate the power and speed of IT-fueled change.” Digital business models are fueling the growing acceptance of cloud computing and DevOps practices, resulting in new customer applications that are transforming many traditional markets – Amazon, AWS, Airbnb, Facebook, Google, Netflix, Twitter, and Uber spring to mind as common examples of digital disruptors. However, the rise of cloud computing and continuous development and delivery practices also results in greater complexity and change within IT environments. Stephen discusses the emergence of technologies to address this trend. In addition, he introduces a Systems Management Reference model to analyze the role and relevance of continuous intelligence technologies in increasing the adaptability of full-stack system management, thereby better serving the dynamic needs of the IT infrastructure and business. Stephen concludes, “continuous intelligence brings together the best that real-time, advanced analytics has to offer by leveraging continuous real-time data to proactively support the evaluation of IT asset availability and performance within a highly secure environment. This approach reflects and is aligned with today’s modern architecture for application development and deployment, which includes microservices and immutable infrastructure.” Where does Sumo Logic fit into all of this? Quite simply, we believe Sumo Logic’s purpose-built, cloud-native, machine data analytics service was designed to deliver real-time continuous intelligence across the entire infrastructure and application stack. This in turn enables organizations to answer questions they didn’t even know they had, by transforming the velocity, variety and volume of unstructured machine data overwhelming them into rich, actionable insights that address and defuse complexity and risk.

November 20, 2015

Blog

Driving Continuous Innovation with IT Visibility

IT Operations teams are always under pressure to keep critical business applications and services running to meet the demands of the business and its customers. Given the legacy siloed monitoring tools they are stuck working with, pinpointing a problem or getting ahead of a problem before it happens is a daily challenge. Ops teams require a modern solution that integrates with their current environment and provides end-to-end visibility in a single unified view for all their applications and supporting infrastructure. When IT Operations teams don’t have this insight, the IT resources that their organization, employees and customers depend on can come to a grinding halt. So as an IT Operations professional, what can you do? How do you get the insights you need to keep the servers running, the cloud available and applications responding, all while avoiding any negative business impact? The answer is simple: with Sumo Logic. Sumo Logic is the cloud-native service that helps deliver visibility into the operation of an organization’s IT infrastructure. By ingesting the pool of machine data generated within the IT environment, Sumo Logic easily delivers critical performance, availability, configuration, capacity, and security insights to help you monitor and support your critical business apps and underlying IT infrastructure. The Sumo Logic service model advantage lets you:
- Eliminate monitoring silos by merging all your application and IT infrastructure data
- Discover meaningful patterns in your IT data and resolve issues faster
- Detect abnormalities in the performance and usage of your applications and IT systems
Sumo Logic offers some of the most comprehensive native integrations for IT to modernize operations and deliver superior service to the business and its customers. If you haven’t tried Sumo Logic yourself yet, sign up and see how you can get immediate operational visibility of your IT infrastructure. Not only that, it’s free and you can get up and running in just a few minutes. I’d love to hear about how you’re monitoring your IT infrastructure, so please feel free to send feedback directly to [email protected]. Manish, Director, Product Marketing, Sumo Logic

November 18, 2015

Blog

Joan Pepin Receives a Stevie Award for “Female Executive of the Year”

We’re very excited to announce that Sumo Logic’s CISO and VP of Security, Joan Pepin, was recognized by the Stevie® Awards for Women in Business as a Female Executive of the Year in the Business Services category. The final results of the 12th annual Stevie Awards for Women in Business were announced this week at a dinner in New York City. Joan has been an outspoken advocate for cultivating a diverse workforce, particularly around bringing more women into the security fold. She’s led by example through her own accomplishments in the field – from developing methodologies for secure systems, to assessing whether a network is undergoing an attack, to inventing SecureWorks’ Anomaly Detection Engine and Event Linking technologies. Joan got her start in the field by parlaying her knack for hacking with her college friends into consultancy gigs. Her technical chops, tenacity, passion and “forward-thinking empathy” approach to leadership led to management roles where she drove policy management, security metrics and incident response initiatives at SecureWorks. Since joining Sumo four years ago, Joan’s been more than an executive – she’s been a foundational anchor. As one of our first employees, Joan ensured that data security was integral to the product and a high priority for every engineer and architect that worked on it. Thanks to Joan, security is baked into our culture too, meaning that everyone understands their roles in protecting company data and takes it seriously. The data security of Sumo Logic is so well-respected that Joan regularly consults with customers on compliance and strategies for successfully meeting the ever-evolving security standards. She’s also helped spearhead the cloud-first approach that’s become a critical differentiation point for our business. It’s safe to say that Joan’s come a long way from hacking in her college dorm. We’re honored to celebrate Joan’s recognition as an industry leader and advocate for women in technology.

November 18, 2015

Blog

What is Full Stack Deployment?

Full stack deployments are a relatively new concept to me. At first, I was confused as to why you would redeploy the entire stack every time, rather than just the code. It seems silly, right? My brain was stuck a little in the past, as if you were rebuilding a server from scratch on every deployment. Stupid. But what exactly is full stack deployment, and why is it better than “traditional” code-only deployment? Traditionally, deployments involve moving code from a source code repository into a production environment. I know this is a simplistic explanation, but I don’t really want to get into unit testing, continuous integration, migrations, and all the other popular buzzwords that inhabit the release engineering ethos. Code moves from Point A to Point B, where it ends up in the hands of the end user. Not much else changes along that path. The machine, operating system, and configurations all stay the same (for the most part). With full stack deployments, everything is redeployed. The machine (or, more accurately, the virtual machine) is replaced with a fresh one, the operating system is reprovisioned, and any dependent services are recreated or reconfigured. These deployments are often handled in the form of a freshly configured server image that is uploaded and then spun up, rather than starting and provisioning a server remotely. While this might sound a little like overkill, it is actually an incredibly valuable way to keep your entire application healthy and clean. Imagine if you were serving someone a sandwich for lunch two days in a row. You would use a different plate for today’s sandwich than you used for yesterday’s sandwich, wouldn’t you? Maintaining a consistent environment from development to production is also a great way to reduce the number of production bugs that can’t be reproduced in a development environment. With the rise of scalable micro-hosting services like Amazon EC2, this mindset has already taken hold a bit to facilitate increased server and network load. As the site requires more resources, identical server images are loaded onto EC2 instances to handle the additional load, and then are powered back down when they’re no longer needed. This practice is also incredibly valuable for preventing issues that can crop up with long-running applications, especially across deployments. Technologies like Docker do a good job of encouraging isolation of different pieces of an application by their function, allowing them to be deployed as needed as individual server images. As Docker and other similar services gain support, I think we will start to see a change in the way we view applications. Rather than being defined as just code, applications will be defined as a collection of isolated services.

Blog

A Strategy for Container Adoption

Blog

IT teams: How to remain an organizational asset in this 'digital or die' era

Insights from a former enterprise CIO and now chief operations officer for a startup This blog was contributed by our customer friends at Spark Ventures. It was written by Peter Yates (@peteyatesnz), who is the head of operations and platform delivery. When IT does not break, it’s all good, but when things go wrong it’s all hands to the pump. IT should not just be about keeping the lights on anymore; while that may have been a strategy in the past, it certainly will not be good enough in this Digital era. Organizations may require IT to guide them through periods of significant change or lead Digital strategies and innovation. IT must therefore be an enabler and leader of change and must ensure it can respond to the current and future needs of the organization by being flexible and agile. So how can IT achieve this? If IT cannot get to grips with this approach then we may see a proliferation of shadow IT, or IT being bypassed by the organization in favor of advice from outside influences that can help the Organization consistently respond to and meet its business objectives. For IT to be an organizational enabler and leader of change, IT should: 1. Define a clear strategy What does IT stand for and how will it support organizational goals? How will it use the cloud and automation? How can IT support the Organization’s need to be more Digital, in a world where the need to be Digital or die is so prevalent? As Forbes (Cloud is the foundation for Digital Transformation, 2014) has recognized: "Since 2000, 52 per cent of companies in the Fortune 500 have either gone bankrupt, been acquired or ceased to exist". The reason for this, in my view, is that Organizations have failed to keep up with the constant rate of change. 2. Be focused What does IT see as its core business? Where will it add the most value to the organization? By having this clearly defined within IT strategy, as part of supporting a wider organizational strategy, the answers to these questions will help clarify the most suitable technology solutions, in essence creating some guiding principles for making architectural or technology decisions. For example, an internal IT team within an innovation venture (as part of a leading telecommunications company) may decide that managing an email service is not a core service because there are cloud solutions such as Office 365 or Gmail that can be consumed without the operational overhead of a traditional email service. Read more: Challenges arise as big data becomes the ‘new normal’ 3. Get the foundations right Ensuring the IT basics are done correctly (e.g. monitoring, network and application stability/availability) provides the building blocks for creating credibility and stability. Without this in place an Organization may, for example, have the best apps on the market, but they will be constantly unavailable and unusable by the Organization and its customers. Poor foundations mean that supporting organizational growth will be hard for an IT team to achieve. Getting the foundations right needs to be done in conjunction with setting a clear strategy and being focused on what is core to IT and ultimately the organization. 4. Deliver If you can’t deliver on projects, service levels or advice in general then you risk losing the trust of the Organization and, more than likely, IT and the CIO/CDO will be overlooked when the executive team seeks advice.
If you cannot consistently and quickly deliver to the needs of the Organization then you may see a proliferation of shadow IT within the company, again a possible sign that IT is not being agile or responsive enough to meet the needs of the organization. Above all, get the basics right so IT can build on solid technology decisions and solutions that support the Organization and its strategies (growth or otherwise). If IT can't deliver in its current guise, it must look at ways to enable this, such as creating a separate innovation team that is not constrained by legacy, as has been shown by Spark New Zealand (Spark Ventures), New Zealand Post, Fletcher Building or Air New Zealand. Read more: CIO Upfront: Is there such a thing as bad innovation? 5. Stay current and relevant It is vital that IT stays up to date with industry and technology trends (Cloud, IoT, Digital, SaaS) and can demonstrate, or at least has a view on, how these can be utilized by the organization both now and in the future. Being relevant and current reduces an organization's need to look elsewhere for advice and technology solutions. Staying current could also mean a review of how IT is structured, a CIO versus a CDO, or a less siloed approach to team structure. Digital is not one particular “thing” - it’s also a change in mind-set and a move away from the traditional, towards an organisation’s combined use of social media and analytics to drive decisions, particularly around its customers. Being “Digital” is also about an organisation’s use of the "cloud" (SaaS, AWS or Azure) and having a mobile presence for its products, services and support. The strategies and subsequent use of social media, analytics, mobility and cloud by any organisation must coexist. For example, it is not useful to just have a mobility strategy without the customer analytics behind it, or without the ability for an Organisation’s customers to tweet or comment through the Organisation’s application from any device. If IT focuses on the above five key areas it can remain relevant and be an enabler, with the organisation achieving its goals and strategies alongside (rather than having to go around) IT. In this Digital era it’s not only consumers that consume - organisations are looking at options within this "consumption economy" as a way of focusing on core business and consuming the rest. Some great examples of this are Salesforce (CRM), Zuora (Billing), Remedyforce (Digital Enterprise Management), Box (document storage), Sumo Logic (Data Analytics) or Office 365 and Gmail (Collaboration). Not going digital is really not an option for many organisations, especially if they still want to be loved by their customers and want to remain agile so they can respond to, or even lead, market changes. Read more: CIO to COO: Lessons from the cloud A quote from former GE CEO Jack Welch sums up nicely why IT needs to support and/or lead an organisation’s change programmes: “If the rate of change on the outside exceeds the rate of change on the inside, the end is near.” Related: The State of the CIO 2015: The digital mindshift Peter Yates (@peteyatesnz) is head of operations and platform delivery at Spark Ventures (formerly Telecom Digital Ventures). His previous roles included technology services group manager/CIO at Foster Moore and IS infrastructure manager at Auckland Council.

Azure

November 10, 2015

Blog

Sumo Logic Takes Center Stage at PCI Europe Community Meeting

Back on August 19, 2015, we announced that Sumo Logic had joined the Payment Card Industry (PCI) Security Standards Council (SSC) as a participating organization, and is also an active member in the “Daily Log Monitoring” Special Interest Group (SIG). The purpose of the SIG, and the primary reason we joined, is to provide helpful guidance and techniques to organizations on improving daily log monitoring and forensic breach investigations to meet PCI Data Security Standard (DSS) Requirement 10. Organizations face many challenges in dealing with PCI DSS Requirement 10, including but not limited to large volumes of log data, distinguishing between what is a security event and what is normal, correlating log data from disparate systems, and meeting the stated frequency of manual log reviews. It was with great honor that the chair of this SIG, Jake Marcinko, Standards Manager at the PCI SSC, asked us to co-present with him on stage at the PCI European Community Meeting in Nice, France. Over 500 people came from all over Europe – banks, merchants, card brands, qualified security assessors (QSAs), penetration testers, certified information systems auditors (CISAs), and vendors – for a packed three days of education, networking, discussions and, of course, good food! To provide some context and background – and part of the raison d’être for this SIG coming to fruition – when looking anecdotally at past data breaches, evidence has often been found in merchant logs. However, the details were extremely difficult to find due to the high volume of logged events. And although log collection and daily reviews are required by the PCI DSS, logs collected from merchants can be huge, with some organizations seeing over 50,000 events per second at the peak of the day. This makes it time consuming and often difficult – if not humanly impossible – to accurately review and monitor those logs to meet the intent of PCI DSS. This is akin to finding the needle in the haystack, where the needle is the security event, and the haystack is the corresponding logs and data packets. According to Mandiant’s annual M-Trends Report, the median number of days before a breach is detected is 205 days. Why is this the case? Because existing security technologies are struggling to keep up with modern day threats. The fixed rule sets we see across SIEM solutions are great if you know what you are looking for, but what happens when we do not know what to look for, or when we do not even know the right questions to ask? So what does this all mean? Is there hope, or are we destined to continue along with the dismal status quo? Luckily, there are new cloud-native, advanced security solutions emerging that leverage data science to help us look holistically across our hybrid infrastructure and give us visibility across the entire stack, leveraging machine learning to reduce millions of data streams into human-digestible patterns and security events, and to establish what is normal by baselining and automatically identifying and alerting on anomalies and deviations. It is these continuous insights and visibility across hybrid workloads that become real opportunities to improve one’s security posture and approach compliance with confidence and clarity. Timelines and Deliverables The Information Supplement with the Daily Log Monitoring SIG guidance is expected to be released in Q1 2016.

Blog

Docker Engine for Windows Server 2016

Until recently, deploying containers on Windows (or on Microsoft’s Azure cloud) meant deploying them on a Linux/UNIX VM managed by a Windows-based hypervisor. Microsoft is a member of the Open Container Initiative, and has been generally quite supportive of such solutions, but they necessarily required the added VM layer of abstraction, rather than being native Windows containers. If you wanted containers, and if you wanted Docker, at some point you needed Linux or UNIX. Microsoft’s much anticipated Technical Preview of the Docker Engine for Windows Server is out. Let’s look at some of the differences from Docker on Linux. First, though, what is the difference between a virtual machine and containerization? If you’re reading this, you probably know the answer in detail, but we’ll do a quick run-through anyway. Containerization From the Past A virtual machine (or VM) is a complete computer hardware layer (CPU, RAM, I/O, etc.) abstracted to software, and running as if it were a self-contained, independent machine within the host operating system. A VM is managed by a hypervisor application (such as VirtualBox or VMware) running on the host system. The VM itself has an operating system that is completely independent of the host system, so a virtual machine with a Linux OS can run on Windows, or vice versa. While a container may look like a virtual machine, it is a very different sort of thing when you look under the hood. Like a VM, a container provides a self-contained, isolated environment in which to run a program. A container, however, uses many of the resources of the host system. At the same time, the applications within the container can’t see or act outside of the bounds of the container. Since the container uses so many of the host system’s basic resources, the container’s OS is essentially the same as the host OS. While VM hypervisors are available for most operating systems (allowing an individual instance of a VM to be essentially machine-independent), containers developed largely out of the Linux/UNIX world, and have been closely tied to Linux and UNIX systems. Docker has become the de facto standard for deploying and managing containers, and Docker itself is native to the Linux/UNIX world. Windows Server 2016 and the New Hyper-V Enter Windows Server Containers, and Windows Server 2016. Windows Server Containers are native Windows containers running on Windows Server 2016, and they come with a Windows implementation of Docker. Now, if you want containers and you want Docker, you can have them directly on Windows. But what does this mean in practice? First and foremost, a Windows Server Container is exactly what the name implies — a Windows container. Just as a Linux-based container makes direct use of the resources of the underlying operating system and is dependent on it, a Windows Server Container uses Windows resources, and is dependent on Windows. This means that you can’t deploy a Windows Server Container directly on a Linux/UNIX system, any more than you can deploy a Linux container directly on a Windows system. (Note: Initially, system resources used in a Windows Server Container instance must exactly match those used by the host system in terms of version number, build, and patch; since Windows Server Containers are still in the technical preview stage, however, this may change.) Microsoft has added a bonus: along with Windows Server Containers, which are standard containers at heart, it is also offering a kind of hybrid container, which it calls a Hyper-V container.
A Hyper-V container is more like a standard virtual machine, in that it has its own operating system kernel and memory space, but in other ways, it is dependent on system resources in the manner of a typical container. Microsoft says that the advantage of Hyper-V containers is that they have greater isolation from the host system, and thus more security, making them a better choice for situations where you do not have full control over what’s going to be going on within the containers. Hyper-V containers can be deployed and managed exactly like Windows Server Containers. How they Compare So, then, is Windows Docker really Docker? Yes, it is. Microsoft has taken considerable care to make sure that all of the Docker CLI commands are implemented in the Windows version; Windows Server Containers (and Hyper-V containers) can be managed either from the Docker command line or the PowerShell command line. Now for some of the “mostly, but not 100%” part: Windows Server Containers and Docker containers aren’t quite the same thing. You can use Docker to create a Windows Server Container from an existing Windows Server Container image, and you can manage the new container using Docker. You can also create and manage Windows Server Containers from PowerShell, but if you provision a container with PowerShell it cannot be managed directly with the Docker client/server and vice versa. You must stick with one provisioning method. (Microsoft, however, has indicated that this may change.) In many ways, these are the complications that you would expect when porting a conceptually rather complex and platform-dependent system from one OS to another. What’s more important is that you can now deploy containers using Docker directly on Windows. It may not be a threat to Linux, but it does keep Microsoft in the game at a time when that game has been shifting more and more towards open-source and generally Linux/UNIX-based DevOps tools. So to answer the question, “Are Windows Server Containers really Docker?” — they are as much Docker as you could reasonably expect, and then some. They are also definitely Windows containers, and Microsoft to the core. Docker Engine for Windows Server is not yet GA – If you’re currently running apps from Docker containers running on Linux, check out the Docker Log Analyzer from Sumo Logic.

Blog

Log Analysis in a DevOps Environment

Log analysis is a first-rate debugging tool for DevOps. But if all you’re using it for is finding and preventing trouble, you may be missing some of the major benefits of log analysis. What else can it offer you? Let’s talk about growth. First of all, not all trouble shows up in the form of bugs or error messages; an “error-free” system can still be operating far below optimal efficiency by a variety of important standards. What is the actual response time from the user’s point of view? Is the program eating up clock cycles with unnecessary operations? Log analysis can help you identify bottlenecks, even when they aren’t yet apparent in day-to-day operations. Use Cases for Log Analysis Consider, for example, something as basic as database access. As the number of records grows, access time can slow down, sometimes significantly; there’s nothing new about that. But if the complexity and the number of tables in the database are also increasing, those factors can also slow down retrieval. If the code that deals with the database is designed for maximum efficiency in all situations, it should handle the increased complexity with a minimum of trouble. The tricky part of that last sentence, however, is the phrase “in all situations”. In practice, most code is designed to be efficient under any conditions which seem reasonable at the time, rather than in perpetuity. A routine that performs an optional check on database records may not present any problem when the number of records is low, or when it only runs occasionally, but it may slow the system down if the number of affected records is too high, or if it is done too frequently. As conditions change, hidden inefficiencies in existing code are likely to make themselves known, particularly if the changes put greater demands on the system. As inefficiencies of this kind emerge (but before they present obvious problems in performance) they are likely to show up in the system’s logs. As an example, a gradual increase in the time required to open or close a group of records might appear, which gives you a chance to anticipate and prevent any slowdowns that it might cause. Log analysis can find other kinds of potential bottlenecks as well. For example, intermittent delays in response from a process or an external program can be hard to detect simply by watching overall performance, but they will probably show up in the log files. A single process with significant delays in response time can slow down the whole system. If two processes are dependent on each other, and they each have intermittent delays, they can reduce the system’s speed to a crawl or even bring it to a halt. Log analysis should allow you to recognize these delays, as well as the dependencies which can amplify them. (A small sketch of this kind of delay check appears below.) Log Data Analytics – Beyond Ops Software operation isn’t the only thing that can be made more efficient by log analysis. Consider the amount of time that is spent in meetings simply trying to get everybody on the same page when it comes to discussing technical issues. It’s far too easy to have a prolonged discussion of performance problems and potential solutions without the participants having a clear idea of the current state of the system. One of the easiest ways to bring such a meeting into focus and shorten discussion time is to provide everybody involved with a digest of key items from the logs, showing the current state of the system and highlighting problem areas.
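Following up on the intermittent-delay point above, here is a small sketch of the kind of threshold check a log analysis pass might perform; the log entries, field names, and threshold are made-up illustrations, not a prescribed implementation.

```typescript
// Flag intermittent slow responses in a simple structured log: the kind of
// signal that is easy to miss when watching only aggregate performance.
interface LogEntry {
  timestamp: string;
  process: string;
  responseMs: number;
}

// Made-up sample entries; in practice these would be parsed from log files.
const entries: LogEntry[] = [
  { timestamp: "2015-11-01T10:00:01Z", process: "billing", responseMs: 40 },
  { timestamp: "2015-11-01T10:00:02Z", process: "billing", responseMs: 2400 },
  { timestamp: "2015-11-01T10:00:03Z", process: "catalog", responseMs: 55 },
  { timestamp: "2015-11-01T10:00:04Z", process: "billing", responseMs: 38 },
];

const SLOW_THRESHOLD_MS = 1000;

for (const e of entries.filter((entry) => entry.responseMs >= SLOW_THRESHOLD_MS)) {
  console.log(
    `[${e.timestamp}] ${e.process} responded in ${e.responseMs} ms ` +
      `(threshold ${SLOW_THRESHOLD_MS} ms)`
  );
}
```

In a real system the entries would be parsed from the actual log files and the threshold tuned to each service's normal behavior.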
Log analysis can also be a major aid to overall planning by providing a detailed picture of how the system actually performs. It can help you map out which parts of the system are the most sensitive to changes in performance in other areas, allowing you to avoid making alterations which are likely to degrade performance. It can also reveal unanticipated dependencies, as well as suggesting potential shortcuts in the flow of data. Understanding Scalability via Log Analysis One of the most important things that log analysis can do in terms of growth is to help you understand how the system is likely to perform as it scales up. When you know the time required to perform a particular operation on 100,000 records, you can roughly calculate the time required to do the same operation with 10,000,000 records (a rough back-of-the-envelope sketch of this kind of estimate follows this post). This in turn allows you to consider whether the code that performs the operation will be adequate at a larger scale, or whether you will need to look at a new strategy for producing the same results. Observability and Baseline Metrics A log analysis system that lets you establish a baseline and observe changes to metrics in relation to that baseline is of course extremely valuable for troubleshooting, but it can also be a major aid to growth. Rapid notification of changes in metrics gives you a real-time window into the way that the system responds to new conditions, and it allows you to detect potential sensitivities which might otherwise go unnoticed. In a similar vein, a system with superior anomaly detection features will make it much easier to pinpoint potential bottlenecks and delayed-response cascades by alerting you to the kinds of unusual events which are often signatures of such problems. All of these things — detecting bottlenecks and intermittent delays, as well as other anomalies which may signal future trouble, anticipating changes in performance as a result of changes in scale, recognizing inefficiencies — will help you turn your software (and your organization) into the kind of lean, clean system which is so often necessary for growth. And all of these things can, surprisingly enough, come from something as simple as good, intelligent, thoughtful log analysis.
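As a back-of-the-envelope illustration of the scaling estimate mentioned above, the sketch below extrapolates a measured operation time from 100,000 to 10,000,000 records under an assumed complexity class; the measured time and the complexity classes are assumptions for the example.

```typescript
// Rough extrapolation of an operation's runtime at a larger scale, given its
// measured time at a smaller scale and a guess at its complexity class.
type Complexity = "linear" | "nlogn" | "quadratic";

function estimateAtScale(
  measuredMs: number, // measured time at baselineN records
  baselineN: number,
  targetN: number,
  complexity: Complexity
): number {
  const ratio = targetN / baselineN;
  switch (complexity) {
    case "linear":
      return measuredMs * ratio;
    case "nlogn":
      return measuredMs * ratio * (Math.log(targetN) / Math.log(baselineN));
    case "quadratic":
      return measuredMs * ratio * ratio;
  }
}

// Example: a query that takes 250 ms over 100,000 records.
for (const c of ["linear", "nlogn", "quadratic"] as const) {
  const ms = estimateAtScale(250, 100_000, 10_000_000, c);
  console.log(`${c}: roughly ${(ms / 1000).toFixed(1)} s at 10,000,000 records`);
}
```

Estimates like these are only as good as the complexity assumption, which is exactly why it helps to confirm them against what the logs actually show as the data grows.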

Blog

A Better Way to Analyze Log Files on the Command Line

Sumo Logic makes it easy to aggregate and search terabytes of log data. But you don’t always have terabytes of data on 1000s of servers. Sometimes you have just a few log files on a single server. We’re open sourcing Sumoshell, a set of tools recently created at a hackathon, to help fill that gap. Getting real value from your logs requires more than finding log lines that match a few keywords and paging through (a la tail/grep/less) — you need parsing, transforming, aggregating, graphing, clustering (and more). All these things are easy to do in Sumo Logic, but they’re hard to do with the standard set of Unix command line utilities people usually use to analyze logs. Sumoshell is a set of command line utilities to analyze logs. Its goal is to bring Sumo Logic’s log analysis power to the command line. Here’s an example of Sumoshell parsing tcpdump’s output to show the IP addresses that my laptop is sending data to, and the total amount of data sent to each host. The tcpdump output looks like this:

23:25:17.237834 IP 6.97.a86c.com.http > 10.0.0.6.53036: Flags [P.], seq 33007:33409, ack 24989, win 126, options [TS], length 2
23:25:17.237881 IP 10.0.0.6.53036 > 6.97.a86c.com.http: Flags [.], ack 2, win 4096, options [nop], length 0
23:25:17.237959 IP 10.0.0.6.53036 > 6.97.a86c.http: Flags [P.] options [nop,nop,TS val 1255619794 ecr 249923103], length 6

The Sumoshell command is:

sudo tcpdump 2>/dev/null | sumo search | sumo parse "IP * > *:" as src, dest | sumo parse "length *" as length | sumo sum length by dest | render

The Sumoshell query language supports an adapted subset of the Sumo Logic query language, utilizing Unix pipes to shuttle data between operators. The output is an aggregated, live-updating view in the terminal. Some other helpful features of Sumoshell: Sumoshell understands that multiline log messages are one semantic unit, so if you search for Exception, you get the entire stack trace. Sumoshell lets you parse out pieces of your logs to just print the bits you care about or to use later in aggregations or transformations. Once you’ve parsed out fields like status_code or response_time_ms, you can count by status_code or average response_time_ms by status_code. If you wanted to do this for your weblogs, you could do something like:

tail -f /var/log/webserver/http.log | sumo search "GET" | sumo parse "[status=*][response_time=*]" as stat, rt | sumo average rt by stat | render

Once you’ve parsed fields, or aggregated the results with sum, count, or average, Sumoshell comes with intelligent pretty-printers to clearly display the aggregate data on the command line. They know how wide your terminal is so text won’t wrap and be hard to read. They figure out how many characters individual fields have, so the columns line up. They even let you see live updating graphs of your data, all in your terminal. You can learn more about Sumoshell at the GitHub repository, where you can also download binaries, see the source, and contribute your own operators. If Sumoshell helps you analyze logs on one server, consider trying out Sumo Logic to use even more powerful tools on your entire fleet.

October 30, 2015

Blog

Change Management, the ChatOps Way

All changes to production environments at Sumo Logic follow a well-documented change management process. While generally a sound practice, it is also specifically required for PCI, SOC 2, HIPAA, ISO 27001 and CSA Star compliance, amongst others. Traditional processes never seemed like a suitable way to implement change management at Sumo Logic. Even a Change Management Board (CMB) that meets daily is much too slow for our environment, where changes are implemented every day, at any time of the day. In this blog, I’ll describe our current solution, which we have iterated towards over the past several years. The goals for our change management process are that: Anybody can propose a change to the production system, at any time, and anybody can follow what changes are being proposed. A well-known set of reviewers can quickly and efficiently review changes and decide on whether to implement them. Any change to production needs to leave an audit trail to meet compliance requirements.

Workflow and Audit Trail

We used Atlassian JIRA to model the workflow for any System Change Request (SCR). Not only is JIRA a good tool for workflows, but we also use it for most of our other bug and project tracking, making it trivial to link to relevant bugs or issues. Here’s what the current workflow for a system change request looks like: A typical system change request goes through these steps: Create the JIRA issue. Propose the system change request to the Change Management Board. Get three approvals from members of the Change Management Board. Implement the change. Close the JIRA issue. If the CMB rejects the change request, we simply close the JIRA issue. The SCR type in JIRA has a number of custom fields, including: Environments to which the change needs to be applied Schedule date for the change Justification for the change (for emergency changes only) Risk assessment (Low/Medium/High) Customer facing downtime? Implementation steps, back-out steps and verification steps CMB meeting notes Names of CMB approvers These details allow CMB members to quickly assess the risk and effects of a proposed change.

Getting to a decision quickly

To get from a proposal to an approved change in the most expedient manner, we have a dedicated #cmb-public channel in Slack. The typical sequence is: Somebody proposes a system change in the Slack channel, linking to the JIRA ticket. If needed, there is a brief discussion around the risk and details of the change. Three of the members of the CMB approve the change in JIRA. The requester or on-calls implement the change and mark the SCR implemented. In the past, we manually tied together JIRA and Slack, without any direct integration. As a result, it often took a long time for SCRs to get approved, and there was a good amount of manual leg work to find the SCR in JIRA and see the details.

Bender to the rescue

In order to tie together the JIRA and Slack portions of this workflow, we built a plugin for our sumobot Slack bot. In our Slack instance, sumobot goes by the name of Bender Bending Rodriguez, named for the robot in Futurama. As engineers and CMB members interact with an SCR, Bender provides helpful details from JIRA. Here’s an example of an interaction: As you can see, Bender listens to messages containing both the word “proposing” and a JIRA link. He then provides a helpful summary of the request. As people vote, he checks the status of the JIRA ticket, and once it moves into the Approved state, he lets the channel know.
Additionally, he posts a list of currently open SCRs into the channel three times a day, to remind CMB members of items they still need to decide on. The same list can also be manually requested by asking for “pending scrs” in the channel. Since this sumobot plugin is specific to our use case, I have decided not to include it in the open source repository, but I have made the current version of the source code available as part of this blog post here.
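For readers curious what such a plugin boils down to, here is a heavily simplified, hypothetical sketch in the spirit of the sumobot plugin described above; the regex, the lookup helper and the message format are illustrative only and are not the actual Sumo Logic implementation or the sumobot API.

```scala
// Hypothetical sketch: watch a channel for "proposing <JIRA link>" messages
// and reply with a summary of the referenced SCR.
object ScrAnnouncerSketch {

  // Matches messages like: "proposing https://jira.example.com/browse/SCR-1234"
  private val Proposal = """(?s).*\bproposing\b.*?(https?://\S+/browse/(SCR-\d+)).*""".r

  // Stand-in for a real JIRA client call; a real bot would fetch the issue
  // (summary, risk, schedule date) via JIRA's REST API.
  def lookupSummary(issueKey: String): String =
    s"[$issueKey] <summary, risk and schedule pulled from JIRA would go here>"

  // Called for every message seen in the #cmb-public channel.
  def onMessage(text: String): Option[String] = text match {
    case Proposal(url, key) => Some(s"New SCR proposed: ${lookupSummary(key)} ($url)")
    case _                  => None
  }

  def main(args: Array[String]): Unit = {
    println(onMessage("proposing https://jira.example.com/browse/SCR-1234 for tonight's deploy"))
    println(onMessage("lunch anyone?"))
  }
}
```

The real plugin also tracks approvals and posts periodic reminders, but the core of it is exactly this kind of pattern match on channel traffic plus a JIRA lookup.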

Blog

Stateful traversal and econometric simulation: together at last!

Functional programming concepts such as the State monad and the Traversable type class can be powerful tools for thinking about common software engineering problems. As with any abstractions, these ideas may seem unfamiliar or opaque in the absence of concrete examples. The goal of this post is therefore to motivate and contextualize a “stateful traversal” design pattern by working through the specific problem of generating synthetic data points drawn from a simple statistical model.

AR(p) modeling

The p-th order linear autoregressive model AR(p) assumes that data point $$y_t$$ is generated according to the equation $$y_t = \sum_{i=1}^{p} a_i\, y_{t-i} + \epsilon_t,$$ where the $$a_i$$ are the fixed autoregressive coefficients and $$\epsilon_t$$ are the iid Gaussian noise terms. This is a basic and foundational model in econometrics and time series analysis.

Data generation

Say we have some model estimation code which we would like to sanity test against synthetic data points generated by a known model. First, we note that the $$\epsilon$$ terms are, by definition, totally independent of anything else. We can therefore simply sample $$N$$ of these to populate an $$\epsilon$$ vector. The next step is a bit more involved. While we could compute each $$y_t$$ independently using only the previous noise terms $$\epsilon$$, this calculation would be $$O(N^2)$$ and a bit awkward. A more intuitive approach might be to sweep a pointer over the $$\epsilon$$ array and our (to-be-populated) $$y$$ array, using pointer/index arithmetic to select the appropriate previous terms for our calculation of each $$y_t$$. Basic sliding window model. This simple diagram shows how we can slide a function $$f$$ with fixed coefficients $$a$$ over our previously generated $$y$$ and iid $$\epsilon$$. An unfortunate aspect of this design is that it mixes inputs and outputs in a way that could easily lead to programming errors. Also, the pointer arithmetic trick couples us to this exact problem and somewhat obscures what we’re really doing: threading the dynamic evolution of some state (here, the trailing window) through the course of our data-generating process. To make the latter point more concrete, consider the bookkeeping complexity that would be entailed by the extension of this approach to models with a richer notion of state, such as an ARIMA model or an Infinite Hidden Markov Model. This motivates an alternative implementation which explicitly models the evolution of some state as we generate our data.

State-threading approach

The idea here is that the AR(p) equation above tells us how to get a mapping from some state of type S (window $$[y_{t-p},\ldots,y_{t-1}]$$) and an input type I (noise $$\epsilon_t$$) to a new output of type O (output $$y_t$$) and a new state of type S (new window $$[y_{t-(p-1)},\ldots,y_t]$$). As a Scala type signature this would look something like (S,I) => (S,O). Basic state-based model. The above diagram shows the simplest version of this idea, where $$g$$ is nearly identical to $$f$$, except with an additional output for the new state S. We can simplify this a bit via the concept of partial application. We note that $$a$$ is fixed across all evaluations of our function, so we can fix that parameter to get $$g(y,\epsilon_t)$$, as shown in this diagram. Model with coefficient parameters fixed. Finally, we combine partial application with the functor property of our $$\epsilon$$ sequence by mapping partial application of our function over $$\epsilon$$ to get separate functions $$g_t(y)$$ for each individual position $$t$$.
Our function now simply maps from a previous window S to a new window S and an output value O, as shown in this diagram (per-t state-based model). This re-arrangement has stripped our sequential data-generating computation down to its essence: given a sequence of iid noise terms $$\epsilon$$, we almost directly encode our mathematical definition as a function which takes a sliding window state S as its input, and returns a computed value O and a new sliding window state S as output, therefore having type signature S => (S,O).

Plug-and-play

All that remains is to construct the plumbing necessary to put it all together. As luck would have it, there is a common functional programming idiom for traversing a data structure while simultaneously accumulating state and transforming individual elements based on that state. This allows us to simply supply our transformed functions $$g_t$$ along with an initial window state (say, all zeros).

```scala
case class AR(coeff: List[Double]) {
  type Coeff = List[Double]
  type Window = List[Double]

  // Assuming Coeff supplied as [..., a_{t-2}, a_{t-1}] and
  // likewise, Window supplied as [..., y_{t-2}, y_{t-1}]
  private def g(a: Coeff, noise: Double, w: Window): (Window, Double) = {
    val yt = (a zip w).map { case (x, y) => x * y }.sum + noise
    (w.tail :+ yt, yt)
  }

  def generate(noise: List[Double]): List[Double] = {
    val initWindow = coeff.map(_ => 0.0)        // Init window = all zeros
    val gs = noise.map((g _).curried(coeff))    // One S => (S, O) for each noise input
    StatefulGeneration.generate(initWindow, gs) // ???
  }
}
```

This is somewhat remarkable. The AR(p) equation tells us how to go from a state (the window of previous $$y_t$$) and an input ($$\epsilon_t$$) to a new state (the new window) and an output (the new $$y_t$$), and we can directly plug exactly this information into some generic machinery to achieve our desired result. So how does it work?

State monad

Clearly we’ve buried some crucial mechanics – let’s take a look at StatefulGeneration.generate():

```scala
object StatefulGeneration {
  def generate[G[_] : Traverse : Functor, S, O](init: S, gs: G[S => (S, O)]): G[O] = {
    val (traverse, functor) = (implicitly[Traverse[G]], implicitly[Functor[G]])
    val stateFunctions = functor.map(gs)(State.apply[S, O])
    traverse.sequenceS[S, O](stateFunctions).run(init)._2
  }
}
```

The context bounds assert the availability of utility type class instances for our container type G[_] (here, List), which are supplied by scalaz.std.list.listInstance and retrieved via the implicitly statements. The first piece of machinery invoked in the above example is functor.map(gs)(State.apply). Recall that gs has type List[Window => (Window,Double)], that is, a sequence of functions that each map an input state to a new output state along with an output value.
This simple and abstract definition of a stateful computation occupies a special place in the functional programming world, known as the State Monad. There exists a vast amount of instructional content on this topic which we shall not recapitulate here (you can find plenty on the web, for a nice example see “Learn You A Haskell” aka LYAH). Suffice it to say that for our purposes a State instance is a wrapper for a function that takes a state of type S as input, and returns some new state of type S along with an “output” value of type A, something like:

```scala
case class State[S, A](run: S => (S, A))
```

The “Monad” half of the name means that there exists a flatMap function over State which, loosely speaking, “chains” two of these computations, using the output S of one State as the input to the next and returning a new stateful computation. An example implementation and simple diagram are below, where State.run takes an initial state S as input and then executes the wrapped function:

```scala
def flatMap[A,B](ma: State[S,A])(amb: A => State[S,B]): State[S,B] = {
  State({ s => {
    val (s1, a) = ma.run(s)
    amb(a).run(s1)
  }})
}
```

Diagram of flatMap over State. Returning to our generate() function above, the snippet simply wraps each of our $$g_t$$ functions Window => (Window,Double) into a scalaz State, for which flatMap and a variety of other nice helper functions are already defined.

Traverse/Sequence

What about traverse.sequenceS[Window,Double]? Let’s inspect the type signature of sequenceS, substituting in the actual concrete types we’re using:

```scala
def sequenceS[Window,Double](fga: List[State[Window,Double]]): State[Window,List[Double]]
```

Verbally, this translates to transforming a List of stateful computations with Window state and Double output into a single stateful computation with Window state and List[Double] output. Informally, sequenceS is able to accomplish this by using the flatMap machinery defined in the previous section to chain all of the individual stateful computations into one. Furthermore, sequenceS also transforms the “output” variable of the resulting stateful computation from Double to List[Double], a list of all the individual outputs, which is exactly the final output we originally wanted. In general, Sequence allows us to “commute” two higher-order types, going from G[F[A]] to F[G[A]]. In this specific case we are transforming List[State[Window,Double]] to State[Window, List[Double]].

The End

Finally, we run it by supplying an all-zeros window as the initial state. The resulting value is of type (Window,List[Double]), containing both the final window state and the list of all output values. We retrieve the latter via ._2 and declare victory! What have we gained by all this?
First, we have pushed the work of incidental wiring into well-defined library functions, leaving our custom code focused solely on the particulars of our problem: the $$g()$$ function which emits a value and updates the state. This compact surface area for custom logic should be easier to both understand and test. Second, notice that StatefulGeneration.generate() is polymorphic in the type of the data container G[_], subject to the availability of Traverse and Functor type class instances. Finally, the stateful traversal and transformation of sequential data structures is ubiquitous in software, making this design pattern a valuable addition to one’s “vocabulary” for understanding and reasoning about code.

References

This example was briefly discussed in my talk “Economical machine learning via functional programming” (slides, video) at the Big Data Scala by the Bay conference in Oakland. The code examples use the scalaz library, for which the “Learning Scalaz” blog series is an invaluable companion. The functional programming concepts discussed here are heavily influenced by the excellent book Functional Programming in Scala. My reference for the AR(p) process is Applied Econometric Time Series. A more thorough look at some of these ideas can be found in Eric Torreborre’s epic blog post on the “Essence of the Iterator Pattern” paper by Jeremy Gibbons and Bruno C. d. S. Oliveira. Also see other blog discussion here, here, and here. Any mistakes are of course my own, and I’d love to hear about them!
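As a closing usage sketch (not part of the original post): assuming the AR and StatefulGeneration definitions above are on the classpath, together with the scalaz List type class instances they rely on, generating a synthetic series is a one-liner; the coefficients, seed and sample count below are arbitrary.

```scala
import scala.util.Random

object GenerateExample {
  def main(args: Array[String]): Unit = {
    val rng = new Random(42)

    // iid Gaussian noise terms epsilon_t
    val noise: List[Double] = List.fill(1000)(rng.nextGaussian())

    // AR(2) with coefficients [a_{t-2}, a_{t-1}] = [0.3, 0.5]
    val series: List[Double] = AR(List(0.3, 0.5)).generate(noise)

    println(series.take(5)) // first few synthetic data points
  }
}
```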

October 27, 2015

Blog

Rapid Similarity Search with Weighted Min-Hash

We are in the era of big data, characterized by data sets that are too large, fast and complicated to be handled by standard methods or tools. An example of this kind of big data is machine-generated logs. At Sumo Logic, our analytics team has been focused on studying new techniques for summarizing, analyzing and mining machine-generated logs. Our production service is processing hundreds of terabytes of machine-generated logs per day. One technique contributing to such huge scalability is the min-hash, which is an instance of locality sensitive hashing.

Why min-hash?

Let us consider the problem of finding similar machine logs: provided $$n$$ logs, the goal is to build a data structure that, given a query log $$q$$, returns logs that are similar to the query. Suppose each log is represented by a set of tokens and the Jaccard index is used to measure their pairwise similarity as: $$J(\mbox{log}_1, \mbox{log}_2) = \frac{|\mbox{set}(\mbox{log}_1) \cap \mbox{set}(\mbox{log}_2)|}{|\mbox{set}(\mbox{log}_1) \cup \mbox{set}(\mbox{log}_2)|},$$ where $$J \in [0, 1]$$, $$\mbox{log}_1$$ and $$\mbox{log}_2$$ are two machine logs, and $$\mbox{set}(.)$$ is the function used to tokenize a machine log into a set of tokens. A straightforward approach for finding logs similar to the query $$q$$ is to iterate over all $$n$$ logs and compute their Jaccard similarities to $$q$$. The time consumption of this approach is $$O(n)$$, which is unacceptable for very large datasets of high-dimensional items. The min-hash technique is a much faster solution to the above problem of finding similar logs, and its time consumption can be close to $$O(1)$$ [1]. The basic idea is to design a hashing trick that maps similar machine logs into the same bucket so that only logs within the same bucket need to be considered more closely. Let $$h$$ denote a hashing function that maps a token to a random integer; then the min-hash function will map $$\mbox{log}$$ into the following bucket: $$\mbox{min-hash}(\mbox{log}) = \min_{t \in \mbox{set}(\mbox{log})} h(t).$$ We can verify that the probability of two machine logs being mapped to the same bucket is equal to their Jaccard similarity, that is: $$\Pr\left[\mbox{min-hash}(\mbox{log}_1) = \mbox{min-hash}(\mbox{log}_2)\right] = J(\mbox{log}_1, \mbox{log}_2).$$ The min-hash technique has been widely used in industries such as news recommendation [4], near-duplicate web page detection [1] and image search [5]. At Sumo Logic, our analytics team is adopting it for matching machine logs to signatures and also for inferring new signatures from machine logs.

Considering token weights

In min-hash, all tokens are considered to have equal weights, and therefore have equal probabilities of being selected as the token with the minimal hashing value. However, in reality we wish to treat tokens as having different weights, in particular in applications such as document retrieval, text mining and web search. A famous approach to weighting tokens in document retrieval is the inverse document frequency (idf) [2]. Let $$t$$ be a token; then idf weights $$t$$ as: $$\mbox{idf}(t) = \log \frac{N}{n_{t}},$$ where $$n_{t}$$ is the total frequency of $$t$$ appearing in all documents and $$N$$ is the total number of documents. The idf approach puts small weights on frequent tokens and large weights on rare tokens. This unequal assignment of token weights decreases the effect of common tokens and lets more informative tokens pop out, leading to significant improvements in the accuracy and relevance of retrieved results. Like documents, machine logs consist of tokens, which we may wish to weigh differently. For example, in the machine log "INFO [hostId=monitor-1] [module=GLASS] [localUserName=glass] [logger=LeaderElection] [thread=Thread-2] Updating leader election; Found 18 elections to update", the bold tokens are more informative than others and thus we would like them to have larger weights.
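Before turning to token weights, here is a minimal, illustrative sketch (not Sumo Logic's implementation) of the unweighted min-hash bucketing just described: every log's token set is reduced to the minimum value of a seeded hash over its tokens, so similar logs tend to fall into the same bucket; MurmurHash3 and the seed are arbitrary choices.

```scala
import scala.util.hashing.MurmurHash3

object MinHashSketch {
  // h: token -> pseudo-random integer (the seed fixes a single random permutation)
  def h(token: String, seed: Int): Int = MurmurHash3.stringHash(token, seed)

  // The min-hash bucket of a log is the minimum hash value over its tokens.
  def minHash(tokens: Set[String], seed: Int = 1): Int = tokens.map(h(_, seed)).min

  // Exact Jaccard similarity, for comparison.
  def jaccard(a: Set[String], b: Set[String]): Double =
    (a intersect b).size.toDouble / (a union b).size.toDouble

  def main(args: Array[String]): Unit = {
    val log1 = "INFO [module=GLASS] Updating leader election".split(" ").toSet
    val log2 = "INFO [module=GLASS] Updating leader election done".split(" ").toSet
    println(s"Jaccard similarity: ${jaccard(log1, log2)}")
    println(s"Same bucket? ${minHash(log1) == minHash(log2)}")
  }
}
```

In practice you would use several independent seeds and band the resulting signatures to build a full locality sensitive hashing index, but a single seed is enough to show the bucketing idea.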
It can also be useful to incorporate token weights when measuring the similarity between two logs, which in fact leads to the weighted Jaccard approach, defined as: $$J_w(\mbox{log}_1, \mbox{log}_2) = \frac{\sum_{t \in \mbox{set}(\mbox{log}_1) \cap \mbox{set}(\mbox{log}_2)} w(t)}{\sum_{t \in \mbox{set}(\mbox{log}_1) \cup \mbox{set}(\mbox{log}_2)} w(t)}.$$ The weighted Jaccard similarity is a natural generalization of Jaccard similarity. It becomes Jaccard similarity if all token weights $$w(t)$$ are set to $$1.0$$. The figure above shows Jaccard and weighted Jaccard similarities between two synthetic machine logs "A B" and "A C", in which the weights of tokens "B" and "C" are fixed at $$1.0$$, while the weight of token "A" changes from $$0.1$$ to $$10$$. It tells us that incorporating token weights can have a major impact on the computed similarity.

Weighted min-hash

Suppose tokens have different weights and these weights are known beforehand; the question now becomes how to incorporate token weights into min-hash so that logs that are similar under weighted Jaccard similarity have a high probability of being mapped into the same bucket. Note that in min-hash, the hashing function, which is a mapping from a token to a random integer, is only used to perturb the order of tokens. That means you can use the original tokens as hashing keys. You can also use randomly-generated strings as hashing keys. The advantage of randomly-generated keys is that you can assign different numbers of random keys to the same token according to its weight. In particular, tokens with large weights should be assigned more random keys, while tokens with small weights should be assigned fewer random keys. That is the core trick of weighted min-hash [3][6]. If all token weights are assumed to be positive integers, which can be realized by scaling and discretization, then a simple approach is to generate $$w(t)$$ random keys for the token $$t$$, where $$w(t)$$ is its weight. For example, if the weight of the token "fail" is $$3$$, then we will assign $$3$$ random keys such as "fail0", "fail1", "fail2" to it. Weighted min-hash runs min-hash on the randomly-generated keys instead of the original keys. The weighted min-hash function becomes: $$\mbox{weighted-min-hash}(\mbox{log}) = \min_{t \in \mbox{set}(\mbox{log})} \; \min_{k \in \mbox{random-keys}(t)} h(k),$$ where $$\mbox{random-keys}(t)$$ is the set of randomly-generated keys for the token $$t$$ according to its weight. We can verify that for integer-valued weights: $$\Pr\left[\mbox{weighted-min-hash}(\mbox{log}_1) = \mbox{weighted-min-hash}(\mbox{log}_2)\right] = J_w(\mbox{log}_1, \mbox{log}_2).$$ Apart from the random keys method described above, monotonic transformation [3] is another approach for incorporating token weights into min-hash. In [3], the authors ran a large number of experiments comparing min-hash and weighted min-hash for image retrieval and concluded that weighted min-hash performs much better than min-hash. Our analytics team reached the same conclusion after running min-hash and weighted min-hash for machine log analysis. [1] A. Rajaraman and J. Ullman, Mining of Massive Datasets, Cambridge University Press, 2012. [2] K. Jones, A Statistical Interpretation of Term Specificity and Its Application in Retrieval, Journal of Documentation, pp. 11-21, 1972. [3] O. Chum, J. Philbin, and A. Zisserman, Near Duplicate Image Detection: Min-Hash and Tf-Idf Weighting, BMVC, pp. 493-502, 2008. [4] A. Das, M. Datar, and A. Garg, Google News Personalization: Scalable Online Collaborative Filtering, WWW, pp. 271-280, 2007. [5] O. Chum, M. Perdoch, and J. Matas, Geometric min-Hashing: Finding a (Thick) Needle in a Haystack, CVPR, pp. 17-24, 2009. [6] S. Ioffe, Improved Consistent Sampling, Weighted MinHash and L1 Sketching, ICDM, pp. 246-255, 2010.

October 22, 2015

Blog

Open Sourcing DevOps

Once again, DevOps is moving the needle. This time, it’s in the open source world, and both open source and commercial software may never be the same again. As more and more open source projects have become not only successful, but vital to the individuals and organizations that use them, the open source world has begun to receive some (occasionally grudging) respect from commercial software developers. And as commercial developers have become increasingly dependent on open source tools, they have begun to take the open source process itself more seriously. Today some large corporations have begun to actively participate in the open source world, not merely as users, but as developers of open source projects. SAP, for example, has a variety of projects on GitHub, and Capital One has just launched its open source Hygeia project, also on GitHub. Why would large, corporate, and commercial software developers place their code in open source repositories? Needless to say, they’re making only a limited number of projects available as open source, but for companies used to treating their proprietary source code as a valuable asset that needs to be closely guarded, even limited open source exposure is a remarkable concession. It’s reasonable to assume that they see significant value in the move. What kind of payoff are they looking for? Hiring. The open source community is one of the largest and most accessible pools of programming talent in software history, and a high percentage of the most capable participants are underemployed by traditional standards. Posting an attractive-looking set of open source projects is an easy way to lure new recruits. It allows potential employees to get their feet wet without making any commitments (on either end), and it says, “Hey, we’re a casual, relaxed open source company — working for us will be just like what you’re doing now, only you’ll be making more money!” Recognition. It’s an easy way to recognize employees who have made a contribution — post a (non-essential) project that they’ve worked on, giving them credit for their work. It’s cheaper than a bonus (or even a trophy), and the recognition that employees receive is considerably more public and lasting than a corporate award ceremony. Development of open source as a resource. Large corporations are already major users and sponsors of open-source software, often with direct involvement at the foundation level. By entering the open source world as active contributors, they are positioning themselves to exert even greater influence on its course of development by engaging the open source community on the ground. Behind the direct move into open source is also the recognition that the basic model of the software industry has largely shifted from selling products, which by their nature are at least somewhat proprietary, to selling services, where the unique value of what is being sold depends much more on the specific combination of services being provided, along with the interactions between the vendor and the customer. The backend code at a website can all be generic; the “brand” — the combination of look-and-feel, services provided, name recognition, and trademarks — is the only thing that really needs to be proprietary. And even when other providers manage to successfully clone a brand, they may come up short, as Facebook’s would-be competitors have discovered.
Facebook is an instructive example, because (even though its backend code is unique and largely proprietary) the unique service which it provides, and which gives it its value, is the community of users — something that by its nature isn’t proprietary. In the service model, the uniqueness of tools becomes less and less important. In a world where all services used the same basic set of tools, individual service providers could and would still distinguish themselves based on the combinations of services that they offered and the intangibles associated with those services. This doesn’t mean that the source code for SAP’s core applications is about to become worthless, of course. Its value is intimately tied to SAP’s brand, its reputation, its services, and perhaps more than anything, to the accumulated expertise, knowledge, and experience which SAP possesses at the organizational level. As with Facebook, it would be much easier to clone any of SAP’s applications than it would be to clone these intangibles. But the shift to services does mean that for large corporate developers like SAP, placing the code for new projects (particularly for auxiliary software not closely tied to their core applications) in an open source repository may be a reasonable option. The boundary between proprietary and open source software is no longer the boundary between worlds, or between commercial developers and open source foundations. It is now more of a thin line between proprietary and open source applications (or components, or even code snippets) on an individual basis, and very possibly operating within the same environment. For current and future software developers, this does present a challenge, but one which is manageable: to recast themselves not as creators of unique source code, or even as developers of unique applications, but rather as providers of unique packages of applications and services. These packages may include both proprietary and open source elements, but their value will lie in what they offer the user as a package much more than it lies in the intellectual property rights status of the components. This kind of packaging has always been smart business, and the most successful software vendors have always made good use of it. We are rapidly entering a time when it may be the only way to do business.

Blog

Step-by-Step Approach To Reducing On-Call Pain

Companies that move fast put pressure on developers and QA to continually innovate and push software out. This leaves the people with the pager, quite often the same developers, dealing with a continuous flow of production problems. On-call pain is the level of interrupts (pager notifications), plus the level of work that the on-call is expected to perform “keeping the system up” during their shift. How can we reduce this pain without slowing down development or having decrees like “there shall be no errors in our logs”? Assuming there is no time to do overhauls of monitoring systems, or make major architecture changes, here is a step-by-step approach to reducing on-call pain.

Measure On-Call Pain

As always, start out by measuring where you are now and setting a goal of where you want to be. Figure out how often your on-call gets paged or interrupted over a large period of time, such as a week or month. Track this number. If your on-call is responsible for non-interrupt driven tasks such as trouble tickets, automation, deployments or anything else, approximate how much time they spend on those activities. Set a realistic goal for how often you think it’s acceptable for the on-call to get interrupted and how much of their time they should spend on non-interrupt driven tasks. We all want to drive the interrupt-driven work to zero, but if your system breaks several times per week, it is not realistic for the on-call to be that quiet. Continuously track this pain metric. Although it may not impact your customers or your product, it impacts the sanity of your employees.

Reduce Noise

The first step to reducing on-call pain is to systematically reduce the alert noise. The easiest way to do it is to simply ask the on-call to keep track of the noise (alarms that they did not have to fix). Remove alarms where no action is required. Adjust thresholds for alarms that were too sensitive. Put de-duplication logic in place. The same alarm on multiple hosts should log to the same trouble ticket and not keep paging the on-call. If you have monitoring software that does flapping detection, put that in place. Otherwise, adjust thresholds in such a way as to minimize flapping.

Stop Abusing Humans

Any time that you have playbooks or procedures for troubleshooting common problems, ask yourself if you are engaging in human abuse. Most playbooks consist of instructions which require very little actual human intelligence. So why use a human to do them? Go through your playbooks and write scripts for everything you can. Reduce the playbook procedure to “for problem x, run script x.” Automate running those scripts. You can start with writing crons that check for a condition and run the script, and go all the way to a complex auto-remediation system (a minimal sketch of the cron-style check appears at the end of this post).

Get The Metrics Right

Metrics have the ability to reduce on-call pain, if used correctly. If you know and trust your metrics, you can create an internal Service Level Agreement that is reliable. A breach of that SLA pages the on-call. If you have the right type of metrics and are able to display and navigate them in a meaningful way, then the on-call can quickly focus on the problem without getting inundated with tens of alarms from various systems. Create internal SLAs that alarm before their impact is felt by the customer. Ensure that the on-calls can drill down from the alarming SLA to the problem at hand. Similar to deduping, preventing all related alarms from paging (while still notifying of their failure) relieves pager pain.
The holy grail here is a system that shows alarm dependencies, which can also be achieved with a set of good dashboards.

Decide On Severity

If an on-call is constantly working in an interrupt-driven mode, it’s hard for him or her to assess the situation. The urgency is always the same, no matter what is going on. Non-critical interrupts increase stress as well as time to resolution. This is where the subject of severity comes in. Define severities from highest to lowest. These might depend on the tools you have, but generally you want three severities: Define the highest severity. That is an outage or a major customer-facing incident. In this case, the on-call gets paged and engages other stakeholders, or an SLA breach pages all the stakeholders at the same time (immediate escalation). This one does not reduce any on-call pain, but it should exist. Define the second severity. This is a critical event: an internal SLA fires an alarm or a major system malfunction happens. It is best practice to define this as an alarm stating that customers are impacted or are going to be impacted within N hours if this does not get fixed. Define the third severity. The third severity is everything else. The on-call gets paged for the first two severities (they are interrupt-driven), but the third severity goes into a queue for the on-call to prioritize and work through when they have time. It is not interrupt-driven. Create a procedure for the non-interrupt driven work of the third severity. Move alarms that do not meet the bar for the second severity into the third severity (they should not page the on-call). Ensure that the third severity alarms still get handled by the on-call and are handed off appropriately between shifts.

Make Your Software Resilient

I know that I began with statements like “assuming you have no time,” but now the on-calls have more time. The on-calls should spend that time following up on root causes and really making the changes that will have a lasting impact on the stability of the software itself. Go through all the automation that you have created through this process and fix the pieces of the architecture that have the most band-aids on them. Look at your SLAs and determine areas of improvement. Are there spikes during deployments or single machine failures? Ensure that your software scales up and down automatically. I have not covered follow-the-sun on-calls, where an on-call shift only happens during working hours and gets handed off to another region of the world. I have also not covered the decentralized model of having each development team carry primary pagers only for their piece of the world. I have not covered these topics because they share the pain between more people. I believe that a company can rationally make a decision to share the pain only once the on-call pain has been reduced as much as possible. So, I will leave the discussion of sharing the pain for another blog post.
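As a toy illustration of the “for problem x, run script x” automation step above (the paths, threshold and remediation script are hypothetical, and this is not a recommendation for any particular stack), a cron-style check might look like the following.

```scala
import scala.sys.process._

object DiskRemediationCheck {
  // Hypothetical condition: root volume usage above a threshold, read from `df -P`.
  def diskUsagePercent(mount: String = "/"): Int = {
    val line = Seq("df", "-P", mount).!!.linesIterator.drop(1).next()
    line.split("\\s+")(4).stripSuffix("%").toInt // fifth column is e.g. "87%"
  }

  def main(args: Array[String]): Unit = {
    val threshold = 90
    if (diskUsagePercent() >= threshold) {
      // The playbook step, codified: run the remediation script instead of paging a human.
      val exitCode = Seq("/opt/playbooks/clean_old_logs.sh").!
      println(s"remediation exited with code $exitCode")
    } else {
      println("disk usage below threshold; nothing to do")
    }
  }
}
```

Schedule something like this from cron (or a proper job runner), and the corresponding alarm only needs to page a human when the scripted remediation fails.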

October 20, 2015

Blog

New Heroku Add-on for Sumo Logic Goes Beta

Today, Sumo Logic is pleased to announce that it is partnering with Heroku to bring a new level of real-time visibility to Heroku logs. Now Heroku developers will be able to select the Sumo Logic add-on directly from the Heroku marketplace, and quickly connect application, system and event logs to the Sumo Logic service with just a few clicks. Developers can then launch the Sumo Logic service directly from their Heroku Dashboard to gain real-time access to event logs in order to monitor new deployments, troubleshoot applications, and uncover performance issues. They can take advantage of Sumo Logic’s powerful search language to quickly search unstructured log data and isolate the application node, module or library where the root cause of a problem hides. Developers can also utilize patent-pending LogReduce™ to reduce hundreds of thousands of log events down to groups of patterns while filtering out the noise in your data. LogReduce can help reduce the Mean Time to Identification (MTTI) of issues by 50% or more. Developers also have access to Outlier & Anomaly Detection. Often, analysis and troubleshooting are centered around known data in systems. However, most errors and security breaches stem from unknown data, or data that is new to a system. Analyzing this data requires highly scalable infrastructure and advanced algorithms to process the data. This is what Sumo Logic enables with its Anomaly Detection feature.

Extending Heroku’s Logplex

Heroku is a polyglot Platform-as-a-Service (PaaS) that allows developers to build applications locally, then push changes via Git up to Heroku for deployment. Heroku provides a managed container environment that supports popular development stacks including Java, Ruby, Scala, Clojure, Node.js, PHP, Python and Go. For logging, Heroku provides a service called Logplex that captures events and output streams from your app’s running processes, system components and other relevant platform-level events, and routes them into a single channel. Logplex aggregates log output from your application (including logs generated from an application server and libraries), system logs (such as restarting a crashed process), and API logs (e.g., deploying new code). The caveat is that Heroku only stores the last 1,500 lines of consolidated logs. To get Sumo Logic’s comprehensive logging with advanced search, pattern matching, outlier detection, and anomaly detection, you previously had to create a Heroku log drain – a network service that can consume your app’s logs. Then you could configure an HTTPS service for sending logs to Sumo Logic.

Seamless UX

The Heroku add-on simplifies this process while providing developers with a seamless experience from the Heroku Dashboard. Now, with the Heroku add-on for Sumo Logic, you simply push your changes up to Heroku, then run the following on the command line to create your app: heroku addons:create sumologic --app <my_app_name> This creates the application on Heroku, configures the app and points the log drain to the Sumo Logic service for you automatically. To view your logs, simply go to your Heroku Dashboard and click on the Sumo Logic add-on; that will open Sumo Logic.

Heroku Add-on for Sumo Logic Quick Start

I’ve created a quick start that shows you how to build a simple Ruby app on Heroku, install the Sumo Logic add-on, and connect your new app to the Sumo Logic service.
You can use Sumo Free to test your configuration, and you can run through the entire quick start in 15 minutes.

About the Author

Michael is the Head of Developer Programs at Sumo Logic. You can follow him on Twitter @CodeJournalist or LinkedIn.

October 15, 2015

Blog

Heroku Add-on for Sumo Logic Quick Start

October 15, 2015

Blog

Public vs. Private Cloud

Blog

Best Practices for Securely Leveraging the Cloud

Over 20,000 people from all over the world descended on Las Vegas this week for Amazon’s completely sold out AWS re:Invent 2015 show. They came for many reasons: education, networking, great food, music and entertainment. But most importantly, they came because of AWS’s leadership and relevancy in this world of software-centric businesses driving continuous innovation and rapid delivery cycles, leveraging modern day public cloud infrastructures like AWS. On the second day of the event, I had the opportunity to sit through an afternoon session titled: If You Build It, They Will Come: Best Practices for Securely Leveraging the Cloud. Security expert and industry thought leader Joan Pepin, who has over 17 years of experience in policy management, security metrics and incident response – as well as being the inventor of SecureWorks’ Anomaly Detection Engine – gave the presentation. There is no doubt that cloud computing is reshaping not only the technology landscape, but also the very way companies think about and execute their innovative processes and practices to enable faster, differentiated and more personalized customer experiences. And a path to operating in the cloud securely and confidently requires a new set of rules and a different way of thinking. This was at the heart of Joan’s session – helping security practitioners adapt to this paradigm shift and creating a pathway to securely leveraging the cloud with confidence and clarity.

Securing Your Future

“We are in the middle of a mass extinction. The world we are used to living, working and operating in is going to disappear over the next ten years. It’s already well underway. We are seeing the mass extinction of traditional Datacenter, of Colocation and of being our own infrastructure providers,” said Pepin. I expect a new mantra will be echoing through corporate boardrooms around the globe in the not too distant future: “Friends don’t let friends build datacenters.” Joan suggests that the future – and how one secures it – is going to be very different from the past and from what most people are doing in the present. She knows this first hand, because she is living it every day, running Sumo Logic’s state-of-the-art advanced analytics platform that ingests over 50TB of data and analyzes over 25PB – daily! Joan passionately states: “The future is upon us. The cloud is the wave of the future: the economics, the scalability, the power of the architecture, security built-in from inception. It’s inevitable. If we are not prepared to adapt our thinking to this new paradigm, we will be made irrelevant.” There are boxes, inside boxes, inside boxes. And security people had very little to do with the design of those boxes. Throwing a few FWs and IDS/IPSs into the box was how things used to be done. This is not the way to build security into a massively scalable system with ephemeral instances. That is not a way to make security fractal, so that as you expand your footprint, security goes along with you. In this new paradigm, security has a greater opportunity to be much more involved in the delivery of the service and the design of the architecture, and to take security to a completely different level so that it is embedded in every layer of the infrastructure and every layer of the application. “Do I really need to see all the blinking lights of the boxes to be secure?
Too many decisions are being made emotionally, not rationally.” Operationally, security organizations need to change their thinking and processes from traditional data center-centric models (aka “Flat Earth” thinking) to new, more statistical models. AWS presents this giant amorphous blob of power, with APIs, elasticity, configurability and infrastructure as code. Security is now embedded into all that automation and goodness. As you expand, as you grow, as you change, the security model stays the same and weaves itself throughout your cloud infrastructure. “This was my ‘world is round’ moment,” said Pepin. “I have seen the light and will never go back. My CISO friends in more traditional companies are envious of what we have been able to achieve here at Sumo Logic – the ability to ingest, index, encrypt, store and turn the data back around for searching in 30 seconds – this is generations ahead of the market. It is how the cloud of tomorrow works today!” Joan provided a number of practical and insightful best practices that security professionals should follow in thinking about cloud security:

Less is More: Simplicity of design, APIs, interfaces, and data-flow all help lead to a secure and scalable system.
Automate: Think of your infrastructure as code-based – it’s a game changer. Test, do rapid prototyping and implement fully automated, API-driven deployment methods. Automate a complete stack.
Do the Right Thing: Design in code reuse and centralize configuration information to keep the attack surface to a minimum. Sanitize and encrypt your data. Don’t trust client-side verification; enforce everything at every layer.
Defense in Depth: Everything, all the time.
Achieve Scale by Running a POD Model.
Use a Best-of-Breed Security Stack: IDS, FIM, log management, host firewall.

To watch Joan’s video, please select this link: AWS re:Invent 2015 | (SEC202) Best Practices for Securely Leveraging the Cloud For more information on Sumo Logic’s cloud-native AWS solutions please visit AWS Integrations for Rapid Time-to-Value.

AWS

October 8, 2015

Blog

Leveraging AWS Spot Instances for Continuous Integration

Automated Continuous Integration (CI), at a high level, is a development process in which changes submitted to a central version control repository by developers are automatically built and run through a test suite. As builds and tests succeed or fail, the development team is then aware of the state of the codebase at a much more granular level, providing more confidence in deployments. While CI is often used primarily on production-ready branches, many implementations run builds and tests on all branches of a version control system, giving developers and managers a high-level view of the status of each project in development. The trouble with automated CI is that it typically requires an always-on server or an expensive SaaS product in order to run builds and tests at any given time. In a large development team, this can be prohibitively expensive, as a large backlog of commits would require more resources to test in a reasonable amount of time. A way that organizations can save money is by utilizing Amazon Web Services (AWS) EC2 On-Demand Instances. These instances let you pay by the hour with no long-term commitments. You just spin up an instance when you need one, and shut it down when you’re done. This is incredibly useful, as in the times between builds you aren’t paying for servers, and conversely, your CI environment can scale to the needs of your team as the need for testing builds increases. While more volatile, AWS Spot Instances are even more cost effective than On-Demand Instances. Spot Instances are spare On-Demand instances that Amazon auctions off at up to 90% off the regular price, and they run as long as your bid exceeds the current Spot Price, which fluctuates based on supply and demand. The trouble with Spot Instances is that they can disappear at any point, so applications running on them need to be able to appropriately handle this unpredictability. This requirement puts us at a bit of a disadvantage when configuring a CI server, as it will need to be capable of going down in an instant without reporting false failures. Sample reference architecture (source: aws.amazon.com). Finding a CI server that can safely handle the volatility of AWS Spot Instances can be tough, and configuration and maintenance are a bit more difficult as well, but luckily Jenkins, one of the most popular open source CI servers, has an extensive plugin library that provides extended support for most use cases, including Amazon EC2 support. The Amazon EC2 Jenkins plugin, which is in active development, gives Jenkins the ability to start slaves on EC2 on demand. This allows you to run a lower-powered Jenkins server and spin up slaves for more resources as they are needed. The EC2 plugin also provides great Spot Instance support, giving you the ability to configure persistent bid prices and monitor slaves more accurately. In the event a Spot slave is terminated, a build will be marked as failed, but the error messaging will reflect the reason appropriately. While the barrier to entry of configuring your CI environment to utilize AWS Spot Instances can be high, the possible savings can be even bigger. SaaS products are often a great way to offload the time and expertise needed to manage services like this, but CI services can be unnecessarily expensive. The same can be said for always-on servers in large organizations with a backlog of commits that need to be built and tested.
Spot Instances are a great way to dynamically allocate only the resources that an organization needs when they need them, while at the same time reducing operating costs and build wait times.
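One practical way to “appropriately handle this unpredictability” on the build slaves themselves is to watch for the Spot interruption notice that EC2 exposes through the instance metadata service shortly before reclaiming an instance. The sketch below is illustrative only: the drain step is hypothetical, and the metadata path shown here should be verified against current AWS documentation.

```scala
import scala.io.Source
import scala.util.Try

object SpotTerminationWatcher {
  // Instance metadata endpoint; the spot termination-time key returns a timestamp
  // once an interruption has been scheduled (and an error/404 otherwise).
  val MetadataUrl = "http://169.254.169.254/latest/meta-data/spot/termination-time"

  def terminationScheduled(): Boolean =
    Try(Source.fromURL(MetadataUrl).mkString).toOption.exists(_.trim.nonEmpty)

  def drainBuildSlave(): Unit = {
    // Hypothetical drain step: mark the Jenkins node offline so in-flight
    // builds can be requeued instead of surfacing as spurious failures.
    println("Termination notice received; taking slave offline and requeueing builds")
  }

  def main(args: Array[String]): Unit = {
    while (true) {
      if (terminationScheduled()) { drainBuildSlave(); return }
      Thread.sleep(5000) // poll every few seconds; the notice arrives roughly two minutes ahead
    }
  }
}
```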

Blog

New: Webhook Alerts for Scheduled Searches - Quickly Enable Integration to Slack, PagerDuty, Datadog, and others!

Hello world! Today, we're excited to announce the new Webhook alert type in Sumo Logic. This new feature will allow you to easily fire off alerts from Sumo Logic Scheduled Searches into a variety of third-party tools such as Slack, PagerDuty, VictorOps, and Datadog. Webhooks can also enable easy integration with your own custom app or unlock a variety of use cases via third-party integration frameworks like IFTTT and Zapier. Setting up a new Webhook integration is easy: The first step is to create a new Connection to a third-party system under Manage->Connections in the UI. (Note: this is an Admin function.) In addition to ServiceNow for ticketing, you'll now see a generic Webhook alert type, as well as starter templates for Slack, Datadog, and PagerDuty. Next, the Connection will allow you to provide a few simple fields to enable the Webhook. After entering the Name, Description and target URL details, the key piece to get right is the JSON Payload. This field allows you to construct a JSON object in the format expected by the target webhook system. The payload can also be parameterized with variables carrying specific information about the search itself, such as Name, Description, Fire time, Number of results, etc. That's it! Once the new connection is created, it can be used by Sumo users for alerting within their scheduled searches: The new feature is available in your account today! I'd love to hear about any interesting use cases you've enabled, so please feel free to send feedback directly to [email protected] or via our Support system. Happy integration! Sahir Director, Product Management Sumo Logic
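To make the “integration to your own custom app” case concrete, here is a minimal, hypothetical receiver (not a Sumo Logic artifact) built on the JDK's embedded HTTP server; it simply accepts whatever JSON payload you configured in the Connection and prints it, and in practice you would parse the parameterized fields and route them to your own tooling.

```scala
import com.sun.net.httpserver.{HttpExchange, HttpServer}
import java.net.InetSocketAddress
import scala.io.Source

object WebhookReceiverSketch {
  def main(args: Array[String]): Unit = {
    val server = HttpServer.create(new InetSocketAddress(8080), 0)

    // The Connection's target URL would point at http://<your-host>:8080/sumo-alert
    server.createContext("/sumo-alert", (exchange: HttpExchange) => {
      val payload = Source.fromInputStream(exchange.getRequestBody, "UTF-8").mkString
      println(s"Received alert payload: $payload") // parse/route the JSON here
      exchange.sendResponseHeaders(200, -1)        // empty 200 OK response
      exchange.close()
    })

    server.start()
    println("Listening for webhook alerts on port 8080")
  }
}
```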

October 5, 2015

Blog

5 Traits Of Highly Effective Operations Teams

What are some of the key skills that highly effective operational teams possess? What makes one operations team much more effective than another? Here is my take on a few of the key traits highly effective operations teams have.

Trait 1: React with speed and effectiveness

The quality of service depends on how quickly you can identify and resolve issues. Mean Time to Identify (MTTI) and Mean Time to Resolution (MTTR) are two key metrics that will tell you how you are doing on this dimension. Your team should be tracking those two numbers, broken down by their impact on customers. It’s okay if those numbers are higher for low-impact issues, but for high-impact issues those metrics should be low. Having good visibility into what is going on deep within the system is key to resolving issues within minutes as opposed to hours and days.

Trait 2: Proactively monitor – Actively look for failures

Teams at this level are hypersensitive about failures that go undetected. The goal is to lower the frequency of issues that are missed by the monitoring and alerting framework. Rather than tracking a single number, break this metric down by event severity. Over time, the number of issues missed by monitoring/alerting at a really high severity should go down from “frequently” to “very rare”. If your team is small, you might want to focus on a particular set of KPIs and develop full-loop capability for detecting, alerting on and remediating those issues.

Trait 3: Build really good playbooks – Know how to respond when an alert happens

High levels of monitoring and alerting can quickly lead to “alert fatigue.” To reduce this, create easy-to-find, easy-to-execute playbooks. Playbooks are a simplified list of steps that tell an operator how to respond when an alert happens. Playbooks should be written in a way that requires zero thinking for an operator to execute them. And remember to make those playbooks really easy to discover. Or heck, put a link to the playbook in the alert itself.

Trait 4: Do retrospectives – Learn from failures

Failures are inevitable. There is always something that will happen; your goal is to avoid repeating it. To go one step beyond Trait 2, look at the issues and ask what it was about the process, architecture or people that led the failure to happen. Was the time it took to resolve the issue acceptable? If not, what can be done to reduce the time it took to resolve it? Can we automate some of the identification/resolution steps to reduce MTTI and MTTR? Teams can get really good at this by building a culture of blameless post mortems, focusing relentlessly on finding the root cause. For if a team doesn’t truly understand the root cause, they can’t be sure that the issue is fixed. And if you aren’t sure that you have fixed the issue, you cannot be sure that it won’t happen again. Ask yourself the five whys until you get to the root cause. Sometimes five is not enough. You have to really get down to the core issue, an issue that you can fix. If you cannot fix it right away, at least detect and recover from it very quickly, hopefully without any impact to the service.

Trait 5: Build resiliency into the system – Make use of auto-healing systems

Having said all the above, many of the issues that turn into operational nightmares can be caught and taken care of at design time. Make the ability to run a service at high quality a key requirement from the get-go.
You will pay for bad design and architectural choices several times over by the time the service is generally available and in use.

Blog

Sumo Logic Delivers Big Data Analytics to DataDirect Cloud Command Center

DataDirect Cloud is a connectivity platform as a service (cPaaS) running on Amazon Web Services (AWS) that provides data connectivity through open industry standards (OData, ODBC, JDBC). To illustrate the critical nature of the service, Intuit uses it for real-time connectivity for 10,000 Salesforce users to Oracle data behind their corporate firewall. If the service goes down, Intuit agents are no longer able to access order histories, case histories or commissions calculations. Beyond this use case, we have thousands of DataDirect Cloud users connecting various data sources for their core business operations. DataDirect Command Center (DCCC) is a collection of Sumo Logic dashboards that measure key metrics focused on DataDirect Cloud user experiences. Each of the dashboards is displayed on an individual monitor and physically mounted to display a collective view across the following metrics:

Top users
Top errors
Trends for key customers, with personalized screens broken out by account types
Production usage metrics
Throughput
Error counts
Data source volumes
Interface volumes
Types of queries executed
Failure rate
Integrated JVM metrics

Notes: No customer-specific information is monitored, per Progress Software privacy policies. For monitoring JVM memory, code was developed using JMX to feed metrics from Amazon into Sumo Logic, providing a 360-degree view of the systems. Eric Brown, Cloud Engineering Manager, led the effort to create the DCCC and both are pictured above. The DCCC leverages the built-in dashboarding functionality in Sumo Logic and was developed to enhance DataDirect Cloud user experiences. The service enables new data connectivity functionality that pushes traditional workloads, making it imperative to deliver a great user experience. In response, the engineering organization actively monitors the DCCC to detect anomalies in usage patterns and take appropriate actions, as described in the next section on visualization. Sumo Logic has delivered big data analytics throughout the organization, and we did not have to engage our data scientists for the project. Below are tips for creating the search queries in Sumo Logic for DCCC:

Start small: small in scope and small in your search window (timeframe).
Stay focused: take one question you want to answer and create a query to answer it. Then go to the next. This is sometimes hard when there are many questions to answer, but the focus will help you learn Sumo Logic syntax and particularities much faster.
Filter steps: It’s impossible to know about all the logs in a moderately complex system. Start searches broad, then filter down one step at a time. Sumo Logic uses a “pipe” to represent a filter step. Keep in mind that each time a log passes through the pipe (“|”) it may not come out the other end. Think of these pipes like “gates”. There are keywords that let logs “through the gate” like a password, where otherwise the filter condition would have blocked them.
Multiple filters: adding filters one by one and checking results was the most efficient way to move forward in developing larger, more complex queries. It’s much easier to troubleshoot a single filter statement than four back-to-back.
Confirm, confirm, confirm: Make sure your queries are “correct” in every sense. Nothing is worse than making judgments on queries that return unintended results.
Comments: Use “//” to comment out a line (it saves a bunch of time troubleshooting queries). It also provides you space for comments.
We have many generic queries about user logs where you just have to add the user ID to the query. We use "//" to provide instructions to the Sumo Logic user.

Ask for help: the documentation is great and support is outstanding. Kevin at Sumo Logic answers questions quickly and accurately (even over the weekend).

Monitoring usage example for a QA user (using comments)

When to use alerts versus real-time data visualizations

DevOps has several tools for monitoring applications and systems across a mix of Sumo Logic and open source tools such as Zabbix and Cacti. The R&D team, on the other hand, is interested in very specific information captured in the DCCC, and both teams work together to exchange queries and intelligence. When considering intelligence in our systems, humans are fantastic at detecting patterns in visualizations when the question is unknown. On the other hand, alerts are great when you know the question and the answer. In most cases, it's not constructive to look at large amounts of raw data.

When it comes to data visualization, dashboards are more than just dashboards. They can act as a starting point for deeper investigations. This helps to ramp up engineers who are new to Sumo Logic by providing a starting point during troubleshooting activities. The R&D team started to look at visualization since they may not know what patterns to detect for alerts from the dashboards in the DCCC. In one example, the R&D team detected an anomaly in "user experience" through visual insight from a dashboard and proactively alerted the customer before they contacted our support team. This is a great example of effective monitoring through data visualization and customer service in the cloud era.

Alerts are very useful: some information is not mission critical but is still very important to growing our products. We've created queries and attached them to automated alerts through Sumo Logic to monitor the "user experience" of our evaluation users. Every morning we get an automated email with exception reports, from which we decide whether or not to reach out to specific users proactively for help. Once a visualization uncovers value for customers, those queries are then integrated into the larger alert system run by DevOps over time.

How the DCCC forces R&D to use best practices for logging messages

In building common queries that are shared between R&D and DevOps, the following logging best practices were developed:

Naming your collectors

Do yourself a favor and settle on a naming convention for your collectors up front. We have several deployments in the cloud. Each deployment consists of many virtual machines. Each virtual machine can have one or more programs generating logs. Sometimes you want to search all the logs within a particular instance to piece together a "chain of events". If all the collectors for a particular instance have the same prefix, it's easy to start searching your logs (and it's easier to remember). When trying to troubleshoot a workflow, we sometimes look for a username in all the logs. That's easy to do using wildcards in the Sumo Logic search.
We use the format:

[Product]-[Instance]-[Function]-[IPAddress]

So we might have the following:

[coolproduct]-[live]-[nat]-[127.0.0.1]
[coolproduct]-[live]-[website]-[127.0.0.2]
[coolproduct]-[live]-[db]-[127.0.0.3]
[coolproduct]-[live]-[not]-[127.0.0.4]

With this structure it's easy for me to quickly search all the logs in the production instance of a product using:

_sourceCategory=coolproduct*live* and "username"
_sourceCategory=coolproduct*test* and "username2"

And, of course, a focused query is just as easy to follow and remember:

_sourceCategory=coolproduct*live*website and "username"

Or search across all instances (like testA, testB, live) like this:

_sourceCategory=coolproduct* and "username"

Common log structure

The structure of your logs is also important. If you can keep a similar pattern in log contents, that will help you with your parsing logic. For example, if you can settle on some common fields across most logs (like we did), you can start your log entries like this:

[dateStamp][loglevel][username][x][y][z]

Use key/value pairs where possible; this makes parsing easier. For example:

success=true
ms=341 (response time)
version=2.3

Example log:

25-Aug-2015 17:08:18.264 INFO [http-nio-8080-exec-1] [username] [FwO3Wvy5frS6O9wART3Y].[login] [success=true][ms=1242][bytesIn=91][bytesOut=1241][clientVersion=2.0.1.64][timezone=America/New_York][flags=1][portNumber=xyz][connectionRetryCount=0]

Our products integrate with users' backend systems, and we typically include error messages from those backend systems in our logs. This allows us to alert our users to issues they may not know about on their end.

What's next for the DCCC and the next phase of analytics for Sumo Logic data?

The Sumo Logic dashboards are fantastic, and there are plans in the DataDirect Cloud R&D offices in Research Triangle Park, NC to expand the Command Center to common areas to crowdsource pattern detection in the dashboards. It's also a unique opportunity to engage more employees directly with our technology, and it serves as a constant reminder of all the impressive work that goes into DevOps and cloud R&D for successfully running business-critical cloud applications. DevOps is planning to expand the concept of a Command Center to Progress Software corporate headquarters in Bedford, MA for even greater visualization across the complete portfolio of cloud applications.

About the Author

Sumit Sarkar is the Chief Data Evangelist for Progress Software. You can follow him on LinkedIn at www.linkedin.com/in/meetsumit and on Twitter @SAsInSumit.

Blog

Continuous Intelligence for Your AWS Environment

Blog

Continuous Intelligence: Business Intelligence Redefined For Software-Centric Business

Blog

Operational Visibility for Amazon Web Services

Blog

API Design - A Documentation-first Approach

What exactly makes a “good” API? That is a question a lot of developers ask when designing their first API. While there are hundreds of resources online, all with differing opinions about what defines “good,” the majority of them share some similar themes. Logical endpoint naming conventions, clear error messaging, accessibility, and predictability are all crucial pieces in any well-designed API. Most importantly, every good API I’ve ever worked with has had clearly written and easily understandable documentation. On the flip side, poor documentation is one of my biggest frustrations with any API I use.

A great example of a good API with excellent documentation is Stripe. Any developer who has worked with it can attest to how well written it is. With clearly defined endpoints, transparent error messages, usable examples, a slew of great SDKs, and standards-compliant methodology, Stripe is often used as a reference point for API development. Even their documentation is used as a source of inspiration for many freely and commercially available website templates.

When designing an API, it is often desirable to take a “build first” approach, especially when utilizing the architecture of a pre-existing product. Unfortunately, this mindset doesn’t follow the standard usability practices that we follow when building apps with graphic interfaces. It is extremely important that we take a user-centric approach to API design, because we are developing a product to be consumed by other developers. If we can’t empathise with their needs and frustrations, then who can we empathise with? This user-centric focus is an important reason to start by writing your API documentation, rather than just designing it. When you create good documentation, good design follows, but the reverse isn’t necessarily true.

Designing an API before you write any code can be difficult when you are working with pre-existing architecture. Your pre-conceived notions about how a system works will influence your design, and may result in a less-than-logical API. Starting with the documentation first will force you to design an unopinionated API. If you write documentation that you as a user would want to read, there is a good chance that your own users will appreciate it as well.

Almost as important as the content of your documentation is how easy it is to read. There are tons of services out there that make this incredibly easy, and even go so far as to generate dynamic documentation along with beautiful templates. A great way to start is an open source API documentation library called API Blueprint. API Blueprint allows you to build out API documentation using markdown, and export it into a well-structured JSON file for importing into a number of services. Apiary and Gelato.io are two services that allow you to import API Blueprint files for hosting, and even testing your API design prior to writing any code.

Remember that, while writing your API documentation, the most important factor to consider is how valuable it is to a user. To quote Apiary, “An API Is Only As Good As Its Documentation.” It may be tough, at times, to separate the perfect structure of your API from the current structure of an existing system, but it is important to remember that it is just another user interface. While the backend architecture of a current application has some influence on the frontend, we often have to find creative solutions in order to provide the best possible experience for our users.
The only difference between that and API design is that our target users are much more tech savvy, and thus potentially less forgiving when something doesn’t make sense.
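To make the documentation-first idea concrete, here is a minimal, hypothetical API Blueprint sketch; the resource and fields are invented for illustration, and because API Blueprint is just markdown, a file like this can be shared and imported into tools such as Apiary before any backend code exists:

FORMAT: 1A

# Widget Service

## Widgets Collection [/widgets]

### List All Widgets [GET]

+ Response 200 (application/json)

        [
            { "id": 1, "name": "example widget" }
        ]

### Create a Widget [POST]

+ Request (application/json)

        { "name": "new widget" }

+ Response 201 (application/json)

        { "id": 2, "name": "new widget" }

Writing a file like this first forces the endpoint names, payloads, and status codes to be debated as a user interface, before any pre-existing architecture gets a vote.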

Blog

DevOps is a Strategy

Blog

Feel the Spray Yet? (Riding the Ocean of Data Tsunami)

The talk of digital business transformation is vast and deep, but what of its impact on the very nature of innovation itself? For example, the speed with which an idea goes from the spark in an entrepreneur’s eye to a disruptive new service is uncanny. That journey, largely borne out through software, has become so compelling that even Bloomberg recently devoted an entire issue to it, as if we were collectively being introduced to our new life partner: code.

Well, to code, I say, “meet data.” It’s not a blind date – data is your soul mate. Soul mate? To many IT/app leaders and managers, it’s probably more like the mate they can’t live with. That’s because code – specifically the code that has become the new, exciting software-centric businesses disrupting the way I travel, buy window fans, hitch a ride, get my baby wrap burrito, and recycle my junk – is generating data at a velocity, variety and volume that is virtually impossible to extract value from in a timely manner. This is creating unprecedented complexity and risk in the infrastructure: cause for hand-wringing at best, and, [I won’t say it but you know], at worst. Code is creating an ocean of data at such a fast rate that even if an IT/app leader could build a solution to address it, that solution would be outdated at deployment – maybe even at conception.

We built an infographic below to share some of the stats on just how fast this data tsunami is growing. Feeling the spray of the wave yet? Well, don’t worry – Sumo’s got your back. Sumo Logic’s cloud-native data analytics service is the way to master the new IT/app complexity and risk. It’s an always-on, current, scaling, elastic, learning (through advanced machine learning algorithms), secure (the most advanced cloud-based security model in the industry) service. And it comes with built-in, advanced analytics to help you uncover the patterns and anomalies across your entire infrastructure stack that support business and operational insights. With Sumo Logic, code and data become a match made in heaven – the mate you can’t live without. Don’t delay – start your free Sumo Logic trial today!

September 24, 2015

Blog

Security Analytics in the AWS Cloud – Limiting the Blast Radius

This blog focuses on security event management within the AWS cloud and presents some options for implementing a security analytics solution.

Security Analytics in the AWS Cloud

The basic role of security analytics remains the same in the cloud, but there are a few significant differences. Arguably the biggest is that the effective blast radius of an incident can be far greater in the cloud. “I built my datacenter in 5 minutes” is a great marketing slogan and bumper sticker that AWS have. However, if someone compromises an IAM role with admin privileges, or worse, your root account, they can completely destroy that datacenter in well under 2 minutes. Having an effective strategy to identify, isolate and contain a security incident is paramount in the cloud.

Amazon Web Services prides itself on its security and compliance stature and often states that security is Job Zero. Nonetheless, customers need to be mindful that this is a shared responsibility model. Whilst AWS agrees to provide physically secure hosting facilities, data storage and destruction processes, and a vast array of tools and services to protect your applications and infrastructure, it is still ultimately the customer’s responsibility to protect and manage the services they run inside AWS. To name just a few of these responsibilities:

Managing your own firewall rules, including ACLs and Security Groups
Encrypting your data both in transit and at rest (including managing keys)
Configuring and managing IPS and WAF devices
Virus/malware detection
IAM events (logins, roles, etc.)

The list goes on… and with the speed that AWS releases new products and features, it is important to be able to keep on top of it all. We know AWS and a number of their technology and managed services partners are well aware of this and provide some really useful tools to help manage this problem. We will focus on the following AWS services and then discuss how we can incorporate them into a strategic security analytics solution.

IAM (Identity & Access Management) is an absolute must for anyone who is serious about securing their environment. It provides authenticated and auditable access to all of your resources. Through the use of users, groups, roles and policies you can create fine-grained permissions and rules. It also allows you to federate credentials with an external user repository.

CloudTrail can log all events from IAM and is one of the most important services from a SIEM perspective. CloudTrail is a web service that records all kinds of API calls made within IAM and most other AWS services. It is essential from an auditing perspective and in the event you need to manage a security incident. See this link for a full list of supported services that also links back to the relevant API reference guide. Additionally, this link provides detailed information about logging IAM events to CloudTrail.

VPC Flow Logs is a fairly recent addition to the AWS inventory but has long been a feature request from the security community. Whilst Security Groups and ACLs have long provided customers with the ability to control access to their VPC, they weren’t previously able to see the logs generated. A key part of a SIEM solution is the ability to process “firewall” logs. It’s all well and good knowing that an authorized user can access a particular service, but it can also be very useful to know what requests are getting rejected and who is trying to access protected resources. In this respect, Flow Logs now gives customers a much clearer view of the traffic within their VPC.
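As one illustration of that point, here is a hedged sketch of a Sumo Logic query over VPC Flow Logs that surfaces rejected traffic; the sourceCategory is a placeholder, and the parse expression assumes the standard space-delimited, 14-field flow log record:

_sourceCategory=aws/vpc/flowlogs
| parse "* * * * * * * * * * * * * *" as version, account_id, interface_id, src_ip, dest_ip, src_port, dest_port, protocol, packets, bytes, start_time, end_time, action, log_status
| where action = "REJECT"
| count by src_ip, dest_ip, dest_port
| sort by _count
| limit 20

A result set like this quickly shows who is repeatedly knocking on protected ports, which is exactly the visibility Security Groups and ACLs alone never provided.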
Rather conveniently, Flow Logs data is processed by CloudWatch Logs.

CloudWatch Logs is an extension of the CloudWatch monitoring facility and provides the ability to parse system, service and application logs in near real time. There is a filtering syntax that can be used to trigger SNS alerts in the event of certain conditions. In the case of applications running on EC2 instances, this requires a log agent to be installed and configured. See this link for how to configure CloudTrail to send events to CloudWatch Logs. This service does somewhat impinge upon the functionality of existing log management products such as Sumo Logic and Splunk; however, as we’ll explain in a separate post, there is still a good argument for keeping your third-party tools.

Config is a service that allows you to track and compare infrastructure changes in your environment over time and restore them if necessary. It provides a full inventory of your AWS resources and the facility to snapshot it into CloudFormation templates in S3. It also integrates with CloudTrail, which in turn integrates with CloudWatch Logs, to provide a very useful SIEM function. For example, if a new Security Group gets created with an open access rule from the internet, an alert can be raised. There is quite a bit of functional overlap with CloudTrail itself, but Config can also be very useful from a change management and troubleshooting perspective.

Here are a couple of real-world examples that make use of these services.

Example 1

This scenario has a rogue administrator adding another unauthorized user to the admin role inside the IAM section of the AWS Console. If we have configured CloudTrail, then this event will automatically get logged to an S3 bucket. The logs will be in JSON format and this particular entry would look something like this. An IAM role can be assigned to CloudWatch Logs to allow it to ingest the CloudTrail events, and a filter can be applied to raise an alarm for this condition. You can use SNS to initiate a number of possible actions from this.

Some other possible events (there are many) that we may want to consider monitoring are:

AuthorizeSecurityGroupIngress – someone adding a rule to a security group
AssociateRouteTable – someone making a routing change
StopLogging – someone stops CloudTrail from recording events
Unauthorized* – any event that returns a permission error
"type":"Root" – any activity at all performed under the root account

Example 2

This is a very high level overview of how VPC Flow Logs, and essentially all the services we’ve outlined in this post, can be integrated with a third-party log management tool. In my opinion, whilst CloudWatch Logs does provide some very useful and low-cost monitoring capabilities, there are quite a few dedicated tools provided by AWS technology partners that offer a number of advantages in terms of configurability, functionality and usability. Sumo Logic appears to be one of the first vendors to integrate with VPC Flow Logs and is very easy to get up and running with.

As always, thank you for taking the time to read this post. I’d also like to thank David Kaplan, Security and Compliance Principal at AWS Australia, for his valuable input to this piece. This blog was contributed by our partner friends at Cloudten. It was written by Richard Tomkinson, Principal Infrastructure Architect. Cloudten Industries © is an Australian cloud practice and a recognized consulting partner of AWS.
They specialize in the design, delivery and support of secure cloud-based solutions. For more information on Sumo Logic’s cloud-native AWS solutions, please visit AWS Integrations for Rapid Time-to-Value.
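As a postscript to Example 1 above, here is a hedged sketch of how the suspicious CloudTrail events listed in that section (root account activity, Unauthorized* errors, StopLogging) might be surfaced once CloudTrail logs are flowing into Sumo Logic; the sourceCategory and JSON paths are assumptions based on the standard CloudTrail record format:

_sourceCategory=aws/cloudtrail
// Pull the fields of interest out of each CloudTrail JSON record
| json "eventName", "userIdentity.type", "errorCode" as event_name, user_type, error_code nodrop
| where event_name = "StopLogging" or user_type = "Root" or error_code matches "*Unauthorized*"
| count by event_name, user_type, error_code
| sort by _count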

AWS

September 23, 2015

Blog

Monitoring and Analytics With the New Sumo Logic Docker App

Hello world! Today at DockerCon SF, we’re excited to announce general availability of the Sumo Logic Application for Docker! We’ve heard from many customers that comprehensive out-of-the-box visibility and analytics capabilities for Docker are an important part of effectively managing modern containerized applications. In order to help our customers troubleshoot and monitor their growing Docker infrastructures, we’ve created a Docker App which provides:

Out-of-the-box real-time dashboards and searches to help monitor Docker stats and events
Visualizations of key metrics and KPIs, including image usage, container actions and faults
Ability to troubleshoot issues and set alerts on abnormal container or application behavior
Ability to easily create custom and aggregate KPIs and metrics using Sumo Logic’s advanced query language
Advanced analytics powered by LogReduce™, Anomaly Detection, Transaction Analytics, and Outlier Detection

The app is available now in the Sumo Logic Application Library. Sign up for a free trial to try it out now! Some additional resources to check out:

Sumo Logic CTO Christian Beedgen’s latest Docker musings
Docker Logging Drivers
For information on Docker Events, see: https://docs.docker.com/reference/api/docker_remote_api_v1.18/#monitor-dockers-events
For information on Docker container statistics based on resource usage, see: https://docs.docker.com/reference/api/docker_remote_api_v1.18/#get-container-stats-based-on-resource-usage

As always, we’d love to hear your feedback. And for those of you at DockerCon in San Francisco today and tomorrow, stop by the Sumo Logic booth to grab a Sumo squishy! Signing off, Sahir Azam [email protected] Sumo Logic Product Management @sumo_sahir

Blog

Who Controls Docker Containers?

It’s no secret that Development (“Dev”) and Operations (“Ops”) departments have a tendency to butt heads. The most common point of contention between these two departments is ownership. Traditionally, Ops owns and manages everything that isn’t direct development, such as systems administration, systems engineering, database administration, security, networking, and various other subdisciplines. On the flip side of the coin, Dev is responsible for product development and quality assurance. The conflict between the two departments happens in the overlap of duties, especially in the case of managing development resources. When it comes to Docker containers, there is often disagreement as to which department actually owns them, because the same container can be used in both development and production environments. If you were to ask me, I would say without hesitation that Dev owns Docker containers, but thanks to the obvious bias I have as a developer, that is probably an overly simplistic viewpoint.

In my personal experience, getting development-related resources from Ops can be tough. Now don’t get me wrong, I’m under no impression that this is because of some Shakespearian blood-feud; Ops just has different priorities than Dev, and spinning up yet another test server just happens to land a little further down the list. When it comes to managing development resources, I think it is a no-brainer that they should fall under the Dev umbrella. Empowering Dev to manage their own resources reduces tension between the departments, manages time and priorities more appropriately, and keeps things running smoothly.

Source: www.docker.com

On the flip side, Docker containers that aren’t used directly for development should fall under the purview of Ops. Database containers are good examples of this type of separation. While the MySQL container may be used by Dev, no development needs to be done directly on it, which makes the separation pretty clear. But what about more specialized containers that developers work directly on, like workers or even (in some instances) web servers? It doesn’t really make sense for either department to have full control over these containers, as developers may need to make changes to the containers themselves (for the sake of development) that would normally fall under the Ops umbrella if there was clear separation between development and production environments.

The best solution I can think of to this particular problem would be joint custody of ambiguous containers. I think the reason this would work well is that it would require clear documentation and communication between Dev and Ops as to how these types of containers are maintained, which would in turn keep everybody happy and on the same page. A possible process that could work well would be for Ops to be responsible for provisioning base containers, with the understanding that the high-level configuration of these types of containers would be manageable by Dev. Because Ops typically handles releases, it would then be back on Ops to approve any changes made by Dev to these containers before deploying. This type of checks-and-balances system would provide a high level of transparency between the two departments, and also maintain a healthy partnership between them.

About the Author

Zachary Flower (@zachflower) is a freelance web developer, writer, and polymath. He has an eye for simplicity and usability, and strives to build products with both the end user and business goals in mind.
From building projects for the NSA to features for Buffer, Zach has always taken a strong stand against needlessly reinventing the wheel, often advocating for using well established third-party and open source services and solutions to improve the efficiency and reliability of a development project.

Blog

New DevOps Community Enables Continuous Delivery Practitioners

Blog

Do Containers Become the DevOps Pipeline?

The DevOps pipeline is a wonderful thing, isn’t it? It streamlines the develop-and-release process to the point where you can (at least in theory, and often enough in practice) pour fresh code in one end, and get a bright, shiny new release out of the other. What more could you want? That’s a trick question, of course. In the contemporary world of software development, nothing is the be-all, end-all, definitive way of getting anything done. Any process, any methodology, no matter how much it offers, is really an interim step; a stopover on the way to something more useful or more relevant to the needs of the moment (or of the moment-after-next). And that includes the pipeline. Consider for a moment what the software release pipeline is, and what it isn’t. It is an automated, script-directed implementation of the post-coding phases of the code-test-release cycle. It uses a set of scriptable tools to do the work, so that the process does not require hands-on human intervention or supervision. It is not any of the specific tools, scripting systems, methodologies, architectures, or management philosophies which may be involved in specific versions of the pipeline. Elements of the present-day release pipeline have been around for some time. Fully automated integrate-and-release systems were not uncommon (even at smaller software companies) by the mid-90s. The details may have been different (DOS batch files, output to floppy disks), and many of the current tools such as those for automated testing and virtualization were not yet available, but the release scripts and the tasks that they managed were at times complex and sophisticated. Within the limited scope that was then available, such scripts functioned as segments of a still-incomplete pipeline. Today’s pipeline has expanded to engulf the build and test phases, and it incorporates functions which at times have a profound effect on the nature of the release process itself, such as virtualization. Virtualization and containerization fundamentally alter the relationship between software (and along with it, the release process) and the environment. By wrapping the application in a mini-environment that is completely tailored to its requirements, virtualization eliminates the need to tailor the software to the environment. It also removes the need to provide supporting files to act as buffers or bridges between it and the environment. As long as the virtualized system itself is sufficiently portable, multi-platform release shrinks from being a major (and complex) issue to a near-triviality. Containerization carries its own (and equally clear) logic. If virtualization makes the pipeline work more smoothly, then stripping the virtual environment down to only those essentials required by the software will streamline the process even more, by providing a lightweight, fully-tailored container for the application. It is virtualization without the extra baggage and added weight. Once you start stripping nonessentials out of the virtual environment, the question naturally arises — what does a container really need to be? Is it better to look at it as a stripped-down virtual box, or something more like a form-fitting functional skin? The more that a container resembles a skin, the more that it can be regarded as a standardized layer insulating the application from (and integrating it into) the environment. At some point, it will simply become the software’s outer skin, rather than something wrapped around it or added to it. 
Containerization in the Pipeline

What does this mean for the pipeline? For one thing, containerization itself tends to push development toward the microservices model; with a fully containerized pipeline, individual components and services become modularized to the point where they can be viewed as discrete components for purposes such as debugging or analyzing a program’s architecture. The model shifts all the way over from the “software as tangle of interlocking code” end of the spectrum to the “discrete modules with easily identifiable points of contact” end. Integration-and-test becomes largely a matter of testing the relationship and interactions of containerized modules.

In fact, if all of the code moving through the pipeline is containerized, then management of the pipeline naturally becomes management of the containers. If the container mediates the application’s interaction with its environment, there is very little point in having the scripts that control the pipeline directly address those interactions on the level of the application’s code. Functional testing of the code can focus on things such as expected outputs, with minimal concern about environmental factors. If what’s in the container does what it’s supposed to do when it’s supposed to do it, that’s all you need to know.

At some point, two things happen to the pipeline. First, the scripts that constitute the actual control system for the pipeline can be replaced by a single script that controls the movement and interactions of the containers. Since containerization effectively eliminates multiplatform problems and special cases, this change alone may simplify the pipeline by several orders of magnitude. Second, the tools used in the pipeline (for integration or functional testing, for example) will become more focused on the containers, and on the applications as they operate from within the containers. When this happens, the containers become, if not the pipeline itself, then at least the driving factor that determines the nature of the pipeline.

A pipeline of this sort will not need to take care of many of the issues handled by more traditional pipelines, since the containers themselves will handle them. To the degree that containers do take over functions previously handled by pipeline scripts (such as adaptation for specific platforms), they will then become the pipeline, while the pipeline becomes a means of orchestrating the containers.

None of this should really be surprising. The pipeline was originally developed to automatically manage release in a world without containerization, where the often tricky relationship between an application and the multiple platforms on which it had to run was a major concern. Containerization, in turn, was developed in response to the opportunities that were made possible by the pipeline. It’s only natural that containers should then remake the pipeline in their own image, so that the distinction between the two begins to fade.

Blog

Automated Testing in a DevOps World

The objective of automated testing is to simplify as much of the testing effort as possible with a minimum set of scripts. Automated testing tools are capable of executing repeatable tests, reporting outcomes, and comparing results with faster feedback to the team. Automated tests perform precisely the same operation each time they are executed, thereby eliminating human error – and they can be run repeatedly, at any time of day. Below I’ve outlined five steps for getting up and running (the right way) with automation.

Step 1: Laying the Foundation

The foundation is the most important element of any building, be it a house or a high-rise. It may seem like a simple part of the overall construction process, but getting the foundation right is incredibly important. Mistakes made in the foundation will only get worse as you go up. It’s known as compounding defects, and it means that mistakes grow. Wait! How does this relate to automation? Proper planning will lay a solid automation foundation for project success; without it, your project will be shaky and a maintenance nightmare. To avoid these potential pitfalls and keep on task, you need a good road map. Start by:

Planning your automation objective
Designing the architecture
Training the team
Developing test scripts
Releasing to the wild

Throughout the onboarding process, it is important to educate everyone. One of the common misconceptions of automation is that automation is a magic bullet – the initial setup and step creation will take time and effort, and automation requires effort to maintain. This needs to be factored into any planning and estimation. The principles of good programming are closely related to principles of good design and engineering. By enforcing standards, you can help developers become more efficient and produce code that is easier to maintain, with fewer defects. It’s a great opportunity to start shaping your new testing portfolio by educating everyone – they need to understand what type of testing belongs at the unit, integration, and API layers when having those conversations during the sprint planning phase.

Step 2: Selecting a Technology

You’ve found the perfect plan and know where you want to build. The excitement is building. It’s time to start thinking about the details that can make all the difference. Choosing the right products and materials for your new home can be overwhelming. The same applies when choosing a testing framework for your automation. It is a critical part of the process. Since there are so many different testing frameworks available, it’s important to create a list of requirements to review when evaluating a framework. To help you, here are some questions to ask as you round out your requirements:

Are you looking for a Behavior Driven Development (BDD) framework for unit and front-end UI testing?
Do you need to support browsers, mobile web, or mobile native apps?
Are you looking to run your Selenium tests locally or in the cloud?
Do you need cross-browser testing?
Do you need a keyword-driven framework for non-technical resources?
Does your team have sufficient programming knowledge for automation development?
Are you practicing continuous integration and need a tool that integrates seamlessly?

Step 3: Configuration

Find the right pro who specializes in exactly the type of work you need done. You would never hire a plumber to do electrical work, right? This stage is critical and can be frustrating with a lack of experience.
It would be ideal to find an automation expert to design your automation architecture, teach the team the fundamental skills of writing quality tests, and provide continuous mentoring. If hiring an expert is not an option, I strongly suggest finding an on-site training course for everyone planning to write tests.

Step 4: The Basics

Every construction trade requires basic knowledge of the craft. I couldn’t even imagine building an interior framing wall without any basic knowledge. The framing basics of your wall include the sill plate on the bottom, the wall studs (which are the vertical beams), and the top sill plate (which is the beam running across the top). It sounds simple in theory, but to get a wall plumb and square takes basic knowledge and a lot of practice. The same applies when writing automation scripts.

One of the basic principles of automation is to understand how to find locators and NEVER use XPaths. It is critical that engineering teams design software with automation in mind. What does that mean exactly? Educate engineers on how to define the unique and predictable locators needed for automation. The most efficient and preferred way to locate an element on a web page is by HTML element ID or CSS selector. The ID and CSS are the safest and fastest locator options and should ALWAYS be the first choice. Only use XPath if that is the ONLY possible way; personally, I see this as a great opportunity to push back on engineering to provide predictable locators for your scripts.

Here is an example of how to write an automation script from predictable HTML elements.

HTML code with well-defined locators:

<div class="grid location-search-result js-store js-search-result" data-showclosed="true" data-substituted="false" data-delivery="false" data-carryout="true" data-online="true" data-orderable="Carryout Delivery" data-open="true" data-type="Carryout" data-storeid="4348" data-services="Carryout">
  <a class="js-orderCarryoutNow js-carryoutAvailable btn btn--block" data-type="Carryout" data-ordertiming="current" href="#/section/Food/category/AllEntrees/">Order Carryout</a>
</div>

Write a script using the Capybara testing framework:

expect(page).to have_css('.js-orderCarryoutNow')
find('.js-orderCarryoutNow').click

Step 5: Let’s Code

FINALLY, the construction begins! Here are my tips for getting started with writing great automation scripts.

Repeatable. Automated scripts must be repeatable. You must be able to measure an expected outcome.
Design tests for scalability. The whole point of automation is to provide rapid feedback. Creating and executing large-scale automated tests needs a thoughtful approach to the elements in the process and a close eye on test design. Keep tests lean and independent.
Easy to understand. Scripts should be readable. Ideally, your scripts also serve as a useful form of design and requirement documentation.
Speed. Your automated tests should run quickly. The end goal is to make the overall development process faster by establishing rapid feedback cycles to detect problems.

Sample test using Capybara:

describe "on the google homepage -", :google, :type => :feature do
  before :each do
    visit 'https://google.com/'
  end

  it "the search input field is present" do
    expect(page).to have_css('input#lst-ib.gsfi')
  end
end

Takeaways

Your road to automation can be difficult, but it’s worth it. To do it well you need the right attitude. Sometimes you will hit roadblocks. If something doesn’t work, you really learn to problem-solve here.
The training phase is so important and an incredible opportunity to master your new craft – pay attention, have the right attitude, adapt, be engaged, and never stop learning…

About the Author

Greg Sypolt (@gregsypolt) is a senior engineer at Gannett and co-founder of Quality Element. For the last five years he has focused on the creation and deployment of automated test strategies, frameworks, tools, and platforms.

Blog

Update On Logging With Docker

A Simpler & Better WayIn New Docker Logging Drivers, I previously described how to use the new Syslog logging driver introduced in Docker 1.6 to transport container logs to Sumo Logic.Since then, there have been improvements to the Syslog logging driver, which now allows users to specify the address of the Syslog server to send the logs to. In its initial release the Syslog logging driver simply logged to the local Syslog daemon, but this is now configurable. We can exploit this in conjunction with the Sumo Logic Collector container for Syslog to make logging with Docker and Sumo Logic even easier.Simply run the Syslog Collector container as previously described:$ docker run -d -p 514:514 -p 514:514/udp \ --name="sumo-logic-collector" \ sumologic/collector:latest-syslog \ [Access ID] [Access key]Now you have a collector running, listening for Syslog on both ports 514/tcp and 514/udp.For every container required to run on the same host, you can now add the following to the Docker run command in order to make the container log to your Syslog collector:--log-driver syslog --log-opt syslog-address=udp://localhost:514Or, in a complete example:$ docker run --rm --name test \ --log-driver syslog --log-opt syslog-address=udp://localhost:514 \ ubuntu \ bash -c 'for i in `seq 1 10`; do echo Hello $i; sleep 1; done'You should now see something along these lines in Sumo Logic:This, of course, works remotely, as well. You can run the Sumo Logic Collector on one host, and have containers on all other hosts log to it by setting the syslog address accordingly when running the container.And Here Is An ErrataIn New Docker Logging Drivers, I described the newly added logging drivers in Docker 1.6. At the time, Docker was only able to log to local syslog, and hence our recommendation for integration was as follows:$ docker run -v /var/log/syslog:/syslog -d \ --name="sumo-logic-collector" \ sumologic/collector:latest-logging-driver-syslog \ [Access ID] [Access Key]This will basically have the Sumo Logic Collector tail the OS /var/log/syslog file. We discovered in the meantime that this will cause issues if /var/log/syslog is being logrotate’d. The container will hang on to the original file into which Syslog initially wrote the messages, and not pick up the new file after the old file was moved out of the way.There’s a simple solution to the issue: mount the directory into the container, not the file. In other words, please do this:$ docker pull sumologic/collector:latest-logging-driver-syslog$ docker run -v /var/log:/syslog -d \ --name="sumo-logic-collector" \ sumologic/collector:latest-logging-driver-syslog \ [Access ID] [Access Key]Or, of course, switch to the above described new and improved approach!

Blog

Choosing the Right Development Environment For You

When starting a project, working as an individual developer provides a level of development freedom that can get quickly complicated when it is time to grow the team. Once you expand to multiple developers, it is critical to maintain a well-documented and structured development environment. In a poorly architected environment, team members will have different experiences and ideas about software development, which can lead to friction amongst developers. The lack of consistency between the different environments makes fixing bugs and developing features a frustrating experience, and leads to the commonly used “works on my machine” excuse. By contrast, a properly structured and documented development environment keeps everyone on the same page and focused on product instead of constantly trying to get things to work. In addition to a more efficient development team, a structured development environment can drastically decrease the time it takes to onboard a new developer (I’ve personally worked at a company where the development environment took the entirety of my first week to set up and configure properly, because it wasn’t properly documented or managed).

While there is no “one size fits all” development environment, the majority of the solutions to the problem are centered around consistency. This is almost always handled through the use of virtual machines; however, how and where they are set up can differ wildly. When it comes down to it, there are two options that determine what and how these machines work: local or remote. In a local development environment, devs run an instance of the code locally on their own machines, which allows them to work independently and without having to rely on a centralized server. Unfortunately, local environments can be limited when trying to replicate more complex production environments, especially if developers are working on underpowered machines. In a remote environment, developers work directly off of a remotely hosted server, rather than locally. This has the added benefit of offering perfect parity with the production environment, but requires developers to have a high-speed internet connection to write and test even the most trivial of changes to the codebase.

The most popular structured local development environment that I have seen is Vagrant. Vagrant, at its core, is a cross-platform virtual machine management tool that allows you to configure and provision virtual machines in a reproducible way. With built-in Puppet and Chef support, Vagrant is an excellent way to set up a brand new development environment with just one command. What makes Vagrant so great is that, rather than passing around machine images, the configuration files are committed directly into a project’s version control system and a base machine is built and configured on the fly, meaning any developer who can clone the codebase is also instantly given the ability to spin up a virtual machine. Because Vagrant runs within a virtual machine that mounts the directory it is configured in, developers can also use any IDE that they are comfortable with, which allows them to spend less time learning new tools and more time focusing on product.

Often, it is desirable for developers to work off of a centralized server (or set of servers), rather than their local machines. Remote development environments are an interesting case in that they can provide a lot more power than local environments and reduce the amount of setup required to almost nothing for new developers.
Depending on the size of the organization, these environments can be set up by someone on either the development or operations team, and can be hosted almost anywhere. The two most common setups I have seen for remote development environments are shared servers and private servers. In a shared server environment, every developer shares the same machine with their own distinct logins and subdomains. This is a good solution for organizations with limited resources that self-host their servers, as it may not be feasible to have a dedicated private server for each developer. When available, private servers are the perfect solution for remote development environments because they can provide 100% parity with production environments and, much like Vagrant, can be spun up at the click of a button. The biggest problem with private servers, however, is that an internet connection is required to use them. In a local environment, developers could theoretically work off the grid, but in a remote environment, a high-speed internet connection is always required to get work done. Another, smaller issue is remote access doesn’t always play well with every IDE. Many don’t provide great remote access functionality, if they provide any at all, requiring developers to either use a new IDE or cook up hacky solutions to use the IDE they’re used to using. In a perfect world in which developers have access to sufficiently powerful machines, I would recommend Vagrant 100% of the time. Because it is cross-platform, organizations can take an OS-agnostic approach to personal computers, and the automated and simple setup allows for quicker developer onboarding. While Vagrant can have some speed drawbacks, the lack of internet requirement is a huge bonus, removing “my internet is down” from the list of things that can go wrong in a project.
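For illustration, here is a minimal Vagrantfile sketch of the kind that gets committed alongside the code; the box name, forwarded port, and provisioning step are assumptions for the example, not a recommendation for any particular stack:

# Checked into the project root; "vagrant up" builds an identical VM for every developer
Vagrant.configure("2") do |config|
  # Base image each developer starts from (placeholder box)
  config.vm.box = "ubuntu/trusty64"

  # Reach the app running in the guest from the host browser
  config.vm.network "forwarded_port", guest: 80, host: 8080

  # Provision the machine the same way every time it is created
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y nginx
  SHELL
end

Any developer who clones the repository can run "vagrant up" and get the same environment, which is exactly the reproducibility argument made above.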

Blog

Has SIEM Lost its Magic?

Blog

Change Management in a Change-Dominated World

DevOps isn’t just about change — it’s about continuous, automated change. It’s about ongoing stakeholder input and shifting requirements; about rapid response and fluid priorities. In such a change-dominated world, how can the concept of change management mean anything? But maybe that’s the wrong question. Maybe a better question would be this: Can a change-dominated world even exist without some kind of built-in change management?

Change management is always an attempt to impose orderly processes on disorder. That, at least, doesn’t change. What does change is the nature and the scope of the disorder, and the nature and the scope of the processes that must be imposed on it. This is what makes the DevOps world look so different, and appear to be so alien to any kind of recognizable change management. Traditional change management, after all, seems inseparable from waterfall and other traditional development methodologies. You determine which changes will be part of a project, you schedule them, and there they are on a Gantt chart, each one following its predecessor in proper order. Your job is as much to keep out ad-hoc chaos as it is to manage the changes in the project.

And in many ways, Agile change management is a more fluid and responsive version of traditional change management, scaled down from project-level to iteration-level, with a shifting stack of priorities replacing the Gantt chart. Change management’s role is to determine if and when there is a reason why a task should move higher or lower in the priority stack, but not to freeze priorities (as would have happened in the initial stages of a waterfall project). Agile change management is priority management as much as it is change management — but it still serves as a barrier against the disorder of ad-hoc decision-making. In Agile, the actual processes involved in managing changes and priorities are still in human hands and are based on human decisions.

DevOps moves many of those management processes out of human hands and places them under automated control. Is it still possible to manage changes or even maintain control over priorities in an environment where much of the on-the-ground decision-making is automated? Consider what automation actually is in DevOps — it’s the transfer of human management policies, decision-making, and functional processes to an automatically operating computer-based system. You move the responsibilities that can be implemented in an algorithm over to the automated system, leaving the DevOps team free to deal with the items that need actual, hands-on human attention.

This immediately suggests what naturally tends to happen with change management in DevOps. It splits into two forks, each of which is important to the overall DevOps effort. One fork consists of change management as implemented in the automated continuous release system, while the other fork consists of human-directed change management of the somewhat more traditional kind. Each of these requires first-rate change management expertise on an ongoing basis. It isn’t hard to see why an automated continuous release system that incorporates change management features would require the involvement of human change management experts during its initial design and implementation phases. Since the release system is supposed to incorporate human expertise, it naturally needs expert input at some point during its design.
Input from experienced change managers (particularly those with a good understanding of the system being developed) can be extremely important during the early design phases of an automated continuous release system; you are in effect building their knowledge into the structure of the system. But DevOps continuous release is by its very nature likely to be a continually changing process itself, which means that the automation software that directs it is going to be in a continual state of change. This continual flux will include the expertise that is embodied in the system, which means that its frequent revision and redesign will require input from human change management experts.

And not all management duties can be automated. After human managers have been relieved of all of the responsibilities that can be automated, they are left with the ones that for one reason or another do not lend themselves well to automation — in essence, anything that can’t be easily turned into an algorithm. This is likely to include at least some (and possibly many) of the kinds of decisions that fall under the heading of change management. These unautomated responsibilities will require someone (or several people) to take on the role of change manager.

And DevOps change management generally does not take its cue from waterfall in the first place. It is more likely to be a lineal descendant of Agile change management, with its emphasis on managing a flexible stack of priorities during the course of an iteration, and not a static list of requirements that must be included in the project. This kind of priority-balancing requires more human involvement than does waterfall’s static list, which means that Agile-style change management is likely to result in a greater degree of unautomated change management than one would find with waterfall.

This shouldn’t be surprising. As the more repetitive, time-consuming, and generally uninteresting tasks in any system are automated, it leaves greater time for complex and demanding tasks involving analysis and decision-making. This in turn makes it easier to implement methodologies which might not be practical in a less automated environment. In other words, human-based change management will now focus on managing shifting priorities and stakeholder demands, not because it has to, but because it can.

So what place does change management have in a change-dominated world? It transforms itself from being a relatively static discipline imposed on an inherently slow process (waterfall development) to an intrinsic (and dynamic) part of the change-driven environment itself. DevOps change management manages change from within the machinery of the system itself, while at the same time allowing greater latitude for human guidance of the flow of change in response to the shifting requirements imposed by that change-driven environment. To manage change in a change-dominated world, one becomes the change.

September 4, 2015

Blog

Why Twitter Chose Sumo Logic to Address PCI Compliance

Blog

Good sourceCategory, bad sourceCategory

Setting sourceCategory values, especially for a small set of sources, may seem trivial at first. Good sourceCategory values are, however, indispensable for scale and performance in the long term. This blog post discusses some best practices around sourceCategory values. Source categories have three main purposes:

1) Scoping your searches
2) Indexing (partitioning) your data
3) Controlling who sees what data (RBAC)

Our recommendation for sourceCategory values follows this nomenclature: component1/component2/component3…, starting with the least descriptive, highest-level grouping and getting more descriptive with each component, the full value describing the subset of data in detail.

For example, assume you have several different firewall appliances, ASA and FWSM from Cisco and 7050 from Palo Alto Networks. In addition, you also have a Cisco router, 800 series. Following the above nomenclature we could set the following values (instead of simply using "FWSM", "ASA", etc.):

Networking/Firewall/Cisco/FWSM
Networking/Firewall/Cisco/ASA
Networking/Firewall/PAN/7050
Networking/Router/Cisco/800

While the components at the beginning of the value do not add any obvious value, they do provide a high-level grouping of this data. This allows us to:

1) Easily and effectively define the scope of our search: _sourceCategory=Networking/Firewall/* (all firewall data) or _sourceCategory=Networking/*/Cisco/* (all Cisco data). With one sourceCategory specification and wildcards we can find the subset of data we need without any need for boolean logic (OR).

2) If we wanted to create a separate index for the networking data for better performance, we can specify an index with the following routing expression: _sourceCategory=Networking*. Since indexes cannot be modified (they can only be disabled and recreated with a new name and/or routing expression), we want to make sure that we do not have to modify them (and re-educate all users) unless something major changes. Using high-level groups with wildcards to specify the index will self-maintain and help drive adoption of the indexes with your users.

3) Similar to the indexing, if you wanted to restrict access to this data you can now use the high-level values, reducing the amount of rule management needed as you add more data.

High-level groupings can be built from a variety of items, for example environment details (prod vs. dev), geographical information (east vs. west), application, business unit or any other value that makes sense for your data. The order in which we use these values is determined by how you are searching the data. For example, if most of your use cases do not need data from both prod and dev environments, you could use:

Prod/Web/Apache/Access
Dev/Web/Apache/Access
Prod/DB/MySQL/Error
Dev/DB/MySQL/Error

You can still search across both when needed, but this scheme splits all your data up into prod and dev more intuitively. If, on the other hand, you do have a need to search this data together frequently, you could use:

Web/Apache/Access/Prod
Web/Apache/Access/Dev
DB/MySQL/Error/Prod
DB/MySQL/Error/Dev

This simple change completely changes your high-level grouping. Both schemes cover both use cases; the difference is which one looks more intuitive for the majority of your use cases.
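As a small, hedged illustration of the second scheme, here is a query sketch that searches the prod and dev Apache access logs together and then breaks the results out by environment (the category values are the example values from above):

// Search prod and dev access logs together, then break the results out by environment
_sourceCategory=Web/Apache/Access/*
| count by _sourceCategory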

August 27, 2015

Blog

Deploying “Hello, World!” DevOps Style

Blog

2 Key Principles for Creating Meaningful Alerts

Blog

Delivering a Speedy Site: Sumo Logic On Top of CDN Logs!

Blog

Change Management in a Change-Dominated World

DevOps isn't just about change -- it's about continuous, automated change. It's about ongoing stakeholder input and shifting requirements; about rapid response and fluid priorities. In such a change-dominated world, how can the concept of change management mean anything? But maybe that's the wrong question. Maybe a better question would be this: Can a change-dominated world even exist without some kind of built-in change management?

Change management is always an attempt to impose orderly processes on disorder. That, at least, doesn't change. What does change is the nature and the scope of the disorder, and the nature and the scope of the processes that must be imposed on it. This is what makes the DevOps world look so different, and appear to be so alien to any kind of recognizable change management. Traditional change management, after all, seems inseparable from waterfall and other traditional development methodologies. You determine which changes will be part of a project, you schedule them, and there they are on a Gantt chart, each one following its predecessor in proper order. Your job is as much to keep out ad-hoc chaos as it is to manage the changes in the project. And in many ways, Agile change management is a more fluid and responsive version of traditional change management, scaled down from project-level to iteration-level, with a shifting stack of priorities replacing the Gantt chart. Change management's role is to determine if and when there is a reason why a task should move higher or lower in the priority stack, but not to freeze priorities (as would have happened in the initial stages of a waterfall project). Agile change management is priority management as much as it is change management -- but it still serves as a barrier against the disorder of ad-hoc decision-making.

In Agile, the actual processes involved in managing changes and priorities are still in human hands and are based on human decisions. DevOps moves many of those management processes out of human hands and places them under automated control. Is it still possible to manage changes or even maintain control over priorities in an environment where much of the on-the-ground decision-making is automated? Consider what automation actually is in DevOps -- it's the transfer of human management policies, decision-making, and functional processes to an automatically operating computer-based system. You move the responsibilities that can be implemented in an algorithm over to the automated system, leaving the DevOps team free to deal with the items that need actual, hands-on human attention. This immediately suggests what naturally tends to happen with change management in DevOps. It splits into two forks, each of which is important to the overall DevOps effort. One fork consists of change management as implemented in the automated continuous release system, while the other fork consists of human-directed change management of the somewhat more traditional kind. Each of these requires first-rate change management expertise on an ongoing basis.

It isn't hard to see why an automated continuous release system that incorporates change management features would require the involvement of human change management experts during its initial design and implementation phases. Since the release system is supposed to incorporate human expertise, it naturally needs expert input at some point during its design. Input from experienced change managers (particularly those with a good understanding of the system being developed) can be extremely important during the early design phases of an automated continuous release system; you are in effect building their knowledge into the structure of the system. But DevOps continuous release is by its very nature likely to be a continually changing process itself, which means that the automation software that directs it is going to be in a continual state of change. This continual flux will include the expertise that is embodied in the system, which means that its frequent revision and redesign will require input from human change management experts.

And not all management duties can be automated. After human managers have been relieved of all of the responsibilities that can be automated, they are left with the ones that for one reason or another do not lend themselves well to automation -- in essence, anything that can't be easily turned into an algorithm. This is likely to include at least some (and possibly many) of the kinds of decisions that fall under the heading of change management. These unautomated responsibilities will require someone (or several people) to take the role of change manager. And DevOps change management generally does not take its cue from waterfall in the first place. It is more likely to be a lineal descendant of Agile change management, with its emphasis on managing a flexible stack of priorities during the course of an iteration, and not a static list of requirements that must be included in the project. This kind of priority-balancing requires more human involvement than does waterfall's static list, which means that Agile-style change management is likely to result in a greater degree of unautomated change management than one would find with waterfall.

This shouldn't be surprising. As the more repetitive, time-consuming, and generally uninteresting tasks in any system are automated, more time is left for complex and demanding tasks involving analysis and decision-making. This in turn makes it easier to implement methodologies which might not be practical in a less automated environment. In other words, human-based change management will now focus on managing shifting priorities and stakeholder demands, not because it has to, but because it can.

So what place does change management have in a change-dominated world? It transforms itself from being a relatively static discipline imposed on an inherently slow process (waterfall development) to an intrinsic (and dynamic) part of the change-driven environment itself. DevOps change management manages change from within the machinery of the system itself, while at the same time allowing greater latitude for human guidance of the flow of change in response to the shifting requirements imposed by that change-driven environment. To manage change in a change-dominated world, one becomes the change.

Blog

Why You Should Add Wire Data to Your Sumo Logic

By Chris Abella, Technical Marketing Engineer, ExtraHop Networks

My coworkers and I hooked our coffee maker up to the office Wi-Fi the minute we found out we could (we’re engineers after all). We’re located in Seattle, so the chances we’d buy a coffee machine that can’t be connected to the Internet are slim. We got excited when it showed up on our local ExtraHop which monitors the office network, but the wind went out of our sails a bit when all it did was an XML-over-TCP heartbeat to the manufacturer on some ephemeral port. For days. Oh well, on to actual work. As we connect more and more devices to our networks, and as interconnections and communications between our systems and applications increase, monitoring and managing these devices increases in complexity and importance. It’s not just coffee makers jumping on the Internet, but industrial and warehousing equipment, even cars. Planning ahead, what does our monitoring strategy look like as we implement the new Internet of Things? It isn’t tied to a single monitoring methodology, but instead is going to require a best-of-breed approach, and combining ExtraHop’s wire data analytics with Sumo Logic’s machine data analytics lays the foundation for next-generation monitoring.

Machine Data + Wire Data

Logging has been king in IT ops and will continue to be instrumental going forward. Developers know the apps inside and out and can provide an internal view of not just code flow but business flow. Sumo Logic brings intelligence to the Apache access files you already have, and, with even small modifications, the logs’ usefulness is greatly improved. Sumo Logic makes workflows like parsing a key like timestamp, client_ip, or request_uid from error.log and correlating it to access.log a snap; no sed, grep, or awk required. On the flip side, Sumo Logic has made integrations with off-the-shelf technologies like S3, CloudTrail, and even OSX simple. But there is more to see than what is available in your logs, and that’s where wire data comes in. Wire data is all the information flowing over your networks. Whereas machine data is all the log files stored on discrete servers or services, wire data is the communication passed between those discrete elements. HTTP? That’s wire data. FTP? Also wire data. The same goes for our coffee machine’s TCP heartbeat. Here at ExtraHop, we specialize in real-time wire data analytics at scale (I heard Sumo Logic is into scale). With the ExtraHop platform’s Open Data Stream, you can send your wire data to Sumo Logic for correlation with machine data. The ExtraHop appliance analyzes a mirrored copy of your network traffic to extract real-time wire data. Combining ExtraHop and Sumo Logic, you can troubleshoot failed order transactions by pulling the error and request information from your Apache logs (machine data), stack traces from application logs (machine data), POST parameters (order ID, user ID) from the HTTP request (wire data), HTTP status codes (machine and wire data), HTTP headers (wire data), and TCP aborts (wire data). You could also add performance metrics like processing time (wire and machine data), network latency (wire data), and TCP issues like retransmission timeouts and throttling (wire data). The power of wire data is that it is application agnostic. When you use machine and wire data in conjunction with each other, you get a comprehensive view of your application, inside and outside. And if you find yourself wanting metrics that weren’t included in the logs, you can often pull them from the wire instead of going through dev and QA.

While we didn’t see anything other than a heartbeat from the coffee machine in those first few days, writing a Trigger to extract methods and messages and push them into Sumo Logic is trivial, and with Sumo Logic’s tools like LogReduce and anomaly detection, the day our coffee machine decides it’s time for an unscheduled tune-up, we’ll be the first to know. If you’re like us, you want to understand what everything in your environment is doing, including your coffee maker, whether you’re implementing your own in-house tech or buying off-the-shelf. When you combine multiple sources of data, you can drive quicker troubleshooting and optimization, deliver business insights, and deliver more secure applications, all built on more visibility. Interested in trying out ExtraHop? Check out our interactive demo or request your own free virtual appliance.

Who am I? I’m a Technical Marketing Engineer at ExtraHop Networks, a wire data analytics company based in Seattle, Washington. I’ve been lucky enough to see the rise of the next generation of IT operational intelligence from the inside, and get to build solutions that leverage many of these technologies, including integrating the ExtraHop and Sumo Logic platforms. When I’m not at work, you can find me dancing at various music festivals, maining support in LoL, or running/biking around the Pacific Northwest.
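As a rough illustration of the kind of correlation described above, the sketch below compares event volume from a machine data source and a wire data source side by side. The source category names (web/apache/access and extrahop/http) are hypothetical placeholders for however you tag the Apache logs and the ExtraHop Open Data Stream feed in your own account:

// hypothetical categories for Apache logs (machine data) and ExtraHop events (wire data)
_sourceCategory=web/apache/access OR _sourceCategory=extrahop/http
| timeslice 5m
| count by _sourceCategory, _timeslice
| transpose row _timeslice column _sourceCategory

A divergence between the two series (for example, requests visible on the wire that never show up in the access logs) is often the first hint of where to dig deeper.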

August 20, 2015



Blog

CSI: Cloud (Pilot coming soon to your AWS stack.)

AWS

August 17, 2015

Blog

Centralizing Your Application Log Data

Blog

Monitoring AWS Auto Scaling and Elastic Load Balancers with Log Analytics

In the first article of this series, we introduced the basics of AWS log analytics by correlating logs between S3 and CloudTrail on an Apache application running atop an EC2 server. But, the whole point of being in the cloud is to have a dynamic infrastructure. Instead of being bound to physical servers, you’re able to grow (and shrink) your web application on-demand. To get the most out of AWS, you need to be adding or removing EC2 instances to match the size of your audience. AWS provides an Auto Scaling service to do just that. Putting a group of auto-scaling Apache servers behind an Elastic Load Balancer creates a highly available and highly scalable infrastructure for your web application. However, EC2 instances are expensive, which raises a new problem: can you trust AWS Auto Scaling to manage your infrastructure for you? The only way you can say “yes” to this question is if you have a clear window into your Auto Scaling activities. The ability to see how many EC2 instances you’re paying for at any given time, as well as when and why they were created, gives you the necessary peace of mind to automatically scale your web application. As in the previous article, this information can be extracted from your application’s log data. This article takes full stack visibility one step further by adding Auto Scaling and Elastic Load Balancing to our example scenario. We’ll learn how to define key performance indicators (KPIs) that are tailored to your specific needs and monitor them with real-time dashboards and alerts. This not only aids in troubleshooting your system, but also provides the necessary insight to optimize your Auto Scaling behavior. Scenario This article assumes that you’re running a collection of Apache servers on auto-scaled EC2 instances behind an Elastic Load Balancer. As in the previous article, we also assume that you’re monitoring your AWS administration activity with CloudTrail. This stack is much more complicated than in the previous article. Instead of a single web server and EC2 instance, you now have an arbitrary number of them. This means that a centralized log analyzer is no longer an optional component of your toolchain—SSH’ing into individual machines and manually inspecting log files is now virtually impossible. We also have new potential points of failure: ELB could be routing requests incorrectly, your Auto Scaling algorithm could be creating too many or too few EC2 instances, and any one of those instances could be broken. In addition, when an EC2 instance gets deleted by AWS Auto Scaling, all of those log messages disappear. If you’re not collecting those logs with a centralized tool, the information is lost forever. You have no way to see if that server caused an unnecessary EC2 creation or deletion event due to a traffic spike or an obscure error. As your web application scales, log analytics becomes more and more important because the system is so complex you typically don’t even know when something isn’t working. As a result, you don’t know when to start troubleshooting as we did in the previous article. Instead, we need to be proactive about monitoring our infrastructure. For instance, when you have a hundred Apache servers, you won’t notice when one of them goes down. Of course, this also means that you won’t know when you’re wasting money on an EC2 instance that isn’t having any impact on your customers. But, if we have a real-time dashboard showing us the traffic from every EC2 instance, it’s trivial to identify wasted EC2 instances. 
Monitoring + Tuning: A Virtuous Cycle

Optimizing your Auto Scaling algorithm directly affects your bottom line because it means you aren’t wasting money on unnecessary resources, while also ensuring that you’re not losing customers due to an underperforming web application. Tuning is all about correlating the number of EC2 nodes with business metrics like page load times and number of requests served. This implies that you have visibility into your operational and business metrics, which is where log analytics comes into play. Apache and ELB logs provide traffic information, while CloudTrail records EC2 creation and deletion events. Correlating events from all these components provides key insights that are impossible to see when examining the logs of any one component in isolation. It’s also important to remember that tuning isn’t a one-time event. It’s an ongoing process. Every time you push code or get a large influx of users, there’s a chance that it will change your CPU/memory/disk usage and disrupt your finely tuned Auto Scaling algorithm. In other words, it’s not just the Auto Scaling algorithm that needs to be optimized, it’s the behavior of your entire system. Monitoring your infrastructure with proactive alerts and real-time dashboards means you’ll catch optimization opportunities much sooner and with much less effort. In this article, we’ll learn how to monitor an AWS web stack with real-time dashboards that tell us exactly what we want to know about our load balancing and auto scaling behavior. This visibility gives you the confidence you need to let Auto Scaling manage your EC2 creation and deletion.

Customizing your KPIs

Log analytics is designed to monitor a complex, dynamic system. As your infrastructure changes, your log analytics instrumentation needs to change with it. The ability to define custom KPIs based on your unique needs is a critical skill if you want your log analytics tool to stay relevant as your application evolves. The rest of this article walks through this process with an example AWS web stack. Remember that these queries are just a starting point—feel free to alter parameters to analyze different metrics more suited to your specific needs.

ELB Response Time

Elastic Load Balancer logs contain three types of response time metrics: the time from the load balancer to the backend instance, the backend instance’s processing time, and the time from the backend instance back to the load balancer. These three values give you a broad overview of how your system is performing as a whole. The above chart shows a spike in backend processing time, which tells you that something is wrong with your backend EC2 instances (i.e., your web servers) as opposed to a problem with ELB. But notice that this query isn’t meant to identify the root cause of a problem. It’s only a high-level window into your EC2 performance. The real value in full stack AWS log analytics is the ability to compare this chart with the rest of the ones we’re about to create. This chart was generated with the following Sumo Logic query. The idea is to save this query as a panel in a custom dashboard alongside other important metrics like web traffic requests and the number of active EC2 instances. Having all these panels in one place makes it much easier to find correlations between different layers of your stack.
_sourceCategory=aws_elb
| parse "* * *:* *:* * * * * * * * \"* *://*:*/* HTTP" as f1, elb_server, clientIP, port, backend, backend_port, request_pt, backend_pt, response_pt, ELB_StatusCode, be_StatusCode, rcvd, send, method, protocol, domain, server_port, path
| timeslice by 1m
| avg(request_pt) as avg_request_pt, avg(backend_pt) as avg_backend_pt, avg(response_pt) as avg_response_pt by _timeslice
| fields _timeslice, avg_request_pt, avg_backend_pt, avg_response_pt

ELB Traffic by Requests and Volume

Next, let’s take a look at the web traffic being served through ELB. The following chart shows the traffic volume in bytes received and bytes sent, along with the number of requests served. This sheds a little more light on the graph from the previous section. It seems our spike in backend processing time was caused by an influx of web traffic, as shown by the spike in both requests and bytes received. This is valuable information, as we now know that the problem wasn’t with a slow script on our web servers, but rather an issue with Auto Scaling. Our system simply wasn’t big enough to handle the spike in traffic, and our Auto Scaling algorithm didn’t compensate quickly enough. The above chart was created with the following query:

_sourceCategory=aws_elb
| parse "* * *:* *:* * * * * * * * \"* *://*:*/* HTTP" as f1, elb_server, clientIP, port, backend, backend_port, request_pt, backend_pt, response_pt, ELB_StatusCode, be_StatusCode, rcvd, send, method, protocol, domain, server_port, path
| timeslice by 1m
| sum(rcvd) as bytes_received, sum(send) as bytes_sent, count as requests by _timeslice

One of the common metrics for defining AWS’s Auto Scaling algorithm is request frequency. This query shows you requests per minute, as well as another important metric: traffic volume. A web application serving a small number of very large requests won’t scale correctly if EC2 instances are scaled only on request frequency. This is the kind of visibility that gives you the confidence to let Auto Scaling take care of EC2 instance creation and deletion for you.

Number of Requests by Backend EC2 Instance

Both of the above queries presented average values across your EC2 infrastructure. They provide a primitive system for monitoring your Auto Scaling behavior, but you can take it a step further by identifying stray EC2 instances that aren’t working well compared to the rest of the system. The following query displays the number of requests served to individual EC2 instances. EC2 instances that aren’t serving as much traffic as the rest of the system are easily identified by lower lines on the graph. This can indicate ELB misconfiguration, large requests that take so long to serve that your load balancer stopped sending the instance new requests, or a hanging script on your web server. Whatever the reason, EC2 instances serving an unusually low amount of requests are wasting money and need to be optimized.

_sourceCategory=aws_elb
| parse "* * *:* *:* * * * * * * * \"* *://*:*/* HTTP" as f1, elb_server, clientIP, port, backend, backend_port, request_pt, backend_pt, response_pt, ELB_StatusCode, be_StatusCode, rcvd, send, method, protocol, domain, server_port, path
| timeslice 1m
| count as requests by backend, _timeslice
| transpose row _timeslice column backend

As you can see from the underlying query, we’re not actually analyzing every web server directly. Instead, we’re extracting the backend EC2 instance from each ELB log and dividing up the traffic based on that value.
Since the ELB logs already have all the request information we’re looking for, this is a bit more convenient than pulling logs from individual servers. However, with a tool like Sumo Logic, analyzing logs from multiple sources isn’t that much harder. This would be useful, say, if we were looking at custom application logs instead of web server requests.

Processing Time by Backend EC2 Instance

We can get another view on our EC2 instances by analyzing backend processing time. This will identify a different set of issues than the previous query, which only looked at the number of requests, as opposed to how long it took to serve them. Slow instances can be caused by anything from a server running old, unoptimized code to hardware issues, or even a malicious user performing a DoS attack on a single instance. Note that many servers with high latency won’t fail a health check, so this query finds optimization opportunities that simpler monitoring techniques won’t catch.

_sourceCategory=aws_elb
| parse "* * *:* *:* * * * * * * * \"* *://*:*/* HTTP" as f1, elb_server, clientIP, port, backend, backend_port, request_pt, backend_pt, response_pt, ELB_StatusCode, be_StatusCode, rcvd, send, method, protocol, domain, server_port, path
| timeslice 1m
| avg(backend_pt) as avg_processing_time by backend, _timeslice
| transpose row _timeslice column backend

This screenshot shows that all our servers are responding more slowly to requests after the influx of traffic. It tells us that the problem is system-wide, rather than isolated in any particular server or group of servers. This is further confirmation that our infrastructure is simply too small to accommodate the influx of traffic.

EC2 Creation and Deletion Events

Now we get into the meaty part of full stack log analytics. By examining CloudTrail logs, we can figure out when EC2 instances were created or deleted. These events are incredibly important because they directly influence your bottom line. This chart shows that a single web server was allocated after the spike in traffic. It tells us that our Auto Scaling creates instances in the right direction, but not enough of them to accommodate large changes in traffic. From this, we can infer that our Auto Scaling algorithm is not sensitive enough. Our next and final query in this article will confirm this hypothesis. The underlying query tallies up RunInstances, StartInstances, StopInstances, and TerminateInstances events from CloudTrail logs, which signify EC2 instance creation and deletion events:

_sourceCategory=aws_cloudtrail
| json auto
| where eventname = "RunInstances" OR eventname = "StartInstances" OR eventname = "StopInstances" OR eventname = "TerminateInstances"
| parse regex "requestParameters\"\:\{\"instancesSet\"\:\{\"items\"\:\[(?<instances>.*?)\]"
| parse regex field=instances "\{\"instanceId\"\:\"(?<instance>.*?)\"" multi
| if(eventname = "RunInstances" OR eventname = "StartInstances", 1, -1) as instance_delta
| timeslice 1m
| sum(instance_delta) as change by _timeslice

Note that this panel is only useful when you compare it to the rest of our dashboard. Knowing that an EC2 instance was created isn’t really helpful on its own, but if you can see that it was created because of an x percentage increase in web traffic, suddenly you have the means to start testing much more sophisticated Auto Scaling algorithms and validating the results in real time.

Number of Backend EC2 Instances with Requests

Ultimately, the number of EC2 instances that you have is what’s going to determine your bottom line.
Pivoting this metric against other values in your log data makes sure that the money you’re spending on EC2 instances is actually impacting your user experience. The following panel shows the total number of web requests served by your application, overlaid on the number of active EC2 instances during any given time period. This is the holy grail of Auto Scaling monitoring. With one look at this chart, we can conclude that our EC2 instance creation isn’t keeping up with our web traffic. The solution is to tweak our Auto Scaling monitoring to be more sensitive to request frequency.

_sourceCategory=aws_elb
| parse "* * *:* *:* * * * * * * * \"* *://*:*/* HTTP" as f1, elb_server, clientIP, port, backend, backend_port, request_pt, backend_pt, response_pt, ELB_StatusCode, be_StatusCode, rcvd, send, method, protocol, domain, server_port, path
| timeslice by 1m
| count as requests, count_distinct(backend) as instances by _timeslice

The above query extracts the number of active EC2 instances during the specified time period. You can pair this with CloudTrail’s EC2 creation/deletion events in the previous section for a more precise picture of your application’s Auto Scaling activity. Also keep in mind that the chart from the previous section will still find optimization opportunities that this query won’t, because it shows the frequency of EC2 creation and deletion events. This is valuable information, as it ensures your algorithm isn’t allocating and deallocating EC2 instances too quickly.

A Brief AWS Auto Scaling Case Study

Optimizing the number of EC2 instances supporting your web application is about finding the right balance between speed and cost. If you have too few EC2 instances, your UX might be slow enough that you’re forcing users away from your site. On the other hand, if you have too many EC2 instances, you’re wasting money on diminishing returns. We’ll conclude this article with what to look out for when it comes to tuning your AWS Auto Scaling algorithm. Notice that it would be very difficult to identify either of the following scenarios without the custom dashboard that we just set up.

Auto Scaling Not Sensitive Enough

This is the scenario that we’ve been using throughout this article. When your Auto Scaling algorithm isn’t sensitive enough, you’ll see continuous increases in backend processing time, request frequency, and/or traffic volume that aren’t compensated for by new EC2 instances. The panels in the following dashboard show a typical scenario of Auto Scaling not responding quickly enough to a growing audience. Again, looking only at the EC2 creation events in isolation won’t tell you if your Auto Scaling is actually working. When traffic spiked, we still had a creation event, which would seem to tell us that our algorithm is working. However, further examination clearly showed that we didn’t create enough new instances to accommodate the traffic.

Auto Scaling Too Sensitive

On the opposite end of the spectrum, you can have an AWS Auto Scaling algorithm that’s too sensitive to changes in your traffic. An optimized algorithm should be able to absorb temporary spikes in traffic without creating unnecessary EC2 instances. Consider the following scenario: Instead of a continuous increase, this dashboard shows a brief spike followed by a return to baseline traffic volume. The two top-left panels tell us that this triggered the creation of a new server, but that server stuck around after the spike subsided.
To optimize this algorithm, you can either make it less sensitive to changes in traffic so that a new server is never created, or you can make sure that it’s sensitive in both directions. When usage drops back down to normal levels, Auto Scaling should delete the extra EC2 instance that is no longer necessary.

Conclusion

AWS Auto Scaling is about trust. This article demonstrated how log analytics facilitates that trust by reliably monitoring your entire AWS web application. We set up a dashboard of custom KPIs tailored to monitor our specific stack. With one look at this dashboard, we had all the information required to assess our AWS Auto Scaling behavior. However, Auto Scaling is only one use case for log analytics. The value of a tool like Sumo Logic comes from its ability to find relationships between arbitrary components of your IT infrastructure. In this article, we found correlations in our ELB logs, CloudTrail, EC2 instances, and Auto Scaling behavior, but this exact same methodology can be applied to other aspects of your system. For instance, if you added the Amazon CloudFront CDN on top of our existing infrastructure, you might find that a particular group of misconfigured backend servers is causing cache misses. The point is, full stack log analytics lets you optimize your infrastructure in ways you probably never even knew were possible. And, since you pay for that infrastructure, these optimizations directly affect your bottom line.
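To give a concrete flavor of the CloudFront example above, the sketch below is a hypothetical starting point. It assumes CloudFront access logs are collected under a _sourceCategory of aws_cloudfront (an assumed name) and simply trends cache misses over time; correlating the result with the per-backend panels built earlier would show whether particular backends are behind the misses:

// hypothetical CloudFront access logs; count cache misses per hour
_sourceCategory=aws_cloudfront "Miss"
| timeslice 1h
| count by _timeslice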

AWS

August 13, 2015

Blog

Explore your Data with Interactive Dashboards

We are excited about one of our newest features, Interactive Dashboards. Interactive Dashboards is the second major release of a longer-term project for changing the way Sumo Logic dashboards work. Based on the feedback from you, our users, we are working to make our dashboarding technology more flexible and easier to use. In this blog article we’ll discuss how the new dashboards work, and how they can help you get more out of your data.

What are Interactive Dashboards?

Interactive Dashboards are built on the power of our search engine. That means powerful analytics with lots of flexibility. It also means that you can look at historical data immediately, explore the data more easily, and make changes to your dashboards without worrying about delays. So, let’s talk details.

Interactive Dashboards help you Explore your data

There are really three major challenges that we have tried to solve with this technology – making it easier to look at the data, opening up historical exploration of data, and navigating data with simple-to-use filters.

Free Users from the Query Language

Many Sumo Logic users cringe when they see the search window, and that’s OK. Their use of Sumo Logic shouldn’t require knowing the intricacies of parsing or any familiarity with regex. Interactive Dashboards helps those users. Now subject matter experts can use their experience to build great dashboards, and point novice users at those dashboards. No need to know the queries. No need to understand the underlying logs.

Consider your history

One major request from our customers has been to use dashboards to look at things that happened last night, last week, or a month ago. Now users can apply any date range to an Interactive Dashboard (within your data retention period, of course), and see the data. For example, let’s say you want to compare last week’s user visits to this week’s. No problem. Look at a web server issue last Tuesday. No problem. Any user viewing an Interactive Dashboard can set the time range on individual panels or reset the whole dashboard.

Reduce the Noise

One of the most requested features for dashboards has been filters. Now you can filter on any field in your data, whether you pulled that field out in the query itself, or it was pulled out on ingest. So, why does it matter? Now you can create dashboards and give your colleagues a way to easily navigate the data. For example, let’s say you are creating a dashboard to look at web statistics. The problem is that you have 10 web sites with a few apps on each of them. With Interactive Dashboards you can create a more generic dashboard with a filter for web site and another for app. Now you get a global view, plus any user can dig into the data without understanding anything about the structure of the data or the complexity of your queries!

Worried about Unauthorized Access to Data – Worry no more

One of the problems with streaming dashboards is that by their nature there is one stream – and that stream has to be based on some user’s view. So, with our traditional dashboards, if you shared a dashboard, you shared the data. Now, with Interactive Dashboards, that is no longer the case. The new dashboards take any access controls you have set up into account. For example, if you have shared a security dashboard that has sensitive data, users in Roles that cannot see that sensitive data will not see it in the dashboards either.

In conclusion, as I mentioned above, Interactive Dashboards is only one of many steps we will be taking to make our dashboards more powerful and useful for you. As you try out Interactive Dashboards, we look forward to your comments and feedback.
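As a rough sketch of the kind of panel query that benefits from filters, consider the web-statistics example above. The field names (site and app) and the log format are hypothetical; the point is that any field parsed in the query, or extracted at ingest, can be exposed as a dashboard filter:

// hypothetical access logs with site= and app= key-value pairs
_sourceCategory=web/access
| parse "site=*," as site
| parse "app=*," as app
| count by site, app

With a panel built on a query like this, a viewer can narrow the whole dashboard to one site or one app without ever touching the query language.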

August 12, 2015

Blog

3 Ways to make your Data work for you - on AWS and everywhere else

This is an exciting time to be in the Enterprise software business. Many of the predictions we heard in the late 90s and early 2000s are finally coming true. First, we are finally moving from the world of the server artist (pronounced “Arteeste”) lovingly caring for his server, to a world where any self-respecting operations engineer is capable of running hundreds of systems on a bad day. That transformation has come on the back of organizations finally embracing things like configuration automation and standardization on top of powerful cloud platforms like Amazon Web Services (AWS). Container technologies like Docker have only accelerated that transition. Second, the evolution of the DevOps movement is changing the ways operations teams approach their jobs. Now they tend towards small, collaborative teams swarming problems with no respect for technology silos. Developers are also being held accountable now for the uptime of their software, so they are putting more of the operational machinery in code and changing the data from their apps to suit their needs. “So, what now?” Thanks for asking. Now that we are mechanizing our software and operations, we need to spend more time thinking about what our applications and platforms spit out - the machine data. Here are three ways to up your machine data game:

Embrace the Format

Why are we still producing logs that look like they were written by a drunk monkey? If you are planning on getting any value out of those logs at scale you have to put them in a format that makes them easier to analyze. JSON is really the best choice, but even embracing simple key-value pairs will make a huge difference.

Embrace the Content

Imagination is great for jam sessions with your college buddies, but horrible for operational logging. The more your data looks the same, the easier it is to analyze, and to understand at 2 AM when your application is down. And how do you do that at scale? One way is to embrace standardized metrics and logging libraries (Log4J, NLog, etc.). Another way is to use your platform - well, that’s my third point.

Embrace the Platform

The primary reason why AWS is so successful is not the nebulous “cloud” concept. AWS is powerful because it has standardized and commoditized operational details away. Need a load-balancer? No need to order a $50k box with blinking lights. Press a button in your AWS console. Need another app server, or 10, or 100? No problem. Now AWS is doing the same thing with data. It is simple to generate streams of useful data with CloudTrail, Cloudwatch, AWS Kinesis, etc. If you use the standardized services like ELB, RDS, or S3, you get even more metrics and logs out of the box. So - embrace the data, embrace the platform. So, what’s the wrap? Just remember to go all in! Half-measures don’t help you scale to that next level. Flexibility is great for that yoga class you just enrolled in, but standardization will make life easier when the proverbial stuff hits the fan at oh-dark:30 in the morning. Use the tools at hand to squeeze unnecessary mess out of your data and focus on the fun part - making pretty pictures for your boss with that newly massaged data. Box Chart, anyone?
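To show why the format matters downstream, here is a minimal sketch of what structured logs buy you at search time. The category, field names and example event are hypothetical; the only real assumption is that the application emits JSON:

// hypothetical JSON application logs, e.g. {"level":"error","service":"checkout","latency_ms":154}
_sourceCategory=prod/app
| json auto
| where level = "error"
| timeslice 5m
| count by service, _timeslice

No parsing gymnastics, no regex: json auto lifts the fields out, and the same query keeps working as new fields are added.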

AWS

August 11, 2015

Blog

The Digital Universe and PCI Compliance – A Customer Story

According to IDC, the digital universe is growing at 40% a year, and will continue to grow well into the next decade. It is estimated that by 2020, the digital universe will contain nearly as many digital bits as there are stars in the universe. To put this into perspective, the data we create and copy annually will reach 44 zettabytes, or 44 trillion gigabytes. In 2014 alone, the digital universe will equal 1.7 megabytes a minute for every person on earth. That is a lot of data! As a new employee at Sumo Logic, I’ve had the opportunity to come in contact with a lot of people my first few weeks – employees, customers and partners. One interaction with a global, multi-billion dollar travel powerhouse really stood out for me, as they are a great example of an organization grappling with massive growth in an ever expanding digital universe.

The Business

The travel company provides a world-class product-bookings engine and delivers fully customized shopping experiences that build brand loyalty and drive incremental revenue. The company is also responsible for safeguarding the personal data and payment information of millions of customers. “Customer security and being compliant with PCI DSS is essential to our business” was echoed many times.

The Challenge

As a result of phenomenal growth in their business, the volume of ecommerce transactions and logs produced was skyrocketing, more than doubling from the previous year. The company was processing over 5 billion web requests per month, generating on average close to 50GB of daily log data across 250 production AWS EC2 instances. It became clear that an effective solution was required to enable the company to handle this volume of data more effectively. Current manual processes using Syslog and other monitoring tools were not manageable, searchable or scalable, and it was very difficult to extract actionable intelligence. Additionally, this effort was extremely time intensive and would divert limited resources from focusing on more important areas of the business – driving innovation and competitive differentiation. PCI Compliance: The ability to track and monitor all access to network resources and cardholder data (PCI DSS Requirement 10) was of particular importance. This is not surprising, as logging mechanisms and the ability to track user activities are critical in minimizing the impact of a data compromise. The presence of, and access to, log data across the AWS infrastructure is critical to provide the necessary tracking, alerting and analysis when something goes wrong.

The Solution

While multiple solutions were considered – including Splunk, Loggly and the ELK stack – the company selected Sumo Logic for its strong time to value, feature set, and low management overhead. Additionally, the security attestations, including PCI DSS 3.0 Service Provider Level 1, as well as data encryption controls for data at rest and in motion, were levels above what other companies provided. Being able to not worry about the execution environment – handled by Sumo Logic – and focus on extracting value from the service was extremely valuable.

The Results

The most important immediate benefits for the client included being able to reduce the time, cost and complexity of their PCI audit. They were also able to leverage the platform for IT Ops and Development use cases, reducing mean time to investigate (MTTI) and mean time to resolve (MTTR) by over 75%. As I was wrapping up our conversation, I asked if they had any “aha moments” in leveraging the Sumo Logic platform and dealing with this exponential growth in their digital universe. Their response was: “I’ve been really impressed with how fast the team has been able to identify and resolve problems. Sumo Logic’s solution has helped us change the playing field in ways that were just not possible before.”

To learn more about Sumo Logic’s compliance & security solutions for AWS, please visit: http://www.sumologic.com/aws-trial
To try Sumo Logic for free, please visit: http://www.sumologic.com/pricing
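As a small illustration of the kind of tracking PCI DSS Requirement 10 calls for, the sketch below trends authentication-related CloudTrail events. It assumes CloudTrail logs are collected under a _sourceCategory of aws_cloudtrail (the category name is an assumption; the eventname field comes from json auto):

// hypothetical scope; trend authentication-related CloudTrail events per hour
_sourceCategory=aws_cloudtrail
| json auto
| where eventname = "ConsoleLogin" OR eventname = "AssumeRole"
| timeslice 1h
| count by eventname, _timeslice

In practice you would extend this with user identity and source IP fields and wire it to a scheduled search or alert, but even this much gives auditors a reproducible view of who is touching the environment.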

Blog

Monitoring SumoLogic Usage for Faster Issue Resolution

Guest blog post by Ethan Culler-Mayeno, TechOps Engineer at LogicMonitor.

On the LogicMonitor Technical Operations team, we love our logs. We know that logs are more than just “those files you need to delete to clear a disk space alert.” Logs provide patterns of normal behavior, and therefore make it easier to identify anomalies or changes in those patterns. Here at LogicMonitor, while our own product is the primary method used for monitoring performance and identifying issues, we also use logs as a tool to investigate (or better yet, prevent) issues.

Recently, one of our servers managed to find its way into an email loop with an auto-responder. Unfortunately, in this game of digital chicken, it was our machine that reached its port 25 saturation point first. The email loop resulted in degraded performance for other applications running on that machine. Now you may be saying something like, “Bummer, but just put in a rule to discard mail from the auto-responder and Bob’s your uncle.” While that is certainly the correct way to address this issue, let’s take a step back – how would you identify the issue? In most troubleshooting scenarios, fixing the issue is easy. It’s finding the issue that is the hard (and often time-consuming) part. How would you know that your machine was getting blown up by an email auto-responder if you got an alert for HTTPS performance issues for a web application on that machine?

Well, now I am sure you’re guessing something along the lines of “the logs have the answer!” But mail server logs are not usually the first place you look when investigating a web service performance issue. So let’s take a look at how we were able to use LogicMonitor to identify this issue.

Our team uses SumoLogic as a log analysis tool. SumoLogic provides an excellent API for programmatically performing queries against logs, which allows our team to monitor and alert on subsets of our logs. We alert on specific events and exceptions, but we also use a Groovy-based LogicModule (created by our engineers) that uses SumoLogic’s API to monitor the rate of log messages being written per device. Below is a graph for that datasource that shows the total number of log entries written for the server that was hit by the aforementioned mail loop.

Because we were trending the number of log messages, as soon as we started looking at the performance of that server in LogicMonitor, it was very clear that we needed to investigate the logged messages in SumoLogic for details of the issue – which immediately led us to the mail loop, and a quick resolution.

Monitoring your logs at a high level can fill in the pieces that content-based log monitoring might miss. In many cases logs do not contain content that would cause a watcher (or monitoring solution) to bat an eye. However, when a device is logging 30x as many messages per minute as normal, it’s pretty safe to say that something is wrong.

You can download the SumoLogic LogicModule we used – SumoLogic_Logs_Per_Host – from our core repository, by selecting “Settings..Datasources…Add.. From LogicMonitor Repository” from within LogicMonitor. (Some more information about it is available on this help page.) You can also easily modify it to track, graph and alert on other data from SumoLogic. Let us know if you have other cool ways to tie into SumoLogic, or other logging systems, too!
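If you want to experiment with a similar per-host message-rate check yourself, below is a minimal Python sketch against Sumo Logic's Search Job API. To be clear, this is not the Groovy LogicModule described above, just an illustration: the endpoint URL, the credentials, the time window and the 10,000-messages-per-minute threshold are all placeholder assumptions, and the job-state string and record field names should be checked against the current Search Job API documentation.

import time
import requests

API = "https://api.sumologic.com/api/v1/search/jobs"  # adjust to your account's API endpoint
AUTH = ("ACCESS_ID", "ACCESS_KEY")                     # placeholder credentials

# Count log messages per host, per one-minute timeslice, over a fixed window.
payload = {
    "query": "* | timeslice 1m | count by _sourceHost, _timeslice",
    "from": "2015-08-01T00:00:00",
    "to": "2015-08-01T01:00:00",
    "timeZone": "UTC",
}

# Create the search job, then poll until it has finished gathering results.
job = requests.post(API, json=payload, auth=AUTH,
                    headers={"Accept": "application/json"})
job.raise_for_status()
job_id = job.json()["id"]

while True:
    state = requests.get(f"{API}/{job_id}", auth=AUTH).json()["state"]
    if state == "DONE GATHERING RESULTS":
        break
    time.sleep(5)

# Pull the aggregated records and flag hosts that are logging unusually fast.
records = requests.get(f"{API}/{job_id}/records",
                       params={"offset": 0, "limit": 1000},
                       auth=AUTH).json()["records"]
for rec in records:
    fields = rec["map"]  # field names mirror the query, lower-cased (e.g. _sourcehost, _count)
    if int(fields["_count"]) > 10000:  # arbitrary threshold for this sketch
        print(fields["_sourcehost"], fields["_timeslice"], fields["_count"])

The query is the interesting part: trending a plain count of messages per host is enough to make a 30x spike like the mail loop above stand out immediately, whatever tooling wraps it.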

Blog

Analyze Logs like a Pro with Sumologic – Big Data & Analytics

Every event must be logged, whether due to a business or a compliance requirement. Yes, that’s right – these days we all want to capture everything happening within our applications. That is not a bad thing, but it has pros and cons. You know the pros – you get all the logs you want – but the side effect is that you end up with a huge amount of log data that is difficult to analyze. For a small business, it might be fine to look through a few MBs of logs. But if you are running a medium-sized or global business, it is simply impossible to analyze all of the captured logs by hand, and eventually they go to tape backup and sit there for many years.

Sumo Logic is not the only company solving this problem, but it is one of the best in the industry. It helps transform daily operations into intelligent business decisions, and it ships a large number of application log analyzers, which save you time writing complex queries to extract the data. The list of log analyzers as I write this includes: Linux, Nginx, IIS, Apache, Windows, Cisco, MySQL, Docker, AWS CloudTrail, Amazon S3, Amazon CloudFront, AWS Elastic Load Balancer, Akamai, Box, PCI Compliance, Microsoft Windows AD, VMware, Varnish, Adobe Connect, Hyperguard, Mac OS X, OSSEC, Palo Alto Networks, Postfix, Quickstart, StatsD, and SourceFire and Snort IDS.

So by now you are probably thinking that some of the above analyzers fit your requirements. The best thing about Sumo Logic is that it lets you analyze your logs in the cloud as well as locally. The picture below gives you an idea of how it works over the cloud: to use the Sumo Logic cloud, you need to have their collector agent (or a cloud collector) running on your server. The collector agent is available for Linux, Mac OS X, Solaris and Windows.

It’s time to show you how to analyze Nginx logs using Sumo Logic. Prerequisites: you need to register an account with a company email, and you must have at least 128 MB of memory on your server to run the Collector agent (only needed for cloud collection). I assume you have registered your account and are ready to start analyzing the logs.

Log in to Sumo Logic and you should see the welcome screen, where you have the option to analyze local files or collect over the cloud. Click Upload Files to analyze local files, select nginx as the log type, click Select Files to choose the access log, select the time zone for your log file, and click Continue. The upload takes a few minutes (depending on the log file size), and at the end you will have the option to view a dashboard. Click View My Dashboard to see an overview of visitor locations, traffic volume, response over time, and more.

Let’s take a look at some of the out-of-the-box reports for analyzing the logs; these are available under the Library menu. All HTTP response codes with their counts: one of the first things you want to analyze in your web server log is the HTTP response codes, to understand whether you are wasting server resources serving 40x or 50x errors. There are also reports for top browsers, media types served, and top referrers. These are just a few examples to give you an idea of the meaningful insight reports you can get for your web application, which help you focus on what you want and help the business.
Moreover, you can perform custom searches and have complete control over the timeline, reporting charts and data drill-down, and you can save searches to a dashboard for future reference. I will use this to analyze the following for Geek Flare – how about you? Analyze 40x and 50x HTTP status return codes, analyze the IPs causing lots of 40x and 50x requests, analyze client browsers, analyze response times, and analyze robots. A quick local sanity check along the same lines is sketched below.
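To make the first two items concrete, here is a small, hypothetical Python sketch that tallies 4xx and 5xx responses per client IP straight from a local Nginx access log (assuming the default combined log format and a file named access.log); the hosted dashboards give you the same kind of breakdown without writing any code.

import re
from collections import Counter

# Matches the default Nginx "combined" log format:
# remote_addr - remote_user [time_local] "request" status body_bytes_sent "referer" "user_agent"
LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) ')

errors_by_ip = Counter()
with open("access.log") as f:          # hypothetical file name
    for line in f:
        m = LINE.match(line)
        if not m:
            continue
        status = int(m.group("status"))
        if status >= 400:              # 4xx client errors and 5xx server errors
            errors_by_ip[m.group("ip")] += 1

# Top ten offenders, similar to what the dashboard panels surface.
for ip, count in errors_by_ip.most_common(10):
    print(f"{ip}\t{count}")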

August 2, 2015

Blog

7 Things That Product Managers Should Say

Product Management is an art, not a major in which you can earn a degree. Like an apprentice, you’d learn on the job which requires commitment, discipline, and perseverance. In my several years as a Product Manager, I’ve learned that communication is one of the most important aspects in this job; I’ve had to pick myself up numerous times from missteps learning through trial and error. In this post, I’d like to pass along seven things I have learned on what Product Managers should say: 1. That feature is on the roadmap, but has been deprioritized so we can focus on the company’s top initiative. We receive requests from all directions and the requests can be overwhelming, but PMs have to say “No” more than “Yes” because we cannot build everything for everyone. It is easier, though, to say “No” once everyone understands that we as a company need to agree on the strategic vision with a focus on the top initiatives solving the biggest problems. This means that we cannot and should not commit to every request. 2. This is one of the top pain points – we need to fix it in the next release because it generates X tickets a week. Although we focus most of our time building the next innovative feature or product, we can’t neglect our current customers. We must allocate time to address critical customer issues, even if an issue requires time and effort to properly fix. Providing the proper fix means we can improve the product quality, which improves churn and customer satisfaction. 3. I will be back in 10 minutes, after my standup. Being in-sync and aligned with engineering is the key to a successful release. To be fully integrated with the team, the Product Manager should participate in all the agile ceremonies like the daily standups, sprint planning, and retrospectives. 4. When will the wireframes/mockups be ready for user testing? A design that has been user tested carries a lot more weight than a design which hasn’t. Over a few rounds of user testing, designs can be refined significantly! Our engineers are also very keen on getting customer feedback so that they know what we are building truly solves customer pain points. Click here to learn more about our UX process. 5. I have socialized this with Sales, Marketing, Support, and other stakeholders. Getting buy-in and alignment from all stakeholders is often the biggest hurdle in getting a project kicked off. Getting the sign-off also validates the plan has a purpose which solves customer pain points and enables the business to move faster. 6. Please log these events so adoption and other KPIs can be tracked. Adoption is a form of measuring success for a feature, team, or product. When we put measurements in place we’ll be able to make informed decisions on moving forward as planned, pivoting, or investing in other areas of the product. 7. Let me tell Marketing early so they have time to plan for the launch. Marketing is Product Management’s best friend in getting the feature or product to the marketplace. Successful campaigns and events require enough lead time for Marketing to plan and execute a solid go-to-market strategy. For these reasons, a strong Product Manager needs to communicate frequently and clearly with colleagues to get the most out of the team. The quotes I’ve focused on in this post are examples that demonstrate the many moving pieces and the communication required to build and maintain a successful product. PS, here are some bonus quotes for your entertainment: Nancy Chu Sr. Product Manager [email protected]

July 27, 2015

Blog

Troubleshooting AWS Web Apps with S3 Logs and CloudTrail

Your web application’s log data contains a vast amount of actionable information, but it’s only useful if you can cross-reference it with other events in your system. For example, CloudTrail provides an audit trail of everything that’s happened in your AWS environment. This makes it an indispensable security tool—but only if you can correlate CloudTrail activity with changes in web traffic, spikes in error log messages, increased response times, or the number of active EC2 instances. In this article, we’ll introduce the basics of AWS monitoring analytics. We’ll learn how a centralized log manager gives you complete visibility into your full AWS stack. This visibility dramatically reduces the time and effort required to troubleshoot a complex cloud application. Instead of wasting developer time tracking down software bugs in all the wrong places, you can identify issues quickly and reliably by replaying every event that occurred in your system leading up to a breakage. As you read through this article, keep in mind that troubleshooting with a centralized log management tool like Sumo Logic is fundamentally different than traditional debugging. Instead of logging into individual machines and grep’ing log files, you identify root causes by querying all of your log data from a single interface. A powerful query language makes it easy to perform complex lookups, visualizations help you quickly identify trends, and its centralized nature lets you cross-reference logs from different parts of your stack. Scenario This article assumes that you’re running an Apache web server on an EC2 instance, storing user photos in S3 buckets, and using CloudTrail to monitor your AWS administration activity. We’ll be walking through an example troubleshooting scenario to learn how AWS log data can help you identify problems faster than traditional debugging techniques. After a recent code push, you start receiving complaints from existing users saying that they can’t upload new images. This is mission-critical functionality, and fixing it is a high priority. But where do you start? The breakage could be in your custom application code, Apache, the EC2 instance, an S3 bucket, or even third-party libraries that you’re using. Traditional debugging would involve SSH’ing into individual machines and grep’ing their log files for common errors. This might work when you only have one or two machines, but the whole point of switching to AWS is to make your web application scalable. Imagine having a dozen EC2 instances that are all communicating with a handful of S3 buckets, and you can see how this kind of troubleshooting could quickly become a bottleneck. If you want a scalable web application, you also need scalable troubleshooting techniques. Centralized logging helps manage the complexity associated with large cloud-based applications. It lets you perform sophisticated queries with SQL-like syntax and visualize trends with intuitive charts. But, more importantly, it lets you examine all of your log data. This helps you find correlations between different components of your web stack. As we’ll see in this article, the ability to cross-reference logs from different sources makes it easy to find problems that are virtually impossible to see when examining individual components. Check the Error Logs As in traditional debugging, the first step when something goes wrong is to check your error logs. 
With centralized logging, you can check error logs from Apache, EC2, S3, and CloudTrail with a single query:

error | summarize

Running this in Sumo Logic will return all of the logs that contain the keyword error. However, even smaller web applications will output millions of log messages a month, which means you need a way to cut through the noise. This is exactly what the summarize operator was designed to do. It uses fuzzy logic to group similar messages, making it much easier to inspect your log data. In our example scenario, we only see minor warnings—nothing that indicates a serious issue related to user accounts. So, we have to continue our analysis elsewhere. Even so, this is usually a good jumping-off point for investigating problems with your web application.

Check for Status Code Errors

Web application problems can also be recorded as 400- and 500-level status codes in your Apache or S3 access logs. As with error logs, the advantage of centralized logging is that it lets you examine access logs from multiple Apache servers and S3 buckets simultaneously.

_sourceCategory=S3/Access OR _sourceCategory=Apache/Access
| parse "HTTP/1.1\" * " as status_code
| where status_code > 400
| timeslice 1m
| count by status_code, _timeslice
| transpose row _timeslice column status_code

The _sourceCategory metadata field limits results to either S3 logs or Apache access logs, the parse operator pulls out the status code from each log, and the where statement shows us only messages with status code errors. The results from our example scenario are shown above. The light green portion of the stacked column chart tells us that we’re getting an abnormal amount of 403 errors from S3. We also noticed that the errors come from different S3 buckets, so we know that it isn’t a configuration issue with a single bucket.

Dig Deeper Into the S3 Logs

Our next step is to take a closer look at these 403 errors to see if they contain any more clues as to what’s wrong with our web application. The following query extracts all of the 403 errors from S3:

_sourceCategory=S3/Access
| parse "HTTP/1.1\" * " as status_code
| where status_code = 403

If we look closely at the raw messages returned by the above query, we’ll find that they all contain an InvalidAccessKeyID error. This tells us that whatever code is trying to send or fetch data from S3 is not authenticating correctly. We’ve now identified the type of error behind our broken upload functionality. In a traditional debugging scenario, you might start digging into your source code at this point. Examining how your code assigns AWS credentials to users when they start a new session would be a good starting point, given the nature of the error. However, jumping into your source code this early in the troubleshooting process would be a mistake. The whole point of log analytics is that you can use your log data to identify root causes much faster than sifting through your source code. There’s still a lot more information we can find in our log messages that will save several hours of troubleshooting work.

Identify the Time Frame with Outlier

This InvalidAccessKeyID error wasn’t there forever, and figuring out when it started is an important clue for determining the underlying cause.
Sumo Logic’s outlier operator is designed to find anomalous spikes in numerical values. We can use this to determine when our 403 errors began occurring:

_sourceCategory=s3_aws_logs AND "InvalidAccessKeyID"
| timeslice 1m
| count as access_key_errors by _timeslice
| outlier access_key_errors

Graphing the results as a line chart makes it ridiculously easy to identify when our web application broke. Without a centralized log management tool, it would have been much more difficult to identify when these errors began. You would have had to check multiple S3 buckets, grep for InvalidAccessKeyID, and find the earliest timestamp amongst all your buckets. In addition, if you have other InvalidAccessKeyID errors, it would be difficult to determine when the spike occurred vs. when a programmer mistyped some credentials during development. Isolating the time frame like this using traditional troubleshooting methods could take hours. The point to take away is that log data lets you narrow down the potential root causes of a problem in many ways, and outlier lets you quickly identify important changes in your production environment.

Find Related Events in CloudTrail

Now that we have a specific time frame, we can continue our search by examining CloudTrail logs. CloudTrail records the administration activity for all of your AWS services, which makes it a great place to look for configuration problems. By collecting CloudTrail logs, we can ask questions like, “Who shut down this EC2 instance?” and “What did this administrator do the last time they logged in?” In our case, we want to know what events led up to our 403 errors. All we need to do is narrow the time frame to the one identified in the previous section and pull out the CloudTrail logs:

_sourceCategory=CloudTrail

The results show us that an UpdateAccessKey event occurred right when our 403 errors began. Investigating this log line further tells us that a user came in and invalidated the IAM access key for another user. The log message also includes the username that performed this action, and we see it is the same username that assigns temporary S3 access keys to our web app users when they start a new session (as per AWS best practices). So, we now have the “who.” This is almost all of the information we need to solve our problem. Note that if you didn’t know this was a security-related error (and thus didn’t know to check CloudTrail logs), you could perform a generic * | summarize query to identify other related errors/activity during the same time frame.

Stalk the Suspicious User

At this point, we have two possibilities to consider:
1. One of your developers changed some security credentials but forgot to update the application code to use the new keys.
2. A malicious user gained access to your AWS account and is attacking your website.

Once again, we can answer this question with log analytics. First, we want to take a look at what other activity this user has been up to.
The following query extracts all the CloudTrail events associated with this user:

_sourceCategory="cloudtrail_aws_logs"
| json auto keys "useridentity.type"
| where %"useridentity.type" = "suspicious-user"

Of course, you would want to change “suspicious-user” to the username you identified in the previous step. We find a long list of UpdateAccessKey events similar to the one above. This is looking like a malicious user that gained access to the account we use to assign temporary keys to users, but to really make sure, let’s check the location of the IP address:

_sourceCategory="cloudtrail_aws_logs"
| json auto keys "useridentity.type", "sourceIPAddress"
| where %"useridentity.type" = "suspicious-user"
| lookup latitude, longitude
| count by latitude, longitude
| sort _count

The lookup operator gets the latitude and longitude coordinates of the user’s IP address, which we can display on a map. Our user logged in from Europe, while all of our existing administrators, as well as the servers that use those credentials, are located in the United States. This is a pretty clear indicator that we’re dealing with a malicious user.

Revoke Their Privileges, Update Your App

To resolve the problem, all you have to do is revoke the suspicious user’s privileges, change your AWS account passwords, and create a new IAM user for assigning temporary S3 access keys. The only update you have to make to your source code is to insert the new IAM credentials. After you’ve done this, you should be able to verify that the solution worked by examining our graph of 403 errors. If they disappear, we can rest easy knowing that we did, in fact, solve our problem. Debugging with log analytics means that you don’t need to touch your source code until you already have a solution to your problem, and it also means you can immediately verify if it’s the correct solution. This is an important distinction from traditional debugging. Instead of grep’ing log files, patching code, and running a suite of automated/QA testing, we knew exactly what code we needed to change before we changed it.

Conclusion

This article stepped through a basic AWS log analytics scenario. First we figured out what kind of error we had by examining S3 logs, then we figured out when they started by using outlier, determined who caused the problem with CloudTrail logs, and figured out why the user caused the problem. And we did all of this without touching a single line of source code. This is the power of centralized log analytics. Consider all of the SSH’ing and grep’ing you would have to do to solve this problem—even with only a single EC2 instance and S3 bucket. With a centralized log manager, we only had to run 7 simple queries. Also consider the fact that our debugging process wouldn’t have changed at all if we had a hundred or even a thousand EC2 instances to investigate. Examining that many servers with traditional means is nearly impossible. Again, if you want a scalable web application, you need scalable debugging tools. We mostly talked about troubleshooting in this article, but there’s also another key aspect to AWS log analytics: monitoring.
We can actually save every query we performed in this article into a real-time dashboard or alert to make sure this problem never happens again. Dashboards and alerts mean you can be proactive about identifying these kinds of issues before your customers even notice them. For instance, if we had set up a real-time alert looking for spikes in 403 errors, we would have been notified by our log management system instead of an unhappy user. The next article in this series will talk more about the monitoring aspects of AWS log analytics. We’ll learn how to define custom dashboards that contain key performance indicators that are tailored to your individual web application. We’ll also see how this proactive monitoring becomes even more important as you add more components to your web stack.

AWS

July 25, 2015

Blog

How Trek10 Uses Sumo Logic to Monitor Amazon Container Service

Guest blog post by Andrew Warzon, Founder at Trek10, and Jared Short, Director of DevOps at Trek10.

You’ve probably heard of Docker by now – maybe you’ve toyed with Dockerizing your app, or maybe you’re even running Docker in staging or production (we are!). For us, Docker means we can focus on building solid infrastructure and monitoring without having to worry about managing all sorts of things, like application-specific dependencies, bootstrapping scripts, baking AMIs, or environment management. Docker also enables high-fidelity parity from dev through to production, since everything runs in the same Docker containers. No more “but it works on my machine” bugs, or nasty dependency surprises between staging and production.

With the explosive growth of Docker, it is no surprise that AWS announced the EC2 Container Service (ECS) in November 2014. ECS entered General Availability in April 2015. At Trek10, we have been running apps in ECS since May. ECS enables you to take your lovingly Dockerized applications and processes and distribute them across a cluster. It handles tasks like load balancing, rolling deploys, service/application updates, healing, and more. Only because of Docker & ECS can we confidently say, “Running eight different applications on a cluster is no different than running one or two applications.” Powerful stuff!

As great as Docker & ECS seem to be, one of the biggest hurdles we faced was logging in a reliable, easy-to-manage way that let Docker & ECS do what they do best, with minimal engineering, configuration, and resource overhead. Unfortunately, collecting logs from a container has no accepted, perfect solution, and there are a lot of options out there. Our primary goal is simplicity; we just want the logging problem to “get out of the way” so we can push ahead with building value. We’ve opted for a relatively simple solution: installing the Sumo collector on the ECS host and using mounted volumes.

For the impatient, here is the quick summary of how we make this work:
1. Install the Sumo agent unattended on the ECS host with user data.
2. Make sure your sumosources.json file points to a new directory like /tmp/logs which you will map into your containers.
3. Make sure your logs inside the container are written to some directory like /tmp/clogs.
4. Use the ECS task definition to map /tmp/clogs to /tmp/logs/mycontainer.

Here is some more detail on each step.

Step 1: Write a script to install Sumo on the host unattended. We will run this script in the EC2 user data. User data is EC2’s way to let you run scripts upon launching an instance. In this way, we can customize the host without maintaining our own AMIs; we simply add this script to the existing ECS AMI. There are many ways to accomplish this, but our script copies some configs out of an S3 bucket, including Sumo access keys and a Sumo sources JSON file; creates /etc/sumo.conf and /etc/sumosources.json on the host machine; and actually installs the Sumo collector. The key here is the sumosources.json file.
Here is ours:

{
  "api.version": "v1",
  "sources": [
    {
      "sourceType": "LocalFile",
      "name": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs_apps",
      "pathExpression": "${LOGDIR}/**",
      "category": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs",
      "hostName": "${CLIENT_NAME}_${INSTANCE_ID}",
      "useAutolineMatching": false,
      "multilineProcessingEnabled": ${MULTILINE},
      "manualPrefixRegexp": "${APP_REGEX}",
      "timeZone": "UTC",
      "automaticDateParsing": true,
      "forceTimeZone": false,
      "defaultDateFormat": "MMM dd HH:mm:ss"
    },
    {
      "sourceType": "LocalFile",
      "name": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs_messages",
      "pathExpression": "/var/log/messages",
      "category": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs",
      "hostName": "${CLIENT_NAME}_${INSTANCE_ID}",
      "useAutolineMatching": false,
      "multilineProcessingEnabled": false,
      "timeZone": "UTC",
      "automaticDateParsing": true,
      "forceTimeZone": false,
      "defaultDateFormat": "MMM dd HH:mm:ss"
    },
    {
      "sourceType": "LocalFile",
      "name": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs_secure",
      "pathExpression": "/var/log/secure",
      "category": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs",
      "hostName": "${CLIENT_NAME}_${INSTANCE_ID}",
      "useAutolineMatching": false,
      "multilineProcessingEnabled": false,
      "timeZone": "UTC",
      "automaticDateParsing": true,
      "forceTimeZone": false,
      "defaultDateFormat": "MMM dd HH:mm:ss"
    }
  ]
}

Note line 7, pathExpression… this is the key. We define $LOGDIR to be some path on the host instance where we will later put our logs. This config just says to push anything in this directory into Sumo.

Step 2: Pick some directory inside the container where your logs will exist. How you accomplish this will vary significantly based on your application. We point ours to a separate directory inside the container, /tmp/clogs. One key tip here: if whatever you are doing is different than how you would usually run this container, use the ECS Task Definition “Command” to override the default command for your container.

Step 3: Mount your volumes with the ECS Task Definition. Here we are basically telling ECS to map all of the log files from inside the container (/tmp/clogs in our case) to outside the container, where Sumo will be looking for log files as defined in sumosources.json. In the ECS task definition, this is done with two pieces. First, you must define a Volume. This is the path on the host that will now be available to be mapped to containers. Here is where to edit this in the AWS Management Console “Task Definition Builder” GUI. One key note here: make sure that this source path is a subdirectory of $LOGDIR as defined in sumosources.json, and that the subdirectory is unique for each container you define across all task definitions in your cluster. This way, any given host can have an arbitrary number of containers and an arbitrary number of tasks running on it, and Sumo will get all of the logs and keep them separate. The second piece of the task definition required is the “mount points” section of each container defined in your task definition. Use the volume name defined above, and map it to the log path inside the container.
Below is how this looks in the Task Definition Builder. If you prefer to write the JSON for the Task Definition directly, here is a generic Task Definition with these two pieces:

{
  "family": "my-container",
  "containerDefinitions": [
    {
      "name": "MyContainer",
      "image": "python",
      "cpu": 400,
      "memory": 800,
      "entryPoint": [],
      "environment": [
        { "name": "MY_VAR", "value": "foo" }
      ],
      "command": [],
      "portMappings": [
        { "hostPort": 8080, "containerPort": 8080 }
      ],
      "volumesFrom": [],
      "links": [],
      "mountPoints": [
        {
          "sourceVolume": "logs",
          "containerPath": "/tmp/clogs",
          "readOnly": false
        }
      ],
      "essential": true
    }
  ],
  "volumes": [
    {
      "name": "logs",
      "host": { "sourcePath": "/tmp/logs/mycontainer" }
    }
  ]
}

So that’s it… a simple, low-maintenance, and flexible way to get all of your logs from ECS-run Docker containers into Sumo Logic. Good luck!
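If you would rather register a task definition like this from code instead of the console, a roughly equivalent boto3 sketch is below. This is not part of Trek10's original workflow, just an illustration; it assumes boto3 is installed and that your AWS credentials and region are already configured.

import boto3

ecs = boto3.client("ecs")

# Register a task definition equivalent to the JSON above: the container
# writes its logs to /tmp/clogs, which is mounted from /tmp/logs/mycontainer
# on the host so the Sumo collector can pick them up.
response = ecs.register_task_definition(
    family="my-container",
    containerDefinitions=[
        {
            "name": "MyContainer",
            "image": "python",
            "cpu": 400,
            "memory": 800,
            "portMappings": [{"hostPort": 8080, "containerPort": 8080}],
            "environment": [{"name": "MY_VAR", "value": "foo"}],
            "mountPoints": [
                {
                    "sourceVolume": "logs",
                    "containerPath": "/tmp/clogs",
                    "readOnly": False,
                }
            ],
            "essential": True,
        }
    ],
    volumes=[{"name": "logs", "host": {"sourcePath": "/tmp/logs/mycontainer"}}],
)

print(response["taskDefinition"]["taskDefinitionArn"])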

Blog

Setup Wizard – July 2015 Release is LIVE!

Blog

Big Data Analytics, in 15 Minutes or Less, from 34,000 Feet...

I love working at Sumo Logic – for lots of reasons. The culture is amazing, and the people are fun, but what I really love is blowing my customers’ minds. I have already blogged about our scalability – and you can read that here – but this time I want to talk about how fast you can get up and running with Sumo Logic.

Last week, I put together a slide deck highlighting how a partner of ours won a deal at a major financial institution, and in it, I mentioned that one of the value drivers was that we could be up and running in 15 minutes. I admit, it seems like a throwaway line, and if you are technical, well… you are probably going to question that. Sure enough, as the presentation made the rounds at our partner, their CTO started poking and prodding at that claim. How can we possibly have a customer up and running in 15 minutes?

There are lots of reasons for this. First, Sumo Logic is a true SaaS service. When you sign up for the service online, your account is created in seconds and you are ready to start bringing in data. And yes, this is the same, highly secure, highly redundant environment that all of our customers use – from the smallest to the largest. So, we can have the environment ready in well under 15 minutes. But when I said “up and running”, I meant up and running, and analyzing your own data. How do we do this? Magic! Well… Wizards, anyway. Our UX team and our engineers have been working tirelessly on creating Wizards to help you get your data into Sumo Logic quickly and easily. Have a single file to upload? We can do that in a couple of minutes. Want your AWS logs? That’s even faster. Need to install a collector on your server to collect the logs for a critical app while sitting on an airplane at 34,000 feet? 8 minutes.

Wait… what? Yeah, that last one was pretty specific. While contemplating this blog, I thought I should run through ingesting data myself, just to make sure we could do it as quickly as I thought. The thing is, I was sitting on a plane, 34,000 feet over the Nevada desert. Internet on planes is always a little spotty, but I decided to give it a shot anyway. Using our Onboarding Wizard, I went through the process of collecting some log data from my MacBook Pro. The wizard is really easy to use, and took me through the process step by step – from downloading and installing the collector, to selecting the logs I wanted to ingest. 8 minutes later I was searching the logs and looking at dashboards. I even recorded this for posterity. I did all of this from the comfort of seat 7F.

***UPDATE*** After I wrote this blog, we released an enhancement to our Wizards – now we make it even easier, with Intelligent Path Detection so that users do not have to manually type in path expressions for common sources like Apache, Linux and Mac systems, etc., and a redesigned experience for setting up streaming sources with new supported data types including HTTP, Syslog, Cisco ASA, Palo Alto Networks, Linux system, Mac system, Windows Events, and Windows Performance. Another benefit of working for a true SaaS company: releases come fast and furious, and all of our customers get the benefit of these updates the minute they are released!

July 20, 2015

Blog

Introduction to Apache Log Analytics, Part II

Blog

Can Your Analytics Platform Do This?

Scaling a service can be a double-edged sword. If your app, your game, your online store or your next generation payment service starts to take off, the business team is popping champagne, while the operations team is often scrambling to scale the service out. Last Sunday was the season finale of Silicon Valley on HBO (it’s a fantastic show, and if you are not watching it, you should start!). In that episode, the fictional “Pied Piper” struggled while their rather boring “Condor Cam” went viral – thanks in part to Manny Pacquiao (I’m not going to explain any more, I don’t want to ruin the plot). Suffice it to say, traffic on their site spiked, and they scrambled to scale the service to handle hundreds of thousands of unexpected users. At one point they were jamming circuit breakers so they wouldn’t trip, punching holes in walls for new cable runs, and even letting small electrical fires burn uncontrollably, all in an effort to scale the site. Of course, that’s television, and they were able to keep the site running until the traffic dropped off. But it’s not too far off from the real world. While most bootstrapped startups are leveraging cloud-based servers instead of homemade racks in a garage, they still struggle to scale their own services when these bursts happen – and it’s hard enough to scale out your own services, let alone your 3rd party analytics platform. The other day, one of our large customers started sending us data from a new cluster of servers. A lot of new data. They went from their usual ingest rate of 10TB/Day to a rate of 20TB/Day. If you are unfamiliar with the machine data analytics space, I can tell you that’s a massive increase. In a typical on-premise implementation, doubling ingest in an environment of this size would cause chaos. The admins would have to stop collecting the data while they re-designed the architecture to support the new load. Once the design was finalized, network equipment, storage subsystems and servers would be ordered, while real estate in the data center was identified and racks installed. It will take weeks, but more commonly months, to scale this environment out. But the business doesn’t have months to scale – it needs that data right away. The only reason our customer was able to scale the way they did is because Sumo Logic was purpose-built to handle exactly this kind of use case. It is a true, multi-tenant, elastically scalable, cloud-based analytics service – the only one of it’s kind. The customer simply installed our collector on those new servers, and started sending us data. No architecture planning, no server buys, and no datacenter expansion was required. So, can your analytics platform do this?

June 30, 2015

Blog

3 Reasons Why You Need To Bodystorm

Last month I wrote about learning to build a Service Blueprint to build a better SaaS product from the “Transforming Customer Experience” training at Cooper U. Another valuable tool we learned is Bodystorming. Bodystorming is a technique that can be very helpful when designing customer interactions. Rather than imagining an interaction by brainstorming, the idea of bodystorming is to act as if the product or workflow actually existed by roleplaying the different parts of the interaction. The example here captures an actual bodystorming exercise I had with our Support Manager, Kevin. He tried to explain a problem with our current account lockout and password reset, but until we acted it out, I wasn’t able to fully empathize with our users and internalize the frustration. Kevin acted as the Sumo Logic application and I acted as a user who was locked out of the account and couldn’t reset password. As you can see, bodystorming makes it easier to understand and empathize with users by acting out different roles in the interaction, and the best part is that this can be achieved in a short period of time. It is valuable in helping participants come up with new creative ways to design a particular interaction by enabling the participants to be surrounded by the actual events, behaviors, inefficiencies, and pain points. My next step is to work out all the details of this workflow, and run it by our Security team to make sure that we are still PCI Compliant while we simplify the User Experience. In summary, next time you should definitely include bodystorming during your design process because: You want to understand and empathize with your users You want to efficiently design a product, workflow, or interaction You want to create an effective design that solves inefficiencies and pain points

June 24, 2015

Blog

Using Sleeper Cells to Load Test Microservices

Blog

4 Reasons Why You Should Make A Service Blueprint For Your Service

4 reasons why you should put together a Service Blueprint for your service: You understand that providing a cohesive customer experience is a requirement in today’s service-oriented society You want to understand your customer experience in its entirety You want to identify places where you can improve the entire customer service experience You want to motivate people in different parts of your organization so that you can work together to improve your customer experience from all touchpoints A couple of weeks ago a group of us attended the “Transforming Customer Experience” training at Cooper U. We had participation from User Experience, Customer Success, and Product Management. Together we want to unite cross functionally to have greater impact within the Sumo Logic organization, so that we can improve our service offering from all possible touchpoints. We are a service company, and we know that providing a cohesive customer experience is a requirement that is more important than ever in today’s service-oriented society (think, Uber and Airbnb). One of the most valuable lessons I learned in this class is the benefit of using the primary tool of service design – the Service Blueprint. The Service Blueprint includes the customer journey as well as all of the interactions and touchpoints that make up and support that journey. After sitting down and creating the Service Blueprint for the Sumo Logic Setup Wizard (which recently went through a dramatic UX transformation), it helped me look beyond the Setup Wizard so that I can evaluate all the other backend systems that make up the customer’s entire journey. The Service Blueprint for the Sumo Logic Setup Wizard starts off with the user signing up for the service on the website, going into the product, stepping through the wizard, seeing the Wow moment, and upgrading at the end of the 30 day trial. The multiple swimlanes capture the backend systems that support this entire journey: sending the activation email, syncing with Marketo, which then syncs with Salesforce, which then creates the lead and then routes to the appropriate Sales manager, at the same time our Onboarding engineers also get notified and they then reach out to the user providing assistance to get them fully set up in Sumo Logic. The existing workflow is marked in green. After mapping out the existing workflow, I then identified the customer emotions in the yellow boxes. Wherever the emotion is not entirely positive, I put a red circle next to it identifying them as pain points in the journey. Identifying pain points make it clear that we need to come up with improvements, marked in blue. For example, we know that waiting for a long time to see data in Sumo Logic is a frustrating experience, therefore we plan to keep on iterating on this portion of the workflow to shorten the wait, so that our users can get to a dashboard as soon as possible and visualize their data. This is the Wow Moment in the entire journey. The journey doesn’t just stop after the Wow Moment though, we need to continue to enhance the journey because the ultimate goal is provide the values that our users are looking for so that they feel compelled to upgrade to a paid account. After mapping this out on the Service Blueprint, it is now very clear visually that we need to shorten this portion of the journey as well – the time to conversion – as much as possible, because at the same time the Onboarding engineers are spending time helping customers onboard. 
To improve this, we plan to build in more in-app help, enhance our nurturing emails to point users to helpful tutorials and documentation, and to automatically parse fields during ingest so the important fields already come parsed by the time the user is ready for search. The next step is to put the Service Blueprint on a big wall somewhere in the office where everyone can see, and then invite other parts of the organization to review it together so everyone can contribute in adding more ideas on how we can improve the overall customer journey. A Service Blueprint can clearly identify the interactions between the user, touchpoints, and service employees, including the activities that the user can directly see and those that user does not see. Therefore, it is a very powerful tool that can be used to better deliver a successful customer experience holistically; it is a great tool in the way that it makes it easy to explain a long-running and complicated process in just a few minutes, which adds tremendous value in getting multiple minds together to collaborate on a problem. To summarize, here’s why you should put together a Service Blueprint for your service: You understand that providing a cohesive customer experience is a requirement in today’s service-oriented society You want to understand your customer experience in its entirety You want to identify places where you can improve the customer service experience You want to motivate people in different parts of your organization so that you can work together to improve your customer experience from all touchpoints

June 18, 2015

Blog

Sumo Logic Setup Wizard - June 2015 Release And The Road Ahead

In March we introduced the first version of the brand new Sumo Logic Setup Wizard focused on AWS after an amazing UX journey. This month we introduced a bold new version that works harder for you so you can get started quickly and easily for more types of Sources. It guides you step by step through the process of adding data, from selecting the type of data to configuring your Source, so that you can send your logs to Sumo Logic with minimal work on your end. From the usability to the visual design, every aspect of the setup process has been improved so that setting up data in Sumo Logic is now easier than ever. The wizard also installs a Sumo Logic App to help you analyze and visualize your data, if available. This release includes: The ability to upload local static files without configuring a Collector so that you can start searching right away! A completely redesigned experience for setting up Apache, Windows IIS, MySQL, Nginx, and Varnish Sources. For more information on this release, please contact Nancy Chu at [email protected]. But wait, there’s more! Here’s a sneak peek of what is coming in July: Based on popular request, a welcome page that clearly distinguishes between a sandbox environment via static file uploads and configuring Collectors to collect streaming data. A redesigned experience for setting up HTTP Sources, Syslog Sources, Cisco ASA, Palo Alto Network, Linux system logs, Mac system logs, Windows Events, and Windows Performance. We also have an interactive prototype for the ultimate vision for the Setup Wizard, please take a look here: Setup Wizard Ultimate Vision. In our upcoming releases, we plan to extend the new user experience to cover cloud Sources like Google Drive and Microsoft Office 365; and by popular demand we plan to incorporate Field Extraction Rules and Templates during data ingest so that important fields come already parsed when users are ready to search! To give feedback on our upcoming releases, please contact Nancy Chu at [email protected]. PS: Remember where we were a few months ago? We’ve come a long way! Nancy Chu Sr Product Manager [email protected]

June 17, 2015

Blog

Comprehensive Monitoring For Docker - More Than "Just" Logs

Today I am happy to be able to talk about something that has been spooking around in my head for the last six months or so. I've been thinking about this ever since we started looking into Docker and how it applies to what we are doing here at Sumo. There are many different and totally valid ways to get logs and statistics out of Docker. Options are great, but I have concluded that the ultimate goal should be a solution that doesn't require users to have in-depth knowledge about all the things that are available for monitoring and the various methods to get to them. Instead, I want something that just pulls all the monitoring data out of the containers and Docker daemons with minimal user effort. In my head, I have been calling this "a comprehensive solution."

Let me introduce you to the components that I think need to be part of a comprehensive monitoring solution for Docker:
Docker events, to track container lifecycles
Configuration info on containers
Logs, naturally
Statistics on the host and the containers
Other host stuff (daemon logs, host logs, ...)

Events

Let's start with events. The Docker API makes it trivial to subscribe to the event stream. Events contain lots of interesting information. The full list is well described in the Docker API doc, but let's just say you can track containers as they come and go, as well as observe containers getting killed, and other interesting stuff, such as out-of-memory situations. Docker has consistently added new events with every version, so this is a gift that will keep on giving in the future. I think of Docker events as nothing but logs. And they are very nicely structured—it's all just JSON. If, for example, I can load this into my log aggregation solution, I can now track which container is running where. I can also track trends - for example, which images are run in the first place, and how often are they being run. Or, why are suddenly 10x more containers started in this period vs. before, and so on. This probably doesn't matter much for personal development, but once you have fleets, this is a super juicy source of insight. Lifecycle tracking for all your containers will matter a lot.

Configurations

Docker events, among other things, allow us to see containers come and go. What if we also wanted to track the configurations of those containers? Maybe we want to track drift of run parameters, such as volume settings, or capabilities and limits. The container image is immutable, but what about the invocation? Having detailed records of container starting configurations is, in my mind, another piece of the puzzle towards solving total visibility. Orchestration solutions will provide those settings, sure, but who is telling those solutions what to do? From our own experience, we know that deployment configurations are inevitably going to drift, and we have found the root cause of otherwise inscrutable problems there more than once. Docker allows us to use the inspect API to get the container configuration. Again, in my mental model, that's just a log. Send it to your aggregator, alert on deviations, and use the data after the fact for troubleshooting. Docker provides this info in a clean and convenient format.

Logs

Well, obviously, it would be great to have logs, right? It turns out there are many different ways to deal with logs in Docker, and new options are being enabled by the new log driver API.
Not everybody is quite there yet in 12-factor land, but then again there are workarounds for when you need fat containers and you need to collect logs from files inside of containers. More and more I see people following the best practice of writing logs to standard out and standard error, and it is pretty straightforward to grab those logs from the logs API and forward them from there. The Logspout approach, for example, is really neat. It uses the event API to watch which containers get started, then turns around and attaches to the log endpoint, and then pumps the logs somewhere. Easy and complete, and you have all the logs in one place for troubleshooting, analytics, and alerting.

Stats

Since the release of Docker 1.5, container-level statistics are exposed via a new API. Now you can alert on the "throttled_data" information, for example - how about that? Again (and at this point, this is getting repetitive, perhaps), this data should be sucked into a centralized system. Ideally, this is the same system that already has the events, the configurations, and the logs! Logs can be correlated with the metrics and events. Now, this is how I think we are getting to a comprehensive solution. There are many pieces to the puzzle, but all of this data can be extracted from Docker pretty easily today already. I am sure that as we all keep learning more about this, it will get even easier and more efficient.

Host Stuff

In all the excitement around APIs for monitoring data, let's not forget that we also need to have host-level visibility. A comprehensive solution should therefore also work hard to get the Docker daemon logs, and provide a way to get any other system-level logs that factor into the way Docker is being put to use on the hosts of the fleet. Add host-level statistics to this, and now performance issues can be understood in a holistic fashion - on a container basis, but also related to how the host is doing. Maybe there's some intricate interplay between containers based on placement that pops up on one host but not the other? Without quick access to the actual data, you will scratch your head all day.

User Experience

What's the desirable user experience for a comprehensive monitoring solution for Docker? I think it needs to be brain-dead easy. Thanks to the API-based approach that allows us to get to all the data either locally or remotely, it should be easy to encapsulate all the monitoring data acquisition and forwarding into a container that can either run remotely, if the Docker daemons support remote access, or as a system container on every host. Depending on how the emerging orchestration solutions approach this, it might not even be too crazy to assume that the collection container could simply attach to a master daemon. It seems Docker Swarm might make this possible. Super simple: just add the URL to the collector config and go. I really like the idea of being able to do all of this through the API because now I don't need to introduce other requirements on the hosts. Do they have Syslog? JournalD? Those are of course all great tools, but as the levels of abstraction keep rising, we will be less and less able to make assumptions about the hosts. So the API-based access provides decoupling and allows for composition.

All For One

So, to be completely honest, there's a little bit more going on here on our end than just thinking about this solution. We have started to implement almost all of the ideas into a native Sumo Logic collection Source for Docker.
We are not ready to make it generally available just yet, but we will be showing it off next week at DockerCon (along with another really cool thing I am not going to talk about here). Email [email protected] to get access to a beta version of the Sumo Logic collection Source for Docker.
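In the meantime, if you want to poke at the raw material yourself, here is a minimal sketch of the three API-based building blocks described above (events, inspect output, and per-container stats) using the Docker SDK for Python. It is only an illustration of the data sources, not the Sumo Logic collection Source discussed in this post; in a real pipeline you would forward each JSON document to your log aggregator instead of printing it.

import json

import docker  # Docker SDK for Python (pip install docker)

client = docker.from_env()

# Configuration: capture the inspect output for every running container.
# Each attrs dict is just JSON, so it can be shipped as a log to track config drift.
for container in client.containers.list():
    print(json.dumps({"type": "config", "name": container.name,
                      "config": container.attrs["Config"]}))

# Stats: one snapshot per container (stream=False returns a single JSON document).
for container in client.containers.list():
    stats = container.stats(stream=False)
    print(json.dumps({"type": "stats", "name": container.name,
                      "cpu": stats.get("cpu_stats", {}),
                      "memory": stats.get("memory_stats", {})}))

# Events: block on the event stream and forward lifecycle events as they happen.
for event in client.events(decode=True):
    print(json.dumps({"type": "event", "event": event}))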

Blog

Sumo Logic AWS VPC Flow Log Application

Hola peeps, exciting times here at Sumo Logic! Last week we announced a new round of funding (Sumo Logic Raises 80 Million), and this week we are EXCITED to holla about our upcoming release of the AWS VPC Flow Log App! See the AWS blog post by @jeffbarr: https://aws.amazon.com/blogs/aws/vpc-flow-logs-log-and-view-network-traffic-flows/

At a high level, VPC Flow Logs allow AWS customers to create alarms that will fire if certain types of traffic are detected; you can also create metrics to help you identify trends and patterns. The information captured by Flow Logs includes allowed and denied traffic (based on security group and network ACL rules). It also includes source and destination IP addresses, ports, the IANA protocol number, packet and byte counts, a time interval during which the flow was observed, and an action (ACCEPT or REJECT).

The Sumo Logic application will add a TON of additional value on top of what AWS is currently giving you, with pre-built dashboards that show the geographical locations of network traffic and highlight REJECTED IPs. Dashboard Uno: packets dropping from China and Russia. Dashboard Dos: looking for anomalies within the network traffic (source, destination, high rate of packets dropped), all dynamically set by our machine-learning-based analytics.

This is just another step in the quest of the Cloud Illuminati. Stay tuned for more updates and join us in our BETA program to get a head start on the AWS VPC Flow Log Application! Join the Cloud Illuminati. Cambio y Fuera! George
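To give a concrete idea of what those dashboards are built on, here is a small, hypothetical Python sketch that tallies REJECTed flows by source address from raw VPC Flow Log records (for example, records exported from CloudWatch Logs into a local file named flowlogs.txt). It assumes the default version-2 flow log record format; in practice the Sumo Logic app does this kind of aggregation, and much more, for you.

from collections import Counter

# Default (version 2) VPC Flow Log record layout:
# version account-id interface-id srcaddr dstaddr srcport dstport
# protocol packets bytes start end action log-status
rejects_by_source = Counter()

with open("flowlogs.txt") as f:        # hypothetical export of raw flow log records
    for line in f:
        fields = line.split()
        if len(fields) < 14:
            continue                   # skip malformed or truncated lines
        srcaddr, action = fields[3], fields[12]
        if action == "REJECT":
            rejects_by_source[srcaddr] += 1

# The top sources of rejected traffic, the same idea the REJECT dashboards surface.
for addr, count in rejects_by_source.most_common(10):
    print(f"{addr}\t{count}")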

June 11, 2015

Blog

Introduction to Apache Log Analytics, Part I

Blog

The Power of 5

Five years, five rounds of financing, five hundred customers already and 500 Sumo employees down the road. And there’s another 5 hidden in this story which you will have to puzzle out yourself. We welcome our new investors Draper Fisher Jurvetson Growth and Institutional Venture Partners, as well as Glynn Capital and Tenaya Capital. And we say thank you for the continued support of the people and the firms that have added so much value while fueling our journey: Greylock, Sutter Hill Ventures, Accel Partners, and Sequoia. It is fair to say that we were confident in the beginning that the hypotheses on which Sumo Logic was founded are fundamentally solid. But living through the last 5 years, and seeing what the people in this company have accomplished to build on top of this foundation is truly breathtaking and fills me with great pride. For us, the last five years have been a time of continuous scaling. And yet we managed to stay true to our vision – to make machine data useful with the best service we can possibly provide. We have become experts at using the power of the scalability that’s on tap in our backend to relentlessly crunch through data. Our customers are telling us that this again and again surfaces the right insights that help them understand their application and security infrastructures. And with our unique machine learning capabilities, we can turn outliers and anomalies into those little “tap tap tap”-on-your-shoulder moments that make the unknown known and that truly turn data into gold. One of the (many) things that is striking to me when looking back over the last 5 years is just how much I appreciate the difference between building software and building a service. They will have to drag me back kicking and screaming to build a product as a bunch of code to be shipped to customers. That’s right, I am a recovering enterprise software developer. We had a hunch that there must be a better way, and boy were we right. Choosing to build Sumo Logic as a service was a very fundamental decision – we never wanted to ever again be in a situation in which we were unable to observe how our product was being used. As a service, we have the ultimate visibility, and seeing and synthesizing what our customers are doing continuously helps to support our growth. At the same time, we have nearly total control over the execution environment in which our product operates. This is enlightening for us as engineers because it removes the guesswork when bug reports are coming in. No longer do I have to silently suspect that maybe it is related to that old version of Solaris that the customer insists on using to run my product on. And no, I don’t want to educate you which RAID level you need to run the database supporting my software on anymore, because if you don’t believe me, we are both going to be in a world of hurt 6 months down the road when everything grinds to a halt. I simply don’t want to talk anymore about you having to learn to run and administer my product. Our commitment and value is simple: let me do it for you, so you can focus on using our service and getting more value. Give us the control to run it right and all will benefit. Obviously, we are not alone in having realized the many benefits of software as a service – SaaS. This is why the trend to convert all software to services has only started. Software is eating software, quite literally. I see it every day when we replace legacy systems. We are ourselves exclusively consuming services at Sumo Logic – we have no data center. 
We literally have just one Linksys router sitting alone and lonely in the corner of our office, tying the wireless access points to some fiber coming out of the floor. That’s it. Everything else is a service. We believe this is a better way to live, and we put our money where our mouth is, supporting our fellow product companies that have gone the service route. So in many ways we are all riding the same wave, the big mega trend – a trend that is based on efficiency and a possibility of doing things in a truly better way. And we have the opportunity to both be like and behave like our customers, while actually helping our customers build these great new forward-looking systems. At Sumo Logic, we have created a purpose-built cloud analytics service that supports, and is needed by, every software development shop over the coming years as more and more products are built on the new extreme architecture. Those who have adopted and are adopting the new way of doing things are on board already, and we are looking forward to supporting the next waves by continuing to provide the best service to monitor, troubleshoot, and proactively maintain the quality of your applications, infrastructure, and ultimately of your service. In addition, with our unique and patented machine learning analytics capabilities we can further deliver on our vision to bring machine intelligence to the masses, whereas previously this was available only to the fortunate few. As we scale along with the great opportunity that the massive wave of change in IT and software is bringing, we will put the money provided by our investors to the best possible use we can think of. First of all, we will continue to bring more engineers and product development talent on board. The addition of this new tech talent will continue to help us further develop our massive elastic scale platform, which has grown more than 1000X in the past few years in terms of data ingested. In fact, we are already processing 50TB of new data every day, and that number will only go up. Our own production footprint has reached a point where we would literally have to invent a product like Sumo Logic in order to keep up – thankfully, we enjoy eating our dog food, all across the company. Except for the dogs in the office, they’d actually much rather have more human food. In any case, this service is engineering-heavy, full of challenges along many dimensions, and scale is just one of them. If you are looking for a hardcore technical challenge, let’s talk ([email protected]). And while we continue to tweak our system and adhere to our SLAs (even for queries!), we will also massively grow the sales, G&A, marketing and customer success side of the company to bring what we believe to be the best purpose-built cloud service for monitoring modern application architectures to more and more people, and to constantly improve on our mission of maniacal customer success. What do you say? Five more years, everybody!!!

June 1, 2015

Blog

Collecting In-Container Log Files

Docker and the use of containers is spreading like wildfire. In a Docker-ized environment, certain legacy practices and approaches are being challenged. Centralized logging is one of them. The most popular way of capturing logs coming from a container is to set up the containerized process such that it logs to stdout. Docker then spools this to disk, from where it can be collected. This is great for many use cases. We have of course blogged about this multiple times already. If the topic fascinates you, also check out a presentation I did in December at the Docker NYC meetup. At the same time, at Sumo Logic our customers are telling us that the stdout approach doesn’t always work. Not all containers are set up to follow the process-per-container model. This is sometimes referred to as “fat” containers. There are tons of opinions about whether this is the right thing to do or not. Pragmatically speaking, it is a reality for some users. What if you could visualize your entire Docker ecosystem in real time? See how Sumo Logic makes it possible and get started for free today. Even some programs that are otherwise easily containerized as single processes pose some challenges to the stdout model. For example, popular web servers write at least two log files: access and error logs. There are of course workarounds to map this back to a single stdout stream. But ultimately there’s only so much multiplexing that can be done before the demuxing operation becomes too painful. A Powerstrip for Logfiles Powerstrip-Logfiles presents a proof of concept towards easily centralizing log files from within a container. Simply setting LOGS=/var/log/nginx in the container environment, for example, will use a bind mount to make the Nginx access and error logs available on the host under /var/log/container-logfiles/containers/[ID of the Nginx container]/var/log/nginx. A file-based log collector can now simply be configured to recursively collect from /var/log/container-logfiles/containers and will pick up logs from any container configured with the LOGS environment variable. Powerstrip-Logfiles is based on the Powerstrip project by ClusterHQ, which is meant to provide a way to prototype extensions to Docker. Powerstrip is essentially a proxy for the Docker API. Prototypical extensions can hook Docker API calls and do whatever work they need to perform. The idea is to allow extensions to Docker to be composable – for example, to add support for overlay networks such as Weave and for storage managers such as Flocker. Steps to run Powerstrip-Logfiles Given that the Powerstrip infrastructure is meant to support prototyping of what one day will hopefully become Docker extensions, there are still a couple of steps required to get this to work. 
First of all, you need to start a container that contains the powerstrip-logfiles logic: $ docker run --privileged -it --rm \ --name powerstrip-logfiles \ --expose 80 -v /var/log/container-logfiles:/var/log/container-logfiles \ -v /var/run/docker.sock:/var/run/docker.sock \ raychaser/powerstrip-logfiles:latest \ -v --root /var/log/container-logfiles Next you need to create a Powerstrip configuration file… $ mkdir -p ~/powerstrip-demo $ cat > ~/powerstrip-demo/adapters.yml <<EOF endpoints: "POST /*/containers/create": pre: [logfiles] post: [logfiles] adapters: logfiles: http://logfiles/v1/extension EOF …and then you can start the powerstrip container that acts as the Docker API proxy: $ docker run -d --name powerstrip \ -v /var/run/docker.sock:/var/run/docker.sock \ -v ~/powerstrip-demo/adapters.yml:/etc/powerstrip/adapters.yml \ --link powerstrip-logfiles:logfiles \ -p 2375:2375 \ clusterhq/powerstrip Now you can use the normal docker client to run containers. First you must export the DOCKER_HOST variable to point at the powerstrip server: $ export DOCKER_HOST=tcp://127.0.0.1:2375 Now you can specify as part of the container’s environment which paths are supposed to be considered logfile paths. Those paths will be bind-mounted to appear under the location of the --root specified when running the powerstrip-logfiles container. $ docker run --cidfile=cid.txt --rm -e "LOGS=/x,/y" ubuntu \ bash -c 'touch /x/foo; ls -la /x; touch /y/bar; ls -la /y' You should now be able to see the files “foo” and “bar” under the path specified as the --root: $ CID=$(cat cid.txt) $ ls /var/log/container-logfiles/containers/$CID/x $ ls /var/log/container-logfiles/containers/$CID/y See the example in the next section on how to most easily hook up a Sumo Logic Collector. Sending Access And Error Logs From An Nginx Container To Sumo Logic For this example, you can just run Nginx from a toy image off of Docker Hub: $ CID=$(DOCKER_HOST=localhost:2375 docker run -d --name nginx-example-powerstrip -p 80:80 -e LOGS=/var/log/nginx raychaser/powerstrip-logfiles:latest-nginx-example) && echo $CID You should now be able to see the Nginx container’s /var under the host’s /var/log/container-logfiles/containers/$CID/: $ ls -la /var/log/container-logfiles/containers/$CID/ And if you tail the access log from that location while hitting http://localhost you should see the hits being logged: $ tail -F /var/log/container-logfiles/containers/$CID/var/log/nginx/access.log Now all that’s left is to hook up a Sumo Logic collector to the /var/log/container-logfiles/containers/ directory, and all the logs will come to your Sumo Logic account: $ docker run -v /var/log/container-logfiles:/var/log/container-logfiles -d \ --name="sumo-logic-collector" sumologic/collector:latest-powerstrip [Access ID] [Access Key] This collector is pre-configured to collect all files from /container-logfiles which by way of the -v volume mapping in the invocation above is mapped to /var/log/container-logfiles/containers, which is where powerstrip-logfiles by default writes the logs for the in-container files. As a Sumo Logic user, it is very easy to generate the required access key by going to the Preferences page. Once the collector is running, you can search for _sourceCategory=collector-container in the Sumo Logic UI and you should see the toy Nginx logs. Simplify using Docker Compose And just because we can, here’s how this could all work with Docker Compose. 
Docker Compose will allow us to write a single spec file that contains all the details on how the Powerstrip container, powerstrip-logfiles, and the Sumo Logic collector container are to be run. The spec is a simple YAML file:

powerstriplogfiles:
  image: raychaser/powerstrip-logfiles:latest
  ports:
    - 80
  volumes:
    - /var/log/container-logfiles:/var/log/container-logfiles
    - /var/run/docker.sock:/var/run/docker.sock
  environment:
    ROOT: /var/log/container-logfiles
    VERBOSE: true
  entrypoint:
    - node
    - index.js

powerstrip:
  image: clusterhq/powerstrip:latest
  ports:
    - "2375:2375"
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - ~/powerstrip-demo/adapters.yml:/etc/powerstrip/adapters.yml
  links:
    - "powerstriplogfiles:logfiles"

sumologiccollector:
  image: sumologic/collector:latest-powerstrip
  volumes:
    - "/var/log/container-logfiles:/var/log/container-logfiles"
  env_file: .env

You can copy and paste this into a file called docker-compose.yml, or take it from the powerstrip-logfiles Github repo. Since the Sumo Logic Collector will require valid credentials to log into the service, we need to put those somewhere so Docker Compose can wire them into the container. This can be accomplished by putting them into the file .env in the same directory, something like so:

SUMO_ACCESS_ID=[Access ID]
SUMO_ACCESS_KEY=[Access Key]

This is not a great way to deal with credentials. Powerstrip in general is not production ready, so please keep in mind to try this only outside of a production setup, and make sure to delete the access ID and access key in the Sumo Logic UI when you are done. Then simply run, in the same directory as docker-compose.yml, the following:

$ docker-compose up

This will start all three required containers and start streaming logs to Sumo Logic. Have fun!
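As a quick smoke test once the Compose stack is up, you can run a throwaway container through the Powerstrip proxy and confirm that its log directory shows up on the host. This is only a sketch; the LOGS path and the echoed message are arbitrary examples:

# Talk to Docker through the Powerstrip proxy started by Compose
$ export DOCKER_HOST=tcp://127.0.0.1:2375
# Run a container that declares /var/log/myapp as a logfile path and writes one line there
$ docker run --rm -e "LOGS=/var/log/myapp" ubuntu \
    bash -c 'echo "hello from a fat container" > /var/log/myapp/app.log'
# The file should now be visible on the host under the powerstrip-logfiles root
$ ls /var/log/container-logfiles/containers/*/var/log/myapp/

If app.log appears there, the collector container will pick it up and the line should arrive in Sumo Logic shortly afterwards.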

Blog

RSA 2015 “Rise of the Cloud Illuminati”

In 2012, I wrote a blog post on how the RSA Conference was “Back to the Golden Age”. Now, in 2015, it has been confirmed. The excitement created by cloud startup solutions and the SecOps movement has more hype than the upcoming Mayweather vs Pacquiao bout! Having said that, all this great energy has an undercurrent of fear – fear of losing control. Fear that your archaic security architecture and processes will soon no longer be relevant and that you may not be involved in cloud strategy meetings. There are now billions of users going beyond the traditional boundaries of the enterprise into cloud offerings. That means over a billion users are moving at the ‘pace of cloud’ and using modern tools, while information security and compliance teams are operating in the dark with respect to this activity. Security professionals are being challenged with lack of visibility and growing threats within their environments transformed by these cloud offerings. Users are in the cloud, in and out of your network and out of your control. So what about user permissions, access control and compliance? Forget that, you cannot control these unless you try to incorporate about 20 disparate solutions which are likely not homogeneous and cannot handle modern applications or the plethora of data that each generates. So as a security professional, you can take “samples” from this massive volume of data and hope that you can monitor trends and get a reasonable assessment of your security posture. That is like a Bronco fan having faith that Peyton Manning will bring them a Super Bowl win. Ain’t happening – unless you get out in front of it with Sumo Logic’s Cloud Audit capabilities. Today, at RSA 2015, Sumo Logic is announcing that billions of users will no longer operate in the shadows. We are going to illuminate these users and empower the Cloud Illuminati to reclaim control while enabling productivity. Given our integrations with Microsoft Office 365, Box, ServiceNow, AWS and many other cloud solutions, users can go about doing their jobs beyond traditional boundaries with complete visibility and without any compromises. Visibility or illumination is the first measure of regaining control without compromising speed and agility. Visibility gives you the situational awareness to make critical, time-sensitive decisions, measure risk and manage threats based on comprehensive analysis of data. On-prem, off-prem, SaaS, PaaS – like a Ronda Rousey armbar, you can now start the process of securing your cloud. So we ask you now, do you want to be one of “those people”, or do you want to be one of us, the trail blazers, the storm troopers, the ones who have control, the Cloud Illuminati? We will be discussing the Cloud Illuminati during my panel on Wednesday, April 22nd at 10:20am, Moscone West, Room 2022.

Blog

New Docker Logging Drivers

Docker Release 1.6 introduces the notion of a logging driver. This is a very cool capability and a huge step forward in creating a comprehensive approach to logging in Docker environments. It is now possible to route container output (stdout and stderr) to syslog. It is also possible to completely suppress the writing of container output to file, which can help in situations where disk space usage is of importance. This post will also show how easy it is to integrate the syslog logging driver with Sumo Logic. Let’s review for a second. Docker has been supporting logging of a container’s standard output and standard error streams to file for a while. You can see how this works in this quick example:

$ CID=$(docker run -d ubuntu echo "Hello")
$ echo $CID
5594248e11b7d4d40cfec4737c7e4b7577fe1e665cf033439522fbf4f9c4e2d5
$ sudo cat /var/lib/docker/containers/$CID/$CID-json.log
{"log":"Hello\n","stream":"stdout","time":"2015-03-30T00:34:58.782658342Z"}

What happened here? Our container simply outputs Hello. This output will go to the standard output of the container. By default, Docker will write the output wrapped into JSON into a specific file named after the container ID, in a directory under /var/lib/docker/containers named after the container ID.

Logging the Container Output to Syslog

With the new logging drivers capability, it is possible to select the logging behavior when running a container. In addition to the default json-file driver, there is now also a syslog driver supported. To see this in action, do this in one terminal window:

$ tail -F /var/log/syslog

Then, in another terminal window, do this:

$ docker run -d --log-driver=syslog ubuntu echo "Hello"

When running the container, you should see something along these lines in the tailed syslog file:

Mar 29 17:39:01 dev1 docker[116314]: 0e5b67244c00: Hello

Cool! Based on the --log-driver flag, which is set to syslog here, syslog received a message from the Docker daemon, which includes the container ID (well, the first 12 characters anyways), plus the actual output of the container. In this case of course, the output was just a simple message. To generate more messages, something like this will do the trick:

$ docker run -t -d --log-driver=syslog ubuntu \
    /bin/bash -c 'while true; do echo "Hello $(date)"; sleep 1; done'

While still tailing the syslog file, a new log message should appear every second.

Completely Suppressing the Container Output

Notably, when the logging driver is set to syslog, Docker sends the container output only to syslog, and not to file. This helps in managing disk space. Docker’s default behavior of writing container output to file can cause pain in managing disk space on the host. If a lot of containers are running on the host, and logging to standard out and standard error are used (as recommended for containerized apps), then some sort of space management for those files has to be bolted on, or the host eventually runs out of disk space. This is obviously not great. 
But now, there is also a none option for the logging driver, which will essentially dev-null the container output.

$ CID=$(docker run -d --log-driver=none ubuntu \
    /bin/bash -c 'while true; do echo "Hello"; sleep 1; done')
$ sudo cat /var/lib/docker/containers/$CID/$CID-json.log
cat: /var/lib/docker/containers/52c646fc0d284c6bbcad48d7b81132cb7ba03c04e9978244fdc4bcfcbf98c6e4/52c646fc0d284c6bbcad48d7b81132cb7ba03c04e9978244fdc4bcfcbf98c6e4-json.log: No such file or directory

However, this will also disable the Logs API, so the docker logs CLI will also not work anymore, and neither will the /logs API endpoint. This means that if you are using for example Logspout to ship logs off the Docker host, you will still have to use the default json-file option.

Integrating the Sumo Logic Collector With the New Syslog Logging Driver

In a previous blog, we described how to use the Sumo Logic Collector images to get container logs to Sumo Logic. We have prepared an image that extends the framework developed in the previous post. You can get all the logs into Sumo Logic by running with the syslog logging driver and running the Sumo Logic Collector on the host:

$ docker run -v /var/log/syslog:/syslog -d \
    --name="sumo-logic-collector" \
    sumologic/collector:latest-logging-driver-syslog \
    [Access ID] [Access Key]

As a Sumo Logic user, it is very easy to generate the required access key by going to the Preferences page. And that’s it folks. Select the syslog logging driver, and add the Sumo Logic Collector container to your hosts, and all the container logs will go into one place for analysis and troubleshooting.
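If you are ever unsure which driver a given container ended up with, you can ask Docker directly. This is just a convenience sketch; the template path assumes the LogConfig field layout used by Docker 1.6-era releases, and my-container is a placeholder name:

# Show the logging driver a container was started with (e.g. json-file, syslog, none)
$ docker inspect -f '{{ .HostConfig.LogConfig.Type }}' my-container

Anything other than json-file means the docker logs command will not have data to show for that container.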

Blog

Collecting and Analyzing CoreOS (journald) Logs w/ Sumo Logic

Towards More and Better Tools With Docker becoming an increasingly popular platform for deploying applications, we’re continually looking into how we can best leverage Sumo to help collect all the logs from containerized apps. We’ve already posted about Docker a few times regarding best collection strategies, and our official Docker image. One other request that we have heard from customers is how to pull logs, not from the containers themselves, but from journald, which CoreOS uses. An easy way to do this is by setting up a new systemd service that forwards those logs over UDP to a Sumo Logic Collector. How to Set Up Journald Collection with Sumo Logic First, you’ll need to set up a collector that listens for the UDP traffic we’re about to send it. Since CoreOS is built for a containerized world, we recommend setting up the official Sumo Logic Docker image on the localhost, and mapping it to the appropriate ports.

$ docker run -d -p 514:514 -p 514:514/udp --name="sumo-logic-collector" \
    sumologic/collector:latest-syslog [Access ID] [Access key]

Second, you’ll want to create a new unit that describes the forwarding system we’ll want to set up. An example unit file is provided below, but you can tweak the journalctl output if you want to change the formatting to another ISO format or JSON.

[Unit]
Description=Send Journalctl to Sumo

[Service]
TimeoutStartSec=0
ExecStart=/bin/sh -c '/usr/bin/journalctl -f | /usr/bin/ncat --udp localhost 514'
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

In-depth details for creating the service can be found here, though the gist is to save this unit file as journalctl_syslog.service in /etc/systemd/system and run the following commands:

$ sudo systemctl enable /etc/systemd/system/journalctl_syslog.service
$ sudo systemctl start journalctl_syslog.service

Once the service is up and running, that’s all there is to it. Restarts will be handled by systemd, and all the data should be forwarded appropriately to the cloud from the collector. Example Queries Once the data is present inside of Sumo Logic, you might want to try some of the following searches:

Message Count by Unit
_sourceCategory=journald | parse "\"MESSAGE\" : \"*\"" as message nodrop | parse "\"UNIT\" : \"*\"" as unit nodrop | where !(isNull(unit) OR unit="") | timeslice by 1m | count by unit, _timeslice | transpose row _timeslice column unit

Log Levels Over Time
_sourceCategory=journald | parse "\"MESSAGE\" : \"*\"" as message nodrop | parse "\"UNIT\" : \"*\"" as unit nodrop | where isNull(unit) OR unit="" | parse regex field=message "(?<level>[A-Z]{2,})" | timeslice by 1m | count by level, _timeslice | where level !="" | transpose row _timeslice column level

Outlier Detection on Total Number of Journald Messages
_sourceCategory=journald | timeslice by 1m | count by _timeslice | outlier _count
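If you want to check the forwarding path end to end before relying on it, one option is to write a tagged test entry into the journal and watch for it on both sides. This is only a sketch and assumes the logger utility is available on the host:

# Write a recognizable test message into journald
$ logger -t journald-forward-test "hello from CoreOS"
# Confirm it landed in the local journal
$ journalctl -t journald-forward-test --no-pager | tail -n 1
# After a short delay, search in Sumo Logic for:
#   _sourceCategory=journald "hello from CoreOS"

If the message shows up locally but never reaches Sumo Logic, the usual suspects are the unit not running or the collector container not listening on UDP port 514.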

Blog

No. More. Noise.

756 ‘real’ emails this week, 3200+ more generated by machines and approximately 78,000 push notifications, todos, Jira tasks and text messages – all asking for a piece of your attention. Oh, did you want to get some work done? There’s also a line of people waiting to interrupt you or ask someone near you a question while you’re mentally skydiving through the 17-layer-dip that is your source base. One more little sound and it all vanishes from your memory. But we’re not actually going to talk about cats (much) or their noises. Today let’s battle the noise that is your infrastructure. Have you ever tried setting up an alert on some key performance indicator (or KPI as so many like to say)? It’s easy – alert me when the request volume goes over 6000. Per node. Wait, no – per cluster. Ok, 14000 per cluster. No, 73333 per balancer. Ok, now 87219. Nevermind, I never want to wake up for this ever again – just alert me if the entire service stops existing. Done. Luckily I have a great solution for you! Today, right now even, you can parse out your favorite KPI and run a simple operator to find points in time where that indicator exceeds a dynamic and mathematically awesome boundary to tell you with so much likelihood that something important actually happened. That new operator is called Outlier and it does exactly what you hope it does. Let’s look at an example:

parse "bytes: *," as bytes
| timeslice 5m
| sum(bytes) as sumbytes by _timeslice
| outlier sumbytes

You’ve already done the first 3 lines yourself many times but that last line does all this extra magic for you, showing you exactly where the thing went off the rails. What’s more, you can even tune this to meet your needs by looking for a sustained change (the ‘consecutive’ parameter), a positive change only (‘direction’ parameter) and even the size of the thresholds shown here with blue shading (the ‘threshold’ parameter). Our documentation will help you get the most out of this great new operator but before we move on, you should also note that you can use this to identify outliers across multiple streams of data with a single query. This means you can get an alert if one of your deployments or hosts goes outside of its thresholds – where those thresholds are dynamic and specific to that deployment/host!

parse "bytes: *," as bytes
| timeslice 5m
| sum(bytes) as sumbytes by _timeslice, _sourceHost
| outlier sumbytes by _sourceHost
| where _sumbytes_violation=1

That last line is the logic you need to eliminate the non-outliers, and then all you need to do is set up a saved search alert to get your noise-free alerts going. Use the conditional alert where the number of results > 0 and you’ll only see alerts when you have a real problem! And when it turns out that you get a spurious alert anyway, you can always come back and adjust threshold, consecutive, direction and window to make things right again. And now, with no shyness about the abrupt segue, how would you like to see into the future as well? Well I can help with that too – the new Predict operator will show you a linear projection into the future. Now that you’ve become a master of alerts with Outlier, just imagine what sort of power you can wield by getting an alert before your disk fills up on your DB box. Just as with Outlier, you can configure a scheduled search alert to become the ultimate master of DevOps while befriending unicorns and defying flying tacos. 
But just to be clear – you’ll get that alert before the outage begins so that you can avoid posting awful things on your status page and filing post mortems with your managers. This is why people will assume you’re magical. As always, the documentation is an amazing place to get started, but for now I’ll leave you with this example and kindly ask that you get in touch to let us know what you think after you’ve tried out these two fantastic operators!

…| parse "diskfree: *," as diskfree
| timeslice 5m
| sum(diskfree) as sum_disk by _timeslice
| predict sum_disk by 5m forecast=20

March 11, 2015

Blog

The UX Story Behind Our Brand New AWS Setup Wizard

The Release: New AWS Setup Wizard A new first-time Setup Wizard for our AWS users is now available, so that setting up AWS data is incredibly simple in Sumo Logic. The Problem: Time to Productivity We want our users to be productive as soon as possible. The new streamlined setup wizard enables our first-time users to dive right into Sumo Logic without spending too much time setting it up. The Goal: Intuitive, Simple, Powerful Improving the user experience has been a crucial part of this initiative. We aimed to deliver a new user interface that doesn’t require users to understand too much about how Sumo Logic data collection works. We were also careful not to solve a problem users don’t care about or introduce unnecessary features. The UX Philosophy: Clean and Straightforward Rebecca Sorensen, our designer, shared the philosophy behind this new design: Tell our readers about your role at Sumo Logic. I’m a Visual Designer on our UX team. Much of what I do focuses on the way things look, but we try to make sure our roles aren’t too siloed. I participated in all aspects of this project, from UX research to interaction design. How would you describe our design culture? Our UX team, like the company, is rapidly growing. And with this growing team, our goal is to foster a culture of openness and collaboration, which we’re achieving by getting constant feedback on our ideas from each other and across other teams as well. Starting this dialogue has really helped us to increase visibility at the company. Now everyone from Sales to Backend Engineering can get excited about important changes that are coming to the product and contribute to making excellent user experience a priority at Sumo Logic. What tools do you use in your design process? Rapid iteration is really important to me so I’ll use whatever tool is fastest to get my point across. Axure has been great for turning wireframes into clickable lo-fi prototypes that were easy to share with stakeholders early in the process. Once we’re ready to transition to high fidelity mockups, I generally design the screens in Sketch, which is a newer tool built especially for UI design. Then I might import the mockups into InVision where users can play around with a more realistic-feeling prototype. I like to test out certain components in HTML and CSS to get a better feel for things like hover states and animations before passing the designs on to our front-end developers for implementation. What was your thought process behind this design? I knew the design had to be clean and straightforward, since the data setup process is so technical. Helping the user accomplish their goal was first and foremost, so in many ways the design had to take a back seat to making sure the page was uncomplicated and intuitive. In terms of visual design, I was inspired by Google’s Material Design’s use of bold color and large-scale typography. Establishing a consistent hierarchy was also a critical part of the design being successful. What can Sumo Logic users look forward to this year in terms of design? A lot! Our focus will be shifting in a big way this year to building interfaces that look polished but are also incredibly thoughtful and intuitive. We already have a powerful product – our next challenge is to make sure the UI can hold up. We’ll start by unifying our design language and bringing a consistent look and feel to all areas of the application, then extend this language to new features and functionality. 
Providing the user with tools that help them get their work done better and faster will be central to everything we do. Thanks Rebecca, great work! The Method: Ask, Think, Repeat The diagram below depicts our method. We gathered user feedback at multiple points during the planning and development cycle to test that our understanding of the problem matched our users’ thinking and rationale. We observed what users did and thought as they interacted with each version of the design. This helped us quickly discover what worked and what didn’t to iterate through another cycle. Before development started, we hosted a round of tests with internal stakeholders using lo-fi wireframes without any backend functionality. Testing in this fashion provided a simple and quick way to determine if the design would actually solve problems; it was something tangible with which the user could interact that could also be easily updated and modified. Once we were happy with the direction, development started. We continued testing using hi-fidelity mockups; at this point realistic prototypes were required to get the next level of feedback. We improved the design without impacting the overall direction throughout the development cycle as we planned our sprints. The fun part began once development was ready for user testing. I recruited both existing customers and prospects without prior knowledge of Sumo Logic, but with experience using other log analytics services. We had eight user sessions over two weeks; some of the user tests took place at Sumo Logic HQ, while others were remote. Observing the users while they worked through a series of scripted tasks to set up their AWS source gave us terrific insight into what worked, what didn’t, and why. Our team operated on a rapid development cycle: directly after each user test, we reviewed the feedback, decided what changes to incorporate, made those changes to the code, and were able to re-deploy new code before the next user test. This fast iteration allowed us to collect and incorporate a great deal of user feedback, without the delay of traditional lab testing. As a Product Manager, I loved being able to share user feedback live with the team, because having actual data short-circuited debates over whether a feature made sense to users. As soon as the team saw what the users saw, everyone knew what issues needed to be fixed. Next Steps: More Flows, More Feedback Usability testing is effective but traditionally time-consuming. Our goal this year is to come up with a faster, easier, and more efficient process that complements our agile development cycle in order to understand what we can do to improve our user experience. This new first-time Setup Wizard will be one of the many releases that primarily focuses on improving Sumo Logic’s user experience. Our goal is to extend this kind of simplicity, power, and flexibility throughout our user interface to enable our users to solve a wide range of problems. For this release I plan to collect metrics for a couple of weeks as users interact with it organically, to see if this new workflow is as effective as we hope. Finally, I want to hear from you. If you’ve tried our new Setup Wizard, please let me know what worked for you and what didn’t. And let me know if you’d like to see a preview of the next iteration. I’d love your feedback. If you haven’t signed up for Sumo Logic yet, sign up for a free trial to try it out now!

AWS

March 5, 2015

Blog

Optimizing Selectivity in Search Queries

While we at Sumo Logic constantly work to improve search performance automatically, there are some improvements that can only be made by humans who understand what question a search query answers and what the data looks like. In a Sumo Logic query, operators are chained together with the pipe symbol ( | ) to calculate the result. Each operator sends its output to the next operator in the chain. Certain operators, such as where and parse (except with nodrop), drop certain messages. Dropped messages do not need to be processed by any subsequent operators in the chain. This is called the selectivity of an operator. The fewer messages that “make it” through the operator, the more selective the operator is. Operator Ordering For optimal query performance, move the most selective operators to the earliest positions in your query. This reduces the amount of data subsequent operators need to process, and thereby speeds up your query.

Query 1
error | parse "ip=*, errorcode=*" as ip, errorcode | lookup ip from /my/whitelisted_ips on ip=ip | where errorcode="failed_login"

In the example above, Query 1 performs a lookup on all log lines, just to discard all lines where errorcode isn’t “failed_login”. Query 2 below is much more efficient, since it only performs the lookup on log lines that match the overall selectivity criteria of the query.

Query 2
error | parse "ip=*, errorcode=*" as ip, errorcode | where errorcode="failed_login" | lookup ip from /my/whitelisted_ips on ip=ip

Data Knowledge/Result Predictions To optimize queries and predict results, you can use knowledge of your data. Consider the following example.

Query 3
error failed_login | parse "ip=*, errorcode=*" as ip, errorcode | where errorcode="failed_login" | lookup ip from /my/whitelisted_ips on ip=ip | if( isNull(ip), "unsafe", "safe") as ip_status | where ip_status="unsafe" | count by ip | top 10 newip, ip by _count

You may know that your top 10 values are all measured in the thousands or tens of thousands. Based on that knowledge, you can optimize this query to not evaluate any IP addresses that occur less frequently than what you expect:

Query 4
error failed_login | parse "ip=*, errorcode=*" as ip, errorcode | where errorcode="failed_login" | count by ip | where _count > 1000 | lookup ip from /my/whitelisted_ips on ip=ip | if( isNull(ip), "unsafe", "safe") as ip_status | where ip_status="unsafe" | top 10 newip, ip by _count

February 17, 2015

Blog

A New Look for Your Data

As some of you may have seen by now, we’ve launched a fantastic new look and feel for our Dashboards. From top to bottom, we’ve added more unicorns, rainbows and cat-meme-powered visualizations. The end result? We hope you’ll agree: it’s a breath of fresh air that brings a bit more power to your analytics inside Sumo Logic. Do you find yourself accidentally moving Panels around, when all you wanted to do was look at your Dashboard? Worry no more. We have introduced a clear Edit mode that simplifies the whole process of making your Dashboard look perfect. With the new Edit mode, you can now unleash your creative side, and move and resize panels on your dashboard. You can make one panel take up most of the screen and surround it with smaller panels. Mix and match big Panels and small until you get the perfect balance of DevOps goodness (it’s also great feng shui). And for those of you that want that edgy, uber-geek look – meet our new Light and Dark themes. Use the gear icon and select the menu item Toggle Theme to switch over to this great new option, pick your side, and may the force be with you. Over the years as we’ve gathered feedback, we kept hearing over and over how much teams wanted to be able to add simple blobs of text next to their charts. For some, this was a matter of providing really important references to SOPs and other important-sounding documents. For others, they really just needed a note to “Sound all the alarms and pray to your various gods if this ever goes above 17.8756”. You get the idea – a little extra context makes all the difference – and now you can put that right in your Dashboards! Just click the Add Panel button while in Edit mode, and you can add a Panel just for title text or a text block. And did you want icing on this cake? Markdown. That’s right – you’re just a few dozen asterisks away from the perfect nested list. We also took some time to brush up some of our favorite chart types. Been wishing for an easier-to-read Single Value Monitor? Done. Ever wished your pie charts looked cooler? Well, we added Donut charts to spice things up. Our guys in the apps department couldn’t wait to get their hands on this. Since we all know and love AWS and the essential functionality that our AWS apps provide, we decided that those were a great place to start with a bit of a refresh as well. These apps now feature a more uniform Overview dashboard and better visualizations for key data points, and they also look pretty cool. So what do you think of the new Dashboards and AWS apps? Love it or hate it – let us know!

February 2, 2015

Blog

An Official Docker Image For The Sumo Logic Collector

Note: This post is now superseded by Update On Logging With Docker.

Learning By Listening, And Doing

Over the last couple of months, we have spent a lot of time learning about Docker, the distributed application delivery platform that is taking the world by storm. We have started looking into how we can best leverage Docker for our own service. And of course, we have spent a lot of time talking to our customers. We have so far learned a lot by listening to them describe how they deal with logging in a containerized environment. We actually have already re-blogged how Caleb, one of our customers, is Adding Sumo Logic To A Dockerized App. Our very own Dwayne Hoover has written about Four Ways to Collect Docker Logs in Sumo Logic. Along the way, it has become obvious that it makes sense for us to provide an “official” image for the Sumo Collector. Sumo Logic exposes an easy-to-use HTTP API, but the vast majority of our customers are leveraging our Collector software as a trusted, production-grade data collection conduit. We are and will continue to be excited about folks building their own images for their own custom purposes. Yet, the questions we get make it clear that we should release an official Sumo Logic Collector image for use in a containerized world.

Instant Gratification, With Batteries Included

A common way to integrate logging with containers is to use Syslog. This has been discussed before in various places all over the internet. If you can direct all your logs to Syslog, we now have a Sumo Logic Syslog Collector image that will get you up and running immediately:

$ docker run -d -p 514:514 -p 514:514/udp --name="sumo-logic-collector" \
    sumologic/collector:latest-syslog [Access ID] [Access key]

Started this way, the default Syslog port 514 is mapped to the same port on the host. To test whether everything is working well, use telnet on the host:

$ telnet localhost 514

Then type some text, hit return, and then CTRL-] to close the connection, and enter quit to exit telnet. After a few moments, what you type should show up in the Sumo Logic service. Use a search to find the message(s). To test the UDP listener, on the host, use Netcat, along the lines of:

$ echo "I'm in ur sysloggz" | nc -v -u -w 0 localhost 514

And again, the message should show up on the Sumo Logic end when searched for. If you want to start a container that is configured to log to syslog and make it automatically latch on to the Collector container’s exposed port, use linking:

$ docker run -it --link sumo-logic-collector:sumo ubuntu /bin/bash

From within the container, you can then talk to the Collector listening on port 514 by using the environment variables populated by the linking:

$ echo "I'm in ur linx" | nc -v -u -w 0 $SUMO_PORT_514_TCP_ADDR $SUMO_PORT_514_TCP_PORT

That’s all there is to it. The image is available from Docker Hub. Setting up an Access ID/Access Key combination is described in our online help.

Composing Collector Images From Our Base Image

Following the instructions above will get you going quickly, but of course it can’t possibly cover all the various logging scenarios that we need to support. To that end, we actually started by first creating a base image. The Syslog image extends this base image. Your future images can easily extend this base image as well. Let’s take a look at what is actually going on! 
Here’s the Github repo: https://github.com/SumoLogic/sumologic-collector-docker. One of the main things we set out to solve was to clarify how to allow creating an image that does not require customer credentials to be baked in. Having credentials in the image itself is obviously a bad idea! Putting them into the Dockerfile is even worse. The trick is to leverage a not-so-well documented command line switch on the Collector executable to pass the Sumo Logic Access ID and Access Key combination to the Collector. Here’s the meat of the run.sh startup script referenced in the Dockerfile:

/opt/SumoCollector/collector console -- -t -i $access_id -k $access_key -n $collector_name -s $sources_json

The rest is really just grabbing the latest Collector Debian package and installing it on top of a base Ubuntu 14.04 system, invoking the start script, checking arguments, and so on. As part of our continuous delivery pipeline, we are getting ready to update the Docker Hub-hosted image every time a new Collector is released. This will ensure that when you pull the image, the latest and greatest code is available.

How To Add The Batteries Yourself

The base image is intentionally kept very sparse and essentially ships with “batteries not included”. In itself, it will not lead to a working container. This is because the Sumo Logic Collector has a variety of ways to set up the actual log collection. It supports tailing files locally and remotely, as well as pulling Windows event logs locally and remotely. Of course, it can also act as a Syslog sink. And, it can do any of this in any combination at the same time. Therefore, the Collector is either configured manually via the Sumo Logic UI, or (and this is almost always the better way) via a configuration file. The configuration file however is something that will change from use case to use case and from customer to customer. Baking it into a generic image simply makes no sense. What we did instead is to provide a set of examples. These can be found in the same Github repository under “example”: https://github.com/SumoLogic/sumologic-collector-docker/tree/master/example. There are a couple of sumo-sources.json example files illustrating, respectively, how to set up file collection, and how to set up Syslog UDP and Syslog TCP collection. The idea is to allow you to either take one of the example files verbatim, or use one as a starting point for your own sumo-sources.json. Then, you can build a custom image using our image as a base image. To make this more concrete, create a new folder and put this Dockerfile in there:

FROM sumologic/collector
MAINTAINER Happy Sumo Customer
ADD sumo-sources.json /etc/sumo-sources.json

Then, put a sumo-sources.json into the same folder, groomed to fit your use case. Then build the image and enjoy.

A Full Example

Using this approach, if you want to collect files from various containers, mount a directory on the host to the Sumo Logic Collector container. Then mount the same host directory to all the containers that use file logging. In each container, set up logging to log into a subdirectory of the mounted log directory. Finally, configure the Collector to just pull it all in. The Sumo Logic Collector has for years been used across our customer base in production for pulling logs from files. More often than not, the Collector is pulling from a deep hierarchy of files on some NAS mount or equivalent. 
The Collector is quite adept and battle tested at dealing with file-based collection. Let’s say the logs directory on the host is called /tmp/clogs. Before setting up the source configuration accordingly, make a new directory for the files describing the image. Call it for example sumo-file. Into this directory, put this Dockerfile:

FROM sumologic/collector
MAINTAINER Happy Sumo Customer
ADD sumo-sources.json /etc/sumo-sources.json

The Dockerfile extends the base image, as discussed. Next to the Dockerfile, in the same directory, there needs to be a file called sumo-sources.json which contains the configuration:

{
  "api.version": "v1",
  "sources": [
    {
      "sourceType": "LocalFile",
      "name": "localfile-collector-container",
      "pathExpression": "/tmp/clogs/**",
      "multilineProcessingEnabled": false,
      "automaticDateParsing": true,
      "forceTimeZone": false,
      "category": "collector-container"
    }
  ]
}

With this in place, build the image, and run it:

$ docker run -d -v /tmp/clogs:/tmp/clogs --name="sumo-logic-collector" \
    [image name] [your Access ID] [your Access key]

Finally, add -v /tmp/clogs:/tmp/clogs when running other containers that are configured to log to /tmp/clogs in order for the Collector to pick up the files. Just like the ready-to-go syslog image we described in the beginning, a canonical image for file collection is available. See the source: https://github.com/SumoLogic/sumologic-collector-docker/tree/master/file

$ docker run -v /tmp/clogs:/tmp/clogs -d --name="sumo-logic-collector" \
    sumologic/collector:latest-file [Access ID] [Access key]

If you want to learn more about using JSON to configure sources to collect logs with the Sumo Logic Collector, there is a help page with all the options spelled out. That’s all for today. We have more coming. Watch this space. And yes, comments are very welcome.
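The run command above refers to "[image name]"; for completeness, here is a minimal sketch of the build-and-run sequence, where my-sumo-file-collector is just an example tag:

# From inside the sumo-file directory containing the Dockerfile and sumo-sources.json
$ docker build -t my-sumo-file-collector .
# Run the freshly built image, mounting the shared log directory from the host
$ docker run -d -v /tmp/clogs:/tmp/clogs --name="sumo-logic-collector" \
    my-sumo-file-collector [your Access ID] [your Access key]

Any other container started with -v /tmp/clogs:/tmp/clogs and configured to write its logs under that directory should then have its files picked up by the Collector.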

Blog

Shifting Into Overdrive

How Our Journey Began Four years ago, my co-founder Kumar and I were just two guys who called coffee shops our office space. We had seen Werner Vogels’ AWS vision pitch at Stanford and imagined a world of Yottabyte scale where machine learning algorithms could make sense of it all. We dreamed of becoming the first and only native cloud analytics platform for machine-generated data and next-gen apps, and we dreamed that we would attract and empower customers. We imagined the day when we’d get our 500th customer. After years of troubleshooting scale limitations with on-premises enterprise software deployments, we bet our life savings that multi-tenant cloud apps could scale to the infinite data scales that were just a few years away. Eclipsing Our First Goal Just a few weeks ago, we added our 500th enterprise customer in just over two years since Sumo Logic’s inception. As software developers, the most gratifying part of our job is when customers use and love our software. This past month has been the most gratifying part of the journey so far as I’ve travelled around the world meeting with dozens of happy customers. In each city, I’m blown away by the impact that Sumo Logic has on our customers’ mission-critical applications. Our code works, our customers love our software and our business is taking off faster than we could have imagined. Momentum Is Kicking In Our gratitude for our customers only grows when we dig through the stats of what we’ve been able to build together with our world-class investors and team of 170+ Sumos. Just last quarter alone, we exceeded expectations with: 100%+ quarter-over-quarter ACV growth; 100+ new customer logos; 12 new 1 Terabyte/day customers; 1 quadrillion new logs indexed; and dozens of new Sumos bringing badass skills from companies like Google, Atlassian, Microsoft, Akamai and even VMware… Shifting Into Overdrive It is still early days, and we have a tireless road of building ahead of us. Big data is approaching a $20B per year industry. And, we’re addressing machine data, which is growing 5X faster than any other segment of data. No company has built a platform for machine data that approaches our scale in the cloud: 1 million events ingested per second, 8 petabytes scanned per day, and 1 million queries processed per day. Today, we’re excited to share the news that Ramin Sayar will be joining us to lead Sumo Logic as our new president and CEO. With 20 years of industry experience, he has a proven track record of remarkable leadership, incubating and growing significant new and emerging businesses within leading companies. He comes to us from VMware, where he was Sr. Vice President and General Manager of the Cloud Management Business Unit. In his time at VMware, he developed the product and business strategy and led the fastest-growing business unit. He was responsible for the industry-leading Cloud Management Business and Strategy, R&D, Operating P&L, Product Mgmt, Product Marketing and field/business Operations for VMware’s Cloud Mgmt offerings. Our mission remains the same: to enable businesses to harness the power of machine data to improve their operations and deliver outstanding customer experience. With our current momentum and Ramin’s leadership, I am extremely excited about the next chapter in Sumo Logic’s journey. Please know how grateful we are to you, our customers, partners, and investors, for your belief in us and for the privilege to innovate on your behalf every day.

December 2, 2014

Blog

Improving Misfit Wearable’s Devices With Sumo Logic

Blog

Use AWS CloudTrail to Avoid a Console Attack

Our app for AWS CloudTrail now offers a dashboard specifically for monitoring console login activity. In the months since the AWS team added this feature, we decided to break out these user activities in order to provide better visibility into what’s going on with your AWS account. Many of you might think of this update as incremental and not newsworthy, but I’m actually writing here today to tell you otherwise! More and more people are using APIs and CLIs (and third parties) to work with AWS outside the console. As console logins are becoming more and more rare and as more business-critical assets are being deployed in AWS, it’s critical to always know who’s logged into your console and when. For a great and terrifying read about just how badly things can go wrong when someone gains access to your console, look no further than the story of Code Spaces. With one story opening with “was a company” and another “abruptly closed,” there isn’t exactly a lot of suspense about how things turned out for this company. After attackers managed to gain access to Code Spaces’ AWS console, they built themselves a stronghold of backdoors and began an attempt to extort money from the company. When the attackers’ accounts were removed, they quickly used the additional users they had generated to get back in and begin taking out infrastructure and data. With the service down and their customers’ data in disarray, all trust in their product was lost. The company was effectively destroyed in a matter of hours. The new dashboard in our updated CloudTrail app allows you to quickly see who’s attempting to log in to your console, from where, and whether or not they’re using multi-factor authentication (which we highly recommend). If you haven’t installed the app previously, be sure to follow the simple steps in our documentation to set up the appropriate permissions in AWS. For those of you who have already installed the app, you can install it anew in order to get a copy with the additional dashboard included. From there, we encourage you to customize queries for your specific situation and even consider setting up a scheduled search to alert you to a problematic situation. Keeping an eye out for suspicious activity on your AWS console can provide invaluable insight. As attackers get more sophisticated, it’s harder and harder to keep your business secure and operational. With the help of Sumo Logic and logs from AWS CloudTrail you can stay ahead of the game by preventing the most obvious (and most insidious) types of breaches. With functionality like this, perhaps Code Spaces would still be in business.
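None of this works unless CloudTrail is actually recording console and API activity in the first place. If you have never set it up, here is a minimal sketch using the AWS CLI; the trail and bucket names are placeholders, and the S3 bucket needs the standard CloudTrail bucket policy attached:

# Create a trail that writes events to an S3 bucket, then start recording
$ aws cloudtrail create-trail --name console-audit-trail \
    --s3-bucket-name my-cloudtrail-logs-bucket
$ aws cloudtrail start-logging --name console-audit-trail

Once events are flowing into the bucket, the Sumo Logic CloudTrail app can be pointed at it and the console login dashboard will begin to populate.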

AWS

November 11, 2014

Blog

Certified, bonafide, on your side

“How do I build trust with a cloud-based service?” This is the most common question Sumo Logic is asked, and we’ve got you covered. We built the service so that it was not just an effortless choice for enterprise customers but the obvious one, and building trust through a secure architecture was one of the first things we took care of. Sumo Logic is SOC 2 Type 2 and HIPAA compliant. Sumo Logic also complies with the U.S. – E.U. Safe Harbor framework and will soon be PCI DSS 3.0 compliant. No other cloud-based log analytics service can say this. For your company, this means you can safely get your logs into Sumo Logic – a service you can trust and a service that will protect your data just like you would. These are no small accomplishments, and it takes an A-team to get it done. It all came together when we hired Joan Pepin, a phreak and a hacker by her own admission. Joan is our VP of Security and CISO. She was employee number 11 at Sumo Logic, and her proficiency has helped shape our secure service. Our secure architecture is also a perfect match for our “Customer First” policy and agile development culture. We make sure that we are quickly able to meet customer needs and to fix issues in real time without compromising our secure software development processes. From network security to secure software development practices, we ensure that our developers are writing secure code in a peer-reviewed and process-driven fashion. Sumo Logic was built from the ground up to be secure, reliable, fast, and compliant. Joan understands what it means to defend a system, keep tabs on it, and watch it function live. Joan worked for the Department of Defense. She can’t actually talk about what she did when she was there, but we can confirm that she was there because the Department of Defense, as she puts it, “thought my real world experience would balance off the Ph.Ds.” Joan learned the craft from Dr. Who, a member of the Legion of Doom (http://en.wikipedia.org/wiki/Legion_of_Doom_(hacking), http://phrack.org/issues/31/5.html#article). If hacker groups were rock and roll, the Legion of Doom would be Muddy Waters, Chuck Berry, Buddy Holly. They created the idea of a hacker group. They hacked into a number of state 911 systems and stole the documentation on them, distributing it throughout BBSes in the United States. They were the original famous hacking group. Joan is no Jane-come-lately. She’s got the best resume you can have in this business. We’re frequently asked about all the security procedures we adopt at Sumo Logic. Security is baked into every component of our service. Beyond the various attestations I mentioned earlier, we also encrypt data at rest and in transit.
Other security processes that are core to the Sumo Logic service include:

+ Centrally managed, FIPS-140 two-factor authentication devices for operations personnel
+ Biometric access controls
+ Whole-disk encryption
+ Thread-level access controls
+ Whitelisting of individual processes, users, ports and addresses
+ Strong AES-256-CBC encryption
+ Regular penetration tests and vulnerability scans
+ A strong Secure Development Life-Cycle (SDLC)
+ Threat intelligence and managed vulnerability feeds to stay current with the constantly evolving threatscape and security trends

If you’re still curious about the extent to which our teams have gone to keep your data safe, check out our white paper on the topic: http://www.sumologic.com/_downloads/Sumo_Logic_Whitepaper_Securing_Logic_Service_060313.pdf We use our own service to capture our logs, which has helped us achieve these security and compliance milestones. We’ve done the legwork so your data is secure and so you can use Sumo Logic to meet your unique security and compliance needs. We have been there and done that with the Sumo Logic service, and now it’s your turn.

Blog

Optimizing Breadth-First Search for Social Networks

Social network graphs, like the ones captured by Facebook and Twitter, exhibit small-world characteristics [2][3]. In 2011, Facebook documented that among all Facebook users at the time of their research (721 million users with 69 billion friendship links), the average shortest path distance between users was 4.74. This means that, on average, any two people on Facebook are separated by fewer than five friendship hops. It’s a small world indeed! Formally, a small-world network is defined to be a network where the typical distance L between two randomly chosen nodes grows proportionally to the logarithm of the number of nodes N in the network [4]. Consider the following scenario. You have a social network profile and you want someone to introduce you to the person in that profile. Luckily, you are given the entire friendship graph captured by this social network. If there are mutual friends, then you just ask one of them to help you out. If not, you need some sequence of friend introductions to finally meet that person. What is the minimum number of intermediate friend introductions you need in order to meet the person you are interested in? This is equivalent to finding the shortest path in the social network graph between you and that person. The solution is to run Breadth-First Search (BFS) on the social network graph with your profile as the starting vertex. The other interesting question is: if we know that our graph exhibits small-world properties, can we make the exhaustive BFS faster? The ideas expressed on this topic appeared in Beamer et al. [1], where the authors optimized BFS for the number of edges traversed.

Breadth-First Search: BFS uses the idea of a frontier that separates the visited nodes from unvisited nodes. The frontier holds the nodes of the most recently visited level and is used to find the next set of nodes to be visited. On every step of BFS, the current frontier is used to identify the next frontier from the set of unvisited nodes.

Figure 1. A simple graph

Looking at the example in the figure, the current frontier consists of the nodes 1, 2, 3, 4 and 5. The edges from these nodes are examined to find a node that has not been visited. In this case node 2’s edges are used to mark H and add it to the next frontier. But note that even though H has been marked by 2, nodes 3, 4 and 5 still inspect H to see whether it has been visited or not.
Pseudocode for naive (top-down) BFS [5]:

Input: A graph G = (V,E) containing V vertices and E edges, and a source vertex s
Output: parent: Array[Int], where parent[v] gives the parent of v in the graph, or -1 if a parent does not exist

class BFS(g: Graph) {

  val parent = ArrayBuffer.fill(g.numberVertices)(-1).toArray

  def bfs(source: Int, updater: (Seq[Int], Array[Int]) => Seq[Int]) = {
    var frontier = Seq(source)
    parent(source) = -2

    while (!frontier.isEmpty) {
      frontier = updater(frontier, parent)
    }
  }
}

trait TopDownUpdater extends FrontierUpdater {

  def update(frontier: Seq[Int], parents: Array[Int]): Seq[Int] = {
    val next = ArrayBuffer[Int]()

    frontier.foreach { node =>
      graph.getNeighbors(node).filter(parents(_) == -1).foreach { neighbor =>
        next += neighbor
        parents(neighbor) = node
      }
    }
    next
  }
}

One observation about conventional BFS (henceforth referred to as top-down BFS) is that it always performs at the worst-case complexity, i.e., O(|V| + |E|), where |V| and |E| are the number of vertices and edges respectively. For example, if a node v has p parents, then we only need to explore one edge from any of those p parents to v to establish connectivity, but top-down BFS checks all incoming edges to v. The redundancy of these additional edge lookups is more pronounced when top-down BFS is run on graphs exhibiting small-world properties. As a consequence of the definition of small-world networks, the number of nodes increases exponentially with the effective diameter of the network, which results in large networks with very low diameters. The low diameter of these graphs forces them to have a large number of nodes at a particular level, which leads to top-down BFS visiting a large number of nodes in every step and makes the frontier very large. Traversing the edges of the nodes in a frontier is the major computation performed, and top-down BFS unfortunately ends up visiting all the outgoing edges from the frontier. Moreover, it has also been shown in [1] that most of the edge lookups from the frontier nodes end up at already visited nodes (marked by some other parent), which gives further evidence that iterating through all edges from the frontier can be avoided. The idea behind bottom-up BFS [1] is to avoid visiting all the edges of the nodes in the frontier, which is a pretty useful thing to do for the reasons mentioned above. To accomplish this, bottom-up BFS traverses the edges of the unvisited nodes to find a parent in the current frontier. If an unvisited node has at least one of its parents in the current frontier, then that node is added to the next frontier. To efficiently check whether a node’s parent is present in the frontier, the frontier data structure is changed to a bitmap.

Figure 2. Bottom-up BFS

In the above example, {H, I, J, K} are the unvisited nodes. However, only nodes {H, J} have a neighbor in the current frontier, and as a result the next frontier now becomes {H, J}.
In the next iteration the set of unvisited nodes will be {I, K}, and each of them has a parent in the current frontier, which is {H, J}. So {I, K} will be visited, and the search will complete in the next iteration since there will be no more nodes to add to the next frontier: all nodes will have been visited.

Pseudocode for bottom-up BFS:

Input: A graph G = (V,E) containing V vertices and E edges, and a source vertex s
Output: parent: Array[Int], where parent[v] gives the parent of v in the graph, or -1 if a parent does not exist

trait DirectedSerialAncestorManager extends SerialAncestorManager {
  var _graph: SerialDirectedGraph = _
  def getAncestor(id: Int): IndexedSeq[Int] = {
    _graph.getParents(id)
  }

  def getVertices: IndexedSeq[Int] = (0 to _graph.numberVertices - 1)
}

trait SBottomUpUpdater extends FrontierUpdater with SerialAncestorManager {

  def update(frontier: BitSet, parents: Array[Int]): Seq[Int] = {
    val next = mutable.BitSet()
    val vertices = getVertices
    val frontierSet = frontier.toSet

    (vertices.filter(parents(_) == -1)).foreach { node =>
      val neighbors = getAncestor(node)

      neighbors.find(frontierSet) match {
        case Some(ancestor) => {
          parents(node) = ancestor
          next(node) = true
        }
        case None => None
      }
    }
    next.toBuffer
  }
}

The major advantage of this approach is that the search for an unvisited node’s parent terminates as soon as any one parent is found in the current frontier. Contrast this with top-down BFS, which needs to visit all the neighbors of a node in the frontier during every step.

Top-down, bottom-up, or both? When the frontier is large, you gain by performing bottom-up BFS, as it only examines some of the edges of the unvisited nodes. But when the frontier is small, it may not be advantageous to perform bottom-up BFS, as it incurs the additional overhead of identifying the unvisited nodes in the first place. Small-world networks usually start off with small frontiers in the initial steps and see an exponential increase in frontier size in the middle stages of the search. These tradeoffs lead us to another approach for small-world networks that combines top-down and bottom-up BFS: hybrid BFS [1]. In hybrid BFS, the size of the frontier is used to define a heuristic, which is used to switch between the two approaches. A thorough analysis of this heuristic is presented in [1].

How about parallelizing these approaches? When trying to parallelize the two approaches, observe that bottom-up BFS is easier to parallelize than top-down BFS. For bottom-up BFS, you can introduce parallelism in the stage where you populate the next frontier. Each of the unvisited nodes can be examined in parallel, and since every node only updates its own entry in the next data structure, no locks are required.
trait ParallelAncestorManager {
  def getAncestor(id: Int): ParSeq[Int]
  def getParVertices: ParSeq[Int]
}

trait PBottomUpUpdater extends FrontierUpdater with ParallelAncestorManager {

  def update(frontier: Seq[Int], parents: Array[Int]): Seq[Int] = {
    val next = BitSet()
    val frontierSet = frontier.toSet

    getParVertices.filter(parents(_) == -1).foreach { node =>
      val parNeighbors = getAncestor(node)
      parNeighbors.find(x => frontierSet.contains(x)) match {
        case Some(ancestor) => {
          parents(node) = ancestor
          next(node) = true
        }
        case None => None
      }
    }
    next.toBuffer
  }
}

On inspecting the top-down BFS pseudocode for sources of parallelism, observe that the nodes in the current frontier can be explored in parallel. The parallel top-down pseudocode is:

trait PTopDownUpdater extends FrontierUpdater {

  def update(frontier: Seq[Int], parents: Array[Int]): Seq[Int] = {
    val next = ArrayBuffer[Int]()

    frontier.par.foreach { node =>
      graph.getNeighbors(node).filter(parents(_) == -1).foreach { neighbor =>
        next += neighbor
        parents(neighbor) = node
      }
    }
    next
  }
}

In terms of correctness the above pseudocode looks fine, but there is a benign race condition introduced by updating parents and next concurrently. This may result in a node being added more than once, which is inefficient but does not affect the correctness of the algorithm. Cleaner code would use a synchronized block to ensure only one thread updates the frontier at a time. The hybrid approach combining the parallel versions of top-down and bottom-up BFS provides one of the fastest single-node implementations of parallel BFS [1].

References:
[1] Beamer, Scott, Krste Asanović, and David Patterson. “Direction-optimizing breadth-first search.” Scientific Programming 21.3 (2013): 137-148.
[2] Ugander, Johan, et al. “The anatomy of the Facebook social graph.” arXiv preprint arXiv:1111.4503 (2011).
[3] Li, Jun, Shuchao Ma, and Shuang Hong. “Recommendation on social network based on graph model.” Control Conference (CCC), 2012 31st Chinese. IEEE, 2012.
[4] Watts, Duncan J., and Steven H. Strogatz. “Collective dynamics of ‘small-world’ networks.” Nature 393.6684 (1998): 440-442.
[5] Cormen, T. H., C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 1990.
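To make the hybrid switching concrete, here is a minimal serial sketch. It assumes the same Graph interface (numberVertices, getNeighbors) used in the pseudocode above and treats the graph as undirected, so neighbors double as potential parents; the switching threshold is an illustrative placeholder, not the tuned heuristic analyzed in [1].

class HybridBFS(graph: Graph) {
  // parent(v) == -1 means v is unvisited; the source gets the sentinel -2.
  val parent: Array[Int] = Array.fill(graph.numberVertices)(-1)

  def bfs(source: Int): Unit = {
    var frontier: Seq[Int] = Seq(source)
    parent(source) = -2

    while (frontier.nonEmpty) {
      // Placeholder heuristic: once the frontier covers a sizable fraction of
      // the graph, bottom-up examines fewer edges; otherwise top-down is cheaper.
      frontier =
        if (frontier.size > graph.numberVertices / 20) bottomUpStep(frontier)
        else topDownStep(frontier)
    }
  }

  private def topDownStep(frontier: Seq[Int]): Seq[Int] = {
    val next = scala.collection.mutable.ArrayBuffer[Int]()
    frontier.foreach { node =>
      graph.getNeighbors(node).filter(parent(_) == -1).foreach { neighbor =>
        parent(neighbor) = node
        next += neighbor
      }
    }
    next
  }

  private def bottomUpStep(frontier: Seq[Int]): Seq[Int] = {
    val frontierSet = frontier.toSet
    val next = scala.collection.mutable.ArrayBuffer[Int]()
    (0 until graph.numberVertices).filter(parent(_) == -1).foreach { node =>
      // Stop at the first neighbor that is already in the frontier.
      graph.getNeighbors(node).find(frontierSet.contains) match {
        case Some(ancestor) =>
          parent(node) = ancestor
          next += node
        case None => ()
      }
    }
    next
  }
}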

October 28, 2014

Blog

Transaction Mining for Deeper Machine Data Intelligence

The new Sumo Logic Transaction capability allows users to analyze related sequences of machine data. The comprehensive views uncover user behavior, operational and security insights that can help organizations optimize business strategy, plans and processes. The new capability allows you to monitor transactions by a specific transaction ID (session ID, IP, user name, email, etc.) while handling data from distributed systems, where a request is passed through several different systems, each with its own transaction ID. Over the past two months, we have worked with beta customers on a variety of use cases, including:

Tracking transactions in a payment processing platform
Following typical user sessions, detecting anomalous checkout transactions and catching checkout drop-off on e-commerce websites
Tracking renewals, upgrades and new signup transactions
Monitoring phone registration failures over a specific period
Tracking on-boarding of new users in SaaS products

The last use case reflects what SaaS companies care most about: truly understanding the user behaviors on their website that drive long-term engagement. We’ve used our new transaction analytics capabilities to better understand how users find our site, the process by which they get to our Sumo Logic Free page, and how quickly they sign up. Our customer success team uses Transaction Analytics to monitor how long it takes users to create a dashboard, run a search, and perform other common actions. This enables them to provide very specific feedback to the product team for future improvements. The screenshots below depict a query with IP as the transaction ID and the various states mapped from the logs, and a Sankey diagram that visualizes the flow of the various components/states of a transaction on an e-commerce website. Many of our customers are already using tools such as Google Analytics to monitor visitor flow on their website and understand customer behavior. We are not launching this new capability to replace Google Analytics (even if it’s not embraced in some countries, such as Germany). What we bring on top of monitoring visitor flow is the ability to identify divergence in state sequences and to better understand the transitions between states, for example in terms of latency. You have probably seen announcements from some companies about plugins for log management platforms that detect anomalies and monitor user behavior and sessions. Our product philosophy is to provide our users a well-rounded capability that enables them to make smart choices without requiring external tools, all from their machine data within the Sumo product. It was a fascinating journey working on the transaction capability with our analytics team. It’s a natural evolution of our analytics strategy, which now includes: 1) real-time aggregation and correlation with our Dashboards; 2) machine learning to automatically uncover anomalies and patterns; and 3) now transaction analytics to rapidly uncover relationships across distributed events. We are all excited to launch Transaction Analytics. Please share your feedback on the new capability and let us know if we can help with your use cases. The transaction searches and the new visualization are definitely our favorite content.

October 22, 2014

Blog

Data, with a little Help from my friends

Ever had that sinking feeling when you start a new job and wonder just why you made the jump? I had a gut check when, shortly after joining Sumo Logic in June of 2012, I realized that we had less than 50 daily hits to our Knowledge Base on our support site. Coming from a position where I was used to over 7,000 customers reading my content each day, I nearly panicked. After calming down, I realized that what I was actually looking at was an amazing opportunity. Fast forward to 2014. I’ve already blogged about the work I’ve done with our team to bring new methods to deliver up-to-date content. (If you missed it, you can read the blog here.) Even with these improvements I couldn’t produce metrics that proved just how many customers and prospects we have clicking through our Help system. Since I work at a data analytics company, it was kind of embarrassing to admit that I had no clue how many visitors were putting their eyes on our Help content. I mean, this is some basic stuff! Considering how much time I’ve spent working with our product, I knew that I could get all the information I needed using Sumo Logic…if I could get my hands on some log data. I had no idea how to get logging enabled, not to mention how logs should be uploaded to our Service. Frankly, my English degree is not conducive to solving engineering challenges (although I could write a pretty awesome poem about my frustrations). I’m at the mercy of my Sumo Logic co-workers to drive any processes involving how Help is delivered and how logs are sent to Sumo Logic. All I could do was pitch my ideas and cross my fingers. I am very lucky to work with a great group of people who are happy to help me out when they can. This is especially true of Stefan Zier, our Chief Architect, who once again came to my aid. He decommissioned old Help pages (my apologies to anyone who found their old bookmarks rudely displaying 404’s) and then routed my Help from the S3 bucket through our product, meaning that Help activity can be logged. I now refer to him as Stefan, Patron Saint of Technical Writers. Another trusty co-worker we call Panda helped me actually enable the logging. Once the logging began we could finally start creating some Monitors to build out a Help Metrics Dashboard. In addition to getting the number of hits and the number of distinct users, we really wanted to know which pages were generating the most hits (no surprise that search-related topics bubbled right to the top). We’re still working on other metrics, but let me share just a few data points with you. Take a look at the number of hits our Help site has handled since October 1st: We now know that Wednesday is when you look at Help topics the most: And here’s where our customers are using Help, per our geo lookup operator Monitor: It’s very exciting to see how much Sumo Logic has grown, and how many people now look at content written by our team, from every corner of the world. Personally, it’s gratifying to feel a sense of ownership over a dataset in Sumo Logic, thanks to my friends. What’s next from our brave duo of tech writers? Beyond adding additional logging, we’re working to find a way to get feedback on Help topics directly from users. If you have any ideas or feedback, in the short term, please shoot us an email at [email protected]. We would love to hear from you!

October 20, 2014

Blog

Machine Data Intelligence – an update on our journey

In 1965, Dr. Hubert Dreyfus, a professor of philosophy at MIT, later at Berkeley, was hired by RAND Corporation to explore the issue of artificial intelligence. He wrote a 90-page paper called “Alchemy and Artificial Intelligence” (later expanded into the book What Computers Can’t Do) questioning the computer’s ability to serve as a model for the human brain. He also asserted that no computer program could defeat even a 10-year-old child at chess. Two years later, in 1967, several MIT students and professors challenged Dreyfus to play a game of chess against MacHack (a chess program that ran on a PDP-6 computer with only 16K of memory). Dreyfus accepted. Dreyfus found a move that could have captured the enemy queen. The only way the computer could get out of this was to keep Dreyfus in check with its own queen until it could fork his queen and king and then exchange queens. And that’s what the computer did. The computer checkmated Dreyfus in the middle of the board. I’ve brought up this “man vs. machine” story because I see another domain where a similar change is underway: the field of machine data. Businesses run on IT, and IT infrastructure is getting bigger by the day, yet IT operations still remain very dependent on analytics tools with very basic monitoring logic. As systems become more complex (and more agile), simple monitoring just doesn’t cut it. We cannot support or sustain the necessary speed and agility unless the tools become much more intelligent. We believed this when we started Sumo Logic, and with the lessons learned from running a large-scale system ourselves, we continue to invest in making operational tooling more intelligent. We knew the market needed a system that complemented human expertise. Humans don’t scale that well; our memory is imperfect, so the ideal tools should pick up on signals that humans cannot, at a scale that matches today’s business needs and IT data exhaust. Two years ago we launched our service with a pattern recognition technology called LogReduce, and about five months ago we launched Structure-Based Anomaly Detection. The last three months of the journey have been a lot like teaching a chess program new tricks: the game remains the same, but the system keeps getting better at it and more versatile. We are now extending our Structure-Based Anomaly Detection capabilities with Metric-Based Anomaly Detection. A metric could be just that: a time series of numerical values. You can take any log, filter, aggregate and pre-process it however you want, and if you can turn that into a number with a time stamp, we can baseline it and automatically alert you when the current value of the metric goes outside an expected range based on its history. We developed this new engine in collaboration with the Microsoft Azure Machine Learning team, and they have some really compelling models to detect anomalies in a time series of metric data; you can read more about that here. The hard part about anomaly detection is not detecting anomalies; it is detecting anomalies that are actionable. Making an anomaly actionable begins with making it understandable. Once an analyst or an operator can grok the anomalies, they are much more amenable to alerting on them, building a playbook around them, or even hooking up automated remediation to the alert, which is the Holy Grail. And not all anomaly detection engines are equal. Like chess programs, there are ones that can beat a 5-year-old and others that can beat even the grandmasters.
And we are well on our way to building a comprehensive anomaly detection engine that becomes a critical tool in every operations team’s arsenal. The key question to ask is: does the engine tell you something that is insightful, actionable, and that you could not have found with standard monitoring tools? Below is an example from an actual Sumo production use case where some of our nodes were spending a lot of time in garbage collection, impacting dashboard refresh rates for some of our customers. If this looks interesting, our Metric-Based Anomaly Detection service based on Azure Machine Learning is being offered to select customers in a limited beta release and will be coming soon to machines…err..a browser near you (we are a cloud-based service, after all). P.S. If you like stories, here is another one for you. Thirty years after MacHack beat Dreyfus, in 1997, Kasparov (arguably one of the best human chess players) played the Caro-Kann Defence against Deep Blue. He allowed Deep Blue to make a knight sacrifice, which wrecked his defenses and forced him to resign in fewer than twenty moves. Enough said. References [1] http://www.chess.com/article/view/machack-attack

Blog

Become Friends with Metadata to Maximize Efficiency in Sumo Logic

While performing my duties on-boarding our Sumo Free users, I’ve learned that the most consistent process our users wish to eliminate prior to using our service is investigating incidents within siloed data sources. Needless to say, the process is arduous because customers have to zero in on individual servers/appliances to manually decipher all of the logs. Enough said! Centralizing your logs and making Sumo Logic the source of truth for your data allows for real-time, rapid coordination between Dev, Ops, and Sec teams to remediate problems, patch vulnerabilities, improve end-user experience, etc. But now that you have your Apache Access and Error logs, your Cisco ASA logs, email logs, Linux and Windows OS logs, VMware, and AWS ELB logs all living under the same roof, how do we make them play nicely together? How do you search your Apache error logs without accidentally inviting Linux and Windows error messages to the party? Meet your “bouncer”: metadata! Leveraging metadata to build searches and dashboards is the foundation for both organizing your diverse logs and optimizing performance. Starting your search with a * can be computationally expensive and cause avoidable lag time for your results. So it’s a best practice to constrain your search to a subset of your data using a metadata field before your first pipe (|), using the syntax _metadataField=foo. Or simply click into the search bar and select. Now you don’t have to remember a specific set of keywords or strings to pull up the data set you want.

Standardize your metadata convention

The picture above displays the primary metadata fields you’re able to customize; these get attached to your messages after ingestion:

Collector – The name of the Collector entered at activation time
Source – The name of the Source entered when the Source is created
Source Category – Open tag, completely customizable. This metadata is also typically used for mapping data streams to our apps
Source Host – For Remote and Syslog Sources, this is a fixed value determined by the hostname you enter in the “Hostname” field (your actual system values for hosts). For a Local File Source, you can overwrite the host system value with a new value of your choice
Source Name – A fixed value determined by the path you enter in the “File” field when configuring a Source. This metadata tag cannot be changed

_sourceCategory is your best friend

Source Category is your best friend because it’s completely open and customizable, allowing you to “categorize” your logs in a way that makes the most sense for your team. You can also provide structure by using multiple tags separated by an _ or a / . For example, say you wanted to separate your Apache logs based on staging environments: you could hierarchically categorize them to make it easy to search on them individually or together using wildcards.
You might try this:

Prod/Apache/Access
Prod/Apache/Error
QA/Apache/Access
QA/Apache/Error

To search individually:

_sourcecategory=Prod/Apache/Access
_sourcecategory=QA/Apache/Access

Together:

_sourcecategory=*Apache/Access
_sourcecategory=*Apache/Error

or

_sourcecategory=Prod/Apache*
_sourcecategory=QA/Apache*

Or say you have multiple security data sources, like a Cisco firewall, Snort IDS, and Linux OS security logs. You might try:

Sec_Firewall_Cisco
Sec_IDS_Snort
OS_Linux_Sec

And we can tie all of them together with a simple:

_sourcecategory=*Sec*

_sourceCategory for apps

Additionally, we use _sourceCategory to map your data to our numerous pre-built applications. Just like in search, you can use wildcards to funnel multiple different data sources into the same app. For example, you may have multiple Linux or Windows OS logs that you have categorized differently based on location. You’re not required to use the same source category for them; simply make sure the words “Linux” and “Windows” are somewhere in the metadata field and use wildcards to create a custom data source that funnels all of it to the app. Here are some additional metadata fields that are not customizable but are still attached to all of your messages; these can be used to refine your queries:

_messageCount – A sequence number (per Source) added by the Collector when the message was received.
_messageTime – The timestamp of the message. If the message doesn’t have a timestamp, messageTime uses the receiptTime.
_raw – The raw log message.
_receiptTime – The time the Collector received the message.
_size – The size of the log message.

For additional tips on leveraging your metadata check out help.sumologic.com. And if you’re not familiar with Sumo, please check out our Sumo Logic Free service and enjoy!
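As a quick end-to-end illustration, here is what that scoping looks like in practice. The source category below is the assumed Prod/Apache/Error example from above, and the aggregation is just one option; constraining on metadata before the first pipe keeps the scan small, and everything after the pipe only shapes the messages that matched.

_sourceCategory=Prod/Apache/Error
| timeslice 1h
| count by _timeslice
| sort by _timeslice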

October 16, 2014

Blog

Scala at Sumo: type class law for reliable abstractions

Abstraction is a fundamental concept in software development. Identifying and building abstractions well-suited to the problem at hand can make the difference between clear, maintainable code and a teetering, Jenga-like monolith duct-taped together by a grotesque ballet of tight coupling and special case handling. While a well-designed abstraction can shield us from detail, it can also suffer from leakage, failing to behave as expected or specified and causing problems for code built on top of it. Ensuring the reliability of our abstractions is therefore of paramount concern. In previous blog posts, we’ve separately discussed the benefits of using type classes in Scala to model abstractions, and using randomized property testing in Scala to improve tests. In this post we discuss how to combine these ideas in order to build more reliable abstractions for use in your code. If you find these ideas interesting, please be sure to check out the references at the end of this post.

Type classes for fun and profit

Type classes allow us to easily build additional behaviors around data types in a type-safe way. One simple and useful example is associated with the monoid abstraction, which represents a set of items that can be combined with one another (such as the integers, which can be combined by addition). Loosely[1], a monoid consists of:

a collection of objects (e.g., integers)
a binary operation for combining these objects to yield a new object of the same type (e.g., addition)
an identity object whose combination leaves an object unchanged (e.g., the number 0)

This abstraction is captured by the scalaz trait Monoid[F]:

trait Monoid[F] {
  def zero: F
  def append(f1: F, f2: => F): F
}

The utility of this machinery is that it gives us a generalized way to use types that support some notion of “addition” or “combination”, for example[2]:

def addItUp[F : Monoid](items: Seq[F]): F = {
  // Combine a bunch of items
  val m = implicitly[Monoid[F]]
  items.foldLeft(m.zero){ case (total, next) => m.append(total, next) }
}

scala> addItUp(Seq("day ", "after ", "day"))
res1: String = "day after day"
scala> addItUp(Seq(1,2,3))
res2: Int = 6

As described in our earlier machine learning example, this can be more convenient than requiring that the data types themselves subtype or inherit from some kind of “Addable” interface.

I am the law!

In Scala, the Monoid[F] trait definition (combined with the compiler’s type-checking) buys us some important sanity checks with respect to behavior. For example, the function signature append(x: F, y: F): F guarantees that we’re never going to get a non-F result[3]. However, there are additional properties that an implementation of Monoid[F] must satisfy in order to truly conform to the conceptual definition of a monoid, but which are not easily encoded into the type system.
For example, the monoid binary operation must satisfy left and right identity with respect to the “zero” element. For integers under addition the zero element is 0, and we do indeed have x + 0 = 0 + x = x for any integer x. We can codify this requirement in something called a type class law. When defining a particular type class, we can add some formal properties or invariants which we expect implementations to obey. The codification of these constraints can then be kept alongside the type class definition. Again returning to the scalaz Monoid[4], we have:

trait Monoid[F] extends Semigroup[F] {
  ...
  trait MonoidLaw extends SemigroupLaw {
    def leftIdentity(a: F)(implicit F: Equal[F]) =
      F.equal(a, append(zero, a))
    def rightIdentity(a: F)(implicit F: Equal[F]) =
      F.equal(a, append(a, zero))
  }
  ...
}

An interesting observation is that this implementation depends upon another type class instance, Equal[F], which simply supplies an equal() function for determining whether two instances of F are indeed equal. Of course, Equal[F] comes supplied with its own type class laws for properties any well-defined notion of equality must satisfy, such as symmetry (x==y iff y==x), reflexivity (x==x), and transitivity (if a==b and b==c then a==c).

A machine learning example

We now consider an example machine learning application where we are evaluating some binary classifier (like a decision tree) over test data. We run our evaluation over different sets of data, and for each set we produce a very simple output indicating how many predictions were made, and of those, how many were correct:

case class Evaluation(total: Int, correct: Int)

We can implement Monoid[Evaluation][5] in order to combine our experimental results across multiple datasets:

object EvaluationMonoid extends Monoid[Evaluation] {
  def zero = Evaluation(0, 0)
  def append(x: Evaluation, y: => Evaluation) =
    Evaluation(x.total + y.total, x.correct + y.correct)
}

We’d like to ensure that our implementation satisfies the relevant type class laws.
We could write a handful of unit tests against one or more hand-coded examples, for example using ScalaTest:

"Evaluation Monoid" should {

  import EvaluationMonoid._
  implicit val eq = Equal.equalA[Evaluation]
  val testEval = Evaluation(3, 2)

  "obey Monoid typeclass Law" in {
    Monoid.monoidLaw.leftIdentity(testEval) should be (true)
    Monoid.monoidLaw.rightIdentity(testEval) should be (true)
  }
}

However, this merely gives us an existence result. That is, there exists some value for which the desired property holds. We’d like something a little stronger. This is where we can use ScalaCheck to do property testing, randomly generating as many arbitrary instances of Evaluation as we’d like. If the law holds for all[6] generated instances, we can have a higher degree of confidence in the correctness of our implementation. To accomplish this we simply need to supply a means of generating random Evaluation instances via ScalaCheck’s Gen:

val evalGen = for {total <- Gen.choose(0, 1000);
                   correct <- Gen.choose(0, total)}
              yield Evaluation(total, correct)

"Evaluation Monoid" should {

  import EvaluationMonoid._
  implicit val eq = Equal.equalA[Evaluation]

  "obey Monoid typeclass Law" in {
    forAll (evalGen) { testEval => {
      Monoid.monoidLaw.leftIdentity(testEval) should be (true)
      Monoid.monoidLaw.rightIdentity(testEval) should be (true)
    }}
  }
}

Now that’s an abstraction we can believe in!

So what?

This level of confidence becomes important when we begin to compose type class instances, mixing and matching this machinery to achieve our desired effects. Returning to our Evaluation example, we may want to evaluate different models over these datasets, storing the results for each dataset in a Map[String,Evaluation] where the keys refer to which model was used to obtain the results.
In scalaz, we get the Monoid[Map[String,Evaluation]] instance “for free”, given an instance of Monoid[Evaluation]:

scala> implicit val em = EvaluationMonoid
em: EvaluationMonoid.type = EvaluationMonoid$@34f5b235

scala> implicit val mm = mapMonoid[String,Evaluation]
mm: scalaz.Monoid[Map[String,Evaluation]] = scalaz.std.MapInstances$$anon$4@13105b09

scala> val dataset1 = Map("modelA" -> Evaluation(3,2),
     |                    "modelB" -> Evaluation(4,1))
dataset1: scala.collection.immutable.Map[String,Evaluation] =
  Map(modelA -> Evaluation(3,2), modelB -> Evaluation(4,1))

scala> val dataset2 = Map("modelA" -> Evaluation(5,4))
dataset2: scala.collection.immutable.Map[String,Evaluation] =
  Map(modelA -> Evaluation(5,4))

scala> mm.append(dataset1, dataset2)
res3: Map[String,Evaluation] =
  Map(modelA -> Evaluation(8,6), modelB -> Evaluation(4,1))

Conclusion and references

If you are using the scalaz library, many of the provided type classes come “batteries included” with type class laws. Even if you are not, these ideas can help you to build more reliable type class instances which can be composed and extended with confidence. See below for some additional references and readings on this subject:

Law Enforcement using Discipline
Verifying Typeclass Laws in Haskell with QuickCheck
How to test Scalaz type class instances with specs2 and ScalaCheck
Haskell’s Type Classes: We Can Do Better

Footnotes

[1] Omitting associativity and explicit discussion of closure.
[2] For brevity, these code snippets do not show library (scalaz, ScalaTest, ScalaCheck) imports.
[3] Excluding the unfortunate possibilities of null return values or thrown Exceptions.
[4] A semigroup is a more general concept than a monoid, which is modeled in scalaz by having Monoid[F] extend Semigroup[F].
[5] This implementation has a bit of a boilerplate flavor; this post describes how we could automagically derive our Monoid[Evaluation] instance.
[6] As implied by the ScalaCheck project’s appropriate logo.

October 9, 2014

Blog

The Three Questions Customers Invariably Ask Us

For almost all DevOps, App Ops and Security teams, finding that needle in the haystack, that indicator of cause, the unseen effect, and finding it quickly, is fundamental to their success. Our central mission is to enable the success of these teams via rapid analysis of their machine data. During their process of researching and investigating Sumo Logic, customers invariably ask us three questions:

How long will it take to get value from Sumo Logic?
Everyone provides analytics – what’s different about yours?
How secure is my data in the cloud?

Let’s address each of these questions.

Time to Value

A key benefit we deliver revolves around speed and simplicity: no hardware, storage or deployment overhead. Beyond the fact that we’re SaaS, the true value lies in how quickly we can turn data into actionable information. First, our cloud-based service integrates quickly into any environment (on-premises, cloud, hybrid) that generates machine data. Because we’re data-source agnostic, our service can quickly correlate logs across various systems, leading to new and relevant analyses. For example, one of our engineers has written a post on how we use Sumo Logic internally to track what’s happening with Amazon SES messages and how others can very quickly set this up as well. Second, value is generated by how quickly you uncover insights. A Vice President of IT at a financial services firm that is now using Sumo Logic shared with us that incidents that used to take him two hours to discover and fix now take him 10 minutes. Why? Because the machine learning that underpins our LogReduce pattern recognition engine surfaces the critical issues that his team can investigate and remediate, without the need to write any rules.

Analytics Unleashed

Sumo Logic was founded on the idea that powerful analytics are critical to making machine data a corporate resource to be valued rather than ignored. Our analytics engine combines the best of machine learning, real-time processing, and pre-built applications to provide rapid value. Fuze recently implemented Sumo Logic to help gain visibility into its technical infrastructure. They are now able to address incidents and improvements in their infrastructure much more quickly, with specific insights. They report a 40% savings in management time and a 5x improvement in “signal-to-noise” ratio. A critical reason why InsideView chose Sumo Logic was the availability of our applications for AWS Elastic Load Balancing and AWS CloudTrail to help monitor their AWS infrastructure and to get immediate value from our service.

Security in the Cloud

Customers are understandably curious about the security processes, policies and infrastructure that would help them mitigate concerns about sending their data to a third-party vendor. Given that our founding roots are in security and that our entire operating model is to securely deliver data insights at scale, we have a deep appreciation for the natural concerns prospects might have. We’ve crafted a detailed white paper that outlines how we secure our service, but here are a few noteworthy highlights.
Data encryption: we encrypt log data both in motion and at rest, and each customer’s unique keys are rotated daily
Certifications: we’ve spent significant resources on our current attestations and certifications (e.g., HIPAA, SOC 2 Type 2 and others) and are actively adding to this list
Security processes: included in this bucket are centrally managed FIPS-140 two-factor authentication devices, biometric controls, whitelists for users, ports, and addresses, and more

Our CISO has discussed the broader principles of managing security in the cloud in an on-demand webinar, and of course you can always start investigating our service via Sumo Logic Free to understand for yourself how we answer these three questions.

AWS

October 8, 2014

Blog

Cloud Log Management for Control Freaks

The following is a guest post from Bright Fulton, Director of Engineering Operations at Swipely.

Like other teams that value their time and focus, Swipely Engineering strongly prefers partnering with third-party infrastructure, platform, and monitoring services. We don’t, however, like to be externally blocked while debugging an issue or asking a new question of our data. Is giving up control the price of convenience? It shouldn’t be. The best services do the heavy lifting for you while preserving flexibility. The key lies in how you interface with the service: stay in control of data ingest and code extensibility.

A great example of this principle is Swipely’s log management architecture. We’ve been happily using Sumo Logic for years. They have an awesome product and are responsive to their customers. That’s a strong foundation, but because logging is such a vital function, we retain essential controls while taking advantage of all the power that Sumo Logic provides.

Get the benefits

Infrastructure services have flipped our notion of stability: instead of being comforted by long uptime, we now see it as a liability. Instances start, do work for an hour, terminate. But where do the logs go? One key benefit of a well-integrated log management solution is centralization: stream log data off transient systems and into a centralized service. Once stored and indexed, we want to be able to ask questions of our logs, to react to them. Quick answers come from ad-hoc searches:

How many times did we see this exception yesterday?
Show me everything related to this request ID.

Next, we define scheduled reports to catch issues earlier and shift toward a strategic view of our event data.

Alert me if we didn’t process a heartbeat job last hour.
Send me a weekly report of which instance types have the worst clock skew.

Good cloud log management solutions make this centralization, searching, and reporting easy.

Control the data

It’s possible to get these benefits without sacrificing control of the data by keeping the ingest path simple: push data through a single transport agent and keep your own copy. Swipely’s logging architecture collects with rsyslog and processes with Logstash before forwarding everything to both S3 and Sumo Logic.

Put all your events in one agent and watch that agent.

You likely have several services that you want to push time series data to: logs, metrics, alerts. To solve each concern independently could leave you with multiple long-running agent processes that you need to install, configure, and keep running on every system. Each of those agents will solve similar problems of encryption, authorization, batching, local buffering, back-off, updates. Each comes with its own idiosyncrasies and dependencies. That’s a lot of complexity to manage in every instance. The lowest common denominator of these time series event domains is the log. Simplify by standardizing on one log forwarding agent in your base image. Use something reliable, widely deployed, open source. Swipely uses rsyslog, but more important than which one is that there is just one.

Tee time

It seems an obvious point, but control freaks shouldn’t need to export their data from third parties. Instead of forwarding straight to the external service, send logs to an aggregation server first. Swipely uses Logstash to receive the many rsyslog streams. In addition to addressing vendor integrations in one place, this point of centralization allows you to:

Tee your event stream. Different downstream services have different strengths.
Swipely sends all logs to both Sumo Logic for search and reporting and to S3 for retention and batch jobs.

Apply real-time policies. Since Logstash sees every log almost immediately, it’s a great place to enforce invariants, augment events, and make routing decisions. For example, logs that come in without required fields are flagged (or dropped). We add classification tags based on source and content patterns. Metrics are sent to a metric service. Critical events are pushed to an SNS topic.

Control the code

The output is as important as the input. Now that you’re pushing all your logs to a log management service and interacting happily through search and reports, extend the service by making use of indexes and aggregation operators from your own code.

Wrap the API

Good log management services have good APIs, and Sumo Logic has several. The Search Job API is particularly powerful, giving access to streaming results in the same way we’re used to in their search UI. Swipely created the sumo-search gem in order to take advantage of the Search Job API. We use it to permit arbitrary action on the results of a search.

Custom alerts and dashboards

Bringing searches into the comfort of the Unix shell is part of the appeal of a tool like this, but even more compelling is bringing them into code. For example, Swipely uses sumo-search from a periodic job to send alerts that are more actionable than just the search query results. We can select the most pertinent parts of the message and link in information from other sources. Engineers at Swipely start weekly tactical meetings by reporting trailing seven-day metrics. For example: features shipped, slowest requests, error rates, analytics pipeline durations. These indicators help guide and prioritize discussion. Although many of these metrics are from different sources, we like to see them together in one dashboard. With sumo-search and the Search Job API, we can turn any number from a log query into a dashboard widget in a couple lines of Ruby.

Giving up control is not the price of SaaS convenience. Sumo Logic does the heavy lifting of log management for Swipely and provides an interface that allows us to stay flexible. We control data on the way in by preferring open source tools in the early stages of our log pipeline and saving everything we send to S3. We preserve our ability to extend functionality by making their powerful search API easy to use from both shell and Ruby. We’d appreciate feedback (@swipelyeng) on our logging architecture. Also, we’re not really control freaks and would love pull requests and suggestions on sumo-search!

AWS

October 2, 2014

Blog

Debugging Amazon SES Message Delivery Using Sumo Logic

We at Sumo Logic use Amazon SES (Simple Email Service) for sending thousands of emails every day for things like search results, alerts, and account notifications. We need to monitor SES to ensure timely delivery and know when emails bounce. Amazon SES provides notifications about the status of each email via Amazon SNS (Simple Notification Service). Amazon SNS allows you to send these notifications to any HTTP endpoint. We ingest these messages using Sumo Logic's HTTP Source. Using these logs, we have identified problems like scheduled searches which always send results to an invalid email address, and a Microsoft Office 365 outage when a customer reported having not received the sign-up email. Here's a step-by-step guide on how to send your Amazon SES notifications to Sumo Logic.

1. Set Up Collector. The first step is to set up a hosted collector in Sumo Logic which can receive logs via an HTTP endpoint. While setting up the hosted collector, we recommend providing an informative source category name, like "aws-ses".

2. Add HTTP Source. After adding a hosted collector, you need to add an HTTP Source. Once an HTTP Source is added, it will generate a URL which will be used to receive notifications from SNS. The URL looks like https://collectors.sumologic.com/receiver/v1/http/ABCDEFGHIJK.

3. Create SNS Topic. In order to send notifications from SES to SNS, we need to create an SNS topic. Create a new topic from the SNS console; we use "SES-Notifications" as the name of the topic in our example.

4. Create SNS Subscription. SNS allows you to send a notification to multiple HTTP endpoints by creating multiple subscriptions within a topic. In this step we will create one subscription for the SES-Notifications topic created in step 3 and send notifications to the HTTP endpoint generated in step 2.

5. Confirm Subscription. After a subscription is created, Amazon SNS will send a subscription confirmation message to the endpoint. This subscription confirmation notification can be found in Sumo Logic by searching for: _sourceCategory=<name of the sourceCategory provided in step 1>, for example: _sourceCategory=aws-ses. Copy the confirmation link from the logs and paste it into your browser.

6. Send SES notifications to SNS. Finally, configure SES to send notifications to SNS. Go to the SES console and select the verified senders option on the left-hand side. In the list of verified email addresses, select the email address for which you want to configure the logs, expand the notifications section, and click Edit Notifications. Select the SNS topic you created in step 3.

7. Switch message format to raw (optional). SES sends notifications to SNS in a JSON format, and any notification sent through SNS is by default wrapped into a JSON message of its own. In this case that creates a nested JSON, resulting in a nearly unreadable message. To avoid this problem of nested JSON messages, we highly recommend configuring SNS to use the raw message delivery option.

The JSON operator was used to easily parse the messages, as shown in the queries below:

1. Retrieve general information out of messages:
_sourceCategory=aws-ses | json "notificationType", "mail", "mail.destination", "mail.destination[0]", "bounce", "bounce.bounceType", "bounce.bounceSubType", "bounce.bouncedRecipients[0]" nodrop

2. Identify most frequently bounced recipients:
_sourceCategory=aws-ses AND !"notificationType":"Delivery" | json "notificationType", "mail.destination[0]" as type, destination nodrop | count by destination | sort by _count
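To make step 7 concrete, here is an abbreviated, made-up illustration of the difference (the envelope fields follow SNS's documented notification format and the SES fields mirror the ones used in the queries above, but none of this is captured from a real account). With the default format, the SES notification arrives as an escaped string inside the SNS envelope:

{ "Type": "Notification", "TopicArn": "arn:aws:sns:us-east-1:123456789012:SES-Notifications", "Message": "{\"notificationType\":\"Bounce\",\"bounce\":{\"bounceType\":\"Permanent\"}}" }

With raw message delivery enabled, the same notification arrives ready for the json operator:

{ "notificationType": "Bounce", "mail": { "destination": ["user@example.com"] }, "bounce": { "bounceType": "Permanent", "bounceSubType": "General", "bouncedRecipients": [{ "emailAddress": "user@example.com" }] } }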

AWS

October 2, 2014

Blog

We are Shellshock Bash Bug Free Here at Sumo Logic, but What about You?

Blog

Why Do DevOps Shops Care About Machine Data Analytics?

Introduction

The IT industry is always changing, and at the forefront today is the DevOps movement. The whole idea of DevOps is centered around helping businesses become more responsive to user requests and adapt faster to market conditions. Successful DevOps rollouts count on the ability to rapidly diagnose application issues that are hidden in machine data. Thus, the ability to quickly uncover patterns and anomalies in your logs is paramount. As a result, DevOps shops are fast becoming a sweet spot for us. Yes, DevOps can mean so many things – lean IT methodologies, agile software development, programmable architectures, a sharing culture and more. At the root of it all is data, especially machine data. In the midst of this relatively recent boom, DevOps teams have been searching for tools that help them to fulfill their requirements. Sumo Logic is a DevOps shop, and at DevOps Days in Austin we detailed our own DevOps scale-up. We covered everything from culture change to spreading knowledge and the issues that we faced. The result has been that our machine data analytics service is not only incredibly useful to us as a DevOps organization but provides deep insights for any organization looking to optimize its processes.

Sumo Logic At Work In A DevOps Setting

The very notion of software development has been rocked to its core by DevOps, and that has been enabled by rapid analysis in the development lifecycle. Sumo Logic makes it possible to easily integrate visibility into any software infrastructure and monitor the effects of changes throughout development, test and production environments. Data analysis can now cast a wide net and, with our custom dashboards and flexible integration, can take place anywhere you can put code. Rapid cause-and-effect, rapid error counts, and rapid analysis mean rapid software development and code updating. If user performance has been an issue, DevOps and Sumo Logic can address those experiences as well through analytic insight from relevant data sources in your environment. That makes for better software for your company and your customers. It also means happier developers, and we know that hasn't traditionally been an easy task. Sumo Logic offers an enterprise-scale cloud-based product that grows as a business grows. TuneIn, a well-known internet radio and podcast platform, utilizes Sumo Logic, and in a recent guest post their development teams shared how they used our technology to create custom searches and alerts for errors and exceptions in the logs, allowing them to reduce overall error rates by close to twenty percent. Another Sumo Logic customer, PagerDuty, shared their story of a rapid Sumo Logic DevOps deployment and reaching their ROI point in under a month. Flexibility, speed, scalability, and extensibility – these are the kinds of qualities that DevOps shops are looking for in their commercial tools. Netskope is a cloud-based security company and a DevOps shop that has integrated Sumo Logic into their cloud infrastructure. In this video, they describe the value of Sumo Logic to provide instant feedback into the performance and availability of their application. Today, DevOps teams around the world are using Sumo Logic to deliver the insights they need on demand. With Sumo Logic supporting DevOps teams throughout their application lifecycle, organizations are able to deliver on the promise of their applications and fulfill their business goals.

Blog

Secret Santa - The Math Behind The Game

It's that time of year again! Time for Secret Santa. After all, what shows off your holiday spirit better than exchanging gifts in August? As you attempt to organize your friends into a Secret Santa pool, though, I wonder if you appreciate the beautiful math going on in the background.

For those of you unfamiliar with Secret Santa, here's the basic idea. A group of friends write their names on slips of paper and drop them into a hat. Once everyone's name is in, each person blindly draws out a name from the hat. These slips of paper indicate whose Secret Santa each person is. For the sake of simplicity, let us assume that if a person draws their own name, they are their own Secret Santa. As an example, consider a group of three friends: Alice, Bob, and Carol. Alice draws Bob's name out of the hat. Bob draws Alice's name out of the hat. Carol draws her own name out of the hat. In this example, Alice will give Bob a gift; Bob will give Alice a gift; and Carol will give herself a gift.

Here comes the math. In the example previously described, I would argue that there are two "loops" of people. A loop can be defined as an ordered list of names such that each person gives a gift to the next person in the list except for the last person, who gives to the first person in the list. Below we see a graphical interpretation of the example that clearly shows two loops. Alice and Bob are one loop while Carol is her own loop. We could equally well display this information by using a list. Alice gives a gift to the first person in the list, Bob gives to the second person, and Carol gives to the third person. Thus we can describe the graph above by writing [B, A, C]. One can easily imagine a different arrangement of gift-giving resulting in a different number of loops, however. For example, if Alice drew Bob's name, Bob drew Carol's name, and Carol drew Alice's name, there would only be one loop. If Alice drew her own name, Bob his own name, and Carol her own name, there would be three loops. [B, C, A] [A, B, C] In these diagrams, each node is a person and each edge describes giving a gift. Note that each person has exactly one incoming and one outgoing edge since everybody receives and gives one gift. Below each diagram is the corresponding list representation.

The question that had been keeping me up at night recently is as follows: for a group of x people participating in Secret Santa, what is the average number of loops one can expect to see after everyone has drawn names from the hat? After I started touting my discovery of a revolutionary graph theory problem to my friends, they soon informed me that I was merely studying the fairly well-known problem of the expected number of cycles in a random permutation. Somewhat deflated but determined to research the problem for myself, I pressed on. To get a rough estimate of the answer, I first simulated the game on my computer. I ran 100 trials for x ranging from 1 to 100 and calculated the number of loops for each trial. I plotted the results and noticed that the resulting curve looked a lot like a log curve. Here's the graph with a best-fit log line on top. The jitters in the curve no doubt come from not sampling enough simulated trials. Even with that noise, though, what is truly remarkable is that the expected number of loops is nearly exactly equal to the natural log of how many people participate. These results gave me insights into the problem, but they still didn't give a completely satisfactory answer.
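If you would like to reproduce the experiment, here is a minimal sketch of the simulation in Scala (my own re-creation, not the code used for the plots above): it draws random assignments, counts their loops, and compares the average against the harmonic number that shows up in the proof below.

import scala.util.Random

object SecretSantaSim {
  // Count the loops (cycles) in an assignment where person i gives a gift to perm(i).
  def countLoops(perm: Vector[Int]): Int = {
    val seen = Array.fill(perm.length)(false)
    var loops = 0
    for (start <- perm.indices if !seen(start)) {
      loops += 1
      var i = start
      while (!seen(i)) { seen(i) = true; i = perm(i) }
    }
    loops
  }

  // Average number of loops over `trials` random drawings for a pool of x people.
  def expectedLoops(x: Int, trials: Int = 100): Double =
    (1 to trials).map(_ => countLoops(Random.shuffle((0 until x).toVector))).sum.toDouble / trials

  def main(args: Array[String]): Unit = {
    for (x <- List(1, 2, 3, 5, 10, 50, 100)) {
      val harmonic = (1 to x).map(1.0 / _).sum  // H(x) = 1 + 1/2 + ... + 1/x
      println(f"x=$x%3d  simulated=${expectedLoops(x)}%.3f  H(x)=$harmonic%.3f")
    }
  }
}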
For very small x, for example, ln(x) is a terrible approximation for the number of loops. If x=1, the expected number of loops is necessarily 1, but my log-based model says I should expect 0 loops. Furthermore, intuitively it seems like calculating the number of loops should be a discrete process rather than plugging into a continuous function. Finally, I still didn't even know for sure that my model was correct. I resolved to analytically prove the exact formula for loops.

Let f(x) represent the average number of loops expected if x people participate in Secret Santa. I decided to work off the hypothesis that f(x) = 1 + 1/2 + 1/3 + ... + 1/x (also known as the xth harmonic number). This equation works for small numbers and asymptotically approaches ln(x) for large x. Since I already know f(x) is correct for small x, the natural way to try to prove my result generally is through a proof by induction.

Base Case: Let x = 1. Then f(1) = 1: the average number of loops for a single person in Secret Santa is 1. The base case works.

Inductive Step: Assume f(x) = 1 + 1/2 + 1/3 + ... + 1/x. Prove that f(x+1) = 1 + 1/2 + 1/3 + ... + 1/x + 1/(x+1).

f(x+1) = [f(x) + 1] * 1/(x+1) + f(x) * x/(x+1)
f(x+1) = f(x) + 1/(x+1)
f(x+1) = 1 + 1/2 + 1/3 + ... + 1/x + 1/(x+1)
Q.E.D.

The key insight into this proof is the first line of the inductive step. Here's one way to think about it, using the list representation described earlier. There are two cases one needs to consider.

1) The last element that we place into the (x+1)th spot in the list has value x+1. This means the first x spots contain all the numbers from 1 to x. The odds of this happening are 1/(x+1). Crucially, we get to assume that the average number of loops from the first x elements is therefore f(x). Adding the last element adds exactly one loop: player x+1 giving himself a gift.

2) The last element that we place into the (x+1)th spot in the list does not have value x+1. This covers all the other cases (the odds of this happening are x/(x+1)). In this scenario, one of the first x people points to x+1, and x+1 points to one of the first x people. In essence the (x+1)th person is merely extending a loop already determined by the first x people. Therefore the number of loops is just f(x).

If we assume a uniform distribution of permutations (an assumption that is easily violated if players have a choice) and we weight these two cases by the probability of each of them happening, we get the total expected number of loops for f(x+1). Just like that, we have proved something theoretically beautiful that also applies to something as mundane as a gift exchange game. It all started by simulating a real-world event, looking at the data, and then switching back into analytical mode.

As I mentioned above, my "research" was by no means novel. For further reading on this topic, feel free to consult this nice summary by John Canny about random permutations or, of course, the Wikipedia article about it. Since starting to write this article, a colleague of mine has emailed me saying that someone else has even thought of this problem in the Secret Santa context and posted his insights here.

September 25, 2014

Blog

LogReduce vs Shellshock

Blog

Piercing the Fog in a Devops World

Two things still amaze me about the San Francisco Bay area two years on after moving here from the east coast – the blindingly blue, cloudless skies – and the fog. It is hard to describe how beautiful it is to drive up the spine of the San Francisco Peninsula on northbound I-280 as the fog rolls over the Santa Cruz mountains. You can see the fog pouring slowly over the peaks of the mountains, and see the highway in front of you disappear into the white, fuzzy nothingness of its inexorable progress down the valley. There is always some part of me that wonders what will happen to my car as I pass into the fog. But then I look at my GPS, know that I have driven this road hundreds of times, and assure myself that my house does still exist in there – somewhere.

Now, I can contrast that experience with learning to drive in the Blue Ridge Mountains of North Carolina. Here's the background – it's only my second time behind the wheel, and my Mom takes me on this crazy stretch of road called the Viaduct. Basically, imagine a road hanging off the side of a mountain, with a sheer mountain side on the one side, and a whole lot of nothing on the other. Now, imagine that road covered in pea-soup fog with 10 ft visibility, and a line of a half dozen cars being led by a terrified teenager with white-knuckled hands on the wheel of a minivan, hoping he won't careen off the side of the road to a premature death. Completely different experience.

So, what's the difference between those two experiences? Well, 20 years of driving, and GPS for starters. I don't worry about driving into the thick fog as I drive home because I have done it before, I know exactly where I am, how fast I am going, and I am confident that I can avoid obstacles. That knowledge, insight, and experience make all the difference between an awe-inspiring journey and a gut-wrenching nail-biter. This is really not that different from running a state of the art application. Just like I need GPS and experience to brave the fog going home, the difference between confidently innovating and delighting your customers, versus living in constant fear of the next disaster, is driven by both technology and culture. Here are some ways I would flesh out the analogy:

GPS for DevOps: An app team without visibility into their metrics and errors is a team that will never do world-class operations. Machine Data Analytics provides the means to gather the telemetry data and then provide that insight in real-time. This empowers App Ops and DevOps teams to move more quickly and innovate.

Fog Lights for Avoiding Obstacles: You can't avoid obstacles if you can't see them in time. You need the right real-time analytics to quickly detect issues and avoid them before they wreck your operations.

Experience Brings Confidence: If you have driven the road before, it always increases confidence and speed. Signature-based anomaly detection means that the time that senior engineers put in to classify previous events gives the entire team the confidence to classify and debug issues.

So, as you drive your Application Operations and DevOps teams to push your application to the cutting edge of performance, remember that driving confidently into the DevOps fog is only possible with the right kind of visibility.

Images linked from: http://searchresearch1.blogspot.com/2012/09/wednesday-search-challenge-9512-view-of.html

Blog

Changing Representation

I don't deal in veiled motives — I really like information theory. A lot. It's been an invaluable conceptual tool for almost every area of my work, and I'm going to try to convince you of its usefulness for engineering problems. Let's look at a timestamp parsing algorithm in the Sumo Logic codebase. The basic idea is that each thread gets some stream of input lines (these are from my local /var/log/appfirewall.log), and we want to parse the timestamps (the leading date and time on each line) into another numeric field:

Jul 25 08:33:02 vorta.local socketfilterfw[86] <Info>: java: Allow TCP CONNECT (in:5 out:0)
Jul 25 08:39:54 vorta.local socketfilterfw[86] <Info>: Stealth Mode connection attempt to UDP 1 time
Jul 25 08:42:40 vorta.local socketfilterfw[86] <Info>: Stealth Mode connection attempt to UDP 1 time
Jul 25 08:43:01 vorta.local socketfilterfw[86] <Info>: java: Allow TCP LISTEN (in:0 out:1)
Jul 25 08:44:17 vorta.local socketfilterfw[86] <Info>: Stealth Mode connection attempt to UDP 6 time

Being a giant distributed system, we receive logs with hundreds of different timestamp formats, which are interleaved in the input stream. CPU time on the frontend is dedicated to parsing raw log lines, so if we can derive timestamps more quickly, we can reduce our AWS costs. Let's assume that exactly one timestamp parser will match – we'll leave ambiguities for another day. How can we implement this? The naive approach is to try all of the parsers in an arbitrary sequence each time and see which one works; but all of them are computationally expensive to evaluate. Maybe we try to cache them or parallelize in some creative way? We know that caching should be optimal if the logs were all in the same format; and linear search would be optimal if they were randomly chosen. In any case, the most efficient way to do this isn't clear, so let's do some more analysis: take the sequence of correct timestamp formats and label them:

Timestamp | Format | Label
Jul 25 08:52:10 | MMM dd HH:mm:ss | Format 1
Fri Jul 25 09:06:49 PDT 2014 | EEE MMM dd HH:mm:ss ZZZ yyyy | Format 2
1406304462 | EpochSeconds | Format 3
[Jul 25 08:52:10] | MMM dd HH:mm:ss | Format 1

How can we turn this into a normal, solvable optimization problem? Well, if we try our parsers in a fixed order, the index label is actually just the number of parsing attempts before hitting the correct parser. Let's keep the parsers in the original order and add another function that reorders them, and then we'll try them in that order:

Format | Parser Label | Parser Index
MMM dd HH:mm:ss | Format 1 | 2
EEE MMM dd HH:mm:ss ZZZ yyyy | Format 2 | 1
EpochSeconds | Format 3 | 3

This is clearly better, and we can change this function on every time step. Having the optimal parser choice be a low number is always better, because we're trying to minimize the time delay of the parsing process:

(Time Delay) ∝ (# Tries)

But can we really just optimize over that? It's not at all clear to me how that translates into an algorithm. While it's a nice first-order formulation, we're going to have to change representations to connect it to anything more substantial.

Parser Index | Parser Index (Binary) | Parser Index (Unary)
2 | 10 | 11
1 | 1 | 1
3 | 11 | 111

This makes it clear that making the parser index small is equivalent to making its decimal/binary/unary representation small. In other words, we want to minimize the information content of the index sequence over our choice of parsers. In mathematical terms, the information (notated H) is just the sum of -p log p over each event, where p is the event's probability.
As an analogy, think of -log p as the length of the unary sequence (as above) and p as the probability of the sequence — we'll use the experimental probability distribution over the parser indices that actually occur. As long as the probability of taking more tries is strictly decreasing, minimizing it also minimizes the time required, because the information is strictly increasing with the number of tries it takes.

arg min {Time Delay}
= arg min {Sequence Length * Probability of sequence}
= arg min {-p(# Tries) * log(p(# Tries))}
= arg min {H(# Tries)}

That's strongly suggestive that what we want to use as the parser-order-choosing function is actually a compression function, whose entire goal in life is to minimize the information content (and therefore size) of byte sequences. Let's see if we can make use of one: in the general case, these algorithms look like Seq(Int) => Seq(Int), making the second sequence shorter.

Parser Index Sequence (Length 13): 12,43,32,64,111,33,12,43,32,64,111,33,12
Parser Index Sequence, LZW Compressed (Length 10): 12,43,32,64,111,33,256,258,260,12

Let's say that we have some past sequence — call it P — and we're trying to find the next parser-index mapping. I admit that it's not immediately clear how to do this with a compression algorithm a priori, but if we just perturb the algorithm, we can compare the options for the next functions as:

newInfo(parser label) = H(compress(P + [parser label])) - H(compress(P))

Any online compression algorithm will allow you to hold state so that you don't have to repeat computations in determining this. Then, we can just choose the parser with the least newInfo; and if the compressor will minimize information content (which I'll assume they're pretty good at), then our algorithm will minimize the required work. If you'd like a deeper explanation of compression, ITILA [1] is a good reference. With a fairly small, reasonable change of representation, we now have a well-defined, implementable, fast metric to make online decisions about parser choice. Note that this system will work regardless of the input stream — there is not a worst case except those of the compression algorithm. In this sense, this formulation is adaptive. Certainly, the reason that we can draw a precise analogy to a solved problem is because analogous situations show up in many fields, which at least include Compression/Coding, Machine Learning [2], and Controls [3]. Information theory is the core conceptual framework here, and if I've succeeded in convincing you, Bayesian Theory [4] is my favorite treatment.

References:
[1] Information Theory, Inference, and Learning Algorithms by David MacKay
[2] Prediction, Learning, and Games by Nicolo Cesa-Bianchi and Gabor Lugosi
[3] Notes on Dynamic Programming and Optimal Control by Dimitri Bertsekas
[4] Bayesian Theory by Jose Bernardo and Adrian Smith
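To make this concrete, here is a minimal Scala sketch (not Sumo Logic's actual implementation) of the adaptive idea: keep running counts of which parser matched and always try parsers in order of decreasing empirical probability, which minimizes the expected number of tries under that distribution. A real online compressor could be dropped in as the newInfo estimator described above; plain counts are just the simplest stand-in.

object AdaptiveParserOrder {
  final case class TimestampParser(name: String, parse: String => Option[Long])

  class ParserChooser(parsers: Seq[TimestampParser]) {
    // Start every parser with a count of 1 so new parsers still get tried.
    private val counts = scala.collection.mutable.Map(parsers.map(_ -> 1L): _*)

    // Try parsers most-frequently-matched first; update counts when one succeeds.
    def parse(line: String): Option[Long] = {
      val ordered = parsers.sortBy(p => -counts(p))
      ordered.view.flatMap { p =>
        val result = p.parse(line)
        if (result.isDefined) counts(p) += 1
        result
      }.headOption
    }
  }
}

A usage note: re-sorting on every line is fine for a handful of parsers; with hundreds of formats you would maintain the order incrementally (or switch to the compression-based scoring) instead.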

September 19, 2014

Blog

New MySQL App - Now GA!

Just check out the MySQL website and you'll understand why we added a new app to the Sumo Logic Application Library for MySQL: MySQL is the world's most popular open source database software, with over 100 million copies of its software downloaded or distributed throughout its history. As we spoke to companies about what insight they are missing from MySQL today, there were three common themes:

Understanding errors: simply aggregating all the logs into one place for analysis would reduce Mean Time To Investigate.
Insight into replication issues: many companies are flying blind in this area.
Query performance: understanding not simply the slowest performers but changes over time.

So, we created a set of dashboards and queries that target these areas. The application's overview dashboard is especially useful because it highlights the daily pattern of slow queries - along with a seven-day average baseline to make it really clear when something isn't right. You can jump from here into any of the other dashboards... including more detail on the slow queries, specifically. The area that surprised me most from my discussions with customers was the need for insight into replication. It's clearly a pain point for companies running MySQL - not because of MySQL per se, more because of the scale of customer environments. Issues with replication are often only uncovered once they have passed a certain pain threshold! With good log analysis, we were able to create a useful dashboard on replication. One of our beta customers said that the app was immediately valuable: "This is useful! Right away I learned that I should add an index to....". Obviously, we were thrilled with this type of feedback! We have other dashboards and useful searches in the application to give you greater insight into your MySQL deployment. The App is available now in the Sumo Logic Application Library. Go get it - and let me know what you think!

September 17, 2014

Blog

Why TuneIn Chose Sumo Logic For Machine Data Analytics

Blog

Regular Expressions - No Magic, Part 3

Blog

Debugging to Customer Hugging - Becoming an SE

"I know app developers, and that's not you!" It was a statement that I couldn't really argue with, and it was coming from one of my closest friends. It didn't matter that I was employed in a career as an app developer at one of the top software companies in the world. It didn't matter that I was performing well and the tools and applications I coded were being used by hundreds of internal developers. It didn't even matter that the friend making the conclusion had never written a single line of code in his life, nor had he any idea of my technical ability. The funny thing was, he meant it as a compliment, and so began the biggest career transition of my life. Coding and logic puzzles were always very intuitive to me, so I always enjoyed solving a variety of technical challenges. Yet, articulation, interpersonal communication and cross-team collaboration were some of my other strong suits I felt weren’t being used in my professional life. My career ambitions to be the biggest success possible combined with my desire to fulfill my potential always had me wondering if there was a role better suited for me where I would be able to leverage both diverse skills sets. Over the years I had many mentors and through all the various conversations and constructive criticism, the same trend was always prevalent. They all thought I could be more successful within a Program Manager or Technical Lead role as it would allow me to take advantage of these strengths that were being under-used in a purely development-focused role. So I made those career moves, but decided to stay within the company. After all, I didn't want to cast away the experience and knowledge I had gained during my role there, and believed it would propel me in my new roles as they were in a related field. It did; I continued to be successful, and it was certainly a step in the right direction, but needed to be taken further. I had tunnel vision and when I looked at my career, all my choices seemed a little too safe. It was time to take a risk. I was informed of the Sales Engineering role as it could be the perfect position for me to stretch my wings and use my full potential. The more I looked into it, the better it seemed. I would be a technical expert with deep knowledge of the product while at the same time selling the value of the solution to potential clients. I would be listening to the customer's needs and educating them on whether or not our product would be the best fit for them. After spending so much time on research and development teams creating software with the same handful of peers every day, the prospect of working with a mixture of clients who were the top engineering minds in the world across a plethora of different technologies was enticing. Just the ability to work with these industry leaders in a variety of different challenges allowed me to solve more technical problems than I was ever able to do as a developer working on a only a handful of projects over the course of a year. I had warmed up to the idea and it was time to commit to something new. There is one area of the world that people consistently consider the "Mecca of Tech," and that is the San Francisco / Silicon Valley Bay Area. That was settled. If I was going to go into sales, I had promised myself I would never sell a product in which I didn't have full confidence, so I needed to find a company with a product I really believed in. Enter Sumo Logic: a fully cloud based data analytics and machine learning solution. 
Curious, I created a free account and played around with the product. In a very short time, I could see the impressive power and versatile functionality, the value it could provide to nearly any tech company. Also growing at a tremendous rate, supported by the top investors and sporting a unique combination of relatively low risk and high upside, I couldn't craft an argument to deter myself from joining the company. I interviewed, and when offered, accepted the job. After committing, what awaited me next felt like a breath of fresh air. Joining a start up from a large company and transitioning into the sales field from development, I didn't know what type of culture to expect. What awaited me was a company culture where team members are genuinely and actively supportive, and it was awesome. In the first couple months I learned more about various technologies in the market than I ever knew existed before. I work with customers and drastically improve their systems, processes and consequently their careers. I did not expect to be able to contribute to our internal product development process yet I have our best engineers coming to ask which direction we should take our features. Being able to work with customers and feel like you're truly helping them while at the same time continuing to design and engineer a product on the cutting edge is the best of both worlds, and the sizable increase in compensation isn't a bad side effect either. I have no regrets in making the biggest career transition of my life, I'm happier than I've ever been and I'm not looking back. If you want to join Ozan and work as a Sales Engineer at Sumo, click here!

September 10, 2014

Blog

Regular Expressions - No Magic, Part 2

Blog

Four Ways to Collect Docker Logs in Sumo Logic

Blog

How To Add Sumo Logic To A Dockerized App

Sumo Logic is a nifty service that makes it easy to watch and analyze your apps' logs. You install a collector local to the app in question, point it at some files to watch, and your logs are sent to the cloud, where they can be sliced and diced from a nice web UI. At a high level, Sumo Logic wants to transform big data logs into operational information. But at a practical level, it's nice not to have to SSH to a production machine and tail or grep enormous files to take your app's pulse. Not to mention the idea is totally in line with one of the Twelve Factors: treating logs as event streams enables better understanding of an app's behavior over time. I've had success using Sumo Logic at OpenX, so I wanted to give their free trial a shot for a personal project. The only limitations are 500MB of data per day and 7 days of retention. I was surprised not to find anything on the web for installing Sumo Logic alongside a Dockerized app, and I had a couple of Docker-based candidates. So without further ado, here's how to add Sumo Logic to a Dockerized app:

1. Sign up for Sumo Logic Free. Head over to sumologic.com/signup to sign up. The only catch here is that you'll need a company email address. For the project I'm going to use Sumo Logic for, I own and manage my own domain, so it wasn't too much trouble to create an email address using my registrar's mail service. Since I host the domain separately, I did have to add an MX record to my zone file to point to the registrar's mail server. For example, with DigitalOcean.

2. Download a Collector. Once you confirm your email address and log in, you'll be stepped through a process for downloading and installing a collector. I chose the Installed Collector and downloaded sumocollector_19.91-2_amd64.deb, because my Docker image is based on Ubuntu. After downloading the collector, the setup wizard proceeds to a screen that spins until it detects a newly installed collector. I didn't yet know how I was going to install it, and I got logged out of Sumo Logic anyway due to inactivity, so I abandoned the wizard at that point. The Sumo Logic UI changed itself as soon as it detected that my first collector had been installed. As I plan to install the Sumo Logic collector during the docker build process, I uploaded the .deb file to a Dropbox and grabbed the public link to use later.

3. Create Access Keys. When a collector client is installed it has to have some way of authenticating to the Sumo Logic server. The docs for creating a sumo.conf file (we'll get there soon) offer two choices: (1) provide your Sumo Logic email and password, or (2) provide access keys generated from the UI. The latter is recommended if only to avoid storing a username/password in plaintext. Keys can be generated from Manage → Collectors → Access Keys → Create.

4. Augment your Docker Container. Here's the Docker-specific part of installing Sumo Logic. We'll add some lines to our app's Dockerfile and author two files that are ADDed to the container during a docker build. I assume working knowledge of Docker, but here is the list of Dockerfile commands for good measure.

4.1 Create sumo.conf. First create a sumo.conf file like the following:

name={collector_name}
accessid={your_access_id}
accesskey={your_access_key}

where name is an arbitrary name for this collector, and accessid and accesskey are those generated in step 3. There are many more conf options specified here but the important ones, namely sources, can actually be configured through the UI later on.
By convention I put Docker-specific files into .docker/{resource}, so this one goes to .docker/sumo/sumo.conf. It'll be referenced in our Dockerfile shortly.

4.2 Modify your Dockerfile. Add a block like the following to your Dockerfile (assumed to live in the root of your app's code), preferably before your actual app is added:

# install sumologic
RUN apt-get -qq update
RUN apt-get install -y wget
RUN wget https://www.dropbox.com/path/to/sumocollector_19.91-2_amd64.deb
RUN dpkg -i sumocollector_19.91-2_amd64.deb
RUN rm sumocollector_19.91-2_amd64.deb
ADD .docker/sumo/sumo.conf /etc/sumo.conf
ADD .docker/sumo/start_sumo /etc/my_init.d/start_sumo

Let's break this down:

RUN apt-get -qq update
Update sources. This may not be necessary, but I like to put this before each dependency installed by my Dockerfile to avoid issues with image caching.

RUN apt-get install -y wget
RUN wget https://www.dropbox.com/path/to/sumocollector_19.91-2_amd64.deb
We'll use wget to grab the collector file we uploaded in step 2. You may opt to ADD the file locally, but this option avoids having to check the resource into your app's source code, while housing it in a consistent location. Better practice would be to store it in some kind of artifact repository and version it.

RUN dpkg -i sumocollector_19.91-2_amd64.deb
RUN rm sumocollector_19.91-2_amd64.deb
Install the debian package and clean up.

ADD .docker/sumo/sumo.conf /etc/sumo.conf
Copy the newly created sumo.conf file to the place where the collector expects to find it.

Before we get to the last line, let's pause. If you were able to catch the output from installing the collector, you saw something like:

Preparing to unpack sumocollector_19.91-2_amd64.deb ...
Unpacking sumocollector (1:19.91-2) ...
Setting up sumocollector (1:19.91-2) ...
configuring collector....
configuring collector to run as root
Detected Ubuntu: Installing the SumoLogic Collector daemon using init.d..
Adding system startup for /etc/init.d/collector ...
/etc/rc0.d/K20collector -> ../init.d/collector
/etc/rc1.d/K20collector -> ../init.d/collector
/etc/rc6.d/K20collector -> ../init.d/collector
/etc/rc2.d/S20collector -> ../init.d/collector
/etc/rc3.d/S20collector -> ../init.d/collector
/etc/rc4.d/S20collector -> ../init.d/collector
/etc/rc5.d/S20collector -> ../init.d/collector
Collector has been successfully installed. Please provide account credential in /etc/sumo.conf and start it up via service or init.d script!

It was only after sifting through my docker output that I saw this and learned about the existence of a sumo.conf file. Before that, nothing was happening in the Sumo Logic UI because no collector had been correctly installed and started, even when I started the container. Anyway, we got /etc/sumo.conf out of the way, so what about starting it up "via service or init.d script"? My solution was to include a simple bash script that starts the collector service on startup. But my Dockerfile extends phusion/baseimage-docker, which uses a custom init system. So the last Dockerfile command,

ADD .docker/sumo/start_sumo /etc/my_init.d/start_sumo

adds a file called start_sumo like:

#!/bin/bash
service collector start

into /etc/my_init.d. Make sure it's executable with chmod +x. Like the conf file, this is saved into .docker/sumo/start_sumo of the app code repository. I am very open to more elegant ways for getting the Sumo Logic collector to start. I'd also like to see how non-baseimage users deal with init requirements.
I would have done this as a runit script as recommended by the baseimage-docker README, but the collector script appears to automatically daemonize itself, which breaks runit. 5. Build and Deploy! I ran docker build and docker run as usual, and voilà!, the newly installed collector popped up in Manage → Collectors. 6. Configure Sources Before we start seeing logs, we have to tell Sumo what a log file is. I clicked Manage → Collectors → Add → Add Source and added a Local File entry that had the absolute path to a log file I was interested in. One of the Sumo Logic videos I watched noted that specifying /path/to/log/dir/** will pick up all log files in a directory. I waited a couple of minutes, and log messages started coming into the UI. Sweet! Keep in mind that multiple sources can be added for a single collector. So far, I’ve learned that I can get a bird’s eye view of all my logs from Manage → Status, and look at actual log messages from Search. I haven’t spent time really getting to know the various queries yet, but if they’re worth writing about, expect another post. Possible Improvement: The above example installs Sumo Logic inside the app container. An alternate approach might have Sumo installed on the host (or in its own Docker container), reading log files from a shared data volume. This has the benefits of (1) requiring only a single Sumo Logic install for potentially more than one app container, and (2) architectural separation of app from log consumption. That’s it! This turned out to be surprisingly simple. Kudos to Sumo Logic for offering an easy to use service + free tier that’s totally feasible for smallish apps. This is a guest post from Caleb Sotelo who is a software engineer at OpenX and has been reprinted with his permission. You can view the original here.

Blog

Machine Data Analytics, Down Under

Not often have I spent two weeks in August in a “winter” climate, but it was a great opportunity to spend some time with our new team in Australia, visit with prospects, customers and partners, and attend a couple of Amazon Web Service Summits to boot. Here are some straight-off-the-plane observations. A Local “Data Center” Presence Matters: We now have production instances in Sydney, Dublin and the United States. In conversations with Australian enterprises and government entities, the fact that we have both a local team and a local production instance went extremely far when determining whether we were a good match for their needs. This was true whether their use case centered around supporting their security initiatives or enabling their DevOps teams to release applications faster to market. You can now select where your data resides when you sign up for Sumo Logic Free. Australia is Ready For the Cloud: From the smallest startup to extremely large mining companies, everyone was interested in how we could support their cloud initiatives. The AWS Summits were packed and the conversations we had revolved not just around machine data analytics but what we could do to support their evolving infrastructure strategy. The fact that we have apps for Amazon S3, Cloudfront, CloudTrail and ELB made the conversations even more productive, and we’ve seen significant interest in our special trial for AWS customers. We’re A Natural Fit for Managed Service Providers: As a multi-tenant service born in the Cloud, we have a slew of advantages for MSP and MSSPs looking to embed proactive analytics into their service offering, as our work with The Herjavec Group and Medidata shows. We’ve had success with multiple partners in the US and the many discussions we had in Australia indicate that there’s a very interesting partner opportunity there as well. Analytics and Time to Insights: In my conversations with dozens of people at the two summits and in 1-1 meetings, two trends immediately stand out. While people remain extremely interested in how they can take advantage of real-time dashboards and alerts, one of their bigger concerns typically revolved around how quickly they could get to that point. “I don’t have time to do a lot of infrastructure management” was the common refrain and we certainly empathize with that thought. The second is just a reflection on how we sometimes take for granted our pattern recognition technology, aka, LogReduce. Having shown this to quite a few people at the booth, the reaction on their faces never gets old especially after they see the order of magnitude by which we reduce the time taken to find something interesting in their machine data. At the end of the day, this is a people business. We have a great team in Australia and look forward to publicizing their many successes over the coming quarters.

AWS

August 20, 2014

Blog

Regular Expressions - No Magic

Blog

Building Stable Products Through Testing

Software systems today contain hundreds of thousands to millions of lines of code, written by anywhere from a few developers at a startup to thousands at today's software giants. Working with large amounts of code with many developers results in overlapping usage and modification of the APIs being used by the developers. With this comes the danger of a small change breaking large amounts of code. This raises the question of how we as developers working on projects of this scale can ensure that the code that we write not only works within the context in which we are working, but also doesn't cause bugs in other parts of the system. (Getting it to compile always seems to be the easy part!)

Here at Sumo Logic we currently have over 60 developers working on a project with over 700k lines of code[1] split up into over 150 different modules[2]. This means that we have to be mindful of the effects that our changes introduce to a module and the effect that the changed module has on other modules. This gives us two options. First, we can try really, really hard to find and closely examine all of the places in the code base that our changes affect to make sure that we didn't break anything. Second, we can write tests for every new functionality of our code. We prefer the second option, not only because option one sounds painful and error prone, but because option two uses developers' time more efficiently. Why should we have to waste our time checking all the edge cases by hand every time that we make a change to the project?

For this reason, at Sumo Logic we work using the methods of test-driven development. (For more on this see test driven development.) First we plan out what the end functionality of our changes should be for the system and write both unit and integration tests for our new functionality. Unit tests offer specific tests of edge cases and core functionality of the code change, while integration tests exercise end-to-end functionality. Since we write the tests before we write the new code, when we first run the updated tests, we will fail them. But this is actually what we want! Now that we have intentionally written tests that our code will fail without the addition of our new functionality, we know that if we can manage to pass all of our new tests as well as all of the pre-existing tests written by other developers, then we have succeeded with our code change. The benefits of test-driven development are two-fold: we ensure that our new functionality is working correctly and we maintain the previous functionality.

Another instance in which test-driven development excels is in refactoring code. When it becomes necessary to refactor our code, we can refactor at ease knowing that the large suites of tests that we wrote during the initial development can tell us if we succeeded in our refactoring. Test-driven development calls this red-green-refactor, where red means failing tests and green means passing tests. Rejoice: with well-written tests we can write new code and refactor our old code with confidence that our finished work will continue to function without introducing bugs into the system.

Despite all of this testing, it is still possible for bugs to slip through the cracks. To combat these bugs, here at Sumo Logic we have multiple testing environments for our product before it is deemed to be of a high enough standard to be released to our customers. We have four different deployments. Three of them are for testing and one is for production.
(For more on this see deployment infrastructure and practices.) Our QA team performs both manual and additional automated testing on these deployments, including web browser automation tests. Since we need a sizable amount of data to test at scale, we route log files from our production deployment into our pre-production environments. This makes us one of our own biggest customers! The idea of this process is that by the time a build passes all of the unit/integration tests and makes it through testing on our three non-production environments, all of the bugs will be squashed, allowing us to provide our customers with a stable, high-performing product.

[1] ( find ./ -name '*.scala' -print0 | xargs -0 cat ) | wc -l
[2] ls -l | wc -l
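To make the red-green-refactor loop above concrete, here is a tiny, self-contained illustration (not Sumo Logic code) using ScalaTest; the import path shown is the ScalaTest 3.x one. The test is written first, against a class that does not exist yet or is stubbed out, so the suite starts red, and implementing the class turns it green.

import org.scalatest.funsuite.AnyFunSuite

class RateLimiterSpec extends AnyFunSuite {
  test("allows at most `limit` calls") {
    val limiter = new RateLimiter(limit = 2)
    assert(limiter.allow())
    assert(limiter.allow())
    assert(!limiter.allow()) // the third call must be rejected
  }
}

// The eventual "green" implementation, kept next to the test only for the sake of the example.
class RateLimiter(limit: Int) {
  private var used = 0
  def allow(): Boolean = synchronized {
    if (used < limit) { used += 1; true } else false
  }
}

Once the suite is green, the implementation can be refactored freely; the test pins down the behavior that must not change.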

August 13, 2014

Blog

The Internet of Things...and by "Things" we mean Cats! [Infographic]

Blog

Our Help? It’s in the Cloud.

I like to fashion myself as a lower-level Cloud evangelist. I’m amazed at the opportunities the Cloud has afforded me both professionally and personally in the past four or five years. I tend to run head-first into any Cloud solution that promises to make my life better, and I’m constantly advocating Cloud adoption to my friends and family.The consumer-level Cloud services that have developed over the past few years have changed how I relate to technology. Just like everyone else, I struggle with balancing work, mom duties, volunteer activities, and so on. Being able to keep my data handy simplifies my life–having records in the Cloud has saved me in several situations where I could just call up a document on my iPhone or iPad. No matter which Cloud app I’m using, I’m in the loop if I’m sitting at work or watching my kids at gymnastics (so long as I remember to charge my phone–there’s that darn single point of failure). I respect Sumo for being a Cloud company that behaves like a Cloud company. We might have one physical server rattling around in an otherwise-empty server room, but I don’t know the name of it–I don’t ever need to access it. We run in the Cloud, we scale in the Cloud, we live in the Cloud. To me, that gives Sumo Logic an uncommon brand of Cloud legitimacy. So what does all this have to do with Sumo Logic’s Help system? I started making noise about moving our online Help into the Cloud because I wanted the ability to dynamically update Help. At the time, my lovingly written files were somewhat brutally checked into the code, meaning that my schedule was tied to the engineering upgrade schedule. That worked for a while, but as we trend towards continuous delivery of our product, it wasn’t scaling. I knew there had to be a better way, so I looked to the Cloud. My sense of urgency wasn’t shared by everyone, so I made a fool of myself at a Hack-a-Thon, attempting to make it happen. It was an epic failure, but a great learning experience for me. Knowing that I could spin up an instance of whatever kind of server my little heart desired was a game changer–what was once something that required capital expense (buying a Linux box or a Windows Server) was now available with a few clicks at minimal cost. Within a month or so, I had convinced my manager of the legitimacy of my project. Eventually our Architect, Stefan Zier, took pity on me. He set up an S3 Bucket in AWS (Sumo runs in AWS, so this is a natural choice), then configured our test and production deployments to point to the URL I chose for our Help system. The last bit of engineering magic was leveraging an internal engineering tool that I use to update the URL for one or more deployments. Within a few days it worked. I now can push updates to Help from my own little S3 Bucket whenever I like. That is some awesome agility. To those who are not tech writers, this may seem unremarkable, but I don’t know any other organizations with Cloud-based tech pubs delivery systems. I couldn’t find any ideas online when I was trying to do this myself. No blog posts, no tools. It was uncharted. This challenge really lit a fire under me–I couldn’t figure out why nobody seemed to be delivering Help from the Cloud. The Cloud also improves the quality of my work, and grants me new options. Using an S3 Bucket means that I can potentially set up different Help systems for features that are only accessed by a subset of customers. I can take down anything that contains errors–which very, very rarely happens (yeah, right). 
I can take feedback from our Support team, Project Managers, Customer Success Team, Sales Engineers, and even from guys sitting around me who mumble about things that are missing when they try to write complicated queries. (Yes, our engineers learn about Sumo Logic operators using the very same Help system as our customers.) Here’s the best part. As our team of tech writers grows (it’s doubled to two in 2014!), I don’t need an IT guy to configure anything; my solution scales gracefully. The authoring tool we use, Madcap Flare, outputs our Help in HTML 5, meaning that the writers don’t need any IT or admin support converting files, nor hosting them in a specific way. (Incidentally, when you check out our Help, everything you see was customized with the tools in Flare, using, of all things, a mobile help template.) Flare has earned a special place in my heart because my deliverables were ready for Cloud deployment; no changes in my process were needed. There are no wasted resources on tasks that the writers are perfectly capable of performing, from generating output to posting new files. That’s the great part about the Cloud. I can do myself what it would take an IT guy to handle using any on-premise server solution. Funny, that sounds just like Sumo Logic’s product: Instead of wasting time racking servers, people can do their job right out of the gate. That’s value added. That’s the Cloud.

June 16, 2014

Blog

Sequoia Joins The Team and Eight Lessons of a First Time CEO

I originally envisioned this blog as a way to discuss our recent $30 million funding, led by our latest investor, Sequoia Capital, with full participation from Greylock, Sutter Hill and Accel. I've been incredibly impressed with the whole Sequoia team and look forward to our partnership with Pat. Yet despite 300 enterprise customers (9 of the Global 500), lots of recent success against our large competitor, Splunk, and other interesting momentum metrics, I'd rather talk about the ride and lessons learned from my first two years as a CEO. It’s Lonely. Accept It and Move On. My mentor, former boss and CEO of my previous company told me this, years ago. But at the time, it applied to him and not me (in hindsight I realize I did not offer much help). But, like being a first time parent, you really can’t fathom it until you face it yourself. I’m sure there’s some psychology about how certain people deal with it and others don’t. I’m constantly thinking about the implications of tactical and strategic decisions. I’ve learned that if you’re too comfortable, you’re not pushing hard enough. The best advice I can give is to find a Board member you can trust, and use him or her as a sounding board early and often. Trust Your Gut. There have been many occasions when I have been given good advice on key decisions. One problem with good advice, is you can get too much of it, and it isn’t always aligned. The best leader I ever met, and another long-time mentor, would always ask, ‘what is your gut telling you?’ More often than not, your gut is right. The nice thing about following your instincts, the only person to blame if it goes awry is yourself. Act Like It’s Your Money. I grew up in Maine, where $100,000 can still buy a pretty nice house. When I first moved to California from Boston it took me some time to get accustomed to the labor costs and other expenses. The mentality in most of the top startups in Silicon Valley is “don’t worry, you can always raise OPM (other people's money)”. Though I understand the need to invest ahead of the curve, especially in a SaaS-based business like ours, I also believe too much funding can cause a lack of discipline. People just expect they can hire or spend their way around a problem. Don’t Be Arrogant. Just saying it almost disqualifies you. Trust me, I have come across all kinds. Backed by arguably the four best Venture Capital firms in the business, I have had plenty of opportunities to meet other CEOs, founders and execs. Some are incredible people and leaders. Some, however, act like they and their company are way too valuable and important to treat everyone with respect. Life is too short not to believe in karma. Listen Carefully. If a sales rep is having trouble closing deals, put yourself in his shoes and figure out what help he needs. If the engineering team is not meeting objectives fast enough, find out if they really understand the customer requirements. Often the smallest tweaks in communication or expectations can drastically change the results. Lastly, listen to your customer(s). It is very easy to write off a loss or a stalled relationship to some process breakdown, but customers buy from people they trust. Customers trust people who listen. It's a People Business. Software will eat the world, but humans still make the decisions. We're building a culture that values openness and rapid decision-making while aligning our corporate mission with individual responsibilities. 
This balance is a constant work in process and I understand that getting this balance right is a key to successfully scaling the Sumo Logic business. Find the Right VCs at the Right Time. I can’t take any credit for getting Greylock or Sutter Hill to invest in our A and B rounds, respectively. But I do have them to thank for hiring me and helping me. We partnered with Accel in November of 2012 and now Sequoia has led this recent investment. Do not underestimate the value of getting high quality VCs. Their access to customers, top talent, and strategic partners is invaluable. Not to mention the guidance they give in Board meetings and at times of key decisions. The only advice I can give here is: 1) know your business cold, 2) execute your plan and 3) raise money when you have wind at your back. Venture Capitalists make a living on picking the right markets with the right teams with the right momentum. Markets can swing (check Splunk’s stock price in last 3 months) and momentum can swing (watch the Bruins in the Stanley Cup – never mind they lost to the Canadiens). Believe. It may be cliché, but you have to believe in the mission. If you haven’t watched Twelve O’Clock High, watch it. It’s not politically correct, but it speaks volumes about how to lead and manage. You may choose the wrong strategy or tactics at times. But you’ll never know if you don’t have conviction about the goals. OK, so I’m no Jack Welch or Steve Jobs, and many of these lessons are common sense. But no matter how much you think you know, there is way more that you don’t. Hopefully one person will be a little better informed or prepared by my own experience.

May 20, 2014

Blog

Building Scala at Scale

The Scala compiler can be brutally slow. The community has a love-hate relationship with it. Love means “Yes, scalac is slow”. Hate means, “Scala — 1★ Would Not Program Again”. It’s hard to go a week without reading another rant about the Scala compiler. Moreover, one of the Typesafe co-founders left the company shouting, “The Scala compiler will never be fast” (17:53). Even Scala inventor Martin Odersky provides a list of fundamental reasons why compiling is slow. At Sumo Logic, we happily build over 600K lines of Scala code[1] with Maven and find this setup productive. Based on the public perception of the Scala build process, this seems about as plausible as a UFO landing on the roof of our building. Here’s how we do it: Many modules At Sumo Logic, we have more than 120 modules. Each has its own source directory, unit tests, and dependencies. As a result, each of them is reasonably small and well defined. Usually, you just need to modify one or a few of them, which means that you can just build them and fetch binaries of dependencies[2]. Using this method is a huge win in build time and also makes the IDE and test suites run more quickly. Fewer elements are always easier to handle. We keep all modules in single GitHub repository. Though we have experimented with a separate repository for each project, keeping track of version dependencies was too complicated. Parallelism on module level Although Moore’s law is still at work, single cores have not become much faster since 2004. The Scala compiler has some parallelism, but it’s nowhere close to saturating eight cores[3] in our use case. Enabling parallel builds in Maven 3 helped a lot. At first, it caused a lot of non-deterministic failures, but it turns out that always forking the Java compiler fixed most of the problems[4]. That allows us to fully saturate all of the CPU cores during most of the build time. Even better, it allows us to overcome other bottlenecks (e.g., fetching dependencies). Incremental builds with Zinc Zinc brings features from sbt to other build systems, providing two major gains: It keeps warmed compilers running, which avoids the startup JVM “warm-up tax”. It allows incremental compilation. Usually we don’t compile from a clean state, we just make a simple change to get recompiled. This is a huge gain when doing Test Driven Development. For a long time we were unable to use Zinc with parallel modules builds. As it turns out, we needed to tell Zinc to fork Java compilers. Luckily, an awesome Typesafe developer, Peter Vlugter, implemented that option and fixed our issue. Time statistics The following example shows the typical development workflow of building one module. For this benchmark, we picked the largest one by lines of code (53K LOC). This next example shows building all modules (674K LOC), the most time consuming task. Usually we can skip test compilation, bringing build time down to 12 minutes.[5] Wrapper utility Still, some engineers were not happy, because: Often they build and test more often than needed. Computers get slow if you saturate the CPU (e.g., video conference becomes sluggish). Passing the correct arguments to Maven is hard. Educating developers might have helped, but we picked the easier route. We created a simple bash wrapper that: Runs every Maven process with lower CPU priority (nice -n 15); so the build process doesn’t slow the browser, IDE, or a video conference. Makes sure that Zinc is running. If not, it starts it. 
Allows you to compile all the dependencies (downstream) easily for any module. Allows you to compile all the things that depend on a module (upstream). Makes it easy to select the kind of tests to run. Though it is a simple wrapper, it improves usability a lot. For example, if you fixed a library bug for a module called “stream-pipeline” and would like to build and run unit tests for all modules that depend on it, just use this command: bin/quick-assemble.sh -tu stream-pipeline Tricks we learned along the way Print the longest chain of module dependencies by build time (a sketch of this calculation follows the footnotes below). That helps identify the “unnecessary or poorly designed dependencies,” which can be removed. This makes the dependency graph much shallower, which means more parallelism. Run a build in a loop until it fails. As simple as in bash: while bin/quick-assemble.sh; do :; done. Then leave it overnight. This is very helpful for debugging non-deterministic bugs, which are common in a multithreaded environment. Analyze the bottlenecks of build time. CPU? IO? Are all cores used? Network speed? The limiting factor can vary during different phases. iStat Menus proved to be really helpful. Read the Maven documentation. Many things in Maven are not intuitive. The “trial and error” approach can be very tedious for this build system. Reading the documentation carefully is a huge time saver. Summary Building at scale is usually hard. Scala makes it harder because of its relatively slow compiler. You will hit these issues much earlier than in other languages. However, the problems are solvable through general development best practices, especially: modular code, parallel execution by default, and investing time in tooling. Then it just rocks! [1] ( find ./ -name ‘*.scala’ -print0 | xargs -0 cat ) | wc -l [2] All modules are built and tested by Jenkins and the binaries are stored in Nexus. [3] The author’s 15-inch MacBook Pro from late 2013 has eight cores. [4] We have little Java code. Theoretically, the Java 1.6 compiler is thread-safe, but it has some concurrency bugs. We decided not to dig into that as forking seems to be an easier solution. [5] Benchmark methodology: Hardware: MacBook Pro, 15-inch, Late 2013, 2.3 GHz Intel i7, 16 GB RAM. All tests were run three times and the median time was selected. Non-incremental Maven goal: clean test-compile. Incremental Maven goal: test-compile. A random change was introduced to trigger some recompilation.
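The dependency-chain trick mentioned above boils down to a critical-path walk over the module graph. Below is a minimal Scala sketch of that calculation, assuming per-module build times and direct dependencies are available as plain maps; the module names and numbers are purely illustrative and not taken from a real build.

import scala.collection.mutable

object CriticalPath {
  // Illustrative inputs: per-module build times (seconds) and direct dependencies.
  val buildTime: Map[String, Double] =
    Map("core" -> 120.0, "stream-pipeline" -> 95.0, "search" -> 210.0)
  val deps: Map[String, List[String]] =
    Map("core" -> Nil, "stream-pipeline" -> List("core"), "search" -> List("stream-pipeline"))

  private val memo = mutable.Map.empty[String, (Double, List[String])]

  // Slowest chain of modules ending at `module`, including its own build time.
  def chain(module: String): (Double, List[String]) =
    memo.get(module) match {
      case Some(cached) => cached
      case None =>
        val below = deps.getOrElse(module, Nil).map(chain)
        val (depTime, depPath) =
          if (below.isEmpty) (0.0, List.empty[String]) else below.maxBy(_._1)
        val result = (depTime + buildTime.getOrElse(module, 0.0), module :: depPath)
        memo.update(module, result)
        result
    }

  def main(args: Array[String]): Unit = {
    val (seconds, path) = buildTime.keys.map(chain).maxBy(_._1)
    val prettyPath = path.reverse.mkString(" -> ")
    println(f"Longest chain: $prettyPath ($seconds%.0f s)")
  }
}

Sorting modules by this number makes it obvious which dependency edges are worth breaking first.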

May 13, 2014

Blog

Why You Should Never Catch Throwable In Scala

Scala is a subtle beast and you should heed its warnings. Most Scala and Java programmers have heard that catching Throwable, the superclass of all exceptions and errors, is evil and patterns like the following should be avoided:

try {
  aDangerousFunction()
} catch {
  case ex: Throwable => println(ex)
  // Or even worse
  case ex => println(ex)
}

This pattern is absurdly dangerous. Here’s why: The Problem In Java, catching all throwables can do nasty things like preventing the JVM from properly responding to a StackOverflowError or an OutOfMemoryError. Certainly not ideal, but not catastrophic. In Scala, it is much more heinous. Scala uses exceptions to return from nested closures. Consider code like the following:

def inlineMeAgain[T](f: => T): T = {
  f
}

def inlineme(f: => Int): Int = {
  try {
    inlineMeAgain {
      return f
    }
  } catch {
    case ex: Throwable => 5
  }
}

def doStuff {
  val res = inlineme {
    10
  }
  println("we got: " + res + ". should be 10")
}
doStuff

We use a return statement from within two nested closures. This seems like it may be a bit of an obscure edge case, but it’s certainly possible in practice. In order to handle this, the Scala compiler will throw a NonLocalReturnControl exception. Unfortunately, it is a Throwable, and you’ll catch it. Whoops. That code will print 5, not 10. Certainly not what was expected. The Solution While we can say “don’t catch Throwables” until we’re blue in the face, sometimes you really want to make sure that absolutely no exceptions get through. You could include the other exception types everywhere you want to catch Throwable, but that’s cumbersome and error prone. Fortunately, this is actually quite easy to handle, thanks to Scala’s focus on implementing much of the language without magic: the “catch” part of a try-catch is just some sugar over a partial function, so we can define our own partial functions!

import scala.util.control.ControlThrowable

def safely[T](handler: PartialFunction[Throwable, T]): PartialFunction[Throwable, T] = {
  case ex: ControlThrowable => throw ex
  // case ex: OutOfMemoryError (assorted other nasty exceptions you don't want to catch)

  // If it's an exception they handle, pass it on
  case ex: Throwable if handler.isDefinedAt(ex) => handler(ex)

  // If they didn't handle it, rethrow. This line isn't necessary, just for clarity
  case ex: Throwable => throw ex
}

// Usage:
/*
def doSomething: Unit = {
  try {
    somethingDangerous
  } catch safely {
    case ex: Throwable => println("AHHH")
  }
}
*/

This defines a function “safely”, which takes a partial function and yields another partial function. Now, by simply using catch safely { /* catch block */ } we’re free to catch Throwables (or anything else) safely and restrict the list of all the evil exception types to one place in the code. Glorious.
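As a side note, for the common case the Scala standard library (since version 2.10) also ships scala.util.control.NonFatal, an extractor that deliberately does not match ControlThrowable or fatal JVM errors such as OutOfMemoryError and StackOverflowError. A minimal sketch of using it follows; the object and method names are just for illustration:

import scala.util.control.NonFatal

object SafeCatch {
  def dangerousCall(): Int = sys.error("boom") // stand-in for real work

  def main(args: Array[String]): Unit =
    try {
      println(dangerousCall())
    } catch {
      // NonFatal lets non-local returns and fatal JVM errors propagate,
      // while still catching ordinary exceptions.
      case NonFatal(ex) => println("recovered from: " + ex)
    }
}

A hand-rolled helper like safely above is still useful when you want to customize exactly which throwables are allowed through.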

May 5, 2014

Blog

Sumo Logic, ServiceNow and the Future of Event Management

Today’s reality is that companies have to deal with disjointed systems when it comes to detecting, investigating and remediating issues in their infrastructure. Compound that with the exponential growth of machine data and you have a recipe for frustrated IT and security teams who are tasked with uncovering insights from this data exhaust and then remediating issues as appropriate. Customer dissatisfaction, at-risk SLAs and even revenue misses are invariable consequences of this fragmented approach. With our announcement today of a certified integration with ServiceNow, companies now have a closed loop system that makes it much easier for organizations to uncover known and unknown events in Sumo Logic and then immediately create alerts and incidents in ServiceNow. The bi-directional integration supports the ability for companies to streamline the entire change management process, capture current and future knowledge, and lay the groundwork for integrated event management capabilities. This integration takes advantage of all the Sumo Logic analytics capabilities, including LogReduce and Anomaly Detection, to identify what’s happening in your enterprise, even if you never had rules to detect issues in the first place. The cloud-to-cloud integration of ServiceNow and Sumo Logic also boosts productivity by eliminating the whole concept of downloading, installing and managing software. Furthermore, IT organizations also have the ability to elastically scale their data analytics needs to meet the service management requirements of the modern enterprise. Let us know if you’re interested in seeing our integration with ServiceNow. And while you’re at it, feel free to register for Sumo Logic Free. It’s a zero price way to understand how our machine data analytics service works. PS – check out our new web page which provides highlights of recent capabilities and features that we’ve launched.

April 29, 2014

Blog

Mitigating the Heartbleed Vulnerability

By now, you have likely read about the security vulnerability known as the Heartbleed bug. It is a vulnerability in the widely used OpenSSL library that allows an attacker to steal information that would normally be protected by the SSL/TLS encryption used to secure traffic on the Internet (including traffic to Sumo Logic). How did we eliminate the threat? When we were notified about the issue, we quickly discovered that our own customer-facing SSL implementation was vulnerable to the attack — thankfully, the advisory was accompanied by some scripts and tools to test for the vulnerability. Mitigation happened in four steps: Fix vulnerable servers. As a first step, we needed to make sure to close the information leak. In some cases, that meant working with third-party vendors (most notably, Amazon Web Services, who runs our Elastic Load Balancers) to get all servers patched. This step was concluded once we confirmed that all of the load balancers in the DNS rotation were no longer vulnerable. Replace SSL key pairs. Even though we had no reason to believe there was any actual attack against our SSL private keys, it was clear all of them had to be replaced as a precaution. Once we had them deployed to all the servers and load balancers, we revoked all previous certificates with our CA, GeoTrust. All major browsers perform revocation checks against OCSP responders or CRLs. Notify customers. Shortly after we resolved the issues, we sent an advisory to all of our customers, recommending a password change. Again, this was a purely precautionary measure, as there is no evidence of any passwords leaking. Update Collectors. We have added a new feature to our Collectors that will automatically replace the Collector’s credentials. Once we complete testing, we will recommend that all customers upgrade to the new version. We also enabled support for certificate revocation checking, which wasn’t enabled previously. How has this affected our customers? Thus far, we have not seen any signs of unusual activity, nor have we seen any customers lose data due to this bug. Unfortunately, we’ve had to inconvenience our customers with requests to change passwords and update Collectors, but given the gravity of the vulnerability, we felt the inconvenience was justified. Internal impact Our intranet is hosted on AWS and our users use OpenVPN to connect to it. The version of OpenVPN we had been running needed to be updated to a version that was released today. Other servers behind OpenVPN also needed updating. Sumo Logic uses on the order of 60-70 SaaS services internally. Shortly after we resolved our customer-facing issues, we performed an assessment of all those SaaS services. We used the scripts that test for the vulnerability, combined with DNS lookups. If a service looked like it was hosted with a provider/service that was known to have been vulnerable (such as AWS ELB), we added it to our list. We are now working our way through the list and changing passwords on all affected applications, starting with the most critical ones. Unfortunately, manually changing passwords in all of the affected applications takes time and places a burden on our internal IT users. We plan to have completed this process by the end of the week. Interesting Days Overall, the past few days were pretty interesting on the internet. Many servers (as many as 66% of SSL servers on the net) are running OpenSSL, and most were affected. Big sites, including Yahoo Mail and many others, were affected. 
The pace of exploitation and spread of the issue were both concerning. Thankfully, Codenomicon, the company that discovered this vulnerability, did an amazing job handling and disclosing it in a pragmatic and responsible fashion. This allowed everybody to fix the issue rapidly and minimize impact on their users.

Blog

AWS Elastic Load Balancing - New Visibility Into Your AWS Load Balancers

Blog

Machine Data at Strata: “BigData++”

A few weeks ago I had the pleasure of hosting the machine data track of talks at Strata Santa Clara. Like “big data”, the phrase “machine data” is associated with multiple (sometimes conflicting) definitions; two prominent ones come from Curt Monash and Daniel Abadi. The focus of the machine data track is on data which is generated and/or collected automatically by machines. This includes software logs and sensor measurements from systems as varied as mobile phones, airplane engines, and data centers. The concept is closely related to the “internet of things”, which refers to the trend of increasing connectivity and instrumentation in existing devices, like home thermostats. More data, more problems This data can be useful for the early detection of operational problems or the discovery of opportunities for improved efficiency. However, the decoupling of data generation and collection from human action means that the volume of machine data can grow at machine scales (i.e., Moore’s Law), an issue raised by both Monash and Abadi. This explosive growth rate amplifies existing challenges associated with “big data.” In particular, two common motifs among the talks at Strata were the difficulties around: mechanics: the technical details of data collection, storage, and analysis semantics: extracting understandable and actionable information from the data deluge The talks The talks covered applications involving machine data from both physical systems (e.g., cars) and computer systems, and highlighted the growing fuzziness of the distinction between the two categories. Steven Gustafson and Parag Goradia of GE discussed the “industrial internet” of sensors monitoring heavy equipment such as airplane engines or manufacturing machinery. One anecdotal data point was that a single gas turbine sensor can generate 500 GB of data per day. Because of the physical scale of these applications, using data to drive even small efficiency improvements can have enormous impacts (e.g., in amounts of jet fuel saved). Moving from energy generation to distribution, Brett Sargent of LumaSense Technologies presented a startling perspective on the state of the power grid in the United States, stating that the average age of an electrical distribution substation in the United States is over 50 years, while its intended lifetime was only 40 years. His talk discussed remote sensing and data analysis for monitoring and troubleshooting this critical infrastructure. Ian Huston, Alexander Kagoshima, and Noelle Sio from Pivotal presented analyses of traffic data. The talk revealed both common-sense (traffic moves more slowly during rush hour) and counterintuitive (disruptions in London tended to resolve more quickly when it was raining) findings. My presentation showed how we apply machine learning at Sumo Logic to help users navigate machine log data (e.g., software logs). The talk emphasized the effectiveness of combining human guidance with machine learning algorithms. Krishna Raj Raja and Balaji Parimi of Cloudphysics discussed how machine data can be applied to problems in data center management. One very interesting idea was to use data and modeling to predict how different configuration changes would affect data center performance. Conclusions The amount of data available for analysis is exploding, and we are still in the very early days of discovering how to best make use of it. 
It was great to hear about different application domains and novel techniques, and to discuss strategies and design patterns for getting the most out of data.

March 5, 2014

Blog

The New Era of Security - yeah, it’s that serious!

Security is a tricky thing and it means different things to different people. It is truly in the eye of the beholder. There is the checkbox kind, there is the “real” kind, there is the checkbox kind that holds up, and there is the “real” kind that is circumvented, and so on. Don’t kid yourself: the “absolute” kind does not exist. I want to talk about security solutions based on log data. This is the kind of security that kicks in after the perimeter security (firewalls), intrusion detection (IDS/IPS), vulnerability scanners, and dozens of other security technologies have done their thing. It ties all of these technologies together, correlates their events, reduces false positives and enables forensic investigation. Sometimes this technology is called Log Management and/or Security Information and Event Management (SIEM). I used to build these technologies years ago, but it seems like decades ago. A typical SIEM product is a hulking appliance, sharp edges, screaming colors - the kind of design that instills confidence and says “Don’t come close, I WILL SHRED YOU! GRRRRRRRRRR”. Ahhhh, SIEM makes you feel safe, doesn’t it? It should not. I proclaim this at the risk of being yet another one of those guys who wants to rag on SIEM, but I built one, and beat many, so I feel I’ve got some ragging rights. So, what’s wrong with SIEM? Where does it fall apart? SIEM does not scale It is hard enough to capture a terabyte of daily logs (40,000 Events Per Second, 3 Billion Events per Day) and store them. It is a couple of orders of magnitude harder to run correlation in real time and alert when something bad happens. SIEM tools are extraordinarily difficult to run at scales above 100GB of data per day. This is because they are designed to scale by adding more CPU, memory, and fast spindles to the same box. The exponential growth of data over the two decades when those SIEM tools were designed has outpaced the ability to add CPU, memory, and fast spindles into the box. Result: Data growth outpaces capacity → Data dropped from collection → Significant data dropped from correlation → Gap in analysis → Serious gap in security SIEM normalization can’t keep pace SIEM tools depend on normalization (shoehorning) of all data into one common schema so that you can write queries across all events. That worked fifteen years ago when sources were few. These days sources and infrastructure types are expanding like never before. One enterprise might have multiple vendors and versions of network gear, many versions of operating systems, open source technologies, workloads running in infrastructure as a service (IaaS), and many custom-written applications. Writing normalizers to keep pace with changing log formats is not possible. Result: Too many data types and versions → Falling behind on adding new sources → Reduced source support → Gaps in analysis → Serious gaps in security SIEM is rule-based only This is a tough one. Rules are useful, even required, but not sufficient. Rules only catch the things you express in them, the things you know to look for. To be secure, you must be ahead of new threats. A million monkeys writing rules in real-time: not possible. Result: Your rules are stale → You hire a million monkeys → Monkeys eat all your bananas → You analyze only a subset of relevant events → Serious gap in security SIEM is too complex It is way too hard to run these things. 
I’ve had too many meetings and discussions with my former customers on how to keep the damned things running and too few meetings on how to get value out of the fancy features we provided. In reality most customers get to use the 20% of features because the rest of the stuff is not reachable. It is like putting your best tools on the shelf just out of reach. You can see them, you could do oh so much with them, but you can’t really use them because they are out of reach. Result: You spend a lot of money → Your team spends a lot of time running SIEM → They don’t succeed on leveraging the cool capabilities → Value is low → Gaps in analysis → Serious gaps in security So, what is an honest, forward-looking security professional who does not want to duct tape a solution to do? What you need is what we just started: Sumo Logic Enterprise Security Analytics. No, it is not absolute security, it is not checkbox security, but it is a more real security because it: Scales Processes terabytes of your data per day in real time. Evaluates rules regardless of data volume and does not restrict what you collect or analyze. Furthermore, no SIEM style normalization, just add data, a pinch of savvy, a tablespoon of massively parallel compute, and voila. Result: you add all relevant data → you analyze it all → you get better security Simple It is SaaS, there are no appliances, there are no servers, there is no storage, there is just a browser connected to an elastic cloud. Result: you don’t have to spend time on running it → you spend time on using it → you get more value → better analysis → better security Machine Learning Rules, check. What about that other unknown stuff? Answer: machine that learns from data. It detects patterns without human input. It then figures out baselines and normal behavior across sources. In real-time it compares new data to the baseline and notifies you when things are sideways. Even if “things” are things you’ve NEVER even thought about and NOBODY in the universe has EVER written a single rule to detect. Sumo Logic detects those too. Result: Skynet … nah, benevolent overlord, nah, not yet anyway. New stuff happens → machines go to work → machines notify you → you provide feedback → machines learn and get smarter → bad things are detected → better security Read more: Sumo Logic Enterprise Security Analytics

Blog

"Hosting Software is For Suckers" and Other Customer Observations

Blog

The Oakland A's, the Enterprise and the Future of Data Innovation

Remember Moneyball? Moneyball is the story of how the performance of the Oakland A’s skyrocketed when they started to vet players based on sabermetrics principles, a data-driven solution that defied conventional wisdom. The team’s success with a metrics-driven approach only came about because GM Billy Beane and one of his assistants, Paul DePodesta, identified the value in player statistics and trusted these insights over what other baseball teams had accepted was true. Any business can learn a significant lesson from Billy Beane and Paul DePodesta, and it is a lesson that speaks volumes about the future of data in business. If a business wants their data to drive innovation, they need to manage that data like the Oakland A’s did. Data alone does not reveal actionable business insights; experienced analysts and IT professionals must interpret it. Furthermore it’s up to business leaders to put their faith in their data, even if it goes against conventional wisdom. Of course, the biggest problem companies confront with their data is the astronomical volume. While the A’s had mere buckets of data to pour through, the modern enterprise has to deal with a spewing fire hose of data. This constant influx of data generated by both humans and machines has paralyzed many companies who often never analyze the data available to them or just analyze the data reactively. Reactive data analysis, while useful to interpret what happened in the past, can't necessarily provide insights into what might occur in the future. Remember your mutual fund disclaimer? Innovation in business will stem from companies creating advantages via proactive use of that data. Case in point: Amazon’s new initiative to anticipate customers’ purchases and prepare shipping and logistics "ahead of time.” The ability to be proactive with machine data won’t be driven simply by technology. It will instead stem from companies implementing their own strategic combination of machine learning and human knowledge. Achieving this balance to generate proactive data insights has been the goal of Sumo Logic since day one. While we have full confidence in our machine data intelligence technologies to do just that, we also know that is not the only solution that companies require. The future of data in the enterprise depends on how companies manage their data. If Billy Beane and Paul DePodesta effectively managed their data to alter the trajectory of the Oakland A’s, there is no reason that modern businesses cannot do the same. This blog was published in conjunction with 'Data Innovation Day'

January 23, 2014

Blog

Why I Joined Sumo Logic

Today I joined Sumo Logic, a cloud-based company that transforms Machine Data into new sources of operations, security, and compliance insights. I left NICE Systems, a market leader and successful organization that had acquired Merced Systems, where I led the Sales Organization for the past 6 years. I had a good position and enjoyed my role, so why leave? And why go to Sumo Logic versus many other options I considered? Many of my friends and colleagues have asked me this, so I wanted to summarize my thinking here. First, I believe the market that Sumo Logic is trying to disrupt is massive. Sumo Logic, like many companies in Silicon Valley these days, manages Big Data. As Gartner recently noted, the concept of Big Data has now reached the peak of the Hype Cycle. The difference is that Sumo Logic actually does this by generating valuable insights from machine data (primarily log files). As a board member told me, people don’t create Big Data nearly as much as machines do. The emergence in the last 10+ years of cloud solutions, and the proliferation of the Internet and web-based technologies in everything we do, in every aspect of business, has created an industry that did not exist 10 years ago. By now it’s a foregone conclusion that cloud technologies and cloud vendors like Amazon Web Services and Workday will ultimately be the solution of choice for all companies, whether they are small mom-and-pop shops or large global enterprises. I wanted to join a company that was solving a problem that every company has, and doing it using the most disruptive platform, Software-as-a-Service. Equally important is my belief that it’s possible to build a better sales team that can make a difference in the traditional Enterprise Sales Process. Sumo Logic competes in a massive market with only one established player, Splunk. I believe that our capabilities, specifically Machine Data Analytics, are truly differentiated in the market. However, I am also excited to build a sales team that customers and prospects will actually want to work with. Just like technology has evolved (client server, web, cloud), I believe the sales profession needs to as well. Today’s sales organization needs to add value to the sales process, not just get in the way. This means we need to understand more about the product than we describe on the company’s website, be able to explain how our product is different from other choices, and how our service will uniquely solve the complex problems companies face today. I am excited to build an organization that will have a reputation of being knowledgeable about the industry and its ecosystem, will challenge customer thinking while understanding their requirements, and will also be fun to work with. The team at Sumo Logic understands this, and I look forward to delivering on this promise. Finally, I think Sumo Logic has a great product. I started my sales career at Parametric Technology Corporation (PTC). Selling Pro/ENGINEER was a blast and set the gold standard for great products – everything from watching reactions during demos to hearing loyal customers rave about the innovative work they were doing with the product. I had a similar experience at Groove Networks watching Ray Ozzie and his team build a great product that was ultimately acquired by Microsoft. Sumo Logic seems to be generating that same product buzz. We have some amazing brand names like Netflix, Orange, McGraw-Hill, and Scripps Networks as our customers. 
These and the other customers we have are generating significant benefits from using our machine data intelligence service. The best measure of a company is the passion of their customer base. The energy and loyalty that our customer base exhibits for the Sumo Logic service is a critical reason why I’m very bullish about the long-term opportunity. I am fired up to be a part of this organization. The management team and in particular Vance, Mark, and the existing sales team are already off to a great start and have grown sales significantly. I hope to build on their early success, and I will also follow the advice a good friend recently gave me when he heard the news: “You found something good – don’t screw it up!” I won’t.

January 17, 2014

Blog

Sumo Logic Deployment Infrastructure and Practices

Introduction Here at Sumo Logic, we run a log management service that ingests and indexes many terabytes of data a day; our customers then use our service to query and analyze all of this data. Powering this service are a dozen or more separate programs (which I will call assemblies from now on), running in the cloud, communicating with one another. For instance, the Receiver assembly is responsible for accepting log lines from collectors running on our customer host machines, while the Index assembly creates text indices for the massive amount of data constantly being fed into our system by the Receivers. We deploy to our production system multiple times each week, while our engineering teams are constantly building new features, fixing bugs, improving performance, and, last but not least, working on infrastructure improvements to help in the care and wellbeing of this complex big-data system. How do we do it? This blog post tries to explain our (semi-)continuous deployment system. Running through hoops In any continuous deployment system, you need multiple hoops that your software must pass through before you deploy it for your users. At Sumo Logic, we have four well defined tiers with clear deployment criteria for each. A tier is an instance of the entire Sumo Logic service where all the assemblies are running in concert as well as all the monitoring infrastructure (health checks, internal administrative tools, auto-remediation scripts, etc.) watching over it. Night This is the first step in the sequence of steps that our software goes through. Originally intended as a nightly deploy, we now automatically deploy the latest clean builds of each assembly on our master branch several times every day. A clean build means that all the unit tests for the assemblies pass. In our complex system, however, it is the interaction between assemblies which can break functionality. To test these interactions, we have a number of integration tests running against Night regularly. Any failures in these integration tests are an early warning that something is broken. We also have a dedicated person troubleshooting problems with Night whose responsibility it is, at the very least, to identify and file bugs for problems. Stage We cut a release branch once a week and use Stage to test this branch much as we use Night to keep master healthy. The same set of integration tests that run against Night also run against Stage and the goal is to stabilize the branch in readiness for a deployment to production. Our QA team does ad-hoc testing and runs their manual test suites against Stage. Long Right before production is the Long tier. We consider this almost as important as our Production tier. The interaction between Long and Production is well described in this webinar given by our founders. Logs from Long are fed to Production and vice versa, so Long is used to monitor and troubleshoot problems with Production. Deployments to Long are done manually a few days before a scheduled deployment to Production from a build that has passed all automated unit tests as well as integration tests on Stage. While the deployment is manually triggered, the actual process of upgrading and restarting the entire system is about as close to a one-button-click as you can get (or one command on the CLI)! Production After Long has soaked for a few days, we manually deploy the software running on Long to Production, the last hoop our software has to jump through. 
We aim for a full deployment every week and oftentimes will do smaller upgrades of our software between full deploys. Being Production, this deployment is closely watched and there are a fair number of safeguards built into the process. Most notably, we have two dedicated engineers who manage this deployment, with one acting as an observer. We also have a tele-conference with screen sharing that anyone can join and observe the deploy process. Social Practices Closely associated with the software infrastructure are the social aspects of keeping this system running. These are: Ownership We have well defined ownership of these tiers within engineering and devops, which rotates weekly. An engineer is designated Primary and is responsible for Long and Production. Similarly we have a designated Jenkins Cop role, to keep our continuous integration system and Night and Stage healthy. Group decision making and notifications We have a short standup every day before lunch, which everyone in engineering attends. The Primary and Jenkins Cop update the team on any problems or issues with these tiers for the previous day. In addition to a physical meeting, we use Campfire to discuss ongoing problems and notify others of changes to any of these tiers. If someone wants to change a configuration property on Night to test a new feature, the person would update everyone else on Campfire. Everyone (and not just the Primary or Jenkins Cop) is in the loop about these tiers and can jump in to troubleshoot problems. Automate almost everything. A checklist for the rest. There are certain things that are done or triggered manually. In cases where humans operate something (a deploy to Long or Production for instance), we have a checklist for engineers to follow. For more on checklists, I refer you to an excellent book, The Checklist Manifesto. Conclusion This system has been in place since Sumo Logic went live and has served us well. It bears mentioning that the key to all of this is automation, uniformity, and well-delineated responsibilities. For example, spinning up a complete system takes just a couple of commands in our deployment shell. Also, any deployment (even a personal one for development) comes up with everything pre-installed and running, including health checks, monitoring dashboards or auto-remediation scripts. Identifying and fixing a problem on Production is no different from that on Night. In almost every way (except for waking up the Jenkins Cop in the middle of the night and the sizing), these are identical tiers! While automation is key, it doesn’t take away from the fact that it is people who run the system and keep things healthy. A deployment to production can be stressful, more so for the Primary than anyone else, and having a well defined checklist can take away some of the stress. Any system like this needs constant improvements and since we are not sitting idle, there are dozens of features, big and small, that need to be worked on. Two big ones are: Red-Green deployments, where new releases are rolled out to a small set of instances and, once we are confident they work, are pushed to the rest of the fleet. More frequent deployments of smaller parts of the system. Smaller, more frequent deployments are less risky. In other words, there is a lot of work to do. Come join us at Sumo Logic!

AWS

January 8, 2014

Blog

Open Source in the Sumo Logic UI

Startups are well-known for going fast, releasing, and iterating. Having quality engineers at Sumo Logic is a big part of doing that well enough that customers want our solution, but as at many other young tech companies, open source libraries and tools are also a key element in our ability to deliver. As a recent hire into the User Interface development team I was excited to see just which open source software goes into our cloud log management solution. The list is extensive, because so many of our peers are making great stuff available, but a quick look just at the front end codebase shows: jQuery: The big daddy, jQuery is used by millions of web applications and websites to add dynamic behavior and content to otherwise plain pages. Backbone: A lean, subtly powerful framework for building expressive client-side apps, Backbone provides a core set of MV* classes and a foundation for many community-developed extensions. Sass/Compass: Think “programmable CSS” and you’re capturing the essence of Sass, while big brother Compass adds an extensive set of reusable cross-browser CSS patterns as well as several handy utilities. D3: A library for manipulating documents based on data, we use D3 to drive many of the beautiful interactive charts that enable our customers to understand the huge volume of data they process in our application. Require.js: Building large applications is much easier to manage when code can be split into small, coherent chunks (files), and Require.js enables apps to do just this. CodeMirror: This versatile text editor is the basis for Sumo Logic’s powerful search query editors. jQuery Plugins: Many; the more important to us include Select2, Toaster, qTip, and jQuery UI. Collectively these libraries–along with their counterparts used in our service layer–make it possible for a small company to rapidly deliver the depth and quality of Sumo Logic in a cost-effective process. Instead of writing essentially boilerplate code to perform mundane tasks, our team is able to create application-specific, high-value code. In the days before FOSS proliferated, the cost per developer or per CPU for each piece of software would have been prohibitive; the economics of Silicon Valley, where two guys in a coffee shop can spin up a Pinterest or Sumo Logic, just wouldn’t have worked.

December 9, 2013

Blog

Black Friday, Cyber Monday and Machine Data Intelligence

The annual craze of getting up at 4am to either stand in line or shop online for the “best” holiday deals is upon us. I know first-hand, because my daughter and I have participated in this ritual for the last four years (I know - what can I say - I grew up in Maine). While we are at the stores fighting for product, many Americans will be either watching football, or surfing the web from the comfort of their couch looking for that too-good-to-be-true bargain. And with data indicating a 50% jump in Black Friday and Cyber Monday deals this year, it’s incumbent on companies to ensure that user experiences are positive. As a result, the leading companies are realizing the need to obtain visibility end-to-end across their applications and infrastructure, from the origin to the edge. Insights from machine data (click-stream in the form of log data), generated from these environments, helps retailers of all stripes maximize these two critical days and the longer-term holiday shopping season. What are the critical user and application issues that CIOs should be thinking about in the context of these incredibly important shopping days? User Behavior Insights. From an e-commerce perspective, companies can use log data to obtain detailed insights into how their customers are interacting with the application, what pages they visit, how long they stay, and the latency of specific transactions. This helps companies, for example, correlate user behavior with the effectiveness of specific promotional strategies (coupons, etc) that allow them to rapidly make adjustments before the Holiday season ends. The Elasticity of The Cloud. If you’re going to have a problem, better it be one of “too much” rather than “too little”. Too frequently, we hear of retail web sites going down during this critical time. Why? The inability to handle peak demand - because often they don’t know what that demand will be. Companies need to understand how to provision for the surge in customer interest on these prime shopping days that in turn deliver an exponential increase in the volume of log data. The ability to provide the same level of performance at 2, 3 or even 10x usual volumes in a *cost-effective* fashion is a problem few companies have truly solved. The ability of cloud-based architectures to easily load-balance and provision for customer surges at any time is critical to maintaining that ideal shopping experience while still delivering the operational insights needed to support customer SLAs. Machine Learning for Machine Data. It’s difficult enough for companies to identify the root cause of an issue that they know something about. Far more challenging for companies is getting insights into application issues that they know nothing about. However, modern machine learning techniques provide enterprises with a way to proactively uncover the symptoms, all buried within the logs, that lead to these issues. Moreover, machine learning eliminates the traditional requirement of users writing rules to identify anomalies, which by definition limit the ability to understand *all* the data. We also believe that the best analytics combine machine learning with human knowledge about the data sets - what we call Machine Data Intelligence - and that helps companies quickly and proactively root out operational issues that limit revenue generation opportunities. Security and Compliance Analytics. 
With credit cards streaming across the internet in waves on this day, it’s imperative that you’ve already set up the necessary environment to both secure your site from fraudulent behavior and ensure your brand and reputation remain intact. As I mentioned in a previous post, the notion of a perimeter has long since vanished which means companies need to understand that user interactions might occur across a variety of devices on a global basis. The ability to proactively identify what is happening in real-time across your applications and the infrastructure on which they run is critical to your underlying security posture. All this made possible by your logs and the insights they contain. Have a memorable shopping season and join me on twitter - @vanceloiselle - to continue the conversation.

Blog

Sumo Logic Application for AWS CloudTrail

Blog

Using the Join Operator

The powerful analytics capabilities of the Sumo Logic platform have always provided the greatest insights into your machine data. Recently we added an operator – bringing the essence of a SQL JOIN to your stream of unstructured data, giving you even more flexibility. In a standard relational join, the datasets in the tables to be joined are fixed at query time. However, matching up IDs between log messages from different days within your search timeframe likely produces the wrong result because actions performed yesterday should not be associated with a login event that occurred today. For this reason, our Join operator provides for a specified moving timeframe within which to join log messages. In the diagram below, the pink and orange represent two streams of disparate log messages. They both contain a key/value pair that we want to match on and the messages are only joined on that key/value when they both occur within the time window indicated by the black box. Now let’s put this to use. Suppose an application has both real and machine-controlled users. I’m interested in knowing which users are which so that I can keep an eye out for any machine-controlled users that are impacting performance. I have to find a way to differentiate between the real vs the machine-controlled users. As it turns out, the human users create requests at a reasonably low rate while the machine-controlled users (accessing via an API) are able to generate several requests per second and always immediately after the login event. In these logs, there are several different messages coming in with varying purposes and values. Using Join, I can query for both the logins and requests and then restrict the time window of the matching logic to combine the two messages streams. The two sub queries in my search will look for request/query events and login events respectively. I’ve restricted the match window to just 15 seconds so that I’m finding the volume of requests that are very close to the login event. Then I’m filtering out users who made less than 10 requests in that 15-second time frame following a login. The result is a clear view of the users that are actively issuing a large volume of requests via the API immediately upon logging in. Here is my example query: (login or (creating query)) | join (parse "Creating query: '*'" as query, "auth=User:*:" as user) as query, (parse "Login success for: '*'" as user) as login on query.user = login.user timewindow 15s | count by query_user | where _count > 10 | sort _count As you can see from the above syntax, the subqueries are written with the same syntax and even support the use of aggregates (count, sum, average, etc) so that you can join complex results together and achieve the insights you need. And of course, we support joining more than just two streams of logs – combining all your favorite data into one query!
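For illustration only, raw messages matching the two parse expressions above might look something like the following; these lines are entirely made up (the timestamps, user name, and surrounding text are not from any real system) and exist just to show what the join key is:

2013-10-29 09:12:01 INFO Login success for: 'svc-reporting'
2013-10-29 09:12:02 INFO Creating query: 'error | count by host' auth=User:svc-reporting:42

The join matches on the user value extracted from each stream (svc-reporting here) whenever both messages land within the 15-second window.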

October 29, 2013

Blog

A Sort of Homecoming: Back From Akamai Edge

This is my first blog post for Sumo Logic. It took 18 months but I was always a late bloomer and we have some Hemingway-class bloggers on staff anyway. No doubt I was shy as my music production partner in MOMU, JD Moyer, is now a prolific blogger with an immense following. Nonetheless, when I was asked to write about the experience at last week’s Akamai Edge – the worldwide customer and partner conclave – due to my unique position of having worked at Akamai from 2002 to 2005, I jumped at the opportunity. The day I started at Akamai the stock was either at 52 cents or 56 cents – I don’t recall exactly. The day I left it was at $56 – I do remember that. In those three years, I was able to bring onboard and expand Akamai’s presence at companies like eBay, The Gap, RingCentral, Netflix, Walmart.com and E*Trade, all of whom bought into the business value that Akamai delivered. This culminated in being named the top Major Account Executive for the Americas in 2004 – definitely a personal “pinnacle” achievement… Akamai is an amazing company for way too many reasons to list, but the people and the culture top the list. In fact, when I think about the best places I have worked, from Ritz-Carlton to BladeLogic, the common thread among these favorite employers of mine was and is the people. Smart, aggressive, coachable, creative, daring, fearless and fun people, with amazing founders. I want to key in on the similarity that I see between Akamai and Sumo Logic. Akamai is the first Cloud Company. REALLY Cloud. My goal when I arrived at Sumo Logic last year was to help build a culture that weaved in the best of two great worlds – BladeLogic and Akamai, with a maniacal focus on the Customer Experience. At all of these places a common theme was the “DNA” of the staff. There is magnificent art in taking a cutting-edge, disruptive product and meshing it with an intense sense of urgency and thoughtful execution. Having the opportunity to help build this from scratch at Sumo Logic was too good to pass up. I fell in love with Christian and Kumar’s vision and the innovation around the technology. There are many more similarities than just the clarity of vision and the incredible focus on execution. The inimitable George Conrades once told a prospect of ours in a meeting how many lines of code Akamai had written – in 2003 – and it was a massive number. We are both software companies at the core. We both rely heavily on algorithms to create customer value and massive differentiation. We both go to market with a recurring revenue model. We both allow for instant elasticity and on-demand usage. We both are totally focused on a great product that helps our customers fight the demands of the digital world with the best tools available. Last but not least we are both entranced by The Algorithm… Back to Akamai Edge. It is incredible to see how much of the online world continues to run and thrive through Akamai. 2.2 billion log lines every 60 seconds. Yes, you read that right. Staggering scale. The session on the Dominant Design principle blew me away. With the new announcement of Akamai opening up its platform to developers and partners, Akamai is even more Open. Sumo Logic is thrilled to become a charter member of the Open Platform Initiative – we already have many joint customers salivating to send their Akamai logs directly to Sumo Logic, where they can “join” them with the rest of their infrastructure logs – all for real-time insights across their entire infrastructure. 
The beta customers are all happy that we have come so far so quickly together. This is an alliance with legs AND brains. George, Paul, Tom, Bob, Brad, Doug, John, Tim, Mark, Gary, Jennie, Rick, Kevin, Kris, Alyson, Brian, Mike, Andy, Dave, Ed (and so many more)…it was great to see you and it is GREAT to be working with you again. The new hires I met seem to have the DNA you need to get to the next Scaling Point. Akamai and Sumo Logic: Faster Forward Together, Moving at the speed of Cloud. Now stop reading my rant and go sell something, will ya, and check out our new Sumo Logic Application for Akamai.

October 15, 2013

Blog

Akamai and Sumo Logic integrate for real-time application insights!

I’m very pleased to announce our strategic alliance with Akamai. Our integrated solution delivers a unified view of application availability, performance, security, and business analytics based on application log data. Customers who rely on Akamai’s globally distributed infrastructure can now get a real-time feed of all logs generated by Akamai’s infrastructure into their Sumo Logic account in order to integrate and cross-analyze them with their internally generated application data sets! What problems does the integrated solution solve? To date, there have been two machine data sets generated by applications that leverage Akamai: 1. Application logs at the origin data centers, which application owners can usually access. 2. Logs generated by Akamai as an application is distributed globally. Application owners typically have zero or limited access to these logs. Both of these data sets provide important metrics and insights for delivering highly available, secure applications that also provide a detailed view of business results. Until today there was no way to get these data sets into a single tool for real-time analysis, causing the following issues: No single view of performance. Origin performance could be monitored, but that provided little confidence that the app was performant for end users. Difficult to understand user interaction. Without data on how real users interact with an application, it was difficult to gauge how users interacted with the app, what content was served, and ultimately how the app performed for those users (and if performance had any impact on conversions). Issues impacting customer experience remained hidden. The root cause of end-user issues caused at the origin remained hidden, impacting customer experience for long periods of time. Web App Firewall (WAF) security information not readily available. Security teams were not able to detect and respond to attacks in real time and take defensive actions to minimize exposure. The solution! Akamai Cloud Monitor and Sumo Logic provide an integrated approach to solving these problems. Sumo Logic has developed an application specifically crafted for customers to extract insights from their Akamai data, which is sent to Sumo Logic in real time. The solution has been deployed by joint customers (at terabyte scale) to address the following use cases: Real-time analytics about user behavior. Combine Akamai real-user monitoring data and internal data sets to gain granular insights into user behavior. For example, learn how users behave across different device types, geographies, or even how Akamai quality of service impacts user behavior and business results. Security information management and forensics. Security incidents and attacks on an application can be investigated by deep-diving into sessions, IP addresses, and individual URLs that attackers are attempting to exploit and breach. Application performance management from edge to origin. Quickly determine if an application’s performance issue is caused by your origin or by Akamai’s infrastructure, and which regions, user agents, or devices are impacted. Application release and quality management. Receive an alert as soon as Akamai detects that one or more origins have an elevated number of 4xx or 5xx errors that may be caused by a new code push, a configuration change, or another issue within your origin application infrastructure. Impact of quality of service and operational excellence. 
Correlate how quality of service impacts conversions or other business metrics to optimize performance and drive better results. I could go on, but I’m sure you have plenty of ideas of your own. Join us for a free trial here – as always, there is nothing to install, nothing to manage, nothing to run – we do it all for you. You can also read our announcement here or read more about the Sumo Logic application for Akamai here. Take a look at the Akamai press release here.

October 9, 2013

Blog

Meatballs And Flying Tacos Don't Make a Cloud

Yes, we are cloud and proud. Puppies, ponies, rainbows, unicorns. We got them all. But the cloud is not a personal choice for us at Sumo Logic. It is an imperative. An imperative to build a better product, for happier customers. We strongly believe that if designed correctly, there is no need to fragment your product into many different pieces, each with different functional and performance characteristics that confuse decision-makers. We have built the Sumo Logic platform from the very beginning with a mindset of scalability. Sumo Logic is a service that is designed to appeal and adapt to many use cases. This explains why in just three short years we have been successful in a variety of enterprise accounts across three continents because - first and foremost - our product scales. On the surface, scale is all about the big numbers. We got Big Data, thank you. So do our customers, and we scale to the level required by enterprise customers. Yet, scaling doesn't mean scaling up by sizes of data sets. Scaling also means being able to scale back, to get out of the way, and provide value to everyone, including those customers that might not have terabytes of data to deal with. Our Sumo Free offering has proven that our approach to scaling is holistic - one product for everyone. No hard decisions to be made now, and no hard decisions to be made later. Just do it and get value. Another compelling advantage of our multi-tenant, one service approach is that we can very finely adjust to the amount of data and processing required by every customer, all the time. Elasticity is key, because it enables agility. Agile is the way of business today. Why would anyone want to get themselves tied into a fixed price license, and on top of that provision large amount of compute and storage resources permanently upfront just to buy insurance for those days of the year where business spikes, or, God forbid, a black swan walks into the lobby? Sumo Logic is the cure for anti-agility in the machine data analytics space. As a customer, you get all the power you need, when you need it, without having to pay for it when you don't. Finally, Sumo Logic scales insight. With our recently announced anomaly detection capability, you can now rely on the army of squirrels housed in our infrastructure to generate and vet millions of hypotheses about potential problems on your behalf. Only the most highly correlated anomalies survive this rigorous process, meaning you get actionable insight into potential infrastructure issues for free. You will notice repetitive events and be able to annotate them precisely and improve your operational processes. Even better - you will be able to share documented anomalous events with and consume them back from the Sumo Logic community. What scales to six billion humans? Sumo Logic does. One more thing: as a cloud-native company, we have also scaled the product development process, to release more features, more improvements, and yes, more bug fixes than any incumbent vendor. Sumo Logic runs at the time of now, and new stuff rolls out on a weekly basis. Tired of waiting for a year to get issues addressed? Tired of then having to provision an IT project to just update the monitoring infrastructure? Scared of how that same issue will apply even if the vendor "hosts" the software for you? We can help. Sumo Logic scales, along all dimensions. You like scale? Come on over. Oh, and thanks for the date, Praveen. I'll let you take the check.

October 2, 2013

Blog

Do logs have a schema?

Blog

Logs and laundry: What you don't know can hurt you

Have you ever put your cell phone through the wash? Personally, I've done it. Twice. What did I learn, finally? To always double-check where I put my iPhone before I turn on the washing machine. It's a very real and painful threat that I've learned to proactively manage by using a process with a low rate of failure. But, from time to time, other foreign objects slip through, like a lipstick, my kid's crayon, a blob of Silly Putty---things that are cheaper than an iPhone yet create havoc in the dryer. Clothes are stained, the dryer drum is a mess, and my schedule is thrown completely off while I try to remember my grandmother's instructions for removing red lipstick from a white shirt. What do low-tech laundry woes have to do with Sumo Logic's big data solution? Well, I see LogReduce as a tool that helps fortify your organization against known problems (for which you have processes in place) while guarding against unknown threats that may cause huge headaches and massive clean-ups. When you think about it, a small but messy threat that you don't know you need to look for is a nightmare. These days we're dealing with an unbelievable quantity of machine data that may not be human-readable, meaning that a proverbial Chap Stick in the pocket could be lurking right below your nose. LogReduce takes the "noise" out of that data so you can see those hidden threats, problems, or issues that could otherwise take a lot of time to resolve. Say you're running a generic search for a broad area of your deployment - say, billing errors, user creations, or logins. Whatever the search may be, it returns thousands and thousands of pages of results. So, you could spend your work day slogging through messages, hoping to find the real problem, or you can simply click LogReduce. Those results are logically sorted into signatures--groups of messages that contain similar or relevant information. Then, you can teach Sumo Logic what messages are more important, and what data you just don't need to see again. That translates into unknown problems averted. Of course your team has processes in place to prevent certain events. How do you guard against the unknown? LogReduce can help you catch a blip before it turns into a rogue wave. Oh, and if you ever put Silly Putty through the washer and dryer, a good dose of Goo Gone will do the trick.

Blog

From Academia to Sumo Logic

While I was wrapping up my Ph.D. thesis, my girlfriend (now wife) and I decided that we wanted to leave Germany to live and work in a different country. Prior to my Ph.D., I started off in computer gaming (ported "Turrican 2" to the PC when I was a kid [1]). Following that, I did my MSCS and Ph.D. in distributed systems and computer networks in Karlsruhe, Germany. I have been working as a Software Engineer at Sumo Logic since October 2012. At first I was skeptical about how intellectually engaging and challenging a commercial venture in log management could be. However, after working at Sumo Logic for more than 6 months, I have to admit that I misjudged the academic and engineering challenges of log management. Why? I underestimated the problem and potential! In contrast to academia, where algorithms are tested under controlled and reproducible conditions, we face the full force of unexpected behaviors of a live system here at Sumo Logic. When we turn algorithms into reality, we are responsible for the entire development process, including planning, testing, and implementing the finished component in a production environment. No other company is approaching Big Data-scale log management like Sumo Logic. As a main differentiator, Sumo Logic offers enterprise-class log file processing in the cloud. Sumo Logic ingests terabytes per day of unstructured log files that need to be processed in real time. In contrast to websites or other content, log files need exact processing; e.g., a needle in the haystack of logs can consist of merely 16 characters (out of the terabytes of data ingested and stored). Thus, there are only a few heuristics we can use to increase efficiency. This makes developing new algorithms to process log data challenging and interesting. Furthermore, all our databases need to answer queries in a timely manner. Databases with unpredictable latencies on certain queries are not suitable for the problems we are solving. We mix and match between open source technologies and in-house customized solutions for that reason. In addition, our customers trust us with information of vital importance to them. Security concerns influence design decisions across many levels, ranging from the operating system level for full hard drive encryption, to the application level for role-based access control (RBAC). We have to carefully select algorithms to balance performance (encrypted log files can challenge the efficient use of our cloud resources) while continuing to isolate customers, so that one customer's demands don't impact the performance of another. In summary, I am glad I took the opportunity and joined Sumo Logic to turn my academic research into solutions used by customers to process TBs of their critical data in real time. This experience has brought self-improvement with each challenge, full-stack knowledge, and a sense of engineering not possible in any other environment. And, by the way, we are hiring. 🙂 [1] http://www.mobygames.com/game/dos/turrican-ii-the-final-fight/reviews/reviewerId,59617/

May 21, 2013

Blog

What is StatsD and How Can DevOps Use It?

Blog

Sending CloudPassage Halo Event Logs to Sumo Logic

The following is a guest post from CloudPassage. Automating your server security is about more than just one great tool – it's also about linking together multiple tools to empower you with the information you need to make decisions. For customers of CloudPassage and Sumo Logic, linking those tools to secure cloud servers is as easy as it is powerful. The CloudPassage Halo Event Connector enables you to view security event logs from CloudPassage Halo in your Sumo Logic dashboard, including alerts from your configuration, file integrity, and software vulnerability scans. Through this connector, Halo delivers unprecedented visibility of your cloud servers via your log management console. You can track server events such as your server rebooting, shutting down, changing IP addresses, and much more. The purpose of the Halo Event Connector is to retrieve event data from a CloudPassage Halo account and import it into Sumo Logic for indexing or processing. It is designed to execute repeatedly, keeping the Sumo Collector up-to-date with Halo events as time passes and new events occur. The Halo Event Connector is free to use, and will work with any Halo subscription. To get started integrating Halo events into Sumo Logic, make sure you have set up accounts for CloudPassage Halo and Sumo Logic. Then, generate an API key in your CloudPassage Halo portal. Once you have an API key, follow the steps provided in the Halo – Sumo Logic documentation, using the scripts provided on GitHub. The documentation walks you through the process of testing the Halo Event Connector script. Once you have tested the script, you will then add the output as a "Source" by selecting "Script" in Sumo Logic. When you have finished adding the new data source that integrates the Halo Event Connector with Sumo Logic (as detailed in the .pdf documentation), you will be taken back to the "Collectors" tab, where the newly added Script source will be listed. Once the Connector runs successfully and is importing event data into Sumo Logic, you will see Halo events appear in your Sumo Logic searches. Try it out today – we are eager to hear your feedback! We hope that integrating these two tools makes your server security automation even more powerful.
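For readers who like to see the moving parts, here is a rough sketch of the shape of such an integration: pull events from the Halo API with a token and write them to stdout, which is what a Sumo Logic Script source collects. This is not the actual connector (use the script from GitHub referenced above); the endpoint path, authentication header, and environment variable name below are assumptions for illustration only.

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Hypothetical sketch only -- the real integration is the CloudPassage-provided
// connector script. Endpoint, auth header, and env var name are assumptions.
object HaloEventsToStdout {
  def main(args: Array[String]): Unit = {
    val token = sys.env("HALO_API_TOKEN")                                      // assumed: token derived from your Halo API key
    val url   = new URL("https://api.cloudpassage.com/v1/events?per_page=100") // assumed endpoint

    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestProperty("Authorization", s"Bearer $token")

    // A real connector pages through results and remembers the last event seen;
    // here we simply print the raw response so a Script source can collect it.
    val body = Source.fromInputStream(conn.getInputStream).mkString
    println(body)
    conn.disconnect()
  }
}
```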

April 23, 2013

Blog

Universal Collection of Machine Data

Blog

Dirty Haskell Phrasebook

Whenever people ask me whether Hungarian is difficult to learn, I half-jokingly say that it can’t be too hard given that I had learned it by the time I turned three. Having said that, I must admit that learning a new language as a grown-up is a whole new ball game. Our struggle for efficiency is reflected in the way we learn languages: we focus on the most common patterns, and reuse what we know as often as possible. Programming languages are no different. When I started at Sumo Logic just two months ago, I wanted to become fluent in Scala as quickly as possible. Having a soft spot for functional languages such as Haskell, a main factor in deciding to do an internship here was that we use Scala. I soon realized that a large subset of Haskell can easily be translated into Scala, which made the learning process a lot smoother so far. You’ve probably guessed by now that this post is going to be a Scala phrasebook for Haskellers. I’m also hoping that it will give new insights to seasoned Scalaists, and spark the interest of programmers who are new to the functional paradigm. Here we go. Basics module Hello where main :: IO () main = do putStrLn "Hello, World!" object Hello { def main(args: Array[String]): Unit = println("Hello, World!") } While I believe that HelloWorld examples aren’t really useful, there are a few key points to make here. The object keyword creates a singleton object with the given name and properties. Pretty much everything in Scala is an object, and has its place in the elaborate type hierarchy stemming from the root-type called Any. In other words, a set of types always has a common ancestor, which isn’t the case in Haskell. One consequence of this is that Scala’s ways of emulating heterogeneous collections are more coherent. For example, Haskell needs fairly involved machinery such as existential types to describe a list-type that can simultaneously hold elements of all types, which is simply Scala’s List[Any]. In Scala, every function (and value) needs an enclosing object or class. (In other words, every function is a method of some object.) Since object-orientation concepts don’t have direct analogues in Haskell, further examples will implicitly assume an enclosing object on the Scala side. Haskell’s () type is Scala’s Unit, and its only value is called () just like in Haskell. Scala has no notion of purity, so functions might have side-effects without any warning signs. One particular case is easy to spot though: the sole purpose of a function with return type Unit is to exert side effects. Values answer :: Int answer = 42 lazy val answer: Int = 42 Evaluation in Haskell is non-strict by default, whereas Scala is strict. To get the equivalent of Haskell’s behavior in Scala, we need to use lazy values (see also lazy collections). In most cases however, this makes no difference. From now on, the lazy keyword will be dropped for clarity. Besides val, Scala also has var which is mutable, akin to IORef and STRef in Haskell. Okay, let’s see values of some other types. question :: [Char] question = "What's six by nine?" val question: String = "What's six by nine?" Can you guess what the type of the following value is? judgement = (6*9 /= 42) val judgement = (6*9 != 42) Well, so can Haskell and Scala. Type inference makes it possible to omit type annotations. There are a few corner cases that get this mechanism confused, but a few well-placed type annotations will usually sort those out. 
Data Structures Lists and tuples are arguably the most ubiquitous data structures in Haskell. In contrast with Haskell’s syntactic sugar for list literals, Scala’s notation seems fairly trivial, but in fact involves quite a bit of magic under the hood. list :: [Int] list = [3, 5, 7] val list: List[Int] = List(3, 5, 7) Lists can also be constructed from a head-element and a tail-list. smallPrimes = 2 : list val smallPrimes = 2 :: list As you can see, : and :: basically switched roles in the two languages. This list-builder operator, usually called cons, will come in handy when we want to pattern match on lists (see Control Structures and Scoping below for pattern matching). Common accessors and operations have the same name, but they are methods of the List class in Scala. head list list.head tail list list.tail map func list list.map(func) zip list_1 list_2 list_1.zip(list_2) If you need to rely on the non-strict evaluation semantics of Haskell lists, use Stream in Scala. Tuples are virtually identical in the two languages. tuple :: ([Char], Int) tuple = (question, answer) val tuple: (String, Int) = (question, answer) Again, there are minor differences in Scala’s accessor syntax due to object-orientation. fst tuple tuple._1 snd tuple tuple._2 Another widely-used parametric data type is Maybe, which can represent values that might be absent. Its equivalent is Option in Scala. singer :: Maybe [Char] singer = Just "Carly Rae Jepsen" val singer: Option[String] = Some("Carly Rae Jepsen") song :: Maybe [Char] song = Nothing val song: Option[String] = None Algebraic data types translate to case classes. data Tree = Leaf | Branch [Tree] deriving (Eq, Show) sealed abstract class Tree case class Leaf extends Tree case class Branch(kids: List[Tree]) extends Tree Just like their counterparts, case classes can be used in pattern matching (see Control Structures and Scoping below), and there’s no need for the new keyword at instantiation. We also get structural equality check and conversion to string for free, in the form of the equals and toString methods, respectively. The sealed keyword prevents anything outside this source file from subclassing Tree, just to make sure exhaustive pattern lists don’t become undone. See also extractor objects for a generalization of case classes. Functions increment :: Int -> Int increment x = x + 1 def increment(x: Int): Int = x + 1 If you’re coming from a Haskell background, you’re probably not surprised that the function body is a single expression. For a way to create more complex functions, see let-expressions in Control Structures and Scoping below. three = increment 2 val three = increment(2) Most of the expressive power of functional languages stems from the fact that functions are values themselves, which leads to increased flexibility in reusing algorithms. Composition is probably the simplest form of combining functions. incrementTwice = increment . increment val incrementTwice = (increment: Int => Int).compose(increment) Currying, Partial Application, and Function Literals Leveraging the idea that functions are values, Haskell chooses to have only unary functions and emulate higher arities by returning functions, in a technique called currying. If you think that isn’t a serious name, you’re welcome to call it schönfinkeling instead. Here’s how to write curried functions. 
addCurry :: Int -> Int -> Int addCurry x y = x + y def addCurry(x: Int)(y: Int): Int = x + y five = addCurry 2 3 val five = addCurry(2)(3) The rationale behind currying is that it makes certain cases of partial application very succinct. addSix :: Int -> Int addSix = addCurry 6 val addSix: Int => Int = addCurry(6) val addSix = addCurry(6) : (Int => Int) val addSix = addCurry(6)(_) The type annotation is needed to let Scala know that you didn’t forget an argument but really meant partial application. If you want to drop the type annotation, use the underscore placeholder syntax. To contrast with curried ones, functions that take many arguments at once are said to be uncurried. Scalaists seem to prefer their functions less spicy by default, most likely to save parentheses. addUncurry :: (Int, Int) -> Int addUncurry (x, y) = x + y def addUncurry(x: Int, y: Int): Int = x + y seven = addUncurry (2, 5) val seven = addUncurry(2, 5) Uncurried functions can still be partially applied with ease in Scala, thanks to underscore placeholder notation. addALot :: Int -> Int addALot = x -> addUncurry (x, 42) val addALot: Int => Int = addUncurry(_, 42) val addALot = addUncurry(_: Int, 42) When functions are values, it makes sense to have function literals, a.k.a. anonymous functions. (brackets :: Int -> [Char]) = x -> "<" ++ show x ++ ">" val brackets: Int => String = x => "<%s>".format(x) brackets = (x :: Int) -> "<" ++ show x ++ ">" val brackets = (x: Int) => "<%s>".format(x) Infix Notation In Haskell, any function whose name contains only certain operator characters will take its first argument from the left side when applied, which is infix notation if it has two arguments. Alphanumeric function names surrounded by backticks also behave that way. In Scala, any single-argument function can be used as an infix operator by omitting the dot and parentheses from the function call syntax. data C = C [Char] bowtie (C s) t = s ++ " " ++ t (|><|) = bowtie case class C(s: String) { def bowtie(t: String): String = s + " " + t val |><| = bowtie(_) } (C "James") |><| "Bond" C("James") |><| "Bond" (C "James") `bowtie` "Bond" C("James") bowtie "Bond" Haskell’s sections provide a way to create function literals from partially applied infix operators. They can then be translated to Scala using placeholder notation. tenTimes = (10*) val tenTimes = 10 * (_: Int) Again, the type annotation is necessary so that Scala knows you meant what you wrote. Higher-order Functions and Comprehensions Higher order functions are functions that have arguments which are functions themselves. Along with function literals, they can be used to express complex ideas in a very compact manner. One example is operations on lists (and other collections in Scala). map (3*) (filter (<5) list) list.filter(_ < 5).map(3 * _) That particular combination of map and filter can also be written as a list comprehension. [3 * x | x <- list, x < 5] for(x <- list if x < 5) yield (3 * x) Control Structures and Scoping Pattern matching is a form of control transfer in functional languages. countNodes :: Tree -> Int countNodes t = case t of Leaf -> 1 (Branch kids) -> 1 + sum (map countNodes kids) def countNodes(t: Tree): Int = t match { case Leaf() => 1 case Branch(kids) => 1 + kids.map(countNodes).sum } For a definition of Tree, see the Data Structures section above. Even though they could be written as pattern matching, if-expressions are also supported for increased readability. 
if condition then expr_0 else expr_1 if (condition) expr_0 else expr_1 Let expressions are indispensable in organizing complex expressions. result = let v_0 = bind_0 v_1 = bind_1 -- ... v_n = bind_n in expr val result = { val v_0 = bind_0 val v_1 = bind_1 // ... val v_n = bind_n expr } A code block evaluates to its final expression if the control flow reaches that point. Curly brackets are mandatory; Scala isn’t indentation-sensitive. Parametric Polymorphism I’ve been using parametric types all over the place, so it’s time I said a few words about them. It’s safe to think of them as type-level functions that take types as arguments and return types. They are evaluated at compile time. [a] List[A] (a, b) (A, B) // desugars to Tuple2[A, B] Maybe a Option[A] a -> b A => B // desugars to Function1[A, B] a -> b -> c A => B => C // desugars to Function2[A, B, C] Type variables in Haskell are required to be lowercase, whereas they’re usually uppercase in Scala, but this is only a convention. In this context, Haskell’s type classes loosely correspond to Scala’s traits, but that’s a topic for another time. Stay tuned. Comments -- single-line comment // single-line comment {- Feel free to suggest additions and corrections to the phrasebook in the comments section below. :] -} /* Feel free to suggest additions and corrections to the phrasebook in the comments section below. :] */ Here Be Dragons Please keep in mind that this phrasebook is no substitute for the real thing; you will be able to write Scala code, but you won’t be able to read everything. Relying on it too much will inevitably yield some unexpected results. Don’t be afraid of being wrong and standing corrected, though. As far as we know, the only path to a truly deep understanding is the way children learn: by poking around, breaking things, and having fun.
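If you want to try the Scala side of the phrasebook directly, here is a small self-contained object that collects a few of the translations above (cons and list operations, Option, case classes with pattern matching, and a curried function). It only repackages snippets already shown, with expected outputs noted in comments; the one cosmetic change is that Leaf gets an empty parameter list, which newer Scala compilers require for case classes.

```scala
// A handful of the Scala snippets from the phrasebook, gathered into one runnable object.
object PhrasebookDemo {
  sealed abstract class Tree
  case class Leaf() extends Tree
  case class Branch(kids: List[Tree]) extends Tree

  // pattern matching over a case-class hierarchy
  def countNodes(t: Tree): Int = t match {
    case Leaf()       => 1
    case Branch(kids) => 1 + kids.map(countNodes).sum
  }

  // curried function, as in the currying section
  def addCurry(x: Int)(y: Int): Int = x + y

  def main(args: Array[String]): Unit = {
    val list: List[Int] = List(3, 5, 7)
    val smallPrimes = 2 :: list
    println(smallPrimes.filter(_ < 5).map(3 * _))   // List(6, 9)

    val singer: Option[String] = Some("Carly Rae Jepsen")
    val song: Option[String]   = None
    println(singer.getOrElse("unknown"))            // Carly Rae Jepsen
    println(song.getOrElse("unknown"))              // unknown

    val addSix: Int => Int = addCurry(6)            // partial application via the expected type
    println(addSix(36))                             // 42

    val tree = Branch(List(Leaf(), Branch(List(Leaf(), Leaf()))))
    println(countNodes(tree))                       // 5
  }
}
```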

April 5, 2013

Blog

Harder, Better, Faster, Stronger - Machine Data Analytics and DevOps

"Work It Harder, Make It Better / Do It Faster, Makes Us Stronger / More Than Ever, Hour After Hour / Work Is Never Over" - Daft Punk, "Harder, Better, Faster, Stronger" When trying to explain the essence of DevOps to colleagues last week, I found myself unwittingly quoting the kings of electronica, the French duo Daft Punk (and Kanye West, who sampled the song in "Stronger"). So often, I find the "spirit" of DevOps being reduced to mere automation, the takeover of Ops by Dev (or vice versa), or other over-simplifications. This is natural for any new, potentially over-hyped, trend. But how do we capture the DevOps "essence" - programmable architecture, agile development, and lean methodology - in a few words? It seems like the short lyrics really sum up the essence of the flexible, agile, constantly improving ideal of a DevOps "team", and the continuous improvement aspects of lean and agile methodology. So, what does this have to do with machine data analytics and Sumo Logic? Part of the DevOps revolution is a deep and wrenching re-evaluation of the state of IT Operations tools. As the pace of technological change and the ferocity of competition keep increasing for any company daring to make money on the Internet (which is almost everybody at this point), IT departments are facing a difficult problem. Do they try to adapt the process-heavy, top-down approaches exemplified by ITIL, or do they embrace the state of constant change that is DevOps? In the DevOps model, the explosion of creativity that comes with unleashing your development and operations teams to innovate quickly overwhelms traditional, static tools. More fundamentally, the continuous improvement model of agile development and DevOps is only as good as the metrics used to measure success. So, the most successful DevOps teams are incredibly data hungry. And this is where machine data analytics, and Sumo Logic in particular, really comes into its own, and is fundamentally in tune with the DevOps approach. 1. Let the data speak for itself Unlike the management tools of the past, Sumo Logic makes only basic assumptions about the data being consumed (time-stamped, text-based, etc.). The important patterns are determined by the data itself, and not by pre-judging what patterns are relevant, and which are not. This means that as the application rapidly changes, Sumo Logic can detect new patterns - both good and ill - that would escape the inflexible tools of the past. 2. Continuous reinterpretation Sumo Logic never tries to force the machine data into tired old buckets that are forever out of date. The data is stored raw so that it can continually be reinterpreted and re-parsed to reveal new meaning. Fast-moving DevOps teams can't wait for the stodgy software vendor to change their code or send their consultant onsite. They need it now. 3. Any metric you want, any time you want it The power of the new DevOps approach to management is that the people who know the app best, the developers, are producing the metrics needed to keep the app humming. This seems obvious in retrospect, yet very few performance management vendors support this kind of flexibility. It is much easier for developers to throw more data at Sumo Logic by outputting more data to the logs than to integrate with management tools. The extra insight that this detailed, highly specific data can provide into your customers' experience and the operation of your applications is truly groundbreaking. 4.
Set the data free Free-flow of data is the new norm, and mash-ups provide the most useful metrics. Specifically, pulling business data from outside of the machine data context allows you to put it in the proper perspective. We do this extensively at Sumo Logic with our own APIs, and it allows us to view our customers as more than nameless organization ID numbers. DevOps is driven by the need to keep customers happy. 5. Develop DevOps applications, not DevOps tools The IT Software industry has fundamentally failed its customers. In general, IT software is badly written, buggy, hard to use, costly to maintain, and inflexible. Is it any wonder that the top DevOps shops overwhelmingly use open source tools and write much of the logic themselves?! Sumo Logic allows DevOps teams the flexibility and access to get the data they need when they need it, without forcing them into a paradigm that has no relevance for them. And why should DevOps teams even be managing the tools they use? It is no longer acceptable to spend months with vendor consultants, and then maintain extra staff and hardware to run a tool. DevOps teams should be able to do what they are good at - developing, releasing, and operating their apps, while the vendors should take the burden of tool management off their shoulders. The IT industry is changing fast, and DevOps teams need tools that can keep up with the pace - and make their job easier, not more difficult. Sumo Logic is excited to be in the forefront of that trend. Sign up for Sumo Logic Free and prove it out for yourself.

Blog

Finding Needles in the Machine Data Haystack - LogReduce in the Wild

As with any new, innovative feature in a product, it is one thing to say it is helpful for customers - it is quite another to see it in action in the wild. Case in point: I had a great discussion with a customer about using LogReduce™ in their environment. LogReduce is a groundbreaking tool for uncovering the unknown in machine data, and sifting through the inevitable noise in the sea of log data our customers put in Sumo Logic. The customer in question had some great use cases for LogReduce that I would like to share. Daily Summaries With massive amounts of log data flowing through modern data centers, it is very difficult to get a bird's eye view of what is happening. More importantly, the kind of summary that provides actionable data about the day's events is elusive at best. In our customer example, they have been using LogReduce to provide exactly that type of daily, high-level overview of the previous day's log data. How does it work? Instead of using obvious characteristics to group log data like the source (e.g. Windows Events) or host (e.g. server01 in data center A), LogReduce uses "fuzzy logic" to look for patterns across all of your machine data at once - letting the data itself dictate the summary. Log data with the same patterns, or signatures, are grouped together - meaning that new patterns in the data will immediately stand out, and the noise will be condensed to a manageable level. Our customer is also able to supply context to the LogReduce results - adjusting and extending signatures, and adjusting relevance as necessary. In particular, by adjusting the signatures that LogReduce finds, the customer is able to "teach" LogReduce to provide the best results in the most relevant way. This allows them to separate the critical errors out, while still acknowledging the background noise of known messages. The end result is a daily summary that is both more relevant, because of the user-supplied business context, and flexible enough to find important, new patterns. Discovering the Unknown And finding those new patterns is the essence of Big Data analytics. A machine-data analytics tool should be able to find unknown patterns, not simply reinforce the well-known ones. In this use case, our customer already has alerting established for known, critical errors. The LogReduce summary provides a way to identify, and proactively address, new, unknown errors. In particular, by using LogReduce's baseline and compare functionality, Sumo Logic customers can establish a known state for log data and then easily identify anomalies by comparing the current state to the known, baselined state. In summary, LogReduce provides the essence of Big Machine Data analytics to our customers - reducing the constant noise of today's datacenter, while finding those needles in the proverbial haystack. This is good news for customers who want to leverage the true value of their machine data without the huge investments in time and expertise required in the past.
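To make the baseline-and-compare idea concrete, here is a toy sketch with made-up signatures and counts (not Sumo Logic's implementation): record signature counts from a known-good window as the baseline, then flag signatures in the current window that are new or whose volume jumped sharply.

```scala
// Toy illustration of "baseline and compare" -- the thresholds and data are invented.
object BaselineCompareSketch {
  def main(args: Array[String]): Unit = {
    // counts per signature during a known-good window (the baseline)
    val baseline = Map(
      "db timeout on host *" -> 12L,
      "user * logged in"     -> 4200L
    )
    // counts per signature in the current window
    val current = Map(
      "db timeout on host *"                -> 480L,
      "user * logged in"                    -> 4100L,
      "certificate for * expires in * days" -> 9L
    )

    current.foreach { case (sig, count) =>
      baseline.get(sig) match {
        case None                      => println(s"NEW   ($count) $sig")       // never seen in the baseline
        case Some(b) if count > 10 * b => println(s"SPIKE (x${count / b}) $sig") // volume jumped sharply
        case _                         => ()                                     // within the expected range
      }
    }
  }
}
```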

March 19, 2013

Blog

Show Me the VPN Logs!!!

Show Me the Money!!! Show Me the VPN Logs!!! Move over Tesla automobile logs, it’s time for Yahoo VPN logs to get their moment in the sun! Just as soon as log data dropped out of the headlines they came right back, as Yahoo CEO Marissa Mayer announced a ban on telecommuting – with the decision reportedly driven by analysis of the company’s VPN log data. From the VPN data, it’s said that the Yahoo CEO determined too many remote workers were not pulling their weight, as evidenced by their lack of connecting to the VPN and accessing Yahoo’s IT systems. Certainly, VPN logs don’t tell the entire story around telecommuter productivity, but they are an important data point, and the information contained in those logs certainly was compelling for Ms. Mayer. There is of course a bigger picture to this, and it starts with the fact that this is not the first time VPN logs are in the news. (Not even the first time this year!). See this blog post from the Verizon RISK team, where they helped their client identify a developer who took global wage arbitrage to an extreme; he collected his six-figure paycheck in the USA and then outsourced his own job to a Chinese consulting firm, paying that firm a fraction of his salary to do his job for him! How did he do this? Simple: He FedEx’d his RSA token to China. How did he get caught? Simple: They found him sitting in his office while the VPN logs showed him in China. Busted. All thanks to the logs. At the highest level, what do the Tesla, Yahoo, and wage arbitrage stories tell us? Simply put, log data is immensely valuable, it’s increasingly becoming front and center, and it’s not going away anytime soon. We at Sumo Logic couldn’t be happier, as this is further public recognition of the value hidden in machine data (the biggest component of which is log data). We’ve said it many times, log data holds the absolute and authoritative record of all the events that occurred. That’s true for automobile logs, server logs, application logs, device logs, and yes Mr. Developer who outsourced his job to China… VPN logs.

March 7, 2013

Blog

The Marriage of Machine Data and Customer Service

Blog

Using the transpose operator

Sumo Logic lets you access your logs through a powerful query language. In addition to searching for individual log messages, you may extract, transform, filter and aggregate data from them using a sequence of operators. There are currently about two dozen operators available and we are constantly adding new ones. In this post I want to introduce you to a recent addition to the toolbox, the transpose operator. Let's say you work for an online brokerage firm, and your trading server logs lines that look like the following, among other things: 2013-02-14 01:41:36 10.20.11.102 GET /Trade/StockTrade.aspx action=buy&symbol=s:131 80 Cole 219.142.249.227 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_7_3)+AppleWebKit/536.5+(KHTML,+like+Gecko)+Chrome/19.0.1084.54+Safari/536.5 200 0 0 449 There is a wealth of information in this log line, but to keep it simple, let's focus on the last number, in this case 449, which is the server response time in milliseconds. We are interested in finding out the distribution of this number so as to know how quickly individual trades are processed. One way to do that is to build a histogram of the response time using the following query: stocktrade | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by response_time Here we start with a search for "stocktrade" to get only the lines we are interested in, extract the response time using a regular expression, round it up to the next 100 milliseconds, and count the occurrences of each bucket. The result is a two-column table of response-time buckets and their counts. Now, it would also be interesting to see how the distribution changes over time. That is easy with the timeslice operator: stocktrade | timeslice 1m | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time This gets the data we want, but it is not presented in a format that is easy to digest. For example, in the resulting table, the first five rows give us the distribution of response time at 8:00, the next five rows at 8:01, etc. Wouldn't it be nice if we could rearrange the data so that each time slice is a single row, with one column per response-time bucket? That is exactly what transpose does: stocktrade | timeslice 1m | extract "(?<response_time>\d+$)" | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time | transpose row _timeslice column response_time Here we tell the query engine to rearrange the table using time slice values as row labels, and response time as column labels. This is especially useful when the data is visualized. The "stacking" option allows you to draw bar charts with values from different columns stacked onto each other. The length of the bars represents the number of trading requests per minute, and the colored segments represent the distribution of response time. That's it! To find out other interesting ways to analyze your log data, sign up for Sumo Logic Free and try it for yourself!
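For readers who like to see the reshaping spelled out, here is a small standalone sketch, with made-up sample data, of what the query above computes: bucket each response time up to the next 100 ms, count per minute and bucket, then pivot the buckets into columns the way transpose does.

```scala
// Standalone sketch of the bucket / count / transpose steps, on invented data.
object TransposeSketch {
  case class Entry(minute: String, responseTimeMs: Int)

  def main(args: Array[String]): Unit = {
    val entries = List(
      Entry("08:00", 449), Entry("08:00", 180),
      Entry("08:01", 95),  Entry("08:01", 320)
    )

    // toInt(ceil(response_time/100) * 100): round up to the next 100 ms bucket
    val bucketed = entries.map(e => (e.minute, ((e.responseTimeMs + 99) / 100) * 100))

    // count by _timeslice, response_time
    val counts: Map[(String, Int), Int] =
      bucketed.groupBy(identity).map { case (key, hits) => (key, hits.size) }

    // transpose row _timeslice column response_time: one row per minute, one column per bucket
    val buckets = counts.keys.map(_._2).toList.distinct.sorted
    val minutes = counts.keys.map(_._1).toList.distinct.sorted
    println(("minute" :: buckets.map(b => s"${b}ms")).mkString("\t"))
    for (m <- minutes) {
      val cells = buckets.map(b => counts.getOrElse((m, b), 0).toString)
      println((m :: cells).mkString("\t"))
    }
  }
}
```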

February 19, 2013

Blog

A Few Good Logs

“I Want The Logs!” In the midst of this week's back and forth between Tesla, the New York Times, and various other media outlets and bloggers, Greylock Data Scientist in Residence (and Sumo Logic Advisory Board Member) DJ Patil posted a tweet that caught my eye: "Love that everyone is using data to have a conversation. It's about getting to the right answer." DJ is 100% correct, and throughout this Tesla/NY Times debate, we at Sumo Logic are thrilled to see the public recognition of the importance of log data — as a source of the truth. Yes, log data needs to be properly analyzed and understood (as the debate makes evident), but what clearly emerged from the debate is the truism that log data holds the absolute and authoritative record of all the events that occurred. It's evident; just see how the discussion revolves entirely around understanding the logs. The Bigger Picture There is a bigger picture to this debate, which is that log data is generated everywhere, whether it be from the car you drive, the energy meter beside your home, the device you're using to read this blog, the server delivering this content, the network delivering this content, the device I'm using to write this post… I could go on and on. And in the same way log files generated by a car hold the answer to whether it ran out of power or met range estimates, log files generated by applications, servers, network and virtualization infrastructure hold the answer to whether revenue-generating applications are up and adequately performing, whether customers are utilizing a newly developed feature, or whether any part of an IT infrastructure is slow or experiencing downtime. It is important to remember — these are all business-critical questions. And just like Tesla needed to analyze their logs to defend their business, every enterprise, large or small, needs to be able to easily analyze and visualize their log data to ensure the health of their business. Cars, Enterprises, and Terabytes Before moving on, let's not forget, enterprises are not cars, and data generated from enterprises is different from data generated by cars, particularly along three dimensions: volume, variety, and velocity. You got it… the 3 Vs of Big Data. Cars do not (or at least do not yet!) generate up to terabytes of unstructured data per day. Enterprises with large distributed IT environments do. This is where Sumo Logic comes in. Sumo Logic is based on the recognition that enterprises need to be able to easily analyze and visualize the massive amounts of data generated by their infrastructure and business, and that current on-premise tools just can't scale. Today, enterprises generate as much data in 10 minutes as they did in the entire year of 2003. It is therefore not surprising that legacy on-premise solutions just can't keep up. Sumo Logic makes it possible for enterprises of all sizes to find the truth in their data. And we do so without adding any operational overhead for our customers; Sumo Logic is a 100% cloud-based service. Large enterprises like Netflix and Land O'Lakes use Sumo Logic. Fast-growing enterprises like PagerDuty and Okta do as well. You want some answers? You have some logs? We can handle the logs. Contact us here, or try it out for yourself by signing up for Sumo Logic Free.

February 15, 2013

Blog

A Visit To the Other Coast

Vance and I spent a week on the East Coast talking with a variety of analysts about the Sumo Logic story. Apart from the usual questions (“where did you come up with that name?”), there were a number of interesting observations from our first ‘tour’. Different aspects of our story appeal to different analysts, depending on particular research areas. Some people latched onto our “Analytics for IT” story and were interested in a deep understanding of how we plan to take LogReduce and its associated capabilities to the proverbial next level. Others were interested in understanding just how elastic our cloud-based architecture is to support the potential bursting and scaling needs of a number of different clients. Still others focused on the ROI potential of our solution across a variety of different use cases. Once we actually showed a live example of how LogReduce works (hello, Sumo on Sumo) everyone instinctively understood the huge operational and business value that LogReduce brings by distilling hundreds of thousands of log messages into a set of twenty to thirty real patterns. Thank goodness for ubiquitous WiFi. My most interesting meeting was with about 20 people from a particular banking outfit with whom I spent the first ten minutes explaining what log files were and why analyzing them could uncover real insights from a business. Getting back to first principles was illuminating because without explaining the business reason for looking at log files, your so-called features are almost irrelevant. We have our sales kickoff this week. There’s a ton of energy across every group, not just because of the success we’ve had but also from the enormous opportunity to help small and large businesses generate value from their machine data. We’d love to get your feedback on our service – try Sumo Logic Free and tell us what you think.

February 13, 2013

Blog

Pardon me, have you got data about machine data?

I'm glad you asked, I just might. In fact, we started collecting data about machine data some 9 months ago when we participated in the AWS Big Data conference in Boston. Since then we have continued collecting the same data at a variety of industry shows and conferences such as VMworld, AWS re:Invent, Velocity, Gluecon, Cloud Slam, Defrag, DataWeek, and others. The original survey was printed on my home printer, 4 surveys per page, then inexpertly cut with the kitchen scissors the night before the conference – startup style, oh yeah! The new versions made it onto a shiny new iPad as an iOS app. The improved method, Apple cachet, and a wider reach gave us more than 300 data points and, incidentally, cost us more than 300 Sumo Logic T-shirts, which we were more than happy to give up in exchange for data. (Btw, if you want one, come to one of our events - the next one coming up is the Strata Conference.) As a data junkie, I've been slicing and dicing the responses and thought that the end of our fiscal year could be the right moment to revisit it and reflect on my first blog post on this data set. Here is what we asked: Which business problems do you solve by using machine data? Which tools do you use to analyze machine data in order to solve those business problems? What issues do you experience solving those problems with the chosen tools? The survey was partially designed to help us better understand Sumo Logic's segment of the IT Operations Management or IT Management markets as defined by Gartner, Forrester, and other analysts. I think that the sample set is relatively representative. Responders come from shows with varied audiences such as developers at Velocity and GlueCon, data center operators at VMworld, and folks investigating a move to the cloud at AWS re:Invent and Cloud Slam. Answers were actually pretty consistent across the different "cohorts". We have a statistically significant number of responses, and finally, they were not our customers or direct prospects. So let's dive in and see what we've got, and let's start at the top: Which business problems do you solve by using logs and other machine data? Applications management, monitoring, and troubleshooting (46%) IT operations management, monitoring, and troubleshooting (33%) Security management, monitoring, and alerting (21%) Does anything in there surprise? I guess it depends on what your point of reference is. Let me compare it to the overall "IT Management" or "IT Operations Management" market. The consensus (if such a thing exists) is that the size by segment is: IT Infrastructure (servers, networks, etc.) is up to 50-60% of the total market Application (internal, external, etc.) is just north of 30-40% Security is around 10% Source: Sumo Logic analysis of aggregated data from various industry analysts who cover the IT Management space. There are a few things that could explain the big difference in how much our subsegment leans toward Applications vs. IT Infrastructure. (hypothesis #1) analysts measure total product sold to derive the market size, which might not be the same as the effort people apply to these use cases. (hypothesis #2) there is more shelfware in IT Infrastructure, which overrepresents effort. (hypothesis #3) there are more home-grown solutions in Application management, which underrepresents effort.
(hypothesis #4) our data is an indicator or a result of a shift in the market (e.g., when enterprises shift toward the IaaS, they spend less time managing IT Infrastructure and shift more toward the core competency, their applications). (obnoxious hypothesis #5) intuitively, it’s the software stupid – nobody buys hardware because they love it, it exists to run software (applications), and we care more about applications, and that’s why it is so. OK, ok, let’s check the data to see which hypothesis can our narrow response set help test/validate. I don’t think our data can help us validate hypothesis #1 or hypothesis #2. I’ll try to come up with additional survey questions that will, in the future, help test these two hypotheses. Hypothesis #3 on the other hand might be partially testable. If we compare responses from users who use commercial vs. who use home-grown, we are left with the following: Not a significant difference between responders who use commercial vs. responders who use home grown tools. Hypothesis #3 explains only a couple of percentage points of difference. Hypothesis #4 – I think we can use a proxy to test it. Let’s assume that responders from VMworld are focused on internal data center and the private cloud. In this case they would not be relying as much on IaaS providers for IT Infrastructure Operations. On the other hand, let’s also assume that AWS, and other cloud conference attendees are more likely to rely on IaaS for IT Infrastructure Operations. Data please: Interesting, seems to explain some shift between security and infrastructure, but not applications. So, we’re left with: hypothesis #1 – spend vs. reported effort is skewed – perhaps hypothesis #2 – there is more shelfware in IT infrastructure – unlikely obnoxious hypothesis #5 – it’s the software stupid – getting warmer That should do it for one blog post. I’ve barely scratched the surface by stopping with the responses to the first question. I will work to see if I can test the outstanding hypotheses and, if successful, will write about the findings. I will also follow-up with another post looking at the rest of the data. I welcome your comments and thoughts. While you’re at it, try Sumo Logic for free.

AWS

January 31, 2013

Blog

Why I joined Sumo Logic and moved to Silicon Valley

We make hundreds of decisions every day, mostly small ones, that are just part of life’s ebb and flow. And then there are the big decisions that don’t merely create ripples in the flow of your life – they redirect it entirely. The massive, life-defining decisions like marriage and children; the career-defining decisions like choosing your first job after college. I’ve had my share of career-defining decisions – leaving a physics graduate program to chase after the dot com craze, leaving consulting for sales engineering, etc. The thing about this latest decision is that it combines both. I am joining Sumo Logic, leaving behind a safe job in marketing, and moving to Silicon Valley – away from my friends, family, and community. So, why did I do it? Now is the time for Start-Ups in Enterprise Software. Consumer start-ups get all the press, but the enterprise startups are where the real action is. The rash of consolidations in the last five years or so has created an innovation gap that companies like Sumo Logic are primed to exploit. The perfect storm of cloud computing, SaaS, Big Data, and DevOps/Agile is forcing customers to start looking outside of their comfort zones to find the solutions they need. Sumo Logic brings together all of that innovation in a way that is too good to not be a part of it. The Enterprise SaaS Revolution is Inevitable. The SaaS business model, combined with Agile development practices, is completely changing the ways companies buy enterprise software. Gartner sees companies replacing legacy software with SaaS more than ever. The antiquated term-licenses of on-premise software with its massive up-front costs, double digit maintenance charges, and “true-ups” seem positively barbaric by comparison to the flexibility of SaaS. And crucially for me, Sumo Logic is also one of the few true SaaS companies that is delving into the final frontier of the previously untouchable data center. Big Data is the “Killer App” for the Cloud. “Big Data” analytics, using highly parallel-ized architectures like Hadoop or Cassandra, is one of the first innovations in enterprise IT to truly be “born in the cloud”. These new approaches were built to solve problems that just didn’t exist ten, or even five, years ago. The Big Data aspect of Sumo Logic is exciting to me. I am convinced that we are only scratching the surface of what is possible with Sumo Logic’s technology, and I want to be there on the bleeding edge with them. Management Teams Matter. When it really comes down to it, I joined Sumo Logic because I have first-hand knowledge of the skills that Sumo Logic’s management team brings to the table. I have complete confidence in Vance Loiselle’s leadership as CEO, and Sumo Logic has an unbeatable combination of know-how and get-it-done people . And clearly some of the top venture capital firms in the world agree with me. This is a winning team, and I like to win! Silicon Valley is still Nirvana for Geeks and the best place for Start-Ups. Other cities are catching up, but Silicon Valley is still the best place to start a tech company. The combination of brainpower, money, and critical mass is just hard to beat. On a personal level I have resisted the siren call of San Francisco Bay Area for too long. I am strangely excited to be in a place where I can wear my glasses as a badge of honor, and discuss my love for gadgets and science fiction without shame. 
Luckily for me, I am blessed with a wife who has embraced my geek needs and supports me wholeheartedly (and a 21-month-old who doesn't care either way). So, here's to a great adventure with the Sumo Logic team, to a new life in Silicon Valley, and to living on the edge of innovation. P.S. If you want to see what I am so excited about, get a Sumo Logic Free account and check it out.

AWS

January 28, 2013

Blog

Mapping machine data (pun intended)

When you’re talking analytics, who said that an unfair advantage has to be ugly? Our newest feature is drop-dead gorgeous: What you’re seeing is the result of a geo lookup query, which matches extracted IP addresses to their geographical location–another troubleshooting tool from Sumo Logic. (If you’re ready to skip right to the good stuff and start using this feature, see our Knowledge Base article here.) Geo lookup queries use four Sumo Logic search language components: IP addresses are parsed, then the lookup operator compares the extracted IPs against a hosted IP geolocation table. The count and sort aggregate functions order the data; using these aggregate functions allows you to add a map to a Dashboard. The results are plugged in to the Google Maps API, and in a few seconds you’ve got a map showing the location of IP addresses. The syntax looks like this: | parse “remote_ip=*]” as ip_address | lookup latitude, longitude, country_code, country_name, city, postal_code from geo://default on ip = ip_address | count by latitude, longitude, country_code, country_name, city, postal_code | sort _count It’s important to note the flexibility of geolocation fields that you can choose to use in geo lookup queries. Longitude and latitude are required, but the hosted geolocation table includes fields for different levels of granularity, such as country_name, postal_code, and area_code; depending on the area of the world you’re concentrating on, you can pick and choose which fields make sense in your query. I also like using the familiar Google Maps interface–there’s no learning curve. The zoom slider/control is displayed both in the Search page, and in a Dashboard: In addition, clicking one of the markers on a map immediately zooms down to street level, meaning that you don’t have to worry about zooming on the wrong area: To learn more about using geo lookup queries to build maps, see Mapping IP addresses with geo lookup queries in the Sumo Logic Labs beta feature section of our Support Portal. While you’re there, be sure to drop us a line! Or, get started now using Sumo Logic Free!
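As a rough illustration of what the parse / lookup / count / sort pipeline is doing (entirely outside of Sumo Logic, with a tiny made-up table standing in for the hosted geolocation data), the same sequence of steps looks roughly like this:

```scala
// Conceptual sketch of the geo lookup pipeline on invented data -- not the hosted geo://default table.
object GeoLookupSketch {
  case class Location(latitude: Double, longitude: Double, countryName: String, city: String)

  def main(args: Array[String]): Unit = {
    // stand-in for the hosted IP geolocation table
    val geoTable: Map[String, Location] = Map(
      "219.142.249.227" -> Location(39.9, 116.4, "China", "Beijing"),
      "64.124.61.100"   -> Location(37.8, -122.4, "United States", "San Francisco")
    )
    val logLines = List(
      "... remote_ip=219.142.249.227] ...",
      "... remote_ip=64.124.61.100] ...",
      "... remote_ip=219.142.249.227] ..."
    )
    val ipPattern = """remote_ip=([^\]]+)\]""".r

    val counts = logLines
      .flatMap(line => ipPattern.findFirstMatchIn(line).map(_.group(1)))   // parse "remote_ip=*]" as ip_address
      .flatMap(geoTable.get)                                               // lookup ... from the geolocation table
      .groupBy(identity).map { case (loc, hits) => (loc, hits.size) }      // count by the location fields
      .toList.sortBy(-_._2)                                                // sort _count

    counts.foreach { case (loc, n) =>
      println(s"${loc.city}, ${loc.countryName} (${loc.latitude}, ${loc.longitude}): $n")
    }
  }
}
```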

January 25, 2013

Blog

Beyond LogReduce: Refinement and personalization

LogReduce is a powerful feature unique to the Sumo Logic offering. At the click of a single button, the user can apply the Summarize function to their previous search results, distilling hundreds of thousands of unstructured log messages into a discernible set of underlying patterns. While this capability represents a significant advance in log analysis, we haven't stopped there. One of the central principles of Sumo Logic is that, as a cloud-based log management service, we are uniquely positioned to deliver a superior service that learns and improves from user interactions with the system. In the case of LogReduce, we've added features that allow the system to learn better, more accurate patterns (refinement), and to learn which patterns a given user might find most relevant (personalization). Refinement Users have the ability to refine the automatically extracted signatures by splitting overly generalized patterns into finer-grained signatures or editing overly specific signatures to mark fields as wildcards. These modifications will then be remembered by the Sumo Logic system. As a result, all future queries run by users within the organization will be improved by returning higher-quality signatures. Personalization Personalized LogReduce helps users uncover the insights most important to them by capturing user feedback and using it to shape the ranking of the returned results. Users can promote or demote signatures to ensure that they do (or do not) appear at the top of Summarize results. Besides obeying this explicit feedback, Sumo Logic also uses this information to compute a relevance score which is used to rank signatures according to their content. These relevance profiles are individually tailored to each Sumo Logic user. For example, consider these Summarize query results: Since we haven't given any feedback yet, their relevance scores are all equal to 5 (neutral) and they fall back to being ranked by count. Promotion Now, let's pretend that we are in charge of ensuring that our database systems are functioning properly, so we promote one of the database-related signatures: We can see that the signature we have promoted has now been moved to the top of the results, with the maximum relevance score of 10. When we do future Summarize queries, that signature will continue to appear at the top of results (unless we later choose to undo its promotion by simply clicking the thumb again). The scores of the other two database-related signatures have increased as well, improving their rankings. This is because the content of these signatures is similar to the promoted database signature. This boost will also persist to future searches. Demotion This functionality works in the opposite direction as well. Continuing our running example, our intense focus on database management may mean that we find log messages about compute jobs to be distracting noise in our search results. We could try to "blacklist" these messages by putting Boolean negations in our original query string (e.g., "!comput*"), but this approach is not very practical or flexible. As we add more and more terms to our search, it becomes increasingly likely that we will unintentionally filter out messages that are actually important to us. With Personalized LogReduce, we can simply demote one of the computation-related logs: This signature then drops to the bottom of the results.
As with promotion, the relevance and ranking of the other, similar computation-related signature have also been lowered, and this behavior will persist across other Summarize queries for this user. Implicit feedback Besides taking into account explicit user feedback (promotion and demotion), Summarize can also track and leverage the implicit signals present in user behavior. Specifically, when a user does a "View Details" drill-down into a particular signature to view the raw logs, this is also taken to be a weaker form of evidence to increase the relevance scores of related signatures. Conclusion The signature refinement and personalized relevance extensions to LogReduce enable the Sumo Logic service to learn from experience as users explore their log data. This kind of virtuous cycle holds great promise for helping users get from raw logs to business-critical insights in the quickest and easiest way possible, and we're only getting started. Try these features out on your own logs at no cost with Sumo Logic Free and let us know what you think!
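As a toy illustration of the promotion behavior described above (this is not Sumo Logic's actual relevance model, whose internals the post doesn't spell out), you can think of promotion as pinning a signature's score to the maximum and nudging similar signatures upward in proportion to how much content they share:

```scala
// Invented toy model: the promoted signature gets the max score, similar signatures get a
// content-overlap boost, and unrelated ones stay at the neutral score of 5.
object RelevanceSketch {
  def tokens(sig: String): Set[String] = sig.toLowerCase.split("\\W+").filter(_.nonEmpty).toSet

  // Jaccard overlap between the token sets of two signatures
  def similarity(a: String, b: String): Double = {
    val (ta, tb) = (tokens(a), tokens(b))
    ta.intersect(tb).size.toDouble / ta.union(tb).size.toDouble
  }

  def main(args: Array[String]): Unit = {
    val signatures = List(
      "database connection pool exhausted for shard *",
      "database replication lag on shard * exceeds * ms",
      "compute job * finished in * seconds"
    )
    val promoted = signatures.head

    val scored = signatures.map { sig =>
      val score =
        if (sig == promoted) 10.0                    // explicit promotion pins the maximum score
        else 5.0 + 5.0 * similarity(sig, promoted)   // neutral 5, boosted toward similar content
      (sig, math.min(10.0, score))
    }
    scored.sortBy(-_._2).foreach { case (sig, score) => println(f"$score%4.1f  $sig") }
  }
}
```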

January 23, 2013

Blog

2013: The year of machine data science?

Since I was a kid, I have had a fascination with chess-playing programs – until it got to a point where it became impossible for me to beat a good chess program. And years ago, not long after I gave up my personal fight with them, the last man standing lost to the best chess-playing program. Clearly, for chess, machine intelligence overtook human intelligence that day. Another area where machine intelligence has evolved to a point where it's better than human intelligence is maps and navigation. I used to have to carry a road atlas with me or risk spending a lot of time just finding my way back on track. It got a little better when you could take a printout, but if I missed an exit or wanted to go for a scenic detour – I was again on my own. Not any more: now I can simply plug in my phone, speak the next destination, and it guides me patiently to that destination – recalculating the route if I miss an exit, heck, even warning me when the route is blocked with traffic. These are just a couple of examples of how technology evolves to a point that would have seemed a sci-fi fantasy 10-15 years ago. And it fundamentally changes how we all go about our lives. Machine Data Analytics seems like another area desperately in need of a similar evolution. Machine Data Analytics has to evolve into Machine Data Science – and it has to evolve to a point where we depend on it and use it just as I rely on maps and navigation on my cell phone. And Sumo Logic is at the forefront of making that change happen – there are some fundamental shifts in computing technology that bring that breakthrough within reach. Cloud has become as mainstream as video streaming. And just like video streaming completely disrupted brick-and-mortar DVD rental businesses, the Cloud has already brought fundamentally disruptive technologies to life, and continues to do so. So what does the Cloud mean for Machine Data? It will be about generating sophisticated insights from the data generated by IT today. Machine data is already one of the biggest sources of "Big Data" in enterprises. It will be about delivering smarter analytics at scale with the simplicity of a service. As the new year begins, I feel proud and satisfied with what we have accomplished in the last two and a half years. And super excited about the journey ahead – a future is waiting to be invented. 🙂

January 10, 2013

Blog

Me at the End of the World

Blog

60 Days, But I'm Not Counting

My first couple of months at Sumo Logic have been a whirlwind of good activity and I thought I'd share some of the reasons that got me excited about the opportunity and that continue to hold true after talking with customers, partners, and prospects. Sumo Logic = Short Time to Operational Value When I started the interview process with Sumo Logic, I registered for our free product to understand the experience. It took me 15 minutes to register, download a collector, point it to log files on my laptop and see the first set of analyses in the Sumo Logic cloud-based application. If the process was fairly seamless to me as a marketing person, it certainly boded well for the technical IT and Operations teams who typically work with log management products like ours. Powerful Analytics (Of Two Kinds) It's one thing to ingest the volumes of machine data that get generated every day; it's quite another to provide useful insights into what that data means for the business. The Sumo Logic approach has been to focus on two types of analytics. a) The first is to focus on the "known unknowns" issues associated with your machine data - typically these approaches consist of some combination of search functions, reports and real-time dashboards. b) The second is the far more difficult process of helping customers analyze issues that fall into the "unknown unknowns" category, and this is what got me extremely excited about what Sumo Logic is doing. If you have hundreds of pages of log files, how do you know what to search for? Enter the concept of LogReduce, our patent-pending technology that enables companies to take hundreds or thousands of pages of logs and distill them down into a meaningful set of patterns. These patterns enable you to understand issues with your infrastructure that you typically had no idea were occurring. The Opportunity For Community I'm a big believer that communities are a natural way to help people solve problems, get answers to pressing questions, and feel like they're part of a network that can help them professionally and personally. Having run a developer program at scale in a past life, I think there is plenty of opportunity to build a community of passionate Sumo Logic users around the world, at both a program level and within the product itself, which is easier to do with a cloud-based service like ours. If you're interested in seeing why I'm excited, try Sumo Logic Free and participate in our community via Twitter.

December 17, 2012

Blog

AWS re:Invent - The Future is Now

Blog

Don't Just Move your Data-Center (Part I)

A couple of weeks ago I gave a cool little web presentation (I say cool because I like doing those decently more than I like sitting at my desk, and I say little because I went for 33 minutes, and I know I could have gone for 90...) about cloud security best practices and design principles (I will be giving this talk again, BTW, on January 9th for the Amazon Web Services Ecosystem) and I got a pretty good question from one of the viewers. They wanted to know “what mistakes do people make when they utilize cloud based infrastructure providers?” and I thought that was an excellent question, and since not all of you were there to hear my answer, I’m here to share it, and expound on it a little bit. In my opinion, the biggest mistake you can make in adopting Infrastructure as a Service (IaaS) providers is to just move your data-center into the cloud wholesale in basically the same shape it is already in. Now, certainly I understand this temptation! You have probably spent a lot of time creating scripts and setting up access controls and logging mechanisms, and everything that comes with building your deployment in a traditional (what I call ‘data-center-centric’) way. And so it may seem that the best, fastest, easiest and cheapest thing to do is to simply pick it up and move it, as it were, to your new “hosting provider”, but this approach may well leave you missing out on some of the best reasons to run in the cloud. IaaS providers, such as Amazon Web Services, offer a multitude of services and features that can vastly improve your operational efficiency, scalability, and security, but they must be properly leveraged. Cloud computing, while similar in some respects to hosting, is an entirely different paradigm, and in order to take full advantage of its benefits, some time and care needs to be taken in the design phase of such a project. I like to compare the differences in cloud versus data-center configurations in terms of two types of gambling/entertainment most of you will be familiar with: playing Three Card Monte on the street (your data-center) and going to gamble in a major casino (the cloud). In a Three Card Monte scenario, the ‘house’ makes its money by keeping a tight level of control over the game. They know exactly which card the token is under at all times, they can palm or move the token at will, and they will have one or more shills in the audience to help them control the crowd’s reactions. This can be a very profitable endeavor for the ‘house’, but it is not scalable to large crowds, multiple dealers, or to environments where there is a high degree of scrutiny. In contrast, a casino is designed to achieve the same ends (to take your money and provide some entertainment in the process) but does so in a very different way. The casino relies on statistics in order to come out ahead on any given day. The casino can’t control (due to regulators) which slot machines will pay out exactly how much exactly when, nor can they control which blackjack dealers will have good or bad nights, and they cannot ‘fix’ the roulette wheel, and yet the house always wins. This model is scalable to large crowds, multiple dealers and games, and even high degrees of scrutiny. It is through exercising control at a higher level and giving up control at the lower level that they are able to achieve this scalability and profitability. It works the same in the cloud.
Rather than having precise control over your hardware and network connections, you exercise control at the design level by creating feedback loops and auto-scaling triggers, and by catching and reacting to exceptions. This allows you, much the same as the casino, to give up control over many of the details, and still ensure you always win at the end of the day. So just as it would be impractical to set up a Three Card Monte table in a modern casino, simply hauling your existing design into the cloud is not the best approach. Take the time to re-design your system to utilize all of the great advantages that IaaS providers such as Amazon provide. Tune in later for Part II.
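To make "control at the design level" slightly more concrete, here is a minimal sketch of a feedback loop that adjusts an Auto Scaling group's desired capacity from Scala using the AWS SDK for Java. The thresholds, the group name and the loadPerInstance measurement are illustrative placeholders, not a recommended policy.

```scala
import com.amazonaws.services.autoscaling.AmazonAutoScalingClient
import com.amazonaws.services.autoscaling.model.SetDesiredCapacityRequest

// Simplified feedback loop: measure load, then let the IaaS provider converge
// the fleet to the new desired size. Thresholds and the measurement are placeholders.
object ScalingLoopSketch {
  val autoScaling = new AmazonAutoScalingClient()

  // Stand-in for a real measurement (queue depth per instance, CPU, etc.).
  def loadPerInstance(): Double = sys.error("placeholder: read a metric here")

  def adjust(group: String, current: Int): Int = {
    val load = loadPerInstance()
    val desired =
      if (load > 0.8) current + 1                        // scale out
      else if (load < 0.2 && current > 1) current - 1    // scale in
      else current
    if (desired != current) {
      autoScaling.setDesiredCapacity(new SetDesiredCapacityRequest()
        .withAutoScalingGroupName(group)
        .withDesiredCapacity(desired))
    }
    desired
  }
}
```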

December 3, 2012

Blog

Sumo Logic Raises $30M in Series C Funding

This morning, Sumo Logic announced a $30M Series C investment round. On behalf of our entire company, I am pleased to welcome Accel Partners to the Sumo Logic family, and to join us in our mission of enabling IT and Operations teams to generate instant, actionable insights from the vast amount of machine data their organizations generate. This investment round, led by Accel and joined by existing investors Greylock Partners, Sutter Hill Ventures and Shlomo Kramer, is a major testament to the value that enterprises are gaining from deploying Sumo Logic’s powerful and highly scalable cloud-based log management and analytics service. It is also further testimony to the natural intersection between Cloud and Big Data, and Sumo Logic’s leading position as the Enterprise Cloud for Machine Data. With Sumo Logic, customers obtain a number of benefits not available with traditional on-premise solutions: A seamless and elastic Big Data platform which automatically scales to meet the demands of the modern enterprise Real-time monitoring and visualization powered by our streaming query engine Significantly reduced TCO as there is no need to deploy expensive on-premise hardware or dedicated personnel Patent pending real-time analytics that help you search for what you know and analyze what you don’t Rapid time to value through immediate insight from vast amounts of machine data With this $30M investment, we will further accelerate research and development and expand our innovations around machine data and analytics. Which of course means we are hiring the best and brightest. If you have passion for Big Data and the Cloud, and the talent to go along with it, we’d love to speak with you. Or, if you are a potential customer and want to experience the power of Sumo Logic for yourself, sign up instantly and for free. And lastly, if you’re interested in knowing a bit more about why I’m so bullish, check out my first blog post as Sumo Logic CEO. Everything I wrote then is even more true today.

November 29, 2012

Blog

Real-time Enterprise Dashboards, Really

November 14, 2012

Blog

We hire Data Scientists... so our customers don't have to

At last week’s DataWeek conference in San Francisco, Stefan Zier, Manager of Sumo Logic’s cloud and infrastructure group, spoke on a panel titled “Analytics-as-a-Service”. During that session, the moderator, Karthik Kannan of VMware, asked if having Data Scientists on staff was a necessity or a luxury. Stefan gave a brilliant answer, stating that at Sumo Logic, we hire data scientists so our customers don’t have to. Read on to see why… At Sumo Logic, we have built a highly scalable cloud platform to enable organizations to instantly derive operational and business insights from their log and machine data. Log and machine data (for example, application logs, Apache logs, VMware logs, server logs, IIS logs, Linux logs, network logs, etc.) are the largest components of Big Data, and one key challenge of Big Data is that it is too large for humans to know what questions to ask of it. Therein come our Data Scientists, building machine-learning algorithms to instantly deliver insights to customers from terabytes of their log and machine data. In a nutshell, our Data Scientists are taking what is generally seen as their domain (the ability to extract insight from massive amounts of data), and putting it into the hands of all: business and IT executives, business and data analysts, operations managers, developers, etc. As a result, executives can make critical IT and business decisions from the freshest set of data, operations managers can monitor their environment in real time, and developers and operations personnel can troubleshoot production applications 90% faster than they were able to previously. Curious to know more? Check out our website where you can read about our patent-pending LogReduce™, and see a recent blog post by one of our co-founders and VP of Analytics. Or see for yourself by signing up for Sumo Logic Free, a fully-featured version of our enterprise solution allowing up to 3.5GB of total storage.

October 5, 2012

Blog

Scala at Sumo: type classes with a machine learning example

At Sumo Logic we use the Scala programming language, and we are always on the lookout for ways that we can leverage its features to write high-quality software. The type class pattern (an idea which comes from Haskell) provides a flexible mechanism for associating behaviors with types, and context bounds make this pattern particularly easy to express in Scala code. Using this technique, we can extend existing types with new functionalities without worrying about inheritance. In this post we introduce a motivating example and examine a few different ways to express our ideas in code before coming back to type classes, context bounds, and what they can do for us. Machine learning example: fixed length feature vector representation Consider a machine learning setting where we have a very simple binary linear classifier that classifies an item according to the sign of the inner product between a weight vector and a fixed-length feature vector representation of that item. For this classifier to work on elements of some type T, we need a way to convert an instance of T into an Array[Double] for our inner product calculation. Converting some object of interest into this more abstract representation is a very common pre-processing task for machine learning algorithms, and is often handled in the simplest possible way, which we introduce as Option #1. Option 1: require the caller to convert their data to Array[Double] before calling our code. This is the simplest possible approach, but it has a few disadvantages. First, we’re forcing the caller to do the conversions and bookkeeping around the association between the original objects of type T and their corresponding feature vectors. We are also disregarding the assistance of Scala’s type system – it could be possible to accidentally train the same classifier on incompatible types T1 and T2, because the classifier only knows about Array[Double]. Extending this idea, consider a NearestNeighbors class that uses feature vectors to return the k nearest neighbor training examples for a given test example. This class could potentially return feature vectors constructed from different original types than T, which may not be the desired behavior. Option 2: add a parameter T => Array[Double] that allows us to compute a feature vector from an instance of T. This has some advantages over Option #1: the caller no longer worries about the associations between instances of T and Array[Double]. We also gain the benefits of Scala’s type-checking, in that a LinearClassifier[T1] will complain about being asked to classify a T2. However this approach also introduces additional burdens: if we call other methods that require this functionality, we need to continue passing the function around, or we need to evaluate it and pass its result (an Array[Double]) around, essentially assuming the bookkeeping burden from Option #1; and if we want to enrich our feature representation with more information (e.g., an IndexedSeq[String] describing each feature), we would need to add another function or another output to this function. This could quickly become messy. Option 3: define a FeatureVector trait with a method features: Array[Double], adding the type bound T <: FeatureVector. Now we are getting somewhere – this design alleviates the two stated disadvantages of Option #2. However, this unfortunately assumes that the original definition of T must have been aware of the FeatureVector trait in order to extend it.
Alternatively, the caller could define and instantiate some “wrapper” class that extends FeatureVector. One hazard of the wrapper approach is that it could become unwieldy if we want to further augment T with other functionality. We would either need separate wrappers for every combination of functionalities, or we would need to nest wrappers somehow. If we did decide to use wrappers, we could use Scala’s implicit machinery to automate the wrapping/unwrapping with a view bound T <% FeatureVector. Type classes The concept of sub-typing is usually associated with an is-a relation: the bound T <: FeatureVector asserts that our code can treat T as a FeatureVector in the sense that T will supply a features() method. A type class can be thought of as a has-a relation: we would like to ensure that type T has some associated type FeatureVector[T] capable of computing features(x: T) for any given instance x of type T. This design facilitates separation of concerns while still leveraging the type system: instead of having our item T know how to turn itself into an Array[Double], we can easily drop in different implementations of FeatureVector[T] that may compute features(x: T) in different ways. Also, by supplying separate T-parametrized types for different functionalities, we can mix and match behaviors onto our pre-existing type T more easily than we could with wrappers. A classic example from the Scala standard libraries is the Ordering[T] trait: rather than requiring instances of T to supply an implementation of compare(x: T), we can rely on the existence of an Ordering[T] capable of computing compare(x: T, y: T). Option 4: define a type FeatureVector[T] and use the context bound T : FeatureVector. In this version, we have a type-parameterized trait FeatureVector[T], which we define as being able to compute features(x: T) for instances of our type T. Similar to a view bound, the context bound LinearClassifier[T : FeatureVector] asserts the existence of an implicit instance of type FeatureVector[T] in the scope of the caller, and then threads it through to all further calls. If no FeatureVector[T] is available, we will get an error at compile time. For a full runnable example using this approach, see this gist. Note that we could use this design without context bounds or implicits by explicitly passing around an instance of FeatureVector[T], similar to how we passed a function around in Option #2 above. Like Option #2 however, we would have the burden of passing an extra argument around, and this inconvenience might discourage us from factoring our code in this way. This has been a slightly circuitous journey, but I hope the running example and discussion of different alternatives have helped to illustrate the usefulness of the type class pattern and context bounds in Scala. See the references below for more details. REFERENCES Daniel C. Sobral’s epic StackOverflow answer is a great explanation of these ideas, and he also has a related blog post. Another good StackOverflow discussion covers similar ground. The excellent book Scala in Depth briefly discusses type classes in Scala, and there is more discussion of type class concepts in the context of the very interesting scalaz library.
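For reference, here is a minimal, self-contained sketch of Option #4 along the lines described above. The Document type and its toy featurization are illustrative assumptions; this is not the original gist.

```scala
// Sketch of the type class approach (Option #4): a FeatureVector[T] type class,
// a context bound on the classifier, and an implicit instance supplied by the caller.
trait FeatureVector[T] {
  def features(x: T): Array[Double]
}

class LinearClassifier[T: FeatureVector](weights: Array[Double]) {
  private val fv = implicitly[FeatureVector[T]]

  // Inner product between the weight vector and the item's feature vector.
  def score(x: T): Double =
    fv.features(x).zip(weights).map { case (f, w) => f * w }.sum

  def classify(x: T): Boolean = score(x) > 0.0
}

// A caller-side type and its FeatureVector instance.
case class Document(text: String)

object Document {
  implicit val documentFeatures: FeatureVector[Document] =
    new FeatureVector[Document] {
      // Toy featurization: character count and token count.
      def features(d: Document): Array[Double] =
        Array(d.text.length.toDouble, d.text.split("\\s+").length.toDouble)
    }
}

object TypeClassExample extends App {
  val classifier = new LinearClassifier[Document](Array(0.01, -0.5))
  println(classifier.classify(Document("the quick brown fox")))
}
```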

September 24, 2012

Blog

Splunk introduces Storm...welcome to the cloud.

Recently Splunk announced the availability of its cloud offering, which is just further validation that large enterprises, along with the rest of the world, are moving from on-premise to cloud solutions. In this post I’ll share a few thoughts on why Sumo Logic started in the cloud and raise a few questions about Splunk's announcement as it relates to the broader machine data search and analytics market. 1. Why did Sumo Logic start in the Cloud? I joined Sumo Logic with first-hand experience trying to shift an on-premise software company to a cloud agenda. The reality is that if you look at the traction and growth in every major category you will see cloud vendors rapidly taking share from the much larger on-premise incumbents. Salesforce.com started it, but recent examples include ServiceNow which just went public, SuccessFactors which was bought by SAP, and Workday, which is in hyper-growth as evidenced by its recent S-1 filing. What makes this transition so hard? On-premise vs. cloud architecture. Most on-premise software companies make the classic mistake of trying to port their legacy architecture to the cloud so they can take advantage of work and features that have already been done. In the majority of cases, this just does not work. Even the newest on-premise software companies tend to be at least 7-8 years old, with their underlying technology, tools and approach even older. This antiquated technology means they’re unable to take advantage of the tools and languages needed to scale across hundreds or thousands of highly ephemeral cloud computing instances, or of the latest Big Data principles that have evolved significantly in the past two years. Engineering process and priorities. The process followed and pace at which software updates are prioritized, developed, tested and released is drastically different between on-premise and cloud offerings. To do it effectively you really need to invest in two separate teams with separate charters and visions. Most of today’s Cloud offerings seamlessly provide software updates every week, if not every day. Running two different processes and multiple versions can be very distracting and expensive, so you naturally lose focus on the important things, like operating, securing and scaling the cloud service. Business model and revenue recognition. Almost all on-premise software companies rely on closing deals each quarter to drive the majority of their software revenue for that quarter (perpetual licenses). Cloud companies typically take their revenue ratably over the term of the contract (subscription). Switching to a Cloud offering can wreak havoc on the financials if you previously relied on perpetual license sales. Sales model and compensation. Sales reps get paid on quota (how much they sell each year). On-premise deals (perpetual) tend to be at least 2-3x larger than subscription deals because all of the software is sold up front and then you pay maintenance over time. Most on-premise software companies do not properly incent sales reps to sell these cloud deals because they are not willing to pay (or their cost structure can't support) a higher commission rate for deals that are by nature smaller. 2. Is Splunk’s cloud offering meant for the mid-market or enterprise? Though Splunk’s announcement is exciting and a validation for cloud solutions like Sumo Logic’s, I’m not sure we will actually see it in most large enterprise accounts. What happened to tackling Big Data?
As InformationWeek points out here “Splunk Storm is decidedly not a big data play”. According to Splunk’s Web site, the pricing for advertised data plans tops out at 1TB of data. That is equivalent to less than 35GB of data per day retained for 30 days. The majority of large enterprises have far more than 35GB of machine data being generated each day – so it’s not clear if those customers have to move to the on-premise version or if they can scale their data beyond that. Sumo Logic is all about scale and Big Data. The cloud architecture gives us the flexibility to elastically scale any portion of our compute and storage engines on-demand, thus overcoming the headaches and performance bottlenecks of on-premise deployments. What about private clouds? The announcement also states that this “is for organizations that develop and run applications in the public cloud”. On their recent earnings call, Splunk went on to say "It doesn’t have the features of Splunk Enterprise. It’s very targeted toward developers and being able to help log apps that are in development in the cloud." The majority of large enterprise applications are in private clouds, not in the public cloud, and certainly the bulk of the machine data being generated by these applications is not in the public cloud. So it would seem Splunk will be asking the majority of large enterprises to continue to install, manage and scale Splunk’s on-premise offering within their data center(s) and use the cloud offering for public cloud applications. We at Sumo Logic look forward to seeing how Splunk evolves this first version of their cloud solution and we welcome the opportunity to address large and small enterprises’ machine data search and analytics initiatives with one highly scalable offering for public and private clouds.

August 30, 2012

Blog

Fuzzing For Correctness

Handling the Edge-Case Search-Space Explosion: A Case for Randomized Testing For much of software testing the traditional quiver of unit, integration and system testing suffices. Your inputs typically have a main case and a small set of well-understood edge cases you can cover. But periodically we come upon software problems where the range of acceptable inputs and the number of potential code paths are simply too large to allow an acceptable degree of confidence in the correctness of a software unit (be it a function, module, or entire system). Enter The Fuzzer When we hit this case at Sumo, we’ve recently started applying a technique well known in the security community but less commonly used in traditional development: fuzzing. Fuzzing in our context refers to randomized software testing on valid and invalid inputs. In the software development world, fuzzing is commonly used to test compilers, the classic case of an exploding search space. It has also gained traction recently in the Haskell community with QuickCheck, a module that can automagically build test cases for your code and test it against a given invariant. ScalaCheck aims to do the same for Scala, the language we use primarily at Sumo Logic. The long and short of it is this: scala.util.Random coupled with a basic understanding of the range of inputs is better at thinking of edge cases than I am. At Sumo, our product centers around a deep query language rife with potential edge cases to handle. We recently started replacing portions of the backend system with faster, more optimized, alternatives. This presented us with a great opportunity to fuzz both code paths against hundreds of thousands of randomly generated test queries and verify the equivalence of the results. Fuzzers are great because they have no qualms about writing queries like “??*?r.” No human will ever write that query. I didn’t think about testing out that query. That doesn’t mean that no human will ever be impacted by the underlying bug (*’s allowed the query parts to overlap on a document. Whoops.) Of course, I probably should have caught that bug in unit testing. But there is a limit to the edge cases we can conceive, especially when we get tunnel vision around what we perceive to be the weak spots in the code. Your fuzzer doesn’t care what you think the weak spots are, and given enough time will explore all the meaningful areas of the search space. A fuzzer is only constrained by your ability to define the search space and the time you allow it to run. Fuzzing is especially useful if you already have a piece of code that is trusted to produce correct results. Even in the new-code case, however, fuzzing can still be invaluable for finding inputs that throw you into infinite loops or cause general crashes. In light of this, here at Sumo we’ve incorporated another test category into our hierarchy — Fuzzing tests, which sit in between unit tests and integration tests in our cadre of tests. Handling Unpredictability There are issues associated with incorporating random testing into your workflow. One should be rightfully concerned that your tests will be inherently flaky. Tests whose executions are unpredictable have, by necessity, an associated stigma in the current testing landscape, which rightfully strives for reproducibility in all cases. In light of this, we’ve established best practices for addressing those concerns in the randomized testing we do at Sumo. Each run of a randomized test should utilize a single random number generator throughout.
The random number generator should be predictably seeded (System.currentTimeMillis() is popular), and that seed should be logged along with the test run. The test should be designed to be rerunnable with a specific seed. The test should be designed to output specific errors that can trivially (or even automatically) be pulled into a deterministic test suite. All errors caught by a randomized test should be incorporated into a deterministic test to prevent regressions. Following these guidelines allows us to create reproducible, actionable and robust randomized tests and goes a long way towards finding tricky corner cases before they can ever manifest in production. (To see the end result of all of our programming efforts, please check out Sumo Logic Free.)
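A minimal sketch of what such a harness can look like, following the guidelines above (one seeded generator per run, the seed logged, the failing input reported). The query generator and the two engines under comparison are placeholders rather than our actual components.

```scala
import scala.util.Random

// Sketch of a reproducible fuzz run comparing two implementations on the
// same randomly generated queries.
object FuzzSketch {
  // Generate a random query from a tiny alphabet that includes wildcards.
  def randomQuery(rnd: Random): String =
    Seq.fill(1 + rnd.nextInt(4)) {
      rnd.shuffle("abc?*r".toList).take(1 + rnd.nextInt(4)).mkString
    }.mkString(" ")

  def run(oldEngine: String => Seq[String],
          newEngine: String => Seq[String],
          iterations: Int,
          seed: Long = System.currentTimeMillis()): Unit = {
    println(s"fuzz run seed = $seed") // log the seed so the run can be replayed
    val rnd = new Random(seed)        // a single generator for the whole run
    for (i <- 1 to iterations) {
      val query = randomQuery(rnd)
      val (expected, actual) = (oldEngine(query), newEngine(query))
      // Surface the concrete failing input so it can be copied into a
      // deterministic regression test.
      assert(expected == actual,
        s"mismatch at iteration $i (seed $seed) for query '$query'")
    }
  }
}
```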

August 14, 2012

Blog

Scala at Sumo: grab bag of tricks

As mentioned previously on this blog, at Sumo Logic we primarily develop our backend software using the Scala programming language. In this post we present some (very) miscellaneous Scala tidbits and snippets that showcase some interesting features we have taken advantage of in our own code. Laziness Scala borrows many ideas from the world of functional programming, including “laziness”. The essential idea of a lazy collection is that its contents are not computed until they are needed. For example, one could construct a lazy abstraction layer over expensive external resources like disk or database accesses. This abstraction can then be created and passed around without incurring the expense of populating it until the values are actually read out by some other part of the program. This post dives into the performance implications of laziness; here we describe a very contrived example that illustrates the basic concept (a version of it is sketched at the end of this post). In the first version, a strict Range is created and mapped over. The map is eagerly evaluated, simultaneously populating the mutable ListBuffer with values and producing a sequence of Boolean values. The exists method is then called with an identity argument, returning true after reading the second value in the sequence, which is true because 1 > 0. However the fact that our ListBuffer contains the values (0,1,2,3) tells us that the map was computed over all elements of the Range; this is because that entire computation happens “before” exists begins consuming the values. In the second version, we call view on the strict Range to create a lazy sequence. The mapping function is then only called when elements of this sequence are consumed by the exists method. Once exists hits the true value, it short-circuits and returns true without consuming the rest of the sequence. This is why we see 0 and 1 only in the ListBuffer. The map computation was only evaluated on an “as-needed” basis to supply values for exists to consume, and exists only needed to consume 0 and 1 before terminating with a value of true. Note that we are using side-effects and a mutable data structure within the mapping function here for illustrative purposes only! In actual code this could easily introduce nasty bugs, as demonstrated by the mildly surprising result of our little experiment. Regex unapply magic When using a regular expression to parse values out of a string for further processing, it is fairly common to see code that manually pulls each captured group out of the match result by index. Conveniently, the Regex class can be combined with Scala pattern-matching machinery to directly bind captured groups to local variables in one shot. This specific instance is a particularly nice example of the general usefulness of Scala case classes, pattern matching, and extractors. If you are interested, these mechanisms are a good place to start digging deeper into how this trick works. Handling nested Java null checks In Scala, the use of null is generally avoided due to the availability of the much nicer Option. However, occasionally we need to follow a nested series of calls against null-riddled Java code where if any value in the chain returns null, we would like to return some default value. In this case we can combine a common Scala trick (looping over Option values with for-comprehensions) with the fact that Option can be used as a wrapper of potentially-null values.
For the simple case of nested-access with potential lurking nulls, this snippet is much easier on the eyes than an equivalent set of nested if-thens. Executing this App yields the desired behavior: Any null in the chain defaults to authority -1 -1 -1 -1 Complete non-null chain yields correct authority 76
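For readers who want something runnable, here is a small self-contained sketch of the three tricks described above; the exact code that originally accompanied this post may have differed, and the toy Java-style classes at the bottom are purely illustrative.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.matching.Regex

object GrabBagSketch extends App {
  // 1. Laziness: strict vs. view. The mutable buffer is for illustration only.
  val strictSeen = ListBuffer[Int]()
  (0 to 3).map { i => strictSeen += i; i > 0 }.exists(identity)
  println(strictSeen) // ListBuffer(0, 1, 2, 3): map ran over every element

  val lazySeen = ListBuffer[Int]()
  (0 to 3).view.map { i => lazySeen += i; i > 0 }.exists(identity)
  println(lazySeen)   // ListBuffer(0, 1): exists short-circuited the view

  // 2. Regex unapply: bind captured groups directly in a pattern match.
  val KeyValue: Regex = """(\w+)=(\d+)""".r
  "latency=250" match {
    case KeyValue(key, value) => println(s"$key -> $value")
    case _                    => println("no match")
  }

  // 3. Nested Java null checks via Option and a for-comprehension.
  //    These classes stand in for a null-riddled Java API.
  class JavaUrl(val getAuthority: String)
  class JavaResource(val getUrl: JavaUrl)

  def authorityLength(resource: JavaResource): Int = (for {
    r    <- Option(resource)
    url  <- Option(r.getUrl)
    auth <- Option(url.getAuthority)
  } yield auth.length).getOrElse(-1)

  println(authorityLength(null))                                       // -1
  println(authorityLength(new JavaResource(new JavaUrl("example.com:76")))) // 14
}
```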

July 25, 2012

Blog

3 Tips for Writing Performant Scala

Here at Sumo Logic we write a lot of Scala code. We also have a lot of data, so some of our code has to go really, really fast. While Scala allows us to write correct, clear code quickly, it can be challenging to ensure that you are getting the best performance possible. Two expressions which seem to be equivalent in terms of performance can behave radically differently. Only with an in-depth understanding of the implementation of the language and the standard library can one predict which will be faster. For a great explanation of the implementation and details of the Scala language, I recommend reading Programming in Scala 2ed by Odersky, Spoon and Venners cover to cover. It’s worth every page. Short of reading the 800+ pages of Programming in Scala, here are 3 pieces of low hanging fruit to help improve the performance of your Scala code. 1. Understand the Collections! Users of Java are used to ArrayList with constant time lookup and amortized constant time append. In Scala, when you request a List(1, 2, 3), you get a linked list. It can be prepended with objects using the “cons” (::) operator in constant time, but many other operations such as index based lookup, length, and append will run in linear time(!). If you want random access, you want an IndexedSeq. If you want constant time append, use a ListBuffer. Read the collections chapter of Programming in Scala 2ed for all the details. 2. Be Lazy! Scala’s collection libraries allow us to write nearly any collection operation as a short chain of functions. For example, let’s say we had a bunch of log entries. For each of them we wanted to extract the first word, pull them into groups of 8, then count the number of groups of 8 that contain the word “ERROR.” We would probably write that as a short chain of map, grouped and count calls (a version is sketched at the end of this post). The first step, logs.map(_.takeWhile(_ != ‘ ‘)), will create an intermediate collection that we never use directly. If the size of logs was near our memory limit, the auxiliary collection could run us out of memory. To avoid generating the intermediate collections, we can run the operations on the list in a “lazy” manner. When we call the “.view” method on a Scala collection, it returns a view into the collection that provides lazy evaluation through a series of closures. For example, consider mapping two functions f and g over a view: if f(x) = x + 5, and g(x) = x * 2, then this is really just the functional composition g(f(x)) — no reason to create the intermediate collections. A view runs transformations as functional composition instead of as a series of intermediate collections. So, going back to our initial example, the operation becomes the same chain with .view inserted before the map. The call to count will force the results of this computation to be evaluated. If your chain produces a collection on the other side (e.g., just returning a subset of the logs), use .force to make it strict and return a concrete collection. Lazy collections must be taken with a grain of salt — while they can often improve performance, they can also make it worse. For example, in the microbenchmark we ran, the lazy version ran 1.5x faster than the strict version. However, for smaller collection sizes, the strict version will run faster. Lazy evaluation requires the creation of an additional closure.
If creating the closures takes longer than creating intermediate collections, the lazy version will run slower. Profile and understand your bottlenecks before optimizing! 3. Don’t be Lazy! If you really need a piece of code to go fast, given the current state of Scala libraries and compiler, you’re going to need to write more of it. Sometimes (unfortunately) to write truly performant code in Scala, you need to write it like it’s C or Java. This means eschewing a lot of things you’ve come to love about Scala, such as: Use while loops instead of for loops. For loops create closures that can create significant overhead. Let me give some context for this: I benchmarked a hand-written while-loop version against the equivalent Scala for-loop version (both are sketched at the end of this post). Obviously the while-loop version will run faster, but the difference is surprising. In my benchmark, the while loop version ran on average in .557ms. The Scala version ran in 9.584ms. That is a 17x improvement! The exact reason is beyond the scope of this post, but in a nutshell, in the Scala version { x += 1 } is creating an anonymous class each time we want to increment x. For what it’s worth, this is issue 1338 on the Scala issue tracker, and there is a compiler plugin to perform a lot of these optimizations automatically. Replace convenience methods like exists, count, etc. with their hard-coded variants. For example, instead of calling the convenience method (version 1), hand-code the loop (version 2). Version 2 gets a 3x speedup over version 1. Avoid objects when possible — use primitives instead. Whenever possible, the Scala compiler will insert JVM primitives for things like Ints, Booleans, Doubles and Longs. However, if you prevent this (by using an implicit conversion, etc.) and the compiler is forced to box your object, you will pay a significant performance cost. [Look for Value Classes to address this in future versions of Scala.] You could also specialize containers for primitive types, but that is beyond the scope of this post. I really hate suggesting that you write Scala like it’s C to get better performance. I really do. And I really enjoy programming in Scala. I hope in the future the standard library evolves in a way so that it becomes faster than hand-coding the C equivalent. The Bottom Line The first two suggestions get followed at Sumo Logic, and really boil down to a solid understanding of Scala’s standard library. The third suggestion gets followed very rarely if at all. This seems surprising — shouldn’t we be trying to write the fastest code possible? The answer, of course, is no. If we wanted to write the fastest code possible, we would essentially write Scala as if it were C. But then, why not just use C? There are multiple factors we need to optimize for here. By necessity, our services here at Sumo Logic are designed to scale horizontally. If we need better performance, we can spin up more nodes. That costs money. Developer time also costs money. Writing idiomatic Scala that fully utilizes the type-safety and functional properties of the language can and will produce code that runs slower than writing everything with while-loops in C-style. But slower code is OK. The critical trade-off for us is that writing clean Scala is faster, less error prone, and easier to maintain than the alternative.
Scala’s performance shortcomings have garnered some criticism on the internet recently (and for good reason). This isn’t necessarily a reason not to use Scala. I suspect Scala performance will improve with time as the Scala compiler produces more optimized bytecode and the JVM gains native support for some of the functional features of the Scala language. The critical thing to recognize is that you must consider both developer and code performance in your optimizations. [Benchmarking Notes] Code was benchmarked with a script that first executes the function in question 10000 times, then runs it 1000 more times and computes the average. This ensures that the HotSpot JVM will JIT compile the code in question.
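For readers who want to experiment, here is a small sketch of the kinds of code the tips above compare: a lazy view version of the log-grouping example, a while loop versus a for loop, and a hand-coded variant of a convenience method. These are illustrative reconstructions, not the original benchmark code; the timings quoted above came from that original code.

```scala
// Illustrative reconstructions of the comparisons discussed in the post.
object PerfSketch {
  // Tip 2: a lazy view avoids materializing the intermediate collection of first words.
  def errorGroups(logs: Seq[String]): Int =
    logs.view
      .map(_.takeWhile(_ != ' '))          // extract the first word of each entry
      .grouped(8)                          // pull them into groups of 8
      .count(_.exists(_.contains("ERROR"))) // count groups mentioning ERROR

  // Tip 3a: while loop vs. a for loop over a Range.
  def sumWhile(n: Int): Long = {
    var i = 0
    var total = 0L
    while (i < n) { total += i; i += 1 }
    total
  }

  def sumFor(n: Int): Long = {
    var total = 0L
    for (i <- 0 until n) total += i
    total
  }

  // Tip 3b: hand-coded variant of a convenience method such as exists.
  def containsNegative(xs: Array[Int]): Boolean = {
    var i = 0
    while (i < xs.length) {
      if (xs(i) < 0) return true
      i += 1
    }
    false
  }
}
```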

July 23, 2012

Blog

Pragmatic AWS: 3 Tips to Enhance the AWS SDK with Scala

At Sumo Logic, most backend code is written in Scala. Scala is a newer JVM (Java Virtual Machine) language created in 2001 by Martin Odersky, who also co-founded our Greylock sister company, TypeSafe. Over the past two years at Sumo Logic, we’ve found Scala to be a great way to use the AWS SDK for Java. In this post, I’ll explain some use cases. 1. Tags as fields on AWS model objects Accessing AWS resource tags can be tedious in Java. For example, to get the value of the “Cluster” tag on a given instance, something like this is usually needed: String deployment = null; for (Tag tag : instance.getTags()) { if (tag.getKey().equals("Cluster")) { deployment = tag.getValue(); } } While this isn’t horrible, it certainly doesn’t make code easy to read. Of course, one could turn this into a utility method to improve readability. The set of tags used by an application is usually known and small in number. For this reason, we found it useful to expose tags with an implicit wrapper around the EC2 SDK’s Instance, Volume, etc. classes. With a little Scala magic, the above code can now be written as: val deployment = instance.cluster Here is what it takes to make this magic work: object RichAmazonEC2 { implicit def wrapInstance(i: Instance) = new RichEC2Instance(i) } class RichEC2Instance(instance: Instance) { private def getTagValue(tag: String): String = instance.getTags.find(_.getKey == tag).map(_.getValue).getOrElse(null) def cluster = getTagValue("Cluster") } Whenever this functionality is desired, one just has to import RichAmazonEC2._ 2. Work with lists of resources Scala 2.8.0 included a very powerful new set of collections libraries, which are very useful when manipulating lists of AWS resources. Since the AWS SDK uses Java collections, to make this work, one needs to import collection.JavaConversions._, which transparently “converts” (wraps implicitly) the Java collections. Here are a few examples to showcase why this is powerful: Printing a sorted list of instances, by name: ec2.describeInstances(). // Get list of instances. getReservations. map(_.getInstances). flatten. // Translate reservations to instances. sortBy(_.sortName). // Sort the list. map(i => "%-25s (%s)".format(i.name, i.getInstanceId)). // Create String. foreach(println(_)) // Print the string. Grouping a list of instances in a deployment by cluster (returns a Map from cluster name to list of instances in the cluster): ec2.describeInstances(). // Get list of instances. filter(_.deployment == "prod"). // Filter the list to prod deployment. groupBy(_.cluster) // Group by the cluster. You get the idea – this makes it trivial to build very rich interactions with EC2 resources. 3. Add pagination logic to the AWS SDK When we first started using AWS, we had a utility class to provide some commonly repeated functionality, such as pagination for S3 buckets and retry logic for calls. Instead of embedding functionality in a separate utility class, implicits allow you to pretend that the functionality you want exists in the AWS SDK.
Here is an example that extends the AmazonS3 class to allow listing all objects in a bucket: object RichAmazonS3 { implicit def wrapAmazonS3(s3: AmazonS3) = new RichAmazonS3(s3) } class RichAmazonS3(s3: AmazonS3) { def listAllObjects(bucket: String, cadence: Int = 100): Seq[S3ObjectSummary] = { var result = List[S3ObjectSummary]() def addObjects(objects: ObjectListing) = result ++= objects.getObjectSummaries var objects = s3.listObjects(new ListObjectsRequest().withMaxKeys(cadence).withBucketName(bucket)) addObjects(objects) while (objects.isTruncated) { objects = s3.listNextBatchOfObjects(objects) addObjects(objects) } result } } To use this: val objects = s3.listAllObjects("mybucket") There is, of course, a risk of running out of memory, given a large enough number of object summaries, but in many use cases, this is not a big concern. Summary Scala enables programmers to implement expressive, rich interactions with AWS and greatly improves readability and developer productivity when using the AWS SDK. It’s been an essential tool to help us succeed with AWS.
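The utility class mentioned above also provided retry logic; in the same implicit-enrichment spirit, here is a hedged sketch of how that could be layered onto SDK calls. The backoff policy and names are illustrative, not the actual Sumo Logic utility.

```scala
import com.amazonaws.AmazonServiceException

// Illustrative retry helper in the same implicit-enrichment style as the
// examples above. Wrap any SDK call in a function value and retry it.
object Retries {
  class Retryable[T](op: () => T) {
    def withRetries(attempts: Int = 3, initialDelayMs: Long = 200): T = {
      var remaining = attempts
      var delay = initialDelayMs
      while (true) {
        try {
          return op()
        } catch {
          case _: AmazonServiceException if remaining > 1 =>
            remaining -= 1
            Thread.sleep(delay)
            delay *= 2 // simple exponential backoff
        }
      }
      throw new IllegalStateException("unreachable")
    }
  }
  implicit def toRetryable[T](op: () => T): Retryable[T] = new Retryable(op)
}

// Hypothetical usage, assuming an AmazonS3 client named s3 is in scope:
// import Retries._
// val listing = (() => s3.listObjects("mybucket")).withRetries()
```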

AWS

July 12, 2012

Blog

Nine 1Password Power Tips

Blog

IT Insights for All: Sumo Logic Launches Free Machine Data Search and Analytics Service

What is Free Machine Data Search and Analytics? Today we announce Sumo Logic Free, the first and only free, enterprise-class machine data search and analytics service. The free version of Sumo Logic’s cloud-based service offers full functionality, including: Loading, indexing and archival of machine data Distributed search Real-time analytics Proactive monitoring and alerting Reporting and visualization Role-based access controls Our cloud-based approach eliminates the need for expensive premise-based solutions that require upfront investments in hardware and software as well as the headache of constant management and upgrades. The Sumo Logic service delivers a petabyte-scale platform that provides companies with access to valuable operational insights from their machine data—all in real time. Why Are Companies Embracing Machine Data? The volume, velocity and variety of machine data (logs, events, configuration changes, clickstream, etc.) being generated by applications, networks, servers, and mobile devices is overwhelming IT organizations. Our goal is to give every enterprise a way to search and analyze this machine data to troubleshoot application problems, proactively monitor performance and availability, and gain valuable operational and customer insights. For those companies just initiating the search for the right approach to handling their Big Data, we provide a purpose-built, scalable Big Data service to harness all the great information IT organizations have at their disposal. We leverage our patent-pending Elastic Log Processing™ platform to intelligently scale and tune the service as the enterprise grows, so IT organizations can focus on delivering value to the business. Free Service = Win-Win. We know that once customers start using our service, they’ll quickly see the power and value of the insights they can glean, and will want to use it for more use cases and with more data. In addition, as more customers use our service in interesting and strategic ways, we’ll be able to apply their insights for the benefit of future customers and for future product development.

June 25, 2012

Blog

Security-Gain without Security-Pain

AWS

June 21, 2012

Blog

Some of Our Essential Service Providers

As I mentioned in one of my previous posts, here at Sumo Logic we believe cloud-based services provide excellent value due to their ease of setup, convenience and scalability, and we leverage them extensively to provide internal services that would be far more time, labor and cash intensive to manage ourselves. Today I’m going to talk about some of the services we use for collaboration, operations and IT, why we use them, and how they simplify our lives. Campfire Campfire is a huge part of our productivity and culture at Sumo Logic. While I would lump this and Skype together under something like “Managed Corporate Messaging”, they fill two very different niches in our environment. Campfire from 37 Signals is a fantastic tool for group conversations. Using the Campfire service, we have set up multiple chat rooms for various types of issues, including Production Issues, Development Issues, Sales/Customer-Support Issues, and of course, a free-for-all chat-room where we try to make one another spontaneously erupt into chaotic LOLs. These group-chats provide a critical space where we can work together to troubleshoot and solve problems cooperatively. Campfire makes it very easy to upload pictures and share large amounts of information in real-time with co-workers who can be anywhere. The conversations are all archived for later reference, which allows us to use the Production Incidents room as a 24×7 conference call and canonical forum of record for anything happening to production systems. Our Production on-call devs are expected to echo their actions into the Production channel and keep up with events there as they transpire. Campfire also has a cool feature which allows you to start a voice conference with participants if needed, which is a great option in certain situations. These calls can also be archived for later reference. One down side to the text and audio archives is that they are not easily searchable, so it helps to know approximately when something happened, and we have found it necessary to consult other records to determine where to look. Skype Skype is, of course, the very popular IM and VOIP service that was purchased by Microsoft a while back. We use Skype extensively for 1:1 chatting and easy and secure file-transfers throughout the company. We also make extensive use of the wide array of available emoticons. (Stefan Zier is a particularly prolific and artistic user of these.) We also use Skype video chat for interviews and to collaborate with team members abroad. We have a conference room with a TV and Skype camera just for this application. Cloudkick Running a large-scale cloud-based service requires a lot of operational awareness. One of the ways we achieve this is through Cloudkick. Cloudkick was recently acquired by Rackspace and is evolving into a Cloud Monitoring tool. We are still on the legacy Cloudkick service, which we have come to use heavily. We automatically install Cloudkick agents on all of our production instances and use them to collect a wide array of status codes from the O/S and through JMX as well as by running our own custom scripts which we use to check for the existence of critical processes and to detect if things like HPROF files exist. The Cloudkick website has a “show only failures” mode which we call the “What’s Wrong? Page”.
This is a very helpful tool that allows our EverybodyOps team to quickly assess issues with our production environment. PagerDuty Of course, we also need to be proactively alerted to failures and crossed thresholds that could indicate trouble, and for this we rely on PagerDuty. (Affectionately known as P. Diddy to many of us, nickname coined by Christian). PagerDuty is another great tool which allows us to maximize the benefits of our EverybodyOps culture. Within PagerDuty we have a number of on-call rotations. One for our Production Primary role and one for the Secondary role, as well as another role for monitoring test failures and a lesser-known role for those of us who monitor the temperature in the one small server room we do have. P. Diddy allows us to easily cover for each other using exceptions or by simply switching the Primary and Secondary roles on the fly if the Primary needs to go AFK for a while. P. Diddy allows each user to set their own personal escalation policy which can include texting, calling, and emailing with a configurable number of re-tries and timeouts. Another nice touch is that the rotation calendars can be imported into our personal calendars to remind us of when we are up next. This all makes the on-call rotation run pretty flawlessly from an administrative perspective with no gnarly configuration and management on our end. I must admit, I do have a personal habit of “Joaning” my secondary when I am on call… To properly “Joan” your secondary, you accidentally escalate an alert to them that you meant to resolve. (I blame the comma after “Resolv”!) Google Apps Like many companies of all sizes, we rely on Google for our email service. While some Sumos (like myself and Stefan) use mail clients to read our email, most Sumos are happy with the standard web interface from Google. We also heavily use internal groups for team communications. We also make good use of Google Docs for document authoring and sharing (this blog post was written and communally edited using Google Docs, in fact, due to the impressive real-time collaboration, Stefan Zier is watching me add this bit in order to resolve his comment right now!) We use Google Calendar for our scheduling needs (and calendar-stalking exercises!) We also use Google Analytics to obsess over you. Also, as Sumo Logic’s Director of Security (which makes me partially responsible for managing the users and groups in Google Apps) I appreciate the richness of their security settings and especially the two-factor authentication and mobile device policy management. There’s more! These are just some of our SaaS providers. In an upcoming post I’ll talk more about some of the services that help us support and bill our customers and test and develop our product. We have found that all of these providers deliver valuable and even crucial services that would be far more expensive and time-consuming for us to manage ourselves. We hope you may find some of them helpful too!

June 15, 2012

Blog

Pragmatic AWS: Principle of Least Privilege with IAM

AWS

June 12, 2012

Blog

Sumo Logic Jump Start (Part 2 of 2)

Blog

Sumo Logic at AWS Big Data Boston

I recently represented Sumo Logic at the AWS Big Data conference in Boston. It was a great show, very well-attended. Sumo Logic was one of the few vendors invited to participate. During the conference I conducted a survey of the attendees to try to understand how this emerging early-adopter segment of IT professionals manages log data for their infrastructure and applications. Common characteristics of attendees surveyed: They run their apps and infrastructure in the cloud They deal with large data sets They came to learn how to better exploit/leverage big data and cloud technologies What I asked: Do you use logs to help you in your daily work, and if so, how? What types of tools do you use for log analysis and management? What are the specific pain points associated with your log management solutions? The findings were interesting. Taking each one in turn: No major surprises here. Enterprises buy IaaS in order to run applications, either for burst capacity or because they believe it’s the wave of the future. The fact that someone else manages the infrastructure does not change the fact that you have to manage and monitor your applications, operating systems, and virtual machines. A bit of a surprise here. In my previous analysis, some 45% of enterprises use homegrown solutions, but in this segment it’s 70%. Big difference with the big data and cloud crowd. A possible explanation for this is that existing commercial solutions are not easy to deploy and run in the cloud and don’t scale to handle big data. So, the solution = build it yourself. Hmm. Yes, yes, I know, it adds up to more than 100%. That’s because the question was stated as “select as many as apply” and many respondents have more than one problem. So, nothing terribly interesting in there. But let me dig a bit deeper into issues associated with homegrown vs. commercial. This makes a bit more sense. For the homegrown solutions, it looks like complexity is the biggest pain – which makes sense. Assembling huge systems to support big volumes of log data is more difficult than many people anticipate. Hadoop and other similar solutions are not optimized to simply and easily deliver answers. This then leads to the next pain point: if it is not easy to use, then you don’t use it = does not deliver enough value. The responses on commercial solutions make sense as well. Today’s commercial products are expensive and hard to operate. On top of the sticker price, you have to spend precious employee time to perform frequent software upgrades and implement “duct tape” scaling. If you don’t have expertise internally you buy it from vendors’ professional services at beaucoup $$$$$. You have to get your own compute and storage, which grow as your data volume grows. So, commercial “run yourself” solutions = very high CAPEX (upfront capital expenditures) and OPEX (ongoing operational expenditures). In the end (as the second pain point highlights), commercial solutions are also complex to operate and hard to use, requiring highly skilled and hard to find personnel. Pretty bleak – what now? At Sumo Logic, we think we have a solution. The pain points associated with home-grown and commercial solutions that were architected in the last decade are exactly what we set out to solve. We started this company after building, selling and supporting the previous generation of log management and analysis solutions. We’ve incorporated our collective experience and customer feedback into Sumo Logic.
Built for the cloud The Sumo Solution is fundamentally different from anything else out there. It is built for big data and is “cloud native”. All of the complexities associated with deploying, managing, upgrading, and scaling are gone – we do all that for you. Our customers get a simple-to-use web application, and we do all the rest. Elastic scalability Our architecture is true cloud, not a “cloud-washed” adaptation of on-premise single-instance software solutions that are trying to pass themselves off as cloud. Each of our services is separate and can be scaled independently. It takes us minutes to triple the capacity of our system. Insights beyond your wildest dreams Because of our architecture, we are able to build analytics at scale. Our LogReduce™ and Push Analytics™ uncover things that you didn’t even know you should be paying attention to. The whole value proposition is turned on its head – instead of having to do all the work yourself, our algorithms do the work for you while you guide them to get better over time. Come try it out and see for yourself: http://www.sumologic.com/free-trial/

AWS

May 29, 2012

Blog

Pragmatic AWS: 4 Ideas for Using EC2 Tags

AWS

May 15, 2012

Blog

Objection Handling

Blog

Sumo Logic at RSA: Showcasing data security in cloud-based log management

March 16, 2012

Blog

Sumo on Sumo, Part 1: Collection

At Sumo Logic, we strongly believe in using our own service, sometimes called "dogfooding." The primary reason for doing this is that Sumo Logic is a great fit for our environment. We run a mix of on-premise and cloud appliances, services and applications for which we need troubleshooting and monitoring capabilities:
- Our service, the Sumo Logic Log Management and Analytics Service, a distributed, complex SaaS application
- A heterogeneous, on-premise office network
- Our development infrastructure, which lives in Amazon Web Services (AWS)
- Our website

In short: we are like many other companies out there, with a mix of needs and use cases for the Sumo Logic service. In this post, I'll explain how we've deployed our Sumo Logic Collectors in our environment. Collectors are small software components that gather logs from various sources and send them to the Sumo Logic Cloud in a reliable and secure fashion. They can be deployed in a variety of ways, and this post provides some real-world examples.

The Sumo Logic Service
Our service is deployed across a large number of servers that work in concert to accept logs, store them in NoSQL databases, index them, manage retention, and provide all the search and analytics functionality in the service. Any interaction with our system is almost guaranteed to touch more than one of these machines. As a result, debugging and monitoring based on log files alone would be impractical, bordering on impossible.

The solution to this, of course, is the Sumo Logic service itself. After deciding that we wanted our own service to monitor itself, we weighed several deployment options:
- A centralized Collector, pulling logs via SSH
- A centralized Collector, receiving logs via syslog
- Collectors on each machine in the deployment, reading local files

In the end, we went with the third option: both our test and production environments run Sumo Logic Collectors on each machine. The primary motivator for this choice was that it was best for testing the service – running a larger number of Collectors is more likely to surface any issues.

This decision made it a priority to enable automatic deployment features in the Collector, which is why our Collectors can now be deployed both by hand and in a scripted fashion. Every time we deploy the service, a script installs and registers the Collector. Using the JSON configuration, we configure collection of files from our own application, third-party applications, and system logs. So there you have it – the Sumo Logic service monitors itself.

The Office
Sumo Logic's main office is located in downtown Mountain View, on Castro Street. As strong proponents of cloud-based technologies, we've made an effort to keep the amount of physical infrastructure inside our office to a minimum. But there are some items any office needs, including ours:
- Robust Internet connectivity (we run a load-balanced setup with two ISPs)
- Network infrastructure (switches, WiFi access points, firewalls)
- Storage for workstation backups (a small NAS)
- Phones (we just deployed some IP phones)
- Security devices (an IDS with taps into multiple points in our network, and a web proxy)
- DHCP/DNS/Directory services (a Mac server with Open Directory)

Some of these devices log to files, others log to syslog, and yet others are Windows machines. Whenever a new device is added to the network, we make sure logs are being collected from it.
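To make the scripted setup described above a bit more concrete, here is a minimal sketch of how a deployment script might generate a JSON source configuration covering the three kinds of sources mentioned in this post: local log files, syslog from office network gear, and Windows event logs. The names, paths, categories, and field names are hypothetical and purely illustrative; the actual schema and install location are defined by the Collector documentation, and this is a simplified sketch rather than our actual deployment script.

```python
#!/usr/bin/env python
"""Hypothetical sketch: a deployment script drops a JSON source
configuration next to a freshly installed Sumo Logic Collector.
All names, paths, and fields below are illustrative only."""

import json

# Sources covering the three cases mentioned in the post:
# local application/system log files, syslog from network devices,
# and Windows event logs on Windows machines.
sources = {
    "api.version": "v1",
    "sources": [
        {
            # Tail local application log files.
            "sourceType": "LocalFile",
            "name": "app-logs",
            "category": "prod/app",
            "pathExpression": "/var/log/app/*.log",
        },
        {
            # Listen for syslog from switches, firewalls, and access points.
            "sourceType": "Syslog",
            "name": "office-network-syslog",
            "category": "office/network",
            "protocol": "UDP",
            "port": 514,
        },
        {
            # Pick up Windows event logs where applicable.
            "sourceType": "LocalWindowsEventLog",
            "name": "office-windows-events",
            "category": "office/windows",
            "logNames": ["Security", "Application"],
        },
    ],
}

# A deployment script would write this file while installing and
# registering the Collector, so collection starts automatically.
with open("/opt/SumoCollector/config/sources.json", "w") as f:
    json.dump(sources, f, indent=2)
```

In a setup along these lines, dropping a file like this in place during the scripted install would let a new machine start sending file, syslog, and Windows logs without any per-machine configuration in the web UI.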
This has been instrumental in debugging tricky WiFi issues (Castro Street is littered with tech companies running WiFi), figuring out login issues, troubleshooting Time Machine problems with our NAS, and many other use cases.

Development Infrastructure
Our bug tracker, CI cluster and other development infrastructure live in AWS. In order to monitor this infrastructure, we run Sumo Logic Collectors on all nodes, picking up system log files, web server logs, and application logs from the various commercial and open source tools we run. We use these logs for troubleshooting and to monitor trends.

Our Website
The web server on our public-facing website logs lots of interesting information about visitors and how they interact with the site. Of course, we couldn't resist dropping in a Collector to pick up these log files. We use a scheduled query that runs hourly and tells us who signed up for demo and trial accounts.

In Summary
We eat our own dog food and derive a lot of value from our own service, using it to troubleshoot and monitor all of our infrastructure, both on-premise and in the cloud. Our Collector's ability to collect from a rich set of common source types (files, syslog, Windows), along with its automatic, scripted installation, makes it very easy to add new logs into the Sumo Logic Cloud.

– Stefan Zier, Cloud Infrastructure Architect

Sumo on Sumo, part 2: User Signups

March 13, 2012

Blog

Sumo Logic Launches