Michael Churchman

Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the ‘90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues. He is a regular Fixate.io contributor.

Posts by Michael Churchman


How to monitor application logs


MySQL Log File Location


The History of Monitoring Tools


DevSecOps 2.0


Log Aggregation vs. APM: No, They’re Not the Same Thing

Are you a bit unsure about the difference between log aggregation and Application Performance Monitoring (APM)? If so, you're hardly alone. These are closely related types of operations, and it can be easy to conflate them—or to assume that if you are doing one of them, there's no reason to do the other. In this post, we'll take a look at log aggregation vs. APM: the relationship between these two data accumulation/analysis domains, and why it is important to address both of them with a suite of domain-appropriate tools rather than a single tool.

Defining APM

First, let's look at Application Performance Monitoring, or APM. Note that APM can stand for both Application Performance Monitoring and Application Performance Management, and in most of the important ways, these terms really refer to the same thing—monitoring and managing the performance of software under real-world conditions, with emphasis on the user experience and the functional purpose of the software. Since we'll be talking mostly about the monitoring side of cloud APM, we'll treat the acronym as interchangeable with Application Performance Monitoring, with the implicit understanding that it includes the performance management functions associated with APM.

What does APM monitor, and what does it manage? Most of the elements of APM fall into two key areas: user experience and resource-related performance. While these two areas interact (resource use, for example, can have a strong effect on user experience), there are significant differences in the ways in which they are monitored (and, to a lesser degree, managed).

APM: User Experience

The most basic way to monitor application performance in terms of user experience is to monitor response time. How long does it take after a user clicks on an application input element for the program to display a response? And more to the point, how long does it take before the program produces a complete response (i.e., a full database record displayed in the correct format, rather than a partial record or a spinning cursor)?

Load is Important

Response time, however, is highly dependent on load—the conditions under which the application operates, and in particular the volume of user requests and other transactions, as well as the demand placed on resources used by the application. To be accurate and complete, user experience APM should include in-depth monitoring and reporting of response time and related metrics under expected load, under peak load (including unreasonably high peaks, since unreasonable conditions and events are alarmingly common on the Internet), and under continuous high load (an important but all too often neglected element of performance monitoring and stress testing). Much of the peak-level and continuous high-load monitoring will need to be done under test conditions, since it requires applying the appropriate load, but it can also be incorporated into real-time monitoring by means of reasonably sophisticated analytics: report performance (and load) when load peaks above a specified level, or when it remains above a specified level for a given minimum period of time.

APM: Resource Use

Resource-based performance monitoring is the other key element of APM. How is the application using resources such as CPU, memory, storage, and I/O? When analyzing these metrics, the important numbers to look at are generally the percentage of the resource used and the percentage still available. This actually falls within the realm of metrics monitoring more than APM, and requires tools dedicated to metrics monitoring. If the percentage used for any resource (such as compute, storage, or memory) approaches the total available, that can (and generally should) be taken as an indication of a potential performance bottleneck. It may then become necessary to allocate a greater share of the resource in question (either on an ongoing basis or under specified conditions) in order to avoid such bottlenecks. Remember: bottlenecks don't just slow down the affected processes. They may also bring all actions dependent on those processes to a halt.

Once Again, Load

Resource use, like response time, should be monitored and analyzed not only under normal expected load, but also under peak and continuous high loads. Continuous high loads in particular are useful for identifying potential bottlenecks which might not otherwise be detected.

Log Aggregation

It should be obvious from the description of APM that it can make good use of logs, since the various logs associated with the deployment of a typical Internet-based application provide a considerable amount of performance-related data. Much of the monitoring that goes into APM, however, is not necessarily log-based, and many of the key functions which logs perform are distinct from those required by APM.

Logs as Historical Records

Logs form an ongoing record of the actions and state of the application, its components, and its environment; in many ways, they serve as a historical record for an application. As we indicated, much of this data is at least somewhat relevant to performance (load level records, for example), but much of it is focused on areas not closely connected with performance.

Logs, for example, are indispensable when it comes to analyzing and tracing many security problems, including attempted break-ins. Log analysis can detect suspicious patterns of user activity, as well as unusual actions on the part of system or application resources.

Logs are a key element in maintaining compliance records for applications operating in a regulated environment. They can also be important in identifying details of specific transactions and other events when those events require verification, or are in dispute.

Logs can be very important in tracing the history and development of functional problems, both at the application and infrastructure level—as well as in analyzing changes in the volume or nature of user activity over time.

APM tools can also provide historical visibility into your environment, but they do it in a different way and at a different level. They trace performance issues to specific lines of code. This is a different kind of visibility, and it is not a substitute for the insight you gain from using log aggregation with historical data in order to research or analyze issues after they have occurred.

The Need for Log Aggregation

The two greatest problems associated with logs are the volume of data generated by logging, and the often very large number of different logs generated by the application and its associated resources and infrastructure components. Log aggregation is the process of automatically gathering logs from disparate sources and storing them in a central location. It is generally used in combination with other log management tools, as well as log-based analytics.

It should be clear at this point that APM and log aggregation are not only different—it also does not make sense for a single tool to handle both tasks. It is, in fact, asking far too much of any one tool to take care of all of the key tasks required by either domain. Each of them requires a full suite of tools, including monitoring, analytics, a flexible dashboard system, and a full-featured API. A suite of tools that can fully serve both domains, such as that offered by Sumo Logic, can, on the other hand, provide you with full-stack visibility and search capability across your network, infrastructure, and application logs.
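To illustrate the load-aware response-time analysis described above, here is a minimal sketch in Python (not a depiction of any particular APM product). It buckets parsed request records into fixed time windows, finds runs of sustained high load, and reports the 95th-percentile response time for each run; the record fields and thresholds are hypothetical.

from collections import defaultdict
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class Request:
    timestamp: float    # epoch seconds, parsed from an access log or APM agent
    response_ms: float  # time to produce a complete response

def p95(values):
    # 95th percentile; statistics.quantiles needs at least two data points.
    if len(values) >= 2:
        return quantiles(values, n=20)[-1]
    return values[0] if values else 0.0

def report_sustained_high_load(requests, window_s=60, rps_threshold=100, min_windows=5):
    # Bucket response times by fixed time window.
    buckets = defaultdict(list)
    for r in requests:
        buckets[int(r.timestamp // window_s)].append(r.response_ms)
    # Walk the windows in order, collecting runs whose request rate stays high.
    run = []
    for w in sorted(buckets) + [None]:  # None is a sentinel that flushes the last run
        if w is not None and len(buckets[w]) / window_s >= rps_threshold:
            run.append(w)
            continue
        if len(run) >= min_windows:
            latencies = [ms for rw in run for ms in buckets[rw]]
            start, end = run[0] * window_s, (run[-1] + 1) * window_s
            print(f"sustained high load {start}-{end}s: p95 response {p95(latencies):.0f} ms")
        run = []

In practice, the Request records would be parsed out of aggregated access logs or agent output, and the report would feed a dashboard or alert rather than print statements.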

July 27, 2017


AWS Config: Monitoring Resource Configurations for Compliance


Top Patterns for Building a Successful Microservices Architecture

Why do you need patterns for building a successful microservices architecture? Shouldn't the same basic principles apply, whether you're designing software for a monolithic or microservices architecture? Those principles do largely hold true at the highest and most abstract levels of design (i.e., the systems level), and at the lowest and most concrete levels (such as classes and functions). But most code design is really concerned with the broad range between those two extremes, and it is there that the very nature of microservices architecture requires not only new patterns for design, but also new patterns for reimagining existing monolithic applications. The truth is that there is nothing in monolithic architecture that inherently imposes either structure or discipline in design. Almost all programming languages currently in use are designed to enforce structure and discipline at the level of coding, of course, but at higher levels, good design still requires conscious adherence to methodologies that enforce a set of architectural best practices. Microservices architecture, on the other hand, does impose by its very nature a very definite kind of structural discipline at the level of individual resources. Just as it makes no sense to cut a basic microservice into arbitrary chunks and separate them, it makes equally little sense to bundle an individual service with another related or unrelated service in an arbitrary package, when the level of packaging that you're working with is typically one package per container.

Microservices Architecture Requires New Patterns

In other words, you really do need new patterns in order to successfully design microservices architecture. The need for patterns starts at the top. If you are refactoring a monolithic program into a microservices-based application, the first pattern that you need to consider is the one that you will use for decomposition. What pattern will you use as a guide in breaking the program down into microservices? What are the basic decomposition patterns? At the higher levels of decomposition, it makes sense to consider such functional criteria as broad areas of task-based responsibility (subdomains), or large-scale business/revenue-generating responsibilities (business capabilities). In practice, there is considerable overlap between these two general functional patterns, since a business' internal large-scale organization of tasks is likely to closely match the organization of business responsibilities. In either case, decomposition at this level should follow the actual corporate-level breakdown of basic business activities, such as inventory, delivery, sales, order processing, etc. In the subsequent stages of decomposition, you can define groups of microservices, and ultimately individual microservices. This calls for a different and much more fine-grained pattern of decomposition—one which is based largely on interactions within the application, with individual users, or both.

Decomposition Patterns for Microservices Architecture

There are several ways to decompose applications at this level, depending in part on the nature of the application, as well as the pattern for deployment. You can combine decomposition patterns, and in many if not most cases, this will be the most practical and natural approach. Among the key microservice-level decomposition patterns are:

Decomposition by Use Case

In many respects, this pattern is the logical continuation of a large-scale decomposition pattern, since business capabilities and subdomains are both fundamentally use case-based. In this pattern, you first identify use cases: sequences of actions which a user would typically follow in order to perform a task. Note that a user (or actor) does not need to be a person; it can, in fact, be another part of the same application. A use case could be something as obvious and common as filling out an online form or retrieving and displaying a database record. It could also include tasks such as processing and saving streaming data from a real-time input device, or polling multiple devices to synchronize data. If it seems fairly natural to model a process as a unified set of interactions between actors with an identifiable purpose, it is probably a good candidate for the use case decomposition pattern.

Decomposition by Resources

In this pattern, you define microservices based on the resources (storage, peripherals, databases, etc.) that they access or control. This allows you to create a set of microservices which function as channels for access to individual resources (following the basic pattern of OS-based peripheral/resource drivers), so that resource-access code does not need to be duplicated in other parts of the application. Isolating resource interfaces in specific microservices has the added advantage of allowing you to accommodate changes to a resource by updating only the microservice that accesses it directly.

Decomposition by Responsibilities/Functions

This pattern is likely to be most useful in the case of internal operations which perform a clearly defined set of functions that are likely to be shared by more than one part of the application. Such responsibility domains might include shopping cart checkout, inventory access, or credit authorization. Other microservices could be defined in terms of relatively simple functions (as is the case with many built-in OS-based microservices) rather than more complex domains.

Microservices Architecture Deployment Patterns

Beyond decomposition, there are other patterns of considerable importance in building a microservices-based architecture. Among the key patterns are those for deployment. There are three underlying patterns for microservices deployment, along with a few variations:

Single Host/Multiple Services

In this pattern, you deploy multiple instances of a service on a single host. This reduces deployment overhead, and allows greater efficiency through the use of shared resources. It has, however, greater potential for conflict and security problems, since services interacting with different clients may be insufficiently isolated from each other.

Single Service per Host, Virtual Machine, or Container

This pattern deploys each service in its own environment. Typically, this environment will be a virtual machine (VM) or container, although there are times when the host may be defined at a less abstract level. This kind of deployment provides a high degree of flexibility, with little potential for conflict over system resources. Services are either entirely isolated from those used by other clients (as is the case with single-service-per-VM deployment), or can be effectively isolated while sharing some lower-level system resources (i.e., containers with appropriate security features). Deployment overhead may be greater than in the single host/multiple services model, but in practice, this may not represent a significant cost in time or resources.

Serverless/Abstracted Platform

In this pattern, the service runs directly on pre-configured infrastructure made available as a service (which may be priced on a per-request basis); deployment may consist of little more than uploading the code, with a small number of configuration settings on your part. The deployment system places the code in a container or VM, which it manages. All you need to make use of the microservice is its address. Among the most common serverless environments are AWS Lambda, Azure Functions, and Google Cloud Functions. Serverless deployment requires very little overhead. It does, however, impose significant limitations, since the uploaded code must be able to meet the (often strict) requirements of the underlying infrastructure. This means that you may have a limited selection of programming languages and interfaces to outside resources. Serverless deployment also typically rules out stateful services.

Applying Other Patterns to Microservices Architecture

There are a variety of other patterns which apply to one degree or another to microservices deployment. These include patterns for communicating with external applications and services, for managing data, for logging, for testing, and for security. In many cases, these patterns are similar for both monolithic and microservices architecture, although some patterns are more likely to be applicable to microservices than others. Fully automated parallel testing in a virtualized environment, for example, is typically the most appropriate pattern for testing VM/container-based microservices. As is so often the case in software development (as well as more traditional forms of engineering), the key to building a successful microservices architecture lies in finding the patterns that are most suitable to your application, understanding how they work, and adapting them to the particular circumstances of your deployment. Use of the appropriate patterns can provide you with a clear and accurate roadmap to successful microservices architecture refactoring and deployment.

About the Author

Michael Churchman is involved in the analysis of software development processes and related engineering management issues.

Top Patterns for Building a Successful Microservices Architecture is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.
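Returning to the serverless pattern described above, here is a minimal sketch of a single-responsibility microservice written as an AWS Lambda-style Python handler. The service, the event fields, and the in-memory price table are purely illustrative assumptions; a real function would read from a datastore rather than a hard-coded dictionary.

import json

# Kept in memory only for illustration; serverless functions should hold no
# state between invocations, so real lookups would go to an external store.
_PRICES = {"sku-123": 19.99, "sku-456": 5.49}

def handler(event, context):
    """Entry point invoked by the platform for each request."""
    sku = (event or {}).get("sku")
    if sku not in _PRICES:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown sku"})}
    return {"statusCode": 200, "body": json.dumps({"sku": sku, "price": _PRICES[sku]})}

The appeal of the pattern is visible even in a toy example: the function does one narrowly defined job, holds no state, and leaves packaging, scaling, and routing to the platform.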


Making the Most of AWS Lambda Logs

How can you get the most out of monitoring your AWS Lambda functions? In this post, we'll take a look at the monitoring and logging data that Lambda makes available, and the value that it can bring to your AWS operations. You may be thinking, "Why should I even monitor AWS Lambda? Doesn't AWS take care of all of the system and housekeeping stuff with Lambda? I thought that all the user had to do was write some code and run it!"

A Look at AWS Lambda

If that is what you're thinking, then for the most part, you're right. AWS Lambda is designed to be a simple plug-and-play experience from the user's point of view. Its function is simply to run user-supplied code on request in a standardized environment. You write the code, specifying some basic configuration parameters, and upload the code, the configuration information, and any necessary dependencies to AWS Lambda. This uploaded package is called a Lambda function. To run the function, you invoke it from an application running somewhere in the AWS ecosystem (EC2, S3, or most other AWS services). When Lambda receives the invoke request, it runs your function in a container; the container pops into existence, does its job, and pops back out of existence. Lambda manages the containers—you don't need to (and can't) do anything with them. So there it is—Lambda. It's simple, it's neat, it's clean, and it does have some metrics which can be monitored, and which are worth monitoring.

Which Lambda Metrics to Monitor?

So, which Lambda metrics are important, and why would you monitor them? There are two kinds of monitoring information which AWS Lambda provides: metrics displayed in the AWS CloudWatch console, and logging data, which is handled by both CloudWatch and the CloudTrail monitoring service. Both types of data are valuable to the user—the nature of that value and the best way to make use of it depend largely on the type of data.

Monitoring Lambda CloudWatch Console Metrics

Because AWS Lambda is strictly a standardized platform for running user-created code, the metrics that it displays in the CloudWatch console are largely concerned with the state of that code. These metrics include the number of invocation requests that a function receives, the number of failures resulting from errors in the function, the number of failures in user-configured error handling, the function's duration (or running time), and the number of invocations that were throttled as a result of the user's concurrency limits. These are useful metrics, and they can tell you a considerable amount about how well the code is working, how well the invocations work, and how the code operates within its environment. They are, however, largely useful in terms of functionality, debugging, and day-to-day (or millisecond-to-millisecond) operations.

Monitoring and Analyzing AWS Lambda Logs

With AWS Lambda, logging data is actually a much richer source of information in many ways. This is because logging provides a cumulative record of actions over time, including all API calls made in connection with AWS Lambda. Since Lambda functions exist for the most part to provide support for applications and websites running on other AWS services, Lambda log data is the main source of data about how a function is doing its job. "Logs," you say, like Indiana Jones surrounded by hissing cobras. "Why does it always have to be logs? Digging through logs isn't just un-fun, boring, and time-consuming. More often than not, it's counter-productive, or just plain impractical!" And once again, you're right. There isn't much point in attempting to manually analyze AWS Lambda logs. In fact, you have three basic choices: ignore the logs, write your own script for extracting and analyzing log data, or let a monitoring and analytics service do the work for you. For the majority of AWS Lambda users, the third option is by far the most practical and the most useful.

Sumo Logic's Log Analytics Dashboards for Lambda

To get a clearer picture of what can be done with AWS Lambda metrics and logging data, let's take a look at how the Sumo Logic App for AWS Lambda extracts useful information from the raw data, and how it organizes that data and presents it to the user. On the AWS side, you can use a Lambda function to collect CloudWatch logs and route them to Sumo Logic. Sumo integrates accumulated log and metric information to present a comprehensive picture of your AWS Lambda function's behavior, condition, and use over time, using three standard dashboards.

The Lambda Overview Dashboard

The Overview dashboard provides a graphic representation of each function's duration, maximum memory usage, compute usage, and errors. This allows you to quickly see how individual functions perform in comparison with each other. The Overview dashboard also breaks duration, memory, and compute usage down over time, making it possible to correlate Lambda function activity with other AWS-based operations, and it compares the actual values for all three metrics with their predicted values over time. This last set of values (actual vs. predicted) can help you pinpoint performance bottlenecks and allocate system resources more efficiently.

The Lambda Duration and Memory Dashboard

Sumo Logic's AWS Lambda Duration and Memory dashboard displays duration and maximum memory use for all functions over a 24-hour period in the form of both outlier and trend charts. The Billed Duration by Hour trend chart compares actual billed duration with predicted duration on an hourly basis. In a similar manner, the Unused Memory trend chart shows used, unused, and predicted unused memory size, along with available memory. These charts, along with the Max Memory Used box plot chart, can be very useful in determining when and how to balance function invocations and avoid excessive memory over- or underuse.

The Lambda Usage Dashboard

The Usage dashboard breaks down requests, duration, and memory usage by function, along with requests by version alias. It includes actual request counts broken down by function and version alias. The Usage dashboard also includes detailed information on each function, including individual request ID, duration, billing, memory, and time information for each request. The breakdown into individual requests makes it easy to identify and examine specific instances of a function's invocation, in order to analyze what is happening with that function on a case-by-case level. It is integrated, dashboard-based analytics such as those presented by the Sumo Logic App for AWS Lambda that make it not only possible but easy to extract useful data from Lambda, and to truly make the most of AWS Lambda monitoring.
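As a small illustration of what can be pulled out of that raw data, the sketch below parses the REPORT lines that Lambda writes to CloudWatch Logs and summarizes duration and peak memory use across invocations. This is a simplified, hypothetical script, not a description of how the Sumo Logic App works; adjust the regular expression if your log lines carry different fields.

import re

REPORT = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<used_mb>\d+) MB"
)

def summarize(log_lines):
    """Return average duration and worst-case memory use across invocations."""
    durations, peaks = [], []
    for line in log_lines:
        m = REPORT.search(line)
        if m:
            durations.append(float(m.group("duration_ms")))
            peaks.append(int(m.group("used_mb")))
    if not durations:
        return None
    return {
        "invocations": len(durations),
        "avg_duration_ms": sum(durations) / len(durations),
        "max_memory_used_mb": max(peaks),
    }

Even a rough summary like this makes it easier to spot over-provisioned memory or slowly creeping duration, which is exactly the kind of signal the dashboards above surface automatically.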


5 Log Monitoring Moves to Wow Your Business Partner

Looking for some logging moves that will impress your business partner? In this post, we'll show you a few. But first, a note of caution: if you're going to wow your business partner, make a visiting venture capitalist's jaw drop, or knock the socks off of a few stockholders, you could always accomplish that with something that has a lot of flash and not much more, or you could show them something that has real and lasting substance, and will make a difference in your company's bottom line. We've all seen business presentations filled with flashy fireworks, and we've all seen how quickly those fireworks fade away.

Around here, though, we believe in delivering value—the kind that stays with your organization and gives it a solid foundation for growth. So, while the logging moves that we're going to show you do look good, the important thing to keep in mind is that they provide genuine, substantial value—and discerning business partners and investors (the kind that you want to have in your corner) will recognize this value quickly.

Why Is Log Monitoring Useful?

What value should logs provide? Is it enough just to accumulate information so that IT staff can pick through it as required? That's what most logs do, varying mostly in the amount of information and the level of detail. And most logs, taken as raw data, are very difficult to read and interpret; the most noticeable result of working with raw log data, in fact, is the demand that it puts on IT staff time.

5 Log Monitoring Steps to Success

Most of the value in logs is delivered by means of systems for organizing, managing, filtering, analyzing, and presenting log data. And needless to say, the best, most impressive, most valuable logging moves are those which are made possible by first-rate log management. They include:

1. Quick, on-the-spot, easy-to-understand analytics. Pulling up instant, high-quality analytics may be the most impressive move that you can make when it comes to logging, and it is definitely one of the most valuable features that you should look for in any log management system. Raw log data is a gold mine, but you need to know how to extract and refine the gold. A high-quality analytics system will extract the data that's valuable to you, based on your needs and interests, and present it in ways that make sense. It will also allow you to quickly recognize and understand the information that you're looking for.

2. Monitoring real-time data. While analysis of cumulative log data is extremely useful, there are also plenty of situations where you need to see what is going on right at the moment. Many of the processes that you most need to monitor (including customer interaction, system load, resource use, and hostile intrusion/attack) are rapid and transient, and there is no substitute for a real-time view into such events. Real-time monitoring should be accompanied by the capacity for real-time analytics. You need to be able to both see and understand events as they happen.

3. Fully integrated logging and analytics. There may be processes in software development and operations which have a natural tendency to produce integrated output, but logging isn't one of them. Each service or application can produce its own log, in its own format, based on its own standards, without reference to the content or format of the logs created by any other process. One of the most important and basic functions that any log management system can perform is log integration, bringing together not just standard log files, but also event-driven and real-time data. Want to really impress partners and investors? Bring up log data that comes from every part of your operation, and that is fully integrated into useful, easily understood output.

4. Drill-down to key data. Statistics and aggregate data are important; they give you an overall picture of how the system is operating, along with general, system-level warnings of potential trouble. But the ability to drill down to more specific levels of data—geographic regions, servers, individual accounts, specific services and processes—is what allows you to make use of much of that system-wide data. It's one thing to see that your servers are experiencing an unusually high level of activity, and quite another to drill down and see an unusual spike in transactions centered around a group of servers in a region known for high levels of online credit card fraud. Needless to say, integrated logging and scalability are essential when it comes to drill-down capability.

5. Logging throughout the application lifecycle. Logging integration includes integration across time, as well as across platforms. This means combining development, testing, and deployment logs with metrics and other performance-related data to provide a clear, unified, in-depth picture of the application's entire lifecycle. This in turn makes it possible to look at development, operational, and performance-related issues in context, and to see relationships which might not be visible without such cross-system, full-lifecycle integration.

Use Log Monitoring to Go for the Gold

So there you have it—five genuine, knock-'em-dead logging moves. They'll look very impressive in a business presentation, and they'll tell serious, knowledgeable investors that you understand and care about substance, and not just flash. More to the point, these are logging capabilities and strategies which will provide you with valuable (and often crucial) information about the development, deployment, and ongoing operation of your software. Logs do not need to be junk piles of unsorted, raw data. Bring first-rate management and analytics to your logs now, and turn those junk piles into gold.

5 Log Monitoring Moves to Wow Your Business Partner is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.
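Returning to the drill-down move above, here is a rough Python sketch that starts from a system-wide error count and then breaks the same events down by region and by server. The event fields (status, region, server) are hypothetical placeholders for whatever your aggregated logs actually contain.

from collections import Counter

def drill_down(events):
    # Treat 5xx responses as errors; adjust to your own definition of "error".
    errors = [e for e in events if e.get("status", 200) >= 500]
    print(f"total errors: {len(errors)} of {len(events)} events")
    by_region = Counter(e["region"] for e in errors)
    for region, count in by_region.most_common():
        print(f"  region {region}: {count} errors")
    # Drill one level further into the worst region.
    if by_region:
        worst, _ = by_region.most_common(1)[0]
        by_server = Counter(e["server"] for e in errors if e["region"] == worst)
        for server, count in by_server.most_common(5):
            print(f"    {worst}/{server}: {count} errors")

The point is not the code itself but the shape of the workflow: an aggregate number, then progressively narrower slices of the same integrated data until a specific cluster of servers or accounts stands out.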


Technical Debt and its Impact on DevOps


Using Analytics to Support the Canary Release

When you roll out a new deployment, how do you roll? With a big bang? A blue/green deployment? Or do you prefer a Canary Release? There's a lot to be said for the Canary Release strategy of testing new software releases on a limited subset of users. It reduces the risk of an embarrassing and potentially costly public failure of your application to a practical minimum. It allows you to test your new deployment in a real-world environment and under a real-world load. It allows a rapid (and generally painless) rollback. And if there's a failure of genuinely catastrophic proportions, only a small subset of your users will even notice the problem. But when you use Canary Release, are you getting everything you can out of the process? A full-featured suite of analytics and monitoring tools is — or should be — an indispensable part of any Canary Release strategy.

The Canary Release Pattern

In a Canary Release, you initially release the new version of your software on a limited number of servers, and make it available to a small subset of your users. You monitor it for bugs and performance problems, and after you've taken care of those, you release it to all of your users. The strategy is named after the practice of taking canaries into coal mines to test the quality of the air; if the canary stopped singing (or died), it meant that the air was going bad. In this case, the "canary" is your initial subset of users; their exposure to your new release allows you to detect and fix the bugs, so your general body of users won't have to deal with them. Ideally, in a strategy such as this, you want to get as much useful information as possible out of your initial sample, so that you can detect not only the obvious errors and performance issues, but also problems which may not be so obvious, or which may be relatively slow to develop. This is where good analytic tools can make a difference.

Using Analytics to Support a Canary Release

In fact, the Canary Release strategy needs at least some analytics in order to work at all. Without any analytics, you would have to rely on extremely coarse-grained sources of information, such as end-user bug reports and obvious crashes at the server end, which are very likely to miss the problems that you actually need to find. Such problems, however, generally will show up in error logs and performance logs. Error statistics will tell you whether the number, type, and concentration (in time or space) of errors is out of the expected range. Even if they can't identify the specific problem, such statistics can suggest the general direction in which the problem lies. And since error logs also contain records of individual errors, you can at least in theory pinpoint any errors which are likely to be the result of newly introduced bugs, or of failed attempts to eliminate known bugs. The problem with identifying individual errors in the log is that any given error is likely to be a very small needle in a very large haystack. Analytics tools which incorporate intelligent searches and such features as pattern analysis and detection of unusual events allow you to identify likely signs of a significant error in seconds. Without such tools, the equivalent search might take hours, whether it uses brute force or carefully crafted regex terms. Even being forced by necessity to do a line-by-line visual scan of an error log, however, is better than having no error log at all.

Logs that monitor such things as performance, load, and load distribution can also be useful in the Canary Release strategy. Bugs which don't produce clearly identifiable errors may show up in the form of performance degradation or excessive traffic. Design problems may also leave identifiable traces in performance logs; poor design can cause traffic jams, or lead to excessive demands on databases and other resources. You can enhance the value of your analytics, and of the Canary Release itself, if you put together an in-depth demographic profile of the user subset assigned to the release. The criteria which you use in choosing the subset, of course, depend on your needs and priorities, as well as the nature of the release. It may consist of in-house users, of a random selection from the general user base, or of users carefully chosen to represent either the general user base or specific types of user. In any of these cases, however, it should be possible to assemble a profile of the users in the subset. If you know how the users in the subset make use of your software (which features they access most frequently, how often they use the major features, and at what times of day, how this use is reflected in server loads, etc.), and if you understand how these patterns of use compare to those of your general user base, the process of extrapolation from Canary Release analytics should be fairly straightforward, as long as you are using analytic tools which are capable of distilling out the information that you need. So yes, Canary Release can be one of the most rewarding deployment strategies — when you take full advantage of what it has to offer by making intelligent use of first-rate analytic tools. Then the canary will really sing!
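As a simple illustration of the kind of canary-versus-baseline comparison discussed above, here is a minimal Python sketch that compares error rates from the two groups and decides whether the canary looks healthy enough to promote. The threshold and the sample numbers are illustrative assumptions only; a production check would use a proper statistical test and look at more than one metric.

def canary_looks_healthy(canary_errors, canary_requests,
                         baseline_errors, baseline_requests,
                         max_relative_increase=0.25):
    """Promote only if the canary's error rate is not materially worse
    than the baseline's, allowing a small relative margin."""
    if canary_requests == 0 or baseline_requests == 0:
        return False  # not enough traffic to judge
    canary_rate = canary_errors / canary_requests
    baseline_rate = baseline_errors / baseline_requests
    return canary_rate <= baseline_rate * (1 + max_relative_increase)

# Example: 12 errors in 4,000 canary requests vs. 250 in 96,000 baseline requests.
print(canary_looks_healthy(12, 4000, 250, 96000))  # True with these made-up numbers

The same comparison can be repeated for latency percentiles, resource use, or any other signal your log analytics can break down by release version.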


Snowflake Configurations and DevOps Automation

“Of course our pipeline is fully automated! Well, we have to do some manual configuration adjustments on a few of our bare metal servers after we run the install scripts, but you know what I mean…” We do know what you mean, but that is not full automation. Call it what it really is — partial automation in a snowflake environment. A snowflake configuration is ad hoc and “unique” to the environment at large. But in DevOps, you need to drop unique configurations and focus on full automation.

What's Wrong With Snowflake Configurations?

In DevOps, a snowflake is a server that requires special configuration beyond that covered by automated deployment scripts. You do the automated deployment, and then you tweak the snowflake system by hand. For a long time (through the '90s, at least), snowflake configurations were the rule. Servers were bare metal, and any differences in hardware configuration or peripherals (as well as most differences in installed software) meant that any changes had to be handled on a system-by-system basis. Nobody even called them snowflakes. They were just normal servers. But what's normal in one era can become an anachronism or an out-and-out roadblock in another era — and nowhere is this more true than in the world of software development. A fully automated, script-driven DevOps pipeline works best when the elements that make up the pipeline are uniform. A scripted deployment to a thousand identical servers may take less time and run more smoothly than deployment to half a dozen servers that require manual adjustments after the script has been run. For more on DevOps pipelines, see "How to Build a Continuous Delivery Pipeline."

No Virtual Snowflakes

A virtual snowflake might be at home on a contemporary Christmas tree, but there's no place for virtual snowflakes in DevOps. Cloud-based virtual environments are by their nature software-configurable; as long as the cloud insulates them from any interaction with the underlying hardware, there is no physical reason for a set of virtual servers running in the same cloud environment to be anything other than identical. Any differences should be based strictly on functional requirements — if there is no functional reason for any of the virtual servers to be different, they should be identical. Why is it important to maintain such a high degree of uniformity? In DevOps, all virtual machines (whether they're full VMs or Docker containers) are containers in much the same way as steel shipping containers. When you ship something overseas, you're only concerned with the container's functional qualities. Uniform shipping containers are functionally useful because they have known characteristics, and they can be filled and stacked efficiently. This is equally true of even the most full-featured virtual machine when it is deployed as a DevOps container. This is all intrinsic to core DevOps philosophy. The container exists solely to deliver the software or services, and should be optimized for that purpose. When delivery to multiple virtual servers is automated and script-driven, optimization requires as much uniformity as possible in server configurations. For more on containers, see "Kubernetes vs. Docker: What Does It Really Mean?"

What About Non-Virtual Snowflakes?

If you only deal in virtual servers, it isn't hard to impose the kind of standardization described above. But real life isn't always that simple; you may find yourself working in an environment where some or all of the servers are bare metal. How do you handle a physical server with snowflake characteristics? Do you throw in the towel and adjust it manually after each deployment, or are there ways to prevent a snowflake server from behaving like a snowflake? As it turns out, there are ways to de-snowflake a physical server — ways that are fully in keeping with core DevOps philosophy. First, however, consider this question: What makes a snowflake server a snowflake? Is it the mere fact that it requires special settings, or is it the need to make those adjustments outside of the automated deployment process (or in a way that interrupts the flow of that process)? A thoroughgoing DevOps purist might opt for the first definition, but in practical terms, the second definition is more than adequate. A snowflake is a snowflake because it must be treated as a snowflake. If it doesn't require any special treatment, it's not a snowflake.

One way to eliminate the need for special treatment during deployment (as suggested by Daniel Lindner) is to install a virtual machine on the server, and deploy software on the virtual machine. The actual deployment would ignore the underlying hardware and interact only with the virtual system. The virtual machine would fully insulate the deployment from any of the server's snowflake characteristics. What if it isn't practical or desirable to add an extra virtual layer? It may still be possible to handle all of the server's snowflake adjustments locally by means of scripts (or automated recipes, as Martin Fowler put it in his original Snowflake Server post) running on the target server itself. These local scripts would need to be able to recognize elements in the deployment which might require adjustments to snowflake configurations, then translate those requirements into local settings and apply them. If the elements that require local adjustments are available as part of the deployment data, the local scripts might intercept that data as the main deployment script runs. But if those elements are not obvious (if, for example, they are part of the compiled application code), it may be necessary to include a table of values which may require local adjustments as part of the deployment script (if not full de-snowflaking, at least a 99.99% de-snowflaking strategy).

So, what is the bottom line on snowflake servers? In an ideal DevOps environment, they wouldn't exist. In the less-than-ideal world where real-life DevOps takes place, they can't always be eliminated, but you can still neutralize most or all of their snowflake characteristics to the point where they do not interfere with the pipeline. For more on virtual machines, see "Docker Logs vs Virtual Machine Logs."
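As a rough sketch of the local adjustment idea described above, the following Python script could run on the target server alongside the main deployment, read a table of per-host overrides, and apply them to the deployed configuration. The file paths, setting names, and JSON layout are hypothetical, and a real setup might use a configuration management tool instead of a hand-rolled script.

import json
import socket

def apply_local_overrides(deployed_config="/etc/myapp/config.json",
                          overrides_table="/etc/myapp/local-overrides.json"):
    """Apply only the settings this particular server needs adjusted."""
    hostname = socket.gethostname()
    with open(deployed_config) as f:
        config = json.load(f)
    with open(overrides_table) as f:
        overrides = json.load(f).get(hostname, {})
    for key, value in overrides.items():
        config[key] = value
    with open(deployed_config, "w") as f:
        json.dump(config, f, indent=2)
    return list(overrides)

if __name__ == "__main__":
    changed = apply_local_overrides()
    print(f"adjusted settings: {changed or 'none'}")

Because the overrides live in a versioned table rather than in someone's memory, the server's special treatment becomes part of the automated pipeline instead of a manual step after it.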


Does Docker Deployment Make Bare Metal Relevant?

Is bare metal infrastructure relevant in a DevOps world? The cloud has reduced hardware to little more than a substrate for the pool of resources that is the cloud itself. Those resources are the important part; the hardware is little more than a formality. Or at least that’s been the standard cloud-vs-metal story, until recently. Times change, and everything that was old does eventually become new again — usually because of a combination of unmet needs, improved technology, and a fresh approach. And the bare-metal comeback is no different. Unmet Needs The cloud is a pool not just of generic, but also shared resources (processor speed, memory, storage space, bandwidth). Even if you pay a premium for a greater share of these things, you are still competing with the other premium-paying customers. And the hard truth is that cloud providers can’t guarantee a consistent, high level of performance. Cloud performance depends on the demand placed on it by other users — demand which you can’t control. If you need reliable performance, there is a good chance that you will not find it in the cloud. This is particularly true if you’re dealing with large databases; Big Data tends to be resource-hungry, and it is likely to do better on a platform with dedicated resources down to the bare-metal level, rather than in a cloud, where it may have to contend with dueling Hadoops. The cloud can present sticky compliance issues, as well. If you’re dealing with formal data-security standards, such as those set by the Securities and Exchange Commission or by overseas agencies, verification may be difficult in a cloud environment. Bare metal provides an environment with more clearly-defined, hardware-based boundaries and points of entry. Improved Technology Even if Moore’s Law has been slowing down to sniff the flowers lately, there have been significant improvements in hardware capabilities, such as increased storage capacity, and the availability of higher-capacity solid state drives, resulting in a major boost in key performance parameters. And technology isn’t just hardware — it’s also software and system architecture. Open-source initiatives for standardizing and improving the hardware interface layers, along with the highly scalable, low-overhead CoreOS, make lean, efficient bare metal provisioning and deployment a reality. And that means that it’s definitely time to look closely at what bare metal is now capable of doing, and what it can now do better than the cloud. A Fresh Approach As technology improves, it makes sense to take a new look at existing problems, and see what could be done now that hadn’t been possible (or easy) before. That’s where Docker and container technology come in. One of the major drawbacks of bare metal in comparison to cloud systems has always been the relative inflexibility of available resources. You can expand such things as memory, storage, and the number of processors, but the hard limit will always be what is physically available to the system; if you want to go beyond that point, you will need to manually install new resources. If you’re deploying a large number of virtual machines, resource inflexibility can be a serious problem. VMs have relatively high overhead; they require hypervisors, and they need enough memory and storage to contain both a complete virtual machine and a full operating system. All of this requires processor time as well. 
In the cloud, with its large pool of resources, it isn't difficult to quickly shift resources to meet rapidly changing demands as virtual machines are created and deleted. In a bare-metal system with hardware-dependent resources, this kind of resource allocation can quickly run up against the hard limits of the system. Docker-based deployment, however, can radically reduce the demands placed on the host system. Containers are built to be lean; they use the kernel of the host OS, and they include only those applications and utilities which must be available locally. If a virtual machine is a bulky box that contains the application being deployed, plus plenty of packing material, a container is a thin wrapper around the application. And Docker itself is designed to manage a large number of containers efficiently, with little overhead. On bare metal, the combination of Docker, a lean, dedicated host system such as CoreOS, and an open-source hardware management layer makes it possible to host a much higher number of containers than virtual machines. In many cases, this means that bare metal's relative lack of flexibility with regard to resources is no longer a factor; if the number of containers that can be deployed using available resources is much greater than the anticipated short-to-medium-term demand, and if the hardware resources themselves are easily expandable, then the cloud really doesn't offer much advantage in terms of resource flexibility.

In effect, Docker moves bare metal from the "can't use" category to "can use" when it comes to the kind of massive deployments of VMs and containers which are a standard part of the cloud environment. This is an important point — very often, it is this change from "can't use" to "can use" that sets off revolutions in the way that technology is applied (most of the history of personal computers, for example, could be described in terms of "can't use"/"can use" shifts), and that change is generally one of perception and understanding as much as it is a change in technology. In the case of Docker and bare metal, the shift to "can use" allows system managers and architects to take a close look at the positive advantages of bare metal in comparison to the cloud. Hardware-based solutions, for example, are often the preferred option in situations where access to dedicated resources is important. If consistent speed and reliable performance are important, bare metal may be the best choice. And the biggest surprises may come when designers start asking themselves, "What can we do with Docker on bare metal that we couldn't do with anything before?" So, does Docker make bare metal relevant? Yes it does, and more than that, it makes bare metal into a new game, with new and potentially very interesting rules.

About the Author

Michael Churchman (@mazorstorn) started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry, working on the prototype for the ground-breaking laser-disc game Dragon's Lair. He spent much of the '90s in the high-pressure bundled software industry, where near-continuous release cycles and automated deployment were already de facto standards; during that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues.
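To make the overhead argument above concrete, here is a back-of-the-envelope Python sketch with entirely made-up numbers: it estimates how many full VMs versus lean containers might fit on a single bare-metal host when each VM carries a guest OS and each container shares the host kernel. Real capacity planning would also account for CPU, I/O, and scheduling, so treat this as an illustration of the reasoning, not a sizing method.

def instances_that_fit(host_ram_gb, app_ram_gb, per_instance_overhead_gb):
    usable = host_ram_gb * 0.9  # leave roughly 10% for the host system itself
    return int(usable // (app_ram_gb + per_instance_overhead_gb))

HOST_RAM_GB = 256
APP_RAM_GB = 0.5  # the service itself
vm_count = instances_that_fit(HOST_RAM_GB, APP_RAM_GB, per_instance_overhead_gb=1.5)    # assumed guest OS share
container_count = instances_that_fit(HOST_RAM_GB, APP_RAM_GB, per_instance_overhead_gb=0.05)  # assumed runtime share

print(f"~{vm_count} VM instances vs ~{container_count} container instances on the same host")
# With these assumed figures: roughly 115 VMs vs roughly 418 containers.

The exact numbers matter far less than the ratio: once per-instance overhead shrinks, the fixed resources of a bare-metal host stop being the limiting factor for typical deployment volumes.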


VPC Flow Logging in a DevOps Environment


Log Analysis in a DevOps Environment

Log analysis is a first-rate debugging tool for DevOps. But if all you're using it for is finding and preventing trouble, you may be missing some of the major benefits of log analysis. What else can it offer you? Let's talk about growth. First of all, not all trouble shows up in the form of bugs or error messages; an "error-free" system can still be operating far below optimal efficiency by a variety of important standards. What is the actual response time from the user's point of view? Is the program eating up clock cycles with unnecessary operations? Log analysis can help you identify bottlenecks, even when they aren't yet apparent in day-to-day operations.

Use Cases for Log Analysis

Consider, for example, something as basic as database access. As the number of records grows, access time can slow down, sometimes significantly; there's nothing new about that. But if the complexity and the number of tables in the database are also increasing, those factors can also slow down retrieval. If the code that deals with the database is designed for maximum efficiency in all situations, it should handle the increased complexity with a minimum of trouble. The tricky part of that last sentence, however, is the phrase "in all situations". In practice, most code is designed to be efficient under any conditions which seem reasonable at the time, rather than in perpetuity. A routine that performs an optional check on database records may not present any problem when the number of records is low, or when it only runs occasionally, but it may slow the system down if the number of affected records is too high, or if it is done too frequently. As conditions change, hidden inefficiencies in existing code are likely to make themselves known, particularly if the changes put greater demands on the system. As inefficiencies of this kind emerge (but before they present obvious problems in performance), they are likely to show up in the system's logs. As an example, a gradual increase in the time required to open or close a group of records might appear, which gives you a chance to anticipate and prevent any slowdowns that it might cause. Log analysis can find other kinds of potential bottlenecks as well. For example, intermittent delays in response from a process or an external program can be hard to detect simply by watching overall performance, but they will probably show up in the log files. A single process with significant delays in response time can slow down the whole system. If two processes are dependent on each other, and they each have intermittent delays, they can reduce the system's speed to a crawl or even bring it to a halt. Log analysis should allow you to recognize these delays, as well as the dependencies which can amplify them.

Log Data Analytics – Beyond Ops

Software operation isn't the only thing that can be made more efficient by log analysis. Consider the amount of time that is spent in meetings simply trying to get everybody on the same page when it comes to discussing technical issues. It's far too easy to have a prolonged discussion of performance problems and potential solutions without the participants having a clear idea of the current state of the system. One of the easiest ways to bring such a meeting into focus and shorten discussion time is to provide everybody involved with a digest of key items from the logs, showing the current state of the system and highlighting problem areas.

Log analysis can also be a major aid to overall planning by providing a detailed picture of how the system actually performs. It can help you map out which parts of the system are the most sensitive to changes in performance in other areas, allowing you to avoid making alterations which are likely to degrade performance. It can also reveal unanticipated dependencies, as well as suggesting potential shortcuts in the flow of data.

Understanding Scalability via Log Analysis

One of the most important things that log analysis can do in terms of growth is to help you understand how the system is likely to perform as it scales up. When you know the time required to perform a particular operation on 100,000 records, you can roughly calculate the time required to do the same operation with 10,000,000 records. This in turn allows you to consider whether the code that performs the operation will be adequate at a larger scale, or whether you will need to look at a new strategy for producing the same results.

Observability and Baseline Metrics

A log analysis system that lets you establish a baseline and observe changes to metrics in relation to that baseline is of course extremely valuable for troubleshooting, but it can also be a major aid to growth. Rapid notification of changes in metrics gives you a real-time window into the way that the system responds to new conditions, and it allows you to detect potential sensitivities which might otherwise go unnoticed. In a similar vein, a system with superior anomaly detection features will make it much easier to pinpoint potential bottlenecks and delayed-response cascades by alerting you to the kinds of unusual events which are often signatures of such problems. All of these things — detecting bottlenecks and intermittent delays, as well as other anomalies which may signal future trouble, anticipating changes in performance as a result of changes in scale, recognizing inefficiencies — will help you turn your software (and your organization) into the kind of lean, clean system which is so often necessary for growth. And all of these things can, surprisingly enough, come from something as simple as good, intelligent, thoughtful log analysis.
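As a small worked example of the scaling estimate described above, the following Python sketch projects the runtime of an operation measured at 100,000 records out to 10,000,000 records under two common growth assumptions. The numbers are purely illustrative; the real measurement would come from your own logs.

import math

def project_runtime(measured_seconds, measured_n, target_n):
    """Project runtime at a larger scale under linear and n*log(n) growth."""
    linear = measured_seconds * (target_n / measured_n)
    n_log_n = measured_seconds * (target_n * math.log(target_n)) / (measured_n * math.log(measured_n))
    return {"linear": linear, "n_log_n": n_log_n}

# 12 seconds for 100,000 records; what might 10,000,000 records look like?
print(project_runtime(12.0, 100_000, 10_000_000))
# Roughly 1,200 s if the work grows linearly, closer to 1,700 s if it grows as n*log(n).

Comparing a projection like this against the response times you actually observe as the data grows is a quick way to tell whether the code is scaling the way you assumed it would.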


Open Sourcing DevOps

Once again, DevOps is moving the needle. This time, it's in the open source world, and both open source and commercial software may never be the same again. As more and more open source projects have become not only successful, but vital to the individuals and organizations that use them, the open source world has begun to receive some (occasionally grudging) respect from commercial software developers. And as commercial developers have become increasingly dependent on open source tools, they have begun to take the open source process itself more seriously.

Today some large corporations have begun to actively participate in the open source world, not merely as users, but as developers of open source projects. SAP, for example, has a variety of projects on GitHub, and Capital One has just launched its open source Hygeia project, also on GitHub. Why would large commercial software developers place their code in open source repositories? Needless to say, they're making only a limited number of projects available as open source, but for companies used to treating their proprietary source code as a valuable asset that needs to be closely guarded, even limited open source exposure is a remarkable concession. It's reasonable to assume that they see significant value in the move. What kind of payoff are they looking for?

Hiring. The open source community is one of the largest and most accessible pools of programming talent in software history, and a high percentage of the most capable participants are underemployed by traditional standards. Posting an attractive-looking set of open source projects is an easy way to lure new recruits. It allows potential employees to get their feet wet without making any commitments (on either end), and it says, "Hey, we're a casual, relaxed open source company — working for us will be just like what you're doing now, only you'll be making more money!"

Recognition. It's an easy way to recognize employees who have made a contribution — post a (non-essential) project that they've worked on, giving them credit for their work. It's cheaper than a bonus (or even a trophy), and the recognition that employees receive is considerably more public and lasting than a corporate award ceremony.

Development of open source as a resource. Large corporations are already major users and sponsors of open source software, often with direct involvement at the foundation level. By entering the open source world as active contributors, they are positioning themselves to exert even greater influence on its course of development by engaging the open source community on the ground.

Behind the direct move into open source is also the recognition that the basic model of the software industry has largely shifted from selling products, which by their nature are at least somewhat proprietary, to selling services, where the unique value of what is being sold depends much more on the specific combination of services being provided, along with the interactions between the vendor and the customer. The backend code at a website can all be generic; the "brand" — the combination of look-and-feel, services provided, name recognition, and trademarks — is the only thing that really needs to be proprietary. And even when other providers manage to successfully clone a brand, they may come up short, as Facebook's would-be competitors have discovered.
Facebook is an instructive example, because (even though its backend code is unique and largely proprietary) the unique service which it provides, and which gives it its value, is the community of users — something that by its nature isn’t proprietary. In the service model, the uniqueness of tools becomes less and less important. In a world where all services used the same basic set of tools, individual service providers could and would still distinguish themselves based on the combinations of services that they offered and the intangibles associated with those services. This doesn’t mean that the source code for SAP’s core applications is about to become worthless, of course. Its value is intimately tied to SAP’s brand, its reputation, its services, and perhaps more than anything, to the accumulated expertise, knowledge, and experience which SAP possesses at the organizational level. As with Facebook, it would be much easier to clone any of SAP’s applications than it would be to clone these intangibles. But the shift to services does mean that for large corporate developers like SAP, placing the code for new projects (particularly for auxiliary software not closely tied to their core applications) in an open source repository may be a reasonable option. The boundary between proprietary and open source software is no longer the boundary between worlds, or between commercial developers and open source foundations. It is now more of a thin line between proprietary and open source applications (or components, or even code snippets) on an individual basis, and very possibly operating within the same environment. For current and future software developers, this does present a challenge, but one which is manageable: to recast themselves not as creators of unique source code, or even as developers of unique applications, but rather as providers of unique packages of applications and services. These packages may include both proprietary and open source elements, but their value will lie in what they offer the user as a package much more than it lies in the intellectual property rights status of the components. This kind of packaging has always been smart business, and the most successful software vendors have always made good use of it. We are rapidly entering a time when it may be the only way to do business.


DevOps is a Strategy