September 22, 2017
When it comes to monitoring and logging in Docker, the recommended pathway for developers has been to have the container write to its standard output and let Docker collect the output. You then configure Docker to either store it in files or send it to syslog. Another option is to write to a directory, with the application dropping plain log files into the typical /var/log location, and then share that directory with another container.
In practice, when you start the first container, you indicate that /var/log will be a “volume,” essentially a special directory that can then be shared with another container. Then you can run tail -f in a separate container to inspect those logs. Running tail by itself isn’t extremely exciting, but it becomes much more meaningful if you want to run a log collector that takes those logs and ships them somewhere. The reason is that you shouldn’t have to synchronize the application and logging containers (for example, where the logging system needs Java or Node.js because that is how it ships logs). The application and logging containers should not have to agree on specific dependencies, and risk breaking each other’s code.
However, this isn’t the only way to log in Docker. Remember the 12-Factor app, a methodology for building SaaS applications, which recommends limiting yourself to one process per container as a best practice, with each process running unbuffered and sending its data to stdout. There have been numerous options for container logging since the pre-Docker 1.6 days, and some are better than others.
Docker 1.6 added three log drivers: json-file (the default, which backs docker logs), syslog, and none (the null driver). The driver interface was meant to be the smallest subset of functionality a logging driver has to implement. Stdout and stderr would still be the source of logging for containers, but Docker takes the raw streams from the containers and creates discrete messages, delimited by writes, that are then sent to the logging drivers. Version 1.7 added the ability to pass parameters to drivers, and Docker 1.9 made tags available to the drivers. Importantly, Docker 1.10 allows syslog to run encrypted, thus allowing companies like Sumo Logic to send logs securely to the cloud.
There have also been recent proposals for a Google Cloud Logging driver, as well as a TCP, UDP, and Unix domain socket driver. The tradeoff of building drivers into the engine has been summed up this way: “As part of the Docker engine, you need to go through the engine commit protocol. This is good, because there’s a lot of review stability. But it is also suboptimal because it is not really modular, and it adds more and more dependencies on third party libraries.”
In fact, others have suggested the drivers be external plugins, similar to how volumes and networks work. Plugins would allow developers to write custom drivers for their specific infrastructure, and it would enable third-party developers to build drivers without having to get them merged upstream and wait for the next Docker release.
To get real value from machine-generated data, you need to look at “comprehensive monitoring.” There are five requirements to enable comprehensive monitoring.
Let's start with events. The Docker API makes it trivial to subscribe to the event stream. Events contain lots of interesting information. The full list is well described in the Docker API docs, but suffice it to say you can track containers as they come and go, as well as observe containers getting killed, and other interesting things, such as out-of-memory situations. Docker has consistently added new events with every version, so this is a gift that will keep on giving in the future.
Think of Docker events as nothing but logs. And they are very nicely structured—it's all just JSON. If, for example, you load this into your log aggregation solution, you can now track which container is running where. You can also track trends - for example, which images are run in the first place, and how often they are being run. Or why 10x more containers are suddenly being started in this period versus before, and so on. This probably doesn't matter much for personal development, but once you have fleets, this is a super juicy source of insight. Lifecycle tracking for all your containers will matter a lot.
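As a rough illustration, here is a minimal sketch using the Docker Engine Go SDK (github.com/docker/docker/client) that subscribes to the event stream and emits each event as one JSON line, ready to forward to an aggregator. Option and type names follow older SDK releases and have shifted slightly across versions, so treat this as a sketch rather than a drop-in integration.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	// Talk to the local daemon (DOCKER_HOST and friends are honored by FromEnv).
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}

	// Subscribe to the engine's event stream: container start/stop/die,
	// OOM kills, image pulls, and so on.
	msgs, errs := cli.Events(context.Background(), types.EventsOptions{})
	for {
		select {
		case msg := <-msgs:
			// Each event is already structured data; emit it as one JSON log
			// line, ready to be shipped to a log aggregator.
			line, _ := json.Marshal(msg)
			fmt.Println(string(line))
		case err := <-errs:
			log.Fatal(err)
		}
	}
}
```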
Docker events, among other things, allow us to see containers come and go. What if we also wanted to track the configurations of those containers? Maybe we want to track drift of run parameters, such as volume settings, or capabilities and limits. The container image is immutable, but what about the invocation? Having detailed records of container startup configurations is, in my mind, another piece of the puzzle on the way to total visibility. Orchestration solutions will provide those settings, sure, but who is telling those solutions what to do?
From experience, we know that deployment configurations are inevitably going to drift, and we have found the root cause of otherwise inscrutable problems there more than once. Docker allows us to use the inspect API to get the container configuration. Again, in my mental model, that's just a log. Send it to your aggregator. Alert on deviations, and use the data after the fact for troubleshooting. Docker provides this info in a clean and convenient format.
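Here is a small sketch of that idea with the Docker Go SDK: inspect a container (name or ID passed on the command line) and treat the full configuration as a single log record. The specific drift fields printed are just examples of things worth watching.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"

	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}

	// Container name or ID is passed on the command line.
	info, err := cli.ContainerInspect(context.Background(), os.Args[1])
	if err != nil {
		log.Fatal(err)
	}

	// A few fields worth watching for drift: image, volume binds, log driver.
	fmt.Printf("image=%s binds=%v log-driver=%s\n",
		info.Config.Image, info.HostConfig.Binds, info.HostConfig.LogConfig.Type)

	// Or simply treat the whole configuration as one log record and ship it.
	record, _ := json.Marshal(info)
	fmt.Println(string(record))
}
```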
Well, obviously, it would be great to have logs, right? Turns out there are many different ways to deal with logs in Docker, and new options are being enabled by the new log driver API. Not everybody is quite there yet in 12-factor land, but then again there are workarounds for when you need fat containers and have to collect logs from files inside of containers.
More and more people are following the best practice of writing logs to standard out and standard error, and it is pretty straightforward to grab those logs from the logs API and forward them from there. The Logspout approach, for example, is really neat. It uses the event API to watch which containers get started, then turns around, attaches to the log endpoint, and pumps the logs somewhere. Easy and complete, and you have all the logs in one place for troubleshooting, analytics, and alerting.
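The sketch below shows the general shape of that watch-and-attach pattern in Go (it is not Logspout itself): filter the event stream for container starts, then follow each new container's logs. The stdcopy helper demultiplexes the combined stdout/stderr stream; in a real collector you would forward to your aggregator instead of the local stdout/stderr.

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/api/types/filters"
	"github.com/docker/docker/client"
	"github.com/docker/docker/pkg/stdcopy"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	// Watch for containers starting...
	f := filters.NewArgs()
	f.Add("event", "start")
	msgs, errs := cli.Events(ctx, types.EventsOptions{Filters: f})

	for {
		select {
		case msg := <-msgs:
			// ...then attach to each new container's log endpoint and follow it.
			go func(id string) {
				rc, err := cli.ContainerLogs(ctx, id, types.ContainerLogsOptions{
					ShowStdout: true, ShowStderr: true, Follow: true, Timestamps: true,
				})
				if err != nil {
					log.Println(err)
					return
				}
				defer rc.Close()
				// The stream multiplexes stdout/stderr; swap os.Stdout/os.Stderr
				// for writers that forward to your log collector.
				stdcopy.StdCopy(os.Stdout, os.Stderr, rc)
			}(msg.Actor.ID)
		case err := <-errs:
			log.Fatal(err)
		}
	}
}
```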
Since the release of Docker 1.5, container-level statistics are exposed via a new API. Now you can alert on the "throttling_data" information, for example - how about that? Again (and at this point, this is getting repetitive, perhaps), this data should be sucked into a centralized system. Ideally, this is the same system that already has the events, the configurations, and the logs! Logs can then be correlated with the metrics and events. There are many pieces to the puzzle, but all of this data can be extracted from Docker pretty easily today.
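As a sketch of what that looks like, the following Go snippet pulls a one-shot stats sample for a container and picks out the CPU throttling and memory numbers. The local struct only models a few fields of the stats payload, with JSON tags matching the field names the stats endpoint returns.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"os"

	"github.com/docker/docker/client"
)

// statsSample models just the fields we care about here; tags follow the
// JSON returned by the stats endpoint (cpu_stats.throttling_data, memory_stats).
type statsSample struct {
	CPUStats struct {
		ThrottlingData struct {
			Periods          uint64 `json:"periods"`
			ThrottledPeriods uint64 `json:"throttled_periods"`
			ThrottledTime    uint64 `json:"throttled_time"`
		} `json:"throttling_data"`
	} `json:"cpu_stats"`
	MemoryStats struct {
		Usage uint64 `json:"usage"`
		Limit uint64 `json:"limit"`
	} `json:"memory_stats"`
}

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}

	// One-shot stats sample for the container named on the command line.
	resp, err := cli.ContainerStats(context.Background(), os.Args[1], false)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var s statsSample
	if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("throttled_periods=%d throttled_time_ns=%d mem_usage=%d mem_limit=%d\n",
		s.CPUStats.ThrottlingData.ThrottledPeriods, s.CPUStats.ThrottlingData.ThrottledTime,
		s.MemoryStats.Usage, s.MemoryStats.Limit)
}
```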
In all the excitement around APIs for monitoring data, let's not forget that we also need to have host level visibility. A comprehensive solution should therefore also work hard to get the Docker daemon logs, and provide a way to get any other system level logs that factor into the way Docker is being put to use on the hosts of the fleet. Add host level statistics to this and now performance issues can be understood in a holistic fashion - on a container basis, but also related to how the host is doing. Maybe there's some intricate interplay between containers based on placement that pops up on one host but not the other? Without quick access to the actual data, you will scratch your head all day.
What's the desirable user experience for a comprehensive monitoring solution for Docker? Thanks to the API-based approach that allows us to get to all the data either locally or remotely, it should be easy to encapsulate all the monitoring data acquisition and forwarding into a container that can either run remotely, if the Docker daemons support remote access, or as a system container on every host. Depending on how the emerging orchestration solutions approach this, it might not even be too crazy to assume that the collection container could simply attach to a master daemon. It seems Docker Swarm might make this possible. Super simple, just add the URL to the collector config and go.
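To give a feel for the remote option, here is a small sketch, again using the Docker Go SDK, that connects to a hypothetical remote daemon over TLS and lists its containers. The hostname and certificate paths are placeholders; the same code works against the local daemon by swapping in client.FromEnv.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	// Hypothetical remote daemon with TLS enabled on port 2376.
	cli, err := client.NewClientWithOpts(
		client.WithHost("tcp://docker-host.example.com:2376"),
		client.WithTLSClientConfig("/certs/ca.pem", "/certs/cert.pem", "/certs/key.pem"),
		client.WithAPIVersionNegotiation(),
	)
	if err != nil {
		log.Fatal(err)
	}

	// List everything the remote daemon is running (and has run).
	containers, err := cli.ContainerList(context.Background(), types.ContainerListOptions{All: true})
	if err != nil {
		log.Fatal(err)
	}
	for _, c := range containers {
		fmt.Println(c.ID[:12], c.Image, c.State)
	}
}
```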
In its default configuration, our containerized Collector agent will use the Docker API to collect the logs and statistics (metrics) from all containers, and the events that are emitted from the Docker Engine. Unless configured otherwise, the Collector will monitor all containers that are currently active, as well as any containers that are started and stopped subsequently. Within seconds, the latest version of the Collector container will be downloaded, and all of the signals coming from your Docker environment will be pumped up to Sumo Logic’s platform.
Using the API has its advantages. It allows us to get all three telemetry types (logs, metrics, and events), we can query for additional metadata during container startup, we don’t have to accommodate different log file locations, and the integration is the same regardless of whether you log to files or to journald.
The other advantage of this approach is the availability of a data collection agent that provides additional data processing capabilities and ensures reliable data delivery. Data processing capabilities include multiline processing, plus filtering and masking of data before it leaves the host. This last capability is important when considering compliance requirements such as PCI or HIPAA. Also important from a compliance standpoint is reliability. All distributed logging systems must be able to accommodate networking issues or impedance mismatches, such as latency or endpoint throttling. These are all well-covered issues when using the Sumo Logic Collector Agent.
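To make the masking idea concrete, here is a purely illustrative Go sketch that redacts card-like numbers from a log line before it leaves the host. The pattern and replacement text are hypothetical; in practice, rules like these are configured in the collection agent rather than hard-coded.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative only: match bare 13-16 digit runs that look like card numbers.
var cardLike = regexp.MustCompile(`\b\d{13,16}\b`)

// mask redacts card-like numbers so they never leave the host in clear text.
func mask(line string) string {
	return cardLike.ReplaceAllString(line, "****REDACTED****")
}

func main() {
	fmt.Println(mask("charge ok card=4111111111111111 amount=12.50"))
	// Output: charge ok card=****REDACTED**** amount=12.50
}
```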
Lack of multiline logging support has always plagued Docker logging.
The default Docker logging drivers, and the existing 3rd party logging drivers, have not supported multiline log messages, and for the most part, they still do not.
One of Sumo Logic’s strengths has always been its ability to rejoin multiline log messages back into a single log message. This is an especially important issue to consider when monitoring JVM-based apps, and working with stack traces. Sumo Logic automatically infers common boundary patterns, and supports custom message boundary expressions. We ensure that our Docker Log Source and our Docker Logging Plugin maintain these same multiline processing capabilities. The ability to maintain multiline support is one of the reasons why we recommend using our custom Docker API based integration over simply reading the log files from the host.
Generally speaking, reading container logs from the file system is a fine approach. However, when the logs are wrapped in JSON and ornamented with additional metadata, multiline processing becomes far more difficult. Other logging drivers are starting to consider this issue, no doubt based on market feedback. However, their capabilities are far less mature than Sumo’s.
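To see why the JSON wrapping matters, consider the json-file driver, which turns every physical line of a stack trace into its own {"log": ..., "stream": ..., "time": ...} record. The sketch below unwraps those records and rejoins them using a simple, illustrative boundary expression (a line starting with a timestamp begins a new message). This is not Sumo Logic's actual boundary inference, just the general shape of the problem.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// jsonFileEntry mirrors the json-file log driver's per-line record format.
type jsonFileEntry struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

// msgStart is an illustrative boundary expression: a new message begins on any
// line that starts with an ISO-8601 timestamp. Stack-trace frames ("at ...",
// "Caused by: ...") do not match, so they fold into the previous message.
var msgStart = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}`)

func main() {
	// e.g. pipe in /var/lib/docker/containers/<id>/<id>-json.log
	scanner := bufio.NewScanner(os.Stdin)
	var current []string

	flush := func() {
		if len(current) > 0 {
			// Emit one logical, possibly multiline, message.
			fmt.Println(strings.Join(current, "\n"))
			current = nil
		}
	}

	for scanner.Scan() {
		var entry jsonFileEntry
		if err := json.Unmarshal(scanner.Bytes(), &entry); err != nil {
			continue // skip lines that are not json-file records
		}
		line := strings.TrimRight(entry.Log, "\n")
		if msgStart.MatchString(line) {
			flush() // previous message is complete
		}
		current = append(current, line)
	}
	flush()
}
```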
The installation of the containerized agent couldn’t be simpler. And with a simple query, you can see the data from all of the containers on your host, with all of the fields extracted and ready to explore. From there, it is easy to install our Docker App to monitor your complete Docker Environment as you scale this out to all of your hosts.
When you deploy the Sumo Logic Collector container across a fleet of hosts, monitoring hundreds or thousands of containers, you will want to be a bit more sophisticated than just running with the default container settings. However, that is beyond the scope of this discussion. When you deploy our Collector Agent as a container, all of the Collector agent’s features are available, and all parameters can be configured. To dive into the advanced configuration options, check out the container’s readme on Docker Hub and read more details in our documentation.
There are times when you require an agentless solution – or you may just prefer one. If you have another way to collect Docker container metrics, and you just need container logs, then a Docker Logging Plugin (referred to as a Logging Driver in earlier versions) may be the perfect solution.
Note: The agentless approach is an ideal solution for AWS ECS users that rely on CloudWatch for their container metrics and events.
How Sumo Logic's Docker Logging Plugin Works
Our Docker Logging Plugin is written in Go, and runs within the Docker Engine. It is configured on a per-container basis, and sends data directly to Sumo Logic’s HTTP endpoint, using a pre-configured “HTTP Source.” You can access our plugin on the new Docker Store, but the best place to read about how to use it is its GitHub repo.
Following the theme set out earlier, it is very easy to use in its default configuration, with a host of advanced options available; the setup steps are covered in the readme mentioned above. The plugin also provides some very important capabilities, including the same multiline processing described earlier.
If, for some reason, these two methods do not satisfy your needs, then one of our many other collection methods (aka “Sources”) will most likely do the trick. Sumo Logic also integrates with various other open source agents and cloud platform infrastructures, and relies on some of them for certain scenarios. Details on all of the above integrations are available in our docs. If you have been using Docker for a while, and have implemented a solution from the early days, such as syslog or logspout, we encourage you to review the approaches defined here, and migrate your solution accordingly.