Imagine the software you’ve just pushed to production is causing latency spikes for users visiting your web app. Your code passed all the tests but something about running with real-life users has uncovered a bug not previously caught by your automated testing. Now you need to figure out what went wrong and how to fix it.
We’ve all been there. These kinds of situations call for detailed, actionable syslog messages to help with your troubleshooting. But good logs don’t just happen. They require careful thought and a good helping of best practices. We can’t help you with the thinking, but we’ve gathered a list of tips and tricks to help you create the most useful log messages possible.
Log messages can come in any format, and there’s no universal standard. But working with large volumes of logs is almost impossible if you don’t have a way to automatically parse log entries to find what you’re searching for. An easy-to-parse format will have better support from tools. One example is JSON, a ubiquitous structured-data format that’s become the de facto standard for many logging applications. It’s both machine- and human-readable and is supported by most languages and runtimes. And it has the added benefit of being compact and efficient to parse.
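As a rough illustration of the JSON approach, here is a minimal sketch using Python’s standard logging module. The JsonFormatter class, the "webapp" logger name, and the choice of fields are assumptions made for this example, not part of any particular logging product.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one compact JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # sort_keys keeps the field order stable, which makes lines easier to compare.
        return json.dumps(entry, sort_keys=True)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("webapp")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("request completed in %d ms", 182)
```

Each call produces a single line that any JSON-aware tool can parse back into named fields.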
Schemas may seem like overkill when thinking about logging since you can often use full-text search to find what you need, but log entries without a predefined structure throw away a lot of context. For example, mandating certain fields be included in the log format gives you a solid foundation of data items that are always going to be present in your logs such as hostnames, IP addresses, and timestamps. Additionally, log formats without schemas are difficult to maintain as new logging calls are added to your software, new team members come on board, and new features are developed. Knowing exactly what information needs to be embedded in log entries helps you write them and helps everyone else read them.
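A schema doesn’t need to be elaborate to be useful. The sketch below assumes a hypothetical set of required fields (REQUIRED_FIELDS) and refuses to emit an entry that omits one of them; the field names and helper functions are illustrative only.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {
    "timestamp": str,
    "hostname": str,
    "ip_address": str,
    "level": str,
    "message": str,
}


def validate_entry(entry):
    """Return a list of schema violations; an empty list means the entry is valid."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return problems


def build_entry(**fields):
    """Serialize an entry to JSON, refusing to emit one that violates the schema."""
    problems = validate_entry(fields)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(fields, sort_keys=True)
```

Running a check like this in tests or at the logging call site catches missing fields before they reach production logs.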
There are countless logging libraries for programming languages and runtime environments. No matter what language your app is developed with, you’ll have no trouble finding a way to transmit logs from your app or service to a syslog server.
And if for some reason you really can’t find a suitable library, Papertrail has released remote_syslog2, a tiny standalone daemon for transmitting plain-text files to a syslog server.
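To give a sense of how little code a logging library typically requires, here is a sketch using the SysLogHandler that ships with Python’s standard library. The make_syslog_logger helper, the "webapp" tag, and the local0 facility are assumptions chosen for the example; the host and port would come from your own syslog server.

```python
import logging
import logging.handlers


def make_syslog_logger(name, host, port=514):
    """Return a logger that forwards records to a syslog server over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
    )
    # The text before the colon acts as the syslog tag (program name).
    handler.setFormatter(logging.Formatter("webapp: %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Calling make_syslog_logger("webapp", "logs.example.com") with a placeholder hostname like this one would return a logger whose info(), warning(), and error() calls are sent to that server.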
Closely linked with using a schema to precisely describe your log format is the best practice of using identifiers in your messages. Identifiers help you understand where a message came from and figure out how multiple messages are related. For example, if you see the same session ID in two separate error messages, you know both of those messages were generated by the same session. It’s even more helpful to follow identifiers across services where you might find the only way to link multiple messages together is by the common session ID value.
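One way to make sure an identifier appears in every message is to attach it once rather than repeating it in each logging call. The sketch below uses Python’s LoggerAdapter for that; the logger name, the session_id field, and the logger_for_session helper are hypothetical.

```python
import logging
import uuid

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(levelname)s session_id=%(session_id)s %(message)s")
)
base_logger = logging.getLogger("webapp.sessions")  # hypothetical logger name
base_logger.setLevel(logging.INFO)
base_logger.addHandler(handler)
base_logger.propagate = False  # keep these records away from other formatters


def logger_for_session(session_id):
    """Wrap the base logger so every message carries the same session ID."""
    return logging.LoggerAdapter(base_logger, {"session_id": session_id})


session_id = uuid.uuid4().hex
log = logger_for_session(session_id)
log.info("cart updated")
log.error("payment gateway timed out")
```

Both messages carry the same session_id value, so a search for that ID returns the full trail for the session.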
Using the most appropriate logging level when emitting a message can make future troubleshooting easier. In the syslog format, logging levels are expressed as severity levels, and eight of them are available for classifying log messages. Those levels, ordered from least to most severe, are debug, info, notice, warning, err, crit, alert, and emerg. What most of the levels mean is specific to the app. The two exceptions are debug and emerg, which are independent of whatever app they’re used with: debug denotes messages containing information useful when debugging, and emerg denotes a panic condition.
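For reference, the syslog protocol assigns each severity a numeric code from 0 (emerg) to 7 (debug) and combines it with a facility code into a single priority value. The sketch below shows that arithmetic; the mapping from Python’s five logging levels onto syslog severities is one possible choice made for this example, not a standard.

```python
import logging

# Syslog severities and their numeric codes (lower number = more severe).
SYSLOG_SEVERITY = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}

# Python's logging module has fewer levels, so this mapping is an assumption.
PYTHON_TO_SYSLOG = {
    logging.DEBUG: "debug",
    logging.INFO: "info",
    logging.WARNING: "warning",
    logging.ERROR: "err",
    logging.CRITICAL: "crit",
}

LOCAL0 = 16  # facility code for local0


def priority(facility, python_level):
    """Compute the syslog priority value: facility * 8 + severity."""
    severity = SYSLOG_SEVERITY[PYTHON_TO_SYSLOG[python_level]]
    return facility * 8 + severity


print(priority(LOCAL0, logging.ERROR))  # 16 * 8 + 3 = 131
```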
Adding new calls to logging functions is easy, but the best syslog messages include all the relevant context to recreate the state of your app at the time of the logging call. This means naming the source of the problem in error messages and giving concise reasons when emitting emergency log messages. Since logs are often used to diagnose problems after they’ve occurred, you can’t always go back and piece together the state of the system at the time of the error. Instead, the log message itself has to carry the context you might need later. Including enough context to reconstruct what happened, without burying it in noise, is the key to getting this right.
The syslog protocol specification allows multiple lines to be contained within a single log message, but writing messages over multiple lines, such as exception and stack traces, can introduce real problems. Not all logging endpoints or parsing tools work well when messages are split across lines. For example, sed and grep, the two stalwart tools of the log-parsing toolbox, don’t handle searching for patterns across lines very well. It can be done, but things get complicated quickly.
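One way to sidestep the problem is to fold a stack trace into a single line before logging it. The sketch below does this by placing the traceback inside a JSON string, where newlines are escaped; the exception_as_single_line helper and the field names are hypothetical.

```python
import json
import traceback


def exception_as_single_line(message):
    """Build a one-line JSON entry that carries the active exception's traceback."""
    entry = {
        "message": message,
        # json.dumps escapes the newlines inside the traceback as \n,
        # so the whole entry stays on a single physical line.
        "traceback": traceback.format_exc(),
    }
    return json.dumps(entry)


try:
    1 / 0
except ZeroDivisionError:
    line = exception_as_single_line("division failed")

print(line)
```

The original multiline trace can be recovered later by parsing the JSON, while tools such as grep still see one message per line.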
If you absolutely must include multiline messages then you should investigate using a cloud-based log aggregation tool such as Papertrail, which has the ability to find the separate parts of a single log message when it’s split across lines.
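In general, it’s a bad idea to store sensitive data such as passwords and personal information in log messages. Syslog messages are often stored in plain text and any attackers who gain access to those messages will be able to easily read them. The most reliable fix is to keep sensitive values out of your messages in the first place, or to mask them before they’re written. Transmitting logs over TLS protects them from eavesdropping in transit, and cryptographically signing them lets you detect tampering, but neither stops someone with access to the stored logs from reading them.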
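One way to keep secrets out of stored logs is to scrub messages before any handler sees them. The sketch below uses a logging filter for that; the two patterns (a password=... pair and a US Social Security number shape) are hypothetical examples and would need to be extended for the data your app actually handles.

```python
import logging
import re

# Hypothetical patterns; extend them to match the secrets your app handles.
SENSITIVE_PATTERNS = [
    (re.compile(r"(password=)\S+", re.IGNORECASE), r"\1[REDACTED]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
]


def redact(text):
    """Replace anything matching a sensitive pattern with a placeholder."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Scrub the fully formatted message before handlers receive the record."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = None  # the message is already formatted
        return True
```

Attaching the filter with logger.addFilter(RedactingFilter()) applies it to every record that logger emits. Pattern-based scrubbing is a safety net, not a guarantee, so it works best alongside a habit of not logging secrets at all.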
Most logs are used to diagnose issues and find the root cause of bugs. But well-crafted logs are also invaluable for ops teams and performance investigations. By including additional information in your logs such as transaction duration data, counter values from software-internal statistics, and data for audit trails, your logs can serve a dual purpose and help with non-troubleshooting tasks.
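As an example of logging data that serves performance work as well as troubleshooting, the sketch below records how long an operation took. The timed helper, the logger name, and the operation= and duration_ms= field names are assumptions made for illustration.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("webapp.metrics")  # hypothetical logger name


@contextmanager
def timed(operation):
    """Log how long the wrapped block took, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        logger.info("operation=%s duration_ms=%.1f", operation, duration_ms)


with timed("checkout"):
    time.sleep(0.01)  # stand-in for real work
```

Because the duration is logged in a finally block, it is recorded even when the wrapped code raises an exception.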
The syslog log message format is supported by most programming tools and runtime environments for a reason: it’s an invaluable way to transmit and record log messages. But useful, actionable log messages don’t just happen. Creating log messages with the right data requires you to think about your situations and use cases and to tailor those log messages appropriately. Fortunately, some best practices make the job easier.
Log messages should include as much relevant context as possible so you can troubleshoot issues when you look at them later, and identifiers are a vital part of piecing together multiple messages across services. Messages should also make judicious use of syslog’s eight severity levels to convey their importance. Parsing log messages can be made a whole lot simpler by choosing an easily parsable format and then assigning a meaning to each field of the format using a schema.
Finally, by including additional information in your logs not directly related to troubleshooting, other teams can also benefit from the data.