Best Practices for Centralized Logging in Microservices Architecture

Last updated: December 18, 2023

Microservice architectures bring significant benefits to software teams. Microservices allow a team to scale individual parts of their application independently. They provide huge benefits in resilience, too. An application with a memory leak in a troublesome spot is a big problem for a monolithic architecture. One bug takes down the whole system and leads to extended outages. With a microservice architecture, a memory leak only takes down one part of the swarm. However, there are downsides to the microservice approach. For starters, microservices complicate deployment plans. While you can solve some of those complications with orchestration tools, it’s another layer of technology in your stack.

Microservice architectures are a pain to debug. Coding a logic error in one microservice often leads to failures in a different service. Determining the origin of a bug is an extended process of tracking down and identifying a root cause. The benefit of each slice of the application having its own server(s) is you can scale and silo those servers. The downside is when they break, figuring out exactly where takes much longer. In this post, we’ll talk about how to solve some of those problems with microservice logging and unlock the benefits while minimizing the cost.

Centralized Microservice Logging

The first and most important rule of microservice logging is that logs should go to a single place. It’s easy to rig your microservices to log to Amazon CloudWatch or Azure Monitor. And with a lot of work, you’ll get to a point where you have something that almost keeps your team in the loop. But it’s important not to underestimate this work. By default, those tools will throw logs from each service into their own bin. Determining that a user received an error because of a null pointer exception in some business logic that originated in your authentication service is a journey. Out of the box, you’ll spend hours tracing requests between different services. When you’re working on a critical outage, this is time you can’t afford to spend.

It’s likely you already knew the value of centralized logging, however. This should be your first step if your team isn’t using a centralized microservice logging solution. You need to get all your logs into the same place so you’re not trying to play Frogger with different logging silos. But what do you do once you have those logs in the same place?

Tip #1: Correlate Between Services

Once all your logs are in one place, you’ll notice something when debugging: they’re noisy. If you’re working on a high-traffic site, your application generates millions of log entries per hour. That’s too much to look through! Tracking down the bug from earlier is still just as challenging, even though all your logs are in one place. To solve this problem, I like to include a unique identifier generated by the client making a request to your server. The identifier is passed to each service needed to complete the request, and each service includes it in the log entries it writes for that request. Now, if you’re trying to troubleshoot a bug, your first step is finding the unique identifier passed with the request. It’s even better if your error handling includes the unique ID in the error log entry.

This tip can save you countless hours debugging your application when using a centralized microservice logging approach. If you take one thing away from these tips, this should be it: get your logs into one place and make it easy to trace a single request between microservices.
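As a minimal sketch of the idea in Python: each service reads the identifier from an incoming header, stamps it on every log line, and forwards it to the next service. The header name `X-Request-ID`, the fallback of minting an ID when the caller sent none, and the `gateway` and `auth` service names are assumptions made for illustration, not something this post prescribes.

```python
import contextvars
import io
import logging
import uuid

# Hypothetical header name; use whatever your services agree on.
REQUEST_ID_HEADER = "X-Request-ID"

# Holds the ID of the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def build_logger(name, stream):
    """Create a logger whose lines always carry the request ID."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(
        logging.Formatter("%(name)s request_id=%(request_id)s %(message)s")
    )
    logger.handlers = [handler]
    return logger


def handle_request(headers, logger):
    """Reuse the caller's ID (assumption: mint one if it is missing)
    and return the headers to send on any downstream call."""
    request_id = headers.get(REQUEST_ID_HEADER) or str(uuid.uuid4())
    token = request_id_var.set(request_id)
    try:
        logger.info("handling request")
        return {REQUEST_ID_HEADER: request_id}
    finally:
        request_id_var.reset(token)


# Two hypothetical services writing to the same central stream.
output = io.StringIO()
gateway = build_logger("gateway", output)
auth = build_logger("auth", output)

downstream_headers = handle_request({REQUEST_ID_HEADER: "req-123"}, gateway)
handle_request(downstream_headers, auth)
print(output.getvalue(), end="")
```

Both lines carry `request_id=req-123`, so a single search for that value returns the whole path of the request across services.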

Tip #2: Get Everything Into One Place

I know this seems like I’m repeating myself. But this is worth focusing on. Often, teams log their business application logic in one place and think they’ve covered all their bases. Not so fast, my friend. Microservice logging covers more than application logic. Modern applications include services covering a whole range of logic. They also include the services coordinating those services and the services supporting those services, too! It’s important to log not just your application logic but also your container framework. And you need to make sure you’re logging your container orchestration systems, too. Then, you’ve got the systems that glue all these things together. Imagine a bug where one service can’t talk to your database, even though it’s sending the request and the database is accepting incoming connections. Maybe it’s because of a faulty network connector. Perhaps it’s because someone changed a firewall rule without understanding the consequences.

Getting all these logs from disparate systems into one place again makes troubleshooting issues simpler and faster. During critical outages, speed saves your business money.
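To illustrate why the non-application logs matter, here is a small Python sketch that interleaves entries from three sources into one timeline. The entries are made-up sample data modeled on the firewall scenario above, and the tuple layout (timestamp, source, message) is an assumption for the example. It relies on each source already being sorted and on ISO 8601 UTC timestamps, which sort correctly as plain strings.

```python
import heapq

# Made-up sample entries: (timestamp, source, message).
application = [
    ("2023-12-18T10:00:01Z", "orders", "database connection timed out"),
    ("2023-12-18T10:00:03Z", "orders", "database connection timed out"),
]
orchestrator = [
    ("2023-12-18T10:00:02Z", "orchestrator", "orders container restarted"),
]
network = [
    ("2023-12-18T10:00:00Z", "firewall", "rule updated: deny tcp/5432"),
]

# heapq.merge lazily interleaves already-sorted inputs by the given key.
timeline = list(
    heapq.merge(application, orchestrator, network, key=lambda entry: entry[0])
)

for timestamp, source, message in timeline:
    print(timestamp, source, message)
```

Read in order, the firewall change shows up one second before the first timeout, which is the kind of connection that stays hidden when only application logs are collected.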

Tip #3: Monitor Your Logs

It’s essential to have visibility into your logs. At every level, you need to see what your software tells you. But this isn’t enough for a scaling microservice architecture. Remember, you’re generating millions of log entries per hour. Often, debugging an issue after the fact isn’t enough. By the time you know you have a problem, you’ve already lost someone’s business. Instead, you need to know as soon as an issue presents itself. This means automated monitoring of your microservice logging. When all your logs flow into one system, you can glean real-time intelligence about your application’s running state. If a single microservice starts reporting slowdowns or issues, you can see it. When those issues start impacting other services, you know instantly.

A good microservice logging strategy will have automated alerts to notify the team when things are going off the rails. That’s your cue to jump in before an outage affects customers. Instead of merely using logs to analyze problems after the fact, good microservice logging uses logs to tell you before things get out of hand.
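One simple form of automated alerting is an error-rate check over a sliding time window. The Python sketch below is only an illustration of that idea: the class name `ErrorRateMonitor`, the thresholds, and the rule itself are assumptions, and a hosted logging tool would typically provide its own alert configuration instead.

```python
from collections import deque


class ErrorRateMonitor:
    """Flag when the share of error entries in a recent window is too high."""

    def __init__(self, window_seconds=60, threshold=0.05, min_events=20):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_events = min_events  # avoid alerting on a handful of entries
        self.events = deque()  # (timestamp, is_error), oldest first

    def observe(self, timestamp, level):
        """Record one log entry and return True if an alert should fire."""
        self.events.append((timestamp, level in ("ERROR", "CRITICAL")))
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return self.should_alert()

    def should_alert(self):
        total = len(self.events)
        if total < self.min_events:
            return False
        errors = sum(1 for _, is_error in self.events if is_error)
        return errors / total > self.threshold


# Ten healthy entries followed by a burst of errors (timestamps in seconds).
monitor = ErrorRateMonitor(window_seconds=60, threshold=0.2, min_events=10)
alerts = [monitor.observe(second, "INFO") for second in range(10)]
alerts += [monitor.observe(second, "ERROR") for second in range(10, 15)]

first_alert = alerts.index(True)
print("first alert fired on entry number", first_alert + 1)
```

In practice the return value would trigger a notification to the team rather than a print statement.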

Tip #4: Don’t Sleep on Search

Searching through millions of log entries by hand is tedious, even when time isn’t of the essence. A high-quality microservice logging solution needs a quick and easy way to search those logs. The key here is quick. While many logging options allow for simple text searching, you will want more insight than what a simple grep or AWK call can give you. If that’s the best you can muster, your microservice logging solution leaves you hanging.

Instead, as you think about microservice logging, you want to analyze how quickly and easily you can search for detailed information. Logs you can’t find don’t bring any benefit.
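To show the difference between plain text matching and a field-aware search, here is a small Python sketch that filters JSON log lines by exact field values. The `search` helper, the field names, and the sample lines are all hypothetical; a real logging service would run this kind of query against an index rather than scanning lines one by one.

```python
import json


def search(lines, **criteria):
    """Yield parsed entries whose fields equal every given criterion."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured
        if isinstance(entry, dict) and all(
            entry.get(field) == value for field, value in criteria.items()
        ):
            yield entry


# Made-up sample lines from two hypothetical services.
log_lines = [
    '{"service": "auth", "level": "ERROR", "request_id": "req-123", "message": "token lookup failed"}',
    '{"service": "orders", "level": "INFO", "request_id": "req-456", "message": "order created"}',
    "plain text line that is not JSON",
    '{"service": "orders", "level": "ERROR", "request_id": "req-123", "message": "null customer"}',
]

matches = list(search(log_lines, request_id="req-123", level="ERROR"))
for entry in matches:
    print(entry["service"], entry["message"])
```

A text search for "ERROR" would also match any message that merely mentions the word, while the field-based query only returns entries logged at that level for that request.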

Tip #5: Automated Solution for Handling Failures

Handling failures in microservices logging can be challenging, but it is essential to maintain the reliability and integrity of your log data. The solution involves a set of practices, tools, and strategies that ensure the reliable collection and processing of log data in a microservices architecture, even in the presence of failures such as a crashing service or an unreachable log collector. Common examples are buffering log entries locally and retrying delivery once the collector is reachable again. Furthermore, the solution ensures that log data remains accessible, accurate, and complete, even in a distributed and potentially unreliable environment.

These practices and tools contribute to microservices-based applications’ overall reliability and maintainability by ensuring that log data is available for monitoring and troubleshooting purposes.
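As one hedged example of such a practice, the Python sketch below buffers log entries when delivery fails and retries them, in order, on the next attempt. The `BufferingShipHandler` and `FlakyCollector` classes are hypothetical stand-ins: the post does not name a specific mechanism, and a real shipper would also persist the buffer to disk and retry on a timer.

```python
import logging
from collections import deque


class BufferingShipHandler(logging.Handler):
    """Ship formatted records with `send`; keep unsent ones and retry later."""

    def __init__(self, send, max_buffered=1000):
        super().__init__()
        self.send = send  # callable assumed to raise OSError on failure
        # Bounded buffer: when full, the oldest entries are dropped first.
        self.pending = deque(maxlen=max_buffered)

    def emit(self, record):
        self.pending.append(self.format(record))
        self.flush_pending()

    def flush_pending(self):
        """Send buffered entries in order; stop at the first failure."""
        while self.pending:
            try:
                self.send(self.pending[0])
            except OSError:
                return False  # keep the backlog for the next attempt
            self.pending.popleft()
        return True


class FlakyCollector:
    """Stand-in for a remote log collector that can be unreachable."""

    def __init__(self):
        self.up = False
        self.delivered = []

    def send(self, line):
        if not self.up:
            raise OSError("collector unreachable")
        self.delivered.append(line)


collector = FlakyCollector()
handler = BufferingShipHandler(collector.send)
logger = logging.getLogger("orders.shipping-demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("first")   # buffered: collector is down
logger.info("second")  # buffered
collector.up = True
logger.info("third")   # delivery succeeds and drains the backlog in order
print(collector.delivered)
```

The bounded buffer is a deliberate trade-off: during a long outage it loses the oldest entries instead of exhausting the service's memory.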

Tip #6: Logging Performance Metrics

Logging performance metrics in microservices is essential to monitor the health, efficiency, and behavior of your microservices architecture. Properly collected and analyzed performance metrics can help you identify bottlenecks, diagnose issues, optimize resource utilization, and ensure that your microservices meet performance expectations.

The following are some key performance metrics to log in a microservices environment:

Response time of each microservice or API endpoint.
Number of requests or transactions processed per unit of time (Throughput).
Rate of errors and exceptions encountered by each microservice.
Latency distribution histograms or percentiles.
CPU and memory utilization for each microservice.

By logging these metrics and setting up automated alerting, you can ensure that performance issues are detected and addressed promptly in your microservices architecture.
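As a rough Python sketch, several of the metrics listed above can be derived from per-request records and written as one structured log line per window. The record layout (duration in milliseconds plus a success flag), the field names, and the sample data are assumptions for illustration. CPU and memory figures are left out because they come from the host or container runtime rather than from request records.

```python
import json
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(service, requests, window_seconds):
    """Summarize (duration_ms, succeeded) records seen during one window."""
    durations = sorted(duration for duration, _ in requests)
    errors = sum(1 for _, succeeded in requests if not succeeded)
    return {
        "service": service,
        "throughput_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }


# Made-up sample: 100 requests in 10 seconds, the two slowest ones failed.
sample = [(duration, duration <= 98) for duration in range(1, 101)]
summary = summarize("orders", sample, window_seconds=10)
print(json.dumps(summary))
```

Logging percentiles alongside the average matters because a small share of very slow requests can hide behind a healthy-looking mean.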

Tip #7: Providing Informative Logs

Well-structured and informative logs can help you quickly identify issues, trace the flow of requests through your microservices, and gain insights into the behavior of your system. Informative logs enable efficient monitoring and troubleshooting of your microservices architecture, making it easier to maintain and improve system performance and reliability. You can create them by following these best practices:

Using appropriate log levels to categorize log messages by severity.
Including unique request or transaction IDs in log entries to help trace a single request across multiple microservices.
Writing clear and concise log messages that provide information about the event or condition being logged.
Including a log version or schema version in log entries.
Avoiding logging sensitive information, such as passwords or access tokens.
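A few of these practices can be sketched with Python's standard `logging` module: log levels filter by severity, a schema version is stamped on every line, and a filter masks sensitive values before they are written. The `schema=1` prefix, the `key=value` message style, and the list of sensitive keys are assumptions for the example, and pattern-based masking is a safety net rather than a substitute for not logging secrets in the first place.

```python
import io
import logging
import re

LOG_SCHEMA_VERSION = "1"  # hypothetical version tag for the line format

# Hypothetical list of keys whose values must never reach the logs.
SENSITIVE = re.compile(r"\b(password|token|secret)=\S+", re.IGNORECASE)


class RedactingFilter(logging.Filter):
    """Mask the values of sensitive key=value pairs in the final message."""

    def filter(self, record):
        message = record.getMessage()  # message with its arguments applied
        record.msg = SENSITIVE.sub(
            lambda match: match.group(1) + "=[REDACTED]", message
        )
        record.args = ()
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactingFilter())
handler.setFormatter(
    logging.Formatter(
        f"schema={LOG_SCHEMA_VERSION} %(levelname)s %(name)s %(message)s"
    )
)

logger = logging.getLogger("auth.redaction-demo")
logger.setLevel(logging.INFO)  # DEBUG entries are dropped at this level
logger.propagate = False
logger.addHandler(handler)

logger.debug("cache warm")
logger.warning("login failed user=%s password=%s", "alice", "hunter2")
print(stream.getvalue(), end="")
```

Only the warning is written, and the password value is replaced with `[REDACTED]` before it leaves the process.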

Tip #8: Adding Contextual Data

Contextual data helps you understand the circumstances surrounding each log event, making it easier to diagnose issues and monitor the behavior of your microservices. To implement contextual data effectively in microservices logging, consider using structured log formats such as JSON to ensure consistency and ease of parsing.

Additionally, use automated logging frameworks and libraries that allow you to easily include contextual data without excessive manual coding. This approach will greatly improve the quality and usefulness of your microservices logs for monitoring and troubleshooting purposes.
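Here is a minimal sketch of both suggestions using only Python's standard library: a formatter that emits JSON, and a `logging.LoggerAdapter` that attaches the same contextual fields to every entry without repeating them at each call site. The `JsonFormatter` class, the `context` attribute name, and the field values are hypothetical choices for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        # Start from the contextual fields, then set the core fields so
        # that context can never overwrite them.
        entry = dict(getattr(record, "context", {}))
        entry.update(
            {
                "timestamp": self.formatTime(record),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )
        return json.dumps(entry)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

base_logger = logging.getLogger("checkout.context-demo")
base_logger.setLevel(logging.INFO)
base_logger.propagate = False
base_logger.addHandler(handler)

# The adapter passes its dict as `extra`, so every record gets a
# `context` attribute holding these (made-up) values.
log = logging.LoggerAdapter(
    base_logger,
    {"context": {"service": "checkout", "region": "eu-west-1", "request_id": "req-123"}},
)

log.info("payment authorized")
entry = json.loads(stream.getvalue())
print(entry["service"], entry["request_id"], entry["message"])
```

Because every entry is a JSON object with stable field names, the central logging system can filter on `service` or `request_id` directly instead of parsing free text.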

Finding the Right Microservice Logging Solution

You can build all these tools yourself. It’s possible to spend dozens or hundreds of hours configuring CloudWatch to meet all the tips we provided in this post. You’ll have to spend time touching each part of your system, and some parts will be particularly challenging. But you can do it. Or, you can let someone else do that work for you. After all, the goal here is to make your logs more efficient, not spend all day getting them set up. That’s where an option like SolarWinds® Papertrail™ can make life easier. It lives up to every tip in this list and does more to keep your business running smoothly, like tracking live tails of logs from multiple services in one place. Just as microservices unlock the ability to scale your application, Papertrail unlocks the power logging can bring to your application. Find out what you’ve been missing with a free trial of SolarWinds Papertrail today.

This post was written by Eric Boersma. Eric is a software developer and development manager who’s done everything from IT security in pharmaceuticals to writing intelligence software for the US government to building international development teams for non-profits. He loves to talk about the things he’s learned along the way, and he enjoys listening to and learning from others as well.
