Today’s applications and services support the core business activities organizations rely on. Service interruptions and downtime are no longer just inconvenient; they’re directly tied to lost revenue and lost customers. It’s more important than ever to actively monitor the health of your applications and services to detect issues before there’s an outage. Being able to quickly detect and resolve issues is key to your organization’s financial and competitive position, as well as your career success.
Monitoring and alerting can help with problem detection, but on their own they aren’t enough to understand what happened or to pinpoint the cause. Let’s say a user accesses one of your applications. After performing a couple of actions, an error pops up. Through your monitoring tool, you know something went wrong, but debugging the actual problem is much harder. You don’t have any data that can help you diagnose it.
Logging is the answer, but getting the most out of your logs depends on how you implement logging. Logging as a service (LaaS), especially in today’s hybrid and distributed environments, can transform your log management program.
Below are the five reasons my team and I implemented logging as a service for our applications.
Due to the distributed nature of a microservices architecture, each service generates its own logs, stored in different locations. This means you have to search multiple systems and locations for relevant logs when trying to debug a problem. This can consume precious time when you’re under pressure to get an application back online.
Imagine a web shop where the shopping cart, user information handling, and payment handling are all separate services, each outputting its own logs. If something goes wrong during the payment step, you’d need to look for logs in every service the user has accessed and somehow correlate them.
Using logging as a service, logs from multiple services are aggregated and searchable from a centralized location. This means you no longer spend time looking for logs or manually correlating them. You can quickly retrieve a full list of logs across all services.
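To make the idea concrete, here is a minimal Python sketch of what aggregation and centralized search do conceptually, assuming each service’s log stream is already sorted by time and every event carries a shared request ID. The `LogEvent` type, the `aggregate` and `search` helpers, and the sample web shop events are hypothetical; a LaaS product does this work in its own ingestion pipeline, not in your application code.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEvent:
    timestamp: datetime
    service: str
    request_id: str  # hypothetical correlation ID shared by all services
    message: str


def aggregate(*streams):
    """Merge per-service streams (each already sorted by time) into one timeline."""
    return list(heapq.merge(*streams, key=lambda e: e.timestamp))


def search(events, request_id):
    """Return every event, from any service, that belongs to one request."""
    return [e for e in events if e.request_id == request_id]


def ts(second):
    return datetime(2024, 1, 1, 12, 0, second)


# Hypothetical log streams from three separate web shop services.
cart = [LogEvent(ts(1), "cart", "req-42", "item added"),
        LogEvent(ts(5), "cart", "req-7", "item added")]
users = [LogEvent(ts(2), "users", "req-42", "address loaded")]
payments = [LogEvent(ts(3), "payments", "req-42", "card declined: timeout")]

timeline = aggregate(cart, users, payments)
for e in search(timeline, "req-42"):
    print(e.timestamp.time(), e.service, e.message)
```

The output is the user’s whole path (cart, then user information, then payment) in time order, without having to open three different log locations.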
Core benefit: a centralized interface for searching logs from multiple different log streams.
LaaS simplifies log management. Who wants to spend time creating an elaborate logging infrastructure for aggregation and analysis? This type of project requires a large initial investment in time and resources and long-term support and maintenance. It distracts your team from delivering business-impacting features and can quickly increase development costs.
Some DevOps teams use an open-source stack such as the ELK stack (Elasticsearch, Logstash, and Kibana) for their logging infrastructure. Open-source stacks are free to use, but they still require resources from your team to learn, install, and maintain.
So why would you invest time and resources building your own logging infrastructure when you can access world-class logging solutions, often for a fraction of the cost? To be blunt, it makes more sense to outsource your logging infrastructure to a logging as a service solution, so your team can focus on creating business value.
Core benefit: you invest less time and resources than you would when building and maintaining your own logging infrastructure.
Many logging as a service solutions are dynamic and can handle a spike in log volume. This is particularly useful when an increase in traffic drives up the number of logs your application produces, and in a crash scenario when multiple services and applications are down at once. Home-grown logging infrastructures frequently aren’t equipped to handle such a sudden increase in data and can crash themselves. This is the worst-case scenario: you lose log data at the exact moment you need it most to diagnose what’s going on.
A LaaS solution operates in the cloud and can easily scale when the amount of data increases, so you can be confident logs will be accessible when you need them.
Core benefit: logging as a service allows for easy scaling of your logging infrastructure in the cloud.
Mean time to repair (MTTR) is an important metric for many development teams. It indicates how quickly your team can resolve problems and get systems back online. Obviously, you want to reduce total downtime, since it’s linked to the financial impact of an outage.
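MTTR itself is simple arithmetic: the total time spent restoring service divided by the number of incidents. A minimal Python sketch with hypothetical incident durations (teams differ on whether the clock starts at failure or at detection, so treat the inputs as whatever your team measures):

```python
from datetime import timedelta


def mean_time_to_repair(outages):
    """MTTR = total time spent restoring service / number of incidents."""
    if not outages:
        raise ValueError("need at least one incident")
    return sum(outages, timedelta()) / len(outages)


# Hypothetical incident durations for one quarter.
outages = [timedelta(minutes=90), timedelta(minutes=30), timedelta(minutes=60)]
print(mean_time_to_repair(outages))  # 1:00:00
```

Because the metric is an average, shaving diagnosis time off every incident lowers it directly, which is where faster log search pays off.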
Several characteristics of logging as a service can reduce MTTR. First, logs are aggregated via your logging tool, which cuts down on the time and difficulty of locating the relevant log data. Second, you can access and search through logs via a single interface. Additionally, most LaaS solutions also interleave or automatically correlate events, so you can see all the log data surrounding an issue and quickly identify the root cause. The combination of log aggregation, centralized search, and showing events in context significantly reduces the time it takes to understand and resolve issues.
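Showing events in context boils down to a time-window query across all services around the event you care about. The sketch below assumes events are plain dictionaries with `time`, `service`, and `msg` keys; the `surrounding` helper, the 30-second default window, and the sample events are illustrative, not any particular product’s behavior.

```python
from datetime import datetime, timedelta


def surrounding(events, anchor, window=timedelta(seconds=30)):
    """Return events from every service within `window` of the anchor event, in time order."""
    lo, hi = anchor["time"] - window, anchor["time"] + window
    return sorted((e for e in events if lo <= e["time"] <= hi), key=lambda e: e["time"])


t0 = datetime(2024, 1, 1, 12, 0, 0)
# Hypothetical events from several services, in no particular order.
events = [
    {"time": t0 - timedelta(minutes=5), "service": "cart", "msg": "cache warmed"},
    {"time": t0 - timedelta(seconds=10), "service": "db", "msg": "connection pool exhausted"},
    {"time": t0, "service": "payments", "msg": "ERROR payment failed"},
    {"time": t0 + timedelta(seconds=2), "service": "cart", "msg": "checkout retried"},
]
error = events[2]
for e in surrounding(events, error):
    print(e["service"], e["msg"])
```

Here the database warning from ten seconds earlier shows up right next to the payment error, the kind of connection that is easy to miss when each service’s logs live in a separate place.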
Core benefit: mean time to repair can be reduced via a LaaS solution, reducing the financial impact of availability and performance issues.
Last, a LaaS solution can help your team become more proactive and identify issues before there’s a service impact. LaaS solutions are optimized to ingest large amounts of data and run queries over it quickly. They can also use this data to populate visualizations and generate alerts on changes in logging trends.
Knowing that an application or service is suddenly generating more events can be the key to fixing an issue before users are impacted. This data can also help you see exactly when and where the problem started, so you can get to the root cause of issues faster. Using these insights has allowed my team to develop “early warning” alerts for potential service degradations and catch issues before they went critical.
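An early warning alert on logging trends can be as simple as comparing the current minute’s event count with a trailing baseline. This Python sketch assumes you already have the event timestamps; the `per_minute_counts` and `spikes` helpers, the five-minute baseline, and the 3× threshold are hypothetical choices you would tune for your own traffic, and LaaS products expose their own alert configuration instead.

```python
from collections import Counter
from datetime import datetime, timedelta


def per_minute_counts(timestamps):
    """Bucket event timestamps into per-minute counts."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def spikes(counts, baseline_minutes=5, factor=3.0):
    """Return the minutes whose count exceeds `factor` x the trailing average."""
    if not counts:
        return []
    # Walk every minute in range so quiet minutes count as zero (Counter defaults to 0).
    start, end = min(counts), max(counts)
    minutes, m = [], start
    while m <= end:
        minutes.append(m)
        m += timedelta(minutes=1)
    flagged = []
    for i in range(baseline_minutes, len(minutes)):
        history = minutes[i - baseline_minutes:i]
        baseline = sum(counts[h] for h in history) / baseline_minutes
        # A floor of 1 stops a lone event after total silence from counting as a spike.
        if counts[minutes[i]] > factor * max(baseline, 1.0):
            flagged.append(minutes[i])
    return flagged


t0 = datetime(2024, 1, 1, 12, 0)
# Hypothetical traffic: 10 events per minute for six minutes, then 50 in one minute.
stamps = [t0 + timedelta(minutes=m, seconds=s) for m in range(6) for s in range(0, 60, 6)]
stamps += [t0 + timedelta(minutes=6, seconds=s) for s in range(50)]
print(spikes(per_minute_counts(stamps)))
```

Only 12:06 is flagged: its 50 events exceed three times the trailing average of 10, while the steady minutes before it stay below the threshold.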
Core benefit: quickly run queries over large amounts of data to find insights.
Is a logging as a service tool enough on its own? The answer is simple: no! You can have a logging as a service tool in place but still write vague or ambiguous logs. The value of your logging tool depends entirely on the way you handle logging. Keep in mind the following best practices if you want to get the most value out of your logging tool:
As a general rule of thumb, your logs should tell a meaningful story, so that no matter which team ends up troubleshooting the issue, they have the information they need to replicate what happened. Based on the story the logs tell, it should be easier to find and debug issues.
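One practical way to make logs tell a story is to attach the same context (request, user, step) to every line and emit it in a structured format a logging service can index. A minimal sketch using Python’s standard `logging` module; the `JsonFormatter` class, the `context` field name, and the sample IDs are assumptions for illustration:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so a log service can index its fields."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge in the hypothetical `context` dict passed via `extra=`.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("payments")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False  # avoid duplicate lines if the root logger is also configured

# Hypothetical request context: who did what, in which request, at which step.
ctx = {"request_id": "req-42", "user_id": "u-1001", "step": "charge_card"}
log.info("charging card", extra={"context": ctx})
log.error("payment provider timed out after 3 retries", extra={"context": ctx})
```

Each line comes out as a single JSON object, so in a LaaS tool you can filter on `request_id` and follow one user’s path across services.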
A lot of applications, services, and devices generate logs: component logs, node logs, service logs, router logs, firewall logs, database logs, and many more. Together, these logs can reconstruct the detailed path of all events leading up to a problem, and they provide valuable pieces of information you can use to quickly debug it. However, scattered logs can be a real nuisance for developers who need to find pertinent information, and they can sometimes prevent developers from finding the root cause of a problem at all.
Logging as a service aggregates logs to make them easily searchable from a single interface and makes log data more useful.
When logging has been implemented correctly, it adds immense value to your organization.
For example, SolarWinds® Papertrail™, a logging as a service solution, allows you to quickly search through multiple, combined log streams from a central interface and helps you reduce mean time to repair. It also provides contextual search and log visualizations to help your team be more proactive in catching issues early and reducing potential downtime.
This post was written by Michiel Mulders. Michiel is a passionate blockchain developer who loves writing technical content. Besides that, he loves learning about marketing, UX psychology, and entrepreneurship. When he’s not writing, he’s probably enjoying a Belgian beer!