One of the biggest challenges organizations face when operating web applications is monitoring the availability of complex transactions that involve multiple steps. Developers and testers are often left manually stepping through their applications in the hopes of reproducing problems and replicating the complex nature of the user experience. What they really need is a way to simulate real user activity independent of any actual users.
In this article, we’ll explain how to monitor web applications using synthetic transactions. We’ll show you how to simulate traffic to a web application, how to record each action using transaction monitoring software, and how logs provide important context to user activity.
What Is Synthetic Transaction Monitoring, and Why Is It Important?
Synthetic transaction monitoring is a process for monitoring application performance and behavior by simulating user actions. Synthetic transactions are powered by automated scripts and services, which can run continuously and scale to simulate large volumes of traffic. Synthetic transaction monitoring is an effective way of identifying weaknesses such as instability, poor performance, or poor usability.
Another key benefit of synthetic transaction monitoring is that it can be done from any geographic location. Your users are likely to be located around the world, rather than exclusively in a single region. You may find that your application runs extremely fast for North American users, but is slow in Japan or South America. Running synthetic transactions from these regions helps you find these and similar bottlenecks before your users do.
How Is Transaction Monitoring Done Today?
A common way to monitor transactions is with real user monitoring (RUM). As the name implies, RUM generates insights from actual users. While this is as genuine as it gets, it’s only effective if you have a relatively large amount of traffic to generate data from. It also only works once your application is in production, making it less effective at catching problems in staging or earlier.
Some teams use scripting and automation tools to simulate real user activity. Tools like SolarWinds® Web Performance Monitor can perform browser actions such as clicks, key presses, and scrolling. These actions can also be scripted to create consistently repeatable tests. However, these tools require more extensive setup and customization specific to your application.
Finally, transactions can be reconstructed from log data generated by your application and infrastructure. Logs create a detailed, chronological record of application activity. Logging contextual data such as HTTP headers, session data, and hostnames can even let you trace logs from one component to another (such as between hosts).
In addition, logs contain a wealth of information about application events, including error messages, exceptions, and stack traces. When used in combination with scripting tools, logs are extremely useful for monitoring transactions.
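One way to attach the contextual data described above is to wrap a logger so that every event carries the same per-request fields. The following is a minimal sketch using Python’s standard logging module, not the approach of any particular tool. The field names (host, request_id, user_agent) and the logger name are assumptions chosen for illustration.

```python
import io
import logging
import socket

# Every record must carry these fields, so log only through the adapter below.
LOG_FORMAT = (
    "%(levelname)s host=%(host)s request_id=%(request_id)s "
    "ua=%(user_agent)s %(message)s"
)


def make_request_logger(base_logger, request_id, user_agent):
    """Wrap a logger so each event carries per-request context."""
    context = {
        "host": socket.gethostname(),
        "request_id": request_id,  # hypothetical ID assigned to each request
        "user_agent": user_agent,  # taken from the HTTP request headers
    }
    return logging.LoggerAdapter(base_logger, context)


def demo():
    """Log one event to an in-memory stream and return the formatted text."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))

    base = logging.getLogger("context-demo")  # hypothetical logger name
    base.setLevel(logging.INFO)
    base.addHandler(handler)
    base.propagate = False  # keep the demo output out of the root logger

    log = make_request_logger(base, "req-123", "synthetic-check")
    log.info("rendered /en/")
    return stream.getvalue()
```

Because the same request_id appears on every event for a request, searching for that value in a log aggregator should return the whole sequence of events, even when they come from different components.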
How to Implement Synthetic Transaction Monitoring
Papertrail is a log aggregation service that collects logs from your applications, services, hosts, and other infrastructure components. You can monitor all of your infrastructure logs from a single location, search for logs by host or application, and filter logs based on contextual data.
Pingdom is a web monitoring service that lets you measure the performance, stability, and uptime of your websites. It can perform periodic performance tests, record real-user metrics, and test from multiple regions. Pingdom also lets you create scripted interactions using a synthetic monitoring editor. You can run these scripts on a customizable schedule, and Pingdom generates full reports explaining the outcome of each test.
For example, imagine you recently launched a new website to the public. It was a perfect launch: your developers and QA squashed every bug, the app is completely stable, and users are experiencing near-instant page load times. Well, most users, anyway; some users are experiencing HTTP 500 errors when visiting the site, and nobody knows why. To figure this out, we’ll use Pingdom to test our site from multiple regions, and we’ll use Papertrail to try to find the root cause of the problem.
Introducing Our Example Application
We’ve created a very simple example website to demonstrate transaction monitoring. It simply forwards people to a page that is localized based on their language. This is an important use case for companies with customers all around the world because you need to verify that the site performs as expected in each country. Luckily, Pingdom allows us to test transactions from each location separately.
It’s a simple Python application running on the Apache web server. We use mod_wsgi to route requests from Apache to our Python scripts. When a user visits the site, we detect the user’s locale and localize the site’s text using the gettext module. The locale is determined by the URL: for example, English-speaking users are redirected to http://our-domain.com/en/, while Spanish-speaking users are redirected to http://our-domain.com/es/. You can find the backend code for the site in this gist on GitHub.
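To illustrate how a locale can be read from the URL, here is a minimal sketch. It is not the code from the gist; the function names are hypothetical, and it assumes the locale code is always the first path segment and that the WSGI server exposes the request path as PATH_INFO.

```python
def locale_from_path(path):
    """Return the first URL path segment as a lowercase locale code, or None."""
    segments = [segment for segment in path.split("/") if segment]
    return segments[0].lower() if segments else None


def locale_from_environ(environ):
    """Read the locale from a WSGI environ dict (PATH_INFO holds the path)."""
    return locale_from_path(environ.get("PATH_INFO", "/"))
```

With this approach, a request for /en/ yields “en” and a request for /es/ yields “es”, while a request for the site root yields no locale at all, which is where a redirect to a default locale would come in.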
For Papertrail, we need a way to collect logs from both Apache and Python. Since Apache runs on the host as a service, we’ll use the host’s syslog service to forward its logs to Papertrail. With Python, we can use SysLogHandler from the standard logging.handlers module to send logs directly from the app to Papertrail. We can verify that both of these are working by starting our application, performing a few requests, and opening Papertrail.
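The Python side of this setup might look like the sketch below. It uses only the standard library’s SysLogHandler; the host and port are placeholders that you would replace with the log destination shown in your own Papertrail account, and the logger name and message format are assumptions rather than the configuration used in the example app.

```python
import logging
import logging.handlers


def build_logger(host, port, name="localization-demo"):
    """Return a logger that forwards events to a remote syslog endpoint."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # SysLogHandler sends over UDP by default when given a (host, port) tuple.
    handler = logging.handlers.SysLogHandler(address=(host, port))

    # Include the log level in each event so it can be searched for later.
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Usage (placeholder destination, not a real endpoint):
#   log = build_logger("logsN.papertrailapp.com", 12345)
#   log.error("could not load translation")
```

Putting the level name in the message text is a deliberate choice here: it makes events searchable by plain text such as “ERROR” regardless of how the receiving service interprets syslog severity.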
Performing Transaction Tests With Pingdom
Now, let’s create a set of Pingdom transaction tests to verify the text displayed for different locales. We’ll create three different tests: one for English speakers, one for Spanish speakers, and one for German speakers. Each test accesses the website using a region-specific locale and verifies the text displayed in the response. If the response matches the expected localized text, then the test passes.
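The logic of each check can be sketched in a few lines of Python. This is only an illustration of what such a test verifies, not Pingdom’s script format or API. The expected strings in CHECKS are hypothetical, the URLs reuse the placeholder domain from above, and the opener parameter exists only so the function can be exercised without a network.

```python
import time
import urllib.error
import urllib.request

# Hypothetical locale checks: URL to request and text expected in the page.
CHECKS = {
    "en": ("http://our-domain.com/en/", "Welcome"),
    "es": ("http://our-domain.com/es/", "Bienvenido"),
    "de": ("http://our-domain.com/de/", "Willkommen"),
}


def run_check(url, expected_text, timeout=10.0, opener=urllib.request.urlopen):
    """Request a page and report whether it returned the expected text."""
    started = time.monotonic()
    status, body, error = None, "", None
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 500.
        status, error = exc.code, str(exc)
    except OSError as exc:
        # Covers connection failures and timeouts (URLError is an OSError).
        error = str(exc)
    elapsed_ms = (time.monotonic() - started) * 1000
    return {
        "url": url,
        "status": status,
        "passed": status == 200 and expected_text in body,
        "elapsed_ms": elapsed_ms,
        "error": error,
    }
```

Calling run_check(*CHECKS["de"]) would run the German check once. A hosted service adds what this sketch lacks: scheduling, a real browser, and probes in multiple regions.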
We’ll run each test in the region closest to its target audience to simulate the path that a real user request would take, although this isn’t necessary.
Analyzing the Results in Papertrail
The test results show a noticeable pattern. The English and Spanish versions of the site loaded as expected, but the German version consistently failed to load. In addition, the page load time averaged 60,018 ms (over a minute) for each test.
We can drill down into the problem by clicking on the check and selecting the Root Cause Analysis button next to one of the failed tests. This shows us that the check timed out while waiting for a particular element to appear on the page. Looking further down the page, we can see that the app returned a 500 Internal Server Error page instead of a standard response, which prevented Pingdom from completing the test.
Performing a Root Cause Analysis on a failed transaction check in Pingdom.
Pingdom can’t tell us the cause of the error since it occurred on the server. For that, we’ll need to turn to the logs.
Troubleshooting Using Papertrail
Fortunately, we deployed our application with Papertrail aggregating and centralizing its logs. We have two avenues: viewing the logs from Apache, and viewing the logs from our application. We’ll start with the Apache logs and work our way back if necessary.
Let’s search for logs originating from Apache and containing a 500 error code. Since we deployed our application to Kubernetes, we can search for the application by Pod name (“localization-demo”). And since Apache includes the error code with each response log, we can search events directly for the text “500”:
If we look at the surrounding logs, we can see additional detail about the failure. We know that our Python script generated an exception, but we don’t know what type of exception it is or where it was generated:
To better understand the cause of the internal server error, we need to review our application logs. We can search for logs originating from the application, and since we include the log level with each event, we can search for logs containing the word “ERROR”:
This log may be verbose, but it tells us exactly what we need to know. In line 41 of our main Python script (app.wsgi), the gettext module attempts and fails to load a translation file. As a result, it raises an OSError and causes the application to exit. When our application starts, we specify the directory where gettext can find localization files. This directory contains multiple subdirectories that correspond to different locales and contain the relevant files.
The problem is that we only deployed our application with two of these subdirectories: “en” for English, and “es” for Spanish. When a German user accesses our website, gettext tries to load a “de” folder that doesn’t exist. This results in the OSError, which is propagated up to Apache and returned to the user as an HTTP 500 error.
To fix the error, we can create a new “de” folder and localize our website for German users. We can also set a fallback translation that gettext will use whenever it encounters an unknown locale. For example, we might set the fallback to English if the majority of our visitors are from English-speaking countries.
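The fallback behavior can be sketched with the standard gettext API. The directory name, domain name, and default language below are assumptions, not values from the example app. Passing several languages makes gettext try them in order, and fallback=True makes it return a NullTranslations object (which hands back the original, untranslated strings) instead of raising an error when no catalog is found.

```python
import gettext

LOCALE_DIR = "locale"  # hypothetical directory holding the per-locale folders
DOMAIN = "messages"    # hypothetical translation domain (messages.mo files)


def load_translation(lang, localedir=LOCALE_DIR, default_lang="en"):
    """Load a catalog for lang, falling back instead of raising OSError."""
    return gettext.translation(
        DOMAIN,
        localedir=localedir,
        # Tried in order: the requested locale first, then the default.
        languages=[lang, default_lang],
        # If neither catalog exists, return NullTranslations rather than
        # raising, so the page renders with the untranslated source text.
        fallback=True,
    )
```

With this in place, a request for an undeployed locale such as “de” would be served in the default language rather than failing with a 500 error. The trade-off is that a missing translation no longer produces an error in the logs, so it may be worth logging a warning whenever the fallback is used.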
Synthetic transaction monitoring is a powerful tool for testing website performance, availability, and usability. Replicating real user actions helps you identify errors or slowdowns before and during production.
Pingdom makes it easy to schedule and run synthetic transactions against your web services at any time and from multiple locations. Using our built-in editor, you can quickly script basic or complex browsing actions to step through any site. And with Papertrail logging your applications and infrastructure, troubleshooting unexpected behaviors is as simple as running a search.