End-to-end Performance Testing of Microservice-based Systems
When performance matters…
Imagine the following scenario: you’re part of a sizeable project using a microservice-based architecture, with multiple teams (who aren’t necessarily in the same geographical location) working on a number of microservices.
All teams may be deploying to a single target if you’re fortunate (or if the project is still in relatively early stages), such as Kubernetes or ECS, but it’s likely that there’s at least some AWS Lambda/Serverless in the mix, as well as a more traditional EC2-based deployment somewhere. There’s an API gateway which exposes public APIs that are backed by some of the microservices, and a variety of purely internal microservices that provide APIs which are only ever used by other microservices.
Most of your microservices probably expose HTTP-based APIs (whether RESTful or GraphQL), but it’s very likely that there’s a variety of other interfaces and transports used in the system: there’s probably a number of services which use SQS, Kinesis or another queue-like mechanism to receive work or output results, there may be gRPC somewhere in the mix, or perhaps a real-time component using WebSockets or Socket.io.
More importantly for the purposes of this article: performance & reliability are a key requirement for this system. Perhaps you’re in e-commerce, where every 100ms you can shave off response time affects the bottom line. Or it’s an IoT system that will need to be able to receive and process hundreds of thousands of messages every minute from a fleet of real-world sensors. Maybe it’s some kind of a ticketing or event system, with regular spikes in traffic that must be dealt with or your company’s call centers get DDoS’d by irate customers. Whatever it is, performance and reliability are important to this project, and you know you will want to invest time and effort into it.
... how do we make sure our performance goals are met?
This is a situation development teams find themselves in all the time (and one we are very familiar with, having helped with it many times in the course of our consulting work). As a team lead, developer, SDET or QA engineer tasked with implementing a performance testing pipeline, you are likely to have many questions that need answering, e.g.:
- Where does performance testing fit into the development, testing, and delivery process?
- How do we run performance tests in our CI/CD pipeline?
- How do we define performance goals & SLOs for our services? Can we have those checked automatically in CI?
- What types of performance tests are there? Which ones do we need for our services?
- How do we organize our test suites? What are the best practices around structuring those tests, avoiding duplicating effort and wasting time, and sharing common solutions across teams?
- How do we pick which services to test? Do we need to test all of our microservices? What about testing different environments and configurations easily?
- How do we encourage collaboration on performance testing and involve everyone: developers, testers, SREs and product managers? (performance is everyone’s responsibility!)
- How do we make performance testing results available to all interested parties? (developers, testers, product managers, SREs)
- How do we make sure that the costs of running load tests frequently at scale do not become astronomical?
- What’s the best way to get started with all of this?
That’s a lot of questions, but fortunately we have the answers! The goal of this series of articles is to give you a practical methodology to follow with your team, and make it so clear that you could read this guide and start tomorrow.
By the end of this series, you will have:
- A scaffold for building out a comprehensive test suite for all of your APIs -- internal and external, regardless of where they may be deployed, what protocol they use to communicate, or which technology stack they're built with -- whether it's a serverless Node.js service, a Go microservice deployed on Kubernetes that consumes Kinesis, or an SQS-based worker service.
- An understanding of how to integrate performance tests into your team’s CI/CD pipeline.
- The ability to run both load tests and acceptance/smoke/functional tests using the same test definitions, e.g. to first verify a new deployment by running a few happy path scenarios, and then blasting it with lots of traffic to test how it responds under load.
- A way to make the metrics generated by your tests (such as latency and RPS) available in an external monitoring system such as Datadog.
... and more!
Let’s get started!
No surprises here, we’ll be using Artillery
This article presumes that you’re familiar with Artillery. If you haven’t used it before, I'd suggest having a look through the docs, especially the Core Concepts guide, and an overview of what Artillery tests look like.
Where do we get started?
While we could start with any service, if we want to make progress quickly (e.g. to have something interesting to present to the rest of the team) and demonstrate the value of what we are doing, it’s best if we start with a service which satisfies the following criteria:
- It has a small API surface. Ideally only a couple of API endpoints.
- It would obviously benefit from load tests, so:
  - It has experienced issues under load in the past, or you have good reason to suspect that it may
  - It's on the critical path for some important functionality of the system, and if it experiences hiccups, parts of the system may go down, become unavailable and affect end users
  - It has high performance requirements (which may be expressed as TPS, a desirable p99 response time, number of concurrent users, or a combination of several objectives)
An example of such a service might be an authentication service which issues JWTs allowing users to access other API endpoints.
How do we organize our test code?
Now that we’ve picked a service we want to test, the next question is how to organize our code.
Monorepo!
Keeping extensibility of our test suite in mind (we will be adding more scenarios, hopefully for many other services and APIs), and with cross-team and cross-functional collaboration also firmly in our thoughts, we usually recommend a monorepo approach: a single source code repository that will contain test definitions for all (or many) services and APIs:
- A single repo makes the test suite easier to extend and maintain. There’s only one repo to clone. All of the test code is there, and existing tests for other services can serve as good examples for someone just starting to write tests for their API.
- Access control is simple. You only need to share one repo with other teams rather than 10 or 20 (which would be the case if performance tests were kept alongside each microservice's codebase).
- A monorepo encourages and simplifies code reuse in tests, and helps keep things more DRY.
- A monorepo is also easier to work with when you’re building CI/CD jobs and pipelines. The CI server needs to be able to clone only one known repo to run an automated test against any existing microservice.
(Aside: here’s a nice read over on Dan Luu’s blog on reasons monorepos can work well for some projects.)
Setting up our project structure
We're going to set up the following directory structure for our tests:
acme-corp-api-tests/
- services/
- README.md
- package.json
- acme-corp-api-tests -- this is the name of the repo that will hold our test code. Naming is one of the two hardest things in computing, and org-or-project-name followed by api-tests tends to work well as a name. (Not to give too much away, but in the follow-up to this article we'll cover how to use the same test definitions to run either load tests or acceptance/smoke/functional tests, so a more generic "api-tests" makes more sense than something specific like "load-tests".)
- services/ -- this is where our test definitions will reside
- README.md -- the README will cover the basics of how to use and extend the test suite
- package.json -- this will contain dependency information for our tests: Artillery / Artillery Pro, any third-party Artillery plugins being used, and any npm modules that are used in custom JS code in our test definitions (a minimal sketch follows below)
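Here's a rough sketch of what that package.json might look like to begin with (the versions and the npm script are placeholders for this illustration -- adjust them to whatever your project actually uses):

{
  "name": "acme-corp-api-tests",
  "private": true,
  "description": "Load & API test suite for ACME Corp services",
  "scripts": {
    "test:auth-service:dev": "artillery run --config ./services/auth-service/config.yaml -e dev ./services/auth-service/scenarios/login.yaml"
  },
  "devDependencies": {
    "artillery": "*",
    "artillery-plugin-expect": "*"
  }
}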
(You can go ahead and clone the template acme-corp-api-tests repo we have set up and use that as your base.)
Service-specific code
Next, we'll create a directory under services/ to hold the test code for the microservice we are starting with, e.g. services/auth-service. We will follow a convention for each of the services that we write tests for, as follows:
acme-corp-api-tests/
- services/
  - auth-service/
    - scenarios/
    - config.yaml
    - functions.js
    - overrides.slo-response-time.json
- scenarios/ will contain a number of YAML files, each defining one virtual user scenario
- config.yaml will contain service-specific information, such as a list of environments that the service is deployed to (e.g. dev, test, uat, staging etc.), configuration of any service-specific HTTP headers, any variable data, etc.
- We will have a number of overrides.something.json files, which we will use in conjunction with the --overrides flag of the artillery run command to set the amount of generated load dynamically. For example, we may have two overrides files: one setting the level of load to the baseline that our service is expected to handle, and another which emulates the peak traffic spikes that we need to be able to deal with.
- functions.js will hold any custom JS code that we may need to write for our scenarios.
Next, we’ll look at each one of these in more detail.
Service scenarios in scenarios/
The number and nature of these will depend on the service being tested. A simple scenario for an imaginary auth service that tests the happy path may look like this:
scenarios:
- name: Authenticate with valid credentials
flow:
- post:
url: '/auth'
json:
username: '{{ username }}'
password: '{{ password }}'
expect:
- statusCode: 200
- contentType: json
One thing to note here is the expect attributes we are setting (even though we're getting slightly ahead of ourselves here, it's worth doing this early). Those are for the expectations plugin for Artillery, which we will employ to reuse our scenario definitions for acceptance/smoke/functional testing. Until the plugin is enabled, those annotations don't have any effect -- we'll look at how we can use the plugin in a follow-up article.
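As a quick preview (this is just a sketch -- we'll cover it properly in the follow-up), assuming the plugin is the artillery-plugin-expect npm package declared in our top-level package.json, enabling it comes down to a plugins entry in the service's config.yaml:

config:
  plugins:
    # Enable artillery-plugin-expect so that the expect: checks in our
    # scenarios are evaluated on every response
    expect: {}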
When writing tests for a service, you'd typically start with covering the happy path or the endpoints that are more likely to be performance-sensitive, and then increase coverage over time.
Service-specific configuration in config.yaml
Adding this configuration will allow us to test different versions of our service, e.g. our service deployed in a dev and a staging environment, seamlessly. Since we also want to be able to run these tests against a local instance of our service (e.g. when writing new code, or when learning about how a service works by reading and running the scenarios), we will define a "local" environment as well.
config:
target: '' # we don’t set a target by default
environments:
# Describe an environment that we’ll refer to as "dev". This is the name
# we will use with the --environment flag in artillery run command
# See https://artillery.io/docs/cli-reference/#run for details
dev:
# There’s a dev deployment of this service on the following URL:
target: 'https://auth-service-dev.acme-corp.internal'
# If we want to set service and environment-specific HTTP headers,
# we can do that here as well, e.g. to set an API key that must
# be used to access the service.
defaults:
headers:
x-api-key: '0xcoffee'
# We will almost certainly want to run these tests against a copy of the
# service running locally, so we define a "local" environment:
local:
target: 'http://localhost:8080'
# Tell Artillery to load service-specific custom code
processor: './functions.js'
# For our imaginary auth-service, we may use a CSV file with a set of known
# valid credentials to test with. This file may or may not actually be stored
# in the Git repo (and it may be encrypted at rest if it is) depending on the
# security requirements.
payload:
- path: './username-password.csv'
fields:
- username
- password
#
# Any other service-specific configuration goes here, e.g. if there’s an
# Artillery plugin that we use only for this service’s tests, we’d load it
# here with the "plugins" attribute.
#
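With this configuration in place, any scenario for this service can be pointed at an environment via the -e flag of artillery run. For example, to run a (hypothetical) login scenario against a copy of the service running locally:

artillery run --config ./services/auth-service/config.yaml -e local ./services/auth-service/scenarios/login.yaml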
Service-specific custom JS code in functions.js
If we need to write a bit of custom JS (maybe using some npm modules) to customize the behavior of virtual users for this service (e.g. to generate some random data in the format that the service expects), we can write and export our functions from this module to make them available to this service's scenarios.
The key thing to remember here is that if you use an npm module, make sure it's declared in the top-level package.json.
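As an illustration -- the function name and data format here are invented for this sketch -- functions.js for our auth service might look something like this, with the exported function then referenced from a scenario (e.g. via a function step or a beforeRequest hook):

// functions.js -- service-specific custom JS code for auth-service scenarios
'use strict';

// Generates a random username in a (hypothetical) format the service expects
// and stores it in the virtual user's variables, so that scenarios can
// reference it as {{ randomUsername }}.
function generateRandomUsername(context, events, done) {
  const suffix = Math.random().toString(36).substring(2, 8);
  context.vars.randomUsername = `loadtest-user-${suffix}`;
  return done();
}

module.exports = { generateRandomUsername };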
Overrides in overrides.something.json
Let's say our auth service has the following SLOs:
- The service will handle 500 authentications per second at peak
- 99% of calls will complete in 200ms or less
- No more than 0.1% of calls will fail (defined as a non 2xx response code)
We could encode these requirements with the following Artillery configuration:
{
"config": {
"phases": [
{
"duration": 120,
"arrivalRate": 10,
"rampTo": 20,
"name": "Warm up the service"
},
{
"duration": 240,
"arrivalRate": 20,
"rampTo": 100,
"name": "Ramp to high load"
},
{
"duration": 600,
"arrivalRate": 100,
"name": "Sustained high load"
}
],
"ensure": {
"maxErrorRate": 0.1,
"p99": 200
}
}
}
Let’s look at these overrides in more detail:
Arrival phases
Artillery does not do constant RPS (as it would not be possible to do meaningfully with scenarios composed of more than one request), but let's say we arrived at the arrivalRate values above through some experimentation to achieve >1000 RPS consistently in the "Sustained high load" phase.
Automatic SLO checking with ensure
Artillery supports automatic checking of response time and error rate conditions after a test run is completed. Here we're encoding our SLOs for the service: a p99 response time of 200ms and a maximum acceptable error rate of 0.1%. Once the test run is over, Artillery will compare the actual metrics with these objectives, and if they're higher (e.g. p99 latency is over 200ms), the CLI will exit with a non-zero exit code, which will in turn fail the CI/CD pipeline job.
Note that Artillery can only measure total latency, which includes both the application's response time and the time for the request and response to traverse the network. Depending on where Artillery is run from, the difference between the two may be very slight (e.g. when Artillery is run from an ECS cluster in the same VPC as the service under test) or quite substantial (e.g. when an Artillery test is run from us-east-1 against a service deployed in eu-central-1). Use your judgement when setting those ensure values.
It is possible to make Artillery measure just the application's response time, but that requires that your services make that information available via a custom HTTP header such as X-Response-Time, and that some custom JS code is written.
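As a rough sketch of what that custom JS could look like -- this assumes the service reports its processing time in milliseconds in an X-Response-Time header, uses Artillery's afterResponse hook, and records the value as a custom histogram metric (the exact custom-metrics API varies between Artillery versions, so treat this as illustrative):

// In functions.js -- hypothetical hook recording the server-side processing
// time reported by the service, separately from the measured total latency.
function recordServerResponseTime(requestParams, response, context, events, next) {
  const header = response.headers['x-response-time'];
  const ms = parseFloat(header);
  if (!Number.isNaN(ms)) {
    // Emit a custom histogram metric (API may differ by Artillery version)
    events.emit('histogram', 'auth_service.server_response_time', ms);
  }
  return next();
}

module.exports.recordServerResponseTime = recordServerResponseTime;

The hook would then be attached to a request with afterResponse: "recordServerResponseTime" in the scenario definition.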
Over to you!
Now that you’ve set up a project structure that’s extensible and re-usable, go ahead and write a scenario or two for your service. To run your tests, you'd use a command like this:
artillery run --config ./services/auth-service/config.yaml --overrides "$(cat ./services/auth-service/overrides.slos.json)" -e dev ./services/auth-service/scenarios/login.yaml