Imagine the following scenario: you’re part of a sizeable project using a microservice-based architecture, with multiple teams (who aren’t necessarily in the same geographical location) working on a number of microservices.
All teams may be deploying to a single target if you’re fortunate (or if the project is still in relatively early stages), such as Kubernetes or ECS, but it’s likely that there’s at least some AWS Lambda/Serverless in the mix, as well as a more traditional EC2-based deployment somewhere. There’s an API gateway which exposes public APIs that are backed by some of the microservices, and a variety of purely internal microservices that provide APIs which are only ever used by other microservices.
Most of your microservices probably expose HTTP-based APIs (whether RESTful or GraphQL), but it’s very likely that a variety of other interfaces and transports are used in the system: there are probably a number of services which use SQS, Kinesis or another queue-like mechanism to receive work or output results, there may be gRPC somewhere in the mix, or perhaps a real-time component using WebSockets or Socket.io.
An imaginary polyglot microservice-based system.
Note: any resemblance to a real-world architecture is accidental
More importantly for the purposes of this article: performance & reliability are key requirements for this system. Perhaps you’re in e-commerce, where every 100ms you can shave off response time affects the bottom line. Or it’s an IoT system that will need to be able to receive and process hundreds of thousands of messages every minute from a fleet of real-world sensors. Maybe it’s some kind of a ticketing or event system, with regular spikes in traffic that must be dealt with or your company’s call centers get DDoS’d by irate customers. Whatever it is, performance and reliability are important to this project, and you know you will want to invest time and effort into them.
This is the situation development teams find themselves in all the time (and one we are very familiar with and have helped with many times in the course of our consulting work). As a team lead, developer, SDET or QA engineer who’s tasked with implementing a performance testing pipeline, there are likely to be many questions that you feel need to be answered, e.g.:
- Where does performance testing fit into the development, testing, and delivery process?
- How do we run performance tests in our CI/CD pipeline?
- How do we define performance goals & SLOs for our services? Can we have those checked automatically in CI?
- What types of performance tests are there? Which ones do we need for our services?
- How do we organize our test suites? What are the best practices around structuring those tests, avoiding duplicating effort and wasting time, and sharing common solutions across teams?
- How do we pick which services to test? Do we need to test all of our microservices? What about testing different environments and configurations easily?
- How do we encourage collaboration on performance testing and involve everyone: developers, testers, SREs and product managers? (performance is everyone’s responsibility!)
- How do we make performance testing results available to all interested parties? (developers, testers, product managers, SREs)
- How do we make sure that the costs of running load tests frequently at scale do not become astronomical?
- What’s the best way to get started with all of this?
That’s a lot of questions, but fortunately we have the answers! The goal of this series of articles is to give you a practical methodology to follow with your team, and make it so clear that you could read this guide and start tomorrow.
By the end of this series, you will end up with:
- A scaffold for building out a comprehensive test suite for all of your APIs – internal and external, regardless of where they may be deployed, what protocol they use to communicate, or which technology stack they’re built with. Whether it’s a serverless Node.js service, a Go microservice deployed on Kubernetes that consumes Kinesis, or an SQS-based worker service.
- An understanding of how to integrate performance tests into your team’s CI/CD pipeline.
- The ability to run both load tests and acceptance/smoke/functional tests using the same test definitions, e.g. to first verify a new deployment by running a few happy path scenarios, and then blasting it with lots of traffic to test how it responds under load.
- A way to make the metrics generated by your tests (such as latency and RPS) available in an external monitoring system such as Datadog.
… and more!
Let’s get started!
This article presumes that you’re familiar with Artillery. If you haven’t used it before, I’d suggest having a look through the docs, especially the Core Concepts guide, and an overview of what Artillery tests look like.
While we could start with any service, if we want to make progress quickly (e.g. to have something interesting to present to the rest of the team) and demonstrate the value of what we are doing, it’s best if we start with a service which satisfies the following criteria:
- It has a small API surface. Ideally only a couple of API endpoints.
- It would obviously benefit from load tests, so:
- It has experienced issues under load in the past or you have good reasons to suspect that it may
- It’s on the critical path for some important functionality of the system, and if it experiences hiccups, parts of the system may go down, become unavailable and affect end users
- It has high performance requirements (which may be expressed as TPS, desired p99 response time, number of concurrent users, or a combination of several objectives)
An example of such a service might be an authentication service which issues JWTs allowing users to access other API endpoints.
Now that we’ve picked a service we want to test, the next question is how to organize our code.
Keeping extensibility of our test suite in mind (we will be adding more scenarios, hopefully for many other services and APIs), and with cross-team and cross-functional collaboration also firmly in our thoughts, we usually recommend using a monorepo approach, that is a single source code repository that will contain test definitions for all (or many) services and APIs:
- A single repo makes the test suite easier to extend and maintain. There’s only one repo to clone. All of the test code is there, and existing tests for other services can serve as good examples for someone just starting to write tests for their API.
- Access control is simple. You only need to share one repo with other teams rather than 10 or 20 (which would be the case if performance tests are kept alongside the microservices’ codebase).
- A monorepo encourages and simplifies code reuse in tests, and helps keep things more DRY.
- A monorepo is also easier to work with when you’re building CI/CD jobs and pipelines. The CI server needs to be able to clone only one known repo to run an automated test against any existing microservice.
(Aside: here’s a nice read over on Dan Luu’s blog on reasons monorepos can work well for some projects.)
Let's pretend we're working for Acme Corp for the moment
We’re going to set up the following directory structure for our tests:
- `acme-corp-api-tests` – This is the name of the repo that will hold our test code. Naming is one of the two hardest things in computing, and `api-tests` tends to work well as a name. (Not to give too much away, but in the follow-up to this article we’ll cover how to use the same test definitions to run either load tests or acceptance/smoke/functional tests, so a more generic “api-tests” makes more sense than something specific like “load-tests”.)
- `services` – this is where our test definitions will reside
- `README.md` – this will cover the basics of how to use and extend the test suite
- `package.json` – this will contain dependency information for our tests and include Artillery / Artillery Pro, any third-party Artillery plugins being used, or npm modules that are used in custom JS code in our test definitions.
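Putting those pieces together, the top-level layout of the repo would look something like this (before any service-specific tests are added):

```
acme-corp-api-tests/
├── README.md
├── package.json
└── services/
```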
(You can go ahead and clone the template acme-corp-api-tests repo we have set up and use that as your base.)
Next, we’ll create a directory under `services` to hold the test code for the microservice we are starting with, e.g. `services/auth-service`. We will follow a convention for each of the services that we write tests for as well, as follows:
- `scenarios` will contain a number of YAML files, each defining one virtual user scenario
- `config.yaml` will contain service-specific information, such as a list of environments that the service is deployed to (e.g. `staging` etc), configuration of any service-specific HTTP headers, any variable data etc.
- We will have a number of `overrides.something.json` files, which we will use in conjunction with the `artillery run` command to be able to set the amount of generated load dynamically. For example, we may have two overrides files: one setting the level of load to the baseline that our service is expected to handle, and another one which emulates peak traffic spikes that we need to be able to deal with.
- `functions.js` will hold any custom JS code that we may need to write for our scenarios.
Next, we’ll look at each one of these in more detail.
The number and nature of these will depend on the service being tested. A simple scenario for an imaginary auth service that tests the happy path may look like this:
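The endpoint paths, payload fields and captured values below are illustrative – a sketch of what a `scenarios/login.yaml` file for our imaginary auth service might contain:

```yaml
# scenarios/login.yaml -- happy-path login scenario.
# Endpoint URLs and JSON fields are hypothetical; adjust to your service's API.
scenarios:
  - name: "Log in and use the issued JWT"
    flow:
      - post:
          url: "/auth/login"
          json:
            username: "{{ username }}"
            password: "{{ password }}"
          capture:
            json: "$.token"
            as: "token"
          expect:
            - statusCode: 200
            - contentType: json
            - hasProperty: token
      - get:
          url: "/users/me"
          headers:
            Authorization: "Bearer {{ token }}"
          expect:
            - statusCode: 200
```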
One thing to note here is the `expect` attributes we are setting (even though we’re getting slightly ahead of ourselves here, it’s worth doing this early). Those are for the expectations plugin for Artillery, which we will employ to reuse our scenario definitions for acceptance/smoke/functional testing. Until the plugin is enabled, those annotations don’t have any effect – we’ll look at how we can use the plugin in a follow-up article.
When writing tests for a service, you’d typically start with covering the happy path or the endpoints that are more likely to be performance-sensitive, and then increase coverage over time.
Adding this configuration will allow us to seamlessly test different versions of our service, e.g. our service deployed in a `dev` and a `staging` environment. Since we also want to be able to run these tests against a local instance of our service (e.g. when writing new code, or when learning about how a service works by reading and running the scenarios), we will define a `local` environment as well.
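The hostnames and port below are made up for illustration – a sketch of what `services/auth-service/config.yaml` might look like with `local`, `dev` and `staging` environments defined:

```yaml
# config.yaml -- service-specific configuration for auth-service.
# Hostnames are placeholders; point these at your real deployments.
config:
  target: "https://auth-service.staging.acme-corp.example"
  processor: "./functions.js"
  environments:
    local:
      target: "http://localhost:3000"
    dev:
      target: "https://auth-service.dev.acme-corp.example"
    staging:
      target: "https://auth-service.staging.acme-corp.example"
```

The environment is then selected at run time with the `-e` flag, e.g. `artillery run -e dev …`.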
If we need to write a bit of custom JS (maybe using some npm modules) to customize the behavior of virtual users for this service (e.g. to generate some random data in the format that the service expects), we can write and export our functions from this module to make them available to this service’s scenarios.
The key thing to remember here is that if you use an npm module, make sure it’s declared in the top-level `package.json`.
Let’s say our auth service has the following SLOs:
- The service will handle 500 authentications per second at peak
- 99% of calls will complete in 200ms or less
- No more than 0.1% of calls will fail (a failure being defined as a non-2xx response code)
We could encode these requirements with the following Artillery configuration:
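A sketch of what `overrides.slos.json` might contain – the `arrivalRate` and `duration` numbers here are placeholders that you would tune for your own service through experimentation:

```json
{
  "config": {
    "phases": [
      { "duration": 60, "arrivalRate": 10, "rampTo": 100, "name": "Warm up" },
      { "duration": 300, "arrivalRate": 100, "name": "Sustained high load" }
    ],
    "ensure": {
      "p99": 200,
      "maxErrorRate": 0.1
    }
  }
}
```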
Let’s look at these overrides in more detail:
Artillery does not do constant RPS (as it would not be possible to do meaningfully with scenarios composed of more than one request), but let’s say we arrived at the `arrivalRate` values above through some experimentation, to achieve >1000 RPS consistently in the “Sustained high load” phase.
Artillery supports automatic checking of response-time and error-rate conditions after a test run is completed. Here we’re encoding our SLOs for the service: a `p99` response time of 200ms or less, and a maximum acceptable error rate of 0.1%. Once the test run is over, Artillery will compare the actual metrics with the objectives, and if they’re higher (e.g. `p99` latency is over 200ms), the CLI will exit with a non-zero exit code, which will in turn fail the CI/CD pipeline job.
Note that the latency Artillery measures includes both the application’s response time and the time for the request and response to traverse the network. Depending on where Artillery is run from, those values may differ very slightly (e.g. when Artillery is run from an ECS cluster in the same VPC as the service under test) or quite substantially (e.g. when an Artillery test is run from `us-east-1` against a service deployed in `eu-central-1`). Use your judgement when setting those objectives.
It is possible to make Artillery measure just the application’s response time, but that requires that your services make that information available with a custom HTTP header such as `X-Response-Time`, and that some custom JS code is written.
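As a sketch of what that custom JS could look like: an `afterResponse` hook in `functions.js` that parses a (hypothetical) `X-Response-Time` header and emits it as a custom metric. The `customStat` event shown here reflects classic Artillery’s custom-metrics API – check the docs for your Artillery version:

```javascript
// functions.js -- record server-reported response time separately from
// the network round-trip latency that Artillery measures by default.
// Assumes the service sets a header like "X-Response-Time: 12ms".
"use strict";

function recordServerTiming(requestParams, response, context, events, done) {
  const header = response.headers["x-response-time"];
  if (header) {
    const ms = parseFloat(header); // "12ms" -> 12
    if (!Number.isNaN(ms)) {
      events.emit("customStat", { stat: "app.response_time", value: ms });
    }
  }
  return done();
}

module.exports = { recordServerTiming };
```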
Now that you’ve set up a project structure that’s extensible and re-usable, go ahead and write a scenario or two for your service. To run your tests, you’d use a command like this:
```
artillery run --config ./services/auth-service/config.yaml --overrides "$(cat ./services/auth-service/overrides.slos.json)" -e dev ./services/auth-service/login.yaml
```