Load testing with Playwright

Overview

Playwright is a modern browser automation framework by Microsoft. Artillery can run JS/TypeScript functions that use Playwright APIs to interact with web pages. This enables reuse of existing Playwright test suites for load testing.

With Artillery's Playwright engine you can:

  • Create load tests for complex web apps using Playwright's API
  • Reuse existing Playwright code for load testing
  • Track Core Web Vitals metrics automatically and measure how they change under load
  • Create no-code tests with playwright codegen
  • Scale to tens of thousands of headless Chrome instances with AWS Fargate or Azure ACI

How it works

How Artillery and Playwright integrate

  • Artillery loads and runs the code you provide as test functions (referenced via the testFunction attribute in a scenario definition).
  • Test functions are written in JavaScript or TypeScript and use the Playwright Page API to control a browser and interact with a web app.
  • Artillery sets up the browser instance and the page object for you. Your test function code runs in that context.
  • You can import other modules to use in your test function. For example, if you have existing Page Object Model classes, you can import them and use them in your test functions.
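The points above can be sketched as a test function that drives a page through a Page Object Model class. This is a minimal sketch, not a definitive implementation: it assumes Artillery passes the page object as the first argument of the test function. The `PageLike` interface, the `CheckoutPage` class, the URL and the selector are hypothetical names introduced for illustration. `PageLike` declares only the two Page methods the sketch uses, so the block runs without Playwright installed; in a real project you would use Playwright's own `Page` type.

```typescript
// Assumption: a minimal structural subset of Playwright's Page API.
// A real Playwright Page object satisfies this interface.
interface PageLike {
  goto(url: string): Promise<unknown>;
  click(selector: string): Promise<unknown>;
}

// Hypothetical Page Object Model class, e.g. reused from an existing
// end-to-end test suite.
class CheckoutPage {
  private readonly page: PageLike;

  constructor(page: PageLike) {
    this.page = page;
  }

  async open(): Promise<void> {
    // Placeholder URL, not a real application.
    await this.page.goto("https://example.com/checkout");
  }

  async submitOrder(): Promise<void> {
    // Placeholder selector using Playwright's text selector syntax.
    await this.page.click("text=Place order");
  }
}

// Test function: each virtual user runs this once with its own page object.
// The function name is what a scenario's testFunction attribute would refer to.
async function checkoutFlow(page: PageLike): Promise<void> {
  const checkout = new CheckoutPage(page);
  await checkout.open();
  await checkout.submitOrder();
}
```

Because the browser and page are created by Artillery, the function contains only the user journey itself, which is what makes existing Playwright code straightforward to reuse.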

What happens during a load test

Artillery will create a number of virtual users (VUs) according to the config.phases setting in your test definition. Those VUs will run in parallel, and each of them will load and execute your test function, simulating a distinct real user interacting with the web app.
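To get a feel for how a load profile translates into VU counts, the arithmetic can be sketched as below. This is an illustration under stated assumptions, not a description of Artillery's scheduler: it assumes phases that use a constant `arrivalRate` (new VUs per second) over a `duration` (seconds), and the phase values and the 30-second scenario length are hypothetical. The concurrency figure is a steady-state approximation based on Little's law (arrival rate multiplied by average time in the system).

```typescript
// Assumed shape of a constant-rate phase:
// `duration` in seconds, `arrivalRate` in new VUs per second.
interface Phase {
  duration: number;
  arrivalRate: number;
}

// Total number of VUs launched over all phases.
function totalVusCreated(phases: Phase[]): number {
  return phases.reduce((sum, p) => sum + p.duration * p.arrivalRate, 0);
}

// Approximate number of VUs running at the same time in steady state,
// given how long one VU takes to complete its scenario.
function estimatedConcurrency(arrivalRate: number, avgScenarioSeconds: number): number {
  return arrivalRate * avgScenarioSeconds;
}

// Hypothetical load profile: a 1-minute warm-up followed by 5 minutes of load.
const phases: Phase[] = [
  { duration: 60, arrivalRate: 2 },
  { duration: 300, arrivalRate: 10 },
];

const totalVus = totalVusCreated(phases); // 60 * 2 + 300 * 10 = 3120
const concurrentVus = estimatedConcurrency(10, 30); // 10 * 30 = 300
```

The concurrency estimate matters for browser-based tests in particular, because each concurrent VU holds a headless browser for the full length of its scenario.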

Artillery will manage headless browsers (and browser contexts) used by those VUs, and their lifecycle. It will also collect web app performance metrics from each individual VU, and aggregate them to provide a combined report of performance metrics such as Web Vitals metrics.

When running a distributed load test, Artillery will create cloud infrastructure in your AWS or Azure account to run those headless browsers, and aggregate results and metrics from all those instances.

Artillery compared with Playwright Test Runner

Load testing with Playwright & Artillery has some similarities to end-to-end testing with the Playwright Test Runner, but also a number of important differences.

  • Purpose: Playwright Test Runner is a test runner for end-to-end tests, whereas Artillery is a test runner for load tests.
  • Concepts:
    • Playwright Test Runner runs test suites or specs, which consist of tests defined with test() blocks.
    • Artillery runs virtual user (VU) scenarios, which are defined using the test function API.
    • Both Playwright Test Runner tests and Artillery scenarios use the same Playwright APIs to interact with web pages.
  • Configuration: Playwright Test Runner is configured with a playwright.config.ts file, whereas Artillery is configured with a YAML file. Both allow configuration for the underlying browser instance used by the Playwright APIs.
  • Results:
    • An end-to-end test with Playwright Test Runner produces a binary pass/fail outcome for each test in a test suite. Each test usually runs only once.
    • A load test with Artillery will produce a report which shows how the performance of the web app changes over time under load. Artillery will usually run a smaller number of VU scenarios many times and in parallel. Different executions of the same scenario can succeed and fail at different times because of the load generated on the application. The load profile of the test (i.e. how many virtual users are created) is as important as the logic of individual scenarios and will affect the outcome.
    • There is no default pass/fail condition on the overall test report in Artillery as the performance metrics which are produced need to be interpreted in the context of the application and specific load profile. Load tests are often exploratory in nature, rather than something with a binary pass/fail outcome. You can set up pass/fail assertions on the results of a load test using the built-in ensure plugin.

Trace recording

Artillery can automatically record Playwright traces for virtual users that fail. This can be helpful for debugging any issues uncovered by the load test. Recorded traces are uploaded to Artillery Cloud and can be viewed under the Traces tab in the test report.

Usage

You will need to create an Artillery Cloud account and run your test with the --record flag to use this feature.

How it works

Trace recording in Playwright load tests run with Artillery works differently from trace recording in the Playwright Test Runner.

A typical configuration for the Playwright Test Runner is to re-run a failing test case and record a trace for the new run. Re-running VUs in an Artillery test would not be a good approach as that would change the load profile of the test.

Enabling trace recordings for all VUs by default would also not be a good approach as trace recordings incur a significant performance overhead, and that would make tests more expensive to run.

Artillery's approach is to keep a small number of VUs recording traces at any given time. If any of those VUs fail, the trace is saved and uploaded to Artillery Cloud. This approach allows you to capture traces for failing VUs without significantly increasing resource requirements of the load test.

However, this can lead to a situation where a load test has failing VUs but no traces are recorded, because every VU that was recording happened to finish successfully. This is more likely to happen in tests with a large number of total VUs and a small number of failing VUs.

Increasing the value of the maxConcurrentRecordings setting will increase the likelihood of capturing traces for failing VUs in such cases.
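The slot-based approach described above can be modelled with a small sketch. This is an illustrative model only and not Artillery's internal implementation: the `RecordingSlots` class and its method names are hypothetical, and only the `maxConcurrentRecordings` name comes from the text above. It shows why a failing VU can end up without a trace: if every slot is taken when that VU starts, it never records.

```typescript
// Illustrative model only; not Artillery's actual implementation.
class RecordingSlots {
  private readonly maxConcurrentRecordings: number;
  private active = 0;

  constructor(maxConcurrentRecordings: number) {
    this.maxConcurrentRecordings = maxConcurrentRecordings;
  }

  // Called when a VU starts. Returns true if this VU gets a recording slot.
  tryAcquire(): boolean {
    if (this.active >= this.maxConcurrentRecordings) {
      return false;
    }
    this.active += 1;
    return true;
  }

  // Called when a recording VU finishes. The trace is kept only on failure.
  release(vuFailed: boolean): "saved" | "discarded" {
    this.active -= 1;
    return vuFailed ? "saved" : "discarded";
  }
}

// With a single slot, a second VU that starts while the first is still
// running does not record, so a failure in it would leave no trace.
const slots = new RecordingSlots(1);
const firstVuRecords = slots.tryAcquire();   // true
const secondVuRecords = slots.tryAcquire();  // false
const firstVuTrace = slots.release(false);   // "discarded": the VU succeeded
const thirdVuRecords = slots.tryAcquire();   // true: the slot is free again
```

In this model, raising the limit gives more VUs a slot at any given moment, which is the trade-off between trace coverage and recording overhead that the setting controls.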

Performance & Cost

You can start running tests with Artillery & Playwright on your local machine. However, for larger tests you will need to run distributed tests in the cloud.

Artillery can run Playwright tests on AWS Fargate or Azure ACI and scale to tens of thousands of concurrent headless browsers.

Resource recommendations

As a rule of thumb we recommend planning for 1 vCPU per concurrent virtual user. Headless browsers are CPU and memory intensive, and if too many of them run concurrently in the same worker container, then the browser itself will struggle to load the app you're testing. This will manifest itself as rapid degradation of Web Vitals in the test as reported by Artillery.

Cost estimation example

Let's estimate resource requirements and cost for a load test that reaches 5000 concurrent virtual users and runs for 30 minutes on AWS Fargate, using the Spot capacity provider in the us-east-1 region.

  • Resource requirements: with the default recommendation of 1 vCPU per VU, 5000 vCPUs will be required. We will also allocate 1GB of memory per vCPU, for a total of 5000 GB.
  • Number of workers: AWS Fargate allows up to 16 vCPUs per worker, so 313 workers will be required (5000 / 16 = 312.5, rounded up), with 16 vCPUs and 16GB of memory each.
  • Cost: AWS Fargate costs $0.01227579 per vCPU-hour, and $0.00134797 per GB-hour. This will result in a total cost of: ($0.01227579 * 5000 + $0.00134797 * 5000) / 60 * 30 = $34.06 for the test.
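The estimate above can be reproduced with a short calculation, which makes it easy to substitute your own VU count, duration, or prices. This is a sketch that uses only the figures quoted in the list; the variable and function names are illustrative, and the prices should be checked against current AWS pricing before relying on the result. It also shows a small side effect of rounding up to whole workers: 313 workers provide 5008 vCPUs rather than 5000.

```typescript
// Inputs taken from the estimate above.
const targetConcurrentVus = 5000;
const vcpusPerVu = 1;
const gbPerVcpu = 1;
const maxVcpusPerWorker = 16;
const pricePerVcpuHour = 0.01227579; // USD
const pricePerGbHour = 0.00134797;   // USD
const testHours = 30 / 60;

// Compute cost in USD for a given amount of CPU and memory over a duration.
function computeCost(vcpus: number, memoryGb: number, hours: number): number {
  return (vcpus * pricePerVcpuHour + memoryGb * pricePerGbHour) * hours;
}

const requiredVcpus = targetConcurrentVus * vcpusPerVu;          // 5000
const workers = Math.ceil(requiredVcpus / maxVcpusPerWorker);    // 313
const estimate = computeCost(requiredVcpus, requiredVcpus * gbPerVcpu, testHours);

// Whole workers provision slightly more capacity: 313 * 16 = 5008 vCPUs.
const provisionedVcpus = workers * maxVcpusPerWorker;
const provisionedEstimate = computeCost(
  provisionedVcpus,
  provisionedVcpus * gbPerVcpu,
  testHours
);

const estimateUsd = estimate.toFixed(2);                       // "34.06"
const provisionedEstimateUsd = provisionedEstimate.toFixed(2); // "34.11"
```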

Compute costs (CPU and memory) are likely to be the main component of total cost, but depending on the nature of your tests you may also need to consider the cost of data transfer, storage, and other cloud services used by your tests.

FAQ

Why load test with headless browsers?

Load testing dynamic web applications using protocol/API-level tests can be time consuming and brittle. The main reason is that testing a web application requires a different level of abstraction:

  • APIs and backend services are tested at the API endpoint level, calling one endpoint at a time even in multi-step scenarios
  • Web apps are tested at the level of pages and user flows, which mimic how real users use the application. Each page usually calls multiple APIs and the flow between pages is driven by user actions.

| | HTTP APIs & microservices | Web apps |
| Abstraction level | HTTP endpoint | Whole page |
| Surface area | Small, a handful of endpoints | Large, calls many APIs. Different APIs may be called depending on in-page actions by the user |
| Formal spec | Usually available (e.g. as an OpenAPI spec) | No formal specs for APIs used and their dependencies. You have to spend time in Dev Tools to track down all API calls |
| In-page JS | Ignored. Calls made by in-page JS have to be accounted for manually and emulated | Runs as expected, e.g. making calls to more HTTP endpoints |

Those differences make it difficult to model load on the backend accurately using API-level tests, especially if the application uses client-side JavaScript to communicate with backend APIs.

Another reason is that the performance of API endpoints in isolation does not provide any guarantees about user-perceived performance of the application as a whole. The only way to measure user-perceived performance is to test the application as a whole through the frontend.

Running load tests with Playwright and Artillery provides a way to test web applications at the right level of abstraction, measure user-perceived performance with automatic capture of Web Vitals, and run those tests at high scale.

Without Playwright

  • Figure out which HTTP APIs are used by the web page
  • Figure out what actions in the UI trigger calls to which APIs
  • Figure out what in-page JavaScript code does and how it interacts with the backend
  • Try to mimic realistic load on the backend at protocol level or by using HAR files
  • Ignore limitations with how dynamic such tests can be, and accept how brittle and time-consuming maintenance is going to be

With Playwright & Artillery

  • Write UI-centric code, reusing existing Playwright code, and let the web app itself call the backend
  • Run lots of Playwright scripts to generate load on the backend

Limits & Gotchas

  • Only Chromium is currently available. We remove other browsers from the Docker image that runs Playwright load tests on purpose to improve startup time performance for large load tests. It's highly unlikely that a load test using Chromium would not uncover the same performance issues as a test using another browser.
  • To run a distributed test you have to use either AWS Fargate or Azure ACI. Running Playwright tests on AWS Lambda is not supported.
  • Automatic transpilation of TypeScript may sometimes fail due to issues with particular npm packages. You can exclude them from bundling via the config.bundling.external setting.
  • Headless browsers are CPU and memory intensive, and if too many of them run concurrently in the same worker container, then the browser itself will struggle to load the app you're testing. This will manifest itself as rapid degradation of Web Vitals in the test. As a rule of thumb, allow at least one vCPU per concurrent headless browser instance.
  • Trace recording works differently in Playwright load tests vs Playwright Test Runner tests. You may not always see a trace recording for every failed VU. See Understanding trace recordings for more information.

See Also

Artillery Cloud has support for Playwright Test Runner reporting. You can monitor Playwright tests in real time, view HTML reports with screenshots and traces, and integrate with GitHub Actions. See Playwright Test Runner reporter for more information.