Testing #6 E2E with Playwright and CI Integration — Closing the Track

This is the last post of the testing track. If #1–5 lived inside one component or one module, this post follows real user scenarios end-to-end in a real browser — that’s E2E.

The tool of focus is Playwright, and the final section recaps all six posts of the track.

What E2E Catches and What It Doesn’t #

A reminder of the shape from #1. E2E sits here:

The trophy, again
        ┌──────────────┐
        │     E2E      │      ← 5–10 core scenarios
        ├──────────────┤
        │  Integration │
        │    (RTL)     │
        ├──────────────┤
        │     Unit     │
        ├──────────────┤
        │    Static    │
        └──────────────┘

What E2E catches:

  • Things component integration tests can’t see — routing, auth flow, state passed across pages.
  • Real browser differences — wrinkles where Safari behaves differently from Chrome.
  • Mismatches in the contract between backend and frontend.
  • Race / timing issues that only show up against real network and real DB.

Where E2E doesn’t catch much, or isn’t worth the cost:

  • The behavior of a single component — RTL is faster and more accurate.
  • Every branch of edge cases — five unit tests can cover what would take thirty E2E tests.
  • Server-side business logic — that belongs to server-side unit/integration tests.

Recommended count — 5–10 core user flows. Something like “sign up → log in → one or two main features → checkout.” Going beyond that quickly becomes a maintenance burden.

Playwright vs Cypress — Briefly #

The two heavyweights of E2E, compared at a high level:

                   | Playwright                          | Cypress
-------------------+-------------------------------------+-----------------------------------------
Browsers           | Chromium, Firefox, WebKit (Safari)  | Chromium, Firefox, WebKit (experimental)
Execution model    | Out-of-process (browser automation) | In-process (inside the browser)
Multi-tab / iframe | Natural support                     | Tricky
Parallel execution | Built-in                            | Paid (Cypress Cloud)
Mobile emulation   | Strong                              | Average
Learning curve     | Low                                 | Lowest
Docs               | Good                                | Excellent

For new projects, there’s almost no reason not to pick Playwright these days: it supports WebKit, runs tests in parallel for free, and is actively developed by Microsoft. Cypress still wins on teaching-friendly DX and its visual debugger.

This post uses Playwright.

Setup #

The fastest start — the official init command.

Install Playwright
pnpm create playwright

What it asks:

  • TypeScript? → yes
  • Test directory? → e2e (or tests)
  • Add a GitHub Actions workflow? → yes (more in the CI section)
  • Install browsers? → yes (downloads Chromium / Firefox / WebKit, about 500MB)

A snippet of the generated playwright.config.ts:

playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  fullyParallel: true,
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 2 : 0,
  workers: process.env.CI ? 1 : undefined,
  reporter: 'html',
  use: {
    baseURL: 'http://localhost:5173',
    trace: 'on-first-retry',
  },
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
  webServer: {
    command: 'pnpm dev',
    url: 'http://localhost:5173',
    reuseExistingServer: !process.env.CI,
  },
});

Key options:

  • baseURL — the base for every page.goto('/'). Swap it via env vars per environment.
  • webServer — automatically spins up the dev server before tests. Same behavior in CI.
  • projects — run the same tests across multiple browsers. Running all three in CI is the standard practice.
  • trace: 'on-first-retry' records a trace (screenshots + DOM snapshots + network) only when a failed test is retried for the first time. Passing runs aren’t recorded, so the output stays light. With retries set to 0, as it is locally, no trace is produced.
  • retries — auto-retry flaky tests. CI only.
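
The "swap it via env vars" advice for baseURL can be sketched as a small helper. This is a minimal sketch and not part of the generated config: the function name resolveBaseURL and the variable name BASE_URL are assumptions chosen for illustration.

```typescript
// Hypothetical helper: picks the baseURL from an environment-like object.
// BASE_URL is an assumed variable name; the fallback matches the dev server
// URL used in the config above.
type Env = Record<string, string | undefined>;

function resolveBaseURL(env: Env): string {
  const fromEnv = env.BASE_URL?.trim();
  // Fall back to the local dev server when nothing (or only whitespace) is set.
  return fromEnv ? fromEnv : 'http://localhost:5173';
}

// In playwright.config.ts this would be used as:
//   use: { baseURL: resolveBaseURL(process.env), trace: 'on-first-retry' },
```

When the tests point at a deployed environment this way, the webServer block would still start the local dev server, so it probably needs the same kind of switch.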

Your First E2E #

The simplest test possible.

e2e/home.spec.ts
import { test, expect } from '@playwright/test';

test('home page shows the title', async ({ page }) => {
  await page.goto('/');
  await expect(page).toHaveTitle(/My App/);
});

test('clicking the login link navigates to the login page', async ({ page }) => {
  await page.goto('/');
  await page.getByRole('link', { name: 'Log in' }).click();
  await expect(page).toHaveURL('/login');
});

The { page } fixture gives every test a new page inside a fresh, isolated browser context, so you start with clean cookies and storage.

The query API follows RTL closely: getByRole, getByText, and getByLabel (RTL’s getByLabelText). Same philosophy.

Run
pnpm playwright test                  # headless
pnpm playwright test --headed         # show the browser
pnpm playwright test --ui             # visual UI mode (powerful)
pnpm playwright test --debug          # debug mode (step through)

Try --ui mode at least once. Test list on the left, the browser in the middle, action timeline on the right. Debugging changes shape entirely.

Locator — Not Just an Element #

Playwright’s getByRole(...) and friends actually return a Locator object. RTL hands back an element immediately; Playwright is lazy — it doesn’t look for the element until an action actually happens.

where locators sit
const button = page.getByRole('button', { name: 'Save' });
// no DOM lookup here yet

await button.click();
// at this point it finds the element, auto-waiting if needed

This shape makes auto-waiting feel natural. You almost never need an equivalent of RTL’s waitFor / findBy, because the action itself (click, fill, and so on) waits for the element to be ready.
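
To make "lazy" concrete, here is a toy model of the idea in plain TypeScript. It is not Playwright’s implementation: ToyLocator and the string array standing in for the DOM are invented for illustration. The only point is that the lookup function is stored when the locator is created and run when an action needs the element.

```typescript
// Toy model of a lazy locator. Not Playwright's code: `ToyLocator` and the
// string-array "DOM" are hypothetical stand-ins.
class ToyLocator {
  // Only the recipe for finding the element is stored, not the element.
  private readonly find: () => string | undefined;

  constructor(find: () => string | undefined) {
    this.find = find;
  }

  // An "action": the lookup runs now, against the current state of the DOM.
  click(): string {
    const element = this.find();
    if (element === undefined) {
      // The real library keeps retrying until a timeout instead of failing at once.
      throw new Error('element not found');
    }
    return `clicked ${element}`;
  }
}

const dom: string[] = [];                 // the page starts empty
const save = new ToyLocator(() => dom.find((el) => el === 'Save button'));
dom.push('Save button');                  // the button renders later
const result = save.click();              // the lookup happens here, so it succeeds
```

An eager API would have searched the empty array on the second line and come back with nothing.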

auto-waiting examples
await page.getByRole('button', { name: 'Save' }).click();
// waits for the button to appear, waits for it to be enabled, then clicks

await expect(page.getByRole('alert')).toHaveText('Saved');
// waits for the alert to appear and for the text to match

expect follows the same shape. await expect(locator).toBeVisible() keeps polling for up to 5 seconds (the default timeout) and fails only if the condition is still unmet when the time runs out.

Login Flow — Real E2E #

A more serious scenario. Log in → dashboard → log out.

e2e/auth.spec.ts
import { test, expect } from '@playwright/test';

test.describe('auth flow', () => {
  test('shows an error when logging in with the wrong password', async ({ page }) => {
    await page.goto('/login');

    await page.getByLabel('Email').fill('user@example.com');
    await page.getByLabel('Password').fill('wrong');
    await page.getByRole('button', { name: 'Log in' }).click();

    await expect(page.getByRole('alert')).toContainText('password');
    await expect(page).toHaveURL('/login'); // no navigation
  });

  test('logging in with valid credentials navigates to the dashboard', async ({ page }) => {
    await page.goto('/login');

    await page.getByLabel('Email').fill('user@example.com');
    await page.getByLabel('Password').fill('correct-password');
    await page.getByRole('button', { name: 'Log in' }).click();

    await expect(page).toHaveURL('/dashboard');
    await expect(page.getByRole('heading', { name: /Welcome/ })).toBeVisible();
  });

  test('logging out from the dashboard returns to home', async ({ page }) => {
    // Pre-login
    await page.goto('/login');
    await page.getByLabel('Email').fill('user@example.com');
    await page.getByLabel('Password').fill('correct-password');
    await page.getByRole('button', { name: 'Log in' }).click();
    await expect(page).toHaveURL('/dashboard');

    // Log out
    await page.getByRole('button', { name: 'Log out' }).click();
    await expect(page).toHaveURL('/');
  });
});

The third test is interesting — you have to repeat the pre-login every time. This is where E2E quickly becomes painful. The next section solves it.

storageState — Sharing Login State #

Repeating the login flow in every test is wasteful. Playwright’s storageState solves this.

e2e/auth.setup.ts
import { test as setup } from '@playwright/test';

const authFile = '.auth/user.json';

setup('authenticate', async ({ page }) => {
  await page.goto('/login');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('correct-password');
  await page.getByRole('button', { name: 'Log in' }).click();

  await page.waitForURL('/dashboard');

  // Save cookies and localStorage to a file
  await page.context().storageState({ path: authFile });
});

playwright.config.ts (split projects)
projects: [
  { name: 'setup', testMatch: /.*\.setup\.ts/ },
  {
    name: 'chromium',
    use: { ...devices['Desktop Chrome'], storageState: '.auth/user.json' },
    dependencies: ['setup'],
  },
],

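Now every chromium test starts already logged in. Setup runs once, and the rest of the tests share its result. Tests that need a logged-out browser, like the wrong-password case above, can reset it in their own file with test.use({ storageState: { cookies: [], origins: [] } }).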

Network Mocking — Available in Playwright Too #

E2E doesn’t always need a real backend. Some scenarios (error responses, slow responses) are a natural fit for mocking.

intercepting the network
test('shows an error screen when the server returns 500', async ({ page }) => {
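  // The path is resolved against baseURL. Both calls are awaited so the mock
  // is registered before navigation and the response is fully sent.
  await page.route('/api/posts', async (route) => {
    await route.fulfill({ status: 500, body: 'Internal Error' });
  });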

  await page.goto('/posts');
  await expect(page.getByRole('alert')).toContainText('Server error');
});

page.route(pattern, handler) plays a role similar to MSW from #4. But the real value of E2E is verifying against the real backend. Save mocking for the occasional error path or edge case.
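
If several error paths need mocking, the handler can be built by a small factory. The sketch below is an assumption-laden illustration: failWith is a hypothetical helper, RouteLike is a minimal stand-in for the part of Playwright’s Route that it uses, and the JSON body shape is invented.

```typescript
// Minimal structural stand-in for the part of Playwright's Route used here.
// In a real spec the handler receives Playwright's own Route object.
interface RouteLike {
  fulfill(options: { status: number; contentType: string; body: string }): Promise<void>;
}

// Hypothetical helper: builds a handler that answers with a JSON error.
// The `{ message }` body shape is an assumption about the API.
function failWith(status: number, message: string) {
  return (route: RouteLike): Promise<void> =>
    route.fulfill({
      status,
      contentType: 'application/json',
      body: JSON.stringify({ message }),
    });
}

// In a spec (the second endpoint is an assumed example):
//   await page.route('/api/posts', failWith(500, 'Internal Error'));
//   await page.route('/api/login', failWith(401, 'Unauthorized'));
```

The handler returns the promise from fulfill, so Playwright can wait for the mocked response to complete.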

Page Object Pattern — Lightly #

As tests grow, you end up repeating the same selectors across tests. The page object pattern tidies that.

e2e/pages/LoginPage.ts
// Type-only import: Page and Locator are used purely as types here.
import type { Page, Locator } from '@playwright/test';

export class LoginPage {
  readonly emailInput: Locator;
  readonly passwordInput: Locator;
  readonly submitButton: Locator;
  readonly alert: Locator;

  constructor(public readonly page: Page) {
    this.emailInput = page.getByLabel('Email');
    this.passwordInput = page.getByLabel('Password');
    this.submitButton = page.getByRole('button', { name: 'Log in' });
    this.alert = page.getByRole('alert');
  }

  async goto() {
    await this.page.goto('/login');
  }

  async login(email: string, password: string) {
    await this.emailInput.fill(email);
    await this.passwordInput.fill(password);
    await this.submitButton.click();
  }
}

usage
import { test, expect } from '@playwright/test';
import { LoginPage } from './pages/LoginPage';

test('login flow', async ({ page }) => {
  const loginPage = new LoginPage(page);
  await loginPage.goto();
  await loginPage.login('user@example.com', 'correct-password');

  await expect(page).toHaveURL('/dashboard');
});

Selectors live in one place; when they change, you update one place. One caution — too much abstraction hurts readability. Keep selectors and frequently-used actions in the PO. Assertions (expect) usually read better when they stay inside the test itself.

CI Integration — GitHub Actions #

The workflow pnpm create playwright generates for you.

.github/workflows/playwright.yml
name: Playwright Tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    timeout-minutes: 30
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 22

      - uses: pnpm/action-setup@v4

      - name: Install dependencies
        run: pnpm install

      - name: Install Playwright browsers
        run: pnpm exec playwright install --with-deps

      - name: Run tests
        run: pnpm exec playwright test

      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 30

Key points:

  • playwright install --with-deps installs the browsers along with their system dependencies. The downloaded browsers can be cached between CI runs with a cache action, which can shorten this step, though the system dependencies are still installed on every run.
  • upload-artifact with if: always() uploads the HTML report even when the test step fails. Download it from the workflow run linked on the PR to inspect traces and screenshots of the failed tests.

Vitest Too — Both in One Workflow #

Both E2E and unit/integration in the same workflow.

combined workflow
name: Tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 22 }
      - uses: pnpm/action-setup@v4
      - run: pnpm install
      - run: pnpm test:run --coverage
      - uses: actions/upload-artifact@v4
        with:
          name: coverage
          path: coverage/

  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 22 }
      - uses: pnpm/action-setup@v4
      - run: pnpm install
      - run: pnpm exec playwright install --with-deps
      - run: pnpm exec playwright test
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: playwright-report
          path: playwright-report/

Because the two jobs run in parallel, the total wait is set by the slower one: unit takes tens of seconds and e2e a few minutes, so PR feedback arrives in a few minutes rather than the sum of both.

Coverage — How Much to Show #

Once CI produces the Vitest coverage report, a common next step is an action that posts it as a comment on the GitHub PR.

coverage comment
- uses: davelosert/vitest-coverage-report-action@v2
  if: github.event_name == 'pull_request'

A comment like “this PR moved coverage X% → Y%” gets attached to the PR automatically.

Same point as in #1 and #2: don’t chase the number. Rules like “auto-block merge when coverage drops” usually shoot you in the foot. What matters more is whether the core flows are covered.

Visual Regression — One Line #

If you want to catch unintended visual changes — Playwright’s toHaveScreenshot().

visual snapshot
test('home page visual regression', async ({ page }) => {
  await page.goto('/');
  await expect(page).toHaveScreenshot('home.png');
});

The first run saves a baseline screenshot, and subsequent runs fail when there’s a difference. Differences in CI’s OS / fonts often produce false positives, so usually you generate the baseline on the same OS and only compare against that.

For small projects, you don’t really need this. For larger ones, SaaS like Percy / Chromatic is cleaner.

Common Pitfalls #

Too many flaky tests — usually setTimeout, network races, or animation. Playwright’s expect(...).toBe...() already auto-waits, so a fixed page.waitForTimeout(1000) is almost always an anti-pattern.

Tests pass but cookies / storage seem to leak: storageState is misconfigured, or the pairing between test.use({ storageState: ... }) and dependencies is off.

Breaks only in CI — usually timing, viewport size, or fonts. Make sure viewport and baseURL in playwright.config.ts match between CI and local.

.toHaveText() fails on a partial text match — confused with toContainText(). Use toHaveText for exact matches and toContainText for partials.
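
The difference is easy to see in a simplified model for string arguments. This is not Playwright’s code: the function names are hypothetical, the whitespace handling is modeled in simplified form, and the real matchers also retry and accept regular expressions.

```typescript
// Simplified model of the two matchers for string arguments.
// Function names are hypothetical, not part of any library.
function normalize(text: string): string {
  // Collapse runs of whitespace and trim the ends.
  return text.replace(/\s+/g, ' ').trim();
}

function matchesHaveText(actual: string, expected: string): boolean {
  return normalize(actual) === normalize(expected);        // the whole text must match
}

function matchesContainText(actual: string, expected: string): boolean {
  return normalize(actual).includes(normalize(expected));  // a substring is enough
}

const alertText = 'Incorrect password. Try again.';
const exact = matchesHaveText(alertText, 'password');       // false
const partial = matchesContainText(alertText, 'password');  // true
```

This is why the login test earlier uses toContainText('password') rather than toHaveText.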

Parallel execution causes data conflicts — the same test user is being hit concurrently. Use different data per test (e.g. add a timestamp to the email) or serialize with test.describe.configure({ mode: 'serial' }).
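
The "different data per test" option can be sketched as a helper that builds a unique email. uniqueEmail is a hypothetical name, and the address format is only an assumption about what the app under test accepts.

```typescript
// Hypothetical helper: builds an email address that is unique per worker and
// per call, so parallel tests never share a user.
let counter = 0;

function uniqueEmail(workerIndex: number, now: number = Date.now()): string {
  counter += 1; // distinguishes calls that land in the same millisecond
  return `user+w${workerIndex}-${now}-${counter}@example.com`;
}

// In a spec, the worker index is available from the test info object:
//   test('sign up', async ({ page }, testInfo) => {
//     const email = uniqueEmail(testInfo.workerIndex);
//     await page.getByLabel('Email').fill(email);
//   });
```

The worker index keeps workers apart within one run, and the timestamp keeps reruns from colliding with users left over from earlier runs.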

Only WebKit breaks — subtle Safari compatibility quirks. Temporarily exclude webkit from projects in playwright.config.ts and debug it separately. Usually CSS, transforms, or focus-related.

Recap of the Six-Post Track #

We started with the diagram from #1 (static → integration → unit → E2E) and followed that distribution. Where each of the six posts landed:

what the series covered
#1 — the diagram (pyramid vs trophy)
#2 — Vitest (unit + its setup)
#3 — RTL (first step into integration)
#4 — MSW (network interception = the heart of integration)
#5 — userEvent + forms (user-input integration)
#6 — Playwright + CI (E2E + automation)

And going back to the line we started with:

The reason testing doesn’t happen is usually not “we’re busy” — it’s the lack of a picture for what / where / how.

The picture is in place now. Whether to write a unit test for a small function, reach for component integration, or put it into an E2E scenario — you have the decision guide in hand. If you want to go further:

  • Deeper state-management track — testing patterns for tools like TanStack Query / Zustand.
  • Backend testing: pytest from the Python track, Django Intermediate #7 Testing, and Modern Python in Practice #6.
  • Contract testing — verify the contract between backend and frontend with OpenAPI / Pact.
  • Mutation testing — a meta tool that verifies whether your tests actually catch what they claim (Stryker).

Wrap-Up #

  • E2E is 5–10 core user flows — anything beyond that becomes a maintenance burden. Leave to unit/integration what they already cover.
  • Playwright’s locator is lazy + auto-waiting. You almost never need waitFor; await expect(locator).toBe...() polls automatically.
  • Queries follow the same philosophy as RTL — prefer getByRole, getByLabel. The way users see the UI.
  • Use storageState to share login state — no repeating the login flow in every test.
  • page.route enables network mocking inside Playwright too. Still, the real value of E2E is verifying against the real backend.
  • A CI workflow runs unit + e2e in parallel. Upload traces and reports as artifacts so you can pull them down from the PR.
  • Treat coverage as a reference; auto-block rules usually shoot you in the foot.
  • The shape of the six-post track — static → integration → unit → E2E. Be deliberate about how you allocate time.

The testing track ends here. It was a natural extension from the React track and the TypeScript track, and finishing this track puts a feel for testing into your hands for almost any React code. The next step — actually adopt it in one of your own projects. Start with one or two small functions.
