The most expensive version of accessibility work is the one that happens once. Someone runs an axe scan, fixes the findings in a flurry, declares the product compliant, and the codebase quietly drifts back to where it was within a release cycle. The cheapest version is the one that happens continuously: a deterministic check sits in CI, refuses to merge regressions, and gets cheaper to satisfy as the patterns it enforces become muscle memory for the team.
Last week we ran a fleet-wide accessibility audit and remediation across the entire AVIAN monorepo, the platform behind AccelaStudy AI and our internal tool fleet. This post is the operational story: what we built, how we used it, what we found, what we shipped, and how we're keeping the bar continuous.
The shape of the platform
AVIAN is fifty-six user-facing repositories in a single monorepo. The product is AccelaStudy AI, the consumer adaptive-learning application. Around it sit the recruiter and enterprise marketplace surfaces (avian-aces-web, avian-enterprise-web), the internal tool fleet that runs the company (twenty-one tools spanning a Kanban board, a calendar, a unified observability dashboard, a multi-site CMS, and more), and the marketing websites for every product line. They share a design system, an authentication shell, an activities library, a console-simulator library used in cloud labs, and a UI primitives library. Anything we build into the foundation propagates across the fleet; anything we leave broken in the foundation propagates the same way.
Our compliance target is WCAG 2.1 Level AA, the standard most enterprise procurement teams use to evaluate vendors and the standard a meaningful fraction of our user base relies on day-to-day. Compliance with WCAG isn't a single test; it's about fifty success criteria covering image alt text, keyboard navigation, focus indicators, color contrast, motion preferences, screen-reader support, and several dozen other dimensions. Holding a fifty-six-repository fleet to that standard by hand is not feasible. So we don't.
The premise: deterministic where possible, judgmental where necessary
We treat the WCAG 2.1 AA surface as having two parts. The mechanical eighty percent — missing alt attributes, unlabelled inputs, clickable divs without keyboard support, focus rings stripped without replacement, missing skip links, missing lang attributes — is deterministically checkable. A script can find every instance with high precision. The judgmental twenty percent — color contrast across themes, custom widget pattern correctness, modal focus-trap correctness, the narrative quality of screen-reader text — needs a human or an axe-in-browser pass. Both layers belong in our process. The deterministic layer runs on every pull request; the judgmental layer runs on release boundaries.
The deterministic layer is the audit script.
The audit script
avian-audits/scripts/accessibility_audit.py is a stdlib-only Python program that scans every TSX, JSX, HTML, and CSS file in fifty-six repositories. It implements fifteen rules, each mapping to a WCAG 2.1 success criterion. The script's surface area:
| Rule category | Rule count | Coverage |
|---|---|---|
| Non-text content (alt, ARIA labels) | 3 | <img>, <svg>, <canvas> |
| Info and relationships (labels, headings, landmarks) | 4 | inputs, <h1> count, <main>, sr-only utility |
| Page-level structure | 2 | skip link, lang attribute |
| Interactive elements | 4 | clickable non-buttons, aria-hidden focusable, positive tabIndex, generic link text |
| Visible focus and motion | 2 | outline:none without replacement, reduced-motion guard |
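To make one row of the table concrete, here is a minimal sketch of how a positive-`tabIndex` rule might look in stdlib-only Python. The function name, the finding fields, and the regular expression are assumptions for illustration, not the script's actual code.

```python
import re

# Matches tabIndex={3} or tabindex="3"; zero and negative values are allowed.
POSITIVE_TABINDEX = re.compile(
    r'tabIndex=(?:\{\s*|")([1-9]\d*)(?:\s*\}|")', re.IGNORECASE
)

def find_positive_tabindex(path, source):
    """Return one finding per positive tabIndex value in `source`."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in POSITIVE_TABINDEX.finditer(line):
            findings.append({
                "rule": "positive-tabindex",  # hypothetical rule id
                "file": path,
                "line": lineno,
                "value": int(match.group(1)),
            })
    return findings
```

A line-by-line regex like this is cheap and deterministic, which is the property the gate depends on. The trade-off is that it cannot see values computed at runtime.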
The script auto-discovers UI-bearing repositories, skips test fixtures and build artifacts, and emits two reports: a Markdown summary for humans and a JSON file for CI. In its default mode (the project's "Mode 1"), it exits non-zero if any HIGH or CRITICAL finding remains. That's the gate we enforce on the main branch.
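The gate itself is simple enough to sketch. The example below assumes the JSON report is a list of objects with a `severity` field; that schema and the function name are hypothetical, since the report format is not shown here.

```python
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}

def gate_exit_code(findings):
    """Return 1 if any finding is HIGH or CRITICAL, otherwise 0.

    `findings` is assumed to be a list of dicts with a "severity" key.
    """
    blocking = [f for f in findings if f.get("severity") in BLOCKING_SEVERITIES]
    return 1 if blocking else 0
```

A CI wrapper would load the report with `json.load` and pass the result to `sys.exit(gate_exit_code(findings))`, so MEDIUM and LOW findings are reported without failing the build.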
The interesting design decisions are the heuristics. Three suppressions matter:
- Skip aria-hidden inputs and divs. An element marked `aria-hidden="true"` is by definition not in the accessibility tree. Honeypot inputs, decorative backdrops, and hidden file pickers legitimately omit `aria-label` or keyboard handlers.
- Skip spread-prop forwarders. Generic component wrappers like `<Input ref={ref} {...props} />` delegate accessibility to the consumer.
- Recognize the conditional-attribute pattern. A common React idiom is `<div role={cond ? 'button' : undefined} tabIndex={cond ? 0 : undefined} onKeyDown={cond ? handler : undefined}>`. The script treats elements with all three conditional expressions as authored-correctly.
Every suppression is documented inline alongside the rule, so a future engineer reading the script understands not just what's checked but what's deliberately not checked.
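The three suppressions can be sketched as a single check over the text of an opening tag. This is an illustrative approximation under stated assumptions: the patterns, the function name, and the returned reason strings are hypothetical, and the real script may tokenize rather than pattern-match.

```python
import re

ARIA_HIDDEN = re.compile(r'aria-hidden=(?:"true"|\'true\'|\{true\})')
SPREAD_PROPS = re.compile(r'\{\s*\.\.\.\s*\w+\s*\}')
# One pattern per attribute that must be written as a ternary expression.
CONDITIONAL = {
    name: re.compile(name + r'=\{[^{}]*\?[^{}]*:[^{}]*\}')
    for name in ("role", "tabIndex", "onKeyDown")
}

def suppression_reason(opening_tag):
    """Return why a finding on this tag would be suppressed, or None."""
    if ARIA_HIDDEN.search(opening_tag):
        return "aria-hidden"
    if SPREAD_PROPS.search(opening_tag):
        return "spread-props"
    if all(p.search(opening_tag) for p in CONDITIONAL.values()):
        return "conditional-attributes"
    return None
```

Returning a reason rather than a boolean keeps each suppression auditable, since a report can then show what was skipped and why.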
What the first pass found
The first run reported:
| Severity | Findings |
|---|---|
| CRITICAL | 0 |
| HIGH | 414 |
| MEDIUM | 312 |
| LOW | 0 |
| Total | 726 |
Seventy-one percent of the HIGH findings clustered in two places: 2,235 form inputs in the cloud-console simulator (used by labs across forty-five certification programs), where labels were visible but not programmatically linked, and a static-site dist/ folder that hadn't been rebuilt in a month and had stale HTML missing the lang attribute. The remaining HIGH findings were spread across consumer apps, internal tools, and marketing sites: missing aria-labels on icon-only buttons, modal backdrops without role="presentation", decorative SVGs not marked aria-hidden, focus rings stripped without replacement.
How we executed the remediation
We worked the findings in nine waves over a single day. Each wave targeted a class of fix, not a repository.
The largest single intervention was a 200-line Python codemod that pair-linked 2,235 inputs and labels across 257 dashboard files in the console-sim repository. The codemod found adjacent label/input pairs, generated stable identifiers from each input's existing test selector, and rewrote both elements with the appropriate htmlFor/id linkage. It applied 2,235 fixes in eight seconds, which is roughly three orders of magnitude faster and one order of magnitude more reliable than the human-touch equivalent.
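A heavily simplified sketch of the pair-linking idea follows. It assumes the test selector is a `data-testid` attribute, that the input is self-closing, and that no attribute value contains `>`; the `field-` id prefix and the function name are also hypothetical. The production codemod is roughly 200 lines and handles more cases than this.

```python
import re

# Assumption: label text is plain, and attribute values contain no '>'
# (arrow functions in attributes would defeat this simple pattern).
PAIR = re.compile(
    r'<label(?P<label_attrs>[^>]*)>(?P<text>[^<]*)</label>'
    r'(?P<gap>\s*)'
    r'<input(?P<before>[^>]*?)data-testid="(?P<testid>[^"]+)"(?P<after>[^>]*?)/>'
)

def link_pairs(source):
    """Link adjacent label/input pairs; return (new_source, fixes_applied)."""
    fixes = 0

    def rewrite(m):
        nonlocal fixes
        already_linked = (
            "htmlFor=" in m.group("label_attrs")
            or " id=" in m.group("before") + m.group("after")
        )
        if already_linked:
            return m.group(0)  # leave existing linkage untouched
        fixes += 1
        field_id = "field-" + m.group("testid")
        return (
            f'<label{m.group("label_attrs")} htmlFor="{field_id}">'
            f'{m.group("text")}</label>{m.group("gap")}'
            f'<input id="{field_id}"{m.group("before")}'
            f'data-testid="{m.group("testid")}"{m.group("after")}/>'
        )

    return PAIR.sub(rewrite, source), fixes
```

Deriving the id from an existing selector keeps the output stable across runs, so running the codemod a second time applies no further changes.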
A second codemod handled <h1> proliferation: pages in the simulator render multiple panels, each authored with its own page-level heading. Source-level, that meant multiple <h1> per file. The codemod kept the first <h1> and demoted the rest to <h2>, applying 594 demotions across 241 files.
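The keep-first, demote-the-rest rule fits in a few lines. The sketch below is an assumption about how such a codemod could work on source text, with a hypothetical function name; it treats headings as non-nested and ignores `<h1>` text that might appear inside strings or comments.

```python
import re

H1_TAG = re.compile(r'<(/?)h1\b')

def demote_extra_h1(source):
    """Keep the first <h1> element; rewrite later ones as <h2>.

    Returns (new_source, number_of_headings_demoted).
    """
    opened = 0   # opening <h1 tags seen so far
    demoted = 0

    def rewrite(m):
        nonlocal opened, demoted
        if m.group(1) != "/":      # an opening tag
            opened += 1
            if opened > 1:
                demoted += 1
        # Tags belonging to the first heading are left unchanged.
        return m.group(0) if opened <= 1 else f"<{m.group(1)}h2"

    return H1_TAG.sub(rewrite, source), demoted
```

Because only the tag name is rewritten, attributes such as `className` carry over unchanged. Any styling keyed to the `h1` element selector would still need a manual check.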
A third codemod added aria-label attributes to 50 inputs whose only contextual hint was a test identifier, deriving the label from a humanized form of the test selector. We rewrote that codemod twice; the first version's regex broke on JSX attribute values containing arrow functions, which is exactly the failure mode that led us to write a brace-aware tokenizer for the audit script in the first place.
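The arrow-function failure is easy to reproduce: a regex that stops at the first `>` ends the tag inside `(e) => ...`. The sketch below shows the brace-aware idea and a possible humanizer. Both function names are hypothetical, and the scanner ignores escaped quotes for brevity.

```python
def find_tag_end(source, start):
    """Return the index of the '>' closing the JSX opening tag at `start`.

    Tracks {...} depth and quoted strings so a '>' inside an attribute
    value is not mistaken for the end of the tag. Returns -1 if not found.
    """
    depth = 0
    quote = None
    for i in range(start, len(source)):
        ch = source[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'`":
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ">" and depth == 0:
            return i
    return -1

def humanize(test_id):
    """Turn a selector such as 'bucket-name' into a label such as 'Bucket name'."""
    words = test_id.replace("_", "-").split("-")
    return " ".join(w for w in words if w).capitalize()
```

Generated labels are only as good as the selectors they come from, so acronyms and abbreviations in a selector still deserve a human read before they ship.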
The remaining waves dispatched focused review passes across the consumer apps and the tool fleet. Each pass took the audit's JSON findings list, applied mechanical fixes (role="presentation" on modal backdrops, aria-label on icon-only buttons, aria-hidden on decorative SVGs, keyboard handlers on custom interactive elements), and verified TypeScript still compiled.
The numbers
A snapshot of the AVIAN UI surface area after the audit:
| Metric | Count |
|---|---|
| UI-bearing repositories | 56 |
| TSX/JSX source files | 2,185 |
| Native `<button>` elements | 2,527 |
| Form inputs (input / select / textarea) | 3,486 |
| Total ARIA attribute uses | 3,019 |
| Explicit `aria-label` uses | 1,530 |
| `aria-hidden` uses (decorative elements) | 752 |
| `role=` uses | 1,001 |
| `role="dialog"` (modal containers) | 109 |
| `role="img"` (canvas/SVG with text alternative) | 112 |
| `role="button"` (custom interactive elements) | 101 |
| `tabIndex` uses | 150 |
| `onKeyDown` keyboard handlers | 208 |
| Tailwind `focus-visible:` classes | 102 |
| HIGH findings before remediation | 414 |
| HIGH findings after remediation | 0 |
| Findings at every severity, after | 0 |
| Total fixes shipped in one day | 3,317 |
Three thousand nineteen ARIA attributes is a meaningful count, but the more important number is two thousand five hundred twenty-seven native <button> elements: that's the count of things we didn't have to make accessible by hand. Native HTML semantics are the foundation of accessibility; ARIA is the extension. The codebase leans heavily on native semantics — buttons, anchors, fieldsets, labels, headings — and adds ARIA for the visualizations (the Knowledge Map canvas, the Behavioral Rings SVG), the custom widgets (the lab console toolbar, the segmented billing-cadence selector, the keyboard-driven drag-and-drop in activities), and the live regions (toasts, exam timers, chat output, narration logs).
What it actually takes
After three thousand-plus fixes in a day, here's what we think is non-negotiable for a fully accessible product, and what's nice-to-have:
| Area | Non-negotiable | Nice-to-have |
|---|---|---|
| Audit | Deterministic script in CI; non-zero exit on HIGH/CRITICAL | Coverage trend dashboard |
| Spec | Single source-of-truth document; modes formalized | Rendered as a website page |
| Codemods | Reusable for the most-common bulk fixes | Pre-commit hook integration |
| Patterns | Documented and exemplified in the design system | Storybook stories for each |
| Native HTML | Buttons, anchors, fieldsets, labels — used over div + role wherever possible | — |
| ARIA | Used to extend native semantics, never to replace them | — |
| Focus | Visible indicator on every focusable element (:focus-visible, not :focus) | High-contrast mode tested |
| Keyboard | Every interactive control reachable; arrow-key patterns for radio groups, menus, drag-drop | Tab-order Playwright tests |
| Skip-link | Present at the top of every shell layout | Multiple targets (nav, main) |
| `lang` | Set on `<html>` for every page | Per-section overrides for non-English content |
| Reduced motion | @media (prefers-reduced-motion: reduce) guard everywhere | Per-component opt-outs |
| Screen-reader testing | Manual pass with NVDA / VoiceOver / JAWS before each release | Recorded passes for regression comparison |
| Color contrast | Verified per theme | Computed in CI via axe |
The top row is the gate. Everything else is process, sustained over time.
How we keep this continuous
The work last week was finite. The discipline is continuous. Three artifacts are the difference between an accessibility sprint and an accessibility property:
- `avian-audits/accessibility-audit.md` is the spec. It defines the rules, severities, audit modes, and fix patterns. It updates in the same commit as the script. We treat it like an architecture decision record.
- `avian-audits/scripts/accessibility_audit.py` is the executable. CI runs it on every pull request. Mode 1 exits non-zero on HIGH/CRITICAL findings. The PR review surface won't merge a regression.
- The fix sections of the spec are the codemod inventory. When a new mechanical pattern surfaces, the rule and the codemod ship together.
Around the deterministic core, we run an axe-core Playwright sweep across thirty-four AccelaStudy AI routes that catches color-contrast issues and computed-DOM-state defects the static audit can't see. Manual screen-reader passes happen on release boundaries. Together, the three layers (deterministic on every pull request, broader Playwright on integration, manual on release) give us a coverage triangle that's hard to regress without noticing.
Why it matters for partners and customers
If you're an enterprise procurement team evaluating AccelaStudy AI for a school district, a corporate learning program, or a government workforce, accessibility is on your VPAT and on your RFP. We can show you the audit, the script, the spec, and the report; we can show you the CI gate; we can show you the manual screen-reader test plan; and we can show you the deployment commit history that closes the loop. WCAG 2.1 AA isn't a marketing claim for us. It's a property of the artifact we ship, enforced by the tooling we ship.
If you're an engineer at another company looking at a similar problem, the playbook is short: write the script first. Don't write the slide deck; don't make the report; don't even fix anything. Write the script that detects the regressions you can't tolerate. The script gives you a baseline number, the baseline tells you the size of the problem, and the size of the problem tells you whether to fix by hand, by codemod, or by focused review pass. Once the script is in place, every fix is cheap and every regression becomes a build break instead of a customer complaint.
That's the difference between accessibility as a checklist and accessibility as engineering. We chose engineering, because every learner means every learner.