Making AccelaStudy AI Accessible to All

The most expensive version of accessibility work is the one that happens once. Someone runs an axe scan, fixes the findings in a flurry, declares the product compliant, and the codebase quietly drifts back to where it was within a release cycle. The cheapest version is the one that happens continuously: a deterministic check sits in CI, refuses to merge regressions, and gets cheaper to satisfy as the patterns it enforces become muscle memory for the team.

Last week we ran a fleet-wide accessibility audit and remediation across the entire AVIAN monorepo, the platform behind AccelaStudy AI and our internal tool fleet. This post is the operational story: what we built, how we used it, what we found, what we shipped, and how we're keeping the bar continuous.

The shape of the platform

AVIAN is fifty-six user-facing repositories in a single monorepo. The product is AccelaStudy AI, the consumer adaptive-learning application. Around it sit the recruiter and enterprise marketplace surfaces (avian-aces-web, avian-enterprise-web), the internal tool fleet that runs the company (twenty-one tools spanning a Kanban board, a calendar, a unified observability dashboard, a multi-site CMS, and more), and the marketing websites for every product line. They share a design system, an authentication shell, an activities library, a console-simulator library used in cloud labs, and a UI primitives library. Anything we build into the foundation propagates across the fleet; anything we leave broken in the foundation propagates the same way.

Our compliance target is WCAG 2.1 Level AA, the standard most enterprise procurement teams use to evaluate vendors and the standard a meaningful fraction of our user base relies on day-to-day. Compliance with WCAG isn't a single test; it's about fifty success criteria covering image alt text, keyboard navigation, focus indicators, color contrast, motion preferences, screen-reader support, and several dozen other dimensions. Holding a fifty-six-repository fleet to that standard by hand is not feasible. So we don't.

The premise: deterministic where possible, judgmental where necessary

We treat the WCAG 2.1 AA surface as having two parts. The mechanical eighty percent — missing alt attributes, unlabelled inputs, clickable divs without keyboard support, focus rings stripped without replacement, missing skip links, missing lang attributes — is deterministically checkable. A script can find every instance with high precision. The judgmental twenty percent — color contrast across themes, custom widget pattern correctness, modal focus-trap correctness, the narrative quality of screen-reader text — needs a human or an axe-in-browser pass. Both layers belong in our process. The deterministic layer runs on every pull request; the judgmental layer runs on release boundaries.

The deterministic layer is the audit script.

The audit script

avian-audits/scripts/accessibility_audit.py is a stdlib-only Python program that scans every TSX, JSX, HTML, and CSS file in fifty-six repositories. It implements fifteen rules, each mapping to a WCAG 2.1 success criterion. The script's surface area:

Rule category	Rule count	Coverage
Non-text content (alt, ARIA labels)	3	`<img>`, `<svg>`, `<canvas>`
Info and relationships (labels, headings, landmarks)	4	inputs, `<h1>` count, `<main>`, sr-only utility
Page-level structure	2	skip link, `lang` attribute
Interactive elements	4	clickable non-buttons, aria-hidden focusable, positive tabIndex, generic link text
Visible focus and motion	2	`outline:none` without replacement, reduced-motion guard
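
To make the rule shape concrete, here is a minimal sketch of what a single stdlib-only rule could look like. It is an illustration, not the actual audit script: the function name and the regexes are assumptions. It also uses the naive `[^>]*` tag match, which stops early on JSX attribute values that contain `>` (arrow functions, for example), so a production rule would need a sturdier tokenizer.

```python
import re

# Hypothetical rule sketch: flag <img> tags that carry no alt attribute
# (WCAG 2.1 SC 1.1.1, Non-text Content). Names here are illustrative only.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# The lookbehind keeps attributes such as data-alt= from counting as alt=.
ALT_ATTR = re.compile(r"(?<![\w-])alt\s*=", re.IGNORECASE)


def find_images_missing_alt(source: str) -> list[int]:
    """Return the 1-based line numbers of <img> tags with no alt attribute."""
    hits = []
    for match in IMG_TAG.finditer(source):
        if not ALT_ATTR.search(match.group(0)):
            # Count the newlines before the tag to recover its line number.
            hits.append(source.count("\n", 0, match.start()) + 1)
    return hits
```

An empty `alt=""` passes this check on purpose, since that is the conventional way to mark an image as decorative.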

The script auto-discovers UI-bearing repositories, skips test fixtures and build artifacts, and emits two reports: a Markdown summary for humans and a JSON file for CI. In its default mode (the project's "Mode 1"), it exits non-zero if any HIGH or CRITICAL finding remains. That's the gate we enforce on the main branch.
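
The gate itself can be very small. The sketch below assumes each finding in the JSON report is an object with a `severity` field; that schema and the `gate` function name are assumptions for illustration, not the script's real interface.

```python
from collections import Counter

# Severities that block a merge in the default mode, per the text above.
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def gate(findings: list[dict]) -> tuple[int, Counter]:
    """Return (exit_code, counts_by_severity) for a list of findings.

    Assumes every finding has a "severity" key (hypothetical schema).
    """
    counts = Counter(finding["severity"] for finding in findings)
    # A Counter returns 0 for missing keys, so absent severities are safe.
    blocked = any(counts[severity] > 0 for severity in BLOCKING_SEVERITIES)
    return (1 if blocked else 0), counts
```

A CI wrapper would load the report with `json.load`, call `gate`, print the counts, and pass the exit code to `sys.exit`.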

The interesting design decisions are the heuristics. Three suppressions matter:

Skip aria-hidden inputs and divs. An element marked aria-hidden="true" is by definition not in the accessibility tree. Honeypot inputs, decorative backdrops, hidden file pickers — these legitimately omit aria-label or keyboard handlers.
Skip spread-prop forwarders. Generic component wrappers like <Input ref={ref} {...props} /> delegate accessibility to the consumer.
Recognize the conditional-attribute pattern. A common React idiom is <div role={cond ? 'button' : undefined} tabIndex={cond ? 0 : undefined} onKeyDown={cond ? handler : undefined}>. The script treats an element that carries all three conditional expressions as correctly authored.

Every suppression is documented inline alongside the rule, so a future engineer reading the script understands not just what's checked but what's deliberately not checked.
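
The three suppressions above could be approximated as follows. This is a sketch under stated assumptions: it works on the text of a single opening tag, the regexes are deliberately simple, and `suppression_reason` is a hypothetical name rather than a function from the real script.

```python
import re
from typing import Optional

# aria-hidden="true", aria-hidden='true', or aria-hidden={true}
ARIA_HIDDEN = re.compile(r"""aria-hidden\s*=\s*(?:"true"|'true'|\{true\})""")
# {...props}, {...rest}, and similar spread forwarding
SPREAD_PROPS = re.compile(r"\{\s*\.\.\.\w+\s*\}")
# ={cond ? value : undefined}, with no nested braces in the expression
CONDITIONAL_VALUE = r"\s*=\s*\{[^{}]*\?[^{}]*:\s*undefined\s*\}"
CONDITIONAL_ATTRS = ("role", "tabIndex", "onKeyDown")


def suppression_reason(opening_tag: str) -> Optional[str]:
    """Return why a finding on this tag is suppressed, or None to report it."""
    if ARIA_HIDDEN.search(opening_tag):
        return "aria-hidden"
    if SPREAD_PROPS.search(opening_tag):
        return "spread-props"
    if all(
        re.search(r"\b" + attr + CONDITIONAL_VALUE, opening_tag)
        for attr in CONDITIONAL_ATTRS
    ):
        return "conditional-attributes"
    return None
```

Returning the reason rather than a bare boolean keeps the report auditable: a reviewer can see which heuristic silenced a finding.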

What the first pass found

The first run reported:

Severity	Findings
CRITICAL	0
HIGH	414
MEDIUM	312
LOW	0
Total	726

Seventy-one percent of the HIGH findings clustered in two places: 2,235 form inputs in the cloud-console simulator (used by labs across forty-five certification programs), where labels were visible but not programmatically linked, and a static-site dist/ folder that hadn't been rebuilt in a month and had stale HTML missing the lang attribute. The remaining HIGH findings were spread across consumer apps, internal tools, and marketing sites: missing aria-labels on icon-only buttons, modal backdrops without role="presentation", decorative SVGs not marked aria-hidden, focus rings stripped without replacement.

How we executed the remediation

We worked the findings in nine waves over a single day. Each wave targeted a class of fix, not a repository.

Figure: nine fix waves, from 414 high-severity defects to zero across all severity levels.

The largest single intervention was a 200-line Python codemod that pair-linked 2,235 inputs and labels across 257 dashboard files in the console-sim repository. The codemod found adjacent label/input pairs, generated stable identifiers from each input's existing test selector, and rewrote both elements with the appropriate htmlFor/id linkage. It applied 2,235 fixes in eight seconds, which is roughly three orders of magnitude faster and one order of magnitude more reliable than the human-touch equivalent.
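
A heavily simplified sketch of the pair-linking idea is shown below. It assumes the test selector is a `data-testid` attribute and that the label sits immediately before the input. Both are assumptions for illustration, and the real codemod is longer and handles more cases.

```python
import re

# A <label> with plain text, optional whitespace, then an <input>.
PAIR = re.compile(
    r"<label(?P<label_attrs>[^>]*)>(?P<text>[^<]*)</label>"
    r"(?P<gap>\s*)"
    r"<input(?P<input_attrs>[^>]*?)(?P<close>\s*/?>)"
)
TEST_ID = re.compile(r'data-testid="([^"]+)"')  # assumed selector attribute
HAS_ID = re.compile(r"(?<![\w-])id=")            # id=, but not data-testid=


def link_pairs(source: str) -> tuple[str, int]:
    """Add matching htmlFor/id attributes to adjacent label/input pairs."""
    fixed = 0

    def rewrite(m: re.Match) -> str:
        nonlocal fixed
        label_attrs, input_attrs = m["label_attrs"], m["input_attrs"]
        test_id = TEST_ID.search(input_attrs)
        # Leave the pair alone if it is already linked or has no selector.
        if "htmlFor=" in label_attrs or HAS_ID.search(input_attrs) or not test_id:
            return m.group(0)
        fixed += 1
        ident = test_id.group(1)
        return (
            f'<label{label_attrs} htmlFor="{ident}">{m["text"]}</label>'
            f'{m["gap"]}<input id="{ident}"{input_attrs}{m["close"]}'
        )

    return PAIR.sub(rewrite, source), fixed
```

Because already-linked pairs are skipped, the transform is idempotent: a second run over the same file changes nothing, which makes it safe to re-run after partial manual fixes.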

A second codemod handled <h1> proliferation: pages in the simulator render multiple panels, each authored with its own page-level heading. Source-level, that meant multiple <h1> per file. The codemod kept the first <h1> and demoted the rest to <h2>, applying 594 demotions across 241 files.
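
The demotion rule is simple enough to sketch in a few lines. The version below assumes `<h1>` elements are never nested inside one another, and the function name is hypothetical.

```python
import re

# Matches the start of an opening or closing h1 tag: "<h1" or "</h1".
H1_TAG = re.compile(r"<(/?)h1\b")


def demote_extra_h1(source: str) -> tuple[str, int]:
    """Keep the first <h1> in a file and demote every later one to <h2>."""
    opened = 0
    demoting = False

    def rewrite(match: re.Match) -> str:
        nonlocal opened, demoting
        if match.group(1) == "":  # an opening tag starts a new heading
            opened += 1
            demoting = opened > 1
        # A closing tag inherits the decision made for its opening tag.
        return f"<{match.group(1)}h2" if demoting else match.group(0)

    result = H1_TAG.sub(rewrite, source)
    return result, max(opened - 1, 0)
```

Attributes on the heading survive untouched because only the tag name is rewritten.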

A third codemod added aria-label attributes to 50 inputs whose only contextual hint was a test identifier, deriving the label from a humanized form of the test selector. We rewrote that codemod twice; the first version's regex broke on JSX attribute values containing arrow functions, which is exactly the failure mode that had earlier pushed us to write a brace-aware tokenizer for the audit script.
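
The arrow-function problem is easy to reproduce: the `>` in `=>` looks like the end of the tag to a naive regex. Below is a minimal sketch of a brace-aware scan together with a selector humanizer. Both functions are illustrative assumptions, not the production tokenizer, and the scan does not handle escaped quotes or template literals that contain nested backticks.

```python
import re


def find_tag_end(source: str, start: int) -> int:
    """Return the index of the '>' that closes the tag opening at start.

    Skips any '>' inside {...} expressions or quoted strings.
    Returns -1 if the tag is never closed.
    """
    depth = 0      # current nesting depth of JSX expression braces
    quote = None   # the quote character we are inside, if any
    for i in range(start, len(source)):
        ch = source[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'`":
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ">" and depth == 0:
            return i
    return -1


def humanize(test_id: str) -> str:
    """Turn a selector such as 'bucket-name-input' into 'Bucket name input'."""
    words = [w for w in re.split(r"[-_]+", test_id) if w]
    return " ".join(words).capitalize()
```

A human should still review derived labels, since a selector that reads well to an engineer may not read well to a screen-reader user.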

The remaining waves dispatched focused review passes across the consumer apps and the tool fleet. Each pass took the audit's JSON findings list, applied mechanical fixes (role="presentation" on modal backdrops, aria-label on icon-only buttons, aria-hidden on decorative SVGs, keyboard handlers on custom interactive elements), and verified TypeScript still compiled.
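
Organizing work by class of fix rather than by repository amounts to grouping the findings list by rule. The sketch below assumes each JSON finding has `rule` and `file` fields; that schema, the rule names in the usage example, and the `plan_waves` function are hypothetical.

```python
from collections import defaultdict


def plan_waves(findings: list[dict]) -> list[tuple[str, list[str]]]:
    """Group findings by rule and order the groups by number of files touched.

    Assumes each finding has "rule" and "file" keys (hypothetical schema).
    """
    files_by_rule = defaultdict(set)
    for finding in findings:
        files_by_rule[finding["rule"]].add(finding["file"])
    waves = [(rule, sorted(files)) for rule, files in files_by_rule.items()]
    # Largest waves first: they are the best candidates for a codemod.
    return sorted(waves, key=lambda wave: len(wave[1]), reverse=True)
```

For example, two `icon-button-label` findings in different files and one `backdrop-role` finding would produce an `icon-button-label` wave with two files, followed by a `backdrop-role` wave with one.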

The numbers

A snapshot of the AVIAN UI surface area after the audit:

Metric	Count
UI-bearing repositories	56
TSX/JSX source files	2,185
Native `<button>` elements	2,527
Form inputs (input / select / textarea)	3,486
Total ARIA attribute uses	3,019
Explicit `aria-label` uses	1,530
`aria-hidden` uses (decorative elements)	752
`role=` uses	1,001
`role="dialog"` (modal containers)	109
`role="img"` (canvas/SVG with text alternative)	112
`role="button"` (custom interactive elements)	101
`tabIndex` uses	150
`onKeyDown` keyboard handlers	208
Tailwind `focus-visible:` classes	102
HIGH findings before remediation	414
HIGH findings after remediation	0
Findings at every severity, after	0
Total fixes shipped in one day	3,317

Three thousand nineteen ARIA attributes is a meaningful count, but the more important number is two thousand five hundred twenty-seven native <button> elements: that's the count of things we didn't have to make accessible by hand. Native HTML semantics are the foundation of accessibility; ARIA is the extension. The codebase leans heavily on native semantics — buttons, anchors, fieldsets, labels, headings — and adds ARIA for the visualizations (the Knowledge Map canvas, the Behavioral Rings SVG), the custom widgets (the lab console toolbar, the segmented billing-cadence selector, the keyboard-driven drag-and-drop in activities), and the live regions (toasts, exam timers, chat output, narration logs).

What it actually takes

After three thousand-plus fixes in a day, here's what we think is non-negotiable for a fully accessible product, and what's nice-to-have:

Area	Non-negotiable	Nice-to-have
Audit	Deterministic script in CI; non-zero exit on HIGH/CRITICAL	Coverage trend dashboard
Spec	Single source-of-truth document; modes formalized	Rendered as a website page
Codemods	Reusable for the most-common bulk fixes	Pre-commit hook integration
Patterns	Documented and exemplified in the design system	Storybook stories for each
Native HTML	Buttons, anchors, fieldsets, labels — used over `div + role` wherever possible	—
ARIA	Used to extend native semantics, never to replace them	—
Focus	Visible indicator on every focusable element (`:focus-visible`, not `:focus`)	High-contrast mode tested
Keyboard	Every interactive control reachable; arrow-key patterns for radio groups, menus, drag-drop	Tab-order Playwright tests
Skip-link	Present at the top of every shell layout	Multiple targets (nav, main)
`lang`	Set on `<html>` for every page	Per-section overrides for non-English content
Reduced motion	`@media (prefers-reduced-motion: reduce)` guard everywhere	Per-component opt-outs
Screen-reader testing	Manual pass with NVDA / VoiceOver / JAWS before each release	Recorded passes for regression comparison
Color contrast	Verified per theme	Computed in CI via axe

The top row is the gate. Everything below it is process, sustained over time.

How we keep this continuous

The work last week was finite. The discipline is continuous. Three artifacts are the difference between an accessibility sprint and an accessibility property:

avian-audits/accessibility-audit.md is the spec. It defines the rules, severities, audit modes, and fix patterns. It updates in the same commit as the script. We treat it like an architecture decision record.
avian-audits/scripts/accessibility_audit.py is the executable. CI runs it on every pull request. Mode 1 exits non-zero on HIGH/CRITICAL findings, so a pull request that introduces a regression cannot merge.
The fix sections of the spec are the codemod inventory. When a new mechanical pattern surfaces, the rule and the codemod ship together.

Around the deterministic core, we run an axe-core Playwright sweep across thirty-four AccelaStudy AI routes that catches color-contrast issues and computed-DOM-state defects the static audit can't see. Manual screen-reader passes happen on release boundaries. Together, the three layers (the deterministic audit on every pull request, the broader Playwright sweep on integration, and the manual pass on release) give us a coverage triangle that's hard to regress without noticing.

Why it matters for partners and customers

If you're an enterprise procurement team evaluating AccelaStudy AI for a school district, a corporate learning program, or a government workforce, accessibility is on your VPAT and on your RFP. We can show you the audit, the script, the spec, and the report; we can show you the CI gate; we can show you the manual screen-reader test plan; and we can show you the deployment commit history that closes the loop. WCAG 2.1 AA isn't a marketing claim for us. It's a property of the artifact we ship, enforced by the tooling we ship.

If you're an engineer at another company looking at a similar problem, the playbook is short: write the script first. Don't write the slide deck; don't make the report; don't even fix anything. Write the script that detects the regressions you can't tolerate. The script gives you a baseline number, the baseline tells you the size of the problem, and the size of the problem tells you whether to fix by hand, by codemod, or by focused review pass. Once the script is in place, every fix is cheap and every regression becomes a build break instead of a customer complaint.

That's the difference between accessibility as a checklist and accessibility as engineering. We chose engineering, because every learner means every learner.