
Accuracy Benchmarks

Published precision and recall data for VertaaUX audit findings, measured against expert-labeled ground truth.

Looking for our methodology? See our Accuracy page.

Accessibility (WCAG)

Precision: 92.4%
Recall: 87.5%
F1 Score: 89.8%

Based on 4,920 findings · Updated 2026-01-15

UX Heuristics

Precision: 84.3%
Recall: 79.4%
F1 Score: 81.6%

Based on 1,870 findings · Updated 2026-01-15

Performance UX

Precision: 92.2%
Recall: 89.3%
F1 Score: 90.5%

Based on 2,800 findings · Updated 2026-01-15

Accessibility (WCAG)

Precision and recall for WCAG-related audit findings, measured against expert-labeled ground truth datasets. These benchmarks cover core accessibility checks including contrast, semantics, and interaction patterns.

Each benchmark entry is evaluated using independent expert review. See individual entry tooltips for methodology details.

UX Heuristics

Precision and recall for UX heuristic violations detected by the audit engine. These checks cover layout, typography, visual hierarchy, and interaction state quality assessed against expert UX review.


Performance UX

Precision and recall for performance-related UX findings including layout shift, paint timing, and interaction responsiveness. Benchmarked against Chrome DevTools and WebPageTest ground truth data.


How accurate is automated UX testing?

Automated UX testing accuracy varies by category. VertaaUX publishes precision, recall, and F1 scores for every audit dimension, measured against expert-labeled ground truth. Accessibility findings achieve the highest precision because rules map directly to WCAG success criteria. UX heuristic findings use a confidence-scored model with published thresholds.

What are UX audit precision benchmarks?

UX audit precision benchmarks measure the percentage of reported issues that are true positives. VertaaUX publishes precision, recall, and F1 scores per category — accessibility, usability, performance, clarity, and more. Higher precision means fewer false positives; higher recall means fewer missed issues. Both are measured against expert-reviewed ground truth datasets.

How is UX audit accuracy measured?

UX audit accuracy is measured by comparing automated findings against expert-labeled ground truth. For each audit category, precision (correct findings / total reported), recall (found issues / total actual issues), and F1 score (harmonic mean) are calculated. VertaaUX publishes these metrics openly so users can evaluate detection reliability before integrating into their workflow.
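The three metrics above follow directly from counts of true positives (TP, correct findings), false positives (FP, incorrect findings), and false negatives (FN, missed issues). A minimal sketch of the arithmetic, using hypothetical counts rather than VertaaUX's published data:

```python
def precision(tp: int, fp: int) -> float:
    # Fraction of reported findings that are correct: TP / (TP + FP).
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # Fraction of actual issues that were found: TP / (TP + FN).
    return tp / (tp + fn)

def f1(p: float, r: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

# Hypothetical example: 90 correct findings, 10 false alarms, 15 missed issues.
p = precision(90, 10)   # 0.90  — 90% of reported findings are real
r = recall(90, 15)      # ~0.857 — about 86% of actual issues were found
score = f1(p, r)        # ~0.878 — balances the two
```

Note the trade-off the harmonic mean captures: a detector that reports everything maximizes recall but collapses precision, and F1 penalizes either extreme.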

Transparent accuracy, verifiable results

We publish our accuracy data because trust is earned through transparency. See how we measure, or try an audit yourself.