Published precision and recall data for VertaaUX audit findings, measured against expert-labeled ground truth.
Looking for our methodology? See our Accuracy page.
Accessibility (WCAG)
Precision: 92.4%
Recall: 87.5%
F1 Score: 89.8%
Based on 4,920 findings. Updated 2026-01-15.
UX Heuristics
Precision: 84.3%
Recall: 79.4%
F1 Score: 81.6%
Based on 1,870 findings. Updated 2026-01-15.
Performance UX
Precision: 92.2%
Recall: 89.3%
F1 Score: 90.5%
Based on 2,800 findings. Updated 2026-01-15.
Accessibility (WCAG)
Precision and recall for WCAG-related audit findings, measured against expert-labeled ground truth datasets. These benchmarks cover core accessibility checks including contrast, semantics, and interaction patterns.
Each benchmark entry is evaluated using independent expert review. See individual entry tooltips for methodology details.
UX Heuristics
Precision and recall for UX heuristic violations detected by the audit engine. These checks cover layout, typography, visual hierarchy, and interaction-state quality, and are assessed against expert UX review.
Performance UX
Precision and recall for performance-related UX findings including layout shift, paint timing, and interaction responsiveness. Benchmarked against Chrome DevTools and WebPageTest ground truth data.
How accurate is automated UX testing?
Automated UX testing accuracy varies by category. VertaaUX publishes precision, recall, and F1 scores for every audit dimension, measured against expert-labeled ground truth. Accessibility findings achieve the highest precision because rules map directly to WCAG success criteria. UX heuristic findings use a confidence-scored model with published thresholds.
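To illustrate how a confidence threshold trades precision against recall, here is a minimal sketch. The `Finding` type, the `precision_recall` helper, the sample findings, and the threshold values are all hypothetical and are not taken from the VertaaUX engine or its published thresholds. The sketch also assumes every real issue appears in the list with some confidence score.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    confidence: float  # model confidence in [0, 1] (hypothetical field)
    is_real: bool      # expert label: True if the issue actually exists


def precision_recall(findings: list[Finding], threshold: float) -> tuple[float, float]:
    """Report only findings at or above the threshold, then score them."""
    total_real = sum(f.is_real for f in findings)
    kept = [f for f in findings if f.confidence >= threshold]
    true_positives = sum(f.is_real for f in kept)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_real if total_real else 0.0
    return precision, recall


# Hypothetical labeled findings, for illustration only.
findings = [
    Finding(0.95, True), Finding(0.90, True), Finding(0.80, False),
    Finding(0.70, True), Finding(0.60, False), Finding(0.40, True),
]

for threshold in (0.5, 0.85):
    p, r = precision_recall(findings, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
# threshold=0.50 precision=0.60 recall=0.75
# threshold=0.85 precision=1.00 recall=0.50
```

In this toy data, raising the threshold removes both false positives but also drops two real issues, so precision rises while recall falls.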
What are UX audit precision benchmarks?
UX audit precision benchmarks measure the percentage of reported issues that are true positives. VertaaUX publishes precision, recall, and F1 scores per category, including accessibility, usability, performance, and clarity. Higher precision means fewer false positives; higher recall means fewer missed issues. Both are measured against expert-reviewed ground truth datasets.
How is UX audit accuracy measured?
UX audit accuracy is measured by comparing automated findings against expert-labeled ground truth. For each audit category, three metrics are calculated: precision (correct findings / total reported findings), recall (correctly detected issues / total actual issues), and F1 score (the harmonic mean of precision and recall). VertaaUX publishes these metrics openly so users can evaluate detection reliability before integrating audits into their workflow.
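The three metrics can be sketched in a few lines. This is an illustrative calculation under simple assumptions, not the VertaaUX scoring pipeline: findings are matched to ground truth by exact ID, and the `score` function, the `Metrics` type, and the finding IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float


def score(reported: set[str], ground_truth: set[str]) -> Metrics:
    """Compare reported finding IDs against expert-labeled issue IDs."""
    true_positives = len(reported & ground_truth)
    # Precision: correct findings / total reported findings.
    precision = true_positives / len(reported) if reported else 0.0
    # Recall: correctly detected issues / total actual issues.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(precision, recall, f1)


# Hypothetical finding IDs, for illustration only.
reported = {"contrast-01", "label-02", "focus-03", "alt-04"}
ground_truth = {"contrast-01", "label-02", "focus-03", "heading-05", "aria-06"}

m = score(reported, ground_truth)
print(f"precision={m.precision:.3f} recall={m.recall:.3f} f1={m.f1:.3f}")
# precision=0.750 recall=0.600 f1=0.667
```

Here three of the four reported findings are correct (precision 0.75) and three of the five actual issues are found (recall 0.60). A real evaluation would likely need fuzzier matching between a reported finding and a labeled issue than exact ID equality.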
Transparent accuracy, verifiable results
We publish our accuracy data because trust is earned through transparency. See how we measure, or try an audit yourself.