Accuracy & Credibility
VertaaUX audits are designed to be verifiable. We don't just produce findings—we publish what makes them trustworthy: which parts are deterministic vs. LLM-assisted, how confidence is assigned, and how each finding links back to evidence you can review.
What Accuracy Means Here
“Accuracy” is not a single score. We publish multiple signals so teams (and investors) can judge credibility from different angles: correctness, repeatability, and traceability.
Validation & Methodology
This is how a VertaaUX audit is produced and how to interpret the results.
Most detection is deterministic (DOM/CSS/interaction simulation). Optional enhancement layers may use ML/LLMs for summarization, prioritization, or remediation suggestions.
- Deterministic: snapshot parsing, semantic structure, keyboard operability, scoring.
- LLM-assisted (when enabled): remediation suggestions and other higher-level recommendations.
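The split above can be sketched as a simple lookup. This is illustrative only: the layer names and the `layer_kind` helper are assumptions for the sketch, not a published VertaaUX API.

```python
# Sketch: how a finding's originating layer maps to how its output
# should be interpreted. Layer names here are illustrative.

DETERMINISTIC_LAYERS = {"snapshot_parsing", "semantic_structure",
                        "keyboard_operability", "scoring"}
LLM_ASSISTED_LAYERS = {"remediation_suggestions", "prioritization",
                       "summarization"}

def layer_kind(layer: str) -> str:
    """Return how outputs from a given layer should be treated."""
    if layer in DETERMINISTIC_LAYERS:
        return "deterministic"   # repeatable; no confidence score needed
    if layer in LLM_ASSISTED_LAYERS:
        return "llm_assisted"    # probabilistic; carries a confidence score
    return "unknown"             # treat as needing human review

print(layer_kind("scoring"))                  # -> deterministic
print(layer_kind("remediation_suggestions"))  # -> llm_assisted
```

Deterministic outputs are repeatable by construction; only probabilistic outputs need a confidence score, which is why the next section attaches `confidence` to LLM/ML-assisted findings.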
Findings are designed to be reviewable. When available, each finding includes:
- rule_id and ruleset_version for traceability and repeatability.
- confidence for probabilistic outputs (LLM/ML-assisted).
- selector and/or element reference for reproduction.
- evidence such as a DOM snippet, WCAG references, and links to the audited page context.
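A quick way to check reviewability is to list which of these fields a finding is missing. The field groupings and the `missing_fields` helper below are a sketch, not a published schema; a finding with missing fields is a candidate for human review rather than rejection.

```python
# Sketch: which reviewability fields does a finding carry?
# Field names mirror the list above; the grouping is an assumption.

TRACEABILITY_FIELDS = ["rule_id", "ruleset_version"]
REPRODUCTION_FIELDS = ["selector", "dom_snippet"]

def missing_fields(finding: dict) -> list[str]:
    """Return the reviewability fields this finding lacks."""
    return [f for f in TRACEABILITY_FIELDS + REPRODUCTION_FIELDS
            if not finding.get(f)]

finding = {"rule_id": "semantic.heading_hierarchy",
           "selector": "main h3:nth-of-type(2)"}
print(missing_fields(finding))  # -> ['ruleset_version', 'dom_snippet']
```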
Expected false positives / false negatives
Common false positives
- Intentional design choices (e.g., visually subtle focus indicators) that still satisfy internal standards.
- Highly customized components where semantics are correct but hard to infer from DOM alone.
- Dynamic UI states that differ between initial load and real-user interaction flows.
Common false negatives
- Content behind authentication, feature flags, or geo blocks.
- Issues that only appear after long sessions, complex multi-step flows, or user-generated data.
- Late-loading elements that appear after network-idle snapshots.
Example: evidence-backed finding schema
This example shows the shape we use for traceability. Not every finding includes every field yet; missing fields are treated as “needs human review” rather than forced certainty.
{
  "category": "semantic",
  "severity": "warning",
  "rule_id": "semantic.heading_hierarchy",
  "ruleset_version": "1.0.0",
  "confidence": 0.92,
  "selector": "main h3:nth-of-type(2)",
  "dom_snippet": "<h3>Pricing</h3>",
  "wcag_reference": "WCAG 2.2 — 1.3.1",
  "evidence": {
    "why": "Heading level skips from H1 to H3, which can confuse assistive tech.",
    "how_to_verify": "Inspect the DOM and confirm heading order matches visual hierarchy."
  }
}

You can export raw JSON from the audit results view (Export JSON) and validate selectors and rules against the page DOM.
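Once exported, a findings file can be triaged with a few lines of standard-library code. This is a minimal sketch: the inline sample data and the 0.8 review threshold are assumptions, and a missing `confidence` field is treated as "needs human review" in line with the note above.

```python
import json

# Sketch: load exported findings and flag items for human review.
# A missing confidence means the finding was not scored, so it is
# routed to review rather than treated as certain.

raw = """[
  {"rule_id": "semantic.heading_hierarchy",
   "confidence": 0.92,
   "selector": "main h3:nth-of-type(2)"},
  {"rule_id": "semantic.landmarks",
   "selector": "body"}
]"""

findings = json.loads(raw)

# Assumed threshold: anything unscored or below 0.8 goes to a human.
needs_review = [f["rule_id"] for f in findings
                if f.get("confidence") is None or f["confidence"] < 0.8]
print(needs_review)  # -> ['semantic.landmarks']
```

From here, each retained selector can be checked against the live page DOM to confirm the finding reproduces.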
Trust Markers
Credibility is also operational. Beyond methodology, we provide signals that the system is run and maintained responsibly.