What you get from a BayesIQ audit
Real artifacts from a real audit — not a PDF of recommendations. The Audit Kit produces scored findings, column-level profiles, data contracts, metric specs, a deployable dbt project, and interactive dashboards.
Pipeline artifacts
Every audit produces these files. They land in your repo or shared drive — no proprietary portal required.
- `audit_report.md`: Scored findings with severity, root cause, evidence, and fix recommendations. Every issue is tied to a specific event, column, or query.
- `dataset_profile.json`: Column-level profiling for every table, covering data types, null rates, cardinality, top values, and distribution summaries.
- `quality_checks.json`: Machine-readable findings for integration into CI pipelines, alerting systems, or internal dashboards.
- `ASSUMPTIONS.md`: Data contracts documenting schema assumptions, quality expectations, temporal patterns, and entity relationships. Your team signs off before we build.
- `METRICS.md`: Metric definitions with exact formulas, source events, dimensions, granularity, and validation rules.
- dbt project: Complete project with staging models, mart models, schema tests, and source definitions. Ready to deploy to your warehouse.
- Streamlit dashboard: Interactive app with sidebar filters, time series charts, dimension breakdowns, and a data quality summary. Usable from day one.
- `canonicalization_mapping.json`: Naming inconsistencies across platforms and pipelines mapped to canonical forms. Feed it into your dbt project or ETL layer.
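Because `quality_checks.json` is machine-readable, it can be wired straight into a CI gate. A minimal sketch, assuming a findings list with `id`, `severity`, and `summary` fields; the inline JSON and field names are illustrative, not the published schema:

```python
import json

# Hypothetical shape for quality_checks.json: a list of findings with a
# "severity" field. The real schema may differ.
findings = json.loads("""
[
  {"id": "F-001", "severity": "critical", "summary": "checkout_completed fires on payment attempt"},
  {"id": "F-002", "severity": "medium",   "summary": "device_type values inconsistent"}
]
""")

def blocking_findings(findings, fail_on=("critical", "high")):
    """Return the findings severe enough to fail a CI run."""
    return [f for f in findings if f["severity"] in fail_on]

for f in blocking_findings(findings):
    print(f"[{f['severity'].upper()}] {f['id']}: {f['summary']}")
```

A CI job would exit nonzero when the list is non-empty, blocking the pipeline until critical findings are resolved.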
Example findings from audit_report.md
Anonymized excerpt from an Audit Kit run on a B2B SaaS product (~50M events/month). Finding IDs, event names, and property names have been changed.
**`checkout_completed` fires on payment attempt, not payment confirmation — 23% funnel inflation.**
**Root Cause:** Client-side event triggered before async confirmation callback resolves.
**Recommended Fix:** Move event dispatch into confirmation callback; backfill last 90 days using server-side order records.
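The backfill half of that fix amounts to treating server-side order records as ground truth and dropping client events that never confirmed. A sketch with invented identifiers:

```python
# Client-side events include payment attempts that never confirmed.
client_events = [
    {"order_id": "o1", "event": "checkout_completed"},
    {"order_id": "o2", "event": "checkout_completed"},  # attempt only; payment failed
    {"order_id": "o3", "event": "checkout_completed"},
]
# Server-side order records are the source of truth for confirmation.
confirmed_orders = {"o1", "o3"}

# The backfilled history keeps only events backed by a confirmed order.
backfilled = [e for e in client_events if e["order_id"] in confirmed_orders]

inflation = (len(client_events) - len(backfilled)) / len(backfilled)
print(f"funnel inflation in this sample: {inflation:.0%}")
```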
**`user_id` null in 18% of mobile web `page_view` events.**
**Root Cause:** Anonymous session handling does not wait for identity resolution before firing the event.
**Recommended Fix:** Delay event dispatch by 300 ms post-load or use a queue that flushes after identity resolves.
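The queue approach can be sketched as a small buffer that holds events until identity resolves, then flushes with `user_id` attached. This is an illustrative pattern, not a specific SDK's API:

```python
class EventQueue:
    """Buffer events fired before identity resolution; flush on identify."""

    def __init__(self):
        self.user_id = None
        self.pending = []
        self.sent = []

    def track(self, event, properties=None):
        record = {"event": event, "properties": properties or {}}
        if self.user_id is None:
            self.pending.append(record)   # hold until identity resolves
        else:
            self._send(record)

    def identify(self, user_id):
        self.user_id = user_id
        for record in self.pending:       # flush the buffer in order
            self._send(record)
        self.pending.clear()

    def _send(self, record):
        record["user_id"] = self.user_id
        self.sent.append(record)          # stand-in for a network call

q = EventQueue()
q.track("page_view", {"path": "/pricing"})  # fires before identity is known
q.identify("u_123")                         # resolves identity; queue flushes
```

Events tracked before `identify` now reach the pipeline with a populated `user_id` instead of a null.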
**`revenue_daily` excludes late refunds processed after midnight UTC. Net revenue overstated by ~4.2% month-over-month.**
**Root Cause:** JOIN condition uses `transaction_date` instead of `event_date` for the refund table, silently dropping late refunds.
**Recommended Fix:** Update JOIN key to `refund_issued_date`; re-run historical aggregation for the trailing 12 months.
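The effect of the wrong date key can be reproduced with a single made-up refund row, using plain Python as a stand-in for the warehouse SQL:

```python
from collections import defaultdict
from datetime import date

# A refund transacted on Mar 1 but processed just after midnight UTC,
# so it was issued on Mar 2. Rows and field names are illustrative.
refunds = [
    {"transaction_date": date(2024, 3, 1),
     "refund_issued_date": date(2024, 3, 2),
     "amount": 20.0},
]

def refunds_by_day(refunds, date_key):
    """Aggregate refund amounts per day under a chosen date key."""
    totals = defaultdict(float)
    for r in refunds:
        totals[r[date_key]] += r["amount"]
    return dict(totals)

# Keyed by transaction_date, the refund is attributed to Mar 1, a day
# whose revenue_daily row was finalized before the refund existed, so an
# incremental rebuild never picks it up.
print(refunds_by_day(refunds, "transaction_date"))
# Keyed by refund_issued_date, it lands on Mar 2 and is captured.
print(refunds_by_day(refunds, "refund_issued_date"))
```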
**`activation_rate` query doesn't match current definition — stale WHERE clause counts any `feature_used` event instead of three distinct features within 7 days.**
**Root Cause:** Metric query was written before the activation definition was finalized and was never updated.
**Recommended Fix:** Rewrite metric query to match current definition; add a test that checks the query against the spec document.
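The current definition is small enough to encode and test directly, which is essentially what the recommended spec check does. The event shape below is assumed:

```python
from datetime import datetime, timedelta

def is_activated(signup_at, feature_events, window_days=7, required=3):
    """Activation per the current definition: at least `required` distinct
    features used within `window_days` of signup."""
    cutoff = signup_at + timedelta(days=window_days)
    distinct = {e["feature"] for e in feature_events
                if signup_at <= e["timestamp"] <= cutoff}
    return len(distinct) >= required

signup = datetime(2024, 3, 1)
events = [
    {"feature": "export", "timestamp": datetime(2024, 3, 2)},
    {"feature": "export", "timestamp": datetime(2024, 3, 3)},  # repeat use, not distinct
    {"feature": "share",  "timestamp": datetime(2024, 3, 4)},
    {"feature": "invite", "timestamp": datetime(2024, 3, 5)},
]
# The stale query would count any feature_used event (4 here); the current
# definition requires 3 distinct features, which this user meets.
print(is_activated(signup, events))  # True
```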
**`experiment_viewed` deduplicates by session instead of timestamp. Impression counts understated.**
**Root Cause:** Deduplication logic uses session ID instead of a (session_id, timestamp) composite key.
**Recommended Fix:** Update deduplication key; note that historical impression data cannot be corrected.
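The composite-key fix is a one-line change to the dedup key. A minimal sketch with invented rows:

```python
def dedupe(events, key_fields):
    """Keep the first event seen for each unique key tuple."""
    seen, kept = set(), []
    for e in events:
        key = tuple(e[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(e)
    return kept

impressions = [
    {"session_id": "s1", "timestamp": 100, "event": "experiment_viewed"},
    {"session_id": "s1", "timestamp": 200, "event": "experiment_viewed"},  # real second view
]
# Session-only key collapses two real impressions into one (the bug).
print(len(dedupe(impressions, ["session_id"])))               # 1
# Composite key keeps both, deduping only true duplicates.
print(len(dedupe(impressions, ["session_id", "timestamp"])))  # 2
```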
**`device_type` inconsistent across platforms — iOS sends `"iPhone"`, Android sends `"ios"`, web sends `"iOS"`.**
**Root Cause:** Inconsistent client library versions across platforms.
**Recommended Fix:** Standardize on enumerated values; add schema validation rule to catch raw user-agent strings.
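A standardize-plus-validate step might look like the following; the enum and mapping are hypothetical stand-ins for the audit's `canonicalization_mapping.json`:

```python
DEVICE_TYPE_ENUM = {"ios", "android", "web"}

# Hypothetical canonicalization map; the real one comes from the audit.
CANONICAL = {
    "iphone": "ios",
    "ios": "ios",
    "android": "android",
    "web": "web",
}

def normalize_device_type(raw):
    """Map a raw device_type value onto the enum, rejecting anything
    unmapped (e.g. a raw user-agent string slipping through)."""
    canonical = CANONICAL.get(raw.strip().lower())
    if canonical not in DEVICE_TYPE_ENUM:
        raise ValueError(f"unmapped device_type: {raw!r}")
    return canonical

print(normalize_device_type("iPhone"))  # ios
print(normalize_device_type("iOS"))     # ios
```

Raising on unmapped values, rather than passing them through, is what turns a silent naming drift into a caught schema violation.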
Scoring rubric (0–100)
Every audit produces an overall health score. The score reflects the count, severity, and blast radius of confirmed issues.
| Score | Rating | What it means |
|---|---|---|
| 90–100 | Strong | Minor issues only. Data infrastructure is well-maintained and trustworthy. |
| 70–89 | Needs Work | Significant issues requiring attention. Key metrics may be directionally correct but unreliable for precise decisions. |
| 0–69 | At Risk | Critical issues affecting key metrics. Decisions based on this data are likely incorrect. |
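For intuition only, here is one way such a score could be computed. The weights and formula are invented for illustration; they are not the actual BayesIQ rubric:

```python
# Invented severity weights; not the real rubric.
SEVERITY_WEIGHT = {"critical": 15, "high": 8, "medium": 3, "low": 1}

def health_score(findings):
    """Deduct points per finding, scaled up by blast radius (the number
    of downstream metrics or reports the finding affects)."""
    penalty = sum(
        SEVERITY_WEIGHT[f["severity"]] * (1 + 0.1 * f["blast_radius"])
        for f in findings
    )
    return max(0, round(100 - penalty))

findings = [
    {"severity": "critical", "blast_radius": 4},  # e.g. a broken funnel event
    {"severity": "medium", "blast_radius": 1},
]
print(health_score(findings))  # 76, in the "Needs Work" band
```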
Severity definitions
Every finding is ranked by business impact and blast radius: the number of downstream metrics or reports the issue affects.
| Severity | Definition | Typical action |
|---|---|---|
| Critical | Metric is systematically wrong. Decisions made on this data are likely incorrect. | Fix before next reporting cycle. |
| High | Significant inaccuracy in a key metric. Risk of misleading product or business decisions. | Fix in 2–4 weeks. |
| Medium | Partial data loss or inconsistency. Metric is directionally correct but unreliable for precise decisions. | Schedule in next sprint. |
| Low | Minor discrepancy or edge-case gap. Negligible business impact at current scale. | Address opportunistically. |
Engagement timeline — 6 weeks
A full engagement runs 6 weeks from kickoff to validated dashboards. Diagnostic sprints deliver findings in 1 week.
Ingest + Automated Pipeline + Expert Review
Weeks 1–2: Architecture review, access setup, logging spec collection. Automated pipeline profiles every table and column, flags anomalies, and generates scored findings. Data scientists review results, eliminate false positives, and assess root causes.
Assumptions Sign-off + Metric Specification
Weeks 3–4: `ASSUMPTIONS.md` and `METRICS.md` delivered. Your team reviews data contracts and metric definitions — this is the alignment gate. Nothing gets built until both sides agree on what the data should look like.
dbt Build + Dashboards + Training
Weeks 5–6: Auto-generated dbt project with staging/mart models and schema tests. Interactive Streamlit dashboards built on validated metrics. Handoff session with your team covering the dbt project, dashboard usage, and ongoing monitoring.
See it on your data
Drop a CSV in the playground for instant profiling, or book a diagnostic sprint.
Get in Touch