Ludex — schema drift detection
When telemetry reaches Ludex, it is checked against approved schemas for your project and environment. Schema drift is the umbrella term for cases where events fail that deterministic validation: the contract you registered no longer matches what the game is sending, or no contract exists yet for a new event name.
Ludex does not silently coerce invalid payloads into “good” data. Instead, failures are quarantined, grouped into drift issues you can inspect in the dashboard or over the API, and driven through a governed path (suggestions, approval, replay) until traffic is healthy again.
What counts as drift
Typical causes include:
- No schema yet for an `event_name` your title is already emitting in production or test.
- Shape changes: extra or missing fields compared to the active schema, or required fields absent.
- Type mismatches: for example a string where the schema expects a number or a nested object.
Drift is detected on the deterministic validation path (not by guessing). Each failure is classified into a drift category and rolled up into an issue row keyed by organization, project, environment, event name, and category.
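As a rough illustration of what deterministic classification and issue keying can look like, the sketch below validates a payload against a toy schema. The schema format (field name mapped to a Python type), the function names `classify_drift` and `issue_key`, and the order of the checks are all assumptions made for this example; they are not Ludex's implementation.

```python
from typing import Any, Optional

# Hypothetical schema registry: event name -> {field name: expected Python type}.
# In this toy model every field is required and extra fields count as a mismatch.
SCHEMAS: dict[str, dict[str, type]] = {
    "combat_hit": {"attacker_id": str, "damage": int},
}


def classify_drift(event_name: str, payload: dict[str, Any]) -> Optional[str]:
    """Return a drift category for a failing payload, or None if it validates."""
    schema = SCHEMAS.get(event_name)
    if schema is None:
        return "missing_schema"
    if set(payload) != set(schema):
        return "field_shape"  # missing or extra fields
    for field, expected in schema.items():
        value = payload[field]
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        if isinstance(value, bool) and expected is not bool:
            return "field_type"
        if not isinstance(value, expected):
            return "field_type"
    return None


def issue_key(org: str, project: str, env: str, event_name: str, category: str) -> tuple[str, ...]:
    """Failures that share this key roll up into the same issue row."""
    return (org, project, env, event_name, category)
```

With this toy schema, a `combat_hit` payload whose `damage` is the string `"12"` classifies as `field_type`, and a payload for an unregistered event name classifies as `missing_schema`.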
Drift categories
These are the machine-readable category values you will see on issues and in API filters, along with what each one means for you:
| Category | What it means |
|---|---|
| `missing_schema` | There is no registered schema for this event name (or equivalent mapping) in this environment’s contract set. |
| `field_shape` | The payload’s structure does not match the schema, for example missing required fields, or extra fields your policy treats as a mismatch. |
| `field_type` | A field is present but its JSON type or shape does not match the schema (e.g. string vs number). |
| `semantic` | Reserved for richer semantic checks when enabled; treat as “non-structural contract violation” in tooling. |
| `env_divergence` | Signals differences across environments (e.g. staging vs production) for the same logical event. |
| `volume_spike` | Used when unusual volume patterns are associated with drift-like failure clusters. |
Filters on list endpoints accept the string values above (for example drift_category=field_type).
Status lifecycle
Each drift issue has a status that tells you whether you need to act, whether Ludex is processing a suggestion, or whether the issue is closed.
| Status | What it means for you |
|---|---|
| `open` | New or ongoing; failures are still landing against this issue. Review and prioritize. |
| `under_review` | Someone is actively triaging (operator or your team, depending on process). |
| `suggestion_ready` | A deterministic or assisted schema suggestion exists and can be reviewed. |
| `awaiting_approval` | A suggestion is waiting for explicit approval before it becomes active. |
| `approved` | The governing schema change is approved; the system can move toward replay of quarantined events. |
| `replaying` | Affected events are being replayed through validation using the new contract. |
| `resolved` | The issue is closed successfully from a recovery perspective (subject to your org’s definitions). |
| `partially_resolved` | Some traffic is healthy; residual failures or backlog may remain. |
| `terminal` | Closed in an end state that is not “fully recovered” (for example abandoned or superseded by policy). |
Issues that are resolved or terminal are treated as closed for backlog-style metrics (for example replayable backlog counts).
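If you mirror backlog-style metrics in your own tooling, the open/closed split can be captured in a small helper. This is a minimal sketch: the `replayable_count` field name on issue objects is an assumption, so check the actual issue payload before relying on it.

```python
# Statuses treated as closed for backlog-style metrics.
CLOSED_STATUSES = frozenset({"resolved", "terminal"})


def is_closed(status: str) -> bool:
    """True for statuses that no longer count toward the backlog."""
    return status in CLOSED_STATUSES


def replayable_backlog(issues: list[dict]) -> int:
    """Sum replayable counts over issues that are not closed.

    `replayable_count` is an assumed field name on each issue object;
    missing or null values are counted as zero.
    """
    return sum(
        int(issue.get("replayable_count") or 0)
        for issue in issues
        if not is_closed(issue.get("status", ""))
    )
```

Note that `partially_resolved` is deliberately not in the closed set, so its residual backlog still counts.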
Severity
Issues carry a severity string used for sorting and alerting:
info → low → medium → high → critical
Severity reflects how Ludex ranks the issue (for example affected event counts, environment, and policy). Use it to drive paging rules in your own monitoring: treat high and critical as worthy of fast human review unless you have agreed SLAs otherwise.
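One way to turn the severity ordering into a paging rule is to rank the strings and compare against a threshold. The helper names and the choice to rank unknown values below `info` are assumptions for this sketch; you may prefer to page on unknown values instead.

```python
# Severity strings in ascending order.
SEVERITY_ORDER = ("info", "low", "medium", "high", "critical")
_RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}


def severity_rank(severity: str) -> int:
    """Rank a severity string; unknown values sort below `info`."""
    return _RANK.get(severity, -1)


def should_page(severity: str, threshold: str = "high") -> bool:
    """True when the severity is at or above the paging threshold."""
    return severity_rank(severity) >= _RANK[threshold]
```

Sorting issues with `key=lambda row: severity_rank(row.get("severity", ""))` then orders them from least to most severe.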
Dashboard API (credential-scoped)
These endpoints are read-only. Authenticate with the same Bearer API key style as ingestion, but the credential must include the read:dashboard scope. Organization, default project, and default environment come from the key; query parameters let you narrow or widen project and environment where the API allows it.
Use the base URL Ludex gives you for the admin / dashboard host (the path prefix is always /v1/dashboard).
GET /v1/dashboard/drift-overview
Returns aggregate metrics for the scoped organization (and optional project / environment filter), including:
- Counts of open issues and new issues in the last 24 hours and 7 days
- Quarantined event volume attributed to open drift (`quarantined_by_drift`)
- Distinct environments with open drift
- Replayable backlog (sum of replayable counts for issues not in `resolved`/`terminal`)
- Top categories and top source event names by affected volume
Query parameters
| Parameter | Description |
|---|---|
| `project_id` | Defaults to `all`; any other value must match the credential’s project. |
| `environment_id` | Optional; when set, must match the credential’s environment. |
Example (curl)
curl -sS -G "https://YOUR_LUDEX_HOST/v1/dashboard/drift-overview" \
  -H "Authorization: Bearer YOUR_DASHBOARD_API_KEY" \
  --data-urlencode "project_id=all"
Example shape (illustrative)
{
  "status": "ok",
  "message": "Drift overview (dashboard)",
  "data": {
    "organization_id": "org_…",
    "project_id": null,
    "open_issues_count": 3,
    "new_issues_24h": 1,
    "new_issues_7d": 2,
    "quarantined_by_drift": 12840,
    "environments_with_drift": 2,
    "replayable_backlog": 9000,
    "top_categories": [
      { "drift_category": "field_shape", "issue_count": 2 }
    ],
    "top_event_types": [
      { "source_event_name": "combat_hit", "affected_count": 8000 }
    ]
  }
}
Exact keys may evolve slightly; always key off status, message, and data.
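Because keys inside `data` may change, client code can unwrap the envelope first and read counters defensively. The sketch below assumes that any `status` other than `ok` signals a failure (the failure envelope is not specified here); `DashboardError`, `unwrap`, and `overview_counters` are hypothetical names.

```python
from typing import Any


class DashboardError(RuntimeError):
    """Raised when the response envelope does not report success."""


def unwrap(body: dict[str, Any]) -> Any:
    """Return `data` from a response envelope, or raise if status is not `ok`."""
    if body.get("status") != "ok":
        # Assumption: anything other than "ok" is treated as a failure.
        raise DashboardError(body.get("message") or "unexpected dashboard response")
    return body.get("data")


def overview_counters(data: dict[str, Any]) -> dict[str, int]:
    """Read counters with a default of 0 so a missing key does not crash."""
    keys = ("open_issues_count", "quarantined_by_drift", "replayable_backlog")
    return {key: int(data.get(key) or 0) for key in keys}
```

Defaulting a missing counter to 0 keeps polling scripts alive across small schema changes, at the cost of hiding a renamed key; log missing keys if that trade-off matters to you.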
GET /v1/dashboard/drift-issues
Paginated list of drift issues with optional filters.
Query parameters
| Parameter | Description |
|---|---|
| `project_id` | Same semantics as overview. |
| `environment_id` | Optional; defaults to credential environment when omitted. |
| `drift_category` | Filter by category (see table above). |
| `severity` | Filter by severity string. |
| `status` | Filter by issue status. |
| `source_event_name` | Filter by raw / source event name. |
| `page` | Page number (default 1). |
| `page_size` | Page size (default 50, max 200). |
| `sort_by` | One of: `id`, `last_seen_at`, `first_seen_at`, `affected_count`, `severity`, `status`, `source_event_name`, `created_at`. |
| `sort_order` | `asc` or `desc`. |
Response
- On success: `status` is `ok`, `data` is an array of issue objects, and `total_count` is the total number of rows matching the filter (for pagination).
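To read every matching issue rather than a single page, a client can keep requesting pages until it has seen `total_count` rows. The following is a minimal sketch, not an official client: `iter_issues` and the `fetch_page` callable are hypothetical, and the HTTP call is injected so the paging logic stays independent of any particular HTTP library.

```python
from typing import Any, Callable, Iterator

# A fetcher takes query parameters and returns the parsed JSON body.
Fetch = Callable[[dict[str, Any]], dict[str, Any]]


def iter_issues(
    fetch_page: Fetch, filters: dict[str, Any], page_size: int = 50
) -> Iterator[dict[str, Any]]:
    """Yield every issue matching `filters`, one page at a time."""
    page_size = max(1, min(page_size, 200))  # documented maximum is 200
    page, seen = 1, 0
    while True:
        body = fetch_page({**filters, "page": page, "page_size": page_size})
        rows = body.get("data") or []
        yield from rows
        seen += len(rows)
        # Stop on an empty page or once total_count rows have been read.
        # If total_count is missing, this stops after the first page.
        if not rows or seen >= int(body.get("total_count") or 0):
            return
        page += 1
```

With an `httpx.Client`, `fetch_page` could be a small function that calls `client.get("/v1/dashboard/drift-issues", params=params)` and returns `.json()`.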
Example (curl)
curl -sS -G "https://YOUR_LUDEX_HOST/v1/dashboard/drift-issues" \
  -H "Authorization: Bearer YOUR_DASHBOARD_API_KEY" \
  --data-urlencode "project_id=all" \
  --data-urlencode "drift_category=missing_schema" \
  --data-urlencode "status=open" \
  --data-urlencode "page=1" \
  --data-urlencode "page_size=50"
How to monitor drift
- Poll the overview on an interval (for example every 5–15 minutes) and alert when `open_issues_count`, `quarantined_by_drift`, or `replayable_backlog` crosses your thresholds.
- Drill into `drift-issues` when an alert fires; use `source_event_name` and `drift_category` to route tickets to the right game team.
- Track status transitions: move from `open` → `suggestion_ready` → `approved` → `replaying` → `resolved` with your governance process.
- Wire CI or scripts using the same Bearer + `read:dashboard` pattern as the rest of the dashboard API.
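The threshold check in the first step can be sketched as a pure function over the overview `data` object. The threshold values below are placeholders, not recommendations, and `breached` is a hypothetical helper name.

```python
from typing import Any, Mapping

# Example thresholds only; pick values that fit your traffic.
THRESHOLDS = {
    "open_issues_count": 5,
    "quarantined_by_drift": 10_000,
    "replayable_backlog": 50_000,
}


def breached(
    data: Mapping[str, Any], thresholds: Mapping[str, int] = THRESHOLDS
) -> dict[str, int]:
    """Return the metrics whose value is strictly above its threshold."""
    result: dict[str, int] = {}
    for key, limit in thresholds.items():
        value = int(data.get(key) or 0)  # a missing metric counts as 0
        if value > limit:
            result[key] = value
    return result
```

A poller would call `breached(data)` after each overview request and raise an alert whenever the returned dict is non-empty.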
Python (httpx) example
import os

import httpx

BASE = os.environ["LUDEX_DASHBOARD_BASE_URL"].rstrip("/")  # host that serves /v1/dashboard
KEY = os.environ["LUDEX_DASHBOARD_API_KEY"]  # credential with read:dashboard
headers = {"Authorization": f"Bearer {KEY}"}

with httpx.Client(base_url=BASE, headers=headers, timeout=30.0) as client:
    # Aggregate drift metrics across all projects visible to the credential.
    overview = client.get("/v1/dashboard/drift-overview", params={"project_id": "all"})
    overview.raise_for_status()
    body = overview.json()
    assert body.get("status") == "ok"
    data = body["data"]
    print("open_issues_count:", data.get("open_issues_count"))
    print("replayable_backlog:", data.get("replayable_backlog"))

    # First page of open issues (25 per page).
    issues = client.get(
        "/v1/dashboard/drift-issues",
        params={"project_id": "all", "status": "open", "page": 1, "page_size": 25},
    )
    issues.raise_for_status()
    ib = issues.json()
    print("total_count:", ib.get("total_count"))
    for row in ib.get("data") or []:
        print(row.get("source_event_name"), row.get("drift_category"), row.get("severity"))
What you should do when you see drift
- Confirm scope: correct project and environment on the credential and in query params.
- Open the issue (UI or API) and read category + linked failure context.
- If a schema suggestion is ready, run your approval workflow (in-product admin or operator-assisted).
- After approval and replay, affected events should flow back into trusted telemetry; issues typically move to `resolved` when the backlog is cleared according to Ludex rules.
- If severity is high, treat as a release or compatibility risk: coordinate a client patch or schema update with your ship plan.
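For teams that triage from scripts, the checklist above can be approximated as a mapping from status and severity to a suggested next step. This is an illustrative sketch only: `next_step` is a hypothetical helper, the wording of each step is ours, and treating `critical` the same as `high` is an assumption.

```python
def next_step(status: str, severity: str) -> str:
    """Suggest a next step for an issue; the mapping is illustrative."""
    if status in {"resolved", "terminal"}:
        return "none: issue is closed"
    if status in {"suggestion_ready", "awaiting_approval"}:
        step = "review the schema suggestion and run your approval workflow"
    elif status in {"approved", "replaying"}:
        step = "wait for replay to finish, then confirm the backlog is cleared"
    else:  # open, under_review, partially_resolved, or an unknown status
        step = "confirm scope, then inspect the category and failure context"
    # Assumption: critical is escalated the same way as high.
    if severity in {"high", "critical"}:
        step += "; also coordinate a client patch or schema update with your ship plan"
    return step
```

Unknown statuses fall through to the most conservative step (confirm scope and inspect), so a new status value does not silently skip triage.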
For ingestion errors at the HTTP edge (auth, rate limits, malformed JSON), see error-catalog.md. For sending valid events, see quickstart.md.