Ludex — schema drift detection

When telemetry reaches Ludex, it is checked against approved schemas for your project and environment. Schema drift is the umbrella term for cases where events fail that deterministic validation: the contract you registered no longer matches what the game is sending, or no contract exists yet for a new event name.

Ludex does not silently coerce invalid payloads into “good” data. Instead, failures are quarantined, grouped into drift issues you can inspect in the dashboard or over the API, and driven through a governed path (suggestions, approval, replay) until traffic is healthy again.

What counts as drift

Typical causes include:

No schema yet for an event_name your title is already emitting in production or test.
Shape changes — extra or missing fields compared to the active schema, or required fields absent.
Type mismatches — for example a string where the schema expects a number or a nested object.

Drift is detected on the deterministic validation path (not by guessing). Each failure is classified into a drift category and rolled up into an issue row keyed by organization, project, environment, event name, and category.
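
To make "deterministic" concrete, the sketch below checks a payload against a simplified schema and rolls failures up under the issue key described above. The schema representation, the names `SCHEMAS`, `classify_drift`, and `roll_up`, and the choice to treat extra fields as a mismatch are assumptions made for illustration; they are not Ludex's internal format or algorithm.

```python
from collections import Counter
from typing import Any, Optional

# Hypothetical, simplified registry: event name -> expected field types and required fields.
SCHEMAS: dict[str, dict[str, Any]] = {
    "combat_hit": {
        "fields": {"damage": (int, float), "target_id": str},
        "required": {"damage", "target_id"},
    }
}


def classify_drift(event_name: str, payload: dict[str, Any]) -> Optional[str]:
    """Return a structural drift category, or None when the payload validates."""
    schema = SCHEMAS.get(event_name)
    if schema is None:
        return "missing_schema"

    fields = schema["fields"]
    missing = schema["required"] - set(payload)
    extra = set(payload) - set(fields)
    if missing or extra:  # assumption: this policy treats extra fields as a mismatch
        return "field_shape"

    for name, value in payload.items():
        if not isinstance(value, fields[name]):
            return "field_type"
    return None


def roll_up(org: str, project: str, env: str, events: list[tuple[str, dict]]) -> Counter:
    """Count failures per (organization, project, environment, event name, category)."""
    issues: Counter = Counter()
    for event_name, payload in events:
        category = classify_drift(event_name, payload)
        if category is not None:
            issues[(org, project, env, event_name, category)] += 1
    return issues
```

The same input always yields the same category, which is what makes the grouping into issue rows stable.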

Drift categories

These are the machine-readable category values you will see on issues and in API filters, with the meaning of each from a customer's point of view:

Category	What it means
`missing_schema`	There is no registered schema for this event name (or equivalent mapping) in this environment’s contract set.
`field_shape`	The payload’s structure does not match the schema — for example missing required fields, or extra fields your policy treats as a mismatch.
`field_type`	A field is present but its JSON type or shape does not match the schema (e.g. string vs number).
`semantic`	Reserved for richer semantic checks when enabled; treat as “non-structural contract violation” in tooling.
`env_divergence`	Signals differences across environments (e.g. staging vs production) for the same logical event.
`volume_spike`	Used when unusual volume patterns are associated with drift-like failure clusters.

Filters on list endpoints accept the string values above (for example drift_category=field_type).

Status lifecycle

Each drift issue carries a status that tells you whether you need to act, whether Ludex is processing a suggestion, or whether the issue is closed.

Status	What it means for you
`open`	New or ongoing; failures are still landing against this issue. Review and prioritize.
`under_review`	Someone is actively triaging (operator or your team, depending on process).
`suggestion_ready`	A deterministic or assisted schema suggestion exists and can be reviewed.
`awaiting_approval`	A suggestion is waiting for explicit approval before it becomes active.
`approved`	The governing schema change is approved; the system can move toward replay of quarantined events.
`replaying`	Affected events are being replayed through validation using the new contract.
`resolved`	The issue is closed successfully from a recovery perspective (subject to your org’s definitions).
`partially_resolved`	Some traffic is healthy; residual failures or backlog may remain.
`terminal`	Closed in an end state that is not “fully recovered” (for example abandoned or superseded by policy).

Issues that are resolved or terminal are treated as closed for backlog-style metrics (for example replayable backlog counts).
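
If you recompute backlog-style metrics on your side, the closed-status rule can be sketched as follows. The field name `replayable_count` on an issue object is an assumption for this example; check the actual issue payload before relying on it.

```python
from typing import Any, Iterable

# Statuses the text above treats as closed for backlog-style metrics.
CLOSED_STATUSES = frozenset({"resolved", "terminal"})


def is_closed(status: str) -> bool:
    """True when an issue no longer counts toward backlog metrics."""
    return status in CLOSED_STATUSES


def replayable_backlog(issues: Iterable[dict[str, Any]]) -> int:
    """Sum replayable counts over issues that are not closed.

    `replayable_count` is a hypothetical field name used for illustration.
    """
    return sum(
        int(issue.get("replayable_count") or 0)
        for issue in issues
        if not is_closed(issue.get("status", ""))
    )
```

Note that `partially_resolved` is not in the closed set, so its residual backlog still counts.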

Severity

Issues carry a severity string used for sorting and alerting:

info → low → medium → high → critical

Severity reflects how Ludex ranks the issue (for example affected event counts, environment, and policy). Use it to drive paging rules in your own monitoring: treat high and critical as worthy of fast human review unless you have agreed SLAs otherwise.
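
A paging rule built on that ordering might look like the sketch below. The helper names and the default threshold of `high` are illustrative choices, not part of the Ludex API.

```python
# Severity strings in ascending order, as listed above.
SEVERITY_ORDER = ("info", "low", "medium", "high", "critical")
_RANK = {name: index for index, name in enumerate(SEVERITY_ORDER)}


def severity_rank(severity: str) -> int:
    """Numeric rank for sorting; unknown strings sort below `info`."""
    return _RANK.get(severity, -1)


def should_page(severity: str, threshold: str = "high") -> bool:
    """True when the severity is at or above the paging threshold."""
    if threshold not in _RANK:
        raise ValueError(f"unknown threshold: {threshold!r}")
    return severity_rank(severity) >= _RANK[threshold]
```

For example, `sorted(issues, key=lambda i: severity_rank(i["severity"]), reverse=True)` puts the most severe issues first.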

Dashboard API (credential-scoped)

These endpoints are read-only. Authenticate with the same Bearer API key style as ingestion, but the credential must include the read:dashboard scope. Organization, default project, and default environment come from the key; query parameters let you narrow or widen project and environment where the API allows it.

Use the base URL Ludex gives you for the admin / dashboard host (the path prefix is always /v1/dashboard).

`GET /v1/dashboard/drift-overview`

Returns aggregate metrics for the scoped organization (and optional project / environment filter), including:

Counts of open issues and new issues in the last 24 hours and 7 days
Quarantined event volume attributed to open drift (quarantined_by_drift)
Distinct environments with open drift
Replayable backlog (sum of replayable counts for issues not in resolved / terminal)
Top categories and top source event names by affected volume

Query parameters

Parameter	Description
`project_id`	Defaults to `all`; any other value must match the credential’s project.
`environment_id`	Optional; when set, must match the credential’s environment.

Example (curl)

curl -sS -G "https://YOUR_LUDEX_HOST/v1/dashboard/drift-overview" \
  -H "Authorization: Bearer YOUR_DASHBOARD_API_KEY" \
  --data-urlencode "project_id=all"

Example shape (illustrative)

{
  "status": "ok",
  "message": "Drift overview (dashboard)",
  "data": {
    "organization_id": "org_…",
    "project_id": null,
    "open_issues_count": 3,
    "new_issues_24h": 1,
    "new_issues_7d": 2,
    "quarantined_by_drift": 12840,
    "environments_with_drift": 2,
    "replayable_backlog": 9000,
    "top_categories": [
      { "drift_category": "field_shape", "issue_count": 2 }
    ],
    "top_event_types": [
      { "source_event_name": "combat_hit", "affected_count": 8000 }
    ]
  }
}

The keys inside data may change slightly over time; rely on the top-level status, message, and data envelope, and read individual metrics defensively.

`GET /v1/dashboard/drift-issues`

Paginated list of drift issues with optional filters.

Query parameters

Parameter	Description
`project_id`	Same semantics as overview.
`environment_id`	Optional; defaults to credential environment when omitted.
`drift_category`	Filter by category (see table above).
`severity`	Filter by severity string.
`status`	Filter by issue status.
`source_event_name`	Filter by raw / source event name.
`page`	Page number (default `1`).
`page_size`	Page size (default `50`, max `200`).
`sort_by`	One of: `id`, `last_seen_at`, `first_seen_at`, `affected_count`, `severity`, `status`, `source_event_name`, `created_at`.
`sort_order`	`asc` or `desc`.
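
A client-side helper can catch invalid parameter combinations before a request is sent. The sketch below encodes only the limits listed in the table; the function name `build_issue_params` and the rule that `page` must be at least 1 are assumptions.

```python
from typing import Any, Optional

# Allowed sort_by values and page-size cap, taken from the parameter table.
SORT_FIELDS = frozenset({
    "id", "last_seen_at", "first_seen_at", "affected_count",
    "severity", "status", "source_event_name", "created_at",
})
MAX_PAGE_SIZE = 200


def build_issue_params(
    *,
    project_id: str = "all",
    environment_id: Optional[str] = None,
    drift_category: Optional[str] = None,
    severity: Optional[str] = None,
    status: Optional[str] = None,
    source_event_name: Optional[str] = None,
    page: int = 1,
    page_size: int = 50,
    sort_by: Optional[str] = None,
    sort_order: Optional[str] = None,
) -> dict[str, Any]:
    """Validate and assemble query parameters, omitting unset filters."""
    if page < 1:  # assumption: pages are 1-based
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be between 1 and {MAX_PAGE_SIZE}")
    if sort_by is not None and sort_by not in SORT_FIELDS:
        raise ValueError(f"unsupported sort_by: {sort_by!r}")
    if sort_order is not None and sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")

    params = {
        "project_id": project_id,
        "environment_id": environment_id,
        "drift_category": drift_category,
        "severity": severity,
        "status": status,
        "source_event_name": source_event_name,
        "page": page,
        "page_size": page_size,
        "sort_by": sort_by,
        "sort_order": sort_order,
    }
    return {key: value for key, value in params.items() if value is not None}
```

The resulting dict can be passed directly as `params=` to an HTTP client such as httpx.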

Response

On success: status is ok, data is an array of issue objects, and total_count is the total number of rows matching the filter (for pagination).

Example (curl)

curl -sS -G "https://YOUR_LUDEX_HOST/v1/dashboard/drift-issues" \
  -H "Authorization: Bearer YOUR_DASHBOARD_API_KEY" \
  --data-urlencode "project_id=all" \
  --data-urlencode "drift_category=missing_schema" \
  --data-urlencode "status=open" \
  --data-urlencode "page=1" \
  --data-urlencode "page_size=50"
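
To walk every matching issue rather than a single page, a loop can use total_count as the stop condition. In this sketch the HTTP call is injected as `fetch_page`, a hypothetical callable that takes a page number and page size and returns the decoded response body, so the paging logic stays independent of any particular client.

```python
from typing import Any, Callable, Iterator


def iter_drift_issues(
    fetch_page: Callable[[int, int], dict[str, Any]],
    page_size: int = 50,
) -> Iterator[dict[str, Any]]:
    """Yield issue objects page by page until total_count rows have been seen."""
    page = 1
    seen = 0
    while True:
        body = fetch_page(page, page_size)
        if body.get("status") != "ok":
            raise RuntimeError(f"unexpected response: {body.get('message')}")
        rows = body.get("data") or []
        yield from rows
        seen += len(rows)
        total = body.get("total_count") or 0
        # Stop on an empty page too, in case total_count changes mid-walk.
        if not rows or seen >= total:
            return
        page += 1
```

With httpx, `fetch_page` could be a small function that calls `client.get("/v1/dashboard/drift-issues", params={"page": page, "page_size": page_size, ...})`, raises on HTTP errors, and returns `.json()`.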

How to monitor drift

Poll the overview on an interval (for example every 5–15 minutes) and alert when open_issues_count, quarantined_by_drift, or replayable_backlog crosses your thresholds.
Drill into drift-issues when an alert fires; use source_event_name and drift_category to route tickets to the right game team.
Track status transitions: under your governance process, a healthy issue moves open → suggestion_ready → approved → replaying → resolved; an issue that stalls in one status is worth a closer look.
Wire CI or scripts using the same Bearer + read:dashboard pattern as the rest of the dashboard API.
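
The threshold check in the first step can be sketched as a pure function over the overview's data object. The threshold values and the name `breached_metrics` are placeholders; choose limits that match your own traffic.

```python
from typing import Any, Mapping

# Hypothetical limits for illustration only; tune these to your own baseline.
EXAMPLE_THRESHOLDS = {
    "open_issues_count": 5,
    "quarantined_by_drift": 10_000,
    "replayable_backlog": 10_000,
}


def breached_metrics(
    data: Mapping[str, Any],
    thresholds: Mapping[str, float] = EXAMPLE_THRESHOLDS,
) -> dict[str, float]:
    """Return the metrics in `data` whose value exceeds its threshold.

    Missing or non-numeric metrics are skipped rather than treated as breaches.
    """
    breaches: dict[str, float] = {}
    for metric, limit in thresholds.items():
        value = data.get(metric)
        if isinstance(value, (int, float)) and value > limit:
            breaches[metric] = value
    return breaches
```

A scheduler can call this after each poll and raise an alert whenever the returned dict is non-empty.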

Python (httpx) example

import os
import httpx

BASE = os.environ["LUDEX_DASHBOARD_BASE_URL"].rstrip("/")  # host that serves /v1/dashboard
KEY = os.environ["LUDEX_DASHBOARD_API_KEY"]  # credential with read:dashboard

headers = {"Authorization": f"Bearer {KEY}"}

with httpx.Client(base_url=BASE, headers=headers, timeout=30.0) as client:
    overview = client.get("/v1/dashboard/drift-overview", params={"project_id": "all"})
    overview.raise_for_status()
    body = overview.json()
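    if body.get("status") != "ok":  # explicit check; assert is skipped under python -O
        raise RuntimeError(f"Unexpected drift-overview response: {body.get('message')}")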
    data = body["data"]
    print("open_issues_count:", data.get("open_issues_count"))
    print("replayable_backlog:", data.get("replayable_backlog"))

    issues = client.get(
        "/v1/dashboard/drift-issues",
        params={"project_id": "all", "status": "open", "page": 1, "page_size": 25},
    )
    issues.raise_for_status()
    ib = issues.json()
    print("total_count:", ib.get("total_count"))
    for row in ib.get("data") or []:
        print(row.get("source_event_name"), row.get("drift_category"), row.get("severity"))

What you should do when you see drift

Confirm scope — correct project and environment on the credential and in query params.
Open the issue (UI or API) and read category + linked failure context.
If a schema suggestion is ready, run your approval workflow (in-product admin or operator-assisted).
After approval and replay, affected events should flow back into trusted telemetry; issues typically move to resolved when the backlog is cleared according to Ludex rules.
If severity is high or critical, treat it as a release or compatibility risk: coordinate a client patch or schema update with your ship plan.

For ingestion errors at the HTTP edge (auth, rate limits, malformed JSON), see error-catalog.md. For sending valid events, see quickstart.md.