Ludex — schema drift detection

When telemetry reaches Ludex, it is checked against approved schemas for your project and environment. Schema drift is the umbrella term for cases where events fail that deterministic validation: the contract you registered no longer matches what the game is sending, or no contract exists yet for a new event name.

Ludex does not silently coerce invalid payloads into “good” data. Instead, failures are quarantined, grouped into drift issues you can inspect in the dashboard or over the API, and driven through a governed path (suggestions, approval, replay) until traffic is healthy again.

What counts as drift

Typical causes include:

  • No schema yet for an event_name your title is already emitting in production or test.
  • Shape changes — extra or missing fields compared to the active schema, or required fields absent.
  • Type mismatches — for example a string where the schema expects a number or a nested object.

Drift is detected on the deterministic validation path (not by guessing). Each failure is classified into a drift category and rolled up into an issue row keyed by organization, project, environment, event name, and category.
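
To make the rollup concrete, here is a minimal sketch of grouping validation failures by the five key dimensions named above. The `IssueKey` type, the `roll_up` helper, and the field names on each failure record are assumptions made for illustration; they are not Ludex's internal implementation.

```python
from collections import Counter
from typing import NamedTuple


class IssueKey(NamedTuple):
    """Hypothetical rollup key mirroring the five dimensions in the text."""
    organization_id: str
    project_id: str
    environment_id: str
    source_event_name: str
    drift_category: str


def roll_up(failures: list[dict]) -> Counter:
    """Count classified validation failures per issue key."""
    counts: Counter = Counter()
    for failure in failures:
        key = IssueKey(
            failure["organization_id"],
            failure["project_id"],
            failure["environment_id"],
            failure["source_event_name"],
            failure["drift_category"],
        )
        counts[key] += 1
    return counts


# Illustrative records only; real failures carry more context.
base = {"organization_id": "org_a", "project_id": "proj_1",
        "environment_id": "prod", "source_event_name": "combat_hit"}
failures = [
    {**base, "drift_category": "field_type"},
    {**base, "drift_category": "field_type"},
    {**base, "drift_category": "field_shape"},
]
issue_counts = roll_up(failures)
```

Two failures with the same event name and category land on one issue, while a different category for the same event opens a separate issue.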

Drift categories

These are the machine-readable category values you will see on issues and in API filters, together with their customer-facing meaning:

| Category | What it means |
| --- | --- |
| missing_schema | There is no registered schema for this event name (or equivalent mapping) in this environment’s contract set. |
| field_shape | The payload’s structure does not match the schema, for example missing required fields, or extra fields your policy treats as a mismatch. |
| field_type | A field is present but its JSON type or shape does not match the schema (e.g. string vs number). |
| semantic | Reserved for richer semantic checks when enabled; treat as “non-structural contract violation” in tooling. |
| env_divergence | Signals differences across environments (e.g. staging vs production) for the same logical event. |
| volume_spike | Used when unusual volume patterns are associated with drift-like failure clusters. |

Filters on list endpoints accept the string values above (for example drift_category=field_type).
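
Client code may want to reject a mistyped category before it reaches the API. The sketch below mirrors the category strings listed above in an enum; the `DriftCategory` class and both helper functions are hypothetical names, not part of any Ludex SDK.

```python
from enum import Enum


class DriftCategory(str, Enum):
    """Category strings as documented above (client-side mirror)."""
    MISSING_SCHEMA = "missing_schema"
    FIELD_SHAPE = "field_shape"
    FIELD_TYPE = "field_type"
    SEMANTIC = "semantic"
    ENV_DIVERGENCE = "env_divergence"
    VOLUME_SPIKE = "volume_spike"


def is_known_category(value: str) -> bool:
    """True when the string is one of the documented category values."""
    return value in {category.value for category in DriftCategory}


def category_filter(value: str) -> dict:
    """Build the query parameter; raises ValueError on an unknown string."""
    return {"drift_category": DriftCategory(value).value}
```

If Ludex adds categories later, a mirror like this needs updating, so treat it as a guard against typos rather than as a source of truth.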

Status lifecycle

Each drift issue carries a status that tells you whether you need to act, whether Ludex is processing a suggestion, or whether the issue is closed.

| Status | What it means for you |
| --- | --- |
| open | New or ongoing; failures are still landing against this issue. Review and prioritize. |
| under_review | Someone is actively triaging (operator or your team, depending on process). |
| suggestion_ready | A deterministic or assisted schema suggestion exists and can be reviewed. |
| awaiting_approval | A suggestion is waiting for explicit approval before it becomes active. |
| approved | The governing schema change is approved; the system can move toward replay of quarantined events. |
| replaying | Affected events are being replayed through validation using the new contract. |
| resolved | The issue is closed successfully from a recovery perspective (subject to your org’s definitions). |
| partially_resolved | Some traffic is healthy; residual failures or backlog may remain. |
| terminal | Closed in an end state that is not “fully recovered” (for example abandoned or superseded by policy). |

Issues that are resolved or terminal are treated as closed for backlog-style metrics (for example replayable backlog counts).
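
As a rough illustration of that rule, the sketch below treats `resolved` and `terminal` as closed and sums a replayable count over everything else. The per-issue field name `replayable_count` is an assumption; check the actual issue objects returned by the API before relying on it.

```python
CLOSED_STATUSES = frozenset({"resolved", "terminal"})


def is_closed(status: str) -> bool:
    """An issue counts as closed only in the two end states."""
    return status in CLOSED_STATUSES


def replayable_backlog(issues: list[dict]) -> int:
    """Sum replayable counts over issues that are not closed.

    `replayable_count` is a hypothetical field name used for illustration.
    """
    return sum(
        issue.get("replayable_count", 0)
        for issue in issues
        if not is_closed(issue.get("status", ""))
    )


sample_issues = [
    {"status": "open", "replayable_count": 4000},
    {"status": "partially_resolved", "replayable_count": 500},
    {"status": "resolved", "replayable_count": 9999},  # excluded: closed
]
backlog = replayable_backlog(sample_issues)
```

Note that `partially_resolved` still contributes to the backlog under this reading, because only `resolved` and `terminal` are named as closed.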

Severity

Issues carry a severity string used for sorting and alerting:

info, low, medium, high, critical

Severity reflects how Ludex ranks the issue (for example affected event counts, environment, and policy). Use it to drive paging rules in your own monitoring: treat high and critical as worthy of fast human review unless you have agreed SLAs otherwise.
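
A paging rule along those lines might look like the following sketch. It assumes the severity strings rank from info (lowest) to critical (highest); the `should_page` and `sort_by_severity` helpers and the default threshold are illustrative, not a Ludex feature.

```python
# Assumed ranking, lowest to highest.
SEVERITY_ORDER = ("info", "low", "medium", "high", "critical")
_RANK = {name: rank for rank, name in enumerate(SEVERITY_ORDER)}


def should_page(severity: str, threshold: str = "high") -> bool:
    """True when the severity is at or above the paging threshold.

    Raises KeyError for a severity string outside the known set.
    """
    return _RANK[severity] >= _RANK[threshold]


def sort_by_severity(issues: list[dict]) -> list[dict]:
    """Return issues ordered from most to least severe."""
    return sorted(issues, key=lambda issue: _RANK[issue["severity"]], reverse=True)


ordered = sort_by_severity([
    {"source_event_name": "login", "severity": "low"},
    {"source_event_name": "combat_hit", "severity": "critical"},
    {"source_event_name": "shop_open", "severity": "high"},
])
```

Teams with agreed SLAs can pass a different `threshold`, for example `"critical"`, to page less often.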

Dashboard API (credential-scoped)

These endpoints are read-only. Authenticate with the same Bearer API key style as ingestion, but the credential must include the read:dashboard scope. Organization, default project, and default environment come from the key; query parameters let you narrow or widen project and environment where the API allows it.

Use the base URL Ludex gives you for the admin / dashboard host (the path prefix is always /v1/dashboard).

GET /v1/dashboard/drift-overview

Returns aggregate metrics for the scoped organization (and optional project / environment filter), including:

  • Counts of open issues and new issues in the last 24 hours and 7 days
  • Quarantined event volume attributed to open drift (quarantined_by_drift)
  • Distinct environments with open drift
  • Replayable backlog (sum of replayable counts for issues not in resolved / terminal)
  • Top categories and top source event names by affected volume

Query parameters

| Parameter | Description |
| --- | --- |
| project_id | Defaults to all; any other value must match the credential’s project. |
| environment_id | Optional; when set, must match the credential’s environment. |

Example (curl)

curl -sS -G "https://YOUR_LUDEX_HOST/v1/dashboard/drift-overview" \
  -H "Authorization: Bearer YOUR_DASHBOARD_API_KEY" \
  --data-urlencode "project_id=all"

Example shape (illustrative)

{
  "status": "ok",
  "message": "Drift overview (dashboard)",
  "data": {
    "organization_id": "org_…",
    "project_id": null,
    "open_issues_count": 3,
    "new_issues_24h": 1,
    "new_issues_7d": 2,
    "quarantined_by_drift": 12840,
    "environments_with_drift": 2,
    "replayable_backlog": 9000,
    "top_categories": [
      { "drift_category": "field_shape", "issue_count": 2 }
    ],
    "top_event_types": [
      { "source_event_name": "combat_hit", "affected_count": 8000 }
    ]
  }
}

Exact keys may evolve slightly; always key off status, message, and data.
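
One way to follow that advice is to check the envelope first and then read only the metrics you alert on, tolerating missing or extra keys. The `OverviewMetrics` class and `parse_overview` function below are hypothetical helpers; the key names come from the illustrative response above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverviewMetrics:
    """The subset of overview fields this sketch cares about."""
    open_issues_count: int = 0
    quarantined_by_drift: int = 0
    replayable_backlog: int = 0


def parse_overview(body: dict) -> OverviewMetrics:
    """Validate the envelope, then read metrics defensively."""
    if body.get("status") != "ok":
        raise ValueError(
            f"unexpected status: {body.get('status')!r} ({body.get('message')})"
        )
    data = body.get("data") or {}
    # Missing or null keys fall back to 0; unknown keys are ignored.
    return OverviewMetrics(
        open_issues_count=int(data.get("open_issues_count") or 0),
        quarantined_by_drift=int(data.get("quarantined_by_drift") or 0),
        replayable_backlog=int(data.get("replayable_backlog") or 0),
    )


metrics = parse_overview({
    "status": "ok",
    "message": "Drift overview (dashboard)",
    "data": {"open_issues_count": 3, "quarantined_by_drift": 12840, "some_new_key": 1},
})
```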

GET /v1/dashboard/drift-issues

Paginated list of drift issues with optional filters.

Query parameters

| Parameter | Description |
| --- | --- |
| project_id | Same semantics as overview. |
| environment_id | Optional; defaults to credential environment when omitted. |
| drift_category | Filter by category (see table above). |
| severity | Filter by severity string. |
| status | Filter by issue status. |
| source_event_name | Filter by raw / source event name. |
| page | Page number (default 1). |
| page_size | Page size (default 50, max 200). |
| sort_by | One of: id, last_seen_at, first_seen_at, affected_count, severity, status, source_event_name, created_at. |
| sort_order | asc or desc. |

Response

  • On success: status is ok, data is an array of issue objects, and total_count is the total number of rows matching the filter (for pagination).
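
To walk every page, a client can keep requesting until it has seen total_count rows. The sketch below is transport-agnostic: `iter_issues` takes a hypothetical `fetch_page(page, page_size)` callable that returns the parsed response body, and the in-memory `fake_fetch` only stands in for a real HTTP call.

```python
from typing import Callable, Iterator


def iter_issues(
    fetch_page: Callable[[int, int], dict], page_size: int = 50
) -> Iterator[dict]:
    """Yield issue rows across pages until total_count rows were seen."""
    if not 1 <= page_size <= 200:  # documented maximum is 200
        raise ValueError("page_size must be between 1 and 200")
    page, seen = 1, 0
    while True:
        body = fetch_page(page, page_size)
        rows = body.get("data") or []
        yield from rows
        seen += len(rows)
        # Stop on an empty page too, in case total_count changes mid-walk.
        if not rows or seen >= body.get("total_count", 0):
            break
        page += 1


# In-memory stand-in for the API, for illustration only.
_ROWS = [{"id": n} for n in range(5)]


def fake_fetch(page: int, page_size: int) -> dict:
    start = (page - 1) * page_size
    return {"status": "ok", "data": _ROWS[start:start + page_size],
            "total_count": len(_ROWS)}


all_ids = [row["id"] for row in iter_issues(fake_fetch, page_size=2)]
```

With a real client, `fetch_page` would issue the GET with `page` and `page_size` as query parameters and return the decoded JSON.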

Example (curl)

curl -sS -G "https://YOUR_LUDEX_HOST/v1/dashboard/drift-issues" \
  -H "Authorization: Bearer YOUR_DASHBOARD_API_KEY" \
  --data-urlencode "project_id=all" \
  --data-urlencode "drift_category=missing_schema" \
  --data-urlencode "status=open" \
  --data-urlencode "page=1" \
  --data-urlencode "page_size=50"

How to monitor drift

  1. Poll the overview on an interval (for example every 5–15 minutes) and alert when open_issues_count, quarantined_by_drift, or replayable_backlog crosses your thresholds.
  2. Drill into drift-issues when an alert fires; use source_event_name and drift_category to route tickets to the right game team.
  3. Track status transitions: move from open → suggestion_ready → approved → replaying → resolved with your governance process.
  4. Wire CI or scripts using the same Bearer + read:dashboard pattern as the rest of the dashboard API.
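
The threshold check in step 1 can be kept separate from the polling loop so it is easy to test. In this sketch the `breached` helper and the threshold values are hypothetical; the metric names are the ones listed for the overview endpoint.

```python
def breached(data: dict, thresholds: dict[str, int]) -> list[str]:
    """Return one alert message per metric that exceeds its threshold."""
    alerts = []
    for metric, limit in thresholds.items():
        value = data.get(metric) or 0  # treat a missing metric as 0
        if value > limit:
            alerts.append(f"{metric}={value} exceeds {limit}")
    return alerts


# Example thresholds; tune these to your own traffic.
THRESHOLDS = {
    "open_issues_count": 5,
    "quarantined_by_drift": 10_000,
    "replayable_backlog": 5_000,
}

alerts = breached(
    {"open_issues_count": 3, "quarantined_by_drift": 12840, "replayable_backlog": 9000},
    THRESHOLDS,
)
```

A scheduler or cron job can call this with the `data` object from each overview poll and forward any returned messages to your alerting channel.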

Python (httpx) example

import os
import httpx

BASE = os.environ["LUDEX_DASHBOARD_BASE_URL"].rstrip("/")  # host that serves /v1/dashboard
KEY = os.environ["LUDEX_DASHBOARD_API_KEY"]  # credential with read:dashboard

headers = {"Authorization": f"Bearer {KEY}"}

with httpx.Client(base_url=BASE, headers=headers, timeout=30.0) as client:
    overview = client.get("/v1/dashboard/drift-overview", params={"project_id": "all"})
    overview.raise_for_status()
    body = overview.json()
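    # Fail loudly on an unexpected envelope; assert would be skipped under python -O.
    if body.get("status") != "ok":
        raise RuntimeError(f"unexpected status: {body.get('status')!r}")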
    data = body["data"]
    print("open_issues_count:", data.get("open_issues_count"))
    print("replayable_backlog:", data.get("replayable_backlog"))

    issues = client.get(
        "/v1/dashboard/drift-issues",
        params={"project_id": "all", "status": "open", "page": 1, "page_size": 25},
    )
    issues.raise_for_status()
    ib = issues.json()
    print("total_count:", ib.get("total_count"))
    for row in ib.get("data") or []:
        print(row.get("source_event_name"), row.get("drift_category"), row.get("severity"))

What you should do when you see drift

  1. Confirm scope — correct project and environment on the credential and in query params.
  2. Open the issue (UI or API) and read category + linked failure context.
  3. If a schema suggestion is ready, run your approval workflow (in-product admin or operator-assisted).
  4. After approval and replay, affected events should flow back into trusted telemetry; issues typically move to resolved when the backlog is cleared according to Ludex rules.
  5. If severity is high, treat as a release or compatibility risk: coordinate a client patch or schema update with your ship plan.

For ingestion errors at the HTTP edge (auth, rate limits, malformed JSON), see error-catalog.md. For sending valid events, see quickstart.md.