Train-Serve Skew
Train-serve skew is the gap between the feature distribution a model saw during training and the distribution it sees in production…
$ prime install @community/principle-train-serve-skew

Projection
Always in _index.xml · the agent never has to ask for this.
TrainServeSkew [principle] v1.0.0
Features used at inference time must be computed by exactly the same code, against exactly the same data sources, as features used at training time. Any divergence — different code path, different SQL, different rounding, different null handling — produces train-serve skew, the #1 cause of silent ML production failures.
Train-serve skew is the gap between the feature distribution a model saw during training and the distribution it sees in production. The model's accuracy degrades silently — predictions remain plausible but systematically wrong. Skew is caused by (1) duplicated feature logic in offline notebooks vs online services, (2) different snapshots of source data, (3) different time-window semantics ('last 7 days' computed in UTC vs local time), or (4) different missing-value handling. Eliminate skew architecturally: a single feature definition compiled into both batch (training) and online (serving) execution.
Loaded when retrieval picks the atom as adjacent / supporting.
TrainServeSkew [principle] v1.0.0
Features used at inference time must be computed by exactly the same code, against exactly the same data sources, as features used at training time. Any divergence — different code path, different SQL, different rounding, different null handling — produces train-serve skew, the #1 cause of silent ML production failures.
Train-serve skew is the gap between the feature distribution a model saw during training and the distribution it sees in production. The model's accuracy degrades silently — predictions remain plausible but systematically wrong. Skew is caused by (1) duplicated feature logic in offline notebooks vs online services, (2) different snapshots of source data, (3) different time-window semantics ('last 7 days' computed in UTC vs local time), or (4) different missing-value handling. Eliminate skew architecturally: a single feature definition compiled into both batch (training) and online (serving) execution.
Attributed To
Google, 'Rules of Machine Learning: Best Practices for ML Engineering' (Martin Zinkevich, 2017) — Rule #29: 'The best way to make sure that you train like you serve is to save the set of features used at serving time.'
Applies To
- Any ML model in production where features are derived from raw events or DB rows
- Recommender systems with session-level features (the most skew-prone)
- Fraud / risk scoring with rolling-window aggregates (`avg_txn_amount_30d`)
- Personalization features computed from user history
- Anything using time-series joins (point-in-time correctness is mandatory)
Counter Examples
- Training notebook computes `features = df.groupby('user_id').agg(...)`. Serving code re-implements the same logic in Java with subtly different null handling. Production AUC is 0.05 lower than offline; silent for 3 months.
- Training uses `WHERE event_time < label_time` (correct). Serving uses `WHERE event_time < now()` (correct at serve-time, but it never matches training during backtest). Skew shows up only in production.
- Categorical encoding: training fits a label-encoder on the training set; serving sees a new category and silently emits 0 (the encoder default). The model treats unknown items as item-id zero.
Loaded when retrieval picks the atom as a focal / direct hit.
TrainServeSkew [principle] v1.0.0
Features used at inference time must be computed by exactly the same code, against exactly the same data sources, as features used at training time. Any divergence — different code path, different SQL, different rounding, different null handling — produces train-serve skew, the #1 cause of silent ML production failures.
Train-serve skew is the gap between the feature distribution a model saw during training and the distribution it sees in production. The model's accuracy degrades silently — predictions remain plausible but systematically wrong. Skew is caused by (1) duplicated feature logic in offline notebooks vs online services, (2) different snapshots of source data, (3) different time-window semantics ('last 7 days' computed in UTC vs local time), or (4) different missing-value handling. Eliminate skew architecturally: a single feature definition compiled into both batch (training) and online (serving) execution.
Attributed To
Google, 'Rules of Machine Learning: Best Practices for ML Engineering' (Martin Zinkevich, 2017) — Rule #29: 'The best way to make sure that you train like you serve is to save the set of features used at serving time.'
Applies To
- Any ML model in production where features are derived from raw events or DB rows
- Recommender systems with session-level features (the most skew-prone)
- Fraud / risk scoring with rolling-window aggregates (`avg_txn_amount_30d`)
- Personalization features computed from user history
- Anything using time-series joins (point-in-time correctness is mandatory)
Counter Examples
- Training notebook computes `features = df.groupby('user_id').agg(...)`. Serving code re-implements the same logic in Java with subtly different null handling. Production AUC is 0.05 lower than offline; silent for 3 months.
- Training uses `WHERE event_time < label_time` (correct). Serving uses `WHERE event_time < now()` (correct at serve-time, but it never matches training during backtest). Skew shows up only in production.
- Categorical encoding: training fits a label-encoder on the training set; serving sees a new category and silently emits 0 (the encoder default). The model treats unknown items as item-id zero (a minimal sketch follows this list).
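A minimal, self-contained sketch of that third counter example; the category names and vocabularies are illustrative, not from the source. It shows a hand-rolled encoder whose serving fallback collides with a real item id, next to the conventional fix of reserving an explicit UNKNOWN index that both pipelines share.

```python
# Buggy pattern: serving falls back to 0, which is also a real item id.
vocab = {"electronics": 0, "books": 1, "toys": 2}   # fit on training data

def encode_buggy(category: str) -> int:
    return vocab.get(category, 0)   # unseen "garden" -> 0 == "electronics"

# Fix: index 0 is reserved for UNKNOWN in *both* pipelines, so an unseen
# category at serve time maps to a value the model was trained to read as
# "unknown", not to an arbitrary real item.
UNKNOWN = 0
vocab_safe = {"electronics": 1, "books": 2, "toys": 3}

def encode_safe(category: str) -> int:
    return vocab_safe.get(category, UNKNOWN)

assert encode_buggy("garden") == encode_buggy("electronics")  # the skew bug
assert encode_safe("garden") == UNKNOWN                       # explicit
```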
Examples
- Feast / Tecton / Hopsworks feature store: a feature defined once in Python, materialized to (a) a Parquet table for offline training joins and (b) Redis/DynamoDB for online lookup — single definition, zero duplication (first sketch after this list).
- Point-in-time-correct join: the training pipeline must compute `user_avg_txn_30d` AS OF `event_timestamp` for every label row — never the latest value, which would leak future data (second sketch below).
- Inference-time logging: log every feature value the model received, plus the prediction. Compare distributions against the training set in a daily skew-detection job — alert on KS-statistic or population-stability-index drift (third sketch below).
- Featuretools / dbt-based feature definitions that compile to both Spark (training) and Flink (serving) — same SQL semantics on both sides.
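First, a sketch of the single-definition pattern in Feast, following the API shape in Feast's documentation; the feature view name, entity, and Parquet path are illustrative, and the usage calls (commented out) assume an already-configured feature repository.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32

# Single definition: this file is the only place the feature exists.
user = Entity(name="user", join_keys=["user_id"])

txn_source = FileSource(
    path="data/user_txn_stats.parquet",      # illustrative path
    timestamp_field="event_timestamp",
)

user_txn_stats = FeatureView(
    name="user_txn_stats",
    entities=[user],
    ttl=timedelta(days=30),
    schema=[Field(name="avg_txn_amount_30d", dtype=Float32)],
    source=txn_source,
)

# Training side: point-in-time join against label rows.
# store = FeatureStore(repo_path=".")        # needs a configured repo
# training_df = store.get_historical_features(
#     entity_df=labels_df,                   # user_id + event_timestamp cols
#     features=["user_txn_stats:avg_txn_amount_30d"],
# ).to_df()

# Serving side: same definition, read from the online store.
# online = store.get_online_features(
#     features=["user_txn_stats:avg_txn_amount_30d"],
#     entity_rows=[{"user_id": 42}],
# ).to_dict()
```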
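Second, a small pandas sketch of the point-in-time join (toy data, illustrative column names): `pd.merge_asof` with `direction="backward"` gives each label row the latest feature value at or before its `event_timestamp`, which is exactly the AS OF semantics above.

```python
import pandas as pd

# Labels: one row per (user, event) we want to score, with the label time.
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_timestamp": pd.to_datetime(["2024-03-01", "2024-03-10", "2024-03-05"]),
    "label": [0, 1, 0],
})

# Feature snapshots: the value of avg_txn_amount_30d as it was at each time.
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "feature_timestamp": pd.to_datetime(["2024-02-25", "2024-03-08", "2024-03-01"]),
    "avg_txn_amount_30d": [52.0, 61.5, 18.0],
})

# For every label row, take the latest feature row at or before
# event_timestamp — never a later one, so no future data leaks.
training = pd.merge_asof(
    labels.sort_values("event_timestamp"),
    features.sort_values("feature_timestamp"),
    left_on="event_timestamp",
    right_on="feature_timestamp",
    by="user_id",
    direction="backward",
)
print(training[["user_id", "event_timestamp", "avg_txn_amount_30d"]])
```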
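Third, a sketch of the daily skew-detection job; the alert thresholds are common heuristics, not from the source. It compares logged serving values against the training distribution with a two-sample KS test and a quantile-binned population stability index.

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(train: np.ndarray, serve: np.ndarray, bins: int = 10) -> float:
    """Population stability index over quantile bins of the training data."""
    edges = np.quantile(train, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf       # catch out-of-range values
    p = np.histogram(train, edges)[0] / len(train)
    q = np.histogram(serve, edges)[0] / len(serve)
    p, q = np.clip(p, 1e-6, None), np.clip(q, 1e-6, None)
    return float(np.sum((p - q) * np.log(p / q)))

# Stand-ins for the training feature column and the logged serving values.
train_vals = np.random.default_rng(0).normal(50, 10, 10_000)
serve_vals = np.random.default_rng(1).normal(55, 10, 10_000)   # drifted

ks = ks_2samp(train_vals, serve_vals)
if ks.pvalue < 0.01 or psi(train_vals, serve_vals) > 0.2:      # common cutoffs
    print("ALERT: avg_txn_amount_30d drifted between training and serving")
```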
Relations
requires: @community/pattern-feature-store
Sources
- Zinkevich, 'Rules of Machine Learning' (2017) — Rules #29, #31, #32 specifically on train/serve skew
- Polyzotis, Roy, Whang, Zinkevich, 'Data Lifecycle Challenges in Production Machine Learning' (SIGMOD 2018)
- Uber Michelangelo paper (2017) — feature pipeline reuse between offline training and online serving
- Tecton & Feast feature store documentation — point-in-time correct joins as the canonical defense against skew
Requires
@community/pattern-feature-store
Source
prime-system/examples/frontend-design/primes/compiled/@community/principle-train-serve-skew/atom.yaml