Handle unseen categorical values safely at inference
data-005
Intent
Prevent production failures or unstable predictions when new category values appear.
Applicability
Applies to categorical feature encoding in online serving or batch inference. Return a verdict of unknown when the encoding artifacts live outside the inspected code and cannot be reviewed.
What to inspect
Encoders, vocabulary files, fallback buckets, unknown-token handling, and inference preprocessing code.
Pass criteria
The serving path handles unseen categories with an explicit unknown bucket, ignore mode, or equivalent safe fallback.
Fail criteria
New values crash inference, raise key errors, or silently map to a real category with no deliberate fallback policy.
Do not flag
Truly closed vocabularies enforced by upstream schema and business rules.
Confidence guidance
HIGH when encoder configuration or lookup code clearly lacks unknown handling. MEDIUM when fallback may exist in another layer. LOW when category space ownership is unclear.
Remediation
Add an unknown-category path and ensure training and serving artifacts agree on it.
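One way to keep training and serving aligned is to reserve the unknown bucket when the vocabulary is built and ship it inside the same artifact. The sketch below is a minimal illustration under assumptions: the names `build_vocab` and `encode`, the `<UNK>` token, and the JSON artifact format are hypothetical and not taken from any particular library.

```python
import json

UNK_TOKEN = "<UNK>"  # hypothetical reserved token; index 0 is the unknown bucket


def build_vocab(training_values):
    """Assign index 0 to the unknown bucket, then one index per seen category."""
    vocab = {UNK_TOKEN: 0}
    for value in sorted(set(training_values)):
        vocab.setdefault(value, len(vocab))
    return vocab


def encode(values, vocab):
    """Map each value to its index, falling back to the unknown bucket."""
    unk_index = vocab[UNK_TOKEN]
    return [vocab.get(value, unk_index) for value in values]


# Training side: build the vocabulary and serialize it as the shared artifact.
artifact = json.dumps(build_vocab(["red", "green", "blue"]))

# Serving side: load the same artifact, so the unknown index matches training.
serving_vocab = json.loads(artifact)
encoded = encode(["green", "purple"], serving_vocab)
```

Because the unknown index travels inside the artifact, the serving code never has to hard-code it, which removes one way for the two sides to drift apart.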
Pass example
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(handle_unknown="ignore")
Fail example
# Raises KeyError for any value missing from vocab (assuming vocab is a plain dict).
encoded = [vocab[value] for value in incoming_values]
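For contrast, the same lookup can be written with a deliberate fallback. This is a sketch only: `UNKNOWN_INDEX` and the sample `vocab` are hypothetical, and the model is assumed to have been trained with that index reserved for unknown values.

```python
UNKNOWN_INDEX = 0  # assumed to be reserved for unknown values at training time
vocab = {"red": 1, "green": 2, "blue": 3}  # hypothetical vocabulary
incoming_values = ["red", "purple"]

# dict.get returns the fallback instead of raising KeyError on unseen values.
encoded = [vocab.get(value, UNKNOWN_INDEX) for value in incoming_values]
```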