Data
The data page contains the shared value objects used by suites, backends, metrics, and benchmark runs.
Enums And Records
base
krisis/data/base.py
Abstract base classes for the Krisis data layer. All domain-specific data modules (CKD, Hypertension, Diabetes) inherit from these contracts.
FeatureSet
Bases: str, Enum
Controls which feature set a suite exposes to the benchmark.
FULL — all features in the raw dataset, including weak signals and clinically ambiguous markers. Use this to stress-test models on messy, real-world input.
REDUCED — the validated subset derived from feature selection (e.g. Pearson correlation). Tighter signal, less noise. Use this for focused clinical reasoning evaluation.
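Because FeatureSet mixes in str, its members compare equal to plain strings and can be looked up from them, which is handy when the choice comes from a config file or CLI flag. The following is a minimal sketch of that behavior using a stand-in enum; the member values "full" and "reduced" are assumptions and may not match krisis/data/base.py.

```python
from enum import Enum


class FeatureSet(str, Enum):
    # Stand-in for krisis.data.base.FeatureSet; the string values are assumed.
    FULL = "full"
    REDUCED = "reduced"


# Look a member up by value, e.g. when parsing a config file or CLI argument.
chosen = FeatureSet("reduced")

# The str mixin makes members compare equal to their raw string value.
is_reduced = chosen == "reduced"
is_member = chosen is FeatureSet.REDUCED
```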
Task
Bases: str, Enum
The clinical reasoning task the benchmark will evaluate.
DETECTION — binary: condition present vs. not present. The simplest task. Most models handle this reasonably well.
STAGING — multiclass: assign the correct disease stage (e.g. CKD Stage 1–5). Harder. Requires understanding of clinical thresholds.
PROGRESSION — temporal: given a patient trajectory across multiple time points, predict direction of disease progression. The hardest task. Requires longitudinal reasoning.
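The three tasks differ in the shape of the prediction problem (binary, multiclass, temporal). As an illustration only, the sketch below maps each task to that shape. The enum is a stand-in with assumed string values, and problem_shape is a hypothetical helper, not part of Krisis.

```python
from enum import Enum


class Task(str, Enum):
    # Stand-in for krisis.data.base.Task; the string values are assumed.
    DETECTION = "detection"
    STAGING = "staging"
    PROGRESSION = "progression"


# Problem shape per task, as described in the member documentation.
PROBLEM_SHAPE = {
    Task.DETECTION: "binary",
    Task.STAGING: "multiclass",
    Task.PROGRESSION: "temporal",
}


def problem_shape(task_name: str) -> str:
    """Hypothetical helper: resolve a task name (or member) to its problem shape."""
    return PROBLEM_SHAPE[Task(task_name)]
```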
PatientRecord (dataclass)
A single patient record passed to a model backend for evaluation.
features — the clinical markers for this patient as a dict e.g. {"sc": 1.2, "hemo": 10.4, "htn": "yes", ...}
label — the ground truth label for this record.
    For DETECTION: 0 or 1
    For STAGING: integer stage (1–5)
    For PROGRESSION: direction string ("stable", "worsening", "improving") or next-stage integer
metadata — optional dict for anything that shouldn't be passed to the model but is useful for result analysis e.g. {"egfr": 42.3, "ckd_stage": 3, "sex": "female"}
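To make the three fields concrete, here is a rough sketch of what such a record could look like, along with a check of the label against the ranges listed above. The field types, the empty-dict default for metadata, the 1–5 bound on a next-stage integer, and the label_is_valid helper are all assumptions made for illustration; the real definition lives in krisis/data/base.py.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Union


class Task(str, Enum):
    # Stand-in for krisis.data.base.Task; the string values are assumed.
    DETECTION = "detection"
    STAGING = "staging"
    PROGRESSION = "progression"


@dataclass
class PatientRecord:
    # Field names follow the documentation; types and defaults are assumed.
    features: Dict[str, Any]
    label: Union[int, str]
    metadata: Dict[str, Any] = field(default_factory=dict)


def label_is_valid(label: Union[int, str], task: Task) -> bool:
    """Hypothetical helper: check a label against the documented ranges."""
    if task is Task.DETECTION:
        return label in (0, 1)
    if task is Task.STAGING:
        return isinstance(label, int) and 1 <= label <= 5
    # PROGRESSION: a direction string or a next-stage integer (1-5 assumed).
    if isinstance(label, str):
        return label in ("stable", "worsening", "improving")
    return isinstance(label, int) and 1 <= label <= 5


record = PatientRecord(
    features={"sc": 1.2, "hemo": 10.4, "htn": "yes"},
    label=3,
    # Kept out of the model input, but available for result analysis.
    metadata={"egfr": 42.3, "ckd_stage": 3, "sex": "female"},
)
```

Keeping eGFR and stage in metadata rather than features lets result analysis slice by them without passing them to the model.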
SuiteConfig (dataclass)
Configuration passed to a BaseDataSuite at instantiation.
features — FULL or REDUCED feature set
task — DETECTION, STAGING, or PROGRESSION
seed — random seed for reproducibility across all stochastic operations (imputation, generation, splits)
n_synthetic — number of synthetic patient records to generate. Set to 0 to use only real records from the source dataset.
test_size — fraction of records held out for evaluation (0.0–1.0)
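The sketch below shows one way these fields could fit together: a config that rejects out-of-range values, and a seeded split that holds out test_size of the records. It is not the Krisis implementation. The default values, the validation in __post_init__, the enum string values, and the split_indices helper are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class FeatureSet(str, Enum):
    # Stand-in enum; string values are assumed.
    FULL = "full"
    REDUCED = "reduced"


class Task(str, Enum):
    # Stand-in enum; string values are assumed.
    DETECTION = "detection"
    STAGING = "staging"
    PROGRESSION = "progression"


@dataclass
class SuiteConfig:
    # Field names follow the documentation; every default here is assumed.
    features: FeatureSet = FeatureSet.REDUCED
    task: Task = Task.DETECTION
    seed: int = 42
    n_synthetic: int = 0
    test_size: float = 0.2

    def __post_init__(self) -> None:
        if not 0.0 <= self.test_size <= 1.0:
            raise ValueError("test_size must be between 0.0 and 1.0")
        if self.n_synthetic < 0:
            raise ValueError("n_synthetic must be 0 or greater")


def split_indices(n_records: int, config: SuiteConfig) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: seeded shuffle, then hold out test_size of the records."""
    indices = list(range(n_records))
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(config.seed).shuffle(indices)
    n_test = round(n_records * config.test_size)
    return indices[n_test:], indices[:n_test]


config = SuiteConfig(task=Task.STAGING, test_size=0.2)
train_idx, test_idx = split_indices(10, config)
```

Calling split_indices again with the same config yields the same partition, which is the property the seed field is meant to guarantee.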