Skip to content

Data

The data page contains the shared value objects used by suites, backends, metrics, and benchmark runs.

Enums And Records

base

krisis/data/base.py

Abstract base classes for the Krisis data layer. All domain-specific data modules (CKD, Hypertension, Diabetes) inherit from these contracts.

FeatureSet

Bases: str, Enum

Controls which feature set a suite exposes to the benchmark.

FULL — all features in the raw dataset, including weak signals and clinically ambiguous markers. Use this to stress-test models on messy, real-world input.

REDUCED — the validated subset derived from feature selection (e.g. Pearson correlation). Tighter signal, less noise. Use this for focused clinical reasoning evaluation.

Task

Bases: str, Enum

The clinical reasoning task the benchmark will evaluate.

DETECTION — binary: condition present vs. not present. The simplest task. Most models handle this reasonably.

STAGING — multiclass: assign the correct disease stage (e.g. CKD Stage 1–5). Harder. Requires understanding of clinical thresholds.

PROGRESSION — temporal: given a patient trajectory across multiple time points, predict direction of disease progression. The hardest task. Requires longitudinal reasoning.

PatientRecord dataclass

A single patient record passed to a model backend for evaluation.

features — the clinical markers for this patient as a dict e.g. {"sc": 1.2, "hemo": 10.4, "htn": "yes", ...}

label — the ground truth label for this record. For DETECTION: 0 or 1 For STAGING: integer stage (1–5) For PROGRESSION: direction string ("stable", "worsening", "improving") or next-stage integer

metadata — optional dict for anything that shouldn't be passed to the model but is useful for result analysis e.g. {"egfr": 42.3, "ckd_stage": 3, "sex": "female"}

SuiteConfig dataclass

Configuration passed to a BaseDataSuite at instantiation.

features — FULL or REDUCED feature set task — DETECTION, STAGING, or PROGRESSION seed — random seed for reproducibility across all stochastic operations (imputation, generation, splits) n_synthetic — number of synthetic patient records to generate. Set to 0 to use only real records from the source dataset. test_size — fraction of records held out for evaluation (0.0–1.0)