Suite

The suite page defines the reusable data-layer interfaces that concrete clinical suites implement.

Base classes, not CKD internals

This page documents the framework base classes. For the current CKD implementation, see Framework Guide -> Suites -> CKD.

Suite Base Classes

base

krisis/data/base.py

Abstract base classes for the Krisis data layer. All domain-specific data modules (CKD, Hypertension, Diabetes) inherit from these contracts.

BaseDataSuite

Bases: ABC

The top-level data contract that Benchmark receives.

A suite is the public API of the data layer. It wires together a preprocessor, feature engineer, and generator, and exposes a clean list of PatientRecord objects ready for evaluation.

Example
suite = MyClinicalSuite(config=SuiteConfig(task=Task.STAGING))
records = suite.load()

The suite handles train/test splitting internally. Benchmark only ever receives the test split.

domain property

domain: str

Human-readable domain name, e.g. 'CKD' or 'Hypertension'.

load abstractmethod

load() -> list[PatientRecord]

Run the full data pipeline and return test-split PatientRecords.

Pipeline order
  1. Load raw source data
  2. Preprocess (encode, impute, scale)
  3. Engineer features (domain-specific derivations)
  4. Generate synthetic records (if n_synthetic > 0)
  5. Merge real + synthetic
  6. Split → return test split as PatientRecord list
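The pipeline order above can be sketched as follows. This is a minimal, dependency-free illustration and not the Krisis implementation: `ToySuite`, the dict-based records, the `elevated` flag, and the 80/20 shuffle split are all assumptions made for the example. A real suite returns PatientRecord objects, and its split strategy is not described on this page.

```python
import random
from abc import ABC, abstractmethod


class BaseDataSuite(ABC):
    """Stand-in for the documented base class (assumed shape)."""

    @property
    @abstractmethod
    def domain(self) -> str: ...

    @abstractmethod
    def load(self) -> list: ...


class ToySuite(BaseDataSuite):
    """Hypothetical suite that follows the documented pipeline order."""

    def __init__(self, n_synthetic=0, seed=0, test_fraction=0.2):
        self.n_synthetic = n_synthetic
        self.seed = seed
        self.test_fraction = test_fraction

    @property
    def domain(self) -> str:
        return "Toy"

    def load(self) -> list:
        rng = random.Random(self.seed)
        # 1. Load raw source data (hard-coded here; one value is missing).
        raw = [{"creatinine": c} for c in (0.8, 1.1, None, 2.4, 3.0)]
        # 2. Preprocess: impute missing values with the mean of observed ones.
        observed = [r["creatinine"] for r in raw if r["creatinine"] is not None]
        mean = sum(observed) / len(observed)
        clean = [
            {"creatinine": mean if r["creatinine"] is None else r["creatinine"]}
            for r in raw
        ]
        # 3. Engineer features: a toy derived flag, not a clinical rule.
        real = [
            {**r, "elevated": r["creatinine"] > 1.2, "synthetic": False}
            for r in clean
        ]
        # 4. Generate synthetic records (only if n_synthetic > 0).
        synthetic = []
        for _ in range(self.n_synthetic):
            value = rng.uniform(0.5, 3.5)
            synthetic.append(
                {"creatinine": value, "elevated": value > 1.2, "synthetic": True}
            )
        # 5. Merge real + synthetic.
        merged = real + synthetic
        # 6. Shuffle, split, and return only the test split.
        rng.shuffle(merged)
        n_test = max(1, round(len(merged) * self.test_fraction))
        return merged[:n_test]


records = ToySuite(n_synthetic=5, seed=42).load()
```

Because the random generator is created from the seed inside `load()`, calling it again with the same seed returns the same test split.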

describe abstractmethod

describe() -> dict[str, Any]

Return a summary of the suite configuration and data statistics. Used by results.report() to document what was evaluated.

Should include at minimum
  • domain name
  • feature set (full/reduced)
  • task type
  • n_real records
  • n_synthetic records
  • label distribution
  • seed
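As an illustration of the minimum fields listed above, the helper below builds such a summary. The dictionary keys, the `describe_suite` function, and the record layout (dicts with `label` and `synthetic` entries) are hypothetical; this page specifies what to include, not how the keys are named. The stage labels are example values only.

```python
from collections import Counter
from typing import Any


def describe_suite(domain, feature_set, task, records, seed) -> dict[str, Any]:
    """Build a summary dict from records shaped like {'label': ..., 'synthetic': bool}."""
    n_synthetic = sum(1 for r in records if r["synthetic"])
    return {
        "domain": domain,
        "feature_set": feature_set,  # "full" or "reduced"
        "task": task,
        "n_real": len(records) - n_synthetic,
        "n_synthetic": n_synthetic,
        "label_distribution": dict(Counter(r["label"] for r in records)),
        "seed": seed,
    }


records = [
    {"label": "G2", "synthetic": False},
    {"label": "G3a", "synthetic": False},
    {"label": "G3a", "synthetic": True},
]
summary = describe_suite("CKD", "full", "staging", records, seed=42)
```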

BasePreprocessor

Bases: ABC

Cleans and imputes raw domain data.

Each domain implements this to handle its own encoding, imputation strategy, and scaling. The contract is simple: fit_transform takes a raw DataFrame and returns a clean one.

fit_transform abstractmethod

fit_transform(df: pd.DataFrame) -> pd.DataFrame

Fit preprocessing on df and return the transformed DataFrame. Sets self._is_fitted = True on completion.

transform abstractmethod

transform(df: pd.DataFrame) -> pd.DataFrame

Apply already-fitted preprocessing to new data. Raises RuntimeError if called before fit_transform.
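A minimal sketch of the fit_transform / transform contract, assuming pandas is installed. `MedianPreprocessor` and the `creatinine` column are hypothetical, and the base class is redefined locally as a stand-in so the example runs without Krisis. Median imputation is only one possible strategy; a real domain preprocessor would also handle encoding and scaling.

```python
from abc import ABC, abstractmethod

import pandas as pd


class BasePreprocessor(ABC):
    """Stand-in for the documented base class (assumed shape)."""

    @abstractmethod
    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame: ...

    @abstractmethod
    def transform(self, df: pd.DataFrame) -> pd.DataFrame: ...


class MedianPreprocessor(BasePreprocessor):
    """Hypothetical preprocessor: median imputation on numeric columns."""

    def __init__(self) -> None:
        self._is_fitted = False
        self._medians = None

    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Learn medians from this data only, then mark the instance fitted.
        self._medians = df.median(numeric_only=True)
        self._is_fitted = True
        return df.fillna(self._medians)

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        if not self._is_fitted:
            raise RuntimeError("transform() called before fit_transform()")
        # Reuse the stored medians so new data never influences imputation.
        return df.fillna(self._medians)


train = pd.DataFrame({"creatinine": [1.0, None, 3.0]})
new = pd.DataFrame({"creatinine": [None, 5.0]})

prep = MedianPreprocessor()
clean_train = prep.fit_transform(train)  # missing value filled with 2.0
clean_new = prep.transform(new)          # filled with the training median
```

Keeping the fitted statistics on the instance is what lets `transform` apply the same imputation to held-out data without refitting.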

BaseFeatureEngineer

Bases: ABC

Derives new clinically meaningful features from preprocessed data.

This is where domain-specific engineering happens:
  • CKD: eGFR computation, sex generation, stage derivation
  • Hypertension: MAP, pulse pressure, BP stage
  • Diabetes: HbA1c staging, insulin resistance markers

The engineer sits between the preprocessor and the generator: it operates on clean data and produces an enriched DataFrame that the generator can sample from.

fit_transform abstractmethod

fit_transform(df: pd.DataFrame) -> pd.DataFrame

Engineer new features and return the enriched DataFrame.

get_feature_names abstractmethod

get_feature_names(feature_set: FeatureSet) -> list[str]

Return the list of feature column names for the given feature set. Used by the suite to select the right columns before passing records to the model backend.
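The sketch below shows what a small engineer could look like for the Hypertension features named earlier, using the standard definitions pulse pressure = SBP - DBP and MAP = DBP + (SBP - DBP) / 3. It assumes pandas is installed. `BloodPressureEngineer`, the column names `sbp` and `dbp`, the `FeatureSet` members, and the choice of which columns belong to the reduced set are all assumptions for illustration, not the Krisis definitions.

```python
from abc import ABC, abstractmethod
from enum import Enum

import pandas as pd


class FeatureSet(Enum):
    """Assumed enum; this page only mentions 'full' and 'reduced' sets."""
    FULL = "full"
    REDUCED = "reduced"


class BaseFeatureEngineer(ABC):
    """Stand-in for the documented base class (assumed shape)."""

    @abstractmethod
    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame: ...

    @abstractmethod
    def get_feature_names(self, feature_set: FeatureSet) -> list: ...


class BloodPressureEngineer(BaseFeatureEngineer):
    """Hypothetical engineer deriving pulse pressure and MAP."""

    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()  # leave the caller's DataFrame untouched
        out["pulse_pressure"] = out["sbp"] - out["dbp"]
        out["map"] = out["dbp"] + out["pulse_pressure"] / 3
        return out

    def get_feature_names(self, feature_set: FeatureSet) -> list:
        # Which columns count as "reduced" is an arbitrary choice here.
        if feature_set is FeatureSet.REDUCED:
            return ["map"]
        return ["sbp", "dbp", "pulse_pressure", "map"]


engineer = BloodPressureEngineer()
df = pd.DataFrame({"sbp": [120, 150], "dbp": [80, 90]})
enriched = engineer.fit_transform(df)
selected = enriched[engineer.get_feature_names(FeatureSet.REDUCED)]
```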

BaseGenerator

Bases: ABC

Generates synthetic patient records from a fitted distribution.

Synthetic generation in Krisis is stage-aware: records are generated along physiologically plausible disease progression arcs rather than sampled at random. This ensures the benchmark tests models on clinically coherent inputs rather than statistical noise.

The generator is seeded for reproducibility. Two researchers running the same suite with the same seed get identical synthetic patients.

fit abstractmethod

fit(df: pd.DataFrame) -> BaseGenerator

Fit the generator on an engineered DataFrame. Learns the statistical distribution of each feature per stage. Returns self for chaining.

generate abstractmethod

generate(n: int) -> pd.DataFrame

Generate n synthetic patient records. Returns a DataFrame with the same schema as the fitted data. Raises RuntimeError if called before fit().
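To make the fit / generate contract concrete, here is a deliberately simplified sketch, assuming NumPy and pandas are installed. `GaussianStageGenerator` is hypothetical: it learns a mean and standard deviation per feature per stage and draws independent normal samples, which captures "per-stage distribution" and seeding but does not model the progression arcs described above. The `stage` and `egfr` columns and their values are made up for the example.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class BaseGenerator(ABC):
    """Stand-in for the documented base class (assumed shape)."""

    @abstractmethod
    def fit(self, df: pd.DataFrame) -> "BaseGenerator": ...

    @abstractmethod
    def generate(self, n: int) -> pd.DataFrame: ...


class GaussianStageGenerator(BaseGenerator):
    """Hypothetical generator: independent normal per feature, per stage."""

    def __init__(self, stage_col: str = "stage", seed: int = 0) -> None:
        self.stage_col = stage_col
        self.seed = seed
        self._is_fitted = False

    def fit(self, df: pd.DataFrame) -> "GaussianStageGenerator":
        grouped = df.groupby(self.stage_col)
        self._means = grouped.mean()
        self._stds = grouped.std(ddof=0)  # population std, never NaN
        # Observed stage frequencies, aligned with the sorted group index.
        counts = df[self.stage_col].value_counts(normalize=True)
        self._weights = counts.sort_index()
        self._columns = list(df.columns)
        # Seed at fit time so a fitted generator is reproducible.
        self._rng = np.random.default_rng(self.seed)
        self._is_fitted = True
        return self

    def generate(self, n: int) -> pd.DataFrame:
        if not self._is_fitted:
            raise RuntimeError("generate() called before fit()")
        # Draw a stage for each record, then sample features for that stage.
        stages = self._rng.choice(
            self._weights.index.to_numpy(), size=n, p=self._weights.to_numpy()
        )
        data = {self.stage_col: stages}
        for col in self._means.columns:
            mu = self._means.loc[stages, col].to_numpy()
            sigma = self._stds.loc[stages, col].to_numpy()
            data[col] = self._rng.normal(mu, sigma)
        # Return columns in the same order as the fitted data.
        return pd.DataFrame(data)[self._columns]


fitted = pd.DataFrame(
    {"stage": [1, 1, 2, 2], "egfr": [95.0, 105.0, 70.0, 80.0]}
)
a = GaussianStageGenerator(seed=7).fit(fitted).generate(10)
b = GaussianStageGenerator(seed=7).fit(fitted).generate(10)
```

Two generators built with the same seed and fitted on the same data produce identical frames, which mirrors the reproducibility guarantee described for Krisis generators.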