Suite
The suite page defines the reusable data-layer interfaces that concrete clinical suites implement.
Base classes, not CKD internals
This page documents the framework base classes. For the current CKD implementation, see Framework Guide -> Suites -> CKD.
Suite Base Classes
base
krisis/data/base.py
Abstract base classes for the Krisis data layer. All domain-specific data modules (CKD, Hypertension, Diabetes) inherit from these contracts.
BaseDataSuite
Bases: ABC
The top-level data contract that Benchmark receives.
A suite is the public API of the data layer. It wires together a preprocessor, feature engineer, and generator, and exposes a clean list of PatientRecord objects ready for evaluation.
Example
```python
suite = MyClinicalSuite(config=SuiteConfig(task=Task.STAGING))
records = suite.load()
```
The suite handles train/test splitting internally. Benchmark always receives the test split only.
load
abstractmethod
load() -> list[PatientRecord]
Run the full data pipeline and return test-split PatientRecords.
Pipeline order
- Load raw source data
- Preprocess (encode, impute, scale)
- Engineer features (domain-specific derivations)
- Generate synthetic records (if n_synthetic > 0)
- Merge real + synthetic
- Split → return test split as PatientRecord list
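The pipeline order above can be sketched as a toy suite. This is a minimal illustration, not the Krisis implementation: `ToySuite`, its in-memory data, the `creatinine`/`log_creatinine` columns, the 25% test fraction, and the `PatientRecord` dataclass are all hypothetical stand-ins, and synthetic generation is reduced to seeded resampling.

```python
from dataclasses import dataclass
from typing import Any

import numpy as np
import pandas as pd


@dataclass
class PatientRecord:
    # Hypothetical stand-in for Krisis' PatientRecord: a feature dict plus a label.
    features: dict[str, Any]
    label: int


class ToySuite:
    """Hypothetical suite that walks through the documented load() pipeline order."""

    def __init__(self, n_synthetic: int = 0, test_size: float = 0.25, seed: int = 0) -> None:
        self.n_synthetic = n_synthetic
        self.test_size = test_size
        self.seed = seed

    def _load_raw(self) -> pd.DataFrame:
        # 1. Load raw source data (a tiny in-memory table with one missing value).
        return pd.DataFrame({
            "creatinine": [0.9, 1.4, None, 2.8, 4.1, 1.1, 3.3, 0.8],
            "stage": [1, 2, 2, 3, 4, 1, 4, 1],
        })

    def _preprocess(self, df: pd.DataFrame) -> pd.DataFrame:
        # 2. Preprocess: median imputation only, for brevity.
        return df.fillna(df.median(numeric_only=True))

    def _engineer(self, df: pd.DataFrame) -> pd.DataFrame:
        # 3. Engineer features: one illustrative derived column.
        return df.assign(log_creatinine=np.log(df["creatinine"]))

    def _generate(self, df: pd.DataFrame, n: int) -> pd.DataFrame:
        # 4. Generate synthetic rows; plain seeded resampling stands in for a real generator.
        return df.sample(n=n, replace=True, random_state=self.seed)

    def load(self) -> list[PatientRecord]:
        df = self._engineer(self._preprocess(self._load_raw()))
        if self.n_synthetic > 0:
            # 5. Merge real + synthetic.
            df = pd.concat([df, self._generate(df, self.n_synthetic)], ignore_index=True)
        # 6. Seeded shuffle-and-split; only the test split leaves the suite.
        shuffled = df.sample(frac=1.0, random_state=self.seed).reset_index(drop=True)
        n_test = max(1, round(len(shuffled) * self.test_size))
        test = shuffled.iloc[:n_test]
        return [
            PatientRecord(features=row.drop("stage").to_dict(), label=int(row["stage"]))
            for _, row in test.iterrows()
        ]
```

With the defaults, `ToySuite().load()` returns two records (25% of eight rows); because every random step is driven by `seed`, repeated calls return the same test split.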
describe
abstractmethod
describe() -> dict[str, Any]
Return a summary of the suite configuration and data statistics. Used by results.report() to document what was evaluated.
Should include, at minimum:
- domain name
- feature set (full/reduced)
- task type
- n_real records
- n_synthetic records
- label distribution
- seed
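A describe() payload covering that minimum might look like the following. The helper name `describe_suite` and the dictionary keys are assumptions made for illustration; the page does not fix the exact key names.

```python
from collections import Counter
from collections.abc import Iterable
from typing import Any


def describe_suite(
    domain: str,
    feature_set: str,
    task: str,
    n_real: int,
    n_synthetic: int,
    labels: Iterable[Any],
    seed: int,
) -> dict[str, Any]:
    # Hypothetical helper: assembles the minimum fields a describe() summary should carry.
    counts = Counter(labels)
    return {
        "domain": domain,
        "feature_set": feature_set,
        "task": task,
        "n_real": n_real,
        "n_synthetic": n_synthetic,
        # Label counts, sorted by label so reports are stable across runs.
        "label_distribution": dict(sorted(counts.items())),
        "seed": seed,
    }
```

For example, `describe_suite("ckd", "reduced", "staging", 400, 200, [1, 1, 2, 3], 42)["label_distribution"]` is `{1: 2, 2: 1, 3: 1}`.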
BasePreprocessor
Bases: ABC
Cleans and imputes raw domain data.
Each domain implements this to handle its own encoding, imputation strategy, and scaling. The contract is simple: fit_transform takes a raw DataFrame and returns a clean one.
fit_transform
abstractmethod
fit_transform(df: DataFrame) -> pd.DataFrame
Fit preprocessing on df and return the transformed DataFrame. Sets self._is_fitted = True on completion.
transform
abstractmethod
transform(df: DataFrame) -> pd.DataFrame
Apply already-fitted preprocessing to new data. Raises RuntimeError if called before fit_transform.
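A concrete preprocessor honouring this fit/transform contract could look like the sketch below. `BasePreprocessor` here is a local stand-in that mirrors the documented methods rather than an import from Krisis, and `MedianStandardPreprocessor` (median imputation followed by z-score scaling, with encoding omitted) is a hypothetical example strategy.

```python
from abc import ABC, abstractmethod

import pandas as pd


class BasePreprocessor(ABC):
    # Local stand-in mirroring the documented contract (not imported from krisis).
    def __init__(self) -> None:
        self._is_fitted = False

    @abstractmethod
    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame: ...

    @abstractmethod
    def transform(self, df: pd.DataFrame) -> pd.DataFrame: ...


class MedianStandardPreprocessor(BasePreprocessor):
    """Hypothetical preprocessor: median imputation, then z-score scaling."""

    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Learn statistics from the fitting data only.
        self._medians = df.median(numeric_only=True)
        imputed = df.fillna(self._medians)
        cols = self._medians.index
        self._means = imputed[cols].mean()
        # Constant columns get a unit scale to avoid division by zero.
        self._stds = imputed[cols].std(ddof=0).replace(0, 1.0)
        self._is_fitted = True
        return self._apply(df)

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        if not self._is_fitted:
            raise RuntimeError("transform() called before fit_transform()")
        return self._apply(df)

    def _apply(self, df: pd.DataFrame) -> pd.DataFrame:
        # fillna returns a new frame, so the caller's DataFrame is never mutated.
        out = df.fillna(self._medians)
        for c in self._medians.index:
            out[c] = (out[c] - self._means[c]) / self._stds[c]
        return out
```

Storing the fitted statistics on the instance is what lets transform() reuse them on test or synthetic data instead of re-learning them, which would leak information across splits.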
BaseFeatureEngineer
Bases: ABC
Derives new clinically meaningful features from preprocessed data.
This is where domain-specific engineering happens:
- CKD: eGFR computation, sex generation, stage derivation
- Hypertension: MAP, pulse pressure, BP stage
- Diabetes: HbA1c staging, insulin resistance markers
The engineer sits between the preprocessor and the generator — it operates on clean data and produces an enriched DataFrame that the generator can sample from.
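As an illustration of the hypertension entry above, a feature engineer might derive pulse pressure and mean arterial pressure (MAP) as follows. The class name and the `sbp`/`dbp` column names are hypothetical, the values are assumed to still be in mmHg (derivations on already-scaled columns would be meaningless), and the MAP formula is the common one-third approximation, not necessarily the one Krisis uses.

```python
import pandas as pd


class HypertensionFeatureEngineer:
    """Hypothetical engineer deriving pulse pressure and MAP from sbp/dbp columns."""

    def fit_transform(self, df: pd.DataFrame) -> pd.DataFrame:
        # Work on a copy so the preprocessed input is left untouched.
        out = df.copy()
        # Pulse pressure: systolic minus diastolic (mmHg).
        out["pulse_pressure"] = out["sbp"] - out["dbp"]
        # Common MAP approximation: diastolic plus one third of the pulse pressure.
        out["map"] = out["dbp"] + out["pulse_pressure"] / 3
        return out
```

For a reading of 150/90 mmHg this gives a pulse pressure of 60 and a MAP of 110.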
fit_transform
abstractmethod
fit_transform(df: DataFrame) -> pd.DataFrame
Engineer new features and return the enriched DataFrame.
get_feature_names
abstractmethod
get_feature_names(feature_set: FeatureSet) -> list[str]
Return the list of feature column names for the given feature set. Used by the suite to select the right columns before passing records to the model backend.
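Column selection by feature set can be sketched as below. The `FeatureSet` enum is a local stand-in with the full/reduced members that the describe() summary mentions; its values, the column lists, and the `select_features` helper are all hypothetical.

```python
from enum import Enum

import pandas as pd


class FeatureSet(Enum):
    # Local stand-in; the real enum may define different members or values.
    FULL = "full"
    REDUCED = "reduced"


# Hypothetical column lists for a hypertension suite.
_REDUCED = ["sbp", "dbp"]
_FULL = _REDUCED + ["age", "pulse_pressure", "map"]


def get_feature_names(feature_set: FeatureSet) -> list[str]:
    # Return a fresh list so callers cannot mutate the module-level definitions.
    return list(_FULL if feature_set is FeatureSet.FULL else _REDUCED)


def select_features(df: pd.DataFrame, feature_set: FeatureSet) -> pd.DataFrame:
    # Keep only the requested columns (labels such as "stage" are dropped).
    return df[get_feature_names(feature_set)]
```

Keeping the column lists in one place means the suite and the model backend always agree on which inputs a given feature set exposes.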
BaseGenerator
Bases: ABC
Generates synthetic patient records from a fitted distribution.
Synthetic generation in Krisis is stage-aware — records are generated along physiologically plausible disease progression arcs, not sampled randomly. This ensures the benchmark tests models on clinically coherent inputs rather than statistical noise.
The generator is seeded for reproducibility. Two researchers running the same suite with the same seed get identical synthetic patients.
fit
abstractmethod
fit(df: DataFrame) -> BaseGenerator
Fit the generator on an engineered DataFrame. Learns the statistical distribution of each feature per stage. Returns self for chaining.
generate
abstractmethod
generate(n: int) -> pd.DataFrame
Generate n synthetic patient records. Returns a DataFrame with the same schema as the fitted data. Raises RuntimeError if called before fit().
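The fit/generate contract, stage conditioning, and seeding can be illustrated with a deliberately simple generator. This sketch substitutes independent per-stage Gaussians for the progression-arc method described above, so it shows the interface and reproducibility guarantees, not how Krisis actually models disease progression. `StageGaussianGenerator`, the `stage` column name, and the `egfr` column used in testing are hypothetical.

```python
import numpy as np
import pandas as pd


class StageGaussianGenerator:
    """Hypothetical stage-aware generator: independent Gaussians per stage and feature."""

    def __init__(self, stage_col: str = "stage", seed: int = 0) -> None:
        self.stage_col = stage_col
        self.seed = seed
        self._fitted = False

    def fit(self, df: pd.DataFrame) -> "StageGaussianGenerator":
        self._columns = list(df.columns)
        self._features = [c for c in df.columns if c != self.stage_col]
        # Per-stage mean and standard deviation of every feature.
        grouped = df.groupby(self.stage_col)[self._features]
        self._means = grouped.mean()
        self._stds = grouped.std(ddof=0).fillna(0.0)
        # Empirical stage frequencies decide how many records each stage receives.
        self._stage_probs = df[self.stage_col].value_counts(normalize=True).sort_index()
        self._fitted = True
        return self

    def generate(self, n: int) -> pd.DataFrame:
        if not self._fitted:
            raise RuntimeError("generate() called before fit()")
        # A fresh RNG built from the seed: same seed and data give identical records.
        rng = np.random.default_rng(self.seed)
        stages = rng.choice(
            self._stage_probs.index.to_numpy(), size=n, p=self._stage_probs.to_numpy()
        )
        # Look up the parameters of each sampled record's stage, then draw features.
        mu = self._means.loc[stages].to_numpy()
        sigma = self._stds.loc[stages].to_numpy()
        out = pd.DataFrame(rng.normal(mu, sigma), columns=self._features)
        out[self.stage_col] = stages
        # Restore the fitted column order so the schema matches the input.
        return out[self._columns]
```

Rebuilding the RNG from the seed inside generate() makes repeated calls return the same batch, which matches the "same seed, identical synthetic patients" guarantee; a generator meant to yield fresh records on every call would instead keep a single `np.random.Generator` on the instance.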