CKDSuite Detection Benchmark

Last updated: May 19, 2026 - see update log.


This report compares three frontier LLMs on the Krisis CKD detection task using the CKD Suite. The task asks each model to classify whether chronic kidney disease is present from structured clinical markers, while also allowing the model to abstain when the case appears ambiguous or unsafe to answer.
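
To make the abstention option concrete, here is a minimal sketch of how a classify-or-abstain task can be scored. It is not the Krisis implementation: the label strings, the `score_with_abstention` function, and the two metrics shown (coverage and accuracy over answered cases) are assumptions chosen for illustration, and the report's actual scoring metrics may differ.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical label for "model declined to answer"; the real output
# format used by the benchmark is not specified here.
ABSTAIN = "abstain"


@dataclass
class AbstentionScore:
    coverage: float            # fraction of cases the model answered
    selective_accuracy: float  # accuracy over answered cases only
    n_answered: int


def score_with_abstention(
    predictions: Sequence[str], gold: Sequence[str]
) -> AbstentionScore:
    """Score predictions where a model may answer or abstain on each case."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("at least one case is required")

    # Keep only the cases where the model committed to an answer.
    answered = [(p, g) for p, g in zip(predictions, gold) if p != ABSTAIN]
    correct = sum(1 for p, g in answered if p == g)

    coverage = len(answered) / len(gold)
    # If the model abstained on everything, accuracy is reported as 0.0.
    selective_accuracy = correct / len(answered) if answered else 0.0
    return AbstentionScore(coverage, selective_accuracy, len(answered))


if __name__ == "__main__":
    preds = ["ckd", "abstain", "not_ckd", "ckd"]
    gold = ["ckd", "ckd", "not_ckd", "not_ckd"]
    print(score_with_abstention(preds, gold))
```

Reporting coverage alongside accuracy matters because a model can raise its accuracy on answered cases simply by abstaining more often; the two numbers need to be read together.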

Krisis is a clinical evaluation framework for LLMs. This report does not aim to crown a single model as clinically superior. Instead, it shows how different providers behave when given the same task, prompt format, batching setup, and scoring metrics.