An international research initiative studying whose values are embedded in clinical AI—and developing the tools to make those values visible, measurable, and accountable.
Nearly half of Americans now turn to AI chatbots for health advice. But the same AI companies that design their products to recognize signs of self-harm and to refuse to help build bioweapons are quietly allowing other forces to shape the medical advice you receive.
When an AI system processes your case, is it helping you, your clinic, or your insurer? In a recent study, an LLM gave diametrically opposed treatment recommendations for the same child depending on whether it was prompted as a pediatric endocrinologist or an insurance company employee. Today, neither the physician nor the patient can know in advance which value framework an AI system embodies.
In a $5 trillion healthcare system, financial pressure to use AI to influence clinical decisions—for reasons beyond patient benefit—will only intensify. If instead we ensure that AI systems are aligned to serve patients first, medical decisions are likely to become safer, more up-to-date with the latest science, and better communicated to patients.
Four interconnected efforts to make clinical AI values visible and accountable
A large-scale international study collecting tens of thousands of responses from clinicians and patients across multiple categories of clinical decisions. The survey captures diversity in clinician training, geography, specialty, and patient backgrounds—building the empirical foundation for understanding how values shape medical decisions.
A novel, domain-independent measure that quantifies how effectively an AI model can be aligned to a given preference function or gold standard. In our initial study, three frontier LLMs (GPT-4o, Claude 3.5 Sonnet, and Gemini Advanced) showed significant variability in alignment effectiveness—and models that performed well pre-alignment sometimes degraded post-alignment.
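The core idea—scoring a model's agreement with a gold standard before and after an alignment step, and watching for degradation—can be illustrated with a toy sketch. This is not the published ACI formula; the agreement metric, the decision labels, and the model outputs below are all hypothetical, chosen only to show the pre/post comparison the study describes.

```python
# Toy illustration of measuring alignment effectiveness against a
# gold standard. NOT the actual ACI computation — just a sketch of
# the pre/post-alignment comparison described in the text.
def agreement(decisions, gold):
    """Fraction of decisions that match the gold standard."""
    return sum(d == g for d, g in zip(decisions, gold)) / len(gold)

# Hypothetical triage decisions for four cases.
gold         = ["treat", "refer", "observe", "treat"]
pre_aligned  = ["treat", "observe", "observe", "refer"]  # model as shipped
post_aligned = ["treat", "refer", "observe", "observe"]  # after alignment step

pre = agreement(pre_aligned, gold)    # 0.5
post = agreement(post_aligned, gold)  # 0.75
delta = post - pre  # positive = alignment helped; negative = it degraded the model
```

A real measure would also need to capture the sensitivity the study reports—small changes in the gold standard shifting model rankings—which a single agreement score cannot show.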
A transparent labeling system—proposed by the RAISE 2025 symposium consensus—that documents how AI systems navigate value-laden clinical trade-offs. The VIM would make transparent whether an AI system leans toward overdiagnosis, prioritizes cost-sparing, favors patient autonomy, or emphasizes preventing imminent harm, enabling patients, regulators, and health systems to make informed choices.
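To make the idea concrete, a machine-readable VIM label might look something like the sketch below. The four value dimensions come from the description above; the schema, field names, and the signed scale are assumptions for illustration only—the consensus statement does not prescribe a format.

```python
# Hypothetical sketch of a machine-readable "Values in the Model" label.
# Dimension names reflect the trade-offs named in the text; the schema
# and the -1.0..+1.0 scale are illustrative assumptions.
vim_label = {
    "system": "example-clinical-llm",   # hypothetical system identifier
    "label_version": "2025-09",
    "value_dimensions": {
        # Sign indicates which side of each trade-off the system tends
        # toward under evaluation (positive = first-named pole).
        "overdiagnosis_vs_undertesting": 0.4,
        "cost_sparing_vs_maximal_care": -0.2,
        "patient_autonomy_vs_paternalism": 0.6,
        "imminent_harm_prevention": 0.8,
    },
}

def summarize(label):
    """Render the label as one line per dimension for a human reader."""
    return [f"{dim}: {score:+.1f}"
            for dim, score in label["value_dimensions"].items()]
```

A structured label like this is what would let a regulator or health system compare two AI systems' value leanings side by side before deployment.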
A protocol for event-level logging of clinical AI—healthcare's equivalent of syslog. Each MedLog record captures nine core fields (header, model, user, target, inputs, artifacts, outputs, outcomes, feedback) for every AI interaction in clinical care. Four real-world pilots are running at sites in Ho Chi Minh City, Zurich, San Diego, and New York.
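A single record under this scheme can be sketched as follows. The nine field names are those listed above; the internal structure of each field (IDs, timestamps, free-text payloads) is a hypothetical rendering, since the protocol's wire format is not specified here.

```python
from datetime import datetime, timezone

# Illustrative sketch of one MedLog-style record. The nine top-level
# field names come from the protocol description; everything inside
# them is an assumption for illustration.
def make_record(model_id, user_id, patient_id, inputs, outputs):
    return {
        "header": {"event_id": "evt-0001",          # hypothetical ID scheme
                   "timestamp": datetime.now(timezone.utc).isoformat()},
        "model": {"id": model_id},                  # which AI system was invoked
        "user": {"id": user_id},                    # who initiated the interaction
        "target": {"patient_id": patient_id},       # whom the decision concerns
        "inputs": inputs,                           # data/prompt sent to the model
        "artifacts": [],                            # intermediate outputs, retrieved docs
        "outputs": outputs,                         # response as delivered to the user
        "outcomes": None,                           # clinical result, logged later
        "feedback": None,                           # clinician rating, logged later
    }

record = make_record("example-llm", "clinician-17", "pt-042",
                     {"question": "antibiotic dosing query"},
                     {"text": "model response"})
```

As with syslog, the value comes from every deployment emitting the same nine fields, so post-deployment surveillance tools can be written once and run anywhere.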
Research advancing the understanding of values in clinical AI
Goldberg C, Balicer RD, Bhat M, ... Kohane I. A consensus statement from the RAISE symposium proposing the "Values in the Model" (VIM) framework—a transparent labeling system that documents how AI systems navigate value-laden clinical trade-offs.
Noori A, Rodman A, Karthikesalingam A, ... Kohane IS, Zitnik M. Introduces MedLog, a universal protocol for event-level logging of clinical AI. Includes four real-world clinical pilots across three continents demonstrating the protocol's utility for post-deployment surveillance.
Kohane I. Introduces the Alignment Compliance Index (ACI) and evaluates three frontier LLMs on medical triage decisions. Finds significant variability in alignment effectiveness—models that performed well pre-alignment sometimes degraded post-alignment, and small changes in the gold standard led to large shifts in model rankings.
Yu K-H, Healey E, Leong T-Y, Kohane IS, Manrai AK. Comprehensive analysis of how human values influence AI outputs in clinical settings, providing frameworks for incorporating ethical considerations into medical AI systems.
Kohane I. A public-facing argument for value transparency in medical AI, advising patients to interrogate their AI advisers and demand transparency from the companies building these tools.
In September 2025, clinicians, ethicists, legal scholars, technologists, and health system leaders convened at the Responsible AI for Social and Ethical Healthcare (RAISE) symposium in Portland, Maine. The participants agreed that while government guidelines and AI model cards describe broad principles and technical specifications, they fail to reveal something crucial: the values embedded in actual clinical decisions.
The symposium produced a consensus statement calling for two parallel tracks: public debate about how values are addressed in AI for medicine, and carefully monitored pilot projects in leading health systems that begin to craft and test VIM labels for the AI systems already entering clinical use.
We are actively enrolling clinicians and patients worldwide. The survey takes approximately 15–20 minutes and involves reviewing realistic clinical scenarios requiring value-laden decisions.