The Human Values Project

An international research initiative studying whose values are embedded in clinical AI—and developing the tools to make those values visible, measurable, and accountable.

The Problem

Nearly half of Americans now turn to AI chatbots for health advice. But the same AI companies that design their products to recognize signs of self-harm and refuse to build bioweapons are quietly allowing other forces to shape the medical advice you receive.

When an AI system processes your case, is it helping you, your clinic, or your insurer? In a recent study, an LLM gave diametrically opposed treatment recommendations for the same child depending on whether it was prompted as a pediatric endocrinologist or an insurance company employee. Today, neither the physician nor the patient can know in advance which value framework an AI system embodies.

In a $5 trillion healthcare system, financial pressure to use AI to influence clinical decisions—for reasons beyond patient benefit—will only intensify. If instead we ensure that AI systems are aligned to serve patients first, medical decisions are likely to become safer, more up-to-date with the latest science, and better communicated to patients.

Key Questions

  • Whose values does a given AI model reflect when making clinical decisions?
  • How well can AI models be aligned to a specified set of values?
  • How do patient, clinician, and payer values differ across cultures and healthcare systems?
  • How should we monitor AI behavior in real clinical deployments?
  • 1,000+ clinicians enrolled
  • 3 continents
  • 15 AI models tested
  • 4 MedLog pilot sites

Research Components

Four interconnected efforts to make clinical AI values visible and accountable

Clinical Decision Dynamics Survey

A large-scale international study collecting tens of thousands of responses from clinicians and patients across multiple categorical clinical decisions. The survey captures diversity in clinician training, geography, specialty, and patient backgrounds—building the empirical foundation for understanding how values shape medical decisions.

International · 1,000+ clinicians · Active enrollment

Alignment Compliance Index (ACI)

A novel, domain-independent measure that quantifies how effectively an AI model can be aligned to a given preference function or gold standard. In our initial study, three frontier LLMs (GPT-4o, Claude 3.5 Sonnet, and Gemini Advanced) showed significant variability in alignment effectiveness—and models that performed well pre-alignment sometimes degraded post-alignment.

Novel metric · Model-independent · arXiv 2024
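The published ACI formula is not reproduced here; as a purely hypothetical illustration of the pre- vs post-alignment comparison described above, a crude alignment score could measure agreement with a gold standard before and after alignment (all names and data below are invented for the sketch):

```python
# Hypothetical sketch, NOT the published ACI formula: compare a model's
# agreement with a gold-standard set of categorical decisions before and
# after alignment. A negative gain illustrates post-alignment degradation.

def agreement(decisions: list[str], gold: list[str]) -> float:
    """Fraction of decisions matching the gold standard."""
    assert len(decisions) == len(gold)
    return sum(d == g for d, g in zip(decisions, gold)) / len(gold)

def alignment_gain(pre: list[str], post: list[str], gold: list[str]) -> float:
    """Change in gold-standard agreement after alignment."""
    return agreement(post, gold) - agreement(pre, gold)

# Invented example: a model that starts well pre-aligned and degrades.
gold = ["admit", "discharge", "admit", "refer"]
pre  = ["admit", "admit", "admit", "refer"]       # 3/4 correct
post = ["discharge", "admit", "admit", "refer"]   # 2/4 correct
print(alignment_gain(pre, post, gold))  # → -0.25
```

A score like this also makes concrete why small changes in the gold standard can reshuffle model rankings: every term of the comparison depends on the reference labels.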

Values in the Model (VIM)

A transparent labeling system—proposed by the RAISE 2025 symposium consensus—that documents how AI systems navigate value-laden clinical trade-offs. The VIM would make transparent whether an AI system leans toward overdiagnosis, cost-sparing, favoring autonomy, or preventing imminent harm, enabling patients, regulators, and health systems to make informed choices.

NEJM AI 2026 · Consensus statement · RAISE symposium
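The VIM label format has not been finalized; as one plausible shape, assuming only the trade-off axes named above, a machine-readable label might look like the following (the schema, system name, and scoring convention are all assumptions for illustration):

```python
# Hypothetical VIM label sketch. The consensus statement names the
# trade-off axes; this concrete schema and the -1..+1 scoring convention
# are assumptions for illustration only.
vim_label = {
    "system": "example-clinical-llm",  # assumed identifier
    "version": "2026-01",
    "value_axes": {
        "diagnosis": 0.4,        # leans toward overdiagnosis
        "cost": -0.2,            # mildly cost-sparing
        "autonomy": 0.7,         # favors patient autonomy
        "harm_prevention": 0.9,  # prioritizes preventing imminent harm
    },
}

# A patient, regulator, or health system could compare labels across
# candidate systems before deployment.
print(vim_label["value_axes"]["harm_prevention"])  # → 0.9
```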

MedLog

A protocol for event-level logging of clinical AI—healthcare's equivalent of syslog. Each MedLog record captures nine core fields (header, model, user, target, inputs, artifacts, outputs, outcomes, feedback) for every AI interaction in clinical care. Four real-world pilots are running at sites in Ho Chi Minh City, Zurich, San Diego, and New York.

4 active pilots · 3 continents · Nature 2025
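The nine core fields above can be sketched as a record type. The field names come from the protocol description; their internal structure and the example values are assumptions for illustration:

```python
# Sketch of a MedLog-style record. The nine core field names are from the
# protocol description; the contents of each field are assumed here.
from dataclasses import dataclass

@dataclass
class MedLogRecord:
    header: dict     # e.g. record ID, timestamp, site (assumed contents)
    model: dict      # model name/version used for the interaction
    user: dict       # who invoked the AI (role, de-identified ID)
    target: dict     # the patient or case the interaction concerns
    inputs: dict     # prompts / structured data sent to the model
    artifacts: list  # intermediate products (e.g. retrieved documents)
    outputs: dict    # the model's response
    outcomes: dict   # downstream clinical outcome, when known
    feedback: dict   # clinician or patient feedback on the output

# Invented example record for a single AI interaction.
record = MedLogRecord(
    header={"id": "rec-001", "ts": "2025-06-01T12:00:00Z"},
    model={"name": "example-llm", "version": "1.0"},
    user={"role": "physician"},
    target={"case": "triage"},
    inputs={"prompt": "..."},
    artifacts=[],
    outputs={"text": "..."},
    outcomes={},
    feedback={},
)
```

Like syslog, the value of such a record comes less from any single entry than from uniform, queryable streams of them across sites, which is what post-deployment surveillance requires.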

Key Publications

Research advancing the understanding of values in clinical AI

Consensus Statement · NEJM AI, January 2026

The Missing Dimension in Clinical AI: Making Hidden Values Visible

Goldberg C, Balicer RD, Bhat M, ... Kohane I. A consensus statement from the RAISE symposium proposing the "Values in the Model" (VIM) framework—a transparent labeling system that documents how AI systems navigate value-laden clinical trade-offs.

VIM Framework · 40+ authors · NEJM AI 2026;3(2)

Research Article · Nature, 2025

A Global Log for Medical AI

Noori A, Rodman A, Karthikesalingam A, ... Kohane IS, Zitnik M. Introduces MedLog, a universal protocol for event-level logging of clinical AI. Includes four real-world clinical pilots across three continents demonstrating the protocol's utility for post-deployment surveillance.

MedLog protocol · 4 clinical pilots · Nature 2025

Preprint · arXiv, October 2024

Systematic Characterization of the Effectiveness of Alignment in Large Language Models for Categorical Decisions

Kohane I. Introduces the Alignment Compliance Index (ACI) and evaluates three frontier LLMs on medical triage decisions. Finds significant variability in alignment effectiveness—models that performed well pre-alignment sometimes degraded post-alignment, and small changes in the gold standard led to large shifts in model rankings.

ACI metric · 3 frontier LLMs · arXiv 2409.18995

Review · NEJM, 2024

Medical Artificial Intelligence and Human Values

Yu K-H, Healey E, Leong T-Y, Kohane IS, Manrai AK. Comprehensive analysis of how human values influence AI outputs in clinical settings, providing frameworks for incorporating ethical considerations into medical AI systems.

Foundational review · NEJM 390(20):1895

Opinion · Boston Globe, January 2026

Your AI Doctor May Be Working for Someone Else

Kohane I. A public-facing argument for value transparency in medical AI, advising patients to interrogate their AI advisers and demand transparency from the companies building these tools.

Public engagement · Patient empowerment · Boston Globe Op-Ed

RAISE 2025

In September 2025, clinicians, ethicists, legal scholars, technologists, and health system leaders convened at the Responsible AI for Social and Ethical Healthcare (RAISE) symposium in Portland, Maine. The participants agreed that while government guidelines and AI model cards describe broad principles and technical specifications, they fail to reveal something crucial: the values embedded in actual clinical decisions.

The symposium produced a consensus statement calling for two parallel tracks: public debate about how values are addressed in AI for medicine, and carefully monitored pilot projects in leading healthcare systems that begin to craft and test VIM labels for the AI systems already entering clinical use.

Symposium Participants Included

  • Major health systems (Clalit, Mount Sinai, UCSD, MaineHealth)
  • AI companies (Google DeepMind, Microsoft Research)
  • Ethicists and legal scholars
  • EHR vendors (Epic Systems)
  • Patient advocates

Participate in the Human Values Project

We are actively enrolling clinicians and patients worldwide. The survey takes approximately 15–20 minutes and involves reviewing realistic clinical scenarios requiring value-laden decisions.